The result would be a thousand-core graphics chip with each core capable of handling four double-precision floating-point operations per clock cycle, the equivalent of 10 teraflops on a chip. A chip with just eight of the cores would someday power a handset, Dally said.
The Echelon chip packs just twice as many cores as today's high-end Nvidia GPUs. However, today's cores handle just one double precision floating-point operation per cycle, compared to four for the Echelon chip.
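To make the headline numbers concrete, here is a back-of-the-envelope sketch of peak double-precision throughput (cores × DP ops per core per cycle × clock). The 1,024 core count is the 128 SM × 8 core figure from the summary in this post, and 512 is the Fermi core count (half of Echelon's). The article gives no clock rate, so the sketch back-solves one from the quoted 10 TFLOPS; that clock is an assumption, not a published spec.

```python
# Peak DP throughput = cores x DP ops per core per cycle x clock (Hz).
# Core counts and ops/cycle come from the post; the clock is NOT stated
# anywhere and is back-solved from the quoted 10 TFLOPS (an assumption).

def peak_flops(cores: int, ops_per_cycle: int, clock_hz: float) -> float:
    """Theoretical peak FLOP/s, ignoring memory and scheduling limits."""
    return cores * ops_per_cycle * clock_hz

def implied_clock_hz(target_flops: float, cores: int, ops_per_cycle: int) -> float:
    """Clock rate needed to reach target_flops at the given width."""
    return target_flops / (cores * ops_per_cycle)

ECHELON_CORES = 128 * 8   # 128 SMs x 8 cores, per the summary in this post
ECHELON_OPS = 4           # DP ops per core per cycle (quoted article)
FERMI_CORES = 512         # half of Echelon's core count
FERMI_OPS = 1             # DP ops per core per cycle (quoted article)

clock = implied_clock_hz(10e12, ECHELON_CORES, ECHELON_OPS)
print(f"implied clock: {clock / 1e9:.2f} GHz")

# At equal clocks the ratio is independent of the clock value itself.
ratio = peak_flops(ECHELON_CORES, ECHELON_OPS, clock) / peak_flops(FERMI_CORES, FERMI_OPS, clock)
print(f"peak DP advantage per clock: {ratio:.0f}x")
```

Under these assumptions the chip would need to run at roughly 2.4 GHz, and at equal clocks it would have about 8× the peak DP throughput of a 512-core, one-op-per-cycle part (2× the cores times 4× the ops per core).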
Many of the chip's advances come from the way it uses memory. The Echelon chip will use 256 Mbytes of SRAM that can be dynamically configured to meet the needs of each application.
For example, the SRAM could be broken up into as many as six levels of cache, each of a variable size. At the lowest level each core would have its own private cache.
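The configurable SRAM can be pictured as a single 256 MB pool that each application carves into up to six levels of different sizes. The sketch below is a hypothetical software model (the `CacheLevel` and `configure` names are invented for illustration). It only checks the two constraints the article states, the number of levels and the total capacity, and says nothing about how the hardware actually performs the partitioning.

```python
from dataclasses import dataclass

TOTAL_SRAM_MB = 256   # on-chip SRAM size quoted in the article
MAX_LEVELS = 6        # "as many as six levels of cache"

@dataclass
class CacheLevel:
    name: str
    size_mb: float          # chip-wide capacity given to this level (hypothetical model)
    private: bool = False   # True for the per-core private level nearest the cores

def configure(levels: list[CacheLevel]) -> list[CacheLevel]:
    """Check a hypothetical per-application split of the shared SRAM pool."""
    if not 1 <= len(levels) <= MAX_LEVELS:
        raise ValueError(f"need 1..{MAX_LEVELS} levels, got {len(levels)}")
    total = sum(level.size_mb for level in levels)
    if total > TOTAL_SRAM_MB:
        raise ValueError(f"{total} MB exceeds the {TOTAL_SRAM_MB} MB pool")
    return levels

# Hypothetical application A: three levels that use the whole pool.
app_a = configure([
    CacheLevel("L1-private", 16, private=True),
    CacheLevel("L2", 48),
    CacheLevel("L3", 192),
])

# Hypothetical application B: six levels, each a different size (252 MB total).
app_b = configure([CacheLevel(f"L{i + 1}", size)
                   for i, size in enumerate([4, 8, 16, 32, 64, 128])])
```

A seventh level, or a split that adds up to more than 256 MB, is rejected with `ValueError`.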
The goal is to keep data as close to the processing elements as possible, reducing the energy wasted moving data around the chip. To that end, the SMs would have a hierarchy of processor registers that could be matched to locations in the cache levels. In addition, the chip would have broadcast mechanisms so that the results of one task could be shared with any nodes that need that data.
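The article gives no detail on how the broadcast mechanism works. As a loose software analogy only, the hypothetical publish/subscribe model below shows the idea: a task's result is produced once and delivered to every node that registered interest, instead of each consumer fetching it separately. `BroadcastBus`, `subscribe` and `publish` are invented names, not an Nvidia API.

```python
from collections import defaultdict

class BroadcastBus:
    """Hypothetical model: one task result fanned out to all interested nodes."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[int]] = defaultdict(set)
        self.inboxes: dict[int, dict[str, object]] = defaultdict(dict)

    def subscribe(self, node_id: int, topic: str) -> None:
        """Register node_id as needing the result published under topic."""
        self._subscribers[topic].add(node_id)

    def publish(self, topic: str, value: object) -> int:
        """Deliver value once to each subscribed node; return the delivery count."""
        receivers = self._subscribers.get(topic, set())
        for node in receivers:
            self.inboxes[node][topic] = value
        return len(receivers)

bus = BroadcastBus()
for node in (3, 7, 42):
    bus.subscribe(node, "task17.result")
sent = bus.publish("task17.result", [1.0, 2.0])
print(sent)  # 3
```

Nodes that never subscribed (node 5, say) receive nothing, which mirrors the article's point that results go only to the nodes that need them.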
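My quick summary:
Echelon has 128 SMs with 8 "cores" each, over a thousand cores in total and twice as many as the current Fermi, yet each Echelon "core" has four times Fermi's double-precision throughput!
The on-chip SRAM totals 256 MB and can be partitioned per application, for example into 6 cache levels, each of a different size; the level closest to the SPs can be used as private memory. The SM register hierarchy can be mapped onto the cache levels.
Echelon's broadcast mechanism allows a task's results to be shared with any node that needs them.
It will use a CPU ISA.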