ISSCC 2007: STI 65nm Cell / Intel's TFLOPS chip

Edison · posted 2006-11-30 09:57

Intel

An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS

A 275mm² network-on-chip architecture contains 80 tiles arranged as a 10×8 2D array
of floating-point cores and packet-switched routers, operating at 4GHz. The 15-FO4
design employs mesochronous clocking, fine-grained clock gating, dynamic sleep
transistors, and body-bias techniques. The 65nm 100M transistor die is designed to
achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W.
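
As a rough check on the two figures above (1.28TFLOPS in the title, 1.0TFLOPS at 1V in the abstract), peak throughput can be estimated as tiles × FLOPs per cycle per tile × clock. The sketch below assumes 4 single-precision FLOPs per cycle per tile (for example, two multiply-accumulate units each counted as 2 FLOPs); that per-tile figure is an assumption, not something the abstract states. Under it, 80 tiles at 4GHz come out at 1.28TFLOPS, and 1.0TFLOPS would correspond to a clock of roughly 3.1GHz.

```python
# Back-of-the-envelope peak-throughput estimate for the 80-tile chip.
# FLOPS_PER_CYCLE_PER_TILE is an ASSUMPTION (not given in the abstract).

TILES = 80                      # from the abstract: 10x8 array
FLOPS_PER_CYCLE_PER_TILE = 4    # ASSUMPTION: e.g. 2 MAC units x 2 FLOPs each


def peak_tflops(freq_ghz: float) -> float:
    """Peak TFLOPS = tiles * FLOPs/cycle/tile * clock (GHz) / 1000."""
    return TILES * FLOPS_PER_CYCLE_PER_TILE * freq_ghz / 1000.0


def freq_for_tflops(tflops: float) -> float:
    """Clock in GHz needed to reach a given peak TFLOPS."""
    return tflops * 1000.0 / (TILES * FLOPS_PER_CYCLE_PER_TILE)


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS per watt."""
    return tflops * 1000.0 / watts


if __name__ == "__main__":
    print(f"peak at 4GHz: {peak_tflops(4.0):.2f} TFLOPS")
    print(f"clock for 1.0 TFLOPS: {freq_for_tflops(1.0):.3f} GHz")
    print(f"1.0 TFLOPS at 98W: {gflops_per_watt(1.0, 98.0):.1f} GFLOPS/W")
```

The 98W figure from the abstract works out to about 10 GFLOPS/W at the 1.0TFLOPS operating point, independent of the per-tile assumption.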

Implementation of the 65nm Dual-Core 64b Merom Processor

Merom is a dual-core 64b processor implementing the Core™ architecture. The 143mm²
die has 291M transistors in a 65nm 8M process. The shared 4MB 16-way L2 cache
uses PMOS power gating to minimize leakage. The processor operates in a wide core
frequency range of 1 to 3GHz, a bus frequency range of 666 to 1333MHz and voltage
range of 0.85 to 1.325V, while providing 40% better power performance.

IBM

Design of the POWER6™ Microprocessor

The POWER6™ microprocessor combines ultra-high frequency operation, aggressive
power reduction, a highly scalable memory subsystem, and mainframe-like reliability,
availability, and serviceability. The 341mm² 700M transistor dual-core microprocessor
is fabricated in a 65nm SOI process with 10 levels of low-k copper interconnect. It
operates at clock frequencies over 5GHz in high-performance applications, and
consumes under 100W in power-sensitive applications.

A Distributed Critical-Path Timing Monitor for a 65nm High-Performance
Microprocessor

A distributed critical-path timing monitor (CPM) is designed as part of the POWER6™
microprocessor in 65nm SOI. The CPM is capable of monitoring timing margin, process
variation, localized noise and VDD droop, or clock stability. It tracks critical-path delay to
within 3 FO2 delays at extreme operating voltages with a standard deviation less than
1/2 an FO2 delay. The CPM detects DC VDD droops greater than 10mV and tracks timing
changes greater than 1 FO2 delay.

AMD

An Integrated Quad-Core Opteron™ Processor

An integrated quad-core x86 processor is implemented in a 65nm 11M SOI CMOS
process. Based on an enhanced Opteron™ core, the SoC-developed processor employs
power- and thermal-management techniques throughout the design. The SRAM cache
designs target process variation considerations and future process scalability. A
DDR2/DDR3 combo-PHY and HT3 I/Os provide high-bandwidth interfaces.

SONY/Toshiba/IBM

Implementation of the CELL Broadband Engine™ in a 65nm SOI Technology
Featuring Dual-Supply SRAM Arrays Supporting 6GHz at 1.3V

The 65nm CELL Broadband Engine™ design features a dual power supply, which
enhances SRAM stability and performance using an elevated array-specific power
supply, while reducing the logic power consumption. Hardware measurements
demonstrate low-voltage operation and reduced scatter of the minimum operating
voltage. The chip operates at 6GHz at 1.3V and is fabricated in a 65nm CMOS SOI
technology.

Sun

An 8-Core 64-Thread 64b Power-Efficient SPARC SoC

The 8-core 64-thread 64b power-efficient 2nd-generation Niagara SPARC SoC has 4MB
L2 cache with one x8 PCI-Express, two 10G Ethernet (XAUI), and 8 FBDIMM ports. The
on-chip SerDes provide greater than 1Tb/s bandwidth. The 500M transistor chip with a
die size of 342mm² is implemented in an 11M 65nm triple-Vt CMOS process.
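
For a quick comparison of the abstracts above, transistor density can be computed directly from the quoted transistor counts and die areas. The snippet below is only a sketch using the figures as stated; the AMD and CELL abstracts give no die size or transistor count, so they are left out, and the dictionary keys are informal labels rather than official product names.

```python
# Transistor density (million transistors per mm^2), using only the
# transistor counts and die areas quoted in the abstracts above.
CHIPS = {
    "Intel 80-tile": (100, 275),        # (million transistors, die area in mm^2)
    "Intel Merom": (291, 143),
    "IBM POWER6": (700, 341),
    "Sun 2nd-gen Niagara": (500, 342),
}


def density(name: str) -> float:
    """Return million transistors per mm^2 for the named chip."""
    transistors_m, area_mm2 = CHIPS[name]
    return transistors_m / area_mm2


if __name__ == "__main__":
    for chip in CHIPS:
        print(f"{chip}: {density(chip):.2f} Mtransistors/mm^2")
```

Density alone says little about the designs, since cache-heavy dies pack far more transistors per mm² than logic- or router-heavy ones.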

elisha · posted 2006-11-30 12:15

I'm amazed by IBM's "Martian technology"... honestly, I still find it a bit hard to believe.

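Eji · posted 2006-12-2 01:30

Well, back then the 90nm Cell could run at 4.6GHz, yet the blade server ended up at 2.4GHz, and the PS3 runs at 3.2GHz with one SPE disabled.
So even if the 65nm part can now run at 6GHz, actual products will probably stay around the same place....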

Edison · posted 2006-12-2 12:29

The Cell Blade runs at 3.2GHz.
Cell BE Blade 1 - Rev 3 (3Q06)
• Toshiba Southbridge, Infiniband Port, 3.2 GHz
• 1 GB XDR™ memory, Volume ramp in 2Q06
• System build via E&TS and e1350 process

Edison · posted 2006-12-2 14:11

http://news.zdnet.co.uk/hardware/0,1000000091,39283631,00.htm

http://www.intel.com/technology/techresearch/terascale/index.htm

ftp://download.intel.com/research/platform/sp/hl_wp1.pdf

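Judging from the die shot of the prototype chip, there are no "large", "shared" I/O circuits provided for each chip; it may be using stack-packaged memory plus optical logic.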

Edison · posted 2006-12-2 14:18

"“When combined with our recent breakthroughs in silicon photonics, these experimental chips address the three major requirements for tera-scale computing – teraOPS of performance, terabytes-per-second of memory bandwidth, and terabits-per-second of I/O capacity,” said Rattner. “While any commercial application of these technologies is years away, it is an exciting first step in bringing tera-scale performance to PCs and servers.”

Unlike existing chip designs where hundreds of millions of transistors are uniquely arranged, this chip’s design consists of 80 tiles laid out in an 8x10 block array. Each tile includes a small core, or compute element, with a simple instruction set for processing floating-point data, but is not Intel Architecture compatible. The tile also includes a router connecting the core to an on-chip network that links all the cores to each other and gives them access to memory.

The second major innovation is a 20 megabyte SRAM memory chip that is stacked on and bonded to the processor die. Stacking the die makes possible thousands of interconnects and provides more than a terabyte-per-second of bandwidth between memory and the cores.

Rattner demonstrated a third major innovation, the recently announced Hybrid Silicon Laser chip developed in collaboration with researchers at University of California, Santa Barbara. With this breakthrough, dozens or maybe hundreds of Hybrid Silicon Lasers could be integrated with other silicon photonic components onto a single silicon chip. This could lead to a terabit-per-second optical link capable of speeding terabytes of data between chips inside computers, between PCs, and between servers inside data centers."

http://www.intel.com/pressroom/a ... /20060926corp_b.htm

Prescott · posted 2006-12-2 15:11

:loveliness:

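Eji · posted 2006-12-2 19:26

....Since the ISA is different, its positioning feels somewhat like a co-processor array....
Of course, in the future it might be absorbed through something like an Execution Layer.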

Edison · posted 2007-1-19 14:28

The latest news indicates that Intel's Polaris uses tiles "each with its own local, programmer-managed memory"; in other words, Polaris may be no easier to program than Cell.
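
To illustrate why a local, programmer-managed memory raises the programming difficulty: without a hardware cache, the program itself has to stage each chunk of data into the small local store, compute on it, and copy the results back out. A minimal Python sketch of that pattern follows; the `LOCAL_STORE_WORDS` capacity and the function name are hypothetical and do not reflect the actual Polaris or Cell programming interfaces.

```python
# Sketch of the explicit staging that a programmer-managed local memory requires.
# All names and sizes are hypothetical; this models the pattern only,
# not the real Polaris or Cell programming interface.

LOCAL_STORE_WORDS = 4  # ASSUMPTION: tiny capacity so the chunking is visible


def scale_with_local_store(main_memory: list[float], factor: float) -> list[float]:
    """Scale every element, touching data only through a small local buffer."""
    result = [0.0] * len(main_memory)
    for start in range(0, len(main_memory), LOCAL_STORE_WORDS):
        end = min(start + LOCAL_STORE_WORDS, len(main_memory))
        # 1. Explicit copy-in (on real hardware: a DMA or message transfer).
        local = main_memory[start:end]
        # 2. Compute only on what is resident in the local store.
        local = [x * factor for x in local]
        # 3. Explicit copy-out back to main memory.
        result[start:end] = local
    return result


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(scale_with_local_store(data, 2.0))
```

With a cache, the loop would be a single pass over the data; here the chunk size, the transfer order, and any overlap of transfer with compute all become the programmer's (or the compiler's) problem.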

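Prescott · posted 2007-2-14 17:43

This isn't just "no easier than Cell" to program; it is far harder than Cell. :mad:

So it is only good for research; it is clearly meant to challenge the intelligence of compiler and library developers.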
