POPPUR爱换


IBM and Los Alamos National Laboratory team up to build a petaflop-class supercomputer; a double-precision-enhanced CELL takes shape

1#
Posted 2006-9-6 12:38
IBM has won a bid to build a supercomputer called Roadrunner that will include not just conventional Opteron chips but also the Cell processor used in the Sony Playstation, CNET News.com has learned. The supercomputer, for the Los Alamos National Laboratory, will be the world's fastest machine and is designed to sustain a performance level of a "petaflop," or 1 quadrillion calculations per second, said U.S. Sen. Pete Domenici earlier this year. Bidding for the system opened in May, when a congressional subcommittee allocated $35 million for the first phase of the project, said Domenici, a Republican from New Mexico, where the nuclear weapons lab is located.

Now sources familiar with the machine have said that IBM has won the contract and that the National Nuclear Security Administration is expected to announce the deal in coming days. The system is expected to be built in phases, beginning in September and finishing by 2007 if the government chooses to build the full petaflop system.

There's plenty of competition in the high-end supercomputing race, though. Japan's Institute of Physical and Chemical Research, called RIKEN, announced in June that it had completed its Protein Explorer supercomputer. The Protein Explorer reached the petaflop level, RIKEN said, though not using the conventional Linpack supercomputing speed test.

Representatives of IBM and Los Alamos declined to comment for this story. The NNSA, which oversees U.S. nuclear weapons work at Los Alamos and other sites, didn't immediately respond to a request for comment.

Hybrid supercomputers
The Roadrunner system, along with the Protein Explorer and the seventh-fastest supercomputer, Tokyo Institute of Technology's Tsubame system built by Sun Microsystems, illustrate a new trend in supercomputing: combining general-purpose processors with special-purpose accelerator chips.

"Roadrunner is emphasizing acceleration technologies. Coprocessor acceleration is intrinsic to that particular design," said John Gustafson, chief technology officer of start-up ClearSpeed Technologies, which sells the accelerator add-ons used in the Tsubame system. (Gustafson was referring to the Roadrunner project in general, not to IBM's winning bid, of which he disclaimed knowledge.)

IBM's BladeCenter systems are amenable to the hybrid approach. A single chassis can accommodate both general-purpose Opteron blade servers and Cell-based accelerator systems. The BladeCenter chassis includes high-speed communications links among the servers, and one source said the blades will be used in Roadrunner.

Advanced Micro Devices' Opteron processor is used in supercomputing "cluster" systems that spread computing work across numerous small machines joined with a high-speed network. In the case of Roadrunner, the Cell processor, designed jointly by IBM, Sony and Toshiba, provides the special-purpose accelerator.

Cell originally was designed to improve video game performance in the PlayStation 3 console. The single chip's main processor core is augmented by eight special-purpose processing cores that can help with calculations such as simulating the physics of virtual worlds. Those engines also are amenable to scientific computing tasks, IBM has said.

Using accelerators "expands dramatically" the amount of processing a computer can accomplish for a given amount of electrical power, Gustafson said.

"If we keep pushing traditional microprocessors and using them as high-performance computing engines, they waste a lot of energy. When you get to the petascale regions, you're talking tens of megawatts when using traditional x86 processors" such as Opteron or Intel's Xeon, he said.

"A watt is about a dollar a year if you have the things on all the time," so 10 megawatts per year equates to $10 million in operating expenses, Gustafson said.

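Gustafson's rule of thumb can be turned into a quick estimate. The sketch below assumes only his figure of roughly one dollar per watt per year for a machine that is always on; the function names and the default rate are illustrative and do not come from the article.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_power_cost_usd(power_watts, usd_per_watt_year=1.0):
    """Yearly electricity cost for a machine that runs continuously.

    usd_per_watt_year is the quoted rule of thumb (about $1 per watt-year),
    treated here as an assumption rather than a measured price.
    """
    return power_watts * usd_per_watt_year


def implied_price_per_kwh(usd_per_watt_year=1.0):
    """Electricity price per kWh implied by a given cost per watt-year."""
    kwh_per_watt_year = HOURS_PER_YEAR / 1000  # one watt for a year = 8.76 kWh
    return usd_per_watt_year / kwh_per_watt_year


if __name__ == "__main__":
    ten_megawatts = 10 * 1_000_000  # watts
    print(f"${annual_power_cost_usd(ten_megawatts):,.0f} per year")
    print(f"${implied_price_per_kwh():.3f} per kWh")
```

Under these assumptions a 10-megawatt machine costs about $10 million a year to power, and the dollar-per-watt-year rule corresponds to roughly 11 cents per kilowatt-hour.
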
A new partnership
The Los Alamos-IBM alliance is noteworthy for another reason as well. The Los Alamos lab has traditionally favored supercomputers from manufacturers other than IBM, including Silicon Graphics, Compaq and Linux Networx. Its sister lab and sometimes rival, Lawrence Livermore, has had the Big Blue affinity, housing the current top-ranked supercomputer, Blue Gene/L.

Livermore also houses earlier Big Blue behemoths such as ASC Purple, ASCI White and ASCI Blue Pacific. (ASCI stood for the Accelerated Strategic Computing Initiative, a federal effort to hasten supercomputing development to perform nuclear weapons simulation work, but has since been modified to the Advanced Simulation and Computing program.)

Blue Gene/L has a sustained performance of 280 teraflops, just more than one-fourth of the way to the petaflop goal.
The U.S. government has become an avid supercomputer customer, using the machines for simulations to ensure nuclear weapons will continue to work even as they age beyond their original design lifespans. Such physics simulations have grown increasingly sophisticated, moving from two to three dimensions, but more is better. Los Alamos expects Roadrunner will increase the detail of simulations by a factor of 10, one source said.

For the twice-yearly ranking of supercomputers called the Top500 list, computers are ranked on the basis of a benchmark called Linpack that measures how many floating-point operations per second--"flops"--a machine can perform. Linpack is a convenient but incomplete representation of a machine's total ability, but it's nevertheless widely watched.

IBM has dominated the Top500 list with its Blue Gene/L supercomputing designs. But U.S. models haven't always led, and there's been some international rivalry: A Japanese system, NEC's Earth Simulator, topped the list for years.

IBM and petaflop computing are no strangers. Although customers can buy the current Blue Gene/L systems or rent their processing power from IBM, Blue Gene actually began as a research project in 2000 to reach the petaflop supercomputing level.
40#
Original poster | Posted 2006-9-29 19:54
With the next generation of GPUs we will see support for fixed-point and double-precision arithmetic.

potomac (this user has been deleted)
39#
Posted 2006-9-29 19:10
Notice: the author has been banned or deleted; the content is hidden automatically.

38#
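Posted 2006-9-29 18:37
Is there any chance that graphics chips will be used for floating-point computation in the future? Isn't the X1900's floating-point performance unbeatable too? Is there no software that can take advantage of it?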

37#
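Posted 2006-9-29 12:58
Originally posted by Prescott on 2006-9-6 15:59


Since Cell is the main source of FLOPS, perhaps a single-Opteron blade would be enough; there is no need for dual Opterons, or for dual-core Opterons. That way one Opteron could drive 3 Cells, instead of two Opterons driving 4. A blade with 3 Cells plus one Opteron is complicated enough already. ...

InfiniBand
is no longer cutting-edge..

I'd guess there will be something else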

36#
Posted 2006-9-29 12:42
Why does everyone here seem to think they're smarter than IBM?

35#
Posted 2006-9-7 17:56
It says the numbers of Opterons and Cells are about the same

On average that works out to one Opteron driving one Cell

34#
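Original poster | Posted 2006-9-7 17:29
Originally posted by potomac on 2006-9-7 17:25

Surely you can't leave it out. This is a count of the total, and besides, the VMX unit on the PPE is no slouch.


It is 2-issue in-order, it also has to manage DMA initialization, and you have to write separate programs for it with a different compiler. It just isn't worth it.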

potomac (this user has been deleted)
33#
Posted 2006-9-7 17:25
Notice: the author has been banned or deleted; the content is hidden automatically.

32#
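Original poster | Posted 2006-9-7 17:21
Originally posted by potomac on 2006-9-7 16:21
What you calculated is the floating-point throughput of the 8 SPEs; the PPE's own contribution was not counted.


The PPE is generally ignored.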

potomac (this user has been deleted)
31#
Posted 2006-9-7 16:21
Notice: the author has been banned or deleted; the content is hidden automatically.

30#
Posted 2006-9-7 15:30
:wacko: 16,000 is way over the top

Supercomputers are built by throwing money at them

29#
Original poster | Posted 2006-9-7 15:13
Well, now there is more accurate information.

WASHINGTON, DC - 06 Sep 2006: The U.S. Department of Energy's National Nuclear Security Administration (NNSA) has selected IBM to design and build the world's first supercomputer to harness the immense power of the Cell Broadband Engine™ (Cell B.E.) processor aiming to produce a machine capable of a sustained speed of up to 1,000 trillion calculations per second, or one petaflop.


The 'hybrid' supercomputer, codenamed Roadrunner, will be installed at DOE's Los Alamos National Laboratory. In a first-of-a-kind design, Cell B.E. chips -- originally designed for video game platforms -- will work in conjunction with systems based on x86 processors from Advanced Micro Devices, Inc. (AMD).


Designed specifically to handle a broad spectrum of scientific and commercial applications, the supercomputer design will include new, highly sophisticated software to orchestrate over 16,000 AMD Opteron™ processor cores and over 16,000 Cell B.E. processors in tackling some of the most challenging problems in computing today. The revolutionary supercomputer will be capable of a peak performance of over 1.6 petaflops (or 1.6 thousand trillion calculations per second).


The machine is to be built entirely from commercially available hardware and based on the Linux® operating system. IBM® System x™ 3755 servers based on AMD Opteron technology will be deployed in conjunction with IBM BladeCenter® H systems with Cell B.E. technology. Each system used is designed specifically for high performance implementations.
Designed also with space and power consumption issues in mind, the system will employ advanced cooling and power management technologies and will occupy only 12,000 square feet of floor space, or approximately the size of three basketball courts.


New Era of Industry Supercomputing
Roadrunner's construction will involve the creation of advanced "Hybrid Programming" software which will orchestrate the Cell B.E.-based system and AMD system and will inaugurate a new era of heterogeneous technology designs in supercomputing. These innovations, created collaboratively among IBM and LANL engineers will allow IBM to deploy mixed-technology systems to companies of all sizes, spanning industries such as life sciences, financial services, automotive and aerospace design.


How it Works
Roadrunner's hybrid design will allow the system to segment complex mathematical equations, routing each segment to the part of the system that can most efficiently handle it. Typical compute processes, file IO, and communication activity will be handled by AMD Opteron processors while more complex and repetitive elements -- ones that traditionally consume the majority of supercomputer resources -- will be directed to the more than 16,000 Cell B.E. processors. Designed originally for gaming platforms, where intense graphics and real-time responsiveness are key, the Cell B.E. processor is ideal to speed Roadrunner through intense mathematical problems.


"This new supercomputer demonstrates a commitment to achieve a major advance in technological capability that will help enable scientists and businesses solve the most challenging problems," said Bill Zeitler, senior vice president, IBM Systems and Technology Group. "Los Alamos is a valued partner as we embark on this exciting journey."


"This installation with Los Alamos and IBM demonstrates the compelling benefits from industry leaders innovating around an open platform; in this case IBM and AMD collaborating in the use of AMD Opteron and the Cell B.E. processor to build powerful systems for highly specific Los Alamos Labs workloads," said Marty Seyer, senior vice president, Commercial Segment, AMD. "This is an excellent demonstration of Torrenza in action -- building on the performance and performance-per-watt advantages AMD delivers to create incredible value in leveraging HyperTransport technology to redefine how different systems, based on different processor platforms, can communicate with each other to solve some of the most complex computing problems."


IBM will begin shipping the new supercomputer to the DOE facility at the Los Alamos National Laboratory later this year, with completion of the installation and acceptance anticipated in 2008.


Based on the Power Architecture™, the Cell B.E. processor was developed in collaboration with IBM, Sony Corporation, Sony Computer Entertainment Inc. (Sony and Sony Computer Entertainment collectively referred to as Sony Group), and Toshiba Corporation.

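From this second, official press release we can pick out the following figures:
1. The machine contains more than 16,000 Opterons and more than 16,000 CELLs.
2. The theoretical peak is 1.6 PFLOPS.
3. Installation will be completed in 2008.

Some facts we already know:
Assume the K8 is Rev. G, i.e. its FP unit stays as it is today. Double precision: 2.8 GHz * 2 FLOPS/cycle * 2 cores = 11.2 GFLOPS. The double-precision throughput of 16,000 2.8 GHz Opterons is 179,200 GFLOPS = 179.2 TFLOPS.

Assume CELL is still the DD3.1 revision from November 2005. Single precision: 2.8 GHz * 8 FLOPS/cycle * 8 cores = 179.2 GFLOPS, so the single-precision throughput of 16,000 2.8 GHz DD3.1 CELLs is 2,867,200 GFLOPS = 2,867.2 TFLOPS = 2.87 PFLOPS. A DD3.1 CELL's double-precision performance is only 1/10 of its single-precision performance, so 16,000 DD3.1 CELLs deliver about 0.287 PFLOPS in double precision.

Combining today's dual-core K8 and today's CELL, the total is only about 0.47 PFLOPS.

Clearly, 16,000 of today's dual-core K8s plus 16,000 of today's CELLs cannot reach the planned 1.6 PFLOPS target.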
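The arithmetic for the current-generation parts can be checked with a short script. This is only a sketch under the post's own assumptions (2.8 GHz clocks, 16,000 dual-core Opterons, 16,000 Cells with 8 SPEs each, and double precision at 1/10 of single precision on DD3.1); the function and variable names are made up for illustration.

```python
CLOCK_GHZ = 2.8      # clock assumed in the post for both chips
CHIP_COUNT = 16_000  # assumed number of chips of each type


def chip_peak_gflops(clock_ghz, flops_per_cycle, cores):
    """Peak GFLOPS of one chip: clock * FLOPs per cycle per core * cores."""
    return clock_ghz * flops_per_cycle * cores


def to_pflops(gflops):
    """Convert GFLOPS to PFLOPS."""
    return gflops / 1e6


# Dual-core K8: 2 double-precision FLOPs per cycle per core.
opteron_dp_gflops = chip_peak_gflops(CLOCK_GHZ, 2, 2)
# Cell: 8 SPEs, 8 single-precision FLOPs per cycle each (PPE ignored).
cell_sp_gflops = chip_peak_gflops(CLOCK_GHZ, 8, 8)
# DD3.1 Cell: double precision assumed to be 1/10 of single precision.
cell_dp_gflops = cell_sp_gflops / 10

opteron_total = to_pflops(CHIP_COUNT * opteron_dp_gflops)
cell_dp_total = to_pflops(CHIP_COUNT * cell_dp_gflops)
combined = opteron_total + cell_dp_total

print(f"Opteron DP total: {opteron_total:.3f} PFLOPS")  # 0.179
print(f"Cell DP total:    {cell_dp_total:.3f} PFLOPS")  # 0.287
print(f"Combined:         {combined:.3f} PFLOPS")       # 0.466
```

With these inputs the combined double-precision peak comes to roughly 0.47 PFLOPS, well short of the announced 1.6 PFLOPS.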



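According to what AMD and IBM have disclosed in official documents so far, the CELL actually used in this machine is very likely CELL DP, i.e. a double-precision-enhanced CELL whose double-precision performance reaches 1/2 of its single-precision performance (the same single/double-precision ratio as most current general-purpose processors). In that case:

65nm Rev. G K8 dual core: 2.8 GHz * 2 FLOPS/cycle * 2 cores = 11.2 GFLOPS; 11.2 GFLOPS * 16,000 = 179,200 GFLOPS = 179.2 TFLOPS

CELL DP: 2.8 GHz * 8 FLOPS/cycle * 8 cores = 179.2 GFLOPS; the single-precision performance of 16,000 CELL DPs = 2,867,200 GFLOPS = 2,867.2 TFLOPS = 2.87 PFLOPS, and the double-precision performance is 1.4336 PFLOPS.

So 16,000 CELL DPs + 16,000 65nm Rev. G dual-core K8s reach 1.4336 + 0.1792 = 1.6128 PFLOPS.

If the Opterons are dual-core parts of the K8L class, the Opteron contribution doubles to 0.3584 PFLOPS, and the 1.6128 PFLOPS above becomes 1.792 PFLOPS.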
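The same estimate can be parameterized to compare scenarios and to ask what double-precision ratio the Cell would need in order to hit 1.6 PFLOPS. This is a rough sketch using the post's assumptions (2.8 GHz, 16,000 chips of each kind, dual-core Opterons, 8 SPEs per Cell); the function names are hypothetical, and the 4 FLOPS/cycle figure for K8L is simply the doubling implied by the 1.792 PFLOPS total.

```python
CLOCK_GHZ = 2.8
CHIP_COUNT = 16_000
CELL_SP_GFLOPS = CLOCK_GHZ * 8 * 8  # 8 SPEs * 8 SP FLOPs per cycle


def opteron_dp_gflops(flops_per_cycle=2, cores=2):
    """Peak double-precision GFLOPS of one Opteron chip."""
    return CLOCK_GHZ * flops_per_cycle * cores


def system_peak_pflops(cell_dp_ratio, opteron_flops_per_cycle=2):
    """Combined double-precision peak of all Opterons and Cells, in PFLOPS."""
    per_pair = opteron_dp_gflops(opteron_flops_per_cycle) + CELL_SP_GFLOPS * cell_dp_ratio
    return CHIP_COUNT * per_pair / 1e6


def required_cell_dp_ratio(target_pflops, opteron_flops_per_cycle=2):
    """DP/SP ratio the Cell would need for the system to reach target_pflops."""
    per_pair_target = target_pflops * 1e6 / CHIP_COUNT
    return (per_pair_target - opteron_dp_gflops(opteron_flops_per_cycle)) / CELL_SP_GFLOPS


print(f"{system_peak_pflops(0.1):.4f}")      # DD3.1 Cell + Rev. G K8
print(f"{system_peak_pflops(0.5):.4f}")      # CELL DP + Rev. G K8: 1.6128
print(f"{system_peak_pflops(0.5, 4):.4f}")   # CELL DP + K8L-class: 1.7920
print(f"{required_cell_dp_ratio(1.6):.3f}")  # ratio needed for 1.6 PFLOPS
```

Under these assumptions the required ratio comes out just under 0.5, which is consistent with the guess that the machine needs a Cell whose double precision runs at about half its single-precision rate.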

28#
Posted 2006-9-7 12:33
Originally posted by BOSS on 2006-9-7 10:06
pci has produced a world-class supercomputer designer



:funk::funk::funk::funk: More like galaxy-class

27#
Posted 2006-9-7 10:06
pci has produced a world-class supercomputer designer

RacingPHT (this user has been deleted)
26#
Posted 2006-9-7 09:47
Notice: the author has been banned or deleted; the content is hidden automatically.

25#
Posted 2006-9-6 22:30
With 35M USD, they will probably use InfiniBand, right?

My guess is something like 512 nodes; the legendary IBM wouldn't be that lame

24#
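Posted 2006-9-6 19:03
Originally posted by roadrunner on 2006-9-6 17:22
If the only goal is to scrape together 1P, just go pure CELL. Why drag Opteron into it?


Order 500 PS3s, hook them up to phone lines, OK, done!

Spend the rest of the money on a vacation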

23#
Posted 2006-9-6 17:22
If the only goal is to scrape together 1P, just go pure CELL. Why drag Opteron into it?

22#
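Posted 2006-9-6 17:17
Purely for laughs:

The simplest way to scrape together 1P of single precision (the key is making a new CELL PCB card)

Take a 4-Opteron board, pull out 3 of the Opterons, and leave just one CPU

Make a special PCB card (with a Cell, an HTT bridge and so on soldered onto it) and plug it into each vacated CPU socket

--> This instantly yields a node with one Opteron & 3 Cells

N nodes, connected through NICs and Ethernet, form a cluster.



And so an XXX T supercomputer is born; in China that could ... tens of millions in state funding -- the pitch: very high raw peak, power-efficient, cheap and low-cost, an epoch-making leap.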
