Intel Larrabee Architecture Discussion Thread

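Edison · posted on 2007-10-5 14:28

This thread is dedicated to discussing Larrabee, so please keep the discussion technical.

What is currently confirmed about Larrabee:

1. Intel's Doug Carmean is Larrabee's chief architect.

2. So far the only occasion on which Larrabee details have been presented is the "Intel Larrabee" lecture given by Doug Carmean in Stanford University's CS448 course, but the slides are not available for public download because of an NDA.

3. Two slide decks related to Larrabee are currently available: "Tera Tera Tera" by Ed Davis, chief architect at TerPro-Aject, and "HP & PetaFLOPS" by Richard Kaufmann of HP's high-performance computing division.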

4. According to "Tera Tera Tera", the Larrabee architecture looks like this:

Clock frequency:                          1.7~2.5 GHz
Cores:                                    16~24 in-order x86 ISA cores, 4 threads/core
Double-precision ops per core per clock:  Non-SSE: 2
                                          w/SSE: 8-16
Integer ops per core per clock:           ???
Per-core cache size, latency:             L1 32 KB, 1 clock
                                          L2 256 KB, 10 clocks
                                          L3 none
                                          64-byte cache line
Inter-core interconnect:                  256 bytes/cycle ring
Ring latency:                             ???
Memory:                                   1~2 GB, 128 GB/s GDDR/FastDRAM
Device bus bandwidth:                     QPI, 17 GB/s/link, 50 ns latency
Peak:                                     14~40 GF/core, 0.2-1.0 TF/processor
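
The peak figures in the last row appear to follow from the other rows. Below is a minimal sketch of that arithmetic, assuming peak = clock × double-precision ops per clock with SSE, multiplied by the core count for the whole processor. The function names are illustrative and do not come from the slides.

```python
def peak_gflops_per_core(clock_ghz, dp_ops_per_clock):
    """Peak double-precision GFLOPS of one core."""
    return clock_ghz * dp_ops_per_clock


def peak_tflops_per_processor(cores, clock_ghz, dp_ops_per_clock):
    """Peak double-precision TFLOPS of the whole processor."""
    return cores * peak_gflops_per_core(clock_ghz, dp_ops_per_clock) / 1000.0


# Low end of the table: 16 cores, 1.7 GHz, 8 DP ops/clock with SSE.
low_core = peak_gflops_per_core(1.7, 8)
low_chip = peak_tflops_per_processor(16, 1.7, 8)

# High end of the table: 24 cores, 2.5 GHz, 16 DP ops/clock with SSE.
high_core = peak_gflops_per_core(2.5, 16)
high_chip = peak_tflops_per_processor(24, 2.5, 16)

print(f"per core:      {low_core:.1f} to {high_core:.1f} GF")
print(f"per processor: {low_chip:.2f} to {high_chip:.2f} TF")
```

Under these assumptions the result is about 13.6 to 40 GF per core and 0.22 to 0.96 TF per processor, which is consistent with the quoted 14~40 GF/core and 0.2-1.0 TF/processor.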

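5. According to "HP & PetaFLOPS", the core details are essentially the same as in "Tera Tera Tera", but the core count and memory bandwidth are considerably higher:

Clock frequency:                          4 GHz
Cores:                                    32 in-order x86 ISA cores, 4 threads/core
Double-precision ops per core per clock:  Non-SSE: 2
                                          w/SSE: 8-16
Integer ops per core per clock:           ???
Per-core cache size, latency:             L1 32 KB, 1 clock
                                          L2 256 KB, 10 clocks
                                          L3 none
                                          64-byte cache line
Inter-core interconnect:                  256 bytes/cycle ring
Ring latency:                             ???
Memory:                                   1~2 GB, 192 GB/s GDDR/FastDRAM
Device bus bandwidth:                     QPI, 17 GB/s/link, 50 ns latency
Peak:                                     ~2 TF/processor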
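
As a rough cross-check of this table, the sketch below recomputes the ~2 TF peak and then relates it to the listed memory bandwidth. It assumes the upper SSE figure of 16 double-precision ops per clock. The "bytes per flop" ratio is an illustrative metric added here, not something the slides state.

```python
def bytes_per_flop(mem_bandwidth_gb_s, peak_tflops):
    """Memory bandwidth available per peak double-precision operation."""
    return mem_bandwidth_gb_s / (peak_tflops * 1000.0)


# "HP & PetaFLOPS": 32 cores x 4 GHz x 16 DP ops/clock (assumed upper SSE figure).
hp_peak_tflops = 32 * 4.0 * 16 / 1000.0
hp_balance = bytes_per_flop(192, hp_peak_tflops)

# "Tera Tera Tera" high end for comparison: 1.0 TF peak, 128 GB/s.
tera_balance = bytes_per_flop(128, 1.0)

print(f"HP peak:      {hp_peak_tflops:.3f} TF")
print(f"HP balance:   {hp_balance:.5f} bytes/flop")
print(f"Tera balance: {tera_balance:.3f} bytes/flop")
```

With these inputs the peak comes out at 2.048 TF, in line with the quoted ~2 TF. The bandwidth per peak operation is somewhat lower than in the "Tera Tera Tera" high-end configuration, because compute roughly doubles while bandwidth grows by half.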

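6. A demonstration is expected in 2008, with a possible launch in 2009 or 2010.

7. The main target markets are high-end graphics and high-performance computing (HPC); suitable workloads include JPEG textures, physics acceleration, anti-aliasing, enhanced AI, ray tracing, and so on.

8. For non-game graphics development, an API that Intel calls Ct may be used. [Update: it is now fairly certain that Larrabee's SDK is tentatively named "Native SDK".]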

~~~~~~
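You are welcome to discuss the following topics:
1. Whether running the x86 ISA on an in-order pipeline has an inherent weakness that severely limits performance
2. The architectural characteristics and differences of Larrabee, Cell, G8X and R600, and the application and performance differences that may follow from them
3. Your view of Larrabee's prospects
4. Which aspects of Larrabee you would like to see moderately improved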

~~~~~~
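When taking part in the discussion, please note:
1. Do not copy news from other sites wholesale. If you want people to look at the content, provide only the link and a few key paragraphs; verbatim copies will be deleted.
2. Do not re-quote content that repeats or overlaps with information already given above or by other members.
3. Please observe netiquette.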

Update:
On June 3, 2008, a session titled "Larrabee: A Many-Core x86 Architecture for Visual Computing" appeared in the Siggraph 2008 program. The listed participants are:
Larry Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth (Intel Corporation), Michael Abrash (RAD Game Tools), Pradeep Dubey, Stephen Junkins, Adam Lake, Jeremy Sugerman, Robert Cavin, Roger Espasa, Ed Grochowski, Toni Juan (Intel Corporation), Pat Hanrahan (Stanford University)

Abstract:
This paper introduces the Larrabee many-core visual computing architecture (a new software rendering pipeline implementation), a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as fixed-function co-processors. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads and greatly increases the flexibility and programmability of the architecture as compared to standard GPUs.

The session takes place on August 12.

Paper released:
http://softwarecommunity.intel.c ... rrabee_manycore.pdf

At GDC 09 in 2009, Intel will disclose details of the LRBni ISA:

https://www.cmpevents.com/GD09/a ... =11&SessID=9139

SIMD Programming with Larrabee: A Second Look at the Larrabee New Instructions (LRBni) in Action
Speaker: Tom Forsyth (Programmer, Intel)
Date/Time: TBD
Track: Programming
Format: 60-minute Lecture
Experience Level: All

Session Description
Larrabee is Intel's revolutionary approach to take the current evolving programmability of the GPGPU to its logical end. The Larrabee architecture features many cores and threads, as well as a new vector instruction-set extension, the Larrabee new instructions (LRBni).

This talk follows Michael Abrash's first glimpse into LRBni and examines the programming methods and hardware instructions that help programmers get the most out of LRBni's extremely wide vector units. Starting with simple math examples that are fairly simple to vectorize, it moves through loops, conditionals, and more complex flow control, showing how to implement these algorithms in LRBni.

Next, the numerous choices of data format are examined - when to use SOA or AOS (and what those terms mean!), and how to use gather/scatter most efficiently from the same data structures used in an existing engine.

Finally, there is a quick look at efficient code scheduling and how to use the multiple hardware threads to help absorb instruction latencies.

Takeaway
The attendees will learn about the latest processor architecture from Intel, and the instruction set used to program it. Understanding how this architecture and instruction set works will give the attendee information on how to design the next iteration of their game engine, and the possibilities available when programming Larrabee natively.

Intended Audience and Prerequisites
Programmers will get the most from this talk, although it will be of interest to anyone interested in the nature of Larrabee, and the reasons why processor architecture is evolving in Larrabee's direction.
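
For readers unfamiliar with the SOA/AOS terms in the session description above, here is a minimal sketch of the two layouts. It uses plain Python containers rather than LRBni code, and the vertex data and the helper name `aos_to_soa` are made up for illustration.

```python
from array import array

# Array of structures (AOS): one record per vertex, fields interleaved.
aos = [(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0), (9.0, 10.0, 11.0)]


def aos_to_soa(vertices):
    """Split (x, y, z) records into one contiguous array per field (SOA)."""
    xs = array("f", (v[0] for v in vertices))
    ys = array("f", (v[1] for v in vertices))
    zs = array("f", (v[2] for v in vertices))
    return xs, ys, zs


xs, ys, zs = aos_to_soa(aos)

# In SOA form, "all x values" is one contiguous run that a wide vector
# load could fetch in one go; in AOS form the same values are strided
# through memory and would need a gather.
print(list(xs))  # [0.0, 3.0, 6.0, 9.0]
```

The trade-off the talk alludes to is that existing engines often store AOS data, so either the data is converted once, as here, or gather/scatter is used to read it in place.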

Rasterization on Larrabee: A First Look at the Larrabee New Instructions (LRBni) in Action
Speaker: Michael Abrash (Programmer, Rad Game Tools)
Date/Time: TBD
Track: Programming
Format: 60-minute Lecture
Experience Level: All

Session Description
Larrabee is Intel's revolutionary approach to take the current evolving programmability of the GPGPU to its logical end. The Larrabee architecture features many cores and threads, as well as a new vector instruction-set extension, the Larrabee new instructions (LRBni).

This talk will provide an overview of LRBni and discuss the major instruction features - 16-wide SIMD, multiply-add, ternary instructions, predication, built-in data-format conversion, and gather/scatter.

The talk will then take a close look at a specific - and not obviously vectorizable - application of LRBni - rasterization. This is a crucial stage in the Larrabee rendering pipeline, and it demonstrates how developers can use the flexibility of the new instruction set to solve problems that are not obviously shader-like.

Takeaway
The attendees will learn about the latest processor architecture from Intel, and the instruction set used to program it. Understanding how this architecture and instruction set works will give the attendee information on how to design the next iteration of their game engine, and the possibilities available when programming Larrabee natively.

Intended Audience and Prerequisites
Programmers will get the most from this talk, although it will be of interest to anyone interested in the nature of Larrabee, and the reasons why processor architecture is evolving in Larrabee's direction.
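
To make the terms "16-wide SIMD", "multiply-add" and "predication" from the session description more concrete, the sketch below emulates the general idea in plain Python. It is not LRBni syntax and makes no claim about the real instruction semantics; the function name and data are hypothetical.

```python
LANES = 16  # vector width named in the session description


def masked_multiply_add(a, b, c, mask):
    """Per lane: a*b + c where the mask is set; otherwise keep c unchanged."""
    return [x * y + z if m else z for x, y, z, m in zip(a, b, c, mask)]


a = [float(i) for i in range(LANES)]
b = [2.0] * LANES
c = [1.0] * LANES

# The scalar loop "if a[i] >= 8: c[i] = a[i] * b[i] + c[i]" becomes one
# compare that produces a per-lane mask, followed by one masked operation.
mask = [x >= 8.0 for x in a]
result = masked_multiply_add(a, b, c, mask)

print(result)
```

Lanes 0 to 7 keep the value 1.0, and lanes 8 to 15 receive 2*i + 1. All 16 lanes follow the same instruction stream, and the branch is expressed as data (the mask) rather than as control flow.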

三毛妮 · posted on 2007-10-5 14:33

What is the impact of having no L3 cache?

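Edison · posted on 2007-10-5 14:58

Caches were introduced mainly because of the principle of locality: temporal locality, meaning "recently used data is likely to be used again", and spatial locality, meaning "other bytes in a recently used cache line are likely to be used soon".

For Larrabee's workloads, temporal locality does not help much, but spatial locality is still fairly high, so designers tend toward small, simple caches. The farther a level of the memory hierarchy sits from the core, the larger it has to be to make sense. Larrabee's L2 cache already totals 256 KB × 32 = 8 MB, so an L3 would presumably have to be 32 MB or more. A cache that large is not only very expensive, it could also make cache coherence dramatically more complex to implement. Besides, Larrabee's GDDR memory bandwidth is 192 GB/s, which to some extent is enough to play the role of an L3.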
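
The locality argument above can be illustrated with a toy cache model. This is a minimal sketch assuming a direct-mapped cache with the 64-byte line and 256 KB per-core L2 from the spec tables; the real associativity and replacement policy are not known from the slides.

```python
LINE_BYTES = 64          # cache-line size from the spec tables
L2_BYTES = 256 * 1024    # per-core L2 size from the spec tables


def hit_rate(addresses, cache_bytes=L2_BYTES, line_bytes=LINE_BYTES):
    """Hit rate of a direct-mapped cache over a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # which memory line each slot holds
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // line_bytes      # memory line containing this byte
        slot = line % num_lines        # direct-mapped placement (assumption)
        if tags[slot] == line:
            hits += 1
        else:
            tags[slot] = line          # miss: fetch the whole 64-byte line
        total += 1
    return hits / total


# Stream 1 MiB of 4-byte values exactly once: no value is ever reused
# (no temporal locality), but neighbouring values share a cache line.
streaming = range(0, 1024 * 1024, 4)
# Touch only one 4-byte value per cache line: no spatial locality either.
strided = range(0, 1024 * 1024, LINE_BYTES)

print(f"streaming hit rate: {hit_rate(streaming):.4f}")
print(f"strided hit rate:   {hit_rate(strided):.4f}")

# Aggregate L2 across 32 cores, as computed in the post.
total_l2_mb = 32 * L2_BYTES / (1024 * 1024)
print(f"total L2: {total_l2_mb:.0f} MB")
```

In this model the streaming pass hits on 15 of every 16 accesses purely through spatial locality, while the strided pass never hits. The working set is four times the cache size, yet a larger cache would not change either number, since no data is reused.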

lemonninja · posted on 2007-10-5 15:43

Actually, I think this thing is just Intel's version of Cell.

Its uses are very similar to Cell's.

justkick · posted on 2007-10-5 15:59

[Notice: the author has been banned or the content deleted; this post is hidden automatically.]

Edison · posted on 2007-10-5 16:51

DirectX has always included a reference renderer, which can translate shaders into x86 instructions.

oakville · posted on 2007-10-5 17:55

I don't really understand this :(
I just want to know what it will cost.

晶晶守护神 · posted on 2007-10-5 18:02

[Notice: the author has been banned or the content deleted; this post is hidden automatically.]

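aeondxf · posted on 2007-10-5 18:35

Why doesn't Larrabee use IA64? With a comparable transistor count per core it should deliver somewhat higher performance, and the clock would not need to be that high. EPIC+OPENML should be an excellent combination...
I remember that at IDF in 2004 or 2005 Intel showed a concept diagram: a chip with a few large cores in the middle, surrounded by a dozen or more small cores, with cache on the outside. Is Larrabee descended from that?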

xiaolongzi · posted on 2007-10-5 21:11

I think Larrabee's architecture is very flexible. It has the CPU heritage and also GPU functionality. I believe Larrabee can succeed.

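shu0202 · posted on 2007-10-5 21:20

Personally I think a fast L1 is sufficient. For graphics rendering, either build a really huge cache or don't bother; I don't see what the L2 is needed for.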

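shu0202 · posted on 2007-10-5 21:26

I wonder whether Intel has ambitions to bypass D3D and OpenGL. A directly and fully programmable design would probably waste a lot of compute capability at this point, and the efficiency of mapping D3D onto x86 is likely to be a big problem; if that is poor, no amount of raw compute will help. Unless Intel intends to handle both hardware and software itself and start from scratch.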

shu0202 · posted on 2007-10-5 21:31

Also, this compute engine alone can't handle the whole process from rendering to final output, can it?

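slice · posted on 2007-10-5 21:51

:loveliness:
I'm a layman, so please don't laugh if I get this wrong.
Since its greater generality should make good software support easy to come by, wouldn't final rendering in 3DMAX, maya and the like also benefit a great deal? At the moment that still relies on the CPU, and professional graphics cards don't help at all.
Perhaps Larrabee isn't aimed at what we think of as the graphics card or gaming card market at all.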

mm.pcinlife.com · posted on 2007-10-5 22:00

:funk: I misread "Tera Tera Tera" as "Tora Tora Tora"

Prescott · posted on 2007-10-5 22:02

This is a revolutionary thing. As for who will be overthrown by the revolution, who knows.

cxj3000 · posted on 2007-10-5 22:13

I think an even bigger ambition of Intel's is hidden behind this: an attempt to unify computing on every platform under x86!

complexmind · posted on 2007-10-5 22:54

Originally posted by slice on 2007-10-5 21:51
:loveliness:
I'm a layman, so please don't laugh if I get this wrong.
Since its greater generality should make good software support easy to come by, wouldn't final rendering in 3DMAX, maya and the like also benefit a great deal? At the moment that still relies on the CPU, and professional graphics cards don't help at all.
Larrabee ...

Agreed. Cell actually isn't much use for games.........

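Edison · posted on 2007-10-6 01:15

Originally posted by slice on 2007-10-5 21:51
:loveliness:
I'm a layman, so please don't laugh if I get this wrong.
Since its greater generality should make good software support easy to come by, wouldn't final rendering in 3DMAX, maya and the like also benefit a great deal? At the moment that still relies on the CPU, and professional graphics cards don't help at all.
Larrabee ...

This kind of software should be able to benefit from Larrabee right away.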

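nanshan · posted on 2007-10-6 01:53

As I see it, a few main issues will determine this architecture's performance:

1. Process scheduling across many cores and software support, including interfaces
2. How the cache is used across many cores: shared or private, and with what scheduling algorithm
3. Whether Intel intends to gradually replace the existing architecture with a RISC architecture; if Intel always has to keep supporting the existing architecture, it comes down to how it assesses the professional and consumer markets
4. Price, that is, the price/performance ratio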
