INQ: Alpha EV7 bus reborn as CSI; Nehalem to beat POWER6 in floating-point performance

Prescott · posted 2007-2-28 12:45

INQ took another swipe at Itanium while they were at it. :mad:

By this time next year we'll know whether it's true :p

http://www.theinquirer.net/default.aspx?article=37891

CSI is Itanic's final final throw of the dice

Analysis: To be or to be iceberged

THESE DAYS, we don't hear much about the Itanic - after all, do we need to? Intel is confident of a decent performance position in the current 1S/2S market and across the 1S to 4S server space (some 95 per cent of the market, after all) once the Tigerton CPU with the quad-FSB Caneland chipset comes out. The Core 2 family has pretty good integer and FP performance, the FSB still has a bit of speed potential (an official 1600MHz is not impossible, to my mind), and AMD isn't exactly executing perfectly at the moment.
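To put the FSB remark in numbers, here is a minimal sketch assuming the usual 64-bit (8-byte) FSB data path and reading the quoted "MHz" ratings as effective transfers per second. The function names are illustrative, and the per-core split simply assumes all cores sit behind one shared bus, which is the scalability problem the article comes back to at the end.

```python
# Theoretical peak bandwidth of a shared front-side bus (FSB).
# Assumption: 64-bit (8-byte) data path; the "MHz" rating is read as
# effective mega-transfers per second.

BUS_WIDTH_BYTES = 8  # assumed 64-bit FSB data bus


def fsb_peak_gbytes_per_s(mega_transfers_per_s: float) -> float:
    """Peak FSB bandwidth in GB/s (10**9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * BUS_WIDTH_BYTES / 1e9


def per_core_share(total_gbytes_per_s: float, cores: int) -> float:
    """Bandwidth left per core when all cores sit behind one shared bus."""
    return total_gbytes_per_s / cores


if __name__ == "__main__":
    for rate in (1066, 1333, 1600):
        total = fsb_peak_gbytes_per_s(rate)
        share = per_core_share(total, 4)
        print(f"FSB {rate}: {total:.1f} GB/s peak, {share:.1f} GB/s per core (4 cores)")
```

Even at 1600, a quad-core chip gets a theoretical 3.2GB/s per core before any bus contention, which is why a point-to-point interconnect matters more than another FSB speed bump.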

The CSI - common system interconnect (or call it coherent scalable interconnect, if you wish) - was originally associated with the Tukwila Itanium generation, which, in some now very old roadmaps, would have been on the market right now. It would have been a nice example of good, year-1999 Alpha EV7 interface technology finally coming out on an Intel platform several years after the brutal Alphacide. But, oh well. EV7's simpler version, HyperTransport, has already been doing exactly that for the past three years anyway.

However, Tukwila will, at best, appear sometime (late?) next year and, in the same period, the "Nehalem" 2008 generation of Intel's X86 chips will also have CSI inside.

We can assume an even faster core (better integer, improved FP to compete against AMD Barcelona, a new implementation of hyper-threading), or to be precise four of them on a die. These will be combined with even larger caches, a multichannel DDR3 on-chip memory controller and, yes, the same very fast 6.4Gbps-per-pin CSI interconnect, at least on the high-end parts. Most probably there will be four CSI links per chip in the now-familiar EV7 North-South-East-West configuration. (By the way, the 1999 EV7 also had a fifth, non-cache-coherent channel just for I/O, to keep I/O traffic from affecting the main SMP scalability - that's how advanced the murdered platform was.)
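A back-of-the-envelope sketch of what the quoted 6.4Gbps figure means per link. The article gives no link width, so DATA_LANES = 16 below is purely an assumption for illustration; the quoted rate is treated as the payload rate of each lane, and protocol and encoding overhead are ignored.

```python
# Per-link bandwidth of a serial point-to-point link such as CSI.
# PIN_RATE_GBPS comes from the article; DATA_LANES is a hypothetical width.
# Protocol and encoding overhead are ignored.

PIN_RATE_GBPS = 6.4  # per-pin rate quoted for high-end parts
DATA_LANES = 16      # assumption: payload lanes per direction


def link_gbytes_per_s(rate_gbps: float = PIN_RATE_GBPS, lanes: int = DATA_LANES) -> float:
    """One link, one direction, in GB/s (8 bits per byte)."""
    return rate_gbps * lanes / 8


def chip_gbytes_per_s(links: int, rate_gbps: float = PIN_RATE_GBPS,
                      lanes: int = DATA_LANES) -> float:
    """All links of one chip, one direction."""
    return links * link_gbytes_per_s(rate_gbps, lanes)


if __name__ == "__main__":
    print(f"one link: {link_gbytes_per_s():.1f} GB/s per direction")
    print(f"four links (N/S/E/W): {chip_gbytes_per_s(4):.1f} GB/s per direction")
```

Unlike a shared FSB, each link is a dedicated path, so adding links (four in the EV7-style layout) adds bandwidth instead of splitting it.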

This, of course, is way better than the current Opterons, with only up to three HT channels shared between SMP and I/O traffic, but about the same as Barcelona, where the highest-end parts are supposed to have four HT3 channels as well.

With AMD implementing the sped-up HT3 in its upcoming 2007/2008 CPUs, Intel can't afford to cripple the external interconnect of its X86 CPUs just to keep Itanium looking better. Therefore, Tukwila and its supposed successor, Poulson, might only have their cores and caches to differentiate them from the in-house X86 competition.

Now, look at the current SPECint and SPECfp rates of Core 2 Quad 65nm processors, and assume, conservatively, that they will only improve by 30 per cent per core in the initial Bloomfield Nehalems (although it will probably be quite a bit more than that, especially for SPECfp, to address the AMD competition), and that Tukwila gets roughly the same 30 per cent per-core improvement over the 1.6GHz Montecito. That is not a pretty picture for the good ship Itanic.

In fact, even the FP performance gulf will be gone. Even IBM's expensive POWER6 will have problems with Nehalem in this case - even if it really comes out at a full 5GHz and you ideally scale the current 2.3GHz POWER5+ scores linearly to that level. This doesn't usually happen, by the way.
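The two projections in the last paragraphs are simple arithmetic, sketched below under the article's own best-case assumptions. The input scores are hypothetical placeholders, not SPEC results. The point is that an equal 30 per cent uplift leaves the ratio between two chips unchanged, and that linear clock scaling from 2.3GHz to 5GHz multiplies a score by 5/2.3, roughly 2.17, an upper bound the article itself says is rarely reached.

```python
# Best-case projection arithmetic; both helpers give upper bounds.


def apply_uplift(score: float, uplift: float) -> float:
    """Apply a fractional per-core improvement (0.30 means +30 per cent)."""
    return score * (1.0 + uplift)


def linear_clock_scale(score: float, f_old_ghz: float, f_new_ghz: float) -> float:
    """Scale a score in direct proportion to clock speed (rarely achieved)."""
    return score * (f_new_ghz / f_old_ghz)


if __name__ == "__main__":
    # Hypothetical normalized scores, not measured SPEC data.
    x86_today, ia64_today = 1.25, 1.00
    before = x86_today / ia64_today
    after = apply_uplift(x86_today, 0.30) / apply_uplift(ia64_today, 0.30)
    print(f"ratio before: {before:.2f}, after equal +30%: {after:.2f}")

    # POWER5+ at 2.3GHz scaled to a 5GHz POWER6, ideal case only.
    print(f"ideal clock factor: {linear_clock_scale(1.0, 2.3, 5.0):.3f}")
```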

In summary, with Nehalem, Intel is focusing on delivering the ultimate 2008 X86 winner core, with the last scalability problems - memory and interconnect - finally solved. That leaves even fewer reasons to look at another incompatible and expensive 64-bit architecture, even one from the same house, in this case Itanium.

Yes, Itanium's huge register sets and so on will give benefits in certain apps, such as some HPC routines, but its comparative advantage over Intel's own X86 will be even smaller than before.

On the other hand, a good CSI implementation might at least save the Itanic's currently rusty hull from icebergs like POWER6 and the new Opterons in its life-saving niches of the supercomputer and high-end server market, where its shared FSB was, and still is, truly, badly, deeply hurting scalability.

Whether that will be enough to justify keeping the Itanic afloat beyond its contractual obligations to HP and others - well, that's up to Intel. Let us know what you think... µ

[ Last edited by Prescott at 2007-2-28 12:47 ]

potomac · posted 2007-2-28 12:56

Note: the author has been banned or deleted; content hidden automatically.

Prescott · posted 2007-2-28 13:06

:wacko: Why keep fixating on turning EPIC into a graphics card? Extending x86 would suit that far better.

Edison · posted 2007-2-28 13:30

That rules out FlexIO, then :P

Prescott · posted 2007-2-28 13:49

I don't see how FlexIO has anything to do with CSI. CSI is far more than just a handful of signal lines.

[ Last edited by Prescott at 2007-2-28 13:53 ]

potomac · posted 2007-2-28 13:54

Note: the author has been banned or deleted; content hidden automatically.

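Prescott · posted 2007-2-28 14:25

Originally posted by potomac at 2007-2-28 13:54

Back then, Itanium was aimed squarely at high performance and high scalability.

Now it would go onto a card. Setting the display side aside,
all the experience and tools from CPU development can be put to work on general-purpose computing.
With performance assured, development costs in this area drop enormously.

As for video, VLI ...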

EPIC is too complex. For a graphics core, a simple in-order design is enough; building something like EPIC would be a complete waste.

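EPIC's design goal is to reach very high instruction-level parallelism with relatively simple hardware. Getting there means solving several big problems: instruction dependencies, memory access latency, read/write conflicts, conditional branches, and so on and so forth. Almost none of these problems exist in the GPU's application domain. The GPU design direction is a large number of extremely simple, high-frequency cores (not much more complex than a decoder plus an ALU) combined with enough bandwidth, which runs completely counter to what EPIC was designed for.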

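EPIC's complexity lives at the ISA level, so an implementation can't be very simple. For x86 to reach the same efficiency as EPIC, the core implementation would probably have to be far, far more complex; but building a very simple, high-frequency, low-power x86 processor is much easier than doing the same with EPIC.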
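To make the compiler-side work concrete, here is a toy sketch of EPIC-style static scheduling, loosely modelled on IA-64 instruction groups and stops: the compiler, not the hardware, splits straight-line code into groups with no read-after-write or write-after-write register conflicts, so the hardware can issue each group without checking dependencies itself. The Insn format and register names are hypothetical, and a real compiler also has to handle latencies, memory dependencies, predication and branches.

```python
# Toy EPIC-style grouping: close a group (a "stop") before any RAW or WAW
# register conflict. Instruction format and registers are hypothetical.
from typing import List, NamedTuple, Tuple


class Insn(NamedTuple):
    text: str
    dst: str
    srcs: Tuple[str, ...]


def form_groups(insns: List[Insn]) -> List[List[Insn]]:
    """Greedily pack instructions in order into dependency-free groups."""
    groups: List[List[Insn]] = [[]]
    written = set()  # registers written inside the current group
    for insn in insns:
        raw = any(src in written for src in insn.srcs)  # read-after-write
        waw = insn.dst in written                        # write-after-write
        if raw or waw:
            groups.append([])  # insert a stop and start a new group
            written = set()
        groups[-1].append(insn)
        written.add(insn.dst)
    return groups


if __name__ == "__main__":
    program = [
        Insn("add r1 = r2, r3", "r1", ("r2", "r3")),
        Insn("add r4 = r5, r6", "r4", ("r5", "r6")),
        Insn("add r7 = r1, r4", "r7", ("r1", "r4")),  # needs r1 and r4
        Insn("add r8 = r2, r5", "r8", ("r2", "r5")),
    ]
    for i, group in enumerate(form_groups(program)):
        print(f"group {i}: " + " ; ".join(insn.text for insn in group))
```

All of this dependency analysis sits in the compiler and the ISA encoding, which is the sense in which EPIC's complexity is "at the ISA level"; a GPU shader core running mostly independent work has little use for it.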

[ Last edited by Prescott at 2007-2-28 14:29 ]

potomac · posted 2007-2-28 16:03

Note: the author has been banned or deleted; content hidden automatically.

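Prescott · posted 2007-2-28 16:47

Originally posted by potomac at 2007-2-28 16:03
I have two questions. Old P, please help me out. :a)
With today's G80/R600-style designs, or x86 extensions:
1. Isn't the cost of going to double precision too high? (die size/performance ratio)
2. How much efficiency can be gained by scaling up parallelism for general-purpose computing? (SLI/CF/multi-core)

1. I don't think it's large, at least not for Intel. The SSE units already support both single and double precision; double-precision throughput is half that of single. If Intel built a GPU, its process technology would be at least a generation ahead of the fabless vendors.
2. It depends heavily on the workload. The vast majority of applications already scale poorly at 4 or 8 cores, never mind dozens or hundreds. But quite a few workloads scale nearly linearly, mostly HPC and image processing.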
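Both answers can be put into rough numbers. The sketch below assumes a 128-bit SSE register (16 bytes), which holds four 4-byte floats but only two 8-byte doubles; that is where the factor of two in throughput comes from. For the scaling question it uses Amdahl's law, a standard textbook model that the thread itself doesn't mention; the parallel fractions are hypothetical, chosen only to show why most programs flatten out at 4 to 8 cores while nearly fully parallel HPC and image-processing work keeps scaling.

```python
import struct

SSE_REGISTER_BYTES = 16  # one 128-bit XMM register


def sse_lanes(fmt: str) -> int:
    """Values of a struct format type ('f' = float32, 'd' = float64) per 128-bit register."""
    return SSE_REGISTER_BYTES // struct.calcsize(fmt)


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only parallel_fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Single precision packs twice as many lanes as double precision.
    print("float32 lanes:", sse_lanes("f"), "| float64 lanes:", sse_lanes("d"))

    # Hypothetical parallel fractions, not measured workloads.
    for p in (0.50, 0.90, 0.99):
        row = ", ".join(f"{n} cores {amdahl_speedup(p, n):.1f}x" for n in (4, 8, 64))
        print(f"parallel fraction {p:.2f}: {row}")
```

With even 10 per cent serial work, 64 cores stay under a 10x speedup, which matches the "4 or 8 cores is already poor" observation for typical applications.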

star_wrx · posted 2007-2-28 16:59

Just passing by, bumping the thread.......

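ghrs2010 · posted 2007-2-28 18:51

Read through most of it.

The gist: Nehalem's heavily improved execution units, combined with the high bandwidth CSI provides, can deliver outstanding INT and FP performance, to the point that Tukwila and its successor Poulson will hardly differ from Intel's consumer X86 processors apart from cores and cache size (mainly in that the performance gap disappears).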

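the_god_of_pig · posted 2007-2-28 18:59

Talking down EPIC is just INQ's style, (_(

With an integrated MC and a bigger L2, a 4G Tukwila certainly won't be a pushover; it could quite possibly roast POWER6 with ease.

As for Nehalem, comparing it with Tukwila is overreaching. Unless it improves 90% over Core 2 (not counting the gain from the integrated MC), it isn't even in the running.

POWER6, ha, let's see if a miracle shows up :whistling:

INQ always talks nonsense, even writing off Poulson along the way. Next to x86's "natural beauty", EPIC's advantages are plain to see.

BTW: once again I'm backing efficiency and talking down brute force :p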

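potomac · posted 2007-2-28 19:16

Note: the author has been banned or deleted; content hidden automatically.

potomac · posted 2007-2-28 19:21

Note: the author has been banned or deleted; content hidden automatically.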

HeavenPR · posted 2007-3-1 18:29

Actually, drastically cutting the pin count is the real pipe dream

When will we see an LGA 370...w00t)

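fayerlxy · posted 2007-3-1 18:48

Originally posted by HeavenPR at 2007-3-1 18:29
Actually, drastically cutting the pin count is the real pipe dream

When will we see an LGA 370...w00t)

That should be easy once CPUs come with a built-in nuclear reactor or some other power generator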

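itany · posted 2007-3-1 20:18

Originally posted by HeavenPR at 2007-3-1 18:29
Actually, drastically cutting the pin count is the real pipe dream

When will we see an LGA 370...w00t)

A chewing-gum-sized processor with four pins w00t)
64 sockets on the motherboard, hot-pluggable w00t)
For the form factor, see the SATA connector :lol: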

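Prescott · posted 2007-3-1 20:24

Originally posted by itany at 2007-3-1 20:18

A chewing-gum-sized processor with four pins w00t)
64 sockets on the motherboard, hot-pluggable w00t)
For the form factor, see the SATA connector :lol:

It should be:

A chewing-gum-sized computer with four pins w00t)
64 sockets on the monitor, hot-pluggable w00t)
For the form factor, see the SATA connector :lol: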

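itany · posted 2007-3-2 23:07

Originally posted by Prescott at 2007-3-1 20:24

It should be:

A chewing-gum-sized computer with four pins w00t)
64 sockets on the monitor, hot-pluggable w00t)
For the form factor, see the SATA connector :lol:

Maybe Apple really is dreaming up a system like that w00t)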
