RWT: Nehalem in Detail

Prescott · posted 2008-4-4 17:12

http://www.realworldtech.com/inc ... 2719&mode=print
For the original article, please visit RWT.

Some details that had not been disclosed before:
1. L2 latency is below 12 cycles; L3 latency is between 30 and 40 cycles.
2. L3 design: "The advantage of an inclusive cache is that it can handle almost all coherency traffic without disturbing the private caches for each individual-core. If a cache access misses in the L3, it cannot be present in any of the L2 or L1 caches of the cores. On the other hand, Nehalem’s L3 also acts like a snoop filter for cache hits. Each cache line in the L3 contains four “core valid” bits denoting which cores may have a copy of that line in their private caches. If a “core valid” bit is set to 0, then that core cannot possibly have a copy of the cache line – while a “core valid” bit set to 1 indicates it is possible (but not guaranteed) that the core in question could have a private copy of the line. Since Nehalem uses the MESIF cache coherency protocol, as discussed previously, if two cores have valid bits, then the cache line is guaranteed to be clean (i.e. not modified). The combination of these two techniques lets the L3 cache insulate each of the cores from as much coherency traffic as possible, leaving more bandwidth available for actual data in the caches."
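
To make the quoted snoop-filter logic concrete, here is a minimal Python sketch. It is a toy model built only from the passage above, not Intel's implementation: the `L3Line` class, the dictionary standing in for the L3, and both function names are hypothetical.

```python
from dataclasses import dataclass

NUM_CORES = 4  # the quoted passage describes four "core valid" bits per L3 line


@dataclass
class L3Line:
    # Hypothetical model: bit i set means core i *may* hold a private copy.
    core_valid: int


def cores_to_snoop(l3, address, requester):
    """Return the cores that would have to be probed for this request."""
    line = l3.get(address)
    if line is None:
        # Inclusive L3: a miss here means no core's L1/L2 can hold the line.
        return []
    return [core for core in range(NUM_CORES)
            if core != requester and (line.core_valid >> core) & 1]


def guaranteed_clean(line):
    """Per the quote, two or more valid bits imply the line is unmodified."""
    return bin(line.core_valid).count("1") >= 2


# Toy L3 contents: the line at 0x1000 may be cached by cores 0 and 2.
l3 = {0x1000: L3Line(core_valid=0b0101)}
```

With this toy state, a request from core 0 for `0x1000` would probe only core 2, and a request for an address absent from `l3` would probe no one, which is the traffic reduction the article describes.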

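Performance expectations:
Integer performance gains will be modest, while HPC and floating-point-intensive programs will see striking gains; 2x or more is possible (heh). Enterprise commercial workloads fall somewhere in between. Note, though, that the 8-core Beckton will arrive a year later.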

In workloads that are not particularly bandwidth dependent, such as general integer applications, Nehalem will provide a moderate boost over the previous generation. The performance gains will largely come from the integrated memory controller, microarchitectural innovation and circuit techniques (the latter was not discussed by Intel at IDF and will probably be saved for later disclosure). For floating point and HPC workloads that are typically bandwidth bound, Nehalem will be nothing short of a miracle – with performance gains of 2X or better. Commercial server workloads such as OLTP databases, decision support and virtualization will certainly benefit from more bandwidth and lower latency as well, but not to the same extent as HPC or floating point applications. Of course, when thinking about performance it is essential to also keep time in mind – Beckton will be about a year behind Gainestown and Bloomfield.

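itany · posted 2008-4-4 17:39

Heh, Prescott didn't mention that L1 latency went up by 1 cycle... so it's not all good news.
Also, Nehalem moved the loop stream detector buffer from in front of the decoders to behind them; the Trace Cache has come back to life...
I had expected Nehalem to widen instruction fetch, but that hasn't changed either; looks like I was daydreaming for nothing.

Now the hope is just that Nehalem can push clock speeds up on schedule... Of course, many people don't care what the official frequency will be, only how far it overclocks :lol: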

elisha · posted 2008-4-4 17:48

Read it. Very disappointed.

the_god_of_pig · posted 2008-4-4 17:53

This one is a must-read :charles:

GZboy · posted 2008-4-4 18:22

Notice: the author has been banned or deleted; content automatically hidden.

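itany · posted 2008-4-4 19:40

Originally posted by GZboy at 2008-4-4 18:22
I really doubt whether Nehalem's single-thread efficiency can beat the current Core 2.
Is Nehalem going to rely on pushing clock speed? :unsure:

Nehalem shouldn't be worse than Core 2. Its weak point is just that the cache capacity got smaller, but the latency got lower too. The L3 latency isn't really that high either; Prescott's L2 latency was at about this level.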

NONO · posted 2008-4-4 19:48

Intel shouldn't repeat its past mistakes. At the same clock, Nehalem can only be faster than the current Core 2; we just don't know by how much :unsure:

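ITANIUM2 · posted 2008-4-4 20:08

Prescott, will shrinking the L2 cache and adding an L3 cache lower performance?
:a)

Originally posted by itany at 2008-4-4 19:40

Nehalem shouldn't be worse than Core 2. Its weak point is just that the cache capacity got smaller, but the latency got lower too. The L3 latency isn't really that high either; Prescott's L2 latency was at about this level.

Roughly what is Pr***n's L2 cache latency?

[ Last edited by ITANIUM2 at 2008-4-4 20:12 ]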

GZboy · posted 2008-4-4 20:19

Notice: the author has been banned or deleted; content automatically hidden.

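ITANIUM2 · posted 2008-4-4 20:33

Let's wait for the reviews.
I wonder whether I should buy an E8x00; there won't be CPUs with an L2 cache this big to dream about in the future.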

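itany · posted 2008-4-4 20:38

Originally posted by GZboy at 2008-4-4 20:19

That depends on the cache hit rate and latency.

If you chase low latency so hard that the L2 ends up too small, a lot of data will have to be fetched from L3, and that may not beat the combination of a large L2 with somewhat higher latency.

The current Core 2 is at 14 cycles.

Merom is 14 cycles, Penryn is 15 cycles, and Prescott is 28 cycles.
Although Nehalem's L2 cache is smaller, its bandwidth is private to each core.
I think limiting the L2 capacity is partly about reducing latency and partly about keeping die area under control, since enlarging the L2 would also mean enlarging the L3. It also helps ensure the cores can reach high clock speeds.

The L2 latency may not be final yet; I had originally been hoping for <= 10 cycles.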
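
The trade-off argued over in this exchange (a small, fast L2 backed by an L3 versus one large, slower L2) can be put into numbers with a simple expected-latency calculation. The sketch below is only an illustration: the function name is made up, the cycle counts are the upper bounds mentioned in this thread, and the hit rates and the 200-cycle memory latency are hypothetical placeholders, not measurements.

```python
def average_latency(levels, memory_latency):
    """Expected cycles for an access that has already missed L1.

    levels: list of (latency_cycles, local_hit_rate), nearest cache first.
    """
    expected = 0.0
    reach = 1.0  # probability that the access gets as far as this level
    for latency, hit_rate in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    # Whatever misses every cache level goes to memory.
    return expected + reach * memory_latency


# Hypothetical hit rates and memory latency; only 15, 12 and 40 come from the thread.
large_l2_only = average_latency([(15, 0.95)], memory_latency=200)
small_l2_plus_l3 = average_latency([(12, 0.80), (40, 0.75)], memory_latency=200)
print(large_l2_only, small_l2_plus_l3)
```

Which configuration comes out ahead depends entirely on the assumed hit rates and memory latency, which is exactly the point made in the quoted post: it comes down to hit rate and latency together.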

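Prescott · posted 2008-4-4 20:46

Originally posted by GZboy at 2008-4-4 18:22
I really doubt whether Nehalem's single-thread efficiency can beat the current Core 2.
Is Nehalem going to rely on pushing clock speed? :unsure:

In the vast majority of cases single-thread performance will be higher than the current Penryn, though of course there will be exceptions.
The performance of many HPC programs is truly startling.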

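itany · posted 2008-4-4 20:49

Originally posted by Prescott at 2008-4-4 20:46

In the vast majority of cases single-thread performance will be higher than the current Penryn, though of course there will be exceptions.
The performance of many HPC programs is truly startling.

Heh, now that the boss has spoken, the crowd can rest easy.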

2PMM · posted 2008-4-4 21:34

What about performance per watt relative to Penryn? :loveliness:

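GZboy · posted 2008-4-4 22:57

Notice: the author has been banned or deleted; content automatically hidden.

GZboy · posted 2008-4-4 23:03

Notice: the author has been banned or deleted; content automatically hidden.

1empress · posted 2008-4-4 23:06

Notice: the author has been banned or deleted; content automatically hidden.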

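itany · posted 2008-4-4 23:06

Originally posted by GZboy at 2008-4-4 23:03

On the desktop the gain probably won't be large; the server side is what's worth looking forward to~ :lol:

A small gain is still a gain... :lol:

Intel itself has said that integer performance improved quickly in the early years, doubling in a little over two years, whereas now it takes about five years to double.
Meanwhile, floating-point performance has gone from slow growth to doubling every two years.

I can accept small gains for desktop applications; that's just how things stand now.
It's enough to break through where the bottlenecks currently are: multimedia encoding and decoding, games, and scientific computing.
As for whether PI scores improve, that doesn't matter.

[ Last edited by itany at 2008-4-4 23:09 ]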
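
For a sense of scale, the doubling periods mentioned in the post above can be converted into compound annual growth rates. This is just arithmetic on the round figures from the post (two years and five years); the function name is made up for illustration.

```python
def annual_growth(doubling_years):
    """Compound annual growth rate implied by a doubling period in years."""
    return 2 ** (1 / doubling_years) - 1


for label, years in [("doubling every 2 years", 2),
                     ("doubling every 5 years", 5)]:
    print(f"{label}: {annual_growth(years):.1%} per year")
```

Doubling every two years works out to roughly 41% per year, while doubling every five years is roughly 15% per year.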

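larrabee · posted 2008-4-4 23:15

Notice: the author has been banned or deleted; content automatically hidden.

itany · posted 2008-4-4 23:19

Originally posted by larrabee at 2008-4-4 23:15
My field is numerical algorithms; Intel should give me a sample machine...

For you, sir, Larrabee is the way to go; you can just keep yourself company and have a good time :lol: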
