Niagara II

Posted 2006-9-6 23:05
I have actually had its Hot Chips 18 documents for the past two weeks; realworldtech has now organized that material and published this article:

http://www.realworldtech.com/page.cfm?ArticleID=RWT090406012516&p=1
By: David Kanter







Figure 4 – Niagara II Floorplan
Commentary and Analysis
When assessing Niagara II, the thread partitioning stands out as a novel design decision. Most recent multithreaded designs had 2-4 threads (POWER5, Pentium 4 and Xeon, Itanium 2, EV8, Niagara I), which could be easily handled in a unified manner, so there was no need to group threads together.
Since Sun is in new territory, it is hardly surprising that they were forced to use new techniques for scalability. Searching through 8 threads to issue two instructions with no structural hazards would have impacted clock speed significantly for Niagara II. Architectural simulations revealed that the performance impact of partitioning (and deferred hazard detection in the decode stage) was very small for server workloads, so the design choice was straightforward.
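The partitioning idea can be illustrated with a small sketch. This is not Sun's actual select logic: the class name PartitionedSelect, the two groups of four threads, and the least-recently-picked policy are all assumptions made for illustration. The point is only that each group searches its own four threads and fills one issue slot, instead of one unit searching all eight threads for two instructions.

```python
from typing import List, Optional

GROUP_SIZE = 4  # assumption: 8 threads split into two groups of 4


def pick_one(ready: List[bool], order: List[int]) -> Optional[int]:
    """Return the first ready thread in `order`, or None if none is ready."""
    for tid in order:
        if ready[tid]:
            return tid
    return None


class PartitionedSelect:
    """Hypothetical model: one issue slot per thread group, chosen independently."""

    def __init__(self, n_threads: int = 8) -> None:
        ids = list(range(n_threads))
        # Each group keeps its own least-recently-picked ordering.
        self.groups = [ids[i:i + GROUP_SIZE] for i in range(0, n_threads, GROUP_SIZE)]

    def select(self, ready: List[bool]) -> List[Optional[int]]:
        """Pick at most one ready thread per group for this cycle."""
        picks: List[Optional[int]] = []
        for order in self.groups:
            tid = pick_one(ready, order)
            if tid is not None:
                # Move the picked thread to the back so others get a turn.
                order.remove(tid)
                order.append(tid)
            picks.append(tid)
        return picks
```

In this model a group with no ready thread leaves its slot empty even when the other group has several ready threads. That is the cost of partitioning, which the article reports as very small for server workloads.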
Assigning functional units to a specific set of threads creates a certain degree of asymmetry in multithreading, and is also fairly unusual. It will be interesting to see how other participants in the industry plan to handle higher levels of multithreading; although it appears that for now, most other companies will either use fewer than 8 threads, or different types of multithreading. Perhaps just as importantly, this blurring of the architectural lines likely presages future developments in Sun’s upcoming processor code-named Rock.  
One of the biggest improvements in Niagara II was the enhanced floating point support. As a general rule of thumb, performance-critical floating point applications are rich in ILP, which would make Niagara II a less than ideal processor. However, some workloads simply require a massive amount of bandwidth, and Niagara II is fairly impressive in that regard. Moreover, perhaps this will push Sun into researching techniques to convert ILP into TLP. Certainly, it should be easy to distribute loop iterations (with no carried dependencies) between different threads.
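As a minimal sketch of what distributing loop iterations between threads means, the following splits a loop with no carried dependencies into contiguous chunks, one per worker. The function names, the a*x[i] + y[i] loop body, and the default of 8 workers are assumptions chosen for illustration. In standard CPython the global interpreter lock prevents a real speedup here, so the sketch shows only the partitioning, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def scale_add_chunk(a: float, x: List[float], y: List[float],
                    out: List[float], start: int, stop: int) -> None:
    """Compute out[i] = a * x[i] + y[i] for start <= i < stop."""
    for i in range(start, stop):
        # Iteration i reads and writes only index i: no carried dependency.
        out[i] = a * x[i] + y[i]


def parallel_scale_add(a: float, x: List[float], y: List[float],
                       n_threads: int = 8) -> List[float]:
    """Split the iteration space into contiguous chunks, one per worker."""
    n = len(x)
    out = [0.0] * n
    chunk = max(1, (n + n_threads - 1) // n_threads)  # ceiling division
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(scale_add_chunk, a, x, y, out, s, min(s + chunk, n))
            for s in range(0, n, chunk)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out
```

Because every iteration touches a different element of out, the chunks can run in any order or at the same time without synchronization beyond the final join.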
More robust techniques along these lines could turn Niagara II into a very attractive HPC system and help the industry as a whole, although the financial merit of such an idea is unclear. Although performance numbers were not forthcoming, the design objectives seem feasible and relatively competitive for a processor slated to arrive in the third quarter of 2007. The improvements in the cores and system architecture for Niagara II are substantial and should yield a factor of two improvement in performance. If Sun can hit their targets, these goals would translate into ~320K tpmC and ~150K BOPS in SPECjbb2005. This could put Niagara II at performance parity with the competition, and a lead in performance/watt. Either way, it is encouraging to see that Sun will continue to invest in novel architectures.
Acknowledgements
I would like to thank the following individuals for their help in writing this article:
  • Greg Grohoski
  • Robert Golla
  • Alex Plant
  • Marc Tremblay
  • and of course, anyone else who I may have forgotten.