POPPUR爱换

Title: Can the GF100's core clock reach 725?!

Author: 苯苯小哥 Time: 2010-1-24 13:22
Title: Can the GF100's core clock reach 725?!
What does caching do for graphics?
We've already spent ample time on this architecture's computing capabilities, so I won't revisit that ground here. One question that we've had since hearing about the GF100's relatively robust cache architecture is what benefits, if any, caching might have for graphics.
Most GPUs have a number of special-purpose pools of local storage. The GF100 is similar in that it has an instruction cache and a dedicated 12KB texture cache in each SM. However, each SM also has 64KB of L1 data storage that's a little bit different: it can be split either 48/16KB or 16/48KB between a local data store (essentially a software-managed cache) and a true L1 cache. For graphics, the GF100 uses the 48KB shared memory/16KB L1 cache configuration, so most of the local storage will be directly managed by Nvidia's graphics drivers, as it was in the GT200. The small L1 cache in each SM does have a benefit for graphics, though. According to Alben, if an especially long shader fills all of the available register space, registers can spill into this cache. That should avoid some worst-case scenarios that could greatly hamper performance.
More impressive is the GF100's 768KB L2 cache, which is coherent across the chip and services all requests to read and write memory. This cache's benefits for computing applications with irregular data access patterns are clear, but how does it help graphics? In several ways, Nvidia claims. Because this cache can store any sort of data, it has multiple uses: it has replaced the 256KB, read-only L2 texture cache and the write-only ROP cache in the GT200 with a single, unified read/write path that naturally maintains proper program order. Since it's larger, the L2 provides more texture coverage than the GT200's L2 texture cache, a straightforward benefit. Because it can store any sort of data, and because it may be the only local data store large enough to handle it, the L2 cache will hold the large amounts of geometry data generated during tessellation, too.
So there we have some answers. If it works well, caching should help enable the GF100's unprecedented levels of geometry throughput and contribute to the architecture's overall efficiency.
One more shot at likely speeds and feeds
Speaking of efficiency, that will indeed be the big question about the Fermi architecture and especially about the GF100. How efficient is the architecture in its first implementation?

Almost to scale? A GF100 die shot. Source: Nvidia.

The chip isn't in the wild yet, so no one has measured its exact die size. Nvidia, as a matter of policy, doesn't disclose die sizes for its GPUs (they are, I believe, the last straggler on this point in the PC market). But we know the transistor count is about three billion, which is, well, hefty. How so large a chip will fare on TSMC's thus far troubled 40-nm fabrication process remains to be seen, but the signs are mixed at best.

Author: 苯苯小哥 Time: 2010-1-24 13:22
Title: Can the GF100's core clock reach 725?!
Although we don't yet have final product specs, Nvidia's Drew Henry set expectations for the GF100's power consumption by admitting the chip will draw more power under load than the GT200. That fact by itself isn't necessarily a bad thing—Intel's excellent Lynnfield processors consume more power at peak than their Core 2 Quad predecessors, but their total power consumption picture is quite good. Still, any chip this late and this large is going to raise questions, especially with a very capable, much smaller competitor already in the market.
With the new information we have about the GF100's graphics bits and pieces, we can revise our projections for its theoretical peak capabilities. Sad to say, our earlier projections were too bullish on several fronts, so most of our revisions are in a downward direction.
We don't have final clock speeds yet, but we do have a few hints. As I pointed out when we were talking about texturing, Nvidia's suggestion that the GF100's theoretical texture filtering capacity will be lower than the GT200's gives us an upper bound on clock speeds. The crossover point where the GF100 would match the GeForce GTX 280 in texturing capacity is a 1505MHz shader clock, with the texturing hardware running at half that frequency. We can probably assume the GF100's clocks will be a little lower than that.
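The 1505MHz figure can be reproduced with a little arithmetic. What follows is a rough sketch under stated assumptions, not the article's own working: the texture unit counts (80 for GT200, 64 for GF100) are inferred by dividing the texel rates in the article's table by the listed core clocks, and the GeForce GTX 280's 602MHz core clock is an assumption brought in from outside this text. The function name is made up for illustration.

```python
# Assumed inputs, none of them stated directly in the paragraph above:
GTX280_CORE_MHZ = 602   # GeForce GTX 280 core clock (assumption)
GT200_TEX_UNITS = 80    # inferred: 51.8 Gtexels/s / 648 MHz in the table
GF100_TEX_UNITS = 64    # inferred: 46.4 Gtexels/s / 725 MHz in the table


def crossover_clocks_mhz(ref_units, ref_clock_mhz, new_units):
    """Clocks at which new_units texture units match the reference texel rate.

    Returns (half_speed_clock, shader_clock), assuming the texture hardware
    runs at exactly half the shader clock.
    """
    half_speed = ref_units * ref_clock_mhz / new_units
    return half_speed, 2 * half_speed


half_speed, shader = crossover_clocks_mhz(
    GT200_TEX_UNITS, GTX280_CORE_MHZ, GF100_TEX_UNITS
)
print(f"half-speed clock: {half_speed} MHz, shader clock: {shader} MHz")
```

Under these assumptions the texturing hardware would need about 752.5MHz, which is half of 1505MHz, to match the GTX 280.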
We have another nice hint that running the texturing hardware at half the speed of the shaders rather than on a separate core clock will impart a 12-14% frequency boost. In this case, I'm going to be optimistic, follow a hunch, and assume the basis of comparison is the GT200b chip in the GeForce GTX 285. A clock speed boost in that range would get us somewhere near 725MHz for the half-speed clock and 1450MHz for the shaders. The GF100's various graphics units running at those speeds would yield the following peak theoretical rates.
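The 725MHz guess can be sanity-checked the same way. This is only a sketch: it takes the 648MHz GT200 core clock from the article's table as the GeForce GTX 285 baseline, and the function name is invented for illustration.

```python
GTX285_CORE_MHZ = 648  # GT200 core clock as listed in the article's table


def boosted_clocks_mhz(base_mhz, boost_fraction):
    """Apply a fractional clock boost; the shader clock is assumed to be 2x."""
    half_speed = base_mhz * (1 + boost_fraction)
    return half_speed, 2 * half_speed


for boost in (0.12, 0.14):
    half_speed, shader = boosted_clocks_mhz(GTX285_CORE_MHZ, boost)
    print(f"{boost:.0%} boost: half-speed ~{half_speed:.0f} MHz, "
          f"shader ~{shader:.0f} MHz")
```

A 12% boost lands at roughly 726MHz and a 14% boost at roughly 739MHz, so the 725MHz and 1450MHz figures sit at the low end of the hinted range.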

	GT200	GF100	RV870
Transistor Count	1.4B	3.0B	2.15B
Process node	55 nm @ TSMC	40 nm @ TSMC	40 nm @ TSMC
Core clock	648 MHz	725 MHz	850 MHz
Hot clock	1476 MHz	1450 MHz	--
Memory clock	2600 MHz	4200 MHz	4800 MHz
ALUs	240	512	1600
SP FMA rate	0.708 Tflops	1.49 Tflops	2.72 Tflops
DP FMA rate	88.5 Gflops	186 Gflops*	544 Gflops
ROPs	32	48	32
Memory bus width	512 bit	384 bit	256 bit
Memory bandwidth	166.4 GB/s	201.6 GB/s	153.6 GB/s
ROP rate	21.4 Gpixels/s	34.8 Gpixels/s	27.2 Gpixels/s
INT8 Bilinear texel rate (Half rate for FP16)	51.8 Gtexels/s	46.4 Gtexels/s	68.0 Gtexels/s
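
Most of the derived rows in the table follow from simple products of unit counts and clocks. The sketch below recomputes them under a few assumptions that are not in the table itself: texture unit counts (80, 64, 80) are inferred from the texel rates, the "Memory clock" row is read as an effective data rate in MT/s, and RV870's ALUs are taken to run at its core clock. Two entries do not fall out of this arithmetic exactly (the GF100 SP rate comes to about 1.48 rather than 1.49 Tflops, and the GT200 ROP rate to about 20.7 rather than 21.4 Gpixels/s); the sketch makes no attempt to explain those differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    core_mhz: int    # "Core clock" row
    shader_mhz: int  # "Hot clock" row; RV870 assumed to use its core clock
    mem_mtps: int    # "Memory clock" row, read here as effective MT/s
    alus: int
    rops: int
    bus_bits: int
    tex_units: int   # inferred from the texel rate row, not listed


def sp_fma_tflops(g: Gpu) -> float:
    # One fused multiply-add counts as two floating-point operations.
    return g.alus * 2 * g.shader_mhz / 1e6


def bandwidth_gbs(g: Gpu) -> float:
    # Bytes per transfer times transfers per second.
    return g.bus_bits / 8 * g.mem_mtps / 1000


def rop_gpixels(g: Gpu) -> float:
    return g.rops * g.core_mhz / 1000


def texel_gtexels(g: Gpu) -> float:
    return g.tex_units * g.core_mhz / 1000


GT200 = Gpu("GT200", 648, 1476, 2600, 240, 32, 512, 80)
GF100 = Gpu("GF100", 725, 1450, 4200, 512, 48, 384, 64)
RV870 = Gpu("RV870", 850, 850, 4800, 1600, 32, 256, 80)

for g in (GT200, GF100, RV870):
    print(f"{g.name}: SP {sp_fma_tflops(g):.3f} Tflops, "
          f"{bandwidth_gbs(g):.1f} GB/s, "
          f"{rop_gpixels(g):.1f} Gpixels/s, "
          f"{texel_gtexels(g):.1f} Gtexels/s")
```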

I should pause to explain the asterisk next to the unexpectedly low estimate for the GF100's double-precision performance. By all rights, in this architecture, double-precision math should happen at half the speed of single-precision, clean and simple. However, Nvidia has made the decision to limit DP performance in the GeForce versions of the GF100 to 64 FMA ops per clock—one fourth of what the chip can do. This is presumably a product positioning decision intended to encourage serious compute customers to purchase a Tesla version of the GPU instead. Double-precision support doesn't appear to be of any use for real-time graphics, and I doubt many serious GPU-computing customers will want the peak DP rates without the ECC memory that the Tesla cards will provide. But a few poor hackers in Eastern Europe are going to be seriously bummed, and this does mean the Radeon HD 5870 will be substantially faster than any GeForce card at double-precision math, at least in terms of peak rates.
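To put numbers on the cap, here is a small sketch using the article's projected 1450MHz shader clock. The uncapped figure is hypothetical: it assumes the same clock and half-rate double precision across all 512 ALUs, and actual Tesla clocks may differ.

```python
GF100_ALUS = 512
SHADER_MHZ = 1450              # projected hot clock from the article's table
GEFORCE_DP_FMA_PER_CLOCK = 64  # the cap described in the article


def gflops(fma_per_clock, clock_mhz):
    # One fused multiply-add counts as two floating-point operations.
    return fma_per_clock * 2 * clock_mhz / 1000


# Half-rate DP on 512 ALUs would mean 256 FMA ops per clock.
full_rate_fma = GF100_ALUS // 2

capped = gflops(GEFORCE_DP_FMA_PER_CLOCK, SHADER_MHZ)
uncapped = gflops(full_rate_fma, SHADER_MHZ)

print(f"capped:   {capped:.1f} Gflops")
print(f"uncapped: {uncapped:.1f} Gflops (hypothetical, same clock)")
print(f"cap ratio: 1/{full_rate_fma // GEFORCE_DP_FMA_PER_CLOCK}")
```

The capped value, about 186 Gflops, matches the table's asterisked entry and is roughly a third of the 544 Gflops listed for RV870.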
Otherwise, on paper, the GF100 projects to be superior to the Radeon HD 5870 only in terms of ROP rate and memory bandwidth. (Then again, it's now suddenly notable that we're not estimating triangle throughput. The GF100 will have a clear edge there.) That fact isn't necessarily a calamity. The GeForce GTX 280, for example, had just over half the peak shader arithmetic rate of the Radeon HD 4870 in theory, yet the GTX 280's delivered performance was generally superior. Much hinges on how efficiently the GF100 can perform its duties. What we can say with certainty is that the GF100 will have to achieve a new high-water mark in architectural efficiency in order to outperform the 5870 by a decent margin—something it really needs to do, given that it's a much larger piece of silicon.
Obviously, the GF100 is a major architectural transition for Nvidia, which helps explain its rather difficult birth. The advances it promises in both GPU computing and geometry processing capabilities are pretty radical and could be well worth the pain Nvidia is now enduring, when all is said and done. The company has tackled problems in this generation of technology that its competition will have to address eventually.
In attempting to handicap the GF100's prospects, though, I'm struggling to find a successful analog to such a late and relatively large chip. GPUs like the NV30 and R600 come to mind, along with CPUs like Prescott and Barcelona. All were major architectural revamps, and all of them conspicuously ran hot and underperformed once they reached the market. The only positive examples I can summon are perhaps the R520—the Radeon X1800 XT wasn't so bad once it arrived, though it wasn't a paragon of efficiency—and AMD's K8 processors, which were long delayed but eventually rewrote the rulebook for x86 CPUs. I suppose we'll find out soon enough where in this spectrum the GF100 will reside.

Original URL: http://www.techreport.com/articles.x/18332/5

Author: 苯苯小哥 Time: 2010-1-24 13:23

Is 725 even possible? That's so high. I'd heard earlier that 650 would already be good.

Author: 未来的水世界 Time: 2010-1-24 13:28
Notice: The author has been banned or deleted; content automatically hidden.

Author: iamspy Time: 2010-1-24 13:29
Seems normal. After all, they've been pushing at it for more than half a year.

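Author: 苯苯小哥 Time: 2010-1-24 13:33

The 650 core clock is for the 448SP GF360, isn't it?
The 196.21 driver confirmed that the GF100's core-to-shader clock ratio is 1:2; GF100 doesn't mean the 360 version ...
未来的水世界 posted on 2010-1-24 13:28

So 650 is the 360. Then the shader clock would be 1300, right?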

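Author: sowo Time: 2010-1-24 14:03
The article says the desktop GF100's floating-point capability has been limited; NV doesn't want the desktop version stealing Tesla's business.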

Author: skywalker_hao Time: 2010-1-24 14:38
Calculating from that Tesla figure, a shader clock above 1.6 GHz is there.

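Author: gzpony Time: 2010-1-24 14:45

The article says the desktop GF100's floating-point capability has been limited; NV doesn't want the desktop version stealing Tesla's business.
sowo posted on 2010-1-24 14:03

Sigh, tomsmith123 was still hoping to buy a few dozen desktop cards to build a cluster. Now it looks like he'll have to buy professional cards after all.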

Author: 苯苯小哥 Time: 2010-1-24 14:47

However expensive the GF100 gets, it can't compare with Tesla... This shady money is exactly what NV profits from.

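Author: sylphid Time: 2010-1-24 14:48
Notice: The author has been banned or deleted; content automatically hidden.

Author: 380 Time: 2010-1-24 16:08
Notice: The author has been banned or deleted; content automatically hidden.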

Author: iamw2d Time: 2010-1-24 16:33
Heh, who knows what ATI was thinking.
They left DP intact on the 5870 but cut it from the 5770 instead.

Author: cfango Time: 2010-1-24 16:50
Can it get there with overclocking?

Author: 未来的水世界 Time: 2010-1-24 17:04
Notice: The author has been banned or deleted; content automatically hidden.

Author: chdongsh Time: 2010-1-24 19:07
Double precision cut by three quarters, ugh. How much is left to look forward to? I had thought this was Fermi's biggest highlight.

Author: 380 Time: 2010-1-24 19:23
Notice: The author has been banned or deleted; content automatically hidden.

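Author: spring62 Time: 2010-1-24 19:34

Double precision cut by three quarters, ugh. How much is left to look forward to? I had thought this was Fermi's biggest highlight.
chdongsh posted on 2010-1-24 19:07

It's to make sure gaming cards don't compete with TESLA for business... much like how current Nvidia cards can't be converted into professional cards.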

Author: td4587 Time: 2010-1-24 19:34
Judging by the chart in the post above, it's only 17 watts lower.

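Author: disruptor Time: 2010-1-24 19:35
The GT200 single-precision figure in the article isn't right, is it? It seems to be over 1 Tflops. There's a lot of guesswork in there.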

Author: sadalee Time: 2010-1-24 19:38

Let's comment once the actual product is out.

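Author: 未来的水世界 Time: 2010-1-24 20:00
Notice: The author has been banned or deleted; content automatically hidden.

Author: westlee Time: 2010-1-24 20:09
Notice: The author has been banned or deleted; content automatically hidden.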

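Author: chdongsh Time: 2010-1-24 20:12
Looks like double-precision applications can only look to ATi's next-generation cards. I hope the 68 series makes a breakthrough in ease of use for compute. And come to think of it, the 5870 should really be described as having 320 complete stream processors.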

Author: 未来的水世界 Time: 2010-1-24 20:39
Notice: The author has been banned or deleted; content automatically hidden.

Author: winningeleven Time: 2010-1-24 20:41
Doesn't seem reliable.

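Author: sowo Time: 2010-1-24 20:41
The article also explains why the GF100's double-precision rate is so low: double precision is of little use for graphics. Otherwise why would NV cut it?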

Author: 苯苯小哥 Time: 2010-1-24 20:57
Renamed the FurMark executable to 3Dmark06.exe and ran it; sure enough, the PCI-E current shot up to 5 A
————————————————
Haha, that's fun. Renaming it actually has that effect.

As for this power draw, how often do real applications actually get that high?

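Author: 未来的水世界 Time: 2010-1-24 21:01
Notice: The author has been banned or deleted; content automatically hidden.

Author: 未来的水世界 Time: 2010-1-24 21:05
Notice: The author has been banned or deleted; content automatically hidden.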

Author: 為蝦米 Time: 2010-1-24 22:19
Reply to 38# 未来的水世界

The 4870x2's FurMark power draw is similar to W1zzard's test results: 380 W+.

Author: yzdjx666 Time: 2010-1-24 22:37
Until the official launch, none of this means anything!!

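Author: PaulWong Time: 2010-1-24 22:49

Original test link:
Saw this test article posted on o c p; it comes from Germany's ht4u and is about the most accurate power consumption test so far ...
未来的水世界 posted on 2010-1-24 21:05

Nice, thumbs up. Looks like the gap between rated and actual power really isn't small; you do need a good power supply.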

Author: 52pk Time: 2010-1-24 22:53
TDP is a figure for thermal design, not actual power consumption.

Author: 苯苯小哥 Time: 2010-1-25 09:25

Reply to 未来的水世界

The 4870x2's FurMark power draw is similar to W1zzard's test results: 380 W+.

為蝦米 posted on 2010-1-24 22:19

There's no 5970 in here. How much does that one draw?

Author: 為蝦米 Time: 2010-1-25 09:36
Reply to 44# 苯苯小哥

Author: minijia Time: 2010-1-25 09:41
Looks like power consumption across the 5 series is pretty good.

Welcome to POPPUR爱换 (https://we.poppur.com/)