|
GF100的核心频率能达到725?!
Although we don't yet have final product specs, Nvidia's Drew Henry set expectations for the GF100's power consumption by admitting the chip will draw more power under load than the GT200. That fact by itself isn't necessarily a bad thing—Intel's excellent Lynnfield processors consume more power at peak than their Core 2 Quad predecessors, but their total power consumption picture is quite good. Still, any chip this late and this large is going to raise questions, especially with a very capable, much smaller competitor already in the market.
With the new information we have about the GF100's graphics bits and pieces, we can revise our projections for its theoretical peak capabilities. Sad to say, our earlier projections were too bullish on several fronts, so most of our revisions are in a downward direction.
We don't have final clock speeds yet, but we do have a few hints. As I pointed out when we are talking about texturing, Nvidia's suggestion that the GF100's theoretical texture filtering capacity will be lower than the GT200's gives us an upper bound on clock speeds. The crossover point where GF100 would match the GeForce GTX 280 in texturing capacity is a 1505MHz core clock, with the texturing hardware running at half that frequency. We can probably assume the GF100's clocks will be a little lower than that.
We have another nice hint that running the texturing hardware at half the speed of the shaders rather than on a separate core clock will impart a 12-14% frequency boost. In this case, I'm going to be optimistic, follow a hunch, and assume the basis of comparison is the GT200b chip in the GeForce GTX 285. A clock speed boost in that range would get us somewhere near 725MHz for the half-speed clock and 1450MHz for the shaders. The GF100's various graphics units running at those speeds would yield the following peak theoretical rates. | GT200 | GF100 | RV870 | Transistor Count | 1.4B | 3.0B | 2.15B | Process node | 55 nm @ TSMC | 40 nm @ TSMC | 40 nm @ TSMC | Core clock | 648 MHz | 725 MHz | 850 MHz | Hot clock | 1476 MHz | 1450 MHz | -- | Memory clock | 2600 MHz | 4200 MHz | 4800 MHz | ALUs | 240 | 512 | 1600 | SP FMA rate | 0.708 Tflops | 1.49 Tflops | 2.72 Tflops | DP FMA rate | 88.5 Gflops | 186 Gflops* | 544 Gflops | ROPs | 32 | 48 | 32 | Memory bus width | 512 bit | 384 bit | 256 bit | Memory bandwidth | 166.4 GB/s | 201.6 GB/s | 153.6 GB/s | ROP rate | 21.4 Gpixels/s | 34.8 Gpixels/s | 27.2 Gpixels/s | INT8 Bilinear texel rate
(Half rate for FP16) | 51.8 Gtexels/s | 46.4 Gtexels/s | 68.0 Gtexels/s |
I should pause to explain the asterisk next to the unexpectedly low estimate for the GF100's double-precision performance. By all rights, in this architecture, double-precision math should happen at half the speed of single-precision, clean and simple. However, Nvidia has made the decision to limit DP performance in the GeForce versions of the GF100 to 64 FMA ops per clock—one fourth of what the chip can do. This is presumably a product positioning decision intended to encourage serious compute customers to purchase a Tesla version of the GPU instead. Double-precision support doesn't appear to be of any use for real-time graphics, and I doubt many serious GPU-computing customers will want the peak DP rates without the ECC memory that the Tesla cards will provide. But a few poor hackers in Eastern Europe are going to be seriously bummed, and this does mean the Radeon HD 5870 will be substantially faster than any GeForce card at double-precision math, at least in terms of peak rates.
Otherwise, on paper, the GF100 projects to be superior to the Radeon HD 5870 only in terms of ROP rate and memory bandwidth. (Then again, it's now suddenly notable that we're not estimating triangle throughput. The GF100 will have a clear edge there.) That fact isn't necessarily a calamity. The GeForce GTX 280, for example, had just over half the peak shader arithmetic rate of the Radeon HD 4870 in theory, yet the GTX 280's delivered performance was generally superior. Much hinges on how efficiently the GF100 can perform its duties. What we can say with certainty is that the GF100 will have to achieve a new high-water mark in architectural efficiency in order to outperform the 5870 by a decent margin—something it really needs to do, given that it's a much larger piece of silicon.
Obviously, the GF100 is a major architectural transition for Nvidia, which helps explain its rather difficult birth. The advances it promises in both GPU computing and geometry processing capabilities are pretty radical and could be well worth the pain Nvidia is now enduring, when all is said and done. The company has tackled problems in this generation of technology that its competition will have to address eventually.
In attempting to handicap the GF100's prospects, though, I'm struggling to find a successful analog to such a late and relatively large chip. GPUs like the NV30 and R600 come to mind, along with CPUs like Prescott and Barcelona. All were major architectural revamps, and all of them conspicuously ran hot and underperformed once they reached the market. The only positive examples I can summon are perhaps the R520—the Radeon X1800 XT wasn't so bad once it arrived, though it wasn't a paragon of efficiency—and AMD's K8 processors, which were long delayed but eventually rewrote the rulebook for x86 CPUs. I suppose we'll find out soon enough where in this spectrum the GF100 will reside.
原地址:http://www.techreport.com/articles.x/18332/5 |
|