|
GF100 graphics architecture:
In a traditional GPU pipeline, the Vertex Shader, Geometry Shader, and Setup/Rasterizer functions sit at the front end of the pipeline. As a result, data has to be written to and read back from memory on the video card. This is how things have been done for a long time, and NVIDIA believes the traditional setup creates a bottleneck in geometry performance.
Put simply, NVIDIA has taken the Raster Engine out of the front end of the pipeline and moved it down into the GPCs in four parts, and has created a new engine called the "PolyMorph Engine", which is integrated into the SMs. First, a quick breakdown of the hierarchy: the GF100 is made up of 4 GPCs (Graphics Processing Clusters), each of which breaks down into 4 SMs (Streaming Multiprocessors), each of which in turn breaks down into 32 CUDA cores, 4 Texture Units, and some other units. So 32 CUDA cores plus 4 Texture Units plus the PolyMorph Engine make up an SM, and 4 SMs make up a GPC. With this kind of parallelism you can see how the GPU can be sliced up to create less expensive parts.
Inside each GPC you will find the actual Raster Engine, so there are 4 Raster Engines inside the GF100. Inside each SM (a combination of 32 CUDA cores and 4 Texture Units) you will find the new PolyMorph Engine. The PolyMorph Engine contains the actual Vertex Fetch, Tessellator, Viewport Transform, Attribute Setup and Stream Output functions. Again, all of these functions, including the Rasterizer, used to sit in one area of the GPU at the front end of the pipeline.
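As a quick sanity check on the hierarchy described above, the per-unit counts can be multiplied out to get chip-wide totals. This is only a sketch: the per-unit numbers come from the quoted text, the function name `gf100_totals` is made up for illustration, and the 3-GPC configuration is a hypothetical cut-down part, not a product mentioned in the source.

```python
# Per-unit counts as stated in the quoted text.
GPCS_PER_CHIP = 4
SMS_PER_GPC = 4
CUDA_CORES_PER_SM = 32
TEXTURE_UNITS_PER_SM = 4
POLYMORPH_ENGINES_PER_SM = 1   # one PolyMorph Engine inside each SM
RASTER_ENGINES_PER_GPC = 1     # one Raster Engine inside each GPC


def gf100_totals(gpcs=GPCS_PER_CHIP, sms_per_gpc=SMS_PER_GPC):
    """Return unit totals for a chip with the given GPC/SM layout.

    gf100_totals is a hypothetical helper name used only for this example.
    """
    sms = gpcs * sms_per_gpc
    return {
        "sms": sms,
        "cuda_cores": sms * CUDA_CORES_PER_SM,
        "texture_units": sms * TEXTURE_UNITS_PER_SM,
        "polymorph_engines": sms * POLYMORPH_ENGINES_PER_SM,
        "raster_engines": gpcs * RASTER_ENGINES_PER_GPC,
    }


full_chip = gf100_totals()
# Hypothetical cut-down part with one GPC disabled (an assumption,
# used only to illustrate the "less expensive parts" remark).
cut_down = gf100_totals(gpcs=3)

print(full_chip)
print(cut_down)
```

With the full layout this gives 16 SMs, 512 CUDA cores, 64 Texture Units, 16 PolyMorph Engines and 4 Raster Engines. Dropping a whole GPC removes a Raster Engine along with its 4 SMs, which is the kind of slicing the text refers to.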
NVIDIA claims 8X the geometry performance of GT200. This re-ordering of the graphics pipeline increased the die size by 10% and, from our understanding of the issue, is the reason GF100 is "late." The problem with moving the pipeline around like this is that there is now a Tessellator in each SM and Triangle Setup in each GPC, so triangles get set up out of order.
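In short, in GF100, 32 CUDA cores + 4 TMUs + a PolyMorph Engine make up one SM. 4 SMs make up one GPC (Graphics Processing Cluster), and each GPC contains one raster engine.
The PolyMorph Engine contains the "actual" vertex fetch, tessellator, viewport transform, attribute setup and stream output functions.
NVIDIA claims GF100's geometry performance is 8 times that of GT200, while this part of the graphics pipeline adds 10% to the die area; HardOCP believes this is why GF100 arrived late.
In addition, according to semiaccurate, GF100's PolyMorph Engine is really just a set of acceleration instructions for rasterization and front-end geometry processing added to the CUDA cores, and no physical PolyMorph Engine exists. Of course, this is only semiaccurate's own view and has not been backed up by any evidence. semiaccurate has consistently held that GF100 is "Larrabee done wrong", and back in the G80/R600 era the author also published views such as that R600 would trounce G80. :p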
http://www.semiaccurate.com/2010 ... d-unmanufacturable/ |
|