POPPUR爱换

Title: Top 10 optimization directions for the RADEON 2000 architecture

Author: Edison Time: 2007-6-29 19:35
Title: Top 10 optimization directions for the RADEON 2000 architecture
Top 10 optimization hints

Parallelize your code

In order to utilize the power of the scalar architecture it is important that the code is parallel. Avoid unnecessarily serializing your instructions, and use parentheses to introduce explicit parallelism where possible. See the Parallelize your code section in this document.
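
As a rough illustration of what the parentheses buy you, the sketch below measures the longest dependency chain of an expression tree. It is a toy model written for this note, not part of the original guide: the tuple encoding and the helper names `depth`, `chain`, and `balanced` are assumptions. A sum written without parentheses evaluates left to right, so each add waits on the previous one, while pairing the terms lets independent adds issue together.

```python
def depth(expr):
    """Longest dependency chain, in ALU ops, of an expression tree.

    A leaf is a string (a value already in a register);
    a node is a tuple (op, left, right).
    """
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + max(depth(left), depth(right))


def chain(terms):
    """a + b + c + d as written without parentheses: ((a + b) + c) + d."""
    expr = terms[0]
    for term in terms[1:]:
        expr = ("+", expr, term)
    return expr


def balanced(terms):
    """The same sum with explicit pairing: (a + b) + (c + d)."""
    if len(terms) == 1:
        return terms[0]
    mid = len(terms) // 2
    return ("+", balanced(terms[:mid]), balanced(terms[mid:]))


serial = chain(["a", "b", "c", "d"])
paired = balanced(["a", "b", "c", "d"])
print(depth(serial), depth(paired))
```

Both forms perform the same number of adds; only the length of the dependency chain differs. Note that regrouping floating-point sums can change the rounded result slightly, which is one reason a compiler is not always free to do this regrouping for you.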

Optimize all shader stages

On a unified architecture it is not just the dominant shader that dictates your final performance; all shader stages consume resources from the shared pool of computation power. Optimizing the most heavily loaded shader will naturally give the largest gain, but improvements to the less loaded shaders will still improve performance. So, unlike on earlier hardware generations, a pixel-shader-limited case might see performance improvements from optimizing the vertex shader. See the unified architecture section.

Make proper use of Z optimizations

Render your scene in rough front-to-back order or use a Pre-Z pass. Draw your skybox last. Draw the main character's gun, opaque GUI elements, and other front-most objects first. Avoid outputting depth from the shader. See the Depth & Stencil efficiency section.
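
The effect of draw order can be sketched with a single-pixel toy model. It assumes a LESS depth test and that fragments failing the test are rejected before they are shaded; the function name `shaded_fragments` and the sample depths are made up for illustration.

```python
def shaded_fragments(depths):
    """Count fragments that pass a LESS depth test at one pixel,
    when surfaces are drawn in the given order."""
    nearest = float("inf")  # cleared depth buffer
    shaded = 0
    for z in depths:
        if z < nearest:  # fragment is visible so far: shade it and store its depth
            nearest = z
            shaded += 1
    return shaded


depths = [5.0, 1.0, 3.0, 9.0]  # view-space depths of four overlapping opaque surfaces

print(shaded_fragments(sorted(depths)))                # front to back
print(shaded_fragments(sorted(depths, reverse=True)))  # back to front
print(shaded_fragments(depths))                        # unsorted
```

Under these assumptions, front-to-back order shades each pixel once, while back-to-front order shades every overlapping surface.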

Use vertex texture fetch

Getting data into your vertex and geometry shaders is not only a question of memory bandwidth; the fetch instructions themselves may also be a limiting factor. By using vertex texture fetch you could potentially double your input rate by utilizing two separate fetching mechanisms. Splitting the data roughly equally between the vertex buffer and a texture often improves performance noticeably. See the Vertex texture fetch section.
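
To see why an even split is the target, consider a deliberately simple model. It assumes two independent fetch paths that run concurrently and each deliver one attribute per cycle; those rates and the name `fetch_cycles` are assumptions for illustration only, not hardware figures from the guide.

```python
def fetch_cycles(vertex_buffer_attrs, texture_attrs):
    """Cycles to fetch one vertex's inputs in a toy two-path model."""
    # Assumption: both paths run in parallel at one attribute per cycle,
    # so the more heavily loaded path determines the total.
    return max(vertex_buffer_attrs, texture_attrs)


print(fetch_cycles(8, 0))  # everything in the vertex buffer
print(fetch_cycles(6, 2))  # uneven split
print(fetch_cycles(4, 4))  # even split
```

In this model the even split halves the fetch time, which matches the "potentially double your input rate" claim; real gains depend on the actual fetch rates and formats.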

Use culling in the geometry shader

The geometry shader is typically limited by its output. If it can be quickly determined that a triangle is outside the frustum or that it is back-facing, you can usually achieve a significant performance improvement by not writing it out. For instance, in a render-to-cubemap case most triangles need only be written to one face, so culling may cut the output by almost a factor of six. See the Use frustum and backface culling section for details and example code.
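
The render-to-cubemap case can be sketched on the CPU as follows. This is not the example code from the guide; it is a conservative test written under stated assumptions: the cubemap is centered at the origin, each face has a 90-degree frustum bounded by four planes through the origin, and near and far planes are ignored. A face is encoded as `(axis, sign)`, so `(0, 1)` means +X; the function names are hypothetical.

```python
def may_touch_face(tri, axis, sign):
    """Conservative test: False only if all three vertices lie outside
    the same side plane of the given cube face's frustum."""
    for other in range(3):
        if other == axis:
            continue
        for t in (+1, -1):
            # Inside half-space of this side plane: sign*p[axis] + t*p[other] >= 0
            if all(sign * p[axis] + t * p[other] < 0 for p in tri):
                return False
    return True


def faces_for_triangle(tri):
    """Cube faces the triangle must be written to (may include false positives)."""
    return [(axis, sign)
            for axis in range(3)
            for sign in (+1, -1)
            if may_touch_face(tri, axis, sign)]


tri_right = [(10.0, 0.1, 0.1), (10.0, 0.5, 0.2), (10.0, 0.2, 0.6)]
tri_back = [(0.1, 0.2, -7.0), (0.3, -0.1, -7.0), (-0.2, 0.1, -7.0)]

print(faces_for_triangle(tri_right))
print(faces_for_triangle(tri_back))
```

A geometry shader doing the equivalent test would emit the triangle only for the listed faces instead of all six. The test never rejects a visible triangle, but a large triangle near the origin may still be kept for faces it does not actually cover.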

Minimize geometry shader I/O

The geometry shader is typically limited by its output, and in many cases its input matters as well. By keeping the input and output data small you can see a significant performance improvement. Packing data, or trading GS output for a few extra instructions in the pixel shader, is typically beneficial. See the Keep data small section.
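
One common way to pack data is to store a color as four 8-bit channels in a single 32-bit value and unpack it with a few instructions at the consumer. The sketch below shows the arithmetic on the CPU; the function names and the channel order (red in the low byte) are choices made for this example, not something the guide specifies.

```python
def pack_rgba8(r, g, b, a):
    """Pack four floats in [0, 1] into one 32-bit unsigned integer."""
    def to_byte(x):
        clamped = min(max(x, 0.0), 1.0)
        return int(round(clamped * 255.0))
    return to_byte(r) | (to_byte(g) << 8) | (to_byte(b) << 16) | (to_byte(a) << 24)


def unpack_rgba8(packed):
    """Recover four floats in [0, 1] from the packed value."""
    return tuple(((packed >> shift) & 0xFF) / 255.0 for shift in (0, 8, 16, 24))


packed = pack_rgba8(1.0, 0.0, 0.0, 1.0)
print(hex(packed))
print(unpack_rgba8(packed))
```

The output shrinks from four 32-bit components to one, at the cost of 8-bit precision per channel and a few shifts and masks on the receiving side.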

Use instancing

While D3D10 has improved things, it continues to be the case that the number of draw calls can be a significant limitation to performance. It is therefore a good idea to design your application around instancing. D3D10 makes instancing better than ever with an improved interface and tools like the SV_InstanceID system value. See the Instancing section.
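
On the application side, designing around instancing mostly means grouping objects that share a mesh so that each group becomes one draw call with a per-instance data array. A minimal sketch of that grouping, with a hypothetical scene and the made-up helper name `build_instanced_batches`:

```python
from collections import defaultdict


def build_instanced_batches(objects):
    """Group (mesh_name, transform) pairs into one batch per mesh.

    Each batch corresponds to a single instanced draw call; the list of
    transforms is the per-instance data the shader would index.
    """
    batches = defaultdict(list)
    for mesh, transform in objects:
        batches[mesh].append(transform)
    return dict(batches)


# Hypothetical scene: transforms reduced to simple x offsets for brevity.
objects = [("tree", 0.0), ("rock", 1.5), ("tree", 3.0), ("tree", 4.5), ("rock", 6.0)]

batches = build_instanced_batches(objects)
print(len(objects), "objects ->", len(batches), "draw calls")
```

In the shader, SV_InstanceID would serve as the index into the per-instance array for each batch.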

Use the right data types

Don’t use vectors when a scalar is enough. Don’t compute alpha if you only care about RGB. Avoid excessive type conversions. See the Use the right data type, Avoid mixing types, Scalar ALUs, and Don’t return float4 if not necessary sections.

Use dynamic branching

Dynamic branching can be used to avoid doing unnecessary work, such as computing lighting for parts of a scene that are in shadow. Good use of dynamic branching can provide a significant performance increase. See the Dynamic branching section.
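
The shadow example can be sketched as a CPU-side loop, assuming a per-pixel shadow flag is cheap to obtain. `expensive_lighting`, the `AMBIENT` constant, and the sample pixels are hypothetical stand-ins for a real lighting model.

```python
AMBIENT = 0.1  # hypothetical ambient term


def expensive_lighting(albedo):
    """Stand-in for a costly per-pixel lighting computation."""
    return AMBIENT + 0.9 * albedo


def shade(pixels):
    """Shade (albedo, in_shadow) pairs, skipping lighting for shadowed pixels."""
    colors = []
    lighting_evaluations = 0
    for albedo, in_shadow in pixels:
        if in_shadow:
            # Cheap path: ambient only, no lighting evaluation.
            colors.append(AMBIENT * albedo)
        else:
            lighting_evaluations += 1
            colors.append(expensive_lighting(albedo))
    return colors, lighting_evaluations


pixels = [(0.8, True), (0.5, False), (0.3, True), (1.0, False)]
colors, evaluations = shade(pixels)
print(evaluations, "of", len(pixels), "pixels ran the lighting path")
```

The saving scales with the fraction of pixels that take the cheap path.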

Use constant buffers in D3D10 style

When porting a game or application from D3D9 it is important not to simply translate D3D9 calls directly into equivalent D3D10 calls. If you are uploading as many constants as in D3D9, you haven’t really gained anything. Try to keep as many constants as possible resident in video memory and update only the truly dynamic ones. See the Constant buffers section.
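
One way to follow this advice is to group constants by how often they change and to upload a buffer only when its contents actually changed. The sketch below models that bookkeeping; the `ConstantBuffer` class, its `uploads` counter, and the constant names are assumptions for illustration and do not correspond to any D3D10 API.

```python
class ConstantBuffer:
    """Toy model of a constant buffer that uploads only when dirty."""

    def __init__(self, values):
        self.values = dict(values)
        self.dirty = True   # initial contents still need one upload
        self.uploads = 0

    def set(self, key, value):
        # Mark dirty only when the value really changes.
        if self.values.get(key) != value:
            self.values[key] = value
            self.dirty = True

    def flush(self):
        # Stand-in for the actual GPU upload.
        if self.dirty:
            self.uploads += 1
            self.dirty = False


per_scene = ConstantBuffer({"fog_color": (0.5, 0.6, 0.7)})  # rarely changes
per_frame = ConstantBuffer({"time": 0.0})                   # changes every frame

for frame in range(3):
    per_scene.set("fog_color", (0.5, 0.6, 0.7))  # same value: stays clean
    per_frame.set("time", frame / 60.0)
    per_scene.flush()
    per_frame.flush()

print(per_scene.uploads, per_frame.uploads)
```

Keeping the static constants in their own buffer means the per-frame update touches only the small dynamic buffer.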

Author: Edison Time: 2007-6-29 19:40
List of restrictions on enabling Hyper-Z:

Author: NewCastle Time: 2007-6-29 20:09
In order to utilize the power of the scalar architecture it is important that the code is parallel

This sentence could give people the wrong idea of what a scalar architecture is.

Author: Edison Time: 2007-6-29 20:14
It's just 5-way superscalar.

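Author: HeavenPR Time: 2007-6-29 20:22
Apart from the first item, the rest can only help on G80 as well, with no downside. Of course, the first item doesn't really hurt G80 either.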

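Author: killpmp Time: 2007-6-29 20:25

Originally posted by HeavenPR on 2007-6-29 20:22
Apart from the first item, the rest can only help on G80 as well, with no downside. Of course, the first item doesn't really hurt G80 either.

The only difference is that for one it is help in a time of need, and for the other it is icing on the cake :lol: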

Author: iiiiuuuu Time: 2007-6-29 21:23
These are all general-purpose optimization techniques; they are hardly specific to the Radeon 2000 series.

Author: lqf3dnow Time: 2007-6-29 21:32
Why is there no mention of the ring bus when discussing their own architecture? It should be a big advantage for future products.

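Author: Edison Time: 2007-6-29 21:39

Originally posted by lqf3dnow on 2007-6-29 21:32
Why is there no mention of the ring bus when discussing their own architecture? It should be a big advantage for future products.

In terms of performance, the ring bus has no advantage over a crossbar (XBAR).

In short, the ring bus's programmability will not bring a leap in performance in the future.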

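Author: lqf3dnow Time: 2007-6-29 21:44

Originally posted by Edison on 2007-6-29 21:39

In terms of performance, the ring bus has no advantage over a crossbar (XBAR).

In short, the ring bus's programmability will not bring a leap in performance in the future.

I only meant with respect to products built on their own architecture.
Surely AMD/ATI cards won't drop the ring bus and go back to a crossbar in the future? That can't be right.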

Author: RacingPHT Time: 2007-6-30 03:32
Notice: the author has been banned or deleted; content automatically hidden

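Author: madcat2100 Time: 2007-6-30 10:07

Originally posted by lqf3dnow on 2007-6-29 21:44

I only meant with respect to products built on their own architecture.
Surely AMD/ATI cards won't drop the ring bus and go back to a crossbar in the future? That can't be right.

Could it be that the ring bus is actually the bottleneck?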

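Author: 来不及思考 Time: 2007-6-30 12:53
Notice: the author has been banned or deleted; content automatically hidden

Author: RacingPHT Time: 2007-6-30 13:23
Notice: the author has been banned or deleted; content automatically hidden

Author: 天下18 Time: 2007-6-30 17:57
Notice: the author has been banned or deleted; content automatically hidden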

Author: 89度热水 Time: 2007-6-30 18:14
Intel's Gesher/Larrabee also use a ring bus.

Author: susu2933 Time: 2009-5-6 19:54
I can't read English.

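Author: ic.expert Time: 2009-5-7 20:35
The HZB (hierarchical Z buffer) on AMD's R600 is too weak: it is not on-chip, its configuration cannot be changed dynamically, and it holds only one Z value... It is not even as good as S3G's GPUs.

If only D3D placed the alpha test after the depth test in the graphics pipeline. That would be much better, at least for the HZB.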

Welcome to POPPUR爱换 (https://we.poppur.com/)