Top Ten Optimization Directions for the RADEON 2000 Architecture

Edison · posted 2007-6-29 19:35

Top 10 optimization hints

Parallelize your code
In order to utilize the power of the scalar architecture it is important that the code is parallel. Avoid unnecessarily serializing your instructions and use parentheses to introduce explicit parallelism where possible. See the Parallelize your code section in this document.
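
A toy scheduling model can make the parallelism point concrete. The sketch below rests on our own simplifying assumptions, not on the guide: each add fills one slot of a 5-wide VLIW bundle (the "5-way superscalar" layout mentioned later in this thread), and an op can only issue once its inputs were produced in an earlier bundle. `count_bundles`, `serial_sum` and `tree_sum` are hypothetical helpers.

```python
# Toy model of a 5-wide VLIW core. Assumptions (ours, not the guide's): every
# add fills one slot, a bundle issues at most 5 independent ops, and an op can
# only issue once all the ops it reads were issued in earlier bundles.

def count_bundles(ops, width=5):
    """Greedy list scheduling; `ops` maps op name -> names of ops it reads."""
    done = set()
    remaining = dict(ops)
    bundles = 0
    while remaining:
        ready = [name for name, deps in remaining.items()
                 if all(d in done for d in deps)]
        if not ready:
            raise ValueError("dependency cycle")
        issued = ready[:width]
        for name in issued:
            del remaining[name]
        done.update(issued)
        bundles += 1
    return bundles


def serial_sum(n):
    """a + b + c + ... written without parentheses: one long dependency chain."""
    ops = {"s1": []}                      # s1 = x0 + x1 (reads only raw inputs)
    for i in range(2, n):
        ops[f"s{i}"] = [f"s{i - 1}"]      # s_i = s_{i-1} + x_i
    return ops


def tree_sum(n):
    """(a + b) + (c + d) + ...: explicit parentheses form a pairwise tree."""
    ops, level, k = {}, [None] * n, 0     # None marks a raw input, not an op
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            name = f"t{k}"
            k += 1
            ops[name] = [x for x in (level[i], level[i + 1]) if x is not None]
            nxt.append(name)
        if len(level) % 2:
            nxt.append(level[-1])         # odd element carries to the next level
        level = nxt
    return ops


if __name__ == "__main__":
    for n in (4, 8):
        print(n, count_bundles(serial_sum(n)), count_bundles(tree_sum(n)))
```

For eight inputs both shapes use seven adds, but the unparenthesized chain needs seven bundles while the parenthesized tree needs three, because the first four adds are independent and share one bundle.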

Optimize all shader stages

On a unified architecture it is not just the dominant shader that dictates final performance: all shader stages draw on the same shared pool of computation power. Optimizing the most heavily loaded shader naturally gives the largest gain, but improving the less loaded stages still improves performance. So, unlike on earlier hardware generations, a pixel-shader-limited case may see a speedup from optimizing the vertex shader. See the Unified architecture section.
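
The unified-pool argument can be put into numbers with a deliberately crude model: work is measured in ALU-cycles per frame and splits perfectly across whichever units may run it. The unit counts and workloads below are made-up illustrations, not RADEON figures.

```python
# Toy model: work is in ALU-cycles per frame and splits perfectly across the
# units that can run it. Unit counts are illustrative, not real hardware figures.

def split_frame_time(vs_work, ps_work, vs_units, ps_units):
    """Dedicated vertex and pixel units: the slower stage alone sets the time."""
    return max(vs_work / vs_units, ps_work / ps_units)


def unified_frame_time(vs_work, ps_work, units):
    """Unified pool: all stages draw on the same ALUs, so their work adds up."""
    return (vs_work + ps_work) / units


if __name__ == "__main__":
    # Pixel-shader-heavy frame; halve the vertex work and compare.
    for vs in (200, 100):
        print(vs, split_frame_time(vs, 800, 8, 24), unified_frame_time(vs, 800, 32))
```

With dedicated units the frame time stays at 800/24 ≈ 33.3 whether the vertex work is 200 or 100, because the pixel units set the pace. In the unified model it drops from 31.25 to 28.125, which is the effect the hint describes.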

Make proper use of Z optimizations

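Render your scene in rough front-to-back order or use a Pre-Z pass. Draw your skybox last. Draw the main character's gun, opaque GUI elements, or other front-most objects first. Avoid outputting depth from the shader, since a pixel shader that writes depth prevents the hardware from rejecting pixels before shading them. See the Depth & Stencil efficiency section.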
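
One way to get the rough front-to-back order is to sort opaque draws by a coarse view depth, with special buckets for front-most objects and the skybox. This is a minimal sketch; `DrawItem`, its `category` tags and the bucket order are hypothetical, and transparent geometry would still be drawn afterwards, back to front.

```python
from dataclasses import dataclass


# Hypothetical draw record; a real engine would carry mesh/material handles too.
@dataclass
class DrawItem:
    name: str
    view_depth: float          # rough distance from the camera (e.g. bounds center)
    category: str = "world"    # hypothetical tags: "foreground", "world", "sky"


# Foreground (gun, opaque GUI) first, ordinary geometry next, skybox last.
_BUCKET = {"foreground": 0, "world": 1, "sky": 2}


def opaque_draw_order(items):
    """Sort opaque draws roughly front to back so near surfaces fill Z early."""
    return sorted(items, key=lambda it: (_BUCKET[it.category], it.view_depth))


if __name__ == "__main__":
    scene = [
        DrawItem("sky", 1000.0, "sky"),
        DrawItem("wall", 40.0),
        DrawItem("gun", 0.5, "foreground"),
        DrawItem("crate", 5.0),
    ]
    print([it.name for it in opaque_draw_order(scene)])
```

The example prints ['gun', 'crate', 'wall', 'sky']. Sorting by one depth value per object is only approximate, which is all the hint asks for; a Pre-Z pass is the alternative when sorting is impractical.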

Use vertex texture fetch

Getting data into your vertex and geometry shaders is not only a question of memory bandwidth; the fetch instructions themselves may also be a limiting factor. By using vertex texture fetch you can potentially double your input rate, because the data then flows through two separate fetch mechanisms. Splitting the data roughly equally between the vertex buffer and a texture often improves performance noticeably. See the Vertex texture fetch section.
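
A simple way to split per-vertex data roughly equally is a greedy balance by size. This sketch assumes, as a simplification of our own, that fetch cost scales with bytes per vertex; the attribute names and sizes are hypothetical. The texture half would then be read in the vertex shader, for example by using SV_VertexID to locate the vertex's texels.

```python
# Assumption (ours): fetch cost scales roughly with bytes per vertex, so we
# balance bytes between the two paths. Attribute names/sizes are hypothetical.

def split_attributes(attrs):
    """attrs maps attribute name -> bytes per vertex.

    Greedy balance: place the largest remaining attribute on whichever fetch
    path currently carries fewer bytes. Returns (vb, tex, vb_bytes, tex_bytes).
    """
    vb, tex = [], []
    vb_bytes = tex_bytes = 0
    # reverse=True keeps equal-sized attributes in their original order.
    for name, size in sorted(attrs.items(), key=lambda kv: kv[1], reverse=True):
        if vb_bytes <= tex_bytes:
            vb.append(name)
            vb_bytes += size
        else:
            tex.append(name)
            tex_bytes += size
    return vb, tex, vb_bytes, tex_bytes


if __name__ == "__main__":
    layout = {"position": 12, "normal": 12, "tangent": 12,
              "uv0": 8, "uv1": 8, "color": 4}
    print(split_attributes(layout))
```

For the sample layout both paths end up carrying 28 bytes per vertex: position, tangent and color from the vertex buffer, and normal and both UV sets from the texture.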

Use culling in the geometry shader

The geometry shader is typically limited by its output. If you can quickly determine that a triangle is outside the frustum or back-facing, you can usually gain significant performance by not writing it out. For instance, in a render-to-cubemap case most triangles need to be written to only one face, so output may shrink by almost a factor of six. See the Use frustum and backface culling section for details and example code.
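
The two culling tests can be written down compactly. The sketch below transcribes them into Python on clip-space vertices so they can be checked on the CPU; it is not the guide's example code. It assumes the D3D clip-space convention (0 ≤ z ≤ w), treats the front-face winding as a parameter because it depends on the rasterizer state and on the direction of y, and `faces_to_emit` is a hypothetical cubemap helper.

```python
from typing import Sequence, Tuple

Vec4 = Tuple[float, float, float, float]   # clip-space (x, y, z, w)


def outside_frustum(tri: Sequence[Vec4]) -> bool:
    """True if all three vertices are outside the same clip plane.
    D3D clip-space convention assumed: -w<=x<=w, -w<=y<=w, 0<=z<=w."""
    planes = (
        lambda v: v[0] < -v[3],   # left
        lambda v: v[0] > v[3],    # right
        lambda v: v[1] < -v[3],   # bottom
        lambda v: v[1] > v[3],    # top
        lambda v: v[2] < 0.0,     # near
        lambda v: v[2] > v[3],    # far
    )
    return any(all(out(v) for v in tri) for out in planes)


def back_facing(tri: Sequence[Vec4], ccw_is_front: bool = True) -> bool:
    """Signed-area test on (x/w, y/w); degenerate triangles count as back-facing.
    Triangles with any w <= 0 are conservatively kept (never culled here)."""
    if any(v[3] <= 0.0 for v in tri):
        return False
    (x0, y0), (x1, y1), (x2, y2) = [(v[0] / v[3], v[1] / v[3]) for v in tri]
    area2 = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)   # twice signed area
    return area2 <= 0.0 if ccw_is_front else area2 >= 0.0


def faces_to_emit(tri_per_face: Sequence[Sequence[Vec4]]) -> list:
    """Hypothetical cubemap helper: tri_per_face[i] is the triangle already
    transformed by face i's view-projection; return the faces worth writing."""
    return [i for i, tri in enumerate(tri_per_face)
            if not outside_frustum(tri) and not back_facing(tri)]
```

In a geometry shader the same tests would run per triangle before any vertex is appended to the output stream, and in the cubemap case once per face. Testing each plane separately is conservative: a large triangle straddling a frustum corner may survive even though it is invisible.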

Minimize geometry shader I/O

The geometry shader is typically limited by output, but input may also matter in many cases. Keeping both input and output data small can yield a significant performance improvement. Packing data, or trading GS output for a few extra instructions in the pixel shader, is typically beneficial. See the Keep data small section.
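
Packing is easiest to see with a color: four floats (16 bytes) quantized to 8 bits each fit in one 32-bit word (4 bytes). This is a sketch under the assumption that 8-bit precision is acceptable for the attribute; `pack_unorm4x8` and `unpack_unorm4x8` are hypothetical names. Shader Model 4.0 supports integer and bitwise operations, so the same shifts and masks can be written in HLSL in the GS and the pixel shader.

```python
# Sketch, assuming the attribute is an RGBA color in [0, 1] that tolerates
# 8-bit precision. The helper names are hypothetical.

def pack_unorm4x8(r, g, b, a):
    """Quantize four [0, 1] floats to 8 bits each and pack into one 32-bit word."""
    word = 0
    for shift, c in zip((0, 8, 16, 24), (r, g, b, a)):
        c = min(max(c, 0.0), 1.0)              # clamp like a UNORM conversion
        word |= int(c * 255.0 + 0.5) << shift  # round to nearest, then place
    return word


def unpack_unorm4x8(word):
    """Inverse of pack_unorm4x8: four floats in [0, 1]."""
    return tuple(((word >> shift) & 0xFF) / 255.0 for shift in (0, 8, 16, 24))
```

One caveat: integer attributes are not interpolated, so a packed word passed from the GS to the pixel shader must be declared nointerpolation. The technique therefore suits values that are constant across the primitive.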

Use instancing

While D3D10 has improved things, the number of draw calls can still be a significant limitation on performance. It is therefore a good idea to design your application around instancing. D3D10 makes instancing better than ever, with an improved interface and tools such as the SV_InstanceID system value. See the Instancing section.
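
Designing around instancing mostly means grouping objects that share a mesh and material so that each group becomes one draw. The sketch below assumes that grouping key and a per-instance data array; `build_instanced_draws` is a hypothetical helper, and in the shader SV_InstanceID would select each instance's entry.

```python
from collections import defaultdict


def build_instanced_draws(objects):
    """objects: iterable of (mesh_id, material_id, per_instance_data).

    Groups objects sharing mesh and material. Each group becomes one
    instanced draw whose shader would use SV_InstanceID to pick its entry
    from the per-instance array. Returns (mesh, material, count, data) tuples.
    """
    batches = defaultdict(list)
    for mesh, material, data in objects:
        batches[(mesh, material)].append(data)
    return [(mesh, material, len(data), data)
            for (mesh, material), data in batches.items()]
```

In the test scene, 1,000 objects collapse into three draws of 500, 300 and 200 instances; in D3D10 each would be issued with a call such as `ID3D10Device::DrawIndexedInstanced`.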

Use the right data types

Don’t use vectors when a scalar is enough. Don’t compute alpha if you only care about RGB. Avoid excessive type conversions. See the Use the right data type, Avoid mixing types, Scalar ALUs, and Don’t return float4 if not necessary sections.
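
The cost of unneeded vector components can be estimated with a lower bound on ALU bundles. The sketch assumes, for simplicity, that the compiler splits vector ops into scalar ops and can fill all five slots of a bundle with independent work; it ignores dependencies and register or slot restrictions, so real counts can only be higher. `min_bundles` is a hypothetical helper.

```python
import math


def min_bundles(op_widths, slots=5):
    """Lower bound on VLIW bundles for independent ops.

    op_widths: component count of each op (float4 -> 4, float3 -> 3, ...).
    Assumes every scalar component can go into any free slot of a bundle.
    """
    return math.ceil(sum(op_widths) / slots)
```

Ten independent float4 ops need at least 8 bundles; computing only RGB (float3) lowers the bound to 6, a 25% saving, and a scalar version to 2.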

Use dynamic branching

Dynamic branching can be used to avoid unnecessary work, such as computing lighting for parts of a scene that are in shadow. Good use of dynamic branching can provide a significant performance increase. See the Dynamic branching section.
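
Whether a branch actually saves work depends on coherence, since pixels are processed in batches that execute in lockstep. The cost model below assumes a batch of 64 pixels (the R600 wavefront size) and made-up costs: every pixel pays `base_cost` for the shadow test, and a whole batch pays `light_cost` per pixel if any of its pixels is lit. `shaded_cost` is a hypothetical helper.

```python
def shaded_cost(lit_mask, batch_size=64, light_cost=100, base_cost=10):
    """Estimate ALU cost when lighting is skipped for shadowed pixels.

    lit_mask: one bool per pixel, True if the pixel needs lighting.
    A batch runs in lockstep, so if any pixel in it is lit, every pixel in
    that batch pays light_cost; fully shadowed batches skip it entirely.
    Costs are illustrative units, not measured figures.
    """
    total = 0
    for start in range(0, len(lit_mask), batch_size):
        batch = lit_mask[start:start + batch_size]
        per_pixel = base_cost + (light_cost if any(batch) else 0)
        total += per_pixel * len(batch)
    return total


if __name__ == "__main__":
    coherent = [True] * 512 + [False] * 512          # shadow covers one half
    scattered = [i % 2 == 0 for i in range(1024)]    # lit/shadowed alternate
    print(shaded_cost(coherent), shaded_cost(scattered))
```

With a contiguous shadowed half the cost drops from 112,640 to 61,440; with shadowed and lit pixels alternating, every batch takes the lighting path and nothing is saved.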

Use constant buffers in D3D10 style

When porting a game or application from D3D9, it is important not to simply translate D3D9 calls into equivalent D3D10 calls. If you are uploading as many constants as in D3D9, you haven’t really gained anything. Keep as many constants as possible resident in video memory and update only the truly dynamic ones. See the Constant buffers section.
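
The difference between a literal port and D3D10-style buffers shows up in upload volume. The model below is illustrative: it assumes the naive port keeps all constants in one buffer and re-uploads it for each object (a D3D10 constant buffer is generally updated as a whole), while the D3D10-style layout groups constants by update frequency. Sizes and frequencies are hypothetical.

```python
# Illustrative sizes only; both layouts hold the same 4096 bytes of constants.

def bytes_uploaded_per_frame(buffers, objects_drawn):
    """buffers: list of (size_bytes, frequency); frequency is "static",
    "per_frame" or "per_object". Static buffers are uploaded once at load."""
    total = 0
    for size, freq in buffers:
        if freq == "per_frame":
            total += size
        elif freq == "per_object":
            total += size * objects_drawn
        elif freq != "static":
            raise ValueError(f"unknown frequency: {freq}")
    return total


naive_port = [(4096, "per_object")]          # one big buffer, 256 float4 constants
d3d10_style = [(3456, "static"),             # materials, lookup tables, ...
               (512, "per_frame"),           # camera, lights, time
               (128, "per_object")]          # world matrix and a few extras
```

For 500 objects the single buffer moves 2,048,000 bytes per frame versus 64,512 for the split layout, although both hold the same 4,096 bytes of constants.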

Edison · posted 2007-6-29 19:40

List of restrictions on enabling Hyper-Z:

NewCastle · posted 2007-6-29 20:09

In order to utilize the power of the scalar architecture it is important that the code is parallel

This sentence could make people misunderstand what a "scalar architecture" is.

Edison · posted 2007-6-29 20:14

It's just 5-way superscalar.

HeavenPR · posted 2007-6-29 20:22

Apart from the first one, the rest are also all upside and no downside on G80. Of course, the first one doesn't hurt G80 either.

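killpmp · posted 2007-6-29 20:25

Originally posted by HeavenPR on 2007-6-29 20:22
Apart from the first one, the rest are also all upside and no downside on G80. Of course, the first one doesn't hurt G80 either.

It's just the difference between a lifeline and icing on the cake.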

iiiiuuuu · posted 2007-6-29 21:23

These are all general-purpose programming optimization tips; nothing here is specific to the Radeon 2000 series.

lqf3dnow · posted 2007-6-29 21:32

Why doesn't it mention the ring bus of their own architecture? That should be a big advantage for future products.

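Edison · posted 2007-6-29 21:39

Originally posted by lqf3dnow on 2007-6-29 21:32
Why doesn't it mention the ring bus of their own architecture? That should be a big advantage for future products.

In terms of performance, the ring bus has no advantage over a crossbar (XBAR).

In short, the ring bus's programmability will not bring a leap in performance in the future.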

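lqf3dnow · posted 2007-6-29 21:44

Originally posted by Edison on 2007-6-29 21:39

In terms of performance, the ring bus has no advantage over a crossbar (XBAR).

In short, the ring bus's programmability will not bring a leap in performance in the future.

I only meant with respect to their own architecture's products.
Surely ATI cards won't abandon the ring bus and go back to a crossbar (Xbar) in the future? That can't be.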

RacingPHT · posted 2007-6-30 03:32

Note: the author has been banned or the content deleted; this post is automatically hidden.

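madcat2100 · posted 2007-6-30 10:07

Originally posted by lqf3dnow on 2007-6-29 21:44

I only meant with respect to their own architecture's products.
Surely ATI cards won't abandon the ring bus and go back to a crossbar (Xbar) in the future? That can't be.

Maybe the ring bus is actually the bottleneck?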

来不及思考 · posted 2007-6-30 12:53

Note: the author has been banned or the content deleted; this post is automatically hidden.

RacingPHT · posted 2007-6-30 13:23

Note: the author has been banned or the content deleted; this post is automatically hidden.

天下18 · posted 2007-6-30 17:57

Note: the author has been banned or the content deleted; this post is automatically hidden.

89度热水 · posted 2007-6-30 18:14

Intel's Gesher/Larrabee also uses a ring bus.

susu2933 · posted 2009-5-6 19:54

I can't read English.

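ic.expert · posted 2009-5-7 20:35

The HZB in AMD's R600 is too weak: it is not on-chip, its configuration cannot be changed dynamically, and it stores only a single Z value... Even S3G's GPUs do better than this.

If D3D placed the alpha test after the depth test in the graphics pipeline, that would be great; at the very least the HZB would work much better.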
