|
:charles: Top 10 optimization hints
Parallelize your code
In order to utilize the power of the scalar architecture it is important that the code is parallel. Avoid unnecessarily serializing your instructions and use parentheses to introduce explicit parallelism where possible. See the Parallelize your code section in this document.
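As a minimal sketch of what the parentheses buy you, here is the same four-term sum written two ways in C++ rather than shader code. The function names are hypothetical, and the sketch assumes a compiler that keeps floating-point evaluation order as written. The serial form is a dependency chain of three additions; the grouped form exposes two independent additions followed by one.

```cpp
// Serial form: each addition waits for the previous result (chain of 3).
float sum_serial(float a, float b, float c, float d) {
    return a + b + c + d;          // evaluated as ((a + b) + c) + d
}

// Grouped form: (a + b) and (c + d) do not depend on each other (chain of 2).
float sum_grouped(float a, float b, float c, float d) {
    return (a + b) + (c + d);
}
```

Both forms compute the same sum up to floating-point rounding; only the dependency structure differs.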
Optimize all shader stages
On a unified architecture it is not just the dominant shader that dictates your final performance; all shader stages consume resources from the shared pool of computation power. Optimizing the most heavily loaded shader naturally gives the largest gain, but improvements to the less loaded shaders still improve performance. Unlike on earlier hardware generations, a pixel-shader-limited case may therefore see performance improvements from optimizing the vertex shader. See the unified architecture section.
Make proper use of Z optimizations
Render your scene in rough front-to-back order or use a Pre-Z pass. Draw your skybox last. Draw the main character's gun, opaque GUI elements, and other front-most objects first. Avoid shader depth output. See the Depth & Stencil efficiency section.
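The ordering advice can be sketched on the CPU side as a sort over the opaque draw list. This is a hedged illustration, not code from this document: `DrawItem`, its fields, and `sortFrontToBack` are hypothetical names, and `viewDepth` is assumed to be each object's distance from the camera.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw record; viewDepth is the distance from the camera.
struct DrawItem {
    int   id;
    float viewDepth;
    bool  isSkybox;
};

// Orders items roughly front to back, with any skybox placed at the end.
void sortFrontToBack(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  if (a.isSkybox != b.isSkybox) return !a.isSkybox; // skybox last
                  return a.viewDepth < b.viewDepth;                  // nearest first
              });
}
```

A rough order is enough here, so sorting per object (rather than per triangle) is usually the natural granularity.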
Use vertex texture fetch
Getting data into your vertex and geometry shaders is not only a question of memory bandwidth; the fetch instructions themselves may also be a limiting factor. By using vertex texture fetch you could potentially double your input rate by utilizing two separate fetching mechanisms. Splitting the data roughly equally between the vertex buffer and a texture often improves performance noticeably. See the Vertex texture fetch section.
Use culling in the geometry shader
The geometry shader is typically limited by its output. If it can be quickly determined that a triangle is outside the frustum or that it is back-facing, you can usually achieve a significant performance improvement by not writing it out. For instance, in a render-to-cubemap case most triangles need to be written to only one face, which may cut down output by almost a factor of six. See the Use frustum and backface culling section for details and example code.
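The two tests mentioned above can be sketched in C++ as follows. This is an illustrative sketch rather than the example code the referenced section provides. The names `Vec4`, `outsideFrustum`, and `isBackFacing` are hypothetical, and the sketch assumes D3D-style clip space (-w <= x <= w, -w <= y <= w, 0 <= z <= w) and clockwise front faces. The back-face test additionally assumes all three vertices have w > 0.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// True if all three clip-space vertices are outside the same frustum plane,
// in which case no part of the triangle can be visible.
bool outsideFrustum(const Vec4 (&v)[3]) {
    int left = 0, right = 0, bottom = 0, top = 0, nearP = 0, farP = 0;
    for (const Vec4& p : v) {
        left   += p.x < -p.w;
        right  += p.x >  p.w;
        bottom += p.y < -p.w;
        top    += p.y >  p.w;
        nearP  += p.z <  0.0f;
        farP   += p.z >  p.w;
    }
    return left == 3 || right == 3 || bottom == 3 ||
           top == 3 || nearP == 3 || farP == 3;
}

// True if the projected triangle is counter-clockwise (or degenerate),
// which is back-facing under the assumed clockwise-front convention.
bool isBackFacing(const Vec4 (&v)[3]) {
    assert(v[0].w > 0.0f && v[1].w > 0.0f && v[2].w > 0.0f);
    float x0 = v[0].x / v[0].w, y0 = v[0].y / v[0].w;
    float x1 = v[1].x / v[1].w, y1 = v[1].y / v[1].w;
    float x2 = v[2].x / v[2].w, y2 = v[2].y / v[2].w;
    float area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
    return area2 >= 0.0f;
}
```

In a geometry shader, the equivalent checks would run before any vertices are emitted, so a rejected triangle produces no output at all.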
Minimize geometry shader I/O
The geometry shader is typically limited by output. Input may also matter in many cases. By keeping the input and output data small you can see significant performance improvement. Packing data or trading GS output for a few instructions in the pixel shader is typically beneficial. See the Keep data small section.
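One way to picture the packing trade-off is a color stored as a single 32-bit value instead of four floats, unpacked with a few instructions at the consumer. The C++ sketch below is an assumption-laden illustration, not a technique prescribed by this document; `packColor` and `unpackChannel` are hypothetical names, and 8 bits per channel is assumed to be enough precision.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantizes a value to 8 bits after clamping it to [0, 1].
static std::uint32_t toByte(float c) {
    c = std::min(std::max(c, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(std::lround(c * 255.0f));
}

// Packs four [0, 1] channels into one 32-bit value (16 bytes become 4).
std::uint32_t packColor(float r, float g, float b, float a) {
    return toByte(r) | (toByte(g) << 8) | (toByte(b) << 16) | (toByte(a) << 24);
}

// Recovers one channel (0 = r, 1 = g, 2 = b, 3 = a) as a float in [0, 1].
float unpackChannel(std::uint32_t packed, int channel) {
    return static_cast<float>((packed >> (8 * channel)) & 0xFFu) / 255.0f;
}
```

The unpacking cost is a shift, a mask, and a multiply per channel, which is the kind of small pixel shader overhead the hint suggests trading for reduced GS output.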
Use instancing
While D3D10 has improved things, the number of draw calls can still be a significant limitation to performance. It is therefore a good idea to design your application around instancing. D3D10 makes instancing better than ever with an improved interface and tools like the SV_InstanceID system value. See the Instancing section.
Use the right data types
Don’t use vectors when a scalar is enough. Don’t compute alpha if you only care about RGB. Avoid excessive type conversions. See the Use the right data type, Avoid mixing types, Scalar ALUs, and Don’t return float4 if not necessary sections.
Use dynamic branching
Dynamic branching can be used to avoid doing unnecessary work, such as computing lighting for parts of a scene that are in shadow. Good use of dynamic branching can provide a significant performance increase. See the Dynamic branching section.
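The shadow case can be sketched as an early-out, shown here in C++ rather than shader code. The function `shadePixel`, its parameters, and the specular exponent of 32 are hypothetical choices for illustration. On a GPU the branch is generally only expected to pay off when neighboring pixels mostly take the same path.

```cpp
#include <algorithm>
#include <cmath>

// shadow: 0 = fully shadowed, 1 = fully lit. Returns a scalar intensity.
float shadePixel(float shadow, float nDotL, float nDotH, float ambient) {
    if (shadow <= 0.0f) {
        return ambient;                 // fully shadowed: skip the lighting math
    }
    float diffuse  = std::max(nDotL, 0.0f);
    float specular = std::pow(std::max(nDotH, 0.0f), 32.0f);
    return ambient + shadow * (diffuse + specular);
}
```

The more expensive the skipped lighting path, the more the branch is worth; for a cheap lighting model the branch overhead may outweigh the savings.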
Use constant buffers in D3D10 style
When porting a game or application from D3D9, it is important not to simply translate D3D9 calls directly into equivalent D3D10 calls. If you are uploading as many constants as in D3D9, you have not really gained anything. Try to keep as many constants as possible resident in video memory and update only the truly dynamic ones. See the Constant buffers section. |
|