POPPUR爱换

Title: AnandTech's Ryan Smith tells the story behind multi-threaded rendering in NVIDIA's 265/270 drivers

Author: Edison    Time: 2011-4-9 23:57
After talking things over with NVIDIA, they've agreed to allow me to discuss the precise changes they made to boost their Civ V performance by so much. So gather around children, crazy uncle Ryan has a story to tell.

---

In our description of Civ V, I've mentioned that it uses "a slew of DirectX 11 technologies" but I've never gone into great detail on what those are. I'm not going to go into deep detail on that now - there's a good article over at PC Games Hardware that contains an interview with Firaxis about that - but I will quickly explain the ins and outs.

Often from a gamer standpoint it's natural to look at the immediate visual benefits of a new API. With DX11, the big feature is tessellation, with a secondary feature of contact hardening shadows. However there's also a great deal of stuff going on in the backend for developers to make things faster - making things faster allows developers to use new graphical effects that may not have been practical before. So for DX11, on top of tessellation and contact hardening shadows, there are also things like multi-threaded rendering, compute shaders, support for larger textures, and the implementation of a pull model for certain attribute evaluation.

So why do I like Civ V? Because the LORE engine it's based on implements so many of these features. Sure, something like AvP will have tessellation added, or Bad Company 2 will implement contact hardening shadows, but most of the DX11 games today are adding one or two graphical features that improve the look of the game, but only begin to scratch the surface of the API. LORE goes much, much deeper. Firaxis uses multi-threaded rendering, they use compute shaders for texture compression, and they use tessellation. Today it's probably the most extensive AAA DX11 game that has been released so far. This makes it a great GPU benchmark, as it's a real game we can use to test features other games don't touch.

So what then is going on that made Civ V so much faster for NVIDIA? Admittedly I had to press NVIDIA for this - performance practically doubled on high-end GPUs, which is unheard of. Until they told me what exactly they did, I wasn't sure whether it was real or whether they had come up with a really sweet cheat. It definitely wasn't a cheat.

If you recall from our articles, I keep pointing to how we seem to be CPU limited at the time. Now if you go back to the list of DX11 features Civ V uses, a light bulb should light up: multi-threaded rendering. Civ V uses multi-threaded rendering; in fact it uses it quite extensively. Now why do we have multi-threaded rendering in the first place? Half of this is to better mesh with multi-threaded games by enabling additional threads to directly contribute without having to go through a master thread first. But a second purpose is that multi-threaded rendering helps the GPU just as much as it helps the CPU.



Traditionally, rendering is a very serial process. The program needs to set up a bunch of objects and then pass that on to the video drivers and finally to the GPU. There's a high degree of submission overhead, meaning it's possible to choke the CPU while submitting a large number of objects to the GPU. In DirectX 11, multi-threaded rendering is achieved by turning the D3D pipeline into a 3-step process: the Device, the Immediate Context, and the Deferred Context. The important bit here is that the deferred context is full of things that have yet to be sent to the GPU, and that you can have a deferred context for each thread. When developers talk about multi-threaded rendering with DX11, this is what they're referring to. When you use DX11's multi-threaded rendering capabilities correctly, you can have several threads assemble their deferred contexts, and then combine them into a single command list once it comes time to render the scene.
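
As a rough illustration of the deferred-context idea described above, here is a minimal Python simulation. It is not the D3D11 API: the class and method names (`DeferredContext`, `ImmediateContext`, `finish_command_list`, `execute_command_list`, `render_frame`) are hypothetical stand-ins loosely modelled on the real calls, and the "commands" are just tuples. Each worker thread records into its own context, and a single thread then submits the recorded lists in a fixed order.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class CommandList:
    """A finished batch of recorded commands (stand-in for a command list)."""
    commands: tuple


@dataclass
class DeferredContext:
    """Records commands without submitting them; one instance per worker thread."""
    _recorded: list = field(default_factory=list)

    def draw(self, object_name):
        self._recorded.append(("draw", object_name))

    def finish_command_list(self):
        command_list = CommandList(tuple(self._recorded))
        self._recorded = []
        return command_list


class ImmediateContext:
    """The only context that actually submits work; used from one thread."""

    def __init__(self):
        self.submitted = []

    def execute_command_list(self, command_list):
        self.submitted.extend(command_list.commands)


def render_frame(scene_chunks):
    """Record each chunk on its own thread, then submit the lists in chunk order."""
    command_lists = [None] * len(scene_chunks)

    def record(index, chunk):
        context = DeferredContext()  # private to this thread, so no locking
        for object_name in chunk:
            context.draw(object_name)
        command_lists[index] = context.finish_command_list()

    workers = [threading.Thread(target=record, args=(i, chunk))
               for i, chunk in enumerate(scene_chunks)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    immediate = ImmediateContext()
    for command_list in command_lists:  # serial submission, deterministic order
        immediate.execute_command_list(command_list)
    return immediate.submitted
```

The point of the sketch is the shape of the work, not speed: recording is spread across threads while submission stays on one thread. Python threads will not show a real speedup here, which is a limitation of the illustration rather than of the technique.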

So Civ V uses proper multi-threaded rendering, that's great! So why isn't this the end of the story?

It turns out that you don't actually need to support all these nifty multi-threading features to be DX11 (or rather D3D11) compliant - those features are optional - and that's what happened. And this is what changed my perspective on DX11, as before now I had never realized that anything in the API/spec was optional. Previously we had all the pieces to understand what was going on, but without knowing that AMD and NVIDIA did not fully support multi-threaded rendering, it was never clear what the bottleneck was.

But let's be clear here: multi-threaded rendering is a massive undertaking on the driver and hardware side. You're doing the GPU equivalent of inventing the multi-tasking operating system. NVIDIA and AMD had not supported multi-threaded rendering in their drivers until this point, as they needed time to implement the feature correctly. If you have the DX SDK installed, this is visible in the DX Caps Viewer, in the D3D11 section under the title "Driver Command Lists".
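
For readers wondering what an application does with that capability bit, here is a small hedged sketch. In real D3D11 the flags are reported by `ID3D11Device::CheckFeatureSupport` with `D3D11_FEATURE_THREADING`, which fills a `D3D11_FEATURE_DATA_THREADING` structure; when the driver does not support command lists, the D3D11 runtime emulates deferred contexts in software, so the same application code still runs. The Python below only mirrors that decision. `ThreadingCaps` and `command_list_path` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadingCaps:
    """Hypothetical stand-in for the two flags in D3D11_FEATURE_DATA_THREADING."""
    driver_concurrent_creates: bool
    driver_command_lists: bool


def command_list_path(caps):
    """Say who builds command lists for deferred contexts given the caps."""
    if caps.driver_command_lists:
        return "driver"
    # Deferred contexts still work, but the runtime emulates them in software.
    return "runtime emulation"
```

This is consistent with the Firaxis quote further down: the game keeps its threaded code path enabled either way, and the gain from driver-level command lists appears once the driver reports support.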



So in a nutshell, 4 months ago Civ V supported multi-threaded rendering. AMD and NVIDIA did not.

Quote:
Originally Posted by Firaxis @ PC Games Hardware

Civilization V, as far as we know, is the first fully threaded DX11 game.

Unfortunately, because no other games have used this feature yet, neither Nvidia nor AMD have publicly released threaded drivers, so users may not experience all the benefits just yet. We decided to keep threading enabled for Civilization V, however, because we are continuing to work closely with Nvidia and AMD on their support for multi-threading. We expect publicly available threaded drivers shortly.

The internal architecture of the Civilization V graphics engine, however, is heavily multi-threaded and users will see multi-processor benefits even with drivers that are not threaded (including DX9). We have developed a series of configurable benchmark modes that we use internally for measuring our threading ability. These are fully described in the readme file. After some discussion, we decided to expose these internal tests on the released version so, if the users view the readme file, they will see that there are detailed instructions of these benchmark modes.

                                       

Can you guess then what changed?

With the Release 265 series drivers, NVIDIA enabled partial support for DX11's multi-threaded rendering features. At the time this support was limited to just Civ V. It was beyond the experimental stage, but restricting it to a single known program allowed NVIDIA to deploy it while they collected feedback and finished the other aspects of multi-threaded rendering.

With NVIDIA's drivers now allowing Civ V to use multiple deferred contexts, Civ V's performance shot way up. With high-end GPUs, performance damn near doubled at lower resolutions. Civ V was in fact CPU limited - it was CPU limited because it was only able to use a single thread to assemble its contexts, and that thread was maxing out the single CPU core it could use. This is why drivers played such a big part in Civ V's performance: how drivers handled D3D11 contexts was the key to unlocking it.
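
To see why the gain shows up at lower resolutions and on high-end GPUs, a toy model helps. The sketch below assumes CPU submission and GPU work overlap, so the frame time is set by the slower side, and it assumes submission scales perfectly with the number of recording threads. Both are simplifying assumptions, and all numbers and function names are made up for illustration; none of them are measurements from the article.

```python
def frame_time_ms(cpu_submit_ms, gpu_ms, recording_threads):
    """Frame time when CPU submission and GPU work overlap: the slower side wins."""
    if recording_threads < 1:
        raise ValueError("recording_threads must be at least 1")
    cpu_ms = cpu_submit_ms / recording_threads  # assumes perfect scaling
    return max(cpu_ms, gpu_ms)


def speedup(cpu_submit_ms, gpu_ms, threads_before, threads_after):
    """Ratio of frame times before and after adding recording threads."""
    before = frame_time_ms(cpu_submit_ms, gpu_ms, threads_before)
    after = frame_time_ms(cpu_submit_ms, gpu_ms, threads_after)
    return before / after
```

With 20 ms of submission work and 10 ms of GPU work, `speedup(20.0, 10.0, 1, 2)` gives 2.0: the frame was CPU limited and a second recording thread halves it. With 30 ms of GPU work instead, `speedup(20.0, 30.0, 1, 4)` gives 1.0, because the GPU was already the bottleneck.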

At this point in time we appear to be GPU limited, but we may also be CPU limited. Firaxis says Civ V can scale to 12 threads; this would be a hex-core CPU with hyperthreading. Our testbed is only a quad-core CPU with HT, meaning we probably aren't maxing out Civ V on the CPU side. And even with HT, it's likely that 12 real cores would improve on performance relative to 6 cores + HT. Firaxis believes they're GPU limited, but it's hard to definitively tell which it is.


Image from Firaxis GDC11 presentation

In any case, full support for multi-threaded rendering was finally enabled in NVIDIA's Release 270 drivers, which were released last week. At this point any game or application can take advantage of the feature, and not just Civ V. This is also why NVIDIA has finally allowed me to write about what they've previously told me, as they no longer consider it a secret. Finally, on a side note, the fact that Civ V had this feature enabled in NVIDIA's drivers early is why performance does not appear to have changed between Release 265 and Release 270.

Anyhow, as far as I know, AMD does not currently offer full support for multi-threaded rendering (I don't have an AMD card plugged in right now to run the DX Caps Viewer against). I'm not sure where they are on it, though I doubt they're very far behind.

So in conclusion, the reason NVIDIA beats AMD in Civ V is that NVIDIA currently offers full support for multi-threaded rendering/deferred contexts/command lists, while AMD does not. Civ V uses a massive number of objects and complex terrain, and because the game is capable of multi-threaded rendering, the introduction of multi-threaded rendering support in NVIDIA's drivers means that NVIDIA's GPUs can now rip through the game.

This is the true power of DX11. When properly implemented in both drivers and games, DX11's multi-threaded rendering capabilities are going to allow developers to push a lot more stuff out to the GPU without immediately bottlenecking the CPU.

On a future note, while Civ V is the first game to use DX11 multi-threaded rendering, it is not going to be the last. Battlefield 3 will most likely use it - DICE was lamenting the lack of driver support last month at GDC. The Capcom team responsible for Lost Planet 2 also mentioned how they would have liked to have this feature working before LP2, though I can't find the article at this time.

Coincidentally, last month's interview with AMD's Richard Huddy at Bit-Tech also has a lot in common with this. AMD says DX11 multi-threaded rendering can double object/draw-call throughput, and they want to go well beyond that by bypassing the DX11 API.

Further Reading: AnandTech, Revealing The Power of DirectX 11
Author: 路西法大大    Time: 2011-4-9 23:59
Is this multi-threading technique only effective with two or more graphics cards?
Author: Edison    Time: 2011-4-10 00:01
In short:

1. NVIDIA introduced DX11 multi-threaded rendering in the 265 drivers, but only for Civ V; the 270 drivers are the first public drivers to offer DX11 multi-threaded rendering across the board.

2. According to AMD's Richard Huddy, DX11 multi-threaded rendering can double object/draw-call throughput.

3. The upcoming Battlefield 3 may also use DX11 multi-threaded rendering.
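Author: Edison    Time: 2011-4-10 00:04
路西法大大 posted on 2011-4-9 23:59
Is this multi-threading technique only effective with two or more graphics cards?

DX11 multi-threaded rendering mainly targets cases where rendering involves a large number of draw calls. In other words, when the bottleneck is on the CPU, MR has a clear effect, but if the game is bottlenecked on the GPU it does not help much.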

Author: 路西法大大    Time: 2011-4-10 00:23
So that means it lets games make better use of multiple cores? Good news for low-clocked multi-core CPUs?
Author: ft5555    Time: 2011-4-10 00:29
Isn't there a precondition... the game has to be based on DX11...

In that case it's no use for most of today's mainstream games...
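Author: rickerlian    Time: 2011-4-10 02:20
I understand DX11's multi-threaded rendering as "soft" multi-threaded rendering. A deferred context is like a draw call queue, but it has one advantage: it can merge (package) multiple draw calls into a single call, which reduces the losses caused by driver work and system/video memory operations interleaved between draw calls. Once merged, the package is inserted directly into the immediate context and invoked there.

DX11 (through the API) introduces this new programming model, and it needs cooperation from the driver. However, I think this feature is not tied to specific hardware and is not a hardware feature, because a deferred context ultimately still has to be executed through the immediate context. All commands are ultimately still sent to the driver through the main rendering thread (only some are already packaged (deferred context) and some are not (immediate context)), which is the same as the way commands were sent before.

I'm not sure whether my understanding is correct, so corrections are welcome.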
Author: lacri    Time: 2011-4-10 08:05
I wonder whether this technique is of any use for DX10 cards? And for non-DX11 games?
Author: 河蟹万岁    Time: 2011-4-10 11:09
Good thing I learned to read English.
Author: godlike    Time: 2011-4-10 18:40
Does this work for DX9 games? For example, the current version of C2?



