AMD Radeon HD 7970 with model, Eric Demers
Eric Demers, Corporate Vice President and CTO of AMD's Graphics Division, sat down with Rage3D's James Prior at the recent AMD Southern Islands tech day, held in Austin, Texas. At the AMD Lonestar campus, demonstrations and presentations of the Tahiti product were shown, and we wanted some more information on a couple of items. Another gentleman was in attendance, one of the main architects for the new design, but our scrawled note of his name proved indecipherable. His first name is Tom; we'll fill in the details as soon as we get them from AMD PR.
James Prior, Rage3D: I'd like to get a little more information on what you did on the front end, to improve set up from Cayman to this new architecture, Tahiti. One of the criticisms leveled, fairly or unfairly, is that the triangle setup rate isn't quite where it needs to be - it's a bottleneck. What do you think about that?
Eric Demers: The truth is we have improved the efficiency. We didn't improve the peak primitive and vertex rate - this part matches Cayman - but we've improved the efficiency: things like increasing the buffer that we use to store the results of vertex processing while we're rasterizing the pixels, so we'll have vertices ready. A lot of that has significantly improved the efficiency when running close to our peak speeds.
Eric: Now, the reality is I can't think of an application that comes close to our peak speed today, but they might have been hitting some of our efficiency limits, so those will be significantly improved in some cases with Tahiti. From that standpoint I think we're OK. Now, realistically, could we do better? Y'know, I just don't see, at least in a lot of cases, pure vertex rates being that much of a bottleneck. Where I've seen more of a bottleneck is the tessellation rate, and that's artificially inflated by how people are using tessellation. I'm all for people doing it, like we showed in Battlefield 3 and in DiRT 3 - that's really cool to me. Our own demo on partially resident textures - it looks awesome to me to do that. But when you're doing something like Crysis 2, where you just add triangles and do nothing with them, that's just silly.
Eric: I think it's artificially pointing a finger just because our competition has done things a different way. But we've done things differently too, for example the [tessellation] performance differences from our top end to our bottom end are very similar, so you can use the same geometry in both those engines and just scale your fillrate. I actually think that's a better solution in general - it allows an ISV to develop one set of geometry and use that on multiple boards.
Eric: I think a lot of these bottlenecks are artificial, and in some part caused by how our friends have gone and implemented code in other people's stuff. Having said that, with this architecture - as you saw on the tessellation slide, but it's also true for geometry, shading and a lot of things that are vertex bound - you'll see a significant increase in performance. You'll see that translate into much higher performance, at least in high-tessellation games that need that type of front-end performance. You'll see Tahiti do a much better job there; like I said, though I think some of those limits might be somewhat artificial, this architecture will be much better there.
Tahiti chip. Eric Demers licked it. :(
R3D: Now, the different caches - you've got 32 compute units feeding into 12 L2s. Does that have the potential for a bottleneck, where you're going to have a lot of misses because everything is trying to hit L2 at the same time, or is that just not a realistic workload?
Eric: Well sure, we'll get misses on the L2 and that'll cause bottlenecks, there's no way around that. But we have 50% more bandwidth and 50% more L2s this time around than before. Fundamentally we do believe that even though the CU count increased by more than 40%, we actually increased our bandwidth by 50%, so we should, if anything, be better off than we were before. Also, the L1s are twice the size now, so from that standpoint the number of misses they'll have is going to be reduced as well. Overall, when we do miss, the memory bandwidth is now 50% higher as well, so in general things should balance out to be better than before. I can't imagine any case where we'd be any worse. I actually can't imagine a case where we'd merely match; I think we're always going to be ahead, and well ahead, of where we were.

Eric: The only exception would be if you were doing a lot of read/write, but then if you're doing a lot of read/write we're going to blow our previous generation away, because we didn't cache any of that - we always went through memory - and now we're caching read/writes all the time. I actually think we'll have an integer-multiple performance improvement for any kind of read/write activity. In general I think this new cache architecture is just going to be much better.
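As a rough sanity check of the "50% more bandwidth" figure Eric cites, the sketch below uses the publicly listed reference specs - a 256-bit bus on the HD 6970 (Cayman) and a 384-bit bus on the HD 7970 (Tahiti), both at 5.5 Gbps effective GDDR5; these numbers are our assumptions, not quoted from the interview:

```python
# Peak GDDR5 bandwidth = (bus width in bytes) x (effective data rate).
# Specs assumed from AMD's reference boards, not stated in the interview.
def mem_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return (bus_width_bits / 8) * data_rate_gbps

cayman = mem_bandwidth_gbs(256, 5.5)   # 176.0 GB/s (HD 6970)
tahiti = mem_bandwidth_gbs(384, 5.5)   # 264.0 GB/s (HD 7970)
print(tahiti / cayman)                 # 1.5 -> the 50% uplift
```

With the same memory clock, the whole 50% comes from the wider bus, which is consistent with the "50% more L2s" point, since L2 partitions are tied to memory channels.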
Eric: For graphics, having 50-100% more cache and more bandwidth is going to help there, too. It's going to help compute even more.
R3D: The reorganization of the compute units - no more VLIW5 or VLIW4 - seems like it would be more efficient for some of the older graphics workloads like DX9 games. Are we going to see an improvement in performance beyond the regular uplift from clocks and process?
Eric: The fillrates have gone up, up to 50% [more] as well, the texture rate has gone up from 96 to 128, and then clock rates are higher, so you are going to get performance right off the bat - right away about 20-30% over our previous 6970, just as kind of minimum numbers. If you had a DX9 application that was really shader limited then yes, we would expect it to run faster. Trying to think of examples - the 3DMark ones should scale a little bit more, but in general what we've seen in DX9 apps is that the shader use is still pretty simple. I think they're getting pretty good performance uplifts over 6970, anywhere from 1.2-1.5x and maybe even 2x in some cases. I think they'll run that gamut - you guys will have to run the tests. Certainly, if you're CPU limited then you may not get any uplift because you're not 'feeding the beast'. Generally if you're running low resolution - 1280 or even 1600 - the benefits are much smaller than if you're running 2560 or Eyefinity. If you're running Eyefinity this thing kicks butt. If you're running a single 1920 monitor you're going to have to throw in AA to really start pushing the machine heavily. Actually, Cayman is pretty good at 1920, so this guy starts stretching its legs above that kind of level.
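The resolution point is just pixel-count arithmetic: the more pixels per frame, the more the GPU (rather than the CPU) becomes the limiter. A minimal sketch, using typical resolutions of the era as illustrative examples (not figures quoted in the interview):

```python
# Pixels per frame at common single- and multi-monitor resolutions.
# Resolutions are illustrative assumptions, not from the interview.
def pixels(width: int, height: int, panels: int = 1) -> int:
    """Total pixels the GPU must shade per frame."""
    return width * height * panels

single_1920 = pixels(1920, 1080)      # 2,073,600
single_2560 = pixels(2560, 1600)      # 4,096,000 -> ~2x the per-frame work
eyefinity   = pixels(1920, 1080, 3)   # 6,220,800 -> 3x, where Tahiti shines
```

At triple the per-frame shading work of a single 1920x1080 panel, an Eyefinity setup keeps a card like Tahiti busy even when a single monitor would leave it CPU bound.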
Tom: You start getting CPU bound among other things.
Eric: You can get CPU bound. It's also a new architecture for us, so there are other bottlenecks we're going to address over the coming months - it'll keep on getting better. The heuristics it's using are fundamentally different from our previous architecture; there are easier parts and there are parts that we're still working on. There's a lot - in fact, we've got some games that have gone up five-fold in performance since we started, from August to now, through driver changes to take advantage of the architecture. Probably another 2-3 months will really allow us to showcase the card. In one of the presentations you saw the performance changes, typically 15-40% - in this generation you'll see bigger numbers.
R3D: We were pleased to see you've put together a team for investigating anisotropic filtering (AF), and have learned from that, with some great new improvements there that I'm really looking forward to picking apart.
Eric: [laughing] I'm so happy to hear that ...
Eric: [laughing] More of a challenge!
Eric: Y'know, we didn't even put all the changes in [the presentation] - we put in a lot of stuff, but I'll be honest with you, we never even saw the problems on Cayman until people started looking at it. Then we went back, and we had about three or four guys that worked on this pretty consistently - a whole bunch of guys, and the whole texture team. We found some problems and we fixed those, and we found other things that we're going to fix over time as well. The one promise we can make is 'better'. For sure, better. I don't know if you guys could see it from there?
R3D: Yes, I could.
Eric: That is the only texture in the whole demo [3DCenter.org grass texture 2] that shows it; if you go to the checkerboard they all look perfect. And the checkerboard is pretty high frequency.
Anisotropic Filtering Improvements
R3D: The checkerboard - when I was looking at it, I had to get right up on the screen, and at that point I wondered if it was actually the GPU or the display processing chip inside the monitor.
Tom: [laughing] Yes.
Eric: I like the split screen because if you're getting interpolation on both sides you know it's something else, right? It really was this high-frequency ground texture that was showing it, and even then it's very subtle. It was hard to come up with a test case to show it. There were some of our guys that would look at it and just see it - 'oh yeah! There it is!' - and sure, if that's your job then you may spot it faster. It was funny - Matt (Skynner) came in this morning, looked at all of them, and said 'I can't see any of them'. I pointed at it with the laser and he said 'man, that's way subtle, they won't see that beyond the front row'.
R3D: If you know what you're looking for you'll find it pretty easily.
Eric: Yah, and magnification on BC4 and BC5 - there are a couple of other things that we improved, and you should be able to see those. I think Cayman was rocking - we still have the best rotational invariance - and now this one [Tahiti] just makes it a little bit better.
GDDR5 Memory
R3D: The grey screen of death problem, which appeared to be related to memory timing and link retraining - there was an investigation, a driver fix, and new BIOSes from certain partners. Was there anything you learned there, for the chip design, that went into the new memory controller?
Tom: Memory clock switching - training GDDR5 is tricky. It's done per lane, and the real time to do that training is challenging. We've made improvements to cut that time down, which should improve our memory state training.
Eric: I know we had some p-state issues, which I think were fixed with our FIs; our sequencers for the memory are programmable, with firmware associated with them, and we made some changes there. This is a whole new design, [laughing] so it's a whole new set of problems, is one way to look at it! Fundamentally, to Tom's point, we've made it a lot faster and a lot more stable, and it's being leveraged in other parts of AMD's products. I don't expect the same problems to occur again. I don't remember hearing 'Grey Screen of Death' - catchy! It's a completely different design, so I assume it wouldn't apply to these products.
R3D: You mentioned in your presentation -
Eric: I lied! :D
R3D: - that ECC support will be available on compute products, does that mean it's only going to be turned on for the FirePro/FireStream products?
Eric: At this point, yes. We thought about whether it could be used to improve yield on consumer products and things like that, and we may decide to do that kind of thing - well, we reserve the right to do anything we want, I guess! Right now those kinds of features would actually hurt performance for consumers, because they do take away from memory storage (maybe not the internal, but the external DRAM), and it would certainly make the drivers more complex. The FirePro driver team are doing that because some of their customers desire it. I wouldn't say it...
http://www.rage3d.com/interviews ... _2011/index.php?p=1