AMD Radeon HD 7970 with model, Eric Demers
Eric Demers, Corporate Vice President and CTO of AMD's Graphics Division, sat down with Rage3D's James Prior at the recent AMD Southern Islands tech day, held in Austin, Texas. At the AMD Lonestar campus, demonstrations and presentations of the Tahiti product were shown, and we wanted some more information on a couple of items. Another gentleman was in attendance, one of the main architects for the new design, but our scrawled note of his name proved indecipherable. His first name is Tom; we'll fill in the details as soon as we get them from AMD PR.
James Prior, Rage3D: I'd like to get a little more information on what you did on the front end, to improve set up from Cayman to this new architecture, Tahiti. One of the criticisms leveled, fairly or unfairly, is that the triangle setup rate isn't quite where it needs to be - it's a bottleneck. What do you think about that?
Eric Demers: The truth is we have improved the efficiency. We didn't improve the peak primitive and vertex rate - this part matches Cayman - but we've improved the efficiency: things like increasing the buffer that we use to store the results of vertex processing while we're rasterizing the pixels, so we'll have vertices ready. A lot of that has significantly improved the efficiency when running close to our peak speeds.
Eric: Now, the reality is I can't think of an application that comes close to our peak speed today, but they might have been hitting some of our efficiency limits, so those will be significantly improved in some cases with Tahiti. From that standpoint I think we're OK. Now, realistically, could we do better? Y'know, I just don't see, at least in a lot of cases, pure vertex rates being that much of a bottleneck. Where I've seen more of a bottleneck is the tessellation rate, and that's artificially inflated by how people are using tessellation. I'm all for people doing it, like we showed in Battlefield 3 and in DiRT 3 - that's really cool to me. Our own demo on partially resident textures - it looks awesome to me to do that. But when you're doing something like Crysis 2, where you just add triangles and do nothing with them, that's just silly.
Eric: I think it's artificially pointing a finger just because our competition has done things a different way. But we've done things differently too, for example the [tessellation] performance differences from our top end to our bottom end are very similar, so you can use the same geometry in both those engines and just scale your fillrate. I actually think that's a better solution in general - it allows an ISV to develop one set of geometry and use that on multiple boards.
Eric: I think a lot of these bottlenecks are artificial, and in some part caused by how our friends have gone and implemented code in other people's stuff. Having said that, with this architecture - as you saw on the tessellation slide, but it's also true for geometry, shading and a lot of things that are vertex bound - you'll see a significant increase in performance. You'll see that translate into much higher performance, at least in high-tessellation games that need that type of front-end performance. You'll see Tahiti do a much better job there; like I said, though I think some of those limits might be somewhat artificial, this architecture will be much better there.
Tahiti chip. Eric Demers licked it. :(
R3D: Now, the different caches - you've got 32 compute units feeding into 12 L2s. Does that have the potential for a bottleneck, where you're going to have a lot of misses because everything is trying to hit L2 at the same time, or is that just not a realistic workload?
Eric: Well sure, we'll get misses on the L2 and that'll cause bottlenecks, there's no way around that. But we have 50% more bandwidth and 50% more L2s this time around than before. Fundamentally we do believe that even though the CU count increased by more than 40%, we actually increased our bandwidth by 50%, so we should, if anything, be better off than we were before. Also, the L1s are twice the size now, so from that standpoint the number of misses they'll have is going to be reduced as well. Overall, when we do miss, the memory bandwidth is now 50% higher as well, so in general things should balance out to be better than before. I can't imagine any case where we'd be any worse. I actually can't imagine a case where we'd merely match; I think we're always going to be ahead, and well ahead, of where we were.

Eric: The only exception would be if you were doing a lot of read/write, but then if you're doing a lot of read/write we're going to blow our previous generation away, because we didn't cache any of that - we always went through memory - and now we're caching read/writes all the time. I actually think we'll have an integer-multiple performance improvement for any kind of read/write activity. In general I think this new cache architecture is just going to be much better.
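As a rough sanity check of the "50% more bandwidth" figure Eric cites, the sketch below uses the publicly listed reference specs - a 256-bit bus on the HD 6970 (Cayman) and a 384-bit bus on the HD 7970 (Tahiti), both at 5.5 Gbps effective GDDR5; these numbers are our assumptions, not quoted from the interview:

```python
# Peak GDDR5 bandwidth = (bus width in bytes) x (effective data rate).
# Specs assumed from AMD's reference boards, not stated in the interview.
def mem_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return (bus_width_bits / 8) * data_rate_gbps

cayman = mem_bandwidth_gbs(256, 5.5)   # 176.0 GB/s (HD 6970)
tahiti = mem_bandwidth_gbs(384, 5.5)   # 264.0 GB/s (HD 7970)
print(tahiti / cayman)                 # 1.5 -> the 50% uplift
```

With the same memory clock, the whole 50% comes from the wider bus, which is consistent with the "50% more L2s" point, since L2 partitions are tied to memory channels.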
Eric: For graphics, having 50-100% more cache and more bandwidth is going to help there, too. It's going to help compute even more.
R3D: The reorganization of the compute units - no more VLIW5 or VLIW4 - seems like it would be more efficient for some of the older graphics workloads like DX9 games. Are we going to see an improvement in performance beyond the regular uplift from clocks and process?
Eric: The fillrates have gone up, up to 50% [more] as well, the texture rate has gone up from 96 to 128, and then clock rates are higher, so you are going to get performance right off the bat - right away about 20-30% over our previous 6970, just as kind of minimum numbers. If you had a DX9 application that was really shader limited then yes, we would expect it to run faster. Trying to think of examples - the 3DMark ones should scale a little bit more, but in general what we've seen in DX9 apps is that the shader use is still pretty simple. I think they're getting pretty good performance uplifts over 6970, anywhere from 1.2-1.5x and maybe even 2x in some cases. I think they'll run that gamut - you guys will have to run the tests. Certainly, if you're CPU limited then you may not get any uplift because you're not 'feeding the beast'. Generally if you're running low resolution - 1280 or even 1600 - the benefits are much smaller than if you're running 2560 or Eyefinity. If you're running Eyefinity this thing kicks butt. If you're running a single 1920 monitor you're going to have to throw in AA to really start pushing the machine heavily. Actually, Cayman is pretty good at 1920, so this guy starts stretching its legs above that kind of level.
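The resolution point is just pixel-count arithmetic: the more pixels per frame, the more the GPU (rather than the CPU) becomes the limiter. A minimal sketch, using typical resolutions of the era as illustrative examples (not figures quoted in the interview):

```python
# Pixels per frame at common single- and multi-monitor resolutions.
# Resolutions are illustrative assumptions, not from the interview.
def pixels(width: int, height: int, panels: int = 1) -> int:
    """Total pixels the GPU must shade per frame."""
    return width * height * panels

single_1920 = pixels(1920, 1080)      # 2,073,600
single_2560 = pixels(2560, 1600)      # 4,096,000 -> ~2x the per-frame work
eyefinity   = pixels(1920, 1080, 3)   # 6,220,800 -> 3x, where Tahiti shines
```

At triple the per-frame shading work of a single 1920x1080 panel, an Eyefinity setup keeps a card like Tahiti busy even when a single monitor would leave it CPU bound.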
Tom: You start getting CPU bound among other things.
Eric: You can get CPU bound. It's also a new architecture for us, so there are other bottlenecks we're going to address over the coming months - it'll keep on getting better. The heuristics it's using are fundamentally different from our previous architecture; there are easier parts and there are parts that we're still working on. There's a lot - in fact, we've got some games that have gone up five-fold in performance since we started, from August to now, through driver changes to take advantage of the architecture. Probably another 2-3 months will really allow us to showcase the card. In one of the presentations you saw the performance changes, typically 15-40% - in this generation you'll see bigger numbers.
R3D: We were pleased to see you've put together a team for investigating anisotropic filtering (AF), and have learned from that, with some great new improvements there that I'm really looking forward to picking apart.
Eric: [laughing] I'm so happy to hear that ...
Eric: [laughing] More of a challenge!
Eric: Y'know, we didn't even put all the changes in [the presentation] - we put in a lot of stuff, but I'll be honest with you, we never even saw the problems on Cayman until people started looking at it. Then we went back, and we had about three or four guys that worked on this pretty consistently - a whole bunch of guys, and the whole texture team. We found some problems and we fixed those, and we found other things that we're going to fix over time as well. The one promise we can make is 'better'. For sure, better. I don't know if you guys could see it from there?
R3D: Yes, I could.
Eric: That is the only texture in the whole demo [3DCenter.org grass texture 2] that shows it; if you go to the checkerboard they all look perfect. And the checkerboard is pretty high frequency.
Anisotropic Filtering Improvements
R3D: The checkerboard - when I was looking at it, I had to get right up on the screen, and at that point I wondered if it was actually the GPU or the display processing chip inside the monitor.
Tom: [laughing] Yes.
Eric: I like the split screen because if you're getting interpolation on both sides you know it's something else, right? It really was this high-frequency ground texture that was showing it, and even then it's very subtle. It was hard to come up with a test case to show it. There were some of our guys that would look at it and just see it - 'oh yeah! There it is!' - and sure, if that's your job then you may spot it faster. It was funny - Matt (Skynner) came in this morning, looked at all of them, and said 'I can't see any of them'. I pointed at it with the laser and he said 'man, that's way subtle, they won't see that beyond the front row'.
R3D: If you know what you're looking for you'll find it pretty easily.
Eric: Yah, and magnification on BC4 and BC5 - there are a couple of other things that we improved, and you should be able to see those. I think Cayman was rocking - we still have the best rotational invariance - and now this one [Tahiti] just makes it a little bit better.
GDDR5 Memory
R3D: The grey screen of death problem, which appeared to be related to memory timing and link retraining - there was an investigation, a driver fix, and new BIOSes from certain partners. Was there anything you learned there, for the chip design, that went into the new memory controller?
Tom: Memory clock switching - training GDDR5 is tricky. It's done per lane, and the real time to do that training is challenging. We've made improvements to cut that time down, which should improve our memory state training.
Eric: I know we had some p-state issues, which I think were fixed with our FIs; our sequencers for the memory are programmable, with firmware associated with them, and we made some changes there. This is a whole new design, [laughing] so it's a whole new set of problems, is one way to look at it! Fundamentally, to Tom's point, we've made it a lot faster and a lot more stable, and it's being leveraged in other parts of AMD's products. I don't expect the same problems to occur again. I don't remember hearing 'Grey Screen of Death' - catchy! It's a completely different design, so I assume it wouldn't apply to these products.
R3D: You mentioned in your presentation -
Eric: I lied! :D
R3D: - that ECC support will be available on compute products, does that mean it's only going to be turned on for the FirePro/FireStream products?
Eric: At this point, yes. We thought about whether it could be used to improve yield on consumer products and things like that, and we may decide to do that kind of thing - well, we reserve the right to do anything we want, I guess! Right now those kinds of features would actually hurt performance for consumers, because they do take away from memory storage (maybe not the internal, but the external DRAM), and it would certainly make the drivers more complex. The FirePro driver team are doing that because some of their customers desire it. I wouldn't say it...
http://www.rage3d.com/interviews ... _2011/index.php?p=1