Originally posted by boris_lee on 2008-8-27 23:22
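Supposedly, because of a software issue, FAH can only use 320 SPs on the 48x0, so there is hardly any improvement over the 3850. We may have to wait for the GPU3 core.
That said,
the 3850 is about on par with the 8600GT.....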
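On AMD cards, GPU Client 2 is implemented on top of CAL/Brook+, which gives no way to schedule or control work on specific ALUs; scheduling is handled entirely by the GPU's own scheduler and PC.
Mhouston's original words on FAHForum were: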
"Not quite ahu.
The upper limit on the smaller WUs was basically 320 SPs or less. The larger WUs have more atoms so parallelize better automatically. I should note the code is complex so generalizations aren't useful other than at a high level.
We already have good VLIW utilization, so that's not the main issue. CPU overheads are holding back the GPU on smaller WUs, and we are knocking those down (you should have seen an overall PPD boost on many systems with Cat 8.8), but larger boost will require a tweak to CAL and Brook."
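What he is saying is that for small WUs, performance looks as if only 320 SPs or fewer are being used, but large WUs can reach better parallelism.
The main reason he gives is that the CPU overhead spent on small WUs is relatively high, which holds back the GPU's parallelism. They are working on shrinking that overhead: Catalyst 8.8 should already show an overall PPD boost, while a larger boost will depend on CAL optimizations and improvements to the Brook backend.
Of course, don't overlook this part either: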
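A rough way to picture the CPU overhead effect, as a toy model only: assume every kernel launch costs a fixed amount of CPU-side time during which the GPU sits idle, and that the GPU work per launch grows with the size of the WU. The function name and all timings below are made up for illustration; they are not measurements of FAH, CAL, or Brook+.

```python
def effective_sps(total_sps, gpu_time_ms, cpu_overhead_ms):
    """Toy model: SPs' worth of useful work once idle time is counted.

    Assumes the GPU is fully busy for gpu_time_ms per launch and fully
    idle for cpu_overhead_ms while the CPU prepares the next launch.
    """
    return total_sps * gpu_time_ms / (gpu_time_ms + cpu_overhead_ms)


# Hypothetical numbers: same fixed CPU overhead, different WU sizes.
small_wu = effective_sps(800, gpu_time_ms=2, cpu_overhead_ms=3)
large_wu = effective_sps(800, gpu_time_ms=12, cpu_overhead_ms=3)

print(f"small WU: ~{small_wu:.0f} SPs' worth of throughput")
print(f"large WU: ~{large_wu:.0f} SPs' worth of throughput")
```

In this sketch the same fixed overhead wastes most of the card on a small WU and much less on a large one, which is the shape of the argument in the quote, not a claim about the real timings.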
"Quite a bit better than small game shaders. The shaders (kernels) for Folding@Home are massive compared to game shaders so the compiler has much more opportunity to schedule. There are also much fewer memory loads/stores and straight line math. Basically, >4 for the heavy kernels and we have some that are >4.5."
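To put those figures in perspective, here is a small sketch that assumes the ">4" and ">4.5" refer to the average number of ALU slots filled per VLIW bundle, and that a bundle is 5 slots wide as on these AMD GPUs. That reading is my assumption, not something the quote spells out.

```python
VLIW_WIDTH = 5  # assumption: 5 ALU slots per VLIW bundle


def vliw_utilization(avg_slots_filled, width=VLIW_WIDTH):
    """Fraction of ALU slots doing useful work per bundle."""
    return avg_slots_filled / width


# Lower bounds taken from the quote (">4" and ">4.5").
for slots in (4.0, 4.5):
    print(f"{slots} of {VLIW_WIDTH} slots -> {vliw_utilization(slots):.0%}")
```

Under that assumption the heavy kernels would be keeping more than 80% to 90% of the ALU slots busy, which fits his statement that VLIW utilization is not the main issue.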