The upper limit on the smaller WUs was basically 320 SPs or less. The larger WUs have more atoms so parallelize better automatically. I should note the code is complex so generalizations aren't useful other than at a high level.
We already have good VLIW utilization, so that's not the main issue. CPU overheads are holding back the GPU on smaller WUs, and we are knocking those down (you should have seen an overall PPD boost on many systems with Cat 8.8), but larger boost will require a tweak to CAL and Brook."
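The main reason he gives is that CPU overhead on small WUs is relatively high, which holds back the GPU's parallelism. They are working to reduce that overhead: an overall PPD gain is already visible with Catalyst 8.8, while a larger gain will depend on CAL optimizations and improvements to the Brook backend.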
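To put rough numbers on the CPU-overhead point, here is a minimal sketch. It assumes a simplified model where each step pays a fixed CPU-side cost and then runs the GPU kernels, with no overlap between the two. The function name and all timings are hypothetical, made up for illustration; they are not measurements from Folding@Home.

```python
def gpu_utilization(gpu_time_ms: float, cpu_overhead_ms: float) -> float:
    """Fraction of each step's wall-clock time spent in GPU work.

    Assumes the CPU overhead and the GPU work run back to back
    (no overlap), which is a simplification.
    """
    return gpu_time_ms / (gpu_time_ms + cpu_overhead_ms)


# Hypothetical timings: the same fixed CPU overhead per step,
# but a large WU (more atoms) keeps the GPU busy for longer.
small_wu = gpu_utilization(gpu_time_ms=2.0, cpu_overhead_ms=2.0)
large_wu = gpu_utilization(gpu_time_ms=18.0, cpu_overhead_ms=2.0)

print(f"small WU: {small_wu:.0%} GPU busy")
print(f"large WU: {large_wu:.0%} GPU busy")
```

Under these made-up numbers the same fixed overhead costs the small WU half of its time but the large WU only a tenth, which is the shape of the effect described in the quote: reducing the CPU-side cost helps the small WUs the most.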
Of course, this passage should not be overlooked either:
"Quite a bit better than small game shaders. The shaders (kernels) for Folding@Home are massive compared to game shaders so the compiler has much more opportunity to schedule. There are also much fewer memory loads/stores and straight line math. Basically, >4 for the heavy kernels and we have some that are >4.5."

Author: jhj9 Time: 2008-8-28 01:42