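Discussion is welcome; mindless flaming is not.
Futuremark has already explained why the A7's physics score is low, and I personally think the explanation holds up.
The iPhone 5s scores low on the 3DMark physics test because random memory access causes cache misses. Most people who have studied CS will know how painful cache misses are.
The A7 has plenty of execution resources; Futuremark also notes that the physics solver gets a 2x or better speedup when it is benchmarked on its own.
The physics portion of the 3DMark test accesses data at random. Truly random access is unpredictable, and no CPU has a good answer to it.
In general, a cache hit and a cache miss differ in speed by one to two orders of magnitude. If every memory access is a miss, the A7's execution resources are wasted no matter how strong they are; for the A7, the 3DMark physics test effectively becomes a random memory access test.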
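To make the hit/miss argument concrete, here is a minimal sketch of a toy cache model. It is not a model of the A7: the cache geometry (64 lines of 64 bytes, fully associative, LRU) and the cycle costs (4 cycles per hit, 200 per miss) are assumptions picked only for illustration. It counts misses for the same set of addresses visited sequentially and in shuffled order.

```python
import random
from collections import OrderedDict

def count_misses(addresses, cache_lines=64, line_size=64):
    """Count misses in a toy fully associative LRU cache (illustrative only)."""
    cache = OrderedDict()  # keys are cache line numbers, ordered by recency
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: bring the line in
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses

def average_cycles(misses, total, hit_cycles=4, miss_cycles=200):
    """Average cost per access; the cycle counts are assumed, not measured."""
    hits = total - misses
    return (hits * hit_cycles + misses * miss_cycles) / total

N = 16384                               # 4-byte elements: a 64 KiB working set
sequential = [i * 4 for i in range(N)]  # byte addresses in increasing order
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)      # same addresses, random order

seq_misses = count_misses(sequential)
rand_misses = count_misses(shuffled)
print("sequential:", seq_misses, "misses,", average_cycles(seq_misses, N), "cycles/access")
print("random:    ", rand_misses, "misses,", average_cycles(rand_misses, N), "cycles/access")
```

The sequential walk misses once per cache line and then hits on the remaining elements of that line, while the shuffled walk misses on most accesses because the working set is far larger than the toy cache.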
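Possible ways to improve the A7's score in this kind of random memory access test:
1) Raise the clock frequency. A higher clock means the load/store (L/S) units issue more operations per second, though L/S remains the bottleneck.
2) Increase the cache size. The A7 currently has 1MB of L2. Growing it to 2MB or more might cover the data size of this test, or at least help reduce cache misses.
3) Add another L/S unit...
4) Add cores, which effectively adds more L/S units.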
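So why are Bay Trail, the S800, and the A15 not affected as much? Possible reasons:
1) Their clock frequencies are higher than their predecessors', which indirectly raises the L/S rate.
2) More cores, which indirectly means more L/S units. This kind of physics test in 3DMark parallelizes well, so there is little memory access contention.
3) A large enough cache. The Z3770, Tegra 4, and S800 all have 2MB of L2 cache. Perhaps 2MB happens to cover 3DMark's data size, though who knows. Either way, a larger cache is always an advantage in a random memory access test.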
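The cache size argument can be sketched with the same kind of toy model. This is a rough illustration, not a measurement of any real chip: the working set of 1024 cache lines and the LRU, fully associative cache are assumptions, and the actual data size of the 3DMark physics test is unknown. It shows how the miss rate of uniformly random accesses changes as the cache grows relative to the working set.

```python
import random
from collections import OrderedDict

def miss_rate(cache_lines, working_set_lines, accesses=20000, seed=0):
    """Miss rate of uniformly random accesses in a toy LRU cache (illustrative)."""
    rng = random.Random(seed)
    cache = OrderedDict()
    misses = 0
    for _ in range(accesses):
        line = rng.randrange(working_set_lines)  # pick a random line to touch
        if line in cache:
            cache.move_to_end(line)              # hit: refresh recency
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)        # evict least recently used
    return misses / accesses

WORKING_SET = 1024  # lines; hypothetical stand-in for the test's data size
for cache_lines in (256, 512, 1024, 2048):
    print(cache_lines, "lines ->", round(miss_rate(cache_lines, WORKING_SET), 3))
```

While the cache is smaller than the working set, the miss rate stays high even though every line is reused. Once the cache is at least as large as the working set, only the first touch of each line misses, and growing the cache further changes nothing.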
http://community.futuremark.com/forum/showthread.php?177840-Why-iphone5-and-iphone5S-share-same-physics-score/page3
We've looked at this at great length, and the result is quite interesting. We are working with our relevant partners on this to verify, but at current it seems that:
The compiler does do SIMD and Neon. These do not offer any effect. We even tried doing Neon optimizing by hand to see if the compiler is not doing what it should, but no real effect
The Physics test uses Bullet, and practically the whole CPU time is spent in the soft body solver, PSolve_links. If you pull this function out of Bullet and bench it separately, you do see a 2x speed increase. However, once it's inside the physics engine, you see nothing.
As this seemed to make no sense, we spent a few days trying to understand what is happening. The result seems to be that if the soft bodies are arranged in memory so that the CPU can access them in a sequential fashion, you get a 2x to 3x increase in speed. This is higher if it can run up the memory, a bit lower if it runs down. The way Bullet places the bodies in memory is a lot more random and they are accessed in a jump-back-and-forth manner. When memory is accessed in this way, all speed gains are lost.
iPhone 5 shows none of this behaviour. It is realistic to assume that in the new 5s we see the new prefetch in action, but it cannot gain traction with a random memory access pattern.
It's good to understand that this is not a flaw in Bullet. Arranging complex memory structures in memory to be in a sequential fashion is non-trivial to say the least. Our hacked solution only worked for us as we knew exactly the data that we would be using. Worrying about where your memory segments lie is not something the programmer should have to worry about anyway.
In terms of our Physics test at large, it does not appear that we have an inherent flaw - our use-case is simply such that no gain is seen. Any game/app using Bullet would see the same, and there are naturally other apps that will see the same (in GeekBench, you can see a few tests where the same thing is happening).
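Futuremark does not say how their hacked sequential layout was built, so the following is only a sketch of the general idea, not their method and not Bullet's API. It assumes a soft body is a list of nodes plus a list of links (pairs of node indices), and renumbers the nodes in the order the links first touch them, so that walking the links walks the node array roughly front to back. The names reorder_nodes_by_first_use and total_jump are hypothetical.

```python
def reorder_nodes_by_first_use(nodes, links):
    """Return (new_nodes, new_links) with nodes laid out in first-use order."""
    remap = {}       # old node index -> new node index
    new_nodes = []
    for a, b in links:
        for old in (a, b):
            if old not in remap:
                remap[old] = len(new_nodes)
                new_nodes.append(nodes[old])
    # Nodes that no link touches go at the end so nothing is lost.
    for old, node in enumerate(nodes):
        if old not in remap:
            remap[old] = len(new_nodes)
            new_nodes.append(node)
    new_links = [(remap[a], remap[b]) for a, b in links]
    return new_nodes, new_links

def total_jump(links):
    """Sum of index distances between consecutive node accesses (locality proxy)."""
    order = [i for link in links for i in link]
    return sum(abs(x - y) for x, y in zip(order, order[1:]))

nodes = ["n%d" % i for i in range(8)]     # placeholder node data
links = [(5, 1), (1, 7), (7, 2), (2, 0)]  # links that jump back and forth
new_nodes, new_links = reorder_nodes_by_first_use(nodes, links)
print(total_jump(links), "->", total_jump(new_links))
```

Python only illustrates the index remapping here; any real speedup would come from applying the same layout to the native arrays the solver reads. As the quoted post points out, this works only when the access order is known in advance.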