Multi-core PhysX FluidMark 1.2: a 4-core CPU beats a single GTX 275

Edison · posted 2010-3-18 22:25

As we mentioned previously, the upcoming FluidMark 1.2, the next version of the popular GPU PhysX testing and benchmarking application, will include support for multi-core CPU PhysX calculations, along with overall multi-threading optimizations.

Jerome Guinot, the FluidMark developer, was kind enough to provide us with the latest beta of the new FluidMark 1.2, and we’ll finally try to answer which is faster: GPU PhysX or properly optimized CPU PhysX.

But first, let’s take a closer look at the new FluidMark.

The main control panel now includes several additional options, such as “Force PhysX CPU”, which switches between GPU and CPU PhysX without having to use the Nvidia Control Panel.

The “Multi-core PhysX” checkbox enables all multi-threading optimizations, the vital and most interesting part of the new FluidMark.

“# of CPU cores” specifies the number of CPU cores dedicated to the simulation (up to 32 in the current version). However, this option is not as transparent as it looks: increasing the number of cores adds fluid emitters to the scene (one emitter per core or two, in general), and with an equal number of particles, a different number of emitters can affect performance.

The application window has also changed. The benchmark is still based on the SPH fluid simulation built into the PhysX SDK (the latest version, 2.8.3.21, is used), but the scene includes additional static objects, the particles look different and, as mentioned earlier, several emitters can be used simultaneously. A nice addition is the GPU temperature overlay, useful for GPU stress testing.

The final Global score in benchmarking mode is now calculated differently and can’t be compared with scores from previous versions of FluidMark. It consists of two components: the GraphX score (graphics frames per second) and the PhysX score (physics simulations per second).

Thus, Global score = (GraphX_score * 0.3 + PhysX_score * 1.7) / 2
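The formula above as a minimal Python sketch. The function and argument names are illustrative only and are not taken from FluidMark, and the sample inputs are made up to show the weighting:

```python
def global_score(graphx_score: float, physx_score: float) -> float:
    """Weighted global score, following the formula quoted above.

    graphx_score: graphics frames per second.
    physx_score: physics simulations per second.
    """
    return (graphx_score * 0.3 + physx_score * 1.7) / 2


# Hypothetical inputs, chosen only to show how the weighting behaves.
print(global_score(100.0, 100.0))
print(global_score(200.0, 100.0))  # doubling the graphics score
print(global_score(100.0, 200.0))  # doubling the physics score
```

After dividing by 2, the effective weights are 0.15 for GraphX and 0.85 for PhysX, so the physics rate dominates the result.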

Now, let’s do some testing.

The Wonder of Multi-Threading!

Take a look at the following graph:

[Three emitters were used (# CPU cores = 4) with a fixed number of particles: 15,000. Time range: 60 sec. 800x600 rendering window. System: C2Q 9400 @ 2.66 GHz CPU, Nvidia GTX 275 + GTX 260 (192 sp) GPUs, 4GB RAM, Win XP, PhysX System Software 9.09.1112]

When the “Multi-core PhysX” option is off, PhysX simulation and scene rendering are done in the same thread and, more importantly, the PhysX SDK multi-threading flags are not set.

But when “Multi-core PhysX” is enabled, all PhysX simulation is done in separate threads. Rendering keeps its own thread, so scene rendering speeds up because PhysX no longer runs in the scene thread. The same holds for PhysX: one or several threads are completely dedicated to physics simulation.
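A minimal sketch of the thread layout described above, assuming nothing about FluidMark's actual code or the PhysX SDK API: one thread stands in for rendering and a configurable number of dedicated threads stand in for physics. All names are hypothetical and the loops only count steps. CPython threads show the structure only; because of the interpreter lock they would not reproduce the speed-up reported here.

```python
import threading


def run_loop(name: str, steps: int, counters: dict, lock: threading.Lock) -> None:
    """Stand-in for a render or physics loop: just count completed steps."""
    for _ in range(steps):
        with lock:
            counters[name] = counters.get(name, 0) + 1


def run_scene(physics_threads: int, steps: int = 1000) -> dict:
    """Run one render thread plus `physics_threads` dedicated physics threads."""
    counters: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_loop, args=("render", steps, counters, lock))
    ]
    for i in range(physics_threads):
        threads.append(
            threading.Thread(
                target=run_loop, args=(f"physics-{i}", steps, counters, lock)
            )
        )
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every loop has finished its steps
    return counters


counters = run_scene(physics_threads=3)
print(sorted(counters))  # ['physics-0', 'physics-1', 'physics-2', 'render']
```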

While the SPH fluid simulation is running on the CPU with “Multi-core PhysX” off, the load is distributed across several cores (probably due to internal Windows thread management), but it sums to 26%, about one full core out of four.

But with the multi-threaded optimizations enabled, the application drives all four cores to 100%, which results in a great speed boost.

In addition, one interesting detail was discovered: fluid simulation runs faster on the GPU when a single emitter is used, while the CPU, conversely, prefers multiple emitters (with an equal number of particles). That is probably a peculiarity of the PhysX SDK itself.

For example, with one emitter and Multi-core PhysX switched off, CPU simulation scores 36 global points (64 with 3 emitters, as in the graph above), while the GTX 275 GPU scores 247 points (128 with 3 emitters). But since one emitter can’t utilize more than two cores, the number of emitters was increased to make the comparison even.
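To put numbers on that emitter effect, here is a small sketch that computes how each device's score changes between one and three emitters. It uses only the four global scores quoted above, read here as all measured with Multi-core PhysX off; the dictionary layout and function name are illustrative.

```python
# Global scores quoted above, keyed by (device, number of emitters).
scores = {
    ("CPU", 1): 36,
    ("CPU", 3): 64,
    ("GTX 275", 1): 247,
    ("GTX 275", 3): 128,
}


def emitter_scaling(device: str) -> float:
    """Ratio of the 3-emitter score to the 1-emitter score for one device."""
    return scores[(device, 3)] / scores[(device, 1)]


for device in ("CPU", "GTX 275"):
    print(f"{device}: {emitter_scaling(device):.2f}x")
# CPU: 1.78x
# GTX 275: 0.52x
```

So going from one to three emitters raises the CPU score by roughly three quarters and roughly halves the GPU score, which is why the choice of emitter count matters so much for any CPU-versus-GPU comparison.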

Therefore, benchmarking seems to be a little tricky in the new FluidMark. We are curious whether someone will come up with a solid method after the app is released.

P.S. Thanks to Jerome for the beta FluidMark and detailed explanations.

pds21 (account deleted) · posted 2010-3-18 22:31

Notice: the author has been banned or deleted; content automatically hidden

餐具 · posted 2010-3-18 22:35

So AMD's remark wasn't for nothing; the effect was immediate.

绿色世界 · posted 2010-3-18 23:07

Notice: the author has been banned or deleted; content automatically hidden

ft5555 · posted 2010-3-18 23:29

Could someone above explain why the GPU PhysX score also multiplies once Multi-core PhysX is enabled?

rickerlian · posted 2010-3-18 23:53

What a strange and wonderful world.

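ft5555 · posted 2010-3-19 00:00

No idea. What's equally bizarre is that with Multi-core PhysX off, CPU+GPU shows almost no gain over GPU alone.

PS: I remember some expert once said ...
纳尼？ posted 2010-3-18 23:36

Even stranger: in the graph, the 275+260 score is more than twice that of a single 275...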

AlanLW · posted 2010-3-19 00:10

One more thing: a Q9400 is much cheaper than a GTX 275.

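nfsking2 · posted 2010-3-19 00:11

Last edited by nfsking2 on 2010-3-19 00:12

I have a question; maybe I'm just being a newbie here.
AMD's claim was: Multi-Core CPU Support is Disabled in PhysX
But PhysX acceleration comes in both CPU and GPU flavors,
so the real question should be whether PhysX uses multiple cores when it runs physics on the CPU.

Since this tool is called PhysX FluidMark, it presumably goes through the PhysX API to run physics on either the CPU or the GPU.
Judging from version 1.2, the tool can successfully use a multi-core CPU for physics and can even saturate every core, so AMD's earlier claim that "PhysX disabled multi-core physics in order to exaggerate GPU physics" doesn't really hold up.

So the current picture is: a 4-core CPU running physics at full load is more efficient than a mid-to-high-end graphics card, and NVIDIA did not deliberately block PhysX's support for multi-core CPUs.

So the whole problem lies with the game developers?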

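nfsking2 · posted 2010-3-19 01:18

Last edited by nfsking2 on 2010-3-19 02:56

Reply to nfsking2

Obviously FluidMark used some special hack to lift certain restrictions; as long as the functionality is there, this kind of hack, for a program ...
ArthurMa posted 2010-3-19 00:41

Then it depends on whether FluidMark's author is an NVIDIA person or a neutral party,
because according to him: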

I’m currently updating Geeks3D’s PhysX FluidMark tool and from my last tests, multi-core CPU support in PhysX seems to be ok (that confirms what NVIDIA said in this news)…

And the "what NVIDIA said in this news" refers to:

Here is the reply of Nadeem Mohammad, NVIDIA’s PhysX director of product management, to AMD’s accusations:

I have been a member of the PhysX team, first with AEGIA, and then with Nvidia, and I can honestly say that since the merger with Nvidia there have been no changes to the SDK code which purposely reduces the software performance of PhysX or its use of CPU multi-cores.

Our PhysX SDK API is designed such that thread control is done explicitly by the application developer, not by the SDK functions themselves. One of the best examples is 3DMarkVantage which can use 12 threads while running in software-only PhysX. This can easily be tested by anyone with a multi-core CPU system and a PhysX-capable GeForce GPU. This level of multi-core support and programming methodology has not changed since day one. And to anticipate another ridiculous claim, it would be nonsense to say we “tuned” PhysX multi-core support for this case.

PhysX is a cross platform solution. Our SDKs and tools are available for the Wii, PS3, Xbox 360, the PC and even the iPhone through one of our partners. We continue to invest substantial resources into improving PhysX support on ALL platforms–not just for those supporting GPU acceleration.

As is par for the course, this is yet another completely unsubstantiated accusation made by an employee of one of our competitors. I am writing here to address it directly and call it for what it is, completely false. Nvidia PhysX fully supports multi-core CPUs and multithreaded applications, period. Our developer tools allow developers to design their use of PhysX in PC games to take full advantage of multi-core CPUs and to fully use the multithreaded capabilities.

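The gist: since NVIDIA acquired PhysX from AGEIA, it has made no changes to the SDK that "limit multi-core CPUs", and it cites the SDK being shared across platforms such as the Xbox 360 and PS3 as evidence that PhysX runs well on multi-core CPUs.

So, regarding AMD's accusation, several possibilities arise:
1. FluidMark's author Jerome Guinot is an NVIDIA person and has some special means of bypassing PhysX's multi-threading limit (hard to say; anything is possible)
2. When FluidMark runs physics on the CPU, it isn't really calling the PhysX API at all (most people have no way of knowing, but for some people this shouldn't be hard to verify)
3. NVIDIA removed the multi-core CPU limit by updating the PhysX runtime (try swapping in an older PhysX runtime)
4. PhysX never limited the CPU in the first place, and AMD's accusation is just envy (this is probably about as likely as points 1 and 2 combined)

As for the performance difference between CPU and GPU acceleration, there are 3 possibilities:
1. The CPU really is more efficient than the graphics card at physics
2. FluidMark is not well optimized for GPU physics
3. With a single card doing GPU physics, the GPU is not 100% devoted to physics; part of its resources go to rendering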

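Of these, the latter two seem more likely, because in FluidMark 1.1.1, when running GPU physics in a mixed AMD+NVIDIA setup, GPU-Z shows very low GPU load on the card doing the physics.

At idle, the 8800 GTX 768 MB used as the physics card (flashed with an Ultra-clocked BIOS, so please ignore the model GPU-Z reports) has 132 MB of video memory in use (the mixed setup requires extending the desktop onto the NVIDIA card, which takes up some video memory).

When doing physics at 1080p, GPU load is 34% and video memory use is 297 MB (with 8xAA, GPU load is still 34% and video memory use is 311 MB).

In other words, after subtracting the 132 MB idle baseline, FluidMark takes up at most about 180 MB of video memory and about 34% of the GPU.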
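The subtraction behind that "about 180 MB" figure, as a quick sketch using only the GPU-Z readings quoted above (the variable and function names are illustrative, and the "1080p" label for the first reading assumes it was taken without 8xAA):

```python
IDLE_MB = 132  # video memory in use at idle, desktop extended to the NVIDIA card

# Readings under load at 1080p, as (GPU load in %, video memory in MB).
readings = {
    "1080p": (34, 297),
    "1080p, 8xAA": (34, 311),
}


def fluidmark_memory_mb(loaded_mb: int, idle_mb: int = IDLE_MB) -> int:
    """Video memory attributable to FluidMark: load reading minus idle baseline."""
    return loaded_mb - idle_mb


for label, (gpu_load, mem_mb) in readings.items():
    print(f"{label}: +{fluidmark_memory_mb(mem_mb)} MB, GPU load {gpu_load}%")
# 1080p: +165 MB, GPU load 34%
# 1080p, 8xAA: +179 MB, GPU load 34%
```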

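Meanwhile, GPU load on the primary card, a 5870, is 96%.

And the 5870's performance isn't far from the GTX 295, currently NVIDIA's top card,
so judging from the tests above, asking any single NVIDIA card to handle rendering and physics at the same time really is beyond its capacity.

Moreover, in current PhysX games, the game allocates only 128 MB of video memory (Batman, for example); for details see http://we.pcinlife.com/thread-1367643-1-1.html, or search for "PhysX acceleration currently allocates 128MB of device memory in default"
(Anyone who doubts this can look for a certain parameter in a certain ini file in the Batman install directory; search keyword: gpuHeapSize)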
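For anyone who wants to do that search without opening files by hand, here is a rough sketch that scans every .ini file under a folder for a keyword. The function name is made up, and the demo below runs against a throwaway file whose name and contents are invented for illustration; it is not the real Batman config. Point `root` at your own install directory instead.

```python
import os
import tempfile


def find_ini_setting(root: str, key: str) -> list:
    """Return (path, line number, line) for each .ini line under root containing key."""
    hits = []
    needle = key.lower()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(".ini"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line.lower():
                        hits.append((path, lineno, line.strip()))
    return hits


# Demo on a temporary folder; file name and contents are invented.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "Demo.ini"), "w", encoding="utf-8") as fh:
        fh.write("[Section]\ngpuHeapSize=128\n")
    for path, lineno, line in find_ini_setting(root, "gpuHeapSize"):
        print(os.path.basename(path), lineno, line)
```

The match is case-insensitive on purpose, since the exact capitalization used in the game's file isn't given above.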

The big jump in efficiency when a 260+ and a 275 together handle rendering and physics for FluidMark also seems to say something.

Kevsun · posted 2010-3-19 01:50

Insufficient software support for the hardware?

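westlee (account deleted) · posted 2010-3-19 06:57

Notice: the author has been banned or deleted; content automatically hidden

86751213 · posted 2010-3-19 07:22

Notice: the author has been banned or deleted; content automatically hidden

fuxingchina · posted 2010-3-19 07:50

Notice: the author has been banned or deleted; content automatically hidden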

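zblskj · posted 2010-3-19 08:04

Clearly NVIDIA supplies game studios with engineers and sponsorship money; if the studios still ran PhysX on the CPU, NVIDIA's GPU PhysX would lose its whole point. Judging from Edison's repost, PhysX can support multi-core CPUs, and its efficiency there is not low. It suddenly dawned on me why NVIDIA is in such a hurry to develop Fermi (much of Fermi's architecture closely resembles a multi-core CPU): Fermi's characteristics make it the best compute architecture for PhysX physics and CUDA. Think of the 480: 480 cores working like a many-core CPU would push physics efficiency right up. Of course, NVIDIA still has to solve the problem of running the game and PhysX physics together in-game.
But I believe NVIDIA should be able to do that.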

defv4 · posted 2010-3-19 08:18

No way. Old Huang (Jensen Huang) is not the kind of person who would bow to Intel.

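fuxingchina · posted 2010-3-19 08:45

Notice: the author has been banned or deleted; content automatically hidden

380 · posted 2010-3-19 08:51

Notice: the author has been banned or deleted; content automatically hidden

380 · posted 2010-3-19 09:00

Notice: the author has been banned or deleted; content automatically hidden

380 · posted 2010-3-19 09:03

Notice: the author has been banned or deleted; content automatically hidden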
