POPPUR爱换

Title: PhysX exposed as using x87 in its CPU code. Update: NVIDIA denies deliberate crippling / Havok also relies mainly on x87

Author: Edison    Time: 2010-7-7 17:02
westlee has pointed out that Havok likewise relies mainly on x87 code:

http://we.pcinlife.com/redirect. ... 32&pid=27517395

http://we.pcinlife.com/redirect. ... 32&pid=27546025



NVIDIA's response:

http://www.thinq.co.uk/2010/7/8/ ... g-cpu-physx/page-1/

Nvidia has hit out at claims it's deliberately hobbling CPU PhysX, describing the reports as "factually inaccurate."

Speaking to THINQ, Nvidia's senior PR manager Bryan Del Rizzo said "any assertion that we are somehow hamstringing the CPU is patently false." Del Rizzo states that "Nvidia has and will continue to invest heavily on PhysX performance for all platforms - including CPU-only on the PC."

The response follows a recent report on the Web claiming CPU-PhysX was unnecessarily reliant on x87 instructions, rather than SSE. The report also suggested PhysX wasn't properly multi-threaded, with the test benchmarks showing a dependence on just one CPU core.

Let's start with multi-threading, which Del Rizzo says is readily available in CPU-PhysX, and "it's up to the developer to allocate threads as they see fit based on their needs." He points out that you only need to look at the scaling shown in the CPU tests in 3DMark Vantage and FluidMark to see that CPU-PhysX is perfectly capable of scaling performance as more cores are added.



However, he notes that the current PhysX 2.x code "dates back to a time when multi-core CPUs were somewhat of a rarity," explaining why CPU-PhysX isn't automatically multi-threaded by default. Yet despite this, Del Rizzo says it's easy enough for a developer to implement multi-threading in PhysX 2.x.

"There are some flags that control the use of 'worker threads' which perform functions in the rigid body pipeline," he says as an example, "and each NxScene runs in a separate thread."

The point appears to be moot in the long-term anyway, as Nvidia is apparently planning to introduce new automatic multi-threading features in the forthcoming PhysX 3.0 SDK.

This "uses a task-based approach that was developed in conjunction with our Apex product to add in more automatic support for multi-threading," explains Del Rizzo.

The new SDK will automatically take advantage of however many cores are available, or the number of cores set by the developer, and will also provide the option of a "thread pool" from which "the physics simulation can draw resources that run across all cores."

In addition to the new multi-threading features, Del Rizzo also says "SSE will be turned on by default" in the new SDK. However, he notes that "not all developers want SSE enabled by default, because they still want support for older CPUs for their software versions."

Why do games developers still want to provide support for CPUs that are over ten years old? Del Rizzo says it's up to the game devs and what they demand, but he reiterates that it's definitely not a deliberate attempt to hobble CPU PhysX.

The original report by Real World Technologies showed the Dark Basic PhysX Soft Body demo (below) made heavy use of x87 instructions, rather than SSE.



"We have hundreds of developers who are using PhysX in their applications," he says, "and we have a responsibility to ensure we do not break compatibility with any platforms once they have shipped. Historically, we couldn't become dependent on any hardware feature like SSE after the first revision has shipped."

He also points out that the PhysX 2.x SDK does feature at least some SSE code, and SSE isn't necessarily faster anyway. "We have found sometimes non-SSE code can result in higher performance than SSE vector code in many situations," he says. However, SSE will apparently be the way forward for CPU-PhysX in the long term. "We will continue to use SSE and we plan to enable it by default in future releases," says Del Rizzo.

In short, it looks as though there's a fair bit of legacy detritus in the current PhysX SDK, partly due to the demands from games devs. Nevertheless, there are already ways in which developers can use multi-threading in CPU-PhysX, and full SSE support and improved multi-threading will be coming shortly.

This doesn't look like a company trying to deliberately cripple CPU-PhysX to make its GPUs look good.




http://realworldtech.com/page.cf ... 70510142143&p=1

Introduction

One of the latest challenges in computer gaming is modeling the game environment with a high degree of realism. The most obvious gaming improvements in the last 25 years have been graphical – from the early days of 2D sprites like Metroid or King's Quest, to 3D rendering with Glide and later DirectX and OpenGL, powering the latest games like Crysis. Features such as multi-sample anti-aliasing and anisotropic filtering produce more attractive images and increasing amounts of effort and computational capacity are spent on accurately portraying difficult phenomena such as smoke, water reflections, hair and shadows. However, an accurate visualization of an object is only as convincing and realistic as the modeling of the object itself; a glass hurled against a wall that bounces away harmlessly is unlikely to be convincing, no matter how beautifully rasterized (or ray traced). Consequently, as graphics have improved, modeling the underlying behavior becomes increasingly important. This article delves into the recent history of real-time game physics libraries (specifically PhysX), and analyzes the performance characteristics of PhysX. In particular, through our experiments we found that PhysX uses an exceptionally high degree of x87 code and no SSE, which is a known recipe for poor performance on any modern CPU.

History of PhysX

Advances in semiconductor manufacturing have pushed Moore's Law inexorably forward over the last 25 years. Graphics, by its nature, is a trivially parallel application and has taken full advantage of the additional transistors and integration offered by Moore's Law. Each generation of new hardware is increasingly more powerful and enables new levels of graphical quality. In this context, real-time physics engines were developed to help games accurately model the behavior of objects, according to the relevant laws of physics (e.g. Newtonian mechanics). In 2006, Ageia, a little known hardware startup out of Washington University in St. Louis, launched a dedicated coprocessor for physics. Ageia’s physics engine was cleverly known as PhysX and ran on a specialized Physics Processing Unit (PPU). Ageia was hoping and betting (unwisely) that the PPU would transform the video game industry in the same way that 3dfx’s Voodoo graphics cards did in the 1990’s. Unfortunately, there is not room for more than two processors in a modern PC, and the CPU and GPU have already made their mark. Even two separate processors is somewhat dubious, given that the vast majority of the market is comfortable with integrated graphics. More problematic, software developers were reluctant to fully embrace the PhysX API, given that few gamers were buying the hardware – the perennial chicken-and-egg problem. Unsurprisingly, the company was bought at fire sale prices by Nvidia in 2008…so in that sense, Ageia did live up to the legacy of 3dfx.

The Ageia PPU (below in Figure 1) itself was not particularly revolutionary. In many respects, it resembled Sony's Emotion Engine in the PS2 or the Cell in the PS3; it was a primitive throughput optimized processor, albeit with a familiar instruction set. The PPU had a 32-bit MIPS control processor with many vector execution units. It was 183mm² manufactured on a 130nm TSMC process, which was hardly modern; contemporary CPUs were using a 65nm process.


Figure 1 - Ageia’s Physics Processing Unit (PPU), courtesy of the Tech Report
At Nvidia, the PPU was wisely discontinued and PhysX was ported to the proprietary CUDA programming environment. The goal was to execute on Nvidia’s GPUs (instead of the PPU) and demonstrate the benefits of GPU computing (sometimes known as GPGPU) for consumers. One of the advantages of executing on an Nvidia GPU rather than the PPU is that many gamers actually own Nvidia GPUs; thus solving part of the chicken-and-egg problem for software developers. Of course, PhysX can also execute on the CPU, albeit with reduced performance. This guarantees that games written with PhysX will function correctly on any PC platform; however, there are no performance guarantees. Additionally, PhysX continues to be used as a software library on the three major consoles, yet another incentive for developer adoption.

Profiling PhysX

Nvidia unquestionably uses PhysX as an exclusive marketing tool for their GPUs, and it clearly benefits from executing on a GPU. Nvidia claims that a modern GPU can improve physics performance by 2-4X over a CPU. That’s a pretty impressive claim, and some benchmarks (e.g. Cryostasis) seem to bear that out. However, detractors of Nvidia (largely those working at one of Nvidia's competitors) have repeatedly claimed that PhysX purposefully handicaps execution on a CPU to make GPUs look better. Of course, comments from a competitor should be taken with a large grain of salt. But if Nvidia does cripple CPU PhysX, it would throw into question the extent to which GPU PhysX is really beneficial. Certainly a 4X advantage is worthwhile. However, if the CPU is really hobbled and runs 2X slower by design, that would mean that the GPU only has a 2X advantage in reality, which is far less impressive.
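
The arithmetic behind that last point can be written out as a minimal sketch. The function name and the sample figures below are illustrative assumptions; only the 4X and 2X values come from the paragraph above.

```python
def effective_gpu_advantage(measured_advantage, cpu_handicap):
    """GPU speedup relative to a CPU baseline that is not handicapped.

    measured_advantage: GPU speedup over the CPU code as shipped (e.g. 4.0)
    cpu_handicap: factor by which the shipped CPU code is slower than it
                  could be (e.g. 2.0 means "runs 2X slower by design")
    """
    if cpu_handicap <= 0:
        raise ValueError("cpu_handicap must be positive")
    return measured_advantage / cpu_handicap


# A claimed 4X GPU advantage over a CPU path that is 2X slower than necessary
print(effective_gpu_advantage(4.0, 2.0))  # 2.0
```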

A couple months ago, we decided we would profile a couple of applications which use PhysX to test how PhysX behaves on the CPU and GPU. Initially, we were going to use VTune to compare, contrast and analyze both GPU accelerated and CPU PhysX by collecting performance counter data. However, after we first ran the experiment with VTune to analyze PhysX execution on the CPU, our results were so strange that we changed our plan to focus solely on profiling CPU PhysX and examine how it is tuned for the CPU.

Experimental Setup

Our test system is a relatively modern 3.2GHz Nehalem (Bloomfield), with an Nvidia GTX 280 GPU and 3GB of memory (3 DIMMs). It runs Windows 7 (64-bit), with nvcuda.dll version 8.17.11.9621 and PhysX version 09.09.1112. To test PhysX, we used the Cryostasis tech demo and also the Dark Basic PhysX Soft Body Demo and analyzed the execution using Intel’s VTune. In each case, the NV control panel was set to disable hardware PhysX acceleration, and then run with VTune. For comparison, the two tests were also run with GPU accelerated PhysX. As expected, the GPU accelerated versions ran at a reasonable speed with very nice effects. However, the CPU chugged along rather sluggishly. There was a very clear difference in performance that shows the benefits of accelerating PhysX on a GPU.

VTune analyzes the execution of an application at several levels of granularity. The coarsest is the processes running in the system. From there, VTune can drill down into interesting processes and examine the threads within the process. The finest granularity is inspecting the individual modules executed within each thread. For each of the tests, we analyzed at every level and highlighted the key processes, threads and modules being used. We also tracked several performance counters, which are reported in our results:

Cycles – The number of unhalted clock cycles
Instructions – The total number of instructions retired
x87 instructions – The total number of x87 instructions retired, which will be a portion of the overall instructions retired
x87 uops – The number of x87 uops executed (note that a uop can be executed, but then squashed e.g. due to branch misprediction).
FP SSE uops – The number of floating point SSE uops executed (this includes SSE1 and SSE2 uops, both scalar and packed)
VTune also tracks the Instruction Per Cycle (or IPC), which is the average number of instructions retired each cycle. Nehalem can retire up to 4 instructions per cycle, and realistically it can probably sustain an IPC of 0.5-1.5 on most workloads.
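
As a rough sketch of how the counters listed above combine into the figures reported later, the snippet below derives IPC, the x87 share of retired instructions and the x87-to-SSE uop ratio. The function name and the counter values are hypothetical and chosen only for illustration; they are not measurements from the article.

```python
def derived_metrics(cycles, instructions, x87_instructions, x87_uops, sse_fp_uops):
    """Combine raw event counts into the ratios discussed in the text."""
    ipc = instructions / cycles                   # instructions retired per cycle
    x87_share = x87_instructions / instructions   # fraction of retired instructions that are x87
    # Guard against a module that executes no SSE floating point uops at all
    ratio = x87_uops / sse_fp_uops if sse_fp_uops else float("inf")
    return {"ipc": ipc, "x87_share": x87_share, "x87_to_sse_uop_ratio": ratio}


# Hypothetical event counts, for illustration only
sample = derived_metrics(
    cycles=1_000_000,
    instructions=1_150_000,
    x87_instructions=356_500,
    x87_uops=400_000,
    sse_fp_uops=4_000,
)
print(sample)
```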

One essential reminder: VTune uses statistical sampling, and thus the accuracy depends on the number of samples. If there are relatively few samples, then the numbers may vary substantially. In general, the longer running processes/threads/modules will be sampled more often and hence generate more accurate data, while those processes/threads/modules which run only briefly may yield less than ideal results. One advantage of working with modern CPUs is that they execute billions of cycles per second, so the law of large numbers ensures that the results are accurate and stable.
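
The dependence of accuracy on sample count can be illustrated with a simple model. This sketch assumes each sample is an independent draw that either lands in a given module or does not (a binomial model); that is an assumption made here for illustration, not a description of how VTune actually computes its statistics.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of an estimated proportion p from n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return math.sqrt(p * (1.0 - p) / n)


# The same 50% share estimated from few and from many samples:
# the uncertainty shrinks with the square root of the sample count.
for n in (100, 10_000, 1_000_000):
    print(n, proportion_standard_error(0.5, n))
```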

While it would be nice to track many more performance counters, we were ultimately limited by the amount of time available, and frankly many of the counters were relatively uninteresting in the context of PhysX. The results of our profiling are on the next page.
Author: Edison    Time: 2010-7-7 17:03
Profiling Results

With VTune, we first profiled at the coarsest granularity - focusing on processes running in the system. Based on the number of instructions retired and cycles spent, we selected the top processes. To drill down further, we profiled the threads within each top process. Last, we selected the top threads and then profiled the modules within each top thread.

Chart 1 below shows the results from profiling the active processes for both workloads (Cryostasis and Soft Body Physics). In each case, we kept the top 10 processes, as measured by the percentage of instructions retired. Generally, the percentage of cycles is closely correlated with the instructions retired, but there is some slight variation. In each of the charts, we bolded the entries that were important and selected for further analysis. The right hand side of the chart contains the number of events observed during the experiment, while the left hand side contains percentages for each type of event observed during the experiment. For example, 90.9% of the floating point SSE uops observed during our experiment were executed from the Cryostasis process.

Chart 1 – Process level view of PhysX applications


In Cryostasis, there is only one process of significance, cryostasis.exe itself; all others constitute roughly 2% of instructions retired and 10% of the cycles. Strangely enough, Cryostasis uses a tremendous amount of x87 instructions; roughly 31% of the instructions retired are x87. There are plenty of x87 uops, but hardly any SSE floating point uops, roughly a 100:1 ratio. Perhaps at finer granularity, it will be clear exactly where these x87 instructions are coming from. Despite the x87 instructions, the IPC is a respectable 1.15.

Similarly, the Soft Bodies demo is dominated by a single process which accounts for almost all instructions (97%) and cycles (87%). The SoftBodies.exe process is heavily weighted towards x87 instructions, which are 31% of all retired instructions, with few SSE floating point operations. Like Cryostasis, the IPC is pretty good, achieving 1.23, largely due to the structured nature of the underlying physics code. The slight difference between the two probably reflects the additional code required for a game, rather than a simple screen demo.

Chart 2 – Thread level view of PhysX applications


Drilling down to the thread level in Chart 2, there are two significant threads within the cryostasis.exe process, although the labels defy easy comprehension. Thread99 is the more important of the two, accounting for 80% of the cycles and instructions retired, although thread24 is significant enough to note. Looking at thread99 in Chart 3, the vast majority of time is spent inside the PhysXCore.dll module, which uses no SSE and all x87 for floating point calculations (roughly 35% of instructions retired in PhysXCore.dll are x87). In fact, PhysXCore.dll is the culprit responsible for 91% of all x87 instructions retired in the entire process. Despite the use of x87, the IPC is fairly high, 1.4 instructions retired per cycle.

Chart 3 – Module level view of Cryostasis


Thread24 corresponds primarily to cryostasis.exe itself and is a smaller portion of the overall process (roughly 10%). Thread24 uses some SSE floating point operations, although this is still dwarfed by the overall use of x87 operations. There are roughly 3X as many x87 uops as SSE floating point uops, and the x87 instructions are 15% of the instructions retired in the module and 3% of the instructions retired in the process.

SoftBodies.exe has two principal component threads; thread71 is roughly 73% of the overall instructions retired and cycles, while thread1 is the remaining 26%. Thread71 is almost entirely composed of the PhysXCore.dll module. Again, this module does not use any SSE and instead relies on x87; an incredible 40% of the retired instructions are x87. Since the module dominates the process overall, it is not surprising that 95% of the x87 instructions retired in the process are found within this one module. The IPC for this module is similar to the IPC observed when executing cryostasis, a healthy 1.4, which helps to explain the overall IPC of the process.

Oddly enough neither workload is multithreaded in a meaningful way. In each case, one thread is doing 80-90% of the work, rather than being split evenly across two or four threads – or as is done in an Nvidia GPU, hundreds of threads.

Chart 4 – Module level view of SoftBodies.exe


The second and smaller thread1 is primarily ole32.dll, which is a library used by Windows for OLE (Object Linking and Embedding). The ole32.dll module has a little x87 code, about 6% of instructions retired, but far less than the massive 40% found in PhysXCore.dll. It’s not quite clear what the library is actually doing, but it only contributes a little to the overall use of x87.

Overall, the results are somewhat surprising. In each case, the PhysX libraries are executing with an IPC>1, which is pretty good performance. But at the same time, there is a disturbingly large amount of x87 code used in the PhysX libraries, and no SSE floating point code. Moreover, PhysX code is automatically multi-threaded on Nvidia GPUs by the PhysX and device drivers, whereas there is no automatic multi-threading for CPUs.


Author: Edison    Time: 2010-7-7 17:03
Why x87?

The x87 floating point instructions are positively ancient, and have long since been deprecated in favor of the much more efficient SSE2 instructions (and soon AVX). Intel started discouraging the use of x87 with the introduction of the P4 in late 2000. AMD has deprecated x87 since the K8 in 2003, as x86-64 is defined with SSE2 support; VIA’s C7 has supported SSE2 since 2005. In 64-bit versions of Windows, x87 is deprecated for user-mode, and prohibited entirely in kernel-mode. Pretty much everyone in the industry has recommended SSE over x87 since 2005 and there are no reasons to use x87, unless software has to run on an embedded Pentium or 486.

x87 uses a stack of 8 registers with an extended precision 80-bit floating point format. However x87 data is primarily stored in memory with a 64-bit format that truncates the extra 16 bits. Because of this truncation, x87 code can return noticeably different results if the data is spilled to cache and then reloaded. x87 instructions are scalar by nature, and even the highest performance CPUs can only execute two x87 operations per cycle.

In contrast, SSE has 16 flat registers that are 128 bits wide. Floating point numbers can be stored in a single precision (32-bit) or double precision (64-bit) format. A packed (i.e. vectorized) SSE2 instruction can perform two double precision operations, or four single precision operations. Thus a CPU like Nehalem or Shanghai can execute 4 double precision operations, or 8 single precision operations per cycle. With AVX, that will climb to 8 or 16 operations respectively. SSE also comes in a scalar variety, where only one operation is executed per instruction. However, scalar SSE code is still somewhat faster than x87 code, because there are more registers, SSE instructions are slightly lower latency than the x87 equivalents and stack manipulation instructions are not needed. Additionally, some SSE non-temporal memory accesses are substantially faster (e.g. 2X for AMD processors) as they use a relaxed consistency model. So why is PhysX using x87?
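
The per-cycle figures quoted above follow from register width, element width and issue rate. The sketch below reproduces them under the assumption that a core can start two floating point instructions per cycle (for example one add and one multiply); that issue rate and the helper names are assumptions for illustration, and the 256-bit width for AVX is the register size of that extension.

```python
def simd_lanes(register_bits, element_bits):
    """Number of floating point elements that fit in one vector register."""
    return register_bits // element_bits


def peak_fp_ops_per_cycle(lanes, instructions_per_cycle=2):
    """Peak floating point operations per cycle for a given vector width.

    instructions_per_cycle=2 is an assumed issue rate (e.g. one add plus
    one multiply per cycle), consistent with the figures in the text.
    """
    return lanes * instructions_per_cycle


print("x87 (scalar):       ", peak_fp_ops_per_cycle(1))                    # 2
print("SSE2 double (128b): ", peak_fp_ops_per_cycle(simd_lanes(128, 64)))  # 4
print("SSE single (128b):  ", peak_fp_ops_per_cycle(simd_lanes(128, 32)))  # 8
print("AVX double (256b):  ", peak_fp_ops_per_cycle(simd_lanes(256, 64)))  # 8
print("AVX single (256b):  ", peak_fp_ops_per_cycle(simd_lanes(256, 32)))  # 16
```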

PhysX is certainly not using x87 because of the advantages of extended precision. The original PPU hardware only had 32-bit single precision floating point, not even 64-bit double precision, let alone the extended 80-bit precision of x87. In fact, PhysX probably only uses single precision on the GPU, since it is accelerated on the G80, which has no double precision. The evidence all suggests that PhysX only needs single precision.

PhysX is certainly not using x87 because it contains legacy x87 code. Nvidia has the source code for PhysX and can recompile at will.

PhysX is certainly not using x87 because of a legacy installed base of older CPUs. Any gaming system purchased since 2005 will have SSE2 support, and the PPU was not released till 2006. Ageia was bought by Nvidia in 2008, and almost every CPU sold since then (except for some odd embedded ones) has SSE2 support. PhysX is not targeting any of the embedded x86 market either; it’s designed for games.

The truth is that there is no technical reason for PhysX to be using x87 code. PhysX uses x87 because Ageia and now Nvidia want it that way. Nvidia already has PhysX running on consoles using the AltiVec extensions for PPC, which are very similar to SSE. It would probably take about a day or two to get PhysX to emit modern packed SSE2 code, and several weeks for compatibility testing. In fact for backwards compatibility, PhysX could select at install time whether to use an SSE2 version or an x87 version – just in case the elusive gamer with a Pentium Overdrive decides to try it.

But both Ageia and Nvidia use PhysX to highlight the advantages of their hardware over the CPU for physics calculations. In Nvidia’s case, they are also using PhysX to differentiate from AMD’s GPUs. The sole purpose of PhysX is a competitive differentiator to make Nvidia’s hardware look good and sell more GPUs. Part of that is making sure that Nvidia GPUs look a lot better than the CPU, since that is what they claim in their marketing. Using x87 definitely makes the GPU look better, since the CPU will perform worse than if the code were properly generated to use packed SSE instructions.
Author: Edison    Time: 2010-7-7 17:04
Analysis

Realistically, Nvidia could use packed, single precision SSE for PhysX, if they wanted to take advantage of the CPU. Each instruction would execute up to 4 SIMD operations per cycle, rather than just one scalar operation. In theory, this could quadruple the performance of PhysX on a CPU, but the reality is that the gains are probably in the neighborhood of 2X on the current Nehalem and Westmere generation of CPUs. That is still a hefty boost and could easily move some games from the unplayable <24 FPS zone to >30 FPS territory when using CPU based PhysX. To put that into context, here’s a quote from Nvidia’s marketing:

[In Cryostasis], with fine grained simulation of water, icicle destruction, and particle effects, the CPU shows itself as woefully inadequate for delivering playable framerates. GPUs that lack PhysX support become bottlenecked as a result, delivering the same level of performance irrespective of the hardware's graphics capability. GeForce GPUs with hardware physics support show a 2-4x performance gain, delivering great scalability across the GPU lineup.

That 2-4X performance gain sounds respectable on paper. In reality though, if the CPU could run 2X faster by using properly vectorized SSE code, the performance difference would drop substantially and in some cases disappear entirely. Unfortunately, it is hard to determine how much performance x87 costs. Without access to the source code for PhysX, we cannot do an apples-to-apples comparison that pits PhysX using x87 against PhysX using vectorized SSE. The closest comparison would be to compare the three leading physics packages (Havok from Intel, PhysX from Nvidia and the open source Bullet) on a given problem, running on the CPU. Havok is almost certain to be highly tuned for SSE vectors, given Intel’s internal resources and also their emphasis on using instruction set extensions like SSE and the upcoming AVX. Bullet is probably not quite as highly optimized as Havok, but it is available in source form, so a true x87 vs. vectorized SSE experiment is possible.
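
How much a 2X physics speedup changes the frame rate depends on what share of each frame is spent on physics. The sketch below applies Amdahl's law to that question; the 20 FPS starting point and the 70% physics share are hypothetical values chosen for illustration, not figures from the article.

```python
def fps_after_physics_speedup(base_fps, physics_fraction, physics_speedup):
    """Frame rate after speeding up only the physics part of each frame.

    physics_fraction: share of the original frame time spent in physics (0..1)
    physics_speedup:  factor by which the physics portion gets faster
    """
    if not 0.0 <= physics_fraction <= 1.0:
        raise ValueError("physics_fraction must be between 0 and 1")
    base_frame_time = 1.0 / base_fps
    # Amdahl's law: only the physics share of the frame shrinks
    new_frame_time = base_frame_time * (
        (1.0 - physics_fraction) + physics_fraction / physics_speedup
    )
    return 1.0 / new_frame_time


# Hypothetical: a 20 FPS game that spends 70% of each frame in physics
print(fps_after_physics_speedup(20.0, 0.7, 2.0))
```

With these assumed inputs the result lands just above 30 FPS; a game that spends a smaller share of its frame time on physics would gain correspondingly less.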

Not only would this physics solver comparison reveal the differences due to x87 vs. vectorized SSE, it would also show the impact of multi-threading. A review at the Tech Report already demonstrated that in some cases (e.g. Sacred II), PhysX will only use one of several available cores in a multi-core processor. Nvidia has clarified that CPU PhysX is by default single threaded and multi-threading is left to the developer. Nvidia has demonstrated that PhysX can be multi-threaded using CUDA on top of their GPUs. Clearly, with the proper coding and infrastructure, PhysX could take advantage of several cores in a modern CPU. For example, Westmere sports 6 cores, and using two cores for physics could easily yield a 2X performance gain. Combined with the benefits of vectorized SSE over x87, it is easy to see how a proper multi-core implementation using 2-3 cores could match the gains of PhysX on a GPU.
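
The combined effect described above can be sketched as a product of the two gains. The function and its `parallel_efficiency` parameter are assumptions added for illustration; perfect scaling across cores (efficiency 1.0) is an idealization, and the 2X SSE gain is the article's own estimate.

```python
def combined_cpu_speedup(simd_gain, cores, parallel_efficiency=1.0):
    """Estimated speedup from vectorization plus multi-threading.

    simd_gain:           speedup from vectorized SSE over x87 (the text estimates ~2X)
    cores:               number of cores used for physics
    parallel_efficiency: fraction of ideal multi-core scaling achieved (assumed)
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return simd_gain * (1.0 + (cores - 1) * parallel_efficiency)


# Two cores with ideal scaling and a 2X SSE gain reach the top of the
# 2-4X range quoted for GPU PhysX.
print(combined_cpu_speedup(2.0, 2))        # 4.0
# Three cores at 50% scaling efficiency give the same estimate.
print(combined_cpu_speedup(2.0, 3, 0.5))   # 4.0
```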

While as a buyer it may be frustrating to see PhysX hobbled on the CPU, it should not be surprising. Nvidia has no obligation to optimize for their competitor’s products. PhysX does not run on top of AMD GPUs, and nobody reasonably expects that it will. This is not only because of the extra development and support costs, but also because AMD would never want to give Nvidia early developer versions of their products. Nvidia wants PhysX to be an exclusive, and it will likely stay that way. In the case of PhysX on the CPU, there are no significant extra costs (and frankly supporting SSE is easier than x87 anyway). For Nvidia, decreasing the baseline CPU performance by using x87 instructions and a single thread makes GPUs look better. This tactic calls into question the CPU vs. GPU comparisons made using PhysX; but the name of the game at Nvidia is making the GPU look good, and PhysX certainly fits the bill in the current incarnation.

The bottom line is that Nvidia is free to hobble PhysX on the CPU by using single threaded x87 code if they wish. That choice, however, does not benefit developers or consumers, and casts substantial doubt on the purported performance advantages of running PhysX on a GPU, rather than a CPU. There is already a large and contentious debate concerning the advantages of GPUs over CPUs and PhysX is another piece of that puzzle, but one that seems to create questions, rather than answers.
Author: 380    Time: 2010-7-7 17:15
Notice: The author has been banned or deleted; content automatically hidden
Author: heavenboy    Time: 2010-7-7 17:16
So doesn't that make NV look really bad?
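Author: 870717    Time: 2010-7-7 17:24
Didn't the head of PhysX jump ship to AMD? Will NV keep working on it?
tft1122 posted at 2010-7-7 17:18



    He was only one of the founders... It's not as if a single person builds PhysX at NV.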
Author: chaosXP    Time: 2010-7-7 17:25
This news is pretty explosive. Leaving my mark here!
Author: sfeng0    Time: 2010-7-7 17:32
Wow!!
I don't get it.
What is the significance of using x87 code?
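Author: 土星实验室    Time: 2010-7-7 17:33
What's the problem? The new drivers let you choose between GPU PhysX and CPU PhysX; if you pick the latter, it will naturally use x87 instructions.
Author: 望君珍重    Time: 2010-7-7 17:33
Could we get some of this in Chinese, please...
Author: ddrill    Time: 2010-7-7 17:35
It means SSE isn't being used.
Author: aibo    Time: 2010-7-7 17:38
Is NV deliberately weakening PhysX's performance on the CPU?

Or it's a problem from the Ageia days that NV simply inherited.
Author: iamspy    Time: 2010-7-7 17:41
So NV uses only x87 and not SSE. Floating-point performance will be a lot worse.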
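Author: itany    Time: 2010-7-7 17:46
I said long ago that NV was maliciously crippling CPU performance, and the site admin asked me to produce evidence. Sigh.
Author: gzpony    Time: 2010-7-7 17:47
It just hasn't been optimized for recent CPUs.

With x87 code, compatibility must be great; you could drag a 486 out from under the bed and it would still run. Haha.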
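Author: Edison    Time: 2010-7-7 17:47
Wow!!
I don't get it.
What is the significance of using x87 code?
sfeng0 posted at 2010-7-7 17:32


On current CPUs each core can generally execute only two x87 mul or add operations at the same time, and x87 is not a SIMD instruction set, so its data throughput is only 1/2 that of SSE2 (1/4 that of SSE).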
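Author: Edison    Time: 2010-7-7 17:50
It just hasn't been optimized for recent CPUs.

With x87 code, compatibility must be great; you could drag a 486 out from under the bed and it would still run. Haha.
gzpony posted at 2010-7-7 17:47


PhysX only needs 32-bit single-precision floating-point operations, and SSE dates back to the Pentium III era, 10 years ago, so calling that "recent" isn't right.
Author: 66666    Time: 2010-7-7 17:56
I wouldn't call it crippling; they just haven't vectorized for the CPU instruction set. The article also says NV has no obligation to optimize for its competitors, and nobody can really say how much of a physics engine's code can actually be vectorized on the CPU.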
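Author: gzpony    Time: 2010-7-7 17:57
PhysX only needs 32-bit single-precision floating-point operations, and SSE dates back to the Pentium III era, 10 years ago, so calling that "recent ...
Edison posted at 2010-7-7 17:50



    I said "recent" relative to x87. The x87 chip probably appeared around the mid-1980s; I seem to remember seeing one, a dual in-line package, rectangular, about the size of the EPROM on old motherboards.
Author: lunew    Time: 2010-7-7 18:14
"PhysX uses x87 code to slow down the CPU"

That's the most concrete way to put it.
Author: solon76    Time: 2010-7-7 18:23
..... Wow, that serious?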
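Author: asdfjkl    Time: 2010-7-7 18:26
I said long ago that NV was maliciously crippling CPU performance, and the site admin asked me to produce evidence. Sigh.
itany posted at 2010-7-7 17:46


You can't put it that way;
I said long ago that Fermat's Last Theorem was true, but the judges didn't believe me and insisted that I prove it. Sigh.
Author: westlee    Time: 2010-7-7 18:38
Notice: The author has been banned or deleted; content automatically hidden
Author: potomac    Time: 2010-7-7 18:41
Notice: The author has been banned or deleted; content automatically hidden
Author: Sirlion    Time: 2010-7-7 18:42
Quick question: is x87 related to x86?
Author: 结果    Time: 2010-7-7 18:50
Haha, understanding things often depends on judgment. If you always waited for ironclad proof, then unless you were a know-it-all you'd have been fooled who knows how many times already.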
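Author: itany    Time: 2010-7-7 19:11
You can't put it that way;
I said long ago that Fermat's Last Theorem was true, but the judges didn't believe me and insisted that I prove it. Sigh.
asdfjkl posted at 2010-7-7 18:26


I don't have that kind of resources.
I was just making an inference after reading the previous review.
Author: itany    Time: 2010-7-7 19:12
"PhysX uses x87 code to slow down the CPU"

That's the most concrete way to put it.
lunew posted at 2010-7-7 18:14


It's not just x87; there's also the single threading.
In short, the CPU has been "serialized" to show off the advantages of GPU parallelism.
Author: westlee    Time: 2010-7-7 19:26
Notice: The author has been banned or deleted; content automatically hidden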
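Author: foxzeng    Time: 2010-7-7 19:35
May I ask, where is the Havok demo that surpasses PhysX's hardware-accelerated fluid effects? Intel can't even make good use of its own SSE, so what right does it have ...
westlee posted at 2010-7-7 19:26



    Havok needs a DEMO to show off?
Isn't HL2 enough?
I have yet to see a PhysX game that's any more impressive than HL2.
Author: the_god_of_pig    Time: 2010-7-7 19:40
Just carrying on Ageia's tradition.

http://we.pcinlife.com/viewthread.php?tid=763276&highlight=

Heh heh.
Author: the_god_of_pig    Time: 2010-7-7 19:42
May I ask, where is the Havok demo that surpasses PhysX's hardware-accelerated fluid effects? Intel can't even make good use of its own SSE, so what right does it have ...
westlee posted at 2010-7-7 19:26



   Don't change the subject.

What we're talking about now is a question of integrity, not technology.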
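Author: asdfjkl    Time: 2010-7-7 19:58
Is there a conclusion?? So in the end running physics on the CPU is the right way after all??
tft1122 posted at 2010-7-7 18:48

Is that possible? With a gap this large, is it all down to NV not optimizing?
Author: 3118595    Time: 2010-7-7 19:58
Could we get some of this in Chinese, please...
Author: asdfjkl    Time: 2010-7-7 20:00
Stop changing the subject. Nobody denies that the GPU's own acceleration ability is excellent and certainly far above the CPU's, but that cannot cover up the misdeed of crippling the CPU ...
brl posted at 2010-7-7 19:57


Judging by the way you lay into people, it's obvious you don't even know what x86 is;
why should they optimize for someone else?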
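Author: 3118595    Time: 2010-7-7 20:01
Havok needs a DEMO to show off?
Isn't HL2 enough?
I have yet to see a PhysX game that's any more impressive than HL2.
foxzeng posted at 2010-7-7 19:35



    Half-Life 2? Is it any fun?
Author: VGASOS    Time: 2010-7-7 20:03
I think PhysX uses CPU code so that it can be used on all platforms (game consoles).
Author: 餐具    Time: 2010-7-7 20:04
It doesn't matter~~ Intel won't say anything~~~~ otherwise it would look as if they really wanted NV's technology~~~~
Intel has its own stuff~~ at worst you can just run it on an NVIDIA card.
Author: 3118595    Time: 2010-7-7 20:06
Begging on my knees for a translation hero...
Author: 什么?    Time: 2010-7-7 20:12
Intel has done a great service.
Author: yebxyebx    Time: 2010-7-7 20:16
Some of NV's tactics really do deserve contempt.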
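Author: xxx2006    Time: 2010-7-7 20:26
Reply to  asdfjkl


    By your own logic, if AMD/Intel gave NV's cards only PCIe 1.0 x4 for the sake of "compatibility", see whether NV could still sell ...
brl posted at 2010-7-7 20:21



   If they only licensed AGP 8X, NV would go bankrupt.
Author: foxzeng    Time: 2010-7-7 20:36
This post was last edited by foxzeng at 2010-7-7 20:41
Half-Life 2? Is it any fun?
3118595 posted at 2010-7-7 20:01



    A really terrible game; pity it's still somewhat better than that Batman thing.
The most annoying part is the "gravity gun" effect that Havok delivers; it's more fun than every other gun in the game.
Even cooler, it has lots of MODs, including Left 4 Dead and the like; the effect when the TANK throws rocks is, of all things, CPU-accelerated.
A mere MOD sells better than the heavily promoted Batman.
Sigh, NVIDIA cards really are made for running DEMOs.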
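Author: algoking    Time: 2010-7-7 20:41
Reply to  asdfjkl


Not optimizing and crippling are not the same thing. Deliberately avoiding a 10-year-old "new technology" in favor of something nearly 30 years old is crippling,  ...
brl posted at 2010-7-7 20:18


Are you dense? This is an antitrust-law matter. Do you think Intel and MS are philanthropists? They simply have no choice but to do this.
Author: ak75    Time: 2010-7-7 20:44
Reply to 43# 3118595


    HL2's effects are still first-rate even now. As for the game itself, not much needs saying; the second installment sold at least 6 million-plus copies.

HL2 is also a classic showcase for Havok: well optimized, with outstanding effects.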
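Author: 3118595    Time: 2010-7-7 20:45
Reply to  3118595


    HL2's effects are still first-rate even now. As for the game itself, not much needs saying; the second installment sold at least 6 million-plus copies ...
ak75 posted at 2010-7-7 20:44



    I played it many years ago and didn't find it fun -_- Is there a Chinese-localized version now? I remember it used to be in English.
Author: 3118595    Time: 2010-7-7 20:46
A really terrible game; pity it's still somewhat better than that Batman thing.
The most annoying part is the "gravity gun" effect that Havok delivers, ...
foxzeng posted at 2010-7-7 20:36



    You've got issues.... I was sincerely asking how the game is, and it had to draw out a post-90s kid. What can you do.
Author: 3118595    Time: 2010-7-7 20:49
Reply to  3118595


    HL2's effects are still first-rate even now. As for the game itself, not much needs saying; the second installment sold at least 6 million-plus copies ...
ak75 posted at 2010-7-7 20:44




You all make it sound so fun. Is there a Chinese version? Which version is more fun? Give me a download link.....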
作者: ak75    时间: 2010-7-7 20:50
回复 52# foxzeng


    还有 portal这种创意无敌的mod,portal 2 已经e3展示了,我是肯定要买的

physx我个人觉得最好应该是镜之边缘,但是他的重力效果,破坏效果,还是不如hl2
作者: 3118595    时间: 2010-7-7 20:51
回复  foxzeng


    还有 portal这种创意无敌的mod,portal 2 已经e3展示了,我是肯定要买的 ...
ak75 发表于 2010-7-7 20:50



    身为一个N饭   我不喜欢镜子-_-给个下载链接的说......
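Author: ak75    Time: 2010-7-7 20:59
Reply to 59# 3118595


    It is sold on Steam, with both a Chinese language pack and Mandarin voice acting; go buy it yourself

If even Half-Life isn't fun, every other FPS might as well give up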
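Author: 3118595    Time: 2010-7-7 21:02
Reply to  3118595


    It is sold on Steam, with both a Chinese language pack and Mandarin voice acting; go buy it yourself

If even Half-Life isn't fun, then ...
ak75 posted on 2010-7-7 20:59



    Obviously I want a cracked copy -_-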
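Author: qdamao    Time: 2010-7-7 21:15
In the end, the real issue is that there is no widely used physics API comparable to OpenGL or D3D. So what if PhysX runs old x87 code? Even if it ran only on GPUs, or if Havok refused to support AMD, what could you do about it?

I doubt anyone would demand that NV make PhysX run on AMD cards, so angrily demanding that NV carefully optimize PhysX for CPUs makes little sense either. No commercial company puts effort into something that brings it no benefit.

In fact, an open and widely adopted physics API should benefit NV too, but my guess is that even Jensen Huang doesn't think PhysX is cut out for that: it lacks the clout MS has to push it, it has not won broad industry acceptance, and physics still seems to be a fairly niche application.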
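Author: hpctech    Time: 2010-7-7 21:22
My guess is that PhysX was simply compiled with VS...
Intel is to blame here for not promoting hard enough: it failed to get Microsoft's development tools to incorporate ICC's optimization features, and failed to get Microsoft to add an SSE version of the math library to VS. The Ageia people were lazy too and didn't want to write SSE intrinsics.
Author: 鱼儿水中游    Time: 2010-7-7 21:27
I have no idea what this is talking about.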
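Author: the_god_of_pig    Time: 2010-7-7 21:42
My guess is that PhysX was simply compiled with VS...
Intel is to blame here for not promoting hard enough: it failed to get Microsoft's development tools to incorporate ICC's optimization features, ...
hpctech posted on 2010-7-7 21:22



    What a joke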
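Author: gzpony    Time: 2010-7-7 21:46
Are you perhaps pairing a Pentium2/K7 with Fermi to run PhysX? Compatibility? That claim is laughable.
brl posted on 2010-7-7 19:50



    If you're hunting for topics to pad your post count, read carefully first, okay? What is wrong with raising compatibility? It was never about performance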
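Author: feifeijing    Time: 2010-7-7 21:50
Notice: the author has been banned or deleted; content automatically hidden
Author: slr    Time: 2010-7-7 21:57
HL2 is not just a game but also a complete engine. Like Unreal Engine 3, it is the flagship work of its side
Author: westlee    Time: 2010-7-7 22:01
Notice: the author has been banned or deleted; content automatically hidden
Author: westlee    Time: 2010-7-7 22:03
Notice: the author has been banned or deleted; content automatically hidden
Author: slr    Time: 2010-7-7 22:10
Reply to 70# westlee
Because even with PhysX, the vast majority runs on the CPU. Why leave the mainstream 90% unoptimized?
Author: westlee    Time: 2010-7-7 22:30
Notice: the author has been banned or deleted; content automatically hidden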
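Author: asdfjkl    Time: 2010-7-7 22:40
Among games that use PhysX for physics effects (without GPU physics acceleration), which one has ugly CPU usage, to the point that the CPU can't keep up? Low ...
westlee posted on 2010-7-7 22:03


Well said!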
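Author: asdfjkl    Time: 2010-7-7 22:41
Reply to  westlee
Because even with PhysX, the vast majority runs on the CPU. Why leave the mainstream 90% unoptimized?
slr posted on 2010-7-7 22:10


That 90% is something you made up, isn't it?
First, integrated graphics should stay out of physics effects altogether; of what remains, do NV cards really account for only 10%, with everything else being AMD cards?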
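Author: asdfjkl    Time: 2010-7-7 22:45
Those who really need optimization will naturally buy the source code and optimize it; multi-threading optimization is still necessary, after all. But once they have bought the source code, game compan ...
westlee posted on 2010-7-7 22:30


Exactly, why optimize for SSE? AMD processors don't have the SEE instruction set, so wouldn't it run poorly on them!
Then people would surely say NV took Intel's dirty money and deliberately weakened AMD processors' in-game performance; and is it really necessary to optimize for both Intel and AMD???
Apple even bans Flash from running on its own iPad!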
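Author: FontainebleauV    Time: 2010-7-7 22:48
Notice: the author has been banned or deleted; content automatically hidden
Author: hpctech    Time: 2010-7-7 22:49
Exactly, why optimize for SSE? AMD processors don't have the SEE instruction set, so wouldn't it run poorly on them!
Then people would surely say NV took Int ...
asdfjkl posted on 2010-7-7 22:45


Both Intel CPUs and AMD CPUs support SSE...
Author: FontainebleauV    Time: 2010-7-7 22:52
Notice: the author has been banned or deleted; content automatically hidden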
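Author: 心头烦    Time: 2010-7-7 22:52
A truly terrible game; pity it is still better than that Batman thing,
the most annoying part is the "gravity gun" effect that Havok delivers, ...
foxzeng posted on 2010-7-7 20:36



    HL2 and Left 4 Dead sell well because of their gameplay, especially Left 4 Dead's versus mode; I play them all
But the effects in these games..........
Best not to hype them too much..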
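Author: FontainebleauV    Time: 2010-7-7 22:57
Notice: the author has been banned or deleted; content automatically hidden
Author: sleepyboy    Time: 2010-7-7 23:03
Posts 74 and 75, back to back, are truly jaw-dropping.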
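Author: luckissy    Time: 2010-7-7 23:26
May I ask where the Havok demo is that surpasses PhysX's hardware-accelerated fluid effects? Intel can't even make good use of its own SSE, so what right does it have ...
westlee posted on 2010-7-7 19:26



    Have you heard of the Frostbite engine? Have you heard of BC2? A fine PC game; I doubt any PhysX PC game can top it...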
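Author: itany    Time: 2010-7-7 23:38
Stop changing the subject. Nobody denies that the GPU's own acceleration capability is excellent, certainly far above the CPU's, but that cannot cover up a misdeed like degrading the CPU path ...
brl posted on 2010-7-7 19:57


The GPU has its advantages, but physics computation is not necessarily one of its strengths
In fact, the claim that GPUs are faster than CPUs has never been generally accepted. Only the likes of NV make such subjective assertions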
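Author: 非瞬    Time: 2010-7-7 23:40
Notice: the author has been banned or deleted; content automatically hidden
Author: PURE布    Time: 2010-7-7 23:40
"SEE"? If you still can't get that straight, go do a proper Baidu search before replying again. Being a tech guru isn't that easy
Back to watching the show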
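Author: itany    Time: 2010-7-7 23:41
Judging by the swagger with which you criticize people, you clearly have no idea what x86 is;
why should anyone optimize for someone else?
asdfjkl posted on 2010-7-7 20:00


Falling far below the industry standard is what you call degradation
These days any compiler can do SSE2 optimization; whether it does so thoroughly is another matter, but surely you can't go with lousy old x87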
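Author: itany    Time: 2010-7-7 23:45
Reply to  westlee
Because even with PhysX, the vast majority runs on the CPU. Why leave the mainstream 90% unoptimized?
slr posted on 2010-7-7 22:10


Then why do they do vector optimization for Power on the XBOX 360?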
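Author: cellwing    Time: 2010-7-7 23:48
Notice: the author has been banned or deleted; content automatically hidden
Author: cellwing    Time: 2010-7-7 23:59
Notice: the author has been banned or deleted; content automatically hidden
Author: cellwing    Time: 2010-7-8 00:06
Notice: the author has been banned or deleted; content automatically hidden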
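Author: slice    Time: 2010-7-8 00:09
I said "recent" relative to x87. x87 chips probably appeared around the mid-1980s; I seem to recall having seen one, a dual in-line package, ...
gzpony posted on 2010-7-7 17:57


The question is, is there in practice any CPU that would run PhysX and has only x87 but no SSE?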
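Author: slice    Time: 2010-7-8 00:09
I said "recent" relative to x87. x87 chips probably appeared around the mid-1980s; I seem to recall having seen one, a dual in-line package, ...
gzpony posted on 2010-7-7 17:57


The question is, is there in practice any CPU that would run PhysX and has only x87 but no SSE?
Don't tell me about running PhysX under Win95 or DOS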
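Author: slr    Time: 2010-7-8 00:25
Reply to 74# asdfjkl
What I meant is that 90% of games use CPU-PX; GPU-PX titles are few and far between. Major titles? You can count them on one hand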
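Author: hpctech    Time: 2010-7-8 01:08
Falling far below the industry standard is what you call degradation
These days any compiler can do SSE2 optimization; whether it does so thoroughly is another matter, but surely you can't go with lousy ...
itany posted on 2010-7-7 23:41


Many compilers still can't, I think; at least VS2008 can't. In 64-bit mode VS2008 can barely produce a small amount of SSE, but it cannot optimize plain C statements directly into SSE; you basically still rely on intrinsics, which are nasty to write. ICC has SSE-optimized libraries.
To go straight from C statements to vector instructions such as SSE, the most reliable approach is to switch to a different programming model, such as CT. For CUDA there is also an open-source compiler that compiles CU code into code consisting mainly of SSE instructions.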
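
For readers unfamiliar with what "relying on intrinsics" looks like in practice, here is a minimal sketch comparing a plain C loop with a hand-written SSE version of the same computation. The function names `scale_add_scalar` and `scale_add_sse` are made up for illustration and have nothing to do with the PhysX source; the intrinsics themselves (`_mm_set1_ps`, `_mm_loadu_ps`, `_mm_mul_ps`, `_mm_add_ps`, `_mm_storeu_ps`) come from the standard `<xmmintrin.h>` header. It assumes an x86 compiler with SSE support.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Plain C version. On a 32-bit x86 build without SSE code generation
 * enabled, a compiler would typically emit x87 instructions here. */
void scale_add_scalar(float *dst, const float *a, const float *b,
                      float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] * k + b[i];
}

/* Hand-vectorized version: four floats per iteration.
 * Unaligned loads/stores are used so any pointer is acceptable. */
void scale_add_sse(float *dst, const float *a, const float *b,
                   float k, size_t n)
{
    __m128 vk = _mm_set1_ps(k);          /* broadcast k into all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_mul_ps(va, vk), vb));
    }
    for (; i < n; ++i)                   /* scalar tail for leftover elements */
        dst[i] = a[i] * k + b[i];
}
```

The intrinsic version is clearly longer and more awkward than the plain loop, which is the trade-off the post describes.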
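Author: kaven    Time: 2010-7-8 01:09
Isn't this trick learned from Intel? Programs built with Intel's ICC compiler, on detecting that the CPU is not an Intel CPU,
for example an AMD or VIA CPU, also decline to enable the SSE-optimized code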
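
To illustrate the distinction being argued over, here is a rough sketch of how a program can ask the CPU which features it supports instead of looking at who made it. This is not ICC's actual dispatch logic, which the thread does not show; the helper names `cpu_vendor`, `cpu_has_sse` and `cpu_has_sse2` are hypothetical. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; the feature bits used (EDX bit 25 for SSE, bit 26 for SSE2 in CPUID leaf 1) are the documented ones.

```c
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#include <string.h>

/* Copies the 12-character vendor string (for example "GenuineIntel"
 * or "AuthenticAMD") into out, followed by a terminating NUL. */
void cpu_vendor(char out[13])
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __get_cpuid(0, &eax, &ebx, &ecx, &edx);
    memcpy(out + 0, &ebx, 4);   /* vendor string is stored in EBX, EDX, ECX order */
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}

/* Feature checks based on CPUID leaf 1, independent of the vendor. */
int cpu_has_sse(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 25) & 1u);   /* EDX bit 25: SSE */
}

int cpu_has_sse2(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 26) & 1u);   /* EDX bit 26: SSE2 */
}
```

A dispatcher that branches on `cpu_has_sse2()` picks the fast path on any CPU that reports the feature, while one that branches on the string from `cpu_vendor()` behaves the way this post alleges.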
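Author: hpctech    Time: 2010-7-8 01:13
There is a limit to sophistry.

1. Because the CPU can't beat the GPU even after optimization, you make the CPU run in the slowest possible way? Martian logic?

2. It seems there ...
迁徙的鸟 posted on 2010-7-8 00:59



Point 2 goes too far. With ordinary C development tools, optimizing with SSE is never going to be simpler than just writing plain C code.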
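Author: oooooo    Time: 2010-7-8 01:20
After reading the Chinese version, going back over certain replies in this thread is - - pretty funny~
Author: hpctech    Time: 2010-7-8 01:29
I am only quoting the author's view. There is a translation at post 83.
迁徙的鸟 posted on 2010-7-8 01:28


Read it. I wouldn't rule out the media embellishing things.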
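Author: hpctech    Time: 2010-7-8 01:45
Is that so?

But the ICC of several years ago wasn't picky about AMD CPUs. Back then the official mame plus releases came in both a gcc and an icc build, and running on AMD CPUs ...
迁徙的鸟 posted on 2010-7-8 01:38


That is indeed the case; it seems to have started with v11.
But I have heard recently that the restriction is going to be lifted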
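Author: goodayoo    Time: 2010-7-8 02:37
Title: If Intel gets really angry, the consequences will be severe!
The gist of the article Edison posted is roughly as follows:

Nvidia has long used the PhysX game engine as a selling point to promote the idea that GPU compute power exceeds that of the CPU, and over the past few years a number of games have indeed adopted this GPU acceleration technology. With PhysX enabled, these games do gain considerably flashier physics effects; but if GPU hardware acceleration is off and the physics effects are computed by the CPU alone, the games' smoothness suffers badly. However, we have pointed out before that some of these games use only a single thread when the CPU handles the physics effects, even though physics effects lend themselves readily to multi-threaded processing, which is in fact what happens when GPU hardware acceleration is on. The game vendors are therefore strongly suspected of cheating by deliberately leaving the processor's multiple cores unused.



Worse still, recent research by David Kanter, an author at the Real World Technologies website, has deepened the suspicion that these game vendors are cheating. Using Intel's VTune profiling tool, he analyzed several games that support PhysX effects and found that when these games process physics effects on the CPU, most of the code still uses the old...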
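Author: cylinr    Time: 2010-7-8 08:55
Too deep for me; I can't read English.
Author: saga1974    Time: 2010-7-8 09:07
Sitting back and watching certain people get worked up
NV and AGEIA don't sell CPUs that support SSE instructions themselves, so not supporting SSE is just normal business practice.
Author: hadeszhang    Time: 2010-7-8 09:10
Just as neither Intel nor AMD opens its platform to NV, this is merely industry competition; what's the fuss? If NV fully optimized for CPUs, now that would be **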
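Author: darkangel308    Time: 2010-7-8 09:16
Exactly, why optimize for SSE? AMD processors don't have the SEE instruction set, so wouldn't it run poorly on them!
Then people would surely say NV took Int ...
asdfjkl posted on 2010-7-7 22:45



    Who told you AMD processors don't have the SSE instruction set......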
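Author: hdht    Time: 2010-7-8 09:28
Notice: the author has been banned or deleted; content automatically hidden
Author: whoAU    Time: 2010-7-8 09:39
What is abnormal about this?

Don't AMD and INTEL also not support GPU PHYSX?

Which law says NV must, for PHYSX ON CPU, do SSE-oriented optim ...
hdht posted on 2010-7-8 09:28

If NV had nothing better to do, it could also write a PHYSX that supports AMD GPUs; AMD has no say over that.
It is the same as the lack of SEE: if NV did it, that would be **, but judging from history, not doing it is ** plus brain-dead.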
Author: garou    Time: 2010-7-8 09:47
Purely a business tactic~ nothing strange about it~
Author: chery66    Time: 2010-7-8 09:56
HL2 is a masterpiece; the only FPS game I ever bought a legitimate copy of