http://www.hpccommunity.org/f5/h ... opteron-2400-a-587/
We tested the two new processor platforms and below are our findings:
This year has brought big advances to the CPU industry with the arrival of the Intel Xeon 5500 series "Nehalem" and the AMD Opteron 2400 series "Istanbul". Many benchmarks comparing the two have been published, but they have all seemed to fall a bit short of an accurate comparison of the two platforms. The HPC industry standard for benchmarking is HPL, or High Performance Linpack, on which the "Top 500" list is based. Many industry insiders look to the results of this benchmark for insight into how their systems will perform.
This benchmark was performed with a single goal in mind: to show the peak performance in terms of GFLOPS (billion floating point operations per second). The maximum theoretical GFLOPS per system depends on the number of cores, the clock speed, and the IPC (instructions per cycle; here, the floating point operations each core can complete per clock cycle). CPUs from just a few years ago could only manage 2 IPC, but today's newer architectures can do 4 IPC. For comparison, an older dual core Opteron running at 2.2 GHz had a theoretical peak of only 17.6 GFLOPS per machine. With today's new CPUs, a quad core Opteron has a theoretical peak of 70.6 GFLOPS. That is roughly twice the performance per core, because the CPU can handle 4 IPC compared to 2 IPC. Even though both the Nehalem and Istanbul systems provide 4 IPC, there are other architectural design decisions that can impact performance. Both have on-processor memory controllers and large caches, but core counts, processor interconnect speeds, and memory configurations all vary.
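The theoretical peak arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, and the dual-socket node layout is an assumption (the post does not state the socket count, but two sockets per machine is what makes the quoted figures work out).

```python
def theoretical_peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Peak GFLOPS = sockets * cores per socket * clock (GHz) * FLOPs per cycle per core."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


# Assumption: every machine is a dual-socket node (not stated in the post).
SOCKETS = 2

# Older dual core Opteron at 2.2 GHz, 2 floating point operations per cycle.
old_opteron = theoretical_peak_gflops(SOCKETS, 2, 2.2, 2)

# The two tested systems, both at 4 floating point operations per cycle.
nehalem_x5550 = theoretical_peak_gflops(SOCKETS, 4, 2.66, 4)
istanbul_2435 = theoretical_peak_gflops(SOCKETS, 6, 2.6, 4)

print(f"Dual core Opteron 2.2 GHz: {old_opteron:.2f} GFLOPS")
print(f"Nehalem X5550 2.66 GHz:    {nehalem_x5550:.2f} GFLOPS")
print(f"Istanbul 2435 2.6 GHz:     {istanbul_2435:.2f} GFLOPS")
```

Under the dual-socket assumption, the first value agrees with the 17.6 GFLOPS quoted for the older Opteron, and the other two agree with the theoretical peaks listed in the results.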
Testing:
HPL benchmarks that have been released by others appear to have no standardization of the running parameters. Our benchmarks were run with the same compiler, MPI, and linear algebra library on both systems. In our recent testing and through real world experience, we have found that the Intel compilers and Intel Math Kernel Library (MKL) usually provide the best performance. Rather than simply settling on Intel's toolkit, we tried various compilers, including the Intel, GNU, and Portland Group compilers. We also tested various linear algebra libraries, including MKL, the AMD Core Math Library (ACML), and libGOTO from the University of Texas. All of the testing showed that we could achieve the highest performance when using both the Intel compilers and the Intel Math Kernel Library (even on the AMD system), so these were used as the base of our benchmarks.
The benchmarks were run on an Opteron 2435 Istanbul system (six core 2.6GHz processor with 16GB of 800MHz DDR2) and an X5550 Nehalem system (quad core 2.66GHz processor with 12GB of 1333MHz DDR3). An attempt was made to keep the systems identical in every other way. The same power supply, hard drive, and operating system were used (even though these parameters shouldn't affect the performance of HPL). The amount of RAM differs because the Nehalem performs best when using its triple channel memory architecture, versus the Opteron's dual channel. Since HPL performs best when using as much memory as it can, we adjusted the problem size (N in the HPL configuration file) to use as close to 100% of the RAM on the system as possible.
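As a rough illustration of how N relates to installed RAM: HPL factors a dense N x N matrix of double precision values, so the matrix alone takes about 8 * N * N bytes. The sketch below picks the largest N that fits in a chosen share of memory. The function names, the 80% default share, the block size of 256, and the reading of "GB" as GiB are all assumptions made for this example; the post does not state the settings actually used. With these assumed values the sketch happens to return the N values listed in the results.

```python
import math

BYTES_PER_DOUBLE = 8


def hpl_problem_size(mem_gib, fraction=0.8, nb=256):
    """Largest N (a multiple of nb) whose N x N matrix of doubles fits in
    `fraction` of the installed memory. fraction and nb are assumed values."""
    usable_bytes = mem_gib * 2**30 * fraction
    n = int(math.sqrt(usable_bytes / BYTES_PER_DOUBLE))
    return (n // nb) * nb  # round down to a whole number of blocks


def matrix_fraction(n, mem_gib):
    """Share of installed memory taken by the N x N matrix alone."""
    return BYTES_PER_DOUBLE * n * n / (mem_gib * 2**30)


for name, mem_gib in [("Nehalem X5550", 12), ("Istanbul 2435", 16)]:
    n = hpl_problem_size(mem_gib)
    print(f"{name}: N = {n}, matrix uses {matrix_fraction(n, mem_gib):.1%} of RAM")
```

The share of memory to target is a tuning choice: some headroom is normally left for the operating system and MPI buffers, so `fraction` is a parameter here and not a fixed value.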
Results:
CPU Model               Problem Size (N)   Theoretical Peak   Actual Peak    Efficiency   Node Cost   $ per GFLOP
Nehalem X5550 2.66GHz   35840              85.12 GFLOPS       74.03 GFLOPS   86.97%       $3,800.00   $51.33
Istanbul 2435 2.6GHz    41216              124.8 GFLOPS       99.38 GFLOPS   79.63%       $3,500.00   $35.21
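The derived columns in the results can be recomputed from the measured values. The sketch below is only a check of the arithmetic, using the figures from the results; the function and variable names are made up for this example. Efficiency is the actual peak divided by the theoretical peak, and $ per GFLOP is the node cost divided by the actual peak.

```python
def efficiency_pct(actual_gflops, theoretical_gflops):
    """HPL efficiency as a percentage of the theoretical peak."""
    return 100.0 * actual_gflops / theoretical_gflops


def cost_per_gflop(node_cost_usd, actual_gflops):
    """Node cost in dollars divided by the measured GFLOPS."""
    return node_cost_usd / actual_gflops


# (name, theoretical peak, actual peak, node cost) taken from the results.
systems = [
    ("Nehalem X5550", 85.12, 74.03, 3800.00),
    ("Istanbul 2435", 124.8, 99.38, 3500.00),
]

for name, peak, actual, cost in systems:
    print(f"{name}: {efficiency_pct(actual, peak):.2f}% efficient, "
          f"${cost_per_gflop(cost, actual):.2f} per GFLOP")
```

The recomputed values may differ from the listed ones by a cent in the last digit, depending on whether the figure is rounded or truncated.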