POPPUR爱换

Cell 3.2GHz vs. PPC G5 1.6GHz comparison benchmark under Linux is out

1#
Posted on 2006-11-23 12:06
http://www.geekpatrol.ca/2006/11/playstation-3-performance/

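This PPE in Cell sure is "powerful": it manages to lose to a 1.6GHz PowerPC G5 in the vast majority of the tests. :wacko:

Geekbench also has a Windows version, so everyone can run it and see how Cell stacks up against Conroe. :D

Download: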

http://www.geekpatrol.ca/geekbench/#download


PlayStation 3 Performance

On Sunday I saw a clip of Fedora Core 5 for PPC running on the PlayStation 3 over at Kotaku; I’d completely forgotten that Sony was going to make it easy to boot other operating systems on the PlayStation 3!

On Monday I started receiving requests for Geekbench for Linux PPC so people could run it on the PlayStation 3. I managed to get a beta version out last night and while it’s not quite ready for public release yet, one beta tester sent in the results for his PlayStation 3 which I thought I’d share. To give the results some context, I’m going to compare the PlayStation 3 results against one of the first Power Mac G5s running at 1.6GHz.

Setup
      PlayStation 3
          o Cell Broadband Engine @ 3.2GHz
          o 256 MB RAM
          o Fedora Core 5
          o Geekbench 2006 (Build 243)

      Power Mac G5
          o PowerPC G5 @ 1.6GHz
          o 1280 MB RAM
          o Fedora Core 4
          o Geekbench 2006 (Build 243)

I’m reporting the baseline score, rather than the raw score, for each test (where 100 is the score a PowerMac G5 1.6GHz running Mac OS X would receive on the same test). As always, higher scores are better.
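The baseline scoring described above can be sketched as a simple normalization. This is only a minimal illustration, assuming the score scales linearly with raw throughput; the function name `baseline_score` and the sample throughput figures are hypothetical and are not taken from Geekbench itself.

```python
def baseline_score(raw_result: float, reference_result: float) -> float:
    """Scale a raw result so that the reference machine scores exactly 100."""
    if reference_result <= 0:
        raise ValueError("reference_result must be positive")
    return 100.0 * raw_result / reference_result


# Hypothetical raw throughputs in MB/sec (illustrative, not measured values).
reference = 40.0  # assumed raw result of a Power Mac G5 1.6GHz under Mac OS X
candidate = 50.0  # assumed raw result of the machine under test

print(f"{baseline_score(candidate, reference):.1f}")
```

Under this assumption, a machine that scores 50 on a test is doing roughly half the work per second of the reference G5 on that test.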

Results
Test                                       PlayStation 3   Power Mac G5
Overall Score                                      105.2          106.9

Integer Performance
Emulate 6502 (single-threaded scalar)               42.1           73.9
Emulate 6502 (multi-threaded scalar)                57.3           73.8
Blowfish (single-threaded scalar)                  118.7          107.0
Blowfish (multi-threaded scalar)                   165.6          107.0
bzip2 Compress (single-threaded scalar)             89.8          162.8
bzip2 Compress (multi-threaded scalar)             124.1          168.4
bzip2 Decompress (single-threaded scalar)           76.6          129.9
bzip2 Decompress (multi-threaded scalar)            99.5          133.1

Floating Point Performance
Mandelbrot (single-threaded scalar)                 49.0          100.0
Mandelbrot (multi-threaded scalar)                  72.1          100.0
Dot Product (single-threaded scalar)               120.0          100.8
Dot Product (multi-threaded scalar)                119.3          100.3
JPEG Compress (single-threaded scalar)              70.7          103.0
JPEG Compress (multi-threaded scalar)               94.8          103.0
JPEG Decompress (single-threaded scalar)            61.6          119.0
JPEG Decompress (multi-threaded scalar)             72.9          119.2

Memory Performance
Read Sequential (single-threaded scalar)            51.9          116.7
Read Sequential (multi-threaded scalar)             56.9          116.0
Write Sequential (single-threaded scalar)          194.6          104.7
Write Sequential (multi-threaded scalar)           191.4          112.7
Stdlib Allocate (single-threaded scalar)            43.4           56.4
Stdlib Allocate (multi-threaded scalar)             51.2           55.6
Stdlib Write (single-threaded scalar)              331.5           92.7
Stdlib Write (multi-threaded scalar)               365.9           94.7
Stdlib Copy (single-threaded scalar)                64.5           63.5
Stdlib Copy (multi-threaded scalar)                102.1           72.7

Stream Performance
Stream Copy (single-threaded scalar)                89.7          114.1
Stream Copy (multi-threaded scalar)                109.9          111.8
Stream Scale (single-threaded scalar)               69.2          118.3
Stream Scale (multi-threaded scalar)               101.4          120.1
Stream Add (single-threaded scalar)                 62.6          123.0
Stream Add (multi-threaded scalar)                  93.2          118.0
Stream Triad (single-threaded scalar)               62.7          122.8
Stream Triad (multi-threaded scalar)               102.2          118.6


Conclusion

There was a comment on Slashdot last year that made the following assertion about the Cell processor:

    The problem is that though the main CPU is PowerPC-based like current Apple chips, it is stripped down, and the Altivec support will be much lower than in current G5s. Unoptomized, Apple code would run like a G4 on this hardware.

Turns out the comment was right; Cell processor performance is comparable to low-end PowerPC G5 performance (which in turn is comparable to high-end PowerPC G4 performance). I can’t comment on Altivec performance, unfortunately, since Geekbench for Linux PPC doesn’t measure Altivec performance yet.

Geekbench also isn’t able to exploit the eight vector processors on the Cell processor. Any program designed and optimized for the Cell processor should be a lot faster than one designed for a generic processor (like, say, Geekbench). So while the Geekbench results might seem disappointing, keep in mind that Geekbench can’t exercise the PlayStation 3 to its full potential.
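To put the "comparable to a low-end G5" conclusion in numbers, the per-test scores reported in the results can be tallied as follows. This is only a sketch: the ST/MT labels are shorthand for the single-threaded and multi-threaded scalar variants, and the geometric mean of the ratios is an assumed summary statistic, not necessarily how Geekbench derives its overall score.

```python
from math import prod

# (test, PlayStation 3 score, Power Mac G5 score), copied from the results.
RESULTS = [
    ("Emulate 6502 ST", 42.1, 73.9), ("Emulate 6502 MT", 57.3, 73.8),
    ("Blowfish ST", 118.7, 107.0), ("Blowfish MT", 165.6, 107.0),
    ("bzip2 Compress ST", 89.8, 162.8), ("bzip2 Compress MT", 124.1, 168.4),
    ("bzip2 Decompress ST", 76.6, 129.9), ("bzip2 Decompress MT", 99.5, 133.1),
    ("Mandelbrot ST", 49.0, 100.0), ("Mandelbrot MT", 72.1, 100.0),
    ("Dot Product ST", 120.0, 100.8), ("Dot Product MT", 119.3, 100.3),
    ("JPEG Compress ST", 70.7, 103.0), ("JPEG Compress MT", 94.8, 103.0),
    ("JPEG Decompress ST", 61.6, 119.0), ("JPEG Decompress MT", 72.9, 119.2),
    ("Read Sequential ST", 51.9, 116.7), ("Read Sequential MT", 56.9, 116.0),
    ("Write Sequential ST", 194.6, 104.7), ("Write Sequential MT", 191.4, 112.7),
    ("Stdlib Allocate ST", 43.4, 56.4), ("Stdlib Allocate MT", 51.2, 55.6),
    ("Stdlib Write ST", 331.5, 92.7), ("Stdlib Write MT", 365.9, 94.7),
    ("Stdlib Copy ST", 64.5, 63.5), ("Stdlib Copy MT", 102.1, 72.7),
    ("Stream Copy ST", 89.7, 114.1), ("Stream Copy MT", 109.9, 111.8),
    ("Stream Scale ST", 69.2, 118.3), ("Stream Scale MT", 101.4, 120.1),
    ("Stream Add ST", 62.6, 123.0), ("Stream Add MT", 93.2, 118.0),
    ("Stream Triad ST", 62.7, 122.8), ("Stream Triad MT", 102.2, 118.6),
]

# Tests in which the PlayStation 3 scores higher than the Power Mac G5.
ps3_wins = [name for name, ps3, g5 in RESULTS if ps3 > g5]

# Geometric mean of the PS3/G5 ratios (an assumed way to summarize them).
ratios = [ps3 / g5 for _, ps3, g5 in RESULTS]
geomean = prod(ratios) ** (1 / len(ratios))

print(f"PS3 ahead in {len(ps3_wins)} of {len(RESULTS)} tests")
print(f"Geometric mean of PS3/G5 ratios: {geomean:.2f}")
```

The PlayStation 3 comes out ahead only in the Blowfish, Dot Product, Write Sequential, Stdlib Write and Stdlib Copy tests, which is 10 of the 34 listed.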

[ Last edited by Prescott on 2006-11-23 12:21 ]
110#
Posted on 2007-3-18 20:14
Originally posted by the_god_of_pig on 2007-3-18 19:24
Let me dig this one up :lol:

Quite a few people are buried in this grave.

109#
Posted on 2007-3-18 19:24
Let me dig this one up :lol:

108#
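Original poster | Posted on 2006-11-27 19:43
Originally posted by RacingPHT on 2006-11-27 18:03

Interesting.
NGC Gekko 485MHz, Dhrystone v2.1 = 1125
PS2 EE      297MHz, Dhrystone vX.X = 450
(I haven't got this wrong, have I?)

The PPC 750 has only a 4-stage pipeline, a decent branch predictor, and a single-cycle L1, so naturally it runs Dhrystone efficiently.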

RacingPHT (this user has been deleted)
107#
Posted on 2006-11-27 18:03
Notice: the author has been banned or deleted; the content is automatically hidden.

106#
Posted on 2006-11-27 16:03
New material; a friendly bump.

105#
Posted on 2006-11-26 20:40
・Dhrystone v2.1
PS3 Cell 3.2GHz: 1879.630
PowerPC G4 1.25GHz: 2202.600
PentiumIII 866MHz: 1124.311
Pentium4 2.0AGHz: 1694.717
Pentium4 3.2GHz: 3258.068

・Linpack 100x100 Benchmark In C/C++ (Rolled Double Precision)
PS3 Cell 3.2GHz: 315.71
PentiumIII 866MHz: 313.05
Pentium4 2.0AGHz: 683.91
Pentium4 3.2GHz: 770.66
Athlon64 X2 4400+ (2.2GHz): 781.58

・Linpack 100x100 Benchmark In C/C++ (Rolled Single Precision)
PS3 Cell 3.2GHz: 312.64
PentiumIII 866MHz: 198.7
Pentium4 2.0AGHz: 82.57
Pentium4 3.2GHz: 276.14
Athlon64 X2 4400+ (2.2GHz): 538.05


source: http://rian.s26.xrea.com/nicky.cgi?D...121A#20061121A
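Since these machines run at very different clock speeds, one rough way to read the Dhrystone figures is to divide each score by its clock. This is a minimal sketch: the clock values are taken from the labels in the list (assuming "2.0AGHz" means 2000 MHz), the post does not state the unit of the scores, and the helper name `per_mhz` is hypothetical.

```python
# (CPU, clock in MHz, Dhrystone v2.1 score) as listed in the post.
DHRYSTONE = [
    ("PS3 Cell", 3200, 1879.630),
    ("PowerPC G4", 1250, 2202.600),
    ("Pentium III", 866, 1124.311),
    ("Pentium 4 2.0A", 2000, 1694.717),
    ("Pentium 4 3.2", 3200, 3258.068),
]


def per_mhz(entries):
    """Return (name, score per MHz) pairs, highest per-clock score first."""
    pairs = [(name, score / mhz) for name, mhz, score in entries]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, value in per_mhz(DHRYSTONE):
    print(f"{name:<16}{value:.3f}")
```

By this measure the PowerPC G4 gets the most Dhrystone work done per clock and the Cell's PPE the least.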

104#
Posted on 2006-11-25 17:05
Measured on a laptop, Core Duo T2500 @ 2GHz.


Geekbench 2006 (build 238).  Email geekbench@geekpatrol.ca with feedback.

System Information
  Geekbench Version:         Geekbench 2006 (build 238)
  Geekbench Platform:        Windows x86 (32-bit)
  Geekbench Compiler:        Visual C++ 2005
  OS:                        Microsoft Windows XP Professional
  Model:                     Dell Inc. MM061
  Motherboard:               Dell Inc. 0XD720
  Processor:                 Genuine Intel(R) CPU           T2500  @ 2.00GHz
  Processor ID:              GenuineIntel Family 6 Model 14 Stepping 8
  Logical Processor Count:   2
  Physical Processor Count:  2
  Processor Frequency:       1995 MHz
  Bus Frequency:             133 MHz
  Memory:                    1022 MB

Integer Performance
  Emulate 6502
    single-threaded scalar   157.0 (rate: 1.0, result: 296.9 MHz)
    multi-threaded scalar    314.1 (rate: 2.0, result: 593.5 MHz)
  Blowfish
    single-threaded scalar   120.1 (rate: 1.0, result: 49.5 MB/sec)
    multi-threaded scalar    239.5 (rate: 2.0, result: 98.8 MB/sec)
  bzip2 Compress
    single-threaded scalar   198.8 (rate: 1.0, result: 31.0 MB/sec)
    multi-threaded scalar    392.3 (rate: 2.0, result: 60.8 MB/sec)
  bzip2 Decompress
    single-threaded scalar   158.1 (rate: 1.0, result: 58.8 MB/sec)
    multi-threaded scalar    324.4 (rate: 2.0, result: 116.8 MB/sec)

Floating Point Performance
  Mandelbrot
    single-threaded scalar   125.4 (rate: 1.0, result: 889.0 Mflops)
    multi-threaded scalar    250.6 (rate: 2.0, result: 1.8 Gflops)
  Dot Product
    single-threaded scalar    95.3 (rate: 1.0, result: 490.9 Mflops)
    multi-threaded scalar    186.8 (rate: 2.0, result: 970.7 Mflops)
    single-threaded vector   155.6 (rate: 4.5, result: 2.2 Gflops)
    multi-threaded vector    308.5 (rate: 9.1, result: 4.4 Gflops)
  JPEG Compress
    single-threaded scalar   142.0 (rate: 1.0, result: 13.2 Mpixels/sec)
    multi-threaded scalar    284.9 (rate: 2.0, result: 26.4 Mpixels/sec)
  JPEG Decompress
    single-threaded scalar   153.4 (rate: 1.0, result: 25.5 Mpixels/sec)
    multi-threaded scalar    293.4 (rate: 1.9, result: 48.6 Mpixels/sec)

Memory Performance
  Read Sequential
    single-threaded scalar   213.5 (rate: 1.0, result: 2.7 GB/sec)
    multi-threaded scalar     26.8 (rate: 0.1, result: 166.5 MB/sec)
  Write Sequential
    single-threaded scalar   203.1 (rate: 1.0, result: 1.5 GB/sec)
    multi-threaded scalar    209.2 (rate: 0.5, result: 803.4 MB/sec)
  Stdlib Allocate
    single-threaded scalar    99.0 (rate: 1.0, result: 3.5 Mallocs/sec)
    multi-threaded scalar     45.5 (rate: 0.5, result: 1.6 Mallocs/sec)
  Stdlib Write
    single-threaded scalar   259.4 (rate: 1.0, result: 6.6 GB/sec)
    multi-threaded scalar    172.3 (rate: 0.6, result: 4.0 GB/sec)
  Stdlib Copy
    single-threaded scalar   188.8 (rate: 1.0, result: 2.1 GB/sec)
    multi-threaded scalar    151.1 (rate: 0.8, result: 1.6 GB/sec)

Stream Performance
  Stream Copy
    single-threaded scalar   171.2 (rate: 1.0, result: 2.1 GB/sec)
    multi-threaded scalar    179.4 (rate: 1.0, result: 2.2 GB/sec)
    single-threaded vector   160.2 (rate: 1.0, result: 2.2 GB/sec)
    multi-threaded vector    166.0 (rate: 1.1, result: 2.3 GB/sec)
  Stream Scale
    single-threaded scalar   185.1 (rate: 1.0, result: 2.2 GB/sec)
    multi-threaded scalar    190.9 (rate: 1.0, result: 2.2 GB/sec)
    single-threaded vector   160.3 (rate: 1.0, result: 2.2 GB/sec)
    multi-threaded vector    164.2 (rate: 1.0, result: 2.3 GB/sec)
  Stream Add
    single-threaded scalar   178.7 (rate: 1.0, result: 2.3 GB/sec)
    multi-threaded scalar    193.1 (rate: 1.1, result: 2.6 GB/sec)
    single-threaded vector   177.8 (rate: 1.1, result: 2.5 GB/sec)
    multi-threaded vector    183.0 (rate: 1.1, result: 2.6 GB/sec)
  Stream Triad
    single-threaded scalar   175.1 (rate: 1.0, result: 2.3 GB/sec)
    multi-threaded scalar    192.2 (rate: 1.1, result: 2.6 GB/sec)
    single-threaded vector   140.5 (rate: 1.1, result: 2.4 GB/sec)
    multi-threaded vector    146.9 (rate: 1.1, result: 2.6 GB/sec)

Overall Score:   187.1
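For anyone who wants to compare runs like this one programmatically, the result lines can be pulled apart with a regular expression. This is only a sketch based on the line layout shown in this post; the pattern and the helper name `parse_result` are assumptions, not part of Geekbench.

```python
import re

# Matches lines such as:
#   "    multi-threaded scalar    314.1 (rate: 2.0, result: 593.5 MHz)"
LINE = re.compile(
    r"^\s*(?P<variant>(?:single|multi)-threaded (?:scalar|vector))\s+"
    r"(?P<score>\d+(?:\.\d+)?)\s+"
    r"\(rate: (?P<rate>\d+(?:\.\d+)?), result: (?P<result>.+)\)\s*$"
)


def parse_result(line):
    """Return the fields of one result line, or None for any other line."""
    match = LINE.match(line)
    if match is None:
        return None
    return {
        "variant": match["variant"],
        "score": float(match["score"]),
        "rate": float(match["rate"]),
        "result": match["result"],
    }


sample = "    multi-threaded scalar    314.1 (rate: 2.0, result: 593.5 MHz)"
print(parse_result(sample))
```

Lines that are test names or section headings simply return `None`, so the caller can keep track of the current test name while iterating over the output.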

potomac (this user has been deleted)
103#
Posted on 2006-11-25 12:00
Notice: the author has been banned or deleted; the content is automatically hidden.

102#
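Posted on 2006-11-25 07:31
The question now is: which programs actually make full use of the SPEs' compute capability? If the problem is solved for Cell, does that mean the GPGPU + CPU problem can be solved too?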

RacingPHT (this user has been deleted)
101#
Posted on 2006-11-24 10:43
Notice: the author has been banned or deleted; the content is automatically hidden.

100#
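Posted on 2006-11-24 07:04
Getting the most out of the SPEs means sacrificing PPE performance. This kind of test is unfair to the PC. A game console is simplified, which Sony fans call "specially optimized": it has no system as bloated as Windows, so it is much more efficient. Sony fans say Linux is far more efficient than Windows, so results measured under Windows are unfair to Windows. That is the Sony fans' argument.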

99#
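Posted on 2006-11-23 20:14
Originally posted by Edison on 2006-11-23 20:00
The SPE pipeline is an in-order design with 128 4D SP registers; building a MOB like Conroe's would cost too much, and the performance gain would not necessarily justify it.

For "typical" integer applications, that doesn't mean much. If we're comparing register counts, Itanium has plenty of registers, right? Guess what proportion of its instructions are loads and stores? Still a large one.

The SPE's nearest storage has high latency, its load buffer/store buffer stage is this fragile, and once memory disambiguation runs into aliasing, there will be quite a show to watch.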

potomac (this user has been deleted)
98#
Posted on 2006-11-23 20:05
Notice: the author has been banned or deleted; the content is automatically hidden.

97#
Posted on 2006-11-23 20:00
The SPE pipeline is an in-order design, and it also has 128 128-bit registers; building a MOB like Conroe's would cost too much, and the performance gain would not necessarily justify it.

96#
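Posted on 2006-11-23 19:54
Originally posted by Edison on 2006-11-23 19:42

With the LS this close, how big do you think the MOB should be?

It looks close, but that doesn't count as close. Isn't the latency said to be 7 cycles?

Plenty of processors have an L1D that is also very close, yet they still need load/store buffers for buffering; otherwise the cost of conflict stalls is severe.

Did you think it was Itanium 2's 1-cycle latency? And that one also has the ALAT speculation mechanism and so on.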
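The latency argument can be made concrete with a toy model of an in-order, single-issue pipeline. This is only a sketch under stated assumptions: the 7-cycle load latency is the figure quoted in this post, the 2-cycle add latency is made up for illustration, and the model ignores dual issue, store buffering and memory aliasing altogether. The function name `last_issue_cycle` is hypothetical.

```python
def last_issue_cycle(program, latency):
    """Cycle at which the final instruction issues on an in-order,
    single-issue machine that stalls until all source operands are ready."""
    ready = {}   # register name -> cycle at which its value becomes available
    cycle = -1   # issue cycle of the previous instruction
    for op, dst, srcs in program:
        next_slot = cycle + 1
        operands_ready = max((ready.get(reg, 0) for reg in srcs), default=0)
        cycle = max(next_slot, operands_ready)
        ready[dst] = cycle + latency[op]
    return cycle


# Assumed latencies: 7 cycles for a load (figure from the post), 2 for an add.
LATENCY = {"load": 7, "add": 2}

# Each load is immediately followed by the instruction that consumes it.
naive = [
    ("load", "r1", ()), ("add", "r2", ("r1",)),
    ("load", "r3", ()), ("add", "r4", ("r3",)),
]

# Same work, but both loads are hoisted ahead of their consumers.
scheduled = [
    ("load", "r1", ()), ("load", "r3", ()),
    ("add", "r2", ("r1",)), ("add", "r4", ("r3",)),
]

print(last_issue_cycle(naive, LATENCY))      # 15
print(last_issue_cycle(scheduled, LATENCY))  # 8
```

In this model the stall cost depends on whether independent work can be scheduled behind each load, which is where a large register file helps, and which is harder to arrange in branchy integer code.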

95#
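Posted on 2006-11-23 19:49
Originally posted by Edison on 2006-11-23 19:19
VLIW carries a high code-size cost; the SPE's small memory might overflow after only a few VLIW bundles.

For handling this problem, EPIC's register rotation technique is far smarter than the original VLIW.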

94#
Posted on 2006-11-23 19:42
Originally posted by hopetoknow2 on 2006-11-23 16:51
Also, the SPE doesn't even seem to have load/store buffers. In practice, load and store instructions will probably stall often.

With the LS this close, how big do you think the MOB should be?

93#
Posted on 2006-11-23 19:19
VLIW carries a high code-size cost; the SPE's small memory might overflow after only a few VLIW bundles.

92#
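Posted on 2006-11-23 18:31
Originally posted by Tanknet on 2006-11-23 18:18

There's no discussion of IA64 or VLIW in there? Did you post the wrong link?

Heh, I was only saying that the VLIW camp is impressive too; it has actually taken the top honor twice.
I don't have any material on it. I'm just repeating what someone once lectured me with: that after all my tinkering, I still hadn't discovered the other half of the architecture world, the "evil forces".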
