Intel: new processor 5,000 times faster than Cell; next-generation game consoles won't need controllers

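naze · posted 2007-12-17 22:33

Originally posted by D65 on 2007-12-17 21:02
It's one thing for AMD to brag, but they also invent fake events and fake figures and cook up fake news to make Intel look like a braggart too. It's the classic smear tactic: I've been tarnished, so I'll tarnish my rival as well.

Very much like the DPP: stubborn and obnoxious through and through.

Also asking for a link to the news source...
This wasn't something some editor mumbled in his sleep, was it?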

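Jason21 · posted 2007-12-18 08:03

Originally posted by 紫色 on 2007-12-17 11:37
Intel has to hype. Have you ever seen a street vendor who doesn't shout about his wares?

You have to tell people; you have to get their attention!

Even if that means fooling the public or attracting people just to curse you.

The worst thing is to end up like a rock in a cesspit: stinking, hard, and ignored by everyone!

The more I read this, the more it sounds like AMD :whistling: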

jiuzhege · posted 2007-12-18 09:11

Originally posted by bwdinlife on 2007-12-17 20:17

Looks like the PS2 already claimed that.

The PS2 claimed a plug in the back of your head? You're joking.

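紫色 · posted 2007-12-18 09:15

Some things are inevitable: vendors will always hawk their wares, and hookers will always phone hotel guests.
A business has to promote its products and has to leak word as soon as a product starts taking shape, because the public needs time to react, speculate, research, and get ready to buy. That way the product is readily accepted and bought as soon as it launches.
That is why slips happen: the hype goes too far, the public's appetite is whetted, and the company cannot deliver. So treat pre-launch rumors with half belief and half doubt, and don't get excited; accept a claim only after seeing the actual product and measurements from an independent, credible site.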

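紫色 · posted 2007-12-18 09:22

Continuing.
One more point worth noting: vendors will always talk up their own products, because nobody minds more money. The conclusion: they must hype their product even when it is bad, and hype it even when it has fallen behind. Intel and AMD both do this, and condemning them is pointless.
What matters is being able to think and judge independently rather than blindly believing others. That matters enormously for anyone.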

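紫色 · posted 2007-12-18 09:29

The sermon continues.
What is a fact? Is a fact something "independent" that does not bend to your will or mine? Wrong: a fact is a state of a person's will. I worked this out before I ever read Schopenhauer's "The World as Will and Representation". :a)
When I judge which CPU is better, that is the fact (for me) :p.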

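itany · posted 2007-12-18 10:47

Originally posted by 紫色 on 2007-12-18 09:29
The sermon continues.
What is a fact? Is a fact something "independent" that does not bend to your will or mine? Wrong: a fact is a state of a person's will. I worked this out before I ever read Schopenhauer's "The World as Will and Representation". :a)
When I judge which CPU is better, that is the fact ...

You can regard everyone else as an idiot, just as everyone else regards you as one.
You can regard everyone outside the asylum as mad, just as you are the one locked inside it.
You can regard AMD as great, just as you regard that pile as fragrant...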

shike_cuke · posted 2007-12-18 10:49

We'll only know once the actual product comes out!

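紫色 · posted 2007-12-19 10:09

Heh, whatever.
IBM optimized its programming languages for the Cell architecture. Introducing vector data types into C and Fortran was largely about providing vector intrinsics for the Cell SPU, which counts as solid, practical work.
Even if Intel keeps its promise and the new processor has strong peak performance, just delivering the programming tools would push things back at least another 1-2 years, unless it starts right now.
Aren't the Intel fans cheering a bit early?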
http://publib.boulder.ibm.com/in ... /vectordatatype.htm

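itany · posted 2007-12-19 11:38

Originally posted by 紫色 on 2007-12-19 10:09
Heh, whatever.
IBM optimized its programming languages for the Cell architecture. Introducing vector data types into C and Fortran was largely about providing vector intrinsics for the Cell SPU, which counts as solid, practical work.
Even if Intel keeps its promise and the new processor has strong ...

We discussed Larrabee's architecture and CT just a few days ago...
Forgotten already?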

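Prescott · posted 2007-12-19 11:45

Originally posted by 紫色 on 2007-12-19 10:09
Heh, whatever.
IBM optimized its programming languages for the Cell architecture. Introducing vector data types into C and Fortran was largely about providing vector intrinsics for the Cell SPU, which counts as solid, practical work.
Even if Intel keeps its promise and the new processor has strong ...

Cell? Don't you despise SIMD? How long Cell will even survive is an open question. Whatever happened to that enhanced double-precision Cell?

As for the SIMD intrinsics in the Intel compiler and the MS C++ compiler, you can't just be selectively [word replaced by the forum's language filter], can you? I won't bother with Intel's; I'll just paste MSVC's.
You have the nerve to brag on IBM's behalf about something that has been in the Intel and MS compilers for decades? Oh, and Intel's intrinsics are fully compatible with MSVC's, though Intel's came out earlier. Get the picture?
If you don't understand something, learn it humbly instead of attacking everything all day while understanding nothing.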

http://msdn2.microsoft.com/en-us/library/y0dh78ez.aspx

Streaming SIMD Extensions 2 Instructions

This section describes the C/C++ language-level features supporting the Streaming SIMD Extensions 2 (SSE2) instructions:

Floating-Point Intrinsics Using Streaming SIMD Extensions 2 Instructions that describe the intrinsic operations for the double-precision, floating-point data type (__m128d).
Integer Intrinsics Using Streaming SIMD Extensions 2 that describe the intrinsics for the extended-precision integer data type (__m128i).

Other topics discussed in this section include:

The emmintrin.h header file contains the declarations for the SSE2 instructions intrinsics. The file dvec.h contains operator overloads for some of the SSE2 instructions intrinsics, which are available for use in C++ programs.
SSE2 intrinsics use the __m128, __m128i, and __m128d data types, which are not supported on Itanium Processor Family (IPF) processors. Any SSE2 intrinsics that use the __m64 data type are not supported on x64 processors.

Floating-Point Intrinsics Using Streaming SIMD Extensions 2 Instructions

The following topics list the floating-point and integer intrinsics broken into groups by the nature of the operation. Each intrinsic entry has an informal pseudo-code, and it is followed with a corresponding instruction name in uppercase letters; for example, ADDSD is the name of the first instruction listed in this section. The variable r is generally used for the intrinsic's return value. A number appended to a variable name indicates the element of a packed object. For example, r0 is the lowest double of r. Some intrinsics are composites because they require more than one instruction to implement them. For more details, refer to the Streaming SIMD Extensions 2 (SSE2) instructions external architecture specification (EAS). You should be familiar with the hardware features provided by the SSE2 instructions when writing programs with the intrinsics. The following are three important issues to keep in mind:

Certain intrinsics, such as _mm_loadr_pd and _mm_cmpgt_sd, are not directly supported by the instruction set. While these intrinsics are convenient programming aids, be mindful of their implementation cost.
Data loaded or stored as __m128d objects must be generally 16-byte aligned.
Some intrinsics require that their argument be immediates, that is, constant integers (literals), because of the nature of the instruction.

This section contains the following topics:

Read the rest yourself.

[Last edited by Prescott on 2007-12-19 11:50]
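
For readers who have not used the intrinsics described in the excerpt, here is a minimal sketch of what the __m128d type and emmintrin.h look like in practice. It is not taken from the MSDN page; the helper name add_doubles_sse2 is hypothetical. It uses the unaligned load/store variants (_mm_loadu_pd, _mm_storeu_pd) so that callers need not satisfy the 16-byte alignment rule mentioned above; the aligned forms (_mm_load_pd, _mm_store_pd) would require it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, the header named in the excerpt */
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] + b[i], two doubles per ADDPD. */
void add_doubles_sse2(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  /* loads a[i] and a[i+1], no alignment needed */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));  /* packed double add */
    }
    for (; i < n; i++)  /* scalar tail when n is odd */
        out[i] = a[i] + b[i];
}
```

On a 32-bit x86 target the compiler typically needs SSE2 enabled explicitly (for example -msse2 with gcc); on x86-64 it is part of the baseline.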

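紫色 · posted 2007-12-19 12:13

Who says I despise SIMD? Go through any of my posts and you won't find me guilty of that. What I despise is blind fantasizing about theoretical performance, and that goes for Cell and SSE alike; I would very much like to experience the benefits of SIMD. I know the Intel compiler comes with SIMD intrinsics. One example: Intel really did implement the sin function with SSE2 (sin_sse2.o in the libimf.so library). But Prescott, please look into its speed: it seems very slow, slower even than fsin. As I see it, SSE instructions such as sinpd, sinps, sinsd, and sinss, like the ones that exist for sqrt, would be of more practical use to me.
Let's not rehash old topics. We've already drifted off topic. My post above said that the programming tools for the "tera-scale processor" lag behind Cell's, and that they lag by several years is a fact.

[Last edited by 紫色 on 2007-12-19 12:53]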
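
As a small illustration of the sqrt comparison in this post: SSE2 does have a packed double-precision square root (SQRTPD), reachable from C through _mm_sqrt_pd, whereas no sinpd-style instruction exists, so sin has to be built in software. A minimal sketch, with the hypothetical helper name sqrt_pair_sse2:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: square roots of two doubles with a single SQRTPD. */
void sqrt_pair_sse2(const double in[2], double out[2])
{
    __m128d v = _mm_loadu_pd(in);          /* unaligned load of both inputs */
    _mm_storeu_pd(out, _mm_sqrt_pd(v));    /* one instruction computes both roots */
}
```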

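Prescott · posted 2007-12-19 13:46

Originally posted by 紫色 on 2007-12-19 12:13
Who says I despise SIMD? Go through any of my posts and you won't find me guilty of that. What I despise is blind fantasizing about theoretical performance, and that goes for Cell and SSE alike; once there are programming tools that let me experience the benefits, why would I despise it? I know the Intel compiler comes with SIMD intrinsics; I studied the libimf.so library ...

Talking about teraflops? Cell is just a toy for fooling Sony.

So IBM's intrinsics are preparation for teraflops, while the Intel/MS intrinsics count for nothing? You're looking through seriously tinted glasses. Is it just because the Intel/MS intrinsics came out so early that the teraflops concept didn't exist yet?

The SSE2 implementation of sin is slower than fsin? You must be using an AMD processor. :whistling: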

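紫色 · posted 2007-12-19 15:15

Maybe. When Intel's SIMD intrinsics came out, who made the connection to teraflops?
Implementing sin with SSE involves the algorithm itself, and you also have to account for the latency and throughput of those SSE instructions; get it wrong and it ends up a bit slower, which is not surprising. Apart from a few functions, SSE is a bit faster than x87 for most operations.

I disassembled sin_sse2.o for everyone to see. Prescott, do you think it's simple?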
sin_sse2.o:    file format elf32-i386

Disassembly of section .text:

00000000 <__libm_sse2_sin>:
0: 55                   push %ebp
1: 8b ec                mov %esp,%ebp
3: 83 ec 68             sub $0x68,%esp
6: 66 0f c5 c0 03       pextrw $0x3,%xmm0,%eax
b: 25 ff 7f 00 00       and $0x7fff,%eax
  10: 2d 30 30 00 00       sub $0x3030,%eax
  15: 3d c5 10 00 00       cmp $0x10c5,%eax
  1a: 0f 87 37 01 00 00    ja    157 <__libm_sse2_sin+0x157>
  20: f2 0f 10 0d 70 08 00 movsd  0x870,%xmm1
  27: 00
  28: f2 0f 59 c8          mulsd  %xmm0,%xmm1
  2c: f2 0f 10 15 80 08 00 movsd  0x880,%xmm2
  33: 00
  34: f2 0f 2d d1          cvtsd2si %xmm1,%edx
  38: f2 0f 58 ca          addsd  %xmm2,%xmm1
  3c: f2 0f 10 1d 50 08 00 movsd  0x850,%xmm3
  43: 00
  44: f2 0f 5c ca          subsd  %xmm2,%xmm1
  48: 66 0f 28 15 40 08 00 movapd 0x840,%xmm2
  4f: 00
  50: f2 0f 59 d9          mulsd  %xmm1,%xmm3
  54: 66 0f 14 c9          unpcklpd %xmm1,%xmm1
  58: 81 c2 00 76 1c 00    add $0x1c7600,%edx
  5e: 66 0f 28 e0          movapd %xmm0,%xmm4
  62: 83 e2 3f             and $0x3f,%edx
  65: 66 0f 28 2d 30 08 00 movapd 0x830,%xmm5
  6c: 00
  6d: 8d 05 00 00 00 00    lea 0x0,%eax
  73: c1 e2 05             shl $0x5,%edx
  76: 03 c2                add %edx,%eax
  78: 66 0f 59 d1          mulpd  %xmm1,%xmm2
  7c: f2 0f 5c c3          subsd  %xmm3,%xmm0
  80: f2 0f 59 0d 60 08 00 mulsd  0x860,%xmm1
  87: 00
  88: f2 0f 5c e3          subsd  %xmm3,%xmm4
  8c: f2 0f 10 78 08       movsd  0x8(%eax),%xmm7
  91: 66 0f 14 c0          unpcklpd %xmm0,%xmm0
  95: 66 0f 28 dc          movapd %xmm4,%xmm3
  99: f2 0f 5c e2          subsd  %xmm2,%xmm4
  9d: 66 0f 59 e8          mulpd  %xmm0,%xmm5
  a1: 66 0f 5c c2          subpd  %xmm2,%xmm0
  a5: 66 0f 28 35 10 08 00 movapd 0x810,%xmm6
  ac: 00
  ad: f2 0f 59 fc          mulsd  %xmm4,%xmm7
  b1: f2 0f 5c dc          subsd  %xmm4,%xmm3
  b5: 66 0f 59 e8          mulpd  %xmm0,%xmm5
  b9: 66 0f 59 c0          mulpd  %xmm0,%xmm0
  bd: f2 0f 5c da          subsd  %xmm2,%xmm3
  c1: 66 0f 28 10          movapd (%eax),%xmm2
  c5: f2 0f 5c cb          subsd  %xmm3,%xmm1
  c9: f2 0f 10 58 18       movsd  0x18(%eax),%xmm3
  ce: f2 0f 58 d3          addsd  %xmm3,%xmm2
  d2: f2 0f 5c fa          subsd  %xmm2,%xmm7
  d6: f2 0f 59 d4          mulsd  %xmm4,%xmm2
  da: 66 0f 59 f0          mulpd  %xmm0,%xmm6
  de: f2 0f 59 dc          mulsd  %xmm4,%xmm3
  e2: 66 0f 59 d0          mulpd  %xmm0,%xmm2
  e6: 66 0f 59 c0          mulpd  %xmm0,%xmm0
  ea: 66 0f 58 2d 20 08 00 addpd  0x820,%xmm5
  f1: 00
  f2: f2 0f 59 20          mulsd  (%eax),%xmm4
  f6: 66 0f 58 35 00 08 00 addpd  0x800,%xmm6
  fd: 00
  fe: 66 0f 59 e8          mulpd  %xmm0,%xmm5
102: 66 0f 28 c3          movapd %xmm3,%xmm0
106: f2 0f 58 58 08       addsd  0x8(%eax),%xmm3
10b: 66 0f 59 cf          mulpd  %xmm7,%xmm1
10f: 66 0f 28 fc          movapd %xmm4,%xmm7
113: f2 0f 58 e3          addsd  %xmm3,%xmm4
117: 66 0f 58 f5          addpd  %xmm5,%xmm6
11b: f2 0f 10 68 08       movsd  0x8(%eax),%xmm5
120: f2 0f 5c eb          subsd  %xmm3,%xmm5
124: f2 0f 5c dc          subsd  %xmm4,%xmm3
128: f2 0f 58 48 10       addsd  0x10(%eax),%xmm1
12d: 66 0f 59 f2          mulpd  %xmm2,%xmm6
131: f2 0f 58 e8          addsd  %xmm0,%xmm5
135: f2 0f 58 df          addsd  %xmm7,%xmm3
139: f2 0f 58 cd          addsd  %xmm5,%xmm1
13d: f2 0f 58 cb          addsd  %xmm3,%xmm1
141: f2 0f 58 ce          addsd  %xmm6,%xmm1
145: 66 0f 15 f6          unpckhpd %xmm6,%xmm6
149: f2 0f 58 ce          addsd  %xmm6,%xmm1
14d: f2 0f 58 e1          addsd  %xmm1,%xmm4
151: 66 0f 28 c4          movapd %xmm4,%xmm0
155: eb 72                jmp 1c9 <__libm_sse2_sin+0x1c9>
157: 7f 2e                jg    187 <__libm_sse2_sin+0x187>
159: c1 e8 04             shr $0x4,%eax
15c: 3d fd fc ff 0f       cmp $0xffffcfd,%eax
161: 75 0a                jne 16d <__libm_sse2_sin+0x16d>
163: f2 0f 59 05 b0 08 00 mulsd  0x8b0,%xmm0
16a: 00
16b: eb 5c                jmp 1c9 <__libm_sse2_sin+0x1c9>
16d: f2 0f 10 1d 90 08 00 movsd  0x890,%xmm3
174: 00
175: f2 0f 59 d8          mulsd  %xmm0,%xmm3
179: f2 0f 5c d8          subsd  %xmm0,%xmm3
17d: f2 0f 59 1d a0 08 00 mulsd  0x8a0,%xmm3
184: 00
185: eb 42                jmp 1c9 <__libm_sse2_sin+0x1c9>
187: 66 0f c5 c0 03       pextrw $0x3,%xmm0,%eax
18c: 25 f0 7f 00 00       and $0x7ff0,%eax
191: 3d f0 7f 00 00       cmp $0x7ff0,%eax
196: 74 29                je    1c1 <__libm_sse2_sin+0x1c1>
198: 83 ec 20             sub $0x20,%esp
19b: f2 0f 11 04 24       movsd  %xmm0,(%esp)
1a0: 8d 44 24 20          lea 0x20(%esp),%eax
1a4: 89 44 24 08          mov %eax,0x8(%esp)
1a8: b8 02 00 00 00       mov $0x2,%eax
1ad: 89 44 24 0c          mov %eax,0xc(%esp)
1b1: e8 fc ff ff ff       call 1b2 <__libm_sse2_sin+0x1b2>
1b6: 83 c4 20             add $0x20,%esp
1b9: f3 0f 7e 44 24 08    movq 0x8(%esp),%xmm0
1bf: eb 08                jmp 1c9 <__libm_sse2_sin+0x1c9>
1c1: f2 0f 59 05 c0 08 00 mulsd  0x8c0,%xmm0
1c8: 00
1c9: 8b e5                mov %ebp,%esp
1cb: 5d                   pop %ebp
1cc: c3                   ret
1cd: 90                   nop
1ce: 90                   nop
1cf: 90                   nop
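
A rough reading of the listing, offered as a sketch and not as a statement of Intel's actual algorithm: the opening mulsd/cvtsd2si pair turns x into an integer, `and $0x3f` keeps it modulo 64, and `shl $0x5` scales it to 32-byte table entries, which looks like table-driven range reduction followed by short polynomials. The C below illustrates that general technique under assumptions that are not visible in the listing: a step of pi/32, a table of sin and cos at each step, and Taylor polynomials for the small remainder. The names sketch_sin and build_tables are hypothetical, and the accuracy handling is far simpler than a production libm.

```c
#define SKETCH_PI  3.14159265358979323846
#define TABLE_SIZE 64   /* assumed: 64 steps of pi/32 cover one period (cf. "and $0x3f") */

static double sin_tab[TABLE_SIZE], cos_tab[TABLE_SIZE];
static int tab_ready = 0;

/* Fill sin(k*pi/32) and cos(k*pi/32) with the angle-addition recurrence. */
static void build_tables(void)
{
    const double s1 = 0.09801714032956060;  /* sin(pi/32) */
    const double c1 = 0.99518472667219693;  /* cos(pi/32) */
    sin_tab[0] = 0.0;
    cos_tab[0] = 1.0;
    for (int k = 1; k < TABLE_SIZE; k++) {
        sin_tab[k] = sin_tab[k - 1] * c1 + cos_tab[k - 1] * s1;
        cos_tab[k] = cos_tab[k - 1] * c1 - sin_tab[k - 1] * s1;
    }
    tab_ready = 1;
}

/* Sketch only: reasonable for moderate |x|, no handling of huge arguments, NaN or Inf. */
double sketch_sin(double x)
{
    if (!tab_ready)
        build_tables();

    /* Range reduction: x = k*(pi/32) + r with |r| <= pi/64. */
    double t = x * (32.0 / SKETCH_PI);
    long k = (long)(t + (t >= 0.0 ? 0.5 : -0.5));   /* round to nearest */
    double r = x - (double)k * (SKETCH_PI / 32.0);
    int idx = (int)(((k % TABLE_SIZE) + TABLE_SIZE) % TABLE_SIZE);

    /* Short Taylor polynomials for the small remainder r. */
    double r2 = r * r;
    double sin_r = r * (1.0 - r2 / 6.0 * (1.0 - r2 / 20.0));
    double cos_r = 1.0 - r2 / 2.0 * (1.0 - r2 / 12.0 * (1.0 - r2 / 30.0));

    /* sin(a + r) = sin(a)cos(r) + cos(a)sin(r), with a = k*pi/32 from the table. */
    return sin_tab[idx] * cos_r + cos_tab[idx] * sin_r;
}
```

The point of the sketch is that a long instruction sequence of this shape is mostly independent multiplies and adds, so its length alone says little about its speed relative to a single microcoded fsin.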

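Prescott · posted 2007-12-19 15:21

Haha, nobody is asking you to write it. If your skills aren't up to it, just use the functions Intel gives you.

You think this big pile of instructions must be slower than the three instructions fld/fsin/fstp?? Hahaha. More instructions doesn't mean slower, kid.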

sunboy77_2000 · posted 2007-12-19 15:23

I think what Intel actually said was that it hopes next-generation consoles won't need controllers.

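紫色 · posted 2007-12-19 15:28

Prescott has said again and again that gcc is dumb. I've read the gcc forums, and there is an enormous amount of discussion about SIMD intrinsics. I think they have the skill to solve it if they really wanted to; that gcc hasn't implemented it is more a matter of ABI compatibility. At present gcc really does not have SIMD implementations of the built-in functions (for 32-bit), and compiling libm with -mno-80387 is sure to fail. I'm looking forward to seeing it implemented soon.

[Last edited by 紫色 on 2007-12-24 20:55]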

Prescott · posted 2007-12-19 15:29

#include <stdio.h>
#include <math.h>
#include <sys/time.h>

int main()
{
    struct timeval tv1, tv2;
    int timediff;               /* elapsed time in microseconds */
    double x = 0.654625453;
    double y = 0.754656857;
    double z1, z2;
    int i;

    /* Pass 1: the library sin(), which is the SSE2 routine when built with icc. */
    gettimeofday(&tv1, NULL);
    for (i = 1; i < 20000000; i++) {
        z1 = sin(x);
        z2 = sin(y);
        x += z1 + 1;
        y += z2 + 1;
    }
    gettimeofday(&tv2, NULL);
    timediff = (tv2.tv_sec - tv1.tv_sec) * 1000000 + (tv2.tv_usec - tv1.tv_usec);
    printf("sse2: x=%e, y=%e, time = %d\n", x, y, timediff);

    /* Pass 2: the x87 fsin instruction through inline assembly, same inputs. */
    gettimeofday(&tv1, NULL);
    x = 0.654625453;
    y = 0.754656857;
    asm("finit");               /* reset the x87 FPU state */
    for (i = 1; i < 20000000; i++) {
        asm("fsin" : "=t" (z1) : "0" (x));
        asm("fsin" : "=t" (z2) : "0" (y));
        x += z1 + 1;
        y += z2 + 1;
    }
    gettimeofday(&tv2, NULL);
    timediff = (tv2.tv_sec - tv1.tv_sec) * 1000000 + (tv2.tv_usec - tv1.tv_usec);
    printf("fsin: x=%e, y=%e, time = %d\n", x, y, timediff);
    return 0;
}
[xxxx@xxxx-desk test]$ icc -g -fast -static sin_test1.cpp -lstdc++ -lm -o sin.icc
ipo: remark #11001: performing single-file optimizations
ipo: remark #11006: generating assembly file /tmp/iccPOwkIPas_.s
[xxxx@xxxx-desk test]$ ./sin.icc
sse2: x=4.712389e+00, y=4.712389e+00, time = 758399
fsin: x=4.712389e+00, y=4.712389e+00, time = 1254399

:whistling: :whistling:

[Last edited by Prescott on 2007-12-19 15:34]

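紫色 · posted 2007-12-19 15:50

Could you run something simpler?
double x = 0.0, tmp;
for (i = 1; i < 20000000; i++) {
    tmp = sin((double)i);
    x = x + tmp;
}

It was when running this that I found fsin to be faster.
CPU: P4 2.4B (Socket 478), supports SSE and SSE2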
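
One difference between the two tests is visible in the posts themselves: in Prescott's loop x and y settle near 4.71 (see his output), while this loop feeds sin arguments up to 2×10^7. To compare implementations over the same arguments, something like the following harness could be used. It is only a sketch: time_sum and square are hypothetical names, square is a stand-in workload, and clock() measures CPU time at coarse resolution.

```c
#include <time.h>

typedef double (*unary_fn)(double);

/* Stand-in workload so the harness can be tried without libm. */
double square(double v)
{
    return v * v;
}

/* Sum f(start + i*step) for i in [0, n); returns elapsed CPU seconds. */
double time_sum(unary_fn f, double start, double step, long n, double *sum_out)
{
    clock_t t0 = clock();
    double sum = 0.0;
    for (long i = 0; i < n; i++)
        sum += f(start + (double)i * step);
    clock_t t1 = clock();
    *sum_out = sum;     /* returning the sum keeps the loop from being optimized away */
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Passing sin from math.h with start = 1.0 and step = 1.0 would approximate the loop above; a small step would keep the arguments in a narrow range, closer to Prescott's test.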
