Originally posted by Prescott on 2007-9-30 17:21
I modified the program slightly to give you what you asked for:
#include
#include
float a[512][512], b[512][512], c[512][512], d = 0;
int main () {
int i,j,N;
srand(0);
for (i = 0; i < 512; i++ ...
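The quoted program is cut off, so its loop body isn't visible. The vectorizer notes further down say the loop at sse.cc line 21 reads and writes c[i][j] with dependence distance 0. A plausible kernel with that property is an i-k-j matrix multiply. The sketch below is a hypothetical reconstruction, not P's actual code. The function name matmul_ikj and the size parameter n are my own.

```cpp
#include <cassert>
#include <cstddef>

// Same global layout as the quoted program: 512x512 float matrices,
// zero-initialized because they have static storage duration.
constexpr std::size_t kN = 512;
float a[kN][kN], b[kN][kN], c[kN][kN];

// Hypothetical kernel: c += a * b over the top-left n x n block, i-k-j order.
// In the innermost j loop, a[i][k] is loop-invariant and c[i][j], b[k][j] are
// walked with unit stride. c[i][j] is read and written in the same iteration
// (dependence distance 0), so iterations are independent and GCC's
// -ftree-vectorize is able to handle several j values per SSE instruction.
void matmul_ikj(std::size_t n) {
    assert(n <= kN);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i][k];  // hoisted: constant across the j loop
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += aik * b[k][j];
        }
}
```

One call to matmul_ikj(512) performs 512³ ≈ 134 million multiply-adds. The quoted program also declares an N, possibly a repeat count, but that part is cut off.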
I tried P's code.
Here are the results:
gogo@no1-debian:~/test$ g++ -O3 -mfpmath=387 sse.cc -o a.gcc.387
gogo@no1-debian:~/test$ time ./a.gcc.387
result of c[25][391] is -297.068634
real 8m29.681s
user 8m28.160s
sys 0m0.212s
gogo@no1-debian:~/test$ g++ -O3 -mfpmath=sse -ftree-vectorize -ftree-vectorizer-verbose=5 sse.cc -o a.gcc.sse
sse.cc:1: warning: SSE instruction set disabled, using 387 arithmetics
sse.cc:12: note: not vectorized: unhandled data-ref
sse.cc:21: note: not vectorized: no vectype for stmt: D.3208_38 = c[i_15][j_88] scalar_type: float
sse.cc:21: note: vectorized 0 loops in function.
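//With P's flags the vectorization surprisingly failed (see the warning: SSE was disabled, so GCC fell back to 387 arithmetic). Adding the flag for my CPU fixed it.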
gogo@no1-debian:~/test$ g++ -O3 -mfpmath=sse -march=athlon-xp -ftree-vectorize -ftree-vectorizer-verbose=5 sse.cc -o a.gcc.sse
sse.cc:12: note: not vectorized: unhandled data-ref
sse.cc:21: note: dependence distance = 0.
sse.cc:21: note: accesses have the same alignment.
sse.cc:21: note: dependence distance modulo vf == 0 between c[i_90][j_92] and c[i_90][j_92]
sse.cc:21: note: dependence distance = 0.
sse.cc:21: note: accesses have the same alignment.
sse.cc:21: note: dependence distance modulo vf == 0 between c[i_90][j_92] and c[i_90][j_92]
sse.cc:21: note: LOOP VECTORIZED.
sse.cc:21: note: vectorized 1 loops in function.
gogo@no1-debian:~/test$ time ./a.gcc.sse
result of c[25][391] is -297.068634
real 5m33.425s
user 5m32.317s
sys 0m0.160s
gogo@no1-debian:~/test$
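About the warning at sse.cc:1: GCC ignored -mfpmath=sse because the SSE instruction set wasn't enabled for the target. It therefore kept using 387 arithmetic, and the vectorizer found no vector type for float. Adding -march=athlon-xp turns SSE on. One way to check what a given set of flags actually enabled is to test GCC's predefined macros __SSE__ and __SSE_MATH__. You can also list them with `gcc -march=athlon-xp -dM -E - < /dev/null | grep SSE`. The following is a minimal sketch that assumes GCC, and the name fpmath_summary is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Reports what the current compiler flags enabled, using GCC's predefined macros:
//   __SSE__      defined when the SSE instruction set is enabled
//                (e.g. by -march=athlon-xp; typically off by default on 32-bit x86)
//   __SSE_MATH__ defined when scalar float math really uses SSE registers,
//                i.e. -mfpmath=sse took effect
std::string fpmath_summary() {
    std::string s;
#ifdef __SSE__
    s += "SSE: on";
#else
    s += "SSE: off";
#endif
#ifdef __SSE_MATH__
    s += ", float math: SSE";
#else
    s += ", float math: not SSE (x87 on 32-bit x86)";
#endif
    return s;
}
```

Built with the first command line from the log on that Athlon XP box, this should report SSE as off. Built with the -march=athlon-xp line, it should report SSE as on.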
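To compare the two `real` times directly, the following small helper converts fields printed by bash's `time`, such as 8m29.681s, into seconds and divides them. It is a sketch, and the function names are made up.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Converts a field printed by bash's `time`, e.g. "8m29.681s", into seconds.
double time_field_to_seconds(const std::string& field) {
    int minutes = 0;
    double seconds = 0.0;
    const int matched = std::sscanf(field.c_str(), "%dm%lfs", &minutes, &seconds);
    assert(matched == 2 && "expected the <M>m<S.SSS>s format");
    (void)matched;  // avoid an unused-variable warning when NDEBUG disables assert
    return minutes * 60.0 + seconds;
}

// How many times faster the second run is than the first (> 1 means faster).
double speedup(const std::string& before, const std::string& after) {
    return time_field_to_seconds(before) / time_field_to_seconds(after);
}
```

Here speedup("8m29.681s", "5m33.425s") is 509.681 s / 333.425 s ≈ 1.53. The SSE build therefore took about 65% of the 387 build's time and printed the same value for c[25][391].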