|
For dense matrices, the cache works very well.
BLAS3/LAPACK routines can reach more than 80% of the CPU's peak performance.
For sparse matrices, the only real option is to buy a machine with more memory bandwidth.
However, flop/bw=4 is the extreme case.
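To make the dense vs. sparse gap concrete, here is a rough back-of-the-envelope sketch. It is only an estimate under stated assumptions: double-precision values, 4-byte indices, CSR storage for the sparse case, ideal cache reuse, and the ratio counted in flops per byte (which may not be the unit meant by flop/bw elsewhere in this thread). The function names are made up for illustration.

```python
def gemm_intensity(n, bytes_per_value=8):
    """Flops per byte for C = A * B with n x n matrices (ideal cache reuse assumed)."""
    flops = 2 * n**3                       # n^3 multiply-add pairs
    traffic = 4 * n**2 * bytes_per_value   # read A, B, C once and write C once
    return flops / traffic


def csr_spmv_intensity(nnz, rows, bytes_per_value=8, bytes_per_index=4):
    """Flops per byte for y = A * x with A in CSR format (x assumed to stay in cache)."""
    flops = 2 * nnz                                        # one multiply-add per nonzero
    traffic = (nnz * (bytes_per_value + bytes_per_index)   # values + column indices
               + (rows + 1) * bytes_per_index              # row pointers
               + 2 * rows * bytes_per_value)               # read x once, write y once
    return flops / traffic


if __name__ == "__main__":
    print("dense GEMM, n=1000:      ", gemm_intensity(1000))
    print("CSR SpMV, 5 nnz per row: ", csr_spmv_intensity(5_000_000, 1_000_000))
```

Under these assumptions the dense ratio grows with n, so blocking for cache can keep the CPU busy, while the sparse ratio stays well below 1 no matter how large the matrix is, so memory bandwidth sets the limit.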
Originally posted by RacingPHT on 2007-11-26 18:21
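Thanks, I learned something.
But what if there is simply no way to bring the FSB load down? You surely won't deny the importance of sparse matrices for scientific/simulation computing. The same goes for dense matrices: the algorithm itself has a fixed flop/bw ratio.
Fortunately, for commonly used programs, cases where the working set exceeds 4MB ... |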
|