PCINLIFE Special Feature: CUDA Explained in Plain Terms, by hotball

RacingPHT (user since deleted) · posted 2008-6-10 11:16

Notice: this author has been banned or deleted; the content is hidden automatically.

Edison · posted 2008-6-10 11:34

CUDA can in fact be said to have an ISA of sorts: PTX.

RacingPHT (user since deleted) · posted 2008-6-10 11:44

Notice: this author has been banned or deleted; the content is hidden automatically.

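Edison · posted 2008-6-10 12:06

The whole point of CUDA is that developers can write programs without having to deal much with the ISA. Even if the opcodes were handed to you, what would you actually do with them? CTM is a case in point.

As for CUDA 1.0 hardware being unable to run CUDA 1.1 code, that is not surprising. It is like Power and other CPUs coming in different versions. The difference is that CUDA is tied directly to the GPU hardware, and GPUs evolve quickly, so naturally an older GPU cannot run code targeted at a newer one.

That said, all the CUDA 1.1 code I have on hand runs on G80.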

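RacingPHT (user since deleted) · posted 2008-6-10 14:34

Notice: this author has been banned or deleted; the content is hidden automatically.

RacingPHT (user since deleted) · posted 2008-6-10 14:41

Notice: this author has been banned or deleted; the content is hidden automatically.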

痞子不俗 · posted 2008-6-10 21:58

Here to learn!

Edison · posted 2008-6-11 10:51

Here is a Q&A I would recommend; it is well suited to general readers.

http://cuda.csdn.net/News.aspx?i ... 5-8930-496e06ca439c

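complexmind · posted 2008-6-12 08:04

Originally posted by Prescott on 2008-6-7 05:12 PM
Ct is not Larrabee's programming language.
Let me stress this once more: Ct is only a research project, not a product.

Edison is the same as he was a few years ago, still an advocate of brute-force compute power. What he lacks is a basic understanding of program design, so he is easily drawn in by the sky-high theoretical compute figures of Cell/CUDA, and so ...

Good to see you again, P. My respects :-) Reading everyone's posts feels like going back to the old arguments over how much of their theoretical performance the EE and Cell could actually deliver. In every case it was the highly parallel architecture that came up short... It seems parallel computing only has the edge in brute-force workloads... And what NVIDIA wants is for the GPU to take over certain parallel computations that used to run on the CPU, not to replace the CPU.

[ Last edited by complexmind on 2008-6-12 08:28 ]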

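deem · posted 2008-6-14 16:30

It really comes down to the application domain and the type of program. For compute-intensive work such as fluid dynamics and atmospheric analysis, the advantage of things like Cell and CUDA goes without saying: hammering away at computation on a small block of data is exactly what they are built for. In practice, it is also the only thing they are good at.

At the end of 2006 I spent some time in an IBM Cell workshop (run by IBM itself). Their expectations for it were also outside general-purpose computing. The main customers they hoped for were in oil, weather forecasting, energy and similar industries, and they held out little hope for other sectors. When the programming model came up, their engineers also admitted that optimizing for it really is too hard, and that many of the numbers are theoretical or demo performance that is almost impossible to reach in real applications.

The latency hiding mentioned in the article is only easy to achieve in some compute-intensive programs; doing it in a large algorithm is a nightmare.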

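deem · posted 2008-6-14 16:31

Also, today's CPUs already have data prefetch instructions. If you optimize a program thoroughly and hide the memory access latency, you can get a huge performance gain there as well. As for how hard that is......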

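complexmind · posted 2008-6-15 11:45

Originally posted by deem on 2008-6-14 04:31 PM
Also, today's CPUs already have data prefetch instructions. If you optimize a program thoroughly and hide the memory access latency, you can get a huge performance gain there as well. As for how hard that is......

Still easier than optimizing for Cell, surely?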

Edison · posted 2008-6-19 12:45

The CUDA 2.0 beta 2 documentation is out; Section 5.1 includes a part describing compute capability:

A.1.2 Specifications for Compute Capability 1.1
Support for atomic functions operating on 32-bit words in global memory (see Section 4.4.4).

A.1.3 Specifications for Compute Capability 1.2
Support for atomic functions operating in shared memory and atomic functions operating on 64-bit words in global memory (see Section 4.4.4);
Support for warp vote functions (see Section 4.4.5);
The number of registers per multiprocessor is 16384;
The maximum number of active warps per multiprocessor is 32;
The maximum number of active threads per multiprocessor is 1024.

A.1.4 Specifications for Compute Capability 1.3
Support for double-precision floating-point numbers.
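
A quick way to read the excerpt above, and the earlier point about older GPUs not running code aimed at newer ones, is as a cumulative feature table. The sketch below is only an illustration in Python: the feature labels and the helper names `features_for` and `can_run` are made up for this note, and it assumes each 1.x level includes everything from the levels below it.

```python
# Hypothetical lookup table built only from the appendix excerpts quoted above.
# Each entry lists what that compute capability ADDS over the previous one.
# The feature labels are invented for this sketch; they are not CUDA identifiers.
ADDED_FEATURES = {
    (1, 0): set(),
    (1, 1): {"atomics_global_32bit"},
    (1, 2): {"atomics_shared", "atomics_global_64bit", "warp_vote"},
    (1, 3): {"double_precision"},
}


def features_for(major, minor):
    """Return the cumulative feature set for a 1.x compute capability.

    Assumes (as a simplification) that every level inherits all features
    of the lower levels listed in ADDED_FEATURES.
    """
    target = (major, minor)
    if target not in ADDED_FEATURES:
        raise ValueError("unknown compute capability: %d.%d" % target)
    result = set()
    for cc in sorted(ADDED_FEATURES):
        if cc <= target:
            result |= ADDED_FEATURES[cc]
    return result


def can_run(required_features, major, minor):
    """True if a device of the given capability offers every required feature."""
    return set(required_features) <= features_for(major, minor)


if __name__ == "__main__":
    # A kernel that uses 32-bit global atomics needs at least capability 1.1.
    print(can_run({"atomics_global_32bit"}, 1, 0))
    print(can_run({"atomics_global_32bit"}, 1, 1))
```

Read this way, code that uses none of the added features has no reason to fail on a 1.0 part, which fits the observation that the CUDA 1.1 code on hand runs on G80.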

Edison · posted 2008-6-19 13:02

Specifications for Compute Capability 1.0

The maximum number of threads per block is 512;
The maximum sizes of the x-, y-, and z-dimension of a thread block are 512, 512, and 64, respectively;
The maximum size of each dimension of a grid of thread blocks is 65535;
The warp size is 32 threads;
The number of registers per multiprocessor is 8192;
The amount of shared memory available per multiprocessor is 16 KB organized into 16 banks (see Section 5.1.2.5);
The total amount of constant memory is 64 KB;
The cache working set for constant memory is 8 KB per multiprocessor;
The cache working set for texture memory varies between 6 and 8 KB per multiprocessor;
The maximum number of active blocks per multiprocessor is 8;
The maximum number of active warps per multiprocessor is 24;
The maximum number of active threads per multiprocessor is 768;
For a texture reference bound to a one-dimensional CUDA array, the maximum width is 2^13;
For a texture reference bound to a two-dimensional CUDA array, the maximum width is 2^16 and the maximum height is 2^15;
For a texture reference bound to a three-dimensional CUDA array, the maximum width is 2^11, the maximum height is 2^11, and the maximum depth is 2^11;
For a texture reference bound to linear memory, the maximum width is 2^27;
The limit on kernel size is 2 million PTX instructions;
Each multiprocessor is composed of eight processors, so that a multiprocessor is able to process the 32 threads of a warp in four clock cycles.
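
The limits above interact: how many blocks can be resident on one multiprocessor depends on whichever resource runs out first. A rough Python sketch of that arithmetic follows, using only the compute capability 1.0 numbers from the list. The names `CC10`, `cycles_per_warp_instruction` and `active_blocks_per_sm` are invented for this note, and the model deliberately ignores any allocation granularity the hardware may apply to registers and shared memory, so treat the result as an upper bound rather than what a real device reports.

```python
# Limits copied from the compute capability 1.0 list quoted above.
CC10 = {
    "max_threads_per_block": 512,
    "warp_size": 32,
    "registers_per_sm": 8192,
    "shared_mem_per_sm": 16 * 1024,  # bytes
    "max_blocks_per_sm": 8,
    "max_warps_per_sm": 24,
    "max_threads_per_sm": 768,
    "processors_per_sm": 8,
}


def cycles_per_warp_instruction(limits=CC10):
    """Clock cycles for one multiprocessor to process a warp: 32 threads / 8 processors."""
    return limits["warp_size"] // limits["processors_per_sm"]


def active_blocks_per_sm(threads_per_block, regs_per_thread,
                         shared_bytes_per_block, limits=CC10):
    """Upper bound on resident blocks per multiprocessor (simplified model)."""
    if not 1 <= threads_per_block <= limits["max_threads_per_block"]:
        raise ValueError("threads_per_block is outside the per-block limit")
    # Threads are scheduled in whole warps, so round up.
    warps_per_block = -(-threads_per_block // limits["warp_size"])
    candidates = [
        limits["max_blocks_per_sm"],
        limits["max_warps_per_sm"] // warps_per_block,
        limits["max_threads_per_sm"] // threads_per_block,
    ]
    if regs_per_thread > 0:
        candidates.append(
            limits["registers_per_sm"] // (regs_per_thread * threads_per_block))
    if shared_bytes_per_block > 0:
        candidates.append(
            limits["shared_mem_per_sm"] // shared_bytes_per_block)
    # The scarcest resource decides.
    return min(candidates)


if __name__ == "__main__":
    print(cycles_per_warp_instruction())
    print(active_blocks_per_sm(256, 10, 4096))
    print(active_blocks_per_sm(256, 16, 0))
```

Under this model, 256-thread blocks at 10 registers per thread fit three to a multiprocessor (the full 768 threads), while going to 16 registers per thread drops that to two, because 8192 registers no longer cover a third block.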

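Bocelli · posted 2008-6-21 07:45

Very informative. I tend to agree with Prescott: NVIDIA's CUDA got the head start, but Intel's determination should not be underestimated.

[ Last edited by Bocelli on 2008-6-21 08:02 ]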

Eji · posted 2008-6-21 16:12

All I saw in a certain P's posts was "CPU is all".... :p

los_parrot · posted 2008-6-22 00:07

It should be "Intel is all".

That is simply Intel's way of thinking (no disrespect intended).

littlemouse · posted 2008-6-24 17:05

What is the difference between CUDA and OpenCL?
Isn't NVIDIA an OpenCL member as well?

Prescott · posted 2008-6-24 23:41

Such big good news, and E hasn't reposted it?
http://anandtech.com/video/showdoc.aspx?i=3339

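Edison · posted 2008-6-25 02:15

Originally posted by Prescott on 2008-6-24 23:41
Such big good news, and E hasn't reposted it?
http://anandtech.com/video/showdoc.aspx?i=3339

On my end an 8800GS currently runs at the same speed as a GTX280, so I suspect the program has not been optimized yet. NVIDIA says future software versions can reach close to 200 fps for 480x360 compression; right now it is 130 fps, and 720p might reach 100 fps.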
