AMD next-generation GPU discussion thread [37 architecture slides added]

Edison · posted 2011-6-17 13:25

What is known so far:

http://www.realworldtech.com/for ... did=120411&roomid=2

Real caching in L1, L2, separate color/z caches for graphics and atomics
Concurrent tasks
Out of order resource allocation
ECC on srams and drams

Shaders
No vliw, just multiple issue simd
Branch, scalar, vector, vector memory, export units
4x16 wide vector ALUs

Some of the GPGPU improvements include:
1. Real caches
2. Graphs of data parallel kernels
3. Exceptions, recursion, function calls
4. Better branching, predication, masking, control flow
5. No more VLIW, instead a scalar+vector arch with fewer scheduling rules and more regular code generation
6. Acquire/release consistency model
7. ECC support for some SKUs
8. Substantially better DP performance for some SKUs
9. Faster global atomics

Architecture slides added:
http://www.hardware.fr/news/11648/afds-architecture-futurs-gpus-amd.html

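rickerlian · posted 2011-6-17 17:44

I am very optimistic about Fusion. Parallel programming libraries already exist, so in the future the programmer will not need to decide whether code runs on the CPU or the GPU; the runtime can schedule it according to the hardware resources actually available.
Fusion will make such runtimes easier to write and faster to respond, because CPU and GPU memory sit in the same physical pool and data no longer has to be copied back and forth between the CPU and the GPU.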
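
The scheduling argument above can be sketched as a simple cost model: a runtime offloads to the GPU only when copy time plus GPU compute time beats the CPU time, and shared memory removes the copy term. This is a sketch under stated assumptions, not how any real runtime decides. The function names, the "output is the same size as the input" simplification, and every number in the usage note are hypothetical.

```python
from typing import Optional


def offload_time_s(
    data_bytes: int,
    gpu_compute_s: float,
    link_bytes_per_s: Optional[float] = None,
) -> float:
    """Estimated wall time for running a kernel on the GPU.

    link_bytes_per_s=None models memory shared by CPU and GPU (no copies).
    Otherwise the data is copied to the device and the results copied back,
    assuming (hypothetically) that the output is as large as the input.
    """
    if link_bytes_per_s is None:
        return gpu_compute_s
    copy_s = 2 * data_bytes / link_bytes_per_s  # one copy in, one copy out
    return copy_s + gpu_compute_s


def pick_device(
    data_bytes: int,
    cpu_compute_s: float,
    gpu_compute_s: float,
    link_bytes_per_s: Optional[float] = None,
) -> str:
    """Return "gpu" if offloading is estimated to be faster, else "cpu"."""
    gpu_total_s = offload_time_s(data_bytes, gpu_compute_s, link_bytes_per_s)
    return "gpu" if gpu_total_s < cpu_compute_s else "cpu"
```

With made-up figures of 64 MB of data, 10 ms on the CPU, 4 ms on the GPU and an 8 GB/s link, `pick_device(64_000_000, 0.010, 0.004, 8e9)` returns `"cpu"` because the two copies alone take 16 ms. Dropping the link argument to model shared memory makes the same call return `"gpu"`.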

Edison · posted 2011-6-17 19:22

According to Digitimes today (2011-06-17), TSMC has secured AMD's 28nm GPU orders.

http://www.digitimes.com/news/a20110617PB200.html

AMD reportedly has completed the tape-out of its next-generation GPU, codenamed Southern Islands, on Taiwan Semiconductor Manufacturing Company's (TSMC) 28nm process with High-k Metal Gate (HKMG) technology, according to a Chinese-language Commercial Times report. The chip is expected to enter mass production at the end of 2011.
TSMC will also be AMD's major foundry partner for the 28nm Krishna and Wichita accelerated processing units (APUs), with volume production set to begin in the first half of 2012, the report said.

TSMC reportedly contract manufactures the Ontario, Zacate and Desna APUs for AMD as well as the Northern Island family of GPUs. All of these use the foundry's 40nm process technology.

TSMC was quoted as saying in previous reports that it had begun equipment move-in for the phase one facility of a new 12-inch fab (Fab 15) with volume production of 28nm technology products slated for the fourth quarter of 2011. The foundry previously said it would begin moving equipment into the facility in June, with volume production expected to kick off in the first quarter of 2012.

Foundry partners for next-generation AMD APU and GPU series

Product line                         2011                        2012
-----------------------------------  --------------------------  -----------------------------------------
Mainstream and high-end APU          Llano                       Trinity
  Foundry partners                   Globalfoundries 32nm SOI    Globalfoundries 32nm SOI
Entry-level APU targeting tablets    Ontario/Zacate/Desna        Krishna/Wichita
  Foundry partners                   TSMC 40nm                   TSMC 28nm HKMG, Globalfoundries 28nm HKMG
GPU                                  Northern Islands            Southern Islands
  Foundry partners                   TSMC 40nm                   TSMC 28nm HKMG

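asdfjkl · posted 2011-6-17 20:00

rickerlian posted 2011-6-17 17:44
I am very optimistic about Fusion. Parallel programming libraries already exist, so in the future the programmer will not need to decide whether code runs on the CPU or the GPU ...

Fusion is limited by memory bandwidth, so a GPU above a certain size cannot be integrated into the CPU. NVIDIA currently pushes only its top-end GF100 into the HPC market; GF104 and below are sold only as graphics cards.
The GPUs AMD is integrating today are all low-end parts. For one thing, low-end GPUs are designed from the start with graphics performance as the priority; for another, their compute capability (meaning DP GFLOP/s) is very weak.
AMD's low-end parts do not support DP yet, do they?
Compared with NVIDIA's approach, an APU reduces the overhead of CPU-GPU interaction, but its drawback is the one I described above.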
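
The bandwidth limit raised in this post can be put into rough numbers: peak memory bandwidth is the bus width in bytes multiplied by the transfer rate. A minimal sketch follows. The function name is hypothetical, and the transfer rate in the usage note is an illustrative assumption, not a figure from this thread.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits:  total width of the memory interface, in bits
    transfers_per_s: data transfers per second on each pin (e.g. 1600e6)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_s / 1e9
```

For example, a 128-bit interface at an assumed 1600 MT/s gives `peak_bandwidth_gb_s(128, 1600e6)`, about 25.6 GB/s, and in an APU that budget is shared between the CPU cores and the GPU.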

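rickerlian · posted 2011-6-17 21:37

To the poster above: do not forget that 2-3 years ago the HD 3690 ran on 128-bit DDR3 with 320 SPs.
Besides, parallel computing also matters for some logic-heavy processing, where floating-point performance is not needed. If you are interested, see http://msdn.microsoft.com/en-us/library/gg675934.aspx, which introduces Microsoft's PPL parallel computing library. I expect PPL to play a big role on APUs in the future, and programs that use PPL should run faster on an APU without any code changes.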

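ptmd · posted 2011-6-17 23:05

Last edited by ptmd on 2011-6-17 23:06

asdfjkl posted 2011-6-17 20:00
Fusion is limited by memory bandwidth, so a GPU above a certain size cannot be integrated into the CPU. NVIDIA currently pushes only its top-end GF100 into the HPC market; GF104 or ...

For now APUs are confined to the consumer market, because the heterogeneous computing ecosystem has not yet been established in the server market; it is only just getting started. It is easier to push in the consumer market, where PC gaming provides support.

Their HPC solution is still CPUs with accelerators attached over PCIe buses.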

disruptor · posted 2011-6-18 10:57

This looks a lot like Fermi. AMD's technology is two years behind NVIDIA's.

Edison · posted 2011-6-18 11:02

disruptor posted 2011-6-18 10:57
This looks a lot like Fermi. AMD's technology is two years behind NVIDIA's.

I think FSA is more similar to Kepler.

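ptmd · posted 2011-6-18 15:03

disruptor posted 2011-6-18 10:57
This looks a lot like Fermi. AMD's technology is two years behind NVIDIA's.

GCN resembles Larrabee, or the Cray machines of old (scalar + vector), more than it resembles Fermi. Is it like Fermi? Perhaps.

Either way, whether it is behind and whether it is similar are both subjective judgments.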

shadowxp (user deleted) · posted 2011-6-18 17:37

Notice: the author has been banned or deleted; the content is hidden automatically.

disruptor · posted 2011-6-19 11:02

FSA also has many CUDA 4.0 features.

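66666 · posted 2011-6-19 11:13

From the diagram, AMD still hangs the CPU and GPU on a ring bus to share data. Why not a shared L2 and L3 cache mechanism like SNB's? For combined CPU and GPU general-purpose computing, a shared cache is clearly far more efficient than going over a bus.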

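Edison · posted 2011-6-19 15:30

66666 posted 2011-6-19 11:13
From the diagram, AMD still hangs the CPU and GPU on a ring bus to share data. Why not a shared L2 and L3 cache mechanism like SNB's? CPU and GPU ...

That will certainly be implemented eventually. With the current manufacturing process and the APU's product positioning, AMD feels that building an iL3 cache is not worth the cost.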
