NVIDIA NV50 Execution Unit Pipeline Architecture Diagrams

Edison · posted 2006-5-3 15:01

215: Pixel Input Buffer
220: Vertex Input Buffer
225: Texture Unit

When execution of an instruction is complete, Execution Unit 470 updates Resource Scoreboard 460 to indicate that destination registers are written and the computation resources used to process the instruction are available. In an alternate embodiment, Resource Scoreboard 460 snoops an interface between Execution Unit 470 and Register File 350 to update register status.
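
As a rough illustration of the scoreboard behavior described above, here is a minimal Python sketch. The class and method names (`ResourceScoreboard`, `can_issue`, `issue`, `complete`) and the issue-time check are assumptions made for illustration; the patent excerpt only states that the scoreboard is updated when an instruction completes.

```python
class ResourceScoreboard:
    """Toy model: tracks pending register writes and free compute units."""

    def __init__(self, num_units):
        self.pending_regs = set()    # destination registers not yet written
        self.free_units = num_units  # computation resources currently available

    def can_issue(self, src_regs, dst_reg):
        # Assumed rule: issue only if a unit is free and none of the
        # registers involved is still waiting on an earlier write.
        regs = set(src_regs) | {dst_reg}
        return self.free_units > 0 and not (regs & self.pending_regs)

    def issue(self, src_regs, dst_reg):
        if not self.can_issue(src_regs, dst_reg):
            return False
        self.pending_regs.add(dst_reg)
        self.free_units -= 1
        return True

    def complete(self, dst_reg):
        # Mirrors the text: on completion, the destination register is
        # marked written and the compute resource becomes available again.
        self.pending_regs.discard(dst_reg)
        self.free_units += 1
```

In this sketch, an instruction that reads a register still being written is simply held back until `complete` is called for that register.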

When the program instructions associated with a thread have completed execution, the storage resources allocated to retain intermediate data generated during execution of the thread become available for allocation to another thread, i.e., the storage resources are deallocated and the thread is flagged as available in Thread Control Unit 420. When a program instruction stored in Instruction Cache 410 has completed execution on each sample within the one or more sets that the program instruction is programmed to process, the program instruction is retired from Instruction Cache 410 (by being overwritten).

In step 510 Thread Control Unit 320 or 420 receives a sample, e.g., vertex, fragment, pixel, and the like. In step 515 Thread Control Unit 320 or 420 determines if the sample is a vertex sample or a pixel sample, and if the sample is a vertex sample Thread Control Unit 320 or 420 proceeds to step 530. In step 530 Thread Control Unit 320 or 420 assigns a vertex thread to the vertex sample to be processed by a vertex program.
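
Steps 510 through 530 can be sketched roughly as follows. This is only an illustration: the `Thread` record, the `receive` method, and the `is_vertex` flag are hypothetical names, and the pixel branch is assumed to mirror the vertex branch, since the excerpt above only spells out the vertex case.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    thread_id: int
    kind: str       # "vertex" or "pixel"
    sample: object  # the sample this thread will process


class ThreadControlUnit:
    """Toy front end that tags each incoming sample with a thread type."""

    def __init__(self):
        self.next_id = 0

    def receive(self, sample, is_vertex):
        # Step 510: a sample arrives. Step 515: check its type.
        kind = "vertex" if is_vertex else "pixel"
        # Step 530 (vertex case): assign a vertex thread to the sample.
        # The pixel case is an assumption and is handled the same way here.
        thread = Thread(self.next_id, kind, sample)
        self.next_id += 1
        return thread
```

Note that the type check only decides how the thread is labeled; nothing in this sketch implies that vertex and pixel threads run on different hardware.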

The maximum number of threads that can be executed simultaneously is related to the number of Execution Pipelines 240, the size of storage for thread state data, the amount of storage for intermediate data generated during processing of a sample, the latency of Execution Pipelines 240, and the like. Likewise, a number of threads of each sample type that may be executed simultaneously may be limited in each embodiment. Therefore, not all samples within a first set of samples of a first type can be processed simultaneously when the number of threads available for processing samples of the first type is less than the number of samples of the first type. Conversely, when the number of threads available for processing samples of a second type exceeds the number of samples of the second type within a second set, more than one set can be processed simultaneously. When processing throughput is limited for samples of the first type, the number of threads available for the first type may be increased by allocating unused threads for processing samples of the first type. For example, locations in Portion 620 may be allocated to Portion 630.
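
The reallocation idea in the last sentences can be sketched as a pool with per-type capacity. Everything here is assumed for illustration (the `ThreadPool` class, its method names, and the slot counts); the excerpt only says that unused thread locations of one portion may be allocated to another.

```python
class ThreadPool:
    """Toy model of thread slots partitioned by sample type."""

    def __init__(self, slots_by_type):
        self.capacity = dict(slots_by_type)          # slots owned by each type
        self.in_use = {t: 0 for t in slots_by_type}  # slots currently running

    def free_slots(self, sample_type):
        return self.capacity[sample_type] - self.in_use[sample_type]

    def allocate(self, sample_type, count):
        # Grant as many threads as are free; the remaining samples must wait.
        granted = min(count, self.free_slots(sample_type))
        self.in_use[sample_type] += granted
        return granted

    def rebalance(self, donor, receiver, count):
        # Move unused slots from one type to another, analogous to
        # reassigning locations from one portion of storage to another.
        moved = min(count, self.free_slots(donor))
        self.capacity[donor] -= moved
        self.capacity[receiver] += moved
        return moved
```

For example, with 4 vertex and 4 pixel slots, a request for 6 pixel threads is only partly granted until idle vertex slots are moved over with `rebalance`.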

I'll go through it in more detail when I have time.

Edison · posted 2006-5-3 15:05

United States Patent  7,015,913 Lindholm , et al.  March 21, 2006
United States Patent  7,038,686 Lindholm  May 2, 2006
United States Patent  7,038,685 Lindholm  May 2, 2006

gzeasy2006 · posted 2006-5-3 15:09

Originally posted by Edison on 2006-5-3 15:01
215: Pixel Input Buffer
220: Vertex Input Buffer
225: Texture Unit

When execution of an instruction is complete, Execution Unit 470 updates Resource Scoreboard 460 to indicate that destinati ...

Boss, I can't read English. Could you translate it?

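Edison · posted 2006-5-3 15:28

This is purely a patent document; it does not reveal what the final product will look like.

NV50 can be used for both DX9 and DX10.

NV50 has nothing to do with RSX.

On low-end products, given current manufacturing processes, implementing a common shader may not be very economical.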

samsung · posted 2006-5-3 18:49

My prediction is that NV50 is a 96:24 or 64:16 architecture.
G83 might be 32:8 (48:12 is probably not feasible for cost reasons).

Edison · posted 2006-5-3 23:45

NV50 is only the basic concept behind NVIDIA's US architecture, not a specific product, so don't read too much into shader:TMU ratios.

布莱德舰长 · posted 2006-5-15 18:47

I'm a newbie. What does US mean?

mrhjzhang · posted 2006-5-18 01:15

Just passing by. Great thread; leaving my name here so I can read it properly later.

waveszhang · posted 2006-5-20 11:20

US means the United States.....

Eji · posted 2006-5-20 14:35

It does indeed look like Unified Shader.....
If G80 is NV50-based, then it pretty much has to be unified, right?

gzeasy2006 · posted 2006-5-22 18:21

Originally posted by Eji on 2006-5-20 14:35
It does indeed look like Unified Shader.....
If G80 is NV50-based, then it pretty much has to be unified, right?

Will it use 65nm?

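shu0202 · posted 2006-6-26 13:43

It is the code name for an architecture family. NVIDIA wants a long-lived architecture on which the next several product generations will be built. GPUs are now so complex that designing a brand-new architecture every two generations is no longer affordable, and API evolution is also slowing down, so the scalability of the base architecture is the first thing to consider. G80 should be among the first experimental products of this architecture.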

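ayanamei · posted 2006-7-9 22:46

You can see that NV50 does not intend to unify the shader units in hardware the way ATI does;
it still sends VS and PS work to their own units.
The whole flow feels much more cumbersome, although the ALU resources do not need to be as complex as ATI's.
Is the idea to use, for the same transistor count, more independently working ALU groups
to compete against US groups that are more complex and therefore necessarily fewer?

In principle, a design like this should introduce some latency along the way, so that negative effect should be present.
As for the efficiency of a single dedicated VS or PS unit compared with a US unit, the former will certainly be higher.
The advantage of US is that it can flexibly adjust the VS:PS ratio to suit the actual workload.

A design that puts raw efficiency first vs. a design built for flexibility??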

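Edison · posted 2006-7-10 05:25

This is a common shader architecture through and through. I don't understand how you can still call it a separate architecture. Why do people still believe the story ATI spreads everywhere, that G80 is a traditional architecture plus a geometry shader? Are NVIDIA's products designed by ATI? Would something David Kirk said in 2004 about ATI's possible US architecture really force a product shipping in volume three years later to keep using the old architecture? Is your thinking really that conservative?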

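ayanamei · posted 2006-7-10 14:19

Originally posted by Edison on 2006-7-10 05:25
This is a common shader architecture through and through. I don't understand how you can still call it a separate architecture. Why do people still believe the story ATI spreads everywhere, that G80 is a traditional architecture plus a geometry shader? Are NVIDIA's products designed by ATI? Would something David Kirk said in 2004 about ...

The material above does not cover the final execution stage.
This whole thing is basically the thread management part.
It feels more and more like a CPU front end.
What happens after a vertex thread / pixel thread is assigned is the real point, isn't it?
If the shader execution units can run VS and PS work without distinction, is it really necessary to keep checking the shader type like this?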

来不及思考 · posted 2006-7-11 23:09

Notice: the author has been banned or deleted; the content is automatically hidden.

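ayanamei · posted 2006-7-14 16:13

Originally posted by 来不及思考 on 2006-7-11 23:09

Your eyes are playing tricks on you.
Where in these diagrams do you see a separate architecture?

The diagrams distinguish VS threads from PS threads, so the two should take different paths.
If they ended up in the same place, why make such a point of the distinction?
In the end they are presumably assigned to execution units of different kinds.
At least that is what I think for now.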

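Eji · posted 2006-8-23 11:44

Originally posted by ayanamei on 2006-7-14 16:13

The diagrams distinguish VS threads from PS threads, so the two should take different paths.
If they ended up in the same place, why make such a point of the distinction?
In the end they are presumably assigned to execution units of different kinds.
At least that is what I think for now.

Because the VS output has to be sent to the Triangle Setup Engine,
you always need some kind of flag so you know where to send the result once it is done;
vertex textures also behave somewhat differently from ordinary textures,
so even if both "use the same hardware", I don't think that means no extra identification or definition is needed.
In other words, I see this as a case of "a unit with two modes whose mode is set by identification at the front end".
If it were a separate architecture, there would of course be even less need to identify anything.
The shader units' positions in the pipeline do not change, so a VS always receives vertices and a PS always receives pixels.
Why identify anything? Just run.

So, on the contrary, "having to tell whether a thread is vertex or pixel" is precisely the evidence that this must be a US architecture.

[ Last edited by Eji on 2006-8-23 11:46 ]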

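ayanamei · posted 2006-10-8 10:53

Originally posted by Eji on 2006-8-23 11:44

Because the VS output has to be sent to the Triangle Setup Engine,
you always need some kind of flag so you know where to send the result once it is done;
vertex textures also behave somewhat differently from ordinary textures,
so even if both "use the same hardware" ...

But if the GPU is not US internally while DX10 does not distinguish VS/PS, the GPU would still have to determine the shader type in order to execute it correctly, wouldn't it?
My personal feeling is that it is US at the interface level but implemented in a non-US way internally.

[ Last edited by ayanamei on 2006-10-8 10:54 ]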

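Eji · posted 2006-10-22 00:39

Originally posted by ayanamei on 2006-10-8 10:53

But if the GPU is not US internally while DX10 does not distinguish VS/PS, the GPU would still have to determine the shader type in order to execute it correctly, wouldn't it?
My personal feeling is that it is US at the interface level but implemented in a non-US way internally.

? The DX10 pipeline itself still distinguishes VS/PS; it is rather the shader internals that operate regardless of the type of work.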
