NVIDIA next-generation architecture "Fermi": speculation and discussion thread

RacingPHT (account deleted) · posted 2009-6-1 12:29

Notice: the author has been banned or deleted; content hidden automatically.

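ic.expert · posted 2009-6-8 22:39

Heh, I genuinely welcome differing academic views. But we do need to standardize our terminology so that everyone can communicate more smoothly; the literature from organizations such as the ACM is a good reference here. Holding ourselves to foreign conventions does not mean deferring to Westerners on everything. It is just that we are not yet in a position to define the terms ourselves, so we can only follow other people's standards as closely as possible.

Prefetch instructions do exist in CPUs; in effect they use the cache as an extended register file. They are defined that way only to make things easier for programmers to understand. Of course, the cache itself also has HW speculative prefetching mechanisms. But none of this is prefetch in the broad sense. Many things that look ambiguous actually have standard definitions. For example, are vector and SIMD the same thing? Which is the narrower term, and which the broader?

Now back to graphics. That paper is well known and, I think, deserves a careful read. My post went up at 2009-6-1 12:14 and yours at 2009-6-1 12:29, so you spent 15 minutes in total on reading the paper and writing your reply. I do not think that is enough time to finish the paper. The paper also has many references, and those references have references of their own. Building up a body of knowledge takes a great deal of reading.

For example, LRB (Larrabee) and NV50 both fall under TLP (thread-level parallelism), but they are entirely different kinds of TLP: one is FGMT (fine-grained multithreading) and the other is CGMT (coarse-grained multithreading). Where does each have the advantage? Another example: in the SGRAM era, Fast-Z Clear was supported directly by the memory (that is, a single memory access could set values across rows in memory), but in the GDDR era this feature was dropped from the memory. Why? And again, what is the standard term for the "long FIFO" you mentioned? In which papers has it been studied extensively? What similar architectures have been proposed based on the ideas in those papers? This calls for more careful reading of the literature, more thought, and more standardized terminology, just as we standardize the documentation in our everyday projects.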

RacingPHT (account deleted) · posted 2009-6-9 09:39

Notice: the author has been banned or deleted; content hidden automatically.

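ic.expert · posted 2009-6-9 10:42

As the saying goes, "the lessons of the ages in one's heart, a myriad feelings under one's pen." I do not know how I would manage a fighter of such force if I ever had one on my team, or how gently I would have to go about it to win them over. But I think anyone with N years of work experience generally understands what is meant by "he who has thunder in his chest but a face as calm as a lake may be made a great general," and by "he who holds the whole world in his heart yet humbles himself before it may be called an emperor."

The ancients said, "The inferior ruler exhausts his own ability, the middling ruler exhausts the strength of others, the superior ruler exhausts the wisdom of others." In fact, the most skillful argument merely plants a suggestion and lets the other party, so prompted, suddenly see the light; managing a team works the same way. To do that, one has to listen carefully to all sides before analyzing. For someone with N years of work experience these principles must be second nature by now, and I am sure you are a person of just such breadth and ambition. So I think you should take another look at what I discussed in post #226, namely the times at which each of us posted, and at the related papers. Technical ideas in industry all originate in academia, and academia has no NDAs. Take Intel's Hyper-Threading: in academia the technique is called SMT, there are many papers on SMT, and Hyper-Threading is merely one implementation. One cannot say that because Hyper-Threading is under NDA, nobody is able to study SMT...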

RacingPHT (account deleted) · posted 2009-6-9 11:32

Notice: the author has been banned or deleted; content hidden automatically.

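ic.expert · posted 2009-6-9 19:44

Every control dependency can be converted into a data dependency, and branches are no exception; this can be handled in either HW or SW. If the HW is to support it, a CAM is required, and the number of entries in the CAM decides everything. In effect this amounts to a dataflow machine, whose underlying principle comes from a Turing-equivalent form of lambda expression; there were already many ACM papers on this in the 1980s. A prefetch-based architecture could in fact gain both flexibility and performance this way, but SM4 made all of that come to nothing...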
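As a minimal sketch of the first claim above (a control dependency rewritten as a data dependency), the Python below shows software if-conversion: both sides of a branch are computed unconditionally and a predicate selects the result arithmetically. The function names and the example computation are hypothetical and chosen only for illustration. The sketch covers the SW route only and does not model the CAM-based HW approach mentioned above.

```python
def branchy(x: int, a: int, b: int) -> int:
    """Control dependency: which statement runs depends on the branch."""
    if x > 0:
        return a + x
    return b - x


def if_converted(x: int, a: int, b: int) -> int:
    """Same result with no branch: the predicate becomes ordinary data."""
    p = int(x > 0)            # predicate stored as 0 or 1
    taken = a + x             # both sides are evaluated unconditionally
    not_taken = b - x
    return p * taken + (1 - p) * not_taken   # arithmetic select


# The two forms agree on every input in this small range.
for x in range(-3, 4):
    assert branchy(x, 10, 20) == if_converted(x, 10, 20)
```

The cost of this form is that both sides are always evaluated, which is one instance of the trade-off this post goes on to discuss.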

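Heh, technology is really just trade-offs. Without drawing on many sources and listening to all sides, how can one even talk about trade-offs? We will always need dissenting voices like the poster above. Right or wrong: even when the argument does not hold up, it gives us lessons to learn from, and when it does hold up, it lets us correct our mistakes and broaden our view. In any case, thanks to the poster above for the discussion and criticism.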

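ic.expert · posted 2009-6-10 06:46

As the old teaching goes: "When three walk together, one of them is sure to be my teacher; I pick out their good points and follow them, and their bad points and correct them in myself." Carry on the discussion below :> I hope this board thrives~

Edison · posted 2009-6-13 02:44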

http://patft.uspto.gov/netacgi/n ... IA&RS=AN/NVIDIA

SUMMARY OF THE INVENTION

Embodiments of the present invention provide a method and system for ensuring needed graphics rendering data (e.g., texture values, normal maps, etc.) can be maintained in low latency memory for an efficient access by the GPU. Embodiments of the present invention provide fast and efficient real-time 3-D graphics rendering by increasing the efficiency of cache memory access and by limiting the performance penalties resulting from accessing higher latency memory.

In one embodiment, the present invention is implemented as a GPU architecture configured for traversing pixels of an area. The GPU includes a set-up unit for generating polygon descriptions and a rasterizer unit coupled to the set-up unit for rasterizing the polygon descriptions. The rasterizer unit is configured to traverse a plurality of pixels of an image using a first boustrophedonic pattern along a predominant axis, and during the traversal using the first boustrophedonic pattern, traverse a plurality of pixels of the image using a second boustrophedonic pattern, wherein the second boustrophedonic pattern is nested within the first boustrophedonic pattern.

In one embodiment, the first boustrophedonic pattern and the second boustrophedonic pattern are implemented by a coarse rasterizer component within the raster unit of the GPU. In one embodiment, the GPU groups the plurality of pixels of the image as tiles and the tiles are traversed using the first boustrophedonic pattern and the second boustrophedonic pattern.

In one embodiment, the number of pixels per tile is programmable, and can be designated as 4×4, 8×8, 16×16, 32×32, 64×64, 128×128, or the like, including rectangular as well as square arrays, in accordance with the requirements of a graphics rendering operation. Similarly, the number of pixels per tile is programmable in accordance with a size of a cache memory of the GPU, and the predominant axis of the first boustrophedonic pattern and/or the second boustrophedonic pattern is programmable.

tile-based renderer :)?
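The nested traversal described in the summary above can be illustrated with a rough Python sketch. It assumes the predominant axis is x, that tiles are grouped into square blocks, and that each block starts its inner serpentine at the same corner. The function names and the `block` parameter are hypothetical, and the exact ordering in the patent may well differ.

```python
from typing import Iterator, List, Tuple


def boustrophedon(cols: int, rows: int) -> Iterator[Tuple[int, int]]:
    """Serpentine order over a cols x rows grid: left-to-right on even
    rows, right-to-left on odd rows."""
    for y in range(rows):
        xs = range(cols) if y % 2 == 0 else range(cols - 1, -1, -1)
        for x in xs:
            yield x, y


def nested_boustrophedon(width: int, height: int, block: int) -> List[Tuple[int, int]]:
    """Visit every tile of a width x height tile grid.

    The outer serpentine walks block x block groups of tiles; the inner
    serpentine walks the tiles inside each group.
    """
    order = []
    blocks_x = (width + block - 1) // block   # ceiling division
    blocks_y = (height + block - 1) // block
    for bx, by in boustrophedon(blocks_x, blocks_y):
        x0, y0 = bx * block, by * block
        w = min(block, width - x0)            # clip partial groups at the edges
        h = min(block, height - y0)
        for x, y in boustrophedon(w, h):
            order.append((x0 + x, y0 + y))
    return order


order = nested_boustrophedon(4, 4, 2)
print(order)
```

Because each group of tiles is finished before the next one starts, consecutively visited tiles stay close together, which is the kind of locality the summary ties to cache efficiency.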

RacingPHT (account deleted) · posted 2009-6-14 17:02

Notice: the author has been banned or deleted; content hidden automatically.

Edison · posted 2009-6-25 23:35

The name of the next-generation architecture is out: Fermi.

Edison · posted 2009-6-27 17:39

http://patft.uspto.gov/netacgi/n ... 1&RS=PN/7372471

This should be the patent for CSAA, which was introduced starting with Tesla.

BACKGROUND OF THE INVENTION
Computer graphics systems represent graphical primitives as pixel elements of a display. Aliasing refers to the visual artifacts, such as stair-stepping of surface edges (sometimes known as "jaggies"), that occur when an image is sampled at a spatial sample frequency that is too low.
A variety of anti-aliasing techniques exist to reduce visual artifacts. One example is supersampling, in which an image is sampled more than once per pixel grid cell and the samples are filtered. For example, in supersampling the contribution of each sample in a pixel may be weighted to determine attributes of the pixel, such as the pixel color. Another example of an anti-aliasing technique is multisampling. As objects are rendered in a multisampling system, a single color is typically computed per primitive and used for all subpixel samples covered by the primitive. Additional background information on anti-aliasing techniques used in graphics processing units (GPUs) may be found in several patents issued to the Nvidia Corporation of Santa Clara, Calif. such as U.S. Pat. No. 6,452,595, "Integrated graphics processing unit with anti-aliasing," U.S. Pat. No. 6,720,975, "Super-sampling and multi-sampling system and method for anti-aliasing," and U.S. Pat. No. 6,469,707, "Method for efficiently rendering color information for a pixel in a computer system," the contents of each of which are hereby incorporated by reference.
In both supersampling and multisampling the quality of the anti-aliasing tends to improve as the number of samples per partially covered pixel increases. For example, increasing the number of samples per pixel from four samples ("4×" sampling) to sixteen ("16×" sampling) would be expected to reduce visual artifacts. However, as the number of samples per pixel increases more sample data must be generated, stored, transported, and processed. Consequently, the required memory resources, computing resources, and memory bandwidth may increase as the number of samples per pixel is increased. As a result, the cost and complexity of a graphics system tends to increase as the number of samples used for anti-aliasing increases.
Therefore, what is desired is an improved apparatus, system, and method for anti-aliasing that achieves many of the same benefits associated with an increase in the number of samples per pixel but without the corresponding increase in cost and complexity associated with conventional anti-aliasing techniques.
SUMMARY OF THE INVENTION
A graphics system has a mode of operation in which it determines coverage of primitives at a single real sample location within a pixel and at least one virtual sample location within the pixel. In one embodiment, a virtual sample is a pointer that points to either the single real sample associated with the same pixel or to a neighboring proximate pixel. Information on virtual sample coverage is used to adjust the weights of real samples for a down-filtering process.
In one embodiment, generating weighted samples for anti-aliasing a pixel includes: determining coverage by primitives over a single real sample location per pixel and at least one virtual sample location per pixel; for each virtual sample location within a pixel, identifying a virtual sample as belonging to either said pixel or to a neighboring pixel proximate said virtual sample location; and utilizing virtual sample coverage information to adjust the weights of real samples with real samples of neighboring pixels.


And this one:
http://patft.uspto.gov/netacgi/n ... 9&RS=PN/7333119

generating at least one real sample for a primitive covering the pixel, the real sample including z depth data and color data for a sample location within the pixel; detecting coverage of at least one virtual sample location by said primitives within the pixel; for each covered virtual sample location within the pixel, forming a virtual sample by generating a pointer identifying a set of real sample locations within the pixel that are also covered by a common visible primitive; utilizing said at least one virtual sample to adjust the weight of at least one real sample for anti-aliasing; and displaying the anti-aliased pixel.
for each real sample r
    weight[r] = 1; // contribution of real sample itself
for each virtual sample
    determine closest real sample r with a 1 in the bitmask
    weight[r]++.
A table may be used to determine, for each virtual sample and coverage bit pattern, which eligible real sample is the closest.
The final pixel color is calculated during anti-aliasing using the weights calculated for the real samples. An exemplary algorithm for computing the final pixel color as a weighted average of the real sample colors (color[r]) is as follows
finalcolor = 0;
for each real sample r
    finalcolor += color[r] * weight[r] / (total samples).
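The pseudocode quoted above can be turned into a small runnable Python sketch. Several details here are assumptions rather than anything stated in the quoted text: sample positions are 2D points inside a unit pixel, "closest" means Euclidean distance, a virtual sample with an empty bitmask contributes nothing, and the result is normalized by the sum of the weights. The names `csaa_weights` and `resolve` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Color = Tuple[float, float, float]


def csaa_weights(real_pos: Sequence[Point],
                 virtual_pos: Sequence[Point],
                 masks: Sequence[int]) -> List[int]:
    """masks[v] has bit r set when virtual sample v may count toward real sample r."""
    weights = [1] * len(real_pos)             # contribution of each real sample itself
    for (vx, vy), mask in zip(virtual_pos, masks):
        eligible = [r for r in range(len(real_pos)) if (mask >> r) & 1]
        if not eligible:
            continue                          # assumption: unowned virtual sample adds nothing
        closest = min(
            eligible,
            key=lambda r: (real_pos[r][0] - vx) ** 2 + (real_pos[r][1] - vy) ** 2,
        )
        weights[closest] += 1
    return weights


def resolve(colors: Sequence[Color], weights: Sequence[int]) -> Color:
    """Weighted average of the real sample colors."""
    total = sum(weights)                      # real + virtual count when every mask is non-empty
    return tuple(
        sum(c[i] * w for c, w in zip(colors, weights)) / total
        for i in range(3)
    )


# Two real samples (red and blue) and three virtual coverage samples.
real_pos = [(0.25, 0.25), (0.75, 0.75)]
colors = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
virtual_pos = [(0.1, 0.9), (0.9, 0.6), (0.6, 0.9)]
masks = [0b01, 0b11, 0b10]

weights = csaa_weights(real_pos, virtual_pos, masks)
pixel = resolve(colors, weights)
print(weights, pixel)
```

In this example the extra coverage samples only shift the blend between the two stored colors; no new color is ever produced.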


RacingPHT (account deleted) · posted 2009-7-1 11:16

Notice: the author has been banned or deleted; content hidden automatically.

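Edison · posted 2009-7-1 13:00

As far as I can tell, CSAA tests triangle coverage at a large number of virtual sample locations in order to derive the percentage color weights of the real samples. I am not sure whether I have understood this correctly.

If so, then however many coverage samples are used, what you get is at most a handful of more precise real sample colors.

Apart from looking quite good in NVIDIA's own sample programs, CSAA really has little effect in many games. As for degradation, I do not seem to have seen any yet.

Attached below is the PDF of 7333119. It has been through OCR, so the text is searchable. :)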

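ic.expert · posted 2009-7-1 22:34

CSAA looks like a smarter AA technique. If it really is implemented this way, it could also be combined nicely with CFAA. That said, CSAA's down-filtering approach feels very likely to be tied up with the alpha-to-coverage mechanism; that is, the per-sampling-location coverage percentage parameters may well be reused by alpha to coverage. What do the experts think?

ic.expert · posted 2009-7-1 22:42

A question: what does Mr. Chen use for OCR? I always get garbled characters with ABBYY.

ic.expert · posted 2009-7-1 22:48

Last edited by ic.expert on 2009-7-1 23:13

Thinking about it some more, this amounts to a pre-filter, and a down-filter at that.

However, this mechanism still cannot soften the boundary where triangles intersect one another...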

Edison · posted 2009-7-2 00:10

A question: what does Mr. Chen use for OCR? I always get garbled characters with ABBYY.
ic.expert posted 2009-7-1 22:42

Adobe Acrobat Pro, Document -> OCR.

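Edison · posted 2009-7-2 00:27

I am not clear on that detail. According to NVIDIA it is quite widely applicable, but the effect really is nothing much. It might shine in wireframe mode; the trouble is that wireframe is used mainly by industrial software.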

ic.expert · posted 2009-7-2 02:18

Adobe Acrobat Pro, Document -> OCR.
Edison posted 2009-7-2 00:10

Thank you, Mr. Chen :>

Christ2002 (account deleted) · posted 2009-7-3 17:58

Notice: the author has been banned or deleted; content hidden automatically.
