POPPUR爱换

Thread starter: Edison

NVIDIA Next-Generation Architecture "Fermi": Speculation and Discussion Thread

221#
Posted on 2009-6-1 12:14
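Of course, my abilities are limited and my learning is shallow, so I hope the poster above will offer plenty of guidance and criticism :> Let me continue with my own views.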

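In academic usage, prefetch is not limited to prediction; prediction-based prefetch is only one aspect of it. This is already reflected in the RISC philosophy, which separates Load/Store instructions from ALU instructions: ALU instructions never access memory; instead, L/S instructions prefetch data into registers, and only then does ALU computation start. This is unlike CISC, where any ALU instruction can take an operand directly from memory. As for prefetching on GPUs, regarding the figure used as an example, see the SIGGRAPH paper "Prefetching in a Texture Cache Architecture".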
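To make "prefetch as decoupling access from use" concrete, here is a toy cycle-count model. It is a loose simplification inspired by the decoupled-fetch idea, not the architecture from that paper: every fragment is assumed to miss the cache, at most one fragment enters a FIFO per cycle, its memory request is issued on entry and returns after a fixed `latency`, and fragments leave in order once their data has arrived. The function name and parameters are hypothetical.

```python
from collections import deque


def cycles_to_texture(n_fragments: int, latency: int, fifo_depth: int) -> int:
    """Toy model of a prefetching texture pipeline (hypothetical, for illustration).

    A fragment's texel request is issued when it enters the FIFO and its data
    arrives `latency` cycles later. Fragments retire in order, at most one per
    cycle, once their data is present. Returns the total number of cycles.
    """
    if fifo_depth < 1:
        raise ValueError("fifo_depth must be at least 1")
    in_flight = deque()  # data-ready cycle of each fragment waiting in the FIFO
    cycle = issued = retired = 0
    while retired < n_fragments:
        # Retire the oldest fragment if its texel data has arrived.
        if in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            retired += 1
        # Admit a new fragment (issuing its memory request) if the FIFO has room.
        if issued < n_fragments and len(in_flight) < fifo_depth:
            in_flight.append(cycle + latency)
            issued += 1
        cycle += 1
    return cycle


# No decoupling (depth 1): every fragment pays the full latency.
print(cycles_to_texture(10, latency=50, fifo_depth=1))    # 501
# FIFO at least as deep as the latency: roughly one fragment per cycle.
print(cycles_to_texture(100, latency=50, fifo_depth=64))  # 150
```

Once the FIFO is at least as deep as the memory latency, the latency is paid roughly once for the whole stream instead of once per fragment.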

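The paper given at the end is meant to illustrate how streams operate; the operations fall mainly into two classes, bulk operations and bulk transfers. Mathematically, this programming model belongs to the domain of lambda expressions; it just happens to run on a Turing-machine implementation such as Cell.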

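Regarding the Vertex Stream and so on that the expert above mentioned at the end, perhaps I have misunderstood, but for Vertex Buffer accesses in the Input Assembler stage, current GPUs rely entirely on shader TLP (thread-level parallelism) to hide latency; AMD's publicly released documentation covers this. Moreover, the main bottleneck is texturing, and texturing relies entirely on TLP, so prefetch-based architectures have largely faded; TLP now dominates.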
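For contrast, hiding latency purely with TLP comes down to keeping enough warps resident that others can issue while one waits. A back-of-the-envelope sketch, assuming each warp issues a fixed amount of independent ALU work between memory accesses and that warp switching is free; the function name and the numbers are hypothetical, not figures for any real GPU.

```python
import math


def warps_to_hide_latency(mem_latency: int, alu_instrs_per_load: int,
                          cycles_per_instr: int = 1) -> int:
    """Rough count of resident warps needed so the ALUs never sit idle.

    While one warp waits `mem_latency` cycles for its load, the remaining warps
    must together supply at least that many cycles of ALU work.
    """
    work_per_warp = alu_instrs_per_load * cycles_per_instr  # ALU cycles per warp per load
    return 1 + math.ceil(mem_latency / work_per_warp)


print(warps_to_hide_latency(400, alu_instrs_per_load=10, cycles_per_instr=4))   # 11
print(warps_to_hide_latency(400, alu_instrs_per_load=100, cycles_per_instr=4))  # 2
```

The fewer ALU instructions per fetch, the more warps must be resident, which is why texture-bound shaders lean so heavily on TLP.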

RacingPHT (this user has been deleted)
222#
Posted on 2009-6-1 12:29
Note: the author has been banned or deleted; content automatically hidden.

223#
Posted on 2009-6-8 22:39
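Heh, I actually very much welcome different academic ideas. But the terminology should be standardized so that we can all communicate more smoothly; ACM literature and the like is a good reference. Using the foreigners' conventions to standardize our own is not blind deference to the West; it is simply that we do not yet have the standing to define terms, so we can only follow others' standards as closely as possible.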

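Prefetch instructions do exist in CPUs; in effect they use the cache as extended register files. That definition exists only to make things easier for programmers to understand. Of course, the cache itself also has hardware speculative prefetching mechanisms. But none of these is prefetch in the broad sense. Many things that look ambiguous do have standard definitions; for example, are Vector and SIMD the same thing? Which is the narrow term and which is the broad one?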

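Now back to graphics. That paper is well known and deserves a careful read. My post was made at 2009-6-1 12:14 and yours at 2009-6-1 12:29, so you spent 15 minutes in total writing your post and reading the paper; I don't think that is enough time to finish it. Moreover, that paper has many references, and those references have references of their own. Building a body of knowledge takes a great deal of reading.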

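For example, LRB and NV50 both fall under TLP, but their TLP is completely different: one is FGMT (fine-grained multithreading) and the other CGMT (coarse-grained multithreading). Where does each have its strengths? Or: in the SGRAM era, Fast-Z Clear was supported directly by the memory (i.e., a single memory access could set values across rows in memory), but in the GDDR era this feature was dropped from the memory. Why? And what is the standard term for what you called a "long FIFO"? In which papers has it been widely studied? What similar architectures have been proposed based on the ideas in those papers? All this requires more careful and thorough reading of the literature, more thought, and more standardized terminology, just as we standardize the documentation in our everyday projects.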

RacingPHT (this user has been deleted)
224#
Posted on 2009-6-9 09:39
Note: the author has been banned or deleted; content automatically hidden.

225#
Posted on 2009-6-9 10:42
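As the saying goes, with a thousand ages of wisdom in one's breast come ten thousand feelings from one's pen. I don't know how, if I ever have such a fiery warrior on my own team, I would manage him, or how I would bring him around gently and win him over with reason. But I think anyone who has worked for N years generally understands what is meant by "one who has thunder raging in his chest yet keeps a face as calm as a lake may be appointed a top general," and by "one who holds the whole world in his heart yet places himself beneath it may be called a sovereign."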

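The ancients said: "The lowest ruler uses his own abilities, the middling ruler uses others' strength, the highest ruler uses others' wisdom." The most masterful way to argue is merely to plant a suggestion and let the other person, prompted by other things, suddenly see the light on his own; managing a team works the same way. To do that, you have to listen carefully to all sides before analyzing. Someone who has worked for N years surely knows these principles by heart, and I am sure you are such a broad-minded and ambitious person. So I think you should look again at my post 226, at when each of us posted during the discussion, and at the related papers. Technical ideas in industry all come from academia, and academia has no NDAs. For example, Intel's Hyper-Threading is called SMT in academia; there are many papers on SMT, and Hyper-Threading is merely one implementation of it. One cannot say that because Hyper-Threading is under NDA, nobody can study SMT...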

RacingPHT (this user has been deleted)
226#
Posted on 2009-6-9 11:32
Note: the author has been banned or deleted; content automatically hidden.

227#
Posted on 2009-6-9 19:44
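Every control dependency can be converted into a data dependency, branches included, and this can be done in either HW or SW. If HW is to support it, a CAM is required, and the number of entries in the CAM determines everything. This is effectively a dataflow machine, whose underlying principle comes from the Turing-equivalent lambda calculus; there were already many ACM papers on this in the 1980s. In fact, a prefetch-based architecture could gain flexibility this way while still getting performance, but SM4 (Shader Model 4) turned all of this into a pipe dream...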
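Converting a control dependency into a data dependency is what compilers call if-conversion (predication). A minimal sketch, not tied to any ISA: both sides are computed and a 0/1 predicate selects the result arithmetically, so the branch disappears and only data dependencies on `p` remain. The function names are hypothetical.

```python
def with_branch(x: int, a: int, b: int) -> int:
    # Control dependency: which expression executes depends on the branch.
    if x > 0:
        return a + x
    return b - x


def if_converted(x: int, a: int, b: int) -> int:
    # Data dependency only: evaluate both sides, then select with a 0/1 predicate.
    p = int(x > 0)           # predicate carried as data
    taken = a + x
    not_taken = b - x
    return p * taken + (1 - p) * not_taken


assert all(with_branch(x, 3, 7) == if_converted(x, 3, 7) for x in range(-5, 6))
```

The price is that both sides are always evaluated, much as SIMD hardware executes both paths of a divergent branch under a mask.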

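Heh, technology is really nothing but trade-offs. Without drawing on many sources and listening to all sides, how can one even talk about trade-offs? We will always need different voices like yours. Right or wrong: if what you say is off the mark, we can still learn from it; if it is on the mark, it lets us correct our mistakes and broaden our horizons. In any case, thank you for the discussion and criticism.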

228#
Posted on 2009-6-10 06:46
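As the old saying goes: when three walk together, one of them can be my teacher; I pick out their good points to follow and their bad points to correct in myself. Let the discussion continue below :> I hope this board thrives!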

229#
Thread starter | Posted on 2009-6-13 02:44
http://patft.uspto.gov/netacgi/n ... IA&RS=AN/NVIDIA

SUMMARY OF THE INVENTION

Embodiments of the present invention provide a method and system for ensuring needed graphics rendering data (e.g., texture values, normal maps, etc.) can be maintained in low latency memory for an efficient access by the GPU. Embodiments of the present invention provide fast and efficient real-time 3-D graphics rendering by increasing the efficiency of cache memory access and by limiting the performance penalties resulting from accessing higher latency memory.

In one embodiment, the present invention is implemented as a GPU architecture configured for traversing pixels of an area. The GPU includes a set-up unit for generating polygon descriptions and a rasterizer unit coupled to the set-up unit for rasterizing the polygon descriptions. The rasterizer unit is configured to traverse a plurality of pixels of an image using a first boustrophedonic pattern along a predominant axis, and during the traversal using the first boustrophedonic pattern, traverse a plurality of pixels of the image using a second boustrophedonic pattern, wherein the second boustrophedonic pattern is nested within the first boustrophedonic pattern.

In one embodiment, the first boustrophedonic pattern and the second boustrophedonic pattern are implemented by a coarse rasterizer component within the raster unit of the GPU. In one embodiment, the GPU groups the plurality of pixels of the image as tiles and the tiles are traversed using the first boustrophedonic pattern and the second boustrophedonic pattern.

In one embodiment, the number of pixels per tile is programmable, and can be designated as 4×4, 8×8, 16×16, 32×32, 64×64, 128×128, or the like, including rectangular as well as square arrays, in accordance with the requirements of a graphics rendering operation. Similarly, the number of pixels per tile is programmable in accordance with a size of a cache memory of the GPU, and the predominant axis of the first boustrophedonic pattern and/or the second boustrophedonic pattern is programmable.


tile-based renderer :)?
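The nested boustrophedonic traversal in that patent summary can be sketched as a coarse serpentine over tiles with a second serpentine over the pixels inside each tile. This is a minimal sketch under my own assumptions (row-major predominant axis, the inner pattern always starting at the tile's top-left corner, partial tiles at the image edge clipped); the patent describes a hardware rasterizer with programmable tile size and axis, and the function names here are hypothetical.

```python
from typing import Iterator, Tuple


def serpentine(cols: int, rows: int) -> Iterator[Tuple[int, int]]:
    """Visit a cols x rows grid row by row, reversing direction on every other row."""
    for y in range(rows):
        xs = range(cols) if y % 2 == 0 else range(cols - 1, -1, -1)
        for x in xs:
            yield x, y


def nested_boustrophedon(width: int, height: int,
                         tile_w: int, tile_h: int) -> Iterator[Tuple[int, int]]:
    """Coarse serpentine over tiles, fine serpentine over the pixels of each tile."""
    tiles_x = -(-width // tile_w)    # ceiling division: keep partial edge tiles
    tiles_y = -(-height // tile_h)
    for tx, ty in serpentine(tiles_x, tiles_y):
        for px, py in serpentine(tile_w, tile_h):
            x, y = tx * tile_w + px, ty * tile_h + py
            if x < width and y < height:   # clip pixels outside the image
                yield x, y


order = list(nested_boustrophedon(4, 4, 2, 2))
print(order[:8])
# [(0, 0), (1, 0), (1, 1), (0, 1), (2, 0), (3, 0), (3, 1), (2, 1)]
```

Keeping consecutive pixels spatially close lets texels fetched for one pixel stay in cache for its neighbours, and sizing tiles to the cache, as the patent suggests, bounds the working set. A refinement would mirror the inner pattern so each tile starts next to where the previous one ended.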

RacingPHT (this user has been deleted)
230#
Posted on 2009-6-14 17:02
Note: the author has been banned or deleted; content automatically hidden.

231#
Thread starter | Posted on 2009-6-25 23:35
The name of the next-generation architecture is out: Fermi.


232#
Thread starter | Posted on 2009-6-27 17:39
http://patft.uspto.gov/netacgi/n ... 1&RS=PN/7372471

This should be the patent for CSAA, which was introduced starting with Tesla.
  1. BACKGROUND OF THE INVENTION

  2. Computer graphics systems represent graphical primitives as pixel elements of a display. Aliasing refers to the visual artifacts, such as stair-stepping of surface edges (sometimes known as "jaggies"), that occur when an image is sampled at a spatial sample frequency that is too low.

  3. A variety of anti-aliasing techniques exist to reduce visual artifacts. One example is supersampling, in which an image is sampled more than once per pixel grid cell and the samples are filtered. For example, in supersampling the contribution of each sample in a pixel may be weighted to determine attributes of the pixel, such as the pixel color. Another example of an anti-aliasing technique is multisampling. As objects are rendered in a multisampling system, a single color is typically computed per primitive and used for all subpixel samples covered by the primitive. Additional background information on anti-aliasing techniques used in graphics processing units (GPUs) may be found in several patents issued to the Nvidia Corporation of Santa Clara, Calif. such as U.S. Pat. No. 6,452,595, "Integrated graphics processing unit with anti-aliasing," U.S. Pat. No. 6,720,975, "Super-sampling and multi-sampling system and method for anti-aliasing," and U.S. Pat. No. 6,469,707, "Method for efficiently rendering color information for a pixel in a computer system," the contents of each of which are hereby incorporated by reference.

  4. In both supersampling and multisampling the quality of the anti-aliasing tends to improve as the number of samples per partially covered pixel increases. For example, increasing the number of samples per pixel from four samples ("4×" sampling) to sixteen ("16×" sampling) would be expected to reduce visual artifacts. However, as the number of samples per pixel increases more sample data must be generated, stored, transported, and processed. Consequently, the required memory resources, computing resources, and memory bandwidth may increase as the number of samples per pixel is increased. As a result, the cost and complexity of a graphics system tends to increase as the number of samples used for anti-aliasing increases.

  5. Therefore, what is desired is an improved apparatus, system, and method for anti-aliasing that achieves many of the same benefits associated with an increase in the number of samples per pixel but without the corresponding increase in cost and complexity associated with conventional anti-aliasing techniques.

  6. SUMMARY OF THE INVENTION

  7. A graphics system has a mode of operation in which it determines coverage of primitives at a single real sample location within a pixel and at least one virtual sample location within the pixel. In one embodiment, a virtual sample is a pointer that points to either the single real sample associated with the same pixel or to a neighboring proximate pixel. Information on virtual sample coverage is used to adjust the weights of real samples for a down-filtering process.

  8. In one embodiment, generating weighted samples for anti-aliasing a pixel includes: determining coverage by primitives over a single real sample location per pixel and at least one virtual sample location per pixel; for each virtual sample location within a pixel, identifying a virtual sample as belonging to either said pixel or to a neighboring pixel proximate said virtual sample location; and utilizing virtual sample coverage information to adjust the weights of real samples with real samples of neighboring pixels.
And this one too:
http://patft.uspto.gov/netacgi/n ... 9&RS=PN/7333119
  1. generating at least one real sample for a primitive covering the pixel, the real sample including z depth data and color data for a sample location within the pixel; detecting coverage of at least one virtual sample location by said primitives within the pixel; for each covered virtual sample location within the pixel, forming a virtual sample by generating a pointer identifying a set of real sample locations within the pixel that are also covered by a common visible primitive; utilizing said at least one virtual sample to adjust the weight of at least one real sample for anti-aliasing; and displaying the anti-aliased pixel.


  2. for each real sample r
         weight[r] = 1;  // contribution of real sample itself
     for each virtual sample
         determine closest real sample r with a 1 in the bitmask
         weight[r]++.
     A table may be used to determine, for each virtual sample and coverage bit pattern, which eligible real sample is the closest.

  3. The final pixel color is calculated during anti-aliasing using the weights calculated for the real samples. An exemplary algorithm for computing the final pixel color as a weighted average of the real sample colors (color[r]) is as follows

  4. finalcolor = 0;

  5. for each real sample r finalcolor += color[r]* weight[r]/ (total samples).
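The weighting pseudocode quoted above can be turned into a small runnable sketch. Assumptions (mine, not the patent's): sample positions are pixel-local coordinates, each virtual sample's coverage is an integer bitmask over the real samples of the same pixel, "closest" means smallest Euclidean distance (the patent mentions a lookup table instead), and a virtual sample with an empty bitmask contributes nothing. The sketch normalizes by the sum of the weights, which equals the total sample count whenever every virtual sample maps to some real sample. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Color = Tuple[float, float, float]


def csaa_weights(real_pos: Sequence[Point], virtual_pos: Sequence[Point],
                 virtual_masks: Sequence[int]) -> List[int]:
    """Weight per real sample: 1 for itself plus 1 per virtual sample credited to it."""
    weights = [1] * len(real_pos)  # contribution of each real sample itself
    for (vx, vy), mask in zip(virtual_pos, virtual_masks):
        eligible = [r for r in range(len(real_pos)) if (mask >> r) & 1]
        if not eligible:
            continue  # assumption: an empty mask credits no real sample

        def dist2(r: int) -> float:
            return (real_pos[r][0] - vx) ** 2 + (real_pos[r][1] - vy) ** 2

        weights[min(eligible, key=dist2)] += 1  # closest eligible real sample
    return weights


def resolve(colors: Sequence[Color], weights: Sequence[int]) -> Color:
    """Final pixel color as the weighted average of the real sample colors."""
    total = sum(weights)
    return tuple(sum(c[k] * w for c, w in zip(colors, weights)) / total
                 for k in range(3))


# 4 real + 4 virtual samples; a red primitive covers y < 0.6, blue covers the rest.
real = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
virtual = [(0.4, 0.1), (0.1, 0.5), (0.9, 0.5), (0.6, 0.9)]
masks = [0b0011, 0b0011, 0b0011, 0b1100]  # real samples sharing each virtual's primitive
w = csaa_weights(real, virtual, masks)
print(w)                                                         # [3, 2, 1, 2]
print(resolve([(1, 0, 0), (1, 0, 0), (0, 0, 1), (0, 0, 1)], w))  # (0.625, 0.0, 0.375)
```

Plain 4× multisampling would resolve this pixel to an even 50/50 blend; the extra virtual samples pull it to 62.5% red, closer to the 60% actually covered. The output is still a blend of only the four stored real-sample colors; the virtual samples just refine the weights.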

RacingPHT (this user has been deleted)
233#
Posted on 2009-7-1 11:16
Note: the author has been banned or deleted; content automatically hidden.

234#
Thread starter | Posted on 2009-7-1 13:00
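From what I can see so far, CSAA tests triangle coverage at a larger number of virtual sample positions to obtain percentage color weights for the real samples. I'm not sure whether I've understood that correctly.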

If so, then no matter how high the CS multiplier goes, all you really get is at most a few more accurately weighted real sample colors.

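Apart from looking good in NVIDIA's own demos, CSAA really has little visible effect in many games; as for degrading image quality, I haven't seen that so far.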

Attached below is the PDF of 7333119; it has already been OCR'd, so the text is searchable :)


235#
Posted on 2009-7-1 22:34
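CSAA looks like a smarter AA technique; if it really is implemented this way, it could also be combined nicely with CFAA. However, I suspect CSAA's down-filtering approach is very likely tied to the alpha-to-coverage mechanism, i.e., the per-sample-location coverage-percentage parameter may be reused by alpha to coverage. What do the experts think?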
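For reference, alpha to coverage turns a fragment's alpha into a sample coverage mask, so the same multisample resolve that smooths edges also blends semi-transparent texels. A minimal sketch assuming the simplest mapping, enabling round(alpha × N) samples; real hardware typically uses dithered, position-dependent patterns, and whether CSAA reuses any of this is exactly the open question above. The function name is hypothetical.

```python
def alpha_to_coverage(alpha: float, num_samples: int) -> int:
    """Return a coverage bitmask with round(alpha * num_samples) low bits set."""
    alpha = min(max(alpha, 0.0), 1.0)          # clamp to [0, 1]
    covered = int(alpha * num_samples + 0.5)   # round half up
    return (1 << covered) - 1


print(bin(alpha_to_coverage(0.5, 4)))              # 0b11
print(bin(alpha_to_coverage(0.8, 16)).count("1"))  # 13
```

After the resolve, a pixel with half its samples covered ends up as roughly a 50/50 blend with whatever lies behind it, the same coverage-weighted averaging that edge AA relies on.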

236#
Posted on 2009-7-1 22:42
A question, Mr. Chen: which OCR software do you use? ABBYY always gives me garbled text.

237#
Posted on 2009-7-1 22:48
Last edited by ic.expert on 2009-7-1 23:13

Thinking about it some more, this amounts to a pre-filter, and a down-filter at that.

However, this mechanism still cannot soften the boundary where triangles intersect each other...

238#
Thread starter | Posted on 2009-7-2 00:10
A question, Mr. Chen: which OCR software do you use? ABBYY always gives me garbled text.
ic.expert posted on 2009-7-1 22:42


Adobe Acrobat Pro, Document->OCR.

239#
Thread starter | Posted on 2009-7-2 00:27
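I'm not clear on this detail. According to NVIDIA it applies quite broadly, but the effect really is minimal. It might stand out more in wireframe mode, but wireframe is used mainly by industrial software.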

240#
Posted on 2009-7-2 02:18
Adobe Acrobat Pro, Document->OCR.
Edison posted on 2009-7-2 00:10


Thank you, Mr. Chen :>


Powered by Discuz! X3.4

© 2001-2017 POPPUR.
