215: Pixel Input Buffer
220: Vertex Input Buffer
225: Texture Unit
When execution of an instruction is complete, Execution Unit 470 updates Resource Scoreboard 460 to indicate that the instruction's destination registers have been written and that the computation resources used to process the instruction are available. In an alternate embodiment, Resource Scoreboard 460 instead snoops the interface between Execution Unit 470 and Register File 350 to update register status.
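A minimal sketch of the scoreboard update described above, assuming a simple set-based model. The class, its method names, and the issue rule in `can_issue` (no outstanding write to any source or destination register, and a free computation resource) are hypothetical illustrations, not taken from the patent; only the `complete` update mirrors the text.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceScoreboard:
    """Toy model of Resource Scoreboard 460 (all names are hypothetical)."""

    pending_writes: set = field(default_factory=set)  # registers awaiting a write
    busy_units: set = field(default_factory=set)      # computation resources in use

    def can_issue(self, src_regs, dst_regs, unit):
        # Assumed issue rule: no source or destination register may have a
        # write outstanding, and the required computation resource must be free.
        touched = set(src_regs) | set(dst_regs)
        return not (touched & self.pending_writes) and unit not in self.busy_units

    def issue(self, dst_regs, unit):
        # Mark destination registers as pending and the resource as busy.
        self.pending_writes.update(dst_regs)
        self.busy_units.add(unit)

    def complete(self, dst_regs, unit):
        # Update performed when Execution Unit 470 finishes the instruction:
        # destination registers are now written and the resource is available.
        self.pending_writes.difference_update(dst_regs)
        self.busy_units.discard(unit)


sb = ResourceScoreboard()
sb.issue(dst_regs=["r0"], unit="alu0")
print(sb.can_issue(["r0"], ["r1"], "alu1"))  # False: r0 not yet written
sb.complete(dst_regs=["r0"], unit="alu0")
print(sb.can_issue(["r0"], ["r1"], "alu1"))  # True
```

In the alternate embodiment, the same `complete` update would be driven by observing writes on the Execution Unit 470 to Register File 350 interface rather than by an explicit call.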
When all program instructions associated with a thread have completed execution, the storage resources allocated to retain intermediate data generated during execution of the thread become available for allocation to another thread; that is, the storage resources are deallocated and the thread is flagged as available in Thread Control Unit 420. When a program instruction stored in Instruction Cache 410 has completed execution on every sample within the one or more sets it is programmed to process, the instruction is retired from Instruction Cache 410 by being overwritten.
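The two lifecycle rules above (thread deallocation and instruction retirement) can be sketched as follows. This is a toy model under stated assumptions: the `ThreadTable` and `InstructionCache` classes, the per-instruction countdown of remaining samples, and all method names are hypothetical. "Retired by being overwritten" is modeled by dropping the entry so its slot is free to be reused.

```python
class ThreadTable:
    """Toy model of thread slots tracked by Thread Control Unit 420 (hypothetical)."""

    def __init__(self, num_threads):
        self.available = [True] * num_threads
        self.intermediate = {}  # thread id -> intermediate data storage

    def allocate(self):
        for tid, free in enumerate(self.available):
            if free:
                self.available[tid] = False
                self.intermediate[tid] = []
                return tid
        return None  # every thread is busy; the sample must wait

    def finish(self, tid):
        # All instructions of the thread are done: deallocate its
        # intermediate storage and flag the thread as available again.
        self.intermediate.pop(tid, None)
        self.available[tid] = True


class InstructionCache:
    """Toy model of Instruction Cache 410 retirement (hypothetical)."""

    def __init__(self):
        self.remaining = {}  # instruction address -> samples still to process

    def load(self, addr, num_samples):
        self.remaining[addr] = num_samples

    def executed_on_sample(self, addr):
        self.remaining[addr] -= 1
        if self.remaining[addr] == 0:
            # Executed on every sample in its set(s): the entry is retired,
            # i.e. its slot may now be overwritten by another instruction.
            del self.remaining[addr]

    def is_resident(self, addr):
        return addr in self.remaining
```

A freed thread slot is handed out again by the next `allocate` call, which matches the text's point that deallocated storage becomes available to another thread.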
In step 510, Thread Control Unit 320 or 420 receives a sample, e.g., a vertex, fragment, or pixel. In step 515, Thread Control Unit 320 or 420 determines whether the sample is a vertex sample or a pixel sample; if it is a vertex sample, Thread Control Unit 320 or 420 proceeds to step 530. In step 530, Thread Control Unit 320 or 420 assigns a vertex thread to the vertex sample, to be processed by a vertex program.
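A rough sketch of steps 510, 515, and 530 as a dispatch function. The `SampleType` enum, the `dispatch` function, and the list of free vertex thread ids are hypothetical. The behavior when no vertex thread is free, and the pixel branch (which this excerpt does not describe), are placeholders rather than the patent's method.

```python
from enum import Enum


class SampleType(Enum):
    VERTEX = "vertex"
    PIXEL = "pixel"


def dispatch(sample_type, free_vertex_threads):
    """Sketch of steps 510/515/530; returns (path, thread id or None)."""
    # Step 510: a sample of `sample_type` has been received.
    # Step 515: is it a vertex sample or a pixel sample?
    if sample_type is SampleType.VERTEX:
        # Step 530: assign a vertex thread so a vertex program processes it.
        if not free_vertex_threads:
            return ("vertex", None)  # assumption: no free thread, sample waits
        return ("vertex", free_vertex_threads.pop())
    # Pixel samples follow a branch not covered in this excerpt.
    return ("pixel", None)
```

For example, `dispatch(SampleType.VERTEX, [3, 5])` takes thread 5 from the free list, while a pixel sample leaves the vertex free list untouched.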
The maximum number of threads that can execute simultaneously depends on the number of Execution Pipelines 240, the size of storage for thread state data, the amount of storage for intermediate data generated during processing of a sample, the latency of Execution Pipelines 240, and the like. Likewise, the number of threads of each sample type that may execute simultaneously may be limited in each embodiment. Therefore, when the number of threads available for processing samples of a first type is less than the number of samples of the first type within a first set, not all samples in that set can be processed simultaneously. Conversely, when the number of threads available for processing samples of a second type exceeds the number of samples of the second type within a second set, more than one set can be processed simultaneously. When processing throughput is limited for samples of the first type, the number of threads available for the first type may be increased by allocating unused threads to processing samples of the first type. For example, locations in Portion 620 may be allocated to Portion 630.
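The arithmetic behind this paragraph can be sketched with a few helper functions. All function names and numbers below are hypothetical, and `reallocate_unused` simply moves every idle thread of the donor type to the limited type. A real allocation policy (for example, how locations in Portion 620 are handed to Portion 630) might keep some threads in reserve.

```python
import math


def passes_for_set(set_size, threads):
    """Sequential passes needed when `threads` threads work on one set."""
    if threads <= 0:
        raise ValueError("need at least one thread")
    return math.ceil(set_size / threads)


def full_sets_in_flight(set_size, threads):
    """Whole sets that fit in the available threads at the same time."""
    return threads // set_size


def reallocate_unused(alloc, busy, limited, donor):
    """Move the donor type's idle threads to the throughput-limited type."""
    unused = alloc[donor] - busy[donor]
    new_alloc = dict(alloc)
    new_alloc[donor] -= unused
    new_alloc[limited] += unused
    return new_alloc


# Hypothetical numbers: 8 vertex threads for a 12-vertex set,
# 24 pixel threads for 8-pixel sets, of which 16 threads are busy.
alloc = {"vertex": 8, "pixel": 24}
busy = {"vertex": 8, "pixel": 16}
print(passes_for_set(12, alloc["vertex"]))      # 2: set cannot run all at once
print(full_sets_in_flight(8, alloc["pixel"]))   # 3 pixel sets at once
alloc = reallocate_unused(alloc, busy, "vertex", "pixel")
print(alloc)                                    # {'vertex': 16, 'pixel': 16}
print(passes_for_set(12, alloc["vertex"]))      # 1
```

With 8 vertex threads, the 12-vertex set needs two passes. After the 8 idle pixel threads are reassigned, the vertex set fits in a single pass.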
I'll go into more detail when I have time. :whistling: