|
http://forums.amd.com/devforum/m ... 0&enterthread=y
GDS is not currently supported in CAL, even though the hardware does have this feature. This is mainly because there is no global locking mechanism to synchronize on, and therefore there is no good way of using it. A Global GPR is a register that is shared between the threads that have the same thread index within their wavefronts. These are the shared registers in IL.
For example, suppose you have two wavefronts with thread IDs 0-63 and 64-127. By declaring 1 shared register in your IL kernel, threads n and n + 64 can both read/write to sr0. Shared registers guarantee atomic access within a single instruction only, so you can do a read-modify-write (rmw) operation on this register in one instruction and another wavefront will see the updated value. This is useful for doing simple reductions in a compute shader: you can do a simple reduction of any size in 3 passes instead of the log(n) passes that are currently required.
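To make the register sharing concrete, here is a minimal CPU-side sketch in Python. It is not IL or CAL code. It only models which threads land on the same shared register, assuming a wavefront size of 64 as in the example above. The names `shared_register_lane` and `accumulate_into_shared` are made up for illustration, and the sequential loop stands in for the atomic rmw that the hardware would perform.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront, as in the example above


def shared_register_lane(thread_id):
    """Return the lane whose copy of sr0 this thread reads and writes.

    Threads n and n + 64 map to the same lane, so they share one register.
    """
    return thread_id % WAVEFRONT_SIZE


def accumulate_into_shared(values):
    """Model every thread adding its value into its lane's sr0.

    values[i] is the contribution of thread i. The plain loop here is an
    assumption standing in for a single-instruction atomic rmw.
    """
    sr0 = [0] * WAVEFRONT_SIZE
    for thread_id, value in enumerate(values):
        sr0[shared_register_lane(thread_id)] += value
    return sr0
```

With two wavefronts (128 threads) each contributing 1, every lane of sr0 would end up holding 2 in this model.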
The three-pass reduction would go something like this:
first pass:
run 1 thread per data point and have it update a globally shared register (either min, max, sum, etc.)
second pass:
run 1 wavefront per SIMD, use the LDS to share data between threads, have a single thread combine the results of the rest of the threads, and write the result out to a global buffer
third pass:
run 1 wavefront and have it reduce the data from the global buffer to a single point
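The three passes above can be sketched as a CPU-side simulation in Python. This is only a model of the data flow, not IL or CAL code. The SIMD count, the round-robin placement of wavefronts onto SIMDs, and the function name `three_pass_reduce` are assumptions made for illustration. The wavefront size of 64 comes from the example above.

```python
from functools import reduce
import operator

WAVEFRONT_SIZE = 64  # threads per wavefront, as in the example above
NUM_SIMDS = 4        # hypothetical SIMD count, chosen only for illustration


def three_pass_reduce(data, op, identity, num_simds=NUM_SIMDS):
    """Reduce data with op in three passes, mimicking the scheme above."""
    # Pass 1: one thread per data point. Each thread folds its value into
    # the shared register for its lane on the SIMD its wavefront runs on.
    # Round-robin wavefront placement is an assumption of this model.
    shared = [[identity] * WAVEFRONT_SIZE for _ in range(num_simds)]
    for thread_id, value in enumerate(data):
        wavefront = thread_id // WAVEFRONT_SIZE
        simd = wavefront % num_simds
        lane = thread_id % WAVEFRONT_SIZE
        shared[simd][lane] = op(shared[simd][lane], value)

    # Pass 2: one wavefront per SIMD combines its 64 lane values (the role
    # the LDS plays on the GPU) and writes one result to the global buffer.
    global_buffer = [reduce(op, lanes, identity) for lanes in shared]

    # Pass 3: a single wavefront reduces the global buffer to one value.
    return reduce(op, global_buffer, identity)
```

For instance, `three_pass_reduce(values, operator.add, 0)` would model a sum, and `three_pass_reduce(values, max, float("-inf"))` a max. The number of passes stays at three regardless of input size because pass 1 absorbs all of the data into a fixed number of registers.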
The three-pass reduction is only guaranteed to work if you use calCtxRunProgramGridArray and set the array to contain the three passes. |
|