|
AMD's Stream Computing User Guide (http://developer.amd.com/gpu_assets/Stream_Computing_User_Guide.pdf) describes the use of the LDS as follows.
Local Data Store (LDS)
The LDS is a write-private, read-public model: a thread can write only to its own memory space but can read from the memory space of any thread in the same group. This model is for the ATI Radeon HD4870. The constraints of the current LDS model are:
1. All read/writes are 128 bits and 4 dword aligned.
2. Writes are statically specified at compile time.
3. Reads are dynamically specified at runtime.
4. There is an inverse relation between group size and LDS size.
5. The largest group size is 1024 threads, with an LDS size of 4.
6. The largest LDS size per thread is 64 dwords, with a group size <= 64.
7. The absolute addressing mode forces all wavefronts to write to the memory allocated to the first 64 thread IDs. Thus, all threads with the same ID modulus 64 write to the same location.
8. Data can only be shared within threads in a group.
9. Memory accesses outside of the thread group result in undefined behavior.
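To make constraints 4 to 7 concrete, here is a minimal Python sketch of the arithmetic they imply. It rests on assumptions the guide does not state: the "4" in constraint 5 is read as 4 dwords per thread, so both endpoints (1024 x 4 and 64 x 64) give the same total of 4096 dwords, and that total is treated as a fixed budget. The helper names `max_lds_dwords_per_thread` and `absolute_write_slot` are hypothetical and are not part of any AMD API.

```python
WAVEFRONT_SIZE = 64           # threads per wavefront, taken from constraint 7
MAX_GROUP_SIZE = 1024         # constraint 5
MAX_DWORDS_PER_THREAD = 64    # constraint 6
# Assumption: a fixed total budget, inferred from 64 threads x 64 dwords.
LDS_BUDGET_DWORDS = WAVEFRONT_SIZE * MAX_DWORDS_PER_THREAD


def max_lds_dwords_per_thread(group_size):
    """Largest per-thread LDS size (in dwords) under the assumed budget."""
    if not 1 <= group_size <= MAX_GROUP_SIZE:
        raise ValueError("group size must be between 1 and 1024")
    per_thread = min(MAX_DWORDS_PER_THREAD, LDS_BUDGET_DWORDS // group_size)
    # Constraint 1: accesses are 128 bits wide, so round down to 4 dwords.
    return per_thread - per_thread % 4


def absolute_write_slot(thread_id):
    """Slot written in absolute addressing mode (constraint 7)."""
    return thread_id % WAVEFRONT_SIZE


if __name__ == "__main__":
    for size in (64, 256, 1024):
        print(size, "threads ->", max_lds_dwords_per_thread(size), "dwords each")
    # Threads 6, 70 and 134 sit in different wavefronts but share one slot.
    print([absolute_write_slot(t) for t in (6, 70, 134)])
```

Under these assumptions, doubling the group size halves the per-thread LDS size, which is the inverse relation named in constraint 4.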
The LDS also has a useful keyword call feature: _neighborExch on reads, which performs a 4x4 transpose within a four-thread neighborhood. This means all threads that have tID % 4 == 0 get x values from the 4 threads, tID%4==1 get y values, etc. Figure C.2 illustrates this.
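The 4x4 transpose can be simulated on the CPU to see which values each thread ends up with. The following Python sketch only models the data movement described above. The function name `neighbor_exchange` and the representation of each thread's data as an `(x, y, z, w)` tuple are assumptions made for illustration, not the hardware interface.

```python
def neighbor_exchange(values):
    """Simulate a 4x4 transpose within each four-thread neighborhood.

    values: one (x, y, z, w) tuple per thread, indexed by thread ID.
    Returns what each thread would read after the exchange.
    """
    if len(values) % 4 != 0:
        raise ValueError("thread count must be a multiple of 4")
    result = []
    for base in range(0, len(values), 4):
        quad = values[base:base + 4]      # one four-thread neighborhood
        for lane in range(4):             # lane == tID % 4
            # Lane 0 collects every x, lane 1 every y, and so on.
            result.append(tuple(quad[src][lane] for src in range(4)))
    return result


if __name__ == "__main__":
    threads = [(0, 1, 2, 3), (10, 11, 12, 13),
               (20, 21, 22, 23), (30, 31, 32, 33)]
    for tid, row in enumerate(neighbor_exchange(threads)):
        print("tID", tid, "reads", row)
```

Because the operation is a transpose, applying it twice returns the original layout.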
Shared Registers
Shared registers are a method of sharing data at a lower level than the LDS. The LDS shares data at the group level, but shared registers share data at the wavefront level. The shared registers are unique to the index of a wavefront and share data between wavefronts; this enables vertical sharing between all the wavefronts that run on a SIMD. This feature allows sharing between groups; however, one constraint is that shared registers only guarantee atomicity during the same instruction.
The "group" mentioned here appears to be the equivalent of a thread block in CUDA. |
|