"Efficient Gather and Scatter Operations on Graphics Processors"

Edison · Posted on 2008-5-13 18:09

http://sc07.supercomputing.org/schedule/pdf/pap175.pdf

Abstract:

"Scatter and gather operations are two fundamental operations in many scientific and enterprise computing applications. These operations are implemented as native collective operations in message passing interfaces (MPI) to define communication patterns across the processors [4], and in parallel programming languages such as ZPL [8] and HPF [1]. Scatter operations write data to arbitrary locations and gather operations read data from arbitrary locations. Both operations are highly memory intensive and form the basic primitives to implement many parallel algorithms such as quicksort [12], sparse matrix transpose [8], and others. In this paper, we study the performance of scatter and gather operations on GPUs."

"GPU memory architectures are significantly different from CPU memory architectures [21]. Specifically, GPUs consist of high-bandwidth, high-latency video memory and the GPU cache sizes are significantly smaller than the CPUs – therefore, the performance characteristics of scatter and gather operations on GPUs may involve different optimizations than corresponding CPU-based algorithms. Additionally, it was only recently that the scatter functionality was introduced on GPUs, and there is little work in identifying the performance characteristics of the scatter on GPUs."

"We use our scatter and gather to implement three common applications on an NVIDIA GeForce 8800 GPU (G80) - radix sort using scatter operations, and the hash search and the sparse-matrix vector multiplication using gather operations. Our results indicate that our optimizations can greatly improve the utilization of the memory bandwidth. Specifically, our optimized algorithms achieve a 2-4X performance improvement over single-pass GPU-based implementations. We also compared the performance of our algorithms with optimized CPU-based algorithms on high-end multi-core CPUs. In practice, our results indicate a 2-7X performance improvement over CPU-based algorithms"

Authors:

Bingsheng He#, Naga K. Govindaraju*, Qiong Luo#, Burton Smith*

# Hong Kong Univ. of Science and Technology, {saven, luo}@cse.ust.hk
* Microsoft Corp., {nagag, burtons}@microsoft.com
