POPPUR爱换


"Efficient Gather and Scatter Operations on Graphics Processors"

Posted on 2008-5-13 18:09
http://sc07.supercomputing.org/schedule/pdf/pap175.pdf

Abstract:

"Scatter and gather operations are two fundamental operations in many scientific and enterprise computing applications. These operations are implemented as native collective operations in message passing interfaces (MPI) to define communication patterns across the processors [4], and in parallel programming languages such as ZPL [8] and HPF [1]. Scatter operations write data to arbitrary locations and gather operations read data from arbitrary locations. Both operations are highly memory intensive and form the basic primitives to implement many parallel algorithms such as quicksort [12], sparse matrix transpose [8], and others. In this paper, we study the performance of scatter and gather operations on GPUs."
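As a minimal illustration of the two primitives defined above, here is a sequential Python sketch: gather reads `out[i] = src[idx[i]]` and scatter writes `out[idx[i]] = src[i]`. The function names and the `fill` parameter are hypothetical, chosen for illustration only; this shows the semantics, not a GPU implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def gather(src: Sequence[T], idx: Sequence[int]) -> List[T]:
    """Read from arbitrary locations: out[i] = src[idx[i]]."""
    return [src[j] for j in idx]


def scatter(src: Sequence[T], idx: Sequence[int], n: int, fill: T) -> List[T]:
    """Write to arbitrary locations: out[idx[i]] = src[i].

    Slots that no element targets keep `fill`. If two elements target
    the same slot, the later one wins in this sequential version; on a
    GPU, which of several conflicting writes survives is not guaranteed.
    """
    out = [fill] * n
    for value, j in zip(src, idx):
        out[j] = value
    return out


data = ["a", "b", "c", "d"]
print(gather(data, [3, 0, 0, 2]))          # ['d', 'a', 'a', 'c']
print(scatter(data, [2, 0, 3, 1], 4, ""))  # ['b', 'd', 'a', 'c']
```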

"GPU memory architectures are significantly different from CPU memory architectures [21]. Specifically, GPUs consist of high-bandwidth, high-latency video memory and the GPU cache sizes are significantly smaller than the CPUs – therefore, the performance characteristics of scatter and gather operations on GPUs may involve different optimizations than corresponding CPU-based algorithms. Additionally, it was only recently that the scatter functionality was introduced on GPUs, and there is little work in identifying the performance characteristics of the scatter on GPUs."
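The excerpt does not describe the authors' optimizations. One way to act on the small-cache observation, shown here as an illustrative sketch rather than the paper's exact algorithm, is to split a scatter into several passes so that each pass only writes into a bounded slice of the output (sized, for example, to fit in cache). `multipass_scatter` and its `chunk` parameter are hypothetical names; in Python this only demonstrates the access pattern, not any speedup.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def multipass_scatter(src: Sequence[T], idx: Sequence[int], n: int,
                      fill: T, chunk: int) -> List[T]:
    """Scatter out[idx[i]] = src[i] in ceil(n / chunk) passes.

    Pass p only performs the writes whose destination lies in
    [p * chunk, (p + 1) * chunk), so the region written by one pass
    stays small. Every pass re-reads the whole input, trading extra
    sequential reads for better locality of the random writes.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    out = [fill] * n
    num_passes = (n + chunk - 1) // chunk
    for p in range(num_passes):
        lo, hi = p * chunk, min((p + 1) * chunk, n)
        for value, j in zip(src, idx):
            if lo <= j < hi:  # skip writes that belong to other passes
                out[j] = value
    return out


data = ["a", "b", "c", "d", "e", "f"]
dest = [5, 0, 3, 1, 4, 2]
print(multipass_scatter(data, dest, 6, "", chunk=2))
# ['b', 'd', 'f', 'c', 'e', 'a']
```

The result is identical to a single-pass scatter for any `chunk`; only the order in which output slots are touched changes.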

"We use our scatter and gather to implement three common applications on an NVIDIA GeForce 8800 GPU (G80) - radix sort using scatter operations, and the hash search and the sparse-matrix vector multiplication using gather operations. Our results indicate that our optimizations can greatly improve the utilization of the memory bandwidth. Specifically, our optimized algorithms achieve a 2-4X performance improvement over single-pass GPU-based implementations. We also compared the performance of our algorithms with optimized CPU-based algorithms on high-end multi-core CPUs. In practice, our results indicate a 2-7X performance improvement over CPU-based algorithms"
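To show where the two primitives appear in the applications named above, here is a sequential Python sketch of two of them. It assumes an LSD radix sort on non-negative integer keys and a CSR sparse-matrix layout; the excerpt does not state which variants or formats the authors used on the G80, and the function names are hypothetical. The hash search application is not sketched.

```python
from typing import List, Sequence


def radix_sort(keys: List[int], bits_per_pass: int = 4, key_bits: int = 32) -> List[int]:
    """LSD radix sort of non-negative ints below 2**key_bits.

    Each pass computes a stable destination for every key from a
    per-digit histogram, then moves all keys with one scatter.
    """
    if any(k < 0 or k >= (1 << key_bits) for k in keys):
        raise ValueError("keys must be in [0, 2**key_bits)")
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, key_bits, bits_per_pass):
        digits = [(k >> shift) & mask for k in keys]
        counts = [0] * radix
        for d in digits:
            counts[d] += 1
        # Exclusive prefix sum: first output slot of each bucket.
        offsets, total = [0] * radix, 0
        for b in range(radix):
            offsets[b], total = total, total + counts[b]
        # Stable destination per key (input order kept within a bucket).
        dest = []
        for d in digits:
            dest.append(offsets[d])
            offsets[d] += 1
        # Scatter: out[dest[i]] = keys[i].
        out = [0] * len(keys)
        for k, j in zip(keys, dest):
            out[j] = k
        keys = out
    return keys


def csr_spmv(values: Sequence[float], col_idx: Sequence[int],
             row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """y = A @ x for A stored in CSR form; x[col_idx[k]] is a gather."""
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += values[k] * x[col_idx[k]]  # gather from x
        y.append(s)
    return y


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]

# A = [[10, 0, 2], [0, 3, 0], [1, 0, 4]] in CSR form
print(csr_spmv([10, 2, 3, 1, 4], [0, 2, 1, 0, 2], [0, 2, 3, 5], [1, 2, 3]))
# [16.0, 6.0, 13.0]
```

In both cases the irregular memory access is confined to one step: the scatter that places keys in radix sort, and the read of `x` through `col_idx` in SpMV.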


Authors:

Bingsheng He#, Naga K. Govindaraju*, Qiong Luo#, Burton Smith*
#Hong Kong Univ. of Science and Technology, {saven, luo}@cse.ust.hk
*Microsoft Corp., {nagag, burtons}@microsoft.com

