Hybrid Parallel Programming on GPU Clusters
Abstract

Nowadays, NVIDIA's CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many-core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a hybrid parallel programming approach using CUDA and MPI together, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster consisting of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA, run by the processor cores in the same computational node.

Keywords: CUDA, GPU, MPI, OpenMP, hybrid, parallel programming

I. INTRODUCTION

Nowadays, NVIDIA's CUDA [1], [16] is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many-core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA [1], [16] to achieve dramatic speedups on production and research codes.
NVIDIA builds the chips behind CUDA out of hundreds of cores, and here we try to use the computing devices NVIDIA provides for parallel computing. This paper proposes a solution that not only simplifies the use of hardware acceleration in conventional general-purpose applications, but also keeps the application code portable. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP and MPI [3] programming, which partitions loop iterations according to the performance weighting of the multi-core [4] nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node.
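To make this scheme concrete, the following is a minimal sketch of performance-weighted loop partitioning with MPI and OpenMP. It is not the paper's implementation: the two-node weights array, N, and vector_op() are hypothetical placeholders, and a real system would derive the weights from the performance functions introduced below.

    /* Sketch: MPI splits the loop iterations across nodes by a
       performance weight; OpenMP threads work on each node's share. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    static void vector_op(int i) { (void)i; /* per-iteration work goes here */ }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Hypothetical per-node performance weights (sum to 1.0);
           this sketch assumes exactly two nodes. */
        double weights[] = { 0.6, 0.4 };
        if (size != 2) { MPI_Finalize(); return 1; }

        /* Turn the weights into this rank's contiguous [begin, end) range. */
        int begin = 0, end = 0;
        double acc = 0.0;
        for (int r = 0; r <= rank; ++r) {
            begin = end;
            acc += weights[r];
            end = (int)(acc * (double)N);
        }

        /* Iterations assigned to this MPI process are run in parallel
           by OpenMP threads on the node's processor cores. */
        #pragma omp parallel for
        for (int i = begin; i < end; ++i)
            vector_op(i);

        printf("rank %d computed iterations [%d, %d)\n", rank, begin, end);
        MPI_Finalize();
        return 0;
    }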
In this paper, we propose a general approach that uses performance functions to estimate performance weights for each node. To verify the proposed approach, a heterogeneous cluster and a homogeneous cluster were built. In our implementation, the master node also participates in computation, whereas in previous schemes only slave nodes do computation work. Empirical results show that in both heterogeneous and homogeneous cluster environments, the proposed approach improved performance over all previous schemes.

The rest of this paper is organized as follows. In Section 2, we introduce several typical and well-known self-scheduling schemes and a famous benchmark used to analyze computer system performance. In Section 3, we define our model and describe our approach. Our system configuration is then specified in Section 4, and experimental results for three types of application program are presented. Concluding remarks and future work are given in Section 5.

II. BACKGROUND REVIEW

A. History of GPU and CUDA
In the past, we had to use more than one computer, with multiple CPUs, for parallel computing. As the history of chips shows, early chips did not need much computing power; gradually, games and graphics created a demand for 3D, 3D accelerator cards appeared, display processing moved onto separate chips, and eventually a chip similar in stature to the CPU emerged for graphics: the GPU. We know that GPU computing can give us the answers we want, but why choose the GPU? Consider a comparison of current CPUs and GPUs. First, a CPU today has at most eight cores, whereas a GPU has grown to 260 cores; with that many cores, and despite the relatively low frequency of each core, we believe such a large amount of parallel computing power is unlikely to be weaker than a single-issue CPU. Next, the GPU has its own on-board memory; comparing the GPU's access to its memory against the CPU's access to main memory, we find that GPU memory access is about ten times faster, a full 90 GB/s. This is quite an alarming gap, and it means that computations that must access large amounts of data can gain a great deal from the GPU. A CPU uses advanced flow control, such as branch prediction or delayed branches, together with a large cache, to reduce memory access latency.
A GPU's caches are comparatively small and its flow control simple; instead, the GPU covers the problem of memory latency by keeping a large number of computations in flight. That is, suppose one GPU memory access takes 5 units of time: if 100 threads access memory simultaneously, the total time is still 5 units. But suppose one CPU memory access takes 0.1 units of time: if 100 accesses are issued one after another, the total time is 10 units. Therefore, the GPU's parallel processing can hide memory access latency and even outrun the CPU. The GPU is designed such that more transistors are devoted to data processing rather than data caching and flow control, as schematically illustrated by Figure 1.
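This latency-hiding idea is visible even in the simplest CUDA kernel: each thread issues its own memory access, and the hardware switches between warps while loads are outstanding. The kernel below is an illustrative sketch, not code from the paper; scale() and the launch sizes are made-up names.

    /* Each of many concurrent threads issues one load and one store;
       while some warps wait on memory, the scheduler runs others. */
    __global__ void scale(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = 2.0f * in[i];
    }

    /* Host side: launching far more threads than physical cores keeps
       the memory pipeline busy (d_in and d_out are device buffers
       previously allocated with cudaMalloc):
           int threads = 256;
           int blocks  = (n + threads - 1) / threads;
           scale<<<blocks, threads>>>(d_in, d_out, n);
    */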
Therefore, we exploit the GPU's advantage in arithmetic logic, using the many cores NVIDIA makes available to help with heavy computation, and we program those cores through the parallel programming API that NVIDIA provides. Must we use the form NVIDIA provides to do GPU computing? Not really. We can use NVIDIA's CUDA, ATI's CTM, or Apple's proposed OpenCL (Open Computing Language). CUDA was developed earliest and has the most users at this stage, but CUDA supports only NVIDIA's own graphics cards; indeed, at this stage almost all graphics cards used for GPU computing are NVIDIA's. ATI developed its own language, CTM, and Apple proposed OpenCL (Open Computing Language), which has been supported by both NVIDIA and ATI, while ATI has since given up CTM for another language. Because of their graphics heritage, GPUs usually supported only single-precision floating-point operations, yet in science, precision is a very important indicator.
Therefore, the compute-oriented graphics cards introduced this year support double-precision floating-point operations.

B. CUDA Programming

CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing [2] architecture developed by NVIDIA. CUDA is the computing engine in NVIDIA graphics processing units, or GPUs, that is accessible to software developers through industry-standard programming languages. The CUDA software stack is composed of several layers, as illustrated in Figure 2: a hardware driver, an application programming interface (API) and its runtime, and two higher-level mathematical libraries.
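A small kernel helps show the three abstractions the model is built on: a grid of thread blocks, per-block shared memory, and barrier synchronization. This block-wise sum is a generic illustration assuming a launch with 256 threads per block; it is not an example taken from the paper.

    /* Block-wise reduction: each thread block cooperatively sums its
       slice of the input and writes one partial sum. */
    __global__ void block_sum(const float *in, float *partial, int n)
    {
        __shared__ float buf[256];          /* per-block shared memory */
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;

        buf[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                    /* barrier: block sees full buf */

        /* Tree reduction within one thread block; blockDim.x must be
           256 here, matching the shared buffer and a power of two. */
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                buf[tid] += buf[tid + stride];
            __syncthreads();
        }

        if (tid == 0)
            partial[blockIdx.x] = buf[0];   /* one partial sum per block */
    }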