Parallel Programming Principle and Practice
Lecture 4: Parallel Programming Methodology

Jin, Hai
School of Computer Science and Technology
Huazhong University of Science and Technology

Outline
- Motivating Problems
- Steps in Creating a Parallel Program
- What a Simple Parallel Program Looks Like

MOTIVATING PROBLEMS

Motivating Problems
- Simulating ocean currents: regular structure, scientific computing
- Simulating the evolution of galaxies: irregular structure, scientific computing
- Rendering scenes by ray tracing: irregular structure, computer graphics

Simulating Ocean Currents
- Model the ocean as two-dimensional grids
- Discretize in space and time: finer spatial and temporal resolution gives greater accuracy
- Many different computations per time step: set up and solve equations
- Concurrency both across and within the grid computations

Simulating Galaxy Evolution
- Simulate the interactions of many stars evolving over time
- Computing forces is expensive: the brute-force approach is O(n^2)
- Hierarchical methods take advantage of the force law F = G·m1·m2/r^2
- Many time steps, and plenty of concurrency across stars within one

Rendering Scenes by Ray Tracing
- Shoot rays into the scene through pixels in the image plane
- Follow their paths: rays bounce around as they strike objects, and they generate new rays (a ray tree per input ray)
- The result is the color and opacity for that pixel
- Parallelism is across rays
- All three case studies have abundant concurrency

Creating a Parallel Program
- Assumption: a sequential algorithm is given (sometimes a very different algorithm is needed, but that is beyond our scope)
- Pieces of the job
  - Identify work that can be done in parallel
  - Partition work, and perhaps data, among processes
  - Manage data access, communication, and synchronization
  - Note: work includes computation, data access, and I/O
- Main goal: speedup for a fixed problem (plus low programming effort and resource needs)

STEPS IN CREATING A PARALLEL PROGRAM

Some Important Concepts
- Task
  - An arbitrary piece of undecomposed work in a parallel computation
  - Executed sequentially; concurrency exists only across tasks
  - e.g. a particle/cell in Barnes-Hut, a ray or ray group in Raytrace
  - Fine-grained versus coarse-grained tasks
- Process (thread)
  - An abstract entity that performs the tasks assigned to it
  - Processes communicate and synchronize to perform their tasks
- Processor
  - The physical engine on which a process executes
  - Processes virtualize the machine to the programmer: first write the program in terms of processes, then map processes to processors
Limited Concurrency: Amdahl's Law
- A fundamental limitation on parallel speedup
- If s = the fraction of sequential execution that is inherently serial, then speedup <= 1/s

Amdahl's Law Example
- A 2-phase computation over an n-by-n grid
  - Phase 1: perform an independent computation on each grid element (easy to parallelize)
  - Phase 2: add a value from each grid element into a global sum (more difficult to parallelize; serial by default)
- Sequential execution: both phases take n^2 time, 2n^2 in total

First Attempt at Parallelization
- Strategy
  - Phase 1: execute in parallel; time for phase 1 = n^2/p
  - Phase 2: execute serially; time for phase 2 = n^2
- Overall performance: speedup = 2n^2 / (n^2/p + n^2) = 2p/(p+1), i.e. no more than 2
Parallelizing Phase 2
- Trick: divide the second phase into two steps
  - Step 1: accumulate into a private sum during the sweep
  - Step 2: add each per-process private sum into the global sum
- Overall performance: parallel time = n^2/p + n^2/p + p, so speedup is close to p if n >> p

Concurrency Profiles
- Cannot usually divide a program into just a serial part and a fully parallel part
- The area under the concurrency-profile curve is the total work done, or the time with 1 processor
- The horizontal extent is a lower bound on execution time (even with infinite processors)
- Speedup is the ratio: speedup(p) = (Σ_k f_k·k) / (Σ_k f_k·⌈k/p⌉), where f_k is the number of time steps at concurrency k; the base case of a serial fraction s recovers Amdahl's law, speedup = 1/(s + (1−s)/p)
- Amdahl's law applies to any overhead, not just limited concurrency

Steps in Creating a Parallel Program
- 4 steps: Decomposition, Assignment, Orchestration, Mapping
- Done by the programmer or by system software (compiler, runtime, ...)
- The issues are the same, so assume the programmer does it all explicitly

Decomposition
- Break up the computation into tasks to be divided among processes, i.e. identify the concurrency and decide the level at which to exploit it
- Tasks may or may not be defined statically; tasks may become available dynamically, and the number of available tasks may vary with time
- Goal: enough tasks to keep processes busy, but not too many
- The number of tasks available at a time is an upper bound on the achievable speedup
Assignment
- Specify the mechanism for dividing work up among processes, e.g. which process computes the forces on which stars, or which rays
- Together with decomposition, this is also called partitioning
- Goals: balance the workload, and reduce communication and management cost
- Structured approaches usually work well
  - Code inspection (parallel loops) or understanding of the application
  - Well-known heuristics
  - Static versus dynamic assignment
- As programmers, we worry about partitioning first: it is usually independent of the architecture or programming model, but the cost and complexity of using primitives may affect decisions
- As architects, we assume the program does a reasonable job of it
Orchestration
- Main tasks
  - Naming data
  - Structuring communication
  - Synchronization
  - Organizing data structures and scheduling tasks temporally
- Goals
  - Reduce the cost of communication and synchronization as seen by the processors
  - Preserve locality of data reference (including data structure organization)
  - Schedule tasks to satisfy dependences early
  - Reduce the overhead of parallelism management
- Orchestration is closest to the architecture (and to the programming model and language)
- Choices depend heavily on the communication abstraction and on the efficiency of the primitives; architects should provide the appropriate primitives efficiently

Mapping
- After orchestration, we already have a parallel program
- Two aspects of mapping
  - Which processes will run on the same processor, if necessary
  - Which process runs on which particular processor (mapping to a network topology)
- One extreme: space-sharing, in which the machine is divided into subsets with only one application at a time per subset; processes can be pinned to processors or left to the OS
- The other extreme: complete resource-management control is given to the OS, which uses the performance techniques we will discuss later
- The real world lies between the two: the user specifies desires in some aspects, and the system may ignore them
- We usually adopt the view: one process per processor
Parallelizing Computation versus Data
- The view above is centered on computation: the computation is decomposed and assigned (partitioned)
- Partitioning data is often a natural view too
  - Computation follows data: "owner computes"
  - Examples: the grid example; data mining; High Performance Fortran (HPF)
- But this view is not general enough
  - The distinction between computation and data is stronger in many applications
  - We retain the computation-centric view; data access and communication are part of orchestration

High-level Goals
- High performance (speedup over the sequential program), but with low resource usage and low development effort
- Implications for algorithm designers and architects
  - Algorithm designers: high performance, low resource needs
  - Architects: ...