Parallel Computing Teaching Slides.ppt
Overview of Parallel Computing System Architecture
Pingpeng Yuan
Service Computing Technology and System Lab
Cluster and Grid Computing Lab
2019/10/22

Contents
- Parallel computer systems and architecture models
- Contemporary parallel machine systems
- Performance evaluation of parallel computing

1 Parallel Computer Systems and Architecture Models
- 1.1 The need for parallel computing
- 1.2 Interconnection in parallel computer systems
  - 1.2.1 System interconnect
  - 1.2.2 Static interconnection networks
  - 1.2.3 Dynamic interconnection networks
  - 1.2.4 Standard interconnection networks
- 1.3 Parallel computer system architecture
  - 1.3.1 Architecture models of parallel computers
  - 1.3.2 Memory access models of parallel computers

Drivers of Parallel Computing
- Application Needs: our insatiable need for computing cycles
  - Scientific computing: CFD, Biology, Chemistry, Physics, ...
  - General-purpose computing: Video, Graphics, CAD, Databases, TP, ...
  - Internet applications: Search, e-Commerce, ...
- Technology Trends

Scientific Computing Demand
- Ever-increasing demand due to the need for more accuracy, higher-level modeling and knowledge, and analysis of exploding amounts of data
- Example area: Climate and Ecological Modeling goals
  - Resolution, simulated time, and improved physics alone lead to increased requirements by factors of 10^4 to 10^7. Then:
    - Reliable global warming, natural disaster and weather prediction
  - Predictive models of rainforest destruction, forest sustainability, effects of climate change on ecosystems and on food webs, global health trends
  - Verifiable global ecosystem and epidemic models
  - Integration of macro-effects with localized and then micro-effects
  - Predictive effects of human activities on Earth's life support systems
  - Understanding Earth's life support systems

Engineering Computing Demand
- Large parallel machines are a mainstay in many industries
  - Petroleum (reservoir analysis)
  - Automotive (crash simulation, drag analysis, combustion efficiency)
  - Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
  - Computer-aided design
  - Pharmaceuticals (molecular modeling)
  - Visualization in all of the above: entertainment (movies), architecture (walk-throughs, rendering)
  - Financial modeling (yield and derivative analysis)
  - etc.

Commercial Computing
- Also relies on parallelism at the high end
  - Scale not so large, but use much more widespread
  - Computational power determines the scale of business that can be handled
- Databases, online transaction processing, decision support, data mining, data warehousing, ...
- E-commerce, search and other scalable internet services
  - Parallel applications running on clusters
  - Developing new parallel software models and primitives
- Insight from automated analysis of large disparate data

Drivers of Parallel Computing
- Application Needs
- Technology Trends

Technology Trends: Rise of the Micro
- The natural building block for multiprocessors is now also about the fastest!

General Technology Trends
- Microprocessor performance increases 50% - 100% per year
- Clock frequency doubles every 3 years
- Transistor count quadruples every 3 years
  - Moore's law: transistors per chip = 1.59^(year - 1959) (originally 2^(year - 1959))
- Huge investment per generation is carried by a huge commodity market

Clock Frequency Growth Rate (Intel family)
- 30% per year

Transistor Count Growth Rate (Intel family)
- Transistor count grows much faster than clock rate
  - 40% per year, an order of magnitude more contribution in 2 decades
- Width/space has greater potential than per-unit speed

How to Use More Transistors
- Improve single-threaded performance via architecture
  - Not keeping up with the potential given by technology (next)
- Use transistors for memory structures to improve data locality
  - Doesn't give as high returns (2x for 4x cache size, to a point)
- Use parallelism
  - Instruction-level
  - Thread-level
- Bottom line: not that single-threaded performance has plateaued, but that parallelism is the natural way to stay on a better curve

Microprocessor Performance [figure]

Similar Story for Storage (Transistor Count) [figure]

Similar Story for Storage (DRAM Capacity) [figure]

Similar Story for Storage
- Divergence between memory capacity and speed is more pronounced
  - Capacity increased by 1000x from 1980-95, and increases 50% per year
  - Latency reduces only 3% per year (only 2x from 1980-95)
  - Bandwidth per memory chip increases 2x as fast as latency reduces
- Larger memories are slower, while processors get faster
  - Need to transfer more data in parallel
  - Need deeper cache hierarchies
  - How to organize caches?

Similar Story for Storage
- Parallelism increases the effective size of each level of the hierarchy without increasing access time
- Parallelism and locality within memory systems too
  - New designs fetch many bits within the memory chip, then follow with a fast pipelined transfer across a narrower interface
  - Buffer caches the most recently accessed data
- Disks too: parallel disks plus caching
- Overall, dramatic growth of processor speed, storage capacity and bandwidths relative to latency (especially) and clock speed points toward parallelism as the desirable architectural direction

Top 10 Fastest Computers (Linpack)
Rank | Site                    | Computer                       | Processors | Year | Rmax
1    | DOE/NNSA/LLNL, USA      | IBM BlueGene                   | 131072     | 2005 | 280600
2    | NNSA/Sandia Labs, USA   | Cray Red Storm, Opteron        | 26544      | 2006 | 101400
3    | IBM Research, USA       | IBM Blue Gene Solution         | 40960      | 2005 | 91290
4    | DOE/NNSA/LLNL, USA      | ASCI Purple - IBM eServer p5   | 12208      | 2006 | 75760
5    | Barcelona Center, Spain | IBM JS21 Cluster, PPC 970      | 10240      | 2006 | 62630
6    | NNSA/Sandia Labs, USA   | Dell Thunderbird Cluster       | 9024       | 2006 | 53000
7    | CEA, France             | Bull Tera-10 Itanium2 Cluster  | 9968       | 2006 | 52840
8    | NASA/Ames, USA          | SGI Altix 1.5 GHz, Infiniband  | 10160      | 2004 | 51870
9    | GSIC Center, Japan      | NEC/Sun Grid Cluster (Opteron) | 11088      | 2006 | 47380
10   | Oak Ridge Lab, USA      | Cray Jaguar XT3, 2.6 GHz dual  | 10424      | 2006 | 43480
- NEC Earth Simulator (top for 5 lists) moves down to #14
- The #10 system has doubled in performance since last year

Top 500: Architectural Styles [figure]

Top 500: Processor Type [figure]

System Interconnect
- Interconnect technologies for different bandwidths and distances: bus, SAN, LAN, MAN, WAN

Local Bus, I/O Bus, SAN and LAN [figure]
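
Network Performance Metrics
- Node degree: the number of edges entering or leaving a node. In a unidirectional network, the node degree is the sum of the incoming and outgoing edges.
- Network diameter: the longest distance between any two nodes in the network, that is, the largest number of hops on a shortest path between two nodes.
- Bisection width: the minimum number of edges that must be removed to split the network into two equal halves.
- Bisection bandwidth: the maximum number of bits (or bytes) per second that can cross all links on the minimal bisection plane.
- A network is called symmetric if it looks the same when viewed from any node.

Static and Dynamic Interconnection Networks
- Static interconnection network: a network with fixed connections between processing elements; these point-to-point links stay unchanged during program execution. Typical static networks include the 1-D linear array, 2-D mesh, tree, hypercube, cube-connected cycles, shuffle-exchange network, and butterfly network.
- Dynamic network: built from switches, so the connection configuration can be changed dynamically as the application requires. Typical dynamic networks include buses, crossbar switches, and multistage interconnection networks.

Static Interconnection Networks (1)
- 1-D linear array: the simplest and most basic interconnect in parallel machines. Each node is connected only to its left and right neighbors (also called two-nearest-neighbor connection). N nodes are chained by N-1 edges; interior nodes have degree 2, the diameter is N-1, and the bisection width is 1.
- Connecting the first and last nodes yields a circular shifter, which is topologically equivalent to a ring. A ring can be unidirectional or bidirectional; its node degree is always 2, its diameter is floor(N/2) (bidirectional ring) or N-1 (unidirectional ring), and its bisection width is 2.

Static Interconnection Networks (2)
In the formulas below, N is the total number of nodes, arranged as a sqrt(N) x sqrt(N) grid.
- 2-D mesh: each node is connected only to its upper, lower, left and right neighbors (boundary nodes excepted). The node degree is 4, the network diameter is 2(sqrt(N) - 1), and the bisection width is sqrt(N).
- With wraparound in the vertical direction and a snake-like connection in the horizontal direction, the mesh becomes an Illiac mesh: the node degree is always 4, the network diameter is sqrt(N) - 1, and the bisection width is 2 sqrt(N).
- With wraparound in both the vertical and horizontal directions, it becomes a 2-D torus: the node degree is always 4, the network diameter is 2 floor(sqrt(N)/2), and the bisection width is 2 sqrt(N).

Static Interconnection Networks (3)
- Binary tree: apart from the root and the leaves, each interior node is connected only to its parent and its two children. The node degree is 3, the bisection width is 1, and the diameter of the tree is 2(ceil(log2 N) - 1).
- If the node degree is raised as far as possible, to N-1, the diameter shrinks to 2 and the tree becomes a star network.
- The main problem of the traditional binary tree is that the root easily becomes a communication bottleneck. In a fat tree, the paths between nodes become progressively wider from the leaves toward the root.

Static Interconnection Networks (4)
- Hypercube: an n-cube consists of N = 2^n vertices. A 3-cube is shown in figure (a); a 4-cube, shown in figure (b), is formed by connecting the corresponding vertices of two 3-cubes. The node degree of an n-cube is n, the network diameter is also n, and the bisection width is N/2.
- Replacing each vertex of a 3-cube with a ring gives the 3-cube-connected cycles shown in figure (d). Each vertex then has degree 3, instead of degree n as in the hypercube.

Embedding
- Mapping the nodes of one network onto another network
- The dilation factor describes the quality of an embedding; it is…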
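
The metrics defined in the static-network slides above (node degree, network diameter, bisection width) can be checked numerically on small instances. The following Python sketch is not part of the original slides: the function names are illustrative, links are assumed to be bidirectional, and the bisection width is found by brute force over all equal-size splits, so it is only practical for roughly 16 nodes or fewer.

```python
from collections import deque
from itertools import combinations

# Each builder returns an adjacency map {node: set of neighbours} with
# bidirectional links. All names here are illustrative, not from the slides.

def linear_array(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def mesh2d(k, wrap=False):
    """k x k mesh; wrap=True adds wraparound links in both directions (torus)."""
    adj = {(r, c): set() for r in range(k) for c in range(k)}
    for r, c in adj:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if wrap:
                rr, cc = rr % k, cc % k
            if 0 <= rr < k and 0 <= cc < k:
                adj[(r, c)].add((rr, cc))
    return adj

def hypercube(n):
    """n-cube: vertices are n-bit labels, neighbours differ in exactly one bit."""
    return {v: {v ^ (1 << b) for b in range(n)} for v in range(1 << n)}

def max_degree(adj):
    return max(len(nbrs) for nbrs in adj.values())

def eccentricity(adj, src):
    """Largest hop count from src to any node, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    return max(eccentricity(adj, s) for s in adj)

def bisection_width(adj):
    """Minimum number of links cut by any split into two equal halves.
    Brute force over all halves, so use it on small networks only."""
    nodes = list(adj)
    best = None
    for part in combinations(nodes, len(nodes) // 2):
        side = set(part)
        cut = sum(1 for u in side for v in adj[u] if v not in side)
        best = cut if best is None else min(best, cut)
    return best

if __name__ == "__main__":
    networks = {
        "linear array, N=8": linear_array(8),
        "ring, N=8": ring(8),
        "2-D mesh, 4x4": mesh2d(4),
        "2-D torus, 4x4": mesh2d(4, wrap=True),
        "4-cube": hypercube(4),
    }
    for name, net in networks.items():
        print(name, max_degree(net), diameter(net), bisection_width(net))
```

Running the script prints the maximum node degree, diameter and bisection width of each network, which can be compared against the closed-form values on the slides. For larger networks the brute-force bisection search becomes infeasible and the closed forms are the practical choice.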