Modern Computer Architecture.ppt
Slide deck, 54 pages.
Modern Computer Architecture
Lecturer: Prof. Zhang Gang
School of Computer Science, Tianjin University
Contact email:
Homework submission email:
2016

The Main Contents
- Chapter 1. Fundamentals of Quantitative Design and Analysis
- Chapter 2. Memory Hierarchy Design
- Chapter 3. Instruction-Level Parallelism and Its Exploitation
- Chapter 4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
- Chapter 5. Thread-Level Parallelism
- Chapter 6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
- Appendix A. Pipelining: Basic and Intermediate Concepts

Class Discussion: Sections 3.4, 3.5, and 3.6 (2022/12/14)

Advantages of Dynamic Scheduling
- Dynamic scheduling: the hardware rearranges instruction execution to reduce stalls while maintaining data flow and exception behavior.
  What does "maintaining data flow and exception behavior" mean?
- Advantages:
  - It handles cases where dependences are unknown at compile time.
  - It allows the processor to tolerate unpredictable delays, such as cache misses, by executing other code while waiting for the miss to resolve.
  - It allows code compiled for one pipeline to run efficiently on a different pipeline.
  - It simplifies the compiler.
  Why?

HW Schemes: Instruction Parallelism
- Key idea: allow instructions behind a stall to proceed.
    DIVD F0,F2,F4
    ADDD F10,F0,F8
    SUBD F12,F8,F14
- This enables out-of-order execution and allows out-of-order completion (e.g., SUBD).
- In a dynamically scheduled pipeline, all instructions still pass through the issue stage in order (in-order issue).
  What do in-order issue, out-of-order execution, and out-of-order completion mean?
- We distinguish when an instruction begins execution from when it completes execution; between these two times, the instruction is in execution.
  When and where does this happen in the pipeline?
- Note: dynamic execution creates WAR and WAW hazards and makes exceptions harder.
  Why can it create WAR and WAW hazards?

Dynamic Scheduling Step 1
- The simple pipeline had one stage to check both structural and data hazards: Instruction Decode (ID), also called Instruction Issue.
- Split the ID stage of the simple 5-stage pipeline into two stages:
  - Issue: decode instructions and check for structural hazards.
  - Read operands: wait until there are no data hazards, then read operands.
  Is this clear?
[Figure: pipeline diagram labeled IS, Ex, Wb]

A Dynamic Algorithm: Tomasulo's
- Designed for the IBM 360/91 (before caches!)
  - Long memory latency
- Goal: high performance without special compilers
- The small number of floating-point registers (4 in the 360) prevented interesting compiler scheduling of operations.
  - This led Tomasulo to work out how to get more effective registers: renaming in hardware!
- Why study a 1966 computer?
  - Its descendants have flourished: Alpha 21264, Pentium 4, AMD Opteron, Power 5.

Tomasulo Algorithm
- Control and buffers are distributed with the Function Units (FUs).
  - FU buffers are called "reservation stations" (RSs); they hold pending operands.
- Registers in instructions are replaced by values or by pointers to reservation stations; this is called register renaming.
  - Renaming avoids WAR and WAW hazards.
  - There are more reservation stations than registers, so the hardware can do optimizations that compilers can't.
- Results go to FUs from RSs, not through registers, over the Common Data Bus (CDB), which broadcasts results to all FUs.
  - This avoids RAW hazards by executing an instruction only when its operands are available.
- Loads and stores are treated as FUs with RSs as well.
- Integer instructions can go past branches (predict taken), allowing FP operations beyond the basic block to enter the FP queue.

Tomasulo Organization
[Figure: FP Op Queue; FP Registers; Load Buffers Load1 to Load6 (from memory); Store Buffers (to memory); reservation stations Add1, Add2, Add3 for the FP adders and Mult1, Mult2 for the FP multipliers; Common Data Bus (CDB)]

Reservation Station Components
- Op: operation to perform in the unit (e.g., + or -)
- Vj, Vk: values of the source operands
  - Store buffers have a V field: the result to be stored.
- Qj, Qk: reservation stations producing the source registers (the value to be written)
  - Note: Qj, Qk = 0 means ready.
  - Store buffers only have Qi, for the RS producing the result.
- Busy: indicates that the reservation station or FU is busy
- Register result status: indicates which functional unit will write each register, if one exists; blank when no pending instruction will write that register.

Three Stages of Tomasulo Algorithm
1. Issue: get an instruction from the FP Op Queue.
   If a reservation station is free (no structural hazard), control issues the instruction and sends operands (renames registers).
2. Execute: operate on operands (EX).
   When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
3. Write result: finish execution (WB).
   Write on the Common Data Bus to all awaiting units; mark the reservation station available.

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of functional unit source address
  - A unit writes if the source matches the functional unit it expects (the one producing the result)
  - Does the broadcast
- Example speed: 3 clocks for FP +, -; 10 clocks for FP *; 40 clocks for FP /.

Tomasulo Example
[Figure: example tables showing the instruction stream, 3 load buffers, 3 FP adder reservation stations, 2 FP multiplier reservation stations, a clock cycle counter, and FU countdown timers]
- Cycle 1
- Cycle 2: multiple loads can be outstanding.
- Cycle 3: register names are removed ("renamed") in the reservation stations; MULT is issued. Load1 is completing; what is waiting for Load1?
- Cycle 4: Load2 is completing; what is waiting for Load2?
- Cycle 5: the timers start counting down for Add1 and Mult1.
- Cycle 6: can ADDD issue here despite the name dependency on F6?
- Cycle 7: Add1 (SUBD) is completing; what is waiting for it?
- Cycle 8
- Cycle 9
- Cycle …
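The slides ask why out-of-order execution creates WAR and WAW hazards and claim that renaming removes them. Both points can be checked on a small instruction stream. This is a minimal sketch under stated assumptions: instructions are tuples `(op, dst, src...)`, load address operands such as `"34(R2)"` are treated as opaque strings, the stream is hypothetical (chosen to be consistent with the example slide notes: SUBD, DIVD, and an ADDD that writes F6), and renaming draws from an unbounded supply of fresh names `T1`, `T2`, ... (Tomasulo's hardware uses reservation-station tags instead, of which there are only a few). `hazards` and `rename` are illustrative helpers, not part of the slides.

```python
# Hypothetical instruction stream: (op, destination, sources...).
# Load address operands such as "34(R2)" are treated as opaque, always-ready strings.
PROGRAM = [
    ("LD", "F6", "34(R2)"),
    ("LD", "F2", "45(R3)"),
    ("MULTD", "F0", "F2", "F4"),
    ("SUBD", "F8", "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6", "F8", "F2"),
]


def hazards(program):
    """List (kind, i, j, reg) for every pair i < j that shares a register name."""
    found = []
    for j, (_, dst_j, *src_j) in enumerate(program):
        for i in range(j):
            _, dst_i, *src_i = program[i]
            if dst_i in src_j:
                found.append(("RAW", i, j, dst_i))  # j reads what i writes
            if dst_j in src_i:
                found.append(("WAR", i, j, dst_j))  # j overwrites what i reads
            if dst_j == dst_i:
                found.append(("WAW", i, j, dst_j))  # both write the same name
    return found


def rename(program):
    """Give every destination a fresh name and rewrite later sources to match."""
    latest = {}  # architectural register -> name of its most recent producer
    renamed = []
    for n, (op, dst, *srcs) in enumerate(program, start=1):
        srcs = [latest.get(s, s) for s in srcs]  # read sources before renaming dst
        latest[dst] = f"T{n}"
        renamed.append((op, latest[dst], *srcs))
    return renamed


if __name__ == "__main__":
    for kind, i, j, reg in hazards(PROGRAM):
        print(kind, i, j, reg)
    print(sorted({h[0] for h in hazards(rename(PROGRAM))}))  # prints ['RAW']
```

The original stream contains, for example, a WAR on F6 between DIVD (index 4) and ADDD (index 5): if ADDD completed first and wrote the register file, DIVD would read the wrong F6. After renaming, only RAW pairs (true data dependences) remain.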
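The three stages, the register result status, and the CDB broadcast can be sketched as a small cycle-level simulator. This is a minimal sketch, not the IBM 360/91 design. The station counts (3 Add, 2 Mult, 3 Load) follow the example slide and the FP latencies follow the "example speed" (3, 10, and 40 clocks). Several details are hypothetical simplifications: a 2-clock load latency, loads that carry their value inline instead of accessing memory, execution starting one cycle after the last operand arrives, and oldest-first arbitration for the single CDB. The names `tomasulo`, `run_in_order`, and `Station`, and the instruction stream, are also assumptions. The stream is chosen to be consistent with the slide notes (SUBD lands in Add1, and ADDD writes F6), but the cycle numbers it produces need not match the slide tables.

```python
from dataclasses import dataclass
from typing import Optional

# FP latencies follow the slide's "example speed"; the 2-clock load is an assumption.
LATENCY = {"ADDD": 3, "SUBD": 3, "MULTD": 10, "DIVD": 40, "LD": 2}
UNIT = {"ADDD": "Add", "SUBD": "Add", "MULTD": "Mult", "DIVD": "Mult", "LD": "Load"}
ALU = {"ADDD": lambda a, b: a + b, "SUBD": lambda a, b: a - b,
       "MULTD": lambda a, b: a * b, "DIVD": lambda a, b: a / b}


@dataclass
class Station:
    tag: str                       # e.g. "Add1"; doubles as the renamed register name
    busy: bool = False
    instr: int = -1                # program index of the occupying instruction
    op: str = ""
    vj: Optional[float] = None     # source values (valid once qj / qk are None)
    vk: Optional[float] = None
    qj: Optional[str] = None       # tag of the station producing each source
    qk: Optional[str] = None
    ready_at: int = 0              # cycle in which the last operand arrived
    done_at: Optional[int] = None  # cycle in which execution completes
    result: float = 0.0


def tomasulo(program, regs, sizes=(("Add", 3), ("Mult", 2), ("Load", 3))):
    rs = [Station(f"{unit}{k + 1}") for unit, n in sizes for k in range(n)]
    regs, status = dict(regs), {}  # status = register result status: reg -> tag
    log = [{} for _ in program]
    pc = cycle = 0
    while pc < len(program) or any(s.busy for s in rs):
        cycle += 1
        # 3. Write result: one finished station per cycle drives the single CDB.
        finished = [s for s in rs if s.busy and s.done_at is not None and s.done_at < cycle]
        if finished:
            w = min(finished, key=lambda s: s.instr)  # oldest instruction wins
            for s in rs:                              # "come from" bus: match on tag
                if s.busy and s.qj == w.tag:
                    s.vj, s.qj, s.ready_at = w.result, None, cycle
                if s.busy and s.qk == w.tag:
                    s.vk, s.qk, s.ready_at = w.result, None, cycle
            for reg, tag in list(status.items()):     # only the latest writer of a
                if tag == w.tag:                      # register updates it (no WAW)
                    regs[reg] = w.result
                    del status[reg]
            log[w.instr]["write"] = cycle
            w.busy, w.done_at = False, None
        # 2. Execute: begin once both operands arrived in an earlier cycle.
        for s in rs:
            if (s.busy and s.done_at is None and s.qj is None and s.qk is None
                    and s.ready_at < cycle):
                s.done_at = cycle + LATENCY[s.op] - 1
                s.result = s.vj if s.op == "LD" else ALU[s.op](s.vj, s.vk)
                log[s.instr]["exec_done"] = s.done_at
        # 1. Issue: in program order; stall if no station of the right type is free.
        if pc < len(program):
            op, dst, *srcs = program[pc]
            free = [s for s in rs if not s.busy and s.tag.startswith(UNIT[op])]
            if free:
                s = free[0]
                s.busy, s.instr, s.op, s.ready_at = True, pc, op, cycle
                s.vj = s.vk = s.qj = s.qk = None
                if op == "LD":                        # hypothetical: value given inline
                    s.vj = srcs[0]
                else:                                 # read value or producer's tag
                    s.qj, s.qk = status.get(srcs[0]), status.get(srcs[1])
                    s.vj = None if s.qj else regs[srcs[0]]
                    s.vk = None if s.qk else regs[srcs[1]]
                status[dst] = s.tag                   # rename dst to this station
                log[pc]["issue"] = cycle
                pc += 1
    return regs, log


def run_in_order(program, regs):
    """Reference: plain sequential execution of the same program."""
    regs = dict(regs)
    for op, dst, *srcs in program:
        regs[dst] = srcs[0] if op == "LD" else ALU[op](regs[srcs[0]], regs[srcs[1]])
    return regs


# Assumed stream, consistent with the slide notes (SUBD in Add1, ADDD writes F6).
program = [("LD", "F6", 6.0), ("LD", "F2", 2.0), ("MULTD", "F0", "F2", "F4"),
           ("SUBD", "F8", "F6", "F2"), ("DIVD", "F10", "F0", "F6"),
           ("ADDD", "F6", "F8", "F2")]
init = {"F0": 0.0, "F2": 0.0, "F4": 3.0, "F6": 0.0, "F8": 0.0, "F10": 0.0}
final, log = tomasulo(program, init)
print(final == run_in_order(program, init))  # prints True
for (op, *_), times in zip(program, log):
    print(op, times)
```

Within each simulated cycle the stages are processed in reverse order (write, then execute, then issue). As a result, a value broadcast in a cycle is visible to an instruction issuing in that same cycle, but any instruction receiving it can start executing only in the next cycle. Because `status[dst]` is overwritten at issue, ADDD issues in cycle 6 and writes F6 long before DIVD finishes, while DIVD keeps the F6 value it captured at issue. The final registers match in-order execution.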