Modern Computer Architecture: CPU (English Edition)
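CPU (1)

KEY POINTS
  1. CISC & RISC
  2. Instruction pipeline
  3. Instruction-level parallelism
  4. Dynamic scheduling
  5. Scoreboard
  6. Loop unrolling
  7. Register renaming
  8. Tomasulo's approach

1 CISC & RISC: Why CISC (1)?
  - Compiler simplification?
    - Disputed
    - Complex machine instructions are harder to exploit
    - Optimization is more difficult
  - Smaller programs?
    - Program takes up less memory, but memory is now cheap
    - May not occupy fewer bits, just look shorter in symbolic form
      - More instructions require longer op-codes
      - Register references require fewer bits

1 CISC & RISC: Why CISC (2)?
  - Faster programs?
    - Bias towards use of simpler instructions
    - More complex control unit
    - Microprogram control store is larger
    - Thus simple instructions take longer to execute
  - It is far from clear that CISC is the appropriate solution

1 CISC & RISC: RISC Characteristics
  - One instruction per cycle
  - Register to register operations
  - Few, simple addressing modes
  - Few, simple instruction formats
  - Hardwired design (no microcode)
  - Fixed instruction format
  - More compile time/effort

1 CISC & RISC: Not clear cut
  - Many designs borrow from both philosophies
  - e.g. PowerPC and Pentium II

1 CISC & RISC: History of RISC
  - 1964: CDC released the CDC 6600, the first supercomputer, which already had some basic RISC characteristics
  - The CDC 6600 designers recognized that the architecture had to be simplified to make pipelining effective
    - Load-store architecture
    - Scoreboarding for dynamic pipeline scheduling
    - Out-of-order execution
  - 1976: the Cray-1 vector machine used ideas similar to those of the CDC 6600
    - Cray was one of the principal designers of the CDC 6600
  - The idea of simplifying the architecture for the sake of an efficient implementation received little attention from minicomputer and microprocessor designers in the 1960s and 1970s

1 CISC & RISC
  - 1968: John Cocke started work on the ASC (Advanced Scientific Computer) project at IBM's San Jose research center
    - The basic idea was to let the compiler do more of the instruction scheduling in order to reduce hardware complexity
    - The project also proposed issuing multiple instructions per cycle
  - The ASC project was later cancelled; Cocke moved to Future System in 1971
  - 1975: Cocke moved to IBM's Yorktown research center and began developing the IBM 801, the earliest RISC processor to enter design
    - Cocke received the Eckert-Mauchly Award and the Turing Award
    - The 801 is the predecessor of PowerPC
  - Starting slightly later than the 801 were Patterson's RISC-I and RISC-II at Berkeley and Hennessy's MIPS project at Stanford
    - Graduate students from both universities had taken part in the 801 project and later returned to their universities
    - RISC-II is the predecessor of SPARC; the MIPS project is the predecessor of the MIPS processors

1 CISC & RISC
  - Joel Birnbaum, project manager of the 801, moved to HP and founded PA-RISC
  - DEC used MIPS processors for three years before launching Alpha
  - 1994: Intel and HP announced that they would use the same architecture
  - This history readily explains why the five RISC processors were so similar at the outset
  - Later each RISC processor developed in its own direction
    - Alpha: simple instructions, superpipelined design, many pipeline stages, high clock frequency
    - PowerPC: powerful, flexible instructions, even somewhat CISC-like

2 Instruction pipeline
  - Most instructions are register to register
  - Two phases of execution
    - I: Instruction fetch
    - E: Execute (ALU operation with register input and output)
  - For load and store
    - I: Instruction fetch
    - E: Execute (calculate memory address)
    - D: Memory (register to memory or memory to register operation)

Effects of Pipelining
  [Figure]

Optimization of Pipelining
  - Delayed branch
    - Does not take effect until after execution of the following instruction
    - This following instruction is the delay slot

Normal and Delayed Branch

  Address  Normal Branch  Delayed Branch  Optimized Delayed Branch
  100      LOAD X,rA      LOAD X,rA       LOAD X,rA
  101      ADD 1,rA       ADD 1,rA        JUMP 105
  102      JUMP 105       JUMP 106        ADD 1,rA
  103      ADD rA,rB      NOOP            ADD rA,rB
  104      SUB rC,rB      ADD rA,rB       SUB rC,rB
  105      STORE rA,Z     SUB rC,rB       STORE rA,Z
  106                     STORE rA,Z

Use of Delayed Branch
  [Figure]

Controversy
  - Quantitative
    - Compare program sizes and execution speeds
  - Qualitative
    - Examine issues of high level language support and use of VLSI real estate
  - Problems
    - No pair of RISC and CISC machines is directly comparable
    - No definitive set of test programs
    - Difficult to separate hardware effects from compiler effects
    - Most comparisons were done on "toy" rather than production machines
    - Most commercial devices are a mixture

General pipeline
  [Figure]

Pipeline dependency
  - What does dependency mean?
    - Two instructions in a pipeline are dependent if some stage of one instruction cannot start until some stage of an earlier instruction has completed
    - Dependent instructions must be placed far enough apart; otherwise the later instruction has to wait

Pipeline dependency
  - Data dependences
  - Resource conflicts
  - Control dependences

Dependences in the instruction pipeline
  - Data dependence: caused by two instructions using the same register
    - e.g. a later instruction uses the result of an earlier instruction
  - Control dependence: a dependence involving the PC
    - Every instruction fetch uses the PC, and branch instructions modify the PC
  - Structural dependence: a resource conflict
    - Several instructions need the same functional unit at the same time
  - Dependences cause pipeline stalls

Example 1
  - Instruction fetch and data access both need to access memory

Resource conflicts
  [Figure]

Data dependences
  - RAW (Read After Write)
    - A later instruction uses data written by an earlier instruction
  - WAW (Write After Write)
    - Two instructions write the same location
    - Does not occur in a simple pipeline, because instructions are not executed out of order
  - WAR (Write After Read)
    - A later instruction overwrites a location that an earlier instruction reads
    - Does not occur in a simple pipeline
  - WAR and WAW dependences do occur in a dynamic pipeline

Data dependences
  [Figure: instruction order against time (clock cycles); stages IF, ID/RF, EX, MEM, WB]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Forwarding to resolve RAW dependences
  [Figure: instruction order against time (clock cycles)]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Forwarding
  [Figure]

Data dependences with Forwarding
  [Figure: instruction order against time (clock cycles)]
    lw  r1,0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9

Pipeline stall caused by dependency
  [Figure: instruction order against time (clock cycles)]
    lw  r1,0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9

Static scheduling
  Unoptimized and optimized code for the following program fragment:
    a = b + c;
    d = e - f;

  Slow code:
    LW  Rb,b
    LW  Rc,c
    ADD Ra,Rb,Rc
    SW  a,Ra
    LW  Re,e
    LW  Rf,f
    SUB Rd,Re,Rf
    SW  d,Rd

  Fast code:
    LW  Rb,b
    LW  Rc,c
    LW  Re,e
    ADD Ra,Rb,Rc
    LW  Rf,f
    SW  a,Ra
    SUB Rd,Re,Rf
    SW  d,Rd

Control dependences
  - PC dependences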
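
The RAW/WAR/WAW definitions and the static scheduling example above can be checked with a small sketch. It is not part of the original slides. It assumes a classic five-stage pipeline with full forwarding, in which the only remaining data stall is a one-cycle load-use stall (an instruction reads a register loaded by the instruction immediately before it). The names `Instr`, `classify`, and `load_use_stalls` are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                    # mnemonic, e.g. "LW", "ADD", "SW"
    dest: Optional[str]        # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]      # registers read


def classify(first: Instr, second: Instr) -> set:
    """Data dependences of a later instruction (second) on an earlier one (first)."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")       # second reads what first writes
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")       # second overwrites what first reads
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")       # both write the same register
    return found


def load_use_stalls(program) -> int:
    """Count one-cycle stalls under the assumed model: a load followed
    immediately by an instruction that reads the loaded register."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "LW" and prev.dest in cur.srcs:
            stalls += 1
    return stalls


# a = b + c; d = e - f;  (memory operands are omitted, only registers matter here)
slow = [
    Instr("LW", "Rb", ()), Instr("LW", "Rc", ()),
    Instr("ADD", "Ra", ("Rb", "Rc")), Instr("SW", None, ("Ra",)),
    Instr("LW", "Re", ()), Instr("LW", "Rf", ()),
    Instr("SUB", "Rd", ("Re", "Rf")), Instr("SW", None, ("Rd",)),
]
fast = [
    Instr("LW", "Rb", ()), Instr("LW", "Rc", ()), Instr("LW", "Re", ()),
    Instr("ADD", "Ra", ("Rb", "Rc")), Instr("LW", "Rf", ()),
    Instr("SW", None, ("Ra",)), Instr("SUB", "Rd", ("Re", "Rf")),
    Instr("SW", None, ("Rd",)),
]

print("slow code stalls:", load_use_stalls(slow))  # 2
print("fast code stalls:", load_use_stalls(fast))  # 0

# From the data dependence slides: sub r4,r1,r3 reads r1 written by add r1,r2,r3.
print(classify(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3"))))
```

Under this model the slow ordering stalls twice (ADD right after LW Rc, SUB right after LW Rf), while the fast ordering places an independent instruction after each load and does not stall. A pipeline with different forwarding paths or load latency would give different counts.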