Computer Architecture Homework Exercises
1.1 Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30, Speedup2 = 20, Speedup3 = 15. Only one enhancement is usable at a time.
(1) If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10?
(2) Assume the enhancements can be used 25%, 35%, and 10% of the time for enhancements 1, 2, and 3, respectively. For what fraction of the reduced execution time is no enhancement in use?
(3) Assume, for some benchmark, that the possible fraction of use is 15% for each of enhancements 1 and 2 and 70% for enhancement 3. We want to maximize performance. If only one enhancement can be implemented, which should it be? If two enhancements can be implemented, which should be chosen?
Answer:
(1) Let x be the fraction of the time enhancement 3 must be used to achieve an overall speedup of 10.
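By Amdahl's law,
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
For several mutually exclusive enhancements, both the fraction terms and the fraction/speedup terms are summed over the enhancements in use. Therefore
$$10 = \frac{1}{(1 - 25\% - 25\% - x) + \frac{25\%}{30} + \frac{25\%}{20} + \frac{x}{15}}$$
The denominator must equal 0.1, i.e. 0.5208 - (14/15)x = 0.1, so x ≈ 0.451 ≈ 45%.
(2) Let Time_before be the total execution time before the enhancements are used, and let Time_no be the execution time during which no enhancement is in use:
$$\text{Time}_{\text{no}} = (1 - 25\% - 35\% - 10\%)\,\text{Time}_{\text{before}} = 0.30\,\text{Time}_{\text{before}}$$
The total execution time after the enhancements are used is
$$\text{Time}_{\text{after}} = \text{Time}_{\text{no}} + \frac{25\%}{30}\,\text{Time}_{\text{before}} + \frac{35\%}{20}\,\text{Time}_{\text{before}} + \frac{10\%}{15}\,\text{Time}_{\text{before}} = 0.3325\,\text{Time}_{\text{before}}$$
So
$$\frac{\text{Time}_{\text{no}}}{\text{Time}_{\text{after}}} = \frac{0.30}{0.3325} \approx 90.2\%$$
(3) Apply the same formula. If only one enhancement can be implemented:
$$\text{Speedup}_{\text{overall},1} = \frac{1}{(1 - 15\%) + \frac{15\%}{30}} \approx 1.17$$
$$\text{Speedup}_{\text{overall},2} = \frac{1}{(1 - 15\%) + \frac{15\%}{20}} \approx 1.166$$
$$\text{Speedup}_{\text{overall},3} = \frac{1}{(1 - 70\%) + \frac{70\%}{15}} \approx 2.88$$
So, if only one enhancement can be implemented, enhancement 3 should be chosen.
If two enhancements can be implemented:
$$\text{Speedup}_{\text{overall},12} = \frac{1}{(1 - 15\% - 15\%) + \frac{15\%}{30} + \frac{15\%}{20}} \approx 1.40$$
$$\text{Speedup}_{\text{overall},13} = \frac{1}{(1 - 15\% - 70\%) + \frac{15\%}{30} + \frac{70\%}{15}} \approx 4.96$$
$$\text{Speedup}_{\text{overall},23} = \frac{1}{(1 - 15\% - 70\%) + \frac{15\%}{20} + \frac{70\%}{15}} \approx 4.90$$
So enhancements 1 and 3 should be chosen to maximize performance.
1.2 Suppose a graphics operation accounts for 10% of the execution time of an application, and adding special hardware can speed it up by a factor of 18. Going further, we could use twice as much hardware and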
make the graphics operation run 36 times faster. Is it worth exploring such a further architectural change? Give the reason.
Answer:
By Amdahl's law,
$$\text{Speedup}_{\text{overall},1} = \frac{1}{(1 - 10\%) + \frac{10\%}{18}} = \frac{1}{0.9 + 0.00556} \approx 1.104$$
$$\text{Speedup}_{\text{overall},2} = \frac{1}{(1 - 10\%) + \frac{10\%}{36}} = \frac{1}{0.9 + 0.00278} \approx 1.108$$
Doubling the hardware raises the overall speedup only from about 1.104 to about 1.108 (roughly 0.3%), because the unenhanced 90% of the execution time dominates. So it is not worth exploring such a further architectural change.
1.3 In many practical applications that demand a real-time response, the computational workload W is often fixed. As the number of processors in a parallel computer increases, the fixed workload is distributed across more processors for parallel execution. Assume 20 percent of W must be executed sequentially and 80 percent can be executed by 4 nodes simultaneously. What is the fixed-load speedup?
Answer:
$$\text{Speedup} = \frac{W}{20\%\,W + \frac{80\%\,W}{4}} = \frac{1}{0.2 + 0.2} = 2.5$$
So the fixed-load speedup is 2.5.
2.1 There is a model machine with nine instructions, whose frequencies are ADD (0.3), SUB (0.24), JOM (0.06), STO (0.07), JMP (0.07), SHR (0.02), CIL (0.03), CLA (0.2), and STP (0.01). The machine has several GPRs. Memory is byte addressable, accessed addresses are aligned, and the memory word width
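is 16 bits. Suppose the nine instructions have the following characteristics:
- Two-operand instructions
- Two kinds of instruction length
- Extended opcode coding
- Shorter instruction operand format: R (register)-R (register)
- Longer instruction operand format: R (register)-M (memory)
- Displacement memory addressing mode
A. Encode the nine instructions with Huffman coding, and give the average code length.
B. Design the practical instruction codes, and give the average code length.
C. Write out the two instruction word formats in detail.
D. What is the maximum offset for accessing a memory address?
Answer:
(A) Huffman coding by Huffman tree:
Instruction  Frequency  Code
ADD          30%        01
SUB          24%        11
CLA          20%        10
JOM          6%         0001
STO          7%         0011
JMP          7%         0010
SHR          2%         000001
CIL          3%         00001
STP          1%         000000
So the average code length is
$$\sum_{i=1}^{9} p_i l_i = 0.30 \times 2 + 0.24 \times 2 + 0.20 \times 2 + 0.06 \times 4 + 0.07 \times 4 + 0.07 \times 4 + 0.02 \times 6 + 0.03 \times 5 + 0.01 \times 6 = 2.61 \text{ bits}$$
(B) Two instruction lengths with extended opcode coding. The three most frequent instructions get 2-bit opcodes, and the remaining 2-bit pattern 11 serves as the extension prefix for the 5-bit opcodes:
Instruction  Frequency  Code
ADD          30%        01
SUB          24%        00
CLA          20%        10
JOM          6%         11000
STO          7%         11001
JMP          7%         11010
SHR          2%         11011
CIL          3%         11100
STP          1%         11101
So the average code length is 2 × (0.30 + 0.24 + 0.20) + 5 × (0.06 + 0.07 + 0.07 + 0.02 + 0.03 + 0.01) = 1.48 + 1.30 = 2.78 bits.
(C) Shorter instruction format (8 bits): Opcode (2 bits) | Register (3 bits) | Register (3 bits)
Longer instruction format (16 bits): Opcode (5 bits) | Register (3 bits) | Register (3 bits) | Offset (5 bits)
(D) The offset field is 5 bits wide, so it can express 2^5 = 32 different displacements. The maximum offset for accessing a memory address is therefore 32 bytes.
3.1 Identify all of the data dependences in the following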
code. Which dependences are data hazards that will be resolved via forwarding?
ADD R2,R5,R4
ADD R4,R2,R5
SW R5,100(R2)
ADD R3,R2,R4
Answer:
3.2 How could we modify the following code to make use of a delayed branch slot?
Loop: LW   R2,100(R3)
      ADDI R3,R3,#4
      BEQ  R3,R4,Loop
Answer:
      LW   R2,100(R3)
Loop: ADDI R3,R3,#4
      BEQ  R3,R4,Loop
      LW   R2,100(R3)    ; delayed branch slot
The first load is hoisted above the loop. The copy in the delay slot executes on every pass, and because R3 has already been incremented, it fetches the value for the next iteration. This assumes the extra load executed on the final pass is harmless.
3.3 Consider the following reservation table for a four-stage pipeline with a clock cycle t = 20 ns.
A. What are the forbidden latencies and the initial collision vector?
B. Draw the state transition diagram for scheduling the pipeline.
C. Determine the MAL associated with the
shortest greedy cycle.
D. Determine the maximum pipeline throughput corresponding to the MAL and the given t.
[Reservation table: stages S1-S4 by clock cycles 1-6]
Answer:
A. The forbidden latencies are F = {1, 2, 5}, and the initial collision vector is C = (10011).
B. [State transition diagram]
C. MAL (minimal average latency) = 3 clock cycles.
D. The maximum pipeline throughput is
$$H_k = \frac{1}{\text{MAL} \times t} = \frac{1}{3 \times 20\,\text{ns}} = \frac{1}{60\,\text{ns}} \approx 16.7 \times 10^{6} \text{ tasks/s}$$
3.4 Using the following code fragment:
Loop: LW   R1,0(R2)     ; load R1 from address 0+R2
      ADDI R1,R1,#1     ; R1=R1+1
      SW   0(R2),R1     ; store R1 at address 0+R2
      ADDI R2,R2,#4     ; R2=R2+4
      SUB  R4,R3,R2     ; R4=R3-R2
      BNEZ R4,Loop      ; branch to Loop if R4!=0
Assume that the initial value of R3 is R2+396. Throughout this exercise, use the classic RISC five-stage integer pipeline and assume all memory accesses take 1 clock cycle.
A. Show the timing of this instruction sequence for the RISC pipeline without any forwarding or bypassing hardware, but assuming a register read and a write in the same clock cycle "forwards" through the r
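The Amdahl's-law arithmetic in the answer to Exercise 1.1 can be rechecked with a short Python sketch. It assumes, as the exercise states, that the enhancements are mutually exclusive, so their fractions simply add. The helper name `amdahl` is illustrative and does not come from the source.

```python
from itertools import combinations


def amdahl(fractions, speedups):
    """Overall speedup for mutually exclusive enhancements (Amdahl's law).

    fractions[i] is the share of the original execution time in which
    enhancement i is used; speedups[i] is its speedup factor.
    """
    unenhanced = 1 - sum(fractions)
    enhanced = sum(f / s for f, s in zip(fractions, speedups))
    return 1 / (unenhanced + enhanced)


SPEEDUPS = (30, 20, 15)

# (1) The denominator is linear in x:
#     (0.5 + 0.25/30 + 0.25/20) - x * (1 - 1/15) = 1/10
x = (0.5 + 0.25 / 30 + 0.25 / 20 - 1 / 10) / (1 - 1 / 15)

# (2) Share of the new (reduced) execution time with no enhancement in use.
used = (0.25, 0.35, 0.10)
t_none = 1 - sum(used)
t_after = t_none + sum(f / s for f, s in zip(used, SPEEDUPS))
share_none = t_none / t_after

# (3) Evaluate every single enhancement and every pair of enhancements.
usable = (0.15, 0.15, 0.70)
single = {i + 1: amdahl([usable[i]], [SPEEDUPS[i]]) for i in range(3)}
pairs = {
    (i + 1, j + 1): amdahl([usable[i], usable[j]], [SPEEDUPS[i], SPEEDUPS[j]])
    for i, j in combinations(range(3), 2)
}

print(f"(1) x = {x:.1%}")
print(f"(2) share with no enhancement = {share_none:.1%}")
print("(3) best single:", max(single, key=single.get),
      "best pair:", max(pairs, key=pairs.get))
```

Its results agree with the hand calculation: x ≈ 45%, about 90.2%, enhancement 3 as the best single choice, and enhancements 1 and 3 as the best pair.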
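Exercises 1.2 and 1.3 both apply Amdahl's law with a single enhanced fraction. In 1.3, the "enhancement" is the part of the fixed workload that runs on 4 nodes at once. The minimal sketch below (the function name is illustrative) reproduces both answers and shows how little the second doubling of graphics hardware adds.

```python
def amdahl(fraction, speedup):
    """Amdahl's law for one enhancement usable for `fraction` of the time."""
    return 1 / ((1 - fraction) + fraction / speedup)


# Exercise 1.2: the graphics operation is 10% of execution time.
s18 = amdahl(0.10, 18)       # special hardware, 18x faster
s36 = amdahl(0.10, 36)       # twice the hardware, 36x faster
extra_gain = s36 / s18 - 1   # relative benefit of doubling the hardware

# Exercise 1.3: 20% of W is serial and 80% runs on 4 nodes at once,
# so the parallel part behaves like an enhancement with speedup 4.
fixed_load = amdahl(0.80, 4)

print(f"1.2: {s18:.3f} -> {s36:.3f} (+{extra_gain:.2%})")
print(f"1.3: fixed-load speedup = {fixed_load:.2f}")
```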
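The Huffman code lengths and the 2.61-bit average in answer A of Exercise 2.1 can be reproduced with a standard heap-based Huffman construction. This sketch tracks only each instruction's code length, which is enough to fix the average; it does not assign the 0/1 labels. The function name `huffman_lengths` is illustrative.

```python
import heapq
from itertools import count

FREQS = {
    "ADD": 0.30, "SUB": 0.24, "CLA": 0.20, "STO": 0.07, "JMP": 0.07,
    "JOM": 0.06, "CIL": 0.03, "SHR": 0.02, "STP": 0.01,
}


def huffman_lengths(freqs):
    """Return {symbol: code length}, merging the two rarest subtrees each step."""
    tie = count()  # unique tie-breaker so heapq never has to compare dicts
    heap = [(p, next(tie), {sym: 0}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Merging two subtrees moves every leaf inside them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


lengths = huffman_lengths(FREQS)
average = sum(FREQS[sym] * lengths[sym] for sym in FREQS)
print(lengths)
print(f"average code length = {average:.2f} bits")
```

For these frequencies, the merge order is forced: the two rarest weights at each step are never tied with a third. The lengths therefore match the table: 2 bits for ADD, SUB and CLA, 4 for JOM, STO and JMP, 5 for CIL, and 6 for SHR and STP.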
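For answer B of Exercise 2.1, the decoder must be able to tell a 2-bit opcode from a 5-bit one, so the opcode set has to be prefix-free. The sketch below checks that property and computes the average length. It assumes SUB takes the spare 2-bit code 00, because 11 is reserved as the escape prefix of the 5-bit codes. It also shows that giving SUB the code 11 would break prefix-freedom. `is_prefix_free` is an illustrative helper.

```python
FREQS = {
    "ADD": 0.30, "SUB": 0.24, "CLA": 0.20, "STO": 0.07, "JMP": 0.07,
    "JOM": 0.06, "CIL": 0.03, "SHR": 0.02, "STP": 0.01,
}

# Three 2-bit codes; the fourth 2-bit pattern "11" is the escape prefix
# for six 5-bit codes. Assumption: SUB uses the spare 2-bit code "00".
CODES = {
    "ADD": "01", "SUB": "00", "CLA": "10",
    "JOM": "11000", "STO": "11001", "JMP": "11010",
    "SHR": "11011", "CIL": "11100", "STP": "11101",
}


def is_prefix_free(codewords):
    """True if no codeword is a prefix of another, so opcodes decode unambiguously."""
    words = sorted(codewords)
    # In sorted order, a word that prefixes any other word also prefixes its successor.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


average = sum(FREQS[op] * len(code) for op, code in CODES.items())
sub_as_11 = {**CODES, "SUB": "11"}  # "11" would clash with the escape prefix

print("prefix-free:", is_prefix_free(CODES.values()))
print("prefix-free with SUB = 11:", is_prefix_free(sub_as_11.values()))
print(f"average code length = {average:.2f} bits")
```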
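The dependences asked for in Exercise 3.1 can be listed mechanically by pairing each register read with the nearest earlier write to that register. This sketch rests on two assumptions. First, `ADD Rd,Rs,Rt` writes Rd. Second, the pipeline is the classic five-stage one, whose register file is written in the first half of a cycle and read in the second half. Under those assumptions, a read-after-write distance of 1 or 2 instructions needs forwarding, while a distance of 3 is covered by the register file. The script ignores the anti-dependence on R4 (the first ADD reads R4 and the second writes it), because that cannot cause a hazard in an in-order five-stage pipeline.

```python
# Each entry: (assembly text, destination register or None, source registers).
# Assumption: "ADD Rd,Rs,Rt" writes Rd; SW reads both registers and writes none.
PROGRAM = [
    ("ADD R2,R5,R4", "R2", ("R5", "R4")),
    ("ADD R4,R2,R5", "R4", ("R2", "R5")),
    ("SW R5,100(R2)", None, ("R5", "R2")),
    ("ADD R3,R2,R4", "R3", ("R2", "R4")),
]


def raw_dependences(program):
    """Return (producer, consumer, register, distance) for every RAW dependence."""
    deps = []
    for j, (_, _, sources) in enumerate(program):
        for reg in sources:
            # The value read comes from the nearest earlier instruction writing reg.
            for i in range(j - 1, -1, -1):
                if program[i][1] == reg:
                    deps.append((i, j, reg, j - i))
                    break
    return deps


deps = raw_dependences(PROGRAM)
for i, j, reg, dist in deps:
    # Assumption: the split-cycle register file covers distances of 3 or more.
    fix = "forwarding" if dist <= 2 else "register file (write, then read, in one cycle)"
    print(f"{PROGRAM[i][0]} -> {PROGRAM[j][0]} on {reg}: distance {dist}, {fix}")
```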
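Answers A, C and D of Exercise 3.3 can be cross-checked from the forbidden latencies alone, since the marks of the reservation table are not available. The sketch below assumes F = {1, 2, 5}, as stated in answer A. It stores the collision vector as an integer bit mask and follows the greedy strategy (always the smallest permissible latency) through the state transitions. It then converts the resulting MAL into throughput with t = 20 ns. Names such as `greedy_cycle` are illustrative.

```python
FORBIDDEN = {1, 2, 5}   # forbidden latencies, taken from answer A
CLOCK_NS = 20           # clock cycle t in ns
M = max(FORBIDDEN)      # latencies above M never collide

# Bit (k - 1) of the collision vector is 1 when latency k is forbidden,
# so printing M bits shows (C5 C4 C3 C2 C1).
C0 = sum(1 << (k - 1) for k in FORBIDDEN)


def permissible(state, k):
    """Latency k is permissible when bit k-1 of the current state is 0."""
    return k > M or not (state >> (k - 1)) & 1


def next_state(state, k):
    """Collision vector after starting a new task k cycles after the previous one."""
    return C0 if k > M else (state >> k) | C0


def greedy_cycle(start):
    """Always take the smallest permissible latency until a state repeats."""
    first_seen, latencies, state = {}, [], start
    while state not in first_seen:
        first_seen[state] = len(latencies)
        k = next(lat for lat in range(1, M + 2) if permissible(state, lat))
        latencies.append(k)
        state = next_state(state, k)
    return latencies[first_seen[state]:]


cycle = greedy_cycle(C0)
mal = sum(cycle) / len(cycle)
throughput = 1 / (mal * CLOCK_NS * 1e-9)  # tasks per second

print(f"C = ({C0:0{M}b}), greedy cycle = {cycle}, MAL = {mal:g}")
print(f"max throughput = {throughput / 1e6:.1f} million tasks/s")
```

With these forbidden latencies, every permissible latency from the initial state (3, 4, or 6 and above) leads back to the same collision vector 10011. The greedy cycle (3) is therefore also the shortest cycle.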