Abstract Interpretation with Applications to Timing Validation
Timing Analysis: Timing Guarantees for Hard Real-Time Systems
Reinhard Wilhelm, Saarland University, Saarbrücken

Structure of the Lecture
1. Introduction
2. Static timing analysis
   1. the problem
   2. our approach
   3. the success
   4. tool architecture
3. Cache analysis
4. Pipeline analysis
5. Value analysis
6. Worst-case path determination
7. Timing predictability
   - caches
   - non-cache-like devices
   - future architectures
8. Conclusion

Industrial Needs
Hard real-time systems, often in safety-critical applications, abound: aeronautics, automotive, train industries, manufacturing control.
- Wing vibration of an airplane: sensing every 5 ms
- Side airbag in a car: reaction within 10 ms
- Crankshaft-synchronous tasks have very tight deadlines: 45 µs

Timing Analysis
Embedded controllers are expected to finish their tasks reliably within time bounds. The problem: given
1. software to produce some reaction,
2. a hardware platform on which to execute the software,
3. a required reaction time,
derive a guarantee for timeliness.

Timing analysis provides parameters for schedulability analysis: the execution time, Ci, of each task, and where exact times are impossible to determine, upper bounds and maybe also lower bounds on execution times, often called worst-case execution times (WCET) and best-case execution times (BCET).

Timing Analysis: the Search Space (architecture with constant execution times)
All control-flow paths (through the binary executable), depending on the possible inputs. Feasible as a search for a longest path if
- iteration and recursion are bounded, and
- the execution times of instructions are (positive) constants.
Elegant method: timing schemata (Shaw 89), an inductive calculation of upper bounds:
  ub(if b then S1 else S2) := ub(b) + max(ub(S1), ub(S2))

High-Performance Microprocessors
They increase (average-case) performance by using caches, pipelines, branch prediction, and speculation. These features make timing analysis difficult, because the execution times of instructions vary widely:
- Best case, everything goes smoothly: no cache miss, operands ready, resources free, branch correctly predicted.
- Worst case, everything goes wrong: all loads miss the cache, resources are occupied, operands are not ready.
The span may be several hundred cycles.

Variability of Execution Times
  x = a + b;   ->   LOAD r2,_a ; LOAD r1,_b ; ADD r3,r2,r1   (PPC 755)
In most cases, execution will be fast. So assuming the worst case is safe, but very pessimistic!

AbsInt's WCET Analyzer aiT
IST Project DAEDALUS final review report: "The AbsInt tool is probably the best of its kind in the world and it is justified to consider this result as a breakthrough." Several time-critical subsystems of the Airbus A380 have been certified using aiT; aiT is the only validated tool for these applications.

Tremendous Progress during the 10 Years from 1998 to 2008
[Chart: measured over-estimation vs. cache-miss penalty; 1995, Lim et al., 20-30%; 2002, Thesing et al., 15%; 2005, Souyris et al., 30-50%; cache-miss penalties growing from about 4 to 200 cycles.]
The explosion of penalties has been compensated by the improvement of the analyses!

State-Dependent Execution Times
Execution times depend on the execution state, and the execution state results from the execution history.
- semantics state: values of variables
- execution state: occupancy of resources

Timing Analysis: the Search Space with State-Dependent Execution Times
All control-flow paths, depending on the possible inputs, and all paths through the architecture for potential initial states. Example: mul rD,rA,rB. The execution states for paths reaching this program point differ in whether the instruction is in the I-cache, whether the bus is occupied, and whether the operands are small or large; the resulting execution times range from a single cycle up to about 40 cycles.

Timing Analysis: the Search Space with Out-of-Order Execution
All control-flow paths, depending on the possible inputs, and all paths through the architecture for potential initial states, including different schedules for instruction sequences.

Timing Analysis: the Search Space with Multi-Threading
All control-flow paths, depending on the possible inputs, and all paths through the architecture for potential initial
states, including different schedules for instruction sequences, including different interleavings of accesses to shared resources.

Why Exhaustive Exploration?
Naive attempt: follow local worst-case transitions only. This is unsound in the presence of timing anomalies: a path starting with a local worst case may have a lower overall execution time. Example: a cache miss preventing a branch mis-prediction. Timing anomalies are caused by the interference between processor components, e.g. a cache hit/miss influences branch prediction; branch prediction causes prefetching; prefetching pollutes the I-cache.

State Space Explosion in Timing Analysis
Over the years, the search space has grown with the architectures:
- ~1995, constant execution times: timing schemata
- ~2000, state-dependent execution times (caches, pipelines, speculation): static analysis, combined cache and pipeline analysis
- ~2010, out-of-order execution (superscalar processors): interleavings of all schedules
- preemptive scheduling, concurrency + shared resources (multi-core): interleavings of several threads

Notions in Timing Analysis
The exact worst-case execution time is hard or impossible to determine; determine upper bounds instead.

High-Level Requirements for Timing Analysis
- Upper bounds must be safe, i.e. not underestimated.
- Upper bounds should be tight, i.e. not far away from real execution times.
- Analogously for lower bounds.
- The analysis effort must be tolerable.
Note: all analyzed programs are terminating and loop bounds need to be known, so there is no decidability problem, but a complexity problem!

Timing Accidents and Penalties
- Timing accident: a cause for an increase of the execution time of an instruction.
- Timing penalty: the associated increase.
Types of timing accidents: cache misses, pipeline stalls, branch mispredictions, bus collisions, memory refresh of DRAM, TLB misses.

Execution Time is History-Sensitive
The contribution of the execution of an instruction to a program's execution time depends on the execution state: e.g. the time for a memory access depends on the cache state, and the execution state depends on the execution history. Needed: an invariant about the set of execution states produced by all executions reaching a program point. We use abstract interpretation to compute these invariants.

Deriving Run-Time Guarantees
Our method and tool, aiT, derives safety properties from these invariants: certain timing accidents will never happen. Example: at program point p, instruction fetch will never cause a cache miss. The more accidents excluded, the lower the upper bound.

Abstract Interpretation in Timing Analysis
Abstract interpretation statically analyzes a program for a given property without executing it; derived properties therefore hold for all executions. It is based on the semantics of the analyzed language. A semantics of a programming language that talks about time needs to incorporate the execution platform! Static timing analysis is thus based on such a semantics.

The Architectural Abstraction inside the Timing Analyzer
The timing analyzer contains architectural abstractions: a cache abstraction and a pipeline abstraction. Value analysis, control-flow analysis, and loop-bound analysis build on abstractions of the processor's arithmetic.

Abstract Interpretation in Timing Analysis Determines
- invariants about the values of variables (in registers, on the stack):
  - to compute loop bounds,
  - to eliminate infeasible paths,
  - to determine effective memory addresses;
- invariants on the architectural execution state:
  - cache contents, to predict hits and misses,
  - pipeline states, to predict or exclude pipeline stalls.

Tool Architecture
The analyses are abstract interpretations, followed by integer linear programming for path analysis.
- Value analysis determines enclosing intervals for the sets of values in registers and local variables, used for determining addresses.
- Loop-bound analysis determines loop bounds.
- Control-flow analysis determines infeasible paths.

The Story in Detail

Value Analysis
Motivation:
- provide access information to the data-cache/pipeline analysis,
- detect infeasible paths,
- derive loop bounds.
Method: calculate intervals at all program points, i.e. lower and upper bounds for the set of possible values occurring in the machine
program (addresses, register contents, local and global variables) (Cousot/Halbwachs 78).

Value Analysis II
Intervals are computed along the CFG edges; at joins, intervals are "unioned". Example: D1: [-2,+2] joined with D1: [-4,0] gives D1: [-4,+2].
Which address is accessed here?
  move.l #4,D0
  add.l  D1,D0
  move.l (A0,D0),D1
With D1: [-4,4] and A0: [0x1000,0x1000]: after the move, D0: [4,4]; after the add, D0: [0,8]; hence the access goes to [0x1000,0x1008].

Value Analysis (Airbus Benchmark)
[Table: run time and memory usage on a 1 GHz Athlon; figures lost in extraction.]

Caches
Cache sets are independent, so everything is explained in terms of one set. LRU replacement strategy: replace the block that has been least recently used. This is modeled by ages. Example for a 4-way set-associative cache (ages 0 to 3, youngest first):
  [m0, m1, m2, m3]  --access m4 (miss)-->  [m4, m0, m1, m2]
  [m4, m0, m1, m2]  --access m1 (hit)--->  [m1, m4, m0, m2]
  [m1, m4, m0, m2]  --access m5 (miss)-->  [m5, m1, m4, m0]

Cache Analysis
How to statically precompute cache contents:
- Must analysis: for each program point (and context), find out which blocks are in the cache: prediction of cache hits.
- May analysis: for each program point (and context), find out which blocks may be in the cache; the complement says what is not in the cache: prediction of cache misses.
In the following, we consider must analysis until otherwise stated.

(Must) Cache Analysis
Consider one instruction, load a, in the program. There may be many paths leading to this instruction. How can we compute whether a will always be in the cache, independently of which path execution takes? Question: is the access to a always a cache hit?

Determine cache information (abstract cache states) at each program point, e.g.
  { }  { x }  { a, b }  { }      (youngest age 0, ..., oldest age 3)
Interpretation of this cache information: it describes the set of all concrete cache states in which x, a, and b occur, x with an age not older than 1, and a and b with an age not older than 2. Cache information contains
1. only memory blocks guaranteed to be in the cache,
2. associated with their maximal age.

Cache Analysis: How Does It Work?
How to compute, for each program point, an abstract cache state representing the set of memory blocks guaranteed to be in the cache each time execution reaches this program point? Can we expect to compute the largest such set? There is a trade-off between precision and efficiency, quite typical for abstract interpretation.

(Must) Cache Analysis of a Memory Access
  { x }  { }  { a, b }  { }   --access to a-->   { a }  { x }  { b }  { }
After the access to a, a is the youngest memory block in the cache, and we must assume that x has aged. What about b? Its maximal age is unchanged, since a was at least as old as b. The concrete transfer function (the cache) is mirrored by an abstract transfer function (the analysis).

Combining Cache Information
Consider two control-flow paths to a program point: for one, the prediction says that the set S1 of memory blocks is in the cache; for the other, the set S2. The cache analysis should not predict more than S1 ∩ S2 after the merge of the paths, and the elements in the intersection should have their maximal age from S1 and S2. This suggests the following method: compute cache information along all paths to a program point and calculate their intersection; but there are too many paths! A more efficient method: combine cache information on the way, and iterate until the least fixpoint is reached. There is a risk of losing precision, but not in the case of distributive transfer functions.

What happens when control paths merge? "Intersection + maximal age":
  { a }  { }  { c, f }  { d }      (we can guarantee this content on one path)
  { c }  { e }  { a }  { d }       (we can guarantee this content on the other path)
  -->  { }  { }  { a, c }  { d }   (the content we can guarantee after the merge)
Combine cache information at each control-flow merge point.

Must-Cache and May-Cache Information
The presented cache analysis is a must analysis: it determines safe information about cache hits, and each predicted cache hit reduces the upper bound. We can also perform a may analysis: it determines safe information about cache misses, and each predicted cache miss increases the lower bound.

(May) Cache Analysis of a Memory Access
After the access to a, a is the youngest memory block in the cache, and we must assume that x, y, and b have aged, i.e. their minimal ages have increased; blocks whose minimal age exceeds the associativity fall out of the may cache.

Cache Analysis: Join (May)
"Union + minimal age":
  { a }  { }  { c, f }  { d }
  { c }  { e }  { a }  { d }
  -->  { a, c }  { e }  { f }  { d }

Abstract Domain: Must Cache (Abstraction)
Sets of concrete caches are represented by their description. For example, the concrete caches
  [z, s, x, a], [x, s, z, t], [z, s, x, t], [s, z, x, t], [z, t, x, s]
are abstracted to the must cache
  { }  { }  { z, x }  { s }

Abstract Domain: Must Cache (Concretization)
The must cache { } { } { z, x } { s } describes the set of concrete caches in which z and x occur with an age of at most 2 and s with an age of at most 3, with the remaining lines filled up with any other blocks: an over-approximation!

Abstract Domain: May Cache (Abstraction)
The same concrete caches are abstracted to the may cache
  { z, s, x }  { t }  { }  { a }

Abstract Domain: May Cache (Concretization)
Abstract may caches say what definitely is not in the cache, and what the minimal age of those blocks is that may be in the cache. The blocks that may occupy line i are those with minimal age at most i:
  line 0: ⊆ {z, s, x};  line 1: ⊆ {z, s, x, t};  line 2: ⊆ {z, s, x, t};  line 3: ⊆ {z, s, x, t, a}

Lessons Learned
Cache analysis, an important ingredient of static timing analysis, provides abstract domains which proved to be sufficiently precise, have a compact representation, and have efficient transfer functions, which are quite natural.

Problem Solved?
We have shown a solution for LRU caches. LRU-cache analysis works smoothly: the domain has a favorable structure, and the essential information can be summarized compactly. LRU is the best strategy under several aspects: performance, predictability, sensitivity. And yet, LRU is not the only strategy:
- Pseudo-LRU (PowerPC 755, Airbus)
- FIFO
Both are worse under almost all aspects, but average-case performance!

Contribution to WCET
Consider a loop, while cond do ... ref to s ... od, executing at most n iterations, where the reference takes t_miss on a cache miss and t_hit on a hit. Depending on what the cache analysis can guarantee, the loop contributes
- n * t_miss                 (no prediction),
- n * t_hit                  (always hit),
- t_miss + (n - 1) * t_hit   (first miss, the rest hits),
- t_hit + (n - 1) * t_miss   (first hit, the rest misses).

Contexts
Cache contents depend on the context, i.e. on calls and loops. For while cond do ... with join (must): the first iteration loads the cache, so the intersection loses most of the information! Therefore, distinguish basic blocks by contexts:
- transform loops into tail-recursive procedures,
- treat loops and procedures in the same way,
- use interprocedural analysis techniques, VIVU: virtual inlining of procedures, virtual unrolling of loops,
- distinguish as many contexts as useful: 1 unrolling for caches, 1 unrolling for branch prediction (pipeline).

Structure of the Lecture (recap): next up, pipeline analysis.

Hardware Features: Pipelines
Ideal case: 1 instruction per cycle. Instruction execution is split into stages (Fetch, Decode, Execute, Write-Back) that overlap across consecutive instructions.

Pipelines
- Instruction execution is split into several stages.
- Several instructions can be executed in parallel.
- Some pipelines can begin more than one instruction per cycle: VLIW, superscalar.
- Some CPUs can execute instructions out of order.
- Practical problems: hazards and cache misses.

Pipeline Hazards
- Data hazards: operands not yet available (data dependences).
- Resource hazards: consecutive instructions use the same resource.
- Control hazards: conditional branches.
- Instruction-cache hazards: an instruction fetch causes a cache miss.

Static Exclusion of Hazards
- Cache analysis: prediction of cache hits on instruction or operand fetch or store, e.g. lwz r4,20(r1): hit.
- Dependence analysis: elimination of data hazards.
- Resource reservation tables: elimination of resource hazards.
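The timing-schemata calculation from the introduction, ub(if b then S1 else S2) := ub(b) + max(ub(S1), ub(S2)), can be sketched as a recursion over a program tree. The tuple encoding of programs below is an illustrative assumption, not a representation used by aiT:

```python
# A sketch of timing schemata (Shaw 89): upper bounds are computed
# inductively over the program structure, assuming constant instruction
# times. The tuple encoding of statements is an illustrative assumption.

def ub(stmt):
    """Upper bound on the execution time of a statement."""
    kind = stmt[0]
    if kind == "basic":                # ("basic", cost): straight-line code
        return stmt[1]
    if kind == "seq":                  # ("seq", s1, s2): sequential composition
        return ub(stmt[1]) + ub(stmt[2])
    if kind == "if":                   # ("if", b, s1, s2): conditional
        return ub(stmt[1]) + max(ub(stmt[2]), ub(stmt[3]))
    if kind == "loop":                 # ("loop", n, cond, body): at most n iterations
        return stmt[1] * (ub(stmt[2]) + ub(stmt[3])) + ub(stmt[2])
    raise ValueError(f"unknown statement kind: {kind}")

# ub(if b then S1 else S2) := ub(b) + max(ub(S1), ub(S2))
assert ub(("if", ("basic", 2), ("basic", 10), ("basic", 4))) == 12
```

This is exactly the method that breaks down once instruction times become state-dependent: the constant costs in the leaves no longer exist.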
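The interval domain used by the value analysis can be sketched as follows: transfer functions evaluate instructions on intervals, and at control-flow joins intervals are "unioned" into the smallest enclosing interval, as in the D0/D1 example. The tuple encoding of intervals is an illustrative assumption:

```python
# A sketch of interval-based value analysis: each register is mapped to
# an interval (lo, hi) enclosing its possible values. The encoding is an
# illustrative assumption, not aiT's internal representation.

def add(iv1, iv2):
    """Transfer function for addition on intervals."""
    (l1, h1), (l2, h2) = iv1, iv2
    return (l1 + l2, h1 + h2)

def join(iv1, iv2):
    """At control-flow joins, intervals are "unioned" into the
    smallest interval enclosing both operands."""
    (l1, h1), (l2, h2) = iv1, iv2
    return (min(l1, l2), max(h1, h2))

# The D1 join from the slides: [-2,+2] and [-4,0] give [-4,+2].
assert join((-2, 2), (-4, 0)) == (-4, 2)

# The address example: with D1 = [-4,4], "move.l #4,D0 ; add.l D1,D0"
# yields D0 = [0,8], so the access goes to 0x1000 + [0,8].
d0 = add((4, 4), (-4, 4))
assert d0 == (0, 8)
```

The join is deliberately lossy (an over-approximation): precision is traded for a finite, efficiently computable description, the same trade-off noted for the cache domains.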
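The must-cache domain for one set of a 4-way LRU cache (blocks mapped to maximal ages, aging on access, "intersection + maximal age" at merges) can be sketched as follows. The dictionary encoding and function names are illustrative assumptions:

```python
# A sketch of the must-cache abstract domain for one set of a 4-way LRU
# cache: an abstract state maps memory blocks to an upper bound on their
# age (0 = youngest line). Encoding and names are illustrative assumptions.

A = 4  # associativity

def access(state, m):
    """Abstract transfer function for an access to block m."""
    old = state.get(m, A)          # age A means "not guaranteed in cache"
    new = {m: 0}                   # m becomes the youngest block
    for b, age in state.items():
        if b == m:
            continue
        aged = age + 1 if age < old else age   # only younger blocks age
        if aged < A:               # blocks aging beyond line A-1 are evicted
            new[b] = aged
    return new

def join(s1, s2):
    """Join at control-flow merges: intersection + maximal age."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

# The merge example from the slides:
# [{a}, {}, {c,f}, {d}] and [{c}, {e}, {a}, {d}] give [{}, {}, {a,c}, {d}].
s1 = {"a": 0, "c": 2, "f": 2, "d": 3}
s2 = {"c": 0, "e": 1, "a": 2, "d": 3}
assert join(s1, s2) == {"a": 2, "c": 2, "d": 3}
```

The may-cache domain would be the dual sketch: minimal ages, with "union + minimal age" at merges.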