哈希算法(精品).ppt
Hash Tables 1Comp 122,Spring 2004hashtables-2Lin/DeviDictionary wDictionary:Dynamic-set data structure for storing items indexed using keys.Supports operations Insert,Search,and Delete.Applications:Symbol table of a compiler.Memory-management tables in operating systems.Large-scale distributed systems.wHash Tables:Effective way of implementing dictionaries.Generalization of ordinary arrays.Comp 122,Fall 2003hashtables-3Lin/DeviDirect-address Tables wDirect-address Tables are ordinary arrays.wFacilitate direct addressing.Element whose key is k is obtained by indexing into the kth position of the array.wApplicable when we can afford to allocate an array with one position for every possible key.i.e.when the universe of keys U is small.wDictionary operations can be implemented to take O(1)time.Details in Sec.11.1.Comp 122,Fall 2003hashtables-4Lin/DeviHash TableswNotation:U Universe of all possible keys.K Set of keys actually stored in the dictionary.|K|=n.wWhen U is very large,Arrays are not practical.|K|U|.wUse a table of size proportional to|K|The hash tables.However,we lose the direct-addressing ability.Define functions that map keys to slots of the hash table.Comp 122,Fall 2003hashtables-5Lin/DeviHashingwHash function h:Mapping from U to the slots of a hash table T0.m1.h:U 0,1,m1wWith arrays,key k maps to slot Ak.wWith hash tables,key k maps or“hashes”to slot Thk.whk is the hash value of key k.Comp 122,Fall 2003hashtables-6Lin/DeviHashing0m1h(k1)h(k4)h(k2)=h(k5)h(k3)U(universe of keys)K(actualkeys)k1k2k3k5k4collisionComp 122,Fall 2003hashtables-7Lin/DeviIssues with HashingwMultiple keys can hash to the same slot collisions are possible.Design hash functions such that collisions are minimized.But avoiding collisions is impossible.Design collision-resolution techniques.wSearch will cost(n)time in the worst case.However,all operations can be made to have an expected complexity of(1).Comp 122,Fall 2003hashtables-8Lin/DeviMethods of ResolutionwChaining:(拉链法)Store all elements that hash to the same slot in a linked list.Store a pointer to the head of the linked list in the hash table slot.wOpen Addressing(线性探测法)All elements stored in hash table itself.When collisions occur,use a systematic(consistent)procedure to store elements in free slots of the table.k20m1k1k4k5k6k7k3k8Comp 122,Fall 2003hashtables-9Lin/DeviCollision Resolution by Chaining0m1h(k1)=h(k4)h(k2)=h(k5)=h(k6)h(k3)=h(k7)U(universe of keys)K(actualkeys)k1k2k3k5k4k6k7k8h(k8)XXXComp 122,Fall 2003hashtables-10Lin/Devik2Collision Resolution by Chaining0m1U(universe of keys)K(actualkeys)k1k2k3k5k4k6k7k8k1k4k5k6k7k3k8Comp 122,Fall 2003hashtables-11Lin/DeviHashing with ChainingDictionary Operations:wChained-Hash-Insert(T,x)Insert x at the head of list Th(keyx).Worst-case complexity O(1).wChained-Hash-Delete(T,x)Delete x from the list Th(keyx).Worst-case complexity proportional to length of list with singly-linked lists.O(1)with doubly-linked lists.wChained-Hash-Search(T,k)Search an element with key k in list Th(k).Worst-case complexity proportional to length of list.Comp 122,Fall 2003hashtables-12Lin/DeviAnalysis on Chained-Hash-SearchwLoad factor=n/m=average keys per slot.m number of slots.n number of elements stored in the hash table.wWorst-case complexity:(n)+time to compute h(k).wAverage depends on how h distributes keys among m slots.wAssume Simple uniform hashing.Any key is equally likely to hash into any of the m slots,independent of where any other key hashes to.O(1)time to compute h(k).wTime to search for an element with key k is Q(|Th(k)|).wExpected length of a linked list=load factor=n/m.Comp 122,Fall 2003hashtables-13Lin/DeviExpected Cost of an Unsuccessful SearchProof:wAny key not already in the table is equally likely to hash to any of the m slots.wTo search unsuccessfully for any key k,need to search to the end of the list Th(k),whose expected length is.wAdding the time to compute the hash function,the total time required is(1+).Theorem:An unsuccessful search takes expected time(1+).Comp 122,Fall 2003hashtables-14Lin/DeviExpected Cost of a Successful SearchProof:wThe probability that a list is searched is proportional to the number of elements it contains.wAssume that the element being searched for is equally likely to be any of the n elements in the table.wThe number of elements examined during a successful search for an element x is 1 more than the number of elements that appear before x in xs list.These are the elements inserted after x was inserted.wGoal:Find the average,over the n elements x in the table,of how many elements were inserted into xs list after x was inserted.Theorem:A successful search takes expected time(1+).Comp 122,Fall 2003hashtables-15Lin/DeviExpected Cost of a Successful SearchProof(contd):wLet xi be the ith element inserted into the table,and let ki=keyxi.wDefine indicator random variables Xij=Ih(ki)=h(kj),for all i,j.wSimple uniform hashing Prh(ki)=h(kj)=1/m EXij=1/m.wExpected number of elements examined in a successful search is:Theorem:A successful search takes expected time(1+).No.of elements inserted after xi into the same slot as xi.Comp 122,Fall 2003hashtables-16Lin/DeviProof Contd.(linearity of expectation)Expected total time for a successful search=Time to compute hash function+Time to search=O(2+/2 /2n)=O(1+).Comp 122,Fall 2003hashtables-17Lin/DeviExpected Cost InterpretationwIf n=O(m),then=n/m=O(m)/m=O(1).Searching takes constant time on average.wInsertion is O(1)in the worst case.wDeletion takes O(1)worst-case time when lists are doubly linked.wHence,all dictionary operations take O(1)time on average with hash tables with chaining.Comp 122,Fall 2003hashtables-18Lin/DeviGood Hash FunctionswSatisfy the assumption of simple uniform hashing.Not possible to satisfy the assumption in practice.wOften use heuristics,based on the domain of the keys,to create a hash function that performs well.wRegularity in key distribution should not affect uniformity.Hash value should be independent of any patterns that might exist in the data.E.g.Each key is drawn independently from U according to a probability distribution P:k:h(k)=j P(k)=1/m for j=0,1,m1.An example is the division method.Comp 122,Fall 2003hashtables-19Lin/DeviKeys as Natural NumberswHash functions assume that the keys are natural numbers.wWhen they are not,have to interpret them as natural numbers.wExample:Interpret a character string as an integer expressed in some radix notation.Suppose the string is CLRS:ASCII values:C=67,L=76,R=82,S=83.There are 128 basic ASCII values.So,CLRS=671283+76 1282+821281+831280 =141,764,947.Comp 122,Fall 2003hashtables-20Lin/DeviDivision MethodwMap a key k into one of the m slots by taking the remainder of k divided by m.That is,h(k)=k mod mwExample:m=31 and k=78 h(k)=16.wAdvantage:Fast,since requires just one division operation.wDisadvantage:Have to avoid certain values of m.Dont pick certain values,such as m=2pOr hash wont depend on all bits of k.wGood choice for m:Primes,not too close to power of 2(or 10)are good.Comp 122,Fall 2003hashtables-21Lin/DeviMultiplication MethodwIf 0 A 1,h(k)=m(kA mod 1)=m(kA kA)where kA mod 1 means the fractional part of kA,i.e.,kA kA.wDisadvantage:Slower than the division method.wAdvantage:Value of m is not critical.Typically chosen as a power of 2,i.e.,m=2p,which makes implementation easy.wExample:m=1000,k=123,A 0.6180339887h(k)=1000(123 0.6180339887 mod 1)=1000 0.018169.=18.Comp 122,Fall 2003hashtables-22Lin/DeviMultiplication Mthd.ImplementationwChoose m=2p,for some integer p.wLet the word size of the machine be w bits.wAssume that k fits into a single word.(k takes w bits.)wLet 0 s 2w.(s takes w bits.)wRestrict A to be of the form s/2w.wLet k s=r1 2w+r0.wr1 holds the integer part of kA(kA)and r0 holds the fractional part of kA(kA mod 1=kA kA).wWe dont care about the integer part of kA.So,just use r0,and forget about r1.Comp 122,Fall 2003hashtables-23Lin/DeviMultiplication Mthd Implementationks=A2wr0r1w bitsh(k)extract p bitswWe want m(kA mod 1).We could get that by shifting r0 to the left by p=lg m bits and then taking the p bits that were shifted to the left of the binary point.wBut,we dont need to shift.Just take the p most significant bits of r0.binary pointComp 122,Fall 2003hashtables-24Lin/DeviHow to choose A?wAnother example:On board.wHow to choose A?The multiplication method works with any legal value of A.But it works better with some values than with others,depending on the keys being hashed.Knuth suggests using A (5 1)/2.Comp 122,Fall 2003