程序设计与算法基础课件.ppt
《程序设计与算法基础课件.ppt》由会员分享,可在线阅读,更多相关《程序设计与算法基础课件.ppt(86页珍藏版)》请在淘文阁 - 分享文档赚钱的网站上搜索。
1、程序设计与算法基础程序设计与算法基础(6)潘爱民2006/10/30OutlinelHash tableslBloom filterlInverted indexSearching problem againlFor a linked list,-O(n)lFor a sorted array,-O(logn)lCan we expect O(1)?which meanslRegardless of the number of elements being searched,the run time is always the samelGiven a key,the position in
2、the table can be accessed directlyOperations on hash tableslCollection of pairsl(key,element),here key maybe a string,number,record,etc.lPairs have different keyslOperationslGet(theKey)lDelete(theKey)lInsert(theKey,theElement)Ideal HashinglUses a 1D array(or table)table0m-1.lEach position of this ar
3、ray is a bucketlA bucket can normally hold only one pairlUses a hash function h that converts each key k into an index in the range 0,m-1lh(k)is the home bucket for key klEvery pair(key,element)is stored in its home bucket tablehkey01234567Ideal Hashing ExamplelPairs are:(22,a),(33,c),(3,d),(73,e),(
4、85,f)lHash table is table07,m=8lHash function is h(key)=key/11 lPairs are stored in table as below(85,f)(22,a)(33,c)(3,d)(73,e)Get,Insert,and Delete take O(1)timeWhat Can Go Wrong?lWhere does(26,g)go?lKeys that have the same home bucket are synonymsl22 and 26 are synonyms with respect to the hash fu
5、nction that is in use.lThe home bucket for(26,g)is already occupied.01234567(85,f)(22,a)(33,c)(3,d)(73,e)What Can Go Wrong?lA collision occurs when the home bucket for a new pair is occupied by a pair with a different keylAn overflow occurs when there is no space in the home bucket for the new pairl
6、When a bucket can hold only one pair,collisions and overflows occur togetherlNeed a method to handle overflows(85,f)(22,a)(33,c)(3,d)(73,e)Hash Table IssueslChoice of hash functionslCollision resolutionlSize of hash tablelThe size of bucket&number of bucketsHash functionslDefinition:lA hash function
7、 transforms a key K into an index in the table used for storing items of the same type as KlIf hash function h transforms different keys into different numbers,it is called a perfect hash function lTherefore,a hash function is a mapping from n items to m positionslTotal number of possible mappings i
8、s mnlIf n m,the number of perfect hash functions is m!/(m-n)!Hash FunctionslTwo partslConvert a key into an integer in case the key is not an integerlh(k)lMap an integer into a home bucketlf(k)is an integer in the range 0,m-1,where m is the number of buckets in the tableString To Non-negative Intege
9、rlEach character is 1 byte longlAn int is 4 byteslA two-character string s may be converted into a unique 4 byte non-negative int using the code:lint answer=s.at(0);lanswer=(answer 8)+s.at(1);lStrings that are longer than 3 characters do not have a unique non-negative int representationString To Non
10、negative Integerunsigned long operator()(const string theKey)/Convert theKey to a nonnegative integer.unsigned long hashValue=0;int length=(int)theKey.length();for(int i=0;i length;i+)hashValue=5*hashValue +theKey.at(i);return hashValue;Map into a home bucketlMost common method is by division homeBu
11、cket=h(theKey)%divisor;ldivisor equals the number of buckets ml0 homeBucket divisor=m01234567(85,f)(22,a)(33,c)(3,d)(73,e)Uniform Hash FunctionLet keySpace be the set of all possible keysA uniform hash function maps the keys in keySpace into buckets such that approximately the same number of keys ge
12、t mapped into each bucket01234567(85,f)(22,a)(33,c)(3,d)(73,e)Uniform Hash Function Equivalently,the probability that a randomly selected key has bucket i as its home bucket is 1/m,0 i m A uniform hash function minimizes the likelihood of an overflow when keys are selected at random01234567(85,f)(22
13、,a)(33,c)(3,d)(73,e)Hashing By DivisionlkeySpace=all intslFor every m,the number of ints that get mapped(hashed)into bucket i is approximately 232/mlTherefore,the division method results in a uniform hash function when keySpace=all intslIn practice,keys tend to be correlatedlSo,the choice of the div
14、isor m affects the distribution of home bucketsSelecting The DivisorlBecause of this correlation,applications tend to have a bias towards keys that map into odd integers(or into even ones)lWhen the divisor is an even number,odd integers hash into odd home buckets and even integers into even home buc
15、ketsl20%14=6,30%14=2,8%14=8l15%14=1,3%14=3,23%14=9lThe bias in the keys results in a bias toward either the odd or even home bucketsSelecting The DivisorlWhen the divisor is an odd number,odd(even)integers may hash into any homel20%15=5,30%15=0,8%15=8l15%15=0,3%15=3,23%15=8lThe bias in the keys does
16、 not result in a bias toward either the odd or even home bucketslBetter chance of uniformly distributed home bucketslSo do not use an even divisorSelecting The DivisorlSimilar biased distribution of home buckets is seen,in practice,when the divisor is a multiple of prime numbers such as 3,5,7,lThe e
17、ffect of each prime divisor p of m decreases as p gets largerlIdeally,choose m so that it is a prime numberlAlternatively,choose m so that it has no prime factor smaller than 20Hashing by foldinglThe key is divided into several parts,and these parts are combined or folded togetherlShift foldinglUsin
18、g a simple operation such as addition to combine them in a certain waylFor example,the social security number(SSN),123-45-6789(123+456+789)%m(m=1000)lBoundary foldinglThe pieces of the keys are folded on the borders between different parts,for example(123+654+789)%m(m=1000)About foldinglSimple and f
19、ast,bit pattern can be used instead of numerical valueslIn the case of stringslXoring the characters together,and using the result for the addresslXoring the chunks of strings rather than single characters.The length of a chunk is equal to the number of bytes in an integerlTypically,the result of fo
20、lding processing are divided modulo mHashing by a Mid-square functionlThe key is squared and the middle or mid part of the result is used as the hash valuelEntire key participates in generating the hash value,so a better chance that the different values are generated for different keyslIn practice,i
21、t is more efficient to choose a power of 2 for the size of the table,m,and extract the mid part of the bit representation of the square of a key,just using a mask and a shift operationlFor example,31212=100101001010000101100001,the mid part 101000010=322Hashing by extractionlOnly a part of the key i
22、s used to compute the addresslIf the part is distributed uniformly,it can be sufficient for hashing(provided the omitted portion distinguishes the keys only in an insignificant way)lFor example,in the student ID,some digits are safely omitted in a hash functionlISBN code is similarHashing by radix t
23、ransformationlThe key is transformed into another number baselFor example,K is the decimal number 345,then its value in base 9 is 423(9)lThen it is divided modulo m in original base,the resulting number is used as the hash valuelIf m=100,345(10)and 245(10)are not hashed to the same value,but 345(10)
24、and 264(10)will be hashed to a same value,23Hashing by cryptographic hash algorithmslStrong hash algorithmslA change of any bit in the input will cause the change of any bit in the output with 0.5 probability approximatelylThe encryption algorithm has similar featurelUsing cryptographic algorithms t
25、o hash a keylFor example,compute the MD5(key)or SHA1(key)or encrypt the key with a fixed encryption key,DESk(key)lThen divided modulo mCollision resolutionlAvoid collisionlIncreasing the table size may lead to better hashing,but sometimes notlThe two factors hash function and table size may minimize
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- 程序设计 算法 基础 课件
限制150内