WJ-CH10 - Pattern Recognition - Clustering Algorithms - 05 - English Version - Lecture Slides.ppt
Pattern Recognition
Chapter 10 (V): OTHER CLUSTERING ALGORITHMS
4/10/2023

OTHER CLUSTERING ALGORITHMS
- The following types of algorithms will be considered:
  - Graph theory based clustering algorithms.
  - Competitive learning algorithms.
  - Valley seeking clustering algorithms.
  - Cost optimization clustering algorithms based on:
    - Branch and bound approach.
    - Simulated annealing methodology.
    - Deterministic annealing.
    - Genetic algorithms.
  - Density-based clustering algorithms.
  - Clustering algorithms for high dimensional data sets.

GRAPH THEORY BASED CLUSTERING ALGORITHMS
- In principle, such algorithms are capable of detecting clusters of various shapes, at least when the clusters are well separated.
- In the sequel we discuss algorithms that are based on:
  - The Minimum Spanning Tree (MST).
  - Regions of influence.
  - Directed trees.

Minimum Spanning Tree (MST) algorithms
Preliminaries:
- Let $G$ be the complete graph, each node of which corresponds to a point of the data set $X$.
- $e = (x_i, x_j)$ denotes an edge of $G$ connecting $x_i$ and $x_j$.
- $w_e \equiv d(x_i, x_j)$ denotes the weight of the edge $e$.
Definitions:
- Two edges $e_1$ and $e_2$ are $k$ steps away from each other if the minimum path that connects a vertex of $e_1$ and a vertex of $e_2$ contains $k-1$ edges.
- A Spanning Tree of $G$ is a connected graph that:
  - contains all the vertices of the graph;
  - has no loops.
- The weight of a Spanning Tree is the sum of the weights of its edges.
- A Minimum Spanning Tree (MST) of $G$ is a spanning tree with minimum weight (when all $w_e$'s are different from each other, the MST is unique).

Minimum Spanning Tree (MST) algorithms (cont)
Example 1: For the MST in the figure and for $k = 2$ and $q = 3$ we have:
- For $e_0$: $w_{e_0} = 17$, $m_{e_0} = 2.3$, $\sigma_{e_0} = 0.95$. Since $(17 - 2.3)/0.95 \approx 15.5$, $w_{e_0}$ lies $15.5\sigma_{e_0}$ away from $m_{e_0}$ ($15.5 > q = 3$); hence $e_0$ is inconsistent.
- For $e_{11}$: $w_{e_{11}} = 3$, $m_{e_{11}} = 2.5$, $\sigma_{e_{11}} = 2.12$. Since $(3 - 2.5)/2.12 \approx 0.24$, $w_{e_{11}}$ lies $0.24\sigma_{e_{11}}$ away from $m_{e_{11}}$ ($0.24 < q = 3$); hence $e_{11}$ is consistent.

Minimum Spanning Tree (MST) algorithms (cont)
Remarks:
- The algorithm depends on the choices of $k$ and $q$.
- The algorithm is insensitive to the order in which the data points
are considered.
- No initial conditions are required, and no convergence aspects are involved.
- The algorithm works well in many cases where the clusters are well separated.
- A problem may occur when a "large" edge $e$ has another "large" edge as its neighbor. In this case, $e$ is likely not to be characterized as inconsistent, and the algorithm may fail to unravel the underlying clustering structure correctly.

Algorithms based on Regions of Influence (ROI)
Definition: The region of influence of two distinct vectors $x_i, x_j \in X$ is defined as
$R(x_i, x_j) = \{x : cond(d(x, x_i), d(x, x_j), d(x_i, x_j)),\ x_i \neq x_j\}$
where $cond(d(x, x_i), d(x, x_j), d(x_i, x_j))$ may be defined as:
a) $\max\{d(x, x_i), d(x, x_j)\} < d(x_i, x_j)$,
b) $d^2(x, x_i) + d^2(x, x_j) < d^2(x_i, x_j)$,
c) $(d^2(x, x_i) + d^2(x, x_j) < d^2(x_i, x_j))$ OR $(\sigma \min\{d(x, x_i), d(x, x_j)\} < d(x_i, x_j))$,
d) $(\max\{d(x, x_i), d(x, x_j)\} < d(x_i, x_j))$ OR $(\sigma \min\{d(x, x_i), d(x, x_j)\} < d(x_i, x_j))$,
where $\sigma$ affects the size of the ROI defined by $x_i, x_j$ and is called the relative edge consistency.

Algorithms based on Regions of Influence (cont)
Remarks:
- The algorithm is insensitive to the order in which the pairs are considered.
- For the choices of cond in (c) and (d), $\sigma$ must be chosen a priori.
- For the resulting graphs:
  - if choice (a) is used for cond, they are called relative neighborhood graphs (RNGs);
  - if choice (b) is used for cond, they are called Gabriel graphs (GGs).
- Several results show that better clusterings are produced when conditions (c) and (d) are used for cond, instead of (a) and (b).

Algorithms based on Directed Trees
Definitions:
- A directed graph is a graph whose edges are directed.
- A set of edges $e_{i_1}, \ldots, e_{i_q}$ constitutes a directed path from a vertex $A$ to a vertex $B$ if:
  - $A$ is the initial vertex of $e_{i_1}$;
  - $B$ is the final vertex of $e_{i_q}$;
  - the destination vertex of the edge $e_{i_j}$, $j = 1, \ldots, q-1$, is the departure vertex of the edge $e_{i_{j+1}}$.
  (In figure (a), the sequence $e_1, e_2, e_3$ constitutes a directed path connecting the vertices $A$ and $B$.)

Algorithms based on Directed Trees (cont)
Clustering Algorithm based on Directed Trees
- Set $\theta$ to a specific value.
- Determine $n_i$, $i = 1, \ldots, N$.
- Compute $g_{ij}$, $i, j = 1, \ldots, N$, $i \neq j$.
- For $i = 1$ to $N$
  - If $n_i = 0$ then
    - $x_i$
is the root of a new directed tree.
  - Else
    - Determine $x_r$ such that $g_{ir} = \max_{x_j \in \rho_i(\theta)} g_{ij}$.
    - If $g_{ir} > 0$ then
      - $x_r$ is the parent of $x_i$ (there exists a directed edge from $x_i$ to $x_r$).

Algorithms based on Directed Trees (cont)
Remarks:
- The root $x_i$ of a directed tree is the point in $\rho_i(\theta)$ with the most dense neighborhood.
- The branch that handles the case $g_{ir} = 0$ ensures that no circles occur.
- The algorithm is sensitive to the order of consideration of the data points.
- For a proper choice of $\theta$ and large $N$, this scheme behaves as a mode-seeking algorithm (see a later section).
Example 2: In the figure below, the size of the edge of the grid is 1 and $\theta = 1.1$. The above algorithm gives the directed trees shown in the figure.

COMPETITIVE LEARNING ALGORITHMS
- The main idea
  - Employ a set of representatives $w_j$ (in the sequel we consider only point representatives).
  - Move them to regions of the vector space that are "dense" in vectors of $X$.
- Comments
  - In general, representatives are updated each time a new vector $x \in X$ is presented to the algorithm (pattern mode algorithms).
  - These algorithms do not necessarily stem from the optimization of a cost function.
- The strategy
  - For a given vector $x$:
    - All representatives compete with each other.
    - The winner (the representative that lies closest to $x$) moves towards $x$.
    - The losers (the rest of the representatives) either remain unchanged or move towards $x$, but at a much slower rate.

Remarks:
- $h(x, w_i)$ is an appropriately defined function (see below).
- $\eta$ and $\eta'$ are the learning rates controlling the updating of the winner and the losers, respectively ($\eta'$ may differ from loser to loser).
- A threshold of similarity $\Theta$ (carefully chosen) controls the similarity between $x$ and a representative $w_j$.
  - If $d(x, w_j) > \Theta$, for some distance measure, $x$ and $w_j$ are considered dissimilar.
- The termination criterion is $\|W(t) - W(t-1)\|$ [...] $(t > t_{max})$ (max allowable no. of iterations).
- Assign each $x \in X$ to the cluster whose representative $w_j$ lies closest to $x$.
(*) $d(\cdot)$ may be the Euclidean distance or other distances (e.g., Itakura-Saito distortion). In
addition, similarity measures may be used (in this case min is replaced by max).
(*) $\eta$ is the learning rate and takes values in $[0, 1]$.

Basic Competitive Learning Algorithm (cont)
Remarks:
- In this scheme, losers remain unchanged.
- The winner, after the updating, lies on the line segment formed by $w_j(t-1)$ and $x$.
- A priori knowledge of the number of clusters is required.
- If a representative is initialized far away from the regions where the points of $X$ lie, it will never win.
  - Possible solution: initialize all representatives using vectors of $X$.
- Versions of the algorithm with a variable learning rate have also been studied.

Leaky Learning Algorithm
- The same as the Basic Competitive Learning Algorithm except for part (D), the updating equation of the representatives, which becomes
  $w_j(t) = w_j(t-1) + \eta_w (x - w_j(t-1))$, if $w_j$ wins on $x$,
  $w_j(t) = w_j(t-1) + \eta_l (x - w_j(t-1))$, if $w_j$ loses on $x$,
  where $\eta_w$ and $\eta_l$ are the learning rates in $[0, 1]$ and $\eta_w \gg \eta_l$.
Remarks:
- All representatives move towards $x$, but the losers move at a much slower rate than the winner does.
- The algorithm does not suffer from the problem of poor initialization of the representatives (why?).
- An algorithm in the same spirit is the "neural-gas" algorithm, where $\eta_l$ varies from loser to loser and decays as the corresponding representatives lie farther away from $x$. This algorithm results from the optimization of a cost function.

Conscientious Competitive Learning Algorithms
- Main idea: Discourage a representative from winning if it has won many times in the past. Do this by assigning a "conscience" to each representative.
- A simple implementation:
  - Equip each representative $w_j$, $j = 1, \ldots, m$, with a counter $f_j$ that counts the times that $w_j$ wins.
  - At part (A) (initialization stage) of GCLS, set $f_j = 1$, $j = 1, \ldots, m$.
  - Define the distance $d^*(x, w_j)$ as $d^*(x, w_j) = d(x, w_j) f_j$ (the distance is penalized to discourage representatives that have won many times).
  - Part (B) becomes:
    - The representative $w_j$ is the winner on $x$ if $d^*(x, w_j) = \min_{k=1,\ldots,m} d^*(x, w_k)$.
    - $f_j(t) = f_j(t-1) + 1$.
  - Parts (C) and (D) are the same as in the Basic Competitive Learning Algorithm.
  - Also, $m = m_{init} = m_{max}$.

Self Organizing Maps (cont.)
- If $w_j$ wins on the current input $x$, all the
representatives in $Q_j(t)$ are updated (Self Organizing Map (SOM) scheme).
- SOM (in its simplest version) may be viewed as a special case of GCLS if:
  - Parts (A), (B) and (C) are defined as in the basic competitive learning scheme.
  - In part (D), if $w_j$ wins on $x$, the updating equation becomes
    $w_k(t) = w_k(t-1) + \eta(t)(x - w_k(t-1))$, if $w_k \in Q_j(t)$,
    $w_k(t) = w_k(t-1)$, otherwise,
    where $\eta(t)$ is a variable learning rate satisfying certain conditions.
- After convergence, neighboring representatives also lie "close" in terms of their distance in the vector space (topographical ordering) (see fig. (d)).

Supervised Learning Vector Quantization (VQ)
- In this case, each cluster is treated as a class ($m$ compact classes are assumed) and the available vectors have known class labels.
- The goal: use a set of $m$ representatives and place them in such a way that each class is "optimally" represented.
- The simplest version of VQ (LVQ1) may be obtained from GCLS as follows:
  - Parts (A), (B) and (C) are the same as in the basic competitive learning scheme.
  - In part (D), the updating of the $w_j$'s is carried out as follows:
    $w_j(t) = w_j(t-1) + \eta(t)(x - w_j(t-1))$, if $w_j$ wins on $x$ and $x$ belongs to the $j$-th class,
    $w_j(t) = w_j(t-1) - \eta(t)(x - w_j(t-1))$, if $w_j$ wins on $x$ and $x$ does not belong to the $j$-th class,
    $w_j(t) = w_j(t-1)$, otherwise.

Supervised Learning Vector Quantization (cont.)
- In words, $w_j$ is moved:
  - toward $x$ if $w_j$ wins and $x$ belongs to the $j$-th class;
  - away from $x$ if $w_j$ wins and $x$ does not belong to the $j$-th class.
- All other representatives remain unaltered.

Valley-Seeking Clustering Algorithms (cont.)
Valley-Seeking algorithm
- Fix $a$.
- Fix the number of clusters $m$.
- Define an initial clustering of $X$.
- Repeat
  - For $i = 1$ to $N$
    - Find $j$: $k_{ji} = \max_{q=1,\ldots,m} k_{qi}$
    - Set $c_i = j$
  - End For
  - For $i = 1$ to $N$
    - Assign $x_i$ to cluster $C_{c_i}$.
  - End For
- Until no reclustering of vectors occurs.

Valley-Seeking Clustering Algorithms (cont.)
- The algorithm:
  - Moves a window $d(x, y) \le a$ at $x$ and counts the points from different clusters in it.
  - Assigns $x$ to the cluster with the larger number of points in the window (the cluster that corresponds to the highest local pdf).
- In other words, the boundary is moved away from the "winning" cluster (close similarity with Parzen windows).
- Remarks:
  - The algorithm is sensitive to $a$. It is suggested to perform several runs, for different values of $a$.
  - The algorithm is of a mode-seeking nature (if more than
enough clusters are initially appointed, some of them will be empty).

CLUSTERING VIA COST OPTIMIZATION (REV.)
Branch and Bound Clustering Algorithms
- They compute the globally optimal solution to combinatorial problems.
- They avoid exhaustive search via the employment of a monotonic criterion $J$.
- Monotonic criterion $J$: if $k$ vectors of $X$ have been assigned to clusters, the assignment of an extra vector to a cluster does not decrease the value of $J$.
- Consider the following 3-vector, 2-class case:
  - 121: the 1st and 3rd vectors belong to class 1, and the 2nd vector belongs to class 2 (leaf of the tree).
  - 12x: the 1st vector belongs to class 1, the 2nd vector belongs to class 2, and the 3rd vector is unassigned (partial clustering; node of the tree).

Branch and Bound Clustering Algorithms
- How exhaustive search is avoided:
  - Let $B$ be the best value for criterion $J$ computed so far.
  - If, at a node of the tree, the corresponding value of $J$ is greater than $B$, no further search is performed for any of the descendants springing from this node.
- Let $C_r = [c_1, \ldots, c_r]$, $1 \le r \le N$, denote a partial clustering where $c_i \in \{1, 2, \ldots, m\}$, $c_i = j$ if the vector $x_i$ belongs to cluster $C_j$, and $x_{r+1}, \ldots, x_N$ are yet unassigned.
- For compact clusters and a fixed number of clusters $m$, a suitable cost function is
  $J(C_r) = \sum_{i=1}^{r} \|x_i - m_{c_i}(C_r)\|^2$,
  where $m_{c_i}$ is the mean vector of the cluster $C_{c_i}$,
  $m_j(C_r) = \frac{1}{n_j(C_r)} \sum_{q=1,\ldots,r:\ c_q = j} x_q$, $j = 1, \ldots, m$,
  with $n_j(C_r)$ being the number of vectors $x \in \{x_1, \ldots, x_r\}$ that belong to cluster $C_j$.

Branch and Bound Clustering Algorithms (cont.)
Initialization
- Start from the initial node and go down to a leaf. Let $B$ be the cost of the corresponding clustering (initially set $B = +\infty$).
Main stage
- Start from the initial node of the tree and go down until either
  - (i) a leaf is encountered:
    - If the cost $B'$ of the corresponding clustering $C'$ is smaller than $B$ then
      - $B = B'$
      - $C'$ is the best clustering found so far
    - End if
  - or (ii) a node $q$ with value of $J$ greater than $B$ is encountered. Then
    - No subsequent clustering branching from $q$ is considered.
    - Backtrack to the parent of $q$, $q_{par}$, in order to span a different path.
    - If all paths branching from $q_{par}$ have been
considered then
      - Move to the grandparent of $q$.
    - End if
  - End if
- Terminate when all possible paths have been considered explicitly or implicitly.

Branch and Bound Clustering Algorithms (cont.)
Remarks:
- Variations of the above algorithm, where much tighter bounds of $B$ are used (that is, many more clusterings are rejected without explicit consideration), have also been proposed.
- A disadvantage of the algorithm is the excessive (and unpredictable) amount of required computational time.

Simulated Annealing
- It guarantees (under certain conditions), in probability, the computation of the globally optimal solution of the problem at hand via the minimization of a cost function $J$.
- It may escape from local minima since it allows moves that may temporarily increase the value of $J$.
Definitions
- An important parameter of the algorithm is the "temperature" $T$, which starts at a high value and is reduced gradually.
- A sweep is the time the algorithm spends at a given temperature so that the system can enter "thermal equilibrium".
Notation
- $T_{max}$ is the initial value of the temperature $T$.
- $C_{init}$ is the initial clustering.
- $C$ is the current clustering.
- $t$ is the current sweep.

The algorithm:
- Set $T = T_{max}$ and $C = C_{init}$.
- $t = 0$
- Repeat
  - $t = t + 1$
  - Repeat
    - Compute $J(C)$.
    - Produce a new clustering, $C'$, by assigning a randomly chosen vector from $X$ to a different cluster.
    - Compute $J(C')$.
    - If $\Delta J = J(C') - J(C)$ [...]
[...] 1 points of $X$).
- For each $c \in D_{sp}$, define a connection with all neighboring cubes $c_j$ in $D_p$ for which $d(m_c, m_{c_j})$ is no greater than $4\sigma$, where $m_c$, $m_{c_j}$ are the means of $c$ and $c_j$, respectively.
Main stage
- Determine the set $D_r$ that contains:
  - the highly populated cubes, and
  - the cubes that have at least one connection with a highly populated cube.
- For each point $x$ in a cube $c \in D_r$, determine $Y(x)$ as the set of points that belong to cubes $c_j$ in $D_r$ such that the mean values of the $c_j$'s lie at distance less than $\lambda\sigma$ from $x$ (typically $\lambda = 4$).

The DENCLUE Algorithm (cont.)
DENCLUE algorithm (cont.)
- For each point $x$ in a cube $c \in D_r$
  - Apply a hill climbing method starting from $x$ and let $x^*$ be the local
maximum to which the method converges.
  - If $x^*$ is a significant local maximum ($f^X(x^*) \ge \xi$) then
    - If a cluster $C$ associated with $x^*$ has already been created then
      - $x$ is assigned to $C$
    - Else
      - Create a cluster $C$ associated with $x^*$
      - Assign $x$ to $C$
    - End if
  - End if
- End for

The DENCLUE Algorithm (cont.)
Remarks:
- Shortcuts allow the assignment of points to clusters without having to apply the hill-climbing procedure.
- DENCLUE is able to detect arbitrarily shaped clusters.
- The algorithm deals with noise very satisfactorily.
- The worst-case time complexity of DENCLUE is $O(N \log_2 N)$.
- Experimental results indicate that the average time complexity is $O(\log_2 N)$.
- It works efficiently with high-dimensional data.

CLUSTERING ALGORITHMS FOR HIGH-DIMENSIONAL DATA SETS
- What is a high-dimensional space?
  - Data sets whose input space has dimensionality $l$ with $20 \le l \le$ a few thousand are considered high-dimensional.
- Problems of considering simultaneously all dimensions in high-dimensional data sets:
  - "Curse of dimensionality".
  - As a fixed num