Design and Implementation of a Distributed Lock Service
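(Slide 1 of 36)

Outline
- Overall design of the Debby system
- Server design and implementation
- Data storage design and implementation
- Client design and implementation
- Fault-tolerant log (Paxos) design and implementation

Overall System Structure
[Figure: overall system structure]

Debby Server Implementation
- Communication between server and client
- Consistency guarantees
- Files and directories
- Sessions
- Event management
- SnapShot

Communication Between Server and Client
- Users call the client library to communicate with the server.
- Communication is implemented with ICE remote procedure calls.
- Interfaces provided: connect, close, keepAlive, addEvent, getData, setData, create, mkdir, remove, isDir, exists.

Server Consistency Guarantees
- Operations are kept consistent across the 3 servers.
- Consistency relies on the underlying Paxos protocol.
- Every file operation is submitted to Paxos.
- Interface provided by Paxos: sendProposal()

Session Implementation
- The server maintains a Debby manager.
- A session is kept alive through KeepAlive calls.
- Each KeepAlive piggybacks event information.
- KeepAlive: the client waits; the server returns immediately after receiving the request.

File and Directory Implementation
- Files and directories are kept in memory.
- There are two file systems: a regular one and an ephemeral one.
- Regular file system: a map; MemDir is implemented with tree.hh.
- Ephemeral file system: a map.

Event Management Implementation
- Debby maintains an event manager.
- It tracks registered events and events that have occurred.
- For registered events, the system maintains a map from each event to a list of handles.
- On each heartbeat, the events that have occurred are returned to the subscribed clients.

SnapShot
- Recovering a server from the log alone causes two problems:
  - the log keeps growing;
  - recovery takes longer and longer.
- This system uses a snapshot mechanism to solve the problem.

SnapShot (2)
- The in-memory file system data structure is serialized directly to disk.
- After a snapshot completes successfully, log entries older than the snapshot are no longer needed, and Paxos can be told to delete them.

SnapShot (3)
- The snapshot approach adds complexity.
- Before SnapShot was implemented, a crashed server only needed to fetch the most recent log from another machine to recover.
- With SnapShot, recovery must take both the log and the snapshot into account.

SnapShot (4)
    class SnapShot {
    private:
        static string DIR_PATH;              // directory where snapshots are stored
    public:
        static void serialize(MemDir& md);   // write the in-memory file system to disk
        MemDir& Unserialize();               // load the file system back from disk
    };

Debby Client
00448161

API
    void create(const string& path, bool ephemeral)
    void mkdir(const string& path)
    void remove(const string& path)
    bool exists(const string& path)
    bool isdir(const string& path)
    vector list(const string& path)
    bool lock(const string& path, bool share)
    void release(const string& path)
    string read(const string& path)
    void write(const string& path, const string& content)
    void regcb(const string& path, EventType e, shared_ptr cb)
    void clearcb(const string& path)

Lock
- The server does not support locks directly; the client uses ephemeral files to implement the lock service.
- To obtain a lock on a file, the client creates the ephemeral file filename.lck.
- If that file already exists, the server throws an exception and the client returns failure.
- To release the lock, the client simply deletes the file.

Lock (2)
- When a client loses its connection to the server, its ephemeral files are deleted, including those that indicate locks, so its locks are released.
- To prevent ambiguity, user file names are not allowed to end with the lock-file suffix, so they are easy to tell apart from the files used to implement locks.

Events
- EventType:
  - EventCreated
  - EventRemoved
  - EventChanged
  - EventLockChanged
  - EventArbitrary
- All events apply to both directories and files.

Events (2)
- All callbacks are managed by the client. When a callback is first registered on an event, the client registers the event with the master.
- When the client receives an event, it invokes all callbacks registered on that event.
- The user can cancel all callbacks on a given path, and the client then unregisters the events on the server.

Events (3)
- The client supplies a Callback class for implementing user callbacks; it contains a pure virtual function run().
- Users derive their own class from Callback and implement run().
- Users can store any necessary information in that class.
- When the client invokes a Callback, it creates a thread to run the run() function.

Choose Server
- There are 5 servers in a Debby cell, and only one of them is the master.
- The client uses the ICE multi-endpoint mechanism to find the master.
- The client registers the addresses of all 5 servers with ICE, and ICE tries all 5 addresses to find the right server automatically, as long as there is only one master at any time.

Grace Period
- While a master election is in progress, no service is available, and the client must wait for the new master to be elected.
- The ICE retry mechanism provides this behavior: the user specifies a series of retry intervals at which ICE retries the connection, for example 3, 5, 10.
- We retry the connection at 10, 30, 60.

Paxos Framework Implementation for a Fault-Tolerant Log
单栋栋 10748200, Network Laboratory

System Structure
[Figure: API for the fault-tolerant log, from "Paxos Made Live"]

Paxos Normal-Case Operation
[Figure: message flow between the client and servers 0, 1, and 2 through the request, proposal, accept, and reply steps; one server acts as the leader]
Two ways for clients to submit requests:
1. Only the leader accepts and submits requests (our approach, as in Chubby).
2. Every server can accept requests and forwards them to the leader, which submits them (as in ZooKeeper).

Paxos Message Types
- ViewChangeMessage (leader election)
- HeartBeatMessage (leader lease)
- PrepareMessage (content synchronization before becoming leader)
- PrepareOKMessage
- ProposalMessage (proposal)
- AcceptMessage

Leader Election
- When a leader is elected:
  - at system startup;
  - when the current leader is detected to be down and its lease has expired.
- Every election must submit a totally ordered view number.
- Generation of view numbers
- Two ways to elect a leader:
  - each server nominates only itself;
  - a server may nominate another server (our implementation).

Prepare Phase
- Executed by the leader after it is elected, and executed only once.
- Ensures a safe transition of the system.
- Leader catch-up

Proposal Phase
- Proposals are initiated by the leader.
- Two kinds of proposal:
  - synchronous: the client submits a proposal and waits until it has been decided (it may fail);
  - asynchronous: the client submits a proposal and returns immediately; Paxos notifies the client once the proposal has been carried out.
- Proposal timeouts and retransmission

Sending and Receiving Messages
- Network communication uses the Boost asio library.
- Messages are sent by multicast.
- Point-to-point catch-up uses TCP connections.
- Messages are received on asynchronous sockets.
- Multiple threads are used.
- Multiple message types are transmitted.

Experiments
- Proposals are submitted to Paxos continuously, and Paxos must keep the data (log) on every server consistent.
- Every server writes the log and flushes it to disk synchronously.
- A proposal that has not reached agreement within 1 second is treated as a failed commit.
- 4 groups of experiments in total, 3 runs per group, 5000 proposals per run (simulated on a single machine):
  - 3 servers (2 running, 3 running)
  - 5 servers (3 running, 5 running)

Results 1: 3 Machines

    Configuration     Run   Run time (s)   Failed commits (of 5000)
    2 of 3 running    1      99.133        0
                      2      95.126        0
                      3     101.856        4
      Average: 50.7 commits/s; 0.027 failures per 100 commits
    3 of 3 running    1     126.893        0
                      2     128.154        1
                      3     130.107        1
      Average: 38.95 commits/s; 0.013 failures per 100 commits

Results 2: 5 Machines

    Configuration     Run   Run time (s)   Failed commits (of 5000)
    3 of 5 running    1     204.485        67
                      2     181.862        59
                      3     214.639        77
      Average: 24.96 commits/s; 1.35 failures per 100 commits
    5 of 5 running    1     204.624        0
                      2     240.306        1   (see screenshot)
                      3     199.106        0
      Average: 23.29 commits/s; 0.0067 failures per 100 commits

Screenshot
[Screenshot]

Screenshot
[Screenshot]

Paxos Summary
- The Paxos protocol is basically implemented, with modifications to suit the upper-layer application.
- Written in C++, about 2000 lines of code.
- Only synchronous proposal submission is implemented; asynchronous submission is not.
- The catch-up mechanism is not yet complete.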
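
Returning to the lock scheme from the Lock slides, the idea of building locks from ephemeral files can be sketched in a few lines. This is only an illustration under stated assumptions, not the Debby implementation: FakeServer is a hypothetical in-memory stand-in for the server's ephemeral file system, LockClient and the integer session ids are invented for the example, and only exclusive locks are shown (the share flag of the real lock() call is left out).

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in for the server's ephemeral file system.
class FakeServer {
public:
    // Create an ephemeral file owned by `session`; throw if it already exists.
    void create(int session, const std::string& path) {
        if (!owner_.emplace(path, session).second)
            throw std::runtime_error("file exists: " + path);
    }
    void remove(const std::string& path) { owner_.erase(path); }
    bool exists(const std::string& path) const { return owner_.count(path) != 0; }
    // Session loss: every ephemeral file created by that session is deleted.
    void expire(int session) {
        for (auto it = owner_.begin(); it != owner_.end();) {
            if (it->second == session) it = owner_.erase(it);
            else ++it;
        }
    }
private:
    std::map<std::string, int> owner_;  // path -> owning session
};

// Hypothetical client-side wrapper: a lock on `path` is the file `path + ".lck"`.
class LockClient {
public:
    LockClient(FakeServer& server, int session) : server_(server), session_(session) {}
    // Try to take the lock; failure means the lock file already exists.
    bool lock(const std::string& path) {
        try {
            server_.create(session_, path + ".lck");
            return true;
        } catch (const std::runtime_error&) {
            return false;
        }
    }
    // Release by deleting the lock file (this sketch does not check ownership).
    void release(const std::string& path) { server_.remove(path + ".lck"); }
private:
    FakeServer& server_;
    int session_;
};
```

With two clients on the same FakeServer, the second lock() on a path fails until the first client calls release() or its session is expired, which mirrors how a lost connection frees the locks a client held.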