Apsara (飞天) Open Platform, Big Data Technology Conference (slide deck, 26 pages)

Open Platform: Apsara Cloud Platform

About Aliyun
- China's largest cloud service provider
- Hundreds of thousands of customers
- Billions of accesses every day

[Ecosystem diagram: Alibaba-operated IDC; Partner-operated IDC; Apsara Cloud Platform; ACE; 3rd-party Platform Services; Map, Mail, Search; 3rd-party Application Services; Customers; ISV and SI; Developers]
Providing Foundation Services of the Cloud Eco-system

The Nature of Cloud Computing

Scale (large scale)
- Internet-scale computing
- 2.5 EB generated per day, doubling every 40 months
- Billions of transactions on Taobao every day, which must be processed in 6 hours

Economy (low cost)
- Economy means more than low prices
- Leading to behavior changes (like "telephone")
- Key is scheduling (like "power grid")

Public Utility (operated as a service)
- Pay by usage
- Elasticity
- Safety (like "tap water")

Two Design Principles

1. Large-scale general computing platform as the base
   - One system supporting both offline and online services
   - Multi-tenancy, resource sharing, load shifting
2. Web-based API as the delivery mechanism
   - Online activation, pay-by-usage
   - Location transparency

[Architecture diagram]
- Infrastructure: IDC, Linux Cluster
- Apsara modules: Resource Management (伏羲, Fuxi); Security (钟馗, Zhongkui); RPC (夸父, Kuafu); Naming/Coordination (女娲, Nuwa); Cluster Deployment (大禹, Dayu); Cluster Monitor (神农, Shennong); Distributed File System (盘古, Pangu); Job Scheduling (伏羲, Fuxi)
- Cloud services: ACE, OSS, OTS, ODPS, OSPS, ECS/SLB, RDS
- On top: Map, Mail, Search, etc.; Cloud Mart; Other Cloud Services

Cloud Computing Services

Elastic Computing
- ECS: virtualized server instances that can be created and tailored to meet application requirements
- SLB: software load balancing that can elastically expand service capacity on demand
- ACE: convenient and efficient execution environment for Web services, supporting Java, PHP, and Node.js

Storage and Databases
- OSS: large-scale object storage service for unstructured data such as photos, music, or video
- OTS: large-scale storage service for structured or semi-structured data, with real-time query
- RDS: managed relational database instances with automatic backup and failover

A Comparison of Storage and Database Services

                        OSS            OTS               RDS
Data Model              Unstructured   Semi-structured   Fully-structured
Target Data Volume      10PB           100TB/table       TB/db
Txn Support             None           Limited support   Full support
Programming Interface   RESTful API    RESTful API       SQL

Large-scale Computing
- ODPS: large-scale batch data processing and computation, supporting SQL and MapReduce-style programming languages
- OSPS: stream data processing service, supporting a SQL-like query language and automatic failure recovery

Apsara Technical Highlights
- A common platform supporting both offline and online services
  - Search: 24B pages processed, 13B online index
  - Mail: 100M mails received, 10M mails sent, 10ms latency
- Capability-based security management framework, enforcing the Principle of Least Privilege
- Distributed deployment, monitoring, and diagnostics
- Zero SPOF (single point of failure): availability 99.9%
- All data has 3 replicas: data reliability 99.99999999%

5K
- 2013/08/15: First-ever 5000-node Apsara cluster (ODPS) went into production
  - 100K CPU cores, 100PB raw storage
  - Processing petabytes per day
- 2013/09/24: Opened access to ODPS for 4 universities & research institutions
- Sorting 100TB in 30 minutes
  - Current known record: 72 minutes (Yahoo!, 2013/07/03)

Pangu: Large-scale Distributed File System
- Master-Slave Architecture
  - Master for metadata management, Slave (Chunk Server) for IO management
- Paxos-based multi-master architecture, failure recovery time 1 minute
- End-to-end inline checksum
- Scales to 1 billion files

[Diagram: three masters (M) coordinated through Paxos, five chunk servers (CS)]

Separated IO Pipeline and Storage Mgmt

Adaptive IO Pipeline
- Replication master: chunk server vs client
- Replication policy: chaining vs star-replication
- Chunking policy: fixed, variable, or RAID
- Durability guarantee: txn logging vs sequential write

Common Storage Management
- Physical IO management
- Priority and QoS
- Background re-replication
- Chunk placement

Staged Event-driven Physical IO Mgmt
- The Chunk Server rearranges IO requests to support priority and QoS and to reduce IO seek overhead

Distributed Re-replication
- Typical: mirroring (10 hours)
- Pangu: distributed re-replication (20 min, 50 nodes)
- Intelligent scheduling
- Balanced storage
- Bandwidth throttling
- Minimizing data loss

[Diagram: 1TB disks under mirroring and under distributed re-replication]

RAID
- Built into the core system instead of an add-on layer (as in HDFS RAID)
- Better management of data integrity, recovery, and chunk placement
- Synchronous redundancy block generation
- Low-latency failure recovery
- Small file support

[Fuxi architecture diagram: Client, Fuxi Master (two instances), AppMaster, APPWorker, Tubo; flows labeled Job submission, Resource requests, Job control, Node control]

Fuxi Resource Scheduling
- Multi-dimension resources
- Elastic quota
- CGroup-based isolation
- Fuxi Master HA
- App Master failover
- Incremental scheduling

Fuxi Job Programming Model
- Job: a DAG
- Vertex: task
  - Each task may have multiple instances based on input data chunks
- Edge: data flow
  - Each task may have multiple input/output flows
  - A data flow connecting two tasks represents data shuffling
- MapReduce is a degenerate case

[Diagram: a DAG of tasks A, B, C, D with input1, input2, output1, output2; MapReduce as input, Map, Reduce, output]

Example: Find Best-Sellers

    SELECT prod_id, SUM(count) AS quantity
    FROM orders
    GROUP BY prod_id
    ORDER BY quantity DESC;

orders

order_id   prod_id   unit_price   count
0001       04215     100          3
0002       03343     10           100
0003       01234     50           10
0004       04215     80           5

Result

prod_id   quantity
02518     31790
07563     28451
09641     22943
04215     20043

Comparison of Job-execution Plan
- Fuxi: Input, M, R1, R2
- MapReduce: Input, M, R1, M2, R2
- Intermediate data: (prod_id, count) after M; (prod_id, quantity) after R1 and after M2

Comparison of Job Execution

[Chart: execution timelines on a 0 to 120 axis; Fuxi runs M, R1, R2; MapReduce runs M, R1, M2, R2]

A Brief History
- 02/04/2009: First line of Apsara code.
- 08/27/2010: Apsara became the common platform for search, mail, storage, VM, and large-scale data processing.
- 07/28/2011: Official Web site of Aliyun went online. ECS became the first Aliyun service open to the public.
- 08/15/2013: 5000-node Apsara cluster went into production.
- 10/24/2013: 3rd Aliyun dev conference held in Hangzhou. 5000 developers attended the conference.

Hello, Apsara 5K

Questions & Suggestions

End of presentation. Thank you for watching!
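
The Find Best-Sellers query above can be read as a three-stage DAG: a map stage (M), a summing reduce (R1), and a sorting reduce (R2). The sketch below walks through that plan in a single Python process. It is an illustration only, not Fuxi or ODPS code: the function names, the CRC32 partitioner, and the sample rows are all assumptions introduced here.

```python
import zlib
from collections import defaultdict
from operator import itemgetter

# Illustrative rows (order_id, prod_id, unit_price, count); hypothetical data.
ORDERS = [
    ("0001", "04215", 100, 3),
    ("0002", "03343", 10, 100),
    ("0003", "01234", 50, 10),
    ("0004", "04215", 80, 5),
]


def map_stage(rows):
    """M: project each order to a (prod_id, count) pair."""
    return [(prod_id, count) for _order_id, prod_id, _price, count in rows]


def shuffle(pairs, num_partitions):
    """Edge M -> R1: route every pair with the same key to the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append((key, value))
    return partitions


def reduce_sum(partition):
    """R1: SUM(count) GROUP BY prod_id within one partition."""
    totals = defaultdict(int)
    for prod_id, count in partition:
        totals[prod_id] += count
    return list(totals.items())


def reduce_sort(partials):
    """R2: merge the R1 outputs and ORDER BY quantity DESC."""
    merged = [pair for part in partials for pair in part]
    return sorted(merged, key=itemgetter(1), reverse=True)


def best_sellers(rows, num_partitions=2):
    """Run the M -> R1 -> R2 plan and return (prod_id, quantity) pairs."""
    pairs = map_stage(rows)
    partitions = shuffle(pairs, num_partitions)
    partials = [reduce_sum(p) for p in partitions]
    return reduce_sort(partials)


if __name__ == "__main__":
    for prod_id, quantity in best_sellers(ORDERS):
        print(prod_id, quantity)
```

In this sketch R2 consumes the R1 output directly. A plan restricted to map-then-reduce pairs would presumably need an extra map stage in front of the sort, which appears to be the M2 stage shown in the MapReduce column of the comparison.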