2022 Cloud Storage: Design Document (PDF)
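Design: Cloud Storage
Li Jianqi (李建奇)
V0.11 2011-1-15
V0.20 2011-2-20

1 Executive view

1.1 What advantages does cloud storage have over dedicated storage?
- Scalability: beyond the PB scale, it has the advantage
- It shares PC computing resources
- It is open and programmable, and it adapts to computation, so storage can sit closer to the computation
- Low cost for better reliability, availability and performance

1.1.1 Unstructured data
For example, images. However, the advantage only appears beyond a certain scale. My guess is that this break-even point is currently around 2 PB (a cluster of 400 machines or more). Continued declines in per-GB storage cost will keep lowering this break-even point.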
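The break-even claim in 1.1.1 can be made concrete with a simple cost model. The model is an assumption for illustration and is not the author's own. A cluster is treated as a fixed overhead F (engineering, operations) plus a per-GB cost c_cluster. NAS is treated as a purely per-GB cost c_nas. The break-even capacity is then F / (c_nas - c_cluster), so any drop in the cluster's per-GB cost lowers the threshold, which matches the text. The functions `break_even_gb` and `per_machine_tb` are hypothetical, and every number below is made up only to show the trend.

```python
def break_even_gb(fixed_cost: float, nas_cost_per_gb: float, cluster_cost_per_gb: float) -> float:
    """Capacity (GB) above which the cluster is cheaper than NAS.

    Hypothetical model: cluster_total = fixed_cost + cluster_cost_per_gb * size,
    nas_total = nas_cost_per_gb * size.
    """
    if cluster_cost_per_gb >= nas_cost_per_gb:
        raise ValueError("the cluster must be cheaper per GB to ever break even")
    return fixed_cost / (nas_cost_per_gb - cluster_cost_per_gb)


def per_machine_tb(total_pb: float, machines: int) -> float:
    """Raw capacity per machine implied by a cluster size (decimal units)."""
    return total_pb * 1000 / machines


if __name__ == "__main__":
    fixed = 300_000.0  # made-up fixed overhead
    nas = 1.0          # made-up NAS cost per GB
    for cluster in (0.6, 0.5, 0.4):  # falling cluster cost per GB
        print(f"cluster {cluster}/GB -> break-even {break_even_gb(fixed, nas, cluster):,.0f} GB")
    # The document's estimate of 2 PB on 400 machines implies this much raw space per node:
    print(per_machine_tb(2, 400), "TB per machine")
```

The same arithmetic shows that the 2 PB / 400-machine estimate corresponds to about 5 TB of raw capacity per machine, before any replication.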
Haystack is 28% cheaper than NAS, with 4x the read throughput of NAS.

1.1.2 Structured data
For example, email, web pages, user records, etc. For combined storage and computation, that is, structured data, the relevant comparison is not cost/GB. What matters for combined storage and computation is availability for computation at massive scale.

1.2 Comparison with cluster NAS
- Better scalability
- High availability

1.3 The development cost of the system's first deployment should be under 12 person-years. One option is to deploy a key/value storage application first.

1.4 Functional definition
- This storage framework must support a variety of cloud storage uses:
  - Massive object storage for offline applications, emphasizing bandwidth
  - Massive object storage for online applications, emphasizing low latency
- The design focuses on partitioning, replication, transactions, failure recovery, etc.
Online uses at massive scale:
- Images
- Email
- Document databases
- NoSQL databases

1.5 Unsupported requirements (TBD)

1.6 Risk prevention
We believe there are still unknown factors that affect performance and stability.
- Design as thoroughly as possible
- Design and implementation should be service-oriented and component-based, and should allow refactoring
- Implementation should use the simplest possible strategies
- Failures in complex systems are very hard to diagnose, so monitoring and debugging must be designed thoroughly

1.7 Product line, future
- key/value
- Blob, EBS
- key/value DB, such as SimpleDB
- Column-oriented DB, such as Bigtable

2 Key features
2.1 Key features
2.1.1 Incremental scaling
2.1.2 Heterogeneous environments: storage is capability-aware
2.1.3 An open framework that supports a variety of cloud storage applications
2.1.4 Tune the values of N, R and W to achieve the desired levels of performance, availability and durability. This is the same as the corresponding Cassandra feature.

2.2 Supported applications
2.2.1 Massive image storage. Requests look like http:/.
2.2.2 Massive document database
2.2.3 Massive structured data
2.2.4 Blob

3 Conceptual architecture
3.1 Choice of approach
Amazon's strategy is to use a separate, independent system for each kind of storage. Google uses GFS (Colossus) as the foundation for all storage. We choose Amazon's structure, and we lean toward deploying on virtual machines.

3.2 Lean toward S3's principles: http:/
- Decentralization: Use fully decentralized techniques to remove scaling bottlenecks and single points of failure.
- Asynchrony: The system makes progress under all circumstances.
- Autonomy: The system is designed such that individual components can make decisions based on local information.
- Local responsibility: Each individual component is responsible for achieving its consistency; this is never the burden of its peers.
- Controlled concurrency: Operations are designed such that no or limited concurrency control is required.
- Failure tolerant: The system considers the failure of components to be a normal mode of operation, and continues operation with no or minimal interruption.
- Controlled parallelism: Abstractions used in the system are of such granularity that parallelism can be used to improve performance and robustness of recovery or the introduction of new nodes.
- Decompose into small well-understood building blocks: Do not try to provide a single service that does everything for everyone, but instead build small components that can be used as building blocks for other services.
- Symmetry: Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function.
- Simplicity: The system should be made as simple as possible (but no simpler).

3.3 Allow users to trade off availability against the consistency model

3.4 Design approach
An autonomous, masterless approach, leaning toward running on virtual machines.
[Figure: Writing data. Labeled components: Client, Routing, Replication, EBS service, iSCSI target, Dom 0, Network (IP).]

4 Reference systems
4.1 Dynamo architecture
- Consistent hashing
- Eventually consistent

4.2 Haystack: distributed object storage
A typical URL that directs the browser to the CDN looks like the following:
http://⟨CDN⟩/⟨Cache⟩/⟨Machine id⟩/⟨Logical volume, Photo⟩
Haystack is a good design for low latency. The key insight is to avoid disk operations when accessing metadata. Haystack provides a fault-tolerant and simple solution to photo storage at dramatically less cost and higher throughput than a traditional approach using NAS appliances. Furthermore, Haystack is incrementally scalable.
- Haystack storage is log-structured.
- It uses the XFS file system.

4.3 Amazon S3
4.4 Cassandra (TBD)
4.5 Google GFS, Colossus, Bigtable
4.5.1.1 GFS
Constraints:
- Most modifications are appends
- Random writes are practically nonexistent
- Many files are written once and read sequentially
- Two types of reads:
  - Large streaming reads
  - Small random reads (in the forward direction)
- Sustained bandwidth is more important than latency
GFS was therefore designed mainly for offline workloads.
The original Google File System, he says, didn't scale as well as the company would like.
GFS lessons:
- Scaled to approximately 50M files, 10 PB
- Large files increased upstream complexity
- Not appropriate for latency-sensitive applications
- Scaling limits added management overhead
4.5.1.2 Colossus
- Next-generation cluster-level file system
- Automatically sharded metadata layer
- Data typically written using Reed-Solomon (1.5x)
- Client-driven replication, encoding and replication
- Metadata sp
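Feature 2.1.4 borrows Dynamo/Cassandra-style tunable replication. Each key is stored on N replicas, a read waits for R replies, and a write waits for W acknowledgements. The rule commonly stated for this scheme is that when R + W > N, every read quorum overlaps every write quorum, so a read reaches at least one replica holding the latest acknowledged write.

The sketch below is an in-memory illustration under these simplifying assumptions:
- The `Replica` and `QuorumStore` names are hypothetical.
- There is no networking, failure handling, read repair or hinted handoff.
- Versions are a single-writer counter rather than vector clocks.
- Writes and reads are deliberately placed on opposite ends of the replica list, so the overlap (or lack of it) is visible.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Replica:
    """One storage node holding key -> (version, value)."""
    data: dict = field(default_factory=dict)


class QuorumStore:
    """Toy N/R/W replication over in-memory replicas (hypothetical sketch).

    Writes go to the first W replicas and reads to the last R replicas,
    a worst-case placement: reads see the newest value only when the two
    quorums overlap, i.e. when R + W > N.
    """

    def __init__(self, n: int, r: int, w: int) -> None:
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("need 1 <= R <= N and 1 <= W <= N")
        self.n, self.r, self.w = n, r, w
        self.replicas = [Replica() for _ in range(n)]
        self.clock = 0  # single-writer version counter (no vector clocks)

    def quorums_overlap(self) -> bool:
        return self.r + self.w > self.n

    def put(self, key: str, value: Any) -> None:
        self.clock += 1
        for rep in self.replicas[: self.w]:  # W acknowledgements
            rep.data[key] = (self.clock, value)

    def get(self, key: str) -> Optional[Any]:
        replies = [rep.data.get(key, (0, None)) for rep in self.replicas[self.n - self.r :]]
        # Return the value with the highest version among the R replies.
        return max(replies, key=lambda vv: vv[0])[1]


if __name__ == "__main__":
    strong = QuorumStore(n=3, r=2, w=2)
    strong.put("photo:1", "v1")
    print(strong.quorums_overlap(), strong.get("photo:1"))

    weak = QuorumStore(n=3, r=1, w=1)
    weak.put("photo:1", "v1")
    print(weak.quorums_overlap(), weak.get("photo:1"))
```

With N=3, R=2, W=2 the read returns the latest value. With R=W=1 it can return a stale (here, missing) value, which is the "eventually consistent" end of the trade-off mentioned in 3.3 and 4.1.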
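Section 4.1 lists consistent hashing as a core part of Dynamo. Nodes and keys are hashed onto the same ring. Each key belongs to the first node clockwise from its hash, and its N replicas are the next N distinct nodes. Adding or removing a node only moves keys in the neighboring arcs, which is what keeps the incremental scaling of 2.1.1 cheap.

The sketch assumes MD5 as the ring hash and a fixed number of virtual nodes per physical node. The `HashRing` class and its `vnodes` parameter are hypothetical names, not an API from Dynamo or Cassandra.

```python
import bisect
import hashlib


def _hash(s: str) -> int:
    # 128-bit ring position derived from MD5 (any uniform hash would do).
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._points: list[int] = []      # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> physical node

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            p = _hash(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            p = _hash(f"{node}#{i}")
            self._points.remove(p)
            del self._owner[p]

    def preference_list(self, key: str, n: int) -> list[str]:
        """First n distinct physical nodes clockwise from the key's hash."""
        if not self._points:
            return []
        nodes: list[str] = []
        start = bisect.bisect(self._points, _hash(key))
        for step in range(len(self._points)):
            node = self._owner[self._points[(start + step) % len(self._points)]]
            if node not in nodes:
                nodes.append(node)
                if len(nodes) == n:
                    break
        return nodes


if __name__ == "__main__":
    ring = HashRing()
    for name in ("node-a", "node-b", "node-c", "node-d"):
        ring.add(name)
    keys = [f"photo:{i}" for i in range(1000)]
    before = {k: ring.preference_list(k, 1)[0] for k in keys}
    ring.add("node-e")
    moved = sum(before[k] != ring.preference_list(k, 1)[0] for k in keys)
    print(f"{moved} of {len(keys)} keys moved, all of them to node-e")
```

Virtual nodes are a design choice here to even out load across a small number of machines. Every key that moves when `node-e` joins moves to `node-e`, and no key moves between the old nodes.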
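Section 4.2 says Haystack is log-structured and that its key insight is avoiding disk operations for metadata. In the Haystack design, a store machine appends each photo (a "needle") to a large volume file. It keeps an in-memory map from photo id to the photo's location and size, so a read needs no metadata I/O.

The sketch below illustrates only that idea. A local file stands in for an XFS volume. The needle header layout (`>QI`: 8-byte id, 4-byte length) and the `Volume` class are hypothetical simplifications, not Haystack's on-disk format.

```python
import os
import struct
import tempfile

# Hypothetical needle header: 8-byte photo id + 4-byte payload length, big-endian.
HEADER = struct.Struct(">QI")


class Volume:
    """Append-only photo volume with an in-memory index (illustrative sketch).

    Assumes no torn writes: a truncated final needle is not detected.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: dict[int, tuple[int, int]] = {}  # photo id -> (payload offset, size)
        open(path, "ab").close()  # create the volume file if it does not exist yet
        self._rebuild_index()

    def _rebuild_index(self) -> None:
        # Recovery: scan the log once; a later needle for the same id wins.
        with open(self.path, "rb") as f:
            offset = 0
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break
                photo_id, size = HEADER.unpack(header)
                self.index[photo_id] = (offset + HEADER.size, size)
                f.seek(size, os.SEEK_CUR)
                offset += HEADER.size + size

    def write(self, photo_id: int, data: bytes) -> None:
        # Append only: existing bytes are never rewritten in place.
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(HEADER.pack(photo_id, len(data)))
            f.write(data)
        self.index[photo_id] = (offset + HEADER.size, len(data))

    def read(self, photo_id: int) -> bytes:
        # The metadata lookup is a dict access; disk I/O is one seek plus one read.
        offset, size = self.index[photo_id]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "volume.dat")
    vol = Volume(path)
    vol.write(1, b"jpeg-bytes-1")
    vol.write(2, b"jpeg-bytes-2")
    reopened = Volume(path)  # index rebuilt by scanning the log
    print(reopened.read(2))
```

Overwriting a photo simply appends a new needle. The stale bytes stay in the file until some compaction step, which this sketch does not model.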
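The Colossus notes in 4.5.1.2 say data is typically written with Reed-Solomon at 1.5x. For an RS layout with k data fragments and m parity fragments, raw storage is (k + m) / k times the logical size, and any m lost fragments can be tolerated. The notes do not give k and m; k = 6, m = 3 is only an assumed example that yields 1.5x. The sketch compares this with 3-way replication, the default replication level in GFS. The helper function names are hypothetical.

```python
from fractions import Fraction


def rs_overhead(k: int, m: int) -> Fraction:
    """Raw-to-logical storage ratio of a Reed-Solomon (k data + m parity) layout."""
    return Fraction(k + m, k)


def replication_overhead(copies: int) -> Fraction:
    """Raw-to-logical storage ratio of plain replication."""
    return Fraction(copies)


def tolerated_losses_rs(k: int, m: int) -> int:
    # Any k of the k + m fragments are enough to reconstruct the data.
    return m


def tolerated_losses_replication(copies: int) -> int:
    # One surviving copy is enough.
    return copies - 1


if __name__ == "__main__":
    for name, overhead, losses in (
        ("RS(6,3)", rs_overhead(6, 3), tolerated_losses_rs(6, 3)),
        ("3x replication", replication_overhead(3), tolerated_losses_replication(3)),
    ):
        print(f"{name}: {float(overhead)}x raw storage, survives {losses} lost fragments/copies")
```

Under these assumptions, RS halves the raw storage of triple replication while tolerating one more loss. The price is that reads of lost fragments and all writes need encoding work, which the notes attribute to the client ("client-driven ... encoding").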