The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Google

ABSTRACT

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.

The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.

In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real-world use.

1. INTRODUCTION

We have designed and implemented the Google File System (GFS) to meet the rapidly growing demands of Google's data processing needs. GFS shares many of the same goals as previous distributed file systems such as performance, scalability, reliability, and availability. However, its design has been driven by key observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system design assumptions. We have reexamined traditional choices and explored radically different points in the design space.

First, component failures are the norm rather than the exception. The file system consists of hundreds or even thousands of storage machines built from inexpensive commodity parts and is accessed by a comparable number of client machines. The quantity and quality of the components virtually guarantee that some are not functional at any given time and some will not recover from their current failures. We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system.

Second, files are huge by traditional standards. Multi-GB files are common. Each file typically contains many application objects such as web documents. When we are regularly working with fast-growing data sets of many TBs comprising billions of objects, it is unwieldy to manage billions of approximately KB-sized files even when the file system could support it. As a result, design assumptions and parameters such as I/O operation and block sizes have to be revisited.

Third, most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically non-existent. Once written, the files are only read, and often only sequentially. A variety of data share these characteristics. Some may constitute large repositories that data analysis programs scan through. Some may be data streams continuously generated by running applications. Some may be archival data. Some may be intermediate results produced on one machine and processed on another, whether simultaneously or later in time. Given this access pattern on huge files, appending becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal.

Fourth, co-designing the applications and the file system API benefits the overall system by increasing our flexibility. For example, we have relaxed GFS's consistency model to vastly simplify the file system without imposing an onerous burden on the applications. We have also introduced an atomic append operation so that multiple clients can append concurrently to a file without extra synchronization between them. These will be discussed in more detail later in the paper.

Multiple GFS clusters are currently deployed for different purposes. The largest ones have over 1000 storage nodes, over 300 TB of disk storage, and are heavily accessed by hundreds of clients on distinct machines on a continuous basis.

2. DESIGN OVERVIEW

2.1 Assumptions

In designing a file system for our needs, we have been guided by assumptions that offer both challenges and opportunities. We alluded to some key observations earlier and now lay out our assumptions in more detail.

• The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.

• The system stores a modest number of large files. We expect a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them.

• The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more. Successive operations from the same client often read through a contiguous region of a file. A small random read typically reads a few KBs at some arbitrary offset. Performance-conscious applications often batch and sort their small reads to advance steadily through the file rather than go back and forth.
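
The batching and sorting of small reads mentioned above can be illustrated with a minimal sketch. It is not GFS client code: it uses ordinary Python file objects, and the names `coalesce`, `batched_read`, and `max_gap` are hypothetical, introduced here only for illustration. The idea is to sort pending (offset, length) requests, merge those that overlap or touch, and serve them all in a single forward pass over the file.

```python
import io
from typing import BinaryIO, Dict, Iterable, List, Tuple

Request = Tuple[int, int]  # (offset, length) in bytes; illustrative type alias


def coalesce(requests: Iterable[Request], max_gap: int = 0) -> List[Request]:
    """Sort requests by offset and merge those that overlap or lie
    within max_gap bytes of the previous run."""
    merged: List[Request] = []
    for offset, length in sorted(requests):
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        if merged:
            prev_off, prev_len = merged[-1]
            prev_end = prev_off + prev_len
            if offset <= prev_end + max_gap:
                # Extend the current run instead of starting a new one.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_off, new_end - prev_off)
                continue
        merged.append((offset, length))
    return merged


def batched_read(f: BinaryIO, requests: Iterable[Request],
                 max_gap: int = 0) -> Dict[Request, bytes]:
    """Serve many small reads in one forward pass over a seekable file."""
    pending = sorted(set(requests))
    results: Dict[Request, bytes] = {}
    i = 0
    for run_off, run_len in coalesce(pending, max_gap):
        f.seek(run_off)          # offsets only ever increase across runs
        buf = f.read(run_len)    # one larger read covers several requests
        run_end = run_off + run_len
        # Every request that ends inside this run is sliced out of buf.
        while i < len(pending) and sum(pending[i]) <= run_end:
            off, length = pending[i]
            start = off - run_off
            results[(off, length)] = buf[start:start + length]
            i += 1
    return results
```

For example, calling `batched_read(f, [(900, 10), (0, 4), (2, 4), (500, 8)])` issues three reads at increasing offsets (0, 500, 900) rather than four reads that jump backward and forward. Raising `max_gap` trades some extra bytes read for fewer separate operations; the right value depends on the storage system and is left as an assumption here.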