An Effective Merge Strategy Based Hierarchy for Improving Small File Problem on HDFS

Zhipeng Gao1, Yinghao Qin1, Kun Niu2
1 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Email: gaozhipeng, qyhqyh123123163, niukun

Abstract: The Hadoop Distributed File System (HDFS) is designed for reliable, low-cost storage and management of very large files. Because the HDFS architecture relies on one master (NameNode) to handle the metadata of multiple slaves (DataNodes), the NameNode often becomes a bottleneck, especially when handling a large number of small files. A common solution to this problem is to merge many small files into one big file. However, HDFS does not consider the correlation between the files stored on it, so it is hard to apply an efficient prefetching mechanism. To solve the problem of a large number of small files and to improve the efficiency of accessing them, in this paper we define the Logic File Name (LFN) and propose the Small file Merge Strategy Based on LFN (SMSBL). SMSBL takes a new, hierarchy-based perspective: it effectively improves the correlation of small files placed in the same HDFS block across different file system hierarchies, so HDFS combined with SMSBL and a prefetching mechanism performs well in the presence of a large number of small files. A system efficiency analysis model is established, and experimental results demonstrate that SMSBL can solve the small file problem in HDFS and achieves an appreciably high hit rate of prefetched files.

Keywords: HDFS; small files; merging and prefetching

Introduction

With the rapid development of Internet services, the amount of data grows exponentially, and cloud computing has become increasingly popular as the next infrastructure for hosting data and deploying software and services [1]. As data from web applications explode and grow bigger, it is hard for traditional systems to handle this situation, so new projects and systems are needed. Distributed file systems were introduced and developed for cloud computing and big data. The three most famous distributed file systems are the Google File System (GFS) [2], the Hadoop Distributed File System (HDFS) and the Amazon Simple Storage Service (S3). Among them, HDFS is an open-source software framework influenced by GFS; it is widely used in many fields such as science, biology, climatology, astronomy, finance, the Internet and geography [3].

A small file is a file whose size is less than the HDFS block size [4]. HDFS is designed for storing large files; it is inefficient when storing a large number of small files, because of high memory usage and unacceptable access cost [5].

The rest of the paper is organized as follows. Section II discusses the background and related work. Section III describes our proposed approach, SMSBL, to improve HDFS. Experiments are conducted in Section IV. Section V concludes the paper and outlines future work.

1 Background

1.1 HDFS

Internet giants, such as Facebook, Twitter and Yahoo, use HDFS as their basic distributed data storage environment. The design of the Hadoop Distributed File System (HDFS) is fully inspired by the Google File System (GFS). Both HDFS and GFS are master/slave architectures.
An earlier version of the HDFS cluster has only one master node, called the NameNode, which holds and manages the metadata, including the distributed file system namespace, file descriptions, file-to-block mappings, data block allocations, access regulations and so on [6]. The High Availability (HA) version of an HDFS cluster has more than one NameNode, an active NameNode and a standby NameNode, to enhance the reliability of HDFS and the efficiency of cluster operation [7]. Along with the continuous improvement of Hadoop and HDFS, they play more and more important roles in the era of big data.

The NameNode not only is in charge of responding to clients requesting access to files on the HDFS cluster, but also manages a huge amount of metadata. Whatever the size of a file is, its metadata consumes 250 bytes and each of its blocks, with the default three replicas, consumes 368 bytes in the memory of the NameNode [8]. When many small files are stored in HDFS, the memory of the NameNode comes under high pressure and maintaining this huge amount of metadata becomes inefficient [9]. When the size of the relevant metadata is larger than the memory of the NameNode, or frequent access to small files exceeds the I/O capacity of the cluster, HDFS may shut down abnormally [10]. These situations arise because the original HDFS was designed for huge files transferred at large scale [11]. With the era of big data, Hadoop and HDFS have to deal with many complicated workloads that they may not be good at. The HDFS architecture is shown in Fig. 1.
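To illustrate this memory pressure concretely, the short sketch below estimates the NameNode heap consumed by metadata for a given number of small files, using the per-file (250 bytes) and per-block (368 bytes) figures quoted above; the function name and the assumption that every small file occupies exactly one block are ours, not the paper's.

FILE_META_BYTES = 250    # metadata per file entry in NameNode memory (cited above)
BLOCK_META_BYTES = 368   # metadata per block entry with three replicas (cited above)


def namenode_metadata_bytes(num_small_files):
    # Assumes each small file occupies exactly one block (a simplification).
    return num_small_files * (FILE_META_BYTES + BLOCK_META_BYTES)


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 100_000_000):
        gib = namenode_metadata_bytes(n) / 2**30
        print(f"{n:>11,} small files -> about {gib:.1f} GiB of NameNode heap")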
1.2 Related work

The present studies on handling the small file problem mainly concentrate on two directions.

The first direction is improving the architecture of distributed file systems. Xuhui Liu's approach merges small files into big ones and builds an index for each small file; the small files are stored consecutively in physical storage according to their geographic locations, and a hash index is used [12]. However, this approach is used exclusively in WebGIS. Chandrasekar S. proposed the Extended Hadoop Distributed File System (EHDFS), which is designed and implemented so that a large number of small files can be merged into a single combined file, and it also provides a framework for prefetching the metadata of a specified number of files [13]. Chatuporn Vorapongkitipun proposed a mechanism based on Hadoop Archive (HAR), called New Hadoop Archive (NHAR), to reduce the memory utilization for metadata and enhance the efficiency of accessing small files in HDFS. Dipayan Dev designed Hadoop Archive Plus (HAR+), a modification of the existing HAR that uses a hash table in its architecture with SHA-256 as the key; HAR+ is designed to provide more reliability and can also provide auto-scaling of metadata [14]. However, such architectural redesigns are very complex and costly in resources.

Optimizing the strategy for merging small files and using a cache for prefetching files is the other direction for improving how HDFS handles small files. Du Zhonghui presented a balanced small file merging algorithm to optimize the volume distribution of merged large files, which can effectively reduce the number of HDFS data blocks. Zhang Chunming proposed an approach called HIFM (Hierarchy Index File Merging), in which the correlations between small files and the directory structure are considered when merging small files into large ones and generating a hierarchical index [15]. Zhang Chunming's proposal is good, but it does not generalize well.

1.3 Our contribution

Studying the above approaches, we propose a new approach with better local correlation and better universality. In our new approach we define the Logic File Name (LFN) and adjust the order of fields in the LFN to match the usage environment. Our approach, named Small file Merge Strategy Based on LFN (SMSBL), can be used in most hierarchical file systems and can raise the hit rate of prefetched files on HDFS.

The contributions of our work are:

1) Solving the problem of a large number of small files on HDFS. Our approach merges small files into larger ones and builds an index for file access, which keeps HDFS usable in the presence of a large number of small files.

2) High universality and flexibility. Our approach can be used in most hierarchical file systems, since the LFN is matched to the concrete hierarchy. In different usage situations, we adjust the priorities of the LFN's fields to increase the local correlation of small files before merging them.

3) A high hit rate of prefetched files. Because the priorities of the LFN's fields are adjustable, the local correlation of small files can always be kept high, which is the essential foundation for a high hit rate of prefetched files.

2 Our proposed approach

There are two steps in our proposed approach. First, a radix sort is performed on the file set under the rule of the altered LFN to obtain a sequence of output files. Second, files from the output sequence are merged into block files in order, index files are built, and both are uploaded to HDFS.

We propose the concept of the Logic File Name (LFN) to support finding the optimal sequence of output files, in order to enhance the local correlation of files on HDFS in different situations.

2.1 Hierarchy of file system and Logic File Name (LFN)

As shown in Fig. 2, many file systems adopt a hierarchy, and Fig. 3 shows the corresponding LFN.

Figure 2 File system architecture

Figure 3 LFN, with fields hashvalue, time, userid, filetype, fileid; an altered LFNx consists of fields fix1, fix2, fix3, ..., fixv. The number of fields of LFNx may not equal the number of fields of LFN, because a field can be split into more precise dimensions.

A Logic File Name (LFN) is a file name that carries the meaning of the hierarchy. It is made up of fields with logical meaning derived from the names of the file's parent directories and the file name itself, representing the hierarchy of the file system. Each dimension of the LFN contains one dimension of information about the file. In most situations the structure of the LFN differs from system to system. As shown in Fig. 3, in this file system the LFN of a file is made up of a hash value, creation time, related user id, file type, file id and some other information. For example, the LFN of the bottom-left file in Fig. 2 is "hasha_time1_userid1_type1_1". The Logic File Name is a global notion, so when a file has no attribute in some dimension, that field is filled with a default value such as "null". Some fields of the LFN should be encoded to represent complex string values, which is important for reducing the amount of calculation in subsequent processing. The set of LFNs of the storage system and a corresponding map from each LFN to the address of its physical location are set up by traversing the directory tree.
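To make the LFN construction concrete, the sketch below derives an LFN for every file under a root directory by treating each parent-directory level as one field and the file name as the last field, padding missing dimensions with "null". The field count, the "_" separator and the helper names (build_lfn, traverse) are our own illustrative assumptions rather than details fixed by the paper.

import os

NUM_FIELDS = 5  # e.g. hashvalue, time, userid, filetype, fileid


def build_lfn(root, path, num_fields=NUM_FIELDS):
    # One LFN field per directory level below root, plus the file name;
    # deeper paths are truncated, shallower ones padded with "null".
    rel = os.path.relpath(path, root)
    parts = rel.split(os.sep)
    fields = parts[:num_fields]
    fields += ["null"] * (num_fields - len(fields))
    return "_".join(fields)


def traverse(root):
    """Map each LFN to the physical location of its file."""
    lfn_to_path = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            lfn_to_path[build_lfn(root, full)] = full
    return lfn_to_path


if __name__ == "__main__":
    # e.g. ./data/hasha/time1/userid1/type1/1 -> "hasha_time1_userid1_type1_1"
    for lfn, path in traverse("./data").items():
        print(lfn, "->", path)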
2.2 Radix sort based on altered LFN

Figure 4 Altered LFN (field order: userid, time, filetype, fileid)

In practical applications, the order in which files are accessed depends on the usage environment, so no single storage layout can be suitable for every case. By altering the priorities of the LFN fields for different situations, the radix sort can produce a queue customized for a certain use case. For example, in a geographic information system the correlation of files on the coordinate field may be more important than the correlation on the time field, whereas in a weather forecasting system the correlation on the time field may be more important than the correlation on the coordinate field. In the storage system of one social network, the time field may come before the file type field, while in another social network the file type may be more important. Fig. 4 shows an altered LFN for one specific case, obtained by customizing the priorities of the other fields under the same hashvalue field.

Here is the algorithm description of the radix sort based on LFN. Fset = {f1, f2, f3, ..., fn} is the set of n files to be dealt with.

1) Alter the priorities of the LFN fields to adapt to the specific situation, turning LFN = (fi1, fi2, fi3, ..., fik) into LFNx = (fix1, fix2, fix3, ..., fixv).
2) Set up bucket lists for the field fixv and run an assign pass over Fset, in which each file is appended to the bucket list corresponding to its value of fixv.
3) Collect the bucket lists in order; this forms a new list, called Fsetx.
4) Set up bucket lists for the field fix(v-1), as in step 2) but on Fsetx, and repeat steps 2)-3) until all fields have been dealt with. The number of iterations is v, the number of fields of LFNx.
5) Fsetx is now a list sorted by LFN.

Here is pseudocode of our proposed approach:

LFNlist = [fix1, fix2, fix3, ..., fixv]
# traverse(file_system) is a function that iterates over all files on the file system
# file.fieldx returns the value of the given field in the file's LFN
# distribute(fieldx) is the distribution function of the radix sort

Fset = traverse(file_system)
Fset_temp = Fset
for fieldx in reversed(LFNlist):
    itemlist = new bucket lists for fieldx
    for file in Fset_temp:
        itemlist[distribute(file.fieldx)].append(file)
    Fset_temp = []
    for fileset in itemlist:
        Fset_temp.extend(fileset)   # collect the buckets in order
    Fset = Fset_temp
return Fset

Complexity analysis: the original list has n items and d = v keys (one per LFNx field); the ranges of the keys are rx1, rx2, ..., rxv. The time complexity is O(d × (n + max(rx1, rx2, ..., rxv))): one assign pass costs O(n), one collect pass costs O(radix), and d passes are performed. The auxiliary space for the bucket lists is O(n + max(rx1, rx2, ..., rxv)).
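As a concrete illustration of the assign and collect passes described above, the following sketch radix-sorts LFN strings from the last field to the first. Splitting each LFN on "_" and keeping the buckets in a dictionary are our own illustrative choices, not prescribed by the paper.

from collections import defaultdict


def radix_sort_by_lfn(lfns, num_fields):
    """Sort LFN strings field by field, from the last field to the first."""
    fset = list(lfns)
    for field_index in reversed(range(num_fields)):    # one pass per field
        buckets = defaultdict(list)
        for lfn in fset:                               # assign pass
            key = lfn.split("_")[field_index]
            buckets[key].append(lfn)
        fset = []
        for key in sorted(buckets):                    # collect pass, in key order
            fset.extend(buckets[key])
    return fset


if __name__ == "__main__":
    files = ["hasha_time2_userid1_type1_2",
             "hasha_time1_userid1_type1_1",
             "hashb_time1_userid2_type2_3"]
    print(radix_sort_by_lfn(files, num_fields=5))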
2.3 Merging small files and storing on HDFS

The merge process depends on the customized sorted list Fsetx. In HDFS, files are stored in distributed blocks of uniform size (128 MB by default in the current version of HDFS) [16]. In some situations one small file has to be divided into two parts stored in two blocks in order to make the most of the volume of one block, but it is not efficient to read two blocks when accessing that one small file. It is therefore important to keep a balance between time cost and space cost, a general principle in computer systems. The merging algorithm uses a threshold: when the free space of a block is smaller than the threshold (10% by default) and the block cannot entirely hold the next small file, the block is uploaded to HDFS and a new block is started. Here is the algorithm for merging small files:

1) Create a new buffer file and an index file, named buffer_file and maptable, where maptable contains the offset information of each small file. The size of buffer_file is based on the HDFS block size.
2) Take the next small file from Fsetx.
3) If the size of this small file is less than the free space of buffer_file, put the small file into buffer_file, update maptable and repeat step 2); otherwise go to step 4).
4) If the free space of buffer_file is greater than 10% of the block but cannot hold the whole small file, split the small file according to the free space of buffer_file, put the first part into buffer_file, upload buffer_file and maptable to HDFS, and then deal with the remaining part according to step 3).
5) If the free space of buffer_file is less than 10% of the block and the small file is greater than the free space, upload buffer_file and maptable to HDFS, create a new buffer_file and maptable, and then handle the current small file according to step 3).
6) Repeat until all files in Fsetx have been dealt with.

Here is pseudocode of the merge process:

block = sizeof(Hadoop_block)
buffer_file = new file(block)
maptable = new file()
for file in Fsetx:
    if file.size <= buffer_file.remain():
        buffer_file.append(file)
        maptable.writemap(file)
    else if buffer_file.remain() / block > 0.1 and file.size > buffer_file.remain():
        file1, file2 = file.split(buffer_file.remain())
        buffer_file.append(file1)
        maptable.writemap(file1)
        uptohdfs(buffer_file, maptable)
        Fsetx.insert(file2)
        buffer_file = new file(block)
        maptable = new file()
    else:
        uptohdfs(buffer_file, maptable)
        buffer_file = new file(block)
        maptable = new file()
        buffer_file.append(file)        # place the current file in the new buffer
        maptable.writemap(file)
uptohdfs(buffer_file, maptable)         # upload the final, partially filled buffer
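Below is a minimal runnable sketch of this merge step under the stated rules: small files, already in LFN-sorted order, are packed into 128 MB buffers, an index of (name, offset, length) entries plays the role of maptable, and a placeholder upload_to_hdfs function stands in for the actual upload. The block size and the 10% threshold follow the text, while the in-memory buffer representation and all helper names are our own simplifications for illustration.

# Small files are modelled as (name, data) pairs already ordered by the
# LFN-based radix sort; upload_to_hdfs() is a placeholder for the real upload.

BLOCK_SIZE = 128 * 1024 * 1024   # default HDFS block size
THRESHOLD = 0.1                  # free-space threshold (10% of the block)


def upload_to_hdfs(buffer_bytes, maptable):
    # Placeholder: a real system would write one block file plus its index file.
    print(f"uploaded block of {len(buffer_bytes)} bytes, {len(maptable)} index entries")


def merge_small_files(fsetx):
    """fsetx: list of (name, data) pairs in LFN-sorted order."""
    buffer_parts, maptable, used = [], [], 0
    queue = list(fsetx)
    while queue:
        name, data = queue.pop(0)
        free = BLOCK_SIZE - used
        if len(data) <= free:
            maptable.append((name, used, len(data)))
            buffer_parts.append(data)
            used += len(data)
        elif free / BLOCK_SIZE > THRESHOLD:
            # split the file: the first part fills the block, the rest is requeued
            maptable.append((name + "_part1", used, free))
            buffer_parts.append(data[:free])
            upload_to_hdfs(b"".join(buffer_parts), maptable)
            queue.insert(0, (name + "_part2", data[free:]))
            buffer_parts, maptable, used = [], [], 0
        else:
            # too little free space left: flush the block and start a new one
            upload_to_hdfs(b"".join(buffer_parts), maptable)
            buffer_parts, maptable, used = [data], [(name, 0, len(data))], len(data)
    if buffer_parts:
        upload_to_hdfs(b"".join(buffer_parts), maptable)


if __name__ == "__main__":
    mb = 1024 * 1024
    files = [("a", b"x" * (100 * mb)), ("b", b"y" * (50 * mb)), ("c", b"z" * (30 * mb))]
    merge_small_files(files)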