© 2004 Nature Publishing Group

Finishing the euchromatic sequence of the human genome

International Human Genome Sequencing Consortium*

*A list of authors and their affiliations appears in the Supplementary Information.

The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and
nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000–25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

The Human Genome Project (HGP) was launched in 1990 with the goal of obtaining a highly accurate sequence of the vast majority of the euchromatic portion of the human genome. The initial work followed a two-pronged approach: (1) the mapping of the human and mouse genomes [1–9] to allow the study of inherited disease and provide a crucial scaffold for genome assembly; and (2) the sequencing of organisms with smaller, simpler genomes [10–14] to serve as a testbed for method development and assist in interpreting the human genome. With success along both paths, the sequencing of the human genome itself eventually became feasible. The International Human Genome Sequencing Consortium (IHGSC), an open collaboration involving twenty centres in six countries, was formed to carry out this component of the HGP. In February
2001, the IHGSC [15] and Celera Genomics [16] each reported draft sequences providing a first overall view of the human genome. These sequences allowed systematic study of the human genome itself, including identification of genes, combinatorial architecture of proteins, regional differences in genome composition,
distribution and history of transposable elements, distribution of polymorphism and relationship between genetic recombination and physical distance. Moreover, systematic knowledge of the human genome has enabled new tools and approaches that have markedly accelerated biomedical research.

Both draft sequences, however, had important shortcomings. The IHGSC sequence, for example, omitted ~10% of the euchromatic genome; it was interrupted by ~150,000 gaps; and the order and orientation of many segments within local regions had not been established. The IHGSC thus turned to the challenge of completing the sequence of the
euchromatic genome. Operationally, a finished sequence was defined as having an error rate of at most one event per 10⁴ bases, and the goal for completion was coverage in finished sequence of at least 95% of the euchromatic genome, with the only gaps being those refractory to all available techniques [17] (see http://www.genome.gov/10000923). The goal was challenging because the human genome is replete with such features as dispersed repeats and large segmental duplications, which greatly complicate the determination of genome structure and sequence. In fact, near-complete sequences have been obtained so far only for three
multicellular organisms: the nematode [13], mustard weed [18] and the fruitfly [19]. These genomes are all roughly 30-fold smaller than the human genome and have much simpler structure.

We describe here the results of a multiyear effort by the IHGSC towards the goal of a complete human sequence. The number of gaps has been reduced 400-fold to only 341, most of which are associated with segmental duplications and will require new methods for resolution. The assembled near-complete genome sequence has an error rate of only ~1 event per 100,000 bases; it contains 2.85 billion nucleotides and covers ~99% of the euchromatic genome. This paper describes the current genome sequence and the process used to produce it; examines the accuracy and completeness of the sequence; and illustrates biological analyses made possible by the sequence. We do not attempt here a comprehensive analysis of the contents of the human genome. An initial analysis was previously reported [15], and a series of papers is being written describing the individual chromosomes [17, 20–30], including annotation of genes and other features.

Current genome sequence

Finishing process

The process of converting the initial draft sequence into a near-complete sequence is referred
to as finishing. It is a complex iterative process that proceeds simultaneously at multiple scales, ranging from single nucleotides to the integrity of whole chromosomes. The fundamental challenge is that genomic regions that are not well represented or readily resolved through random shotgun sequencing
tend to be highly enriched in problematic sequences. Resolving such regions required the development of special approaches, which evolved substantially over time and varied among centres.

Broadly, the finishing process involved two distinct components: (1) producing finished maps, consisting of continuous and accurate paths of overlapping large-insert clones spanning the euchromatic region of each chromosome arm; and (2) producing finished clones, consisting of continuous and accurate nucleotide sequence across each large-insert clone. In practice, these two components were tightly intertwined, in that progress in each often depended on results from the other. The components are described in Boxes 1 and 2. Further information about the finishing process and finishing standards can be found in the Supplementary Information (Note 1) and at http://www.genome.gov/10000923.

In total, we generated a shotgun sequence from 59,208 large-insert clones (total length, 5.84 gigabases (Gb)) and finished the sequence from 45,742 of these clones (total length, 3.67 Gb). The clones consisted primarily of bacterial artificial chromosomes (BACs), but also included some P1-artificial chromosomes (PACs), yeast artificial chromosomes (YACs), fosmids and cosmids; they carried DNA from multiple anonymous sources [15]. We then chose a clone tiling path of 26,720 overlapping clones across the genome, selected a sequence tiling path of directly adjacent, non-overlapping segments from consecutive clones and concatenated these segments to create a near-complete genome sequence. Contributions of the IHGSC centres to this finishing phase are shown in Table 1.

NATURE | VOL 431 | 21 OCTOBER 2004

Genome sequence

The human sequence reported here consists of 2,851,330,913 nucleotides, lying almost entirely within the euchromatic portion of the genome (Table 2).
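
This single sequence is the output of the tiling-path step described under Finishing process: segments from consecutive clones are trimmed so that they abut without overlapping, then joined. The Python sketch below illustrates only that idea, under simplifying assumptions that are ours rather than the consortium's: every finished clone is already placed on a shared chromosome coordinate axis (0-based, half-open), the clone path is sorted by start position, and each clone contributes bases from the end of the previous segment to its own end. The `Clone` class and the `sequence_tiling_path` and `concatenate` functions are hypothetical names, and the sketch does not check that overlapping clones agree base for base.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    """A finished large-insert clone placed on a chromosome axis (hypothetical model)."""
    name: str
    start: int  # 0-based chromosome coordinate of the clone's first base
    seq: str    # finished clone sequence

    @property
    def end(self) -> int:
        # Half-open interval [start, end)
        return self.start + len(self.seq)


def sequence_tiling_path(clone_path: list[Clone]) -> list[tuple[str, int, int]]:
    """Choose directly adjacent, non-overlapping segments from consecutive clones.

    Assumes clone_path is sorted by start. Returns (clone name, offset within the
    clone, segment length) triples. Overlap agreement is NOT verified here.
    """
    if not clone_path:
        return []
    segments = []
    covered_to = clone_path[0].start  # chromosome coordinate reached so far
    for clone in clone_path:
        if clone.start > covered_to:
            # Consecutive clones do not overlap: the path has a gap
            raise ValueError(f"gap before {clone.name}: {covered_to}..{clone.start}")
        if clone.end <= covered_to:
            continue  # clone lies entirely within sequence already taken
        offset = covered_to - clone.start  # switch point inside this clone
        segments.append((clone.name, offset, clone.end - covered_to))
        covered_to = clone.end
    return segments


def concatenate(clone_path: list[Clone]) -> str:
    """Join the tiling-path segments into one contiguous sequence."""
    by_name = {clone.name: clone for clone in clone_path}
    return "".join(
        by_name[name].seq[offset:offset + length]
        for name, offset, length in sequence_tiling_path(clone_path)
    )


if __name__ == "__main__":
    # Toy path: A covers [0, 10), B covers [6, 16), C covers [14, 20)
    path = [
        Clone("A", 0, "ACGTACGTAC"),
        Clone("B", 6, "GTACTTGGAA"),
        Clone("C", 14, "AACCGG"),
    ]
    print(sequence_tiling_path(path))  # [('A', 0, 10), ('B', 4, 6), ('C', 2, 4)]
    print(concatenate(path))           # ACGTACGTACTTGGAACCGG
```

Taking the switch point at the end of the previous clone is an arbitrary choice for the sketch; when overlapping clones agree, any switch point within the overlap yields the same concatenated sequence.
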
The sequence is interrupted by only 341 gaps, of which 33 gaps (totalling ~198 megabases (Mb)) reflect heterochromatin, which was not targeted by the HGP, and 308 gaps (totalling ~28 Mb) are euchromatic. The euchromatic genome is thus ~2.88 Gb and the overall human genome is ~3.08 Gb.

The long-range continuity of the current genome sequence is high by various measures (Table 3). The N50 length is 38.5 Mb and the N-average length is 40.9 Mb; these values are ~1,000-fold larger than the size of a typical human gene. (The first statistic is the length x such that at least 50% of nucleotides lie in a continuous segment of length ≥x, whereas the second
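
The size estimates and the N50 definition above can be reproduced with a short Python sketch. `genome_size_estimates` repeats the arithmetic behind the ~2.88 Gb and ~3.08 Gb figures from the rounded gap totals in the text, and treats every sequenced base as euchromatic, which the text says holds almost entirely. `n50` follows the stated definition directly: sort segment lengths in decreasing order and return the length at which the running total first reaches half of all nucleotides. Both function names and the toy segment lengths are hypothetical, and the sketch covers N50 only.

```python
def genome_size_estimates(sequenced_nt: int, euchromatic_gap_mb: float,
                          heterochromatic_gap_mb: float) -> dict[str, float]:
    """Derive euchromatic and total genome size from sequence length plus gap sizes.

    Simplifying assumption: all sequenced bases are treated as euchromatic.
    """
    sequenced_mb = sequenced_nt / 1e6
    euchromatic_mb = sequenced_mb + euchromatic_gap_mb  # sequence + euchromatic gaps
    total_mb = euchromatic_mb + heterochromatic_gap_mb  # + heterochromatic gaps
    return {
        "euchromatic_gb": euchromatic_mb / 1000,
        "total_gb": total_mb / 1000,
        "euchromatic_coverage": sequenced_mb / euchromatic_mb,
    }


def n50(segment_lengths: list[int]) -> int:
    """Largest length x such that segments of length >= x hold at least 50% of all bases."""
    if not segment_lengths:
        raise ValueError("need at least one segment")
    half = sum(segment_lengths) / 2
    running = 0
    for length in sorted(segment_lengths, reverse=True):
        running += length
        if running >= half:  # "at least 50%" of nucleotides reached
            return length
    raise RuntimeError("unreachable for non-negative lengths")


if __name__ == "__main__":
    est = genome_size_estimates(2_851_330_913, euchromatic_gap_mb=28, heterochromatic_gap_mb=198)
    print({key: round(value, 2) for key, value in est.items()})
    # {'euchromatic_gb': 2.88, 'total_gb': 3.08, 'euchromatic_coverage': 0.99}

    # Hypothetical segment lengths (Mb): 10 + 8 = 18 of 30 is the first running total >= 15
    print(n50([2, 10, 3, 8, 2, 5]))  # 8
```

The derived coverage of ~0.99 agrees with the ~99% euchromatic coverage stated earlier for Build 35.
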