Angiosperm Phylogeny
Slide 1 of 22

I. Background
II. Materials and Methods
III. Results
IV. Discussion

I. Background

• Premise of the study: Recent analyses employing up to five genes have provided numerous insights into angiosperm phylogeny, but many relationships have remained unresolved or poorly supported. In the hope of improving our understanding of angiosperm phylogeny, we expanded sampling of taxa and genes beyond previous analyses.
• Soltis et al. (1999, 2000): the 567-taxon, three-gene (rbcL, atpB, and 18S rDNA) data set.

• Hilu et al. (2003) conducted a broad analysis of angiosperms based on matK sequences, with results that agreed closely with the three-gene topology.
• Davies et al. (2004) constructed a supertree for angiosperms.
• Soltis et al. (2007) undertook a Bayesian analysis of the 567-taxon, three-gene data set and obtained a topology nearly identical to that obtained with parsimony.
• Bell et al. (2010) used a Bayesian relaxed clock model to analyze this same three-gene data set and found results similar to Soltis et al. (2007).

• Burleigh et al. (2009) used five genes for the same 567 taxa analyzed in Soltis et al.
• The five-gene matrix had significantly more missing data (27.5%) than the three-gene matrix (2.9%), but the five-gene analysis resulted in higher levels of bootstrap support across the tree.

The value of constructing data sets of many genes
Earlier many-gene studies:
• employed nearly complete plastid genome sequence data (e.g., Leebens-Mack et al., 2005),
• were limited in sampling to fewer than 100 taxa, or
• were based on many genes but focused only on major angiosperm clades.
The value: with very large amounts of data (i.e., 13 to 83 genes), many, if not most, deep-level questions of angiosperm phylogeny can be resolved (e.g., Schönenberger et al., 2005; Jian et al., 2008; H. Wang et al., 2009; Brockington et al., 2010; Tank and Donoghue, 2010).

The goal: assemble a data set having both broad taxonomic coverage and numerous genes.
1. The three- and five-gene analyses of 567 taxa have broad taxonomic coverage, but support for many portions of the framework of angiosperm phylogeny is low in these studies.
2. Conversely, studies employing complete plastid genome sequences have deep gene coverage and strong internal support, but taxonomic coverage is often sparse.

In this study:
1. Using genes that represent all three plant genomic compartments: nucleus, plastid, and mitochondrion.
2. Constructing a 17-gene data set for 640 species representing 640 genera, 330 families, and 58 of 59 orders (sensu APG III, 2009).

II. Materials and Methods

1. DNA samples were extracted from either fresh or silica-dried material following the general method of Doyle and Doyle (1987) or modifications thereof that employ liquid nitrogen and higher CTAB concentrations (e.g., Soltis et al., 1991; Sytsma, 1994).

Multiple species
2. The same species and DNA samples were used across all of the genes analyzed here, although multiple species were sometimes used as necessary placeholders to reduce missing data. Many of these DNA samples have been used in earlier analyses (e.g., Chase et al., 1993; Soltis et al., 2000).

The outgroup
3. Extant gymnosperms: Cycas, Ginkgo, Gnetum, Metasequoia, Pinus, Podocarpus, Welwitschia, and Zamia.

Two data sets constructed
• Given the potential problems inherent in phylogeny reconstruction using mtDNA sequences, a second data set was constructed without the mtDNA data, resulting in a matrix of 13 genes.

Selection of taxon sampling
• Poorly resolved clades (e.g., Malpighiales, Saxifragales) were targeted for denser taxon sampling.
• Parasitic clades were avoided because they can create analytical problems due to gene loss, accelerated molecular evolution, and horizontal gene transfer.

The 17 genes
• From the nuclear genome: 18S and 26S rDNA.
• From the plastid genome: atpB, matK, ndhF, psbBTNH (four contiguous genes here treated as one region), rbcL, rpoC2, rps16, and rps4.
• From the mitochondrial genome: atp1, matR, nad5, and rps3.

• The 17-gene matrix was 25,260 bp and represents all three plant genomes.
• The 13-gene matrix was 19,846 bp and represents only the nucleus and plastid.
• The percentage of missing data was 41% for the full data set and 42% for the data set without mtDNA data.

Alignment and phylogenetic analyses
• All sequence data were stored and managed in TOLKIN, a web application for distance collaboration that is part of the Angiosperm Tree of Life project. It allows users to access and share data in real time, automatically generate FASTA files, and link to other relevant information and resources.
• Alignments of combined sequences were generated with the program MAFFT at the DNA level using the L-INS-i algorithm and default alignment parameters.

• Adjustments were made by eye when there were obvious alignment errors due to particularly divergent or "gappy" sequences.

Data sites:
1. The individual gene regions varied in the amount of missing data per site: 18S rDNA (6%), 26S rDNA (15%), atpB (5%), atp1 (1%), matK (13%), matR (3%), nad5 (4%), ndhF (20%), psbBTNH (19%), rbcL (4%), rpoC2 (21%), rps16 (26%), rps3 (10%), and rps4 (50%).
2. Individual gene regions also varied in the number of taxa with data in the combined analyses: 18S rDNA (78%), 26S rDNA (57%), atpB (88%), atp1 (59%), matK (92%), matR (76%), nad5 (59%), ndhF (80%), psbBTNH (54%), rbcL (98%), rpoC2 (63%), rps16 (35%), rps3 (62%), and rps4 (58%).

3. Sites with more than 50% missing data were removed with the program Phyutility to avoid regions of potentially problematic ambiguous alignment caused by such broad sampling.

Analyses: RAxML and MP
1. Phylogenetic analyses using maximum likelihood (Felsenstein, 1973) were conducted in the program RAxML.
2. Maximum parsimony (MP).

III. Results

• Each ML analysis of the 17- and 13-gene data sets took 20 to 32 h on a 32-core (2.93 GHz Xeon 7350) machine with 128 GB RAM, and analyses of individual genes took 1 to 11 h.
• The best RAxML trees from analyses of the 17-gene and 13-gene matrices are very similar.
• Apart from Polyosma, the differences are relatively minor, and in most cases the 17-gene tree gives higher BS support than the 13-gene tree. Hence, only the 17-gene tree is discussed below.

IV. Discussion
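
Supplementary sketch for Materials and Methods: the site-filtering step (removing alignment sites with more than 50% missing data) can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not Phyutility's actual implementation: the function names, the set of characters treated as missing ("-", "?", "N"), and the toy sequences are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

# Assumption: these characters count as "missing"; Phyutility's own rules may differ.
MISSING = set("-?Nn")


def missing_fraction_per_site(alignment: Dict[str, str]) -> List[float]:
    """Return, for each alignment column, the fraction of taxa lacking data."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [
        sum(s[i] in MISSING for s in seqs) / len(seqs)
        for i in range(length)
    ]


def drop_sparse_sites(alignment: Dict[str, str], threshold: float = 0.5) -> Dict[str, str]:
    """Keep only columns whose missing fraction does not exceed the threshold."""
    fractions = missing_fraction_per_site(alignment)
    keep = [i for i, f in enumerate(fractions) if f <= threshold]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical toy alignment: the sequences are invented for illustration only.
toy = {
    "Amborella": "ATG-CA",
    "Nymphaea":  "AT--CA",
    "Ginkgo":    "A?G-CT",
    "Pinus":     "ATG-?T",
}
filtered = drop_sparse_sites(toy)
```

In the toy alignment, only the fourth column is missing in more than half of the taxa, so it is the only column dropped. Columns with isolated gaps are retained.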