    Biophysics Study PPT Lecture Notes (生物物理学习PPT教案.pptx)

    • Resource ID: 17440902       File size: 2.74 MB        Length: 61 pages
    • Format: PPTX

Gene Prediction

Ideal case / Real world

What is a gene?
Wilhelm Johannsen's definition of a gene:
The word "gene" was first used by Wilhelm Johannsen in 1909, based on the concept developed by Gregor Mendel in 1866.
"The special conditions, foundations and determiners which are present in the gametes (配子) in unique, separate and thereby independent ways by which many characteristics of the organism are specified."
Johannsen, W. (1909) Biol. Philos. 4: 303-329.

What is a gene?
Old concept: A gene is the basic physical and functional unit of heredity. Genes, which are made up of DNA, act as instructions to make molecules called proteins.
New definition: A gene is a locus (or region) of DNA that encodes a functional protein or RNA product, and is the molecular unit of heredity.

Gene Prediction
Gene prediction: to identify all genes in a genome.
[Figure: a long, unannotated run of genomic sequence (atgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgc...) in which one region is labelled "Gene"]
Gene prediction is the basis for functional studies.

Finding all genes in a genome could be hard
- Mammalian genomes are large: 8000 km of 10 pt type
- Only about 1% of the genome codes for proteins
- Non-coding RNAs are more difficult to predict

The structure of prokaryotic (原核生物的) genes
Promoter structure of prokaryotic (原核生物的) genes
The structure of eukaryotic (真核生物的) genes

Open Reading Frames (ORFs)
Protein-coding gene prediction detects potential coding regions by looking for ORFs.
Signals defining ORFs in eukaryotic genes:
- Start codon: ATG
- Stop codons: TAG, TGA, TAA
- Splicing donor sites: usually GT
- Splicing acceptor sites: usually AG
UTRs are usually defined according to expression evidence.

Types of exons

Six Frames in a DNA Sequence
DNA replication occurs in the 5′-to-3′ direction.

Codon usage selection in translation
Codon usage in mouse genome
Uneven usage of codons may characterize a real gene!

Eukaryotic ORF prediction
Signals defining ORFs in eukaryotic genes:
- Start codon: ATG
- Stop codons: TAG, TGA, TAA
- Splicing donor sites: usually GT
- Splicing acceptor sites: usually AG
- Coding frame
- Codon usage

Gene syntax rules
The common gene syntax rules for forward-strand genes:

Conceptual gene finding framework

Methods for Eukaryotic Gene Prediction
1. Ab initio method:
- Uses only genomic sequences as input
- GENSCAN (Burge 1997; Burge and Karlin 1997)
- Fgenesh (Solovyev and Salamov 1997)
- Able to predict novel genes
2. Transcript-alignment-based method:
- Uses cDNA, mRNA or protein similarity as major clues
- ENSEMBL (Birney et al. 2004)
- High accuracy
- Can only find genes with transcription evidence
3. Hybrid method:
- Integrates EST, cDNA, mRNA or protein alignments into the ab initio method
- Fgenesh+ (Solovyev and Salamov 1997)
- AUGUSTUS+ (Stanke, Schoffmann et al. 2006)
4. Comparative-genomics-based method:
- Assumes coding regions are more conserved
[Figure: Genome 1 compared with Genome 2]
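The ORF signals listed in these slides (start codon ATG, stop codons TAG, TGA and TAA) together with the six reading frames can be sketched as a small scanner. This is only an illustrative sketch, not how GENSCAN or any other tool named here works: it ignores splicing, codon usage and every other signal, and the function names and the `min_codons` parameter are made up for the example.

```python
# Minimal six-frame ORF scan (illustrative sketch, not a real gene finder).
START = "ATG"
STOPS = {"TAG", "TGA", "TAA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) of ATG...stop ORFs in one frame (0, 1 or 2) of seq."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START:
            start = i  # first ATG opens a candidate ORF
        elif start is not None and codon in STOPS:
            # Length is counted in codons, start and stop codons included.
            if (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def find_orfs(dna, min_codons=2):
    """Scan all six frames; return (strand, frame, start, end, orf_sequence).

    Coordinates on the "-" strand are relative to the reverse complement.
    """
    dna = dna.upper()
    results = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame, min_codons):
                results.append((strand, frame, start, end, seq[start:end]))
    return results
```

For example, `find_orfs("CCATGAAATAGGG")` returns `[("+", 2, 2, 11, "ATGAAATAG")]`: the only complete ORF lies in the third forward frame. A realistic gene finder would add the splice-site and codon-usage signals from the list above on top of such a scan.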
Methods for Eukaryotic Gene Prediction
4. Comparative-genomics-based method:
- Assumes coding regions are more conserved
- Able to predict novel genes and non-protein-coding genes
- Can use transcript data to improve prediction accuracy
- TWINSCAN and N-SCAN (do not use transcript similarity)
- TWINSCAN-EST and N-SCAN-EST (use transcript similarity)
Problems:
- Performance depends on the evolutionary distance between the compared sequences
- Exon/intron boundaries may not be conserved

About the ab initio gene prediction methods
Difficult to handle the following cases:
- Nested/overlapped genes
- Polycistronic genes
- Alternative splicing
- Frame-shift errors
- Split start codons
- Non-ATG triplet as the start codon
- Extremely short exons
- Extremely long introns
- Non-canonical introns
- UTR introns

Hidden Markov models are commonly used for gene prediction.

Hidden Markov Model (HMM)
- Markov Property
- Markov Chain
- Markov Model
- Hidden Markov Model

Markov Property
The Markov property simply states that, given the present state, future states are independent of the past.
Stochastic processes are generally considered as collections of random variables; a process has the Markov property when it satisfies this condition.

Markov Chain
A Markov chain is a system that we can use to predict the future given the present.
In a Markov chain, the present state depends on only two things:
- Previous state
- Probability of moving from previous state to present state

Markov Chain: to estimate the status of students
Suppose graduate students have two types of moods:
- Happy
- Depressed about research
Each type of student has its own Markov chain.
Finally, there are three locations where we can find the students:
- Lab
- Canteen
- Dorm

[Figure: Markov chain of happy students over the states Lab, Canteen, Dorm]
[Figure: Markov chain of depressed students over the states Lab, Canteen, Dorm]

Markov Chain Probability
The probability of observing a given sequence is equal to the probability of its first state multiplied by the product (乘积) of all observed transition probabilities.
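The product rule for a Markov chain can be sketched in a few lines. The start and transition probabilities below are hypothetical placeholders, since the values shown in the slide figures are not available in the text; only the three location names come from the slides.

```python
# Probability of a location sequence under a first-order Markov chain.
# All numbers are hypothetical placeholders, not values from the slides.
START = {"Lab": 0.5, "Canteen": 0.25, "Dorm": 0.25}
TRANS = {
    "Lab":     {"Lab": 0.5,  "Canteen": 0.25, "Dorm": 0.25},
    "Canteen": {"Lab": 0.5,  "Canteen": 0.0,  "Dorm": 0.5},
    "Dorm":    {"Lab": 0.25, "Canteen": 0.25, "Dorm": 0.5},
}


def sequence_probability(states, start=START, trans=TRANS):
    """Multiply the start probability by each observed transition probability."""
    if not states:
        raise ValueError("need at least one state")
    p = start[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p


print(sequence_probability(["Canteen", "Dorm", "Lab"]))  # 0.25 * 0.5 * 0.25
print(sequence_probability(["Canteen", "Lab"]))          # 0.25 * 0.5
```

With one such table per mood (happy or depressed), the same function gives the probability of an observed location sequence under each chain, which is the comparison the student example builds toward.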
P(Canteen → Dorm → Lab) = P(Canteen) P(Dorm | Canteen) P(Lab | Dorm)
P(Canteen → Lab) = P(Canteen) P(Lab | Canteen)

Markov Model
A Markov model is a stochastic model used to model a randomly changing system, where it is assumed that future states depend only on the present state.
[Figure: state diagrams over Lab, Canteen, Dorm]

Hidden Markov Model
Now we have general information about the relationship between student mood and location.
- Mood is hidden
If we simply observe the locations of a student, can we tell what mood he is in?
- Observations are the locations of the students
- Parameters of the model are the probabilities of a student being in a particular location

Hidden Markov Model (HMM)
Observations: LLLC LLLCD DCLLCLLDDDDLLCLLCD DL LDDDDC CDDDDDDDDLCLLLCCLLCLLLCCL
Hidden state: HHHHHHHHHHHHDDDDDDDDDHHHHHH

Using HMM to estimate student mood
[Figure: emission probabilities of the two mood states. One state: Lab 0.75, Canteen 0.2, Dorm 0.05. Other state: Lab 0.4, Canteen 0.2, Dorm 0.4.]

Application of HMM in gene prediction
What do we want?
- To find coding and non-coding regions in an unlabeled string of DNA sequence
Why are HMMs a good fit for gene prediction?
- DNA sequences are ordered, which is necessary for HMMs
- There is enough training data for what is a gene and what is not a gene
HMMs need to be trained to be truly effective.

HMMs for gene prediction

Cautions about HMMs
Need to be mindful of overfitting
- Need a good training set
- More training data does not always mean a better model
HMMs can be slow (need proper decoding)
- DNA sequences can be very long, so processing them can be very time-consuming
States are supposed to be independent of each other, and this is NOT always true!

Features for protein coding genes
Protein-coding genes have specific evolutionary constraints:
- Gaps between homologous genes are multiples of three (preserve amino acid translation)
- Mutations are mostly at synonymous positions
- Conservation boundaries are sharp (pinpoint individual splicing signals)

Dmel TGTTCATAAATAAA-TTTACAACAGTTAGCTG-GTTAGCCAGGCGGAGTGTCTGCGCCCATTACCGTGCGGACGAGCATGT-GGCTCCAGCATCTTC
Dsec TGTCCATAAATAAA-TTTACAACAGTTAGCTG-GTTAGCCAGGCGGAGTGTCTGCGCCCATTACCGTGCGGACGAGCATGT-GGCTCCAGCATCTTC
Dsim TGTCCATAAATAAA-TTTACAACAGTTAGCTG-GTTAGCCAGGCGGAGTGTCTGCGCCCATTACCGTGCGGACGAGCATGT-GGCTCCAGCATCTTC
Dyak TGTCCATAAATAAA-TTTACAACAGTTAGCTG-GTTAGCCAGGCGGAGTGCCTTCTACCATTACCGTGCGGACGAGCATGT-GGCTCCAGCATCTTC
Dere TGTCCATAAATAAA-TTTACAACAGTTAGCTG-CTTAGCCATGCGGAGTGCCTCCTGCCATTGCCGTGCGGGCGAGCATGT-GGCTCCAGCATCTTT
Dana TGTCCATAAATAAA-TCTACAACATTTAGCTG-GTTAGCCAGGCGGAGTGTCTGCGACCGTTCATG-CGGCCGTGA-GGCTCCATCATCTTA
Dpse TGTCCATAAATGAA-TTTACAACATTTAGCTG-CTTAGCCAGGCGGAATGGCGCCGTCCGTTCCCGTGCATACGCCCGTGG-GGCTCCATCATTTTC
Dper TGTCCATAAATGAA-TTTACAACATTTAGCTG-CTTAGCCAGGCGGAATGCCGCCGTCCGTTCCCGTGCATACGCCCGTGG-GGCTCCATTATTTTC
Dwil TGTTCATAAATGAA-TTTACAACACTTAACTGAGTTAGCCAAGCCGAGTGCCGCCGGCCATTAGTATGCAAACGACCATGG-GGTTCCATTATCTTC
Dmoj TGATTATAAACGTAATGCTTTTATAACAATTAGCTG-GTTAGCCAAGCCGAGTGGCGCC-TGCCGTGCGTACGCCCCTGTCCCGGCTCCATCAGCTTT
Dvir TGTTTATAAAATTAATTCTTTTAAAACAATTAGCTG-GTTAGCCAGGCGGAATGGCGCC-GTCCGTGCGTGCGGCTCTGGCCCGGCTCCATCAGCTTC
Dgri TGTCTATAAAAATAATTCTTTTATGACACTTAACTG-ATTAGCCAGGCAGAGTGTCGCC-TGCCATGGGCACGACCCTGGCCGGGTTCCATCAGCTTT
[Figure annotations: fully conserved columns marked with *; splice site labelled "Splice"]

Measure of prediction accuracy: exon level
[Figure: predicted exons compared with real exons, labelled wrong exon, correct exon and missing exon]
Sn (sensitivity, 灵敏度) = number of correct exons / number of actual exons
Sp (specificity, 特异性) = number of correct exons / number of predicted exons

Measure of prediction accuracy: nucleotide level
[Figure: prediction compared with reality along the sequence, with each nucleotide classed as TP, FP, FN or TN]
Sn (sensitivity) = TP / (TP + FN)
Sp (specificity) = TP / (TP + FP)
c: correct; nc: incorrect; TP: true positive; FP: false positive; FN: false negative; TN: true negative

Gene prediction software
Example of gene finders
Accuracy of Gene Prediction

Gene prediction in prokaryotes
Gene prediction is easier in microbial genomes.
Why?
- Smaller genomes
- Simpler gene structures
- More sequenced genomes! (for comparative approaches)
Methods?
- Previously, mostly HMM-based
- Now: similarity-based methods, because so many genomes are available

Summary
- Nothing is perfect
- Each gene identification approach has its own features and limitations
- Genome annotation is an on-going process, and accuracy is being improved along with the improvement of methods and the accumulation of evidence data
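The student-mood HMM from the Hidden Markov Model slides can be decoded with the Viterbi algorithm, which is one common way to recover the most probable hidden-state path; the slides do not say which decoding method they have in mind. In the sketch below, the two sets of emission probabilities are the ones shown on the "Using HMM to estimate student mood" slide, but which set belongs to which mood is an assumption, and the start and transition probabilities are invented for illustration.

```python
import math

STATES = ("Happy", "Depressed")
# Assumed: the slides give no start or mood-to-mood transition probabilities.
START = {"Happy": 0.5, "Depressed": 0.5}
TRANS = {
    "Happy":     {"Happy": 0.9, "Depressed": 0.1},
    "Depressed": {"Happy": 0.1, "Depressed": 0.9},
}
# Emission values from the slide (L = Lab, C = Canteen, D = Dorm);
# the assignment of each set to a mood is an assumption.
EMIT = {
    "Happy":     {"L": 0.75, "C": 0.2, "D": 0.05},
    "Depressed": {"L": 0.4,  "C": 0.2, "D": 0.4},
}


def viterbi(obs):
    """Return the most probable mood path for a string of L/C/D observations."""
    if not obs:
        return []
    # Log-probabilities avoid underflow on long observation strings.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for symbol in obs[1:]:
        prev_score, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev_score[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = (prev_score[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][symbol]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


print("".join(s[0] for s in viterbi("LLLLLLDDDD")))  # HHHHHHDDDD
```

Gene-finding HMMs follow the same pattern with states such as exon, intron and intergenic region and with nucleotides as observations, though real tools use much richer state structures than this two-state toy.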
