BMC Biotechnology
BioMed Central

Software (Open Access)

Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

Don Lorimer (1,2), Amy Raymond (1,2,3), John Walchli (3,4), Mark Mixon (3,4), Adrienne Barrow (1,3), Ellen Wallace (1,3), Rena Grice (1,3), Alex Burgin (1) and Lance Stewart* (1,2,3)

Address: (1) deCODE biostructures, Inc., 7869 NE Day Road West, Bainbridge Island, WA, 98110, USA; (2) Seattle Structural Genomics Center for Infectious Disease, Bainbridge Island, WA, 98110, USA; (3) Accelerated Technologies Center for Gene to 3D Structure, Bainbridge Island, WA, 98110, USA; and (4) Emerald BioSystems, Inc., 7869 NE Day Road West, Bainbridge Island, WA, 98110, USA

E-mail: Don Lorimer - dlorimerdecode; Amy Raymond - araymonddecode; John Walchli - jwalchlidecode; Mark Mixon - mmixondecode; Adrienne Barrow - abarrowdecode; Ellen Wallace - ewallacedecode; Rena Grice - rgricedecode; Alex Burgin - aburgindecode; Lance Stewart* - lstewartdecode
* Corresponding author

Published: 21 April 2009
Received: 16 October 2008
Accepted: 21 April 2009

BMC Biotechnology 2009, 9:36 doi: 10.1186/1472-6750-9-36
This article is available from: :/ biomedcentral /1472-6750/9/36

© 2009 Lorimer et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( :/creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences.
With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering.

Results: An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies.

Conclusion: We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies.

Background

Large-scale projects in genomic sequencing and protein structure determination are producing enormous quantities of data on the relationships between 2D gene sequence and 3D protein structure. Moreover, such efforts are providing experimental data on success factors at every step in the gene to structure research endeavor. Ideally, this wealth of information should be used in a feedback cycle to facilitate the design and production of genes and protein constructs that are optimized for the successful production of functional protein samples for structural studies. Fundamentally, this goal represents a bioinformatics software challenge. With the goal of improving yield and success rates of heterologous protein production for structural studies, we have developed Gene Composer, a database software package which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences.

The redundancy of the genetic code allows any given protein to be encoded by a very large number of possible synonymous gene sequences. On average, each amino acid can be encoded by approximately three different codons (61 amino acid codons/20 amino acids). For a typical 100 amino acid protein there would be 3^100 (5 × 10^47) different possible coding sequences. The degeneracy of the genetic code therefore allows the pressures of natural selection to simultaneously influence both DNA and RNA sequence features in addition to protein coding function. DNA sequence elements and folded RNA structures are known to play significant roles in gene expression. As such, the overlapping information contained in a gene sequence can be significantly more complex than coding for a linear amino acid sequence. For example, in the tryptophan operon of E. coli, the mRNA can fold into one of two mutually exclusive conformations that are a direct consequence of tryptophan availability [1]. These alternate conformations affect mRNA stability and therefore alter the expression of the encoded proteins. It is also well established that codon preferences between species, and often between gene families within a given species, can vary [2,3]. Therefore, some gene sequences may behave better than others in supporting high-level translation for heterologous protein expression. Being able to tailor synthetic gene sequences by codon engineering to favor optimal heterologous expression is a well established strategy for improving heterologous protein expression for structural biology [4].

Given the overlapping nature of information content in gene sequences (DNA, RNA, and protein level) we endeavored to create a database and software tool called Gene Composer which facilitates the design of protein constructs, guided by 2D and 3D information, while the corresponding nucleic acid sequences are engineered for both codon usage and other desired sequence features. Gene Composer also enables the virtual cloning of the designed gene constructs which, depending on user preferences, can be parsed into data files for online ordering of complete genes or overlapping oligonucleotides that can be used for PCR-based gene assembly in any standard molecular biology lab. Gene Composer operates within the Windows® operating system and utilizes a network based SQL server or Access® database that is populated by users as they design genes. This arrangement makes it possible for multiple users to go back after time to design new construct variants that improve on existing designs by inclusion of new sequence or structural information from international genome sequencing and structural genomics efforts. In this report we describe how the synthetic gene design modules of Gene Composer facilitate protein construct engineering for structural studies, codon engineering for heterologous protein production, and oligonucleotide planning for PCR-based gene assembly with mismatch endonuclease error correction.

Implementation and results

Gene Composer Software

Gene Composer has a modular design to facilitate the work of protein engineers and structural biologists. It combines, within a single database software product, the ability to carry out comparative sequence alignments (Alignment Viewer) that facilitate interactive protein construct design with virtual cloning (Construct Design Module), followed by codon engineering of novel synthetic gene sequences that are optimized for protein expression in various recombinant systems (Gene Design Module). Gene Composer is written in C++ for Windows® operating systems, and runs together with either an Access® or SQL® database.

Alignment Viewer

A typical gene design cycle is initiated when a user defines a protein "target name" and a "project name", which establish the key database identifiers with which all subsequent Gene Composer workflow is associated. Once these identifiers have been established, the user is presented with a file navigation interface that allows one to import information into the Gene Composer database from multiple sources such as FASTA sequence files from BLAST [5] searches, existing sequence alignments, simple text (.txt) files, and structure files from the Protein Data Bank (PDB, :/ rcsb.org/pdb).

From this imported information, Gene Composer uses the popular ClustalW algorithm [6,7] to calculate comparative protein sequence alignments, which are presented in a distilled format within the interactive Alignment Viewer (Figure 1).
This Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, residue property type and several other useful property views that are used to guide interactive decision making for protein construct design.

Areas of sequence conservation are highlighted within the Alignment Viewer according to a user defined color scheme and a consensus sequence is displayed below the alignment. Protein secondary structure information is extracted from PDB files and displayed in common graphic annotation underneath their associated linear amino acid sequences. Importantly, the Alignment Viewer presents both the "chain" sequence for the protein that went into crystallization as well as the experimentally refined "model" sequence from the PDB coordinate file. This allows the user to easily visualize which amino acid residues had no structural information reported in the PDB file, displayed as blank gaps in the "model" sequence. Such residues are usually located within highly flexible regions of the protein and do not contribute to X-ray diffraction.

Users can define a threshold contact distance setting (default setting is 3.4 Angstroms) which Gene Composer uses to generate a simple distance matrix between non-bonded, non-hydrogen atom centers in PDB files. The resulting matrix is used to flag residues in the protein model that participate in ligand contacts, water contacts, and/or crystal contacts. Each contact type is annotated within the Alignment Viewer with special visual symbols displayed below the residue of interest (Figure 1). Crystal contacts are indicated when non-hydrogen atoms of a residue are positioned within 4.0 Angstroms of a neighboring molecule related by crystallographic or non-crystallographic symmetry. Gene Composer has a database of all protein crystal space groups required for the crystal contact calculation. The ability to visualize residues involved in crystal contacts helps the user to identify residues that could be mutated to improve crystal growth [8]. Gene Composer also calculates from PDB file information the relative solvent accessible Connolly surface area [9] and thermal B-factors for residues, which are displayed with relative color intensities to provide a visual representation of the surface location and mobility of amino acids. This facilitates the visual identification of surface residues that are candidates for surface entropy reduction mutagenesis, which is commonly used to aid protein crystallization [10]. The alignments can also be easily modified and annotated with the aid of an interactive cursor that allows the user to insert or delete sequences, residues, or spaces.

Importantly, the native amino acid residue numbering scheme is preserved throughout any alignment manipulations. However, since the amino acid sequence numbering scheme of PDB files is often not congruent with the residue numbering of native full length gene sequences, users can select a given sequence and re-define the starting residue number. In this way, users can arrive at a common residue numbering scheme to help ensure accuracy in subsequent construct design. Finally, the information rich alignments can be saved in the Gene Composer database and exported to several other formats including *.aln, *.xml, and *.pdf for facile data sharing between researchers.

Protein Construct Design and Automated Cloning

The Construct Design Module works in concert with the Alignment Viewer, allowing the researcher to interactively define novel protein constructs with altered amino- and carboxy-termini, internal insertions or deletions, point mutations, and added affinity tags. The construct design tools are connected to the Alignment Viewer by a cursor that shows the user exactly where in the sequence alignment the desired changes are being made. For example, the user can set the cursor within the Alignment Viewer at a domain boundary as visualized in the comparative sequence alignment and then truncate the construct at that site. The desired modifications can be virtually combined and permuted in silico to arrive at multiple desired protein constructs (Figure 2).

The user can also add a variety of adaptor assemblies at the DNA sequence level to facilitate the virtual and physical cloning of the constructs into multiple defined expression vectors (Figure 3). Importantly, the Gene Composer virtual in silico cloning utility manages the inserts, vectors, and adaptor assemblies as three independent informatics components that are combined by the user to arrive at final vector clones [11]. After the virtual cloning is completed, the user can inspect the entire vector with its adaptor assemblies and protein construct inserts. In this way, the user can see exactly how open reading frames are constructed and then easily fix any virtual cloning errors before wet lab work is performed. Many expression vectors come with their own N-terminal or C-terminal affinity tags that must be accurately fused in frame with the protein construct. Visual inspection of the virtual clone ensures that the open reading frame formed by the vector/adaptor/insert combination is intact and accurate.

In order to automate the physical cloning of designed constructs us
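
The contact flagging described for the Alignment Viewer (a distance threshold applied between non-hydrogen atom centers) can be illustrated with a minimal Python sketch. This is not Gene Composer's code: the `Atom` record, the `flag_contacts` function, the residue labels, and the coordinates are all hypothetical, chosen only to show the idea. PDB parsing, the exclusion of bonded atom pairs, and the symmetry operations needed for crystal contacts are deliberately left out.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record (not a Gene Composer data type)."""
    residue: str                      # residue label, e.g. "ASP25"
    element: str                      # element symbol, e.g. "C", "O", "H"
    xyz: Tuple[float, float, float]   # coordinates in Angstroms


def flag_contacts(protein: List[Atom], partners: List[Atom],
                  threshold: float = 3.4) -> List[str]:
    """Return sorted labels of protein residues that have at least one
    non-hydrogen atom within `threshold` Angstroms of a non-hydrogen
    partner atom (for example a ligand or a water molecule)."""
    heavy_protein = [a for a in protein if a.element != "H"]
    heavy_partners = [a for a in partners if a.element != "H"]
    flagged = set()
    for p in heavy_protein:
        # One row of the distance matrix: this protein atom vs. all partners.
        if any(math.dist(p.xyz, q.xyz) <= threshold for q in heavy_partners):
            flagged.add(p.residue)
    return sorted(flagged)


# Made-up coordinates for illustration only.
protein = [
    Atom("ASP25", "O", (0.0, 0.0, 0.0)),
    Atom("ASP25", "H", (0.5, 0.0, 0.0)),
    Atom("GLY48", "C", (10.0, 0.0, 0.0)),
]
water = [Atom("HOH301", "O", (2.8, 0.0, 0.0))]
ligand = [
    Atom("LIG1", "N", (10.0, 3.0, 0.0)),
    Atom("LIG1", "H", (0.6, 0.0, 0.0)),   # hydrogen, ignored by the filter
]

print(flag_contacts(protein, water))    # ['ASP25']
print(flag_contacts(protein, ligand))   # ['GLY48']
```

The default of 3.4 Angstroms mirrors the default setting quoted in the text; passing `threshold=4.0` would correspond to the crystal contact cutoff, provided the partner atoms had first been generated from symmetry-related molecules, which this sketch does not attempt.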
