
Supplementary Material
Current Metabolomics, 2013, Vol. 1, No. 2

Metabolomic Univariate & Multivariate Analysis (muma)
TUTORIAL

TABLE OF CONTENTS
muma overview
Functions list
Download and Installation
Dataset format
Analysis procedure
1| Create the working directory
2| Start the analysis
3| Principal Component Analysis Score and Loading plots
4| Univariate Analysis
5| Merge univariate and multivariate information
6| Partial Least Square Discriminant Analysis (PLS-DA)
7| Orthogonal Projection to Latent Structures - Discriminant Analysis (OPLS-DA)
8| Tools for NMR molecular assignment and data interpretation
A| Statistical TOtal Correlation SpectroscopY (STOCSY)
B| STOCSY 1D
C| Orthogonal Signal Correction (OSC) STOCSY
D| Ratio Analysis NMR SpectroscopY (RANSY)
References

muma overview
muma is a tool for the multivariate and univariate statistical analysis of metabolomic data, written as an add-on package for the open-source software R. With this statistical protocol we want to provide guidelines for the whole process of metabolomic data interpretation: from data pre-processing, to dataset exploration and visualization, to the identification of potentially interesting variables (or metabolites). To this end, we implemented the steps typically used in metabolomic analyses and added some new features that make the user's work easier. muma is designed for people who are not R experts but want to perform statistical analysis quickly and with reliable results.

Although muma has been designed for the analysis of metabolomic data generated with different analytical platforms (NMR, MS, NIR, etc.), it provides specific methods supporting NMR-based metabolomics. In particular, muma is equipped with two tools (STOCSY and RANSY) that aid the identification and assignment of molecules present in NMR spectra, or suggest possible biochemical interactions between different molecules.

In this tutorial we provide a workflow for metabolomic data interpretation using muma, covering everything from installation, to the specific usage of all of muma's functions, to the retrieval of all the results generated. Enjoy.

Functions list
work.dir(): Generate a working directory in which all the generated files are stored.
explore.data(): Perform data pre-processing (normalization, scaling) and data exploration through PCA.
Plot.pca(): Plot the PCA Score and Loading plots for specified principal components.
plsda(): Perform PLS-DA.
univariate(): Perform an array of univariate statistical techniques.
Plot.plsda(): Plot the PLS-DA Score and w*c plots for specified components.
oplsda(): Perform OPLS-DA.
stocsy(): Perform STOCSY analysis.
stocsy.1d(): Perform monodimensional STOCSY analysis.
ostocsy(): Perform STOCSY analysis on the OSC-filtered dataset.
ransy(): Perform RANSY analysis.

Download and Installation
First of all, download R (version 2.15 or higher) from CRAN (www.r-project.org), according to your operating system (Unix, MacOS or Windows). Install R as indicated in the R manual.

You can open R through its graphical interface or from the command line: shell (Unix), Terminal (MacOS) or DOS (Windows).

After you have installed and launched R, you can install the muma package (Figure 1) by typing the command

install.packages("muma")

and choosing your CRAN mirror from the list that appears (Figure 1).

FIGURE 1

Installation may fail because of differences between R versions. In this case it should be sufficient to install the following packages before installing muma:

install.packages("mvtnorm")
install.packages("robustbase")
install.packages("gtools")
install.packages("bitops")
install.packages("caTools")

and then run the command

install.packages("muma")

Once muma is installed, you can load the package by typing library(muma).

Dataset format
The data table of interest has to be submitted in .csv format, with the specific layout shown in Figure 2.

FIGURE 2

In particular:
- the first column contains the name of each sample (NOTE: the names must all be different, even for samples belonging to the same class; moreover, short names (4-5 characters) are recommended for optimal graphical visualization);
- the second column contains the “Class” of each sample, as a positive integer starting from 1;
- columns 3 to N contain the data values of each sample for each variable;
- the first row is treated as the header and must contain the variable names, each different from the others.

The dataset in Figure 2 is provided with this tutorial and derives from a metabolomic analysis of B cell cultures, either untreated or after one, two, three and four days of LPS treatment (Garcia-Manteiga et al, 2011). As can be seen in Figure 2, the “Class” column is filled according to the day of treatment.

Analysis procedure
To start the analysis, move to the directory in which you have stored your data table by selecting the option Change Working Directory from the Misc menu of the R Console. If you are not using the R Console but prefer to launch R from the command line, navigate to the directory in which your data table is stored, then launch R with the command R.

1| Create the working directory
Before starting the analysis, it is recommended to create a new directory that will serve as the working directory from then on, because muma generates several files and directories that are best kept in a single place. All the results created by muma's analyses will be stored here. You can use the function

work.dir(dir.name = "WorkDir")

to create a new working directory, as shown in Figure 3.

FIGURE 3

As can be seen, a directory called “WorkDir” has been created. All the files present in the starting directory are copied into the newly generated one, which automatically becomes the current working directory.

2| Start the analysis
The first step of a muma analysis is performed with the function explore.data(), which provides data pre-processing and dataset exploration. Figure 4 shows an example of its usage. In particular, it can be called in the following way:

explore.data(file = "YourFile.csv", scaling = "ScalingType", scal = TRUE, normalize = TRUE, imputation = FALSE, imput = "ImputType")

This function generates three new directories:
- “Groups”, which stores the samples of each group, as identified by the “Class” column of the data table;
- “PCA_Data_scalingused”, which stores the principal component analysis files, such as the matrices of score and loading values, as well as all PCA-related plots and graphics. Note: the name of this directory changes according to the scaling used;
- “Preprocessing_Data_scalingused”, which stores all the files used for preprocessing the dataset, such as the normalized and scaled tables. Note: the name of this directory changes according to the scaling used.

FIGURE 4

A| In particular, this function reads the data table and converts all negative values to 0: negative metabolomic measurements are considered noise or errors and are therefore brought to a null baseline. A table called “NegativeValues.out”, reporting the negative values found, is written and saved in the directory “Preprocessing_Data_scalingused”.

It is also possible to input a data table containing missing values. The field “imputation” is FALSE by default; setting it to TRUE substitutes the missing values according to a specified option. There are four imputation options, which can be specified in the field “imput”:
- mean: missing values are imputed with the average of the other observations;
- minimum: missing values are imputed with the minimum value among the other observations;
- half.minimum: missing values are imputed with half of the minimum value among the other observations;
- zero: missing values are imputed with zero.

Reports on which values are imputed are printed to screen, and a file called “ImputedMatrix.csv”, reporting the matrix with imputed values, is written and saved in the directory “Preprocessing_Data_scalingused”.

Moreover, a check on the proportion of missing values in each variable has been implemented: when a variable has more than 80% missing values, it is eliminated as non-informative. Warnings listing the eliminated variables are reported at the end of the function.

B| The function then normalizes each sample on its total spectrum: the sum of all variables within a spectrum is calculated and each spectrum is divided by this value, so that every variable is expressed as a fraction of the total spectral area or intensity. A table called “ProcessedTable.csv”, reporting the normalized values, is written and saved in the directory “Preprocessing_Data_scalingused”.

As this step can influence the outcome of subsequent analyses, normalization can be skipped by setting the field “normalize” to FALSE. This option is meant for data tables that are already normalized or that do not require normalization.

The function automatically performs centering and scaling of each variable, according to the scaling type specified. The user can choose among five scaling options:
- pareto scaling
- auto scaling
- vast scaling
- range scaling
- median scaling

These options are not case sensitive and can also be given by their initial letter, so you can use, for example, either “Pareto” or “pareto”, as well as “P” or “p”:

explore.data(file = "MetaBc.csv", scaling = "pareto")
or
explore.data(file = "MetaBc.csv", scaling = "p")

A table called “ProcessedTable.csv”, reporting the scaled values, is written and saved in the directory “Preprocessing_Data_scalingused”.

As with normalization, the scaling step can influence subsequent analyses, so it can be skipped by setting the field scal to FALSE; in this case the field scaling can, of course, be omitted.

THEORY: Centering and scaling
The theory sections provided within this tutorial are meant to introduce the user to the theory behind the proposed data treatment techniques. Far from claiming to be a thorough description of the statistics applied here, these sections aim to explain the basic concepts muma's tools rely on and to help the user exploit those tools in the most suitable way.

CENTERING: converts the variables from fluctuations around the mean into fluctuations around zero. It removes the difference in offset between high- and low-abundance metabolites.
Disadvantages: may not be sufficient with heteroscedastic data.

Scalings
AUTOSCALING: also called unit variance scaling, it uses the standard deviation as the scaling factor. After this procedure each variable has a standard deviation of one, so all variables become equally important.
Disadvantages: inflation of the measurement errors; when applied before PCA, it can make the interpretation of the loading plots difficult, as a large number of metabolites will have high loading values.

PARETO: similar to autoscaling, but the square root of the standard deviation is used as the scaling factor. Highly varying metabolites are reduced more than slightly varying ones, and the data stay closer to the original measurements than with autoscaling.
Disadvantages: sensitive to large fold changes.

VAST: an extension of autoscaling that focuses on stable variables, i.e. variables that change less. It uses the standard deviation and the coefficient of variation as scaling factors, so metabolites with a small relative standard deviation become more important.
Disadvantages: not suited for large induced variation without group structure.

RANGE: uses the value range as the scaling factor. Metabolites are compared relative to the induced biological response.
Disadvantages: inflation of the measurement errors and sensitivity to outliers.

MEDIAN: also called central tendency scaling, this operation makes the median of each sample equal. It is used when only a few metabolites are expected to change, but non-biological, sample-dependent factors may influence data interpretation.
Disadvantages: low reliability with datasets having a high proportion of responding variables.

For a more complete introduction to scalings and other data pretreatment techniques used in metabolomics, please refer to van den Berg et al (2006).

C| Principal Component Analysis (PCA) is performed on the normalized/scaled table, and the function returns the score plots of every pairwise comparison of the first ten principal components (when at least ten principal components are available; otherwise all pairwise comparisons of components are plotted) (Figure 5, right panel). Together with these, a scree plot (Figure 5, left panel) is created to show the user the importance of each principal component.

These plots are displayed on screen and automatically saved in the directory “PCA_Data_scalingused”, with the names “First_10_Components_scalingused” and “Screeplot_scalingused”, respectively.

FIGURE 5 (left panel: Screeplot, Proportion of Variance explained vs. Principal Components; right panel: pairwise score plots)

D| To help the user choose the “best” pair of principal components to visualize, a specific tool has been implemented that calculates the statistical significance of the cluster separation (Goodpaster et al, 2011) obtained with each pair of principal components. In other words, groups are more or less separated from each other depending on the pair of principal components considered, and this cluster separation is tested for statistical significance. A ranking of the five best-separating pairs of principal components is printed to screen, reporting the numbers of the principal components, the p-value calculated from the F statistic and the proportion of variance explained by each pair of components (Figure 6). The p-value shown is the sum of all the p-values from each cluster separation statistic: the lower the p-value, the better the separation. Thanks to this ranking, one can choose the best pair of PCs according to both their “separation capacity” and the proportion of variance explained.

Two files deriving from this technique are saved in the directory “PCA_Data_scalingused”: one listing the F statistic values for each pair of components, named “PCs_Fstatistic.out”, and one ranking all the co
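The cluster-separation ranking of step D can be approximated in base R to see how such a ranking is built. The sketch below is not muma's code: hotelling_f() and rank_pc_pairs() are hypothetical helpers, and we assume (in the spirit of Goodpaster et al, 2011) a two-sample Hotelling T² test on the 2-D scores of each pair of groups, converted to an F statistic, with the p-values of all group pairs summed for every pair of PCs.

```r
# Hypothetical helper (not part of muma): two-sample Hotelling T^2 test
# between two groups of scores, converted to an F statistic and p-value.
hotelling_f <- function(x1, x2) {
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d <- colMeans(x1) - colMeans(x2)
  # pooled within-group covariance matrix
  s_pooled <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  t2 <- (n1 * n2 / (n1 + n2)) * drop(crossprod(d, solve(s_pooled, d)))
  f_stat <- (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
  c(F = f_stat, p = pf(f_stat, p, n1 + n2 - p - 1, lower.tail = FALSE))
}

# Hypothetical helper: for every pair of PCs, sum the p-values of all
# pairwise group separations; a lower sum means better separation.
rank_pc_pairs <- function(scores, classes, n_pcs = min(10, ncol(scores)), top = 5) {
  pc_pairs <- combn(n_pcs, 2)
  groups <- sort(unique(classes))
  group_pairs <- combn(length(groups), 2)
  p_sum <- apply(pc_pairs, 2, function(pcs) {
    sub <- scores[, pcs, drop = FALSE]
    sum(apply(group_pairs, 2, function(g) {
      hotelling_f(sub[classes == groups[g[1]], , drop = FALSE],
                  sub[classes == groups[g[2]], , drop = FALSE])[["p"]]
    }))
  })
  res <- data.frame(PCx = pc_pairs[1, ], PCy = pc_pairs[2, ], p_sum = p_sum)
  head(res[order(res$p_sum), ], top)
}

# Usage on simulated data: two classes, four variables
set.seed(1)
x <- rbind(matrix(rnorm(40), 10, 4), matrix(rnorm(40, mean = 5), 10, 4))
cls <- rep(1:2, each = 10)
pca <- prcomp(x)                            # centered PCA
prop_var <- pca$sdev^2 / sum(pca$sdev^2)    # values a scree plot would show
top <- rank_pc_pairs(pca$x, cls, n_pcs = 4, top = 3)
print(top)
```

Here the classes differ along the direction captured by PC1, so the best-ranked pairs all include PC1. muma's own statistic and file output may differ in their details.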

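The pre-processing that explore.data() performs in steps A and B (negative values set to zero, removal of variables with more than 80% missing values, imputation and total-area normalization) can be sketched in base R as follows. preprocess_sketch() is a hypothetical helper, not part of muma; it assumes a numeric matrix with samples in rows and variables in columns (sample names and the Class column already removed), assumes that imputation is done separately for each variable, and removes sparse variables before imputing.

```r
# Hypothetical helper (not muma's code) mimicking steps A and B of explore.data().
# X: numeric matrix, samples in rows, variables in columns.
preprocess_sketch <- function(X, imput = c("mean", "minimum", "half.minimum", "zero"),
                              max_missing = 0.8, normalize = TRUE) {
  imput <- match.arg(imput)
  # A| negative values are treated as noise and brought to zero
  X[!is.na(X) & X < 0] <- 0
  # drop variables whose proportion of missing values exceeds max_missing
  keep <- colMeans(is.na(X)) <= max_missing
  if (any(!keep)) {
    warning("eliminated variables: ", paste(colnames(X)[!keep], collapse = ", "))
  }
  X <- X[, keep, drop = FALSE]
  # impute the remaining missing values variable by variable (assumption)
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    if (!any(miss)) next
    obs <- X[!miss, j]
    X[miss, j] <- switch(imput,
                         mean = mean(obs),
                         minimum = min(obs),
                         half.minimum = min(obs) / 2,
                         zero = 0)
  }
  # B| total-area normalization: each sample becomes fractions of its row sum
  if (normalize) X <- X / rowSums(X)
  X
}

# Usage on a toy table with one negative value and two missing values
X <- matrix(c(1, -2, 3,
              NA, 2, 2,
              4, 0, NA), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("S", 1:3), c("v1", "v2", "v3")))
P <- preprocess_sketch(X, imput = "minimum")
print(round(P, 3))
```

After normalization every row sums to one, which is what "a fraction of the total spectral area" means in practice.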
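Centering and the variable-wise scalings described in the THEORY section take only a few lines of base R, using the formulas of van den Berg et al (2006). scale_sketch() is a hypothetical helper and not muma's implementation; like explore.data(), it accepts the scaling name in any case or as its initial letter. Median scaling is left out because its exact definition in muma is not given in the text above.

```r
# Hypothetical helper (not muma's code): column-wise centering followed by
# the scaling chosen via its name or initial letter (case insensitive).
scale_sketch <- function(X, scaling = "pareto") {
  key <- tolower(substr(scaling, 1, 1))   # "Pareto", "pareto", "P", "p" -> "p"
  mu <- colMeans(X)
  s <- apply(X, 2, sd)
  Xc <- sweep(X, 2, mu)                   # centering: fluctuations around zero
  switch(key,
    a = sweep(Xc, 2, s, "/"),                                        # auto: unit variance
    p = sweep(Xc, 2, sqrt(s), "/"),                                  # pareto: sqrt of sd
    v = sweep(sweep(Xc, 2, s, "/"), 2, mu / s, "*"),                 # vast: autoscaled * (mean / sd)
    r = sweep(Xc, 2, apply(X, 2, function(v) diff(range(v))), "/"),  # range: max - min
    stop("unsupported scaling in this sketch: ", scaling))
}

# Usage: the same table scaled in two ways
M <- matrix(c(1, 2, 3, 4, 2, 4, 6, 8), ncol = 2)
print(scale_sketch(M, "Auto"))
print(scale_sketch(M, "p"))
```

Comparing the two outputs shows the trade-off discussed above: autoscaling gives every variable unit standard deviation, while Pareto scaling leaves each variable with a standard deviation equal to the square root of its original one, so the data stay closer to the measured values.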