Supplementary Material
Current Metabolomics, 2013, Vol. 1, No. 2

Metabolomic Univariate & Multivariate Analysis (muma)
TUTORIAL

TABLE OF CONTENTS

muma overview
Functions list
Download and Installation
Dataset format
Analysis procedure
1| Create the working directory
2| Start the analysis
3| Principal Component Analysis Score and Loading plots
4| Univariate Analysis
5| Merge univariate and multivariate information
6| Partial Least Squares Discriminant Analysis (PLS-DA)
7| Orthogonal Projection to Latent Structures - Discriminant Analysis (OPLS-DA)
8| Tools for NMR molecular assignment and data interpretation
A| Statistical TOtal Correlation SpectroscopY (STOCSY)
B| STOCSY 1D
C| Orthogonal Signal Correction (OSC) STOCSY
D| Ratio Analysis NMR SpectroscopY (RANSY)
References

muma overview

muma is a tool for the multivariate and univariate statistical analysis of metabolomic data, written in the form of an add-on package for the open source software R. By creating this statistical protocol we wanted to provide guidelines for the whole process of metabolomic data interpretation, from data pre-processing, to dataset exploration and visualization, to the identification of potentially interesting variables (or metabolites). To do so, we implemented the steps that are typically used in metabolomic analyses and added some new tips that can facilitate the user's work. In fact, muma is designed for people who are not R experts but want to perform statistical analysis in a very short time and with reliable results.

Even though muma has been designed for the analysis of metabolomic data generated with different analytical platforms (NMR, MS, NIR, etc.), it provides specific methods to support NMR-based metabolomics. In particular, muma is equipped with two tools (STOCSY and RANSY) that aid the identification and assignment of molecules present in NMR spectra, or suggest possible biochemical interactions between different molecules.

In this tutorial we provide a workflow for metabolomics data interpretation using muma, covering everything from the installation, to the specific usage of all of muma's functions, to the recovery of all the results generated. Enjoy.

Functions list

work.dir(): Generate a working directory within which all the generated files are stored.
explore.data(): Perform data pre-processing (normalization, scaling) and data exploration through PCA.
Plot.pca(): Plot the PCA Score and Loading plots for specified principal components.
plsda(): Perform PLS-DA.
univariate(): Perform an array of univariate statistical techniques.
Plot.plsda(): Plot the PLS-DA Score and w*c plots for specified components.
oplsda(): Perform OPLS-DA.
stocsy(): Perform STOCSY analysis.
stocsy.1d(): Perform monodimensional STOCSY analysis.
ostocsy(): Perform STOCSY analysis on the OSC-filtered dataset.
ransy(): Perform RANSY analysis.

Download and Installation

First of all, download R (version 2.15 or higher) from CRAN (www.r-project.org), according to your operating system (Unix, MacOS or Windows). Install R as indicated in the R manual. You can open R with its graphical interface or from the command line: shell (Unix), Terminal (MacOS) or DOS (Windows).

After you have installed and launched R, you can install the package muma, as described in Figure 1. You can install muma by typing the command

> install.packages("muma")

and by choosing your CRAN mirror from the browser (Figure 1).

FIGURE 1
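As a compact sketch of the installation sequence just described (assuming an interactive R session with internet access; depending on your setup, the mirror may be chosen from a console list rather than a browser window):

> install.packages("muma")   # choose a CRAN mirror when prompted
> library(muma)              # load the package once installation has finished
> ls("package:muma")         # lists the functions summarized in the Functions list above

If library(muma) loads without errors, the installation is complete.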
Installation may fail because of differences between R software versions. In this case, it should be sufficient to install the following packages before installing muma:

> install.packages("mvtnorm")
> install.packages("robustbase")
> install.packages("gtools")
> install.packages("bitops")
> install.packages("caTools")

and then run the command

> install.packages("muma")

Once muma is installed, you can load the package by typing library(muma).

Dataset format

The data table of interest has to be submitted in .csv format and with a specific layout, as indicated in Figure 2.

FIGURE 2

In particular:
- The first column indicates the name of every sample (NOTE: these must be different from each other, even if samples belong to the same class; moreover, for an optimal graphical visualization, short names of 4-5 characters are recommended).
- The second column indicates the "Class" of each sample, as a positive integer starting from 1.
- From the third column to column N, the data values of each sample are reported, one column per variable.
- The first row is treated as the header; within this row the variable names must be provided, each different from the others.

The dataset in Figure 2 is provided with this tutorial and derives from a metabolomics analysis of B cell cultures, untreated or after one, two, three and four days of LPS treatment (Garcia-Manteiga et al, 2011). As can be observed from Figure 2, the "Class" column is filled according to the day of treatment.

Analysis procedure

To start the analysis, move to the directory in which you have stored your data table, by selecting the option Change Working Directory from the Misc menu of the R Console. If you are not using the R Console but launched R from the command line, simply navigate to the directory in which your data table is stored, then launch R with the command R.

1| Create the working directory

Before starting the analysis it is recommended to create a new directory that will become the working directory from now on. This is recommended because muma generates diverse files and directories, which are conveniently stored in a single place. All the results created by muma's analyses will be stored here. You can use the function

> work.dir(dir.name="WorkDir")

to create a new working directory, as indicated in Figure 3.

FIGURE 3

As can be observed, a directory called "WorkDir" has been created. All the files present in the original directory are copied into the newly generated one. This directory automatically becomes the current working directory.

2| Start the analysis

The first step of a muma analysis can be performed with the function explore.data(), which provides data pre-processing and dataset exploration. Figure 4 shows an example usage of this function. In general it can be called in the following way:

> explore.data(file="YourFile.csv", scaling="ScalingType", scal=TRUE, normalize=TRUE, imputation=FALSE, imput="ImputType")

This function generates three new directories:
- "Groups", in which the samples of each group, as identified by the "Class" column of the data table, are stored;
- "PCA_Data_scalingused", in which the principal component analysis files are stored, such as the matrices of score and loading values, as well as all PCA-related plots and graphics (Note: this directory is given a different name according to the scaling used);
- "Preprocessing_Data_scalingused", in which all the files used for preprocessing the dataset are stored, such as the normalized and scaled tables (Note: this directory is given a different name according to the scaling used).

A worked example of this call is sketched below.
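As a concrete sketch (the file name MetaBc.csv refers to the tutorial data table used in the examples of the following paragraphs; the arguments simply spell out the settings used there):

> explore.data(file="MetaBc.csv", scaling="pareto", scal=TRUE,
+              normalize=TRUE, imputation=FALSE)

After this call the three directories described above are created inside the working directory, with names that reflect the scaling used (pareto, in this case).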
FIGURE 4

A| In particular, this function reads the data table and converts all negative values to 0, because metabolomics measurements resulting in negative values are considered noise or errors and are therefore brought to a null baseline. A table called "NegativeValues.out", reporting the negative values found, is written and saved in the directory "Preprocessing_Data_scalingused".

It is also possible to impute a data table with missing values. The field "imputation" is FALSE by default, but turning it to TRUE allows the substitution of missing values according to a specified option. There are four options for imputation, and they can be specified in the field "imput":
- mean: missing values are imputed with the average value of the other observations;
- minimum: missing values are imputed with the minimum value among the other observations;
- half.minimum: missing values are imputed with half of the minimum value among the other observations;
- zero: missing values are imputed with a zero value.

Reports on which values are imputed are printed to screen, and a file called "ImputedMatrix.csv", reporting the matrix with the imputed values, is written and saved in the directory "Preprocessing_Data_scalingused".

Moreover, a control on the proportion of missing values for each variable has been implemented: when a variable shows a proportion of missing values higher than 80%, that variable is eliminated, as it is considered not informative. Warnings about the eliminated variables are reported at the end of the function, indicating which variables have been removed.

B| The function then performs normalization of each sample on the total spectrum: this is achieved by calculating the sum of all variables within a spectrum and by normalizing each spectrum on that value; in this way every single variable is transformed into a fraction of the total spectral area or intensity. A table called "ProcessedTable.csv", reporting the normalized values, is written and saved in the directory "Preprocessing_Data_scalingused".

As this process can influence the outcome of the following analyses, normalization can be avoided by turning the field "normalize" to FALSE. This has been implemented for data tables that are already normalized or that do not require normalization.

The function then performs automatic centering and scaling of each variable, according to the scaling type specified. There are five scaling options that can be chosen by the user:
- pareto scaling
- auto scaling
- vast scaling
- range scaling
- median scaling

None of these options is case sensitive, so you can use, for example, either "Pareto" or "pareto", as well as "P" or "p":

> explore.data(file="MetaBc.csv", scaling="pareto")

or

> explore.data(file="MetaBc.csv", scaling="p")

A table called "ProcessedTable.csv", reporting the scaled values, is written and saved in the directory "Preprocessing_Data_scalingused".

As with normalization, the scaling step can influence subsequent analyses, therefore it can be avoided by turning the field scal to FALSE. Of course, in this case the field scaling can be skipped.
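To make these operations concrete before the THEORY box that follows, here is an illustrative sketch in plain R. This is not muma's internal code; it simply applies, column by column on a toy numeric matrix X, the definitions of centering, autoscaling, pareto scaling and range scaling given below:

> X <- matrix(rnorm(40), nrow = 8)                    # toy data: samples in rows, variables in columns
> centered <- scale(X, center = TRUE, scale = FALSE)  # centering: fluctuations around zero
> auto <- scale(X, center = TRUE, scale = apply(X, 2, sd))           # autoscaling: divide by the standard deviation
> pareto <- scale(X, center = TRUE, scale = sqrt(apply(X, 2, sd)))   # pareto: divide by the square root of the standard deviation
> range.sc <- scale(X, center = TRUE, scale = apply(X, 2, function(v) diff(range(v))))   # range scaling: divide by the value range

muma performs the corresponding operation automatically according to the "scaling" field, so this snippet is only meant as a reference for the definitions that follow.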
THEORY: Centering and scaling

The theory sections provided within this tutorial are meant to introduce the user to the theory behind the data treatment techniques that are proposed. Far from claiming to be a thorough description of the statistics applied here, these sections aim to provide information about the basic concepts muma's tools rely on and to help the user exploit those tools in the most suitable way.

CENTERING: converts the variables from fluctuations around the mean into fluctuations around zero. It flattens out the differences between high- and low-abundance metabolites.
Disadvantages: may not be sufficient with heteroscedastic data.

Scalings

AUTOSCALING: also called Unit Variance scaling, it uses the standard deviation as the scaling factor. After this procedure, each variable has a standard deviation of one and becomes equally important.
Disadvantages: inflation of the measurement errors; when applied before PCA, it can make the interpretation of the loading plots difficult, as a large number of metabolites will have high loading values.

PARETO: similar to autoscaling, but the square root of the standard deviation is used as the scaling factor. Highly varying metabolites are decreased more than weakly varying ones. The data stay closer to the original measurements than with autoscaling.
Disadvantages: sensitive to large fold changes.

VAST: an extension of autoscaling, it focuses on the stable variables, i.e. those variables that change less. It uses the standard deviation and the coefficient of variation as scaling factors. Metabolites with a small relative standard deviation are more important.
Disadvantages: not suited for large induced variation without group structure.

RANGE: it uses the value range as the scaling factor. Metabolites are compared relative to the induced biological response.
Disadvantages: inflation of the measurement errors and sensitivity to outliers.

MEDIAN: also called central tendency scaling, this operation makes the median of each sample equivalent. This scaling is used when only a few metabolites are expected to change, but there may be non-biological, sample-dependent factors influencing data interpretation.
Disadvantages: low reliability with datasets having a high proportion of responding variables.

For a more complete introduction to scalings and other data pretreatment techniques used in metabolomics, please refer to (van den Berg et al, 2006).

C| Principal Component Analysis (PCA) is performed on the normalized/scaled table, and the score plots of each pairwise comparison of the first ten principal components are returned (when the number of principal components is >= 10; otherwise all comparisons of components are plotted) (Figure 5, right panel). Together with these, a screeplot (Figure 5, left panel) is created, in order to provide the user with information about the importance of each principal component. These plots are displayed on screen and automatically saved in the directory "PCA_Data_scalingused", with the names "First_10_Components_scalingused" and "Screeplot_scalingused", respectively.

FIGURE 5: left panel, screeplot (Proportion of Variance explained vs. Principal Components); right panel, pairwise score plots.
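For readers less familiar with these plots, the following plain-R sketch (again, not muma's code, which produces and saves these plots automatically) reproduces the idea on the pareto-scaled toy matrix from the previous sketch:

> pca <- prcomp(pareto, center = FALSE, scale. = FALSE)   # the matrix is already centered and scaled
> summary(pca)$importance["Proportion of Variance", ]     # the values displayed by the screeplot
> screeplot(pca, type = "lines", main = "Screeplot")      # importance of each principal component
> plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2", main = "Score plot")   # one pairwise score plot

Within muma, the equivalent figures are generated by explore.data() itself and can be re-plotted for specific components with Plot.pca(), as listed in the Functions list.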
D| In order to help the user choose the "best" pair of principal components to visualize, a specific tool has been implemented which calculates the statistical significance of the cluster separation (Goodpaster et al, 2011) obtained with each pair of principal components. In other words, groups will be more or less separated from each other depending on the pair of principal components considered, and this cluster separation is tested for its statistical significance. A ranking of the five best-separating pairs of principal components is printed to screen, reporting the numbers of the principal components, the p-value calculated from the F statistic, and the proportion of variance explained by each pair of components (Figure 6). The p-value shown is the sum of the p-values from each cluster separation statistic: the lower the p-value, the better the separation ability. Thanks to this ranking, one can choose the best pair of PCs according to both their "separation capacity" and the proportion of variance explained.

Two files deriving from this technique are saved in the directory "PCA_Data_scalingused": one listing the F statistic values for each pair of components, named "PCs_Fstatistic.out", and one ranking all the co
