Comparison of statistical techniques for the analysis of “omics” data
This report describes a project assigned to a group of students in the MSc in Mathematics at Università degli Studi di Milano during a modelling seminar in the academic year 2017/18. The students were given the following problem.
A group of biochemists at the University of Milan conducted an experiment testing the effect of dihydroprogesterone (DHP) treatment on rats with induced diabetes. The test subjects were divided into three groups: the control group (CTRL), the group with induced diabetes but no treatment (STZ), and the group with induced diabetes and treatment (STZ+DHP). The biochemists measured the concentrations of 222 metabolites in the 15 rats. Based on these measurements, they deduced that the metabolite concentrations in the STZ+DHP group were similar to those in the CTRL group and different from those in the STZ group (Cermenati et al., 2017).
However, because the number of observations is small compared to the number of variables, the one-way Analysis of Variance (ANOVA) used in the original study is not well suited to these data. Other statistical methods for detecting trends in the data are therefore of great interest. The goal of the project was to apply two different statistical techniques to the experimental data, namely the singular value decomposition (SVD) and functional statistics, in order to determine whether there is a significant difference between the treated and the untreated rats, and to identify which metabolites contribute most to the difference between the groups.
SVD serves here mainly as a noise-filtering technique. It decomposes the matrix A, which has rats (i.e. experiments) on its rows and metabolites on its columns, into the product A = USV^T, where the columns of U are interpreted as eigenrats and the rows of V as eigenmetabolites. From this decomposition, a score measuring the importance of each metabolite in characterizing the groups can be defined (see Font-Clos et al., 2017), which makes it possible to identify the metabolites mainly responsible for the differences between the groups.
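As a rough illustration of the decomposition, the following sketch applies a truncated SVD to a synthetic 15 x 222 matrix. The random data, the column centering, the choice of k = 2 retained components and the loading-based score are all assumptions made for this example; in particular, the score is a hypothetical stand-in and not the one defined by Font-Clos et al.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rats, n_metabolites = 15, 222

# Synthetic stand-in for the real data: rats on rows, metabolites on columns.
A = rng.normal(size=(n_rats, n_metabolites))

# Assumption: centre each metabolite (column) before decomposing.
A_centered = A - A.mean(axis=0)

# Thin SVD: U is 15 x 15, s holds 15 singular values, Vt is 15 x 222.
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)

# Noise filtering: keep only the k leading components (k = 2 is arbitrary).
k = 2
A_denoised = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Hypothetical importance score: squared loading of each metabolite on the
# retained components, weighted by the corresponding singular values.
score = ((s[:k, None] * Vt[:k, :]) ** 2).sum(axis=0)

# Indices of the ten metabolites with the highest score.
top = np.argsort(score)[::-1][:10]
print(top)
```

With real measurements, the number of retained components would normally be chosen by inspecting the decay of the singular values in s rather than fixed in advance.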
Functional statistics is usually applied to longitudinal data (i.e. ordered data, depending on time), so it cannot be applied to our dataset “as it is”. In functional statistics, noise is filtered by fitting functions from a specific basis with a Karhunen-Loève transform, which smooths the functions. In this case study, in order to regularize the data beforehand, we sorted the metabolites in increasing order with respect to their mean and then fitted a spline basis. A functional ANOVA (FANOVA) procedure then confirmed the dissimilarity between the CTRL and STZ groups and the recovered similarity between the CTRL and STZ+DHP groups after the treatment. Constructing confidence bands for the F statistic used in the FANOVA procedure revealed the set of metabolites most responsible for the differences between the groups.
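The ordering and smoothing steps, followed by a pointwise F statistic, might be sketched as follows. This is only an illustration on synthetic data: the group sizes (5 rats each), the cubic least-squares spline with 10 interior knots, and the helper pointwise_f are assumptions, and neither the Karhunen-Loève step nor the confidence bands are reproduced here.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(1)
groups = np.repeat(["CTRL", "STZ", "STZ+DHP"], 5)  # assumed 5 rats per group
n_rats, n_metabolites = groups.size, 222

# Synthetic concentrations with an artificial shift in the STZ group.
base = rng.lognormal(sigma=1.0, size=n_metabolites)
A = base * rng.lognormal(sigma=0.2, size=(n_rats, n_metabolites))
A[groups == "STZ", :40] *= 1.5

# Order the metabolites by increasing mean so each rat becomes a "curve".
order = np.argsort(A.mean(axis=0))
A_sorted = A[:, order]
x = np.arange(n_metabolites, dtype=float)

# Least-squares cubic B-spline fit per rat (knot layout is an assumption).
degree = 3
interior = np.linspace(x[0], x[-1], 12)[1:-1]
knots = np.concatenate([[x[0]] * (degree + 1), interior, [x[-1]] * (degree + 1)])
curves = np.array([make_lsq_spline(x, y, knots, k=degree)(x) for y in A_sorted])


def pointwise_f(curves, labels):
    """One-way ANOVA F statistic computed separately at each grid point."""
    names = np.unique(labels)
    grand = curves.mean(axis=0)
    between = np.zeros(curves.shape[1])
    within = np.zeros(curves.shape[1])
    for name in names:
        block = curves[labels == name]
        mean = block.mean(axis=0)
        between += block.shape[0] * (mean - grand) ** 2
        within += ((block - mean) ** 2).sum(axis=0)
    df_between = names.size - 1
    df_within = curves.shape[0] - names.size
    return (between / df_between) / (within / df_within)


F = pointwise_f(curves, groups)

# Positions (in the sorted ordering) mapped back to original metabolite indices.
top = order[np.argsort(F)[::-1][:10]]
print(top)
```

Large values of F point to regions of the ordered metabolite axis where the group mean curves differ most. Deciding which of them are significant would still require a reference distribution or band, which this sketch leaves out.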
Finally, using a “consensus strategy” across the different methods applied, we identified 18 metabolites that show a stronger involvement in the recovery effects of dihydroprogesterone.
Cermenati, Gaia, Giatti, Silvia, Audano, Matteo, Pesaresi, Marzia, Spezzano, Roberto, Caruso, Donatella, Mitro, Nico, and Melcangi, Roberto Cosimo (2017). “Diabetes alters myelin lipid profile in rat cerebral cortex: Protective effects of dihydroprogesterone”. In: The Journal of Steroid Biochemistry and Molecular Biology 168, pp. 60–70. ISSN: 0960-0760. DOI: https://doi.org/10.1016/j.jsbmb.2017.02.002.
Font-Clos, Francesc, Zapperi, Stefano, and La Porta, Caterina A.M. (2017). “Integrative analysis of pathway deregulation in obesity”. In: Nature Partner Journals Systems Biology and Applications. URL: www.nature.com/npjsba.
Claudio Russo Introito
Modelling Seminar @ UNIMI 2017/18
Supervisor: Prof. Alessandra Micheletti