Copula Gaussian graphical modelling of biological networks and Bayesian inference of model parameters

Document Type: Article

Authors

1 Department of Statistics, Middle East Technical University, Ankara, Turkey.

2 Department of Statistics, Middle East Technical University, Ankara, Turkey.

DOI: 10.24200/sci.2019.5071.1076

Abstract

Understanding complex biological networks enables us to better understand systems diseases such as cancers and heart attacks, and to propose drug targets, which is one of the major research questions in personalized medicine. However, describing this complexity is challenging because the associated data are very sparse, high-dimensional, and highly correlated. The copula Gaussian graphical model (CGGM), which is based on representing the multivariate normal distribution via marginals and a copula term, is one of the successful approaches for modelling such datasets. In this study, we apply CGGM to model steady-state activations of biological networks and infer the model parameters in a Bayesian setting. We suggest the reversible jump Markov chain Monte Carlo (RJMCMC) algorithm to estimate plausible interactions between the systems' elements, which are proteins or genes. We also provide open-source R code implementing RJMCMC for CGGM for networks of different dimensions. In the application, we use real datasets and evaluate the accuracy of the estimates via the F1-score. The results show that CGGM with RJMCMC represents real, complex systems with high accuracy and can be a promising approach for understanding biological networks and diseases.
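Two ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R code, and the function names `normal_scores` and `edge_f1` are hypothetical. The first function maps one observed variable to latent Gaussian scores through its ranks, a common simplification of the copula step that assumes no ties. The second computes the F1-score of an estimated undirected graph against a reference graph, counting each edge once. The RJMCMC sampler itself is not reproduced here.

```python
from statistics import NormalDist


def normal_scores(x):
    """Map one variable to latent Gaussian scores via its ranks.

    Assumption: no ties; tied values would be ordered arbitrarily.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    z = [0.0] * n
    std_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1)
        z[i] = std_normal.inv_cdf(rank / (n + 1))
    return z


def edge_f1(true_adj, est_adj):
    """F1-score over the upper-triangle edges of two 0/1 adjacency matrices."""
    p = len(true_adj)
    tp = fp = fn = 0
    for i in range(p):
        for j in range(i + 1, p):
            t, e = true_adj[i][j], est_adj[i][j]
            if t and e:
                tp += 1
            elif e:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy 4-node example (made-up graphs, not data from the study):
# reference edges 0-1, 1-2, 2-3; estimated edges 0-1, 1-2, 0-3.
true_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
est_adj = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
print(edge_f1(true_adj, est_adj))  # 2 TP, 1 FP, 1 FN, so F1 = 4/6
print(normal_scores([3.2, 1.0, 7.5]))
```

In a Bayesian setting such as the one described, the estimated adjacency matrix would typically be obtained by thresholding posterior edge-inclusion probabilities; the threshold is a modelling choice and is not specified in the abstract.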

H. Farnoudkia and V. Purutçuoğlu / Scientia Iranica, Transactions E: Industrial Engineering 26 (2019) 2495-2505