Multidimensional Population Health Modeling: A Data-Driven Multivariate Statistical Learning Approach

Zhiyuan Wei, Adil Baran Narin, Sayanti Mukherjee

IEEE Access (2022)

Abstract
Population health is multidimensional in nature and has complex relationships with its various determinants. However, most previous studies investigate a single dimension of population health using linear models, failing to capture both the nonlinearity in the data and the interdependence among multiple dimensions of health outcomes. In this paper, we propose a data-driven multivariate statistical learning approach to simultaneously model various aspects of population health, characterizing the length and quality of life, as a function of health behaviors, clinical care, socioeconomic factors, physical environment, and demographics. We also propose a novel percentile-based variable selection method for multivariate regression that does not compromise the model's generalization performance. We demonstrate the applicability of the proposed framework using New York State as a case study. Based on cross-validation and statistical hypothesis tests, the results indicate that the multivariate tree boosting method outperforms both the traditionally used univariate linear regression model and random forest in modeling multidimensional population health. A variable importance heat map illustrates the relative influence of the key health determinants on the various dimensions of population health. Partial dependence plots are used to quantify the marginal effects of the health inputs and their nonlinear relationships with the health outcomes. Our results show that teen birth rate is strongly associated with both length of life (e.g., child mortality) and quality of life (e.g., physically unhealthy days). Socioeconomic status is the key predictor of child and infant mortality. The proposed framework can be used as a decision support tool for accurately assessing and predicting multivariate population health.
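The abstract mentions partial dependence plots as the tool for reading marginal effects. As a rough illustration of the general technique (not the authors' implementation), the sketch below computes a partial dependence curve by forcing one input to each grid value and averaging the model's predictions over the data. The function `partial_dependence`, the stand-in `toy_model`, and the sample rows are all hypothetical and chosen only to keep the example self-contained.

```python
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: List[Row],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)       # copy so the original data is untouched
            modified[feature] = value  # overwrite only the feature of interest
            preds.append(predict(modified))
        curve.append(mean(preds))      # marginalize over the other features
    return curve


# Hypothetical stand-in for a fitted model: nonlinear in input 0, linear in input 1.
def toy_model(row: Row) -> float:
    return row[0] ** 2 + row[1]


# Made-up illustrative data, not taken from the paper.
rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]

curve = partial_dependence(toy_model, rows, feature=0, grid=grid)
print(curve)  # [2.0, 3.0, 6.0]
```

The resulting curve rises quadratically with the first input, which is the kind of nonlinear marginal relationship such a plot is meant to expose; in practice `predict` would wrap a fitted model for one health outcome.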
Key words
Statistics, Sociology, Biological system modeling, Aging, Pediatrics, Random forests, Licenses, Data-driven framework, multivariate tree boosting, multidimensional population health, variable selection