Big Data–Driven Machine Learning Frameworks for Clinical Risk Prediction
Abstract
Healthcare is a leading application area for big data. Combining electronic medical record data with omics data (genomics, proteomics, metabolomics, etc.), lifestyle information (e.g., smoking, drinking, diet, and exercise), social determinants of health, and data from wearable devices makes it possible to construct a diverse array of clinical and biological predictive models. In particular, the application of machine-learning (ML) methods to clinical risk-prediction modeling has gained considerable momentum in recent years and has generated a substantial body of literature. Unlike the traditional statistical approaches commonly used in clinical applications, ML techniques have the potential to leverage high-dimensional, heterogeneous data simultaneously. This contribution reviews several important aspects of risk prediction using big data and ML methods, including data sources, frameworks, performance metrics, and regulation. Relevant clinical applications span almost every area of medicine, including cardiovascular medicine, oncology, infectious diseases, nephrology, rheumatology, and psychiatry. Although numerous ML-based risk-scoring systems with impressive performance are reported in the literature, external validation and transportability remain critical challenges that merit further exploration.

This work is licensed under a Creative Commons Attribution 4.0 International License.