Predictive ETL Failure Detection in Healthcare Data Pipelines Using Anomaly Detection Algorithms

Triveni Kolla


Published: Dec 14, 2023

Keywords:

Predictive ETL Failure Detection, Healthcare Data Pipelines, Anomaly Detection Algorithms, Machine Learning for Data Quality, Automated Data Pipeline Monitoring, Healthcare Analytics Infrastructure, ETL Reliability Engineering, Intelligent Data Operations (DataOps), Real-Time Pipeline Anomaly Detection, Predictive Maintenance for Data Systems, Big Data Healthcare Processing, AI-Driven Fault Detection, Data Integrity Monitoring, Operational Analytics in Healthcare IT, Scalable Pipeline Observability.

Triveni Kolla

Senior Business Intelligence Developer Cotiviti, USA

Abstract

Failure detection in healthcare data pipelines remains a largely unsolved problem, even though ETL (extract, transform, load) processes are critical for timely healthcare insights and ETL logs are available across many pipelines to support analytics. This study proposes an architecture that detects ETL failures up to 1 hour before they occur by training anomaly detection models on log and ETL status data from preceding periods. Five months of ETL logs from a healthcare data warehouse, with associated ground truth labels, are used to train, validate, and benchmark four anomaly detection approaches (Isolation Forest, One-Class SVM, LSTM-VAE, and ARIMA) for predictive failure detection. Two additional months of data serve as a test set for the models trained on three months of data and tested on one month, and a four-day test set is used for the models trained on one month and tested on three days.

Results demonstrate the potential of the suggested architecture for predictive ETL failure detection in healthcare data pipelines. Factors contributing to ETL failures are identified in the feature engineering stage, enabling an understanding of the predictive power of various features as well as dataset partitioning for better model training. Ultimately, these findings contribute to future closed-loop control of ETL tasks through automatic recovery and alerting on upcoming failures, with potential application to ETL systems outside the healthcare domain.
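
The one-hour prediction horizon described above can be illustrated with a minimal Python sketch. It shows one plausible way to turn ETL status records into "failure upcoming" labels for evaluating a predictive detector; it is not taken from the article. The record type `EtlRun`, its fields `finished_at` and `failed`, the helper `label_upcoming_failures`, and the synthetic data in `demo_runs` are all hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class EtlRun(NamedTuple):
    """One hypothetical ETL log record (field names are assumptions)."""
    finished_at: datetime
    failed: bool


def label_upcoming_failures(runs, horizon=timedelta(hours=1)):
    """Return one boolean per run: True if a later run fails within `horizon`.

    A run's own failure does not count; only failures strictly after the
    run and no later than `finished_at + horizon` make the label True.
    """
    failure_times = [r.finished_at for r in runs if r.failed]
    labels = []
    for run in runs:
        window_end = run.finished_at + horizon
        labels.append(any(run.finished_at < t <= window_end for t in failure_times))
    return labels


def demo_runs():
    """Synthetic log: runs every 30 minutes from 10:00, one failure at 12:00."""
    start = datetime(2023, 1, 1, 10, 0)
    return [EtlRun(start + timedelta(minutes=30 * i), failed=(i == 4)) for i in range(6)]
```

With the synthetic log from `demo_runs()`, only the 11:00 and 11:30 runs fall within one hour before the 12:00 failure, so `label_upcoming_failures(demo_runs())` marks exactly those two runs as positive. Labels of this kind could then serve as ground truth when scoring the output of an anomaly detector such as Isolation Forest.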

How to Cite

Triveni Kolla. (2023). Predictive ETL Failure Detection in Healthcare Data Pipelines Using Anomaly Detection Algorithms. International Journal of Medical Toxicology and Legal Medicine, 26(3 and 4), 65–77. Retrieved from https://ijmtlm.org/index.php/journal/article/view/1461

Issue

Vol. 26 No. 3 and 4 (2023)

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.
