Neel Sarkar
BME MS Thesis Defense Presentation
Date: 2025-06-16
Time: 10:00 am
Location / Meeting Link: Online via Zoom / https://gatech.zoom.us/j/94083989314
Committee Members:
Cassie Sue Mitchell, PhD (Advisor); Peng Qiu, PhD; James Lah, PhD
Title: A Comprehensive, Intuitive, Automated Framework for s-SuStaIn Biomedical Data Applications and Profiling
Abstract:
Event-based models (EBMs) are powerful machine learning (ML) frameworks for inferring the temporal sequence of monotonic biomarker changes in complex diseases. However, applying EBMs to real-world biomedical datasets is often hindered by data quality issues, high dimensionality, and limited interpretability of results. We present an end-to-end, user-friendly pipeline that addresses these challenges through three core innovations. First, our model, s-SuStaIn, performs an automated data readiness assessment when data are uploaded. This assessment verifies file formats, metadata completeness, column consistency, and the compatibility of clinical or omics measurements. It also detects scaling or standardization errors and provides interactive feedback that guides users in resolving issues before analysis. Second, to address the high dimensionality of clinical and bioinformatic datasets, we implement a novel radial basis function (RBF)-based dimensionality reduction algorithm. This method projects features into a lower-dimensional space while preserving variance and feature separability, enabling stable EBM parameter estimation even for datasets containing hundreds or thousands of variables. Finally, s-SuStaIn outputs a comprehensive suite of results, including subject-level classification labels, an ordered sequence of biomarker changes, and performance metrics presented in an interactive report. It also generates a downloadable list of the most informative biomarkers, supporting both hypothesis generation and downstream statistical analysis. By integrating data validation, dimensionality reduction, and interpretable output into a cohesive interface, our pipeline enables researchers, including those without a computational background, to perform rigorous, reproducible event-based modeling and to accelerate the discovery of temporal biomarker signatures in complex diseases.