In partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Bioinformatics

in the School of Biological Sciences

 

Monica Isgut

 

Defends her thesis:

Analysis and design of multi-modal clinical and genomic risk scores for disease prediction using machine learning 

Thursday, July 27th, 2023

5:00 PM

Zoom Link: https://gatech.zoom.us/j/6872970574

 

Thesis Advisor:

Dr. May D. Wang

Department of Biomedical Engineering

Georgia Institute of Technology

 

Committee Members:

Dr. I. King Jordan

School of Biological Sciences

Georgia Institute of Technology

 

Dr. Yunan Luo

School of Computational Science and Engineering

Georgia Institute of Technology

 

Dr. Saurabh Sinha

Department of Biomedical Engineering

Georgia Institute of Technology

 

Dr. Blake Anderson

School of Medicine

Emory University

  

Abstract:

Polygenic risk scores (PRSs) are promising tools for leveraging genomic data for disease risk prediction in clinical settings. However, little is known about their value in the context of routinely available clinical data. This work aims to analyze the value-add of genomic data in multi-modality risk prediction models over models with clinical data alone, 1) for several diseases, 2) across disease subpopulation groups, and 3) across different categories of model complexity (e.g., logistic regression vs. neural networks) and clinical or genomic feature space.

 

The third analysis more specifically evaluates: a) the effect of integrating large-scale clinical data derived from electronic health records (EHRs) with PRSs in a multi-modal neural network on the estimated value-add of the PRSs in the risk model, and b) the effect of integrating standard small-scale clinical risk factors (e.g., body mass index, smoking status) with genomic data in the form of individual genomic features (hereafter also denoted as a PRS) in a neural network on the estimated value-add of the genomic data.

 

In addition to the systematic analysis of the factors contributing to the value-add of genomic data and the design of multi-modality genomic and clinical neural networks for disease prediction, this work also introduces two novel representation learning algorithms designed to derive low-dimensional representations of EHR diagnostic data and genotype data, respectively. Furthermore, this work explores the use of various neural network interpretability tools applied to multi-modality disease risk scores to gain insights into important or interacting features utilized in risk prediction.