Title: Exploring Co-Embedding for Healthcare Data: Theoretical and Practical Insights

Date: Friday, November 10, 2023

Time: 2pm - 4pm ET (11am - 1pm PT)

Location (in-person): Coda C1315 Grant Park

Location (remote): Teams link

Meeting ID: 240 966 076 993
Passcode: VB9s9S

 


Dongjin Choi

School of Computational Science and Engineering

College of Computing

Georgia Institute of Technology

 

https://jinchoi.xyz/

 

Committee

Dr. Haesun Park - Advisor, School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Duen Horng (Polo) Chau - School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Chao Zhang - School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Hamid Haidarian - Kaiser Permanente

 

Abstract

This thesis advances data representation learning for healthcare by introducing co-embedding methods grounded in constrained low-rank approximation. By synthesizing low-dimensional embeddings from a fusion of clinical and digital interaction data, this work develops a robust patient profiling model that addresses the data sparsity characteristic of healthcare datasets, thereby improving clustering coherence and recommendation accuracy.

 

The core of this thesis is the development of algorithms that utilize the wealth of structured and unstructured data available in the healthcare domain. By integrating semi-supervised learning with label information constraints, the method refines the embeddings, improving classification and clustering outcomes. A distinctive feature of the proposed method, WellFactor, is its capacity for immediate computation of embeddings for new patient data, offering a dynamic approach to patient data analysis that does not require exhaustive recomputation.

 

Empirical evaluations using real-world data from healthcare web portals show that the proposed model outperforms conventional baselines in patient classification, clustering, and similarity prediction. The research comprises two main projects: incorporating heterogeneous data into multi-type data co-embedding, and developing methods for interpreting co-embedding outcomes. Together, these results aim to contribute a more nuanced understanding of patient profiles and interactions to the field of healthcare data science.