Title: Automated extraction and synthesis of biomedical data for AI-driven systematic review and meta-analysis

 

David Kartchner

CSE PhD Candidate

School of Computational Science and Engineering

College of Computing

Georgia Institute of Technology

https://davidkartchner.com 

 

 

Date: Friday, November 17, 2023

Time: 11:00am–1:00pm EST

In-Person Location: Coda C1115 Midtown

Zoom Link: https://gatech.zoom.us/j/91549305198

 

 

Committee:

Dr. Cassie Mitchell (Advisor), School of Biomedical Engineering, Georgia Institute of Technology

Dr. Chao Zhang, School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Duen Horng "Polo" Chau, School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Jon Duke, Georgia Tech Research Institute, Georgia Institute of Technology

Dr. Daniel Domingo-Fernández, Enveda Biosciences

 

Abstract:

Biomedical literature is not simply a record of scientific discovery; it also provides a platform for research exploration and optimized clinical practice. The purpose of this thesis is to develop and apply natural language processing methods that enhance and automate literature-based biomedical research inquiry. Specifically, we develop datasets, methods, and systems to enable AI-assisted systematic review and meta-analysis of clinical literature. We further validate these resources through several clinical case studies that demonstrate their value in identifying potential treatments for emerging diseases and elucidating the mechanisms by which diseases affect patients.

 

Qualitative systematic reviews thoroughly survey a particular medical topic to highlight relevant relationships and identify promising directions for future research. To enable faster systematic review of biomedical relationships, we build a knowledge graph of relationships between biomedical entities extracted from 33+ million research articles on PubMed. We pair this graph with an unsupervised graph ranking algorithm that identifies related concepts and their relationships from the literature. The graph and its accompanying software package form a literature-based discovery system that can comprehensively identify and rank disease risks, mechanisms, and repurposed drugs to prioritize future clinical or experimental research.

 

Similarly, quantitative meta-analysis of clinical studies is the gold standard for establishing clinical guidelines and best practices because it calculates an aggregate effect size from a collection of smaller cohorts. Meta-analysis begins with a specific research question and then extracts study-specific data elements to form a large, synthetic statistical cohort. Currently, research articles are selected and the relevant data extracted manually, which takes a year on average for each clinical meta-analysis. This thesis presents data and methodological resources that dramatically accelerate the process of qualitatively and quantitatively aggregating evidence from biomedical research. In doing so, we provide the following contributions:

  • We develop SemNet 2.0, a literature-based discovery software package that integrates 33+ million PubMed articles into a comprehensive knowledge graph using named entity recognition, entity linking, and relationship extraction. We perform real-world case studies to illustrate the efficacy of SemNet 2.0 for summarizing relationships and prioritizing future experimental and clinical research.
  • We present two meticulously annotated data resources, BioSift and TrialSieve, that enable efficient filtering of clinical studies and detailed extraction of study design and outcome information. Specifically, TrialSieve is, to our knowledge, the first dataset that enables the automated quantification of clinical outcomes for each group represented in a clinical study.
  • We demonstrate the effectiveness of our platform by creating a large database of clinical evidence for over 100 commonly used drugs with high potential to improve therapeutic outcomes for numerous types of cancer.
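
For readers unfamiliar with how a meta-analysis calculates an aggregate effect size from smaller cohorts, the following is a minimal sketch of standard fixed-effect, inverse-variance pooling. It illustrates the general technique only and is not taken from the thesis; the function name `pool_fixed_effect` and the example numbers are hypothetical.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Pool per-study effect sizes with inverse-variance weights.

    Illustrative sketch of a fixed-effect meta-analysis; the name and
    interface are assumptions, not part of any tool described above.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect size")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance, so more
    # precise (usually larger) studies contribute more to the pooled value.
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Approximate 95% confidence interval under a normal assumption.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


if __name__ == "__main__":
    # Hypothetical effect sizes and standard errors from three studies.
    pooled, pooled_se, ci = pool_fixed_effect([0.30, 0.10, 0.25],
                                              [0.10, 0.20, 0.15])
    print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}")
    print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

A fixed-effect model assumes all studies estimate the same underlying effect; when studies differ substantially, a random-effects model is the usual alternative.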