Title: Automated extraction and synthesis of biomedical data for AI-driven systematic review and meta-analysis
David Kartchner
CSE PhD Candidate
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
Date: Friday, November 17, 2023
Time: 11:00am–1:00pm EST
In-Person Location: Coda C1115 Midtown
Zoom Link: https://gatech.zoom.us/j/91549305198
Committee:
Dr. Cassie Mitchell (Advisor), School of Biomedical Engineering, Georgia Institute of Technology
Dr. Chao Zhang, School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Duen Horng "Polo" Chau, School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Jon Duke, Georgia Tech Research Institute, Georgia Institute of Technology
Dr. Daniel Domingo-Fernández, Enveda Biosciences
Abstract:
Biomedical literature is not simply a record of scientific discovery; it also provides a foundation for further research and for optimizing clinical practice. This thesis develops and applies natural language processing methods to enhance and automate literature-based biomedical research. Specifically, we develop datasets, methods, and systems that enable AI-assisted systematic review and meta-analysis of clinical literature. We validate these tools through several clinical case studies that demonstrate their value in identifying potential treatments for emerging diseases and in elucidating the mechanisms by which diseases affect patients.
Qualitative systematic reviews thoroughly survey a particular medical topic to highlight relevant relationships and identify promising directions for future research. To enable faster systematic review of biomedical relationships, we build a knowledge graph of relationships between biomedical entities extracted from more than 33 million research articles on PubMed. We pair this graph with an unsupervised graph ranking algorithm that identifies related concepts and their relationships in the literature. Together, the graph and its accompanying software package form a literature-based discovery system that can comprehensively identify and rank disease risk factors, mechanisms, and drug repurposing candidates, helping prioritize future clinical and experimental research.
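To make the idea of unsupervised ranking over a literature knowledge graph concrete, here is a minimal sketch. It does not reproduce SemNet 2.0's own ranking algorithm, which this abstract does not specify; instead it swaps in personalized PageRank, a standard unsupervised technique that scores each concept by how often a random walk that keeps restarting at a seed concept (for example, a disease) visits it. The function name, the undirected edge-list input, and the toy concept names are all hypothetical.

```python
from collections import defaultdict


def personalized_pagerank(
    edges: list[tuple[str, str]],
    seed: str,
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-10,
) -> list[tuple[str, float]]:
    """Rank concepts by proximity to `seed` in an undirected graph.

    Hypothetical stand-in: personalized PageRank computed by power iteration,
    not necessarily the ranking method used by SemNet 2.0.
    """
    # Build an adjacency list; every node seen in an edge has degree >= 1.
    neighbors: dict[str, list[str]] = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    if seed not in neighbors:
        raise KeyError(f"seed concept {seed!r} is not in the graph")
    nodes = list(neighbors)

    # Start with all probability mass on the seed concept.
    scores = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iterations):
        # Restart mass always returns to the seed, which personalizes the ranking.
        new = {n: (1.0 - damping if n == seed else 0.0) for n in nodes}
        for u in nodes:
            # Each node splits its damped score evenly among its neighbors.
            share = damping * scores[u] / len(neighbors[u])
            for v in neighbors[u]:
                new[v] += share
        delta = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break

    # Return every concept except the seed, highest score first.
    ranked = [(n, s) for n, s in scores.items() if n != seed]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, hypothetical graph; not data from SemNet 2.0 or PubMed.
    toy_edges = [
        ("disease_x", "gene_a"),
        ("disease_x", "pathway_b"),
        ("gene_a", "drug_c"),
        ("pathway_b", "drug_c"),
        ("pathway_b", "drug_d"),
    ]
    for concept, score in personalized_pagerank(toy_edges, "disease_x"):
        print(f"{concept}: {score:.4f}")
```

Scores shrink as paths from the seed get longer and grow with the number of connecting paths, so a concept such as a candidate drug can surface even without a direct edge to the disease.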
Quantitative meta-analysis of clinical studies is the gold standard for establishing clinical guidelines and best practice; it calculates an aggregate effect size from a collection of smaller cohorts. A meta-analysis begins with a specific research question and then extracts study-specific data elements to form a large, synthetic statistical cohort. Currently, selecting research articles and extracting the relevant data are done manually, taking about a year on average for each clinical meta-analysis. This thesis presents data and methodological resources that dramatically accelerate the qualitative and quantitative aggregation of evidence from biomedical research. Specifically, it makes the following contributions:
- We develop SemNet 2.0, a literature-based discovery software package that integrates more than 33 million PubMed articles into a comprehensive knowledge graph using named entity recognition, entity linking, and relationship extraction. Through real-world case studies, we illustrate the efficacy of SemNet 2.0 for summarizing relationships and prioritizing future experimental and clinical research.
- We present two meticulously annotated data resources, BioSift and TrialSieve, that enable efficient filtering of clinical studies and detailed extraction of study design and outcome information. To our knowledge, TrialSieve is the first dataset that enables automated quantification of clinical outcomes for each group represented in a clinical study.
- We demonstrate the effectiveness of the resulting platform by building a large database of clinical evidence for more than 100 commonly used drugs with high potential to improve therapeutic outcomes across many types of cancer.
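The aggregate effect size at the heart of meta-analysis can be illustrated with the textbook inverse-variance, fixed-effect estimator. This is a generic sketch, not the pipeline developed in this thesis: each cohort contributes an effect estimate (for example, a log odds ratio) and its standard error, each study receives weight w_i = 1 / SE_i^2, and the pooled estimate is the weighted mean of the study estimates, with standard error sqrt(1 / sum of w_i). The `StudyEffect` and `fixed_effect_pool` names and the two-cohort numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyEffect:
    """One cohort's effect estimate (e.g., a log odds ratio) and its standard error.

    Hypothetical container for illustration only.
    """
    estimate: float
    std_error: float


def fixed_effect_pool(
    studies: list[StudyEffect], z: float = 1.96
) -> tuple[float, float, tuple[float, float]]:
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    Returns (pooled estimate, pooled standard error, (lower, upper) bounds),
    where z = 1.96 gives an approximate 95% confidence interval.
    """
    if not studies:
        raise ValueError("need at least one study to pool")
    # Weight each study by the inverse of its variance, so more precise cohorts count more.
    weights = [1.0 / (s.std_error ** 2) for s in studies]
    total_weight = sum(weights)
    pooled = sum(w * s.estimate for w, s in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


if __name__ == "__main__":
    # Hypothetical effect sizes for two cohorts; not drawn from the thesis.
    cohorts = [StudyEffect(0.5, 0.1), StudyEffect(0.3, 0.2)]
    est, se, (lo, hi) = fixed_effect_pool(cohorts)
    print(f"pooled = {est:.3f}, SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Here the first cohort's weight (about 100) is four times the second's (about 25), so the pooled estimate of about 0.46 sits much closer to 0.5. A fixed-effect model assumes every cohort estimates the same true effect; when studies are heterogeneous, a random-effects model such as DerSimonian-Laird is the usual alternative.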