In partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Bioinformatics
in the School of Biological Sciences
Matthew Hunter Seabolt
Defends his thesis:
Expanding the Bioinformatics Toolbox for Diversity and Taxonomic Studies of Microbial Eukaryotic Pathogens
Wednesday, July 26, 2023
3:00-4:00pm
Krone Engineered Biosystems Building, Children’s Healthcare of Atlanta Seminar Room (EBB 1005)
Thesis Advisor:
Dr. Kostas Konstantinidis
School of Civil & Environmental Engineering
Georgia Institute of Technology
Committee Members:
Dr. Joel Kostka
School of Biological Sciences
Georgia Institute of Technology
Dr. I. King Jordan
School of Biological Sciences
Georgia Institute of Technology
Dr. Christine Heitsch
School of Mathematics
Georgia Institute of Technology
Dr. Dawn M. Roellig
Waterborne Disease Prevention Branch
Division of Foodborne, Waterborne, and Environmental Diseases
Centers for Disease Control and Prevention
Abstract:
Cataloguing and studying the diversity and speciation of microbial eukaryotes presents unique challenges because of these organisms' evolutionary divergence from well-studied model genomes, limited culturing methods, and uncertain taxonomy. The scarcity of high-quality genomic data is a significant obstacle to understanding genome relatedness and important traits such as virulence and antimicrobial resistance. Much of the debate centers on how to reconcile discordant phylogenetic signals from existing molecular typing data with historical records and type specimens, and it has seen no major progress in almost two decades. Additionally, existing bioinformatics methods need further development to handle large eukaryotic genomes effectively.
This research aims to expand the set of available bioinformatics tools for the comparative analysis of genomes of microbial eukaryotes. Case studies using the protozoan parasite Giardia duodenalis as a model organism are presented. These studies include (i) developing a new, automated pipeline for identifying the best gene markers in the genome for phylogenetic reconstruction and strain-level resolution, (ii) creating a statistical framework to identify cryptic species and quantify their evolutionary relationships, and (iii) improving the reference annotation of the Giardia genome. Lastly, we employed genome-aggregate average nucleotide identity (ANI) and graph-based methods to assess whether natural boundaries exist between eukaryotic species, similar to those previously observed for prokaryotes, and to study the relationship between shared gene content and ANI (that is, the degree of genetic relatedness).
The findings suggest that sequence-discrete clusters of genomes, akin to traditional species, are prevalent among the examined genomes and that our methodology is robust across eukaryote phyla and at multiple levels of the taxonomic hierarchy. The conclusions of this research, such as the use of 95% ANI as a general-purpose species boundary in eukaryotes and the utility of ANI for molecular typing, contribute novel biological insights and bioinformatics methods to the toolkit for eukaryote taxonomy and genome analysis.