Sonakshi Gupta
Advisor: Prof. Rampi Ramprasad

will propose a doctoral thesis entitled,

Accelerating Knowledge Extraction and Reasoning from Polymer Literature for Data-Driven Materials Design

On

Monday, February 16 at 1:00 p.m.
MRDC Room 3515
or 
Virtually via MS Teams: 
https://teams.microsoft.com/l/meetup-join/19%3ameeting_YTE5YjJmY2QtMDg0Mi00ZWYxLTg3MDYtNzFkOWFkMjMyNzk4%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%229bfc545d-266c-451b-b17e-f278bfd2ae0f%22%7d

Committee
Prof. Rampi Ramprasad - School of Materials Science and Engineering (Advisor)
Prof. Chao Zhang - School of Computational Science and Engineering
Prof. Helen Xu - School of Computational Science and Engineering
Prof. Guoxiang (Emma) Hu - School of Materials Science and Engineering
Prof. Will Gutekunst - School of Chemistry and Biochemistry

Abstract
Designing polymer materials with targeted properties remains a challenging task because polymer behavior depends on the interplay of chemical structure, composition, and processing conditions. Polymer informatics offers a data-driven route to accelerate materials design, but its advancement is fundamentally limited by the availability of reliable, informatics-ready data. Although the scientific literature is a rich resource, much of the information it contains remains underutilized because it is reported in heterogeneous and unstructured formats across text, tables, and figures, creating a disconnect between how polymer data is reported and how data-driven methods operate. This thesis addresses this bottleneck by developing a literature-driven framework that transforms unstructured literature into curated knowledge for polymer design, with a particular focus on sustainable packaging materials. The first part focuses on developing automated extraction and validation methods to convert unstructured polymer literature into reliable informatics-ready datasets. The next part introduces retrieval-augmented frameworks based on dense semantic and graph-based representations to identify, retrieve, and reason over relevant literature, supporting both fact-centric queries and design-oriented reasoning for in silico polymer design. Finally, these representations are integrated into predictive modeling workflows to develop property prediction models and screening strategies that evaluate and rank candidate polymers against target performance criteria relevant to sustainable packaging. Together, these efforts establish a generalizable framework that integrates extraction, retrieval, reasoning, and predictive modeling to enable data-driven materials design grounded in scientific literature.