Title: Conversational AI for Education: Design, Evaluation, and Continual Improvement
Date: Friday, June 12, 2026
Time: 1:00 PM - 3:00 PM EDT
Location (hybrid):
• In-person: CODA Conference Room 233
• Online: Zoom (Meeting ID: 450 568 9105, Passcode: 944041)
Karan Taneja
Ph.D. Candidate in Computer Science
School of Interactive Computing
Georgia Institute of Technology
https://krntneja.github.io/
Committee:
• Dr. Ashok K. Goel (Advisor) - School of Interactive Computing, Georgia Institute of Technology
• Dr. Christopher Dede - Graduate School of Education, Harvard University
• Dr. Kartik Goyal - School of Interactive Computing, Georgia Institute of Technology
• Dr. Christopher J. MacLellan - School of Interactive Computing, Georgia Institute of Technology
• Dr. Peter Norvig - Institute for Human-Centered AI, Stanford University
Abstract:
Conversational AI in classrooms can provide personalized support and scalable access to course content through natural language dialogue. Conversational assistants allow students to explore technical material and course logistics interactively, building on classroom systems such as Jill Watson (2016). As LLM-based assistants are adopted in courses, responses must be grounded in instructor-approved documents, resist hallucinations and toxic outputs, present verifiable multimodal responses for visually rich STEM topics, and improve after deployment without costly re-annotation. Designing educational conversational AI that is safe, trustworthy, and continuously improvable is therefore crucial for reliable human-AI interaction in learning environments.
My thesis begins by describing the design and evaluation of a newer LLM-based Jill Watson, a modular, skill-based system that uses retrieval-augmented generation over instructor-approved course documents, with layered safeguards that substantially reduce unsupported answers and improve safety against adversarial prompts. I then developed MuDoC, which produces document-grounded interleaved text-and-image responses and lets students navigate from text and figures to the corresponding location in the source textbook for verification. A within-subjects study with Georgia Tech students (N=30) found that MuDoC led to higher perceived helpfulness, memorability, and confidence than its text-only variant TexDoC, along with greater trust in the system. After improving MuDoC based on lessons from this initial study, I conducted a larger randomized controlled trial (N=124) comparing MuDoC, TexDoC, and a semantic search tool called DocSearch over the same visually rich biology textbook. Multimodal conversational AI yielded significantly higher post-test scores than text-only conversational AI and more favorable learning-experience ratings than DocSearch. Analyses of user behavior, performance, and feedback suggest that conversationality reduces extraneous cognitive load while multimodality increases productive germane load through visual-verbal integration. In the final project, I characterized noise in LLM-generated annotations and introduced an active label correction process, which adaptively combines auto-correction, human verification, and filtering to iteratively improve discriminative modules with 17-24% fewer human annotations than the number of noisy labels in the initial dataset.
Overall, my dissertation makes theoretical, design, and empirical contributions to AI for education, multimodal AI, human-AI interaction, and LLM-based AI systems, providing frameworks for safe grounded classroom assistants, evidence for conversational multimedia learning, and methods for trustworthy, continually improvable educational AI.