Title: Visual Analytics for Trustworthy Large Language Models in Education
Date: Thursday, August 22, 2024
Time: 9 a.m. - 11 a.m. ET (US)
Location: Technology Square Research Building (TSRB) 334 (third floor conference room; just walk in, no special access needed)
Virtual meeting: Click here to join Zoom meeting
Adam Coscia
Ph.D. Student in Human-Centered Computing
School of Interactive Computing
Georgia Institute of Technology
Committee
Dr. Alex Endert (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. Duen Horng (Polo) Chau - School of Computational Science & Engineering, Georgia Institute of Technology
Dr. Cindy Xiong Bearfield - School of Interactive Computing, Georgia Institute of Technology
Dr. Yalong Yang - School of Interactive Computing, Georgia Institute of Technology
Dr. Scott Crossley - Department of Special Education, Vanderbilt University
Abstract
Developers in education technology and the learning sciences are rapidly integrating transformer-based large language models (LLMs) such as ChatGPT into novel adaptive learning tools for improving online education. Both scalable and generalizable, LLMs enable adaptive learning in a variety of ways, from user-facing interfaces such as conversational chatbots for enhanced learning and feedback to time-saving behind-the-scenes grading and content moderation. However, the often unpredictable behavior of LLMs has also introduced pedagogical risks and harms, such as responding with misinformation and discriminatory language in conversation and biasing scores against individuals when used for grading. As a result, multiple stakeholders in education, including developers, instructors, and learners, are distrustful of LLMs in educational technologies, inhibiting the adoption of the transformative advances in adaptive learning that LLMs enable.
One of the greatest barriers to building trust in safely using LLMs for education is a lack of tools that help stakeholders understand what LLMs are capable of and how they might impact learning outcomes. To address these concerns, we propose to investigate how to build tools that establish trustworthy LLMs in education. The goal of this dissertation is to enable developers, instructors, and learners to calibrate their trust in LLMs through novel visual analytics tools that help developers first evaluate the trustworthiness of LLMs in education and then communicate the results of that evaluation to non-technical stakeholders. We believe the use cases, study findings, and lessons learned from this work will inspire new techniques and advances in developing novel visualizations that help establish trustworthy LLMs in education.
The approach of this work is twofold:
- First, we developed visual analytics tools for developers (KnowledgeVIS and iScore) that helped them evaluate the trustworthiness of the LLMs embedded in their educational technology. We identified the challenges and tasks developers face when evaluating LLMs within their technical workflows for building LLM-powered educational technologies, then evaluated how effectively our designs help developers understand, evaluate, and build trust in how LLMs work (a brief sketch of the kind of prompt-level probing these tools support follows this list).
- Second, we propose to create a visual analytics toolkit that helps developers communicate their calibrated trust to instructors and learners. We will curate a set of useful metrics for measuring the trustworthiness of LLMs in the context of education, implement those metrics in a visual analytics toolkit for building solutions such as dashboards, and study both the usability of the toolkit with developers and the effectiveness of our visualizations for communicating trust to stakeholders.
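To ground the first thread, the following is a minimal sketch of the kind of fill-in-the-blank prompt probing a developer might run when auditing an LLM's associations, which tools such as KnowledgeVIS surface and compare visually. It is illustrative only: it assumes the Hugging Face transformers library and a generic masked language model (bert-base-uncased), and the prompts are hypothetical, not drawn from the dissertation.

# Minimal sketch: probe a masked language model with paired fill-in-the-blank
# prompts and inspect its top predictions. The model name and prompts are
# illustrative assumptions, not the dissertation's actual setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

prompts = [
    "A student who struggles with math is [MASK].",
    "A student who excels at math is [MASK].",
]

for prompt in prompts:
    print(prompt)
    for pred in fill_mask(prompt, top_k=5):
        # Each prediction carries the filled-in token and its probability.
        print(f"  {pred['token_str']:>12}  {pred['score']:.3f}")

Comparing predictions across paired prompt variants like these is the kind of analysis such tools turn into visual summaries, helping developers spot skewed or discriminatory associations before an LLM is deployed in a learning tool.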