Title: EFFECTIVE AUTOMATION OF TEST CODE GENERATION FOR REST APIS WITH MACHINE LEARNING AND LANGUAGE MODELS

Date: August 9th, 2024

Time: 1:00 PM - 2:30 PM EST

Location: Virtual (https://gatech.zoom.us/my/codingsoo)

 

Committee:

Dr. Alessandro Orso (Advisor) - School of Computer Science, Georgia Institute of Technology

Dr. Qirun Zhang - School of Computer Science, Georgia Institute of Technology

Dr. Spencer Rugaber - College of Computing, Georgia Institute of Technology

Dr. Manish Motwani - School of Electrical Engineering and Computer Science, Oregon State University

Dr. Saurabh Sinha - IBM Research

 

Abstract

REST APIs are pivotal in modern web services, offering standardized and flexible access to web resources. Despite the support provided by the OpenAPI Specification, current black-box REST API testing tools often achieve limited code coverage and fault detection. My analysis of ten state-of-the-art tools across twenty RESTful web services revealed significant limitations in both areas. This study highlights the need for further improvement and identifies four methodological enhancements that increase the effectiveness of automated black-box REST API testing tools.

 

First, NLP2REST enhances the OpenAPI Specification by augmenting it with machine-readable rules derived from its human-readable sections. This approach uses Natural Language Processing (NLP) techniques to systematically extract actionable rules from the natural language descriptions within OpenAPI documents, significantly improving test case generation.
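
As a rough illustration of what turning a human-readable description into a machine-readable rule can look like, the sketch below uses a plain regular expression instead of the NLP techniques that NLP2REST relies on. The function name `extract_range_rule` and the phrase pattern are hypothetical; only the `minimum` and `maximum` keywords come from the OpenAPI schema vocabulary.

```python
import re

# Hypothetical pattern: matches phrases such as "between 1 and 100".
# This is a stand-in for real NLP-based extraction, not NLP2REST's method.
RANGE_PATTERN = re.compile(r"between\s+(-?\d+)\s+and\s+(-?\d+)", re.IGNORECASE)


def extract_range_rule(description):
    """Return OpenAPI-style numeric bounds found in a parameter description."""
    match = RANGE_PATTERN.search(description)
    if match is None:
        # No recognizable range phrase: add no constraints.
        return {}
    low, high = int(match.group(1)), int(match.group(2))
    # Order the bounds so that minimum <= maximum even if the text is reversed.
    return {"minimum": min(low, high), "maximum": max(low, high)}


if __name__ == "__main__":
    print(extract_range_rule("Page size must be between 1 and 100."))
```

A test generator could merge the returned dictionary into the parameter's schema so that generated values respect the documented range.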

 

Second, RESTGPT utilizes Large Language Models (LLMs) to refine the specifications, improving rule extraction accuracy and generating realistic inputs for REST API testing. This approach employs structured prompting techniques to detect nuanced constraints and tailor the generation of contextually appropriate values for API parameters.

 

Third, ARAT-RL, a novel black-box REST API testing tool, employs Reinforcement Learning to enhance testing strategies based on feedback from API responses. Using the Q-Learning algorithm, ARAT-RL prioritizes operations and parameters that frequently fail, increasing test effectiveness and efficiency. It dynamically stores key-value pairs from request and response data to inform inputs for other parameters, enhancing code coverage and accelerating the testing process.
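
The prioritization idea can be sketched as a single-state, bandit-style simplification of the Q-learning update. This is not ARAT-RL's implementation: the class name, the reward scheme (+1 for a non-2xx response, -1 for a 2xx response), and the hyperparameter values are all assumptions chosen for illustration.

```python
import random


class OperationPrioritizer:
    """Toy Q-learning-style prioritizer over API operations (illustrative only)."""

    def __init__(self, operations, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = {op: 0.0 for op in operations}  # one Q-value per operation
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        self.seen = {}          # key-value pairs observed in requests/responses

    def select(self):
        """Epsilon-greedy choice: usually the operation with the highest Q-value."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, op, status_code):
        """Apply Q(a) += alpha * (r + gamma * max Q - Q(a)) with an assumed reward."""
        # Assumed reward: failures are rewarded so failing operations are retried.
        reward = -1.0 if 200 <= status_code < 300 else 1.0
        best_next = max(self.q.values())
        self.q[op] += self.alpha * (reward + self.gamma * best_next - self.q[op])

    def remember(self, payload):
        """Store key-value pairs so later requests can reuse observed values."""
        self.seen.update(payload)

    def suggest(self, param):
        """Return a previously observed value for a parameter name, if any."""
        return self.seen.get(param)
```

In this sketch, an operation that keeps returning errors accumulates a higher Q-value and is selected more often, while values such as an `id` returned by one operation can be fed back as input to another through `remember` and `suggest`.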

 

Finally, LlamaRestTest integrates quantized and fine-tuned Small Language Models (SLMs) to generate realistic values and resolve parameter dependencies. Designed for low-cost CPUs, this approach significantly outperforms existing tools in API operation coverage, code coverage, and fault detection.

 

Building on these advancements, I propose a comprehensive approach that combines novel black-box and white-box techniques for REST API testing. For black-box testing, a Multi-Agent Reinforcement Learning (MARL) system, integrated with an efficient Semantic Operation Dependency Graph (SODG) and LLMs, dynamically optimizes the testing process without requiring access to internal code. For white-box testing, ASTER-IT (Automated Test Case Generator for Integration Testing) employs a multi-agent system to analyze the internal structure of the API, generate comprehensive test cases, and identify potential vulnerabilities. By incorporating knowledge of the API's implementation details, ASTER-IT generates more targeted and effective test cases.