Title: Compiler and Machine Learning-based Predictive Techniques for Security Enhancement through Software Debloating

Date: Monday, July 24, 2023

Time: 3:00pm - 5:00pm

Location: Klaus 2347 and Zoom

 

Chris Porter

Ph.D. Candidate in Computer Science

School of Computer Science

Georgia Institute of Technology

 

Committee

Dr. Santosh Pande (advisor)

Professor

Associate Chair for Graduate Studies

School of Computer Science

Georgia Institute of Technology

 

Dr. Rajiv Gupta

Distinguished Professor

Amrik Singh Poonian Professor of Computer Science

Associate Dean for Academic Personnel, BCOE

Department of Computer Science and Engineering

University of California, Riverside

 

Dr. Alessandro Orso

Professor

Associate Dean, College of Computing

School of Computer Science

Georgia Institute of Technology

 

Dr. Vivek Sarkar

Professor

Stephen Fleming Chair for Telecommunications

Chair, School of Computer Science

School of Computer Science

Georgia Institute of Technology

 

Dr. Qirun Zhang

Assistant Professor

School of Computer Science

Georgia Institute of Technology

 

 

Abstract

Code reuse attacks continue to be a serious threat to software. Attackers today can piece together short sequences of instructions in otherwise benign code to carry out malicious actions. Eliminating these reusable code snippets, known as gadgets, has become one of the prime focuses of attack surface reduction research. The aim is to break chains of gadgets, thereby making such code reuse attacks impossible or substantially less common. Recent work on attack surface reduction has attempted to eliminate these attacks by subsetting the application, e.g., via user-specified inputs, configurations, or features, to achieve high gadget reductions. However, such approaches either sacrifice soundness (meaning the software might crash or produce incorrect output during attack-free executions on regular inputs), or they are conservative and leave a large amount of the attack surface unaddressed. This thesis develops three techniques that combine static analysis with dynamic, machine learning (ML)-based prediction to address these shortcomings. They are fully sound, achieve strong gadget reductions, and are shown to break shell-spawning gadget chains and stop real-world attacks arising from known Common Vulnerabilities and Exposures (CVEs). The techniques reduce the attack surface by activating a (minimal) set of functions at chosen callsites and then deactivating them upon return.
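To make the activate/deactivate mechanism concrete, the following is a minimal C sketch, not the thesis artifact: it assumes an mprotect-style permission toggle, and the helper name, page arguments, and wrapper are all illustrative.

    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical helper: flip the execute permission on the code
     * pages that back a callee (page start rounded down to a
     * page boundary). */
    static void set_code_exec(void *code, size_t len, int executable)
    {
        long pagesz = sysconf(_SC_PAGESIZE);
        uintptr_t start = (uintptr_t)code & ~((uintptr_t)pagesz - 1);
        int prot = executable ? (PROT_READ | PROT_EXEC) : PROT_READ;
        mprotect((void *)start, len, prot);
    }

    /* Compiler-inserted wrapper at a chosen callsite: the callee's
     * pages are executable only for the duration of the call. */
    static int gated_call(int (*callee)(int), void *code, size_t len, int arg)
    {
        set_code_exec(code, len, 1);   /* activate before the call   */
        int ret = callee(arg);
        set_code_exec(code, len, 0);   /* deactivate upon return     */
        return ret;
    }

Any gadget living in a deactivated region is unreachable under this scheme, since jumping into non-executable pages faults rather than executes.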

 

In the first work, BlankIt, we target library code and achieve ~97% attack surface reduction. The technique uses the arguments to library function calls and their static single assignment (SSA)-based backward slices to train an ML model, which then predicts the reachable functions at each callsite from runtime values. In particular, we are able to debloat GNU libc, which is notorious for housing gadgets used in code reuse attacks. In the second work, Decker, we target application code and achieve ~73% total gadget reduction. The percentage reduction is similar to prior art but does not sacrifice soundness. Decker instruments the program at compile time at key points to enable and disable code pages; at runtime, the framework executes these permission-mapping calls with minimal overhead (~5%). In the third work, PDG, we show how to augment the whole-application technique with an accurate predictor to further shrink the potential attack surface. ML-based predictive techniques offer no guarantees and suffer from mispredictions; the predictions are therefore sanitized with lightweight checks. The checks rely on statically derived ensue relations (i.e., valid call-sequence relations), which are used to separate mispredictions from actual attacks. PDG achieves ~83% total gadget reduction with ~11% runtime overhead, and its predictions trigger runtime checking in only ~4% of cases.
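As a rough illustration of how an ensue-relation check can separate a benign misprediction from an attack, here is a hedged C sketch; the toy relation, function IDs, and handler are hypothetical and do not reflect PDG's actual interface.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NFUNCS 4  /* toy program: main=0, parse=1, exec=2, log=3 */

    /* Hypothetical statically derived ensue relation:
     * ensue[a][b] is true iff some valid call sequence allows
     * function b to follow function a. */
    static const bool ensue[NFUNCS][NFUNCS] = {
        /* main   parse  exec   log   */
        {  false, true,  true,  true  },  /* from main  */
        {  false, false, true,  true  },  /* from parse */
        {  false, false, false, true  },  /* from exec  */
        {  false, false, false, false },  /* from log   */
    };

    /* Invoked when control reaches a function the ML model did not
     * predict: a valid ensue pair is a tolerable misprediction,
     * while an invalid one is treated as an attack. */
    static void check_unpredicted_call(int caller, int callee)
    {
        if (ensue[caller][callee]) {
            fprintf(stderr, "misprediction: activating function %d\n",
                    callee);
            /* ...map the callee's pages in and continue... */
        } else {
            fprintf(stderr, "no valid call sequence %d -> %d: aborting\n",
                    caller, callee);
            abort();
        }
    }

The point of the check is that it only has to be cheap and sound: a rare false alarm from the predictor costs one table lookup, while a call that no legal sequence could produce is stopped outright.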

 

In conclusion, the thesis empirically shows that it is possible to devise precise and sound attack surface reduction techniques by combining static analysis and ML in a way that overcomes the inherent limitations of each. ML-based prediction aids purely static analysis by improving its precision, and static techniques augment the ML models by providing mechanisms to identify when a misprediction is truly an attack.