Title: Improving Access to DNS Datasets Through the Large-Scale Collection of Active-DNS Data
Date: Friday, February 24th
Time: 2 pm - 4 pm EST
Location: Klaus 1123
Athanasios Kountouras
Ph.D. Candidate
School of Cybersecurity and Privacy
College of Computing
Georgia Institute of Technology
Committee:
Dr. Manos Antonakakis (advisor), ECE, Georgia Institute of Technology
Dr. Mustaque Ahamad, CS, Georgia Institute of Technology
Dr. Angelos Keromytis, ECE, Georgia Institute of Technology
Dr. Roberto Perdisci, CS, University of Georgia
Dr. Chaz Lever, Senior Director - Security Research, Devo Technology Inc.
Abstract:
The Internet has changed significantly in size, interconnectedness, speed, capability, and usability over the years. Especially after a few years of remote work and remote learning, we can safely say that the Internet is an essential resource for the modern world. However, even though the network has expanded massively since its inception, it still relies on the same fundamental technologies that form the backbone of interconnected networks. The Domain Name System (DNS) is one of those fundamental Internet technologies; its main task is to translate human-readable domain names into resources on the ever-growing network. Because nearly all Internet traffic, benign and malicious, utilizes DNS, the security community has long relied on the system, evolving along with the Internet to battle new and ever more sophisticated threats. DNS has proven to be a valuable tool in that effort.
Studying the Domain Name System helps us understand how it can be abused and how it can also be a great tool in combating abuse. For DNS to be useful to Internet defenders, however, they need access to quality datasets for identifying malicious behavior, building detection models, and evaluating and running those models on real-world data, among other tasks. Such datasets will enable the development of new algorithms and methodologies that can assist with the early detection and tracking of modern Internet threats throughout their lifetime.
To that end, this thesis presents the concept of Active DNS data collection through a distributed querying infrastructure. More specifically, we show how this new public dataset, which we name Active DNS, compares against traditionally utilized passive DNS datasets, and we document our system’s unique features that enable it to function as an alternative to passive DNS data. We then demonstrate the ability of Active DNS data to detect online abuse by utilizing it to amplify already known malicious web infrastructure and potentially identify new abusive infrastructure before it is even used. Finally, we show how our distributed querying system, Thales, allows us to study the operational aspects of the global DNS infrastructure, specifically investigating the proliferation of a new DNS extension and measuring its impact and efficacy through active probing.