Title: Improving Access to DNS Datasets Through the Large-Scale Collection of Active-DNS Data

Date: Friday, February 24th 

Time: 2 pm - 4 pm EST

Location: Klaus 1123

 

Athanasios Kountouras

Ph.D. Candidate

School of Cybersecurity and Privacy

College of Computing

Georgia Institute of Technology

 

Committee:

Dr. Manos Antonakakis (advisor), ECE, Georgia Institute of Technology

Dr. Mustaque Ahamad, CS, Georgia Institute of Technology

Dr. Angelos Keromytis, ECE, Georgia Institute of Technology

Dr. Roberto Perdisci, CS, University of Georgia

Dr. Chaz Lever, Senior Director - Security Research, Devo Technology Inc.

 

Abstract:

The Internet has changed significantly in size, interconnectedness, speed, capability, and usability over the years. Especially after a few years of remote work and remote learning, we can safely say that the Internet is an essential resource for the modern world. However, even though the network has expanded massively since its inception, it still relies on the same fundamental technologies that form the backbone of interconnected networks. The Domain Name System (DNS) is one of those fundamental Internet technologies; its main task is to translate human-readable domain names into resources on the ever-growing network. Because nearly all Internet traffic, benign and malicious, uses DNS, the system has long been used by the security community, which has evolved along with the Internet to battle new and ever more sophisticated threats. DNS has proven to be a valuable tool in that effort.

 

Studying the Domain Name System helps us understand how it can be abused and how it can also be a great tool in combating abuse. For DNS to be useful to Internet defenders, however, they need access to quality datasets for identifying malicious behavior, building detection models, and evaluating and running those models on real-world data, among other tasks. Such datasets will enable the development of new algorithms and methodologies that can assist with the early detection and tracking of modern Internet threats and with understanding their overall lifetime.

 

To that end, this thesis presents the concept of Active DNS data collection through a distributed querying infrastructure. More specifically, we show how this new public dataset, which we name Active DNS, compares against traditionally used passive DNS datasets, and we document the unique features of our system that enable it to function as an alternative to passive DNS data. We then demonstrate the ability of Active DNS data to detect online abuse by using it to expand on already known malicious web infrastructure and potentially identify new abusive infrastructure before it is even used. Finally, we show how our distributed querying system, Thales, allows us to study the operational aspects of the global DNS infrastructure, specifically by investigating the proliferation of a new DNS extension and measuring its impact and efficacy through active probing.