Title: Enabling Parallelism and Optimizations in Data Mining Algorithms
Ph.D. Student in Computer Science
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Friday, Nov 15, 2019
Time: 2:00 PM - 4:00 PM ET
Location: KACB 3100
Dr. Vivek Sarkar (Advisor, School of Computer Science, Georgia Tech)
Dr. Anshumali Shrivastava (Co-advisor, Department of Computer Science, Rice University)
Dr. Hyesoon Kim (School of Computer Science, Georgia Tech)
Dr. Santosh Pande (School of Computer Science, Georgia Tech)
Dr. Richard Vuduc (School of Computational Science and Engineering, Georgia Tech)
In this era of big data, one of the most crucial fields in Computer Science is data mining. Extracting meaningful information from a large amount of data in a reasonable time became possible mainly through two types of advancements - a) algorithmic, such as fast approximate algorithms and efficient learning algorithms, and b) architectural, such as machines with massive compute power involving distributed multi-core processors and high throughput accelerators. With the end of Dennard scaling, the free lunch in performance improvement through processor frequency increases ended, and multiple cores became a practical solution to gain performance. In current-generation processors, parallelism is ubiquitous, and thus parallel algorithms are a necessary means to utilize computing resources efficiently. In our work, we address the following questions - how well do the critical data mining algorithms of current interest fit with today's parallel architectures, which algorithmic and mapping opportunities can we leverage to further improve performance, and what are the challenges and gains in such efforts?
First, we investigate a classical application in large scale data stream processing - the frequency estimation problem - for GPU scale parallelism. Among the approximate solutions with sublinear memory requirements, Sketching algorithms are a popular choice for this task due to their desirable trade-off between estimation accuracy and space-time efficiency. However, most of the past work on sketch-based frequency estimation focused on CPU implementations, with little exploration of throughput-optimized parallel architectures such as GPUs. In our work, we propose a novel approach for sketches, which exploits the natural skewness in the data to efficiently utilize the massive amounts of parallelism available in modern GPUs.
Next, we explore one of the most common and important tasks in data mining - identification of most frequent elements - on modern distributed settings with both multi-node as well as multi-core parallelism. In this work, we start by observing that the existing approximate algorithms, although theoretically sound, are suboptimal from the performance perspective. Specifically, for identifying top-K frequent items, Count-Min Sketch (CMS) has an excellent update time but lacks the important property of reducibility, which is needed for exploiting available massive data parallelism. On the other end, the popular Frequent Algorithm (FA) leads to reducible summaries, but the update costs are high. Our approach Topkapi, a fast and parallel algorithm for finding top-K frequent items, gives the best of both worlds, i.e., it is reducible like FA and has an efficient update time similar to CMS. In addition, Topkapi possesses strong theoretical guarantees and leads to significant performance gains, relative to past work.
Finally, we explore opportunities for improvement in Word2Vec, a word embedding method. Word embedding is an essential application for Machine learning and Natural Language Processing, since distributed representations of words significantly improve accuracy in text processing applications, such as machine translation, sentiment analysis, and query answering. This time, we target Single Instruction Multiple Data parallelism. With increasing vector length of commodity CPUs, such as AVX-512 with a vector length of 512 bits, efficient vector processing unit utilization becomes a major performance game-changer. So, we look into the performance landscape for current Word2Vec approaches on CPUs and propose new ideas to overcome their shortcomings.