notice
Seminar by Dr. Samuel McCauley (IT University of Copenhagen)
Speaker: Dr. Samuel McCauley (IT University of Copenhagen)
Title: Faster Algorithms for Approximate Membership and Similarity Search
Date: Monday February 12th, 2018
Time: 10:30am-12pm
Room: EV 3.309
ABSTRACT
In the last few years, more and more of computer science has been based on processing extremely large amounts of data. This has lead to a challenge for algorithm designers: how do we store an ever-growing volume of data so that it can be queried quickly and efficiently?
In this talk, I will look at two fundamental problems: membership search (where we want to determine if a query point is already present in a database), and similarity search (finding the closest database point to a given query). These problems can be solved effectively using optimal data structures: Bloom Filters and MinHash, respectively. However, it turns out that these data structures can be improved significantly using simple, realistic assumptions on the type of data being queried. For similarity search, I will give a data structure that can efficiently query data from a skewed distribution. This leads to distribution-dependent asymptotic bounds that are significantly better than those of MinHash. For membership search, I will describe the first adaptive Bloom Filter, which is able to fix mistakes it made in the past without any extra asymptotic cost. These fixed mistakes improve the security of the data structure against denial of service attacks, and can significantly improve performance on query sets with large numbers of repeats. Along the way, I will give the first known Bloom Filter with deamortized constant-time insert and delete cost, a crucial building block to the adaptive data structure.
BIO
Samuel McCauley is a post-doc at IT University of Copenhagen. He received his PhD from Stony Brook University, advised by Michael Bender, and his B.A. from Tufts University. He is interested in algorithms and the theory of computation, particularly in problems involving searching through large amounts of data. He is the recipient of the best paper award at IPDPS 2015, a Chateaubriand research fellowship, and an NSF EAPSI fellowship.