
When your big data seems too small: accurate inferences beyond the empirical distribution
August 23, 2018 @ 6:00 pm - 8:00 pm PDT
Speaker: Gregory Valiant, Stanford University
Abstract: We discuss several problems related to the general challenge of making accurate inferences about a complex phenomenon in the regime where the amount of available data (i.e., the sample size) is too small for the empirical distribution of the samples to accurately represent the phenomenon in question. We show that in several fundamental and practically relevant settings, including estimating the covariance structure of a high-dimensional distribution and learning a population of distributions given few data points from each individual, it is possible to "denoise" the empirical distribution significantly. We will also discuss the problem of estimating the "learnability" of a dataset: given too little labeled data to train an accurate model, we show that it is often possible to estimate the extent to which a good model exists. Framed differently, even in the regime where there is insufficient data to learn, it is possible to estimate the performance that could be achieved if additional data (drawn from the same source) were obtained. Our results, while theoretical, have a number of practical applications, some of which we will also discuss.
Biography: Gregory Valiant is an assistant professor of Computer Science at Stanford University. His current research interests span algorithms, statistics, and machine learning, with an emphasis on developing algorithms and information-theoretic lower bounds for a variety of fundamental data-centric tasks. Recently, this work has also included questions of how to robustly extract meaningful information from untrusted datasets that might contain a significant fraction of corrupted or arbitrarily biased data points. Before joining Stanford, Gregory completed his PhD at UC Berkeley in 2012 and was a postdoctoral researcher at Microsoft Research New England. He has received several honors, including the ACM Doctoral Dissertation Award Honorable Mention, an NSF CAREER Award, and a Sloan Foundation Fellowship.