Approximate computation and implicit regularization for very large-scale data analysis

12 years 9 months ago


Database theory and database practice are typically the domain of computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the more statistical perspective adopted by statisticians, scientific computers, machine learners, and others who work on what may be broadly termed statistical data analysis. In this article, I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies...

Michael W. Mahoney
