The quality of a local search engine, such as Google and Bing Maps, heavily relies on its geographic datasets. Typically, these datasets are obtained from multiple sources, e.g., ...
In a recent paper by Hellerstein [15], a tight relationship was conjectured between the number of strata of a Datalog¬ program and the number of “coordination stages” require...
In the standard formalization of supervised learning problems, a datum is represented as a vector of features without prior knowledge about relationships among features. However, ...
We propose a model for user purchase behavior in online stores that provide recommendation services. We model the purchase probability given recommendations for each user based on...
Data cleaning is the process of correcting anomalies in a data source, that may for instance be due to typographical errors, or duplicate representations of an entity. It is a cruc...