We consider the problem of constructing decision trees for entity identification from a given relational table. The input is a table containing information about a set of entities...
Venkatesan T. Chakaravarthy, Vinayaka Pandit, Samb...
The structure of a customer communication network provides a natural way to understand customers’ relationships. Traditional customer relationship management (CRM) methods focu...
The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link ana...
Joins are essential for many data analysis tasks, but are not supported directly by the MapReduce paradigm. While there has been progress on equi-joins, implementation of join alg...
Sampling is a popular method of data collection when it is impossible or too costly to reach the entire population. For example, television show ratings in the United States are g...