Record linkage is an important data integration task that has many practical uses for matching, merging and duplicate removal in large and diverse databases. However, a quadratic ...
Timothy de Vries, Hui Ke, Sanjay Chawla, Peter Chr...
Feature weighting or selection is a crucial process to identify an important subset of features from a data set. Removing irrelevant or redundant features can improve the generali...
In some retrieval situations, a system must search across multiple collections. This task, referred to as federated search, occurs for example when searching a distributed index o...
The XML support in relational databases and the SQL/XML language are still relatively new as compared to purely relational databases and traditional SQL. Today, most database user...
Current approaches to RDF graph indexing suffer from weak data locality, i.e., information regarding a piece of data appears in multiple locations, spanning multiple data structur...
A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when it comes to the use of suffix trees in real-life applications, the current metho...
Marina Barsky, Ulrike Stege, Alex Thomo, Chris Upt...
We study query processing in large graphs that are fundamental data model underpinning various social networks and Web structures. Given a set of query nodes, we aim to find the g...
The standard method for combating spam, either in email or on the web, is to train a classifier on manually labeled instances. As the spammers change their tactics, the performanc...
Deepak Chinavle, Pranam Kolari, Tim Oates, Tim Fin...
The nDCG measure has proven to be a popular measure of retrieval effectiveness utilizing graded relevance judgments. However, a number of different instantiations of nDCG exist, d...
We describe an efficient technique to weigh word-based features in binary classification tasks and show that it significantly improves classification accuracy on a range of proble...
Justin Martineau, Tim Finin, Anupam Joshi, Shamit ...