Sciweavers

CIKM
2009
Springer
Robust record linkage blocking using suffix arrays
Record linkage is an important data integration task that has many practical uses for matching, merging and duplicate removal in large and diverse databases. However, a quadratic ...
Timothy de Vries, Hui Ke, Sanjay Chawla, Peter Chr...
CIKM
2009
Springer
Efficient feature weighting methods for ranking
Feature weighting or selection is a crucial process to identify an important subset of features from a data set. Removing irrelevant or redundant features can improve the generali...
Hwanjo Yu, Jinoh Oh, Wook-Shin Han
CIKM
2009
Springer
Classification-based resource selection
In some retrieval situations, a system must search across multiple collections. This task, referred to as federated search, occurs for example when searching a distributed index o...
Jaime Arguello, Jamie Callan, Fernando Diaz
CIKM
2009
Springer
Generating SQL/XML query and update statements
The XML support in relational databases and the SQL/XML language are still relatively new as compared to purely relational databases and traditional SQL. Today, most database user...
Matthias Nicola, Tim Kiefer
CIKM
2009
Springer
Scalable indexing of RDF graphs for efficient join processing
Current approaches to RDF graph indexing suffer from weak data locality, i.e., information regarding a piece of data appears in multiple locations, spanning multiple data structur...
George H. L. Fletcher, Peter W. Beck
CIKM
2009
Springer
Suffix trees for very large genomic sequences
A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when it comes to the use of suffix trees in real-life applications, the current metho...
Marina Barsky, Ulrike Stege, Alex Thomo, Chris Upt...
CIKM
2009
Springer
Efficient processing of group-oriented connection queries in a large graph
We study query processing in large graphs, which are a fundamental data model underpinning various social networks and Web structures. Given a set of query nodes, we aim to find the g...
James Cheng, Yiping Ke, Wilfred Ng
CIKM
2009
Springer
Ensembles in adversarial classification for spam
The standard method for combating spam, either in email or on the web, is to train a classifier on manually labeled instances. As the spammers change their tactics, the performanc...
Deepak Chinavle, Pranam Kolari, Tim Oates, Tim Fin...
CIKM
2009
Springer
Empirical justification of the gain and discount function for nDCG
The nDCG measure has proven to be a popular measure of retrieval effectiveness utilizing graded relevance judgments. However, a number of different instantiations of nDCG exist, d...
Evangelos Kanoulas, Javed A. Aslam
CIKM
2009
Springer
Improving binary classification on text problems using differential word features
We describe an efficient technique to weigh word-based features in binary classification tasks and show that it significantly improves classification accuracy on a range of proble...
Justin Martineau, Tim Finin, Anupam Joshi, Shamit ...