Sciweavers

2117 search results - page 338 / 424
» A Competitive Term Selection Method for Information Retrieva...
CIKM
2011
Springer
14 years 3 months ago
Probabilistic near-duplicate detection using simhash
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Sadhan Sood, Dmitri Loguinov
CIKM
2008
Springer
15 years 5 months ago
Identifying table boundaries in digital documents via sparse line detection
Most prior work on information extraction has focused on extracting information from text in digital documents. However, often, the most important information being reported in an...
Ying Liu, Prasenjit Mitra, C. Lee Giles
ADBIS
2006
Springer
15 years 9 months ago
Multi-source Materialized Views Maintenance: Multi-level Views
In many information systems, the databases that make up the system are distributed in different modules or branch offices according to the requirements of the business enterprise. ...
Josep Silva, Jorge Belenguer, Matilde Celma
CIKM
2007
Springer
15 years 10 months ago
Randomized metric induction and evolutionary conceptual clustering for semantic knowledge bases
We present an evolutionary clustering method which can be applied to multi-relational knowledge bases storing resource annotations expressed in the standard languages for the Sema...
Nicola Fanizzi, Claudia d'Amato, Floriana Esposito
WWW
2007
ACM
16 years 4 months ago
Combining classifiers to identify online databases
We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database...
Luciano Barbosa, Juliana Freire