Sciweavers

2715 search results - page 197 / 543
» Database Publication Practices
PVLDB
2008
13 years 7 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
RECOMB
2008
Springer
14 years 8 months ago
Automatic Parameter Learning for Multiple Network Alignment
We developed Græmlin 2.0, a new multiple network aligner with (1) a novel scoring function that can use arbitrary features of a multiple network alignment, such as protein deletion...
Jason Flannick, Antal F. Novak, Chuong B. Do, Bala...
SIGIR
2006
ACM
14 years 2 months ago
Near-duplicate detection by instance-level constrained clustering
For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information...
Hui Yang, James P. Callan
IDEAL
2005
Springer
14 years 1 month ago
Probabilistic Data Generation for Deduplication and Data Linkage
In many data mining projects the data to be analysed contains personal information, like names and addresses. Cleaning and preprocessing of such data likely involves dedu...
Peter Christen
WAIM
2009
Springer
14 years 24 days ago
IRank: A Term-Based Innovation Ranking System for Conferences and Scholars
Since the proposition of Journal Impact Factor [1] in 1963, the classical citation-based ranking scheme has been a standard criterion to rank journals and conferences. However, the...
Zhixu Li, Xiaoyong Du, Hongyan Liu, Jun He, Xiaofa...