Sciweavers

318 search results - page 58 / 64
» Mining data records in Web pages
ICDE
2004
IEEE
14 years 9 months ago
Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks
We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of impo...
Torsten Suel, Patrick Noel, Dimitre Trendafilov
ICDE
2009
IEEE
14 years 10 months ago
Top-k Set Similarity Joins
Similarity join is a useful primitive operation underlying many applications, such as near duplicate Web page detection, data integration, and pattern recognition. Tradi...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Haichuan Sh...
ECIR
2010
Springer
13 years 7 months ago
Biometric Response as a Source of Query Independent Scoring in Lifelog Retrieval
Personal lifelog archives contain digital records captured from an individual’s daily life, e.g. emails, web pages downloaded and SMSs sent or received. While capturing this info...
Liadh Kelly, Gareth J. F. Jones
KDD
2009
ACM
14 years 9 months ago
Combining link and content for community detection: a discriminative approach
In this paper, we consider the problem of combining link and content analysis for community detection from networked data, such as paper citation networks and the World Wide Web. Most ...
Tianbao Yang, Rong Jin, Yun Chi, Shenghuo Zhu
KDD
2007
ACM
14 years 8 months ago
Exploiting underrepresented query aspects for automatic query expansion
Users attempt to express their search goals through web search queries. When a search goal has multiple components or aspects, documents that represent all the aspects are likely ...
Daniel Crabtree, Peter Andreae, Xiaoying Gao