Sciweavers

233 search results - page 42 / 47
» Clustering documents in a web directory
WWW
2006
ACM
14 years 8 months ago
Discovering event evolution graphs from newswires
In this paper, we propose an approach to automatically mine event evolution graphs from newswires on the Web. An event evolution graph is a directed graph in which the vertices and e...
Christopher C. Yang, Xiaodong Shi
ECIR
2004
Springer
13 years 9 months ago
Performance Analysis of Distributed Architectures to Index One Terabyte of Text
We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of r...
Fidel Cacheda, Vassilis Plachouras, Iadh Ounis
DGO
2006
13 years 9 months ago
Next steps in near-duplicate detection for eRulemaking
Large-volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...
Hui Yang, Jamie Callan, Stuart W. Shulman
EDBT
2002
ACM
14 years 7 months ago
Cut-and-Pick Transactions for Proxy Log Mining
Web logs collected by proxy servers, referred to as proxy logs or proxy traces, contain information about Web document accesses by many users against many Web sites. This "man...
Wenwu Lou, Guimei Liu, Hongjun Lu, Qiang Yang
KDD
2007
ACM
14 years 7 months ago
Exploiting underrepresented query aspects for automatic query expansion
Users attempt to express their search goals through web search queries. When a search goal has multiple components or aspects, documents that represent all the aspects are likely ...
Daniel Crabtree, Peter Andreae, Xiaoying Gao