Sciweavers

59 search results - page 10 / 12
» Web Document Clustering: A Feasibility Demonstration
SIGMOD
2007
ACM
Supporting entity search: a large-scale prototype search engine
As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are increasingly inadequate. While we often search for various ...
Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang
ICDE
2003
IEEE
CLUSEQ: Efficient and Effective Sequence Clustering
Analyzing sequence data has become increasingly important recently in the area of biological sequences, text documents, web access logs, etc. In this paper, we investigate the pro...
Jiong Yang, Wei Wang 0010
ECIR
2004
Springer
Performance Analysis of Distributed Architectures to Index One Terabyte of Text
We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of r...
Fidel Cacheda, Vassilis Plachouras, Iadh Ounis
ICCBR
2001
Springer
Mining High-Quality Cases for Hypertext Prediction and Prefetching
Case-based reasoning aims to use past experience to solve new problems. A strong requirement for its application is that an extensive experience base exists that provides statisticall...
Qiang Yang, Ian Tian Yi Li, Henry Haining Zhang
WSDM
2010
ACM
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...