Sciweavers

SIGIR 2004, ACM
Constructing a text corpus for inexact duplicate detection
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Jack G. Conrad, Cindy P. Schriber
SIGIR 2004, ACM
Parameterized generation of labeled datasets for text categorization based on a hierarchical directory
Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...
Dmitry Davidov, Evgeniy Gabrilovich, Shaul Markovi...
HT 2009, ACM
Retrieving broken web links using an approach based on contextual information
In this short note we present a recommendation system for automatic retrieval of broken Web links using an approach based on contextual information. We extract information from th...
Juan Martinez-Romo, Lourdes Araujo
WWW 2009, ACM
Discovering user profiles
In this paper we describe techniques for the discovery and construction of user profiles. Leveraging the emergent data web, our system addresses the problem of sparseness of ...
Riddhiman Ghosh, Mohamed Dekhil
WWW 2007, ACM
BlogScope: spatio-temporal analysis of the blogosphere
We present BlogScope (www.blogscope.net), a system for analyzing the Blogosphere. BlogScope is an information discovery and text analysis system that offers a set of unique featur...
Nilesh Bansal, Nick Koudas