Sciweavers

35 search results - page 7 / 7
» Document centered approach to text normalization
HPDC
2003
IEEE
14 years 1 month ago
PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities
We present PlanetP, a peer-to-peer (P2P) content search and retrieval infrastructure targeting communities wishing to share large sets of text documents. P2P computing is...
Francisco Matias Cuenca-Acuna, Christopher Peery, ...
WSDM
2010
ACM
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
SIGIR
2009
ACM
14 years 2 months ago
Addressing morphological variation in alphabetic languages
The selection of indexing terms for representing documents is a key decision that limits how effective subsequent retrieval can be. Often stemming algorithms are used to normaliz...
Paul McNamee, Charles K. Nicholas, James Mayfield
SIGIR
2008
ACM
13 years 7 months ago
Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization
Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and mach...
Dingding Wang, Tao Li, Shenghuo Zhu, Chris H. Q. D...
ISI
2007
Springer
14 years 2 months ago
Mining Higher-Order Association Rules from Distributed Named Entity Databases
The burgeoning amount of textual data in distributed sources combined with the obstacles involved in creating and maintaining central repositories motivates the need for effective ...
Shenzhi Li, Christopher D. Janneck, Aditya P. Bela...