Sciweavers

233 search results - page 22 / 47
» Clustering documents in a web directory
BMCBI
2006
13 years 7 months ago
Automatic document classification of biological literature
Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
David Chen, Hans-Michael Müller, Paul W. Ster...
KDD
2002
ACM
14 years 8 months ago
Enhanced word clustering for hierarchical text classification
In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering" ...
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...
WWW
2007
ACM
14 years 8 months ago
On building graphs of documents with artificial ants
We present an incremental algorithm for building a neighborhood graph from a set of documents. This algorithm is based on a population of artificial agents that imitate the way re...
Hanane Azzag, Julien Lavergne, Christiane Guinot, ...
DEXAW
2008
IEEE
14 years 2 months ago
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
SAC
2006
ACM
14 years 1 months ago
A scalable algorithm for high-quality clustering of web snippets
We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment th...
Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fa...