Sciweavers

572 search results - page 82 / 115
» Winnowing-based text clustering
DOCENG
2005
ACM
13 years 9 months ago
Structuring documents according to their table of contents
In this paper, we present a method for structuring a document according to the information present in its Table of Contents. The detection of the ToC as well as the determination ...
Hervé Déjean, Jean-Luc Meunier
LREC
2008
13 years 9 months ago
Identifying Strategic Information from Scientific Articles through Sentence Classification
We address here the need to assist users in rapidly accessing the most important or strategic information in the text corpus by identifying sentences carrying specific information...
Fidelia Ibekwe-Sanjuan, Chaomei Chen, Roberto Pinh...
DOCENG
2010
ACM
13 years 8 months ago
Picture detection in document page images
We present a method for picture detection in document page images, which can come from scanned or camera images, or rendered from electronic file formats. Our method uses OCR to s...
Patrick Chiu, Francine Chen, Laurent Denoue
EMNLP
2010
13 years 5 months ago
Evaluating Models of Latent Document Semantics in the Presence of OCR Errors
Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...
TKDE
2010
13 years 2 months ago
Non-Negative Matrix Factorization for Semisupervised Heterogeneous Data Coclustering
Coclustering heterogeneous data has attracted extensive attention recently due to its high impact on various important applications, such as text mining, image retrieval, and bioin...
Yanhua Chen, Lijun Wang, Ming Dong