Sciweavers

Search results for "Marking Text Documents"
PAKDD 2009, ACM
Clustering Documents Using a Wikipedia-Based Concept Representation
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation b...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
ICDAR 2007, IEEE
Content-level Annotation of Large Collection of Printed Document Images
A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creation of annotated corpora is a tedious task. It is laborious, ...
Anand Kumar, C. V. Jawahar
RIDE 2002, IEEE
Enhancive Index for Structured Document
Structured documents, especially XML documents, are made up of a few logical components, such as title, sections, subsections and paragraphs. The components in each structured...
Xiaoling Wang, Ji-Rong Wen, Yisheng Dong, Wenyin L...
FLAIRS 2006
Corpus Based Unsupervised Labeling of Documents
Text categorization involves mapping documents to a fixed set of labels. A similar but equally important problem is that of assigning labels to large corpora. With a deluge of ...
Delip Rao, Deepak P, Deepak Khemani
KDD 2004, ACM
Boosting for Text Classification with Semantic Features
Current text classification systems typically use term stems for representing document content. Semantic Web technologies allow the use of features on a higher semantic...
Stephan Bloehdorn, Andreas Hotho