Sciweavers

328 search results - page 42 / 66
» A Multi-level Approach for Document Clustering
CIDU
2010
13 years 5 months ago
Multi-label ASRS Dataset Classification Using Semi Supervised Subspace Clustering
There has been a lot of research targeting text classification. Many of them focus on a particular characteristic of text data - multi-labelity. This arises due to the fact that a ...
Mohammad Salim Ahmed, Latifur Khan, Nikunj C. Oza,...
VLDB
2002
ACM
14 years 7 months ago
Efficient schemes for managing multiversion XML documents
Multiversion support for XML documents is needed in many critical applications, such as software configuration control, cooperative authoring, web information warehouses, and "...
Shu-Yao Chien, Vassilis J. Tsotras, Carlo Zaniolo
ICDAR
2009
IEEE
14 years 2 months ago
Text Lines and Snippets Extraction for 19th Century Handwriting Documents Layout Analysis
In this paper we propose a new approach to improve electronic editions of human science corpus, providing an efficient estimation of manuscripts pages structure. In any handwriti...
Vincent Malleron, Véronique Eglin, Hubert E...
EMNLP
2004
13 years 9 months ago
Trained Named Entity Recognition using Distributional Clusters
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...
Dayne Freitag
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han