Sciweavers

81 search results - page 7 / 17
NAACL
2007
13 years 9 months ago
Entity Extraction is a Boring Solved Problem - Or is it?
This paper presents empirical results that contradict the prevailing opinion that entity extraction is a boring solved problem. In particular, we consider data sets that resemble ...
Marc Vilain, Jennifer Su, Suzi Lubar
ICML
2004
IEEE
14 years 8 months ago
A MFoM learning approach to robust multiclass multi-label text categorization
We propose a multiclass (MC) classification approach to text categorization (TC). To fully take advantage of both positive and negative training examples, a maximal figure-of-meri...
Sheng Gao, Wen Wu, Chin-Hui Lee, Tat-Seng Chua
WWW
2008
ACM
14 years 8 months ago
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
This paper presents a general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from larges...
Xuan Hieu Phan, Minh Le Nguyen, Susumu Horiguchi
WIDM
2004
ACM
14 years 1 month ago
Stylistic and lexical co-training for web block classification
Many applications which use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classifica...
Chee How Lee, Min-Yen Kan, Sandra Lai
CIKM
2006
Springer
13 years 11 months ago
A comparative study on classifying the functions of web page blocks
In this paper, we study the problem of learning block classification models to estimate block functions. We distinguish general models, which are learned across multiple sites, an...
Xiangye Xiao, Qiong Luo, Xing Xie, Wei-Ying Ma