Summarization as Feature Selection for Document Categorization on Small Datasets

Abstract. Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and discriminative information from the defined categories. Considering that training sets are extremely small in many classification tasks, in this paper we explore the use of unsupervised extractive summarization as a feature selection technique for document categorization. Our experiments using training sets of different sizes indicate that text summarization is a competitive approach for feature selection, and show its appropriateness for situations having small training sets, where it could clearly outperform the traditional information gain technique.
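The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which summarizer the authors used, so the frequency-based sentence scoring below is an assumption chosen for brevity, and the function names (tokenize, split_sentences, summarize, select_features) and the ratio parameter are hypothetical. The sketch only shows the general shape of the approach: each document is reduced to its highest-scoring sentences without consulting any category labels, and the terms that survive form the feature set.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive splitter on ., ! and ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, ratio=0.5):
    """Return the top-scoring sentences, in their original order.

    A sentence's score is the average in-document frequency of its terms.
    This stands in for whatever extractive summarizer is actually used.
    """
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


def select_features(documents, ratio=0.5):
    """Feature set = every term that appears in some document's summary.

    No class labels are needed, which is what makes the selection unsupervised.
    """
    vocab = set()
    for doc in documents:
        for sentence in summarize(doc, ratio):
            vocab.update(tokenize(sentence))
    return vocab


docs = ["Cats chase mice. Cats sleep. The weather was mild."]
features = select_features(docs, ratio=0.5)
```

In this toy input the sentence about the weather scores lowest and is dropped, so its terms never enter the feature set. A supervised criterion such as information gain would instead need labeled examples to estimate how each term is distributed across categories, which is the dependence on training-set size that the abstract contrasts against.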

Emmanuel Anguiano-Hernández, Luis Villaseñor Pineda, Manuel Montes-y-Gómez, Paolo Rosso

Document Categorization | Feature Selection Technique | Natural Language Processing | TAL 2010 | Training Sets |

» Text categorization with many redundant features using aggressive feature selection to mak...

» An Empirical Study of Category Skew on Feature Selection for Text Categorization

» Impact on Performance of Hypertext Classification of Selective Rich HTML Capture

» A Study of Local and Global Thresholding Techniques in Text Categorization

» FACT Fast Algorithm for Categorizing Text

» Supervised Evaluation of Dataset Partitions Advantages and Practice

» Which Clustering Do You Want Inducing Your Ideal Clustering with Minimal Feedback

Post Info

Added	30 Jan 2011
Updated	30 Jan 2011
Type	Journal
Year	2010
Where	TAL
Authors	Emmanuel Anguiano-Hernández, Luis Villaseñor Pineda, Manuel Montes-y-Gómez, Paolo Rosso

Sciweavers
