Document-Base Extraction for Single-Label Text Classification



Many text mining applications, especially when investigating Text Classification (TC), require experiments to be performed using common text-collections, so that results can be compared with alternative approaches. With regard to single-label TC, most text-collections (textual data-sources) in their original form have at least one of the following limitations: the overall volume of textual data is too large for ease of experimentation; there are many predefined classes; most of the classes consist of only a very few documents; some documents are labeled with a single class whereas others have multiple classes; and some documents contain little or no actual text-content. In this paper, we propose a standard approach to automatically extract "qualified" document-bases from a given textual data-source that can be used more effectively and reliably in single-label TC experiments. The experimental results demonstrate that document-bases extracted based on our approach can...
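
The limitations listed in the abstract suggest what a filtering step might look like. The following is only an illustrative sketch, not the authors' procedure: the function name `extract_document_base`, its parameters (`min_words`, `min_docs_per_class`, `max_classes`), and the thresholds are all hypothetical, and the input is assumed to be a list of `(text, labels)` pairs.

```python
from collections import Counter


def extract_document_base(docs, min_words=10, min_docs_per_class=2, max_classes=None):
    """Hypothetical filter: return (text, label) pairs suited to single-label TC.

    docs: list of (text, labels) pairs, where labels is a list of class names.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Keep only documents with exactly one class and enough text content.
    kept = [
        (text, labels[0])
        for text, labels in docs
        if len(labels) == 1 and len(text.split()) >= min_words
    ]

    # Count documents per class among the surviving documents.
    counts = Counter(label for _, label in kept)

    # Keep classes with enough documents, largest classes first.
    classes = [c for c, n in counts.most_common() if n >= min_docs_per_class]

    # Optionally cap the number of classes, which also limits overall volume.
    if max_classes is not None:
        classes = classes[:max_classes]

    allowed = set(classes)
    return [(text, label) for text, label in kept if label in allowed]
```

Each filter corresponds to one limitation named above: multi-label documents and near-empty documents are dropped first, then sparse classes, and finally the class count is capped.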

Yanbo J. Wang, Robert Sanderson, Frans Coenen, Paul H. Leng

DAWAK 2008 | Information Management | Many Text Mining | Single-label Tc | Textual Data-source |

Added	19 Oct 2010
Updated	19 Oct 2010
Type	Conference
Year	2008
Where	DAWAK
Authors	Yanbo J. Wang, Robert Sanderson, Frans Coenen, Paul H. Leng