InfoAnalyzer: a computer-aided tool for building enterprise taxonomies

In this paper we study the problem of collecting training samples for building enterprise taxonomies. We develop a computer-aided tool named InfoAnalyzer, which can effectively help an enterprise prepare a large set of samples for machine learning in text categorization. In our system, the enterprise category tree is first defined by a few keywords. The Google search engine is then used to construct a small set of labeled documents, and a topic tracking algorithm based on document length normalization is applied to enlarge the training corpus on the basis of these seed stories. Furthermore, we design a method to check the consistency of the training corpus. Experiments show that the resulting training corpus is good enough for statistical classification methods and also meets human requirements.

Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing--Text Analysis; H.3.3 [Information Storage And Retrieval]: Information Search and Retrie...
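The abstract does not spell out the tracking algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: seed documents (for example, those labeled through the keyword search) are averaged into a topic profile, and unlabeled candidates that score high enough against that profile are added to the category's training corpus. The specific normalization used here is pivoted length normalization with log term frequency, swapped in as one common form of document length normalization; the `slope` value, the relative `ratio` cut-off, and all function names are hypothetical and are not taken from the paper.

```python
import math


def term_weights(tokens, pivot, slope=0.25):
    """Log-tf weights divided by a pivoted length normalizer (assumed scheme).

    pivot: average number of unique terms per document in the collection.
    slope: hypothetical tuning value in [0, 1] that controls how strongly a
    document's own unique-term count affects its normalizer.
    """
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    norm = (1.0 - slope) * pivot + slope * len(counts)
    return {t: (1.0 + math.log(c)) / norm for t, c in counts.items()}


def centroid(vectors):
    """Average the seed-document vectors into a single topic profile."""
    profile = {}
    for vec in vectors:
        for term, w in vec.items():
            profile[term] = profile.get(term, 0.0) + w / len(vectors)
    return profile


def score(profile, vec):
    """Inner product between the topic profile and one document vector."""
    return sum(w * profile.get(term, 0.0) for term, w in vec.items())


def track(seed_docs, candidate_docs, ratio=0.5):
    """Return indices of candidates accepted into the category's corpus.

    A candidate is accepted when its score reaches `ratio` times the mean
    score of the seed documents themselves (hypothetical cut-off rule).
    """
    docs = seed_docs + candidate_docs
    pivot = sum(len(set(d)) for d in docs) / len(docs)
    seed_vecs = [term_weights(d, pivot) for d in seed_docs]
    profile = centroid(seed_vecs)
    cutoff = ratio * sum(score(profile, v) for v in seed_vecs) / len(seed_vecs)
    return [i for i, d in enumerate(candidate_docs)
            if score(profile, term_weights(d, pivot)) >= cutoff]


seeds = [
    "stock market shares trading".split(),
    "shares fell on the stock exchange".split(),
]
candidates = [
    "investors bought shares as the stock market rallied".split(),
    "the football team won the cup final".split(),
]
print(track(seeds, candidates))  # [0]
```

Only the finance-related candidate clears the cut-off; the sports document matches the profile on "the" alone and is rejected. The cut-off is expressed relative to the seeds' own scores so that it does not depend on the absolute scale of the normalized weights. An adaptive variant could append accepted documents to the seed set and rebuild the profile; this sketch keeps the profile fixed for clarity.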

Li Zhang, Shixia Liu, Yue Pan, Liping Yang

CIKM 2004 | Document Length Normalization | Information Management | Topic Tracking | Training Corpus

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2004
Where	CIKM
Authors	Li Zhang, Shixia Liu, Yue Pan, Liping Yang