Deep classification in large-scale text hierarchies

Most classification algorithms are best at categorizing Web documents into a few categories, such as the top two levels of the Open Directory Project. Such a classification method does not give the user very detailed topic-related class information, because the first two levels are often too coarse. However, classification on a large-scale hierarchy is known to be intractable for many target categories with cross-link relationships among them. In this paper, we propose a novel deep-classification approach to categorize Web documents into categories in a large-scale taxonomy. The approach consists of two stages: a search stage and a classification stage. In the first stage, a category-search algorithm is used to acquire the category candidates for a given document. Based on the category candidates, we prune the large-scale hierarchy to focus our classification effort on a small subset of the original hierarchy. As a result, the classification model is trained on the small subset...
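The two-stage idea described in the abstract (search for category candidates, prune the hierarchy to those candidates, then train and apply a classifier on the small subset) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the toy corpus, the function names, the token-overlap nearest-neighbour search, and the multinomial Naive Bayes classifier are stand-ins, since the abstract does not specify which search algorithm or classification model the paper uses.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (category path, document text); not from the paper.
TRAINING_DOCS = [
    ("Computers/Programming/Python", "python interpreter script module import"),
    ("Computers/Programming/Python", "python list dict comprehension script"),
    ("Computers/Programming/Java", "java virtual machine class bytecode"),
    ("Computers/Programming/Java", "java class interface compile jar"),
    ("Science/Biology/Genetics", "gene dna sequence genome mutation"),
    ("Science/Biology/Zoology", "animal species habitat mammal python snake"),
]


def tokenize(text):
    return text.lower().split()


def search_candidates(doc, corpus, k=3):
    """Search stage (assumed technique: token-overlap nearest neighbours).

    Returns the distinct categories of the k most similar training documents.
    """
    query = set(tokenize(doc))
    ranked = sorted(
        corpus,
        key=lambda item: len(query & set(tokenize(item[1]))),
        reverse=True,
    )
    candidates = []
    for category, _ in ranked[:k]:
        if category not in candidates:
            candidates.append(category)
    return candidates


def prune(corpus, candidates):
    """Keep only training documents that belong to a candidate category."""
    return [(cat, text) for cat, text in corpus if cat in candidates]


def train_naive_bayes(corpus):
    """Classification stage, part 1: fit a model on the pruned subset only."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for category, text in corpus:
        tokens = tokenize(text)
        word_counts.setdefault(category, Counter()).update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(doc, model):
    """Classification stage, part 2: pick the most probable candidate."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(doc_counts):
        counts = word_counts[category]
        total = sum(counts.values())
        score = math.log(doc_counts[category] / total_docs)
        for token in tokenize(doc):
            # Laplace smoothing over the pruned vocabulary.
            score += math.log((counts[token] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


def deep_classify(doc, corpus, k=3):
    candidates = search_candidates(doc, corpus, k)
    model = train_naive_bayes(prune(corpus, candidates))
    return classify(doc, model)


if __name__ == "__main__":
    print(deep_classify("python script import module", TRAINING_DOCS))
```

The point of the design is that the classifier never sees the full hierarchy: it is trained per document on the handful of categories the search stage returns, so the classification cost depends on the number of candidates rather than on the size of the taxonomy.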

Gui-Rong Xue, Dikan Xing, Qiang Yang, Yong Yu

Category Candidates | Information Technology | Large-scale Hierarchy | Open Directory Project | SIGIR 2008

Added	15 Dec 2010
Updated	15 Dec 2010
Type	Journal
Year	2008
Where	SIGIR
Authors	Gui-Rong Xue, Dikan Xing, Qiang Yang, Yong Yu
