An experimental study on large-scale web categorization



Web taxonomies typically have hundreds of thousands of categories and a skewed distribution of categories over documents. It is unclear whether existing text classification technologies can perform well on, and scale up to, such large-scale applications. To find out, we evaluated several representative methods (Support Vector Machines, k-Nearest Neighbor and Naive Bayes) on the Yahoo! taxonomies. In particular, we evaluated the effectiveness/efficiency tradeoff of classifiers in a hierarchical setting compared with the conventional (flat) setting, and tested popular threshold tuning strategies for their scalability and accuracy in large-scale classification problems.

Categories and Subject Descriptors: F.2 [Analysis of Algorithms and Problem Complexity]: Miscellaneous; I.5.4 [Pattern Recognition]: Applications - Text processing.

General Terms: Technology Assessment, Performance and Scalability Analysis, Empirical Validation.

Keywords: Text categorization, very large Web ...

Tie-Yan Liu, Yiming Yang, Hao Wan, Qian Zhou, Bin Gao, Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma


Internet Technology | Large Web Taxonomies | Parameter Tuning Strategies | Threshold Tuning Strategies | WWW 2005


» Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Lar...

» Investigating Self-Similarity and Heavy-Tailed Distributions on a Large-Scale Experimental Fa...

» Mining Large-Scale Knowledge Sources for Case Adaptation Knowledge

» Experimental comparison of peer-to-peer streaming overlays: An application perspective

» Probabilistic web image gathering

» Modeling Wikipedia Articles to Enhance Encyclopedic Search

» Semi-supervised text categorization by active search

» Large-scale text categorization by batch mode active learning

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2005
Where	WWW
Authors	Tie-Yan Liu, Yiming Yang, Hao Wan, Qian Zhou, Bin Gao, Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma
