Sciweavers

NDT
2010

Web Document Classification by Keywords Using Random Forests

A web directory hierarchy is critical for serving users' search requests. Creating and maintaining such directories without the involvement of human experts requires good classification of web documents. In this paper, we explore web page classification using keywords from documents as attributes and the random forest learning method. Our initial results are promising: the random forest learning method performed better than several other well-known learning methods. When the number of topics increased from five to seven, random forests still performed better than the other methods, even though absolute classification rates decreased.
Myungsook Klassen, Nikhila Paturi
Added 29 Jan 2011
Updated 29 Jan 2011
Type Journal
Year 2010
Where NDT