
WIDM
2006
ACM

Coarse-grained classification of web sites by their structural properties

In this paper, we identify and analyze structural properties that reflect the functionality of a Web site. These properties cover the size, the organization, the composition of URLs, and the link structure of Web sites. In contrast to previous work, we perform a comprehensive measurement study to examine the relation between the structure and the functionality of Web sites. Our study focuses on five of the most relevant functional classes, namely Academic, Blog, Corporate, Personal, and Shop. It is based on more than 1,400 Web sites comprising 7 million crawled and 47 million known Web pages. We present a detailed statistical analysis that provides insight into how structural properties can be used to distinguish between Web sites from different functional classes. Building on these results, we introduce a content-independent approach for the automated coarse-grained classification of Web sites. A naïve Bayesian classifier with advanced density estimation yields a p...
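The classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which density estimator is meant by "advanced density estimation", so a plain Gaussian kernel density estimate is assumed here. The two features (mean URL path depth, fraction of links leaving the site), the bandwidths, and all training values are hypothetical numbers invented for illustration, not measurements from the study.

```python
import math
from collections import defaultdict


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the 1-D density at x from samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


class KdeNaiveBayes:
    """Naive Bayes with one Gaussian KDE per (class, feature) pair.

    Assumption: Gaussian KDE stands in for the unspecified
    "advanced density estimation" mentioned in the abstract.
    """

    def __init__(self, bandwidths):
        self.bandwidths = bandwidths  # one bandwidth per feature
        self.priors = {}
        self.densities = {}

    def fit(self, rows, labels):
        by_class = defaultdict(list)
        for row, label in zip(rows, labels):
            by_class[label].append(row)
        for label, members in by_class.items():
            self.priors[label] = len(members) / len(rows)
            self.densities[label] = [
                gaussian_kde([m[i] for m in members], bw)
                for i, bw in enumerate(self.bandwidths)
            ]
        return self

    def predict(self, row):
        def log_posterior(label):
            # Naive independence assumption: sum per-feature log densities.
            score = math.log(self.priors[label])
            for x, density in zip(row, self.densities[label]):
                score += math.log(max(density(x), 1e-300))  # avoid log(0)
            return score

        return max(self.priors, key=log_posterior)


# Hypothetical toy data: (mean URL path depth, fraction of external links).
rows = [
    (1.2, 0.60), (1.5, 0.55), (1.1, 0.70),  # invented "Blog" sites
    (3.8, 0.05), (4.2, 0.10), (3.5, 0.08),  # invented "Shop" sites
]
labels = ["Blog", "Blog", "Blog", "Shop", "Shop", "Shop"]

model = KdeNaiveBayes(bandwidths=(0.5, 0.1)).fit(rows, labels)
blog_guess = model.predict((1.3, 0.65))
shop_guess = model.predict((4.0, 0.07))
```

With these toy values, a site whose features lie near the first cluster is assigned to "Blog" and one near the second cluster to "Shop". The per-feature densities are combined under the naive independence assumption, which is what keeps the approach independent of page content: only structural measurements enter the model.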
Christoph Lindemann, Lars Littig
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Where WIDM