In this paper we introduce the webpage understanding problem which consists of three subtasks: webpage segmentation, webpage structure labeling, and webpage text segmentation and ...
The two most important tasks in information extraction from the Web are webpage structure understanding and natural language sentences processing. However, little work has been don...
Chunyu Yang, Yong Cao, Zaiqing Nie, Jie Zhou, Ji-R...
This paper presents our latest work on identifying frame content types for understanding learning media content. In particular, we categorize frames into six classes namely, slide...
Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. In this paper a web-page clustering approach to...