In this paper we present a methodology to extract information from the Web to build a taxonomy of terms and Web resources for a given domain. This taxonomy represents a hierarchy o...
The field of automatic genre classification has primarily focused on extracting textual features from documents. The goal of this research is to investigate whether visual feature...
Most of the current WWW is made up of dynamic pages. The development of dynamic pages is a difficult and costly endeavour, out-of-reach for most users, experts, and content produce...
In this paper, we propose to mine query hierarchies from clickthrough data, which is within the larger area of automatic acquisition of knowledge from the Web. When a user submits...
Dou Shen, Min Qin, Weizhu Chen, Qiang Yang, Zheng ...
Implicitly structured content on the Web such as HTML tables and lists can be extremely valuable for web search, question answering, and information retrieval, as the implicit str...