The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Understanding intents from search queries can improve a user’s search experience and boost a site’s advertising profits. Query tagging via statistical sequential labeling mode...
Ye-Yi Wang, Raphael Hoffmann, Xiao Li, Jakub Szyma...
Tagging systems such as del.icio.us and Diigo have become important ways for users to organize information gathered from the Web. However, despite their popularity among early ado...
Lichan Hong, Ed Huai-hsin Chi, Raluca Budiu, Peter...
In this paper, we propose a probabilistic model for web image mining, which is based on concept-sensitive salient regions without human intervene. Our goal is to achieve a middle-...
Complex structure and varying requirements increase the difficulties in developing domain specific Web Information Systems. People appeal to a smart tool to customize Web Informati...