The World Wide Web is a vast resource for information. At the same time it is extremely distributed. A particular type of data such as restaurant lists maybe scattered across thous...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
In this poster, we present a method for extracting queries related to real-life events, or news-related queries, from large web query logs. The method employs query frequencies an...
Michael Maslov, Alexander Golovko, Ilya Segalovich...
The Web has established itself as the largest public data repository ever available. Even though the vast majority of information on the Web is formatted to be easily readable by ...
Hasan Davulcu, Srinivas Vadrevu, Saravanakumar Nag...
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...