The Web is based on a browsing paradigm that makes it di cult to retrieve and integrate data from multiple sites. Today, the only way to achieve this integration is by building sp...
In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data – which is typically generated by a (semi) automated information extraction (IE) ...
Analysis of web site usage data involves two significant challenges: firstly the volume of data, arising from the growth of the web, and secondly, the structural complexity of web...
Amir H. Youssefi, David J. Duke, Mohammed Javeed Z...
Clipping Web pages, namely extracting the informative clips (areas) from Web pages, has many applications, such as Web printing and e-reading on small handheld devices. Although m...
Lei Zhang, Linpeng Tang, Ping Luo, Enhong Chen, Li...
The semantic annotation of documents is an additional advantage for retrieval, as long as the annotations and their maintenance process scale well. Automatic or semi-automatic ann...