In this paper we present ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTM...
The proliferation of electronic content has notably led to the emergence of large corpora of interrelated structured documents (such as HTML and XML Web pages) and semantic annot...
As large quantities of document images are archived by digital libraries, there is a need for efficient search strategies to make them available as per users' informatio...
When a search is run against structured documents, it is beneficial to extract information from user queries in a format that is consistent with the backend data structure. As one step...
Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic appro...
Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten...