The Web is based on a browsing paradigm that makes it difficult to retrieve and integrate data from multiple sites. Today, the only way to do this is to build specialized applicatio...
A rich family of generic Information Extraction (IE) techniques has been developed by researchers. This paper proposes WebKER, a system for automatically extract...
A wealth of knowledge is encoded in the form of tables on the World Wide Web. We propose a classification algorithm and a rich feature set for automatically recognizing layout tab...
There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as W...
The need for managing similar documents in different languages increases with the growing amounts of electronic information available in documents of the same type (e.g. news str...
Roberto Basili, Maria Teresa Pazienza, Fabio Massi...