We describe a compression model for semistructured documents, called the Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the stru...
In documents, tables are important structured objects that present statistical and relational information. In this paper, we present a robust system which is capable of detecting t...
The World-Wide-Web is less agent-friendly than we might hope. Most information on the Web is presented in loosely structured natural language text with no agent-readable semantics...
According to the logical model of Information Retrieval (IR), the task of IR can be described as the extraction, from a given document base, of those documents d that, given a que...
Carlo Meghini, Fabrizio Sebastiani, Umberto Stracc...
A system, called NewsStand, is introduced that automatically extracts images from news articles. The system takes RSS feeds of news articles and applies an online clustering algori...