The Utrecht Blend: Basic Ingredients for an XML Retrieval System



Exploiting the structure of a document allows for more powerful information retrieval techniques. In this article a basic approach is discussed for the retrieval of XML document fragments. Based on a vector-space model for text retrieval, we aim to investigate various strategies that influence the retrieval performance of an XML-based IR system. The first extension of the system uses a schema-based approach that takes into account that authors tag their text to emphasise particular pieces of content that are of extra importance. Based on the schema used by the document collection, the system can easily derive the children of mixed-content nodes and judge those child nodes to be more important than other nodes. A second approach discussed here is based on a horizontal fragmentation of the inverse document frequencies used by the vector-space model. The underlying assumption states that the spreading of terms is related to the semantic structure of the document. However, we obser...
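
The two extensions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample document, the function names, and the idf formula are assumptions made for illustration. In particular, "horizontal fragmentation" is read here as computing inverse document frequencies separately per element name, and mixed-content children are detected from the document instance rather than derived from a schema as the paper describes.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, not taken from the INEX collection.
SAMPLE = """<article>
  <title>XML retrieval</title>
  <p>Structure helps <b>retrieval</b> of fragments.</p>
  <p>Vector space models rank fragments.</p>
</article>"""


def tokens(text):
    """Lower-case whitespace tokens with surrounding '.' and ',' removed."""
    stripped = (t.strip(".,").lower() for t in (text or "").split())
    return [t for t in stripped if t]


def fragmented_idf(root):
    """Return {tag: {term: idf}} with idf computed per element name.

    Assumption: each element is treated as one retrievable fragment, and
    idf = log(1 + N_tag / df), where N_tag counts elements with that tag.
    """
    frag_count = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))
    for elem in root.iter():
        frag_count[elem.tag] += 1
        for term in set(tokens("".join(elem.itertext()))):
            doc_freq[elem.tag][term] += 1
    return {
        tag: {term: math.log(1 + frag_count[tag] / df) for term, df in dfs.items()}
        for tag, dfs in doc_freq.items()
    }


def mixed_content_children(root):
    """Tags of child elements whose parent also contains text directly.

    Assumption: inferred from the instance; the paper derives this from
    the schema of the collection instead.
    """
    boosted = set()
    for parent in root.iter():
        has_text = bool((parent.text or "").strip()) or any(
            (child.tail or "").strip() for child in parent
        )
        if has_text:
            boosted.update(child.tag for child in parent)
    return boosted


root = ET.fromstring(SAMPLE)
idf = fragmented_idf(root)
boosted = mixed_content_children(root)
```

In a ranking function, terms found inside elements listed in `boosted` could be given a higher weight, and term weights could use `idf[tag]` instead of one collection-wide idf table. How those weights are combined is left open here, since the abstract does not specify it.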

Roelof van Zwol, Frans Wiering, Virginia Dignum


Document | Document Fragments | INEX 2004 | Information Management | Retrieval Performance



Added	02 Jul 2010
Updated	02 Jul 2010
Type	Conference
Year	2004
Where	INEX
Authors	Roelof van Zwol, Frans Wiering, Virginia Dignum

