Indexing file systems is a powerful means of helping users locate documents, software, and other types of data in large repositories. In environments that contain many different types of data, content indexing requires type-specific processing to extract information effectively. The cover page (or Semantic Header) is a portion of each document that should contain information useful for searching for the document by a number of commonly used criteria. Various indexing schemes could use the information in the Semantic Header to help locate appropriate documents with minimum effort. In this paper, we present a model that automatically extracts the secondary information, or meta-information, from a document and stores it in a Semantic Header. The Semantic Header then serves as an index for the document, helping users to search for and access it.
Bipin C. Desai, Sami S. Haddad, Abdelbaset Ali