In order to evaluate the performance of information retrieval and extraction algorithms, we need test collections. A test collection consists of a set of documents, a clearly form...
User queries on extensible markup language (XML) documents are typically expressed as regular path expressions. A variety of indexing techniques for efficiently retrieving the re...
This research is about automatic identification and extraction of person names in Chinese text documents. Solutions to this problem have immediate and extensive applications in ma...
Recently, stability-based techniques have emerged as a very promising solution to the problem of cluster validation. An inherent drawback of these approaches is the computational c...
Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...