Over the years, the amount and range of electronic text stored on the WWW has expanded rapidly, overwhelming both users and the tools designed to index and search the information. The sheer volume makes it impossible to index the WWW dynamically at query time, so the index must be pre-compiled and, because the information is ever-changing, stored in a data structure that is compact yet supports incremental updates. Much of the text is unstructured, so the data structure must be constructed from such text, storing associations between words and the documents that contain them. The index must be able to index fine-grained word-based associations and also handle more abstract concepts such as synonym groups. A search tool is also required to link to the index and enable users to pinpoint the information they require. We describe such a system, which we have developed as an integrated hybrid neural architecture, and evaluate it against the benchmark SMART system for retrieval accuracy, measured by recall and precision.
Victoria J. Hodge, Jim Austin