Abstract We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has ...
Abstract The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this pa...
This paper presents a means of automatically deriving a hierarchical organization of concepts from a set of documents without use of training data or standard clustering technique...
It is observed that a better approach to Web information understanding is to base on its document framework, which is mainly consisted of (i) the title and the URL name of the pag...
The web hasgreatly improved accessto scientific literature. However, scientific articles on the web are largely disorganized, with research articles being spreadacrossarchive site...