: Because digital libraries are expensive to create and maintain, Internet analogs of public libraries--reliable, quality, community services--have only recently begun to appear. A...
Ian H. Witten, Craig G. Nevill-Manning, Sally Jo C...
Extractors and taggers turn unstructured text into entityrelation (ER) graphs where nodes are entities (email, paper, person, conference, company) and edges are relations (wrote, ...
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...
Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been create...
Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...