The web hasgreatly improved accessto scientific literature. However, scientific articles on the web are largely disorganized, with research articles being spreadacrossarchive site...
Versioned textual collections are collections that retain multiple versions of a document as it evolves over time. Important large-scale examples are Wikipedia and the web collect...
The paper argues for the use of general and intuitive knowledge representation languages (and simpler notational variants, e.g. subsets of natural languages) for indexing the cont...
Abstract This paper describes the University of Sheffield entry for the 3rd International Competition on Plagiarism Detection which attempted the monolingual external plagiarism d...
Rao Muhammad Adeel Nawab, Mark Stevenson, Paul D. ...
Since WWW encourages hypertext and hypermedia document authoring (e.g. HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperl...