Versioned textual collections are collections that retain multiple versions of a document as it evolves over time. Important large-scale examples are Wikipedia and the web collect...
Geography Markup Language (GML) is an XML-based language for the markup, storage, and exchange of geospatial data. It provides a rich geospatial vocabulary and allows flexible doc...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where rel...
The Web makes it possible for news readers to learn more about virtually any story that interests them. Media outlets and search engines typically augment their information with l...
Francisco Iacobelli, Larry Birnbaum, Kristian J. H...