This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
As more business applications have become web enabled, the web server architecture has evolved to provide performance isolation, service differentiation, and QoS guarantees. Vario...
This paper uses the URL word breaking task as an example to elaborate what we identify as crucialin designingstatistical natural language processing (NLP) algorithmsfor Web scale ...
Kuansan Wang, Christopher Thrasher, Bo-June Paul H...
Duplicate and near-duplicate digital image matching is beneficial for image search in terms of collection management, digital content protection, and search efficiency. In this ...
We propose a novel approach to facilitate the concurrent development of ontologies by different groups of experts. Our approach adapts Concurrent Versioning, a successful paradigm...