To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
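For orientation, the two fingerprinting ideas named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper itself: the word-level shingle size `k=3`, the use of MD5 truncated to 64 bits as the feature hash, and all function names (`shingles`, `simhash`, `hamming`, `jaccard`) are hypothetical choices made for this example.

```python
import hashlib
from collections import Counter


def shingles(text, k=3):
    """Return the list of word-level k-shingles (contiguous k-word windows)."""
    tokens = text.lower().split()
    if len(tokens) < k:
        # Assumption: a short text is treated as a single shingle.
        return [" ".join(tokens)] if tokens else []
    return [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def _hash64(feature):
    """Map a feature string to a 64-bit integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text, k=3, bits=64):
    """Simhash-style fingerprint: per-bit weighted vote over feature hashes."""
    counts = [0] * bits
    for feature, weight in Counter(shingles(text, k)).items():
        h = _hash64(feature)
        for i in range(bits):
            # A set bit votes +weight, a clear bit votes -weight.
            counts[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def jaccard(a, b, k=3):
    """Jaccard resemblance of the shingle sets of two texts."""
    sa, sb = set(shingles(a, k)), set(shingles(b, k))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

In this sketch, two documents would be flagged as near-duplicates when `hamming(simhash(a), simhash(b))` falls below a chosen threshold, or when `jaccard(a, b)` exceeds one; both thresholds are application-specific and are not taken from the abstract.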
In the Semantic Web environment, where several independent ontologies are used to describe knowledge and data, ontologies have to be aligned by defining mappings among the...
Web applications typically interact with a back-end database to retrieve persistent data and then present the data to the user as dynamically generated output, such as HTML web pa...
Collaborative tagging systems allow users to use tags to describe their favourite online documents. Two documents that are maintained in the collection of the same user and/or ass...
Ching-man Au Yeung, Nicholas Gibbins, Nigel Shadbo...
Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can b...
Monika Rauch Henzinger, Bay-Wei Chang, Brian Milch...