To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
A complex software system typically has a large number of objects in memory that hold references to one another to implement an object model. Deciding when the objects should be...
In this paper we propose a novel approach to markup, called Extreme Annotational RDF Markup (EARMARK), using RDF and OWL to annotate features in text content that cannot be mapped...
We explore the relationship between time and relevance using TREC ad-hoc queries. A type of query is identified that favors very recent documents. We propose a time-based language...
We discuss information retrieval methods that aim at serving a diverse stream of user queries such as those submitted to commercial search engines. We propose methods that emphasi...
Hongyuan Zha, Zhaohui Zheng, Haoying Fu, Gordon Su...