We argue that the advent of large volumes of full-length text, as opposed to short texts tracts and newswire, should be accompanied by corresponding new approaches to information ...
The MEDLINE database is the world largest repository of bio-medical abstracts. It is a central information entry point for most biologists despite the growing availability of full-...
An open problem for Distributed Information Retrieval systems (DIR) is how to represent large document repositories, also known as resources, both accurately and efficiently. Obt...
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two ...
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, ...
—BitTorrent is the most popular peer-to-peer software for file sharing, which has contributed to a significant portion of today’s Internet traffic. Many measurement studies ...