The majority of the current information retrieval models weight the query concepts (e.g., terms or phrases) in an unsupervised manner, based solely on the collection statistics. I...
In this paper we describe a cluster-based plagiarism detection method, which we have used in the learning management system of SCUT to detect plagiarism in the network engineering ...
A geometric and non parametric procedure for testing if two nite set of points are linearly separable is proposed. The Linear Separability Test is equivalent to a test that deter...
A. P. Yogananda, M. Narasimha Murty, Lakshmi Gopal
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
It is becoming increasingly common to construct databases from information automatically culled from many heterogeneous sources. For example, a research publication database can b...
Aron Culotta, Michael L. Wick, Robert Hall, Matthe...