By mapping messages into a large context, we can compute the distances between them, and then classify them. We test this conjecture on Twitter messages: Messages are mapped onto t...
Yegin Genc, Yasuaki Sakamoto, Jeffrey V. Nickerson
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...
Large astronomical databases obtained from sky surveys such as the SuperCOSMOS Sky Surveys (SSS) invariably suffer from spurious records coming from artefactual effects of the t...
Amos J. Storkey, Nigel C. Hambly, Christopher K. I...
—String matching is a ubiquitous problem that arises in a wide range of applications in computing, e.g., packet routing, intrusion detection, web querying, and genome analysis. D...