The problem of measuring "similarity" of objects arises in many applications, and many domain-specific measures have been developed, e.g., matching text across documents...
Automated detection of the first document reporting each new event in temporally-sequenced streams of documents is an open challenge. In this paper we propose a new approach which...
Yiming Yang, Jian Zhang, Jaime G. Carbonell, Chun ...
We describe a machine learning approach for predicting sponsored search ad relevance. Our baseline model incorporates basic features of text overlap and we then extend the model t...
Dustin Hillard, Stefan Schroedl, Eren Manavoglu, H...
In addition to the actual content Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
The lack of standards for Romanization of Thai proper names makes searching activity a challenging task. This is particularly important when searching for people-related documents ...