In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold w...
In search engines, ranking algorithms measure the importance and relevance of documents mainly based on the contents and relationships between documents. User attributes are usual...
The requirements for secure document workflows in enterprises become increasingly sophisticated, with employees performing different tasks under different roles using the same pro...
Yacine Gasmi, Ahmad-Reza Sadeghi, Patrick Stewin, ...
Supervised text categorization is a machine learning task where a predefined category label is automatically assigned to a previously unlabelled document based upon characteristic...
Term signal is an existing text representation that depicts a term as a vector of frequencies of occurrences in a number of user-defined partitions of a document. Although term si...
Supphachai Thaicharoen, Tom Altman, Krzysztof J. C...
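The term-signal representation described in the abstract above (a term as a vector of occurrence frequencies across partitions of a document) can be illustrated with a minimal sketch. The function name `term_signal`, the whitespace tokenization, and the choice of equal-width contiguous partitions are assumptions made for illustration only; the abstract says the partitions are user-defined but does not specify how.

```python
from typing import List


def term_signal(tokens: List[str], term: str, num_partitions: int) -> List[int]:
    """Count occurrences of `term` in each of `num_partitions` contiguous,
    roughly equal-width partitions of the token sequence.

    Hypothetical helper: equal-width partitioning is an assumption, not
    necessarily the scheme used by the cited authors.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    signal = [0] * num_partitions
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            # Map token position i to a partition index in [0, num_partitions).
            signal[i * num_partitions // n] += 1
    return signal


tokens = "the cat sat on the mat while the dog slept".split()
print(term_signal(tokens, "the", 2))  # prints [2, 1]
```

With two partitions, "the" occurs twice in the first half of the example sentence and once in the second half, so the term's position within the document is preserved in a way a single whole-document count would not capture.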
Every day, millions of users search for information on the web via search engines and provide implicit feedback on the results shown for their queries by clicking or not clicking on them...
Carlos Castillo, Claudio Corsi, Debora Donato, Pao...
A method for image matching from partial blurry images is presented that leverages existing text retrieval algorithms to provide a solution that scales to hundreds of thousands of...
When a user is served with a ranked list of relevant documents by the standard document search engines, his search task is usually not over. He has to go through the entire docume...
Search engines present fixed-length passages from documents ranked by relevance against the query. In this paper, we present and compare novel, language-model-based methods for extr...
Existing methods for single document summarization usually make use of only the information contained in the specified document. This paper proposes the technique of document expa...