Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic-time decoding. For arbitrary SCFGs, this...
Increasingly, user-generated content is being utilised as a source of information; however, each individual piece of content tends to contain low levels of information. In addition, ...
User-generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
In distributed storage systems built using commodity hardware, it is necessary to store multiple replicas of every data chunk in order to ensure system reliability. In s...
In this paper, we present a novel approach to search and retrieve from document image collections, without explicit recognition. Existing recognition-free approaches such as wor...