The discrete nature of categorical data makes it particularly challenging to visualize. Methods that work very well for continuous data are often poorly suited to categorical...
The snapshot of a word is its most informative fragment. By using the snapshot instead of the whole word, the value space of the lexical feature can be significantly r...
We present a document analysis system that assigns logical labels and extracts the reading order from a broad set of documents. All information sources, from geometric features and ...
The TREC Blog track aims to explore information-seeking behaviour in the blogosphere by building reusable test collections for blog-related search tasks. Since its advent in TRE...
Craig Macdonald, Rodrygo L. T. Santos, Iadh Ounis,...
A bitext, or bilingual parallel corpus, consists of two texts in different languages that are mutual translations. Bitexts are very useful in linguistic engineering bec...