The compression of binary texts using antidictionaries is a novel technique based on the fact that some substrings (called “antifactors”) never appear in the text. Let × be a...
Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resourc...
Juri Ganitkevitch, Chris Callison-Burch, Courtney ...
Versioned textual collections are collections that retain multiple versions of a document as it evolves over time. Important large-scale examples are Wikipedia and the web collect...
In the last years, new semistatic word-based byte-oriented compressors, such as Plain and Tagged Huffman and the Dense Codes, have been used to improve the efficiency of text retri...
In recent years, new semistatic word-based byte-oriented text compressors, such as Tagged Huffman and those based on Dense Codes, have shown that it is possible to perform fast d...