This paper reports on the underlying IR problems encountered when indexing and searching documents written in Bulgarian. For this language we propose a general light stemmer and demon...
Documents formatted in eXtensible Markup Language (XML) are available in collections of various document types. In this paper, we present an approach for the summarisation of XML d...
Massih-Reza Amini, Anastasios Tombros, Nicolas Usu...
Traditional information retrieval systems aim at satisfying most users for most of their searches, leaving aside the context in which the search takes place. We propose to model tw...
Nathalie Hernandez, Josiane Mothe, Claude Chrismen...
Variants of Huffman codes where words are taken as the source symbols are currently the most attractive choices to compress natural language text databases. In particular, Tagged...
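The core idea in the snippet above is to use whole words, rather than characters, as the Huffman source symbols. The following is a minimal sketch of that idea. It assumes plain binary (bit-oriented) codewords and simple whitespace tokenization. It does not attempt the tagged variant named at the end of the snippet. The function names `word_huffman_codes`, `encode` and `decode` are hypothetical and used only for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def word_huffman_codes(text: str) -> dict:
    """Build a binary Huffman code whose source symbols are whitespace-separated words."""
    freqs = Counter(text.split())
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct word still needs a non-empty codeword.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dict payloads
    heap = [(f, next(tiebreak), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codewords with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(text: str, codes: dict) -> str:
    """Concatenate the codeword of each word in order."""
    return "".join(codes[w] for w in text.split())


def decode(bits: str, codes: dict) -> str:
    """Decode greedily; this works because Huffman codewords are prefix-free."""
    inverse = {c: w for w, c in codes.items()}
    words, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            words.append(inverse[current])
            current = ""
    return " ".join(words)


text = "the cat sat on the mat the end"
codes = word_huffman_codes(text)
bits = encode(text, codes)
print(codes)
print(decode(bits, codes))
```

In this toy input, "the" is the most frequent word, so it receives a codeword no longer than any other word's codeword. Real word-based compressors also have to handle separators and punctuation, which this sketch leaves out.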
Linear support vector machines (SVM) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification a...
The ability to find tables and extract information from them is a necessary component of many information retrieval tasks. Documents often contain tables in order to communicate d...
The complexity and diversity of government regulations make understanding and retrieval of regulations a non-trivial task. One of the issues is the existence of multiple sources o...