Document classification presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of the natural language. The tradi...
Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...
A novel method to automatically associate ontological concepts to their realisations in texts is presented. The method has been developed in the context of the Papyrus project to ...
CzEng 0.9 is the third release of a large parallel corpus of Czech and English. For the current release, CzEng was extended by significant amount of texts from various types of so...
We all encounter many documents on a daily basis that we do not have time to process in their entirety. Nevertheless, we lack good tools to rapidly skim and identify key informati...