Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g. English) while the system is trained using labeled doc...
Katakana, a Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are m...
The bottleneck for dictionary-based cross-language information retrieval is the lack of comprehensive dictionaries, in particular for many different languages. We here introduce a...
We describe the lexical knowledge base system (LKB) which has been designed and implemented as part of the ACQUILEX project to allow the representation of multilingual syntactic a...
There has been relatively little work focused on determining the formality level of individual lexical items. This study applies information from large mixed-genre corpora, demonst...