We present experiments in automatic genre classification on web corpora, comparing a wide variety of features on several different genreannotated datasets (HGC, I-EN, KI-04, KRYS...
We present a shallow linguistic approach to subjectivity classification. Using multinomial kernel machines, we demonstrate that a data representation based on counting character n...
For CLEF 2008 JHU conducted monolingual and bilingual experiments in the ad hoc TEL and Persian tasks. The TEL task involved focused on searching electronic card catalog records i...
Abstract. This paper describes the technique for translation of character n-grams we developed for our participation in CLEF 2006. This solution avoids the need for word normalizat...
Automatic authorship identification offers a valuable tool for supporting crime investigation and security. It can be seen as a multi-class, single-label text categorization task. ...
Abstract. This paper describes a new technique for the direct translation of character n-grams for use in Cross-Language Information Retrieval systems. This solution avoids the nee...
The selection of indexing terms for representing documents is a key decision that limits how effective subsequent retrieval can be. Often stemming algorithms are used to normaliz...