Authorship analysis of electronic texts assists digital forensics and anti-terror investigation. Author identification can be seen as a single-label multi-class text categorizatio...
Language usage over computer mediated discourses, like chats, emails and SMS texts, significantly differs from the standard form of the language. An urge towards shorter message l...
\Prose rhythm" is a widely observed but scarcely quanti ed phenomenon. We describe an information-theoretic model for measuring the regularity of lexical stress in English te...
We describe the authoring tool, EasyEnglish, which is part of IBM's internal SGML editing environment, Information Development Workbench. EasyEnglish helps writers produce cl...
This paper presents an unsupervised learning approach to building a non-English (Arabic) stemmer. The stemming model is based on statistical machine translation and it uses an Eng...