User-generated content that appears on weblogs, wikis and social networks has been increasing at an unprecedented rate. The wealth of information produced by individuals from diff...
Electronic written texts used in computer-mediated interactions (e-mails, blogs, chats, etc.) present major deviations from the norm of the language. This paper presents a comparat...
In many Web applications, such as blog classification and newsgroup classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain ...
This paper explores the problem of identifying sentence boundaries in the transcriptions produced by automatic speech recognition systems. An experiment which determines the level...
CAPTCHAs are automated Turing tests used to determine whether the end user is a human rather than an automated program. Users are asked to read and answer visual CAPTCHAs, which often appear...