Basic language-inherent tempo cannot be isolated by the current metrics of speech rhythm. Here we propose the number of syllables per intonation unit as an appropriate measure, al...
In contrast with the booming growth of internet data, state-of-the-art question answering (QA) systems have focused on data from specific domains or resources such as search e...
This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Tradit...
Clinical coding and classification processes transform natural language descriptions in clinical text into data that can subsequently be used for clinical care, research, and othe...
Mary H. Stanfill, Margaret Williams, Susan H. Fent...
Seed sampling is critical in semi-supervised learning. This paper proposes a clustering-based stratified seed sampling approach to semi-supervised learning. First, various clusteri...