One common predictive modeling challenge occurs in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributi...
Automatic text categorization is a problem of automatically assigning text documents to predefined categories. In order to classify text documents, we must extract good features f...
A trainable method for distinguishing between mathematics notation and natural language (here, English) in images of textlines, using computational geometry methods only with no a...
This paper describes a method of readability measurement of Japanese texts based on a newly compiled textbook corpus. The textbook corpus consists of 1,478 sample passages extract...
We present a class of web queries whose result is a multi-column relation instead of a collection of unstructured documents as in standard web search. The user specifies the query...