The MapReduce programming model simplifies large-scale data processing on commodity clusters by having users specify a map function that processes input key/value pairs to generate...
We propose a system that registers and retrieves text documents to annotate them on-line. The user registers a text document captured from a nearly top view and adds virtual annot...
By far, the support vector machines (SVM) achieve the state-of-the-art performance for the text classification (TC) tasks. Due to the complexity of the TC problems, it becomes a ch...
A robust segmentation is the most important part of an automatic character recognition system (e.g., document processing, license plate recognition, etc.). In our contribution we pr...
Taxonomies of the Web typically have hundreds of thousands of categories and skewed category distribution over documents. It is not clear whether existing text classification tech...
Tie-Yan Liu, Yiming Yang, Hao Wan, Qian Zhou, Bin ...