Separating machine printed text and handwriting from overlapping text is a challenging problem in the document analysis field and no reliable algorithms have been developed thus f...
We develop a new component analysis framework, the Noisy-Or Component Analyzer (NOCA), that targets high-dimensional binary data. NOCA is a probabilistic latent variable model tha...
Information retrieval which aims to provide people with easy access to all kinds of information is now becoming more and more emphasized. However, most approaches to information r...
We study a novel problem of social context summarization for Web documents. Traditional summarization research has focused on extracting informative sentences from standard docume...
Zi Yang, Keke Cai, Jie Tang, Li Zhang, Zhong Su, J...
To develop effective learning algorithms for online cursive word recognition is still a challenge research issue. In this paper, we propose a probabilistic framework to model the ...