We describe a domain-independent, unsupervised algorithm for refined segmentation of time series data into meaningful episodes, focusing on the problem of text segmentation. The V...
A CAPTCHA which humans find to be highly legible and which is designed to resist automatic character-segmentation attacks is described. As first detailed in [BR05], these ‘Sc...
In this paper, we present a method for the automatic extraction of numerical fields (zip codes, phone numbers, etc.) from incoming mail documents. The approach is based o...
We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled a...
Feng Jiao, Shaojun Wang, Chi-Hoon Lee, Russell Gre...
A new approach for separating mathematics from ordinary text is presented. Unlike existing methods, it is oriented more toward segmentation than recognition, isolati...