We describe a compression technique for semistructured documents, called SCMPPM, which combines the Prediction by Partial Matching technique with Structural Contexts Model (SCM) t...
Abstract. In this paper, we present a method for the automatic extraction of numerical fields (zip codes, phone numbers, etc.) from incoming mail documents. The approach is based o...
Digital Libraries will hold huge amounts of text and other forms of information. For the collections to be maximally useful, they must be highly organized with useful indexes and ...
Robert P. Futrelle, Xiaolan Zhang 0002, Yumiko Sek...
According to Koestler, the notion of a bisociation denotes a connection between pieces of information from habitually separated domains or categories. In this paper, we consider a ...
In this paper, we present an empirical comparison of the effects of category skew on six feature selection methods. The methods were evaluated on 36 datasets generated from the 20...