We present a document expansion approach that uses Conditional Random Field (CRF) segmentation to automatically extract salient phrases from ad titles. We then supplement the ad d...
Lattice-based approaches have been widely used in spoken document retrieval to handle speech recognition uncertainty and errors. Position Specific Posterior Lattices (PSPL) an...
Document clustering techniques mostly depend on models that impose explicit and/or implicit a priori assumptions as to the number, size, disjunction characteristics of clusters, and/...
In this paper, we present a method for the automatic extraction of numerical fields (zip codes, phone numbers, etc.) from incoming mail documents. The approach is based o...
We are developing a recognition system, named "Infty", for scientific documents, including those with mathematical formulae. In this paper, we propose a new system that can re...