In the past, NLP has always been based on the explicit or implicit use of linguistic knowledge. In classical computer linguistic applications explicit rule based approaches prevai...
The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model a...
A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, ...
This paper is about the reproduction of ancient texts with vectorised fonts. While for OCR only recognition rates count, a reproduction process does not necessarily require the re...
Abstract Irregularities are widespread in large databases and often lead to erroneous conclusions with respect to data mining and statistical analysis. For example, considerable bi...
Siu-Tong Au, Rong Duan, Siamak G. Hesar, Wei Jiang