We propose a low cost method for the correction of the output of OCR engines through the use of human labor. The method employs an error estimator neural network that learns to as...
Most current XQuery implementations require that all XML data reside in memory in one form or another before they start processing the data. This is unacceptable for large XML doc...
First generation Web-content encodes information in handwritten (HTML) Web pages. Second generation Web content generates HTML pages on demand, e.g. by filling in templates with c...
Jacco van Ossenbruggen, Joost Geurts, Frank Cornel...
Topics in prior-art patent search are typically full patent applications and relevant items are patents often taken from sources in different languages. Cross language patent retr...
Abstract. Automated language identification of written text is a wellestablished research domain that has received considerable attention in the past. By now, efficient and effecti...