Automatic restoration of punctuation in unpunctuated text has applications in improving the fluency and applicability of speech recognition systems. We explore the possibility t...
Abstract. In this paper we present algorithms for the automatic time-synchronization of score, MIDI or PCM data streams that represent the same polyphonic piano piece. In contras...
Vlora Arifi, Michael Clausen, Frank Kurth, Meinard...
In this paper we present ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTM...
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing f...
Abstract. Regular expressions, or simply regexes, have been widely used for decades as a powerful tool for pattern matching and text extraction. Although they provide a powerful and f...