We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
The pebble tree automaton and the pebble tree transducer are enhanced by additionally allowing an unbounded number of 'invisible' pebbles (as opposed to the usual 'visible...
Joost Engelfriet, Hendrik Jan Hoogeboom, Bart Samw...
Many systems such as Tukwila and YFilter combine automaton and algebra techniques to process queries over tokenized XML streams. Typically in this architecture, an automaton is fi...
A wealth of information is available only in web pages, patents, publications, etc. Extracting information from such sources is challenging, both due to the typically complex langu...
Generative models of pattern individuality attempt to learn the distribution of observed quantitative features to determine the probability of two random patterns being the same. ...