We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...
We present a set of XML language extensions that bring notions from functional programming to web authors, extending the power of declarative modelling for the web. Our previous w...
The problem of version detection is critical in many important application scenarios, including software clone identification, Web page ranking, plagiarism detection, and peer-to-...
Deise de Brum Saccol, Nina Edelweiss, Renata de Ma...
Since the advent of XML, the ability to transform documents using transformation languages such as XSLT has become an important challenge. However, writing a transformation script...
This paper presents a document image thresholding technique that binarizes badly illuminated document images by the photometric correction. Based on the observation that illuminat...