XML is increasingly becoming the format of choice for information exchange on the Internet. As this trend grows, one can expect that documents (or collections thereof) may get qui...
Premkumar T. Devanbu, Michael Gertz, April Kwong, ...
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
In document image understanding, public datasets with ground-truth are an important part of scientific work. They are not only helpful for developing new methods, but also provid...
Thomas Strecker, Joost van Beusekom, Sahin Albayra...
This paper reviews the main innovations of XML and considers their impact on the editing techniques for structured documents. Namespaces open the way to compound documents; well-f...
As camera resolution increases, high-speed non-contact text capture through a digital camera is opening up a new channel for document capture and understanding. Unfortunately, per...