ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Emerging distributed technologies aim to provide simple and powerful tools for web services design and implementation. Major vendors provide modern frameworks so that a good coordi...
Locating useful information effectively from the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology of using the structures and hyperlinks ...
HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link or the link-context is used for a variety of tasks...
Information extraction (IE) aims at extracting specific information from a collection of documents. A lot of previous work on IE from semi-structured documents (in XML or HTML) us...
Raymond Kosala, Maurice Bruynooghe, Jan Van den Bu...