In this paper, we propose a framework for isolating text regions from natural scene images. The main algorithm has two functions: it generates text region candidates, and it veriï...
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
At least two kinds of relations exist among related words: taxonomical relations and thematic relations. Both relations identify related words useful to language understanding and...
From the standpoint of the automated extraction of scientific knowledge, an important but little-studied part of scientific publications are the figures and accompanying captions....
William W. Cohen, Richard C. Wang, Robert F. Murph...
Abstract A rich family of generic Information Extraction (IE) techniques have been developed by researchers nowadays. This paper proposes WebKER, a system for automatically extract...