In recent years, many research systems have been proposed to perform data extraction and automation tasks on Web sources. Since most of today’s Web sources are “human-readable...
The discovery and extraction of general lists on the Web continues to be an important problem facing the Web mining community. There have been numerous studies that claim to autom...
Tim Weninger, Fabio Fumarola, Rick Barber, Jiawei ...
A new approach has been developed for acquiring bilingual web pages from the result pages of search engines, which is composed of two challenging tasks. The first task is to detec...
Many common web tasks can be automated by algorithms that are able to identify web objects relevant to the user's needs. This paper presents a novel approach to web object id...
Nathanael Chambers, James F. Allen, Lucian Galescu...
Background: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, an...