Tasks recognizing named entities such as products, people names, or locations from documents have recently received significant attention in the literature. Many solutions to thes...
Querying data from presentation formats like HTML, for purposes such as information extraction, requires the consideration of tree structures as well as the consideration of spati...
The World Wide Web, initially intended as a way to publish static hypertexts on the Internet, is moving toward complex applications. Static Web sites are being gradually replaced ...
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
Increasingly, many data sources appear as online databases, hidden behind query forms, thus forming what is referred to as the deep web. It is desirable to have systems that can pr...