Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies--attempting to do data record detection and attribute labeling in two se...
The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction lan...
This paper is concerned with the problem of structured data extraction from Web pages. The objective of the research is to automatically segment data records in a page, extract da...
In this paper we address the problem of unsupervised Web data extraction. We show that unsupervised Web data extraction becomes feasible when supposing pages that are made up of r...
In this paper, we proposed a new approach, called FiVaTech for the problem of Web data extraction. FiVaTech is a page-level data extraction system which deduces the data schema an...
Mohammed Kayed, Chia-Hui Chang, Khaled F. Shaalan,...
Although the ever growing Web contain information to virtually every user’s query, it does not guarantee effectively accessing to those information. In many situations, the user...
Recent work has shown the feasibility and promise of templateindependent Web data extraction. However, existing approaches use decoupled strategies ? attempting to do data record ...
Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, Wei-Y...
We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...