We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...
XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the...
Minos N. Garofalakis, Aristides Gionis, Rajeev Ras...
With an increasing amount of semi-structured data XML has become important. XML documents may contain private information that cannot be shared by all user communities. Therefore,...
The proliferation of XML as a standard for data representation and exchange in diverse, next-generation Web applications has created an emphatic need for effective XML data-integr...
Wenfei Fan, Minos N. Garofalakis, Ming Xiong, Xibe...