Sciweavers

92 search results - page 9 / 19
» HTML Pattern Generator--Automatic Data Extraction from Web P...
Sort
View
WWW
2009
ACM
14 years 9 months ago
Incorporating site-level knowledge to extract structured data from web forums
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
PVLDB
2010
114views more  PVLDB 2010»
13 years 7 months ago
ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data
We present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTM...
Talel Abdessalem, Bogdan Cautis, Nora Derouiche
LREC
2010
216views Education» more  LREC 2010»
13 years 10 months ago
BlogBuster: A Tool for Extracting Corpora from the Blogosphere
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
Georgios Petasis, Dimitrios Petasis
WWW
2009
ACM
14 years 1 months ago
Extracting data records from the web using tag path clustering
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Gengxin Miao, Jun'ichi Tatemura, Wang-Pin Hsiung, ...
ICST
2011
IEEE
13 years 8 days ago
Tailored Shielding and Bypass Testing of Web Applications
User input validation is a technique to counter attacks on web applications. In typical client-server architectures, this validation is performed on the client side. This is ineff...
Tejeddine Mouelhi, Yves Le Traon, Erwan Abgrall, B...