Sciweavers

» Extracting Partial Structures from HTML Documents
WWW
2004
ACM
14 years 9 months ago
HearSay: enabling audio browsing on hypertext content
In this paper we present HearSay, a system for browsing hypertext Web documents via audio. The HearSay system is based on our novel approach to automatically creating audio browsa...
I. V. Ramakrishnan, Amanda Stent, Guizhen Yang
ICEIS
2009
IEEE
14 years 3 months ago
Semi-supervised Information Extraction from Variable-length Web-page Lists
We propose two methods for constructing automated programs for extraction of information from a class of web pages that are very common and of high practical significance - varia...
Daniel Nikovski, Alan Esenther, Akihiro Baba
PVLDB
2008
13 years 8 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
SIGSOFT
2007
ACM
14 years 9 months ago
Mining API patterns as partial orders from source code: from usage scenarios to specifications
A software system interacts with third-party libraries through various APIs. Using these library APIs often needs to follow certain usage patterns. Furthermore, ordering rules (sp...
Mithun Acharya, Tao Xie, Jian Pei, Jun Xu
ITCC
2005
IEEE
14 years 2 months ago
Elimination of Redundant Information for Web Data Mining
These days, billions of Web pages are created with HTML or other markup languages. They only have a few uniform structures and contain various authoring styles compared to traditi...
Shakirah Mohd Taib, Soon-ja Yeom, Byeong Ho Kang