Sciweavers

Template-Based Information Mining from HTML Documents
PVLDB
2010
13 years 5 months ago
SXPath - Extending XPath towards Spatial Querying on Web Documents
Querying data from presentation formats like HTML, for purposes such as information extraction, requires the consideration of tree structures as well as the consideration of spati...
Ermelinda Oro, Massimo Ruffolo, Steffen Staab
AWIC
2003
Springer
14 years 18 days ago
Web Page Classification: A Soft Computing Approach
The Internet makes it possible to share and manipulate a vast quantity of information efficiently and effectively, but the rapid and chaotic growth experienced by the Net has gener...
Angela Ribeiro, Víctor Fresno, Maria C. Gar...
PAKDD
2009
ACM
14 years 2 months ago
Scalable Web Mining with Newistic
Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...
Ovidiu Dan, Horatiu Mocian
WWW
2007
ACM
14 years 8 months ago
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
DKE
1998
13 years 7 months ago
A Case study of Automatic Authoring: From a Textbook to a Hyper-Textbook
This paper presents a case-study of automatic construction of a hypertext from a large full-text document. The document we used as input of the automatic authoring process is a we...
Fabio Crestani, Massimo Melucci