Sciweavers

WEBDB
1999
Springer
Web Ecology: Recycling HTML Pages as XML Documents Using W4F
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
Arnaud Sahuguet, Fabien Azavant
ISCIS
2003
Springer
A Cooperative Paradigm for Fighting Information Overload
The Web is mainly processed by humans. The role of the machines is just to transmit and display the contents of the documents, barely being able to do something else. Nowadays ther...
Daniel Gayo-Avello, Darío Álvarez Gu...
ICIP
2003
IEEE
Structuralizing educational videos based on presentation content
This work addresses the challenge of extracting structure in educational and training media based on the type of material that is presented during lectures and training sessions. ...
Chitra Dorai, Vincent Oria, Viswanath Neelavalli
WWW
2010
ACM
Exploiting content redundancy for web information extraction
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
HT
1996
ACM
HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering
HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link...
Ron Weiss, Bienvenido Vélez, Mark A. Sheldo...