Sciweavers

Re-engineering structures from Web documents
WWW
2007
ACM
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
ELPUB
1998
ACM
Addressing Publishing Issues with Hypermedia Distributed on the Web
The content and structure of an electronically published document can be authored and processed in ways that allow for flexibility in presentation on different environments for di...
Lloyd Rutledge, Lynda Hardman, Jacco van Ossenbrug...
WWW
2004
ACM
HearSay: enabling audio browsing on hypertext content
In this paper we present HearSay, a system for browsing hypertext Web documents via audio. The HearSay system is based on our novel approach to automatically creating audio browsa...
I. V. Ramakrishnan, Amanda Stent, Guizhen Yang
SAMT
2007
Springer
Document Layout Substructure Discovery
In this paper we present a system, DoLSuD, for the automatic discovery of relevant substructures in a document layout. DoLSuD, Document Layout Substructure Discovery, ext...
Claudio Andreatta
SAINT
2005
IEEE
Learning Logic Wrappers for Information Extraction from the Web
This paper discusses a methodology for applying general-purpose first-order inductive learning to extract information from Web documents structured as unranked ordered trees. The...
Costin Badica, Elvira Popescu, Amelia Badica