Search Sciweavers | Sciweavers

263 search results - page 11 / 53

» Re-engineering structures from Web documents

WWW
2007
ACM

Detecting near-duplicates for web crawling

Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

ELPUB
1998
ACM

Addressing Publishing Issues with Hypermedia Distributed on the Web

The content and structure of an electronically published document can be authored and processed in ways that allow for flexibility in presentation on different environments for di...

Lloyd Rutledge, Lynda Hardman, Jacco van Ossenbrug...

WWW
2004
ACM

Hearsay: enabling audio browsing on hypertext content

In this paper we present HearSay, a system for browsing hypertext Web documents via audio. The HearSay system is based on our novel approach to automatically creating audio browsa...

I. V. Ramakrishnan, Amanda Stent, Guizhen Yang

SAMT
2007
Springer

Document Layout Substructure Discovery

In this paper we present a system, DoLSuD, for the automatic discovery of relevant substructures in a document layout. DoLSuD, Document Layout Substructure Discovery, ext...

Claudio Andreatta

SAINT
2005
IEEE

Learning Logic Wrappers for Information Extraction from the Web

This paper discusses a methodology for applying general-purpose first-order inductive learning to extract information from Web documents structured as unranked ordered trees. The...

Costin Badica, Elvira Popescu, Amelia Badica
