Sciweavers

» Extracting Structured Data from Web Pages
WWW
2003
ACM
16 years 3 months ago
Efficient URL caching for world wide web crawling
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). Howev...
Andrei Z. Broder, Marc Najork, Janet L. Wiener
WWW
2011
ACM
14 years 9 months ago
FACTO: a fact lookup engine based on web tables
Recently answers for fact lookup queries have appeared on major search engines. For example, for the query {Barack Obama date of birth} Google directly shows “4 August 1961” a...
Xiaoxin Yin, Wenzhao Tan, Chao Liu
DOCENG
2004
ACM
15 years 8 months ago
The lifecycle of a digital historical document: structure and content
This paper describes the lifecycle of a digital historical document, from template-based structure definition through to content extraction from the scanned pages and its final re...
Apostolos Antonacopoulos, Dimosthenis Karatzas, He...
IAT
2008
IEEE
15 years 9 months ago
Acquiring Vague Temporal Information from the Web
Many real-world information needs are naturally formulated as queries with temporal constraints. However, the structured temporal background information needed to support such c...
Steven Schockaert, Martine De Cock, Etienne E. Ker...
DEXAW
2002
IEEE
15 years 7 months ago
An Architecture for Collaboratively Assembled Moderated Information Bearing Web Sites
As originally conceived, the World Wide Web was intended for the purpose of sharing information. Many websites realise this aim by publishing pages from a data repository which su...
Richard Cooper