Sciweavers

WWW 2005, ACM
Automatically learning document taxonomies for hierarchical classification
While several hierarchical classification methods have been applied to web content, such techniques invariably rely on a pre-defined taxonomy of documents. We propose a new techni...
Kunal Punera, Suju Rajan, Joydeep Ghosh
EDBTW 2010, Springer
Using visual pages analysis for optimizing web archiving
Due to the growing importance of the World Wide Web, archiving it has become crucial for preserving a useful source of information. To maintain a web archive up-to-date, crawlers ha...
Myriam Ben Saad, Stéphane Gançarski
WEBI 2005, Springer
Automated Metadata and Instance Extraction from News Web Sites
In this paper, we present automated techniques for extracting metadata instance information by organizing and mining a set of news Web sites. We develop algorithms that detect and...
Srinivas Vadrevu, Saravanakumar Nagarajan, Fatih G...
DOCENG 2009, ACM
From rhetorical structures to document structure: shallow pragmatic analysis for document engineering
In this paper, we extend previous work on the automatic structuring of medical documents using content analysis. Our long-term objective is to take advantage of specific rhetoric ...
Gersende Georg, Hugo Hernault, Marc Cavazza, Helmu...
RIAO 2007
Using a Content-and-Structure Oriented Method for Relevance Feedback in XML Retrieval
As opposed to traditional Information Retrieval (IR) which views whole documents as atomic units of retrieval, XML IR processes XML elements as possible units of retrieval. Many o...
Lobna Hlaoua, Mohand Boughanem, Karen Pinel-Sauvag...