Sciweavers

» Extracting semantic structure of web documents using content...
DEXA
2009
Springer
14 years 3 months ago
Incremental Ontology-Based Extraction and Alignment in Semi-structured Documents
SHIRI is an ontology-based system for integration of semistructured documents related to a specific domain. The system’s purpose is to allow users to access relevant parts ...
Mouhamadou Thiam, Nacéra Bennacer, Nathalie...
KES
2006
Springer
13 years 8 months ago
Web Site Off-Line Structure Reconfiguration: A Web User Browsing Analysis
The correct web site text content must help visitors find what they are looking for. However, the reality is quite different: many times the web page text content is a...
Sebastián A. Ríos, Juan D. Velá...
PODS
2002
ACM
14 years 8 months ago
Monadic Datalog and the Expressive Power of Languages for Web Information Extraction
Research on information extraction from Web pages (wrapping) has seen much activity in recent times (particularly systems implementations), but little work has been done on formal...
Georg Gottlob, Christoph Koch
VLDB
2003
ACM
14 years 9 months ago
THESUS: Organizing Web document collections based on link semantics
The requirements for effective search and management of the WWW are stronger than ever. Currently Web documents are classified based on their content not taking into acco...
Maria Halkidi, Benjamin Nguyen, Iraklis Varlamis, ...
ECIR
2008
Springer
13 years 10 months ago
Clustering Template Based Web Documents
More and more documents on the World Wide Web are based on templates. On a technical level, this causes those documents to have quite similar source code and DOM tree structure. G...
Thomas Gottron