Sciweavers

CIS
2004
Springer
A Method of Acquiring Ontology Information from Web Documents
Abstract. Ontology plays an important role in the Semantic Web. In this paper, we propose a method, AOIWD, of acquiring ontology information from Web documents. The AOIWD method em...
Lixin Han, Guihai Chen, Li Xie
WWW
2005
ACM
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger
DOCENG
2009
ACM
Web document text and images extraction using DOM analysis and natural language processing
Web Document Text and Images Extraction using DOM Analysis and Natural Language Processing. Parag Mulendra Joshi, Sam Liu. HP Laboratories, HPL-2009-187. Web page text extraction,...
Parag Mulendra Joshi, Sam Liu
JCDL
2006
ACM
Combining DOM tree and geometric layout analysis for online medical journal article segmentation
We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-converted HTML files). The web page content ...
Jie Zou, Daniel X. Le, George R. Thoma
ICDE
2010
IEEE
Viewing a World of Annotations through AnnoVIP
The proliferation of electronic content has notably led to the emergence of large corpora of interrelated structured documents (such as HTML and XML Web pages) and semantic annot...
Konstantinos Karanasos, Spyros Zoupanos