Sciweavers


DEXA 2005, Springer


An XML Approach to Semantically Extract Data from HTML Tables


Download www.cis.unisa.edu.au

Abstract. Data-intensive information is often published on the internet in the form of HTML tables. Extracting the information that is of interest to users from the internet, especially when a large number of web pages needs to be accessed, is time-consuming. To automate the process of information extraction, this paper proposes an XML-based way of semantically analyzing HTML tables for the data of interest. It first introduces a mini-language in XML syntax for specifying ontologies that represent the data of interest. It then defines algorithms that parse HTML tables into a specially defined type of XML tree. The XML trees are then compared with the ontologies to semantically analyze and locate the part of a table, or of nested tables, that contains the data of interest. Finally, the data of interest, once identified, is output as XML documents.

Jixue Liu, Zhuoyun Ao, Ho-Hyun Park, Yongfeng Chen
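
The pipeline outlined in the abstract (specify the data of interest, parse HTML tables into XML trees, compare, output XML) can be illustrated with a rough Python sketch. This is not the authors' mini-language or algorithm, which this page does not reproduce. The `ONTOLOGY` dictionary, the `TableToXml` class, the `concept_for` and `extract` functions, and the output element names `result` and `record` are all hypothetical, and the matching is a plain header-label lookup rather than a semantic analysis.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an ontology: concept name -> header labels
# that may denote it. The paper's XML mini-language is not shown here.
ONTOLOGY = {
    "model": {"model", "product"},
    "price": {"price", "cost"},
}


class TableToXml(HTMLParser):
    """Build an XML tree that mirrors table/tr/td/th nesting only."""

    KEPT = ("table", "tr", "td", "th")

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in self.KEPT:
            self.stack.append(ET.SubElement(self.stack[-1], tag))

    def handle_endtag(self, tag):
        if tag in self.KEPT and len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        node = self.stack[-1]
        if node.tag in ("td", "th") and data.strip():
            node.text = (node.text or "") + data.strip()


def concept_for(label):
    """Return the ontology concept a header label maps to, or None."""
    label = label.strip().lower()
    for concept, labels in ONTOLOGY.items():
        if label in labels:
            return concept
    return None


def extract(html):
    """Return a <result> element holding one <record> per matching data row."""
    parser = TableToXml()
    parser.feed(html)
    parser.close()
    result = ET.Element("result")
    # iter() also visits nested tables, so each one is checked on its own.
    for table in parser.root.iter("table"):
        rows = table.findall("tr")
        if not rows:
            continue
        header = [concept_for(cell.text or "") for cell in rows[0]]
        if not any(header):
            continue  # no column of interest in this table
        for row in rows[1:]:
            record = ET.SubElement(result, "record")
            for concept, cell in zip(header, row):
                if concept:
                    ET.SubElement(record, concept).text = cell.text or ""
    return result
```

Calling `ET.tostring(extract(page_html), encoding="unicode")` on a page's HTML would serialize the matched rows as XML. Treating the first row of each table as the header is a simplifying assumption; real tables with spanning cells or header columns would need more care.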


Data Intensive Information | DEXA 2005 | HTML Tables | XML Trees


Related Content

» Integrating HTML Tables Using Semantic Hierarchies And MetaData Sets

» From HTML documents to web tables and rules

» Extracting Personalised Ontology from Data-Intensive Web Application an HTML Forms-Based Rev...

» Rule Learning for Feature Values Extraction from HTML Product Information Sheets

» WebSets extracting sets of entities from the web using unsupervised information extraction

» VERT A Semantic Approach for Content Search and Content Extraction in XML Query Processing

» Unsupervised Learning of Tree Alignment Models for Information Extraction

» Towards domain-independent information extraction from web tables

» Extracting XML schema from multiple implicit xml documents based on inductive reasoning

Post Info

Added	26 Jun 2010
Updated	26 Jun 2010
Type	Conference
Year	2005
Where	DEXA
Authors	Jixue Liu, Zhuoyun Ao, Ho-Hyun Park, Yongfeng Chen
