Incremental Ontology-Based Extraction and Alignment in Semi-structured Documents

SHIRI is an ontology-based system for integrating semi-structured documents related to a specific domain. The system's purpose is to give users access to relevant parts of documents as answers to their queries. SHIRI uses RDF/OWL to represent resources and SPARQL to query them. It relies on an automatic, unsupervised and ontology-driven approach for extraction, alignment and semantic annotation of tagged elements of documents. In this paper, we focus on the Extract-Align algorithm, which exploits a set of named entity and term patterns to extract term candidates to be aligned with the ontology. It proceeds incrementally in order to populate the ontology with terms describing instances of the domain and to reduce access to external resources such as the Web. We evaluate it on an HTML corpus of calls for papers in computer science, and the results we obtain are very promising. These results show how the incremental behaviour of Extract-Align a...
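
To make the incremental idea in the abstract concrete, the following is a minimal sketch and not the authors' Extract-Align algorithm. It assumes term candidates have already been extracted from tagged elements, models the ontology as a plain dictionary from concept names to known terms, and uses a hypothetical `external_lookup` function as a stand-in for a costly external resource such as the Web. All names and data here are illustrative assumptions.

```python
# Toy ontology (assumption): concept name -> set of known terms, lowercased.
ontology = {
    "Topic": {"semantic web", "data integration"},
    "Location": {"linz"},
}

def external_lookup(term):
    """Hypothetical stand-in for an external resource such as a Web query."""
    fake_web = {"ontology alignment": "Topic", "vienna": "Location"}
    return fake_web.get(term)

def align(term, ontology, lookup):
    """Return (concept, used_external) for one candidate term."""
    key = term.strip().lower()
    # First try the terms already stored in the ontology.
    for concept, terms in ontology.items():
        if key in terms:
            return concept, False
    # Fall back to the external resource and remember a successful answer,
    # so later documents can be aligned without asking again.
    concept = lookup(key)
    if concept is not None:
        ontology.setdefault(concept, set()).add(key)
    return concept, True

def extract_align(candidates, ontology, lookup):
    """Align each candidate and count how many external lookups were needed."""
    annotations = []
    external_calls = 0
    for text in candidates:
        concept, used_external = align(text, ontology, lookup)
        external_calls += used_external
        if concept is not None:
            annotations.append((text, concept))
    return annotations, external_calls

# Two hypothetical documents processed one after the other.
doc1 = ["Semantic Web", "Ontology Alignment", "Vienna"]
doc2 = ["Ontology Alignment", "Vienna", "Linz"]
ann1, calls1 = extract_align(doc1, ontology, external_lookup)
ann2, calls2 = extract_align(doc2, ontology, external_lookup)
print(calls1, calls2)
```

In this toy run the first document needs two external lookups and the second needs none, because the terms learned from the first document were added to the ontology. A real system would also need the pattern-based extraction step and a richer alignment measure than exact string matching.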

Database | DEXA 2009 | Extract-Align Algorithm | Semantic Annotation | SHIRI Uses RDF/OWL

Added	26 May 2010
Updated	26 May 2010
Type	Conference
Year	2009
Where	DEXA
Authors	Mouhamadou Thiam, Nacéra Bennacer, Nathalie Pernelle, Moussa Lo
