Learning Text Extraction Rules, without Ignoring Stop Words



Information Extraction (IE) from text/web documents has become an important application area of AI. As the number of web sites and documents has grown dramatically, users need easy, fast and flexible ways of generating systems that can carry out specific IE tasks. This can be achieved with the help of Machine Learning (ML) techniques. We have developed a system that exploits this strategy. After training, the system is capable of identifying certain relevant elements in the text and extracting the corresponding information. As input, the system takes a collection of text documents (in a certain domain) that have been previously annotated by a user. This collection is used to generate extraction rules. We describe a set of experiments oriented towards the domain of announcements (in Portuguese) concerning house/flat sales. We show that quite good results overall can be achieved using this methodology. In previous work some authors argue that stop words should really be eliminate...
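The abstract does not say how the extraction rules are represented or learned, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a hypothetical rule form in which the token immediately before a field value predicts the field, and it uses invented example announcements. The function names `learn_rules` and `extract`, the field labels, and the `min_support` threshold are all illustrative assumptions.

```python
from collections import Counter, defaultdict

def learn_rules(annotated, window=1, min_support=2):
    """Learn 'left context -> field' rules from annotated token sequences.

    Hypothetical rule form, assumed for illustration only: the `window`
    tokens preceding a position predict the label of that position.
    """
    counts = defaultdict(Counter)
    for tokens, labels in annotated:
        for i in range(window, len(tokens)):
            context = tuple(t.lower() for t in tokens[i - window:i])
            counts[context][labels[i]] += 1  # unlabelled tokens count as None
    rules = {}
    for context, label_counts in counts.items():
        label, support = label_counts.most_common(1)[0]
        # Keep a rule only if the context mostly precedes a real field
        # and has been seen often enough.
        if label is not None and support >= min_support:
            rules[context] = label
    return rules

def extract(rules, tokens, window=1):
    """Apply learned rules; the first match per field wins."""
    found = {}
    for i in range(window, len(tokens)):
        context = tuple(t.lower() for t in tokens[i - window:i])
        label = rules.get(context)
        if label is not None and label not in found:
            found[label] = tokens[i]
    return found

# Invented training announcements (not data from the paper).
training = [
    ("Vende-se apartamento T3 em Lisboa por 150000 euros".split(),
     [None, None, "type", None, "location", None, "price", None]),
    ("Vende-se moradia T4 em Braga por 220000 euros".split(),
     [None, None, "type", None, "location", None, "price", None]),
]
rules = learn_rules(training)

new_ad = "Vende-se apartamento T2 em Covilhã por 95000 euros".split()
with_stop_words = extract(rules, new_ad)

# Removing the stop words "em" and "por" deletes exactly the contexts
# the rules rely on, so nothing is extracted.
stop_words = {"em", "por"}
filtered = [t for t in new_ad if t.lower() not in stop_words]
without_stop_words = extract(rules, filtered)
```

In this toy setting the only contexts that reach the support threshold are the prepositions "em" and "por", which is one way to see why discarding stop words before rule learning could hurt extraction.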

João Cordeiro, Pavel Brazdil


Important Application Area | PRIS 2004 | PRIS 2007 | Specific IE Tasks | Text/Web Documents


Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2004
Where	PRIS
Authors	João Cordeiro, Pavel Brazdil
