
PRIS
2004

Learning Text Extraction Rules, without Ignoring Stop Words

Information Extraction (IE) from text/web documents has become an important application area of AI. As the number of web sites and documents has grown dramatically, users need easy, fast and flexible ways of generating systems that can carry out specific IE tasks. This can be achieved with the help of Machine Learning (ML) techniques. We have developed a system that exploits this strategy. After training, the system is capable of identifying certain relevant elements in the text and extracting the corresponding information. As input, the system takes a collection of text documents (in a certain domain) that have been previously annotated by a user; these are used to generate extraction rules. We describe a set of experiments oriented towards the domain of announcements (in Portuguese) concerning house/flat sales. We show that quite good overall results can be achieved using this methodology. In previous work some authors argue that stop words should really be eliminate...
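To make the stop-word question concrete, here is a minimal toy sketch, not the authors' system. It assumes a very simple rule form that the abstract does not specify: each rule stores up to `width` tokens of left and right context around an annotated value. The stop-word list, the sample ads, and the names `learn_rules` and `extract` are all hypothetical. With a fixed two-token window, keeping stop words lets the learned left context be the generic "preço de", whereas dropping them pulls in the more specific token ("t2"/"t3"), so the rules no longer fire on a new ad.

```python
from collections import Counter

# Hypothetical, tiny Portuguese stop-word list (illustration only).
STOP_WORDS = {"de", "da", "do", "com", "em", "a", "o", "e"}


def tokenize(text, keep_stop_words=True):
    """Lower-case whitespace tokenizer; optionally drops stop words."""
    tokens = text.lower().split()
    if not keep_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def learn_rules(examples, width=2, keep_stop_words=True):
    """Induce (left context, right context) rules from annotated examples.

    Each example is (text, value), where value appears verbatim in text.
    A rule records up to `width` tokens on each side of the annotated value.
    """
    rules = Counter()
    for text, value in examples:
        tokens = tokenize(text, keep_stop_words)
        target = tokenize(value, keep_stop_words)
        n = len(target)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                left = tuple(tokens[max(0, i - width):i])
                right = tuple(tokens[i + n:i + n + width])
                rules[(left, right)] += 1
                break  # use only the first occurrence of the value
    return rules


def extract(text, rules, keep_stop_words=True, max_len=3):
    """Return spans (up to max_len tokens) enclosed by a rule's left and right context."""
    tokens = tokenize(text, keep_stop_words)
    found = []
    for (left, right), _count in rules.most_common():
        if not left or not right:
            continue  # this sketch requires context on both sides
        for i in range(len(left), len(tokens)):
            if tuple(tokens[i - len(left):i]) != left:
                continue
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
                if tuple(tokens[j:j + len(right)]) == right:
                    found.append(" ".join(tokens[i:j]))
                    break
    return found


# Hypothetical annotated training ads: (text, annotated price).
TRAIN = [
    ("vende se apartamento t2 com preço de 90000 euros", "90000"),
    ("moradia t3 preço de 150000 euros negociável", "150000"),
]
NEW_AD = "apartamento t1 com área de 60 m2 preço de 75000 euros"

with_sw = learn_rules(TRAIN, keep_stop_words=True)
without_sw = learn_rules(TRAIN, keep_stop_words=False)
print(extract(NEW_AD, with_sw, keep_stop_words=True))       # → ['75000']
print(extract(NEW_AD, without_sw, keep_stop_words=False))   # → []
```

The outcome here is an artifact of the fixed-width window in this toy, so it illustrates how stop words change the learned context rather than proving they always help.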
João Cordeiro, Pavel Brazdil
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where PRIS
Authors João Cordeiro, Pavel Brazdil