Extracting Useful Information from the Full Text of Fiction

In this paper, we describe experiments in large-scale Information Extraction (IE) on book texts. We investigate how well IE techniques scale to full-length books and how useful they are for extracting information from fiction. In particular, we evaluate a variety of Named Entity Recognition (NER) techniques for identifying the central characters in works of fiction. First, we describe the creation of a gold standard for evaluation, which contains ordered lists of characters for a corpus of classic book texts from Project Gutenberg. Second, we describe several approaches to character identification; our best model achieves an average coverage score of 78.4% across all central characters. Finally, we propose several directions for future work.
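The abstract does not define its NER models or its coverage score, so the following is only a minimal sketch of the general task under stated assumptions. It uses the frequency of capitalised tokens as a crude, hypothetical stand-in for NER output, ranks the candidates, and assumes that "coverage" means the fraction of gold-standard characters that appear among the top-k ranked candidates. The names `rank_candidates`, `coverage` and `STOPWORDS` are illustrative and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical stoplist: capitalised words that are rarely character names.
STOPWORDS = {"The", "A", "An", "And", "But", "He", "She", "It", "I",
             "They", "We", "You", "His", "Her", "Mr", "Mrs", "Miss", "Chapter"}


def rank_candidates(text: str, top_k: int = 10) -> list[str]:
    """Rank capitalised tokens by frequency (a crude stand-in for NER output)."""
    tokens = re.findall(r"\b[A-Z][a-z]+\b", text)
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [name for name, _ in counts.most_common(top_k)]


def coverage(predicted: list[str], gold: list[str]) -> float:
    """Assumed metric: fraction of gold characters found in the predicted list."""
    if not gold:
        return 0.0
    predicted_set = set(predicted)
    found = sum(1 for name in gold if name in predicted_set)
    return found / len(gold)


text = "Elizabeth met Darcy. Darcy smiled at Elizabeth. Jane laughed."
print(rank_candidates(text, top_k=2))                 # ['Darcy', 'Elizabeth']
print(coverage(["Darcy", "Elizabeth"], ["Elizabeth", "Jane"]))  # 0.5
```

A real system would replace the capitalisation heuristic with an NER tagger's PERSON entities and would also need to merge name variants (for example "Mr. Darcy" and "Darcy") before ranking.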

Sharon Givon, Maria Milosavljevic

Book Texts | Central Characters | IE Techniques | Information Technology | RIAO 2007 |

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2007
Where	RIAO
Authors	Sharon Givon, Maria Milosavljevic

Sciweavers
