When printed hypertexts go digital: information extraction from the parsing of indices

Modern critical editions of ancient works generally include manually created indices of other sources quoted in the text. Since indices can be considered as a form of domain-specific language, the paper presents a parsing-based approach to the problem of extracting information from them to support the creation of a collection of fragmentary texts. This paper first considers the characteristics and structure of quotation indices and their importance when dealing with fragmentary texts. It then presents the results of applying a fuzzy parser to the OCR transcription of an index of quotations to extract information from potentially noisy input.

Categories and Subject Descriptors: H.5.4 [Information Interfaces and Presentation]: [Hypertext/Hypermedia]

General Terms: Design, Experimentation.

Keywords: Printed hypertexts, indices, information extraction, parsing.
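The idea of treating an index as a small language and parsing it tolerantly can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the entry layout ("author abbreviation, work abbreviation, passage, separator, page numbers"), the names `IndexEntry`, `parse_line` and `parse_index`, and the handful of OCR confusions handled (`I`/`l` read for `1`, `O` for `0`, `;` for `:`). Lines that do not fit the pattern are set aside rather than causing a failure, which is the "fuzzy" aspect in a very reduced form.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class IndexEntry(NamedTuple):
    """One hypothetical index entry: a cited passage and the pages quoting it."""
    author: str
    work: str
    passage: str
    pages: List[int]


# Assumed entry layout (hypothetical): "Hom. Il. 1.5 : 12, 45".
# Numeric fields also accept the letters I, l and O, which OCR may
# produce in place of the digits 1 and 0.
ENTRY = re.compile(
    r"^\s*(?P<author>[A-Z][A-Za-z]*\.?)\s+"
    r"(?P<work>[A-Z][A-Za-z]*\.?)\s+"
    r"(?P<passage>[0-9IlO]+(?:[.,][0-9IlO]+)*)\s*[:;]\s*"
    r"(?P<pages>[0-9IlO]+(?:\s*,\s*[0-9IlO]+)*)\s*$"
)

# Map the OCR look-alikes back to digits inside numeric fields only.
OCR_DIGITS = str.maketrans({"I": "1", "l": "1", "O": "0"})


def parse_line(line: str) -> Optional[IndexEntry]:
    """Return an IndexEntry, or None if the line does not look like an entry."""
    m = ENTRY.match(line)
    if m is None:
        return None
    # A comma inside the passage is treated as a misread full stop.
    passage = m["passage"].translate(OCR_DIGITS).replace(",", ".")
    pages = [
        int(p.translate(OCR_DIGITS))
        for p in re.split(r"\s*,\s*", m["pages"])
    ]
    return IndexEntry(m["author"], m["work"], passage, pages)


def parse_index(text: str) -> Tuple[List[IndexEntry], List[str]]:
    """Parse every non-blank line; collect unparseable lines instead of failing."""
    entries: List[IndexEntry] = []
    skipped: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        entry = parse_line(line)
        if entry is None:
            skipped.append(line)
        else:
            entries.append(entry)
    return entries, skipped
```

Calling `parse_index` on an OCR transcription returns the recognised entries together with the lines that were skipped, so that the skipped lines can be reviewed by hand. A real quotation index has a richer structure (nested works, ranges, cross-references) than this single-line pattern covers.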

Fragmentary Texts | HT 2009 | Internet Technology | Modern Critical Editions | Quotation Indices |

Added	28 May 2010
Updated	28 May 2010
Type	Conference
Year	2009
Where	HT
Authors	Matteo Romanello, Monica Berti, Alison Babeu, Gregory Crane
