A Segmentation Method for Bibliographic References by Contextual Tagging of Fields

In this paper, a method based on part-of-speech (PoS) tagging is used to recover the structure of bibliographic references. The method operates on a roughly structured ASCII file produced by OCR. Because reference structure is heterogeneous, the method works bottom-up, without an a priori model, gathering structural elements from basic tags into sub-fields and fields. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to record fields: “authors”, “title”, “conference name”, “date”, etc. Non-labelled tokens are integrated into one field or another either by applying PoS correction rules or by using a structure model generated from well-detected records. The prototype performs satisfactorily across different record layouts and character recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of references ar...
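The bottom-up idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag set (`YEAR`, `INITIAL`, `PAGES`, `NAME`), the tag-to-field mapping, the single contextual rule, and the fallback to "title" are all hypothetical choices made for illustration, since the abstract does not list the actual tags or correction rules.

```python
import re

# Hypothetical basic tags; the paper's real tag set is not given in the abstract.
TAG_PATTERNS = [
    ("YEAR", re.compile(r"^(19|20)\d{2}$")),
    ("INITIAL", re.compile(r"^[A-Z]\.$")),
    ("PAGES", re.compile(r"^\d+-\d+$")),
    ("NAME", re.compile(r"^[A-Z][a-z]+$")),
]

# Hypothetical mapping from "significant" tags to record fields.
TAG_TO_FIELD = {"INITIAL": "authors", "YEAR": "date", "PAGES": "pages"}


def basic_tag(token):
    """Return the first basic tag whose pattern matches, or None."""
    for tag, pattern in TAG_PATTERNS:
        if pattern.match(token):
            return tag
    return None


def segment(reference):
    """Group the tokens of one reference string into (field, text) pairs."""
    tokens = reference.replace(",", " ").split()
    tags = [basic_tag(tok) for tok in tokens]

    # Step 1: significant tags map directly to fields; others stay None.
    fields = [TAG_TO_FIELD.get(tag) for tag in tags]

    # Step 2: an assumed contextual rule. A capitalised word next to an
    # initial is treated as an author surname.
    for i, tag in enumerate(tags):
        if tag == "NAME":
            neighbours = tags[max(i - 1, 0):i] + tags[i + 1:i + 2]
            if "INITIAL" in neighbours:
                fields[i] = "authors"

    # Step 3: tokens that are still unlabelled fall back to "title"
    # (a simplification standing in for the paper's structure model).
    fields = [field or "title" for field in fields]

    # Step 4: merge consecutive tokens that share a field.
    grouped = []
    for tok, field in zip(tokens, fields):
        if grouped and grouped[-1][0] == field:
            grouped[-1][1].append(tok)
        else:
            grouped.append((field, [tok]))
    return [(field, " ".join(toks)) for field, toks in grouped]


example = "Besagni D., Belaid A. A segmentation method for references 2003 700-704"
for field, text in segment(example):
    print(f"{field}: {text}")
```

On the made-up example string, the sketch groups the surnames and initials under "authors", the lowercase run under "title", and the trailing numbers under "date" and "pages". A real system would need a much richer tag set and rules that tolerate OCR errors.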

Dominique Besagni, Abdel Belaïd, Nelly Benet

Bibliographic Reference Structure | Document Analysis | ICDAR 2003 | Non Labelled Tokens | Reference Structure |

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDAR
Authors	Dominique Besagni, Abdel Belaïd, Nelly Benet
