Citation Recognition for Scientific Publications in Digital Libraries

In this paper, a method based on part-of-speech (PoS) tagging is used to recover the structure of bibliographic references. The method operates on a roughly structured ASCII file produced by OCR. Because reference structure is heterogeneous, the method works bottom-up, without an a priori model, gathering structural elements from basic tags into sub-fields and fields. Significant tags are first grouped into homogeneous classes according to their categories and then reduced to canonical forms corresponding to record fields: "authors", "title", "conference name", "date", etc. Non-labeled tokens are assigned to one field or another either by applying PoS correction rules or by using an inter- or intra-field model generated from well-detected records. The prototype performs well across different record layouts and levels of character recognition quality. Without manual intervention, 96.6% of words are correctly attributed, and abou...
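The bottom-up idea described in the abstract (tags grouped into classes, classes reduced to record fields, unlabeled tokens attached afterwards) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag names, the `TAG_TO_FIELD` table, and the neighbour rule for unlabeled tokens are all hypothetical, and the paper's PoS correction rules and learned inter-/intra-field models are replaced here by the simplest possible heuristic.

```python
from itertools import groupby
from typing import Optional

# Hypothetical tag-to-field table for illustration only; the abstract does
# not give the paper's actual tag set.
TAG_TO_FIELD = {
    "NAME": "authors",
    "INITIAL": "authors",
    "WORD": "title",
    "CONF": "conference name",
    "MONTH": "date",
    "YEAR": "date",
}

Token = tuple[str, Optional[str]]          # (word, tag or None if unlabeled)
Segment = tuple[Optional[str], list[str]]  # (field or None, words)


def group_by_field(tokens: list[Token]) -> list[Segment]:
    """Map each tag to a field and group consecutive tokens of the same field.
    Tags missing from the table (e.g. punctuation) count as unlabeled."""
    fields = [(word, TAG_TO_FIELD.get(tag)) for word, tag in tokens]
    return [(field, [w for w, _ in run])
            for field, run in groupby(fields, key=lambda p: p[1])]


def attach_unlabeled(segments: list[Segment]) -> list[Segment]:
    """Give each unlabeled run the field of the preceding segment, or of the
    following one at the start of a record. This hypothetical neighbour rule
    stands in for the paper's correction rules and field models."""
    resolved: list[Segment] = []
    for i, (field, words) in enumerate(segments):
        if field is None:
            prev_field = resolved[-1][0] if resolved else None
            next_field = segments[i + 1][0] if i + 1 < len(segments) else None
            field = prev_field or next_field
        resolved.append((field, words))
    return resolved


def merge_adjacent(segments: list[Segment]) -> list[Segment]:
    """Merge neighbouring segments that now share the same field."""
    merged: list[Segment] = []
    for field, words in segments:
        if merged and merged[-1][0] == field:
            merged[-1][1].extend(words)
        else:
            merged.append((field, list(words)))
    return merged


def structure_reference(tokens: list[Token]) -> list[Segment]:
    """Bottom-up pipeline: tags -> field groups -> resolved, merged fields."""
    return merge_adjacent(attach_unlabeled(group_by_field(tokens)))


tokens = [("Besagni", "NAME"), (",", "PUNCT"), ("D.", "INITIAL"),
          ("Citation", "WORD"), ("recognition", "WORD"),
          ("DIAL", "CONF"), ("2004", "YEAR")]
print(structure_reference(tokens))
# [('authors', ['Besagni', ',', 'D.']), ('title', ['Citation', 'recognition']),
#  ('conference name', ['DIAL']), ('date', ['2004'])]
```

In this sketch the comma between the surname and the initial has no field of its own, so it is absorbed into the surrounding "authors" field and the two author fragments merge into one. A real system would need richer evidence than immediate neighbours, which is the role the abstract assigns to the correction rules and to the models built from well-detected records.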

Dominique Besagni, Abdel Belaïd

Bibliographic Reference Structure | DIAL 2004 | Image Analysis | Non Labeled Tokens | Reference Structure

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2004
Where	DIAL
Authors	Dominique Besagni, Abdel Belaïd

Sciweavers