A Fact-aligned Corpus of Numerical Expressions

We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subse...
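The roundness, hedge and precision annotations mentioned in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the NUMGEN annotation scheme: a number is treated as round when it is 1, 2, 2.5 or 5 times a power of ten (the rule actually derived from Jansen and Pollmann (2001) may differ); the hedge inventory and relation labels in the hypothetical `HEDGE_RELATIONS` table are invented apart from the two hedges quoted in the abstract; and precision is approximated by counting significant digits. The helper names `is_round`, `classify_hedge`, `significant_digits`, `later_mentions_at_least_as_precise` and `annotate` are likewise hypothetical.

```python
import re
from decimal import Decimal

# ASSUMPTION (illustrative only): a number is "round" if it equals 1, 2, 2.5 or 5
# times a power of ten. This is NOT the NUMGEN classifier, only a simplified stand-in.
ROUND_MANTISSAS = {Decimal("1"), Decimal("2"), Decimal("2.5"), Decimal("5")}

# HYPOTHETICAL hedge inventory: only "about" and "a little under" come from the text;
# the other entries and all relation labels are invented for this sketch.
HEDGE_RELATIONS = {
    "about": "approx",
    "approximately": "approx",
    "a little under": "less_than",
    "just over": "greater_than",
    "more than": "greater_than",
    "exactly": "equal",
}

# Matches digits with optional thousands separators and an optional decimal part.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def is_round(value) -> bool:
    """Scale |value| into [1, 10) and test the mantissa against ROUND_MANTISSAS."""
    d = abs(Decimal(str(value)))
    if d == 0:
        return False  # treating zero as non-round is an arbitrary choice here
    while d >= 10:
        d /= 10
    while d < 1:
        d *= 10
    return d in ROUND_MANTISSAS


def classify_hedge(phrase: str):
    """Return (hedge, relation) for the longest known hedge opening the phrase."""
    text = phrase.lower().strip()
    for hedge in sorted(HEDGE_RELATIONS, key=len, reverse=True):
        if text.startswith(hedge + " "):
            return hedge, HEDGE_RELATIONS[hedge]
    return None, None  # no hedge found


def significant_digits(value) -> int:
    """Count significant digits; trailing zeros of integers are assumed non-significant."""
    return len(Decimal(str(value)).normalize().as_tuple().digits)


def later_mentions_at_least_as_precise(values) -> bool:
    """Given aligned mentions of one fact in document order, check that no later
    mention is less precise than the first (a crude proxy, not the paper's analysis)."""
    first = significant_digits(values[0])
    return all(significant_digits(v) >= first for v in values[1:])


def annotate(phrase: str):
    """Extract the first number in a phrase and attach roundness and hedge labels."""
    match = NUMBER_RE.search(phrase)
    if match is None:
        return None
    value = Decimal(match.group().replace(",", ""))
    hedge, relation = classify_hedge(phrase)
    return {"value": value, "round": is_round(value), "hedge": hedge, "relation": relation}


first = annotate("about 50 per cent")
later = annotate("48.3 per cent")
print(first)
print(later)
print(later_mentions_at_least_as_precise([first["value"], later["value"]]))  # True
```

Under these assumptions, the first mention is round and hedged with "about", while the later mention of the same fact is unhedged, non-round and carries more significant digits, which is the kind of first-to-later precision increase the abstract reports.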

Sandra Williams, Richard Power

Education | LREC 2010 | Numerical Expressions | Numerical Facts | Numerical Hedges

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Sandra Williams, Richard Power

Sciweavers
