The field of psychometrics routinely grapples with the question of what it means to measure the inherent ability of an organism to perform a given task, and for the last forty years the field has increasingly relied on probabilistic methods such as the Rasch model for test construction and the analysis of test results. Because the underlying issues of measuring ability apply to human language technologies as well, such probabilistic methods can be advantageously applied to the evaluation of those technologies. To test this claim, Rasch measurement was applied to the results of 67 systems participating in the Question Answering track of the 2002 Text REtrieval Conference (TREC) competition. Satisfactory model fit was obtained, and the paper illustrates the theoretical and practical strengths of Rasch scaling for evaluating systems as well as questions. Most importantly, simulations indicate that a test-invariant metric can be defined by carrying forward 20 to 50 equating questions, thus...
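The dichotomous Rasch model referenced above can be illustrated with a minimal sketch: the probability that a system of ability θ answers a question of difficulty b correctly is exp(θ − b) / (1 + exp(θ − b)). This is the standard textbook form, not the paper's own implementation; the function name `rasch_p` is ours.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a system with
    ability theta answers a question of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the success probability is exactly 0.5.
print(rasch_p(0.0, 0.0))  # 0.5

# A stronger system (higher theta) has a higher chance on the same question,
# which is what makes the scale comparable across systems and questions.
print(rasch_p(1.0, 0.0) > rasch_p(0.0, 0.0))  # True
```

Because ability and difficulty sit on the same logit scale, a shared set of equating questions (the 20 to 50 mentioned above) suffices to link scores across different test forms.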
Rense Lange, Juan Moran, Warren R. Greiff, Lisa Fe