Category-based Pseudowords

A pseudoword is a composite comprised of two or more words chosen at random; the individual occurrences of the original words within a text are replaced by their conflation. Pseudowords are a useful mechanism for evaluating the impact of word sense ambiguity in many NLP applications. However, the standard method for constructing pseudowords has some drawbacks. Because the constituent words are chosen at random, the word contexts that surround pseudowords do not necessarily reflect the contexts that real ambiguous words occur in. This in turn leads to an optimistic upper bound on algorithm performance. To address these drawbacks, we propose the use of lexical categories to create more realistic pseudowords, and evaluate the results of different variations of this idea against the standard approach.
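
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the underscore-joined pseudoword form, and the `word_to_category` mapping are all hypothetical, and the abstract does not say how lexical categories are used to select constituents. The category-based picker below simply assumes one word is drawn from each of two distinct categories.

```python
import random


def make_pseudoword(constituents):
    """Join the constituent words into one conflated token (assumed format)."""
    return "_".join(constituents)


def conflate(tokens, constituents):
    """Replace every occurrence of a constituent word with the pseudoword.

    Returns the rewritten token list and a dict mapping each replaced
    position to the original word, which acts as the gold "sense" label.
    """
    pseudoword = make_pseudoword(constituents)
    members = set(constituents)
    rewritten, gold = [], {}
    for i, tok in enumerate(tokens):
        if tok in members:
            rewritten.append(pseudoword)
            gold[i] = tok
        else:
            rewritten.append(tok)
    return rewritten, gold


def pick_random_constituents(vocabulary, k=2, rng=None):
    """Standard approach: choose k distinct constituent words at random."""
    rng = rng or random.Random()
    return rng.sample(sorted(vocabulary), k)


def pick_category_constituents(word_to_category, rng=None):
    """Assumed category-based variant: one word from each of two categories.

    `word_to_category` is a hypothetical word -> lexical category mapping.
    """
    rng = rng or random.Random()
    by_category = {}
    for word, category in word_to_category.items():
        by_category.setdefault(category, []).append(word)
    cat_a, cat_b = rng.sample(sorted(by_category), 2)
    return [rng.choice(sorted(by_category[cat_a])),
            rng.choice(sorted(by_category[cat_b]))]
```

A disambiguation system would then be scored on how often it recovers the entries of `gold` from the rewritten text. Only the way constituents are chosen differs between the two pickers; the replacement step is the same.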

Preslav Nakov, Marti A. Hearst

NAACL 2003 | NAACL 2007 | Optimistic Upper Bound | Original Words | Word Sense Ambiguity

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	NAACL
Authors	Preslav Nakov, Marti A. Hearst
