One Size Fits All? A Simple Technique to Perform Several NLP Tasks

Word fragments, or n-grams, have been widely used to perform different Natural Language Processing tasks such as information retrieval [1] [2], document categorization [3], automatic summarization [4] or even genetic classification of languages [5]. All these techniques share some common aspects: (1) documents are mapped to a vector space where n-grams are used as coordinates and their relative frequencies as vector weights, (2) many of them compute a context which plays a role similar to stop-word lists, and (3) cosine distance is commonly used for document-to-document and query-to-document comparisons. blindLight is a new approach related to these classical n-gram techniques, but it introduces two major differences: (1) relative frequencies are no longer used as vector weights but are replaced by n-gram significances, and (2) cosine distance is abandoned in favor of a new metric inspired by sequence alignment techniques, although not as computationally expensive. This new approa...
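The classical scheme summarized in the abstract (n-grams as coordinates, relative frequencies as weights, cosine comparison) can be illustrated with a minimal Python sketch. This shows only the classical baseline, not blindLight itself: the abstract does not define n-gram significances or the alignment-inspired metric, so neither is attempted here. The function names, the choice of character trigrams, and the sample strings are assumptions made for illustration. The code computes cosine similarity; the corresponding distance would be one minus that value.

```python
from collections import Counter
from math import sqrt


def ngram_vector(text, n=3):
    """Map a text to a sparse vector: character n-gram -> relative frequency.

    Hypothetical helper; n=3 (trigrams) is an assumed default.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}  # text shorter than n yields an empty vector
    return {gram: count / total for gram, count in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v[gram] for gram, weight in u.items() if gram in v)
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    query = ngram_vector("information retrieval")
    document = ngram_vector("information extraction")
    print(cosine_similarity(query, document))
```

The same two functions cover both query-to-document and document-to-document comparisons, since a query is simply treated as a short document.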

Daniel Gayo-Avello, Darío Álvarez Gu

Document Categorization | Information Retrieval | Natural Language Processing | TAL 2004 | Vector Weights |

Added	02 Jul 2010
Updated	02 Jul 2010
Type	Conference
Year	2004
Where	TAL
Authors	Daniel Gayo-Avello, Darío Álvarez Gutiérrez, José Gayo-Avello
