Enhanced Suffix Arrays as Language Models: Virtual k-Testable Languages



Abstract. In this article, we propose using suffix arrays to efficiently implement n-gram language models with a practically unlimited n. Combined with synchronous back-off, this approach allows us to distinguish between alternative sequences using large contexts. We also show that models of this kind can be built with additional information for each symbol, such as part-of-speech tags and dependency information. The approach can also be viewed as a collection of virtual k-testable automata: once the suffix array is built, we can directly access the results of any k-testable automaton generated from the input training data. Synchronous back-off automatically identifies the k-testable automaton with the largest feasible k. We have used this approach in several classification tasks.
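The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_suffix_array`, `count_ngram`, `synchronous_backoff`), the toy corpus, and the naive sort-based construction are all assumptions made for illustration, and the back-off rule reflects one reading of the abstract (shorten the context for all alternatives at once until at least one of them has been seen in the training data).

```python
def build_suffix_array(tokens):
    # Naive construction (an assumption for brevity): sort the start
    # positions of all suffixes by the token sequence they begin.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_ngram(tokens, sa, ngram):
    """Count occurrences of `ngram` (a list of tokens) for any length n,
    using two binary searches over the suffix array."""
    ngram = list(ngram)
    k = len(ngram)

    def prefix(i):
        # First k tokens of the i-th suffix in sorted order.
        return tokens[sa[i]:sa[i] + k]

    # Lower bound: first suffix whose k-token prefix is >= ngram.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < ngram:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-token prefix is > ngram.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= ngram:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def synchronous_backoff(tokens, sa, left_context, alternatives):
    """Compare all alternatives at the same n-gram order, starting with the
    full context and shortening it for all of them together until at least
    one alternative has a non-zero count.
    Returns (best alternative, n-gram order used, counts per alternative)."""
    for k in range(len(left_context), -1, -1):
        context = left_context[len(left_context) - k:]
        counts = {alt: count_ngram(tokens, sa, context + [alt])
                  for alt in alternatives}
        if any(counts.values()):
            return max(counts, key=counts.get), k + 1, counts
    return None, 0, {}


# Hypothetical toy training corpus.
tokens = "the cat sat on the mat and the cat ate the fish".split()
sa = build_suffix_array(tokens)

# "saw the cat" and "saw the fish" are both unseen, so both alternatives
# back off together to the bigrams "the cat" and "the fish".
choice, order, counts = synchronous_backoff(
    tokens, sa, ["saw", "the"], ["cat", "fish"])
```

Because the same suffix array answers count queries for every n, it plays the role of all the k-testable automata at once; the order returned by `synchronous_backoff` corresponds to the largest k for which the training data supports a decision. A production system would replace the naive construction with a linear-time or enhanced suffix array.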

Herman Stehouwer, Menno van Zaanen


ICGI 2010 | K-testable Automaton | N-gram Language Models | Natural Language Processing | Suffix Arrays


Post Info

Added	09 Nov 2010
Updated	09 Nov 2010
Type	Conference
Year	2010
Where	ICGI
Authors	Herman Stehouwer, Menno van Zaanen
