IJCNLP
2004
Springer

Statistical Substring Reduction in Linear Time

We study the problem of efficiently removing equal-frequency n-gram substrings from an n-gram set, formally called Statistical Substring Reduction (SSR). SSR is a useful operation in corpus-based multi-word unit research and in the new word identification task of oriental language processing. We present a new SSR algorithm that runs in linear time (O(n)), and prove its equivalence with the traditional O(n^2) algorithm. In particular, using experimental results from several corpora of different sizes, we show that it is possible to achieve performance close to that theoretically predicted for this task. Even on a small corpus the new algorithm is several orders of magnitude faster than the O(n^2) one. These results show that our algorithm is reliable and efficient, and is therefore an appropriate choice for large-scale corpus processing.
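To make the operation in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not reproduce their linear bound. It assumes character n-grams stored as strings, and a frequency table that holds the true corpus counts of every n-gram up to a fixed length. The function names (`count_ngrams`, `ssr_quadratic`, `ssr_sorted`) are hypothetical. The first reducer is the pairwise O(n^2) baseline. The second sorts the n-grams and compares only neighbours, once on the strings and once on their reversals. Under the stated assumption on the counts it gives the same result, but because it uses a comparison sort it costs O(n log n).

```python
from collections import Counter


def count_ngrams(text, max_n):
    """Count every character n-gram of length 1..max_n in text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return dict(counts)


def ssr_quadratic(freq):
    """Baseline: drop x if some other n-gram y contains x with the same count."""
    return {
        x: fx
        for x, fx in freq.items()
        if not any(x != y and fx == fy and x in y for y, fy in freq.items())
    }


def _equal_freq_prefixes(items):
    """Return strings whose successor in sorted order extends them with an equal count."""
    items = sorted(items)
    redundant = set()
    for (s, f), (nxt, nf) in zip(items, items[1:]):
        if f == nf and nxt.startswith(s):
            redundant.add(s)
    return redundant


def ssr_sorted(freq):
    """Sort-based sketch: one pass finds redundant prefixes, a second pass on
    reversed strings finds redundant suffixes. Assumes freq holds complete
    corpus counts (see lead-in); otherwise it may keep more than the baseline."""
    forward = _equal_freq_prefixes(freq.items())
    backward = _equal_freq_prefixes((s[::-1], f) for s, f in freq.items())
    drop = forward | {s[::-1] for s in backward}
    return {s: f for s, f in freq.items() if s not in drop}
```

For example, with `freq = count_ngrams("abcabc", 3)`, both reducers keep only `abc`, `bca` and `cab`. Shorter n-grams such as `ab` (count 2) are dropped because `abc` occurs equally often.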

Xueqiang Lü, Le Zhang, Junfeng Hu

Frequency N-gram Substrings | IJCNLP 2004 | Statistical Substring Reduction | Word Identification Task

Post Info

Added	02 Jul 2010
Updated	02 Jul 2010
Type	Conference
Year	2004
Where	IJCNLP
Authors	Xueqiang Lü, Le Zhang, Junfeng Hu