Sciweavers

PVLDB
2016

Repairing Data through Regular Expressions

Since regular expressions are often used to detect errors in sequences such as strings or dates, it is natural to use them for data repair. Motivated by this, we propose a data repair method based on regular expressions that makes the input sequence data obey a given regular expression with minimal revision cost. The proposed method contains two steps: sequence repair and token value repair. For sequence repair, we propose the Regular-expression-based Structural Repair (RSR) algorithm. RSR is a dynamic programming algorithm that uses a Nondeterministic Finite Automaton (NFA) to calculate the edit distance between a prefix of the input string and a partial pattern of the regular expression, with time complexity O(nm^2) and space complexity O(mn), where m is the number of edges in the NFA and n is the length of the input string. We also develop an optimization strategy to achieve higher performance for long strings. For token value repair, we combine the edit-distance-based method and as...
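The idea of measuring the edit distance between an input string and a regular expression can be illustrated with a small sketch. The code below is not the RSR algorithm from the paper; it is a simplified stand-in that runs Dijkstra's shortest-path search over (input position, NFA state) pairs with unit-cost insert, delete, and substitute operations. The function name `regex_edit_distance`, the edge-list encoding of the NFA, and the hand-built automaton for the pattern `ab*c` are all assumptions made for illustration.

```python
import heapq

def regex_edit_distance(s, edges, start, accepting):
    """Minimum number of single-character edits that turn s into a string
    accepted by an epsilon-free NFA.

    edges: list of (src_state, symbol, dst_state) with integer states.
    Returns None if the NFA accepts no string at all.
    """
    # Group outgoing edges by source state.
    out = {}
    for src, sym, dst in edges:
        out.setdefault(src, []).append((sym, dst))

    n = len(s)
    dist = {(0, start): 0}
    heap = [(0, 0, start)]  # (cost so far, input position, NFA state)
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist.get((i, q), float("inf")):
            continue  # stale heap entry
        if i == n and q in accepting:
            return d  # first accepting node popped is optimal
        moves = []
        if i < n:
            moves.append((d + 1, i + 1, q))  # delete s[i] from the input
        for sym, dst in out.get(q, []):
            moves.append((d + 1, i, dst))  # insert sym into the input
            if i < n:
                # match (cost 0) or substitute s[i] with sym (cost 1)
                moves.append((d + (0 if sym == s[i] else 1), i + 1, dst))
        for nd, ni, nq in moves:
            if nd < dist.get((ni, nq), float("inf")):
                dist[(ni, nq)] = nd
                heapq.heappush(heap, (nd, ni, nq))
    return None

# Hand-built NFA for the pattern ab*c (hypothetical example).
EDGES = [(0, "a", 1), (1, "b", 1), (1, "c", 2)]
START, ACCEPTING = 0, {2}

for text in ["abbc", "abxc", "bc", ""]:
    print(repr(text), regex_edit_distance(text, EDGES, START, ACCEPTING))
```

The search only reports the cost; recovering the repaired string itself would additionally require recording, for each (position, state) pair, the move that reached it. Dijkstra's algorithm is used here because insert moves stay at the same input position and can form cycles, which a simple left-to-right table fill would not handle directly.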
Added 09 Apr 2016
Updated 09 Apr 2016
Type Journal
Year 2016
Where PVLDB
Authors Zeyu Li, Hongzhi Wang, Wei Shao, Jianzhong Li, Hong Gao