
PADL
2012
Springer

LearnPADS++: Incremental Inference of Ad Hoc Data Formats

An ad hoc data source is any semi-structured, non-standard data source. The format of such data sources is often evolving and frequently lacking documentation. Consequently, off-the-shelf tools for processing such data often do not exist, forcing analysts to develop their own tools, a costly and time-consuming process. In this paper, we present an incremental algorithm that automatically infers the format of large-scale data sources. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale or streaming data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions.
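To make the idea of incremental inference concrete, here is a toy sketch in Python. It is not the LearnPADS++ algorithm and does not use the PADS description language; the `Description` class, the token kinds, and the widening rule are all hypothetical, chosen only to illustrate the general pattern the abstract describes: keep a format description, leave it alone while new records parse, and revise it only when a record does not fit.

```python
import re

# Hypothetical tokenizer: runs of digits, runs of letters, or any other single symbol.
TOKEN = re.compile(r"\d+|[A-Za-z]+|\S")


def kind(tok):
    """Map a raw token to a coarse token kind (an assumption of this sketch)."""
    if tok.isdigit():
        return "int"
    if tok.isalpha():
        return "word"
    return repr(tok)  # punctuation is kept as a literal


def tokenize(line):
    return [kind(t) for t in TOKEN.findall(line)]


class Description:
    """Toy format description: the set of token kinds allowed at each position."""

    def __init__(self):
        self.fields = []     # fields[i] = kinds seen at position i
        self.min_len = None  # positions at index >= min_len are optional

    def accepts(self, line):
        kinds = tokenize(line)
        if self.min_len is None:
            return False
        if not (self.min_len <= len(kinds) <= len(self.fields)):
            return False
        return all(k in self.fields[i] for i, k in enumerate(kinds))

    def update(self, line):
        """Widen the description only if the record does not already parse.

        Returns True when the description changed.
        """
        if self.accepts(line):
            return False
        kinds = tokenize(line)
        while len(self.fields) < len(kinds):
            self.fields.append(set())
        for i, k in enumerate(kinds):
            self.fields[i].add(k)
        if self.min_len is None:
            self.min_len = len(kinds)
        else:
            self.min_len = min(self.min_len, len(kinds))
        return True


# Records arrive one at a time, as they would from a stream.
desc = Description()
for record in ["10 GET /a", "200 POST /b", "7 GET /c 404"]:
    changed = desc.update(record)
    print(record, "-> revised" if changed else "-> parsed as is")
```

Because `update` does work only for records that fail to parse, the cost of revision in this sketch is tied to how often the format changes rather than to the total volume of data, which is the property that makes an incremental approach attractive for large or streaming sources.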
Kenny Qili Zhu, Kathleen Fisher, David Walker
Added 25 Apr 2012
Updated 25 Apr 2012
Type Conference
Year 2012
Where PADL
Authors Kenny Qili Zhu, Kathleen Fisher, David Walker