The table recognition literature contains many strategies specified informally as sequences of operations, which obscures both the underlying models of table structure and the effects of individual decisions. Decision-making is more transparent in formal model-based approaches (e.g., grammar-based), but these approaches are less flexible than informal ones. We propose an intermediate level of formalization, in which strategies are defined as sequences of basic graph transformations that correspond to recognition operations (e.g., classification, segmentation). Transformations are parameterized by logical types and decision functions, which together define structure models and executable strategies for interpreting input graphs. We provide an overview of our first attempt at this intermediate level of formalization, the Recognition Strategy Language (RSL). As a proof of concept, we reimplement two informally specified table recognition strategies from the literature in RSL. The RSL implementations capture descr...
Richard Zanibbi, Dorothea Blostein, James R. Cordy
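
The idea of a strategy as a sequence of graph transformations, each parameterized by logical types and a decision function, can be illustrated with a small Python sketch. This is not RSL syntax and not the authors' implementation: the `Graph` class, the `classify`, `segment`, and `run` helpers, and the `Header`, `Data`, and `Row` types are all hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Graph:
    # Hypothetical input graph: node id -> text content of the region.
    text: Dict[str, str] = field(default_factory=dict)
    # Node id -> set of logical type labels assigned so far.
    labels: Dict[str, Set[str]] = field(default_factory=dict)
    # (parent, child) containment edges created by segmentation.
    contains: Set[Tuple[str, str]] = field(default_factory=set)


Transformation = Callable[[Graph], Graph]


def classify(types: Set[str],
             decide: Callable[[Graph, str], Set[str]]) -> Transformation:
    """Classification step: decide() proposes labels, restricted to `types`."""
    def apply(g: Graph) -> Graph:
        for node in list(g.labels):
            g.labels[node] |= decide(g, node) & types
        return g
    return apply


def segment(new_type: str,
            decide: Callable[[Graph], List[List[str]]]) -> Transformation:
    """Segmentation step: decide() returns groups; each becomes a new node."""
    def apply(g: Graph) -> Graph:
        for i, group in enumerate(decide(g)):
            parent = f"{new_type}_{i}"
            g.labels[parent] = {new_type}
            g.contains |= {(parent, child) for child in group}
        return g
    return apply


def run(strategy: List[Transformation], g: Graph) -> Graph:
    """Apply each transformation in order; the sequence is the strategy."""
    for step in strategy:
        g = step(g)
    return g


# Toy decision functions (assumptions, not taken from any published strategy).
def header_or_data(g: Graph, node: str) -> Set[str]:
    return {"Data"} if g.text.get(node, "").isdigit() else {"Header"}


def group_data_nodes(g: Graph) -> List[List[str]]:
    return [[n for n in sorted(g.labels) if "Data" in g.labels[n]]]


strategy = [
    classify({"Header", "Data"}, header_or_data),
    segment("Row", group_data_nodes),
]

words = {"w1": "Year", "w2": "1999", "w3": "2000"}
result = run(strategy, Graph(text=words, labels={n: set() for n in words}))
```

In this sketch the logical types passed to each step play the role of a structure model, while the decision functions isolate the individual choices, so replacing one decision function changes the strategy without touching the sequence of operations.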