Large-scale systems for information extraction include many different classifiers and extractors. Experience in building such systems shows that finding an appropriate architecture is both difficult and important: in particular, in systems containing many learned components, it is important to share information cleanly between the components and to sequence their actions flexibly. This paper describes an architecture for large-scale information extraction systems, based on a lightweight blackboard system for communication between components and a declarative control system that automatically sequences component-level tasks such as classification, extraction, and feature computation.
William W. Cohen
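
As a rough illustration of the two ideas named in the abstract, the sketch below pairs a shared blackboard (here simply a dictionary) with declaratively specified components that state what they require and what they provide, so that a small scheduler can work out the execution order. This is a minimal sketch under stated assumptions and not the system described in the paper: the names `Component`, `run_pipeline`, and the toy tokenize, featurize, and classify steps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical declarative description of one component."""
    name: str
    requires: Tuple[str, ...]   # blackboard keys this component reads
    provides: str               # blackboard key this component writes
    run: Callable[..., object]  # the actual computation


def run_pipeline(components: List[Component], blackboard: Dict[str, object]) -> List[str]:
    """Run each component once all of its required keys are on the blackboard.

    Returns the names of the components in the order they were executed.
    """
    order: List[str] = []
    pending = list(components)
    progressed = True
    while pending and progressed:
        progressed = False
        for comp in list(pending):  # iterate over a copy so removal is safe
            if all(key in blackboard for key in comp.requires):
                inputs = [blackboard[key] for key in comp.requires]
                blackboard[comp.provides] = comp.run(*inputs)
                order.append(comp.name)
                pending.remove(comp)
                progressed = True
    if pending:
        stuck = ", ".join(comp.name for comp in pending)
        raise ValueError(f"unsatisfied requirements for: {stuck}")
    return order


# Components are deliberately listed out of order: the scheduler, not the
# caller, decides the sequence from the declared requirements.
components = [
    Component("classify", ("features",), "label",
              lambda f: "capitalized" if f["capitalized"] else "other"),
    Component("featurize", ("tokens",), "features",
              lambda toks: {"capitalized": toks[0][0].isupper()}),
    Component("tokenize", ("text",), "tokens",
              lambda text: text.split()),
]

blackboard: Dict[str, object] = {"text": "Pittsburgh is a city"}
order = run_pipeline(components, blackboard)
print(order)
print(blackboard["label"])
```

Because components communicate only through blackboard keys, a new classifier or feature extractor can be added by declaring its inputs and output, without editing the code that sequences the others.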