Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an ...
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...
This paper describes our efforts to develop a toolset and process for automated metadata extraction from large, diverse, and evolving document collections. A number of federal agen...
Paul Flynn, Li Zhou, Kurt Maly, Steven J. Zeil, Mo...
This paper presents a simple and fast technique for geometrical feature detection of several human face organs such as eyes and mouth. Human face gravity-center template is firstly...
More and more documents on the World Wide Web are based on templates. On a technical level this causes those documents to have a quite similar source code and DOM tree structure. G...