Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their translations in English inside a pair of parentheses. We present a method to extrac...
Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Mari...
In this paper, we present a method that automatically constructs a Named Entity (NE) tagged corpus from the web to be used for learning of Named Entity Recognition systems. We use...
Understanding the source, data, and documentation files associated with legacy systems in preparation for maintenance or reengineering is an increasingly important problem for man...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
This paper concerns the design of a workflow which permits to feed and query a data warehouse opened on the Web, driven by a domain ontology. This data warehouse has been built to...