This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for ...
We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same nume...
The success of a software project is largely dependent upon the quality of the Software Requirements Specification (SRS) document, which serves as a medium to communicate user req...
This paper describes BABYLON, a system that attempts to overcome the shortage of parallel texts in low-density languages by supplementing existing parallel texts with texts gather...
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc&quo...