Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled da...
Performance of n-gram language models depends to a large extent on the amount of training text material available for building the models and the degree to which this text matches...
This paper describes a self-modelling, incremental algorithm for learning translation rules from existing bilingual corpora. The notions of supracontext and subcontext are extende...
Text categorization involves mapping of documents to a fixed set of labels. A similar but equally important problem is that of assigning labels to large corpora. With a deluge of ...
We present the machine learning framework that we are developing, in order to support explorative search for non-trivial linguistic configurations in low-density languages (langua...