Enterprise corpora contain evidence of what employees work on and therefore can be used to automatically find experts on a given topic. We present a general approach for represen...
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...
Traditionally, statistical machine translation systems have relied on parallel bi-lingual data to train a translation model. While bi-lingual parallel data are expensive to genera...
Matthew G. Snover, Bonnie J. Dorr, Richard M. Schw...
With the worldwide growth of the Internet, research on Cross-Language Information Retrieval (CLIR) is being paid much attention. Existing CLIR approaches based on query translatio...
The Web has the potential to become the world’s
largest knowledge base. In order to unleash this potential,
the wealth of information available on the Web needs to be
extracte...
Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifr...