The problem of jointly modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned wi...
Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Cov...
In this paper we present SINTESI, a system for knowledge extraction from Italian inputs, currently under development in our research centre. It is used on short descriptive d...
The integration of semantic representation and retrieval technologies into mainstream web applications depends on the ease of adoption and re-use of existing information...
In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of the latent Dirichlet allocation (LDA) topic model to segment a ...
A parallel corpus is a rich linguistic resource for various multilingual text management tasks, including crosslingual text retrieval, multilingual computational linguistics and mul...