Latent Dirichlet allocation is a fully generative statistical language model that has been proven to be successful in capturing both the content and the topics of a corpus of docum...
We present a novel software tool called CDN (Collaborative Data Network) for large-scale sharing and querying of clinical documents modeled using HL7 v3 standard (e.g., Clinical D...
Praveen R. Rao, Tivakar Komara Swami, Deepthi S. R...
The optimal settings of retrieval parameters often depend on both the document collection and the query, and are usually found through empirical tuning. In this paper, we propose ...
We present a novel approach for multilingual document clustering using only comparable corpora to achieve cross-lingual semantic interoperability. The method models document colle...
In the intellectual property field two tasks are of high relevance: prior art searching and patent classification. Prior art search is fundamental for many strategic issues such as...
Douglas Teodoro, Julien Gobeill, Emilie Pasche, Di...