In today’s global village, it is critical that the key information tools, such as web search engines, e-Commerce portals and e-Governance, work across multiple natural languages,...
In this paper, we focus on the ontological concept extraction and evaluation process from HTML documents. In order to improve this process, we propose an unsupervised hierarchical...
The JOS language resources are meant to facilitate the development of HLT and corpus linguistics for the Slovene language and consist of the morphosyntactic specifications, defining ...
Tomaz Erjavec, Darja Fiser, Simon Krek, Nina Ledin...
This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification where the training set is partially labeled...
Previously, topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recent...