A widely acknowledged drawback of many statistical modelling techniques, commonly used in machine learning, is that the resulting model is extremely difficult to interpret. A numb...
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closel...
Extracting information from web pages is an important problem; it has several applications such as providing improved search results and construction of databases to serve user qu...
Paramveer S. Dhillon, Sundararajan Sellamanickam, ...
Extensible Markup Language (XML) is becoming the de facto standard for exchanging information over the Internet, which results in the proliferation of XML documents. This has led ...
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai