The organization of documents is a task that we face as computer users daily. This is particularly true for management of email. Typically email documents are organized in director...
In dynamic environments with frequent content updates, we require online full-text search that scales to large data collections and achieves low search latency. Several recent met...
Website development has always been an hard task: it consumes time and resources. What is new today is normally taken as granted tomorrow by users. This is to say that users always...
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques ...