We introduce the corpus of United States Congressional bills from 1947 to 1998 for use by language research communities. The U.S. Policy Agenda Legislation Corpus Volume 1 (USPALC...
We discuss information retrieval methods that aim at serving a diverse stream of user queries such as those submitted to commercial search engines. We propose methods that emphasi...
Hongyuan Zha, Zhaohui Zheng, Haoying Fu, Gordon Su...
Recent study has shown that canonical algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) can be obtained from graph based dimensionality ...
In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold w...
Focused Web browsing activities such as periodically looking up headline news, weather reports, etc., which require only selective fragments of particular Web pages, can be made m...