Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...
The need for Natural Language Interfaces to databases (NLIs) has become increasingly acute as more and more people access information through their web browsers, PDAs, and cell ph...
We give a new view on building content clusters from page pair models. We measure the heuristic importance within every two pages by computing the distance of their accessed positi...
Abstract. This paper argues that the World Wide Web could be regarded not only as an information resource but also as a dynamic, multilingual, least controlled, easy to access and ...