Large collections of documents containing various types of multimedia are made available on the WWW. Unfortunately, due to the unstructured nature of Internet environments, it is ha...
Abstract: This paper discusses the design and evaluation of CATNIP, a Context-Aware Transport/Network Internet Protocol for the Web. This integrated protocol uses application-layer k...
Text categorization, as an essential component of applications for user navigation on the World Wide Web using question answering in Japanese, requires more effective features for ...
The massive distribution of the crawling task can lead to inefficient exploration of the same portion of the Web. We propose a technique to guide crawlers' exploration based on the...
Collections are a fundamental tool for reproducible evaluation of information retrieval techniques. We describe a new method for distributing the document lengths and term counts ...