As opposed to traditional Information Retrieval (IR) which views whole documents as atomic units of retrieval, XML IR processes XML elements as possible units of retrieval. Many o...
This paper describes how use the Java Swing HTMLEditorKit to perform multi-threaded web data mining on the EDGAR system (Electronic DataGathering, Analysis, and Retrieval system)....
Most previous work on the recently developed languagemodeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into acc...
As the web expands exponentially, the need to put some order to its content becomes apparent. Hypertext categorization, that is the automatic classification of web documents into ...
As the popularity of the World Wide Web increases, the amount of traffic results in major congestion problems for the retrieval of data over wide distances. To react to this, user...