We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...
Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
Continuous queries are queries for which responses given to users must be continuously updated, as the sources of interest get updated. Such queries occur, for instance, during on...
One important problem proposed recently in the field of web mining is website classification problem. The complexity together with the necessity to have accurate and fast algorit...
This paper describes a hidden Markov model (HMM) based approach to perform search interface segmentation. Automatic processing of an interface is a must to access the invisible co...