We present the RGAI systems which participated in the third Web People Search Task challenge. The chief characteristics of our approach are that we focus on the raw textual parts o...
We propose mixtures of hidden Markov models for modelling clickstreams of web surfers. Hence, the page categorization is learned from the data without the need for a (possibly cumb...
This paper describes the processes of downloading and classifying Web-based articles in online medical journals as a preliminary step to extracting bibliographic data to populate ...
Loc Q. Tran, Chan W. Moon, Daniel X. Le, George R....
In this paper, we propose an implementable characterization of genre suitable for automatic genre identification of web pages. This characterization is implemented as an inferenti...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...