In this paper, we address the question of how we can identify hosts that will generate links to web spam. Detecting such spam link generators is important because almost all new s...
In this paper, we propose a system for helping users to judge the credibility of Web search results and to search for credible Web pages. Conventional Web search engines present o...
The HITS algorithm proposed by Kleinberg is one of the representative methods of scoring Web pages by using hyperlinks. In the days when the algorithm was proposed, most of the pag...
Existing methods of information extraction from HTML documents include manual approach, supervised learning and automatic techniques. The manual method has high precision and reca...
Mirel Cosulschi, Adrian Giurca, Bogdan Udrescu, Ni...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...