We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
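The abstract is cut off before it defines the tag ratio. The sketch below assumes the common formulation: a line's tag ratio is its number of non-tag characters divided by its number of HTML tags, with a divisor of 1 for lines that contain no tags. The other choices are also assumptions and not taken from the abstract: script, style, and comment blocks are dropped first, blank lines are skipped, and surrounding whitespace is not counted. The function name `tag_ratios` is hypothetical.

```python
import re

# Matches a single HTML tag such as <p>, </div>, or <a href='x'>.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[float]:
    """Hypothetical helper: per-line ratio of non-tag characters to tags.

    Assumptions (not from the abstract): lines with no tags divide by 1,
    <script>/<style> bodies and comments are removed before counting,
    blank lines are skipped, and leading/trailing whitespace is ignored.
    """
    # Remove HTML comments and script/style bodies, which are not visible content.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    html = re.sub(r"<(script|style)\b.*?</\1>", "", html, flags=re.S | re.I)

    ratios = []
    for line in html.splitlines():
        if not line.strip():
            continue  # blank lines carry no signal
        tag_count = len(TAG_RE.findall(line))
        text_chars = len(TAG_RE.sub("", line).strip())
        # Divide by 1 instead of 0 when the line has no tags.
        ratios.append(text_chars / max(tag_count, 1))
    return ratios


page = "<html>\n<body>\n<p>Hello world</p>\n<div><a href='x'>Home</a></div>\n</body>\n</html>"
print(tag_ratios(page))  # [0.0, 0.0, 5.5, 1.0, 0.0, 0.0]
```

Text-heavy lines score high and markup-heavy lines such as navigation links score low. This sketch stops at the ratios themselves and does not show how CETR turns them into a content/non-content decision.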
Searching for images of people is an essential task for image and video search engines. However, current search engines have limited capabilities for this task since they rely on ...
The large amount of information now available on the Web can play a prominent role in building a cooperative intelligent distance learning environment. We propose a system to prov...
Mohammed Abdel Razek, Claude Frasson, Marc Kaltenb...
The web contains lots of interesting factual information about entities, such as celebrities, movies or products. This paper describes a robust bootstrapping approach to corrobora...
Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in we...