According to a recent survey by Nielsen NetRatings, searching for news articles is one of the most important online activities. Indeed, Google, Yahoo, MSN and many others have p...
Gianna M. Del Corso, Antonio Gulli, Francesco Roma...
The Semantic Web is a new layer of the Internet that enables semantic representation of the contents of existing web pages. Using common ontologies, human users sketch out the mos...
Christian Fillies, Gay Wood-Albrecht, Frauke Weich...
Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can ser...
We present Content Extraction via Tag Ratios (CETR), a method to extract content text from diverse webpages by using the HTML document's tag ratios. We describe how to comput...
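
As a rough illustration of the idea named in this abstract, the sketch below computes a per-line tag ratio, assuming the ratio is the number of non-tag characters on a line divided by the number of tags on that line (with a tag count of zero treated as one). That definition, the function name `tag_ratios`, and the regex-based tag matching are assumptions made for illustration; the truncated abstract does not confirm them, and the sketch omits any later steps such as smoothing or clustering.

```python
import re

# Simplified tag pattern (an assumption): anything between '<' and '>'.
# A real HTML parser would handle comments, scripts, and attributes
# containing '>' more robustly.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[float]:
    """Return one text-to-tag ratio per line of the input."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        # Characters left after removing tags and surrounding whitespace.
        text_chars = len(TAG_RE.sub("", line).strip())
        # Lines without tags divide by 1 to avoid division by zero.
        ratios.append(text_chars / max(tag_count, 1))
    return ratios


sample = "\n".join([
    '<div class="nav"><a href="/">Home</a></div>',
    "<p>Tag ratios favor lines with much text and few tags.</p>",
])

for ratio, line in zip(tag_ratios(sample), sample.splitlines()):
    print(f"{ratio:6.2f}  {line}")
```

Under this assumed definition, navigation-style lines with many tags and little text score low, while paragraph-style lines score high, which is the signal a content extractor could then threshold or cluster.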
We have developed a web-repository crawler that is used for reconstructing websites when backups are unavailable. Our crawler retrieves web resources from the Internet Archive, Go...