We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
The Web has the potential to become the world’s largest knowledge base. In order to unleash this potential, the wealth of information available on the Web needs to be extracte...
Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifr...
The presence of encyclopedic Web sources, such as Wikipedia, the Internet Movie Database (IMDB), World Factbook, etc., calls for new querying techniques that are simple and yet mor...
Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifr...
The existence of large image datasets such as the set of photos on the World Wide Web makes it possible to build powerful generic models for low-level image attributes like color u...
The past two years have been a turbulent time for the New Economy generally – and for the digital content industry in particular. In the wake of the dot.com and telecoms crashes...
Kornelia van der Beek, Paula M. C. Swatman, Cornel...