Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific known types of Web spa...
This paper presents a method for finding a specification page on the web for a given object (e.g., "Titanic") and its class label (e.g., "film"). A specificati...
While we expect to discover knowledge in the texts available on the Web, such discovery usually requires many complex analysis steps, most of which require different text handling...
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...
Many web documents are dynamic, with content changing in varying amounts at varying frequencies. However, current document search algorithms have a static view of the document con...