Enriching a digital library’s author metadata can lead to valuable services and applications. This paper addresses the problem of extracting authors’ information from their hom...
Background: Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships fr...
A growing number of people are expected to search the web while on the move. Though conventional search engines can be accessed directly from mobile devices with web bro...
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...