The Web is now a huge information repository with a rich semantic structure that, however, is primarily addressed to human understanding rather than automated processing by a compu...
The paper is organized as a self-contained literate Haskell program that implements elements of an executable finite set theory with focus on combinatorial generation and arithmet...
The paper is organized as a self-contained literate Prolog program that implements elements of an executable finite set theory with focus on combinatorial generation and arithmetic...
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no al...
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Gan...
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...