Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
We describe DEIMOS, a system that automatically discovers and models new sources of information. The system exploits four core technologies developed by our group that makes an en...
Recently many data types arising from data mining and Web search applications can be modeled as bipartite graphs. Examples include queries and URLs in query logs, and authors and ...
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
Data about everything is readily available on the web—but often only accessible through elaborate user interactions. For automated decision support, extracting that data is esse...
Andrew Jon Sellers, Tim Furche, Georg Gottlob, Gio...