We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic pro...
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, T...
Abstract. The requirements for effective search and management of the WWW are stronger than ever. Currently Web documents are classified based on their content not taking into acco...
Maria Halkidi, Benjamin Nguyen, Iraklis Varlamis, ...
Reachability queries on large directed graphs have attracted much attention recently. The existing work either uses spanning structures, such as chains or trees, to compress the c...
Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, the problem of matching multiple schemas has essentially relied on finding ...
Web logs collected by proxy servers, referred to as proxy logs or proxy traces, contain information about Web document accesses by many users against many Web sites. This "man...