Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, ...
Web sites allow the collection of vast amounts of navigational data – clickstreams of user traversals through the site. These massive data stores offer the tantalizing possibil...
Kaushik Dutta, Debra E. VanderMeer, Anindya Datta,...
Abstract--We propose a new Consistent Weighted Sampling method, where the probability of drawing identical samples for a pair of inputs is equal to their Jaccard similarity. Our me...
We study the mining of interesting patterns in the presence of numerical attributes. Instead of the usual discretization methods, we propose the use of rank based measures to scor...