Sciweavers

» Email data cleaning
ICDE
2008
IEEE
14 years 11 months ago
Efficient Merging and Filtering Algorithms for Approximate String Searches
We study the following problem: how to efficiently find in a collection of strings those similar to a given query string? Various similarity functions can be used, such as edit dis...
Chen Li, Jiaheng Lu, Yiming Lu
WWW
2005
ACM
14 years 10 months ago
The volume and evolution of web page templates
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...
David Gibson, Kunal Punera, Andrew Tomkins
SIGMOD
2001
ACM
14 years 10 months ago
Materialized View Selection and Maintenance Using Multi-Query Optimization
Materialized views have been found to be very effective at speeding up queries, and are increasingly being supported by commercial databases and data warehouse systems. However, w...
Hoshi Mistry, Prasan Roy, S. Sudarshan, Krithi Ram...
PODS
2008
ACM
14 years 10 months ago
Approximating predicates and expressive queries on probabilistic databases
We study complexity and approximation of queries in an expressive query language for probabilistic databases. The language studied supports the compositional use of confidence com...
Christoph Koch
WSDM
2010
ACM
14 years 7 months ago
Boilerplate Detection using Shallow Text Features
In addition to the actual content, Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
Christian Kohlschütter, Peter Fankhauser, Wol...