Sciweavers

» Email data cleaning
PVLDB
2010
13 years 4 months ago
Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has ...
Jiannan Wang, Guoliang Li, Jianhua Feng
SIGMOD
2006
ACM
14 years 10 months ago
Avatar semantic search: a database approach to information retrieval
We present Avatar Semantic Search, a prototype search engine that exploits annotations in the context of classical keyword search. The process of annotations is accomplished offli...
Eser Kandogan, Rajasekar Krishnamurthy, Sriram Rag...
SIGIR
2009
ACM
14 years 4 months ago
Identifying the original contribution of a document via language modeling
One major goal of text mining is to provide automatic methods to help humans grasp the key ideas in ever-increasing text corpora. To this effect, we propose a statistica...
Benyah Shaparenko, Thorsten Joachims
ICCS
2007
Springer
14 years 4 months ago
Learning Common Outcomes of Communicative Actions Represented by Labeled Graphs
We build a generic methodology based on learning and reasoning to detect specific attitudes of human agents and patterns of their interactions. Human attitudes are determined in te...
Boris Galitsky, Boris Kovalerchuk, Sergei O. Kuzne...
PAKDD
2005
ACM
14 years 3 months ago
Improved Bayesian Spam Filtering Based on Co-weighted Multi-area Information
Bayesian spam filters, in general, compute probability estimations for tokens either without considering the email areas of occurrences except the body or treating the s...
Raju Shrestha, Yaping Lin