Sciweavers

» The Inefficiency of Batch Training for Large Training Sets
KDD
2002
ACM
14 years 9 months ago
Interactive deduplication using active learning
Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
Sunita Sarawagi, Anuradha Bhamidipaty
CICLING
2009
Springer
14 years 3 months ago
Semi-supervised Word Sense Disambiguation Using the Web as Corpus
As any other classification task, Word Sense Disambiguation requires a large number of training examples. These examples, which are easily obtained for most of the tasks,...
Rafael Guzmán-Cabrera, Paolo Rosso, Manuel ...
ASSETS
2007
ACM
14 years 24 days ago
Corpus studies in word prediction
Word prediction can be used to enhance the communication rate of people with disabilities who use Augmentative and Alternative Communication (AAC) devices. We use statistical meth...
Keith Trnka, Kathleen F. McCoy
AUSDM
2007
Springer
14 years 24 days ago
A Two-Step Classification Approach to Unsupervised Record Linkage
Linking or matching databases is becoming increasingly important in many data mining projects, as linked data can contain information that is not available otherwise, or that woul...
Peter Christen
ACL
2006
13 years 10 months ago
Discriminative Word Alignment with Conditional Random Fields
In this paper we present a novel approach for inducing word alignments from sentence aligned data. We use a Conditional Random Field (CRF), a discriminative model, which is estima...
Phil Blunsom, Trevor Cohn