Sciweavers

Search results for: Sampling the Web as Training Data for Text Classification

LREC 2010
Balancing SoNaR: IPR versus Processing Issues in a 500-Million-Word Written Dutch Reference Corpus
In the Low Countries, a major reference corpus for written Dutch is currently being built. In this paper, we discuss the interplay between data acquisition and data processing dur...
Martin Reynaert, Nelleke Oostdijk, Orphée D...

ACL 2003
Learning to Predict Pitch Accents and Prosodic Boundaries in Dutch
We train a decision tree inducer (CART) and a memory-based classifier (MBL) on predicting prosodic pitch accents and breaks in Dutch text, on the basis of shallow, easy-to-comput...
Erwin Marsi, Martin Reynaert, Antal van den Bosch,...

ECAI 2004, Springer
Towards Efficient Learning of Neural Network Ensembles from Arbitrarily Large Datasets
Advances in data collection technologies allow the accumulation of large and high-dimensional datasets and provide opportunities for learning high quality classification and regression...
Kang Peng, Zoran Obradovic, Slobodan Vucetic

EMNLP 2009
Using the Web for Language Independent Spellchecking and Autocorrection
We have designed, implemented and evaluated an end-to-end spellchecking and autocorrection system that does not require any manually annotated training data. The World Wide...
Casey Whitelaw, Ben Hutchinson, Grace Chung, Ged E...

RAID 1999, Springer
Anomaly Intrusion Detection Systems: Handling Temporal Relations Between Events
Lately, many approaches have been developed to discover computer abuse. Some of them use data mining techniques to discover anomalous behavior in audit trails, considering this beh...
Alexandr Seleznyov, Seppo Puuronen