Sciweavers

COLING
2010
13 years 6 months ago
Large Scale Parallel Document Mining for Machine Translation
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
CORR
2010
Springer
13 years 11 months ago
LiquidXML: Adaptive XML Content Redistribution
We propose to demonstrate LiquidXML, a platform for managing large corpora of XML documents in large-scale P2P networks. All LiquidXML peers may publish XML documents to be shared...
Jesús Camacho-Rodríguez, Asterios Ka...
FLAIRS
2006
14 years 25 days ago
Corpus Based Unsupervised Labeling of Documents
Text categorization involves mapping of documents to a fixed set of labels. A similar but equally important problem is that of assigning labels to large corpora. With a deluge of ...
Delip Rao, Deepak P, Deepak Khemani
LREC
2010
14 years 27 days ago
A Corpus Factory for Many Languages
For many languages there are no large, general-language corpora available. Until the web, all but the richest institutions could do little but shake their heads in dismay as corpu...
Adam Kilgarriff, Siva Reddy, Jan Pomikálek,...
ACL
2008
14 years 27 days ago
A Subcategorization Acquisition System for French Verbs
This paper presents a system capable of automatically acquiring subcategorization frames (SCFs) for French verbs from the analysis of large corpora. We applied the system to a lar...
Cédric Messiant
COLCOM
2005
IEEE
14 years 5 months ago
An experimental evaluation of spam filter performance and robustness against attack
In this paper, we show experimentally that learning filters are able to classify large corpora of spam and legitimate email messages with a high degree of accuracy. The corpor...
Steve Webb, Subramanyam Chitti, Calton Pu