Search Sciweavers | Sciweavers




LREC
2008


Detecting Co-Derivative Documents in Large Text Collections

15 years 8 months ago

Download www.lrec-conf.org

We have analyzed the SPEX algorithm by Bernstein and Zobel (2004) for detecting co-derivative documents using duplicate n-grams. Although we totally agree with the claim that not ...

Jan Pomikálek, Pavel Rychlý


COLING
2010


Large Scale Parallel Document Mining for Machine Translation

15 years 1 month ago

Download static.googleusercontent.com

A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...

Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...


CORR
2006
Springer


A tool set for the quick and efficient exploration of large document collections

15 years 6 months ago

Download langtech.jrc.ec.europa.eu

We are presenting a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the do...

Camelia Ignat, Bruno Pouliquen, Ralf Steinberger, ...


ECIR
2009
Springer


Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation

16 years 3 months ago

Download www.cse.psu.edu

Algorithms that enable the process of automatically mining distinct topics in document collections have become increasingly important due to their applications in many fields and ...

Levent Bolelli, Seyda Ertekin, C. Lee Giles


ICDAR
2007
IEEE


Content-level Annotation of Large Collection of Printed Document Images

16 years 1 months ago

Download cvit.iiit.ac.in

A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creation of annotated corpora is a tedious task. It is laborious, ...

Anand Kumar 0002, C. V. Jawahar
