Search Sciweavers

CICLING
2007
Springer

Natural Language Processing

Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance

Clustering short texts is a difficult task in itself, but adding the narrow-domain characteristic poses an additional challenge for current clustering methods. We addressed thi...

David Pinto, José-Miguel Benedí, Pao...

ICDAR
2003
IEEE

Document Analysis

Video text recognition using feature compensation as category-dependent feature extraction

Download www.cse.salford.ac.uk

When recognizing multiple fonts, geometric features, such as the directional information of strokes, are generally robust against deformation but are weak against degradation. Thi...

Minoru Mori

ICDAR
2003
IEEE

Document Analysis

Learning the lexicon from raw texts for open-vocabulary Korean word recognition

Download www.cse.salford.ac.uk

In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Due to the complex morphology of Korean, it is inappropriate to ...

Sungho Ryu, Jin Hyung Kim

DMKD
2000
ACM

Data Mining

Combining Strategies for Extracting Relations from Text Collections

Download www.cs.columbia.edu

Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use...

Eugene Agichtein, Eleazar Eskin, Luis Gravano

AMTA
1998
Springer

Information Technology

Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

Download www.lib.umd.edu

Abstract. Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restri...

Philip Resnik
