Search Sciweavers | Sciweavers

187 search results - page 9 / 38

» Entity categorization over large document collections

SIGIR
2008
ACM

176 views | Information Technology

SpotSigs: robust and efficient near duplicate detection in large web collections

15 years 6 months ago

Download ilpubs.stanford.edu

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...

IJCNN
2007
IEEE

101 views | Neural Networks

Text Representations for Text Categorization: A Case Study in Biomedical Domain

16 years 1 months ago

Download www.comp.nus.edu.sg

In vector space model (VSM), textual documents are represented as vectors in the term space. Therefore, there are two issues in this representation, i.e. (1) what should a term...

Man Lan, Chew Lim Tan, Jian Su, Hwee-Boon Low

SIGIR
2004
ACM

130 views | Information Technology

Parameterized generation of labeled datasets for text categorization based on a hierarchical directory

16 years 4 days ago

Download www.cs.technion.ac.il

Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...

Dmitry Davidov, Evgeniy Gabrilovich, Shaul Markovi...

COLING
2010

108 views | Computational Linguistics

Large Scale Parallel Document Mining for Machine Translation

15 years 1 months ago

Download static.googleusercontent.com

A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...

Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...

EDBT
2004
ACM

133 views | Database

HOPI: An Efficient Connection Index for Complex XML Document Collections

16 years 6 months ago

Download wwwcs.upb.de

In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most o...

Ralf Schenkel, Anja Theobald, Gerhard Weikum
