Sciweavers

» Document normalization revisited
COLING
2000
13 years 9 months ago
A Method of Measuring Term Representativeness - Baseline Method Using Co-occurrence Distribution
This paper introduces a scheme, which we call the baseline method, to define a measure of term representativeness and measures defined by using the scheme. The representativeness ...
Toru Hisamitsu, Yoshiki Niwa, Jun-ichi Tsujii
SIGIR
2011
ACM
12 years 11 months ago
When documents are very long, BM25 fails!
We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namel...
Yuanhua Lv, ChengXiang Zhai
ICPR
2002
IEEE
14 years 9 months ago
Robust Text Detection from Binarized Document Images
Many document images are rich in color and have complex backgrounds. To detect text from them, a standard approach utilizes both color and binary information. This often leads to t...
Oleg Okun, Yu Yan, Matti Pietikäinen
ICPR
2010
IEEE
13 years 6 months ago
Text Separation from Mixed Documents Using a Tree-Structured Classifier
In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text from machine printed documents. Each node of the tree-structured cla...
Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, ...
DOCENG
2010
ACM
13 years 5 months ago
Diffing, patching and merging XML documents: toward a generic calculus of editing deltas
This work addresses what we believe to be a central issue in the field of XML diff and merge computation: the mathematical modeling of so-called editing deltas and the study of their ...
Jean-Yves Vion-Dury