Sciweavers

» Improved CHAID algorithm for document structure modelling
ICDAR
2009
IEEE
14 years 4 months ago
Text Lines and Snippets Extraction for 19th Century Handwriting Documents Layout Analysis
In this paper we propose a new approach to improving electronic editions of human science corpora by providing an efficient estimate of manuscript page structure. In any handwriti...
Vincent Malleron, Véronique Eglin, Hubert E...
ACL
2008
13 years 10 months ago
Learning Bigrams from Unigrams
Traditional wisdom holds that once documents are turned into bag-of-words (unigram count) vectors, word orders are completely lost. We introduce an approach that, perhaps surprisi...
Xiaojin Zhu, Andrew B. Goldberg, Michael Rabbat, R...
ICDE
2005
IEEE
14 years 2 months ago
Signature-based Filtering Techniques for Structural Joins of XML Data
Queries on XML documents typically combine selections on element contents, and, via path expressions, the structural relationships between tagged elements. Efficient support for ...
Huan Huo, Guoren Wang, Chuan Yang, Rui Zhou
ICDE
2003
IEEE
14 years 10 months ago
Index-Based Approximate XML Joins
XML data integration tools are facing a variety of challenges for their efficient and effective operation. Among these is the requirement to handle a variety of inconsistencies or...
Sudipto Guha, Nick Koudas, Divesh Srivastava, Ting...
IPM
2006
13 years 9 months ago
A general matrix framework for modelling Information Retrieval
Content-oriented retrieval models are based on a document-term matrix, whereas link-oriented retrieval models are based on an adjacency (parent-child) matrix. Term frequency and inv...
Thomas Rölleke, Theodora Tsikrika, Gabriella ...