In this paper we propose a new approach to improve electronic editions of human science corpora, providing an efficient estimation of manuscript page structure. In any handwriti...
Traditional wisdom holds that once documents are turned into bag-of-words (unigram count) vectors, word orders are completely lost. We introduce an approach that, perhaps surprisi...
Xiaojin Zhu, Andrew B. Goldberg, Michael Rabbat, R...
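As a small illustration of the "traditional wisdom" this abstract starts from, the sketch below shows two sentences with different word order mapping to the same unigram count vector. The function name `bag_of_words` and the whitespace tokenization are assumptions made for this example; it does not implement the recovery approach the abstract goes on to introduce.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return unigram counts for a lowercased, whitespace-tokenized text.

    Hypothetical helper for illustration only; real pipelines usually
    use a proper tokenizer and a fixed vocabulary.
    """
    return Counter(text.lower().split())


# Two sentences with the same words in a different order.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# The count vectors are identical, so the ordering is not visible
# from a single bag-of-words vector alone.
print(a == b)  # prints True
```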
Queries on XML documents typically combine selections on element contents, and, via path expressions, the structural relationships between tagged elements. Efficient support for ...
XML data integration tools are facing a variety of challenges for their efficient and effective operation. Among these is the requirement to handle a variety of inconsistencies or...
Sudipto Guha, Nick Koudas, Divesh Srivastava, Ting...
Content-oriented retrieval models are based on a document-term matrix, whereas link-oriented retrieval models are based on an adjacency (parent-child) matrix. Term frequency and inv...