This position paper presents an algorithm, which determines similarities between text documents. These text documents are indexed with keywords and further background knowledge-ter...
Most text analysis is designed to deal with the concept of a “document”, namely a cohesive presentation of thought on a unifying subject. By contrast, individual nodes on the ...
This work provides algorithms and heuristics to index text documents by determining important topics in the documents. To index text documents, the work provides algorithms to gene...
It is important to automatically extract key information from sensitive text documents for intelligence analysis. Text documents are usually unstructured and information extraction...
Assessing semantic similarity between text documents is a crucial aspect in Information Retrieval systems. In this work, we propose to use hyperlink information to derive a simila...
Text documents, in electronic and hardcopy forms, are and will probably remain the most widely used kind of content in our digital age. The goal of this paper is to overview proto...
A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significan...
In order to navigate huge document collections efficiently, tagged hierarchical structures can be used. For users, it is important to correctly interpret tag combinations. In this ...
We now have incrementally-grown databases of text documents ranging back for over a decade in areas ranging from personal email, to news-articles and conference proceedings. While...
The proliferation of text documents on the web as well as within institutions necessitates their convenient organization to enable efficient retrieval of information. Although tex...
Sriharsha Veeramachaneni, Diego Sona, Paolo Avesan...