This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising s...
In contrast to traditional information retrieval systems, which return ranked lists of documents that users must manually browse through, a question answering system attempts to d...
As camera resolution increases, high-speed non-contact text capture through a digital camera is opening up a new channel for text capture and understanding. Unfortunately, the cap...
This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for ...
A huge number of historical documents in libraries and in various National Archives have not been exploited electronically. Although automatic reading of complete pa...
...information science has shown that human abstractors extract sentences for summaries based on the hierarchical structure of documents; however, the existing automatic summarization mode...
Investigative analysts who work with collections of text documents connect embedded threads of evidence in order to formulate hypotheses about plans and activities of potential in...
Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the la...
In many situations, individuals or groups of individuals are faced with the need to examine sets of documents to achieve understanding of their structure and to locate relevant in...
Alneu de Andrade Lopes, Roberto Pinho, Fernando Vi...
Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a...