The basis of consciousness is an association of notions, the neuronal network. Similarly, the creation of a next generation internet (semantic web) is impossible without attributes...
Unlimited vocabulary annotation of multimedia documents remains elusive despite progress in solving the problem in the case of a small, fixed lexicon. Taking advantage of th...
With increasing volumes of data, much effort has been devoted to finding the most suitable answer to an information need. However, in many domains, the question whether a...
Automatic Document Classification (ADC) is still one of the major information retrieval problems. It usually employs a supervised learning strategy, where we first build a classif...
Thiago Salles, Leonardo C. da Rocha, Gisele L. Pap...
Natural language is the main means of presentation in industrial requirements documents. As a result, requirements documents are often incomplete and inconsistent. Desp...
This paper presents a pair of identification techniques that automatically detect scripts and orientations of document images suffering from various types of document degradation. ...
This paper follows a word-document co-clustering model independently introduced in 2001 by several authors such as I.S. Dhillon, H. Zha and C. Ding. This model consists in creatin...
Huge amounts of legacy documents are being published by online digital libraries worldwide. However, for these raw digital images to be really useful, they need to be transcribe...
Extractive multi-document summarization is the task of choosing sentences from a set of documents to compose a summary text in response to a user query. We propose a generative ap...
Modern search engines are expected to make documents searchable shortly after they appear on the ever-changing Web. To satisfy this requirement, the Web is frequently crawled. Due...