While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimension...
In this paper we present a keyphrase extraction system that can extract potential phrases from a single document in an unsupervised, domain-independent way. We extract word n-grams...
Nirmala Pudota, Antonina Dattolo, Andrea Baruzzo, ...
Innovative eCustoms solutions play an important role in the pan-European eGovernment strategy. The underlying premise is interoperability postulating a common understanding of pro...
Tobias Vogel, Alexander Schmidt, Alexander Lemm, H...
Current Information Retrieval systems use inverted index structures for efficient query processing. Due to the extremely large size of many data sets, these index structures are u...
Traditional interactive information retrieval systems function by creating inverted lists, or term indexes. For every term in the vocabulary, a list is created that contains the d...
Traditional machine-learned ranking algorithms for web search are trained in batch mode, which assume static relevance of documents for a given query. Although such a batch-learni...
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (&q...
Content Management Systems (CMS) store enterprise data such as insurance claims, insurance policies, legal documents, patent applications, or archival data like in the case of dig...
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc&quo...
We propose a fast and robust skew estimation method for scanned documents that estimates skew angles based on piecewise covering of objects, such as textlines, figures, forms, or...