Sciweavers

JAIR
2010
94 views
13 years 10 months ago
Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback
While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimension...
Sajib Dasgupta, Vincent Ng
IRCDL
2010
13 years 10 months ago
A New Domain Independent Keyphrase Extraction System
In this paper we present a keyphrase extraction system that can extract potential phrases from a single document in an unsupervised, domain-independent way. We extract word n-grams...
Nirmala Pudota, Antonina Dattolo, Andrea Baruzzo, ...
JTAER
2008
118 views
13 years 10 months ago
Service and Document Based Interoperability for European eCustoms Solutions
Innovative eCustoms solutions play an important role in the pan-European eGovernment strategy. The underlying premise is interoperability, postulating a common understanding of pro...
Tobias Vogel, Alexander Schmidt, Alexander Lemm, H...
CIKM
2010
Springer
13 years 10 months ago
Improved index compression techniques for versioned document collections
Current Information Retrieval systems use inverted index structures for efficient query processing. Due to the extremely large size of many data sets, these index structures are u...
Jinru He, Junyuan Zeng, Torsten Suel
CIKM
2010
Springer
13 years 10 months ago
Reverted indexing for feedback and expansion
Traditional interactive information retrieval systems function by creating inverted lists, or term indexes. For every term in the vocabulary, a list is created that contains the d...
Jeremy Pickens, Matthew Cooper, Gene Golovchinsky
CIKM
2010
Springer
13 years 10 months ago
Online learning for recency search ranking using real-time user feedback
Traditional machine-learned ranking algorithms for web search are trained in batch mode, which assumes static relevance of documents for a given query. Although such a batch-learni...
Taesup Moon, Lihong Li, Wei Chu, Ciya Liao, Zhaohu...
SCHOLARPEDIA
2008
109 views
13 years 11 months ago
Latent semantic analysis
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("...
Thomas K. Landauer, Susan T. Dumais
PVLDB
2008
101 views
13 years 11 months ago
Multidimensional content eXploration
Content Management Systems (CMS) store enterprise data such as insurance claims, insurance policies, legal documents, patent applications, or archival data, as in the case of dig...
Alkis Simitsis, Akanksha Baid, Yannis Sismanis, Be...
PVLDB
2008
85 views
13 years 11 months ago
Scalable ad-hoc entity extraction from text collections
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc"...
Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaud...
PR
2007
100 views
13 years 11 months ago
Estimation of skew angles for scanned documents based on piecewise covering by parallelograms
We propose a fast and robust skew estimation method for scanned documents that estimates skew angles based on piecewise covering of objects, such as textlines, figures, forms, or...
Chien-Hsing Chou, Shih-Yu Chu, Fu Chang