We investigate the impact of input data scale on corpus-based learning, using a study in the style of Zipf's law. In our research, Chinese word segmentation is chosen as the study ca...
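The abstract is cut off before it says how the Zipf-style analysis is carried out. As a hypothetical illustration only, the sketch below estimates the exponent s in Zipf's law, f(r) ≈ C / r^s, by an ordinary least-squares fit of log frequency against log rank. The function name `zipf_exponent` and the fitting procedure are assumptions, not the authors' method.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate s in f(r) ~ C / r**s by least squares on the log-log rank-frequency curve.

    Hypothetical helper for illustration; not taken from the source study.
    """
    # Frequencies sorted from most to least common give the rank-frequency curve.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct token types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Zipf's law predicts a negative slope, so the exponent is its negation.
    return -sxy / sxx
```

Running the estimate on growing prefixes of a corpus is one simple way to see how a frequency statistic changes with input data scale.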
A method is presented to partition a given set of data entries embedded in Euclidean space by recursively bisecting clusters into smaller ones. The initial set is subdivided into ...
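The abstract is truncated before it specifies how each cluster is bisected or when the recursion stops. The sketch below assumes a principal-direction split (points are divided by the sign of their projection onto the leading eigenvector of the cluster's scatter matrix, found by power iteration) and a hypothetical `max_size` stopping rule. Both choices are illustrative assumptions, not necessarily the presented method.

```python
import math


def centroid(points):
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def principal_direction(points, iters=100):
    """Leading eigenvector of the centered scatter matrix via power iteration (assumed split rule)."""
    c = centroid(points)
    centered = [[x - cj for x, cj in zip(p, c)] for p in points]
    # Start from the centered point farthest from the centroid; it is zero only if all points coincide.
    v = max(centered, key=lambda row: sum(x * x for x in row))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return c, [0.0] * len(c)
    v = [x / norm for x in v]
    for _ in range(iters):
        # w = A^T (A v), where the rows of A are the centered points.
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(proj, centered)) for j in range(len(c))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return c, v


def bisect_cluster(points):
    """Split points by the sign of their projection onto the principal direction."""
    c, v = principal_direction(points)
    left, right = [], []
    for p in points:
        s = sum((x - cj) * vj for x, cj, vj in zip(p, c, v))
        (left if s <= 0 else right).append(p)
    return left, right


def recursive_bisection(points, max_size):
    """Recursively bisect until every cluster has at most max_size points (hypothetical stopping rule)."""
    if len(points) <= max_size:
        return [points]
    left, right = bisect_cluster(points)
    if not left or not right:  # degenerate split, e.g. identical points: stop here
        return [points]
    return recursive_bisection(left, max_size) + recursive_bisection(right, max_size)
```

For example, `recursive_bisection([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 3)` separates the two well-separated groups of three points.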
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian m...
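To make the three-level hierarchy concrete, the sketch below samples one document from the standard LDA generative process: draw per-document topic proportions θ ~ Dirichlet(α), then for each word draw a topic z ~ Multinomial(θ) and a word w from topic z's word distribution β_z. It simplifies by fixing the document length (`n_words`) instead of drawing it from a Poisson distribution, and the function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Dirichlet draw via normalized Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(alpha, beta, n_words, rng):
    """Sample one document from LDA's generative process.

    alpha: Dirichlet prior over K topics.
    beta:  K rows, each a probability distribution over a vocabulary of size V.
    n_words: fixed document length (simplification; LDA draws it from a Poisson).
    """
    theta = sample_dirichlet(alpha, rng)       # document-level topic proportions
    topics, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)     # word-level topic assignment
        w = sample_categorical(beta[z], rng)   # word drawn from topic z
        topics.append(z)
        words.append(w)
    return theta, topics, words
```

Inference in LDA runs this process in reverse: given only `words`, it estimates the posterior over `theta` and `topics`.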
Space-constrained optimization problems arise in a variety of applications, ranging from databases to ubiquitous computing. Typically, these problems involve selecting a set of it...
Themis Palpanas, Nick Koudas, Alberto O. Mendelzon
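The abstract is truncated before it defines the selection problem. As a generic, hypothetical instance of space-constrained set selection, assume each candidate item has an integer size and a benefit, and the goal is a subset of maximum total benefit whose total size fits a space budget: the classic 0/1 knapsack problem, solved here by dynamic programming. This is not necessarily the formulation or algorithm studied by Palpanas, Koudas, and Mendelzon, and `select_within_budget` is a hypothetical name.

```python
def select_within_budget(items, budget):
    """0/1 knapsack by dynamic programming over integer space budgets.

    items: list of (name, size, benefit) with non-negative integer sizes.
    Returns (max total benefit, names of chosen items) for total size <= budget.
    """
    # best[b] = (best benefit, chosen names) achievable within space b.
    best = [(0, [])] * (budget + 1)
    for name, size, benefit in items:
        # Scan budgets downward so each item is used at most once.
        for b in range(budget, size - 1, -1):
            candidate = best[b - size][0] + benefit
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - size][1] + [name])
    return best[budget]
```

With items `("a", 3, 4)`, `("b", 4, 5)`, `("c", 2, 3)` and a budget of 5, the sketch returns `(7, ["a", "c"])`. The table has `budget + 1` entries, so this approach suits small integer budgets; larger instances typically call for approximation or greedy heuristics.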
Redundancy analysis (RA) is a versatile technique used to predict multivariate criterion variables from multivariate predictor variables. The reduced-rank feature of RA captures r...
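The abstract is cut off before it describes estimation. One common way to realize RA's reduced-rank property, assumed here rather than taken from the source, is to fit ordinary least squares, B = (XᵀX)⁻¹XᵀY, and then project B onto the leading r eigenvectors V_r of ŶᵀŶ (the principal directions of the fitted criterion values Ŷ = XB), giving B_r = B V_r V_rᵀ. The function names are hypothetical, X and Y are assumed column-centered with XᵀX invertible, and the pure-Python linear algebra is for illustration only.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def top_eigvecs(S, r, iters=500):
    """Leading r eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    q = len(S)
    S = [row[:] for row in S]
    vecs = []
    for k in range(r):
        v = [1.0 if i == k else 0.5 for i in range(q)]  # arbitrary non-symmetric start
        for _ in range(iters):
            w = [sum(S[i][j] * v[j] for j in range(q)) for i in range(q)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(S[i][j] * v[j] for j in range(q)) for i in range(q))
        # Deflate: remove the found component before searching for the next one.
        S = [[S[i][j] - lam * v[i] * v[j] for j in range(q)] for i in range(q)]
        vecs.append(v)
    return vecs


def redundancy_analysis(X, Y, rank):
    """Rank-constrained coefficients B_r = B_ols V_r V_r^T (X, Y column-centered)."""
    Xt = transpose(X)
    B = solve(matmul(Xt, X), matmul(Xt, Y))                # OLS coefficients, p x q
    Yhat = matmul(X, B)                                    # fitted criterion values
    V = top_eigvecs(matmul(transpose(Yhat), Yhat), rank)   # rank rows of length q
    P = matmul(transpose(V), V)                            # q x q projector onto span(V_r)
    return matmul(B, P)
```

With rank equal to the number of criterion variables the projector is the identity and the OLS coefficients are returned unchanged; smaller ranks keep only the predictor-criterion relationships that explain the most variance in Ŷ.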