: We introduce an end-to-end framework for data quality that integrates business strategy, data quality models, and supporting investigative and governance processes. We also descr...
We describe a novel class of distributions, called Mondrian processes, which can be interpreted as probability distributions over kd-tree data structures. Mondrian processes are m...
: Models of the association between input accuracy and output accuracy imply that, for any given application, the effect of input errors on the output error rate generally varies i...
Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1) that...
We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound...
: Sufficiently high data quality is crucial for almost every application. Nonetheless, data quality issues are nearly omnipresent. The reasons for poor quality cannot simply be bla...
We study the convergence and the rate of convergence of a local manifold learning algorithm: LTSA [13]. The main technical tool is the perturbation analysis on the linear invarian...
Large networks of spiking neurons show abrupt changes in their collective dynamics resembling phase transitions studied in statistical physics. An example of this phenomenon is th...
In analogy to the PCA setting, the sparse PCA problem is often solved by iteratively alternating between two subtasks: cardinality-constrained rank-one variance maximization and m...
: Revenue Assurance describes a methodology to increase a company’s income by determining where revenue gets lost, and to maximize their profits by eliminating revenue leakage an...