L1 (also referred to as the 1-norm or Lasso) penalty based formulations have been shown to be effective in problem domains when noisy features are present. However, the L1 penalty...
A common approach for dealing with large data sets is to stream over the input in one pass, and perform computations using sublinear resources. For truly massive data sets, howeve...
Jon Feldman, S. Muthukrishnan, Anastasios Sidiropo...
—This paper investigates the problem of incremental detection of errors in distributed data. Given a distributed database D, a set Σ of conditional functional dependencies (CFDs...
Gene expression (microarray) data have been used widely in bioinformatics. The expression data of a large number of genes from small numbers of subjects are used to identify inform...
Credit scoring is a method of modelling potential risk of credit applications. Traditionally, logistic regression, linear regression and discriminant analysis are the most popular...