—Many real world learning problems can be recast as multi-task learning problems which utilize correlations among different tasks to obtain better generalization performance than learning each task individually. The feature selection problem in multi-task setting has many applications in fields of computer vision, text classification and bio-informatics. Generally, it can be realized by solving a L-1-infinity regularized optimization problem. And the solution automatically yields the joint sparsity among different tasks. However, due to the nonsmooth nature of the L-1-infinity norm, there lacks an efficient training algorithm for solving such problem with general convex loss functions. In this paper, we propose an accelerated gradient method based on an “optimal” first order black-box method named after Nesterov and provide the convergence rate for smooth convex loss functions. For nonsmooth convex loss functions, such as hinge loss, our method still has fast convergence ra...
Xi Chen, Weike Pan, James T. Kwok, Jaime G. Carbon