We propose a distributed parallel support vector machine (DPSVM) training mechanism in a configurable network environment for distributed data mining. The basic idea is to exchange support vectors among the servers of a strongly connected network (SCN) so that multiple servers may work concurrently on a distributed data set with limited communication cost and fast training speed. The percentage of servers that can work in parallel and the communication overhead may be adjusted through network configuration. The proposed algorithm is further accelerated through online implementation and synchronization. We prove that the globally optimal classifier can be reached iteratively over a strongly connected network. Experiments on a real-world data set show that the computing time scales well with the size of the training data for most networks. Numerical results show that a randomly generated SCN may achieve better performance than the state-of-the-art method, Cascade SVM, in terms of total training time.
Yumao Lu, Vwani P. Roychowdhury, L. Vandenberghe
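
The convergence claim above depends on the network being strongly connected, that is, every server must be able to reach every other server along directed links so that support vectors can eventually propagate everywhere. The following is a minimal sketch of how a candidate network configuration could be checked for this property. The adjacency-dictionary representation and the names `reachable` and `is_strongly_connected` are illustrative assumptions and are not taken from the paper.

```python
from collections import deque


def reachable(adj, start):
    """Return the set of nodes reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_strongly_connected(adj):
    """Check strong connectivity of a directed graph.

    `adj` maps each server id to the list of servers it sends to
    (a hypothetical encoding of the network configuration).
    """
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    if not nodes:
        return True

    # Build the reversed graph: an edge src -> dst becomes dst -> src.
    reverse = {n: [] for n in nodes}
    for src, targets in adj.items():
        for dst in targets:
            reverse[dst].append(src)

    # Strongly connected iff one node reaches all nodes in both
    # the original graph and the reversed graph.
    start = next(iter(nodes))
    return reachable(adj, start) == nodes and reachable(reverse, start) == nodes


# A directed ring lets every server reach every other server.
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
# A one-way chain does not: server 3 cannot send anything back.
chain = {0: [1], 1: [2], 2: [3], 3: []}

print(is_strongly_connected(ring))
print(is_strongly_connected(chain))
```

Under this encoding, a one-way chain such as a cascade without a feedback link fails the check, whereas closing the loop from the last server back to the first makes it pass.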