Fully distributed EM for very large datasets

In EM and related algorithms, E-step computations distribute easily because data items are independent given the parameters. For very large datasets, however, even storing all of the parameters on a single node for the M-step can be impractical. We present a framework that fully distributes the entire EM procedure. Each node interacts only with the parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce topology on two tasks: word alignment and topic modeling.
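A minimal sketch of the two ideas in the abstract, under stated assumptions and not the authors' implementation: a toy IBM Model 1 stands in for the word-alignment task, each node runs its E-step and stores t(f | e) only for the word pairs in its own data, and expected counts are summed along a tree so that every node ends up with global totals for exactly the keys it needs. The tree `adj` is given by hand (the paper builds a junction-tree topology, which this sketch does not attempt), everything runs in one process, and all function names (`relevant_keys`, `e_step`, `side_keys`, `message`, `distributed_em`, `centralized_em`) are hypothetical. Each directed edge message carries only keys needed on the receiving side; this never drops a count, because the receiving side of edge (u, v) is contained in u's side of every other edge at u.

```python
from collections import defaultdict

# Toy IBM Model 1: ("pair", f, e) keys hold counts for t(f | e) and
# ("total", e) keys hold its normalizer. All names are hypothetical.


def relevant_keys(pairs):
    """Only the parameter keys touched by this node's own sentence pairs."""
    keys = set()
    for foreign, english in pairs:
        for e in english:
            keys.add(("total", e))
            keys.update(("pair", f, e) for f in foreign)
    return keys


def e_step(pairs, t):
    """Local E-step: expected alignment counts from locally stored t only."""
    counts = defaultdict(float)
    for foreign, english in pairs:
        for f in foreign:
            z = sum(t[(f, e)] for e in english)
            for e in english:
                p = t[(f, e)] / z  # posterior that f aligns to e
                counts[("pair", f, e)] += p
                counts[("total", e)] += p
    return counts


def side_keys(v, u, adj, keys, memo):
    """Keys needed anywhere on v's side of tree edge (u, v)."""
    if (v, u) not in memo:
        need = set(keys[v])
        for w in adj[v]:
            if w != u:
                need |= side_keys(w, v, adj, keys, memo)
        memo[(v, u)] = need
    return memo[(v, u)]


def message(u, v, adj, local, keys, msg_memo, side_memo):
    """Counts summed over u's side of edge (u, v), keeping only keys v's side needs."""
    if (u, v) not in msg_memo:
        need = side_keys(v, u, adj, keys, side_memo)
        msg = defaultdict(float)
        sources = [local[u]] + [message(w, u, adj, local, keys, msg_memo, side_memo)
                                for w in adj[u] if w != v]
        for counts in sources:
            for k, c in counts.items():
                if k in need:
                    msg[k] += c
        msg_memo[(u, v)] = msg
    return msg_memo[(u, v)]


def distributed_em(data, adj, iterations=5):
    """data: node -> [(foreign, english)]; adj: node -> neighbors, forming a tree."""
    keys = {v: relevant_keys(pairs) for v, pairs in data.items()}
    # Each node stores only its own t(f | e), initialized uniformly.
    t = {v: {(k[1], k[2]): 1.0 for k in keys[v] if k[0] == "pair"} for v in data}
    side_memo = {}
    for _ in range(iterations):
        local = {v: e_step(data[v], t[v]) for v in data}
        msg_memo = {}
        for v in data:
            totals = defaultdict(float, local[v])
            for u in adj[v]:
                for k, c in message(u, v, adj, local, keys, msg_memo, side_memo).items():
                    totals[k] += c
            # Local M-step with globally summed counts for this node's keys.
            t[v] = {(f, e): totals[("pair", f, e)] / totals[("total", e)]
                    for (f, e) in t[v]}
    return t


def centralized_em(pairs, iterations=5):
    """Single-node reference: same updates with every parameter in one place."""
    t = {(k[1], k[2]): 1.0 for k in relevant_keys(pairs) if k[0] == "pair"}
    for _ in range(iterations):
        c = e_step(pairs, t)
        t = {(f, e): c[("pair", f, e)] / c[("total", e)] for (f, e) in t}
    return t


data = {
    "A": [(["la", "maison"], ["the", "house"])],
    "B": [(["la", "fleur"], ["the", "flower"])],
    "C": [(["maison", "bleue"], ["blue", "house"])],
}
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
t = distributed_em(data, adj)
print(t["A"][("maison", "house")])
```

In this toy run, node A never stores parameters such as t(bleue | blue), yet its t(maison | house) should agree with `centralized_em` on the pooled data, since its normalizer for "house" still receives C's counts through B. Memoizing `message` per directed edge means each edge message is computed once per iteration, much like a two-pass sum over a tree.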

Jason Wolfe, Aria Haghighi, Dan Klein

Entire EM Procedure | ICML 2008 | Independent Given Parameters | Large Data Sets | Machine Learning

Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2008
Where	ICML
Authors	Jason Wolfe, Aria Haghighi, Dan Klein
