Identifying handled objects, i.e., objects being manipulated by a user, is essential for recognizing the person's activities. An egocentric camera, worn on the body, enjoys many advantages, such as providing a natural first-person view and requiring no instrumentation of the environment. It is also a challenging setting: background clutter is known to be a major source of problems, and it is difficult to handle because the camera moves constantly and arbitrarily. In this work we develop a bottom-up, motion-based approach to robustly segment out foreground objects in egocentric video and show that it greatly improves object recognition accuracy. Our key insight is that egocentric video of object manipulation is a special domain in which many domain-specific cues can readily help. We compute dense optical flow and fit multiple affine layers to it. We then use a max-margin classifier to combine motion with empirical knowledge of object location and background movement as well as temporal cues...
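
To make the motion stage concrete, the following is a minimal sketch, not the authors' implementation, of the first two steps described above: computing dense optical flow between consecutive egocentric frames and fitting a single dominant affine motion layer to it, so that pixels whose flow deviates from that layer become candidate foreground. Function names, parameters, and thresholds here are illustrative assumptions; the paper fits multiple affine layers and additionally combines location, background-movement, and temporal cues with a max-margin classifier, which this sketch omits.

```python
# Sketch (assumed, not from the paper): dominant affine motion layer from dense flow.
import cv2
import numpy as np

def dominant_affine_residual(prev_gray, curr_gray, n_iters=50, inlier_thresh=1.0):
    # Dense optical flow (Farneback); flow[y, x] = (dx, dy).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)  # (N, 3)
    disp = flow.reshape(-1, 2)                                        # (N, 2)

    # RANSAC-style fit of one affine layer: flow ~ [x, y, 1] @ A, with A of shape (3, 2).
    best_inliers = None
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        idx = rng.choice(len(pts), size=3, replace=False)
        A, _, _, _ = np.linalg.lstsq(pts[idx], disp[idx], rcond=None)
        resid = np.linalg.norm(pts @ A - disp, axis=1)
        inliers = resid < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers

    # Refit on all inliers and return the per-pixel residual map: large residuals
    # indicate motion inconsistent with the dominant (background) layer, i.e.
    # candidate handled-object pixels.
    A, _, _, _ = np.linalg.lstsq(pts[best_inliers], disp[best_inliers], rcond=None)
    return np.linalg.norm(pts @ A - disp, axis=1).reshape(h, w)
```

In practice the residual map would be only one feature among the several cues mentioned above, rather than a segmentation by itself.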