In many image and video collections, we have access
only to partially labeled data. For example, personal photo
collections often contain several faces per image and a caption
that only specifies who is in the picture, but not which
name matches which face. Similarly, movie screenplays can
tell us who is in the scene, but not when and where they are
on the screen. We formulate the learning problem in this setting
as partially-supervised multiclass classification where
each instance is labeled ambiguously with more than one
label. We show theoretically that effective learning is possible
under reasonable assumptions even when all the data
is weakly labeled. Motivated by the analysis, we propose
a general convex learning formulation based on minimization
of a surrogate loss appropriate for the ambiguous label
setting. We apply our framework to identifying faces culled
from web news sources and to naming characters in TV series
and movies. We experiment on a very large dat...