The goal of this work is to automatically learn a large
number of British Sign Language (BSL) signs from TV
broadcasts. We achieve this by using the supervisory information
available from subtitles broadcast simultaneously
with the signing.
This supervision is both weak and noisy: it is weak due
to the correspondence problem, since the temporal distance between
sign and subtitle is unknown and signing does not follow
the text order; it is noisy because subtitles can be signed
in different ways, and because the occurrence of a subtitle
word does not imply the presence of the corresponding sign.
The contributions are: (i) we propose a distance function
for matching signing sequences that includes the trajectory of
both hands, the hand shape and orientation, and that properly
models the case of hands touching; (ii) we show that by
optimizing a scoring function based on multiple instance
learning, we are able to extract the sign of interest from
hours of signing footage, despite the ve...