A Testbed for Learning by Demonstration from Natural Language and RGB-Depth Video

We are developing a testbed for learning by demonstration that combines spoken language and sensor data in a natural real-world environment. Microsoft Kinect RGB-Depth cameras allow us to infer high-level visual features, such as the relative position of objects in space, with greater precision and less training than traditional systems require. Speech is recognized and parsed using a “deep” parsing system, so that language features are available at the word, syntactic, and semantic levels. We collected an initial data set of 10 episodes of 7 individuals demonstrating how to “make tea”, and created a “gold standard” hand annotation of the actions performed in each. Finally, we are constructing “baseline” HMM-based activity recognition models using the visual and language features, so that we can evaluate the performance of our future work on deeper and more structured models. Most research in AI has explored problems of natural language understanding, visual pe...

Young Chol Song, Henry A. Kautz

AAAI 2012 | Intelligent Agents | Knowledge Representation And Reasoning | Natural Language Processing | Natural Language Understanding

Added	29 Sep 2012
Updated	29 Sep 2012
Type	Journal
Year	2012
Where	AAAI
Authors	Young Chol Song, Henry A. Kautz
