Prof Stephen McKenna – University of Dundee
Multi-modal Activity Recognition in the Kitchen
Stephen McKenna (Joint work with Sebastian Stein)
Food preparation provides an ideal setting in which to explore many of the challenges involved in activity recognition. It involves complex manipulative gestures using a variety of utensils, ingredients and other objects, and large variation in the order and manner in which steps are performed even when following a fixed recipe. Research in this area is further motivated by applications such as situational support systems for activities of daily living.
We have recently made available the 50 Salads annotated dataset to facilitate research on this topic. It captures 25 people preparing two mixed salads each over a total of 4.5 hours. These activities were recorded using an RGB-D camera (Kinect) and accelerometers embedded in utensils. In this talk we describe this dataset and its annotation, and suggest various research problems for which it might prove useful. We then use it to investigate a 10-class activity recognition task, specify an evaluation protocol and present results for a range of methods. Motion features extracted from accelerometer data and video data are fused at different stages of the recognition process through accelerometer localization, feature concatenation, or by combining classifier outputs. Experiments demonstrate that fusing information can improve recognition performance. We also investigate user-adaptation and find that combining generic and user-specific models can increase recognition accuracy, particularly when the number of training subjects is small.
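As a rough illustration of two of the fusion stages mentioned above, the sketch below contrasts feature concatenation (early fusion) with combining classifier outputs (late fusion). It is not the method from the talk: the nearest-centroid classifier, the function names, the fusion weight and the synthetic "accelerometer" and "video" features are all assumptions chosen to keep the example small, and no data from 50 Salads is used.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Mean feature vector per class (a stand-in for any real classifier)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def class_scores(X, centroids):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def early_fusion_scores(Xa_tr, Xv_tr, y_tr, Xa_te, Xv_te, n_classes):
    """Feature concatenation: one classifier on the joined feature vector."""
    X_tr = np.hstack([Xa_tr, Xv_tr])
    X_te = np.hstack([Xa_te, Xv_te])
    return class_scores(X_te, fit_centroids(X_tr, y_tr, n_classes))

def late_fusion_scores(Xa_tr, Xv_tr, y_tr, Xa_te, Xv_te, n_classes, weight=0.5):
    """Output combination: weighted average of per-modality class scores."""
    pa = class_scores(Xa_te, fit_centroids(Xa_tr, y_tr, n_classes))
    pv = class_scores(Xv_te, fit_centroids(Xv_tr, y_tr, n_classes))
    return weight * pa + (1.0 - weight) * pv

# Synthetic data only: 10 classes, hypothetical 6-D accelerometer features
# and 8-D video features drawn around random class means.
rng = np.random.default_rng(0)
n_classes, n_train, n_test = 10, 200, 50
means_a = rng.normal(size=(n_classes, 6))
means_v = rng.normal(size=(n_classes, 8))
y_train = np.arange(n_train) % n_classes  # every class appears in training
y_test = np.arange(n_test) % n_classes

def sample(means, y):
    """Draw noisy features around the class mean of each label."""
    return means[y] + 0.5 * rng.normal(size=(len(y), means.shape[1]))

Xa_train, Xv_train = sample(means_a, y_train), sample(means_v, y_train)
Xa_test, Xv_test = sample(means_a, y_test), sample(means_v, y_test)

early = early_fusion_scores(Xa_train, Xv_train, y_train, Xa_test, Xv_test, n_classes)
late = late_fusion_scores(Xa_train, Xv_train, y_train, Xa_test, Xv_test, n_classes)

print("early-fusion accuracy:", (early.argmax(axis=1) == y_test).mean())
print("late-fusion accuracy:", (late.argmax(axis=1) == y_test).mean())
```

Early fusion lets a single classifier model interactions between modalities, while late fusion keeps each modality's classifier independent and only merges their scores; which works better depends on the data and the classifiers used.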
If time allows, I will also summarise some of our group's other research on automatic analysis of cancer in histopathology images.
Stephen McKenna holds a Personal Chair in Computer Vision at the University of Dundee. He is based in the School of Computing where he leads the Computer Vision & Image Processing group (CVIP) together with colleagues Manuel Trucco and Jianguo Zhang. CVIP currently focuses on applications in biomedical image analysis and human activity recognition (http://cvip.computing.dundee.ac.uk), developing methods for modelling visual domains, typically using machine learning, to support interpretation and decision-making. Much of this research is interdisciplinary.
School of Computing Science & Digital Media, Robert Gordon University, Riverside East, Garthdee Road, Aberdeen, Conference Room N204, 14:10 – 15:10.