TY - JOUR
T1 - Gestures In-The-Wild
T2 - Detecting Conversational Hand Gestures in Crowded Scenes Using a Multimodal Fusion of Bags of Video Trajectories and Body Worn Acceleration
AU - Cabrera-Quiros, Laura
AU - Tax, David M.J.
AU - Hung, Hayley
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2020/1
Y1 - 2020/1
N2 - This paper addresses the detection of hand gestures during free-standing conversations in crowded mingle scenarios. Unlike the scenarios of the previous works in gesture detection and recognition, crowded mingle scenes have additional challenges such as cross-contamination between subjects, strong occlusions, and nonstationary backgrounds. This makes them more complex to analyze using computer vision techniques alone. We propose a multimodal approach using video and wearable acceleration data recorded via smart badges hung around the neck. In the video modality, we propose to treat noisy dense trajectories as bags-of-trajectories. For a given bag, we can have good trajectories corresponding to the subject, and bad trajectories due for instance to cross-contamination. However, we hypothesize that for a given class, it should be possible to learn trajectories that are discriminative while ignoring noisy trajectories. We do this by exploiting multiple instance learning via embedded instance selection as our multiple instance learning approach. This technique also allows us to identify which instances contribute more to the classification. By fusing the decisions of the classifiers from the video and wearable acceleration modalities, we show improvements over the unimodal approaches with an AUC of 0.69. We also present a static analysis and a dynamic analysis to assess the impact of noisy data on the fused detection results, showing that the moments of high occlusion in the video are compensated by the information from the wearables. Finally, we applied our method to detect speaking status, leveraging the close relationship found in the literature between hand gestures and speech.
AB - This paper addresses the detection of hand gestures during free-standing conversations in crowded mingle scenarios. Unlike the scenarios of the previous works in gesture detection and recognition, crowded mingle scenes have additional challenges such as cross-contamination between subjects, strong occlusions, and nonstationary backgrounds. This makes them more complex to analyze using computer vision techniques alone. We propose a multimodal approach using video and wearable acceleration data recorded via smart badges hung around the neck. In the video modality, we propose to treat noisy dense trajectories as bags-of-trajectories. For a given bag, we can have good trajectories corresponding to the subject, and bad trajectories due for instance to cross-contamination. However, we hypothesize that for a given class, it should be possible to learn trajectories that are discriminative while ignoring noisy trajectories. We do this by exploiting multiple instance learning via embedded instance selection as our multiple instance learning approach. This technique also allows us to identify which instances contribute more to the classification. By fusing the decisions of the classifiers from the video and wearable acceleration modalities, we show improvements over the unimodal approaches with an AUC of 0.69. We also present a static analysis and a dynamic analysis to assess the impact of noisy data on the fused detection results, showing that the moments of high occlusion in the video are compensated by the information from the wearables. Finally, we applied our method to detect speaking status, leveraging the close relationship found in the literature between hand gestures and speech.
KW - Hand gestures
KW - MILES
KW - crowded mingles
KW - dense trajectories
KW - multiple instance learning
KW - wearable acceleration
UR - http://www.scopus.com/inward/record.url?scp=85077803556&partnerID=8YFLogxK
U2 - 10.1109/TMM.2019.2922122
DO - 10.1109/TMM.2019.2922122
M3 - Article
AN - SCOPUS:85077803556
SN - 1520-9210
VL - 22
SP - 138
EP - 147
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 1
M1 - 8734888
ER -