TY - JOUR
T1 - No-audio multimodal speech detection task at MediaEval 2019
AU - Gedik, Ekin
AU - Cabrera-Quiros, Laura
AU - Hung, Hayley
N1 - Publisher Copyright:
© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2019
Y1 - 2019
N2 - This overview paper provides a description of the No-Audio multimodal speech detection task for MediaEval 2019. As in the first edition, held in 2018, the task focuses on the estimation of speaking status from multimodal data. Task participants are provided with cropped videos of individuals interacting freely during a crowded mingle event, captured by an overhead camera. Each individual's tri-axial acceleration throughout the event, captured with a single badge-like device hung around the neck, is also provided. The goal of this task is to automatically estimate whether a person is speaking or not using these two alternative modalities. In contrast to conventional speech detection approaches, no audio is used for this task. Instead, the automatic estimation system must exploit the natural human movements that accompany speech. The task seeks to achieve estimation performance competitive with audio-based systems by exploiting the multimodal aspects of the problem.
AB - This overview paper provides a description of the No-Audio multimodal speech detection task for MediaEval 2019. As in the first edition, held in 2018, the task focuses on the estimation of speaking status from multimodal data. Task participants are provided with cropped videos of individuals interacting freely during a crowded mingle event, captured by an overhead camera. Each individual's tri-axial acceleration throughout the event, captured with a single badge-like device hung around the neck, is also provided. The goal of this task is to automatically estimate whether a person is speaking or not using these two alternative modalities. In contrast to conventional speech detection approaches, no audio is used for this task. Instead, the automatic estimation system must exploit the natural human movements that accompany speech. The task seeks to achieve estimation performance competitive with audio-based systems by exploiting the multimodal aspects of the problem.
UR - http://www.scopus.com/inward/record.url?scp=85091554302&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85091554302
SN - 1613-0073
VL - 2670
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2019 Working Notes of the MediaEval Workshop, MediaEval 2019
Y2 - 27 October 2019 through 30 October 2019
ER -