No-audio multimodal speech detection task at MediaEval 2020

Laura Cabrera-Quiros, Jose Vargas, Hayley Hung

Research output: Contribution to journal › Conference article › peer-review

Abstract

This overview paper describes the No-Audio Multimodal Speech Detection task at MediaEval 2020. As in the previous two editions, participants are asked to estimate the speaking status (i.e., whether a person is speaking or not) of individuals interacting freely during a crowded mingle event, using multimodal data. In contrast to conventional speech detection approaches, no audio is used for this task. Instead, the proposed automatic estimation system must exploit the natural human movements that accompany speech, captured by cameras and wearable sensors. Task participants are provided with cropped videos of individuals while interacting, captured by an overhead camera, and the tri-axial acceleration of each individual throughout the event, captured with a single badge-like device hung around the neck. This year's edition of the task also focuses on investigating possible reasons for interpersonal differences in the performance obtained.
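The task itself prescribes no particular method, but as a rough sketch of what a no-audio, acceleration-based approach might look like, the Python example below computes simple sliding-window movement statistics from tri-axial acceleration and trains a logistic-regression classifier to predict per-window speaking status. The sampling rate, window and hop lengths, feature set, and classifier are all illustrative assumptions rather than part of the task specification, and the data here is synthetic.

```python
# Illustrative sketch (not the official task baseline): binary speaking-status
# estimation from tri-axial acceleration via sliding-window statistics and a
# logistic-regression classifier. FS, WIN, and HOP are assumed values.
import numpy as np
from sklearn.linear_model import LogisticRegression

FS = 20          # assumed accelerometer sampling rate (Hz)
WIN = 3 * FS     # 3-second analysis window
HOP = FS         # 1-second hop between windows

def window_features(acc):
    """acc: (n_samples, 3) tri-axial acceleration -> (n_windows, 6) features.

    Per window: mean, standard deviation, and mean absolute first difference
    of the acceleration magnitude, plus per-axis variance -- simple movement
    statistics that tend to correlate with the body motion accompanying speech.
    """
    mag = np.linalg.norm(acc, axis=1)
    feats = []
    for start in range(0, len(acc) - WIN + 1, HOP):
        a, m = acc[start:start + WIN], mag[start:start + WIN]
        feats.append(np.concatenate([
            [m.mean(), m.std(), np.abs(np.diff(m)).mean()],
            a.var(axis=0),
        ]))
    return np.asarray(feats)

# Synthetic stand-in for one participant's recording and speaking labels;
# in the real task these would come from the provided wearable data and
# ground-truth annotations.
rng = np.random.default_rng(0)
acc = rng.normal(size=(60 * FS, 3))      # one minute of placeholder data
X = window_features(acc)
y = rng.integers(0, 2, size=len(X))      # placeholder speaking labels

clf = LogisticRegression().fit(X, y)
print("speaking probability per window:", clf.predict_proba(X)[:3, 1])
```

A real submission would also exploit the video modality, in line with the task's multimodal setup.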

Original language: English
Publication: CEUR Workshop Proceedings
Volume: 2882
Status: Published - 2020
Event: Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020 - Virtual, Online
Duration: 14 Dec 2020 – 15 Dec 2020
