TY - GEN
T1 - Real-Time Hand Detection using Convolutional Neural Networks for Costa Rican Sign Language Recognition
AU - Zamora-Mora, Juan
AU - Chacon-Rivas, Mario
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - Sign language is the natural language of the deaf. It is a form of non-verbal communication between signers, governed by a set of grammars in constant evolution, since the universe of signs covers only a small fraction of all words in Spanish. This limitation, combined with verbal speakers' lack of knowledge of sign language, creates a divide in which signers and non-signers cannot communicate efficiently. The problem worsens in specific contexts such as emergency situations, where first-response teams such as EMTs, firefighters, or police officers may be unable to attend an emergency properly because interaction between the involved parties becomes a barrier to decision making when time is scarce. In this context, a cognitive-capable tool that recognizes sign language ubiquitously is essential to reduce the barriers between the deaf and emergency responders. Hand detection is the first step toward building a Costa Rican Sign Language (LESCO) recognition framework. Recent advances in computing, particularly in deep learning, open a new frontier for object recognition that can be leveraged to build a hand detection module. This study trains the MobileNet V1 convolutional neural network on the EgoHands dataset from Indiana University's Computer Vision Lab to determine whether that dataset alone is sufficient to detect hands in LESCO videos recorded by five different signers wearing short-sleeve shirts in front of complex backgrounds. These conditions are key to assessing the usefulness of the solution, since the consulted literature runs its tests only in controlled environments, with single-color backgrounds and long-sleeve shirts that ease the classification task. The two-step experiment obtained 1) a mean average precision of 96.1% on the EgoHands dataset and 2) an average accuracy of 91% for hand detection across the five LESCO videos.
Despite the high accuracy reported in this paper, the hand detection module failed to detect certain hand shapes, such as closed fists and open hands oriented perpendicular to the camera lens. This suggests that the complex egocentric views captured in the EgoHands dataset might, on their own, be insufficient for reliable hand detection in Costa Rican Sign Language.
KW - convolutional neural network
KW - deep learning
KW - hand detection
KW - LESCO
KW - machine learning
KW - sign language recognition
UR - http://www.scopus.com/inward/record.url?scp=85079275712&partnerID=8YFLogxK
U2 - 10.1109/CONTIE49246.2019.00042
DO - 10.1109/CONTIE49246.2019.00042
M3 - Conference contribution
AN - SCOPUS:85079275712
T3 - Proceedings - 2019 International Conference on Inclusive Technologies and Education, CONTIE 2019
SP - 180
EP - 186
BT - Proceedings - 2019 International Conference on Inclusive Technologies and Education, CONTIE 2019
A2 - Carreno-Leon, Monica Adriana
A2 - Sandoval-Bringas, Jesus Andres
A2 - Chacon-Rivas, Mario
A2 - Alvarez-Rodriguez, Francisco Javier
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Inclusive Technologies and Education, CONTIE 2019
Y2 - 30 October 2019 through 1 November 2019
ER -