TY - JOUR
T1 - Semisupervised Deep Learning for Image Classification with Distribution Mismatch
T2 - A Survey
AU - Calderon-Ramirez, Saul
AU - Yang, Shengxiang
AU - Elizondo, David
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - Deep learning methodologies have been employed in several different fields, with outstanding success in image recognition applications, such as material quality control, medical imaging, autonomous driving, etc. Deep learning models rely on an abundance of labeled observations to train a prospective model. These models comprise millions of parameters to estimate, increasing the need for more training observations. Frequently, it is expensive to gather labeled observations, making the use of deep learning models far from ideal, as the model might overfit the data. In a semisupervised setting, unlabeled data are used to improve the accuracy and generalization of a model trained with small labeled datasets. Nevertheless, in many situations different unlabeled data sources might be available. This raises the risk of a significant distribution mismatch between the labeled and unlabeled datasets. Such a phenomenon can cause a considerable performance hit to typical semisupervised deep learning (SSDL) frameworks, which often assume that both labeled and unlabeled datasets are drawn from similar distributions. Therefore, in this article we study the latest approaches to SSDL for image recognition. Emphasis is placed on SSDL models designed to deal with a distribution mismatch between the labeled and unlabeled datasets. We address open challenges with the aim of encouraging the community to tackle them and overcome the high data demand of traditional deep learning pipelines in real-world usage settings.
AB - Deep learning methodologies have been employed in several different fields, with outstanding success in image recognition applications, such as material quality control, medical imaging, autonomous driving, etc. Deep learning models rely on an abundance of labeled observations to train a prospective model. These models comprise millions of parameters to estimate, increasing the need for more training observations. Frequently, it is expensive to gather labeled observations, making the use of deep learning models far from ideal, as the model might overfit the data. In a semisupervised setting, unlabeled data are used to improve the accuracy and generalization of a model trained with small labeled datasets. Nevertheless, in many situations different unlabeled data sources might be available. This raises the risk of a significant distribution mismatch between the labeled and unlabeled datasets. Such a phenomenon can cause a considerable performance hit to typical semisupervised deep learning (SSDL) frameworks, which often assume that both labeled and unlabeled datasets are drawn from similar distributions. Therefore, in this article we study the latest approaches to SSDL for image recognition. Emphasis is placed on SSDL models designed to deal with a distribution mismatch between the labeled and unlabeled datasets. We address open challenges with the aim of encouraging the community to tackle them and overcome the high data demand of traditional deep learning pipelines in real-world usage settings.
KW - Deep learning
KW - distribution mismatch
KW - image classification
KW - semisupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85135737136&partnerID=8YFLogxK
U2 - 10.1109/TAI.2022.3196326
DO - 10.1109/TAI.2022.3196326
M3 - Article
AN - SCOPUS:85135737136
SN - 2691-4581
VL - 3
SP - 1015
EP - 1029
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 6
ER -