TY - JOUR
T1 - ML4H Auditing: From Paper to Practice
T2 - 6th Workshop on Machine Learning for Health: Advancing Healthcare for All, ML4H 2020, in conjunction with the 34th Conference on Neural Information Processing Systems, NeurIPS 2020
AU - Oala, Luis
AU - Fehr, Jana
AU - Gilli, Luca
AU - Balachandran, Pradeep
AU - Leite, Alixandro Werneck
AU - Calderon-Ramirez, Saul
AU - Li, Danny Xie
AU - Nobis, Gabriel
AU - Alvarado, Erick Alejandro Muñoz
AU - Jaramillo-Gutierrez, Giovanna
AU - Matek, Christian
AU - Shroff, Arun
AU - Kherif, Ferath
AU - Sanguinetti, Bruno
AU - Wiegand, Thomas
N1 - Publisher Copyright:
© 2020 L. Oala et al.
PY - 2020
Y1 - 2020
AB - Healthcare systems are currently adapting to digital technologies, producing large quantities of novel data. Based on these data, machine-learning algorithms have been developed to support practitioners in labor-intensive workflows such as diagnosis, prognosis, triage or treatment of disease. However, their translation into medical practice is often hampered by a lack of careful evaluation in different settings. Efforts have started worldwide to establish guidelines for evaluating machine learning for health (ML4H) tools, highlighting the necessity to evaluate models for bias, interpretability, robustness, and possible failure modes. However, testing and adopting these guidelines in practice remains an open challenge. In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics. The assessment comprises dimensions such as bias, interpretability, and robustness. Our results highlight the importance of fine-grained and case-adapted quality assessment, provide support for incorporating proposed quality assessment considerations of ML4H during the entire development life cycle, and suggest improvements for future ML4H reference evaluation frameworks.
KW - Health
KW - Machine Learning
KW - Testing
UR - http://www.scopus.com/inward/record.url?scp=85159455396&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85159455396
SN - 2640-3498
VL - 136
SP - 280
EP - 317
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 11 December 2020
ER -