TY - JOUR
T1 - A Study of Pipeline Parallelism in Deep Neural Networks
AU - Núñez, Gabriel
AU - Romero-Sandí, Hairol
AU - Rojas, Elvis
AU - Meneses, Esteban
N1 - Publisher Copyright:
©2024 Universidad Autónoma de Bucaramanga - UNAB.
PY - 2024/1/30
Y1 - 2024/1/30
AB - The application of artificial intelligence to solve complex problems continues to gain popularity. The emergence of chatbots based on artificial intelligence and natural language processing has driven the creation of increasingly large and sophisticated neural network models, which underpin current advances in artificial intelligence. These models can comprise billions of parameters, and training them is infeasible without parallelism. This paper focuses on pipeline parallelism, one of the most important forms of parallelism used to train neural network models in deep learning. We review the key concepts related to the topic and present a detailed analysis of three pipeline parallelism libraries: Torchgpipe, FairScale, and DeepSpeed. We analyze important aspects of these libraries, such as their implementation and features. In addition, we evaluate them experimentally, carrying out parallel training runs while varying aspects such as the number of stages in the training pipeline and how the model is balanced across those stages.
KW - artificial neural networks
KW - Deep learning
KW - distributed training
KW - parallelism
UR - http://www.scopus.com/inward/record.url?scp=85199713995&partnerID=8YFLogxK
U2 - 10.29375/25392115.5056
DO - 10.29375/25392115.5056
M3 - Article
AN - SCOPUS:85199713995
SN - 1657-2831
VL - 25
SP - 48
EP - 59
JO - Revista Colombiana de Computacion
JF - Revista Colombiana de Computacion
IS - 1
ER -