TY - GEN
T1 - Evaluation of Alternatives to Accelerate Scientific Numerical Calculations on Graphics Processing Units Using Python
AU - Villalobos, Johansell
AU - Meneses, Esteban
N1 - Publisher Copyright: © 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2024
Y1 - 2024
N2 - In this paper, the GPU-accelerated Python libraries Numba, JAX, CuPy, PyTorch, and TensorFlow were benchmarked using scientific numerical kernels on an NVIDIA V100 GPU. The benchmarks consisted of a simple Monte Carlo estimation, a particle interaction kernel, a stencil evolution of an array, and tensor operations. The benchmarking procedure included general memory consumption measurements, a statistical analysis of scalability with problem size to determine the best libraries for the benchmarks, and a productivity measurement using source lines of code (SLOC) as a metric. It was statistically determined that the Numba library outperforms the rest on the Monte Carlo, particle interaction, and stencil benchmarks. The deep learning libraries show better performance on tensor operations. The SLOC count was similar for all the libraries except Numba, which presented a higher SLOC count, implying that more time is needed for code development.
AB - In this paper, the GPU-accelerated Python libraries Numba, JAX, CuPy, PyTorch, and TensorFlow were benchmarked using scientific numerical kernels on an NVIDIA V100 GPU. The benchmarks consisted of a simple Monte Carlo estimation, a particle interaction kernel, a stencil evolution of an array, and tensor operations. The benchmarking procedure included general memory consumption measurements, a statistical analysis of scalability with problem size to determine the best libraries for the benchmarks, and a productivity measurement using source lines of code (SLOC) as a metric. It was statistically determined that the Numba library outperforms the rest on the Monte Carlo, particle interaction, and stencil benchmarks. The deep learning libraries show better performance on tensor operations. The SLOC count was similar for all the libraries except Numba, which presented a higher SLOC count, implying that more time is needed for code development.
KW - Graphics Processing Units
KW - Parallel Programming
KW - Parallel Python
UR - http://www.scopus.com/inward/record.url?scp=85185702190&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-52186-7_1
DO - 10.1007/978-3-031-52186-7_1
M3 - Conference contribution
AN - SCOPUS:85185702190
SN - 9783031521850
T3 - Communications in Computer and Information Science
SP - 3
EP - 20
BT - High Performance Computing - 10th Latin American Conference, CARLA 2023, Revised Selected Papers
A2 - Barrios H., Carlos J.
A2 - Rizzi, Silvio
A2 - Meneses, Esteban
A2 - Mocskos, Esteban
A2 - Monsalve Diaz, Jose M.
A2 - Montoya, Javier
PB - Springer Science and Business Media Deutschland GmbH
T2 - 10th Latin American Conference on High Performance Computing, CARLA 2023
Y2 - 18 September 2023 through 22 September 2023
ER -