TY - JOUR
T1 - Comparison performance of machine learning and geostatistical methods for the interpolation of monthly air temperature over Costa Rica
AU - Méndez, M.
AU - Calvo-Valverde, L. A.
N1 - Publisher Copyright: © 2020 IOP Publishing Ltd. All rights reserved.
PY - 2020/1/28
Y1 - 2020/1/28
N2 - The performance of three machine learning (ML) methods, cubist regression (CR), random forest (RF) and generalized additive model using splines (GAM), in generating monthly air temperature grids over Costa Rica was evaluated against two widely used geostatistical methods: ordinary kriging (OK) and kriging with external drift (KED). The skill of the interpolation methods was evaluated using 10-fold cross-validation, with the root-mean-square error (RMSE), the mean absolute error (MAE) and the Pearson correlation coefficient (R) as agreement metrics. For this purpose, data from an irregularly distributed observational network comprising 73 weather stations were selected for the period 1950-1987. Several spatial fields derived from a high-resolution digital elevation model (DEM) were tested as covariates. Results from the 10-fold cross-validation show that CR yielded the best individual performance, followed by KED, whereas GAM performed worst. Elevation, however, was the only covariate ultimately incorporated in the interpolation process, since the remaining spatial fields exhibited poor correlation with temperature or resulted in data redundancy. While the quantitative and qualitative evaluations of CR and KED are comparable, CR is considered the best approach since the method is unaffected by assumptions of data normality and homoscedasticity.
AB - The performance of three machine learning (ML) methods, cubist regression (CR), random forest (RF) and generalized additive model using splines (GAM), in generating monthly air temperature grids over Costa Rica was evaluated against two widely used geostatistical methods: ordinary kriging (OK) and kriging with external drift (KED). The skill of the interpolation methods was evaluated using 10-fold cross-validation, with the root-mean-square error (RMSE), the mean absolute error (MAE) and the Pearson correlation coefficient (R) as agreement metrics. For this purpose, data from an irregularly distributed observational network comprising 73 weather stations were selected for the period 1950-1987. Several spatial fields derived from a high-resolution digital elevation model (DEM) were tested as covariates. Results from the 10-fold cross-validation show that CR yielded the best individual performance, followed by KED, whereas GAM performed worst. Elevation, however, was the only covariate ultimately incorporated in the interpolation process, since the remaining spatial fields exhibited poor correlation with temperature or resulted in data redundancy. While the quantitative and qualitative evaluations of CR and KED are comparable, CR is considered the best approach since the method is unaffected by assumptions of data normality and homoscedasticity.
UR - http://www.scopus.com/inward/record.url?scp=85079785153&partnerID=8YFLogxK
U2 - 10.1088/1755-1315/432/1/012011
DO - 10.1088/1755-1315/432/1/012011
M3 - Conference article
AN - SCOPUS:85079785153
SN - 1755-1307
VL - 432
JO - IOP Conference Series: Earth and Environmental Science
JF - IOP Conference Series: Earth and Environmental Science
IS - 1
M1 - 012011
T2 - 2019 International Conference on Resources and Environmental Research, ICRER 2019
Y2 - 25 October 2019 through 27 October 2019
ER -