TY - JOUR
T1 - How well can machine learning models perform without hydrologists? Application of rational feature selection to improve hydrological forecasting
AU - Moreido, Vsevolod
AU - Gartsman, Boris
AU - Solomatine, Dimitri P.
AU - Suchilina, Zoya
PY - 2021
Y1 - 2021
AB - With more machine learning methods being involved in social and environmental research activities, we are addressing the role of available information for model training in model performance. We tested the abilities of several machine learning models for short-term hydrological forecasting by inferring linkages with all available predictors or only with those pre-selected by a hydrologist. The models used in this study were multivariate linear regression, the M5 model tree, multilayer perceptron (MLP) artificial neural network, and the long short-term memory (LSTM) model. We used two river catchments in contrasting runoff generation conditions to try to infer the ability of different model structures to automatically select the best predictor set from all those available in the dataset and compared models’ performance with that of a model operating on predictors prescribed by a hydrologist. Additionally, we tested how shuffling of the initial dataset improved model performance. We can conclude that in rainfall-driven catchments, the models performed generally better on a dataset prescribed by a hydrologist, while in mixed-snowmelt and baseflow-driven catchments, the automatic selection of predictors was preferable.
KW - Hydrological forecasting
KW - Machine learning
KW - Rainfall-runoff models
UR - http://www.scopus.com/inward/record.url?scp=85108945653&partnerID=8YFLogxK
U2 - 10.3390/w13121696
DO - 10.3390/w13121696
M3 - Article
AN - SCOPUS:85108945653
SN - 2073-4441
VL - 13
SP - 1
EP - 14
JO - Water (Switzerland)
JF - Water (Switzerland)
IS - 12
M1 - 1696
ER -