Data-driven techniques for modelling the gross primary production of the páramo vegetation using climate data: Application in the Ecuadorian Andean region

V.G. Minaya Maldonado; Gerald A. Corzo; Dimitri P. Solomatine; Arthur E. Mynett

doi:10.1016/j.ecoinf.2016.12.002

V.G. Minaya Maldonado^*, Gerald A. Corzo, Dimitri P. Solomatine, Arthur E. Mynett

^*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

8 Citations (Scopus)

Abstract

Water and CO₂ relations are central to carbon cycle and climate change studies and are of great significance for estimating gross primary production (GPP). Various biogeochemical process-based models have been set up to estimate GPP from mathematical representations of biological, physiological and ecological processes. However, these models have grown in complexity and computational demand because of the large number of physical equations that must be solved. Computational time becomes an important concern when such models are used to simulate multiple scenarios over long periods (e.g. climate projections). Data-driven surrogate models have proven to be a useful tool for environmental modelling, especially when the number of ecological and climatic covariates is large. The advantages of data-driven models (DDMs) are the possibility of adding new independent variables even when they are poorly understood, and a short computational run time. The aim is to explore the ability of DDMs to replicate a biochemical model that calculates GPP. This study evaluates the performance of four surrogate DDMs: linear regression method (LRM), model tree (MT), instance-based learning (IBL) and artificial neural network (ANN). Simple empirical and semi-empirical relationships between GPP and climatic variables are studied. Input variable selection (IVS) methods were used to decide on the most relevant candidate environmental model inputs, followed by a two-step approach that included a model-free and a model-based technique. Data from the highlands (páramo ecosystem) of the Ecuadorian Andean region, from a 12-year time series (2000-2011), were used to evaluate the models at various time frames and at different altitudes.
The GPP time series for the same period were derived from an earlier study using the BIOME-BGC (BioGeochemical Cycles) model, a comprehensive physically based model used in various analyses of carbon fluxes around the world. IBL (the nearest neighbour method) showed a strong capability to reproduce GPP when the data were aggregated to a monthly time frame. With IBL as the selected DDM, the computational time needed to evaluate the time series is shorter, and the accuracy is sufficient for use in multi-model runs.
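
The surrogate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the climate series, the toy GPP function standing in for the process-based model output, the simplified 360-day calendar, the 8-year/4-year split and the choice of k = 3 are all assumptions made for illustration. The sketch aggregates daily values to monthly means, fits an inverse-distance-weighted nearest-neighbour (IBL-style) predictor on the first years, and compares its error on the remaining years with a constant-mean baseline.

```python
import math
import random


def monthly_means(series, days_per_month=30):
    """Average consecutive, non-overlapping blocks of `days_per_month` values."""
    n_months = len(series) // days_per_month
    return [
        sum(series[m * days_per_month:(m + 1) * days_per_month]) / days_per_month
        for m in range(n_months)
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Inverse-distance-weighted mean of the k nearest training targets."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def rmse(pred, obs):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def scaler(values):
    """Return a min-max scaling function fitted on `values`."""
    lo, hi = min(values), max(values)
    return lambda v: (v - lo) / (hi - lo)


# Synthetic daily data for 12 years (assumption: 360-day years, 30-day months).
random.seed(0)
temp, rad, gpp = [], [], []
for day in range(12 * 360):
    season = 2 * math.pi * day / 360
    t = 6.0 + 3.0 * math.sin(season) + random.gauss(0, 1.0)            # temperature
    r = max(0.0, 15.0 + 5.0 * math.cos(season) + random.gauss(0, 2.0))  # radiation
    temp.append(t)
    rad.append(r)
    # Toy stand-in for the process-based model output; this is NOT BIOME-BGC.
    gpp.append(max(0.0, 0.4 * t) * r / (r + 10.0))

# Aggregate everything to the monthly time frame.
temp_m, rad_m, gpp_m = (monthly_means(s) for s in (temp, rad, gpp))

# Train on the first 8 years, test on the last 4 (hypothetical split).
n_train = 8 * 12
scale_t, scale_r = scaler(temp_m[:n_train]), scaler(rad_m[:n_train])
features = [(scale_t(t), scale_r(r)) for t, r in zip(temp_m, rad_m)]
train_x, train_y = features[:n_train], gpp_m[:n_train]
test_x, test_y = features[n_train:], gpp_m[n_train:]

knn_pred = [knn_predict(train_x, train_y, q) for q in test_x]
baseline = [sum(train_y) / n_train] * len(test_y)

rmse_knn = rmse(knn_pred, test_y)
rmse_baseline = rmse(baseline, test_y)
print(f"k-NN surrogate RMSE: {rmse_knn:.3f}")
print(f"Mean baseline RMSE:  {rmse_baseline:.3f}")
```

Prediction cost here is one distance computation per training month, which is why a surrogate of this kind is cheap to evaluate repeatedly. Features are min-max scaled on the training period only, so that no information from the test years enters the distance calculation.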

Original language	English
Pages (from-to)	222-230
Number of pages	9
Journal	Ecological Informatics: an international journal on ecoinformatics and computational ecology
Volume	43
DOIs	https://doi.org/10.1016/j.ecoinf.2016.12.002
Publication status	Published - 25 May 2015

Keywords

BIOME-BGC
k-nearest neighbour
Model tree
Neural network
Surrogate model
Vegetation models

Cite this

Minaya Maldonado, V. G., Corzo, G. A., Solomatine, D. P., & Mynett, A. E. (2015). Data-driven techniques for modelling the gross primary production of the páramo vegetation using climate data: Application in the Ecuadorian Andean region. Ecological Informatics: an international journal on ecoinformatics and computational ecology, 43, 222-230. https://doi.org/10.1016/j.ecoinf.2016.12.002

@article{9e3a3ccdb8154556adc40f1fa8842bda,

title = "Data-driven techniques for modelling the gross primary production of the p{\'a}ramo vegetation using climate data: Application in the Ecuadorian Andean region",

keywords = "BIOME-BGC, k-nearest neighbour, Model tree, Neural network, Surrogate model, Vegetation models",

author = "{Minaya Maldonado}, V.G. and Corzo, {Gerald A.} and Solomatine, {Dimitri P.} and Mynett, {Arthur E.}",

year = "2015",

month = may,

day = "25",

doi = "10.1016/j.ecoinf.2016.12.002",

language = "English",

volume = "43",

pages = "222--230",

journal = "Ecological Informatics: an international journal on ecoinformatics and computational ecology",

issn = "1574-9541",

publisher = "Elsevier",

}

Minaya Maldonado, VG, Corzo, GA, Solomatine, DP & Mynett, AE 2015, 'Data-driven techniques for modelling the gross primary production of the páramo vegetation using climate data: Application in the Ecuadorian Andean region', Ecological Informatics: an international journal on ecoinformatics and computational ecology, vol. 43, pp. 222-230. https://doi.org/10.1016/j.ecoinf.2016.12.002

Data-driven techniques for modelling the gross primary production of the páramo vegetation using climate data: Application in the Ecuadorian Andean region. / Minaya Maldonado, V.G.; Corzo, Gerald A.; Solomatine, Dimitri P. et al.
In: Ecological Informatics: an international journal on ecoinformatics and computational ecology, Vol. 43, 25.05.2015, p. 222-230.

TY - JOUR

T1 - Data-driven techniques for modelling the gross primary production of the páramo vegetation using climate data

T2 - Application in the Ecuadorian Andean region

AU - Minaya Maldonado, V.G.

AU - Corzo, Gerald A.

AU - Solomatine, Dimitri P.

AU - Mynett, Arthur E.

PY - 2015/5/25

Y1 - 2015/5/25

KW - BIOME-BGC

KW - k-nearest neighbour

KW - Model tree

KW - Neural network

KW - Surrogate model

KW - Vegetation models

UR - http://www.scopus.com/inward/record.url?scp=85007574278&partnerID=8YFLogxK

U2 - 10.1016/j.ecoinf.2016.12.002

DO - 10.1016/j.ecoinf.2016.12.002

M3 - Article

AN - SCOPUS:85007574278

SN - 1574-9541

VL - 43

SP - 222

EP - 230

JO - Ecological Informatics: an international journal on ecoinformatics and computational ecology

JF - Ecological Informatics: an international journal on ecoinformatics and computational ecology

ER -
