dorsal/arxiv
Methodological Issues in Building, Training, and Testing Artificial Neural Networks
| Authors | Stacy L. Ozesmi, Uygar Ozesmi, Can Ozan Tan |
|---|---|
| Categories | q-bio.PE, q-bio.QM |
| ArXiv ID | q-bio/0510017 |
| URL | https://arxiv.org/abs/q-bio/0510017 |
| DOI | 10.1016/j.ecolmodel.2005.11.012 |
| Journal | Ecological Modelling, 195:83-93. 2006 |
Abstract
We review the use of artificial neural networks, particularly the feedforward multilayer perceptron with back-propagation for training (MLP), in ecological modelling. Overtraining on data or giving vague references to how it was avoided is the major problem. Various methods can be used to determine when to stop training in artificial neural networks: 1) early stopping based on cross-validation, 2) stopping after an analyst-defined error is reached or after the error levels off, 3) use of a test data set. We do not recommend the third method as the test data set is then not independent of model development. Many studies used the testing data to optimize the model and training. Although this method may give the best model for that set of data, it does not give generalizability or improve understanding of the study system. The importance of an independent data set cannot be overemphasized as we found dramatic differences in model accuracy assessed with prediction accuracy on the training data set, as estimated with bootstrapping, and from use of an independent data set. The comparison of the artificial neural network with a general linear model (GLM) as a standard procedure is recommended because a GLM may perform as well as or better than the MLP. MLP models should not be treated as black-box models; instead, techniques such as sensitivity analyses, input variable relevances, neural interpretation diagrams, randomization tests, and partial derivatives should be used to make the model more transparent and to further our ecological understanding, which is an important goal of the modelling process. Based on our experience we discuss how to build an MLP model and how to optimize the parameters and architecture.
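The stopping rule the abstract recommends can be illustrated with a minimal sketch. This is not the authors' code. It makes several simplifying assumptions: a one-variable linear model stands in for the MLP, a single held-out validation split stands in for cross-validation, and the data are synthetic. The names `make_data`, `train_with_early_stopping`, `patience`, and `lr` are all hypothetical. The point it tries to show is that the validation split decides when to stop, while the test split is touched only once, after training, so it stays independent of model development.

```python
import random


def make_data(n, rng):
    # Synthetic data (illustrative assumption): y = 2x + 1 plus Gaussian noise.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def mse(w, b, data):
    # Mean squared error of the linear model y = w*x + b on a data set.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_with_early_stopping(train, val, lr=0.1, max_epochs=500, patience=10):
    # Full-batch gradient descent; gradients come from the training set only.
    w, b = 0.0, 0.0
    best_err, best_w, best_b, best_epoch = float("inf"), w, b, 0
    stale = 0  # epochs since the validation error last improved
    for epoch in range(1, max_epochs + 1):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w -= lr * grad_w
        b -= lr * grad_b
        val_err = mse(w, b, val)
        if val_err < best_err - 1e-9:
            best_err, best_w, best_b, best_epoch = val_err, w, b, epoch
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation error has stopped improving
    return best_err, best_w, best_b, best_epoch


rng = random.Random(0)
data = make_data(300, rng)
# Three disjoint splits: the test split plays no part in training or stopping.
train, val, test = data[:200], data[200:250], data[250:]

val_err, w, b, stopped_epoch = train_with_early_stopping(train, val)
test_err = mse(w, b, test)  # evaluated once, after all model choices are made
print(f"best epoch={stopped_epoch} val_mse={val_err:.4f} test_mse={test_err:.4f}")
```

Reusing `test` inside the training loop in place of `val` would correspond to the third method in the abstract, which the authors advise against.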
{
"annotation_id": "f44e9dfb-0289-4f91-b4b6-a43c30d3e945",
"date_created": "2026-03-02T18:01:32.331000Z",
"date_modified": "2026-03-02T18:01:32.331000Z",
"file_hash": "84ecd29872463e79b920d3ba34c67e28c3ad494956ccf087733de61b0030fba3",
"private": false,
"record": {
"abstract": "We review the use of artificial neural networks, particularly the feedforward\nmultilayer perceptron with back-propagation for training (MLP), in ecological\nmodelling. Overtraining on data or giving vague references to how it was\navoided is the major problem. Various methods can be used to determine when to\nstop training in artificial neural networks: 1) early stopping based on\ncross-validation, 2) stopping after a analyst defined error is reached or after\nthe error levels off, 3) use of a test data set. We do not recommend the third\nmethod as the test data set is then not independent of model development. Many\nstudies used the testing data to optimize the model and training. Although this\nmethod may give the best model for that set of data it does not give\ngeneralizability or improve understanding of the study system. The importance\nof an independent data set cannot be overemphasized as we found dramatic\ndifferences in model accuracy assessed with prediction accuracy on the training\ndata set, as estimated with bootstrapping, and from use of an independent data\nset. The comparison of the artificial neural network with a general linear\nmodel (GLM) as a standard procedure is recommended because a GLM may perform as\nwell or better than the MLP. MLP models should not be treated as black box\nmodels but instead techniques such as sensitivity analyses, input variable\nrelevances, neural interpretation diagrams, randomization tests, and partial\nderivatives should be used to make the model more transparent, and further our\necological understanding which is an important goal of the modelling process.\nBased on our experience we discuss how to build a MLP model and how to optimize\nthe parameters and architecture.",
"arxiv_id": "q-bio/0510017",
"authors": [
"Stacy L. Ozesmi",
"Uygar Ozesmi",
"Can Ozan Tan"
],
"categories": [
"q-bio.PE",
"q-bio.QM"
],
"doi": "10.1016/j.ecolmodel.2005.11.012",
"journal_ref": "Ecological Modelling, 195:83-93. 2006",
"title": "Methodological Issues in Building, Training, and Testing Artificial Neural Networks",
"url": "https://arxiv.org/abs/q-bio/0510017"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "86df6b6e-48ad-4b5f-9c5e-dcd1b887ed7f",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}