dorsal/arxiv
Variable selection from random forests: application to gene expression data
| Field | Value |
|---|---|
| Authors | Ramon Diaz-Uriarte, Sara Alvarez de Andres |
| Categories | q-bio.QM |
| ArXiv ID | q-bio/0503025 |
| URL | https://arxiv.org/abs/q-bio/0503025 |
Abstract
Random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of observations, and returns measures of variable importance. Thus, it is important to understand the performance of random forest with microarray data and its use for gene selection.

We first show the effects of changes in parameters of random forest on the prediction error. Then we present an approach for gene selection that uses measures of variable importance and error rate, and is targeted towards the selection of small sets of genes. Using simulated and real microarray data, we show that the gene selection procedure yields small sets of genes while preserving predictive accuracy.

Availability: All code is available as an R package, varSelRF, from CRAN, http://cran.r-project.org/src/contrib/PACKAGES.html, or from the supplementary material page.

Supplementary information: http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html
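The abstract describes gene selection driven by variable importance and error rate, aimed at small gene sets, but does not spell out the procedure. The following is a minimal Python sketch of one plausible reading: backward elimination that repeatedly drops the least important variables and keeps the smallest set whose error stays close to the best observed. It is not the varSelRF implementation. The names `backward_select`, `score`, `drop_fraction`, `tolerance`, and `min_vars` are assumptions made for illustration, and `toy_score` is a stand-in for fitting a random forest and reading off its importances and error estimate.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer type: given variable names, return
# (importance per variable, estimated error rate) for a model fit on them.
Scorer = Callable[[Sequence[str]], Tuple[Dict[str, float], float]]


def backward_select(
    variables: Sequence[str],
    score: Scorer,
    drop_fraction: float = 0.2,   # assumed share of variables dropped per round
    tolerance: float = 0.0,       # assumed slack allowed above the best error
    min_vars: int = 2,            # assumed floor on the size of the final set
) -> Tuple[List[str], float]:
    """Return the smallest variable set whose error is within `tolerance`
    of the lowest error seen during backward elimination."""
    current = list(variables)
    history: List[Tuple[List[str], float]] = []
    while True:
        importances, error = score(current)
        history.append((list(current), error))
        if len(current) <= min_vars:
            break
        # Drop the least important variables, always at least one.
        n_drop = max(1, int(len(current) * drop_fraction))
        n_keep = max(min_vars, len(current) - n_drop)
        ranked = sorted(current, key=lambda v: importances[v], reverse=True)
        current = ranked[:n_keep]
    best_error = min(err for _, err in history)
    candidates = [(vs, err) for vs, err in history if err <= best_error + tolerance]
    return min(candidates, key=lambda pair: len(pair[0]))


def toy_score(names: Sequence[str]) -> Tuple[Dict[str, float], float]:
    """Toy stand-in for a random forest: g1 and g2 carry signal, the rest is noise."""
    signal = {"g1": 1.0, "g2": 0.9}
    importances = {n: signal.get(n, 0.0) for n in names}
    n_noise = sum(1 for n in names if n not in signal)
    has_signal = all(g in names for g in signal)
    error = 0.1 + 0.01 * n_noise if has_signal else 0.5
    return importances, error


genes = ["g1", "g2"] + [f"n{i}" for i in range(1, 9)]
selected, error = backward_select(genes, toy_score, min_vars=1)
print(selected, error)
```

Passing the model as a callable keeps the selection loop independent of any particular random forest library. In this toy run the loop is allowed to shrink to a single gene, and the error check is what stops the selection from going below the two informative genes.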
{
"annotation_id": "b2738b88-4975-4db5-9902-17d39c877a44",
"date_created": "2026-03-02T18:01:31.154000Z",
"date_modified": "2026-03-02T18:01:31.154000Z",
"file_hash": "1c63abd2b9dbbeedab67b0e8cadba716cb34be50a9f936200cbbdc5b2aad15fb",
"private": false,
"record": {
"abstract": "Random forest is a classification algorithm well suited for microarray data:\nit shows excellent performance even when most predictive variables are noise,\ncan be used when the number of variables is much larger than the number of\nobservations, and returns measures of variable importance. Thus, it is\nimportant to understand the performance of random forest with microarray data\nand its use for gene selection.\n We first show the effects of changes in parameters of random forest on the\nprediction error. Then we present an approach for gene selection that uses\nmeasures of variable importance and error rate, and is targeted towards the\nselection of small sets of genes. Using simulated and real microarray data, we\nshow that the gene selection procedure yields small sets of genes while\npreserving predictive accuracy.\n Availability: All code is available as an R package, varSelRF, from CRAN,\nhttp://cran.r-project.org/src/contrib/PACKAGES.html, or from the supplementary\nmaterial page.\n Supplementary information:\nhttp://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html",
"arxiv_id": "q-bio/0503025",
"authors": [
"Ramon Diaz-Uriarte",
"Sara Alvarez de Andres"
],
"categories": [
"q-bio.QM"
],
"title": "Variable selection from random forests: application to gene expression data",
"url": "https://arxiv.org/abs/q-bio/0503025"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "e73dbc81-b045-4345-9ea9-630c2452938d",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}