dorsal/arxiv
Algorithm for Finding Optimal Gene Sets in Microarray Prediction
| Authors | J. M. Deutsch |
|---|---|
| Categories | physics.bio-ph, physics.comp-ph, physics.med-ph, q-bio |
| ArXiv ID | physics/0108011 |
| URL | https://arxiv.org/abs/physics/0108011 |
Abstract
Motivation: Microarray data has recently been shown to be efficacious in distinguishing closely related cell types that often appear in the diagnosis of cancer. It is useful to determine the minimum number of genes needed for such a diagnosis, both for clinical use and to determine the importance of specific genes for cancer. Here a replication algorithm is used for this purpose. It evolves an ensemble of predictors, all using different combinations of genes, to generate a set of optimal predictors.

Results: We apply this method to the leukemia data of the Whitehead/MIT group, which is used to differentially diagnose two kinds of leukemia, and also to the data of Khan et al., which is used to distinguish four different kinds of childhood cancers. In the latter case we were able to reduce the number of genes needed from 96 down to 15 while still perfectly classifying all of their test data.

Availability: http://stravinsky.ucsc.edu/josh/gesses/

Contact: josh@physics.ucsc.edu
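The abstract only outlines the approach, so the following is a rough sketch of what evolving an ensemble of gene-subset predictors could look like, not the paper's actual algorithm. Everything in it is an assumption made for illustration: the nearest-centroid predictor, the leave-one-out scoring, the mutation rule, the keep-the-best selection step, the function names (`score`, `mutate`, `evolve`, `toy_data`), and the synthetic data in which gene 0 is the only informative gene.

```python
import random

def score(subset, samples, labels):
    """Leave-one-out accuracy of a nearest-centroid predictor using only `subset`."""
    correct = 0
    for i in range(len(samples)):
        sums, counts = {}, {}
        for j, (x, y) in enumerate(zip(samples, labels)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(y, [0.0] * len(subset))
            for k, g in enumerate(subset):
                acc[k] += x[g]
            counts[y] = counts.get(y, 0) + 1
        # Predict the class whose centroid is closest in the chosen genes.
        predicted = min(
            sums,
            key=lambda y: sum(
                (samples[i][g] - sums[y][k] / counts[y]) ** 2
                for k, g in enumerate(subset)
            ),
        )
        correct += predicted == labels[i]
    return correct / len(samples)

def mutate(subset, n_genes, rng):
    """Hypothetical mutation: drop one gene, or swap one gene for another."""
    genes = list(subset)
    if len(genes) > 1 and rng.random() < 0.5:
        genes.pop(rng.randrange(len(genes)))
    else:
        new = rng.randrange(n_genes)
        if new not in genes:
            genes[rng.randrange(len(genes))] = new
    return tuple(sorted(genes))

def evolve(samples, labels, n_genes, ensemble_size=20, start_size=5,
           generations=30, seed=0):
    """Replicate and mutate gene subsets, keeping accurate, small ones."""
    rng = random.Random(seed)
    ensemble = [tuple(sorted(rng.sample(range(n_genes), start_size)))
                for _ in range(ensemble_size)]
    for _ in range(generations):
        offspring = [mutate(s, n_genes, rng) for s in ensemble]
        candidates = list(dict.fromkeys(ensemble + offspring))  # drop duplicates
        # Rank by accuracy first, then prefer fewer genes.
        candidates.sort(key=lambda s: (-score(s, samples, labels), len(s)))
        ensemble = candidates[:ensemble_size]
    return ensemble

def toy_data(n_samples=24, n_genes=12, seed=1):
    """Synthetic two-class data; only gene 0 carries class information."""
    rng = random.Random(seed)
    samples, labels = [], []
    for i in range(n_samples):
        y = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        x[0] += 10.0 * y
        samples.append(x)
        labels.append(y)
    return samples, labels

samples, labels = toy_data()
ensemble = evolve(samples, labels, n_genes=12)
best = ensemble[0]
print(best, score(best, samples, labels))
```

Ranking by accuracy first and subset size second is one simple way to favor small gene sets without giving up predictive performance; the paper's own selection rule may differ.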
{
"annotation_id": "4183a009-ea4d-4e83-a89d-ffb23efef7bf",
"date_created": "2026-03-02T18:00:35.796000Z",
"date_modified": "2026-03-02T18:00:35.796000Z",
"file_hash": "a4608c167d415419b0ccdda3f396949092c9a95aa4503b538c62cb75892357e7",
"private": false,
"record": {
"abstract": "Motivation: Microarray data has been recently been shown to be efficacious in\ndistinguishing closely related cell types that often appear in the diagnosis of\ncancer. It is useful to determine the minimum number of genes needed to do such\na diagnosis both for clinical use and to determine the importance of specific\ngenes for cancer. Here a replication algorithm is used for this purpose. It\nevolves an ensemble of predictors, all using different combinations of genes to\ngenerate a set of optimal predictors.\n Results: We apply this method to the leukemia data of the Whitehead/MIT group\nthat attempts to differentially diagnose two kinds of leukemia, and also to\ndata of Khan et. al. to distinguish four different kinds of childhood cancers.\nIn the latter case we were able to reduce the number of genes needed from 96\ndown to 15, while at the same time being able to perfectly classify all of\ntheir test data.\n Availability: http://stravinsky.ucsc.edu/josh/gesses/\n Contact: josh@physics.ucsc.edu",
"arxiv_id": "physics/0108011",
"authors": [
"J. M. Deutsch"
],
"categories": [
"physics.bio-ph",
"physics.comp-ph",
"physics.med-ph",
"q-bio"
],
"title": "Algorithm for Finding Optimal Gene Sets in Microarray Prediction",
"url": "https://arxiv.org/abs/physics/0108011"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "50519078-b838-4fc7-ba4f-75983f72ee3f",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}