dorsal/arxiv
Predicting Genetic Regulatory Response Using Classification
| Authors | Manuel Middendorf, Anshul Kundaje, Chris Wiggins, Yoav Freund, Christina Leslie |
|---|---|
| Categories | q-bio.QM |
| ArXiv ID | q-bio/0411028 |
| URL | https://arxiv.org/abs/q-bio/0411028 |
| Journal | Proceedings of the Twelfth International Conference on Intelligent Systems for Molecular Biology (ISMB 2004), Bioinformatics 20 Suppl 1, I232-I240, 2004 |
Abstract
We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences ("motifs") in the gene's regulatory region and (2) the expression levels of regulators such as transcription factors in the experiment ("parents"). Thus our learning task integrates two qualitatively different data sources: genome-wide cDNA microarray data across multiple perturbation and mutant experiments along with motif profile data from regulatory sequences. We convert the regression task of predicting real-valued gene expression measurement to a classification task of predicting +1 and -1 labels, corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. The learning algorithm employed is boosting with a margin-based generalization of decision trees, alternating decision trees. This large-margin classifier is sufficiently flexible to allow complex logical functions, yet sufficiently simple to give insight into the combinatorial mechanisms of gene regulation. We observe encouraging prediction accuracy on experiments based on the Gasch S. cerevisiae dataset, and we show that we can accurately predict up- and down-regulation on held-out experiments. Our method thus provides predictive hypotheses, suggests biological experiments, and provides interpretable insight into the structure of genetic regulatory networks.
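The regression-to-classification step described in the abstract can be illustrated with a minimal sketch. It assumes expression values are log-ratios and that measurements inside a symmetric noise band are excluded from training. The function names, the threshold value, and the choice to drop in-band measurements are all hypothetical; the abstract does not specify them.

```python
from typing import List, Tuple


def discretize(log_ratio: float, noise_threshold: float) -> int:
    """Map a real-valued expression value to +1 (up), -1 (down), or 0 (within noise).

    The symmetric threshold is an illustrative assumption, not a value from the paper.
    """
    if log_ratio > noise_threshold:
        return 1
    if log_ratio < -noise_threshold:
        return -1
    return 0


def make_training_labels(
    log_ratios: List[float], noise_threshold: float
) -> List[Tuple[int, int]]:
    """Return (index, label) pairs, keeping only measurements beyond the noise band."""
    labeled = []
    for i, value in enumerate(log_ratios):
        label = discretize(value, noise_threshold)
        if label != 0:  # assumption: in-band measurements are left out of training
            labeled.append((i, label))
    return labeled


# Hypothetical expression values for one gene across five experiments.
example = [2.1, -0.3, -1.8, 0.0, 1.2]
labels = make_training_labels(example, noise_threshold=1.0)
```

Each retained (index, label) pair would then be paired with features such as motif presence and regulator expression state before being passed to a classifier.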
{
"annotation_id": "400d6f3d-e47b-40ae-a9a6-5e17d7017ea7",
"date_created": "2026-03-02T18:01:32.176000Z",
"date_modified": "2026-03-02T18:01:32.176000Z",
"file_hash": "445a1356245c335a8d00e9bd2b4fb278d0e5479828f80351fd7c2b4ee84cfa8d",
"private": false,
"record": {
"abstract": "We present a novel classification-based method for learning to predict gene\nregulatory response. Our approach is motivated by the hypothesis that in simple\norganisms such as Saccharomyces cerevisiae, we can learn a decision rule for\npredicting whether a gene is up- or down-regulated in a particular experiment\nbased on (1) the presence of binding site subsequences (``motifs\u0027\u0027) in the\ngene\u0027s regulatory region and (2) the expression levels of regulators such as\ntranscription factors in the experiment (``parents\u0027\u0027). Thus our learning task\nintegrates two qualitatively different data sources: genome-wide cDNA\nmicroarray data across multiple perturbation and mutant experiments along with\nmotif profile data from regulatory sequences. We convert the regression task of\npredicting real-valued gene expression measurement to a classification task of\npredicting +1 and -1 labels, corresponding to up- and down-regulation beyond\nthe levels of biological and measurement noise in microarray measurements. The\nlearning algorithm employed is boosting with a margin-based generalization of\ndecision trees, alternating decision trees. This large-margin classifier is\nsufficiently flexible to allow complex logical functions, yet sufficiently\nsimple to give insight into the combinatorial mechanisms of gene regulation. We\nobserve encouraging prediction accuracy on experiments based on the Gasch S.\ncerevisiae dataset, and we show that we can accurately predict up- and\ndown-regulation on held-out experiments. Our method thus provides predictive\nhypotheses, suggests biological experiments, and provides interpretable insight\ninto the structure of genetic regulatory networks.",
"arxiv_id": "q-bio/0411028",
"authors": [
"Manuel Middendorf",
"Anshul Kundaje",
"Chris Wiggins",
"Yoav Freund",
"Christina Leslie"
],
"categories": [
"q-bio.QM"
],
"journal_ref": "Proceedings of the Twelfth International Conference on Intelligent\n Systems for Molecular Biology (ISMB 2004), Bioinformatics 20 Suppl 1,\n I232-I240, 2004",
"title": "Predicting Genetic Regulatory Response Using Classification",
"url": "https://arxiv.org/abs/q-bio/0411028"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "5b14b396-cfe2-4214-8c32-d37ba1a00ee9",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}