dorsal/arxiv
Overlapping Probabilities of Top Ranking Gene Lists, Hypergeometric Distribution, and Stringency of Gene Selection Criterion
| Authors | Wen Fury, Franak Batliwalla, Peter K. Gregersen, Wentian Li |
|---|---|
| Categories | q-bio.QM |
| ArXiv ID | q-bio/0606017 |
| URL | https://arxiv.org/abs/q-bio/0606017 |
| DOI | 10.1109/IEMBS.2006.260828 |
| Journal | Proceedings of 28th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE (2006), pages 5531-5534 |
Abstract
When the same set of genes appear in two top ranking gene lists in two different studies, it is often of interest to estimate the probability for this being a chance event. This overlapping probability is well known to follow the hypergeometric distribution. Usually, the lengths of top-ranking gene lists are assumed to be fixed, by using a pre-set criterion on, e.g., $p$-value for the t-test. We investigate how overlapping probability changes with the gene selection criterion, or simply, with the length of the top-ranking gene lists. It is concluded that overlapping probability is indeed a function of the gene list length, and its statistical significance should be quoted in the context of gene selection criterion.
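The abstract's central quantity can be illustrated with a minimal sketch, which is not the authors' code. It assumes the usual setup: two lists of lengths `n1` and `n2` are drawn from a common pool of `N` genes, and the chance probability of sharing at least `k` genes is the upper tail of a hypergeometric distribution. The function name `overlap_pvalue`, the parameter names, and all numbers in the loop are hypothetical, chosen only to show how the probability moves as the list length changes.

```python
from math import comb


def overlap_pvalue(N, n1, n2, k):
    """Upper-tail hypergeometric probability P(X >= k).

    X is the number of genes shared by two lists of lengths n1 and n2,
    each drawn at random (without replacement) from N genes in total.
    """
    if not (0 <= n1 <= N and 0 <= n2 <= N):
        raise ValueError("list lengths must lie between 0 and N")
    largest_overlap = min(n1, n2)
    # Number of ways to pick the second list so that it shares exactly i
    # genes with the first list, summed over i = k .. largest_overlap.
    favourable = sum(
        comb(n1, i) * comb(N - n1, n2 - i)
        for i in range(max(k, 0), largest_overlap + 1)
    )
    return favourable / comb(N, n2)


# Hypothetical illustration: same observed overlap, different list lengths.
N_GENES = 20000   # assumed total number of genes on the array
OVERLAP = 5       # assumed number of genes shared by both lists
for length in (50, 100, 200, 400):
    p = overlap_pvalue(N_GENES, length, length, OVERLAP)
    print(f"list length {length:4d}: P(overlap >= {OVERLAP}) = {p:.3e}")
```

Because the same overlap count gives a different tail probability at each list length, a reported overlap significance only makes sense alongside the selection criterion that fixed the lengths, which is the point the abstract makes.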
{
  "annotation_id": "93b25571-2d3a-4c6d-9ee1-48774de9227b",
  "date_created": "2026-03-02T18:01:35.823000Z",
  "date_modified": "2026-03-02T18:01:35.823000Z",
  "file_hash": "55c6602b5d2625418ce8c634b89c1e67f985c9e4215387035c02d6d76660f519",
  "private": false,
  "record": {
    "abstract": "When the same set of genes appear in two top ranking gene lists in two\ndifferent studies, it is often of interest to estimate the probability for this\nbeing a chance event. This overlapping probability is well known to follow the\nhypergeometric distribution. Usually, the lengths of top-ranking gene lists are\nassumed to be fixed, by using a pre-set criterion on, e.g., $p$-value for the\nt-test. We investigate how overlapping probability changes with the gene\nselection criterion, or simply, with the length of the top-ranking gene lists.\nIt is concluded that overlapping probability is indeed a function of the gene\nlist length, and its statistical significance should be quoted in the context\nof gene selection criterion.",
    "arxiv_id": "q-bio/0606017",
    "authors": [
      "Wen Fury",
      "Franak Batliwalla",
      "Peter K. Gregersen",
      "Wentian Li"
    ],
    "categories": [
      "q-bio.QM"
    ],
    "doi": "10.1109/IEMBS.2006.260828",
    "journal_ref": "Proceedings of 28th Annual International Conference of the\n Engineering in Medicine and Biology Society, IEEE (2006), pages 5531-5534",
    "title": "Overlapping Probabilities of Top Ranking Gene Lists, Hypergeometric Distribution, and Stringency of Gene Selection Criterion",
    "url": "https://arxiv.org/abs/q-bio/0606017"
  },
  "schema_id": "dorsal/arxiv",
  "source": {
    "execution_id": "7df95ab2-e261-487c-a8bc-0e3abcb0778d",
    "id": "arXiv Dataset IDs",
    "type": "Model",
    "variant": "snapshot-2026-03-01",
    "version": "0.1.0"
  },
  "user_id": 1000002
}