dorsal/arxiv
Evolutionary implications of a power-law distribution of protein family sizes
| Authors | Joel S. Bader |
|---|---|
| Categories | physics.bio-ph, q-bio |
| ArXiv ID | physics/9908032 |
| URL | https://arxiv.org/abs/physics/9908032 |
Abstract
Current-day genomes bear the mark of the evolutionary processes. One of the strongest indications is the sequence homology among families of proteins that perform similar biological functions in different species. The number of proteins in a family can grow over time as genetic information is duplicated through evolution. We explore how evolution directs the size distribution of these families. Theoretical predictions for family sizes are obtained from two models, one in which individual genes duplicate and a second in which the entire genome duplicates. Predictions from these models are compared with the family size distributions for several organisms whose complete genome sequence is known. We find that protein family size distributions in nature follow a power-law distribution. Comparing these results to the model systems, we conclude that genome duplication is the dominant mechanism leading to increased genetic material in the species considered.
{
"annotation_id": "1901a04d-6dec-4da6-8cad-fc09461b01da",
"date_created": "2026-03-02T18:01:24.822000Z",
"date_modified": "2026-03-02T18:01:24.822000Z",
"file_hash": "6ce7b39f2f28dcd9de070b2a859ff04a803da0aa8daee93e3818b685b5d6cc80",
"private": false,
"record": {
"abstract": "Current-day genomes bear the mark of the evolutionary processes. One of the\nstrongest indications is the sequence homology among families of proteins that\nperform similar biological functions in different species. The number of\nproteins in a family can grow over time as genetic information is duplicated\nthrough evolution. We explore how evolution directs the size distribution of\nthese families. Theoretical predictions for family sizes are obtained from two\nmodels, one in which individual genes duplicate and a second in which the\nentire genome duplicates. Predictions from these models are compared with the\nfamily size distributions for several organisms whose complete genome sequence\nis known. We find that protein family size distributions in nature follow a\npower-law distribution. Comparing these results to the model systems, we\nconclude that genome duplication is the dominant mechanism leading to increased\ngenetic material in the species considered.",
"arxiv_id": "physics/9908032",
"authors": [
"Joel S. Bader"
],
"categories": [
"physics.bio-ph",
"q-bio"
],
"title": "Evolutionary implications of a power-law distribution of protein family sizes",
"url": "https://arxiv.org/abs/physics/9908032"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "917e7689-a807-47b0-b288-dc36097414e9",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}