dorsal/arxiv
Linguistic mechanism of the evolution of amino acid frequencies and genomic GC content
| Authors | Dirson Jian Li |
|---|---|
| Categories | q-bio.GN |
| ArXiv ID | q-bio/0612010 |
| URL | https://arxiv.org/abs/q-bio/0612010 |
Abstract
Much information is stored in the amino acid composition of proteins and the base composition of DNA. We simulated the evolution of amino acid frequencies and genomic GC content with a linguistic model. It is shown that the evolution of the genetic code determines the evolution of amino acid frequencies and genomic GC content. We explained the relationships among amino acid frequencies, genomic GC content and protein length distribution in a unified theoretical framework. In particular, the simulations of the evolution of amino acid frequencies and the codon position GC content agree remarkably well with the results based on the data of all genomes known so far. Furthermore, we found that the space of average protein length in the proteome and ratio of amino acid frequencies is useful for describing phylogeny and evolution. Notably, the points for all the species in this space form an evolutionary flow. We believe that amino acid gain and loss is motivated by the established pattern of variation in amino acid frequencies. The linguistic mechanism helps to unveil the origin of the genetic code.
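The two observables named in the abstract, codon position GC content and amino acid frequencies, can be made concrete with a minimal sketch. This is not the paper's linguistic model; it only shows one plausible way to measure these quantities from sequence data. The function names `codon_position_gc` and `amino_acid_frequencies` are hypothetical, and the input is assumed to be an in-frame coding sequence and a one-letter protein sequence.

```python
from collections import Counter


def codon_position_gc(cds: str) -> tuple[float, float, float]:
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Assumption: `cds` starts at the first base of a codon; a trailing
    partial codon, if any, is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this codon position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases))
    return tuple(fractions)


def amino_acid_frequencies(protein: str) -> dict[str, float]:
    """Relative frequency of each residue in a one-letter protein sequence."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    counts = Counter(protein)
    return {aa: n / len(protein) for aa, n in counts.items()}
```

Applied to every coding sequence and protein in a genome and then averaged, these per-sequence values would give genome-level quantities of the kind the abstract compares across species; how the paper itself aggregates them is not stated in this record.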
{
"annotation_id": "fbbc2a8b-c722-4598-978d-a02d88a1af47",
"date_created": "2026-03-02T18:01:35.375000Z",
"date_modified": "2026-03-02T18:01:35.375000Z",
"file_hash": "973edc298abdc547656753fa402b1abd0632bb5eebd72a632ac114f071027fa5",
"private": false,
"record": {
"abstract": "Much information is stored in amino acid composition of protein and base\ncomposition of DNA. We simulated the evolution of amino acid frequencies and\ngenomic GC content by a linguistic model. It is showed that the evolution of\ngenetic code determines the evolution of amino acid frequencies and genomic GC\ncontent. We explained the relationships among amino acid frequencies, genomic\nGC content and protein length distribution in a unified theoretical framework.\nEspecially, the simulations of the evolution of amino acid frequencies and the\ncodon position GC content agree dramatically with the results based on the data\nof all known genomes so far. Furthermore, we found that the space of average\nprotein length in proteome and ratio of amino acid frequencies is useful to\ndescribe the phylogeny and evolution. Amazingly, the dots of all the species in\nthis space form an evolutionary flow. We believe that the amino acid gain and\nloss is motivated by the established pattern of the variation of amino acid\nfrequencies. The linguistic mechanism is helpful to unveil the origin of the\ngenetic code.",
"arxiv_id": "q-bio/0612010",
"authors": [
"Dirson Jian Li"
],
"categories": [
"q-bio.GN"
],
"title": "Linguistic mechanism of the evolution of amino acid frequencies and genomic GC content",
"url": "https://arxiv.org/abs/q-bio/0612010"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "b237bf93-c02c-438d-84cd-07f7c08b82b9",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}