dorsal/arxiv
Divergence and Shannon information in genomes
| Authors | Hong-Da Chen, Chang-Heng Chang, Li-Ching Hsieh, Hoong-Chien Lee |
|---|---|
| Categories | q-bio.GN |
| ArXiv ID | q-bio/0412037 |
| URL | https://arxiv.org/abs/q-bio/0412037 |
| DOI | 10.1103/PhysRevLett.94.178103 |
Abstract
Shannon information (SI) and its special case, divergence, are defined for a DNA sequence in terms of probabilities of chemical words in the sequence and are computed for a set of complete genomes highly diverse in length and composition. We find the following: SI (but not divergence) is inversely proportional to sequence length for a random sequence but is length-independent for genomes; the genomic SI is always greater and, for shorter words and longer sequences, hundreds to thousands times greater than the SI in a random sequence whose length and composition match those of the genome; genomic SIs appear to have word-length dependent universal values. The universality is inferred to be an evolution footprint of a universal mode for genome growth.
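The abstract does not spell out its definitions, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes that "chemical words" are overlapping k-letter substrings (k-mers) of the sequence, and that divergence can be illustrated as the Kullback-Leibler divergence of the observed k-mer frequencies from a uniform distribution over all 4^k words, which equals 2k minus the Shannon entropy in bits. The paper's own SI may be defined differently (for example, relative to a composition-matched expectation). The function names `kmer_counts`, `shannon_entropy`, and `divergence_from_uniform` are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length k (sliding window, step 1).

    Assumption: words containing letters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= set("ACGT"))


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy H = -sum(p * log2(p)) of the word frequencies, in bits."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid words of this length")
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def divergence_from_uniform(seq: str, k: int) -> float:
    """KL divergence (bits) of observed k-mer frequencies from the uniform
    distribution over all 4**k words: log2(4**k) - H = 2k - H.

    Assumption: this stands in for 'divergence' in the abstract; it is not
    taken from the paper itself.
    """
    return 2 * k - shannon_entropy(kmer_counts(seq, k))
```

Under these assumptions, a sequence whose k-mers are all equally frequent gives a divergence of zero, while a sequence made of a single repeated letter gives the maximum value of 2k bits. Comparing the value for a real genome against a shuffled sequence of the same length and base composition would be one way to explore the kind of contrast the abstract describes.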
{
  "annotation_id": "b1d1c98a-267c-40ba-b876-6f0c5aa67df8",
  "date_created": "2026-03-02T18:01:31.934000Z",
  "date_modified": "2026-03-02T18:01:31.934000Z",
  "file_hash": "ecf3dbcd93e5ef260ba30980ebd07131223843ef85900721b733ab40395f14cc",
  "private": false,
  "record": {
    "abstract": "Shannon information (SI) and its special case, divergence, are defined for a\nDNA sequence in terms of probabilities of chemical words in the sequence and\nare computed for a set of complete genomes highly diverse in length and\ncomposition. We find the following: SI (but not divergence) is inversely\nproportional to sequence length for a random sequence but is length-independent\nfor genomes; the genomic SI is always greater and, for shorter words and longer\nsequences, hundreds to thousands times greater than the SI in a random sequence\nwhose length and composition match those of the genome; genomic SIs appear to\nhave word-length dependent universal values. The universality is inferred to be\nan evolution footprint of a universal mode for genome growth.",
    "arxiv_id": "q-bio/0412037",
    "authors": [
      "Hong-Da Chen",
      "Chang-Heng Chang",
      "Li-Ching Hsieh",
      "Hoong-Chien Lee"
    ],
    "categories": [
      "q-bio.GN"
    ],
    "doi": "10.1103/PhysRevLett.94.178103",
    "title": "Divergence and Shannon information in genomes",
    "url": "https://arxiv.org/abs/q-bio/0412037"
  },
  "schema_id": "dorsal/arxiv",
  "source": {
    "execution_id": "f5ec3c92-a19d-4adb-b9c2-e3514fef1ae8",
    "id": "arXiv Dataset IDs",
    "type": "Model",
    "variant": "snapshot-2026-03-01",
    "version": "0.1.0"
  },
  "user_id": 1000002
}