dorsal/arxiv
Simplifying the mosaic description of DNA sequences
| Authors | Rajeev K. Azad, J. Subba Rao, Wentian Li, Ramakrishna Ramaswamy |
|---|---|
| Categories | physics.bio-ph, q-bio.GN |
| ArXiv ID | physics/0207113 |
| URL | https://arxiv.org/abs/physics/0207113 |
| DOI | 10.1103/PhysRevE.66.031913 |
Abstract
By using the Jensen-Shannon divergence, genomic DNA can be divided into compositionally distinct domains through a standard recursive segmentation procedure. Each domain, while significantly different from its neighbours, may however share compositional similarity with one or more distant (non-neighbouring) domains. We thus obtain a coarse-grained description of the given DNA string in terms of a smaller set of distinct domain labels. This yields a minimal domain description of a given DNA sequence, significantly reducing its organizational complexity. This procedure gives a new means of evaluating genomic complexity as one examines organisms ranging from bacteria to human. The mosaic organization of DNA sequences could have originated from the insertion of fragments of one genome (the parasite) inside another (the host), and we present numerical experiments that are suggestive of this scenario.
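The recursive segmentation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common length-weighted form of the Jensen-Shannon divergence over single-nucleotide frequencies, and it replaces the statistical significance test with a fixed, hypothetical `threshold`. The function names (`shannon_entropy`, `js_divergence`, `best_cut`, `segment`) are illustrative only. The later step of grouping similar non-neighbouring domains under shared labels is not shown.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two subsequences."""
    n_left, n_right = len(left), len(right)
    n = n_left + n_right
    return (shannon_entropy(left + right)
            - (n_left / n) * shannon_entropy(left)
            - (n_right / n) * shannon_entropy(right))


def best_cut(seq):
    """Return (position, divergence) of the split that maximises the divergence."""
    best_pos, best_d = None, -1.0
    for i in range(1, len(seq)):
        d = js_divergence(seq[:i], seq[i:])
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq, threshold, offset=0):
    """Recursively split seq while the best cut exceeds threshold.

    The fixed threshold is an assumption of this sketch; it stands in for
    a proper significance criterion. Returns sorted cut positions.
    """
    if len(seq) < 2:
        return []
    pos, d = best_cut(seq)
    if d <= threshold:
        return []
    return (segment(seq[:pos], threshold, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, offset + pos))
```

For a toy input such as `"A" * 20 + "G" * 20`, `segment(seq, 0.5)` returns `[20]`, the boundary between the two homogeneous halves. The scan in `best_cut` recomputes entropies at every position, so it is quadratic in sequence length. A version for real genomes would keep running counts instead.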
{
"annotation_id": "467a2346-6d84-45d1-becc-692c5f5265d3",
"date_created": "2026-03-02T18:00:39.819000Z",
"date_modified": "2026-03-02T18:00:39.819000Z",
"file_hash": "9752a2b5b1942cc799cc20b4ca71f4d5434cffffdb9188cce1bd82890d705952",
"private": false,
"record": {
"abstract": "By using the Jensen-Shannon divergence, genomic DNA can be divided into\ncompositionally distinct domains through a standard recursive segmentation\nprocedure. Each domain, while significantly different from its neighbours, may\nhowever share compositional similarity with one or more distant\n(non--neighbouring) domains. We thus obtain a coarse--grained description of\nthe given DNA string in terms of a smaller set of distinct domain labels. This\nyields a minimal domain description of a given DNA sequence, significantly\nreducing its organizational complexity. This procedure gives a new means of\nevaluating genomic complexity as one examines organisms ranging from bacteria\nto human. The mosaic organization of DNA sequences could have originated from\nthe insertion of fragments of one genome (the parasite) inside another (the\nhost), and we present numerical experiments that are suggestive of this\nscenario.",
"arxiv_id": "physics/0207113",
"authors": [
"Rajeev K. Azad",
"J. Subba Rao",
"Wentian Li",
"Ramakrishna Ramaswamy"
],
"categories": [
"physics.bio-ph",
"q-bio.GN"
],
"doi": "10.1103/PhysRevE.66.031913",
"title": "Simplifying the mosaic description of DNA sequences",
"url": "https://arxiv.org/abs/physics/0207113"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "b93d9593-7f0b-44f6-a9b5-401420647b18",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}