dorsal/arxiv
Four basic symmetry types in the universal 7-cluster structure of 143 complete bacterial genomic sequences
| Authors | A. N. Gorban, T. G. Popova, A. Yu. Zinovyev |
|---|---|
| Categories | q-bio.GN, math.ST, stat.TH |
| ArXiv ID | q-bio/0410033 |
| URL | https://arxiv.org/abs/q-bio/0410033 |
| Journal | In Silico Biol. 5, 0025 (2005) http://www.bioinfo.de/isb/2005/05/0025/ |
Abstract
Coding information is the main source of heterogeneity (non-randomness) in the sequences of bacterial genomes. This information can be naturally modeled by analysing cluster structures in the "in-phase" triplet distributions of relatively short genomic fragments (200-400bp). We found a universal 7-cluster structure in bacterial genomic sequences and explained its properties. We show that codon usage of bacterial genomes is a multi-linear function of their genomic G+C-content with high accuracy. Based on the analysis of 143 completely sequenced bacterial genomes available in Genbank in August 2004, we show that four "pure" types of the 7-cluster structure are observed. All 143 animated 3D cluster scatter plots are collected in a database that is available on our website (http://www.ihes.fr/~zinovyev/7clusters). The finding can be readily introduced into any software for gene prediction, sequence alignment or bacterial genome classification.
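To make the "in-phase" triplet distributions of short fragments more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "in-phase" means counting non-overlapping triplets read from the first position of each fragment, and that fragments are taken with a sliding window. The names `triplet_frequencies` and `window_vectors`, the 300 bp width (chosen from the 200-400bp range in the abstract), and the step size are illustrative assumptions.

```python
from itertools import product

# All 64 triplets in a fixed order, so each fragment maps to a 64-dimensional vector.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(fragment):
    """Frequencies of non-overlapping triplets read from position 0 of the fragment."""
    counts = [0] * 64
    for i in range(0, len(fragment) - 2, 3):
        idx = INDEX.get(fragment[i:i + 3])
        if idx is not None:  # skip triplets containing ambiguous bases such as N
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 64


def window_vectors(sequence, width=300, step=30):
    """One 64-dimensional frequency vector per sliding window (width and step are assumed values)."""
    sequence = sequence.upper()
    return [
        triplet_frequencies(sequence[start:start + width])
        for start in range(0, len(sequence) - width + 1, step)
    ]


if __name__ == "__main__":
    # Toy periodic sequence: windows starting in different phases see different triplets.
    toy = "ATG" * 200
    vectors = window_vectors(toy, width=300, step=1)
    print(len(vectors), "windows of dimension", len(vectors[0]))
```

Vectors produced this way could then be passed to any clustering or projection method; the abstract does not specify which one the authors used, so that step is left out here.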
{
"annotation_id": "a85706fc-7026-40b7-b014-c95764b6b03d",
"date_created": "2026-03-02T18:01:31.083000Z",
"date_modified": "2026-03-02T18:01:31.083000Z",
"file_hash": "e45a11c391b1ddc193e37a180217e319daa8ae6be6081c91a45aea2dd4665fe2",
"private": false,
"record": {
"abstract": "Coding information is the main source of heterogeneity (non-randomness) in\nthe sequences of bacterial genomes. This information can be naturally modeled\nby analysing cluster structures in the \"in-phase\" triplet distributions of\nrelatively short genomic fragments (200-400bp). We found a universal 7-cluster\nstructure in bacterial genomic sequences and explained its properties. We show\nthat codon usage of bacterial genomes is a multi-linear function of their\ngenomic G+C-content with high accuracy. Based on the analysis of 143 completely\nsequenced bacterial genomes available in Genbank in August 2004, we show that\nthere are four \"pure\" types of the 7-cluster structure observed. All 143\ncluster animated 3D-scatters are collected in a database and is made available\non our web-site: http://www.ihes.fr/~zinovyev/7clusters The finding can be\nreadily introduced into any software for gene prediction, sequence alignment or\nbacterial genomes classification.",
"arxiv_id": "q-bio/0410033",
"authors": [
"A. N. Gorban",
"T. G. Popova",
"A. Yu. Zinovyev"
],
"categories": [
"q-bio.GN",
"math.ST",
"stat.TH"
],
"journal_ref": "In Silico Biol. 5, 0025 (2005)\n http://www.bioinfo.de/isb/2005/05/0025/",
"title": "Four basic symmetry types in the universal 7-cluster structure of 143 complete bacterial genomic sequences",
"url": "https://arxiv.org/abs/q-bio/0410033"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "64d458cc-d835-4a4b-a3be-ba1b381d9ba0",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}