dorsal/arxiv
Di-nucleotide Entropy as a Measure of Genomic Sequence Functionality
| Authors | Dmitri Parkhomchuk |
|---|---|
| Categories | q-bio.GN |
| ArXiv ID | q-bio/0611059 |
| URL | https://arxiv.org/abs/q-bio/0611059 |
Abstract
Given the vast amount of genomic sequence of mostly unknown functionality, in-silico prediction of functional regions is an important enterprise. Many genome browsers employ GC content, which has been observed to be elevated in gene-rich functional regions. This report shows that the entropy of di- and tri-nucleotide distributions provides a superior measure of genomic sequence functionality, and proposes an explanation of why GC content must be elevated (closer to 50%) in functional regions. Regions with high entropy strongly co-localize with exons and provide genome-wide evidence of purifying selection acting on non-coding regions, such as decreased SNP density. The observations suggest that functional non-coding regions are optimised for mutation load in such a way that transition mutations have less impact on functionality than transversions, leading to a decrease in the transversion-to-transition ratio in functional regions.
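To make the central quantity concrete, here is a minimal Python sketch of a di-nucleotide (or, with `k=3`, tri-nucleotide) Shannon entropy computed over a sequence and over sliding windows. The abstract does not specify the window size, the logarithm base, whether k-mers overlap, or any normalization, so everything below is an assumption: overlapping k-mers, entropy in bits, ambiguous bases skipped, and the hypothetical function names `kmer_entropy`, `gc_content`, and `windowed_entropy`. It should not be read as the author's implementation.

```python
import math
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer distribution of seq.

    Assumption: k-mers overlap and any k-mer containing a base other than
    A, C, G, or T (for example N) is skipped.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    kmers = [m for m in kmers if set(m) <= VALID_BASES]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    # H = sum over k-mers of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of seq."""
    bases = [b for b in seq.upper() if b in VALID_BASES]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def windowed_entropy(seq: str, window: int = 1000, step: int = 500, k: int = 2):
    """Return (start, entropy) pairs for sliding windows along seq.

    The window and step sizes are illustrative defaults, not values
    taken from the paper.
    """
    last_start = max(len(seq) - window, 0)
    return [
        (start, kmer_entropy(seq[start:start + window], k))
        for start in range(0, last_start + 1, step)
    ]
```

Under these assumptions the di-nucleotide entropy ranges from 0 bits (a single repeated di-nucleotide) to 4 bits (all 16 di-nucleotides equally frequent). Reaching the maximum requires equal base frequencies, which is consistent with the abstract's remark that high-entropy regions have GC content closer to 50%.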
{
  "annotation_id": "18eb2123-ca26-4105-a1e7-cf281495d895",
  "date_created": "2026-03-02T18:01:34.732000Z",
  "date_modified": "2026-03-02T18:01:34.732000Z",
  "file_hash": "2ce3715665f53438442b899f15e4c4cd21672734739e875e82aa0a0701d03ac6",
  "private": false,
  "record": {
    "abstract": "Considering vast amounts of genomic sequences of mostly unknown\nfunctionality, in-silico prediction of functional regions is an important\nenterprise. Many genomic browsers employ GC content, which was observed to be\nelevated in gene-rich functional regions. This report shows that the entropy of\ndi- and tri-nucleotides distributions provides a superior measure of genomic\nsequence functionality, and proposes an explanation on why the GC content must\nbe elevated (closer to 50%) in functional regions. Regions with high entropy\nstrongly co-localize with exons and provide genome-wide evidences of purifying\nselection acting on non-coding regions, such as decreased SNPs density. The\nobservations suggest that functional non-coding regions are optimised for\nmutation load in a way, that transition mutations have less impact on\nfunctionality than transversions, leading to the decrease in transversions to\ntransitions ratio in functional regions.",
    "arxiv_id": "q-bio/0611059",
    "authors": [
      "Dmitri Parkhomchuk"
    ],
    "categories": [
      "q-bio.GN"
    ],
    "title": "Di-nucleotide Entropy as a Measure of Genomic Sequence Functionality",
    "url": "https://arxiv.org/abs/q-bio/0611059"
  },
  "schema_id": "dorsal/arxiv",
  "source": {
    "execution_id": "cd572b85-9297-4306-9e68-8e669f3631f5",
    "id": "arXiv Dataset IDs",
    "type": "Model",
    "variant": "snapshot-2026-03-01",
    "version": "0.1.0"
  },
  "user_id": 1000002
}