dorsal/arxiv
Statistical analysis of simple repeats in the human genome
| Authors | Francesco Piazza, Pietro Lio |
|---|---|
| Categories | q-bio.GN |
| ArXiv ID | q-bio/0502009 |
| URL | https://arxiv.org/abs/q-bio/0502009 |
| DOI | 10.1016/j.physa.2004.08.038 |
| Journal | Physica A 347 (2005) 472-488 |
Abstract
The human genome contains repetitive DNA at different levels of sequence length, number and dispersion. Highly repetitive DNA is particularly rich in homo- and di-nucleotide repeats, while middle repetitive DNA is rich in families of interspersed, mobile elements hundreds of base pairs (bp) long, among which are the Alu families. A link between homo- and di-polymeric tracts and mobile elements has recently been highlighted. In particular, the mobility of Alu repeats, which form 10% of the human genome, has been correlated with the length of poly(A) tracts located at one end of the Alu. These tracts have a rigid and non-bendable structure and have an inhibitory effect on nucleosomes, which normally compact the DNA. We performed a statistical analysis of the genome-wide distribution of lengths and inter-tract separations of poly(X) and poly(XY) tracts in the human genome. Our study shows that in humans the length distributions of these sequences reflect the dynamics of their expansion and DNA replication. By means of general tools from linguistics, we show that the latter play the role of highly significant content-bearing terms in the DNA text. Furthermore, we find that such tracts are positioned in a non-random fashion, with an apparent periodicity of 150 bases. This allows us to extend the link between repetitive, highly mobile elements such as Alus and low-complexity words in human DNA. More precisely, we show that Alus are sources of poly(X) tracts, which in turn affect in a subtle way the combination and diversification of gene expression and the fixation of multigene families.
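The abstract describes measuring the lengths of poly(X) and poly(XY) tracts and the separations between consecutive tracts. The following is a minimal Python sketch of that kind of measurement, not the authors' method. The function names, the `min_repeats` threshold, the toy sequence, and the definition of separation (bases between the end of one tract and the start of the next) are all assumptions made for illustration.

```python
import re
from collections import Counter


def find_tracts(seq, motif, min_repeats=2):
    """Return (start, length_in_bases) for each maximal tandem run of `motif`.

    `motif` is a single base such as "A" for poly(X) tracts, or a
    dinucleotide such as "AT" for poly(XY) tracts. Both the function name
    and the `min_repeats` cutoff are hypothetical, chosen for this sketch.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq)]


def tract_statistics(seq, motif, min_repeats=2):
    """Return a histogram of tract lengths and the list of inter-tract gaps.

    A gap is counted here as the number of bases between the end of one
    tract and the start of the next (an assumed definition).
    """
    tracts = find_tracts(seq.upper(), motif.upper(), min_repeats)
    length_histogram = Counter(length for _, length in tracts)
    separations = [
        next_start - (start + length)
        for (start, length), (next_start, _) in zip(tracts, tracts[1:])
    ]
    return length_histogram, separations


if __name__ == "__main__":
    # Toy sequence invented for this example; it is not genomic data.
    toy = "AAATCGAAAAAGTAACCAAA"
    histogram, gaps = tract_statistics(toy, "A")
    print(sorted(histogram.items()))
    print(gaps)
```

On a real chromosome one would stream the sequence from a FASTA file and accumulate the same two quantities per motif; the regular expression approach is chosen here only for brevity.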
{
"annotation_id": "35782e39-4bd6-41d3-9dde-cb5d510f6a6d",
"date_created": "2026-03-02T18:01:32.229000Z",
"date_modified": "2026-03-02T18:01:32.229000Z",
"file_hash": "428c207ff4511119ceb42736ec8929a7f64dbb5bb60659ba8d9ce8496f62bb85",
"private": false,
"record": {
"abstract": "The human genome contains repetitive DNA at different level of sequence\nlength, number and dispersion. Highly repetitive DNA is particularly rich in\nhomo-- and di--nucleotide repeats, while middle repetitive DNA is rich of\nfamilies of interspersed, mobile elements hundreds of base pairs (bp) long,\namong which the Alu families. A link between homo- and di-polymeric tracts and\nmobile elements has been recently highlighted. In particular, the mobility of\nAlu repeats, which form 10% of the human genome, has been correlated with the\nlength of poly(A) tracts located at one end of the Alu. These tracts have a\nrigid and non-bendable structure and have an inhibitory effect on nucleosomes,\nwhich normally compact the DNA. We performed a statistical analysis of the\ngenome-wide distribution of lengths and inter--tract separations of poly(X) and\npoly(XY) tracts in the human genome. Our study shows that in humans the length\ndistributions of these sequences reflect the dynamics of their expansion and\nDNA replication. By means of general tools from linguistics, we show that the\nlatter play the role of highly-significant content-bearing terms in the DNA\ntext. Furthermore, we find that such tracts are positioned in a non-random\nfashion, with an apparent periodicity of 150 bases. This allows us to extend\nthe link between repetitive, highly mobile elements such as Alus and\nlow-complexity words in human DNA. More precisely, we show that Alus are\nsources of poly(X) tracts, which in turn affect in a subtle way the combination\nand diversification of gene expression and the fixation of multigene families.",
"arxiv_id": "q-bio/0502009",
"authors": [
"Francesco Piazza",
"Pietro Lio"
],
"categories": [
"q-bio.GN"
],
"doi": "10.1016/j.physa.2004.08.038",
"journal_ref": "Physica A 347 (2005) 472-488",
"title": "Statistical analysis of simple repeats in the human genome",
"url": "https://arxiv.org/abs/q-bio/0502009"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "8b2b6940-a848-4361-9822-19c349a13fea",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}