dorsal/arxiv
Compositional representation of protein sequences and the number of Eulerian loops
| Authors | Bailin Hao, Huimin Xie, Shuyu Zhang |
|---|---|
| Categories | physics.bio-ph, q-bio |
| ArXiv ID | physics/0103028 |
| URL | https://arxiv.org/abs/physics/0103028 |
Abstract
An amino acid sequence of a protein may be decomposed into consecutive overlapping strings of length K. How unique is the converse, i.e., reconstruction of amino acid sequences using the set of K-strings obtained in the decomposition? This problem may be transformed into the problem of counting the number of Eulerian loops in an Euler graph, though the well-known formula must be modified. By exhaustive enumeration and by using the modified formula we show that the reconstruction is unique at K equal to or greater than 5 for an overwhelming majority of the proteins in the PDB.seq database. The corresponding Euler graphs provide a means to study the structure of repeated segments in protein sequences.
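The reduction described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. Nodes are (K-1)-strings, and each K-string is an edge from its (K-1)-prefix to its (K-1)-suffix. One extra edge from the last (K-1)-string back to the first closes the path into a loop, and reconstructions must begin with the original first (K-1)-string. The "modified formula" is assumed here to be the BEST theorem (number of arborescences times the product of (out-degree - 1)!) divided by m! for every K-string that occurs m times. The paper's exact construction and formula may differ. All function names are hypothetical, and the brute-force enumeration is exponential in the worst case, so it suits only short sequences.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def kstrings(seq: str, k: int) -> Counter:
    """Multiset of the len(seq) - k + 1 overlapping K-strings of seq."""
    if not 2 <= k <= len(seq):
        raise ValueError("need 2 <= k <= len(seq)")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def enumerate_reconstructions(seq: str, k: int) -> set[str]:
    """Exhaustive enumeration by backtracking: all strings that start with
    seq's first (K-1)-string and use exactly the same K-string multiset."""
    remaining = kstrings(seq, k)
    n_edges = sum(remaining.values())
    found: set[str] = set()

    def extend(current: str, used: int) -> None:
        if used == n_edges:
            found.add(current)
            return
        suffix = current[-(k - 1):]
        for s in list(remaining):
            if remaining[s] > 0 and s[:-1] == suffix:
                remaining[s] -= 1          # use one copy of this K-string
                extend(current + s[-1], used + 1)
                remaining[s] += 1          # backtrack

    extend(seq[:k - 1], 0)
    return found


def _det(m: list[list[Fraction]]) -> Fraction:
    """Determinant by Gaussian elimination over exact rationals."""
    m = [row[:] for row in m]
    n = len(m)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n):
                m[r][j] -= f * m[c][j]
    return det


def count_by_best(seq: str, k: int) -> int:
    """Count reconstructions with the BEST theorem, corrected for repeated K-strings."""
    edges = kstrings(seq, k)
    first, last = seq[:k - 1], seq[-(k - 1):]
    nodes = sorted({s[:-1] for s in edges} | {s[1:] for s in edges})
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]   # adj[i][j] = number of edges i -> j
    for s, mult in edges.items():
        adj[idx[s[:-1]]][idx[s[1:]]] += mult
    adj[idx[last]][idx[first]] += 1     # extra edge closing the path into a loop
    out_deg = [sum(row) for row in adj]
    # Matrix-tree theorem: arborescences oriented toward the root equal the
    # determinant of L = D_out - A with the root's row and column deleted.
    root = idx[first]
    keep = [i for i in range(n) if i != root]
    lap = [[Fraction((out_deg[i] if i == j else 0) - adj[i][j]) for j in keep]
           for i in keep]
    labelled_loops = _det(lap) * prod(factorial(d - 1) for d in out_deg)
    # Copies of the same K-string are indistinguishable: divide by m! for each.
    return int(labelled_loops / prod(factorial(m) for m in edges.values()))


print(sorted(enumerate_reconstructions("GAGSGA", 2)))  # ['GAGSGA', 'GSGAGA']
print(count_by_best("GAGSGA", 2))                      # 2
print(count_by_best("GAGSGA", 3))                      # 1
```

Both counts agree on the toy sequence `GAGSGA`. With K = 2 the repeated 2-string `GA` permits two orderings. With K = 3 every 3-string is distinct and the reconstruction is unique, which mirrors in miniature the abstract's observation that longer K-strings make reconstruction unique.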
{
"annotation_id": "257d2aec-4f49-4d96-aad2-5584aa4c9eff",
"date_created": "2026-03-02T18:00:35.514000Z",
"date_modified": "2026-03-02T18:00:35.514000Z",
"file_hash": "b7cfe4b76ce372c9684fc90967d127e251029710f71470b727defb654ebb8450",
"private": false,
"record": {
"abstract": "An amino acid sequence of a protein may be decomposed into consecutive\noverlapping strings of length K. How unique is the converse, i.e.,\nreconstruction of amino acid sequences using the set of K-strings obtained in\nthe decomposition? This problem may be transformed into the problem of counting\nthe number of Eulerian loops in an Euler graph, though the well-known formula\nmust be modified. By exhaustive enumeration and by using the modified formula\nwe show that the reconstruction is unique at K equal or greater than 5 for an\noverwhelming majority of the proteins in the PDB.seq database. The\ncorresponding Euler graphs provide a means to study the structure of repeated\nsegments in protein sequences.",
"arxiv_id": "physics/0103028",
"authors": [
"Bailin Hao",
"Huimin Xie",
"Shuyu Zhang"
],
"categories": [
"physics.bio-ph",
"q-bio"
],
"title": "Compositional representation of protein sequences and the number of Eulerian loops",
"url": "https://arxiv.org/abs/physics/0103028"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "a3da43b0-d884-49af-bfa8-8ee85d49438c",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}