dorsal/arxiv
The principal eigenvector of contact matrices and hydrophobicity profiles in proteins
| Authors | Ugo Bastolla, Markus Porto, H. Eduardo Roman, Michele Vendruscolo |
|---|---|
| Categories | q-bio.BM |
| ArXiv ID | q-bio/0406003 |
| URL | https://arxiv.org/abs/q-bio/0406003 |
| Journal | Proteins 58, 22-30 (2005) |
Abstract
With the aim to study the relationship between protein sequences and their native structures, we adopt vectorial representations for both sequence and structure. The structural representation is based on the Principal Eigenvector of the fold's contact matrix (PE). As recently shown, the latter encodes sufficient information for reconstructing the whole contact matrix. The sequence is represented through a Hydrophobicity Profile (HP), using a generalized hydrophobicity scale that we obtain from the principal eigenvector of a residue-residue interaction matrix and denote it as interactivity scale. Using this novel scale, we define the optimal HP of a protein fold, and predict, by means of stability arguments, that it is strongly correlated with the PE of the fold's contact matrix. This prediction is confirmed through an evolutionary analysis, which shows that the PE correlates with the HP of each individual sequence adopting the same fold and, even more strongly, with the average HP of this set of sequences. Thus, protein sequences evolve in such a way that their average HP is close to the optimal one, implying that neutral evolution can be viewed as a kind of motion in sequence space around the optimal HP. Our results indicate that the correlation coefficient between N-dimensional vectors constitutes a natural metric in the vectorial space in which we represent both protein sequences and protein structures, which we call Vectorial Protein Space. In this way, we define a unified framework for sequence to sequence, sequence to structure, and structure to structure alignments. We show that the interactivity scale is nearly optimal both for the comparison of sequences with sequences and sequences with structures.
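As a rough illustration of the two vectors the abstract compares, the sketch below computes the principal eigenvector (PE) of a small contact matrix by power iteration, then takes its Pearson correlation with a hydrophobicity profile (HP). The six-residue contact list and the hydrophobicity values are invented for illustration and are not the paper's interactivity scale. The function names and the shifted power iteration are likewise assumptions, not the authors' implementation.

```python
import math


def principal_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Approximate the principal eigenvector of a symmetric 0/1 contact matrix.

    Power iteration is applied to (matrix + identity). The shift leaves the
    eigenvectors unchanged and, for a connected contact graph, makes the top
    eigenvalue strictly dominant even when the graph is bipartite.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # uniform, unit-length start vector
    for _ in range(iterations):
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def contact_matrix(n_residues, contacts):
    """Build a symmetric 0/1 matrix from a list of (i, j) residue contacts."""
    m = [[0] * n_residues for _ in range(n_residues)]
    for i, j in contacts:
        m[i][j] = m[j][i] = 1
    return m


if __name__ == "__main__":
    # Hypothetical toy fold: six residues, contacts chosen by hand.
    toy_contacts = [(0, 3), (0, 4), (0, 5), (1, 4), (1, 5), (2, 5)]
    pe = principal_eigenvector(contact_matrix(6, toy_contacts))

    # Hypothetical per-residue hydrophobicity values (not a published scale).
    toy_hp = [0.9, 0.6, 0.2, 0.3, 0.5, 0.8]

    print("PE:", [round(x, 3) for x in pe])
    print("correlation(PE, HP):", round(pearson(pe, toy_hp), 3))
```

Residues with many contacts, or with contacts to other well-connected residues, receive large PE components. This is why the abstract expects the PE to track a profile that marks buried, hydrophobic positions.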
{
"annotation_id": "4f87fa22-3644-4882-8198-0a0c35c2f1c4",
"date_created": "2026-03-02T18:01:31.025000Z",
"date_modified": "2026-03-02T18:01:31.025000Z",
"file_hash": "4f4021e7455fca13b363ab96d01cf2df437d0516f8fa819bbb64e45800e28292",
"private": false,
"record": {
"abstract": "With the aim to study the relationship between protein sequences and their\nnative structures, we adopt vectorial representations for both sequence and\nstructure. The structural representation is based on the Principal Eigenvector\nof the fold\u0027s contact matrix (PE). As recently shown, the latter encodes\nsufficient information for reconstructing the whole contact matrix. The\nsequence is represented through a Hydrophobicity Profile (HP), using a\ngeneralized hydrophobicity scale that we obtain from the principal eigenvector\nof a residue-residue interaction matrix and denote it as interactivity scale.\nUsing this novel scale, we define the optimal HP of a protein fold, and\npredict, by means of stability arguments, that it is strongly correlated with\nthe PE of the fold\u0027s contact matrix. This prediction is confirmed through an\nevolutionary analysis, which shows that the PE correlates with the HP of each\nindividual sequence adopting the same fold and, even more strongly, with the\naverage HP of this set of sequences. Thus, protein sequences evolve in such a\nway that their average HP is close to the optimal one, implying that neutral\nevolution can be viewed as a kind of motion in sequence space around the\noptimal HP. Our results indicate that the correlation coefficient between\nN-dimensional vectors constitutes a natural metric in the vectorial space in\nwhich we represent both protein sequences and protein structures, which we call\nVectorial Protein Space. In this way, we define a unified framework for\nsequence to sequence, sequence to structure, and structure to structure\nalignments. We show that the interactivity scale is nearly optimal both for the\ncomparison of sequences with sequences and sequences with structures.",
"arxiv_id": "q-bio/0406003",
"authors": [
"Ugo Bastolla",
"Markus Porto",
"H. Eduardo Roman",
"Michele Vendruscolo"
],
"categories": [
"q-bio.BM"
],
"journal_ref": "Proteins 58, 22-30 (2005)",
"title": "The principal eigenvector of contact matrices and hydrophobicity profiles in proteins",
"url": "https://arxiv.org/abs/q-bio/0406003"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "13dfafd3-dbee-4864-b68a-03b83147d5bb",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}