dorsal/arxiv
Length, Protein-Protein Interactions, and Complexity
| Authors | Taison Tan, Daan Frenkel, Vishal Gupta, Michael W. Deem |
|---|---|
| Categories | q-bio.MN |
| ArXiv ID | q-bio/0411035 |
| URL | https://arxiv.org/abs/q-bio/0411035 |
| DOI | 10.1016/j.physa.2004.11.021 |
Abstract
The evolutionary reason for the increase in gene length from archaea to prokaryotes to eukaryotes observed in large scale genome sequencing efforts has been unclear. We propose here that the increasing complexity of protein-protein interactions has driven the selection of longer proteins, as longer proteins are more able to distinguish among a larger number of distinct interactions due to their greater average surface area. Annotated protein sequences available from the SWISS-PROT database were analyzed for thirteen eukaryotes, eight bacteria, and two archaea species. The number of subcellular locations to which each protein is associated is used as a measure of the number of interactions to which a protein participates. Two databases of yeast protein-protein interactions were used as another measure of the number of interactions to which each *S. cerevisiae* protein participates. Protein length is shown to correlate with both number of subcellular locations to which a protein is associated and number of interactions as measured by yeast two-hybrid experiments. Protein length is also shown to correlate with the probability that the protein is encoded by an essential gene. Interestingly, average protein length and number of subcellular locations are not significantly different between all human proteins and protein targets of known, marketed drugs. Increased protein length appears to be a significant mechanism by which the increasing complexity of protein-protein interaction networks is accommodated within the natural evolution of species. Consideration of protein length may be a valuable tool in drug design, one that predicts different strategies for inhibiting interactions in aberrant and normal pathways.
{
"annotation_id": "f88cca47-5593-4d2c-b370-f2b3846f024d",
"date_created": "2026-03-02T18:01:32.240000Z",
"date_modified": "2026-03-02T18:01:32.240000Z",
"file_hash": "7090bee9fc2b64d7d5fd1f5d415c9346825328754c2a59349d1128783463b4dc",
"private": false,
"record": {
"abstract": "The evolutionary reason for the increase in gene length from archaea to\nprokaryotes to eukaryotes observed in large scale genome sequencing efforts has\nbeen unclear. We propose here that the increasing complexity of protein-protein\ninteractions has driven the selection of longer proteins, as longer proteins\nare more able to distinguish among a larger number of distinct interactions due\nto their greater average surface area. Annotated protein sequences available\nfrom the SWISS-PROT database were analyzed for thirteen eukaryotes, eight\nbacteria, and two archaea species. The number of subcellular locations to which\neach protein is associated is used as a measure of the number of interactions\nto which a protein participates. Two databases of yeast protein-protein\ninteractions were used as another measure of the number of interactions to\nwhich each \\emph{S. cerevisiae} protein participates. Protein length is shown\nto correlate with both number of subcellular locations to which a protein is\nassociated and number of interactions as measured by yeast two-hybrid\nexperiments. Protein length is also shown to correlate with the probability\nthat the protein is encoded by an essential gene. Interestingly, average\nprotein length and number of subcellular locations are not significantly\ndifferent between all human proteins and protein targets of known, marketed\ndrugs. Increased protein length appears to be a significant mechanism by which\nthe increasing complexity of protein-protein interaction networks is\naccommodated within the natural evolution of species. Consideration of protein\nlength may be a valuable tool in drug design, one that predicts different\nstrategies for inhibiting interactions in aberrant and normal pathways.",
"arxiv_id": "q-bio/0411035",
"authors": [
"Taison Tan",
"Daan Frenkel",
"Vishal Gupta",
"Michael W. Deem"
],
"categories": [
"q-bio.MN"
],
"doi": "10.1016/j.physa.2004.11.021",
"title": "Length, Protein-Protein Interactions, and Complexity",
"url": "https://arxiv.org/abs/q-bio/0411035"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "6d6183d4-1ada-4f64-bb26-13ebc838d31d",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}