dorsal/arxiv
Protein secondary structure: Entropy, correlations and prediction
| Authors | Gavin E. Crooks, Steven E. Brenner |
|---|---|
| Categories | q-bio.BM |
| ArXiv ID | q-bio/0310034 |
| URL | https://arxiv.org/abs/q-bio/0310034 |
| DOI | 10.1093/bioinformatics/bth132 |
| Journal | Bioinformatics 20:1603-1611 (2004) |
Abstract
Is protein secondary structure primarily determined by local interactions between residues closely spaced along the amino acid backbone, or by non-local tertiary interactions? To answer this question we have measured the entropy densities of primary structure and secondary structure sequences, and the local inter-sequence mutual information density. We find that the important inter-sequence interactions are short ranged, that correlations between neighboring amino acids are essentially uninformative, and that only 1/4 of the total information needed to determine the secondary structure is available from local inter-sequence correlations. Since the remaining information must come from non-local interactions, this observation supports the view that the majority of most proteins fold via a cooperative process where secondary and tertiary structure form concurrently. To provide a more direct comparison to existing secondary structure prediction methods, we construct a simple hidden Markov model (HMM) of the sequences. This HMM achieves a prediction accuracy comparable to other single sequence secondary structure prediction algorithms, and can extract almost all of the inter-sequence mutual information. This suggests that these algorithms are almost optimal, and that we should not expect a dramatic improvement in prediction accuracy. However, local correlations between secondary and primary structure are probably of under-appreciated importance in many tertiary structure prediction methods, such as threading.
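The quantities named in the abstract (entropy of the secondary structure sequence, and mutual information between primary and secondary structure) can be illustrated with a minimal sketch. This is not the authors' method: the paper measures entropy densities and local mutual information over sequence context, whereas the example below only computes a simple plug-in estimate at a single aligned position. The function names and the two toy sequences are hypothetical and made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Plug-in Shannon entropy (bits per symbol) of a sequence of labels."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Toy aligned sequences (hypothetical data, NOT taken from the paper):
# one amino acid letter and one secondary structure label per position
# (H = helix, E = strand, C = coil).
residues = "AAEELLKKGGPPVVIIAAEE"
structure = "HHHCHHHHCCCEEEEEHHCC"

pairs = list(zip(residues, structure))
h_ss = entropy(structure)
mi = mutual_information(pairs)

# Fraction of the secondary structure entropy accounted for by the residue
# at the same position, in this toy data set only.
fraction = mi / h_ss
print(f"H(SS) = {h_ss:.3f} bits, I(AA;SS) = {mi:.3f} bits, fraction = {fraction:.2f}")
```

The ratio of mutual information to entropy is the same kind of quantity as the "1/4 of the total information" figure in the abstract, but the number printed here reflects only the toy data. Plug-in estimates from such short sequences are also biased upward, so a real analysis would need a large data set and a bias correction.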
{
"annotation_id": "d50e4cc8-25ff-476c-96c2-185ce86f3ea9",
"date_created": "2026-03-02T18:01:28.831000Z",
"date_modified": "2026-03-02T18:01:28.831000Z",
"file_hash": "63a48913c2e2cfded7dca5c3c7fe7767b295c6f29f9610f187347c24923d7313",
"private": false,
"record": {
"abstract": "Is protein secondary structure primarily determined by local interactions\nbetween residues closely spaced along the amino acid backbone, or by non-local\ntertiary interactions? To answer this question we have measured the entropy\ndensities of primary structure and secondary structure sequences, and the local\ninter-sequence mutual information density. We find that the important\ninter-sequence interactions are short ranged, that correlations between\nneighboring amino acids are essentially uninformative, and that only 1/4 of the\ntotal information needed to determine the secondary structure is available from\nlocal inter-sequence correlations. Since the remaining information must come\nfrom non-local interactions, this observation supports the view that the\nmajority of most proteins fold via a cooperative process where secondary and\ntertiary structure form concurrently. To provide a more direct comparison to\nexisting secondary structure prediction methods, we construct a simple hidden\nMarkov model (HMM) of the sequences. This HMM achieves a prediction accuracy\ncomparable to other single sequence secondary structure prediction algorithms,\nand can extract almost all of the inter-sequence mutual information. This\nsuggests that these algorithms are almost optimal, and that we should not\nexpect a dramatic improvement in prediction accuracy. However, local\ncorrelations between secondary and primary structure are probably of\nunder-appreciated importance in many tertiary structure prediction methods,\nsuch as threading.",
"arxiv_id": "q-bio/0310034",
"authors": [
"Gavin E. Crooks",
"Steven E. Brenner"
],
"categories": [
"q-bio.BM"
],
"doi": "10.1093/bioinformatics/bth132",
"journal_ref": "Bioinformatics 20:1603-1611 (2004)",
"title": "Protein secondary structure: Entropy, correlations and prediction",
"url": "https://arxiv.org/abs/q-bio/0310034"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "92fd8ae5-6197-401f-97c7-724ab21644d3",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}