dorsal/arxiv
Simplified amino acid alphabets based on deviation of conditional probability from random background
| Authors | Xin Liu, Di Liu, Ji Qi, Wei-Mou Zheng |
|---|---|
| Categories | physics.bio-ph, physics.data-an, q-bio |
| ArXiv ID | physics/0211031 |
| URL | https://arxiv.org/abs/physics/0211031 |
| DOI | 10.1103/PhysRevE.66.021906 |
| Journal | PRE 66, 021906 (2002) |
Abstract
The primitive data for deducing the Miyazawa-Jernigan contact energy or BLOSUM score matrix consists of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of such conditional probability from random background, a scheme for reduction of amino acid alphabet is proposed. It is observed that evident discrepancy exists between reduced alphabets obtained from raw data of the Miyazawa-Jernigan's and BLOSUM's residue pair counts. Taking homologous sequence database SCOP40 as a test set, we detect homology with the obtained coarse-grained substitution matrices. It is verified that the reduced alphabets obtained well preserve information contained in the original 20-letter alphabet.
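As a rough illustration of the quantities the abstract names, the following Python sketch turns a matrix of pair frequency counts into one conditional distribution per letter and compares each with a background distribution. The log-ratio used as the deviation measure, the function names, and the toy 3-letter counts are assumptions made for illustration; the abstract does not specify the paper's actual measure or its grouping procedure.

```python
import math


def conditional_distributions(pair_counts):
    """Return rows P(j | i) estimated from a square matrix of pair counts."""
    return [[count / sum(row) for count in row] for row in pair_counts]


def background(pair_counts):
    """Return marginal letter frequencies q(j).

    Assumes the count matrix is symmetric, so row sums equal column sums.
    """
    total = sum(sum(row) for row in pair_counts)
    return [sum(row) / total for row in pair_counts]


def log_ratio_deviation(pair_counts):
    """Assumed deviation measure: log(P(j | i) / q(j)) for every pair (i, j).

    Requires strictly positive counts so that the logarithm is defined.
    """
    cond = conditional_distributions(pair_counts)
    q = background(pair_counts)
    return [[math.log(p / q[j]) for j, p in enumerate(row)] for row in cond]


# Hypothetical counts for a 3-letter alphabet (not data from the paper).
toy_counts = [
    [8, 1, 1],
    [1, 6, 3],
    [1, 3, 6],
]

deviation = log_ratio_deviation(toy_counts)
for letter, row in enumerate(deviation):
    print(letter, [round(value, 3) for value in row])
```

Under this assumed measure, letters whose deviation rows look alike (here the second and third) would be natural candidates for merging into one group of a reduced alphabet.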
{
"annotation_id": "54230639-d9e1-40f6-b6fd-b328291fcdc3",
"date_created": "2026-03-02T18:00:42.611000Z",
"date_modified": "2026-03-02T18:00:42.611000Z",
"file_hash": "be5d158d6454484868fadc79cc904775b217f8f9bce853634213a909f49ceea2",
"private": false,
"record": {
"abstract": "The primitive data for deducing the Miyazawa-Jernigan contact energy or\nBLOSUM score matrix consists of pair frequency counts. Each amino acid\ncorresponds to a conditional probability distribution. Based on the deviation\nof such conditional probability from random background, a scheme for reduction\nof amino acid alphabet is proposed. It is observed that evident discrepancy\nexists between reduced alphabets obtained from raw data of the\nMiyazawa-Jernigan\u0027s and BLOSUM\u0027s residue pair counts. Taking homologous\nsequence database SCOP40 as a test set, we detect homology with the obtained\ncoarse-grained substitution matrices. It is verified that the reduced alphabets\nobtained well preserve information contained in the original 20-letter\nalphabet.",
"arxiv_id": "physics/0211031",
"authors": [
"Xin Liu",
"Di Liu",
"Ji Qi",
"Wei-Mou Zheng"
],
"categories": [
"physics.bio-ph",
"physics.data-an",
"q-bio"
],
"doi": "10.1103/PhysRevE.66.021906",
"journal_ref": "PRE 66, 021906 (2002)",
"title": "Simplified amino acid alphabets based on deviation of conditional probability from random background",
"url": "https://arxiv.org/abs/physics/0211031"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "0885a88f-b786-4cf4-bd02-81fdcf39ad5d",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}