dorsal/arxiv
Estimation of Amino Acid Residue Substitution Rates at Local Spatial Regions and Application in Protein Function Inference: A Bayesian Monte Carlo Approach
| Authors | Yan Y. Tseng, Jie Liang |
|---|---|
| Categories | q-bio.BM |
| ArXiv ID | q-bio/0601019 |
| URL | https://arxiv.org/abs/q-bio/0601019 |
| DOI | 10.1093/molbev/msj048 |
| Journal | Mol Biol Evol. 2006 Feb;23(2):421-36. Epub 2005 Oct 26 |
Abstract
The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous-time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent-exposed surfaces. In addition, residues on the binding surfaces have substitution rates very different from those of residues on the rest of the protein. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.
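To illustrate the continuous-time Markov model the abstract refers to, the sketch below shows how a substitution rate matrix Q gives transition probabilities over a time interval t through P(t) = exp(Qt). It is not the authors' implementation: the 3-state matrix `Q`, the function name `transition_matrix`, and its `squarings` and `terms` parameters are assumptions made for illustration, and a real amino acid model would use a 20 x 20 rate matrix.

```python
import numpy as np


def transition_matrix(Q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) with a truncated Taylor series
    combined with scaling and squaring."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    A = Q * t / (2 ** squarings)   # scale down so the series converges fast
    P = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms + 1):
        term = term @ A / k        # accumulates A^k / k!
        P = P + term
    for _ in range(squarings):
        P = P @ P                  # undo the scaling: exp(A)^(2^squarings)
    return P


# Hypothetical 3-state rate matrix (assumption, not data from the paper).
# Off-diagonal entries are substitution rates; each row sums to zero.
Q = np.array([
    [-0.3, 0.2, 0.1],
    [0.2, -0.5, 0.3],
    [0.1, 0.3, -0.4],
])

# Entry P[i, j] is the probability that state i has become state j after t.
P = transition_matrix(Q, t=0.5)
print(P)
```

Each row of `P` is a probability distribution over end states. A Bayesian Markov chain Monte Carlo estimator, as described in the abstract, would sample candidate rate matrices and score them through likelihoods built from such transition probabilities. That step is not shown here.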
{
"annotation_id": "13f606a2-ba82-4c3c-baed-a81f4d879257",
"date_created": "2026-03-02T18:01:35.287000Z",
"date_modified": "2026-03-02T18:01:35.287000Z",
"file_hash": "53cf20f3111aad0e2d50c9ec31350267eb419a76647a70af4b5ef0c8f41eeae0",
"private": false,
"record": {
"abstract": "The amino acid sequences of proteins provide rich information for inferring\ndistant phylogenetic relationships and for predicting protein functions.\nEstimating the rate matrix of residue substitutions from amino acid sequences\nis also important because the rate matrix can be used to develop scoring\nmatrices for sequence alignment. Here we use a continuous time Markov process\nto model the substitution rates of residues and develop a Bayesian Markov chain\nMonte Carlo method for rate estimation. We validate our method using simulated\nartificial protein sequences. Because different local regions such as binding\nsurfaces and the protein interior core experience different selection pressures\ndue to functional or stability constraints, we use our method to estimate the\nsubstitution rates of local regions. Our results show that the substitution\nrates are very different for residues in the buried core and residues on the\nsolvent exposed surfaces. In addition, the rest of the proteins on the binding\nsurfaces also have very different substitution rates from residues. Based on\nthese findings, we further develop a method for protein function prediction by\nsurface matching using scoring matrices derived from estimated substitution\nrates for residues located on the binding surfaces. We show with examples that\nour method is effective in identifying functionally related proteins that have\noverall low sequence identity, a task known to be very challenging.",
"arxiv_id": "q-bio/0601019",
"authors": [
"Yan Y. Tseng",
"Jie Liang"
],
"categories": [
"q-bio.BM"
],
"doi": "10.1093/molbev/msj048",
"journal_ref": "Mol Biol Evol. 2006 Feb;23(2):421-36. Epub 2005 Oct 26",
"title": "Estimation of Amino Acid Residue Substitution Rates at Local Spatial Regions and Application in Protein Function Inference: A Bayesian Monte Carlo Approach",
"url": "https://arxiv.org/abs/q-bio/0601019"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "50e8eb12-76e8-4034-bdad-04fee214f0b3",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}