dorsal/arxiv
Pairwise alignment incorporating dipeptide covariation
| Authors | Gavin E. Crooks, Richard E. Green, Steven E. Brenner |
|---|---|
| Categories | q-bio.BM, q-bio.PE |
| ArXiv ID | q-bio/0502020 |
| URL | https://arxiv.org/abs/q-bio/0502020 |
| DOI | 10.1093/bioinformatics/bti616 |
| Journal | Bioinformatics 21 3704-3710 (2005) |
Abstract
Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrixes that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies.

Results: Our analysis indicates that local correlations between substitutions are not strong on the average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.
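The independence assumption discussed in the abstract can be illustrated with a small sketch. This is not the authors' algorithm (which handles gapped alignment and uses matrices estimated from data); it is a toy, ungapped scorer written under stated assumptions. The function name `ungapped_score`, the `single` and `correction` tables, and every score value are hypothetical and chosen only for illustration. With no correction table, the score is the usual sum of per-column substitution scores; a dipeptide correction adds one extra term per pair of adjacent columns.

```python
def ungapped_score(seq_a, seq_b, single, correction=None):
    """Score an ungapped alignment of two equal-length sequences.

    single[(a, b)] is a per-column substitution score (independent-site model).
    correction[(a1a2, b1b2)] is a hypothetical extra term for two adjacent
    aligned columns; missing entries count as 0, which recovers the
    independent-site score.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped scoring needs equal-length sequences")
    correction = correction or {}

    # Standard model: columns contribute independently.
    total = sum(single[(a, b)] for a, b in zip(seq_a, seq_b))

    # Dipeptide extension: one correction per pair of neighboring columns.
    for i in range(len(seq_a) - 1):
        key = (seq_a[i:i + 2], seq_b[i:i + 2])
        total += correction.get(key, 0.0)
    return total


# Toy scores, invented for this example (not taken from any real matrix).
single = {("A", "A"): 4.0, ("L", "V"): 1.0, ("G", "G"): 6.0}
correction = {("AL", "AV"): 0.5}

independent = ungapped_score("ALG", "AVG", single)
with_dipeptides = ungapped_score("ALG", "AVG", single, correction)
```

In this toy case the two scores differ only by the single correction term for the aligned dipeptides `AL` and `AV`. The abstract's finding that local correlations are weak on average corresponds to such correction terms being small.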
{
"annotation_id": "bbdb5fff-4796-49f9-9ba5-133d1968efd2",
"date_created": "2026-03-02T18:01:31.538000Z",
"date_modified": "2026-03-02T18:01:31.538000Z",
"file_hash": "80e634c943fd3d10d7a1e53ef5ca7788294c6fc18325d2ae7805d600b0e97b43",
"private": false,
"record": {
"abstract": "Motivation: Standard algorithms for pairwise protein sequence alignment make\nthe simplifying assumption that amino acid substitutions at neighboring sites\nare uncorrelated. This assumption allows implementation of fast algorithms for\npairwise sequence alignment, but it ignores information that could conceivably\nincrease the power of remote homolog detection. We examine the validity of this\nassumption by constructing extended substitution matrixes that encapsulate the\nobserved correlations between neighboring sites, by developing an efficient and\nrigorous algorithm for pairwise protein sequence alignment that incorporates\nthese local substitution correlations, and by assessing the ability of this\nalgorithm to detect remote homologies. Results: Our analysis indicates that\nlocal correlations between substitutions are not strong on the average.\nFurthermore, incorporating local substitution correlations into pairwise\nalignment did not lead to a statistically significant improvement in remote\nhomology detection. Therefore, the standard assumption that individual residues\nwithin protein sequences evolve independently of neighboring positions appears\nto be an efficient and appropriate approximation.",
"arxiv_id": "q-bio/0502020",
"authors": [
"Gavin E. Crooks",
"Richard E. Green",
"Steven E. Brenner"
],
"categories": [
"q-bio.BM",
"q-bio.PE"
],
"doi": "10.1093/bioinformatics/bti616",
"journal_ref": "Bioinformatics 21 3704-3710 (2005)",
"title": "Pairwise alignment incorporating dipeptide covariation",
"url": "https://arxiv.org/abs/q-bio/0502020"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "2930b071-dd3b-4e24-8f90-997e3bd66c7c",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}