dorsal/arxiv
A methodology for determining amino-acid substitution matrices from set covers
| Authors | A. H. L. Porto, V. C. Barbosa |
|---|---|
| Categories | q-bio.QM |
| ArXiv ID | q-bio/0504007 |
| URL | https://arxiv.org/abs/q-bio/0504007 |
| DOI | 10.1007/11732242_13 |
| Journal | Lecture Notes in Computer Science 3907 (2006), 138-148 |
Abstract
We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration.
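The pipeline the abstract describes (set cover, exchangeability graph, edge weights, weighted distances, matrix elements) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy cover `COVER`, the rule that two residues are adjacent when they share at least one set, the edge weight `1 / (number of shared sets)`, and the linear map `max_score - slope * distance` with a `floor` for unreachable pairs. The paper treats the functional forms and gap costs as quantities to be optimized by a genetic algorithm against BAliBASE; that optimization step is not shown here.

```python
import heapq
from itertools import combinations

# Hypothetical toy set cover over seven residues (not the cover used in the paper).
COVER = [
    {"I", "L", "V"},
    {"L", "V", "M"},
    {"D", "E"},
    {"E", "K"},
]


def build_graph(cover):
    """Undirected graph: residues are adjacent if they share a set in the cover.

    Assumed edge weight: 1 / (number of sets containing both residues),
    so residues that co-occur more often are closer.
    """
    shared = {}
    for group in cover:
        for a, b in combinations(sorted(group), 2):
            shared[(a, b)] = shared.get((a, b), 0) + 1
    graph = {residue: {} for group in cover for residue in group}
    for (a, b), count in shared.items():
        weight = 1.0 / count
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def distances_from(graph, source):
    """Weighted shortest-path distances from source (Dijkstra's algorithm)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def substitution_matrix(cover, max_score=4.0, slope=3.0, floor=-4.0):
    """Map weighted distances to scores with an assumed linear form.

    Pairs with no connecting path receive the floor score.
    """
    graph = build_graph(cover)
    matrix = {}
    for a in graph:
        dist = distances_from(graph, a)
        for b in graph:
            d = dist.get(b)
            score = floor if d is None else max(floor, max_score - slope * d)
            matrix[(a, b)] = score
    return matrix


matrix = substitution_matrix(COVER)
```

In this sketch `max_score`, `slope`, and `floor` play the role of the tunable parameters: an optimizer would vary them (together with gap costs) and score each resulting matrix by how well alignments produced with it agree with a reference alignment set.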
{
"annotation_id": "3cd8e25e-1350-4d70-8a62-d39caaac3034",
"date_created": "2026-03-02T18:01:32.164000Z",
"date_modified": "2026-03-02T18:01:32.164000Z",
"file_hash": "9f362550c38d1df72ca7af082dc836b920e6715fdaf3832833a79a3efb3af6c0",
"private": false,
"record": {
"abstract": "We introduce a new methodology for the determination of amino-acid\nsubstitution matrices for use in the alignment of proteins. The new methodology\nis based on a pre-existing set cover on the set of residues and on the\nundirected graph that describes residue exchangeability given the set cover.\nFor fixed functional forms indicating how to obtain edge weights from the set\ncover and, after that, substitution-matrix elements from weighted distances on\nthe graph, the resulting substitution matrix can be checked for performance\nagainst some known set of reference alignments and for given gap costs. Finding\nthe appropriate functional forms and gap costs can then be formulated as an\noptimization problem that seeks to maximize the performance of the substitution\nmatrix on the reference alignment set. We give computational results on the\nBAliBASE suite using a genetic algorithm for optimization. Our results indicate\nthat it is possible to obtain substitution matrices whose performance is either\ncomparable to or surpasses that of several others, depending on the particular\nscenario under consideration.",
"arxiv_id": "q-bio/0504007",
"authors": [
"A. H. L. Porto",
"V. C. Barbosa"
],
"categories": [
"q-bio.QM"
],
"doi": "10.1007/11732242_13",
"journal_ref": "Lecture Notes in Computer Science 3907 (2006), 138-148",
"title": "A methodology for determining amino-acid substitution matrices from set covers",
"url": "https://arxiv.org/abs/q-bio/0504007"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "e00fd6ba-8fad-4063-b8d8-959e7dc6ad9e",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}