dorsal/arxiv
MAVID: Constrained ancestral alignment of multiple sequences
| Authors | Nicolas Bray, Lior Pachter |
|---|---|
| Categories | q-bio.GN |
| ArXiv ID | q-bio/0311018 |
| URL | https://arxiv.org/abs/q-bio/0311018 |
Abstract
We describe a new global multiple alignment program capable of aligning a large number of genomic regions. Our progressive alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which is able to accurately align multiple genomic regions up to megabases long. MAVID is able to effectively align divergent sequences, as well as incomplete, unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8 Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments: an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse, and rat genomes.
{
  "annotation_id": "cfa7ccf1-e711-46e8-8719-62fc275c7104",
  "date_created": "2026-03-02T18:01:28.608000Z",
  "date_modified": "2026-03-02T18:01:28.608000Z",
  "file_hash": "fa6e70e6522d92bf65e09fa51f6781d2e87102ce9ccf1950c0a1df6626059419",
  "private": false,
  "record": {
    "abstract": "We describe a new global multiple alignment program capable of aligning a\nlarge number of genomic regions. Our progressive alignment approach\nincorporates the following ideas: maximum-likelihood inference of ancestral\nsequences, automatic guide-tree construction, protein based anchoring of\nab-initio gene predictions, and constraints derived from a global homology map\nof the sequences. We have implemented these ideas in the MAVID program, which\nis able to accurately align multiple genomic regions up to megabases long.\nMAVID is able to effectively align divergent sequences, as well as incomplete\nunfinished sequences. We demonstrate the capabilities of the program on the\nbenchmark CFTR region which consists of 1.8Mb of human sequence and 20\northologous regions in marsupials, birds, fish, and mammals. Finally, we\ndescribe two large MAVID alignments: an alignment of all the available HIV\ngenomes and a multiple alignment of the entire human, mouse and rat genomes.",
    "arxiv_id": "q-bio/0311018",
    "authors": [
      "Nicolas Bray",
      "Lior Pachter"
    ],
    "categories": [
      "q-bio.GN"
    ],
    "title": "MAVID: Constrained ancestral alignment of multiple sequences",
    "url": "https://arxiv.org/abs/q-bio/0311018"
  },
  "schema_id": "dorsal/arxiv",
  "source": {
    "execution_id": "de125020-99d4-407d-a0d2-35b5d05626a1",
    "id": "arXiv Dataset IDs",
    "type": "Model",
    "variant": "snapshot-2026-03-01",
    "version": "0.1.0"
  },
  "user_id": 1000002
}
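
The annotation above nests the paper metadata under `record`, with `arxiv_id`, `authors`, `categories`, `title`, and `url` as sibling fields. As a minimal sketch of how such a record might be read, the Python below parses a trimmed copy of the JSON (the abstract and source fields are omitted for brevity) and checks that `url` is the `arxiv_id` appended to `https://arxiv.org/abs/`, as it is in this record. The helper name `summarize` and the consistency check are illustrative assumptions, not part of the dorsal/arxiv schema.

```python
import json

# Trimmed copy of the annotation shown above; field names and values are
# taken from the record, with the abstract and source block omitted.
ANNOTATION_JSON = """
{
  "annotation_id": "cfa7ccf1-e711-46e8-8719-62fc275c7104",
  "private": false,
  "record": {
    "arxiv_id": "q-bio/0311018",
    "authors": ["Nicolas Bray", "Lior Pachter"],
    "categories": ["q-bio.GN"],
    "title": "MAVID: Constrained ancestral alignment of multiple sequences",
    "url": "https://arxiv.org/abs/q-bio/0311018"
  },
  "schema_id": "dorsal/arxiv"
}
"""


def summarize(annotation: dict) -> dict:
    """Hypothetical helper: flatten the nested record into display fields."""
    record = annotation["record"]
    # Assumption for illustration: the abstract-page URL is the arXiv ID
    # appended to this prefix, which holds for the record shown here.
    expected_url = "https://arxiv.org/abs/" + record["arxiv_id"]
    return {
        "title": record["title"],
        "authors": ", ".join(record["authors"]),
        "categories": ", ".join(record["categories"]),
        "url_matches_id": record["url"] == expected_url,
    }


summary = summarize(json.loads(ANNOTATION_JSON))
for key, value in summary.items():
    print(f"{key}: {value}")
```

The flattened fields correspond to the rows of the metadata table at the top of the page, so the same helper could be used to regenerate that table from the raw annotation.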