dorsal/arxiv
Phylogeny of Mixture Models: Robustness of Maximum Likelihood and Non-identifiable Distributions
| Authors | Daniel Stefankovic, Eric Vigoda |
|---|---|
| Categories | q-bio.PE |
| ArXiv ID | q-bio/0609038 |
| URL | https://arxiv.org/abs/q-bio/0609038 |
Abstract
We address phylogenetic reconstruction when the data is generated from a mixture distribution. Such topics have gained considerable attention in the biological community, given the clear evidence of heterogeneity of mutation rates. In our work, we consider data coming from a mixture of trees which share a common topology but differ in their edge weights (i.e., branch lengths). We first show the pitfalls of popular methods, including maximum likelihood and Markov chain Monte Carlo algorithms. We then determine in which evolutionary models reconstructing the tree topology under a mixture distribution is (im)possible. We prove that every model whose transition matrices can be parameterized by an open set of multi-linear polynomials either has non-identifiable mixture distributions, in which case reconstruction is impossible in general, or admits linear tests which identify the topology. This duality theorem relies on our notion of linear tests and uses ideas from convex programming duality. Linear tests are closely related to linear invariants, which were first introduced by Lake, and are natural from an algebraic geometry perspective.
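To make the setting in the abstract concrete, the following is a minimal sketch of a mixture of two trees that share one topology but differ in branch lengths. It rests on assumptions that are not taken from the abstract: a four-leaf tree with split 12|34, the binary symmetric (CFN) substitution model, and edge parameters given as flip probabilities. The function names `edge_prob`, `quartet_distribution`, and `mixture`, along with all numeric values, are hypothetical and chosen only for illustration.

```python
from itertools import product


def edge_prob(a, b, p):
    """Probability that state a becomes state b across an edge with flip probability p."""
    return p if a != b else 1.0 - p


def quartet_distribution(p1, p2, p3, p4, p_int):
    """Site-pattern distribution on the quartet tree 12|34.

    Assumed model (illustrative): binary symmetric (CFN) with a uniform root.
    Leaves 1 and 2 hang off internal node u, leaves 3 and 4 off internal node v,
    and p_int is the flip probability on the internal edge u-v.
    """
    dist = {}
    for pattern in product((0, 1), repeat=4):
        x1, x2, x3, x4 = pattern
        total = 0.0
        # Sum over the unobserved states of the two internal nodes.
        for u, v in product((0, 1), repeat=2):
            total += (
                0.5
                * edge_prob(u, x1, p1)
                * edge_prob(u, x2, p2)
                * edge_prob(u, v, p_int)
                * edge_prob(v, x3, p3)
                * edge_prob(v, x4, p4)
            )
        dist[pattern] = total
    return dist


def mixture(weights, components):
    """Convex combination of site-pattern distributions over the same patterns."""
    return {
        pattern: sum(w * comp[pattern] for w, comp in zip(weights, components))
        for pattern in components[0]
    }


# Two trees with the same topology 12|34 but different (hypothetical) edge weights.
tree_a = quartet_distribution(0.05, 0.30, 0.05, 0.30, 0.05)
tree_b = quartet_distribution(0.30, 0.05, 0.30, 0.05, 0.05)

# Data in the mixture setting would be drawn from this combined distribution.
mix = mixture([0.5, 0.5], [tree_a, tree_b])
```

A reconstruction method only sees samples from `mix`, not from `tree_a` or `tree_b` separately, which is why the question of whether the shared topology can be recovered from the mixture arises at all.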
{
"annotation_id": "25f6b9ba-1486-4c46-9c7c-165604c421ee",
"date_created": "2026-03-02T18:01:35.508000Z",
"date_modified": "2026-03-02T18:01:35.508000Z",
"file_hash": "4be643801ba27c5259b0b5747b7b7cf133e613285a66a3b6d66a78103a8f11d7",
"private": false,
"record": {
"abstract": "We address phylogenetic reconstruction when the data is generated from a\nmixture distribution. Such topics have gained considerable attention in the\nbiological community with the clear evidence of heterogeneity of mutation\nrates. In our work, we consider data coming from a mixture of trees which share\na common topology, but differ in their edge weights (i.e., branch lengths). We\nfirst show the pitfalls of popular methods, including maximum likelihood and\nMarkov chain Monte Carlo algorithms. We then determine in which evolutionary\nmodels, reconstructing the tree topology, under a mixture distribution, is\n(im)possible. We prove that every model whose transition matrices can be\nparameterized by an open set of multi-linear polynomials, either has\nnon-identifiable mixture distributions, in which case reconstruction is\nimpossible in general, or there exist linear tests which identify the topology.\nThis duality theorem, relies on our notion of linear tests and uses ideas from\nconvex programming duality. Linear tests are closely related to linear\ninvariants, which were first introduced by Lake, and are natural from an\nalgebraic geometry perspective.",
"arxiv_id": "q-bio/0609038",
"authors": [
"Daniel Stefankovic",
"Eric Vigoda"
],
"categories": [
"q-bio.PE"
],
"title": "Phylogeny of Mixture Models: Robustness of Maximum Likelihood and Non-identifiable Distributions",
"url": "https://arxiv.org/abs/q-bio/0609038"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "b70c080d-e759-475d-a9ae-2d417b26929d",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}