dorsal/arxiv
Metric learning for phylogenetic invariants
| Authors | Nicholas Eriksson, Yuan Yao |
|---|---|
| Categories | q-bio.PE |
| ArXiv ID | q-bio/0703034 |
| URL | https://arxiv.org/abs/q-bio/0703034 |
Abstract
We introduce new methods for phylogenetic tree quartet construction by using machine learning to optimize the power of phylogenetic invariants. Phylogenetic invariants are polynomials in the joint probabilities which vanish under a model of evolution on a phylogenetic tree. We give algorithms for selecting a good set of invariants and for learning a metric on this set of invariants which optimally distinguishes the different models. Our learning algorithms involve linear and semidefinite programming on data simulated over a wide range of parameters. We provide extensive tests of the learned metrics on simulated data from phylogenetic trees with four leaves under the Jukes-Cantor and Kimura 3-parameter models of DNA evolution. Our method greatly improves on other uses of invariants and is competitive with or better than neighbor-joining. In particular, we obtain metrics trained on trees with short internal branches which perform much better than neighbor joining on this region of parameter space.
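As a rough illustration of the idea summarized in the abstract, the sketch below picks a quartet topology by evaluating each topology's invariants on the data and choosing the one whose values are closest to zero under a weighted norm. This is not the authors' algorithm: the function names, the diagonal (per-invariant) form of the metric, the topology labels, and all numbers are assumptions made up for the example, and the actual invariants and learned metrics are described only in the paper itself.

```python
import math

# The three unrooted topologies on four leaves (labels are illustrative).
TOPOLOGIES = ("12|34", "13|24", "14|23")


def weighted_norm(values, weights):
    """Return sqrt(sum_i w_i * v_i^2), i.e. a norm under a diagonal metric."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return math.sqrt(sum(w * v * v for v, w in zip(values, weights)))


def pick_topology(invariant_values, weights):
    """Choose the topology whose invariants are closest to vanishing.

    invariant_values maps each topology label to the values of its
    invariants evaluated at the observed pattern frequencies.
    weights is one non-negative weight per invariant (hypothetical here;
    in the paper's setting such weights would come from training).
    """
    return min(
        TOPOLOGIES,
        key=lambda t: weighted_norm(invariant_values[t], weights),
    )


# Made-up invariant evaluations: the first invariant is noisy for the
# true topology "12|34", the second is small as expected.
values = {
    "12|34": [0.30, 0.01],
    "13|24": [0.05, 0.20],
    "14|23": [0.25, 0.25],
}

uniform = [1.0, 1.0]      # plain Euclidean norm
reweighted = [0.01, 1.0]  # down-weights the noisy first invariant

print(pick_topology(values, uniform))     # 13|24
print(pick_topology(values, reweighted))  # 12|34
```

With equal weights the noisy first invariant dominates and the wrong topology wins; down-weighting it changes the choice. That is the intuition behind learning a metric on the invariants rather than using an unweighted norm.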
{
  "annotation_id": "1355c7f7-13fd-48f5-aea6-d3f916f3de2b",
  "date_created": "2026-03-02T18:01:35.569000Z",
  "date_modified": "2026-03-02T18:01:35.569000Z",
  "file_hash": "0c4d5751bb0687aba696c7307faba847b1bb1a9bee4acbe19b561c69d483f55e",
  "private": false,
  "record": {
    "abstract": "We introduce new methods for phylogenetic tree quartet construction by using\nmachine learning to optimize the power of phylogenetic invariants. Phylogenetic\ninvariants are polynomials in the joint probabilities which vanish under a\nmodel of evolution on a phylogenetic tree. We give algorithms for selecting a\ngood set of invariants and for learning a metric on this set of invariants\nwhich optimally distinguishes the different models. Our learning algorithms\ninvolve linear and semidefinite programming on data simulated over a wide range\nof parameters. We provide extensive tests of the learned metrics on simulated\ndata from phylogenetic trees with four leaves under the Jukes-Cantor and Kimura\n3-parameter models of DNA evolution. Our method greatly improves on other uses\nof invariants and is competitive with or better than neighbor-joining. In\nparticular, we obtain metrics trained on trees with short internal branches\nwhich perform much better than neighbor joining on this region of parameter\nspace.",
    "arxiv_id": "q-bio/0703034",
    "authors": [
      "Nicholas Eriksson",
      "Yuan Yao"
    ],
    "categories": [
      "q-bio.PE"
    ],
    "title": "Metric learning for phylogenetic invariants",
    "url": "https://arxiv.org/abs/q-bio/0703034"
  },
  "schema_id": "dorsal/arxiv",
  "source": {
    "execution_id": "c43e94ab-b3a8-4ef1-8bdb-1aa595e65a2c",
    "id": "arXiv Dataset IDs",
    "type": "Model",
    "variant": "snapshot-2026-03-01",
    "version": "0.1.0"
  },
  "user_id": 1000002
}