dorsal/arxiv
How much can evolved characters tell us about the tree that generated them?
| Authors | Elchanan Mossel, Mike Steel |
|---|---|
| Categories | q-bio.PE, math.ST, stat.TH |
| ArXiv ID | q-bio/0406048 |
| URL | https://arxiv.org/abs/q-bio/0406048 |
Abstract
In this paper we review some recent results that shed light on a fundamental question in molecular systematics: how much phylogenetic 'signal' can we expect from characters that have evolved under some Markov process? There are many sides to this question and we begin by describing some explicit bounds on the probability of correctly reconstructing an ancestral state from the states observed at the tips. We show how this bound sets upper limits on the probability of tree reconstruction from aligned sequences, and we provide some new extensions that allow site-to-site rate variation or a covarion mechanism. We then explore the relationship between the number of sites required for accurate tree reconstruction and other model parameters - such as the number of species, and substitution probabilities, and we describe a phase transition that occurs when substitution probabilities exceed a critical value. In the remainder of this paper we turn to models of character evolution where the state space is assumed to be either infinite or very large. These models have some relevance to certain types of genomic data (such as gene order) and here we again investigate how many characters are required for accurate tree reconstruction.
{
  "annotation_id": "f02fb76b-ad4b-49d7-bdaa-90fcd2f5b5d9",
  "date_created": "2026-03-02T18:01:32.295000Z",
  "date_modified": "2026-03-02T18:01:32.295000Z",
  "file_hash": "f80b4ea55ad3824e1613e7272c55ef3012533933e37eff324a18e607cee96741",
  "private": false,
  "record": {
    "abstract": "In this paper we review some recent results that shed light on a fundamental\nquestion in molecular systematics: how much phylogenetic `signal\u0027 can we expect\nfrom characters that have evolved under some Markov process? There are many\nsides to this question and we begin by describing some explicit bounds on the\nprobability of correctly reconstructing an ancestral state from the states\nobserved at the tips. We show how this bound sets upper limits on the\nprobability of tree reconstruction from aligned sequences, and we provide some\nnew extensions that allow site-to-site rate variation or a covarion mechanism.\nWe then explore the relationship between the number of sites required for\naccurate tree reconstruction and other model parameters - such as the number of\nspecies, and substitution probabilities, and we describe a phase transition\nthat occurs when substitution probabilities exceed a critical value. In the\nremainder of this paper we turn to models of character evolution where the\nstate space is assumed to be either infinite or very large. These models have\nsome relevance to certain types of genomic data (such as gene order) and here\nwe again investigate how many characters are required for accurate tree\nreconstruction.",
    "arxiv_id": "q-bio/0406048",
    "authors": [
      "Elchanan Mossel",
      "Mike Steel"
    ],
    "categories": [
      "q-bio.PE",
      "math.ST",
      "stat.TH"
    ],
    "title": "How much can evolved characters tell us about the tree that generated them?",
    "url": "https://arxiv.org/abs/q-bio/0406048"
  },
  "schema_id": "dorsal/arxiv",
  "source": {
    "execution_id": "05292d8e-49a3-4c77-82d1-f88c84508ebb",
    "id": "arXiv Dataset IDs",
    "type": "Model",
    "variant": "snapshot-2026-03-01",
    "version": "0.1.0"
  },
  "user_id": 1000002
}