Maximum likelihood estimation of phylogenetic tree and substitution rates via generalized neighbor-joining and the EM algorithm
| Authors | Asger Hobolth, Ruriko Yoshida |
|---|---|
| Categories | q-bio.QM |
| ArXiv ID | q-bio/0511034 |
| URL | https://arxiv.org/abs/q-bio/0511034 |
Abstract
A central task in the study of molecular sequence data from present-day species is the reconstruction of the ancestral relationships. The most established approach to tree reconstruction is the maximum likelihood (ML) method. In this method, evolution is described in terms of a discrete-state continuous-time Markov process on a phylogenetic tree. The substitution rate matrix, which determines the Markov process, can be estimated using the expectation maximization (EM) algorithm. Unfortunately, an exhaustive search for the ML phylogenetic tree is computationally prohibitive for large data sets. In such situations, the neighbor-joining (NJ) method is frequently used because of its computational speed. The NJ method reconstructs trees by clustering neighboring sequences recursively, based on pairwise comparisons between the sequences. The NJ method can be generalized such that reconstruction is based on comparisons of subtrees rather than pairwise distances. In this paper, we present an algorithm for simultaneous substitution rate estimation and phylogenetic tree reconstruction. The algorithm iterates between the EM algorithm for estimating substitution rates and the generalized NJ method for tree reconstruction. Preliminary results of the approach are encouraging.
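To make the pairwise clustering step mentioned in the abstract concrete, here is a minimal sketch of how the classical NJ method picks the next pair of neighbors to join. It uses the standard Q-criterion on a pairwise distance matrix, not the generalized subtree-based variant the paper proposes, and it does not estimate substitution rates. The function name `nj_pick_pair` and the distance values are illustrative assumptions, not taken from the paper.

```python
def nj_pick_pair(dist):
    """Return ((i, j), q) for the pair i < j minimizing the classical NJ Q-criterion.

    dist is a symmetric matrix (list of lists) of pairwise distances
    with zeros on the diagonal. Hypothetical helper for illustration.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three taxa")
    # Total distance from each taxon to all others.
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Made-up distances between five taxa, for illustration only.
EXAMPLE_DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q_value = nj_pick_pair(EXAMPLE_DIST)
print(pair, q_value)
```

In a full NJ run, the chosen pair would be merged into a new internal node, the distance matrix would be updated, and the selection would repeat until the tree is resolved. Note that the pair with the smallest raw distance (taxa 3 and 4 here) is not necessarily the one selected, because the criterion also accounts for each taxon's distance to the rest.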
{
"annotation_id": "07741241-e519-40cb-8d09-0bd98d8d141e",
"date_created": "2026-03-02T18:01:35.441000Z",
"date_modified": "2026-03-02T18:01:35.441000Z",
"file_hash": "d0c6c72d9c0f65245093f565cd5a2e07a28277f064ca420cedc45cbd8b61c044",
"private": false,
"record": {
"abstract": "A central task in the study of molecular sequence data from present-day\nspecies is the reconstruction of the ancestral relationships. The most\nestablished approach to tree reconstruction is the maximum likelihood (ML)\nmethod. In this method, evolution is described in terms of a discrete-state\ncontinuous-time Markov process on a phylogenetic tree. The substitution rate\nmatrix, that determines the Markov process, can be estimated using the\nexpectation maximization (EM) algorithm. Unfortunately, an exhaustive search\nfor the ML phylogenetic tree is computationally prohibitive for large data\nsets. In such situations, the neighbor-joining (NJ) method is frequently used\nbecause of its computational speed. The NJ method reconstructs trees by\nclustering neighboring sequences recursively, based on pairwise comparisons\nbetween the sequences. The NJ method can be generalized such that\nreconstruction is based on comparisons of subtrees rather than pairwise\ndistances. In this paper, we present an algorithm for simultaneous substitution\nrate estimation and phylogenetic tree reconstruction. The algorithm iterates\nbetween the EM algorithm for estimating substitution rates and the generalized\nNJ method for tree reconstruction. Preliminary results of the approach are\nencouraging.",
"arxiv_id": "q-bio/0511034",
"authors": [
"Asger Hobolth",
"Ruriko Yoshida"
],
"categories": [
"q-bio.QM"
],
"title": "Maximum likelihood estimation of phylogenetic tree and substitution rates via generalized neighbor-joining and the EM algorithm",
"url": "https://arxiv.org/abs/q-bio/0511034"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "31e04346-482a-45dc-82b6-e6e4357eb6d8",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}