dorsal/arxiv
Hierarchical Clustering Based on Mutual Information
| Authors | Alexander Kraskov, Harald Stögbauer, Ralph G. Andrzejak, Peter Grassberger |
|---|---|
| Categories | q-bio.QM, cs.CC, physics.bio-ph |
| ArXiv ID | q-bio/0311039 |
| URL | https://arxiv.org/abs/q-bio/0311039 |
Abstract
Motivation: Clustering is a frequently used concept in a variety of bioinformatics applications. We present a new method for hierarchical clustering of data called the mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: the MI between three objects X, Y, and Z is equal to the MI between X and Y plus the MI between Z and the combined object (XY).
Results: We use this both in the Shannon (probabilistic) version of information theory, where the "objects" are probability distributions represented by random samples, and in the Kolmogorov (algorithmic) version, where the "objects" are symbol sequences. We apply our method to the construction of mammal phylogenetic trees from mitochondrial DNA sequences, and we reconstruct the fetal ECG from the output of independent component analysis (ICA) applied to the ECG of a pregnant woman.
Availability: The programs for estimation of MI and for clustering (probabilistic version) are available at http://www.fz-juelich.de/nic/cs/software
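The grouping property stated in the abstract can be checked numerically on a small discrete example. The sketch below is not the authors' code: it assumes the Shannon setting, assumes that the three-object MI is defined as the sum of the marginal entropies minus the joint entropy, and uses hypothetical helper names (`entropy`, `marginal`, `make_joint`, `h`) together with an arbitrary random joint distribution.

```python
import math
import random
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping outcome -> probability."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def marginal(joint, axes):
    """Marginalize a joint dict keyed by (x, y, z) tuples onto the given axes."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return dict(out)


def make_joint(seed=0):
    """Arbitrary joint distribution over X in {0,1}, Y in {0,1,2}, Z in {0,...,3}."""
    rng = random.Random(seed)
    weights = {
        (x, y, z): rng.random()
        for x in range(2)
        for y in range(3)
        for z in range(4)
    }
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


joint = make_joint()


def h(axes):
    """Entropy of the marginal over the given axes (0 = X, 1 = Y, 2 = Z)."""
    return entropy(marginal(joint, axes))


# I(X,Y): MI between X and Y.
i_xy = h([0]) + h([1]) - h([0, 1])
# I((X,Y),Z): MI between Z and the combined object (XY).
i_xy_z = h([0, 1]) + h([2]) - h([0, 1, 2])
# I(X,Y,Z): assumed definition, marginal entropies minus joint entropy.
i_xyz = h([0]) + h([1]) + h([2]) - h([0, 1, 2])

print(math.isclose(i_xyz, i_xy + i_xy_z, abs_tol=1e-12))
```

Under these assumptions the identity holds for any joint distribution, because the entropy of the combined object (XY) cancels when the two pairwise terms are added. This cancellation is what allows a merged cluster to be treated as a single object in later clustering steps.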
{
"annotation_id": "7bb155bc-a9be-4a3b-bc7a-9b9c8354265c",
"date_created": "2026-03-02T18:01:28.764000Z",
"date_modified": "2026-03-02T18:01:28.764000Z",
"file_hash": "a56ff5dc70708ad96aa60578c727bf26a49803114f146094d6a4a1d971b340af",
"private": false,
"record": {
"abstract": "Motivation: Clustering is a frequently used concept in variety of\nbioinformatical applications. We present a new method for hierarchical\nclustering of data called mutual information clustering (MIC) algorithm. It\nuses mutual information (MI) as a similarity measure and exploits its grouping\nproperty: The MI between three objects X, Y, and Z is equal to the sum of the\nMI between X and Y, plus the MI between Z and the combined object (XY).\n Results: We use this both in the Shannon (probabilistic) version of\ninformation theory, where the \"objects\" are probability distributions\nrepresented by random samples, and in the Kolmogorov (algorithmic) version,\nwhere the \"objects\" are symbol sequences. We apply our method to the\nconstruction of mammal phylogenetic trees from mitochondrial DNA sequences and\nwe reconstruct the fetal ECG from the output of independent components analysis\n(ICA) applied to the ECG of a pregnant woman.\n Availability: The programs for estimation of MI and for clustering\n(probabilistic version) are available at\nhttp://www.fz-juelich.de/nic/cs/software",
"arxiv_id": "q-bio/0311039",
"authors": [
"Alexander Kraskov",
"Harald St\u00f6gbauer",
"Ralph G. Andrzejak",
"Peter Grassberger"
],
"categories": [
"q-bio.QM",
"cs.CC",
"physics.bio-ph"
],
"title": "Hierarchical Clustering Based on Mutual Information",
"url": "https://arxiv.org/abs/q-bio/0311039"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "2dde4d4c-2025-4830-9be9-67799a485913",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}