dorsal/arxiv
The information bottleneck method
| Authors | Naftali Tishby, Fernando C. Pereira, William Bialek |
|---|---|
| Categories | physics.data-an, cond-mat.dis-nn, cs.LG, nlin.AO |
| ArXiv ID | physics/0004057 |
| URL | https://arxiv.org/abs/physics/0004057 |
Abstract
We define the relevant information in a signal $x\in X$ as being the information that this signal provides about another signal $y\in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$; it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a "bottleneck" formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self-consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
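The re-estimation method mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' code. It assumes the commonly cited form of the self-consistent equations, in which the encoder is updated as $p(\tilde{x}|x) \propto p(\tilde{x}) \exp(-\beta\, D_{KL}[p(y|x)\,\|\,p(y|\tilde{x})])$ and the codeword marginal $p(\tilde{x})$ and decoder $p(y|\tilde{x})$ are recomputed from the current encoder. The function names `information_bottleneck` and `mutual_information`, the trade-off parameter `beta`, the random initialization, and the fixed iteration count are illustrative assumptions. The joint distribution `p_xy` is assumed to be a NumPy array with no all-zero rows.

```python
import numpy as np


def mutual_information(p_ab):
    """Mutual information (in nats) of a joint distribution given as a 2-D array."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])).sum())


def _marginal_and_decoder(p_x, p_y_given_x, p_t_given_x, eps):
    """Recompute p(t) and p(y|t) from the current soft encoder p(t|x)."""
    p_t = p_x @ p_t_given_x
    p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x / (p_t[:, None] + eps)
    return p_t, p_y_given_t


def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Hypothetical sketch of the iterative IB updates (assumes p(x) > 0 for all x)."""
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against log(0) and division by zero
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]

    # Random soft assignment of each x to the codewords t.
    p_t_given_x = rng.random((p_xy.shape[0], n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t, p_y_given_t = _marginal_and_decoder(p_x, p_y_given_x, p_t_given_x, eps)
        # KL[p(y|x) || p(y|t)] for every pair (x, t); this plays the role of d(x, t).
        log_ratio = (np.log(p_y_given_x[:, None, :] + eps)
                     - np.log(p_y_given_t[None, :, :] + eps))
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        # Encoder update in log space for numerical stability.
        logits = np.log(p_t + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        p_t_given_x = np.exp(logits)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    p_t, p_y_given_t = _marginal_and_decoder(p_x, p_y_given_x, p_t_given_x, eps)
    return p_t_given_x, p_t, p_y_given_t


if __name__ == "__main__":
    # Toy joint distribution over 4 values of x and 3 values of y.
    p_xy = np.array([[0.20, 0.05, 0.00],
                     [0.18, 0.07, 0.00],
                     [0.00, 0.05, 0.20],
                     [0.00, 0.07, 0.18]])
    encoder, p_t, p_y_given_t = information_bottleneck(p_xy, n_clusters=2, beta=5.0)
    print("I(X;Y) =", mutual_information(p_xy))
    print("I(T;Y) =", mutual_information(encoder.T @ p_xy))
```

Because the codewords depend on $Y$ only through $X$, the information retained about $Y$ can never exceed $I(X;Y)$. Increasing `beta` shifts the trade-off from compression towards preserving relevant information.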
{
"annotation_id": "48ca6602-36c2-4102-90e5-e038b65fd35d",
"date_created": "2026-03-02T18:00:31.993000Z",
"date_modified": "2026-03-02T18:00:31.993000Z",
"file_hash": "769dad7a34393a74b121b7ccd601fbff7aac181238c26cce1ee7bfff394cd803",
"private": false,
"record": {
"abstract": "We define the relevant information in a signal $x\\in X$ as being the\ninformation that this signal provides about another signal $y\\in \\Y$. Examples\ninclude the information that face images provide about the names of the people\nportrayed, or the information that speech sounds provide about the words\nspoken. Understanding the signal $x$ requires more than just predicting $y$, it\nalso requires specifying which features of $\\X$ play a role in the prediction.\nWe formalize this problem as that of finding a short code for $\\X$ that\npreserves the maximum information about $\\Y$. That is, we squeeze the\ninformation that $\\X$ provides about $\\Y$ through a `bottleneck\u0027 formed by a\nlimited set of codewords $\\tX$. This constrained optimization problem can be\nseen as a generalization of rate distortion theory in which the distortion\nmeasure $d(x,\\x)$ emerges from the joint statistics of $\\X$ and $\\Y$. This\napproach yields an exact set of self consistent equations for the coding rules\n$X \\to \\tX$ and $\\tX \\to \\Y$. Solutions to these equations can be found by a\nconvergent re-estimation method that generalizes the Blahut-Arimoto algorithm.\nOur variational principle provides a surprisingly rich framework for discussing\na variety of problems in signal processing and learning, as will be described\nin detail elsewhere.",
"arxiv_id": "physics/0004057",
"authors": [
"Naftali Tishby",
"Fernando C. Pereira",
"William Bialek"
],
"categories": [
"physics.data-an",
"cond-mat.dis-nn",
"cs.LG",
"nlin.AO"
],
"title": "The information bottleneck method",
"url": "https://arxiv.org/abs/physics/0004057"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "9d698f74-6522-4d92-a66e-c4f172cec1a5",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}