Annotation: dorsal/arxiv

Authors	Naftali Tishby, Fernando C. Pereira, William Bialek
Categories	physics.data-an cond-mat.dis-nn cs.LG nlin.AO
ArXiv ID	physics/0004057
URL	https://arxiv.org/abs/physics/0004057

Abstract

We define the relevant information in a signal $x\in X$ as being the information that this signal provides about another signal $y\in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$; it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a 'bottleneck' formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self-consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
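
The abstract refers to self-consistent equations and a re-estimation method but does not write them out. As a rough illustration only, the sketch below assumes the form in which these updates are usually quoted for discrete variables: with $t$ denoting a codeword and $\beta$ a trade-off parameter, the encoder is re-estimated as $q(t|x) \propto q(t)\exp(-\beta\, D_{KL}[p(y|x)\,\|\,q(y|t)])$, followed by $q(t) = \sum_x p(x)\,q(t|x)$ and $q(y|t) = \sum_x p(x,y)\,q(t|x)/q(t)$. The function names, the fixed iteration count, the random initialization, and the toy joint distribution are all assumptions made for this example and do not come from the record.

```python
import math
import random


def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution given as nested lists."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(
        p * math.log(p / (rows[i] * cols[j]))
        for i, r in enumerate(joint)
        for j, p in enumerate(r)
        if p > 0.0
    )


def decoder(p_xy, enc):
    """Return q(t) and q(y|t) implied by the encoder enc[x][t] = q(t|x)."""
    n_t, n_y = len(enc[0]), len(p_xy[0])
    # Joint over (t, y): q(t, y) = sum_x p(x, y) q(t|x).
    q_ty = [[sum(p_xy[x][y] * enc[x][t] for x in range(len(p_xy)))
             for y in range(n_y)] for t in range(n_t)]
    q_t = [sum(row) for row in q_ty]
    q_y_t = [[v / max(q_t[t], 1e-300) for v in q_ty[t]] for t in range(n_t)]
    return q_t, q_y_t


def information_bottleneck(p_xy, n_codewords, beta, n_iter=200, seed=0):
    """Alternate the encoder, marginal, and decoder updates for a fixed beta.

    p_xy is a nested list with p_xy[x][y] = p(x, y); every x is assumed to
    have nonzero probability. Returns (q(t|x), q(t), q(y|t)).
    """
    rng = random.Random(seed)
    p_x = [sum(row) for row in p_xy]
    p_y_x = [[p / p_x[x] for p in row] for x, row in enumerate(p_xy)]

    # Random soft initial encoder q(t|x); each row is normalized.
    enc = []
    for _ in p_xy:
        w = [rng.random() + 1e-3 for _ in range(n_codewords)]
        enc.append([v / sum(w) for v in w])

    for _ in range(n_iter):
        q_t, q_y_t = decoder(p_xy, enc)
        new_enc = []
        for x in range(len(p_xy)):
            # log q(t) - beta * KL[p(y|x) || q(y|t)] for each codeword t.
            logits = []
            for t in range(n_codewords):
                kl = sum(p * math.log(p / max(q_y_t[t][y], 1e-300))
                         for y, p in enumerate(p_y_x[x]) if p > 0.0)
                logits.append(math.log(max(q_t[t], 1e-300)) - beta * kl)
            top = max(logits)  # subtract the maximum for numerical stability
            w = [math.exp(v - top) for v in logits]
            new_enc.append([v / sum(w) for v in w])
        enc = new_enc

    # Recompute the marginal and decoder so they match the final encoder.
    q_t, q_y_t = decoder(p_xy, enc)
    return enc, q_t, q_y_t


# Toy joint distribution (made up for illustration): x = 0, 1 favour y = 0,
# while x = 2, 3 favour y = 1.
p_xy = [[0.20, 0.05], [0.18, 0.07], [0.06, 0.19], [0.05, 0.20]]
enc, q_t, q_y_t = information_bottleneck(p_xy, n_codewords=2, beta=5.0)

p_x = [sum(row) for row in p_xy]
i_xt = mutual_information([[p_x[x] * q for q in enc[x]] for x in range(len(p_xy))])
i_ty = mutual_information([[q_t[t] * q for q in q_y_t[t]] for t in range(len(q_t))])
print(f"I(X;T) = {i_xt:.4f} nats, I(T;Y) = {i_ty:.4f} nats, "
      f"I(X;Y) = {mutual_information(p_xy):.4f} nats")
```

Whatever the iteration converges to, the compressed representation cannot carry more information about $y$ than $x$ itself does, so the printed I(T;Y) should never exceed I(X;Y). Raising `beta` favours preserving information about $y$; lowering it favours a shorter code.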

{ "annotation_id": "48ca6602-36c2-4102-90e5-e038b65fd35d", "date_created": "2026-03-02T18:00:31.993000Z", "date_modified": "2026-03-02T18:00:31.993000Z", "file_hash": "769dad7a34393a74b121b7ccd601fbff7aac181238c26cce1ee7bfff394cd803", "private": false, "record": { "abstract": "We define the relevant information in a signal $x\\in X$ as being the\ninformation that this signal provides about another signal $y\\in \\Y$. Examples\ninclude the information that face images provide about the names of the people\nportrayed, or the information that speech sounds provide about the words\nspoken. Understanding the signal $x$ requires more than just predicting $y$, it\nalso requires specifying which features of $\\X$ play a role in the prediction.\nWe formalize this problem as that of finding a short code for $\\X$ that\npreserves the maximum information about $\\Y$. That is, we squeeze the\ninformation that $\\X$ provides about $\\Y$ through a `bottleneck\u0027 formed by a\nlimited set of codewords $\\tX$. This constrained optimization problem can be\nseen as a generalization of rate distortion theory in which the distortion\nmeasure $d(x,\\x)$ emerges from the joint statistics of $\\X$ and $\\Y$. This\napproach yields an exact set of self consistent equations for the coding rules\n$X \\to \\tX$ and $\\tX \\to \\Y$. Solutions to these equations can be found by a\nconvergent re-estimation method that generalizes the Blahut-Arimoto algorithm.\nOur variational principle provides a surprisingly rich framework for discussing\na variety of problems in signal processing and learning, as will be described\nin detail elsewhere.", "arxiv_id": "physics/0004057", "authors": [ "Naftali Tishby", "Fernando C. Pereira", "William Bialek" ], "categories": [ "physics.data-an", "cond-mat.dis-nn", "cs.LG", "nlin.AO" ], "title": "The information bottleneck method", "url": "https://arxiv.org/abs/physics/0004057" }, "schema_id": "dorsal/arxiv", "source": { "execution_id": "9d698f74-6522-4d92-a66e-c4f172cec1a5", "id": "arXiv Dataset IDs", "type": "Model", "variant": "snapshot-2026-03-01", "version": "0.1.0" }, "user_id": 1000002 }