Monthly
288 pp. per issue
6 x 9, illustrated
ISSN
0899-7667
E-ISSN
1530-888X
2014 Impact factor:
2.21

Neural Computation

June 1, 2003, Vol. 15, No. 6, Pages 1191-1253
(doi: 10.1162/089976603321780272)
© 2003 Massachusetts Institute of Technology
Estimation of Entropy and Mutual Information
Article PDF (845.98 KB)
Abstract

We present some new results on the nonparametric estimation of entropy and mutual information. First, we use an exact local expansion of the entropy function to prove almost sure consistency and central limit theorems for three of the most commonly used discretized information estimators. The setup is related to Grenander's method of sieves and places no assumptions on the underlying probability measure generating the data. Second, we prove a converse to these consistency theorems, demonstrating that a misapplication of the most common estimation techniques leads to an arbitrarily poor estimate of the true information, even given unlimited data. This “inconsistency” theorem leads to an analytical approximation of the bias, valid in surprisingly small sample regimes and more accurate than the usual formula of Miller and Madow over a large region of parameter space. The two most practical implications of these results are negative: (1) information estimates in a certain data regime are likely contaminated by bias, even if “bias-corrected” estimators are used, and (2) confidence intervals calculated by standard techniques drastically underestimate the error of the most common estimation methods.

Finally, we note a very useful connection between the bias of entropy estimators and a certain polynomial approximation problem. By casting bias calculation problems in this approximation theory framework, we obtain the best possible generalization of known asymptotic bias results. More interesting, this framework leads to an estimator with some nice properties: the estimator comes equipped with rigorous bounds on the maximum error over all possible underlying probability distributions, and this maximum error turns out to be surprisingly small. We demonstrate the application of this new estimator on both real and simulated data.