Abstract:
The mutual information of two random variables $i$ and $j$ with joint probabilities $\{\pi_{ij}\}$ is commonly used in learning Bayesian nets as well as in many other fields. The chances $\pi_{ij}$ are usually estimated by the empirical sampling frequency $n_{ij}/n$, leading to a point estimate $I(n_{ij}/n)$ for the mutual information. To answer questions like ``is $I(n_{ij}/n)$ consistent with zero?'' or ``what is the probability that the true mutual information is much larger than the point estimate?'' one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution $p(\pi)$ comprising prior information about $\pi$. From the prior $p(\pi)$ one can compute the posterior $p(\pi|n)$, from which the distribution $p(I|n)$ of the mutual information can be calculated. We derive reliable and quickly computable approximations for $p(I|n)$. We concentrate on the mean, variance, skewness, and kurtosis, and non-informative priors. For the mean we also give an exact expression. Numerical issues and the range of validity are discussed.
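
For concreteness, here is a minimal sketch (not from the paper) of the two quantities the abstract refers to: the plug-in point estimate $I(n_{ij}/n)$, and a Monte Carlo view of the posterior $p(I|n)$ obtained under an assumed symmetric Dirichlet prior, whose posterior over $\pi$ is again Dirichlet with parameters $n_{ij}+\alpha$. The count matrix `n_ij`, the prior parameter `alpha`, and the sample size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mutual_information(pi):
    """Mutual information I(pi) of a joint probability matrix pi_{ij} (in nats)."""
    pi_i = pi.sum(axis=1, keepdims=True)   # row marginals pi_{i+}
    pi_j = pi.sum(axis=0, keepdims=True)   # column marginals pi_{+j}
    mask = pi > 0                          # use the 0 * log 0 = 0 convention
    return float(np.sum(pi[mask] * np.log(pi[mask] / (pi_i * pi_j)[mask])))

# Plug-in point estimate I(n_ij / n) from a hypothetical 2x2 count matrix.
n_ij = np.array([[12.0, 3.0], [4.0, 11.0]])
n = n_ij.sum()
print("point estimate:", mutual_information(n_ij / n))

# Monte Carlo samples from p(I|n): with a symmetric Dirichlet(alpha) prior
# on pi, the posterior is Dirichlet(n_ij + alpha); drawing pi from it and
# evaluating I(pi) yields samples of the mutual information's posterior.
rng = np.random.default_rng(0)
alpha = 1.0  # uniform prior; alpha = 0.5 would be Jeffreys' prior
samples = [mutual_information(
               rng.dirichlet((n_ij + alpha).ravel()).reshape(n_ij.shape))
           for _ in range(10_000)]
print("posterior mean:", np.mean(samples), " posterior std:", np.std(samples))
```

The sample mean and standard deviation computed this way are the brute-force counterparts of the fast analytic approximations derived in the paper; the higher moments (skewness, kurtosis) can be estimated from the same samples.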