Abstract:
In this paper we examine the problem of estimating the
parameters of a multinomial distribution over a large number of
discrete outcomes, most of which do not appear in the training
data. We analyze this problem from a Bayesian perspective and
develop a hierarchical prior that incorporates the assumption that
the observed outcomes constitute only a small subset of the
possible outcomes. We show how to perform exact inference
efficiently with this form of hierarchical prior, and we compare
our method to standard approaches, demonstrating its merits.
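The abstract does not spell out the prior, so the following is only a minimal sketch of one way such a hierarchical prior could be set up, under loudly stated assumptions. We assume that the number k of outcomes with nonzero probability is unknown (uniform prior over k), that the feasible subset S is uniform among all size-k subsets of the L possible outcomes, and that the parameters over S follow a symmetric Dirichlet with hypothetical concentration `alpha`. Under these assumptions the posterior over k needs only a single sum over L - k0 + 1 values (k0 being the number of distinct observed outcomes), which illustrates why exact inference can be cheap. The function names `posterior_over_k` and `predictive` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def log_comb(n, r):
    # log of the binomial coefficient C(n, r), computed with log-gamma to avoid overflow
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def posterior_over_k(counts, L, alpha=1.0):
    """Exact posterior P(k | D) under the ASSUMED hierarchical model:
    k ~ uniform over {max(k0, 1), ..., L}
    S | k ~ uniform over the size-k subsets of the L outcomes
    theta | S ~ symmetric Dirichlet(alpha) over S
    `counts` maps each observed outcome to its (positive) count.
    """
    N = sum(counts.values())
    k0 = len(counts)  # S must contain every observed outcome, so k >= k0
    ks = list(range(max(k0, 1), L + 1))
    logs = []
    for k in ks:
        # log P(S contains all k0 observed outcomes | k)
        log_contain = log_comb(L - k0, k - k0) - log_comb(L, k)
        # Dirichlet-multinomial marginal likelihood; the factor
        # prod_i Gamma(alpha + N_i) / Gamma(alpha) does not depend on k and cancels
        log_dm = math.lgamma(k * alpha) - math.lgamma(k * alpha + N)
        logs.append(log_contain + log_dm)
    # normalize in log space for numerical stability
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    z = sum(weights)
    return {k: w / z for k, w in zip(ks, weights)}


def predictive(counts, L, alpha=1.0):
    """Posterior predictive for the next draw, averaged exactly over k.
    Returns (probability of each observed outcome, total mass of all unseen outcomes).
    """
    N = sum(counts.values())
    k0 = len(counts)
    post = posterior_over_k(counts, L, alpha)
    # observed outcome x with count n: (n + alpha) / (N + k * alpha), given k
    seen = {
        x: sum(p * (n + alpha) / (N + k * alpha) for k, p in post.items())
        for x, n in counts.items()
    }
    # the k - k0 unobserved members of S share (k - k0) * alpha / (N + k * alpha), given k
    unseen_total = sum(p * (k - k0) * alpha / (N + k * alpha) for k, p in post.items())
    return seen, unseen_total


if __name__ == "__main__":
    counts = Counter("abracadabra")  # toy data: 5 distinct outcomes, 11 draws
    seen, unseen = predictive(counts, L=1000, alpha=0.5)
    print(seen)
    print("mass on unseen outcomes:", unseen)
```

In this sketch, each individual unseen outcome receives `unseen_total / (L - k0)` by symmetry. When L equals k0, the sketch reduces to an ordinary Dirichlet posterior over the observed outcomes, with no mass left for unseen ones.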