Abstract:
We introduce a novel distributional clustering algorithm that
explicitly maximizes the mutual information per cluster between the
data and given categories. This algorithm can be considered a
bottom-up, hard version of the recently introduced "Information
Bottleneck Method". We relate the mutual information between
clusters and categories to the Bayesian classification error, which
provides another motivation for using the obtained clusters as
features. The algorithm is compared with the top-down soft version
of the information bottleneck method, and a relationship between the
hard and soft results is established. We demonstrate the algorithm
on the 20 Newsgroups data set. For a subset of two newsgroups we
achieve compression by three orders of magnitude, losing only 10%
of the original mutual information.
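To make the bottom-up merging concrete, the following Python sketch
illustrates one plausible realization (not taken from the paper: the
function names and the use of the prior-weighted Jensen-Shannon
divergence as the per-merge loss in mutual information are our
illustrative assumptions). Starting from one cluster per data point,
it greedily merges the pair of clusters whose merge discards the
least information about the categories.

import numpy as np

def merge_cost(p_i, p_j, cond_i, cond_j):
    # Information lost by merging clusters i and j (assumed form):
    # (p_i + p_j) times the weighted Jensen-Shannon divergence
    # between their category distributions p(c|i) and p(c|j).
    w_i, w_j = p_i / (p_i + p_j), p_j / (p_i + p_j)
    merged = w_i * cond_i + w_j * cond_j
    def kl(p, q):
        mask = p > 0
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))
    return (p_i + p_j) * (w_i * kl(cond_i, merged) + w_j * kl(cond_j, merged))

def agglomerative_ib(priors, conds, n_clusters):
    # priors[i] = p(cluster i); conds[i] = p(category | cluster i).
    # Greedy bottom-up pass: at each step merge the cheapest pair.
    priors = list(priors)
    conds = [np.asarray(c, dtype=float) for c in conds]
    members = [[i] for i in range(len(priors))]
    while len(priors) > n_clusters:
        i, j = min(((a, b) for a in range(len(priors))
                    for b in range(a + 1, len(priors))),
                   key=lambda ab: merge_cost(priors[ab[0]], priors[ab[1]],
                                             conds[ab[0]], conds[ab[1]]))
        p_new = priors[i] + priors[j]
        conds[i] = (priors[i] * conds[i] + priors[j] * conds[j]) / p_new
        priors[i] = p_new
        members[i] += members[j]
        del priors[j], conds[j], members[j]
    return members

In the text-classification setting described above, priors would be
the empirical word probabilities p(w) and conds the class-conditional
distributions p(c|w), so the returned clusters are the compressed
word features whose mutual information with the categories can be
compared against the original.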