Abstract:
We present a powerful meta-clustering technique called
Iterative Double Clustering (IDC). The IDC method is a natural
extension of the recent Double Clustering (DC) method of Slonim
and Tishby that exhibited impressive performance on text
categorization tasks [12]. Using synthetically generated data, we
empirically find that whenever the DC procedure is successful in
recovering some of the structure hidden in the data, the extended
IDC procedure can incrementally compute a significantly more
accurate classification. IDC is especially advantageous when the
data exhibits high attribute noise. Our simulation results also
show the effectiveness of IDC in text categorization problems.
Surprisingly, this unsupervised procedure can be competitive with
a (supervised) SVM trained with a small training set. Finally, we
propose a simple and natural extension of IDC for semi-supervised
and transductive learning where we are given both labeled and
unlabeled examples.
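
To make the alternation behind IDC concrete, the following is a minimal, hypothetical sketch of an iterative double-clustering loop. Note the assumptions: the original DC/IDC procedures cluster via the information bottleneck method, whereas this sketch substitutes k-means over conditional distributions purely so the alternation structure is runnable; the function name idc_sketch and all parameter values are illustrative, not taken from the paper.

```python
# Hypothetical sketch of an iterative double-clustering loop.
# Assumption: k-means stands in for the information bottleneck
# clustering used in the actual DC/IDC methods.
import numpy as np
from sklearn.cluster import KMeans

def idc_sketch(counts, n_word_clusters=50, n_doc_clusters=10,
               n_iterations=5, random_state=0):
    """counts: (n_words, n_docs) word-document co-occurrence matrix."""
    n_words, n_docs = counts.shape
    # Start with each document in its own cluster, so the first
    # word-clustering pass sees the raw document dimension.
    doc_labels = np.arange(n_docs)

    for _ in range(n_iterations):
        # Step 1: represent each word by its distribution over the
        # current document clusters, then cluster the words.
        n_doc_groups = doc_labels.max() + 1
        word_repr = np.zeros((n_words, n_doc_groups))
        for d in range(n_docs):
            word_repr[:, doc_labels[d]] += counts[:, d]
        word_repr /= word_repr.sum(axis=1, keepdims=True) + 1e-12
        word_labels = KMeans(n_clusters=n_word_clusters, n_init=10,
                             random_state=random_state).fit_predict(word_repr)

        # Step 2: represent each document by its distribution over the
        # new word clusters, then cluster the documents.
        doc_repr = np.zeros((n_docs, n_word_clusters))
        for w in range(n_words):
            doc_repr[:, word_labels[w]] += counts[w, :]
        doc_repr /= doc_repr.sum(axis=1, keepdims=True) + 1e-12
        doc_labels = KMeans(n_clusters=n_doc_clusters, n_init=10,
                            random_state=random_state).fit_predict(doc_repr)

    return doc_labels, word_labels
```

The first pass reduces to the DC procedure (cluster words by their distributions over documents, then cluster documents by their distributions over word clusters); the subsequent iterations re-estimate the word clusters against the evolving document partition, which is the "iterative" refinement the abstract refers to.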
References
[12] Noam Slonim and Naftali Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR 2000, 2000.