ISSN: 0899-7667
E-ISSN: 1530-888X
2014 Impact factor: 2.21

Neural Computation

September 2010, Vol. 22, No. 9, Pages 2229-2284
(doi: 10.1162/NECO_a_00012)
© 2010 Massachusetts Institute of Technology
Topographic Mapping of Large Dissimilarity Data Sets
Abstract

Topographic maps such as the self-organizing map (SOM) or neural gas (NG) constitute powerful data mining techniques that simultaneously cluster data and infer their topological structure, so that additional features, for example, browsing, become available. Both methods have been introduced for vectorial data sets; they require a classical feature encoding of information. Often, however, data are available only in the form of pairwise distances, such as those arising from a kernel matrix, a graph, or some general dissimilarity measure. In such cases, NG and SOM cannot be applied directly. In this article, we introduce relational topographic maps as an extension of relational clustering algorithms, which offer prototype-based representations of dissimilarity data, to incorporate neighborhood structure. These methods are equivalent to the standard (vectorial) techniques if a Euclidean embedding exists, while avoiding the need to compute such an embedding explicitly. Extending these techniques to the general case of non-Euclidean dissimilarities makes it possible to interpret relational clustering as clustering in pseudo-Euclidean space. We compare the methods to well-known clustering methods for proximity data based on deterministic annealing and discuss how far convergence can be guaranteed in the general case. The cost of relational clustering is quadratic in the number of data points, which makes the algorithms infeasible for huge data sets. We propose an approximate patch version of relational clustering that runs in linear time. The effectiveness of the methods is demonstrated in a number of examples.
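
The claim that relational methods match their vectorial counterparts whenever a Euclidean embedding exists can be illustrated with a small sketch. It assumes the usual relational-clustering convention, which the abstract itself does not spell out: each prototype is an implicit convex combination w = sum_i alpha_i x_i with coefficients summing to 1, and D holds squared pairwise dissimilarities. Under that assumption, the squared distance from x_i to w is (D alpha)_i - 0.5 * alpha^T D alpha, so only D is needed. The function and variable names below are hypothetical and chosen for illustration only.

```python
import random


def sq_dists(points):
    """Pairwise squared Euclidean distances as a list-of-lists matrix."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def relational_distance(D, alpha):
    """Squared distance from every data point to an implicit prototype.

    Assumption: the prototype is w = sum_i alpha[i] * x_i with
    sum(alpha) == 1, and D[i][l] is the squared dissimilarity between
    points i and l. Then d(x_i, w) = (D alpha)_i - 0.5 * alpha^T D alpha.
    """
    n = len(D)
    D_alpha = [sum(D[i][l] * alpha[l] for l in range(n)) for i in range(n)]
    self_term = 0.5 * sum(alpha[i] * D_alpha[i] for i in range(n))
    return [D_alpha[i] - self_term for i in range(n)]


# Compare against an explicit embedding on random Euclidean data.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
D = sq_dists(X)

raw = [rng.random() for _ in X]
alpha = [r / sum(raw) for r in raw]  # coefficients normalised to sum to 1

# Explicit prototype, available only because X is known here.
w = [sum(a * x[k] for a, x in zip(alpha, X)) for k in range(3)]
explicit = [sum((xk - wk) ** 2 for xk, wk in zip(x, w)) for x in X]

# Implicit computation that uses only the dissimilarity matrix.
implicit = relational_distance(D, alpha)
max_gap = max(abs(e - r) for e, r in zip(explicit, implicit))
```

On Euclidean data, `max_gap` should be zero up to floating-point rounding. For a non-Euclidean D the same formula can still be evaluated, but the resulting values may be negative, which is consistent with the pseudo-Euclidean interpretation mentioned in the abstract. Evaluating the formula touches every entry of D for each prototype, which reflects the quadratic cost in the number of data points.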