| |
Abstract:
The popular K-means clustering partitions a data set by
minimizing a sum-of-squares cost function. A coordinate descend
method is then used to find local minima. In this paper we show
that the minimization can be reformulated as a trace maximization
problem associated with the Gram matrix of the data vectors.
Furthermore, we show that a relaxed version of the trace
maximization problem possesses
global
optimal solutions which can be obtained by computing a partial
eigendecomposition of the Gram matrix, and the cluster assignment
for each data vectors can be found by computing a pivoted QR
decomposition of the eigenvector matrix. As a by-product we also
derive a lower bound for the minimum of the sum-of-squares cost
function.
|