Kernels are key components of pattern recognition mechanisms. We propose a universal kernel optimality criterion, which is independent of the classifier to be used. Defining data polarization as a process by which points of different classes are driven to geometrically opposite locations in a confined domain, we propose selecting the kernel parameter values that polarize the data in the associated feature space. Conversely, the kernel is said to be polarized by the data. Kernel polarization gives rise to an unconstrained optimization problem. We show that complete kernel polarization yields consistent classification by kernel-sum classifiers. Tested on real-life data, polarized kernels demonstrate a clear advantage over the Euclidean distance in proximity classifiers. Embedded in a support vectors classifier, kernel polarization is found to yield about the same performance as exhaustive parameter search.