Abstract:
Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems, but they suffer from the complexity of their training algorithm, which is at least quadratic in the number of examples. It is therefore hopeless to try to solve real-life problems with more than a few hundred thousand examples using SVMs. This paper proposes a new mixture of SVMs that can easily be implemented in parallel and in which each SVM is trained on a small subset of the whole dataset. Experiments on a large benchmark dataset (Forest) as well as on a difficult speech database yielded significant improvements in training time (which appears empirically to grow only linearly, at least locally, with the number of examples). In addition, and surprisingly, a significant improvement in generalization was observed on Forest.
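
To make the scheme concrete, the following is a minimal sketch of the general idea in Python using scikit-learn. The expert count, subset size, kernel choice, and the plain averaging of expert outputs are all illustrative assumptions; in particular, the average stands in for whatever combining mechanism the full paper specifies, which the abstract does not describe.

```python
# Minimal sketch: train several SVM "experts", each on a small random
# subset of the data, then combine their outputs. All hyperparameters
# below are assumptions for illustration, not the paper's settings.
import numpy as np
from sklearn.svm import SVC

def train_svm_mixture(X, y, n_experts=8, subset_size=2000, seed=0):
    """Train n_experts SVMs, each on its own random subset of (X, y)."""
    rng = np.random.default_rng(seed)
    experts = []
    for _ in range(n_experts):
        idx = rng.choice(len(X), size=min(subset_size, len(X)),
                         replace=False)
        expert = SVC(kernel="rbf", gamma="scale")
        # Each fit sees only a small subset and is independent of the
        # others, so the experts can be trained in parallel.
        expert.fit(X[idx], y[idx])
        experts.append(expert)
    return experts

def predict_mixture(experts, X):
    """Average the experts' decision values (assumes binary 0/1 labels);
    a simple average stands in here for the paper's combiner."""
    scores = np.mean([e.decision_function(X) for e in experts], axis=0)
    return (scores > 0).astype(int)
```

Because every expert trains on a subset of fixed size, the per-expert cost is bounded, which is consistent with the locally linear scaling in training time reported above.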