Abstract:
Estimating insurance premia from data is a difficult
regression problem for several reasons: the large number of
variables, many of which are discrete, and the very peculiar
shape of the noise distribution, asymmetric with fat tails, with
a large majority of zeros and a few unreliable and very large
values. We compare several machine learning methods for
estimating insurance premia, and test them on a large database
of car insurance policies. We find that function approximation
methods that do not optimize a squared loss, like Support Vector
Machine regression, do not work well in this context. The methods
compared include decision trees and generalized linear models. The
best results are obtained with a mixture of experts, which better
identifies the least and most risky contracts, and makes it possible
to reduce the median premium by charging more to the most risky
customers.
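The role of the squared loss can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration and is not taken from the study: the 5% claim frequency, the Pareto-distributed claim amounts, the scale of 3000, and the helper name `best_constant`. The absolute loss is used only as a simple stand-in for a loss other than the squared loss. With a large majority of zeros, the best constant prediction under the absolute loss is the median, which is zero, while the best constant under the squared loss tracks the mean claim, which is the quantity a premium has to cover.

```python
import numpy as np

# Synthetic claims (all parameters are illustrative assumptions):
# about 95% of contracts have no claim, the rest have heavy-tailed amounts.
rng = np.random.default_rng(0)
n = 100_000
has_claim = rng.random(n) < 0.05
amounts = rng.pareto(2.5, size=n) * 3000.0
claims = np.where(has_claim, amounts, 0.0)


def best_constant(loss, candidates):
    """Return the candidate constant with the lowest average loss on `claims`."""
    losses = [loss(claims - c).mean() for c in candidates]
    return candidates[int(np.argmin(losses))]


# Candidate constant premia on a grid with a step of 1.
candidates = np.linspace(0.0, 500.0, 501)

c_sq = best_constant(np.square, candidates)  # squared loss
c_abs = best_constant(np.abs, candidates)    # absolute loss

print("sample mean of claims:     ", claims.mean())
print("sample median of claims:   ", np.median(claims))
print("best constant, squared loss:", c_sq)
print("best constant, absolute loss:", c_abs)
```

In this toy setting, a premium equal to the absolute-loss optimum would charge nothing at all, which is one way to see why the choice of loss matters for noise of this shape.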