Abstract:
Recent theoretical results for pattern classification with thresholded
real-valued functions (such as support vector machines,
sigmoid networks, and boosting) give bounds on misclassification
probability that do not depend on the size of the classifier, and
hence can be considerably smaller than the bounds that follow from
the VC theory. In this paper, we show that these techniques can be
more widely applied, by representing other boolean functions as
two-layer neural networks (thresholded convex combinations of
boolean functions). For example, we show that with high probability
any decision tree of depth no more than $d$ that is consistent with
$m$ training examples has misclassification probability no more than
$O\big(\big(\tfrac{1}{m}\,N_{\mathrm{eff}}\,\bar{d}\,\mathrm{VCdim}(\mathcal{U})\,\log^2 m\big)^{1/2}\big)$,
where $\mathcal{U}$ is the class of node decision functions,
$N_{\mathrm{eff}}$ can be thought of as the effective number of leaves
(it becomes small as the distribution on the leaves induced by the
training data gets far from uniform), and $\bar{d}$ is the average
leaf depth (averaged over the training data). This
bound is qualitatively different from the VC bound and can be
considerably smaller.
We use the same technique to give similar results for DNF
formulae.
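
As a hypothetical illustration of the quantities appearing in the bound, the sketch below computes an effective number of leaves and an average leaf depth from the empirical distribution of training examples over a tree's leaves. The definition $N_{\mathrm{eff}} = (\sum_i \sqrt{P_i})^2$ used here is an assumption chosen only to match the qualitative description above (it equals the true leaf count $N$ for a uniform leaf distribution and shrinks as the distribution grows skewed); the paper's precise definition may differ, and the leaf counts and depths are invented.

```python
import numpy as np

# Hypothetical illustration (assumptions, not the paper's definitions):
# given the fraction P_i of training examples reaching each leaf of a
# decision tree and each leaf's depth, compute
#   * an "effective number of leaves", here taken as (sum_i sqrt(P_i))**2,
#     which is N when P is uniform and shrinks as P grows skewed;
#   * the leaf depth averaged over the training data, sum_i P_i * d_i.

def effective_num_leaves(leaf_counts):
    """(sum_i sqrt(P_i))^2: lies in [1, N], equal to N iff P is uniform."""
    p = np.asarray(leaf_counts, dtype=float)
    p = p / p.sum()                  # empirical distribution over leaves
    return float(np.sqrt(p).sum() ** 2)

def average_leaf_depth(leaf_counts, leaf_depths):
    """Leaf depth averaged over the training examples: sum_i P_i * d_i."""
    p = np.asarray(leaf_counts, dtype=float)
    p = p / p.sum()
    return float(p @ np.asarray(leaf_depths, dtype=float))

# A skewed tree: 5 leaves, but most of the data lands in the two shallow ones.
counts = [500, 450, 30, 15, 5]       # training examples reaching each leaf
depths = [1, 2, 3, 4, 4]             # depth of each leaf

print(effective_num_leaves(counts))        # ~3.0, well below the 5 leaves
print(average_leaf_depth(counts, depths))  # ~1.6, close to the shallow depths
```

For a uniform distribution over the same five leaves the first quantity would be exactly 5; the skew of the training data over the leaves is what makes the effective number of leaves, and hence a bound of the form above, small.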