Abstract:
Unsupervised learning algorithms have been derived for several
statistical models of English grammar, but their computational
complexity makes applying them to large datasets intractable.
This paper presents a probabilistic model of English grammar that
is much simpler than conventional models yet admits an efficient
EM training algorithm. The model is based upon
grammatical bigrams, i.e., syntactic relationships between pairs
of words. We present the results of experiments that quantify the
representational adequacy of the grammatical bigram model, its
ability to generalize from labelled data, and its ability to
induce syntactic structure from large amounts of raw text.