Quarterly (March, June, September, December)
160 pp. per issue
6 3/4 x 10
ISSN
0891-2017
E-ISSN
1530-9312
2014 Impact factor:
1.23

Computational Linguistics

Paola Merlo, Editor
September 2004, Vol. 30, No. 3, Pages 253-276
(doi: 10.1162/0891201041850894)
© 2004 Association for Computational Linguistics
Sample Selection for Statistical Parsing
Article PDF (197.66 KB)
Abstract

Corpus-based statistical parsing relies on using large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes to use sample selection to find helpful training examples and reduce human effort spent on annotating less informative ones. We consider several criteria for predicting whether unlabeled data might be a helpful training example. Experiments are performed across two syntactic learning tasks and within the single task of parsing across two learning models to compare the effect of different predictive criteria. We find that sample selection can significantly reduce the size of annotated training corpora and that uncertainty is a robust predictive criterion that can be easily applied to different learning models.