Computational Linguistics

Paola Merlo, Editor
March 2016, Vol. 42, No. 1, Pages 55-90
(doi: 10.1162/COLI_a_00242)
Integrating Selectional Constraints and Subcategorization Frames in a Dependency Parser
Statistical parsers are trained on treebanks composed of a few thousand sentences. To avoid data sparseness and keep computational complexity manageable, such parsers make strong independence assumptions about the decisions made when building a syntactic tree. These assumptions decompose syntactic structures into small pieces, which in turn prevents the parser from adequately modeling many lexico-syntactic phenomena, such as selectional constraints and subcategorization frames. Moreover, treebanks are several orders of magnitude too small for many lexico-syntactic regularities, including these two, to be observed. In this article, we propose a solution to both problems: how to account for patterns that exceed the size of the pieces modeled in the parser, and how to acquire subcategorization frames and selectional constraints from raw corpora and incorporate them into the parsing process. The proposed method was evaluated on French and on English. On French, it reduced selectional constraint violations by 41.6% and erroneous subcategorization frame assignments by 22%. The reductions are smaller for English: 16.21% and 8.83%, respectively.
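The argument that independence assumptions split a tree into pieces too small to capture a subcategorization frame can be made concrete with a toy example. The sketch below assumes first-order (arc-factored) scoring, one common instance of such a decomposition. It is not the parser or the method described in this article, and the words, relation labels, and scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# A dependency arc: (head word, relation label, dependent word).
Arc = Tuple[str, str, str]


def arc_factored_score(arcs: List[Arc], arc_score: Dict[Arc, float]) -> float:
    """Score a tree as a sum of per-arc scores, each computed independently."""
    return sum(arc_score.get(arc, 0.0) for arc in arcs)


def subcat_frame(arcs: List[Arc], head: str) -> Tuple[str, ...]:
    """Return the sorted relation labels of all dependents of `head` (its frame)."""
    return tuple(sorted(label for h, label, _ in arcs if h == head))


def frame_aware_score(
    arcs: List[Arc],
    arc_score: Dict[Arc, float],
    head: str,
    frame_score: Dict[Tuple[str, ...], float],
) -> float:
    """Arc-factored score plus a term that sees all of `head`'s dependents at once."""
    return arc_factored_score(arcs, arc_score) + frame_score.get(
        subcat_frame(arcs, head), 0.0
    )


# Hypothetical arc scores: the direct-object arc to "Mary" is locally more
# attractive than the indirect-object arc, so a purely arc-factored model
# ends up attaching two direct objects to "give".
ARC_SCORE: Dict[Arc, float] = {
    ("give", "subj", "John"): 2.0,
    ("give", "obj", "book"): 1.5,
    ("give", "obj", "Mary"): 1.0,
    ("give", "iobj", "Mary"): 0.8,
}

# Hypothetical frame-level scores: penalize two direct objects,
# reward the subject + object + indirect-object frame.
FRAME_SCORE: Dict[Tuple[str, ...], float] = {
    ("obj", "obj", "subj"): -2.0,
    ("iobj", "obj", "subj"): 1.0,
}

tree_a: List[Arc] = [("give", "subj", "John"), ("give", "obj", "book"), ("give", "obj", "Mary")]
tree_b: List[Arc] = [("give", "subj", "John"), ("give", "obj", "book"), ("give", "iobj", "Mary")]

if __name__ == "__main__":
    for name, tree in (("A", tree_a), ("B", tree_b)):
        print(
            name,
            subcat_frame(tree, "give"),
            arc_factored_score(tree, ARC_SCORE),
            frame_aware_score(tree, ARC_SCORE, "give", FRAME_SCORE),
        )
```

With these made-up numbers, the arc-factored score prefers tree A (4.5 vs. 4.3) even though it gives "give" two direct objects, because no single arc ever sees the whole frame. Adding a frame-level term reverses the ranking (2.5 vs. 5.3). This illustrates the kind of pattern that exceeds the size of the modeled pieces; how the article actually acquires and integrates such frames is not shown here.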