Abstract:
Theories of human sentence processing have largely been shaped
by the study of its pathologies:
principles and mechanisms seek to explain the difficulty people
have in comprehending structures that are ambiguous or
memory-intensive. While often insightful, this approach diverts
attention from the fact that people are in fact extremely
accurate and effective in understanding the vast majority of
utterances they encounter. In this talk, we argue for the
importance of studying the behaviour of robust, accurate, and
broad-coverage parsing systems as models of human performance.
In particular, we present the results of experiments conducted
using
the Incremental Cascaded Markov Model (ICMM) (Crocker and Brants
1999). The model is consistent with accounts of human language
processing that advocate probabilistic mechanisms for parsing and
disambiguation (e.g. Jurafsky 1996; MacDonald et al. 1994;
Tanenhaus et al. 1999; Crocker and Corley, to appear). ICMM is a
maximum-likelihood model that uses a generalisation of the hidden
Markov model. Such models have been previously defended as good
psychological models of lexical category disambiguation (Corley
and Crocker 1999). The ICMM uses layered, or cascaded, Markov
models (CMMs) to determine the most likely syntactic analyses for
a given input (Brants 1999). To make the model more
psychologically plausible it has been adapted to incrementally
select a subset (beam) of preferred syntactic analyses (Crocker
and Brants 1999).
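The incremental beam mechanism can be illustrated with a minimal sketch of beam-restricted Viterbi-style decoding over an HMM-like tagging model. All state names, words, and probabilities below are invented for illustration; they are not taken from the ICMM implementation, which cascades several such Markov models over layers of syntactic structure.

```python
import math

# Toy transition and emission tables (illustrative only, not ICMM's
# actual parameters). States here are lexical categories.
TRANS = {
    ("START", "Det"): 0.6, ("START", "N"): 0.4,
    ("Det", "N"): 0.9, ("Det", "Adj"): 0.1,
    ("N", "V"): 0.8, ("N", "N"): 0.2,
    ("Adj", "N"): 1.0,
    ("V", "Det"): 0.5, ("V", "N"): 0.5,
}
EMIT = {
    ("Det", "the"): 0.9,
    ("N", "dog"): 0.5, ("N", "barks"): 0.1,
    ("V", "barks"): 0.7,
}

def beam_decode(words, beam_width=2):
    """Incremental decoding that prunes to `beam_width` hypotheses
    after each word, rather than tracking the full distribution."""
    # Each hypothesis is a (log probability, state sequence) pair.
    beam = [(0.0, ["START"])]
    for word in words:
        candidates = []
        for logp, states in beam:
            prev = states[-1]
            for (s1, s2), t in TRANS.items():
                if s1 != prev:
                    continue
                e = EMIT.get((s2, word), 0.0)
                if e > 0.0:
                    candidates.append(
                        (logp + math.log(t) + math.log(e), states + [s2])
                    )
        # Prune: keep only the most probable partial analyses (the beam).
        beam = sorted(candidates, reverse=True)[:beam_width]
    return beam

best = beam_decode(["the", "dog", "barks"])
print(best[0][1][1:])  # most probable category sequence
```

Because pruning happens word by word, a globally best analysis can fall out of the beam early, which is exactly the property that lets such a model predict garden-path effects.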
We summarise the results of simulations with the system,
showing that it accounts for a range of observed results from
psycholinguistic experiments. These include NP/S complement
ambiguities, reduced relatives, noun-verb category ambiguities,
and 'that'-ambiguities. We also show how independently motivated
properties of the system yield standardly observed recency
effects (e.g. low attachment). Interestingly, the model also
accounts for the experimental findings of Pickering et al. (to
appear), which contradict the predictions of a pure maximum
likelihood model. The reason for the differing predictions of the
ICMM and 'pure' likelihood models lies in an independently
motivated account of how probabilities should be calculated,
which effectively gives higher probabilities to 'simpler'
structures. This can be seen as a partial approximation of
Pickering et al.'s Informativity measure. Time permitting, we will
discuss other techniques for implementing Informativity in the
parser.
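The simplicity bias described above can be made concrete with a toy calculation, assuming a product-of-rules probability model (the numbers and rules are invented for illustration, not drawn from the ICMM grammar): each extra derivation step multiplies in another factor of at most 1, so an analysis built from fewer steps tends to receive a higher probability even when the individual rules are equally likely.

```python
# Toy illustration (not ICMM's actual grammar): under a model that
# multiplies rule probabilities, structural complexity is penalised
# simply because every additional rule contributes a factor <= 1.

def derivation_prob(rule_probs):
    """Probability of an analysis as the product of its rule probabilities."""
    p = 1.0
    for rp in rule_probs:
        p *= rp
    return p

# A "simple" analysis built from two rule applications...
simple = derivation_prob([0.5, 0.5])               # 0.25
# ...versus a "complex" one needing four equally probable rules.
complex_ = derivation_prob([0.5, 0.5, 0.5, 0.5])   # 0.0625

print(simple > complex_)  # the simpler structure scores higher
```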
In the final part of the talk, we present the results of
general parsing performance experiments. We show the accuracy of
the system with respect to a parsed corpus (the gold standard)
and in comparison to the optimised non-incremental model. In
conclusion, we argue that broad-coverage probabilistic parsing
models, and the ICMM in particular, provide a valuable framework
for explaining both accurate processing of "garden variety"
language and garden-path phenomena.
References:
Brants, T. (1999). Cascaded Markov Models. In: Proceedings of the
9th Conference of the European Chapter of the Association for
Computational Linguistics (EACL-99), Bergen, Norway.
Chater, N., Crocker, M. & Pickering, M. (1998). The Rational
Analysis of Inquiry: The Case for Parsing. In: Chater &
Oaksford (eds), Rational Models of Cognition, pp. 441-468.
Oxford University Press, Oxford, UK.
Corley, S. & Crocker, M. W. (1999). The Modular Statistical
Hypothesis: Exploring Lexical Category Ambiguity. In: Crocker,
Pickering & Clifton (eds), Architectures and Mechanisms for
Language Processing. CUP, England.
Crocker, M. & Brants, T. (1999). Incremental Probabilistic
Models of Human Linguistic Performance. Paper presented at
AMLaP-99, Edinburgh, UK.