Abstract:
The subcategorization preference of a verb is significantly
affected by its discourse context and its word sense. This suggests
that processing models must (1) represent subcategorization biases
at the level of the semantic lemma ('sense bias') rather than the
orthographic word ('verb bias'), and (2) be able to integrate
multiple probabilistic factors (sense bias, genre bias, animacy
bias, etc.). Subcategorization frequencies also depend strongly on
the method used to compute them.
By comparing subcategorization frequencies across six corpora,
namely Connine et al.'s (1984) sentence-production data, Garnsey
et al.'s (1997) sentence-completion data, written text (the Brown,
Wall Street Journal, and British National corpora), and
conversational data (the Switchboard corpus), we show that:
1) Different verb senses have different subcategorization
probabilities.
Processing models generally assume that subcategorization biases
are specific to each verb entry (Clifton, Frazier, & Connine,
1994; Garnsey, 1997; Spivey-Knowlton & Sedivy, 1995; Trueswell
et al., 1993, inter alia). Our results reveal biases at the level
of the semantic lemma ('sense bias') rather than the orthographic
word ('verb bias').
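To make the distinction between 'verb bias' and 'sense bias' concrete, the following minimal sketch estimates both from a list of annotated tokens. Everything in it is hypothetical: the sense labels `pass_test` and `pass_by`, the frame label `ZERO` for zero-argument uses, the helper `frame_probabilities`, and all counts are invented for illustration and are not data from any of the six corpora.

```python
from collections import Counter, defaultdict

# Hypothetical annotated tokens: (verb, sense, subcategorization frame).
# Sense labels, the "ZERO" frame label, and all counts are invented.
TOKENS = (
    [("pass", "pass_test", "DO")] * 8
    + [("pass", "pass_test", "ZERO")] * 2
    + [("pass", "pass_by", "PP")] * 6
    + [("pass", "pass_by", "ZERO")] * 4
)


def normalize(counts):
    """Turn a Counter of frame counts into relative frequencies."""
    total = sum(counts.values())
    return {frame: n / total for frame, n in counts.items()}


def frame_probabilities(tokens, key):
    """Estimate P(frame | unit), where key(verb, sense) picks the unit."""
    by_unit = defaultdict(Counter)
    for verb, sense, frame in tokens:
        by_unit[key(verb, sense)][frame] += 1
    return {unit: normalize(counts) for unit, counts in by_unit.items()}


# 'Verb bias': pool every sense of the orthographic word.
verb_bias = frame_probabilities(TOKENS, key=lambda verb, sense: verb)
# 'Sense bias': keep each semantic lemma separate.
sense_bias = frame_probabilities(TOKENS, key=lambda verb, sense: sense)

print(verb_bias["pass"])        # {'DO': 0.4, 'ZERO': 0.3, 'PP': 0.3}
print(sense_bias["pass_test"])  # {'DO': 0.8, 'ZERO': 0.2}
print(sense_bias["pass_by"])    # {'PP': 0.6, 'ZERO': 0.4}
```

With these toy counts, the pooled verb-level estimate for a DO frame (0.4) lies between the two sense-level estimates (0.8 and 0). A verb-bias representation hides exactly this kind of averaging.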
2) Genre constrains the subcategorization probabilities of a verb.
Following work by Merlo (1994), Roland and Jurafsky (1998)
showed that subcategorization frequencies in elicited data (Connine
et al., 1984) differed from those in natural corpora: elicited
sentences contain more PPs but fewer passives and fewer
zero-argument frames. They gave functional explanations for these
differences. We extend this work by showing that discourse topic,
verbal aspect, and the animacy of the syntactic subject all affect
verb sense, and thereby subcategorization. For example, when
participants in Connine et al.'s study were asked to use the verb
pass with the discourse topic 'school', they tended to use the
'pass a test' sense, a direct-object (DO) biased sense that is rare
in the Brown and WSJ data.
Perfective aspect correlates with DO complementation and
imperfective aspect with sentential complements (SC) (Dowty, 1990);
such aspectual biases were found in the elicited data. Additionally,
we found that inanimate-subject senses of verbs like worry were
preempted by the generally animate subjects of the elicited
sentences.
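The claim that context affects subcategorization through verb sense can be written as a marginalization over senses. This is a minimal sketch under a simplifying assumption that is ours rather than the authors': the frame f depends on the context c (topic, aspect, subject animacy) only through the sense s of verb v, so any direct effect of context on the frame is ignored. The symbols and the numbers in the worked lines are hypothetical.

```latex
P(f \mid v, c) = \sum_{s \in \mathrm{senses}(v)} P(f \mid s)\, P(s \mid v, c)

% Hypothetical values: two senses with P(DO | s_1) = 0.8, P(DO | s_2) = 0.1
\begin{aligned}
P(\mathrm{DO} \mid v, c_A) &= 0.8 \times 0.2 + 0.1 \times 0.8 = 0.24
  && \text{if } P(s_1 \mid v, c_A) = 0.2 \\
P(\mathrm{DO} \mid v, c_B) &= 0.8 \times 0.7 + 0.1 \times 0.3 = 0.59
  && \text{if } P(s_1 \mid v, c_B) = 0.7
\end{aligned}
```

Here the sense-level probabilities P(f | s) stay fixed, and only the sense weights P(s | v, c) change with context, yet the verb-level DO probability more than doubles.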
As sentence-processing models grow more probabilistic, we need a
deeper understanding of the constraints that affect these
probabilities, both to ensure that our measures are
methodologically sound and to represent accurately the complex
factors that influence our models.