Abstract:
Recent constraint-based sentence comprehension theories make
extensive use of argument structure frequency information in
explaining ambiguity resolution (e.g., Garnsey et al., 1997;
MacDonald et al., 1994; Trueswell & Tanenhaus, 1994). However,
while such frequency measures often predict comprehenders' choices
for ambiguities, the source of frequency differences remains
unclear. One possibility is that they are simply the result of
(historically) early random variation, reinforced over generations
of use. Alternatively, frequency differences might reflect
underlying differences in the frequency of semantic alternatives
(e.g., extending Levin, 1993), which can explain argument structure
frequency differences by appealing to differences in the frequency
of events in the world. A second concern is that very little
research has examined the appropriateness of corpus- versus
survey-based frequency measures. These measures have generally been
treated as interchangeable, but they do not necessarily reflect the
same underlying distributions.
To examine these issues, we used data for verbs (e.g., proposed)
which can take sentence complements (SCs) and corresponding
SC-taking nouns (e.g., proposal), from both the Penn Treebank's
Wall Street Journal (WSJ) corpus and sentence-completion surveys.
In the surveys, participants continued fragments such as
"Bill proposed" (verbs) or "Caroline ignored the proposal" (nouns)
to form complete sentences. Participants completed either
a verb or a noun survey. Approximately 100 tokens of each word were
coded from each source.
We compared corresponding nouns and verbs on the probability of
taking an SC. If frequency biases are a matter of random variation,
they should not be expected to correlate across noun-verb pairs.
Alternatively, noun-verb pairs share substantial semantic
information, and thus if frequency biases are determined by
semantics, the biases across pairs should correlate. In both the
survey and WSJ sources, the SC probabilities were strongly
correlated (r = .49, p < .001; r = .67, p < .001, respectively),
suggesting that semantics
has a substantial influence on frequency biases. We further
examined these relationships by sorting the nouns and verbs into
semantic subcategories following Levin (1993) and Wierzbicka (1987)
and comparing the subcategories. Levin's subcategories provided
further evidence that fine-grained semantics is related to argument
structure frequency.
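The noun-verb comparison described above can be illustrated with a minimal sketch. It assumes that each word's SC probability is the proportion of its coded tokens that take an SC and that the reported r is a Pearson correlation; the abstract states neither detail explicitly. The function names and the counts in `toy_pairs` are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def sc_probability(sc_count, total_count):
    """Proportion of coded tokens of a word that take a sentence complement."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return sc_count / total_count


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (SC tokens, total coded tokens) for each noun-verb pair.
# These numbers are invented for illustration only.
toy_pairs = {
    "pair_a": {"verb": (40, 100), "noun": (30, 100)},
    "pair_b": {"verb": (10, 100), "noun": (15, 100)},
    "pair_c": {"verb": (70, 100), "noun": (45, 100)},
    "pair_d": {"verb": (25, 100), "noun": (35, 100)},
}

verb_probs = [sc_probability(*p["verb"]) for p in toy_pairs.values()]
noun_probs = [sc_probability(*p["noun"]) for p in toy_pairs.values()]

r = pearson_r(verb_probs, noun_probs)
print(f"noun-verb SC probability correlation: r = {r:.2f}")
```

Under the random-variation account, r computed this way should hover near zero across pairs; under the semantic account, it should be reliably positive.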
To examine the relationship between corpora and surveys, we
compared the sources separately for verbs and nouns, using the same
SC probability measure. For verbs, the correlation was marginal
(r = .26); for the nouns, the sources were strongly correlated
(r = .59, p < .001), and this pattern also appeared for other
probability measures. These results indicate that corpus and
survey sources do tend to agree, but the relative weakness of the
verb correlations suggests a need for consideration of both types
of sources in using frequency information to predict comprehension
difficulty.