Abstract:
Verb transitivity biases, and verb
subcategorization probabilities generally, play an important role
in models of sentence processing. Unfortunately, counts
from different corpora and from psychological norming studies,
though generally positively correlated, differ considerably (see,
e.g., Merlo, 1994; Lapata et al., 2000), partly due to
differences in text genre and verb senses (e.g., Biber, 1988;
Roland & Jurafsky, 1998).
To make matters worse, different studies have
applied different criteria for transitivity. In this paper,
we describe the results of a new norming study with a coding
system that allows us to investigate how coding decisions affect
transitivity biases, and we discuss two types of syntactic
constructions --- adjectival passives and verb+particle
constructions --- that affect transitivity counts
considerably.
The norming study is based on British and American
English corpora (Francis & Kucera, 1982; Zeno et al., 1995),
hand-labeled by a group of Linguistics graduate students.
Our coding scheme distinguishes 15 different patterns, including
passives and sentential complements. Between 100 and 200
occurrences of each of 300 verbs were coded. One result of the
project is a detailed labelers' manual discussing many of the
complications surrounding subcategorization counts.
Both "absolute" and "relative" criteria have been
used in describing verb transitivity biases. By the
"absolute" criterion, a verb is considered highly transitive if
the proportion of transitive uses exceeds some pre-determined
cut-off point, say 50%. By the "relative" method, the
proportion of transitive uses needs to exceed that of some
alternative pattern, say sentential complements. Studies
comparing different corpora have generally applied some version
of the "absolute" method, comparing the percentages of, for
example, transitive uses of a particular verb in different
corpora. However, most experimental studies of the
behavioral effects of verb biases have relied on the "relative"
method. One conclusion emerging from our cross-corpus
comparisons is that agreement among different corpora is
considerably higher if the "relative" method is used.
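The two criteria can be contrasted in a minimal sketch. The counts, the pattern labels, and the function names below are invented for illustration and are not taken from the norming data; the 50% cut-off and the sentential-complement alternative follow the examples given above.

```python
from collections import Counter


def transitive_proportion(counts):
    """Share of all coded occurrences that are transitive."""
    total = sum(counts.values())
    return counts["transitive"] / total if total else 0.0


def is_transitive_biased_absolute(counts, cutoff=0.5):
    """'Absolute' criterion: transitive share exceeds a fixed cut-off."""
    return transitive_proportion(counts) > cutoff


def is_transitive_biased_relative(counts, alternative="sentential_complement"):
    """'Relative' criterion: transitive uses outnumber one alternative pattern."""
    return counts["transitive"] > counts[alternative]


# Hypothetical counts for one verb in two corpora (invented numbers).
corpus_a = Counter(transitive=45, sentential_complement=20, other=35)
corpus_b = Counter(transitive=60, sentential_complement=25, other=15)

for name, counts in (("A", corpus_a), ("B", corpus_b)):
    print(
        name,
        "absolute:", is_transitive_biased_absolute(counts),
        "relative:", is_transitive_biased_relative(counts),
    )
```

With these invented counts the two corpora disagree under the absolute criterion (45% versus 60% transitive) but agree under the relative one, since transitive uses outnumber sentential complements in both. This is only one way the pattern described above could arise.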
Adjectival passives (or "pseudo-passives" and
"semi-passives", cf. Quirk et al., 1985) superficially resemble
true passives, but their syntactic and aspectual properties are
those of adjectives, not verbs (cf. the examples in (1)
below). Adjectival passives are frequently counted as
transitive verb occurrences in available norming studies (e.g.,
Lalami, 2000; Lapata et al., to appear), partly because
form-based automatic extraction methods are unable to distinguish
adjectival and true passives. Analysis of our data shows
that adjectival passives account for an average of 8.1% of verb
occurrences, and as much as 85% of the "transitive" occurrences
of verbs like locate and delight. This affects the overall
counts considerably: for
example, the transitivity biases of 16 out of the 59
"Psych-verbs" (Levin, 1993) in our data set change from "high" to
"low" or "mid" if adjectival passives are excluded.
Similarly, verb+particle combinations (e.g., look it up) make up
an average of 35% of "transitive" verb occurrences and have been
variously counted as transitive or non-transitive in previous
studies. How these forms are actually processed in human
parsing may be unclear, but their treatment in estimating verb
biases significantly affects the databases underlying sentence
processing research.
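The effect of such a coding decision on a verb's bias category can be sketched as follows. The label names, the counts, and the one-third and two-thirds thresholds for "low", "mid", and "high" are assumptions made for this example; the abstract does not state the thresholds used in the study. The sketch also assumes that excluded adjectival passives are still counted in the denominator as non-transitive occurrences; dropping them from the denominator is another possible reading.

```python
def transitive_share(counts, transitive_labels):
    """Proportion of all occurrences whose label is counted as transitive."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts.get(label, 0) for label in transitive_labels) / total


def bias_bin(share, low=1 / 3, high=2 / 3):
    """Map a transitive share to a bias category (thresholds are hypothetical)."""
    if share >= high:
        return "high"
    if share < low:
        return "low"
    return "mid"


# Invented counts for a single verb; not taken from the norming study.
counts = {
    "active_transitive": 20,
    "true_passive": 10,
    "adjectival_passive": 50,
    "intransitive": 20,
}

# Two coding decisions: count adjectival passives as transitive, or not.
inclusive = {"active_transitive", "true_passive", "adjectival_passive"}
strict = inclusive - {"adjectival_passive"}

print("inclusive:", bias_bin(transitive_share(counts, inclusive)))
print("strict:   ", bias_bin(transitive_share(counts, strict)))
```

Under these assumptions the same verb is binned as "high" when adjectival passives count as transitive and as "low" when they do not. The same function could be applied to verb+particle occurrences by adding or removing a corresponding label.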
This analysis sheds further light on the sources
of differences among available corpus norms. The norming
study offers a valuable resource for research on sentence
processing.
(1) a. Adjectival Passive: The cloakrooms were located in the basement and hard to find.
    b. True Passive: The missing children were finally located.
References
Biber, D. (1988). Variation Across Speech
and Writing. Cambridge: Cambridge University Press.
Francis, W. & Kucera, H. (1982).
Frequency Analysis of English Usage: Lexicon and Grammar.
Boston: Houghton Mifflin.
Lalami, L. (1997). Frequency in Sentence
Comprehension. University of Southern California doctoral
dissertation.
Lapata, M., Keller, F., & Schulte im Walde, S.
(to appear). Verb frame frequency as a predictor of verb
bias. Journal of Psycholinguistic Research.
Levin, B. (1993). English Verb Classes and
Alternations. Chicago: University of Chicago Press.
Merlo, P. (1994). A corpus-based analysis of
verb continuation frequencies for syntactic processing.
Journal of Psycholinguistic Research, 23.6: 435-457.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J.
(1985). A Comprehensive Grammar of the English
Language. London: Longman.
Roland, D. & Jurafsky, D.
(1998). How verb subcategorization frequencies are affected
by corpus choice. Proceedings of COLING-ACL 1998, pp.
1117-1121.
Zeno, S. M., Ivens, S. H., Millard, R. T., &
Duvvuri, R. (1995). The Educator's Word Frequency
Guide. Touchstone Applied Science Associates, Inc.