|
Abstract:
The idea of structural preference principles is old (Kimball
73) but very useful in practice: structural preferences can be
tested on large data sets and make it possible to choose the correct
PP attachment in parse-ranking applications without degrading the
quality of parsing (Kinyon 99a, b). Two such useful principles are:
A- Prefer arguments over adjuncts (e.g. "J. prefers his daughter to
be honest" --> "to be honest" is an argument of "prefer" rather than
a sentence modifier).
B- Prefer to attach potential arguments to the closest potential
governor (e.g. "J. says that Peter talks to Mary" --> "to Mary" is
an argument of "talk" rather than of "say").
A recurrent objection to structural approaches is that they do not
take into account lexical preferences, such as the preferred
realizations of arguments for verbs (Trueswell 96). But:
1- Very little data is available regarding these preferences,
especially for languages other than English.
2- The interaction between two "preferred" realizations is unclear:
for "Jean remercie l'organisateur de la manifestation" (John thanks
the organizer for the demonstration / John thanks the organizer of
the demonstration), which attachment should be preferred, assuming
that "remercier NP1 de NP2" and "organisateur de NP1" are the
preferred realizations for "remercier" and for "organisateur"
respectively?
3- Structural preferences still have an effect: "John put the book
that you were reading in the library" seems incomplete, although it
is syntactically well-formed and "put N1 in N2" is a frequent
realization for "put".
4- Unknown words, for instance in the context of second language
acquisition, are still processed (and thus attached) although no
data is available regarding the preferred realizations of their
arguments.
Our hypothesis is that, regardless of which realization of
arguments a verb favors, if it can subcategorize a PP introduced by
a given preposition P, then in practice, when the verb and a PP
introduced by P appear in the same sentence, the PP is either an
argument of the verb or in a position where it cannot be an argument
of that verb (i.e. it is an argument of a closer potential governor,
it is located in another clause, such as inside a relative clause,
or it is a modifier, which happens only if the verb is already
saturated).
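The decision this hypothesis implies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the `Governor` record, its fields, and the `attach` function are hypothetical names, and the sketch assumes the caller has already discarded heads that sit in a different clause from the PP.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Governor:
    """A head that might take the PP as an argument (hypothetical structure)."""
    lemma: str
    position: int                 # token index of the head in the sentence
    subcat_preps: FrozenSet[str]  # prepositions the head can subcategorize
    saturated: bool = False       # True if the PP argument slot is already filled


def attach(prep: str, pp_position: int,
           governors: List[Governor]) -> Optional[Governor]:
    """Return the head taking the PP as an argument, or None if it is an adjunct."""
    # Principle A: keep every head for which an argument reading is available.
    candidates = [g for g in governors
                  if prep in g.subcat_preps and not g.saturated]
    if not candidates:
        return None  # no argument reading is possible: treat the PP as a modifier
    # Principle B: among those, prefer the closest potential governor.
    return min(candidates, key=lambda g: abs(pp_position - g.position))


# "J. says that Peter talks to Mary": both verbs can subcategorize "to".
say = Governor("say", 1, frozenset({"to"}))
talk = Governor("talk", 4, frozenset({"to"}))
chosen = attach("to", 5, [say, talk])
print(chosen.lemma if chosen else "adjunct")
```

Measuring closeness as token distance is a simplification; a parser would more plausibly use distance in the syntactic tree.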
To validate our hypothesis, we extracted the 100 most frequent
verbs in LeMonde, a 1-million-word annotated and chunked corpus
for French (Abeille & Clement 99). 56 of these verbs can
subcategorize PPs introduced by one or several prepositions, for a
total of 71 subcat frames. For each subcat frame, we then extracted
all the sentences where Verb and Prep cooccur and examined the
results manually.
Our main findings are the following:
1- Cases of possibly ambiguous attachment remain (13.86% of the
sentences examined).
2- 39% of these ambiguous cases are solved when attaching the PP to
the closest potential governor. Moreover, the attachment is deemed
correct in all cases.
3- The probability for a verb to realize as an argument a PP
introduced by a given preposition P does not help disambiguation and
does not predict the proportion of ambiguous attachments encountered
when examining sentences where Verb and P cooccur.
4- Rather, the preposition itself is important: "à" yields much more
ambiguity than other prepositions such as "avec" or "pour" because
it often introduces a temporal or locational expression (e.g. "à
l'assemblée nationale" / "à 3 heures"). In fact, 46% of the
ambiguous cases remaining after applying structural principles A and
B are solved by resorting to very simple semantic information: à +
location nouns and à + time nouns are overwhelmingly adjuncts and
not arguments.
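The semantic filter for "à" can be sketched as a simple lexicon lookup. This is a minimal illustration only: the two noun sets are toy, hypothetical lexicons, and the function name `is_likely_adjunct` is an assumption, not a resource or procedure described in the study.

```python
# Toy, hypothetical lexicons; a real system would need far broader coverage.
LOCATION_NOUNS = {"assemblée", "mairie", "paris"}
TIME_NOUNS = {"heures", "midi", "minuit"}


def is_likely_adjunct(prep: str, head_noun: str) -> bool:
    """Return True for 'à' + location noun or 'à' + time noun."""
    if prep.lower() != "à":
        return False  # the rule only concerns PPs introduced by "à"
    noun = head_noun.lower()
    return noun in LOCATION_NOUNS or noun in TIME_NOUNS


print(is_likely_adjunct("à", "heures"))     # "à 3 heures"
print(is_likely_adjunct("à", "assemblée"))  # "à l'assemblée nationale"
print(is_likely_adjunct("à", "calme"))      # "lancer un appel au calme"
```

Set phrases such as "lancer un appel au calme" fall outside both lexicons, so a filter of this kind leaves them unresolved.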
We are left with only 4.6% of ambiguous attachments (mainly set
phrases such as "lancer un appel au calme"), which can be
disambiguated by refining the semantic information used. Thus our
hypothesis is validated.
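The residual figure follows from the percentages reported above. The short check below assumes, as the text states, that the 46% applies to the cases still ambiguous after principles A and B; the variable names are illustrative.

```python
ambiguous = 13.86            # % of examined sentences with ambiguous attachment
solved_structural = 0.39     # share solved by attaching to the closest governor
solved_semantic = 0.46       # share of the remainder solved by the "à" filter

after_structural = ambiguous * (1 - solved_structural)
after_semantic = after_structural * (1 - solved_semantic)

print(f"after principles A and B: {after_structural:.2f}%")
print(f"after the semantic filter: {after_semantic:.1f}%")
```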
|