MIT CogNet, The Brain Sciences Connection. From the MIT Press.

Structural Preferences vs Lexical Preferences: Some Empirical Data for French Verbs Subcategorizing a Prepositional Phrase

 Alexandra Kinyon
  
 

Abstract:
The idea of structural preference principles is old (Kimball 73) but very useful in practice: structural preferences can be tested on large data sets and make it possible to choose the correct PP attachment in parse-ranking applications without degrading the quality of parsing (Kinyon 99a, b). Two such useful principles are:

A- Prefer arguments over adjuncts (e.g. J. prefers his daughter to be honest --> "to be honest" is an argument of "prefer" rather than a sentence modifier).
B- Prefer to attach potential arguments to the closest potential governor (e.g. J. says that Peter talks to Mary --> "to Mary" is an argument of "talk" rather than of "say").

A recurrent objection to structural approaches is that they do not take lexical preferences into account, such as the preferred realizations of a verb's arguments (Trueswell 96). But:

1- Very little data is available regarding these preferences, especially for languages other than English.
2- The interaction between two "preferred" realizations is unclear. For "Jean remercie l'organisateur de la manifestation" (John thanks the organizer for the demonstration / John thanks the organizer of the demonstration), which attachment should be preferred, assuming that "remercier NP1 de NP2" and "organisateur de NP1" are the preferred realizations for "remercier" and "organisateur" respectively?
3- Structural preferences still have an effect: "John put the book that you were reading in the library" seems incomplete, although it is syntactically well formed and "put N1 in N2" is a frequent realization for "put".
4- Unknown words, for instance in the context of second language acquisition, are still processed (and thus attached) although no data is available regarding the preferred realizations of their arguments.

Our hypothesis is that, regardless of which realization of its arguments a verb favors, if it can subcategorize a PP introduced by a given preposition P, then in practice, when the verb and a PP introduced by P appear in the same sentence, the PP is either an argument of the verb or in a position where it cannot be an argument (i.e. it is an argument of a closer potential governor, it is located in another clause, such as inside a relative clause, or it is a modifier only if the verb is already saturated).
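The attachment decision implied by this hypothesis and by principles A and B can be sketched as follows. This is only an illustrative sketch, not the parser used in the study: the `Governor` record, its fields and the `attach_pp` function are hypothetical names introduced here, and distances, saturation and clause membership are assumed to be supplied by an earlier parsing step.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Governor:
    """A potential governor (verb or noun) for a PP. Hypothetical structure."""
    lemma: str
    distance: int                      # tokens between governor and PP; smaller means closer
    subcat_preps: Set[str] = field(default_factory=set)  # prepositions it can subcategorize
    saturated: bool = False            # True if its PP argument slot is already filled
    same_clause: bool = True           # False if the PP cannot be reached from this governor


def attach_pp(prep: str, governors: List[Governor]) -> Optional[str]:
    """Return the lemma of the governor taking the PP as argument, or None for an adjunct."""
    # Principle A: any governor that can take the PP as an argument beats an adjunct reading.
    candidates = [
        g for g in governors
        if g.same_clause and not g.saturated and prep in g.subcat_preps
    ]
    if not candidates:
        return None  # no available argument slot: treat the PP as a modifier
    # Principle B: among potential governors, prefer the closest one.
    return min(candidates, key=lambda g: g.distance).lemma


# "J. says that Peter talks to Mary": both verbs accept "to", "talk" is closer.
example = [Governor("say", 5, {"to"}), Governor("talk", 1, {"to"})]
print(attach_pp("to", example))
```

With the example above, the PP "to Mary" goes to "talk", in line with principle B; a PP whose preposition no available governor subcategorizes comes back as `None`, i.e. as an adjunct.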

To validate our hypothesis, we extracted the 100 most frequent verbs in LeMonde, a 1-million-word annotated and chunked corpus of French (Abeille & Clement 99). 56 of these verbs can subcategorize PPs introduced by one or several prepositions, for a total of 71 subcat frames. For each subcat frame, we then extracted all the sentences where Verb and Prep cooccur and examined the results manually.
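The extraction step can be pictured with a minimal sketch. It assumes, purely for illustration, that each sentence is a list of (lemma, part-of-speech) pairs with the tags "V" and "P" for verbs and prepositions; the actual corpus format and tag set are not described here, and `cooccurrences` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (lemma, part-of-speech tag); the tag names are an assumption


def cooccurrences(
    sentences: List[List[Token]],
    frames: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], List[int]]:
    """Map each (verb, preposition) subcat frame to the indices of sentences containing both."""
    hits = {(verb, prep): [] for verb, preps in frames.items() for prep in preps}
    for index, sentence in enumerate(sentences):
        verbs = {lemma for lemma, pos in sentence if pos == "V"}
        preps = {lemma for lemma, pos in sentence if pos == "P"}
        # Keep only verbs that have a frame, then only the prepositions in that frame.
        for verb in frames.keys() & verbs:
            for prep in frames[verb] & preps:
                hits[(verb, prep)].append(index)
    return hits


corpus = [
    [("Jean", "N"), ("remercier", "V"), ("organisateur", "N"),
     ("de", "P"), ("manifestation", "N")],
    [("Jean", "N"), ("dormir", "V")],
]
print(cooccurrences(corpus, {"remercier": {"de"}}))
```

The sentences returned for each frame are only candidates; deciding whether the PP is really an argument of the verb is the manual step described above.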

Our main findings are the following:

1- Cases of possible ambiguous attachment remain (13.86% of the sentences examined).
2- 39% of these ambiguous cases are solved by attaching the PP to the closest potential governor. Moreover, the attachment is deemed correct in all cases.
3- The probability for a verb to realize as an argument a PP introduced by a given preposition P does not help disambiguation and does not predict the proportion of ambiguous attachments encountered when examining sentences where Verb and P cooccur.
4- Rather, the preposition itself is important: "à" yields much more ambiguity than other prepositions such as "avec" or "pour" because it often introduces a temporal or locational expression (e.g. "à l'assemblée nationale" / "à 3 heures"). In fact, 46% of the ambiguous cases remaining after applying structural principles A and B are solved by resorting to very simple semantic information: à + location nouns and à + time nouns are overwhelmingly adjuncts and not arguments.
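The simple semantic information mentioned in finding 4 could be encoded as a lookup on the head noun of the PP. The sketch below is an assumption-laden illustration: the two word lists are tiny hypothetical samples, not the lexical resources used in the study, and `is_likely_adjunct` is a name introduced here.

```python
# Hypothetical sample lexicons; a real system would need far larger noun lists.
LOCATION_NOUNS = {"assemblée", "bureau", "gare"}
TIME_NOUNS = {"heure", "midi", "minuit"}


def is_likely_adjunct(prep: str, head_noun_lemma: str) -> bool:
    """Flag 'à' + location noun and 'à' + time noun as adjuncts rather than arguments."""
    return prep == "à" and head_noun_lemma in (LOCATION_NOUNS | TIME_NOUNS)


print(is_likely_adjunct("à", "heure"))   # "à 3 heures": temporal expression
print(is_likely_adjunct("à", "calme"))   # "au calme": not covered by the rule
```

A set phrase such as "lancer un appel au calme" is not caught by this rule, which matches the observation that such cases need finer semantic information.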

We are left with only 4.6% of ambiguous attachments (mainly set phrases such as "lancer un appel au calme"), which can be disambiguated by refining the semantic information used. Thus our hypothesis is validated.
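The reported figures can be checked against each other with a short calculation. It assumes that the two reductions apply in sequence, as the wording of findings 2 and 4 suggests: 39% of the ambiguous cases are removed first, then 46% of what remains. The function name is hypothetical.

```python
def residual_ambiguity(ambiguous_pct: float, structural_share: float, semantic_share: float) -> float:
    """Percentage of sentences still ambiguous after both disambiguation steps."""
    after_structural = ambiguous_pct * (1 - structural_share)  # closest-governor attachment
    after_semantic = after_structural * (1 - semantic_share)   # à + location/time nouns
    return after_semantic


# 13.86% ambiguous, 39% solved structurally, 46% of the rest solved semantically.
print(round(residual_ambiguity(13.86, 0.39, 0.46), 1))  # 4.6
```

Under that reading, 13.86% shrinks to about 8.45% after the structural step and to about 4.57% after the semantic step, which rounds to the 4.6% reported.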

 
 


© 2010 The MIT Press