MIT CogNet, The Brain Sciences ConnectionFrom the MIT Press, Link to Online Catalog
SPARC Communities
Subscriber : Stanford University Libraries » LOG IN

space

Powered By Google 
Advanced Search

 

Antigrammar

 William William
  
 

Abstract:
The traditional approach to designing a parser for natural language has involved encoding the rules of a grammar for a given language, and repetitively matching rules of the grammar against input text. This is done until either a match has been found, in which case the parse is successful, or until no match can been found, in which case the parse crashes. Seeking to improve parser performance, more recent implementations have used probabilistic grammars, roughly equivalent to traditional linguistic grammars except with attached probabilities (Charniak 1993, Marcus 1998). The success of these parsers has been crucially dependent on linearly ordered rules. Linearly ordered rules are readily encoded for languages such as English, for which the word order is relatively fixed. Little work, however, has been done with languages in which the word order is much freer. Although linearly ordered rules are viable for such languages as well, they would be much more cumbersome to implement due to the number of rules that would need to be defined (Karttunen 1985).

In Russian, for example, the word order of constituents is relatively free. Each of the 12 possible renderings of (1) is to a varying degree acceptable. Using the traditional parser approach, separate rules for each combination would be necessary. Parsers implemented using GPSG-like (Gazdar 1985) or HPSG-like (Pollard and Sag 1994) formalisms can escape this problem for Russian by encoding rules which indicate immediate dominance (ID) relations, but do not encode linear precedence (LP). Such rules would have the form as shown in (2), and differ from traditional grammatical rules in that order is not dictated, rather composition without ordering.

(1) sevodnja boris pracital etu knigu
today Boris read this:ACC book:ACC

(2) S -> Adv, V, NP[nom], NP[acc]

Implementing parsers using such formalisms would appear to solve the problem for languages with freer word order. However, most free-word order languages have constraints on the orders that constituents can take (Russian is freer than most). Such constraints on order might restrict the fronting of certain constituents (such as the verb), or impose ordering restrictions on other constituents within clauses or phrases. Using ID rules alone will not successfully parse such languages, in that some unacceptable orderings would be accepted. One would be forced to encode a set of LP rules, indicating the acceptable structures for the language.

An alternative approach is possible: rather than encoding all possible output forms for a given language into a set of LP rules, one could encode just those rules which violate the grammar. Such rules represent an "antigrammar" for the language being parsed. ID rules of the form shown in (2) would still be necessary, except that subordinate to such rules might be antirules of the form shown in (3). The antirule in (3) represents one impossible (or less acceptable) structure for the ID rule in (2).

(3) V > Adv, NPnom, NPacc

One could use antirules in a metagrammatic fashion by building a pre-parser which generates the set of valid rules for any given ID rule, expressly not generating the set of antirules. Another approach would be to treat antirules as filters to eliminate invalid parses.

Charniak, E. (1993). Statistical language learning. Cambridge, MA: MIT Press.
Gazdar, G., E. Klein, G. K. Pullum, and I. A. Sag (1985). Generalized Phrase Structure Grammar. Oxford: Basil Blackwell Publisher Ltd.
Karttunnen, L. and M. Kay (1985). Parsing in free word order language. In Dowty, D., L. Karttunen, and A. Zwicky (eds), Studies in Natural Language Processing. Cambridge: Cambridge University Press.
Marcus, M. (1998). Progress in Statistical Parsing: Statistics vs. Representations. Paper presented to the Eleventh Annual CUNY Conference on Human Sentence Processing, Rutgers NJ, 1998.

 
 


© 2010 The MIT Press
MIT Logo