MIT CogNet, The Brain Sciences Connection (from the MIT Press)

Incremental Processing and Non-lexical Structure Building: a Treebank Study

 Vincenzo Lombardo and Patrick Sturt
  
 

Abstract:

Many models of parsing assume that the processor behaves incrementally, reading and interpreting its input from left to right without delay. A natural assumption to make is that the parser maintains a fully connected syntactic structure as each word is read (e.g. the "Left-to-Right" constraint of Frazier and Rayner, 1988), and this view has recently received empirical support in a number of studies, including those investigating head-final constructions (Bader and Lasser, 1994; Yamashita, 1994; Hemforth et al., 1994).

However, if we accept incrementality, we also have to accept that some structure building and syntactic disambiguation cannot be directly lexically driven (see Crocker, 1994; Lombardo and Sturt, 1997), posing important questions for theories which claim a lexical basis for these processes (Pritchett, 1992; MacDonald et al., 1994). Consider the following sentence fragment, for example.

(1) John thought my... (brother was ill).

Here, the word "my" can only be incorporated into a grammatical, connected representation via a path of nodes that includes some which have not yet been licensed by lexical input (e.g. the NP which will later be headed by "brother," and the clause which will be headed by "was"). The construction of such paths of nodes, which we will call "connection paths," introduces serious non-determinacy into fully incremental parsing, particularly when we consider realistically wide-coverage grammars. If human perceivers construct such connection paths, they must employ systematic strategies and constraints, which by definition cannot be purely lexically driven. Our goal in the research reported in this abstract is to determine the nature of these strategies.

In order to consider this question, we have constructed a parsing simulation algorithm that takes a treebank as its input and builds a database of the connection paths required to arrive at the correct parse of each tree, assuming an incremental parsing algorithm. By examining these connection paths, we intend to analyse the requirements of non-lexical structure building in the incremental parsing of unrestricted language, and also to determine any systematic heuristics that may be at work, with the goal of defining general principles that guide the processor in such cases. We also intend to examine the extent to which lexical information (for example, subcategorization information associated with nodes on the right frontier) can influence the construction of connection paths. We will provide both a detailed description of the algorithm and a discussion of the results.
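As a rough illustration of the kind of simulation described above, the following Python sketch walks a single tree word by word and records, for each word, the nodes that must be newly built to attach it to the structure already in place. This is a simplified sketch under stated assumptions, not the algorithm reported here: the tuple encoding of trees, the bracketing and node labels chosen for sentence (1), and the names `leaf_paths`, `connection_paths` and `build_database` are all hypothetical, and the word's own preterminal is counted as part of its path even though it is lexically licensed.

```python
from collections import Counter

# Assumed encoding: a tree is (label, child, child, ...); a leaf is a plain string.
# The bracketing and labels below are an illustrative guess for sentence (1).
TREE = ("S",
        ("NP", "John"),
        ("VP", ("V", "thought"),
               ("S", ("NP", ("Det", "my"), ("N", "brother")),
                     ("VP", ("V", "was"), ("AP", "ill")))))

def leaf_paths(tree, addr=()):
    """Yield (word, path) left to right; path lists (address, label) from the root down."""
    label, *children = tree
    here = (addr, label)
    for i, child in enumerate(children):
        if isinstance(child, str):
            yield child, [here]
        else:
            for word, path in leaf_paths(child, addr + (i,)):
                yield word, [here] + path

def connection_paths(tree):
    """For each word, return (word, anchor, new_nodes).

    anchor is the label of the lowest node already built by earlier words
    (None for the first word); new_nodes are the labels, top to bottom,
    of the nodes that must be added to attach the word.
    """
    seen = set()  # addresses of nodes built so far
    result = []
    for word, path in leaf_paths(tree):
        anchor, new = None, []
        for addr, label in path:
            if addr in seen:
                anchor = label
            else:
                new.append(label)
        seen.update(addr for addr, _ in path)
        result.append((word, anchor, tuple(new)))
    return result

def build_database(treebank):
    """Count how often each (anchor, new_nodes) pattern occurs across a list of trees."""
    db = Counter()
    for tree in treebank:
        for _, anchor, new in connection_paths(tree):
            db[(anchor, new)] += 1
    return db

if __name__ == "__main__":
    for entry in connection_paths(TREE):
        print(entry)
```

Under this toy encoding, the entry for "my" attaches below the VP headed by "thought" and introduces an S node and an NP node before any word has licensed them, which corresponds to the situation described for fragment (1). Counting such patterns over a whole treebank, as `build_database` does, gives one simple way to look for recurring regularities.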

 
 


© 2010 The MIT Press