Abstract:
Diverse recent studies indicate that violations of syntactic
constraints are processed differently from violations of semantic
constraints (Brain imaging: e.g., Ainsworth-Darnell et al., 1997;
Ni et al., in press; Speeded grammaticality judgment: McElree and
Griffith, 1995; Eye-tracking: Ni et al., 1998). Usually, these
results are taken as support for the view that the processor
employs two separate modules for enforcing the two classes of
constraints. But this account leaves open the question of how a
learner decides that a particular systematicity in language belongs
to one constraint system or the other. (Why is "Dogs moo" a
semantic violation while "Dogs barks" is a syntactic one?) Several
studies of learning connectionist networks show them developing
distinct responses to syntactic and semantic violations without
architectural modularity (Plaut, 1999; Tabor and Tanenhaus, 1999;
Rohde and Plaut, in press). The connectionist studies are appealing
because they derive a distinction between the two types rather than
stipulating it, and they are explicit about how the distinction
could be learned. But the source of the distinction in the
connectionist studies has been unclear up to now.
We report on a replication of Rohde and Plaut's simulation as
well as two new simulation studies which make it clear how the
contrast is learned and why a modular architecture assumption is
not necessary. Our new simulations used a Simple Recurrent Network
(SRN) and focused on the distinction between (semantic) selectional
constraints and (syntactic) subcategorization constraints, one of
the subtlest kinds of syntax/semantics distinctions.
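For concreteness, a single forward step of a Simple Recurrent Network can be sketched as below; the layer sizes and random weights are illustrative assumptions, not the parameters of our simulations.

```python
import numpy as np

def srn_step(x, h_prev, W_in, W_rec, W_out, b_h, b_o):
    """One step of a Simple Recurrent Network: the new hidden state
    mixes the current input word with the previous hidden state
    (the 'context' carried forward from earlier words)."""
    h = np.tanh(W_in @ x + W_rec @ h_prev + b_h)
    z = W_out @ h + b_o
    y = np.exp(z - z.max())
    return h, y / y.sum()          # next-word probability distribution

rng = np.random.default_rng(0)
n_words, n_hidden = 10, 8          # toy vocabulary and hidden sizes
W_in  = rng.normal(scale=0.5, size=(n_hidden, n_words))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
W_out = rng.normal(scale=0.5, size=(n_words, n_hidden))
b_h, b_o = np.zeros(n_hidden), np.zeros(n_words)

h = np.zeros(n_hidden)             # start-of-sentence context
for word in [2, 5, 1]:             # a toy word-index sequence
    x = np.zeros(n_words); x[word] = 1.0
    h, y = srn_step(x, h, W_in, W_rec, W_out, b_h, b_o)
```

After training on next-word prediction, the hidden states h are the representations whose cluster structure is analyzed in the simulations.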
Simulation 1 examined a very simple language in order to
bootstrap our understanding of more complex cases: only three
phrase types and random semantic feature assignment were used. The
network organized its hidden representations into three major
clusters corresponding to the three phrase types and many
subclusters corresponding to the semantic contrasts within the
phrase types. A single principle governed the network's response to
all violations. Violations are cases where the information provided
by the current word clashes with the information provided by the
preceding context. The network responds to such clashes by
averaging the conflicting signals. In the case of selection
violation, this averaging puts the hidden representation outside of
the subclusters but within a major cluster. In the case of
syntactic violation, averaging puts the hidden representation between major
clusters (and there is no containing supercluster). If we assume
that reaction to a violation is slow when its representation falls
within a familiar cluster, and is hence confusable with a familiar
grammatical case, this finding models the behavioral results
identified above.
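The averaging principle can be illustrated with a toy numerical sketch (all coordinates are invented; the real centroids are learned hidden-state clusters): averaging signals from two subclusters of one major cluster yields a point still inside that cluster, whereas averaging signals from two major clusters yields a point between them.

```python
import numpy as np

# Invented 2-D stand-ins for hidden-state cluster centroids.
sub_a   = np.array([1.0, 1.0])  # semantic subcluster A (inside major cluster 1)
sub_b   = np.array([1.0, 2.0])  # semantic subcluster B (inside major cluster 1)
major_1 = np.array([1.0, 1.5])  # centroid of major (phrase-type) cluster 1
major_2 = np.array([5.0, 5.0])  # centroid of a different major cluster

# Selectional violation: the clashing signals come from two subclusters
# of the same major cluster, so their average stays inside that cluster.
sel_violation = (sub_a + sub_b) / 2
# Subcategorization violation: the clashing signals come from two major
# clusters, so their average falls between them.
subcat_violation = (major_1 + major_2) / 2

dist = np.linalg.norm
# The selectional case stays near the major cluster it belongs to ...
assert dist(sel_violation - major_1) < dist(subcat_violation - major_1)
# ... while the subcategorization case is farther out than any subcluster.
assert dist(subcat_violation - major_1) > dist(sub_a - major_1)
```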
Simulation 2 scaled these results up to a more complex case
involving realistic noun classes. Whereas Rohde and Plaut's
large-scale simulation examined predictions made by the network
immediately preceding a violation, we studied its responses to the
violations themselves. In support of the interpretation suggested
by our Simulation 1 analysis, we found clear differences between
the average minimum distance from familiar grammatical cases for
well-formed test cases (0.040), selection violations (0.176), and
subcategorization violations (0.360). All pairwise contrasts in
means were significant (p < .001); in particular, subcategorization
violations lay significantly farther from grammatical cases than
semantic violations did. These results indicate that, again,
confusability is the distinguishing factor.
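The minimum-distance measure can be sketched as follows (the data here are hypothetical; the real hidden states came from the trained network's responses to grammatical and violating continuations):

```python
import numpy as np

def min_distance_to_grammatical(test_states, grammatical_states):
    """For each test hidden state, return the Euclidean distance to
    the nearest hidden state produced by a grammatical continuation."""
    diffs = test_states[:, None, :] - grammatical_states[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

rng = np.random.default_rng(1)
grammatical = rng.normal(size=(50, 8))   # hypothetical grammatical hidden states
# Well-formed test cases sit near familiar grammatical states;
# violations are perturbed farther away.
well_formed = grammatical[:5] + rng.normal(scale=0.01, size=(5, 8))
violations  = grammatical[:5] + rng.normal(scale=0.5,  size=(5, 8))

mean_wf  = min_distance_to_grammatical(well_formed, grammatical).mean()
mean_vio = min_distance_to_grammatical(violations,  grammatical).mean()
```

The three averages reported above (0.040, 0.176, 0.360) are means of exactly this per-case minimum distance over the three test conditions.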