Neural Network Pruning and Topology for Natural Language Processing

 Jaime J. Davila
  
 

Abstract:

Among the different techniques used for automated natural language processing, neural networks have been applied with varying degrees of success. Researchers have used topologies ranging from fully connected to very carefully designed and controlled (see, for example, Jain, 1991). The question of how to determine the optimal topology for a given task remains open.

In this poster I will report results of empirical experiments using genetic algorithms as a means of discovering optimal network topology for a natural language task. The language consists of over 500 sentences divided into three levels of complexity. At the highest level of complexity are sentences such as <mary saw that the duck swam in the river>. A sentence is presented to the network one word at a time by activating the input-layer unit corresponding to its lexical value. The network incrementally builds a description of the input and demonstrates its understanding by correctly identifying: (1) each word in the sentence, e.g., <swam> is a past-tense movement verb; (2) each phrase in the sentence, e.g., <the duck> is a noun phrase; and (3) the relationships among phrases, e.g., <mary> is the agent of <saw>. The network is trained on 20% of the corpus, and performance is measured by testing the network on the complete language.
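The abstract does not specify the network's internal architecture. As a purely illustrative sketch, the Python code below assumes an Elman-style simple recurrent network and shows how a sentence can be presented one word at a time by activating a single input unit per lexical item; the vocabulary, layer sizes, and output labels are hypothetical.

    # Illustrative sketch only: assumes a simple recurrent (Elman-style) network
    # fed one one-hot word at a time. Not the poster's actual implementation.
    import numpy as np

    rng = np.random.default_rng(0)

    VOCAB = ["mary", "saw", "that", "the", "duck", "swam", "in", "river"]
    WORD_TO_ID = {w: i for i, w in enumerate(VOCAB)}

    N_IN = len(VOCAB)      # one input unit per lexical item
    N_HID = 16             # hypothetical hidden-layer size
    N_OUT = 8              # hypothetical number of word/phrase/role labels

    W_ih = rng.normal(scale=0.1, size=(N_HID, N_IN))    # input  -> hidden
    W_hh = rng.normal(scale=0.1, size=(N_HID, N_HID))   # hidden -> hidden (recurrent)
    W_ho = rng.normal(scale=0.1, size=(N_OUT, N_HID))   # hidden -> output

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def present_sentence(words):
        """Activate each word's input unit in turn and collect the network's
        incrementally built description (one output vector per word)."""
        h = np.zeros(N_HID)
        outputs = []
        for w in words:
            x = np.zeros(N_IN)
            x[WORD_TO_ID[w]] = 1.0                 # lexical value at the input layer
            h = sigmoid(W_ih @ x + W_hh @ h)       # hidden state carries sentence context
            outputs.append(sigmoid(W_ho @ h))      # word/phrase/role predictions
        return outputs

    outs = present_sentence("mary saw that the duck swam in the river".split())
    print(len(outs), "per-word output vectors")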

The key factor in measuring performance is generalization. A network with too many connections tends to memorize the training data, which lowers its performance on novel data (Caudill, 1990). My experiments confirm that a fully connected network is not optimal for the task described above. To improve generalization, some researchers prune weights from the network after it has been trained. Most commonly used pruning algorithms delete one weight per iteration and continue iterating until some performance threshold is reached. These methods assume that greedy deletion of weights is optimal, which does not hold in all cases.
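The sketch below illustrates the kind of greedy, one-weight-per-iteration pruning loop described above. It is a generic magnitude-based variant written for this summary, not the procedure of any particular cited work; the evaluate callable and the accuracy threshold are stand-ins for testing the pruned network on the full language.

    # Generic greedy pruning sketch (assumed, for illustration): delete the
    # smallest-magnitude surviving weight, one per iteration, until the
    # evaluation score would fall below a threshold.
    import numpy as np

    def greedy_prune(weights, evaluate, threshold):
        """weights: 1-D array of trained connection weights.
        evaluate: callable returning generalization accuracy for a weight vector.
        threshold: lowest acceptable accuracy."""
        w = weights.copy()
        alive = np.ones_like(w, dtype=bool)
        while alive.sum() > 1:
            # Candidate: the surviving weight with the smallest magnitude.
            idx = np.flatnonzero(alive)[np.argmin(np.abs(w[alive]))]
            trial = w.copy()
            trial[idx] = 0.0
            if evaluate(trial) < threshold:
                break                  # greedy step hurts performance too much; stop
            w = trial                  # commit the deletion and keep iterating
            alive[idx] = False
        return w

    # Toy demo: "accuracy" here is just a stand-in for testing on the full language.
    trained = np.array([0.9, -0.05, 0.4, 0.01, -0.7, 0.2])
    pruned = greedy_prune(trained, lambda w: 1.0 - 0.1 * np.count_nonzero(w == 0.0), 0.7)
    print(pruned)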

Unlike greedy pruning algorithms, the genetic algorithm randomly alters the number of connections, penalizes networks according to how many connections they use, and then combines the topologies of the better-performing networks in an attempt to produce still better ones. The penalty is varied from one experiment to another in order to determine how the number of weights affects the network's ability to generalize. These experiments will yield a better measure of the importance of this factor in generalization performance.
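As a rough illustration of this idea, the sketch below evolves binary connection masks with a fitness function that subtracts a connection-count penalty from task performance. The population size, mutation rate, crossover scheme, and penalty value are all hypothetical stand-ins rather than the poster's actual parameters, and the evaluation lambda is a toy placeholder for training and testing a masked network.

    # Hypothetical genetic-algorithm sketch: genomes are binary connection masks,
    # fitness = task performance minus a penalty proportional to connection count.
    import numpy as np

    rng = np.random.default_rng(1)

    N_CONNECTIONS = 200        # hypothetical size of the fully connected topology
    PENALTY = 0.001            # the connection penalty, varied across experiments
    POP_SIZE, GENERATIONS = 30, 50

    def fitness(mask, evaluate):
        return evaluate(mask) - PENALTY * mask.sum()

    def evolve(evaluate):
        pop = rng.integers(0, 2, size=(POP_SIZE, N_CONNECTIONS))
        for _ in range(GENERATIONS):
            scores = np.array([fitness(m, evaluate) for m in pop])
            # Keep the better half and recombine their topologies.
            parents = pop[np.argsort(scores)[::-1][: POP_SIZE // 2]]
            children = []
            while len(children) < POP_SIZE - len(parents):
                a, b = parents[rng.integers(len(parents), size=2)]
                cut = rng.integers(1, N_CONNECTIONS)        # one-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                flip = rng.random(N_CONNECTIONS) < 0.01     # random mutation of mask bits
                child[flip] ^= 1
                children.append(child)
            pop = np.vstack([parents] + children)
        return pop[np.argmax([fitness(m, evaluate) for m in pop])]

    # Toy stand-in for "train the masked network and measure generalization".
    best = evolve(lambda mask: -abs(mask.sum() - 80) / N_CONNECTIONS)
    print("connections kept:", best.sum())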

Caudill, M. (1990). "Using Neural Nets: Diagnostic Expert Nets, Pruning Trained Networks." AI Expert, September 1990, p. 45.

Jain, A. (1991). "Parsing Complex Sentences with Structured Connectionist Networks." Neural Computation, vol. 3, pp. 110-120.

 
 

