Paper Announcement (Neuroprose)

Wray Buntine wray at ptolemy.arc.nasa.gov
Wed Oct 23 18:33:42 EDT 1991



>		      Simplifying Neural Network
>		     Soft Weight-Sharing Measures
>				  by
>			 Soft Weight-Measure
>			 Soft Weight Sharing
>	
>			  Barak Pearlmutter
>		       Department of Psychology
>		      P.O. Box 11A Yale Station



I enjoyed this take-off immensely.  

Determining good regularisers (or priors) is a major problem facing
feed-forward network research (and related representations), so I also
enjoyed the original Nowlan-Hinton paper.  Dramatic performance
improvements can be obtained by careful choice of regulariser/prior (I
know this from my tree research), and it's a bit of a black art right now,
though I have some good directions.  Nowlan & Hinton suggest a strong
theoretical basis exists for their approach (see their section 8), so
perhaps we'll see more of this style, and "cleaner" versions to keep
the theoreticians happy.
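
For readers who haven't seen the paper, the tradeoff in question has the
generic penalised-cost (equivalently, MAP) form; written schematically
(my own shorthand, not a verbatim copy of Nowlan & Hinton's equation (1)):

    $$ C(w) = E_{\rm data}(w) + \lambda \Omega(w), \qquad
       \Omega(w) = -\sum_i \log \sum_j \pi_j
                   {\cal N}(w_i \mid \mu_j, \sigma_j^2) $$

where $E_{\rm data}$ is the usual error on the training patterns and
$\Omega$ is the complexity penalty.  The mixture-of-Gaussians form of
$\Omega$ is the soft weight-sharing idea, and $\lambda$ is the tradeoff
term referred to again below.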

By the way, at CLNL in Berkeley in August I singled out this
problem:

Regularizers
------------
	for a given network/activation-function configuration,
	what are suitable parameterised families of regularizers,
	and how might the parameters be set from the knowledge
	of the particular application being addressed
NB.  the setting of the $\lambda$ tradeoff term in Nowlan & Hinton's
     equation (1) has several fairly elegant and practical solutions

along with:

Training
--------
	decision-theoretic/bounded-rationality approaches to 
	batch vs. block (sub-batch) vs. pattern updates during gradient 
	descent (i.e. of back-prop.)
	(i.e. the Fahlman-LeCun-English-Grajski-et-al. discussion,
	      or the batch update vs. stochastic update problem;
	      a small sketch of the update schemes follows below)
	and subsequent addition of second-order gradient methods

as two of the most pressing problems to make feed-forward networks
a "mature" technology that will then supersede many earlier
non-neural methods.
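
To make the batch vs. block vs. pattern distinction concrete, here is a
minimal, purely illustrative sketch in present-day Python/NumPy (mine, not
taken from any of the papers mentioned): a toy least-squares model trained
by block (sub-batch) gradient descent, with a simple $\lambda$-weighted
weight-decay penalty standing in for a proper regulariser/prior.  The
function name and every parameter choice are hypothetical.

# Hypothetical sketch: block (sub-batch) gradient descent on a toy
# least-squares problem with a lambda-weighted weight-decay penalty.
import numpy as np

def block_gradient_descent(X, y, lam=0.01, block_size=8,
                           learning_rate=0.05, epochs=50, rng=None):
    """Block-averaged squared error plus lam * 0.5*||w||^2, by block updates.

    block_size = 1      -> pattern (stochastic) updates
    block_size = len(X) -> traditional batch updates
    anything in between -> block (sub-batch) updates
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)               # random pattern order
        for start in range(0, n, block_size):
            idx = order[start:start + block_size]
            err = X[idx] @ w - y[idx]
            grad = X[idx].T @ err / len(idx) + lam * w   # data + penalty
            w -= learning_rate * grad
    return w

# Toy usage: recover a weight vector from noisy linear data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=200)
print(block_gradient_descent(X, y, lam=0.01, block_size=16))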

Wray Buntine
NASA Ames Research Center                 phone:  (415) 604 3389
Mail Stop 244-17                          fax:    (415) 604 6997
Moffett Field, CA, 94035 		  email:  wray at ptolemy.arc.nasa.gov


PS. Thanks also to Martin Moller for adding some meat to the Training
   problem:
>     An interesting observation is that the number of blocks needed
>     to make an update is growing during learning so that after a certain
>     number of epochs the blocksize is equal to the number of patterns.
>     When this happens the algorithm is equal to a traditional batch-mode
>     algorithm and no validation is needed anymore.
  When explaining batch update vs. stochastic update to people,
  I always use this behaviour as an example of what a decision-theoretic 
  training scheme **should** do, so I'm glad you've confirmed it
  experimentally.
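
To spell out that behaviour, here is a small schematic sketch (again mine,
in present-day Python/NumPy; the doubling growth rule and all names are
invented for illustration): the block size grows each epoch until it
equals the number of patterns, at which point every epoch is exactly one
traditional batch update.

# Hypothetical sketch of a growing-blocksize schedule: start with small
# blocks, grow them each epoch, and end up in plain batch mode.
import numpy as np

def growing_block_training(X, y, learning_rate=0.05, epochs=30,
                           initial_block=4, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    block_size = initial_block
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, block_size):
            idx = order[start:start + block_size]
            err = X[idx] @ w - y[idx]
            w -= learning_rate * (X[idx].T @ err) / len(idx)
        # Grow the block; once block_size == n, each epoch is one batch update.
        block_size = min(2 * block_size, n)
    return w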



     

