TR available: PRODUCT UNIT LEARNING

Lee Giles giles at research.nj.nec.com
Thu Jan 25 17:02:39 EST 1996




The following technical report is available from the University of Maryland
Department of Computer Science and the NEC Research Institute archives. (A
short version of this TR was published in NIPS 7.)
_____________________________________________________________________



                             PRODUCT UNIT LEARNING

Technical Report UMIACS-TR-95-80 and CS-TR-3503, Institute for 
Advanced Computer Studies, University of Maryland, College Park, MD 20742

Laurens R. Leerink{a}, C. Lee Giles{b,c}, Bill G. Horne{b}, Marwan A. Jabri{a}

{a}SEDAL, Dept. of Electrical Engineering, The U. of Sydney, Sydney, NSW 2006, Australia
{b}NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, USA
{c}UMIACS, U. of Maryland, College Park, MD 20742, USA


                             ABSTRACT

Product units provide a method of automatically learning the higher-order
input combinations required for the efficient synthesis of Boolean logic
functions by neural networks. Product units also have a higher information
capacity than sigmoidal networks. However, this activation function has not
received much attention in the literature. A possible reason for this is
that one encounters some problems when using standard backpropagation to
train networks containing these units. This report examines these problems,
and evaluates the performance of three training algorithms on networks of
this type. Empirical results indicate that the error surfaces of networks
containing product units have more local minima than those of corresponding
networks with summation units. For this reason, a combination of local and
global training algorithms was found to provide the most reliable convergence.
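For readers unfamiliar with the unit type: a product unit computes a weighted
product rather than a weighted sum of its inputs, so its weights directly select
higher-order input combinations. The sketch below (Python/NumPy) assumes the
standard Durbin-Rumelhart formulation of product units and is an illustration
only, not code from the report:

    import numpy as np

    def product_unit(x, w):
        """Product-unit activation y = prod_i x_i ** w_i.

        For real-valued weights and possibly negative inputs, the usual
        real-valued form separates magnitude and phase:
            y = exp(sum_i w_i*ln|x_i|) * cos(pi * sum_i w_i * 1[x_i < 0]).
        """
        x, w = np.asarray(x, float), np.asarray(w, float)
        eps = 1e-12                                   # avoid log(0)
        magnitude = np.exp(np.sum(w * np.log(np.abs(x) + eps)))
        phase = np.cos(np.pi * np.sum(w * (x < 0)))
        return magnitude * phase

    def summation_unit(x, w):
        """Ordinary sigmoidal summation unit, for comparison."""
        x, w = np.asarray(x, float), np.asarray(w, float)
        return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

    # With Boolean inputs coded as {-1,+1} and unit weights, a single product
    # unit computes 3-bit parity, which no single summation unit can:
    print(product_unit([-1.0, 1.0, -1.0], [1.0, 1.0, 1.0]))   # ~= +1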

We then investigate how `hints' can be added to the training algorithm. By
extracting a common frequency from the input weights, and training this
frequency separately, we show that convergence can be accelerated.
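To picture the idea (a hypothetical sketch, not the report's exact procedure):
for inputs coded as {-1,+1} a product unit reduces to a cosine of a weighted
sum, so the weights act as frequencies; the hint then amounts to reparameterizing
each weight as a shared frequency f times a per-input factor v_i and giving f
its own update:

    import numpy as np

    def product_unit_pm1(b, f, v):
        """Product unit on {-1,+1} inputs with weights w_i = f * v_i.

        For such inputs prod_i b_i**w_i = cos(pi * f * sum_i v_i * 1[b_i<0]),
        so f is a common frequency that can be trained separately.
        """
        b, v = np.asarray(b, float), np.asarray(v, float)
        return np.cos(np.pi * f * np.sum(v * (b < 0)))

    def train_step(b, target, f, v, lr_v=0.05, lr_f=0.5):
        """One squared-error gradient step; f gets its own learning rate."""
        b, v = np.asarray(b, float), np.asarray(v, float)
        s = np.sum(v * (b < 0))
        err = np.cos(np.pi * f * s) - target
        d = -np.pi * np.sin(np.pi * f * s)      # shared chain-rule factor
        return f - lr_f * err * d * s, v - lr_v * err * d * f * (b < 0)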

A constructive algorithm is then introduced which adds product units to a
network as required by the problem. Simulations show that for the same
problems this method creates networks with significantly fewer neurons than
those constructed by the tiling and upstart algorithms.

In order to compare their performance with other transfer functions,
product units were implemented as candidate units in the Cascade
Correlation (CC) {Fahlman90} system. Using these candidate units resulted
in smaller networks which trained faster than when any of the standard
(three sigmoidal types and one Gaussian) transfer functions were used. This
superiority was confirmed when a pool of candidate units with four different
nonlinear activation functions was used, the candidates competing for
addition to the network. Extensive simulations showed that for the problem
of implementing random Boolean logic functions, product units were always
chosen over any of the other transfer functions.
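For context, Cascade Correlation trains each candidate unit to maximize the
covariance between its output and the network's residual error, then installs
only the winning candidate; a mixed pool simply lets different unit types
compete on that score. A minimal sketch of the selection step (Python, assuming
candidate activations have already been trained; not code from the report):

    import numpy as np

    def select_candidate(candidate_outputs, residual_errors):
        """Return the index of the candidate to install.

        candidate_outputs: (n_candidates, n_patterns) activations of the
            trained candidates (sigmoidal, Gaussian, product units, ...).
        residual_errors:   (n_patterns, n_outputs) current network errors.
        Score = sum over outputs of |covariance(candidate, error)|, the
        Cascade-Correlation selection criterion of Fahlman & Lebiere.
        """
        v = candidate_outputs - candidate_outputs.mean(axis=1, keepdims=True)
        e = residual_errors - residual_errors.mean(axis=0, keepdims=True)
        scores = np.abs(v @ e).sum(axis=1)
        return int(np.argmax(scores))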

--------------------------------------------------------------------------

http://www.neci.nj.nec.com/homepages/giles.html
http://www.cs.umd.edu/TRs/TR-no-abs.html

or

ftp://ftp.nj.nec.com/pub/giles/papers/UMD-CS-TR-3503.product.units.neural.nets.ps.Z

----------------------------------------------------------------------------

--                                 
C. Lee Giles / Computer Sciences / NEC Research Institute / 
4 Independence Way / Princeton, NJ 08540, USA / 609-951-2642 / Fax 2482
www.neci.nj.nec.com/homepages/giles.html
==



