The Four-Quadrant Problem

Mon Sep 12 12:24:07 EDT 1988

Well, so much for my quest for a problem which *requires* more than 2 layers.
Hopefully this will exhaust the issue ...

The conclussions are:
   - for a 2-layer net of a finite number of threshold units can't do the
	4-quad problem to arbitrary accuracy (I'll demonstrate in a moment).
   - any arbitrarily large subspace *can* be approximated to arbitrary
	accuracy with a finite number of nodes.
   - with non-hard-threshold units (including sigmoids) and assuming infinite 
	presision (and you *need* that infinite precision) you *can* do the 
	4-quad problem with 2 layers.

Demonstration that 2-Layers of finite numbers of threshold units can't do it:

Assume that it can be done.  Each node in the first layer creates linear 
partitions in the plane described by the input space.  This finite set of 
partions (lines) intersect at a finite number of points.  Consider a circle 
centered at the origin which encloses all of these intersections.  Assume 
without loss of generality that quads 1&3 have a greater value than quads 
2&4 which is subsequently thresholded by the second layer node.  Above the 
circle, crossing from left to right across the y-axis (or any "don't care" 
band) must result in a net gain (since quad-2 < quad-1) so the weights 
connecting those nodes to the output node must sum > 0.  Below the circle a 
simular argument has the same sum < 0.  The sum can't be both > and < 0, a 
contradiction, therefore it can't be done.

 *BUT*

Doing It With Other than thresholding units:

If the hidden layer's transfer function f(x) = 0 for x<=0 and f(x) = x for 
x>=0 Ron Chrisley showed me at INNS that you can use a four node hidden layer.
Given input weights (w,w), (-w,-w), (w,-w), (-w, w) on the first layer
(no thresholds) and weights (w2, w2, -w2, -w2) from the hidden layer
to the thresholding output node you've got it.

I would add that you can approximate the semi-linear node to arbitrary
accuracy with thresholding units if you only want to go out a finite distance.

Also, you're really only using the fact that the transfer function is 
monotonically increasing -- as is a sigmoid (at least in theory).  So if 
the four nodes in the hidden layer have *any* non-zero threshold you can 
use the save network and use sigmoid units.  Eventually your top node will
have sigmoid(x + threshold) - sigmoid(x) as its input (which gets *very*
small very fast) but in theory this should work.

In conclussion, this is a task that is quite simple with a 3 layer net
(put a partition on the x an y axis and XOR their outputs).  Instead,
you can use an infinite number of nodes in 2 layers or see how well
your particular computer deals with numbers like 10**-(10**10**10**...)
with sigmoid nodes.  In short, while it is clearly cleaner and more
compact to do it with 3 layers, it *can* be done in theory with networks
(which are not pragmatically realizable) with 2-layers.

Alexis Wieland
wieland at mitre.arpa