supervised learning

Thu Jun 1 12:54:10 EDT 1989

I agree that there is some artificiality in the distinction between
supervised and unsupervised learning.  Traditionally, the distinction seems
to be made based on whether the learning algorithm has available to it
the desired outputs of the network.  If the algorithm can be described by
an energy function (such as the output error that Backpropagation minimizes),
then supervised learning seems to require an energy function which
explicitly includes the desired outputs of the network.  Unsupervised
learning (in some cases) can be described by energy functions which
specify properties or statistics of the outputs (or sometimes the weights).
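
To make the two kinds of energy function concrete, here is a minimal sketch
in Python.  The function names, and the choice of output variance as the
"statistic of the outputs", are assumptions made purely for illustration.

```python
# Illustrative only: names and the variance-based statistic are assumptions.

def supervised_energy(outputs, targets):
    """Mean squared output error: the desired outputs appear explicitly."""
    return sum((y - t) ** 2 for y, t in zip(outputs, targets)) / len(outputs)


def unsupervised_energy(outputs):
    """Negative output variance: depends only on a statistic of the outputs."""
    mean = sum(outputs) / len(outputs)
    return -sum((y - mean) ** 2 for y in outputs) / len(outputs)


outputs = [0.0, 1.0, 2.0, 3.0]
targets = [0.0, 1.0, 2.0, 5.0]

print(supervised_energy(outputs, targets))  # needs the targets
print(unsupervised_energy(outputs))         # needs only the outputs
```

The only structural difference is the argument list: the first function
cannot be evaluated without the desired outputs, the second can.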
  However, it seems easy to imagine a continuum of energy functions between
these two types.  For example, consider a BP net which requires outputs only
to be within some tolerance of the desired points, or an unsupervised
algorithm designed so that the desired properties of the outputs can be
satisfied only by specific output values which are in fact the desired outputs.
And what about an algorithm whose energy function is the mean squared
distance to the desired weights themselves?  (Learning in this case is
very easy and fast, but not particularly interesting!)  
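
Two of these in-between energy functions can be sketched in Python as
follows.  This is only a rough illustration: the function names, the exact
form of the tolerance penalty, and the step size are all assumptions.

```python
# Illustrative only: names, penalty form, and step size are assumptions.

def tolerance_energy(outputs, targets, tol):
    """Squared error counted only beyond a band of width tol around each target."""
    total = 0.0
    for y, t in zip(outputs, targets):
        excess = max(0.0, abs(y - t) - tol)
        total += excess ** 2
    return total / len(outputs)


def weight_energy(weights, desired):
    """Mean squared distance from the current weights to the desired weights."""
    return sum((w - d) ** 2 for w, d in zip(weights, desired)) / len(weights)


def weight_energy_step(weights, desired, rate):
    """One gradient-descent step on weight_energy."""
    n = len(weights)
    return [w - rate * 2.0 * (w - d) / n for w, d in zip(weights, desired)]


# With tol = 0 the tolerance energy reduces to ordinary mean squared error.
print(tolerance_energy([1.25, 2.0], [1.0, 1.0], 0.0))
print(tolerance_energy([1.25, 2.0], [1.0, 1.0], 0.5))

# With rate = n / 2 a single step lands exactly on the desired weights.
weights = [1.0, -2.0]
desired = [0.5, 0.25]
weights = weight_energy_step(weights, desired, rate=len(weights) / 2)
print(weights, weight_energy(weights, desired))
```

Because the weight energy is a simple quadratic bowl centered on the desired
weights, gradient descent has nothing to discover, which is why learning is
fast and uninteresting.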
  The picture is further complicated by the fact that very often the same
solution (final set of weights) can be obtained from very different algorithms
with very different energy functions (imagine training a BP network to
converge to a Kohonen network).  So there seems to be an artificial
distinction between supervised nets (gradient descent on an energy function
defined by mean-squared output error) and unsupervised nets (everything else).
  But there does seem to be some nice intuitive idea behind the distinction.  Perhaps
it is based on the difference between giving someone the right answer to
a math problem and teaching them how to solve it themselves.  In other words,
in real life we seem to make the supervised/unsupervised distinction quite
naturally.  Does anyone have any ideas why this distinction is so pervasive?

			Terry Sanger
			tds at wheaties.ai.mit.edu