Tech. Report available

Mon Oct 31 14:57:00 EST 1988

The following Tech. Report is available. Requests should be sent to
"SMITH at cs.umass.edu".

      A Stochastic Algorithm for Learning Real-valued Functions
                     via Reinforcement Feedback

                       Vijaykumar Gullapalli

                   COINS Technical Report 88-91
                    University of Massachusetts
                         Amherst, MA 01003

                             ABSTRACT

Reinforcement learning is the process by which the probability of the
response of a system to a stimulus increases with reward and decreases
with punishment. Most of the research in reinforcement learning (with
the exception of the work in function optimization) has been on
problems with discrete action spaces, in which the learning system
chooses one of a finite number of possible actions. However, many
control problems require the application of continuous control
signals. In this paper, we present a stochastic reinforcement learning
algorithm for learning functions with continuous outputs. Our
algorithm is designed to be implemented as a unit in a connectionist
network. We assume that the learning system computes its real-valued
output as some function of a random activation generated using the
Normal distribution. The activation at any time depends on two
parameters of the Normal distribution, its mean and its standard
deviation, which in turn depend on the current inputs to the unit.
Learning takes place by using our algorithm to adjust these two
parameters so as to increase the probability of producing the optimal
real value for each input pattern. The performance of the algorithm
is studied by using it to learn tasks of varying levels of difficulty.
Further, as an example of a potential application, we present a
network incorporating these real-valued units that learns the inverse
kinematic transform of a simulated 3-degree-of-freedom robot arm.
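
The abstract does not give the report's update equations, so the
following is only a minimal sketch of the kind of unit it describes: a
unit whose output is drawn from a Normal distribution whose mean and
standard deviation are computed from the current inputs. Everything
specific here is an assumption made for illustration: the class name
GaussianUnit, the linear mean, the log-linear standard deviation, the
identity output function, the learning rate, and above all the update
rule, which is a generic score-function (REINFORCE-style) step and not
the algorithm of the report.

```python
import math
import random


class GaussianUnit:
    """Hypothetical stochastic real-valued unit (illustrative only)."""

    def __init__(self, n_inputs, lr=0.05, seed=0):
        self.w = [0.0] * n_inputs  # weights that set the mean (assumed linear)
        self.v = [0.0] * n_inputs  # weights that set log(standard deviation)
        self.lr = lr               # assumed learning rate
        self.rng = random.Random(seed)

    def params(self, x):
        """Return (mean, std) of the Normal distribution for input x."""
        mean = sum(wi * xi for wi, xi in zip(self.w, x))
        std = math.exp(sum(vi * xi for vi, xi in zip(self.v, x)))
        return mean, std

    def sample(self, x):
        """Draw a random activation; the output function is the identity here."""
        mean, std = self.params(x)
        return self.rng.gauss(mean, std)

    def update(self, x, action, reward, baseline=0.0):
        """Score-function step: an assumed stand-in, not the report's rule."""
        mean, std = self.params(x)
        advantage = reward - baseline
        z = (action - mean) / std
        grad_mean = z / std         # d log N(action; mean, std) / d mean
        grad_log_std = z * z - 1.0  # d log N(action; mean, std) / d log(std)
        for i, xi in enumerate(x):
            self.w[i] += self.lr * advantage * grad_mean * xi
            self.v[i] += self.lr * advantage * grad_log_std * xi
```

In use, one would call sample(x) to produce an output, obtain a scalar
reinforcement for it from the environment, and pass both to update().
Under this assumed rule, an output that earns more than the baseline
pulls the mean toward itself, and the standard deviation grows or
shrinks depending on how far that output lay from the mean.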