Paper on Residual Advantage Learning

HARMONME harmonme at aa.wpafb.af.mil
Tue Jun 6 06:52:25 EDT 2006


The following paper, submitted to NIPS-95, is now available on the WWW at:
http://ace.aa.wpafb.af.mil/~aaat/harmon.html

===============================================================================
Residual Advantage Learning Applied to a Differential Game
 

Mance E. Harmon
Wright Laboratory
WL/AAAT Bldg. 635  2185 Avionics Circle
Wright-Patterson Air Force Base, OH  45433-7301
harmonme at aa.wpafb.af.mil

Leemon C. Baird III
U.S. Air Force Academy
2354 Fairchild Dr. Suite 6K41, USAFA, CO  80840-6234
baird at cs.usafa.af.mil



ABSTRACT

An application of reinforcement learning to a differential game is
presented.  The reinforcement learning system uses a recently developed
algorithm, the residual form of advantage learning.  The game is a Markov
decision process (MDP) with continuous states and nonlinear dynamics.  The
game consists of two players, a missile and a plane; the missile pursues
the plane and the plane evades the missile.  On each time step, each player
chooses one of two possible actions: turn left or turn right 90 degrees.
Reinforcement is given only when the missile hits the plane or the plane
reaches an escape distance from the missile.  The advantage function is
stored in a single-hidden-layer sigmoidal network.  The reinforcement
learning algorithm for optimal control is modified for differential games
to find the minimax point rather than the maximum.  As far as we
know, this is the first time that a reinforcement learning algorithm with
guaranteed convergence for general function approximation systems has been
demonstrated to work with a general neural network.
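For readers who want a concrete picture of the update, below is a minimal
sketch of residual advantage learning with a minimax backup, in the spirit
of the abstract.  It is not the authors' code: the network sizes, the
constants (GAMMA, DT, K, PHI, ALPHA), the feature encoding, and the
pure-strategy minimax over the two binary actions are all illustrative
assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed constants: discount gamma, time step dt, advantage scaling k,
    # residual/direct blend phi in [0, 1], and learning rate alpha.
    GAMMA, DT, K = 0.99, 0.1, 1.0
    PHI, ALPHA = 0.5, 1e-3

    # Single-hidden-layer sigmoidal network for A(x, u_missile, u_plane).
    # Input: 4 assumed state features plus the two binary actions.
    N_IN, N_HID = 6, 20
    W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))
    W2 = rng.normal(0.0, 0.1, N_HID)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def features(x, um, up):
        return np.concatenate([x, [um, up]])

    def advantage(x, um, up):
        return W2 @ sigmoid(W1 @ features(x, um, up))

    def grads(x, um, up):
        f = features(x, um, up)
        h = sigmoid(W1 @ f)
        return np.outer(W2 * h * (1.0 - h), f), h   # dA/dW1, dA/dW2

    def minimax(x):
        # V(x) is the minimax point of A(x, ., .): the missile (row player)
        # maximizes, the plane (column player) minimizes, pure strategies.
        table = np.array([[advantage(x, um, up) for up in (0, 1)]
                          for um in (0, 1)])
        up_star = int(table.max(axis=0).argmin())
        um_star = int(table[:, up_star].argmax())
        return table[um_star, up_star], um_star, up_star

    def residual_update(x, um, up, r, x_next, terminal):
        global W1, W2
        v, am, ap = minimax(x)
        if terminal:
            v_next, g1n, g2n = 0.0, 0.0, 0.0
        else:
            v_next, amn, apn = minimax(x_next)
            g1n, g2n = grads(x_next, amn, apn)
        # Advantage-learning target:
        #   target = V(x) + (r + gamma**dt * V(x') - V(x)) / (dt * k)
        target = v + (r + GAMMA**DT * v_next - v) / (DT * K)
        e = target - advantage(x, um, up)        # Bellman residual
        g1, g2 = grads(x, um, up)                # gradient of prediction
        g1v, g2v = grads(x, am, ap)              # gradient of the V(x) term
        c_next = GAMMA**DT / (DT * K)
        c_v = 1.0 - 1.0 / (DT * K)
        # Residual algorithm: PHI = 0 gives the direct update, PHI = 1 the
        # pure residual-gradient update descending on the squared residual.
        W1 += ALPHA * e * (g1 - PHI * (c_next * g1n + c_v * g1v))
        W2 += ALPHA * e * (g2 - PHI * (c_next * g2n + c_v * g2v))
        return e

    # One illustrative update on a synthetic transition.
    x, x_next = rng.normal(size=4), rng.normal(size=4)
    print(residual_update(x, um=1, up=0, r=0.0, x_next=x_next, terminal=False))

Varying PHI between 0 and 1 trades off the fast but potentially divergent
direct update against the provably convergent residual-gradient update,
which is the kind of convergence guarantee the abstract refers to.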

===============================================================================

