Reinforcement learning papers

Wei Zhang zhangw at chert.CS.ORST.EDU
Tue May 30 15:25:57 EDT 1995


This is to announce the availability of two new PostScript preprints:

	High-Performance Job-Shop Scheduling With A Time-Delay 
			TD($\lambda$) Network
		Wei Zhang and Thomas G. Dietterich
	  		submitted to NIPS-95
  ftp://ftp.cs.orst.edu/users/z/zhangw/papers/zhang-tgd-nips95.ps.gz
	
Abstract:
Job-shop scheduling is an important task for manufacturing industries.
We are interested in the particular task of scheduling payload
processing for NASA's space shuttle program.  This paper summarizes
our previous work on formulating this task for solution by the
reinforcement learning algorithm $TD(\lambda)$.  A shortcoming of this
previous work was its reliance on hand-engineered input features.
This paper shows how to extend the time-delay neural network (TDNN)
architecture to apply it to irregular-length schedules.  Experimental
tests show that this TDNN-$TD(\lambda)$ network can match the
performance of our previous hand-engineered system.  The tests also
show that both neural network approaches significantly outperform the
best previous (non-learning) solution to this problem in terms of the
quality of the resulting schedules and the number of search steps
required to construct them.
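
For readers unfamiliar with the TDNN idea, the following is a minimal
illustrative sketch (not the authors' code; all names, feature shapes, and
the mean-pooling choice are invented for illustration) of how a single
shared-weight unit can be slid over the time steps of a schedule and its
outputs pooled into one score, which is the property that lets the same
network evaluate schedules of irregular length.

# Hedged sketch: TDNN-style evaluation of a variable-length schedule.
# The same small network is applied to every fixed-size window of time
# steps (weight sharing) and the window scores are averaged.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tdnn_value(schedule, w_hidden, b_hidden, w_out, b_out, window=3):
    # schedule: array of shape (T, F) -- T time steps, F features per step.
    T, F = schedule.shape
    outputs = []
    for t in range(T - window + 1):
        x = schedule[t:t + window].reshape(-1)        # flatten one window
        h = sigmoid(w_hidden @ x + b_hidden)          # shared hidden layer
        outputs.append(sigmoid(w_out @ h + b_out))    # window-level score
    return float(np.mean(outputs))                    # pool over positions

# Example: two schedules of different lengths, scored by the same network.
rng = np.random.default_rng(0)
F, H, window = 4, 8, 3
w_hidden = rng.normal(size=(H, window * F)); b_hidden = np.zeros(H)
w_out = rng.normal(size=H); b_out = 0.0
short_schedule = rng.normal(size=(5, F))
long_schedule = rng.normal(size=(12, F))
print(tdnn_value(short_schedule, w_hidden, b_hidden, w_out, b_out, window))
print(tdnn_value(long_schedule, w_hidden, b_hidden, w_out, b_out, window))
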


	Value Function Approximations and Job-Shop Scheduling
		Wei Zhang and Thomas G. Dietterich
	submitted to the Workshop on Value Function Approximation
		in Reinforcement Learning at ML-95
  ftp://ftp.cs.orst.edu/users/z/zhangw/papers/zhang-tgd-ml95rl.ps.gz

Abstract:
We report a successful application of TD($\lambda$) with value
function approximation to the task of job-shop scheduling.  Our
scheduling problems are based on the problem of scheduling payload
processing steps for the NASA space shuttle program.  The value
function is approximated by a 2-layer feedforward network of sigmoid
units.  A one-step lookahead greedy algorithm using the learned
evaluation function outperforms the best existing algorithm for this
task, which is an iterative repair method incorporating simulated
annealing.  To understand the reasons for this performance
improvement, this paper introduces several measurements of the
learning process and discusses several hypotheses suggested by these
measurements.  We conclude that the use of value function
approximation is not a source of difficulty for our method, and in
fact, it may explain the success of the method independently of the use
of value iteration.  Additional experiments are required to
discriminate among our hypotheses.
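
As a rough illustration of the two ingredients named in this abstract, here
is a hedged sketch, again not the paper's code, of a TD(lambda) weight
update with eligibility traces for a 2-layer sigmoid value network, plus
one-step lookahead greedy selection with the learned value; the feature
encoding, step sizes, and reward convention below are assumptions.

# Hedged sketch: TD(lambda) training of a 2-layer sigmoid value network
# and greedy one-step lookahead using the learned values.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ValueNet:
    # 2-layer feedforward network of sigmoid units, V(s) in (0, 1).
    def __init__(self, n_features, n_hidden, rng):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_features))
        self.w2 = rng.normal(scale=0.1, size=n_hidden)

    def value_and_grads(self, s):
        h = sigmoid(self.W1 @ s)
        v = sigmoid(self.w2 @ h)
        dv = v * (1 - v)                               # backprop dV/dweights
        g_w2 = dv * h
        g_W1 = np.outer(dv * self.w2 * h * (1 - h), s)
        return v, (g_W1, g_w2)

def td_lambda_update(net, traj, rewards, alpha=0.1, lam=0.7, gamma=1.0):
    # traj: state feature vectors s_0..s_T; rewards[t] is for step t -> t+1.
    e_W1 = np.zeros_like(net.W1); e_w2 = np.zeros_like(net.w2)
    for t in range(len(traj) - 1):
        v, (g_W1, g_w2) = net.value_and_grads(traj[t])
        v_next, _ = net.value_and_grads(traj[t + 1])
        delta = rewards[t] + gamma * v_next - v        # TD error
        e_W1 = gamma * lam * e_W1 + g_W1               # eligibility traces
        e_w2 = gamma * lam * e_w2 + g_w2
        net.W1 += alpha * delta * e_W1
        net.w2 += alpha * delta * e_w2

def greedy_step(net, candidate_states):
    # One-step lookahead: pick the successor state the net values highest.
    values = [net.value_and_grads(s)[0] for s in candidate_states]
    return int(np.argmax(values))

# Toy usage with random "states"; a real scheduler would supply features.
rng = np.random.default_rng(0)
net = ValueNet(n_features=6, n_hidden=10, rng=rng)
traj = [rng.normal(size=6) for _ in range(4)]
td_lambda_update(net, traj, rewards=[0.0, 0.0, 1.0])
best = greedy_step(net, [rng.normal(size=6) for _ in range(5)])
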


The following reinforcement learning paper is also available at the site: 

Zhang, W. and Dietterich, T. G., A Reinforcement Learning Approach to 
Job-Shop Scheduling, to appear in Proc. IJCAI-95, 1995.
ftp://ftp.cs.orst.edu/users/z/zhangw/papers/zhang-tgd-ijcai95.ps.gz

