Reinforcement Learning Workshop - Call for Participation

Rich Sutton sutton at gte.com
Fri Jun 11 13:44:08 EDT 1993



                       LAST CALL FOR PARTICIPATION

           "REINFORCEMENT LEARNING: What We Know, What We Need"

   an Informal Workshop to follow ML93 (10th Int. Conf. on Machine Learning) 
           June 30 & July 1, University of Massachusetts, Amherst

Reinforcement learning is a simple way of framing the problem of an
autonomous agent that learns, by interacting with its world, to achieve a
goal. It has been an active area of machine learning research for the last
five years. The objective of this workshop is to present concisely the current
state of the art in reinforcement learning and to identify and highlight
critical open problems.

The intended audience is all learning researchers interested in reinforcement
learning. The first half of the workshop will be mainly tutorial while the
second half will define and explore open problems. The entire workshop will
last approximately one and three-quarters days. It is possible to register
for the workshop without registering for the conference, but attending the
conference is highly recommended: many new RL results will be presented there
and will not be repeated in the workshop. Registration information is given
at the end of this message.

Program Committee: Rich Sutton (chair), Nils Nilsson, Leslie Kaelbling,
Satinder Singh, Sridhar Mahadevan, Andy Barto, Steve Whitehead

............................................................................

                           PROGRAM INFORMATION

The following draft program is divided into "sessions", each consisting of a
set of presentations on a single topic. The earlier sessions lean toward "What
we know" and the later sessions toward "What we need", although some of each
will be covered in every session. Sessions last 60-120 minutes and are
separated by 30-minute breaks. Each session has an organizer and a series of
speakers, one of whom is likely to be the organizer. In most cases the
speakers are meant to cover a body of work, not just their own, as a survey
directed at identifying and explaining the key issues and open problems. The
organizer works with the speakers to ensure this (the organizer also has
primary responsibility for picking the speakers, and chairs the session).

*****************************************************************************
PRELIMINARY SCHEDULE:

June 30:

 9:00--10:30    Session 1: Defining Features of RL
10:30--11:00    Break
11:00--12:30    Session 2: RL and Dynamic Programming
12:30--2:00     Lunch
 2:00--3:30     Session 3: Theory: Stochastic Approximation and Convergence
 3:30--4:00     Break
 4:00--5:00     Session 4: Hidden State and Short-Term Memory

July 1:

 9:00--11:00    Session 5: Structural Generalization: Scaling RL to Large State Spaces
11:00--11:30    Break
11:30--12:30    Session 6: Hierarchy and Abstraction
12:30--1:30     Lunch
 1:30--2:30     Session 7: Strategies for Exploration
 2:30--3:30     Session 8: Relationships to Neuroscience and Evolution

*****************************************************************************
PRELIMINARY PROGRAM

---------------------------------------------------------------------------
Session 1: Defining Features of Reinforcement Learning
Organizer: Rich Sutton, rich at gte.com

"Welcome and Announcements" by Rich Sutton, GTE (10 minutes)
"History of RL" by Harry Klopf, WPAFB (25 minutes)
"Delayed Reward: TD Learning and TD-Gammon" by Rich Sutton, GTE (50 minutes)

The intent of the first two talks is to begin conveying certain key ideas
about reinforcement learning: 1) RL is a problem, not a class of algorithms,
and 2) the distinguishing features of the RL problem are trial-and-error
search and delayed reward. The third talk is a tutorial
presentation of temporal-difference learning, the basis of learning methods
for handling delayed reward. This talk will also present Gerry Tesauro's
TD-Gammon, a TD-learning system that learned to play backgammon at a
grandmaster level. (There is still an outside chance that Tesauro will be able
to attend the workshop and present TD-Gammon himself.)
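
As a concrete point of reference (not part of the talks themselves), here is a
minimal sketch of the tabular TD(0) update that underlies such systems, in
Python; the names and parameter values are illustrative assumptions only:

    # Tabular TD(0) prediction update (illustrative sketch, not TD-Gammon).
    # V is a dictionary of state-value estimates; alpha is a step size and
    # gamma a discount factor -- both chosen arbitrarily here.
    def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0):
        # TD error: one-step target minus the current estimate
        td_error = reward + gamma * V[next_state] - V[state]
        V[state] += alpha * td_error
        return V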
---------------------------------------------------------------------------
Session 2: RL and Dynamic Programming 
Organizer: Andy Barto, barto at cs.umass.edu

"Q-learning" by Chris Watkins, Morning Side Inc (30 minutes)
"RL and Planning" by Andrew Moore, MIT (30 minutes)
"Asynchronous Dynamic Programming" by Andy Barto, UMass (30 minutes)

These talks will cover the basic ideas of RL and its relationship to dynamic
programming and planning, including Markov Decision Tasks.
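
As an illustration of that connection (a sketch under assumed names, not a
definitive implementation), the tabular Q-learning backup can be written in a
few lines of Python:

    # One tabular Q-learning update: a sampled form of the Bellman
    # optimality backup (illustrative only; names are assumptions).
    def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        return Q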
---------------------------------------------------------------------------
Session 3: New Results in RL and Asynchronous DP
Organizer: Satinder Singh, singh at cs.umass.edu

"Introduction, Notation, and Theme" by Satinder P. Singh, UMass
"Stochastic Approximation: Convergence Results" by T Jaakkola & M Jordan, MIT
"Asychronous Policy Iteration" by Ron Williams, Northeastern
"Convergence Proof of Adaptive Asynchronous DP" by Vijaykumar Gullapalli, UMass
"Discussion of *some* Future Directions for Theoretical Work" by ?

This session consists of two parts. In the first part we present a new and
fairly complete theory of (asymptotic) convergence for reinforcement learning
(with lookup tables as function approximators). This theory explains RL
algorithms as replacing the full-backup operator of classical dynamic
programming algorithms with an unbiased random backup operator. We present an
extension of classical stochastic approximation theory (e.g., Dvoretzky's) to
derive probability-one convergence proofs for Q-learning, TD(0), and
TD(lambda) that are different from, and perhaps simpler than, previously
available proofs. We will also use the stochastic approximation framework to
highlight the contribution made by reinforcement learning algorithms such as
TD and Q-learning to the entire class of iterative methods for solving the
Bellman equations associated with Markovian Decision Tasks.
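
To make the full-backup versus random-backup distinction concrete, here is a
rough sketch in Python under assumed data structures (P holds transition
probabilities, R expected rewards; these names are illustrative, not drawn
from the talks):

    import random

    # Classical DP: a full backup averages over the known model.
    # P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the
    # expected immediate reward.
    def full_backup(V, s, a, P, R, gamma=0.9):
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    # RL-style random backup: sample a single next state instead of
    # averaging. Its expectation equals the full backup, i.e., it is unbiased.
    def sample_backup(V, s, a, P, R, gamma=0.9):
        probs, states = zip(*P[s][a])
        s2 = random.choices(states, weights=probs)[0]
        return R[s][a] + gamma * V[s2]
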
          The second part deals with contributions by RL researchers to
asynchronous DP.  Williams will present a set of algorithms (and convergence
results) that are asynchronous at a finer grain than classical asynchronous
value iteration, but still use "full" backup operators. These algorithms are
related to the modified policy iteration algorithm of Puterman and Shin, as
well as to the ACE/ASE (actor-critic) architecture of Barto, Sutton and
Anderson. Subsequently, Gullapalli will present a proof of convergence for
"adaptive" asynchronous value iteration showing that, to ensure convergence
with probability one, constraints must be placed on the number of
model-building steps performed between two consecutive updates of the value
function.
        Lastly we will discuss some pressing theoretical questions
regarding rate of convergence for reinforcement learning algorithms.
---------------------------------------------------------------------------
Session 4: Hidden State and Short-Term Memory
Organizer: Lonnie Chrisman, lonnie.chrisman at cs.cmu.edu
Speakers: Lonnie Chrisman & Michael Littman, CMU

Many realistic agents cannot directly observe every relevant aspect of their
environment at every moment in time. Such hidden state causes problems for
many reinforcement learning algorithms: it often makes temporal-difference
methods unstable and renders policies that simply map sensory input to action
insufficient.
        
In this session we will examine the problems of hidden state and of learning
how to best organize short-term memory. I will review and compare existing
approaches such as those of Whitehead & Ballard, Chrisman, Lin & Mitchell,
McCallum, and Ring. I will also give a tutorial on the theories of Partially
Observable Markovian Decision Processes, Hidden Markov Models, and related
learning algorithms such as Baum-Welch/EM as they are relevant to
reinforcement learning.

Note: Andrew McCallum will present a paper on this topic as part of the
conference; that material will not be repeated in the workshop.
---------------------------------------------------------------------------
Session 5: Structural Generalization: Scaling RL to Large State Spaces
Organizer: Sridhar Mahadevan, sridhar at watson.ibm.com

"Motivation and Introduction" by Sridhar Mahadevan, IBM
"Neural Nets" by Long-Ji Lin, Siemens
"CMAC" by Tom Miller, Univ. New Hampshire
"Kd-trees and CART" by Marcos Salganicoff, UPenn
"Learning Teleo-Reactive Trees" by Nils Nilsson, Stanford
"Function Approximation in RL: Issues and Approaches" by Richard Yee, UMass
"RL with Analog State and Action Vectors", Leemon Baird, WPAFB

RL is slow to converge in tasks with high-dimensional continuous state
spaces, particularly given sparse rewards. One fundamental issue in
scaling RL to such tasks is structural credit assignment, which deals
with inferring rewards in novel situations.  This problem can be
viewed as a supervised learning task, the goal being to learn a
function from instances of states, actions, and rewards. Of course,
the function cannot be stored exhaustively as a table, and the
challenge is to devise more compact storage methods.  In this session we
will discuss some of the different approaches to the structural
generalization problem.
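
As one illustration of a compact alternative to the table (not drawn from the
talks; all names are hypothetical), a linear approximator over state-action
features:

    # Linear function approximation in place of a Q table (sketch only).
    def q_value(weights, features):
        # features: a numeric vector describing a (state, action) pair
        return sum(w * f for w, f in zip(weights, features))

    def gradient_step(weights, features, target, alpha=0.01):
        # Move the estimate toward a target value, e.g., a sampled backup
        error = target - q_value(weights, features)
        return [w + alpha * error * f for w, f in zip(weights, features)]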

Note: Steve Whitehead & Rich Sutton will present a paper on this topic as
part of the conference; that material will not be repeated in the workshop.
---------------------------------------------------------------------------
Session 6: Hierarchy and Abstraction 
Organizer: Leslie Kaelbling, lpk at cs.brown.edu
Speakers: To be determined

Too much of RL is concerned with low-level actions and low-level (single time
step) models. How can we model the world, and plan about actions, at a higher
level, or over longer time scales? How can we integrate models and actions at
different time scales and levels of abstraction? To address these questions,
several researchers have proposed models of hierarchical learning and
planning, e.g., Satinder Singh, Mark Ring, Chris Watkins, Long-ji Lin, Leslie
Kaelbling, and Peter Dayan & Geoff Hinton. The format for this session will
be a brief introduction to the problem by the session organizer followed by
short talks and discussion. Speakers have not yet been determined.

Note: Kaelbling will also speak on this topic as part of the conference; that
material will not be repeated in the workshop.
-----------------------------------------------------------------------------
Session 7: Strategies for Exploration
Organizer: Steve Whitehead, swhitehead at gte.com

Exploration is essential to reinforcement learning, since it is through
exploration that an agent learns about its environment. Naive exploration
can easily result in intractably slow learning. On the other hand,
exploration strategies that are carefully structured or exploit external
sources of bias can do much better.

A variety of approaches to exploration have been devised over the last few
years (e.g., Kaelbling, Sutton, Thrun, Koenig, Lin, Clouse, Whitehead). The
goal of this session is to review these techniques, understand their
similarities and differences, understand when and why they work, determine
their impact on learning time, and to the extent possible organize them
taxonomically.
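
As a simple point of reference for that discussion, the most naive undirected
strategy, epsilon-greedy action selection, can be sketched in a few lines of
Python (names and the value of epsilon are illustrative assumptions):

    import random

    # Epsilon-greedy: exploit the current value estimates most of the time,
    # but with probability epsilon pick a random action (sketch only).
    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])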

The session will consist of a short introduction by the session organizer
followed by an open discussion. The discussion will be informal but aimed at
issues raised during the monologue. An informal panel of researchers will be
on hand to participate in the discussion and answer questions about their
work in this area.
-----------------------------------------------------------------------------
Session 8: Relationships to Neuroscience and Evolution
Organizer: Rich Sutton, rich at gte.com

We close the workshop with a reminder of RL's links to neuroscience and to
Genetic Algorithms / Classifier Systems:

"RL in the Brain: Developing Connections Through Prediction" by R Montague, Salk
"RL and Genetic Classifier Systems" by Stewart Wilson, Roland Institute

Abstract of first talk:
Both vertebrates and invertebrates possess diffusely projecting
neuromodulatory systems. In the vertebrate, it is known that these systems
are involved in the development of cerebral cortical structures and can
deliver reward and/or salience signals to the cerebral cortex and other
structures to influence learning in the adult. Recent data in primates
suggest that this latter influence obtains because changes in firing in
nuclei that deliver the neuromodulators reflect the difference between the
predicted and actual reward, i.e., a prediction error. This relationship is
qualitatively similar to that predicted by Sutton and Barto's classical
conditioning theory. These systems innervate large expanses of cortical and
subcortical turf through extensive axonal projections that originate in
midbrain and basal forebrain nuclei and deliver such compounds as dopamine,
serotonin, norepinephrine, and acetylcholine to their targets. The small
number of neurons comprising these subcortical nuclei relative to the extent
of the territory their axons innervate suggests that the nuclei are reporting
scalar signals to their target structures. These facts are synthesized into a
single framework which relates the development of brain structures and
conditioning in adult brains. We postulate a modification to Hebbian accounts
of self-organization: Hebbian learning is conditional on an incorrect
prediction of future delivered reinforcement from a diffuse neuromodulatory
system. The reinforcement signal is derived both from externally driven
contingencies, such as proprioception from eye movements, and from
internal pathways leading from cortical areas to subcortical nuclei. We
suggest a specific model for how such predictions are made in the vertebrate
and invertebrate brain. We illustrate the framework with examples ranging
from the development of sensory and sensory-motor maps to foraging behavior
in bumble-bees.

******************************************************************************
GENERAL INFO ON REGISTERING FOR ML93 AND WORKSHOPS:


        Tenth International Conference on Machine Learning (ML93)
        ---------------------------------------------------------

The conference will be held at the University of Massachusetts in Amherst,
Massachusetts, from June 27 (Sunday) through June 29 (Tuesday).  The
conference will feature four invited talks and forty-six paper presentations.
The invited speakers are Leo Breiman (U.C. Berkeley, Statistics), Micki Chi
(U. Pittsburgh, Psychology), Michael Lloyd-Hart (U. Arizona, Adaptive Optics
Group of Steward Observatory), and Pat Langley (Siemens, Machine Learning). 
Following the conference, there will be three informal workshops:

  Workshop #A:
    Reinforcement Learning: What We Know, What We Need (June 30 - July 1)
    Organizers: R. Sutton (chair), N. Nilsson, L. Kaelbling, S. Singh,
                S. Mahadevan, A. Barto, S. Whitehead

  Workshop #B:
    Fielded Applications of Machine Learning (June 30 - July 1)
    Organizers: P. Langley, Y. Kodratoff

  Workshop #C:
    Knowledge Compilation and Speedup Learning (June 30)
    Organizers: D. Subramanian, D. Fisher, P. Tadepalli

Options and fees:

Conference registration fee                     $140    regular
                                                $110    student
Breakfast/lunch meal plan (June 27-29)           $33
Dormitory housing (nights of June 26-28)         $63    single occupancy
                                                 $51    double occupancy
Workshop A (June 30-July 1)                      $40
Workshop B (June 30-July 1)                      $40
Breakfast/lunch meal plan (June 30-July 1)       $22
Dormitory housing (nights of June 29-30)         $42    single occupancy
                                                 $34    double occupancy
Workshop C (June 30)                             $20
Breakfast/lunch meal plan (June 30)              $11
Dormitory housing (night of June 29)             $21    single occupancy
                                                 $17    double occupancy
Administrative fee (required)                    $10
Late fee (received after May 10)                 $30

To obtain a FAX of the registration form, send an email request to Paul Utgoff
ml93 at cs.umass.edu or utgoff at cs.umass.edu


