Post-NIPS Robot Learning workshop program

David Cohn cohn at psyche.mit.edu
Wed Nov 4 15:52:45 EST 1992


___________________________________________________________________________

                 PROGRAM FOR THE POST-NIPS WORKSHOP "ROBOT LEARNING"
			Vail, Colorado, Dec 5th, 1992

NIPS*92 Workshop:       Robot Learning
=================


Intended Audience:      Connectionists and Non-Connectionists in Robotics, 
==================      Control, and Active Learning

Organizers:
===========
     Sebastian Thrun (CMU)     Tom Mitchell (CMU)       David Cohn (MIT)
       thrun at cs.cmu.edu        mitchell at cs.cmu.edu     cohn at psyche.mit.edu


Program:
========

Robot learning has captured the attention of many researchers over the
past few years. Previous robotics research has demonstrated the
difficulty of manually encoding sufficiently accurate models of the
robot and its environment to succeed at complex tasks. Recently, a wide
variety of learning techniques, ranging from statistical calibration
techniques to neural networks and reinforcement learning, have been
applied to problems of perception, modeling, and control.  Robot
learning is characterized by sensor noise, control error, dynamically
changing environments, and the opportunity for learning by
experimentation.

This workshop will provide a forum for researchers active in the area
of robot learning and related fields.  It will include informal
tutorials and presentations of recent results, given by experts in
this field, as well as significant time for open discussion.  Problems
to be considered include: How can current robot learning techniques
scale to more complex domains, characterized by massive sensor input,
complex causal interactions, and long time scales?  How can previously
acquired knowledge accelerate subsequent learning? What
representations are appropriate and how can they be learned?

Although each session has listed "speakers," the intent is that each
speaker will not simply present their own work, but will introduce
their work interactively, as a launching point for group discussion on
their chosen area. After all speakers have finished, the remaining
time will be used to discuss at length the issues that the group feels
most urgently need to be addressed.

Below, we have listed the tentative agenda, which is followed by brief
abstracts of each author's topic. For those who wish to get a head
start on the workshop, we have included a list of references and/or
recommended readings, some of which are available by anonymous ftp.

=====================================================================
=====================================================================

				AGENDA

=====================================================================
=====================================================================

	SESSION ONE (early morning session), 7:30 - 9:30:
	-------------------------------------------------
		TITLE: 	"Robot learning: scaling up and state of the art"
	
		Keynote speaker:   Chris Atkeson  (30 min)
				   "Paradigms for Robot Learning"
	
		Speakers:	   Steve Hanson   (15 min)
				   (title to be announced)
	
				   Satinder Singh (15 min)
				   Behavior-Based Reinforcement Learning
	
				   Andrew W. Moore (15 min)
				   The Parti-Game Algorithm for Variable
				   Resolution Reinforcement Learning
	
				   Richard Yee    (15 min)
				   Building Abstractions to Accelerate
				   Weak Learners
	

	SESSION TWO (apres-ski session), 4:30 - 6:30:
	---------------------------------------------
		PANEL: "Robot learning: Where are the new ideas coming from?"
	
		Keynote speaker:   Andy Barto     (30 min)
	
		Speakers:	   Tom Mitchell   (10 min each)

				   Chris Atkeson

				   Dean Pomerleau 

				   Steve Suddarth 
	

=====================================================================
=====================================================================

			ABSTRACTS

=====================================================================
Session 1: 	Scaling up and the state of the art
When:		Saturday, Dec 5, 7:30-9:30 a.m.
=====================================================================
=====================================================================
Keynote:	Chris Atkeson (cga at ai.mit.edu)

Title:		Paradigms for Robot Learning

Abstract: This talk will survey a variety of robot learning tasks and
learning paradigms to perform those tasks.  The tasks include pattern
classification, regression/function approximation, root finding,
function optimization, designing feedback controllers, trajectory
following, stochastic modeling, stochastic control, and strategy
generation.  Given this wide range of tasks it seems reasonable to ask
if there is any commonality among them, or any way in which solving one
task might make other tasks easier to perform.  In our own work we have
typically taken an indirect approach: our learning algorithms explicitly
form models, and then solve the problem using algorithms that assume
complete knowledge.  It is not at all clear which learning tasks are
best dealt with using an indirect approach, and which are handled better
with a direct approach in which the control strategy is learned
directly.  Nor is it clear how to cope with uncertainty and incomplete
knowledge, whether by modeling it explicitly, by using stochastic models, or
by using game theory and assuming a malevolent world.  I hope to provoke a
discussion on these issues.
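
As a concrete (and entirely invented) illustration of the indirect/direct
contrast raised above, the following minimal Python sketch treats a toy 1-D
regulation task x_next = a*x + b*u; all constants and names are illustrative
assumptions, not Atkeson's method.

import numpy as np

A_TRUE, B_TRUE = 0.8, 0.5
rng = np.random.default_rng(0)

def rollout(gain, steps=30, x0=1.0):
    """Total quadratic cost of running the system under u = gain * x."""
    x, cost = x0, 0.0
    for _ in range(steps):
        x = A_TRUE * x + B_TRUE * (gain * x)
        cost += x * x
    return cost

# Indirect approach: collect data, fit an explicit model, then solve the
# control problem as if the fitted model were exact.
x = rng.normal(size=200)
u = rng.normal(size=200)
x_next = A_TRUE * x + B_TRUE * u + 0.01 * rng.normal(size=200)
a_hat, b_hat = np.linalg.lstsq(np.stack([x, u], axis=1), x_next, rcond=None)[0]
gain_indirect = -a_hat / b_hat      # drives the fitted model to zero in one step

# Direct approach: never build a model; score candidate controller gains by
# trials on the (here, simulated) system and keep the best one.
candidates = np.linspace(-3.0, 0.0, 61)
gain_direct = min(candidates, key=rollout)

print(round(gain_indirect, 2), round(gain_direct, 2))   # both come out near -1.6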

======================================================================
Presenter: 	Satinder Pal Singh (singh at cs.umass.edu)

Title:		Behavior-Based Reinforcement Learning

Abstract: Control architectures based on reinforcement learning have
been successfully applied to agents/robots that use their repertoire
of primitive control actions to achieve goals in an external
environment.  The optimal policy for any goal is a state-dependent
composition of the given "primitive" policies (a primitive policy "A"
assigns action A to every state). In that sense, the primitive
policies form the "basis" set from which optimal solutions can be
"composed". I argue that reinforcement learning can be greatly
accelerated by redefining the basis set of policies available to the
agent.  These redefined basis policies should correspond to
"behaviors" that are useful across the set of tasks faced by the
agent.  Behavior-based RL, i.e., the application of RL to
behavior-based robotics (ref Brooks), has several advantages: it can
drastically reduce the effective dimensionality of the action space,
it provides a framework for incorporating prior knowledge into RL
architectures, it provides a technique for achieving transfer of
learning, and finally, by restricting the rules of composition and the
types of behaviors, it may become possible to perform "robust"
reinforcement learning. I will provide examples from my own work and
that of others to illustrate these ideas.

(Refs 4, 5, 6)
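
To make the notion of a policy "basis" concrete, here is a minimal Python
sketch (an invented example, not Singh's implementation) in which the agent
chooses among basis policies rather than primitive actions, and a composed
policy is a state-dependent selection among them.

N_STATES = 10
PRIMITIVE_ACTIONS = ["left", "right"]

# A primitive policy "A" assigns action A to every state.
primitive_policies = {a: (lambda s, a=a: a) for a in PRIMITIVE_ACTIONS}

# A "behavior" is just another policy over states; here, a hand-coded one
# that could be useful across many tasks.
def seek_middle(state):
    return "right" if state < N_STATES // 2 else "left"

basis_policies = dict(primitive_policies, seek_middle=seek_middle)

def compose(selector, state):
    """A composed policy: a state-dependent choice among basis policies."""
    return basis_policies[selector[state]](state)

# A selector, learned or specified per state, picks which basis policy to
# follow; learning over the selector rather than over primitive actions is
# what shrinks the effective action space.
selector = {s: ("seek_middle" if s not in (0, N_STATES - 1) else "left")
            for s in range(N_STATES)}
print([compose(selector, s) for s in range(N_STATES)])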

======================================================================
Presenter:	Andrew W. Moore (awm at ai.mit.edu)
Title:		The Parti-Game Algorithm for Variable
		Reinforcement Learning

Can we efficiently learn in continuous state-spaces, while requiring
only relatively few real-world experiences during the learning stage?
Dividing a continuous state-space into a fine grid can mean a
tragically large number of unnecessary experiences, while a coarse
grid or parametric representation can become stuck. This talk
overviews a new algorithm which, in real time, tries to adaptively
alter the resolution of a state-space partitioning to be coarse where
it can and fine where it must be if it is to avoid becoming stuck.
The key idea turns out to be the treatment of the problem as a game
instead of a Markov decision task.

Possible prior reading:
Ref 7 (Overview of some other uses of kd-trees in Machine learning)
Ref 8 (A non-real-time algorithm which uses a different partitioning strategy)
Ref 9 (Memory-based reinforcement learning with less data and less real time)
Ref 10 (A search control technique which Parti-Game uses)
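
As a rough illustration of the coarse-where-possible, fine-where-necessary
idea, here is a hypothetical Python sketch (not the Parti-Game algorithm
itself): cells stay coarse until experience marks them as stuck, and only
those cells are split.

class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.stuck = False          # set True when trials through this cell fail

def locate(cells, x):
    return next(c for c in cells if c.lo <= x < c.hi)

def refine(cells):
    """Split every cell that experience has marked as 'stuck' into two halves."""
    new_cells = []
    for c in cells:
        if c.stuck:
            mid = (c.lo + c.hi) / 2.0
            new_cells.extend([Cell(c.lo, mid), Cell(mid, c.hi)])
        else:
            new_cells.append(c)
    return new_cells

# Start with a single coarse cell over a 1-D state space [0, 1).
cells = [Cell(0.0, 1.0)]
locate(cells, 0.7).stuck = True       # suppose real-world trials get stuck here
cells = refine(cells)
print([(c.lo, c.hi) for c in cells])  # -> [(0.0, 0.5), (0.5, 1.0)]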

======================================================================
Presenter:	Richard Yee (yee at cs.umass.edu)

Title:		Building Abstractions to Accelerate Weak Learners

Abstract: Learning methods based on dynamic programming (DP) are
promising approaches to the problem of controlling dynamical systems.
Practical DP-based learning will require function approximation
methods that are well-suited for learning optimal value functions,
which map system states into numeric estimates of utility.  Such
approximation problems are generally characterized by non-stationary,
dependent training data and, in many cases, little prospect for
incorporating strong {\em a priori\/} learning biases.  Consequently,
this talk considers learning approaches that begin weakly (e.g., using
rote memorization) but strengthen their learning biases as experiences
accrue.  Abstracting from stored experiences should accelerate
learning by improving generalization.  Bootstrapping such abstraction
processes (cf.\ "hypothesis boosting") might be a practical means for
scaling DP-based learning across a wide variety of applications.
(Refs 1, 2, 3, 4)
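
For readers unfamiliar with DP-based learning, the following toy Python
example (invented here, not from the talk) shows the kind of value function
being approximated: dynamic programming backs up numeric utility estimates
until V(s) reflects the optimal return from each state.

GAMMA = 0.9
STATES = range(5)
ACTIONS = ["stay", "advance"]

def step(s, a):
    """Deterministic toy dynamics: reward 1 for first reaching the last state."""
    s2 = min(s + 1, 4) if a == "advance" else s
    reward = 1.0 if (s2 == 4 and s != 4) else 0.0
    return s2, reward

V = {s: 0.0 for s in STATES}
for _ in range(50):                      # synchronous value-iteration sweeps
    V = {s: max(r + GAMMA * V[s2]
                for a in ACTIONS
                for s2, r in [step(s, a)])
         for s in STATES}
print({s: round(V[s], 2) for s in STATES})   # utility estimates per state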


=====================================================================
Session 2: 	Where are the new ideas coming from?
When:		Saturday, Dec 5, 4:30-6:30 p.m.
=====================================================================
=====================================================================
Keynote:	Andrew G. Barto (barto at cs.umass.edu)

Title:		Reinforcement Learning Theory

Although reinforcement learning is being studied more widely than ever
before, especially methods based on approximating dynamic programming
(DP), its theoretical foundations are not yet highly developed. In
this talk, I discuss what I perceive to be the current state and the
missing links in this theory. This topic raises such questions as the
following: Just what is DP-based reinforcement learning from a
mathematical perspective? What is the relationship between DP-based
reinforcement learning and other methods for approximating DP? What
theoretical justification exists for combining function approximation
methods (such as artificial neural networks) with DP-based learning?
What kinds of problems are best suited to DP-based reinforcement
learning?  Is theory important?

=====================================================================
Presenter:	Dean Pomerleau

Title:		Combining artificial neural networks and symbolic
		processing for autonomous robot guidance

Artificial neural networks are capable of performing the reactive
aspects of autonomous driving, such as staying on the road and avoiding
obstacles.  This talk describes an efficient technique for training
individual networks to perform these reactive driving tasks.  But
driving requires more than a collection of isolated capabilities.  To
achieve true autonomy, a system must determine which capabilities should
be employed in the current situation to achieve its objectives.  Such
goal-directed behavior is difficult to implement in an entirely
connectionist system.  This talk describes a rule-based technique for
combining multiple artificial neural networks with map-based symbolic
reasoning to achieve high-level behaviors.  The resulting system is not
only able to stay on the road, it is also able to follow a route to a
predetermined destination, turning appropriately at intersections and
stopping when it has reached its goal.

(Refs 11, 12, 13, 14, 15)
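
A hypothetical Python sketch of the arbitration idea (invented names and
rules, not Pomerleau's actual system): several driving networks each propose
a steering command, and symbolic, map-based rules decide which proposal to
follow.

def one_lane_net(image):   return 0.05    # stand-ins for trained networks;
def two_lane_net(image):   return -0.10   # each returns a steering curvature

networks = {"one-lane": one_lane_net, "two-lane": two_lane_net}

route = [                       # symbolic map knowledge of the planned route
    {"road": "one-lane", "at_destination": False},
    {"road": "two-lane", "at_destination": False},
    {"road": "two-lane", "at_destination": True},
]

def drive(segment, image):
    """Rule-based arbitration: pick the network matching the map annotation."""
    if segment["at_destination"]:
        return "stop"
    return networks[segment["road"]](image)

print([drive(seg, image=None) for seg in route])   # -> [0.05, -0.1, 'stop']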

=====================================================================
=====================================================================
  References
=====================================================================
=====================================================================

(#1) Yee, Richard, "Abstraction in Control Learning", Department of
Computer and Information Science, University of Massachusetts,
Amherst, MA 01003, COINS Technical Report 92-16, March 1992.
anonymous ftp:	envy.cs.umass.edu:pub/yee.abstrn.ps.Z

(#2) Barto, Andrew G. and Richard S. Sutton and Christopher J. C. H.
Watkins, Sequential decision problems and neural networks, in Advances
in Neural Information Processing Systems 2, 1990, Touretzky, D. S.,
ed.

(#3) Barto, Andrew G. and Richard S. Sutton and Christopher J. C. H.
Watkins", Learning and Sequential Decision Making, in Learning and
Computational Neuroscience: Foundations of Adaptive Networks, 1990.
anonymous ftp:  
archive.cis.ohio-state.edu:pub/neuroprose/barto.sequential_decisions.ps.Z

(#4) Barto, Andrew G. and Steven J. Bradtke and Satinder Pal Singh,
Real-time learning and control using asynchronous dynamic programming,
Computer and Information Science, University of Massachusetts,
Amherst, MA 01003, COINS Technical Report TR-91-57, August 1991.
anonymous ftp:  
archive.cis.ohio-state.edu:pub/neuroprose/barto.realtime-dp.ps.Z


(#5) Singh, S.P., "Transfer of Learning by Composing Solutions for Elemental
Sequential Tasks," Machine Learning, 8(3/4):323-339, May 1992.
anonymous ftp:	envy.cs.umass.edu:pub/singh-compose.ps.Z

(#6) Singh, S.P., "Scaling reinforcement learning algorithms by
learning variable temporal resolution models," Proceedings of the Ninth
Machine Learning Conference, D. Sleeman and P. Edwards, eds., July
1992.
anonymous ftp:	envy.cs.umass.edu:pub/singh-scaling.ps.Z

(#7) S. M. Omohundro, Efficient Algorithms with Neural Network
Behaviour, Journal of Complex Systems, Vol 1, No 2, pp 273-347, 1987.

(#8) A. W. Moore, Variable Resolution Dynamic Programming: Efficiently
Learning Action Maps in Multivariate Real-valued State-spaces, in
"Machine Learning: Proceedings of the Eighth International Workshop",
edited by Birnbaum, L. and Collins, G., published by Morgan Kaufmann,
June 1991.

(#9) A. W. Moore and C. G. Atkeson, Memory-based Reinforcement
Learning: Converging with Less Data and Less Real Time, 1992. See the
NIPS*92 talk, or request preprints from awm at ai.mit.edu.

(#10) J. Peng and R. J. Williams, Efficient Search Control in Dyna,
College of Computer Science, Northeastern University, March, 1992

(#11) Pomerleau, D.A., Gowdy, J., Thorpe, C.E. (1991) Combining artificial
neural networks and symbolic processing for autonomous robot guidance.
In {\it Engineering Applications of Artificial Intelligence, 4:4} pp.
279-285.

(#12) Pomerleau, D.A. (1991) Efficient Training of Artificial Neural Networks
for Autonomous Navigation. In {\it Neural Computation 3:1} pp.  88-97.

(#13) Touretzky, D.S., Pomerleau, D.A. (1989) What's hidden in the hidden
units?  {\it BYTE 14(8)}, pp. 227-233.

(#14) Pomerleau, D.A. (1991) Rapidly Adapting Artificial Neural Networks for
Autonomous Navigation. In {\it Advances in Neural Information Processing
Systems 3}, R.P. Lippmann, J.E. Moody, and D.S. Touretzky (ed.), Morgan
Kaufmann, pp. 429-435.

(#15) Pomerleau, D.A. (1989) ALVINN: An Autonomous Land Vehicle In a Neural
Network. In {\it Advances in Neural Information Processing Systems 1},
D.S.  Touretzky (ed.), Morgan Kaufmann, pp. 305-313.

