workshop
gordon@AIC.NRL.Navy.Mil
Tue May 20 10:29:16 EDT 1997
=======
CALL FOR PARTICIPATION
REINFORCEMENT LEARNING: TO MODEL OR
NOT TO MODEL, THAT IS THE QUESTION
Workshop at the Fourteenth
International Conference on Machine
Learning (ICML-97)
Vanderbilt University, Nashville, TN
July 12, 1997
www.cs.cmu.edu/~ggordon/ml97ws
Recently there has been some disagreement in the reinforcement
learning community about whether finding a good control policy
is helped or hindered by learning a model of the system to be
controlled. Recent reinforcement learning successes
(Tesauro's TD-Gammon, Crites' elevator control, Zhang and
Dietterich's space-shuttle scheduling) have all been in
domains where a human-specified model of the target system was
known in advance, and all have made substantial use of that
model. On the other hand, there have been real robot systems
that learned tasks either with model-free methods or with
learned models. The debate has been exacerbated by the lack of
fully satisfactory algorithms on either side to compare.
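
To make the distinction concrete, here is a minimal sketch in
Python (not part of the original announcement; the toy chain
MDP, function names, and constants are assumptions made for
this illustration). A direct method maps experience straight
into a value function, while an indirect method fits a
transition and reward model to the same experience and then
plans in it.

# Illustrative sketch only: contrasts a direct (model-free) update
# with an indirect (model-based) one on a toy 5-state chain MDP.
# The environment, constants, and function names are assumptions.
import random
from collections import defaultdict

N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.9   # actions: 0 = left, 1 = right

def step(s, a):
    """Toy dynamics: move left or right with some slip; reward 1 at the right end."""
    move = 1 if a == 1 else -1
    if random.random() < 0.2:               # 20% chance the action slips
        move = -move
    s2 = min(max(s + move, 0), N_STATES - 1)
    return s2, float(s2 == N_STATES - 1)

# --- Direct (model-free): Q-learning folds dynamics and goal into Q ---
Q = defaultdict(float)

def q_update(s, a, r, s2, alpha=0.1):
    best_next = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + GAMMA * best_next - Q[(s, a)])

# --- Indirect (model-based): fit counts, then plan by value iteration ---
counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s2: visit count}
reward_sum = defaultdict(float)                  # (s, a) -> summed reward

def model_update(s, a, r, s2):
    counts[(s, a)][s2] += 1
    reward_sum[(s, a)] += r

def plan(sweeps=50):
    """Certainty-equivalence value iteration on the estimated model."""
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            backups = []
            for a in ACTIONS:
                n = sum(counts[(s, a)].values())
                if n == 0:
                    continue                 # never tried this action here
                r_hat = reward_sum[(s, a)] / n
                ev = sum(c / n * V[s2] for s2, c in counts[(s, a)].items())
                backups.append(r_hat + GAMMA * ev)
            if backups:
                V[s] = max(backups)
    return V

if __name__ == "__main__":
    s = 0
    for _ in range(5000):            # the same experience feeds both learners
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        q_update(s, a, r, s2)
        model_update(s, a, r, s2)
        s = s2
    print("direct   V(s) ~", [round(max(Q[(s, a)] for a in ACTIONS), 2)
                              for s in range(N_STATES)])
    print("indirect V(s) ~", [round(v, 2) for v in plan()])

Both learners see the same stream of transitions; the direct
one folds dynamics and goal into a single object (Q), while
the indirect one keeps them separate and re-plans on demand.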
Topics for discussion include (but are not limited to):
o Case studies in which a learned model either contributed to
or detracted from the solution of a control problem. In
particular, does one method have better data efficiency?
Time efficiency? Space requirements? Final control
performance? Scaling behavior?
o Computational techniques for finding a good policy, given a
model from a particular class -- that is, what are good
planning algorithms for each class of models?
o Approximation results of the form: if the real system is in
class A, and we approximate it by a model from class B, we
are guaranteed to get "good" results as long as we have
"sufficient" data.
o Equivalences between techniques of the two sorts: for
example, if we learn a policy of type A by direct method B,
it is equivalent to learning a model of type C and computing
its optimal controller.
o How to take advantage of uncertainty estimates in a learned
model (one simple illustration appears in the sketch after
this list of topics).
o Direct algorithms combine their knowledge of the dynamics and
the goals into a single object, the policy. Thus, they may
have more difficulty than indirect methods if the goals change
(the "lifelong learning" question). Is this an essential
difficulty?
o Does the need for an online or incremental algorithm interact
with the choice of direct or indirect methods?
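
As one concrete illustration of the uncertainty topic above,
here is a minimal sketch of a count-based tabular model whose
planner penalizes poorly estimated state-action pairs. The
class name, the penalty constant, and the 1/sqrt(n) term are
all assumptions made for this example; it is not the method of
any of the talks below.

# Illustrative sketch only: one way to use uncertainty estimates in a
# learned tabular model.  Transition counts give a crude confidence
# measure, and planning penalizes poorly estimated state-action pairs.
import math
from collections import defaultdict

GAMMA = 0.9

class UncertainModel:
    def __init__(self, n_states, actions):
        self.n_states = n_states
        self.actions = actions
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s2: n}
        self.reward_sum = defaultdict(float)                 # (s, a) -> summed r

    def update(self, s, a, r, s2):
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r

    def estimate(self, s, a):
        """Return (r_hat, P_hat(.|s,a), visit count), or None if unvisited."""
        n = sum(self.counts[(s, a)].values())
        if n == 0:
            return None
        r_hat = self.reward_sum[(s, a)] / n
        p_hat = {s2: c / n for s2, c in self.counts[(s, a)].items()}
        return r_hat, p_hat, n

    def cautious_values(self, penalty=1.0, sweeps=100):
        """Value iteration with a penalty / sqrt(n) charge on uncertain pairs."""
        V = [0.0] * self.n_states
        for _ in range(sweeps):
            for s in range(self.n_states):
                backups = []
                for a in self.actions:
                    est = self.estimate(s, a)
                    if est is None:
                        continue          # never tried: treat as unavailable
                    r_hat, p_hat, n = est
                    ev = sum(p * V[s2] for s2, p in p_hat.items())
                    backups.append(r_hat + GAMMA * ev - penalty / math.sqrt(n))
                if backups:
                    V[s] = max(backups)
        return V

Setting penalty to zero recovers plain certainty-equivalence
planning; larger values make the planner steer away from
state-action pairs it has rarely tried.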
Preliminary schedule of talks:
9:00- 9:30 Chris Atkeson
"Why Model-Based Learning Should Be Inconsistent With
the Model"
9:30-10:15 Jeff Schneider
"Exploiting Model Uncertainty Estimates for Safe Dynamic
Control Learning"
10:15-10:45 Discussion break
10:45-11:15 David Andre, Nir Friedman, and Ronald Parr
"Generalized Prioritized Sweeping"
11:15-12:00 Scott Davies, Andrew Y. Ng, and Andrew Moore
"Applying Model-Based Search to Reinforcement Learning"
12:00- 1:00 LUNCH BREAK
1:00- 1:45 Rich Sutton
"Multi-Time Models: A Unified View of Modeling and
Not Modeling"
1:45- 2:15 Doina Precup and Rich Sutton
"Multi-Time Models for Reinforcement Learning"
2:15- 2:45 Howell, Frost, Gordon, and Wu
"Real-Time Learning of Vehicle Suspension Control Laws"
2:45- 3:15 Discussion break
3:15- 3:45 Leonid Kuvayev and Rich Sutton
"Approximation in Model-Based Learning"
3:45- 4:15 Geoff Gordon
"Wrap-up"
4:15- 5:00 Discussion
Organizers:
Chris Atkeson (cga@cc.gatech.edu)
College of Computing
Georgia Institute of Technology
801 Atlantic Drive
Atlanta, GA 30332-0280
Geoff Gordon (ggordon@cs.cmu.edu)
Computer Science Department
Carnegie Mellon University
5000 Forbes Ave
Pittsburgh, PA 15213-3891
(412) 268-3613, (412) 361-2893
Contact:
Geoff Gordon (ggordon@cs.cmu.edu)