TR on Event Learning and Robust Policy Heuristics

LORINCZ Andras lorincz at valerie.inf.elte.hu
Sun May 13 07:47:47 EDT 2001


A technical report is now available from

http://people.inf.elte.hu/lorincz/NIPG-ELU-14-05-2001.ps.gz

TITLE
Event Learning and Robust Policy Heuristics
ABSTRACT
In this paper we introduce a novel form of reinforcement learning
called event-learning or E-learning. In our method, an event is an
ordered pair of consecutive states. We define the event-value
function and derive learning rules that are guaranteed to converge
to the optimal event-value function. Combining our method
with a well-known robust control method, the SDS algorithm, we
introduce Robust Policy Heuristics (RPH). It is shown that RPH, a
fast-adapting non-Markovian policy, is particularly useful for
coarse models of the environment and for partially observed
systems. As such, RPH alleviates the 'curse of dimensionality'
problem. Fast adaptation can be used to separate the time scale of
learning the value functions of a Markovian decision making problem
from that of adaptation, i.e., the utilization of a non-Markovian
policy.
We shall argue that (i) the definition of modules is
straightforward for E-learning, (ii) E-learning extends naturally
to policy switching, and (iii) E-learning promotes planning.
Computer simulations of a two-link pendulum with a coarse
discretization and a noisy controller demonstrate the principle.
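
For readers who want a concrete picture before downloading the
report, here is a minimal, hypothetical sketch of a tabular
event-value update in Python. The state-space size, learning rate,
and Q-learning-style backup are illustrative assumptions, not the
report's exact learning rules; in the paper, the policy selects a
desired next state and a robust (SDS-type) controller tries to
realize it.

    # Hypothetical sketch of event-learning in a tabular setting;
    # the exact learning rules and convergence proof are in the report.
    import numpy as np

    n_states = 16                       # assumed small discrete state space
    alpha, gamma = 0.1, 0.95            # assumed learning rate and discount
    E = np.zeros((n_states, n_states))  # event-value table E(s, s')

    def update(s, s_next, reward):
        # Q-learning-style backup for the event (s, s_next): the value
        # of the best event starting from s_next bootstraps E(s, s_next).
        best_next = E[s_next].max()
        E[s, s_next] += alpha * (reward + gamma * best_next - E[s, s_next])

    def desired_next_state(s):
        # The policy picks a desired successor state; a lower-level
        # robust controller (SDS in the report) must try to realize it.
        return int(E[s].argmax())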

Comments are more than welcome.

Andras Lorincz
www.inf.elte.hu/~lorincz




