New TR on Kernel-Based Reinforcement Learning

Tue May 4 13:45:46 EDT 1999

The following technical report is now available on-line at

http://www-stat.stanford.edu/~ormoneit/tr-1999-8.ps

Best,

Dirk

------------------------------------------------------------------
        KERNEL-BASED REINFORCEMENT LEARNING
                        by
          Dirk Ormoneit and Saunak Sen

Kernel-based methods have recently attracted increased attention in
the machine learning literature as reliable tools for regression and
classification tasks. In this work, we consider a kernel-based
approach to reinforcement learning and show that it produces a
consistent estimate of the true value function in a continuous
Markov Decision Process. Typically, consistency cannot be obtained
using parametric value function estimates such as neural networks.
As further contributions, we derive the asymptotic distribution of
the kernel-based estimate and establish optimal convergence rates.
The asymptotic distribution is then used to derive a formula for the
asymptotic bias inherent in the kernel-based approximation.
Although value estimates in reinforcement learning are generally
biased because of the maximum operator involved, this is, to our
knowledge, the first theoretical result of this kind. The suggested bias formulas
may serve as the basis for bias correction techniques that can be
used in practice to improve the estimate of the value function.
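
As a rough illustration of the kind of estimator the abstract
describes, the following Python sketch averages one-step Bellman
backups over sampled transitions using normalized Gaussian kernel
weights. It is not taken from the report: the toy one-dimensional
MDP, the discount factor, the Gaussian kernel, the bandwidth, and all
function names (kernel_weights, q_estimate, fit, value, toy_samples)
are assumptions made for this example.

```python
import math
import random


def kernel_weights(x, centers, bandwidth):
    """Normalized Gaussian kernel weights of query state x over sample states."""
    raw = [math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers]
    total = sum(raw)
    return [w / total for w in raw]


def q_estimate(weights, action, transitions, v, gamma):
    """Weighted average of reward + gamma * value at each sampled successor."""
    return sum(
        w * (r + gamma * v[(action, i)])
        for i, (w, (_, r, _)) in enumerate(zip(weights, transitions))
    )


def fit(samples, gamma=0.9, bandwidth=0.1, sweeps=100):
    """Iterate the kernel-smoothed backup on the sampled successor states.

    samples maps each action to a list of (state, reward, next_state).
    Returns a dict from (action, index) to the value at that successor.
    """
    keys = [(a, i) for a, trans in samples.items() for i in range(len(trans))]
    # Weights of every successor state against each action's start states.
    weights = {
        (b, j): {
            a: kernel_weights(samples[b][j][2], [s for s, _, _ in trans], bandwidth)
            for a, trans in samples.items()
        }
        for (b, j) in keys
    }
    v = {k: 0.0 for k in keys}
    for _ in range(sweeps):
        # Each sweep reads the previous v and builds a new one.
        v = {
            k: max(
                q_estimate(weights[k][a], a, trans, v, gamma)
                for a, trans in samples.items()
            )
            for k in keys
        }
    return v


def value(x, samples, v, gamma=0.9, bandwidth=0.1):
    """Estimated value at an arbitrary query state x (max over actions)."""
    return max(
        q_estimate(
            kernel_weights(x, [s for s, _, _ in trans], bandwidth),
            a, trans, v, gamma,
        )
        for a, trans in samples.items()
    )


def toy_samples(n=40, seed=0):
    """Hypothetical MDP on [0, 1]: move left or right by about 0.1.

    The reward is set equal to the successor state (an assumption).
    """
    rng = random.Random(seed)
    samples = {}
    for action, step in (("left", -0.1), ("right", 0.1)):
        transitions = []
        for _ in range(n):
            s = rng.random()
            y = min(1.0, max(0.0, s + step + rng.gauss(0.0, 0.02)))
            transitions.append((s, y, y))
        samples[action] = transitions
    return samples


if __name__ == "__main__":
    data = toy_samples()
    v = fit(data)
    for x in (0.1, 0.5, 0.9):
        print(x, round(value(x, data, v), 3))
```

Because the weights are normalized, each backup is a convex
combination of sampled targets, so the iteration stays bounded by the
largest reward divided by (1 - gamma). The max over actions is applied
to noisy averages, which is the source of the bias mentioned above.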

--------------------------------------------
Dirk Ormoneit
Department of Statistics, Room 206
Stanford University
Stanford, CA 94305-4065

ph.: (650) 725-6148
fax: (650) 725-8977

ormoneit at stat.stanford.edu
http://www-stat.stanford.edu/~ormoneit/