paper announcement

Lawrence Saul lksaul at psyche.mit.edu
Wed Apr 3 13:18:01 EST 1996


FTP-host: psyche.mit.edu
FTP-file: pub/lksaul/mdplc.ps.Z
WWW-host: http://web.mit.edu/~lksaul/

----------------------------------------------------

The following paper, to appear at COLT'96, is now available on-line.
It contains a statistical mechanical analysis of a simple problem in
decision and control.

----------------------------------------------------

Title: Learning curve bounds for a Markov decision 
       process with undiscounted rewards

Authors: Lawrence Saul and Satinder Singh

Abstract: The goal of learning in Markov decision processes is to find
a policy that yields the maximum expected return over time.  In
problems with large state spaces, computing these returns directly is
not feasible; instead, the agent must estimate them by stochastic
exploration of the state space.  Using methods from statistical
mechanics, we study how the agent's performance depends on the allowed
exploration time.  In particular, for a simple control problem with
undiscounted rewards, we compute a lower bound on the return of
policies that appear optimal based on imperfect statistics.  This is
done in the thermodynamic limit where the exploration time and the
size of the state space tend to infinity at a fixed ratio.

----------------------------------------------------





