paper announcement
    Lawrence Saul
    lksaul at psyche.mit.edu
    Wed Apr  3 13:18:01 EST 1996

FTP-host: psyche.mit.edu
FTP-file: pub/lksaul/mdplc.ps.Z
WWW-host: http://web.mit.edu/~lksaul/
----------------------------------------------------
The following paper, to appear at COLT'96, is now available on-line.
It contains a statistical-mechanical analysis of a simple problem in
decision and control.
----------------------------------------------------
Title: Learning curve bounds for a Markov decision 
       process with undiscounted rewards
Authors: Lawrence Saul and Satinder Singh
Abstract: The goal of learning in Markov decision processes is to find
a policy that yields the maximum expected return over time.  In
problems with large state spaces, computing these returns directly is
not feasible; instead, the agent must estimate them by stochastic
exploration of the state space.  Using methods from statistical
mechanics, we study how the agent's performance depends on the allowed
exploration time.  In particular, for a simple control problem with
undiscounted rewards, we compute a lower bound on the return of
policies that appear optimal based on imperfect statistics.  This is
done in the thermodynamic limit where the exploration time and the
size of the state space tend to infinity at a fixed ratio.
----------------------------------------------------
    
    