paper available
Ronald J Williams
rjw at ccs.neu.edu
Tue Nov 23 13:48:58 EST 1993
FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/williams.perf-bound.ps.Z
**PLEASE DO NOT FORWARD TO OTHER GROUPS**
The following paper is now available in the neuroprose directory. It is
17 pages long. For those unable to obtain the file by ftp, hardcopies can
be obtained by contacting: Diane Burke, College of Computer Science, 161 CN,
Northeastern University, Boston, MA 02115, USA.
Tight Performance Bounds on Greedy Policies
Based on Imperfect Value Functions
Northeastern University College of Computer Science
Technical Report NU-CCS-93-13
Ronald J. Williams
College of Computer Science
Northeastern University
rjw at ccs.neu.edu
Abstract:
Consider a given value function on states of a Markov decision problem, as
might result from applying a reinforcement learning algorithm. Unless this
value function equals the corresponding optimal value function, at some
states there will be a discrepancy, which is natural to call the Bellman
residual, between what the value function specifies at that state and what
is obtained by a one-step lookahead along the seemingly best action at that
state using the given value function to evaluate all succeeding states.
This paper derives a bound on how far from optimal the discounted return
for a greedy policy based on the given value function will be as a function
of the maximum norm magnitude of this Bellman residual. A corresponding
result is also obtained for value functions defined on state-action pairs,
as are used in Q-learning, and in this case it is also shown that this bound
is tight in general. One significant application of this result is to
problems where a function approximator is used to learn a value function,
with training of the approximator based on trying to minimize the Bellman
residual across states or state-action pairs. When control is based on the
use of the resulting value function, this result provides a link between how
well the objectives of function approximator training are met and the quality
of the resulting control.
To obtain a copy:
ftp cheops.cis.ohio-state.edu
login: anonymous
password: <your email address>
cd pub/neuroprose
binary
get williams.perf-bound.ps.Z
quit
Then at your system:
uncompress williams.perf-bound.ps.Z
lpr -P<printer-name> williams.perf-bound.ps
---------------------------------------------------------------------------
Ronald J. Williams | email: rjw at ccs.neu.edu
College of Computer Science, 161 CN | Phone: (617) 373-8683
Northeastern University | Fax: (617) 373-5121
Boston, MA 02115, USA |
---------------------------------------------------------------------------
More information about the Connectionists
mailing list