Tech Rept: Q-learning for Bandit Problems
duff@wrath.cs.umass.edu
duff at wrath.cs.umass.edu
Fri Mar 24 17:06:59 EST 1995
The following technical report is available via anonymous ftp:
Q-LEARNING FOR BANDIT PROBLEMS
(COMPSCI Technical Report 95-26)
Michael Duff
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
duff at cs.umass.edu
Multi-armed bandits may be viewed as
decompositionally-structured Markov decision processes
(MDP's) with potentially very large state sets. A
particularly elegant methodology for computing optimal
policies was developed over twenty ago by Gittins
[Gittins \& Jones, 1974]. Gittins' approach reduces
the problem of finding optimal policies for the
original MDP to a sequence of low-dimensional stopping
problems whose solutions determine the optimal policy
through the so-called ``Gittins indices.'' Katehakis
and Veinott [Katehakis \& Veinott, 1987] have shown
that the Gittins index for a task in state $i$ may be
interpreted as a particular component of the
maximum-value function associated with the
``restart-in-$i$'' process, a simple MDP to which
standard solution methods for computing optimal
policies, such as successive approximation, apply.
This paper explores the problem of learning the Gittins
indices on-line without the aid of a process model; it
suggests utilizing task-state-specific Q-learning
agents to solve their respective restart-in-state-$i$
subproblems, and includes an example in which the
online reinforcement learning approach is applied to a
simple problem of stochastic scheduling---one instance
drawn from a wide class of problems that may be
formulated as bandit problems.
FTP-host: envy.cs.umass.edu
FTP-file: pub/duff/bandit.ps.Z
18 MBytes compressed / .46 MBytes uncompressed / 32 pages (8 figures)
FTP Instructions:
unix> ftp envy.cs.umass.edu
login: anonymous
password: (your email address)
ftp> cd pub/duff
ftp> binary
ftp> get bandit.ps.Z
ftp> quit
unix> uncompress bandit.ps.Z
unix> lpr bandit.ps
More information about the Connectionists
mailing list