Ph.D. thesis available: Model-Based Reinforcement Learning in Continuous Environments

Martin Appl Martin.Appl at mchp.siemens.de
Mon Mar 12 06:46:12 EST 2001


Dear Connectionists,

My Ph.D. thesis

****************************************************************

 Model-Based Reinforcement Learning in Continuous Environments

                          Martin Appl

          December 2000, Technical University of Munich

****************************************************************

is now available at www.martinappl.de.

Best regards,

   Martin Appl


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

ABSTRACT

Reinforcement learning enables machines to learn from experience. For
example, controllers can learn optimal control strategies by trying
out different strategies and evaluating the resulting performance of
the processes under control. At present, reinforcement learning is
rarely used for the optimization of complex industrial processes,
since the computational requirements of reinforcement learning
approaches grow rapidly as the number of input variables increases.
Hence, the overall goal of this thesis is to develop efficient
approaches that extend the range of application of reinforcement
learning. The focus of the thesis is on discrete-time control of
processes with continuous controlled inputs and continuous measured
outputs.

A central result of this thesis is a fuzzy model-based reinforcement
learning approach with which control strategies for continuous
processes can be trained efficiently. The output of the approach is a
Takagi-Sugeno fuzzy system representing the optimal control strategy.
A further result of this thesis is a fuzzy model-based exploration
strategy. During learning, this strategy controls processes in such a
way that maximum information is gained; hence, the number of control
cycles required to learn optimal control strategies is significantly
reduced. For many control problems it is known a priori which
measured quantities are correlated and which are statistically
independent. By taking this kind of a priori knowledge into account,
both the model-based learning approach and the exploration strategy
can be sped up significantly, as is also shown in this thesis. A
general problem in fuzzy model-based learning is the generation of
suitable fuzzy partitions. Defining partitions by hand is not
trivial, since fine partitions lead to a large number of states,
whereas coarse partitions can be unsuitable for representing the
optimal control strategy. Therefore, further extensions of the fuzzy
model-based learning approach and the model-based exploration
strategy are presented in this thesis. The basic idea behind these
extensions is to represent internal models by clustered transitions.
Based on this compact representation, the extended algorithms can
automatically determine suitable partitions of the state space.
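
(For illustration only: the following minimal Python sketch shows how
a Takagi-Sugeno fuzzy system of the kind named above can map a
continuous state to a control action. The zero-order rule form,
Gaussian membership functions, and all names are assumptions made for
the sketch, not code or design choices from the thesis.)

    import numpy as np

    # Sketch of a zero-order Takagi-Sugeno fuzzy controller: each
    # rule has Gaussian membership functions over the measured state
    # and a constant control action as consequent; the output is the
    # firing-strength-weighted average of the consequents.
    # (Illustrative assumption, not the thesis' implementation.)

    def gaussian_membership(x, center, width):
        """Degree to which input x belongs to a fuzzy set."""
        return np.exp(-0.5 * ((x - center) / width) ** 2)

    def ts_control(state, rule_centers, rule_widths, rule_actions):
        """Evaluate the fuzzy controller for a continuous state."""
        # Firing strength per rule: product of the memberships over
        # all state dimensions.
        strengths = np.prod(
            gaussian_membership(state, rule_centers, rule_widths),
            axis=1,
        )
        # Weighted average of the rule consequents (control actions).
        return strengths @ rule_actions / strengths.sum()

    # Example: two-dimensional state, three rules, scalar action.
    centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
    widths = np.ones_like(centers)
    actions = np.array([-1.0, 0.0, 1.0])
    print(ts_control(np.array([0.8, 0.6]), centers, widths, actions))

Each rule contributes its action in proportion to how strongly its
fuzzy region matches the current state, which yields a smooth control
law over the continuous state space.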

The methods presented in this thesis are applied to tasks from
traffic signal control. One task is to select framework signal plans
depending on the current traffic conditions. The fuzzy model-based
approaches turned out to outperform existing crisp methods.
Furthermore, these methods make it possible to solve the task in
reasonable time.



