PhD thesis: closed loop sequence learning
Bernd Porr
bp1 at cn.stir.ac.uk
Tue May 27 04:42:04 EDT 2003
I'm pleased to announce my PhD thesis:
"Sequence-Learning in a Self-Referential
Closed-Loop Behavioural System"
Available here:
http://www.cn.stir.ac.uk/~bp1/diss55pdf.pdf
Abstract:
---------
This thesis focuses on the problem of "autonomous agents".
It is assumed that such agents want to be in a desired state
which the agent itself can assess by observing
the consequences of its own actions. The
_feedback_ from the motor output via the environment
to the sensor input is thus an essential component of such a
system. Accordingly, an agent is defined in this thesis as a
self-referential system which operates within a closed
sensor-motor-sensor feedback loop. The generic situation is
that the agent is always prone to unpredictable disturbances
which arrive from the outside, i.e. from its environment.
These disturbances cause a deviation from the desired
state (for example, the organism is attacked unexpectedly or
the temperature in the environment changes). The
simplest mechanism for managing such disturbances in an
organism is to employ a reflex loop which essentially
establishes reactive behaviour. Reflex loops are directly
related to closed loop feedback controllers. Thus,
they are robust and they do not need a built-in model of the
control situation.
However, reflexes have one main disadvantage, namely that
they always occur "too late"; i.e., only _after_ a
reflex-eliciting (for example, unpleasant) sensor event has
occurred. This defines an objective problem for the
organism. This thesis provides a solution to this problem,
called Isotropic Sequence Order (ISO) learning.
The problem is solved by correlating the primary
_reflex_ and a predictive sensor _input_: the result
is that the system learns the temporal relation between the
primary reflex and the earlier sensor input and creates a
new predictive reflex. This (new) predictive reflex does not
have the disadvantage of the primary reflex, namely of
always being too late. As a consequence the agent is able to
maintain its desired input-state all the time. In terms of
engineering this means that ISO learning solves the inverse
controller problem for the reflex, which is mathematically
proven in this thesis.
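
The correlation idea described above can be illustrated with a
small sketch. It is not the thesis's own formulation: it assumes
a differential Hebbian update (weight change proportional to the
filtered predictive input times the change of the output), a
simple leaky trace as a stand-in for the input filters, and
single-pulse inputs. All names (low_pass, iso_like_learning,
pulse) and parameter values are hypothetical.

```python
def low_pass(signal, decay=0.9):
    """Leaky trace of a signal (assumed stand-in for the input filters)."""
    trace, out = 0.0, []
    for x in signal:
        trace = decay * trace + x
        out.append(trace)
    return out


def pulse(at, length=60):
    """A single unit pulse at time step `at` (hypothetical test input)."""
    return [1.0 if t == at else 0.0 for t in range(length)]


def iso_like_learning(predictive, reflex, mu=0.01, trials=20):
    """Learn the weight of the predictive input by correlating it with
    the temporal change of the output (assumed differential Hebbian rule).
    """
    u_pred = low_pass(predictive)   # filtered predictive sensor input
    u_reflex = low_pass(reflex)     # filtered reflex input
    w_reflex, w_pred = 1.0, 0.0     # reflex weight fixed, predictive weight learned
    for _ in range(trials):
        v_prev = 0.0
        for r, p in zip(u_reflex, u_pred):
            v = w_reflex * r + w_pred * p   # output of the summation unit
            w_pred += mu * p * (v - v_prev)  # input times output derivative
            v_prev = v
    return w_pred


# Predictive input precedes the reflex: the weight grows.
w_causal = iso_like_learning(pulse(5), pulse(15))
# Same inputs in reversed order: the weight shrinks below zero.
w_reversed = iso_like_learning(pulse(15), pulse(5))
print(w_causal > 0, w_reversed < 0)
```

In this sketch the sign of the learned weight depends only on
the temporal order of the two inputs, which is the property the
correlation between the reflex and the earlier sensor input is
meant to capture. The closed loop via the environment is
deliberately left out here.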
Summarising, this means that the organism starts as a
reactive system and learning turns the system into a
pro-active system.
It will be demonstrated by a real robot experiment that ISO
learning can successfully learn to solve the classical
obstacle avoidance task without external intervention (like
rewards). In this experiment the robot has to correlate a
reflex (retraction _after_ collision) with signals of range
finders (turn _before_ the collision). After successful
learning the robot generates a turning reaction before it
bumps into an obstacle. Additionally it will be shown that
the learning goal of "reflex avoidance" can also,
paradoxically, be used to solve an attraction task.
--
http://www.cn.stir.ac.uk/~bp1/
mailto:bp1 at cn.stir.ac.uk