PhD thesis: closed loop sequence learning

Bernd Porr bp1 at cn.stir.ac.uk
Tue May 27 04:42:04 EDT 2003


I'm pleased to announce my PhD thesis:

"Sequence-Learning in a Self-Referential
Closed-Loop Behavioural System"

Available here:

http://www.cn.stir.ac.uk/~bp1/diss55pdf.pdf


Abstract:
---------
This thesis focuses on the problem of "autonomous agents".
It is assumed that such agents want to be in a desired state 
which can be assessed by the agent itself when it observes 
the consequences of its own actions. The _feedback_ from
the motor output via the environment to the sensor input is
therefore an essential component of such a system.
Accordingly, an agent is defined in this thesis as a
self-referential system which operates within a closed 
sensor-motor-sensor feedback loop. The generic situation is 
that the agent is always prone to unpredictable disturbances 
which arrive from the outside, i.e. from its environment. 
These disturbances cause a deviation from the desired
state (for example, the organism is attacked unexpectedly or
the temperature in the environment changes). The
simplest mechanism for managing such disturbances in an 
organism is to employ a reflex loop which essentially 
establishes reactive behaviour. Reflex loops are directly
related to closed-loop feedback controllers. Thus,
they are robust and do not need a built-in model of the
control situation.
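
The reactive behaviour of such a reflex loop can be illustrated with a
small simulation. This is only a sketch under stated assumptions: the
proportional correction, the function name simulate_reflex and all
parameter values are hypothetical and are not taken from the thesis.

```python
def simulate_reflex(gain=0.5, steps=30, disturbance_at=5, disturbance=1.0):
    """Return the deviation from the desired state (0.0) at every step.

    Assumption: the reflex is modelled as a plain proportional feedback
    controller; the thesis may use a different controller.
    """
    deviation = 0.0
    history = []
    for t in range(steps):
        # The reflex can only react to a deviation it has already sensed.
        motor = -gain * deviation
        if t == disturbance_at:
            # Unpredictable disturbance arriving from the environment.
            deviation += disturbance
        deviation += motor
        history.append(deviation)
    return history


history = simulate_reflex()
print(history[:8])
```

In this sketch the loop needs no model of the disturbance, yet the full
deviation appears at the step where the disturbance arrives, and the
correction only begins one step later.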

However, reflexes have one main disadvantage, namely that
they always occur "too late", i.e. only _after_ a
reflex-eliciting (for example, unpleasant) sensor event has
occurred. This defines an objective problem for the
organism. This thesis provides a solution to this problem,
called Isotropic Sequence Order (ISO) learning.
The problem is solved by correlating the primary
_reflex_ and a predictive sensor _input_: the result
is that the system learns the temporal relation between the 
primary reflex and the earlier sensor input and creates a 
new predictive reflex. This (new) predictive reflex does not 
have the disadvantage of the primary reflex, namely of 
always being too late. As a consequence, the agent is able to
maintain its desired input state at all times. In engineering
terms, this means that ISO learning solves the inverse
controller problem for the reflex, a result that is proven
mathematically in this thesis.
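
The idea of correlating a reflex with an earlier, predictive input can
be sketched in a few lines of Python. Everything specific here is an
assumption made for illustration: the update rule (weight change
proportional to the filtered predictive input times the change of the
output), the exponential trace used as a filter, the open-loop pulse
timing, and the names run_trials, lead, mu and decay. The thesis itself
defines the actual learning rule and filters.

```python
def run_trials(lead, n_trials=20, trial_len=100, mu=0.01, decay=0.9):
    """Return the weight of the predictive input after n_trials.

    lead > 0: the predictive pulse arrives `lead` steps before the
              reflex pulse.
    lead < 0: the predictive pulse arrives after the reflex pulse.
    Assumption: open-loop pulses and a differential Hebbian update
    serve as an illustrative stand-in for ISO learning.
    """
    w0 = 1.0  # fixed weight of the reflex pathway
    w1 = 0.0  # plastic weight of the predictive pathway
    t_reflex = 30
    t_pred = t_reflex - lead
    for _ in range(n_trials):
        u0 = u1 = v_prev = 0.0  # reset traces at the start of a trial
        for t in range(trial_len):
            x0 = 1.0 if t == t_reflex else 0.0
            x1 = 1.0 if t == t_pred else 0.0
            # Exponential traces stand in for the input filters.
            u0 = decay * u0 + x0
            u1 = decay * u1 + x1
            v = w0 * u0 + w1 * u1
            # Correlate the predictive trace with the change of the output.
            w1 += mu * u1 * (v - v_prev)
            v_prev = v
    return w1


print(run_trials(5))   # predictive input precedes the reflex input
print(run_trials(-5))  # predictive input follows the reflex input
```

In this sketch the weight grows only when the second input actually
precedes the reflex input, and it becomes negative when the order is
reversed, so only a genuinely predictive signal acquires influence on
the output.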

In summary, the organism starts as a reactive system, and
learning turns it into a pro-active system.

A real robot experiment demonstrates that ISO learning can
successfully learn to solve the classical obstacle avoidance
task without external intervention (such as rewards). In
this experiment the robot has to correlate a reflex
(retraction _after_ collision) with signals from range
finders (turn _before_ the collision). After successful
learning, the robot generates a turning reaction before it
bumps into an obstacle. Additionally, it is shown that
the learning goal of "reflex avoidance" can also,
paradoxically, be used to solve an attraction task.

-- 
http://www.cn.stir.ac.uk/~bp1/
mailto:bp1 at cn.stir.ac.uk

More information about the Connectionists mailing list