preprints and reports
Juergen Schmidhuber
schmidhu@informatik.tu-muenchen.dbp.de
Tue Apr 30 03:17:00 EDT 1991
Recent preprints and technical reports are available via ftp:
------------------------------------------------------------------
ADAPTIVE DECOMPOSITION OF TIME
Juergen Schmidhuber, TUM
(Talk at ICANN'91, Helsinki, June 24-28, 1991)
In this paper we introduce design principles for unsupervised
detection of regularities (like causal relationships) in temporal
sequences. One basic idea is to train an adaptive predictor module to
predict future events from past events, and to train an additional
confidence module to model the reliability of the predictor's
predictions. We select system states at those points in time where
there are changes in prediction reliability, and use them
recursively as inputs for higher-level predictors. This can be
beneficial for `adaptive sub-goal generation' as well as for
`conventional' goal-directed (supervised and reinforcement) learning:
Systems based on these design principles were successfully tested
on tasks where conventional training algorithms for recurrent nets fail.
Finally we describe the principles of the first neural sequence `chunker'
which collapses a self-organizing multi-level predictor hierarchy into a
single recurrent network.
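To make the selection idea concrete, here is a minimal Python sketch (not the paper's architecture; the tabular predictor, learning rate, and threshold are illustrative assumptions): a predictor module is trained on-line, a confidence module learns the expected prediction error per state, and time steps where the confidence changes sharply are selected as inputs for a hypothetical higher-level predictor.

import numpy as np

def decompose(seq, lr=0.2, threshold=0.3):
    n = seq.max() + 1
    pred = np.full((n, n), 1.0 / n)  # predictor module: P(next | current)
    conf = np.ones(n)                # confidence module: expected error per state
    events = []                      # selected states for the next hierarchy level
    for t in range(1, len(seq) - 1):
        cur, nxt = seq[t], seq[t + 1]
        # select states where prediction reliability changes
        if abs(conf[cur] - conf[seq[t - 1]]) > threshold:
            events.append(t)
        target = np.zeros(n)
        target[nxt] = 1.0
        err = 0.5 * np.abs(target - pred[cur]).sum()  # error in [0, 1]
        pred[cur] += lr * (target - pred[cur])        # train the predictor
        conf[cur] += lr * (err - conf[cur])           # train the confidence module
    return events

print(decompose(np.array([0, 1, 2, 0, 1, 2, 0, 1, 3, 0, 1, 2])))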
LEARNING TO GENERATE SUBGOALS FOR ACTION SEQUENCES
Juergen Schmidhuber, TUM
(Talk at ICANN'91)
This paper extends the technical report FKI-129-90 (`Toward compositional
learning with neural networks').
USING ADAPTIVE SEQUENTIAL NEUROCONTROL FOR EFFICIENT LEARNING
OF TRANSLATION AND ROTATION INVARIANCE
Juergen Schmidhuber and Rudolf Huber, TUM
(Talk at ICANN'91)
This paper is based on FKI-128-90 (announced earlier).
-------------------------------------------------------------------
LEARNING TO CONTROL FAST-WEIGHT MEMORIES: AN ALTERNATIVE TO
DYNAMIC RECURRENT NETWORKS
Juergen Schmidhuber, TUM
Technical report FKI-147-91, March 26, 1991
Previous algorithms for supervised sequence learning are based on
dynamic recurrent networks. This paper describes alternative
gradient-based systems consisting of two feed-forward nets
which learn to deal with temporal sequences by using fast weights:
The first net learns to produce context-dependent weight changes
for the second net, whose weights may vary very quickly. One
advantage of the method over more conventional recurrent net
algorithms is that it does not necessarily occupy
full-fledged units (experiencing some sort of feedback) to store
information over time: a single weight may suffice to store
temporal information. Since most networks have many more
weights than units, this property represents a potential for
storage efficiency. Various learning methods are derived. Two
experiments with unknown time delays illustrate the approach. One
experiment shows how the system can be used for adaptive temporary
variable binding.
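A minimal sketch of the fast-weight idea, assuming a linear `slow' net and additive fast-weight updates (shapes, nonlinearities, and the update rule are illustrative assumptions, not the report's derivation): the first net maps the current input to a weight change for the second net, so temporal information is stored in weights rather than in units.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 3
# slow weights: would be trained by gradient descent on the task error
W_slow = rng.normal(0.0, 0.1, size=(n_out * n_in, n_in))
# fast weights: rewritten at every time step, they carry the context
W_fast = np.zeros((n_out, n_in))

def step(x):
    global W_fast
    dW = np.tanh(W_slow @ x).reshape(n_out, n_in)  # context-dependent change
    W_fast = W_fast + dW                           # a weight stores temporal info
    return np.tanh(W_fast @ x)                     # second net's output

for x in rng.normal(size=(5, n_in)):
    print(step(x))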
NEURAL SEQUENCE CHUNKERS
Juergen Schmidhuber, TUM
Technical report FKI-148-91, April 26, 1991
This paper addresses the problem of meaningful hierarchical adaptive
decomposition of temporal sequences. This problem is relevant for
time-series analysis as well as for goal-directed learning. The first
neural systems for recursively chunking sequences are described.
These systems are based on the `principle of history
compression', which essentially says: As long as a predictor
is able to predict future environmental inputs from previous ones, no
additional knowledge can be obtained by observing these inputs in
reality. Only unpredicted inputs deserve attention. The focus is on
a 2-network system which tries to collapse a self-organizing multi-level
predictor hierarchy into a single recurrent network (the automatizer).
The basic idea is to feed everything that was not expected by the
automatizer into a `higher-level' recurrent net (the chunker). Since
the expected things can be derived from the unexpected things by the
automatizer, the chunker is fed with a reduced description of the input
history. The chunker has a comparatively easy job in finding
possibilities for additional reductions, since it works on a slower
time scale and receives less inputs than the automatizer. Useful
internal representations of the chunker in turn are taught to the
automatizer. This leads to even more reduced input descriptions for
the chunker, and so on. Experimentally it is shown that the system can
be superior to conventional training algorithms for recurrent nets:
It may require fewer computations per time step, and in addition it may
require fewer training sequences. A possible extension for reinforcement
learning and adaptive control is mentioned. An analogy is drawn between
the behavior of the chunking system and the apparent behavior of humans.
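The history-compression principle itself can be illustrated with a toy Python sketch (the real automatizer and chunker are recurrent nets; the table-based predictor below is an assumption for brevity): only inputs the automatizer fails to predict are passed upward, so the chunker receives a reduced description of the input history.

def compress(seq):
    pred = {}      # toy automatizer: remembered successor of each symbol
    reduced = []   # the chunker's input: only the unexpected events
    for t in range(1, len(seq)):
        prev, cur = seq[t - 1], seq[t]
        if pred.get(prev) != cur:    # unpredicted input: deserves attention
            reduced.append((t, cur))
        pred[prev] = cur             # automatizer updates its prediction on-line
    return reduced

# the repeating pattern is compressed away; only surprises remain
print(compress(list("abcabcabdabc")))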
ADAPTIVE CONFIDENCE AND ADAPTIVE CURIOSITY
Juergen Schmidhuber
Technical Report FKI-149-91, April 26, 1991
Much of the recent research on adaptive neuro-control and reinforcement
learning focuses on systems with adaptive `world models'. Previous
approaches, however, do not address the problem of modelling the
reliability of the world model's predictions in uncertain environments.
Furthermore, previous approaches usually rely on some ad hoc method
(like random search) to train the world model to predict
future environmental inputs from the system's previous inputs and
control outputs. This paper introduces ways of modelling the reliability of
the outputs of adaptive world models, and it describes more sophisticated
and sometimes much more efficient methods for their adaptive construction
by on-line state space exploration: For instance, a 4-network
reinforcement learning system is described which tries to maximize the
future expectation of the temporal derivative of the adaptively assumed
reliability of future predictions. The system is `curious' in the sense
that it actively tries to provoke situations for which it `learned
to expect to learn' something about the environment. In a very limited
sense the system learns how to learn. An experiment with a simple
non-deterministic environment demonstrates that the method can be
clearly faster than the conventional model-building strategy.
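A schematic Python sketch of the curiosity reward under strong simplifying assumptions (a tabular world model instead of the 4-network system; names and the learning rate are invented for illustration): the intrinsic reward is the change in the model's reliability, so the controller is driven toward (state, action) pairs where it expects to learn something.

import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 3, 2
# tabular world model: estimated P(next state | state, action)
model = np.full((n_states, n_actions, n_states), 1.0 / n_states)
# learned estimate of the learning progress for each (state, action)
progress = np.zeros((n_states, n_actions))

def curiosity_step(s, a, s_next, lr=0.3):
    before = model[s, a, s_next]
    model[s, a] *= 1.0 - lr               # train the world model on (s, a, s')
    model[s, a, s_next] += lr
    gain = model[s, a, s_next] - before   # temporal derivative of reliability
    progress[s, a] += lr * (gain - progress[s, a])
    return gain                           # the intrinsic `curiosity' reward

s = 0
for _ in range(20):
    a = int(np.argmax(progress[s]))       # prefer expected learning progress
    s_next = int(rng.integers(n_states))  # toy non-deterministic environment
    curiosity_step(s, a, s_next)
    s = s_next
print(progress)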
------------------------------------------------------------------------
To obtain copies of the papers, do:
unix> ftp 131.159.8.35
Name: anonymous
Password: your name, please
ftp> binary
ftp> cd pub/fki
ftp> get <file>.ps.Z
ftp> bye
unix> uncompress <file>.ps.Z
unix> lpr <file>.ps
Here <file> stands for any of the following six possibilities:
icanndec (Adaptive Decomposition of Time)
icannsub (Subgoal-Generator). This paper contains 5 partly
hand-drawn figures which are not retrievable. Sorry.
icanninv (Sequential Neuro-Control).
fki147 (Fast Weights)
fki148 (Sequence Chunkers)
fki149 (Adaptive Curiosity)
Please do not forget to leave your name. This will allow us to save
paper if you are on our hardcopy mailing list.
NOTE: icanninv.ps, fki148.ps, and fki149.ps are designed
for European A4 paper format (21.0cm x 29.7cm).
------------------------------------------------------------------------
In case of ftp-problems contact
Juergen Schmidhuber
Institut fuer Informatik,
Technische Universitaet Muenchen
Arcisstr. 21
8000 Muenchen 2
GERMANY
or send email to
schmidhu@informatik.tu-muenchen.de
DO NOT USE REPLY!