compressibility, Kolmogorov, learning

Juergen Schmidhuber juergen at idsia.ch
Wed Nov 22 04:32:17 EST 1995


In response to Barak's and David's recent messages:

David writes:
>>>
To illustrate just one of the possible objections to measuring
randomness with Kolmogorov complexity: Would you say that a
macroscopic gas with a specified temperature is "random"? To describe
it exactly takes a huge Kolmogorov complexity. And certainly in many
regards its position in phase space is "nothing but noise". (Indeed,
in a formal sense, its position is a random sample of the Boltzmann
distribution.) Yet Physicists can (and do) make extraordinarily
accurate predictions about such creatures with ease.
<<<

1. Is real gas random in the Kolmogorov sense? At least the
idealized microscopic gas models we are studying are not random.
For simplicity, let us consider a typical discrete-time gas model.
Each system state can be computed by a short algorithm, given
the previous state. This allows for enormous compressibility of
the system history, even if the initial state has high complexity
(and even more so if it is simple).
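
As a rough, purely illustrative sketch of this point (the rule-90
cellular automaton below merely stands in for a discrete gas model,
and the 200-bit allowance for the rule's own description is invented):
storing a T-step history of N-bit states explicitly costs T*N bits,
while a rule-based description needs only about N bits for the initial
state, a constant for the rule, and log2(T) for the step count.

# Illustrative only: the history of a deterministic discrete-time system
# is compressible down to (initial state, update rule, step count),
# no matter how long the history gets.  Rule 90 stands in for a gas model.

import math
import random

N = 256        # bits per state
T = 10000      # time steps

def step(state):
    # One deterministic update: each cell becomes the XOR of its neighbours.
    return [state[(i - 1) % N] ^ state[(i + 1) % N] for i in range(N)]

random.seed(0)
state = [random.randint(0, 1) for _ in range(N)]   # possibly complex initial state

history_bits = 0
for _ in range(T):
    history_bits += N                  # storing this state explicitly costs N bits
    state = step(state)

rule_bits = 200                        # generous, made-up allowance for the rule's code
description_bits = N + rule_bits + math.ceil(math.log2(T))

print("explicit history :", history_bits, "bits")
print("rule-based coding:", description_bits, "bits")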

Even if we assume that the deterministic model is corrupted by
random processes, we still won't end up with a system history of
maximal Kolmogorov complexity: for instance, using standard
compression techniques, we can assign short codes to likely next
states and long codes to unlikely next states. The only requirement
is that the corrupting noise is not *completely* random, i.e., not
uniform over all possible successor states. In the real world it is
not, as can be deduced from macroscopic gas properties (abstract
properties of the state, such as temperature and pressure): as David
indicated, there are simple algorithms for predicting the next
macroscopic state from the previous one. With truly random noise,
this would not be possible.
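
To make the coding argument concrete, here is a small, purely
illustrative calculation (the transition probabilities are invented):
as long as the distribution over successor states is not uniform, the
expected code length per transition, given by its Shannon entropy,
stays below log2 of the number of possible successors, so even the
noisy history remains compressible.

# Invented numbers, for illustration: a biased distribution over possible
# successor states.  Its Shannon entropy bounds the expected code length
# per transition under an ideal entropy coder (e.g. arithmetic coding).

import math

num_states = 16
probs = [0.7, 0.1, 0.1] + [0.1 / (num_states - 3)] * (num_states - 3)

entropy = -sum(p * math.log2(p) for p in probs if p > 0)
print("bits per transition with entropy coding   : %.2f" % entropy)
print("bits per transition without any compression: %.2f" % math.log2(num_states))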

2. Only where there is compressibility is there room for
non-trivial learning and generalization. Unfortunately, almost
all possible histories of possible universes are random and
incompressible. There is no miraculous universal learning
algorithm for arbitrary universes; that is more or less the
realm of the no-free-lunch (NFL) theorems.

As has been observed repeatedly, however, our own universe appears 
to be one of the relatively few (but still infinitely many) compressible 
ones (every electron behaves the same way, etc.).  In fact, much of
the previous work on machine learning can be thought of as exploiting
compressibility: ``chunking'', for instance, exploits the possibility
of re-using subprograms. Methods for finding factorial (statistically
non-redundant) codes of image data exploit the enormous redundancy
and compressibility of visual inputs. Similarly for ``learning by
analogy'' etc.
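
A quick, admittedly crude way to see how much redundancy such methods
have to work with: compress a synthetic blocky "image" (invented
below, and far simpler than real visual input) and compare it with
random noise of the same size run through the same off-the-shelf
compressor.

# Synthetic data, purely illustrative: a blocky "image" with large uniform
# regions versus uniform random noise of the same size, both run through
# an ordinary general-purpose compressor.

import random
import zlib

random.seed(0)
image = bytes((x // 32) * 30 for x in range(256) for _ in range(256))
noise = bytes(random.randrange(256) for _ in range(256 * 256))

print("image:", len(zlib.compress(image)), "bytes compressed from", len(image))
print("noise:", len(zlib.compress(noise)), "bytes compressed from", len(noise))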

In the context of PAC learning, Ming Li and Paul Vitanyi address
related issues in an interesting paper from 1989: A Theory of
Learning Simple Concepts Under Simple Distributions and Average
Case Complexity for the Universal Distribution, Proc. 30th
Annual IEEE Symposium on Foundations of Computer Science (FOCS),
pages 34-39.

3. An intriguing possibility is that there may be something like a
universal learning algorithm for compressible, low-complexity
universes. I am the first to admit, however, that neither the concept
of Kolmogorov complexity by itself nor the universal Solomonoff-Levin
distribution provides all the necessary ingredients.  Clearly, a
hypothetical universal learning algorithm would have to take into
account the fact that computational resources are limited.
This is driving much of the current work at our lab.
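
One way to make the resource constraint concrete is Levin-style
search, which allots runtime to candidate programs in proportion to
2^(-description length) and doubles the total budget from phase to
phase. The toy sketch below is of course not the hypothetical
universal learner itself; the hypothesis space of affine functions
and the description lengths assigned to them are invented merely to
show the allocation rule.

# Toy Levin-search-style allocation (hypothesis space and description
# lengths are invented).  In phase k, a program of description length l
# gets a test budget of 2**(k - l) data points, so short programs are
# tried first and every program eventually gets enough time.

import itertools

def candidates():
    # Enumerate "programs" f(x) = a*x + b in order of a crude description length.
    for size in itertools.count(1):
        for a in range(-size, size + 1):
            for b in range(-size, size + 1):
                if max(abs(a), abs(b)) == size:      # only programs new at this size
                    yield 2 * size + 2, (a, b)       # made-up prefix length

def fits(program, data, budget):
    # Test the program against the data, one budget unit per example.
    a, b = program
    for used, (x, y) in enumerate(data, start=1):
        if used > budget:
            return None                              # ran out of time this phase
        if a * x + b != y:
            return False
    return True

def levin_search(data):
    for phase in itertools.count(1):
        for length, program in candidates():
            if length > phase:
                break                                # zero budget for longer programs
            if fits(program, data, budget=2 ** (phase - length)) is True:
                return phase, length, program

data = [(0, 3), (1, 5), (2, 7), (5, 13)]             # produced by f(x) = 2*x + 3
print(levin_search(data))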

Juergen Schmidhuber
IDSIA

