Generative Power of Neural Nets

AMR@IBM.COM
Fri Mar 2 09:53:44 EST 1990


Some time ago I posed the question of the weak generative power of
neural nets, and in particular, their equivalence or otherwise to
Turing machines.  This elicited a number of responses, and I am
by no means done reading the entire literature that has been
kindly sent to me.

I would like to offer an update, however.

(1) There seems to be no consensus on whether the issue is
significant, although I feel (as I have said in this forum,
without getting any response) that the arguments for ignoring
such issues (at least the ones I saw) were based on fundamental
misunderstandings of the mathematics and/or its applications.

(2) There seems to be a strong feeling among those who like
neural nets that, regardless of what their generative power
is, they are interestingly different from, say, Turing machines,
but as far as I have been able to determine, no one has any
concrete proposal as to how such a notion of difference (or a
corresponding notion of equivalence) is to be defined.  Instead,
it would appear that some of the misguided arguments against being
concerned with formal issues (mentioned in (1) above) may arise out
of an anything-but-misguided intuition that difference or equivalence
in terms of weak generative power is not the relevant criterion,
combined with the indeed-quite-misguided notion that formal theories
of computation and the like cannot characterize any other such
criterion.

(3) There ARE a number of results (or ideas that could easily be
turned into results) about the weak generative power, but at first
glance they conflict quite a bit with each other.  Thus, on some
views, neural nets are equivalent to finite-state machines, on
others to Turing machines, and on yet others they are MORE
powerful than Turing machines.  The last position sounds the most
intriguing at first glance.  However, the results that point in
this direction are based simply on assuming infinite nets, which
are essentially (unless I am very wrong) what you would get if
you took the idea of a finite-state machine (or the finite
control of a Turing machine, which is essentially the same thing)
and allowed an infinite number of states.  In that case, you
could easily get it to "compute" all sorts of non-Turing-computable
things, but most would, I think, dispute that this adds anything
to our original idea that "computable" should mean
"Turing-computable", since crucially you would need to allow
infinite-length computations.
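
Just to make this concrete, here is a toy sketch of my own (the names
are made up, and it is only an illustration, not anybody's published
construction).  If we allow infinitely many states and an arbitrary
acceptance function, the "machine" is really just an oracle for
whatever set of strings we please, computable or not; all the work is
hidden in the arbitrary function we are allowed to posit:

    # Sketch: an "automaton" whose states are the natural numbers and
    # whose acceptance behavior is an arbitrary membership function.
    # Nothing here is computed in the Turing sense; the power comes
    # entirely from letting `oracle` be any function whatsoever,
    # including ones no Turing machine could compute (e.g. membership
    # in the halting set).
    def make_recognizer(oracle):
        def recognize(word):
            state = 0              # state = number of symbols read so far
            for _ in word:         # infinitely many states are "reachable"
                state += 1
            return oracle(state)   # acceptance is just an oracle lookup
        return recognize

    # With a computable oracle this is harmless: strings of even length.
    even = make_recognizer(lambda n: n % 2 == 0)
    print(even("aaaa"), even("aaa"))   # True False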

On the other hand, the arguments that reduce neural nets to finite-state
machines are based on the simple idea that actual nets must be strictly
finite (without even the infinite tape which we allow Turing machines
to have).  As has been pointed out, the same applies to any physically
realized computer such as the IBM AT I am using to type these words.
A Turing machine (or even a PDA) cannot be physically realized, only
a finite-state machine can.

However, both of these ways of viewing the situation seem to me
to be fruitless.  The Turing model has been so successful because,
while literally quite false of the computers we build (because
of the infinite tape), it has proved useful in understanding
what real computation does, anyway.  The point is difficult to
make briefly, but it boils down to the observation that the finite-
state machine model does not distinguish a random finite list
of strings, for example, from something like all strings of the
form a^n b^n for n up to 20,000.  The standard Turing model does
do this, by allowing us to pretend in the second case that we
are really recognizing the infinite language a^n b^n for all n.
As a number of people have pointed out, this is really cheating
but it does not appear to do any harm, at least among those
who understand the rules of the game.
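
To put the a^n b^n example in slightly more concrete terms (again,
this is just my own toy illustration, not anyone's result), a
counter-based recognizer captures the pattern for ALL n in a few
lines, whereas a strictly finite-state treatment of the same language
cut off at n = 20,000 is nothing but an enormous table of states, no
more structured than a random finite list of strings:

    # Counter-based recognizer for a^n b^n -- the natural "pretend the
    # language is infinite" description.  The single counter plays the
    # role of the pushdown store or the Turing tape.
    def is_anbn(word):
        count = 0
        i = 0
        while i < len(word) and word[i] == 'a':   # count the leading a's
            count += 1
            i += 1
        while i < len(word) and word[i] == 'b' and count > 0:
            count -= 1                            # match them against b's
            i += 1
        return i == len(word) and count == 0

    print(is_anbn("aaabbb"), is_anbn("aabbb"))    # True False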

Thus, it seems to me that the important results would be along
the lines of showing that various types of neural nets are
equivalent to Turing machines, or that, if they are not, the
distinction is NOT simply due either to assuming strict
finiteness (i.e. no way to simulate the TM tape) or to assuming
strict infiniteness (i.e. an infinite control).  It is by no
means clear to me that we have a convincing answer in this case,
since it seems that some models that have been defined are NOT
(or at least not obviously) as powerful as TMs.  I would welcome
additional clarification on this point.


(4) I personally feel that the next step would have to be to
develop a different way of measuring equivalence (as I have
been hinting all along), since this seems to me to be the
intuition that almost everybody has, but I have seen no
ideas directed at this.  Some of my colleagues and I have
been struggling with doing something like this for a different
domain (namely, the comparison of different linguistic formalisms),
but progress has been difficult.  I suspect that neural nets would
be an ideal testing ground for any proposals along these lines,
and I would be grateful for any ideas.  Just to make this a little
more concrete, let me give an example:

(5) It is clear that a Turing machine, a Von Neumann machine, and
a neural net are not the same thing, yet they may have the same
weak generative power.  The question is whether there is any
way to develop a useful mathematical theory in which the first two
are in some sense equivalent to each other but neither is to the third.
It seems to me that this is the intuition of many people, including
by definition (I would think) all connectionists, but there is
currently no basis for deciding whether those who feel this way
are right.

Alexis Manaster-Ramer
IBM Research
POB 704
Yorktown Heights, NY 10598





