Ability to generalise over multiple training runs
Lee Giles
giles at research.nj.nec.com
Tue Mar 10 13:21:49 EST 1992
Regarding recent discussions on training
different nets and their ability to get the same
solution:
We observed (Giles et al., in IJCNN91, NIPS4, and Neural Computation '92)
similar results for recurrent nets learning small regular grammars
(finite state automata) from positive and negative sample strings.
Briefly, one character of each string is presented per time step, and
supervised training (RTRL) occurs at the end of the string presentation.
[See the above papers for more information.]
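Very roughly, the setup looks something like the sketch below. This is
illustrative only, not the code from the papers: the network size, learning
rate, end-of-string marker, and toy grammar (even parity of 1s) are all made
up, and a real experiment would need more neurons and training data.

    import numpy as np

    rng = np.random.default_rng(0)

    N, K = 4, 3                                 # state neurons; symbols: '0', '1', end-marker
    W = rng.uniform(-1.0, 1.0, (N, N, K))       # second-order weights W[j, i, k]

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def run(string, W, learn=False, target=None, lr=0.5):
        """Present one string symbol per time step; with learn=True, apply an
        RTRL update against the end-of-string target on state neuron 0."""
        S = np.zeros(N); S[0] = 1.0             # initial state
        dS_dW = np.zeros((N, N, N, K))          # RTRL sensitivities dS[j]/dW[l, m, n]
        for sym in list(string) + ['$']:        # '$' marks end of string
            k = {'0': 0, '1': 1, '$': 2}[sym]
            I = np.zeros(K); I[k] = 1.0
            net = np.einsum('jik,i,k->j', W, S, I)
            S_new = sigmoid(net)
            if learn:
                g = S_new * (1.0 - S_new)       # sigmoid derivative
                sens = np.einsum('jik,k,ilmn->jlmn', W, I, dS_dW)
                for j in range(N):
                    sens[j, j] += np.outer(S, I)   # delta_{jl} * S_m * I_n term
                dS_dW = g[:, None, None, None] * sens
            S = S_new
        if learn:
            W -= lr * (S[0] - target) * dS_dW[0]   # gradient of 0.5*(S[0]-target)^2
        return S[0] > 0.5

    # Toy training set: accept strings with an even number of 1s (illustrative)
    strings = [''.join(rng.choice(['0', '1'], size=L)) for L in range(1, 6) for _ in range(4)]
    data = [(s, float(s.count('1') % 2 == 0)) for s in strings]

    for epoch in range(1000):
        for s, t in data:
            run(s, W, learn=True, target=t)

    print(sum(run(s, W) == bool(t) for s, t in data), "/", len(data), "training strings correct")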
Using random initial weights and different numbers of
neurons, most of the trained neural networks perfectly classified the
training sets. Using a heuristic extraction method (there are
many similar methods), a grammar could be
extracted from each trained network. These extracted
grammars were all different, but each could be reduced to the same unique
"minimal number of states" grammar (or minimal finite state automaton).
Though these experiments were for 2nd order fully recurrent nets,
we've extracted the same grammars from 1st order recurrent
nets using the same training data.
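For illustration, one common variant of such an extraction heuristic
quantizes each state neuron's activation into q bins, follows the net's
dynamics through the quantized space to read off a DFA, and then reduces the
result by standard partition refinement. The sketch below is my own
illustration, not the exact procedure from the papers; the helpers
step(S, sym) and accepting(S), which stand in for the trained net's
transition and end-of-string output functions, and the quantization level q
are assumptions.

    from collections import deque

    def extract_dfa(step, accepting, S0, alphabet, q=2):
        """Breadth-first search over quantized network states."""
        def quantize(S):
            return tuple(min(int(s * q), q - 1) for s in S)
        start = quantize(S0)
        reps = {start: S0}            # one analog representative per quantized state
        trans, accept = {}, set()
        todo = deque([start])
        while todo:
            a = todo.popleft()
            S = reps[a]
            if accepting(S):
                accept.add(a)
            for sym in alphabet:
                S_next = step(S, sym)
                b = quantize(S_next)
                trans[(a, sym)] = b
                if b not in reps:
                    reps[b] = S_next
                    todo.append(b)
        return start, set(reps), trans, accept

    def minimize_dfa(states, trans, accept, alphabet):
        """Moore-style partition refinement; each block of the final
        partition is one state of the minimal finite state automaton."""
        parts = [p for p in (states & accept, states - accept) if p]
        while True:
            def block(s):             # index of the block containing state s
                return next(i for i, p in enumerate(parts) if s in p)
            new_parts = []
            for p in parts:
                buckets = {}
                for s in p:
                    key = tuple(block(trans[(s, a)]) for a in alphabet)
                    buckets.setdefault(key, set()).add(s)
                new_parts.extend(buckets.values())
            if len(new_parts) == len(parts):
                return parts
            parts = new_parts

Minimizing the DFAs extracted from differently initialized nets is the step
that collapses them to the same minimal automaton.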
Not all of the trained machines performed equally well on unseen strings:
some were perfect on all strings tested; others weren't.
For small grammars, nearly all of the trained neural
networks produced perfect extracted grammars.
In most cases the nets were trained on 10**3 strings and tested
on 10**6 randomly chosen strings of length < 99.
(Since an unbounded number of strings can be generated
by these grammars, perfect generalization cannot be
tested in practice.) In fact, it was possible to extract
ideal grammars from trained nets that
classified fairly well, but not perfectly, on the test set.
[In other words, you could throw away the net and use
just the extracted grammar.]
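The kind of comparison involved is roughly the following sketch; the helper
names in_grammar, net_classifies, and dfa_classifies are hypothetical
stand-ins for the target grammar, the trained net, and the extracted
automaton, each returning True/False for a string.

    import random

    def random_string(max_len=98, alphabet='01'):
        L = random.randint(1, max_len)
        return ''.join(random.choice(alphabet) for _ in range(L))

    def compare(in_grammar, net_classifies, dfa_classifies, n=10**6):
        """Score the net and the extracted grammar against the target
        grammar on n randomly drawn strings."""
        net_ok = dfa_ok = 0
        for _ in range(n):
            s = random_string()
            truth = in_grammar(s)
            net_ok += (net_classifies(s) == truth)
            dfa_ok += (dfa_classifies(s) == truth)
        print(f"net: {net_ok / n:.4%}   extracted grammar: {dfa_ok / n:.4%}")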
This agrees with Paul Atkins' comment:
>From the above I presume (possibly incorrectly) that, if there are many
>possible solutions, then some of them will work well for new inputs and
>others will not work well.
and with Manoel Fernando Tenorio's observation:
>...then I contend that there are a very large, possibly infinite number of
>network architectures, or if a single architecture is chosen; if it is a
>classification or interpolation; and if the weights are allowed to be real
>valued or not. A simple modification of the input variable order, or the
>presentation order, or the functions of the nodes, or the initial points,
>or the number of hidden nodes would lead to different nets...
C. Lee Giles
NEC Research Institute
4 Independence Way
Princeton, NJ 08540
USA
Internet: giles at research.nj.nec.com
UUCP: princeton!nec!giles
PHONE: (609) 951-2642
FAX: (609) 951-2482