learning 5-bit parity: RMHC vs. GP vs. MLP

Kevin Lang kevin at research.nj.nec.com
Tue Aug 15 14:49:12 EDT 1995


                 Comments on "A Response to ..." [Ko95]

                            Kevin J. Lang
                       NEC Research Institute
                      kevin at research.nj.nec.com
                              July 1995

These remarks constitute a brief technical reply to [Ko95], 
which was handed out at the ML-95 and ICGA-95 conferences.


                SCALING FROM 3-BIT TO 5-BIT PARITY:
                     hill climbing still wins

[Ko95, section 8] extends the experimental comparison in [La95] of the
search efficiency of random mutation hill climbing and genetic search
to the task of learning 5-bit parity.  It is shown that any given run
of RMHC is four times less likely than genetic search to find a
circuit which computes 5-bit parity.  However, when RMHC manages to
find a solution, it does so about fifty times faster than genetic search.

Hence, by using RMHC in an iterated manner (i.e. multiple independent
runs), it is possible to generate solutions much more cheaply than
with genetic search.  For example, by restarting each run of RMHC
after 75,000 candidates one could obtain a candidate to solution ratio
of about 300,000.  This compares well with the value of 2,913,583
candidates per solution reported for genetic search.
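The arithmetic behind this estimate can be sketched as follows; the per-run
success rate of 0.25 is back-solved from the figures above ("four times less
likely" and the 300,000 ratio), not a separately measured value:

```python
# Iterated restarts: if each capped run succeeds with probability p, the
# number of runs to first success is geometric with mean 1/p, so the
# expected candidate-to-solution ratio is roughly cap / p (counting every
# run, successful or not, at its full cost).
cap = 75_000          # restart each RMHC run after this many candidates
p = 0.25              # per-run success rate implied by the text's figures
rmhc_ratio = cap / p  # about 300,000 candidates per solution

gp_ratio = 2_913_583  # reported candidate-to-solution ratio for genetic search
print(rmhc_ratio, gp_ratio / rmhc_ratio)  # RMHC is roughly 10x cheaper
```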

The above argument could be formalized by calculating performance
curves for the two algorithms, as discussed in [Ko92, chapter 8].
[Ko95] asserts that these curves would have permitted a meaningful
comparison of the algorithms to be made, but does not provide them,
citing the large computational expense that would supposedly have to
be incurred.

Actually, the relevant portion of the I(M,i,z) curve for RMHC could be
estimated in one day by doing fifty runs out to 100,000 candidates.
Since this is roughly the same amount of work as a single run of
genetic search on this problem, estimating the performance curve for
genetic search _would_ be a daunting task.
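For reference, the standard form of Koza's performance measure [Ko92] can be
sketched as below; the candidate budget and success probability in the example
call are hypothetical, chosen only to match the scale of the runs described
above:

```python
import math

# Koza's computational-effort measure [Ko92]: I(M, i, z) is the total number
# of candidates that must be processed to find a solution with probability z,
# given that a single capped run (M individuals for i+1 generations, or here
# simply a fixed candidate budget) succeeds with probability p.
def computational_effort(candidates_per_run, p, z=0.99):
    # R(z) in [Ko92]: independent runs needed to succeed with probability z.
    runs = math.ceil(math.log(1 - z) / math.log(1 - p))
    return candidates_per_run * runs

# Hypothetical example: runs capped at 100,000 candidates, 25% success rate.
print(computational_effort(100_000, 0.25))
```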


                        DON'T USE RMHC:
              the power of multi-layer perceptrons

I would like to emphasize that the hill climbing algorithm described in
[La95] was not intended to be either novel or good, and that I do NOT
advocate its use.  By deliberately using a bad version of an old
algorithm, I sought to underline the negative character of my results.

My positive advice is this: when learning functions, use multi-layer
perceptrons.  By adopting this representation for hypotheses one can
exploit the powerful gradient-based search procedures that have been
developed by the numerical analysis community.  To illustrate the
advantages of this approach, I invested 7 seconds of computer time in
ten runs of conjugate gradient search for MLP weights to compute 5-bit
parity.  The resulting candidate to solution ratio was 393.  This is
roughly 750 times better than RMHC on boolean circuits, and 7400 times
better than genetic search on boolean circuits.

          candidates      found a  
          examined      solution? 
         ----------    ----------
             31           yes
             43           yes
             62           yes
            151           yes
            196      no, stuck in local optimum
            239      no, stuck in local optimum
            271      no, stuck in local optimum
            274      no, stuck in local optimum
            308      no, stuck in local optimum
            392           yes

Unlike RMHC and genetic search, the conjugate gradient search
procedure used here has no control parameters to tweak, and should
yield good results in the hands of any user.  Also, it detects when it
is stuck in a local optimum, thus permitting an immediate restart to
be made from random weights.  This transforms the iterated methodology
from an after-the-fact accounting system into a truly useful algorithm.

  Notes on the MLP network used in the above experiment:  
    5 input units
    5 hidden units computing tanh
    1 output unit computing tanh
    random initial weights drawn uniformly from the interval [-1,+1]
    true and false encoded by +1 and -1
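
A minimal sketch of this kind of experiment (not the original code) is given
below, using scipy's general-purpose conjugate gradient routine; the restart
budget and the use of a mean-squared-error objective are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# All 32 patterns of 5-bit parity; true/false encoded as +1/-1 as above.
X = np.array([[1.0 if (i >> b) & 1 else -1.0 for b in range(5)]
              for i in range(32)])
y = np.array([1.0 if bin(i).count("1") % 2 else -1.0 for i in range(32)])

N_IN, N_HID = 5, 5
N_W = N_IN * N_HID + N_HID + N_HID + 1  # 36 weights and biases in all

def forward(w):
    W1 = w[:25].reshape(N_IN, N_HID)
    b1, W2, b2 = w[25:30], w[30:35], w[35]
    h = np.tanh(X @ W1 + b1)            # 5 hidden units computing tanh
    return np.tanh(h @ W2 + b2)         # 1 output unit computing tanh

def loss(w):
    return np.mean((forward(w) - y) ** 2)

def solved(w):
    # A candidate counts as a solution when every output has the right sign.
    return bool(np.all(np.sign(forward(w)) == y))

# Iterated conjugate gradient search with restarts from random weights.
rng = np.random.default_rng(0)
success = False
for run in range(30):                   # restart budget is an assumption
    w0 = rng.uniform(-1.0, 1.0, N_W)    # uniform [-1,+1] initial weights
    res = minimize(loss, w0, method="CG")
    if solved(res.x):
        success = True
        break
```

When a run converges without solving the task (a local optimum), the loop
simply restarts from fresh random weights, mirroring the iterated methodology
described above.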


                            REFERENCES

[Ko95] John Koza, "A Response to the ML-95 Paper entitled 'Hill
       Climbing Beats Genetic Search on a Boolean Circuit Synthesis
       Task of Koza's'", informally published and distributed document.

[La95] Kevin Lang, "Hill Climbing Beats Genetic Search on a Boolean
       Circuit Synthesis Task of Koza's", The Twelfth International
       Conference on Machine Learning, pp. 340-343, 1995.
       Note: a copy of this paper can be obtained by anonymous ftp
       from ftp.nj.nec.com:/pub/kevin/lang-ml95.ps

[Ko92] John Koza, "Genetic Programming", MIT Press, 1992, pp. 205-236.
