CascadeC variants, Sjogaard's paper

Henrik Klagges uh311ae at sunmanager.lrz-muenchen.de
Tue Jan 21 16:49:41 EST 1992


Steen Sjogaard makes some interesting observations about the Cascade
Correlation algorithm. He invents a variation of it, named 2CCA, which uses
only two layers but otherwise relies on freezing weights, error covariance,
candidate pools etc. just as CCA does. The whole purpose of this different
dynamic network construction strategy is to increase the network's ability
to generalize. He invents a meaningful classification problem (the
'three disc' problem) and goes to some lengths to demonstrate 2CCA's
superiority over CCA on this benchmark (specifically, 2CCA generalizes
better by getting more test patterns right). The problem is that he also
reports that, even after creating 100 hidden units, 2CCA classifies only
half of the points of the 'Two-Spirals' problem correctly, which is no
better than chance. Everybody who has ever tried to solve the 'Two-Spirals'
with _any_ algorithm knows how nasty it is and how good the CCA solution
(as presented in Fahlman's paper) really is. In this light, it is obviously
not easy to accept Sjogaard's claims. However, I think that Sjogaard makes
some good points, and I would like to hear your opinion:
a) Does a solution that employs mostly low-order feature detectors (i.e.,
   has _few_ but _well-populated_ hidden layers) typically generalize
   better than one that uses fewer, higher-order detectors?

b) How should one decide when to add a new hidden unit to an existing
   layer versus putting it into a new layer? Simply creating a second
   candidate pool doesn't do the job: a new-layer hidden unit should
   _always_ do at least as well as an extra old-layer unit, so a simple
   covariance comparison does not work. Use an 'allowable error
   difference' term? Use test patterns to check generalization? Return to
   a fixed architecture? (A rough sketch of such a rule follows after
   these questions.)

c) Would the addition of a little noise make HCCA perform even better? I
   haven't heard of an experiment in this direction, but I suspect that is
   due to quickprop's relative dislike of changing training sets, not to
   any fundamental issue. (HCCA = Sjogaard's term for 'high-order CCA' =
   CCA.)

d) How would one write a k-CCA, an 'in-between-order' CC algorithm?

e) If you read Sjogaard's neuroprose paper, did you like his formalization
   of generalization? (I thought it was insightful.)
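
Regarding (b), here is a rough sketch of what I have in mind (my own toy
code, not Sjogaard's or Fahlman's): the Fahlman-style covariance score for
a candidate unit, plus a hypothetical placement rule that only accepts the
new-layer candidate if it beats the best same-layer candidate by an
'allowable error difference' margin. The function names and the margin
parameter are made up for illustration.

import numpy as np

def candidate_score(act, residual):
    """Fahlman-style covariance score: sum over outputs of the absolute
    covariance between a candidate's activation and the residual error."""
    v = act - act.mean()                      # shape (n_patterns,)
    e = residual - residual.mean(axis=0)      # shape (n_patterns, n_outputs)
    return np.abs(v @ e).sum()

def choose_placement(score_same_layer, score_new_layer, margin=0.05):
    """Hypothetical rule for question (b): a new-layer candidate sees more
    inputs and so should never score worse, hence require it to beat the
    same-layer candidate by a relative margin before adding depth."""
    if score_new_layer > score_same_layer * (1.0 + margin):
        return "new layer"
    return "existing layer"

# Toy usage: random data stands in for candidate activations and residual
# output errors over 100 patterns and 2 output units.
rng = np.random.default_rng(0)
residual = rng.normal(size=(100, 2))
s_same = candidate_score(rng.normal(size=100), residual)
s_new = candidate_score(rng.normal(size=100), residual)
print(choose_placement(s_same, s_new))

The margin is exactly the knob I am unsure about; choosing it by checking
generalization on held-out patterns would be one option.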

Cheers, Henrik Klagges

IBM Research
rick at vee.lrz-muenchen.de @ henrik at mpci.llnl.gov

PS: I hope Steen gets a k-CCA paper out soon 8-) !


