CascadeC variants, Sjogaard's paper

Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU
Thu Jan 23 13:55:36 EST 1992


    Steen Sjogaard makes some interesting observations about the Cascade
    Correlation algorithm. He invents a variation of it, named 2CCA, which
    uses only two layers but also relies on freezing weights, error
    covariance, candidate pools, etc., as the CCA does.

Some people don't like to see much discussion on this mailing list, so I'll
keep this response as brief as possible.

Sjogaard's 2CCA algorithm is just like cascade-correlation except that it
eliminates the "cascade" part: the candidate units receive connections only
from the original inputs, and not from previously tenured hidden units.  So
it builds a net with a single hidden layer, plus shortcut connections from
inputs to outputs.
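To make the distinction concrete, here is a minimal sketch (my own illustration, not code from either paper) of the one place the two algorithms differ: the set of units a new candidate may receive connections from.

```python
# Hypothetical sketch: Cascor vs. 2CCA differ only in which units feed
# a new candidate. Unit indexing is an assumption for illustration.

def candidate_inputs(n_inputs, hidden_units, variant="cascor"):
    """Return the indices a new candidate unit receives connections from.

    n_inputs     -- number of original input units
    hidden_units -- indices of previously tenured hidden units
    variant      -- "cascor": cascade over all prior units (deepens net)
                    "2cca":   original inputs only (single hidden layer)
    """
    inputs = list(range(n_inputs))
    if variant == "cascor":
        return inputs + list(hidden_units)  # cascade: connect to everything
    elif variant == "2cca":
        return inputs                       # no links from earlier hiddens
    raise ValueError("unknown variant: %s" % variant)

print(candidate_inputs(3, [10, 11], "cascor"))  # [0, 1, 2, 10, 11]
print(candidate_inputs(3, [10, 11], "2cca"))    # [0, 1, 2]
```

In both variants the outputs still see direct shortcut connections from the inputs; only the candidates' fan-in changes.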

For some problems, a solution with one hidden layer is as good as any
other, and for these problems 2CCA will learn a bit faster and generalize a
bit better than Cascor.  The extra degrees of freedom in Cascor have
nothing useful to do, and they just get in the way.

However, for other problems, such as two-spirals, you get a much better
solution with more layers.  A cascade architecture can solve this problem
with 10 units, while a single hidden layer requires something like 50 or
60.

My own conclusion is that 2CCA does work somewhat better for certain
problems, but is terrible for others.  If you don't know in advance what
architecture your problem needs, you are probably better off sticking with
the more general cascade architecture.

Chris Lebiere looked briefly at the following option: create two pools of
candidate units, one that receives connections from all pre-existing inputs
and hidden units, and one that has no connections from the deepest layer
created so far.  If the correlation scores are pretty close, tenure goes to
the best-scoring unit in the latter pool.  This new unit is a sibling of
the pre-existing units, not a descendant.  Preliminary results were
promising, but for various reasons Chris didn't follow up on this, and it is
still an open question what effect this might have on generalization.
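The tenure rule above can be sketched roughly as follows. This is my own reconstruction, not Chris's code, and the "pretty close" tolerance is an assumed parameter that does not appear in the original description.

```python
# Hypothetical sketch of the two-pool tenure rule. The slack threshold
# for "pretty close" correlation scores is an assumption.

def choose_tenure(cascade_pool, sibling_pool, slack=0.05):
    """Pick which candidate to tenure, given two pools of scores.

    cascade_pool -- correlation scores of candidates connected to all
                    pre-existing inputs and hidden units
    sibling_pool -- scores of candidates with no connections from the
                    deepest layer created so far
    slack        -- tolerance within which the sibling is preferred
    """
    best_cascade = max(cascade_pool)
    best_sibling = max(sibling_pool)
    # Prefer the sibling (keeps the net shallower) when it scores
    # within `slack` of the best cascading candidate.
    if best_sibling >= best_cascade - slack:
        return "sibling", best_sibling
    return "cascade", best_cascade

print(choose_tenure([0.90, 0.85], [0.88]))  # ('sibling', 0.88)
```

Tenuring a sibling adds width to the current layer rather than depth, so the rule only deepens the network when cascading clearly pays off.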

Scott Fahlman
School of Computer Science
Carnegie Mellon University
