ART 1
Michael J. Healy (206) 865-3123
mjhealy at espresso.rt.cs.boeing.com
Wed Jan 26 21:08:03 EST 1994
> Recently I heard an argument against Gail Carpenter and Stephen
> Grossberg's ART (Adaptive Resonance Theory). The basic argument was
> that ART is simply the 'leader clustering algorithm' enclosed in a
> load of neural net terminology. I am not very familiar with the
> leader clustering algorithm and was wondering whether anyone would
> like to remark for or against this argument, as I am very interested
> in ART. Does anyone know of any paper on this subject (ART vs. leader
> clustering, or even leader clustering on its own)?
>
I thought it would be informative to post my reply, since I have done
some work with ART. I would like to make two points:
First, it is incorrect to state that the binary pattern clustering
algorithm implemented by ART1 is equivalent to the leader clustering
algorithm (ART is much more general than the ART1 architecture; I
assume the reference was to ART1). There are two significant
differences:
1. ART1 is meant to function as a real-time clustering algorithm.
This means that it (1) accepts and clusters input patterns in sequence, as
they would appear in an application requiring an online system that learns
as it processes data, and (2) is capable of finding a representation of the
inputs that is arguably general (see below).
The leader clustering algorithm, as I understand it, is supposed to have
all its inputs available at once so that it can scan the set globally to
form clusters. It is hardly a real-time algorithm in any sense of the word.
2. The leader clustering algorithm does not generalize about its inputs.
To explain, the patterns that it uses to represent its clusters are simply
the input patterns that initiate the clusters (the "leaders"). ART1, on
the other hand, forms a synaptic (in the neurobiological sense of the
word) memory consisting of patterns that are templates for the patterns in
each of the (real-time, dynamic) clusters that it forms. It updates these
templates as it processes its inputs. Each template is the bitwise AND of
all the input patterns that have been assigned to the corresponding cluster
at some time in the learning history of ART1. This bitwise AND is a
consequence of the Hebbian-like (actually, Weber-Fechner law) learning
at each synapse in the outstar of F2 ---> F1 feedback connections from the
F2 node that represents the cluster. A corresponding change occurs in the
F1 ---> F2 connections to that same node, which form an adaptive filter
for screening the inputs that come in through the F1 layer.
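To make this concrete, here is a minimal sketch of the fast-learning
limit of that update (my own rendering and naming, not the
differential-equation model itself):

    def update_template(template, pattern):
        # Fast-learning ART1 template update: the stored template
        # becomes the bitwise AND of itself and the adopted input,
        # so only the features common to every member of the
        # cluster survive.
        return [t & p for t, p in zip(template, pattern)]

    # A template starts as a copy of the input that created the
    # cluster, then shrinks as further inputs are adopted:
    t = [1, 1, 0, 1]                       # first input, copied
    t = update_template(t, [1, 0, 0, 1])   # second adopted input
    print(t)                               # [1, 0, 0, 1]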
Whether an input pattern is adopted by a particular cluster or not depends
upon two measures of input pattern/template similarity that the ART1 system
computes. The first measure is a result of F2 layer competition through
inhibitory interconnections (again, synaptic). The second is computed by
F2 ---> F1 gain control and the vigilance mechanism. The F2 ---> F1 gain
control, the F1 ---> vigilance node inhibitory connections, the
input layer ---> vigilance node connections, and the vigilance
node ---> F2 connections (all synaptic) effect the computation.
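In algorithmic form, these two measures reduce to the familiar choice
and vigilance tests. The following rendering, with its parameter names
beta and rho, is my own; the network computes the same quantities
through the connections just described:

    def overlap(a, b):
        # Number of 1-bits the input and template share.
        return sum(x & y for x, y in zip(a, b))

    def choice(pattern, template, beta=0.5):
        # First measure: bottom-up activation driving the F2
        # competition; templates with large, relevant overlap win.
        return overlap(pattern, template) / (beta + sum(template))

    def vigilance_ok(pattern, template, rho=0.7):
        # Second measure: the fraction of the input's 1-bits matched
        # by the template must reach the vigilance level rho, or the
        # winning F2 node is reset and the search continues.
        return overlap(pattern, template) / sum(pattern) >= rho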
The result is
(1) Generalization. In fact, if the F1 nodes are thought of as implementing
predicates in a two-valued logic, it is possible to prove that the ART1
templates represent conjunctive generalizations about the objects or events
represented by the input patterns that have been adopted by a cluster. That
is, each ART1 cluster represents a concept class.
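As a toy illustration of that reading (the predicate names here are
invented for the example):

    # Each F1 node is read as a predicate; a template's surviving
    # 1-bits name the conjunction every member of the class satisfies.
    predicates = ["has_wings", "lays_eggs", "has_fur", "swims"]

    def template_as_conjunction(template):
        return " AND ".join(p for p, bit in zip(predicates, template) if bit)

    print(template_as_conjunction([1, 1, 0, 0]))  # has_wings AND lays_eggs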
Each template also corresponds to a formula about any future objects
that might be recognized as members of its concept class. This is more
complicated than a simple conjunction of F1 predicates, but can be broken
down into component conjunctions. I have a technical report on this, but
the following reference is more useful relative to ART1 and its algorithm:
Healy, M. J., Caudell, T. P. and Smith, S. D. G.,
A Neural Architecture for Pattern Sequence Verification Through
Inferencing, IEEE Transactions on Neural Networks, Vol. 4, No. 1,
1993, pp. 9-20.
(2) Stability. Suppose it is important to stabilize the memory on a
fixed set of training patterns, and suppose it is desirable to know how
many cycles of repeatedly presenting the set to the ART1 system are
necessary to accomplish this; that is, how many cycles until the
templates no longer change and each input pattern is consistently
recognized as corresponding to a single template? Further, can the
patterns be presented in some randomized order each time, or do they
have to be presented in a particular order?
The answer is as follows: suppose that the number of distinct sizes of
patterns (size being the number of 1-bits in a binary pattern) is M
(obviously, M <= N, where N is the number of training patterns). Then
at most M cycles are required. Further, the order of presentation can
be arbitrary, and can be different with each cycle. Reference:
M. Georgiopoulos, G. L. Heileman, and J. Huang,
Properties of Learning Related to Pattern Diversity in ART1,
Neural Networks, Vol. 4, pp. 751-757, 1991.
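To see the result in action, here is a small self-contained
fast-learning simulation (my own rendering, using the same choice and
vigilance rules sketched above) that counts presentation cycles until a
full pass leaves every template unchanged. The toy training set below
has M = 2 distinct pattern sizes:

    def art1_cluster(patterns, rho=0.5, beta=0.5, max_cycles=20):
        # Present the training set repeatedly until a full cycle
        # produces no template change; return the templates and the
        # number of that first quiet cycle.
        AND = lambda a, b: [x & y for x, y in zip(a, b)]
        templates = []
        for cycle in range(1, max_cycles + 1):
            changed = False
            for p in patterns:
                # F2 competition: try categories in order of choice value.
                order = sorted(range(len(templates)),
                               key=lambda j: -sum(AND(p, templates[j]))
                                             / (beta + sum(templates[j])))
                for j in order:
                    # Vigilance test: on failure, reset and keep searching.
                    if sum(AND(p, templates[j])) / sum(p) >= rho:
                        new = AND(templates[j], p)
                        if new != templates[j]:
                            templates[j], changed = new, True
                        break
                else:
                    templates.append(list(p))  # uncommitted node adopts input
                    changed = True
            if not changed:
                return templates, cycle
        return templates, max_cycles

    # Two distinct pattern sizes (2 and 3), so M = 2: the templates
    # stop changing within 2 cycles, and the quiet pass confirms it.
    pats = [[1,1,0,0], [1,0,1,0], [1,1,1,0], [0,1,1,1]]
    templates, quiet = art1_cluster(pats)
    print(templates, "no changes by cycle", quiet)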
This does not mean that the FORM of the templates is independent of the order
of presentation. In fact, learning in ART1 is order-dependent, as it is in
all clustering algorithms. I'll bet that leader clustering, even though it
views the training set all at once, is also order-dependent. The inputs
still have to be processed in some order and then deleted from the training
set on each cycle. You could redo the entire training process for all N!
possible presentation orders, but you would still have to somehow find the
"best" of all the N! clusterings.
My second point addresses the relevance of the argument that ART (meaning
ART1) is "simply the leader clustering algorithm enclosed in a load of
neural net terminology":
ART1 represents a neural network, complete with a dynamical system model.
Watch for
Heileman, G.,
A Dynamical Adaptive Resonance Architecture,
IEEE Transactions on Neural Networks (soon to appear)
Given the relevance of ART1 to neural systems, including
those that may actually exist in the brain, and given the proven
stability of the ART1 algorithm, it seems to me that any argument that
ART1 is simply this, that, or the other algorithm is moot.
I hope this sheds some light on the relationship between ART1 and the
leader clustering algorithm. My thanks to the author of the original
posting.
Mike Healy