2-parameter search in ACT-R (e.g. Slow Kendler)
Christian Lebiere
cl at andrew.cmu.edu
Sat Jan 20 10:21:27 EST 2001
The steep ravine with gently sloping floor has been a cherished part of
connectionist lore since at least 1985 in the early days of
backpropagation.
A number of numerical optimization techniques have been used to try to
speed up the weight learning (a.k.a. parameter tweaking), including the
momentum method, quickprop, and various second-order methods, all with
varying degrees of success. But poorly conditioned search spaces are a
fundamental computational problem for which no magic bullet is likely to
exist. Otherwise we would already have a trillion-unit backprop net with
the capacities of the human brain.
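To make the ravine concrete, here is a small Python sketch (entirely my
own toy example, nothing ACT-R-specific; the quadratic loss, learning
rate and momentum value are arbitrary choices): plain gradient descent
zigzags across the steep walls and crawls along the gently sloping
floor, while the momentum method damps the zigzag and makes much faster
progress.

# Toy "ravine": a quadratic 100x steeper in y than in x.
def loss(x, y):
    return 0.5 * (x**2 + 100.0 * y**2)

def grad(x, y):
    return x, 100.0 * y

def descend(momentum, lr=0.015, steps=200):
    x, y = -5.0, 1.0          # start on the ravine wall
    vx = vy = 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Momentum accumulates the consistent downhill component
        # along the floor and cancels the oscillation across it.
        vx = momentum * vx - lr * gx
        vy = momentum * vy - lr * gy
        x, y = x + vx, y + vy
    return loss(x, y)

print("plain gradient descent:", descend(momentum=0.0))
print("with momentum 0.9:     ", descend(momentum=0.9))

With these (arbitrary) settings, plain descent is still well away from
the minimum after 200 steps, while the momentum run is essentially at
the bottom of the ravine.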
The ravine results from tightly coupled parameters, in which the value of
one (or more) strongly determines the optimal value of the other(s). In
the case of connectionist networks, for example, the value of the weights
from the input units to the hidden units will strongly determine the value
of the weights from the hidden units to the output units, because the
former determine the meaning of the latter. Such coupling is likely to
arise in any system with multiple parameters, unless those parameters
are independent of each other.
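A toy two-parameter loss (again my own made-up example) shows the
coupling directly: with something like (a*b - 1)^2, the best value of b
is entirely dictated by the current value of a, so the valley floor is
the curve b = 1/a rather than a point you can approach one parameter at
a time.

import numpy as np

def loss(a, b):
    # The a*b term couples the two parameters; the weak second
    # term merely pulls a toward 1 so a unique minimum exists.
    return (a * b - 1.0)**2 + 0.01 * (a - 1.0)**2

# For each setting of a, the best b tracks 1/a.
for a in (0.5, 1.0, 2.0, 4.0):
    bs = np.linspace(0.01, 4.0, 4000)
    best_b = bs[np.argmin(loss(a, bs))]
    print("a = %.1f  ->  best b = %.2f" % (a, best_b))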
The basic problem in this case is the lack of data, as Niels suggested.
The impact of the :rule parameter is particularly strong initially but will
fade with experience because its influence will be reduced in the Bayesian
weighting, whereas :egs is a constant architectural parameter. Therefore
one would expect that having the learning-curve data in addition to the
aggregate performance data would more strongly constrain a single
parameter set. For example, in my work on cognitive arithmetic (Lebiere, 1998;
Lebiere, 1999), I found that the level of (activation) noise will
fundamentally determine the slope of the learning curve, whereas other
parameters will only shift it up and down by a constant factor. Other
parameter explorations for a model of implicit learning can be found in
(Lebiere & Wallach, 2000).
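To illustrate that point with toy numbers (these are not the cognitive
arithmetic results, just an artificial example), two power-law learning
curves can produce nearly the same average error over a block of trials
while differing sharply in slope; only the curve itself, not the
aggregate, tells the two parameter sets apart.

import numpy as np

trials = np.arange(1, 101, dtype=float)

def curve(scale, slope):
    # Standard power-law learning curve: error = scale * trial^(-slope)
    return scale * trials**(-slope)

shallow = curve(scale=0.30, slope=0.20)   # low start, shallow slope
steep   = curve(scale=0.92, slope=0.55)   # high start, steep slope

# Aggregate fit cannot distinguish them...
print("mean error, shallow:", round(shallow.mean(), 3))
print("mean error, steep:  ", round(steep.mean(), 3))
# ...but the shape of the curves clearly can.
print("shallow, trial 1 and 100:", round(shallow[0], 2), round(shallow[-1], 2))
print("steep,   trial 1 and 100:", round(steep[0], 2), round(steep[-1], 2))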
This suggests an advantage of an architecture like ACT-R over neural
networks, namely that the parameters are readily interpretable (and
generally fewer). This (sometimes) allows one to set them by hand
through careful analysis of their effect on model behavior rather than
through brute-force search. Not that we don't sometimes have to resort
to that as well. The parameter optimizer available on the ACT-R web site tries to
deal with the valley problem by resetting the direction of search according
to the conjugate gradient technique. Richard, I would be interested to
know how well it does on your example. Roman and Wheeler, could you
please make your parameter search program available on the ACT-R web
site by emailing it to db30+ at andrew.cmu.edu? Different techniques
perform best on different problems, so it is important to have a wide
assortment available.
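For those who have not run into it, here is a generic textbook sketch
of what conjugate gradient search with direction resets looks like
(Polak-Ribiere variant). I should stress that this is my own
illustration of the general idea, not the actual optimizer program on
the web site:

import numpy as np

def minimize_cg(f, grad, x, iters=50, restart_every=2):
    g = grad(x)
    d = -g                              # start with steepest descent
    for i in range(iters):
        # Crude grid line search along d; a real optimizer does better.
        alphas = np.linspace(0.0, 1.0, 101)[1:]
        alpha = min(alphas, key=lambda a: f(x + a * d))
        x = x + alpha * d
        g_new = grad(x)
        if g_new @ g_new < 1e-12:
            break                       # gradient vanished: converged
        if (i + 1) % restart_every == 0:
            d = -g_new                  # reset the direction of search
        else:
            beta = g_new @ (g_new - g) / (g @ g)   # Polak-Ribiere weight
            d = -g_new + max(beta, 0.0) * d        # conjugate direction
        g = g_new
    return x

# The same two-parameter ravine as in the earlier sketch: the conjugate
# direction cuts along the valley floor instead of zigzagging.
f = lambda p: 0.5 * (p[0]**2 + 100.0 * p[1]**2)
grad = lambda p: np.array([p[0], 100.0 * p[1]])
print(minimize_cg(f, grad, np.array([-5.0, 1.0])))   # ~ [0, 0]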
In and of itself there is nothing wrong with parameter tuning. But of
course it is not predictive, and therefore fits to the data that result
from parameter tuning cannot be taken as support for the model or theory.
That is why we try to determine constant values (or ranges of values) for
architectural parameters (e.g. :egs [though take note of Werner Tack's
arguments regarding that parameter at the 2000 workshop]) and rules and
constraints for setting initial values of knowledge parameters (e.g. :rule).
Christian
Lebiere, C. (1998). The dynamics of cognition: An ACT-R model of cognitive
arithmetic. Ph.D. Dissertation. CMU Computer Science Dept Technical
Report CMU-CS-98-186. Pittsburgh, PA.
Available at http://reports-archive.adm.cs.cmu.edu/.
Lebiere, C. (1999). The dynamics of cognitive arithmetic.
Kognitionswissenschaft [Journal of the German Cognitive Science Society]
Special issue on cognitive modelling and cognitive architectures, D.
Wallach & H. A. Simon (Eds.), 8(1), 5-19.
Lebiere, C., & Wallach, D. (2000). Sequence learning in the ACT-R
cognitive architecture: Empirical analysis of a hybrid model. In Sun, R. &
Giles, C. L. (Eds.), Sequence Learning: Paradigms, Algorithms, and
Applications. Springer LNCS/LNAI, Germany.