Connectionists: First Deep Learning Networks in 1965

Schmidhuber Juergen juergen at idsia.ch
Tue Jul 8 09:28:41 EDT 2014


Andy,

thanks a lot - in fact, the survey mentions the work of Utgoff & Stracuzzi in Sec. 5.6.3. Related important work includes:

... constructive and pruning algorithms, e.g., layer-by-layer sequential network construction (e.g., Ivakhnenko, 1968, 1971; Ash, 1989; Moody, 1989; Gallant, 1988; Honavar and Uhr, 1988; Ring, 1991; Fahlman, 1991; Weng et al., 1992; Honavar and Uhr, 1993; Burgess, 1994; Fritzke, 1994; Parekh et al., 2000; Utgoff and Stracuzzi, 2002) (see also Sec. 5.3, 5.11), input pruning (Moody, 1992; Refenes et al., 1994), unit pruning (e.g., Ivakhnenko, 1968, 1971; White, 1989; Mozer and Smolensky, 1989; Levin et al., 1994) …

Cheers,
Juergen

On Jul 8, 2014, at 3:13 PM, Barto Andy <barto at cs.umass.edu> wrote:

> Juergen,
> 
> It is great that you bring attention to GMDH. It is not so widely known, although Robert Hecht-Nielsen provides a nice discussion of it in his 1990 book Neurocomputing. It is related to the general idea of a beam search. People might also want to know about the work of my late colleague Paul Utgoff and his student Dave Stracuzzi, who were very interested in what they called “many-layered learning”. Clearly not the first, but interesting too, I think:
> 
> Utgoff, P.E., & Stracuzzi, D.J. (2002). Many-layered learning. Neural Computation, 14, 2497-2539.
> 
> Best,
> Andy
> 
> On Jul 8, 2014, at 7:40 AM, Schmidhuber Juergen <juergen at idsia.ch> wrote:
> 
>> Who created the first Deep Learning networks?  
>> 
>> To my knowledge, this was done by Olexiy Hryhorovych (Alexey Grigoryevich) Ivakhnenko and colleagues in 1965. Here is a brief summary:
>> 
>> Networks trained by the Group Method of Data Handling (GMDH) (Ivakhnenko and Lapa, 1965; Ivakhnenko et al., 1967; Ivakhnenko, 1968, 1971) were perhaps the first Deep Learning systems of the Feedforward Multilayer Perceptron type. The units of GMDH nets may have polynomial activation functions implementing Kolmogorov-Gabor polynomials (more general than other widely used neural network activation functions). Given a training set, layers are incrementally grown and trained by regression analysis (e.g., Legendre, 1805; Gauss, 1809, 1821), then pruned with the help of a separate validation set (using today’s terminology), where Decision Regularisation is used to weed out superfluous units. The numbers of layers and units per layer can be learned in a problem-dependent fashion. To my knowledge, this was the first example of hierarchical representation learning in NNs. A paper of 1971 already described a deep GMDH network with 8 layers (Ivakhnenko, 1971). There have been numerous applications of GMDH-style nets, e.g., (Ikeda et al., 1976; Farlow, 1984; Madala and Ivakhnenko, 1994; Ivakhnenko, 1995; Kondo, 1998; Kordik et al., 2003; Witczak et al., 2006; Kondo and Ueno, 2008) …
>> 
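>> For concreteness, here is a minimal modern sketch of the GMDH idea in Python/NumPy. This is toy code, not Ivakhnenko's original procedure; the function names and the parameters width and max_layers are illustrative assumptions. Each candidate unit is a degree-2 Kolmogorov-Gabor polynomial of two inputs, fitted by least squares (regression analysis) on the training set; the units with the lowest error on a separate validation set survive pruning and feed the next layer, so the number of layers emerges from the data:
>> 
>> import numpy as np
>> from itertools import combinations
>> 
>> def design(xi, xj):
>>     # Degree-2 Kolmogorov-Gabor basis in two variables
>>     return np.column_stack([np.ones_like(xi), xi, xj, xi*xj, xi**2, xj**2])
>> 
>> def gmdh_fit(X_tr, y_tr, X_va, y_va, width=8, max_layers=8):
>>     # Grow layers of pairwise polynomial units; keep the 'width' units with
>>     # the lowest validation error; stop growing when validation error stalls.
>>     best_err = np.inf
>>     for _ in range(max_layers):
>>         cands = []
>>         for i, j in combinations(range(X_tr.shape[1]), 2):
>>             # Fit one candidate unit by least squares on the training set
>>             coef, *_ = np.linalg.lstsq(design(X_tr[:, i], X_tr[:, j]), y_tr, rcond=None)
>>             err = np.mean((design(X_va[:, i], X_va[:, j]) @ coef - y_va) ** 2)
>>             cands.append((err, i, j, coef))
>>         cands.sort(key=lambda c: c[0])
>>         survivors = cands[:width]  # validation-based pruning of superfluous units
>>         if survivors[0][0] >= best_err:
>>             break  # a deeper layer no longer helps: depth is learned from data
>>         best_err = survivors[0][0]
>>         # Outputs of surviving units become the inputs of the next layer
>>         X_tr = np.column_stack([design(X_tr[:, i], X_tr[:, j]) @ c for _, i, j, c in survivors])
>>         X_va = np.column_stack([design(X_va[:, i], X_va[:, j]) @ c for _, i, j, c in survivors])
>>     return best_err
>> 
>> (The sketch assumes at least two input columns; X_va and y_va play the role of the separate validation set mentioned above.)
>> 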
>> Precise references and more history can be found in:
>> 
>> Deep Learning in Neural Networks: An Overview 
>> PDF & LaTeX source & complete public BibTeX file under
>> http://www.idsia.ch/~juergen/deep-learning-overview.html
>> 
>> Juergen Schmidhuber