information function vs. squared error
becker@ai.toronto.edu
Thu Mar 9 13:26:38 EST 1989
The use of the cross-entropy measure G = p log(p/q) + (1-p) log((1-p)/(1-q))
(Kullback, 1959), where p and q are the probabilities of a binary random
variable under two probability distributions, has been described in at least
three different contexts in the connectionist literature:
(i) As an objective function for supervised back-propagation; this
is appropriate if the output units compute real values that are
to be interpreted as probability distributions over the space
of binary output vectors (Hinton, 1987). Here the G-error represents
the divergence between the desired and observed distributions (a
small numerical sketch follows this list).
(ii) As an objective function for Boltzmann machine learning (Hinton
and Sejnowski, 1986), where p and q are the output distributions
in the + and - phases.
(iii) In the Gmax unsupervised learning algorithm (Pearlmutter and Hinton,
1986) as a measure of the difference between the actual
output distribution of a unit and the predicted distribution assuming
independent input lines.
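
To make the subject-line comparison concrete, here is a minimal sketch
in Python (NumPy assumed; the function names are illustrative, not taken
from any of the papers below) of the G measure for a binary variable,
together with the error derivatives of the G-error and the squared error
for a single sigmoid output unit. The point: the squared-error gradient
carries a y(1-y) factor that vanishes as the unit saturates, while in
the G-error gradient that factor cancels, leaving just y - t.

    import numpy as np

    def g_measure(p, q):
        # Cross-entropy measure G = p log(p/q) + (1-p) log((1-p)/(1-q))
        # between two distributions over a binary variable (Kullback, 1959).
        # G >= 0, with equality only when p == q.
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Derivatives w.r.t. the unit's total input (net), with desired
    # probability t and actual output y = sigmoid(net):
    #   squared error  E = (y - t)^2 / 2
    #       -> dE/dnet = (y - t) * y * (1 - y)
    #   G-error        G = t log(t/y) + (1-t) log((1-t)/(1-y))
    #       -> dG/dnet = y - t        (the y(1-y) factor cancels)

    print(g_measure(0.7, 0.4))          # approx. 0.184

    t, net = 1.0, -4.0                  # desired 1, unit saturated near 0
    y = sigmoid(net)                    # approx. 0.018
    print((y - t) * y * (1 - y))        # squared error: tiny, learning stalls
    print(y - t)                        # G-error: stays close to -1

So when an output unit is confidently wrong, the G-error still produces
a large gradient, which is one reason it is preferred over squared error
for probabilistic outputs, as in context (i).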
References:
Hinton, G. E. 1987. "Connectionist Learning Procedures." Revised version
of Technical Report CMU-CS-87-115; to appear in Artificial Intelligence.
Hinton, G. E. and Sejnowski, T. J. 1986. "Learning and Relearning in
Boltzmann Machines." In Parallel Distributed Processing: Explorations in
the Microstructure of Cognition. Bradford Books.
Kullback, S. 1959. Information Theory and Statistics. New York: Wiley.
Pearlmutter, B. A. and Hinton, G. E. 1986. "G-Maximization: An Unsupervised
Learning Procedure for Discovering Regularities." Neural Networks for
Computing: American Institute of Physics Conference Proceedings 151.
Sue Becker
DCS, University of Toronto