information function vs. squared error 
    becker@ai.toronto.edu 
       
    Thu Mar  9 13:26:38 EST 1989
    
    
  
The cross-entropy measure G = p log(p/q) + (1-p) log((1-p)/(1-q))
(Kullback, 1959), where p and q are the probabilities of a binary random
variable under two probability distributions, has been described in at least
three different contexts in the connectionist literature:
(i) As an objective function for supervised back-propagation; this
is appropriate if the output units are computing real values which
are to be interpreted as probability distributions over the space
of binary output vectors (Hinton, 1987). Here G-error represents 
the divergence between the desired and observed distributions.
(ii) As an objective function for Boltzmann machine learning (Hinton
and Sejnowski, 1986), where p and q are the output distributions 
in the + and - phases.
(iii) In the Gmax unsupervised learning algorithm (Pearlmutter and Hinton,
1986) as a measure of the difference between the actual
output distribution of a unit and the predicted distribution assuming
independent input lines.
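
For concreteness, here is a minimal sketch of G for a single binary
variable, plus the summed form one might use as the G-error over a set of
output units in case (i). The function names g_divergence and g_error are
illustrative, not taken from the cited papers, and the convention that a
term with zero weight contributes nothing (0 log 0 = 0) is an assumption
of this sketch.

```python
import math

def g_divergence(p, q):
    """G = p log(p/q) + (1-p) log((1-p)/(1-q)) for one binary variable.

    p is the desired probability, q the observed one. q must lie strictly
    between 0 and 1 so that both logarithms are defined.
    """
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must be in [0, 1]")
    if not (0.0 < q < 1.0):
        raise ValueError("q must be strictly between 0 and 1")
    total = 0.0
    # Assumed convention: a term whose weight is zero is skipped (0 log 0 = 0).
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total

def g_error(desired, observed):
    """Sum of per-unit G over paired desired/observed probabilities."""
    return sum(g_divergence(p, q) for p, q in zip(desired, observed))
```

G is zero when the two distributions agree and positive otherwise, which
is what makes it usable as an error measure in the three settings above.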
References:
Hinton, G. E. 1987. "Connectionist Learning Procedures", Revised version
of Technical Report CMU-CS-87-115, to appear (appeared ?) in Artificial
Intelligence.
Hinton, G. E. and Sejnowski, T. J. 1986. "Learning and relearning in
Boltzmann machines", in Parallel distributed processing: Explorations in
the microstructure of cognition, Bradford Books.
Kullback, S. 1959. "Information Theory and Statistics", New York: Wiley.
Pearlmutter, B. A. and Hinton, G. E. 1986. "G-Maximization: An unsupervised
learning procedure for discovering regularities", Neural Networks for
Computing: American Institute of Physics Conference Proceedings 151.
Sue Becker                      
DCS, University of Toronto          	
    
    