information function vs. squared error
becker@ai.toronto.edu
Thu Mar 9 13:26:38 EST 1989
The use of the cross-entropy measure G = p log(p/q) + (1-p) log((1-p)/(1-q))
(Kullback, 1959), where p and q are the probabilities of a binary random
variable under two probability distributions, has been described in at least
three different contexts in the connectionist literature:
(i) As an objective function for supervised back-propagation; this
is appropriate if the output units compute real values that are
to be interpreted as probability distributions over the space
of binary output vectors (Hinton, 1987). Here the G-error represents
the divergence between the desired and observed distributions (a
small numerical sketch follows this list).
(ii) As an objective function for Boltzmann machine learning (Hinton
and Sejnowski, 1986), where p and q are the output distributions
in the + and - phases.
(iii) In the Gmax unsupervised learning algorithm (Pearlmutter and Hinton,
1986) as a measure of the difference between the actual
output distribution of a unit and the predicted distribution assuming
independent input lines.
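
To make the subject-line comparison concrete, here is a minimal sketch
in Python (NumPy assumed; the function names are illustrative, not taken
from any of the papers below) of the G measure for a binary variable,
together with the error derivatives of the G-error and the squared error
for a single sigmoid output unit. The point: the squared-error gradient
carries a y(1-y) factor that vanishes as the unit saturates, while in
the G-error gradient that factor cancels, leaving just y - t.

    import numpy as np

    def g_measure(p, q):
        # Cross-entropy measure G = p log(p/q) + (1-p) log((1-p)/(1-q))
        # between two distributions over a binary variable (Kullback, 1959).
        # G >= 0, with equality only when p == q.
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Derivatives w.r.t. the unit's total input (net), with desired
    # probability t and actual output y = sigmoid(net):
    #   squared error  E = (y - t)^2 / 2
    #       -> dE/dnet = (y - t) * y * (1 - y)
    #   G-error        G = t log(t/y) + (1-t) log((1-t)/(1-y))
    #       -> dG/dnet = y - t        (the y(1-y) factor cancels)

    print(g_measure(0.7, 0.4))          # approx. 0.184

    t, net = 1.0, -4.0                  # desired 1, unit saturated near 0
    y = sigmoid(net)                    # approx. 0.018
    print((y - t) * y * (1 - y))        # squared error: tiny, learning stalls
    print(y - t)                        # G-error: stays close to -1

So when an output unit is confidently wrong, the G-error still produces
a large gradient, which is one reason it is preferred over squared error
for probabilistic outputs, as in context (i).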
References:
Hinton, G. E. 1987. "Connectionist Learning Procedures." Revised version
of Technical Report CMU-CS-87-115; to appear in Artificial Intelligence.
Hinton, G. E. and Sejnowski, T. J. 1986. "Learning and Relearning in
Boltzmann Machines." In Parallel Distributed Processing: Explorations in
the Microstructure of Cognition. Bradford Books.
Kullback, S. 1959. Information Theory and Statistics. New York: Wiley.
Pearlmutter, B. A. and Hinton, G. E. 1986. "G-Maximization: An Unsupervised
Learning Procedure for Discovering Regularities." Neural Networks for
Computing: American Institute of Physics Conference Proceedings 151.
Sue Becker
DCS, University of Toronto