information function vs. squared error
Mahesan Niranjan
niranjan%digsys.engineering.cambridge.ac.uk at NSS.Cs.Ucl.AC.UK
Tue Mar 14 10:16:44 EST 1989
I tried sending the following note last weekend but it failed for some
reason - apologies if anyone is getting a repeat!
Re:
> Date: Wed, 08 Mar 89 11:36:31 EST
> From: thanasis kehagias <ST401843%bitnet.brownvm at edu.cmu.cc.vma>
> Subject: information function vs. squared error
>
> i am looking for pointers to papers discussing the use of an alternative
> criterion to squared error, in back propagation algorithms. the
[..]
> G = sum_{i=1}^{N} p_i * log(p_i)
>
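For concreteness, here is a minimal sketch of the two criteria side by side. The quoted note is truncated, so the cross-entropy form below is a guess at the intended "information" criterion; y and t are hypothetical output and target vectors with entries in (0, 1).

import numpy as np

def squared_error(y, t):
    # Total squared error, the usual back-propagation criterion.
    return np.sum((y - t) ** 2)

def information_criterion(y, t, eps=1e-12):
    # Cross-entropy ("information") criterion; eps guards against log(0).
    y = np.clip(y, eps, 1 - eps)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

Note that the cross-entropy form penalises a confidently wrong output far more heavily than the squared error does, which is one reason it is attractive for classification.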
Here is a non-causal reference:
I have been looking at an error measure based on "approximate distances to
class-boundary" instead of the total squared error used in typical supervised
learning networks. The idea is motivated by the fact that a large network
has the inherent freedom to classify a training set in many different ways
(and hence generalises poorly!).
In my training scheme, an example of a particular class gets a target value
that depends on where it lies with respect to examples of the other class
(in a two-class problem).
This implies that the target interpolation function the network has to
construct is a smooth transition from one class to the other (rather than
the step-like cross-section implied by the total squared error criterion).
The important consequence of doing this is that networks are automatically
deprived of the ability to form large-weight (i.e. sharp cross-section)
solutions (an automatic weight decay!!).
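Since the report is not out yet, here is a minimal sketch of how such targets might be computed. The assumptions are mine: the distance to the class boundary is approximated by the distance to the nearest example of the other class, and the squashing scale is a free parameter; this is an illustration, not necessarily the scheme in the report.

import numpy as np

def boundary_targets(X, labels, scale=1.0):
    # X: (n, d) array of examples; labels: (n,) array of 0/1 class labels.
    targets = np.empty(len(X))
    for i, (x, c) in enumerate(zip(X, labels)):
        other = X[labels != c]                          # examples of the other class
        d = np.min(np.linalg.norm(other - x, axis=1))   # approx. distance to boundary
        # Far from the other class -> target near 1; near the boundary -> ~0.5.
        t = 1.0 / (1.0 + np.exp(-d / scale))
        targets[i] = t if c == 1 else 1.0 - t
    return targets

# Example: 1-D data, two classes; targets approach 0.5 near the boundary.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
labels = np.array([0, 0, 1, 1])
print(boundary_targets(X, labels))

Because the targets never saturate at 0 or 1 near the boundary, the network is never asked to fit a step, which is what rules out the large-weight solutions.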
niranjan
PS: A Tech report will be announced soon.