information function vs. squared error

Mahesan Niranjan niranjan%digsys.engineering.cambridge.ac.uk at NSS.Cs.Ucl.AC.UK
Tue Mar 14 10:16:44 EST 1989


I tried sending the following note last weekend but it failed for some
reason - apologies if anyone is getting a repeat!

Re:
    > Date:         Wed, 08 Mar 89 11:36:31 EST
    > From: thanasis kehagias <ST401843%bitnet.brownvm at edu.cmu.cc.vma>
    > Subject:      information function vs. squared error
    >
    > i am looking for pointers to papers discussing the use of an alternative
    > criterion to squared error, in back propagation algorithms. the
    [..]
    >    G = sum_{i=1}^{N} p_i * log(p_i)
    >

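For concreteness, here is a minimal sketch (mine, not from any of the papers
asked about) of what an information-style criterion buys at a sigmoid output
unit y with binary target t: the squared error gradient picks up a factor
y*(1-y) and so vanishes when the unit saturates, while the cross-entropy
gradient simplifies to (y - t) and does not.

    def output_grads(y, t):
        """Gradients w.r.t. the unit's activation a, where y = sigmoid(a)."""
        g_sq = (y - t) * y * (1.0 - y)  # squared error (y - t)**2 / 2
        g_ce = y - t                    # cross-entropy -t*log(y) - (1-t)*log(1-y)
        return g_sq, g_ce

    # A saturated but wrong output (y near 1, target 0): the squared-error
    # gradient collapses while the information-style gradient stays useful.
    for y in (0.5, 0.9, 0.999):
        g_sq, g_ce = output_grads(y, 0.0)
        print(f"y={y:.3f}  squared-error={g_sq:.6f}  cross-entropy={g_ce:.3f}")
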
Here is a non-causal reference (i.e. to work not yet written up - see the PS):

I have been looking at an error measure based on "approximate distances to
the class boundary" instead of the total squared error used in typical
supervised learning networks. The idea is motivated by the fact that a large
network has the inherent freedom to classify a training set in many ways
(and hence generalises poorly!).

In my training scheme, an example of a particular class gets a target value
that depends on where it lies with respect to examples from the other class
(in a two-class problem).
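
Schematically, one way such targets could be computed (a sketch under my own
assumptions: the distance to the nearest opposite-class example stands in for
the unknown distance to the class boundary, and the scale of the transition
is a free parameter):

    import numpy as np

    def boundary_targets(X, y, scale=1.0):
        """Targets in (0,1): 0.5 near the (approximate) class boundary,
        saturating towards 0 or 1 as an example moves away from it.
        X is an (n, d) array of inputs, y an (n,) array of 0/1 labels."""
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        t = np.empty(len(X))
        for i in range(len(X)):
            # nearest example of the other class: a crude boundary-distance proxy
            d = np.min(np.linalg.norm(X[y != y[i]] - X[i], axis=1))
            signed = d if y[i] == 1 else -d
            t[i] = 1.0 / (1.0 + np.exp(-signed / scale))
        return t

Training then proceeds by ordinary back propagation against these graded
targets rather than hard 0/1 targets.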

This implies that the target interpolation function the network has to
construct is a smooth transition from one class to the other (rather than
the step-like cross-section demanded by the total squared error criterion).

The important consequence of doing this is that networks are automatically
deprived of the ability to form large-weight (i.e. sharp cross-section)
solutions (an automatic weight decay!!).
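
To see the mechanism: a logistic unit y = 1/(1 + exp(-(w.x + b))) crosses its
boundary with slope |w|/4, so fitting a smooth transition of width roughly D
needs |w| of only about 4/D, whereas a hard 0/1 step pushes |w| up without
bound. A purely illustrative numerical check:

    import math

    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    for w in (1.0, 4.0, 16.0):
        h = 1e-5
        # numerical slope of sigmoid(w * x) at x = 0; analytically w/4
        slope = (sigmoid(w * h) - sigmoid(-w * h)) / (2 * h)
        print(f"w={w:5.1f}  slope at boundary={slope:.4f}  (= w/4)")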

niranjan
PS: A Tech report will be announced soon.

