Forwarded to Connectionist.Research.Group@B.GP.CS.CMU.EDU
From: Mahesan Niranjan <niranjan%digsys.engineering.cambridge.ac.uk@NSS.Cs.Ucl.AC.UK>
Date: Mon, 13 Mar 89 10:45:19 GMT
Subject: Not Total Squared Error Criterion
Re:
> Date: Wed, 08 Mar 89 11:36:31 EST
> From: thanasis kehagias <ST401843%brownvm.bitnet@vma.cc.cmu.edu>
> Subject: information function vs. squared error
>
> i am looking for pointers to papers discussing the use of an alternative
> criterion to squared error, in back propagation algorithms. the
[..]
> G = sum_{i=1}^{N} p_i * log(p_i)
>
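(For concreteness, here is the quoted criterion alongside squared error and
the cross-entropy form an information criterion commonly takes in
backpropagation. This is a minimal sketch in Python/numpy; the function names
and the clipping guard are my own illustration, not from either message:)

    import numpy as np

    def squared_error(y, t):
        # total squared error criterion: E = sum_i (y_i - t_i)^2
        return np.sum((y - t) ** 2)

    def information_measure(p):
        # the quoted criterion: G = sum_{i=1}^{N} p_i * log(p_i)
        # (the negative entropy of the output distribution p)
        return np.sum(p * np.log(p))

    def cross_entropy(y, t, eps=1e-12):
        # the related criterion commonly used with 0/1 targets t:
        # E = -sum_i [ t_i*log(y_i) + (1 - t_i)*log(1 - y_i) ]
        y = np.clip(y, eps, 1 - eps)  # guard against log(0)
        return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))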
Here is a non-causal reference (to a tech report that is yet to appear; see the PS):
I have been looking at an error measure based on "approximate distance to the
class boundary" instead of the total squared error used in typical supervised
learning networks. The idea is motivated by the fact that a large network has
an inherent freedom to classify a training set in many different ways (and
hence generalises poorly).
In my training scheme, an example of a particular class gets a target value
that depends on where it lies with respect to the examples of the other class
(in a two-class problem).
This implies that the target interpolation function the network has to
construct is a smooth transition from one class to the other, rather than the
step-like cross-section implied by the total squared error criterion.
The important consequence is that networks are automatically deprived of the
ability to form large-weight (i.e. sharp cross-section) solutions.
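(The scheme can be sketched roughly as follows. This is an illustration only,
assuming Euclidean distance to the nearest example of the other class as the
"approximate distance to the class boundary" and a sigmoid squashing; the
exact formulation is left to the forthcoming tech report mentioned in the PS:)

    import numpy as np

    def boundary_targets(X, y, steepness=1.0):
        # X: (n, d) array of inputs; y: (n,) array of 0/1 class labels.
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        targets = np.empty(len(X))
        for i in range(len(X)):
            other = X[y != y[i]]  # examples of the opposite class
            # approximate distance to the class boundary
            d = np.min(np.linalg.norm(other - X[i], axis=1))
            s = 1.0 / (1.0 + np.exp(-steepness * d))  # smooth value in (0.5, 1)
            # targets approach 0.5 near the boundary instead of jumping 0 -> 1
            targets[i] = s if y[i] == 1 else 1.0 - s
        return targets

The network is then trained on (X, boundary_targets(X, y)) in the usual way;
with these soft targets, a sharp large-weight transition is no longer the best
fit to the training data.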
niranjan
PS: A Tech report will be announced soon.