Technical report on fault tolerance of feedforward ANNs available.

Sun Nov 21 16:27:40 EST 1993

This is the abstract of Technical Report No. TR-92-CSE-26, ECE Dept.,
Univ. of Massachusetts, Amherst, MA 01003.
The report is titled "Complete and Partial Fault Tolerance of Feedforward
Neural Nets". It is available in the neuroprose archive
(the compressed PostScript file name is phatak.nn-fault-tolerance.ps.Z).
An abbreviated version of this report is in press in
the IEEE Transactions on Neural Networks (to appear in 1994).
I would be glad to get feedback on the issues discussed and
the results presented. Thanks!

------------------------------- ABSTRACT --------------------------------------

A method is proposed to estimate the fault tolerance of feedforward
Artificial Neural Nets (ANNs) and to synthesize robust nets. The fault
model abstracts a variety of failure modes of hardware implementations
to permanent stuck-at type faults of single components.
A procedure is developed to build fault-tolerant ANNs by replicating
the hidden units. It exploits the intrinsic weighted summation operation
performed by the processing units in order to overcome faults.
It is simple, robust, and applicable to any feedforward net.
Based on this procedure, metrics are devised
to quantify the fault tolerance as a function of redundancy.
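
As a rough illustration of the replication idea (a sketch under stated
assumptions, not the procedure from the report), the Python fragment below
copies every hidden unit g times in a tiny one-hidden-layer net. The net
layout, the hand-picked weights, the helper names, and the choice to scale
the output bias by g are all assumptions made for this example. With that
choice the output unit's weighted sum becomes g times the original, so a
single stuck-at fault on one hidden-to-output link shifts the sum by a
relatively smaller amount.

```python
import math

def forward(net, x):
    """Return the output unit's weighted sum and its tanh activation."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in net["hidden"]]
    s = sum(v * h for v, h in zip(net["out_w"], hidden)) + net["out_b"]
    return s, math.tanh(s)

def replicate_hidden(net, g):
    """Copy every hidden unit g times; scale the output bias by g (assumption)."""
    return {
        "hidden": [(list(ws), b) for _ in range(g) for ws, b in net["hidden"]],
        "out_w": list(net["out_w"]) * g,
        "out_b": net["out_b"] * g,
    }

def stuck_output_weight(net, index, value):
    """Return a copy of net with one hidden-to-output weight stuck at `value`."""
    faulty = dict(net, out_w=list(net["out_w"]))
    faulty["out_w"][index] = value
    return faulty

# Hypothetical 2-input, 2-hidden-unit net; weights chosen by hand for illustration.
net = {"hidden": [([2.0, 2.0], -1.0), ([-2.0, -2.0], 3.0)],
       "out_w": [2.0, 2.0], "out_b": -2.0}

x = [1.0, 0.0]
s_plain, _ = forward(stuck_output_weight(net, 0, 0.0), x)
s_repl, _ = forward(stuck_output_weight(replicate_hidden(net, 3), 0, 0.0), x)
print("faulty sum without replication:", s_plain)
print("faulty sum with 3 replications:", s_repl)
```

For this toy net and input, the stuck-at-0 fault flips the sign of the
output sum in the unreplicated net but not in the net replicated three times.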

	Furthermore, a lower bound on the redundancy required to
tolerate all possible single faults is analytically derived.
This bound demonstrates that redundancy below that of Triple Modular
Redundancy (TMR) cannot provide complete fault tolerance for all
possible single faults.
This general result establishes a NECESSARY condition that
holds for ALL feedforward nets, irrespective of the network topology
or the task the net is trained on.
Analytical as well as extensive simulation results indicate that
the actual redundancy needed to SYNTHESIZE a completely
fault-tolerant net is specific to the problem at hand and is usually
much higher than that dictated by the general lower bound.
The data imply that the conventional TMR scheme of triplication and
majority vote is the best way to achieve complete fault tolerance
in most ANNs.
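
For readers unfamiliar with TMR, a minimal Python sketch of triplication
with a majority vote follows. It assumes that each module returns a discrete
class label and that the voter itself is fault free. The function names and
the toy classifier are hypothetical and are not taken from the report.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by more than half of the modules, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def tmr(modules, x):
    """Run three (possibly faulty) copies of a classifier and vote on the result."""
    return majority_vote([module(x) for module in modules])

# Hypothetical classifier and a faulty copy whose output is stuck at 1.
def good(x):
    return int(sum(x) > 1)

def stuck_at_one(x):
    return 1

print(tmr([good, good, stuck_at_one], [0, 0]))
```

One faulty copy is outvoted by the two good copies. Two faulty copies that
agree with each other would win the vote.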

	Although the redundancy needed for complete fault tolerance is
substantial, the results do show that ANNs exhibit good partial
fault tolerance to begin with (i.e., without any extra redundancy) and
degrade gracefully. The first replication is seen to yield the largest
improvement in partial fault tolerance compared with later replications.
For large nets, exhaustive testing of all possible single faults is
prohibitive. Hence, the strategy of randomly testing a small fraction
of the total number of links is adopted. It yields partial fault
tolerance estimates that are very close to those obtained by
exhaustive testing. Moreover,
when the fraction of links tested is held fixed, the accuracy of
the estimate generated by random testing is seen to
improve as the net size grows.
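
The random testing strategy can be sketched in Python as below. This is an
illustration only. The callback survives(link, value) is a hypothetical
stand-in for "inject a stuck-at fault of this value on this link and return
the fraction of test patterns still classified correctly". The report's
actual metrics and fault values may differ.

```python
import random

def exhaustive_estimate(links, fault_values, survives):
    """Average survival score over every (link, stuck value) pair."""
    scores = [survives(link, value) for link in links for value in fault_values]
    return sum(scores) / len(scores)

def sampled_estimate(links, fault_values, survives, fraction, seed=0):
    """Same average, but over a random subset holding `fraction` of the links."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(links)))
    subset = rng.sample(links, k)
    return exhaustive_estimate(subset, fault_values, survives)

# Toy stand-in (assumption): every fourth link is critical, so a fault on it
# leaves only half of the test patterns correctly classified.
def toy_survives(link, value):
    return 0.5 if link % 4 == 0 else 1.0

links = list(range(1000))
stuck_values = [-1.0, 1.0]
print("exhaustive:", exhaustive_estimate(links, stuck_values, toy_survives))
print("10% sample:", sampled_estimate(links, stuck_values, toy_survives, 0.1))
```

With the fraction held fixed, the number of links sampled grows with the
net, which is one plausible reason for the estimate to tighten as nets get
larger.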

-------------------------------------------------------------------------------