From ling at csd.uwo.ca Tue Feb 1 03:37:10 1994 From: ling at csd.uwo.ca (Charles X. Ling) Date: Tue, 1 Feb 94 03:37:10 EST Subject: some questions on training neural nets... Message-ID: <9402010837.AA01695@godel.csd.uwo.ca> Hi neural net experts, I am using backprop (and variations of it) quite often although I have not followed neural net (NN) research as well as I wanted. Some rather basic issues in training NN still puzzle me a lot, and I hope to get advice and help from the experts in the area. Sorry for being ignorant. Say we are learning a function F (such as a Boolean function of n vars). The training set (TR) and testing set (TS) are drawn randomly according to the same probability distribution, with no noise added in. 1. Is it true that, since there is no noise, the smaller the training error on TR, the better it would predict in general on TS? That is, stopping training earlier is not needed (so cross-validation is not needed). 2. Is it true that, to get reliable prediction (good or bad), we should always choose net architecture with a minimum number of hidden units (or weights via weight decaying)? Will cross-validation help if we have too much freedom in the net (could results on the validation set be coincident)? 3. If, for some reason, cross-validation is needed, and TR is split to TR1 (for training) and TR2 (for validation), what would be the proper ways to do cross-validation? Training on TR1 uses only partial information in TR, but training TR1 to find right parameters and then training on TR1+TR2 may require parameters different from the estimation of training TR1. 4. In case the net has too much freedom (even different random seeds produce very different predictive accuracies), how can we effectively reduce the variations? Weight decaying seems to be a powerful tool, any others? What kind of "simple" functions weight decaying is biased to? Thanks very much for help Charles From marwan at sedal.sedal.su.OZ.AU Tue Feb 1 21:07:09 1994 From: marwan at sedal.sedal.su.OZ.AU (Marwan Jabri) Date: Tue, 1 Feb 94 21:07:09 EST Subject: job openning Message-ID: <9402011007.AA09253@sedal.sedal.su.OZ.AU> The advertisment below could be of interest to a person with Unix and connectionism skills. --------------------------------------------------------------------- Systems Engineering and Design Automation Laboratory Sydney University Electrical Engineering Computer Systems Officer (in other words, a software engineer!) Reference No: B04/17 Applications are invited for the position of Computer Systems Officer with the Systems Engineering and Design Automation Laboratory (SEDAL) at Sydney University Electrical Engineering. The position is aimed at: - Supporting the administration of a computer network (Sun and DEC workstations); - Developing software in the areas of neural computing, video coding and parallel computers. The appointee must have knowledge and experience of C programming under Unix, DOS and Windows, and a degree in electronics or computer science. Experience in the areas of neural computing and/or video coding is highly desirable. Appointment will be for one year in the first instance with the possibility of renewal for up to a further four years subject to need and funding. Further information from Marwan Jabri on (+61-2) 692 2240, fax (+61-2) 660 1228 or email: marwan at sedal.su.oz.au. 
Salary: Level 5 $28,899 - $32,598 per annum Closing: 10 February 1994 To apply, an application quoting reference number, including CV, qualifications and the names, addresses, phone numbers and email addresses of two referees should be sent to Personnel Officier Personnel Services K07 The University of Sydney NSW 2006 Australia ---------------------------------------------------------------------- Equal opportunity and no smoking in the workplace are University Policies. The University Resevers the right not to proceed with an appointment for financial or other reasons. From prechelt at ira.uka.de Tue Feb 1 09:08:12 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Tue, 01 Feb 1994 15:08:12 +0100 Subject: donations to bibliography server Message-ID: <"irafs2.ira.632:01.01.94.14.08.31"@ira.uka.de> A colleague of mine here at University of Karlsruhe is currently building a large bibliographic database that is available free of charge on the internet. It currently contains about 210000 entries from various fields of computer science (mostly parallel processing, graphics, theoretical computer science, computational geometry, human computer interaction) Although there are several thousand entries on Artificial Intelligence topics, connectionism is not covered very well yet (Neural Computation's contents are present and some personal bibliographies). To extend this database by at least some basic information about neural network and other connectionist research, it would be fine if somebody could donate bibliographies on these topics which are (almost) comprehensive in some respect. In particular, I think it would be a very good start to have complete contents of NIPS, IJCNN, and Neural Networks (and perhaps, other journals such as Complex Systems). If anybody is able and willing to donate such bibliographies, please send me email. BibTeX format would be best, but refer or other parsable formats are OK, too. For information on the bibliography service, send mail with a single line containing the word 'help' in the body to bibserv at ira.uka.de [ The query service is still in a test stage and is not yet available to people located outside email domain '.de' (Germany) due to resource restrictions. The bibliographies themselves, however, are available for anonymous ftp from ftp.ira.uka.de:/pub/bibliography ] Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; 76128 Karlsruhe; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple. From schraudo at salk.edu Tue Feb 1 03:04:05 1994 From: schraudo at salk.edu (Nici Schraudolph) Date: Tue, 1 Feb 94 00:04:05 PST Subject: Neural Computation BibTeX database available Message-ID: <9402010804.AA02809@salk.edu> I've made a database of BibTeX entries for all articles published in the first five volumes of the journal Neural Computation; it's available by anonymous ftp from mitpress.mit.edu (18.173.0.28), file NC.bib.Z in the pub/NeuralComp directory. Share and enjoy, - Nici Schraudolph. 
From prechelt at ira.uka.de Wed Feb 2 04:12:48 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Wed, 02 Feb 1994 10:12:48 +0100 Subject: Techreport on CuPit available Message-ID: <"irafs2.ira.960:02.01.94.09.13.09"@ira.uka.de> The technical report Lutz Prechelt: "CuPit --- A Parallel Language for Neural Algorithms: Language Reference and Tutorial" is now available for anonymous ftp from ftp.ira.uka.de /pub/uni-karlsruhe/papers/cupit.ps.gz (154 Kb, 75 pages) It is NOT on neuroprose, because its topic does not quite fit into neuroprose's scope. Abstract: ---------- CuPit is a parallel programming language with two main design goals: 1. to allow the simple, problem-adequate formulation of learning algorithms for neural networks with focus on algorithms that change the topology of the underlying neural network during the learning process and 2. to allow the generation of efficient code for massively parallel machines from a completely machine-independent program description, in particular to maximize both data locality and load balancing even for irregular neural networks. The idea to achieve these goals lies in the programming model: CuPit programs are object-centered, with connections and nodes of a graph (which is the neural network) being the objects. Algorithms are based on parallel local computations in the nodes and connections and communication along the connections (plus broadcast and reduction operations). This report describes the design considerations and the resulting language definition and discusses in detail a tutorial example program. ---------- Remember to use 'binary' mode for ftp. To uncompress the Postscript file, you need to have the GNU gzip utility. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-76128 Karlsruhe; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple. From prechelt at ira.uka.de Wed Feb 2 03:58:56 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Wed, 02 Feb 1994 09:58:56 +0100 Subject: Encoding missing values Message-ID: <"irafs2.ira.708:02.01.94.08.59.37"@ira.uka.de> I am currently thinking about the problem of how to encode data with attributes for which some of the values are missing in the data set for neural network training and use. An example of such data is the 'heart-disease' dataset from the UCI machine learning database (anonymous FTP on "ics.uci.edu" [128.195.1.1], directory "/pub/machine-learning-databases"). There are 920 records altogether with 14 attributes each. Only 299 of the records are complete, the others have one or several missing attribute values. 11% of all values are missing. I consider only networks that handle arbitrary numbers of real-valued inputs here (e.g. all backpropagation-suited network types etc). I do NOT consider missing output values. In this setting, I can think of several ways how to encode such missing values that might be reasonable and depend on the kind of attribute and how it was encoded in the first place: 1. Nominal attributes (that have n different possible values) 1.1 encoded "1-of-n", i.e., one network input per possible value, the relevant one being 1 all others 0. This encoding is very general, but has the disadvantage of producing networks with very many connections. Missing values can either be represented as 'all zero' or by simply treating 'is missing' as just another possible input value, resulting in a "1-of-(n+1)" encoding. 
1.2 encoded binary, i.e., log2(n) inputs being used like the bits in a binary representation of the numbers 0...n-1 (or 1...n). Missing values can either be represented as just another possible input value (probably all-bits-zero is best) or by adding an additional network input which is 1 for 'is missing' and 0 for 'is present'. The original inputs should probably be all zero in the 'is missing' case. 2. continuous attributes (or attributes treated as continuous) 2.1 encoded as a single network input, perhaps using some monotone transformation to force the values into a certain distribution. Missing values are either encoded as a kind of 'best guess' (e.g. the average of the non-missing values for this attribute) or by using an additional network input being 0 for 'missing' and 1 for 'present' (or vice versa) and setting the original attribute input either to 0 or to the 'best guess'. (The 'best guess' variant also applies to nominal attributes above) 3. binary attributes (truth values) 3.1 encoded by one input: 0=false 1=true or vice versa Treat like (2.1) 3.2 encoded by one input: -1=false 1=true or vice versa In this case we may act as for (3.1) or may just use 0 to indicate 'missing'. 3.3 treat like nominal attribute with 2 possible values 4. ordinal attributes (having n different possible values, which are ordered) 4.1 treat either like continuous or like nominal attribute. If (1.2) is chosen, a Gray-Code should be used. Continuous representation is risky unless a 'sensible' quantification of the possible values is available. So far to my considerations. Now to my questions. a) Can you think of other encoding methods that seem reasonable ? Which ? b) Do you have experience with some of these methods that is worth sharing ? c) Have you compared any of the alternatives directly ? Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; 76128 Karlsruhe; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple. From marshall at cs.unc.edu Wed Feb 2 12:41:49 1994 From: marshall at cs.unc.edu (Jonathan A. Marshall) Date: Wed, 2 Feb 94 12:41:49 -0500 Subject: Papers on visual occlusion and neural networks Message-ID: <9402021741.AA17887@marshall.cs.unc.edu> Dear Colleagues, Below I list two new papers that I have added to the Neuroprose archives (thanks to Jordan Pollack!). In addition, I list two of my older papers in Neuroprose. You can retrieve a copy of these papers -- follow the instructions at the end of this message. --Jonathan ---------------------------------------------------------------------------- marshall.occlusion.ps.Z (5 pages) A SELF-ORGANIZING NEURAL NETWORK THAT LEARNS TO DETECT AND REPRESENT VISUAL DEPTH FROM OCCLUSION EVENTS JONATHAN A. MARSHALL and RICHARD K. ALLEY Department of Computer Science, CB 3175, Sitterson Hall University of North Carolina, Chapel Hill, NC 27599-3175, U.S.A. marshall at cs.unc.edu, alley at cs.unc.edu Visual occlusion events constitute a major source of depth information. We have developed a neural network model that learns to detect and represent depth relations, after a period of exposure to motion sequences containing occlusion and disocclusion events. The network's learning is governed by a new set of learning and activation rules. The network develops two parallel opponent channels or "chains" of lateral excitatory connections for every resolvable motion trajectory. 
One channel, the "On" chain or "visible" chain, is activated when a moving stimulus is visible. The other channel, the "Off" chain or "invisible" chain, is activated when a formerly visible stimulus becomes invisible due to occlusion. The On chain carries a predictive modal representation of the visible stimulus. The Off chain carries a persistent, amodal representation that predicts the motion of the invisible stimulus. The new learning rule uses disinhibitory signals emitted from the On chain to trigger learning in the Off chain. The Off chain neurons learn to interact reciprocally with other neurons that indicate the presence of occluders. The interactions let the network predict the disappearance and reappearance of stimuli moving behind occluders, and they let the unexpected disappearance or appearance of stimuli excite the representation of an inferred occluder at that location. Two results that have emerged from this research suggest how visual systems may learn to represent visual depth information. First, a visual system can learn a nonmetric representation of the depth relations arising from occlusion events. Second, parallel opponent On and Off channels that represent both modal and amodal stimuli can also be learned through the same process. [In Bowyer KW & Hall L (Eds.), Proceedings of the AAAI Fall Symposium on Machine Learning and Computer Vision, Research Triangle Park, NC, October 1993, 70-74.] ---------------------------------------------------------------------------- marshall.context.ps.Z (46 pages) ADAPTIVE PERCEPTUAL PATTERN RECOGNITION BY SELF-ORGANIZING NEURAL NETWORKS: CONTEXT, UNCERTAINTY, MULTIPLICITY, AND SCALE JONATHAN A. MARSHALL Department of Computer Science, CB 3175, Sitterson Hall University of North Carolina, Chapel Hill, NC 27599-3175, U.S.A. marshall at cs.unc.edu A new context-sensitive neural network, called an "EXIN" (excitatory+ inhibitory) network, is described. EXIN networks self-organize in complex perceptual environments, in the presence of multiple superimposed patterns, multiple scales, and uncertainty. The networks use a new inhibitory learning rule, in addition to an excitatory learning rule, to allow superposition of multiple simultaneous neural activations (multiple winners), under strictly regulated circumstances, instead of forcing winner-take-all pattern classifications. The multiple activations represent uncertainty or multiplicity in perception and pattern recognition. Perceptual scission (breaking of linkages) between independent category groupings thus arises and allows effective global context-sensitive segmentation and constraint satisfaction. A Weber Law neuron-growth rule lets the network learn and classify input patterns despite variations in their spatial scale. Applications of the new techniques include segmentation of superimposed auditory or biosonar signals, segmentation of visual regions, and representation of visual transparency. [Submitted for publication.] ---------------------------------------------------------------------------- marshall.steering.ps.Z (16 pages) CHALLENGES OF VISION THEORY: SELF-ORGANIZATION OF NEURAL MECHANISMS FOR STABLE STEERING OF OBJECT-GROUPING DATA IN VISUAL MOTION PERCEPTION JONATHAN A. MARSHALL [Invited paper, in Chen S-S (Ed.), Stochastic and Neural Methods in Signal Processing, Image Processing, and Computer Vision, Proceedings of the SPIE 1569, San Diego, July 1991, 200-215.] 
---------------------------------------------------------------------------- martin.unsmearing.ps.Z (8 pages) UNSMEARING VISUAL MOTION: DEVELOPMENT OF LONG-RANGE HORIZONTAL INTRINSIC CONNECTIONS KEVIN E. MARTIN and JONATHAN A. MARSHALL [In Hanson SJ, Cowan JD, & Giles CL, Eds., Advances in Neural Information Processing Systems, 5. San Mateo, CA: Morgan Kaufmann Publishers, 1993, 417-424.] ---------------------------------------------------------------------------- RETRIEVAL INSTRUCTIONS % ftp archive.cis.ohio-state.edu Name (cheops.cis.ohio-state.edu:yourname): anonymous Password: (use your email address) ftp> cd pub/neuroprose ftp> binary ftp> get marshall.occlusion.ps.Z ftp> get marshall.context.ps.Z ftp> get marshall.steering.ps.Z ftp> get martin.unsmearing.ps.Z ftp> quit % uncompress marshall.occlusion.ps.Z ; lpr marshall.occlusion.ps % uncompress marshall.context.ps.Z ; lpr marshall.context.ps % uncompress marshall.steering.ps.Z ; lpr marshall.steering.ps % uncompress martin.unsmearing.ps.Z ; lpr martin.unsmearing.ps From tgd at chert.CS.ORST.EDU Wed Feb 2 13:02:30 1994 From: tgd at chert.CS.ORST.EDU (Tom Dietterich) Date: Wed, 2 Feb 94 10:02:30 PST Subject: some questions on training neural nets... In-Reply-To: "Charles X. Ling"'s message of Tue, 1 Feb 94 03:37:10 EST <9402010837.AA01695@godel.csd.uwo.ca> Message-ID: <9402021802.AA00565@curie.CS.ORST.EDU> From: "Charles X. Ling" Date: Tue, 1 Feb 94 03:37:10 EST Hi neural net experts, I am using backprop (and variations of it) quite often although I have not followed neural net (NN) research as well as I wanted. Some rather basic issues in training NN still puzzle me a lot, and I hope to get advice and help from the experts in the area. Sorry for being ignorant. Say we are learning a function F (such as a Boolean function of n vars). The training set (TR) and testing set (TS) are drawn randomly according to the same probability distribution, with no noise added in. 1. Is it true that, since there is no noise, the smaller the training error on TR, the better it would predict in general on TS? That is, stopping training earlier is not needed (so cross-validation is not needed). No, this is not true. Even in the noise-free case, the bias/variance tradeoff is operating and it is possible to overfit the training data. Consider for example an algorithm that just memorized the training set and guessed "false" on all unseen examples. It has obviously overfit, and it will obviously do poorly even in the absence of noise. 2. Is it true that, to get reliable prediction (good or bad), we should always choose net architecture with a minimum number of hidden units (or weights via weight decaying)? Will cross-validation help if we have too much freedom in the net (could results on the validation set be coincident)? There are many ways to manage the bias/variance tradeoff. I would say that there is nothing approaching complete agreement on the best approaches (and more fundamentally, the best approach varies from one application to another, since this is really a form of prior). The approaches can be summarized as * early stopping * error function penalties * size optimization - growing - pruning - other Early stopping usually employs cross-validation to decide when to stop training. (see below). In my experience, training an overlarge network with early stopping gives better performance than trying to find the minimum network size. It has the disadvantage that training costs are very high. 
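To make the early-stopping recipe concrete, here is a rough sketch of the procedure in Python: train an overlarge net on a training part TR1 while monitoring error on a held-out part TR2, and keep the weights that did best on TR2. The tiny NumPy network, the synthetic noise-free Boolean task, and all hyperparameters below are illustrative assumptions, not anyone's actual setup.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic noise-free task: majority vote over 8 Boolean inputs.
X = rng.integers(0, 2, size=(300, 8)).astype(float)
y = (X.sum(axis=1) > 4).astype(float).reshape(-1, 1)

# Split the available data into TR1 (training) and TR2 (validation).
X1, y1 = X[:200], y[:200]
X2, y2 = X[200:], y[200:]

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Deliberately overlarge hidden layer; early stopping limits its effective capacity.
n_in, n_hid = 8, 20
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, 1));    b2 = np.zeros(1)

def forward(X):
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

def mse(X, y):
    return float(np.mean((forward(X)[1] - y) ** 2))

best_val, best_weights, lr = np.inf, None, 0.5
for epoch in range(2000):
    h, out = forward(X1)
    # Batch backprop for squared error with sigmoid units.
    d_out = (out - y1) * out * (1 - out)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X1); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X1.T @ d_hid / len(X1); b1 -= lr * d_hid.mean(axis=0)
    val = mse(X2, y2)
    if val < best_val:                 # keep the weights that did best on TR2
        best_val = val
        best_weights = (W1.copy(), b1.copy(), W2.copy(), b2.copy())

W1, b1, W2, b2 = best_weights
print("TR1 error %.4f, TR2 error %.4f" % (mse(X1, y1), best_val))

The reason for keeping the best-so-far weights rather than the final ones is that the TR2 error typically reaches a minimum and then creeps back up as the oversized net starts to overfit TR1.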
Error function penalties such as weight decay and soft weight-sharing have been very effective in some applications. In my experience, they introduce additional training problems, because the error surface can develop more local minima. A solution to this is to gradually increase the penalties during training, but this requires more hands-on work than I have patience for. Size optimization attempts to find the optimal number of units and/or number of weights. Cascade-correlation and related algorithms grow the network, optimal brain damage and optimal brain surgeon prune the network, and then of course one can use cross-validation and just generate-and-test different network sizes. An advantage of "right-sizing" is that training time can be considerably reduced (at least the time per epoch). A problem with right-sizing, I believe, is that simply counting units or weights is not necessarily a good measure of network size. The work by Weigend (see 1993 summer school proceedings) suggests that early stopping provides a better method for modulating the effective number of parameters in the network. The OBD/OBS methods do not "just count weights", but instead assess the significance of the weights, so even non-zero weights that are useless can be removed. 3. If, for some reason, cross-validation is needed, and TR is split to TR1 (for training) and TR2 (for validation), what would be the proper ways to do cross-validation? Training on TR1 uses only partial information in TR, but training TR1 to find right parameters and then training on TR1+TR2 may require parameters different from the estimation of training TR1. I use the TR1+TR2 approach. On large data sets, this works well. On small data sets, the cross-validation estimates themselves are very noisy, so I have not found it to be as successful. I compute the stopping point using the sum squared error per training example, so that it scales. I think it is an open research problem to know whether this is the right thing to do. On a large speech recognition data set, after doing cross-validation training, we later checked to see if we had stopped at the right point (by monitoring using the test set). The cross-validation point was nearly exactly right. This was a case with a large data set. 4. In case the net has too much freedom (even different random seeds produce very different predictive accuracies), how can we effectively reduce the variations? Weight decaying seems to be a powerful tool, any others? What kind of "simple" functions weight decaying is biased to? Thanks very much for help Charles --Tom From karun at faline.bellcore.com Thu Feb 3 10:15:55 1994 From: karun at faline.bellcore.com (N. Karunanithi) Date: Thu, 3 Feb 1994 10:15:55 -0500 Subject: Encoding missing values Message-ID: <199402031515.KAA29100@faline.bellcore.com> > I am currently thinking about the problem of how to encode data with > a ttributes for which some of the values are missing in the data set for > neural network training and use. I am also having the same problem. I would like to get a copy responses. >1. Nominal attributes (that have n different possible values) > 1.1 encoded "1-of-n", i.e., one network input per possible value, the relevant one > being 1 all others 0. > This encoding is very general, but has the disadvantage of producing > networks with very many connections. > Missing values can either be represented as 'all zero' or by simply > treating 'is missing' as just another possible input value, resulting > in a "1-of-(n+1)" encoding. 
> 1.2 encoded binary, i.e., log2(n) inputs being used like the bits in a > binary representation of the numbers 0...n-1 (or 1...n). > Missing values can either be represented as just another possible input > value (probably all-bits-zero is best) or by adding an additional network > input which is 1 for 'is missing' and 0 for 'is present'. The original > inputs should probably be all zero in the 'is missing' case. > Both methods have the problem of poor scalability. If the number of missing values increases then the number of additional inputs will increase linearly in 1.1 and logarithmically in 1.2. In fact, 1-of-n encoding may be a poor choice if (1) the number of input features is large and (2) such an expanded dimensional representation does not become a (semi) linearly separable problem. Even if it becomes a linearly separable problem, the overall complexity of the network can sometimes be very high. >2. continuous attributes (or attributes treated as continuous) > 2.1 encoded as a single network input, perhaps using some monotone transformation > to force the values into a certain distribution. > Missing values are either encoded as a kind of 'best guess' (e.g. the > average of the non-missing values for this attribute) or by using > an additional network input being 0 for 'missing' and 1 for 'present' > (or vice versa) and setting the original attribute input either to 0 > or to the 'best guess'. (The 'best guess' variant also applies to > nominal attributes above) This representation requires a GUESS. A nominal transformation may not be a proper representation in some cases. Assume that the output values range over a large numerical interval. For example, from 0.0 to 10,000.0. If you use a simple scaling like dividing by 10,000.0 to make it between 0.0 and 1.0, this will result in poor accuracy of prediction. If the attribute is on the input side, then in theory the scaling is unnecessary because the input layer weights will scale accordingly. However, in practice I had a lot of problems with this approach. Maybe a log transformation before scaling would not be a bad choice. If you use a closed scaling you may have problems whenever a future value exceeds the maximum value of the numerical interval. For example, assume that the attribute is time, say in milliseconds. Any future time from the point of reference can exceed the limit. Hence any closed scaling will not work properly. > 3. binary attributes (truth values) > 3.1 encoded by one input: 0=false 1=true or vice versa > Treat like (2.1) > 3.2 encoded by one input: -1=false 1=true or vice versa > In this case we may act as for (3.1) or may just use 0 to indicate 'missing'. > 3.3 treat like nominal attribute with 2 possible values No comments. > 4. ordinal attributes (having n different possible values, which are ordered) > 4.1 treat either like continuous or like nominal attribute. > If (1.2) is chosen, a Gray-Code should be used. > Continuous representation is risky unless a 'sensible' quantification > of the possible values is available. I have compared Binary Encoding (1.2), Gray-Coded representation and straightforward scaling. Closed scaling seems to do a good job. I have also compared open scaling and closed scaling and did find a significant improvement in prediction accuracy. (Refer to: N. Karunanithi, D. Whitley and Y. K. Malaiya, "Prediction of Software Reliability Using Connectionist Models", IEEE Trans. Software Eng., July 1992, pp 563-574. N. Karunanithi and Y. K.
Malaiya, "The Scaling Problem in Neural Networks for Software Reliability Prediction", Proc. IEEE Int. Symposium on Rel. Eng., Oct. 1992, pp. 776-82. ) > So far to my considerations. Now to my questions. > > a) Can you think of other encoding methods that seem reasonable ? Which ? > > b) Do you have experience with some of these methods that is worth sharing ? > > c) Have you compared any of the alternatives directly ? > > Lutz I have not found a simple solution that is general. I think representation in general and the missing information in specific are open problems within connectionist research. I am not sure we will have a magic bullet for all problems. The best approach is to come up with a specific solution for a given problem. -Karun From Thierry.Denoeux at hds.univ-compiegne.fr Thu Feb 3 03:36:47 1994 From: Thierry.Denoeux at hds.univ-compiegne.fr (Thierry.Denoeux@hds.univ-compiegne.fr) Date: Thu, 3 Feb 1994 09:36:47 +0100 Subject: Encoding missing values Message-ID: <199402030836.AA29123@kaa.hds.univ-compiegne.fr> Dear Lutz, dear connectionists, In a recent mailing, Lutz Prechelt mentioned the interesting problem of how to encode attributes with missing values as inputs to a neural network. I have recently been faced to that problem while applying neural nets to rainfall prediction using weather radar images. The problem was to classify pairs of "echoes" -- defined as groups of connected pixels with reflectivity above some threshold -- taken from successive images as corresponding to the same rain cell or not. Each pair of echoes was discribed by a list of attributes. Some of these attributes, refering to the past of a sequence, were not defined for some instances. To encode these attributes with potentially missing values, we applied two different methods actually suggested by Lutz: - the replacement of the missing value by a "best-guess" value - the addition of a binary input indicating whether the corresponding attribute was present or absent. Significantly better results were obtained by the second method. This work was presented at ICANN'93 last september: X. Ding, T. Denoeux & F. Helloco (1993). Tracking rain cells in radar images using multilayer neural networks. In Proc. of ICANN'93, Springer-Verlag, p. 962-967. Thierry Denoeux +------------------------------------------------------------------------+ | tdenoeux at hds.univ-compiegne.fr Thierry DENOEUX | | Departement de Genie Informatique | | Centre de Recherches de Royallieu | | tel (+33) 44 23 44 96 Universite de Technologie de Compiegne | | fax (+33) 44 23 44 77 B.P. 649 | | 60206 COMPIEGNE CEDEX | | France | +------------------------------------------------------------------------+ From rreilly at nova.ucd.ie Thu Feb 3 10:38:08 1994 From: rreilly at nova.ucd.ie (Ronan Reilly) Date: Thu, 3 Feb 1994 15:38:08 +0000 Subject: Fourth Irish Neural Networks Conference - INNC'94 Message-ID: FOURTH IRISH NEURAL NETWORK CONFERENCE - INNC'94 University College Dublin, Ireland September 12-13, 1994 FIRST CALL FOR PAPERS Papers are solicited for the Fourth Irish Neural Network Conference (INNC'94). They can be in any area of theoretical or applied neural networks. A non-exhaustive list of topic headings include: Learning algorithms Cognitive modelling Neurobiology Natural language processing Vision Signal processing Time series analysis Hardware implementations An extended abstract of not more than 500 words should be sent, preferably by e-mail, to: Ronan Reilly - INNC'94 Dept. 
of Computer Science University College Dublin Belfield Dublin 4 IRELAND e-mail: rreilly at nova.ucd.ie The deadline for receipt of abstracts is March 31, 1994. Authors will be contacted regarding acceptance by April 30, 1994. Full papers will be required by August 31, 1994. From finnoff at predict.com Thu Feb 3 11:40:51 1994 From: finnoff at predict.com (William Finnoff) Date: Thu, 3 Feb 94 09:40:51 MST Subject: some questions on training neural nets... Message-ID: <9402031640.AA01243@predict.com> Charles X. Ling writes: > Hi neural net experts, > > I am using backprop (and variations of it) quite often although I have > not followed neural net (NN) research as well as I wanted. Some rather > basic issues in training NN still puzzle me a lot, and I hope to get advice > and help from the experts in the area. Sorry for being ignorant.... In addition to Tom's pertinent comments, (tgd at chert.cs.orst.edu, Thu Feb 3) I would suggest consulting the following references which contain discussions of various issues pretaining to /model selection/overfitting/stopped training/ complexity control/bias variance dilema. (This list is by no means complete). References 2), 4), 13), 15) and 17) are particularly relevant to the questions raised. 1) Baldi, P. and Chauvin, Y. (1991). Temporal evolution of generalization during learning in linear networks, {\it Neural Computation} 3, 589-603. 2) Finnoff, W., Hergert, F. and Zimmermann, H.G., Improving generalization performance by nonconvergent model selection methods, {\it Neural Networks}, vol.6, nr.6, pp. 771-783, 1993. 3) Finnoff, W. and Zimmermann, H.G. (1991). Detecting structure in small datasets by network fitting under complexity constraints. To appear in {\it Proc. of 2nd Ann. Workshop on Computational Learning Theory and Natural Learning Systems}, Berkley. 4) Geman, S., Bienenstock, E. and Doursat R., (1992). Neural networks and the bias/variance dilemma, {\it Neural Computation} 4, 1-58. 5) Guyon, I., Vapnik, V., Boser, B., Bottou, L. and Solla, S. (1992). Structural risk minimization for character recognition. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 471-479). San Mateo: Morgan Kaufman. 6) Hanson, S. J., and Pratt, L. Y. (1989). Comparing biases for minimal network construction with back-propagation, In D. S. Touretzky, (Ed.), {\it Advances in Neural Information Processing I} (pp.177-185). San Mateo: Morgan Kaufman. 7) Hergert, F., Finnoff, W. and Zimmermann, H.G. (1992). A comparison of weight elimination methods for reducing complexity in neural networks. {\it Proc. Int. Joint Conf. on Neural Networks}, Baltimore. 8) Hergert, F., Zimmermann, H.G., Kramer, U., and Finnoff, W. (1992). Domain independent testing and performance comparisons for neural networks. In I. Aleksander and J. Taylor (Eds.) {\it Artificial Neural Networks II} (pp.1071-1076). London: North Holland. 9) Le Cun, Y., Denker J. and Solla, S. (1990). Optimal Brain Damage. In D. Touretzky (Ed.) {\it Advances in Neural Information Processing Systems II} (pp.598-605). San Mateo: Morgan Kaufman. 10) MacKay, D. (1991). {\it Bayesian Modelling and Neural Networks}, Dissertation, Computational and Neural Systems, California Inst. of Tech. 139-74, Pasadena. 11) Moody, J. (1992). Generalization, weight decay and architecture selection for nonlinear learning systems. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 471-479). San Mateo: Morgan Kaufman. 
12) Morgan, N. and Bourlard, H. (1990). Generalization and parameter estimation in feedforward nets: Some experiments. In D. Touretzky (Ed.) {\it Advances in Neural Information Processing Systems II} (pp.598-605). San Mateo: Morgan Kaufman. 13) Sj\"oberg, J. and Ljung, L. (1992). Overtraining, regularization and searching for minimum in neural networks, {Report LiTH-ISY-I-1297, Dep. of Electrical Engineering}, Link\"oping University, S-581 83 Link\"oping, Sweden. 14) Stone, C.J. (1977). Cross-validation: A review. {\it Math. Operations res. Statist. Ser.}, 9, 1-51. 15) Vapnik, V. (1992). Principles of risk minimization for learning theory. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 831-838 ). San Mateo: Morgan Kaufman. 16) Weigend, A. and Rumelhart, D. (1991). The effective dimension of the space of hidden units, in {\it Proc. Int. Joint Conf. on Neural Networks}, Singapore. 17) Weigend, A., Rumelhart, D., and Huberman, B. (1991). Generalization by weight elimination with application to forecasting. In R. Lippman, J. Moody and D. Touretzy (Eds.), {\it Advances in Neural Information Processing III} (pp.875-882). San Mateo: Morgan Kaufman. 18) White, H. (1989). Learning in artificial neural networks: A statistical perspective, {\it Neural Computation} 1, 425-464. -William %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% William Finnoff Prediction Co. 320 Aztec St., Suite B Santa Fe, NM, 87501, USA Tel.: (505)-984-3123 Fax: (505)-983-0571 e-mail: finnoff at predict.com From jlm at crab.psy.cmu.edu Thu Feb 3 11:27:41 1994 From: jlm at crab.psy.cmu.edu (James L. McClelland) Date: Thu, 3 Feb 94 11:27:41 EST Subject: CMU-Pitt Center for the Neural Basis of Cognition Message-ID: <9402031627.AA08304@crab.psy.cmu.edu.psy.cmu.edu> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Carnegie Mellon University and the University of Pittsburgh Announce the Creation of the Center for the Neural Basis of Cognition The Center is dedicated to the study of the neural basis of cognitive processes, including learning and memory, language and thought, perception, attention, and planning; to the study of the development of the neural substrate of these processes; to the study of disorders of these processes and their underlying neuropathology; and to the promotion of applications of the results of these studies to artificial intelligence, technology, and medicine. The Center will synthesize the disciplines of basic and clinical neuroscience, cognitive psychology, and computer science, combining neurobiological, behavioral, computa- tional and brain imaging methods. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Faculty Openings in the Center The Center seeks faculty and research scientists whose work relates to the mission stated above. Recruiting is beginning immediately, and will continue for several years. Appointments can be at any level and will be coordinated with one or more departments at either university. Coordinating departments include Biological Sciences, Computer Science, and Psychology at Carnegie Mellon and the departments of Behavioral Neuroscience, Neurobiology, Neurology, Psychiatry and Psychology at the University of Pittsburgh. Other affiliations may be possible. Candidates should send an application to either of the Co-Directors of the Center, listed below. 
The application should include a statement of interest indicating how the candidate's work fits the mission of the center and suggesting possible departmental affiliations, as well as a CV, copies of publications, and three letters of reference. Both uni- versities are EEO/AA Employers. James L. McClelland Robert Y. Moore Department of Psychology Center for Neuroscience Baker Hall 345-F Biomedical Science Tower 1656 Carnegie Mellon University University of Pittsburgh Pittsburgh, PA 15213 Pittsburgh, PA 15261 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From wahba at stat.wisc.edu Thu Feb 3 20:42:28 1994 From: wahba at stat.wisc.edu (Grace Wahba) Date: Thu, 3 Feb 94 19:42:28 -0600 Subject: nips6 paper on ss-anova in archive Message-ID: <9402040142.AA06981@hera.stat.wisc.edu> Dear Colleagues Our paper for the 1993 Neural Information Processing Society (NIPS) Proceedings is in the neuroprose archive under wahba.nips6.ps.Z Title: Structured Machine Learning For `Soft' Classification with Smoothing Spline ANOVA and Stacked Tuning, Testing and Evaluation. Authors: G. Wahba, Y. Wang, C. Gu, R. Klein and B. Klein Summary We describe the use of smoothing spline analysis of variance (SS-ANOVA) in the penalized log likelihood context, for learning (estimating) the probability $p$ of a `$1$' outcome, given a training set with attribute vectors and 0-1 outcomes. $p$ is of the form $p(t) = e^{f(t)}/(1+e^{f(t)})$, where, if $t$ is a vector of attributes, $f$ is learned as a sum of smooth functions of one attribute plus a sum of smooth functions of two attributes, etc. The smoothing parameters governing $f$ are obtained by an iterative unbiased risk or iterative GCV method. Confidence intervals for these estimates are available. The method is applied to estimate the risk of progression of diabetic retinopathy given predictor variables of age, body mass index and glycosylated hemoglobin. RETRIEVAL INSTRUCTIONS for NEUROPROSE ARCHIVE % ftp archive.cis.ohio-state.edu Name (cheops.cis.ohio-state.edu:yourname): anonymous Password: (use your email address) ftp> cd pub/neuroprose ftp> binary ftp> get wahba.nips6.ps.Z ftp> quit % uncompress wahba.nips6.ps.Z % lpr wahba.nips6.ps Some other papers of yours truly, friends and students, and an idiosyncratic bibliography of possible interest to connectionists are available by ftp. Get the (ascii) file Contents to see what's there. RETRIEVAL INSTRUCTIONS for WAHBA's public directory % ftp ftp.stat.wisc.edu Name (ftp.stat.wisc.edu:yournamehere): anonymous Password: (use your email address) ftp> binary ftp> cd pub/wahba ftp> get Contents ... read Contents and retrieve files of interest From pollack at cis.ohio-state.edu Thu Feb 3 17:17:14 1994 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Thu, 3 Feb 1994 17:17:14 -0500 Subject: new neuroprose/Thesis subdirectory Message-ID: <199402032217.RAA01292@dendrite.cis.ohio-state.edu> *** do not forward ** The filesystem on which neuroprose resides has overflowed. A set of very large files (all the files with *thesis* in their filename), have been moved to a new subdirectory. jordan From bill at nsma.arizona.edu Thu Feb 3 23:53:26 1994 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Thu, 03 Feb 1994 21:53:26 -0700 (MST) Subject: Encoding missing values Message-ID: <9402040453.AA24599@nsma.arizona.edu> There is at least one kind of network that has no problem (in principle) with missing inputs, namely a Boltzmann machine. 
You just refrain from clamping the input node whose value is missing, and treat it like an output node or hidden unit. This may seem to be irrelevant to anything other than Boltzmann machines, but I think it could be argued that nothing very much simpler is capable of dealing with the problem. When you ask a network to handle missing inputs, you are in effect asking it to do pattern completion on the input layer, and for this a Boltzmann machine or some other sort of attractor network would seem to be required. -- Bill From tal at goshawk.lanl.gov Fri Feb 4 10:22:12 1994 From: tal at goshawk.lanl.gov (Tal Grossman) Date: Fri, 4 Feb 1994 08:22:12 -0700 Subject: some questions on training neural nets... Message-ID: <199402041522.IAA22945@goshawk.lanl.gov> Dear Charles X. Ling, You say: "Some rather basic issues in training NN still puzzle me a lot, and I hope to get advice and help from the experts in the area." Well... the questions you have asked still puzzle the experts as well, and good answers, where they exist, are very much case dependent. As Tom Dietterich wrote, in general "Even in the noise-free case, the bias/variance tradeoff is operating and it is possible to overfit the training data", therefore you can not expect just any large net to generalize well. It was also observed recently that... When having a large enough set of examples (so one can have a good enough sample for the training and the validation set), you can obtain better generalization with larger nets by using cross validation to decide when to stop training, as is demonstrated in the paper of A. Weigend : Weigend A.S. (1994), in the {\em Proc. of the 1993 Connectionist Models Summer School}, edited by M.C. Mozer, P. Smolensky, D.S. Touretzky, J.L. Elman and A.S. Weigend, pp. 335-342 (Erlbaum Associates, Hillsdale NJ, 1994). Rich Caruana has presented similar results in the "Complexity Issues" workshop in the last NIPS post-conference. But... Larger networks can generalize as good as, or even better than small networks even without cross-validation. A simple experiment that demonstrates that was presented in : T. Grossman, R. Meir and E. Domany, Learning by choice of Internal Representations, Complex Systems 2, 555-575 (1988). In that experiment, networks with different number of hidden units were trained to perform the symmetry task by using a fraction of the possible examples as the training set, training the net to 100% performance on the TR set and testing the performance on the rest (off training set generalization). No early stopping, no cross validation. The symmetry problem can be solved by 2 hidden units - so this is the minimal architecture required for this specific function. However, it was found that it is NOT the best generalizing architecture. The generalization rates of all the architectures (H=2..N, the size of the input) were similar, with the larger networks somewhat better. Now, this is a special case. One can explain it by observing that the symmetry problem can also be solved by a network of N hidden units, with smaller weights, and not only by effectively "zeroing" the contributions of all but two units (see an example in Minsky and Papert's Perceptrons). Probably by all the other architectures as well. So, considering the mapping from weight space to function space, it is very likely that training a large network on partial data will take you closer (in function space) to your target function F (symmetry in that case) than training a small one. The picture can be different in other cases... 
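For readers who want to try this kind of comparison themselves, here is a minimal sketch in the same spirit: nets with different numbers of hidden units are trained on a random half of the symmetry patterns and scored on the unseen remainder. The use of scikit-learn's MLPClassifier (plain backprop), the input size and the 50/50 split are assumptions made for illustration only; the original experiment used the choice-of-internal-representations algorithm and trained to 100% on the training set.

import itertools
import numpy as np
from sklearn.neural_network import MLPClassifier

N = 8  # input size; the target is the symmetry predicate on N-bit strings
X = np.array(list(itertools.product([0, 1], repeat=N)), dtype=float)
y = np.array([int(all(row[i] == row[N - 1 - i] for i in range(N // 2))) for row in X])

rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
train, test = idx[:len(X) // 2], idx[len(X) // 2:]   # off-training-set test

for n_hidden in (2, 4, 8):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=5000, random_state=0)
    net.fit(X[train], y[train])
    print(n_hidden, "hidden units: off-training-set accuracy %.2f" % net.score(X[test], y[test]))

Note that symmetric strings are rare, so raw accuracy is dominated by the majority class; the point of the sketch is only the experimental protocol.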
One has to remember that the training/generalization problem (including the bias/variance tradeoff problem) is, in general, a complex interaction between three entities: 1. The target function (or the task). 2. The learning model, and the class of functions that is realizable by this model (and its associated learning algorithm). 3. The training set, and how well it represents the task. Even the simple question: is my training set large enough (or good enough)? is not simple at all. One might think that it should be larger than, say, twice the number of free parameters (weights) in your model/network architecture. It turns out that not even this is enough in general. Allow me to advertise here the paper presented by A. Lapedes and myself at the last NIPS, where we present a method for testing a "general" classification algorithm (i.e. any classifier such as a neural net, a decision tree, etc. and its learning algorithm, which may include pruning or net construction), which we call the "noise sensitivity signature" (NSS; see abstract below). In addition to introducing this new model selection method, which we believe can be a good alternative to cross-validation in data-limited cases, we present the following experiment: the target function is a network with 20:5:1 architecture (weights chosen at random). The training set is provided by choosing M random input patterns and classifying them with the teacher net. We then train other nets with various architectures, ranging from 1 to 8 hidden units, on the training set (without controlled stopping, but with tolerance in the error function). A different (and large) set of classified examples is used to determine the generalization performance of the trained nets (averaged over several realizations with different initial weights). Some of the results are: 1. With different training set sizes M=400,700,1000, the optimal architecture is different. A smaller training set yields a smaller optimal network, according to the independent test set measure. 2. Even with M=1000 (much more than twice the number of weights), the optimal learning net is still smaller than the original teacher net. 3. There are differences of up to a few percent in generalization performance of the different learning nets for all training set sizes. In particular, nets that are larger than the optimal one do worse as their size increases. Depending on your problem, a few percent can be insignificant or can make a real difference. In some real applications, 1-2 % can be the difference between a contract and a paper... In such cases you would like to tune your model (i.e., to identify the optimal architecture) as best as you can. 4. Using the NSS it was possible to recognize the optimal architectures for each training set, without using extra data. Some conclusions are: 1. If one uses a validation set to choose the architecture (not for stopping) - for example by using the extra 1000 examples - then the architecture that will be picked when using the 700-example training set is going to be smaller (and worse) than the one picked when using the 1000-example training set. In other words, if your data is just 1000 examples and you devote 300 of them to be your validation set, then even if those 300 give a good estimate of the generalization of the trained net, when you choose the model according to this test set you end up with the optimal model for 700 training examples, which is less good than the optimal model you can obtain when training with all 1000 examples.
It means that in many cases you need more examples than one might expect in order to obtain a well tuned model. Especially if you are using a considerable fraction of it as a validation set. 2. Using NSS one would find the right architecture for the total number of examples you have - paying a factor of about 30 on training effort. 3. You can use "set 1 aside" cross validation in order to select your model. This will probably overcome the bias caused by giving up a large fraction of the examples. However, in order to obtain a reliable estimate of the performance the training process will have to be repeated many times, probably more than what is needed in order to calculate the NSS. It is important to emphasize again: The above results were obtained for that specific experiment. We have obtained similar results with different tasks (e.g. DNA structure classification) and with different learning machines (e.g. decision trees), but still, these results prove nothing "in general", except may be, that life is complicated and full of uncertainty... A more careful comparison with cross validation as a stopping method, and using NSS in other scenarios (like function fitting) is under investigation. If anyone is interested in using the NSS method in combination with pruning methods (e.g. to test the stopping criteria), I will be glad to help. I will be grateful for any other information/ref about similar experiments. I hope all the above did not add too much to your puzzlement. Good luck with your training, Tal ------------------------------------------------ The paper I mentioned above is: Learning Theory seminar: Thursday Feb.10. 15:15. CNLS Conference room. title: Use of Bad Training Data For Better Predictions. by : Tal Grossman and Alan Lapedes (Complex Systems group, LANL) Abstract: We present a method for calculating the ``noise sensitivity signature'' of a learning algorithm which is based on scrambling the output classes of various fractions of the training data. This signature can be used to indicate a good (or bad) match between the complexity of the classifier and the complexity of the data and hence to improve the predictive accuracy of a classification algorithm. Use of noise sensitivity signatures is distinctly different from other schemes to avoid overtraining, such as cross-validation, which uses only part of the training data, or various penalty functions, which are not data-adaptive. Noise sensitivity signature methods use all of the training data and are manifestly data-adaptive and non-parametric. They are well suited for situations with limited training data It is going to appear in the Proc. of NIPS 6. An expanded version of it will (hopefully) be placed in the neuroprose archive within a week or two. Until then I can send a ps file of it to the interested. From sef+ at cs.cmu.edu Fri Feb 4 10:25:51 1994 From: sef+ at cs.cmu.edu (Scott E. Fahlman) Date: Fri, 04 Feb 94 10:25:51 EST Subject: Encoding missing values In-Reply-To: Your message of Thu, 03 Feb 94 21:53:26 -0700. <9402040453.AA24599@nsma.arizona.edu> Message-ID: There is at least one kind of network that has no problem (in principle) with missing inputs, namely a Boltzmann machine. You just refrain from clamping the input node whose value is missing, and treat it like an output node or hidden unit. This may seem to be irrelevant to anything other than Boltzmann machines, but I think it could be argued that nothing very much simpler is capable of dealing with the problem. 
When you ask a network to handle missing inputs, you are in effect asking it to do pattern completion on the input layer, and for this a Boltzmann machine or some other sort of attractor network would seem to be required. Good point, but perhaps in need of clarification for some readers: There are two ways of training a Boltzmann machine. In one (the original form), there is no distinction between input and output units. During training we alternate between an instruction phase, in which all of the externally visible units are clamped to some pattern, and a normalization phase, in which the whole network is allowed to run free. The idea is to modify the weights so that, when running free, the external units assume the various pattern values in the training set in their proper frequencies. If only some subset of the externally visible units are clamped to certain values, the net will produce compatible completions in the other units, again with frequencies that match this part of the training set. A net trained in this way will (in principle -- it might take a *very* long time for anything complicated) do what you suggest: Complete an "input" pattern and produce a compatible output at the same time. This works even if the input is *totally* missing. I believe it was Geoff Hinton who realized that a Boltzmann machine could be trained more efficiently if you do make a distinction between input and output units, and don't waste any of the training effort learning to reconstruct the input. In this model, the instruction phase clamps both input and output units to some pattern, while the normalization phase clamps only the input units. Since the input units are correct in both cases, all of the network's learning power (such as it is) goes into producing correct patterns on the output units. A net trained in this way will not do input-completion. I bring this up because I think many people will only have seen the latter kind of Boltzmann training, and will therefore misunderstand your observation. By the way, one alternative method I have seen proposed for reconstructing missing input values is to first train an auto-encoder (with some degree of bottleneck to get generalization) on the training set, and then feed the output of this auto-encoder into the classification net. The auto-encoder should be able to replace any missing values with some degree of accuracy. I haven't played with this myself, but it does sound plausible. If anyone can point to a good study of this method, please post it here or send me E-mail. -- Scott =========================================================================== Scott E. Fahlman Internet: sef+ at cs.cmu.edu Senior Research Scientist Phone: 412 268-2575 School of Computer Science Fax: 412 681-5739 Carnegie Mellon University Latitude: 40:26:33 N 5000 Forbes Avenue Longitude: 79:56:48 W Pittsburgh, PA 15213 =========================================================================== From zoubin at psyche.mit.edu Fri Feb 4 11:04:32 1994 From: zoubin at psyche.mit.edu (Zoubin Ghahramani) Date: Fri, 4 Feb 94 11:04:32 EST Subject: Encoding missing values Message-ID: <9402041604.AA28037@psyche.mit.edu> Dear Lutz, Thierry, Karun, and connectionists, I have also been looking into the issue of encoding and learning from missing values in a neural network. The issue of handling missing values has been addressed extensively in the statistics literature for obvious reasons. To learn despite the missing values, the data has to be filled in, or the missing values integrated over.
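As a point of reference, here is a minimal sketch of the simplest fill-in scheme mentioned in this thread for continuous attributes: replace each missing value by the mean of the observed values for that attribute (a 'best guess') and append a 0/1 'is missing' indicator input per attribute. The NumPy helper, its name and the use of NaN to mark missing entries are illustrative assumptions, not code from any of the systems discussed.

import numpy as np

def encode_with_indicator(X):
    """X: 2-D float array with NaN marking missing entries.
    Returns mean-imputed inputs plus one 0/1 indicator column per attribute."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                        # True where the value was absent
    col_means = np.nanmean(X, axis=0)            # 'best guess' per attribute
    X_filled = np.where(missing, col_means, X)   # impute the column mean
    return np.hstack([X_filled, missing.astype(float)])

# Example: 3 patterns, 2 attributes, one value missing in each of the first two rows.
X_raw = np.array([[1.0, np.nan],
                  [np.nan, 4.0],
                  [3.0, 6.0]])
print(encode_with_indicator(X_raw))   # imputed inputs followed by the missing flags

Thierry's result above suggests that the indicator inputs themselves carry useful information.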
The basic question is how to fill in the missing data. There are many different methods for doing this in stats (mean imputation, regression imputation, Bayesian methods, EM, etc.). For good reviews see (Little and Rubin 1987; Little, 1992). I do not in general recommend encoding "missing" as yet another value to be learned over. Missing means something in a statistical sense -- that the input could be any of the values with some probability distribution. You could, for example, augment the original data filling in different values for the missing data points according to a prior distribution. Then the training would assign different weights to the artificially filled-in data points depending on how well they predict the output (their posterior probability). This is essentially the method proposed by Buntine and Weigand (1991). Other approaches have been proposed by Tresp et al. (1993) and Ahmad and Tresp (1993). I have just written a paper on the topic of learning from incomplete data. In this paper I bring a statistical algorithm for learning from incomplete data, called EM, into the framework of nonlinear function approximation and classification with missing values. This approach fits the data iteratively with a mixture model and uses that same mixture model to effectively fill in any missing input or output values at each step. You can obtain the preprint by ftp psyche.mit.edu login: anonymous cd pub get zoubin.nips93.ps To obtain code for the algorithm please contact me directly. Zoubin Ghahramani zoubin at psyche.mit.edu ----------------------------------------------------------------------- Ahmad, S and Tresp, V (1993) "Some Solutions to the Missing Feature Problem in Vision." In Hanson, S.J., Cowan, J.D., and Giles, C.L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers, San Mateo, CA. Buntine, WL, and Weigand, AS (1991) "Bayesian back-propagation." Complex Systems. Vol 5 no 6 pp 603-43 Ghahramani, Z and Jordan MI (1994) "Supervised learning from incomplete data via an EM approach" To appear in Cowan, J.D., Tesauro, G., and Alspector,J. (eds.). Advances in Neural Information Processing Systems 6. Morgan Kaufmann Publishers, San Francisco, CA, 1994. Little, RJA (1992) "Regression With Missing X's: A Review." Journal of the American Statistical Association. Volume 87, Number 420. pp. 1227-1237 Little, RJA. and Rubin, DB (1987). Statistical Analysis with Missing Data. Wiley, New York. Tresp, V, Hollatz J, Ahmad S (1993) "Network structuring and training using rule-based knowledge." In Hanson, S.J., Cowan, J.D., and Giles, C.~L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers, San Mateo, CA. From Volker.Tresp at zfe.siemens.de Fri Feb 4 13:09:46 1994 From: Volker.Tresp at zfe.siemens.de (Volker Tresp) Date: Fri, 4 Feb 1994 19:09:46 +0100 Subject: missing data Message-ID: <199402041809.AA14305@inf21.zfe.siemens.de> In response to the questions raised by Lutz Prechelt concerning the missing data problem: In general, the solution to the missing-data problem depends on the missing-data mechanism. For example, if you sample the income of a population and rich people tend to refuse the answer the mean of your sample is biased. To obtain an unbiased solution you would have to take into account the missing-data mechanism. 
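A small simulation makes the point; the log-normal incomes and the particular refusal probability below are made-up assumptions, chosen only so that richer respondents refuse more often.

import numpy as np

rng = np.random.default_rng(1)
income = rng.lognormal(mean=10.0, sigma=0.7, size=100_000)      # true population
# Probability of refusing to answer grows with (log) income.
p_refuse = 1.0 / (1.0 + np.exp(-(np.log(income) - 10.0)))
observed = income[rng.random(income.size) > p_refuse]           # the survey sample

print("true mean     %.0f" % income.mean())
print("observed mean %.0f" % observed.mean())                   # systematically too low

Simply averaging (or mean-imputing) the observed values ignores the mechanism that generated the gaps and therefore underestimates the true mean.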
The missing-data mechanism can be ignored if it is independent of the input and the output (in the example: the likelihood that a person refuses to answer is independent of the person's income). Most approaches assume that the missing-data mechanism can be ignored.

There exist a number of ad hoc solutions to the missing-data problem, but it is also possible to approach the problem from a statistical point of view. In our paper (which will be published in the upcoming NIPS volume and which will be available on neuroprose shortly) we discuss a systematic likelihood-based approach. NN-regression can be framed as a maximum likelihood learning problem if we assume the standard signal plus Gaussian noise model P(x, y) = P(x) P(y|x) \propto P(x) exp(-1/(2 \sigma^2) (y - NN(x))^2). By deriving the probability density function for a pattern with missing features we can formulate a likelihood function including patterns with complete and incomplete features. The solution requires an integration over the missing input. In practice, the integral is approximated numerically. For networks of Gaussian basis functions, it is possible to obtain closed-form solutions (by extending the EM algorithm).

Our paper also discusses why and when ad hoc solutions -- such as substituting the mean for an unknown input -- are harmful. For example, if the mapping is approximately linear, substituting the mean might work quite well. In general, though, it introduces bias.

Training with missing and noisy input data is described in: ``Training Neural Networks with Deficient Data,'' V. Tresp, S. Ahmad and R. Neuneier, in Cowan, J. D., Tesauro, G., and Alspector, J. (eds.), {\em Advances in Neural Information Processing Systems 6}, Morgan Kaufmann, 1994. A related paper by Zoubin Ghahramani and Michael Jordan will also appear in the upcoming NIPS volume.

Recall with missing and noisy data is discussed in (available in neuroprose as ahmad.missing.ps.Z): ``Some Solutions to the Missing Feature Problem in Vision,'' S. Ahmad and V. Tresp, in {\em Advances in Neural Information Processing Systems 5,} S. J. Hanson, J. D. Cowan, and C. L. Giles eds., San Mateo, CA, Morgan Kaufmann, 1993.

Volker Tresp Subutai Ahmad Ralph Neuneier tresp at zfe.siemens.de ahmad at interval.com ralph at zfe.siemens.de

From wray at ptolemy-ethernet.arc.nasa.gov Fri Feb 4 15:19:44 1994 From: wray at ptolemy-ethernet.arc.nasa.gov (Wray Buntine) Date: Fri, 4 Feb 94 12:19:44 PST Subject: Encoding missing values In-Reply-To: <199402031515.KAA29100@faline.bellcore.com> (karun@faline.bellcore.com) Message-ID: <9402042019.AA05621@ptolemy.arc.nasa.gov>

regarding this missing value question raised thusly .... by Thierry Denoeux, Lutz Prechelt, and others >>>>>>>>>>>>>>> > So far to my considerations. Now to my questions. > > a) Can you think of other encoding methods that seem reasonable ? Which ? > > b) Do you have experience with some of these methods that is worth sharing ? > > c) Have you compared any of the alternatives directly ? > > Lutz + > I have not found a simple solution that is general. I think > representation in general and the missing information in specific > are open problems within connectionist research. I am not sure we will > have a magic bullet for all problems. The best approach is to come up > with a specific solution for a given problem.
-> Karun >>>>>>>>>>

This missing value problem is of course shared amongst all the learning communities, artificial intelligence, statistics, pattern recognition, etc., not just neural networks. A classic study in this area, which includes most suggestions I've read here so far, is @inproceedings{quinlan:ml6, AUTHOR = "J.R. Quinlan", TITLE = "Unknown Attribute Values in Induction", YEAR = 1989, BOOKTITLE = "Proceedings of the Sixth International Machine Learning Workshop", PUBLISHER = "Morgan Kaufmann", ADDRESS = "Cornell, New York"}

The most frequently cited methods I've seen, and they're so common amongst the different communities it's hard to lay credit:
1) replace missing values by some best guess
2) fracture the example into a set of fractional examples, each with the missing value filled in somehow
3) call the missing value another input value

3 is a good thing to do if the values are "informatively" missing, i.e. if someone leaves the entry "telephone number" blank in a questionnaire, then maybe they don't have a telephone; but it is probably not good otherwise, unless you have loads of data and don't mind all the extra example types generated (as already mentioned). 1 is a quick and dirty hack at 2. How good it is depends on your application. 2 is an approximation to the "correct" approach for handling "non-informative" missing values according to the standard "mixture model". The mathematics for this is general and applies to virtually any learning algorithm: trees, feed-forward nets, linear regression, whatever. We do it for feed-forward nets in @article{buntine.weigend:bbp, AUTHOR = "W.L. Buntine and A.S. Weigend", TITLE = "Bayesian Back-Propagation", JOURNAL = "Complex Systems", Volume = 5, PAGES = "603--643", Number = 1, YEAR = "1991" } and see Tresp, Ahmad & Neuneier in NIPS'94 for an implementation. But no doubt someone probably published the general idea back in the 50's.

I certainly wouldn't call missing values an open problem. Rather, "efficient implementations of the standard approaches" is, in some cases, an open problem.

Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 269-2 fax: (415) 604 3594 Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov

From stork at cache.crc.ricoh.com Fri Feb 4 11:57:37 1994 From: stork at cache.crc.ricoh.com (David G. Stork) Date: Fri, 4 Feb 94 08:57:37 -0800 Subject: Missing features... Message-ID: <9402041657.AA12260@neva.crc.ricoh.com>

There is a provably optimal method for performing classification with missing inputs, described in Chapter 2 of "Pattern Classification and Scene Analysis" (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, which avoids the ad-hoc heuristics that have been described by others. Those interested in obtaining Chapter two via ftp should contact me.

Dr. David G. Stork Chief Scientist and Head, Machine Learning and Perception Ricoh California Research Center 2882 Sand Hill Road Suite 115 Menlo Park, CA 94025-7022 USA 415-496-5720 (w) 415-854-8740 (fax) stork at crc.ricoh.com

From wray at ptolemy-ethernet.arc.nasa.gov Fri Feb 4 15:47:25 1994 From: wray at ptolemy-ethernet.arc.nasa.gov (Wray Buntine) Date: Fri, 4 Feb 94 12:47:25 PST Subject: some questions on training neural nets... In-Reply-To: <9402031640.AA01243@predict.com> (message from William Finnoff on Thu, 3 Feb 94 09:40:51 MST) Message-ID: <9402042047.AA06120@ptolemy.arc.nasa.gov>

Tom Dietterich and William Finnoff covered a lot of issues.
I'd just like to highlight two points:
* this is a contentious area
* there are several opposing factors at play that confuse our understanding of this

================ detail

Basically, this comment below is SO true. > There are many ways to manage the bias/variance tradeoff. I would say > that there is nothing approaching complete agreement on the best > approaches (and more fundamentally, the best approach varies from one > application to another, since this is really a form of prior). The > approaches can be summarized as

The bias/variance tradeoff lies at the heart of almost all disagreements between different learning philosophies such as classical, Bayesian, minimum description length, resampling schemes (now often viewed as empirical Bayesian), statistical physics approaches, and the various "implementation" schemes. One thing to note is that there are several quite separate forces in operation here:

computational and search issues: (e.g. maybe early stopping works better because it's a more efficient way of searching the space of smaller networks?)

prior issues: (e.g. have you thrown in 20 attributes you happen to think might apply, but probably 15 are irrelevant; OR did a medical specialist carefully pick all 10 attributes and assure you every one is important; OR is a medical specialist able to solve the task blind, just by reading the 20 attribute values (without seeing the patient), etc.) (e.g. are 30 hidden units adequate for the structure of the task?)

asking the right question: (e.g. sometimes the question of what the "best" network is can be a bit silly when you have a small amount of data; perhaps you should be trying to find 10 reasonable alternative networks and pool their results (a la Michael Perrone's NIPS'93 workshop))

understanding your representation: (e.g. with rule based systems, each rule has a good interpretation so the question of how to prune, etc., is something you can understand well, BUT with a large feed-forward network, understanding the structure of the space is more involved, e.g. if I set these 2 weights to zero what the hell happens to my proposed solution) (e.g. this confuses the problem of designing good regularizers/priors/network-encodings).

The problem is that theory people tend to focus on one, maybe two, of these, whereas application people tend to confuse them together.

Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 269-2 fax: (415) 604 3594 Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov

From kak at gate.ee.lsu.edu Fri Feb 4 17:24:34 1994 From: kak at gate.ee.lsu.edu (Subhash Kak) Date: Fri, 4 Feb 94 16:24:34 CST Subject: Encoding missing values Message-ID: <9402042224.AA23849@gate.ee.lsu.edu>

Missing values in feedback networks raise interesting questions: Should these values be considered "don't know" values or should these be generated in some "most likelihood" fashion? These issues are discussed in the following paper: S.C. Kak, "Feedback neural networks: new characteristics and a generalization", Circuits, Systems, Signal Processing, vol. 12, no. 2, 1993, pp. 263-278.
-Subhash Kak From moody at chianti.cse.ogi.edu Fri Feb 4 18:50:07 1994 From: moody at chianti.cse.ogi.edu (John Moody) Date: Fri, 4 Feb 94 15:50:07 -0800 Subject: PhD and Masters Programs at the Oregon Graduate Institute Message-ID: <9402042350.AA19148@chianti.cse.ogi.edu> Fellow Connectionists: The Oregon Graduate Institute of Science and Technology (OGI) has openings for a few outstanding students in its Computer Science and Electrical Engineering Masters and Ph.D programs in the areas of Neural Networks, Learning, Signal Processing, Time Series, Control, Speech, Language, and Vision. Faculty and postdocs in these areas include Etienne Barnard, Ron Cole, Mark Fanty, Dan Hammerstrom, Hynek Hermansky, Todd Leen, Uzi Levin, John Moody, David Novick, Misha Pavel, Joachim Utans, Eric Wan, and Lizhong Wu. Short descriptions of our research interests are appended below. OGI is a young, but rapidly growing, private research institute located in the Portland area. OGI offers Masters and PhD programs in Computer Science and Engineering, Applied Physics, Electrical Engineering, Biology, Chemistry, Materials Science and Engineering, and Environmental Science and Engineering. Inquiries about the Masters and PhD programs and admissions for either Computer Science or Electrical Engineering should be addressed to: Margaret Day, Director Office of Admissions and Records Oregon Graduate Institute PO Box 91000 Portland, OR 97291 Phone: (503)690-1028 Email: margday at admin.ogi.edu The final deadline for receipt of all applications materials for the Ph.D. programs is March 1, 1994, so it's not too late to apply! Masters program applications are accepted continuously. +++++++++++++++++++++++++++++++++++++++++++++++++++++++ Oregon Graduate Institute of Science & Technology Department of Computer Science and Engineering & Department of Electrical Engineering and Applied Physics Research Interests of Faculty in Adaptive & Interactive Systems (Neural Networks, Signal Processing, Control, Speech, Language, and Vision) Etienne Barnard (Assistant Professor): Etienne Barnard is interested in the theory, design and implementation of pattern-recognition systems, classifiers, and neural networks. He is also interested in adaptive control systems -- specifically, the design of near-optimal controllers for real- world problems such as robotics. Ron Cole (Professor): Ron Cole is director of the Center for Spoken Language Understanding at OGI. Research in the Center currently focuses on speaker- independent recognition of continuous speech over the telephone and automatic language identification for English and ten other languages. The approach combines knowledge of hearing, speech perception, acoustic phonetics, prosody and linguistics with neural networks to produce systems that work in the real world. Mark Fanty (Research Assistant Professor): Mark Fanty's research interests include continuous speech recognition for the telephone; natural language and dialog for spoken language systems; neural networks for speech recognition; and voice control of computers. Dan Hammerstrom (Associate Professor): Based on research performed at the Institute, Dan Hammerstrom and several of his students have spun out a company, Adaptive Solutions Inc., which is creating massively parallel computer hardware for the acceleration of neural network and pattern recognition applications. There are close ties between OGI and Adaptive Solutions. 
Dan is still on the faculty of the Oregon Graduate Institute and continues to study next generation VLSI neurocomputer architectures. Hynek Hermansky (Associate Professor); Hynek Hermansky is interested in speech processing by humans and machines with engineering applications in speech and speaker recognition, speech coding, enhancement, and synthesis. His main research interest is in practical engineering models of human information processing. Todd K. Leen (Associate Professor): Todd Leen's research spans theory of neural network models, architecture and algorithm design and applications to speech recognition. His theoretical work is currently focused on the foundations of stochastic learning, while his work on Algorithm design is focused on fast algorithms for non-linear data modeling. Uzi Levin (Senior Research Scientist): Uzi Levin's research interests include neural networks, learning systems, decision dynamics in distributed and hierarchical environments, dynamical systems, Markov decision processes, and the application of neural networks to the analysis of financial markets. John Moody (Associate Professor): John Moody does research on the design and analysis of learning algorithms, statistical learning theory (including generalization and model selection), optimization methods (both deterministic and stochastic), and applications to signal processing, time series, and finance. David Novick (Assistant Professor): David Novick conducts research in interactive systems, including computational models of conversation, technologically mediated communication, and human-computer interaction. A central theme of this research is the role of meta-acts in the control of interaction. Current projects include dialogue models for telephone-based information systems. Misha Pavel (Associate Professor): Misha Pavel does mathematical and neural modeling of adaptive behaviors including visual processing, pattern recognition, visually guided motor control, categorization, and decision making. He is also interested in the application of these models to sensor fusion, visually guided vehicular control, and human-computer interfaces. Joachim Utans (Post-Doctoral Research Associate): Joachim Utans's research interests include computer vision and image processing, model based object recognition, neural network learning algorithms and optimization methods, model selection and generalization, with applications in handwritten character recognition and financial analysis. Lizhong Wu (Post-Doctoral Research Associate): Lizhong Wu's research interests include neural network theory and modeling, time series analysis and prediction, pattern classification and recognition, signal processing, vector quantization, source coding and data compression. He is now working on the application of neural networks and nonparametric statistical paradigms to finance. Eric A. Wan (Assistant Professor): Eric Wan's research interests include learning algorithms and architectures for neural networks and adaptive signal processing. He is particularly interested in neural applications to time series prediction, adaptive control, active noise cancellation, and telecommunications. From hicks at cs.titech.ac.jp Sun Feb 6 17:22:17 1994 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Sun, 6 Feb 94 17:22:17 JST Subject: Methods for improving generalization (was Re: some questions on ...) Message-ID: <9402060822.AA11860@maruko.cs.titech.ac.jp> Dear Mr. 
Grossman, I read with great interest your analysis of overlearning and your research into achieving better generalization with less data. However, I only want to point out an omission in your background description. In the abstract of your paper "Use of Bad Training Data For Better Predictions" you write:

>Use of noise sensitivity signatures is distinctly different from other schemes >to avoid overtraining, such as cross-validation, which uses only part of the >training data, or various penalty functions, which are not data-adaptive. >Noise sensitivity signature methods use all of the training data and >are manifestly data-adaptive and non-parametric.

When you say penalty functions, the first thing that comes to mind is a penalty on the sum of squared weights. This method is indeed not data-adaptive. However, an interesting article in Neural Computation 4, pp. 473-493, "Simplifying Neural Networks by Soft Weight-Sharing", proposes a weight penalty method which is adaptive. Basically, the weights are grouped together in Gaussian clusters whose mean and variance are allowed to adapt to the data. The experimental results they published show improvement over both cross-validation and weight decay. I am looking forward to reading your paper when it is available.

Yours Respectfully, Craig Hicks

Craig Hicks hicks at cs.titech.ac.jp | Kore ya kono Yuku mo kaeru mo Ogawa Laboratory, Dept. of Computer Science | Wakarete wa Shiru mo shiranu mo Tokyo Institute of Technology, Tokyo, Japan | Ausaka no seki lab:03-3726-1111 ext.2190 home:03-3785-1974 | (from hyaku-nin-issyu) fax: +81(3)3729-0685 (from abroad) 03-3729-0685 (from Japan)

From pluto at cs.ucsd.edu Fri Feb 4 17:01:47 1994 From: pluto at cs.ucsd.edu (Mark Plutowski) Date: Fri, 04 Feb 1994 14:01:47 -0800 Subject: some questions on training neural nets... Message-ID: <9402042201.AA16326@odin.ucsd.edu>

I have another reference to add that may be helpful to those interested in the cross-validation issue raised in the following discussion, which I have edited in what follows to focus on the particular issue this reference addresses:

------- Forwarded Message From tgd at chert.CS.ORST.EDU Wed Feb 2 13:02:30 1994 From: tgd at chert.CS.ORST.EDU (Tom Dietterich) Date: Wed, 2 Feb 94 10:02:30 PST Subject: some questions on training neural nets... In-Reply-To: "Charles X. Ling"'s message of Tue, 1 Feb 94 03:37:10 EST <9402010837.AA01695@godel.csd.uwo.ca> Message-ID: <9402021802.AA00565@curie.CS.ORST.EDU>

In answer to the following: From: "Charles X. Ling" Date: Tue, 1 Feb 94 03:37:10 EST Hi neural net experts, Will cross-validation help ? [...] (could results on the validation set be coincident)?

Tom Dietterich replies: [stuff deleted] There are many ways to manage the bias/variance tradeoff. I would say that there is nothing approaching complete agreement on the best approaches (and more fundamentally, the best approach varies from one application to another, since this is really a form of prior). The approaches can be summarized as * early stopping * error function penalties * size optimization - growing - pruning - other

Early stopping usually employs cross-validation to decide when to stop training (see below). In my experience, training an overlarge network with early stopping gives better performance than trying to find the minimum network size. It has the disadvantage that training costs are very high. [stuff deleted] 3.
If, for some reason, cross-validation is needed, and TR is split to TR1 (for training) and TR2 (for validation), what would be the proper ways to do cross-validation? Training on TR1 uses only partial information in TR, but training TR1 to find right parameters and then training on TR1+TR2 may require parameters different from the estimation of training TR1.

I use the TR1+TR2 approach. On large data sets, this works well. On small data sets, the cross-validation estimates themselves are very noisy, so I have not found it to be as successful. I compute the stopping point using the sum squared error per training example, so that it scales. I think it is an open research problem to know whether this is the right thing to do. [the reply continues..]

------- End of Forwarded Message

In response to the last point, I supply a reference that provides theoretical guidance from a statistical perspective. It proves that cross-validation estimates Integrated Mean Squared Error (IMSE) within a constant due to noise.

What this means: IMSE is a version of the mean squared error that accounts for the finite size of the training set. Think of it as the expected squared error obtained by training a network on random training sets of a particular size. It is an ideal (i.e., in general, unobservable) measure of generalization. IMSE embodies the bias and variance tradeoff. It can be decomposed into the sum of two terms, which directly quantify the bias + variance. Therefore, if IMSE embodies the measure of generalization that is relevant to you (which will depend on your learning task), then least-squares cross-validation provides a realizable estimate of generalization.

Summary of the main results of the paper: It proves that two versions of cross-validation (one being the "hold-out set" version discussed above, and the other being the "delete-1" version) provide unbiased and strongly consistent estimates of IMSE. This is statistical jargon meaning that, on average, the estimate is accurate (i.e., the expectation of the estimate for given training set size equals the IMSE + a noise term) and asymptotically precise (in that as the training set and test set size grow large, the estimate converges to the IMSE within the constant factor due to noise, with probability 1). Note that it does not say anything about the rate at which the variance of the estimate converges to the truth; therefore, it is possible that other IMSE-approximate measures may excel for small training set sizes (e.g., resampling methods such as bootstrap and jackknife). However, it is the first result generally applicable to nonlinear regression that the authors are aware of, extending the well-known (in the statistical and econometric literature) work by C.J. Stone and others that proves similar results for particular learning tasks or for particular models.

The statement of the results will appear in NIPS 6. I will post the soon-to-be-completed extended version to Neuroprose if anyone wants to see it sooner or needs access to the proofs. I hope this is helpful,

= Mark Plutowski Institute for Neural Computation, and Department of Computer Science and Engineering University of California, San Diego La Jolla, California. USA.

Here is the reference: Plutowski, Mark~E., Shinichi Sakata, and Halbert White. (1994). ``Cross-validation estimates IMSE.'' Cowan, J.D., Tesauro, G., and Alspector, J. (eds.), {\em Advances in Neural Information Processing Systems 6}, San Mateo, CA: Morgan Kaufmann Publishers.
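For readers who want to see the "hold-out set" version of cross-validation in its simplest early-stopping role, here is a minimal sketch (Python/NumPy; the toy sine data, the small tanh network and every constant are invented for illustration, not taken from the paper). It trains on TR1, evaluates the mean squared error on TR2 after each epoch, and records the epoch at which that hold-out estimate is lowest:

import numpy as np

rng = np.random.default_rng(1)

# Toy regression task: noisy sine. TR1 = training part, TR2 = hold-out part.
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X) + 0.2 * rng.normal(size=X.shape)
X1, y1, X2, y2 = X[:150], y[:150], X[150:], y[150:]

# One-hidden-layer tanh network trained by batch gradient descent.
H = 20
W1 = 0.5 * rng.normal(size=(1, H)); b1 = np.zeros(H)
W2 = 0.5 * rng.normal(size=(H, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

def mse(X, y):
    return float(np.mean((forward(X)[1] - y) ** 2))

lr, best_val, best_epoch = 0.05, np.inf, -1
for epoch in range(2000):
    h, out = forward(X1)
    err = (out - y1) / len(X1)        # gradient of (1/2)*MSE w.r.t. the output
    dW2 = h.T @ err;  db2 = err.sum(0)
    dh = (err @ W2.T) * (1.0 - h ** 2)
    dW1 = X1.T @ dh;  db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

    val = mse(X2, y2)                 # hold-out (cross-validation) estimate
    if val < best_val:
        best_val, best_epoch = val, epoch

print("final training MSE:", round(mse(X1, y1), 4))
print("best hold-out MSE: ", round(best_val, 4), "at epoch", best_epoch)

In practice one would average over several splits, retrain on TR1+TR2 afterwards, or use the per-example error scaling Dietterich describes; the point of the sketch is only that the quantity monitored on TR2 is exactly the hold-out cross-validation estimate whose relation to IMSE the paper establishes.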
From esann at dice.ucl.ac.be Sun Feb 6 15:19:56 1994 From: esann at dice.ucl.ac.be (esann@dice.ucl.ac.be) Date: Sun, 6 Feb 94 21:19:56 +0100 Subject: ESANN'94: European Symposium on ANNs Message-ID: <9402062019.AA07827@ns1.dice.ucl.ac.be> ****************************************************************** * European Symposium * * on Artificial Neural Networks * * * * Brussels (Belgium) - April 20-21-22, 1994 * * * * Preliminary Program and registration form * ****************************************************************** Foreword ******** The actual developments in the field of artificial neural networks mark a watershed in its relatively young history. Far from the blind passion for disparate applications some years ago, the tendency is now to an objective assessment of this emerging technology, with a better knowledge of the basic concepts, and more appropriate comparisons and links with classical methods of computing. Neural networks are not restricted to the use of back-propagation and multi-layer perceptrons. Self-organization, adaptive signal processing, vector quantization, classification, statistics, image and speech processing are some of the domains where neural networks techniques may be successfully used; but a beneficial use goes through an in-depth examination of both the theoretical basis of the neural techniques and standard methods commonly used in the specified domain. ESANN'94 is the second symposium covering these specified aspects of neural networks computing. After a successful edition in 1993, ESANN'94 will open new perspectives, by focusing on theoretical and mathematical aspects of neural networks, biologically-inspired models, statistical aspects, and relations between neural networks and both information and signal processing (classification, vector quantization, self-organization, approximation of functions, image and speech processing,...). The steering and program committees of ESANN'94 are pleased to invite you to participate to this symposium. More than a formal conference presenting the last developments in the field, ESANN'94 will be also a forum for open discussions, round tables and opportunities for future collaborations. We hope to have the pleasure to meet you in April, in the splendid town of Brussels, and that your stay in Belgium will be as scientifically beneficial as agreeable. Symposium information ********************* Registration fees for symposium ------------------------------- registration before registration after 18th March 1994 18th March 1994 Universities BEF 14500 BEF 15500 Industries BEF 18500 BEF 19500 Registration fees include attendance to all sessions, the ESANN'94 banquet, a copy of the conference proceedings, daily lunches (20-22 April '94), and coffee breaks twice a day during the symposium. Advance registration is mandatory. Young researchers may apply for grants offered by the European Community (restricted to citizens or residents of a Western European country or, tentatively, Central or Eastern European country - deadline for applications: March 11th, 1994 - please write to the conference secretariat for details). Advance payments (see registration form) must be made to the conference secretariat by bank transfers in Belgian Francs (free of charges) or by sending a cheque (add BEF 500 for processing fees). Language -------- The official language of the conference is English. It will be used for all printed material, presentations and discussions. 
Proceedings ----------- A copy of the proceedings will be provided to all Conference Registrants. All technical papers will be included in the proceedings. Additional copies of the proceedings (ESANN'93 and ESANN'94) may be purchased at the following rate: ESANN'94 proceedings: BEF 2000 ESANN'93 proceedings: BEF 1500. Add BEF 500 to any order for p.&p. and/or bank charges. Please write to the conference secretariat for ordering proceedings. Conference dinner ----------------- A banquet will be offered on Thursday 21th to all conference registrants in a famous and typical place of Brussels. Additional vouchers for the banquet may be purchased on Wednesday 20th at the conference. Cancellation ------------ If cancellation is received by 25th March 1994, 50% of the registration fees will be returned. Cancellation received after this date will not be entitled to any refund. General information ******************* Brussels, Belgium ----------------- Brussels is not only the host city of the European Commission and of hundreds of multinational companies; it is also a marvelous historical town, with typical quarters, famous monuments known throughout the world, and the splendid "Grand-Place". It is a cultural and artistic center, with numerous museums. Night life in Brussels is considerable. There are of lot of restaurants and pubs open late in the night, where typical Belgian dishes can be tasted with one of the more than 1000 different beers. Hotel accommodation ------------------- Special rates for participants to ESANN'94 have been arranged at the MAYFAIR HOTEL, a De Luxe 4 stars hotel with 99 fully air conditioned guest rooms, tastefully decorated to the highest standards of luxury and comfort. The hotel includes two restaurants, a bar and private parking. Public transportation (trams n93 & 94) goes directly from the hotel to the conference center (Parc stop) Single room BEF 2800 Double room or twin room BEF 3500 Prices include breakfast, taxes and service. Rooms can only be confirmed upon receipt of booking form (see at the end of this booklet) and deposit. Located on the elegant Avenue Louise, the exclusive Hotel Mayfair is a short walk from the "uppertown" luxurious shopping district. Also nearby is the 14th century Cistercian abbey and the magnificent "Bois de la Cambre" park with its open-air cafes - ideal for a leisurely stroll at the end of a busy day. HOTEL MAYFAIR tel: +32 2 649 98 00 381 av. Louise fax: +32 2 649 22 49 1050 Brussels - Belgium Conference location ------------------- The conference will be held at the "Chancellerie" of the Generale de Banque. A map is included in the printed programme. Generale de Banque - Chancellerie 1 rue de la Chancellerie 1000 Brussels - Belgium Conference secretariat D facto conference services tel: + 32 2 245 43 63 45 rue Masui fax: + 32 2 245 46 94 B-1210 Brussels - Belgium E-mail: esann at dice.ucl.ac.be PROGRAM OF THE CONFERENCE ************************* Wednesday 20th April 1994 ------------------------- 9H30 Registration 10H00 Opening session Session 1: Neural networks and chaos Chairman: M. Hasler (Ecole Polytechnique Fdrale de Lausanne, Switzerland) 10H10 "Concerning the formation of chaotic behaviour in recurrent neural networks" T. Kolb, K. Berns Forschungszentrum Informatik Karlsruhe (Germany) 10H30 "Stability and bifurcation in an autoassociative memory model" W.G. Gibson, J. Robinson, C.M. Thomas University of Sidney (Australia) 10H50 Coffee break Session 2: Theoretical aspects 1 Chairman: C. 
Jutten (Institut National Polytechnique de Grenoble, France) 11H30 "Capabilities of a structured neural network. Learning and comparison with classical techniques" J. Codina, J. C. Aguado, J.M. Fuertes Universitat Politecnica de Catalunya (Spain) 11H50 "Projection learning: alternative approaches to the computation of the projection" K. Weigl, M. Berthod INRIA Sophia Antipolis (France) 12H10 "Stability bounds of momentum coefficient and learning rate in backpropagation algorithm"" Z. Mao, T.C. Hsia University of California at Davis (USA) 12H30 Lunch Session 3: Links between neural networks and statistics Chairman: J.C. Fort (Universit Nancy I, France) 14H00 "Model selection for neural networks: comparing MDL and NIC"" G. te Brake*, J.N. Kok*, P.M.B. Vitanyi** *Utrecht University, **Centre for Mathematics and Computer Science, Amsterdam (Netherlands) 14H20 "Estimation of performance bounds in supervised classification" P. Comon*, J.L. Voz**, M. Verleysen** *Thomson-Sintra Sophia Antipolis (France), **Universit Catholique de Louvain, Louvain-la-Neuve (Belgium) 14H40 "Input Parameters' estimation via neural networks" I.V. Tetko, A.I. Luik Institute of Bioorganic & Petroleum Chemistry, Kiev (Ukraine) 15H00 "Combining multi-layer perceptrons in classification problems" E. Filippi, M. Costa, E. Pasero Politecnico di Torino (Italy) 15H20 Coffee break Session 4: Algorithms 1 Chairman: J. Hrault (Institut National Polytechnique de Grenoble, France) 16H00 "Diluted neural networks with binary couplings: a replica symmetry breaking calculation of the storage capacity" J. Iwanski, J. Schietse Limburgs Universitair Centrum (Belgium) 16H20 "Storage capacity of the reversed wedge perceptron with binary connections" G.J. Bex, R. Serneels Limburgs Universitair Centrum (Belgium) 16H40 "A general model for higher order neurons" F.J. Lopez-Aligue, M.A. Jaramillo-Moran, I. Acedevo-Sotoca, M.G. Valle Universidad de Extremadura, Badajoz (Spain) 17H00 "A discriminative HCNN modeling" B. Petek University of Ljubljana (Slovenia) Thursday 21th April 1994 ------------------------ Session 5: Biological models Chairman: P. Lansky (Academy of Science of the Czech Republic) 9H00 "Biologically plausible hybrid network design and motor control" G.R. Mulhauser University of Edinburgh (Scotland) 9H20 "Analysis of critical effects in a stochastic neural model" W. Mommaerts, E.C. van der Meulen, T.S. Turova K.U. Leuven (Belgium) 9H40 "Stochastic model of odor intensity coding in first-order olfactory neurons" J.P. Rospars*, P. Lansky** *INRA Versailles (France), **Academy of Sciences, Prague (Czech Republic) 10H00 "Memory, learning and neuromediators" A.S. Mikhailov Fritz-Haber-Institut der MPG, Berlin (Germany), and Russian Academy of Sciences, Moscow (Russia) 10H20 "An explicit comparison of spike dynamics and firing rate dynamics in neural network modeling" F. Chapeau-Blondeau, N. Chambet Universit d'Angers (France) 10H40 Coffee break Session 6: Algorithms 2 Chairman: T. Denoeux (Universit Technologique de Compigne, France) 11H10 "A stop criterion for the Boltzmann machine learning algorithm" B. Ruf Carleton University (Canada) 11H30 "High-order Boltzmann machines applied to the Monk's problems" M. Grana, V. Lavin, A. D'Anjou, F.X. Albizuri, J.A. Lozano UPV/EHU, San Sebastian (Spain) 11H50 "A constructive training algorithm for feedforward neural networks with ternary weights" F. Aviolat, E. 
Mayoraz Ecole Polytechnique Fdrale de Lausanne (Switzerland) 12H10 "Synchronization in a neural network of phase oscillators with time delayed coupling" T.B. Luzyanina Russian Academy of Sciences, Moscow (Russia) 12H30 Lunch Session 7: Evolutive and incremental learning Chairman: T.J. Stonham (Brunel University, UK) - to be confirmed 14H00 "Reinforcement learning and neural reinforcement learning" S. Sehad, C. Touzet Ecole pour les Etudes et la Recherche en Informatique et Electronique, Nmes (France) 14H20 "Improving piecewise linear separation incremental algorithms using complexity reduction methods" J.M. Moreno, F. Castillo, J. Cabestany Universitat Politecnica de Catalunya (Spain) 14H40 "A comparison of two weight pruning methods" O. Fambon, C. Jutten Institut National Polytechnique de Grenoble (France) 15H00 "Extending immediate reinforcement learning on neural networks to multiple actions" C. Touzet Ecole pour les Etudes et la Recherche en Informatique et Electronique, Nmes (France) 15H20 "Incremental increased complexity training" J. Ludik, I. Cloete University of Stellenbosch (South Africa) 15H40 Coffee break Session 8: Function approximation Chairman: E. Filippi (Politecnico di Torino, Italy) - to be confirmed 16H20 "Approximation of continuous functions by RBF and KBF networks" V. Kurkova, K. Hlavackova Academy of Sciences of the Czech Republic 16H40 "An optimized RBF network for approximation of functions" M. Verleysen*, K. Hlavackova** *Universit Catholique de Louvain, Louvain-la-Neuve (Belgium), **Academy of Science of the Czech Republic 17H00 "VLSI complexity reduction by piece-wise approximation of the sigmoid function" V. Beiu, J.A. Peperstraete, J. Vandewalle, R. Lauwereins K.U. Leuven (Belgium) 20H00 Conference dinner Friday 22th April 1994 ---------------------- Session 9: Algorithms 3 Chairman: J. Vandewalle (K.U. Leuven, Belgium) - to be confirmed 9H00 "Dynamic pattern selection for faster learning and controlled generalization of neural networks" A. Rbel Technische Universitt Berlin (Germany) 9H20 "Noise reduction by multi-target learning" J.A. Bullinaria Edinburgh University (Scotland) 9H40 "Variable binding in a neural network using a distributed representation" A. Browne, J. Pilkington South Bank University, London (UK) 10H00 "A comparison of neural networks, linear controllers, genetic algorithms and simulated annealing for real time control" M. Chiaberge*, J.J. Merelo**, L.M. Reyneri*, A. Prieto**, L. Zocca* *Politecnico di Torino (Italy), **Universidad de Granada (Spain) 10H20 "Visualizing the learning process for neural networks" R. Rojas Freie Universitt Berlin (Germany) 10H40 Coffee break Session 10: Theoretical aspects 2 Chairman: M. Cottrell (Universit Paris I, France) 11H20 "Stability analysis of diagonal recurrent neural networks" Y. Tan, M. Loccufier, R. De Keyser, E. Noldus University of Gent (Belgium) 11H40 "Stochastics of on-line back-propagation" T. Heskes University of Illinois at Urbana-Champaign (USA) 12H00 "A lateral contribution learning algorithm for multi MLP architecture" N. Pican*, J.C. Fort**, F. Alexandre* *INRIA Lorraine, **Universit Nancy I (France) 12H20 Lunch Session 11: Self-organization Chairman: F. Blayo (EERIE Nmes, France) 14H00 "Two or three things that we know about the Kohonen algorithm" M. Cottrell*, J.C. Fort**, G. Pags*** Universits *Paris 1, **Nancy 1, ***Paris 6 (France) 14H20 "Decoding functions for Kohonen maps" M. Alvarez, A. 
Varfis CEC Joint Research Center, Ispra (Italy) 14H40 "Improvement of learning results of the selforganizing map by calculating fractal dimensions" H. Speckmann, G. Raddatz, W. Rosenstiel University of Tbingen (Germany) 15H00 Coffee break Session 11 (continued): Self-organization Chairman: F. Blayo (EERIE Nmes, France) 15H40 "A non linear Kohonen algorithm" J.-C. Fort*, G. Pags** *Universit Nancy 1, **Universits Pierre et Marie Curie, et Paris 12 (France) 16H00 "Self-organizing maps based on differential equations" A. Kanstein, K. Goser Universitt Dortmund (Germany) 16H20 "Instabilities in self-organized feature maps with short neighbourhood range" R. Der, M. Herrmann Universitt Leipzig (Germany) ESANN'94 Registration and Hotel Booking Form ******************************************** Registration fees ----------------- registration before registration after 18th March 1994 18th March 1994 Universities BEF 14500 BEF 15500 Industries BEF 18500 BEF 19500 University fees are applicable to members and students of academic and teaching institutions. Each registration will be confirmed by an acknowledgment of receipt, which must be given to the registration desk of the conference to get entry badge, proceedings and all materials. Registration fees include attendance to all sessions, the ESANN'94 banquet, a copy of the conference proceedings, daily lunches (20-22 April '94), and coffee breaks twice a day during the symposium. Advance registration is mandatory. Students and young researchers from European countries may apply for European Community grants. Hotel booking ------------- Hotel MAYFAIR (4 stars) - 381 av. Louise - 1050 Brussels Single room : BEF 2800 Double room (large bed) : BEF 3500 Twin room (2 beds) : BEF 3500 Prices include breakfast, service and taxes. A deposit corresponding to the first night is mandatory. Registration to ESANN'94 (please give full address and tick appropriate) ------------------------------------------------------------------------ Ms., Mr., Dr., Prof.:............................................... Name:............................................................... First Name:......................................................... Institution:........................................................ ................................................................... Address:............................................................ ................................................................... ZIP:................................................................ Town:............................................................... Country:............................................................ Tel:................................................................ Fax:................................................................ E-mail:............................................................. VAT n:............................................................. 
Universities: O registration before 18th March 1994: BEF 14500 O registration after 18th March 1994: BEF 15500 Industries: O registration before 18th March 1994: BEF 18500 O registration after 18th March 1994: BEF 19500 Hotel Mayfair booking (please tick appropriate) O single room deposit: BEF 2800 O double room (large bed) deposit: BEF 3500 O twin room (twin beds) deposit: BEF 3500 Arrival date: ..../..../1994 Departure date: ..../..../1994 O Additional payment if fees are paid through bank abroad check: BEF 500 Total BEF ____ Payment (please tick): O Bank transfer, stating name of participant, made payable to: Gnrale de Banque ch. de Waterloo 1341 A B-1180 Brussels - Belgium Acc.no: 210-0468648-93 of D facto (45 rue Masui, B-1210 Brussels) Bank transfers must be free of charges. EVENTUAL CHARGES MUST BE PAID BY THE PARTICIPANT. O Cheques/Postal Money Orders made payable to: D facto 45 rue Masui B-1210 Brussels - Belgium A SUPPLEMENTARY FEE OF BEF 500 MUST BE ADDED if the payment is made through bank abroad cheque or postal money order. Only registrations accompanied by a cheque, a postal money order or the proof of bank transfer will be considered. Registration and hotel booking form, together with payment, must be send as soon as possible, and in no case later than 8th April 1994, to the conference secretariat: &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& & D facto conference services - ESANN'94 & & 45, rue Masui - B-1210 Brussels - Belgium & &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& Support ******* ESANN'94 is organized with the support of: - Commission of the European Communities (DG XII, Human Capital and Mobility programme) - IEEE Region 8 - IFIP WG 10.6 on neural networks - Region of Brussels-Capital - EERIE (Ecole pour les Etudes et la Recherche en Informatique et Electronique - Nmes) - UCL (Universit Catholique de Louvain - Louvain-la-Neuve) - REGARDS (Research Group on Algorithmic, Related Devices and Systems - UCL) Steering committee ****************** Franois Blayo EERIE, Nmes (F) Marie Cottrell Univ. Paris I (F) Nicolas Franceschini CNRS Marseille (F) Jeanny Hrault INPG Grenoble (F) Michel Verleysen UCL Louvain-la-Neuve (B) Scientific committee ******************** Luis Almeida INESC - Lisboa (P) Jorge Barreto UCL Louvain-en-Woluwe (B) Herv Bourlard L. & H. Speech Products (B) Joan Cabestany Univ. Polit. de Catalunya (E) Dave Cliff University of Sussex (UK) Pierre Comon Thomson-Sintra Sophia (F) Holk Cruse Universitt Bielefeld (D) Dante Del Corso Politecnico di Torino (I) Marc Duranton Philips / LEP (F) Jean-Claude Fort Universit Nancy I (F) Karl Goser Universitt Dortmund (D) Martin Hasler EPFL Lausanne (CH) Philip Husbands University of Sussex (UK) Christian Jutten INPG Grenoble (F) Petr Lansky Acad. of Science of the Czech Rep. 
(CZ) Jean-Didier Legat UCL Louvain-la-Neuve (B) Jean Arcady Meyer Ecole Normale Suprieure - Paris (F) Erkki Oja Helsinky University of Technology (SF) Guy Orban KU Leuven (B) Gilles Pags Universit Paris I (F) Alberto Prieto Universitad de Granada (E) Pierre Puget LETI Grenoble (F) Ronan Reilly University College Dublin (IRE) Tamas Roska Hungarian Academy of Science (H) Jean-Pierre Rospars INRA Versailles (F) Jean-Pierre Royet Universit Lyon 1 (F) John Stonham Brunel University (UK) Lionel Tarassenko University of Oxford (UK) John Taylor King's College London (UK) Vincent Torre Universita di Genova (I) Claude Touzet EERIE Nmes (F) Joos Vandewalle KUL Leuven (B) Eric Vittoz CSEM Neuchtel (CH) Christian Wellekens Eurecom Sophia-Antipolis (F) _____________________________ Michel Verleysen D facto conference services 45 rue Masui 1210 Brussels Belgium tel: +32 2 245 43 63 fax: +32 2 245 46 94 E-mail: esann at dice.ucl.ac.be _____________________________ From lba at ilusion.inesc.pt Mon Feb 7 04:57:07 1994 From: lba at ilusion.inesc.pt (Luis B. Almeida) Date: Mon, 7 Feb 94 10:57:07 +0100 Subject: Encoding missing values Message-ID: <9402070957.AA18932@ilusion.inesc.pt> Bill Skaggs writes: There is at least one kind of network that has no problem (in principle) with missing inputs, namely a Boltzmann machine. You just refrain from clamping the input node whose value is missing, and treat it like an output node or hidden unit. This may seem to be irrelevant to anything other than Boltzmann machines, but I think it could be argued that nothing very much simpler is capable of dealing with the problem. When you ask a network to handle missing inputs, you are in effect asking it to do pattern completion on the input layer, and for this a Boltzmann machine or some other sort of attractor network would seem to be required. The same effect, of trying to guess the missing inputs, can also be obtained with a recurrent multilayer perceptron, trained with recurrent backprop. This is the reason why the pattern completion results that I described in my 1987 ICNN paper (ref. below) were rather good. L. B. Almeida, "A learning rule for asynchronous perceptrons with feedback in a combinatorial environment", Proc IEEE First International Conference on Neural Networks, San Diego, Ca., 1987. Luis B. Almeida INESC Phone: +351-1-544607, +351-1-3100246 Apartado 10105 Fax: +351-1-525843 P-1017 Lisboa Codex Portugal lba at inesc.pt ----------------------------------------------------------------------------- *** Indonesians are killing innocent people in East Timor *** From jordan at psyche.mit.edu Mon Feb 7 20:47:09 1994 From: jordan at psyche.mit.edu (Michael Jordan) Date: Mon, 7 Feb 94 20:47:09 EST Subject: Encoding missing values Message-ID: > There is at least one kind of network that has no problem (in > principle) with missing inputs, namely a Boltzmann machine. > You just refrain from clamping the input node whose value is > missing, and treat it like an output node or hidden unit. > > This may seem to be irrelevant to anything other than Boltzmann > machines, but I think it could be argued that nothing very much > simpler is capable of dealing with the problem. The above is a nice observation that is worth emphasizing; I agree with all of it except the comment about being irrelevant to anything else. The Boltzmann machine is actually relevant to everything else. 
What the Boltzmann algorithm is doing with the missing value is essentially the same as what the EM algorithm for mixtures (that Ghahramani and Tresp referred to) is doing, and epitomizes the general case of an iterative "filling in" algorithm. The Boltzmann machine learning algorithm is a generalized EM (GEM) algorithm. During the E step the system computes the conditional correlation function for the nodes under the Boltzmann distribution, where the conditioning variables are the known data (the values of the clamped units) and the current values of the parameters (weights). This "fills in" the relevant statistic (the correlation function) and allows it to be used in the generalized M step (the contrastive Hebb rule). Moreover, despite the fancy terminology, these algorithms are nothing more (nor less) than maximum likelihood estimation, where the likelihood function is the likelihood of the parameters *given the data that was actually observed*. By "filling in" missing data, you're not adding new information to the problem; rather, you're allowing yourself to use all the information that is in those components of the data vector that aren't missing. (EM theory provides the justification for that statement). E.g., if only one component of an input vector is missing, it's obviously wasteful to neglect what the other components of the input vector are telling you. And, indeed, if you neglect the whole vector, you will not end up with maximum likelihood estimates for the weights (nor in general will you get maximum likelihood estimates if you fill in a value with the unconditional mean of that variable). "Filling in" is not the only way to compute ML estimates for missing data problems, but its virtue is that it allows the use of the same learning algorithms as would be used for complete data (without incurring any bias, if the filling in is done correctly). The only downside is that even if the complete-data algorithm is one-pass (which the Boltzmann algorithm and mixture fitting are not) the "filling-in" approach is generally iterative, because the parameter estimates depend on the filled-in values which in turn depend on the parameter estimates. On the other hand, there are so-called "monotone" patterns of missing data for which the filling-in approach is not necessarily iterative. This monotone case might be of interest, because it is relevant for problems involving feedforward networks in which the input vectors are complete but some of the outputs are missing. (Note that even if all the output values for a case are missing, a ML algorithm will not throw the case out; there is statistical structure in the input vector that the algorithm must not neglect). Mike (See Ghahramani's message for references; particularly the Little and Rubin book). From prechelt at ira.uka.de Tue Feb 8 07:19:16 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Tue, 08 Feb 1994 13:19:16 +0100 Subject: SUMMARY: encoding missing values Message-ID: <"irafs2.ira.957:08.01.94.12.19.58"@ira.uka.de> A few days ago, I posted some thoughts about how to represent missing input values to a neural network and asked for comments and further ideas. This message is a summary of the replies I received (some in my personal mail some in connectionists). 
I show the most significant comments and ideas and append versions of the messages that are trimmed to the most important parts (in case somebody wants to keep this discussion in his/her archive) This was my original message: ------------------------------------------------------------------------ From prechelt at ira.uka.de Wed Feb 2 03:58:56 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Wed, 02 Feb 1994 09:58:56 +0100 Subject: Encoding missing values Message-ID: I am currently thinking about the problem of how to encode data with attributes for which some of the values are missing in the data set for neural network training and use. An example of such data is the 'heart-disease' dataset from the UCI machine learning database (anonymous FTP on "ics.uci.edu" [128.195.1.1], directory "/pub/machine-learning-databases"). There are 920 records altogether with 14 attributes each. Only 299 of the records are complete, the others have one or several missing attribute values. 11% of all values are missing. I consider only networks that handle arbitrary numbers of real-valued inputs here (e.g. all backpropagation-suited network types etc). I do NOT consider missing output values. In this setting, I can think of several ways how to encode such missing values that might be reasonable and depend on the kind of attribute and how it was encoded in the first place: 1. Nominal attributes (that have n different possible values) 1.1 encoded "1-of-n", i.e., one network input per possible value, the relevant one being 1 all others 0. This encoding is very general, but has the disadvantage of producing networks with very many connections. Missing values can either be represented as 'all zero' or by simply treating 'is missing' as just another possible input value, resulting in a "1-of-(n+1)" encoding. 1.2 encoded binary, i.e., log2(n) inputs being used like the bits in a binary representation of the numbers 0...n-1 (or 1...n). Missing values can either be represented as just another possible input value (probably all-bits-zero is best) or by adding an additional network input which is 1 for 'is missing' and 0 for 'is present'. The original inputs should probably be all zero in the 'is missing' case. 2. continuous attributes (or attributes treated as continuous) 2.1 encoded as a single network input, perhaps using some monotone transformation to force the values into a certain distribution. Missing values are either encoded as a kind of 'best guess' (e.g. the average of the non-missing values for this attribute) or by using an additional network input being 0 for 'missing' and 1 for 'present' (or vice versa) and setting the original attribute input either to 0 or to the 'best guess'. (The 'best guess' variant also applies to nominal attributes above) 3. binary attributes (truth values) 3.1 encoded by one input: 0=false 1=true or vice versa Treat like (2.1) 3.2 encoded by one input: -1=false 1=true or vice versa In this case we may act as for (3.1) or may just use 0 to indicate 'missing'. 3.3 treat like nominal attribute with 2 possible values 4. ordinal attributes (having n different possible values, which are ordered) 4.1 treat either like continuous or like nominal attribute. If (1.2) is chosen, a Gray-Code should be used. Continuous representation is risky unless a 'sensible' quantification of the possible values is available. So far to my considerations. Now to my questions. a) Can you think of other encoding methods that seem reasonable ? Which ? 
b) Do you have experience with some of these methods that is worth sharing ? c) Have you compared any of the alternatives directly ? ------------------------------------------------------------------------ SUMMARY: For a), the following ideas were mentioned: 1. use statistical techniques to compute replacement values from the rest of the data set 2. use a Boltzman machine to do this for you 3. use an autoencoder feed forward network to do this for you 4. randomize on the missing values (correct in the Bayesian sense) For b), some experience was reported. I don't know how to summarize that nicely, so I just don't summarize at all. For c), no explicit quantitative results were given directly. Some replies suggest that data is not always missing randomly. The biases are often known and should be taken into account (e.g. medical tests are not carried out (resulting in missing data) for moreless healthy persons more often than for ill persons). Many replies contained references to published work on this area, from NN, machine learning, and mathematical statistics. To ease searching for these references in the replies below, I have marked them with the string ##REF## (if you have a 'grep' program that extracts whole paragraphs, you can get them all out with one command). Thanks to all who answered. These are the trimmed versions of the replies: ------------------------------------------------------------------------ From: tgd at research.CS.ORST.EDU (Tom Dietterich) [...for nominal attributes:] An alternative here is to encode them as bit-strings in a error-correcting code, so that the hamming distance between any two bit strings is constant. This would probably be better than a dense binary encoding. The cost in additional inputs is small. I haven't tried this though. My guess is that distributed representations at the input are a bad idea. One must always determine WHY the value is missing. In the heart disease data, I believe the values were not measured because other features were believed to be sufficient in each case. In such cases, the network should learn to down-weight the importance of the feature (which can be accomplished by randomizing it---see below). In other cases, it may be more appropriate to treat a missing value as a separate value for the feature, e.g., in survey research, where a subject chooses not to answer a question. [...for continuous attributes:] Ross Quinlan suggests encoding missing values as the mean observed output value when the value is missing. He has tried this in his regression tree work. Another obvious approach is to randomize the missing values--on each presentation of the training example, choose a different, random, value for each missing input feature. This is the "right thing to do" in the bayesian sense. [...for binary attributes:] I'm skeptical of the -1,0,1 encoding, but I think there is more research to be done here. [...for ordinal attributes:] I would treat them as continuous. ------------------------------------------------------------------------ From: shavlik at cs.wisc.edu (Jude W. Shavlik) We looked at some of the methods you talked about in the following article in the journal Machine Learning. ##REF## %T Symbolic and Neural Network Learning Algorithms: An Experimental Comparison %A J. W. Shavlik %A R. J. Mooney %A G. G. 
Towell %J Machine Learning %V 6 %N 2 %P 111-143 %D 1991 ------------------------------------------------------------------------ From: hertz at nordita.dk (John Hertz) It seems to me that the most natural way to handle missing data is to leave them out. You can do this if you work with a recurrent network (fx Boltzmann machine) where the inputs are fed in by clamping the input units to the given input values and the rest of the net relaxes to a fixed point, after which the output is read off the output units. If some of the input values are missing, the corresponding input units are just left unclamped, free to relax to values most consistent with the known inputs. I have meant for a long time to try this on some medical prognosis data I was working on, but I never got around to it, so I would be happy to hear how it works if you try it. ------------------------------------------------------------------------ From: jozo at sequoia.WPI.EDU (Jozo Dujmovic) In the case of clustering benchmark programs I frequently have the the problem of estimation of missing data. A relatively simple SW that implements a heuristic algorithm generates estimates having the average error of 8%. NN will somehow "implicitly estimate" the missing data. The two approaches might even be in some sense equivalent (?). Jozo [ I suspect that they are not: When you generate values for the missing items and put them in the training set, the network loses the information that this data is only estimated. Since estimations are not as reliable as true input data, the network will weigh inputs that have lots of generated values as less important. If it gets the 'is missing' information explicitly, it can discriminate true values from estimations instead. ] ------------------------------------------------------------------------ From: guy at cs.uq.oz.au A final year student of mine worked on the problem of dealing with missing inputs, without much success. However, the student as not very good, so take the following opinions with a pinch of salt. We (very tentatively) came to the conclusion that if the inputs were redundant, the problem was easy; if the missing input contained vital information, the problem was pretty much impossible. We used the heart disease data. I don't recommend it for the missing inputs problem. All of the inputs are very good indicators of the correct result, so missing inputs were not important. Apparently there is a large literature in statistics on dealing with missing inputs. Anthony Adams (University of Tasmania) has published a technical report on this. His email address is "A.Adams at cs.utas.edu.au". ##REF## @techreport{kn:Vamplew-91, author = "P. Vamplew and A. Adams", address = {Hobart, Tasmania, Australia}, institution = {Department of Computer Science, University of Tasmania}, number = {R1-4}, title = {Real World Problems in Backpropagation: Missing Values and Generalisability}, year = {1991} } ------------------------------------------------------------------------ From: Mike Southcott ##REF## I wrote a paper for the Australian conference on neural networks in 1993. ``Classification of Incomplete Data using neural networks'' Southcott, Bogner. You may find it interesting. You may not be able to get the proceedings for this conference, but I am in the process of digging up a postscript copy for someone in the States, so when I do that, I will send you a copy. 
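[ To make the encodings in the original posting concrete, here is a minimal sketch of variant 2.1 (mean "best guess" plus an added 0/1 "is present" indicator input) and of the "1-of-(n+1)" form of variant 1.1. It assumes NumPy; the function and array names are made up for illustration, and missing entries are marked with NaN / None. ]

import numpy as np

def encode_continuous_with_indicator(col):
    # Variant 2.1: replace each missing entry (NaN) by the mean of the
    # observed values and append a 0/1 'is present' indicator input.
    col = np.asarray(col, dtype=float)
    present = ~np.isnan(col)
    best_guess = col[present].mean() if present.any() else 0.0
    filled = np.where(present, col, best_guess)
    return np.stack([filled, present.astype(float)], axis=1)

def encode_nominal_1_of_n_plus_1(col, values):
    # Variant 1.1: one input per possible value, with 'missing' (None)
    # treated as an extra, (n+1)-th value.
    symbols = list(values) + [None]
    out = np.zeros((len(col), len(symbols)))
    for i, v in enumerate(col):
        out[i, symbols.index(v if v in values else None)] = 1.0
    return out

# Usage: a continuous column and a nominal column, each with one missing value
print(encode_continuous_with_indicator([1.2, np.nan, 0.7]))
print(encode_nominal_1_of_n_plus_1(['red', None, 'blue'], ['red', 'green', 'blue']))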
------------------------------------------------------------------------ From: Eric Saund I have done some work on unsupervised learning of mulitple cause clusters in binary data, for which an appropriate encoding scheme is -1 = FALSE, 1 = TRUE, and 0 = NO DATA. This has worked well for me, but my paradigm is not your standard feedforward network and uses a different activiation function from the standard weighted sum followed by sigmoid squashing. I presented the paper on this work at NIPS: ##REF## Saund, Eric; 1994; "Unsupervised Learning of Mixtures of Multiple Causes in Binary Data," in Advances in Neural Information Processing Systems -6-, Cowan, J., Tesauro, G, and Alspector, J., eds. Morgan Kaufmann, San Francisco. ------------------------------------------------------------------------ From: Thierry.Denoeux at hds.univ-compiegne.fr In a recent mailing, Lutz Prechelt mentioned the interesting problem of how to encode attributes with missing values as inputs to a neural network. I have recently been faced to that problem while applying neural nets to rainfall prediction using weather radar images. The problem was to classify pairs of "echoes" -- defined as groups of connected pixels with reflectivity above some threshold -- taken from successive images as corresponding to the same rain cell or not. Each pair of echoes was discribed by a list of attributes. Some of these attributes, refering to the past of a sequence, were not defined for some instances. To encode these attributes with potentially missing values, we applied two different methods actually suggested by Lutz: - the replacement of the missing value by a "best-guess" value - the addition of a binary input indicating whether the corresponding attribute was present or absent. Significantly better results were obtained by the second method. This work was presented at ICANN'93 last september: ##REF## X. Ding, T. Denoeux & F. Helloco (1993). Tracking rain cells in radar images using multilayer neural networks. In Proc. of ICANN'93, Springer-Verlag, p. 962-967. ------------------------------------------------------------------------ From: "N. Karunanithi" [...for nominal attributes:] Both methods have the problem of poor scalability. If the number of missing values increases then the number of additional inputs will increase linearly in 1.1 and logarithmically in 1.2. In fact, 1-of-n encoding may be a poor choice if (1) the number of input features is large and (2) such an expanded dimensional representation does not become a (semi) linearly separable problem. Even if it becomes a linearly separable problem, the overall complexity of the network can sometimes be very high. [...for continuous attributes:] This representation requires GUESS. A nominal transformation may not be a proper representation in some cases. Assume that the output values range over a large numerical interval. For example, from 0.0 to 10,000.0. If you use a simple scaling like dividing by 10,000.0 to make it between 0.0 and 1.0, this will result in poor accuracy of prediction. If the attribute is on the input side, then on theory the scaling is unnecessary because the input layer weights will scale accordingly. However, in practice I had lot of problem with this approach. Maybe a log tranformation before scaling may not be a bad choice. If you use a closed scaling you may have problem whenever a future value exceeds the maximum value of the numerical intervel. For example, assume that the attribute is time, say in miliseconds. 
Any future time from the point of reference can exceed the limit. Hence any closed scaling will not work properly. [...for ordinal attributes:] I have compared Binary Encoding (1.2), Gray-Coded representation and straightforward scaling. Closed scaling seems to do a good job. I have also compared open scaling and closed scaling and did find significant improvement in prediction accuracy. ###REF### N. Karunanithi, D. Whitley and Y. K. Malaiya, "Prediction of Software Reliability Using Connectionist Models", IEEE Trans. Software Eng., July 1992, pp 563-574. From yong at cns.brown.edu Tue Feb 8 10:40:35 1994 From: yong at cns.brown.edu (Yong Liu) Date: Tue, 8 Feb 94 10:40:35 EST Subject: some questions on training neural nets Message-ID: <9402081540.AA15383@cns.brown.edu> On the discussion of the cross-validation method, Dr. Plutowski referred to his paper by writing > It proves that two versions of cross-validation > (one being the "hold-out set" version discussed above, and the other > being the "delete-1" version) provide unbiased and strongly consistent > estimates of IMSE This is statistical jargon meaning that, on > average, the estimate is accurate, (i.e., the expectation > of the estimate for given training set size equals the IMSE + a noise term) > and asymtotically precise (in that as the training set and test set > size grow large, the estimate converges to the IMSE within the > constant factor due to noise, with probability 1.) Comment: This comment is on the above result about the "delete-1" version of cross-validation. The result must have assumed that the training data set has no outliers (corruption in the Y component of a data point). Deleting a data point that is an outlier will cause a great change in the estimated neural net weights, and the squared prediction error on this outlier will be large. This will then eventually cause a biased estimate of the IMSE. Even if a robust algorithm is used to estimate the neural net weights in order to reduce the sensitivity to outliers in the estimation, the squared prediction error on the outlier will still be large. A possible correction would be to weight this outlier less in the cross-validation, or in other words, to pay less attention to this outlier when deleting it. A weighted cross-validation like this has been discussed briefly in Liu (1994). The weighting of a data point is calculated through an iteratively reweighted algorithm for robust regression. One interesting thing about this version of cross-validation is its asymptotic equivalence to Moody's criterion (Moody, 1992; Liu, 1993). References: Liu, Y. (1993) Neural Network Model Selection Using Asymptotic Jackknife Estimator and Cross-Validation Method. In C.L. Giles, S.J. Hanson, and J.D. Cowan, editors, {\em Advances in Neural Information Processing Systems}, volume 5, pages 599-606. Morgan Kaufmann, San Mateo, CA. Liu, Y. (1994) Robust Parameter Estimation and Model Selection for Neural Network Regression. To appear in J.D. Cowan, G. Tesauro and J. Alspector, editors, {\em Advances in Neural Information Processing Systems}, volume 6. Morgan Kaufmann, San Mateo, CA. Moody, J.E. (1992). The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems. In Moody, J.E., Hanson, S.J., and Lippmann, R.P., editors, {\em Advances in Neural Information Processing Systems 4}. Morgan Kaufmann.
---------------------------- Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 From pluto at cs.ucsd.edu Wed Feb 9 02:39:00 1994 From: pluto at cs.ucsd.edu (Mark Plutowski) Date: Tue, 08 Feb 1994 23:39:00 -0800 Subject: some questions on training neural nets Message-ID: <9402090739.AA07477@odin.ucsd.edu> ------- Previous Message: --------- From yong at cns.brown.edu Tue Feb 8 10:40:35 1994 From: yong at cns.brown.edu (Yong Liu) Date: Tue, 8 Feb 94 10:40:35 EST Subject: some questions on training neural nets Message-ID: <9402081540.AA15383@cns.brown.edu> On the discussion of cross-validation method, Dr. Plutowski referred to his paper by writing > It proves that two versions of cross-validation > (one being the "hold-out set" version discussed above, and the other > being the "delete-1" version) provide unbiased and strongly consistent > estimates of IMSE This is statistical jargon meaning that, on > average, the estimate is accurate, (i.e., the expectation > of the estimate for given training set size equals the IMSE + a noise term) > and asymtotically precise (in that as the training set and test set > size grow large, the estimate converges to the IMSE within the > constant factor due to noise, with probability 1.) Comment: This comment is on the above result about "delete-1" version cross-validation. The result must have assumed that the training data set have no outliers (corruption in Y component of a data point). Since deleting a data point that is outlier will cause a great change in the estimated neural net weights, and also the squared prediction error on this outliers will be large. This will then eventually cause a biased estimation of the IMSE. - ---------------------------- Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 ------- End of Previous Message ------ No, actually it turns out that delete-1 cross-validation delivers unbiased estimates of IMSE under fairly reasonable conditions. (More precisely, it delivers estimates of IMSE_N + \sigma^2, for training set size N and noise variance \sigma^2.) Roughly, the noise must have variance the same everywhere in input space, (or, "homoscedasticity" as the statisticians would say,) with examples selected independently from the same, fixed environment (i.e., "i.i.d.") the expectation of the squared-target must be finite (this just ensures that conditional expectations of the target and the noise exist everywhere) plus some conditions on the network to make it behave nicely. For these same conditions, the estimate is additionally "conservative," in that it does not, (asymptotically, anyway, as N grows large) underestimate the expected squared error of the network for optimal weights. (These results and the prerequisite assumptions are of course stated more precisely in the paper.) However, we did require an additional assumption to obtain the "strong" convergence result, in that the optimal weights must be unique. This is to ensure that the weights for each of the deleted subsets of N-1 examples converge to the weights obtained by training on all N examples. As an aside: This latter condition may seem strong, but it seems to be (intuitively) applicable to a particular variant of delete-1 cross-validation commonly employed to make its computation more feasible - (in which case the global optima are in a sense "locally" unique under the right conditions.) 
In this variant, the network is trained on the entire training set to obtain the "base" network. These weights are then "fine-tuned" upon each of the deleted subsets of size N-1 to obtain the N cross-validated weight vectors. This tends to distribute the fine-tuned weights within a local region that seens to get tighter as the training set size increases. It tends to work well in practice, under the right conditions. (Essentially, you need to ensure that the ratio of examples to weights is sufficiently large, and it is easy to detect when this is not the case.) A bit off the original subject, I suppose, but I hope these results help clarify what cross-validation is doing, at least in that wonderfully ideal place called "asymptopia." It (apparently) turns out that these conditions suffice to ensure that the detrimental effect of a malicious outlier becomes negligible as the size of the training set grows large, at least with respect to the estimation of this particular kind of generalization by cross-validation. = Mark Plutowski UCSD: INC and CS&E P.S. Thank you for the honorable salutation! Actually, I am (still) just a student here. 8-) 8-| From lange at ira.uka.de Wed Feb 9 14:19:22 1994 From: lange at ira.uka.de (lange@ira.uka.de) Date: Wed, 9 Feb 94 14:19:22 MET Subject: Methods for improving generalization (was Re: some questions on ...) Message-ID: <"iraun1.ira.337:09.01.94.13.22.32"@ira.uka.de> Dear Mr. Hicks, in your mail to Mr. Grossman you mentioned the "Soft Weight-Sharing" algorithm and stated, that this algorithm would do some adaption to the data. I don't think, that this is right: Soft Weight-Sharing is just a bit more complicated than Weight-Decay or other things (so some improvements have been made). But Soft Weight-Sharing does not really adapt to the data, because you have to tune the same parameters as in normal Weight-Decay: the parameters, that are used to handle the strength of the penalty-term. The article of Nowlan and Hinton "Simplifying Neural Networks by Soft Weight- Sharing" does not mention a method to do this automatically - so no "real" adaption to the data is made. Maybe the methods of MacKay ("Bayesian Interpolation", Neural Comp. 4 (1992), page 415-447) could be used to get a fully-automatic adaption. A combination of this method with Weight-Decay or Soft Weight-Sharing would perhaps be data-adaptive; but Soft Weight-Sharing alone has still a parameter, that is not adapted by the data. Yours, Frank Lange From sec at ai.univie.ac.at Wed Feb 9 08:53:36 1994 From: sec at ai.univie.ac.at (sec@ai.univie.ac.at) Date: Wed, 9 Feb 1994 14:53:36 +0100 Subject: No subject Message-ID: <199402091353.AA14535@prater.ai.univie.ac.at> * * * * * TWELFTH EUROPEAN MEETING * * ON * * CYBERNETICS AND SYSTEMS RESEARCH * * (EMCSR 1994) * April 5 - 8, 1994 UNIVERSITY OF VIENNA organized by the Austrian Society for Cybernetic Studies in cooperation with Dept.of Medical Cybernetics and Artificial Intelligence, Univ.of Vienna and International Federation for Systems Research Plenary lectures: ***************** MARGARET BODEN (United Kingdom): "Artificial Intelligence and Creativity" STEPHEN GROSSBERG (USA): "Neural Networks for Learning, Recognition, and Prediction" STUART A. 
UMPLEBY (USA): "Twenty Years of Second Order Cybernetics" 241 papers will be presented and discussed in the following symposia: ********************************************************************* GENERAL SYSTEMS METHODOLOGY G.J.Klir (USA) ADVANCES IN MATHEMATICAL SYSTEMS THEORY J.Miro (Spain), M.Peschel (Germany), F.Pichler (Austria) FUZZY SYSTEMS, APPROXIMATE REASONING AND KNOWLEDGE-BASED SYSTEMS C.Carlsson (Finland), K.-P.Adlassnig (Austria), E.P.Klement (Austria) DESIGNING AND SYSTEMS, AND THEIR EDUCATION B.Banathy (USA), W.Gasparski (Poland), G.Goldschmidt (Israel) HUMANITY, ARCHITECTURE AND CONCEPTUALIZATION G.Pask (United Kingdom), G.de Zeeuw (Netherlands) BIOCYBERNETICS AND MATHEMATICAL BIOLOGY L.M.Ricciardi (Italy) SYSTEMS AND ECOLOGY F.J.Radermacher (Germany), K.Fedra (Austria) CYBERNETICS AND INFORMATICS IN MEDICINE G.Gell (Austria), G.Porenta (Austria) CYBERNETICS OF SOCIO-ECONOMIC SYSTEMS K.Balkus (USA), O.Ladanyi (Austria) SYSTEMS, MANAGEMENT AND ORGANIZATION G.Broekstra (Netherlands), R.Hough (USA) CYBERNETICS OF COUNTRY DEVELOPMENT P.Ballonoff (USA), T.Koizumi (USA), S.A.Umpleby (USA) COMMUNICATION AND COMPUTERS A M.Tjoa (Austria) INTELLIGENT AUTONOMOUS SYSTEMS J.Rozenblit (USA), H.Praehofer (Austria) CYBERNETIC PRINCIPLES OF KNOWLEDGE DEVELOPMENT F.Heylighen (Belgium), S.A.Umpleby (USA) CYBERNETICS, SYSTEMS AND PSYCHOTHERAPY M.Okuyama (Japan), H.Koizumi (USA) ARTIFICIAL NEURAL NETWORKS AND ADAPTIVE SYSTEMS S.Grossberg (USA), G.Dorffner (Austria) ARTIFICIAL INTELLIGENCE AND COGNITIVE SCIENCE V.Marik (Czech Republic), R.Born (Austria) TUTORIALS: ********** A SYNTACTIC APPROACH TO HEURISTIC NETWORKS: LINGUISTIC GEOMETRY Prof.Boris Stilman, University of Colorado, Denver, USA FUZZY SETS AND IMPRECISE BUT RELEVANT DECISIONS Prof.Christer Carlsson, Abo Akademi University, Abo, Finland CONTEXTUAL SYSTEMS: A NEW TECHNOLOGY FOR KNOWLEDGE BASED SYSTEM DEVELOPMENT Dr.Irina V. Ezhkova, Russian Academy of Science, Moscow TWENTY YEARS OF SECOND ORDER CYBERNETICS Prof.Stuart A. Umpleby, George Washington University, Washington, D.C., USA PROCEEDINGS: ************ Trappl R.(ed.): CYBERNETICS AND SYSTEMS '94, 2 vols, 1911 pages, World Scientific Publishing, Singapore. FOR FURTHER INFORMATION PLEASE CONTACT: *************************************** EMCSR'94 Secretariat c/o Austrian Society for Cybernetic Studies Schottengasse 3 A-1010 Vienna Austria Phone: +43-1-53532810 Fax: +43-1-5320652 E-mail: sec at ai.univie.ac.at From gert at jhunix.hcf.jhu.edu Wed Feb 9 09:32:57 1994 From: gert at jhunix.hcf.jhu.edu (Gert Cauwenberghs) Date: Wed, 9 Feb 1994 09:32:57 -0500 Subject: "A Learning Analog Neural Network Chip..." Message-ID: <94Feb9.093258edt.70280-3@jhunix.hcf.jhu.edu> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/cauwenberghs.nips93.ps.Z A preprint of the paper: A Learning Analog Neural Network Chip with Continuous-Time Recurrent Dynamics, by Gert Cauwenberghs, 8 pages including figures, to appear in Advances in Neural Information Processing Systems, vol. 6, 1994, is available on the neuroprose repository, in compressed PostScript format: anonymous binary ftp to archive.cis.ohio-state.edu cd pub/neuroprose get cauwenberghs.nips93.ps.Z uncompress and print. The abstract follows below. --- Gert Cauwenberghs (gert at jhunix.hcf.jhu.edu) We present experimental results on supervised learning of dynamical features in an analog VLSI neural network chip. 
The recurrent network, containing six continuous-time analog neurons and 42 free parameters (connection strengths and thresholds), is trained to generate time-varying outputs approximating given periodic signals presented to the network. The chip implements a stochastic perturbative algorithm, which observes the error gradient along random directions in the parameter space for error-descent learning. In addition to the integrated learning functions and the generation of pseudo-random perturbations, the chip provides for teacher forcing and long-term storage of the volatile parameters. The network learns a 1 kHz circular trajectory in 100 sec. The chip occupies 2 X 2 mm in a 2 um CMOS process, and dissipates 1.2 mW. From yong at cns.brown.edu Wed Feb 9 14:42:14 1994 From: yong at cns.brown.edu (Yong Liu) Date: Wed, 9 Feb 94 14:42:14 EST Subject: some questions on training neural nets Message-ID: <9402091942.AA19342@cns.brown.edu> Plutowski (Tue, 08 Feb 1994) wrote >No, actually it turns out that delete-1 cross-validation delivers >unbiased estimates of IMSE under fairly reasonable conditions. >(More precisely, it delivers estimates of IMSE_N + \sigma^2, >for training set size N and noise variance \sigma^2.) >Roughly, the noise must have variance the same everywhere in input space, >(or, "homoscedasticity" as the statisticians would say,) with examples >selected independently from the same, fixed environment (i.e., "i.i.d.") >the expectation of the squared-target must be finite (this just ensures >that conditional expectations of the target and the noise exist everywhere) >plus some conditions on the network to make it behave nicely. >For these same conditions, the estimate is additionally "conservative," >in that it does not, (asymptotically, anyway, as N grows large) >underestimate the expected squared error of the network for optimal weights. Outliers are the data points that come in an "unexpected" way, both in the training data and in the future. For example, the data is collected so that a proportional of them are typos. So as the size of the data gets large, the number of outliers in them also gets large. Plutowski's assumption, as I understand it, is to assume the ratio of the number outliers over the size of data size is very small. One way to look at data set containing outliers is to assume noises are inhomoscedastic. Outlier data points have their noises with large variance, and good data points have their noises with small variance (Liu 1994). This is different from Plutowski's "homoscedasticity" assumption. Since we have no intention of predicting the value of outliers, robust estimation in both the parameters and the generalization error requires the "removal" of the outliers. These discussion, I hope, could convey the idea that when using cross-validation for the estimation of generalization error, some cautions should be taken as regards to the influence of Bad data in the training data set. ------------ Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 From pluto at cs.ucsd.edu Wed Feb 9 17:52:55 1994 From: pluto at cs.ucsd.edu (Mark Plutowski) Date: Wed, 9 Feb 94 14:52:55 -0800 Subject: Outliers (Was: "Some questions on training..") Message-ID: <9402092252.AA14771@beowulf> ------- previous message ------- Dr. Liu writes: Outliers are the data points that come in an "unexpected" way, both in the training data and in the future. For example, the data is collected so that a proportional of them are typos. 
So as the size of the data gets large, the number of outliers in them also gets large. Plutowski's assumption, as I understand it, is to assume the ratio of the number outliers over the size of data size is very small. One way to look at data set containing outliers is to assume noises are inhomoscedastic. Outlier data points have their noises with large variance, and good data points have their noises with small variance (Liu 1994). This is different from Plutowski's "homoscedasticity" assumption. Since we have no intention of predicting the value of outliers, robust estimation in both the parameters and the generalization error requires the "removal" of the outliers. These discussion, I hope, could convey the idea that when using cross-validation for the estimation of generalization error, some cautions should be taken as regards to the influence of Bad data in the training data set. ------------ Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 ------- end previous message ------- Dear Dr Liu, Yes, this points out the importance of examining the assumptions carefully to ensure that they apply to your particular learning task. As another example of where these results do not apply, note that the assumption of mean zero noise can be easily violated in discrimination tasks (often referred to as "classification" tasks) where the noise involves random misclassification of the target. It also points out an appealling definition of "outlier", My interpretation of this is the following: When the noise variance on the target can depends upon the input (in statistical jargon, referred to as "heteroscedasticity of the conditional variance of Y_i given X_i") there is the possibility that a plot of the conditional target variance over the input space could display discontinuous jumps, corresponding to where it is more likely to encounter targets that are much more "noisy" - as compared to targets for neighboring inputs. Is this accurate? I look forward to reading (Liu 94). Can you (or anyone else) point me to other references utilizing a similar definition of "outlier?" (IMHO) "outlier" is quite a value-laden term that I tend to avoid since I feel it has multiple and often ambiguous interpretations/definitions. I am currently doing work on detection of what I call "offliers" since I have a precise definition of what this means to me, and since I hesitate to use the term "outliers" for the reason stated above. = Mark PS: I would appreciate further opinions/references/examples of what "outlier" means (either in practice or in theory) which I will summarize and post to the mailing list. From mlsouth at cssip.levels.unisa.edu.au Wed Feb 9 21:00:23 1994 From: mlsouth at cssip.levels.unisa.edu.au (mlsouth@cssip.levels.unisa.edu.au) Date: Thu, 10 Feb 1994 12:30:23 +1030 (CST) Subject: Missing values Message-ID: <8610.9402100200@hotham.levels.unisa.edu.au> Connectionists, I did a short study on methods for classification of incomplete data 18 months ago. I compared the statistical methods of discrimination and classification and the EM algorithm to some neural methods. These methods could only be applied to an artificial data set due to the inavailability of a set of real data with missing values. Despite this, I believe that the conclusions are still sound. A copy of the paper ``Classification of incomplete data using neural networks'', M.L. Southcott, R.E. 
Bogner which was presented to the Fourth Australian Conference on Neural Networks (ACNN '93) is available via anonymous ftp from ftp.cssip.edu.au. The file is pub/users/michael/southcott.missing.ps Michael Southcott mlsouth at cssip.edu.au Centre for Sensor Signal and Information Processing SPRI Building, The Levels, Pooraka 5095, South Australia. From hicks at cs.titech.ac.jp Fri Feb 11 00:02:54 1994 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Fri, 11 Feb 94 00:02:54 JST Subject: Methods for improving generalization (was Re: some questions on ...) In-Reply-To: lange@ira.uka.de's message of Wed, 9 Feb 94 14:19:22 MET <"iraun1.ira.337:09.01.94.13.22.32"@ira.uka.de> Message-ID: <9402101503.AA16767@maruko.cs.titech.ac.jp> Dear Mr.
Frank Lange (lange at ira.uka.de), On Wed, 9 Feb 94 14:19:22 MET you wrote: >But Soft Weight-Sharing does not really adapt to the data, >because you have to tune the same parameters as in normal Weight-Decay: >the parameters, that are used to handle the strength of the penalty-term. >The article of Nowlan and Hinton "Simplifying Neural Networks by Soft Weight- >Sharing" does not mention a method to do this automatically - so no "real" >adaption to the data is made. I say "every model is adaptive, and no model is adaptive, but some are more adaptive than others". Every model has parameters which are adjusted during learning. Penalty functions, including soft weight sharing, affect the prior distribution of weights and so can be thought of as just providing different models. All of these models adapt to data. On the other hand, every model >must< make some assumptions about which it is adamant. If it didn't, there wouldn't be a model. These assumptions are non-adaptive to the data. (note1) You further wrote: >Maybe the methods of MacKay ("Bayesian Interpolation", Neural Comp. 4 (1992), >page 415-447) could be used to get a fully-automatic adaption. A combination >of this method with Weight-Decay or Soft Weight-Sharing would perhaps be >data-adaptive; but Soft Weight-Sharing alone has still a parameter, that is >not adapted by the data. The article was very enlightening. Figure 1 on page 417 shows the 2 main steps of modeling which involve Bayesian methods: (1) Fit each model to the data, (2) Assign preferences to the alternative models. The first step is the one we are all familiar with. The second one is the topic of the paper and consists of assigning objective preferences to each model: the probability of the data given the model is called the evidence for the model. Re your idea of "fully-automatic adaption". I will first review the parameters related to soft weight sharing: (a) the number of weight groups (b) the mean and variance of each group of weights. The weight penalty weighting is not arbitrary but determined by the variance of the squared error (which changes with time) divided by a factor (determined by cross-validation) to adjust to the number of free parameters. I think you mean by "fully-automatic adaption" that parameters (a) and (b) should be constant during stage (1), and after running the simulation a large number of times with different values for (a) and (b) we should select the best ones with stage (2) methods: i.e. weighing the evidence for each model. This would take a long time BUT we might get a different answer from the one obtained by choosing (a) and (b) in stage 1. However, as to which way is best called "automatic", I would personally favor the present stage (1) way, because it automatically (although maybe imperfectly) estimates the best parameters (a) and (b) implicitly during learning, leaving less labor for the later and harder stage (2). I realize I am getting into semantics here. (note1) MacKay does give a special example of a 100% data-adaptive model: the Sure Thing hypothesis, which is that the data set will be what it is (predicted of course before seeing the data, selected afterwards), but this hypothesis has very small a priori probability. Too bad for our universe. The other example is of course stock tips, (predicted of course before seeing the money, collected afterwards), but look what happened to Michael Milken! Respectfully Yours, Craig Hicks Craig Hicks hicks at cs.titech.ac.jp | Kore ya kono Yuku mo kaeru mo Ogawa Laboratory, Dept. of Computer Science | Wakarete wa Shiru mo shiranu mo Tokyo Institute of Technology, Tokyo, Japan | Ausaka no seki lab:03-3726-1111 ext.2190 home:03-3785-1974 | (from hyaku-nin-issyu) fax: +81(3)3729-0685 (from abroad) 03-3729-0685 (from Japan) From terry at salk.edu Thu Feb 10 12:45:15 1994 From: terry at salk.edu (Terry Sejnowski) Date: Thu, 10 Feb 94 09:45:15 PST Subject: robust statistics Message-ID: <9402101745.AA28545@salk.edu> One man's outlier is another man's data point. Another way to handle outliers is not to remove them but to model them explicitly. Geoff Hinton has pointed out that character recognition can be made more robust by including models for background noise such as postmarks. Steve Nowlan and I recently used mixtures of expert networks to separate multiple interpenetrating flow fields -- the transparency problem for visual motion. The gating network was used to select regions of the visual field that contained reliable estimates of local velocity for which there was coherent global support. There is evidence for such selection neurons in area MT of primate visual cortex, a region of cortex that specializes in the detection of coherent motion. Terry ----- From yong at cns.brown.edu Thu Feb 10 13:39:19 1994 From: yong at cns.brown.edu (Yong Liu) Date: Thu, 10 Feb 94 13:39:19 EST Subject: outlier, robust statistics Message-ID: <9402101839.AA21430@cns.brown.edu> Plutowski wrote (Wed, 9 Feb 94) >It also points out an appealling definition of "outlier", >My interpretation of this is the following: >When the noise variance on the target can depends upon the input >(in statistical jargon, referred to as "heteroscedasticity of >the conditional variance of Y_i given X_i") >there is the possibility that a plot of the conditional >target variance over the input space could display >discontinuous jumps, corresponding to where it is more likely >to encounter targets that are much more "noisy" - as compared >to targets for neighboring inputs. Is this accurate? Yes. It is the heuristic behind modelling the error as a mixture of normal distributions in (Liu 94). In simple words, the statistical formulation regards the error for each data point as coming from a normal distribution with a different variance, and regards the variances as missing observations. By using a prior on the variance and the EM algorithm, one can estimate the variance. It turns out that during the estimation, the EM algorithm looks for the data points that have larger variances and down-weights those data points. This way of modelling is in agreement with Dr. Sejnowski's view >One man's outlier is another man's data point. Another >way to handle outliers is not to remove them but to model them >explicitly. ... Plutowski also wrote (Wed, 9 Feb 94) >I look forward to reading (Liu 94). Can you (or anyone else) >point me to other references utilizing a similar definition >of "outlier?" (IMHO) "outlier" is quite a value-laden term >that I tend to avoid since I feel it has multiple and >often ambiguous interpretations/definitions. Box and Tiao (1968) hold similar views. Outliers are generated from a distribution that is a perturbation to the underlying distribution, for example, a small amount of noise with an ever-changing distribution in the background. Huber's (1981) book is often referred to as an excellent reference. Anyway, no matter what an outlier is, what one really wants is to use a model/method that is not sensitive to them and predicts the relevant information. References Box, G.E.P.
and Tiao, G.C.(1968) A Bayesian approach to some outlier problem. Biometrika, 55, 119-129 Huber (1981) Robust Statistics. John Wiley & Sons, Inc.. BTW. I will be a Phd only three month later. ------- Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 From zl at venezia.rockefeller.edu Thu Feb 10 20:54:42 1994 From: zl at venezia.rockefeller.edu (Zhaoping Li) Date: Thu, 10 Feb 94 20:54:42 -0500 Subject: Paper announcement on neuroprose Message-ID: <9402110154.AA00738@venezia.rockefeller.edu> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/li-zhaoping.stereocoding.ps.Z The file li-zhaoping.stereocoding.ps.Z is now available for copying from the Neuroprose archive. This is a 16 page paper plus 6 figures, to be published in Network: Computation in Neural Systems. --------------------------------------------------------------------------- Efficient Stereo Coding in the Multiscale Representation Zhaoping Li and Joseph J. Atick The Rockefeller University 1230 York Avenue New York, NY 10021, USA Abstract: Stereo images are highly redundant; the left and right frames of typical scenes are very similar. We explore the consequences of the hypothesis that cortical cells --- in addition to their multiscale coding strategies (Li and Atick 1994a) --- are concerned with reducing binocular redundancy due to correlations between the two eyes. We derive the most efficient coding strategies that achieve binocular decorrelation. It is shown that multiscale coding combined with a binocular decorrelation strategy leads to a rich diversity of cell types. In particular, the theory predicts monocular/binocular cells as well as a family of disparity selective cells, among which one can identify cells that are tuned-zero-excitatory, near, far, and tuned inhibitory. The theory also predicts correlations between ocular dominance, cell size, orientation, and disparity selectivities. Consequences on cortical ocular dominance column formation from abnormal developmental conditions such as strabismus and monocular eye closure are also predicted. These findings are compared with physiological measurements. Please address correspondence to Zhaoping Li ---------------------------------------------------------------------------- To obtain a copy: ftp archive.cis.ohio-state.edu login: anonymous password: cd pub/neuroprose binary get li-zhaoping.stereocoding.ps.Z quit Then at your system: uncompress li-zhaoping.stereocoding.ps lpr -P li-zhaoping.stereocoding.ps Zhaoping Li Box 272 Rockefeller University 1230 York Ave New York, NY 10021 phone: 212-327-7423 fax: 212-327-7422 zl at rockvax.rockefeller.edu From prechelt at ira.uka.de Tue Feb 8 07:19:16 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Tue, 08 Feb 1994 13:19:16 +0100 Subject: SUMMARY: encoding missing values Message-ID: <"irafs2.ira.957:08.01.94.12.19.58"@ira.uka.de> [ My attempt to forward Lutz Prechelt's summary of the missing values discussion was twice foiled by technical problems. Note to future posters: do not attempt to transmit lines containing nothing but a period and a carriage return. It confuses our FTP software. Here is my final attempt to transmit the entire summary. If this fails, Lutz will just have to dump it to neuroprose and let people access it via FTP. Sorry about the repeated postings. 
-- Dave Touretzky, CONNECTIONISTS moderator ] ================================================================ From prechelt at ira.uka.de Tue Feb 8 07:19:16 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Tue, 08 Feb 1994 13:19:16 +0100 Subject: SUMMARY: encoding missing values Message-ID: <"irafs2.ira.957:08.01.94.12.19.58"@ira.uka.de> A few days ago, I posted some thoughts about how to represent missing input values to a neural network and asked for comments and further ideas. This message is a summary of the replies I received (some in my personal mail some in connectionists). I show the most significant comments and ideas and append versions of the messages that are trimmed to the most important parts (in case somebody wants to keep this discussion in his/her archive) This was my original message: ------------------------------------------------------------------------ From prechelt at ira.uka.de Wed Feb 2 03:58:56 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Wed, 02 Feb 1994 09:58:56 +0100 Subject: Encoding missing values Message-ID: I am currently thinking about the problem of how to encode data with attributes for which some of the values are missing in the data set for neural network training and use. An example of such data is the 'heart-disease' dataset from the UCI machine learning database (anonymous FTP on "ics.uci.edu" [128.195.1.1], directory "/pub/machine-learning-databases"). There are 920 records altogether with 14 attributes each. Only 299 of the records are complete, the others have one or several missing attribute values. 11% of all values are missing. I consider only networks that handle arbitrary numbers of real-valued inputs here (e.g. all backpropagation-suited network types etc). I do NOT consider missing output values. In this setting, I can think of several ways how to encode such missing values that might be reasonable and depend on the kind of attribute and how it was encoded in the first place: 1. Nominal attributes (that have n different possible values) 1.1 encoded "1-of-n", i.e., one network input per possible value, the relevant one being 1 all others 0. This encoding is very general, but has the disadvantage of producing networks with very many connections. Missing values can either be represented as 'all zero' or by simply treating 'is missing' as just another possible input value, resulting in a "1-of-(n+1)" encoding. 1.2 encoded binary, i.e., log2(n) inputs being used like the bits in a binary representation of the numbers 0...n-1 (or 1...n). Missing values can either be represented as just another possible input value (probably all-bits-zero is best) or by adding an additional network input which is 1 for 'is missing' and 0 for 'is present'. The original inputs should probably be all zero in the 'is missing' case. 2. continuous attributes (or attributes treated as continuous) 2.1 encoded as a single network input, perhaps using some monotone transformation to force the values into a certain distribution. Missing values are either encoded as a kind of 'best guess' (e.g. the average of the non-missing values for this attribute) or by using an additional network input being 0 for 'missing' and 1 for 'present' (or vice versa) and setting the original attribute input either to 0 or to the 'best guess'. (The 'best guess' variant also applies to nominal attributes above) 3. 
binary attributes (truth values) 3.1 encoded by one input: 0=false 1=true or vice versa Treat like (2.1) 3.2 encoded by one input: -1=false 1=true or vice versa In this case we may act as for (3.1) or may just use 0 to indicate 'missing'. 3.3 treat like nominal attribute with 2 possible values 4. ordinal attributes (having n different possible values, which are ordered) 4.1 treat either like continuous or like nominal attribute. If (1.2) is chosen, a Gray-Code should be used. Continuous representation is risky unless a 'sensible' quantification of the possible values is available. So far to my considerations. Now to my questions. a) Can you think of other encoding methods that seem reasonable ? Which ? b) Do you have experience with some of these methods that is worth sharing ? c) Have you compared any of the alternatives directly ? ------------------------------------------------------------------------ SUMMARY: For a), the following ideas were mentioned: 1. use statistical techniques to compute replacement values from the rest of the data set 2. use a Boltzman machine to do this for you 3. use an autoencoder feed forward network to do this for you 4. randomize on the missing values (correct in the Bayesian sense) For b), some experience was reported. I don't know how to summarize that nicely, so I just don't summarize at all. For c), no explicit quantitative results were given directly. Some replies suggest that data is not always missing randomly. The biases are often known and should be taken into account (e.g. medical tests are not carried out (resulting in missing data) for moreless healthy persons more often than for ill persons). Many replies contained references to published work on this area, from NN, machine learning, and mathematical statistics. To ease searching for these references in the replies below, I have marked them with the string ##REF## (if you have a 'grep' program that extracts whole paragraphs, you can get them all out with one command). Thanks to all who answered. These are the trimmed versions of the replies: ------------------------------------------------------------------------ From: tgd at research.CS.ORST.EDU (Tom Dietterich) [...for nominal attributes:] An alternative here is to encode them as bit-strings in a error-correcting code, so that the hamming distance between any two bit strings is constant. This would probably be better than a dense binary encoding. The cost in additional inputs is small. I haven't tried this though. My guess is that distributed representations at the input are a bad idea. One must always determine WHY the value is missing. In the heart disease data, I believe the values were not measured because other features were believed to be sufficient in each case. In such cases, the network should learn to down-weight the importance of the feature (which can be accomplished by randomizing it---see below). In other cases, it may be more appropriate to treat a missing value as a separate value for the feature, e.g., in survey research, where a subject chooses not to answer a question. [...for continuous attributes:] Ross Quinlan suggests encoding missing values as the mean observed output value when the value is missing. He has tried this in his regression tree work. Another obvious approach is to randomize the missing values--on each presentation of the training example, choose a different, random, value for each missing input feature. This is the "right thing to do" in the bayesian sense. 
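[ A minimal sketch of this randomization idea, assuming NumPy. The argument observed_values is a hypothetical per-feature list of the non-missing training values, missing entries are marked with NaN, and the fill-in is redrawn on every presentation of the example. ]

import numpy as np

rng = np.random.default_rng(0)

def randomize_missing(x, observed_values):
    # On each presentation, fill every missing (NaN) feature with a value
    # drawn at random from the values observed for that feature elsewhere
    # in the training set.
    x = np.array(x, dtype=float)
    for j in np.flatnonzero(np.isnan(x)):
        x[j] = rng.choice(observed_values[j])
    return x

# Usage: feature 1 is missing and gets a fresh random fill-in on each call
observed_values = [np.array([0.2, 0.5, 0.9]), np.array([1.0, 2.0, 3.0])]
print(randomize_missing([0.4, np.nan], observed_values))
print(randomize_missing([0.4, np.nan], observed_values))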
[...for binary attributes:] I'm skeptical of the -1,0,1 encoding, but I think there is more research to be done here. [...for ordinal attributes:] I would treat them as continuous. ------------------------------------------------------------------------ From: shavlik at cs.wisc.edu (Jude W. Shavlik) We looked at some of the methods you talked about in the following article in the journal Machine Learning. ##REF## %T Symbolic and Neural Network Learning Algorithms: An Experimental Comparison %A J. W. Shavlik %A R. J. Mooney %A G. G. Towell %J Machine Learning %V 6 %N 2 %P 111-143 %D 1991 ------------------------------------------------------------------------ From: hertz at nordita.dk (John Hertz) It seems to me that the most natural way to handle missing data is to leave them out. You can do this if you work with a recurrent network (fx Boltzmann machine) where the inputs are fed in by clamping the input units to the given input values and the rest of the net relaxes to a fixed point, after which the output is read off the output units. If some of the input values are missing, the corresponding input units are just left unclamped, free to relax to values most consistent with the known inputs. I have meant for a long time to try this on some medical prognosis data I was working on, but I never got around to it, so I would be happy to hear how it works if you try it. ------------------------------------------------------------------------ From: jozo at sequoia.WPI.EDU (Jozo Dujmovic) In the case of clustering benchmark programs I frequently have the the problem of estimation of missing data. A relatively simple SW that implements a heuristic algorithm generates estimates having the average error of 8%. NN will somehow "implicitly estimate" the missing data. The two approaches might even be in some sense equivalent (?). Jozo [ I suspect that they are not: When you generate values for the missing items and put them in the training set, the network loses the information that this data is only estimated. Since estimations are not as reliable as true input data, the network will weigh inputs that have lots of generated values as less important. If it gets the 'is missing' information explicitly, it can discriminate true values from estimations instead. ] ------------------------------------------------------------------------ From: guy at cs.uq.oz.au A final year student of mine worked on the problem of dealing with missing inputs, without much success. However, the student as not very good, so take the following opinions with a pinch of salt. We (very tentatively) came to the conclusion that if the inputs were redundant, the problem was easy; if the missing input contained vital information, the problem was pretty much impossible. We used the heart disease data. I don't recommend it for the missing inputs problem. All of the inputs are very good indicators of the correct result, so missing inputs were not important. Apparently there is a large literature in statistics on dealing with missing inputs. Anthony Adams (University of Tasmania) has published a technical report on this. His email address is "A.Adams at cs.utas.edu.au". ##REF## @techreport{kn:Vamplew-91, author = "P. Vamplew and A. 
Adams", address = {Hobart, Tasmania, Australia}, institution = {Department of Computer Science, University of Tasmania}, number = {R1-4}, title = {Real World Problems in Backpropagation: Missing Values and Generalisability}, year = {1991} } ------------------------------------------------------------------------ From: Mike Southcott ##REF## I wrote a paper for the Australian conference on neural networks in 1993. ``Classification of Incomplete Data using neural networks'' Southcott, Bogner. You may find it interesting. You may not be able to get the proceedings for this conference, but I am in the process of digging up a postscript copy for someone in the States, so when I do that, I will send you a copy. ------------------------------------------------------------------------ From: Eric Saund I have done some work on unsupervised learning of mulitple cause clusters in binary data, for which an appropriate encoding scheme is -1 = FALSE, 1 = TRUE, and 0 = NO DATA. This has worked well for me, but my paradigm is not your standard feedforward network and uses a different activiation function from the standard weighted sum followed by sigmoid squashing. I presented the paper on this work at NIPS: ##REF## Saund, Eric; 1994; "Unsupervised Learning of Mixtures of Multiple Causes in Binary Data," in Advances in Neural Information Processing Systems -6-, Cowan, J., Tesauro, G, and Alspector, J., eds. Morgan Kaufmann, San Francisco. ------------------------------------------------------------------------ From: Thierry.Denoeux at hds.univ-compiegne.fr In a recent mailing, Lutz Prechelt mentioned the interesting problem of how to encode attributes with missing values as inputs to a neural network. I have recently been faced to that problem while applying neural nets to rainfall prediction using weather radar images. The problem was to classify pairs of "echoes" -- defined as groups of connected pixels with reflectivity above some threshold -- taken from successive images as corresponding to the same rain cell or not. Each pair of echoes was discribed by a list of attributes. Some of these attributes, refering to the past of a sequence, were not defined for some instances. To encode these attributes with potentially missing values, we applied two different methods actually suggested by Lutz: - the replacement of the missing value by a "best-guess" value - the addition of a binary input indicating whether the corresponding attribute was present or absent. Significantly better results were obtained by the second method. This work was presented at ICANN'93 last september: ##REF## X. Ding, T. Denoeux & F. Helloco (1993). Tracking rain cells in radar images using multilayer neural networks. In Proc. of ICANN'93, Springer-Verlag, p. 962-967. ------------------------------------------------------------------------ From: "N. Karunanithi" [...for nominal attributes:] Both methods have the problem of poor scalability. If the number of missing values increases then the number of additional inputs will increase linearly in 1.1 and logarithmically in 1.2. In fact, 1-of-n encoding may be a poor choice if (1) the number of input features is large and (2) such an expanded dimensional representation does not become a (semi) linearly separable problem. Even if it becomes a linearly separable problem, the overall complexity of the network can sometimes be very high. [...for continuous attributes:] This representation requires GUESS. A nominal transformation may not be a proper representation in some cases. 
Assume that the output values range over a large numerical interval. For example, from 0.0 to 10,000.0. If you use a simple scaling like dividing by 10,000.0 to make it between 0.0 and 1.0, this will result in poor accuracy of prediction. If the attribute is on the input side, then on theory the scaling is unnecessary because the input layer weights will scale accordingly. However, in practice I had lot of problem with this approach. Maybe a log tranformation before scaling may not be a bad choice. If you use a closed scaling you may have problem whenever a future value exceeds the maximum value of the numerical intervel. For example, assume that the attribute is time, say in miliseconds. Any future time from the point of reference can exceed the limit. Hence any closed scaling will not work properly. [...for ordinal attributes:] I have compared Binary Encoding (1.2), Gray-Coded representation and straighforward scaling. Colsed scaling seems to do a good job. I have also compared open scaling and closed scaling and did find significant improvement in prediction accuracy. ###REF### N. Karunanithi, D. Whitley and Y. K. Malaiya, "Prediction of Software Reliability Using Connectionist Models", IEEE Trans. Software Eng., July 1992, pp 563-574. N. Karunanithi and Y. K. Malaiya, "The Scaling Problem in Neural Networks for Software Reliability Prediction", Proc. IEEE Int. Symposium on Rel. Eng., Oct. 1992, pp. 776-82. I have not found a simple solution that is general. I think representation in general and the missing information in specific are open problems within connectionist research. I am not sure we will have a magic bullet for all problems. The best approach is to come up with a specific solution for a given problem. ------------------------------------------------------------------------ From: Bill Skaggs There is at least one kind of network that has no problem (in principle) with missing inputs, namely a Boltzmann machine. You just refrain from clamping the input node whose value is missing, and treat it like an output node or hidden unit. This may seem to be irrelevant to anything other than Boltzmann machines, but I think it could be argued that nothing very much simpler is capable of dealing with the problem. When you ask a network to handle missing inputs, you are in effect asking it to do pattern completion on the input layer, and for this a Boltzmann machine or some other sort of attractor network would seem to be required. ------------------------------------------------------------------------ From: "Scott E. Fahlman" [Follow-up to Bill Skaggs:] Good point, but perhaps in need of clarification for some readers: There are two ways of training a Boltzmann machine. In one (the original form), there is no distinction between input and output units. During training we alternate between an instruction phase, in which all of the externally visible units are clamped to some pattern, and a normalization phase, in which the whole network is allow to run free. The idea is to modify the weights so that, when running free, the external units assume the various pattern values in the training set in their proper frequencies. If only some subset of the externally visible units are clamped to certain values, the net will produce compatible completions in the other units, again with frequencies that match this part of the training set. 
A net trained in this way will (in principle -- it might take a *very* long time for anything complicated) do what you suggest: complete an "input" pattern and produce a compatible output at the same time. This works even if the input is *totally* missing. I believe it was Geoff Hinton who realized that a Boltzmann machine could be trained more efficiently if you do make a distinction between input and output units, and don't waste any of the training effort learning to reconstruct the input. In this model, the instruction phase clamps both input and output units to some pattern, while the normalization phase clamps only the input units. Since the input units are correct in both cases, all of the network's learning power (such as it is) goes into producing correct patterns on the output units. A net trained in this way will not do input-completion. I bring this up because I think many people will only have seen the latter kind of Boltzmann training, and will therefore misunderstand your observation.

By the way, one alternative method I have seen proposed for reconstructing missing input values is to first train an auto-encoder (with some degree of bottleneck to get generalization) on the training set, and then feed the output of this auto-encoder into the classification net. The auto-encoder should be able to replace any missing values with some degree of accuracy. I haven't played with this myself, but it does sound plausible. If anyone can point to a good study of this method, please post it here or send me E-mail.

------------------------------------------------------------------------
From: "David G. Stork"

##REF## There is a provably optimal method for performing classification with missing inputs, described in Chapter 2 of "Pattern Classification and Scene Analysis" (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, which avoids the ad-hoc heuristics that have been described by others. Those interested in obtaining Chapter 2 via ftp should contact me.

------------------------------------------------------------------------
From: Wray Buntine

This missing value problem is of course shared amongst all the learning communities, artificial intelligence, statistics, pattern recognition, etc., not just neural networks. A classic study in this area, which includes most suggestions I've read here so far, is

##REF## @inproceedings{quinlan:ml6, AUTHOR = "J.R. Quinlan", TITLE = "Unknown Attribute Values in Induction", YEAR = 1989, BOOKTITLE = "Proceedings of the Sixth International Machine Learning Workshop", PUBLISHER = "Morgan Kaufmann", ADDRESS = "Cornell, New York"}

The most frequently cited methods I've seen -- and they're so common amongst the different communities that it's hard to assign credit -- are:
1) replace missing values by some best guess
2) fracture the example into a set of fractional examples, each with the missing value filled in somehow
3) call the missing value another input value
3 is a good thing to do if the values are missing "informatively", i.e. if someone leaves the entry "telephone number" blank in a questionnaire, then maybe they don't have a telephone; but it is probably not good otherwise, unless you have loads of data and don't mind all the extra example types generated (as already mentioned). 1 is a quick and dirty hack at 2. How good it is depends on your application. 2 is an approximation to the "correct" approach for handling "non-informative" missing values according to the standard "mixture model".
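A rough sketch of option 2 for a single nominal attribute, assuming the fractional weights are taken from the attribute's observed marginal frequencies; the toy data, that particular weighting choice, and the per-example weight mechanism are assumptions of this illustration rather than anything prescribed above:

import numpy as np

# Observed training values of one nominal attribute (three categories: 0, 1, 2).
observed = np.array([0, 1, 1, 2, 1, 0])
values, counts = np.unique(observed, return_counts=True)
freq = counts / counts.sum()          # marginal frequency of each category

# One incomplete example: the second attribute is missing (None).
example = [0.7, None]

# Fracture it into one weighted copy per candidate value of the missing attribute.
fractional = [([example[0], int(v)], float(w)) for v, w in zip(values, freq)]

for filled, weight in fractional:
    print(filled, "weight =", weight)
# A learner that accepts per-example weights can train on these copies; the
# weights sum to 1, so the fractured example still counts as a single instance.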
The mathematics for this is general and applies to virtually any learning algorithm -- trees, feed-forward nets, linear regression, whatever. We do it for feed-forward nets in

##REF## @article{buntine.weigend:bbp, AUTHOR = "W.L. Buntine and A.S. Weigend", TITLE = "Bayesian Back-Propagation", JOURNAL = "Complex Systems", Volume = 5, PAGES = "603--643", Number = 1, YEAR = "1991" }

and see Tresp, Ahmad & Neuneier in NIPS'94 for an implementation. But no doubt someone published the general idea back in the 50's. I certainly wouldn't call missing values an open problem. Rather, "efficient implementations of the standard approaches" is, in some cases, an open problem.

------------------------------------------------------------------------
From: Volker Tresp

In general, the solution to the missing-data problem depends on the missing-data mechanism. For example, if you sample the income of a population and rich people tend to refuse to answer, the mean of your sample is biased. To obtain an unbiased solution you would have to take into account the missing-data mechanism. The missing-data mechanism can be ignored if it is independent of the input and the output (in the example: the likelihood that a person refuses to answer is independent of the person's income). Most approaches assume that the missing-data mechanism can be ignored.

There exist a number of ad hoc solutions to the missing-data problem, but it is also possible to approach the problem from a statistical point of view. In our paper (which will be published in the upcoming NIPS volume and which will be available on neuroprose shortly) we discuss a systematic likelihood-based approach. NN-regression can be framed as a maximum likelihood learning problem if we assume the standard signal plus Gaussian noise model

P(x, y) = P(x) P(y|x) \propto P(x) exp(-1/(2 \sigma^2) (y - NN(x))^2).

By deriving the probability density function for a pattern with missing features we can formulate a likelihood function including patterns with complete and incomplete features. The solution requires an integration over the missing input. In practice, the integral is approximated numerically. For networks of Gaussian basis functions, it is possible to obtain closed-form solutions (by extending the EM algorithm). Our paper also discusses why and when ad hoc solutions -- such as substituting the mean for an unknown input -- are harmful. For example, if the mapping is approximately linear, substituting the mean might work quite well. In general, though, it introduces bias.

Training with missing and noisy input data is described in:

##REF## ``Training Neural Networks with Deficient Data,'' V. Tresp, S. Ahmad and R. Neuneier, in Cowan, J. D., Tesauro, G., and Alspector, J. (eds.), {\em Advances in Neural Information Processing Systems 6}, Morgan Kaufmann, 1994.

A related paper by Zoubin Ghahramani and Michael Jordan will also appear in the upcoming NIPS volume. Recall with missing and noisy data is discussed in (available in neuroprose as ahmad.missing.ps.Z):

``Some Solutions to the Missing Feature Problem in Vision,'' S. Ahmad and V. Tresp, in {\em Advances in Neural Information Processing Systems 5,} S. J. Hanson, J. D. Cowan, and C. L. Giles, eds., San Mateo, CA, Morgan Kaufmann, 1993.
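A crude sketch of the kind of numerical approximation mentioned above: the expected output for a pattern with one missing feature is estimated by averaging the network output over samples of that feature drawn from an assumed input density. The toy network, the Gaussian fitted to the observed values of the missing feature, and the sample size are all assumptions of this illustration, not the construction used in the paper:

import numpy as np

rng = np.random.default_rng(1)

def nn(x):
    # Stand-in for a trained network: a fixed 2-input, 1-output toy mapping.
    w1 = np.array([[1.0, -1.0], [0.5, 2.0]])
    w2 = np.array([1.0, -0.5])
    return float(w2 @ np.tanh(w1 @ x))

# Observed training values of feature 1, used to fit a simple density P(x1).
x1_train = np.array([0.1, 0.4, -0.2, 0.3, 0.0])
mu, sigma = x1_train.mean(), x1_train.std() + 1e-6

x0 = 0.8                                    # feature 0 is observed
samples = rng.normal(mu, sigma, size=500)   # feature 1 is missing: sample it
outputs = [nn(np.array([x0, x1])) for x1 in samples]

print("E[y | x0] is roughly", np.mean(outputs))  # Monte Carlo estimate of the integral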
------------------------------------------------------------------------
From: Subhash Kak

Missing values in feedback networks raise interesting questions: Should these values be considered "don't know" values, or should they be generated in some "maximum likelihood" fashion? These issues are discussed in the following paper:

##REF## S.C. Kak, "Feedback neural networks: new characteristics and a generalization", Circuits, Systems, Signal Processing, vol. 12, no. 2, 1993, pp. 263-278.

------------------------------------------------------------------------
From: Zoubin Ghahramani

I have also been looking into the issue of encoding and learning from missing values in a neural network. The issue of handling missing values has been addressed extensively in the statistics literature for obvious reasons. To learn despite the missing values, the data has to be filled in, or the missing values integrated over. The basic question is how to fill in the missing data. There are many different methods for doing this in stats (mean imputation, regression imputation, Bayesian methods, EM, etc.). For good reviews see (Little and Rubin, 1987; Little, 1992).

I do not in general recommend encoding "missing" as yet another value to be learned over. Missing means something in a statistical sense -- that the input could be any of the values with some probability distribution. You could, for example, augment the original data, filling in different values for the missing data points according to a prior distribution. Then the training would assign different weights to the artificially filled-in data points depending on how well they predict the output (their posterior probability). This is essentially the method proposed by Buntine and Weigend (1991). Other approaches have been proposed by Tresp et al. (1993) and Ahmad and Tresp (1993).

I have just written a paper on the topic of learning from incomplete data. In this paper I bring a statistical algorithm for learning from incomplete data, called EM, into the framework of nonlinear function approximation and classification with missing values. This approach fits the data iteratively with a mixture model and uses that same mixture model to effectively fill in any missing input or output values at each step.

You can obtain the preprint by ftp:
  ftp psyche.mit.edu
  login: anonymous
  cd pub
  get zoubin.nips93.ps
To obtain code for the algorithm please contact me directly.

##REF##
Ahmad, S and Tresp, V (1993) "Some Solutions to the Missing Feature Problem in Vision." In Hanson, S.J., Cowan, J.D., and Giles, C.L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers, San Mateo, CA.
Buntine, WL, and Weigend, AS (1991) "Bayesian back-propagation." Complex Systems, Vol. 5, No. 6, pp. 603-643.
Ghahramani, Z and Jordan, MI (1994) "Supervised learning from incomplete data via an EM approach." To appear in Cowan, J.D., Tesauro, G., and Alspector, J. (eds.), Advances in Neural Information Processing Systems 6. Morgan Kaufmann Publishers, San Francisco, CA, 1994.
Little, RJA (1992) "Regression With Missing X's: A Review." Journal of the American Statistical Association, Volume 87, Number 420, pp. 1227-1237.
Little, RJA and Rubin, DB (1987). Statistical Analysis with Missing Data. Wiley, New York.
Tresp, V, Hollatz, J, and Ahmad, S (1993) "Network structuring and training using rule-based knowledge." In Hanson, S.J., Cowan, J.D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers, San Mateo, CA.
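A much-simplified sketch of the iterative fill-in idea just described, using a single multivariate Gaussian instead of a full mixture and omitting the covariance correction a proper EM step would include; those simplifications, the toy data, and NumPy are assumptions of this illustration, not a description of the algorithm in the preprint:

import numpy as np

X = np.array([[1.0, 2.0, np.nan],
              [0.5, np.nan, 1.5],
              [1.5, 2.5, 2.0],
              [np.nan, 1.8, 1.0]])
miss = np.isnan(X)
Xf = np.where(miss, np.nanmean(X, axis=0), X)     # start from mean imputation

for iteration in range(50):
    mu = Xf.mean(axis=0)
    cov = np.cov(Xf, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    for i in range(X.shape[0]):
        m = miss[i]
        if not m.any():
            continue
        o = ~m
        # Simplified E-step: conditional mean of the missing entries
        # given the observed ones under the current Gaussian model.
        reg = cov[np.ix_(m, o)] @ np.linalg.inv(cov[np.ix_(o, o)])
        Xf[i, m] = mu[m] + reg @ (Xf[i, o] - mu[o])

print(Xf)   # missing entries replaced by model-based estimates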
------------------------------------------------------------------------ That's it. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; 76128 Karlsruhe; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple. From n.burgess at ucl.ac.uk Fri Feb 11 05:00:20 1994 From: n.burgess at ucl.ac.uk (Neil Burgess) Date: Fri, 11 Feb 94 10:00:20 +0000 Subject: pre-print in neuroprose Message-ID: <141927.9402111000@link-1.ts.bcc.ac.uk> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/burgess.hipmod.ps.Z *****do not forward to other groups***** Dear connectionists, the following preprint has been put on neuroprose, contact n.burgess at ucl.ac.uk with any retrieval problems, --Neil `A model of hippocampal function' Neil Burgess, Michael Recce and John O'Keefe Dept. of Anatomy, University College, London WC1E 6BT, U.K. The firing rate maps of hippocampal place cells recorded in a freely moving rat are viewed as a set of approximate radial basis functions over the (2-D) environment of the rat. It is proposed that these firing fields are constructed during exploration from `sensory inputs' (tuning curve responses to the distance of cues from the rat) and used by cells downstream to construct firing rate maps that approximate any desired surface over the environment. It is shown that, when a rat moves freely in an open field, the phase of firing of a place cell (with respect to the EEG $\theta$ rhythm) contains information as to the relative position of its firing field from the rat. A model of hippocampal function is presented in which the firing rate maps of cells downstream of the hippocampus provide a `population vector' encoding the instantaneous direction of the rat from a previously encountered reward site, enabling navigation to it. A neuronal simulation, involving reinforcement only at the goal location, provides good agreement with single cell recording from the hippocampal region, and can navigate to reward sites in open fields using sensory input from environmental cues. The system requires only brief exploration, performs latent learning, and can return to a goal location after encountering it only once. Neural Networks, to be published. 26 pages, 2Mbytes uncompressed. From eric at research.nj.nec.com Fri Feb 11 11:11:29 1994 From: eric at research.nj.nec.com (Eric B. Baum) Date: Fri, 11 Feb 94 11:11:29 EST Subject: No subject Message-ID: <9402111611.AA00562@yin> Fifth Annual NEC Research Symposium NATURAL AND ARTIFICIAL PARALLEL COMPUTATION PRINCETON, NJ MAY 4 - 5, 1994 NEC Research Institute is pleased to announce that the Fifth Annual NEC Research Symposium will be held at the Hyatt Regency Hotel in Princeton, New Jersey on May 4 and 5, 1994. The title of this year's symposium is Natural and Artificial Parallel Computation. The conference will feature ten invited talks. 
The speakers are:
- Larry Abbott, Brandeis University, "Activity-Dependent Modulation of Intrinsic Neuronal Properties"
- Catherine Carr, University of Maryland, "Time Coding in the Central Nervous System"
- Bill Dally, MIT, "Bandwidth, Granularity, and Mechanisms: Key Issues in the Design of Parallel Computers"
- Amiram Grinvald, Weizmann Institute, "Architecture and Dynamics of Cell Assemblies in the Visual Cortex; New Perspectives From Fast and Slow Optical Imaging"
- Akihiko Konagaya, NEC C&C Research Labs, "Knowledge Discovery in Genetic Sequences"
- Chris Langton, Santa Fe Institute, "SWARM: An Agent Based Simulation System for Research in Complex Systems"
- Thomas Ray, University of Delaware and ATR, "Evolution and Ecology of Digital Organisms"
- Shuichi Sakai, Real World Computing Partnership, "RWC Massively Parallel Computer Project"
- Shigeru Tanaka, NEC Fundamental Research Labs, "A Mathematical Theory for the Experience-Dependent Development of Visual Cortex"
- Leslie Valiant, Harvard University and NECI, "A Computational Model for Cognition"

There will be no contributed papers. Registration is free of charge, but space is limited. Registrations will be accepted on a first come, first served basis. YOU MUST PREREGISTER. There will be no on-site registration. To preregister by e-mail, send a request to: symposium at research.nj.nec.com. Registrants will receive an acknowledgment, space allowing. A request for preregistration is also possible by regular mail to Mrs. Irene Parker, NEC Research Institute, 4 Independence Way, Princeton, NJ 08540.

Registrants will also be invited to an Open House/Poster Session and Reception at NEC Research Institute on Tuesday, May 3. The Open House will begin at 3:30 PM and the Reception will begin at 5:30 PM. In order to estimate headcount, please indicate in your preregistration request whether you plan to attend the Open House on May 3.

Registrants are expected to make their own arrangements for accommodations. Provided below is a list of hotels in the area together with daily room rates. Please ask for the NEC Corporate Rate when reserving a room. Sessions will start at 8:15 AM Wednesday, May 4 and will be scheduled to finish at approximately 3:30 PM on Thursday, May 5.

Red Roof Inn, South Brunswick (908)821-8800 $37.99
Novotel Hotel, Princeton (609)520-1200 $68.00 ($74.00 w/breakfast)
Palmer Inn, Princeton (609)452-2500 $73.00
Marriott Residence Inn, Princeton (908)329-9600 $85.00 w/continental breakfast
Summerfield Suites, Princeton (609)951-0009 $92.00
Hyatt Regency, Princeton (609)987-1234 $105.00
Marriott Hotel, Princeton (609)452-7900 $125.00

- - - - - - - - - - - - - - - - - - - - - - - - - -
PLEASE RESPOND BY E-MAIL TO: symposium at research.nj.nec.com
I would like to attend: _____ Open House _____ Symposium
Name: ____________________________
Organization: ____________________________
E-mail address: ____________________________
Phone number: ____________________________

From bishopc at helios.aston.ac.uk Fri Feb 11 09:59:33 1994 From: bishopc at helios.aston.ac.uk (bishopc) Date: Fri, 11 Feb 94 14:59:33 GMT Subject: Postdoctoral Fellowships Message-ID: <27570.9402111459@sun.aston.ac.uk>

-------------------------------------------------------------------
Aston University Neural Computing Research Group

TWO POSTDOCTORAL RESEARCH FELLOWSHIPS:
--------------------------------------
FUNDAMENTAL RESEARCH IN NEURAL NETWORKS

Two postdoctoral fellowships, each with a duration of 3 years, will be funded by the U.K.
Science and Engineering Research Council, and are to commence on or after 1 April 1994. These posts are part of a major project to be undertaken within the Neural Computing Research Group at Aston, and will involve close collaboration with Professors Chris Bishop and David Lowe, with additional input from Professor David Bounds. This interdisciplinary program requires researchers capable of extending theoretical concepts, and developing algorithmic and proof-of-principle demonstrations through software simulation. The two Research Fellows will work on distinct, though closely related, areas as follows: 1. Generalization in Neural Networks The usual approach to complexity optimisation and model order selection in neural networks makes use of computationally intensive cross-validation techniques. This project will build on recent developments in the use of Bayesian methods and the description length formalism to develop systematic techniques for model optimization in feedforward neural networks from a principled statistical perspective. In its later stages, the project will demonstrate the practical utility of the techniques which emerge, in the context of a wide range of real-world applications. 2. Dynamic Neural Networks Current embodiments of neural networks, when applied to `dynamic' events such as time series forecasting, are successful only if the underlying `generator' of the data is stationary. If the underlying generator is slowly varying in time then we do not have a principled basis for designing effective neural network structures, though ad hoc procedures do exist. This program will address some of the key issues in this area using techniques from statistical pattern processing and dynamical systems theory. In addition, application studies will be conducted which will focus on time series problems and tracking in non-stationary noise. If you wish to be considered for these positions, please send a CV and publications list, together with the names of 3 referees, to: Professor Chris M Bishop Neural Computing Research Group Aston University Birmingham B4 7ET, U.K. Tel: 021 359 3611 ext. 4270 Fax: 021 333 6215 e-mail: c.m.bishop at aston.ac.uk From ahmad at interval.com Fri Feb 11 12:04:37 1994 From: ahmad at interval.com (ahmad@interval.com) Date: Fri, 11 Feb 94 09:04:37 -0800 Subject: Computing visual feature correspondences Message-ID: <9402111704.AA28021@iris10.interval.com> The following paper is available for anonymous ftp on archive.cis.ohio-state.edu (128.146.8.52), in directory pub/neuroprose, as file "ahmad.correspondence.ps.Z": Feature Densities are Required for Computing Feature Correspondences Subutai Ahmad Interval Research Corporation 1801-C Page Mill Road, Palo Alto, CA 94304 E-mail: ahmad at interval.com Abstract The feature correspondence problem is a classic hurdle in visual object-recognition concerned with determining the correct mapping between the features measured from the image and the features expected by the model. In this paper we show that determining good correspondences requires information about the joint probability density over the image features. We propose "likelihood based correspondence matching" as a general principle for selecting optimal correspondences. The approach is applicable to non-rigid models, allows nonlinear perspective transformations, and can optimally deal with occlusions and missing features. Experiments with rigid and non-rigid 3D hand gesture recognition support the theory. 
The likelihood based techniques show almost no decrease in classification performance when compared to performance with perfect correspondence knowledge. To appear in: Cowan, J.D., Tesauro, G., and Alspector, J. (Eds.), Advances in Neural Information Processing Systems 6. San Francisco CA: Morgan Kaufmann, 1994. From ahmad at interval.com Fri Feb 11 13:03:31 1994 From: ahmad at interval.com (ahmad@interval.com) Date: Fri, 11 Feb 94 10:03:31 -0800 Subject: Training NN's with missing or noisy data Message-ID: <9402111803.AA28794@iris10.interval.com> The following paper is available for anonymous ftp on archive.cis.ohio-state.edu (128.146.8.52), in directory pub/neuroprose, as file "tresp.deficient.ps.Z". (The companion paper, "Some Solutions to the Missing Feature Problem in Vision" is available as "ahmad.missing.ps.Z") Training Neural Networks with Deficient Data Volker Tresp Subutai Ahmad Siemens AG Interval Research Corporation Central Research 1801-C Page Mill Rd. 81730 Muenchen, Germany Palo Alto, CA 94304 tresp at zfe.siemens.de ahmad at interval.com Ralph Neuneier Siemens AG Central Research Otto-Hahn-Ring 6 81730 Muenchen, Germany ralph at zfe.siemens.de Abstract: We analyze how data with uncertain or missing input features can be incorporated into the training of a neural network. The general solution requires a weighted integration over the unknown or uncertain input although computationally cheaper closed-form solutions can be found for certain Gaussian Basis Function (GBF) networks. We also discuss cases in which heuristical solutions such as substituting the mean of an unknown input can be harmful. The paper will appear in: Cowan, J.D., Tesauro, G., and Alspector, J. (Eds.), Advances in Neural Information Processing Systems 6. San Francisco CA: Morgan Kaufmann, 1994. Subutai Ahmad Interval Research Corporation Phone: 415-354-3639 1801-C Page Mill Rd. Fax: 415-354-0872 Palo Alto, CA 94304 E-mail: ahmad at interval.com From mel at klab.caltech.edu Fri Feb 11 15:05:47 1994 From: mel at klab.caltech.edu (Bartlett Mel) Date: Fri, 11 Feb 94 12:05:47 PST Subject: NIPS*94 Call for Papers Message-ID: <9402112005.AA10791@plato.klab.caltech.edu> ********* PLEASE NOTE NEW SUBMISSIONS FORMAT FOR 1994 ********* CALL FOR PAPERS Neural Information Processing Systems -Natural and Synthetic- Monday, November 28 - Saturday, December 3, 1994 Denver, Colorado This is the eighth meeting of an interdisciplinary conference which brings together neuroscientists, engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in all aspects of neural processing and computation. The conference will include invited talks, and oral and poster presentations of refereed papers. There will be no parallel sessions. There will also be one day of tutorial presentations (Nov 28) preceding the regular session, and two days of focused workshops will follow at a nearby ski area (Dec 2-3). Major categories for paper submission, and examples of keywords within categories, are the following: Neuroscience: systems physiology, cellular physiology, signal and noise analysis, oscillations, synchronization, inhibition, neuromodulation, synaptic plasticity, computational models. Theory: computational learning theory, complexity theory, dynamical systems, statistical mechanics, probability and statistics, approximation theory. Implementations: VLSI, optical, parallel processors, software simulators, implementation languages. 
Algorithms and Architectures: learning algorithms, constructive/pruning algorithms, localized basis functions, decision trees, recurrent networks, genetic algorithms, combinatorial optimization, performance comparisons. Visual Processing: image recognition, coding and classification, stereopsis, motion detection, visual psychophysics. Speech, Handwriting and Signal Processing: speech recognition, coding and synthesis, handwriting recognition, adaptive equalization, nonlinear noise removal. Applications: time-series prediction, medical diagnosis, financial analysis, DNA/protein sequence analysis, music processing, expert systems. Cognitive Science & AI: natural language, human learning and memory, perception and psychophysics, symbolic reasoning. Control, Navigation, and Planning: robotic motor control, process control, navigation, path planning, exploration, dynamic programming. Review Criteria: All submitted papers will be thoroughly refereed on the basis of technical quality, novelty, significance and clarity. Submissions should contain new results that have not been published previously. Authors are encouraged to submit their most recent work, as there will be an opportunity after the meeting to revise accepted manuscripts before submitting final camera-ready copy. ********** PLEASE NOTE NEW SUBMISSIONS FORMAT FOR 1994 ********** Paper Format: Submitted papers may be up to eight pages in length. The page limit will be strictly enforced, and any submission exceeding eight pages will not be considered. Authors are encouraged (but not required) to use the NIPS style files obtainable by anonymous FTP at the sites given below. Papers must include physical and e-mail addresses of all authors, and must indicate one of the nine major categories listed above, keyword information if appropriate, and preference for oral or poster presentation. Unless otherwise indicated, correspondence will be sent to the first author. Submission Instructions: Send six copies of submitted papers to the address given below; electronic or FAX submission is not acceptable. Include one additional copy of the abstract only, to be used for preparation of the abstracts booklet distributed at the meeting. Submissions mailed first-class within the US or Canada must be postmarked by May 21, 1994. Submissions from other places must be received by this date. Mail submissions to: David Touretzky NIPS*94 Program Chair Computer Science Department Carnegie Mellon University 5000 Forbes Avenue Pittsburgh PA 15213-3890 USA Mail general inquiries/requests for registration material to: NIPS*94 Conference NIPS Foundation PO Box 60035 Pasadena, CA 91116-6035 USA (e-mail: nips94 at caltech.edu) FTP sites for LaTex style files "nips.tex" and "nips.sty": helper.systems.caltech.edu (131.215.68.12) in /pub/nips b.gp.cs.cmu.edu (128.2.242.8) in /usr/dst/public/nips NIPS*94 Organizing Committee: General Chair, Gerry Tesauro, IBM; Program Chair, David Touretzky, CMU; Publications Chair, Joshua Alspector, Bellcore; Publicity Chair, Bartlett Mel, Caltech; Workshops Chair, Todd Leen, OGI; Treasurer, Rodney Goodman, Caltech; Local Arrangements, Lori Pratt, Colorado School of Mines; Tutorials Chairs, Steve Hanson, Siemens and Gerry Tesauro, IBM; Contracts, Steve Hanson, Siemens and Scott Kirkpatrick, IBM; Government & Corporate Liaison, John Moody, OGI; Overseas Liaisons: Marwan Jabri, Sydney Univ., Mitsuo Kawato, ATR, Alan Murray, Univ. of Edinburgh, Joachim Buhmann, Univ. of Bonn, Andreas Meier, Simon Bolivar Univ. 
DEADLINE FOR SUBMISSIONS IS MAY 21, 1994 (POSTMARKED) -please post- From yamauchi at alpha.ces.cwru.edu Fri Feb 11 17:24:43 1994 From: yamauchi at alpha.ces.cwru.edu (Brian Yamauchi) Date: Fri, 11 Feb 94 17:24:43 -0500 Subject: Preprints Available Message-ID: <9402112224.AA03791@yuggoth.CES.CWRU.Edu> The following papers are available via anonymous ftp from yuggoth.ces.cwru.edu: ---------------------------------------------------------------------- Sequential Behavior and Learning in Evolved Dynamical Neural Networks Brian Yamauchi(1) and Randall Beer(1,2) Department of Computer Engineering and Science(1) Department of Biology(2) Case Western Reserve University Cleveland, OH 44106 Case Western Reserve University Technical Report CES-93-25 This paper will be appearing in Adaptive Behavior. Abstract This paper explores the use of a real-valued modular genetic algorithm to evolve continuous-time recurrent neural networks capable of sequential behavior and learning. We evolve networks that can generate a fixed sequence of outputs in response to an external trigger occurring at varying intervals of time. We also evolve networks that can learn to generate one of a set of possible sequences based upon reinforcement from the environment. Finally, we utilize concepts from dynamical systems theory to understand the operation of some of these evolved networks. A novel feature of our approach is that we assume neither an a priori discretization of states or time nor an a priori learning algorithm that explicitly modifies network parameters during learning. Rather, we merely expose dynamical neural networks to tasks that require sequential behavior and learning and allow the genetic algorithm to evolve network dynamics capable of accomplishing these tasks. Files: /pub/agents/yamauchi/seqlearn.ps.Z Article Text (73K) /pub/agents/yamauchi/seqlearn-fig.ps.Z Figures (654K) ---------------------------------------------------------------------- Integrating Reactive, Sequential, and Learning Behavior Using Dynamical Neural Networks Brian Yamauchi(1,3) and Randall Beer(1,2) Department of Computer Engineering and Science(1) Department of Biology(2) Case Western Reserve University Cleveland, OH 44106 Navy Center for Applied Research in Artificial Intelligence(3) Naval Research Laboratory Washington, DC 20375-5000 This paper has been submitted to the Third International Conference on Simulation of Adaptive Behavior. Abstract This paper explores the use of dynamical neural networks to control autonomous agents in tasks requiring reactive, sequential, and learning behavior. We use a genetic algorithm to evolve networks that can solve these tasks. These networks provide a mechanism for integrating these different types of behavior in a smooth, continuous manner. We applied this approach to three different task domains: landmark recognition using sonar on a real mobile robot, one-dimensional navigation using a simulated agent, and reinforcement-based sequence learning. For the landmark recognition task, we evolved networks capable of differentiating between two different landmarks based on the spatiotemporal information in a sequence of sonar readings obtained as the robot circled the landmark. For the navigation task, we evolved networks capable of associating the location of a landmark with a corresponding goal location and directing the agent to that goal. For the sequence learning task, we evolved networks that can learn to generate one of a set of possible sequences based upon reinforcement from the environment. 
A novel feature of the learning aspects of our approach is that we assume neither an a priori discretization of states or time nor an a priori learning algorithm that explicitly modifies network parameters during learning. Instead, we expose dynamical neural networks to tasks that require learning and allow the genetic algorithm to evolve network dynamics capable of accomplishing these tasks. Files: /pub/agents/yamauchi/integ.ps.Z Complete Article (233K) If your printer has problems printing the complete document as a single file, try printing the following two files: /pub/agents/yamauchi/integ-part1.ps.Z Pages 1-8 (77K) /pub/agents/yamauchi/integ-part2.ps.Z Pages 9-11 (147K) ---------------------------------------------------------------------- On the Dynamics of a Continuous Hopfield Neuron with Self-Connection Randall Beer Department of Computer Engineering and Science Department of Biology Case Western Reserve University Cleveland, OH 44106 Case Western Reserve University Technical Report CES-94-1 This paper has been submitted to Neural Computation. Continuous-time recurrent neural networks are being applied to a wide variety of problems. As a first step toward a comprehensive understanding of the dynamics of such networks, this paper studies the dynamical behavior of their basic building block: a continuous Hopfield neuron with self-connection. Specifically, we characterize the equilibria of this model neuron and the dependence of those equilibria on the parameters. We also describe the bifurcations of this model and derive very accurate approximate expressions for its bifurcation set. Finally, we indicate how the basic theory developed in this paper generalizes to a larger class of related model neurons. File: /pub/agents/beer/CTRNNDynamics1.ps.Z Complete Article (233K) ---------------------------------------------------------------------- FTP instructions: To retrieve and print a file (for example: seqlearn.ps), use the following commands: unix> ftp yuggoth.ces.cwru.edu Name: anonymous Password: (your email address) ftp> binary ftp> cd /pub/agents/yamauchi (or cd /pub/agents/beer for CTRNNDynamics1.ps.Z) ftp> get seqlearn.ps.Z ftp> quit unix> uncompress seqlearn.ps.Z unix> lpr seqlearn.ps (ls doesn't currently work properly on our ftp server. This will be fixed soon, but in the meantime, these files can still be copied, even though they don't appear in the directory listing.) _______________________________________________________________________________ Brian Yamauchi Case Western Reserve University yamauchi at alpha.ces.cwru.edu Department of Computer Engineering and Science _______________________________________________________________________________ From isabelle at neural.att.com Fri Feb 11 20:51:16 1994 From: isabelle at neural.att.com (Isabelle Guyon) Date: Fri, 11 Feb 94 20:51:16 EST Subject: robust statistics Message-ID: <9402120151.AA21483@neural> I would like to bring more arguments to Terry's remarks: > One man's outlyer is another man's data point. If the data is perfectly clean, outlyers are very valuable patterns. From mmoller at daimi.aau.dk Mon Feb 14 02:15:18 1994 From: mmoller at daimi.aau.dk (Martin Fodslette M|ller) Date: Mon, 14 Feb 1994 08:15:18 +0100 Subject: Thesis available. Message-ID: <199402140715.AA18638@titan.daimi.aau.dk> /******************* PLEASE DO NOT FORWARD ***********************/ I finally finished up my thesis: Efficient Training of Feed-Forward Neural Networks The thesis has the following content: Chapter 1. 
Resume in Danish (should anyone need that (-:)
Chapter 2. Notation and basic definitions.
Chapter 3. Training Methods: An Overview
Chapter 4. Calculation of Hessian Information
Chapter 5. Different Error Functions.
Appendix A. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning.
Appendix B. Supervised Learning on Large Redundant Training Sets.
Appendix C. Exact Calculation of the Product of the Hessian Matrix and a Vector in O(N) time.
Appendix D. Adaptive Preconditioning of the Hessian Matrix.
Appendix E. Improving Network Solutions.

The appendices concern my own work (original contributions), while the chapters provide an overview. The thesis is now available in a limited number of hard copies. People interested in a copy should send me an email with their address.

Best Regards
-martin
---------------------------------------------------------------- Martin Moller email: mmoller at daimi.aau.dk Computer Science Dept. Fax: +45 8942 3255 Aarhus University Phone: +45 8942 3371 Ny Munkegade, Build. 540, DK-8000 Aarhus C, Denmark ----------------------------------------------------------------

From edelman at wisdom.weizmann.ac.il Mon Feb 14 02:39:27 1994 From: edelman at wisdom.weizmann.ac.il (Edelman Shimon) Date: Mon, 14 Feb 1994 09:39:27 +0200 Subject: TR available: Representation of similarity in 3D ... Message-ID: <199402140739.JAA00503@eris.wisdom.weizmann.ac.il>

FTP-host: eris.wisdom.weizmann.ac.il
FTP-filename: /pub/tr-94-02.ps.Z
URL: http://eris.wisdom.weizmann.ac.il/
Uncompressed size: 2.6 Mb. Preliminary version; comments welcome.

Representation of similarity in 3D object discrimination
Shimon Edelman

\begin{abstract} How does the brain represent visual objects? In simple perceptual generalization tasks, the human visual system performs as if it represents the stimuli in a low-dimensional metric psychological space \cite{Shepard87}. In theories of 3D shape recognition, the role of feature-space representations (as opposed to structural \cite{Biederman87} or pictorial \cite{Ullman89} descriptions) has for a long time been a major point of contention. If shapes are indeed represented as points in a feature space, patterns of perceived similarity among different objects must reflect the structure of this space. The feature space hypothesis can then be tested by presenting subjects with complex parameterized 3D shapes, and by relating the similarities among subjective representations, as revealed in the response data by multidimensional scaling \cite{Shepard80}, to the objective parameterization of the stimuli. The results of four such tests, reported below, support the notion that discrimination among 3D objects may rely on a low-dimensional feature space representation, and suggest that this space may be spanned by explicitly encoded class prototypes. \end{abstract}

From grumbach at inf.enst.fr Mon Feb 14 03:51:22 1994 From: grumbach at inf.enst.fr (grumbach@inf.enst.fr) Date: Mon, 14 Feb 94 09:51:22 +0100 Subject: papers on time and neural networks Message-ID: <9402140851.AA10372@enst.enst.fr>

As guest editors of a special issue of the Sigart Bulletin on Time and Neural Networks, we are looking for 4 articles of about 10 pages each. Sigart is a quarterly publication of the Association for Computing Machinery (ACM) special interest group on Artificial Intelligence. The paper may either deal with approaches to time processing using traditional connectionist architectures, or with more specific models that integrate time into their underlying principles.
If you are interested, and if you can submit a paper (not already published) within a short time (about a month and a half), please send a draft (if possible a Word file):
- preferably by giving ftp access to it (information via e-mail)
- or by sending it as an attached file by e-mail
- or by posting a paper copy of it.
Drafts should be received before April 1. Notification of acceptance will be sent before April 20.

grumbach at enst.fr or chaps at enst.fr
Alain Grumbach and Cedric Chappelier
ENST dept INF 46 rue Barrault 75634 Paris Cedex 13 France

From P.Refenes at cs.ucl.ac.uk Mon Feb 14 09:13:12 1994 From: P.Refenes at cs.ucl.ac.uk (P.Refenes@cs.ucl.ac.uk) Date: Mon, 14 Feb 94 14:13:12 +0000 Subject: robust statistics In-Reply-To: Your message of "Thu, 10 Feb 94 09:45:15 PST." <9402101745.AA28545@salk.edu> Message-ID:

The term "outliers" does not mean that they are not part of the joint data probability distribution or that they contain no information for estimating the regression surface; it means rather that outliers are too small a fraction of the observations to be allowed to dominate the small-sample behaviour of the statistics to be calculated. With parametric regression modelling techniques it is easy to quantify this by simply computing the effect that each data point has on the regression surface. This is not a trivial problem in non-parametric modelling, but the statistics literature is full of methods to deal with it.

Paul Refenes

From rsun at cs.ua.edu Mon Feb 14 12:22:20 1994 From: rsun at cs.ua.edu (Ron Sun) Date: Mon, 14 Feb 1994 11:22:20 -0600 Subject: No subject Message-ID: <9402141722.AA28238@athos.cs.ua.edu>

A monograph on connectionist models is available from John Wiley and Sons, Inc.

Title: Integrating Rules and Connectionism for Robust Commonsense Reasoning
ISBN 0-471-59324-9
Author: Ron Sun, Assistant Professor, Department of Computer Science, The University of Alabama, Tuscaloosa, AL 35487

Contact John Wiley and Sons, Inc. at 1-800-call-wiley, or: John Wiley and Sons, Inc. 605 Third Ave. New York, NY 10158-0012 USA (212) 850-6589 FAX: (212) 850-6088

------------------------------------------------------------------
A brief description is as follows: One of the outstanding problems for artificial intelligence is the problem of better modeling commonsense reasoning and alleviating the brittleness of traditional symbolic rule-based models. This work tackles this problem by trying to combine rules with connectionist models in an integrated framework. This idea leads to the development of a connectionist architecture with dual representation combining symbolic and subsymbolic (feature-based) processing for evidential robust reasoning: {\sc CONSYDERR}.
Reasoning data are analyzed based on the notions of {\it rules} and {\it similarity} and modeled by the architecture which carries out rule application and similarity matching through interaction of the two levels; formal analyses are performed to understand rule encoding in connectionist models, in order to prove that it handles a superset of Horn clause logic and a nonmonotonic logic; the notion of causality is explored for the purpose of clarifying how the proposed architecture can better capture commonsense reasoning, and it is shown that causal knowledge can be well represented by {\sc CONSYDERR} and utilized in reasoning, which further justifies the design of the architecture; the variable binding problem is addressed, and a solution is proposed within this architecture and is shown to surpass existing ones; several aspects of the architecture are discussed to demonstrate how connectionist models can supplement, enhance, and integrate symbolic rule-based reasoning; large-scale application-oriented systems are prototyped. This architecture utilizes the synergy resulting from the interaction of the two different types of representation and processing, and is therefore capable of handling a large number of difficult issues in one integrated framework, such as partial and inexact information, cumulative evidential combination, lack of exact match, similarity-based inference, inheritance, and representational interactions, all of which are proven to be crucial elements of commonsense reasoning. The results show that connectionism coupled with symbolic processing capabilities can be effective and efficient models of reasoning for both theoretical and practical purposes. Table of Content 1 Introduction 1.1 Overview 1.2 Commonsense Reasoning 1.3 The Problem of Common Reasoning Patterns 1.4 What is the Point? 
1.5 Some Clarifications 1.6 The Organization of the Book 1.7 Summary 2 Accounting for Commonsense Reasoning: A Framework with Rules and Similarities 2.1 Overview 2.2 Examples of Reasoning 2.3 Patterns of Reasoning 2.4 Brittleness of Rule-Based Reasoning 2.5 Towards a Solution 2.6 Some Reflections on Rules and Connectionism 2.7 Summary 3 A Connectionist Architecture for Commonsense Reasoning 3.1 Overview 3.2 A Generic Architecture 3.3 Fine-Tuning --- from Constraints to Specifications 3.4 Summary 3.5 Appendix 4 Evaluations and Experiments 4.1 Overview 4.2 Accounting for the Reasoning Examples 4.3 Evaluations of the Architecture 4.4 Systematic Experiments 4.5 Choice, Focus and Context 4.6 Reasoning with Geographical Knowledge 4.7 Applications to Other Domains 4.8 Summary 4.9 Appendix: Determining Similarities and CD representations 5 More on the Architecture: Logic and Causality 5.1 Overview 5.2 Causality in General 5.3 Shoham's Causal Theory 5.4 Defining FEL 5.5 Accounting for Commonsense Causal Reasoning 5.6 Determining Weights 5.7 Summary 5.8 Appendix: Proofs For Theorems 6 More on the Architecture: Beyond Logic 6.1 Overview 6.2 Further Analysis of Inheritance 6.3 Analysis of Interaction in Representation 6.4 Knowledge Acquisition, Learning, and Adaptation 6.5 Summary 7 An Extension: Variables and Bindings 7.1 Overview 7.2 The Variable Binding Problem 7.3 First-Order FEL 7.4 Representing Variables 7.5 A Formal Treatment 7.6 Dealing with Difficult Issues 7.7 Compilation 7.8 Correctness 7.9 Summary 7.10 Appendix 8 Reviews and Comparisons 8.1 Overview 8.2 Rule-Based Reasoning 8.3 Case-Based Reasoning 8.4 Connectionism 8.5 Summary 9 Conclusions 9.1 Overview 9.2 Some Accomplishments 9.3 Lessons Learned 9.4 Existing Limitations 9.5 Future Directions 9.6 Summary References From trevor at white.Stanford.EDU Mon Feb 14 17:37:50 1994 From: trevor at white.Stanford.EDU (Trevor Darrell) Date: Mon, 14 Feb 94 14:37:50 PST Subject: outlier, robust statistics In-Reply-To: Terry Sejnowski's message of Thu, 10 Feb 94 09:45:15 PST <9402101745.AA28545@salk.edu> Message-ID: <9402142237.AA24561@white.Stanford.EDU> [terry at salk.edu] One man's outlier is another man's data point. Another way to handle outliers is not to remove them but to model them explicitly. Geoff Hinton has pointed out that character recognition can be made more robust by including models for background noise such as postmarks. Explicitly modeling an occluding or transparently combined "outlier" process is a powerful way to build a robust estimator. As mentioned in other replies to this post, estimators which use a mixture model (either implicitly or explicitly), such as the EM algorithm, are promising methods to implement this type of strategy. One issue which often complicates matters is how to decide how many objects or processes there are in the signal, e.g. determine K in the EM estimator. I would like to ask if anyone has a pointer to work on estimating K in the context of an EM estimator or similar methods? Often the appropriate cardinality of the model is not easily known a priori. Steve Nowlan and I recently used mixtures of expert networks to separate multiple interpenetrating flow fields -- the transparency problem for visual motion. The gating network was used to select regions of the visual field that contained reliable estimates of local velocity for which there was coherent global support. 
There is evidence for such selection neurons in area MT of primate visual cortex, a region of cortex that specializes in the detection of coherent motion. I'd also like to add a pointer to some related work Sandy Pentland, Eero Simoncelli and I have done in this domain developing a strategy for robust estimation ("outlier exclusion") based on minimum description length theory. Our method effectively implements a clustering method to find how many processes there are (e.g. estimate K), and then iteratively refine estimates of the parameters and "support" (segmentation) of those processes. We have developed versions of this method for range and motion segmentation, both for occluded and transparently combined processes. [pluto at cs.ucsd.edu:] >I look forward to reading (Liu 94). Can you (or anyone else) >point me to other references utilizing a similar definition >of "outlier?" (IMHO) "outlier" is quite a value-laden term >that I tend to avoid since I feel it has multiple and >often ambiguous interpretations/definitions. Here are some references to conference papers on our work. A longer journal paper that combines these is in the works, email me if you would like a preprint when it becomes available. Darrell, Sclaroff and Pentland, "Segmentation by Minimal Description", Proc. 3rd Intl. Conf. Computer Vision, Osaka, Japan, 1990 (also avail. as MIT Media Lab Percom TR-163.) Darrell and Pentland, "Robust Estimation of a Multi-Layer Motion Representation", Proc. IEEE Workshop on Visual Motion, Princeton, October 1991 Darrell and Pentland, "Against Edges: Function Approximation with Multiple Support Maps", NIPS 4, 1992 Darrell and Simoncelli, "Separation of Transparent Motion into Layers using Velocity-tuned Mechanisms", Assn. for Resarch in Vision and Opthm. (ARVO) 1993, also available as MIT Media Lab Percom TR-244. (Percom TR's can be anon. ftp'ed from whitechapel.media.mit.edu) --trevor From jagota at next1.msci.memst.edu Mon Feb 14 20:18:56 1994 From: jagota at next1.msci.memst.edu (Arun Jagota) Date: Mon, 14 Feb 1994 19:18:56 -0600 Subject: DIMACS Challenge neural net papers Message-ID: <199402150118.AA02676@next1> Dear Connectionists: Expanded versions of two neural net papers presented at the DIMACS Challenge on Cliques, Coloring, and Satisfiability are now available via anonymous ftp (see below). First an excerpt from the Challenge announcement back in 1993: ---------------------- The purpose of this Challenge is to encourage high quality empirical research on difficult problems. The problems chosen are known to be difficult to solve in theory. How difficult are they to solve in practice? ---------------------- ftp ftp.cs.buffalo.edu (or 128.205.32.9 subject-to-change) Name : anonymous > cd users/jagota > binary > get DIMACS_Grossman.ps.Z > get DIMACS_Jagota.ps.Z > quit > uncompress *.Z Sorry, no hard copies. Copies may be requested by electronic mail to me (jagota at next1.msci.memst.edu) for those without access to ftp or for whom ftp fails. Please use as last resort. Applying The INN Model to the MaxClique Problem Tal Grossman, email: tal at goshawk.lanl.gov Complex Systems Group, T-13, and Center for Non Linear Studies MS B213, Los Alamos National Laboratory Los Alamos, NM 87545 Los Alamos Tech Report: LA-UR-93-3082 A neural network model, the INN (Inverted Neurons Network), is applied to the Maximum Clique problem. First, I describe the INN model and how it implements a given graph instance. 
The model has a threshold parameter $t$, which determines the character of the network stable states. As shown in an earlier work (Grossman-Jagota), the stable states of the network correspond to the $t$-codegree sets of its underlying graph, and, in the case of $t<1$, to its maximal cliques. These results are briefly reviewed. In this work I concentrate on improving the deterministic dynamics called $t$-annealing. The main issue is the initialization procedure and the choice of parameters. Adaptive procedures for choosing the initial state of the network and setting the threshold are presented. The result is the ``Adaptive t-Annealing" algorithm (AtA). This algorithm is tested on many benchmark problems and found to be more efficient than steepest descent or the simple t-annealing procedure. Approximately Solving Maximum Clique using Neural Network and Related Heuristics * Arun Jagota Laura Sanchis Memphis State University Colgate University Ravikanth Ganesan State University of New York at Buffalo We explore neural network and related heuristic methods for the fast approximate solution of the Maximum Clique problem. One of these algorithms, {\em Mean Field Annealing}, is implemented on the Connection Machine CM-5 and a fast annealing schedule is experimentally evaluated on random graphs, as well as on several benchmark graphs. The other algorithms, which perform certain randomized local search operations, are evaluated on the same benchmark graphs, and on {\bf Sanchis} graphs. One of our algorithms adjusts its internal parameters as its computation evolves. On {\bf Sanchis} graphs, it finds significantly larger cliques than the other algorithms do. Another algorithm, GSD$(\emptyset)$, works best overall, but is slower than the others. All our algorithms obtain significantly larger cliques than other simpler heuristics but run slightly slower; they obtain significantly smaller cliques on average than exact algorithms or more sophisticated heuristics but run considerably faster. All our algorithms are simple and inherently parallel. * - 24 pages in length (twice as long as its previous version). Arun Jagota From terry at salk.edu Tue Feb 15 02:56:04 1994 From: terry at salk.edu (Terry Sejnowski) Date: Mon, 14 Feb 94 23:56:04 PST Subject: outlier, robust statistics Message-ID: <9402150756.AA17907@salk.edu> I have received many requests for a reference to the motion model I mentioned recently in the context of robust statistics. An early version can be found in: Nowlan, S. J. and Sejnowski, T. J., Filter selection model for generating visual motion signals, In: C. L. Giles, S. J. Hanson and J. D. Cowan (Eds.) Advances in Neural Information Processing Systems 5, San Mateo, CA: Morgan Kaufman Publishers, 369-376 (1993). Two longer papers on the computational theory and the biological consequences are in review. Darrell and Pentland have an interesting iterative approach in which multiple hypotheses compete to include motion samples within their regions of support. A relaxation scheme must decide on the number of objects and the correct velocity assignments. Our approach to motion estimation is simpler in that hypotheses do not correspond to objects, but to distinct velocities, and the number of hypotheses is always fixed. This allows the selection of regions of support to be performed non-iteratively. The architecture of the model is feedforward with soft-max within layers, so it is quite fast. Mixtures of experts was used to optimize the weights in the network. 
Terry

-----

From schmidhu at informatik.tu-muenchen.de Tue Feb 15 04:06:19 1994 From: schmidhu at informatik.tu-muenchen.de (Juergen Schmidhuber) Date: Tue, 15 Feb 1994 10:06:19 +0100 Subject: postdoctoral thesis Message-ID: <94Feb15.100623met.42337@papa.informatik.tu-muenchen.de>

---------------- postdoctoral thesis ----------------
Juergen Schmidhuber
Technische Universitaet Muenchen
(submitted April 1993, accepted October 1993)
-----------------------------------------------------
NETZWERKARCHITEKTUREN, ZIELFUNKTIONEN UND KETTENREGEL
(Network Architectures, Objective Functions, and the Chain Rule)

There is a relatively new class of artificial neural networks (ANNs), based on feedback, whose capabilities go considerably beyond simple pattern association. In principle, these ANNs allow the implementation of arbitrary functions computable on a conventional, sequentially operating digital computer. In contrast to conventional computers, however, the quality of the outputs (formally specified by a meaningful objective function) can be differentiated mathematically with respect to the ``software'' (in the case of ANNs, the weight matrix), which makes it possible to apply the chain rule to derive gradient-based algorithms for modifying that software. The thesis illustrates this through the formal derivation of a number of novel learning algorithms in the following areas: (1) supervised learning of sequential input/output behaviour with cyclic and acyclic architectures, (2) ``reinforcement learning'' and subgoal generation without an informed teacher, and (3) unsupervised learning for the extraction of redundancy from inputs and input streams. Numerous experiments demonstrate the possibilities and the limits of these learning algorithms. Finally, a ``self-referential'' neural network is presented which can, in theory, learn to modify its own software-modification algorithm.
-----------------------------------------------------

The postdoctoral thesis above is now available (in unrevised form) via ftp. To obtain a copy, follow the instructions at the end of this message. Here is additional information for those who are interested but don't understand German (or are unfamiliar with Germany's academic system): The postdoctoral thesis is part of a process called ``Habilitation'' which is seen as a qualification for tenure. The thesis is about learning algorithms derived by the chain rule. It addresses supervised sequence learning, variants of reinforcement learning, and unsupervised learning (for redundancy reduction). Unlike some previous papers of mine, it contains lots of experiments and lots of figures.

Here is a very brief summary based on pointers to recent English publications upon which the thesis elaborates: Chapters 2 and 3 are on supervised sequence learning and extend publications [1] and [4]. Chapter 4 is on variants of learning with a ``distal teacher'' and extends publication [7] (robot experiments in chapter 4 were conducted by Eldracher and Baginski, see e.g. [9]). Chapters 5, 6 and 7 describe unsupervised learning algorithms based on detection of redundant information in input patterns and pattern sequences: Chapter 5 elaborates on publication [5], and chapter 6 extends publication [3]. Chapter 6 includes a result by Peter Dayan, Richard Zemel and A. Pouget (Salk Institute) who demonstrated that equation (4.3) in [3] with $\beta = 0, \alpha = \gamma = 1$ is essentially equivalent to equation (5.1).
Chapter 6 also includes experiments conducted by Stefanie Lindstaedt who successfully applied the method in [3] to redundant images of letters presented according to the probabilities of English language, see [10]. Chapter 7 extends publications [2] and [8]. Experiments show how sequence processing neural nets using algorithms for redundancy reduction can learn to bridge time lags (between correlated events) of more than 1000 discrete time steps. Other experiments use neural nets for text compression and compare them to standard data compression algorithms. Finally, chapter 8 elaborates on publication [6]. -------------------------- References ------------------------------- [1] J. H. Schmidhuber. A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, 4(2):243--248, 1992. [2] J. H. Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234--242, 1992. [3] J. H. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863--879, 1992. [4] J. H. Schmidhuber. Learning to control fast-weight memories: An alternative to recurrent nets. Neural Computation, 4(1):131--139, 1992. [5] J. H. Schmidhuber and D. Prelinger. Discovering predictable classifications. Neural Computation, 5(4):625--635, 1993. [6] J. H. Schmidhuber. A self-referential weight matrix. In Proc. of the Int. Conf. on Artificial Neural Networks, Amsterdam, pages 446--451. Springer, 1993. [7] J. H. Schmidhuber and R. Wahnsiedler. Planning simple trajectories using neural subgoal generators. In J. A. Meyer, H. L. Roitblat, and S. W. Wilson, editors, Proc. of the 2nd Int. Conf. on Simulation of Adaptive Behavior, pages 196--202. MIT Press, 1992. [8] J. H. Schmidhuber, M. C. Mozer, and D. Prelinger. Continuous history compression. In H. Huening, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87--95. Augustinus, 1993. [9] M. Eldracher and B. Baginski. Neural subgoal generation using backpropagation. In George G. Lendaris, Stephen Grossberg and Bart Kosko, editors, Proc. of WCNN'93, Lawrence Erlbaum Associates, Inc., Hillsdale, pages = III-145--III-148, 1993. [10] S. Lindstaedt. Comparison of unsupervised neural networks for redundancy reduction. In M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman and A. S. Weigend, editors, Proc. of the 1993 Connectionist Models Summer School, pages 308-315. Hillsdale, NJ: Erlbaum Associates, 1993. ---------------------------------------------------------------------- The thesis comes in three parts. To obtain a copy, do: unix> ftp 131.159.8.35 Name: anonymous Password: (your email address, please) ftp> binary ftp> cd pub/fki ftp> get schmidhuber.habil.1.ps.Z ftp> get schmidhuber.habil.2.ps.Z ftp> get schmidhuber.habil.3.ps.Z ftp> bye unix> uncompress schmidhuber.habil.1.ps.Z unix> lpr schmidhuber.habil.1.ps . . . Note: The layout is designed for conventional European DINA4 format. Expect 145 pages. ---------------------------------------------------------------------- Dr. habil. J. H. 
Schmidhuber, Fakultaet fuer Informatik, Technische Universitaet Muenchen, 80290 Muenchen, Germany schmidhu at informatik.tu-muenchen.de --------- postdoctoral thesis (unrevised) ----------- NETZWERKARCHITEKTUREN, ZIELFUNKTIONEN UND KETTENREGEL Juergen Schmidhuber, TUM From Petri.Myllymaki at cs.Helsinki.FI Tue Feb 15 04:52:42 1994 From: Petri.Myllymaki at cs.Helsinki.FI (Petri Myllymaki) Date: Tue, 15 Feb 1994 11:52:42 +0200 Subject: Thesis in neuroprose Message-ID: <199402150952.LAA01783@keos.Helsinki.FI> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/Thesis/myllymaki.thesis.ps.Z The following report has been placed in the neuroprose archive. ----------------------------------------------------------------------- Bayesian Reasoning by Stochastic Neural Networks Petri Myllymaki Ph.Lic. Thesis Department of Computer Science, University of Helsinki Report C-1993-67, Helsinki, December 1993 78 pages This work has been motivated by problems in several research areas: expert system design, uncertain reasoning, optimization theory, and neural network research. From the expert system design point of view, our goal was to develop a generic expert system shell capable of handling uncertain data. The theoretical framework used here for handling uncertainty is probabilistic reasoning, in particular the theory of Bayesian belief network representations. The probabilistic reasoning task we are interested in is, given a Bayesian network representation of a probability distribution on a set of discrete random variables, to find a globally maximal probability state consistent with given initial constraints. To solve this NP-hard problem approximatively, we use an iterative stochastic method, Gibbs sampling. As this method can be quite inefficient when implemented on a conventional sequential computer, we show how to construct a Gibbs sampling process for a given Bayesian network on a massively parallel architecture, a harmony neural network, which is a special case of the Boltzmann machine architecture. To empirically test the method developed, we implemented a hybrid neural-symbolic expert system shell, NEULA. The symbolic part of the system consists of a high-level conceptual description language and a compiler, which can be used for constructing Bayesian networks and providing them with the corresponding parameters (conditional probabilities). As the number of parameters needed for a given network may generally be quite large, we restrict ourselves to Bayesian networks having a special hierarchical structure. The neural part of the system consists of a neural network simulator which performs massively parallel Gibbs sampling. The performance of the NEULA system was empirically tested by using a small artificial test example. Computing Reviews (1991) Categories and Subject Descriptors: G.3 [Probability and statistics]: Probabilistic algorithms F.1.1 [Models of computation]: Neural networks G.1.6 [Optimization]: Constrained optimization I.2.5 [Programming languages and software]: Expert system tools and techniques General Terms: Algorithms, Theory. 
Additional Key Words and Phrases: Monte Carlo algorithms, Gibbs sampling, simulated annealing, Bayesian belief networks, connectionism, massive parallelism ----------------------------------------------------------------------- To obtain a copy: ftp archive.cis.ohio-state.edu login: anonymous password: cd pub/neuroprose/Thesis binary get myllymaki.thesis.ps.Z quit Then at your system: uncompress myllymaki.thesis.ps.Z lpr myllymaki.thesis.ps ----------------------------------------------------------------------- Petri Myllymaki Petri.Myllymaki at cs.Helsinki.FI Department of Computer Science Int.+358 0 708 4212 (tel.) P.O.Box 26 (Teollisuuskatu 23) Int.+358 0 708 4441 (fax) FIN-00014 University of Helsinki, Finland ----------------------------------------------------------------------- From thrun at uran.cs.bonn.edu Tue Feb 15 08:25:02 1994 From: thrun at uran.cs.bonn.edu (Sebastian Thrun) Date: Tue, 15 Feb 1994 14:25:02 +0100 Subject: 2 papers on robot learning Message-ID: <199402151325.OAA17317@carbon.informatik.uni-bonn.de> This is to announce two recent papers in the connectionists' archive. Both papers deal with robot learning issues. The first paper describes two learning approaches (EBNN with reinforcement learning, COLUMBUS), and the second paper gives some empirical results for learning robot navigation using reinforcement learning and EBNN. Both approaches have been evaluated using real robot hardware. Enjoy reading! Sebastian ------------------------------------------------------------------------ LIFELONG ROBOT LEARNING Sebastian Thrun Tom Mitchell University of Bonn Carnegie Mellon University Learning provides a useful tool for the automatic design of autonomous robots. Recent research on learning robot control has predominantly focussed on learning single tasks that were studied in isolation. If robots encounter a multitude of control learning tasks over their entire lifetime, however, there is an opportunity to transfer knowledge between them. In order to do so, robots may learn the invariants of the individual tasks and environments. This task-independent knowledge can be employed to bias generalization when learning control, which reduces the need for real-world experimentation. We argue that knowledge transfer is essential if robots are to learn control with moderate learning times in complex scenarios. Two approaches to lifelong robot learning which both capture invariant knowledge about the robot and its environments are reviewed. Both approaches have been evaluated using a HERO-2000 mobile robot. Learning tasks included navigation in unknown indoor environments and a simple find-and-fetch task. (Technical Report IAI-TR-93-7, Univ. of Bonn, CS Dept.) ------------------------------------------------------------------------ AN APPROACH TO LEARNING ROBOT NAVIGATION Sebastian Thrun. Univ. of Bonn Designing robots that can learn by themselves to perform complex real-world tasks is still an open challenge for the fields of Robotics and Artificial Intelligence. In this paper we describe an approach to learning indoor robot navigation through trial-and-error. A mobile robot, equipped with visual, ultrasonic and infrared sensors, learns to navigate to a designated target object. In less than 10 minutes operation time, the robot is able to learn to navigate to a marked target object in an office environment. The underlying learning mechanism is the explanation-based neural network (EBNN) learning algorithm. 
EBNN initially learns functions from scratch using neural network representations. With increasing experience, EBNN employs domain knowledge to explain and to analyze training data in order to generalize in a knowledgeable way. (to appear in: Proceedings of the IEEE Conference on Intelligent Robots and Systems 1994) ------------------------------------------------------------------------ Postscript versions of both papers may be retrieved from Jordan Pollack's neuroprose archive by following the instructions below. unix> ftp archive.cis.ohio-state.edu ftp login name> anonymous ftp password> xxx at yyy.zzz ftp> cd pub/neuroprose ftp> bin ftp> get thrun.lifelong-learning.ps.Z ftp> get thrun.learning-robot-navg.ps.Z ftp> bye unix> uncompress thrun.lifelong-learning.ps.Z unix> uncompress thrun.learning-robot-navg.ps.Z unix> lpr thrun.lifelong-learning.ps unix> lpr thrun.learning-robot-navg.ps From chaps at inf.enst.fr Tue Feb 15 09:22:03 1994 From: chaps at inf.enst.fr (Cedric Chappelier) Date: Tue, 15 Feb 94 15:22:03 +0100 Subject: papers on time and neural networks (Correction) Message-ID: <9402151422.AA03059@ulysse.enst.fr.enst.fr> Yesterday we sent the following announcement. We want to make a little correction: the format of the paper can either be a Word file (as mentioned in the first mail) OR A LATEX FILE. > > As guest editors of a special issue of the Sigart Bulletin about : > > Time and Neural Networks > > we are looking for 4 articles about 10 pages each. > > Sigart is a quarterly publication of the Association for Computing > Machinery (ACM) special interest group on Artificial Intelligence. > > The paper may either deal with approaches to time processing using > traditional connectionist architectures, or with more specific models > integrating time in their basis. > > If you are interested, and if you can submit a paper (not already > published) within a short delay (about 1 month and a half), please send a > draft (if possible a Word file) : ^^^^^^^^^^^^^^^^^^^^^^^ OR A LATEX FILE > - preferably by giving ftp access to it (information via e-mail) > - or sending it as "attached file" on e-mail > - or posting a paper copy of it. > > Drafts should be received before April 1. > Notification of acceptance will be sent before April 20. > > grumbach at enst.fr or chaps at enst.fr > > Alain Grumbach and Cedric Chappelier > ENST dept INF > 46 rue Barrault > 75634 Paris Cedex 13 > France > > Sorry for the negligence. --- E-mail: chaps at inf.enst.fr || Cedric.Chappelier at enst.fr P-mail: Telecom Paris 46, rue Barrault - 75634 Paris cedex 13 From COTTRLL at FRMOP22.CNUSC.FR Tue Feb 15 18:42:00 1994 From: COTTRLL at FRMOP22.CNUSC.FR (COTTRELL) Date: Tue, 15 Feb 94 18:42 Subject: Available paper : Kohonen algorithm Message-ID: <"94-02-15-18:42:21.90*COTTRLL"@FRMOP22.CNUSC.FR> The following paper is available from anonymous ftp on archive.cis.ohio-state.edu (128.146.8.52) in directory pub/neuroprose as file cottrell.things.ps "Two or three things that we know about the Kohonen algorithm" 10 pages by Marie Cottrell, Jean-Claude Fort, Gilles Pages SAMOS, Universite Paris 1 90, rue de Tolbiac 75634 PARIS Cedex 13 FRANCE ABSTRACT Many theoretical papers have been published about the Kohonen algorithm. It is not easy to understand what exactly is proved, because of the great variety of mathematical methods. Despite all these efforts, many problems remain without solution. In this small review paper, we intend to sum up the situation.
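For readers who have not seen the algorithm being analyzed, the sketch below gives a minimal NumPy version of the standard one-dimensional Kohonen self-organizing map update: find the best-matching unit for each input, then pull that unit and its map neighbors towards the input with a shrinking neighborhood and a decaying learning rate. It is purely illustrative and not code from the paper; the map size, learning-rate and neighborhood schedules are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)

n_units, dim, n_steps = 20, 2, 5000
W = rng.random((n_units, dim))            # weight vector of each map unit
grid = np.arange(n_units)                 # positions on a 1-D map

for t in range(n_steps):
    x = rng.random(dim)                                   # training input
    winner = np.argmin(np.sum((W - x) ** 2, axis=1))      # best-matching unit
    eps = 0.5 * (1.0 - t / n_steps)                       # decaying learning rate
    sigma = 1.0 + 5.0 * (1.0 - t / n_steps)               # shrinking neighborhood width
    h = np.exp(-0.5 * ((grid - winner) / sigma) ** 2)     # neighborhood function
    W += eps * h[:, None] * (x - W)                       # move winner and neighbors towards x

print(np.round(W, 2))

After training, units that are neighbors on the map should have nearby weight vectors; proving when and how this self-organization and its convergence actually occur is the kind of question the review surveys.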
To appear in the Proceedings of ESANN 94, Bruxelles To retrieve >ftp archive.cis.ohio-state.edu name : anonymous password: (use your e-mail address) ftp> cd pub/neuroprose ftp> get cottrell.things.ps ftp> quit From platt at synaptics.com Tue Feb 15 20:13:14 1994 From: platt at synaptics.com (John Platt) Date: Tue, 15 Feb 94 17:13:14 PST Subject: Neuroprose paper available Message-ID: <9402160113.AA18442@synaptx.synaptics.com> ****** PAPER AVAILABLE VIA NEUROPROSE *************************************** ****** AVAILABLE VIA FTP ONLY *********************************************** ****** PLEASE DO NOT FORWARD TO OTHER MAILING LISTS OR BOARDS. ************** FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/wolf.address-block.ps.Z The following paper has been placed in the Neuroprose archives at Ohio State. The file is wolf.address-block.ps.Z . Only the electronic version of this paper is available. This paper is 8 pages in length. NOTE: The uncompressed postscript file is approximately 2.7 megabytes in length, so it may take a while to print out. Also, you may have to tell the lpr program to use a symbolic link to copy into the spool directory (lpr -s under SunOS). ----------------------------------------------------------------------------- Postal Address Block Location Using A Convolutional Locator Network Ralph Wolf and John C. Platt Synaptics, Inc. 2698 Orchard Parkway San Jose, CA 95134 ABSTRACT: This paper describes the use of a convolutional neural network to perform address block location on machine-printed mail pieces. Locating the address block is a difficult object recognition problem because there is often a large amount of extraneous printing on a mail piece and because address blocks vary dramatically in size and shape. We used a convolutional locator network with four outputs, each trained to find a different corner of the address block. A simple set of rules was used to generate ABL candidates from the network output. The system performs very well: when allowed five guesses, the network will tightly bound the address delivery information in 98.2% of the cases. ----------------------------------------------------------------------------- John Platt platt at synaptics.com From terry at salk.edu Tue Feb 15 22:44:00 1994 From: terry at salk.edu (Terry Sejnowski) Date: Tue, 15 Feb 94 19:44:00 PST Subject: Telluride Workshops Message-ID: <9402160344.AA25170@salk.edu> CALL FOR PARTICIPATION IN TWO WORKSHOPS ON "NEUROMORPHIC ENGINEERING" JULY 3 - 9, 1994 AND JULY 10 - 16, 1994 TELLURIDE, COLORADO Christof Koch (Caltech) and Terry Sejnowski (Salk Institute/UCSD) invite applications for two different workshops that will be held in Telluride, Colorado in July 1994. Travel and housing expenses will be provided for ten to twenty active researchers for each workshop. Deadline for application is March 10, 1994. GOALS: Carver Mead has introduced the term "Neuromorphic Engineering" for a new field based on the design and fabrication of artificial neural systems, such as vision systems, head-eye systems, and roving robots, whose architecture and design principles are based on those of biological nervous systems. The goal of these workshops is to bring together young investigators and more established researchers from academia with their counterparts in industry and national laboratories, working on both neurobiological as well as engineering aspects of sensory systems and sensory-motor integration. 
The focus of the workshop will be on ``active" participation, with demonstration systems and hands-on-experience for all participants. Neuromorphic engineering has a wide range of applications from nonlinear adaptive control of complex systems to the design of smart sensors. Many of the fundamental principles in this field, such as the use of learning methods and the design of parallel hardware, are inspired by biological systems. However, existing applications are modest and the challenge of scaling up from small artificial neural networks and designing completely autonomous systems at the levels achieved by biological systems lies ahead. The assumption underlying these workshops is that the next generation of neuromorphic systems would benefit from closer attention to the principles found through experimental and theoretical studies of brain systems. WORKSHOPS: NEUROMORPHIC ANALOG VLSI SYSTEMS Sunday, July 3 to Saturday, July 9, 1994 Organized by Rodney Douglas (Oxford), Misha Mahowald (Oxford) and Stephen Lisberger (UCSF). The goal of this week is to bring together biologists and engineers who are interested in exploring neuromorphic systems through the medium of analog VLSI. The workshop will cover methods for the design and fabrication of multi-chip neuromorphic systems. This framework is suitable both for creating analogs of specific biological systems, which can serve as a modeling environment for biologists, and as a tool for engineers to create cooperative circuits based on biological principles. The workshop will provide the community with a common formal language for describing neuromorphic systems. Equipment will be present for participants to evaluate existing neuromorphic chips (including silicon retina, silicon neurons, oculomotor system). SYSTEMS LEVEL MODELS OF VISUAL BEHAVIOR Sunday, July 10 to Saturday, July 16, 1994 Organized by Dana Ballard (Rochester) and Richard Andersen (Caltech). The goal of this week is to bring together biologists and engineers who are interested in systems level modeling of visual behaviors and their interactions with the motor systems. Sessions will cover issues of sensory-motor integration in the mammalian brain. Special emphasis will be placed on understanding neural algorithms used by the brain which can provide insights into constructing electrical circuits which can accomplish similar tasks. Issues to be covered will include spatial localization and constancy, attention, motor planning, eye movements, and the use of visual motion information for motor control. Two or three prominent neuroscientists will be invited to give lectures on the above subjects. These researchers will also be asked to bring their own demonstrations, classroom experiments, and software for computer models. Demonstrations include recording eye movements and simple eye movement psychophysical experiments, neural network models for coordinate transformations and the representation of space, visual attention psychophysical experiments. Participants can conduct their own experiments using the Virtual Reality equipment. FORMAT: Time in both workshops will be divided between planned presentation, free interaction, and contributed material. Each day will consist of a lecture in the morning that covers the theory behind the hands-on investigation in the afternoon. Following each lecture, there will be a demonstration that introduces participants to the equipment that will be available in the afternoon session. 
Participants will be free to explore and play with whatever they choose in the afternoon. Participants are encouraged to bring their own material to share with others. After dinner, time for participants to provide an informal lecture/demonstration is reserved. LOCATION AND ARRANGEMENTS: The two workshops will take place at the "Telluride Summer Research Center," located in the small town of Telluride, 9000 feet high in Southwest Colorado, about 6 hours away from Denver (350 miles) and 4 hours from Aspen. Continental and United Airlines provide many daily flights directly into Telluride. Participants will be housed in shared condominiums, within walking distance of the Center. The workshop is intended to be very informal and hands-on. Participants are not required to have had previous experience in analog VLSI circuit design, computational or machine vision, systems level neurophysiology or modeling the brain at the systems level. However, we strongly encourage active researchers with relevant backgrounds from academia, industry and national laboratories to apply, in particular if they are prepared to talk about their work or to bring demonstrators to Telluride (e.g. robots, chips, software). We expect to be able to pay for shipping necessary equipment to Telluride and will have at least three technical staff present throughout both workshops to assist us with software and hardware problems. We will have a network of SUN workstations running UNIX and connected to the Internet at the Center available to us. All domestic travel and housing expenses will be provided. Participants are expected to pay for food and incidental expenses. HOW TO APPLY: The deadline for receipt of applications is March 10, 1994. Applicants should be at the level of graduate students or above (i.e. postdoctoral fellows, faculty, research and engineering staff and the equivalent positions in industry and national laboratories). We actively encourage qualified women and minority candidates to apply. Each participant can apply for only one workshop and the application should include: 1. Name, address, telephone, e-mail, FAX, and minority status (optional). 2. Resume. 3. One page summary of background and interests relevant to the workshop. 4. Description of special equipment needed for demonstrations. 5. Two letters of recommendation. Complete applications should be sent to: Prof. Terrence Sejnowski The Salk Institute Post Office Box 85800 San Diego, CA 92186-5800 Applicants will be notified by April 15, 1994. From venu at pixel.mipg.upenn.edu Wed Feb 16 17:28:00 1994 From: venu at pixel.mipg.upenn.edu (Venugopal) Date: Wed, 16 Feb 94 17:28:00 EST Subject: Paper available on ftp Message-ID: <9402162228.AA00373@pixel.mipg.upenn.edu> *** PLEASE DO NOT FORWARD TO OTHER GROUPS *** Preprint of the following paper (to appear in Circuits, Systems and Signal Processing) is available via ftp from the neuroprose archive: AN IMPROVED SCHEME FOR THE DIRECT ADAPTIVE CONTROL OF DYNAMICAL SYSTEMS USING BACKPROPAGATION NEURAL NETWORKS K. P. Venugopal, R. Sudhakar and A. S. Pandya Department of Electrical Eng. Department of Computer Science and Eng. Florida Atlantic University Abstract: This paper presents an improved direct control architecture for the on-line learning control of dynamical systems using backpropagation neural networks. The proposed architecture is compared with other direct control schemes.
In the present scheme, the neural network interconnection strengths are updated based on the output error of the dynamical system directly, rather than using a transformed version of the error employed in other schemes. The ill effects of the controlled dynamics on the on-line updating of the network weights are moderated by including a compensating gain layer. An error feedback is introduced to improve the dynamic response of the control system. Simulation studies are performed using the nonlinear dynamics of an underwater vehicle and the promising results support the effectiveness of the proposed scheme. ----------------------------------------- The file at archive.cis.ohio-state.edu is venugopal.css.ps.Z (34 pages) To ftp the file: unix> ftp archive.cis.ohio-state.edu Name (archive.cis.ohio-state.edu:xxxxx): anonymous Password: your address ftp> cd pub/neuroprose ftp> binary ftp> get venugopal.css.ps.Z Uncompress the file after transferring to your machine. unix> uncompress venugopal.css.ps.Z ________________________________________________________________ K. P. Venugopal Medical Image Processing Group University of Pennsylvania 423 Blockley Hall Philadelphia, PA 19104 (venu at pixel.mipg.upenn.edu) From anandan at sarnoff.com Wed Feb 16 09:22:51 1994 From: anandan at sarnoff.com (P. Anandan x3249) Date: Wed, 16 Feb 94 09:22:51 EST Subject: outlier, robust statistics In-Reply-To: <9402150756.AA17907@salk.edu> (message from Terry Sejnowski on Mon, 14 Feb 94 23:56:04 PST) Message-ID: <9402161422.AA13890@peanut.sarnoff.com> Hi Terry, It may be worth mentioning that a simple extension of your "fixed velocity" formulation leads to something quite powerful and is a decent approximation for many real situations. This is to formulate the hypothesis space as 2-D affine transforms of the image plane. Most of the references below have not used robust estimators but have focussed on the layered representation problem. However, recent extensions of all these algorithms at Sarnoff have included several different types of robust estimators as options. One noteworthy omission (simply because I have not yet updated my bib file) is the paper by Black and Jepson, CVPR93. I also did not include the paper by Wang and Adelson at CVPR93, because that can be viewed as falling into either category (affine hypotheses or object hypotheses). In general, when you use a parametric motion model (translation, affine, 8-parameter quadratic for planar surface motion), you have the choice of working with motion parameters as hypotheses or the objects as hypotheses. But if you are working with non-parametric motion fields (e.g., smooth flow), it is not obvious how to work with motion parameters as hypotheses. Last but not least, I should mention a recent paper that we have written, currently under review, that goes beyond parametric layers to include residual flow to fully account for the scene motion. This is an alternative approach to the standard formulation of the spatial-coherence assumption as a "smoothness" constraint (e.g., minimum quadratic variation, etc.). This paper also describes a computational framework that identifies the critical choice points for layered motion estimation and shows how different algorithms fit into that framework. I should be in a position to send you a copy of the paper in a couple of weeks or so. -- anandan @article{Irani-Peleg:IJCV, author = {M. Irani and S.
Peleg}, title = {Computing Occluding and Transparent Motions}, journal = IJCV, year = {accepted for publication, 1993}, } @inproceedings{Bergen-etal:AICV91, author = {J.R. Bergen and P.J. Burt and K. Hanna and R. Hingorani and P. Jeanne and S. Peleg}, title = {Dynamic Multiple-Motion Computation}, booktitle = {Artificial Intelligence and Computer Vision: Proceedings of the Israeli Conference}, publisher = {Elsevier}, editor = {Y.A. Feldman and A. Bruckstein}, year = {1991}, pages = {147--156} } @inproceedings{Burt-etal:WVM89, title = {Object tracking with a moving camera, an application of dynamic motion analysis}, author ={P.J. Burt and J.R. Bergen and R. Hingorani and R. Kolczynski and W.A. Lee and A. Leung and J. Lubin and H. Shvaytser}, booktitle = WVM, address = {Irvine, CA}, month = {March}, year = {1989}, pages = {2--12} } @article{Bergen-etal:PAMI92, author = {J.R. Bergen and P.J. Burt and R. Hingorani and S. Peleg}, title = {A Three Frame Algorithm for Estimating Two-Component Image Motion}, journal = PAMI, month = {September}, year = {1992}, volume = {14}, pages = {886--896} } From M.Cooke at dcs.shef.ac.uk Wed Feb 16 09:22:17 1994 From: M.Cooke at dcs.shef.ac.uk (Martin Cooke) Date: Wed, 16 Feb 94 14:22:17 GMT Subject: missing values Message-ID: <9402161427.AA10510@dcs.shef.ac.uk> I've only just seen the discussion on missing values, so forgive this late response. The issue of training the Kohonen self-organising feature map with partial data is covered in Samad & Harp (1992) Self-organisation with partial data Network, 3, 205-212. Essentially, weight changes are restricted to the subspace of available data. Samad & Harp report three experiments using partial training data, and demonstrate that performance is essentially unchanged up to about 60% missing data. This is presumably due to the n -> 2 dimensionality reduction. We recently applied this result to training a speech recogniser on partial data, and got similar results [tech. rep. in preparation]. We're coming at this from the field of auditory scene analysis, where the result of source segregation is an inherently partial description of one or other source. I'd be happy to supply further details on request. Martin Cooke Computer Science Sheffield University UK From mmoller at daimi.aau.dk Wed Feb 16 11:10:00 1994 From: mmoller at daimi.aau.dk (Martin Fodslette M|ller) Date: Wed, 16 Feb 1994 17:10:00 +0100 Subject: copy of thesis. Message-ID: <199402161610.AA28147@titan.daimi.aau.dk> To all that have requested a copy of my thesis (and apologies to those that did not for sending this message). Thank you all for your interest in my thesis. Since so many have requested a copy (about 200), I will not be able to answer you all separately right now. Please accept my apologies. You will all receive a copy of the thesis in a few weeks. Best Regards -martin ---------------------------------------------------------------- Martin Moller email: mmoller at daimi.aau.dk Computer Science Dept. Fax: +45 8942 3255 Aarhus University Phone: +45 8942 3371 Ny Munkegade, Build. 540, DK-8000 Aarhus C, Denmark ---------------------------------------------------------------- From venu at pixel.mipg.upenn.edu Wed Feb 16 17:15:31 1994 From: venu at pixel.mipg.upenn.edu (Venugopal) Date: Wed, 16 Feb 94 17:15:31 EST Subject: Thesis available on ftp Message-ID: <9402162215.AA00370@pixel.mipg.upenn.edu> The following thesis is available on ftp from neuroprose archive: LEARNING IN CONNECTIONIST NETWORKS USING THE ALOPEX ALGORITHM K. P. 
Venugopal Florida Atlantic University Abstract: The ALOPEX algorithm is presented as a `universal' learning algorithm for connectionist models. It is shown that the ALOPEX procedure can be used efficiently as a supervised learning algorithm for such models. The algorithm is demonstrated successfully on a variety of network architectures. Such architectures include multi-layered perceptrons, time-delay models, asymmetric fully recurrent networks and memory neurons. The learning performance and generalization capability of the ALOPEX algorithm are compared with those of the backpropagation procedure on a number of benchmark problems, and it is shown that ALOPEX has specific advantages. Results on the MONKS problems are the best reported ones so far. Two new architectures are proposed for the on-line, direct adaptive control of dynamical systems using neural networks. The proposed schemes are shown to provide better response and tracking characteristics than other existing direct control schemes. A velocity reference scheme is introduced to improve the dynamic response of on-line learning controllers. The proposed learning algorithm and architectures are also studied on three practical problems: (i) classification of handwritten digits using Fourier descriptors, (ii) recognition of underwater targets from sonar returns, considering temporal dependencies of consecutive returns, and (iii) on-line learning control of autonomous underwater vehicles, starting from random initial conditions. Detailed studies are conducted on the learning control applications. Also, the ability of the neural network controllers to adapt to slowly and suddenly varying parameter disturbances and measurement noise is studied in detail. --------------------- Some of the related papers: K. P. Venugopal, A. S. Pandya and R. Sudhakar, 'A recurrent neural network controller and learning algorithm for the on-line learning control of autonomous underwater vehicles', to appear in Neural Networks (1994) K. P. Venugopal, R. Sudhakar and A. S. Pandya, 'On-line learning control of autonomous underwater vehicles using feedforward neural networks', IEEE Journal of Oceanic Engineering, vol. 17 (1992) K. P. Venugopal, R. Sudhakar and A. S. Pandya, 'An improved scheme for the direct adaptive control of dynamical systems using backpropagation neural networks', to appear in Circuits, Systems and Signal Processing (1994) K. P. Venugopal and S. M. Smith, 'Improving the dynamic response of neural network controllers using velocity reference feedback', IEEE Trans. on Neural Networks, vol. 4 (1993) K. P. Unnikrishnan and K. P. Venugopal, 'Alopex: a correlation based learning algorithm for feedforward and feedback neural networks', to appear in Neural Computation, vol. 6 (1994) A. S. Pandya and K. P. Venugopal, 'A stochastic parallel algorithm for learning in neural networks', to appear in IEICE Transactions on Information Processing (1994) ----------------------------------------- The files at archive.cis.ohio-state.edu are venugopal.thesis1.ps.Z venugopal.thesis2.ps.Z venugopal.thesis3.ps.Z venugopal.thesis4.ps.Z venugopal.thesis5.ps.Z venugopal.thesis6.ps.Z venugopal.thesis7.ps.Z (total 200 pages) To ftp the files: unix> ftp archive.cis.ohio-state.edu Name (archive.cis.ohio-state.edu:xxxxx): anonymous Password: your address ftp> cd pub/neuroprose/Thesis ftp> binary ftp> mget venugopal.thesis* Uncompress the files after transferring to your machine.
unix> uncompress venugopal* ------------------------------------------------- K. P. Venugopal Medical Image Processing Group University of Pennsylvania 423 Blockley Hall Philadelphia, PA 19104 (venu at pixel.mipg.upenn.edu) From minton at ptolemy.arc.nasa.gov Wed Feb 16 21:03:21 1994 From: minton at ptolemy.arc.nasa.gov (Steve Minton) Date: Wed, 16 Feb 94 18:03:21 PST Subject: JAIR article Message-ID: <9402170203.AA27856@ptolemy.arc.nasa.gov> Readers of this newsgroup may be interested in the following article, which was recently published in the Journal of Artificial Intelligence Research: Ling, C.X. (1994) "Learning the Past Tense of English Verbs: The Symbolic Pattern Associator vs. Connectionist Models", Volume 1, pages 209-229 Postscript: volume1/ling94a.ps (247K) Online Appendix: volume1/ling-appendix.Z (109K) data file, compressed Abstract: Learning the past tense of English verbs - a seemingly minor aspect of language acquisition - has generated heated debates since 1986, and has become a landmark task for testing the adequacy of cognitive modeling. Several artificial neural networks (ANNs) have been implemented, and a challenge for better symbolic models has been posed. In this paper, we present a general-purpose Symbolic Pattern Associator (SPA) based upon the decision-tree learning algorithm ID3. We conduct extensive head-to-head comparisons on the generalization ability between ANN models and the SPA under different representations. We conclude that the SPA generalizes the past tense of unseen verbs better than ANN models by a wide margin, and we offer insights as to why this should be the case. We also discuss a new default strategy for decision-tree learning algorithms. JAIR's server can be accessed by WWW, FTP, gopher, or automated email. For further information, check out our WWW server (URL is gopher://p.gp.cs.cmu.edu/) or one of our FTP sites (/usr/jair/pub at p.gp.cs.cmu.edu), or send email to jair at cs.cmu.edu with the subject AUTORESPOND and the message body HELP. From COTTRLL at FRMOP22.CNUSC.FR Thu Feb 17 10:04:00 1994 From: COTTRLL at FRMOP22.CNUSC.FR (COTTRELL) Date: Thu, 17 Feb 94 10:04 Subject: Paper available Message-ID: <"94-02-17-10:04:06.72*COTTRLL"@FRMOP22.CNUSC.FR> Dear connectionists, Some people report that they cannot retrieve the paper cottrell.things.ps that I put in the neuroprose archive some days ago. I will try to solve the problem as soon as possible. Please wait a little before trying again. Yours sincerely Marie Cottrell SAMOS Universite Paris 1 90, rue de Tolbiac F-75634 PARIS 13 FRANCE E-mail : cottrll at frmop22.cnusc.fr From COTTRLL at FRMOP22.CNUSC.FR Thu Feb 17 19:54:00 1994 From: COTTRLL at FRMOP22.CNUSC.FR (COTTRELL) Date: Thu, 17 Feb 94 19:54 Subject: Paper available : Kohonen algorithm Message-ID: <"94-02-17-19:54:08.03*COTTRLL"@FRMOP22.CNUSC.FR> Dear connectionists, The problem that some of you encounter in retrieving the paper "Two or three..." (file cottrell.things.ps in the neuroprose repository) comes from a change in its name: it is now cottrell.things.ps.Z in pub/neuroprose on archive.cis.ohio-state.edu. It has been compressed. Sorry for the delay. Yours sincerely Marie Cottrell From reza at ai.mit.edu Thu Feb 17 09:03:53 1994 From: reza at ai.mit.edu (Reza Shadmehr) Date: Thu, 17 Feb 94 09:03:53 EST Subject: Tech reports from CBCL at MIT Message-ID: <9402171403.AA02835@corpus-callosum> Hello, Following is a list of recent technical reports from the Center for Biological and Computational Learning at M.I.T.
These reports are available via anonymous ftp. (see end of this message for details) -------------------------------- :CBCL Paper #78/AI Memo #1405 :author Amnon Shashua :title On Geometric and Algebraic Aspects of 3D Affine and Projective Structures from Perspective 2D Views :date July 1993 :pages 14 :keywords visual recognition, structure from motion, projective geometry, 3D reconstruction We investigate the differences --- conceptually and algorithmically --- between affine and projective frameworks for the tasks of visual recognition and reconstruction from perspective views. It is shown that an affine invariant exists between any view and a fixed view chosen as a reference view. This implies that for tasks for which a reference view can be chosen, such as in alignment schemes for visual recognition, projective invariants are not really necessary. We then use the affine invariant to derive new algebraic connections between perspective views. It is shown that three perspective views of an object are connected by certain algebraic functions of image coordinates alone (no structure or camera geometry needs to be involved). -------------- :CBCL Paper #79/AI Memo #1390 :author Jose L. Marroquin and Federico Girosi :title Some Extensions of the K-Means Algorithm for Image Segmentation and Pattern Classification :date January 1993 :pages 21 :keywords K-means, clustering, vector quantization, segmentation, classification We present some extensions to the k-means algorithm for vector quantization that permit its efficient use in image segmentation and pattern classification tasks. We show that by introducing a certain set of state variables it is possible to find the representative centers of the lower dimensional manifolds that define the boundaries between classes; this permits one, for example, to find class boundaries directly from sparse data or to efficiently place centers for pattern classification. The same state variables can be used to determine adaptively the optimal number of centers for clouds of data with space-varying density. Some examples of the application of these extensions are also given. -------------- :CBCL Paper #80/AI Memo #1431 :title Example-Based Image Analysis and Synthesis :author David Beymer, Amnon Shashua and Tomaso Poggio :date November, 1993 :pages 21 :keywords computer graphics, networks, computer vision, teleconferencing, image compression, computer interfaces Image analysis and graphics synthesis can be achieved with learning techniques using directly image examples without physically-based, 3D models. In our technique: 1) the mapping from novel images to a vector of ``pose'' and ``expression'' parameters can be learned from a small set of example images using a function approximation technique that we call an analysis network; 2) the inverse mapping from input ``pose'' and ``expression'' parameters to output images can be synthesized from a small set of example images and used to produce new images using a similar synthesis network. The techniques described here have several applications in computer graphics, special effects, interactive multimedia and very low bandwidth teleconferencing. -------------- :CBCL Paper #81/AI Memo #1432 :title Conditions for Viewpoint Dependent Face Recognition :author Philippe G. Schyns and Heinrich H. 
B\"ulthoff :date August 1993 :pages 6 :keywords face recognition, RBF networks, symmetry Face recognition stands out as a singular case of object recognition: although most faces are very much alike, people discriminate between many different faces with outstanding efficiency. Even though little is known about the mechanisms of face recognition, viewpoint dependence, a recurrent characteristic of much research on faces, could inform algorithms and representations. Poggio and Vetter's symmetry argument predicts that learning only one view of a face may be sufficient for recognition, if this view allows the computation of a symmetric, "virtual," view. More specifically, as faces are roughly bilaterally symmetric objects, learning a side-view---which always has a symmetric view---should give rise to better generalization performance than learning the frontal view. It is also predicted that among all new views, a virtual view should be best recognized. We ran two psychophysical experiments to test these predictions. Stimuli were views of 3D models of laser-scanned faces. Only shape was available for recognition; all other face cues---texture, color, hair, etc.---were removed from the stimuli. The first experiment tested which single views of a face give rise to the best generalization performance. The results were compatible with the symmetry argument: face recognition from a single view is always better when the learned view allows the computation of a symmetric view. -------------- :CBCL Paper #82/AI Memo #1437 :author Reza Shadmehr and Ferdinando A. Mussa-Ivaldi :title Geometric Structure of the Adaptive Controller of the Human Arm :date July 1993 :pages 34 :keywords Motor learning, reaching movements, internal models, force fields, virtual environments, generalization, motor control The objects with which the hand interacts may significantly change the dynamics of the arm. How does the brain adapt control of arm movements to this new dynamics? We show that adaptation is via composition of a model of the task's dynamics. By exploring the generalization capabilities of this adaptation we infer some of the properties of the computational elements with which the brain formed this model: the elements have broad receptive fields and encode the learned dynamics as a map structured in an intrinsic coordinate system closely related to the geometry of the skeletomusculature. The low-level nature of these elements suggests that they may represent a set of primitives with which movements are represented in the CNS. -------------- :CBCL Paper #83/AI Memo #1440 :author Michael I. Jordan and Robert A. Jacobs :title Hierarchical Mixtures of Experts and the EM Algorithm :date August 1993 :pages 29 :keywords supervised learning, statistics, decision trees, neural networks We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
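To make the EM procedure concrete, the sketch below implements a single-level (non-hierarchical) mixture of linear-Gaussian experts with a softmax gate in NumPy; it is not the authors' code, the hierarchical case and the full GLIM generality are omitted, the gate is updated with one gradient step per iteration instead of an inner IRLS loop, and all sizes, step sizes and variable names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

# toy data: two linear regimes, selected by the sign of x
N, K = 400, 2
x = rng.uniform(-1, 1, size=N)
y = np.where(x < 0, 2.0 * x + 1.0, -1.5 * x) + 0.05 * rng.standard_normal(N)
X = np.column_stack([x, np.ones(N)])          # inputs with a bias column

W = rng.standard_normal((K, 2))               # expert regression weights
V = np.zeros((K, 2))                          # gating network weights
s2 = np.ones(K)                               # expert noise variances

for it in range(50):
    # E-step: posterior responsibility of each expert for each data point
    G = np.exp(X @ V.T); G /= G.sum(1, keepdims=True)               # gate outputs
    M = X @ W.T                                                     # expert means
    lik = np.exp(-0.5 * (y[:, None] - M) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
    H = G * lik
    H /= H.sum(1, keepdims=True)

    # M-step: responsibility-weighted least squares and noise update per expert
    for k in range(K):
        D = H[:, k]
        W[k] = np.linalg.solve(X.T @ (D[:, None] * X), X.T @ (D * y))
        s2[k] = np.sum(D * (y - X @ W[k]) ** 2) / D.sum()

    # (generalized) M-step for the gate: one gradient step on the expected log-likelihood
    V += 0.5 * ((H - G).T @ X) / N

print("expert weights (slope, intercept):", np.round(W, 2))

With data like this, the two experts should converge towards the two underlying linear regimes and the gate should learn to switch between them as a function of the input.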
-------------- :CBCL Paper #84/AI Memo #1441 :title On the Convergence of Stochastic Iterative Dynamic Programming Algorithms :author Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh :date August 1993 :pages 15 :keywords reinforcement learning, stochastic approximation, convergence, dynamic programming Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong. -------------- :CBCL Paper #86/AI Memo #1449 :title Formalizing Triggers: A Learning Model for Finite Spaces :author Partha Niyogi and Robert Berwick :pages 14 :keywords language learning, parameter systems, Markov chains, convergence times, computational learning theory :date November 1993 In a recent seminal paper, Gibson and Wexler (1993) take important steps toward formalizing the notion of language learning in a (finite) space whose grammars are characterized by a finite number of {\it parameters\/}. They introduce the Triggering Learning Algorithm (TLA) and show that even in finite space convergence may be a problem due to local maxima. In this paper we explicitly formalize learning in finite parameter space as a Markov structure whose states are parameter settings. We show that this captures the dynamics of TLA completely and allows us to explicitly compute the rates of convergence for TLA and other variants of TLA, e.g. random walk. Also included in the paper are a corrected version of GW's central convergence proof, a list of ``problem states'' in addition to local maxima, and batch and PAC-style learning bounds for the model. -------------- :CBCL Paper #87/AI Memo #1458 :title Convergence Results for the EM Approach to Mixtures of Experts Architectures :author Michael Jordan and Lei Xu :pages 33 :date September 1993 The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs (1993) recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its search direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments. -------------- :CBCL Paper #89/AI Memo #1461 :title Face Recognition under Varying Pose :author David J.
Beymer :pages 14 :date December 1993 :keywords computer vision, face recognition, facial feature detection, template matching While researchers in computer vision and pattern recognition have worked on automatic techniques for recognizing faces for the last 20 years, most systems specialize in frontal views of the face. We present a face recognizer that works under varying pose, the difficult part of which is to handle face rotations in depth. Building on successful template-based systems, our basic approach is to represent faces with templates from multiple model views that cover different poses from the viewing sphere. Our system has achieved a recognition rate of 98% on a data base of 62 people containing 10 testing and 15 modelling views per person. -------------- :CBCL Paper #90/AI Memo #1452 :title Algebraic Functions for Recognition :author Amnon Shashua :pages 11 :date January 1994 In the general case, a trilinear relationship between three perspective views is shown to exist. The trilinearity result is shown to be of much practical use in visual recognition by alignment --- yielding a direct method that cuts through the computations of camera transformation, scene structure and epipolar geometry. The proof of the central result may be of further interest as it demonstrates certain regularities across homographies of the plane and introduces new view invariants. Experiments on simulated and real image data were conducted, including a comparative analysis with epipolar intersection and the linear combination methods, with results indicating a greater degree of robustness in practice and a higher level of performance in re-projection tasks. ============================ How to get a copy of a report: The files are in compressed postscript format and are named by their AI memo number. They are put in a directory named after the year in which the paper was written. Here is the procedure for ftp-ing: unix> ftp publications.ai.mit.edu (128.52.32.22, log-in as anonymous) ftp> cd ai-publications/1993 ftp> binary ftp> get AIM-number.ps.Z ftp> quit unix> zcat AIM-number.ps.Z | lpr Best wishes, Reza Shadmehr Center for Biological and Computational Learning M. I. T. Cambridge, MA 02139 From mel at klab.caltech.edu Thu Feb 17 21:00:32 1994 From: mel at klab.caltech.edu (Bartlett Mel) Date: Thu, 17 Feb 94 18:00:32 PST Subject: NIPS*94 Call for Workshops Message-ID: <9402180200.AA20549@plato.klab.caltech.edu> CALL FOR PROPOSALS NIPS*94 Post-Conference Workshops December 2 and 3, 1994 Vail, Colorado Following the regular program of the Neural Information Processing Systems 1994 conference, workshops on current topics in neural information processing will be held on December 2 and 3, 1994, in Vail, Colorado. Proposals by qualified individuals interested in chairing one of these workshops are solicited. Past topics have included: active learning and control, architectural issues, attention, Bayesian analysis, benchmarking neural network applications, computational complexity issues, computational neuroscience, fast training techniques, genetic algorithms, music, neural network dynamics, optimization, recurrent nets, rules and connectionist models, self-organization, sensory biophysics, speech, time series prediction, vision and audition, implementations, and grammars. The goal of the workshops is to provide an informal forum for researchers to discuss important issues of current interest.
Sessions will meet in the morning and in the afternoon of both days, with free time in between for ongoing individual exchange or outdoor activities. Concrete open and/or controversial issues are encouraged and preferred as workshop topics. Representation of alternative viewpoints and panel-style discussions are particularly encouraged. Individuals proposing to chair a workshop will have responsibilities including: 1) arranging short informal presentations by experts working on the topic, 2) moderating or leading the discussion and reporting its high points, findings, and conclusions to the group during evening plenary sessions (the ``gong show''), and 3) writing a brief summary. Submission Procedure: Interested parties should submit a short proposal for a workshop of interest postmarked by May 21, 1994. (Express mail is not necessary. Submissions by electronic mail will also be accepted.) Proposals should include a title, a description of what the workshop is to address and accomplish, the proposed length of the workshop (one day or two days), and the planned format. It should motivate why the topic is of interest or controversial, why it should be discussed and what the targeted group of participants is. In addition, please send a brief resume of the prospective workshop chair, a list of publications and evidence of scholarship in the field of interest. Mail submissions to: Todd K. Leen, NIPS*94 Workshops Chair Department of Computer Science and Engineering Oregon Graduate Institute of Science and Technology P.O. Box 91000 Portland Oregon 97291-1000 USA (e-mail: tleen at cse.ogi.edu) Name, mailing address, phone number, fax number, and e-mail net address should be on all submissions. PROPOSALS MUST BE POSTMARKED BY MAY 21, 1994 Please Post From scheler at informatik.tu-muenchen.de Fri Feb 18 11:10:21 1994 From: scheler at informatik.tu-muenchen.de (Gabriele Scheler) Date: Fri, 18 Feb 1994 17:10:21 +0100 Subject: TR announcement: Adaptive Distance Measures Message-ID: <94Feb18.171027met.42273@papa.informatik.tu-muenchen.de> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/scheler.adaptive.ps.Z The file scheler.adaptive.ps.Z is now available for copying from the Neuroprose repository: Pattern Classification with Adaptive Distance Measures Gabriele Scheler Technische Universit"at M"unchen (25 pages) also available as Report FKI-188-94 from Institut f"ur Informatik TU M"unchen D 80290 M"unchen ftp-host: flop.informatik.tu-muenchen.de ftp-file: pub/fki/fki-188-94.ps.gz ABSTRACT: In this paper, we want to explore the notion of learning the classification of patterns from examples by synthesizing distance functions. A working implementation of a distance classifier is presented. Its operation is illustrated with the problem of classification according to parity (highly non-linear) and a classification of feature vectors which involves dimension reduction (a linear problem). A solution to these problems is sought in two steps: (a) a parametrized distance function (called a `distance function scheme') is chosen, (b) setting parameters to values according to the classification of training patterns results in a specific distance function. This induces a classification on all remaining patterns. The general idea of this approach is to find restricted functional shapes in order to model certain cognitive functions of classification exactly, i.e. 
performing classifications that occur as well as excluding classifications that do not naturally occur and may even be experimentally proven to be excluded from learnability by a living organism. There are also certain technical advantages in using restricted function shapes and simple learning rules, such as reducing learning time, generating training sets and individual patterns to set certain parameters, determining the learnability of a specific problem with a given function scheme or providing additions to functions for individual exceptions, while retaining the general shape for generalization. From soller at asylum.cs.utah.edu Fri Feb 18 19:13:34 1994 From: soller at asylum.cs.utah.edu (Jerome Soller) Date: Fri, 18 Feb 94 17:13:34 -0700 Subject: 2nd An. Utah Workshop on the Applicat. of Intelligent and Adap. Systems Message-ID: <9402190013.AA09689@asylum.cs.utah.edu> ------------------------------------------------ 2nd Annual Utah Workshop on: "Applications of Intelligent and Adaptive Systems" Sponsored by: The University of Utah Cognitive Science Industrial Advisory Board and The Joint Services Software Technology Conference '94 -------------------------------------------------- Date: April 15, 1994 Time: 8:00 a.m.-2:30 p.m. Cost: contact Jerome Soller or Dale Sanders for the cost for non-conference attendees, free for conference attendees Location: Salt Lake City Marriott, Salon E, 75 South and West Temple -------------------------------------------------- Talk 1: "The Use of Genetic Algorithms and Neural Networks in the Automatic Interpretation of Medical Images", Dr. Charles Rosenberg Research Investigator, VA Geriatric, Research, Education, and Clinical Center and Adjunct Assistant Professor, Department of Psychology, University of Utah (crr at cogsci.psych.utah.edu) ((801) 582-1565, x-2458) -------------------------------------------------- Talk 2: "A Hybrid On-line Handwriting Recognition System" Dr. Nicholas S. Flann. Assistant Professor, Computer Science Department, Utah State University. (flann at nick.cs.usu.edu) ((801) 750-2451) -------------------------------------------------- Talk 3: "Prototyping Activities in Robotics, Control, and Manufacturing" Dr. Tarek M. Sobh Research Assistant Professor Computer Science Department University of Utah (sobh at wingate.cs.utah.edu) ((801) 585-5047) -------------------------------------------------- Talk 4: "Software Architecture and Unmanned Ground Vehicles" Dr. David Morgenthaler Program Manager Sarcos Research Corporation Salt Lake City, UT (David_Morgenthaler at ced.utah.edu) ((801) 581-0155) -------------------------------------------------- Lunch Break: 11:45 a.m.-12:45 p.m. -------------------------------------------------- Talk 5: "Use of Decision Support in a Hospital Information System" Dr. Allan Pryor Professor of Medical Informatics University of Utah and Assistant Vice President of Informatics Intermountain Health Care Salt Lake City UT (tapryor at cc.utah.edu) ((801) 321-2128) -------------------------------------------------- Talk 6: "Applications of Neural Networks in Critical Care Monitoring" Dr. 
Joe Orr Research Instructor Department of Anesthesiology University of Utah (jorr at soma.med.utah.edu) ((801) 581-6393) -------------------------------------------------- Pre-registration required; For registration, copies of the abstracts, or references for publications relating to these talks, please contact: Jerome Soller, Veterans Affairs Medical Center and University of Utah Computer Science (801) 582-1565, ext 2469; (801) 581-7977 soller at cs.utah.edu or Dale Sanders, TRW Inc., Ogden Engineering Services (801) 625-8343 dale_sanders at oz.bmd.trw.com -------------------------------------------------- We wish to thank the following for their support of this workshop: Applied Information and Management Systems, Inc.; Intermountain Health Care; The Joint Services Software Technology Conference; Salt Lake Veterans Affairs Geriatric Research, Education, and Clinical Center; Sarcos Corporation; 3M Health Information Systems; TRW Systems Integration Group; University of Utah Departments of Computer Science, Medical Informatics, and Physiology; Utah Information Technology Association From judd at scr.siemens.com Fri Feb 18 21:31:24 1994 From: judd at scr.siemens.com (Stephen Judd) Date: Fri, 18 Feb 1994 21:31:24 -0500 Subject: Optimal Stopping Time paper Message-ID: <199402190231.VAA27524@tern.siemens.com> ***Do not forward to other bboards*** FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/wang.optistop.ps.Z The file wang.optistop.ps.Z is now available for copying from the Neuroprose repository: Optimal Stopping and Effective Machine Complexity in Learning Changfeng Wang U.Penn Santosh S. Venkatesh U.Penn J. Stephen Judd Siemens Abstract: We study the problem of when to stop training a class of feedforward networks -- networks with fixed input weights, one hidden layer, and a linear output -- when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown analytically that there are, in general, three distinct phases in the generalization performance in the learning process. In particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of "effective size" of a machine is defined and used to explain the trade-off between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of network size selection. (8 pages) To appear in NIPS-6- (1993) sj Stephen Judd Siemens Corporate Research, (609) 734-6573 755 College Rd. East, fax (609) 734-6565 Princeton, judd at learning.scr.siemens.com NJ usa 08540 From mjolsness-eric at CS.YALE.EDU Mon Feb 21 10:58:26 1994 From: mjolsness-eric at CS.YALE.EDU (Eric Mjolsness) Date: Mon, 21 Feb 94 10:58:26 EST Subject: clustering & matching papers Message-ID: <199402211558.AA05604@NEBULA.SYSTEMSZ.CS.YALE.EDU> ****** PLEASE DO NOT FORWARD TO OTHER MAILING LISTS OR BOARDS. 
************** ****** PAPER AVAILABLE VIA NEUROPROSE *************************************** FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/gold.object-clustering.ps.Z FTP-filename: /pub/neuroprose/lu.object-matching.ps.Z The following two NIPS papers have been placed in the Neuroprose archive at Ohio State. The files are "gold.object-clustering.ps.Z" and "lu.object-matching.ps.Z". Each is 8 pages in length. The uncompressed postscript file for the second paper, "lu.object-matching.ps.Z", contains images and is 4.3 megabytes long. So you may need to use a symbolic link in printing it: "lpr -s" under SunOS. ----------------------------------------------------------------------------- Clustering with a Domain-Specific Distance Measure Stephen Gold, Eric Mjolsness and Anand Rangarajan Yale Computer Science Department With a point matching distance measure which is invariant under translation, rotation and permutation, we learn 2-D point-set objects, by clustering noisy point-set images. Unlike traditional clustering methods which use distance measures that operate on feature vectors - a representation common to most problem domains - this object-based clustering technique employs a distance measure specific to a type of object within a problem domain. Formulating the clustering problem as two nested objective functions, we derive optimization dynamics similar to the Expectation-Maximization algorithm used in mixture models. ----------------------------------------------------------------------------- Two-Dimensional Object Localization by Coarse-to-Fine Correlation Matching Chien-Ping Lu and Eric Mjolsness Yale Computer Science Department We present a Mean Field Theory method for locating two-dimensional objects that have undergone rigid transformations. The resulting algorithm is a coarse-to-fine correlation matching. We first consider problems of matching synthetic point data, and derive a point matching objective function. A tractable line segment matching objective function is derived by considering each line segment as a dense collection of points, and approximating it by a sum of Gaussians. The algorithm is tested on real images from which line segments are extracted and matched. ----------------------------------------------------------------------------- - Eric Mjolsness mjolsness at cs.yale.edu ------- From pkso at castle.ed.ac.uk Tue Feb 22 13:54:42 1994 From: pkso at castle.ed.ac.uk (P Sollich) Date: Tue, 22 Feb 94 18:54:42 GMT Subject: Preprint on query learning in Neuroprose archive Message-ID: <9402221854.aa28409@uk.ac.ed.castle> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/sollich.queries.ps.Z The file sollich.queries.ps.Z (16 pages) is now available via anonymous ftp from the Neuroprose archive. Title and abstract are given below. We regret that hardcopies are not available. --------------------------------------------------------------------------- Query Construction, Entropy and Generalization in Neural Network Models Peter Sollich Department of Physics, University of Edinburgh, Kings Buildings, Mayfield Road, Edinburgh EH9 3JZ, U.K. (To appear in Physical Review E) Abstract We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, non-redundant training sets. 
We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and `non-invertible' vs.linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill-matched to the learning environment and find that in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e. on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction. --------------------------------------------------------------------------- Peter Sollich Dept. of Physics University of Edinburgh e-mail: P.Sollich at ed.ac.uk Kings Buildings Tel. +44-31-650 5236 Mayfield Road Edinburgh EH9 3JZ, U.K. --------------------------------------------------------------------------- From B344DSL at UTARLG.UTA.EDU Tue Feb 22 22:18:10 1994 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Tue, 22 Feb 1994 21:18:10 -0600 (CST) Subject: Conference announcement Message-ID: <01H9786W7CBM0004O8@UTARLG.UTA.EDU> ANNOUNCEMENT AND CALL FOR ABSTRACTS Conference on Oscillations in Neural Systems, Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the University of Texas at Arlington. To be held Thursday through Saturday, MAY 5-7, 1994 Location: UNIVERSITY OF TEXAS AT ARLINGTON MAIN LIBRARY, 6TH FLOOR PARLOR Official Conference Motel: Park Inn 703 Benge Drive Arlington, TX 76013 1-800-777-0100 or 817-860-2323 A block of rooms has been reserved at the Park Inn for $35 a night (single or double). Room sharing arrangements are possible. Reservations should be made directly through the motel. Official Conference Travel Agent: Airline reservations to Dallas-Fort Worth airport should be made through Dan Dipert travel in Arlington, 1-800-443-5335. For those who wish to fly on American Airlines, a Star File account has been set up for a 5% discount off lowest available fares (two week advance, staying over Saturday night) or 10% off regular coach fare; arrangements for Star File reservations should be made through Dan Dipert. Please let the conference organizers know (by e-mail or telephone) when you plan to arrive: some people can be met at the airport (about 30 minutes from Arlington), others can call Super Shuttle at 817-329-2000 upon arrival for transportation to the Park Inn (about $14-$16 per person). Registration for the conference is $25 for students, $65 for non- student oral or poster presenters, $85 for others. MIND members will have $20 (or $10 for students) deducted from the registration. A registration form is attached to this announcement. Registrants will receive the MIND monthly newsletter (on e-mail when possible) for the remainder of 1994. 
Invited speakers: Bill Baird (University of California, Berkeley) Adi Bulsara (Naval Research Laboratories, San Diego) Alianna Maren (Accurate Automation Corporation) George Mpitsos (Oregon State University) Martin Stemmler (California Institute of Technology) Roger Traub (IBM, Tarrytown, New York) Robert Wong (Downstate Medical Center, Brooklyn) Geoffrey Yuen (Northwestern University) Those interested in presenting are invited to submit abstracts (1-2 paragraphs) of any work related to the theme of the conference, any time between now and March 15, 1994. The topic of neural oscillation is currently of great interest to psychologists and neuroscientists alike. Recently it has been observed that neurons in separate areas of the brain will oscillate in synchrony in response to certain stimuli. One hypothesized function for such synchronized oscillations is to solve the "binding problem," that is, how is it that disparate features of objects (e.g., a person's face and their voice) are tied together into a single unitary whole. Some bold speculators (such as Francis Crick in his recent book, The Astonishing Hypothesis) even argue that synchronized neural oscillations form the basis for consciousness. Talks will be 1 hour for invited speakers and 45 minutes for contributed speakers, including questions. There will be no parallel sessions. Contributors whose work is considered worthy of presentation but who cannot be fit into the schedule will be invited to present posters. Presenters will not be required to write complete papers. After the conference is over, we will attempt to obtain a contract with a publisher for a book based on the conference. Oral and poster presenters will be invited to submit chapters to this book, although it is not a precondition for being a speaker. Two books based on previous MIND conferences (Motivation, Emotion, and Goal Direction in Neural Networks and Neural Networks for Knowledge Representation and Inference) have been published by Lawrence Erlbaum Associates, and a book based on our last conference (Optimality in Biological and Artificial Networks?) is now in progress, under contract with Erlbaum as part of their joint series with INNS. Abstracts should be submitted, by e-mail, snail mail, or fax, to: Professor Daniel S. Levine Department of Mathematics, University of Texas at Arlington 411 S. Nedderman Drive Arlington, TX 76019-0408 Office telephone: 817-273-3598, fax: 817-794-5802 e-mail: b344dsl at utarlg.uta.edu Further inquiries about the conference can be addressed to Professor Levine or to the other two conference organizers: Professor Vincent Brown Mr. Timothy Shirey 817-273-3247 214-495-3500 or 214-422-4570 b096vrb at utarlg.uta.edu 73353.3524 at compuserve.com Please distribute this announcement to anyone you think may be interested in the conference.
REGISTRATION FOR MIND/INNS CONFERENCE ON OSCILLATIONS IN NEURAL SYSTEMS, UNIVERSITY OF TEXAS AT ARLINGTON, MAY 5-7, 1994 Name ______________________________________________________________ Address ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ____________________________________________________________ E-Mail __________________________________________________________ Telephone _________________________________________________________ Registration fee enclosed: _____ $15 Student, member of MIND _____ $25 Student _____ $65 Non-student oral or poster presenter _____ $65 Non-student member of MIND _____ $85 All others Will you be staying at the Park Inn? ____ Yes ____ No Are you planning to share a room with someone you know? ____ Yes ____ No If so, please list that person's name __________________________ If not, would you be interested in sharing a room with another conference attendee to be assigned? ____ Yes ____ No PLEASE REMEMBER TO CALL THE PARK INN DIRECTLY FOR YOUR RESERVATION (WHETHER SINGLE OR DOUBLE) AT 1-800-777-0100 OR 817-860-2323. From fellous at selforg.usc.edu Tue Feb 22 23:31:06 1994 From: fellous at selforg.usc.edu (Jean-Marc Fellous) Date: Tue, 22 Feb 94 20:31:06 PST Subject: Research Associate Message-ID: <9402230431.AA00747@selforg.usc.edu> Could you please post this announcement? Thanks, Jean-Marc >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< TENNESSEE STATE UNIVERSITY CENTER FOR NEURAL ENGINEERING RESEARCH ASSOCIATE Applications are invited for a research associate position, for a unique consortium involving a medical school, an engineering college, Oak Ridge National Laboratory and a private high-tech industry. A Ph.D. in Biomedical/Electrical Engineering (or related fields) with strong interest in artificial and biological neural networks is required, in the areas of auditory system modeling and sensory motor control. This position will be supported for at least two years and possibly longer. Teaching of a graduate or an undergraduate course is optional. Send resume to: Dr. Mohan J. Malkani Director, Center for Neural Engineering Tennessee State University 3500 John Merritt Blvd. Nashville, TN 37209-1561 (615)320-3550 Fax: (615)320-3554 e-mail: malkani at harpo.tnstate.edu From sbh at eng.cam.ac.uk Tue Feb 22 12:00:33 1994 From: sbh at eng.cam.ac.uk (S.B. Holden) Date: Tue, 22 Feb 94 17:00:33 GMT Subject: PhD dissertation available by anonymous ftp Message-ID: <5730.199402221700@tw700.eng.cam.ac.uk> The following PhD dissertation is available by anonymous ftp from the archive of the Speech, Vision and Robotics Group at the Cambridge University Engineering Department. On the Theory of Generalization and Self-Structuring in Linearly Weighted Connectionist Networks Sean B. Holden Technical Report CUED/F-INFENG/TR161 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract The study of connectionist networks has often been criticized for an overall lack of rigour, and for being based on excessively ad hoc techniques. Even though connectionist networks have now been the subject of several decades of study, the available body of research is characterized by the existence of a significant body of experimental results, and a large number of different techniques, with relatively little supporting, explanatory theory.
This dissertation addresses the theory of {\em generalization performance\/} and {\em architecture selection\/} for a specific class of connectionist networks; a subsidiary aim is to compare these networks with the well-known class of multilayer perceptrons. After discussing in general terms the motivation for our study, we introduce and review the class of networks of interest, which we call {\em $\Phi$-networks\/}, along with the relevant supervised training algorithms. In particular, we argue that $\Phi$-networks can in general be trained significantly faster than multilayer perceptrons, and we demonstrate that many standard networks are specific examples of $\Phi$-networks. Chapters 3, 4 and 5 consider generalization performance by presenting an analysis based on tools from computational learning theory. In chapter 3 we introduce and review the theoretical apparatus required, which is drawn from {\em Probably Approximately Correct (PAC) learning theory\/}. In chapter 4 we investigate the {\em growth function\/} and {\em VC dimension\/} for general and specific $\Phi$-networks, obtaining several new results. We also introduce a technique which allows us to use the relevant PAC learning formalism to gain some insight into the effect of training algorithms which adapt architecture as well as weights (we call these {\em self-structuring training algorithms\/}). We then use our results to provide a theoretical explanation for the observation that $\Phi$-networks can in practice require a relatively large number of weights when compared with multilayer perceptrons. In chapter 5 we derive new necessary and sufficient conditions on the number of training examples required when training a $\Phi$-network such that we can expect a particular generalization performance. We compare our results with those derived elsewhere for feedforward networks of Linear Threshold Elements, and we extend one of our results to take into account the effect of using a self-structuring training algorithm. In chapter 6 we consider in detail the problem of designing a good self-structuring training algorithm for $\Phi$-networks. We discuss the best way in which to define an optimum architecture, and we then use various ideas from linear algebra to derive an algorithm, which we test experimentally. Our initial analysis allows us to show that the well-known {\em weight decay\/} approach to self-structuring is not guaranteed to provide a network which has an architecture close to the optimum one. We also extend our theoretical work in order to provide a basis for the derivation of an improved version of our algorithm. Finally, chapter 7 provides conclusions and suggestions for future research. ************************ How to obtain a copy ************************ a) Via FTP: unix> ftp svr-ftp.eng.cam.ac.uk Name: anonymous Password: (type your email address) ftp> cd reports ftp> binary ftp> get holden_tr161.ps.Z ftp> quit unix> uncompress holden_tr161.ps.Z unix> lpr holden_tr161.ps (or however you print PostScript) b) Via postal mail: Request a hardcopy from Dr. Sean B. Holden, Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. 
or email me: sbh at eng.cam.ac.uk From viola at salk.edu Wed Feb 23 14:17:52 1994 From: viola at salk.edu (Paul Viola) Date: Wed, 23 Feb 94 11:17:52 PST Subject: Heinous Patent Message-ID: <9402231917.AA24448@salk.edu> From: Vision-List moderator Phil Kahn VISION-LIST Digest Tue Feb 22 11:26:42 PDT 94 Volume 13 : Issue 8 Date: Thu, 17 Feb 1994 22:23:00 GMT From: eledavis at ubvms.cc.buffalo.edu (Elliot Davis) Organization: University at Buffalo Subject: Error Reduction I would greatly appreciate your thoughts on the: ERROR TEMPLATE TECHNIQUE The "Error Template" technique (patent 4,802,231) provides an alternative method for reducing false alarms in pattern recognition systems. In this approach, a pattern representing a mismatched pattern is stored in the reference lexicon. It is a reference pattern to an error rather then to what is desired. THIS IS DONE WITH THE EXPECTATION THAT IF THE ERROR PATTERN OR A VARIATION OF IT IS REPEATED IT WILL TEND TO BE CLOSER TO ITSELF THEN TO THE PATTERN THAT IT FALSED OUT TO. ... Unless this patent is very old, I find it terrifying. It is a concept that is clearly part of the pattern recognition literature of the 70's. Essentially pattern classification works by finding clusters that represent classes. These clusters along with a measurement model define a probability density over the pattern space. All this technique is doing is adding an additional cluster which represents a particular type of measurement error sensing a class. Pattern classification theory tells us that this should be done whenever there is a particular measurement error that is not modeled well by our measurement model. You add a cluster when the distribution of data is different from the probability density predicted by the model -- i.e. a particular measurement error is more common than your model predicts. You can add these clusters by hand, as the patent suggests, or you can let a density estimation scheme discover them for you (a mixture of gaussians model trained with EM works nicely). End of story. So remember, anytime someone adds another cluster to a pattern classification model, they owe the owner of this patent money. I wonder what the date of this fine patent is?? Paul Viola From cohn at psyche.mit.edu Wed Feb 23 18:15:17 1994 From: cohn at psyche.mit.edu (David Cohn) Date: Wed, 23 Feb 94 18:15:17 EST Subject: Paper available: Exploration using optimal experiment design Message-ID: <9402232315.AA21110@psyche.mit.edu> Those who find Peter Sollich's paper on query construction of interest may also wish to look at the following paper, now available by anonymous ftp. This is a slightly revised version of the paper that is to appear in Advances in Neural Information Processing Systems 6, but includes a correction to Equation 2 that was made too late to be included in the NIPS volume. ##################################################################### Neural Network Exploration Using Optimal Experiment Design David A. Cohn Dept. of Brain and Cognitive Sciences Massachusetts Inst.\ of Technology Cambridge, MA 02139 Consider the problem of learning input/output mappings through exploration, e.g. learning the kinematics or dynamics of a robotic manipulator. If actions are expensive and computation is cheap, then we should explore by selecting a trajectory through the input space which gives us the most amount of information in the fewest number of steps. 
I discuss how results from the field of optimal experiment design may be used to guide such exploration, and demonstrate its use on a simple kinematics problem. ##################################################################### The paper may be retrieved by anonymous ftp to "psyche.mit.edu" using the following protocol: unix> ftp psyche.mit.edu Name (psyche.mit.edu:joebob): anonymous <- use "anonymous" here 331 Guest login ok, send ident as password. Password: joebob at machine.univ.edu <- use your email address here 230 Guest login ok, access restrictions apply. ftp> cd pub/cohn <- go to the directory 250 CWD command successful. ftp> binary <- change to binary transfer 200 Type set to I. ftp> get cohn.explore.ps.Z <- get the file 200 PORT command successful. 150 Binary data connection for cohn.explore.ps.Z ... 226 Binary Transfer complete. local: cohn.explore.ps.Z remote: cohn.explore.ps.Z 301099 bytes received in 2.8 seconds (1e+02 Kbytes/s) ftp> quit <- all done 221 Goodbye. From terry at salk.edu Thu Feb 24 05:49:35 1994 From: terry at salk.edu (Terry Sejnowski) Date: Thu, 24 Feb 94 02:49:35 PST Subject: Shakespeare and Neural Nets Message-ID: <9402241049.AA02725@salk.edu> from New Scientist 22 january 1994 p. 23 In an interesting article on the use of statistical measures to assess the attribution of texts to authors, Robert Matthews and Tom Merrriam report that: "Applying our neural network to disputed works such as 'The Two Noble Kinsman' has produced some interesting results and helped to settle some bitter arguments over authorship of controversial texts. ... "The first task was to train the network. This we did by exposing it to data extracted from a large number of samples of Shakespeare's undisputed work, together with that of his successor with The King's Men [a theater], John Fletcher. ... We then set the network loose on 'The Two Noble Kinsman'. Drawing on a wide variety of essentially subjective evidence, scholars have claimed that Shakespeare's hand dominates Acts I and V, with much of the rest appearing to be by Fletcher. In March last year, our neural network agreed with these attributions -- and proferred the extra opinion that Fletcher may have received considerable help from Shakespeare in Act IV. In short, our neural network quantitatively supports the subjective view of its much more sophisticated human counterparts that 'The Two Noble Kinsman' is a genuine collaboration between Shakespeare and one of his contemporaries." These results will appear in the journal 'Literary and Linguistic Computing'. A similar approach might be used to determine the contributions of coauthors to scientific papers. Terry ----- From efiesler at maya.idiap.ch Fri Feb 25 09:16:09 1994 From: efiesler at maya.idiap.ch (E. Fiesler) Date: Fri, 25 Feb 94 15:16:09 +0100 Subject: NN Formalization paper available by ftp. Message-ID: <9402251416.AA04305@maya.idiap.ch> PLEASE POST ----------- The following paper is available via anonymous ftp from the neuroprose archive. It counts 13 A4-size PostScript pages, and replaces a shorter preliminary ver- sion. Instructions for retrieval follow the abstract. NEURAL NETWORK CLASSIFICATION AND FORMALIZATION E. Fiesler IDIAP c.p. 609 CH-1920 Martigny Switzerland This paper has been accepted for publication in the special issue on Neural Network Standards of "Computer Standards & Interfaces", volume 16, edited by J. Fulcher. Elsevier Science Publishers, Amsterdam, 1994. 
ABSTRACT In order to assist the field of neural networks in maturing, a formalization and a solid foundation are essential. Additionally, to permit the introduction of formal proofs, it is essential to have an all-encompassing formal mathematical definition of a neural network. This publication offers a neural network formalization consisting of a topological taxonomy, a uniform nomenclature, and an accompanying consistent mnemonic notation. Supported by this formalization, a flexible mathematical definition is presented. ------------------------------ To obtain a copy of this paper, please follow these FTP instructions: unix> ftp archive.cis.ohio-state.edu (or: ftp 128.146.8.52) login: anonymous password: ftp> cd pub/neuroprose ftp> binary ftp> get fiesler.formalization.ps.Z ftp> bye unix> zcat fiesler.formalization.ps.Z | lpr (or however you uncompress and print postscript) For convenience of those outside the US, the paper has also been placed on the IDIAP ftp site: unix> ftp Maya.IDIAP.CH (or: ftp 192.33.221.1) login: anonymous password: ftp> cd pub/papers/neural ftp> binary ftp> get fiesler.formalization.ps.Z (OR get fiesler.formalization.ps) ftp> bye unix> zcat fiesler.formalization.ps.Z | lpr OR unix> lpr fiesler.formalization.ps (Hard copies of the paper are unfortunately not available.) P.S. Thanks for the update, Jordan ! From giles at research.nj.nec.com Fri Feb 25 18:28:59 1994 From: giles at research.nj.nec.com (Lee Giles) Date: Fri, 25 Feb 94 18:28:59 EST Subject: Available Message-ID: <9402252328.AA28936@fuzzy> ******************************************************************************** Reprint: USING RECURRENT NEURAL NETWORKS TO LEARN THE STRUCTURE OF INTERCONNECTION NETWORKS The following reprint is available via the University of Maryland Department of Computer Science Technical Report archive: ________________________________________________________________________________ "Using Recurrent Neural Networks to Learn the Structure of Interconnection Networks" UNIVERSITY OF MARYLAND TECHNICAL REPORT UMIACS-TR-94-20 AND CS-TR-3226 G.W. Goudreau(a) and C.L. Giles(b,c) goudreau at cs.ucf.edu, giles at research.nj.nec.com (a) Department of Computer Science, U. of Central Florida, Orlando, FL 32816 (b) NEC Research Inst., 4 Independence Way, Princeton, NJ 08540 (c) Inst. for Advanced Computer Studies, U. of Maryland, College Park, MD 20742 A modified Recurrent Neural Network (RNN) is used to learn a Self-Routing Interconnection Network (SRIN) from a set of routing examples. The RNN is modified so that it has several distinct initial states. This is equivalent to a single RNN learning multiple different synchronous sequential machines. We define such a sequential machine structure as "augmented" and show that a SRIN is essentially an Augmented Synchronous Sequential Machine (ASSM). As an example, we learn a small six-switch SRIN. After training we extract the network's internal representation of the ASSM and corresponding SRIN. -------------------------------------------------------------------------------- FTP INSTRUCTIONS unix> ftp cs.umd.edu (128.8.128.8) Name: anonymous Password: (your_userid at your_site) ftp> cd pub/pub/papers/TRs ftp> binary ftp> get 3226.ps.Z ftp> quit unix> uncompress 3226.ps.Z --------------------------------------------------------------------------------- -- C.
Lee Giles / NEC Research Institute / 4 Independence Way Princeton, NJ 08540 / 609-951-2642 / Fax 2482 == From terry at salk.edu Fri Feb 25 12:59:53 1994 From: terry at salk.edu (Terry Sejnowski) Date: Fri, 25 Feb 94 09:59:53 PST Subject: NEURAL COMPUTATION 6:2 Message-ID: <9402251759.AA18225@salk.edu> Neural Computation March 1994 Volume 6 Issue 2 Article: Hierarchical Mixtures of Experts and the EM Algorithm Michael I. Jordan and Robert A. Jacobs Notes: TD-Gammon, A Self-Teaching Backgammon Program, Achieves Master-Level Play Gerald Tesauro Correlated Attractors from Uncorrelated Stimuli L.F. Cugliandolo Letters: Learning of Phase-lags in Coupled Neural Oscillators Bard Ermentrout and Nancy Kopell A Mechanism for Neuronal Gain Control by Descending Pathways Mark E. Nelson The Role of Weight Normalization in Competitive Learning Geoffrey J. Goodhill and Harry G. Barrow A Probabilistic Resource Allocating Network for Novelty Detection Stephen Roberts and Lionel Tarassenko Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistance to Local Minima William Finnoff Relating Real-time Backpropagation and Back-propagation Through Time: An Application of Flow Graph Interreciprocity Francoise Beaufays and Eric A. Wan Smooth On-line Learning Algorithms for Hidden Markov Models Pierre Baldi and Yves Chauvin On Functional Approximation with Normalized Gaussian Units Michel Benaim Statistical Physics, Mixtures of Distributions and the EM Algorithm Yuille, A.L., Stolorz, P., and Utans, J. ----- SUBSCRIPTIONS - 1994 - VOLUME 6 - BIMONTHLY (6 issues) ______ $40 Student and Retired ______ $65 Individual ______ $166 Institution Add $22 for postage and handling outside USA (+7% GST for Canada). (Back issues from Volumes 1-5 are regularly available for $28 each to institutions and $14 each for individuals Add $5 for postage per issue outside USA (+7% GST for Canada) MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142. Tel: (617) 253-2889 FAX: (617) 258-6779 e-mail: hiscox at mitvma.mit.edu ----- From heger at Informatik.Uni-Bremen.DE Mon Feb 28 07:27:12 1994 From: heger at Informatik.Uni-Bremen.DE (Matthias Heger) Date: Mon, 28 Feb 94 13:27:12 +0100 Subject: paper available Message-ID: <9402281227.AA06748@Informatik.Uni-Bremen.DE> FTP-host: ftp.gmd.de FTP-filename: /Learning/rl/papers/heger.consider-risk.ps.Z The file heger.consider-risk.ps.Z is now available for copying from the RL papers repository: *************************************************** * Consideration of Risk in Reinforcement Learning * *************************************************** (Revised submission to the 11th International Conference on Machine Learning (ML94), 15 pages) Abstract -------- Most Reinforcement Learning (RL) work supposes policies for sequential decision tasks to be optimal that minimize the expected total discounted cost (e.g. Q-Learning [Wat 89], AHC [Bar Sut And 83]). On the other hand, it is well known that it is not always reliable and can be treacherous to use the expected value as a decision criterion [Tha 87]. A lot of alter- native decision criteria have been suggested in decision theory to get a more sophisticated consideration of risk but most RL researchers have not concerned themselves with this subject until now. The purpose of this paper is to draw the reader's attention to the problems of the expected value criterion in Markov Decision Processes and to give Dynamic Pro- gramming algorithms for an alternative criterion, namely the Minimax cri- terion. 
A counterpart to Watkins' Q-Learning related to the Minimax cri- terion is presented. The new algorithm, called Q^-Learning (Q-hat-Learning), finds policies that minimize the >>worst-case<< total discounted costs. Most mathematical details aren't presented here but can be found in [Heg 94]. ---------------------------------------------------------------------------- Here is an example of retrieving and printing the file: -> ftp ftp.gmd.de Connected to gmdzi.gmd.de. 220 gmdzi FTP server (Version 5.72 Fri Nov 20 20:35:05 MET 1992) ready. Name (ftp.gmd.de:heger): anonymous 331 Guest login ok, send your email-address as password. Password: 230-This is an experimental FTP Server. See /README for details. This site is in Germany, Europe. Please restrict downloads to our non-working hours (i.e outside of 08:00-18:00 MET, Mo-Fr) *** Local time is 12:25:22 MET 230 Guest login ok, access restrictions apply. ftp> cd Learning/rl/papers 250 CWD command successful. ftp> binary 200 Type set to I. ftp> get heger.consider-risk.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for heger.consider-risk.ps.Z (100477 bytes). 226 Transfer complete. local: heger.consider-risk.ps.Z remote: heger.consider-risk.ps.Z 100477 bytes received in 3.2e+02 seconds (0.3 Kbytes/s) ftp> quit 221 Goodbye. -> uncompress heger.consider-risk.ps.Z -> lpr heger.consider-risk.ps ------------------------------------------------------------------------------- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + Matthias Heger + + Zentrum fuer Kognitionswissenschaften, Universitaet Bremen, + + Postfach 330 440 + + D-28334 Bremen, Germany + + + + email: heger at informatik.uni-bremen.de + + Tel.: +49 (0) 421 218 4659 + + Fax: +49 (0) 421 218 3054 + +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From gerda at ai.univie.ac.at Mon Feb 28 10:42:04 1994 From: gerda at ai.univie.ac.at (Gerda Helscher) Date: Mon, 28 Feb 1994 16:42:04 +0100 Subject: EMCSR'94 Message-ID: <199402281542.AA23377@anif.ai.univie.ac.at> After the general info which appeared in this mailing list recently about the T W E L F T H E U R O P E A N M E E T I N G O N C Y B E R N E T I C S A N D S Y S T E M S R E S E A R C H ( E M C S R ' 9 4 ) here is the detailed programme of Neural Network-related events: Plenary Lecture by S t e p h e n G r o s s b e r g : "Neural Networks for Learning, Recognition and Prediction" Wednesday, April 6, 9:00 a.m., University of Vienna, Main Building, Room 47 Symposium A r t i f i c i a l N e u r a l N e t w o r k s a n d A d a p t i v e S y s t e m s Chairpersons: S.Grossberg, USA, and G.Dorffner, Austria Tuesday, April 5, and Wednesday, April 6, Univ. 
of Vienna, Main Building, Room 47 Tuesday, April 5: 14.00-14.30: Synchronization in a Large Neural Network of Phase Oscillators with the Central Element Y.Kazanovich, Russian Academy of Sciences, Moscow, Russia 14.30-15.00: Synchronization in a Neural Network Model with Time Delayed Coupling T.B.Luzyanina, Russian Academy of Sciences, Moscow, Russia 15.00-15.30: Reinforcement Learning in a Network Model of the Basal Ganglia R.M.Borisyuk, J.R.Wickens, R.Koetter, University of Otago, New Zealand Wednesday, April 6: 11.00-11.30: Adaptive High Performance Classifier Based on Random Threshold Neurons E.M.Kussul, T.N.Baidyk, V.V.Lukovich, D.A.Rachkovskij, Ukrainian Academy of Science, Kiev, Ukraine 11.30-12.00: Dynamics of Ordering for One-dimensional Topological Mappings R.Folk, A.Kartashov, University of Linz, Austria 12.00-12.30: Informational Properties of Willshaw-like Neural Networks Capable of Autoassociative Learning A.Kartashov, R.Folk, A.Goltsev, A.Frolov, University of Linz, Austria 12.30-13.00: Relaxing the Hyperplane Assumption in the Analysis and Modification of Back-propagation Neural Networks L.Y.Pratt, A.N.Christensen, Colorado School of Mines, Golden, CO, USA 14.00-14.30: Improving Discriminability Based Transfer by Modifying the IM Metric to Use Sigmoidal Activations L.Y.Pratt, V.I.Gough, Colorado School of Mines, Golden, CO, USA 14.30-15.00: Order-theoretic View of Families of Neural Network Architectures M.Holena, University of Paderborn, Germany 15.00-15.30: A New Class of Neural Networks: Recognition Invariant to Arbitrary Transformation Groups A.Kartashov, K.Erman, University of Linz, Austria 16.00-16.30: Neural Assembly Architecture for Texture Recognition A.Goltsev, A.Kartashov, R.Folk, University of Linz, Austria 16.30-17.00: A Neural System for Character Recognition on Isovalue Maps E.P.L.Passos, L.E.S.Varella, M.A.Santos, R.L.de Araujo, Engineering Military Institute, Rio de Janeiro, Brazil 17.00-17.30: Neurocomputing Model Inference for Nonlinear Signal Processing Z.Zografski, T.Durrani, University of Strathclyde, Glasgow, United Kingdom 17.30-18.00: Learning from Examples and VLSI Implementation of Neural Networks V.Beiu, J.A.Peperstraete, J.Vandewalle, R.Lauwereins, Catholic University of Leuven, Heverlee, Belgium For more information please contact: sec at ai.univie.ac.at From ZECCHINA at to.infn.it Mon Feb 28 13:22:01 1994 From: ZECCHINA at to.infn.it (Riccardo Zecchina - tel.11-5647358, fax. 11-5647399) Date: Mon, 28 Feb 1994 19:22:01 +0100 (WET) Subject: role of response functions in ANN's. Message-ID: <940228192201.20800db9@to.infn.it> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/zecchina.response.ps.Z The file zecchina.response.ps.Z is available for copying from the Neuroprose repository: "Response Functions Improving Performance in Analog Attractor Neural Networks" N .Brunel, R. Zecchina (13 pages, to appear in Phys. Rev. E Rapid Comm.) ABSTRACT: In the context of attractor neural networks, we study how the equilibrium analog neural activities, reached by the network dynamics during memory retrieval, may improve storage performance by reducing the interferences between the recalled pattern and the other stored ones. We determine a simple dynamics that stabilizes network states which are highly correlated with the retrieved pattern, for a number of stored memories that does not exceed $\alpha_{\star} N$, where $\alpha_{\star}\in[0,0.41]$ depends on the global activity level in the network and $N$ is the number of neurons.  
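As a rough illustration of the setting described in this abstract (not the dynamics or the optimized response functions derived by Brunel and Zecchina), the short NumPy sketch below stores random binary patterns in a Hebbian coupling matrix, relaxes a noisy cue under a generic saturating analog response function, and reports the overlap with the cued memory together with the residual interference from the other memories. The network size, storage load, gain and the tanh nonlinearity are arbitrary choices made only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    N, P = 500, 25                                # neurons and stored patterns (arbitrary)
    xi = rng.choice([-1.0, 1.0], size=(P, N))     # random binary memories
    J = (xi.T @ xi) / N                           # Hebbian couplings
    np.fill_diagonal(J, 0.0)

    def g(h, gain=2.0):
        # generic saturating analog response function (not the paper's choice)
        return np.tanh(gain * h)

    s = xi[0] + 0.3 * rng.standard_normal(N)      # noisy cue of pattern 0
    for _ in range(50):                           # iterate to an analog fixed point
        s = g(J @ s)

    overlaps = xi @ s / N                         # correlation with each stored pattern
    print("overlap with cued pattern:", overlaps[0])
    print("mean |interference| from the others:", np.abs(overlaps[1:]).mean())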
From andre at physics.uottawa.ca Mon Feb 28 12:13:53 1994 From: andre at physics.uottawa.ca (Andre Longtin) Date: Mon, 28 Feb 94 12:13:53 EST Subject: Hebb Symposium Message-ID: <9402281713.AA23088@miro.physics.uottawa.ca.physics.uottawa.ca> ******* Preliminary Announcement ******* THE FIELDS INSTITUTE FOR RESEARCH IN MATHEMATICAL SCIENCES HEBB SYMPOSIUM ON NEURONS AND BIOLOGICAL DYNAMICS Sunday, May 15 to Friday May 20, 1994 Koffler Pharmaceutical Center University of Toronto D.O. Hebb's classic, "The Organization of Behavior" published in 1949, sketched out how behavior might emerge from the properties of nerve cells and assemblies of nerve cells. This book was a landmark achievement in neurophysiological psychology. The modifiable synapse, discussed at length by Hebb and now known as the "Hebb synapse", was a lasting contribution. Hebb was from Nova Scotia and spent most of his professional life at McGill in the Psychology Department. We are having this symposium in his honor. Topics will range from cellular level to systems level, with an eye towards interesting dynamics and connections between dynamics and functions. We will bring together physiological and mathematical researchers with some didactic and research talks oriented towards graduate students and postdoctoral fellows. SCIENTIFIC PROGRAM: Lectures will be presented by Nancy Kopell (Boston University) and David Mumford (Harvard) in the Institute's Distinguished Lecture Series. Invited talks by Larry Abbott (Brandeis), *Moshe Abeles (Hebrew U., Jerusalem), Harold Atwood (U. Toronto), David Brillinger (Berkeley), Jos Eggermont (U. Calgary), Bard Ermentrout (U. Pittsburg), Leon Glass (McGill), Ilona Kovacs (Rutgers), Gilles Laurent (Caltech), Andre Longtin (U. Ottawa), Leonard Maler (U. Ottawa), Karl Pribram (Radford U.), Paul Rapp (Med. Coll. Penn.), John Rinzel (NIH), Mike Shadlin (Stanford), Matt Wilson (Tucson), Martin Wojtowicz (U. Toronto), Steve Zucker (McGill). Invited Attendees: Jose Segundo (UCLA), Alessandro Villa (Lausanne) The meeting will emphasize poster sessions as well as discussion groups where participants can give short oral presentations of their work. 
(*=tentative) TOPICS Larry Abbott: Population vectors and Hebbian learning Moshe Abeles: Information processing of synchronized activity Harold Atwood: Synaptic transmission and plasticity David Brillinger: Statistical analysis of neurophysiological data Jos Eggermont: Spatial and temporal interactions in auditory cortex Bard Ermentrout: Patterns in visual cortex Leon Glass: Nonlinear dynamics of neural networks Ilona Kovacs: Visual psychophysics/perceptual organization Gilles Laurent: Oscillations in olfaction Andre Longtin: Stochastic nonlinear dynamics of sensory transduction Leonard Maler: Bursting and recurrent feedback in electroreception Karl Pribram: Behavioral neurodynamics Paul Rapp: Dynamical characterization of neurological data John Rinzel: Thalamic rhythmogenesis in sleep and epilepsy Mike Shadlin: Analysis of visual motion Matt Wilson: Behaviorally induced changes in hippocampal connectivity Martin Wojtowicz: Membranes, channels and synapses Steve Zucker: Neural networks and visual computations IMPORTANT DATES: Monday April 11: Last date to return questionnaire Friday April 22: Cut-off for registrations and Deadline for hotel/residence booking Sunday May 15: Arrival and registration (9 am - 12 noon) Sunday May 15 to Friday May 20 Scientific program (ending Friday noon) INFORMATION ON SCIENTIFIC PROGRAM: David Brillinger (brill at stat.berkeley.edu) Andre Longtin (andre at physics.uottawa.ca) REGISTRATION AND ORGANIZATIONAL INFORMATION: To receive registration information, please fill out the questionnaire below and return it to: Sheri Albers The Fields Institute 185 Columbia St. W. Waterloo, Ontario, Canada N2L 5Z5 Phone: (519) 725-0096 Fax: (519) 725-0704 e-mail: hebb at fields.uwaterloo.ca ------------------------------------------------------------- ******* Questionnaire ******* TO BE COMPLETED BY ANYONE WISHING TO ATTEND THE HEBB SYMPOSIUM ON NEURONS AND BIOLOGICAL DYNAMICS Name: Institution: Department: Address: Phone: Fax: E-mail: I plan to attend: Yes ( ) No ( ) Maybe ( ) I plan to participate in the discussion groups: Yes ( ) No ( ) Maybe ( ) I plan to present a poster: Yes ( ) No ( ) Maybe ( ) Topic or tentative title: Arrival and departure dates (if other than May 14-20): FAX TO: (519)725-0704 or e-mail: hebb at fields.uwaterloo.ca
From prechelt at ira.uka.de Wed Feb 2 04:12:48 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Wed, 02 Feb 1994 10:12:48 +0100 Subject: Techreport on CuPit available Message-ID: <"irafs2.ira.960:02.01.94.09.13.09"@ira.uka.de> The technical report Lutz Prechelt: "CuPit --- A Parallel Language for Neural Algorithms: Language Reference and Tutorial" is now available for anonymous ftp from ftp.ira.uka.de /pub/uni-karlsruhe/papers/cupit.ps.gz (154 Kb, 75 pages) It is NOT on neuroprose, because its topic does not quite fit into neuroprose's scope. Abstract: ---------- CuPit is a parallel programming language with two main design goals: 1. to allow the simple, problem-adequate formulation of learning algorithms for neural networks with focus on algorithms that change the topology of the underlying neural network during the learning process and 2. to allow the generation of efficient code for massively parallel machines from a completely machine-independent program description, in particular to maximize both data locality and load balancing even for irregular neural networks. The idea to achieve these goals lies in the programming model: CuPit programs are object-centered, with connections and nodes of a graph (which is the neural network) being the objects. Algorithms are based on parallel local computations in the nodes and connections and communication along the connections (plus broadcast and reduction operations). This report describes the design considerations and the resulting language definition and discusses in detail a tutorial example program. ---------- Remember to use 'binary' mode for ftp. To uncompress the Postscript file, you need to have the GNU gzip utility.
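For readers unfamiliar with the object-centered style the abstract describes, here is a rough sequential Python cartoon of the idea: nodes and connections are the program objects, computation is local to them, and values travel only along connections before being reduced at the receiving node. It is emphatically not CuPit code, and every name in it is invented for illustration.

    import math

    class Connection:
        def __init__(self, source, weight):
            self.source = source          # node object at the other end
            self.weight = weight
        def message(self):
            # local computation in the connection
            return self.weight * self.source.output

    class Node:
        def __init__(self):
            self.incoming = []            # connections owned by this node
            self.output = 0.0
        def compute(self):
            # reduction over incoming connections, then a local activation
            net = sum(c.message() for c in self.incoming)
            self.output = math.tanh(net)

    # tiny example: two input nodes feeding one output node
    a, b, out = Node(), Node(), Node()
    a.output, b.output = 0.5, -1.0
    out.incoming = [Connection(a, 0.8), Connection(b, -0.3)]
    out.compute()
    print(out.output)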
Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-76128 Karlsruhe; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple. From prechelt at ira.uka.de Wed Feb 2 03:58:56 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Wed, 02 Feb 1994 09:58:56 +0100 Subject: Encoding missing values Message-ID: <"irafs2.ira.708:02.01.94.08.59.37"@ira.uka.de> I am currently thinking about the problem of how to encode data with attributes for which some of the values are missing in the data set for neural network training and use. An example of such data is the 'heart-disease' dataset from the UCI machine learning database (anonymous FTP on "ics.uci.edu" [128.195.1.1], directory "/pub/machine-learning-databases"). There are 920 records altogether with 14 attributes each. Only 299 of the records are complete, the others have one or several missing attribute values. 11% of all values are missing. I consider only networks that handle arbitrary numbers of real-valued inputs here (e.g. all backpropagation-suited network types etc). I do NOT consider missing output values. In this setting, I can think of several ways how to encode such missing values that might be reasonable and depend on the kind of attribute and how it was encoded in the first place: 1. Nominal attributes (that have n different possible values) 1.1 encoded "1-of-n", i.e., one network input per possible value, the relevant one being 1 all others 0. This encoding is very general, but has the disadvantage of producing networks with very many connections. Missing values can either be represented as 'all zero' or by simply treating 'is missing' as just another possible input value, resulting in a "1-of-(n+1)" encoding. 1.2 encoded binary, i.e., log2(n) inputs being used like the bits in a binary representation of the numbers 0...n-1 (or 1...n). Missing values can either be represented as just another possible input value (probably all-bits-zero is best) or by adding an additional network input which is 1 for 'is missing' and 0 for 'is present'. The original inputs should probably be all zero in the 'is missing' case. 2. continuous attributes (or attributes treated as continuous) 2.1 encoded as a single network input, perhaps using some monotone transformation to force the values into a certain distribution. Missing values are either encoded as a kind of 'best guess' (e.g. the average of the non-missing values for this attribute) or by using an additional network input being 0 for 'missing' and 1 for 'present' (or vice versa) and setting the original attribute input either to 0 or to the 'best guess'. (The 'best guess' variant also applies to nominal attributes above) 3. binary attributes (truth values) 3.1 encoded by one input: 0=false 1=true or vice versa Treat like (2.1) 3.2 encoded by one input: -1=false 1=true or vice versa In this case we may act as for (3.1) or may just use 0 to indicate 'missing'. 3.3 treat like nominal attribute with 2 possible values 4. ordinal attributes (having n different possible values, which are ordered) 4.1 treat either like continuous or like nominal attribute. If (1.2) is chosen, a Gray-Code should be used. Continuous representation is risky unless a 'sensible' quantification of the possible values is available. So far to my considerations. Now to my questions. a) Can you think of other encoding methods that seem reasonable ? Which ? 
b) Do you have experience with some of these methods that is worth sharing ? c) Have you compared any of the alternatives directly ? Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; 76128 Karlsruhe; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple. From marshall at cs.unc.edu Wed Feb 2 12:41:49 1994 From: marshall at cs.unc.edu (Jonathan A. Marshall) Date: Wed, 2 Feb 94 12:41:49 -0500 Subject: Papers on visual occlusion and neural networks Message-ID: <9402021741.AA17887@marshall.cs.unc.edu> Dear Colleagues, Below I list two new papers that I have added to the Neuroprose archives (thanks to Jordan Pollack!). In addition, I list two of my older papers in Neuroprose. You can retrieve a copy of these papers -- follow the instructions at the end of this message. --Jonathan ---------------------------------------------------------------------------- marshall.occlusion.ps.Z (5 pages) A SELF-ORGANIZING NEURAL NETWORK THAT LEARNS TO DETECT AND REPRESENT VISUAL DEPTH FROM OCCLUSION EVENTS JONATHAN A. MARSHALL and RICHARD K. ALLEY Department of Computer Science, CB 3175, Sitterson Hall University of North Carolina, Chapel Hill, NC 27599-3175, U.S.A. marshall at cs.unc.edu, alley at cs.unc.edu Visual occlusion events constitute a major source of depth information. We have developed a neural network model that learns to detect and represent depth relations, after a period of exposure to motion sequences containing occlusion and disocclusion events. The network's learning is governed by a new set of learning and activation rules. The network develops two parallel opponent channels or "chains" of lateral excitatory connections for every resolvable motion trajectory. One channel, the "On" chain or "visible" chain, is activated when a moving stimulus is visible. The other channel, the "Off" chain or "invisible" chain, is activated when a formerly visible stimulus becomes invisible due to occlusion. The On chain carries a predictive modal representation of the visible stimulus. The Off chain carries a persistent, amodal representation that predicts the motion of the invisible stimulus. The new learning rule uses disinhibitory signals emitted from the On chain to trigger learning in the Off chain. The Off chain neurons learn to interact reciprocally with other neurons that indicate the presence of occluders. The interactions let the network predict the disappearance and reappearance of stimuli moving behind occluders, and they let the unexpected disappearance or appearance of stimuli excite the representation of an inferred occluder at that location. Two results that have emerged from this research suggest how visual systems may learn to represent visual depth information. First, a visual system can learn a nonmetric representation of the depth relations arising from occlusion events. Second, parallel opponent On and Off channels that represent both modal and amodal stimuli can also be learned through the same process. [In Bowyer KW & Hall L (Eds.), Proceedings of the AAAI Fall Symposium on Machine Learning and Computer Vision, Research Triangle Park, NC, October 1993, 70-74.] ---------------------------------------------------------------------------- marshall.context.ps.Z (46 pages) ADAPTIVE PERCEPTUAL PATTERN RECOGNITION BY SELF-ORGANIZING NEURAL NETWORKS: CONTEXT, UNCERTAINTY, MULTIPLICITY, AND SCALE JONATHAN A. 
MARSHALL Department of Computer Science, CB 3175, Sitterson Hall University of North Carolina, Chapel Hill, NC 27599-3175, U.S.A. marshall at cs.unc.edu A new context-sensitive neural network, called an "EXIN" (excitatory+ inhibitory) network, is described. EXIN networks self-organize in complex perceptual environments, in the presence of multiple superimposed patterns, multiple scales, and uncertainty. The networks use a new inhibitory learning rule, in addition to an excitatory learning rule, to allow superposition of multiple simultaneous neural activations (multiple winners), under strictly regulated circumstances, instead of forcing winner-take-all pattern classifications. The multiple activations represent uncertainty or multiplicity in perception and pattern recognition. Perceptual scission (breaking of linkages) between independent category groupings thus arises and allows effective global context-sensitive segmentation and constraint satisfaction. A Weber Law neuron-growth rule lets the network learn and classify input patterns despite variations in their spatial scale. Applications of the new techniques include segmentation of superimposed auditory or biosonar signals, segmentation of visual regions, and representation of visual transparency. [Submitted for publication.] ---------------------------------------------------------------------------- marshall.steering.ps.Z (16 pages) CHALLENGES OF VISION THEORY: SELF-ORGANIZATION OF NEURAL MECHANISMS FOR STABLE STEERING OF OBJECT-GROUPING DATA IN VISUAL MOTION PERCEPTION JONATHAN A. MARSHALL [Invited paper, in Chen S-S (Ed.), Stochastic and Neural Methods in Signal Processing, Image Processing, and Computer Vision, Proceedings of the SPIE 1569, San Diego, July 1991, 200-215.] ---------------------------------------------------------------------------- martin.unsmearing.ps.Z (8 pages) UNSMEARING VISUAL MOTION: DEVELOPMENT OF LONG-RANGE HORIZONTAL INTRINSIC CONNECTIONS KEVIN E. MARTIN and JONATHAN A. MARSHALL [In Hanson SJ, Cowan JD, & Giles CL, Eds., Advances in Neural Information Processing Systems, 5. San Mateo, CA: Morgan Kaufmann Publishers, 1993, 417-424.] ---------------------------------------------------------------------------- RETRIEVAL INSTRUCTIONS % ftp archive.cis.ohio-state.edu Name (cheops.cis.ohio-state.edu:yourname): anonymous Password: (use your email address) ftp> cd pub/neuroprose ftp> binary ftp> get marshall.occlusion.ps.Z ftp> get marshall.context.ps.Z ftp> get marshall.steering.ps.Z ftp> get martin.unsmearing.ps.Z ftp> quit % uncompress marshall.occlusion.ps.Z ; lpr marshall.occlusion.ps % uncompress marshall.context.ps.Z ; lpr marshall.context.ps % uncompress marshall.steering.ps.Z ; lpr marshall.steering.ps % uncompress martin.unsmearing.ps.Z ; lpr martin.unsmearing.ps From tgd at chert.CS.ORST.EDU Wed Feb 2 13:02:30 1994 From: tgd at chert.CS.ORST.EDU (Tom Dietterich) Date: Wed, 2 Feb 94 10:02:30 PST Subject: some questions on training neural nets... In-Reply-To: "Charles X. Ling"'s message of Tue, 1 Feb 94 03:37:10 EST <9402010837.AA01695@godel.csd.uwo.ca> Message-ID: <9402021802.AA00565@curie.CS.ORST.EDU> From: "Charles X. Ling" Date: Tue, 1 Feb 94 03:37:10 EST Hi neural net experts, I am using backprop (and variations of it) quite often although I have not followed neural net (NN) research as well as I wanted. Some rather basic issues in training NN still puzzle me a lot, and I hope to get advice and help from the experts in the area. Sorry for being ignorant. 
Say we are learning a function F (such as a Boolean function of n vars). The training set (TR) and testing set (TS) are drawn randomly according to the same probability distribution, with no noise added in. 1. Is it true that, since there is no noise, the smaller the training error on TR, the better it would predict in general on TS? That is, stopping training earlier is not needed (so cross-validation is not needed). No, this is not true. Even in the noise-free case, the bias/variance tradeoff is operating and it is possible to overfit the training data. Consider for example an algorithm that just memorized the training set and guessed "false" on all unseen examples. It has obviously overfit, and it will obviously do poorly even in the absence of noise. 2. Is it true that, to get reliable prediction (good or bad), we should always choose net architecture with a minimum number of hidden units (or weights via weight decaying)? Will cross-validation help if we have too much freedom in the net (could results on the validation set be coincident)? There are many ways to manage the bias/variance tradeoff. I would say that there is nothing approaching complete agreement on the best approaches (and more fundamentally, the best approach varies from one application to another, since this is really a form of prior). The approaches can be summarized as * early stopping * error function penalties * size optimization - growing - pruning - other Early stopping usually employs cross-validation to decide when to stop training. (see below). In my experience, training an overlarge network with early stopping gives better performance than trying to find the minimum network size. It has the disadvantage that training costs are very high. Error function penalties such as weight decay and soft weight-sharing have been very effective in some applications. In my experience, they introduce additional training problems, because the error surface can develop more local minima. A solution to this is to gradually increase the penalties during training, but this requires more hands-on work than I have patience for. Size optimization attempts to find the optimal number of units and/or number of weights. Cascade-correlation and related algorithms grow the network, optimal brain damage and optimal brain surgeon prune the network, and then of course one can use cross-validation and just generate-and-test different network sizes. An advantage of "right-sizing" is that training time can be considerably reduced (at least the time per epoch). A problem with right-sizing, I believe, is that simply counting units or weights is not necessarily a good measure of network size. The work by Weigend (see 1993 summer school proceedings) suggests that early stopping provides a better method for modulating the effective number of parameters in the network. The OBD/OBS methods do not "just count weights", but instead assess the significance of the weights, so even non-zero weights that are useless can be removed. 3. If, for some reason, cross-validation is needed, and TR is split to TR1 (for training) and TR2 (for validation), what would be the proper ways to do cross-validation? Training on TR1 uses only partial information in TR, but training TR1 to find right parameters and then training on TR1+TR2 may require parameters different from the estimation of training TR1. I use the TR1+TR2 approach. On large data sets, this works well. 
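To make the TR1+TR2 recipe concrete, here is a minimal sketch in Python (illustrative only -- scikit-learn's MLPRegressor and the toy noise-free target below are stand-ins for whatever simulator and data you actually use):

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)

# Toy noise-free data: the target is a smooth function of 10 inputs.
X = rng.uniform(-1.0, 1.0, size=(600, 10))
y = np.sin(X @ rng.randn(10))

X_tr1, y_tr1 = X[:400], y[:400]        # TR1: used for the weight updates
X_tr2, y_tr2 = X[400:500], y[400:500]  # TR2: validation set, decides when to stop
X_ts,  y_ts  = X[500:], y[500:]        # independent test set

def make_net():
    return MLPRegressor(hidden_layer_sizes=(30,), learning_rate_init=0.01,
                        random_state=0)

# Train on TR1, monitor squared error per example on TR2, remember the best epoch.
net = make_net()
best_err, best_epoch = np.inf, 0
for epoch in range(1, 2001):
    net.partial_fit(X_tr1, y_tr1)                     # one pass over TR1
    err = np.mean((net.predict(X_tr2) - y_tr2) ** 2)  # validation error per example
    if err < best_err:
        best_err, best_epoch = err, epoch
    elif epoch - best_epoch > 100:                    # patience: stop once TR2 error stalls
        break

# Retrain a fresh net on TR1+TR2 for the number of epochs chosen above.
final = make_net()
X_all, y_all = np.vstack([X_tr1, X_tr2]), np.concatenate([y_tr1, y_tr2])
for _ in range(best_epoch):
    final.partial_fit(X_all, y_all)

print("stopped at epoch", best_epoch,
      " test MSE:", np.mean((final.predict(X_ts) - y_ts) ** 2))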
On small data sets, the cross-validation estimates themselves are very noisy, so I have not found it to be as successful. I compute the stopping point using the sum squared error per training example, so that it scales. I think it is an open research problem to know whether this is the right thing to do. On a large speech recognition data set, after doing cross-validation training, we later checked to see if we had stopped at the right point (by monitoring using the test set). The cross-validation point was nearly exactly right. This was a case with a large data set. 4. In case the net has too much freedom (even different random seeds produce very different predictive accuracies), how can we effectively reduce the variations? Weight decaying seems to be a powerful tool, any others? What kind of "simple" functions weight decaying is biased to? Thanks very much for help Charles --Tom From karun at faline.bellcore.com Thu Feb 3 10:15:55 1994 From: karun at faline.bellcore.com (N. Karunanithi) Date: Thu, 3 Feb 1994 10:15:55 -0500 Subject: Encoding missing values Message-ID: <199402031515.KAA29100@faline.bellcore.com> > I am currently thinking about the problem of how to encode data with > attributes for which some of the values are missing in the data set for > neural network training and use. I am also having the same problem. I would like to get a copy of the responses. >1. Nominal attributes (that have n different possible values) > 1.1 encoded "1-of-n", i.e., one network input per possible value, the relevant one > being 1 all others 0. > This encoding is very general, but has the disadvantage of producing > networks with very many connections. > Missing values can either be represented as 'all zero' or by simply > treating 'is missing' as just another possible input value, resulting > in a "1-of-(n+1)" encoding. > 1.2 encoded binary, i.e., log2(n) inputs being used like the bits in a > binary representation of the numbers 0...n-1 (or 1...n). > Missing values can either be represented as just another possible input > value (probably all-bits-zero is best) or by adding an additional network > input which is 1 for 'is missing' and 0 for 'is present'. The original > inputs should probably be all zero in the 'is missing' case. > Both methods have the problem of poor scalability. If the number of missing values increases then the number of additional inputs will increase linearly in 1.1 and logarithmically in 1.2. In fact, 1-of-n encoding may be a poor choice if (1) the number of input features is large and (2) such an expanded dimensional representation does not become a (semi) linearly separable problem. Even if it becomes a linearly separable problem, the overall complexity of the network can sometimes be very high. >2. continuous attributes (or attributes treated as continuous) > 2.1 encoded as a single network input, perhaps using some monotone transformation > to force the values into a certain distribution. > Missing values are either encoded as a kind of 'best guess' (e.g. the > average of the non-missing values for this attribute) or by using > an additional network input being 0 for 'missing' and 1 for 'present' > (or vice versa) and setting the original attribute input either to 0 > or to the 'best guess'. (The 'best guess' variant also applies to > nominal attributes above) This representation requires a GUESS. A nominal transformation may not be a proper representation in some cases. Assume that the output values range over a large numerical interval. For example, from 0.0 to 10,000.0.
If you use a simple scaling like dividing by 10,000.0 to make it between 0.0 and 1.0, this will result in poor accuracy of prediction. If the attribute is on the input side, then in theory the scaling is unnecessary because the input layer weights will scale accordingly. However, in practice I had a lot of problems with this approach. Maybe a log transformation before scaling would not be a bad choice. If you use a closed scaling you may have a problem whenever a future value exceeds the maximum value of the numerical interval. For example, assume that the attribute is time, say in milliseconds. Any future time from the point of reference can exceed the limit. Hence any closed scaling will not work properly. > 3. binary attributes (truth values) > 3.1 encoded by one input: 0=false 1=true or vice versa > Treat like (2.1) > 3.2 encoded by one input: -1=false 1=true or vice versa > In this case we may act as for (3.1) or may just use 0 to indicate 'missing'. > 3.3 treat like nominal attribute with 2 possible values No comments. > 4. ordinal attributes (having n different possible values, which are ordered) > 4.1 treat either like continuous or like nominal attribute. > If (1.2) is chosen, a Gray-Code should be used. > Continuous representation is risky unless a 'sensible' quantification > of the possible values is available. I have compared Binary Encoding (1.2), Gray-Coded representation and straightforward scaling. Closed scaling seems to do a good job. I have also compared open scaling and closed scaling and did find significant improvement in prediction accuracy. (Refer to: N. Karunanithi, D. Whitley and Y. K. Malaiya, "Prediction of Software Reliability Using Connectionist Models", IEEE Trans. Software Eng., July 1992, pp. 563-574. N. Karunanithi and Y. K. Malaiya, "The Scaling Problem in Neural Networks for Software Reliability Prediction", Proc. IEEE Int. Symposium on Rel. Eng., Oct. 1992, pp. 776-82.) > So far to my considerations. Now to my questions. > > a) Can you think of other encoding methods that seem reasonable ? Which ? > > b) Do you have experience with some of these methods that is worth sharing ? > > c) Have you compared any of the alternatives directly ? > > Lutz I have not found a simple solution that is general. I think representation in general and missing information in particular are open problems within connectionist research. I am not sure we will have a magic bullet for all problems. The best approach is to come up with a specific solution for a given problem. -Karun From Thierry.Denoeux at hds.univ-compiegne.fr Thu Feb 3 03:36:47 1994 From: Thierry.Denoeux at hds.univ-compiegne.fr (Thierry.Denoeux@hds.univ-compiegne.fr) Date: Thu, 3 Feb 1994 09:36:47 +0100 Subject: Encoding missing values Message-ID: <199402030836.AA29123@kaa.hds.univ-compiegne.fr> Dear Lutz, dear connectionists, In a recent mailing, Lutz Prechelt mentioned the interesting problem of how to encode attributes with missing values as inputs to a neural network. I have recently been faced with that problem while applying neural nets to rainfall prediction using weather radar images. The problem was to classify pairs of "echoes" -- defined as groups of connected pixels with reflectivity above some threshold -- taken from successive images as corresponding to the same rain cell or not. Each pair of echoes was described by a list of attributes. Some of these attributes, referring to the past of a sequence, were not defined for some instances.
To encode these attributes with potentially missing values, we applied two different methods actually suggested by Lutz: - the replacement of the missing value by a "best-guess" value - the addition of a binary input indicating whether the corresponding attribute was present or absent. Significantly better results were obtained by the second method. This work was presented at ICANN'93 last September: X. Ding, T. Denoeux & F. Helloco (1993). Tracking rain cells in radar images using multilayer neural networks. In Proc. of ICANN'93, Springer-Verlag, p. 962-967. Thierry Denoeux +------------------------------------------------------------------------+ | tdenoeux at hds.univ-compiegne.fr Thierry DENOEUX | | Departement de Genie Informatique | | Centre de Recherches de Royallieu | | tel (+33) 44 23 44 96 Universite de Technologie de Compiegne | | fax (+33) 44 23 44 77 B.P. 649 | | 60206 COMPIEGNE CEDEX | | France | +------------------------------------------------------------------------+ From rreilly at nova.ucd.ie Thu Feb 3 10:38:08 1994 From: rreilly at nova.ucd.ie (Ronan Reilly) Date: Thu, 3 Feb 1994 15:38:08 +0000 Subject: Fourth Irish Neural Networks Conference - INNC'94 Message-ID: FOURTH IRISH NEURAL NETWORK CONFERENCE - INNC'94 University College Dublin, Ireland September 12-13, 1994 FIRST CALL FOR PAPERS Papers are solicited for the Fourth Irish Neural Network Conference (INNC'94). They can be in any area of theoretical or applied neural networks. A non-exhaustive list of topic headings includes: Learning algorithms Cognitive modelling Neurobiology Natural language processing Vision Signal processing Time series analysis Hardware implementations An extended abstract of not more than 500 words should be sent, preferably by e-mail, to: Ronan Reilly - INNC'94 Dept. of Computer Science University College Dublin Belfield Dublin 4 IRELAND e-mail: rreilly at nova.ucd.ie The deadline for receipt of abstracts is March 31, 1994. Authors will be contacted regarding acceptance by April 30, 1994. Full papers will be required by August 31, 1994. From finnoff at predict.com Thu Feb 3 11:40:51 1994 From: finnoff at predict.com (William Finnoff) Date: Thu, 3 Feb 94 09:40:51 MST Subject: some questions on training neural nets... Message-ID: <9402031640.AA01243@predict.com> Charles X. Ling writes: > Hi neural net experts, > > I am using backprop (and variations of it) quite often although I have > not followed neural net (NN) research as well as I wanted. Some rather > basic issues in training NN still puzzle me a lot, and I hope to get advice > and help from the experts in the area. Sorry for being ignorant.... In addition to Tom's pertinent comments (tgd at chert.cs.orst.edu, Thu Feb 3), I would suggest consulting the following references which contain discussions of various issues pertaining to model selection/overfitting/stopped training/complexity control/the bias-variance dilemma. (This list is by no means complete). References 2), 4), 13), 15) and 17) are particularly relevant to the questions raised. 1) Baldi, P. and Chauvin, Y. (1991). Temporal evolution of generalization during learning in linear networks, {\it Neural Computation} 3, 589-603. 2) Finnoff, W., Hergert, F. and Zimmermann, H.G., Improving generalization performance by nonconvergent model selection methods, {\it Neural Networks}, vol.6, nr.6, pp. 771-783, 1993. 3) Finnoff, W. and Zimmermann, H.G. (1991). Detecting structure in small datasets by network fitting under complexity constraints. To appear in {\it Proc. of 2nd Ann.
Workshop on Computational Learning Theory and Natural Learning Systems}, Berkley. 4) Geman, S., Bienenstock, E. and Doursat R., (1992). Neural networks and the bias/variance dilemma, {\it Neural Computation} 4, 1-58. 5) Guyon, I., Vapnik, V., Boser, B., Bottou, L. and Solla, S. (1992). Structural risk minimization for character recognition. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 471-479). San Mateo: Morgan Kaufman. 6) Hanson, S. J., and Pratt, L. Y. (1989). Comparing biases for minimal network construction with back-propagation, In D. S. Touretzky, (Ed.), {\it Advances in Neural Information Processing I} (pp.177-185). San Mateo: Morgan Kaufman. 7) Hergert, F., Finnoff, W. and Zimmermann, H.G. (1992). A comparison of weight elimination methods for reducing complexity in neural networks. {\it Proc. Int. Joint Conf. on Neural Networks}, Baltimore. 8) Hergert, F., Zimmermann, H.G., Kramer, U., and Finnoff, W. (1992). Domain independent testing and performance comparisons for neural networks. In I. Aleksander and J. Taylor (Eds.) {\it Artificial Neural Networks II} (pp.1071-1076). London: North Holland. 9) Le Cun, Y., Denker J. and Solla, S. (1990). Optimal Brain Damage. In D. Touretzky (Ed.) {\it Advances in Neural Information Processing Systems II} (pp.598-605). San Mateo: Morgan Kaufman. 10) MacKay, D. (1991). {\it Bayesian Modelling and Neural Networks}, Dissertation, Computational and Neural Systems, California Inst. of Tech. 139-74, Pasadena. 11) Moody, J. (1992). Generalization, weight decay and architecture selection for nonlinear learning systems. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 471-479). San Mateo: Morgan Kaufman. 12) Morgan, N. and Bourlard, H. (1990). Generalization and parameter estimation in feedforward nets: Some experiments. In D. Touretzky (Ed.) {\it Advances in Neural Information Processing Systems II} (pp.598-605). San Mateo: Morgan Kaufman. 13) Sj\"oberg, J. and Ljung, L. (1992). Overtraining, regularization and searching for minimum in neural networks, {Report LiTH-ISY-I-1297, Dep. of Electrical Engineering}, Link\"oping University, S-581 83 Link\"oping, Sweden. 14) Stone, C.J. (1977). Cross-validation: A review. {\it Math. Operations res. Statist. Ser.}, 9, 1-51. 15) Vapnik, V. (1992). Principles of risk minimization for learning theory. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 831-838 ). San Mateo: Morgan Kaufman. 16) Weigend, A. and Rumelhart, D. (1991). The effective dimension of the space of hidden units, in {\it Proc. Int. Joint Conf. on Neural Networks}, Singapore. 17) Weigend, A., Rumelhart, D., and Huberman, B. (1991). Generalization by weight elimination with application to forecasting. In R. Lippman, J. Moody and D. Touretzy (Eds.), {\it Advances in Neural Information Processing III} (pp.875-882). San Mateo: Morgan Kaufman. 18) White, H. (1989). Learning in artificial neural networks: A statistical perspective, {\it Neural Computation} 1, 425-464. -William %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% William Finnoff Prediction Co. 320 Aztec St., Suite B Santa Fe, NM, 87501, USA Tel.: (505)-984-3123 Fax: (505)-983-0571 e-mail: finnoff at predict.com From jlm at crab.psy.cmu.edu Thu Feb 3 11:27:41 1994 From: jlm at crab.psy.cmu.edu (James L. 
McClelland) Date: Thu, 3 Feb 94 11:27:41 EST Subject: CMU-Pitt Center for the Neural Basis of Cognition Message-ID: <9402031627.AA08304@crab.psy.cmu.edu.psy.cmu.edu> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Carnegie Mellon University and the University of Pittsburgh Announce the Creation of the Center for the Neural Basis of Cognition The Center is dedicated to the study of the neural basis of cognitive processes, including learning and memory, language and thought, perception, attention, and planning; to the study of the development of the neural substrate of these processes; to the study of disorders of these processes and their underlying neuropathology; and to the promotion of applications of the results of these studies to artificial intelligence, technology, and medicine. The Center will synthesize the disciplines of basic and clinical neuroscience, cognitive psychology, and computer science, combining neurobiological, behavioral, computa- tional and brain imaging methods. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Faculty Openings in the Center The Center seeks faculty and research scientists whose work relates to the mission stated above. Recruiting is beginning immediately, and will continue for several years. Appointments can be at any level and will be coordinated with one or more departments at either university. Coordinating departments include Biological Sciences, Computer Science, and Psychology at Carnegie Mellon and the departments of Behavioral Neuroscience, Neurobiology, Neurology, Psychiatry and Psychology at the University of Pittsburgh. Other affiliations may be possible. Candidates should send an application to either of the Co-Directors of the Center, listed below. The application should include a statement of interest indicating how the candidate's work fits the mission of the center and suggesting possible departmental affiliations, as well as a CV, copies of publications, and three letters of reference. Both uni- versities are EEO/AA Employers. James L. McClelland Robert Y. Moore Department of Psychology Center for Neuroscience Baker Hall 345-F Biomedical Science Tower 1656 Carnegie Mellon University University of Pittsburgh Pittsburgh, PA 15213 Pittsburgh, PA 15261 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From wahba at stat.wisc.edu Thu Feb 3 20:42:28 1994 From: wahba at stat.wisc.edu (Grace Wahba) Date: Thu, 3 Feb 94 19:42:28 -0600 Subject: nips6 paper on ss-anova in archive Message-ID: <9402040142.AA06981@hera.stat.wisc.edu> Dear Colleagues Our paper for the 1993 Neural Information Processing Society (NIPS) Proceedings is in the neuroprose archive under wahba.nips6.ps.Z Title: Structured Machine Learning For `Soft' Classification with Smoothing Spline ANOVA and Stacked Tuning, Testing and Evaluation. Authors: G. Wahba, Y. Wang, C. Gu, R. Klein and B. Klein Summary We describe the use of smoothing spline analysis of variance (SS-ANOVA) in the penalized log likelihood context, for learning (estimating) the probability $p$ of a `$1$' outcome, given a training set with attribute vectors and 0-1 outcomes. $p$ is of the form $p(t) = e^{f(t)}/(1+e^{f(t)})$, where, if $t$ is a vector of attributes, $f$ is learned as a sum of smooth functions of one attribute plus a sum of smooth functions of two attributes, etc. The smoothing parameters governing $f$ are obtained by an iterative unbiased risk or iterative GCV method. Confidence intervals for these estimates are available. 
The method is applied to estimate the risk of progression of diabetic retinopathy given predictor variables of age, body mass index and glycosylated hemoglobin. RETRIEVAL INSTRUCTIONS for NEUROPROSE ARCHIVE % ftp archive.cis.ohio-state.edu Name (cheops.cis.ohio-state.edu:yourname): anonymous Password: (use your email address) ftp> cd pub/neuroprose ftp> binary ftp> get wahba.nips6.ps.Z ftp> quit % uncompress wahba.nips6.ps.Z % lpr wahba.nips6.ps Some other papers of yours truly, friends and students, and an idiosyncratic bibliography of possible interest to connectionists are available by ftp. Get the (ascii) file Contents to see what's there. RETRIEVAL INSTRUCTIONS for WAHBA's public directory % ftp ftp.stat.wisc.edu Name (ftp.stat.wisc.edu:yournamehere): anonymous Password: (use your email address) ftp> binary ftp> cd pub/wahba ftp> get Contents ... read Contents and retrieve files of interest From pollack at cis.ohio-state.edu Thu Feb 3 17:17:14 1994 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Thu, 3 Feb 1994 17:17:14 -0500 Subject: new neuroprose/Thesis subdirectory Message-ID: <199402032217.RAA01292@dendrite.cis.ohio-state.edu> *** do not forward ** The filesystem on which neuroprose resides has overflowed. A set of very large files (all the files with *thesis* in their filename), have been moved to a new subdirectory. jordan From bill at nsma.arizona.edu Thu Feb 3 23:53:26 1994 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Thu, 03 Feb 1994 21:53:26 -0700 (MST) Subject: Encoding missing values Message-ID: <9402040453.AA24599@nsma.arizona.edu> There is at least one kind of network that has no problem (in principle) with missing inputs, namely a Boltzmann machine. You just refrain from clamping the input node whose value is missing, and treat it like an output node or hidden unit. This may seem to be irrelevant to anything other than Boltzmann machines, but I think it could be argued that nothing very much simpler is capable of dealing with the problem. When you ask a network to handle missing inputs, you are in effect asking it to do pattern completion on the input layer, and for this a Boltzmann machine or some other sort of attractor network would seem to be required. -- Bill From tal at goshawk.lanl.gov Fri Feb 4 10:22:12 1994 From: tal at goshawk.lanl.gov (Tal Grossman) Date: Fri, 4 Feb 1994 08:22:12 -0700 Subject: some questions on training neural nets... Message-ID: <199402041522.IAA22945@goshawk.lanl.gov> Dear Charles X. Ling, You say: "Some rather basic issues in training NN still puzzle me a lot, and I hope to get advice and help from the experts in the area." Well... the questions you have asked still puzzle the experts as well, and good answers, where they exist, are very much case dependent. As Tom Dietterich wrote, in general "Even in the noise-free case, the bias/variance tradeoff is operating and it is possible to overfit the training data", therefore you can not expect just any large net to generalize well. It was also observed recently that... When having a large enough set of examples (so one can have a good enough sample for the training and the validation set), you can obtain better generalization with larger nets by using cross validation to decide when to stop training, as is demonstrated in the paper of A. Weigend : Weigend A.S. (1994), in the {\em Proc. of the 1993 Connectionist Models Summer School}, edited by M.C. Mozer, P. Smolensky, D.S. Touretzky, J.L. Elman and A.S. Weigend, pp. 335-342 (Erlbaum Associates, Hillsdale NJ, 1994). 
Rich Caruana has presented similar results in the "Complexity Issues" workshop in the last NIPS post-conference. But... Larger networks can generalize as good as, or even better than small networks even without cross-validation. A simple experiment that demonstrates that was presented in : T. Grossman, R. Meir and E. Domany, Learning by choice of Internal Representations, Complex Systems 2, 555-575 (1988). In that experiment, networks with different number of hidden units were trained to perform the symmetry task by using a fraction of the possible examples as the training set, training the net to 100% performance on the TR set and testing the performance on the rest (off training set generalization). No early stopping, no cross validation. The symmetry problem can be solved by 2 hidden units - so this is the minimal architecture required for this specific function. However, it was found that it is NOT the best generalizing architecture. The generalization rates of all the architectures (H=2..N, the size of the input) were similar, with the larger networks somewhat better. Now, this is a special case. One can explain it by observing that the symmetry problem can also be solved by a network of N hidden units, with smaller weights, and not only by effectively "zeroing" the contributions of all but two units (see an example in Minsky and Papert's Perceptrons). Probably by all the other architectures as well. So, considering the mapping from weight space to function space, it is very likely that training a large network on partial data will take you closer (in function space) to your target function F (symmetry in that case) than training a small one. The picture can be different in other cases... One has to remember that the training/generalization problem (including the bias/variance tradeoff problem) is, in general, a complex interaction between three entities: 1. The target function (or the task). 2. The learning model, and what is the class of functions that is realizable by this model (and its associated learning algorithm). 3. The training set, and how well it represents the task. Even the simple question: is my training set large enough (or good enough) ? is not simple at all. One might think that it should be larger than, say, twice the number of free parameters (weights) in your model/network architecture. It turns out that not even this is enough in general. Allow me to advertise here the paper presented by A.Lapedes and myself at the last NIPS where we present a method to test a "general" classification algorithm (i.e. any classifier such as a neural net, a decision tree, etc. and its learning algorithm, which may include pruning or net construction) by a method we call "noise sensitivity signature" NSS (see abstract below). In addition to introducing this new model selection method, which we believe can be a good alternative to cross-validation in data limited cases, we present the following experiment: the target function is a network with 20:5:1 architecture (weights chosen at random). The training set is provided by choosing M random input patterns and classifying them by the teacher net. we then train other nets with various architectures, ranging from 1 to 8 hidden units on the training set (without controlled stopping, but with tolerance in the error function). A different (and large) set of classified examples is used to determine the generalization performance of the trained nets (averaged over several realizations with different initial weights). 
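A rough sketch of such a teacher/student experiment is given below. This is illustrative code only, not the code used for the results that follow; scikit-learn's MLPClassifier, the tanh teacher and all of the specific settings are my own assumptions, with only the 20:5:1 teacher, the student sizes 1..8 and the training set sizes taken from the description above.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(1)

# A fixed, randomly weighted 20:5:1 "teacher" network provides the labels.
W1, b1 = rng.randn(20, 5), rng.randn(5)
W2, b2 = rng.randn(5, 1), rng.randn(1)

def teacher_labels(X):
    h = np.tanh(X @ W1 + b1)
    return (np.tanh(h @ W2 + b2).ravel() > 0).astype(int)

def make_set(n):
    X = rng.choice([-1.0, 1.0], size=(n, 20))   # random binary input patterns
    return X, teacher_labels(X)

X_test, y_test = make_set(10000)                # large independent test set

for m in (400, 700, 1000):                      # training set sizes
    X_train, y_train = make_set(m)
    for h in range(1, 9):                       # student nets with 1..8 hidden units
        scores = []
        for seed in range(5):                   # average over weight initializations
            student = MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000,
                                    random_state=seed)
            student.fit(X_train, y_train)
            scores.append(student.score(X_test, y_test))
        print("M=%d  hidden=%d  generalization=%.3f" % (m, h, np.mean(scores)))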
Some of the results are: 1. With different training set sizes M=400, 700, 1000, the optimal architecture is different. A smaller training set yields a smaller optimal network, according to the independent test set measure. 2. Even with M=1000 (much more than twice the number of weights), the optimal learning net is still smaller than the original teacher net. 3. There are differences of up to a few percent in generalization performance of the different learning nets for all training set sizes. In particular, nets that are larger than the optimal do worse as their size increases. Depending on your problem, a few percent can be insignificant or they can make a real difference. In some real applications, 1-2 % can be the difference between a contract and a paper... In such cases you would like to tune your model (i.e., to identify the optimal architecture) as best you can. 4. Using the NSS it was possible to recognize the optimal architectures for each training set, without using extra data. Some conclusions are: 1. If one uses a validation set to choose the architecture (not for stopping) - for example by using the extra 1000 examples - then the architecture that will be picked up when using the 700 training set is going to be smaller (and worse) than the one picked up when using the 1000 training set. In other words, suppose your data is just 1000 examples and you devote 300 of them to be your validation set. Then even if those 300 give a good estimate of the generalization of the trained net, when you choose the model according to this test set you end up with the optimal model for 700 training examples, which is less good than the optimal model that you could obtain when training with all 1000 examples. It means that in many cases you need more examples than one might expect in order to obtain a well tuned model, especially if you are using a considerable fraction of them as a validation set. 2. Using NSS one would find the right architecture for the total number of examples you have - paying a factor of about 30 in training effort. 3. You can use "set 1 aside" cross validation in order to select your model. This will probably overcome the bias caused by giving up a large fraction of the examples. However, in order to obtain a reliable estimate of the performance the training process will have to be repeated many times, probably more than what is needed in order to calculate the NSS. It is important to emphasize again: the above results were obtained for that specific experiment. We have obtained similar results with different tasks (e.g. DNA structure classification) and with different learning machines (e.g. decision trees), but still, these results prove nothing "in general", except maybe that life is complicated and full of uncertainty... A more careful comparison with cross validation as a stopping method, and using NSS in other scenarios (like function fitting), is under investigation. If anyone is interested in using the NSS method in combination with pruning methods (e.g. to test the stopping criteria), I will be glad to help. I will be grateful for any other information/references about similar experiments. I hope all the above did not add too much to your puzzlement. Good luck with your training, Tal ------------------------------------------------ The paper I mentioned above is: Learning Theory seminar: Thursday Feb.10. 15:15. CNLS Conference room. title: Use of Bad Training Data For Better Predictions.
by : Tal Grossman and Alan Lapedes (Complex Systems group, LANL) Abstract: We present a method for calculating the ``noise sensitivity signature'' of a learning algorithm which is based on scrambling the output classes of various fractions of the training data. This signature can be used to indicate a good (or bad) match between the complexity of the classifier and the complexity of the data and hence to improve the predictive accuracy of a classification algorithm. Use of noise sensitivity signatures is distinctly different from other schemes to avoid overtraining, such as cross-validation, which uses only part of the training data, or various penalty functions, which are not data-adaptive. Noise sensitivity signature methods use all of the training data and are manifestly data-adaptive and non-parametric. They are well suited for situations with limited training data It is going to appear in the Proc. of NIPS 6. An expanded version of it will (hopefully) be placed in the neuroprose archive within a week or two. Until then I can send a ps file of it to the interested. From sef+ at cs.cmu.edu Fri Feb 4 10:25:51 1994 From: sef+ at cs.cmu.edu (Scott E. Fahlman) Date: Fri, 04 Feb 94 10:25:51 EST Subject: Encoding missing values In-Reply-To: Your message of Thu, 03 Feb 94 21:53:26 -0700. <9402040453.AA24599@nsma.arizona.edu> Message-ID: There is at least one kind of network that has no problem (in principle) with missing inputs, namely a Boltzmann machine. You just refrain from clamping the input node whose value is missing, and treat it like an output node or hidden unit. This may seem to be irrelevant to anything other than Boltzmann machines, but I think it could be argued that nothing very much simpler is capable of dealing with the problem. When you ask a network to handle missing inputs, you are in effect asking it to do pattern completion on the input layer, and for this a Boltzmann machine or some other sort of attractor network would seem to be required. Good point, but perhaps in need of clarification for some readers: There are two ways of training a Boltzmann machine. In one (the original form), there is no distinction between input and output units. During training we alternate between an instruction phase, in which all of the externally visible units are clamped to some pattern, and a normalization phase, in which the whole network is allow to run free. The idea is to modify the weights so that, when running free, the external units assume the various pattern values in the training set in their proper frequencies. If only some subset of the externally visible units are clamped to certain values, the net will produce compatible completions in the other units, again with frequencies that match this part of the training set. A net trained in this way will (in principle -- it might take a *very* long time for anything complicated) do what you suggest: Complete an "input" pattern and produce a compatible output at the same time. This works even if the input is *totally* missing. I believe it was Geoff Hinton who realized that a Boltzmann machine could be trained more efficiently if you do make a distinction between input and output units, and don't waste any of the training effort learning to reconstruct the input. In this model, the instruction phase clamps both input and output units to some pattern, while the normalization phase clamps only the input units. 
Since the input units are correct in both cases, all of the networks learning power (such as it is) goes into producing correct patterns on the output units. A net trained in this way will not do input-completion. I bring this up because I think many people will only have seen the latter kind of Boltzmann training, and will therefore misunderstand your observation. By the way, one alternative method I have seen proposed for reconstructing missing input values is to first train an auto-encoder (with some degree of bottleneck to get generalization) on the training set, and then feed the output of this auto-encoder into the classification net. The auto-encoder should be able to replace any missing values with some degree of accuracy. I haven't played with this myself, but it does sound plausible. If anyone can point to a good study of this method, please post it here or send me E-mail. -- Scott =========================================================================== Scott E. Fahlman Internet: sef+ at cs.cmu.edu Senior Research Scientist Phone: 412 268-2575 School of Computer Science Fax: 412 681-5739 Carnegie Mellon University Latitude: 40:26:33 N 5000 Forbes Avenue Longitude: 79:56:48 W Pittsburgh, PA 15213 =========================================================================== From zoubin at psyche.mit.edu Fri Feb 4 11:04:32 1994 From: zoubin at psyche.mit.edu (Zoubin Ghahramani) Date: Fri, 4 Feb 94 11:04:32 EST Subject: Encoding missing values Message-ID: <9402041604.AA28037@psyche.mit.edu> Dear Lutz, Thierry, Karun, and connectionists, I have also been looking into the issue of encoding and learning from missing values in a neural network. The issue of handling missing values has been addressed extensively in the statistics literature for obvious reasons. To learn despite the missing values the data has to be filled in, or the missing values integrated over. The basic question is how to fill in the missing data. There are many different methods for doing this in stats (mean imputation, regression imputation, Bayesian methods, EM, etc.). For good reviews see (Little and Rubin 1987; Little, 1992). I do not in general recommend encoding "missing" as yet another value to be learned over. Missing means something in a statistical sense -- that the input could be any of the values with some probability distribution. You could, for example, augment the original data filling in different values for the missing data points according to a prior distribution. Then the training would assign different weights to the artificially filled-in data points depending on how well they predict the output (their posterior probability). This is essentially the method proposed by Buntine and Weigand (1991). Other approaches have been proposed by Tresp et al. (1993) and Ahmad and Tresp (1993). I have just written a paper on the topic of learning from incomplete data. In this paper I bring a statistical algorithm for learning from incomplete data, called EM, into the framework of nonlinear function approximation and classification with missing values. This approach fits the data iteratively with a mixture model and uses that same mixture model to effectively fill in any missing input or output values at each step. You can obtain the preprint by ftp psyche.mit.edu login: anonymous cd pub get zoubin.nips93.ps To obtain code for the algorithm please contact me directly. 
Zoubin Ghahramani zoubin at psyche.mit.edu ----------------------------------------------------------------------- Ahmad, S and Tresp, V (1993) "Some Solutions to the Missing Feature Problem in Vision." In Hanson, S.J., Cowan, J.D., and Giles, C.L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers, San Mateo, CA. Buntine, WL, and Weigand, AS (1991) "Bayesian back-propagation." Complex Systems. Vol 5 no 6 pp 603-43 Ghahramani, Z and Jordan MI (1994) "Supervised learning from incomplete data via an EM approach" To appear in Cowan, J.D., Tesauro, G., and Alspector,J. (eds.). Advances in Neural Information Processing Systems 6. Morgan Kaufmann Publishers, San Francisco, CA, 1994. Little, RJA (1992) "Regression With Missing X's: A Review." Journal of the American Statistical Association. Volume 87, Number 420. pp. 1227-1237 Little, RJA. and Rubin, DB (1987). Statistical Analysis with Missing Data. Wiley, New York. Tresp, V, Hollatz J, Ahmad S (1993) "Network structuring and training using rule-based knowledge." In Hanson, S.J., Cowan, J.D., and Giles, C.~L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers, San Mateo, CA. From Volker.Tresp at zfe.siemens.de Fri Feb 4 13:09:46 1994 From: Volker.Tresp at zfe.siemens.de (Volker Tresp) Date: Fri, 4 Feb 1994 19:09:46 +0100 Subject: missing data Message-ID: <199402041809.AA14305@inf21.zfe.siemens.de> In response to the questions raised by Lutz Prechelt concerning the missing data problem: In general, the solution to the missing-data problem depends on the missing-data mechanism. For example, if you sample the income of a population and rich people tend to refuse the answer the mean of your sample is biased. To obtain an unbiased solution you would have to take into account the missing-data mechanism. The missing-data mechanism can be ignored if it is independent of the input and the output (in the example: the likelihood that a person refuses to answer is independent of the person's income). Most approaches assume that the missing-data mechanism can be ignored. There exist a number of ad hoc solutions to the missing-data problem but it is also possible to approach the problem from a statistical point of view. In our paper (which will be published in the upcoming NIPS-volume and which will be available on neuroprose shortly) we discuss a systematic likelihood-based approach. NN-regression can be framed as a maximum likelihood learning problem if we assume the standard signal plus Gaussian noise model P(x, y) = P(x) P(y|x) \propto P(x) exp(-1/(2 \sigma^2) (y - NN(x))^2). By deriving the probability density function for a pattern with missing features we can formulate a likelihood function including patterns with complete and incomplete features. The solution requires an integration over the missing input. In practice, the integral is approximated using a numerical approximation. For networks of Gaussian basis functions, it is possible to obtain closed-form solutions (by extending the EM algorithm). Our paper also discusses why and when ad hoc solutions --such as substituting the mean for an unknown input-- are harmful. For example, if the mapping is approximately linear substituting the mean might work quite well. In general, although, it introduces bias. Training with missing and noisy input data is described in: ``Training Neural Networks with Deficient Data,'' V. Tresp, S. Ahmad and R. Neuneier, in Cowan, J. D., Tesauro, G., and Alspector, J. 
(eds.), {\em Advances in Neural Information Processing Systems 6}, Morgan Kaufmann, 1994. A related paper by Zoubin Ghahramani and Michael Jordan will also appear in the upcoming NIPS-volume. Recall with missing and noisy data is discussed in (available in neuroprose as ahmad.missing.ps.Z): ``Some Solutions to the Missing Feature Problem in Vision,'' S. Ahmad and V. Tresp, in {\em Advances in Neural Information Processing Systems 5,} S. J. Hanson, J. D. Cowan, and C. L. Giles eds., San Mateo, CA, Morgan Kaufman, 1993. Volker Tresp Subutai Ahmad Ralph Neuneier tresp at zfe.siemens.de ahmad at interval.com ralph at zfe.siemens.de From wray at ptolemy-ethernet.arc.nasa.gov Fri Feb 4 15:19:44 1994 From: wray at ptolemy-ethernet.arc.nasa.gov (Wray Buntine) Date: Fri, 4 Feb 94 12:19:44 PST Subject: Encoding missing values In-Reply-To: <199402031515.KAA29100@faline.bellcore.com> (karun@faline.bellcore.com) Message-ID: <9402042019.AA05621@ptolemy.arc.nasa.gov> regarding this missing value question raised thusly .... by Thierry Denoeux, Lutz Prechelt, and others >>>>>>>>>>>>>>> > So far to my considerations. Now to my questions. > > a) Can you think of other encoding methods that seem reasonable ? Which ? > > b) Do you have experience with some of these methods that is worth sharing ? > > c) Have you compared any of the alternatives directly ? > > Lutz + > I have not found a simple solution that is general. I think > representation in general and the missing information in specific > are open problems within connectionist research. I am not sure we will > have a magic bullet for all problems. The best approach is to come up > with a specific solution for a given problem. -> Karun >>>>>>>>>> This missing value problem is of course shared amongst all the learning communities, artificial intelligence, statistics, pattern recognition, etc., not just neural networks. A classic study in this area, which includes most suggestions I've read here so far, is inproceedings{quinlan:ml6, AUTHOR = "J.R. Quinlan", TITLE = "Unknown Attribute Values in Induction", YEAR = 1989, BOOKTITLE = "Proceedings of the Sixth International Machine Learning Workshop", PUBLISHER = "Morgan Kaufmann", ADDRESS = "Cornell, New York"} The most frequently cited methods I've seen, and they're so common amongst the different communities its hard to lay credit: 1) replace missings by their some best guess 2) fracture the example into a set of fractional examples each with the missing value filled in somehow 3) call the missing value another input value 3 is a good thing to do if they are "informative" missing, i.e. if someone leaves the entry "telephone number" blank in a questionaire, then maybe they don't have a telephone, but probably not good otherwise unless you have loads of data and don't mind all the extra example types generated (as already mentioned) 1 is a quick and dirty hack at 2. How good depends on your application. 2 is an approximation to the "correct" approach for handling "non-informative" missing values according to the standard "mixture model". The mathematics for this is general and applies to virtually any learning algorithm trees, feed-forward nets, linear regression, whatever. We do it for feed-forward nets in @article{buntine.weigend:bbp, AUTHOR = "W.L. Buntine and A.S. Weigend", TITLE = "Bayesian Back-Propagation", JOURNAL = "Complex Systems", Volume = 5, PAGES = "603--643", Number = 1, YEAR = "1991" } and see Tresp, Ahmad & Neuneier in NIPS'94 for an implementation. 
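For anyone implementing them, methods 1 and 3 amount to very little code. Here is an illustrative sketch (not taken from any of the references above; all names and values are made up), using NaN to mark a missing continuous entry and an extra 1-of-(n+1) slot for a missing nominal value:

import numpy as np

# A continuous attribute with missing entries marked as NaN.
x = np.array([0.7, np.nan, 1.3, 2.0, np.nan, 0.9])

# Method 1: replace missings by a best guess (here the mean of the observed
# values), optionally adding a 0/1 'is present' indicator as an extra input.
guess = np.nanmean(x)
x_filled = np.where(np.isnan(x), guess, x)
present = (~np.isnan(x)).astype(float)
continuous_inputs = np.column_stack([x_filled, present])

# Method 3: for a nominal attribute, treat 'missing' as one more value,
# i.e. a 1-of-(n+1) encoding (here n=3 colours plus one 'missing' slot).
values = ["red", "green", "blue", None, "green"]
categories = ["red", "green", "blue", "missing"]
nominal_inputs = np.zeros((len(values), len(categories)))
for i, v in enumerate(values):
    nominal_inputs[i, categories.index(v if v is not None else "missing")] = 1.0

print(continuous_inputs)
print(nominal_inputs)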
But no doubt someone probably published the general idea back in the 50's. I certainly wouldn't call missing values an open problem. Rather, "efficient implementations of the standard approaches" is, in some cases, an open problem. Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 269-2 fax: (415) 604 3594 Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov From stork at cache.crc.ricoh.com Fri Feb 4 11:57:37 1994 From: stork at cache.crc.ricoh.com (David G. Stork) Date: Fri, 4 Feb 94 08:57:37 -0800 Subject: Missing features... Message-ID: <9402041657.AA12260@neva.crc.ricoh.com> There is a provably optimal method for performing classification with missing inputs, described in Chapter 2 of "Pattern Classification and Scene Analysis" (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, which avoids the ad-hoc heuristics that have been described by others. Those interested in obtaining Chapter two via ftp should contact me. Dr. David G. Stork Chief Scientist and Head, Machine Learning and Perception Ricoh California Research Center 2882 Sand Hill Road Suite 115 Menlo Park, CA 94025-7022 USA 415-496-5720 (w) 415-854-8740 (fax) stork at crc.ricoh.com From wray at ptolemy-ethernet.arc.nasa.gov Fri Feb 4 15:47:25 1994 From: wray at ptolemy-ethernet.arc.nasa.gov (Wray Buntine) Date: Fri, 4 Feb 94 12:47:25 PST Subject: some questions on training neural nets... In-Reply-To: <9402031640.AA01243@predict.com> (message from William Finnoff on Thu, 3 Feb 94 09:40:51 MST) Message-ID: <9402042047.AA06120@ptolemy.arc.nasa.gov> Tom Dietterich and William Finnof covered a lot of issues. I'd just like to highlight two points: * this is a contentious area * there are several opposing factors at play that confuse our understanding of this ================ detail Basically, this comment below is SO true. > There are many ways to manage the bias/variance tradeoff. I would say > that there is nothing approaching complete agreement on the best > approaches (and more fundamentally, the best approach varies from one > application to another, since this is really a form of prior). The > approaches can be summarized as The bias/variance tradeoff lies at the heart of almost all disagreements between different learning philosophies such as classical, Bayesian, minimum description length, resampling schemes (now often viewed as empirical Bayesian), statistical physics approaches, and the various "implementation" schemes. One thing to note is that there are several quite separate forces in operation here: computational and search issues: (e.g. maybe early stopping works better because its a more efficient way of searching the space of smaller networks ?) prior issues: (e.g. have you thrown in 20 attributes you happen to think might apply, but probably 15 are irrelevant; OR did a medical specialist carefully pick all 10 attributes and assures you every one is important, OR is a medical specialist able to solve the task blind, just be reading the 20 attribute values (without seeing the patient), etc.) (e.g. are 30 hidden units adequate for the structure of the task? ) asking the right question: (e.g. sometimes the question: what's the "best" network is a bit silly when you have a small amount of data, perhaps you should be trying to find 10 reasonable alternative networks and pool their results (ala. Michael Perrone's NIPS'93 workshop) understanding your representation: (e.g. 
with rule based systems, each rule has a good interpretation so the question of how to prune, etc., is something you can understand well BUT with a large feed-forward network, understanding the structure of the space is more involved, e.g. if I set these 2 weights to zero what the hell happens to my proposed solution) (e.g. this confuses the problem of designing good regularizes/priors/network-encodings). Problem is that theory people tend to focus on one, maybe two of these, whereas application people tend to confuse them together. Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 269-2 fax: (415) 604 3594 Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov From kak at gate.ee.lsu.edu Fri Feb 4 17:24:34 1994 From: kak at gate.ee.lsu.edu (Subhash Kak) Date: Fri, 4 Feb 94 16:24:34 CST Subject: Encoding missing values Message-ID: <9402042224.AA23849@gate.ee.lsu.edu> Missing values in feedback networks raise interesting questions: Should these values be considered "don't know" values or should these be generated in some "most likelihood" fashion? These issues are discussed in the following paper: S.C. Kak, "Feedback neural networks: new characteristics and a generalization", Circuits, Systems, Signal Processing, vol. 12, no. 2, 1993, pp. 263-278. -Subhash Kak From moody at chianti.cse.ogi.edu Fri Feb 4 18:50:07 1994 From: moody at chianti.cse.ogi.edu (John Moody) Date: Fri, 4 Feb 94 15:50:07 -0800 Subject: PhD and Masters Programs at the Oregon Graduate Institute Message-ID: <9402042350.AA19148@chianti.cse.ogi.edu> Fellow Connectionists: The Oregon Graduate Institute of Science and Technology (OGI) has openings for a few outstanding students in its Computer Science and Electrical Engineering Masters and Ph.D programs in the areas of Neural Networks, Learning, Signal Processing, Time Series, Control, Speech, Language, and Vision. Faculty and postdocs in these areas include Etienne Barnard, Ron Cole, Mark Fanty, Dan Hammerstrom, Hynek Hermansky, Todd Leen, Uzi Levin, John Moody, David Novick, Misha Pavel, Joachim Utans, Eric Wan, and Lizhong Wu. Short descriptions of our research interests are appended below. OGI is a young, but rapidly growing, private research institute located in the Portland area. OGI offers Masters and PhD programs in Computer Science and Engineering, Applied Physics, Electrical Engineering, Biology, Chemistry, Materials Science and Engineering, and Environmental Science and Engineering. Inquiries about the Masters and PhD programs and admissions for either Computer Science or Electrical Engineering should be addressed to: Margaret Day, Director Office of Admissions and Records Oregon Graduate Institute PO Box 91000 Portland, OR 97291 Phone: (503)690-1028 Email: margday at admin.ogi.edu The final deadline for receipt of all applications materials for the Ph.D. programs is March 1, 1994, so it's not too late to apply! Masters program applications are accepted continuously. +++++++++++++++++++++++++++++++++++++++++++++++++++++++ Oregon Graduate Institute of Science & Technology Department of Computer Science and Engineering & Department of Electrical Engineering and Applied Physics Research Interests of Faculty in Adaptive & Interactive Systems (Neural Networks, Signal Processing, Control, Speech, Language, and Vision) Etienne Barnard (Assistant Professor): Etienne Barnard is interested in the theory, design and implementation of pattern-recognition systems, classifiers, and neural networks. 
He is also interested in adaptive control systems -- specifically, the design of near-optimal controllers for real- world problems such as robotics. Ron Cole (Professor): Ron Cole is director of the Center for Spoken Language Understanding at OGI. Research in the Center currently focuses on speaker- independent recognition of continuous speech over the telephone and automatic language identification for English and ten other languages. The approach combines knowledge of hearing, speech perception, acoustic phonetics, prosody and linguistics with neural networks to produce systems that work in the real world. Mark Fanty (Research Assistant Professor): Mark Fanty's research interests include continuous speech recognition for the telephone; natural language and dialog for spoken language systems; neural networks for speech recognition; and voice control of computers. Dan Hammerstrom (Associate Professor): Based on research performed at the Institute, Dan Hammerstrom and several of his students have spun out a company, Adaptive Solutions Inc., which is creating massively parallel computer hardware for the acceleration of neural network and pattern recognition applications. There are close ties between OGI and Adaptive Solutions. Dan is still on the faculty of the Oregon Graduate Institute and continues to study next generation VLSI neurocomputer architectures. Hynek Hermansky (Associate Professor); Hynek Hermansky is interested in speech processing by humans and machines with engineering applications in speech and speaker recognition, speech coding, enhancement, and synthesis. His main research interest is in practical engineering models of human information processing. Todd K. Leen (Associate Professor): Todd Leen's research spans theory of neural network models, architecture and algorithm design and applications to speech recognition. His theoretical work is currently focused on the foundations of stochastic learning, while his work on Algorithm design is focused on fast algorithms for non-linear data modeling. Uzi Levin (Senior Research Scientist): Uzi Levin's research interests include neural networks, learning systems, decision dynamics in distributed and hierarchical environments, dynamical systems, Markov decision processes, and the application of neural networks to the analysis of financial markets. John Moody (Associate Professor): John Moody does research on the design and analysis of learning algorithms, statistical learning theory (including generalization and model selection), optimization methods (both deterministic and stochastic), and applications to signal processing, time series, and finance. David Novick (Assistant Professor): David Novick conducts research in interactive systems, including computational models of conversation, technologically mediated communication, and human-computer interaction. A central theme of this research is the role of meta-acts in the control of interaction. Current projects include dialogue models for telephone-based information systems. Misha Pavel (Associate Professor): Misha Pavel does mathematical and neural modeling of adaptive behaviors including visual processing, pattern recognition, visually guided motor control, categorization, and decision making. He is also interested in the application of these models to sensor fusion, visually guided vehicular control, and human-computer interfaces. 
Joachim Utans (Post-Doctoral Research Associate): Joachim Utans's research interests include computer vision and image processing, model based object recognition, neural network learning algorithms and optimization methods, model selection and generalization, with applications in handwritten character recognition and financial analysis. Lizhong Wu (Post-Doctoral Research Associate): Lizhong Wu's research interests include neural network theory and modeling, time series analysis and prediction, pattern classification and recognition, signal processing, vector quantization, source coding and data compression. He is now working on the application of neural networks and nonparametric statistical paradigms to finance. Eric A. Wan (Assistant Professor): Eric Wan's research interests include learning algorithms and architectures for neural networks and adaptive signal processing. He is particularly interested in neural applications to time series prediction, adaptive control, active noise cancellation, and telecommunications. From hicks at cs.titech.ac.jp Sun Feb 6 17:22:17 1994 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Sun, 6 Feb 94 17:22:17 JST Subject: Methods for improving generalization (was Re: some questions on ...) Message-ID: <9402060822.AA11860@maruko.cs.titech.ac.jp> Dear Mr. Grossman, I read with great interest your analysis of overlearning and about your research into achieving better generalization with less data. However, I only want to point out an ommision in your background despcription. In the abstract of your paper "Use of Bad Training Data For Better Predictions" you write: >Use of noise sensitivity signatures is distinctly different from other schemes >to avoid overtraining, such as cross-validation, which uses only part of the >training data, or various penalty functions, which are not data-adaptive. >Noise sensitivity signature methods use all of the training data and >are manifestly data-adaptive and non-parametric. When you say penalty functions the first thing which comes to mind is a penalty on the sum of squared weights. This method is indeed not data-adaptive. However, an interesting article in Neural Computation 4, pp. 473-493, "Simplifying Neural Networks by Soft Weight-Sharing" proposes a weight penalty method which is adaptive. Basically, the weights are grouped together in Gaussian clusters whose mean and variance are allowed to adapt to the data. The experimental results they published show improvement over both cross-validation and weight decay. I am looking forward to reading your paper when it is available. Yours Respectfully, Craig Hicks Craig Hicks hicks at cs.titech.ac.jp | Kore ya kono Yuku mo kaeru mo Ogawa Laboratory, Dept. of Computer Science | Wakarete wa Shiru mo shiranu mo Tokyo Institute of Technology, Tokyo, Japan | Ausaka no seki lab:03-3726-1111 ext.2190 home:03-3785-1974 | (from hyaku-nin-issyu) fax: +81(3)3729-0685 (from abroad) 03-3729-0685 (from Japan) From pluto at cs.ucsd.edu Fri Feb 4 17:01:47 1994 From: pluto at cs.ucsd.edu (Mark Plutowski) Date: Fri, 04 Feb 1994 14:01:47 -0800 Subject: some questions on training neural nets... 
Message-ID: <9402042201.AA16326@odin.ucsd.edu> I have another reference to add that may be helpful to those interested in the cross-validation issue raised in the following discussion, which I have edited in what follows to focus on the particular issue this reference addresses: ------- Forwarded Message From tgd at chert.CS.ORST.EDU Wed Feb 2 13:02:30 1994 From: tgd at chert.CS.ORST.EDU (Tom Dietterich) Date: Wed, 2 Feb 94 10:02:30 PST Subject: some questions on training neural nets... In-Reply-To: "Charles X. Ling"'s message of Tue, 1 Feb 94 03:37:10 EST <9402010837.AA01695@godel.csd.uwo.ca> Message-ID: <9402021802.AA00565@curie.CS.ORST.EDU> In answer to the following: From: "Charles X. Ling" Date: Tue, 1 Feb 94 03:37:10 EST Hi neural net experts, Will cross-validation help ? [...] (could results on the validation set be coincident)? Tom Dietterich replies: [stuff deleted] There are many ways to manage the bias/variance tradeoff. I would say that there is nothing approaching complete agreement on the best approaches (and more fundamentally, the best approach varies from one application to another, since this is really a form of prior). The approaches can be summarized as * early stopping * error function penalties * size optimization - growing - pruning - other Early stopping usually employs cross-validation to decide when to stop training. (see below). In my experience, training an overlarge network with early stopping gives better performance than trying to find the minimum network size. It has the disadvantage that training costs are very high. [stuff deleted] 3. If, for some reason, cross-validation is needed, and TR is split to TR1 (for training) and TR2 (for validation), what would be the proper ways to do cross-validation? Training on TR1 uses only partial information in TR, but training TR1 to find right parameters and then training on TR1+TR2 may require parameters different from the estimation of training TR1. I use the TR1+TR2 approach. On large data sets, this works well. On small data sets, the cross-validation estimates themselves are very noisy, so I have not found it to be as successful. I compute the stopping point using the sum squared error per training example, so that it scales. I think it is an open research problem to know whether this is the right thing to do. [the reply continues..] ------- End of Forwarded Message In response to the last point, I supply a reference that provides theoretical guidance from a statistical perspective. It proves that cross-validation estimates Integrated Mean Squared Error (IMSE) within a constant due to noise. What this means: IMSE is a version of the mean squared error that accounts for the finite size of the training set. Think of it as the expected squared error obtained by training a network on random training sets of a particular size. It is an ideal (i.e., in general, unobservable) measure of generalization. IMSE embodies the bias and variance tradeoff. It can be decomposed into the sum of two terms, which directly quantify the bias + variance. Therefore, if IMSE embodies the measure of generalization that is relevant to you, (which will depend on your learning task) then, least-squares cross-validation provides a realizable estimate of generalization. 
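[ In symbols -- my own notation, not taken from the paper: write f for the target function, \hat{f}(.;D_N) for the network trained on a random training set D_N of size N, P for the input distribution, and \sigma^2 for the noise variance. Then

    IMSE_N = E_{D_N} \int [ f(x) - \hat{f}(x;D_N) ]^2 dP(x)
           = \int [ f(x) - E_{D_N}\hat{f}(x;D_N) ]^2 dP(x)                       (integrated squared bias)
           + \int E_{D_N} [ \hat{f}(x;D_N) - E_{D_N}\hat{f}(x;D_N) ]^2 dP(x)     (integrated variance)

and the hold-out cross-validation estimate on an independent test set {(x_j, y_j)}, j = 1..M, with y_j = f(x_j) + noise, is

    CV = (1/M) \sum_{j=1}^{M} [ y_j - \hat{f}(x_j;D_N) ]^2,   with   E[CV] = IMSE_N + \sigma^2. ]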
Summary of the main results of the paper: It proves that two versions of cross-validation (one being the "hold-out set" version discussed above, and the other being the "delete-1" version) provide unbiased and strongly consistent estimates of IMSE. This is statistical jargon meaning that, on average, the estimate is accurate (i.e., the expectation of the estimate for given training set size equals the IMSE + a noise term) and asymptotically precise (in that as the training set and test set size grow large, the estimate converges to the IMSE within the constant factor due to noise, with probability 1.) Note that it does not say anything about the rate at which the variance of the estimate converges to the truth; therefore, it is possible that other IMSE-approximate measures may excel for small training set sizes (e.g., resampling methods such as bootstrap and jackknife.) However, it is the first result generally applicable to nonlinear regression that the authors are aware of, extending the well-known (in the statistical and econometric literature) work by C.J. Stone and others that proves similar results for particular learning tasks or for particular models. The statement of the results will appear in NIPS 6. I will post the soon-to-be-completed extended version to Neuroprose if anyone wants to see it sooner, or needs access to the proofs. I hope this is helpful, = Mark Plutowski Institute for Neural Computation, and Department of Computer Science and Engineering University of California, San Diego La Jolla, California. USA. Here is the reference: Plutowski, Mark~E., Shinichi Sakata, and Halbert White. (1994). ``Cross-validation estimates IMSE.'' Cowan, J.D., Tesauro, G., and Alspector, J. (eds.), {\em Advances in Neural Information Processing Systems 6}, San Mateo, CA: Morgan Kaufmann Publishers. From esann at dice.ucl.ac.be Sun Feb 6 15:19:56 1994 From: esann at dice.ucl.ac.be (esann@dice.ucl.ac.be) Date: Sun, 6 Feb 94 21:19:56 +0100 Subject: ESANN'94: European Symposium on ANNs Message-ID: <9402062019.AA07827@ns1.dice.ucl.ac.be> ****************************************************************** * European Symposium * * on Artificial Neural Networks * * * * Brussels (Belgium) - April 20-21-22, 1994 * * * * Preliminary Program and registration form * ****************************************************************** Foreword ******** Recent developments in the field of artificial neural networks mark a watershed in its relatively young history. Far from the blind passion for disparate applications of some years ago, the tendency is now towards an objective assessment of this emerging technology, with a better knowledge of the basic concepts, and more appropriate comparisons and links with classical methods of computing. Neural networks are not restricted to the use of back-propagation and multi-layer perceptrons. Self-organization, adaptive signal processing, vector quantization, classification, statistics, image and speech processing are some of the domains where neural network techniques may be successfully used; but their beneficial use requires an in-depth examination of both the theoretical basis of the neural techniques and the standard methods commonly used in the specified domain. ESANN'94 is the second symposium covering these aspects of neural network computing.
After a successful edition in 1993, ESANN'94 will open new perspectives by focusing on theoretical and mathematical aspects of neural networks, biologically-inspired models, statistical aspects, and relations between neural networks and both information and signal processing (classification, vector quantization, self-organization, approximation of functions, image and speech processing,...). The steering and program committees of ESANN'94 are pleased to invite you to participate in this symposium. More than a formal conference presenting the latest developments in the field, ESANN'94 will also be a forum for open discussions, round tables and opportunities for future collaborations. We hope to have the pleasure of meeting you in April, in the splendid town of Brussels, and that your stay in Belgium will be as scientifically beneficial as agreeable. Symposium information ********************* Registration fees for symposium ------------------------------- registration before registration after 18th March 1994 18th March 1994 Universities BEF 14500 BEF 15500 Industries BEF 18500 BEF 19500 Registration fees include attendance at all sessions, the ESANN'94 banquet, a copy of the conference proceedings, daily lunches (20-22 April '94), and coffee breaks twice a day during the symposium. Advance registration is mandatory. Young researchers may apply for grants offered by the European Community (restricted to citizens or residents of a Western European country or, tentatively, a Central or Eastern European country - deadline for applications: March 11th, 1994 - please write to the conference secretariat for details). Advance payments (see registration form) must be made to the conference secretariat by bank transfer in Belgian Francs (free of charges) or by sending a cheque (add BEF 500 for processing fees). Language -------- The official language of the conference is English. It will be used for all printed material, presentations and discussions. Proceedings ----------- A copy of the proceedings will be provided to all Conference Registrants. All technical papers will be included in the proceedings. Additional copies of the proceedings (ESANN'93 and ESANN'94) may be purchased at the following rates: ESANN'94 proceedings: BEF 2000 ESANN'93 proceedings: BEF 1500. Add BEF 500 to any order for p.&p. and/or bank charges. Please write to the conference secretariat for ordering proceedings. Conference dinner ----------------- A banquet will be offered on Thursday 21st to all conference registrants in a famous and typical place in Brussels. Additional vouchers for the banquet may be purchased on Wednesday 20th at the conference. Cancellation ------------ If cancellation is received by 25th March 1994, 50% of the registration fees will be returned. Cancellations received after this date will not be entitled to any refund. General information ******************* Brussels, Belgium ----------------- Brussels is not only the host city of the European Commission and of hundreds of multinational companies; it is also a marvelous historical town, with typical quarters, famous monuments known throughout the world, and the splendid "Grand-Place". It is a cultural and artistic center, with numerous museums. Night life in Brussels is considerable. There are a lot of restaurants and pubs open late into the night, where typical Belgian dishes can be tasted with one of the more than 1000 different beers.
Hotel accommodation ------------------- Special rates for participants to ESANN'94 have been arranged at the MAYFAIR HOTEL, a De Luxe 4 stars hotel with 99 fully air conditioned guest rooms, tastefully decorated to the highest standards of luxury and comfort. The hotel includes two restaurants, a bar and private parking. Public transportation (trams n93 & 94) goes directly from the hotel to the conference center (Parc stop) Single room BEF 2800 Double room or twin room BEF 3500 Prices include breakfast, taxes and service. Rooms can only be confirmed upon receipt of booking form (see at the end of this booklet) and deposit. Located on the elegant Avenue Louise, the exclusive Hotel Mayfair is a short walk from the "uppertown" luxurious shopping district. Also nearby is the 14th century Cistercian abbey and the magnificent "Bois de la Cambre" park with its open-air cafes - ideal for a leisurely stroll at the end of a busy day. HOTEL MAYFAIR tel: +32 2 649 98 00 381 av. Louise fax: +32 2 649 22 49 1050 Brussels - Belgium Conference location ------------------- The conference will be held at the "Chancellerie" of the Generale de Banque. A map is included in the printed programme. Generale de Banque - Chancellerie 1 rue de la Chancellerie 1000 Brussels - Belgium Conference secretariat D facto conference services tel: + 32 2 245 43 63 45 rue Masui fax: + 32 2 245 46 94 B-1210 Brussels - Belgium E-mail: esann at dice.ucl.ac.be PROGRAM OF THE CONFERENCE ************************* Wednesday 20th April 1994 ------------------------- 9H30 Registration 10H00 Opening session Session 1: Neural networks and chaos Chairman: M. Hasler (Ecole Polytechnique Fdrale de Lausanne, Switzerland) 10H10 "Concerning the formation of chaotic behaviour in recurrent neural networks" T. Kolb, K. Berns Forschungszentrum Informatik Karlsruhe (Germany) 10H30 "Stability and bifurcation in an autoassociative memory model" W.G. Gibson, J. Robinson, C.M. Thomas University of Sidney (Australia) 10H50 Coffee break Session 2: Theoretical aspects 1 Chairman: C. Jutten (Institut National Polytechnique de Grenoble, France) 11H30 "Capabilities of a structured neural network. Learning and comparison with classical techniques" J. Codina, J. C. Aguado, J.M. Fuertes Universitat Politecnica de Catalunya (Spain) 11H50 "Projection learning: alternative approaches to the computation of the projection" K. Weigl, M. Berthod INRIA Sophia Antipolis (France) 12H10 "Stability bounds of momentum coefficient and learning rate in backpropagation algorithm"" Z. Mao, T.C. Hsia University of California at Davis (USA) 12H30 Lunch Session 3: Links between neural networks and statistics Chairman: J.C. Fort (Universit Nancy I, France) 14H00 "Model selection for neural networks: comparing MDL and NIC"" G. te Brake*, J.N. Kok*, P.M.B. Vitanyi** *Utrecht University, **Centre for Mathematics and Computer Science, Amsterdam (Netherlands) 14H20 "Estimation of performance bounds in supervised classification" P. Comon*, J.L. Voz**, M. Verleysen** *Thomson-Sintra Sophia Antipolis (France), **Universit Catholique de Louvain, Louvain-la-Neuve (Belgium) 14H40 "Input Parameters' estimation via neural networks" I.V. Tetko, A.I. Luik Institute of Bioorganic & Petroleum Chemistry, Kiev (Ukraine) 15H00 "Combining multi-layer perceptrons in classification problems" E. Filippi, M. Costa, E. Pasero Politecnico di Torino (Italy) 15H20 Coffee break Session 4: Algorithms 1 Chairman: J. 
Hrault (Institut National Polytechnique de Grenoble, France) 16H00 "Diluted neural networks with binary couplings: a replica symmetry breaking calculation of the storage capacity" J. Iwanski, J. Schietse Limburgs Universitair Centrum (Belgium) 16H20 "Storage capacity of the reversed wedge perceptron with binary connections" G.J. Bex, R. Serneels Limburgs Universitair Centrum (Belgium) 16H40 "A general model for higher order neurons" F.J. Lopez-Aligue, M.A. Jaramillo-Moran, I. Acedevo-Sotoca, M.G. Valle Universidad de Extremadura, Badajoz (Spain) 17H00 "A discriminative HCNN modeling" B. Petek University of Ljubljana (Slovenia) Thursday 21th April 1994 ------------------------ Session 5: Biological models Chairman: P. Lansky (Academy of Science of the Czech Republic) 9H00 "Biologically plausible hybrid network design and motor control" G.R. Mulhauser University of Edinburgh (Scotland) 9H20 "Analysis of critical effects in a stochastic neural model" W. Mommaerts, E.C. van der Meulen, T.S. Turova K.U. Leuven (Belgium) 9H40 "Stochastic model of odor intensity coding in first-order olfactory neurons" J.P. Rospars*, P. Lansky** *INRA Versailles (France), **Academy of Sciences, Prague (Czech Republic) 10H00 "Memory, learning and neuromediators" A.S. Mikhailov Fritz-Haber-Institut der MPG, Berlin (Germany), and Russian Academy of Sciences, Moscow (Russia) 10H20 "An explicit comparison of spike dynamics and firing rate dynamics in neural network modeling" F. Chapeau-Blondeau, N. Chambet Universit d'Angers (France) 10H40 Coffee break Session 6: Algorithms 2 Chairman: T. Denoeux (Universit Technologique de Compigne, France) 11H10 "A stop criterion for the Boltzmann machine learning algorithm" B. Ruf Carleton University (Canada) 11H30 "High-order Boltzmann machines applied to the Monk's problems" M. Grana, V. Lavin, A. D'Anjou, F.X. Albizuri, J.A. Lozano UPV/EHU, San Sebastian (Spain) 11H50 "A constructive training algorithm for feedforward neural networks with ternary weights" F. Aviolat, E. Mayoraz Ecole Polytechnique Fdrale de Lausanne (Switzerland) 12H10 "Synchronization in a neural network of phase oscillators with time delayed coupling" T.B. Luzyanina Russian Academy of Sciences, Moscow (Russia) 12H30 Lunch Session 7: Evolutive and incremental learning Chairman: T.J. Stonham (Brunel University, UK) - to be confirmed 14H00 "Reinforcement learning and neural reinforcement learning" S. Sehad, C. Touzet Ecole pour les Etudes et la Recherche en Informatique et Electronique, Nmes (France) 14H20 "Improving piecewise linear separation incremental algorithms using complexity reduction methods" J.M. Moreno, F. Castillo, J. Cabestany Universitat Politecnica de Catalunya (Spain) 14H40 "A comparison of two weight pruning methods" O. Fambon, C. Jutten Institut National Polytechnique de Grenoble (France) 15H00 "Extending immediate reinforcement learning on neural networks to multiple actions" C. Touzet Ecole pour les Etudes et la Recherche en Informatique et Electronique, Nmes (France) 15H20 "Incremental increased complexity training" J. Ludik, I. Cloete University of Stellenbosch (South Africa) 15H40 Coffee break Session 8: Function approximation Chairman: E. Filippi (Politecnico di Torino, Italy) - to be confirmed 16H20 "Approximation of continuous functions by RBF and KBF networks" V. Kurkova, K. Hlavackova Academy of Sciences of the Czech Republic 16H40 "An optimized RBF network for approximation of functions" M. Verleysen*, K. 
Hlavackova** *Universit Catholique de Louvain, Louvain-la-Neuve (Belgium), **Academy of Science of the Czech Republic 17H00 "VLSI complexity reduction by piece-wise approximation of the sigmoid function" V. Beiu, J.A. Peperstraete, J. Vandewalle, R. Lauwereins K.U. Leuven (Belgium) 20H00 Conference dinner Friday 22th April 1994 ---------------------- Session 9: Algorithms 3 Chairman: J. Vandewalle (K.U. Leuven, Belgium) - to be confirmed 9H00 "Dynamic pattern selection for faster learning and controlled generalization of neural networks" A. Rbel Technische Universitt Berlin (Germany) 9H20 "Noise reduction by multi-target learning" J.A. Bullinaria Edinburgh University (Scotland) 9H40 "Variable binding in a neural network using a distributed representation" A. Browne, J. Pilkington South Bank University, London (UK) 10H00 "A comparison of neural networks, linear controllers, genetic algorithms and simulated annealing for real time control" M. Chiaberge*, J.J. Merelo**, L.M. Reyneri*, A. Prieto**, L. Zocca* *Politecnico di Torino (Italy), **Universidad de Granada (Spain) 10H20 "Visualizing the learning process for neural networks" R. Rojas Freie Universitt Berlin (Germany) 10H40 Coffee break Session 10: Theoretical aspects 2 Chairman: M. Cottrell (Universit Paris I, France) 11H20 "Stability analysis of diagonal recurrent neural networks" Y. Tan, M. Loccufier, R. De Keyser, E. Noldus University of Gent (Belgium) 11H40 "Stochastics of on-line back-propagation" T. Heskes University of Illinois at Urbana-Champaign (USA) 12H00 "A lateral contribution learning algorithm for multi MLP architecture" N. Pican*, J.C. Fort**, F. Alexandre* *INRIA Lorraine, **Universit Nancy I (France) 12H20 Lunch Session 11: Self-organization Chairman: F. Blayo (EERIE Nmes, France) 14H00 "Two or three things that we know about the Kohonen algorithm" M. Cottrell*, J.C. Fort**, G. Pags*** Universits *Paris 1, **Nancy 1, ***Paris 6 (France) 14H20 "Decoding functions for Kohonen maps" M. Alvarez, A. Varfis CEC Joint Research Center, Ispra (Italy) 14H40 "Improvement of learning results of the selforganizing map by calculating fractal dimensions" H. Speckmann, G. Raddatz, W. Rosenstiel University of Tbingen (Germany) 15H00 Coffee break Session 11 (continued): Self-organization Chairman: F. Blayo (EERIE Nmes, France) 15H40 "A non linear Kohonen algorithm" J.-C. Fort*, G. Pags** *Universit Nancy 1, **Universits Pierre et Marie Curie, et Paris 12 (France) 16H00 "Self-organizing maps based on differential equations" A. Kanstein, K. Goser Universitt Dortmund (Germany) 16H20 "Instabilities in self-organized feature maps with short neighbourhood range" R. Der, M. Herrmann Universitt Leipzig (Germany) ESANN'94 Registration and Hotel Booking Form ******************************************** Registration fees ----------------- registration before registration after 18th March 1994 18th March 1994 Universities BEF 14500 BEF 15500 Industries BEF 18500 BEF 19500 University fees are applicable to members and students of academic and teaching institutions. Each registration will be confirmed by an acknowledgment of receipt, which must be given to the registration desk of the conference to get entry badge, proceedings and all materials. Registration fees include attendance to all sessions, the ESANN'94 banquet, a copy of the conference proceedings, daily lunches (20-22 April '94), and coffee breaks twice a day during the symposium. Advance registration is mandatory. 
Students and young researchers from European countries may apply for European Community grants. Hotel booking ------------- Hotel MAYFAIR (4 stars) - 381 av. Louise - 1050 Brussels Single room : BEF 2800 Double room (large bed) : BEF 3500 Twin room (2 beds) : BEF 3500 Prices include breakfast, service and taxes. A deposit corresponding to the first night is mandatory. Registration to ESANN'94 (please give full address and tick appropriate) ------------------------------------------------------------------------ Ms., Mr., Dr., Prof.:............................................... Name:............................................................... First Name:......................................................... Institution:........................................................ ................................................................... Address:............................................................ ................................................................... ZIP:................................................................ Town:............................................................... Country:............................................................ Tel:................................................................ Fax:................................................................ E-mail:............................................................. VAT n:............................................................. Universities: O registration before 18th March 1994: BEF 14500 O registration after 18th March 1994: BEF 15500 Industries: O registration before 18th March 1994: BEF 18500 O registration after 18th March 1994: BEF 19500 Hotel Mayfair booking (please tick appropriate) O single room deposit: BEF 2800 O double room (large bed) deposit: BEF 3500 O twin room (twin beds) deposit: BEF 3500 Arrival date: ..../..../1994 Departure date: ..../..../1994 O Additional payment if fees are paid through bank abroad check: BEF 500 Total BEF ____ Payment (please tick): O Bank transfer, stating name of participant, made payable to: Gnrale de Banque ch. de Waterloo 1341 A B-1180 Brussels - Belgium Acc.no: 210-0468648-93 of D facto (45 rue Masui, B-1210 Brussels) Bank transfers must be free of charges. EVENTUAL CHARGES MUST BE PAID BY THE PARTICIPANT. O Cheques/Postal Money Orders made payable to: D facto 45 rue Masui B-1210 Brussels - Belgium A SUPPLEMENTARY FEE OF BEF 500 MUST BE ADDED if the payment is made through bank abroad cheque or postal money order. Only registrations accompanied by a cheque, a postal money order or the proof of bank transfer will be considered. Registration and hotel booking form, together with payment, must be send as soon as possible, and in no case later than 8th April 1994, to the conference secretariat: &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& & D facto conference services - ESANN'94 & & 45, rue Masui - B-1210 Brussels - Belgium & &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& Support ******* ESANN'94 is organized with the support of: - Commission of the European Communities (DG XII, Human Capital and Mobility programme) - IEEE Region 8 - IFIP WG 10.6 on neural networks - Region of Brussels-Capital - EERIE (Ecole pour les Etudes et la Recherche en Informatique et Electronique - Nmes) - UCL (Universit Catholique de Louvain - Louvain-la-Neuve) - REGARDS (Research Group on Algorithmic, Related Devices and Systems - UCL) Steering committee ****************** Franois Blayo EERIE, Nmes (F) Marie Cottrell Univ. 
Paris I (F) Nicolas Franceschini CNRS Marseille (F) Jeanny Hrault INPG Grenoble (F) Michel Verleysen UCL Louvain-la-Neuve (B) Scientific committee ******************** Luis Almeida INESC - Lisboa (P) Jorge Barreto UCL Louvain-en-Woluwe (B) Herv Bourlard L. & H. Speech Products (B) Joan Cabestany Univ. Polit. de Catalunya (E) Dave Cliff University of Sussex (UK) Pierre Comon Thomson-Sintra Sophia (F) Holk Cruse Universitt Bielefeld (D) Dante Del Corso Politecnico di Torino (I) Marc Duranton Philips / LEP (F) Jean-Claude Fort Universit Nancy I (F) Karl Goser Universitt Dortmund (D) Martin Hasler EPFL Lausanne (CH) Philip Husbands University of Sussex (UK) Christian Jutten INPG Grenoble (F) Petr Lansky Acad. of Science of the Czech Rep. (CZ) Jean-Didier Legat UCL Louvain-la-Neuve (B) Jean Arcady Meyer Ecole Normale Suprieure - Paris (F) Erkki Oja Helsinky University of Technology (SF) Guy Orban KU Leuven (B) Gilles Pags Universit Paris I (F) Alberto Prieto Universitad de Granada (E) Pierre Puget LETI Grenoble (F) Ronan Reilly University College Dublin (IRE) Tamas Roska Hungarian Academy of Science (H) Jean-Pierre Rospars INRA Versailles (F) Jean-Pierre Royet Universit Lyon 1 (F) John Stonham Brunel University (UK) Lionel Tarassenko University of Oxford (UK) John Taylor King's College London (UK) Vincent Torre Universita di Genova (I) Claude Touzet EERIE Nmes (F) Joos Vandewalle KUL Leuven (B) Eric Vittoz CSEM Neuchtel (CH) Christian Wellekens Eurecom Sophia-Antipolis (F) _____________________________ Michel Verleysen D facto conference services 45 rue Masui 1210 Brussels Belgium tel: +32 2 245 43 63 fax: +32 2 245 46 94 E-mail: esann at dice.ucl.ac.be _____________________________ From lba at ilusion.inesc.pt Mon Feb 7 04:57:07 1994 From: lba at ilusion.inesc.pt (Luis B. Almeida) Date: Mon, 7 Feb 94 10:57:07 +0100 Subject: Encoding missing values Message-ID: <9402070957.AA18932@ilusion.inesc.pt> Bill Skaggs writes: There is at least one kind of network that has no problem (in principle) with missing inputs, namely a Boltzmann machine. You just refrain from clamping the input node whose value is missing, and treat it like an output node or hidden unit. This may seem to be irrelevant to anything other than Boltzmann machines, but I think it could be argued that nothing very much simpler is capable of dealing with the problem. When you ask a network to handle missing inputs, you are in effect asking it to do pattern completion on the input layer, and for this a Boltzmann machine or some other sort of attractor network would seem to be required. The same effect, of trying to guess the missing inputs, can also be obtained with a recurrent multilayer perceptron, trained with recurrent backprop. This is the reason why the pattern completion results that I described in my 1987 ICNN paper (ref. below) were rather good. L. B. Almeida, "A learning rule for asynchronous perceptrons with feedback in a combinatorial environment", Proc IEEE First International Conference on Neural Networks, San Diego, Ca., 1987. Luis B. 
Almeida INESC Phone: +351-1-544607, +351-1-3100246 Apartado 10105 Fax: +351-1-525843 P-1017 Lisboa Codex Portugal lba at inesc.pt ----------------------------------------------------------------------------- *** Indonesians are killing innocent people in East Timor *** From jordan at psyche.mit.edu Mon Feb 7 20:47:09 1994 From: jordan at psyche.mit.edu (Michael Jordan) Date: Mon, 7 Feb 94 20:47:09 EST Subject: Encoding missing values Message-ID: > There is at least one kind of network that has no problem (in > principle) with missing inputs, namely a Boltzmann machine. > You just refrain from clamping the input node whose value is > missing, and treat it like an output node or hidden unit. > > This may seem to be irrelevant to anything other than Boltzmann > machines, but I think it could be argued that nothing very much > simpler is capable of dealing with the problem. The above is a nice observation that is worth emphasizing; I agree with all of it except the comment about being irrelevant to anything else. The Boltzmann machine is actually relevant to everything else. What the Boltzmann algorithm is doing with the missing value is essentially the same as what the EM algorithm for mixtures (that Ghahramani and Tresp referred to) is doing, and epitomizes the general case of an iterative "filling in" algorithm. The Boltzmann machine learning algorithm is a generalized EM (GEM) algorithm. During the E step the system computes the conditional correlation function for the nodes under the Boltzmann distribution, where the conditioning variables are the known data (the values of the clamped units) and the current values of the parameters (weights). This "fills in" the relevant statistic (the correlation function) and allows it to be used in the generalized M step (the contrastive Hebb rule). Moreover, despite the fancy terminology, these algorithms are nothing more (nor less) than maximum likelihood estimation, where the likelihood function is the likelihood of the parameters *given the data that was actually observed*. By "filling in" missing data, you're not adding new information to the problem; rather, you're allowing yourself to use all the information that is in those components of the data vector that aren't missing. (EM theory provides the justification for that statement). E.g., if only one component of an input vector is missing, it's obviously wasteful to neglect what the other components of the input vector are telling you. And, indeed, if you neglect the whole vector, you will not end up with maximum likelihood estimates for the weights (nor in general will you get maximum likelihood estimates if you fill in a value with the unconditional mean of that variable). "Filling in" is not the only way to compute ML estimates for missing data problems, but its virtue is that it allows the use of the same learning algorithms as would be used for complete data (without incurring any bias, if the filling in is done correctly). The only downside is that even if the complete-data algorithm is one-pass (which the Boltzmann algorithm and mixture fitting are not) the "filling-in" approach is generally iterative, because the parameter estimates depend on the filled-in values which in turn depend on the parameter estimates. On the other hand, there are so-called "monotone" patterns of missing data for which the filling-in approach is not necessarily iterative. 
This monotone case might be of interest, because it is relevant for problems involving feedforward networks in which the input vectors are complete but some of the outputs are missing. (Note that even if all the output values for a case are missing, a ML algorithm will not throw the case out; there is statistical structure in the input vector that the algorithm must not neglect). Mike (See Ghahramani's message for references; particularly the Little and Rubin book). From prechelt at ira.uka.de Tue Feb 8 07:19:16 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Tue, 08 Feb 1994 13:19:16 +0100 Subject: SUMMARY: encoding missing values Message-ID: <"irafs2.ira.957:08.01.94.12.19.58"@ira.uka.de> A few days ago, I posted some thoughts about how to represent missing input values to a neural network and asked for comments and further ideas. This message is a summary of the replies I received (some in my personal mail some in connectionists). I show the most significant comments and ideas and append versions of the messages that are trimmed to the most important parts (in case somebody wants to keep this discussion in his/her archive) This was my original message: ------------------------------------------------------------------------ From prechelt at ira.uka.de Wed Feb 2 03:58:56 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Wed, 02 Feb 1994 09:58:56 +0100 Subject: Encoding missing values Message-ID: I am currently thinking about the problem of how to encode data with attributes for which some of the values are missing in the data set for neural network training and use. An example of such data is the 'heart-disease' dataset from the UCI machine learning database (anonymous FTP on "ics.uci.edu" [128.195.1.1], directory "/pub/machine-learning-databases"). There are 920 records altogether with 14 attributes each. Only 299 of the records are complete, the others have one or several missing attribute values. 11% of all values are missing. I consider only networks that handle arbitrary numbers of real-valued inputs here (e.g. all backpropagation-suited network types etc). I do NOT consider missing output values. In this setting, I can think of several ways how to encode such missing values that might be reasonable and depend on the kind of attribute and how it was encoded in the first place: 1. Nominal attributes (that have n different possible values) 1.1 encoded "1-of-n", i.e., one network input per possible value, the relevant one being 1 all others 0. This encoding is very general, but has the disadvantage of producing networks with very many connections. Missing values can either be represented as 'all zero' or by simply treating 'is missing' as just another possible input value, resulting in a "1-of-(n+1)" encoding. 1.2 encoded binary, i.e., log2(n) inputs being used like the bits in a binary representation of the numbers 0...n-1 (or 1...n). Missing values can either be represented as just another possible input value (probably all-bits-zero is best) or by adding an additional network input which is 1 for 'is missing' and 0 for 'is present'. The original inputs should probably be all zero in the 'is missing' case. 2. continuous attributes (or attributes treated as continuous) 2.1 encoded as a single network input, perhaps using some monotone transformation to force the values into a certain distribution. Missing values are either encoded as a kind of 'best guess' (e.g. 
the average of the non-missing values for this attribute) or by using an additional network input being 0 for 'missing' and 1 for 'present' (or vice versa) and setting the original attribute input either to 0 or to the 'best guess'. (The 'best guess' variant also applies to nominal attributes above) 3. binary attributes (truth values) 3.1 encoded by one input: 0=false 1=true or vice versa Treat like (2.1) 3.2 encoded by one input: -1=false 1=true or vice versa In this case we may act as for (3.1) or may just use 0 to indicate 'missing'. 3.3 treat like nominal attribute with 2 possible values 4. ordinal attributes (having n different possible values, which are ordered) 4.1 treat either like continuous or like nominal attribute. If (1.2) is chosen, a Gray code should be used. Continuous representation is risky unless a 'sensible' quantification of the possible values is available. So much for my considerations. Now to my questions. a) Can you think of other encoding methods that seem reasonable? Which? b) Do you have experience with some of these methods that is worth sharing? c) Have you compared any of the alternatives directly? ------------------------------------------------------------------------ SUMMARY: For a), the following ideas were mentioned: 1. use statistical techniques to compute replacement values from the rest of the data set 2. use a Boltzmann machine to do this for you 3. use an autoencoder feedforward network to do this for you 4. randomize on the missing values (correct in the Bayesian sense) For b), some experience was reported. I don't know how to summarize that nicely, so I just don't summarize at all. For c), no explicit quantitative results were given directly. Some replies suggest that data is not always missing randomly. The biases are often known and should be taken into account (e.g. medical tests are not carried out (resulting in missing data) for more or less healthy persons more often than for ill persons). Many replies contained references to published work in this area, from NN, machine learning, and mathematical statistics. To ease searching for these references in the replies below, I have marked them with the string ##REF## (if you have a 'grep' program that extracts whole paragraphs, you can get them all out with one command). Thanks to all who answered. These are the trimmed versions of the replies: ------------------------------------------------------------------------ From: tgd at research.CS.ORST.EDU (Tom Dietterich) [...for nominal attributes:] An alternative here is to encode them as bit-strings in an error-correcting code, so that the Hamming distance between any two bit strings is constant. This would probably be better than a dense binary encoding. The cost in additional inputs is small. I haven't tried this though. My guess is that distributed representations at the input are a bad idea. One must always determine WHY the value is missing. In the heart disease data, I believe the values were not measured because other features were believed to be sufficient in each case. In such cases, the network should learn to down-weight the importance of the feature (which can be accomplished by randomizing it---see below). In other cases, it may be more appropriate to treat a missing value as a separate value for the feature, e.g., in survey research, where a subject chooses not to answer a question. [...for continuous attributes:] Ross Quinlan suggests encoding missing values as the mean observed output value when the value is missing.
He has tried this in his regression tree work. Another obvious approach is to randomize the missing values--on each presentation of the training example, choose a different, random, value for each missing input feature. This is the "right thing to do" in the Bayesian sense. [...for binary attributes:] I'm skeptical of the -1,0,1 encoding, but I think there is more research to be done here. [...for ordinal attributes:] I would treat them as continuous. ------------------------------------------------------------------------ From: shavlik at cs.wisc.edu (Jude W. Shavlik) We looked at some of the methods you talked about in the following article in the journal Machine Learning. ##REF## %T Symbolic and Neural Network Learning Algorithms: An Experimental Comparison %A J. W. Shavlik %A R. J. Mooney %A G. G. Towell %J Machine Learning %V 6 %N 2 %P 111-143 %D 1991 ------------------------------------------------------------------------ From: hertz at nordita.dk (John Hertz) It seems to me that the most natural way to handle missing data is to leave them out. You can do this if you work with a recurrent network (e.g. a Boltzmann machine) where the inputs are fed in by clamping the input units to the given input values and the rest of the net relaxes to a fixed point, after which the output is read off the output units. If some of the input values are missing, the corresponding input units are just left unclamped, free to relax to the values most consistent with the known inputs. I have meant for a long time to try this on some medical prognosis data I was working on, but I never got around to it, so I would be happy to hear how it works if you try it. ------------------------------------------------------------------------ From: jozo at sequoia.WPI.EDU (Jozo Dujmovic) In the case of clustering benchmark programs I frequently have the problem of estimating missing data. A relatively simple SW that implements a heuristic algorithm generates estimates having an average error of 8%. NN will somehow "implicitly estimate" the missing data. The two approaches might even be in some sense equivalent (?). Jozo [ I suspect that they are not: When you generate values for the missing items and put them in the training set, the network loses the information that this data is only estimated. Since estimations are not as reliable as true input data, the network will weight inputs that have lots of generated values as less important. If it gets the 'is missing' information explicitly, it can discriminate true values from estimations instead. ] ------------------------------------------------------------------------ From: guy at cs.uq.oz.au A final-year student of mine worked on the problem of dealing with missing inputs, without much success. However, the student was not very good, so take the following opinions with a pinch of salt. We (very tentatively) came to the conclusion that if the inputs were redundant, the problem was easy; if the missing input contained vital information, the problem was pretty much impossible. We used the heart disease data. I don't recommend it for the missing inputs problem. All of the inputs are very good indicators of the correct result, so missing inputs were not important. Apparently there is a large literature in statistics on dealing with missing inputs. Anthony Adams (University of Tasmania) has published a technical report on this. His email address is "A.Adams at cs.utas.edu.au". ##REF## @techreport{kn:Vamplew-91, author = "P. Vamplew and A.
Adams", address = {Hobart, Tasmania, Australia}, institution = {Department of Computer Science, University of Tasmania}, number = {R1-4}, title = {Real World Problems in Backpropagation: Missing Values and Generalisability}, year = {1991} } ------------------------------------------------------------------------ From: Mike Southcott ##REF## I wrote a paper for the Australian conference on neural networks in 1993. ``Classification of Incomplete Data using neural networks'' Southcott, Bogner. You may find it interesting. You may not be able to get the proceedings for this conference, but I am in the process of digging up a postscript copy for someone in the States, so when I do that, I will send you a copy. ------------------------------------------------------------------------ From: Eric Saund I have done some work on unsupervised learning of mulitple cause clusters in binary data, for which an appropriate encoding scheme is -1 = FALSE, 1 = TRUE, and 0 = NO DATA. This has worked well for me, but my paradigm is not your standard feedforward network and uses a different activiation function from the standard weighted sum followed by sigmoid squashing. I presented the paper on this work at NIPS: ##REF## Saund, Eric; 1994; "Unsupervised Learning of Mixtures of Multiple Causes in Binary Data," in Advances in Neural Information Processing Systems -6-, Cowan, J., Tesauro, G, and Alspector, J., eds. Morgan Kaufmann, San Francisco. ------------------------------------------------------------------------ From: Thierry.Denoeux at hds.univ-compiegne.fr In a recent mailing, Lutz Prechelt mentioned the interesting problem of how to encode attributes with missing values as inputs to a neural network. I have recently been faced to that problem while applying neural nets to rainfall prediction using weather radar images. The problem was to classify pairs of "echoes" -- defined as groups of connected pixels with reflectivity above some threshold -- taken from successive images as corresponding to the same rain cell or not. Each pair of echoes was discribed by a list of attributes. Some of these attributes, refering to the past of a sequence, were not defined for some instances. To encode these attributes with potentially missing values, we applied two different methods actually suggested by Lutz: - the replacement of the missing value by a "best-guess" value - the addition of a binary input indicating whether the corresponding attribute was present or absent. Significantly better results were obtained by the second method. This work was presented at ICANN'93 last september: ##REF## X. Ding, T. Denoeux & F. Helloco (1993). Tracking rain cells in radar images using multilayer neural networks. In Proc. of ICANN'93, Springer-Verlag, p. 962-967. ------------------------------------------------------------------------ From: "N. Karunanithi" [...for nominal attributes:] Both methods have the problem of poor scalability. If the number of missing values increases then the number of additional inputs will increase linearly in 1.1 and logarithmically in 1.2. In fact, 1-of-n encoding may be a poor choice if (1) the number of input features is large and (2) such an expanded dimensional representation does not become a (semi) linearly separable problem. Even if it becomes a linearly separable problem, the overall complexity of the network can sometimes be very high. [...for continuous attributes:] This representation requires GUESS. A nominal transformation may not be a proper representation in some cases. 
Assume that the output values range over a large numerical interval, for example from 0.0 to 10,000.0. If you use a simple scaling like dividing by 10,000.0 to make it between 0.0 and 1.0, this will result in poor accuracy of prediction. If the attribute is on the input side, then in theory the scaling is unnecessary because the input layer weights will scale accordingly. However, in practice I had a lot of problems with this approach. Maybe a log transformation before scaling would not be a bad choice. If you use a closed scaling you may have a problem whenever a future value exceeds the maximum value of the numerical interval. For example, assume that the attribute is time, say in milliseconds. Any future time from the point of reference can exceed the limit. Hence any closed scaling will not work properly. [...for ordinal attributes:] I have compared Binary Encoding (1.2), Gray-Coded representation and straightforward scaling. Closed scaling seems to do a good job. I have also compared open scaling and closed scaling and did find significant improvement in prediction accuracy. ###REF### N. Karunanithi, D. Whitley and Y. K. Malaiya, "Prediction of Software Reliability Using Connectionist Models", IEEE Trans. Software Eng., July 1992, pp. 563-574. From yong at cns.brown.edu Tue Feb 8 10:40:35 1994 From: yong at cns.brown.edu (Yong Liu) Date: Tue, 8 Feb 94 10:40:35 EST Subject: some questions on training neural nets Message-ID: <9402081540.AA15383@cns.brown.edu> On the discussion of the cross-validation method, Dr. Plutowski referred to his paper by writing > It proves that two versions of cross-validation > (one being the "hold-out set" version discussed above, and the other > being the "delete-1" version) provide unbiased and strongly consistent > estimates of IMSE This is statistical jargon meaning that, on > average, the estimate is accurate, (i.e., the expectation > of the estimate for given training set size equals the IMSE + a noise term) > and asymtotically precise (in that as the training set and test set > size grow large, the estimate converges to the IMSE within the > constant factor due to noise, with probability 1.) Comment: This comment is on the above result about the "delete-1" version of cross-validation. The result must have assumed that the training data set has no outliers (corruption in the Y component of a data point). Since deleting a data point that is an outlier will cause a great change in the estimated neural net weights, and the squared prediction error on this outlier will be large, this will eventually cause a biased estimate of the IMSE. Even if a robust algorithm is used to estimate the neural net weights in order to reduce the sensitivity of the estimation to outliers, the squared prediction error on the outlier will still be large. A possible correction would be to weight this outlier less in the cross-validation, or in other words, to pay less attention to this outlier when deleting it. A weighted cross-validation like this has been discussed briefly in Liu (1994). The weighting of a data point is calculated through an iteratively reweighted algorithm for robust regression. One interesting thing about this version of cross-validation is its asymptotic equivalence to Moody's criterion (Moody, 1992; Liu, 1993). References: Liu, Y. (1993) Neural Network Model Selection Using Asymptotic Jackknife Estimator and Cross-Validation Method. In C.L. Giles, S.J. Hanson, and J.D. Cowan, editors, {\em Advances in Neural Information Processing Systems}, volume 5, pages 599-606.
Morgan Kaufmann, San Mateo, CA. Liu, Y.(1994) Robust Parameter Estimation and Model Selection for Neural Network Regression. To Appear in Jack D. Cowan, Gerald Tesauro and Joshua Alspector editors, {\em Advances in neural information processing system}, volume 6. Morgan Kaufmann, San Mateo, CA. Moody, J.E. (1992).The effective number of parameters, an analysis of generalization and regularization in nonlinear learning system. In Moody, J.E., Hanson, S.J., and Lippmann, R.P., editors, {\em Advances in Neural Information Processing System 4}. Morgan Kaufmann Publication. ---------------------------- Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 From pluto at cs.ucsd.edu Wed Feb 9 02:39:00 1994 From: pluto at cs.ucsd.edu (Mark Plutowski) Date: Tue, 08 Feb 1994 23:39:00 -0800 Subject: some questions on training neural nets Message-ID: <9402090739.AA07477@odin.ucsd.edu> ------- Previous Message: --------- From yong at cns.brown.edu Tue Feb 8 10:40:35 1994 From: yong at cns.brown.edu (Yong Liu) Date: Tue, 8 Feb 94 10:40:35 EST Subject: some questions on training neural nets Message-ID: <9402081540.AA15383@cns.brown.edu> On the discussion of cross-validation method, Dr. Plutowski referred to his paper by writing > It proves that two versions of cross-validation > (one being the "hold-out set" version discussed above, and the other > being the "delete-1" version) provide unbiased and strongly consistent > estimates of IMSE This is statistical jargon meaning that, on > average, the estimate is accurate, (i.e., the expectation > of the estimate for given training set size equals the IMSE + a noise term) > and asymtotically precise (in that as the training set and test set > size grow large, the estimate converges to the IMSE within the > constant factor due to noise, with probability 1.) Comment: This comment is on the above result about "delete-1" version cross-validation. The result must have assumed that the training data set have no outliers (corruption in Y component of a data point). Since deleting a data point that is outlier will cause a great change in the estimated neural net weights, and also the squared prediction error on this outliers will be large. This will then eventually cause a biased estimation of the IMSE. - ---------------------------- Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 ------- End of Previous Message ------ No, actually it turns out that delete-1 cross-validation delivers unbiased estimates of IMSE under fairly reasonable conditions. (More precisely, it delivers estimates of IMSE_N + \sigma^2, for training set size N and noise variance \sigma^2.) Roughly, the noise must have variance the same everywhere in input space, (or, "homoscedasticity" as the statisticians would say,) with examples selected independently from the same, fixed environment (i.e., "i.i.d.") the expectation of the squared-target must be finite (this just ensures that conditional expectations of the target and the noise exist everywhere) plus some conditions on the network to make it behave nicely. For these same conditions, the estimate is additionally "conservative," in that it does not, (asymptotically, anyway, as N grows large) underestimate the expected squared error of the network for optimal weights. (These results and the prerequisite assumptions are of course stated more precisely in the paper.) 
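[ For readers who want something concrete: below is a rough, self-contained sketch of delete-1 cross-validation, including the warm-start ("fine-tuning") variant described in the aside that follows. It is illustrative only -- a tiny tanh network trained by plain gradient descent, with made-up learning rates and epoch counts -- and is not code from the paper. ]

import numpy as np

def init(n_in, n_hid, rng):
    # one-hidden-layer tanh network: parameters [W1, b1, w2, b2]
    return [rng.normal(0, 0.5, (n_in, n_hid)), np.zeros(n_hid),
            rng.normal(0, 0.5, n_hid), 0.0]

def forward(w, X):
    W1, b1, w2, b2 = w
    return np.tanh(X @ W1 + b1) @ w2 + b2

def train(w, X, y, lr=0.05, epochs=500):
    # batch gradient descent on mean squared error, starting from a copy of w
    W1, b1, w2, b2 = [p.copy() if hasattr(p, 'copy') else p for p in w]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = H @ w2 + b2 - y                    # residuals
        dpre = np.outer(err, w2) * (1 - H**2)    # backprop through tanh
        W1 -= lr * X.T @ dpre / len(y)
        b1 -= lr * dpre.mean(0)
        w2 -= lr * H.T @ err / len(y)
        b2 -= lr * err.mean()
    return [W1, b1, w2, b2]

def delete1_cv(X, y, rng, fine_tune=True):
    base = train(init(X.shape[1], 5, rng), X, y)   # "base" network trained on all N examples
    sq_errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        start = base if fine_tune else init(X.shape[1], 5, rng)
        w_i = train(start, X[keep], y[keep], epochs=100 if fine_tune else 500)
        sq_errs.append((y[i] - forward(w_i, X[i:i+1]))[0] ** 2)
    # estimates the IMSE of training on N-1 examples, plus the noise floor sigma^2
    return np.mean(sq_errs)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.1, 40)   # noise variance sigma^2 = 0.01
print("delete-1 CV estimate:", delete1_cv(X, y, rng))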
However, we did require an additional assumption to obtain the "strong" convergence result, in that the optimal weights must be unique. This is to ensure that the weights for each of the deleted subsets of N-1 examples converge to the weights obtained by training on all N examples. As an aside: This latter condition may seem strong, but it seems to be (intuitively) applicable to a particular variant of delete-1 cross-validation commonly employed to make its computation more feasible (in which case the global optima are in a sense "locally" unique under the right conditions.) In this variant, the network is trained on the entire training set to obtain the "base" network. These weights are then "fine-tuned" upon each of the deleted subsets of size N-1 to obtain the N cross-validated weight vectors. This tends to distribute the fine-tuned weights within a local region that seems to get tighter as the training set size increases. It tends to work well in practice, under the right conditions. (Essentially, you need to ensure that the ratio of examples to weights is sufficiently large, and it is easy to detect when this is not the case.) A bit off the original subject, I suppose, but I hope these results help clarify what cross-validation is doing, at least in that wonderfully ideal place called "asymptopia." It (apparently) turns out that these conditions suffice to ensure that the detrimental effect of a malicious outlier becomes negligible as the size of the training set grows large, at least with respect to the estimation of this particular kind of generalization by cross-validation. = Mark Plutowski UCSD: INC and CS&E P.S. Thank you for the honorable salutation! Actually, I am (still) just a student here. 8-) 8-| From lange at ira.uka.de Wed Feb 9 14:19:22 1994 From: lange at ira.uka.de (lange@ira.uka.de) Date: Wed, 9 Feb 94 14:19:22 MET Subject: Methods for improving generalization (was Re: some questions on ...) Message-ID: <"iraun1.ira.337:09.01.94.13.22.32"@ira.uka.de> Dear Mr. Hicks, in your mail to Mr. Grossman you mentioned the "Soft Weight-Sharing" algorithm and stated that this algorithm would do some adaptation to the data. I don't think that this is right: Soft Weight-Sharing is just a bit more complicated than Weight-Decay or other things (so some improvements have been made). But Soft Weight-Sharing does not really adapt to the data, because you have to tune the same parameters as in normal Weight-Decay: the parameters that are used to control the strength of the penalty term. The article by Nowlan and Hinton, "Simplifying Neural Networks by Soft Weight-Sharing", does not mention a method to do this automatically - so no "real" adaptation to the data is made. Maybe the methods of MacKay ("Bayesian Interpolation", Neural Comp. 4 (1992), pages 415-447) could be used to get a fully automatic adaptation. A combination of this method with Weight-Decay or Soft Weight-Sharing would perhaps be data-adaptive; but Soft Weight-Sharing alone still has a parameter that is not adapted by the data.
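[ For concreteness, here is a rough sketch of the penalty being discussed -- illustrative notation only, not code from Nowlan and Hinton's paper: the mixing proportions, means and variances of the Gaussian clusters adapt to the current weights, while the overall penalty strength `lam` is still a hand-set parameter, which is the point being made above. ]

import numpy as np

def soft_weight_sharing_penalty(w, pi, mu, var, lam):
    # penalty = -lam * sum_i log sum_j pi_j * N(w_i | mu_j, var_j)
    d2 = (w[:, None] - mu[None, :]) ** 2
    comp = pi * np.exp(-0.5 * d2 / var) / np.sqrt(2 * np.pi * var)
    return -lam * np.sum(np.log(comp.sum(axis=1) + 1e-12))

def update_mixture(w, pi, mu, var):
    # one EM-style update of the mixture, given the current weights
    d2 = (w[:, None] - mu[None, :]) ** 2
    r = pi * np.exp(-0.5 * d2 / var) / np.sqrt(2 * np.pi * var)
    r /= r.sum(axis=1, keepdims=True)          # responsibility of each cluster for each weight
    nk = r.sum(axis=0)
    pi = nk / len(w)
    mu = (r * w[:, None]).sum(axis=0) / nk
    var = (r * (w[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return pi, mu, var

# toy usage: weights clustered near 0 and near 0.8
rng = np.random.default_rng(0)
w = np.concatenate([rng.normal(0, 0.05, 50), rng.normal(0.8, 0.05, 20)])
pi, mu, var = np.ones(2) / 2, np.array([0.0, 1.0]), np.ones(2) * 0.1
for _ in range(20):
    pi, mu, var = update_mixture(w, pi, mu, var)
print(pi, mu, var, soft_weight_sharing_penalty(w, pi, mu, var, lam=0.1))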
Yours, Frank Lange From sec at ai.univie.ac.at Wed Feb 9 08:53:36 1994 From: sec at ai.univie.ac.at (sec@ai.univie.ac.at) Date: Wed, 9 Feb 1994 14:53:36 +0100 Subject: No subject Message-ID: <199402091353.AA14535@prater.ai.univie.ac.at> * * * * * TWELFTH EUROPEAN MEETING * * ON * * CYBERNETICS AND SYSTEMS RESEARCH * * (EMCSR 1994) * April 5 - 8, 1994 UNIVERSITY OF VIENNA organized by the Austrian Society for Cybernetic Studies in cooperation with Dept.of Medical Cybernetics and Artificial Intelligence, Univ.of Vienna and International Federation for Systems Research Plenary lectures: ***************** MARGARET BODEN (United Kingdom): "Artificial Intelligence and Creativity" STEPHEN GROSSBERG (USA): "Neural Networks for Learning, Recognition, and Prediction" STUART A. UMPLEBY (USA): "Twenty Years of Second Order Cybernetics" 241 papers will be presented and discussed in the following symposia: ********************************************************************* GENERAL SYSTEMS METHODOLOGY G.J.Klir (USA) ADVANCES IN MATHEMATICAL SYSTEMS THEORY J.Miro (Spain), M.Peschel (Germany), F.Pichler (Austria) FUZZY SYSTEMS, APPROXIMATE REASONING AND KNOWLEDGE-BASED SYSTEMS C.Carlsson (Finland), K.-P.Adlassnig (Austria), E.P.Klement (Austria) DESIGNING AND SYSTEMS, AND THEIR EDUCATION B.Banathy (USA), W.Gasparski (Poland), G.Goldschmidt (Israel) HUMANITY, ARCHITECTURE AND CONCEPTUALIZATION G.Pask (United Kingdom), G.de Zeeuw (Netherlands) BIOCYBERNETICS AND MATHEMATICAL BIOLOGY L.M.Ricciardi (Italy) SYSTEMS AND ECOLOGY F.J.Radermacher (Germany), K.Fedra (Austria) CYBERNETICS AND INFORMATICS IN MEDICINE G.Gell (Austria), G.Porenta (Austria) CYBERNETICS OF SOCIO-ECONOMIC SYSTEMS K.Balkus (USA), O.Ladanyi (Austria) SYSTEMS, MANAGEMENT AND ORGANIZATION G.Broekstra (Netherlands), R.Hough (USA) CYBERNETICS OF COUNTRY DEVELOPMENT P.Ballonoff (USA), T.Koizumi (USA), S.A.Umpleby (USA) COMMUNICATION AND COMPUTERS A M.Tjoa (Austria) INTELLIGENT AUTONOMOUS SYSTEMS J.Rozenblit (USA), H.Praehofer (Austria) CYBERNETIC PRINCIPLES OF KNOWLEDGE DEVELOPMENT F.Heylighen (Belgium), S.A.Umpleby (USA) CYBERNETICS, SYSTEMS AND PSYCHOTHERAPY M.Okuyama (Japan), H.Koizumi (USA) ARTIFICIAL NEURAL NETWORKS AND ADAPTIVE SYSTEMS S.Grossberg (USA), G.Dorffner (Austria) ARTIFICIAL INTELLIGENCE AND COGNITIVE SCIENCE V.Marik (Czech Republic), R.Born (Austria) TUTORIALS: ********** A SYNTACTIC APPROACH TO HEURISTIC NETWORKS: LINGUISTIC GEOMETRY Prof.Boris Stilman, University of Colorado, Denver, USA FUZZY SETS AND IMPRECISE BUT RELEVANT DECISIONS Prof.Christer Carlsson, Abo Akademi University, Abo, Finland CONTEXTUAL SYSTEMS: A NEW TECHNOLOGY FOR KNOWLEDGE BASED SYSTEM DEVELOPMENT Dr.Irina V. Ezhkova, Russian Academy of Science, Moscow TWENTY YEARS OF SECOND ORDER CYBERNETICS Prof.Stuart A. Umpleby, George Washington University, Washington, D.C., USA PROCEEDINGS: ************ Trappl R.(ed.): CYBERNETICS AND SYSTEMS '94, 2 vols, 1911 pages, World Scientific Publishing, Singapore. FOR FURTHER INFORMATION PLEASE CONTACT: *************************************** EMCSR'94 Secretariat c/o Austrian Society for Cybernetic Studies Schottengasse 3 A-1010 Vienna Austria Phone: +43-1-53532810 Fax: +43-1-5320652 E-mail: sec at ai.univie.ac.at From gert at jhunix.hcf.jhu.edu Wed Feb 9 09:32:57 1994 From: gert at jhunix.hcf.jhu.edu (Gert Cauwenberghs) Date: Wed, 9 Feb 1994 09:32:57 -0500 Subject: "A Learning Analog Neural Network Chip..." 
Message-ID: <94Feb9.093258edt.70280-3@jhunix.hcf.jhu.edu> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/cauwenberghs.nips93.ps.Z

A preprint of the paper: A Learning Analog Neural Network Chip with Continuous-Time Recurrent Dynamics, by Gert Cauwenberghs, 8 pages including figures, to appear in Advances in Neural Information Processing Systems, vol. 6, 1994, is available on the neuroprose repository, in compressed PostScript format: anonymous binary ftp to archive.cis.ohio-state.edu cd pub/neuroprose get cauwenberghs.nips93.ps.Z uncompress and print. The abstract follows below. --- Gert Cauwenberghs (gert at jhunix.hcf.jhu.edu)

We present experimental results on supervised learning of dynamical features in an analog VLSI neural network chip. The recurrent network, containing six continuous-time analog neurons and 42 free parameters (connection strengths and thresholds), is trained to generate time-varying outputs approximating given periodic signals presented to the network. The chip implements a stochastic perturbative algorithm, which observes the error gradient along random directions in the parameter space for error-descent learning. In addition to the integrated learning functions and the generation of pseudo-random perturbations, the chip provides for teacher forcing and long-term storage of the volatile parameters. The network learns a 1 kHz circular trajectory in 100 sec. The chip occupies 2 X 2 mm in a 2 um CMOS process, and dissipates 1.2 mW.

From yong at cns.brown.edu Wed Feb 9 14:42:14 1994 From: yong at cns.brown.edu (Yong Liu) Date: Wed, 9 Feb 94 14:42:14 EST Subject: some questions on training neural nets Message-ID: <9402091942.AA19342@cns.brown.edu>

Plutowski (Tue, 08 Feb 1994) wrote >No, actually it turns out that delete-1 cross-validation delivers >unbiased estimates of IMSE under fairly reasonable conditions. >(More precisely, it delivers estimates of IMSE_N + \sigma^2, >for training set size N and noise variance \sigma^2.) >Roughly, the noise must have variance the same everywhere in input space, >(or, "homoscedasticity" as the statisticians would say,) with examples >selected independently from the same, fixed environment (i.e., "i.i.d.") >the expectation of the squared-target must be finite (this just ensures >that conditional expectations of the target and the noise exist everywhere) >plus some conditions on the network to make it behave nicely. >For these same conditions, the estimate is additionally "conservative," >in that it does not, (asymptotically, anyway, as N grows large) >underestimate the expected squared error of the network for optimal weights.

Outliers are the data points that come in an "unexpected" way, both in the training data and in the future. For example, the data may be collected so that a proportion of them are typos. So as the size of the data set gets large, the number of outliers in it also gets large. Plutowski's assumption, as I understand it, is that the ratio of the number of outliers to the size of the data set is very small. One way to look at a data set containing outliers is to assume the noise is inhomoscedastic. Outlier data points have noise with large variance, and good data points have noise with small variance (Liu 1994). This is different from Plutowski's "homoscedasticity" assumption. Since we have no intention of predicting the value of outliers, robust estimation of both the parameters and the generalization error requires the "removal" of the outliers.
These discussions, I hope, convey the idea that when using cross-validation to estimate generalization error, some caution should be taken regarding the influence of bad data in the training set. ------------ Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912

From pluto at cs.ucsd.edu Wed Feb 9 17:52:55 1994 From: pluto at cs.ucsd.edu (Mark Plutowski) Date: Wed, 9 Feb 94 14:52:55 -0800 Subject: Outliers (Was: "Some questions on training..") Message-ID: <9402092252.AA14771@beowulf>

------- previous message ------- Dr. Liu writes: Outliers are the data points that come in an "unexpected" way, both in the training data and in the future. For example, the data may be collected so that a proportion of them are typos. So as the size of the data set gets large, the number of outliers in it also gets large. Plutowski's assumption, as I understand it, is that the ratio of the number of outliers to the size of the data set is very small. One way to look at a data set containing outliers is to assume the noise is inhomoscedastic. Outlier data points have noise with large variance, and good data points have noise with small variance (Liu 1994). This is different from Plutowski's "homoscedasticity" assumption. Since we have no intention of predicting the value of outliers, robust estimation of both the parameters and the generalization error requires the "removal" of the outliers. These discussions, I hope, convey the idea that when using cross-validation to estimate generalization error, some caution should be taken regarding the influence of bad data in the training set. ------------ Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 ------- end previous message -------

Dear Dr Liu, Yes, this points out the importance of examining the assumptions carefully to ensure that they apply to your particular learning task. As another example of where these results do not apply, note that the assumption of mean zero noise can be easily violated in discrimination tasks (often referred to as "classification" tasks) where the noise involves random misclassification of the target.

It also points out an appealing definition of "outlier". My interpretation of it is the following: When the noise variance on the target can depend upon the input (in statistical jargon, referred to as "heteroscedasticity of the conditional variance of Y_i given X_i") there is the possibility that a plot of the conditional target variance over the input space could display discontinuous jumps, corresponding to where it is more likely to encounter targets that are much more "noisy" - as compared to targets for neighboring inputs. Is this accurate?

I look forward to reading (Liu 94). Can you (or anyone else) point me to other references utilizing a similar definition of "outlier?" (IMHO) "outlier" is quite a value-laden term that I tend to avoid since I feel it has multiple and often ambiguous interpretations/definitions. I am currently doing work on detection of what I call "offliers" since I have a precise definition of what this means to me, and since I hesitate to use the term "outliers" for the reason stated above.

= Mark

PS: I would appreciate further opinions/references/examples of what "outlier" means (either in practice or in theory) which I will summarize and post to the mailing list.
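[ To make the mixture-noise view of outliers discussed above concrete, here is a
  minimal NumPy sketch -- an illustration only, not taken from (Liu 94) or any of
  the postings. It fits a line under a two-component Gaussian noise model (small
  variance for "good" points, large variance for "outliers") and lets an EM-style
  reweighting down-weight the high-variance points. The toy data, the two noise
  standard deviations, and the mixing proportion are assumptions chosen for the
  example; the component parameters are held fixed for simplicity. ]

import numpy as np

# Toy data: a linear trend with small noise, contaminated by a few gross errors.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.1 * rng.standard_normal(50)
y[::10] += 3.0                                  # every 10th point becomes an "outlier"

sig_good, sig_out, prior_good = 0.1, 1.0, 0.9   # assumed mixture-noise parameters
w = np.ones_like(x)                             # per-point weight = P(point is "good")
A = np.vstack([x, np.ones_like(x)]).T           # design matrix for a straight line

for _ in range(20):
    # M-step: weighted least-squares fit of the line.
    W = np.diag(w)
    coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    r = y - A @ coef                            # residuals under the current fit
    # E-step: posterior probability that each residual came from the "good" component.
    p_good = prior_good * np.exp(-0.5 * (r / sig_good) ** 2) / sig_good
    p_out = (1.0 - prior_good) * np.exp(-0.5 * (r / sig_out) ** 2) / sig_out
    w = p_good / (p_good + p_out)

print("slope, intercept:", coef)                # close to (2, 0); the gross errors
print("smallest weights:", np.sort(w)[:5])      # are assigned weights near zero

The contaminated points end up with weights near zero, which is the "down-weighting" behaviour described in the follow-up postings below; an ordinary homoscedastic least-squares fit, by contrast, would be pulled toward them.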
From mlsouth at cssip.levels.unisa.edu.au Wed Feb 9 21:00:23 1994 From: mlsouth at cssip.levels.unisa.edu.au (mlsouth@cssip.levels.unisa.edu.au) Date: Thu, 10 Feb 1994 12:30:23 +1030 (CST) Subject: Missing values Message-ID: <8610.9402100200@hotham.levels.unisa.edu.au>

Connectionists, I did a short study on methods for classification of incomplete data 18 months ago. I compared the statistical methods of discrimination and classification and the EM algorithm to some neural methods. These methods could only be applied to an artificial data set due to the unavailability of a set of real data with missing values. Despite this, I believe that the conclusions are still sound. A copy of the paper ``Classification of incomplete data using neural networks'', M.L. Southcott, R.E. Bogner which was presented to the Fourth Australian Conference on Neural Networks (ACNN '93) is available via anonymous ftp from ftp.cssip.edu.au. The file is pub/users/michael/southcott.missing.ps

Michael Southcott mlsouth at cssip.edu.au Centre for Sensor Signal and Information Processing SPRI Building, The Levels, Pooraka 5095, South Australia.

From prechelt at ira.uka.de Tue Feb 8 07:19:16 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Tue, 08 Feb 1994 13:19:16 +0100 Subject: SUMMARY: encoding missing values Message-ID: <"irafs2.ira.957:08.01.94.12.19.58"@ira.uka.de>

[ Due to a transmission error at our end, Lutz Prechelt's 28 Kbyte summary of the missing values discussion got truncated at about 16 Kbytes. Here is the second half of his summary. Sorry for any inconvenience. -- Dave Touretzky, CONNECTIONISTS moderator ]

------------------------------------------------------------------------ From: "N. Karunanithi" [...for nominal attributes:] Both methods have the problem of poor scalability. If the number of missing values increases then the number of additional inputs will increase linearly in 1.1 and logarithmically in 1.2. In fact, 1-of-n encoding may be a poor choice if (1) the number of input features is large and (2) such an expanded dimensional representation does not become a (semi) linearly separable problem. Even if it becomes a linearly separable problem, the overall complexity of the network can sometimes be very high. [...for continuous attributes:] This representation requires a GUESS. A nominal transformation may not be a proper representation in some cases. Assume that the output values range over a large numerical interval. For example, from 0.0 to 10,000.0. If you use a simple scaling like dividing by 10,000.0 to make it between 0.0 and 1.0, this will result in poor accuracy of prediction. If the attribute is on the input side, then in theory the scaling is unnecessary because the input layer weights will scale accordingly. However, in practice I had a lot of problems with this approach. Maybe a log transformation before scaling would not be a bad choice. If you use a closed scaling you may have problems whenever a future value exceeds the maximum value of the numerical interval. For example, assume that the attribute is time, say in milliseconds. Any future time from the point of reference can exceed the limit. Hence any closed scaling will not work properly. [...for ordinal attributes:] I have compared Binary Encoding (1.2), Gray-Coded representation and straightforward scaling. Closed scaling seems to do a good job. I have also compared open scaling and closed scaling and did find significant improvement in prediction accuracy. ###REF### N. Karunanithi, D. Whitley and Y. K.
Malaiya, "Prediction of Software Reliability Using Connectionist Models", IEEE Trans. Software Eng., July 1992, pp 563-574. From hicks at cs.titech.ac.jp Fri Feb 11 00:02:54 1994 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Fri, 11 Feb 94 00:02:54 JST Subject: Methods for improving generalization (was Re: some questions on ...) In-Reply-To: lange@ira.uka.de's message of Wed, 9 Feb 94 14:19:22 MET <"iraun1.ira.337:09.01.94.13.22.32"@ira.uka.de> Message-ID: <9402101503.AA16767@maruko.cs.titech.ac.jp> Dear Mr. Franke Lange (lange at ira.uka.de), On Wed, 9 Feb 94 14:19:22 MET you wrote: >But Soft Weight-Sharing does not really adapt to the data, >because you have to tune the same parameters as in normal Weight-Decay: >the parameters, that are used to handle the strength of the penalty-term. >The article of Nowlan and Hinton "Simplifying Neural Networks by Soft Weight- >Sharing" does not mention a method to do this automatically - so no "real" >adaption to the data is made. I say "every model is adaptive, and no model is adaptive, but some are more adaptive than others". Every model has parameters which are adjusted during learning. Penalty functions, including soft weight sharing, affects the prior distribution of weights and so can be thought of as just providing different models. All of these models adapt to data. On the other hand, every model >must< make some assumptions about which it is adamant. If it didn't there wouldn't be a model. These assumptions are non-adaptive to the data. (note1) You further wrote: >Maybe the methods of MacKay ("Bayesian Interpolation", Neural Comp. 4 (1992), >page 415-447) could be used to get a fully-automatic adaption. A combination >of this method with Weight-Decay or Soft Weight-Sharing would perhaps be >data-adaptive; but Soft Weight-Sharing alone has still a parameter, that is >not adapted by the data. The article was very enlighenting. Figure 1 on page 417 shows the 2 main steps of modeling which involve Baysian methods: (1) Fit each model to the data, (2) Assign preferences to the alternative models. The first step is the one we are all familiar with. The second one is the topic of the paper and consists of assigning objective preferences to each model: the probability of the data given the model is called the evidence for the model. Re your idea of "fully-automatic adaption". I will first review the parameters related to soft weight sharing: (a) the number of weight groups (b) the mean and variance of each group of weights. The weight penalty weighting is not arbitrary but determined by the variance of the squared error (which changes with time) divided by a factor (determined by cross-validation) to adjust to the number of free parameters. I think you mean by "fully-automatic adaption" that parameters (a) and (b) should be constant during stage (1), and after running the simulation for a large number of times with different values for (a) and (b) we should select the best ones with stage (2) methods: i.e. weighing the evidence for each model. This would take a long time BUT we might get a different answer from the one obtained by choosing (a) and (b) in stage 1. However, as to which way is best called "automatic", I would personaly favor the present stage (1) way, because it automatically (although maybe imperfectly) estimates the best parameters (a) and (b) implicitly during learning, leaving less labor for the later and harder stage (2). I realize I am getting semantic here. 
(note1) MacKay does give a special example of a 100% data-adaptive model: the Sure Thing hypothesis, which is that the data set will be what it is (predicted of course before seeing the data, selected afterwards), but this hypothesis has very small a priori probability. Too bad for our universe. The other example is of course stock tips, (predicted of course before seeing the money, collected afterwards), but look what happened to Michael Milken!

Respectfully Yours, Craig Hicks

Craig Hicks hicks at cs.titech.ac.jp | Kore ya kono Yuku mo kaeru mo Ogawa Laboratory, Dept. of Computer Science | Wakarete wa Shiru mo shiranu mo Tokyo Institute of Technology, Tokyo, Japan | Ausaka no seki lab:03-3726-1111 ext.2190 home:03-3785-1974 | (from hyaku-nin-issyu) fax: +81(3)3729-0685 (from abroad) 03-3729-0685 (from Japan)

From terry at salk.edu Thu Feb 10 12:45:15 1994 From: terry at salk.edu (Terry Sejnowski) Date: Thu, 10 Feb 94 09:45:15 PST Subject: robust statistics Message-ID: <9402101745.AA28545@salk.edu>

One man's outlier is another man's data point. Another way to handle outliers is not to remove them but to model them explicitly. Geoff Hinton has pointed out that character recognition can be made more robust by including models for background noise such as postmarks. Steve Nowlan and I recently used mixtures of expert networks to separate multiple interpenetrating flow fields -- the transparency problem for visual motion. The gating network was used to select regions of the visual field that contained reliable estimates of local velocity for which there was coherent global support. There is evidence for such selection neurons in area MT of primate visual cortex, a region of cortex that specializes in the detection of coherent motion. Terry -----

From yong at cns.brown.edu Thu Feb 10 13:39:19 1994 From: yong at cns.brown.edu (Yong Liu) Date: Thu, 10 Feb 94 13:39:19 EST Subject: outlier, robust statistics Message-ID: <9402101839.AA21430@cns.brown.edu>

Plutowski wrote (Wed, 9 Feb 94) >It also points out an appealing definition of "outlier". >My interpretation of it is the following: >When the noise variance on the target can depend upon the input >(in statistical jargon, referred to as "heteroscedasticity of >the conditional variance of Y_i given X_i") >there is the possibility that a plot of the conditional >target variance over the input space could display >discontinuous jumps, corresponding to where it is more likely >to encounter targets that are much more "noisy" - as compared >to targets for neighboring inputs. Is this accurate?

Yes. It is the heuristic behind modelling the error as a mixture of normal distributions in (Liu 94). In simple words, the statistical formulation regards the error for each data point as coming from a normal distribution with its own variance, and regards the variances as missing observations. By using a prior on the variance and the EM algorithm, one can estimate the variances. It turns out that during the estimation, the EM algorithm looks for the data points that have larger variances and down-weights those data points. This way of modelling is in agreement with Dr. Sejnowski's view >One man's outlier is another man's data point. Another >way to handle outliers is not to remove them but to model them >explicitly. ...

Plutowski also wrote (Wed, 9 Feb 94) >I look forward to reading (Liu 94). Can you (or anyone else) >point me to other references utilizing a similar definition >of "outlier?"
(IMHO) "outlier" is quite a value-laden term >that I tend to avoid since I feel it has multiple and >often ambiguous interpretations/definitions. Box and Tiao (1968) hold similar views. Outlier are generated from a distribution that is a perturbation to the underlying distribution, for example, a small amount of noise with ever changing distribution in the background. Huber's (1981) book is referred as a excellent reference. Anyway, no matter what outlier is, what one really want is to use a model/method that is not sensitive to them and predict the relevant information. References Box, G.E.P. and Tiao, G.C.(1968) A Bayesian approach to some outlier problem. Biometrika, 55, 119-129 Huber (1981) Robust Statistics. John Wiley & Sons, Inc.. BTW. I will be a Phd only three month later. ------- Yong Liu Box 1843 Department of Physics Institute for Brain and Neural Systems Brown University Providence, RI 02912 From zl at venezia.rockefeller.edu Thu Feb 10 20:54:42 1994 From: zl at venezia.rockefeller.edu (Zhaoping Li) Date: Thu, 10 Feb 94 20:54:42 -0500 Subject: Paper announcement on neuroprose Message-ID: <9402110154.AA00738@venezia.rockefeller.edu> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/li-zhaoping.stereocoding.ps.Z The file li-zhaoping.stereocoding.ps.Z is now available for copying from the Neuroprose archive. This is a 16 page paper plus 6 figures, to be published in Network: Computation in Neural Systems. --------------------------------------------------------------------------- Efficient Stereo Coding in the Multiscale Representation Zhaoping Li and Joseph J. Atick The Rockefeller University 1230 York Avenue New York, NY 10021, USA Abstract: Stereo images are highly redundant; the left and right frames of typical scenes are very similar. We explore the consequences of the hypothesis that cortical cells --- in addition to their multiscale coding strategies (Li and Atick 1994a) --- are concerned with reducing binocular redundancy due to correlations between the two eyes. We derive the most efficient coding strategies that achieve binocular decorrelation. It is shown that multiscale coding combined with a binocular decorrelation strategy leads to a rich diversity of cell types. In particular, the theory predicts monocular/binocular cells as well as a family of disparity selective cells, among which one can identify cells that are tuned-zero-excitatory, near, far, and tuned inhibitory. The theory also predicts correlations between ocular dominance, cell size, orientation, and disparity selectivities. Consequences on cortical ocular dominance column formation from abnormal developmental conditions such as strabismus and monocular eye closure are also predicted. These findings are compared with physiological measurements. 
Please address correspondence to Zhaoping Li ---------------------------------------------------------------------------- To obtain a copy: ftp archive.cis.ohio-state.edu login: anonymous password: cd pub/neuroprose binary get li-zhaoping.stereocoding.ps.Z quit Then at your system: uncompress li-zhaoping.stereocoding.ps lpr -P li-zhaoping.stereocoding.ps Zhaoping Li Box 272 Rockefeller University 1230 York Ave New York, NY 10021 phone: 212-327-7423 fax: 212-327-7422 zl at rockvax.rockefeller.edu From prechelt at ira.uka.de Tue Feb 8 07:19:16 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Tue, 08 Feb 1994 13:19:16 +0100 Subject: SUMMARY: encoding missing values Message-ID: <"irafs2.ira.957:08.01.94.12.19.58"@ira.uka.de> [ My attempt to forward Lutz Prechelt's summary of the missing values discussion was twice foiled by technical problems. Note to future posters: do not attempt to transmit lines containing nothing but a period and a carriage return. It confuses our FTP software. Here is my final attempt to transmit the entire summary. If this fails, Lutz will just have to dump it to neuroprose and let people access it via FTP. Sorry about the repeated postings. -- Dave Touretzky, CONNECTIONISTS moderator ] ================================================================ From prechelt at ira.uka.de Tue Feb 8 07:19:16 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Tue, 08 Feb 1994 13:19:16 +0100 Subject: SUMMARY: encoding missing values Message-ID: <"irafs2.ira.957:08.01.94.12.19.58"@ira.uka.de> A few days ago, I posted some thoughts about how to represent missing input values to a neural network and asked for comments and further ideas. This message is a summary of the replies I received (some in my personal mail some in connectionists). I show the most significant comments and ideas and append versions of the messages that are trimmed to the most important parts (in case somebody wants to keep this discussion in his/her archive) This was my original message: ------------------------------------------------------------------------ From prechelt at ira.uka.de Wed Feb 2 03:58:56 1994 From: prechelt at ira.uka.de (Lutz Prechelt) Date: Wed, 02 Feb 1994 09:58:56 +0100 Subject: Encoding missing values Message-ID: I am currently thinking about the problem of how to encode data with attributes for which some of the values are missing in the data set for neural network training and use. An example of such data is the 'heart-disease' dataset from the UCI machine learning database (anonymous FTP on "ics.uci.edu" [128.195.1.1], directory "/pub/machine-learning-databases"). There are 920 records altogether with 14 attributes each. Only 299 of the records are complete, the others have one or several missing attribute values. 11% of all values are missing. I consider only networks that handle arbitrary numbers of real-valued inputs here (e.g. all backpropagation-suited network types etc). I do NOT consider missing output values. In this setting, I can think of several ways how to encode such missing values that might be reasonable and depend on the kind of attribute and how it was encoded in the first place: 1. Nominal attributes (that have n different possible values) 1.1 encoded "1-of-n", i.e., one network input per possible value, the relevant one being 1 all others 0. This encoding is very general, but has the disadvantage of producing networks with very many connections. 
Missing values can either be represented as 'all zero' or by simply treating 'is missing' as just another possible input value, resulting in a "1-of-(n+1)" encoding. 1.2 encoded binary, i.e., log2(n) inputs being used like the bits in a binary representation of the numbers 0...n-1 (or 1...n). Missing values can either be represented as just another possible input value (probably all-bits-zero is best) or by adding an additional network input which is 1 for 'is missing' and 0 for 'is present'. The original inputs should probably be all zero in the 'is missing' case. 2. continuous attributes (or attributes treated as continuous) 2.1 encoded as a single network input, perhaps using some monotone transformation to force the values into a certain distribution. Missing values are either encoded as a kind of 'best guess' (e.g. the average of the non-missing values for this attribute) or by using an additional network input being 0 for 'missing' and 1 for 'present' (or vice versa) and setting the original attribute input either to 0 or to the 'best guess'. (The 'best guess' variant also applies to nominal attributes above) 3. binary attributes (truth values) 3.1 encoded by one input: 0=false 1=true or vice versa Treat like (2.1) 3.2 encoded by one input: -1=false 1=true or vice versa In this case we may act as for (3.1) or may just use 0 to indicate 'missing'. 3.3 treat like nominal attribute with 2 possible values 4. ordinal attributes (having n different possible values, which are ordered) 4.1 treat either like continuous or like nominal attribute. If (1.2) is chosen, a Gray-Code should be used. Continuous representation is risky unless a 'sensible' quantification of the possible values is available. So far to my considerations. Now to my questions. a) Can you think of other encoding methods that seem reasonable ? Which ? b) Do you have experience with some of these methods that is worth sharing ? c) Have you compared any of the alternatives directly ? ------------------------------------------------------------------------ SUMMARY: For a), the following ideas were mentioned: 1. use statistical techniques to compute replacement values from the rest of the data set 2. use a Boltzman machine to do this for you 3. use an autoencoder feed forward network to do this for you 4. randomize on the missing values (correct in the Bayesian sense) For b), some experience was reported. I don't know how to summarize that nicely, so I just don't summarize at all. For c), no explicit quantitative results were given directly. Some replies suggest that data is not always missing randomly. The biases are often known and should be taken into account (e.g. medical tests are not carried out (resulting in missing data) for moreless healthy persons more often than for ill persons). Many replies contained references to published work on this area, from NN, machine learning, and mathematical statistics. To ease searching for these references in the replies below, I have marked them with the string ##REF## (if you have a 'grep' program that extracts whole paragraphs, you can get them all out with one command). Thanks to all who answered. These are the trimmed versions of the replies: ------------------------------------------------------------------------ From: tgd at research.CS.ORST.EDU (Tom Dietterich) [...for nominal attributes:] An alternative here is to encode them as bit-strings in a error-correcting code, so that the hamming distance between any two bit strings is constant. 
This would probably be better than a dense binary encoding. The cost in additional inputs is small. I haven't tried this though. My guess is that distributed representations at the input are a bad idea. One must always determine WHY the value is missing. In the heart disease data, I believe the values were not measured because other features were believed to be sufficient in each case. In such cases, the network should learn to down-weight the importance of the feature (which can be accomplished by randomizing it---see below). In other cases, it may be more appropriate to treat a missing value as a separate value for the feature, e.g., in survey research, where a subject chooses not to answer a question. [...for continuous attributes:] Ross Quinlan suggests encoding missing values as the mean observed output value when the value is missing. He has tried this in his regression tree work. Another obvious approach is to randomize the missing values--on each presentation of the training example, choose a different, random, value for each missing input feature. This is the "right thing to do" in the bayesian sense. [...for binary attributes:] I'm skeptical of the -1,0,1 encoding, but I think there is more research to be done here. [...for ordinal attributes:] I would treat them as continuous. ------------------------------------------------------------------------ From: shavlik at cs.wisc.edu (Jude W. Shavlik) We looked at some of the methods you talked about in the following article in the journal Machine Learning. ##REF## %T Symbolic and Neural Network Learning Algorithms: An Experimental Comparison %A J. W. Shavlik %A R. J. Mooney %A G. G. Towell %J Machine Learning %V 6 %N 2 %P 111-143 %D 1991 ------------------------------------------------------------------------ From: hertz at nordita.dk (John Hertz) It seems to me that the most natural way to handle missing data is to leave them out. You can do this if you work with a recurrent network (fx Boltzmann machine) where the inputs are fed in by clamping the input units to the given input values and the rest of the net relaxes to a fixed point, after which the output is read off the output units. If some of the input values are missing, the corresponding input units are just left unclamped, free to relax to values most consistent with the known inputs. I have meant for a long time to try this on some medical prognosis data I was working on, but I never got around to it, so I would be happy to hear how it works if you try it. ------------------------------------------------------------------------ From: jozo at sequoia.WPI.EDU (Jozo Dujmovic) In the case of clustering benchmark programs I frequently have the the problem of estimation of missing data. A relatively simple SW that implements a heuristic algorithm generates estimates having the average error of 8%. NN will somehow "implicitly estimate" the missing data. The two approaches might even be in some sense equivalent (?). Jozo [ I suspect that they are not: When you generate values for the missing items and put them in the training set, the network loses the information that this data is only estimated. Since estimations are not as reliable as true input data, the network will weigh inputs that have lots of generated values as less important. If it gets the 'is missing' information explicitly, it can discriminate true values from estimations instead. 
] ------------------------------------------------------------------------ From: guy at cs.uq.oz.au A final year student of mine worked on the problem of dealing with missing inputs, without much success. However, the student was not very good, so take the following opinions with a pinch of salt. We (very tentatively) came to the conclusion that if the inputs were redundant, the problem was easy; if the missing input contained vital information, the problem was pretty much impossible. We used the heart disease data. I don't recommend it for the missing inputs problem. All of the inputs are very good indicators of the correct result, so missing inputs were not important. Apparently there is a large literature in statistics on dealing with missing inputs. Anthony Adams (University of Tasmania) has published a technical report on this. His email address is "A.Adams at cs.utas.edu.au". ##REF## @techreport{kn:Vamplew-91, author = "P. Vamplew and A. Adams", address = {Hobart, Tasmania, Australia}, institution = {Department of Computer Science, University of Tasmania}, number = {R1-4}, title = {Real World Problems in Backpropagation: Missing Values and Generalisability}, year = {1991} }

------------------------------------------------------------------------ From: Mike Southcott ##REF## I wrote a paper for the Australian conference on neural networks in 1993. ``Classification of Incomplete Data using neural networks'' Southcott, Bogner. You may find it interesting. You may not be able to get the proceedings for this conference, but I am in the process of digging up a postscript copy for someone in the States, so when I do that, I will send you a copy.

------------------------------------------------------------------------ From: Eric Saund I have done some work on unsupervised learning of multiple cause clusters in binary data, for which an appropriate encoding scheme is -1 = FALSE, 1 = TRUE, and 0 = NO DATA. This has worked well for me, but my paradigm is not your standard feedforward network and uses a different activation function from the standard weighted sum followed by sigmoid squashing. I presented the paper on this work at NIPS: ##REF## Saund, Eric; 1994; "Unsupervised Learning of Mixtures of Multiple Causes in Binary Data," in Advances in Neural Information Processing Systems -6-, Cowan, J., Tesauro, G., and Alspector, J., eds. Morgan Kaufmann, San Francisco.

------------------------------------------------------------------------ From: Thierry.Denoeux at hds.univ-compiegne.fr In a recent mailing, Lutz Prechelt mentioned the interesting problem of how to encode attributes with missing values as inputs to a neural network. I have recently been faced with that problem while applying neural nets to rainfall prediction using weather radar images. The problem was to classify pairs of "echoes" -- defined as groups of connected pixels with reflectivity above some threshold -- taken from successive images as corresponding to the same rain cell or not. Each pair of echoes was described by a list of attributes. Some of these attributes, referring to the past of a sequence, were not defined for some instances. To encode these attributes with potentially missing values, we applied two different methods actually suggested by Lutz: - the replacement of the missing value by a "best-guess" value - the addition of a binary input indicating whether the corresponding attribute was present or absent. Significantly better results were obtained by the second method.
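[ For concreteness, a minimal sketch of the two encodings compared above -- not
  code from the posting or from the ICANN'93 paper. Missing entries are assumed
  to be marked with NaN, and the function name is made up for the example. ]

import numpy as np

def encode_continuous(values, scheme="flag"):
    """Turn one continuous attribute (with missing entries as NaN) into network inputs.

    scheme="guess": replace each missing entry by the mean of the observed values
                    (the 'best guess' of option 2.1 in the summary).
    scheme="flag" : same replacement, plus an extra 0/1 input marking 'is missing'.
    """
    col = np.asarray(values, dtype=float)
    missing = np.isnan(col)
    guess = np.nanmean(col)                       # best guess = mean of observed values
    filled = np.where(missing, guess, col)
    if scheme == "guess":
        return filled[:, None]                    # one network input per attribute
    return np.column_stack([filled, missing.astype(float)])  # value + presence flag

raw = [0.7, np.nan, 1.3, 0.9, np.nan]
print(encode_continuous(raw, "guess"))            # shape (5, 1): imputed values only
print(encode_continuous(raw, "flag"))             # shape (5, 2): imputed values + indicator

With the indicator input, the network can learn to discount the imputed value whenever the flag says the attribute was absent, which is one plausible reading of why the second method did better here.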
This work was presented at ICANN'93 last September: ##REF## X. Ding, T. Denoeux & F. Helloco (1993). Tracking rain cells in radar images using multilayer neural networks. In Proc. of ICANN'93, Springer-Verlag, p. 962-967.

------------------------------------------------------------------------ From: "N. Karunanithi" [...for nominal attributes:] Both methods have the problem of poor scalability. If the number of missing values increases then the number of additional inputs will increase linearly in 1.1 and logarithmically in 1.2. In fact, 1-of-n encoding may be a poor choice if (1) the number of input features is large and (2) such an expanded dimensional representation does not become a (semi) linearly separable problem. Even if it becomes a linearly separable problem, the overall complexity of the network can sometimes be very high. [...for continuous attributes:] This representation requires a GUESS. A nominal transformation may not be a proper representation in some cases. Assume that the output values range over a large numerical interval. For example, from 0.0 to 10,000.0. If you use a simple scaling like dividing by 10,000.0 to make it between 0.0 and 1.0, this will result in poor accuracy of prediction. If the attribute is on the input side, then in theory the scaling is unnecessary because the input layer weights will scale accordingly. However, in practice I had a lot of problems with this approach. Maybe a log transformation before scaling would not be a bad choice. If you use a closed scaling you may have problems whenever a future value exceeds the maximum value of the numerical interval. For example, assume that the attribute is time, say in milliseconds. Any future time from the point of reference can exceed the limit. Hence any closed scaling will not work properly. [...for ordinal attributes:] I have compared Binary Encoding (1.2), Gray-Coded representation and straightforward scaling. Closed scaling seems to do a good job. I have also compared open scaling and closed scaling and did find significant improvement in prediction accuracy. ###REF### N. Karunanithi, D. Whitley and Y. K. Malaiya, "Prediction of Software Reliability Using Connectionist Models", IEEE Trans. Software Eng., July 1992, pp 563-574. N. Karunanithi and Y. K. Malaiya, "The Scaling Problem in Neural Networks for Software Reliability Prediction", Proc. IEEE Int. Symposium on Rel. Eng., Oct. 1992, pp. 776-82. I have not found a simple solution that is general. I think representation in general and missing information in particular are open problems within connectionist research. I am not sure we will have a magic bullet for all problems. The best approach is to come up with a specific solution for a given problem.

------------------------------------------------------------------------ From: Bill Skaggs There is at least one kind of network that has no problem (in principle) with missing inputs, namely a Boltzmann machine. You just refrain from clamping the input node whose value is missing, and treat it like an output node or hidden unit. This may seem to be irrelevant to anything other than Boltzmann machines, but I think it could be argued that nothing very much simpler is capable of dealing with the problem. When you ask a network to handle missing inputs, you are in effect asking it to do pattern completion on the input layer, and for this a Boltzmann machine or some other sort of attractor network would seem to be required.

------------------------------------------------------------------------ From: "Scott E.
Fahlman" [Follow-up to Bill Skaggs:] Good point, but perhaps in need of clarification for some readers: There are two ways of training a Boltzmann machine. In one (the original form), there is no distinction between input and output units. During training we alternate between an instruction phase, in which all of the externally visible units are clamped to some pattern, and a normalization phase, in which the whole network is allow to run free. The idea is to modify the weights so that, when running free, the external units assume the various pattern values in the training set in their proper frequencies. If only some subset of the externally visible units are clamped to certain values, the net will produce compatible completions in the other units, again with frequencies that match this part of the training set. A net trained in this way will (in principle -- it might take a *very* long time for anything complicated) do what you suggest: Complete an "input" pattern and produce a compatible output at the same time. This works even if the input is *totally* missing. I believe it was Geoff Hinton who realized that a Boltzmann machine could be trained more efficiently if you do make a distinction between input and output units, and don't waste any of the training effort learning to reconstruct the input. In this model, the instruction phase clamps both input and output units to some pattern, while the normalization phase clamps only the input units. Since the input units are correct in both cases, all of the networks learning power (such as it is) goes into producing correct patterns on the output units. A net trained in this way will not do input-completion. I bring this up because I think many people will only have seen the latter kind of Boltzmann training, and will therefore misunderstand your observation. By the way, one alternative method I have seen proposed for reconstructing missing input values is to first train an auto-encoder (with some degree of bottleneck to get generalization) on the training set, and then feed the output of this auto-encoder into the classification net. The auto-encoder should be able to replace any missing values with some degree of accuracy. I haven't played with this myself, but it does sound plausible. If anyone can point to a good study of this method, please post it here or send me E-mail. ------------------------------------------------------------------------ From: "David G. Stork" ##REF## There is a provably optimal method for performing classification with missing inputs, described in Chapter 2 of "Pattern Classification and Scene Analysis" (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, which avoids the ad-hoc heuristics that have been described by others. Those interested in obtaining Chapter two via ftp should contact me. ------------------------------------------------------------------------ From: Wray Buntine This missing value problem is of course shared amongst all the learning communities, artificial intelligence, statistics, pattern recognition, etc., not just neural networks. A classic study in this area, which includes most suggestions I've read here so far, is ##REF## @inproceedings{quinlan:ml6, AUTHOR = "J.R. 
Quinlan", TITLE = "Unknown Attribute Values in Induction", YEAR = 1989, BOOKTITLE = "Proceedings of the Sixth International Machine Learning Workshop", PUBLISHER = "Morgan Kaufmann", ADDRESS = "Cornell, New York"} The most frequently cited methods I've seen, and they're so common amongst the different communities its hard to lay credit: 1) replace missings by their some best guess 2) fracture the example into a set of fractional examples each with the missing value filled in somehow 3) call the missing value another input value 3 is a good thing to do if they are "informative" missing, i.e. if someone leaves the entry "telephone number" blank in a questionaire, then maybe they don't have a telephone, but probably not good otherwise unless you have loads of data and don't mind all the extra example types generated (as already mentioned) 1 is a quick and dirty hack at 2. How good depends on your application. 2 is an approximation to the "correct" approach for handling "non-informative" missing values according to the standard "mixture model". The mathematics for this is general and applies to virtually any learning algorithm trees, feed-forward nets, linear regression, whatever. We do it for feed-forward nets in ##REF## @article{buntine.weigend:bbp, AUTHOR = "W.L. Buntine and A.S. Weigend", TITLE = "Bayesian Back-Propagation", JOURNAL = "Complex Systems", Volume = 5, PAGES = "603--643", Number = 1, YEAR = "1991" } and see Tresp, Ahmad & Neuneier in NIPS'94 for an implementation. But no doubt someone probably published the general idea back in the 50's. I certainly wouldn't call missing values an open problem. Rather, "efficient implementations of the standard approaches" is, in some cases, an open problem. ------------------------------------------------------------------------ From: Volker Tresp In general, the solution to the missing-data problem depends on the missing-data mechanism. For example, if you sample the income of a population and rich people tend to refuse the answer the mean of your sample is biased. To obtain an unbiased solution you would have to take into account the missing-data mechanism. The missing-data mechanism can be ignored if it is independent of the input and the output (in the example: the likelihood that a person refuses to answer is independent of the person's income). Most approaches assume that the missing-data mechanism can be ignored. There exist a number of ad hoc solutions to the missing-data problem but it is also possible to approach the problem from a statistical point of view. In our paper (which will be published in the upcoming NIPS-volume and which will be available on neuroprose shortly) we discuss a systematic likelihood-based approach. NN-regression can be framed as a maximum likelihood learning problem if we assume the standard signal plus Gaussian noise model P(x, y) = P(x) P(y|x) \propto P(x) exp(-1/(2 \sigma^2) (y - NN(x))^2). By deriving the probability density function for a pattern with missing features we can formulate a likelihood function including patterns with complete and incomplete features. The solution requires an integration over the missing input. In practice, the integral is approximated using a numerical approximation. For networks of Gaussian basis functions, it is possible to obtain closed-form solutions (by extending the EM algorithm). Our paper also discusses why and when ad hoc solutions --such as substituting the mean for an unknown input-- are harmful. 
For example, if the mapping is approximately linear substituting the mean might work quite well. In general, although, it introduces bias. Training with missing and noisy input data is described in: ##REF## ``Training Neural Networks with Deficient Data,'' V. Tresp, S. Ahmad and R. Neuneier, in Cowan, J. D., Tesauro, G., and Alspector, J. (eds.), {\em Advances in Neural Information Processing Systems 6}, Morgan Kaufmann, 1994. A related paper by Zoubin Ghahramani and Michael Jordan will also appear in the upcoming NIPS-volume. Recall with missing and noisy data is discussed in (available in neuroprose as ahmad.missing.ps.Z): ``Some Solutions to the Missing Feature Problem in Vision,'' S. Ahmad and V. Tresp, in {\em Advances in Neural Information Processing Systems 5,} S. J. Hanson, J. D. Cowan, and C. L. Giles eds., San Mateo, CA, Morgan Kaufman, 1993. ------------------------------------------------------------------------ From: Subhash Kak Missing values in feedback networks raise interesting questions: Should these values be considered "don't know" values or should these be generated in some "most likelihood" fashion? These issues are discussed in the following paper: ##REF## S.C. Kak, "Feedback neural networks: new characteristics and a generalization", Circuits, Systems, Signal Processing, vol. 12, no. 2, 1993, pp. 263-278. ------------------------------------------------------------------------ From: Zoubin Ghahramani I have also been looking into the issue of encoding and learning from missing values in a neural network. The issue of handling missing values has been addressed extensively in the statistics literature for obvious reasons. To learn despite the missing values the data has to be filled in, or the missing values integrated over. The basic question is how to fill in the missing data. There are many different methods for doing this in stats (mean imputation, regression imputation, Bayesian methods, EM, etc.). For good reviews see (Little and Rubin 1987; Little, 1992). I do not in general recommend encoding "missing" as yet another value to be learned over. Missing means something in a statistical sense -- that the input could be any of the values with some probability distribution. You could, for example, augment the original data filling in different values for the missing data points according to a prior distribution. Then the training would assign different weights to the artificially filled-in data points depending on how well they predict the output (their posterior probability). This is essentially the method proposed by Buntine and Weigand (1991). Other approaches have been proposed by Tresp et al. (1993) and Ahmad and Tresp (1993). I have just written a paper on the topic of learning from incomplete data. In this paper I bring a statistical algorithm for learning from incomplete data, called EM, into the framework of nonlinear function approximation and classification with missing values. This approach fits the data iteratively with a mixture model and uses that same mixture model to effectively fill in any missing input or output values at each step. You can obtain the preprint by ftp psyche.mit.edu login: anonymous cd pub get zoubin.nips93.ps To obtain code for the algorithm please contact me directly. ##REF## Ahmad, S and Tresp, V (1993) "Some Solutions to the Missing Feature Problem in Vision." In Hanson, S.J., Cowan, J.D., and Giles, C.L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers, San Mateo, CA. 
Buntine, WL, and Weigand, AS (1991) "Bayesian back-propagation." Complex Systems. Vol 5 no 6 pp 603-43 Ghahramani, Z and Jordan MI (1994) "Supervised learning from incomplete data via an EM approach" To appear in Cowan, J.D., Tesauro, G., and Alspector,J. (eds.). Advances in Neural Information Processing Systems 6. Morgan Kaufmann Publishers, San Francisco, CA, 1994. Little, RJA (1992) "Regression With Missing X's: A Review." Journal of the American Statistical Association. Volume 87, Number 420. pp. 1227-1237 Little, RJA. and Rubin, DB (1987). Statistical Analysis with Missing Data. Wiley, New York. Tresp, V, Hollatz J, Ahmad S (1993) "Network structuring and training using rule-based knowledge." In Hanson, S.J., Cowan, J.D., and Giles, C.~L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers, San Mateo, CA. ------------------------------------------------------------------------ That's it. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; 76128 Karlsruhe; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple. From n.burgess at ucl.ac.uk Fri Feb 11 05:00:20 1994 From: n.burgess at ucl.ac.uk (Neil Burgess) Date: Fri, 11 Feb 94 10:00:20 +0000 Subject: pre-print in neuroprose Message-ID: <141927.9402111000@link-1.ts.bcc.ac.uk> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/burgess.hipmod.ps.Z *****do not forward to other groups***** Dear connectionists, the following preprint has been put on neuroprose, contact n.burgess at ucl.ac.uk with any retrieval problems, --Neil `A model of hippocampal function' Neil Burgess, Michael Recce and John O'Keefe Dept. of Anatomy, University College, London WC1E 6BT, U.K. The firing rate maps of hippocampal place cells recorded in a freely moving rat are viewed as a set of approximate radial basis functions over the (2-D) environment of the rat. It is proposed that these firing fields are constructed during exploration from `sensory inputs' (tuning curve responses to the distance of cues from the rat) and used by cells downstream to construct firing rate maps that approximate any desired surface over the environment. It is shown that, when a rat moves freely in an open field, the phase of firing of a place cell (with respect to the EEG $\theta$ rhythm) contains information as to the relative position of its firing field from the rat. A model of hippocampal function is presented in which the firing rate maps of cells downstream of the hippocampus provide a `population vector' encoding the instantaneous direction of the rat from a previously encountered reward site, enabling navigation to it. A neuronal simulation, involving reinforcement only at the goal location, provides good agreement with single cell recording from the hippocampal region, and can navigate to reward sites in open fields using sensory input from environmental cues. The system requires only brief exploration, performs latent learning, and can return to a goal location after encountering it only once. Neural Networks, to be published. 26 pages, 2Mbytes uncompressed. From eric at research.nj.nec.com Fri Feb 11 11:11:29 1994 From: eric at research.nj.nec.com (Eric B. 
Baum) Date: Fri, 11 Feb 94 11:11:29 EST Subject: No subject Message-ID: <9402111611.AA00562@yin> Fifth Annual NEC Research Symposium NATURAL AND ARTIFICIAL PARALLEL COMPUTATION PRINCETON, NJ MAY 4 - 5, 1994 NEC Research Institute is pleased to announce that the Fifth Annual NEC Research Symposium will be held at the Hyatt Regency Hotel in Princeton, New Jersey on May 4 and 5, 1994. The title of this year's symposium is Natural and Artificial Parallel Computation. The conference will feature ten invited talks. The speakers are: - Larry Abbott, Brandeis University, "Activity- Dependent Modulation of Intrinsic Neuronal Properties" - Catherine Carr, University of Maryland, "Time Coding in the Central Nervous System" - Bill Dally, MIT, "Bandwidth, Granularity, and Mechanisms: Key Issues in the Design of Parallel Computers" - Amiram Grinvald, Weitzmann Institute, "Architecture and Dynamics of Cell Assemblies in the Visual Cortex; New Perspectives From Fast and Slow Optical Imaging" - Akihiko Konagaya, NEC C&C Research Labs, "Knowledge Discovery in Genetic Sequences" - Chris Langton, Santa Fe Institute, "SWARM: An Agent Based Simulation System for Research in Complex Systems" - Thomas Ray, University of Delaware and ATR, "Evolution and Ecology of Digital Organisms" - Shuichi Sakai, Real World Computing Partnership, "RWC Massively Parallel ComputerProject" - Shigeru Tanaka, NEC Fundamental Research Labs, "A Mathematical Theory for the Experience- Dependent Development of Visual Cortex" - Leslie Valiant, Harvard University and NECI, "A Computational Model for Cognition" There will be no contributed papers. Registration is free of charge, but space is limited. Registrations will be accepted on a first come, first served basis. YOU MUST PREREGISTER. There will be no on-site registration. To preregister by e-mail, send a request to: symposium at research.nj.nec.com. Registrants will receive an acknowledgment, space allowing. A request for preregistration is also possible by regular mail to Mrs. Irene Parker, NEC Research Institute, 4 Independence Way, Princeton, NJ 08540. Registrants will also be invited to an Open House/Poster Session and Reception at NEC Research Institute on Tuesday, May 3. The Open House will begin at 3:30 PM and the Reception will begin at 5:30 PM. In order to estimate headcount, please indicate in your preregistration request whether you plan to attend the Open House on May 3. Registrants are expected to make their own arrangements for accommodations. Provided below is a list of hotels in the area together with daily room rates. Please ask for the NEC Corporate Rate when reserving a room. Sessions will start at 8:15 AM Wednesday, May 4 and will be scheduled to finish at approximately 3:30 PM on Thursday, May 5. 
Red Roof Inn, South Brunswick (908)821-8800 $37.99 Novotel Hotel, Princeton (609)520-1200 $68.00 ($74.00/w breakfast) Palmer Inn, Princeton (609)452-2500 $73.00 Marriott Residence Inn, Princeton (908)329-9600 $85.00 w/continental breakfast Summerfield Suites, Princeton (609)951-0009 $92.00 Hyatt Regency, Princeton (609)987-1234 $105.00 Marriott Hotel, Princeton (609)452-7900 $125.00 - - - - - - - - - - - - - - - - - - - - - - - - - - PLEASE RESPOND BY E-MAIL TO: symposium at research.nj.nec.com I would like to attend: _____ Open House _____ Symposium Name: ____________________________ Organization: ____________________________ E-mail address: ____________________________ Phone number: ____________________________ From bishopc at helios.aston.ac.uk Fri Feb 11 09:59:33 1994 From: bishopc at helios.aston.ac.uk (bishopc) Date: Fri, 11 Feb 94 14:59:33 GMT Subject: Postdoctoral Fellowships Message-ID: <27570.9402111459@sun.aston.ac.uk> ------------------------------------------------------------------- Aston University Neural Computing Research Group TWO POSTDOCTORAL RESEARCH FELLOWSHIPS: -------------------------------------- FUNDAMENTAL RESEARCH IN NEURAL NETWORKS Two postdoctoral fellowships, each with a duration of 3 years, will be funded by the U.K. Science and Engineering Research Council, and are to commence on or after 1 April 1994. These posts are part of a major project to be undertaken within the Neural Computing Research Group at Aston, and will involve close collaboration with Professors Chris Bishop and David Lowe, with additional input from Professor David Bounds. This interdisciplinary program requires researchers capable of extending theoretical concepts, and developing algorithmic and proof-of-principle demonstrations through software simulation. The two Research Fellows will work on distinct, though closely related, areas as follows: 1. Generalization in Neural Networks The usual approach to complexity optimisation and model order selection in neural networks makes use of computationally intensive cross-validation techniques. This project will build on recent developments in the use of Bayesian methods and the description length formalism to develop systematic techniques for model optimization in feedforward neural networks from a principled statistical perspective. In its later stages, the project will demonstrate the practical utility of the techniques which emerge, in the context of a wide range of real-world applications. 2. Dynamic Neural Networks Current embodiments of neural networks, when applied to `dynamic' events such as time series forecasting, are successful only if the underlying `generator' of the data is stationary. If the underlying generator is slowly varying in time then we do not have a principled basis for designing effective neural network structures, though ad hoc procedures do exist. This program will address some of the key issues in this area using techniques from statistical pattern processing and dynamical systems theory. In addition, application studies will be conducted which will focus on time series problems and tracking in non-stationary noise. If you wish to be considered for these positions, please send a CV and publications list, together with the names of 3 referees, to: Professor Chris M Bishop Neural Computing Research Group Aston University Birmingham B4 7ET, U.K. Tel: 021 359 3611 ext. 
4270 Fax: 021 333 6215 e-mail: c.m.bishop at aston.ac.uk From ahmad at interval.com Fri Feb 11 12:04:37 1994 From: ahmad at interval.com (ahmad@interval.com) Date: Fri, 11 Feb 94 09:04:37 -0800 Subject: Computing visual feature correspondences Message-ID: <9402111704.AA28021@iris10.interval.com> The following paper is available for anonymous ftp on archive.cis.ohio-state.edu (128.146.8.52), in directory pub/neuroprose, as file "ahmad.correspondence.ps.Z": Feature Densities are Required for Computing Feature Correspondences Subutai Ahmad Interval Research Corporation 1801-C Page Mill Road, Palo Alto, CA 94304 E-mail: ahmad at interval.com Abstract The feature correspondence problem is a classic hurdle in visual object-recognition concerned with determining the correct mapping between the features measured from the image and the features expected by the model. In this paper we show that determining good correspondences requires information about the joint probability density over the image features. We propose "likelihood based correspondence matching" as a general principle for selecting optimal correspondences. The approach is applicable to non-rigid models, allows nonlinear perspective transformations, and can optimally deal with occlusions and missing features. Experiments with rigid and non-rigid 3D hand gesture recognition support the theory. The likelihood based techniques show almost no decrease in classification performance when compared to performance with perfect correspondence knowledge. To appear in: Cowan, J.D., Tesauro, G., and Alspector, J. (Eds.), Advances in Neural Information Processing Systems 6. San Francisco CA: Morgan Kaufmann, 1994. From ahmad at interval.com Fri Feb 11 13:03:31 1994 From: ahmad at interval.com (ahmad@interval.com) Date: Fri, 11 Feb 94 10:03:31 -0800 Subject: Training NN's with missing or noisy data Message-ID: <9402111803.AA28794@iris10.interval.com> The following paper is available for anonymous ftp on archive.cis.ohio-state.edu (128.146.8.52), in directory pub/neuroprose, as file "tresp.deficient.ps.Z". (The companion paper, "Some Solutions to the Missing Feature Problem in Vision" is available as "ahmad.missing.ps.Z") Training Neural Networks with Deficient Data Volker Tresp Subutai Ahmad Siemens AG Interval Research Corporation Central Research 1801-C Page Mill Rd. 81730 Muenchen, Germany Palo Alto, CA 94304 tresp at zfe.siemens.de ahmad at interval.com Ralph Neuneier Siemens AG Central Research Otto-Hahn-Ring 6 81730 Muenchen, Germany ralph at zfe.siemens.de Abstract: We analyze how data with uncertain or missing input features can be incorporated into the training of a neural network. The general solution requires a weighted integration over the unknown or uncertain input although computationally cheaper closed-form solutions can be found for certain Gaussian Basis Function (GBF) networks. We also discuss cases in which heuristical solutions such as substituting the mean of an unknown input can be harmful. The paper will appear in: Cowan, J.D., Tesauro, G., and Alspector, J. (Eds.), Advances in Neural Information Processing Systems 6. San Francisco CA: Morgan Kaufmann, 1994. Subutai Ahmad Interval Research Corporation Phone: 415-354-3639 1801-C Page Mill Rd. 
Fax: 415-354-0872 Palo Alto, CA 94304 E-mail: ahmad at interval.com From mel at klab.caltech.edu Fri Feb 11 15:05:47 1994 From: mel at klab.caltech.edu (Bartlett Mel) Date: Fri, 11 Feb 94 12:05:47 PST Subject: NIPS*94 Call for Papers Message-ID: <9402112005.AA10791@plato.klab.caltech.edu> ********* PLEASE NOTE NEW SUBMISSIONS FORMAT FOR 1994 ********* CALL FOR PAPERS Neural Information Processing Systems -Natural and Synthetic- Monday, November 28 - Saturday, December 3, 1994 Denver, Colorado This is the eighth meeting of an interdisciplinary conference which brings together neuroscientists, engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in all aspects of neural processing and computation. The conference will include invited talks, and oral and poster presentations of refereed papers. There will be no parallel sessions. There will also be one day of tutorial presentations (Nov 28) preceding the regular session, and two days of focused workshops will follow at a nearby ski area (Dec 2-3). Major categories for paper submission, and examples of keywords within categories, are the following: Neuroscience: systems physiology, cellular physiology, signal and noise analysis, oscillations, synchronization, inhibition, neuromodulation, synaptic plasticity, computational models. Theory: computational learning theory, complexity theory, dynamical systems, statistical mechanics, probability and statistics, approximation theory. Implementations: VLSI, optical, parallel processors, software simulators, implementation languages. Algorithms and Architectures: learning algorithms, constructive/pruning algorithms, localized basis functions, decision trees, recurrent networks, genetic algorithms, combinatorial optimization, performance comparisons. Visual Processing: image recognition, coding and classification, stereopsis, motion detection, visual psychophysics. Speech, Handwriting and Signal Processing: speech recognition, coding and synthesis, handwriting recognition, adaptive equalization, nonlinear noise removal. Applications: time-series prediction, medical diagnosis, financial analysis, DNA/protein sequence analysis, music processing, expert systems. Cognitive Science & AI: natural language, human learning and memory, perception and psychophysics, symbolic reasoning. Control, Navigation, and Planning: robotic motor control, process control, navigation, path planning, exploration, dynamic programming. Review Criteria: All submitted papers will be thoroughly refereed on the basis of technical quality, novelty, significance and clarity. Submissions should contain new results that have not been published previously. Authors are encouraged to submit their most recent work, as there will be an opportunity after the meeting to revise accepted manuscripts before submitting final camera-ready copy. ********** PLEASE NOTE NEW SUBMISSIONS FORMAT FOR 1994 ********** Paper Format: Submitted papers may be up to eight pages in length. The page limit will be strictly enforced, and any submission exceeding eight pages will not be considered. Authors are encouraged (but not required) to use the NIPS style files obtainable by anonymous FTP at the sites given below. Papers must include physical and e-mail addresses of all authors, and must indicate one of the nine major categories listed above, keyword information if appropriate, and preference for oral or poster presentation. Unless otherwise indicated, correspondence will be sent to the first author. 
Submission Instructions: Send six copies of submitted papers to the address given below; electronic or FAX submission is not acceptable. Include one additional copy of the abstract only, to be used for preparation of the abstracts booklet distributed at the meeting. Submissions mailed first-class within the US or Canada must be postmarked by May 21, 1994. Submissions from other places must be received by this date. Mail submissions to: David Touretzky NIPS*94 Program Chair Computer Science Department Carnegie Mellon University 5000 Forbes Avenue Pittsburgh PA 15213-3890 USA Mail general inquiries/requests for registration material to: NIPS*94 Conference NIPS Foundation PO Box 60035 Pasadena, CA 91116-6035 USA (e-mail: nips94 at caltech.edu) FTP sites for LaTex style files "nips.tex" and "nips.sty": helper.systems.caltech.edu (131.215.68.12) in /pub/nips b.gp.cs.cmu.edu (128.2.242.8) in /usr/dst/public/nips NIPS*94 Organizing Committee: General Chair, Gerry Tesauro, IBM; Program Chair, David Touretzky, CMU; Publications Chair, Joshua Alspector, Bellcore; Publicity Chair, Bartlett Mel, Caltech; Workshops Chair, Todd Leen, OGI; Treasurer, Rodney Goodman, Caltech; Local Arrangements, Lori Pratt, Colorado School of Mines; Tutorials Chairs, Steve Hanson, Siemens and Gerry Tesauro, IBM; Contracts, Steve Hanson, Siemens and Scott Kirkpatrick, IBM; Government & Corporate Liaison, John Moody, OGI; Overseas Liaisons: Marwan Jabri, Sydney Univ., Mitsuo Kawato, ATR, Alan Murray, Univ. of Edinburgh, Joachim Buhmann, Univ. of Bonn, Andreas Meier, Simon Bolivar Univ. DEADLINE FOR SUBMISSIONS IS MAY 21, 1994 (POSTMARKED) -please post- From yamauchi at alpha.ces.cwru.edu Fri Feb 11 17:24:43 1994 From: yamauchi at alpha.ces.cwru.edu (Brian Yamauchi) Date: Fri, 11 Feb 94 17:24:43 -0500 Subject: Preprints Available Message-ID: <9402112224.AA03791@yuggoth.CES.CWRU.Edu> The following papers are available via anonymous ftp from yuggoth.ces.cwru.edu: ---------------------------------------------------------------------- Sequential Behavior and Learning in Evolved Dynamical Neural Networks Brian Yamauchi(1) and Randall Beer(1,2) Department of Computer Engineering and Science(1) Department of Biology(2) Case Western Reserve University Cleveland, OH 44106 Case Western Reserve University Technical Report CES-93-25 This paper will be appearing in Adaptive Behavior. Abstract This paper explores the use of a real-valued modular genetic algorithm to evolve continuous-time recurrent neural networks capable of sequential behavior and learning. We evolve networks that can generate a fixed sequence of outputs in response to an external trigger occurring at varying intervals of time. We also evolve networks that can learn to generate one of a set of possible sequences based upon reinforcement from the environment. Finally, we utilize concepts from dynamical systems theory to understand the operation of some of these evolved networks. A novel feature of our approach is that we assume neither an a priori discretization of states or time nor an a priori learning algorithm that explicitly modifies network parameters during learning. Rather, we merely expose dynamical neural networks to tasks that require sequential behavior and learning and allow the genetic algorithm to evolve network dynamics capable of accomplishing these tasks. 
Files: /pub/agents/yamauchi/seqlearn.ps.Z Article Text (73K) /pub/agents/yamauchi/seqlearn-fig.ps.Z Figures (654K) ---------------------------------------------------------------------- Integrating Reactive, Sequential, and Learning Behavior Using Dynamical Neural Networks Brian Yamauchi(1,3) and Randall Beer(1,2) Department of Computer Engineering and Science(1) Department of Biology(2) Case Western Reserve University Cleveland, OH 44106 Navy Center for Applied Research in Artificial Intelligence(3) Naval Research Laboratory Washington, DC 20375-5000 This paper has been submitted to the Third International Conference on Simulation of Adaptive Behavior. Abstract This paper explores the use of dynamical neural networks to control autonomous agents in tasks requiring reactive, sequential, and learning behavior. We use a genetic algorithm to evolve networks that can solve these tasks. These networks provide a mechanism for integrating these different types of behavior in a smooth, continuous manner. We applied this approach to three different task domains: landmark recognition using sonar on a real mobile robot, one-dimensional navigation using a simulated agent, and reinforcement-based sequence learning. For the landmark recognition task, we evolved networks capable of differentiating between two different landmarks based on the spatiotemporal information in a sequence of sonar readings obtained as the robot circled the landmark. For the navigation task, we evolved networks capable of associating the location of a landmark with a corresponding goal location and directing the agent to that goal. For the sequence learning task, we evolved networks that can learn to generate one of a set of possible sequences based upon reinforcement from the environment. A novel feature of the learning aspects of our approach is that we assume neither an a priori discretization of states or time nor an a priori learning algorithm that explicitly modifies network parameters during learning. Instead, we expose dynamical neural networks to tasks that require learning and allow the genetic algorithm to evolve network dynamics capable of accomplishing these tasks. Files: /pub/agents/yamauchi/integ.ps.Z Complete Article (233K) If your printer has problems printing the complete document as a single file, try printing the following two files: /pub/agents/yamauchi/integ-part1.ps.Z Pages 1-8 (77K) /pub/agents/yamauchi/integ-part2.ps.Z Pages 9-11 (147K) ---------------------------------------------------------------------- On the Dynamics of a Continuous Hopfield Neuron with Self-Connection Randall Beer Department of Computer Engineering and Science Department of Biology Case Western Reserve University Cleveland, OH 44106 Case Western Reserve University Technical Report CES-94-1 This paper has been submitted to Neural Computation. Continuous-time recurrent neural networks are being applied to a wide variety of problems. As a first step toward a comprehensive understanding of the dynamics of such networks, this paper studies the dynamical behavior of their basic building block: a continuous Hopfield neuron with self-connection. Specifically, we characterize the equilibria of this model neuron and the dependence of those equilibria on the parameters. We also describe the bifurcations of this model and derive very accurate approximate expressions for its bifurcation set. Finally, we indicate how the basic theory developed in this paper generalizes to a larger class of related model neurons. 
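For readers who want a concrete feel for the building block analyzed in the last abstract, here is a minimal simulation sketch of a continuous neuron with a self-connection. It assumes the standard leaky-integrator form tau*dy/dt = -y + w*sigma(y + theta) + I with a logistic sigma; the parameter values and the function name simulate() are illustrative assumptions, not taken from Beer's report.

# Minimal Euler-integration sketch of a single continuous neuron with a
# self-connection; parameter names and values are illustrative only.
import math

def simulate(w=6.0, theta=-3.0, I=0.0, tau=1.0, y0=0.1, dt=0.01, steps=2000):
    """Integrate tau * dy/dt = -y + w * sigma(y + theta) + I with Euler steps."""
    sigma = lambda x: 1.0 / (1.0 + math.exp(-x))   # logistic activation
    y = y0
    for _ in range(steps):
        dydt = (-y + w * sigma(y + theta) + I) / tau
        y += dt * dydt
    return y                                       # approximate equilibrium

if __name__ == "__main__":
    # With strong self-excitation the neuron is bistable: different initial
    # states settle into different stable equilibria.
    print(simulate(y0=0.1), simulate(y0=5.0))

With these made-up parameters the neuron settles to either a low or a high equilibrium depending on its initial state, which is the kind of parameter-dependent equilibrium structure the report characterizes.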
File: /pub/agents/beer/CTRNNDynamics1.ps.Z Complete Article (233K) ---------------------------------------------------------------------- FTP instructions: To retrieve and print a file (for example: seqlearn.ps), use the following commands: unix> ftp yuggoth.ces.cwru.edu Name: anonymous Password: (your email address) ftp> binary ftp> cd /pub/agents/yamauchi (or cd /pub/agents/beer for CTRNNDynamics1.ps.Z) ftp> get seqlearn.ps.Z ftp> quit unix> uncompress seqlearn.ps.Z unix> lpr seqlearn.ps (ls doesn't currently work properly on our ftp server. This will be fixed soon, but in the meantime, these files can still be copied, even though they don't appear in the directory listing.) _______________________________________________________________________________ Brian Yamauchi Case Western Reserve University yamauchi at alpha.ces.cwru.edu Department of Computer Engineering and Science _______________________________________________________________________________ From isabelle at neural.att.com Fri Feb 11 20:51:16 1994 From: isabelle at neural.att.com (Isabelle Guyon) Date: Fri, 11 Feb 94 20:51:16 EST Subject: robust statistics Message-ID: <9402120151.AA21483@neural> I would like to bring more arguments to Terry's remarks: > One man's outlier is another man's data point. If the data is perfectly clean, outliers are very valuable patterns. From mmoller at daimi.aau.dk Mon Feb 14 02:15:18 1994 From: mmoller at daimi.aau.dk (Martin Fodslette Møller) Date: Mon, 14 Feb 1994 08:15:18 +0100 Subject: Thesis available. Message-ID: <199402140715.AA18638@titan.daimi.aau.dk> /******************* PLEASE DO NOT FORWARD ***********************/ I finally finished up my thesis: Efficient Training of Feed-Forward Neural Networks The thesis has the following contents: Chapter 1. Resume in Danish (should anyone need that (-:) Chapter 2. Notation and basic definitions. Chapter 3. Training Methods: An Overview Chapter 4. Calculation of Hessian Information Chapter 5. Different Error Functions. Appendix A. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning. Appendix B. Supervised Learning on Large Redundant Training Sets. Appendix C. Exact Calculation of the Product of the Hessian Matrix and a Vector in O(N) time. Appendix D. Adaptive Preconditioning of the Hessian Matrix. Appendix E. Improving Network Solutions. The appendices concern my own work (original contributions), while the chapters provide an overview. The thesis is now available in a limited number of hard copies. People interested in a copy should send an email with their address to me. Best Regards -martin ---------------------------------------------------------------- Martin Moller email: mmoller at daimi.aau.dk Computer Science Dept. Fax: +45 8942 3255 Aarhus University Phone: +45 8942 3371 Ny Munkegade, Build. 540, DK-8000 Aarhus C, Denmark ---------------------------------------------------------------- From edelman at wisdom.weizmann.ac.il Mon Feb 14 02:39:27 1994 From: edelman at wisdom.weizmann.ac.il (Edelman Shimon) Date: Mon, 14 Feb 1994 09:39:27 +0200 Subject: TR available: Representation of similarity in 3D ... Message-ID: <199402140739.JAA00503@eris.wisdom.weizmann.ac.il> FTP-host: eris.wisdom.weizmann.ac.il FTP-filename: /pub/tr-94-02.ps.Z URL: http://eris.wisdom.weizmann.ac.il/ Uncompressed size: 2.6 Mb. Preliminary version; comments welcome. Representation of similarity in 3D object discrimination Shimon Edelman \begin{abstract} How does the brain represent visual objects?
In simple perceptual generalization tasks, the human visual system performs as if it represents the stimuli in a low-dimensional metric psychological space \cite{Shepard87}. In theories of 3D shape recognition, the role of feature-space representations (as opposed to structural \cite{Biederman87} or pictorial \cite{Ullman89} descriptions) has been for a long time a major point of contention. If shapes are indeed represented as points in a feature space, patterns of perceived similarity among different objects must reflect the structure of this space. The feature space hypothesis can then be tested by presenting subjects with complex parameterized 3D shapes, and by relating the similarities among subjective representations, as revealed in the response data by multidimensional scaling \cite{Shepard80}, to the objective parameterization of the stimuli. The results of four such tests, reported below, support the notion that discrimination among 3D objects may rely on a low-dimensional feature space representation, and suggest that this space may be spanned by explicitly encoded class prototypes. \end{abstract} From grumbach at inf.enst.fr Mon Feb 14 03:51:22 1994 From: grumbach at inf.enst.fr (grumbach@inf.enst.fr) Date: Mon, 14 Feb 94 09:51:22 +0100 Subject: papers on time and neural networks Message-ID: <9402140851.AA10372@enst.enst.fr> As guest editors of a special issue of the Sigart Bulletin about: Time and Neural Networks we are looking for 4 articles of about 10 pages each. Sigart is a quarterly publication of the Association for Computing Machinery (ACM) special interest group on Artificial Intelligence. The paper may either deal with approaches to time processing using traditional connectionist architectures, or with more specific models that integrate time into their foundations. If you are interested, and if you can submit a paper (not already published) within a short time frame (about a month and a half), please send a draft (if possible a Word file): - preferably by giving ftp access to it (information via e-mail) - or sending it as "attached file" on e-mail - or posting a paper copy of it. Drafts should be received before April 1. Notification of acceptance will be sent before April 20. grumbach at enst.fr or chaps at enst.fr Alain Grumbach and Cedric Chappelier ENST dept INF 46 rue Barrault 75634 Paris Cedex 13 France From P.Refenes at cs.ucl.ac.uk Mon Feb 14 09:13:12 1994 From: P.Refenes at cs.ucl.ac.uk (P.Refenes@cs.ucl.ac.uk) Date: Mon, 14 Feb 94 14:13:12 +0000 Subject: robust statistics In-Reply-To: Your message of "Thu, 10 Feb 94 09:45:15 PST." <9402101745.AA28545@salk.edu> Message-ID: The term outliers does not mean that they are not part of the joint data probability distribution or that they contain no information for estimating the regression surface; it means rather that outliers are too small a fraction of the observations to be allowed to dominate the small-sample behaviour of the statistics to be calculated. With parametric regression modelling techniques it is easy to quantify this by simply computing the effect that each data point has on the regression surface. This is not a trivial problem in non-parametric modelling but the statistics literature is full of methods to deal with it. Paul Refenes From rsun at cs.ua.edu Mon Feb 14 12:22:20 1994 From: rsun at cs.ua.edu (Ron Sun) Date: Mon, 14 Feb 1994 11:22:20 -0600 Subject: No subject Message-ID: <9402141722.AA28238@athos.cs.ua.edu> A monograph on connectionist models is available from John Wiley and Sons, Inc.
Title: Integrating Rules and Connectionism for Robust Commonsense Reasoning ISBN 0-471-59324-9 Author: Ron Sun Assistant Professor Department of Computer Science The University of Alabama Tuscaloosa, AL 35487 contact John Wiley and Sons, Inc. at 1-800-call-wiley Or John Wiley and Sons, Inc. 605 Third Ave. New York, NY 10158-0012 USA (212) 850-6589 FAX: (212) 850-6088 ------------------------------------------------------------------ A brief description is as follows: One of the outstanding problems for artificial intelligence is the problem of better modeling commonsense reasoning and alleviating the brittleness of traditional symbolic rule-based models. This work tackles this problem by trying to combine rules with connectionist models in an integrated framework. This idea leads to the development of a connectionist architecture with dual representation combining symbolic and subsymbolic (feature-based) processing for evidential robust reasoning: {\sc CONSYDERR}. Reasoning data are analyzed based on the notions of {\it rules} and {\it similarity} and modeled by the architecture, which carries out rule application and similarity matching through interaction of the two levels; formal analyses are performed to understand rule encoding in connectionist models, in order to prove that it handles a superset of Horn clause logic and a nonmonotonic logic; the notion of causality is explored for the purpose of clarifying how the proposed architecture can better capture commonsense reasoning, and it is shown that causal knowledge can be well represented by {\sc CONSYDERR} and utilized in reasoning, which further justifies the design of the architecture; the variable binding problem is addressed, and a solution is proposed within this architecture and is shown to surpass existing ones; several aspects of the architecture are discussed to demonstrate how connectionist models can supplement, enhance, and integrate symbolic rule-based reasoning; large-scale application-oriented systems are prototyped. This architecture utilizes the synergy resulting from the interaction of the two different types of representation and processing, and is therefore capable of handling a large number of difficult issues in one integrated framework, such as partial and inexact information, cumulative evidential combination, lack of exact match, similarity-based inference, inheritance, and representational interactions, all of which are proven to be crucial elements of commonsense reasoning. The results show that connectionism coupled with symbolic processing capabilities can provide effective and efficient models of reasoning for both theoretical and practical purposes. Table of Contents 1 Introduction 1.1 Overview 1.2 Commonsense Reasoning 1.3 The Problem of Common Reasoning Patterns 1.4 What is the Point?
1.5 Some Clarifications 1.6 The Organization of the Book 1.7 Summary 2 Accounting for Commonsense Reasoning: A Framework with Rules and Similarities 2.1 Overview 2.2 Examples of Reasoning 2.3 Patterns of Reasoning 2.4 Brittleness of Rule-Based Reasoning 2.5 Towards a Solution 2.6 Some Reflections on Rules and Connectionism 2.7 Summary 3 A Connectionist Architecture for Commonsense Reasoning 3.1 Overview 3.2 A Generic Architecture 3.3 Fine-Tuning --- from Constraints to Specifications 3.4 Summary 3.5 Appendix 4 Evaluations and Experiments 4.1 Overview 4.2 Accounting for the Reasoning Examples 4.3 Evaluations of the Architecture 4.4 Systematic Experiments 4.5 Choice, Focus and Context 4.6 Reasoning with Geographical Knowledge 4.7 Applications to Other Domains 4.8 Summary 4.9 Appendix: Determining Similarities and CD representations 5 More on the Architecture: Logic and Causality 5.1 Overview 5.2 Causality in General 5.3 Shoham's Causal Theory 5.4 Defining FEL 5.5 Accounting for Commonsense Causal Reasoning 5.6 Determining Weights 5.7 Summary 5.8 Appendix: Proofs For Theorems 6 More on the Architecture: Beyond Logic 6.1 Overview 6.2 Further Analysis of Inheritance 6.3 Analysis of Interaction in Representation 6.4 Knowledge Acquisition, Learning, and Adaptation 6.5 Summary 7 An Extension: Variables and Bindings 7.1 Overview 7.2 The Variable Binding Problem 7.3 First-Order FEL 7.4 Representing Variables 7.5 A Formal Treatment 7.6 Dealing with Difficult Issues 7.7 Compilation 7.8 Correctness 7.9 Summary 7.10 Appendix 8 Reviews and Comparisons 8.1 Overview 8.2 Rule-Based Reasoning 8.3 Case-Based Reasoning 8.4 Connectionism 8.5 Summary 9 Conclusions 9.1 Overview 9.2 Some Accomplishments 9.3 Lessons Learned 9.4 Existing Limitations 9.5 Future Directions 9.6 Summary References From trevor at white.Stanford.EDU Mon Feb 14 17:37:50 1994 From: trevor at white.Stanford.EDU (Trevor Darrell) Date: Mon, 14 Feb 94 14:37:50 PST Subject: outlier, robust statistics In-Reply-To: Terry Sejnowski's message of Thu, 10 Feb 94 09:45:15 PST <9402101745.AA28545@salk.edu> Message-ID: <9402142237.AA24561@white.Stanford.EDU> [terry at salk.edu] One man's outlier is another man's data point. Another way to handle outliers is not to remove them but to model them explicitly. Geoff Hinton has pointed out that character recognition can be made more robust by including models for background noise such as postmarks. Explicitly modeling an occluding or transparently combined "outlier" process is a powerful way to build a robust estimator. As mentioned in other replies to this post, estimators which use a mixture model (either implicitly or explicitly), such as the EM algorithm, are promising methods to implement this type of strategy. One issue which often complicates matters is how to decide how many objects or processes there are in the signal, e.g. determine K in the EM estimator. I would like to ask if anyone has a pointer to work on estimating K in the context of an EM estimator or similar methods? Often the appropriate cardinality of the model is not easily known a priori. Steve Nowlan and I recently used mixtures of expert networks to separate multiple interpenetrating flow fields -- the transparency problem for visual motion. The gating network was used to select regions of the visual field that contained reliable estimates of local velocity for which there was coherent global support. 
There is evidence for such selection neurons in area MT of primate visual cortex, a region of cortex that specializes in the detection of coherent motion. I'd also like to add a pointer to some related work Sandy Pentland, Eero Simoncelli and I have done in this domain developing a strategy for robust estimation ("outlier exclusion") based on minimum description length theory. Our method effectively implements a clustering method to find how many processes there are (e.g. estimate K), and then iteratively refine estimates of the parameters and "support" (segmentation) of those processes. We have developed versions of this method for range and motion segmentation, both for occluded and transparently combined processes. [pluto at cs.ucsd.edu:] >I look forward to reading (Liu 94). Can you (or anyone else) >point me to other references utilizing a similar definition >of "outlier?" (IMHO) "outlier" is quite a value-laden term >that I tend to avoid since I feel it has multiple and >often ambiguous interpretations/definitions. Here are some references to conference papers on our work. A longer journal paper that combines these is in the works, email me if you would like a preprint when it becomes available. Darrell, Sclaroff and Pentland, "Segmentation by Minimal Description", Proc. 3rd Intl. Conf. Computer Vision, Osaka, Japan, 1990 (also avail. as MIT Media Lab Percom TR-163.) Darrell and Pentland, "Robust Estimation of a Multi-Layer Motion Representation", Proc. IEEE Workshop on Visual Motion, Princeton, October 1991 Darrell and Pentland, "Against Edges: Function Approximation with Multiple Support Maps", NIPS 4, 1992 Darrell and Simoncelli, "Separation of Transparent Motion into Layers using Velocity-tuned Mechanisms", Assn. for Resarch in Vision and Opthm. (ARVO) 1993, also available as MIT Media Lab Percom TR-244. (Percom TR's can be anon. ftp'ed from whitechapel.media.mit.edu) --trevor From jagota at next1.msci.memst.edu Mon Feb 14 20:18:56 1994 From: jagota at next1.msci.memst.edu (Arun Jagota) Date: Mon, 14 Feb 1994 19:18:56 -0600 Subject: DIMACS Challenge neural net papers Message-ID: <199402150118.AA02676@next1> Dear Connectionists: Expanded versions of two neural net papers presented at the DIMACS Challenge on Cliques, Coloring, and Satisfiability are now available via anonymous ftp (see below). First an excerpt from the Challenge announcement back in 1993: ---------------------- The purpose of this Challenge is to encourage high quality empirical research on difficult problems. The problems chosen are known to be difficult to solve in theory. How difficult are they to solve in practice? ---------------------- ftp ftp.cs.buffalo.edu (or 128.205.32.9 subject-to-change) Name : anonymous > cd users/jagota > binary > get DIMACS_Grossman.ps.Z > get DIMACS_Jagota.ps.Z > quit > uncompress *.Z Sorry, no hard copies. Copies may be requested by electronic mail to me (jagota at next1.msci.memst.edu) for those without access to ftp or for whom ftp fails. Please use as last resort. Applying The INN Model to the MaxClique Problem Tal Grossman, email: tal at goshawk.lanl.gov Complex Systems Group, T-13, and Center for Non Linear Studies MS B213, Los Alamos National Laboratory Los Alamos, NM 87545 Los Alamos Tech Report: LA-UR-93-3082 A neural network model, the INN (Inverted Neurons Network), is applied to the Maximum Clique problem. First, I describe the INN model and how it implements a given graph instance. 
The model has a threshold parameter $t$, which determines the character of the network stable states. As shown in an earlier work (Grossman-Jagota), the stable states of the network correspond to the $t$-codegree sets of its underlying graph, and, in the case of $t<1$, to its maximal cliques. These results are briefly reviewed. In this work I concentrate on improving the deterministic dynamics called $t$-annealing. The main issue is the initialization procedure and the choice of parameters. Adaptive procedures for choosing the initial state of the network and setting the threshold are presented. The result is the ``Adaptive t-Annealing" algorithm (AtA). This algorithm is tested on many benchmark problems and found to be more efficient than steepest descent or the simple t-annealing procedure. Approximately Solving Maximum Clique using Neural Network and Related Heuristics * Arun Jagota Laura Sanchis Memphis State University Colgate University Ravikanth Ganesan State University of New York at Buffalo We explore neural network and related heuristic methods for the fast approximate solution of the Maximum Clique problem. One of these algorithms, {\em Mean Field Annealing}, is implemented on the Connection Machine CM-5 and a fast annealing schedule is experimentally evaluated on random graphs, as well as on several benchmark graphs. The other algorithms, which perform certain randomized local search operations, are evaluated on the same benchmark graphs, and on {\bf Sanchis} graphs. One of our algorithms adjusts its internal parameters as its computation evolves. On {\bf Sanchis} graphs, it finds significantly larger cliques than the other algorithms do. Another algorithm, GSD$(\emptyset)$, works best overall, but is slower than the others. All our algorithms obtain significantly larger cliques than other simpler heuristics but run slightly slower; they obtain significantly smaller cliques on average than exact algorithms or more sophisticated heuristics but run considerably faster. All our algorithms are simple and inherently parallel. * - 24 pages in length (twice as long as its previous version). Arun Jagota From terry at salk.edu Tue Feb 15 02:56:04 1994 From: terry at salk.edu (Terry Sejnowski) Date: Mon, 14 Feb 94 23:56:04 PST Subject: outlier, robust statistics Message-ID: <9402150756.AA17907@salk.edu> I have received many requests for a reference to the motion model I mentioned recently in the context of robust statistics. An early version can be found in: Nowlan, S. J. and Sejnowski, T. J., Filter selection model for generating visual motion signals, In: C. L. Giles, S. J. Hanson and J. D. Cowan (Eds.) Advances in Neural Information Processing Systems 5, San Mateo, CA: Morgan Kaufman Publishers, 369-376 (1993). Two longer papers on the computational theory and the biological consequences are in review. Darrell and Pentland have an interesting iterative approach in which multiple hypotheses compete to include motion samples within their regions of support. A relaxation scheme must decide on the number of objects and the correct velocity assignments. Our approach to motion estimation is simpler in that hypotheses do not correspond to objects, but to distinct velocities, and the number of hypotheses is always fixed. This allows the selection of regions of support to be performed non-iteratively. The architecture of the model is feedforward with soft-max within layers, so it is quite fast. Mixtures of experts was used to optimize the weights in the network. 
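As a rough illustration of the soft-max gating computation at the heart of this kind of selection scheme, the sketch below softly assigns image regions to a fixed set of velocity hypotheses. The scores, velocities, and array names are made up for illustration; this is not the actual filter-selection model, only the gating operation it rests on.

# Rough sketch of soft-max gating over a fixed set of velocity hypotheses.
# All numbers below are invented for illustration.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Five image regions each score how consistent their local evidence is with
# three fixed velocity hypotheses (higher = more consistent).
support = np.array([[2.0, 0.1, 0.0],
                    [1.8, 0.0, 0.3],
                    [0.2, 0.1, 2.2],                 # region carried by a second motion
                    [2.1, 0.2, 0.1],
                    [0.0, 0.3, 2.0]])

gate = softmax(support, axis=1)                      # per-region soft assignment
region_weight = gate.sum(axis=0)                     # total support per hypothesis

# Regions dominated by the second motion feed hypothesis 2 rather than
# corrupting hypothesis 0, so each motion keeps its own region of support.
print(np.round(gate, 2))
print(np.round(region_weight, 2))

In the full model the gating outputs are learned jointly with the velocity-tuned filters; here they are fixed numbers chosen only to show how the soft-max concentrates each region's vote on the hypothesis it supports.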
Terry ----- From schmidhu at informatik.tu-muenchen.de Tue Feb 15 04:06:19 1994 From: schmidhu at informatik.tu-muenchen.de (Juergen Schmidhuber) Date: Tue, 15 Feb 1994 10:06:19 +0100 Subject: postdoctoral thesis Message-ID: <94Feb15.100623met.42337@papa.informatik.tu-muenchen.de> ---------------- postdoctoral thesis ---------------- Juergen Schmidhuber Technische Universitaet Muenchen (submitted April 1993, accepted October 1993) ----------------------------------------------------- NETZWERKARCHITEKTUREN, ZIELFUNKTIONEN UND KETTENREGEL Es gibt relativ neuartige, auf R"uckkopplung basierende k"unstliche neuronale Netze (KNN), deren F"ahigkeiten betr"achtlich "uber simple Musterassoziation hinausge- hen. Diese KNN gestatten im Prinzip die Implementierung beliebiger auf einem herk"ommlichen sequentiell arbei- tenden Digitalrechner berechenbarer Funktionen. Im Ge- gensatz zu herk"ommlichen Rechnern l"a"st sich dabei jedoch die Qualit"at der Ausgaben (formal spezifiziert durch eine sinnvolle Zielfunktion) bez"uglich der ``Software'' (bei KNN die Gewichtsmatrix) mathematisch differenzieren, was die Anwendung der Kettenregel zur Herleitung gradientenbasierter Software"anderungsalgo- rithmen erm"oglicht. Die Arbeit verdeutlicht dies durch formale Herleitung einer Reihe neuartiger Lernalgorith- men aus folgenden Bereichen: (1) "uberwachtes Lernen sequentiellen Ein/Ausgabeverhaltens mit zyklischen und azyklischen Architekturen, (2) ``Reinforcement Lernen'' und Subzielgenerierung ohne informierten Lehrer, (3) un"uberwachtes Lernen zur Redundanzextraktion aus Ein- gaben und Eingabestr"omen. Zahlreiche Experimente zei- gen M"oglichkeiten und Schranken dieser Lernalgorithmen auf. Zum Abschluss wird ein ``selbstreferentielles'' neuronales Netzwerk pr"asentiert, welches theoretisch lernen kann, seinen eigenen Software"anderungsalgorith- mus zu "andern. ----------------------------------------------------- The postdoctoral thesis above is now available (in unrevised form) via ftp. To obtain a copy, follow the instructions at the end of this message. Here is additional information for those who are interested but don't understand German (or are unfamiliar with Germany's academic system): The postdoctoral thesis is part of a process called ``Habilitation'' which is seen as a qualification for tenure. The thesis is about learning algorithms derived by the chain rule. It addresses supervised sequence learning, variants of reinforcement learning, and unsupervised learning (for redundancy reduction). Unlike some previous papers of mine, it contains lots of experiments and lots of figures. Here is a very brief summary based on pointers to recent English publications upon which the thesis elaborates: Chapters 2 and 3 are on supervised sequence learning and extend publications [1] and [4]. Chapter 4 is on variants of learning with a ``distal teacher'' and extends publication [7] (robot experiments in chapter 4 were conducted by Eldracher and Baginski, see e.g. [9]). Chapters 5, 6 and 7 describe unsupervised learning algorithms based on detection of redundant information in input patterns and pattern sequences: Chapter 5 elaborates on publication [5], and chapter 6 extends publication [3]. Chapter 6 includes a result by Peter Dayan, Richard Zemel and A. Pouget (SALK Institute) who demonstrated that equation (4.3) in [3] with $\beta = 0, \alpha = = \gamma =1$ is essentially equivalent to equation (5.1). 
Chapter 6 also includes experiments conducted by Stefanie Lindstaedt who successfully applied the method in [3] to redundant images of letters presented according to the probabilities of English language, see [10]. Chapter 7 extends publications [2] and [8]. Experiments show how sequence processing neural nets using algorithms for redundancy reduction can learn to bridge time lags (between correlated events) of more than 1000 discrete time steps. Other experiments use neural nets for text compression and compare them to standard data compression algorithms. Finally, chapter 8 elaborates on publication [6]. -------------------------- References ------------------------------- [1] J. H. Schmidhuber. A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, 4(2):243--248, 1992. [2] J. H. Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234--242, 1992. [3] J. H. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863--879, 1992. [4] J. H. Schmidhuber. Learning to control fast-weight memories: An alternative to recurrent nets. Neural Computation, 4(1):131--139, 1992. [5] J. H. Schmidhuber and D. Prelinger. Discovering predictable classifications. Neural Computation, 5(4):625--635, 1993. [6] J. H. Schmidhuber. A self-referential weight matrix. In Proc. of the Int. Conf. on Artificial Neural Networks, Amsterdam, pages 446--451. Springer, 1993. [7] J. H. Schmidhuber and R. Wahnsiedler. Planning simple trajectories using neural subgoal generators. In J. A. Meyer, H. L. Roitblat, and S. W. Wilson, editors, Proc. of the 2nd Int. Conf. on Simulation of Adaptive Behavior, pages 196--202. MIT Press, 1992. [8] J. H. Schmidhuber, M. C. Mozer, and D. Prelinger. Continuous history compression. In H. Huening, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87--95. Augustinus, 1993. [9] M. Eldracher and B. Baginski. Neural subgoal generation using backpropagation. In George G. Lendaris, Stephen Grossberg and Bart Kosko, editors, Proc. of WCNN'93, Lawrence Erlbaum Associates, Inc., Hillsdale, pages = III-145--III-148, 1993. [10] S. Lindstaedt. Comparison of unsupervised neural networks for redundancy reduction. In M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman and A. S. Weigend, editors, Proc. of the 1993 Connectionist Models Summer School, pages 308-315. Hillsdale, NJ: Erlbaum Associates, 1993. ---------------------------------------------------------------------- The thesis comes in three parts. To obtain a copy, do: unix> ftp 131.159.8.35 Name: anonymous Password: (your email address, please) ftp> binary ftp> cd pub/fki ftp> get schmidhuber.habil.1.ps.Z ftp> get schmidhuber.habil.2.ps.Z ftp> get schmidhuber.habil.3.ps.Z ftp> bye unix> uncompress schmidhuber.habil.1.ps.Z unix> lpr schmidhuber.habil.1.ps . . . Note: The layout is designed for conventional European DINA4 format. Expect 145 pages. ---------------------------------------------------------------------- Dr. habil. J. H. 
Schmidhuber, Fakultaet fuer Informatik, Technische Universitaet Muenchen, 80290 Muenchen, Germany schmidhu at informatik.tu-muenchen.de --------- postdoctoral thesis (unrevised) ----------- NETZWERKARCHITEKTUREN, ZIELFUNKTIONEN UND KETTENREGEL Juergen Schmidhuber, TUM From Petri.Myllymaki at cs.Helsinki.FI Tue Feb 15 04:52:42 1994 From: Petri.Myllymaki at cs.Helsinki.FI (Petri Myllymaki) Date: Tue, 15 Feb 1994 11:52:42 +0200 Subject: Thesis in neuroprose Message-ID: <199402150952.LAA01783@keos.Helsinki.FI> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/Thesis/myllymaki.thesis.ps.Z The following report has been placed in the neuroprose archive. ----------------------------------------------------------------------- Bayesian Reasoning by Stochastic Neural Networks Petri Myllymaki Ph.Lic. Thesis Department of Computer Science, University of Helsinki Report C-1993-67, Helsinki, December 1993 78 pages This work has been motivated by problems in several research areas: expert system design, uncertain reasoning, optimization theory, and neural network research. From the expert system design point of view, our goal was to develop a generic expert system shell capable of handling uncertain data. The theoretical framework used here for handling uncertainty is probabilistic reasoning, in particular the theory of Bayesian belief network representations. The probabilistic reasoning task we are interested in is, given a Bayesian network representation of a probability distribution on a set of discrete random variables, to find a globally maximal probability state consistent with given initial constraints. To solve this NP-hard problem approximatively, we use an iterative stochastic method, Gibbs sampling. As this method can be quite inefficient when implemented on a conventional sequential computer, we show how to construct a Gibbs sampling process for a given Bayesian network on a massively parallel architecture, a harmony neural network, which is a special case of the Boltzmann machine architecture. To empirically test the method developed, we implemented a hybrid neural-symbolic expert system shell, NEULA. The symbolic part of the system consists of a high-level conceptual description language and a compiler, which can be used for constructing Bayesian networks and providing them with the corresponding parameters (conditional probabilities). As the number of parameters needed for a given network may generally be quite large, we restrict ourselves to Bayesian networks having a special hierarchical structure. The neural part of the system consists of a neural network simulator which performs massively parallel Gibbs sampling. The performance of the NEULA system was empirically tested by using a small artificial test example. Computing Reviews (1991) Categories and Subject Descriptors: G.3 [Probability and statistics]: Probabilistic algorithms F.1.1 [Models of computation]: Neural networks G.1.6 [Optimization]: Constrained optimization I.2.5 [Programming languages and software]: Expert system tools and techniques General Terms: Algorithms, Theory. 
Additional Key Words and Phrases: Monte Carlo algorithms, Gibbs sampling, simulated annealing, Bayesian belief networks, connectionism, massive parallelism ----------------------------------------------------------------------- To obtain a copy: ftp archive.cis.ohio-state.edu login: anonymous password: cd pub/neuroprose/Thesis binary get myllymaki.thesis.ps.Z quit Then at your system: uncompress myllymaki.thesis.ps.Z lpr myllymaki.thesis.ps ----------------------------------------------------------------------- Petri Myllymaki Petri.Myllymaki at cs.Helsinki.FI Department of Computer Science Int.+358 0 708 4212 (tel.) P.O.Box 26 (Teollisuuskatu 23) Int.+358 0 708 4441 (fax) FIN-00014 University of Helsinki, Finland ----------------------------------------------------------------------- From thrun at uran.cs.bonn.edu Tue Feb 15 08:25:02 1994 From: thrun at uran.cs.bonn.edu (Sebastian Thrun) Date: Tue, 15 Feb 1994 14:25:02 +0100 Subject: 2 papers on robot learning Message-ID: <199402151325.OAA17317@carbon.informatik.uni-bonn.de> This is to announce two recent papers in the connectionists' archive. Both papers deal with robot learning issues. The first paper describes two learning approaches (EBNN with reinforcement learning, COLUMBUS), and the second paper gives some empirical results for learning robot navigation using reinforcement learning and EBNN. Both approaches have been evaluated using real robot hardware. Enjoy reading! Sebastian ------------------------------------------------------------------------ LIFELONG ROBOT LEARNING Sebastian Thrun Tom Mitchell University of Bonn Carnegie Mellon University Learning provides a useful tool for the automatic design of autonomous robots. Recent research on learning robot control has predominantly focussed on learning single tasks that were studied in isolation. If robots encounter a multitude of control learning tasks over their entire lifetime, however, there is an opportunity to transfer knowledge between them. In order to do so, robots may learn the invariants of the individual tasks and environments. This task-independent knowledge can be employed to bias generalization when learning control, which reduces the need for real-world experimentation. We argue that knowledge transfer is essential if robots are to learn control with moderate learning times in complex scenarios. Two approaches to lifelong robot learning which both capture invariant knowledge about the robot and its environments are reviewed. Both approaches have been evaluated using a HERO-2000 mobile robot. Learning tasks included navigation in unknown indoor environments and a simple find-and-fetch task. (Technical Report IAI-TR-93-7, Univ. of Bonn, CS Dept.) ------------------------------------------------------------------------ AN APPROACH TO LEARNING ROBOT NAVIGATION Sebastian Thrun. Univ. of Bonn Designing robots that can learn by themselves to perform complex real-world tasks is still an open challenge for the fields of Robotics and Artificial Intelligence. In this paper we describe an approach to learning indoor robot navigation through trial-and-error. A mobile robot, equipped with visual, ultrasonic and infrared sensors, learns to navigate to a designated target object. In less than 10 minutes operation time, the robot is able to learn to navigate to a marked target object in an office environment. The underlying learning mechanism is the explanation-based neural network (EBNN) learning algorithm. 
EBNN initially learns functions from scratch using neural network representations. With increasing experience, EBNN employs domain knowledge to explain and to analyze training data in order to generalize in a knowledgeable way. (to appear in: Proceedings of the IEEE Conference on Intelligent Robots and Systems 1994) ------------------------------------------------------------------------ Postscript versions of both papers may be retrieved from Jordan Pollack's neuroprose archive by following the instructions below. unix> ftp archive.cis.ohio-state.edu ftp login name> anonymous ftp password> xxx at yyy.zzz ftp> cd pub/neuroprose ftp> bin ftp> get thrun.lifelong-learning.ps.Z ftp> get thrun.learning-robot-navg.ps.Z ftp> bye unix> uncompress thrun.lifelong-learning.ps.Z unix> uncompress thrun.learning-robot-navg.ps.Z unix> lpr thrun.lifelong-learning.ps unix> lpr thrun.learning-robot-navg.ps From chaps at inf.enst.fr Tue Feb 15 09:22:03 1994 From: chaps at inf.enst.fr (Cedric Chappelier) Date: Tue, 15 Feb 94 15:22:03 +0100 Subject: papers on time and neural networks (Correction) Message-ID: <9402151422.AA03059@ulysse.enst.fr.enst.fr> Yesterday we sent the following announcement. We want to make a small correction: the paper may be submitted either as a Word file (as mentioned in the first mail) OR AS A LATEX FILE. > > As guest editors of a special issue of the Sigart Bulletin about: > > Time and Neural Networks > > we are looking for 4 articles of about 10 pages each. > > Sigart is a quarterly publication of the Association for Computing > Machinery (ACM) special interest group on Artificial Intelligence. > > The paper may either deal with approaches to time processing using > traditional connectionist architectures, or with more specific models > that integrate time into their foundations. > > If you are interested, and if you can submit a paper (not already > published) within a short time frame (about a month and a half), please send a > draft (if possible a Word file) : ^^^^^^^^^^^^^^^^^^^^^^^ OR A LATEX FILE > - preferably by giving ftp access to it (information via e-mail) > - or sending it as "attached file" on e-mail > - or posting a paper copy of it. > > Drafts should be received before April 1. > Notification of acceptance will be sent before April 20. > > grumbach at enst.fr or chaps at enst.fr > > Alain Grumbach and Cedric Chappelier > ENST dept INF > 46 rue Barrault > 75634 Paris Cedex 13 > France > > Sorry for the oversight. --- E-mail: chaps at inf.enst.fr || Cedric.Chappelier at enst.fr P-mail: Telecom Paris 46, rue Barrault - 75634 Paris cedex 13 France From COTTRLL at FRMOP22.CNUSC.FR Tue Feb 15 18:42:00 1994 From: COTTRLL at FRMOP22.CNUSC.FR (COTTRELL) Date: Tue, 15 Feb 94 18:42 Subject: Available paper : Kohonen algorithm Message-ID: <"94-02-15-18:42:21.90*COTTRLL"@FRMOP22.CNUSC.FR> The following paper is available from anonymous ftp on archive.cis.ohio-state.edu (128.146.8.52) in directory pub/neuroprose as file cottrell.things.ps "Two or three things that we know about the Kohonen algorithm" 10 pages by Marie Cottrell, Jean-Claude Fort, Gilles Pages SAMOS, Universite Paris 1 90, rue de Tolbiac 75634 PARIS Cedex 13 FRANCE ABSTRACT Many theoretical papers are published about the Kohonen algorithm. It is not easy to understand what exactly is proved, because of the great variety of mathematical methods. Despite all these efforts, many problems remain without solution. In this small review paper, we intend to sum up the situation.
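For readers who have not seen the algorithm in question, the sketch below shows the basic on-line Kohonen update for a one-dimensional string of units with scalar inputs, the setting in which many of the theoretical results are stated. The unit count, learning rate, and neighborhood radius are arbitrary illustrative choices, not values taken from the paper.

# Minimal sketch of the on-line Kohonen (SOM) update, 1-D lattice, scalar inputs.
# Names, learning rate, and neighborhood radius are illustrative assumptions.
import random

def kohonen_1d(n_units=10, steps=5000, eps=0.1, radius=1):
    w = [random.random() for _ in range(n_units)]            # initial weights
    for _ in range(steps):
        x = random.random()                                  # input drawn from U[0,1]
        winner = min(range(n_units), key=lambda i: abs(w[i] - x))
        for i in range(n_units):
            if abs(i - winner) <= radius:                    # lattice neighborhood
                w[i] += eps * (x - w[i])                     # move toward the input
    return w

if __name__ == "__main__":
    # After training, the weights typically end up monotonically ordered along
    # the lattice, the self-organization property studied in this literature.
    print(kohonen_1d())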
To appear in the Proceedings of ESANN 94, Bruxelles To retrieve >ftp archive.cis.ohio-state.edu name : anonymous password: (use your e-mail address) ftp> cd pub/neuroprose ftp> get cottrell.things.ps ftp> quit From platt at synaptics.com Tue Feb 15 20:13:14 1994 From: platt at synaptics.com (John Platt) Date: Tue, 15 Feb 94 17:13:14 PST Subject: Neuroprose paper available Message-ID: <9402160113.AA18442@synaptx.synaptics.com> ****** PAPER AVAILABLE VIA NEUROPROSE *************************************** ****** AVAILABLE VIA FTP ONLY *********************************************** ****** PLEASE DO NOT FORWARD TO OTHER MAILING LISTS OR BOARDS. ************** FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/wolf.address-block.ps.Z The following paper has been placed in the Neuroprose archives at Ohio State. The file is wolf.address-block.ps.Z . Only the electronic version of this paper is available. This paper is 8 pages in length. NOTE: The uncompressed postscript file is approximately 2.7 megabytes in length, so it may take a while to print out. Also, you may have to tell the lpr program to use a symbolic link to copy into the spool directory (lpr -s under SunOS). ----------------------------------------------------------------------------- Postal Address Block Location Using A Convolutional Locator Network Ralph Wolf and John C. Platt Synaptics, Inc. 2698 Orchard Parkway San Jose, CA 95134 ABSTRACT: This paper describes the use of a convolutional neural network to perform address block location on machine-printed mail pieces. Locating the address block is a difficult object recognition problem because there is often a large amount of extraneous printing on a mail piece and because address blocks vary dramatically in size and shape. We used a convolutional locator network with four outputs, each trained to find a different corner of the address block. A simple set of rules was used to generate ABL candidates from the network output. The system performs very well: when allowed five guesses, the network will tightly bound the address delivery information in 98.2% of the cases. ----------------------------------------------------------------------------- John Platt platt at synaptics.com From terry at salk.edu Tue Feb 15 22:44:00 1994 From: terry at salk.edu (Terry Sejnowski) Date: Tue, 15 Feb 94 19:44:00 PST Subject: Telluride Workshops Message-ID: <9402160344.AA25170@salk.edu> CALL FOR PARTICIPATION IN TWO WORKSHOPS ON "NEUROMORPHIC ENGINEERING" JULY 3 - 9, 1994 AND JULY 10 - 16, 1994 TELLURIDE, COLORADO Christof Koch (Caltech) and Terry Sejnowski (Salk Institute/UCSD) invite applications for two different workshops that will be held in Telluride, Colorado in July 1994. Travel and housing expenses will be provided for ten to twenty active researchers for each workshop. Deadline for application is March 10, 1994. GOALS: Carver Mead has introduced the term "Neuromorphic Engineering" for a new field based on the design and fabrication of artificial neural systems, such as vision systems, head-eye systems, and roving robots, whose architecture and design principles are based on those of biological nervous systems. The goal of these workshops is to bring together young investigators and more established researchers from academia with their counterparts in industry and national laboratories, working on both neurobiological as well as engineering aspects of sensory systems and sensory-motor integration. 
The focus of the workshop will be on ``active" participation, with demonstration systems and hands-on-experience for all participants. Neuromorphic engineering has a wide range of applications from nonlinear adaptive control of complex systems to the design of smart sensors. Many of the fundamental principles in this field, such as the use of learning methods and the design of parallel hardware, are inspired by biological systems. However, existing applications are modest and the challenge of scaling up from small artificial neural networks and designing completely autonomous systems at the levels achieved by biological systems lies ahead. The assumption underlying these workshops is that the next generation of neuromorphic systems would benefit from closer attention to the principles found through experimental and theoretical studies of brain systems. WORKSHOPS: NEUROMORPHIC ANALOG VLSI SYSTEMS Sunday, July 3 to Saturday, July 9, 1994 Organized by Rodney Douglas (Oxford), Misha Mahowald (Oxford) and Stephen Lisberger (UCSF). The goal of this week is to bring together biologists and engineers who are interested in exploring neuromorphic systems through the medium of analog VLSI. The workshop will cover methods for the design and fabrication of multi-chip neuromorphic systems. This framework is suitable both for creating analogs of specific biological systems, which can serve as a modeling environment for biologists, and as a tool for engineers to create cooperative circuits based on biological principles. The workshop will provide the community with a common formal language for describing neuromorphic systems. Equipment will be present for participants to evaluate existing neuromorphic chips (including silicon retina, silicon neurons, oculomotor system). SYSTEMS LEVEL MODELS OF VISUAL BEHAVIOR Sunday, July 10 to Saturday, July 16, 1994 Organized by Dana Ballard (Rochester) and Richard Andersen (Caltech). The goal of this week is to bring together biologists and engineers who are interested in systems level modeling of visual behaviors and their interactions with the motor systems. Sessions will cover issues of sensory-motor integration in the mammalian brain. Special emphasis will be placed on understanding neural algorithms used by the brain which can provide insights into constructing electrical circuits which can accomplish similar tasks. Issues to be covered will include spatial localization and constancy, attention, motor planning, eye movements, and the use of visual motion information for motor control. Two or three prominent neuroscientists will be invited to give lectures on the above subjects. These researchers will also be asked to bring their own demonstrations, classroom experiments, and software for computer models. Demonstrations include recording eye movements and simple eye movement psychophysical experiments, neural network models for coordinate transformations and the representation of space, visual attention psychophysical experiments. Participants can conduct their own experiments using the Virtual Reality equipment. FORMAT: Time in both workshops will be divided between planned presentation, free interaction, and contributed material. Each day will consist of a lecture in the morning that covers the theory behind the hands-on investigation in the afternoon. Following each lecture, there will be a demonstration that introduces participants to the equipment that will be available in the afternoon session. 
Participants will be free to explore and play with whatever they choose in the afternoon. Participants are encouraged to bring their own material to share with others. After dinner, time for participants to provide an informal lecture/demonstration is reserved. LOCATION AND ARRANGEMENTS: The two workshops will take place at the "Telluride Summer Research Center," located in the small town of Telluride, 9000 feet high in Southwest Colorado, about 6 hours away from Denver (350 miles) and 4 hours from Aspen. Continental and United Airlines provide many daily flights directly into Telluride. Participants will be housed in shared condominiums, within walking distance of the Center. The workshop is intended to be very informal and hands-on. Participants are not required to have had previous experience in analog VLSI circuit design, computational or machine vision, systems level neurophysiology or modeling the brain at the systems level. However, we strongly encourage active researchers with relevant backgrounds from academia, industry and national laboratories to apply, in particular if they are prepared to talk about their work or to bring demonstrators to Telluride (e.g. robots, chips, software). We expect to be able to pay for shipping necessary equipment to Telluride and will have at least three technical staff present throughout both workshops to assist us with software and hardware problems. We will have a network of SUN workstations running UNIX and connected to the Internet at the Center available to us. All domestic travel and housing expenses will be provided. Participants are expected to pay for food and incidental expenses. HOW TO APPLY: The deadline for receipt of applications is March 10, 1994. Applicants should be at the level of graduate students or above (i.e. post-doctoral fellows, faculty, research and engineering staff and the equivalent positions in industry and national laboratories). We actively encourage qualified women and minority candidates to apply. Each participant can apply for only one workshop and the application should include: 1. Name, address, telephone, e-mail, FAX, and minority status (optional). 2. Resume. 3. One page summary of background and interests relevant to the workshop. 4. Description of special equipment needed for demonstrations. 5. Two letters of recommendation. Complete applications should be sent to: Prof. Terrence Sejnowski The Salk Institute Post Office Box 85800 San Diego, CA 92186-5800 Applicants will be notified by April 15, 1994. From venu at pixel.mipg.upenn.edu Wed Feb 16 17:28:00 1994 From: venu at pixel.mipg.upenn.edu (Venugopal) Date: Wed, 16 Feb 94 17:28:00 EST Subject: Paper available on ftp Message-ID: <9402162228.AA00373@pixel.mipg.upenn.edu> *** PLEASE DO NOT FORWARD TO OTHER GROUPS *** Preprint of the following paper (to appear in Circuits, Systems and Signal Processing) is available on ftp from the neuroprose archive: AN IMPROVED SCHEME FOR THE DIRECT ADAPTIVE CONTROL OF DYNAMICAL SYSTEMS USING BACKPROPAGATION NEURAL NETWORKS K. P. Venugopal, R. Sudhakar and A. S. Pandya Department of Electrical Eng. Department of Computer Science and Eng. Florida Atlantic University Abstract: This paper presents an improved direct control architecture for the on-line learning control of dynamical systems using backpropagation neural networks. The proposed architecture is compared with the other direct control schemes.
In the present scheme, the neural network interconnection strengths are updated based on the output error of the dynamical system directly, rather than using a transformed version of the error employed in other schemes. The ill effects of the controlled dynamics on the on-line updating of the network weights are moderated by including a compensating gain layer. An error feedback is introduced to improve the dynamic response of the control system. Simulation studies are performed using the nonlinear dynamics of an underwater vehicle and the promising results support the effectiveness of the proposed scheme. ----------------------------------------- The file at archive.cis.ohio-state.edu is venugopal.css.ps.Z (34 pages) to ftp the file: unix> ftp archive.cis.ohio-state.edu Name (archive.cis.ohio-state.edu:xxxxx): anonymous Password: your address ftp> cd pub/neuroprose ftp> binary ftp> get venugopal.css.ps.Z uncompress the file after transferring to your machine. unix> uncompress venugopal.css.ps.Z ________________________________________________________________ K. P. Venugopal Medical Image Processing Group University of Pennsylvania 423 Blockley Hall Philadelphia, PA 19104 (venu at pixel.mipg.upenn.edu) From anandan at sarnoff.com Wed Feb 16 09:22:51 1994 From: anandan at sarnoff.com (P. Anandan x3249) Date: Wed, 16 Feb 94 09:22:51 EST Subject: outlier, robust statistics In-Reply-To: <9402150756.AA17907@salk.edu> (message from Terry Sejnowski on Mon, 14 Feb 94 23:56:04 PST) Message-ID: <9402161422.AA13890@peanut.sarnoff.com> Hi Terry, It may be worth mentioning that a simple extension of your "fixed velocity" formulation leads to something quite powerful and is a decent approximation for many real situations. This is to formulate the hypothesis space as 2-D affine transforms of the image plane. Most of the references below have not used robust estimators but have focussed on the layered representation problem. However, recent extensions of all these algorithms at Sarnoff have included several different types of robust estimators as options. One noteworthy omission (simply because I have not yet updated my bib file) is the paper by Black and Jepson, CVPR93. I also did not include the paper by Wang and Adelson at CVPR93, because that can be viewed as falling into either category (affine hypotheses or object hypotheses). In general, when you use a parametric motion model (translation, affine, 8-parameter quadratic for planar surface motion), you have the choice of working with motion-parameters as hypotheses or the objects as hypotheses. But if you are working with non-parametric motion fields (e.g., smooth flow), it is not obvious how to work with motion parameters as hypotheses. Last but not least, I should mention a recent paper that we have written which is under review that goes beyond parametric layers to include residual flow to fully account for the scene motion. This is an alternative approach to the standard formulation of the spatial-coherence assumption as a "smoothness" constraint (e.g., minimum quadratic variation, etc.). This paper also describes a computational framework that identifies the critical choice points for layered motion estimation and shows how different algorithms fit into that framework. I should be in a position to send you a copy of the paper in a couple of weeks or so. -- anandan @article{Irani-Peleg:IJCV, author = {M. Irani and S.
Peleg}, title = {Computing Occluding and Transparent Motions}, journal = IJCV, year = {accepted for publication, 1993}, } @inproceedings{Bergen-etal:AICV91, author = {J.R. Bergen and P.J. Burt and K. Hanna and R. Hingorani and P. Jeanne and S. Peleg}, title = {Dynamic Multiple-Motion Computation}, booktitle = {Artificial Intelligence and Computer Vision: Proceedings of the Israeli Conference}, publisher = {Elsevier}, editor = {Y.A. Feldman and A. Bruckstein}, year = {1991}, pages = {147--156} } @inproceedings{Burt-etal:WVM89, title = {Object tracking with a moving camera, an application of dynamic motion analysis}, author ={P.J. Burt and J.R. Bergen and R. Hingorani and R. Kolczynski and W.A. Lee and A. Leung and J. Lubin and H. Shvaytser}, booktitle = WVM, address = {Irvine, CA}, month = {March}, year = {1989}, pages = {2--12} } @article{Bergen-etal:PAMI92, author = {J.R. Bergen and P.J. Burt and R. Hingorani and S. Peleg}, title = {A Three Frame Algorithm for Estimating Two-Component Image Motion}, journal = PAMI, month = {September}, year = {1992}, volume = {14}, pages = {886--896} } From M.Cooke at dcs.shef.ac.uk Wed Feb 16 09:22:17 1994 From: M.Cooke at dcs.shef.ac.uk (Martin Cooke) Date: Wed, 16 Feb 94 14:22:17 GMT Subject: missing values Message-ID: <9402161427.AA10510@dcs.shef.ac.uk> I've only just seen the discussion on missing values, so forgive this late response. The issue of training the Kohonen self-organising feature map with partial data is covered in Samad & Harp (1992) Self-organisation with partial data Network, 3, 205-212. Essentially, weight changes are restricted to the subspace of available data. Samad & Harp report three experiments using partial training data, and demonstrate that performance is essentially unchanged up to about 60% missing data. This is presumably due to the n -> 2 dimensionality reduction. We recently applied this result to training a speech recogniser on partial data, and got similar results [tech. rep. in preparation]. We're coming at this from the field of auditory scene analysis, where the result of source segregation is an inherently partial description of one or other source. I'd be happy to supply further details on request. Martin Cooke Computer Science Sheffield University UK From mmoller at daimi.aau.dk Wed Feb 16 11:10:00 1994 From: mmoller at daimi.aau.dk (Martin Fodslette M|ller) Date: Wed, 16 Feb 1994 17:10:00 +0100 Subject: copy of thesis. Message-ID: <199402161610.AA28147@titan.daimi.aau.dk> To all that have requested a copy of my thesis (and apologies to those that did not for sending this message). Thank you all for your interest in my thesis. Since so many have requested a copy (about 200), I will not be able to answer you all separately right now. Please accept my apologies. You will all receive a copy of the thesis in a few weeks. Best Regards -martin ---------------------------------------------------------------- Martin Moller email: mmoller at daimi.aau.dk Computer Science Dept. Fax: +45 8942 3255 Aarhus University Phone: +45 8942 3371 Ny Munkegade, Build. 540, DK-8000 Aarhus C, Denmark ---------------------------------------------------------------- From venu at pixel.mipg.upenn.edu Wed Feb 16 17:15:31 1994 From: venu at pixel.mipg.upenn.edu (Venugopal) Date: Wed, 16 Feb 94 17:15:31 EST Subject: Thesis available on ftp Message-ID: <9402162215.AA00370@pixel.mipg.upenn.edu> The following thesis is available on ftp from neuroprose archive: LEARNING IN CONNECTIONIST NETWORKS USING THE ALOPEX ALGORITHM K. P. 
Venugopal Florida Atlantic University Abstract: The ALOPEX algorithm is presented as a `universal' learning algorithm for connectionist models. It is shown that the ALOPEX procedure can be used efficiently as a supervised learning algorithm for such models. The algorithm is demonstrated successfully on a variety of network architectures. Such architectures include multi-layered perceptrons, time-delay models, asymmetric fully recurrent networks and memory neurons. The learning performance as well as the generalization capability of the ALOPEX algorithm are compared with those of the backpropagation procedure on a number of benchmark problems, and it is shown that ALOPEX has specific advantages. Results on the MONKS problems are the best reported ones so far. Two new architectures are proposed for the on-line, direct adaptive control of dynamical systems using neural networks. The proposed schemes are shown to provide better response and tracking characteristics than the other existing direct control schemes. A velocity reference scheme is introduced to improve the dynamic response of on-line learning controllers. The proposed learning algorithm and architectures are also studied on three practical problems: (i) classification of handwritten digits using Fourier descriptors, (ii) recognition of underwater targets from sonar returns, considering temporal dependencies of consecutive returns, and (iii) on-line learning control of autonomous underwater vehicles, starting from random initial conditions. Detailed studies are conducted on the learning control applications. Also, the ability of the neural network controllers to adapt to slowly and suddenly varying parameter disturbances and measurement noise is studied in detail. --------------------- Some of the related papers: K. P. Venugopal, A. S. Pandya and R. Sudhakar, 'A recurrent neural network controller and learning algorithm for the on-line learning control of autonomous underwater vehicles', to appear in Neural Networks (1994) K. P. Venugopal, R. Sudhakar and A. S. Pandya, 'On-line learning control of autonomous underwater vehicles using feedforward neural networks', IEEE Journal of Oceanic Engineering, vol. 17 (1992) K. P. Venugopal, R. Sudhakar and A. S. Pandya, 'An improved scheme for the direct adaptive control of dynamical systems using backpropagation neural networks', to appear in Circuits, Systems and Signal Processing (1994) K. P. Venugopal and S. M. Smith, 'Improving the dynamic response of neural network controllers using velocity reference feedback', IEEE Trans. on Neural Networks, vol. 4 (1993) K. P. Unnikrishnan and K. P. Venugopal, 'Alopex: a correlation based learning algorithm for feedforward and feedback neural networks', to appear in Neural Computation, vol. 6 (1994) A. S. Pandya and K. P. Venugopal, 'A stochastic parallel algorithm for learning in neural networks', to appear in IEICE Transactions on Information Processing (1994) ----------------------------------------- The files at archive.cis.ohio-state.edu are venugopal.thesis1.ps.Z venugopal.thesis2.ps.Z venugopal.thesis3.ps.Z venugopal.thesis4.ps.Z venugopal.thesis5.ps.Z venugopal.thesis6.ps.Z venugopal.thesis7.ps.Z (total 200 pages) to ftp the files: unix> ftp archive.cis.ohio-state.edu Name (archive.cis.ohio-state.edu:xxxxx): anonymous Password: your address ftp> cd pub/neuroprose/Thesis ftp> binary ftp> mget venugopal.thesis* uncompress the files after transferring to your machine.
unix> uncompress venugopal* ------------------------------------------------- K. P. Venugopal Medical Image Processing Group University of Pennsylvania 423 Blockley Hall Philadelphia, PA 19104 (venu at pixel.mipg.upenn.edu) From minton at ptolemy.arc.nasa.gov Wed Feb 16 21:03:21 1994 From: minton at ptolemy.arc.nasa.gov (Steve Minton) Date: Wed, 16 Feb 94 18:03:21 PST Subject: JAIR article Message-ID: <9402170203.AA27856@ptolemy.arc.nasa.gov> Readers of this newsgroup may be interested in the following article, which was recently published in the Journal of Artificial Intelligence Research: Ling, C.X. (1994) "Learning the Past Tense of English Verbs: The Symbolic Pattern Associator vs. Connectionist Models", Volume 1, pages 209-229 Postscript: volume1/ling94a.ps (247K) Online Appendix: volume1/ling-appendix.Z (109K) data file, compressed Abstract: Learning the past tense of English verbs - a seemingly minor aspect of language acquisition - has generated heated debates since 1986, and has become a landmark task for testing the adequacy of cognitive modeling. Several artificial neural networks (ANNs) have been implemented, and a challenge for better symbolic models has been posed. In this paper, we present a general-purpose Symbolic Pattern Associator (SPA) based upon the decision-tree learning algorithm ID3. We conduct extensive head-to-head comparisons on the generalization ability between ANN models and the SPA under different representations. We conclude that the SPA generalizes the past tense of unseen verbs better than ANN models by a wide margin, and we offer insights as to why this should be the case. We also discuss a new default strategy for decision-tree learning algorithms. JAIR's server can be accessed by WWW, FTP, gopher, or automated email. For further information, check out our WWW server (URL is gopher://p.gp.cs.cmu.edu/) or one of our FTP sites (/usr/jair/pub at p.gp.cs.cmu.edu), or send email to jair at cs.cmu.edu with the subject AUTORESPOND and the message body HELP. From COTTRLL at FRMOP22.CNUSC.FR Thu Feb 17 10:04:00 1994 From: COTTRLL at FRMOP22.CNUSC.FR (COTTRELL) Date: Thu, 17 Feb 94 10:04 Subject: Paper available Message-ID: <"94-02-17-10:04:06.72*COTTRLL"@FRMOP22.CNUSC.FR> Dear connectionists, Some people report that they cannot retrieve the paper cottrell.things.ps that I put in the neuroprose archive some days ago. I will try to solve the problem as soon as possible. Please wait a little before trying again. Yours sincerely, Marie Cottrell SAMOS Universite Paris 1 90, rue de Tolbiac F-75634 PARIS 13 FRANCE E-mail : cottrll at frmop22.cnusc.fr From COTTRLL at FRMOP22.CNUSC.FR Thu Feb 17 19:54:00 1994 From: COTTRLL at FRMOP22.CNUSC.FR (COTTRELL) Date: Thu, 17 Feb 94 19:54 Subject: Paper available : Kohonen algorithm Message-ID: <"94-02-17-19:54:08.03*COTTRLL"@FRMOP22.CNUSC.FR> Dear connectionists, The problem that some of you encounter in retrieving the paper "Two or three..." (file cottrell.things.ps in the neuroprose repository) comes from a change in its name: its name is now cottrell.things.ps.Z, in pub/neuroprose at archive.cis.ohio-state.edu. It has been compressed. Sorry for the delay. Yours sincerely, Marie Cottrell From reza at ai.mit.edu Thu Feb 17 09:03:53 1994 From: reza at ai.mit.edu (Reza Shadmehr) Date: Thu, 17 Feb 94 09:03:53 EST Subject: Tech reports from CBCL at MIT Message-ID: <9402171403.AA02835@corpus-callosum> Hello, Following is a list of recent technical reports from the Center for Biological and Computational Learning at M.I.T.
These reports are available via anonymous ftp. (see end of this message for details) -------------------------------- :CBCL Paper #78/AI Memo #1405 :author Amnon Shashua :title On Geometric and Algebraic Aspects of 3D Affine and Projective Structures from Perspective 2D Views :date July 1993 :pages 14 :keywords visual recognition, structure from motion, projective geometry, 3D reconstruction We investigate the differences --- conceptually and algorithmically --- between affine and projective frameworks for the tasks of visual recognition and reconstruction from perspective views. It is shown that an affine invariant exists between any view and a fixed view chosen as a reference view. This implies that for tasks for which a reference view can be chosen, such as in alignment schemes for visual recognition, projective invariants are not really necessary. We then use the affine invariant to derive new algebraic connections between perspective views. It is shown that three perspective views of an object are connected by certain algebraic functions of image coordinates alone (no structure or camera geometry needs to be involved). -------------- :CBCL Paper #79/AI Memo #1390 :author Jose L. Marroquin and Federico Girosi :title Some Extensions of the K-Means Algorithm for Image Segmentation and Pattern Classification :date January 1993 :pages 21 :keywords K-means, clustering, vector quantization, segmentation, classification We present some extensions to the k-means algorithm for vector quantization that permit its efficient use in image segmentation and pattern classification tasks. We show that by introducing a certain set of state variables it is possible to find the representative centers of the lower dimensional manifolds that define the boundaries between classes; this permits one, for example, to find class boundaries directly from sparse data or to efficiently place centers for pattern classification. The same state variables can be used to determine adaptively the optimal number of centers for clouds of data with space-varying density. Some examples of the application of these extensions are also given. -------------- :CBCL Paper #80/AI Memo #1431 :title Example-Based Image Analysis and Synthesis :author David Beymer, Amnon Shashua and Tomaso Poggio :date November, 1993 :pages 21 :keywords computer graphics, networks, computer vision, teleconferencing, image compression, computer interfaces Image analysis and graphics synthesis can be achieved with learning techniques using directly image examples without physically-based, 3D models. In our technique: 1) the mapping from novel images to a vector of ``pose'' and ``expression'' parameters can be learned from a small set of example images using a function approximation technique that we call an analysis network; 2) the inverse mapping from input ``pose'' and ``expression'' parameters to output images can be synthesized from a small set of example images and used to produce new images using a similar synthesis network. The techniques described here have several applications in computer graphics, special effects, interactive multimedia and very low bandwidth teleconferencing. -------------- :CBCL Paper #81/AI Memo #1432 :title Conditions for Viewpoint Dependent Face Recognition :author Philippe G. Schyns and Heinrich H. 
B\"ulthoff :date August 1993 :pages 6 :keywords face recognition, RBF Network, Symmetry Face recognition stands out as a singular case of object recognition: although most faces are very much alike, people discriminate between many different faces with outstanding efficiency. Even though little is known about the mechanisms of face recognition, viewpoint dependence, a recurrent characteristic of much research on faces, could inform algorithms and representations. Poggio and Vetter's symmetry argument predicts that learning only one view of a face may be sufficient for recognition, if this view allows the computation of a symmetric, "virtual," view. More specifically, as faces are roughly bilaterally symmetric objects, learning a side-view---which always has a symmetric view---should give rise to better generalization performances than learning the frontal view. It is also predicted that among all new views, a virtual view should be best recognized. We ran two psychophysical experiments to test these predictions. Stimuli were views of 3D models of laser-scanned faces. Only shape was available for recognition; all other face cues---texture, color, hair, etc.---were removed from the stimuli. The first experiment tested which single views of a face give rise to the best generalization performances. The results were compatible with the symmetry argument: face recognition from a single view is always better when the learned view allows the computation of a symmetric view. -------------- :CBCL Paper #82/AI Memo #1437 :author Reza Shadmehr and Ferdinando A. Mussa-Ivaldi :title Geometric Structure of the Adaptive Controller of the Human Arm :date July 1993 :pages 34 :keywords Motor learning, reaching movements, internal models, force fields, virtual environments, generalization, motor control The objects with which the hand interacts may significantly change the dynamics of the arm. How does the brain adapt control of arm movements to this new dynamics? We show that adaptation is via composition of a model of the task's dynamics. By exploring generalization capabilities of this adaptation we infer some of the properties of the computational elements with which the brain formed this model: the elements have broad receptive fields and encode the learned dynamics as a map structured in an intrinsic coordinate system closely related to the geometry of the skeletomusculature. The low-level nature of these elements suggests that they may represent a set of primitives with which movements are represented in the CNS. -------------- :CBCL Paper #83/AI Memo #1440 :author Michael I. Jordan and Robert A. Jacobs :title Hierarchical Mixtures of Experts and the EM Algorithm :date August 1993 :pages 29 :keywords supervised learning, statistics, decision trees, neural networks We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
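As a concrete picture of the E- and M-steps mentioned in the abstract above, the following minimal Python/numpy sketch performs one EM pass for a flat, single-level mixture of linear experts with a softmax gate. It is not the authors' code: the hierarchical architecture in the memo nests the same two steps over a tree of gates and fits the gate by IRLS, and the names here (em_step, W_experts, V_gate, noise_var) are invented for this sketch.

import numpy as np

def em_step(X, y, W_experts, V_gate, noise_var=1.0):
    # X: (N, d) inputs; y: (N,) targets;
    # W_experts: (K, d) linear expert weights; V_gate: (K, d) softmax gate weights.
    # E-step: posterior responsibility h[i, k] of expert k for example i.
    g = np.exp(X @ V_gate.T)
    g /= g.sum(axis=1, keepdims=True)          # softmax gating probabilities
    mu = X @ W_experts.T                       # each expert's prediction
    lik = np.exp(-0.5 * (y[:, None] - mu) ** 2 / noise_var)
    h = g * lik
    h /= h.sum(axis=1, keepdims=True)
    # M-step: each expert solves a responsibility-weighted least-squares problem.
    new_W = np.empty_like(W_experts)
    for k in range(W_experts.shape[0]):
        Xw = X * h[:, k:k+1]                   # rows of X weighted by h[:, k]
        new_W[k] = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)[0]
    # Gate update: a single gradient step toward the responsibilities
    # (a cheaper stand-in for the IRLS fit of the GLIM gate).
    new_V = V_gate + 0.1 * (h - g).T @ X / X.shape[0]
    return new_W, new_V

Iterating em_step on a toy piecewise-linear regression problem should show the responsibilities splitting the input space between the experts, which is the division of labor the abstract describes.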
-------------- :CBCL Paper #84/AI Memo #1441 :title On the Convergence of Stochastic Iterative Dynamic Programming Algorithms :author Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh :date August 1993 :pages 15 :keywords reinforcement learning, stochastic approximation, convergence, dynamic programming Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong. -------------- :CBCL Paper #86/AI Memo #1449 :title Formalizing Triggers: A Learning Model for Finite Spaces :author Partha Niyogi and Robert Berwick :pages 14 :keywords language learning, parameter systems, Markov chains, convergence times, computational learning theory :date November 1993 In a recent seminal paper, Gibson and Wexler (1993) take important steps toward formalizing the notion of language learning in a (finite) space whose grammars are characterized by a finite number of {\it parameters\/}. They introduce the Triggering Learning Algorithm (TLA) and show that even in finite space convergence may be a problem due to local maxima. In this paper we explicitly formalize learning in finite parameter space as a Markov structure whose states are parameter settings. We show that this captures the dynamics of TLA completely and allows us to explicitly compute the rates of convergence for TLA and other variants of TLA, e.g. random walk. Also included in the paper are a corrected version of GW's central convergence proof, a list of ``problem states'' in addition to local maxima, and batch and PAC-style learning bounds for the model. -------------- :CBCL Paper #87/AI Memo #1458 :title Convergence Results for the EM Approach to Mixtures of Experts Architectures :author Michael Jordan and Lei Xu :pages 33 :date September 1993 The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs (1993) recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments. -------------- :CBCL Paper #89/AI Memo #1461 :title Face Recognition under Varying Pose :author David J.
Beymer :pages 14 :date December 1993 :keywords computer vision, face recognition, facial feature detection, template matching While researchers in computer vision and pattern recognition have worked on automatic techniques for recognizing faces for the last 20 years, most systems specialize in frontal views of the face. We present a face recognizer that works under varying pose, the difficult part of which is to handle face rotations in depth. Building on successful template-based systems, our basic approach is to represent faces with templates from multiple model views that cover different poses from the viewing sphere. Our system has achieved a recognition rate of 98% on a data base of 62 people containing 10 testing and 15 modelling views per person. -------------- :CBCL Paper #90/AI Memo #1452 :title Algebraic Functions for Recognition :author Amnon Shashua :pages 11 :date January 1994 In the general case, a trilinear relationship between three perspective views is shown to exist. The trilinearity result is shown to be of much practical use in visual recognition by alignment --- yielding a direct method that cuts through the computations of camera transformation, scene structure and epipolar geometry. The proof of the central result may be of further interest as it demonstrates certain regularities across homographies of the plane and introduces new view invariants. Experiments on simulated and real image data were conducted, including a comparative analysis with epipolar intersection and the linear combination methods, with results indicating a greater degree of robustness in practice and a higher level of performance in re-projection tasks. ============================ How to get a copy of a report: The files are in compressed postscript format and are named by their AI memo number. They are put in a directory named after the year in which the paper was written. Here is the procedure for ftp-ing: unix> ftp publications.ai.mit.edu (128.52.32.22, log-in as anonymous) ftp> cd ai-publications/1993 ftp> binary ftp> get AIM-number.ps.Z ftp> quit unix> zcat AIM-number.ps.Z | lpr Best wishes, Reza Shadmehr Center for Biological and Computational Learning M. I. T. Cambridge, MA 02139 From mel at klab.caltech.edu Thu Feb 17 21:00:32 1994 From: mel at klab.caltech.edu (Bartlett Mel) Date: Thu, 17 Feb 94 18:00:32 PST Subject: NIPS*94 Call for Workshops Message-ID: <9402180200.AA20549@plato.klab.caltech.edu> CALL FOR PROPOSALS NIPS*94 Post-Conference Workshops December 2 and 3, 1994 Vail, Colorado Following the regular program of the Neural Information Processing Systems 1994 conference, workshops on current topics in neural information processing will be held on December 2 and 3, 1994, in Vail, Colorado. Proposals by qualified individuals interested in chairing one of these workshops are solicited. Past topics have included: active learning and control, architectural issues, attention, Bayesian analysis, benchmarking neural network applications, computational complexity issues, computational neuroscience, fast training techniques, genetic algorithms, music, neural network dynamics, optimization, recurrent nets, rules and connectionist models, self-organization, sensory biophysics, speech, time series prediction, vision and audition, implementations, and grammars. The goal of the workshops is to provide an informal forum for researchers to discuss important issues of current interest.
Sessions will meet in the morning and in the afternoon of both days, with free time in between for ongoing individual exchange or outdoor activities. Concrete open and/or controversial issues are encouraged and preferred as workshop topics. Representation of alternative viewpoints and panel-style discussions are particularly encouraged. Individuals proposing to chair a workshop will have responsibilities including: 1) arranging short informal presentations by experts working on the topic, 2) moderating or leading the discussion and reporting its high points, findings, and conclusions to the group during evening plenary sessions (the ``gong show''), and 3) writing a brief summary. Submission Procedure: Interested parties should submit a short proposal for a workshop of interest postmarked by May 21, 1994. (Express mail is not necessary. Submissions by electronic mail will also be accepted.) Proposals should include a title, a description of what the workshop is to address and accomplish, the proposed length of the workshop (one day or two days), and the planned format. It should motivate why the topic is of interest or controversial, why it should be discussed and what the targeted group of participants is. In addition, please send a brief resume of the prospective workshop chair, a list of publications and evidence of scholarship in the field of interest. Mail submissions to: Todd K. Leen, NIPS*94 Workshops Chair Department of Computer Science and Engineering Oregon Graduate Institute of Science and Technology P.O. Box 91000 Portland Oregon 97291-1000 USA (e-mail: tleen at cse.ogi.edu) Name, mailing address, phone number, fax number, and e-mail net address should be on all submissions. PROPOSALS MUST BE POSTMARKED BY MAY 21, 1994 Please Post From scheler at informatik.tu-muenchen.de Fri Feb 18 11:10:21 1994 From: scheler at informatik.tu-muenchen.de (Gabriele Scheler) Date: Fri, 18 Feb 1994 17:10:21 +0100 Subject: TR announcement: Adaptive Distance Measures Message-ID: <94Feb18.171027met.42273@papa.informatik.tu-muenchen.de> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/scheler.adaptive.ps.Z The file scheler.adaptive.ps.Z is now available for copying from the Neuroprose repository: Pattern Classification with Adaptive Distance Measures Gabriele Scheler Technische Universit"at M"unchen (25 pages) also available as Report FKI-188-94 from Institut f"ur Informatik TU M"unchen D 80290 M"unchen ftp-host: flop.informatik.tu-muenchen.de ftp-file: pub/fki/fki-188-94.ps.gz ABSTRACT: In this paper, we want to explore the notion of learning the classification of patterns from examples by synthesizing distance functions. A working implementation of a distance classifier is presented. Its operation is illustrated with the problem of classification according to parity (highly non-linear) and a classification of feature vectors which involves dimension reduction (a linear problem). A solution to these problems is sought in two steps: (a) a parametrized distance function (called a `distance function scheme') is chosen, (b) setting parameters to values according to the classification of training patterns results in a specific distance function. This induces a classification on all remaining patterns. The general idea of this approach is to find restricted functional shapes in order to model certain cognitive functions of classification exactly, i.e. 
performing classifications that occur as well as excluding classifications that do not naturally occur and may even be experimentally proven to be excluded from learnability by a living organism. There are also certain technical advantages in using restricted function shapes and simple learning rules, such as reducing learning time, generating training sets and individual patterns to set certain parameters, determining the learnability of a specific problem with a given function scheme or providing additions to functions for individual exceptions, while retaining the general shape for generalization. From soller at asylum.cs.utah.edu Fri Feb 18 19:13:34 1994 From: soller at asylum.cs.utah.edu (Jerome Soller) Date: Fri, 18 Feb 94 17:13:34 -0700 Subject: 2nd An. Utah Workshop on the Applicat. of Intelligent and Adap. Systems Message-ID: <9402190013.AA09689@asylum.cs.utah.edu> ------------------------------------------------ 2nd Annual Utah Workshop on: "Applications of Intelligent and Adaptive Systems" Sponsored by: The University of Utah Cognitive Science Industrial Advisory Board and The Joint Services Software Technology Conference '94 -------------------------------------------------- Date: April 15, 1994 Time: 8:00 a.m.-2:30 p.m. Cost: contact Jerome Soller or Dale Sanders for the cost for non-conference attendees, free for conference attendees Location: Salt Lake City Marriott, Salon E, 75 South and West Temple -------------------------------------------------- Talk 1: "The Use of Genetic Algorithms and Neural Networks in the Automatic Interpretation of Medical Images", Dr. Charles Rosenberg Research Investigator, VA Geriatric, Research, Education, and Clinical Center and Adjunct Assistant Professor, Department of Psychology, University of Utah (crr at cogsci.psych.utah.edu) ((801) 582-1565, x-2458) -------------------------------------------------- Talk 2: "A Hybrid On-line Handwriting Recognition System" Dr. Nicholas S. Flann. Assistant Professor, Computer Science Department, Utah State University. (flann at nick.cs.usu.edu) ((801) 750-2451) -------------------------------------------------- Talk 3: "Prototyping Activities in Robotics, Control, and Manufacturing" Dr. Tarek M. Sobh Research Assistant Professor Computer Science Department University of Utah (sobh at wingate.cs.utah.edu) ((801) 585-5047) -------------------------------------------------- Talk 4: "Software Architecture and Unmanned Ground Vehicles" Dr. David Morgenthaler Program Manager Sarcos Research Corporation Salt Lake City, UT (David_Morgenthaler at ced.utah.edu) ((801) 581-0155) -------------------------------------------------- Lunch Break: 11:45 a.m.-12:45 p.m. -------------------------------------------------- Talk 5: "Use of Decision Support in a Hospital Information System" Dr. Allan Pryor Professor of Medical Informatics University of Utah and Assistant Vice President of Informatics Intermountain Health Care Salt Lake City UT (tapryor at cc.utah.edu) ((801) 321-2128) -------------------------------------------------- Talk 6: "Applications of Neural Networks in Critical Care Monitoring" Dr. 
Joe Orr Research Instructor Department of Anesthesiology University of Utah (jorr at soma.med.utah.edu) ((801) 581-6393) -------------------------------------------------- Pre-registration required; For registration, copies of the abstracts, or references for publications relating to these talks, please contact: Jerome Soller, Veterans Affairs Medical Center and University of Utah Computer Science (801) 582-1565, ext 2469; (801) 581-7977 soller at cs.utah.edu or Dale Sanders, TRW Inc., Ogden Engineering Services (801) 625-8343 dale_sanders at oz.bmd.trw.com -------------------------------------------------- We wish to thank the following for their support of this workshop: Applied Information and Management Systems, Inc.; Intermountain Health Care; The Joint Services Software Technology Conference; Salt Lake Veterans Affairs Geriatric Research, Education, and Clinical Center; Sarcos Corporation; 3M Health Information Systems; TRW Systems Integration Group; University of Utah Departments of Computer Science, Medical Informatics, and Physiology; Utah Information Technology Association From judd at scr.siemens.com Fri Feb 18 21:31:24 1994 From: judd at scr.siemens.com (Stephen Judd) Date: Fri, 18 Feb 1994 21:31:24 -0500 Subject: Optimal Stopping Time paper Message-ID: <199402190231.VAA27524@tern.siemens.com> ***Do not forward to other bboards*** FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/wang.optistop.ps.Z The file wang.optistop.ps.Z is now available for copying from the Neuroprose repository: Optimal Stopping and Effective Machine Complexity in Learning Changfeng Wang U.Penn Santosh S. Venkatesh U.Penn J. Stephen Judd Siemens Abstract: We study the problem of when to stop training a class of feedforward networks -- networks with fixed input weights, one hidden layer, and a linear output -- when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown analytically that there are, in general, three distinct phases in the generalization performance in the learning process. In particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of "effective size" of a machine is defined and used to explain the trade-off between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of network size selection. (8 pages) To appear in NIPS-6- (1993) sj Stephen Judd Siemens Corporate Research, (609) 734-6573 755 College Rd. East, fax (609) 734-6565 Princeton, judd at learning.scr.siemens.com NJ usa 08540 From mjolsness-eric at CS.YALE.EDU Mon Feb 21 10:58:26 1994 From: mjolsness-eric at CS.YALE.EDU (Eric Mjolsness) Date: Mon, 21 Feb 94 10:58:26 EST Subject: clustering & matching papers Message-ID: <199402211558.AA05604@NEBULA.SYSTEMSZ.CS.YALE.EDU> ****** PLEASE DO NOT FORWARD TO OTHER MAILING LISTS OR BOARDS. 
************** ****** PAPER AVAILABLE VIA NEUROPROSE *************************************** FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/gold.object-clustering.ps.Z FTP-filename: /pub/neuroprose/lu.object-matching.ps.Z The following two NIPS papers have been placed in the Neuroprose archive at Ohio State. The files are "gold.object-clustering.ps.Z" and "lu.object-matching.ps.Z". Each is 8 pages in length. The uncompressed postscript file for the second paper, "lu.object-matching.ps.Z", contains images and is 4.3 megabytes long. So you may need to use a symbolic link in printing it: "lpr -s" under SunOS. ----------------------------------------------------------------------------- Clustering with a Domain-Specific Distance Measure Stephen Gold, Eric Mjolsness and Anand Rangarajan Yale Computer Science Department With a point matching distance measure which is invariant under translation, rotation and permutation, we learn 2-D point-set objects, by clustering noisy point-set images. Unlike traditional clustering methods which use distance measures that operate on feature vectors - a representation common to most problem domains - this object-based clustering technique employs a distance measure specific to a type of object within a problem domain. Formulating the clustering problem as two nested objective functions, we derive optimization dynamics similar to the Expectation-Maximization algorithm used in mixture models. ----------------------------------------------------------------------------- Two-Dimensional Object Localization by Coarse-to-Fine Correlation Matching Chien-Ping Lu and Eric Mjolsness Yale Computer Science Department We present a Mean Field Theory method for locating two-dimensional objects that have undergone rigid transformations. The resulting algorithm is a coarse-to-fine correlation matching. We first consider problems of matching synthetic point data, and derive a point matching objective function. A tractable line segment matching objective function is derived by considering each line segment as a dense collection of points, and approximating it by a sum of Gaussians. The algorithm is tested on real images from which line segments are extracted and matched. ----------------------------------------------------------------------------- - Eric Mjolsness mjolsness at cs.yale.edu ------- From pkso at castle.ed.ac.uk Tue Feb 22 13:54:42 1994 From: pkso at castle.ed.ac.uk (P Sollich) Date: Tue, 22 Feb 94 18:54:42 GMT Subject: Preprint on query learning in Neuroprose archive Message-ID: <9402221854.aa28409@uk.ac.ed.castle> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/sollich.queries.ps.Z The file sollich.queries.ps.Z (16 pages) is now available via anonymous ftp from the Neuroprose archive. Title and abstract are given below. We regret that hardcopies are not available. --------------------------------------------------------------------------- Query Construction, Entropy and Generalization in Neural Network Models Peter Sollich Department of Physics, University of Edinburgh, Kings Buildings, Mayfield Road, Edinburgh EH9 3JZ, U.K. (To appear in Physical Review E) Abstract We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, non-redundant training sets. 
We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and `non-invertible' vs. linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill-matched to the learning environment and find that in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e. on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction. --------------------------------------------------------------------------- Peter Sollich Dept. of Physics University of Edinburgh e-mail: P.Sollich at ed.ac.uk Kings Buildings Tel. +44-31-650 5236 Mayfield Road Edinburgh EH9 3JZ, U.K. --------------------------------------------------------------------------- From B344DSL at UTARLG.UTA.EDU Tue Feb 22 22:18:10 1994 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Tue, 22 Feb 1994 21:18:10 -0600 (CST) Subject: Conference announcement Message-ID: <01H9786W7CBM0004O8@UTARLG.UTA.EDU> ANNOUNCEMENT AND CALL FOR ABSTRACTS Conference on Oscillations in Neural Systems, Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the University of Texas at Arlington. To be held Thursday through Saturday, MAY 5-7, 1994 Location: UNIVERSITY OF TEXAS AT ARLINGTON MAIN LIBRARY, 6TH FLOOR PARLOR Official Conference Motel: Park Inn 703 Benge Drive Arlington, TX 76013 1-800-777-0100 or 817-860-2323 A block of rooms has been reserved at the Park Inn for $35 a night (single or double). Room sharing arrangements are possible. Reservations should be made directly through the motel. Official Conference Travel Agent: Airline reservations to Dallas-Fort Worth airport should be made through Dan Dipert travel in Arlington, 1-800-443-5335. For those who wish to fly on American Airlines, a Star File account has been set up for a 5% discount off lowest available fares (two-week advance, staying over Saturday night) or 10% off regular coach fare; arrangements for Star File reservations should be made through Dan Dipert. Please let the conference organizers know (by e-mail or telephone) when you plan to arrive: some people can be met at the airport (about 30 minutes from Arlington), others can call Super Shuttle at 817-329-2000 upon arrival for transportation to the Park Inn (about $14-$16 per person). Registration for the conference is $25 for students, $65 for non-student oral or poster presenters, $85 for others. MIND members will have $20 (or $10 for students) deducted from the registration. A registration form is attached to this announcement. Registrants will receive the MIND monthly newsletter (on e-mail when possible) for the remainder of 1994.
Invited speakers: Bill Baird (University of California, Berkeley) Adi Bulsara (Naval Research Laboratories, San Diego) Alianna Maren (Accurate Automation Corporation) George Mpitsos (Oregon State University) Martin Stemmler (California Institute of Technology) Roger Traub (IBM, Tarrytown, New York) Robert Wong (Downstate Medical Center, Brooklyn) Geoffrey Yuen (Northwestern University) Those interested in presenting are invited to submit abstracts (1-2 paragraphs) of any work related to the theme of the conference any time between now and March 15, 1994. The topic of neural oscillation is currently of great interest to psychologists and neuroscientists alike. Recently it has been observed that neurons in separate areas of the brain will oscillate in synchrony in response to certain stimuli. One hypothesized function for such synchronized oscillations is to solve the "binding problem," that is, how is it that disparate features of objects (e.g., a person's face and their voice) are tied together into a single unitary whole? Some bold speculators (such as Francis Crick in his recent book, The Astonishing Hypothesis) even argue that synchronized neural oscillations form the basis for consciousness. Talks will be 1 hour for invited speakers and 45 minutes for contributed speakers, including questions. There will be no parallel sessions. Contributors whose work is considered worthy of presentation but who cannot be fit into the schedule will be invited to present posters. Presenters will not be required to write complete papers. After the conference is over, we will attempt to obtain a contract with a publisher for a book based on the conference. Oral and poster presenters will be invited to submit chapters to this book, although it is not a precondition for being a speaker. Two books based on previous MIND conferences (Motivation, Emotion, and Goal Direction in Neural Networks and Neural Networks for Knowledge Representation and Inference) have been published by Lawrence Erlbaum Associates, and a book based on our last conference (Optimality in Biological and Artificial Networks?) is now in progress, under contract with Erlbaum as part of their joint series with INNS. Abstracts should be submitted, by e-mail, snail mail, or fax, to: Professor Daniel S. Levine Department of Mathematics, University of Texas at Arlington 411 S. Nedderman Drive Arlington, TX 76019-0408 Office telephone: 817-273-3598, fax: 817-794-5802 e-mail: b344dsl at utarlg.uta.edu Further inquiries about the conference can be addressed to Professor Levine or to the other two conference organizers: Professor Vincent Brown Mr. Timothy Shirey 817-273-3247 214-495-3500 or 214-422-4570 b096vrb at utarlg.uta.edu 73353.3524 at compuserve.com Please distribute this announcement to anyone you think may be interested in the conference.
REGISTRATION FOR MIND/INNS CONFERENCE ON OSCILLATIONS IN NEURAL SYSTEMS, UNIVERSITY OF TEXAS AT ARLINGTON, MAY 5-7, 1994 Name ______________________________________________________________ Address ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ____________________________________________________________ E-Mail __________________________________________________________ Telephone _________________________________________________________ Registration fee enclosed: _____ $15 Student, member of MIND _____ $25 Student _____ $65 Non-student oral or poster presenter _____ $65 Non-student member of MIND _____ $85 All others Will you be staying at the Park Inn? ____ Yes ____ No Are you planning to share a room with someone you know? ____ Yes ____ No If so, please list that person's name __________________________ If not, would you be interested in sharing a room with another conference attendee to be assigned? ____ Yes ____ No PLEASE REMEMBER TO CALL THE PARK INN DIRECTLY FOR YOUR RESERVATION (WHETHER SINGLE OR DOUBLE) AT 1-800-777-0100 OR 817-860-2323. From fellous at selforg.usc.edu Tue Feb 22 23:31:06 1994 From: fellous at selforg.usc.edu (Jean-Marc Fellous) Date: Tue, 22 Feb 94 20:31:06 PST Subject: Research Associate Message-ID: <9402230431.AA00747@selforg.usc.edu> Could you please post this announcement? Thanks, Jean-Marc >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< TENNESSEE STATE UNIVERSITY CENTER FOR NEURAL ENGINEERING RESEARCH ASSOCIATE Applications are invited for a research associate position with a unique consortium involving a medical school, an engineering college, Oak Ridge National Laboratory and a private high-tech company. A Ph.D. in Biomedical/Electrical Engineering (or related fields) with strong interest in artificial and biological neural networks is required, in the areas of auditory system modeling and sensory motor control. This position will be supported for at least two years and possibly longer. Teaching of a graduate or an undergraduate course is optional. Send resume to: Dr. Mohan J. Malkani Director, Center for Neural Engineering Tennessee State University 3500 John Merritt Blvd. Nashville, TN 37209-1561 (615)320-3550 Fax: (615)320-3554 e-mail: malkani at harpo.tnstate.edu From sbh at eng.cam.ac.uk Tue Feb 22 12:00:33 1994 From: sbh at eng.cam.ac.uk (S.B. Holden) Date: Tue, 22 Feb 94 17:00:33 GMT Subject: PhD dissertation available by anonymous ftp Message-ID: <5730.199402221700@tw700.eng.cam.ac.uk> The following PhD dissertation is available by anonymous ftp from the archive of the Speech, Vision and Robotics Group at the Cambridge University Engineering Department. On the Theory of Generalization and Self-Structuring in Linearly Weighted Connectionist Networks Sean B. Holden Technical Report CUED/F-INFENG/TR161 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract The study of connectionist networks has often been criticized for an overall lack of rigour, and for being based on excessively ad hoc techniques. Even though connectionist networks have now been the subject of several decades of study, the available body of research is characterized by the existence of a significant body of experimental results, and a large number of different techniques, with relatively little supporting, explanatory theory.
This dissertation addresses the theory of {\em generalization performance\/} and {\em architecture selection\/} for a specific class of connectionist networks; a subsidiary aim is to compare these networks with the well-known class of multilayer perceptrons. After discussing in general terms the motivation for our study, we introduce and review the class of networks of interest, which we call {\em $\Phi$-networks\/}, along with the relevant supervised training algorithms. In particular, we argue that $\Phi$-networks can in general be trained significantly faster than multilayer perceptrons, and we demonstrate that many standard networks are specific examples of $\Phi$-networks. Chapters 3, 4 and 5 consider generalization performance by presenting an analysis based on tools from computational learning theory. In chapter 3 we introduce and review the theoretical apparatus required, which is drawn from {\em Probably Approximately Correct (PAC) learning theory\/}. In chapter 4 we investigate the {\em growth function\/} and {\em VC dimension\/} for general and specific $\Phi$-networks, obtaining several new results. We also introduce a technique which allows us to use the relevant PAC learning formalism to gain some insight into the effect of training algorithms which adapt architecture as well as weights (we call these {\em self-structuring training algorithms\/}). We then use our results to provide a theoretical explanation for the observation that $\Phi$-networks can in practice require a relatively large number of weights when compared with multilayer perceptrons. In chapter 5 we derive new necessary and sufficient conditions on the number of training examples required when training a $\Phi$-network such that we can expect a particular generalization performance. We compare our results with those derived elsewhere for feedforward networks of Linear Threshold Elements, and we extend one of our results to take into account the effect of using a self-structuring training algorithm. In chapter 6 we consider in detail the problem of designing a good self-structuring training algorithm for $\Phi$-networks. We discuss the best way in which to define an optimum architecture, and we then use various ideas from linear algebra to derive an algorithm, which we test experimentally. Our initial analysis allows us to show that the well-known {\em weight decay\/} approach to self-structuring is not guaranteed to provide a network which has an architecture close to the optimum one. We also extend our theoretical work in order to provide a basis for the derivation of an improved version of our algorithm. Finally, chapter 7 provides conclusions and suggestions for future research. ************************ How to obtain a copy ************************ a) Via FTP: unix> ftp svr-ftp.eng.cam.ac.uk Name: anonymous Password: (type your email address) ftp> cd reports ftp> binary ftp> get holden_tr161.ps.Z ftp> quit unix> uncompress holden_tr161.ps.Z unix> lpr holden_tr161.ps (or however you print PostScript) b) Via postal mail: Request a hardcopy from Dr. Sean B. Holden, Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. 
or email me: sbh at eng.cam.ac.uk

From viola at salk.edu Wed Feb 23 14:17:52 1994
From: viola at salk.edu (Paul Viola)
Date: Wed, 23 Feb 94 11:17:52 PST
Subject: Heinous Patent
Message-ID: <9402231917.AA24448@salk.edu>

From: Vision-List moderator Phil Kahn
VISION-LIST Digest  Tue Feb 22 11:26:42 PDT 94  Volume 13 : Issue 8

Date: Thu, 17 Feb 1994 22:23:00 GMT
From: eledavis at ubvms.cc.buffalo.edu (Elliot Davis)
Organization: University at Buffalo
Subject: Error Reduction

I would greatly appreciate your thoughts on the:

ERROR TEMPLATE TECHNIQUE

The "Error Template" technique (patent 4,802,231) provides an alternative method for reducing false alarms in pattern recognition systems. In this approach, a pattern representing a mismatched pattern is stored in the reference lexicon. It is a reference pattern to an error rather than to what is desired. THIS IS DONE WITH THE EXPECTATION THAT IF THE ERROR PATTERN OR A VARIATION OF IT IS REPEATED IT WILL TEND TO BE CLOSER TO ITSELF THAN TO THE PATTERN THAT IT FALSED OUT TO. ...

Unless this patent is very old, I find it terrifying. It is a concept that is clearly part of the pattern recognition literature of the 70's. Essentially, pattern classification works by finding clusters that represent classes. These clusters, along with a measurement model, define a probability density over the pattern space. All this technique is doing is adding an additional cluster which represents a particular type of measurement error made when sensing a class. Pattern classification theory tells us that this should be done whenever there is a particular measurement error that is not modeled well by our measurement model. You add a cluster when the distribution of data is different from the probability density predicted by the model -- i.e. a particular measurement error is more common than your model predicts. You can add these clusters by hand, as the patent suggests, or you can let a density estimation scheme discover them for you (a mixture of Gaussians model trained with EM works nicely). End of story.

So remember, anytime someone adds another cluster to a pattern classification model, they owe the owner of this patent money. I wonder what the date of this fine patent is??

Paul Viola

From cohn at psyche.mit.edu Wed Feb 23 18:15:17 1994
From: cohn at psyche.mit.edu (David Cohn)
Date: Wed, 23 Feb 94 18:15:17 EST
Subject: Paper available: Exploration using optimal experiment design
Message-ID: <9402232315.AA21110@psyche.mit.edu>

Those who find Peter Sollich's paper on query construction of interest may also wish to look at the following paper, now available by anonymous ftp. This is a slightly revised version of the paper that is to appear in Advances in Neural Information Processing Systems 6, but includes a correction to Equation 2 that was made too late to be included in the NIPS volume.

#####################################################################

Neural Network Exploration Using Optimal Experiment Design

David A. Cohn
Dept. of Brain and Cognitive Sciences
Massachusetts Inst. of Technology
Cambridge, MA 02139

Consider the problem of learning input/output mappings through exploration, e.g. learning the kinematics or dynamics of a robotic manipulator. If actions are expensive and computation is cheap, then we should explore by selecting a trajectory through the input space which gives us the most information in the fewest number of steps.
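As a rough illustration of this kind of selection criterion (and not of the specific method used in the paper), the short Python sketch below picks the next query for a learner that is linear in a fixed feature expansion by maximizing the predicted output variance, a standard heuristic from optimal experiment design; the feature map, parameter values and function names are illustrative assumptions only.

# Minimal sketch of variance-based query selection (a generic optimal
# experiment design heuristic); NOT the formulation used in the paper above.
import numpy as np

def features(x):
    # Illustrative feature map for a one-dimensional input: [1, x, x^2].
    return np.array([1.0, x, x * x])

def select_next_query(x_seen, candidates, noise_var=0.01):
    # For a linear-in-features model, the predictive variance at x is
    # proportional to phi(x)^T (Phi^T Phi + noise_var * I)^{-1} phi(x);
    # we query where that variance is largest.
    Phi = np.array([features(x) for x in x_seen])
    A_inv = np.linalg.inv(Phi.T @ Phi + noise_var * np.eye(Phi.shape[1]))
    variances = [features(x) @ A_inv @ features(x) for x in candidates]
    return candidates[int(np.argmax(variances))]

rng = np.random.default_rng(0)
x_seen = list(rng.uniform(-1.0, 0.0, size=5))   # data gathered so far
candidates = list(np.linspace(-1.0, 1.0, 21))   # possible next actions
print("next query:", select_next_query(x_seen, candidates))

The selected query falls where the model is least certain, i.e. far from the existing data; the paper develops this idea for neural network learners rather than the toy linear model used here.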
I discuss how results from the field of optimal experiment design may be used to guide such exploration, and demonstrate its use on a simple kinematics problem.

#####################################################################

The paper may be retrieved by anonymous ftp to "psyche.mit.edu" using the following protocol:

unix> ftp psyche.mit.edu
Name (psyche.mit.edu:joebob): anonymous       <- use "anonymous" here
331 Guest login ok, send ident as password.
Password: joebob at machine.univ.edu          <- use your email address here
230 Guest login ok, access restrictions apply.
ftp> cd pub/cohn                              <- go to the directory
250 CWD command successful.
ftp> binary                                   <- change to binary transfer
200 Type set to I.
ftp> get cohn.explore.ps.Z                    <- get the file
200 PORT command successful.
150 Binary data connection for cohn.explore.ps.Z ...
226 Binary Transfer complete.
local: cohn.explore.ps.Z remote: cohn.explore.ps.Z
301099 bytes received in 2.8 seconds (1e+02 Kbytes/s)
ftp> quit                                     <- all done
221 Goodbye.

From terry at salk.edu Thu Feb 24 05:49:35 1994
From: terry at salk.edu (Terry Sejnowski)
Date: Thu, 24 Feb 94 02:49:35 PST
Subject: Shakespeare and Neural Nets
Message-ID: <9402241049.AA02725@salk.edu>

from New Scientist, 22 January 1994, p. 23

In an interesting article on the use of statistical measures to assess the attribution of texts to authors, Robert Matthews and Tom Merriam report that:

"Applying our neural network to disputed works such as 'The Two Noble Kinsmen' has produced some interesting results and helped to settle some bitter arguments over authorship of controversial texts. ...

"The first task was to train the network. This we did by exposing it to data extracted from a large number of samples of Shakespeare's undisputed work, together with that of his successor with The King's Men [a theater company], John Fletcher. ... We then set the network loose on 'The Two Noble Kinsmen'. Drawing on a wide variety of essentially subjective evidence, scholars have claimed that Shakespeare's hand dominates Acts I and V, with much of the rest appearing to be by Fletcher. In March last year, our neural network agreed with these attributions -- and proffered the extra opinion that Fletcher may have received considerable help from Shakespeare in Act IV. In short, our neural network quantitatively supports the subjective view of its much more sophisticated human counterparts that 'The Two Noble Kinsmen' is a genuine collaboration between Shakespeare and one of his contemporaries."

These results will appear in the journal 'Literary and Linguistic Computing'. A similar approach might be used to determine the contributions of coauthors to scientific papers.

Terry

-----

From efiesler at maya.idiap.ch Fri Feb 25 09:16:09 1994
From: efiesler at maya.idiap.ch (E. Fiesler)
Date: Fri, 25 Feb 94 15:16:09 +0100
Subject: NN Formalization paper available by ftp.
Message-ID: <9402251416.AA04305@maya.idiap.ch>

PLEASE POST
-----------

The following paper is available via anonymous ftp from the neuroprose archive. It is 13 A4-size PostScript pages long, and replaces a shorter preliminary version. Instructions for retrieval follow the abstract.

NEURAL NETWORK CLASSIFICATION AND FORMALIZATION

E. Fiesler
IDIAP
c.p. 609
CH-1920 Martigny
Switzerland

This paper has been accepted for publication in the special issue on Neural Network Standards of "Computer Standards & Interfaces", volume 16, edited by J. Fulcher. Elsevier Science Publishers, Amsterdam, 1994.
ABSTRACT

In order to assist the field of neural networks in maturing, a formalization and a solid foundation are essential. Additionally, to permit the introduction of formal proofs, it is essential to have an all-encompassing formal mathematical definition of a neural network. This publication offers a neural network formalization consisting of a topological taxonomy, a uniform nomenclature, and an accompanying consistent mnemonic notation. Supported by this formalization, a flexible mathematical definition is presented.

------------------------------

To obtain a copy of this paper, please use the following FTP instructions:

unix> ftp archive.cis.ohio-state.edu   (or: ftp 128.146.8.52)
login: anonymous
password:
ftp> cd pub/neuroprose
ftp> binary
ftp> get fiesler.formalization.ps.Z
ftp> bye
unix> zcat fiesler.formalization.ps.Z | lpr
(or however you uncompress and print postscript)

For convenience of those outside the US, the paper has also been placed on the IDIAP ftp site:

unix> ftp Maya.IDIAP.CH   (or: ftp 192.33.221.1)
login: anonymous
password:
ftp> cd pub/papers/neural
ftp> binary
ftp> get fiesler.formalization.ps.Z   (OR get fiesler.formalization.ps)
ftp> bye
unix> zcat fiesler.formalization.ps.Z | lpr
OR
unix> lpr fiesler.formalization.ps

(Hard copies of the paper are unfortunately not available.)

P.S. Thanks for the update, Jordan!

From giles at research.nj.nec.com Fri Feb 25 18:28:59 1994
From: giles at research.nj.nec.com (Lee Giles)
Date: Fri, 25 Feb 94 18:28:59 EST
Subject: Available
Message-ID: <9402252328.AA28936@fuzzy>

********************************************************************************

Reprint: USING RECURRENT NEURAL NETWORKS TO LEARN THE STRUCTURE OF INTERCONNECTION NETWORKS

The following reprint is available via the University of Maryland Department of Computer Science Technical Report archive:

________________________________________________________________________________

"Using Recurrent Neural Networks to Learn the Structure of Interconnection Networks"

UNIVERSITY OF MARYLAND TECHNICAL REPORT UMIACS-TR-94-20 AND CS-TR-3226

G.W. Goudreau(a) and C.L. Giles(b,c)
goudreau at cs.ucf.edu, giles at research.nj.nec.com

(a) Department of Computer Science, U. of Central Florida, Orlando, FL 32816
(b) NEC Research Inst., 4 Independence Way, Princeton, NJ 08540
(c) Inst. for Advanced Computer Studies, U. of Maryland, College Park, MD 20742

A modified Recurrent Neural Network (RNN) is used to learn a Self-Routing Interconnection Network (SRIN) from a set of routing examples. The RNN is modified so that it has several distinct initial states. This is equivalent to a single RNN learning multiple different synchronous sequential machines. We define such a sequential machine structure as "augmented" and show that a SRIN is essentially an Augmented Synchronous Sequential Machine (ASSM). As an example, we learn a small six-switch SRIN. After training we extract the network's internal representation of the ASSM and corresponding SRIN.

--------------------------------------------------------------------------------

FTP INSTRUCTIONS

unix> ftp cs.umd.edu (128.8.128.8)
Name: anonymous
Password: (your_userid at your_site)
ftp> cd pub/pub/papers/TRs
ftp> binary
ftp> get 3226.ps.Z
ftp> quit
unix> uncompress 3226.ps.Z

---------------------------------------------------------------------------------
-- C.
Lee Giles / NEC Research Institute / 4 Independence Way
Princeton, NJ 08540 / 609-951-2642 / Fax 2482
==

From terry at salk.edu Fri Feb 25 12:59:53 1994
From: terry at salk.edu (Terry Sejnowski)
Date: Fri, 25 Feb 94 09:59:53 PST
Subject: NEURAL COMPUTATION 6:2
Message-ID: <9402251759.AA18225@salk.edu>

Neural Computation
March 1994  Volume 6  Issue 2

Article:

Hierarchical Mixtures of Experts and the EM Algorithm
    Michael I. Jordan and Robert A. Jacobs

Notes:

TD-Gammon, A Self-Teaching Backgammon Program, Achieves Master-Level Play
    Gerald Tesauro

Correlated Attractors from Uncorrelated Stimuli
    L.F. Cugliandolo

Letters:

Learning of Phase-lags in Coupled Neural Oscillators
    Bard Ermentrout and Nancy Kopell

A Mechanism for Neuronal Gain Control by Descending Pathways
    Mark E. Nelson

The Role of Weight Normalization in Competitive Learning
    Geoffrey J. Goodhill and Harry G. Barrow

A Probabilistic Resource Allocating Network for Novelty Detection
    Stephen Roberts and Lionel Tarassenko

Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistance to Local Minima
    William Finnoff

Relating Real-time Backpropagation and Back-propagation Through Time: An Application of Flow Graph Interreciprocity
    Francoise Beaufays and Eric A. Wan

Smooth On-line Learning Algorithms for Hidden Markov Models
    Pierre Baldi and Yves Chauvin

On Functional Approximation with Normalized Gaussian Units
    Michel Benaim

Statistical Physics, Mixtures of Distributions and the EM Algorithm
    Yuille, A.L., Stolorz, P., and Utans, J.

-----

SUBSCRIPTIONS - 1994 - VOLUME 6 - BIMONTHLY (6 issues)

______ $40  Student and Retired
______ $65  Individual
______ $166 Institution

Add $22 for postage and handling outside USA (+7% GST for Canada). (Back issues from Volumes 1-5 are regularly available for $28 each to institutions and $14 each for individuals. Add $5 for postage per issue outside USA; +7% GST for Canada.)

MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142.
Tel: (617) 253-2889  FAX: (617) 258-6779
e-mail: hiscox at mitvma.mit.edu

-----

From heger at Informatik.Uni-Bremen.DE Mon Feb 28 07:27:12 1994
From: heger at Informatik.Uni-Bremen.DE (Matthias Heger)
Date: Mon, 28 Feb 94 13:27:12 +0100
Subject: paper available
Message-ID: <9402281227.AA06748@Informatik.Uni-Bremen.DE>

FTP-host: ftp.gmd.de
FTP-filename: /Learning/rl/papers/heger.consider-risk.ps.Z

The file heger.consider-risk.ps.Z is now available for copying from the RL papers repository:

***************************************************
* Consideration of Risk in Reinforcement Learning *
***************************************************

(Revised submission to the 11th International Conference on Machine Learning (ML94), 15 pages)

Abstract
--------

Most Reinforcement Learning (RL) work regards as optimal those policies for sequential decision tasks that minimize the expected total discounted cost (e.g. Q-Learning [Wat 89], AHC [Bar Sut And 83]). On the other hand, it is well known that it is not always reliable and can be treacherous to use the expected value as a decision criterion [Tha 87]. A lot of alternative decision criteria have been suggested in decision theory to allow a more sophisticated consideration of risk, but most RL researchers have not concerned themselves with this subject until now. The purpose of this paper is to draw the reader's attention to the problems of the expected value criterion in Markov Decision Processes and to give Dynamic Programming algorithms for an alternative criterion, namely the Minimax criterion. A counterpart to Watkins' Q-Learning related to the Minimax criterion is presented. The new algorithm, called Q^-Learning (Q-hat-Learning), finds policies that minimize the >>worst-case<< total discounted costs. Most mathematical details aren't presented here but can be found in [Heg 94].

----------------------------------------------------------------------------

Here is an example of retrieving and printing the file:

-> ftp ftp.gmd.de
Connected to gmdzi.gmd.de.
220 gmdzi FTP server (Version 5.72 Fri Nov 20 20:35:05 MET 1992) ready.
Name (ftp.gmd.de:heger): anonymous
331 Guest login ok, send your email-address as password.
Password:
230-This is an experimental FTP Server. See /README for details.
    This site is in Germany, Europe. Please restrict downloads to our
    non-working hours (i.e outside of 08:00-18:00 MET, Mo-Fr)
    *** Local time is 12:25:22 MET
230 Guest login ok, access restrictions apply.
ftp> cd Learning/rl/papers
250 CWD command successful.
ftp> binary
200 Type set to I.
ftp> get heger.consider-risk.ps.Z
200 PORT command successful.
150 Opening BINARY mode data connection for heger.consider-risk.ps.Z (100477 bytes).
226 Transfer complete.
local: heger.consider-risk.ps.Z remote: heger.consider-risk.ps.Z
100477 bytes received in 3.2e+02 seconds (0.3 Kbytes/s)
ftp> quit
221 Goodbye.

-> uncompress heger.consider-risk.ps.Z
-> lpr heger.consider-risk.ps

-------------------------------------------------------------------------------

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Matthias Heger                                                +
+ Zentrum fuer Kognitionswissenschaften, Universitaet Bremen,   +
+ Postfach 330 440                                              +
+ D-28334 Bremen, Germany                                       +
+                                                               +
+ email: heger at informatik.uni-bremen.de                      +
+ Tel.: +49 (0) 421 218 4659                                    +
+ Fax: +49 (0) 421 218 3054                                     +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From gerda at ai.univie.ac.at Mon Feb 28 10:42:04 1994
From: gerda at ai.univie.ac.at (Gerda Helscher)
Date: Mon, 28 Feb 1994 16:42:04 +0100
Subject: EMCSR'94
Message-ID: <199402281542.AA23377@anif.ai.univie.ac.at>

After the general info which appeared in this mailing list recently about the

T W E L F T H   E U R O P E A N   M E E T I N G   O N
C Y B E R N E T I C S   A N D   S Y S T E M S   R E S E A R C H
( E M C S R ' 9 4 )

here is the detailed programme of Neural Network-related events:

Plenary Lecture by S t e p h e n   G r o s s b e r g :
"Neural Networks for Learning, Recognition and Prediction"
Wednesday, April 6, 9:00 a.m., University of Vienna, Main Building, Room 47

Symposium
A r t i f i c i a l   N e u r a l   N e t w o r k s   a n d   A d a p t i v e   S y s t e m s
Chairpersons: S.Grossberg, USA, and G.Dorffner, Austria
Tuesday, April 5, and Wednesday, April 6, Univ.
of Vienna, Main Building, Room 47 Tuesday, April 5: 14.00-14.30: Synchronization in a Large Neural Network of Phase Oscillators with the Central Element Y.Kazanovich, Russian Academy of Sciences, Moscow, Russia 14.30-15.00: Synchronization in a Neural Network Model with Time Delayed Coupling T.B.Luzyanina, Russian Academy of Sciences, Moscow, Russia 15.00-15.30: Reinforcement Learning in a Network Model of the Basal Ganglia R.M.Borisyuk, J.R.Wickens, R.Koetter, University of Otago, New Zealand Wednesday, April 6: 11.00-11.30: Adaptive High Performance Classifier Based on Random Threshold Neurons E.M.Kussul, T.N.Baidyk, V.V.Lukovich, D.A.Rachkovskij, Ukrainian Academy of Science, Kiev, Ukraine 11.30-12.00: Dynamics of Ordering for One-dimensional Topological Mappings R.Folk, A.Kartashov, University of Linz, Austria 12.00-12.30: Informational Properties of Willshaw-like Neural Networks Capable of Autoassociative Learning A.Kartashov, R.Folk, A.Goltsev, A.Frolov, University of Linz, Austria 12.30-13.00: Relaxing the Hyperplane Assumption in the Analysis and Modification of Back-propagation Neural Networks L.Y.Pratt, A.N.Christensen, Colorado School of Mines, Golden, CO, USA 14.00-14.30: Improving Discriminability Based Transfer by Modifying the IM Metric to Use Sigmoidal Activations L.Y.Pratt, V.I.Gough, Colorado School of Mines, Golden, CO, USA 14.30-15.00: Order-theoretic View of Families of Neural Network Architectures M.Holena, University of Paderborn, Germany 15.00-15.30: A New Class of Neural Networks: Recognition Invariant to Arbitrary Transformation Groups A.Kartashov, K.Erman, University of Linz, Austria 16.00-16.30: Neural Assembly Architecture for Texture Recognition A.Goltsev, A.Kartashov, R.Folk, University of Linz, Austria 16.30-17.00: A Neural System for Character Recognition on Isovalue Maps E.P.L.Passos, L.E.S.Varella, M.A.Santos, R.L.de Araujo, Engineering Military Institute, Rio de Janeiro, Brazil 17.00-17.30: Neurocomputing Model Inference for Nonlinear Signal Processing Z.Zografski, T.Durrani, University of Strathclyde, Glasgow, United Kingdom 17.30-18.00: Learning from Examples and VLSI Implementation of Neural Networks V.Beiu, J.A.Peperstraete, J.Vandewalle, R.Lauwereins, Catholic University of Leuven, Heverlee, Belgium For more information please contact: sec at ai.univie.ac.at From ZECCHINA at to.infn.it Mon Feb 28 13:22:01 1994 From: ZECCHINA at to.infn.it (Riccardo Zecchina - tel.11-5647358, fax. 11-5647399) Date: Mon, 28 Feb 1994 19:22:01 +0100 (WET) Subject: role of response functions in ANN's. Message-ID: <940228192201.20800db9@to.infn.it> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/zecchina.response.ps.Z The file zecchina.response.ps.Z is available for copying from the Neuroprose repository: "Response Functions Improving Performance in Analog Attractor Neural Networks" N .Brunel, R. Zecchina (13 pages, to appear in Phys. Rev. E Rapid Comm.) ABSTRACT: In the context of attractor neural networks, we study how the equilibrium analog neural activities, reached by the network dynamics during memory retrieval, may improve storage performance by reducing the interferences between the recalled pattern and the other stored ones. We determine a simple dynamics that stabilizes network states which are highly correlated with the retrieved pattern, for a number of stored memories that does not exceed $\alpha_{\star} N$, where $\alpha_{\star}\in[0,0.41]$ depends on the global activity level in the network and $N$ is the number of neurons.  
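To give a concrete picture of the kind of dynamics involved, the short Python sketch below simulates retrieval in an analog attractor network with Hebbian couplings; it uses a plain tanh response function rather than the optimized response functions derived in the paper, and the network size, gain and noise level are illustrative assumptions only.

# Rough sketch of memory retrieval in an analog attractor network with
# Hebbian couplings; uses a generic tanh response function, NOT the
# optimized response functions of the paper announced above.
import numpy as np

rng = np.random.default_rng(1)
N, P = 500, 10                                   # neurons, stored patterns
patterns = rng.choice([-1.0, 1.0], size=(P, N))  # random binary memories

W = patterns.T @ patterns / N                    # Hebbian coupling matrix
np.fill_diagonal(W, 0.0)                         # no self-coupling

# Start from a noisy version of pattern 0 and iterate the analog dynamics
# s <- f(W s) until the activities settle near a fixed point.
s = patterns[0] + 0.4 * rng.standard_normal(N)
for _ in range(50):
    s = np.tanh(2.0 * (W @ s))                   # gain of 2.0 is arbitrary

overlaps = patterns @ s / N                      # correlation with each memory
print("overlap with the retrieved pattern:", round(float(overlaps[0]), 3))
print("largest overlap with other patterns:", round(float(np.max(np.abs(overlaps[1:]))), 3))

At the fixed point the overlap with the retrieved memory is close to one while the residual overlaps with the other memories remain small; the paper analyses how a suitable choice of response function can reduce this cross-talk further and thereby improve storage performance.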
From andre at physics.uottawa.ca Mon Feb 28 12:13:53 1994 From: andre at physics.uottawa.ca (Andre Longtin) Date: Mon, 28 Feb 94 12:13:53 EST Subject: Hebb Symposium Message-ID: <9402281713.AA23088@miro.physics.uottawa.ca.physics.uottawa.ca> ******* Preliminary Announcement ******* THE FIELDS INSTITUTE FOR RESEARCH IN MATHEMATICAL SCIENCES HEBB SYMPOSIUM ON NEURONS AND BIOLOGICAL DYNAMICS Sunday, May 15 to Friday May 20, 1994 Koffler Pharmaceutical Center University of Toronto D.O. Hebb's classic, "The Organization of Behavior" published in 1949, sketched out how behavior might emerge from the properties of nerve cells and assemblies of nerve cells. This book was a landmark achievement in neurophysiological psychology. The modifiable synapse, discussed at length by Hebb and now known as the "Hebb synapse", was a lasting contribution. Hebb was from Nova Scotia and spent most of his professional life at McGill in the Psychology Department. We are having this symposium in his honor. Topics will range from cellular level to systems level, with an eye towards interesting dynamics and connections between dynamics and functions. We will bring together physiological and mathematical researchers with some didactic and research talks oriented towards graduate students and postdoctoral fellows. SCIENTIFIC PROGRAM: Lectures will be presented by Nancy Kopell (Boston University) and David Mumford (Harvard) in the Institute's Distinguished Lecture Series. Invited talks by Larry Abbott (Brandeis), *Moshe Abeles (Hebrew U., Jerusalem), Harold Atwood (U. Toronto), David Brillinger (Berkeley), Jos Eggermont (U. Calgary), Bard Ermentrout (U. Pittsburg), Leon Glass (McGill), Ilona Kovacs (Rutgers), Gilles Laurent (Caltech), Andre Longtin (U. Ottawa), Leonard Maler (U. Ottawa), Karl Pribram (Radford U.), Paul Rapp (Med. Coll. Penn.), John Rinzel (NIH), Mike Shadlin (Stanford), Matt Wilson (Tucson), Martin Wojtowicz (U. Toronto), Steve Zucker (McGill). Invited Attendees: Jose Segundo (UCLA), Alessandro Villa (Lausanne) The meeting will emphasize poster sessions as well as discussion groups where participants can give short oral presentations of their work. 
(*=tentative) TOPICS Larry Abbott: Population vectors and Hebbian learning Moshe Abeles: Information processing of synchronized activity Harold Atwood: Synaptic transmission and plasticity David Brillinger: Statistical analysis of neurophysiological data Jos Eggermont: Spatial and temporal interactions in auditory cortex Bard Ermentrout: Patterns in visual cortex Leon Glass: Nonlinear dynamics of neural networks Ilona Kovacs: Visual psychophysics/perceptual organization Gilles Laurent: Oscillations in olfaction Andre Longtin: Stochastic nonlinear dynamics of sensory transduction Leonard Maler: Bursting and recurrent feedback in electroreception Karl Pribram: Behavioral neurodynamics Paul Rapp: Dynamical characterization of neurological data John Rinzel: Thalamic rhythmogenesis in sleep and epilepsy Mike Shadlin: Analysis of visual motion Matt Wilson: Behaviorally induced changes in hippocampal connectivity Martin Wojtowicz: Membranes, channels and synapses Steve Zucker: Neural networks and visual computations IMPORTANT DATES: Monday April 11: Last date to return questionnaire Friday April 22: Cut-off for registrations and Deadline for hotel/residence booking Sunday May 15: Arrival and registration (9 am - 12 noon) Sunday May 15 to Friday May 20 Scientific program (ending Friday noon) INFORMATION ON SCIENTIFIC PROGRAM: David Brillinger (brill at stat.berkeley.edu) Andre Longtin (andre at physics.uottawa.ca) REGISTRATION AND ORGANIZATIONAL INFORMATION: To receive registration information, please fill out the questionnaire below and return it to: Sheri Albers The Fields Institute 185 Columbia St. W. Waterloo, Ontario, Canada N2L 5Z5 Phone: (519) 725-0096 Fax: (519) 725-0704 e-mail: hebb at fields.uwaterloo.ca ------------------------------------------------------------- ******* Questionnaire ******* TO BE COMPLETED BY ANYONE WISHING TO ATTEND THE HEBB SYMPOSIUM ON NEURONS AND BIOLOGICAL DYNAMICS Name: Institution: Department: Address: Phone: Fax: E-mail: I plan to attend: Yes ( ) No ( ) Maybe ( ) I plan to participate in the discussion groups: Yes ( ) No ( ) Maybe ( ) I plan to present a poster: Yes ( ) No ( ) Maybe ( ) Topic or tentative title: Arrival and departure dates (if other than May 14-20): FAX TO: (519)725-0704 or e-mail: hebb at fields.uwaterloo.ca