From hicks at cs.titech.ac.jp Sun Aug 1 16:14:14 1993 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Sun, 1 Aug 93 16:14:14 JST Subject: Multiple Models, Committee of nets etc... In-Reply-To: "Michael P. Perrone"'s message of Thu, 29 Jul 93 03:27:21 EDT <9307290727.AA19084@cns.brown.edu> Message-ID: <9308010714.AA25751@maruko.cs.titech.ac.jp> Michael P. Perrone writes >Tom Dietterich write: >> This analysis predicts that using a committee of very diverse >> algorithms (i.e., having diverse approximation errors) would yield >> better performance (as long as the committee members are competent) >> than a committee made up of a single algorithm applied multiple times >> under slightly varying conditions. > >and David Wolpert writes: >>There is a good deal of heuristic and empirical evidence supporting >>this claim. In general, when using stacking to combine generalizers, >>one wants them to be as "orthogonal" as possible, as Tom maintains. > >One minor result from my thesis shows that when the estimators are >orthogonal in the sense that > > E[n_i(x)n_j(x)] = 0 for all i<>j > >where n_i(x) = f(x) - f_i(x), f(x) is the target function, f_i(x) is >the i-th estimator and the expected value is over the underlying >distribution; then the MSE of the average estimator goes like 1/N >times the average of the MSE of the estimators where N is the number >of estimators in the population. > >This is a shocking result because all we have to do to get arbitrarily >good performance is to increase the size of our estimator population! >Of course in practice, the nets are correlated and the result is no >longer true. The matrix E[n_i(x)n_j(x)] may not be known but an estimate E[n'_i(x)n'_j(x)] may be obtained using some training data which is different from the training data used to train the generalizers in the first place. Here n'_i(x) = f'(x) - f_i(x), E[n'_i(x)] = 0, f'(x) is a training data, f_i(x) is the i-th estimator, and the expected value is over the training data. Take the eigenvectors (with non-zero eigenvalues) of E[n'_i(x)n'_j(x)] and you have a set of generalizers (each a linear combination of the original generalizers) which are orthogonal and uncorrelated over the training data, i.e. E[n'_i(x)n'_j(x)] = 0 for all i<>j. They can even be normalized by their eigenvalues so that E[n'_i(x)n'_j(x)] = 1 for all i==j. To summarize, in practice the generalizers can be de-correlated (to the extent that they are linearly independent) by finding new generalizers composed of appropriate linear sums of the originals. I have an unrelated comment regarding Drucker Harris' earlier mail about using synthetic data to improve performance. Wouldn't it be true to say that if you had a choice between learning with N synthetically created data and learning with N novel training data that the latter is, on average, going to give better results? If so, then using synth data is a way to stretch your training data; something like potato helper.  From jim at hydra.maths.unsw.EDU.AU Sun Aug 1 19:51:14 1993 From: jim at hydra.maths.unsw.EDU.AU (jim@hydra.maths.unsw.EDU.AU) Date: Mon, 2 Aug 93 09:51:14 +1000 Subject: committees Message-ID: <9308012351.AA15492@hydra.maths.unsw.EDU.AU> A small caveat about when it is good to average different estimates of an unknown quantity: If you have a fairly accurate and a fairly inaccurate way of estimating something, it is obviously not good to take their simple average (that is, half of one plus half of the other). 
The correct weighting of the estimates is in inverse proportion to their variances (that is, keep closer to the more accurate one). (At least, that is the correct weighting if the estimates are independent: if they are correlated, it is more complicated, but not much more). Proofs are easy, and included in the ref below:

R. Templeton & J. Franklin, `Adaptive information and animal behaviour', Evolutionary Theory 10 (Dec 1992): 145-155.

(Note that this concerns inaccurate estimates, not biased ones, as some previous posters have been considering). Of course, averaging and correlations are very easy calculations for neural nets.

Some similar ideas have been studied in connection with "sensor fusion" for robots:

Journal of Robotic Systems 7 (3): (1990), Special issue on multisensor integration and fusion for intelligent robots, ed. R.C. Luo.

Interesting work on how real committees combine information is reviewed in:

D. Bunn & G. Wright, `Interaction of judgemental and statistical forecasting methods: issues and analysis', Management Science 37 (1991): 501.

James Franklin
School of Mathematics
University of New South Wales

 From mpp at cns.brown.edu Mon Aug 2 17:54:29 1993
From: mpp at cns.brown.edu (Michael P. Perrone)
Date: Mon, 2 Aug 93 17:54:29 EDT
Subject: Multiple Models, Committee of nets etc...
Message-ID: <9308022154.AA00323@cns.brown.edu>

Joydeep Ghosh writes:
> in our experiments, the difference between simple averaging
> and the best among other arbitration mechanisms does not
> seem statistically significant, thus supporting Waibel and
> Hampshire's observations. The combination of
> networks trained on different feature vectors, on the other
> hand leads to 15-25% reduction in errors on a very difficult data set.

The result I discussed in a previous posting (that there is a 1/n relation between the MSE of the averaged estimator and the avg. population MSE) helps explain this result in the following terms: Averaging is more effective when the estimates are more distinct. Thus in the example that Joydeep gives, the fact that different features were used to generate different estimates suggests that those estimates will be distinct (unless the features carry the same information). Also, using fewer features lets us use smaller nets, which helps avoid problems like over-fitting and the curse of dimensionality.

-Michael
--------------------------------------------------------------------------------
Michael P. Perrone                              Email: mpp at cns.brown.edu
Institute for Brain and Neural Systems          Tel: 401-863-3920
Brown University                                Fax: 401-863-3934
Providence, RI 02912

 From mpp at cns.brown.edu Tue Aug 3 01:45:18 1993
From: mpp at cns.brown.edu (Michael P. Perrone)
Date: Tue, 3 Aug 93 01:45:18 EDT
Subject: Committees
Message-ID: <9308030545.AA01131@cns.brown.edu>

David Wolpert writes:
--> Many of the results in the literature which appear to dispute this
--> are simply due to use of an error function which is not restricted to
--> being off-training set. In other words, there's always a "win"
--> if you perform rationally on the training set (e.g., reproduce it
--> exactly, when there's no noise), if your error function gives you
--> points for performing rationally on the training set. In a certain
--> sense, this is trivial, and what's really interesting is off-training
--> set behavior. In any case, this automatic on-training set win is all
--> those aforementioned results refer to; in particular, they imply essentially
--> nothing concerning performance off of the training set.

In the case of averaging for MSE optimization (the meat and potatoes of neural networks) and any other convex measure, the improvement due to averaging is independent of the distribution - on-training or off-. It depends only on the topology of the optimization measure.

It is important to note that this result does NOT say the average is better than any individual estimate - only better than the average population performance. For example, if one had a reliable selection criterion for deciding which element of a population of estimators was the best and that estimator was better than the average estimator, then just choose the better one. (Going one step further, simply use the selection criterion to choose the best estimator from all possible weighted averages of the elements of the population.)

As David Wolpert pointed out, any estimator can be confounded by a pathological data sample and therefore there doesn't exist a *guaranteed* method for deciding which estimator is the best from a population in all cases. Weak (as opposed to guaranteed) selection criteria exist in the form of cross-validation (in all of its flavors). Coupling cross-validation with averaging is a good idea since one gets the best of both worlds, particularly for problems with insufficient data.

I think that another very interesting direction for research (as David Wolpert alluded to) is the investigation of more reliable selection criteria.

-Michael

 From bernasch at forwiss.tu-muenchen.de Tue Aug 3 03:41:45 1993
From: bernasch at forwiss.tu-muenchen.de (Jost Bernasch)
Date: Tue, 3 Aug 1993 09:41:45 +0200
Subject: weighting of estimates
In-Reply-To: jim@hydra.maths.unsw.EDU.AU's message of Mon, 2 Aug 93 09:51:14 +1000 <9308012351.AA15492@hydra.maths.unsw.EDU.AU>
Message-ID: <9308030741.AA29386@forwiss.tu-muenchen.de>

James Franklin writes:
> If you have a fairly accurate and a fairly inaccurate way of estimating
> something, it is obviously not good to take their simple average (that
> is, half of one plus half of the other). The correct weighting of the
> estimates is in inverse proportion to their variances (that is, keep
> closer to the more accurate one).

Of course this is the correct weighting. Since the 60s this has been done very successfully with the well-known "Kalman Filter". In this theory the optimal combination of knowledge sources is described and proved in detail. See the original work

@article{Kalman:60,
  AUTHOR  = {R.E. Kalman},
  TITLE   = "A New Approach to Linear Filtering and Prediction Problems.",
  VOLUME  = 12,
  number  = 1,
  PAGES   = {35--45},
  JOURNAL = "Trans. ASME, series D, J. Basic Eng.",
  YEAR    = 1960
}

some neural network literature concerning this subject

@Article{WatanabeTzafestas:90,
  author   = "Watanabe and Tzafestas",
  title    = "Learning Algorithms for Neural Networks with the Kalman Filter",
  journal  = JIRS,
  year     = 1990,
  volume   = 3,
  number   = 4,
  pages    = "305-319",
  keywords = "kalman, neural net"
}
@string{JIRS = {Journal of Intelligent and Robotic Systems}}

and a very good and practice oriented book

@book{Gelb:74,
  AUTHOR    = "A. Gelb",
  TITLE     = "Applied {O}ptimal {E}stimation",
  PUBLISHER = "{M.I.T} {P}ress, {C}ambridge, {M}assachusetts",
  YEAR      = "1974"
}

(At least, that is the correct >weighting if the estimates are independent: if they are correlated, >it is more complicated, but not much more).
Proofs are easy, and included >in the ref below: For proofs and extensions to non-linear filtering and correlated weights see the control theory literature. A lot of work is already done! -- Jost Jost Bernasch Bavarian Research Center for Knowledge-Based Systems Orleansstr. 34, D-81667 Muenchen , Germany bernasch at forwiss.tu-muenchen.de  From edelman at wisdom.weizmann.ac.il Tue Aug 3 16:23:11 1993 From: edelman at wisdom.weizmann.ac.il (Edelman Shimon) Date: Tue, 3 Aug 93 23:23:11 +0300 Subject: TR on representation with receptive fields available Message-ID: <9308032023.AA23457@wisdom.weizmann.ac.il> The following TR is available via anonymous ftp from eris.wisdom.weizmann.ac.il (132.76.80.53), as /pub/rfs-for-recog.ps.Z Representation with receptive fields: gearing up for recognition Weizmann Institute CS-TR 93-09 Yair Weiss and Shimon Edelman Abstract: Receptive fields are probably the most prominent and ubiquitous computational mechanism employed by biological information processing systems. We report an attempt to understand the representational capabilities of the kind of receptive fields found in mammalian vision motivated by the assumption that the successive stages of processing remap the retinal representation space in a manner that makes objectively similar stimuli (e.g., different views of the same 3D object) closer to each other, and dissimilar stimuli farther apart. We present theoretical analysis and computational experiments that compare the similarity between stimuli as they are represented at the successive levels of the processing hierarchy, from the retina to the nonlinear cortical units. Our results indicate that population-based codes do convey information that seems lost in the activities of the individual receptive fields, and that at the higher levels of the hierarchy objects may be represented in a form that is more useful for visual recognition. This finding may, therefore, explain the success of previous empirical approaches to object recognition that employed representation by localized receptive fields.  From jim at hydra.maths.unsw.EDU.AU Wed Aug 4 02:32:13 1993 From: jim at hydra.maths.unsw.EDU.AU (jim@hydra.maths.unsw.EDU.AU) Date: Wed, 4 Aug 93 16:32:13 +1000 Subject: weighting of estimates Message-ID: <9308040632.AA07933@hydra.maths.unsw.EDU.AU> bernasch at forwiss.tu-muenchen.de (Jost Bernasch) writes: >James Franklin writes: >> If you have a fairly accurate and a fairly inaccurate way of estimating >>something, it is obviously not good to take their simple average (that >>is, half of one plus half of the other). The correct weighting of the >>estimates is in inverse proportion to their variances (that is, keep >>closer to the more accurate one). > >Of course this is the correct weighting. Since the 60s this is done >very succesfully with the well-known "Kalman Filter". In this theory >the optimal combination of knowledge sources is described and >proofed in detail. Well, yes, in a way, but that's something like saying that the motion of your body can be derived from Einstein's equations of General Relativity. Too complicated. In particular, Kalman filters, and control theory generally, are about time-varying entities, and Kalman filters are an (essentially Bayesian) way of successively updating estimates of a (possibly time-varying) quantity (See R.J. Meinhold & N.D. Singpurwalla, `Understanding the Kalman filter', American Statistician 37 (1983): 123). 
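For concreteness, here is a minimal sketch (Python, with made-up numbers; none of this code comes from the thread) showing that for a single static quantity, one scalar Kalman-style measurement update is algebraically the same thing as the inverse-variance weighting under discussion:

# Sketch: combining two independent, unbiased estimates x1, x2 of the same
# scalar quantity, with variances v1 and v2.  The inverse-variance combination
# is identical to a single Kalman measurement update applied to a static state.
# All numbers below are illustrative.

def inverse_variance_combine(x1, v1, x2, v2):
    """Minimum-variance combination of two independent estimates."""
    w1 = (1.0 / v1) / (1.0 / v1 + 1.0 / v2)
    w2 = 1.0 - w1
    x = w1 * x1 + w2 * x2
    v = 1.0 / (1.0 / v1 + 1.0 / v2)   # combined variance is smaller than either
    return x, v

def kalman_update(x_prior, v_prior, z, r):
    """One scalar Kalman measurement update: prior (x_prior, v_prior),
    measurement z with noise variance r."""
    k = v_prior / (v_prior + r)        # Kalman gain
    x_post = x_prior + k * (z - x_prior)
    v_post = (1.0 - k) * v_prior
    return x_post, v_post

if __name__ == "__main__":
    x1, v1 = 10.2, 1.0    # fairly accurate estimate
    x2, v2 = 12.0, 9.0    # fairly inaccurate estimate
    print(inverse_variance_combine(x1, v1, x2, v2))   # (10.38, 0.9), weighted toward x1
    print(kalman_update(x1, v1, x2, v2))              # same numbers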
The situation I was considering, and what is relevant to committees, is much simpler (hence more general): how to combine estimates (possibly correlated) of a single unknown quantity. James Franklin Mathematics University of New South Wales  From Graham.Lamont at newcastle.ac.uk Wed Aug 4 12:41:36 1993 From: Graham.Lamont at newcastle.ac.uk (Graham Lamont) Date: Wed, 4 Aug 93 12:41:36 BST Subject: multiple models, hybrid estimation Message-ID: When I emailled Wray Buntine about his original posting on the subject of multiple models, I quipped: `Shhh.... dont tell everyone, they'll all want one!' (a multiple model) Little did I know everyman and his dog appears to have one already:) The recent postings and especially Michael Perrone's recent contribution(s) have persuaded me to sketch the extent of my work in this area and donate a FREE piece of Mathematica code. I mention Michael's work because it follows the same basic approach of general least squares as mine, and I agree with many of the points that he raises in his general discussion of hybrid estimation, such as the need for a completely general method, the utility of a closed form solution, and his novel description of distinct local minima in functional space as opposed to parameter space. However..... he says that for his method (GEM): >> 7) The *optimal* parameters of the ensemble estimator are given in closed >> form. I present a method in the same general spirit of Michael's that is slightly more optimal and general (and I am not claiming even this is the best!). It is based on the unconstrained least squares of the estimator poulation "design matrix" via SVD. 1 Generality: The technique utilises singular value decomposition (SVD), and hence avoids the problem of collinearity between estimators that can (and often does) occur in a population of estimators as mentioned by Michael. SVD happily copes with highly collinear or even duplicate estimators in the design matrix, without preprocessing/thresholding. 2 Optimality: The technique places no constraint on the value of the weights (MP [1] has sum=1 and also in the results he presents all w are 0 This seemed relevant. Please excuse the bandwidth if it's not. jim From: IN%"DFP10 at ALBANY.ALBANY.EDU" "Donald F. Parsons MD" 3-AUG-1993 05:38:36.51 To: IN%"hspnet-l at albnydh2.bitnet" "Rural Hospital Consulting Network" CC: Subj: Call for Papers: AIM-94 Spring Symposium ----------------------------Original message---------------------------- Call for Papers AAAI 1994 Spring Symposium: Artificial Intelligence in Medicine: Interpreting Clinical Data (March 21-23, 1994, Stanford University, Stanford, CA) The deployment of on-line clinical databases, many supplanting the traditional role of the paper patient chart, has increased rapidly over the past decade. The consequent explosion in the quality and volume of available clinical data, along with an ever more stringent medicolegal obligation to remain aware of all implications of these data, has created a substantial burden for the clinician. The challenge of providing intelligent tools to help clinicians monitor patient clinical courses, forecast likely prognoses, and discover new relational knowledge, is at least as large as that generated by the knowledge explosion which motivated earlier efforts in Artificial Intelligence in Medicine (AIM). 
Whereas many of the pioneering programs worked on small data sets which were entered interactively by knowledge engineers or clinicians, the current generation of programs have to act on raw data, unfiltered and unmediated by human beings. Interaction with human users typically only occurs on demand or on detection of clinically significant events. The emphasis of this symposium will be on methodologies that provide robust autonomous performance in data-rich clinical environments ranging from busy outpatient practices to operating rooms and intensive care units. Relevant topics include intelligent alarming (including anticipation and prevention of adverse clinical events), data abstraction, sensor validation, preliminary event classification, therapy advice, critiquing, and assistance in the establishment and execution of clinical treatment protocols. Detection of temporal and geographical patterns of disease manifestations and machine learning of clinical patterns are also of interest. Organizing committee Serdar Uckun, Co-chair (Stanford University) Isaac Kohane, Co-chair (Harvard Medical School) Enrico Coiera (Hewlett-Packard Laboratories/Bristol) Ramesh Patil (USC/Information Sciences Institute) Mario Stefanelli (Universita di Pavia) Format A large data sample will be made available to participants to serve as training and test sets for various approaches to information management and to provide a common domain of discourse. The sample will consist of two data sets: * A dense, high volume data set typical of a critical care environment. This data set will consist of hemodynamic measurements, mechanical ventilator settings, laboratory values including arterial blood gas measurements, and treatment information covering a 12-hour period of a patient with severe respiratory distress. Monitored parameters (10-15 channels of data) will be sampled and recorded at rates up to 1/10 Hz. The data set will be annotated with other clinically relevant data, physician's interpretations, and established diagnoses. * A large number of sparse data sets representative of outpatient environments. The data will include laboratory measurements, treatment information, and physical findings on a large sample of patients (50 to 100 patients) taken from the same disorder population. Each patient record will consist of several weeks' or months' worth of clinical information sampled at irregular intervals. Most of the cases will be made available to interested researchers to be used as training cases. For interested parties, a small percentage of cases will be made available two weeks prior to the symposium to be used as an optional testing set for various approaches. The data samples and accompanying clinical information will be available via ftp or e-mail server around August 15, 1993. Please contact the organizers at the addresses below for further information. The data will also be made available on diskettes to participants who do not have Internet access. It will be left to the discretion of the participants to use any subset of these samples to help focus their approaches and presentations. The data can also be used as test vehicles for their own research and to create sample programs for demonstration at the symposium. Participants do not have to use the data in order to participate. However, the program committee will favor presentations which exploit the provided data sets in their analyses. 
Submission process Potential participants are invited to submit abstracts no longer than 2 pages (< 1200 words) by October 15, 1993. The abstracts should outline methodology and indicate, if applicable, how the provided data may be used as a proof-of-principle for the discussed methodology. Electronic submissions are encouraged. The abstracts may be sent to in ASCII, RTF, or PostScript formats. Authors of accepted abstracts will be asked to submit a working paper by January 31, 1994. They will also be asked to prepare either a poster or an oral presentation. Submissions by mail Use this method ONLY IF you cannot submit an abstract electronically. Fax submissions will not be accepted. Send 6 copies of the abstract to: Serdar Uckun, MD, PhD Co-chair, AIM-94 Knowledge Systems Laboratory Stanford University 701 Welch Road, Bldg. C Palo Alto, CA 94304 U.S.A. Phone: [+1] (415) 723-1915 Calendar Abstracts due: October 15, 1993 Notification of authors by: November 15, 1993 Working papers due: January 31, 1994 Spring Symposium: March 21-23, 1994 Information For further information, please contact the co-chairs at the address above or (preferably) via e-mail at:  From hicks at cs.titech.ac.jp Thu Aug 5 11:33:00 1993 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Thu, 5 Aug 93 11:33:00 JST Subject: weighting of estimates In-Reply-To: Jost Bernasch's message of Tue, 3 Aug 1993 09:41:45 +0200 <9308030741.AA29386@forwiss.tu-muenchen.de> Message-ID: <9308050233.AA29633@maruko.cs.titech.ac.jp> Jost Bernasch writes: > >James Franklin writes: > > If you have a fairly accurate and a fairly inaccurate way of estimating > >something, it is obviously not good to take their simple average (that > >is, half of one plus half of the other). The correct weighting of the > >estimates is in inverse proportion to their variances (that is, keep > >closer to the more accurate one). > >Of course this is the correct weighting. Since the 60s this is done >very succesfully with the well-known "Kalman Filter". In this theory >the optimal combination of knowledge sources is described and >proofed in detail. > > (At least, that is the correct > >weighting if the estimates are independent: if they are correlated, > >it is more complicated, but not much more). Proofs are easy, and included > >in the ref below: > >For proofs and extensions to non-linear filtering and correlated >weights see the control theory literature. A lot of work is already >done! I think the comments about the Kalman filter are a bit off the mark. The Kalman filter is based on the mathematics of conditional expectation. However, the Kalman filter is designed to be used for time series. What makes the Kalman filter particularly useful is its recursive nature; a stream of observations may be processed (often in real time) to produce a stream of current estimates (or next estimates if you're trying to beat the stock market). Committees of networks may also use conditional expectation, but combining networks is not the same as processing time series of data. I think it is appropriate at this point to bring up 2 classical results concerning probability theory, conditional expectation, and wide sense conditional expectation. (Wide sense conditional expectation uses the same formulas as conditional expectation. "Wide sense' merely serves to emphasize that the distribution is not assumed to be normal. 'Conditional expectation' is used in the case where the underlying distribution is assumed to be normal.) 
(1) When the objective function is to minimize the mean squared error over the training data, the wide sense conditional expectation is the best linear predictor, regardless of the original distribution.

(2) If the original distribution is normal, and the objective function is to minimize the MSE over the >entire< distribution (both on-training and off-training), then the conditional expectation is the best predictor, linear or otherwise.

There are 3 important factors here.
[1]: Underlying distribution (of network outputs): normal? not normal?
[2]: Objective function (assume MSE): on-training? off-training?
[3]: Predictor: linear? non-linear?

{1} [1:normal] => [2:off-training],[3:linear]
Neural nets (as opposed to systolic arrays) are needed because the world is full of non-normal distributions. But that doesn't mean that the outputs of non-linear networks don't have joint normal distributions (over off-training data). Perhaps the non-linearities have been successfully ironed out by the non-linear networks, leaving only linear (or nearly linear) errors to be corrected. In that case we can refer to result (2) to build the optimal off-training predictor for the given committee of networks.

{2} [1:not normal] and [2:on-training] and [3:linear] => best predictor is WSE
If the distribution of network outputs is not normal, and we use an on-training criterion, then by virtue of (1), the best linear predictor is the wide sense conditional expectation.

{3} [1:not normal] and [2:off-training] and [3:non-linear] => research
In case {2}, since [1:not normal], <1> better on-training results may be obtained using some non-linear predictor, <2> better on- or off-training results may be obtained using some different criterion, or <3> both <1> and <2> together. The problem is of course to find such criteria and non-linear predictors. The existence of a priori knowledge can play an important role here; for example, adding a term to penalize the complexity of output functions.

In conclusion, if {1} is true, that is, the networks have captured the non-linearities and the network outputs have joint normal (or nearly normal) distributions, we're home free. Otherwise we ought to think about {3}, non-linear predictors and alternative criteria. Option {2}, using the WSE, the best performing linear predictor in terms of MSE on the training data, is useful to get the job done, but is only optimal in a limited sense.

Craig Hicks           hicks at cs.titech.ac.jp
Ogawa Laboratory, Dept. of Computer Science
Tokyo Institute of Technology, Tokyo, Japan
lab: 03-3726-1111 ext. 2190    home: 03-3785-1974
fax: +81(3)3729-0685 (from abroad), 03-3729-0685 (from Japan)

 From mpp at cns.brown.edu Thu Aug 5 15:13:44 1993
From: mpp at cns.brown.edu (Michael P. Perrone)
Date: Thu, 5 Aug 93 15:13:44 EDT
Subject: committees
Message-ID: <9308051913.AA13266@cns.brown.edu>

Scott Farrar writes:
--> John Hampshire characterized a committee as a collection of biased
--> estimators; the idea being that a collection of many different kinds of
--> bias might constitute an unbiased estimator. I was wondering if anyone
--> had any ideas about how this might be related to, supported by, or refuted
--> by the Central Limit Theorem. Could experimental variances or confounds
--> be likened to "biases", and if so, do these "average out" in a manner which
--> can give us a useful mean or useful estimator?
I think that this is a very interesting point because, for averaging with MSE optimization, it is possible to show using the strong law of large numbers that the bias of the average estimator converges to the expected bias of any individual estimator while the variance converges to zero. Thus the only way to cancel existing bias using averaging is to average two (or more) different populations from two (or more) estimators which are (somehow) known to have complementary bias. The trick is of course the "somehow"... Any ideas? -Michael -------------------------------------------------------------------------------- Michael P. Perrone Email: mpp at cns.brown.edu Institute for Brain and Neural Systems Tel: 401-863-3920 Brown University Fax: 401-863-3934 Providence, RI 02912  From wray at ptolemy.arc.nasa.gov Thu Aug 5 19:37:42 1993 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Thu, 5 Aug 93 16:37:42 PDT Subject: committees In-Reply-To: "Michael P. Perrone"'s message of Thu, 5 Aug 93 15:13:44 EDT <9308051913.AA13266@cns.brown.edu> Message-ID: <9308052337.AA04745@ptolemy.arc.nasa.gov> I'm not convinced that the notion of an "unbiased estimator" is useful here. It comes from classical statistics and is really a means of justifying the choice of an estimator for lack of better ideas. An estimator is "unbiased" if the average of the estimator based on all the other samples which we might have seen (but didn't) is equal to the "truth". Notice that unbiased estimators and the use of Occam's razor conflict. We all routinely throw away an "unbiased" neural network, i.e. the best fitting network, in favor of a smoother, simpler network, i.e. by early stopping, weight decay, ...., which is very clearly "biased". So I think its a great thing to be biased. One reason for averaging is because we have several quite different biased networks that we think are reasonable, so like any good gambler, we hedge our bets. Of course, averaging is also standard Bayesian practice, i.e. an obvious result of the mathematics. ---------- Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 269-2 fax: (415) 604 3594 Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov  From cohn at psyche.mit.edu Fri Aug 6 11:28:11 1993 From: cohn at psyche.mit.edu (David Cohn) Date: Fri, 6 Aug 93 11:28:11 EDT Subject: Call for Participation: Workshop on Exploration Message-ID: <9308061528.AA06177@psyche.mit.edu> I am helping organize the following one-day workshop during the post-NIPS workshops in Vail, Colorado, on December 3, 1993. We would like to hear from people interested in participating in the workshop, either formally, as a presenter, or informally, as an attendee. Even if you will not be able to attend, if you have work which you feel is relevant, and would like to see discussed, please contact me at the email address below. Given the limited time available, we will not be able to present *every* approach, but we hope to cover a broad range of approaches, both in formal presentations, and in informal discussion, Many thanks in advance, -David Cohn (cohn at psyche.mit.edu) ====================== begin workshop announcement ===================== Robot Learning II: Exploration and Continuous Domains A NIPS '93 Workshop David Cohn Dept. of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02138 cohn at psyche.mit.edu The goal of this one-day workshop will be to provide a forum for researchers active in the area of robot learning and related fields. 
Due to the limited time available, we will focus on two major issues: efficient exploration of a learner's state space, and learning in continuous domains. Robot learning is characterized by sensor noise, control error, dynamically changing environments and the opportunity for learning by experimentation. A number of approaches, such as Q-learning, have shown great practical utility learning under these difficult conditions. However, these approaches have only been proven to converge to a solution if all states of a system are visited infinitely often. What has yet to be determined is whether we can efficiently explore a state space so that we can learn without having to visit every state an infinite number of times, and how we are to address problems on continuous domains, where there are effectively an infinite number of states to be visited. This workshop is intended to serve as a followup to last year's post-NIPS workshop on robot learning. The two problems to be addressed this year were identified as two (of the many) crucial issues facing the field. The morning session of the workshop will consist of short presentations discussing theoretical approaches to exploration and to learning in continuous domains, followed by general discussion guided by a moderator. The afternoon session will center on practical and/or heuristic approaches to these problems in the same format. As time permits, we may also attempt to create an updated "Where do we go from here?" list, like that drawn up in last year's workshop. The targeted audience for the workshop are those researchers who are interested in robot learning, exploration, or active learning in general. We expect to draw an eclectic audience, so every attempt will be made to ensure that presentations are accessible to people without any specific background in the field.  From sontag at control.rutgers.edu Fri Aug 6 17:35:16 1993 From: sontag at control.rutgers.edu (Eduardo Sontag) Date: Fri, 6 Aug 93 17:35:16 EDT Subject: Expository Tech Report on Neural Nets Available by FTP Message-ID: <9308062135.AA06104@control.rutgers.edu> As notes for a short course given at the 1993 European Control Conference this summer, I prepared an expository introduction to two related topics: 1. Some mathematical results on "neural networks". 2. "Neurocontrol" and "learning control". The choice of topics was heavily influenced by my interests, but some readers may still find the material useful. The two parts are essentially independent. In particular, the part on mathematical results does not require any knowledge of (nor interest in) control theory. An *extended* version of the paper which appeared in the conference proceedings is now available as a tech report. This report, in postscript form, can be obtained by anonymous FTP. Retrieval instructions are as follows: yourhost> ftp siemens.com Connected to siemens.com. 220 siemens FTP server (SunOS 4.1) ready. Name (siemens.com:sontag): anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> cd pub/learning/TechReports 250 CWD command successful. ftp> bin 200 Type set to I. ftp> get Sontag9302.ps.Z 200 PORT command successful. 150 Binary data connection for Sontag9302.ps.Z (128.6.62.9,1600) (114253 bytes) 226 Binary Transfer complete. local: Sontag9302.ps.Z remote: Sontag9302.ps.Z 114253 bytes received in 24 seconds (4.6 Kbytes/s) ftp> quit 221 Goodbye. 
yourhost> uncompress Sontag9302.ps.Z yourhost> lpr Sontag9302.ps (or however you print PostScript) ****** Please note: I am not able to send hardcopy. ****** -- Eduardo D. Sontag  From liaw%dylink.usc.edu at usc.edu Fri Aug 6 18:45:39 1993 From: liaw%dylink.usc.edu at usc.edu (Jim Liaw) Date: Fri, 6 Aug 93 15:45:39 PDT Subject: Workshop on Neural Architectures and Distributed AI Message-ID: <9308062245.AA23804@dylink.usc.edu> Please note the change in deadline of submission of abstracts. ------ The Center for Neural Engineering University of Southern California announces a Workshop on Neural Architectures and Distributed AI: >From Schema Assemblages to Neural Networks October 19-20, 1993 [This Workshop was previously scheduled for April 1993] Program Committee: Michael Arbib (Organizer), George Bekey, Damian Lyons, Paul Rosenbloom, and Ron Sun To design complex technological systems, we need a multilevel methodology which combines a coarse- grain analysis of cooperative or distributed computation (we shall refer to the computing agents at this level as "schemas") with a fine-grain model of flexible, adaptive computation (for which neural networks provide a powerful general paradigm). Schemas provide a language for distributed artificial intelligence and perceptual robotics which is "in the style of the brain", but at a relatively high level of abstraction relative to neural networks. We seek (both at the level of schema asemblages, and in terms of "modular" neural networks) a distributed model of computation, supporting many concurrent activities for recognition of objects, and the planning and control of different activities. The use, representation, and recall of knowledge is mediated through the activity of a network of interacting computing agents which between them provide processes for going from a particular situation and a particular structure of goals and tasks to a suitable course of action. This action may involve passing of messages, changes of state, instantiation to add new schema instances to the network, deinstantiation to remove instances, and may involve self-modification and self- organization. Schemas provide a form of knowledge representation which differs from frames and scripts by being of a finer granularity. Schema theory is generative: schemas may well be linked to others to provide yet more comprehensive schemas, whereas frames tend to "build in" from the overall framework. The analysis of interacting computing agents (the schema instances) is intermediate between the overall specification of some behavior and the neural networks that subserve it. The Workshop will focus on different facets of this multi-level methodology. While the emphasis will be on technological systems, papers will also be accepted on biological and cognitive systems. 
Submission of Papers A list of sample topics for contributions is as follows, where a hybrid approach means one in which the abstract schema level is integrated with neural or other lower level models: Schema Theory as a description language for neural networks Modular neural networks Alternative paradigms for modeling symbolic and subsymbolic knowledge Hierarchical and distributed representations: adaptation and coding: Linking DAI to Neural Networks to Hybrid Architecture Formal Theories of Schemas Hybrid approaches to integrating planning & reaction Hybrid approaches to learning Hybrid approaches to commonsense reasoning by integrating neural networks and rule-based reasoning (using schemas for the integration) Programming Languages for Schemas and Neural Networks Schema Theory Applied in Cognitive Psychology, Linguistics, and Neuroscience Prospective contributors should send a five-page extended abstract, including figures with informative captions and full references - a hard copy, either by regular mail or fax - by August 30, 1993 to Michael Arbib, Center for Neural Engineering, University of Southern California, Los Angeles, CA 90089-2520, USA [Tel: (213) 740-9220, Fax: (213) 746-2863, arbib at pollux.usc.edu]. Please include your full address, including fax and email, on the paper. In accepting papers submitted in response to this Call for Papers, preference will be given to papers which present practical examples of, theory of, and/or methodology for the design and analysis of complex systems in which the overall specification or analysis is conducted in terms of a network of interacting schemas, and where some but not necessarily all of the schemas are implemented in neural networks. Papers which present a single neural network for pattern recognition ("perceptual schema") or pattern generation ("motor schema") will not be accepted. It is the development of a methodology to analyze the interaction of multiple functional units that constitutes the distinctive thrust of this Workshop. Notification of acceptance or rejection will be sent by email no later than September 1, 1993. There are currently no plans to issue a formal proceedings of full papers, but (revised versions) of accepted abstracts received prior to October 1, 1993 will be collected with the full text of the Tutorial in a CNE Technical Report which will be made available to registrants at the start of the meeting. A number of papers have already been accepted for the Workshop. 
These include the following: Arbib: Schemas and Neural Networks: A Tutorial Introduction to Integrating Symbolic and Subsymbolic Approaches to Cooperative Computation Arkin: Reactive Schema-based Robotic Systems: Principles and Practice Heenskerk and Keijzer: A Real-time Neural Implementation of a Schema Driven Toy-Car Leow and Miikkulainen, Representing and Learning Visual Schemas in Neural Networks for Scene Analysis Lyons & Hendriks: Describing and analysing robot behavior with schema theory Murphy, Lyons & Hendriks: Visually Guided Multi- Fingered Robot Hand Grasping as Defined by Schemas and a Reactive System Sun: Neural Schemas and Connectionist Logic: A Synthesis of the Symbolic and the Subsymbolic Weitzenfeld: Hierarchy, Composition, Heterogeneity, and Multi-granularity in Concurrent Object-Oriented Programming for Schemas and Neural Networks Wilson & Hendler: Neural Network Software Modules Bonus Event: The CNE Research Review: Monday, October 18, 1993 The CNE Review will present a day-long sampling of CNE research, with talks by faculty, and students, as well as demos of hardware and software. Special attention will be paid to talks on, and demos in, our new Autonomous Robotics Lab and Neuro-Optical Computing Lab. Fully paid registrants of the Workshop are entitled to attend the CNE Review at no extra charge. Registration The registration fee of $150 ($40 for qualified students who include a "certificate of student status" from their advisor) includes a copy of the abstracts, coffee breaks, and a dinner to be held on the evening of October 18th. Those wishing to register should send a check payable to "Center for Neural Engineering, USC" for $150 ($40 for students and CNE members) together with the following information to Paulina Tagle, Center for Neural Engineering, University of Southern California, University Park, Los Angeles, CA 90089-2520, USA. --------------------------------------------------- SCHEMAS AND NEURAL NETWORKS Center for Neural Engineering, USC October 19-20, 1993 NAME: ___________________________________________ ADDRESS: _________________________________________ PHONE NO.: _______________ FAX:___________________ EMAIL: ___________________________________________ I intend to submit a paper: YES [ ] NO [ ] I wish to be registered for the CNE Research Review: YES [ ] NO [ ] Accommodation Attendees may register at the hotel of their choice, but the closest hotel to USC is the University Hilton, 3540 South Figueroa Street, Los Angeles, CA 90007, Phone: (213) 748-4141, Reservation: (800) 872-1104, Fax: (213) 7480043. A single room costs $70/night while a double room costs $75/night. Workshop participants must specify that they are "Schemas and Neural Networks Workshop" attendees to avail of the above rates. Information on student accommodation may be obtained from the Student Chair, Jean-Marc Fellous, fellous at pollux.usc.edu.  From sims at pdesds1.scra.org Mon Aug 9 07:39:31 1993 From: sims at pdesds1.scra.org (Jim Sims) Date: Mon, 9 Aug 93 07:39:31 EDT Subject: [fwd: jbeard@aip.org: bee learning in Nature] Message-ID: <9308091139.AA02487@pdesds1.noname> Some cross-disciplinary, ah, pollination. 
jim From greiner at learning.siemens.com Mon Aug 9 14:59:26 1993 From: greiner at learning.siemens.com (Russell Greiner) Date: Mon, 9 Aug 93 14:59:26 EDT Subject: CLNL'93 Schedule Message-ID: <9308091859.AA05371@eagle.siemens.com> *********************************************************** * CLNL'93 -- Computational Learning and Natural Learning * * Provincetown, Massachusetts * * 10-12 September 1993 * *********************************************************** CLNL'93 is the fourth of an ongoing series of workshops designed to bring together researchers from a diverse set of disciplines --- including computational learning theory, AI/machine learning, connectionist learning, statistics, and control theory --- to explore issues at the intersection of theoretical learning research and natural learning systems. The schedule of presentations appears below, followed by logistics and information on registration ================ ** CLNL'93 Schedule (tentative) ** ======================= Thursday 9/Sept/93: 6:30-9:00 (optional) Ferry (optional): Boston to Provincetown [departs Boston Harbor Hotel, 70 Rowes Wharf on Atlantic Avenue] Friday 10/Sept/93 [CLNL meetings, at Provincetown Inn] 9 - 9:15 Opening remarks 9:15-10:15 Scaling Up Machine Learning: Practical and Theoretical Issues Thomas Dietterich [Oregon State Univ] (invited talk, see abstract below) 10:30-12:30 Paper session 1 What makes derivational analogy work: an experience report using APU Sanjay Bhansali [Stanford]; Mehdi T. Harandi [Univ of Illinois] Scaling Up Strategy Learning: A Study with Analogical Reasoning Manuela M. Veloso [CMU] Learning Hierarchies in Stochastic Domains Leslie Pack Kaebling [Brown] Learning an Unknown Signalling Alphabet Edward C. Posner, Eugene R. Rodemich [CalTech/JPL] 12:30- 2 Lunch (on own) Unscheduled TIME ( Whale watching, beach walking, ... ) ( Poster set-up time; Poster preview (perhaps) ) Dinner (on own) 7 - 10 Poster Session [16 posters] (Hors d'oeuvres) Induction of Verb Translation Rules from Ambiguous Training and a Large Semantic Hierarchy Hussein Almuallim, Yasuhiro Akiba, Takefumi Yamazaki, Shigeo Kaneda [NTT Network Information Systems Lab.] What Cross-Validation Doesn't Say About Real-World Generalization Gunner Blix, Gary Bradshaw, Larry Rendall [Univ of Illinois] Efficient Learning of Regular Expressions from Approximate Examples Alvis Brazma [Univ of Latvia] Capturing the Dynamics of Chaotic Time Series by Neural Networks Gurtavo Deco, Bernd Schurmann [Siemens AG] Learning One-Dimensional Geometrical Patterns Under One-Sided Random Misclassification Noise Paul Goldberg [Sandia National Lab]; Sally Goldman [Washington Univ] Adaptive Learning of Feedforward Control Using RBF Network ... Dimitry M Gorinevsky [Univ of Toronto] A practical approach for evaluating generalization performance Marjorie Klenin [North Carolina State Univ] Scaling to Domains with Many Irrelevant Features Pat Langley, Stephanie Sage [Siemens Corporate Research] Variable-Kernel Similarity Metric Learning David G. Lowe [Univ British Columbia] On-Line Training of Recurrent Neural Networks with Continuous Topology Adaptation Dragan Obradovic [Siemens AG] N-Learners Problem: System of PAC Learners Nageswara Rao, E.M. Oblow [Engineering Systems/Advanced Research] Soft Dynamic Programming Algorithms: Convergence Proofs Satinder P. 
Singh [Univ of Mass] Integrating Background Knowledge into Incremental Concept Formation Leon Shklar [Bell Communications Research]; Haym Hirsh [Rutgers] Learning Metal Models Astro Teller [Stanford] Generalized Competitive Learning and then Handling of Irrelevant Features Chris Thornton [Univ of Sussex] Learning to Ignore: Psychophysics and Computational Modeling of Fast Learning of Direction in Noisy Motion Stimuli Lucia M. Vaina [Boston Univ], John G. Harris [Univ of Florida] Saturday 11/Sept/93 [CLNL meetings, at Provincetown Inn] 9:00-10:00 Current Tree Research Leo Breiman [UCBerkeley] (invited talk, see abstract below) 10:30-12:30 Paper session 2 Initializing Neural Networks using Decision Trees Arunava Banerjee [Rutgers] Exploring the Decision Forest Patrick M. Murphy, Michael Pazzani [UC Irvine] What Do We Do When There Is Outrageous Data Points in the Data Set? - Algorithm for Robust Neural Net Regression Yong Liu [Brown] A Comparison of RBF and MLP Networks for Classification of Biomagnetic Fields Martin F. Schlang, Ralph Neunier, Klaus Abraham-Fuchs [Siemens AG] 12:30- 2 Lunch (on own) 2:30- 3:30 TBA (invited talk) Yann le Cun [ATT] 4:00- 6:00 Paper session 3 On Learning the Neural Network Architecture: An Average Case Analysis Mostefa Golea [Univ of Ottawa] Fast (Distribution Specific) Learning Dale Schuurmans [Univ of Toronto] Computational capacity of single neuron models Anthony Zador [Yale Univ School of Medicine] Probalistic Self-Structuring and Learning A.D.M. Garvin, P.J.W. Rayner [Cambridge] 7:00- 9 Banquet dinner Sunday 12/Sept/93 [CLNL meetings, at Provincetown Inn] 9 -11 Paper session 4 Supervised Learning from real and Discrete Incomplete Data Zoubin Ghaharamani, Michael Jordan [MIT] Model Building with Uncertainty in the Independent Variable Volker Tresp, Subutai Ahmad, Ralph Neuneier [Siemens AG] Supervised Learning using Unclassified and Classified Examples Geoff Towell [Siemens Corp. Res.] Learning to Classify Incomplete Examples Dale Schuurmans [Univ of Toronto]; R. Greiner [Siemens Corp. Res.] 11:30 -12:30 TBA (invited talk) Ron Rivest [MIT] 12:30 - 2 Lunch (on own) 3:30 - 6:30 Ferry (optional): Provincetown to Boston Depart from Boston (on own) ------ ------ Scaling Up Machine Learning: Practical and Theoretical Issues Thomas G. Dietterich Oregon State University and Arris Pharmaceutical Corporation Supervised learning methods are being applied to an ever-expanding range of problems. This talk will review issues arising in these applications that require further research. The issues can be organized according to the problem-solving task, the form of the inputs and outputs, and any constraints or prior knowledge that must be considered. For example, the learning task often involves extrapolating beyond the training data in ways that are not addressed in current theory or engineering experience. As another example, each training example may be represented by a disjunction of feature vectors, rather than a unique feature vector as is usually assumed. More generally, each training example may correspond to a manifold of feature vectors. As a third example, background knowledge may take the form of constraints that must be satisfied by any hypothesis output by a learning algorithm. The issues will be illustrated using examples from several applications including recent work in computational drug design and ecosystem modelling. 
-------- Current Tree Research Leo Breiman Deptartment of Statistics University of California, Berkeley This talk will summarize current research by myself and collaborators into methods of enhancing tree methodology. The topics covered will be: 1) Tree optimization 2) Forming features 3) Regularizing trees 4) Multiple response trees 5) Hyperplane trees These research areas are in a simmer. They have been programmed and are undergoing testing. The results are diverse. -------- -------- Programme Committee: Andrew Barron, Russell Greiner, Tom Hancock, Steve Hanson, Robert Holte, Michael Jordan, Stephen Judd, Pat Langley, Thomas Petsche, Tomaso Poggio, Ron Rivest, Eduardo Sontag, Steve Whitehead Workshop Sponsors: Siemens Corporate Research and MIT Laboratory of Computer Science ================ ** CLNL'93 Logistics ** ======================= Dates: The workshop begins at 9am Friday 10/Sept, and concludes by 3pm Sunday 12/Sept, in time to catch the 3:30pm Provincetown--Boston ferry. Location: All sessions will take place in the Provincetown Inn (800 942-5388); we encourage registrants to stay there. Provincetown Massachusetts is located at the very tip of Cape Cod, jutting into the Atlantic Ocean. Transportation: We have rented a ship from The Portuguese Princess to transport CLNL'93 registrants from Boston to Provincetown on Thursday 9/Sept/93, at no charge to the registrants. We will also supply light munchies en route. This ship will depart from the back of Boston Harbor Hotel, 70 Rowes Wharf on Atlantic Avenue (parking garage is 617 439-0328); tentatively at 6:30pm. If you are interested in using this service, please let us know ASAP (via e-mail to clnl93 at learning.scr.siemens.com) and also tell us whether you be able to make the scheduled 6:30pm departure. (N.b., this service replaces the earlier proposal, which involved the Bay State Cruise Lines.) The drive from Boston to Provincetown requires approximately two hours. There are cabs, busses, ferries and commuter airplanes (CapeAir, 800 352-0714) that service this Boston--Provincetown route. The Hyannis/Plymouth bus (508 746-0378) leaves Logan Airport at 8:45am, 11:45am, 2:45pm, 4:45pm on weekdays, and arrives in Provincetown about 4 hours later; its cost is $24.25. For the return trip (only), Bay State Cruise Lines (617 723-7800) runs a ferry that departs Provincetown at 3:30pm on Sundays, arriving at Commonwealth Pier in Boston Harbor at 6:30pm; its cost is $15/person, one way. Inquiries: For additional information about CLNL'93, contact clnl93 at learning.scr.siemens.com or CLNL'93 Workshop Learning Systems Department Siemens Corporate Research 755 College Road East Princeton, NJ 08540--6632 To learn more about Provincetown, contact their Chamber of Commerce at 508 487-3424. ================ ** CLNL'93 Registration ** ======================= Name: ________________________________________________ Affiliation: ________________________________________________ Address: ________________________________________________ ________________________________________________ Telephone: ____________________ E-mail: ____________________ Select the appropriate options and fees: Workshop registration fee ($50 regular; $25 student) ___________ Includes * attendance at all presentation and poster sessions * the banquet dinner on Saturday night; and * a copy of the accepted abstracts. Hotel room ($74 = 1 night deposit) ___________ [This is at the Provincetown Inn, assuming a minimum stay of 2 nights. 
The total cost for three nights is $222 = $74 x 3, plus optional breakfasts. Room reservations are accepted subject to availability. See hotel for cancellation policy.] Arrival date ___________ Departure date _____________ Name of person sharing room (optional) __________________ [Notice the $74/night does correspond to $37/person per night double-occupancy, if two people share one room.] # of breakfasts desired ($7.50/bkfst; no deposit req'd) ___ Total amount enclosed: ___________ If you are not using a credit card, make your check payable in U.S. dollars to "Provincetown Inn/CLNL'93", and mail your completed registration form to Provincetown Inn/CLNL P.O. Box 619 Provincetown, MA 02657. If you are using Visa or MasterCard, please fill out the following, which you may mail to above address, or FAX to 508 487-2911. Signature: ______________________________________________ Visa/MasterCard #: ______________________________________________ Expiration: ______________________________________________  From bill at nsma.arizona.edu Mon Aug 9 17:00:59 1993 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Mon, 09 Aug 1993 14:00:59 -0700 (MST) Subject: [fwd: jbeard@aip.org: bee learning in Nature] Message-ID: <9308092100.AA10510@nsma.arizona.edu> This is a very interesting piece of work, but the "news release" is overblown and historically ignorant. The connection between mushroom bodies and learning has been known for a long time. There is also direct evidence for changes in the structure of the mushroom bodies as a result of experience: Coss and Perkel over a decade ago found changes in the length of dendritic spines after honeybees went on a single exploratory flight. This is much more direct than the evidence described in the "news release". Contrary to the claims in the "news release", these new results are unlikely to tell us much about human learning. It is not true that the honeybee brain is merely a simpler version of the human brain. They're completely different -- even the neurons are different in structure. Also insect learning and mammal learning are qualitatively different: for example, both honeybees and mammals can learn to navigate to a location using landmarks, but honeybees do it by simple visual pattern-matching, while mammals use considerably more sophisticated algorithms. Furthermore, it is not news that experience can lead to an increase in the number of connections. It has long been known that mammals raised in an enriched environment have thicker cortices, due to a greater density of synaptic structures. Surely this is more directly relevant to humans than data from honeybees could be. It's a shame to obscure a nice piece of work by making bogus claims about its significance. -- Bill  From dhw at santafe.edu Tue Aug 10 15:58:41 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 10 Aug 93 13:58:41 MDT Subject: Provable optimality of averaging generalizers Message-ID: <9308101958.AA15514@zia> Michael Perrone writes: >>> In the case of averaging for MSE optimization (the meat and potatoes of neural networks) and any other convex measure, the improvement due to averaging is independent of the distribution - on-training or off-. It depends only on the topology of the optimization measure. It is important to note that this result does NOT say the average is better than any individual estimate - only better than the average population performance. 
For example, if one had a reliable selection criterion for deciding which element of a population of estimators was the best and that estimator was better than the average estimator, then just choose the better one. (Going one step further, simply use the selection criterion to choose the best estimator from all possible weighted averages of the elements of the population.) As David Wolpert pointed out, any estimator can be confounded by a pathological data sample and therefore there doesn't exist a *guaranteed* method for deciding which estimator is the best from a population in all cases. Weak (as opposed to guaranteed) selection criteria exist in in the form of cross-validation (in all of its flavors). Coupling cross-validation with averaging is a good idea since one gets the best of both worlds particularly for problems with insufficient data. I think that another very interesting direction for research (as David Wolpert alluded to) is the investigation of more reliable selection criterion. >>> *** Well, I agree with the second two paragraphs, but not the first. At least not exactly as written. Although Michael is making an interesting and important point, I think it helps to draw attention to some things: I) First, I haven't yet gone though Michael's work in detail, but it seems to me that the "measures" Michael is referring to really only make sense as real-world cost functions (otherwise known as loss functions, sometimes as risk functions, etc.). Indeed many very powerful learning algorithms (e.g., memory based reasoning) are not directly cast as finding the minimum on an energy surface, be it "convex" or otherwise. For such algorithms, "measures" come in with the cost function. In fact, *by definition*, one is only interested in such real world cost - results concerning anything else do not concern the primary object of interest. With costs, an example of a convex surface is the quadratic cost function, which says that given truth f, your penalty for guessing h is given by the function (f - h)^2. For such a cost, Michael's result holds essentially because by guessing the average you reduce variance but keep the same bias (as compared to the average over all guesses). In other words, it holds because for any f, h1, and h2, [(h1 + h2)/2 - f)]^2 <= [(h1 - f)^2 + (h2 - f)^2] / 2. (When f, h1, and h2 refer to distributions rather than single values, as Michael rightly points out, you have to worry about other issues before making this statement, like whether the distributions are correlated with one another.) *** It should be realized though that there are many non-convex cost functions in the real world. For example, when doing classification, one popular cost function is zero-one. This function says you get credit for guessing exactly correctly, and if you miss, it doesn't matter what you guessed; all misses "cost you" the same. This cost function is implicit in much of PAC, stat. mech. of learning, etc. Moreover, in Bayesian decision theory, guessing the weights which maximize the posterior probability P(weights | data) (which in the Bayesian perspective of neural nets is exactly what is done in backprop with weight decay) is the optimal strategy only for this zero-one cost. Now if we take this zero-one cost function, and evaluate it only off the training set, it is straight-forward to prove that for a uniform Pr(target function), the probability of a certain value of cost, given data, is independent of the learning algorithm. 
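As a small illustrative sketch of this independence claim (hypothetical learners on a 3-bit Boolean domain; not from the original posting): for a fixed training set and zero-one loss, the histogram of off-training-set errors over all targets consistent with the data is the same no matter which hypothesis the learner returns.

# Sketch: zero-one loss, off-training-set error, uniform prior over Boolean
# targets on a 3-bit input space.  For a FIXED training set, the distribution
# of off-training-set error over the consistent targets is the same whatever
# the learner outputs -- here, "predict the majority training label" vs.
# "always predict 0".  Learners, inputs, and labels are all illustrative.

from itertools import product
from collections import Counter

inputs = list(product([0, 1], repeat=3))          # the 8 possible inputs
train_x = inputs[:3]                              # fixed training inputs
train_y = [0, 1, 1]                               # fixed training labels
test_x = inputs[3:]                               # off-training-set inputs

def majority_learner(xs, ys):
    label = int(sum(ys) > len(ys) / 2)
    return lambda x: label

def always_zero_learner(xs, ys):
    return lambda x: 0

def error_histogram(learner):
    h = learner(train_x, train_y)
    hist = Counter()
    # enumerate every target function consistent with the training data
    for off_labels in product([0, 1], repeat=len(test_x)):
        target = dict(zip(train_x, train_y))
        target.update(zip(test_x, off_labels))
        err = sum(h(x) != target[x] for x in test_x) / len(test_x)
        hist[round(err, 3)] += 1
    return hist

print(sorted(error_histogram(majority_learner).items()))
print(sorted(error_histogram(always_zero_learner).items()))
# both print [(0.0, 1), (0.2, 5), (0.4, 10), (0.6, 10), (0.8, 5), (1.0, 1)]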
(The same result holds for other cost functions as well, though as Michael points out, you must be careful in trying to extend this result to convex cost functions.) This is true for any data set, i.e., it is not based on "pathological data", as Michael puts it. It says that unless you can rule out a uniform Pr(target function), you can not prove any one algorithm to be superior to any other (as far as this cost function is concerned). *** II) Okay. Now Michael correctly points out that even in those cases w/ a convex cost "measure", you must interpret his result with caution. I agree, and would say that this is somewhat like the famous "two letters" paradox of probability theory. Consider the following: 1) Say I have 3 real numbers, A, B, and X. In general, it's always true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} / 2. (This is exactly analogous to having the cost of the average guess bounded above by the average cost of the individual guesses.) 2) This means that if we had a choice of either randomly drawing one of the numbers {A, B}, or drawing C, that on average drawing C would give smaller quadratic cost with respect to X. 3) However, as Michael points out, this does *not* mean that if we had just the numbers A and C, and could either draw A or C, that we should draw C. In fact, point (1) tells us nothing whatsoever about whether A or C is preferable (as far as quadratic cost with respect to X is concerned). 4) In fact, now create a 5th number, D = [C + A] / 2. By the same logic as in (1), we see that the cost (wrt/ X) of D is less than the average of the costs of C and A. So to the exact same degree that (1) says we "should" guess C rather than A or B, it also says we should guess D rather than A or C. (Note that this does *not* mean that D's cost is necessarily less than C's though; we don't get endlessly diminishing costs.) 5) Step (4) can be repeated ad infinitum, getting a never-ending sequence of "newly optimal" guesses. In particular, in the *exact* sense in which C is "preferable" to A or B, and therefore should "replace" them, D is preferable to C or A, and therefore should replace *them* (and in particular replace C). So one is never left with C as the object of choice. *** So (1) isn't really normative; it doesn't say one "should" guess the average of a bunch of guesses: 7) Choosing D is better than randomly choosing amongst C or A, just as choosing C is better than randomly choosing amongst A or B. 8) This doesn't mean that given C, one should introduce an A and then guess the average of C and A (D) rather than C, just as this doesn't mean that given A, one should introduce a B and then guess the average of A and B (C) rather than A. 9) An analogy which casts some light on all this: view A and B not as the outputs of separate single-valued learning algorithms, but rather as the random outputs of a single learning algorithm. Using this analogy, the result of Michael's, that one should always guess C rather than randomly amongst A or B, suggests that one should always use a deterministic, single-valued learning algorithm (i.e., just guess C) rather than one that guesses randomly from a distribution over possible guesses (i.e., one that guesses randomly amongst A or B). This implication shouldn't surprise anyone familiar with Bayesian decision theory.
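[A minimal numerical sketch of points (1) through (5), in Python, may make this concrete. The particular values of A, B and X below are arbitrary, chosen only for illustration; this is not code from any of the postings.]

    import random

    A, B, X = 2.0, 10.0, 5.0           # two guesses and a target, arbitrary values
    cost = lambda h: (h - X) ** 2      # quadratic cost with respect to X
    C = (A + B) / 2.0

    # Point (1): the cost of the average guess is bounded above by the
    # average cost of the individual guesses (convexity of the quadratic).
    assert cost(C) <= (cost(A) + cost(B)) / 2.0

    # Point (2): averaging beats *randomly drawing* one of A or B, on average.
    draws = [cost(random.choice([A, B])) for _ in range(100000)]
    print(sum(draws) / len(draws), ">=", cost(C))

    # Points (4)-(5): repeating the argument gives a never-ending sequence of
    # "newly optimal" guesses D = (C+A)/2, E = (D+A)/2, ...  The sequence drifts
    # toward A, and the costs need not keep shrinking.
    g = C
    for _ in range(5):
        g = (g + A) / 2.0
        print(g, cost(g))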
In fact, it's (relatively) straight-forward to prove that independent of priors or the like, for a convex cost function, one should always use a single-valued learning algorithm rather than one which guesses randomly. (This has probably been proven many times. One proof can be found in Wolpert and Stolorz, On the implementation of Bayes optimal generalizers, SFI tech. report 92-03-012.) (Blatant self-promotion: Other interesting things proven in that report and others in its series are: there are priors and noise processes such that the expected cost, given the data set and that one is using a Bayes-optimal learning algorithm, can *decrease* with added noise; if the cost function is a proper metric, then the magnitude of the change in expected cost if one guesses h rather than h' is bounded above by the cost of h relative to h'; other results about using "Bayes-optimal" generalizers predicated on an incorrect prior, etc., etc.) *** The important point is that although it is both intriguing and illuminating, there are no implications of Michael's result for what one should do with (or in place of) a particular deterministic, single-valued learning algorithm. It was for such learning algorithms that my original comments were intended. David Wolpert  From dhw at santafe.edu Tue Aug 10 16:29:07 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 10 Aug 93 14:29:07 MDT Subject: MacKay's recent work and feature selection Message-ID: <9308102029.AA15554@zia> Recently David MacKay made a posting concerning a technique he used to win an energy prediction competition. Parts of that technique have been done before (e.g., combining generalizers via validation set behavior). However other parts are both novel and very interesting. This posting concerns the "feature selection" aspect of his technique, which I understand MacKay developed in association w/ Radford Neal. (Note: MacKay prefers to call the technique "automatic relevance determination"; nothing I'll discuss here will be detailed enough for that distinction to be important though.) What I will say grew out of conversations w/ David Rosen and Tom Loredo, in part. Of course, any stupid or silly aspects to what I will say should be assumed to originate w/ me. *** Roughly speaking, MacKay implemented feature selection in a neural net framework as follows: 1) Define a potentially different "weight decay constant" (i.e., regularization hyperparameter) for each input neuron. The idea is that one wants to have those constants set high for input neurons representing "features" of the input vector which it behooves us to ignore. 2) One way to set those hyperparameters would be via a technique like cross-validation. MacKay instead set them via maximum likelihood, i.e., he set the weight decay constants alpha_i to those values maximizing P(data | alpha_i). Given a reasonably smooth prior P(alpha_i), this is equivalent to finding the maximum a posterior (MAP) alpha_i, i.e., the alpha_i maximizing P(alpha_i | data). 3) Empirically, David found that this worked very well. (I.e., he won the competition.) *** This neat idea makes some interesting suggestions: 1) The first grows out of "blurring" the distinction between parameters (i.e., weights w_j) and hyperparameters (the alpha_i). Given such squinting, MacKay's procedure amounts to a sort of "greedy MAP". 
First he sets one set of parameters to its MAP values (the alpha_i), and then with those values fixed, he sets the other parameters (the w_j) to their MAP values (this is done via the usual back-propagation w/ weight-decay, which we can do since the first stage set the weight decay constants). In general, the resultant system will not be at the global MAP maximizing P(alpha_i, w_j | D). In essence, a sort of extra level of regularization has been added. (Note: Radford Neal informs me that calculationally, in the procedure MacKay used, the second MAP step is "automatic", in the sense that one has already made the necessary calculations to perform that step when one carries out the first MAP step.) Of course, viewing the technique from this "blurred" perspective is a bit of a fudge, since hyperparameters are not the same thing as parameters. Nonetheless, this view suggests some interesting new techniques. E.g., first set the weights leading to hidden layer 1 to their MAP values (or maximum likelihood values, for that matter). Then with those values fixed, do the same to the weights in the second layer, etc. Another reason to consider this layer-by-layer technique is the fact that training of the weights connecting different layers should in general be distinguishable, e.g., as MacKay has pointed out, one should have different weight-decay constants for the different layers. 2) Another interesting suggestion comes from justifying the technique not as a priori reasonable, but rather as an approximation to a full "hierarchical" Bayesian technique, in which one writes P(w_j | data) (i.e., the ultimate object of interest) prop. to integral d_alpha_i P(data | w_j, alpha_i) P(w_j | alpha_i) P(alpha_i). Note that all 3 distributions occurring in this integrand must be set in order to use MacKay's technique. (The by-now-familiar object of contention between MacKay and myself is on how generically this approximation will be valid, and whether one should explicitly test its validity when one claims that it holds. This issue isn't pertinent to the current discussion however.) Let's assume the approximation is very good. Then under the assumptions: i) P(alpha_i) is flat enough to be ignored; ii) the distribution P(w_j | alpha_i) is a product of gaussians (each gaussian being for those w_j connecting to input neuron i, i.e., for those weights using weight decay constant alpha_i); then what MacKay did is equivalent to back-propagation with weight-decay, where rather than minimizing {training set error} + constant x {sum over all j (w_j)^2}, as in conventional weight decay, MacKay is minimizing (something like) {training set error} + {(sum over i) [ (number of weights connecting to neuron i) x ln [(sum over j; those weights connecting to neuron i) (w_j)^2] ]}. What's interesting about this isn't so much the logarithm in the "weight decay" term, but rather the fact that weights are being clumped together in that weight-decay term, into groups of those weights connecting to the same neuron. (This is not true in conventional weight decay.) So in essence, the weight-decay term in MacKay's scenario is designed to affect all the weights connecting to a given neuron as a group. This makes intuitive sense if the goal is feature selection. 3) One obvious idea based on viewing things this way is to try to perform weight-decay using this modified weight-decay term. This might be reasonable even if MacKay's technique is not a good approximation to this full Bayesian technique.
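[A rough sketch, in Python/NumPy, of the grouped weight-decay term described above. It is only a transcription of the verbal formula, with illustrative names; the small epsilon inside the logarithm is an added numerical safeguard, not part of MacKay's procedure.]

    import numpy as np

    def grouped_log_penalty(W, eps=1e-12):
        # W is the first-layer weight matrix, shape (n_inputs, n_hidden),
        # so row i holds the weights connecting to (fanning out of) input neuron i.
        n_weights_per_input = W.shape[1]
        group_sum_sq = np.sum(W ** 2, axis=1) + eps
        # (number of weights connecting to neuron i) x ln(sum of their squares)
        return np.sum(n_weights_per_input * np.log(group_sum_sq))

    # The objective sketched above would then be something like
    #   training_set_error + grouped_log_penalty(W1)
    # in place of the conventional
    #   training_set_error + c * np.sum(W1 ** 2)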
4) The idea of MacKay's also leads to all kinds of ideas about how to set up the weight-decay term so as to enforce feature selection (or automatic relevance determination, if you prefer). These need not have anything to do w/ the precise weight-decay term MacKay used; rather the idea is simply to take his (implicit) suggestion of trying to do feature selection via the weight-decay term, and see where it leads. For example: Where originally we have input neurons at layer 1, hidden layers 2 through n, and then output neurons at layer n+1, now have the same architecture with an extra "pre-processing" layer 0 added. Inputs are now fed to the neurons at layer 0. For each input neuron at layer 0, there is one and only weight, leading straight up to the neuron at layer 1 which in the original formulation was the (corresponding) input neuron. The hope would be that for those input neurons which we "should" mostly ignore, something like backprop might set the associated weights from layer 0 to layer 1 to very small values. David Wolpert  From rsun at athos.cs.ua.edu Tue Aug 10 17:33:56 1993 From: rsun at athos.cs.ua.edu (Ron Sun) Date: Tue, 10 Aug 1993 16:33:56 -0500 Subject: No subject Message-ID: <9308102133.AA22967@athos.cs.ua.edu> CALL FOR PAPERS International Symposium on Integrating Knowledge and Neural Heuristics (ISIKNH'94) Sponsored by University of Florida, and AAAI, in cooperation with IEEE Neural Network Council, and Florida AI Research Society. Time: May 9-10 1994; Place: Pensacola Beach, Florida, USA. A large amount of research has been directed toward integrating neural and symbolic methods in recent years. Especially, the integration of knowledge-based principles and neural heuristics holds great promise in solving complicated real-world problems. This symposium will provide a forum for discussions and exchanges of ideas in this area. The objective of this symposium is to bring together researchers from a variety of fields who are interested in applying neural network techniques to augmenting existing knowledge or proceeding the other way around, and especially, who have demonstrated that this combined approach outperforms either approach alone. We welcome views of this problem from areas such as constraint-(knowledge-) based learning and reasoning, connectionist symbol processing, hybrid intelligent systems, fuzzy neural networks, multi-strategic learning, and cognitive science. Examples of specific research include but are not limited to: 1. How do we build a neural network based on {\em a priori} knowledge (i.e., a knowledge-based neural network)? 2. How do neural heuristics improve the current model for a particular problem (e.g., classification, planning, signal processing, and control)? 3. How does knowledge in conjunction with neural heuristics contribute to machine learning? 4. What is the emergent behavior of a hybrid system? 5. What are the fundamental issues behind the combined approach? Program activities include keynote speeches, paper presentation, and panel discussions. ***** Scholarships are offered to assist students in attending the symposium. Students who wish to apply for a scholarship should send their resumes and a statement of how their researches are related to the symposium. ***** Symposium Chairs: LiMin Fu, University of Florida, USA. Chris Lacher, Florida State University, USA. 
Program Committee: Jim Anderson, Brown University, USA Michael Arbib, University of Southern California, USA Fevzi Belli, The University of Paderborn, Germany Jim Bezdek, University of West Florida, USA Bir Bhanu, University of California, USA Su-Shing Chen, National Science Foundation, USA Tharam Dillon, La Trobe University, Australia Douglas Fisher, Vanderbilt University, USA Paul Fishwick, University of Florida, USA Stephen Gallant, HNC Inc., USA Yoichi Hayashi, Ibaraki University, Japan Susan I. Hruska, Florida State University, USA Michel Klefstad-Sillonville CCETT, France David C. Kuncicky, Florida State University, USA Joseph Principe, University of Florida, USA Sylvian Ray, University of Illinois, USA Armando F. Rocha, University of Estadual, Brasil Ron Sun, University of Alabama, USA Keynote Speaker: Balakrishnan Chandrasekaran, Ohio-State University Schedule for Contributed Papers ---------------------------------------------------------------------- Paper Summaries Due: December 15, 1993 Notice of Acceptance Due: February 1, 1994 Camera Ready Papers Due: March 1, 1994 Extended paper summaries should be limited to four pages (single or double-spaced) and should include the title, names of the authors, the network and mailing addresses and telephone number of the corresponding author. Important research results should be attached. Send four copies of extended paper summaries to LiMin Fu Dept. of CIS, 301 CSE University of Florida Gainesville, FL 32611 USA (e-mail: fu at cis.ufl.edu; phone: 904-392-1485). Students' applications for a scholarship should also be sent to the above address. General information and registration materials can be obtained by writing to Rob Francis ISIKNH'94 DOCE/Conferences 2209 NW 13th Street, STE E University of Florida Gainesville, FL 32609-3476 USA (Phone: 904-392-1701; fax: 904-392-6950) --------------------------------------------------------------------- --------------------------------------------------------------------- If you intend to attend the symposium, you may submit the following information by returning this message: NAME: _______________________________________ ADDRESS: ____________________________________ _____________________________________________ _____________________________________________ _____________________________________________ _____________________________________________ PHONE: ______________________________________ FAX: ________________________________________ E-MAIL: _____________________________________ ---------------------------------------------------------------------  From ld231782 at longs.lance.colostate.edu Wed Aug 11 00:56:26 1993 From: ld231782 at longs.lance.colostate.edu (L. Detweiler) Date: Tue, 10 Aug 93 22:56:26 -0600 Subject: neuroanatomy list ad & more on bee brains In-Reply-To: Your message of "Mon, 09 Aug 93 14:00:59 PDT." <9308092100.AA10510@nsma.arizona.edu> Message-ID: <9308110456.AA06912@longs.lance.colostate.edu> While many on this list will not be interested in the details of bee-brain neuroanatomy or arguments thereon, an excellent list for discussions of this can be requested from cogneuro-request at ptolemy.arc.nasa.gov, maintained by Kimball Collins . The list has fairly low volume although definitely more than connectionists, and I'd like to encourage any of this amazingly literate connectionist crowd with a strong interest in neurobiological research to subscribe (recent/past topics: neurobiology of rabies infections, Hebb's rule, vision, dyslexia, etc.) * * * Mr. 
Skaggs writes an exceedingly hostile flame (a redundant phrase) on the recent syndicated news article describing research into bee function and neuroanatomy, calling it `overblown and historically ignorant'. While I don't have as close a background in the area in question as Mr. Skaggs appears to, this is just a short note to balance the scale a little closer to equilibrium. The critical feature that I see going on here is a professional scientist demeaning a non-detailed popular account of scientific work, esp. in that person's area of expertise, for lapses in precise description. This happens all the time, of course, both the presence of the quasi-skewed material and the criticism. Definitely, the article was the overwrought cheerleading type, rather stereotypical, but Mr. Skaggs, on the other hand, plays into the cliche of the pessimistic and sour curmudgeon-scientist in attacking it. I'd like to point out that this popular literature serves a very useful purpose in keeping the lay public apprised of new developments in scientific fields and, ultimately, encouraging funding. It is not fair to apply the strict scientific standard of evaluation to something that appears in the popular press. In this case, there is no significant error, the purpose is served in being `approximately correct', and there is no point to rebutting it. We are bound to lose something in the translation, and the major points of disagreement are likely to be over opinion. We should instead be highly encouraged and appreciative of these attempts to bring increasingly abstruse and technical science to the interested layman. I appreciate the popular press to some degree in that it forces scientists to get at the essence of their research, something they sometimes lose sight of. The scientist (perhaps the neuroscientist in particular) is forever saying `it's not quite that simple' or `it doesn't quite happen like that' or `there are exceptions to that' to the point that an outsider can give up in frustration, thinking that it is nothing but a disconnected morass with no underlying message or cohesion. The general press usually gives a close and fascinating view into what the `big picture' is. Looking at reporters as nothing but clueless intruders is a somewhat self-destructive position, IMHO. And yes, the grandiose statements like `will shed insight into human learning' can be recognized by other scientists as the necessary fodder and not criticized but ignored. Now, to address a few points: >Coss and Perkel over a decade ago found >changes in the length of dendritic spines after honeybees went on a >single exploratory flight. This is much more direct than the evidence >described in the "news release". Incidentally, the changes in dendritic growth with learning are IMHO one of the most fascinating studies of plasticity, and on the cutting edge of current research, and perhaps others will wish to post references. (The classic study showed that rats reared in deprived vs. abundant sensory-stimuli-containing environments had less or more growth, respectively.) >It is not true that the >honeybee brain is merely a simpler version of the human brain. They're >completely different -- even the neurons are different in structure. Definitely, any animal model always has minor or major imperfections and pitfalls. But this brings up an interesting point--is there an analogue to LTP in the insect brain? There is probably at least a degree of overlap in the kinds of neurotransmitters involved.
However, arguing against the relevance, superiority, and verisimilitude of one animal model vs. another can turn into a very emotional debate, and should be engaged with the utmost delicacy or statements come out with a connotation much like `the car you drive all day is worthless'.  From delliott at src.umd.edu Wed Aug 11 17:52:44 1993 From: delliott at src.umd.edu (David L. Elliott) Date: Wed, 11 Aug 1993 17:52:44 -0400 Subject: Call for papers, NeuroControl book Message-ID: <199308112152.AA04995@newra.src.umd.edu> PROGRESS IN NEURAL NETWORKS series Editor O. M. Omidvar Special Volume: NEURAL NETWORKS FOR CONTROL Editor: David L. Elliott CALL FOR PAPERS Original manuscripts describing recent progress in neural networks research directly applicable to Control or making use of modern control theory. Manuscripts may be survey or tutorial in nature. Suggested topics for this book are: %New directions in neurocontrol %Adaptive control %Biological control architectures %Mathematical foundations of control %Model-based control with learning capability %Natural neural control systems %Neurocontrol hardware research %Optimal control and incremental dynamic programming %Process control and manufacturing %Reinforcement-Learning Control %Sensor fusion and vector quantization %Validating neural control systems The papers will be refereed and uniformly typeset. Ablex and the Progress Series editors invite you to submit an abstract, extended summary or manuscript proposal, directly to the Special Volume Editor: Dr. David L. Elliott, Institute for Systems Research University of Maryland, College Park, MD 20742 Tel: (301)405-1241 FAX (301)314-9920 Email: DELLIOTT at SRC.UMD.EDU or to the Series Editor: Dr. Omid M. Omidvar, Computer Science Dept., University of the District of Columbia, Washington DC 20008 Tel: (202)282-7345 FAX: (202)282-3677 Email: OOMIDVAR at UDCVAX.BITNET The Publisher is Ablex Publishing Corporation, Norwood, NJ  From pittman at mcc.com Thu Aug 12 08:38:30 1993 From: pittman at mcc.com (Jay Pittman) Date: Thu, 12 Aug 93 08:38:30 EDT Subject: neuroanatomy list ad & more on bee brains Message-ID: <9308121338.AA14022@gluttony.mcc.com> Excellent note, well stated. I agree with everything Detweiler said about the press. On the other hand, when I originally read Bill Skaggs note I didn't think he was being all that critical. I went back and looked at it again, and, yes, he does sound like a real flamer, WHEN I START OUT ASSUMING THAT. One can also read it as a calmly-stated critique of the article. I find myself imagining different "tones of voice", depending on (presumably) random triggers. I hope when you read this note you perceive me speaking in a calm, relaxed manner. While I agree with Detweiler's attitude toward the popular press, I think Skaggs statements were addressed to us, the members of the research community, and not to the reporters. As long as the note does not reach members of that community, we should tolerate somewhat-more-grouchy phrasing than we might want for lay consumption. I've just spent a lot of time trying to carefully word the above message. The neat thing about a group such as connectionists is that (I think) we can skip that labor, and just spit out our thoughts. Or perhaps I am being naive? BTW, I have no HO on bee brains. My own dendrites get thinner every day. 
J  From chris at arraysystems.nstn.ns.ca Sat Aug 14 14:25:29 1993 From: chris at arraysystems.nstn.ns.ca (Chris Brobeck) Date: Sat, 14 Aug 93 15:25:29 ADT Subject: Genetic Algorithms Message-ID: <9308141825.AA07238@arraysystems.nstn.ns.ca> Dear Colleagues; We're currently in the process of building a relatively large net and were looking at using a genetic algorithm to optimize the network structure. The question is as follows. Early forms of genetic algorithms seemed to rely on reading the gene once, linearly, in the construction process, whereas a number of more recent algorithms allow the reading to start anywhere along the gene, and continue to read (construct rules) until some stopping criterion is met. In the former, it seems reasonable then for one organism to compete against the other in a winner-take-all sort of way. On the other hand, the rigidity of the genetic structure makes it very sensitive to mutation. In the latter case the gene may be thought of as a generator for a process (randomly) creating rules of a variety of lengths. If one assumes that individual rules are much shorter than the entire gene, this method becomes less sensitive to mutation, crossover, etc. (both the beneficial and not so beneficial aspects). In this case it seems that competition among species would be as critical as competition among individuals, with the interspecies competition perhaps representing a fast way to remove ineffective rule sets, and individual competition more of a way of fine-tuning a distribution. The upshot would be (one assumes) slower but more robust convergence. In any case, if there is anyone out there who can point us in the direction of some good references let us know - particularly ones that might be available via ftp. Thanks, Chris Brobeck.  From bengio at iro.umontreal.ca Mon Aug 16 11:09:57 1993 From: bengio at iro.umontreal.ca (Samy Bengio) Date: Mon, 16 Aug 1993 11:09:57 -0400 Subject: Preprint announcement: Generalization of a Parametric Learning Rule Message-ID: <9308161509.AA06576@carre.iro.umontreal.ca> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/bengio.general.ps.Z The following file has been placed in neuroprose (no hardcopies will be provided): GENERALIZATION OF A PARAMETRIC LEARNING RULE (8 pages) by Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei Abstract: In previous work we discussed the subject of parametric learning rules for neural networks. In this article, we present a theoretical basis permitting us to study the {\it generalization} property of a learning rule whose parameters are estimated from a set of learning tasks. By generalization, we mean the possibility of using the learning rule to learn to solve new tasks. Finally, we describe simple experiments on two-dimensional categorization tasks and show how they corroborate the theoretical results. This paper is an extended version of a paper which will appear in ICANN'93: Proceedings of the International Conference on Artificial Neural Networks. To retrieve the file: unix> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. 220 cheops.cis.ohio-state.edu FTP server ready. Name: anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> binary 200 Type set to I. ftp> cd pub/neuroprose 250 CWD command successful. ftp> get bengio.general.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for bengio.general.ps.Z 226 Transfer complete.
100000 bytes sent in 3.14159 seconds ftp> quit 221 Goodbye. unix> uncompress bengio.general.ps.Z unix lpr bengio.general.ps (or however you print out postscript) Many thanks to Jordan Pollack for maintaining this archive. -- Samy Bengio E-mail: bengio at iro.umontreal.ca Fax: (514) 343-5834 Tel: (514) 343-6111 ext. 3545/3494 Residence: (514) 495-3869 Universite de Montreal, Dept. IRO, C.P. 6128, Succ. A, Montreal, Quebec, Canada, H3C 3J7  From reza at ai.mit.edu Mon Aug 16 12:37:02 1993 From: reza at ai.mit.edu (Reza Shadmehr) Date: Mon, 16 Aug 93 12:37:02 EDT Subject: Tech Reports from CBCL at MIT Message-ID: <9308161637.AA03497@corpus-callosum.ai.mit.edu> The following technical reports from the Center for Biological and Computational Learning at M.I.T. are now available via anonymous ftp. -------------- :CBCL Paper #83/AI Memo #1440 :author Michael I. Jordan and Robert A. Jacobs :title Hierarchical Mixtures of Experts and the EM Algorithm :date August 1993 :pages 29 We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation- Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain. -------------- :CBCL Paper #84/AI Memo #1441 :author Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh :title On the Convergence of Stochastic Iterative Dynamic Programming Algorithms :date August 1993 :pages 13 Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD (lambda) and Q-learning belong. ============================ How to get a copy of above reports: The files are in compressed postscript format and are named by their AI memo number, e.g., the Jordan and Jacobs paper is named AIM-1440.ps.Z. Here is the procedure for ftp-ing: unix> ftp ftp.ai.mit.edu (log-in as anonymous) ftp> cd ai-pubs/publications/1993 ftp> binary ftp> get AIM-number.ps.Z ftp> quit unix> zcat AIM-number.ps.Z | lpr I will periodically update the above list as new titles become available. Best wishes, Reza Shadmehr Center for Biological and Computational Learning M. I. T. Cambridge, MA 02139  From mikewj at signal.dra.hmg.gb Tue Aug 17 04:15:33 1993 From: mikewj at signal.dra.hmg.gb (mikewj@signal.dra.hmg.gb) Date: Tue, 17 Aug 93 09:15:33 +0100 Subject: Practical Neural Nets Conference & Workshops in UK. 
Message-ID: AA20707@ravel.dra.hmg.gb *********************************** NEURAL COMPUTING APPLICATIONS FORUM *********************************** 8 - 9 September 1993 Brunel University, Runnymede, UK ***************************************** PRACTICAL APPLICATIONS OF NEURAL NETWORKS ***************************************** Neural Computing Applications Forum is the primary meeting place for people developing Neural Network applications in industry and academia. It has 200 members from the UK and Europe, from universities, small companies and big ones, and holds four main meetings each year. It has been running for 3 years, and is cheap to join. This meeting spans two days with informal workshops on 8 September and the main meeting comprising talks about neural network techniques and applications on 9 September. ********* WORKSHOPS ********* ********************************************************** Neural Networks in Engine Health Monitoring 8 September, 13.00 to 15.00 ********************************************************** Technical Contact: Tom Harris: (+44/0) 784 431341 Including: Roger Hutton (ENTEK): "What is Predictive Maintenance?" John Hobday (Lloyds Register): "Gas Turbine Start Monitoring" John McIntyre (University of Sunderland / National Power plc): "Predictive Maintenance at Blyth Power Station" ********************************************************* Building a Neural Network Application 8 September, 15.30 to 17.30 ********************************************************* Technical Contact: Tom Harris: (+44/0) 784 431341 Including: Chris Bishop (Aston University): "The DTI Neural Computing Guidelines Project" Tom Harris (Brunel University): "A Design Process for Neural Network Applications" Paul Gregory (Recognition Research Ltd.): "Building an Application in Software" (case study) Simon Hancock (Neural Technologies Ltd.): "Implementing Hardware Neural Network Solutions" (case study) ************************* Evening: Barbecue Supper ************************* ***************************** MAIN MEETING - 9 September 1993 ***************************** 8.30 Registration 9.05 Welcome 9.15 Neil Burgess (CRL): "Feature Selection in Neural Networks" 9.50 Bryn Williams (Aston University): "Convergence and Diversity of Species in Genetic Algorithms for Optimization of a Bump-Tree Classifier" 10.20 Coffee 11.00 Mike Brinn (Health and Safety Executive): "Kohonen Networks Classifying Toxic Molecules" 11.40 John Bridle (Dragon Systems Ltd.): "Speech Recognition in Principle and Practice" 12.15 Lunch 2.00 Bruce Wilkie (Brunel University): "Real Time Logical Neural Networks" 2.40 Stan Swallow (Brunel University): "TARDIS: The World's Fastest Neural Network?" 3.15 Tea 3.40 Dave Cressy (Logica Cambridge Ltd.): "Neural Control of an Experimental Batch Distillation Column" 4.10 Discussions 4.30 Close & minibus to the station ACCOMMODATION is available in Brunel University at 35 pounds (including barbecue supper) and **MUST** be booked and paid for in advance. Accommodation and breakfast only: 25 pounds; barbecue supper only: 12 pounds. ***************** Application ***************** Members of NCAF get free entry to all meetings for a year. (This is very good value - main meetings, tutorials, special interest meetings). It also includes subscription to Springer Verlag's journal "Neural Computing and Applications". Full membership: 250 pounds - anybody in your small company / research group in a big company. Individual membership: 140 pounds - named individual only.
Student membership (with journal): 55 pounds - copy of student ID required. Student membership (no journal, very cheap!): 25 pounds - copy of student ID required. Entry to this meeting without membership costs 35 pounds for the workshops, and 80 pounds for the main day. Payment in advance if possible; please give an official order number if an invoice is required. Email enquiries to Mike Wynne-Jones, mikewj at signal.dra.hmg.gb. Postal to Mike Wynne-Jones, NCAF, PO Box 62, Malvern, WR14 4NU, UK. Fax to Karen Edwards, (+44/0) 21 333 6215  From mpp at cns.brown.edu Tue Aug 17 12:18:01 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Tue, 17 Aug 93 12:18:01 EDT Subject: Provable optimality of averaging generalizers Message-ID: <9308171618.AA15207@cns.brown.edu> David Wolpert writes: -->1) Say I have 3 real numbers, A, B, and X. In general, it's always -->true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} / -->2. (This is exactly analogous to having the cost of the average guess -->bounded above by the average cost of the individual guesses.) --> -->2) This means that if we had a choice of either randomly drawing one -->of the numbers {A, B}, or drawing C, that on average drawing C would -->give smaller quadratic cost with respect to X. --> -->3) However, as Michael points out, this does *not* mean that if we had -->just the numbers A and C, and could either draw A or C, that we should -->draw C. In fact, point (1) tells us nothing whatsoever about whether A -->or C is preferable (as far as quadratic cost with respect to X is -->concerned). --> -->4) In fact, now create a 5th number, D = [C + A] / 2. By the same -->logic as in (1), we see that the cost (wrt/ X) of D is less than the -->average of the costs of C and A. So to the exact same degree that (1) -->says we "should" guess C rather than A or B, it also says we should -->guess D rather than A or C. (Note that this does *not* mean that D's -->cost is necessarily less than C's though; we don't get endlessly -->diminishing costs.) --> -->5) Step (4) can be repeated ad infinitum, getting a never-ending -->sequence of "newly optimal" guesses. In particular, in the *exact* -->sense in which C is "preferable" to A or B, and therefore should -->"replace" them, D is preferable to A or B, and therefore should -->replace *them* (and in particular replace C). So one is never left -->with C as the object of choice. This argument does not imply a contradiction for averaging! This argument shows the natural result of throwing away information. Step (4) throws away number B. Given that we no longer know B, number D is the correct choice. (One could imagine such "forgetting" to be useful in time varying situations - which leads towards the Kalman filtering that was mentioned in relation to averaging a couple of weeks ago.) In Step (5), an infinite sequence is developed by successively throwing away more and more of number B. The infinite limit of Step (5) is number A. In other words, we have thrown away all knowledge of B. -->So (1) isn't really normative; it doesn't say one "should" guess the -->average of a bunch of guesses: Normative? Hey is this an ethics class!? :-) -->7) Choosing D is better than randomly choosing amongst C or A, just as --> choosing C is better than randomly choosing amongst A or B. 
--> -->8) This doesn't mean that given C, one should introduce an A and --> then guess the average of C and A (D) rather than C, just as --> this doesn't mean that given A, one should introduce a B and --> then guess the average of A and B (C) rather than A. Sure, if you're willing to throw away information. Michael  From cns at clarity.Princeton.EDU Tue Aug 17 11:30:02 1993 From: cns at clarity.Princeton.EDU (Cognitive Neuroscience) Date: Tue, 17 Aug 93 11:30:02 EDT Subject: RFP Research - McDonnell-Pew Program Message-ID: <9308171530.AA27618@clarity.Princeton.EDU> McDonnell-Pew Program in Cognitive Neuroscience SEPTEMBER 1993 Individual Grants-in-Aid for Research Program supported jointly by the James S. McDonnell Foundation and The Pew Charitable Trusts INTRODUCTION The McDonnell-Pew Program in Cognitive Neuroscience has been created jointly by the James S. McDonnell Foundation and The Pew Charitable Trusts to promote the development of cognitive neuroscience. The foundations have allocated $20 million over a five-year period for this program. Cognitive neuroscience attempts to understand human mental events by specifying how neural tissue carries out computations. Work in cognitive neuroscience is interdisciplinary in character, drawing on developments in clinical and basic neuroscience, computer science, psychology, linguistics, and philosophy. Cognitive neuroscience excludes descriptions of psychological function that do not address the underlying brain mechanisms and neuroscientific descriptions that do not speak to psychological function. The program has three components. (1) Institutional grants, which have already been awarded, for the purpose of creating centers where cognitive scientists and neuroscientists can work together. (2) Small grants-in-aid, presently being awarded, for individual research projects to encourage Ph.D. and M.D. investigators in cognitive neuroscience. (3) Small grants-in-aid, presently being awarded, for individual training projects to encourage Ph.D. and M.D. investigators to acquire skills for interdisciplinary research. This brochure describes the individual grants-in-aid for research. RESEARCH GRANTS The McDonnell-Pew Program in Cognitive Neuroscience will issue a limited number of awards to support collaborative work by cognitive neuroscientists. Applications are sought for projects of exceptional merit that are not currently fundable through other channels and from investigators who are not at institutions already funded by an institutional grant from the program. In order to distribute available funds as widely as possible, preference will be given to applicants who have not received previous grants under this program. Preference will be given to projects that are interdisciplinary in character. The goals of the program are to encourage broad participation in the development of the field and to facilitate the participation of investigators outside the major centers of cognitive neuroscience. There are no U.S. citizenship restrictions or requirements, nor does the proposed work need to be conducted at a U.S. institution, providing the sponsoring organization qualifies as tax-exempt as described in the "Applications" section of this brochure. Ph.D. thesis research of graduate students will not be funded. Grant support under the research component is limited to $30,000 per year for two years. Indirect costs are to be included in the $30,000 maximum and may not exceed 10 percent of total salaries and fringe benefits. 
These grants are not renewable after two years. The program is looking for innovative proposals that would, for example: * combine experimental data from cognitive psychology and neuroscience; * explore the implications of neurobiological methods for the study of the higher cognitive processes; * bring formal modeling techniques to bear on cognition, including emotions and higher thought processes; * use sensing or imaging techniques to observe the brain during conscious activity; * make imaginative use of patient populations to analyze cognition; * develop new theories of the human mind/brain system. This list of examples is necessarily incomplete but should suggest the general kind of proposals desired. Ideally, a small grant-in-aid for research should facilitate the initial exploration of a novel or risky idea, with success leading to more extensive funding from other sources. APPLICATIONS Applicants should submit five copies of the following information: * a brief, one-page abstract describing the proposed work; * a brief, itemized budget that includes direct and indirect costs (indirect costs may not exceed 10 percent of total salaries and fringe benefits); * a budget justification; * a narrative proposal that does not exceed 5,000 words; the 5,000-word proposal should include: 1) a description of the work to be done and where it might lead; 2) an account of the investigator's professional qualifications to do the work; 3) an account of any plans to collaborate with other cognitive neuroscientists; 4) a brief description of the available research facilities; * curriculum(a) vitae of the participating investigator(s); * an authorized document indicating clearance for the use of human and animal subjects; * an endoresement letter from the officer of the sponsoring institution who will be responsible for administering the grant. One copy of the following items must also be submitted along with the proposal. These documents should be readily available from the sponsoring institution's grants or development office. * A copy of the IRS determination letter, or the international equivalent, stating that the sponsoring organization is a nonprofit, tax-exempt institution classified as a 501(c)(3) organization. * A copy of the IRS determination letter stating that your organization is not listed as a private foundation under section 509(a) of the Internal Revenue Service Code. * A statement on the sponsoring institution's letterhead, following the wording on Attachment A and signed by an officer of the institution, certifying that the status or purpose of the organization has not changed since the issuance of the IRS determinations. (If your organization's name has changed, include a copy of the IRS document reflecting this change.) * An audited financial statement of the most recently completed fiscal year of the sponsoring organization. * A current list of the names and professional affiliations of the members of the organization's board of trustees and the names and titles of the principal officers. Other appended documents will not be accepted for evaluation and will be returned to the applicant. Any incomplete proposals will also be returned to the applicant. Submissions will be reviewed by the program's advisory board. Applications must be postmarked on or before FEBRUARY 1 to be considered for review. 
INFORMATION McDonnell-Pew Program in Cognitive Neuroscience Green Hall 1-N-6 Princeton University Princeton, New Jersey 08544-1010 Telephone: 609-258-5014 Facsimile: 609-258-3031 Email: cns at clarity.princeton.edu ADVISORY BOARD Emilio Bizzi, M.D. Eugene McDermott Professor in the Brain Sciences and Human Behavior Chairman, Department of Brain and Cognitive Sciences Massachusetts Institute of Technology, E25-526 Cambridge, Massachusetts 02139 Sheila E. Blumstein, Ph.D. Professor of Cognitive and Linguistic Sciences Dean of the College Brown University University Hall, Room 218 Providence, Rhode Island 02912 Stephen J. Hanson, Ph.D. Head, Learning Systems Department Siemens Corporate Research 755 College Road East Princeton, New Jersey 08540 Jon H. Kaas, Ph.D. Centennial Professor Department of Psychology Vanderbilt University 301 Wilson Hall 111 21st Avenue South Nashville, Tennessee 37240 George A. Miller, Ph.D. Director, McDonnell-Pew Program in Cognitive Neuroscience James S. McDonnell Distinguished University Professor of Psychology Department of Psychology Princeton University Princeton, New Jersey 08544-1010 Mortimer Mishkin, Ph.D. Chief, Laboratory of Neuropsychology National Institute of Mental Health 9000 Rockville Pike Building 49, Room 1B80 Bethesda, Maryland 20892 Marcus E. Raichle, M.D. Professor of Neurology and Radiology Division of Radiation Sciences Washington University School of Medicine Campus Box 8225 510 S. Kingshighway Boulevard St. Louis, Missouri 63110 Endel Tulving, Ph.D. Tanenbaum Chair in Cognitive Neuroscience Rotman Research Institute of Baycrest Centre 3560 Bathurst Street North York, Ontario M6A 2E1 Canada  From dhw at santafe.edu Tue Aug 17 21:26:08 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 17 Aug 93 19:26:08 MDT Subject: Yet more on averaging Message-ID: <9308180126.AA02904@zia> In several recent e-mail conversations, Michael Perrone and I have gotten to where I think we agree with each other in substance, although we disagree a bit on emphasis. To complete the picture for the connectionist community and present the other side to Michael's recent posting: In my back pocket, I have a number. I'll fine you according to the squared difference between your guess for the number and its actual value. Okay, should you guess 3 or 5? Obviously you can't answer. 7 or 5? Same response. 5 or a random sample of 3 or 7? Now, as Michael points out, you *can* answer: 5. However I'm not as convinced as Michael that this actually tells us anything of practical use. How should you use this fact to help you guess the number in my back pocket? Seems to me you can't. The bottom line, as I see it: arguments like Michael's show that one should always use a single-valued learning algorithm rather than a stochastic one. (Subtle caveat: If used only once, there is no difference between a stochastic learning algorithm and a single-valued one; multiple trials are implicitly assumed here.) But if one has before one a smorgasbord of single-valued learning algorithms, one can not infer that one should average over them. Even if I choose amongst them in a really stupid way (say according to the alphabetical listing of their creators), *so long as I am consistent and single-valued in how I make my choice*, I have no assurance that doing this will give worse results than averaging them. To sum it up: one can not prove averaging to be preferable to a scheme like using the alphabet to pick.
Michael's result shows instead that averaging the guess is better (over multiple trials) than randomly picking amongst the guesses. Which simply means that one should not randomly pick amongst the guesses. It does *not* mean that one should average rather than use some other (arbitrarilly silly) single-valued scheme. David Wolpert Disclaimer: All the above notwithstanding, I personally *would* use some sort of averaging scheme in practice. The only issue of contention here is what is *provably* the way one should generalize. In addition to disseminating the important result concerning the sub-optimality of stochastic schems (of which there are many in the neural nets community!), Michael is to be commended for bringing this entire fascinating subject to the attention of the community.  From tmb at idiap.ch Wed Aug 18 02:27:58 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Wed, 18 Aug 93 08:27:58 +0200 Subject: Yet more on averaging Message-ID: <9308180627.AA18505@idiap.ch> In reply to dhw at santafe.edu: |The bottom line, as I see it: arguments like Michael's show that one |should always use a single-valued learning algorithm rather than a |stochastic one. From tmb at idiap.ch Wed Aug 18 02:29:42 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Wed, 18 Aug 93 08:29:42 +0200 Subject: Yet more on averaging Message-ID: <9308180629.AA18508@idiap.ch> dhw at santafe.edu writes: |To sum it up: one can not prove averaging to be preferable to a scheme |like using the alphabet to pick. Michael's result shows instead that |averaging the guess is better (over multiple trials) than randomly |picking amongst the guesses. | |Which simply means that one should not randomly pick amongst the |guesses. It does *not* mean that one should average rather than use |some other (arbitrarilly silly) single-valued scheme. I would like to strengthen this point a little. In general, averaging is clearly not optimal, nor even justifiable on theoretical grounds. For example, let us take the classification case and let us assume that each neural network $i$ returns an estimate $p^i_j(x)$ of the probability that the object belongs to class $j$ given the measurement $x$. Consider now the case in which we know that the predictions of those networks are statistically independent (for example, because they are run on independent parts of the input data). Then, we should really multiply the probabilities estimated by each network, rather than computing a weighted sum. That is, we should make a decision according to the maximum of $\prod_i p^i_j(x)$, not according to the maximum of $\sum_i w_i p^i_j(x)$ (assuming a 0-1 loss function). As another example, consider the case in which we have an odd number of experts. If they are trained and designed individually in a particularly peculiar way, it might turn out that the optimal decision rule is to output class 1 if an odd number of them pick class 1, and pick class 0 otherwise. Now, Michael probably limits the scope of his claims in his thesis to exclude such cases (I only had a brief look, I must admit), but I think it is important to make the point that, without some additional assumptions, averaging is just a heuristic and not necessarily optimal. Still, linear combinations of the outputs of classifiers, regressors, and networks seem to be useful in practice for improving classification rates in many cases. Lots of practical experience in both statistics and neural networks points in that direction. Thomas.  
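[A small sketch, in Python/NumPy, of the distinction Thomas draws above. The probability estimates are invented for illustration; the point is only that when the experts are independent, the product rule and a weighted average can even disagree about the class.]

    import numpy as np

    # p[i, j] = expert i's estimate of the probability of class j, for one input x.
    p = np.array([[0.80, 0.20],
                  [0.80, 0.20],
                  [0.02, 0.98]])

    avg = p.mean(axis=0)               # simple (equal-weight) averaging
    prod = p.prod(axis=0)              # product rule for independent experts
    prod = prod / prod.sum()           # renormalize

    print("average :", avg, "-> class", avg.argmax())    # picks class 0
    print("product :", prod, "-> class", prod.argmax())  # picks class 1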
From dhw at santafe.edu Wed Aug 18 18:37:32 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Wed, 18 Aug 93 16:37:32 MDT Subject: Random vs. single-valued rules Message-ID: <9308182237.AA03709@zia> tmb writes: >>>>> In reply to dhw at santafe.edu: |The bottom line, as I see it: arguments like Michael's show that one |should always use a single-valued learning algorithm rather than a |stochastic one. >From context, I'm assuming that you are referring to "deterministic" vs. "randomized" decision rules, as they are called in decision theory ("stochastic learning algorithm" means something different to me, but maybe I'm just misinterpreting your posting). Picking an opinion from a pool of experts randomly is clearly not a particularly good randomized decision rule in most cases. However, there are cases in which properly chosen randomized decision rules are important (any good introduction on Bayesian statistics should discuss this). Unless there is an intelligent adversary involved, such cases are probably mostly of theoretical interest, but nonetheless, a randomized decision rule can be "better" than any deterministic one. >>>> Implicit in my statement was the context of Michael Perrone's posting (which I was responding to): convex loss functions, and the fact that in particular, one "single-valued learning algorithm" one might use is the one Michael advocates: average over your pool of experts. Obviously one can choose a single-valued learning algorithm which performs more poorly than randomly drawing from a pool of experts: 1) One can prove that (for convex loss) averaging over the pool is preferable to randomly sampling the pool (Michael's result; note assumptions about lack of correlations between the experts and the like apply.) 2) One can not prove that averaging beats any other single-valued use of the experts. 3) Note that neither (1) nor (2) contradict the assertion that there might be single-valued algorithms which perform worse than randomly sampling the pool. 4) For the case of a 0-1 loss function, and a uniform prior over target functions, it doesn't matter how you guess; all algorithms perform the same, both averaged over data and for one particular data (as far as off-training set average loss is concerned). David Wolpert  From tmb at idiap.ch Thu Aug 19 09:17:14 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Thu, 19 Aug 93 15:17:14 +0200 Subject: Yet more on averaging In-Reply-To: <9308180629.AA18508@idiap.ch> References: <9308180629.AA18508@idiap.ch> Message-ID: <9308191317.AA22756@idiap.ch> I wrote, in response to a discussion of Michael Perrone's work: |In general, averaging is clearly not optimal, nor even justifiable on |theoretical grounds. [... some examples follow...] Judging from some private mail that I have been receiving, some people seem to have misunderstood my message. I wasn't making a statement about Michael's results per se, but about their application. In particular, in the case of combining estimates of probabilities by different "experts" for subsequent classification (e.g., in Michael's OCR example), or in the case of combining expert "votes", using any kind of linear combination is not justifiable in general on theoretical grounds, and it is actually provably suboptimal in some cases. Now, such examples do violate some of the assumptions on which Michael's results rely, so there is no contradiction. 
My message was only intended as a reminder that there are a number of important problems in which the assumptions actually are violated, and in which the approach of linear combinations reduces to a heuristic (one, I might add, that often does work well in practice). Thomas.  From brandyn at brainstorm.com Fri Aug 20 03:32:18 1993 From: brandyn at brainstorm.com (Brandyn) Date: Fri, 20 Aug 93 00:32:18 PDT Subject: Paper available on neuroprose Message-ID: <9308200732.AA14000@brainstorm.com> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/webb.furf.ps.Z The following paper is now available by anonymous FTP: Fusion-Reflection (Self-Supervised Learning) Brandyn Jerad Webb brandyn at brainstorm.com ABSTRACT By analyzing learning from the perspective of knowledge acquisition, a number of common limitations are overcome. Modeling efficacy is proposed as an empirical measure of knowledge, providing a concrete, mathematical means of "acquiring knowledge" via gradient ascent. A specific network architecture is described, a hierarchical analog of node-labeled Hidden Markov Models, and its evaluation and learning laws are derived. In empirical studies using a hand-printed character recognition task, an unsupervised network was able to discover n-gram statistics from groups of letter images, and to use these statistics to enhance its ability to later identify individual letters. Host: archive.cis.ohio-state.edu (128.146.8.52) Directory: pub/neuroprose Filename: webb.furf.ps.Z A version of this paper was submitted to NIPS in May '93. If there is sufficient interest, and if it wouldn't violate neuroprose etiquette, I could possibly make the C code available as well. -Brandyn (brandyn at brainstorm.com)  From mikewj at signal.dra.hmg.gb Fri Aug 20 12:00:02 1993 From: mikewj at signal.dra.hmg.gb (mikewj@signal.dra.hmg.gb) Date: Fri, 20 Aug 93 17:00:02 +0100 Subject: Practical Neural Nets Conference & Workshops in UK. Message-ID: AA16188@ravel.dra.hmg.gb *********************************** NEURAL COMPUTING APPLICATIONS FORUM *********************************** ***************************************** PRACTICAL APPLICATIONS OF NEURAL NETWORKS CALL FOR PRESENTATIONS ***************************************** The Neural Computing Applications Forum is the primary meeting place for people developing Neural Network applications in industry and academia. It has 200 members in the UK and Europe, from Universities and small and large companies, and holds four main meetings each year. It has been running for three years. Presentations, tutorials, and workshops are sought on all practical aspects of Neural Computing and Pattern Recognition. Previous events have included presentations and workshops on practical issues including machine health monitoring, neural control, financial prediction, chemical structure analysis, power station load prediction, copyright law, alternative energy, automatic speech recognition, and human-computer interaction. We also hold introductory tutorials and theoretical workshops on all aspects of Neural computing. Presentations at NCAF do not require a written paper for publication. You will have the chance to draw the attention of the top industrial Neural Network practitioners to your work. conference presenters of outstanding quality will be invited to submit a paper to the Springer Verlag journal Neural Computing and Applications. 
Please contact Mike Wynne-Jones, Programme Organiser, NCAF, PO Box 62, Malvern, WR14 4NU, UK, enclosing your proposed title and a brief synopsis of your presentation. Email: mikewj at signal.dra.hmg.gb; phone +44 684 563858.  From shashem at ecn.purdue.edu Sat Aug 21 18:08:11 1993 From: shashem at ecn.purdue.edu (Sherif Hashem) Date: Sat, 21 Aug 93 17:08:11 -0500 Subject: Combining (averaging) NNs Message-ID: <9308212208.AA18678@cornsilk.ecn.purdue.edu> I have recently joined Connectionists and I read some of the email messages arguing about combining/averaging NNs. Unfortunately, I missed the earlier discussion that started this argument. I am interested in combining NNs, in fact, my Ph.D. thesis is about optimal linear combinations of NNs. Averaging a number of estimators has been suggested/debated/examined in the literature for a long time, dating as far as 1818 (Laplace 1818). Clemen (1989) cites more than 200 papers in his review of the literature related to combining forecasts (estimators), including contributions from forecasting, psychology, statistics, and management science literatures. Numerous empirical studies have been conducted to assess the benefits/limitations of combining estimators (Clemen 1989). Besides, there are quite a few analytical results established in the area. Most of these studies and results are in the forecasting literature (more than 100 publications in the last 20 years). I think that it is fair to say that, as long as no "absolute" best estimator can be identified, combining estimators may provide a superior alternative to picking the best from a population of estimators. I have published some of my preliminary results on the benefits of combining NNs in (Hashem and Schmeiser 1992, 1993a, and Hashem et al. 1993b), and based on my experience with combining NNs, I join Michael Perrone in advocating the use of combining NNs to enhance the estimation accuracy of NN based models. Sherif Hashem email:shashem at ecn.purdue.edu References: ----------- Clemen, R.T. (1989). Combining Forecasts: A Review and Annotated Bibliography. International Journal of Forecasting, Vol. 5, pp. 559-583. Hashem, S., Y. Yih, & B. Schmeiser (1993b). An Efficient Model for Product Allocation using Optimal Combinations of Neural Networks. In Intelligent Engineering Systems through Artificial Neural Networks, Vol. 3, C. Dagli, L. Burke, B. Femandez, & J. Ghosh (Eds.), ASME Press, forthcoming. Hashem, S., & B. Schmeiser (1993a). Approximating a Function and its Derivatives using MSE-Optimal Linear Combinations of Trained Feedforward Neural Networks. Proceedings of the World Congress on Neural Networks, Lawrence Erlbaum Associates, New Jersey, Vol. 1, pp. 617-620. Hashem, S., & B. Schmeiser (1992). Improving Model Accuracy using Optimal Linear Combinations of Trained Neural Networks, Technical Report SMS92-16, School of Industrial Engineering, Purdue University. (Submitted) Laplace P.S. de. (1818). Deuxieme Supplement a la Theorie Analytique des Probabilites (Courcier, Paris).; reprinted (1847) in Oeuvers Completes de Laplace, Vol. 7 (Paris, Gauthier-Villars) 531-580.  
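As a concrete footnote to the discussion of optimal linear combinations above: given a population of trained estimators and some held-out data, MSE-optimal combination weights can be obtained by ordinary least squares, regressing the held-out targets on the estimators' outputs. The sketch below is a generic illustration, not Hashem's or Perrone's actual procedure; the toy data, the three crude "estimators", and the choice to include an intercept are assumptions made only for this example. Simple averaging, or weighting each estimator inversely to its error variance, correspond to fixing the weights in advance rather than fitting them.

  # Generic sketch: fit MSE-optimal linear combination weights on held-out data.
  # F is an (n_samples, n_models) matrix of model outputs, y the targets;
  # all data and "models" below are toy placeholders.
  import numpy as np

  def combination_weights(F, y):
      """Weights and intercept minimizing ||F @ w + c - y||^2."""
      A = np.column_stack([F, np.ones(len(F))])
      coef, *_ = np.linalg.lstsq(A, y, rcond=None)
      return coef[:-1], coef[-1]

  rng = np.random.default_rng(1)
  x = rng.uniform(-1.0, 1.0, 200)
  y = np.sin(3.0 * x)                                   # toy target function
  F = np.column_stack([x, x ** 2, np.sin(2.5 * x)])     # three crude "estimators"

  w, c = combination_weights(F, y)
  combined = F @ w + c
  print("weights:", w, "intercept:", c)
  print("MSE of fitted combination:", np.mean((combined - y) ** 2))
  print("MSE of simple average    :", np.mean((F.mean(axis=1) - y) ** 2))

Because numpy's lstsq solves the problem through a singular value decomposition, collinear or even duplicated estimators in the pool cause no trouble, and the fitted combination can only match or improve on the simple average on the data it was fitted to; as the discussion above makes clear, any advantage still has to be confirmed on data not used for the fit.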
From furu at uchikawa.nuem.nagoya-u.ac.jp Mon Aug 23 11:22:41 1993 From: furu at uchikawa.nuem.nagoya-u.ac.jp (Takeshi Furuhashi) Date: Mon, 23 Aug 93 11:22:41 JST Subject: Call for Papers of WWW Message-ID: <9308230222.AA00124@cancer.uchikawa.nuem.nagoya-u.ac.jp> CALL FOR PAPERS TENTATIVE 1994 IEEE/Nagoya University World Wisemen/women Workshop(WWW) ON FUZZY LOGIC AND NEURAL NETWORKS/GENETIC ALGORITHMS -Architecture and Applications for Knowledge Acquisition/Adaptation- August 9 and 10, 1994 Nagoya University Symposion Chikusa-ku, Nagoya, JAPAN Sponsored by Nagoya University Co-sponsored by IEEE Industrial Electronics Society Technically Co-sponsored by IEEE Neural Network Council IEEE Robotics and Automation Society International Fuzzy Systems Association Japan Society for Fuzzy Theory and Systems North American Fuzzy Information Processing Society Society of Instrument and Control Engineers Robotics Society of Japan There are growing interests in combination technologies of fuzzy logic and neural networks, fuzzy logic and genetic algorithm for acquisition of experts' knowledge, modeling of nonlinear systems, realizing adaptive systems. The goal of the 1994 IEEE/Nagoya University WWW on Fuzzy Logic and Neural Networks/Genetic Algorithm is to give its attendees opportunities to exchange information and ideas on various aspects of the Combination Technologies and to stimulate and inspire pioneering works in this area. To keep the quality of these workshop high, only a limited number of people are accepted as participants of the workshops. The papers presented at the workshop will be edited and published from the Oxford University Press. TOPICS: Combination of Fuzzy Logic and Neural Networks, Combination of Fuzzy Logic and Genetic Algorithm, Learning and Adaptation, Knowledge Acquisition, Modeling, Human Machine Interface IMPORTANT DATES: Submission of Abstracts of Papers : April 31, 1994 Acceptance Notification : May 31, 1994 Final Manuscript : July 1, 1994 A partial or full assistance of travel expenses for speakers of excellent papers will be provided by the WWW. The candidates should apply as soon as possible, preferably by Jan. 30, '94 All correspondence and submission of papers should be sent to Takeshi Furuhashi, General Chair Dept. of Information Electronics, Nagoya University Furo-cho, Chikusa-ku, Nagoya 464-01, JAPAN TEL: +81-52-781-5111 ext.2792 FAX: +81-52-781-9263 E mail: furu at uchikawa.nuem.nagoya-u.ac.jp IEEE/Nagoya University WWW: IEEE/Nagoya University WWW(World Wisemen/women Workshop) is a series of workshops sponsored by Nagoya University and co-sponsored by IEEE Industrial Electronics Society. City of Naoya, located two hours away from Tokyo, has many electro-mechanical industries in its surroundings such as Mitsubishi, TOYOTA, and their allied companies. Nagoya is a mecca of robotics industries, machine industries and aerospace industries in Japan. The series of workshops will give its attendees opportunities to exchange information on advanced sciences and technologies and to visit industries and research institutes in this area. *This workshop will be held just after the 3rd International Conference on Fuzzy Logic, Neural Nets and Soft Computing(IIZUKA'94) from Aug. 1 to 7, '94. 
WORKSHOP ORGANIZATION Honorary Chair: Tetsuo Fujimoto (Dean, School of Engineering, Nagoya University) General Chair: Takeshi Furuhashi (Nagoya University) Advisory Committee: Chair: Toshio Fukuda (Nagoya University) Fumio Harashima (University of Tokyo) Yoshiki Uchikawa (Nagoya University) Takeshi Yamakawa (Kyushu Institute of Technology) Steering Committee: H.Berenji (NASA Ames Research Center) W.Eppler (University of Karlsruhe) I.Hayashi (Hannan University) Y.Hayashi (Ibaraki University) H.Ichihashi (Osaka Prefectural University) A.Imura (Laboratory for International Fuzzy Engineering) M.Jordan (Massachusetts Institute of Technology) C.-C.Jou (National Chiao Tung Universtiy) E.Khan (National Semiconductor) R.Langari (Texas A & M University) H.Takagi (Matsushita Electric Industrial Co., Ltd.) K.Tanaka (Kanazawa University) M.Valenzuela-Rendon (Institute Tecnologico y de Estudios Superiores de Monterrey) L.-X.Wang (University of California Berkeley) T.Yamaguchi (Utsunomiya University) J.Yen (Texas A & M Universtiy)  From joachim at fit.qut.edu.au Wed Aug 25 21:46:11 1993 From: joachim at fit.qut.edu.au (Joachim Diederich) Date: Wed, 25 Aug 1993 21:46:11 -0400 Subject: Second Brisbane Neural Network Workshop Message-ID: <199308260146.VAA09819@fitmail.fit.qut.edu.au> Second Brisbane Neural Network Workshop --------------------------------------- Queensland University of Technology Brisbane Q 4001, AUSTRALIA Gardens Point Campus, ITE 410 24 September 1993 This Second Brisbane Neural Network Workshop is intended to bring together those interested in neurocomputing and neural network applications. The objective of the workshop is to provide a discussion platform for researchers and practitioners interested in theoretical and applied aspects of neurocomputing. The workshop should be of interest to computer scientists and engineers, as well as to biologists, cognitive scientists and others interested in the application of neural networks. The Second Brisbane Neural Network Workshop will be held at Queensland University of Technology, Gardens Point Campus (ITE 410) on September 24, 1993 from 9:00am to 6:00pm. 
Program ------- 9:00-9:15 Welcome Joachim Diederich, Queensland University of Technology, Neurocomputing Research Concentration Area Cognitive Science ----------------- 9:15-10:00 Graeme Halford, University of Queensland, Department of Psychology "Representation of concepts in PDP models" 10:00-10:30 Joachim Diederich, Queensland University of Technology, Neurocomputing Research Concentration Area "Re-learning in connectionist semantic networks" 10:30-11:00 Coffee Break 11:00-11:30 James Hogan, Queensland University of Technology, Neurocomputing Research Concentration Area "Recruitment learning in randomly connected neural networks" 11:30-12:00 Kate Stevens, University of Queensland, Department of Psychology "Music perception and neural network modelling" 12:00-1:00 Lunch Break 1:00-1:30 Software Demonstration: "Animal breeding advice using neural networks" Learning -------- 1:30-2:15 Tom Downs, University of Queensland, Department of Electrical Engineering "Generalisation, structure and learning in artificial neural networks" 2:15-3:00 Ah Chung Tsoi, University of Queensland, Department of Electrical Engineering "Training algorithms for recurrent neural networks, a unified framework" 3:00-3:30 Steven Young, University of Queensland, Department of Electrical Engineering "Constructive algorithms for neural networks" 3:30-4:00 Coffee Break Pattern Recognition and Control ------------------------------- 4:00-4:30 Gerald Finn, Queensland University of Technology, Neurocomputing Research Concentration Area "Learning fuzzy rules by genetic algorithms" 4:30-5:00 Paul Hannah & Russel Stonier, University of Central Queensland, Department of Mathematics and Computing "Using a modified Kohonen associative map for function approximation with application to control" Theory and Artificial Intelligence ---------------------------------- 5:00-5:30 M. Mohammadian, X. Yu & J.D. Smith, University of Central Queensland, Department of Mathematics and Computing "From connectionist learning to an optimised fuzzy knowledge base" 5:30-6:00 Richard Bonner & Louis Sanzogni, Griffith University, School of Information Systems & Management Science "Embedded neural networks" All are welcome. Participation is free and there is no registration. Enquiries should be sent to Professor Joachim Diederich Neurocomputing Research Concentration Area School of Computing Science Queensland University of Technology GPO Box 2434 Brisbane Q 4001 Australia Phone: +61 7 864-2143 Fax: +61 7 864-1801 Email: joachim at fitmail.fit.qut.edu.au  From sims at pdesds1.scra.org Thu Aug 26 11:48:04 1993 From: sims at pdesds1.scra.org (Jim Sims) Date: Thu, 26 Aug 93 11:48:04 EDT Subject: fyi, late, but better than never Message-ID: <9308261548.AA07086@pdesds1.noname> I saw this while browsing the electronic CBD materils. Agency : NAS Deadline : 12/01/93 Title : Neurolab Reference: Commerce Business Dailly, 07/06/93 BASIC RESEARCH OPPORTUNITY SOL OA SLS-4 POC Dr. Frank Sulzman tel: 202/358-2359 The National Aeronautics and Space Administration (NASA), along with its domestic (NIH, NSF) and international (CNES, CSA, DARA, ESA, NASDA) partners is soliciting proposals for Neurolab, a Space Shuttle mission dedicated to brain and behavior research that is scheduled for launch in 1998. A more detailed description of the opportunity with specific guidelines for proposal preparation is available from Neurolab Program Scientist, NASA Headquarters, Code UL, 300 E St., SW, Washington, DC 20546. 
This NASA Announcement of Opportunity will be open for the period through December 1, 1993. (0182) SPONSOR: NASA Headquarters, Code UL/Neurolab Program Scientist, Washington, DC 20546 Attn:UL/Dr. Frank Sulzman  From PIURI at IPMEL1.POLIMI.IT Fri Aug 27 07:55:19 1993 From: PIURI at IPMEL1.POLIMI.IT (PIURI@IPMEL1.POLIMI.IT) Date: 27 Aug 1993 12:55:19 +0100 (MET) Subject: call for papers Message-ID: <01H28OFZ9KS291WC7T@icil64.cilea.it> ============================================================================= 14th IMACS WORLD CONGRESS ON COMPUTATION AND APPLIED MATHEMATICS July 11-15, 1994 Atlanta, Georgia, USA Sponsored by: IMACS - International Association for Mathematics and Computers in Simulation IFAC - International Federation for Automatic Control IFIP - International Federation for Information Processing IFORS - International Federation of Operational Research Societies IMEKO - International Measurement Confederation General Chairman: Prof. W.F. Ames Georgia Institute of Technology, Atlanta, GA, USA SESSIONS ON NEURAL NETWORKS 1. NEURAL NETWORK ARCHITECTURES AND IMPLEMENTATIONS 2. APPLICATION OF NEURAL TECHNIQUES FOR SIGNAL AND IMAGE PROCESSING >>>>>> CALL FOR PAPERS <<<<<< The IMACS World Congress on Computation and Applied Mathematics is held every three year to provide a large general forum to professionals and scientists for analyzing and discussing the fundamental advances of research in all areas of scientific computation, applied mathematics, mathematical modelling, and system simulation in and for specific disciplines, the philosophical aspects, and the impact on society and on disciplinary and interdisciplinary research. In the 14th edition, two sessions are planned on neural networks: "Neural Network Architectures and Implementations" and "Application of Neural Techniques for Signal and Image Processing". The first session will focus on all theoretical and practical aspects of architectural design and realization of neural networks: from mathematical analysis and modelling to behavioral specification, from architectural definition to structural design, from VLSI implementation to software emulation, from design simulation at any abstraction level to CAD tools for neural design, simulation and evaluation. The second session will present the concepts, the design and the use of neural solutions within the area of signal and image processing, e.g., for modelling, identification, analysis, classification, recognition, and filtering. Particular emphasis will be given to presentation of specific applications or application areas. Authors interested in the above neural sessions are invited to send a one page abstract, the title of the paper and the author's address by electronic mail, fax or postal mail to the Neural Sessions' Chairman by October 15, 1993. Authors must then submit five copies of their typed manuscript by postal mail or fax to the Neural Sessions' Chairman by November 19, 1993. Preliminary notification of acceptance/rejection will be mailed by November 30, 1993. Final acceptance/rejection will be mailed by January 31, 1994. Neural Sessions' Chairman: Prof. Vincenzo Piuri Department of Electronics and Information Politecnico di Milano piazza L. da Vinci 32 I-20133 Milano, Italy phone no. +39-2-23993606, +39-2-23993623 fax no. 
+39-2-23993411 e-mail piuri at ipmel1.polimi.it =============================================================================  From goodman at unr.edu Thu Aug 26 12:35:53 1993 From: goodman at unr.edu (Phil Goodman) Date: Thu, 26 Aug 93 16:35:53 GMT Subject: NevProp 1.16 Update Available Message-ID: <9308262335.AA24854@equinox.ccs.unr.edu> Please consider the following update announcement: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * NevProp 1.16 corrects a bug in the output range of symmetric sigmoids and one occuring when the number of testing is fewer than training cases. These fixes are further described in the README.CHANGES file at the UNR anonymous ftp, described below. The UNR anonymous ftp host is 'unssun.scs.unr.edu', and the files are in the directory 'pub/goodman/nevpropdir'. Version 1.15 users can update 3 ways: a. Just re-ftp the 'nevprop1.16.shar' file and unpack and 'make' np again. (also available at the CMU machine, describe below.) b. Just re-ftp (in "binary" mode) the DOS or MAC executable binaries located in the 'dosdir' or 'macdir' subdirectories, respectively. c. Ftp only the 'np.c' file provided, replacing your old version, then 'make' d. Ftp only the 'np-patchfile', then issue the command 'patch < np-patchfile' to locally update np.c, then 'make' again. New users can obtain NevProp 1.16 from the anonymous UNR anonymous ftp as described in (a) or (b) above, or from the CMU machine: a. Create an FTP connection from wherever you are to machine "ftp.cs.cmu.edu". The internet address of this machine is 128.2.206.173, for those who need it. b. Log in as user "anonymous" with your own ID as password. You may see an error message that says "filenames may not have /.. in them" or something like that. Just ignore it. c. Change remote directory to "/afs/cs/project/connect/code". NOTE: You must do this in a single operation. Some of the super directories on this path are protected against outside users. d. At this point FTP should be able to get a listing of files in this directory with "dir" & fetch the ones you want with "get". (The exact FTP commands depend on your local FTP server.) Version 1.2 will be released soon. A major new feature will be the option of using cross-entropy rather than least squares error function. Phil ___________________________ ___________________________ Phil Goodman,MD,MS goodman at unr.edu | __\ | _ \ | \/ || _ \ Associate Professor & CBMR Director || ||_// ||\ /||||_// Cardiovascular Studies Team Leader || | _( || \/ ||| _( ||__ ||_\\ || |||| \\ CENTER for BIOMEDICAL MODELING RESEARCH |___/ |___/ || |||| \\ University of Nevada School of Medicine Washoe Medical Center H1-166, 77 Pringle Way, Reno, NV 89520 702-328-4867 FAX:328-4111 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *  From heiniw at sun1.eeb.ele.tue.nl Fri Aug 27 08:37:13 1993 From: heiniw at sun1.eeb.ele.tue.nl (Heini Withagen) Date: Fri, 27 Aug 1993 14:37:13 +0200 (MET DST) Subject: Neural hardware performance criteria Message-ID: <9308271237.AA00409@sun1.eeb.ele.tue.nl> A non-text attachment was scrubbed... 
Name: not available Type: text Size: 1296 bytes Desc: not available Url : https://mailman.srv.cs.cmu.edu/mailman/private/connectionists/attachments/00000000/c47cdf08/attachment.ksh From alex at brain.physics.swin.oz.au Sat Aug 28 07:05:34 1993 From: alex at brain.physics.swin.oz.au (Alex A Sergejew) Date: Sat, 28 Aug 93 21:05:34 +1000 Subject: Pan Pacific Conf on Brain Electric Topography - 1st announcement Message-ID: <9308281105.AA12138@brain.physics.swin.oz.au> FIRST ANNOUNCEMENT PAN PACIFIC CONFERENCE ON BRAIN ELECTRIC TOPOGRAPHY February 10 - 12, 1994 SYDNEY, AUSTRALIA INVITATION Brain electric and magnetic topography is an exciting emerging area which draws on the disciplines of neurophysiology, physics, signal processing, computing and cognitive neuroscience. This conference will offer a forum for the presentation of recent findings. The program will include an outstanding series of plenary lectures, as well as platform and poster presentations by active participants in the field. The conference includes two major plenary sessions. In the Plenary Session entitled "Brain Activity Topography and Cognitive Processes," the keynote speakers include Frank Duffy (Boston), Alan Gevins (San Francisco), Steven Hillyard (La Jolla), Yoshihiko Koga (Tokyo) and Paul Nunez (New Orleans). Keynote speakers for the Plenary Session entitled "Brain Rhythmic Activity and States of Consciousness," will include Walter Freeman (Berkeley), Rodolfo Llinas (New York), Shigeaki Matsuoka (Kitakyushu) and Yuzo Yamaguchi (Osaka). The plenary sessions will provide a forum for discussion of some of the most recent developments of analysis and models of electrical brain function, and findings of brain topography and cognitive processes. This conference is aimed at harnessing multidisciplinary participation and will be of interest to those working in the areas of clinical neurophysiology, cognitive neuroscience, biological signal processing, neurophysiology, neurology, neuropsychology and neuropsychiatry. CALL FOR PAPERS Papers are invited for platform and poster presentation. Platform presentations will be allocated 20 minutes (15 mins for presentation and 5 mins for questions). Abstracts of no more than 300 words are invited. The deadline for receipt of abstracts is November 10th, 1993, while notification of acceptance of abstracts will be sent on December 10th, 1993 The abstract can be sent by mail, Fax or Email to: PAN PACIFIC CONFERENCE ON BRAIN ELECTRIC TOPOGRAPHY C/- Cognitive Neuroscience Unit Westmead Hospital, Hawkesbury Road Westmead NSW 2145, Sydney AUSTRALIA Fax : +61 (2) 635 7734 Tel : +61 (2) 633 6688 Email : pan at brain.physics.swin.oz.au Authors may be invited to provide full manuscripts for publication of the proceedings in CD-ROM and book form. All authors wishing to have their papers included must supply a full manuscript at the time of the conference. GENERAL INFORMATION: Date: February 10 - 12, 1994 Venue: The conference will be held at the Hotel Intercontinental on Sydney Harbour. Climate: February is summertime in Australia and the average maximum day-time temperature in Sydney is 26 degC (78 degF). Social Programme: There will be a conference dinner on a yacht sailing Sydney Harbour on February 11th, 1994. Cost $A65 per person. Hotel Accommodation: Hotels listed offer a range of accommodation at special conference rates. Please quote the name of the conference when arranging your booking. 
Scientific Committee: Organising Committee: Prof Richard Silberstein, Melbourne (Chairman) E Gordon (Chairman) A/Prof Helen Beh, Sydney R Silberstein Dr Evian Gordon, Sydney J Restom Dr Shigeaki Matsuoka, Kitakyushu Dr Patricia Michie, Sydney Dr Ken Nagata, Akita Dr Alex Sergejew, Melbourne A/Prof James Wright, Auckland REGISTRATION: Name(Prof/Dr/Ms/Mr):__________________________________________________ Address:______________________________________________________________ ______________________________________________________________________ ______________________________________________________________________ Telephone: ______________________________ (include country/area code) Fax:______________________________ E Mail______________________________ On or before November 10th, 1993 $A380.00 After November 10th, 1993 $A400.00 Students before November 10th,1993 $A250.00 Conference Harbour Cruise Dinner $A65.00 per person number of people _____ Method of Payment: Cheque _ MasterCard _ VISA _ BankCard _ To be completed by credit card users only: Card Number _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Expiration Date __________________________ Signature __________________________ (Signature not required if registering by E-mail) Date __________________________ Cheques should be payable to "Pan Pacific Conference" (Address below) SOME SUGGESTIONS FOR HOTEL ACCOMODATION Special conference rates apply. Quote the name of the conference when booking. Prices are per double room per night SYDNEY RENAISSANCE HOTEL***** Guaranteed harbour view. 10 min walk under cover. $A170.00 30 Pitt St, Sydney NSW 2000, Australia. Ph: +61 (2) 259 7000 Fax +61 (2) 252 1999 HOTEL INTERCONTINENTAL SYDNEY***** Harbour view $A205.00 City View $A165.00 117 Macquarie Street, Sydney NSW 2000, Australia. Ph: +61 (2) 230 0200 Fax: +61 (2) 240 1240 OLD SYDNEY PARKROYAL**** 10 min walk. $A190.00 including breakfast 55 George St, Sydney NSW 2000, Australia. Ph: +61 (2) 252 0524 Fax: (2) +61 251 2093 RAMADA GRAND HOTEL, BONDI BEACH**** Complementary shuttlebus service. $A130 - $A170 including breakfast Beach Rd, Bondi Beach NSW 2026, Australia. Ph: +61 (2) 365 5666 Fax: +61 (2) 3655 330 HOTEL CRANBROOK INTERNATIONAL*** Older style, budget type accomodation overlooking Rose Bay. Free shuttlebus service and airport transfers. $A80.00 including breakfast 601 New South Head Rd, Rose Bay NSW 2020, Australia. Ph: +61 (2) 252 0524 Fax: +61 (2) 251 2093 Post registration details with your cheque to: PAN PACIFIC CONFERENCE ON ELECTRIC BRAIN TOPOGRAPHY C/- Cognitive Neuroscience Unit Westmead Hospital, Hawkesbury Road Westmead NSW 2145, Sydney AUSTRALIA  From taylor at world.std.com Sun Aug 29 22:21:27 1993 From: taylor at world.std.com (Russell R Leighton) Date: Sun, 29 Aug 1993 22:21:27 -0400 Subject: AM6 Users: release notes and bug fixes available Message-ID: <199308300221.AA27236@world.std.com> There has been an update to the am6.notes file at the AM6 ftp sites. User's not on the AM6 users mailing list should get this file and update their installation. Russ ======== REPOST OF AM6 RELEASE (long) ======== The following describes a neural network simulation environment made available free from the MITRE Corporation. The software contains a neural network simulation code generator which generates high performance ANSI C code implementations for modular backpropagation neural networks. Also included is an interface to visualization tools. 
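Since the announcement that follows centers on generated backpropagation simulations, a bare-bones sketch of the computation such a simulation performs may help readers new to the area. This is a generic toy example, in no way Aspirin's generated code or the AM6 interface; the layer sizes, data, and step size are arbitrary assumptions.

  # Generic one-hidden-layer backpropagation toy (squared error, batch updates).
  # NOT Aspirin/MIGRAINES code; all sizes and data are made up.
  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.uniform(-1.0, 1.0, (64, 2))                  # toy inputs
  Y = (X[:, :1] * X[:, 1:] > 0).astype(float)          # toy target: 1 if inputs share a sign

  n_in, n_hid, n_out, lr = 2, 8, 1, 0.5
  W1 = rng.normal(0.0, 0.5, (n_in, n_hid))
  b1 = np.zeros(n_hid)
  W2 = rng.normal(0.0, 0.5, (n_hid, n_out))
  b2 = np.zeros(n_out)
  sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

  for _ in range(2000):
      H = sigmoid(X @ W1 + b1)                         # forward pass
      P = sigmoid(H @ W2 + b2)
      dP = (P - Y) * P * (1 - P) / len(X)              # backward pass (MSE gradient)
      dH = (dP @ W2.T) * H * (1 - H)
      W2 -= lr * (H.T @ dP)
      b2 -= lr * dP.sum(axis=0)
      W1 -= lr * (X.T @ dH)
      b1 -= lr * dH.sum(axis=0)

  print("final mean squared error:", float(np.mean((P - Y) ** 2)))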
FREE NEURAL NETWORK SIMULATOR AVAILABLE Aspirin/MIGRAINES Version 6.0 The Mitre Corporation is making available free to the public a neural network simulation environment called Aspirin/MIGRAINES. The software consists of a code generator that builds neural network simulations by reading a network description (written in a language called "Aspirin") and generates an ANSI C simulation. An interface (called "MIGRAINES") is provided to export data from the neural network to visualization tools. The previous version (Version 5.0) has over 600 registered installation sites world wide. The system has been ported to a number of platforms: Host platforms: convex_c2 /* Convex C2 */ convex_c3 /* Convex C3 */ cray_xmp /* Cray XMP */ cray_ymp /* Cray YMP */ cray_c90 /* Cray C90 */ dga_88k /* Data General Aviion w/88XXX */ ds_r3k /* Dec Station w/r3000 */ ds_alpha /* Dec Station w/alpha */ hp_parisc /* HP w/parisc */ pc_iX86_sysvr4 /* IBM pc 386/486 Unix SysVR4 */ pc_iX86_sysvr3 /* IBM pc 386/486 Interactive Unix SysVR3 */ ibm_rs6k /* IBM w/rs6000 */ news_68k /* News w/68XXX */ news_r3k /* News w/r3000 */ next_68k /* NeXT w/68XXX */ sgi_r3k /* Silicon Graphics w/r3000 */ sgi_r4k /* Silicon Graphics w/r4000 */ sun_sparc /* Sun w/sparc */ sun_68k /* Sun w/68XXX */ Coprocessors: mc_i860 /* Mercury w/i860 */ meiko_i860 /* Meiko w/i860 Computing Surface */ Included with the software are "config" files for these platforms. Porting to other platforms may be done by choosing the "closest" platform currently supported and adapting the config files. New Features ------------ - ANSI C ( ANSI C compiler required! If you do not have an ANSI C compiler, a free (and very good) compiler called gcc is available by anonymous ftp from prep.ai.mit.edu (18.71.0.38). ) Gcc is what was used to develop am6 on Suns. - Autoregressive backprop has better stability constraints (see examples: ringing and sequence), very good for sequence recognition - File reader supports "caching" so you can use HUGE data files (larger than physical/virtual memory). - The "analyze" utility which aids the analysis of hidden unit behavior (see examples: sonar and characters) - More examples - More portable system configuration for easy installation on systems without a "config" file in distribution Aspirin 6.0 ------------ The software that we are releasing now is for creating, and evaluating, feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, as well as at casual users who will wish to use the system with an absolute minimum of effort. Aspirin was originally conceived as ``a way of dealing with MIGRAINES.'' Our goal was to create an underlying system that would exist behind the graphics and provide the network modeling facilities. The system had to be flexible enough to allow research, that is, make it easy for a user to make frequent, possibly substantial, changes to network designs and learning algorithms. At the same time it had to be efficient enough to allow large ``real-world'' neural network systems to be developed. Aspirin uses a front-end parser and code generators to realize this goal. A high level declarative language has been developed to describe a network. This language was designed to make commonly used network constructs simple to describe, but to allow any network to be described. 
The Aspirin file defines the type of network, the size and topology of the network, and descriptions of the network's input and output. This file may also include information such as initial values of weights, names of user defined functions. The Aspirin language is based around the concept of a "black box". A black box is a module that (optionally) receives input and (necessarily) produces output. Black boxes are autonomous units that are used to construct neural network systems. Black boxes may be connected arbitrarily to create large possibly heterogeneous network systems. As a simple example, pre or post-processing stages of a neural network can be considered black boxes that do not learn. The output of the Aspirin parser is sent to the appropriate code generator that implements the desired neural network paradigm. The goal of Aspirin is to provide a common extendible front-end language and parser for different network paradigms. The publicly available software will include a backpropagation code generator that supports several variations of the backpropagation learning algorithm. For backpropagation networks and their variations, Aspirin supports a wide variety of capabilities: 1. feed-forward layered networks with arbitrary connections 2. ``skip level'' connections 3. one and two-dimensional weight tessellations 4. a few node transfer functions (as well as user defined) 5. connections to layers/inputs at arbitrary delays, also "Waibel style" time-delay neural networks 6. autoregressive nodes. 7. line search and conjugate gradient optimization The file describing a network is processed by the Aspirin parser and files containing C functions to implement that network are generated. This code can then be linked with an application which uses these routines to control the network. Optionally, a complete simulation may be automatically generated which is integrated with the MIGRAINES interface and can read data in a variety of file formats. Currently supported file formats are: Ascii Type1, Type2, Type3 Type4 Type5 (simple floating point file formats) ProMatlab Examples -------- A set of examples comes with the distribution: xor: from RumelHart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 330-334. encode: from RumelHart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 335-339. bayes: Approximating the optimal bayes decision surface for a gauss-gauss problem. detect: Detecting a sine wave in noise. iris: The classic iris database. characters: Learing to recognize 4 characters independent of rotation. ring: Autoregressive network learns a decaying sinusoid impulse response. sequence: Autoregressive network learns to recognize a short sequence of orthonormal vectors. sonar: from Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89. spiral: from Kevin J. Lang and Michael J, Witbrock, "Learning to Tell Two Spirals Apart", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988. ntalk: from Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168. perf: a large network used only for performance testing. monk: The backprop part of the monk paper. The MONK's problem were the basis of a first international comparison of learning algorithms. 
The result of this comparison is summarized in "The MONK's Problems - A Performance Comparison of Different Learning algorithms" by S.B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, and J. Zhang has been published as Technical Report CS-CMU-91-197, Carnegie Mellon University in Dec. 1991. wine: From the ``UCI Repository Of Machine Learning Databases and Domain Theories'' (ics.uci.edu: pub/machine-learning-databases). Performance of Aspirin simulations ---------------------------------- The backpropagation code generator produces simulations that run very efficiently. Aspirin simulations do best on vector machines when the networks are large, as exemplified by the Cray's performance. All simulations were done using the Unix "time" function and include all simulation overhead. The connections per second rating was calculated by multiplying the number of iterations by the total number of connections in the network and dividing by the "user" time provided by the Unix time function. Two tests were performed. In the first, the network was simply run "forward" 100,000 times and timed. In the second, the network was timed in learning mode and run until convergence. Under both tests the "user" time included the time to read in the data and initialize the network. Sonar: This network is a two layer fully connected network with 60 inputs: 2-34-60. Millions of Connections per Second Forward: SparcStation1: 1 IBM RS/6000 320: 2.8 HP9000/720: 4.0 Meiko i860 (40MHz) : 4.4 Mercury i860 (40MHz) : 5.6 Cray YMP: 21.9 Cray C90: 33.2 Forward/Backward: SparcStation1: 0.3 IBM RS/6000 320: 0.8 Meiko i860 (40MHz) : 0.9 HP9000/720: 1.1 Mercury i860 (40MHz) : 1.3 Cray YMP: 7.6 Cray C90: 13.5 Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89. Nettalk: This network is a two layer fully connected network with [29 x 7] inputs: 26-[15 x 8]-[29 x 7] Millions of Connections per Second Forward: SparcStation1: 1 IBM RS/6000 320: 3.5 HP9000/720: 4.5 Mercury i860 (40MHz) : 12.4 Meiko i860 (40MHz) : 12.6 Cray YMP: 113.5 Cray C90: 220.3 Forward/Backward: SparcStation1: 0.4 IBM RS/6000 320: 1.3 HP9000/720: 1.7 Meiko i860 (40MHz) : 2.5 Mercury i860 (40MHz) : 3.7 Cray YMP: 40 Cray C90: 65.6 Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168. Perf: This network was only run on a few systems. It is very large with very long vectors. The performance on this network is in some sense a peak performance for a machine. This network is a two layer fully connected network with 2000 inputs: 100-500-2000 Millions of Connections per Second Forward: Cray YMP 103.00 Cray C90 220 Forward/Backward: Cray YMP 25.46 Cray C90 59.3 MIGRAINES ------------ The MIGRAINES interface is a terminal based interface that allows you to open Unix pipes to data in the neural network. This replaces the NeWS1.1 graphical interface in version 4.0 of the Aspirin/MIGRAINES software. The new interface is not a simple to use as the version 4.0 interface but is much more portable and flexible. The MIGRAINES interface allows users to output neural network weight and node vectors to disk or to other Unix processes. 
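To make the connections-per-second figures quoted above concrete, here is the stated calculation worked through for the Sonar benchmark (60 inputs, 34 hidden units, 2 outputs). Counting only the layer-to-layer weights gives 60*34 + 34*2 = 2108 connections; whether the published numbers also count bias connections is not stated, so biases are left out here, and the timing value below is a hypothetical placeholder rather than a measured one.

  # Worked example of the connections-per-second rating described above.
  iterations = 100_000                    # forward passes in the first test
  connections = 60 * 34 + 34 * 2          # Sonar net: 2108 layer-to-layer weights
  user_time_s = 200.0                     # hypothetical Unix "user" time, in seconds
  cps = iterations * connections / user_time_s
  print(cps / 1e6, "million connections per second")   # about 1.05 for these numbers

A machine rated at 1 million connections per second would therefore need roughly 210 seconds of user time for this forward test, and proportionally less for the faster machines in the table.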
Users can display the data using either public or commercial graphics/analysis tools. Example filters are included that convert data exported through MIGRAINES to formats readable by: - Gnuplot 3 - Matlab - Mathematica - Xgobi Most of the examples (see above) use the MIGRAINES interface to dump data to disk and display it using a public software package called Gnuplot3. Gnuplot3 can be obtained via anonymous ftp from: >>>> In general, Gnuplot 3 is available as the file gnuplot3.?.tar.Z >>>> Please obtain gnuplot from the site nearest you. Many of the major ftp >>>> archives world-wide have already picked up the latest version, so if >>>> you found the old version elsewhere, you might check there. >>>> >>>> NORTH AMERICA: >>>> >>>> Anonymous ftp to dartmouth.edu (129.170.16.4) >>>> Fetch >>>> pub/gnuplot/gnuplot3.?.tar.Z >>>> in binary mode. >>>>>>>> A special hack for NeXTStep may be found on 'sonata.cc.purdue.edu' >>>>>>>> in the directory /pub/next/submissions. The gnuplot3.0 distribution >>>>>>>> is also there (in that directory). >>>>>>>> >>>>>>>> There is a problem to be aware of--you will need to recompile. >>>>>>>> gnuplot has a minor bug, so you will need to compile the command.c >>>>>>>> file separately with the HELPFILE defined as the entire path name >>>>>>>> (including the help file name.) If you don't, the Makefile will over >>>>>>>> ride the def and help won't work (in fact it will bomb the program.) NetTools ----------- We have include a simple set of analysis tools by Simon Dennis and Steven Phillips. They are used in some of the examples to illustrate the use of the MIGRAINES interface with analysis tools. The package contains three tools for network analysis: gea - Group Error Analysis pca - Principal Components Analysis cda - Canonical Discriminants Analysis Analyze ------- "analyze" is a program inspired by Denis and Phillips' Nettools. The "analyze" program does PCA, CDA, projections, and histograms. It can read the same data file formats as are supported by "bpmake" simulations and output data in a variety of formats. Associated with this utility are shell scripts that implement data reduction and feature extraction. "analyze" can be used to understand how the hidden layers separate the data in order to optimize the network architecture. How to get Aspirin/MIGRAINES ----------------------- The software is available from two FTP sites, CMU's simulator collection and UCLA's cognitive science machines. The compressed tar file is a little less than 2 megabytes. Most of this space is taken up by the documentation and examples. The software is currently only available via anonymous FTP. > To get the software from CMU's simulator collection: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu" (128.2.254.155). 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "/afs/cs/project/connect/code". Any subdirectories of this one should also be accessible. Parent directories should not be. ****You must do this in a single operation****: cd /afs/cs/project/connect/code 4. At this point FTP should be able to get a listing of files in this directory and fetch the ones you want. Problems? - contact us at "connectionists-request at cs.cmu.edu". 5. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 6. Get the file "am6.tar.Z" 7. Get the file "am6.notes" > To get the software from UCLA's cognitive science machines: 1. 
Create an FTP connection to "ftp.cognet.ucla.edu" (128.97.8.19) (typically with the command "ftp ftp.cognet.ucla.edu") 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "pub/alexis", by typing the command "cd pub/alexis" 4. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 5. Get the file by typing the command "get am6.tar.Z" 6. Get the file "am6.notes" Other sites ----------- If these sites do not work well for you, then try the archie internet mail server. Send email: To: archie at cs.mcgill.ca Subject: prog am6.tar.Z Archie will reply with a list of internet ftp sites that you can get the software from. How to unpack the software -------------------------- After ftp'ing the file make the directory you wish to install the software. Go to that directory and type: zcat am6.tar.Z | tar xvf - -or- uncompress am6.tar.Z ; tar xvf am6.tar How to print the manual ----------------------- The user documentation is located in ./doc in a few compressed PostScript files. To print each file on a PostScript printer type: uncompress *.Z lpr -s *.ps Why? ---- I have been asked why MITRE is giving away this software. MITRE is a non-profit organization funded by the U.S. federal government. MITRE does research and development into various technical areas. Our research into neural network algorithms and applications has resulted in this software. Since MITRE is a publically funded organization, it seems appropriate that the product of the neural network research be turned back into the technical community at large. Thanks ------ Thanks to the beta sites for helping me get the bugs out and make this portable. Thanks to the folks at CMU and UCLA for the ftp sites. Copyright and license agreement ------------------------------- Since the Aspirin/MIGRAINES system is licensed free of charge, the MITRE Corporation provides absolutely no warranty. Should the Aspirin/MIGRAINES system prove defective, you must assume the cost of all necessary servicing, repair or correction. In no way will the MITRE Corporation be liable to you for damages, including any lost profits, lost monies, or other special, incidental or consequential damages arising out of the use or in ability to use the Aspirin/MIGRAINES system. This software is the copyright of The MITRE Corporation. It may be freely used and modified for research and development purposes. We require a brief acknowledgement in any research paper or other publication where this software has made a significant contribution. If you wish to use it for commercial gain you must contact The MITRE Corporation for conditions of use. The MITRE Corporation provides absolutely NO WARRANTY for this software. Russell Leighton ^ / |\ /| INTERNET: taylor at world.std.com |-| / | | | | | / | | |  From sun at umiacs.UMD.EDU Mon Aug 30 13:11:10 1993 From: sun at umiacs.UMD.EDU (Guo-Zheng Sun) Date: Mon, 30 Aug 93 13:11:10 -0400 Subject: Preprint Message-ID: <9308301711.AA06031@sunsp2.umiacs.UMD.EDU> Reprint: THE NEURAL NETWORK PUSHDOWN AUTOMATON: MODEL, STACK AND LEARNING SIMULATIONS The following reprint is available via the NEC Research Institute ftp archive external.nj.nec.com. Instructions for retrieval from the archive follow the abstract summary. Comments and remarks are always appreciated. ----------------------------------------------------------------------------- .............................................................................. 
"The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations" G.Z. Sun(a,b), C.L. Giles(b,c), H.H. Chen(a,b), Y.C. Lee(a,b) (a) Laboratory for Plasma Research and (b) Institute for Advanced Computer Studies, U. of Maryland, College Park, MD 20742 (c) NEC Research Institute, 4 Independence Way, Princeton, NJ 08540 In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one obvious approach to enhancing the processing power of a recurrent neural network is to couple it with an external stack mem ory - in effect creating a neural network pushdown automata (NNPDA). This paper discusses in detail this NNPDA - its construction, how it can be trained and how useful symbolic information can be extracted from the trained network. In order to couple the external stack to the neural network, an optimization method is developed which uses an error function that connects the learning of the state automaton of the neural network to the learning of the operation of the external stack. To minimize the error function using gradient descent learning, an analog stack is designed such that the action and storage of information in the stack are continuous. One interpretation of a continuous stack is the probabilistic storage of and action on data. After training on sample strings of an unknown source grammar, a quantization procedure extracts from the analog stack and neural network a discrete pushdown automata (PDA). Simulations show that in learning deterministic context-free grammars - the balanced parenthesis language, 1n0n, and the deterministic Palindrome - the extracted PDA is correct in the sense that it can correctly recognize unseen strings of arbitrary length. In addition, the extracted PDAs can be shown to be identical or equivalent to the PDAs of the source grammars which were used to generate the training strings. UNIVERSITY OF MARYLAND TR NOs. UMIACS-TR-93-77 & CS-TR-3118, August 20, 1993. --------------------------------------------------------------------------- FTP INSTRUCTIONS unix> ftp external.nj.nec.com (138.15.10.100) Name: anonymous Password: (your_userid at your_site) ftp> cd pub/giles/papers ftp> binary ftp> get NNPDA.ps.Z ftp> quit unix> uncompress NNPDA.ps.Z (Please note that this is a 35 page paper.) -----------------------------------------------------------------------------  From biblio at nucleus.hut.fi Tue Aug 31 13:08:00 1993 From: biblio at nucleus.hut.fi (Bibliography) Date: Tue, 31 Aug 93 13:08:00 DST Subject: Kohonen maps & LVQ -- huge bibliography (and reference request) Message-ID: <9308311008.AA20054@nucleus.hut.fi.hut.fi> Hello, We are in the process of compiling the complete bibliography of works on Kohonen Self-Organizing Map and Learning Vector Quantization all over the world. Currently the bibliography contains more than 1000 entries. The bibliography is now available (in BibTeX and PostScript formats) by anonymous FTP from: cochlea.hut.fi:/pub/ref/references.bib.Z ( BibTeX file) cochlea.hut.fi:/pub/ref/references.ps.Z ( PostScript file) The above files are compressed. Please make sure you use "binary" mode when you transfer these files. Please send any additions and corrections to : biblio at cochlea.hut.fi Please follow the IEEE instructions of references (full names of authors, name of article, journal name, volume + number where applicable, first and last page number, year, etc.) 
and BibTeX-format, if possible. Yours, Jari Kangas Helsinki University of Technology Laboratory of Computer and Information Science Rakentajanaukio 2 C SF-02150 Espoo, FINLAND
From mpp at cns.brown.edu Tue Aug 3 01:45:18 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Tue, 3 Aug 93 01:45:18 EDT Subject: Committees Message-ID: <9308030545.AA01131@cns.brown.edu> David Wolpert writes: -->Many of the results in the literature which appear to dispute this -->are simply due to use of an error function which is not restricted to -->being off-training set.
In other words, there's always a "win" -->if you perform rationally on the training set (e.g., reproduce it -->exactly, when there's no noise), if your error function gives you -->points for performing rationally on the training set. In a certain -->sense, this is trivial, and what's really interesting is off-training -->set behavior. In any case, this automatic on-training set win is all -->those aforementioned results refer to; in particular, they imply essentially -->nothing concerning performance off of the training set. In the case of averaging for MSE optimization (the meat and potatoes of neural networks) and any other convex measure, the improvement due to averaging is independent of the distribution - on-training or off-. It depends only on the topology of the optimization measure. It is important to note that this result does NOT say the average is better than any individual estimate - only better than the average population performance. For example, if one had a reliable selection criterion for deciding which element of a population of estimators was the best and that estimator was better than the average estimator, then just choose the better one. (Going one step further, simply use the selection criterion to choose the best estimator from all possible weighted averages of the elements of the population.) As David Wolpert pointed out, any estimator can be confounded by a pathological data sample and therefore there doesn't exist a *guaranteed* method for deciding which estimator is the best from a population in all cases. Weak (as opposed to guaranteed) selection criteria exist in in the form of cross-validation (in all of its flavors). Coupling cross-validation with averaging is a good idea since one gets the best of both worlds particularly for problems with insufficient data. I think that another very interesting direction for research (as David Wolpert alluded to) is the investigation of more reliable selection criterion. -Michael  From bernasch at forwiss.tu-muenchen.de Tue Aug 3 03:41:45 1993 From: bernasch at forwiss.tu-muenchen.de (Jost Bernasch) Date: Tue, 3 Aug 1993 09:41:45 +0200 Subject: weighting of estimates In-Reply-To: jim@hydra.maths.unsw.EDU.AU's message of Mon, 2 Aug 93 09:51:14 +1000 <9308012351.AA15492@hydra.maths.unsw.EDU.AU> Message-ID: <9308030741.AA29386@forwiss.tu-muenchen.de> James Franklin writes: > If you have a fairly accurate and a fairly inaccurate way of estimating >something, it is obviously not good to take their simple average (that >is, half of one plus half of the other). The correct weighting of the >estimates is in inverse proportion to their variances (that is, keep >closer to the more accurate one). Of course this is the correct weighting. Since the 60s this is done very succesfully with the well-known "Kalman Filter". In this theory the optimal combination of knowledge sources is described and proofed in detail. See the original work @article{Kalman:60, AUTHOR = {R.E. Kalman}, TITLE = "A New Approach to Linear Filtering and Prdiction Problems.", VOLUME = 12, number = 1, PAGES = {35--45}, JOURNAL = "Trans. ASME, series D, J. 
Basic Eng.", YEAR = 1960 } some neural network literature concerning this subject @Article{WatanabeTzafestas:90, author = "Watanabe and Tzafestas", title = "Learning Algorithms for Neural Networks with the Kalman Filter", journal = JIRS, year = 1990, volume = 3, number = 4, pages = "305-319", keywords= "kalman, neural net" } @string{JIRS = {Journal of Intelligent and Robotic Systems}} and a very good and practice oriented book @book{Gelb:74, AUTHOR = "A. Gelb", TITLE = "Applied {O}ptimal {E}stimation", PUBLISHER = "{M.I.T} {P}ress, {C}ambridge, {M}assachusetts", YEAR = "1974" } (At least, that is the correct >weighting if the estimates are independent: if they are correlated, >it is more complicated, but not much more). Proofs are easy, and included >in the ref below: For proofs and extensions to non-linear filtering and correlated weights see the control theory literature. A lot of work is already done! -- Jost Jost Bernasch Bavarian Research Center for Knowledge-Based Systems Orleansstr. 34, D-81667 Muenchen , Germany bernasch at forwiss.tu-muenchen.de  From edelman at wisdom.weizmann.ac.il Tue Aug 3 16:23:11 1993 From: edelman at wisdom.weizmann.ac.il (Edelman Shimon) Date: Tue, 3 Aug 93 23:23:11 +0300 Subject: TR on representation with receptive fields available Message-ID: <9308032023.AA23457@wisdom.weizmann.ac.il> The following TR is available via anonymous ftp from eris.wisdom.weizmann.ac.il (132.76.80.53), as /pub/rfs-for-recog.ps.Z Representation with receptive fields: gearing up for recognition Weizmann Institute CS-TR 93-09 Yair Weiss and Shimon Edelman Abstract: Receptive fields are probably the most prominent and ubiquitous computational mechanism employed by biological information processing systems. We report an attempt to understand the representational capabilities of the kind of receptive fields found in mammalian vision motivated by the assumption that the successive stages of processing remap the retinal representation space in a manner that makes objectively similar stimuli (e.g., different views of the same 3D object) closer to each other, and dissimilar stimuli farther apart. We present theoretical analysis and computational experiments that compare the similarity between stimuli as they are represented at the successive levels of the processing hierarchy, from the retina to the nonlinear cortical units. Our results indicate that population-based codes do convey information that seems lost in the activities of the individual receptive fields, and that at the higher levels of the hierarchy objects may be represented in a form that is more useful for visual recognition. This finding may, therefore, explain the success of previous empirical approaches to object recognition that employed representation by localized receptive fields.  From jim at hydra.maths.unsw.EDU.AU Wed Aug 4 02:32:13 1993 From: jim at hydra.maths.unsw.EDU.AU (jim@hydra.maths.unsw.EDU.AU) Date: Wed, 4 Aug 93 16:32:13 +1000 Subject: weighting of estimates Message-ID: <9308040632.AA07933@hydra.maths.unsw.EDU.AU> bernasch at forwiss.tu-muenchen.de (Jost Bernasch) writes: >James Franklin writes: >> If you have a fairly accurate and a fairly inaccurate way of estimating >>something, it is obviously not good to take their simple average (that >>is, half of one plus half of the other). The correct weighting of the >>estimates is in inverse proportion to their variances (that is, keep >>closer to the more accurate one). > >Of course this is the correct weighting. 
Since the 60s this is done >very succesfully with the well-known "Kalman Filter". In this theory >the optimal combination of knowledge sources is described and >proofed in detail. Well, yes, in a way, but that's something like saying that the motion of your body can be derived from Einstein's equations of General Relativity. Too complicated. In particular, Kalman filters, and control theory generally, are about time-varying entities, and Kalman filters are an (essentially Bayesian) way of successively updating estimates of a (possibly time-varying) quantity (See R.J. Meinhold & N.D. Singpurwalla, `Understanding the Kalman filter', American Statistician 37 (1983): 123). The situation I was considering, and what is relevant to committees, is much simpler (hence more general): how to combine estimates (possibly correlated) of a single unknown quantity. James Franklin Mathematics University of New South Wales  From Graham.Lamont at newcastle.ac.uk Wed Aug 4 12:41:36 1993 From: Graham.Lamont at newcastle.ac.uk (Graham Lamont) Date: Wed, 4 Aug 93 12:41:36 BST Subject: multiple models, hybrid estimation Message-ID: When I emailed Wray Buntine about his original posting on the subject of multiple models, I quipped: `Shhh.... don't tell everyone, they'll all want one!' (a multiple model) Little did I know every man and his dog appears to have one already:) The recent postings and especially Michael Perrone's recent contribution(s) have persuaded me to sketch the extent of my work in this area and donate a FREE piece of Mathematica code. I mention Michael's work because it follows the same basic approach of general least squares as mine, and I agree with many of the points that he raises in his general discussion of hybrid estimation, such as the need for a completely general method, the utility of a closed form solution, and his novel description of distinct local minima in functional space as opposed to parameter space. However..... he says that for his method (GEM): >> 7) The *optimal* parameters of the ensemble estimator are given in closed >> form. I present a method in the same general spirit of Michael's that is slightly more optimal and general (and I am not claiming even this is the best!). It is based on the unconstrained least squares of the estimator population "design matrix" via SVD. 1 Generality: The technique utilises singular value decomposition (SVD), and hence avoids the problem of collinearity between estimators that can (and often does) occur in a population of estimators as mentioned by Michael. SVD happily copes with highly collinear or even duplicate estimators in the design matrix, without preprocessing/thresholding. 2 Optimality: The technique places no constraint on the value of the weights (MP [1] has sum=1 and also in the results he presents all w are 0 This seemed relevant. Please excuse the bandwidth if it's not. jim From: IN%"DFP10 at ALBANY.ALBANY.EDU" "Donald F. Parsons MD" 3-AUG-1993 05:38:36.51 To: IN%"hspnet-l at albnydh2.bitnet" "Rural Hospital Consulting Network" CC: Subj: Call for Papers: AIM-94 Spring Symposium ----------------------------Original message---------------------------- Call for Papers AAAI 1994 Spring Symposium: Artificial Intelligence in Medicine: Interpreting Clinical Data (March 21-23, 1994, Stanford University, Stanford, CA) The deployment of on-line clinical databases, many supplanting the traditional role of the paper patient chart, has increased rapidly over the past decade.
The consequent explosion in the quality and volume of available clinical data, along with an ever more stringent medicolegal obligation to remain aware of all implications of these data, has created a substantial burden for the clinician. The challenge of providing intelligent tools to help clinicians monitor patient clinical courses, forecast likely prognoses, and discover new relational knowledge, is at least as large as that generated by the knowledge explosion which motivated earlier efforts in Artificial Intelligence in Medicine (AIM). Whereas many of the pioneering programs worked on small data sets which were entered interactively by knowledge engineers or clinicians, the current generation of programs have to act on raw data, unfiltered and unmediated by human beings. Interaction with human users typically only occurs on demand or on detection of clinically significant events. The emphasis of this symposium will be on methodologies that provide robust autonomous performance in data-rich clinical environments ranging from busy outpatient practices to operating rooms and intensive care units. Relevant topics include intelligent alarming (including anticipation and prevention of adverse clinical events), data abstraction, sensor validation, preliminary event classification, therapy advice, critiquing, and assistance in the establishment and execution of clinical treatment protocols. Detection of temporal and geographical patterns of disease manifestations and machine learning of clinical patterns are also of interest. Organizing committee Serdar Uckun, Co-chair (Stanford University) Isaac Kohane, Co-chair (Harvard Medical School) Enrico Coiera (Hewlett-Packard Laboratories/Bristol) Ramesh Patil (USC/Information Sciences Institute) Mario Stefanelli (Universita di Pavia) Format A large data sample will be made available to participants to serve as training and test sets for various approaches to information management and to provide a common domain of discourse. The sample will consist of two data sets: * A dense, high volume data set typical of a critical care environment. This data set will consist of hemodynamic measurements, mechanical ventilator settings, laboratory values including arterial blood gas measurements, and treatment information covering a 12-hour period of a patient with severe respiratory distress. Monitored parameters (10-15 channels of data) will be sampled and recorded at rates up to 1/10 Hz. The data set will be annotated with other clinically relevant data, physician's interpretations, and established diagnoses. * A large number of sparse data sets representative of outpatient environments. The data will include laboratory measurements, treatment information, and physical findings on a large sample of patients (50 to 100 patients) taken from the same disorder population. Each patient record will consist of several weeks' or months' worth of clinical information sampled at irregular intervals. Most of the cases will be made available to interested researchers to be used as training cases. For interested parties, a small percentage of cases will be made available two weeks prior to the symposium to be used as an optional testing set for various approaches. The data samples and accompanying clinical information will be available via ftp or e-mail server around August 15, 1993. Please contact the organizers at the addresses below for further information. The data will also be made available on diskettes to participants who do not have Internet access. 
It will be left to the discretion of the participants to use any subset of these samples to help focus their approaches and presentations. The data can also be used as test vehicles for their own research and to create sample programs for demonstration at the symposium. Participants do not have to use the data in order to participate. However, the program committee will favor presentations which exploit the provided data sets in their analyses. Submission process Potential participants are invited to submit abstracts no longer than 2 pages (< 1200 words) by October 15, 1993. The abstracts should outline methodology and indicate, if applicable, how the provided data may be used as a proof-of-principle for the discussed methodology. Electronic submissions are encouraged. The abstracts may be sent to in ASCII, RTF, or PostScript formats. Authors of accepted abstracts will be asked to submit a working paper by January 31, 1994. They will also be asked to prepare either a poster or an oral presentation. Submissions by mail Use this method ONLY IF you cannot submit an abstract electronically. Fax submissions will not be accepted. Send 6 copies of the abstract to: Serdar Uckun, MD, PhD Co-chair, AIM-94 Knowledge Systems Laboratory Stanford University 701 Welch Road, Bldg. C Palo Alto, CA 94304 U.S.A. Phone: [+1] (415) 723-1915 Calendar Abstracts due: October 15, 1993 Notification of authors by: November 15, 1993 Working papers due: January 31, 1994 Spring Symposium: March 21-23, 1994 Information For further information, please contact the co-chairs at the address above or (preferably) via e-mail at:  From hicks at cs.titech.ac.jp Thu Aug 5 11:33:00 1993 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Thu, 5 Aug 93 11:33:00 JST Subject: weighting of estimates In-Reply-To: Jost Bernasch's message of Tue, 3 Aug 1993 09:41:45 +0200 <9308030741.AA29386@forwiss.tu-muenchen.de> Message-ID: <9308050233.AA29633@maruko.cs.titech.ac.jp> Jost Bernasch writes: > >James Franklin writes: > > If you have a fairly accurate and a fairly inaccurate way of estimating > >something, it is obviously not good to take their simple average (that > >is, half of one plus half of the other). The correct weighting of the > >estimates is in inverse proportion to their variances (that is, keep > >closer to the more accurate one). > >Of course this is the correct weighting. Since the 60s this is done >very succesfully with the well-known "Kalman Filter". In this theory >the optimal combination of knowledge sources is described and >proofed in detail. > > (At least, that is the correct > >weighting if the estimates are independent: if they are correlated, > >it is more complicated, but not much more). Proofs are easy, and included > >in the ref below: > >For proofs and extensions to non-linear filtering and correlated >weights see the control theory literature. A lot of work is already >done! I think the comments about the Kalman filter are a bit off the mark. The Kalman filter is based on the mathematics of conditional expectation. However, the Kalman filter is designed to be used for time series. What makes the Kalman filter particularly useful is its recursive nature; a stream of observations may be processed (often in real time) to produce a stream of current estimates (or next estimates if you're trying to beat the stock market). Committees of networks may also use conditional expectation, but combining networks is not the same as processing time series of data. 
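(For concreteness, the static combination rule Franklin describes -- weights in inverse proportion to the error variances, generalized to the full error covariance matrix when the estimates are correlated -- can be sketched in a few lines. This is only an illustrative sketch; the function names and numbers below are invented for the example and do not come from any of the postings.)

    # Combine several estimates of one unknown quantity.
    # Illustrative sketch only; names and numbers are invented for the example.
    import numpy as np

    def combine_independent(estimates, variances):
        # Independent, unbiased estimates: weights proportional to 1/variance.
        w = 1.0 / np.asarray(variances, dtype=float)
        w /= w.sum()
        return float(np.dot(w, estimates)), w

    def combine_correlated(estimates, covariance):
        # Correlated errors: minimum-variance unbiased weights
        #   w = C^{-1} 1 / (1' C^{-1} 1)
        ones = np.ones(len(estimates))
        c_inv_one = np.linalg.solve(np.asarray(covariance, dtype=float), ones)
        w = c_inv_one / c_inv_one.sum()
        return float(np.dot(w, estimates)), w

    # An accurate estimate (variance 1) and an inaccurate one (variance 9):
    value, weights = combine_independent([10.2, 13.0], [1.0, 9.0])
    # weights come out to [0.9, 0.1]: stay much closer to the accurate estimate.

When the error covariance matrix is diagonal, the second rule reduces to the first.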
I think it is appropriate at this point to bring up 2 classical results concerning probability theory, conditional expectation, and wide sense conditional expectation. (Wide sense conditional expectation uses the same formulas as conditional expectation. "Wide sense" merely serves to emphasize that the distribution is not assumed to be normal. "Conditional expectation" is used in the case where the underlying distribution is assumed to be normal.) (1) When the objective function is to minimize the mean squared error over the training data, the wide sense conditional expectation is the best linear predictor, regardless of the original distribution. (2) If the original distribution is normal, and the objective function is to minimize the MSE over the >entire< distribution, (both on-training and off-training), then the conditional expectation is the best predictor, linear or otherwise. There are 3 important factors here. [1]: Underlying distribution (of network outputs): normal? not normal? [2]: Objective function (assume MSE): on-training? off-training? [3]: Predictor: linear? non-linear? {1} [1:normal] => [2:off-training],[3:linear] Neural nets (as opposed to systolic arrays) are needed because the world is full of non-normal distributions. But that doesn't mean that the outputs of non-linear networks don't have joint normal distributions (over off-training data). Perhaps the non-linearities have been successfully ironed out by the non-linear networks, leaving only linear (or nearly linear) errors to be corrected. In that case we can refer to result (2) to build the optimal off-training predictor for the given committee of networks. {2} [1:not normal] and [2:on-training] and [3:linear] => best predictor is WSE. If the distribution of network outputs is not normal, and we use an on-training criterion, then by virtue of (1), the best linear predictor is the wide sense conditional expectation. {3} [1:not normal] and [2:off-training] and [3:non-linear] => research It is the case in {2} that since [1:not normal], <1> better on-training results may be obtained using some non-linear predictor <2> better on-or-off-training results may be obtained using some different criterion <3> <1> and <2> together. The problem is of course to find such criteria and non-linear predictors. The existence of a priori knowledge can play an important role here; for example adding a term to penalize the complexity of output functions. In conclusion, if {1} is true, that is the networks have captured the non-linearities and the network outputs are joint normal (or nearly normal) distributions, we're home free. Otherwise we ought to think about {3}, non-linear predictors and alternative criteria. {2}, using the WSE, the best performing linear predictor over the MSE of the on-training data, is useful to get the job done, but is only optimal in a limited sense. Craig Hicks hicks at cs.titech.ac.jp Ogawa Laboratory, Dept. of Computer Science Tokyo Institute of Technology, Tokyo, Japan lab: 03-3726-1111 ext. 2190 home: 03-3785-1974 fax: +81(3)3729-0685 (from abroad), 03-3729-0685 (from Japan)  From mpp at cns.brown.edu Thu Aug 5 15:13:44 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Thu, 5 Aug 93 15:13:44 EDT Subject: committees Message-ID: <9308051913.AA13266@cns.brown.edu> Scott Farrar writes: -->John Hampshire characterized a committee as a collection of biased -->estimators; the idea being that a collection of many different kinds of -->bias might constitute a unbiased estimator.
I was wondering if anyone -->had any ideas about how this might be related to, supported by, or refuted -->by the Central Limit Theorem. Could experimental variances or confounds -->be likened to "biases", and if so, do these "average out" in a manner which -->can give us a useful mean or useful estimator? I think that this is a very interesting point because, for averaging with MSE optimization, it is possible to show using the strong law of large numbers that the bias of the average estimator converges to the expected bias of any individual estimator while the variance converges to zero. Thus the only way to cancel existing bias using averaging is to average two (or more) different populations from two (or more) estimators which are (somehow) known to have complementary bias. The trick is of course the "somehow"... Any ideas? -Michael -------------------------------------------------------------------------------- Michael P. Perrone Email: mpp at cns.brown.edu Institute for Brain and Neural Systems Tel: 401-863-3920 Brown University Fax: 401-863-3934 Providence, RI 02912  From wray at ptolemy.arc.nasa.gov Thu Aug 5 19:37:42 1993 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Thu, 5 Aug 93 16:37:42 PDT Subject: committees In-Reply-To: "Michael P. Perrone"'s message of Thu, 5 Aug 93 15:13:44 EDT <9308051913.AA13266@cns.brown.edu> Message-ID: <9308052337.AA04745@ptolemy.arc.nasa.gov> I'm not convinced that the notion of an "unbiased estimator" is useful here. It comes from classical statistics and is really a means of justifying the choice of an estimator for lack of better ideas. An estimator is "unbiased" if the average of the estimator based on all the other samples which we might have seen (but didn't) is equal to the "truth". Notice that unbiased estimators and the use of Occam's razor conflict. We all routinely throw away an "unbiased" neural network, i.e. the best fitting network, in favor of a smoother, simpler network, i.e. by early stopping, weight decay, ...., which is very clearly "biased". So I think its a great thing to be biased. One reason for averaging is because we have several quite different biased networks that we think are reasonable, so like any good gambler, we hedge our bets. Of course, averaging is also standard Bayesian practice, i.e. an obvious result of the mathematics. ---------- Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 269-2 fax: (415) 604 3594 Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov  From cohn at psyche.mit.edu Fri Aug 6 11:28:11 1993 From: cohn at psyche.mit.edu (David Cohn) Date: Fri, 6 Aug 93 11:28:11 EDT Subject: Call for Participation: Workshop on Exploration Message-ID: <9308061528.AA06177@psyche.mit.edu> I am helping organize the following one-day workshop during the post-NIPS workshops in Vail, Colorado, on December 3, 1993. We would like to hear from people interested in participating in the workshop, either formally, as a presenter, or informally, as an attendee. Even if you will not be able to attend, if you have work which you feel is relevant, and would like to see discussed, please contact me at the email address below. 
Given the limited time available, we will not be able to present *every* approach, but we hope to cover a broad range of approaches, both in formal presentations, and in informal discussion, Many thanks in advance, -David Cohn (cohn at psyche.mit.edu) ====================== begin workshop announcement ===================== Robot Learning II: Exploration and Continuous Domains A NIPS '93 Workshop David Cohn Dept. of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02138 cohn at psyche.mit.edu The goal of this one-day workshop will be to provide a forum for researchers active in the area of robot learning and related fields. Due to the limited time available, we will focus on two major issues: efficient exploration of a learner's state space, and learning in continuous domains. Robot learning is characterized by sensor noise, control error, dynamically changing environments and the opportunity for learning by experimentation. A number of approaches, such as Q-learning, have shown great practical utility learning under these difficult conditions. However, these approaches have only been proven to converge to a solution if all states of a system are visited infinitely often. What has yet to be determined is whether we can efficiently explore a state space so that we can learn without having to visit every state an infinite number of times, and how we are to address problems on continuous domains, where there are effectively an infinite number of states to be visited. This workshop is intended to serve as a followup to last year's post-NIPS workshop on robot learning. The two problems to be addressed this year were identified as two (of the many) crucial issues facing the field. The morning session of the workshop will consist of short presentations discussing theoretical approaches to exploration and to learning in continuous domains, followed by general discussion guided by a moderator. The afternoon session will center on practical and/or heuristic approaches to these problems in the same format. As time permits, we may also attempt to create an updated "Where do we go from here?" list, like that drawn up in last year's workshop. The targeted audience for the workshop are those researchers who are interested in robot learning, exploration, or active learning in general. We expect to draw an eclectic audience, so every attempt will be made to ensure that presentations are accessible to people without any specific background in the field.  From sontag at control.rutgers.edu Fri Aug 6 17:35:16 1993 From: sontag at control.rutgers.edu (Eduardo Sontag) Date: Fri, 6 Aug 93 17:35:16 EDT Subject: Expository Tech Report on Neural Nets Available by FTP Message-ID: <9308062135.AA06104@control.rutgers.edu> As notes for a short course given at the 1993 European Control Conference this summer, I prepared an expository introduction to two related topics: 1. Some mathematical results on "neural networks". 2. "Neurocontrol" and "learning control". The choice of topics was heavily influenced by my interests, but some readers may still find the material useful. The two parts are essentially independent. In particular, the part on mathematical results does not require any knowledge of (nor interest in) control theory. An *extended* version of the paper which appeared in the conference proceedings is now available as a tech report. This report, in postscript form, can be obtained by anonymous FTP. 
Retrieval instructions are as follows: yourhost> ftp siemens.com Connected to siemens.com. 220 siemens FTP server (SunOS 4.1) ready. Name (siemens.com:sontag): anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> cd pub/learning/TechReports 250 CWD command successful. ftp> bin 200 Type set to I. ftp> get Sontag9302.ps.Z 200 PORT command successful. 150 Binary data connection for Sontag9302.ps.Z (128.6.62.9,1600) (114253 bytes) 226 Binary Transfer complete. local: Sontag9302.ps.Z remote: Sontag9302.ps.Z 114253 bytes received in 24 seconds (4.6 Kbytes/s) ftp> quit 221 Goodbye. yourhost> uncompress Sontag9302.ps.Z yourhost> lpr Sontag9302.ps (or however you print PostScript) ****** Please note: I am not able to send hardcopy. ****** -- Eduardo D. Sontag  From liaw%dylink.usc.edu at usc.edu Fri Aug 6 18:45:39 1993 From: liaw%dylink.usc.edu at usc.edu (Jim Liaw) Date: Fri, 6 Aug 93 15:45:39 PDT Subject: Workshop on Neural Architectures and Distributed AI Message-ID: <9308062245.AA23804@dylink.usc.edu> Please note the change in deadline of submission of abstracts. ------ The Center for Neural Engineering University of Southern California announces a Workshop on Neural Architectures and Distributed AI: >From Schema Assemblages to Neural Networks October 19-20, 1993 [This Workshop was previously scheduled for April 1993] Program Committee: Michael Arbib (Organizer), George Bekey, Damian Lyons, Paul Rosenbloom, and Ron Sun To design complex technological systems, we need a multilevel methodology which combines a coarse- grain analysis of cooperative or distributed computation (we shall refer to the computing agents at this level as "schemas") with a fine-grain model of flexible, adaptive computation (for which neural networks provide a powerful general paradigm). Schemas provide a language for distributed artificial intelligence and perceptual robotics which is "in the style of the brain", but at a relatively high level of abstraction relative to neural networks. We seek (both at the level of schema asemblages, and in terms of "modular" neural networks) a distributed model of computation, supporting many concurrent activities for recognition of objects, and the planning and control of different activities. The use, representation, and recall of knowledge is mediated through the activity of a network of interacting computing agents which between them provide processes for going from a particular situation and a particular structure of goals and tasks to a suitable course of action. This action may involve passing of messages, changes of state, instantiation to add new schema instances to the network, deinstantiation to remove instances, and may involve self-modification and self- organization. Schemas provide a form of knowledge representation which differs from frames and scripts by being of a finer granularity. Schema theory is generative: schemas may well be linked to others to provide yet more comprehensive schemas, whereas frames tend to "build in" from the overall framework. The analysis of interacting computing agents (the schema instances) is intermediate between the overall specification of some behavior and the neural networks that subserve it. The Workshop will focus on different facets of this multi-level methodology. While the emphasis will be on technological systems, papers will also be accepted on biological and cognitive systems. 
Submission of Papers A list of sample topics for contributions is as follows, where a hybrid approach means one in which the abstract schema level is integrated with neural or other lower level models: Schema Theory as a description language for neural networks Modular neural networks Alternative paradigms for modeling symbolic and subsymbolic knowledge Hierarchical and distributed representations: adaptation and coding: Linking DAI to Neural Networks to Hybrid Architecture Formal Theories of Schemas Hybrid approaches to integrating planning & reaction Hybrid approaches to learning Hybrid approaches to commonsense reasoning by integrating neural networks and rule-based reasoning (using schemas for the integration) Programming Languages for Schemas and Neural Networks Schema Theory Applied in Cognitive Psychology, Linguistics, and Neuroscience Prospective contributors should send a five-page extended abstract, including figures with informative captions and full references - a hard copy, either by regular mail or fax - by August 30, 1993 to Michael Arbib, Center for Neural Engineering, University of Southern California, Los Angeles, CA 90089-2520, USA [Tel: (213) 740-9220, Fax: (213) 746-2863, arbib at pollux.usc.edu]. Please include your full address, including fax and email, on the paper. In accepting papers submitted in response to this Call for Papers, preference will be given to papers which present practical examples of, theory of, and/or methodology for the design and analysis of complex systems in which the overall specification or analysis is conducted in terms of a network of interacting schemas, and where some but not necessarily all of the schemas are implemented in neural networks. Papers which present a single neural network for pattern recognition ("perceptual schema") or pattern generation ("motor schema") will not be accepted. It is the development of a methodology to analyze the interaction of multiple functional units that constitutes the distinctive thrust of this Workshop. Notification of acceptance or rejection will be sent by email no later than September 1, 1993. There are currently no plans to issue a formal proceedings of full papers, but (revised versions) of accepted abstracts received prior to October 1, 1993 will be collected with the full text of the Tutorial in a CNE Technical Report which will be made available to registrants at the start of the meeting. A number of papers have already been accepted for the Workshop. 
These include the following: Arbib: Schemas and Neural Networks: A Tutorial Introduction to Integrating Symbolic and Subsymbolic Approaches to Cooperative Computation Arkin: Reactive Schema-based Robotic Systems: Principles and Practice Heenskerk and Keijzer: A Real-time Neural Implementation of a Schema Driven Toy-Car Leow and Miikkulainen, Representing and Learning Visual Schemas in Neural Networks for Scene Analysis Lyons & Hendriks: Describing and analysing robot behavior with schema theory Murphy, Lyons & Hendriks: Visually Guided Multi- Fingered Robot Hand Grasping as Defined by Schemas and a Reactive System Sun: Neural Schemas and Connectionist Logic: A Synthesis of the Symbolic and the Subsymbolic Weitzenfeld: Hierarchy, Composition, Heterogeneity, and Multi-granularity in Concurrent Object-Oriented Programming for Schemas and Neural Networks Wilson & Hendler: Neural Network Software Modules Bonus Event: The CNE Research Review: Monday, October 18, 1993 The CNE Review will present a day-long sampling of CNE research, with talks by faculty, and students, as well as demos of hardware and software. Special attention will be paid to talks on, and demos in, our new Autonomous Robotics Lab and Neuro-Optical Computing Lab. Fully paid registrants of the Workshop are entitled to attend the CNE Review at no extra charge. Registration The registration fee of $150 ($40 for qualified students who include a "certificate of student status" from their advisor) includes a copy of the abstracts, coffee breaks, and a dinner to be held on the evening of October 18th. Those wishing to register should send a check payable to "Center for Neural Engineering, USC" for $150 ($40 for students and CNE members) together with the following information to Paulina Tagle, Center for Neural Engineering, University of Southern California, University Park, Los Angeles, CA 90089-2520, USA. --------------------------------------------------- SCHEMAS AND NEURAL NETWORKS Center for Neural Engineering, USC October 19-20, 1993 NAME: ___________________________________________ ADDRESS: _________________________________________ PHONE NO.: _______________ FAX:___________________ EMAIL: ___________________________________________ I intend to submit a paper: YES [ ] NO [ ] I wish to be registered for the CNE Research Review: YES [ ] NO [ ] Accommodation Attendees may register at the hotel of their choice, but the closest hotel to USC is the University Hilton, 3540 South Figueroa Street, Los Angeles, CA 90007, Phone: (213) 748-4141, Reservation: (800) 872-1104, Fax: (213) 7480043. A single room costs $70/night while a double room costs $75/night. Workshop participants must specify that they are "Schemas and Neural Networks Workshop" attendees to avail of the above rates. Information on student accommodation may be obtained from the Student Chair, Jean-Marc Fellous, fellous at pollux.usc.edu.  From sims at pdesds1.scra.org Mon Aug 9 07:39:31 1993 From: sims at pdesds1.scra.org (Jim Sims) Date: Mon, 9 Aug 93 07:39:31 EDT Subject: [fwd: jbeard@aip.org: bee learning in Nature] Message-ID: <9308091139.AA02487@pdesds1.noname> Some cross-disciplinary, ah, pollination. 
jim From greiner at learning.siemens.com Mon Aug 9 14:59:26 1993 From: greiner at learning.siemens.com (Russell Greiner) Date: Mon, 9 Aug 93 14:59:26 EDT Subject: CLNL'93 Schedule Message-ID: <9308091859.AA05371@eagle.siemens.com> *********************************************************** * CLNL'93 -- Computational Learning and Natural Learning * * Provincetown, Massachusetts * * 10-12 September 1993 * *********************************************************** CLNL'93 is the fourth of an ongoing series of workshops designed to bring together researchers from a diverse set of disciplines --- including computational learning theory, AI/machine learning, connectionist learning, statistics, and control theory --- to explore issues at the intersection of theoretical learning research and natural learning systems. The schedule of presentations appears below, followed by logistics and information on registration ================ ** CLNL'93 Schedule (tentative) ** ======================= Thursday 9/Sept/93: 6:30-9:00 (optional) Ferry (optional): Boston to Provincetown [departs Boston Harbor Hotel, 70 Rowes Wharf on Atlantic Avenue] Friday 10/Sept/93 [CLNL meetings, at Provincetown Inn] 9 - 9:15 Opening remarks 9:15-10:15 Scaling Up Machine Learning: Practical and Theoretical Issues Thomas Dietterich [Oregon State Univ] (invited talk, see abstract below) 10:30-12:30 Paper session 1 What makes derivational analogy work: an experience report using APU Sanjay Bhansali [Stanford]; Mehdi T. Harandi [Univ of Illinois] Scaling Up Strategy Learning: A Study with Analogical Reasoning Manuela M. Veloso [CMU] Learning Hierarchies in Stochastic Domains Leslie Pack Kaebling [Brown] Learning an Unknown Signalling Alphabet Edward C. Posner, Eugene R. Rodemich [CalTech/JPL] 12:30- 2 Lunch (on own) Unscheduled TIME ( Whale watching, beach walking, ... ) ( Poster set-up time; Poster preview (perhaps) ) Dinner (on own) 7 - 10 Poster Session [16 posters] (Hors d'oeuvres) Induction of Verb Translation Rules from Ambiguous Training and a Large Semantic Hierarchy Hussein Almuallim, Yasuhiro Akiba, Takefumi Yamazaki, Shigeo Kaneda [NTT Network Information Systems Lab.] What Cross-Validation Doesn't Say About Real-World Generalization Gunner Blix, Gary Bradshaw, Larry Rendall [Univ of Illinois] Efficient Learning of Regular Expressions from Approximate Examples Alvis Brazma [Univ of Latvia] Capturing the Dynamics of Chaotic Time Series by Neural Networks Gurtavo Deco, Bernd Schurmann [Siemens AG] Learning One-Dimensional Geometrical Patterns Under One-Sided Random Misclassification Noise Paul Goldberg [Sandia National Lab]; Sally Goldman [Washington Univ] Adaptive Learning of Feedforward Control Using RBF Network ... Dimitry M Gorinevsky [Univ of Toronto] A practical approach for evaluating generalization performance Marjorie Klenin [North Carolina State Univ] Scaling to Domains with Many Irrelevant Features Pat Langley, Stephanie Sage [Siemens Corporate Research] Variable-Kernel Similarity Metric Learning David G. Lowe [Univ British Columbia] On-Line Training of Recurrent Neural Networks with Continuous Topology Adaptation Dragan Obradovic [Siemens AG] N-Learners Problem: System of PAC Learners Nageswara Rao, E.M. Oblow [Engineering Systems/Advanced Research] Soft Dynamic Programming Algorithms: Convergence Proofs Satinder P. 
Singh [Univ of Mass] Integrating Background Knowledge into Incremental Concept Formation Leon Shklar [Bell Communications Research]; Haym Hirsh [Rutgers] Learning Metal Models Astro Teller [Stanford] Generalized Competitive Learning and then Handling of Irrelevant Features Chris Thornton [Univ of Sussex] Learning to Ignore: Psychophysics and Computational Modeling of Fast Learning of Direction in Noisy Motion Stimuli Lucia M. Vaina [Boston Univ], John G. Harris [Univ of Florida] Saturday 11/Sept/93 [CLNL meetings, at Provincetown Inn] 9:00-10:00 Current Tree Research Leo Breiman [UCBerkeley] (invited talk, see abstract below) 10:30-12:30 Paper session 2 Initializing Neural Networks using Decision Trees Arunava Banerjee [Rutgers] Exploring the Decision Forest Patrick M. Murphy, Michael Pazzani [UC Irvine] What Do We Do When There Is Outrageous Data Points in the Data Set? - Algorithm for Robust Neural Net Regression Yong Liu [Brown] A Comparison of RBF and MLP Networks for Classification of Biomagnetic Fields Martin F. Schlang, Ralph Neunier, Klaus Abraham-Fuchs [Siemens AG] 12:30- 2 Lunch (on own) 2:30- 3:30 TBA (invited talk) Yann le Cun [ATT] 4:00- 6:00 Paper session 3 On Learning the Neural Network Architecture: An Average Case Analysis Mostefa Golea [Univ of Ottawa] Fast (Distribution Specific) Learning Dale Schuurmans [Univ of Toronto] Computational capacity of single neuron models Anthony Zador [Yale Univ School of Medicine] Probalistic Self-Structuring and Learning A.D.M. Garvin, P.J.W. Rayner [Cambridge] 7:00- 9 Banquet dinner Sunday 12/Sept/93 [CLNL meetings, at Provincetown Inn] 9 -11 Paper session 4 Supervised Learning from real and Discrete Incomplete Data Zoubin Ghaharamani, Michael Jordan [MIT] Model Building with Uncertainty in the Independent Variable Volker Tresp, Subutai Ahmad, Ralph Neuneier [Siemens AG] Supervised Learning using Unclassified and Classified Examples Geoff Towell [Siemens Corp. Res.] Learning to Classify Incomplete Examples Dale Schuurmans [Univ of Toronto]; R. Greiner [Siemens Corp. Res.] 11:30 -12:30 TBA (invited talk) Ron Rivest [MIT] 12:30 - 2 Lunch (on own) 3:30 - 6:30 Ferry (optional): Provincetown to Boston Depart from Boston (on own) ------ ------ Scaling Up Machine Learning: Practical and Theoretical Issues Thomas G. Dietterich Oregon State University and Arris Pharmaceutical Corporation Supervised learning methods are being applied to an ever-expanding range of problems. This talk will review issues arising in these applications that require further research. The issues can be organized according to the problem-solving task, the form of the inputs and outputs, and any constraints or prior knowledge that must be considered. For example, the learning task often involves extrapolating beyond the training data in ways that are not addressed in current theory or engineering experience. As another example, each training example may be represented by a disjunction of feature vectors, rather than a unique feature vector as is usually assumed. More generally, each training example may correspond to a manifold of feature vectors. As a third example, background knowledge may take the form of constraints that must be satisfied by any hypothesis output by a learning algorithm. The issues will be illustrated using examples from several applications including recent work in computational drug design and ecosystem modelling. 
-------- Current Tree Research Leo Breiman Deptartment of Statistics University of California, Berkeley This talk will summarize current research by myself and collaborators into methods of enhancing tree methodology. The topics covered will be: 1) Tree optimization 2) Forming features 3) Regularizing trees 4) Multiple response trees 5) Hyperplane trees These research areas are in a simmer. They have been programmed and are undergoing testing. The results are diverse. -------- -------- Programme Committee: Andrew Barron, Russell Greiner, Tom Hancock, Steve Hanson, Robert Holte, Michael Jordan, Stephen Judd, Pat Langley, Thomas Petsche, Tomaso Poggio, Ron Rivest, Eduardo Sontag, Steve Whitehead Workshop Sponsors: Siemens Corporate Research and MIT Laboratory of Computer Science ================ ** CLNL'93 Logistics ** ======================= Dates: The workshop begins at 9am Friday 10/Sept, and concludes by 3pm Sunday 12/Sept, in time to catch the 3:30pm Provincetown--Boston ferry. Location: All sessions will take place in the Provincetown Inn (800 942-5388); we encourage registrants to stay there. Provincetown Massachusetts is located at the very tip of Cape Cod, jutting into the Atlantic Ocean. Transportation: We have rented a ship from The Portuguese Princess to transport CLNL'93 registrants from Boston to Provincetown on Thursday 9/Sept/93, at no charge to the registrants. We will also supply light munchies en route. This ship will depart from the back of Boston Harbor Hotel, 70 Rowes Wharf on Atlantic Avenue (parking garage is 617 439-0328); tentatively at 6:30pm. If you are interested in using this service, please let us know ASAP (via e-mail to clnl93 at learning.scr.siemens.com) and also tell us whether you be able to make the scheduled 6:30pm departure. (N.b., this service replaces the earlier proposal, which involved the Bay State Cruise Lines.) The drive from Boston to Provincetown requires approximately two hours. There are cabs, busses, ferries and commuter airplanes (CapeAir, 800 352-0714) that service this Boston--Provincetown route. The Hyannis/Plymouth bus (508 746-0378) leaves Logan Airport at 8:45am, 11:45am, 2:45pm, 4:45pm on weekdays, and arrives in Provincetown about 4 hours later; its cost is $24.25. For the return trip (only), Bay State Cruise Lines (617 723-7800) runs a ferry that departs Provincetown at 3:30pm on Sundays, arriving at Commonwealth Pier in Boston Harbor at 6:30pm; its cost is $15/person, one way. Inquiries: For additional information about CLNL'93, contact clnl93 at learning.scr.siemens.com or CLNL'93 Workshop Learning Systems Department Siemens Corporate Research 755 College Road East Princeton, NJ 08540--6632 To learn more about Provincetown, contact their Chamber of Commerce at 508 487-3424. ================ ** CLNL'93 Registration ** ======================= Name: ________________________________________________ Affiliation: ________________________________________________ Address: ________________________________________________ ________________________________________________ Telephone: ____________________ E-mail: ____________________ Select the appropriate options and fees: Workshop registration fee ($50 regular; $25 student) ___________ Includes * attendance at all presentation and poster sessions * the banquet dinner on Saturday night; and * a copy of the accepted abstracts. Hotel room ($74 = 1 night deposit) ___________ [This is at the Provincetown Inn, assuming a minimum stay of 2 nights. 
The total cost for three nights is $222 = $74 x 3, plus optional breakfasts. Room reservations are accepted subject to availability. See hotel for cancellation policy.] Arrival date ___________ Departure date _____________ Name of person sharing room (optional) __________________ [Notice the $74/night does correspond to $37/person per night double-occupancy, if two people share one room.] # of breakfasts desired ($7.50/bkfst; no deposit req'd) ___ Total amount enclosed: ___________ If you are not using a credit card, make your check payable in U.S. dollars to "Provincetown Inn/CLNL'93", and mail your completed registration form to Provincetown Inn/CLNL P.O. Box 619 Provincetown, MA 02657. If you are using Visa or MasterCard, please fill out the following, which you may mail to above address, or FAX to 508 487-2911. Signature: ______________________________________________ Visa/MasterCard #: ______________________________________________ Expiration: ______________________________________________  From bill at nsma.arizona.edu Mon Aug 9 17:00:59 1993 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Mon, 09 Aug 1993 14:00:59 -0700 (MST) Subject: [fwd: jbeard@aip.org: bee learning in Nature] Message-ID: <9308092100.AA10510@nsma.arizona.edu> This is a very interesting piece of work, but the "news release" is overblown and historically ignorant. The connection between mushroom bodies and learning has been known for a long time. There is also direct evidence for changes in the structure of the mushroom bodies as a result of experience: Coss and Perkel over a decade ago found changes in the length of dendritic spines after honeybees went on a single exploratory flight. This is much more direct than the evidence described in the "news release". Contrary to the claims in the "news release", these new results are unlikely to tell us much about human learning. It is not true that the honeybee brain is merely a simpler version of the human brain. They're completely different -- even the neurons are different in structure. Also insect learning and mammal learning are qualitatively different: for example, both honeybees and mammals can learn to navigate to a location using landmarks, but honeybees do it by simple visual pattern-matching, while mammals use considerably more sophisticated algorithms. Furthermore, it is not news that experience can lead to an increase in the number of connections. It has long been known that mammals raised in an enriched environment have thicker cortices, due to a greater density of synaptic structures. Surely this is more directly relevant to humans than data from honeybees could be. It's a shame to obscure a nice piece of work by making bogus claims about its significance. -- Bill  From dhw at santafe.edu Tue Aug 10 15:58:41 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 10 Aug 93 13:58:41 MDT Subject: Provable optimality of averaging generalizers Message-ID: <9308101958.AA15514@zia> Michael Perrone writes: >>> In the case of averaging for MSE optimization (the meat and potatoes of neural networks) and any other convex measure, the improvement due to averaging is independent of the distribution - on-training or off-. It depends only on the topology of the optimization measure. It is important to note that this result does NOT say the average is better than any individual estimate - only better than the average population performance. 
For example, if one had a reliable selection criterion for deciding which element of a population of estimators was the best and that estimator was better than the average estimator, then just choose the better one. (Going one step further, simply use the selection criterion to choose the best estimator from all possible weighted averages of the elements of the population.) As David Wolpert pointed out, any estimator can be confounded by a pathological data sample and therefore there doesn't exist a *guaranteed* method for deciding which estimator is the best from a population in all cases. Weak (as opposed to guaranteed) selection criteria exist in the form of cross-validation (in all of its flavors). Coupling cross-validation with averaging is a good idea since one gets the best of both worlds, particularly for problems with insufficient data. I think that another very interesting direction for research (as David Wolpert alluded to) is the investigation of more reliable selection criteria. >>> *** Well, I agree with the second two paragraphs, but not the first. At least not exactly as written. Although Michael is making an interesting and important point, I think it helps to draw attention to some things: I) First, I haven't yet gone through Michael's work in detail, but it seems to me that the "measures" Michael is referring to really only make sense as real-world cost functions (otherwise known as loss functions, sometimes as risk functions, etc.). Indeed many very powerful learning algorithms (e.g., memory based reasoning) are not directly cast as finding the minimum on an energy surface, be it "convex" or otherwise. For such algorithms, "measures" come in with the cost function. In fact, *by definition*, one is only interested in such real-world cost - results concerning anything else do not concern the primary object of interest. With costs, an example of a convex surface is the quadratic cost function, which says that given truth f, your penalty for guessing h is given by the function (f - h)^2. For such a cost, Michael's result holds essentially because by guessing the average you reduce variance but keep the same bias (as compared to the average over all guesses). In other words, it holds because for any f, h1, and h2, [(h1 + h2)/2 - f]^2 <= [(h1 - f)^2 + (h2 - f)^2] / 2. (When f, h1, and h2 refer to distributions rather than single values, as Michael rightly points out, you have to worry about other issues before making this statement, like whether the distributions are correlated with one another.) *** It should be realized though that there are many non-convex cost functions in the real world. For example, when doing classification, one popular cost function is zero-one. This function says you get credit for guessing exactly correctly, and if you miss, it doesn't matter what you guessed; all misses "cost you" the same. This cost function is implicit in much of PAC, stat. mech. of learning, etc. Moreover, in Bayesian decision theory, guessing the weights which maximize the posterior probability P(weights | data) (which in the Bayesian perspective of neural nets is exactly what is done in backprop with weight decay) is the optimal strategy only for this zero-one cost. Now if we take this zero-one cost function, and evaluate it only off the training set, it is straight-forward to prove that for a uniform Pr(target function), the probability of a certain value of cost, given data, is independent of the learning algorithm.
(The same result holds for other cost functions as well, though as Michael points out, you must be careful in trying to extend this result to convex cost functions.) This is true for any data set, i.e., it is not based on "pathological data", as Michael puts it. It says that unless you can rule out a uniform Pr(target function), you cannot prove any one algorithm to be superior to any other (as far as this cost function is concerned). *** II) Okay. Now Michael correctly points out that even in those cases w/ a convex cost "measure", you must interpret his result with caution. I agree, and would say that this is somewhat like the famous "two letters" paradox of probability theory. Consider the following: 1) Say I have 3 real numbers, A, B, and X. In general, it's always true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} / 2. (This is exactly analogous to having the cost of the average guess bounded above by the average cost of the individual guesses.) 2) This means that if we had a choice of either randomly drawing one of the numbers {A, B}, or drawing C, that on average drawing C would give smaller quadratic cost with respect to X. 3) However, as Michael points out, this does *not* mean that if we had just the numbers A and C, and could either draw A or C, that we should draw C. In fact, point (1) tells us nothing whatsoever about whether A or C is preferable (as far as quadratic cost with respect to X is concerned). 4) In fact, now create a 5th number, D = [C + A] / 2. By the same logic as in (1), we see that the cost (wrt/ X) of D is less than the average of the costs of C and A. So to the exact same degree that (1) says we "should" guess C rather than A or B, it also says we should guess D rather than A or C. (Note that this does *not* mean that D's cost is necessarily less than C's though; we don't get endlessly diminishing costs.) 5) Step (4) can be repeated ad infinitum, getting a never-ending sequence of "newly optimal" guesses. In particular, in the *exact* sense in which C is "preferable" to A or B, and therefore should "replace" them, D is preferable to A or B, and therefore should replace *them* (and in particular replace C). So one is never left with C as the object of choice. *** So (1) isn't really normative; it doesn't say one "should" guess the average of a bunch of guesses: 7) Choosing D is better than randomly choosing amongst C or A, just as choosing C is better than randomly choosing amongst A or B. 8) This doesn't mean that given C, one should introduce an A and then guess the average of C and A (D) rather than C, just as this doesn't mean that given A, one should introduce a B and then guess the average of A and B (C) rather than A. 9) An analogy which casts some light on all this: view A and B not as the outputs of separate single-valued learning algorithms, but rather as the random outputs of a single learning algorithm. Using this analogy, the result of Michael's, that one should always guess C rather than randomly amongst A or B, suggests that one should always use a deterministic, single-valued learning algorithm (i.e., just guess C) rather than one that guesses randomly from a distribution over possible guesses (i.e., one that guesses randomly amongst A or B). This implication shouldn't surprise anyone familiar with Bayesian decision theory.
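(The arithmetic behind points (1)-(5) is easy to check numerically. The following is only a tiny sketch with made-up numbers, not anything from the original exchange.)

    # Numerical check of points (1)-(5) for the quadratic cost (h - X)^2.
    # The numbers are made up for illustration.
    def cost(h, X):
        return (h - X) ** 2

    A, B, X = 0.0, 4.0, 1.0       # two guesses and the unknown target
    C = (A + B) / 2.0             # C = 2.0, the average of A and B
    D = (C + A) / 2.0             # D = 1.0, the average of C and A

    # (1)-(2): the average's cost never exceeds the average of the costs.
    assert cost(C, X) <= (cost(A, X) + cost(B, X)) / 2.0    # 1 <= 5
    assert cost(D, X) <= (cost(C, X) + cost(A, X)) / 2.0    # 0 <= 1

    # (3)-(4): none of this ranks A against C, or C against D.  Here A and C
    # happen to tie (cost 1 each) and D happens to win (cost 0); with X = 3.0
    # instead, C would beat D, so the costs do not keep shrinking.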
In fact, it's (relatively) straight-forward to prove that independent of priors or the like, for a convex cost function, one should always use a single-valued learning algorithm rather than one which guesses randomly. (This has probably been proven many times. One proof can be found in Wolpert and Stolorz, On the implementation of Bayes optimal generalizers, SFI tech. report 92-03-012.) (Blatant self-promotion: Other interesting things proven in that report and others in its series are: there are priors and noise processes such that the expected cost, given the data set and that one is using a Bayes-optimal learning algorithm, can *decrease* with added noise; if the cost function is a proper metric, then the magnitude of the change in expected cost if one guesses h rather than h' is bounded above by the cost of h relative to h'; other results about using "Bayes-optimal" generalizers predicated on an incorrect prior, etc., etc.) *** The important point is that although it is both intriguing and illuminating, there are no implications of Michael's result for what one should do with (or in place of) a particular deterministic, single-valued learning algorithm. It was for such learning algorithms that my original comments were intended. David Wolpert  From dhw at santafe.edu Tue Aug 10 16:29:07 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 10 Aug 93 14:29:07 MDT Subject: MacKay's recent work and feature selection Message-ID: <9308102029.AA15554@zia> Recently David MacKay made a posting concerning a technique he used to win an energy prediction competition. Parts of that technique have been done before (e.g., combining generalizers via validation set behavior). However, other parts are both novel and very interesting. This posting concerns the "feature selection" aspect of his technique, which I understand MacKay developed in association w/ Radford Neal. (Note: MacKay prefers to call the technique "automatic relevance determination"; nothing I'll discuss here will be detailed enough for that distinction to be important though.) What I will say grew out of conversations w/ David Rosen and Tom Loredo, in part. Of course, any stupid or silly aspects to what I will say should be assumed to originate w/ me. *** Roughly speaking, MacKay implemented feature selection in a neural net framework as follows: 1) Define a potentially different "weight decay constant" (i.e., regularization hyperparameter) for each input neuron. The idea is that one wants to have those constants set high for input neurons representing "features" of the input vector which it behooves us to ignore. 2) One way to set those hyperparameters would be via a technique like cross-validation. MacKay instead set them via maximum likelihood, i.e., he set the weight decay constants alpha_i to those values maximizing P(data | alpha_i). Given a reasonably smooth prior P(alpha_i), this is equivalent to finding the maximum a posteriori (MAP) alpha_i, i.e., the alpha_i maximizing P(alpha_i | data). 3) Empirically, David found that this worked very well. (I.e., he won the competition.) *** This neat idea makes some interesting suggestions: 1) The first grows out of "blurring" the distinction between parameters (i.e., weights w_j) and hyperparameters (the alpha_i). Given such squinting, MacKay's procedure amounts to a sort of "greedy MAP".
First he sets one set of parameters to its MAP values (the alpha_i), and then with those values fixed, he sets the other parameters (the w_j) to their MAP values (this is done via the usual back-propagation w/ weight-decay, which we can do since the first stage set the weight decay constants). In general, the resultant system will not be at the global MAP maximizing P(alpha_i, w_j | D). In essence, a sort of extra level of regularization has been added. (Note: Radford Neal informs me that calculationally, in the procedure MacKay used, the second MAP step is "automatic", in the sense that one has already made the necessary calculations to perform that step when one carries out the first MAP step.) Of course, viewing the technique from a "blurred" perspective is a bit of a fudge, since hyperparameters are not the same thing as parameters. Nonetheless, this view suggests some interesting new techniques. E.g., first set the weights leading to hidden layer 1 to their MAP values (or maximum likelihood values, for that matter). Then with those values fixed, do the same to the weights in the second layer, etc. Another reason to consider this layer-by-layer technique is the fact that training of the weights connecting different layers should in general be distinguishable, e.g., as MacKay has pointed out, one should have different weight-decay constants for the different layers. 2) Another interesting suggestion comes from justifying the technique not as a priori reasonable, but rather as an approximation to a full "hierarchical" Bayesian technique, in which one writes P(w_j | data) (i.e., the ultimate object of interest) prop. to integral d_alpha_i P(data | w_j, alpha_i) P(w_j | alpha_i) P(alpha_i). Note that all 3 distributions occurring in this integrand must be set in order to use MacKay's technique. (The by-now-familiar object of contention between MacKay and myself is on how generically this approximation will be valid, and whether one should explicitly test its validity when one claims that it holds. This issue isn't pertinent to the current discussion however.) Let's assume the approximation is very good. Then under the assumptions: i) P(alpha_i) is flat enough to be ignored; ii) the distribution P(w_j | alpha_i) is a product of gaussians (each gaussian being for those w_j connecting to input neuron i, i.e., for those weights using weight decay constant alpha_i); then what MacKay did is equivalent to back-propagation with weight-decay, where rather than minimizing {training set error} + constant x {sum over all j (w_j)^2}, as in conventional weight decay, MacKay is minimizing (something like) {training set error} + {(sum over i) [ (number of weights connecting to neuron i) x ln [(sum over j; those weights connecting to neuron i) (w_j)^2] ]}. What's interesting about this isn't so much the logarithm in the "weight decay" term, but rather the fact that weights are being clumped together in that weight-decay term, into groups of those weights connecting to the same neuron. (This is not true in conventional weight decay.) So in essence, the weight-decay term in MacKay's scenario is designed to affect all the weights connecting to a given neuron as a group. This makes intuitive sense if the goal is feature selection. 3) One obvious idea based on viewing things this way is to try to perform weight-decay using this modified weight-decay term. This might be reasonable even if MacKay's technique is not a good approximation to this full Bayesian technique.
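(A rough sketch of what the modified weight-decay term just described might look like in code is given below. It is only an illustration, not MacKay's actual implementation; the shapes, names, and the small epsilon added inside the logarithm are all invented for the example.)

    # Grouped "weight-decay" term: one contribution per input neuron i,
    #   (number of weights leaving i) * ln( sum of those weights squared ),
    # so all weights fanning out of one input are penalized as a group.
    # Illustration only; names and shapes are invented for the example.
    import numpy as np

    def grouped_penalty(W1, eps=1e-12):
        # W1: first-layer weights, shape (n_inputs, n_hidden); row i holds the
        # weights attached to input neuron i.
        sum_sq_per_input = np.sum(W1 ** 2, axis=1)     # sum_j w_ij^2 for each i
        n_weights_per_input = W1.shape[1]              # weights attached to each i
        return float(np.sum(n_weights_per_input * np.log(sum_sq_per_input + eps)))

    def objective(training_set_error, W1):
        # {training set error} + {grouped log penalty on the first-layer weights}
        return training_set_error + grouped_penalty(W1)

Each log term depends only on the total squared size of one group of weights, which is what makes the penalty act on all the weights attached to a given input neuron as a group.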
4) The idea of MacKay's also leads to all kinds of ideas about how to set up the weight-decay term so as to enforce feature selection (or automatic relevance determination, if you prefer). These need not have anything to do w/ the precise weight-decay term MacKay used; rather the idea is simply to take his (implicit) suggestion of trying to do feature selection via the weight-decay term, and see where it leads. For example: Where originally we have input neurons at layer 1, hidden layers 2 through n, and then output neurons at layer n+1, now we have the same architecture with an extra "pre-processing" layer 0 added. Inputs are now fed to the neurons at layer 0. For each input neuron at layer 0, there is one and only one weight, leading straight up to the neuron at layer 1 which in the original formulation was the (corresponding) input neuron. The hope would be that for those input neurons which we "should" mostly ignore, something like backprop might set the associated weights from layer 0 to layer 1 to very small values. David Wolpert  From rsun at athos.cs.ua.edu Tue Aug 10 17:33:56 1993 From: rsun at athos.cs.ua.edu (Ron Sun) Date: Tue, 10 Aug 1993 16:33:56 -0500 Subject: No subject Message-ID: <9308102133.AA22967@athos.cs.ua.edu> CALL FOR PAPERS International Symposium on Integrating Knowledge and Neural Heuristics (ISIKNH'94) Sponsored by University of Florida, and AAAI, in cooperation with IEEE Neural Network Council, and Florida AI Research Society. Time: May 9-10, 1994; Place: Pensacola Beach, Florida, USA. A large amount of research has been directed toward integrating neural and symbolic methods in recent years. In particular, the integration of knowledge-based principles and neural heuristics holds great promise in solving complicated real-world problems. This symposium will provide a forum for discussions and exchanges of ideas in this area. The objective of this symposium is to bring together researchers from a variety of fields who are interested in applying neural network techniques to augmenting existing knowledge or proceeding the other way around, and especially, who have demonstrated that this combined approach outperforms either approach alone. We welcome views of this problem from areas such as constraint-(knowledge-) based learning and reasoning, connectionist symbol processing, hybrid intelligent systems, fuzzy neural networks, multi-strategic learning, and cognitive science. Examples of specific research include but are not limited to: 1. How do we build a neural network based on {\em a priori} knowledge (i.e., a knowledge-based neural network)? 2. How do neural heuristics improve the current model for a particular problem (e.g., classification, planning, signal processing, and control)? 3. How does knowledge in conjunction with neural heuristics contribute to machine learning? 4. What is the emergent behavior of a hybrid system? 5. What are the fundamental issues behind the combined approach? Program activities include keynote speeches, paper presentations, and panel discussions. ***** Scholarships are offered to assist students in attending the symposium. Students who wish to apply for a scholarship should send their resumes and a statement of how their research is related to the symposium. ***** Symposium Chairs: LiMin Fu, University of Florida, USA. Chris Lacher, Florida State University, USA.
Program Committee: Jim Anderson, Brown University, USA Michael Arbib, University of Southern California, USA Fevzi Belli, The University of Paderborn, Germany Jim Bezdek, University of West Florida, USA Bir Bhanu, University of California, USA Su-Shing Chen, National Science Foundation, USA Tharam Dillon, La Trobe University, Australia Douglas Fisher, Vanderbilt University, USA Paul Fishwick, University of Florida, USA Stephen Gallant, HNC Inc., USA Yoichi Hayashi, Ibaraki University, Japan Susan I. Hruska, Florida State University, USA Michel Klefstad-Sillonville CCETT, France David C. Kuncicky, Florida State University, USA Joseph Principe, University of Florida, USA Sylvian Ray, University of Illinois, USA Armando F. Rocha, University of Estadual, Brasil Ron Sun, University of Alabama, USA Keynote Speaker: Balakrishnan Chandrasekaran, Ohio-State University Schedule for Contributed Papers ---------------------------------------------------------------------- Paper Summaries Due: December 15, 1993 Notice of Acceptance Due: February 1, 1994 Camera Ready Papers Due: March 1, 1994 Extended paper summaries should be limited to four pages (single or double-spaced) and should include the title, names of the authors, the network and mailing addresses and telephone number of the corresponding author. Important research results should be attached. Send four copies of extended paper summaries to LiMin Fu Dept. of CIS, 301 CSE University of Florida Gainesville, FL 32611 USA (e-mail: fu at cis.ufl.edu; phone: 904-392-1485). Students' applications for a scholarship should also be sent to the above address. General information and registration materials can be obtained by writing to Rob Francis ISIKNH'94 DOCE/Conferences 2209 NW 13th Street, STE E University of Florida Gainesville, FL 32609-3476 USA (Phone: 904-392-1701; fax: 904-392-6950) --------------------------------------------------------------------- --------------------------------------------------------------------- If you intend to attend the symposium, you may submit the following information by returning this message: NAME: _______________________________________ ADDRESS: ____________________________________ _____________________________________________ _____________________________________________ _____________________________________________ _____________________________________________ PHONE: ______________________________________ FAX: ________________________________________ E-MAIL: _____________________________________ ---------------------------------------------------------------------  From ld231782 at longs.lance.colostate.edu Wed Aug 11 00:56:26 1993 From: ld231782 at longs.lance.colostate.edu (L. Detweiler) Date: Tue, 10 Aug 93 22:56:26 -0600 Subject: neuroanatomy list ad & more on bee brains In-Reply-To: Your message of "Mon, 09 Aug 93 14:00:59 PDT." <9308092100.AA10510@nsma.arizona.edu> Message-ID: <9308110456.AA06912@longs.lance.colostate.edu> While many on this list will not be interested in the details of bee-brain neuroanatomy or arguments thereon, an excellent list for discussions of this can be requested from cogneuro-request at ptolemy.arc.nasa.gov, maintained by Kimball Collins . The list has fairly low volume although definitely more than connectionists, and I'd like to encourage any of this amazingly literate connectionist crowd with a strong interest in neurobiological research to subscribe (recent/past topics: neurobiology of rabies infections, Hebb's rule, vision, dyslexia, etc.) * * * Mr. 
Skaggs writes an exceedingly hostile flame (a redundant phrase) on the recent syndicated news article describing research into bee function and neuroanatomy, calling it `overblown and historically ignorant'. While I don't have as close a background in the area in question as Mr. Skaggs appears to, this is just a short note to balance the scale a little closer to equilibrium. The critical feature that I see going on here is a professional scientist demeaning a non-detailed popular account of scientific work, esp. in that person's area of expertise, for lapses in precise description. This happens all the time, of course, both the presence of the quasi-skewed material and the criticism. Definitely, the article was the overwrought cheerleading type, rather stereotypical, but Mr. Skaggs, on the other hand, plays into the cliche of the pessimistic and sour curmudgeon-scientist in attacking it. I'd like to point out that this popular literature serves a very useful purpose in keeping the lay public apprised of new developments in scientific fields and, ultimately, encouraging funding. It is not fair to apply the strict scientific standard of evaluation to something that appears in the popular press. In this case, there is no significant error, and the purpose is served in being `approximately correct', and there is no point in rebutting it. We are bound to lose something in the translation, and the major points of disagreement are likely to be over opinion. We should instead be highly encouraged and appreciative of these attempts to bring increasingly abstruse and technical science to the interested layman. I appreciate the popular press to some degree in that it forces scientists to get at the essence of their research, something they sometimes lose sight of. The scientist (perhaps the neuroscientist in particular) is forever saying `it's not quite that simple' or `it doesn't quite happen like that' or `there are exceptions to that' to the point that an outsider can give up in frustration, thinking that it is nothing but a disconnected morass with no underlying message or cohesion. The general press usually gives a close and fascinating view into what the `big picture' is. Looking at reporters as nothing but clueless intruders is a somewhat self-destructive position, IMHO. And yes, the grandiose statements like `will shed insight into human learning' can be recognized by other scientists as the necessary fodder and not criticized but ignored. Now, to address a few points: >Coss and Perkel over a decade ago found >changes in the length of dendritic spines after honeybees went on a >single exploratory flight. This is much more direct than the evidence >described in the "news release". Incidentally, the changes in dendritic growth with learning are IMHO one of the most fascinating studies of plasticity, and on the cutting edge of current research, and perhaps others will wish to post references. (The classic study showed that rats reared in deprived vs. abundant sensory-stimulus environments had less or more dendritic growth, respectively.) >It is not true that the >honeybee brain is merely a simpler version of the human brain. They're >completely different -- even the neurons are different in structure. Definitely, any animal model always has minor or major imperfections and pitfalls. But this brings up an interesting point--is there an analogue to LTP in the insect brain? There is probably at least a degree of overlap in the kinds of neurotransmitters involved.
However, arguing against the relevance, superiority, and verisimilitude of one animal model vs. another can turn into a very emotional debate, and should be engaged with the utmost delicacy or statements come out with a connotation much like `the car you drive all day is worthless'.  From delliott at src.umd.edu Wed Aug 11 17:52:44 1993 From: delliott at src.umd.edu (David L. Elliott) Date: Wed, 11 Aug 1993 17:52:44 -0400 Subject: Call for papers, NeuroControl book Message-ID: <199308112152.AA04995@newra.src.umd.edu> PROGRESS IN NEURAL NETWORKS series Editor O. M. Omidvar Special Volume: NEURAL NETWORKS FOR CONTROL Editor: David L. Elliott CALL FOR PAPERS Original manuscripts describing recent progress in neural networks research directly applicable to Control or making use of modern control theory. Manuscripts may be survey or tutorial in nature. Suggested topics for this book are: %New directions in neurocontrol %Adaptive control %Biological control architectures %Mathematical foundations of control %Model-based control with learning capability %Natural neural control systems %Neurocontrol hardware research %Optimal control and incremental dynamic programming %Process control and manufacturing %Reinforcement-Learning Control %Sensor fusion and vector quantization %Validating neural control systems The papers will be refereed and uniformly typeset. Ablex and the Progress Series editors invite you to submit an abstract, extended summary or manuscript proposal, directly to the Special Volume Editor: Dr. David L. Elliott, Institute for Systems Research University of Maryland, College Park, MD 20742 Tel: (301)405-1241 FAX (301)314-9920 Email: DELLIOTT at SRC.UMD.EDU or to the Series Editor: Dr. Omid M. Omidvar, Computer Science Dept., University of the District of Columbia, Washington DC 20008 Tel: (202)282-7345 FAX: (202)282-3677 Email: OOMIDVAR at UDCVAX.BITNET The Publisher is Ablex Publishing Corporation, Norwood, NJ  From pittman at mcc.com Thu Aug 12 08:38:30 1993 From: pittman at mcc.com (Jay Pittman) Date: Thu, 12 Aug 93 08:38:30 EDT Subject: neuroanatomy list ad & more on bee brains Message-ID: <9308121338.AA14022@gluttony.mcc.com> Excellent note, well stated. I agree with everything Detweiler said about the press. On the other hand, when I originally read Bill Skaggs note I didn't think he was being all that critical. I went back and looked at it again, and, yes, he does sound like a real flamer, WHEN I START OUT ASSUMING THAT. One can also read it as a calmly-stated critique of the article. I find myself imagining different "tones of voice", depending on (presumably) random triggers. I hope when you read this note you perceive me speaking in a calm, relaxed manner. While I agree with Detweiler's attitude toward the popular press, I think Skaggs statements were addressed to us, the members of the research community, and not to the reporters. As long as the note does not reach members of that community, we should tolerate somewhat-more-grouchy phrasing than we might want for lay consumption. I've just spent a lot of time trying to carefully word the above message. The neat thing about a group such as connectionists is that (I think) we can skip that labor, and just spit out our thoughts. Or perhaps I am being naive? BTW, I have no HO on bee brains. My own dendrites get thinner every day. 
J  From chris at arraysystems.nstn.ns.ca Sat Aug 14 14:25:29 1993 From: chris at arraysystems.nstn.ns.ca (Chris Brobeck) Date: Sat, 14 Aug 93 15:25:29 ADT Subject: Genetic Algorithms Message-ID: <9308141825.AA07238@arraysystems.nstn.ns.ca> Dear Colleagues; We're currently in the process of building a relatively large net and were looking at using a genetic algorithm to optimize the network structure. The question is as follows. Early forms of genetic algorithms seemed to rely on reading the gene once, linearly, in the construction process, whereas a number of more recent algorithms allow the reading to start anywhere along the gene, and continue to read (construct rules) until some stopping criterion is met. In the former, it seems reasonable then for one organism to compete against the other in a winner-take-all sort of way. On the other hand, the rigidity of the genetic structure makes it very sensitive to mutation. In the latter case the gene may be thought of as a generator for a process (randomly) creating rules of a variety of lengths. If one assumes that individual rules are much shorter than the entire gene this method becomes less sensitive to mutation, crossover, etc. (both the beneficial and not so beneficial aspects). In this case it seems that competition among species would be as critical as competition among individuals, with the interspecies competition perhaps representing a fast way to remove ineffective rule sets, and individual competition more of a way of fine-tuning a distribution. The upshot would be (one assumes) slower but more robust convergence. In any case, if there is anyone out there who can point us in the direction of some good references let us know - particularly ones that might be available via ftp. Thanks, Chris Brobeck.  From bengio at iro.umontreal.ca Mon Aug 16 11:09:57 1993 From: bengio at iro.umontreal.ca (Samy Bengio) Date: Mon, 16 Aug 1993 11:09:57 -0400 Subject: Preprint announcement: Generalization of a Parametric Learning Rule Message-ID: <9308161509.AA06576@carre.iro.umontreal.ca> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/bengio.general.ps.Z The following file has been placed in neuroprose: (no hardcopies will be provided): GENERALIZATION OF A PARAMETRIC LEARNING RULE (8 pages) by Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei Abstract: In previous work we discussed the subject of parametric learning rules for neural networks. In this article, we present a theoretical basis permitting the study of the {\it generalization} property of a learning rule whose parameters are estimated from a set of learning tasks. By generalization, we mean the possibility of using the learning rule to learn to solve new tasks. Finally, we describe simple experiments on two-dimensional categorization tasks and show how they corroborate the theoretical results. This paper is an extended version of a paper which will appear in ICANN'93: Proceedings of the International Conference on Artificial Neural Networks. To retrieve the file: unix> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. 220 cheops.cis.ohio-state.edu FTP server ready. Name: anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> binary 200 Type set to I. ftp> cd pub/neuroprose 250 CWD command successful. ftp> get bengio.general.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for bengio.general.ps.Z 226 Transfer complete.
100000 bytes sent in 3.14159 seconds ftp> quit 221 Goodbye. unix> uncompress bengio.general.ps.Z unix lpr bengio.general.ps (or however you print out postscript) Many thanks to Jordan Pollack for maintaining this archive. -- Samy Bengio E-mail: bengio at iro.umontreal.ca Fax: (514) 343-5834 Tel: (514) 343-6111 ext. 3545/3494 Residence: (514) 495-3869 Universite de Montreal, Dept. IRO, C.P. 6128, Succ. A, Montreal, Quebec, Canada, H3C 3J7  From reza at ai.mit.edu Mon Aug 16 12:37:02 1993 From: reza at ai.mit.edu (Reza Shadmehr) Date: Mon, 16 Aug 93 12:37:02 EDT Subject: Tech Reports from CBCL at MIT Message-ID: <9308161637.AA03497@corpus-callosum.ai.mit.edu> The following technical reports from the Center for Biological and Computational Learning at M.I.T. are now available via anonymous ftp. -------------- :CBCL Paper #83/AI Memo #1440 :author Michael I. Jordan and Robert A. Jacobs :title Hierarchical Mixtures of Experts and the EM Algorithm :date August 1993 :pages 29 We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation- Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain. -------------- :CBCL Paper #84/AI Memo #1441 :author Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh :title On the Convergence of Stochastic Iterative Dynamic Programming Algorithms :date August 1993 :pages 13 Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD (lambda) and Q-learning belong. ============================ How to get a copy of above reports: The files are in compressed postscript format and are named by their AI memo number, e.g., the Jordan and Jacobs paper is named AIM-1440.ps.Z. Here is the procedure for ftp-ing: unix> ftp ftp.ai.mit.edu (log-in as anonymous) ftp> cd ai-pubs/publications/1993 ftp> binary ftp> get AIM-number.ps.Z ftp> quit unix> zcat AIM-number.ps.Z | lpr I will periodically update the above list as new titles become available. Best wishes, Reza Shadmehr Center for Biological and Computational Learning M. I. T. Cambridge, MA 02139  From mikewj at signal.dra.hmg.gb Tue Aug 17 04:15:33 1993 From: mikewj at signal.dra.hmg.gb (mikewj@signal.dra.hmg.gb) Date: Tue, 17 Aug 93 09:15:33 +0100 Subject: Practical Neural Nets Conference & Workshops in UK. 
Message-ID: AA20707@ravel.dra.hmg.gb *********************************** NEURAL COMPUTING APPLICATIONS FORUM *********************************** 8 - 9 September 1993 Brunel University, Runnymede, UK ***************************************** PRACTICAL APPLICATIONS OF NEURAL NETWORKS ***************************************** Neural Computing Applications Forum is the primary meeting place for people developing Neural Network applications in industry and academia. It has 200 members from the UK and Europe, from universities, small companies and big ones, and holds four main meetings each year. It has been running for 3 years, and is cheap to join. This meeting spans two days with informal workshops on 8 September and the main meeting comprising talks about neural network techniques and applications on 9 September. ********* WORKSHOPS ********* ********************************************************** Neural Networks in Engine Health Monitoring 8 September, 13.00 to 15.00 ********************************************************** Technical Contact: Tom Harris: (+44/0) 784 431341 Including: Roger Hutton (ENTEK): "What is Predictive Maintenance?" John Hobday (Lloyds Register): "Gas Turbine Start Monitoring" John McIntyre (University of Sunderland / National Power plc): Predictive Maintenance at Blyth Power Station ********************************************************* Building a Neural Network Application 8 September, 15.30 to 17.30 ********************************************************* Technical Contact: Tom Harris: (+44/0) 784 431341 Including: Chris Bishop (Aston University): "The DTI Neural Computing Guidelines Project" Tom Harris (Brunel University): "A Design Process for Neural Network Applications" Paul Gregory (Recognition Research Ltd.): "Building an Application in Software" (case study) Simon Hancock (Neural Technologies Ltd.): "Implementing Hardware Neural Network Solutions" (case study) ************************* Evening: Barbecue Supper ************************* ***************************** MAIN MEETING - 9 September 1993 ***************************** 8.30 Registration 9.05 Welcome 9.15 Neil Burgess (CRL): "Feature Selection in Neural Networks" 9.50 Bryn Williams (Aston University): "Convergence and Diversity of Species in Genetic Algorithms for Optimization of a Bump-Tree Classifier" 10.20 Coffee 11.00 Mike Brinn (Health and Safety Executive): "Kohonen Networks Classifying Toxic Molecules" 11.40 John Bridle (Dragon Systems Ltd.): "Speech Recognition in Principle and Practice" 12.15 Lunch 2.00 Bruce Wilkie (Brunel University): "Real Time Logical Neural Networks" 2.40 Stan Swallow (Brunel University): "TARDIS: The World's Fastest Neural Network?" 3.15 Tea 3.40 Dave Cressy (Logica Cambridge Ltd.): "Neural Control of an Experimental Batch Distillation Column" 4.10 Discussions 4.30 Close & minibus to the station ACCOMMODATION is available at Brunel University at 35 pounds (including barbecue supper) and **MUST** be booked and paid for in advance. Accommodation and breakfast only: 25 pounds; barbecue supper only: 12 pounds. ***************** Application ***************** Members of NCAF get free entry to all meetings for a year. (This is very good value - main meetings, tutorials, special interest meetings). It also includes subscription to Springer Verlag's journal "Neural Computing and Applications". Full membership: 250 pounds - anybody in your small company / research group in a big company. Individual membership: 140 pounds - named individual only.
Student membership (with journal): 55 pounds - copy of student ID required. Student membership (no journal, very cheap!): 25 pounds - copy of student ID required. Entry to this meeting without membership costs 35 pounds for the workshops, and 80 pounds for the main day. Payment in advance if possible; please give an official order number if an invoice is required. Email enquiries to Mike Wynne-Jones, mikewj at signal.dra.hmg.gb. Postal to Mike Wynne-Jones, NCAF, PO Box 62, Malvern, WR14 4NU, UK. Fax to Karen Edwards, (+44/0) 21 333 6215  From mpp at cns.brown.edu Tue Aug 17 12:18:01 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Tue, 17 Aug 93 12:18:01 EDT Subject: Provable optimality of averaging generalizers Message-ID: <9308171618.AA15207@cns.brown.edu> David Wolpert writes: -->1) Say I have 3 real numbers, A, B, and X. In general, it's always -->true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} / -->2. (This is exactly analogous to having the cost of the average guess -->bounded above by the average cost of the individual guesses.) --> -->2) This means that if we had a choice of either randomly drawing one -->of the numbers {A, B}, or drawing C, that on average drawing C would -->give smaller quadratic cost with respect to X. --> -->3) However, as Michael points out, this does *not* mean that if we had -->just the numbers A and C, and could either draw A or C, that we should -->draw C. In fact, point (1) tells us nothing whatsoever about whether A -->or C is preferable (as far as quadratic cost with respect to X is -->concerned). --> -->4) In fact, now create a 5th number, D = [C + A] / 2. By the same -->logic as in (1), we see that the cost (wrt/ X) of D is less than the -->average of the costs of C and A. So to the exact same degree that (1) -->says we "should" guess C rather than A or B, it also says we should -->guess D rather than A or C. (Note that this does *not* mean that D's -->cost is necessarily less than C's though; we don't get endlessly -->diminishing costs.) --> -->5) Step (4) can be repeated ad infinitum, getting a never-ending -->sequence of "newly optimal" guesses. In particular, in the *exact* -->sense in which C is "preferable" to A or B, and therefore should -->"replace" them, D is preferable to A or B, and therefore should -->replace *them* (and in particular replace C). So one is never left -->with C as the object of choice. This argument does not imply a contradiction for averaging! This argument shows the natural result of throwing away information. Step (4) throws away number B. Given that we no longer know B, number D is the correct choice. (One could imagine such "forgetting" to be useful in time varying situations - which leads towards the Kalman filtering that was mentioned in relation to averaging a couple of weeks ago.) In Step (5), an infinite sequence is developed by successively throwing away more and more of number B. The infinite limit of Step (5) is number A. In other words, we have thrown away all knowledge of B. -->So (1) isn't really normative; it doesn't say one "should" guess the -->average of a bunch of guesses: Normative? Hey is this an ethics class!? :-) -->7) Choosing D is better than randomly choosing amongst C or A, just as --> choosing C is better than randomly choosing amongst A or B. 
--> -->8) This doesn't mean that given C, one should introduce an A and --> then guess the average of C and A (D) rather than C, just as --> this doesn't mean that given A, one should introduce a B and --> then guess the average of A and B (C) rather than A. Sure, if you're willing to throw away information. Michael  From cns at clarity.Princeton.EDU Tue Aug 17 11:30:02 1993 From: cns at clarity.Princeton.EDU (Cognitive Neuroscience) Date: Tue, 17 Aug 93 11:30:02 EDT Subject: RFP Research - McDonnell-Pew Program Message-ID: <9308171530.AA27618@clarity.Princeton.EDU> McDonnell-Pew Program in Cognitive Neuroscience SEPTEMBER 1993 Individual Grants-in-Aid for Research Program supported jointly by the James S. McDonnell Foundation and The Pew Charitable Trusts INTRODUCTION The McDonnell-Pew Program in Cognitive Neuroscience has been created jointly by the James S. McDonnell Foundation and The Pew Charitable Trusts to promote the development of cognitive neuroscience. The foundations have allocated $20 million over a five-year period for this program. Cognitive neuroscience attempts to understand human mental events by specifying how neural tissue carries out computations. Work in cognitive neuroscience is interdisciplinary in character, drawing on developments in clinical and basic neuroscience, computer science, psychology, linguistics, and philosophy. Cognitive neuroscience excludes descriptions of psychological function that do not address the underlying brain mechanisms and neuroscientific descriptions that do not speak to psychological function. The program has three components. (1) Institutional grants, which have already been awarded, for the purpose of creating centers where cognitive scientists and neuroscientists can work together. (2) Small grants-in-aid, presently being awarded, for individual research projects to encourage Ph.D. and M.D. investigators in cognitive neuroscience. (3) Small grants-in-aid, presently being awarded, for individual training projects to encourage Ph.D. and M.D. investigators to acquire skills for interdisciplinary research. This brochure describes the individual grants-in-aid for research. RESEARCH GRANTS The McDonnell-Pew Program in Cognitive Neuroscience will issue a limited number of awards to support collaborative work by cognitive neuroscientists. Applications are sought for projects of exceptional merit that are not currently fundable through other channels and from investigators who are not at institutions already funded by an institutional grant from the program. In order to distribute available funds as widely as possible, preference will be given to applicants who have not received previous grants under this program. Preference will be given to projects that are interdisciplinary in character. The goals of the program are to encourage broad participation in the development of the field and to facilitate the participation of investigators outside the major centers of cognitive neuroscience. There are no U.S. citizenship restrictions or requirements, nor does the proposed work need to be conducted at a U.S. institution, providing the sponsoring organization qualifies as tax-exempt as described in the "Applications" section of this brochure. Ph.D. thesis research of graduate students will not be funded. Grant support under the research component is limited to $30,000 per year for two years. Indirect costs are to be included in the $30,000 maximum and may not exceed 10 percent of total salaries and fringe benefits. 
These grants are not renewable after two years. The program is looking for innovative proposals that would, for example: * combine experimental data from cognitive psychology and neuroscience; * explore the implications of neurobiological methods for the study of the higher cognitive processes; * bring formal modeling techniques to bear on cognition, including emotions and higher thought processes; * use sensing or imaging techniques to observe the brain during conscious activity; * make imaginative use of patient populations to analyze cognition; * develop new theories of the human mind/brain system. This list of examples is necessarily incomplete but should suggest the general kind of proposals desired. Ideally, a small grant-in-aid for research should facilitate the initial exploration of a novel or risky idea, with success leading to more extensive funding from other sources. APPLICATIONS Applicants should submit five copies of the following information: * a brief, one-page abstract describing the proposed work; * a brief, itemized budget that includes direct and indirect costs (indirect costs may not exceed 10 percent of total salaries and fringe benefits); * a budget justification; * a narrative proposal that does not exceed 5,000 words; the 5,000-word proposal should include: 1) a description of the work to be done and where it might lead; 2) an account of the investigator's professional qualifications to do the work; 3) an account of any plans to collaborate with other cognitive neuroscientists; 4) a brief description of the available research facilities; * curriculum(a) vitae of the participating investigator(s); * an authorized document indicating clearance for the use of human and animal subjects; * an endoresement letter from the officer of the sponsoring institution who will be responsible for administering the grant. One copy of the following items must also be submitted along with the proposal. These documents should be readily available from the sponsoring institution's grants or development office. * A copy of the IRS determination letter, or the international equivalent, stating that the sponsoring organization is a nonprofit, tax-exempt institution classified as a 501(c)(3) organization. * A copy of the IRS determination letter stating that your organization is not listed as a private foundation under section 509(a) of the Internal Revenue Service Code. * A statement on the sponsoring institution's letterhead, following the wording on Attachment A and signed by an officer of the institution, certifying that the status or purpose of the organization has not changed since the issuance of the IRS determinations. (If your organization's name has changed, include a copy of the IRS document reflecting this change.) * An audited financial statement of the most recently completed fiscal year of the sponsoring organization. * A current list of the names and professional affiliations of the members of the organization's board of trustees and the names and titles of the principal officers. Other appended documents will not be accepted for evaluation and will be returned to the applicant. Any incomplete proposals will also be returned to the applicant. Submissions will be reviewed by the program's advisory board. Applications must be postmarked on or before FEBRUARY 1 to be considered for review. 
INFORMATION McDonnell-Pew Program in Cognitive Neuroscience Green Hall 1-N-6 Princeton University Princeton, New Jersey 08544-1010 Telephone: 609-258-5014 Facsimile: 609-258-3031 Email: cns at clarity.princeton.edu ADVISORY BOARD Emilio Bizzi, M.D. Eugene McDermott Professor in the Brain Sciences and Human Behavior Chairman, Department of Brain and Cognitive Sciences Massachusetts Institute of Technology, E25-526 Cambridge, Massachusetts 02139 Sheila E. Blumstein, Ph.D. Professor of Cognitive and Linguistic Sciences Dean of the College Brown University University Hall, Room 218 Providence, Rhode Island 02912 Stephen J. Hanson, Ph.D. Head, Learning Systems Department Siemens Corporate Research 755 College Road East Princeton, New Jersey 08540 Jon H. Kaas, Ph.D. Centennial Professor Department of Psychology Vanderbilt University 301 Wilson Hall 111 21st Avenue South Nashville, Tennessee 37240 George A. Miller, Ph.D. Director, McDonnell-Pew Program in Cognitive Neuroscience James S. McDonnell Distinguished University Professor of Psychology Department of Psychology Princeton University Princeton, New Jersey 08544-1010 Mortimer Mishkin, Ph.D. Chief, Laboratory of Neuropsychology National Institute of Mental Health 9000 Rockville Pike Building 49, Room 1B80 Bethesda, Maryland 20892 Marcus E. Raichle, M.D. Professor of Neurology and Radiology Division of Radiation Sciences Washington University School of Medicine Campus Box 8225 510 S. Kingshighway Boulevard St. Louis, Missouri 63110 Endel Tulving, Ph.D. Tanenbaum Chair in Cognitive Neuroscience Rotman Research Institute of Baycrest Centre 3560 Bathurst Street North York, Ontario M6A 2E1 Canada  From dhw at santafe.edu Tue Aug 17 21:26:08 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 17 Aug 93 19:26:08 MDT Subject: Yet more on averaging Message-ID: <9308180126.AA02904@zia> In several recent e-mail conversations, Michael Perrone and I have gotten to where we think we agree with each other on substance, although we disagree a bit on emphasis. To complete the picture for the connectionist community and present the other side to Michael's recent posting: In my back pocket, I have a number. I'll fine you according to the squared difference between your guess for the number and its actual value. Okay, should you guess 3 or 5? Obviously you can't answer. 7 or 5? Same response. 5 or a random sample of 3 or 7? Now, as Michael points out, you *can* answer: 5. However, I'm not as convinced as Michael that this actually tells us anything of practical use. How should you use this fact to help you guess the number in my back pocket? Seems to me you can't. The bottom line, as I see it: arguments like Michael's show that one should always use a single-valued learning algorithm rather than a stochastic one. (Subtle caveat: If used only once, there is no difference between a stochastic learning algorithm and a single-valued one; multiple trials are implicitly assumed here.) But if one has before one a smorgasbord of single-valued learning algorithms, one can not infer that one should average over them. Even if I choose amongst them in a really stupid way (say according to the alphabetical listing of their creators), *so long as I am consistent and single-valued in how I make my choice*, I have no assurance that doing this will give worse results than averaging them. To sum it up: one can not prove averaging to be preferable to a scheme like using the alphabet to pick.
Michael's result shows instead that averaging the guess is better (over multiple trials) than randomly picking amongst the guesses. Which simply means that one should not randomly pick amongst the guesses. It does *not* mean that one should average rather than use some other (arbitrarily silly) single-valued scheme. David Wolpert Disclaimer: All the above notwithstanding, I personally *would* use some sort of averaging scheme in practice. The only issue of contention here is what is *provably* the way one should generalize. In addition to disseminating the important result concerning the sub-optimality of stochastic schemes (of which there are many in the neural nets community!), Michael is to be commended for bringing this entire fascinating subject to the attention of the community.  From tmb at idiap.ch Wed Aug 18 02:27:58 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Wed, 18 Aug 93 08:27:58 +0200 Subject: Yet more on averaging Message-ID: <9308180627.AA18505@idiap.ch> In reply to dhw at santafe.edu: |The bottom line, as I see it: arguments like Michael's show that one |should always use a single-valued learning algorithm rather than a |stochastic one. From tmb at idiap.ch Wed Aug 18 02:29:42 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Wed, 18 Aug 93 08:29:42 +0200 Subject: Yet more on averaging Message-ID: <9308180629.AA18508@idiap.ch> dhw at santafe.edu writes: |To sum it up: one can not prove averaging to be preferable to a scheme |like using the alphabet to pick. Michael's result shows instead that |averaging the guess is better (over multiple trials) than randomly |picking amongst the guesses. | |Which simply means that one should not randomly pick amongst the |guesses. It does *not* mean that one should average rather than use |some other (arbitrarily silly) single-valued scheme. I would like to strengthen this point a little. In general, averaging is clearly not optimal, nor even justifiable on theoretical grounds. For example, let us take the classification case and let us assume that each neural network $i$ returns an estimate $p^i_j(x)$ of the probability that the object belongs to class $j$ given the measurement $x$. Consider now the case in which we know that the predictions of those networks are statistically independent (for example, because they are run on independent parts of the input data). Then, we should really multiply the probabilities estimated by each network, rather than computing a weighted sum. That is, we should make a decision according to the maximum of $\prod_i p^i_j(x)$, not according to the maximum of $\sum_i w_i p^i_j(x)$ (assuming a 0-1 loss function). As another example, consider the case in which we have an odd number of experts. If they are trained and designed individually in a particularly peculiar way, it might turn out that the optimal decision rule is to output class 1 if an odd number of them pick class 1, and pick class 0 otherwise. Now, Michael probably limits the scope of his claims in his thesis to exclude such cases (I only had a brief look, I must admit), but I think it is important to make the point that, without some additional assumptions, averaging is just a heuristic and not necessarily optimal. Still, linear combinations of the outputs of classifiers, regressors, and networks seem to be useful in practice for improving classification rates in many cases. Lots of practical experience in both statistics and neural networks points in that direction. Thomas.
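To make the distinction concrete, here is a small illustrative sketch (not code from any of the postings above; the function names, the toy numbers, and the use of Python/NumPy are all assumptions) of the two combination rules Breuel contrasts -- a weighted sum of the class-probability estimates versus a product of them, each followed by an argmax decision:

import numpy as np

def combine_by_sum(p_list, weights=None):
    # Weighted-average rule: sum_i w_i * p^i_j(x), then argmax over classes j.
    P = np.asarray(p_list)                    # shape (n_experts, n_classes)
    if weights is None:
        weights = np.ones(len(P)) / len(P)    # simple averaging
    combined = weights @ P
    return int(combined.argmax()), combined

def combine_by_product(p_list):
    # Product rule for (assumed) independent experts: prod_i p^i_j(x),
    # then argmax over classes j; summing logs avoids numerical underflow.
    log_combined = np.log(np.asarray(p_list) + 1e-12).sum(axis=0)
    return int(log_combined.argmax()), np.exp(log_combined)

# Toy case in which the two rules disagree: two experts mildly favor class 0,
# one expert confidently vetoes it.
experts = [[0.80, 0.20], [0.80, 0.20], [0.02, 0.98]]
print(combine_by_sum(experts))        # averaging picks class 0 (0.54 vs 0.46)
print(combine_by_product(experts))    # the product rule picks class 1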
From dhw at santafe.edu Wed Aug 18 18:37:32 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Wed, 18 Aug 93 16:37:32 MDT Subject: Random vs. single-valued rules Message-ID: <9308182237.AA03709@zia> tmb writes: >>>>> In reply to dhw at santafe.edu: |The bottom line, as I see it: arguments like Michael's show that one |should always use a single-valued learning algorithm rather than a |stochastic one. >From context, I'm assuming that you are referring to "deterministic" vs. "randomized" decision rules, as they are called in decision theory ("stochastic learning algorithm" means something different to me, but maybe I'm just misinterpreting your posting). Picking an opinion from a pool of experts randomly is clearly not a particularly good randomized decision rule in most cases. However, there are cases in which properly chosen randomized decision rules are important (any good introduction on Bayesian statistics should discuss this). Unless there is an intelligent adversary involved, such cases are probably mostly of theoretical interest, but nonetheless, a randomized decision rule can be "better" than any deterministic one. >>>> Implicit in my statement was the context of Michael Perrone's posting (which I was responding to): convex loss functions, and the fact that in particular, one "single-valued learning algorithm" one might use is the one Michael advocates: average over your pool of experts. Obviously one can choose a single-valued learning algorithm which performs more poorly than randomly drawing from a pool of experts: 1) One can prove that (for convex loss) averaging over the pool is preferable to randomly sampling the pool (Michael's result; note assumptions about lack of correlations between the experts and the like apply.) 2) One can not prove that averaging beats any other single-valued use of the experts. 3) Note that neither (1) nor (2) contradict the assertion that there might be single-valued algorithms which perform worse than randomly sampling the pool. 4) For the case of a 0-1 loss function, and a uniform prior over target functions, it doesn't matter how you guess; all algorithms perform the same, both averaged over data and for one particular data (as far as off-training set average loss is concerned). David Wolpert  From tmb at idiap.ch Thu Aug 19 09:17:14 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Thu, 19 Aug 93 15:17:14 +0200 Subject: Yet more on averaging In-Reply-To: <9308180629.AA18508@idiap.ch> References: <9308180629.AA18508@idiap.ch> Message-ID: <9308191317.AA22756@idiap.ch> I wrote, in response to a discussion of Michael Perrone's work: |In general, averaging is clearly not optimal, nor even justifiable on |theoretical grounds. [... some examples follow...] Judging from some private mail that I have been receiving, some people seem to have misunderstood my message. I wasn't making a statement about Michael's results per se, but about their application. In particular, in the case of combining estimates of probabilities by different "experts" for subsequent classification (e.g., in Michael's OCR example), or in the case of combining expert "votes", using any kind of linear combination is not justifiable in general on theoretical grounds, and it is actually provably suboptimal in some cases. Now, such examples do violate some of the assumptions on which Michael's results rely, so there is no contradiction. 
My message was only intended as a reminder that there are a number of important problems in which the assumptions actually are violated, and in which the approach of linear combinations reduces to a heuristic (one, I might add, that often does work well in practice). Thomas.  From brandyn at brainstorm.com Fri Aug 20 03:32:18 1993 From: brandyn at brainstorm.com (Brandyn) Date: Fri, 20 Aug 93 00:32:18 PDT Subject: Paper available on neuroprose Message-ID: <9308200732.AA14000@brainstorm.com> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/webb.furf.ps.Z The following paper is now available by anonymous FTP: Fusion-Reflection (Self-Supervised Learning) Brandyn Jerad Webb brandyn at brainstorm.com ABSTRACT By analyzing learning from the perspective of knowledge acquisition, a number of common limitations are overcome. Modeling efficacy is proposed as an empirical measure of knowledge, providing a concrete, mathematical means of "acquiring knowledge" via gradient ascent. A specific network architecture is described, a hierarchical analog of node-labeled Hidden Markov Models, and its evaluation and learning laws are derived. In empirical studies using a hand-printed character recognition task, an unsupervised network was able to discover n-gram statistics from groups of letter images, and to use these statistics to enhance its ability to later identify individual letters. Host: archive.cis.ohio-state.edu (128.146.8.52) Directory: pub/neuroprose Filename: webb.furf.ps.Z A version of this paper was submitted to NIPS in May '93. If there is sufficient interest, and if it wouldn't violate neuroprose etiquette, I could possibly make the C code available as well. -Brandyn (brandyn at brainstorm.com)  From mikewj at signal.dra.hmg.gb Fri Aug 20 12:00:02 1993 From: mikewj at signal.dra.hmg.gb (mikewj@signal.dra.hmg.gb) Date: Fri, 20 Aug 93 17:00:02 +0100 Subject: Practical Neural Nets Conference & Workshops in UK. Message-ID: AA16188@ravel.dra.hmg.gb *********************************** NEURAL COMPUTING APPLICATIONS FORUM *********************************** ***************************************** PRACTICAL APPLICATIONS OF NEURAL NETWORKS CALL FOR PRESENTATIONS ***************************************** The Neural Computing Applications Forum is the primary meeting place for people developing Neural Network applications in industry and academia. It has 200 members in the UK and Europe, from Universities and small and large companies, and holds four main meetings each year. It has been running for three years. Presentations, tutorials, and workshops are sought on all practical aspects of Neural Computing and Pattern Recognition. Previous events have included presentations and workshops on practical issues including machine health monitoring, neural control, financial prediction, chemical structure analysis, power station load prediction, copyright law, alternative energy, automatic speech recognition, and human-computer interaction. We also hold introductory tutorials and theoretical workshops on all aspects of Neural computing. Presentations at NCAF do not require a written paper for publication. You will have the chance to draw the attention of the top industrial Neural Network practitioners to your work. conference presenters of outstanding quality will be invited to submit a paper to the Springer Verlag journal Neural Computing and Applications. 
Please contact Mike Wynne-Jones, Programme Organiser, NCAF, PO Box 62, Malvern, WR14 4NU, UK, enclosing your proposed title and a brief synopsis of your presentation. Email: mikewj at signal.dra.hmg.gb; phone +44 684 563858.  From shashem at ecn.purdue.edu Sat Aug 21 18:08:11 1993 From: shashem at ecn.purdue.edu (Sherif Hashem) Date: Sat, 21 Aug 93 17:08:11 -0500 Subject: Combining (averaging) NNs Message-ID: <9308212208.AA18678@cornsilk.ecn.purdue.edu> I have recently joined Connectionists and I read some of the email messages arguing about combining/averaging NNs. Unfortunately, I missed the earlier discussion that started this argument. I am interested in combining NNs; in fact, my Ph.D. thesis is about optimal linear combinations of NNs. Averaging a number of estimators has been suggested/debated/examined in the literature for a long time, dating as far back as 1818 (Laplace 1818). Clemen (1989) cites more than 200 papers in his review of the literature related to combining forecasts (estimators), including contributions from the forecasting, psychology, statistics, and management science literatures. Numerous empirical studies have been conducted to assess the benefits/limitations of combining estimators (Clemen 1989). Besides, there are quite a few analytical results established in the area. Most of these studies and results are in the forecasting literature (more than 100 publications in the last 20 years). I think that it is fair to say that, as long as no "absolute" best estimator can be identified, combining estimators may provide a superior alternative to picking the best from a population of estimators. I have published some of my preliminary results on the benefits of combining NNs in (Hashem and Schmeiser 1992, 1993a, and Hashem et al. 1993b), and based on my experience with combining NNs, I join Michael Perrone in advocating the use of combining NNs to enhance the estimation accuracy of NN based models. Sherif Hashem email: shashem at ecn.purdue.edu References: ----------- Clemen, R.T. (1989). Combining Forecasts: A Review and Annotated Bibliography. International Journal of Forecasting, Vol. 5, pp. 559-583. Hashem, S., Y. Yih, & B. Schmeiser (1993b). An Efficient Model for Product Allocation using Optimal Combinations of Neural Networks. In Intelligent Engineering Systems through Artificial Neural Networks, Vol. 3, C. Dagli, L. Burke, B. Fernandez, & J. Ghosh (Eds.), ASME Press, forthcoming. Hashem, S., & B. Schmeiser (1993a). Approximating a Function and its Derivatives using MSE-Optimal Linear Combinations of Trained Feedforward Neural Networks. Proceedings of the World Congress on Neural Networks, Lawrence Erlbaum Associates, New Jersey, Vol. 1, pp. 617-620. Hashem, S., & B. Schmeiser (1992). Improving Model Accuracy using Optimal Linear Combinations of Trained Neural Networks, Technical Report SMS92-16, School of Industrial Engineering, Purdue University. (Submitted) Laplace, P.S. de (1818). Deuxieme Supplement a la Theorie Analytique des Probabilites (Courcier, Paris); reprinted (1847) in Oeuvres Completes de Laplace, Vol. 7 (Paris, Gauthier-Villars), 531-580.
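As a generic illustration of the kind of combination Hashem describes (a sketch only, not his actual method or code; the held-out data split and the use of NumPy's least-squares solver are assumptions), combination weights can be fitted by minimizing squared error on data not used to train the individual networks:

import numpy as np

def fit_combination_weights(preds, targets):
    # preds: shape (n_samples, n_models), column k holding model k's
    # predictions on a held-out set; targets: shape (n_samples,).
    # Returns the least-squares weights w minimizing ||preds @ w - targets||^2.
    w, _, _, _ = np.linalg.lstsq(preds, targets, rcond=None)
    return w

def combine(preds, w):
    # The combined estimator is a fixed linear combination of the model outputs.
    return preds @ w

# Toy usage: three noisy "models" of the same underlying function.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2.0 * np.pi * x)
preds = np.column_stack([truth + rng.normal(0.0, s, x.size) for s in (0.1, 0.2, 0.4)])
w = fit_combination_weights(preds[:100], truth[:100])        # fit weights on one half
mse_combined = np.mean((combine(preds[100:], w) - truth[100:]) ** 2)
mse_best_single = min(np.mean((preds[100:, k] - truth[100:]) ** 2) for k in range(3))
print(w, mse_combined, mse_best_single)   # with roughly independent errors, the
                                          # combination typically does at least as well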
From furu at uchikawa.nuem.nagoya-u.ac.jp Mon Aug 23 11:22:41 1993 From: furu at uchikawa.nuem.nagoya-u.ac.jp (Takeshi Furuhashi) Date: Mon, 23 Aug 93 11:22:41 JST Subject: Call for Papers of WWW Message-ID: <9308230222.AA00124@cancer.uchikawa.nuem.nagoya-u.ac.jp> CALL FOR PAPERS TENTATIVE 1994 IEEE/Nagoya University World Wisemen/women Workshop (WWW) ON FUZZY LOGIC AND NEURAL NETWORKS/GENETIC ALGORITHMS -Architecture and Applications for Knowledge Acquisition/Adaptation- August 9 and 10, 1994 Nagoya University Symposion Chikusa-ku, Nagoya, JAPAN Sponsored by Nagoya University Co-sponsored by IEEE Industrial Electronics Society Technically Co-sponsored by IEEE Neural Network Council IEEE Robotics and Automation Society International Fuzzy Systems Association Japan Society for Fuzzy Theory and Systems North American Fuzzy Information Processing Society Society of Instrument and Control Engineers Robotics Society of Japan There is growing interest in combination technologies of fuzzy logic and neural networks, and of fuzzy logic and genetic algorithms, for acquisition of experts' knowledge, modeling of nonlinear systems, and realizing adaptive systems. The goal of the 1994 IEEE/Nagoya University WWW on Fuzzy Logic and Neural Networks/Genetic Algorithm is to give its attendees opportunities to exchange information and ideas on various aspects of the Combination Technologies and to stimulate and inspire pioneering work in this area. To keep the quality of these workshops high, only a limited number of people are accepted as participants of the workshops. The papers presented at the workshop will be edited and published by Oxford University Press. TOPICS: Combination of Fuzzy Logic and Neural Networks, Combination of Fuzzy Logic and Genetic Algorithm, Learning and Adaptation, Knowledge Acquisition, Modeling, Human Machine Interface IMPORTANT DATES: Submission of Abstracts of Papers : April 31, 1994 Acceptance Notification : May 31, 1994 Final Manuscript : July 1, 1994 Partial or full assistance with travel expenses for speakers of excellent papers will be provided by the WWW. The candidates should apply as soon as possible, preferably by Jan. 30, '94. All correspondence and submission of papers should be sent to Takeshi Furuhashi, General Chair Dept. of Information Electronics, Nagoya University Furo-cho, Chikusa-ku, Nagoya 464-01, JAPAN TEL: +81-52-781-5111 ext.2792 FAX: +81-52-781-9263 E-mail: furu at uchikawa.nuem.nagoya-u.ac.jp IEEE/Nagoya University WWW: IEEE/Nagoya University WWW (World Wisemen/women Workshop) is a series of workshops sponsored by Nagoya University and co-sponsored by IEEE Industrial Electronics Society. The city of Nagoya, located two hours away from Tokyo, has many electro-mechanical industries in its surroundings such as Mitsubishi, TOYOTA, and their allied companies. Nagoya is a mecca of robotics industries, machine industries and aerospace industries in Japan. The series of workshops will give its attendees opportunities to exchange information on advanced sciences and technologies and to visit industries and research institutes in this area. *This workshop will be held just after the 3rd International Conference on Fuzzy Logic, Neural Nets and Soft Computing (IIZUKA'94) from Aug. 1 to 7, '94.
WORKSHOP ORGANIZATION Honorary Chair: Tetsuo Fujimoto (Dean, School of Engineering, Nagoya University) General Chair: Takeshi Furuhashi (Nagoya University) Advisory Committee: Chair: Toshio Fukuda (Nagoya University) Fumio Harashima (University of Tokyo) Yoshiki Uchikawa (Nagoya University) Takeshi Yamakawa (Kyushu Institute of Technology) Steering Committee: H.Berenji (NASA Ames Research Center) W.Eppler (University of Karlsruhe) I.Hayashi (Hannan University) Y.Hayashi (Ibaraki University) H.Ichihashi (Osaka Prefectural University) A.Imura (Laboratory for International Fuzzy Engineering) M.Jordan (Massachusetts Institute of Technology) C.-C.Jou (National Chiao Tung University) E.Khan (National Semiconductor) R.Langari (Texas A & M University) H.Takagi (Matsushita Electric Industrial Co., Ltd.) K.Tanaka (Kanazawa University) M.Valenzuela-Rendon (Instituto Tecnologico y de Estudios Superiores de Monterrey) L.-X.Wang (University of California, Berkeley) T.Yamaguchi (Utsunomiya University) J.Yen (Texas A & M University)  From joachim at fit.qut.edu.au Wed Aug 25 21:46:11 1993 From: joachim at fit.qut.edu.au (Joachim Diederich) Date: Wed, 25 Aug 1993 21:46:11 -0400 Subject: Second Brisbane Neural Network Workshop Message-ID: <199308260146.VAA09819@fitmail.fit.qut.edu.au> Second Brisbane Neural Network Workshop --------------------------------------- Queensland University of Technology Brisbane Q 4001, AUSTRALIA Gardens Point Campus, ITE 410 24 September 1993 This Second Brisbane Neural Network Workshop is intended to bring together those interested in neurocomputing and neural network applications. The objective of the workshop is to provide a discussion platform for researchers and practitioners interested in theoretical and applied aspects of neurocomputing. The workshop should be of interest to computer scientists and engineers, as well as to biologists, cognitive scientists and others interested in the application of neural networks. The Second Brisbane Neural Network Workshop will be held at Queensland University of Technology, Gardens Point Campus (ITE 410) on September 24, 1993 from 9:00am to 6:00pm.
Program ------- 9:00-9:15 Welcome Joachim Diederich, Queensland University of Technology, Neurocomputing Research Concentration Area Cognitive Science ----------------- 9:15-10:00 Graeme Halford, University of Queensland, Department of Psychology "Representation of concepts in PDP models" 10:00-10:30 Joachim Diederich, Queensland University of Technology, Neurocomputing Research Concentration Area "Re-learning in connectionist semantic networks" 10:30-11:00 Coffee Break 11:00-11:30 James Hogan, Queensland University of Technology, Neurocomputing Research Concentration Area "Recruitment learning in randomly connected neural networks" 11:30-12:00 Kate Stevens, University of Queensland, Department of Psychology "Music perception and neural network modelling" 12:00-1:00 Lunch Break 1:00-1:30 Software Demonstration: "Animal breeding advice using neural networks" Learning -------- 1:30-2:15 Tom Downs, University of Queensland, Department of Electrical Engineering "Generalisation, structure and learning in artificial neural networks" 2:15-3:00 Ah Chung Tsoi, University of Queensland, Department of Electrical Engineering "Training algorithms for recurrent neural networks, a unified framework" 3:00-3:30 Steven Young, University of Queensland, Department of Electrical Engineering "Constructive algorithms for neural networks" 3:30-4:00 Coffee Break Pattern Recognition and Control ------------------------------- 4:00-4:30 Gerald Finn, Queensland University of Technology, Neurocomputing Research Concentration Area "Learning fuzzy rules by genetic algorithms" 4:30-5:00 Paul Hannah & Russel Stonier, University of Central Queensland, Department of Mathematics and Computing "Using a modified Kohonen associative map for function approximation with application to control" Theory and Artificial Intelligence ---------------------------------- 5:00-5:30 M. Mohammadian, X. Yu & J.D. Smith, University of Central Queensland, Department of Mathematics and Computing "From connectionist learning to an optimised fuzzy knowledge base" 5:30-6:00 Richard Bonner & Louis Sanzogni, Griffith University, School of Information Systems & Management Science "Embedded neural networks" All are welcome. Participation is free and there is no registration. Enquiries should be sent to Professor Joachim Diederich Neurocomputing Research Concentration Area School of Computing Science Queensland University of Technology GPO Box 2434 Brisbane Q 4001 Australia Phone: +61 7 864-2143 Fax: +61 7 864-1801 Email: joachim at fitmail.fit.qut.edu.au  From sims at pdesds1.scra.org Thu Aug 26 11:48:04 1993 From: sims at pdesds1.scra.org (Jim Sims) Date: Thu, 26 Aug 93 11:48:04 EDT Subject: fyi, late, but better than never Message-ID: <9308261548.AA07086@pdesds1.noname> I saw this while browsing the electronic CBD materials. Agency : NASA Deadline : 12/01/93 Title : Neurolab Reference: Commerce Business Daily, 07/06/93 BASIC RESEARCH OPPORTUNITY SOL OA SLS-4 POC Dr. Frank Sulzman tel: 202/358-2359 The National Aeronautics and Space Administration (NASA), along with its domestic (NIH, NSF) and international (CNES, CSA, DARA, ESA, NASDA) partners, is soliciting proposals for Neurolab, a Space Shuttle mission dedicated to brain and behavior research that is scheduled for launch in 1998. A more detailed description of the opportunity with specific guidelines for proposal preparation is available from the Neurolab Program Scientist, NASA Headquarters, Code UL, 300 E St., SW, Washington, DC 20546.
This NASA Announcement of Opportunity will be open for the period through December 1, 1993. (0182) SPONSOR: NASA Headquarters, Code UL/Neurolab Program Scientist, Washington, DC 20546 Attn:UL/Dr. Frank Sulzman  From PIURI at IPMEL1.POLIMI.IT Fri Aug 27 07:55:19 1993 From: PIURI at IPMEL1.POLIMI.IT (PIURI@IPMEL1.POLIMI.IT) Date: 27 Aug 1993 12:55:19 +0100 (MET) Subject: call for papers Message-ID: <01H28OFZ9KS291WC7T@icil64.cilea.it> ============================================================================= 14th IMACS WORLD CONGRESS ON COMPUTATION AND APPLIED MATHEMATICS July 11-15, 1994 Atlanta, Georgia, USA Sponsored by: IMACS - International Association for Mathematics and Computers in Simulation IFAC - International Federation for Automatic Control IFIP - International Federation for Information Processing IFORS - International Federation of Operational Research Societies IMEKO - International Measurement Confederation General Chairman: Prof. W.F. Ames Georgia Institute of Technology, Atlanta, GA, USA SESSIONS ON NEURAL NETWORKS 1. NEURAL NETWORK ARCHITECTURES AND IMPLEMENTATIONS 2. APPLICATION OF NEURAL TECHNIQUES FOR SIGNAL AND IMAGE PROCESSING >>>>>> CALL FOR PAPERS <<<<<< The IMACS World Congress on Computation and Applied Mathematics is held every three year to provide a large general forum to professionals and scientists for analyzing and discussing the fundamental advances of research in all areas of scientific computation, applied mathematics, mathematical modelling, and system simulation in and for specific disciplines, the philosophical aspects, and the impact on society and on disciplinary and interdisciplinary research. In the 14th edition, two sessions are planned on neural networks: "Neural Network Architectures and Implementations" and "Application of Neural Techniques for Signal and Image Processing". The first session will focus on all theoretical and practical aspects of architectural design and realization of neural networks: from mathematical analysis and modelling to behavioral specification, from architectural definition to structural design, from VLSI implementation to software emulation, from design simulation at any abstraction level to CAD tools for neural design, simulation and evaluation. The second session will present the concepts, the design and the use of neural solutions within the area of signal and image processing, e.g., for modelling, identification, analysis, classification, recognition, and filtering. Particular emphasis will be given to presentation of specific applications or application areas. Authors interested in the above neural sessions are invited to send a one page abstract, the title of the paper and the author's address by electronic mail, fax or postal mail to the Neural Sessions' Chairman by October 15, 1993. Authors must then submit five copies of their typed manuscript by postal mail or fax to the Neural Sessions' Chairman by November 19, 1993. Preliminary notification of acceptance/rejection will be mailed by November 30, 1993. Final acceptance/rejection will be mailed by January 31, 1994. Neural Sessions' Chairman: Prof. Vincenzo Piuri Department of Electronics and Information Politecnico di Milano piazza L. da Vinci 32 I-20133 Milano, Italy phone no. +39-2-23993606, +39-2-23993623 fax no. 
+39-2-23993411
e-mail piuri at ipmel1.polimi.it
=============================================================================

From goodman at unr.edu Thu Aug 26 12:35:53 1993
From: goodman at unr.edu (Phil Goodman)
Date: Thu, 26 Aug 93 16:35:53 GMT
Subject: NevProp 1.16 Update Available
Message-ID: <9308262335.AA24854@equinox.ccs.unr.edu>

Please consider the following update announcement:

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

NevProp 1.16 corrects a bug in the output range of symmetric sigmoids and one occurring when the number of testing cases is fewer than the number of training cases. These fixes are further described in the README.CHANGES file at the UNR anonymous ftp site, described below. The UNR anonymous ftp host is 'unssun.scs.unr.edu', and the files are in the directory 'pub/goodman/nevpropdir'.

Version 1.15 users can update in any of four ways:
a. Just re-ftp the 'nevprop1.16.shar' file and unpack and 'make' np again. (also available at the CMU machine, described below.)
b. Just re-ftp (in "binary" mode) the DOS or MAC executable binaries located in the 'dosdir' or 'macdir' subdirectories, respectively.
c. Ftp only the 'np.c' file provided, replacing your old version, then 'make'
d. Ftp only the 'np-patchfile', then issue the command 'patch < np-patchfile' to locally update np.c, then 'make' again.

New users can obtain NevProp 1.16 from the UNR anonymous ftp site as described in (a) or (b) above, or from the CMU machine:
a. Create an FTP connection from wherever you are to machine "ftp.cs.cmu.edu". The internet address of this machine is 128.2.206.173, for those who need it.
b. Log in as user "anonymous" with your own ID as password. You may see an error message that says "filenames may not have /.. in them" or something like that. Just ignore it.
c. Change remote directory to "/afs/cs/project/connect/code". NOTE: You must do this in a single operation. Some of the super directories on this path are protected against outside users.
d. At this point FTP should be able to get a listing of files in this directory with "dir" & fetch the ones you want with "get". (The exact FTP commands depend on your local FTP server.)

Version 1.2 will be released soon. A major new feature will be the option of using a cross-entropy rather than a least-squares error function.

Phil

___________________________ ___________________________ Phil Goodman,MD,MS goodman at unr.edu | __\ | _ \ | \/ || _ \ Associate Professor & CBMR Director || ||_// ||\ /||||_// Cardiovascular Studies Team Leader || | _( || \/ ||| _( ||__ ||_\\ || |||| \\ CENTER for BIOMEDICAL MODELING RESEARCH |___/ |___/ || |||| \\ University of Nevada School of Medicine Washoe Medical Center H1-166, 77 Pringle Way, Reno, NV 89520 702-328-4867 FAX:328-4111

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

From heiniw at sun1.eeb.ele.tue.nl Fri Aug 27 08:37:13 1993
From: heiniw at sun1.eeb.ele.tue.nl (Heini Withagen)
Date: Fri, 27 Aug 1993 14:37:13 +0200 (MET DST)
Subject: Neural hardware performance criteria
Message-ID: <9308271237.AA00409@sun1.eeb.ele.tue.nl>

A non-text attachment was scrubbed...
Name: not available Type: text Size: 1296 bytes Desc: not available Url : https://mailman.srv.cs.cmu.edu/mailman/private/connectionists/attachments/00000000/c47cdf08/attachment-0001.ksh From alex at brain.physics.swin.oz.au Sat Aug 28 07:05:34 1993 From: alex at brain.physics.swin.oz.au (Alex A Sergejew) Date: Sat, 28 Aug 93 21:05:34 +1000 Subject: Pan Pacific Conf on Brain Electric Topography - 1st announcement Message-ID: <9308281105.AA12138@brain.physics.swin.oz.au> FIRST ANNOUNCEMENT PAN PACIFIC CONFERENCE ON BRAIN ELECTRIC TOPOGRAPHY February 10 - 12, 1994 SYDNEY, AUSTRALIA INVITATION Brain electric and magnetic topography is an exciting emerging area which draws on the disciplines of neurophysiology, physics, signal processing, computing and cognitive neuroscience. This conference will offer a forum for the presentation of recent findings. The program will include an outstanding series of plenary lectures, as well as platform and poster presentations by active participants in the field. The conference includes two major plenary sessions. In the Plenary Session entitled "Brain Activity Topography and Cognitive Processes," the keynote speakers include Frank Duffy (Boston), Alan Gevins (San Francisco), Steven Hillyard (La Jolla), Yoshihiko Koga (Tokyo) and Paul Nunez (New Orleans). Keynote speakers for the Plenary Session entitled "Brain Rhythmic Activity and States of Consciousness," will include Walter Freeman (Berkeley), Rodolfo Llinas (New York), Shigeaki Matsuoka (Kitakyushu) and Yuzo Yamaguchi (Osaka). The plenary sessions will provide a forum for discussion of some of the most recent developments of analysis and models of electrical brain function, and findings of brain topography and cognitive processes. This conference is aimed at harnessing multidisciplinary participation and will be of interest to those working in the areas of clinical neurophysiology, cognitive neuroscience, biological signal processing, neurophysiology, neurology, neuropsychology and neuropsychiatry. CALL FOR PAPERS Papers are invited for platform and poster presentation. Platform presentations will be allocated 20 minutes (15 mins for presentation and 5 mins for questions). Abstracts of no more than 300 words are invited. The deadline for receipt of abstracts is November 10th, 1993, while notification of acceptance of abstracts will be sent on December 10th, 1993 The abstract can be sent by mail, Fax or Email to: PAN PACIFIC CONFERENCE ON BRAIN ELECTRIC TOPOGRAPHY C/- Cognitive Neuroscience Unit Westmead Hospital, Hawkesbury Road Westmead NSW 2145, Sydney AUSTRALIA Fax : +61 (2) 635 7734 Tel : +61 (2) 633 6688 Email : pan at brain.physics.swin.oz.au Authors may be invited to provide full manuscripts for publication of the proceedings in CD-ROM and book form. All authors wishing to have their papers included must supply a full manuscript at the time of the conference. GENERAL INFORMATION: Date: February 10 - 12, 1994 Venue: The conference will be held at the Hotel Intercontinental on Sydney Harbour. Climate: February is summertime in Australia and the average maximum day-time temperature in Sydney is 26 degC (78 degF). Social Programme: There will be a conference dinner on a yacht sailing Sydney Harbour on February 11th, 1994. Cost $A65 per person. Hotel Accommodation: Hotels listed offer a range of accommodation at special conference rates. Please quote the name of the conference when arranging your booking. 
Scientific Committee: Organising Committee: Prof Richard Silberstein, Melbourne (Chairman) E Gordon (Chairman) A/Prof Helen Beh, Sydney R Silberstein Dr Evian Gordon, Sydney J Restom Dr Shigeaki Matsuoka, Kitakyushu Dr Patricia Michie, Sydney Dr Ken Nagata, Akita Dr Alex Sergejew, Melbourne A/Prof James Wright, Auckland REGISTRATION: Name(Prof/Dr/Ms/Mr):__________________________________________________ Address:______________________________________________________________ ______________________________________________________________________ ______________________________________________________________________ Telephone: ______________________________ (include country/area code) Fax:______________________________ E Mail______________________________ On or before November 10th, 1993 $A380.00 After November 10th, 1993 $A400.00 Students before November 10th,1993 $A250.00 Conference Harbour Cruise Dinner $A65.00 per person number of people _____ Method of Payment: Cheque _ MasterCard _ VISA _ BankCard _ To be completed by credit card users only: Card Number _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Expiration Date __________________________ Signature __________________________ (Signature not required if registering by E-mail) Date __________________________ Cheques should be payable to "Pan Pacific Conference" (Address below) SOME SUGGESTIONS FOR HOTEL ACCOMODATION Special conference rates apply. Quote the name of the conference when booking. Prices are per double room per night SYDNEY RENAISSANCE HOTEL***** Guaranteed harbour view. 10 min walk under cover. $A170.00 30 Pitt St, Sydney NSW 2000, Australia. Ph: +61 (2) 259 7000 Fax +61 (2) 252 1999 HOTEL INTERCONTINENTAL SYDNEY***** Harbour view $A205.00 City View $A165.00 117 Macquarie Street, Sydney NSW 2000, Australia. Ph: +61 (2) 230 0200 Fax: +61 (2) 240 1240 OLD SYDNEY PARKROYAL**** 10 min walk. $A190.00 including breakfast 55 George St, Sydney NSW 2000, Australia. Ph: +61 (2) 252 0524 Fax: (2) +61 251 2093 RAMADA GRAND HOTEL, BONDI BEACH**** Complementary shuttlebus service. $A130 - $A170 including breakfast Beach Rd, Bondi Beach NSW 2026, Australia. Ph: +61 (2) 365 5666 Fax: +61 (2) 3655 330 HOTEL CRANBROOK INTERNATIONAL*** Older style, budget type accomodation overlooking Rose Bay. Free shuttlebus service and airport transfers. $A80.00 including breakfast 601 New South Head Rd, Rose Bay NSW 2020, Australia. Ph: +61 (2) 252 0524 Fax: +61 (2) 251 2093 Post registration details with your cheque to: PAN PACIFIC CONFERENCE ON ELECTRIC BRAIN TOPOGRAPHY C/- Cognitive Neuroscience Unit Westmead Hospital, Hawkesbury Road Westmead NSW 2145, Sydney AUSTRALIA  From taylor at world.std.com Sun Aug 29 22:21:27 1993 From: taylor at world.std.com (Russell R Leighton) Date: Sun, 29 Aug 1993 22:21:27 -0400 Subject: AM6 Users: release notes and bug fixes available Message-ID: <199308300221.AA27236@world.std.com> There has been an update to the am6.notes file at the AM6 ftp sites. User's not on the AM6 users mailing list should get this file and update their installation. Russ ======== REPOST OF AM6 RELEASE (long) ======== The following describes a neural network simulation environment made available free from the MITRE Corporation. The software contains a neural network simulation code generator which generates high performance ANSI C code implementations for modular backpropagation neural networks. Also included is an interface to visualization tools. 
FREE NEURAL NETWORK SIMULATOR AVAILABLE Aspirin/MIGRAINES Version 6.0 The Mitre Corporation is making available free to the public a neural network simulation environment called Aspirin/MIGRAINES. The software consists of a code generator that builds neural network simulations by reading a network description (written in a language called "Aspirin") and generates an ANSI C simulation. An interface (called "MIGRAINES") is provided to export data from the neural network to visualization tools. The previous version (Version 5.0) has over 600 registered installation sites world wide. The system has been ported to a number of platforms: Host platforms: convex_c2 /* Convex C2 */ convex_c3 /* Convex C3 */ cray_xmp /* Cray XMP */ cray_ymp /* Cray YMP */ cray_c90 /* Cray C90 */ dga_88k /* Data General Aviion w/88XXX */ ds_r3k /* Dec Station w/r3000 */ ds_alpha /* Dec Station w/alpha */ hp_parisc /* HP w/parisc */ pc_iX86_sysvr4 /* IBM pc 386/486 Unix SysVR4 */ pc_iX86_sysvr3 /* IBM pc 386/486 Interactive Unix SysVR3 */ ibm_rs6k /* IBM w/rs6000 */ news_68k /* News w/68XXX */ news_r3k /* News w/r3000 */ next_68k /* NeXT w/68XXX */ sgi_r3k /* Silicon Graphics w/r3000 */ sgi_r4k /* Silicon Graphics w/r4000 */ sun_sparc /* Sun w/sparc */ sun_68k /* Sun w/68XXX */ Coprocessors: mc_i860 /* Mercury w/i860 */ meiko_i860 /* Meiko w/i860 Computing Surface */ Included with the software are "config" files for these platforms. Porting to other platforms may be done by choosing the "closest" platform currently supported and adapting the config files. New Features ------------ - ANSI C ( ANSI C compiler required! If you do not have an ANSI C compiler, a free (and very good) compiler called gcc is available by anonymous ftp from prep.ai.mit.edu (18.71.0.38). ) Gcc is what was used to develop am6 on Suns. - Autoregressive backprop has better stability constraints (see examples: ringing and sequence), very good for sequence recognition - File reader supports "caching" so you can use HUGE data files (larger than physical/virtual memory). - The "analyze" utility which aids the analysis of hidden unit behavior (see examples: sonar and characters) - More examples - More portable system configuration for easy installation on systems without a "config" file in distribution Aspirin 6.0 ------------ The software that we are releasing now is for creating, and evaluating, feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, as well as at casual users who will wish to use the system with an absolute minimum of effort. Aspirin was originally conceived as ``a way of dealing with MIGRAINES.'' Our goal was to create an underlying system that would exist behind the graphics and provide the network modeling facilities. The system had to be flexible enough to allow research, that is, make it easy for a user to make frequent, possibly substantial, changes to network designs and learning algorithms. At the same time it had to be efficient enough to allow large ``real-world'' neural network systems to be developed. Aspirin uses a front-end parser and code generators to realize this goal. A high level declarative language has been developed to describe a network. This language was designed to make commonly used network constructs simple to describe, but to allow any network to be described. 
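As a rough illustration of the kind of ANSI C computation such a generated feed-forward simulation ultimately reduces to, here is a hand-written sketch of a single fully connected layer with a sigmoid transfer function. This is NOT actual Aspirin output: the layer sizes, names and numbers below are invented for the example, and a real generated simulation is organized quite differently.

    /* Illustrative sketch only: a fully connected feed-forward layer with a
     * sigmoid transfer function, in the spirit of the ANSI C simulations the
     * Aspirin code generator produces.  Not actual generated code; all names
     * and sizes here are invented for the example. */
    #include <math.h>
    #include <stdio.h>

    #define N_IN  3
    #define N_OUT 2

    /* out[j] = sigmoid( bias[j] + sum_i w[j][i] * in[i] ) */
    static void forward_layer(const float in[N_IN],
                              const float w[N_OUT][N_IN],
                              const float bias[N_OUT],
                              float out[N_OUT])
    {
        int i, j;
        for (j = 0; j < N_OUT; j++) {
            float net = bias[j];
            for (i = 0; i < N_IN; i++)
                net += w[j][i] * in[i];
            out[j] = 1.0f / (1.0f + (float)exp((double)-net));  /* sigmoid */
        }
    }

    int main(void)
    {
        const float in[N_IN] = { 0.2f, -0.7f, 1.0f };
        const float w[N_OUT][N_IN] = { { 0.1f, 0.4f, -0.3f },
                                       { -0.5f, 0.2f, 0.8f } };
        const float bias[N_OUT] = { 0.0f, 0.1f };
        float out[N_OUT];

        forward_layer(in, w, bias, out);
        printf("outputs: %f %f\n", out[0], out[1]);
        return 0;
    }

Compiled with any ANSI C compiler (e.g. cc example.c -lm), it simply prints the two output activations; the point is only to show the inner-product-plus-transfer-function computation that the generated code performs for each layer.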
The Aspirin file defines the type of network, the size and topology of the network, and descriptions of the network's input and output. This file may also include information such as initial values of weights and names of user-defined functions. The Aspirin language is based around the concept of a "black box". A black box is a module that (optionally) receives input and (necessarily) produces output. Black boxes are autonomous units that are used to construct neural network systems. Black boxes may be connected arbitrarily to create large, possibly heterogeneous network systems. As a simple example, pre- or post-processing stages of a neural network can be considered black boxes that do not learn. The output of the Aspirin parser is sent to the appropriate code generator that implements the desired neural network paradigm. The goal of Aspirin is to provide a common extendible front-end language and parser for different network paradigms. The publicly available software will include a backpropagation code generator that supports several variations of the backpropagation learning algorithm. For backpropagation networks and their variations, Aspirin supports a wide variety of capabilities:

1. feed-forward layered networks with arbitrary connections
2. ``skip level'' connections
3. one and two-dimensional weight tessellations
4. a few node transfer functions (as well as user defined)
5. connections to layers/inputs at arbitrary delays, also "Waibel style" time-delay neural networks
6. autoregressive nodes
7. line search and conjugate gradient optimization

The file describing a network is processed by the Aspirin parser and files containing C functions to implement that network are generated. This code can then be linked with an application which uses these routines to control the network. Optionally, a complete simulation may be automatically generated which is integrated with the MIGRAINES interface and can read data in a variety of file formats. Currently supported file formats are: Ascii Type1, Type2, Type3, Type4, Type5 (simple floating point file formats), and ProMatlab.

Examples
--------
A set of examples comes with the distribution:

xor: from Rumelhart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 330-334.
encode: from Rumelhart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 335-339.
bayes: Approximating the optimal Bayes decision surface for a gauss-gauss problem.
detect: Detecting a sine wave in noise.
iris: The classic iris database.
characters: Learning to recognize 4 characters independent of rotation.
ring: Autoregressive network learns a decaying sinusoid impulse response.
sequence: Autoregressive network learns to recognize a short sequence of orthonormal vectors.
sonar: from Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.
spiral: from Kevin J. Lang and Michael J. Witbrock, "Learning to Tell Two Spirals Apart", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988.
ntalk: from Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168.
perf: a large network used only for performance testing.
monk: The backprop part of the monk paper. The MONK's problems were the basis of a first international comparison of learning algorithms.
The result of this comparison is summarized in "The MONK's Problems - A Performance Comparison of Different Learning Algorithms" by S.B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich, H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, and J. Zhang, which has been published as Technical Report CS-CMU-91-197, Carnegie Mellon University, Dec. 1991.
wine: From the ``UCI Repository Of Machine Learning Databases and Domain Theories'' (ics.uci.edu: pub/machine-learning-databases).

Performance of Aspirin simulations
----------------------------------
The backpropagation code generator produces simulations that run very efficiently. Aspirin simulations do best on vector machines when the networks are large, as exemplified by the Cray's performance. All simulations were done using the Unix "time" function and include all simulation overhead. The connections per second rating was calculated by multiplying the number of iterations by the total number of connections in the network and dividing by the "user" time provided by the Unix time function (a small worked example of this arithmetic appears below, after the MIGRAINES overview). Two tests were performed. In the first, the network was simply run "forward" 100,000 times and timed. In the second, the network was timed in learning mode and run until convergence. Under both tests the "user" time included the time to read in the data and initialize the network.

Sonar: This network is a two layer fully connected network with 60 inputs: 2-34-60.

  Millions of Connections per Second
  Forward:
    SparcStation1:         1
    IBM RS/6000 320:       2.8
    HP9000/720:            4.0
    Meiko i860 (40MHz):    4.4
    Mercury i860 (40MHz):  5.6
    Cray YMP:              21.9
    Cray C90:              33.2
  Forward/Backward:
    SparcStation1:         0.3
    IBM RS/6000 320:       0.8
    Meiko i860 (40MHz):    0.9
    HP9000/720:            1.1
    Mercury i860 (40MHz):  1.3
    Cray YMP:              7.6
    Cray C90:              13.5

Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.

Nettalk: This network is a two layer fully connected network with [29 x 7] inputs: 26-[15 x 8]-[29 x 7]

  Millions of Connections per Second
  Forward:
    SparcStation1:         1
    IBM RS/6000 320:       3.5
    HP9000/720:            4.5
    Mercury i860 (40MHz):  12.4
    Meiko i860 (40MHz):    12.6
    Cray YMP:              113.5
    Cray C90:              220.3
  Forward/Backward:
    SparcStation1:         0.4
    IBM RS/6000 320:       1.3
    HP9000/720:            1.7
    Meiko i860 (40MHz):    2.5
    Mercury i860 (40MHz):  3.7
    Cray YMP:              40
    Cray C90:              65.6

Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168.

Perf: This network was only run on a few systems. It is very large with very long vectors. The performance on this network is in some sense a peak performance for a machine. This network is a two layer fully connected network with 2000 inputs: 100-500-2000

  Millions of Connections per Second
  Forward:
    Cray YMP:  103.00
    Cray C90:  220
  Forward/Backward:
    Cray YMP:  25.46
    Cray C90:  59.3

MIGRAINES
------------
The MIGRAINES interface is a terminal based interface that allows you to open Unix pipes to data in the neural network. This replaces the NeWS1.1 graphical interface in version 4.0 of the Aspirin/MIGRAINES software. The new interface is not as simple to use as the version 4.0 interface but is much more portable and flexible. The MIGRAINES interface allows users to output neural network weight and node vectors to disk or to other Unix processes.
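Before moving on to the display tools, here is the small worked example of the connections-per-second arithmetic promised in the performance section above, using the Sonar topology. It is only a sketch: the 210-second "user" time is invented for illustration (roughly what the quoted ~1 million CPS SparcStation1 forward figure would imply), and whether bias connections were counted in the original ratings is an assumption, not something the post states.

    /* Sketch of the connections-per-second arithmetic described above.
     * The connection count ignores bias terms, and the user_seconds value
     * is invented for illustration -- both are assumptions. */
    #include <stdio.h>

    int main(void)
    {
        /* Sonar benchmark topology quoted above: 2-34-60 */
        const long n_in = 60, n_hid = 34, n_out = 2;
        const long iterations = 100000L;      /* the "forward" test above */
        const double user_seconds = 210.0;    /* hypothetical "time" output */

        long connections = n_in * n_hid + n_hid * n_out;   /* 2108 */
        double cps = (double)iterations * (double)connections / user_seconds;

        printf("connections            : %ld\n", connections);
        printf("connections per second : %.3g (%.2f million)\n",
               cps, cps / 1.0e6);
        return 0;
    }

With these invented numbers the program reports 2108 connections and a rating of about 1.00 million connections per second, which is how the figures in the lists above were derived from raw timings.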
Users can display the data using either public or commercial graphics/analysis tools. Example filters are included that convert data exported through MIGRAINES to formats readable by: - Gnuplot 3 - Matlab - Mathematica - Xgobi Most of the examples (see above) use the MIGRAINES interface to dump data to disk and display it using a public software package called Gnuplot3. Gnuplot3 can be obtained via anonymous ftp from: >>>> In general, Gnuplot 3 is available as the file gnuplot3.?.tar.Z >>>> Please obtain gnuplot from the site nearest you. Many of the major ftp >>>> archives world-wide have already picked up the latest version, so if >>>> you found the old version elsewhere, you might check there. >>>> >>>> NORTH AMERICA: >>>> >>>> Anonymous ftp to dartmouth.edu (129.170.16.4) >>>> Fetch >>>> pub/gnuplot/gnuplot3.?.tar.Z >>>> in binary mode. >>>>>>>> A special hack for NeXTStep may be found on 'sonata.cc.purdue.edu' >>>>>>>> in the directory /pub/next/submissions. The gnuplot3.0 distribution >>>>>>>> is also there (in that directory). >>>>>>>> >>>>>>>> There is a problem to be aware of--you will need to recompile. >>>>>>>> gnuplot has a minor bug, so you will need to compile the command.c >>>>>>>> file separately with the HELPFILE defined as the entire path name >>>>>>>> (including the help file name.) If you don't, the Makefile will over >>>>>>>> ride the def and help won't work (in fact it will bomb the program.) NetTools ----------- We have include a simple set of analysis tools by Simon Dennis and Steven Phillips. They are used in some of the examples to illustrate the use of the MIGRAINES interface with analysis tools. The package contains three tools for network analysis: gea - Group Error Analysis pca - Principal Components Analysis cda - Canonical Discriminants Analysis Analyze ------- "analyze" is a program inspired by Denis and Phillips' Nettools. The "analyze" program does PCA, CDA, projections, and histograms. It can read the same data file formats as are supported by "bpmake" simulations and output data in a variety of formats. Associated with this utility are shell scripts that implement data reduction and feature extraction. "analyze" can be used to understand how the hidden layers separate the data in order to optimize the network architecture. How to get Aspirin/MIGRAINES ----------------------- The software is available from two FTP sites, CMU's simulator collection and UCLA's cognitive science machines. The compressed tar file is a little less than 2 megabytes. Most of this space is taken up by the documentation and examples. The software is currently only available via anonymous FTP. > To get the software from CMU's simulator collection: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu" (128.2.254.155). 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "/afs/cs/project/connect/code". Any subdirectories of this one should also be accessible. Parent directories should not be. ****You must do this in a single operation****: cd /afs/cs/project/connect/code 4. At this point FTP should be able to get a listing of files in this directory and fetch the ones you want. Problems? - contact us at "connectionists-request at cs.cmu.edu". 5. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 6. Get the file "am6.tar.Z" 7. Get the file "am6.notes" > To get the software from UCLA's cognitive science machines: 1. 
Create an FTP connection to "ftp.cognet.ucla.edu" (128.97.8.19) (typically with the command "ftp ftp.cognet.ucla.edu") 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "pub/alexis", by typing the command "cd pub/alexis" 4. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 5. Get the file by typing the command "get am6.tar.Z" 6. Get the file "am6.notes" Other sites ----------- If these sites do not work well for you, then try the archie internet mail server. Send email: To: archie at cs.mcgill.ca Subject: prog am6.tar.Z Archie will reply with a list of internet ftp sites that you can get the software from. How to unpack the software -------------------------- After ftp'ing the file make the directory you wish to install the software. Go to that directory and type: zcat am6.tar.Z | tar xvf - -or- uncompress am6.tar.Z ; tar xvf am6.tar How to print the manual ----------------------- The user documentation is located in ./doc in a few compressed PostScript files. To print each file on a PostScript printer type: uncompress *.Z lpr -s *.ps Why? ---- I have been asked why MITRE is giving away this software. MITRE is a non-profit organization funded by the U.S. federal government. MITRE does research and development into various technical areas. Our research into neural network algorithms and applications has resulted in this software. Since MITRE is a publically funded organization, it seems appropriate that the product of the neural network research be turned back into the technical community at large. Thanks ------ Thanks to the beta sites for helping me get the bugs out and make this portable. Thanks to the folks at CMU and UCLA for the ftp sites. Copyright and license agreement ------------------------------- Since the Aspirin/MIGRAINES system is licensed free of charge, the MITRE Corporation provides absolutely no warranty. Should the Aspirin/MIGRAINES system prove defective, you must assume the cost of all necessary servicing, repair or correction. In no way will the MITRE Corporation be liable to you for damages, including any lost profits, lost monies, or other special, incidental or consequential damages arising out of the use or in ability to use the Aspirin/MIGRAINES system. This software is the copyright of The MITRE Corporation. It may be freely used and modified for research and development purposes. We require a brief acknowledgement in any research paper or other publication where this software has made a significant contribution. If you wish to use it for commercial gain you must contact The MITRE Corporation for conditions of use. The MITRE Corporation provides absolutely NO WARRANTY for this software. Russell Leighton ^ / |\ /| INTERNET: taylor at world.std.com |-| / | | | | | / | | |  From sun at umiacs.UMD.EDU Mon Aug 30 13:11:10 1993 From: sun at umiacs.UMD.EDU (Guo-Zheng Sun) Date: Mon, 30 Aug 93 13:11:10 -0400 Subject: Preprint Message-ID: <9308301711.AA06031@sunsp2.umiacs.UMD.EDU> Reprint: THE NEURAL NETWORK PUSHDOWN AUTOMATON: MODEL, STACK AND LEARNING SIMULATIONS The following reprint is available via the NEC Research Institute ftp archive external.nj.nec.com. Instructions for retrieval from the archive follow the abstract summary. Comments and remarks are always appreciated. ----------------------------------------------------------------------------- .............................................................................. 
"The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations" G.Z. Sun(a,b), C.L. Giles(b,c), H.H. Chen(a,b), Y.C. Lee(a,b) (a) Laboratory for Plasma Research and (b) Institute for Advanced Computer Studies, U. of Maryland, College Park, MD 20742 (c) NEC Research Institute, 4 Independence Way, Princeton, NJ 08540 In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one obvious approach to enhancing the processing power of a recurrent neural network is to couple it with an external stack mem ory - in effect creating a neural network pushdown automata (NNPDA). This paper discusses in detail this NNPDA - its construction, how it can be trained and how useful symbolic information can be extracted from the trained network. In order to couple the external stack to the neural network, an optimization method is developed which uses an error function that connects the learning of the state automaton of the neural network to the learning of the operation of the external stack. To minimize the error function using gradient descent learning, an analog stack is designed such that the action and storage of information in the stack are continuous. One interpretation of a continuous stack is the probabilistic storage of and action on data. After training on sample strings of an unknown source grammar, a quantization procedure extracts from the analog stack and neural network a discrete pushdown automata (PDA). Simulations show that in learning deterministic context-free grammars - the balanced parenthesis language, 1n0n, and the deterministic Palindrome - the extracted PDA is correct in the sense that it can correctly recognize unseen strings of arbitrary length. In addition, the extracted PDAs can be shown to be identical or equivalent to the PDAs of the source grammars which were used to generate the training strings. UNIVERSITY OF MARYLAND TR NOs. UMIACS-TR-93-77 & CS-TR-3118, August 20, 1993. --------------------------------------------------------------------------- FTP INSTRUCTIONS unix> ftp external.nj.nec.com (138.15.10.100) Name: anonymous Password: (your_userid at your_site) ftp> cd pub/giles/papers ftp> binary ftp> get NNPDA.ps.Z ftp> quit unix> uncompress NNPDA.ps.Z (Please note that this is a 35 page paper.) -----------------------------------------------------------------------------  From biblio at nucleus.hut.fi Tue Aug 31 13:08:00 1993 From: biblio at nucleus.hut.fi (Bibliography) Date: Tue, 31 Aug 93 13:08:00 DST Subject: Kohonen maps & LVQ -- huge bibliography (and reference request) Message-ID: <9308311008.AA20054@nucleus.hut.fi.hut.fi> Hello, We are in the process of compiling the complete bibliography of works on Kohonen Self-Organizing Map and Learning Vector Quantization all over the world. Currently the bibliography contains more than 1000 entries. The bibliography is now available (in BibTeX and PostScript formats) by anonymous FTP from: cochlea.hut.fi:/pub/ref/references.bib.Z ( BibTeX file) cochlea.hut.fi:/pub/ref/references.ps.Z ( PostScript file) The above files are compressed. Please make sure you use "binary" mode when you transfer these files. Please send any additions and corrections to : biblio at cochlea.hut.fi Please follow the IEEE instructions of references (full names of authors, name of article, journal name, volume + number where applicable, first and last page number, year, etc.) 
and BibTeX-format, if possible. Yours, Jari Kangas Helsinki University of Technology Laboratory of Computer and Information Science Rakentajanaukio 2 C SF-02150 Espoo, FINLAND