From carol at ai.toronto.edu Fri Aug 3 11:14:14 1990 From: carol at ai.toronto.edu (Carol Plathan) Date: Fri, 3 Aug 90 11:14:14 EDT Subject: research programmer job Message-ID: <90Aug3.111424edt.268@neuron.ai.toronto.edu> RESEARCH PROGRAMMER JOB AT THE UNIVERSITY OF TORONTO STARTING SALARY: $36,895 - $43,406 STARTING DATE: Fall 1990 The Connectionist Research Group in the Department of Computer Science at the University of Toronto is looking for a research programmer to develop a neural network simulator that uses Unix, C, and X-windows. The simulator will be used by our group of about 10 researchers, directed by Geoffrey Hinton, to explore learning procedures and their applications. It will also be released to some researchers in Canadian Industry. We already have a fast, flexible simulator and the programmer's main job will be to further develop, document, and maintain this simulator. The development may involve some significant re-design of the basic simulator. Additional duties (if time permits) will include: Implementing several different learning procedures within the simulator and investigating their performance on various data-sets; Assisting industrial collaborators and visitors in the use of the simulator; Porting the simulator to faster workstations or to boards that use fast processors such as the Intel i860 or DSP chips; Developing software for a project that uses a data-glove as an input device to an adaptive neural network that drives a speech synthesizer; Assisting in the acquisition and installation of hardware and software required for the project; The applicant should possess a Bachelors or Masters, preferably in Computer Science or Electrical Engineering, and have at least two years programming experience including experience with unix and C, and some experience with graphics. Knowledge of elementary calculus and elementary linear algebra is essential. Knowlege of numerical analysis, information theory, and perceptual or cognitive psychology would be advantageous. Good oral and written communication skills are required. Please send CV + names of two or three references to Carol Plathan, Computer Science Department, University of Toronto, 10 Kings College Road, Toronto Ontario M5S 1A4. You could also send the information by email to carol at ai.toronto.edu or call Carol at 416-978-3695 for more details. The University of Toronto is an equal opportunity employer. ADDITIONAL INFORMATION The job can be given to a non-Canadian if they are better than any Canadians or Canadian Residents who apply. In this case, the non-Canadian would probably start work here on a temporary work permit while the application for a more permanent permit was being processed. There are already SEVERAL good applicants for the job. Candidates who do not already program fluently in C or have not already done neural network simulations stand very little chance. Also, it is basically a programming job. The programmer may get involved in some original research on neural nets, but this is NOT the main part of the job, so it is not suitable for postdoctoral researchers who want to get on with their own research agenda. Interviews will be during September. We will definitely not employ anybody without an interview and we cannot afford to pay travel expenses for interviews (except in very exceptional circumstances). If there are several good applicants from the west coast of the USA, I may arrange to interview them in California. 
We already have sufficient funding to support the programmer for the next three years. However, we have applied to the Canadian Government for additional funding specifically for this work, and if it comes through (in November 1990) the programmer will be transferred to that source of funding and the simulator will definitely be supplied to Canadian Industry. The job will then require more interactions with industrial users and more systematic documentation, maintenance and debugging of the simulator releases.

From uhr at cs.wisc.edu Fri Aug 3 15:18:11 1990
From: uhr at cs.wisc.edu (Leonard Uhr)
Date: Fri, 3 Aug 90 14:18:11 -0500
Subject: Summary (long): pattern recognition comparisons
Message-ID: <9008031918.AA23586@thor.cs.wisc.edu>

Neural nets using backprop have only handled VERY SIMPLE images, usually in 8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in logarithmically converging nets, but I don't know of any nets with complete connectivity from one layer to the next that are that big.) In sharp contrast, pr/computer vision systems are designed to handle MUCH MORE COMPLEX images (e.g. houses, furniture) in 128-by-128 or even larger inputs. So I've been really surprised to read statements to the effect that NN have proved to be much better. What experimental evidence is there that NN recognize images as complex as those handled by computer vision and pattern recognition approaches? True, it's hard to run good comparative experiments, but without them where are we?

NN re-introduce learning, which is great - except that to make learning work we need to cut down and direct the explosive search at least as much as using any other approach. The brain is THE bag of tools that does the trick, and it has a lot of structure (hierarchical convergence-divergence; local links to relatively small numbers; families of feature-detectors) that can substantially improve today's nets. More powerful structures, basic processes, and learning mechanisms are essential to replace weak learning algorithms like delta and backprop that need O(N*N) links to guarantee (eventual) success - hence can't even be run on images with more than a few hundred pixels.

Len Uhr

From N.E.Sharkey at cs.exeter.ac.uk Sat Aug 4 16:30:53 1990
From: N.E.Sharkey at cs.exeter.ac.uk (Noel Sharkey)
Date: Sat, 4 Aug 90 16:30:53 BST
Subject: special issue
Message-ID: <11054.9008041530@entropy.cs.exeter.ac.uk>

The NATURAL LANGUAGE special issue of CONNECTION SCIENCE will be on the shelves soon. I thought you might like to see the contents.

CONTENTS

Catherine L Harris
  Connectionism and Cognitive Linguistics
John Rager & George Berg
  A Connectionist Model of Motion and Government in Chomsky's Government-binding Theory
David J Chalmers
  Syntactic Transformations on Distributed Representations
Stan C Kwasny & Kanaan A Faisal
  Connectionism and Determinism in a Syntactic Parser
Risto Miikkulainen
  Script Recognition with Hierarchical Feature Maps
Lorraine F R Karen
  Identification of Topical Entities in Discourse: a Connectionist Approach to Attentional Mechanism in Language
Mary Hare
  The Role of Similarity in Hungarian Vowel Harmony: a Connectionist Account
Robert Port
  Representation and Recognition of Temporal Patterns

Editor: Noel E. Sharkey, University of Exeter

Special Editorial Review Panel

Robert B. Allen, Bellcore Garrison W. Cottrell, University of California, San Diego Michael G. Dyer, University of California, Los Angeles Jeffrey L.
Elman, University of California, San Diego George Lakoff, University of California, Berkeley Wendy G. Lehnert, University of Massachusetts at Amherst Jordan Pollack, Ohio State University Ronan Reilly, Beckman Institute, University of Illinois at Urbana-Champaign Bart Selman, University of Toronto Paul Smolensky, University of Colorado, Boulder We would like to encourage the CNLP community to submit many more papers, and we would particulary like to see more papers on representational issues. noel From schraudo%cs at ucsd.edu Sat Aug 4 15:43:20 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Sat, 4 Aug 90 12:43:20 PDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008041943.AA01622@beowulf.ucsd.edu> > From: Leonard Uhr > > Neural nets using backprop have only handled VERY SIMPLE images, usually in > 8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in > logarithmically converging nets, but I don't know of any nets with complete > connectivity from one layer to the next that are that big.) In sharp contrast, > pr/computer vision systems are designed to handle MUCH MORE COMPLEX images (eg > houses, furniture) in 128-by-128 or even larger inputs. So I've been really > surprised to read statements to the effect NN have proved to be much better. > What experimental evidence is there that NN recognize images as complex as > those handled by computer vision and pattern recognition approaches? Well, Gary Cottrell for instance has successfully used a standard (3-layer, fully interconnected) backprop net for various face recognition tasks from 64x64 images. While I agree with you that many NN architectures don't scale well to large input sizes, and that modular, heterogenous architectures have the potential to overcome this limitation, I don't understand why you insist that current NNs could only handle simple images - unless you consider any image with less than 16k pixels simple. Does face recognition qualify as a complex visual task with you? The whole point of using comparatively inefficient NN setups (such as fully interconnected backprop nets) is that they are general enough to solve complex problems without built-in heuristics. Modular NNs require either a lot of prior knowledge about the problem you are trying to solve, or a second adaptive system (such as a GA) to search the architecture space. In the former case the problem is comparatively easy, and in the latter computational complexity rears its ugly head again... having said that, I do believe that GA/NN hybrids will play an important role in the future. I'm afraid I don't have a reference for Gary Cottrell's work - maybe someone else can post the details? -- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From honavar at cs.wisc.edu Sat Aug 4 20:43:56 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sat, 4 Aug 90 19:43:56 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008050043.AA05173@goat.cs.wisc.edu> >The whole point of using comparatively inefficient NN setups (such as fully >interconnected backprop nets) is that they are general enough to solve >complex problems without built-in heuristics. 
While I know of theoretical results that show that a feedforward neural net exists that can adequately encode any arbitrary real-valued function (Hornik, Stinchcombe, & White, 1988; Cybenko, 1988; Carroll & Dickinson, 1989), I am not aware of any results that suggest that such nets can LEARN any real-valued function using backpropagation (ignoring the issue of computational tractability).

Heuristics (or architectural constraints) like those used by some researchers for some vision problems - locally linked multi-layer converging nets (probably one of the most successful demonstrations is the work of LeCun et al. on handwritten zip code recognition) - are interesting because they constrain (or bias) the network to develop particular types of representations. Also, they might enable efficient learning to take place in tasks that exhibit a certain intrinsic structure. The choice of a particular fixed neural network architecture (even if it is a fully interconnected backprop net) implies the use of a corresponding representational bias. Whether such a representational bias is in any sense more general than some other (e.g., a network of nodes with limited fan-in but sufficient depth) is questionable (for any given completely interconnected feedforward network, there exists a functionally equivalent feedforward network of nodes with limited fan-in - and for some problems, the latter may be more efficient).

On a different note, how does one go about assessing the "generality" of a learning algorithm/architecture in practice? I would like to see a discussion on this issue.

Vasant Honavar (honavar at cs.wisc.edu)

From schraudo%cs at ucsd.edu Sun Aug 5 05:54:43 1990
From: schraudo%cs at ucsd.edu (Nici Schraudolph)
Date: Sun, 5 Aug 90 02:54:43 PDT
Subject: Summary (long): pattern recognition comparisons
Message-ID: <9008050954.AA00265@beowulf.ucsd.edu>

> From honavar at cs.wisc.edu Sat Aug 4 17:45:01 1990
>
> While I know of theoretical results that show that a feedforward
> neural net exists that can adequately encode any arbitrary
> real-valued function (Hornik, Stinchcombe, & White, 1988;
> Cybenko, 1988; Carroll & Dickinson, 1989), I am not aware of
> any results that suggest that such nets can LEARN any real-valued
> function using backpropagation (ignoring the issue of
> computational tractability).

It is my understanding that some of the latest work of Hal White et al. presents a learning algorithm - backprop plus a rule for adding hidden units - that can (in the limit) provably learn any function of interest. (Disclaimer: I don't have the mathematical proficiency required to fully appreciate White et al.'s proofs and thus have to rely on second-hand interpretations.)

> On a different note, how does one go about assessing the
> "generality" of a learning algorithm/architecture in practice?
> I would like to see a discussion on this issue.

I second this motion. As a starting point for discussion, would the Kolmogorov complexity of an architectural description be useful as a measure of architectural bias?
-- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From aarons at cogs.sussex.ac.uk Sun Aug 5 07:57:52 1990 From: aarons at cogs.sussex.ac.uk (Aaron Sloman) Date: Sun, 5 Aug 90 12:57:52 +0100 Subject: Summary (long): pattern recognition comparisons Message-ID: <6816.9008051157@csuna.cogs.susx.ac.uk> > From: Leonard Uhr > > Neural nets using backprop have only handled VERY SIMPLE images..... > .......In sharp contrast, pr/computer vision systems are designed > to handle MUCH MORE COMPLEX images (eg houses, furniture) in > 128-by-128 or even larger inputs.... ..... > From: Nici Schraudolph > Well, Gary Cottrell for instance has successfully used a standard (3-layer, > fully interconnected) backprop net for various face recognition tasks from > 64x64 images. While I agree with you that many NN architectures don't scale > well to large input sizes, and that modular, heterogenous architectures have > the potential to overcome this limitation, I don't understand why you insist > that current NNs could only handle simple images - unless you consider any > image with less than 16k pixels simple. Does face recognition qualify as a > complex visual task with you? > ...... Characterising the complexity of the task in terms of the number of pixels seems to me to miss the most important points. Some (but by no means all) of the people working on NNs appear to have joined the field (the bandwagon?) without feeling obliged to study the AI literature on vision, perhaps because it is assumed that since the AI mechanisms are "wrong" all the literature must be irrelevant? On the contrary, good work in AI vision was concerned with understanding the nature of the task (or rather tasks) of a visual system, independently of the mechanisms postulated to perform those tasks. (When your programs fail you learn more about the nature of the task.) Recognition of isolated objects (e.g. face recognition) is just _one_ of the tasks of vision. Others include: (a) Interpreting a 2-D array (retinal array or optic array) in terms of 3-D structures and relationships. Seeing the 3-D structure of a face is a far more complex task than simply attaching a label: "Igor", "Bruce" or whatever. (b) Segmenting a complex scene into separate objects and describing the relationships between them (e.g. "houses, furniture"!). (The relationships include 2-D and 3-D spatial and functional relations.) Because evidence for boundaries is often unclear and ambiguous, and because recognition has to be based on combinations of features, the segmentation often cannot be done without recognition and recognition cannot be done without segmentation. This chicken and egg problem can lead to dreadful combinatorial searches. NNs offer the prospect of doing some of the searching in parallel by propagating constraints, but as far as I know they have not yet matched the more sophisticated AI visual systems. (It is important to distinguish segmentation, recognition and description of 2-D image fragments from segmentation, recognition and description of 3-D objects. The former seems to be what people in pattern recognition and NN research concentrate on most. The latter has been a major concern of AI vision work since the mid/late sixties, starting with L.G. Roberts I think, although some people in AI have continued trying to find 2-D cues to 3-D segmentation. Both 2-D and 3-D interpretations are important in human vision.) 
(c) Seeing events, processes and their relationships. Change "2-D" to "3-D" and "3-D" to "4-D" in (b) above. We are able to segment, recognize and describe events, processes and causal relationships as well as objects (e.g. following, entering, leaving, catching, bouncing, intercepting, grasping, sliding, supporting, stretching, compressing, twisting, untwisting, etc. etc.) Sometimes, as Johansson showed by attaching lights to human joints in a dark room, motion can be used to disambiguate 3-D structure.

(d) Providing information and/or control signals for motor-control mechanisms: e.g. visual feedback is used (unconsciously) for posture control in sighted people, and also for controlling movement of arm, hand and fingers in grasping, etc. (I suspect that many such processes of fine tuning and control use changing 2-D "image" information rather than (or in addition to) 3-D structural information.)

That's still only a partial list of the tasks of a visual system. For more detail see:

A. Sloman, `On designing a visual system: Towards a Gibsonian computational model of vision', in Journal of Experimental and Theoretical AI 1, 4, 1989.
Ballard, D.H. and C.M. Brown, Computer Vision, Englewood Cliffs: Prentice Hall, 1982.

A system might be able to recognize isolated faces or other objects in an image by using mechanisms that would fail miserably in dealing with cluttered scenes where recognition and segmentation need to be combined. So a NN that recognised faces might tell us nothing about how it is done in natural visual systems, if the latter use more general mechanisms.

One area in which I think neither AI nor NN work has made significant progress is shape perception. (I don't mean shape recognition!) People, and presumably many other animals, can see complex, intricate, irregular and varied shapes in a manner that supports a wide range of tasks, including recognizing, grasping, planning, controlling motion, predicting the consequences of motion, copying, building, etc. etc. Although a number of different kinds of shape representations have been explored in work on computer vision, CAD, graphics etc. (e.g. feature vectors; logical descriptions; networks of nodes and arcs; numbers representing co-ordinates, orientations, curvature etc; systems of equations for lines, planes, and other mathematically simple structures; fractals; etc. etc. etc.) they all seem capable of capturing only a superficial subset of what we can see when we look at kittens, sand dunes, crumpled paper, a human torso, a shrubbery, cloud formations, under-water scenes, etc. (Work on computer graphics is particularly misleading, because people are often tempted to think that a representation that _generates_ a natural looking image on a screen must capture what we see in the image, or in the scene that it depicts.)

Does anyone have any idea what kind of breakthrough is needed in order to give a machine the kind of grasp of shape that can explain animal abilities to cope with real environments? Is there anything about NN shape representations that gives them an advantage over others that have been explored, and if so what is it? I suspect that going for descriptions of static geometric structure is a dead end: seeing a shape really involves seeing potential processes involving that shape, and their limits (something like what J.J. Gibson meant by "affordances"?). I.e.
a 3-D shape is inherently a vast array of 4-D possibilities and one of the tasks of a visual system is computing a large collection of those possibilities and making them readily available for a variety of subsequent processes. But that's much too vague an idea to be very useful. Or is it? Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England EMAIL aarons at cogs.sussex.ac.uk or: aarons%uk.ac.sussex.cogs at nsfnet-relay.ac.uk From honavar at cs.wisc.edu Sun Aug 5 15:48:37 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sun, 5 Aug 90 14:48:37 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008051948.AA00212@goat.cs.wisc.edu> >It is my understanding that some of the latest work of Hal White et al. >presents a learning algorithm - backprop plus a rule for adding hidden >units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully >appreciate White et al.'s proofs and thus have to rely on second-hand >interpretations.) I can see how allowing the addition of (potentially unbounded number of hidden units) could enable a back-prop architecture to learn arbitrary functions. But in this sense, any procedure that builds up a look-up table or random-access memory (with some interpolation capability to cover the instances not explicitly stored) using an appropriate set of rules to add units is equally general (and probably more efficient than backprop in terms of time complexity of learning (cf Baum's proposal for more powerful learning algorithms). However look-up tables can be combinatorially intractable in terms of memory (space) complexity. This brings us to the issue of searching the architectural space along with the weight space in an efficient manner. There has already been some work in this direction (Fahlman's cascade correlation architecture, Ash's DNC, Honavar & Uhr's generative learning, Hanson's meiosis networks, and some recent work on ga-nn hybrids). We have been investigating methods to constrain the search in the architectural space (using heuristic controls / representational bias :-) ). I would like to hear from others who might be working on related issues. Vasant Honavar (honavar at cs.wisc.edu) From galem at mcc.com Sun Aug 5 17:48:25 1990 From: galem at mcc.com (Gale Martin) Date: Sun, 5 Aug 90 16:48:25 CDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008052148.AA02989@sunkist.aca.mcc.com> Leonard Uhr states (about NN learning) "to make learning work, we need to cut down and direct explosive search at least as much as using any other approach." Certainly there is reason to agree with this in the general case, but I doubt it's validity in important specific cases. I've spent the past couple of years working on backprop-based handwritten character recognition and find almost no supporting evidence of the need for explicitly cutting down on explosive search through the use of heuristics in these SPECIFIC cases and circumstances. We varied input character array size (10x16, 15x24, 20x32) to backprop nets and found no difference in the number of training samples required to achieve a given level of generalization performance for hand-printed letters. In nets with one hidden layer, we increased the number of hidden nodes from 50 to 383 and found no increase in the number of training samples needed to achieve high generalization (in fact, generalization is worse for the 50 hidden node case). 
We experimented extensively with nets having local connectivity and locally-linked nets in this domain and find similarly little evidence to support the need for such heuristics. These results hold across two different types of handwritten character recognition tasks (hand-printed letters and digits). This domain/case-specific robustness across architectural parameters and input size is one way to characterize the generality of a learning algorithm and may recommend one algorithm over another for specific problems. Gale Martin Martin, G. L., & Pittman, J. A. Recognizing hand-printed letters and digits in D.S. Touretzky (Ed.) Advances in Neural Information Processing Systems 2, 1990. Martin, G.L., Leow, W.K. & Pittman, J. A. Function complexity effects on backpropagation learning. MCC Tech Report ACT-HI-062-90. From ganesh at cs.wisc.edu Sun Aug 5 17:59:23 1990 From: ganesh at cs.wisc.edu (Ganesh Mani) Date: Sun, 5 Aug 90 16:59:23 -0500 Subject: Paper Message-ID: <9008052159.AA21968@sharp.cs.wisc.edu> The following paper is available for ftp from the repository at Ohio State. Please backpropagate comments (and errors!) to ganesh at cs.wisc.edu. -Ganesh Mani _________________________________________________________________________ Learning by Gradient Descent in Function Space Ganesh Mani Computer Sciences Dept. Unviersity of Wisconsin---Madison ganesh at cs.wisc.edu Abstract Traditional connectionist networks have homogeneous nodes wherein each node executes the same function. Networks where each node executes a different function can be used to achieve efficient supervised learning. A modified back-propagation algorithm for such networks, which performs gradient descent in ``function space,'' is presented and its advantages are discussed. The benefits of the suggested paradigm include faster learning and ease of interpretation of the trained network. _________________________________________________________________________ The following can be used to ftp the paper. unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): neuron ftp> cd pub/neuroprose ftp> type binary ftp> get (remote-file) mani.function-space.ps.Z (local-file) mani.function-space.ps.Z ftp> quit unix> uncompress mani.function-space.ps.Z unix> lpr -P(your_local_postscript_printer) mani.function-space.ps From honavar at cs.wisc.edu Mon Aug 6 00:27:25 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sun, 5 Aug 90 23:27:25 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008060427.AA00489@goat.cs.wisc.edu> We have found that with relatively small sample sizes, generalization performance is improved by local connectivity and weight sharing on simple 2-d patterns. For position-invariant recognition, local connectivity and weight-sharing give substantially better generalization performance than that obtained without local connectivity. Clearly this is a case where extensive empirical studies are needed to draw general conclusions. Vasant Honavar (honavar at cs.wisc.edu) From awyk at wapsyvax.oz.au Mon Aug 6 03:17:41 1990 From: awyk at wapsyvax.oz.au (Brian Aw) Date: Mon, 6 Aug 90 15:17:41+0800 Subject: No subject Message-ID: <9008060725.649@munnari.oz.au> Dear Sir/Mdm, Hello! My name is Brian Aw and my e-mail address is awyk at wapsyvax.oz Would you kindly put me on both your address list and your mailing list for connectionist related results. I am a Ph.D. 
student as well as a research officer in the Psychology Department of the University of Western Australia (UWA), Perth. I am working under the supervision of Prof. John Ross, who has recently joined your lists. I am an enthusiastic worker in neural network theory. Currently, I am developing a neural network for feature classification in images. This year, I have published a technical report in the Computer Science Department of UWA in this area. My work has also been accepted for presentation and publication in the forthcoming 4th Australian Joint Conference on Artificial Intelligence (AI'90). Working in this field, which advances so rapidly, I certainly need the kind of fast-moving and up-to-date information which your system can provide. Thanking you in advance. brian.

From erol at ehei.ehei.fr Mon Aug 6 08:07:39 1990
From: erol at ehei.ehei.fr (erol@ehei.ehei.fr)
Date: Mon, 6 Aug 90 12:09:39 +2
Subject: IJPRAI CALL FOR PAPERS
Message-ID: <9008061041.AA24889@inria.inria.fr>

Would you consider a paper on my "random network model"? Two papers have already appeared or are appearing in the journal Neural Computation. Best regards, Erol

From erol at ehei.ehei.fr Mon Aug 6 05:47:31 1990
From: erol at ehei.ehei.fr (erol@ehei.ehei.fr)
Date: Mon, 6 Aug 90 09:49:31 +2
Subject: Visit to Poland
Message-ID: <9008061014.AA24279@inria.inria.fr>

I don't know about Poland, but you can contact me in Paris! Erol Gelenbe

From INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU Sun Aug 5 15:56:00 1990
From: INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU (INS_ATGE%JHUVMS.BITNET@VMA.CC.CMU.EDU)
Date: Sun, 5 Aug 90 14:56 EST
Subject: Similarity to Cascade-Correlation
Message-ID:

As a side note on the problem of using backpropagation on large problems, it should be noted that using efficient error minimization methods (i.e. conjugate-gradient methods) as opposed to the "vanilla" backprop described in _Parallel_Distributed_Processing_ allows one to work with much larger problems, and also allows for much greater performance on problems the network was trained on. For example, an IR target threat detection problem I have been recently working on (with 127 or 254 inputs and 20 training patterns) failed miserably when trained with "vanilla" backprop (hours and hours on a Connection Machine without success). When a conjugate-gradient training program was used, the network was able to learn 100% of the training set perfectly in just a minute or two.

>It is my understanding that some of the latest work of Hal White et al.
>presents a learning algorithm - backprop plus a rule for adding hidden
>units - that can (in the limit) provably learn any function of interest.
>(Disclaimer: I don't have the mathematical proficiency required to fully
>appreciate White et al.'s proofs and thus have to rely on second-hand
>interpretations.)

How does this new work compare with the Cascade Correlation method developed by Fahlman, where a new hidden unit is added by training its receptive weights to maximize the correlation between its output and the network error, and then training the projective weights to the outputs to minimize the error (thus only allowing single-layer backprop learning at each iteration)?

-Thomas Edwards
The Johns Hopkins University / U.S.
Naval Research Lab From erol at ehei.ehei.fr Mon Aug 6 11:44:10 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Mon, 6 Aug 90 15:46:10 +2 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008061444.AA05688@inria.inria.fr> I would like to draw your attention to two recent papers of mine (my name is Erol Gelenbe) : Random networks with positive and negative signals and product form solutions in Neural Computation, Vol. 1, No. 4 (1989) Stability of the random network model in press in Neural Computation. The papers present a new model in which signals travel as "pulses". The quantity looked at in the model is the "neuron potential" in an arbitrarily connected network. I prove that these models have "product form" which means that there state can be computed simply and analytically. Comments and questions are welcome. erol at ehei.ehei.fr From fritz_dg%ncsd.dnet at gte.com Mon Aug 6 17:26:57 1990 From: fritz_dg%ncsd.dnet at gte.com (fritz_dg%ncsd.dnet@gte.com) Date: Mon, 6 Aug 90 17:26:57 -0400 Subject: neural network generators in Ada Message-ID: <9008062126.AA27920@bunny.gte.com> Are there any non-commercial Neural Network "generator programs" or such that are in Ada? (ie. generates suitable NN code from a set of user designated specifications, code suitable for embedding, etc). I'm interested in - experience developing and using same, lessons learned - to what uses such have been put, successful? - nature of; internal use of lists, arrays; what can be user specified, what can't; built-in limitations; level of HMI attached; compilers used; etc., etc. - and other relevant info developing and applying such from those who have tried developing and using them Am also interested in opinions on: If you were going to design a NN Maker _today_, how would you design it? If Ada were the language, what special things might be done? Motive should be transparent. My sincere thanks to all who respond. If there is interest, I'll turn the info (if any) around to the list in general. Dave Fritz fritz_dg%ncsd at gte.com (301) 738-8932 ---------------------------------------------------------------------- ---------------------------------------------------------------------- From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Mon Aug 6 23:20:09 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Mon, 06 Aug 90 23:20:09 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Sun, 05 Aug 90 14:56:00 -0500. Message-ID: >It is my understanding that some of the latest work of Hal White et al. >presents a learning algorithm - backprop plus a rule for adding hidden >units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully >appreciate White et al.'s proofs and thus have to rely on second-hand >interpretations.) How does this new work compare with the Cascade Correlation method developed by Fahlman, where a new hidden unit is added by training its receptive weights to maximize the correlation between its output and the network error, and then trains the projective weights to the outputs to minimize the error (thus only allowing single-layer backprop learning at each iteration)? -Thomas Edwards The Johns Hopkins University / U.S. Naval Research Lab I'll take a stab at answering this. Maybe we'll also hear something from Hal White or one of his colleagues -- especially if I somehow misrepresent their work. 
I believe that all of the published completeness results from White's group assume a single layer of hidden units. They show that this architecture can approximate any desired transfer function (assuming it has certain smoothness properties) to any desired accuracy if you add enough units in this single layer. It's rather like proving that a piecewise linear approximation can approach any desired curve with arbitrarily small error as long as you're willing to use enough tiny pieces. Unless I've missed something, their work does not attempt to say anything about the minimum number of hidden units you might need in this hidden layer. Cascade-Correlation produces a feed-forward network of sigmoid units, but it differs in a number of ways from the kinds of nets considered by White: 1. Cascade-Correlation is intended to be a practical learning algorithm that produces a relatively compact solution as fast as possible. 2. In a Cascade net, each new hidden unit can receive inputs from all pre-existing hidden units. Therefore, each new unit is potentially a new layer. White's results show that you don't really NEED more than a single hidden layer, but having more layers can sometimes result in a very dramatic reduction in the total number of units and weights needed to solve a given problem. 3. There is no convergence proof for Cascade-Correlation. The candidate training phase, in which we try to create new hidden units by hill-climbing in some correlation measure, can and does get stuck in local maxima of this function. That's one reason we use a pool of candidate units: by training many candidates at once, we can greatly reduce the probability of creating new units that do not contribute significantly to the solution, but with a finite candidate pool we can never totally eliminate this possibility. It would not be hard to modify Cascade-Correlation to guarantee that it will eventually grind out a solution. The hard part, for a practical learning algorithm, is to guarantee that you'll find a "reasonably good" solution, however you want to define that. The recent work of Gallant and of Frean are interesting steps in this direction, at least for binary-valued transfer functions and fixed, finite training sets. -- Scott From jamesp at chaos.cs.brandeis.edu Mon Aug 6 21:38:40 1990 From: jamesp at chaos.cs.brandeis.edu (James Pustejovsky) Date: Mon, 6 Aug 90 21:38:40 edt Subject: Visit to Poland In-Reply-To: erol@ehei.ehei.fr's message of Mon, 6 Aug 90 09:49:31 +2 <9008061014.AA24279@inria.inria.fr> Message-ID: <9008070138.AA17019@chaos.cs.brandeis.edu> please withdraw my name from the list. there is too much random and irrelevant noise around the occasional noteworthy bit. From ericj at starbase.MITRE.ORG Tue Aug 7 08:33:27 1990 From: ericj at starbase.MITRE.ORG (Eric Jenkins) Date: Tue, 7 Aug 90 08:33:27 EDT Subject: ref for conjugate-gradient... Message-ID: <9008071233.AA25689@starbase> Would someone please post a pointer to info on conjugate-gradient methods of error minimization. Thanks. Eric Jenkins (ericj at ai.mitre.org) From erol at ehei.ehei.fr Tue Aug 7 07:06:28 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Tue, 7 Aug 90 11:08:28 +2 Subject: Call for Papers - ICGA-91 Message-ID: <9008071511.AA21568@inria.inria.fr> Concerning the scope of the conference, could the program chairman indicate what the boundaries of the area of genetic algorithms are in the context of this meeting ? 
This can be indicated by providing one or more references the conference chairman considers to be "typical" work in this area. Erol Gelenbe From erol at ehei.ehei.fr Tue Aug 7 10:42:33 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Tue, 7 Aug 90 14:44:33 +2 Subject: postdoc position available Message-ID: <9008071513.AA21597@inria.inria.fr> From jose at learning.siemens.com Tue Aug 7 19:55:05 1990 From: jose at learning.siemens.com (Steve Hanson) Date: Tue, 7 Aug 90 18:55:05 EST Subject: Similarity to Cascade-Correlation Message-ID: <9008072355.AA05108@learning.siemens.com.siemens.com> Scott: Isn't CC just Cart? Steve From schraudo%cs at ucsd.edu Tue Aug 7 15:05:35 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Tue, 7 Aug 90 12:05:35 PDT Subject: Similarity to Cascade-Correlation Message-ID: <9008071905.AA10253@beowulf.ucsd.edu> > From: INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU > > How does [White et al.'s] new work compare with the Cascade Correlation > method developed by Fahlman [...]? In practical terms, very badly. Their algorithm's point is purely theore- tical: they can prove convergence from only a very small base of assumptions about the function to be learned. Do any similar proofs exist for Cascade Correlation? That would be interesting. -- Nicol N. Schraudolph, C-014 nici%cs at ucsd.edu University of California, San Diego nici%cs at ucsd.bitnet La Jolla, CA 92093-0114 ...!ucsd!cs!nici From erol at ehei.ehei.fr Wed Aug 8 07:12:24 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Wed, 8 Aug 90 11:14:24 +2 Subject: abstract Message-ID: <9008081009.AA23199@inria.inria.fr> I would be very interested to get a copy of this paper. Thankyou in advance, Erol Gelenbe erol at ehei.ehei.fr From pkube at ucsd.edu Wed Aug 8 15:23:30 1990 From: pkube at ucsd.edu (pkube@ucsd.edu) Date: Wed, 08 Aug 90 13:23:30 MDT Subject: ref for conjugate-gradient... In-Reply-To: Your message of Tue, 07 Aug 90 08:33:27 EDT. <9008071233.AA25689@starbase> Message-ID: <9008082023.AA07129@kokoro.ucsd.edu> For understanding and implementing conjugate gradient and other optimization methods cleverer than vanilla backprop, I've found the following to be useful: %A William H. Press %T Numerical Recipes in C: The Art of Scientific Computing %I Cambridge University Press %D 1988 %A J. E. Dennis %A R. B. Schnabel %T Numerical Methods for Unconstrained Optimization and Nonlinear Equations %I Prentice-Hall %D 1983 %A R. Fletcher %T Practical Methods of Optimization, Vol. 1: Unconstrained Optimization %I John Wiley & Sons %D 1980 --Paul Kube at ucsd.edu From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Wed Aug 8 10:09:48 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Wed, 08 Aug 90 10:09:48 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Wed, 08 Aug 90 08:39:55 -0500. <9008081339.AA05550@learning.siemens.com.siemens.com> Message-ID: I got this clarification from Steve Hanson of his original query, which I found a bit cryptic: Isn't cascade Correlation a version (almost exact except for splitting rule--although I believe CART allows for other splitting rules) of CART---the decision tree with the hyperplane feature space cuts...? My memory of Cart is a bit fuzzy, but I think it's very different from Cascade-Correlation. Unless I'm confused, here are a couple of glaring differences: 1. 
In a decision-tree setup like CART, each new split works within one of the regions of space that you've already carved out -- that is, within only one branch of the tree. So for something like N-bit parity, you'd need 2^N hidden units (hyperplanes). In a single-layer backprop net, you need only N hidden units because they are shared. Because it creates higher-order units, Cascade-Correlation can generally do the job in less than N. (See the results in the Cascade-Correlation paper.) I don't remember if any version of CART makes serendipitous use of hyperplanes that were created earlier to split other branches. I am pretty sure, however, that it works on splitting just one branch at a time, and doesn't actively try to create hyperplanes that are useful in splitting many branches at once. 2. If you create all your new hidden units in a single layer, all you can do is create hyperplanes in the original space of input features. Because it builds up multiple layers, Cascade-Correlation can create higher-order units of great complexity, not just hyperplanes. If you have the tech report on Cascade-Correlation (the diagrams had to be cut from the NIPS version due to page limitations), look at the strange complex curves it creates in solving the two-spirals problem. If you prefer, Cascade-Correlation works by raising the dimensionality of the space and then drawing hyperplanes in this new complex space, but the projection back onto the original input space does not look like a straight line. I've never heard of anyone solving the two-spirals problem with a single layer of sigmoid or threshold units -- it would take an awful lot of them. I think that these two differences change the game entirely. The only resemblance I see between CART and Cascade-Correlation is that both build up a structure little by little, trying to add new nonlinear elements that eliminate some part of the remaining error. But the kinds of structures the two algorithms deal in is qualitatively different. -- Scott From pollack at cis.ohio-state.edu Wed Aug 8 02:11:16 1990 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Wed, 8 Aug 90 02:11:16 -0400 Subject: Cascade-Correlation, etc Message-ID: <9008080611.AA11352@dendrite.cis.ohio-state.edu> Scott's description of his method and the need for a convergence proof, reminded me of the line of research by Meir & Domany (Complex Sys 2 1988) and Nadal & Mezard (Int.Jrnl. Neural Sys 1,1,1989). In a paper definitely related to theirs (which I cannot find), someone proved (by construction) that each hidden unit added on top of a feedforward TLU network could monotonically decrease the number of errors for arbitrary-fan-in, single-output boolean functions. This result might be generalizable to CC networks. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Fax/Phone: (614) 292-4890 From FEGROSS%WEIZMANN.BITNET at VMA.CC.CMU.EDU Thu Aug 9 01:51:08 1990 From: FEGROSS%WEIZMANN.BITNET at VMA.CC.CMU.EDU (Tal Grossman) Date: Thu, 09 Aug 90 08:51:08 +0300 Subject: Network Constructing Algorithms. Message-ID: Network constructing algorithms, i.e. learning algorithms which add units while training, receive a lot of interest these days. I've recently compiled a reference list of papers presenting such algorithms. I send this list as a small contribution to the last discussion. I hope people will find it relevant and usefull. 
Of course, it is probably not exhaustive - and I'd like to hear about any other related work. Note that two refs. are quite old (Hopcroft and Cameron) - from the threshold logic days. A few papers include convergence proofs (Frean, Gallant, Mezard and Nadal, Marchand et al). Naturally, there is a significant overlap between some of the algorithms/architectures. I also apologize for the primitive TeX format.

Tal Grossman <fegross at weizmann>
Electronics Dept.
Weizmann Inst.
Rehovot 76100, ISRAEL.

-------------------------------------------------------------------------------

\centerline{\bf Network Generating Learning Algorithms - References.}

T. Ash, ``Dynamic Node Creation in Back-Propagation Networks", Tech. Rep. 8901, Inst. for Cognitive Sci., Univ. of California, San Diego.

Cameron S.H., ``The Generation of Minimal Threshold Nets by an Integer Program", IEEE TEC {\bf EC-13}, 299 (1964).

S.E. Fahlman and C.L. Lebiere, ``The Cascade-Correlation Learning Architecture", in {\it Advances in Neural Information Processing Systems 2}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1990), pp. 524.

M. Frean, ``The Upstart Algorithm: a Method for Constructing and Training Feed Forward Neural Networks", Neural Computation {\bf 2}:2 (1990).

S.I. Gallant, ``Perceptron-Based Learning Algorithms", IEEE Trans. on Neural Networks {\bf 1}, 179 (1990).

M. Golea and M. Marchand, ``A Growth Algorithm for Neural Network Decision Trees", EuroPhys. Lett. {\bf 12}, 205 (1990).

S.J. Hanson, ``Meiosis Networks", in {\it Advances in Neural Information Processing Systems 2}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1990), pp. 533.

Honavar V. and Uhr L., in the {\it Proc. of the 1988 Connectionist Models Summer School}, Touretzky D., Hinton G. and Sejnowski T. eds. (Morgan Kaufmann, San Mateo, 1988).

Hopcroft J.E. and Mattson R.L., ``Synthesis of Minimal Threshold Logic Networks", IEEE TEC {\bf EC-14}, 552 (1965).

Mezard M. and Nadal J.P., ``Learning in Feed Forward Layered Networks - The Tiling Algorithm", J. Phys. A {\bf 22}, 2129 (1989).

J. Moody, ``Fast Learning in Multi Resolution Hierarchies", in {\it Advances in Neural Information Processing Systems 1}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1989).

J.P. Nadal, ``Study of a Growth Algorithm for a Feed Forward Network", International J. of Neural Systems {\bf 1}, 55 (1989).

Rujan P. and Marchand M., ``Learning by Activating Neurons: A New Approach to Learning in Neural Networks", Complex Systems {\bf 3}, 229 (1989); and also in the {\it Proc. of the First International Joint Conference on Neural Networks - Washington D.C. 1989}, Vol. II, pp. 105.

J.A. Sirat and J.P. Nadal, ``Neural Trees: A New Tool for Classification", preprint, submitted to "Network", April 90.

\bye

From LAUTRUP at nbivax.nbi.dk Thu Aug 9 05:19:00 1990
From: LAUTRUP at nbivax.nbi.dk (Benny Lautrup)
Date: Thu, 9 Aug 90 11:19 +0200 (NBI, Copenhagen)
Subject: International Journal of Neural Systems
Message-ID: <510E1F38537FE1E6AD@nbivax.nbi.dk>

Begin Message:
-----------------------------------------------------------------------

INTERNATIONAL JOURNAL OF NEURAL SYSTEMS

The International Journal of Neural Systems is a quarterly journal which covers information processing in natural and artificial neural systems. It publishes original contributions on all aspects of this broad subject which involves physics, biology, psychology, computer science and engineering. Contributions include research papers, reviews and short communications.
The journal presents a fresh undogmatic attitude towards this multidisciplinary field with the aim to be a forum for novel ideas and improved understanding of collective and cooperative phenomena with computational capabilities. ISSN: 0129-0657 (IJNS) ---------------------------------- Contents of issue number 3 (1990): 1. A. S. Weigend, B. A. Huberman and D. E. Rumelhart: Predicting the future: A connectionist approach. 2. C. Chinchuan, M. Shanblatt and C. Maa: An artificial neural network algorithm for dynamic programming. 3. L. Fan and T. Li: Design of competition based neural networks for combinatorial optimization. 4. E. A. Ferran and R. P. J. Perazzo: Dislexic behaviour of feed-forward neural networks. 5. E. Milloti: Sigmoid versus step functions in feed-forward neural networks. 6. D. Horn and M. Usher: Excitatory-inhibitory networks with dynamical thresholds. 7. J. G. Sutherland: A holographic model of memory, learning and expression. 8. L. Xu: Adding top-down expectations into the learning procedure of self-organizing maps. 9. D. Stork: BOOK REVIEW ---------------------------------- Editorial board: B. Lautrup (Niels Bohr Institute, Denmark) (Editor-in-charge) S. Brunak (Technical Univ. of Denmark) (Assistant Editor-in-Charge) D. Stork (Stanford) (Book review editor) Associate editors: B. Baird (Berkeley) D. Ballard (University of Rochester) E. Baum (NEC Research Institute) S. Bjornsson (University of Iceland) J. M. Bower (CalTech) S. S. Chen (University of North Carolina) R. Eckmiller (University of Dusseldorf) J. L. Elman (University of California, San Diego) M. V. Feigelman (Landau Institute for Theoretical Physics) F. Fogelman-Soulie (Paris) K. Fukushima (Osaka University) A. Gjedde (Montreal Neurological Institute) S. Grillner (Nobel Institute for Neurophysiology, Stockholm) T. Gulliksen (University of Oslo) D. Hammerstroem (University of Oregon) J. Hounsgaard (University of Copenhagen) B. A. Huberman (XEROX PARC) L. B. Ioffe (Landau Institute for Theoretical Physics) P. I. M. Johannesma (Katholieke Univ. Nijmegen) M. Jordan (MIT) G. Josin (Neural Systems Inc.) I. Kanter (Princeton University) J. H. Kaas (Vanderbilt University) A. Lansner (Royal Institute of Technology, Stockholm) A. Lapedes (Los Alamos) B. McWhinney (Carnegie-Mellon University) M. Mezard (Ecole Normale Superieure, Paris) A. F. Murray (University of Edinburgh) J. P. Nadal (Ecole Normale Superieure, Paris) E. Oja (Lappeenranta University of Technology, Finland) N. Parga (Centro Atomico Bariloche, Argentina) S. Patarnello (IBM ECSEC, Italy) P. Peretto (Centre d'Etudes Nucleaires de Grenoble) C. Peterson (University of Lund) K. Plunkett (University of Aarhus) S. A. Solla (AT&T Bell Labs) M. A. Virasoro (University of Rome) D. J. Wallace (University of Edinburgh) D. Zipser (University of California, San Diego) ---------------------------------- CALL FOR PAPERS Original contributions consistent with the scope of the journal are welcome. Complete instructions as well as sample copies and subscription information are available from The Editorial Secretariat, IJNS World Scientific Publishing Co. Pte. Ltd. 73, Lynton Mead, Totteridge London N20 8DH ENGLAND Telephone: (44)1-446-2461 or World Scientific Publishing Co. Inc. 687 Hardwell St. Teaneck New Jersey 07666 USA Telephone: (1)201-837-8858 or World Scientific Publishing Co. Pte. Ltd. Farrer Road, P. O. 
Box 128
SINGAPORE 9128
Telephone (65)278-6188

-----------------------------------------------------------------------
End Message

From tgd at turing.CS.ORST.EDU Thu Aug 9 01:36:10 1990
From: tgd at turing.CS.ORST.EDU (Tom Dietterich)
Date: Wed, 8 Aug 90 22:36:10 PDT
Subject: Similarity to Cascade-Correlation
In-Reply-To: Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU's message of Wed, 08 Aug 90 10:09:48 EDT <9008090152.AA19554@CS.ORST.EDU>
Message-ID: <9008090536.AA01129@turing.CS.ORST.EDU>

As someone with a lot of experience in decision-tree learning algorithms, I agree with Scott. The main similarity between Cascade-Correlation (CC) and decision tree algorithms like CART is that they are both greedy. CART and related algorithms (e.g., ID3, C4, CN2, GREEDY3) all work by choosing an (axis-parallel) hyperplane and then subdividing the training data along that hyperplane, whereas CC keeps all of the training data together and keeps retraining the output units as it incrementally adds hidden units.

There is an algorithm, called FRINGE, that learns a decision tree and then uses that tree to define new features which are then used to build a new tree (and this process can be repeated, of course). This is the best example I know of a non-connectionist (supervised) algorithm for defining new features.

--Tom

From jose at learning.siemens.com Thu Aug 9 10:14:39 1990
From: jose at learning.siemens.com (Steve Hanson)
Date: Thu, 9 Aug 90 09:14:39 EST
Subject: Similarity to Cascade-Correlation
Message-ID: <9008091414.AA07343@learning.siemens.com.siemens.com>

Thanks for the clarification... however, as I understand CART, it is not required to construct an axis-parallel hyperplane (like ID3 etc.); like CC, any hyperplane is possible. Now as I understand CC, it does freeze the weights for each hidden unit once asymptotic learning takes place and takes as input to a next candidate hidden unit the frozen hidden unit output (ie hyperplane decision or discriminant function). Consequently, CC does not "...keep all of the training data together and retraining the output units (weights?) as it incrementally adds hidden units".

As to higher-order hidden units... I guess I see what you mean; however, don't units below simply send a decision concerning the subset of data which they have correctly classified? Consequently, units above see the usual input features and a newly learned hidden unit feature indicating that some subset of the input vectors are on one side of its decision surface? right? Consequently the next hidden unit in the "cascade" can learn to ignore that subset of the input space and concentrate on other parts of the input space that require yet another hyperplane? It seems as though this would produce a branching tree of discriminantS similar to CART. n'est pas?

Steve

From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Thu Aug 9 11:38:51 1990
From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU)
Date: Thu, 09 Aug 90 11:38:51 EDT
Subject: Similarity to Cascade-Correlation
In-Reply-To: Your message of Thu, 09 Aug 90 09:14:39 -0500. <9008091414.AA07343@learning.siemens.com.siemens.com>
Message-ID:

    Now as I understand CC, it does freeze the weights for each hidden
    unit once asymptotic learning takes place and takes as input to a
    next candidate hidden unit the frozen hidden unit output (ie
    hyperplane decision or discriminant function).

Right. The frozen hidden unit becomes available both for forming an output and as an input to subsequent hidden units.
An aside: Instead of "freezing", I've decided to call this "tenure" from now on. When a candidate unit becomes tenured, it no longer has to learn any new behavior, and from that point on other units will pay attention to what it says. Consequently, CC does not "...keep all of the training data together and retraining the output units (weights?) as it incrementlly adds hidden units". How does this follow from the above? As to higher-order hidden units... I guess i see what you mean, however, don't units below simply send a decision concerning the subset of data which they have correctly classified? It's not just a decision. The unit's output can assume any value in its continuous range. Some hidden units develop big weights and tend to act like sharp-threshold units, while others do not. Consequently, units above see the usual input features and a newly learned hidden unit feature indicating that a some subset of the input vectors are on one side of its decision surface? right? Right, modulo the comment above. Consequently the next hidden unit in the "cascade" can learn to ignore that subset of the input space and concentrate on other parts of the input space that requires yet another hyperplane? It seems as tho this would produce a branching tree of discriminantS similar to cart. No, this doesn't follow at all. Typically there are still errors on both sides of the unit just created, so the next unit doesn't ignore either "branch". It produces some new cut that typically subdivides all (or many) of the regions created so far. Again, I suggest you look at the diagrams in the tech report to see the kinds of "cuts" are actually created. n'est pas? Only eagles nest in passes. Lesser birds hide among the branches of decision trees. :-) -- Scott From Connectionists-Request at CS.CMU.EDU Thu Aug 9 13:33:59 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Thu, 09 Aug 90 13:33:59 EDT Subject: Return addresses Message-ID: <24309.650223239@B.GP.CS.CMU.EDU> I have received several complaints from Connectionists members that they are not able 'reply' to messages because the original sender's address has been removed from the message header. This is a problem with the receiver's local mailer. Rather than having me try to remotely trouble shoot 150 different mailers, the problem could be solved by including a return email address as part of the body of any message sent to Connectionists. I would also like to remind subscribers that a copy of main mailing list is available in the Connectionists archives. Scott Crowder Connectionists-Request at cs.cmu.edu (ARPAnet) <- see, it isn't that hard ------------------------------------------------------------------------------- The CONNECTIONISTS Archive: --------------------------- All e-mail messages sent to "Connectionists at cs.cmu.edu" starting 27-Feb-88 are now available for public perusal. A separate file exists for each month. The files' names are: arch.yymm where yymm stand for the obvious thing. Thus the earliest available data are in the file: arch.8802 Files ending with .Z are compressed using the standard unix compress program. To browse through these files (as well as through other files, see below) you must FTP them to your local machine. ------------------------------------------------------------------------------- How to FTP Files from the CONNECTIONISTS Archive ------------------------------------------------ 1. Open an FTP connection to host B.GP.CS.CMU.EDU (Internet address 128.2.242.8). 2. 
Login as user anonymous with password your username. 3. 'cd' directly to one of the following directories: /usr/connect/connectionists/archives /usr/connect/connectionists/bibliographies 4. The archives and bibliographies directories are the ONLY ones you can access. You can't even find out whether any other directories exist. If you are using the 'cd' command you must cd DIRECTLY into one of these two directories. Access will be denied to any others, including their parent directory. 5. The archives subdirectory contains back issues of the mailing list. Some bibliographies are in the bibliographies subdirectory. Problems? - contact us at "Connectionists-Request at cs.cmu.edu". Happy Browsing Scott Crowder Connectionists-Request at cs.cmu.edu ------------------------------------------------------------------------------- From orjan at thalamus.sans.bion.kth.se Thu Aug 9 19:47:34 1990 From: orjan at thalamus.sans.bion.kth.se (Orjan Ekeberg) Date: Thu, 09 Aug 90 19:47:34 N Subject: Network Constructing Algorithms. In-Reply-To: Your message of Thu, 09 Aug 90 08:51:08 O. <9008091705.AAgarbo.bion.kth.se13977@garbo.bion.kth.se> Message-ID: <9008091747.AA12363@thalamus> I assume that some of the work that we have been doing would fit well in this context too. Based on a recurrent network, higher order units are added automatically. The new units become part of the recurrent set and helps to make the training patterns fixpoints of the network. A couple of references (in bibtex format): @inproceedings{sans:alaoe87, author = {Anders Lansner and {\"O}rjan Ekeberg}, year = 1987, title = {An Associative Network Solving the ``4-Bit ADDER Problem''}, booktitle = {Proceedings of the IEEE First Annual International Conference on Neural Networks}, pages = {II{-}549}, address = {San Diego, USA}, month = jun} @inproceedings{sans:paris88, author = {{\"O}rjan Ekeberg and Anders Lansner}, year = 1988, title = {Automatic Generation of Internal Representations in a Probabilistic Artificial Neural Network}, booktitle = {Neural Networks from Models to Applications}, editor = {L. Personnaz and G. Dreyfus}, publisher = {I.D.S.E.T.}, address = {Paris}, pages = {178--186}, note = {Proceedings of {nEuro}-88, The First European Conference on Neural Networks}, abstract = {In a one layer feedback perceptron type network, the connections can be viewed as coding the pairwise correlations between activity in the corresponding units. This can then be used to make statistical inference by means of a relaxation technique based on bayesian inferences. When such a network fails, it might be because the regularities are not visible as pairwise correlations. One cure would then be to use a different internal coding where selected higher order correlations are explicitly represented. 
A method for generating this representation automatically is reviewed and results from experiments regarding the resulting properties is presented with a special focus on the networks ability to generalize properly.}} +---------------------------------+-----------------------+ + Orjan Ekeberg + O---O---O + + Department of Computing Science + \ /|\ /| Studies of + + Royal Institute of Technology + O-O-O-O Artificial + + S-100 44 Stockholm, Sweden + |/ \ /| Neural + +---------------------------------+ O---O-O Systems + + EMail: orjan at bion.kth.se + SANS-project + +---------------------------------+-----------------------+ From pollack at cis.ohio-state.edu Thu Aug 9 12:14:19 1990 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Thu, 9 Aug 90 12:14:19 -0400 Subject: Cascade Correlation and Convergence Message-ID: <9008091614.AA14222@dendrite.cis.ohio-state.edu> Scott's description of his algorithm, and lack of convergence proof, reminded me of the line of research by Meir and Domany (Complex Systems 2, 1988) and Mezard and Nadal (Int J Neu Systems, 1,1 1989) on methods for directly constructing networks. In a related paper (which I cannot find), I'm quite sure that someone proved by construction that any (n input, 1 output) boolean function could be accomplished by a layering of TLU's, where each additional unit is guaranteed to decrease the number of mis-classified inputs. Perhaps this approach would help lead to some convergence proof for CC networks. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Fax/Phone: (614) 292-4890 From bgupta at aries.intel.com Thu Aug 9 19:19:58 1990 From: bgupta at aries.intel.com (Bhusan Gupta) Date: Thu, 9 Aug 90 16:19:58 PDT Subject: Job opening at Intel for NN IC designer Message-ID: <9008092319.AA04843@aries> The neural network group at Intel is looking for an engineer to participate in the development of neural networks. A qualified applicant should have a M.S. or PhD in electrical engineering or equivalent experience. The specialization required is in CMOS circuit design with an emphasis on digital design. Analog design experience is considered useful as well. Familiarity with neural network architectures, learning algorithms, and applications is desirable. The duties that are specific to this job are: Neural network design. Architecture definition and circuit design. Chip planning, layout supervision and verification. Testing and debugging silicon. The neural network design consists primarily of digital design with both a gate-level and transistor-level emphasis. The job is at the Santa Clara site and is currently open. Interested principals can email at bgupta at aries.intel.com until the end of August. Resumes in ascii are preferred. I will pass along all responses to the appropriate people. street address: Bhusan Gupta m/s sc9-40 2250 Mission College Blvd. P.O. Box 58125 Santa Clara, Ca 95052 Intel is an equal opportunity employer, etc. Bhusan Gupta From sg at corwin.ccs.northeastern.edu Thu Aug 9 14:34:35 1990 From: sg at corwin.ccs.northeastern.edu (steve gallant) Date: Thu, 9 Aug 90 14:34:35 EDT Subject: Cascade-Correlation, etc Message-ID: <9008091834.AA18306@corwin.CCS.Northeastern.EDU> To respond to Jordan's suggestion, if you copy the output cell from a stage in cascade correlation into your growing network, then the previous convergence results hold for boolean learning problems. 
This is true whether you copy at every stage or only occasionally. Scott tried a few simulations and there seemed to be some learning speed gain by occasional copying, perhaps 25% on the couple of tests he ran. Also, if I can add an early paper (that includes convergence) to Tal Grossman's list: Gallant, S. I\@. Three Constructive Algorithms for Network Learning. Proc.\ Eighth Annual Conference of the Cognitive Science Society, Amherst, Ma., Aug. 15-17, 1986, 652-660. Steve Gallant From marcus at cns.edinburgh.ac.uk Fri Aug 10 16:37:13 1990 From: marcus at cns.edinburgh.ac.uk (Marcus Frean) Date: Fri, 10 Aug 90 16:37:13 BST Subject: Convergence of constructive algorithms. Message-ID: <8340.9008101537@cns.ed.ac.uk> Jordan Pollack writes: > In a related paper (which I cannot find), I'm quite sure that someone > proved by construction that any (n input, 1 output) boolean function > could be accomplished by a layering of TLU's, where each additional > unit is guaranteed to decrease the number of mis-classified inputs. > Perhaps this approach would help lead to some convergence proof for CC > networks. There are several papers that show convergence via guaranteeing each unit reduces the output's errors by at least one. [NB: They all use linear threshold units, and require for convergence that the training set be composed of binary patterns (or at least convex: every pattern must be separable from all the others), since then the worst case is always that a new unit captures a single pattern and hence is able to correct the output unit by one.] These include The "Tower algorithm": Gallant,S.I. 1986a. Three Constructive Algorithms for Network Learning. Proc. 8th Annual Conf. of Cognitive Science Soc. p652-660. also discussed in Nadal,J. 1989. Study of a Growth Algorithm for Neural Networks International J. of Neural Systems, 1,1:55-59 The performance of this method closely matches that of the "Tiling" Algorithm of Mezard and Nadal, although the proof there is for reduction of at least one error per layer rather than per unit. The "neural decision tree" approach is shown to converge by M. Golea and M. Marchand, A Growth Algorithm for Neural Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). and also J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for Classification, preprint, submitted to "Network", April 90. The "Upstart" algorithm (my favourite....) Frean,M.R. 1990. The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks. Neural Computation. 2:2, 198-209. in which new units are devoted to correcting errors made by existing units (in this sense it has bears some resemblance to Cascade Correlation). A binary tree of units is constructed, but it is not a decision tree: "daughter" units correct their "parent", with the most senior parent being the output unit. Marcus. --------------------------------------------------------------------- From fanty at cse.ogi.edu Fri Aug 10 13:00:03 1990 From: fanty at cse.ogi.edu (Mark Fanty) Date: Fri, 10 Aug 90 10:00:03 -0700 Subject: conjugate gradient optimization program available Message-ID: <9008101700.AA03174@cse.ogi.edu> The speech group at OGI uses conjugate-gradient optimization to train fully connected feed-forward networks. We have made the program (OPT) available for anonymous ftp: 1. ftp to cse.ogi.edu 2. login as "anonymous" with any password 3. cd to "pub/speech" 4. get opt.tar OPT was written by Etienne Barnard at Carnegie-Mellon University. 
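For readers who only want the flavour of the method without fetching the sources: the fragment below is not OPT (OPT is the C program obtained as described above). It is a rough sketch of the same idea, training a small fully connected one-hidden-layer net by conjugate gradients with the modern scipy library. The network size, data set and error measure are invented for the example, and the gradient is left for the optimizer to estimate numerically, whereas a serious implementation computes it analytically by back-propagation.

    # Sketch only: conjugate-gradient training of a tiny feed-forward net.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 4))                 # 40 toy patterns, 4 inputs
    Y = (X[:, 0] * X[:, 1] > 0).astype(float)    # toy two-class target
    n_in, n_hid = 4, 6

    def unpack(theta):
        W1 = theta[:n_hid * (n_in + 1)].reshape(n_hid, n_in + 1)
        w2 = theta[n_hid * (n_in + 1):]
        return W1, w2

    def loss(theta):
        W1, w2 = unpack(theta)
        Xb = np.hstack([X, np.ones((len(X), 1))])     # append bias input
        H = np.tanh(Xb @ W1.T)                        # hidden layer
        Hb = np.hstack([H, np.ones((len(H), 1))])
        out = 1.0 / (1.0 + np.exp(-(Hb @ w2)))        # sigmoid output unit
        return np.mean((out - Y) ** 2)

    theta0 = rng.normal(scale=0.1, size=n_hid * (n_in + 1) + n_hid + 1)
    result = minimize(loss, theta0, method='CG', options={'maxiter': 200})
    print('final training MSE:', result.fun)

Swapping method='CG' for method='BFGS' gives a quasi-Newton variant with no other changes.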
Mark Fanty Computer Science and Engineering Oregon Graduate Institute fanty at cse.ogi.edu 196000 NW Von Neumann Drive (503) 690-1030 Beaverton, OR 97006-1999 From amini at tcville.hac.com Sun Aug 12 23:47:14 1990 From: amini at tcville.hac.com (Afshin Amini) Date: Sun, 12 Aug 90 20:47:14 PDT Subject: signal processing with neural nets Message-ID: <9008130347.AA02757@ai.spl> Hi there: I would like to explore possibilities of using neural nets in a signal processing environment. I would like to get familiar with usage of neural nets in the area of spectral estimation and classification. I have used the popular methods of high resolution spectral estimation such as AR modeling and such. I would like to get some reffrences to recent publications and books that contain specific algorithms that deploys neural networks to achieve such problems in signal processing. thanks, -A. Amini -- Afshin Amini Hughes Aircraft Co. voice: (213) 616-6558 Electro-Optical and Data Systems Group Signal Processing Lab fax: (213) 607-0918 P.O. Box 902, EO/E1/B108 email: El Segundo, CA 90245 smart: amini at tcville.hac.com Bldg. E1 Room b2316f dumb: amini%tcville at hac2arpa.hac.com uucp: hacgate!tcville!dave From nelsonde%avlab.dnet at wrdc.af.mil Mon Aug 13 10:10:04 1990 From: nelsonde%avlab.dnet at wrdc.af.mil (nelsonde%avlab.dnet@wrdc.af.mil) Date: Mon, 13 Aug 90 10:10:04 EDT Subject: Last Call for Papers for AGARD Conference Message-ID: <9008131410.AA08887@wrdc.af.mil> I N T E R O F F I C E M E M O R A N D U M Date: 13-Aug-1990 10:05am EST From: Dale E. Nelson NELSONDE Dept: AAAT-1 Tel No: 57646 From sankar at caip.rutgers.edu Sun Aug 12 21:15:24 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Sun, 12 Aug 90 21:15:24 EDT Subject: No subject Message-ID: <9008130115.AA08572@caip.rutgers.edu> >>There are several papers that show convergence via guaranteeing each >>unit reduces the output's errors by at least one. >> >> >>The "neural decision tree" approach is shown to converge by >> M. Golea and M. Marchand, A Growth Algorithm for Neural >> Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). >>and also >> J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for >> Classification, preprint, submitted to "Network", April 90. Add to this the following paper: A. Sankar and R.J. Mammone, " A fast learning algorithm for tree neural networks", presented at the 1990 Conference on Information Sciences and Systems, Princeton, NJ, March 21,22,23, 1990. This will appear in the conference proceedings. We also have a more detailed technical report on this research. For copies please contact Ananth Sankar CAIP 117 Brett and Bowser Roads Rutgers University P.O. Box 1390 Piscataway, NJ 08855-1390 From sankar at caip.rutgers.edu Mon Aug 13 13:48:53 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Mon, 13 Aug 90 13:48:53 EDT Subject: No subject Message-ID: <9008131748.AA07712@caip.rutgers.edu> An earlier attempt to mail this seems to have failed..my apologies to everyone who gets a duplicate copy. >>There are several papers that show convergence via guaranteeing each >>unit reduces the output's errors by at least one. >> >> >>The "neural decision tree" approach is shown to converge by >> M. Golea and M. Marchand, A Growth Algorithm for Neural >> Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). >>and also >> J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for >> Classification, preprint, submitted to "Network", April 90. Add to this the following paper: A. Sankar and R.J. 
Mammone, " A fast learning algorithm for tree neural networks", presented at the 1990 Conference on Information Sciences and Systems, Princeton, NJ, March 21,22,23, 1990. This will appear in the conference proceedings. We also have a more detailed technical report on this research. For copies please contact Ananth Sankar CAIP 117 Brett and Bowser Roads Rutgers University P.O. Box 1390 Piscataway, NJ 08855-1390 From gary%cs at ucsd.edu Mon Aug 13 15:35:50 1990 From: gary%cs at ucsd.edu (Gary Cottrell) Date: Mon, 13 Aug 90 12:35:50 PDT Subject: Summary (long): pattern recognition comparisons In-Reply-To: Leonard Uhr's message of Fri, 3 Aug 90 14:18:11 -0500 <9008031918.AA23586@thor.cs.wisc.edu> Message-ID: <9008131935.AA19428@desi.ucsd.edu> Leonard Uhr says: >Neural nets using backprop have only handled VERY SIMPLE images, usually in >8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in >logarithmically converging nets, but I don't know of any nets with complete >connectivity from one layer to the next that are that big.) Mike Fleming and I used 64x64 inputs for face recognition. The system does auto-encoding as a preprocessing step, reducing the number of inputs to 80. See IJCNN-90, Vol II p65->. gary cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at cs.ucsd.edu (ARPA) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET) gcottrell at ucsd.edu (BITNET) From kuepper at ICSI.Berkeley.EDU Tue Aug 14 14:32:37 1990 From: kuepper at ICSI.Berkeley.EDU (Wolfgang Kuepper) Date: Tue, 14 Aug 90 11:32:37 PDT Subject: SIEMENS Job Announcement Message-ID: <9008141832.AA02344@icsib21.Berkeley.EDU> IMAGE UNDERSTANDING and ARTIFICIAL NEURAL NETWORKS The Corporate Research and Development Laboratories of Siemens AG, one of the largest companies worldwide in the electrical and elec- tronics industry, have research openings in the Computer Vision as well as in the Neural Network Groups. The groups do basic and applied studies in the areas of image understanding (document inter- pretation, object recognition, 3D modeling, application of neural networks) and artificial neural networks (models, implementations, selected applications). The Laboratory is located in Munich, an attractive city in the south of the Federal Republic of Germany. Connections exists with our sister laboratory, Siemens Corporate Research in Princeton, as well as with various research institutes and universities in Germany and in the U.S. including MIT, CMU and ICSI. Above and beyond the Laboratory facilities, the groups have a network of Sun and DEC workstations, Symbolics Lisp machines, file and compute servers, and dedicated image processing hardware. The successful candidate should have an M.S. or Ph.D. in Computer Science, Electrical Engineering, or any other AI-related or Cognitive Science field. He or she should prefarably be able to communicate in German and English. Siemens is an equal opportunity employer. Please send your resume and a reference list to Peter Moeckel Siemens AG ZFE IS INF 1 Otto-Hahn-Ring 6 D-8000 Muenchen 83 West Germany e-mail: gm%bsun4 at ztivax.siemens.com Tel. +49-89-636-3372 FAX +49-89-636-2393 Inquiries may also be directed to Wolfgang Kuepper (on leave from Siemens until 8/91) International Computer Science Institute 1947 Center Street - Suite 600 Berkeley, CA 94704 e-mail: kuepper at icsi.berkeley.edu Tel. 
(415) 643-9153 FAX (415) 643-7684 From Connectionists-Request at CS.CMU.EDU Thu Aug 16 12:31:34 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Thu, 16 Aug 90 12:31:34 EDT Subject: patience is a virtue Message-ID: <4776.650824294@B.GP.CS.CMU.EDU> Recently a few people have worried that their posts were lost because of the long resend time for messages to the connectionists list. I would like for all users to exercise a little patience. CMU is happy to provide the resources and labor necessary to make the Connectionists list available to the world wide connectionists community. However, we do have limited resources. The Connectionists redistribution machine is a only a VAX 750. This machine also services several other large mailing lists. Delays of 4-6 hours are typical, but delays of >16 hours are possible during high traffic periods. If you are trying to debate an issue with another list member, but think the rest of the list would be interested in the debate it is best to email directly to the other member and cc: Connectionists at cs.cmu.edu. This allows you to carry on your debate at normal email speeds and lets the rest of the community 'listen in' 6-16 hrs latter. If you feel that the delays are a serious impediment to the research progress of the connectionists community, CMU would be happy to accept your donation of new dedicated Connectionists redistribution machine. Scott Crowder Connectionists-Request at cs.cmu.edu (ARPAnet) PS If you have waited more than 24 hours and STILL haven't recieved your post, please contact me at Connectionists-Request at cs.cmu.edu. From xiru at Think.COM Fri Aug 17 16:48:58 1990 From: xiru at Think.COM (xiru@Think.COM) Date: Fri, 17 Aug 90 16:48:58 EDT Subject: backprop for classification Message-ID: <9008172048.AA00756@yangtze.think.com> While we trained a standard backprop network for some classification task (one output unit for each class), we found that when the classes are not evenly distribed in the training set, e.g., 50% of the training data belong to one class, 10% belong to another, ... etc., then the network always biased towards the classes that have the higher percentage in the training set. Thus, we had to post-process the output of the network, giving more weights to the classes that occur less frequently (in reverse proportion to their population). I wonder if other people have encountered the same problem, and if there are better ways to deal with this problem. Thanks in advance for any replies. - Xiru Zhang Thinking Machines Corp. From John.Hampshire at SPEECH2.CS.CMU.EDU Sun Aug 19 13:48:06 1990 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Sun, 19 Aug 90 13:48:06 EDT Subject: backprop for classification Message-ID: Xiru Zhang of Thinking Machines Corp. writes: > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > Thus, we had to post-process the output of the network, giving more weights > to the classes that occur less frequently (in reverse proportion to their > population). > > I wonder if other people have encountered the same problem, and if there > are better ways to deal with this problem. 
Indeed, one can show that any classifier with sufficient functional capacity to model the class-conditional densities of the random vector X being classified (e.g., a MLP with sufficient connectivity to perform the input-to-output functional mapping necessary for robust classification) and trained with a "reasonable error measure" (a term originated by B. Pearlmutter) will yield outputs that are accurate estimates of the a posteriori probabilities of X, given an asymptotically large number of statistically independent training samples. Examples of "reasonable error measures" are mean-squared error (the one used by Xiru Zhang), Cross Entropy, Max. Mutual Info., Kullback-Liebler distance, Max. Likelihood... Unfortunately, one never has enough training data, and it's not always clear what constitutes sufficient but not excessive functional capacity in the classifier. So one ends up *estimating* the a posterioris with one's "reasonable error measure"-trained classifier. If one trains one's classifier with a disproportionately high number of samples belonging to one particular class, one will get precisely the behavior Xiru Zhang describes. ************** This is because the a posterioris depend on the class priors (you can prove this easily using Bayes' rule). If you bias the priors, you will bias the a posterioris accordingly. Your classifier will therefore learn to estimate the biased a posterioris. ************** The best way to fix the problem if you're using a "reasonable error measure" to train your classifier is to have a training set that reflects the true class priors. If this isn't possible, then you can post-process the classifier's outputs by correcting for the biased priors. Whether or not this fix really works depends a lot on the classifier you're using. MLPs tend to be over-parameterized, so they tend to yield binary outputs that won't be affected by this kind of post processing. Another approach might be to avoid using "reasonable error measures" to train your classifier. I have more info regarding such alternatives if anyone cares, but I've already blabbed too much. If you want refs., please send me email directly. Cheers, John From niranjan at engineering.cambridge.ac.uk Sun Aug 19 10:11:29 1990 From: niranjan at engineering.cambridge.ac.uk (Mahesan Niranjan) Date: Sun, 19 Aug 90 10:11:29 BST Subject: backprop for classification Message-ID: <3447.9008190911@dsl.eng.cam.ac.uk> > From: xiru at com.think > Subject: backprop for classification > Date: 19 Aug 90 00:26:28 GMT > > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > This often happens when the network is too small to load the training data. Your network, in this case, does not converge to negligible error. My suggestion is to start with a large network that can load your training data and gradually reduce the size of the net by pruning the weights giving small contributions to the output error. 
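The prior-correction fix described above can be made concrete in a few lines. The sketch below is generic rather than anyone's production code: it assumes the network outputs approximate the posterior probabilities implied by the training-set class frequencies, divides those frequencies out, multiplies in the frequencies expected at test time, and renormalizes (this is just Bayes' rule applied twice).

    # Sketch: correcting estimated posteriors for a mismatch between the class
    # priors of the training set and the priors expected in the field.
    import numpy as np

    def correct_priors(net_outputs, train_priors, test_priors):
        """net_outputs: (n_patterns, n_classes) posteriors from a net trained
        on data whose class frequencies were train_priors."""
        net_outputs = np.asarray(net_outputs, dtype=float)
        ratio = np.asarray(test_priors) / np.asarray(train_priors)
        adjusted = net_outputs * ratio      # divide out old priors, put in new ones
        return adjusted / adjusted.sum(axis=1, keepdims=True)

    # Example: trained on 50%/10%/40% data, deployed where classes are equally likely.
    outputs = np.array([[0.70, 0.05, 0.25]])
    print(correct_priors(outputs, [0.5, 0.1, 0.4], [1/3, 1/3, 1/3]))

An alternative, also raised later in this thread, is to make the correction during training instead, by weighting each pattern's error (or learning rate) inversely to its class frequency so that the low-frequency classes take larger steps.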
niranjan From russ at dash.mitre.org Mon Aug 20 07:17:38 1990 From: russ at dash.mitre.org (Russell Leighton) Date: Mon, 20 Aug 90 07:17:38 EDT Subject: backprop for classification In-Reply-To: xiru@Think.COM's message of Fri, 17 Aug 90 16:48:58 EDT <9008172048.AA00756@yangtze.think.com> Message-ID: <9008201117.AA22280@dash.mitre.org> We have found backprop VERY sensitive to the probability of occurance of each class. As long as you are aware of this you can use this to advantange. For example, if false alarms are a big concern then by training with large amounts of "noise" you can bias the sytem to reduce the Pfa. This effect has been quantified analytically and experimentally for systems with no hidden layers in a paper being compiled now. The bottom line is that a no hidden layer system implements a classical Mini-Max test if the signal classes are represented equally in the training set. By varying the the composition of the training sets, the network can be designed relative to a known maximum false alarm probablity independent of signal-to-noise ratio. This work continues for multi-layer systems. An experimental account of how to exploit this effect for signal classification can be found in: Wieland, et al., `An Analysis of Noise Tolerance for a Neural Network Recognition System', Mitre Tech. Rep. MP-88W00021, 1988 and Wieland, et al., `Shaping Schedules as a Method of Accelerated Learning', Proceedings of the first INNS Meeting, 1988 Russ. NFSNET: russ at dash.mitre.org Russell Leighton MITRE Signal Processing Lab 7525 Colshire Dr. McLean, Va. 22102 USA From wan at whirlwind.Stanford.EDU Mon Aug 20 14:07:39 1990 From: wan at whirlwind.Stanford.EDU (Eric A. Wan) Date: Mon, 20 Aug 90 11:07:39 PDT Subject: Survey of Second Order Techniques Message-ID: <9008201807.AA13338@whirlwind.Stanford.EDU> I am compiling a study on the extent to which researches have gone beyond simple gradient descent (back-propagation) for training layered neural networks by applying more sophisticated classical techniques in non-linear optimization (e.g. Newton, Quasi-Newton, Conjugate-Gradient methods, etc.)? Please e-mail me any comments and/or references that you have on the subject. I will summarize the responses. Thanks in advance. Eric Wan wan at isl.stanford.edu From YVES%LAVALVM1.BITNET at vma.CC.CMU.EDU Mon Aug 20 11:36:47 1990 From: YVES%LAVALVM1.BITNET at vma.CC.CMU.EDU (Yves (Zip) Lacouture) Date: Mon, 20 Aug 90 11:36:47 HAE Subject: BP for categorization... Message-ID: > From: xiru at com.think > Subject: backprop for classification > Date: 19 Aug 90 00:26:28 GMT > > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > I encountered the same problem in a similar situation. This occur with limited resources (HU): the network tend to neglet a subset of the stimuli. The phenomenon is also observed when the stimuli have the same presentation probability and the resources are very limited. It helps to use a non-orthogonal representation (e.g. by activating neighbor units). 
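One concrete reading of "activating neighbor units" is to replace the 1-of-N target with a smeared target, assuming the output classes lie along some ordering that makes "neighbor" meaningful (as in identification along a stimulus continuum). The profile values in the sketch below are invented for illustration.

    # Sketch of a non-orthogonal target coding: neighbours of the correct output
    # unit also receive partial target activation.  The 1.0/0.5/0.1 profile is
    # arbitrary and chosen only for the example.
    import numpy as np

    def smeared_target(correct_class, n_classes, profile=(1.0, 0.5, 0.1)):
        t = np.zeros(n_classes)
        for offset, value in enumerate(profile):
            for j in (correct_class - offset, correct_class + offset):
                if 0 <= j < n_classes:
                    t[j] = max(t[j], value)
        return t

    print(smeared_target(3, 8))   # -> [0.  0.1 0.5 1.  0.5 0.1 0.  0. ]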
To build a model of (human) simple identification I modified BP to incorporate a selective attention mechanism by which the adaptative modifications are made larger for the stimuli for which performances are worse. I expect to offer a TR on this topic soon. yves From chrisley at parc.xerox.com Mon Aug 20 13:35:08 1990 From: chrisley at parc.xerox.com (Ron Chrisley) Date: Mon, 20 Aug 90 10:35:08 PDT Subject: backprop for classification In-Reply-To: xiru@Think.COM's message of Fri, 17 Aug 90 16:48:58 EDT <9008172048.AA00756@yangtze.think.com> Message-ID: <9008201735.AA07158@owl.parc.xerox.com> Xiru, you wrote: "While we trained a standard backprop network for some classification task (one output unit for each class), we found that when the classes are not evenly distribed in the training set, e.g., 50% of the training data belong to one class, 10% belong to another, ... etc., then the network always biased towards the classes that have the higher percentage in the training set. Thus, we had to post-process the output of the network, giving more weights to the classes that occur less frequently (in reverse proportion to their population)." My suggestion: most BP classification paradigms will work best if you are using the same distribution for training as for testing. So only worry about uneven distribution of classes in the training data if the input on which the network will have to perform does not have that distribution. If rocks are 1000 times more common than mines, then given that something is completely qualitatively ambiguous with respect to the rock/mine distinction, it is best (in terms of minimizing # of misclassifications) to guess that the thing is a rock. So being biased toward rock classifications is a valid way to minimize misclassification. (Of course, once you start factoring in cost, this will be skewed dramatically: it is much better to have a false alarm about a mine than to falsely think a mine is a rock.) In summary, uneven distributions aren't, in themselves, bad for training, nor do they require any post-processing. However, distributions that differ from real-world ones will require some sort of post-processing, as you have done. But there is another issue here, I think. How were you using the network for classification? From your message, it sounds like you were training and interpreting the network in such a way that the activations of the output nodes were supposed to correspond to the conditional probabilities of the different classes, given the input. This would explain what you meant by your last sentence in the above quote. But there are other ways of using back-propagation. For instance, if one does not constrain the network to estimate conditional probabilities, but instead has it solve the more general problem of minimizing classification error, then it is possible that the network will come up with a solution that is not affected by differences of prior probabilities of classes in the training and testing data. Since it is not solving the problem by classifying via maximum liklihood, its solutions will be based on the frequency-independent, qualitative structure of the inputs. In fact, humans often do something like this. The phenomenon is called "base rate neglect". The phenomenon is notorious in that when qualitative differences are not so marked between a rare and a common class, humans will always over-classify inputs into the rare class. 
That is, if the symptoms a patient has even *slightly* indicate a rare tropical disease over a common cold, humans will give the rare disease dignosis, even though it is extremely unlikely that the patient has that disease. Of course, the issue of cost is again being ignored here. (See Gluck and Bower for a look at the relation between neural networks and base rate neglect). Such limitations aside, classification via means other than conditional probability estimation may be desirable for certain applications. For example, those in which you do not know the priors, or they change dramatically in an unpredictable way. And/or where there is a strong qualitative division bewteen members of the classes. In such cases, you might get good classification performance, even when the distributions differ, by relying more on qualitative differences in the inputs than in the frequency of the classes. Does this sound right? Ron Chrisley chrisley at csli.stanford.edu Xerox PARC SSL New College Palo Alto, CA 94304 Oxford OX1 3BN, UK (415) 494-4728 (865) 793-484 From niranjan at engineering.cambridge.ac.uk Tue Aug 21 20:20:36 1990 From: niranjan at engineering.cambridge.ac.uk (Mahesan Niranjan) Date: Tue, 21 Aug 90 20:20:36 BST Subject: Backprop for classification Message-ID: <5229.9008211920@dsl.eng.cam.ac.uk> > From: xiru at com.think > Subject: backprop for classification > Date: 19 Aug 90 00:26:28 GMT > > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > This often happens when the network is too small to load the training data. Your network, in this case, does not converge to negligible error. My suggestion is to start with a large network that can load your training data and gradually reduce the size of the net by pruning the weights giving small contributions to the output error. niranjan From der%beren at Forsythe.Stanford.EDU Wed Aug 22 13:35:59 1990 From: der%beren at Forsythe.Stanford.EDU (Dave Rumelhart) Date: Wed, 22 Aug 90 10:35:59 PDT Subject: BP for categorization...relative frequency problem In-Reply-To: "Yves (Zip) Lacouture"'s message of Mon, 20 Aug 90 11:36:47 HAE <9008210406.AA11690@nprdc.navy.mil> Message-ID: <9008221735.AA07583@beren.> We have also encountered the problem. Since BP does gradient descent and since the contribution of any set of patterns depends in part on the relative frequency of those patterns, fewer resources are allocated to low fequency categories. Morover, those resources are allocated later in the training -- probably after over-fitting has already become a problem for higher frequency categories. Of course, if your training distribution is the same as your testing distribution you wil be getting the appropriate Baysian estimate of the class probabilities. On the other hand, if the generalization distribution is unknown at test time we may wish to factor out the relative frequency of your input frequency during training and add any known "priors" during generalization. There are two ways to do this. One way, suggested in one of the notes on this topic is to "post process" out output data. That is, divide the output unit value by the relative frequency in the training set and multiply by the relative frequency in the test set. 
This will give you an estimate of the Bayesian probability for the test set. For a variety of reasons, this is less appropriate that correcting during training. In this case, the procedure is to effectively increase the learning rate inversely proportional to the relative frequency of the category in the training set. Thus, we take bigger learning steps on low frequency categories. In a simple classification task, this is roughly equivalent to normalizing the data set by sampling each category set equally. In the case of cross-classification (in whihch a given input can be a member of more the one class), it is roughly equivalent to weighting each inversely by the probability that that pattern would occur, given independence between the output classes. We have used this method successfully in a system designed to classify mass spectra. In this method an output of .5 means that the evidence for and against the category is equal. Whereas, in the normal traing method, an output equal to the relative frequency in the training set means that the evidence for and against is equal. In some cases this can be very small. It is possibly to add the priors in manually and compare performance on the training set with the original method. We find that we do only slightly worse on the training set with the two methods. We do much better in generalization on classes that were low frequency in the training set and slightly worse on classes which were high frequency in the training set. der From hendler at cs.UMD.EDU Wed Aug 22 16:28:52 1990 From: hendler at cs.UMD.EDU (Jim Hendler) Date: Wed, 22 Aug 90 16:28:52 -0400 Subject: BP for categorization...relative frequency problem Message-ID: <9008222028.AA09120@dormouse.cs.UMD.EDU> Herve Bourlard and Nelson Morgan had to deal with this problem in a system being used in the context of continuous speech recognition. They solved the problem, to some extent, by dividing the output category strengths by the prior probabilities of the training set. This avoided having to do anything terribly tricky in the network, and let them use classical back-propagation without extension (although I think they've also used some recurrences in one version). I know there have been several nice publications of their work in speech - various papers with the authors Bourlard, Wellekens, and Morgan in various combinations. Morgan is at ICSI, and is probably the most accessible of these authors for requesting reprints. -Jim Hendler UMCP From PSS001%VAXA.BANGOR.AC.UK at vma.CC.CMU.EDU Wed Aug 22 14:47:17 1990 From: PSS001%VAXA.BANGOR.AC.UK at vma.CC.CMU.EDU (PSS001%VAXA.BANGOR.AC.UK@vma.CC.CMU.EDU) Date: Wed, 22 AUG 90 18:47:17 GMT Subject: No subject Message-ID: Department of Psychology, University of Wales, Bangor and Department of Psychology, University of York CONNECTIONISM AND PSYCHOLOGY THREE POST-DOCTORAL RESEARCH FELLOWSHIPS Applications are invited for three post-doctoral research fellowships to work on the connectionist and psychological modelling of human short-term memory and spelling development. Two Fellowships are available for three years, on an ESRC- funded project concerned with the development and evaluation of a connectionist model of short-term memory. One Fellow will be based with Dr. Gordon Brown in the Cognitive Neurocomputation Unit at Bangor and will be responsible for implementing the model. The other Fellow, based at York with Dr. Charles Hulme, will be responsible for undertaking psychological experiments with children and adults to evaluate the model. 
Starting salary for both posts on research 1A grade up to # 13,495. One two-year Fellowship is available to work on an MRC-funded project to develop a sequential connectionist model of the development of spelling and phonemic awareness in children. This post is based in Bangor with Dr. Gordon Brown. Starting salary on research 1A grade up to # 14,744. Applicants should have postgraduate research experience or interest in cognitive psychology/cognitive science or connectionist/ neural network modelling and computer science. Good computing skills are essential for the posts based in Bangor, and experience in running psychological experiments is required for the York-based post. Excellent computational and research facilities will be available to the successful applicants. The appointments may commence from 1st. October 1990, but start could be delayed until 1st. January 1991. Closing date for applications is 7th. September 1990, but intending applicants should get in touch as soon as possible. Informal enquiries regarding the Bangor-based posts, and requests for further details of the posts and host departments, to Gordon Brown (0248 351151 Ext 2624; email PSS001 at uk.ac.bangor.vaxa); informal enquiries concerning the York-based post to Charles Hulme ( 0904 433145; email ch1 at uk.ac.york.vaxa). Applications (in the form of a curriculum vitae and the names and addresses of two referees) should be sent to Mr. Alan James, Personnel Office, University of Wales, Bangor, Gwynedd LL57 2DG, UK. (Apologies to anyone who receives this posting through more than one list or newsgroup) From MUSICO%BGERUG51.BITNET at vma.CC.CMU.EDU Thu Aug 23 17:22:00 1990 From: MUSICO%BGERUG51.BITNET at vma.CC.CMU.EDU (MUSICO%BGERUG51.BITNET@vma.CC.CMU.EDU) Date: Thu, 23 Aug 90 17:22 N Subject: signoff Message-ID: signoff From HKF218%DJUKFA11.BITNET at vma.CC.CMU.EDU Fri Aug 24 12:08:15 1990 From: HKF218%DJUKFA11.BITNET at vma.CC.CMU.EDU (Gregory Kohring) Date: Fri, 24 Aug 90 12:08:15 MES Subject: Preprints Message-ID: The following preprint is currently available. -- Greg Kohring Performance Enhancement of Willshaw Type Networks through the use of Limit Cycles G.A. Kohring HLRZ an der KFA Julich (Supercomputing Center at the KFA Julich) Simulation results of a Willshaw type model for storing sparsely coded patterns are presented. It is suggested that random patterns can be stored in Willshaw type models by transforming them into a set of sparsely coded patterns and retrieving this set as a limit cycle. In this way, the number of steps needed to recall a pattern will be a function of the amount of information the pattern contains. A general algorithm for simulating neural networks with sparsely coded patterns is also discussed, and, on a fully connected network of N=36 864 neurons (1.4 billion couplings), it is shown to achieve effective updating speeds as high as 160 billion coupling evaluations per second on one Cray-YMP processor. ================================================================== Additionally, the following short review article is also available. It is aimed at graduate students in computational physics who need an overview of the neural network literature from a computational sciences viewpoint, as well as some simple programming hints in order to get started with their neural network studies. It will shortly appear in World Scientific's Internationl Journal of Modern Physics C: Compuational Physics. LARGE SCALE NEURAL NETWORK SIMULATIONS G.A. 
Kohring HLRZ an der KFA Julich (Supercomputing Center at the KFA Julich) The current state of large scale, numerical simulations of neural networks is reviewed. Hardware and software improvements make it likely that biological size networks, i.e., networks with more than $10^{10}$ couplings, can be simulated in the near future. Sample programs for the efficient simulation of a few simple models are presented as an aid to researchers just entering the field. Send Correspondence and request for preprints to: G.A. Kohring HLRZ an der KFA Julich Postfach 1913 D-5170 Julich, West Germany e-mail: hkf218 at djukfa11.bitnet Address after September 1, 1990: Institut fur Theoretische Physik Universitat zu Koln D-5000 Koln 41, West Germany From Connectionists-Request at CS.CMU.EDU Fri Aug 24 10:31:02 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Fri, 24 Aug 90 10:31:02 EDT Subject: Quantitative Linguistics Conference Announcement Message-ID: <10643.651508262@B.GP.CS.CMU.EDU> First QUANTITATIVE LINGUISTICS CONFERENCE (QUALICO) September 23 - 27, 1991 University of Trier, Germany organized by the GLDV - Gesellschaft fuer Linguistische Datenverarbeitung (German Society for Linguistic Computing) and the Editors of "Quantitative Linguistics" OBJECTIVES QUALICO is being held for the first time as an International Conference to demonstrate the state of the art in Quantitative Linguistics. This domain of language study and research is gaining considerable interest due to recent advances in linguistic modelling, particularly in computational linguistics, cognitive science, and developments in mathematics like non- linear systems theory. Progress in hard- and software technology together with ease of access to data and numerical processing has provided new means of empirical data acquisition and the application of mathematical models of adequate complexity. The German Society for Linguistic Computation (Gesellschaft fuer Linguistische Datenverarbeitung - GLDV) and the editors of 'Quantitative Linguistics' have taken the initiative in preparing this conference to take place at the University of Trier, in Trier (Germany), September 23rd - 27th, 1991. In view of the stimulating new developments in Europe and the academic world, the organizers' aim is to encourage and promote mutual exchange of ideas in this field of interest which has been limited in the past. Challenging advances in interdisciplinary quantitative analyses, numerical modelling and experimental simulations from different linguistic domains will be reported on by the following keynote speakers: Gabriel Altmann (Bochum), Michail V. Arapov (Moskau) (pending acceptance), Hans Goebl (Salzburg), Mildred L.G. Shaw (Calgary), John S. Nicolis (Patras), Stuart M. Shieber (Harvard) (pending acceptance). CALL FOR PAPERS The International Program Committee invites communications (long papers: 20 minutes plus 10; short papers: 15 minutes plus 5; demonstrations and posters) on basic research and development as well as on operational applications of Quantitative Linguistics, including - but not limited to - the following topics: A. Methodology 1. Theory Construction - 2. Measurement, Scaling - 3. Taxonomy, Categorizing - 4. Simulation - 5. Statistics, Probabilistic Modells, Stochastic Processes - 6. Fuzzy Theory: Possibilistic Modells - 7. Language and Grammar Formalisms - 8. Systems Theory: Cybernetics and Information Theory, Synergetics, New Connectionism B. Linguistic Analysis and Modelling 1. Phonetics - 2. 
Phonemics - 3. Morphology - 4. Syntax - 5. Semantics - 6. Pragmatics - 7.Lexicology - 8. Dialectology - 9. Typology - 10. Text and Discourse - 11. Semiotics C. Applications 1. Speech Recognition and Synthesis - 2.Text Analysis and Generation - 3. Language Acquisition and Teaching - 4.Text Understanding and Knowledge Representation Authors are asked to submit extended abstracts (1500 words; 4 copies) of their papers in one of the conference's working languages (German, English) not later than December 31, 1990 to: QUALICO - The Program Committee University of Trier P.O.Box 3825 D-5500 TRIER Germany uucp: qualico at utrurt.uucp or: ..!unido!utrurt!qualico X.400: qualico at ldv.rz.uni-trier.dbp.de or: Notice of acceptance will be given by March 31, 1991; and full versions of invited and accepted papers (camera-ready) are due by June 30, 1991 in order to have the Conference Proceedings be published in time to be available for participants at the beginning of QUALICO. This 'Call for Papers' is distributed world-wide in order to reach researchers active in universities and industry. SOCIAL PROGRAMME The oldest city in Germany, founded 16 b.C. by the Romans as Augusta Treverorum in the Mosel valley is situated now in the most Western region of Germany near both the French and Luxembourgian border.In the center of Europe this ancient city will host the participants of QUALICO at the University of Trier, surrounded by the vineyards of the Mosel-Saar-Ruwer wine district at vintage beginning. The excursion day scheduled midway through the conference (September 25, 1991) will provide an opportunity to visit points of historical interest in the city and its vicinity during a boat-trip on the Mosel river. PROGRAM COMMITTEE Chair: B.B. Rieger, University of Trier S. Embleton, University of York, D. Gibbon, University of Bielefeld R. Grotjahn, University of Bochum J. Haller, IAI Saarbruecken P. Hellwig, University of Heidelberg E. Hopkins, University of Bochum J. Kindermann, GMD Bonn-St.Augustin U. Klenk, University of Goettingen R. Koehler, University of Trier J.P. Koester, University of Trier J. Krause, University of Regensburg W. Lehfeldt, University of Konstanz W. Lenders, University of Bonn C. Lischka, GMD Bonn-St.Augustin W. Matthaeus, University of Bochum R.G. Piotrowski, University of Leningrad D. Roesner, FAW Ulm G. Ruge, Siemens AG, Muenchen B. Schaeder, University of Siegen H. Schnelle, University of Bochum J. Sambor, University of Warsaw ORGANIZING COMMITTEE Chair: R. Koehler, University of Trier CONFERENCE FEES Early registration (paid before July 31, 1991): DM 300,- - Members of supporting organizations DM 250,- - Students (without Proceedings) DM 150,- Registration (paid after July 31, 1991): DM 400,- - Members of supporting organizations DM 350,- - Students (without Proceedings) DM 250,- From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Fri Aug 24 12:36:28 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Fri, 24 Aug 90 12:36:28 EDT Subject: Quantitative Linguistics??? Message-ID: Perhaps the people who sent out this conference announcement could follow up with a *brief* description of what quantitative linguistics is all about, and why they are so excited about new advances in the area. I'm not familiar with the term, and the conference announcement didn't make clear how qualitaive linguistics differs from older (qualitative?) linguistic models, except maybe that the key researchers are all in Europe. 
And what does quantitative linguistics have to do with connectionism? -- Scott Fahlman, Carnegie-Mellon University From bms at dcs.leeds.ac.uk Fri Aug 24 13:26:57 1990 From: bms at dcs.leeds.ac.uk (B M Smith) Date: Fri, 24 Aug 90 13:26:57 BST Subject: Item for Distribution Message-ID: <1511.9008241226@csuna6.dcs.leeds.ac.uk> FINAL CALL FOR PAPERS AISB'91 8th SSAISB CONFERENCE ON ARTIFICIAL INTELLIGENCE University of Leeds, UK 16-19 April, 1991 The Society for the Study of Artificial Intelligence and Simulation of Behaviour (SSAISB) will hold its eighth biennial conference at Bodington Hall, University of Leeds, from 16 to 19 April 1991. There will be a Tutorial Programme on 16 April followed by the full Technical Programme. The Programme Chair will be Luc Steels (AI Lab, Vrije Universiteit Brussel). Scope: Papers are sought in all areas of Artificial Intelligence and Simulation of Behaviour, but especially on the following AISB91 special themes: * Emergent functionality in autonomous agents * Neural networks and self-organisation * Constraint logic programming * Knowledge level expert systems research Papers may describe theoretical or practical work but should make a significant and original contribution to knowledge about the field of Artificial Intelligence. A prize of 500 pounds for the best paper has been offered by British Telecom Computing (Advanced Technology Group). It is expected that the proceedings will be published as a book. Submission: All submissions should be in hardcopy in letter quality print and should be written in 12 point or pica typewriter face on A4 or 8.5" x 11" paper, and should be no longer than 10 sides, single-spaced. Each paper should contain an abstract of not more than 200 words and a list of up to four keywords or phrases describing the content of the paper. Five copies should be submitted. Papers must be written in English. Authors should give an electronic mail address where possible. Submission of a paper implies that all authors have obtained all necessary clearances from the institution and that an author will attend the conference to present the paper if it is accepted. Papers should describe work that will be unpublished on the date of the conference. Dates: Deadline for Submission: 1 October 1990 Notification of Acceptance: 7 December 1990 Deadline for camera ready copy: 16 January 1991 Location: Bodington Hall is on the edge of Leeds, in 14 acres of private grounds. The city of Leeds is two and a half hours by rail from London, and there are frequent flights to Leeds/Bradford Airport from London Heathrow, Amsterdam and Paris. The Yorkshire Dales National Park is close by, and the historic city of York is only 30 minutes away by rail. Information: Papers and all queries regarding the programme should be sent to Judith Dennison. All other correspondence and queries regarding the conference to the Local Organiser, Barbara Smith. Ms. Judith Dennison Dr. 
Barbara Smith Cognitive Sciences Division of AI University of Sussex School of Computer Studies Falmer University of Leeds Brighton BN1 9QN Leeds LS2 9JT UK UK Tel: (+44) 273 678379 Tel: (+44) 532 334627 Email: judithd at cogs.sussex.ac.uk FAX: (+44) 532 335468 Email: aisb91 at ai.leeds.ac.uk From sankar at caip.rutgers.edu Fri Aug 24 17:19:35 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Fri, 24 Aug 90 17:19:35 EDT Subject: No subject Message-ID: <9008242119.AA06389@caip.rutgers.edu> Rutgers University CAIP Center CAIP Neural Network Workshop 15-17 October 1990 A neural network workshop will be held during 15-17 October 1990 in East Brunswick, New Jersey under the sponsorship of the CAIP Center of Rutgers University. The theme of the workshop will be "Theory and impact of Neural Networks on future technology" Leaders in the field from government, industry and academia will present the state-of-the-art theory and applications of neural networks. Attendance will be limited to about 100 participants. A Partial List of Speakers and Panelists include: J. Alspector, Bellcore A. Barto, University of Massachusetts R. Brockett, Harvard University L. Cooper, Brown University J. Cowan, University of Chicago K. Fukushima, Osaka University D. Glasser, University of California, Berkeley S. Grossberg, Boston University R. Hecht-Nielsen, HNN, San Diego J. Hopfield, California Institute of Technology L. Jackel, AT&T Bell Labs. S. Kirkpatrick, IBM, T.J. Watson Research Center S. Kung, Princeton University F. Pineda, JPL, California Institute of Technology R. Linsker, IBM, T.J. Watson Research Center J. Moody, Yale University E. Sontag, Rutgers University H. Stark, Illinois Institute of Technology B. Widrow, Stanford University Y. Zeevi, CAIP Center, Rutgers University and The Technion, Israel The workshop will begin with registration at 8:30 AM on Monday, 15 October and end at 7:00 PM on Wednesday, 17 October. There will be dinners on Tuesday and Wednesday evenings followed by special-topic discussion sessions. The $395 registration fee ($295 for participants from CAIP member organizations), includes the cost of the dinners. Participants are expected to remain in attendance throughout the entire period of the workshop. Proceedings of the workshop will subsequently be published in book form. Individuals wishing to participate in the workshop should fill out the attached form and mail it to the address indicated. If there are any questions, please contact Prof. Richard Mammone Department of Electrical and Computer Engineering Rutgers University P.O. Box 909 Piscataway, NJ 08854 Telephone: (201)932-5554 Electronic Mail: mammone at caip.rutgers.edu FAX: (201)932-4775 Telex: 6502497820 mci Rutgers University CAIP Center CAIP Neural Network Workshop 15-17 October 1990 I would like to register for the Neural Network Workshop. 
Title:________ Last:_________________ First:_______________ Middle:__________ Affiliation _________________________________________________________ Address _________________________________________________________ ______________________________________________________ Business Telephone: (___)________ FAX:(___)________ Electronic Mail:_______________________ Home Telephone:(___)________ I am particularly interested in the following aspects of neural networks: _______________________________________________________________________ _______________________________________________________________________ Fee enclosed $_______ Please bill me $_______ Please complete the above and mail this form to: Neural Network Workshop CAIP Center, Rutgers University Brett and Bowser Roads P.O. Box 1390 Piscataway, NJ 08855-1390 (USA) From bms at dcs.leeds.ac.uk Fri Aug 24 13:31:19 1990 From: bms at dcs.leeds.ac.uk (B M Smith) Date: Fri, 24 Aug 90 13:31:19 BST Subject: Item for Distribution Message-ID: <1560.9008241231@csuna6.dcs.leeds.ac.uk> FINAL CALL FOR PAPERS AISB'91 8th SSAISB CONFERENCE ON ARTIFICIAL INTELLIGENCE University of Leeds, UK 16-19 April, 1991 The Society for the Study of Artificial Intelligence and Simulation of Behaviour (SSAISB) will hold its eighth biennial conference at Bodington Hall, University of Leeds, from 16 to 19 April 1991. There will be a Tutorial Programme on 16 April followed by the full Technical Programme. The Programme Chair will be Luc Steels (AI Lab, Vrije Universiteit Brussel). Scope: Papers are sought in all areas of Artificial Intelligence and Simulation of Behaviour, but especially on the following AISB91 special themes: * Emergent functionality in autonomous agents * Neural networks and self-organisation * Constraint logic programming * Knowledge level expert systems research Papers may describe theoretical or practical work but should make a significant and original contribution to knowledge about the field of Artificial Intelligence. A prize of 500 pounds for the best paper has been offered by British Telecom Computing (Advanced Technology Group). It is expected that the proceedings will be published as a book. Submission: All submissions should be in hardcopy in letter quality print and should be written in 12 point or pica typewriter face on A4 or 8.5" x 11" paper, and should be no longer than 10 sides, single-spaced. Each paper should contain an abstract of not more than 200 words and a list of up to four keywords or phrases describing the content of the paper. Five copies should be submitted. Papers must be written in English. Authors should give an electronic mail address where possible. Submission of a paper implies that all authors have obtained all necessary clearances from the institution and that an author will attend the conference to present the paper if it is accepted. Papers should describe work that will be unpublished on the date of the conference. Dates: Deadline for Submission: 1 October 1990 Notification of Acceptance: 7 December 1990 Deadline for camera ready copy: 16 January 1991 Location: Bodington Hall is on the edge of Leeds, in 14 acres of private grounds. The city of Leeds is two and a half hours by rail from London, and there are frequent flights to Leeds/Bradford Airport from London Heathrow, Amsterdam and Paris. The Yorkshire Dales National Park is close by, and the historic city of York is only 30 minutes away by rail. Information: Papers and all queries regarding the programme should be sent to Judith Dennison. 
All other correspondence and queries regarding the conference to the Local Organiser, Barbara Smith.
Ms. Judith Dennison, Cognitive Sciences, University of Sussex, Falmer, Brighton BN1 9QN, UK. Tel: (+44) 273 678379. Email: judithd at cogs.sussex.ac.uk
Dr. Barbara Smith, Division of AI, School of Computer Studies, University of Leeds, Leeds LS2 9JT, UK. Tel: (+44) 532 334627. FAX: (+44) 532 335468. Email: aisb91 at ai.leeds.ac.uk
From tgd at turing.CS.ORST.EDU Fri Aug 24 17:55:56 1990 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Fri, 24 Aug 90 14:55:56 PDT Subject: Human confusability of phonemes Message-ID: <9008242155.AA06954@turing.CS.ORST.EDU> I am conducting a comparison study of several learning algorithms on the nettalk task. To make the comparisons fair, I would like to be able to rate the severity of prediction errors made by these algorithms. For example, if the desired phoneme is /k/ (the k in "key") and the phoneme produced by the learned network is /e/ (the a in "late"), then this is a bad error. On the other hand, substituting /x/ (the a in "pirate") for /@/ (the a in "cab") should probably not count as much of an error. Can any readers point me to research that has been done on the confusability of different phonemes (i.e., to what extent human listeners can confuse two phonemes or reliably detect their difference)? Thanks, Tom Dietterich Thomas G. Dietterich Department of Computer Science Dearborn Hall, 306 Oregon State University Corvallis, OR 97331-3202 From schraudo%cs at ucsd.edu Fri Aug 24 18:18:46 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Fri, 24 Aug 90 15:18:46 PDT Subject: TR announcement (hardcopy and ftp) Message-ID: <9008242218.AA14587@beowulf.ucsd.edu> The following technical report is now available in print: -------- Dynamic Parameter Encoding for Genetic Algorithms ------------------------------------------------- Nicol N. Schraudolph Richard K. Belew The selection of fixed binary gene representations for real-valued parameters of the phenotype required by Holland's genetic algorithm (GA) forces either the sacrifice of representational precision for efficiency of search or vice versa. Dynamic Parameter Encoding (DPE) is a mechanism that avoids this dilemma by using convergence statistics derived from the GA population to adaptively control the mapping from fixed-length binary genes to real values. By reducing the length of genes DPE causes the GA to focus its search on the interactions between genes rather than the details of allele selection within individual genes. DPE also highlights the general importance of the problem of premature convergence in GAs, explored here through two convergence models. -------- To obtain a hardcopy, request technical report LAUR 90-2795 via e-mail from office%bromine at LANL.GOV, or via plain mail from Technical Report Requests CNLS, MS-B258 Los Alamos National Laboratory Los Alamos, NM 87545 USA -------- As previously announced, the report is also available in compressed PostScript format for anonymous ftp from the Artificial Life archive server. To obtain a copy, use the following procedure: $ ftp iuvax.cs.indiana.edu % (or 129.79.254.192) login: anonymous password: ftp> cd pub/alife/papers ftp> binary ftp> get schrau90-dpe.ps.Z ftp> quit $ uncompress schrau90-dpe.ps.Z $ lpr schrau90-dpe.ps -------- The DPE algorithm is an option in the GENESIS 1.1ucsd GA simulator, which will be ready for distribution (via anonymous ftp) shortly. Procedures for obtaining 1.1ucsd will then be announced on this mailing list.
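For readers skimming the abstract above, here is a loose sketch in C of the zooming idea that Dynamic Parameter Encoding is built around, as I read it: genes stay fixed-length, but the real interval they decode into is narrowed once the population's convergence statistics show that the search has settled into one half of it. The function and variable names, the 90% threshold, and the halving rule are illustrative assumptions, not the procedure in the report itself, and the sketch omits the re-encoding of genes that a real implementation must perform after each zoom.

#include <stdio.h>

#define BITS 8

/* map a BITS-bit integer onto the current interval [lo, hi] */
double decode(unsigned gene, double lo, double hi)
{
    return lo + (hi - lo) * (double)gene / (double)((1u << BITS) - 1u);
}

/* crude convergence statistic: if nearly all individuals decode into one
   half of the interval, zoom in on that half.  A real implementation would
   also re-encode the genes relative to the new, narrower interval. */
void maybe_zoom(const unsigned *pop, int n, double *lo, double *hi)
{
    double mid = 0.5 * (*lo + *hi);
    int low = 0, i;
    for (i = 0; i < n; i++)
        if (decode(pop[i], *lo, *hi) < mid) low++;
    if (low > (9 * n) / 10)      *hi = mid;   /* population sits in lower half */
    else if (low < n / 10)       *lo = mid;   /* population sits in upper half */
}

int main(void)
{
    unsigned pop[4] = { 10u, 20u, 30u, 40u };  /* toy population, all small values */
    double lo = 0.0, hi = 1.0;
    maybe_zoom(pop, 4, &lo, &hi);
    printf("interval is now [%g, %g]\n", lo, hi);  /* prints [0, 0.5] */
    return 0;
}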
-------- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From mikek at wasteheat.colorado.edu Mon Aug 27 19:42:44 1990 From: mikek at wasteheat.colorado.edu (Mike Kranzdorf) Date: Mon, 27 Aug 90 17:42:44 -0600 Subject: Mactivation - new info Message-ID: <9008272342.AA25683@wasteheat.colorado.edu> ***Please note new physical address*** Mactivation is an introductory neural network simulator which runs on all Macintoshes. A graphical interface provides direct access to units, connections, and patterns. Basic concepts of associative memory and network operation can be explored, with many low-level parameters available for modification. Back-propagation is not supported. A user's manual containing an introduction to connectionist networks and program documentation is included on one 800K Macintosh disk. The current version is 3.3. Mactivation is available from the author, Mike Kranzdorf. The program may be freely copied, including for classroom distribution. To obtain a copy, send your name and address and a check payable to Mike Kranzdorf for $5 (US). International orders should send either an international postal money order for five dollars US or ten (10) international postal coupons. Mactivation 3.2 is available via anonymous ftp on boulder.colorado.edu. Please don't ask me how to deal with ftp - that's why I offer it via snail mail. I will probably post version 3.3 soon; it depends on some politics here. Mike Kranzdorf P.O. Box 1379 Nederland, CO 80466-1379 From mikek at wasteheat.colorado.edu Tue Aug 28 12:24:52 1990 From: mikek at wasteheat.colorado.edu (Mike Kranzdorf) Date: Tue, 28 Aug 90 10:24:52 -0600 Subject: Mactivation ftp location Message-ID: <9008281624.AA26266@wasteheat.colorado.edu> Sorry I forgot to include the ftp specifics: Machine: boulder.colorado.edu Directory: /pub File Name: mactivation.3.2.sit.hqx.Z I really will try to put version 3.3 there soon. Please send me comments if you use Mactivation. I am very responsive to good suggestions and will add them when possible. Back-prop will come in version 4.0, but that's a complete re-write. I can add smaller things to 3.3. --mike From pako at neuronstar.it.lut.fi Thu Aug 30 05:05:47 1990 From: pako at neuronstar.it.lut.fi (Pasi Koikkalainen) Date: Thu, 30 Aug 90 12:05:47 +0300 Subject: ICANN International Conference on Artificial Neural Networks Message-ID: <9008300905.AA01460@neuronstar.it.lut.fi> ICANN-91 INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS Helsinki University of Technology Espoo, Finland, June 24-28, 1991 Conference Chair: Teuvo Kohonen (Finland). Program Chair: Igor Aleksander (England). Conference Committee: Bernard Angeniol (France), Eduardo Caianiello (Italy), Rolf Eckmiller (FRG), John Hertz (Denmark), Luc Steels (Belgium). CALL FOR PAPERS =================== THE CONFERENCE: =============== Theories, implementations, and applications of Artificial Neural Networks are progressing at a growing speed both in Europe and elsewhere. The first commercial hardware for neural circuits and systems is emerging. This conference will be a major international contact forum for experts from academia and industry worldwide. Around 1000 participants are expected.
ACTIVITIES: =========== - Tutorials - Invited talks - Oral and poster sessions - Prototype demonstrations - Video presentations - Industrial exhibition ------------------------------------------------------------------------- Complete papers of at most 6 pages are invited for oral or poster presentation in one of the sessions given below: 1. Mathematical theories of networks and dynamical systems 2. Neural network architectures and algorithms (including organizations and comparative studies) 3. Artificial associative memories 4. Pattern recognition and signal processing (especially vision and speech) 5. Self-organization and vector quantization 6. Robotics and control 7. "Neural" knowledge data bases and non-rule-based decision making 8. Software development (design tools, parallel algorithms, and software packages) 9. Hardware implementations (coprocessors, VLSI, optical, and molecular) 10. Commercial and industrial applications 11. Biological and physiological connection (synaptic and cell functions, sensory and motor functions, and memory) 12. Neural models for cognitive science and high-level brain functions 13. Physics connection (thermodynamical models, spin glasses, and chaos) -------------------------------------------------------------------------- Deadline for submitting manuscripts is January 15, 1991. The Conference Proceedings will be published as a book by Elsevier Science Publishers B.V. Deadline for sending final papers on the special forms is March 15, 1991. For more information and instructions for submitting manuscripts, please contact: Prof. Olli Simula ICANN-91 Organization Chairman Helsinki University of Technology SF-02150 Espoo, Finland Fax: +358 0 451 3277 Telex: 125161 HTKK SF Email (internet): icann91 at hutmc.hut.fi --------------------------------------------------------------------------- In addition to the scientific program, several social occasions will be included in the registration fee. Pre- and post-conference tours and excursions will also be arranged. For more information about registration and accommodation, please contact: Congress Management Systems P.O.Box 151 SF-00141 Helsinki, Finland Tel.: +358 0 175 355 Fax: +358 0 170 122 Telex: 123585 CMS SF From uhr at cs.wisc.edu Thu Aug 30 12:30:30 1990 From: uhr at cs.wisc.edu (Leonard Uhr) Date: Thu, 30 Aug 90 11:30:30 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008301630.AA10562@thor.cs.wisc.edu> A quick response to the responses to my comments on the gap between nets and computer vision (I've been out of town, and now trying to catch up on mail): I certainly wasn't suggesting that the number of input nodes matters, but simply that complex images must be resolved in enough detail to be recognizable. Gary Cottrell's 64x64 images may be adequate for faces (tho I suspect finer resolution is needed as more people are used, with many different expressions (much less rotations) for each). But the point is that complete connectivity from layer to layer needs O(N**2) links, and the fact that "a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of how complete connectivity dominates. Once the preprocessor is handled by the net itself it will either need too many links or have ad hoc structure. It's surely better to use partial connectivity (e.g., local - which is a very general assumption motivated by physical interactions and brain structure) than some inevitably ad hoc preprocessing steps of unknown value. 
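As a back-of-the-envelope illustration of the O(N**2) point being argued here, the toy C program below just counts weights for one hidden layer fed by a 64x64 image, comparing complete connectivity with 8x8 local receptive fields. The layer sizes are hypothetical round numbers, not figures from any of the systems mentioned in this thread.

/* Illustrative arithmetic only: weight counts for one hidden layer fed by a
   64x64 input, under full connectivity versus local 8x8 receptive fields. */
#include <stdio.h>

int main(void)
{
    long inputs = 64L * 64L;   /* 4096 input units                 */
    long hidden = 80L;         /* hidden units (hypothetical size) */
    long rfield = 8L * 8L;     /* one local 8x8 patch per hidden unit */

    long full_links  = inputs * hidden;   /* complete layer-to-layer connectivity */
    long local_links = rfield * hidden;   /* partial (local) connectivity         */

    printf("fully connected:   %ld weights\n", full_links);   /* 327680 */
    printf("locally connected: %ld weights\n", local_links);  /*   5120 */
    return 0;
}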
Evaluation is tedious and unrewarding, but without it we simply can't make claims or compare systems. I'm not arguing against nets - to the contrary, I think that highly parallel nets are the only possibility for handling really hard problems like recognition, language handling, and reasoning. But they'll need much better structure (or the ability to evolve and generate needed structures). And I was asking for objective evidence that 3-layer feed-forward nets with links between all nodes in adjacent layers actually handle complex images better than some of the large and powerful computer vision systems. True - we know that in theory they can do anything. But that's no better than knowing that random search through the space of all Turing machine programs can do anything. Len Uhr From ahmad at ICSI.Berkeley.EDU Thu Aug 30 16:20:13 1990 From: ahmad at ICSI.Berkeley.EDU (Subutai Ahmad) Date: Thu, 30 Aug 90 13:20:13 PDT Subject: Summary (long): pattern recognition comparisons In-Reply-To: Leonard Uhr's message of Thu, 30 Aug 90 11:30:30 -0500 <9008301630.AA10562@thor.cs.wisc.edu> Message-ID: <9008302020.AA02846@icsib18.Berkeley.EDU> >But the point is that >complete connectivity from layer to layer needs O(N**2) links, and the fact that >"a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of >how complete connectivity dominates. Once the preprocessor is handled by the >net itself it will either need too many links or have ad hoc structure. >It's surely better to use partial connectivity (e.g., local - which is a very >general assumption motivated by physical interactions and brain structure) >than some inevitably ad hoc preprocessing steps of unknown value. Systems with selective attention mechanisms provide yet another way of avoiding the combinatorics. In these models, you can route relevant feature values from arbitrary locations in the image to a central processor. The big advantage is that the central processor can now be quite complex (possibly fully connected) since it only has to deal with a relatively small number of inputs. --Subutai Ahmad ahmad at icsi.berkeley.edu References: Koch, C. and Ullman, S. Shifts in Selective Attention: towards the underlying neural circuitry. Human Neurobiology, Vol 4:219-227, 1985. Ahmad, S. and Omohundro, S. Equilateral Triangles: A Challenge for Connectionist Vision. In Proceedings of the 12th Annual meeting of the Cognitive Science Society, MIT, 1990. Ahmad, S. and Omohundro, S. A Network for Extracting the Locations of Point Clusters Using Selective Attention, ICSI Tech Report No. TR-90-011, 1990. From kawahara at av-convex.ntt.jp Fri Aug 31 10:43:46 1990 From: kawahara at av-convex.ntt.jp (Hideki KAWAHARA) Date: Fri, 31 Aug 90 23:43:46+0900 Subject: JNNS'90 Program Summary (long) Message-ID: <9008311443.AA11611@av-convex.ntt.jp> The first annual conference of the Japan Neural Network Society (JNNS'90) will be held from 10 to 12 September, 1990. The following is the program summary and related information on JNNS. There are 2 invited presentations, 23 oral presentations and 53 poster presentations. Unfortunately, a list of the presentation titles in English is not available yet, because many authors didn't provide English titles for their presentations (official languages for the proceedings were Japanese and English, but only two articles were written in English). I will try to compile the English list by the end of September and would like to introduce it.
If you have any questions or comments, please e-mail to the following address. (Please *DON'T REPLY*.) kawahara at nttlab.ntt.jp - ---------------------------------------------- Hideki Kawahara NTT Basic Research Laboratories 3-9-11, Midori-cho Musashino, Tokyo 180, JAPAN Tel: +81 422 59 2276, Fax: +81 422 59 3393 - ---------------------------------------------- JNNS'90 1990 Annual Conference of Japan Neural Network Society September 10-12, 1990 Tamagawa University, 6-1-1 Tamagawa-Gakuen Machida, Tokyo 194, Japan Program Summary Monday, 10 September 1990 12:00 Registration 13:00 - 16:00 Oral Session O1: Learning 16:00 - 18:00 Poster session P1: Learning, Motion and Architecture 18:00 Organization Committee Tuesday, 11 September 1990 9:00 - 12:00 Oral Session O2: Motion and Architecture 13:00 - 13:30 Plenary Session 13:30 - 15:30 Invited Talk; "Brain Codes of Shapes: Experiments and Models" by Keiji Tanaka "Theories: from 1980's to 1990's" by Shigeru Shinomoto 15:30 - 18:30 Oral Session O3: Vision I 19:00 Reception Wednesday, 12 September 1990 9:00 - 12:00 Oral Session O4: Vision II, Time Series and Dynamics 13:00 - 15:00 Poster Session P2: Vision I, II, Time Series and Dynamics 15:00 - 16:45 Oral Session O5: Dynamics Room 450 is for Oral Session, Plenary Session and Invited talk. Rooms 322, 323, 324, 325 and 350 are for Poster Session. Registration Fees for Conference Members 5000 yen Student members 3000 yen Otherwise 8000 yen Reception 19:00 Tuesday, 12 September 1990 Sakufuu-building Fee: 5000 yen JNNS Officers and Governing board Kunihiko Fukushima Osaka University President Shiun-ichi Amari University of Tokyo International Affair Secretary Minoru Tsukada Tamagawa University Takashi Nagano Hosei University Publication Shiro Usui Toyohashi University of Technology Yoichi Okabe University of Tokyo Sei Miyake NHK Science and Technical Research Labs. Planning Yuichiro Anzai Keio University Keisuke Toyama Kyoto Prefectural School of Medicine Nozomu Hoshimiya Tohoku University Treasurer Naohiro Ishii Nagoya Institute of Technology Hideaki Saito Tamagawa University Regional Affair Ken-ichi Hara Yamagata University Hiroshi Yagi Toyama University Eiji Yodogawa ATR Syozo Yasui Kyushu Institute of Technology Supervisor Noboru Sugie Nagoya University Committee members Editorial Committee (Newsletter and mailing list) Takashi Omori Tokyo University of Agriculture and Technology Hideki Kawahara NTT Basic Research Labs. Itirou Tsuda Kyushu Institute of Technology Planning Committee Kazuyuki Aihara Tokyo Denki University Shigeru Shinomoto Kyoto University Keiji Tanaka The Institute of Physical and Chemical Research JNNS'90 Conference Organizing Committee Sei Miyake NHK Science and Technical Research Labs. General Chairman Keiji Tanaka The Institute of Physical and Chemical Research Program Chairman Shigeru Shinomoto Kyoto University Publicity Chairman Program Takayuki Ito NHK Science and Technical Research Labs. Takashi Omori Tokyo University of Agriculture and Technology Koji Kurata Osaka University Kenji Doya University of Tokyo Kazuhisa Niki Electrotechnical Laboratory Ryoko Futami Tohoku University Publicity Kazunari Nakane ATR Publication Hideki Kawahara NTT Basic Research Labs. Mahito Fujii NHK Science and Technical Research Labs. 
Treasurer Shin-ichi Kita University of Tokyo Manabu Sakakibara Toyohashi University of Technology Local Arrangement Shigeru Tanaka Fundamental Research Labs., NEC Makoto Mizuno Tamagawa University For more details, please contact: Japan Neural Network Society Office Faculty of Engineering, Tamagawa University 6-1-1 Tamagawa-Gakuen Machida, Tokyo 194, Japan Telephone: +81 427 28 3457 Facsimile: +81 427 28 3597
From schraudo%cs at ucsd.edu Sat Aug 4 15:43:20 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Sat, 4 Aug 90 12:43:20 PDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008041943.AA01622@beowulf.ucsd.edu> > From: Leonard Uhr > > Neural nets using backprop have only handled VERY SIMPLE images, usually in > 8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in > logarithmically converging nets, but I don't know of any nets with complete > connectivity from one layer to the next that are that big.) In sharp contrast, > pr/computer vision systems are designed to handle MUCH MORE COMPLEX images (eg > houses, furniture) in 128-by-128 or even larger inputs. So I've been really > surprised to read statements to the effect NN have proved to be much better. > What experimental evidence is there that NN recognize images as complex as > those handled by computer vision and pattern recognition approaches? Well, Gary Cottrell for instance has successfully used a standard (3-layer, fully interconnected) backprop net for various face recognition tasks from 64x64 images. While I agree with you that many NN architectures don't scale well to large input sizes, and that modular, heterogenous architectures have the potential to overcome this limitation, I don't understand why you insist that current NNs could only handle simple images - unless you consider any image with less than 16k pixels simple. Does face recognition qualify as a complex visual task with you? The whole point of using comparatively inefficient NN setups (such as fully interconnected backprop nets) is that they are general enough to solve complex problems without built-in heuristics. Modular NNs require either a lot of prior knowledge about the problem you are trying to solve, or a second adaptive system (such as a GA) to search the architecture space. In the former case the problem is comparatively easy, and in the latter computational complexity rears its ugly head again... having said that, I do believe that GA/NN hybrids will play an important role in the future.
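The "general enough" claim above, and the exchange that follows it, lean on the standard universal-approximation results for single-hidden-layer nets (Cybenko; Hornik, Stinchcombe and White). Roughly stated: for a fixed sigmoidal function sigma, any continuous f on a compact set K in R^n, and any epsilon > 0, there exist N, weight vectors w_i, biases b_i and output coefficients v_i such that

\[ \sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i \cdot x + b_i) \Bigr| < \varepsilon . \]

Note that this is an existence statement only: it bounds neither N nor the difficulty of finding the weights, which is exactly the distinction drawn in the replies below.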
I'm afraid I don't have a reference for Gary Cottrell's work - maybe someone else can post the details? -- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From honavar at cs.wisc.edu Sat Aug 4 20:43:56 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sat, 4 Aug 90 19:43:56 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008050043.AA05173@goat.cs.wisc.edu> >The whole point of using comparatively inefficient NN setups (such as fully >interconnected backprop nets) is that they are general enough to solve >complex problems without built-in heuristics. While I know of theoretical results that show that a feedforward neural net exists that can adequately encode any arbitrary real-valued function (Hornik, Stinchcombe, & White, 1988; Cybenko, 1988; Carroll & Dickinson, 1989), I am not aware of any results that suggest that such nets can LEARN any real-valued function using backpropagation (ignoring the issue of computational tractability). Heuristics (or architectural constraints) like those used by some researchers for some vision problems - locally linked multi-layer converging nets (probably one of the most successful demonstrations is the work of LeCun et al. on handwritten zip code recognition) - are interesting because they constrain (or bias) the network to develop particular types of representations. Also, they might enable efficient learning to take place in tasks that exhibit a certain intrinsic structure. The choice of a particular fixed neural network architecture (even if it is a fully interconnected backprop net) implies the use of a corresponding representational bias. Whether such a representational bias is in any sense more general than some other (e.g., a network of nodes with limited fan-in but sufficient depth) is questionable (for any given completely interconnected feedforward network, there exists a functionally equivalent feedforward network of nodes with limited fan-in - and for some problems, the latter may be more efficient). On a different note, how does one go about assessing the "generality" of a learning algorithm/architecture in practice? I would like to see a discussion on this issue. Vasant Honavar (honavar at cs.wisc.edu) From schraudo%cs at ucsd.edu Sun Aug 5 05:54:43 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Sun, 5 Aug 90 02:54:43 PDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008050954.AA00265@beowulf.ucsd.edu> > From honavar at cs.wisc.edu Sat Aug 4 17:45:01 1990 > > While I know of theoretical results that show that a feedforward > neural net exists that can adequately encode any arbitrary > real-valued function (Hornik, Stinchcombe, & White, 1988; > Cybenko, 1988; Carroll & Dickinson, 1989), I am not aware of > any results that suggest that such nets can LEARN any real-valued > function using backpropagation (ignoring the issue of > computational tractability). > It is my understanding that some of the latest work of Hal White et al. presents a learning algorithm - backprop plus a rule for adding hidden units - that can (in the limit) provably learn any function of interest. (Disclaimer: I don't have the mathematical proficiency required to fully appreciate White et al.'s proofs and thus have to rely on second-hand interpretations.) > On a different note, how does one go about assessing the > "generality" of a learning algorithm/architecture in practice?
> I would like to see a discussion on this issue. > I second this motion. As a starting point for discussion, would the Kolmogorov complexity of an architectural description be useful as a measure of architectural bias? -- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From aarons at cogs.sussex.ac.uk Sun Aug 5 07:57:52 1990 From: aarons at cogs.sussex.ac.uk (Aaron Sloman) Date: Sun, 5 Aug 90 12:57:52 +0100 Subject: Summary (long): pattern recognition comparisons Message-ID: <6816.9008051157@csuna.cogs.susx.ac.uk> > From: Leonard Uhr > > Neural nets using backprop have only handled VERY SIMPLE images..... > .......In sharp contrast, pr/computer vision systems are designed > to handle MUCH MORE COMPLEX images (eg houses, furniture) in > 128-by-128 or even larger inputs.... ..... > From: Nici Schraudolph > Well, Gary Cottrell for instance has successfully used a standard (3-layer, > fully interconnected) backprop net for various face recognition tasks from > 64x64 images. While I agree with you that many NN architectures don't scale > well to large input sizes, and that modular, heterogenous architectures have > the potential to overcome this limitation, I don't understand why you insist > that current NNs could only handle simple images - unless you consider any > image with less than 16k pixels simple. Does face recognition qualify as a > complex visual task with you? > ...... Characterising the complexity of the task in terms of the number of pixels seems to me to miss the most important points. Some (but by no means all) of the people working on NNs appear to have joined the field (the bandwagon?) without feeling obliged to study the AI literature on vision, perhaps because it is assumed that since the AI mechanisms are "wrong" all the literature must be irrelevant? On the contrary, good work in AI vision was concerned with understanding the nature of the task (or rather tasks) of a visual system, independently of the mechanisms postulated to perform those tasks. (When your programs fail you learn more about the nature of the task.) Recognition of isolated objects (e.g. face recognition) is just _one_ of the tasks of vision. Others include: (a) Interpreting a 2-D array (retinal array or optic array) in terms of 3-D structures and relationships. Seeing the 3-D structure of a face is a far more complex task than simply attaching a label: "Igor", "Bruce" or whatever. (b) Segmenting a complex scene into separate objects and describing the relationships between them (e.g. "houses, furniture"!). (The relationships include 2-D and 3-D spatial and functional relations.) Because evidence for boundaries is often unclear and ambiguous, and because recognition has to be based on combinations of features, the segmentation often cannot be done without recognition and recognition cannot be done without segmentation. This chicken and egg problem can lead to dreadful combinatorial searches. NNs offer the prospect of doing some of the searching in parallel by propagating constraints, but as far as I know they have not yet matched the more sophisticated AI visual systems. (It is important to distinguish segmentation, recognition and description of 2-D image fragments from segmentation, recognition and description of 3-D objects. The former seems to be what people in pattern recognition and NN research concentrate on most. 
The latter has been a major concern of AI vision work since the mid/late sixties, starting with L.G. Roberts I think, although some people in AI have continued trying to find 2-D cues to 3-D segmentation. Both 2-D and 3-D interpretations are important in human vision.) (c) Seeing events, processes and their relationships. Change "2-D" to "3-D" and "3-D" to "4-D" in (b) above. We are able to segment, recognize and describe events, processes and causal relationships as well as objects (e.g. following, entering, leaving, catching, bouncing, intercepting, grasping, sliding, supporting, stretching, compressing, twisting, untwisting, etc. etc.) Sometimes, as Johansson showed by attaching lights to human joints in a dark room, motion can be used to disambiguate 3-D structure. (d) Providing information and/or control signals for motor-control mechanisms: e.g. visual feedback is used (unconsciously) for posture control in sighted people, also controlling movement of arm, hand and fingers in grasping, etc. (I suspect that many such processes of fine tuning and control use changing 2-D "image" information rather than (or in addition to) 3-D structural information.) That's still only a partial list of the tasks of a visual system. For more detail see: A. Sloman, `On designing a visual system: Towards a Gibsonian computational model of vision', Journal of Experimental and Theoretical AI 1,4, 1989; Ballard, D.H. and C.M. Brown, Computer Vision, Englewood Cliffs: Prentice Hall, 1982. A system might be able to recognize isolated faces or other objects in an image by using mechanisms that would fail miserably in dealing with cluttered scenes where recognition and segmentation need to be combined. So a NN that recognised faces might tell us nothing about how it is done in natural visual systems, if the latter use more general mechanisms. One area in which I think neither AI nor NN work has made significant progress is shape perception. (I don't mean shape recognition!) People, and presumably many other animals, can see complex, intricate, irregular and varied shapes in a manner that supports a wide range of tasks, including recognizing, grasping, planning, controlling motion, predicting the consequences of motion, copying, building, etc. etc. Although a number of different kinds of shape representations have been explored in work on computer vision, CAD, graphics etc. (e.g. feature vectors; logical descriptions; networks of nodes and arcs; numbers representing co-ordinates, orientations, curvature etc; systems of equations for lines, planes, and other mathematically simple structures; fractals; etc. etc. etc.) they all seem capable of capturing only a superficial subset of what we can see when we look at kittens, sand dunes, crumpled paper, a human torso, a shrubbery, cloud formations, under-water scenes, etc. (Work on computer graphics is particularly misleading, because people are often tempted to think that a representation that _generates_ a natural looking image on a screen must capture what we see in the image, or in the scene that it depicts.) Does anyone have any idea what kind of breakthrough is needed in order to give a machine the kind of grasp of shape that can explain animal abilities to cope with real environments? Is there anything about NN shape representations that gives them an advantage over others that have been explored, and if so what are they?
I suspect that going for descriptions of static geometric structure is a dead end: seeing a shape really involves seeing potential processes involving that shape, and their limits (something like what J.J. Gibson meant by "affordances"?). I.e. a 3-D shape is inherently a vast array of 4-D possibilities and one of the tasks of a visual system is computing a large collection of those possibilities and making them readily available for a variety of subsequent processes. But that's much too vague an idea to be very useful. Or is it? Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England EMAIL aarons at cogs.sussex.ac.uk or: aarons%uk.ac.sussex.cogs at nsfnet-relay.ac.uk From honavar at cs.wisc.edu Sun Aug 5 15:48:37 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sun, 5 Aug 90 14:48:37 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008051948.AA00212@goat.cs.wisc.edu> >It is my understanding that some of the latest work of Hal White et al. >presents a learning algorithm - backprop plus a rule for adding hidden >units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully >appreciate White et al.'s proofs and thus have to rely on second-hand >interpretations.) I can see how allowing the addition of a (potentially unbounded) number of hidden units could enable a back-prop architecture to learn arbitrary functions. But in this sense, any procedure that builds up a look-up table or random-access memory (with some interpolation capability to cover the instances not explicitly stored) using an appropriate set of rules to add units is equally general (and probably more efficient than backprop in terms of time complexity of learning; cf. Baum's proposal for more powerful learning algorithms). However look-up tables can be combinatorially intractable in terms of memory (space) complexity. This brings us to the issue of searching the architectural space along with the weight space in an efficient manner. There has already been some work in this direction (Fahlman's cascade correlation architecture, Ash's DNC, Honavar & Uhr's generative learning, Hanson's meiosis networks, and some recent work on ga-nn hybrids). We have been investigating methods to constrain the search in the architectural space (using heuristic controls / representational bias :-) ). I would like to hear from others who might be working on related issues. Vasant Honavar (honavar at cs.wisc.edu) From galem at mcc.com Sun Aug 5 17:48:25 1990 From: galem at mcc.com (Gale Martin) Date: Sun, 5 Aug 90 16:48:25 CDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008052148.AA02989@sunkist.aca.mcc.com> Leonard Uhr states (about NN learning) "to make learning work, we need to cut down and direct explosive search at least as much as using any other approach." Certainly there is reason to agree with this in the general case, but I doubt its validity in important specific cases. I've spent the past couple of years working on backprop-based handwritten character recognition and find almost no supporting evidence of the need for explicitly cutting down on explosive search through the use of heuristics in these SPECIFIC cases and circumstances. We varied input character array size (10x16, 15x24, 20x32) to backprop nets and found no difference in the number of training samples required to achieve a given level of generalization performance for hand-printed letters.
In nets with one hidden layer, we increased the number of hidden nodes from 50 to 383 and found no increase in the number of training samples needed to achieve high generalization (in fact, generalization is worse for the 50 hidden node case). We experimented extensively with nets having local connectivity and locally-linked nets in this domain and find similarly little evidence to support the need for such heuristics. These results hold across two different types of handwritten character recognition tasks (hand-printed letters and digits). This domain/case-specific robustness across architectural parameters and input size is one way to characterize the generality of a learning algorithm and may recommend one algorithm over another for specific problems. Gale Martin Martin, G. L., & Pittman, J. A. Recognizing hand-printed letters and digits in D.S. Touretzky (Ed.) Advances in Neural Information Processing Systems 2, 1990. Martin, G.L., Leow, W.K. & Pittman, J. A. Function complexity effects on backpropagation learning. MCC Tech Report ACT-HI-062-90. From ganesh at cs.wisc.edu Sun Aug 5 17:59:23 1990 From: ganesh at cs.wisc.edu (Ganesh Mani) Date: Sun, 5 Aug 90 16:59:23 -0500 Subject: Paper Message-ID: <9008052159.AA21968@sharp.cs.wisc.edu> The following paper is available for ftp from the repository at Ohio State. Please backpropagate comments (and errors!) to ganesh at cs.wisc.edu. -Ganesh Mani _________________________________________________________________________ Learning by Gradient Descent in Function Space Ganesh Mani Computer Sciences Dept. University of Wisconsin---Madison ganesh at cs.wisc.edu Abstract Traditional connectionist networks have homogeneous nodes wherein each node executes the same function. Networks where each node executes a different function can be used to achieve efficient supervised learning. A modified back-propagation algorithm for such networks, which performs gradient descent in ``function space,'' is presented and its advantages are discussed. The benefits of the suggested paradigm include faster learning and ease of interpretation of the trained network. _________________________________________________________________________ The following can be used to ftp the paper. unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): neuron ftp> cd pub/neuroprose ftp> type binary ftp> get (remote-file) mani.function-space.ps.Z (local-file) mani.function-space.ps.Z ftp> quit unix> uncompress mani.function-space.ps.Z unix> lpr -P(your_local_postscript_printer) mani.function-space.ps From honavar at cs.wisc.edu Mon Aug 6 00:27:25 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sun, 5 Aug 90 23:27:25 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008060427.AA00489@goat.cs.wisc.edu> We have found that with relatively small sample sizes, generalization performance is improved by local connectivity and weight sharing on simple 2-d patterns. For position-invariant recognition, local connectivity and weight-sharing give substantially better generalization performance than that obtained without local connectivity. Clearly this is a case where extensive empirical studies are needed to draw general conclusions.
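To make the note above concrete, here is a minimal C sketch of local connectivity with weight sharing: a single 5x5 weight patch is applied at every position of the input, so the number of free parameters is 25 regardless of image size, whereas a fully connected layer of the same output size needs tens of thousands. The sizes and names are illustrative assumptions, not taken from the experiments being discussed, and bias terms and the squashing nonlinearity are left out.

#include <stdio.h>

#define IN   16          /* input is IN x IN          */
#define K     5          /* receptive field is K x K  */
#define OUT (IN - K + 1) /* valid output positions    */

double input[IN][IN];    /* left at zero here; a real net would load an image */
double shared_w[K][K];   /* the single weight patch shared by all positions   */
double output[OUT][OUT];

/* slide the shared patch over the input: local connectivity + weight sharing */
void forward(void)
{
    int r, c, i, j;
    for (r = 0; r < OUT; r++)
        for (c = 0; c < OUT; c++) {
            double sum = 0.0;
            for (i = 0; i < K; i++)
                for (j = 0; j < K; j++)
                    sum += shared_w[i][j] * input[r + i][c + j];
            output[r][c] = sum;  /* add bias and squashing function as needed */
        }
}

int main(void)
{
    forward();
    printf("free parameters: %d (vs %d for a fully connected layer)\n",
           K * K, IN * IN * OUT * OUT);
    return 0;
}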
Vasant Honavar (honavar at cs.wisc.edu) From awyk at wapsyvax.oz.au Mon Aug 6 03:17:41 1990 From: awyk at wapsyvax.oz.au (Brian Aw) Date: Mon, 6 Aug 90 15:17:41+0800 Subject: No subject Message-ID: <9008060725.649@munnari.oz.au> Dear Sir/Mdm, Hello! My name is Brian Aw and my e-mail address is awyk at wapsyvax.oz. Would you kindly put me on both your address list and your mailing list for connectionist-related results. I am a Ph.D. student as well as a research officer in the Psychology Department of the University of Western Australia (UWA), Perth. I am working under the supervision of Prof. John Ross who has recently joined your lists. I am an enthusiastic worker in neural network theory. Currently, I am developing a neural network for feature classifications in images. This year, I have published a technical report in the Computer Science Department of UWA in this area. My work has also been accepted for presentation and publication in the forthcoming 4th Australian Joint Conference on Artificial Intelligence (AI'90). Working in this field which advances so rapidly, I certainly need the kind of fast-moving and up-to-date information which your system can provide. Thanking you in advance. brian. From erol at ehei.ehei.fr Mon Aug 6 08:07:39 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Mon, 6 Aug 90 12:09:39 +2 Subject: IJPRAI CALL FOR PAPERS Message-ID: <9008061041.AA24889@inria.inria.fr> Would you consider a paper on my "random network model"? Two papers have already appeared or are appearing in the journal Neural Computation. Best regards, Erol From erol at ehei.ehei.fr Mon Aug 6 05:47:31 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Mon, 6 Aug 90 09:49:31 +2 Subject: Visit to Poland Message-ID: <9008061014.AA24279@inria.inria.fr> I don't know about Poland, but you can contact me in Paris! Erol Gelenbe From INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU Sun Aug 5 15:56:00 1990 From: INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU (INS_ATGE%JHUVMS.BITNET@VMA.CC.CMU.EDU) Date: Sun, 5 Aug 90 14:56 EST Subject: Similarity to Cascade-Correlation Message-ID: As a side note on the problem of using backpropagation on large problems, it should be noted that using efficient error minimization methods (i.e. conjugate-gradient methods) as opposed to the "vanilla" backprop described in _Parallel_Distributed_Processing_ allows one to work with much larger problems, and also allows for much greater performance on problems the network was trained on. For example, an IR target threat detection problem I have been recently working on (with 127 or 254 inputs and 20 training patterns) failed miserably when trained with "vanilla" backprop (hours and hours on a Connection Machine without success). When a conjugate-gradient training program was used, the network was able to learn 100% of the training set perfectly in just a minute or two. >It is my understanding that some of the latest work of Hal White et al. presents a learning algorithm - backprop plus a rule for adding hidden units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully appreciate White et al.'s proofs and thus have to rely on second-hand interpretations.)
How does this new work compare with the Cascade Correlation method developed by Fahlman, where a new hidden unit is added by training its receptive weights to maximize the correlation between its output and the network error, and then trains the projective weights to the outputs to minimize the error (thus only allowing single-layer backprop learning at each iteration)? -Thomas Edwards The Johns Hopkins University / U.S. Naval Research Lab From erol at ehei.ehei.fr Mon Aug 6 11:44:10 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Mon, 6 Aug 90 15:46:10 +2 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008061444.AA05688@inria.inria.fr> I would like to draw your attention to two recent papers of mine (my name is Erol Gelenbe): Random networks with positive and negative signals and product form solutions, in Neural Computation, Vol. 1, No. 4 (1989); Stability of the random network model, in press in Neural Computation. The papers present a new model in which signals travel as "pulses". The quantity looked at in the model is the "neuron potential" in an arbitrarily connected network. I prove that these models have "product form" which means that their state can be computed simply and analytically. Comments and questions are welcome. erol at ehei.ehei.fr From fritz_dg%ncsd.dnet at gte.com Mon Aug 6 17:26:57 1990 From: fritz_dg%ncsd.dnet at gte.com (fritz_dg%ncsd.dnet@gte.com) Date: Mon, 6 Aug 90 17:26:57 -0400 Subject: neural network generators in Ada Message-ID: <9008062126.AA27920@bunny.gte.com> Are there any non-commercial Neural Network "generator programs" or such that are in Ada? (i.e. generates suitable NN code from a set of user-designated specifications, code suitable for embedding, etc.). I'm interested in
- experience developing and using same, lessons learned
- to what uses such have been put, successful?
- nature of; internal use of lists, arrays; what can be user specified, what can't; built-in limitations; level of HMI attached; compilers used; etc., etc.
- and other relevant info developing and applying such from those who have tried developing and using them
Am also interested in opinions on: If you were going to design a NN Maker _today_, how would you design it? If Ada were the language, what special things might be done? Motive should be transparent. My sincere thanks to all who respond. If there is interest, I'll turn the info (if any) around to the list in general. Dave Fritz fritz_dg%ncsd at gte.com (301) 738-8932 ---------------------------------------------------------------------- ---------------------------------------------------------------------- From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Mon Aug 6 23:20:09 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Mon, 06 Aug 90 23:20:09 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Sun, 05 Aug 90 14:56:00 -0500. Message-ID: >It is my understanding that some of the latest work of Hal White et al. presents a learning algorithm - backprop plus a rule for adding hidden units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully appreciate White et al.'s proofs and thus have to rely on second-hand interpretations.)
How does this new work compare with the Cascade Correlation method developed by Fahlman, where a new hidden unit is added by training its receptive weights to maximize the correlation between its output and the network error, and then trains the projective weights to the outputs to minimize the error (thus only allowing single-layer backprop learning at each iteration)? -Thomas Edwards The Johns Hopkins University / U.S. Naval Research Lab I'll take a stab at answering this. Maybe we'll also hear something from Hal White or one of his colleagues -- especially if I somehow misrepresent their work. I believe that all of the published completeness results from White's group assume a single layer of hidden units. They show that this architecture can approximate any desired transfer function (assuming it has certain smoothness properties) to any desired accuracy if you add enough units in this single layer. It's rather like proving that a piecewise linear approximation can approach any desired curve with arbitrarily small error as long as you're willing to use enough tiny pieces. Unless I've missed something, their work does not attempt to say anything about the minimum number of hidden units you might need in this hidden layer. Cascade-Correlation produces a feed-forward network of sigmoid units, but it differs in a number of ways from the kinds of nets considered by White: 1. Cascade-Correlation is intended to be a practical learning algorithm that produces a relatively compact solution as fast as possible. 2. In a Cascade net, each new hidden unit can receive inputs from all pre-existing hidden units. Therefore, each new unit is potentially a new layer. White's results show that you don't really NEED more than a single hidden layer, but having more layers can sometimes result in a very dramatic reduction in the total number of units and weights needed to solve a given problem. 3. There is no convergence proof for Cascade-Correlation. The candidate training phase, in which we try to create new hidden units by hill-climbing in some correlation measure, can and does get stuck in local maxima of this function. That's one reason we use a pool of candidate units: by training many candidates at once, we can greatly reduce the probability of creating new units that do not contribute significantly to the solution, but with a finite candidate pool we can never totally eliminate this possibility. It would not be hard to modify Cascade-Correlation to guarantee that it will eventually grind out a solution. The hard part, for a practical learning algorithm, is to guarantee that you'll find a "reasonably good" solution, however you want to define that. The recent work of Gallant and of Frean are interesting steps in this direction, at least for binary-valued transfer functions and fixed, finite training sets. -- Scott From jamesp at chaos.cs.brandeis.edu Mon Aug 6 21:38:40 1990 From: jamesp at chaos.cs.brandeis.edu (James Pustejovsky) Date: Mon, 6 Aug 90 21:38:40 edt Subject: Visit to Poland In-Reply-To: erol@ehei.ehei.fr's message of Mon, 6 Aug 90 09:49:31 +2 <9008061014.AA24279@inria.inria.fr> Message-ID: <9008070138.AA17019@chaos.cs.brandeis.edu> please withdraw my name from the list. there is too much random and irrelevant noise around the occasional noteworthy bit. From ericj at starbase.MITRE.ORG Tue Aug 7 08:33:27 1990 From: ericj at starbase.MITRE.ORG (Eric Jenkins) Date: Tue, 7 Aug 90 08:33:27 EDT Subject: ref for conjugate-gradient... 
Message-ID: <9008071233.AA25689@starbase> Would someone please post a pointer to info on conjugate-gradient methods of error minimization? Thanks. Eric Jenkins (ericj at ai.mitre.org) From erol at ehei.ehei.fr Tue Aug 7 07:06:28 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Tue, 7 Aug 90 11:08:28 +2 Subject: Call for Papers - ICGA-91 Message-ID: <9008071511.AA21568@inria.inria.fr> Concerning the scope of the conference, could the program chairman indicate what the boundaries of the area of genetic algorithms are in the context of this meeting? This can be indicated by providing one or more references the conference chairman considers to be "typical" work in this area. Erol Gelenbe From erol at ehei.ehei.fr Tue Aug 7 10:42:33 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Tue, 7 Aug 90 14:44:33 +2 Subject: postdoc position available Message-ID: <9008071513.AA21597@inria.inria.fr> From jose at learning.siemens.com Tue Aug 7 19:55:05 1990 From: jose at learning.siemens.com (Steve Hanson) Date: Tue, 7 Aug 90 18:55:05 EST Subject: Similarity to Cascade-Correlation Message-ID: <9008072355.AA05108@learning.siemens.com.siemens.com> Scott: Isn't CC just Cart? Steve From schraudo%cs at ucsd.edu Tue Aug 7 15:05:35 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Tue, 7 Aug 90 12:05:35 PDT Subject: Similarity to Cascade-Correlation Message-ID: <9008071905.AA10253@beowulf.ucsd.edu> > From: INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU > > How does [White et al.'s] new work compare with the Cascade Correlation > method developed by Fahlman [...]? In practical terms, very badly. Their algorithm's point is purely theoretical: they can prove convergence from only a very small base of assumptions about the function to be learned. Do any similar proofs exist for Cascade Correlation? That would be interesting. -- Nicol N. Schraudolph, C-014 nici%cs at ucsd.edu University of California, San Diego nici%cs at ucsd.bitnet La Jolla, CA 92093-0114 ...!ucsd!cs!nici From erol at ehei.ehei.fr Wed Aug 8 07:12:24 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Wed, 8 Aug 90 11:14:24 +2 Subject: abstract Message-ID: <9008081009.AA23199@inria.inria.fr> I would be very interested to get a copy of this paper. Thank you in advance, Erol Gelenbe erol at ehei.ehei.fr From pkube at ucsd.edu Wed Aug 8 15:23:30 1990 From: pkube at ucsd.edu (pkube@ucsd.edu) Date: Wed, 08 Aug 90 13:23:30 MDT Subject: ref for conjugate-gradient... In-Reply-To: Your message of Tue, 07 Aug 90 08:33:27 EDT. <9008071233.AA25689@starbase> Message-ID: <9008082023.AA07129@kokoro.ucsd.edu> For understanding and implementing conjugate gradient and other optimization methods cleverer than vanilla backprop, I've found the following to be useful:
%A William H. Press
%T Numerical Recipes in C: The Art of Scientific Computing
%I Cambridge University Press
%D 1988
%A J. E. Dennis
%A R. B. Schnabel
%T Numerical Methods for Unconstrained Optimization and Nonlinear Equations
%I Prentice-Hall
%D 1983
%A R. Fletcher
%T Practical Methods of Optimization, Vol. 1: Unconstrained Optimization
%I John Wiley & Sons
%D 1980
--Paul Kube at ucsd.edu From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Wed Aug 8 10:09:48 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Wed, 08 Aug 90 10:09:48 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Wed, 08 Aug 90 08:39:55 -0500.
<9008081339.AA05550@learning.siemens.com.siemens.com> Message-ID: I got this clarification from Steve Hanson of his original query, which I found a bit cryptic: Isn't cascade Correlation a version (almost exact except for splitting rule--although I believe CART allows for other splitting rules) of CART---the decision tree with the hyperplane feature space cuts...? My memory of Cart is a bit fuzzy, but I think it's very different from Cascade-Correlation. Unless I'm confused, here are a couple of glaring differences: 1. In a decision-tree setup like CART, each new split works within one of the regions of space that you've already carved out -- that is, within only one branch of the tree. So for something like N-bit parity, you'd need 2^N hidden units (hyperplanes). In a single-layer backprop net, you need only N hidden units because they are shared. Because it creates higher-order units, Cascade-Correlation can generally do the job in less than N. (See the results in the Cascade-Correlation paper.) I don't remember if any version of CART makes serendipitous use of hyperplanes that were created earlier to split other branches. I am pretty sure, however, that it works on splitting just one branch at a time, and doesn't actively try to create hyperplanes that are useful in splitting many branches at once. 2. If you create all your new hidden units in a single layer, all you can do is create hyperplanes in the original space of input features. Because it builds up multiple layers, Cascade-Correlation can create higher-order units of great complexity, not just hyperplanes. If you have the tech report on Cascade-Correlation (the diagrams had to be cut from the NIPS version due to page limitations), look at the strange complex curves it creates in solving the two-spirals problem. If you prefer, Cascade-Correlation works by raising the dimensionality of the space and then drawing hyperplanes in this new complex space, but the projection back onto the original input space does not look like a straight line. I've never heard of anyone solving the two-spirals problem with a single layer of sigmoid or threshold units -- it would take an awful lot of them. I think that these two differences change the game entirely. The only resemblance I see between CART and Cascade-Correlation is that both build up a structure little by little, trying to add new nonlinear elements that eliminate some part of the remaining error. But the kinds of structures the two algorithms deal in is qualitatively different. -- Scott From pollack at cis.ohio-state.edu Wed Aug 8 02:11:16 1990 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Wed, 8 Aug 90 02:11:16 -0400 Subject: Cascade-Correlation, etc Message-ID: <9008080611.AA11352@dendrite.cis.ohio-state.edu> Scott's description of his method and the need for a convergence proof, reminded me of the line of research by Meir & Domany (Complex Sys 2 1988) and Nadal & Mezard (Int.Jrnl. Neural Sys 1,1,1989). In a paper definitely related to theirs (which I cannot find), someone proved (by construction) that each hidden unit added on top of a feedforward TLU network could monotonically decrease the number of errors for arbitrary-fan-in, single-output boolean functions. This result might be generalizable to CC networks. 
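For concreteness, the cascade wiring such a proof would have to cover looks like this: each new hidden unit receives the original inputs plus the outputs of every previously installed hidden unit, so the k-th addition is potentially a k-th layer. A minimal sketch in present-day Python/numpy (the function names, the logistic activation and the linear output are assumptions made for illustration -- this is not Fahlman's simulator code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cascade_forward(x, hidden_weights, output_weights):
        # x              : 1-D array of raw inputs
        # hidden_weights : list; the k-th entry (counting from zero) has length
        #                  n_inputs + k + 1, i.e. weights for the inputs, the k
        #                  earlier hidden units, and a trailing bias
        # output_weights : shape (n_outputs, n_inputs + n_hidden + 1)
        activations = list(x) + [1.0]            # inputs plus bias term
        for w in hidden_weights:
            h = sigmoid(np.dot(w, activations))  # sees everything built so far
            activations.insert(-1, h)            # and feeds every later unit
        return np.dot(output_weights, activations)

Because unit k sees unit k-1's output, the effective depth grows with every addition, which is what lets the network carve the curved two-spirals boundaries described above rather than plain hyperplanes in the input space.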
Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Fax/Phone: (614) 292-4890
From FEGROSS%WEIZMANN.BITNET at VMA.CC.CMU.EDU Thu Aug 9 01:51:08 1990 From: FEGROSS%WEIZMANN.BITNET at VMA.CC.CMU.EDU (Tal Grossman) Date: Thu, 09 Aug 90 08:51:08 +0300 Subject: Network Constructing Algorithms. Message-ID: Network constructing algorithms, i.e. learning algorithms which add units while training, receive a lot of interest these days. I've recently compiled a reference list of papers presenting such algorithms. I send this list as a small contribution to the last discussion. I hope people will find it relevant and useful. Of course, it is probably not exhaustive - and I'd like to hear about any other related work. Note that two refs. are quite old (Hopcroft and Cameron) - from the threshold logic days. A few papers include convergence proofs (Frean, Gallant, Mezard and Nadal, Marchand et al). Naturally, there is a significant overlap between some of the algorithms/architectures. I also apologize for the primitive TeX format. Tal Grossman < fegross at weizmann> Electronics Dept. Weizmann Inst. Rehovot 76100, ISRAEL.
-------------------------------------------------------------------------------
\centerline{\bf Network Generating Learning Algorithms - References.}
T. Ash, ``Dynamic Node Creation in Back-Propagation Networks", Tech.Rep.8901, Inst. for Cognitive Sci., Univ. of California, San-Diego.
Cameron S.H., ``The Generation of Minimal Threshold Nets by an Integer Program", IEEE TEC {\bf EC-13}, 299 (1964).
S.E. Fahlman and C.L. Lebiere, ``The Cascade-Correlation Learning Architecture", in {\it Advances in Neural Information Processing Systems 2}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1990), pp. 524.
M. Frean, ``The Upstart Algorithm: a Method for Constructing and Training Feed Forward Neural Networks", Neural Computation {\bf 2}:2 (1990).
S.I. Gallant, ``Perceptron-Based Learning Algorithms", IEEE Trans. on Neural Networks {\bf 1}, 179 (1990).
M. Golea and M. Marchand, ``A Growth Algorithm for Neural Network Decision Trees", EuroPhys.Lett. {\bf 12}, 205 (1990).
S.J. Hanson, ``Meiosis Networks", in {\it Advances in Neural Information Processing Systems 2}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1990), pp. 533.
Honavar V. and Uhr L., in the {\it Proc. of the 1988 Connectionist Models Summer School}, Touretzky D., Hinton G. and Sejnowski T. eds. (Morgan Kaufmann, San Mateo, 1988).
Hopcroft J.E. and Mattson R.L., ``Synthesis of Minimal Threshold Logic Networks", IEEE TEC {\bf EC-14}, 552 (1965).
Mezard M. and Nadal J.P., ``Learning in Feed Forward Layered Networks - The Tiling Algorithm", J.Phys.A {\bf 22}, 2129 (1989).
J. Moody, ``Fast Learning in Multi Resolution Hierarchies", in {\it Advances in Neural Information Processing Systems 1}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1989).
J.P. Nadal, ``Study of a Growth Algorithm for a Feed Forward Network", International J. of Neural Systems {\bf 1}, 55 (1989).
Rujan P. and Marchand M., ``Learning by Activating Neurons: A New Approach to Learning in Neural Networks", Complex Systems {\bf 3}, 229 (1989); and also in the {\it Proc. of the First International Joint Conference on Neural Networks - Washington D.C. 1989}, Vol. II, pp. 105.
J.A. Sirat and J.P. Nadal, ``Neural Trees: A New Tool for Classification", preprint, submitted to "Network", April 90.
\bye From LAUTRUP at nbivax.nbi.dk Thu Aug 9 05:19:00 1990 From: LAUTRUP at nbivax.nbi.dk (Benny Lautrup) Date: Thu, 9 Aug 90 11:19 +0200 (NBI, Copenhagen) Subject: International Journal of Neural Systems Message-ID: <510E1F38537FE1E6AD@nbivax.nbi.dk> Begin Message: ----------------------------------------------------------------------- INTERNATIONAL JOURNAL OF NEURAL SYSTEMS The International Journal of Neural Systems is a quarterly journal which covers information processing in natural and artificial neural systems. It publishes original contributions on all aspects of this broad subject which involves physics, biology, psychology, computer science and engineering. Contributions include research papers, reviews and short communications. The journal presents a fresh undogmatic attitude towards this multidisciplinary field with the aim to be a forum for novel ideas and improved understanding of collective and cooperative phenomena with computational capabilities. ISSN: 0129-0657 (IJNS) ---------------------------------- Contents of issue number 3 (1990): 1. A. S. Weigend, B. A. Huberman and D. E. Rumelhart: Predicting the future: A connectionist approach. 2. C. Chinchuan, M. Shanblatt and C. Maa: An artificial neural network algorithm for dynamic programming. 3. L. Fan and T. Li: Design of competition based neural networks for combinatorial optimization. 4. E. A. Ferran and R. P. J. Perazzo: Dislexic behaviour of feed-forward neural networks. 5. E. Milloti: Sigmoid versus step functions in feed-forward neural networks. 6. D. Horn and M. Usher: Excitatory-inhibitory networks with dynamical thresholds. 7. J. G. Sutherland: A holographic model of memory, learning and expression. 8. L. Xu: Adding top-down expectations into the learning procedure of self-organizing maps. 9. D. Stork: BOOK REVIEW ---------------------------------- Editorial board: B. Lautrup (Niels Bohr Institute, Denmark) (Editor-in-charge) S. Brunak (Technical Univ. of Denmark) (Assistant Editor-in-Charge) D. Stork (Stanford) (Book review editor) Associate editors: B. Baird (Berkeley) D. Ballard (University of Rochester) E. Baum (NEC Research Institute) S. Bjornsson (University of Iceland) J. M. Bower (CalTech) S. S. Chen (University of North Carolina) R. Eckmiller (University of Dusseldorf) J. L. Elman (University of California, San Diego) M. V. Feigelman (Landau Institute for Theoretical Physics) F. Fogelman-Soulie (Paris) K. Fukushima (Osaka University) A. Gjedde (Montreal Neurological Institute) S. Grillner (Nobel Institute for Neurophysiology, Stockholm) T. Gulliksen (University of Oslo) D. Hammerstroem (University of Oregon) J. Hounsgaard (University of Copenhagen) B. A. Huberman (XEROX PARC) L. B. Ioffe (Landau Institute for Theoretical Physics) P. I. M. Johannesma (Katholieke Univ. Nijmegen) M. Jordan (MIT) G. Josin (Neural Systems Inc.) I. Kanter (Princeton University) J. H. Kaas (Vanderbilt University) A. Lansner (Royal Institute of Technology, Stockholm) A. Lapedes (Los Alamos) B. McWhinney (Carnegie-Mellon University) M. Mezard (Ecole Normale Superieure, Paris) A. F. Murray (University of Edinburgh) J. P. Nadal (Ecole Normale Superieure, Paris) E. Oja (Lappeenranta University of Technology, Finland) N. Parga (Centro Atomico Bariloche, Argentina) S. Patarnello (IBM ECSEC, Italy) P. Peretto (Centre d'Etudes Nucleaires de Grenoble) C. Peterson (University of Lund) K. Plunkett (University of Aarhus) S. A. Solla (AT&T Bell Labs) M. A. Virasoro (University of Rome) D. J. Wallace (University of Edinburgh) D. 
Zipser (University of California, San Diego) ---------------------------------- CALL FOR PAPERS Original contributions consistent with the scope of the journal are welcome. Complete instructions as well as sample copies and subscription information are available from The Editorial Secretariat, IJNS World Scientific Publishing Co. Pte. Ltd. 73, Lynton Mead, Totteridge London N20 8DH ENGLAND Telephone: (44)1-446-2461 or World Scientific Publishing Co. Inc. 687 Hardwell St. Teaneck New Jersey 07666 USA Telephone: (1)201-837-8858 or World Scientific Publishing Co. Pte. Ltd. Farrer Road, P. O. Box 128 SINGAPORE 9128 Telephone (65)278-6188 ----------------------------------------------------------------------- End Message From tgd at turing.CS.ORST.EDU Thu Aug 9 01:36:10 1990 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Wed, 8 Aug 90 22:36:10 PDT Subject: Similarity to Cascade-Correlation In-Reply-To: Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU's message of Wed, 08 Aug 90 10:09:48 EDT <9008090152.AA19554@CS.ORST.EDU> Message-ID: <9008090536.AA01129@turing.CS.ORST.EDU> As someone with a lot of experience in decision-tree learning algorithms, I agree with Scott. The main similarity between Cascade-Correlation (CC) and decision tree algorithms like CART is that they are both greedy. CART and related algorithms (e.g., ID3, C4, CN2, GREEDY3) all work by choosing an (axis-parallel) hyperplane and then subdividing the training data along that hyperplane, whereas CC keeps all of the training data together and keeps retraining the output units as it incrementlly adds hidden units. There is an algorithm, called FRINGE, that learns a decision tree and then uses that tree to define new features which are then used to build a new tree (and this process can be repeated, of course). This is the best example I know of a non-connectionist (supervised) algorithm for defining new features. --Tom From jose at learning.siemens.com Thu Aug 9 10:14:39 1990 From: jose at learning.siemens.com (Steve Hanson) Date: Thu, 9 Aug 90 09:14:39 EST Subject: Similarity to Cascade-Correlation Message-ID: <9008091414.AA07343@learning.siemens.com.siemens.com> thanks for the clarification... however, as I understand CART, it is not required to construct an axis-parallel hyperplane (like ID3 etc..), like CC any hyperplane is possible. Now as I understand CC it does freeze the weights for each hidden unit once asymptotic learning takes place and takes as input to a next candidate hidden unit the frozen hidden unit output (ie hyperplane decision or discriminant function). Consequently, CC does not "...keep all of the training data together and retraining the output units (weights?) as it incrementlly adds hidden units". As to higher-order hidden units... I guess i see what you mean, however, don't units below simply send a decision concerning the subset of data which they have correctly classified? Consequently, units above see the usual input features and a newly learned hidden unit feature indicating that a some subset of the input vectors are on one side of its decision surface? right? Consequently the next hidden unit in the "cascade" can learn to ignore that subset of the input space and concentrate on other parts of the input space that requires yet another hyperplane? It seems as tho this would produce a branching tree of discriminantS similar to cart. n'est pas? 
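For reference, the score the candidate phase actually climbs is the magnitude of the covariance between a candidate's output and the residual output errors, taken over the whole training set rather than over one carved-out subset of it. A minimal sketch of that quantity (numpy-style Python, names invented; see the Cascade-Correlation paper for the exact normalization used):

    import numpy as np

    def candidate_score(candidate_out, residual_errors):
        # candidate_out   : shape (n_patterns,) -- candidate unit's output V_p
        # residual_errors : shape (n_patterns, n_outputs) -- current network
        #                   error E_po at each output for each training pattern
        # Returns  S = sum_o | sum_p (V_p - mean V)(E_po - mean_p E_o) |
        v = candidate_out - candidate_out.mean()
        e = residual_errors - residual_errors.mean(axis=0)
        return np.abs(np.dot(v, e)).sum()

Each candidate in the pool starts from different random input weights and hill-climbs this score; only the best-scoring one is installed, and after installation only its outgoing weights are trained further.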
Steve From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Thu Aug 9 11:38:51 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Thu, 09 Aug 90 11:38:51 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Thu, 09 Aug 90 09:14:39 -0500. <9008091414.AA07343@learning.siemens.com.siemens.com> Message-ID: Now as I understand CC it does freeze the weights for each hidden unit once asymptotic learning takes place and takes as input to a next candidate hidden unit the frozen hidden unit output (ie hyperplane decision or discriminant function). Right. The frozen hidden unit becomes available both for forming an output and as an input to subsequent hidden units. An aside: Instead of "freezing", I've decided to call this "tenure" from now on. When a candidate unit becomes tenured, it no longer has to learn any new behavior, and from that point on other units will pay attention to what it says. Consequently, CC does not "...keep all of the training data together and retraining the output units (weights?) as it incrementlly adds hidden units". How does this follow from the above? As to higher-order hidden units... I guess i see what you mean, however, don't units below simply send a decision concerning the subset of data which they have correctly classified? It's not just a decision. The unit's output can assume any value in its continuous range. Some hidden units develop big weights and tend to act like sharp-threshold units, while others do not. Consequently, units above see the usual input features and a newly learned hidden unit feature indicating that a some subset of the input vectors are on one side of its decision surface? right? Right, modulo the comment above. Consequently the next hidden unit in the "cascade" can learn to ignore that subset of the input space and concentrate on other parts of the input space that requires yet another hyperplane? It seems as tho this would produce a branching tree of discriminantS similar to cart. No, this doesn't follow at all. Typically there are still errors on both sides of the unit just created, so the next unit doesn't ignore either "branch". It produces some new cut that typically subdivides all (or many) of the regions created so far. Again, I suggest you look at the diagrams in the tech report to see the kinds of "cuts" are actually created. n'est pas? Only eagles nest in passes. Lesser birds hide among the branches of decision trees. :-) -- Scott From Connectionists-Request at CS.CMU.EDU Thu Aug 9 13:33:59 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Thu, 09 Aug 90 13:33:59 EDT Subject: Return addresses Message-ID: <24309.650223239@B.GP.CS.CMU.EDU> I have received several complaints from Connectionists members that they are not able 'reply' to messages because the original sender's address has been removed from the message header. This is a problem with the receiver's local mailer. Rather than having me try to remotely trouble shoot 150 different mailers, the problem could be solved by including a return email address as part of the body of any message sent to Connectionists. I would also like to remind subscribers that a copy of main mailing list is available in the Connectionists archives. 
Scott Crowder Connectionists-Request at cs.cmu.edu (ARPAnet) <- see, it isn't that hard ------------------------------------------------------------------------------- The CONNECTIONISTS Archive: --------------------------- All e-mail messages sent to "Connectionists at cs.cmu.edu" starting 27-Feb-88 are now available for public perusal. A separate file exists for each month. The files' names are: arch.yymm where yymm stand for the obvious thing. Thus the earliest available data are in the file: arch.8802 Files ending with .Z are compressed using the standard unix compress program. To browse through these files (as well as through other files, see below) you must FTP them to your local machine. ------------------------------------------------------------------------------- How to FTP Files from the CONNECTIONISTS Archive ------------------------------------------------ 1. Open an FTP connection to host B.GP.CS.CMU.EDU (Internet address 128.2.242.8). 2. Login as user anonymous with password your username. 3. 'cd' directly to one of the following directories: /usr/connect/connectionists/archives /usr/connect/connectionists/bibliographies 4. The archives and bibliographies directories are the ONLY ones you can access. You can't even find out whether any other directories exist. If you are using the 'cd' command you must cd DIRECTLY into one of these two directories. Access will be denied to any others, including their parent directory. 5. The archives subdirectory contains back issues of the mailing list. Some bibliographies are in the bibliographies subdirectory. Problems? - contact us at "Connectionists-Request at cs.cmu.edu". Happy Browsing Scott Crowder Connectionists-Request at cs.cmu.edu ------------------------------------------------------------------------------- From orjan at thalamus.sans.bion.kth.se Thu Aug 9 19:47:34 1990 From: orjan at thalamus.sans.bion.kth.se (Orjan Ekeberg) Date: Thu, 09 Aug 90 19:47:34 N Subject: Network Constructing Algorithms. In-Reply-To: Your message of Thu, 09 Aug 90 08:51:08 O. <9008091705.AAgarbo.bion.kth.se13977@garbo.bion.kth.se> Message-ID: <9008091747.AA12363@thalamus> I assume that some of the work that we have been doing would fit well in this context too. Based on a recurrent network, higher order units are added automatically. The new units become part of the recurrent set and helps to make the training patterns fixpoints of the network. A couple of references (in bibtex format): @inproceedings{sans:alaoe87, author = {Anders Lansner and {\"O}rjan Ekeberg}, year = 1987, title = {An Associative Network Solving the ``4-Bit ADDER Problem''}, booktitle = {Proceedings of the IEEE First Annual International Conference on Neural Networks}, pages = {II{-}549}, address = {San Diego, USA}, month = jun} @inproceedings{sans:paris88, author = {{\"O}rjan Ekeberg and Anders Lansner}, year = 1988, title = {Automatic Generation of Internal Representations in a Probabilistic Artificial Neural Network}, booktitle = {Neural Networks from Models to Applications}, editor = {L. Personnaz and G. Dreyfus}, publisher = {I.D.S.E.T.}, address = {Paris}, pages = {178--186}, note = {Proceedings of {nEuro}-88, The First European Conference on Neural Networks}, abstract = {In a one layer feedback perceptron type network, the connections can be viewed as coding the pairwise correlations between activity in the corresponding units. This can then be used to make statistical inference by means of a relaxation technique based on bayesian inferences. 
When such a network fails, it might be because the regularities are not visible as pairwise correlations. One cure would then be to use a different internal coding where selected higher order correlations are explicitly represented. A method for generating this representation automatically is reviewed and results from experiments regarding the resulting properties is presented with a special focus on the networks ability to generalize properly.}} +---------------------------------+-----------------------+ + Orjan Ekeberg + O---O---O + + Department of Computing Science + \ /|\ /| Studies of + + Royal Institute of Technology + O-O-O-O Artificial + + S-100 44 Stockholm, Sweden + |/ \ /| Neural + +---------------------------------+ O---O-O Systems + + EMail: orjan at bion.kth.se + SANS-project + +---------------------------------+-----------------------+ From pollack at cis.ohio-state.edu Thu Aug 9 12:14:19 1990 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Thu, 9 Aug 90 12:14:19 -0400 Subject: Cascade Correlation and Convergence Message-ID: <9008091614.AA14222@dendrite.cis.ohio-state.edu> Scott's description of his algorithm, and lack of convergence proof, reminded me of the line of research by Meir and Domany (Complex Systems 2, 1988) and Mezard and Nadal (Int J Neu Systems, 1,1 1989) on methods for directly constructing networks. In a related paper (which I cannot find), I'm quite sure that someone proved by construction that any (n input, 1 output) boolean function could be accomplished by a layering of TLU's, where each additional unit is guaranteed to decrease the number of mis-classified inputs. Perhaps this approach would help lead to some convergence proof for CC networks. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Fax/Phone: (614) 292-4890 From bgupta at aries.intel.com Thu Aug 9 19:19:58 1990 From: bgupta at aries.intel.com (Bhusan Gupta) Date: Thu, 9 Aug 90 16:19:58 PDT Subject: Job opening at Intel for NN IC designer Message-ID: <9008092319.AA04843@aries> The neural network group at Intel is looking for an engineer to participate in the development of neural networks. A qualified applicant should have a M.S. or PhD in electrical engineering or equivalent experience. The specialization required is in CMOS circuit design with an emphasis on digital design. Analog design experience is considered useful as well. Familiarity with neural network architectures, learning algorithms, and applications is desirable. The duties that are specific to this job are: Neural network design. Architecture definition and circuit design. Chip planning, layout supervision and verification. Testing and debugging silicon. The neural network design consists primarily of digital design with both a gate-level and transistor-level emphasis. The job is at the Santa Clara site and is currently open. Interested principals can email at bgupta at aries.intel.com until the end of August. Resumes in ascii are preferred. I will pass along all responses to the appropriate people. street address: Bhusan Gupta m/s sc9-40 2250 Mission College Blvd. P.O. Box 58125 Santa Clara, Ca 95052 Intel is an equal opportunity employer, etc. 
Bhusan Gupta From sg at corwin.ccs.northeastern.edu Thu Aug 9 14:34:35 1990 From: sg at corwin.ccs.northeastern.edu (steve gallant) Date: Thu, 9 Aug 90 14:34:35 EDT Subject: Cascade-Correlation, etc Message-ID: <9008091834.AA18306@corwin.CCS.Northeastern.EDU> To respond to Jordan's suggestion, if you copy the output cell from a stage in cascade correlation into your growing network, then the previous convergence results hold for boolean learning problems. This is true whether you copy at every stage or only occasionally. Scott tried a few simulations and there seemed to be some learning speed gain by occasional copying, perhaps 25% on the couple of tests he ran. Also, if I can add an early paper (that includes convergence) to Tal Grossman's list: Gallant, S. I\@. Three Constructive Algorithms for Network Learning. Proc.\ Eighth Annual Conference of the Cognitive Science Society, Amherst, Ma., Aug. 15-17, 1986, 652-660. Steve Gallant From marcus at cns.edinburgh.ac.uk Fri Aug 10 16:37:13 1990 From: marcus at cns.edinburgh.ac.uk (Marcus Frean) Date: Fri, 10 Aug 90 16:37:13 BST Subject: Convergence of constructive algorithms. Message-ID: <8340.9008101537@cns.ed.ac.uk> Jordan Pollack writes: > In a related paper (which I cannot find), I'm quite sure that someone > proved by construction that any (n input, 1 output) boolean function > could be accomplished by a layering of TLU's, where each additional > unit is guaranteed to decrease the number of mis-classified inputs. > Perhaps this approach would help lead to some convergence proof for CC > networks. There are several papers that show convergence via guaranteeing each unit reduces the output's errors by at least one. [NB: They all use linear threshold units, and require for convergence that the training set be composed of binary patterns (or at least convex: every pattern must be separable from all the others), since then the worst case is always that a new unit captures a single pattern and hence is able to correct the output unit by one.] These include The "Tower algorithm": Gallant,S.I. 1986a. Three Constructive Algorithms for Network Learning. Proc. 8th Annual Conf. of Cognitive Science Soc. p652-660. also discussed in Nadal,J. 1989. Study of a Growth Algorithm for Neural Networks International J. of Neural Systems, 1,1:55-59 The performance of this method closely matches that of the "Tiling" Algorithm of Mezard and Nadal, although the proof there is for reduction of at least one error per layer rather than per unit. The "neural decision tree" approach is shown to converge by M. Golea and M. Marchand, A Growth Algorithm for Neural Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). and also J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for Classification, preprint, submitted to "Network", April 90. The "Upstart" algorithm (my favourite....) Frean,M.R. 1990. The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks. Neural Computation. 2:2, 198-209. in which new units are devoted to correcting errors made by existing units (in this sense it has bears some resemblance to Cascade Correlation). A binary tree of units is constructed, but it is not a decision tree: "daughter" units correct their "parent", with the most senior parent being the output unit. Marcus. 
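The single-pattern worst case mentioned in the note above is easy to make concrete: for any binary pattern there is a threshold unit that fires on that pattern and on no other, which is what guarantees the "at least one error per unit" bound these proofs rely on. A small sketch (Python/numpy; the function name is invented):

    import itertools
    import numpy as np

    def grandmother_cell(target):
        # Returns (weights, bias) of a linear threshold unit with
        # weights.x + bias > 0 exactly when x == target, for x in {0,1}^n.
        target = np.asarray(target)
        weights = 2 * target - 1          # +1 where target is 1, else -1
        bias = 0.5 - target.sum()
        return weights, bias

    # exhaustive check on all 4-bit patterns
    w, b = grandmother_cell([1, 0, 1, 1])
    for x in itertools.product([0, 1], repeat=4):
        assert (np.dot(w, x) + b > 0) == (list(x) == [1, 0, 1, 1])

In the worst case each added unit peels off exactly one remaining misclassified pattern this way, and since the training set is finite the construction must terminate.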
--------------------------------------------------------------------- From fanty at cse.ogi.edu Fri Aug 10 13:00:03 1990 From: fanty at cse.ogi.edu (Mark Fanty) Date: Fri, 10 Aug 90 10:00:03 -0700 Subject: conjugate gradient optimization program available Message-ID: <9008101700.AA03174@cse.ogi.edu> The speech group at OGI uses conjugate-gradient optimization to train fully connected feed-forward networks. We have made the program (OPT) available for anonymous ftp: 1. ftp to cse.ogi.edu 2. login as "anonymous" with any password 3. cd to "pub/speech" 4. get opt.tar OPT was written by Etienne Barnard at Carnegie-Mellon University. Mark Fanty Computer Science and Engineering Oregon Graduate Institute fanty at cse.ogi.edu 196000 NW Von Neumann Drive (503) 690-1030 Beaverton, OR 97006-1999 From amini at tcville.hac.com Sun Aug 12 23:47:14 1990 From: amini at tcville.hac.com (Afshin Amini) Date: Sun, 12 Aug 90 20:47:14 PDT Subject: signal processing with neural nets Message-ID: <9008130347.AA02757@ai.spl> Hi there: I would like to explore possibilities of using neural nets in a signal processing environment. I would like to get familiar with usage of neural nets in the area of spectral estimation and classification. I have used the popular methods of high resolution spectral estimation such as AR modeling and such. I would like to get some reffrences to recent publications and books that contain specific algorithms that deploys neural networks to achieve such problems in signal processing. thanks, -A. Amini -- Afshin Amini Hughes Aircraft Co. voice: (213) 616-6558 Electro-Optical and Data Systems Group Signal Processing Lab fax: (213) 607-0918 P.O. Box 902, EO/E1/B108 email: El Segundo, CA 90245 smart: amini at tcville.hac.com Bldg. E1 Room b2316f dumb: amini%tcville at hac2arpa.hac.com uucp: hacgate!tcville!dave From nelsonde%avlab.dnet at wrdc.af.mil Mon Aug 13 10:10:04 1990 From: nelsonde%avlab.dnet at wrdc.af.mil (nelsonde%avlab.dnet@wrdc.af.mil) Date: Mon, 13 Aug 90 10:10:04 EDT Subject: Last Call for Papers for AGARD Conference Message-ID: <9008131410.AA08887@wrdc.af.mil> I N T E R O F F I C E M E M O R A N D U M Date: 13-Aug-1990 10:05am EST From: Dale E. Nelson NELSONDE Dept: AAAT-1 Tel No: 57646 From sankar at caip.rutgers.edu Sun Aug 12 21:15:24 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Sun, 12 Aug 90 21:15:24 EDT Subject: No subject Message-ID: <9008130115.AA08572@caip.rutgers.edu> >>There are several papers that show convergence via guaranteeing each >>unit reduces the output's errors by at least one. >> >> >>The "neural decision tree" approach is shown to converge by >> M. Golea and M. Marchand, A Growth Algorithm for Neural >> Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). >>and also >> J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for >> Classification, preprint, submitted to "Network", April 90. Add to this the following paper: A. Sankar and R.J. Mammone, " A fast learning algorithm for tree neural networks", presented at the 1990 Conference on Information Sciences and Systems, Princeton, NJ, March 21,22,23, 1990. This will appear in the conference proceedings. We also have a more detailed technical report on this research. For copies please contact Ananth Sankar CAIP 117 Brett and Bowser Roads Rutgers University P.O. 
Box 1390 Piscataway, NJ 08855-1390 From sankar at caip.rutgers.edu Mon Aug 13 13:48:53 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Mon, 13 Aug 90 13:48:53 EDT Subject: No subject Message-ID: <9008131748.AA07712@caip.rutgers.edu> An earlier attempt to mail this seems to have failed..my apologies to everyone who gets a duplicate copy. >>There are several papers that show convergence via guaranteeing each >>unit reduces the output's errors by at least one. >> >> >>The "neural decision tree" approach is shown to converge by >> M. Golea and M. Marchand, A Growth Algorithm for Neural >> Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). >>and also >> J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for >> Classification, preprint, submitted to "Network", April 90. Add to this the following paper: A. Sankar and R.J. Mammone, " A fast learning algorithm for tree neural networks", presented at the 1990 Conference on Information Sciences and Systems, Princeton, NJ, March 21,22,23, 1990. This will appear in the conference proceedings. We also have a more detailed technical report on this research. For copies please contact Ananth Sankar CAIP 117 Brett and Bowser Roads Rutgers University P.O. Box 1390 Piscataway, NJ 08855-1390 From gary%cs at ucsd.edu Mon Aug 13 15:35:50 1990 From: gary%cs at ucsd.edu (Gary Cottrell) Date: Mon, 13 Aug 90 12:35:50 PDT Subject: Summary (long): pattern recognition comparisons In-Reply-To: Leonard Uhr's message of Fri, 3 Aug 90 14:18:11 -0500 <9008031918.AA23586@thor.cs.wisc.edu> Message-ID: <9008131935.AA19428@desi.ucsd.edu> Leonard Uhr says: >Neural nets using backprop have only handled VERY SIMPLE images, usually in >8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in >logarithmically converging nets, but I don't know of any nets with complete >connectivity from one layer to the next that are that big.) Mike Fleming and I used 64x64 inputs for face recognition. The system does auto-encoding as a preprocessing step, reducing the number of inputs to 80. See IJCNN-90, Vol II p65->. gary cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at cs.ucsd.edu (ARPA) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET) gcottrell at ucsd.edu (BITNET) From kuepper at ICSI.Berkeley.EDU Tue Aug 14 14:32:37 1990 From: kuepper at ICSI.Berkeley.EDU (Wolfgang Kuepper) Date: Tue, 14 Aug 90 11:32:37 PDT Subject: SIEMENS Job Announcement Message-ID: <9008141832.AA02344@icsib21.Berkeley.EDU> IMAGE UNDERSTANDING and ARTIFICIAL NEURAL NETWORKS The Corporate Research and Development Laboratories of Siemens AG, one of the largest companies worldwide in the electrical and elec- tronics industry, have research openings in the Computer Vision as well as in the Neural Network Groups. The groups do basic and applied studies in the areas of image understanding (document inter- pretation, object recognition, 3D modeling, application of neural networks) and artificial neural networks (models, implementations, selected applications). The Laboratory is located in Munich, an attractive city in the south of the Federal Republic of Germany. Connections exists with our sister laboratory, Siemens Corporate Research in Princeton, as well as with various research institutes and universities in Germany and in the U.S. including MIT, CMU and ICSI. 
Above and beyond the Laboratory facilities, the groups have a network of Sun and DEC workstations, Symbolics Lisp machines, file and compute servers, and dedicated image processing hardware. The successful candidate should have an M.S. or Ph.D. in Computer Science, Electrical Engineering, or any other AI-related or Cognitive Science field. He or she should preferably be able to communicate in German and English. Siemens is an equal opportunity employer. Please send your resume and a reference list to Peter Moeckel Siemens AG ZFE IS INF 1 Otto-Hahn-Ring 6 D-8000 Muenchen 83 West Germany e-mail: gm%bsun4 at ztivax.siemens.com Tel. +49-89-636-3372 FAX +49-89-636-2393 Inquiries may also be directed to Wolfgang Kuepper (on leave from Siemens until 8/91) International Computer Science Institute 1947 Center Street - Suite 600 Berkeley, CA 94704 e-mail: kuepper at icsi.berkeley.edu Tel. (415) 643-9153 FAX (415) 643-7684
From Connectionists-Request at CS.CMU.EDU Thu Aug 16 12:31:34 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Thu, 16 Aug 90 12:31:34 EDT Subject: patience is a virtue Message-ID: <4776.650824294@B.GP.CS.CMU.EDU> Recently a few people have worried that their posts were lost because of the long resend time for messages to the connectionists list. I would like for all users to exercise a little patience. CMU is happy to provide the resources and labor necessary to make the Connectionists list available to the worldwide connectionists community. However, we do have limited resources. The Connectionists redistribution machine is only a VAX 750. This machine also services several other large mailing lists. Delays of 4-6 hours are typical, but delays of >16 hours are possible during high traffic periods. If you are trying to debate an issue with another list member, but think the rest of the list would be interested in the debate, it is best to email directly to the other member and cc: Connectionists at cs.cmu.edu. This allows you to carry on your debate at normal email speeds and lets the rest of the community 'listen in' 6-16 hrs later. If you feel that the delays are a serious impediment to the research progress of the connectionists community, CMU would be happy to accept your donation of a new dedicated Connectionists redistribution machine. Scott Crowder Connectionists-Request at cs.cmu.edu (ARPAnet) PS If you have waited more than 24 hours and STILL haven't received your post, please contact me at Connectionists-Request at cs.cmu.edu.
From xiru at Think.COM Fri Aug 17 16:48:58 1990 From: xiru at Think.COM (xiru@Think.COM) Date: Fri, 17 Aug 90 16:48:58 EDT Subject: backprop for classification Message-ID: <9008172048.AA00756@yangtze.think.com> While we trained a standard backprop network for some classification task (one output unit for each class), we found that when the classes are not evenly distributed in the training set, e.g., 50% of the training data belong to one class, 10% belong to another, ... etc., then the network was always biased towards the classes that have the higher percentage in the training set. Thus, we had to post-process the output of the network, giving more weight to the classes that occur less frequently (in reverse proportion to their population). I wonder if other people have encountered the same problem, and if there are better ways to deal with this problem. Thanks in advance for any replies. - Xiru Zhang Thinking Machines Corp.
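A minimal sketch of the kind of post-processing described above (numpy-style Python, names invented; the replies that follow spell out when this correction is and is not the right thing to do):

    import numpy as np

    def reweight_outputs(net_outputs, train_priors, test_priors=None):
        # net_outputs  : shape (n_classes,) -- raw network outputs for one input
        # train_priors : shape (n_classes,) -- class frequencies in the training set
        # test_priors  : optional (n_classes,) -- priors expected in the field;
        #                leaving it out amounts to weighting classes in inverse
        #                proportion to their training-set population
        scores = np.asarray(net_outputs, dtype=float) / np.asarray(train_priors)
        if test_priors is not None:
            scores = scores * np.asarray(test_priors)
        return scores / scores.sum()

The input is then assigned to the class with the largest rescaled score.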
From John.Hampshire at SPEECH2.CS.CMU.EDU Sun Aug 19 13:48:06 1990 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Sun, 19 Aug 90 13:48:06 EDT Subject: backprop for classification Message-ID: Xiru Zhang of Thinking Machines Corp. writes: > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > Thus, we had to post-process the output of the network, giving more weights > to the classes that occur less frequently (in reverse proportion to their > population). > > I wonder if other people have encountered the same problem, and if there > are better ways to deal with this problem. Indeed, one can show that any classifier with sufficient functional capacity to model the class-conditional densities of the random vector X being classified (e.g., a MLP with sufficient connectivity to perform the input-to-output functional mapping necessary for robust classification) and trained with a "reasonable error measure" (a term originated by B. Pearlmutter) will yield outputs that are accurate estimates of the a posteriori probabilities of X, given an asymptotically large number of statistically independent training samples. Examples of "reasonable error measures" are mean-squared error (the one used by Xiru Zhang), Cross Entropy, Max. Mutual Info., Kullback-Liebler distance, Max. Likelihood... Unfortunately, one never has enough training data, and it's not always clear what constitutes sufficient but not excessive functional capacity in the classifier. So one ends up *estimating* the a posterioris with one's "reasonable error measure"-trained classifier. If one trains one's classifier with a disproportionately high number of samples belonging to one particular class, one will get precisely the behavior Xiru Zhang describes. ************** This is because the a posterioris depend on the class priors (you can prove this easily using Bayes' rule). If you bias the priors, you will bias the a posterioris accordingly. Your classifier will therefore learn to estimate the biased a posterioris. ************** The best way to fix the problem if you're using a "reasonable error measure" to train your classifier is to have a training set that reflects the true class priors. If this isn't possible, then you can post-process the classifier's outputs by correcting for the biased priors. Whether or not this fix really works depends a lot on the classifier you're using. MLPs tend to be over-parameterized, so they tend to yield binary outputs that won't be affected by this kind of post processing. Another approach might be to avoid using "reasonable error measures" to train your classifier. I have more info regarding such alternatives if anyone cares, but I've already blabbed too much. If you want refs., please send me email directly. 
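The Bayes'-rule step alluded to above is short enough to write out. With $p(x\mid c)$ the class-conditional density, $P(c)$ the class prior in the training set and $P'(c)$ the prior expected in the field,

    $$ P(c\mid x) = \frac{p(x\mid c)\,P(c)}{\sum_k p(x\mid k)\,P(k)} $$

so an output that estimates the training-set posterior $P(c\mid x)$ can be converted to the posterior under the field priors by

    $$ P'(c\mid x) \propto P(c\mid x)\,\frac{P'(c)}{P(c)} $$

renormalized over the classes. The correction is exact only when the outputs really are the training-set posteriors, which is the caveat raised above about over-parameterized MLPs whose outputs saturate at 0 or 1.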
Cheers, John
From niranjan at engineering.cambridge.ac.uk Sun Aug 19 10:11:29 1990 From: niranjan at engineering.cambridge.ac.uk (Mahesan Niranjan) Date: Sun, 19 Aug 90 10:11:29 BST Subject: backprop for classification Message-ID: <3447.9008190911@dsl.eng.cam.ac.uk>
> From: xiru at com.think
> Subject: backprop for classification
> Date: 19 Aug 90 00:26:28 GMT
>
> While we trained a standard backprop network for some classification task
> (one output unit for each class), we found that when the classes are not
> evenly distribed in the training set, e.g., 50% of the training data belong
> to one class, 10% belong to another, ... etc., then the network always biased
> towards the classes that have the higher percentage in the training set.
>
This often happens when the network is too small to load the training data. Your network, in this case, does not converge to negligible error. My suggestion is to start with a large network that can load your training data and gradually reduce the size of the net by pruning the weights giving small contributions to the output error. niranjan
From russ at dash.mitre.org Mon Aug 20 07:17:38 1990 From: russ at dash.mitre.org (Russell Leighton) Date: Mon, 20 Aug 90 07:17:38 EDT Subject: backprop for classification In-Reply-To: xiru@Think.COM's message of Fri, 17 Aug 90 16:48:58 EDT <9008172048.AA00756@yangtze.think.com> Message-ID: <9008201117.AA22280@dash.mitre.org> We have found backprop VERY sensitive to the probability of occurrence of each class. As long as you are aware of this you can use this to advantage. For example, if false alarms are a big concern, then by training with large amounts of "noise" you can bias the system to reduce the Pfa. This effect has been quantified analytically and experimentally for systems with no hidden layers in a paper being compiled now. The bottom line is that a no-hidden-layer system implements a classical Mini-Max test if the signal classes are represented equally in the training set. By varying the composition of the training sets, the network can be designed relative to a known maximum false alarm probability independent of signal-to-noise ratio. This work continues for multi-layer systems. An experimental account of how to exploit this effect for signal classification can be found in: Wieland, et al., `An Analysis of Noise Tolerance for a Neural Network Recognition System', Mitre Tech. Rep. MP-88W00021, 1988 and Wieland, et al., `Shaping Schedules as a Method of Accelerated Learning', Proceedings of the first INNS Meeting, 1988 Russ. NFSNET: russ at dash.mitre.org Russell Leighton MITRE Signal Processing Lab 7525 Colshire Dr. McLean, Va. 22102 USA
From wan at whirlwind.Stanford.EDU Mon Aug 20 14:07:39 1990 From: wan at whirlwind.Stanford.EDU (Eric A. Wan) Date: Mon, 20 Aug 90 11:07:39 PDT Subject: Survey of Second Order Techniques Message-ID: <9008201807.AA13338@whirlwind.Stanford.EDU> I am compiling a study on the extent to which researchers have gone beyond simple gradient descent (back-propagation) for training layered neural networks by applying more sophisticated classical techniques in non-linear optimization (e.g. Newton, Quasi-Newton, Conjugate-Gradient methods, etc.). Please e-mail me any comments and/or references that you have on the subject. I will summarize the responses. Thanks in advance.
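In present-day terms the experiment is easy to prototype: flatten the weights into one parameter vector and hand the error function to an off-the-shelf optimizer. A minimal sketch using scipy's conjugate-gradient routine (Python/numpy/scipy; the tiny XOR network and all names are invented for illustration, and in practice one would pass the analytic gradient via the jac argument instead of relying on finite differences):

    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # XOR inputs
    T = np.array([0., 1., 1., 0.])                           # XOR targets
    n_in, n_hid = 2, 4

    def loss(theta):
        # unpack a flat parameter vector into a 2-4-1 network with biases
        W1 = theta[:n_hid * (n_in + 1)].reshape(n_hid, n_in + 1)
        W2 = theta[n_hid * (n_in + 1):]
        Xb = np.hstack([X, np.ones((len(X), 1))])            # append bias input
        H = 1.0 / (1.0 + np.exp(-Xb @ W1.T))                 # hidden layer
        Hb = np.hstack([H, np.ones((len(X), 1))])
        Y = 1.0 / (1.0 + np.exp(-Hb @ W2))                   # output unit
        return 0.5 * np.sum((Y - T) ** 2)

    theta0 = 0.5 * np.random.randn(n_hid * (n_in + 1) + n_hid + 1)
    result = minimize(loss, theta0, method='CG')             # conjugate gradient
    print(result.fun)                                        # final sum-squared error

Swapping method='CG' for method='BFGS' gives a quasi-Newton run of the same experiment.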
Eric Wan wan at isl.stanford.edu From YVES%LAVALVM1.BITNET at vma.CC.CMU.EDU Mon Aug 20 11:36:47 1990 From: YVES%LAVALVM1.BITNET at vma.CC.CMU.EDU (Yves (Zip) Lacouture) Date: Mon, 20 Aug 90 11:36:47 HAE Subject: BP for categorization... Message-ID: > From: xiru at com.think > Subject: backprop for classification > Date: 19 Aug 90 00:26:28 GMT > > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > I encountered the same problem in a similar situation. This occur with limited resources (HU): the network tend to neglet a subset of the stimuli. The phenomenon is also observed when the stimuli have the same presentation probability and the resources are very limited. It helps to use a non-orthogonal representation (e.g. by activating neighbor units). To build a model of (human) simple identification I modified BP to incorporate a selective attention mechanism by which the adaptative modifications are made larger for the stimuli for which performances are worse. I expect to offer a TR on this topic soon. yves From chrisley at parc.xerox.com Mon Aug 20 13:35:08 1990 From: chrisley at parc.xerox.com (Ron Chrisley) Date: Mon, 20 Aug 90 10:35:08 PDT Subject: backprop for classification In-Reply-To: xiru@Think.COM's message of Fri, 17 Aug 90 16:48:58 EDT <9008172048.AA00756@yangtze.think.com> Message-ID: <9008201735.AA07158@owl.parc.xerox.com> Xiru, you wrote: "While we trained a standard backprop network for some classification task (one output unit for each class), we found that when the classes are not evenly distribed in the training set, e.g., 50% of the training data belong to one class, 10% belong to another, ... etc., then the network always biased towards the classes that have the higher percentage in the training set. Thus, we had to post-process the output of the network, giving more weights to the classes that occur less frequently (in reverse proportion to their population)." My suggestion: most BP classification paradigms will work best if you are using the same distribution for training as for testing. So only worry about uneven distribution of classes in the training data if the input on which the network will have to perform does not have that distribution. If rocks are 1000 times more common than mines, then given that something is completely qualitatively ambiguous with respect to the rock/mine distinction, it is best (in terms of minimizing # of misclassifications) to guess that the thing is a rock. So being biased toward rock classifications is a valid way to minimize misclassification. (Of course, once you start factoring in cost, this will be skewed dramatically: it is much better to have a false alarm about a mine than to falsely think a mine is a rock.) In summary, uneven distributions aren't, in themselves, bad for training, nor do they require any post-processing. However, distributions that differ from real-world ones will require some sort of post-processing, as you have done. But there is another issue here, I think. How were you using the network for classification? 
From your message, it sounds like you were training and interpreting the network in such a way that the activations of the output nodes were supposed to correspond to the conditional probabilities of the different classes, given the input. This would explain what you meant by your last sentence in the above quote. But there are other ways of using back-propagation. For instance, if one does not constrain the network to estimate conditional probabilities, but instead has it solve the more general problem of minimizing classification error, then it is possible that the network will come up with a solution that is not affected by differences in the prior probabilities of classes in the training and testing data. Since it is not solving the problem by classifying via maximum likelihood, its solutions will be based on the frequency-independent, qualitative structure of the inputs. In fact, humans often do something like this. The phenomenon is called "base rate neglect". The phenomenon is notorious in that when qualitative differences are not so marked between a rare and a common class, humans will always over-classify inputs into the rare class. That is, if the symptoms a patient has even *slightly* indicate a rare tropical disease over a common cold, humans will give the rare disease diagnosis, even though it is extremely unlikely that the patient has that disease. Of course, the issue of cost is again being ignored here. (See Gluck and Bower for a look at the relation between neural networks and base rate neglect). Such limitations aside, classification via means other than conditional probability estimation may be desirable for certain applications. For example, those in which you do not know the priors, or they change dramatically in an unpredictable way. And/or where there is a strong qualitative division between members of the classes. In such cases, you might get good classification performance, even when the distributions differ, by relying more on qualitative differences in the inputs than on the frequency of the classes. Does this sound right? Ron Chrisley chrisley at csli.stanford.edu Xerox PARC SSL New College Palo Alto, CA 94304 Oxford OX1 3BN, UK (415) 494-4728 (865) 793-484
From niranjan at engineering.cambridge.ac.uk Tue Aug 21 20:20:36 1990 From: niranjan at engineering.cambridge.ac.uk (Mahesan Niranjan) Date: Tue, 21 Aug 90 20:20:36 BST Subject: Backprop for classification Message-ID: <5229.9008211920@dsl.eng.cam.ac.uk>
> From: xiru at com.think
> Subject: backprop for classification
> Date: 19 Aug 90 00:26:28 GMT
>
> While we trained a standard backprop network for some classification task
> (one output unit for each class), we found that when the classes are not
> evenly distribed in the training set, e.g., 50% of the training data belong
> to one class, 10% belong to another, ... etc., then the network always biased
> towards the classes that have the higher percentage in the training set.
>
This often happens when the network is too small to load the training data. Your network, in this case, does not converge to negligible error. My suggestion is to start with a large network that can load your training data and gradually reduce the size of the net by pruning the weights giving small contributions to the output error.
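In its simplest form such pruning can be sketched as follows (Python/numpy, names invented; plain weight magnitude is used here as a crude stand-in for "contribution to the output error" -- a sensitivity-based criterion would be closer to what is suggested above):

    import numpy as np

    def prune_smallest(weights, fraction):
        # Zero out the given fraction of weights with the smallest magnitude
        # (ties at the threshold are pruned as well).  Returns the pruned copy
        # and a 0/1 mask that can be used to hold the pruned weights at zero
        # while the remaining net is retrained.
        flat = np.abs(weights).ravel()
        k = int(fraction * flat.size)
        if k == 0:
            return weights.copy(), np.ones_like(weights)
        threshold = np.sort(flat)[k - 1]
        mask = (np.abs(weights) > threshold).astype(weights.dtype)
        return weights * mask, mask

One would then alternate a few epochs of retraining with further rounds of pruning, stopping when the training error can no longer be driven back down.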
niranjan
From der%beren at Forsythe.Stanford.EDU Wed Aug 22 13:35:59 1990 From: der%beren at Forsythe.Stanford.EDU (Dave Rumelhart) Date: Wed, 22 Aug 90 10:35:59 PDT Subject: BP for categorization...relative frequency problem In-Reply-To: "Yves (Zip) Lacouture"'s message of Mon, 20 Aug 90 11:36:47 HAE <9008210406.AA11690@nprdc.navy.mil> Message-ID: <9008221735.AA07583@beren.> We have also encountered the problem. Since BP does gradient descent and since the contribution of any set of patterns depends in part on the relative frequency of those patterns, fewer resources are allocated to low frequency categories. Moreover, those resources are allocated later in the training -- probably after over-fitting has already become a problem for higher frequency categories. Of course, if your training distribution is the same as your testing distribution you will be getting the appropriate Bayesian estimate of the class probabilities. On the other hand, if the generalization distribution is unknown at test time, we may wish to factor out the relative category frequencies of the training set during training and add any known "priors" during generalization. There are two ways to do this. One way, suggested in one of the notes on this topic, is to "post process" our output data. That is, divide the output unit value by the relative frequency in the training set and multiply by the relative frequency in the test set. This will give you an estimate of the Bayesian probability for the test set. For a variety of reasons, this is less appropriate than correcting during training. In this case, the procedure is to effectively increase the learning rate in inverse proportion to the relative frequency of the category in the training set. Thus, we take bigger learning steps on low frequency categories. In a simple classification task, this is roughly equivalent to normalizing the data set by sampling each category set equally. In the case of cross-classification (in which a given input can be a member of more than one class), it is roughly equivalent to weighting each pattern inversely by the probability that that pattern would occur, given independence between the output classes. We have used this method successfully in a system designed to classify mass spectra. In this method an output of .5 means that the evidence for and against the category is equal. Whereas, in the normal training method, an output equal to the relative frequency in the training set means that the evidence for and against is equal. In some cases this can be very small. It is possible to add the priors in manually and compare performance on the training set with the original method. We find that we do only slightly worse on the training set with the two methods. We do much better in generalization on classes that were low frequency in the training set and slightly worse on classes which were high frequency in the training set. der
From hendler at cs.UMD.EDU Wed Aug 22 16:28:52 1990 From: hendler at cs.UMD.EDU (Jim Hendler) Date: Wed, 22 Aug 90 16:28:52 -0400 Subject: BP for categorization...relative frequency problem Message-ID: <9008222028.AA09120@dormouse.cs.UMD.EDU> Herve Bourlard and Nelson Morgan had to deal with this problem in a system being used in the context of continuous speech recognition. They solved the problem, to some extent, by dividing the output category strengths by the prior probabilities of the training set.
This avoided having to do anything terribly tricky in the network, and let them use classical back-propagation without extension (although I think they've also used some recurrences in one version). I know there have been several nice publications of their work in speech - various papers with the authors Bourlard, Wellekens, and Morgan in various combinations. Morgan is at ICSI, and is probably the most accessible of these authors for requesting reprints. -Jim Hendler UMCP From PSS001%VAXA.BANGOR.AC.UK at vma.CC.CMU.EDU Wed Aug 22 14:47:17 1990 From: PSS001%VAXA.BANGOR.AC.UK at vma.CC.CMU.EDU (PSS001%VAXA.BANGOR.AC.UK@vma.CC.CMU.EDU) Date: Wed, 22 AUG 90 18:47:17 GMT Subject: No subject Message-ID: Department of Psychology, University of Wales, Bangor and Department of Psychology, University of York CONNECTIONISM AND PSYCHOLOGY THREE POST-DOCTORAL RESEARCH FELLOWSHIPS Applications are invited for three post-doctoral research fellowships to work on the connectionist and psychological modelling of human short-term memory and spelling development. Two Fellowships are available for three years, on an ESRC- funded project concerned with the development and evaluation of a connectionist model of short-term memory. One Fellow will be based with Dr. Gordon Brown in the Cognitive Neurocomputation Unit at Bangor and will be responsible for implementing the model. The other Fellow, based at York with Dr. Charles Hulme, will be responsible for undertaking psychological experiments with children and adults to evaluate the model. Starting salary for both posts on research 1A grade up to # 13,495. One two-year Fellowship is available to work on an MRC-funded project to develop a sequential connectionist model of the development of spelling and phonemic awareness in children. This post is based in Bangor with Dr. Gordon Brown. Starting salary on research 1A grade up to # 14,744. Applicants should have postgraduate research experience or interest in cognitive psychology/cognitive science or connectionist/ neural network modelling and computer science. Good computing skills are essential for the posts based in Bangor, and experience in running psychological experiments is required for the York-based post. Excellent computational and research facilities will be available to the successful applicants. The appointments may commence from 1st. October 1990, but start could be delayed until 1st. January 1991. Closing date for applications is 7th. September 1990, but intending applicants should get in touch as soon as possible. Informal enquiries regarding the Bangor-based posts, and requests for further details of the posts and host departments, to Gordon Brown (0248 351151 Ext 2624; email PSS001 at uk.ac.bangor.vaxa); informal enquiries concerning the York-based post to Charles Hulme ( 0904 433145; email ch1 at uk.ac.york.vaxa). Applications (in the form of a curriculum vitae and the names and addresses of two referees) should be sent to Mr. Alan James, Personnel Office, University of Wales, Bangor, Gwynedd LL57 2DG, UK. 
(Apologies to anyone who receives this posting through more than one list or newsgroup) From MUSICO%BGERUG51.BITNET at vma.CC.CMU.EDU Thu Aug 23 17:22:00 1990 From: MUSICO%BGERUG51.BITNET at vma.CC.CMU.EDU (MUSICO%BGERUG51.BITNET@vma.CC.CMU.EDU) Date: Thu, 23 Aug 90 17:22 N Subject: signoff Message-ID: signoff From HKF218%DJUKFA11.BITNET at vma.CC.CMU.EDU Fri Aug 24 12:08:15 1990 From: HKF218%DJUKFA11.BITNET at vma.CC.CMU.EDU (Gregory Kohring) Date: Fri, 24 Aug 90 12:08:15 MES Subject: Preprints Message-ID: The following preprint is currently available. -- Greg Kohring Performance Enhancement of Willshaw Type Networks through the use of Limit Cycles G.A. Kohring HLRZ an der KFA Julich (Supercomputing Center at the KFA Julich) Simulation results of a Willshaw type model for storing sparsely coded patterns are presented. It is suggested that random patterns can be stored in Willshaw type models by transforming them into a set of sparsely coded patterns and retrieving this set as a limit cycle. In this way, the number of steps needed to recall a pattern will be a function of the amount of information the pattern contains. A general algorithm for simulating neural networks with sparsely coded patterns is also discussed, and, on a fully connected network of N=36 864 neurons (1.4 billion couplings), it is shown to achieve effective updating speeds as high as 160 billion coupling evaluations per second on one Cray-YMP processor. ================================================================== Additionally, the following short review article is also available. It is aimed at graduate students in computational physics who need an overview of the neural network literature from a computational sciences viewpoint, as well as some simple programming hints in order to get started with their neural network studies. It will shortly appear in World Scientific's Internationl Journal of Modern Physics C: Compuational Physics. LARGE SCALE NEURAL NETWORK SIMULATIONS G.A. Kohring HLRZ an der KFA Julich (Supercomputing Center at the KFA Julich) The current state of large scale, numerical simulations of neural networks is reviewed. Hardware and software improvements make it likely that biological size networks, i.e., networks with more than $10^{10}$ couplings, can be simulated in the near future. Sample programs for the efficient simulation of a few simple models are presented as an aid to researchers just entering the field. Send Correspondence and request for preprints to: G.A. Kohring HLRZ an der KFA Julich Postfach 1913 D-5170 Julich, West Germany e-mail: hkf218 at djukfa11.bitnet Address after September 1, 1990: Institut fur Theoretische Physik Universitat zu Koln D-5000 Koln 41, West Germany From Connectionists-Request at CS.CMU.EDU Fri Aug 24 10:31:02 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Fri, 24 Aug 90 10:31:02 EDT Subject: Quantitative Linguistics Conference Announcement Message-ID: <10643.651508262@B.GP.CS.CMU.EDU> First QUANTITATIVE LINGUISTICS CONFERENCE (QUALICO) September 23 - 27, 1991 University of Trier, Germany organized by the GLDV - Gesellschaft fuer Linguistische Datenverarbeitung (German Society for Linguistic Computing) and the Editors of "Quantitative Linguistics" OBJECTIVES QUALICO is being held for the first time as an International Conference to demonstrate the state of the art in Quantitative Linguistics. 
This domain of language study and research is gaining considerable interest due to recent advances in linguistic modelling, particularly in computational linguistics, cognitive science, and developments in mathematics like nonlinear systems theory. Progress in hardware and software technology, together with ease of access to data and numerical processing, has provided new means of empirical data acquisition and the application of mathematical models of adequate complexity. The German Society for Linguistic Computation (Gesellschaft fuer Linguistische Datenverarbeitung - GLDV) and the editors of 'Quantitative Linguistics' have taken the initiative in preparing this conference to take place at the University of Trier, in Trier (Germany), September 23rd - 27th, 1991. In view of the stimulating new developments in Europe and the academic world, the organizers' aim is to encourage and promote the mutual exchange of ideas in this field of interest, which has been limited in the past. Challenging advances in interdisciplinary quantitative analyses, numerical modelling and experimental simulations from different linguistic domains will be reported on by the following keynote speakers: Gabriel Altmann (Bochum), Michail V. Arapov (Moscow) (pending acceptance), Hans Goebl (Salzburg), Mildred L.G. Shaw (Calgary), John S. Nicolis (Patras), Stuart M. Shieber (Harvard) (pending acceptance). CALL FOR PAPERS The International Program Committee invites communications (long papers: 20 minutes plus 10; short papers: 15 minutes plus 5; demonstrations and posters) on basic research and development as well as on operational applications of Quantitative Linguistics, including - but not limited to - the following topics: A. Methodology 1. Theory Construction - 2. Measurement, Scaling - 3. Taxonomy, Categorizing - 4. Simulation - 5. Statistics, Probabilistic Models, Stochastic Processes - 6. Fuzzy Theory: Possibilistic Models - 7. Language and Grammar Formalisms - 8. Systems Theory: Cybernetics and Information Theory, Synergetics, New Connectionism B. Linguistic Analysis and Modelling 1. Phonetics - 2. Phonemics - 3. Morphology - 4. Syntax - 5. Semantics - 6. Pragmatics - 7. Lexicology - 8. Dialectology - 9. Typology - 10. Text and Discourse - 11. Semiotics C. Applications 1. Speech Recognition and Synthesis - 2. Text Analysis and Generation - 3. Language Acquisition and Teaching - 4. Text Understanding and Knowledge Representation Authors are asked to submit extended abstracts (1500 words; 4 copies) of their papers in one of the conference's working languages (German, English) not later than December 31, 1990 to: QUALICO - The Program Committee University of Trier P.O. Box 3825 D-5500 TRIER Germany uucp: qualico at utrurt.uucp or: ..!unido!utrurt!qualico X.400: qualico at ldv.rz.uni-trier.dbp.de or: Notice of acceptance will be given by March 31, 1991, and full versions of invited and accepted papers (camera-ready) are due by June 30, 1991, so that the Conference Proceedings can be published in time to be available to participants at the beginning of QUALICO. This 'Call for Papers' is distributed world-wide in order to reach researchers active in universities and industry. SOCIAL PROGRAMME The oldest city in Germany, founded in 16 B.C. 
by the Romans as Augusta Treverorum in the Mosel valley, is now situated in the westernmost region of Germany, near both the French and Luxembourg borders. In the center of Europe, this ancient city will host the participants of QUALICO at the University of Trier, surrounded by the vineyards of the Mosel-Saar-Ruwer wine district at the start of the grape harvest. The excursion day scheduled midway through the conference (September 25, 1991) will provide an opportunity to visit points of historical interest in the city and its vicinity during a boat trip on the Mosel river. PROGRAM COMMITTEE Chair: B.B. Rieger, University of Trier S. Embleton, University of York D. Gibbon, University of Bielefeld R. Grotjahn, University of Bochum J. Haller, IAI Saarbruecken P. Hellwig, University of Heidelberg E. Hopkins, University of Bochum J. Kindermann, GMD Bonn-St.Augustin U. Klenk, University of Goettingen R. Koehler, University of Trier J.P. Koester, University of Trier J. Krause, University of Regensburg W. Lehfeldt, University of Konstanz W. Lenders, University of Bonn C. Lischka, GMD Bonn-St.Augustin W. Matthaeus, University of Bochum R.G. Piotrowski, University of Leningrad D. Roesner, FAW Ulm G. Ruge, Siemens AG, Muenchen B. Schaeder, University of Siegen H. Schnelle, University of Bochum J. Sambor, University of Warsaw ORGANIZING COMMITTEE Chair: R. Koehler, University of Trier CONFERENCE FEES Early registration (paid before July 31, 1991): DM 300,- - Members of supporting organizations DM 250,- - Students (without Proceedings) DM 150,- Registration (paid after July 31, 1991): DM 400,- - Members of supporting organizations DM 350,- - Students (without Proceedings) DM 250,- From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Fri Aug 24 12:36:28 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Fri, 24 Aug 90 12:36:28 EDT Subject: Quantitative Linguistics??? Message-ID: Perhaps the people who sent out this conference announcement could follow up with a *brief* description of what quantitative linguistics is all about, and why they are so excited about new advances in the area. I'm not familiar with the term, and the conference announcement didn't make clear how quantitative linguistics differs from older (qualitative?) linguistic models, except maybe that the key researchers are all in Europe. And what does quantitative linguistics have to do with connectionism? -- Scott Fahlman, Carnegie-Mellon University From bms at dcs.leeds.ac.uk Fri Aug 24 13:26:57 1990 From: bms at dcs.leeds.ac.uk (B M Smith) Date: Fri, 24 Aug 90 13:26:57 BST Subject: Item for Distribution Message-ID: <1511.9008241226@csuna6.dcs.leeds.ac.uk> FINAL CALL FOR PAPERS AISB'91 8th SSAISB CONFERENCE ON ARTIFICIAL INTELLIGENCE University of Leeds, UK 16-19 April, 1991 The Society for the Study of Artificial Intelligence and Simulation of Behaviour (SSAISB) will hold its eighth biennial conference at Bodington Hall, University of Leeds, from 16 to 19 April 1991. There will be a Tutorial Programme on 16 April followed by the full Technical Programme. The Programme Chair will be Luc Steels (AI Lab, Vrije Universiteit Brussel). 
Scope: Papers are sought in all areas of Artificial Intelligence and Simulation of Behaviour, but especially on the following AISB91 special themes: * Emergent functionality in autonomous agents * Neural networks and self-organisation * Constraint logic programming * Knowledge level expert systems research Papers may describe theoretical or practical work but should make a significant and original contribution to knowledge about the field of Artificial Intelligence. A prize of 500 pounds for the best paper has been offered by British Telecom Computing (Advanced Technology Group). It is expected that the proceedings will be published as a book. Submission: All submissions should be in hardcopy in letter quality print and should be written in 12 point or pica typewriter face on A4 or 8.5" x 11" paper, and should be no longer than 10 sides, single-spaced. Each paper should contain an abstract of not more than 200 words and a list of up to four keywords or phrases describing the content of the paper. Five copies should be submitted. Papers must be written in English. Authors should give an electronic mail address where possible. Submission of a paper implies that all authors have obtained all necessary clearances from the institution and that an author will attend the conference to present the paper if it is accepted. Papers should describe work that will be unpublished on the date of the conference. Dates: Deadline for Submission: 1 October 1990 Notification of Acceptance: 7 December 1990 Deadline for camera ready copy: 16 January 1991 Location: Bodington Hall is on the edge of Leeds, in 14 acres of private grounds. The city of Leeds is two and a half hours by rail from London, and there are frequent flights to Leeds/Bradford Airport from London Heathrow, Amsterdam and Paris. The Yorkshire Dales National Park is close by, and the historic city of York is only 30 minutes away by rail. Information: Papers and all queries regarding the programme should be sent to Judith Dennison. All other correspondence and queries regarding the conference should go to the Local Organiser, Barbara Smith. Ms. Judith Dennison, Cognitive Sciences, University of Sussex, Falmer, Brighton BN1 9QN, UK. Tel: (+44) 273 678379. Email: judithd at cogs.sussex.ac.uk. Dr. Barbara Smith, Division of AI, School of Computer Studies, University of Leeds, Leeds LS2 9JT, UK. Tel: (+44) 532 334627. FAX: (+44) 532 335468. Email: aisb91 at ai.leeds.ac.uk From sankar at caip.rutgers.edu Fri Aug 24 17:19:35 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Fri, 24 Aug 90 17:19:35 EDT Subject: No subject Message-ID: <9008242119.AA06389@caip.rutgers.edu> Rutgers University CAIP Center CAIP Neural Network Workshop 15-17 October 1990 A neural network workshop will be held during 15-17 October 1990 in East Brunswick, New Jersey under the sponsorship of the CAIP Center of Rutgers University. The theme of the workshop will be "Theory and impact of Neural Networks on future technology". Leaders in the field from government, industry and academia will present the state-of-the-art theory and applications of neural networks. Attendance will be limited to about 100 participants. A Partial List of Speakers and Panelists includes: J. Alspector, Bellcore A. Barto, University of Massachusetts R. Brockett, Harvard University L. Cooper, Brown University J. Cowan, University of Chicago K. Fukushima, Osaka University D. Glasser, University of California, Berkeley S. Grossberg, Boston University R. Hecht-Nielsen, HNN, San Diego J. 
Hopfield, California Institute of Technology L. Jackel, AT&T Bell Labs. S. Kirkpatrick, IBM, T.J. Watson Research Center S. Kung, Princeton University F. Pineda, JPL, California Institute of Technology R. Linsker, IBM, T.J. Watson Research Center J. Moody, Yale University E. Sontag, Rutgers University H. Stark, Illinois Institute of Technology B. Widrow, Stanford University Y. Zeevi, CAIP Center, Rutgers University and The Technion, Israel The workshop will begin with registration at 8:30 AM on Monday, 15 October and end at 7:00 PM on Wednesday, 17 October. There will be dinners on Tuesday and Wednesday evenings followed by special-topic discussion sessions. The $395 registration fee ($295 for participants from CAIP member organizations) includes the cost of the dinners. Participants are expected to remain in attendance throughout the entire period of the workshop. Proceedings of the workshop will subsequently be published in book form. Individuals wishing to participate in the workshop should fill out the attached form and mail it to the address indicated. If there are any questions, please contact Prof. Richard Mammone Department of Electrical and Computer Engineering Rutgers University P.O. Box 909 Piscataway, NJ 08854 Telephone: (201)932-5554 Electronic Mail: mammone at caip.rutgers.edu FAX: (201)932-4775 Telex: 6502497820 mci Rutgers University CAIP Center CAIP Neural Network Workshop 15-17 October 1990 I would like to register for the Neural Network Workshop. Title:________ Last:_________________ First:_______________ Middle:__________ Affiliation _________________________________________________________ Address _________________________________________________________ ______________________________________________________ Business Telephone: (___)________ FAX:(___)________ Electronic Mail:_______________________ Home Telephone:(___)________ I am particularly interested in the following aspects of neural networks: _______________________________________________________________________ _______________________________________________________________________ Fee enclosed $_______ Please bill me $_______ Please complete the above and mail this form to: Neural Network Workshop CAIP Center, Rutgers University Brett and Bowser Roads P.O. Box 1390 Piscataway, NJ 08855-1390 (USA) 
From tgd at turing.CS.ORST.EDU Fri Aug 24 17:55:56 1990 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Fri, 24 Aug 90 14:55:56 PDT Subject: Human confusability of phonemes Message-ID: <9008242155.AA06954@turing.CS.ORST.EDU> I am conducting a comparison study of several learning algorithms on the nettalk task. To make the comparisons fair, I would like to be able to rate the severity of prediction errors made by these algorithms. For example, if the desired phoneme is /k/ (the k in "key") and the phoneme produced by the learned network is /e/ (the a in "late"), then this is a bad error. On the other hand, substituting /x/ (the a in "pirate") for /@/ (the a in "cab") should probably not count as much of an error. Can any readers point me to research that has been done on the confusability of different phonemes (i.e., to what extent human listeners can confuse two phonemes or reliably detect their difference)? Thanks, Tom Dietterich Thomas G. Dietterich Department of Computer Science Dearborn Hall, 306 Oregon State University Corvallis, OR 97331-3202 From schraudo%cs at ucsd.edu Fri Aug 24 18:18:46 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Fri, 24 Aug 90 15:18:46 PDT Subject: TR announcement (hardcopy and ftp) Message-ID: <9008242218.AA14587@beowulf.ucsd.edu> The following technical report is now available in print: -------- Dynamic Parameter Encoding for Genetic Algorithms ------------------------------------------------- Nicol N. Schraudolph Richard K. 
Belew The selection of fixed binary gene representations for real-valued parameters of the phenotype required by Holland's genetic algorithm (GA) forces either the sacrifice of representational precision for efficiency of search or vice versa. Dynamic Parameter Encoding (DPE) is a mechanism that avoids this dilemma by using convergence statistics derived from the GA population to adaptively control the mapping from fixed-length binary genes to real values. By reducing the length of genes, DPE causes the GA to focus its search on the interactions between genes rather than the details of allele selection within individual genes. DPE also highlights the general importance of the problem of premature convergence in GAs, explored here through two convergence models. -------- To obtain a hardcopy, request technical report LAUR 90-2795 via e-mail from office%bromine at LANL.GOV, or via plain mail from Technical Report Requests CNLS, MS-B258 Los Alamos National Laboratory Los Alamos, NM 87545 USA -------- As previously announced, the report is also available in compressed PostScript format for anonymous ftp from the Artificial Life archive server. To obtain a copy, use the following procedure: $ ftp iuvax.cs.indiana.edu % (or 129.79.254.192) login: anonymous password: ftp> cd pub/alife/papers ftp> binary ftp> get schrau90-dpe.ps.Z ftp> quit $ uncompress schrau90-dpe.ps.Z $ lpr schrau90-dpe.ps -------- The DPE algorithm is an option in the GENESIS 1.1ucsd GA simulator, which will be ready for distribution (via anonymous ftp) shortly. Procedures for obtaining 1.1ucsd will then be announced on this mailing list. -------- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From mikek at wasteheat.colorado.edu Mon Aug 27 19:42:44 1990 From: mikek at wasteheat.colorado.edu (Mike Kranzdorf) Date: Mon, 27 Aug 90 17:42:44 -0600 Subject: Mactivation - new info Message-ID: <9008272342.AA25683@wasteheat.colorado.edu> ***Please note new physical address*** Mactivation is an introductory neural network simulator which runs on all Macintoshes. A graphical interface provides direct access to units, connections, and patterns. Basic concepts of associative memory and network operation can be explored, with many low level parameters available for modification. Back-propagation is not supported. A user's manual containing an introduction to connectionist networks and program documentation is included on one 800K Macintosh disk. The current version is 3.3. Mactivation is available from the author, Mike Kranzdorf. The program may be freely copied, including for classroom distribution. To obtain a copy, send your name and address and a check payable to Mike Kranzdorf for $5 (US). International orders should send either an international postal money order for five dollars US or ten (10) international postal coupons. Mactivation 3.2 is available via anonymous ftp on boulder.colorado.edu. Please don't ask me how to deal with ftp - that's why I offer it via snail mail. I will probably post version 3.3 soon; it depends on some politics here. Mike Kranzdorf P.O. 
Box 1379 Nederland, CO 80466-1379 From mikek at wasteheat.colorado.edu Tue Aug 28 12:24:52 1990 From: mikek at wasteheat.colorado.edu (Mike Kranzdorf) Date: Tue, 28 Aug 90 10:24:52 -0600 Subject: Mactivation ftp location Message-ID: <9008281624.AA26266@wasteheat.colorado.edu> Sorry, I forgot to include the ftp specifics: Machine: boulder.colorado.edu Directory: /pub File Name: mactivation.3.2.sit.hqx.Z I really will try to put version 3.3 there soon. Please send me comments if you use Mactivation. I am very responsive to good suggestions and will add them when possible. Back-prop will come in version 4.0, but that's a complete re-write. I can add smaller things to 3.3. --mike From pako at neuronstar.it.lut.fi Thu Aug 30 05:05:47 1990 From: pako at neuronstar.it.lut.fi (Pasi Koikkalainen) Date: Thu, 30 Aug 90 12:05:47 +0300 Subject: ICANN International Conference on Artificial Neural Networks Message-ID: <9008300905.AA01460@neuronstar.it.lut.fi> ICANN-91 INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS Helsinki University of Technology Espoo, Finland, June 24-28, 1991 Conference Chair: Teuvo Kohonen (Finland) Program Chair: Igor Aleksander (England) Conference Committee: Bernard Angeniol (France), Eduardo Caianiello (Italy), Rolf Eckmiller (FRG), John Hertz (Denmark), Luc Steels (Belgium) CALL FOR PAPERS =================== THE CONFERENCE: =============== Theories, implementations, and applications of Artificial Neural Networks are progressing at a growing speed both in Europe and elsewhere. The first commercial hardware for neural circuits and systems is emerging. This conference will be a major international contact forum for experts from academia and industry worldwide. Around 1000 participants are expected. ACTIVITIES: =========== - Tutorials - Invited talks - Oral and poster sessions - Prototype demonstrations - Video presentations - Industrial exhibition ------------------------------------------------------------------------- Complete papers of at most 6 pages are invited for oral or poster presentation in one of the sessions given below: 1. Mathematical theories of networks and dynamical systems 2. Neural network architectures and algorithms (including organizations and comparative studies) 3. Artificial associative memories 4. Pattern recognition and signal processing (especially vision and speech) 5. Self-organization and vector quantization 6. Robotics and control 7. "Neural" knowledge data bases and non-rule-based decision making 8. Software development (design tools, parallel algorithms, and software packages) 9. Hardware implementations (coprocessors, VLSI, optical, and molecular) 10. Commercial and industrial applications 11. Biological and physiological connection (synaptic and cell functions, sensory and motor functions, and memory) 12. Neural models for cognitive science and high-level brain functions 13. Physics connection (thermodynamical models, spin glasses, and chaos) -------------------------------------------------------------------------- Deadline for submitting manuscripts is January 15, 1991. The Conference Proceedings will be published as a book by Elsevier Science Publishers B.V. Deadline for sending final papers on the special forms is March 15, 1991. For more information and instructions for submitting manuscripts, please contact: Prof. 
Olli Simula ICANN-91 Organization Chairman Helsinki University of Technology SF-02150 Espoo, Finland Fax: +358 0 451 3277 Telex: 125161 HTKK SF Email (internet): icann91 at hutmc.hut.fi --------------------------------------------------------------------------- In addition to the scientific program, several social occasions will be included in the registration fee. Pre- and post-conference tours and excursions will also be arranged. For more information about registration and accommodation, please contact: Congress Management Systems P.O.Box 151 SF-00141 Helsinki, Finland Tel.: +358 0 175 355 Fax: +358 0 170 122 Telex: 123585 CMS SF From uhr at cs.wisc.edu Thu Aug 30 12:30:30 1990 From: uhr at cs.wisc.edu (Leonard Uhr) Date: Thu, 30 Aug 90 11:30:30 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008301630.AA10562@thor.cs.wisc.edu> A quick response to the responses to my comments on the gap between nets and computer vision (I've been out of town, and now trying to catch up on mail): I certainly wasn't suggesting that the number of input nodes matters, but simply that complex images must be resolved in enough detail to be recognizable. Gary Cottrell's 64x64 images may be adequate for faces (tho I suspect finer resolution is needed as more people are used, with many different expressions (much less rotations) for each). But the point is that complete connectivity from layer to layer needs O(N**2) links, and the fact that "a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of how complete connectivity dominates. Once the preprocessor is handled by the net itself it will either need too many links or have ad hoc structure. It's surely better to use partial connectivity (e.g., local - which is a very general assumption motivated by physical interactions and brain structure) than some inevitably ad hoc preprocessing steps of unknown value. Evaluation is tedious and unrewarding, but without it we simply can't make claims or compare systems. I'm not arguing against nets - to the contrary, I think that highly parallel nets are the only possibility for handling really hard problems like recognition, language handling, and reasoning. But they'll need much better structure (or the ability to evolve and generate needed structures). And I was asking for objective evidence that 3-layer feed-forward nets with links between all nodes in adjacent layers actually handle complex images better than some of the large and powerful computer vision systems. True - we know that in theory they can do anything. But that's no better than knowing that random search through the space of all Turing machine programs can do anything. Len Uhr From ahmad at ICSI.Berkeley.EDU Thu Aug 30 16:20:13 1990 From: ahmad at ICSI.Berkeley.EDU (Subutai Ahmad) Date: Thu, 30 Aug 90 13:20:13 PDT Subject: Summary (long): pattern recognition comparisons In-Reply-To: Leonard Uhr's message of Thu, 30 Aug 90 11:30:30 -0500 <9008301630.AA10562@thor.cs.wisc.edu> Message-ID: <9008302020.AA02846@icsib18.Berkeley.EDU> >But the point is that >complete connectivity from layer to layer needs O(N**2) links, and the fact that >"a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of >how complete connectivity dominates. Once the preprocessor is handled by the >net itself it will either need too many links or have ad hoc structure. 
>It's surely better to use partial connectivity (e.g., local - which is a very >general assumption motivated by physical interactions and brain structure) >than some inevitably ad hoc preprocessing steps of unknown value. Systems with selective attention mechanisms provide yet another way of avoiding the combinatorics. In these models, you can route relevant feature values from arbitrary locations in the image to a central processor. The big advantage is that the central processor can now be quite complex (possibly fully connected) since it only has to deal with a relatively small number of inputs. --Subutai Ahmad ahmad at icsi.berkeley.edu References: Koch, C. and Ullman, S. Shifts in Selective Attention: towards the underlying neural circuitry. Human Neurobiology, Vol 4:219-227, 1985. Ahmad, S. and Omohundro, S. Equilateral Triangles: A Challenge for Connectionist Vision. In Proceedings of the 12th Annual meeting of the Cognitive Science Society, MIT, 1990. Ahmad, S. and Omohundro, S. A Network for Extracting the Locations of Point Clusters Using Selective Attention, ICSI Tech Report No. TR-90-011, 1990. From kawahara at av-convex.ntt.jp Fri Aug 31 10:43:46 1990 From: kawahara at av-convex.ntt.jp (Hideki KAWAHARA) Date: Fri, 31 Aug 90 23:43:46+0900 Subject: JNNS'90 Program Summary (long) Message-ID: <9008311443.AA11611@av-convex.ntt.jp> The first annual conference of the Japan Neural Network Society (JNNS'90) will be held from 10 to 12 September, 1990. The following is the program summary and related information on JNNS. There are 2 invited presentations, 23 oral presentations and 53 poster presentations. Unfortunately, a list of the presentation titles in English is not available yet, because many authors didn't provide English titles for their presentations (the official languages for the proceedings were Japanese and English, but only two articles were written in English). I will try to compile the English list by the end of September and would like to share it here. If you have any questions or comments, please e-mail to the following address. (Please *DON'T REPLY*.) kawahara at nttlab.ntt.jp - ---------------------------------------------- Hideki Kawahara NTT Basic Research Laboratories 3-9-11, Midori-cho Musashino, Tokyo 180, JAPAN Tel: +81 422 59 2276, Fax: +81 422 59 3393 - ---------------------------------------------- JNNS'90 1990 Annual Conference of Japan Neural Network Society September 10-12, 1990 Tamagawa University, 6-1-1 Tamagawa-Gakuen Machida, Tokyo 194, Japan Program Summary Monday, 10 September 1990 12:00 Registration 13:00 - 16:00 Oral Session O1: Learning 16:00 - 18:00 Poster Session P1: Learning, Motion and Architecture 18:00 Organization Committee Tuesday, 11 September 1990 9:00 - 12:00 Oral Session O2: Motion and Architecture 13:00 - 13:30 Plenary Session 13:30 - 15:30 Invited Talks: "Brain Codes of Shapes: Experiments and Models" by Keiji Tanaka; "Theories: from 1980's to 1990's" by Shigeru Shinomoto 15:30 - 18:30 Oral Session O3: Vision I 19:00 Reception Wednesday, 12 September 1990 9:00 - 12:00 Oral Session O4: Vision II, Time Series and Dynamics 13:00 - 15:00 Poster Session P2: Vision I, II, Time Series and Dynamics 15:00 - 16:45 Oral Session O5: Dynamics Room 450 is for the Oral Sessions, Plenary Session and Invited Talks. Rooms 322, 323, 324, 325 and 350 are for the Poster Sessions. 
Registration Fees for Conference Members 5000 yen Student members 3000 yen Otherwise 8000 yen Reception 19:00 Tuesday, 11 September 1990 Sakufuu-building Fee: 5000 yen JNNS Officers and Governing Board Kunihiko Fukushima Osaka University President Shun-ichi Amari University of Tokyo International Affair Secretary Minoru Tsukada Tamagawa University Takashi Nagano Hosei University Publication Shiro Usui Toyohashi University of Technology Yoichi Okabe University of Tokyo Sei Miyake NHK Science and Technical Research Labs. Planning Yuichiro Anzai Keio University Keisuke Toyama Kyoto Prefectural School of Medicine Nozomu Hoshimiya Tohoku University Treasurer Naohiro Ishii Nagoya Institute of Technology Hideaki Saito Tamagawa University Regional Affair Ken-ichi Hara Yamagata University Hiroshi Yagi Toyama University Eiji Yodogawa ATR Syozo Yasui Kyushu Institute of Technology Supervisor Noboru Sugie Nagoya University Committee members Editorial Committee (Newsletter and mailing list) Takashi Omori Tokyo University of Agriculture and Technology Hideki Kawahara NTT Basic Research Labs. Itirou Tsuda Kyushu Institute of Technology Planning Committee Kazuyuki Aihara Tokyo Denki University Shigeru Shinomoto Kyoto University Keiji Tanaka The Institute of Physical and Chemical Research JNNS'90 Conference Organizing Committee Sei Miyake NHK Science and Technical Research Labs. General Chairman Keiji Tanaka The Institute of Physical and Chemical Research Program Chairman Shigeru Shinomoto Kyoto University Publicity Chairman Program Takayuki Ito NHK Science and Technical Research Labs. Takashi Omori Tokyo University of Agriculture and Technology Koji Kurata Osaka University Kenji Doya University of Tokyo Kazuhisa Niki Electrotechnical Laboratory Ryoko Futami Tohoku University Publicity Kazunari Nakane ATR Publication Hideki Kawahara NTT Basic Research Labs. Mahito Fujii NHK Science and Technical Research Labs. Treasurer Shin-ichi Kita University of Tokyo Manabu Sakakibara Toyohashi University of Technology Local Arrangement Shigeru Tanaka Fundamental Research Labs., NEC Makoto Mizuno Tamagawa University For more details, please contact: Japan Neural Network Society Office Faculty of Engineering, Tamagawa University 6-1-1 Tamagawa-Gakuen Machida, Tokyo 194, Japan Telephone: +81 427 28 3457 Facsimile: +81 427 28 3597