From carol at ai.toronto.edu Fri Aug 3 11:14:14 1990 From: carol at ai.toronto.edu (Carol Plathan) Date: Fri, 3 Aug 90 11:14:14 EDT Subject: research programmer job Message-ID: <90Aug3.111424edt.268@neuron.ai.toronto.edu> RESEARCH PROGRAMMER JOB AT THE UNIVERSITY OF TORONTO STARTING SALARY: $36,895 - $43,406 STARTING DATE: Fall 1990 The Connectionist Research Group in the Department of Computer Science at the University of Toronto is looking for a research programmer to develop a neural network simulator that uses Unix, C, and X-windows. The simulator will be used by our group of about 10 researchers, directed by Geoffrey Hinton, to explore learning procedures and their applications. It will also be released to some researchers in Canadian Industry. We already have a fast, flexible simulator and the programmer's main job will be to further develop, document, and maintain this simulator. The development may involve some significant re-design of the basic simulator. Additional duties (if time permits) will include: Implementing several different learning procedures within the simulator and investigating their performance on various data-sets; Assisting industrial collaborators and visitors in the use of the simulator; Porting the simulator to faster workstations or to boards that use fast processors such as the Intel i860 or DSP chips; Developing software for a project that uses a data-glove as an input device to an adaptive neural network that drives a speech synthesizer; Assisting in the acquisition and installation of hardware and software required for the project; The applicant should possess a Bachelors or Masters, preferably in Computer Science or Electrical Engineering, and have at least two years programming experience including experience with unix and C, and some experience with graphics. Knowledge of elementary calculus and elementary linear algebra is essential. Knowlege of numerical analysis, information theory, and perceptual or cognitive psychology would be advantageous. Good oral and written communication skills are required. Please send CV + names of two or three references to Carol Plathan, Computer Science Department, University of Toronto, 10 Kings College Road, Toronto Ontario M5S 1A4. You could also send the information by email to carol at ai.toronto.edu or call Carol at 416-978-3695 for more details. The University of Toronto is an equal opportunity employer. ADDITIONAL INFORMATION The job can be given to a non-Canadian if they are better than any Canadians or Canadian Residents who apply. In this case, the non-Canadian would probably start work here on a temporary work permit while the application for a more permanent permit was being processed. There are already SEVERAL good applicants for the job. Candidates who do not already program fluently in C or have not already done neural network simulations stand very little chance. Also, it is basically a programming job. The programmer may get involved in some original research on neural nets, but this is NOT the main part of the job, so it is not suitable for postdoctoral researchers who want to get on with their own research agenda. Interviews will be during September. We will definitely not employ anybody without an interview and we cannot afford to pay travel expenses for interviews (except in very exceptional circumstances). If there are several good applicants from the west coast of the USA, I may arrange to interview them in California. 
We already have sufficient funding to support the programmer for the next three years. However, we have applied to the Canadian Government for additional funding specifically for this work, and if it comes through (in November 1990) the programmer will be transferred to that source of funding and the simulator will definitely be supplied to Canadian Industry. The job will then require more interactions with industrial users and more systematic documentation, maintenance and debugging of the simulator releases.

From uhr at cs.wisc.edu Fri Aug 3 15:18:11 1990
From: uhr at cs.wisc.edu (Leonard Uhr)
Date: Fri, 3 Aug 90 14:18:11 -0500
Subject: Summary (long): pattern recognition comparisons
Message-ID: <9008031918.AA23586@thor.cs.wisc.edu>

Neural nets using backprop have only handled VERY SIMPLE images, usually in 8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in logarithmically converging nets, but I don't know of any nets with complete connectivity from one layer to the next that are that big.) In sharp contrast, pr/computer vision systems are designed to handle MUCH MORE COMPLEX images (e.g. houses, furniture) in 128-by-128 or even larger inputs. So I've been really surprised to read statements to the effect that NN have proved to be much better. What experimental evidence is there that NN recognize images as complex as those handled by computer vision and pattern recognition approaches? True, it's hard to run good comparative experiments, but without them where are we?

NN re-introduce learning, which is great - except that to make learning work we need to cut down and direct the explosive search at least as much as using any other approach. The brain is THE bag of tools that does the trick, and it has a lot of structure (hierarchical convergence-divergence; local links to relatively small numbers; families of feature-detectors) that can substantially improve today's nets. More powerful structures, basic processes, and learning mechanisms are essential to replace weak learning algorithms like delta and backprop that need O(N*N) links to guarantee (eventual) success - hence can't even be run on images with more than a few hundred pixels.

Len Uhr

From N.E.Sharkey at cs.exeter.ac.uk Sat Aug 4 16:30:53 1990
From: N.E.Sharkey at cs.exeter.ac.uk (Noel Sharkey)
Date: Sat, 4 Aug 90 16:30:53 BST
Subject: special issue
Message-ID: <11054.9008041530@entropy.cs.exeter.ac.uk>

The NATURAL LANGUAGE special issue of CONNECTION SCIENCE will be on the shelves soon. I thought you might like to see the contents.

CONTENTS

Catherine L Harris
  Connectionism and Cognitive Linguistics
John Rager & George Berg
  A Connectionist Model of Motion and Government in Chomsky's Government-binding Theory
David J Chalmers
  Syntactic Transformations on Distributed Representations
Stan C Kwasny & Kanaan A Faisal
  Connectionism and Determinism in a Syntactic Parser
Risto Miikkulainen
  Script Recognition with Hierarchical Feature Maps
Lorraine F R Karen
  Identification of Topical Entities in Discourse: a Connectionist Approach to Attentional Mechanism in Language
Mary Hare
  The Role of Similarity in Hungarian Vowel Harmony: a Connectionist Account
Robert Port
  Representation and Recognition of Temporal Patterns

Editor: Noel E. Sharkey, University of Exeter

Special Editorial Review Panel

Robert B. Allen, Bellcore Garrison W. Cottrell, University of California, San Diego Michael G. Dyer, University of California, Los Angeles Jeffrey L.
Elman, University of California, San Diego George Lakoff, University of California, Berkeley Wendy G. Lehnert, University of Massachusetts at Amherst Jordan Pollack, Ohio State University Ronan Reilly, Beckman Institute, University of Illinois at Urbana-Champaign Bart Selman, University of Toronto Paul Smolensky, University of Colorado, Boulder We would like to encourage the CNLP community to submit many more papers, and we would particulary like to see more papers on representational issues. noel From schraudo%cs at ucsd.edu Sat Aug 4 15:43:20 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Sat, 4 Aug 90 12:43:20 PDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008041943.AA01622@beowulf.ucsd.edu> > From: Leonard Uhr > > Neural nets using backprop have only handled VERY SIMPLE images, usually in > 8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in > logarithmically converging nets, but I don't know of any nets with complete > connectivity from one layer to the next that are that big.) In sharp contrast, > pr/computer vision systems are designed to handle MUCH MORE COMPLEX images (eg > houses, furniture) in 128-by-128 or even larger inputs. So I've been really > surprised to read statements to the effect NN have proved to be much better. > What experimental evidence is there that NN recognize images as complex as > those handled by computer vision and pattern recognition approaches? Well, Gary Cottrell for instance has successfully used a standard (3-layer, fully interconnected) backprop net for various face recognition tasks from 64x64 images. While I agree with you that many NN architectures don't scale well to large input sizes, and that modular, heterogenous architectures have the potential to overcome this limitation, I don't understand why you insist that current NNs could only handle simple images - unless you consider any image with less than 16k pixels simple. Does face recognition qualify as a complex visual task with you? The whole point of using comparatively inefficient NN setups (such as fully interconnected backprop nets) is that they are general enough to solve complex problems without built-in heuristics. Modular NNs require either a lot of prior knowledge about the problem you are trying to solve, or a second adaptive system (such as a GA) to search the architecture space. In the former case the problem is comparatively easy, and in the latter computational complexity rears its ugly head again... having said that, I do believe that GA/NN hybrids will play an important role in the future. I'm afraid I don't have a reference for Gary Cottrell's work - maybe someone else can post the details? -- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From honavar at cs.wisc.edu Sat Aug 4 20:43:56 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sat, 4 Aug 90 19:43:56 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008050043.AA05173@goat.cs.wisc.edu> >The whole point of using comparatively inefficient NN setups (such as fully >interconnected backprop nets) is that they are general enough to solve >complex problems without built-in heuristics. 
While I know of theoretical results that show that a feedforward neural net exists that can adequately encode any arbitrary real-valued function (Hornik, Stinchcombe, & White, 1988; Cybenko, 1988; Carroll & Dickinson, 1989), I am not aware of any results that suggest that such nets can LEARN any real-valued function using backpropagation (ignoring the issue of computational tractability).

Heuristics (or architectural constraints) like those used by some researchers for some vision problems - locally linked multi-layer converging nets (probably one of the most successful demonstrations is the work of LeCun et al. on handwritten zip code recognition) - are interesting because they constrain (or bias) the network to develop particular types of representations. Also, they might enable efficient learning to take place in tasks that exhibit a certain intrinsic structure. The choice of a particular fixed neural network architecture (even if it is a fully interconnected backprop net) implies the use of a corresponding representational bias. Whether such a representational bias is in any sense more general than some other (e.g., a network of nodes with limited fan-in but sufficient depth) is questionable (for any given completely interconnected feedforward network, there exists a functionally equivalent feedforward network of nodes with limited fan-in - and for some problems, the latter may be more efficient).

On a different note, how does one go about assessing the "generality" of a learning algorithm/architecture in practice? I would like to see a discussion on this issue.

Vasant Honavar (honavar at cs.wisc.edu)

From schraudo%cs at ucsd.edu Sun Aug 5 05:54:43 1990
From: schraudo%cs at ucsd.edu (Nici Schraudolph)
Date: Sun, 5 Aug 90 02:54:43 PDT
Subject: Summary (long): pattern recognition comparisons
Message-ID: <9008050954.AA00265@beowulf.ucsd.edu>

> From honavar at cs.wisc.edu Sat Aug 4 17:45:01 1990
>
> While I know of theoretical results that show that a feedforward
> neural net exists that can adequately encode any arbitrary
> real-valued function (Hornik, Stinchcombe, & White, 1988;
> Cybenko, 1988; Carroll & Dickinson, 1989), I am not aware of
> any results that suggest that such nets can LEARN any real-valued
> function using backpropagation (ignoring the issue of
> computational tractability).

It is my understanding that some of the latest work of Hal White et al. presents a learning algorithm - backprop plus a rule for adding hidden units - that can (in the limit) provably learn any function of interest. (Disclaimer: I don't have the mathematical proficiency required to fully appreciate White et al.'s proofs and thus have to rely on second-hand interpretations.)

> On a different note, how does one go about assessing the
> "generality" of a learning algorithm/architecture in practice?
> I would like to see a discussion on this issue.

I second this motion. As a starting point for discussion, would the Kolmogorov complexity of an architectural description be useful as a measure of architectural bias?
-- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From aarons at cogs.sussex.ac.uk Sun Aug 5 07:57:52 1990 From: aarons at cogs.sussex.ac.uk (Aaron Sloman) Date: Sun, 5 Aug 90 12:57:52 +0100 Subject: Summary (long): pattern recognition comparisons Message-ID: <6816.9008051157@csuna.cogs.susx.ac.uk> > From: Leonard Uhr > > Neural nets using backprop have only handled VERY SIMPLE images..... > .......In sharp contrast, pr/computer vision systems are designed > to handle MUCH MORE COMPLEX images (eg houses, furniture) in > 128-by-128 or even larger inputs.... ..... > From: Nici Schraudolph > Well, Gary Cottrell for instance has successfully used a standard (3-layer, > fully interconnected) backprop net for various face recognition tasks from > 64x64 images. While I agree with you that many NN architectures don't scale > well to large input sizes, and that modular, heterogenous architectures have > the potential to overcome this limitation, I don't understand why you insist > that current NNs could only handle simple images - unless you consider any > image with less than 16k pixels simple. Does face recognition qualify as a > complex visual task with you? > ...... Characterising the complexity of the task in terms of the number of pixels seems to me to miss the most important points. Some (but by no means all) of the people working on NNs appear to have joined the field (the bandwagon?) without feeling obliged to study the AI literature on vision, perhaps because it is assumed that since the AI mechanisms are "wrong" all the literature must be irrelevant? On the contrary, good work in AI vision was concerned with understanding the nature of the task (or rather tasks) of a visual system, independently of the mechanisms postulated to perform those tasks. (When your programs fail you learn more about the nature of the task.) Recognition of isolated objects (e.g. face recognition) is just _one_ of the tasks of vision. Others include: (a) Interpreting a 2-D array (retinal array or optic array) in terms of 3-D structures and relationships. Seeing the 3-D structure of a face is a far more complex task than simply attaching a label: "Igor", "Bruce" or whatever. (b) Segmenting a complex scene into separate objects and describing the relationships between them (e.g. "houses, furniture"!). (The relationships include 2-D and 3-D spatial and functional relations.) Because evidence for boundaries is often unclear and ambiguous, and because recognition has to be based on combinations of features, the segmentation often cannot be done without recognition and recognition cannot be done without segmentation. This chicken and egg problem can lead to dreadful combinatorial searches. NNs offer the prospect of doing some of the searching in parallel by propagating constraints, but as far as I know they have not yet matched the more sophisticated AI visual systems. (It is important to distinguish segmentation, recognition and description of 2-D image fragments from segmentation, recognition and description of 3-D objects. The former seems to be what people in pattern recognition and NN research concentrate on most. The latter has been a major concern of AI vision work since the mid/late sixties, starting with L.G. Roberts I think, although some people in AI have continued trying to find 2-D cues to 3-D segmentation. Both 2-D and 3-D interpretations are important in human vision.) 
(c) Seeing events, processes and their relationships. Change "2-D" to "3-D" and "3-D" to "4-D" in (b) above. We are able to segment, recognize and describe events, processes and causal relationships as well as objects (e.g. following, entering, leaving, catching, bouncing, intercepting, grasping, sliding, supporting, stretching, compressing, twisting, untwisting, etc. etc.) Sometimes, as Johansson showed by attaching lights to human joints in a dark room, motion can be used to disambiguate 3-D structure.

(d) Providing information and/or control signals for motor-control mechanisms: e.g. visual feedback is used (unconsciously) for posture control in sighted people, and also for controlling movement of arm, hand and fingers in grasping, etc. (I suspect that many such processes of fine tuning and control use changing 2-D "image" information rather than (or in addition to) 3-D structural information.)

That's still only a partial list of the tasks of a visual system. For more detail see:

A. Sloman, `On designing a visual system: Towards a Gibsonian computational model of vision', in Journal of Experimental and Theoretical AI 1, 4, 1989.
Ballard, D.H. and C.M. Brown, Computer Vision, Englewood Cliffs: Prentice Hall, 1982.

A system might be able to recognize isolated faces or other objects in an image by using mechanisms that would fail miserably in dealing with cluttered scenes where recognition and segmentation need to be combined. So a NN that recognised faces might tell us nothing about how it is done in natural visual systems, if the latter use more general mechanisms.

One area in which I think neither AI nor NN work has made significant progress is shape perception. (I don't mean shape recognition!) People, and presumably many other animals, can see complex, intricate, irregular and varied shapes in a manner that supports a wide range of tasks, including recognizing, grasping, planning, controlling motion, predicting the consequences of motion, copying, building, etc. etc. Although a number of different kinds of shape representations have been explored in work on computer vision, CAD, graphics etc. (e.g. feature vectors; logical descriptions; networks of nodes and arcs; numbers representing co-ordinates, orientations, curvature etc; systems of equations for lines, planes, and other mathematically simple structures; fractals; etc. etc. etc.) they all seem capable of capturing only a superficial subset of what we can see when we look at kittens, sand dunes, crumpled paper, a human torso, a shrubbery, cloud formations, under-water scenes, etc. (Work on computer graphics is particularly misleading, because people are often tempted to think that a representation that _generates_ a natural looking image on a screen must capture what we see in the image, or in the scene that it depicts.)

Does anyone have any idea what kind of breakthrough is needed in order to give a machine the kind of grasp of shape that can explain animal abilities to cope with real environments? Is there anything about NN shape representations that gives them an advantage over others that have been explored, and if so what is it? I suspect that going for descriptions of static geometric structure is a dead end: seeing a shape really involves seeing potential processes involving that shape, and their limits (something like what J.J. Gibson meant by "affordances"?). I.e.
a 3-D shape is inherently a vast array of 4-D possibilities and one of the tasks of a visual system is computing a large collection of those possibilities and making them readily available for a variety of subsequent processes. But that's much too vague an idea to be very useful. Or is it? Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England EMAIL aarons at cogs.sussex.ac.uk or: aarons%uk.ac.sussex.cogs at nsfnet-relay.ac.uk From honavar at cs.wisc.edu Sun Aug 5 15:48:37 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sun, 5 Aug 90 14:48:37 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008051948.AA00212@goat.cs.wisc.edu> >It is my understanding that some of the latest work of Hal White et al. >presents a learning algorithm - backprop plus a rule for adding hidden >units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully >appreciate White et al.'s proofs and thus have to rely on second-hand >interpretations.) I can see how allowing the addition of (potentially unbounded number of hidden units) could enable a back-prop architecture to learn arbitrary functions. But in this sense, any procedure that builds up a look-up table or random-access memory (with some interpolation capability to cover the instances not explicitly stored) using an appropriate set of rules to add units is equally general (and probably more efficient than backprop in terms of time complexity of learning (cf Baum's proposal for more powerful learning algorithms). However look-up tables can be combinatorially intractable in terms of memory (space) complexity. This brings us to the issue of searching the architectural space along with the weight space in an efficient manner. There has already been some work in this direction (Fahlman's cascade correlation architecture, Ash's DNC, Honavar & Uhr's generative learning, Hanson's meiosis networks, and some recent work on ga-nn hybrids). We have been investigating methods to constrain the search in the architectural space (using heuristic controls / representational bias :-) ). I would like to hear from others who might be working on related issues. Vasant Honavar (honavar at cs.wisc.edu) From galem at mcc.com Sun Aug 5 17:48:25 1990 From: galem at mcc.com (Gale Martin) Date: Sun, 5 Aug 90 16:48:25 CDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008052148.AA02989@sunkist.aca.mcc.com> Leonard Uhr states (about NN learning) "to make learning work, we need to cut down and direct explosive search at least as much as using any other approach." Certainly there is reason to agree with this in the general case, but I doubt it's validity in important specific cases. I've spent the past couple of years working on backprop-based handwritten character recognition and find almost no supporting evidence of the need for explicitly cutting down on explosive search through the use of heuristics in these SPECIFIC cases and circumstances. We varied input character array size (10x16, 15x24, 20x32) to backprop nets and found no difference in the number of training samples required to achieve a given level of generalization performance for hand-printed letters. In nets with one hidden layer, we increased the number of hidden nodes from 50 to 383 and found no increase in the number of training samples needed to achieve high generalization (in fact, generalization is worse for the 50 hidden node case). 
We experimented extensively with nets having local connectivity and locally-linked nets in this domain and find similarly little evidence to support the need for such heuristics. These results hold across two different types of handwritten character recognition tasks (hand-printed letters and digits). This domain/case-specific robustness across architectural parameters and input size is one way to characterize the generality of a learning algorithm and may recommend one algorithm over another for specific problems. Gale Martin Martin, G. L., & Pittman, J. A. Recognizing hand-printed letters and digits in D.S. Touretzky (Ed.) Advances in Neural Information Processing Systems 2, 1990. Martin, G.L., Leow, W.K. & Pittman, J. A. Function complexity effects on backpropagation learning. MCC Tech Report ACT-HI-062-90. From ganesh at cs.wisc.edu Sun Aug 5 17:59:23 1990 From: ganesh at cs.wisc.edu (Ganesh Mani) Date: Sun, 5 Aug 90 16:59:23 -0500 Subject: Paper Message-ID: <9008052159.AA21968@sharp.cs.wisc.edu> The following paper is available for ftp from the repository at Ohio State. Please backpropagate comments (and errors!) to ganesh at cs.wisc.edu. -Ganesh Mani _________________________________________________________________________ Learning by Gradient Descent in Function Space Ganesh Mani Computer Sciences Dept. Unviersity of Wisconsin---Madison ganesh at cs.wisc.edu Abstract Traditional connectionist networks have homogeneous nodes wherein each node executes the same function. Networks where each node executes a different function can be used to achieve efficient supervised learning. A modified back-propagation algorithm for such networks, which performs gradient descent in ``function space,'' is presented and its advantages are discussed. The benefits of the suggested paradigm include faster learning and ease of interpretation of the trained network. _________________________________________________________________________ The following can be used to ftp the paper. unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): neuron ftp> cd pub/neuroprose ftp> type binary ftp> get (remote-file) mani.function-space.ps.Z (local-file) mani.function-space.ps.Z ftp> quit unix> uncompress mani.function-space.ps.Z unix> lpr -P(your_local_postscript_printer) mani.function-space.ps From honavar at cs.wisc.edu Mon Aug 6 00:27:25 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sun, 5 Aug 90 23:27:25 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008060427.AA00489@goat.cs.wisc.edu> We have found that with relatively small sample sizes, generalization performance is improved by local connectivity and weight sharing on simple 2-d patterns. For position-invariant recognition, local connectivity and weight-sharing give substantially better generalization performance than that obtained without local connectivity. Clearly this is a case where extensive empirical studies are needed to draw general conclusions. Vasant Honavar (honavar at cs.wisc.edu) From awyk at wapsyvax.oz.au Mon Aug 6 03:17:41 1990 From: awyk at wapsyvax.oz.au (Brian Aw) Date: Mon, 6 Aug 90 15:17:41+0800 Subject: No subject Message-ID: <9008060725.649@munnari.oz.au> Dear Sir/Mdm, Hello! My name is Brian Aw and my e-mail address is awyk at wapsyvax.oz Would you kindly put me on both your address list and your mailing list for connectionist related results. I am a Ph.D. 
student as well as a research officer in the Psychology Department of the University of Western Australia (UWA), Perth. I am working under the supervision of Prof. John Ross, who has recently joined your lists. I am an enthusiastic worker in neural network theory. Currently, I am developing a neural network for feature classification in images. This year, I have published a technical report in the Computer Science Department of UWA in this area. My work has also been accepted for presentation and publication in the forthcoming 4th Australian Joint Conference on Artificial Intelligence (AI'90). Working in this field, which advances so rapidly, I certainly need the kind of fast-moving and up-to-date information which your system can provide. Thanking you in advance. brian.

From erol at ehei.ehei.fr Mon Aug 6 08:07:39 1990
From: erol at ehei.ehei.fr (erol@ehei.ehei.fr)
Date: Mon, 6 Aug 90 12:09:39 +2
Subject: IJPRAI CALL FOR PAPERS
Message-ID: <9008061041.AA24889@inria.inria.fr>

Would you consider a paper on my "random network model"? Two papers have already appeared or are appearing in the journal Neural Computation. Best regards, Erol

From erol at ehei.ehei.fr Mon Aug 6 05:47:31 1990
From: erol at ehei.ehei.fr (erol@ehei.ehei.fr)
Date: Mon, 6 Aug 90 09:49:31 +2
Subject: Visit to Poland
Message-ID: <9008061014.AA24279@inria.inria.fr>

I don't know about Poland, but you can contact me in Paris! Erol Gelenbe

From INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU Sun Aug 5 15:56:00 1990
From: INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU (INS_ATGE%JHUVMS.BITNET@VMA.CC.CMU.EDU)
Date: Sun, 5 Aug 90 14:56 EST
Subject: Similarity to Cascade-Correlation
Message-ID:

As a side note on the problem of using backpropagation on large problems, it should be noted that using efficient error minimization methods (i.e. conjugate-gradient methods) as opposed to the "vanilla" backprop described in _Parallel_Distributed_Processing_ allows one to work with much larger problems, and also allows for much greater performance on problems the network was trained on. For example, an IR target threat detection problem I have been recently working on (with 127 or 254 inputs and 20 training patterns) failed miserably when trained with "vanilla" backprop (hours and hours on a Connection Machine without success). When a conjugate-gradient training program was used, the network was able to learn 100% of the training set perfectly in just a minute or two.

>It is my understanding that some of the latest work of Hal White et al.
>presents a learning algorithm - backprop plus a rule for adding hidden
>units - that can (in the limit) provably learn any function of interest.
>(Disclaimer: I don't have the mathematical proficiency required to fully
>appreciate White et al.'s proofs and thus have to rely on second-hand
>interpretations.)

How does this new work compare with the Cascade Correlation method developed by Fahlman, where a new hidden unit is added by training its receptive weights to maximize the correlation between its output and the network error, and then training the projective weights to the outputs to minimize the error (thus only allowing single-layer backprop learning at each iteration)?

-Thomas Edwards
The Johns Hopkins University / U.S.
Naval Research Lab From erol at ehei.ehei.fr Mon Aug 6 11:44:10 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Mon, 6 Aug 90 15:46:10 +2 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008061444.AA05688@inria.inria.fr> I would like to draw your attention to two recent papers of mine (my name is Erol Gelenbe) : Random networks with positive and negative signals and product form solutions in Neural Computation, Vol. 1, No. 4 (1989) Stability of the random network model in press in Neural Computation. The papers present a new model in which signals travel as "pulses". The quantity looked at in the model is the "neuron potential" in an arbitrarily connected network. I prove that these models have "product form" which means that there state can be computed simply and analytically. Comments and questions are welcome. erol at ehei.ehei.fr From fritz_dg%ncsd.dnet at gte.com Mon Aug 6 17:26:57 1990 From: fritz_dg%ncsd.dnet at gte.com (fritz_dg%ncsd.dnet@gte.com) Date: Mon, 6 Aug 90 17:26:57 -0400 Subject: neural network generators in Ada Message-ID: <9008062126.AA27920@bunny.gte.com> Are there any non-commercial Neural Network "generator programs" or such that are in Ada? (ie. generates suitable NN code from a set of user designated specifications, code suitable for embedding, etc). I'm interested in - experience developing and using same, lessons learned - to what uses such have been put, successful? - nature of; internal use of lists, arrays; what can be user specified, what can't; built-in limitations; level of HMI attached; compilers used; etc., etc. - and other relevant info developing and applying such from those who have tried developing and using them Am also interested in opinions on: If you were going to design a NN Maker _today_, how would you design it? If Ada were the language, what special things might be done? Motive should be transparent. My sincere thanks to all who respond. If there is interest, I'll turn the info (if any) around to the list in general. Dave Fritz fritz_dg%ncsd at gte.com (301) 738-8932 ---------------------------------------------------------------------- ---------------------------------------------------------------------- From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Mon Aug 6 23:20:09 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Mon, 06 Aug 90 23:20:09 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Sun, 05 Aug 90 14:56:00 -0500. Message-ID: >It is my understanding that some of the latest work of Hal White et al. >presents a learning algorithm - backprop plus a rule for adding hidden >units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully >appreciate White et al.'s proofs and thus have to rely on second-hand >interpretations.) How does this new work compare with the Cascade Correlation method developed by Fahlman, where a new hidden unit is added by training its receptive weights to maximize the correlation between its output and the network error, and then trains the projective weights to the outputs to minimize the error (thus only allowing single-layer backprop learning at each iteration)? -Thomas Edwards The Johns Hopkins University / U.S. Naval Research Lab I'll take a stab at answering this. Maybe we'll also hear something from Hal White or one of his colleagues -- especially if I somehow misrepresent their work. 
I believe that all of the published completeness results from White's group assume a single layer of hidden units. They show that this architecture can approximate any desired transfer function (assuming it has certain smoothness properties) to any desired accuracy if you add enough units in this single layer. It's rather like proving that a piecewise linear approximation can approach any desired curve with arbitrarily small error as long as you're willing to use enough tiny pieces. Unless I've missed something, their work does not attempt to say anything about the minimum number of hidden units you might need in this hidden layer. Cascade-Correlation produces a feed-forward network of sigmoid units, but it differs in a number of ways from the kinds of nets considered by White: 1. Cascade-Correlation is intended to be a practical learning algorithm that produces a relatively compact solution as fast as possible. 2. In a Cascade net, each new hidden unit can receive inputs from all pre-existing hidden units. Therefore, each new unit is potentially a new layer. White's results show that you don't really NEED more than a single hidden layer, but having more layers can sometimes result in a very dramatic reduction in the total number of units and weights needed to solve a given problem. 3. There is no convergence proof for Cascade-Correlation. The candidate training phase, in which we try to create new hidden units by hill-climbing in some correlation measure, can and does get stuck in local maxima of this function. That's one reason we use a pool of candidate units: by training many candidates at once, we can greatly reduce the probability of creating new units that do not contribute significantly to the solution, but with a finite candidate pool we can never totally eliminate this possibility. It would not be hard to modify Cascade-Correlation to guarantee that it will eventually grind out a solution. The hard part, for a practical learning algorithm, is to guarantee that you'll find a "reasonably good" solution, however you want to define that. The recent work of Gallant and of Frean are interesting steps in this direction, at least for binary-valued transfer functions and fixed, finite training sets. -- Scott From jamesp at chaos.cs.brandeis.edu Mon Aug 6 21:38:40 1990 From: jamesp at chaos.cs.brandeis.edu (James Pustejovsky) Date: Mon, 6 Aug 90 21:38:40 edt Subject: Visit to Poland In-Reply-To: erol@ehei.ehei.fr's message of Mon, 6 Aug 90 09:49:31 +2 <9008061014.AA24279@inria.inria.fr> Message-ID: <9008070138.AA17019@chaos.cs.brandeis.edu> please withdraw my name from the list. there is too much random and irrelevant noise around the occasional noteworthy bit. From ericj at starbase.MITRE.ORG Tue Aug 7 08:33:27 1990 From: ericj at starbase.MITRE.ORG (Eric Jenkins) Date: Tue, 7 Aug 90 08:33:27 EDT Subject: ref for conjugate-gradient... Message-ID: <9008071233.AA25689@starbase> Would someone please post a pointer to info on conjugate-gradient methods of error minimization. Thanks. Eric Jenkins (ericj at ai.mitre.org) From erol at ehei.ehei.fr Tue Aug 7 07:06:28 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Tue, 7 Aug 90 11:08:28 +2 Subject: Call for Papers - ICGA-91 Message-ID: <9008071511.AA21568@inria.inria.fr> Concerning the scope of the conference, could the program chairman indicate what the boundaries of the area of genetic algorithms are in the context of this meeting ? 
This can be indicated by providing one or more references the conference chairman considers to be "typical" work in this area. Erol Gelenbe From erol at ehei.ehei.fr Tue Aug 7 10:42:33 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Tue, 7 Aug 90 14:44:33 +2 Subject: postdoc position available Message-ID: <9008071513.AA21597@inria.inria.fr> From jose at learning.siemens.com Tue Aug 7 19:55:05 1990 From: jose at learning.siemens.com (Steve Hanson) Date: Tue, 7 Aug 90 18:55:05 EST Subject: Similarity to Cascade-Correlation Message-ID: <9008072355.AA05108@learning.siemens.com.siemens.com> Scott: Isn't CC just Cart? Steve From schraudo%cs at ucsd.edu Tue Aug 7 15:05:35 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Tue, 7 Aug 90 12:05:35 PDT Subject: Similarity to Cascade-Correlation Message-ID: <9008071905.AA10253@beowulf.ucsd.edu> > From: INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU > > How does [White et al.'s] new work compare with the Cascade Correlation > method developed by Fahlman [...]? In practical terms, very badly. Their algorithm's point is purely theore- tical: they can prove convergence from only a very small base of assumptions about the function to be learned. Do any similar proofs exist for Cascade Correlation? That would be interesting. -- Nicol N. Schraudolph, C-014 nici%cs at ucsd.edu University of California, San Diego nici%cs at ucsd.bitnet La Jolla, CA 92093-0114 ...!ucsd!cs!nici From erol at ehei.ehei.fr Wed Aug 8 07:12:24 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Wed, 8 Aug 90 11:14:24 +2 Subject: abstract Message-ID: <9008081009.AA23199@inria.inria.fr> I would be very interested to get a copy of this paper. Thankyou in advance, Erol Gelenbe erol at ehei.ehei.fr From pkube at ucsd.edu Wed Aug 8 15:23:30 1990 From: pkube at ucsd.edu (pkube@ucsd.edu) Date: Wed, 08 Aug 90 13:23:30 MDT Subject: ref for conjugate-gradient... In-Reply-To: Your message of Tue, 07 Aug 90 08:33:27 EDT. <9008071233.AA25689@starbase> Message-ID: <9008082023.AA07129@kokoro.ucsd.edu> For understanding and implementing conjugate gradient and other optimization methods cleverer than vanilla backprop, I've found the following to be useful: %A William H. Press %T Numerical Recipes in C: The Art of Scientific Computing %I Cambridge University Press %D 1988 %A J. E. Dennis %A R. B. Schnabel %T Numerical Methods for Unconstrained Optimization and Nonlinear Equations %I Prentice-Hall %D 1983 %A R. Fletcher %T Practical Methods of Optimization, Vol. 1: Unconstrained Optimization %I John Wiley & Sons %D 1980 --Paul Kube at ucsd.edu From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Wed Aug 8 10:09:48 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Wed, 08 Aug 90 10:09:48 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Wed, 08 Aug 90 08:39:55 -0500. <9008081339.AA05550@learning.siemens.com.siemens.com> Message-ID: I got this clarification from Steve Hanson of his original query, which I found a bit cryptic: Isn't cascade Correlation a version (almost exact except for splitting rule--although I believe CART allows for other splitting rules) of CART---the decision tree with the hyperplane feature space cuts...? My memory of Cart is a bit fuzzy, but I think it's very different from Cascade-Correlation. Unless I'm confused, here are a couple of glaring differences: 1. 
In a decision-tree setup like CART, each new split works within one of the regions of space that you've already carved out -- that is, within only one branch of the tree. So for something like N-bit parity, you'd need 2^N hidden units (hyperplanes). In a single-layer backprop net, you need only N hidden units because they are shared. Because it creates higher-order units, Cascade-Correlation can generally do the job in less than N. (See the results in the Cascade-Correlation paper.) I don't remember if any version of CART makes serendipitous use of hyperplanes that were created earlier to split other branches. I am pretty sure, however, that it works on splitting just one branch at a time, and doesn't actively try to create hyperplanes that are useful in splitting many branches at once. 2. If you create all your new hidden units in a single layer, all you can do is create hyperplanes in the original space of input features. Because it builds up multiple layers, Cascade-Correlation can create higher-order units of great complexity, not just hyperplanes. If you have the tech report on Cascade-Correlation (the diagrams had to be cut from the NIPS version due to page limitations), look at the strange complex curves it creates in solving the two-spirals problem. If you prefer, Cascade-Correlation works by raising the dimensionality of the space and then drawing hyperplanes in this new complex space, but the projection back onto the original input space does not look like a straight line. I've never heard of anyone solving the two-spirals problem with a single layer of sigmoid or threshold units -- it would take an awful lot of them. I think that these two differences change the game entirely. The only resemblance I see between CART and Cascade-Correlation is that both build up a structure little by little, trying to add new nonlinear elements that eliminate some part of the remaining error. But the kinds of structures the two algorithms deal in is qualitatively different. -- Scott From pollack at cis.ohio-state.edu Wed Aug 8 02:11:16 1990 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Wed, 8 Aug 90 02:11:16 -0400 Subject: Cascade-Correlation, etc Message-ID: <9008080611.AA11352@dendrite.cis.ohio-state.edu> Scott's description of his method and the need for a convergence proof, reminded me of the line of research by Meir & Domany (Complex Sys 2 1988) and Nadal & Mezard (Int.Jrnl. Neural Sys 1,1,1989). In a paper definitely related to theirs (which I cannot find), someone proved (by construction) that each hidden unit added on top of a feedforward TLU network could monotonically decrease the number of errors for arbitrary-fan-in, single-output boolean functions. This result might be generalizable to CC networks. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Fax/Phone: (614) 292-4890 From FEGROSS%WEIZMANN.BITNET at VMA.CC.CMU.EDU Thu Aug 9 01:51:08 1990 From: FEGROSS%WEIZMANN.BITNET at VMA.CC.CMU.EDU (Tal Grossman) Date: Thu, 09 Aug 90 08:51:08 +0300 Subject: Network Constructing Algorithms. Message-ID: Network constructing algorithms, i.e. learning algorithms which add units while training, receive a lot of interest these days. I've recently compiled a reference list of papers presenting such algorithms. I send this list as a small contribution to the last discussion. I hope people will find it relevant and usefull. 
Of course, it is probably not exhaustive - and I'd like to hear about any other related work. Note that two refs. are quite old (Hopcroft and Cameron) - from the threshold logic days. A few papers include convergence proofs (Frean, Gallant, Mezard and Nadal, Marchand et al). Naturally, there is a significant overlap between some of the algorithms/architectures. I also apologize for the primitive TeX format.

Tal Grossman <fegross at weizmann>
Electronics Dept.
Weizmann Inst.
Rehovot 76100, ISRAEL.

-------------------------------------------------------------------------------

\centerline{\bf Network Generating Learning Algorithms - References.}

T. Ash, ``Dynamic Node Creation in Back-Propagation Networks", Tech. Rep. 8901, Inst. for Cognitive Sci., Univ. of California, San Diego.

Cameron S.H., ``The Generation of Minimal Threshold Nets by an Integer Program", IEEE TEC {\bf EC-13}, 299 (1964).

S.E. Fahlman and C.L. Lebiere, ``The Cascade-Correlation Learning Architecture", in {\it Advances in Neural Information Processing Systems 2}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1990), pp. 524.

M. Frean, ``The Upstart Algorithm: a Method for Constructing and Training Feed Forward Neural Networks", Neural Computation {\bf 2}:2 (1990).

S.I. Gallant, ``Perceptron-Based Learning Algorithms", IEEE Trans. on Neural Networks {\bf 1}, 179 (1990).

M. Golea and M. Marchand, ``A Growth Algorithm for Neural Network Decision Trees", EuroPhys. Lett. {\bf 12}, 205 (1990).

S.J. Hanson, ``Meiosis Networks", in {\it Advances in Neural Information Processing Systems 2}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1990), pp. 533.

Honavar V. and Uhr L., in the {\it Proc. of the 1988 Connectionist Models Summer School}, Touretzky D., Hinton G. and Sejnowski T. eds. (Morgan Kaufmann, San Mateo, 1988).

Hopcroft J.E. and Mattson R.L., ``Synthesis of Minimal Threshold Logic Networks", IEEE TEC {\bf EC-14}, 552 (1965).

Mezard M. and Nadal J.P., ``Learning in Feed Forward Layered Networks - The Tiling Algorithm", J. Phys. A {\bf 22}, 2129 (1989).

J. Moody, ``Fast Learning in Multi Resolution Hierarchies", in {\it Advances in Neural Information Processing Systems 1}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1989).

J.P. Nadal, ``Study of a Growth Algorithm for a Feed Forward Network", International J. of Neural Systems {\bf 1}, 55 (1989).

Rujan P. and Marchand M., ``Learning by Activating Neurons: A New Approach to Learning in Neural Networks", Complex Systems {\bf 3}, 229 (1989); and also in the {\it Proc. of the First International Joint Conference on Neural Networks - Washington D.C. 1989}, Vol. II, pp. 105.

J.A. Sirat and J.P. Nadal, ``Neural Trees: A New Tool for Classification", preprint, submitted to "Network", April 90.

\bye

From LAUTRUP at nbivax.nbi.dk Thu Aug 9 05:19:00 1990
From: LAUTRUP at nbivax.nbi.dk (Benny Lautrup)
Date: Thu, 9 Aug 90 11:19 +0200 (NBI, Copenhagen)
Subject: International Journal of Neural Systems
Message-ID: <510E1F38537FE1E6AD@nbivax.nbi.dk>

Begin Message:
-----------------------------------------------------------------------

INTERNATIONAL JOURNAL OF NEURAL SYSTEMS

The International Journal of Neural Systems is a quarterly journal which covers information processing in natural and artificial neural systems. It publishes original contributions on all aspects of this broad subject which involves physics, biology, psychology, computer science and engineering. Contributions include research papers, reviews and short communications.
The journal presents a fresh undogmatic attitude towards this multidisciplinary field with the aim to be a forum for novel ideas and improved understanding of collective and cooperative phenomena with computational capabilities. ISSN: 0129-0657 (IJNS) ---------------------------------- Contents of issue number 3 (1990): 1. A. S. Weigend, B. A. Huberman and D. E. Rumelhart: Predicting the future: A connectionist approach. 2. C. Chinchuan, M. Shanblatt and C. Maa: An artificial neural network algorithm for dynamic programming. 3. L. Fan and T. Li: Design of competition based neural networks for combinatorial optimization. 4. E. A. Ferran and R. P. J. Perazzo: Dislexic behaviour of feed-forward neural networks. 5. E. Milloti: Sigmoid versus step functions in feed-forward neural networks. 6. D. Horn and M. Usher: Excitatory-inhibitory networks with dynamical thresholds. 7. J. G. Sutherland: A holographic model of memory, learning and expression. 8. L. Xu: Adding top-down expectations into the learning procedure of self-organizing maps. 9. D. Stork: BOOK REVIEW ---------------------------------- Editorial board: B. Lautrup (Niels Bohr Institute, Denmark) (Editor-in-charge) S. Brunak (Technical Univ. of Denmark) (Assistant Editor-in-Charge) D. Stork (Stanford) (Book review editor) Associate editors: B. Baird (Berkeley) D. Ballard (University of Rochester) E. Baum (NEC Research Institute) S. Bjornsson (University of Iceland) J. M. Bower (CalTech) S. S. Chen (University of North Carolina) R. Eckmiller (University of Dusseldorf) J. L. Elman (University of California, San Diego) M. V. Feigelman (Landau Institute for Theoretical Physics) F. Fogelman-Soulie (Paris) K. Fukushima (Osaka University) A. Gjedde (Montreal Neurological Institute) S. Grillner (Nobel Institute for Neurophysiology, Stockholm) T. Gulliksen (University of Oslo) D. Hammerstroem (University of Oregon) J. Hounsgaard (University of Copenhagen) B. A. Huberman (XEROX PARC) L. B. Ioffe (Landau Institute for Theoretical Physics) P. I. M. Johannesma (Katholieke Univ. Nijmegen) M. Jordan (MIT) G. Josin (Neural Systems Inc.) I. Kanter (Princeton University) J. H. Kaas (Vanderbilt University) A. Lansner (Royal Institute of Technology, Stockholm) A. Lapedes (Los Alamos) B. McWhinney (Carnegie-Mellon University) M. Mezard (Ecole Normale Superieure, Paris) A. F. Murray (University of Edinburgh) J. P. Nadal (Ecole Normale Superieure, Paris) E. Oja (Lappeenranta University of Technology, Finland) N. Parga (Centro Atomico Bariloche, Argentina) S. Patarnello (IBM ECSEC, Italy) P. Peretto (Centre d'Etudes Nucleaires de Grenoble) C. Peterson (University of Lund) K. Plunkett (University of Aarhus) S. A. Solla (AT&T Bell Labs) M. A. Virasoro (University of Rome) D. J. Wallace (University of Edinburgh) D. Zipser (University of California, San Diego) ---------------------------------- CALL FOR PAPERS Original contributions consistent with the scope of the journal are welcome. Complete instructions as well as sample copies and subscription information are available from The Editorial Secretariat, IJNS World Scientific Publishing Co. Pte. Ltd. 73, Lynton Mead, Totteridge London N20 8DH ENGLAND Telephone: (44)1-446-2461 or World Scientific Publishing Co. Inc. 687 Hardwell St. Teaneck New Jersey 07666 USA Telephone: (1)201-837-8858 or World Scientific Publishing Co. Pte. Ltd. Farrer Road, P. O. 
Box 128
SINGAPORE 9128
Telephone (65)278-6188

-----------------------------------------------------------------------
End Message

From tgd at turing.CS.ORST.EDU Thu Aug 9 01:36:10 1990
From: tgd at turing.CS.ORST.EDU (Tom Dietterich)
Date: Wed, 8 Aug 90 22:36:10 PDT
Subject: Similarity to Cascade-Correlation
In-Reply-To: Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU's message of Wed, 08 Aug 90 10:09:48 EDT <9008090152.AA19554@CS.ORST.EDU>
Message-ID: <9008090536.AA01129@turing.CS.ORST.EDU>

As someone with a lot of experience in decision-tree learning algorithms, I agree with Scott. The main similarity between Cascade-Correlation (CC) and decision tree algorithms like CART is that they are both greedy. CART and related algorithms (e.g., ID3, C4, CN2, GREEDY3) all work by choosing an (axis-parallel) hyperplane and then subdividing the training data along that hyperplane, whereas CC keeps all of the training data together and keeps retraining the output units as it incrementally adds hidden units.

There is an algorithm, called FRINGE, that learns a decision tree and then uses that tree to define new features which are then used to build a new tree (and this process can be repeated, of course). This is the best example I know of a non-connectionist (supervised) algorithm for defining new features.

--Tom

From jose at learning.siemens.com Thu Aug 9 10:14:39 1990
From: jose at learning.siemens.com (Steve Hanson)
Date: Thu, 9 Aug 90 09:14:39 EST
Subject: Similarity to Cascade-Correlation
Message-ID: <9008091414.AA07343@learning.siemens.com.siemens.com>

Thanks for the clarification... however, as I understand CART, it is not required to construct an axis-parallel hyperplane (like ID3 etc.); like CC, any hyperplane is possible. Now as I understand CC, it does freeze the weights for each hidden unit once asymptotic learning takes place and takes as input to a next candidate hidden unit the frozen hidden unit output (ie hyperplane decision or discriminant function). Consequently, CC does not "...keep all of the training data together and retraining the output units (weights?) as it incrementally adds hidden units".

As to higher-order hidden units... I guess I see what you mean; however, don't units below simply send a decision concerning the subset of data which they have correctly classified? Consequently, units above see the usual input features and a newly learned hidden unit feature indicating that some subset of the input vectors are on one side of its decision surface? right? Consequently the next hidden unit in the "cascade" can learn to ignore that subset of the input space and concentrate on other parts of the input space that require yet another hyperplane? It seems as though this would produce a branching tree of discriminantS similar to CART. n'est pas?

Steve

From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Thu Aug 9 11:38:51 1990
From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU)
Date: Thu, 09 Aug 90 11:38:51 EDT
Subject: Similarity to Cascade-Correlation
In-Reply-To: Your message of Thu, 09 Aug 90 09:14:39 -0500. <9008091414.AA07343@learning.siemens.com.siemens.com>
Message-ID:

    Now as I understand CC, it does freeze the weights for each hidden
    unit once asymptotic learning takes place and takes as input to a
    next candidate hidden unit the frozen hidden unit output (ie
    hyperplane decision or discriminant function).

Right. The frozen hidden unit becomes available both for forming an output and as an input to subsequent hidden units.
An aside: Instead of "freezing", I've decided to call this "tenure" from now on. When a candidate unit becomes tenured, it no longer has to learn any new behavior, and from that point on other units will pay attention to what it says. Consequently, CC does not "...keep all of the training data together and retraining the output units (weights?) as it incrementlly adds hidden units". How does this follow from the above? As to higher-order hidden units... I guess i see what you mean, however, don't units below simply send a decision concerning the subset of data which they have correctly classified? It's not just a decision. The unit's output can assume any value in its continuous range. Some hidden units develop big weights and tend to act like sharp-threshold units, while others do not. Consequently, units above see the usual input features and a newly learned hidden unit feature indicating that a some subset of the input vectors are on one side of its decision surface? right? Right, modulo the comment above. Consequently the next hidden unit in the "cascade" can learn to ignore that subset of the input space and concentrate on other parts of the input space that requires yet another hyperplane? It seems as tho this would produce a branching tree of discriminantS similar to cart. No, this doesn't follow at all. Typically there are still errors on both sides of the unit just created, so the next unit doesn't ignore either "branch". It produces some new cut that typically subdivides all (or many) of the regions created so far. Again, I suggest you look at the diagrams in the tech report to see the kinds of "cuts" are actually created. n'est pas? Only eagles nest in passes. Lesser birds hide among the branches of decision trees. :-) -- Scott From Connectionists-Request at CS.CMU.EDU Thu Aug 9 13:33:59 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Thu, 09 Aug 90 13:33:59 EDT Subject: Return addresses Message-ID: <24309.650223239@B.GP.CS.CMU.EDU> I have received several complaints from Connectionists members that they are not able 'reply' to messages because the original sender's address has been removed from the message header. This is a problem with the receiver's local mailer. Rather than having me try to remotely trouble shoot 150 different mailers, the problem could be solved by including a return email address as part of the body of any message sent to Connectionists. I would also like to remind subscribers that a copy of main mailing list is available in the Connectionists archives. Scott Crowder Connectionists-Request at cs.cmu.edu (ARPAnet) <- see, it isn't that hard ------------------------------------------------------------------------------- The CONNECTIONISTS Archive: --------------------------- All e-mail messages sent to "Connectionists at cs.cmu.edu" starting 27-Feb-88 are now available for public perusal. A separate file exists for each month. The files' names are: arch.yymm where yymm stand for the obvious thing. Thus the earliest available data are in the file: arch.8802 Files ending with .Z are compressed using the standard unix compress program. To browse through these files (as well as through other files, see below) you must FTP them to your local machine. ------------------------------------------------------------------------------- How to FTP Files from the CONNECTIONISTS Archive ------------------------------------------------ 1. Open an FTP connection to host B.GP.CS.CMU.EDU (Internet address 128.2.242.8). 2. 
Login as user anonymous with password your username. 3. 'cd' directly to one of the following directories: /usr/connect/connectionists/archives /usr/connect/connectionists/bibliographies 4. The archives and bibliographies directories are the ONLY ones you can access. You can't even find out whether any other directories exist. If you are using the 'cd' command you must cd DIRECTLY into one of these two directories. Access will be denied to any others, including their parent directory. 5. The archives subdirectory contains back issues of the mailing list. Some bibliographies are in the bibliographies subdirectory. Problems? - contact us at "Connectionists-Request at cs.cmu.edu". Happy Browsing Scott Crowder Connectionists-Request at cs.cmu.edu ------------------------------------------------------------------------------- From orjan at thalamus.sans.bion.kth.se Thu Aug 9 19:47:34 1990 From: orjan at thalamus.sans.bion.kth.se (Orjan Ekeberg) Date: Thu, 09 Aug 90 19:47:34 N Subject: Network Constructing Algorithms. In-Reply-To: Your message of Thu, 09 Aug 90 08:51:08 O. <9008091705.AAgarbo.bion.kth.se13977@garbo.bion.kth.se> Message-ID: <9008091747.AA12363@thalamus> I assume that some of the work that we have been doing would fit well in this context too. Based on a recurrent network, higher order units are added automatically. The new units become part of the recurrent set and helps to make the training patterns fixpoints of the network. A couple of references (in bibtex format): @inproceedings{sans:alaoe87, author = {Anders Lansner and {\"O}rjan Ekeberg}, year = 1987, title = {An Associative Network Solving the ``4-Bit ADDER Problem''}, booktitle = {Proceedings of the IEEE First Annual International Conference on Neural Networks}, pages = {II{-}549}, address = {San Diego, USA}, month = jun} @inproceedings{sans:paris88, author = {{\"O}rjan Ekeberg and Anders Lansner}, year = 1988, title = {Automatic Generation of Internal Representations in a Probabilistic Artificial Neural Network}, booktitle = {Neural Networks from Models to Applications}, editor = {L. Personnaz and G. Dreyfus}, publisher = {I.D.S.E.T.}, address = {Paris}, pages = {178--186}, note = {Proceedings of {nEuro}-88, The First European Conference on Neural Networks}, abstract = {In a one layer feedback perceptron type network, the connections can be viewed as coding the pairwise correlations between activity in the corresponding units. This can then be used to make statistical inference by means of a relaxation technique based on bayesian inferences. When such a network fails, it might be because the regularities are not visible as pairwise correlations. One cure would then be to use a different internal coding where selected higher order correlations are explicitly represented. 
A method for generating this representation automatically is reviewed and results from experiments regarding the resulting properties is presented with a special focus on the networks ability to generalize properly.}} +---------------------------------+-----------------------+ + Orjan Ekeberg + O---O---O + + Department of Computing Science + \ /|\ /| Studies of + + Royal Institute of Technology + O-O-O-O Artificial + + S-100 44 Stockholm, Sweden + |/ \ /| Neural + +---------------------------------+ O---O-O Systems + + EMail: orjan at bion.kth.se + SANS-project + +---------------------------------+-----------------------+ From pollack at cis.ohio-state.edu Thu Aug 9 12:14:19 1990 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Thu, 9 Aug 90 12:14:19 -0400 Subject: Cascade Correlation and Convergence Message-ID: <9008091614.AA14222@dendrite.cis.ohio-state.edu> Scott's description of his algorithm, and lack of convergence proof, reminded me of the line of research by Meir and Domany (Complex Systems 2, 1988) and Mezard and Nadal (Int J Neu Systems, 1,1 1989) on methods for directly constructing networks. In a related paper (which I cannot find), I'm quite sure that someone proved by construction that any (n input, 1 output) boolean function could be accomplished by a layering of TLU's, where each additional unit is guaranteed to decrease the number of mis-classified inputs. Perhaps this approach would help lead to some convergence proof for CC networks. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Fax/Phone: (614) 292-4890 From bgupta at aries.intel.com Thu Aug 9 19:19:58 1990 From: bgupta at aries.intel.com (Bhusan Gupta) Date: Thu, 9 Aug 90 16:19:58 PDT Subject: Job opening at Intel for NN IC designer Message-ID: <9008092319.AA04843@aries> The neural network group at Intel is looking for an engineer to participate in the development of neural networks. A qualified applicant should have a M.S. or PhD in electrical engineering or equivalent experience. The specialization required is in CMOS circuit design with an emphasis on digital design. Analog design experience is considered useful as well. Familiarity with neural network architectures, learning algorithms, and applications is desirable. The duties that are specific to this job are: Neural network design. Architecture definition and circuit design. Chip planning, layout supervision and verification. Testing and debugging silicon. The neural network design consists primarily of digital design with both a gate-level and transistor-level emphasis. The job is at the Santa Clara site and is currently open. Interested principals can email at bgupta at aries.intel.com until the end of August. Resumes in ascii are preferred. I will pass along all responses to the appropriate people. street address: Bhusan Gupta m/s sc9-40 2250 Mission College Blvd. P.O. Box 58125 Santa Clara, Ca 95052 Intel is an equal opportunity employer, etc. Bhusan Gupta From sg at corwin.ccs.northeastern.edu Thu Aug 9 14:34:35 1990 From: sg at corwin.ccs.northeastern.edu (steve gallant) Date: Thu, 9 Aug 90 14:34:35 EDT Subject: Cascade-Correlation, etc Message-ID: <9008091834.AA18306@corwin.CCS.Northeastern.EDU> To respond to Jordan's suggestion, if you copy the output cell from a stage in cascade correlation into your growing network, then the previous convergence results hold for boolean learning problems. 
This is true whether you copy at every stage or only occasionally. Scott tried a few simulations and there seemed to be some learning speed gain by occasional copying, perhaps 25% on the couple of tests he ran. Also, if I can add an early paper (that includes convergence) to Tal Grossman's list: Gallant, S. I\@. Three Constructive Algorithms for Network Learning. Proc.\ Eighth Annual Conference of the Cognitive Science Society, Amherst, Ma., Aug. 15-17, 1986, 652-660. Steve Gallant From marcus at cns.edinburgh.ac.uk Fri Aug 10 16:37:13 1990 From: marcus at cns.edinburgh.ac.uk (Marcus Frean) Date: Fri, 10 Aug 90 16:37:13 BST Subject: Convergence of constructive algorithms. Message-ID: <8340.9008101537@cns.ed.ac.uk> Jordan Pollack writes: > In a related paper (which I cannot find), I'm quite sure that someone > proved by construction that any (n input, 1 output) boolean function > could be accomplished by a layering of TLU's, where each additional > unit is guaranteed to decrease the number of mis-classified inputs. > Perhaps this approach would help lead to some convergence proof for CC > networks. There are several papers that show convergence via guaranteeing each unit reduces the output's errors by at least one. [NB: They all use linear threshold units, and require for convergence that the training set be composed of binary patterns (or at least convex: every pattern must be separable from all the others), since then the worst case is always that a new unit captures a single pattern and hence is able to correct the output unit by one.] These include The "Tower algorithm": Gallant,S.I. 1986a. Three Constructive Algorithms for Network Learning. Proc. 8th Annual Conf. of Cognitive Science Soc. p652-660. also discussed in Nadal,J. 1989. Study of a Growth Algorithm for Neural Networks International J. of Neural Systems, 1,1:55-59 The performance of this method closely matches that of the "Tiling" Algorithm of Mezard and Nadal, although the proof there is for reduction of at least one error per layer rather than per unit. The "neural decision tree" approach is shown to converge by M. Golea and M. Marchand, A Growth Algorithm for Neural Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). and also J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for Classification, preprint, submitted to "Network", April 90. The "Upstart" algorithm (my favourite....) Frean,M.R. 1990. The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks. Neural Computation. 2:2, 198-209. in which new units are devoted to correcting errors made by existing units (in this sense it has bears some resemblance to Cascade Correlation). A binary tree of units is constructed, but it is not a decision tree: "daughter" units correct their "parent", with the most senior parent being the output unit. Marcus. --------------------------------------------------------------------- From fanty at cse.ogi.edu Fri Aug 10 13:00:03 1990 From: fanty at cse.ogi.edu (Mark Fanty) Date: Fri, 10 Aug 90 10:00:03 -0700 Subject: conjugate gradient optimization program available Message-ID: <9008101700.AA03174@cse.ogi.edu> The speech group at OGI uses conjugate-gradient optimization to train fully connected feed-forward networks. We have made the program (OPT) available for anonymous ftp: 1. ftp to cse.ogi.edu 2. login as "anonymous" with any password 3. cd to "pub/speech" 4. get opt.tar OPT was written by Etienne Barnard at Carnegie-Mellon University. 
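For readers who only want the flavour of the method without fetching the sources: the fragment below is not OPT (OPT is the C program obtained as described above). It is a rough sketch of the same idea, training a small fully connected one-hidden-layer net by conjugate gradients with the modern scipy library. The network size, data set and error measure are invented for the example, and the gradient is left for the optimizer to estimate numerically, whereas a serious implementation computes it analytically by back-propagation.

    # Sketch only: conjugate-gradient training of a tiny feed-forward net.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 4))                 # 40 toy patterns, 4 inputs
    Y = (X[:, 0] * X[:, 1] > 0).astype(float)    # toy two-class target
    n_in, n_hid = 4, 6

    def unpack(theta):
        W1 = theta[:n_hid * (n_in + 1)].reshape(n_hid, n_in + 1)
        w2 = theta[n_hid * (n_in + 1):]
        return W1, w2

    def loss(theta):
        W1, w2 = unpack(theta)
        Xb = np.hstack([X, np.ones((len(X), 1))])     # append bias input
        H = np.tanh(Xb @ W1.T)                        # hidden layer
        Hb = np.hstack([H, np.ones((len(H), 1))])
        out = 1.0 / (1.0 + np.exp(-(Hb @ w2)))        # sigmoid output unit
        return np.mean((out - Y) ** 2)

    theta0 = rng.normal(scale=0.1, size=n_hid * (n_in + 1) + n_hid + 1)
    result = minimize(loss, theta0, method='CG', options={'maxiter': 200})
    print('final training MSE:', result.fun)

Swapping method='CG' for method='BFGS' gives a quasi-Newton variant with no other changes.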
Mark Fanty Computer Science and Engineering Oregon Graduate Institute fanty at cse.ogi.edu 196000 NW Von Neumann Drive (503) 690-1030 Beaverton, OR 97006-1999 From amini at tcville.hac.com Sun Aug 12 23:47:14 1990 From: amini at tcville.hac.com (Afshin Amini) Date: Sun, 12 Aug 90 20:47:14 PDT Subject: signal processing with neural nets Message-ID: <9008130347.AA02757@ai.spl> Hi there: I would like to explore possibilities of using neural nets in a signal processing environment. I would like to get familiar with usage of neural nets in the area of spectral estimation and classification. I have used the popular methods of high resolution spectral estimation such as AR modeling and such. I would like to get some reffrences to recent publications and books that contain specific algorithms that deploys neural networks to achieve such problems in signal processing. thanks, -A. Amini -- Afshin Amini Hughes Aircraft Co. voice: (213) 616-6558 Electro-Optical and Data Systems Group Signal Processing Lab fax: (213) 607-0918 P.O. Box 902, EO/E1/B108 email: El Segundo, CA 90245 smart: amini at tcville.hac.com Bldg. E1 Room b2316f dumb: amini%tcville at hac2arpa.hac.com uucp: hacgate!tcville!dave From nelsonde%avlab.dnet at wrdc.af.mil Mon Aug 13 10:10:04 1990 From: nelsonde%avlab.dnet at wrdc.af.mil (nelsonde%avlab.dnet@wrdc.af.mil) Date: Mon, 13 Aug 90 10:10:04 EDT Subject: Last Call for Papers for AGARD Conference Message-ID: <9008131410.AA08887@wrdc.af.mil> I N T E R O F F I C E M E M O R A N D U M Date: 13-Aug-1990 10:05am EST From: Dale E. Nelson NELSONDE Dept: AAAT-1 Tel No: 57646 From sankar at caip.rutgers.edu Sun Aug 12 21:15:24 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Sun, 12 Aug 90 21:15:24 EDT Subject: No subject Message-ID: <9008130115.AA08572@caip.rutgers.edu> >>There are several papers that show convergence via guaranteeing each >>unit reduces the output's errors by at least one. >> >> >>The "neural decision tree" approach is shown to converge by >> M. Golea and M. Marchand, A Growth Algorithm for Neural >> Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). >>and also >> J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for >> Classification, preprint, submitted to "Network", April 90. Add to this the following paper: A. Sankar and R.J. Mammone, " A fast learning algorithm for tree neural networks", presented at the 1990 Conference on Information Sciences and Systems, Princeton, NJ, March 21,22,23, 1990. This will appear in the conference proceedings. We also have a more detailed technical report on this research. For copies please contact Ananth Sankar CAIP 117 Brett and Bowser Roads Rutgers University P.O. Box 1390 Piscataway, NJ 08855-1390 From sankar at caip.rutgers.edu Mon Aug 13 13:48:53 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Mon, 13 Aug 90 13:48:53 EDT Subject: No subject Message-ID: <9008131748.AA07712@caip.rutgers.edu> An earlier attempt to mail this seems to have failed..my apologies to everyone who gets a duplicate copy. >>There are several papers that show convergence via guaranteeing each >>unit reduces the output's errors by at least one. >> >> >>The "neural decision tree" approach is shown to converge by >> M. Golea and M. Marchand, A Growth Algorithm for Neural >> Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). >>and also >> J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for >> Classification, preprint, submitted to "Network", April 90. Add to this the following paper: A. Sankar and R.J. 
Mammone, " A fast learning algorithm for tree neural networks", presented at the 1990 Conference on Information Sciences and Systems, Princeton, NJ, March 21,22,23, 1990. This will appear in the conference proceedings. We also have a more detailed technical report on this research. For copies please contact Ananth Sankar CAIP 117 Brett and Bowser Roads Rutgers University P.O. Box 1390 Piscataway, NJ 08855-1390 From gary%cs at ucsd.edu Mon Aug 13 15:35:50 1990 From: gary%cs at ucsd.edu (Gary Cottrell) Date: Mon, 13 Aug 90 12:35:50 PDT Subject: Summary (long): pattern recognition comparisons In-Reply-To: Leonard Uhr's message of Fri, 3 Aug 90 14:18:11 -0500 <9008031918.AA23586@thor.cs.wisc.edu> Message-ID: <9008131935.AA19428@desi.ucsd.edu> Leonard Uhr says: >Neural nets using backprop have only handled VERY SIMPLE images, usually in >8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in >logarithmically converging nets, but I don't know of any nets with complete >connectivity from one layer to the next that are that big.) Mike Fleming and I used 64x64 inputs for face recognition. The system does auto-encoding as a preprocessing step, reducing the number of inputs to 80. See IJCNN-90, Vol II p65->. gary cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at cs.ucsd.edu (ARPA) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET) gcottrell at ucsd.edu (BITNET) From kuepper at ICSI.Berkeley.EDU Tue Aug 14 14:32:37 1990 From: kuepper at ICSI.Berkeley.EDU (Wolfgang Kuepper) Date: Tue, 14 Aug 90 11:32:37 PDT Subject: SIEMENS Job Announcement Message-ID: <9008141832.AA02344@icsib21.Berkeley.EDU> IMAGE UNDERSTANDING and ARTIFICIAL NEURAL NETWORKS The Corporate Research and Development Laboratories of Siemens AG, one of the largest companies worldwide in the electrical and elec- tronics industry, have research openings in the Computer Vision as well as in the Neural Network Groups. The groups do basic and applied studies in the areas of image understanding (document inter- pretation, object recognition, 3D modeling, application of neural networks) and artificial neural networks (models, implementations, selected applications). The Laboratory is located in Munich, an attractive city in the south of the Federal Republic of Germany. Connections exists with our sister laboratory, Siemens Corporate Research in Princeton, as well as with various research institutes and universities in Germany and in the U.S. including MIT, CMU and ICSI. Above and beyond the Laboratory facilities, the groups have a network of Sun and DEC workstations, Symbolics Lisp machines, file and compute servers, and dedicated image processing hardware. The successful candidate should have an M.S. or Ph.D. in Computer Science, Electrical Engineering, or any other AI-related or Cognitive Science field. He or she should prefarably be able to communicate in German and English. Siemens is an equal opportunity employer. Please send your resume and a reference list to Peter Moeckel Siemens AG ZFE IS INF 1 Otto-Hahn-Ring 6 D-8000 Muenchen 83 West Germany e-mail: gm%bsun4 at ztivax.siemens.com Tel. +49-89-636-3372 FAX +49-89-636-2393 Inquiries may also be directed to Wolfgang Kuepper (on leave from Siemens until 8/91) International Computer Science Institute 1947 Center Street - Suite 600 Berkeley, CA 94704 e-mail: kuepper at icsi.berkeley.edu Tel. 
(415) 643-9153 FAX (415) 643-7684 From Connectionists-Request at CS.CMU.EDU Thu Aug 16 12:31:34 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Thu, 16 Aug 90 12:31:34 EDT Subject: patience is a virtue Message-ID: <4776.650824294@B.GP.CS.CMU.EDU> Recently a few people have worried that their posts were lost because of the long resend time for messages to the connectionists list. I would like for all users to exercise a little patience. CMU is happy to provide the resources and labor necessary to make the Connectionists list available to the world wide connectionists community. However, we do have limited resources. The Connectionists redistribution machine is a only a VAX 750. This machine also services several other large mailing lists. Delays of 4-6 hours are typical, but delays of >16 hours are possible during high traffic periods. If you are trying to debate an issue with another list member, but think the rest of the list would be interested in the debate it is best to email directly to the other member and cc: Connectionists at cs.cmu.edu. This allows you to carry on your debate at normal email speeds and lets the rest of the community 'listen in' 6-16 hrs latter. If you feel that the delays are a serious impediment to the research progress of the connectionists community, CMU would be happy to accept your donation of new dedicated Connectionists redistribution machine. Scott Crowder Connectionists-Request at cs.cmu.edu (ARPAnet) PS If you have waited more than 24 hours and STILL haven't recieved your post, please contact me at Connectionists-Request at cs.cmu.edu. From xiru at Think.COM Fri Aug 17 16:48:58 1990 From: xiru at Think.COM (xiru@Think.COM) Date: Fri, 17 Aug 90 16:48:58 EDT Subject: backprop for classification Message-ID: <9008172048.AA00756@yangtze.think.com> While we trained a standard backprop network for some classification task (one output unit for each class), we found that when the classes are not evenly distribed in the training set, e.g., 50% of the training data belong to one class, 10% belong to another, ... etc., then the network always biased towards the classes that have the higher percentage in the training set. Thus, we had to post-process the output of the network, giving more weights to the classes that occur less frequently (in reverse proportion to their population). I wonder if other people have encountered the same problem, and if there are better ways to deal with this problem. Thanks in advance for any replies. - Xiru Zhang Thinking Machines Corp. From John.Hampshire at SPEECH2.CS.CMU.EDU Sun Aug 19 13:48:06 1990 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Sun, 19 Aug 90 13:48:06 EDT Subject: backprop for classification Message-ID: Xiru Zhang of Thinking Machines Corp. writes: > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > Thus, we had to post-process the output of the network, giving more weights > to the classes that occur less frequently (in reverse proportion to their > population). > > I wonder if other people have encountered the same problem, and if there > are better ways to deal with this problem. 
Indeed, one can show that any classifier with sufficient functional capacity to model the class-conditional densities of the random vector X being classified (e.g., a MLP with sufficient connectivity to perform the input-to-output functional mapping necessary for robust classification) and trained with a "reasonable error measure" (a term originated by B. Pearlmutter) will yield outputs that are accurate estimates of the a posteriori probabilities of X, given an asymptotically large number of statistically independent training samples. Examples of "reasonable error measures" are mean-squared error (the one used by Xiru Zhang), Cross Entropy, Max. Mutual Info., Kullback-Liebler distance, Max. Likelihood... Unfortunately, one never has enough training data, and it's not always clear what constitutes sufficient but not excessive functional capacity in the classifier. So one ends up *estimating* the a posterioris with one's "reasonable error measure"-trained classifier. If one trains one's classifier with a disproportionately high number of samples belonging to one particular class, one will get precisely the behavior Xiru Zhang describes. ************** This is because the a posterioris depend on the class priors (you can prove this easily using Bayes' rule). If you bias the priors, you will bias the a posterioris accordingly. Your classifier will therefore learn to estimate the biased a posterioris. ************** The best way to fix the problem if you're using a "reasonable error measure" to train your classifier is to have a training set that reflects the true class priors. If this isn't possible, then you can post-process the classifier's outputs by correcting for the biased priors. Whether or not this fix really works depends a lot on the classifier you're using. MLPs tend to be over-parameterized, so they tend to yield binary outputs that won't be affected by this kind of post processing. Another approach might be to avoid using "reasonable error measures" to train your classifier. I have more info regarding such alternatives if anyone cares, but I've already blabbed too much. If you want refs., please send me email directly. Cheers, John From niranjan at engineering.cambridge.ac.uk Sun Aug 19 10:11:29 1990 From: niranjan at engineering.cambridge.ac.uk (Mahesan Niranjan) Date: Sun, 19 Aug 90 10:11:29 BST Subject: backprop for classification Message-ID: <3447.9008190911@dsl.eng.cam.ac.uk> > From: xiru at com.think > Subject: backprop for classification > Date: 19 Aug 90 00:26:28 GMT > > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > This often happens when the network is too small to load the training data. Your network, in this case, does not converge to negligible error. My suggestion is to start with a large network that can load your training data and gradually reduce the size of the net by pruning the weights giving small contributions to the output error. 
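The prior-correction fix described above can be made concrete in a few lines. The sketch below is generic rather than anyone's production code: it assumes the network outputs approximate the posterior probabilities implied by the training-set class frequencies, divides those frequencies out, multiplies in the frequencies expected at test time, and renormalizes (this is just Bayes' rule applied twice).

    # Sketch: correcting estimated posteriors for a mismatch between the class
    # priors of the training set and the priors expected in the field.
    import numpy as np

    def correct_priors(net_outputs, train_priors, test_priors):
        """net_outputs: (n_patterns, n_classes) posteriors from a net trained
        on data whose class frequencies were train_priors."""
        net_outputs = np.asarray(net_outputs, dtype=float)
        ratio = np.asarray(test_priors) / np.asarray(train_priors)
        adjusted = net_outputs * ratio      # divide out old priors, put in new ones
        return adjusted / adjusted.sum(axis=1, keepdims=True)

    # Example: trained on 50%/10%/40% data, deployed where classes are equally likely.
    outputs = np.array([[0.70, 0.05, 0.25]])
    print(correct_priors(outputs, [0.5, 0.1, 0.4], [1/3, 1/3, 1/3]))

An alternative, also raised later in this thread, is to make the correction during training instead, by weighting each pattern's error (or learning rate) inversely to its class frequency so that the low-frequency classes take larger steps.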
niranjan From russ at dash.mitre.org Mon Aug 20 07:17:38 1990 From: russ at dash.mitre.org (Russell Leighton) Date: Mon, 20 Aug 90 07:17:38 EDT Subject: backprop for classification In-Reply-To: xiru@Think.COM's message of Fri, 17 Aug 90 16:48:58 EDT <9008172048.AA00756@yangtze.think.com> Message-ID: <9008201117.AA22280@dash.mitre.org> We have found backprop VERY sensitive to the probability of occurance of each class. As long as you are aware of this you can use this to advantange. For example, if false alarms are a big concern then by training with large amounts of "noise" you can bias the sytem to reduce the Pfa. This effect has been quantified analytically and experimentally for systems with no hidden layers in a paper being compiled now. The bottom line is that a no hidden layer system implements a classical Mini-Max test if the signal classes are represented equally in the training set. By varying the the composition of the training sets, the network can be designed relative to a known maximum false alarm probablity independent of signal-to-noise ratio. This work continues for multi-layer systems. An experimental account of how to exploit this effect for signal classification can be found in: Wieland, et al., `An Analysis of Noise Tolerance for a Neural Network Recognition System', Mitre Tech. Rep. MP-88W00021, 1988 and Wieland, et al., `Shaping Schedules as a Method of Accelerated Learning', Proceedings of the first INNS Meeting, 1988 Russ. NFSNET: russ at dash.mitre.org Russell Leighton MITRE Signal Processing Lab 7525 Colshire Dr. McLean, Va. 22102 USA From wan at whirlwind.Stanford.EDU Mon Aug 20 14:07:39 1990 From: wan at whirlwind.Stanford.EDU (Eric A. Wan) Date: Mon, 20 Aug 90 11:07:39 PDT Subject: Survey of Second Order Techniques Message-ID: <9008201807.AA13338@whirlwind.Stanford.EDU> I am compiling a study on the extent to which researches have gone beyond simple gradient descent (back-propagation) for training layered neural networks by applying more sophisticated classical techniques in non-linear optimization (e.g. Newton, Quasi-Newton, Conjugate-Gradient methods, etc.)? Please e-mail me any comments and/or references that you have on the subject. I will summarize the responses. Thanks in advance. Eric Wan wan at isl.stanford.edu From YVES%LAVALVM1.BITNET at vma.CC.CMU.EDU Mon Aug 20 11:36:47 1990 From: YVES%LAVALVM1.BITNET at vma.CC.CMU.EDU (Yves (Zip) Lacouture) Date: Mon, 20 Aug 90 11:36:47 HAE Subject: BP for categorization... Message-ID: > From: xiru at com.think > Subject: backprop for classification > Date: 19 Aug 90 00:26:28 GMT > > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > I encountered the same problem in a similar situation. This occur with limited resources (HU): the network tend to neglet a subset of the stimuli. The phenomenon is also observed when the stimuli have the same presentation probability and the resources are very limited. It helps to use a non-orthogonal representation (e.g. by activating neighbor units). 
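One concrete reading of "activating neighbor units" is to replace the 1-of-N target with a smeared target, assuming the output classes lie along some ordering that makes "neighbor" meaningful (as in identification along a stimulus continuum). The profile values in the sketch below are invented for illustration.

    # Sketch of a non-orthogonal target coding: neighbours of the correct output
    # unit also receive partial target activation.  The 1.0/0.5/0.1 profile is
    # arbitrary and chosen only for the example.
    import numpy as np

    def smeared_target(correct_class, n_classes, profile=(1.0, 0.5, 0.1)):
        t = np.zeros(n_classes)
        for offset, value in enumerate(profile):
            for j in (correct_class - offset, correct_class + offset):
                if 0 <= j < n_classes:
                    t[j] = max(t[j], value)
        return t

    print(smeared_target(3, 8))   # -> [0.  0.1 0.5 1.  0.5 0.1 0.  0. ]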
To build a model of (human) simple identification I modified BP to incorporate a selective attention mechanism by which the adaptative modifications are made larger for the stimuli for which performances are worse. I expect to offer a TR on this topic soon. yves From chrisley at parc.xerox.com Mon Aug 20 13:35:08 1990 From: chrisley at parc.xerox.com (Ron Chrisley) Date: Mon, 20 Aug 90 10:35:08 PDT Subject: backprop for classification In-Reply-To: xiru@Think.COM's message of Fri, 17 Aug 90 16:48:58 EDT <9008172048.AA00756@yangtze.think.com> Message-ID: <9008201735.AA07158@owl.parc.xerox.com> Xiru, you wrote: "While we trained a standard backprop network for some classification task (one output unit for each class), we found that when the classes are not evenly distribed in the training set, e.g., 50% of the training data belong to one class, 10% belong to another, ... etc., then the network always biased towards the classes that have the higher percentage in the training set. Thus, we had to post-process the output of the network, giving more weights to the classes that occur less frequently (in reverse proportion to their population)." My suggestion: most BP classification paradigms will work best if you are using the same distribution for training as for testing. So only worry about uneven distribution of classes in the training data if the input on which the network will have to perform does not have that distribution. If rocks are 1000 times more common than mines, then given that something is completely qualitatively ambiguous with respect to the rock/mine distinction, it is best (in terms of minimizing # of misclassifications) to guess that the thing is a rock. So being biased toward rock classifications is a valid way to minimize misclassification. (Of course, once you start factoring in cost, this will be skewed dramatically: it is much better to have a false alarm about a mine than to falsely think a mine is a rock.) In summary, uneven distributions aren't, in themselves, bad for training, nor do they require any post-processing. However, distributions that differ from real-world ones will require some sort of post-processing, as you have done. But there is another issue here, I think. How were you using the network for classification? From your message, it sounds like you were training and interpreting the network in such a way that the activations of the output nodes were supposed to correspond to the conditional probabilities of the different classes, given the input. This would explain what you meant by your last sentence in the above quote. But there are other ways of using back-propagation. For instance, if one does not constrain the network to estimate conditional probabilities, but instead has it solve the more general problem of minimizing classification error, then it is possible that the network will come up with a solution that is not affected by differences of prior probabilities of classes in the training and testing data. Since it is not solving the problem by classifying via maximum liklihood, its solutions will be based on the frequency-independent, qualitative structure of the inputs. In fact, humans often do something like this. The phenomenon is called "base rate neglect". The phenomenon is notorious in that when qualitative differences are not so marked between a rare and a common class, humans will always over-classify inputs into the rare class. 
That is, if the symptoms a patient has even *slightly* indicate a rare tropical disease over a common cold, humans will give the rare disease dignosis, even though it is extremely unlikely that the patient has that disease. Of course, the issue of cost is again being ignored here. (See Gluck and Bower for a look at the relation between neural networks and base rate neglect). Such limitations aside, classification via means other than conditional probability estimation may be desirable for certain applications. For example, those in which you do not know the priors, or they change dramatically in an unpredictable way. And/or where there is a strong qualitative division bewteen members of the classes. In such cases, you might get good classification performance, even when the distributions differ, by relying more on qualitative differences in the inputs than in the frequency of the classes. Does this sound right? Ron Chrisley chrisley at csli.stanford.edu Xerox PARC SSL New College Palo Alto, CA 94304 Oxford OX1 3BN, UK (415) 494-4728 (865) 793-484 From niranjan at engineering.cambridge.ac.uk Tue Aug 21 20:20:36 1990 From: niranjan at engineering.cambridge.ac.uk (Mahesan Niranjan) Date: Tue, 21 Aug 90 20:20:36 BST Subject: Backprop for classification Message-ID: <5229.9008211920@dsl.eng.cam.ac.uk> > From: xiru at com.think > Subject: backprop for classification > Date: 19 Aug 90 00:26:28 GMT > > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > This often happens when the network is too small to load the training data. Your network, in this case, does not converge to negligible error. My suggestion is to start with a large network that can load your training data and gradually reduce the size of the net by pruning the weights giving small contributions to the output error. niranjan From der%beren at Forsythe.Stanford.EDU Wed Aug 22 13:35:59 1990 From: der%beren at Forsythe.Stanford.EDU (Dave Rumelhart) Date: Wed, 22 Aug 90 10:35:59 PDT Subject: BP for categorization...relative frequency problem In-Reply-To: "Yves (Zip) Lacouture"'s message of Mon, 20 Aug 90 11:36:47 HAE <9008210406.AA11690@nprdc.navy.mil> Message-ID: <9008221735.AA07583@beren.> We have also encountered the problem. Since BP does gradient descent and since the contribution of any set of patterns depends in part on the relative frequency of those patterns, fewer resources are allocated to low fequency categories. Morover, those resources are allocated later in the training -- probably after over-fitting has already become a problem for higher frequency categories. Of course, if your training distribution is the same as your testing distribution you wil be getting the appropriate Baysian estimate of the class probabilities. On the other hand, if the generalization distribution is unknown at test time we may wish to factor out the relative frequency of your input frequency during training and add any known "priors" during generalization. There are two ways to do this. One way, suggested in one of the notes on this topic is to "post process" out output data. That is, divide the output unit value by the relative frequency in the training set and multiply by the relative frequency in the test set. 
This will give you an estimate of the Bayesian probability for the test set. For a variety of reasons, this is less appropriate that correcting during training. In this case, the procedure is to effectively increase the learning rate inversely proportional to the relative frequency of the category in the training set. Thus, we take bigger learning steps on low frequency categories. In a simple classification task, this is roughly equivalent to normalizing the data set by sampling each category set equally. In the case of cross-classification (in whihch a given input can be a member of more the one class), it is roughly equivalent to weighting each inversely by the probability that that pattern would occur, given independence between the output classes. We have used this method successfully in a system designed to classify mass spectra. In this method an output of .5 means that the evidence for and against the category is equal. Whereas, in the normal traing method, an output equal to the relative frequency in the training set means that the evidence for and against is equal. In some cases this can be very small. It is possibly to add the priors in manually and compare performance on the training set with the original method. We find that we do only slightly worse on the training set with the two methods. We do much better in generalization on classes that were low frequency in the training set and slightly worse on classes which were high frequency in the training set. der From hendler at cs.UMD.EDU Wed Aug 22 16:28:52 1990 From: hendler at cs.UMD.EDU (Jim Hendler) Date: Wed, 22 Aug 90 16:28:52 -0400 Subject: BP for categorization...relative frequency problem Message-ID: <9008222028.AA09120@dormouse.cs.UMD.EDU> Herve Bourlard and Nelson Morgan had to deal with this problem in a system being used in the context of continuous speech recognition. They solved the problem, to some extent, by dividing the output category strengths by the prior probabilities of the training set. This avoided having to do anything terribly tricky in the network, and let them use classical back-propagation without extension (although I think they've also used some recurrences in one version). I know there have been several nice publications of their work in speech - various papers with the authors Bourlard, Wellekens, and Morgan in various combinations. Morgan is at ICSI, and is probably the most accessible of these authors for requesting reprints. -Jim Hendler UMCP From PSS001%VAXA.BANGOR.AC.UK at vma.CC.CMU.EDU Wed Aug 22 14:47:17 1990 From: PSS001%VAXA.BANGOR.AC.UK at vma.CC.CMU.EDU (PSS001%VAXA.BANGOR.AC.UK@vma.CC.CMU.EDU) Date: Wed, 22 AUG 90 18:47:17 GMT Subject: No subject Message-ID: Department of Psychology, University of Wales, Bangor and Department of Psychology, University of York CONNECTIONISM AND PSYCHOLOGY THREE POST-DOCTORAL RESEARCH FELLOWSHIPS Applications are invited for three post-doctoral research fellowships to work on the connectionist and psychological modelling of human short-term memory and spelling development. Two Fellowships are available for three years, on an ESRC- funded project concerned with the development and evaluation of a connectionist model of short-term memory. One Fellow will be based with Dr. Gordon Brown in the Cognitive Neurocomputation Unit at Bangor and will be responsible for implementing the model. The other Fellow, based at York with Dr. Charles Hulme, will be responsible for undertaking psychological experiments with children and adults to evaluate the model. 
Starting salary for both posts on research 1A grade up to # 13,495. One two-year Fellowship is available to work on an MRC-funded project to develop a sequential connectionist model of the development of spelling and phonemic awareness in children. This post is based in Bangor with Dr. Gordon Brown. Starting salary on research 1A grade up to # 14,744. Applicants should have postgraduate research experience or interest in cognitive psychology/cognitive science or connectionist/ neural network modelling and computer science. Good computing skills are essential for the posts based in Bangor, and experience in running psychological experiments is required for the York-based post. Excellent computational and research facilities will be available to the successful applicants. The appointments may commence from 1st. October 1990, but start could be delayed until 1st. January 1991. Closing date for applications is 7th. September 1990, but intending applicants should get in touch as soon as possible. Informal enquiries regarding the Bangor-based posts, and requests for further details of the posts and host departments, to Gordon Brown (0248 351151 Ext 2624; email PSS001 at uk.ac.bangor.vaxa); informal enquiries concerning the York-based post to Charles Hulme ( 0904 433145; email ch1 at uk.ac.york.vaxa). Applications (in the form of a curriculum vitae and the names and addresses of two referees) should be sent to Mr. Alan James, Personnel Office, University of Wales, Bangor, Gwynedd LL57 2DG, UK. (Apologies to anyone who receives this posting through more than one list or newsgroup) From MUSICO%BGERUG51.BITNET at vma.CC.CMU.EDU Thu Aug 23 17:22:00 1990 From: MUSICO%BGERUG51.BITNET at vma.CC.CMU.EDU (MUSICO%BGERUG51.BITNET@vma.CC.CMU.EDU) Date: Thu, 23 Aug 90 17:22 N Subject: signoff Message-ID: signoff From HKF218%DJUKFA11.BITNET at vma.CC.CMU.EDU Fri Aug 24 12:08:15 1990 From: HKF218%DJUKFA11.BITNET at vma.CC.CMU.EDU (Gregory Kohring) Date: Fri, 24 Aug 90 12:08:15 MES Subject: Preprints Message-ID: The following preprint is currently available. -- Greg Kohring Performance Enhancement of Willshaw Type Networks through the use of Limit Cycles G.A. Kohring HLRZ an der KFA Julich (Supercomputing Center at the KFA Julich) Simulation results of a Willshaw type model for storing sparsely coded patterns are presented. It is suggested that random patterns can be stored in Willshaw type models by transforming them into a set of sparsely coded patterns and retrieving this set as a limit cycle. In this way, the number of steps needed to recall a pattern will be a function of the amount of information the pattern contains. A general algorithm for simulating neural networks with sparsely coded patterns is also discussed, and, on a fully connected network of N=36 864 neurons (1.4 billion couplings), it is shown to achieve effective updating speeds as high as 160 billion coupling evaluations per second on one Cray-YMP processor. ================================================================== Additionally, the following short review article is also available. It is aimed at graduate students in computational physics who need an overview of the neural network literature from a computational sciences viewpoint, as well as some simple programming hints in order to get started with their neural network studies. It will shortly appear in World Scientific's Internationl Journal of Modern Physics C: Compuational Physics. LARGE SCALE NEURAL NETWORK SIMULATIONS G.A. 
Kohring HLRZ an der KFA Julich (Supercomputing Center at the KFA Julich) The current state of large scale, numerical simulations of neural networks is reviewed. Hardware and software improvements make it likely that biological size networks, i.e., networks with more than $10^{10}$ couplings, can be simulated in the near future. Sample programs for the efficient simulation of a few simple models are presented as an aid to researchers just entering the field. Send Correspondence and request for preprints to: G.A. Kohring HLRZ an der KFA Julich Postfach 1913 D-5170 Julich, West Germany e-mail: hkf218 at djukfa11.bitnet Address after September 1, 1990: Institut fur Theoretische Physik Universitat zu Koln D-5000 Koln 41, West Germany From Connectionists-Request at CS.CMU.EDU Fri Aug 24 10:31:02 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Fri, 24 Aug 90 10:31:02 EDT Subject: Quantitative Linguistics Conference Announcement Message-ID: <10643.651508262@B.GP.CS.CMU.EDU> First QUANTITATIVE LINGUISTICS CONFERENCE (QUALICO) September 23 - 27, 1991 University of Trier, Germany organized by the GLDV - Gesellschaft fuer Linguistische Datenverarbeitung (German Society for Linguistic Computing) and the Editors of "Quantitative Linguistics" OBJECTIVES QUALICO is being held for the first time as an International Conference to demonstrate the state of the art in Quantitative Linguistics. This domain of language study and research is gaining considerable interest due to recent advances in linguistic modelling, particularly in computational linguistics, cognitive science, and developments in mathematics like non- linear systems theory. Progress in hard- and software technology together with ease of access to data and numerical processing has provided new means of empirical data acquisition and the application of mathematical models of adequate complexity. The German Society for Linguistic Computation (Gesellschaft fuer Linguistische Datenverarbeitung - GLDV) and the editors of 'Quantitative Linguistics' have taken the initiative in preparing this conference to take place at the University of Trier, in Trier (Germany), September 23rd - 27th, 1991. In view of the stimulating new developments in Europe and the academic world, the organizers' aim is to encourage and promote mutual exchange of ideas in this field of interest which has been limited in the past. Challenging advances in interdisciplinary quantitative analyses, numerical modelling and experimental simulations from different linguistic domains will be reported on by the following keynote speakers: Gabriel Altmann (Bochum), Michail V. Arapov (Moskau) (pending acceptance), Hans Goebl (Salzburg), Mildred L.G. Shaw (Calgary), John S. Nicolis (Patras), Stuart M. Shieber (Harvard) (pending acceptance). CALL FOR PAPERS The International Program Committee invites communications (long papers: 20 minutes plus 10; short papers: 15 minutes plus 5; demonstrations and posters) on basic research and development as well as on operational applications of Quantitative Linguistics, including - but not limited to - the following topics: A. Methodology 1. Theory Construction - 2. Measurement, Scaling - 3. Taxonomy, Categorizing - 4. Simulation - 5. Statistics, Probabilistic Modells, Stochastic Processes - 6. Fuzzy Theory: Possibilistic Modells - 7. Language and Grammar Formalisms - 8. Systems Theory: Cybernetics and Information Theory, Synergetics, New Connectionism B. Linguistic Analysis and Modelling 1. Phonetics - 2. 
Phonemics - 3. Morphology - 4. Syntax - 5. Semantics - 6. Pragmatics - 7.Lexicology - 8. Dialectology - 9. Typology - 10. Text and Discourse - 11. Semiotics C. Applications 1. Speech Recognition and Synthesis - 2.Text Analysis and Generation - 3. Language Acquisition and Teaching - 4.Text Understanding and Knowledge Representation Authors are asked to submit extended abstracts (1500 words; 4 copies) of their papers in one of the conference's working languages (German, English) not later than December 31, 1990 to: QUALICO - The Program Committee University of Trier P.O.Box 3825 D-5500 TRIER Germany uucp: qualico at utrurt.uucp or: ..!unido!utrurt!qualico X.400: qualico at ldv.rz.uni-trier.dbp.de or: Notice of acceptance will be given by March 31, 1991; and full versions of invited and accepted papers (camera-ready) are due by June 30, 1991 in order to have the Conference Proceedings be published in time to be available for participants at the beginning of QUALICO. This 'Call for Papers' is distributed world-wide in order to reach researchers active in universities and industry. SOCIAL PROGRAMME The oldest city in Germany, founded 16 b.C. by the Romans as Augusta Treverorum in the Mosel valley is situated now in the most Western region of Germany near both the French and Luxembourgian border.In the center of Europe this ancient city will host the participants of QUALICO at the University of Trier, surrounded by the vineyards of the Mosel-Saar-Ruwer wine district at vintage beginning. The excursion day scheduled midway through the conference (September 25, 1991) will provide an opportunity to visit points of historical interest in the city and its vicinity during a boat-trip on the Mosel river. PROGRAM COMMITTEE Chair: B.B. Rieger, University of Trier S. Embleton, University of York, D. Gibbon, University of Bielefeld R. Grotjahn, University of Bochum J. Haller, IAI Saarbruecken P. Hellwig, University of Heidelberg E. Hopkins, University of Bochum J. Kindermann, GMD Bonn-St.Augustin U. Klenk, University of Goettingen R. Koehler, University of Trier J.P. Koester, University of Trier J. Krause, University of Regensburg W. Lehfeldt, University of Konstanz W. Lenders, University of Bonn C. Lischka, GMD Bonn-St.Augustin W. Matthaeus, University of Bochum R.G. Piotrowski, University of Leningrad D. Roesner, FAW Ulm G. Ruge, Siemens AG, Muenchen B. Schaeder, University of Siegen H. Schnelle, University of Bochum J. Sambor, University of Warsaw ORGANIZING COMMITTEE Chair: R. Koehler, University of Trier CONFERENCE FEES Early registration (paid before July 31, 1991): DM 300,- - Members of supporting organizations DM 250,- - Students (without Proceedings) DM 150,- Registration (paid after July 31, 1991): DM 400,- - Members of supporting organizations DM 350,- - Students (without Proceedings) DM 250,- From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Fri Aug 24 12:36:28 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Fri, 24 Aug 90 12:36:28 EDT Subject: Quantitative Linguistics??? Message-ID: Perhaps the people who sent out this conference announcement could follow up with a *brief* description of what quantitative linguistics is all about, and why they are so excited about new advances in the area. I'm not familiar with the term, and the conference announcement didn't make clear how qualitaive linguistics differs from older (qualitative?) linguistic models, except maybe that the key researchers are all in Europe. 
And what does quantitative linguistics have to do with connectionism? -- Scott Fahlman, Carnegie-Mellon University From bms at dcs.leeds.ac.uk Fri Aug 24 13:26:57 1990 From: bms at dcs.leeds.ac.uk (B M Smith) Date: Fri, 24 Aug 90 13:26:57 BST Subject: Item for Distribution Message-ID: <1511.9008241226@csuna6.dcs.leeds.ac.uk> FINAL CALL FOR PAPERS AISB'91 8th SSAISB CONFERENCE ON ARTIFICIAL INTELLIGENCE University of Leeds, UK 16-19 April, 1991 The Society for the Study of Artificial Intelligence and Simulation of Behaviour (SSAISB) will hold its eighth biennial conference at Bodington Hall, University of Leeds, from 16 to 19 April 1991. There will be a Tutorial Programme on 16 April followed by the full Technical Programme. The Programme Chair will be Luc Steels (AI Lab, Vrije Universiteit Brussel). Scope: Papers are sought in all areas of Artificial Intelligence and Simulation of Behaviour, but especially on the following AISB91 special themes: * Emergent functionality in autonomous agents * Neural networks and self-organisation * Constraint logic programming * Knowledge level expert systems research Papers may describe theoretical or practical work but should make a significant and original contribution to knowledge about the field of Artificial Intelligence. A prize of 500 pounds for the best paper has been offered by British Telecom Computing (Advanced Technology Group). It is expected that the proceedings will be published as a book. Submission: All submissions should be in hardcopy in letter quality print and should be written in 12 point or pica typewriter face on A4 or 8.5" x 11" paper, and should be no longer than 10 sides, single-spaced. Each paper should contain an abstract of not more than 200 words and a list of up to four keywords or phrases describing the content of the paper. Five copies should be submitted. Papers must be written in English. Authors should give an electronic mail address where possible. Submission of a paper implies that all authors have obtained all necessary clearances from the institution and that an author will attend the conference to present the paper if it is accepted. Papers should describe work that will be unpublished on the date of the conference. Dates: Deadline for Submission: 1 October 1990 Notification of Acceptance: 7 December 1990 Deadline for camera ready copy: 16 January 1991 Location: Bodington Hall is on the edge of Leeds, in 14 acres of private grounds. The city of Leeds is two and a half hours by rail from London, and there are frequent flights to Leeds/Bradford Airport from London Heathrow, Amsterdam and Paris. The Yorkshire Dales National Park is close by, and the historic city of York is only 30 minutes away by rail. Information: Papers and all queries regarding the programme should be sent to Judith Dennison. All other correspondence and queries regarding the conference to the Local Organiser, Barbara Smith. Ms. Judith Dennison Dr. 
Barbara Smith Cognitive Sciences Division of AI University of Sussex School of Computer Studies Falmer University of Leeds Brighton BN1 9QN Leeds LS2 9JT UK UK Tel: (+44) 273 678379 Tel: (+44) 532 334627 Email: judithd at cogs.sussex.ac.uk FAX: (+44) 532 335468 Email: aisb91 at ai.leeds.ac.uk From sankar at caip.rutgers.edu Fri Aug 24 17:19:35 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Fri, 24 Aug 90 17:19:35 EDT Subject: No subject Message-ID: <9008242119.AA06389@caip.rutgers.edu> Rutgers University CAIP Center CAIP Neural Network Workshop 15-17 October 1990 A neural network workshop will be held during 15-17 October 1990 in East Brunswick, New Jersey under the sponsorship of the CAIP Center of Rutgers University. The theme of the workshop will be "Theory and impact of Neural Networks on future technology" Leaders in the field from government, industry and academia will present the state-of-the-art theory and applications of neural networks. Attendance will be limited to about 100 participants. A Partial List of Speakers and Panelists include: J. Alspector, Bellcore A. Barto, University of Massachusetts R. Brockett, Harvard University L. Cooper, Brown University J. Cowan, University of Chicago K. Fukushima, Osaka University D. Glasser, University of California, Berkeley S. Grossberg, Boston University R. Hecht-Nielsen, HNN, San Diego J. Hopfield, California Institute of Technology L. Jackel, AT&T Bell Labs. S. Kirkpatrick, IBM, T.J. Watson Research Center S. Kung, Princeton University F. Pineda, JPL, California Institute of Technology R. Linsker, IBM, T.J. Watson Research Center J. Moody, Yale University E. Sontag, Rutgers University H. Stark, Illinois Institute of Technology B. Widrow, Stanford University Y. Zeevi, CAIP Center, Rutgers University and The Technion, Israel The workshop will begin with registration at 8:30 AM on Monday, 15 October and end at 7:00 PM on Wednesday, 17 October. There will be dinners on Tuesday and Wednesday evenings followed by special-topic discussion sessions. The $395 registration fee ($295 for participants from CAIP member organizations), includes the cost of the dinners. Participants are expected to remain in attendance throughout the entire period of the workshop. Proceedings of the workshop will subsequently be published in book form. Individuals wishing to participate in the workshop should fill out the attached form and mail it to the address indicated. If there are any questions, please contact Prof. Richard Mammone Department of Electrical and Computer Engineering Rutgers University P.O. Box 909 Piscataway, NJ 08854 Telephone: (201)932-5554 Electronic Mail: mammone at caip.rutgers.edu FAX: (201)932-4775 Telex: 6502497820 mci Rutgers University CAIP Center CAIP Neural Network Workshop 15-17 October 1990 I would like to register for the Neural Network Workshop. 
Title:________ Last:_________________ First:_______________ Middle:__________ Affiliation _________________________________________________________ Address _________________________________________________________ ______________________________________________________ Business Telephone: (___)________ FAX:(___)________ Electronic Mail:_______________________ Home Telephone:(___)________ I am particularly interested in the following aspects of neural networks: _______________________________________________________________________ _______________________________________________________________________ Fee enclosed $_______ Please bill me $_______ Please complete the above and mail this form to: Neural Network Workshop CAIP Center, Rutgers University Brett and Bowser Roads P.O. Box 1390 Piscataway, NJ 08855-1390 (USA) From bms at dcs.leeds.ac.uk Fri Aug 24 13:31:19 1990 From: bms at dcs.leeds.ac.uk (B M Smith) Date: Fri, 24 Aug 90 13:31:19 BST Subject: Item for Distribution Message-ID: <1560.9008241231@csuna6.dcs.leeds.ac.uk> FINAL CALL FOR PAPERS AISB'91 8th SSAISB CONFERENCE ON ARTIFICIAL INTELLIGENCE University of Leeds, UK 16-19 April, 1991 The Society for the Study of Artificial Intelligence and Simulation of Behaviour (SSAISB) will hold its eighth biennial conference at Bodington Hall, University of Leeds, from 16 to 19 April 1991. There will be a Tutorial Programme on 16 April followed by the full Technical Programme. The Programme Chair will be Luc Steels (AI Lab, Vrije Universiteit Brussel). Scope: Papers are sought in all areas of Artificial Intelligence and Simulation of Behaviour, but especially on the following AISB91 special themes: * Emergent functionality in autonomous agents * Neural networks and self-organisation * Constraint logic programming * Knowledge level expert systems research Papers may describe theoretical or practical work but should make a significant and original contribution to knowledge about the field of Artificial Intelligence. A prize of 500 pounds for the best paper has been offered by British Telecom Computing (Advanced Technology Group). It is expected that the proceedings will be published as a book. Submission: All submissions should be in hardcopy in letter quality print and should be written in 12 point or pica typewriter face on A4 or 8.5" x 11" paper, and should be no longer than 10 sides, single-spaced. Each paper should contain an abstract of not more than 200 words and a list of up to four keywords or phrases describing the content of the paper. Five copies should be submitted. Papers must be written in English. Authors should give an electronic mail address where possible. Submission of a paper implies that all authors have obtained all necessary clearances from the institution and that an author will attend the conference to present the paper if it is accepted. Papers should describe work that will be unpublished on the date of the conference. Dates: Deadline for Submission: 1 October 1990 Notification of Acceptance: 7 December 1990 Deadline for camera ready copy: 16 January 1991 Location: Bodington Hall is on the edge of Leeds, in 14 acres of private grounds. The city of Leeds is two and a half hours by rail from London, and there are frequent flights to Leeds/Bradford Airport from London Heathrow, Amsterdam and Paris. The Yorkshire Dales National Park is close by, and the historic city of York is only 30 minutes away by rail. Information: Papers and all queries regarding the programme should be sent to Judith Dennison. 
All other correspondence and queries regarding the conference to the Local Organiser, Barbara Smith.
Ms. Judith Dennison, Cognitive Sciences, University of Sussex, Falmer, Brighton BN1 9QN, UK. Tel: (+44) 273 678379. Email: judithd at cogs.sussex.ac.uk
Dr. Barbara Smith, Division of AI, School of Computer Studies, University of Leeds, Leeds LS2 9JT, UK. Tel: (+44) 532 334627. FAX: (+44) 532 335468. Email: aisb91 at ai.leeds.ac.uk
From tgd at turing.CS.ORST.EDU Fri Aug 24 17:55:56 1990 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Fri, 24 Aug 90 14:55:56 PDT Subject: Human confusability of phonemes Message-ID: <9008242155.AA06954@turing.CS.ORST.EDU> I am conducting a comparison study of several learning algorithms on the nettalk task. To make the comparisons fair, I would like to be able to rate the severity of prediction errors made by these algorithms. For example, if the desired phoneme is /k/ (the k in "key") and the phoneme produced by the learned network is /e/ (the a in "late"), then this is a bad error. On the other hand, substituting /x/ (the a in "pirate") for /@/ (the a in "cab") should probably not count as much of an error. Can any readers point me to research that has been done on the confusability of different phonemes (i.e., to what extent human listeners can confuse two phonemes or reliably detect their difference)? Thanks, Tom Dietterich Thomas G. Dietterich Department of Computer Science Dearborn Hall, 306 Oregon State University Corvallis, OR 97331-3202 From schraudo%cs at ucsd.edu Fri Aug 24 18:18:46 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Fri, 24 Aug 90 15:18:46 PDT Subject: TR announcement (hardcopy and ftp) Message-ID: <9008242218.AA14587@beowulf.ucsd.edu> The following technical report is now available in print: -------- Dynamic Parameter Encoding for Genetic Algorithms ------------------------------------------------- Nicol N. Schraudolph Richard K. Belew The selection of fixed binary gene representations for real-valued parameters of the phenotype required by Holland's genetic algorithm (GA) forces either the sacrifice of representational precision for efficiency of search or vice versa. Dynamic Parameter Encoding (DPE) is a mechanism that avoids this dilemma by using convergence statistics derived from the GA population to adaptively control the mapping from fixed-length binary genes to real values. By reducing the length of genes DPE causes the GA to focus its search on the interactions between genes rather than the details of allele selection within individual genes. DPE also highlights the general importance of the problem of premature convergence in GAs, explored here through two convergence models. -------- To obtain a hardcopy, request technical report LAUR 90-2795 via e-mail from office%bromine at LANL.GOV, or via plain mail from Technical Report Requests CNLS, MS-B258 Los Alamos National Laboratory Los Alamos, NM 87545 USA -------- As previously announced, the report is also available in compressed PostScript format for anonymous ftp from the Artificial Life archive server. To obtain a copy, use the following procedure: $ ftp iuvax.cs.indiana.edu % (or 129.79.254.192) login: anonymous password: ftp> cd pub/alife/papers ftp> binary ftp> get schrau90-dpe.ps.Z ftp> quit $ uncompress schrau90-dpe.ps.Z $ lpr schrau90-dpe.ps -------- The DPE algorithm is an option in the GENESIS 1.1ucsd GA simulator, which will be ready for distribution (via anonymous ftp) shortly. Procedures for obtaining 1.1ucsd will then be announced on this mailing list.
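For readers skimming the abstract above, here is a loose sketch in C of the zooming idea that Dynamic Parameter Encoding is built around, as I read it: genes stay fixed-length, but the real interval they decode into is narrowed once the population's convergence statistics show that the search has settled into one half of it. The function and variable names, the 90% threshold, and the halving rule are illustrative assumptions, not the procedure in the report itself, and the sketch omits the re-encoding of genes that a real implementation must perform after each zoom.

#include <stdio.h>

#define BITS 8

/* map a BITS-bit integer onto the current interval [lo, hi] */
double decode(unsigned gene, double lo, double hi)
{
    return lo + (hi - lo) * (double)gene / (double)((1u << BITS) - 1u);
}

/* crude convergence statistic: if nearly all individuals decode into one
   half of the interval, zoom in on that half.  A real implementation would
   also re-encode the genes relative to the new, narrower interval. */
void maybe_zoom(const unsigned *pop, int n, double *lo, double *hi)
{
    double mid = 0.5 * (*lo + *hi);
    int low = 0, i;
    for (i = 0; i < n; i++)
        if (decode(pop[i], *lo, *hi) < mid) low++;
    if (low > (9 * n) / 10)      *hi = mid;   /* population sits in lower half */
    else if (low < n / 10)       *lo = mid;   /* population sits in upper half */
}

int main(void)
{
    unsigned pop[4] = { 10u, 20u, 30u, 40u };  /* toy population, all small values */
    double lo = 0.0, hi = 1.0;
    maybe_zoom(pop, 4, &lo, &hi);
    printf("interval is now [%g, %g]\n", lo, hi);  /* prints [0, 0.5] */
    return 0;
}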
-------- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From mikek at wasteheat.colorado.edu Mon Aug 27 19:42:44 1990 From: mikek at wasteheat.colorado.edu (Mike Kranzdorf) Date: Mon, 27 Aug 90 17:42:44 -0600 Subject: Mactivation - new info Message-ID: <9008272342.AA25683@wasteheat.colorado.edu> ***Please note new physical address*** Mactivation is an introductory neural network simulator which runs on all Macintoshes. A graphical interface provides direct access to units, connections, and patterns. Basic concepts of associative memory and network operation can be explored, with many low-level parameters available for modification. Back-propagation is not supported. A user's manual containing an introduction to connectionist networks and program documentation is included on one 800K Macintosh disk. The current version is 3.3. Mactivation is available from the author, Mike Kranzdorf. The program may be freely copied, including for classroom distribution. To obtain a copy, send your name and address and a check payable to Mike Kranzdorf for $5 (US). International orders should send either an international postal money order for five dollars US or ten (10) international postal coupons. Mactivation 3.2 is available via anonymous ftp on boulder.colorado.edu. Please don't ask me how to deal with ftp - that's why I offer it via snail mail. I will probably post version 3.3 soon; it depends on some politics here. Mike Kranzdorf P.O. Box 1379 Nederland, CO 80466-1379 From mikek at wasteheat.colorado.edu Tue Aug 28 12:24:52 1990 From: mikek at wasteheat.colorado.edu (Mike Kranzdorf) Date: Tue, 28 Aug 90 10:24:52 -0600 Subject: Mactivation ftp location Message-ID: <9008281624.AA26266@wasteheat.colorado.edu> Sorry I forgot to include the ftp specifics: Machine: boulder.colorado.edu Directory: /pub File Name: mactivation.3.2.sit.hqx.Z I really will try to put version 3.3 there soon. Please send me comments if you use Mactivation. I am very responsive to good suggestions and will add them when possible. Back-prop will come in version 4.0, but that's a complete re-write. I can add smaller things to 3.3. --mike From pako at neuronstar.it.lut.fi Thu Aug 30 05:05:47 1990 From: pako at neuronstar.it.lut.fi (Pasi Koikkalainen) Date: Thu, 30 Aug 90 12:05:47 +0300 Subject: ICANN International Conference on Artificial Neural Networks Message-ID: <9008300905.AA01460@neuronstar.it.lut.fi> ICANN-91 INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS Helsinki University of Technology Espoo, Finland, June 24-28, 1991 Conference Chair: Teuvo Kohonen (Finland). Program Chair: Igor Aleksander (England). Conference Committee: Bernard Angeniol (France), Eduardo Caianiello (Italy), Rolf Eckmiller (FRG), John Hertz (Denmark), Luc Steels (Belgium). CALL FOR PAPERS =================== THE CONFERENCE: =============== Theories, implementations, and applications of Artificial Neural Networks are progressing at a growing speed both in Europe and elsewhere. The first commercial hardware for neural circuits and systems is emerging. This conference will be a major international contact forum for experts from academia and industry worldwide. Around 1000 participants are expected.
ACTIVITIES: =========== - Tutorials - Invited talks - Oral and poster sessions - Prototype demonstrations - Video presentations - Industrial exhibition ------------------------------------------------------------------------- Complete papers of at most 6 pages are invited for oral or poster presentation in one of the sessions given below: 1. Mathematical theories of networks and dynamical systems 2. Neural network architectures and algorithms (including organizations and comparative studies) 3. Artificial associative memories 4. Pattern recognition and signal processing (especially vision and speech) 5. Self-organization and vector quantization 6. Robotics and control 7. "Neural" knowledge data bases and non-rule-based decision making 8. Software development (design tools, parallel algorithms, and software packages) 9. Hardware implementations (coprocessors, VLSI, optical, and molecular) 10. Commercial and industrial applications 11. Biological and physiological connection (synaptic and cell functions, sensory and motor functions, and memory) 12. Neural models for cognitive science and high-level brain functions 13. Physics connection (thermodynamical models, spin glasses, and chaos) -------------------------------------------------------------------------- Deadline for submitting manuscripts is January 15, 1991. The Conference Proceedings will be published as a book by Elsevier Science Publishers B.V. Deadline for sending final papers on the special forms is March 15, 1991. For more information and instructions for submitting manuscripts, please contact: Prof. Olli Simula ICANN-91 Organization Chairman Helsinki University of Technology SF-02150 Espoo, Finland Fax: +358 0 451 3277 Telex: 125161 HTKK SF Email (internet): icann91 at hutmc.hut.fi --------------------------------------------------------------------------- In addition to the scientific program, several social occasions will be included in the registration fee. Pre- and post-conference tours and excursions will also be arranged. For more information about registration and accommodation, please contact: Congress Management Systems P.O.Box 151 SF-00141 Helsinki, Finland Tel.: +358 0 175 355 Fax: +358 0 170 122 Telex: 123585 CMS SF From uhr at cs.wisc.edu Thu Aug 30 12:30:30 1990 From: uhr at cs.wisc.edu (Leonard Uhr) Date: Thu, 30 Aug 90 11:30:30 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008301630.AA10562@thor.cs.wisc.edu> A quick response to the responses to my comments on the gap between nets and computer vision (I've been out of town, and now trying to catch up on mail): I certainly wasn't suggesting that the number of input nodes matters, but simply that complex images must be resolved in enough detail to be recognizable. Gary Cottrell's 64x64 images may be adequate for faces (tho I suspect finer resolution is needed as more people are used, with many different expressions (much less rotations) for each). But the point is that complete connectivity from layer to layer needs O(N**2) links, and the fact that "a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of how complete connectivity dominates. Once the preprocessor is handled by the net itself it will either need too many links or have ad hoc structure. It's surely better to use partial connectivity (e.g., local - which is a very general assumption motivated by physical interactions and brain structure) than some inevitably ad hoc preprocessing steps of unknown value. 
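As a back-of-the-envelope illustration of the O(N**2) point being argued here, the toy C program below just counts weights for one hidden layer fed by a 64x64 image, comparing complete connectivity with 8x8 local receptive fields. The layer sizes are hypothetical round numbers, not figures from any of the systems mentioned in this thread.

/* Illustrative arithmetic only: weight counts for one hidden layer fed by a
   64x64 input, under full connectivity versus local 8x8 receptive fields. */
#include <stdio.h>

int main(void)
{
    long inputs = 64L * 64L;   /* 4096 input units                 */
    long hidden = 80L;         /* hidden units (hypothetical size) */
    long rfield = 8L * 8L;     /* one local 8x8 patch per hidden unit */

    long full_links  = inputs * hidden;   /* complete layer-to-layer connectivity */
    long local_links = rfield * hidden;   /* partial (local) connectivity         */

    printf("fully connected:   %ld weights\n", full_links);   /* 327680 */
    printf("locally connected: %ld weights\n", local_links);  /*   5120 */
    return 0;
}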
Evaluation is tedious and unrewarding, but without it we simply can't make claims or compare systems. I'm not arguing against nets - to the contrary, I think that highly parallel nets are the only possibility for handling really hard problems like recognition, language handling, and reasoning. But they'll need much better structure (or the ability to evolve and generate needed structures). And I was asking for objective evidence that 3-layer feed-forward nets with links between all nodes in adjacent layers actually handle complex images better than some of the large and powerful computer vision systems. True - we know that in theory they can do anything. But that's no better than knowing that random search through the space of all Turing machine programs can do anything. Len Uhr From ahmad at ICSI.Berkeley.EDU Thu Aug 30 16:20:13 1990 From: ahmad at ICSI.Berkeley.EDU (Subutai Ahmad) Date: Thu, 30 Aug 90 13:20:13 PDT Subject: Summary (long): pattern recognition comparisons In-Reply-To: Leonard Uhr's message of Thu, 30 Aug 90 11:30:30 -0500 <9008301630.AA10562@thor.cs.wisc.edu> Message-ID: <9008302020.AA02846@icsib18.Berkeley.EDU> >But the point is that >complete connectivity from layer to layer needs O(N**2) links, and the fact that >"a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of >how complete connectivity dominates. Once the preprocessor is handled by the >net itself it will either need too many links or have ad hoc structure. >It's surely better to use partial connectivity (e.g., local - which is a very >general assumption motivated by physical interactions and brain structure) >than some inevitably ad hoc preprocessing steps of unknown value. Systems with selective attention mechanisms provide yet another way of avoiding the combinatorics. In these models, you can route relevant feature values from arbitrary locations in the image to a central processor. The big advantage is that the central processor can now be quite complex (possibly fully connected) since it only has to deal with a relatively small number of inputs. --Subutai Ahmad ahmad at icsi.berkeley.edu References: Koch, C. and Ullman, S. Shifts in Selective Attention: towards the underlying neural circuitry. Human Neurobiology, Vol 4:219-227, 1985. Ahmad, S. and Omohundro, S. Equilateral Triangles: A Challenge for Connectionist Vision. In Proceedings of the 12th Annual meeting of the Cognitive Science Society, MIT, 1990. Ahmad, S. and Omohundro, S. A Network for Extracting the Locations of Point Clusters Using Selective Attention, ICSI Tech Report No. TR-90-011, 1990. From kawahara at av-convex.ntt.jp Fri Aug 31 10:43:46 1990 From: kawahara at av-convex.ntt.jp (Hideki KAWAHARA) Date: Fri, 31 Aug 90 23:43:46+0900 Subject: JNNS'90 Program Summary (long) Message-ID: <9008311443.AA11611@av-convex.ntt.jp> The first annual conference of the Japan Neural Network Society (JNNS'90) will be held from 10 to 12 September, 1990. The following is the program summary and related information on JNNS. There are 2 invited presentations, 23 oral presentations and 53 poster presentations. Unfortunately, a list of the presentation titles in English is not available yet, because many authors didn't provide English titles for their presentations (official languages for the proceedings were Japanese and English, but only two articles were written in English). I will try to compile the English list by the end of September and would like to introduce it.
If you have any questions or comments, please e-mail to the following address. (Please *DON'T REPLY*.) kawahara at nttlab.ntt.jp - ---------------------------------------------- Hideki Kawahara NTT Basic Research Laboratories 3-9-11, Midori-cho Musashino, Tokyo 180, JAPAN Tel: +81 422 59 2276, Fax: +81 422 59 3393 - ---------------------------------------------- JNNS'90 1990 Annual Conference of Japan Neural Network Society September 10-12, 1990 Tamagawa University, 6-1-1 Tamagawa-Gakuen Machida, Tokyo 194, Japan Program Summary Monday, 10 September 1990 12:00 Registration 13:00 - 16:00 Oral Session O1: Learning 16:00 - 18:00 Poster session P1: Learning, Motion and Architecture 18:00 Organization Committee Tuesday, 11 September 1990 9:00 - 12:00 Oral Session O2: Motion and Architecture 13:00 - 13:30 Plenary Session 13:30 - 15:30 Invited Talk; "Brain Codes of Shapes: Experiments and Models" by Keiji Tanaka "Theories: from 1980's to 1990's" by Shigeru Shinomoto 15:30 - 18:30 Oral Session O3: Vision I 19:00 Reception Wednesday, 12 September 1990 9:00 - 12:00 Oral Session O4: Vision II, Time Series and Dynamics 13:00 - 15:00 Poster Session P2: Vision I, II, Time Series and Dynamics 15:00 - 16:45 Oral Session O5: Dynamics Room 450 is for Oral Session, Plenary Session and Invited talk. Rooms 322, 323, 324, 325 and 350 are for Poster Session. Registration Fees for Conference Members 5000 yen Student members 3000 yen Otherwise 8000 yen Reception 19:00 Tuesday, 12 September 1990 Sakufuu-building Fee: 5000 yen JNNS Officers and Governing board Kunihiko Fukushima Osaka University President Shiun-ichi Amari University of Tokyo International Affair Secretary Minoru Tsukada Tamagawa University Takashi Nagano Hosei University Publication Shiro Usui Toyohashi University of Technology Yoichi Okabe University of Tokyo Sei Miyake NHK Science and Technical Research Labs. Planning Yuichiro Anzai Keio University Keisuke Toyama Kyoto Prefectural School of Medicine Nozomu Hoshimiya Tohoku University Treasurer Naohiro Ishii Nagoya Institute of Technology Hideaki Saito Tamagawa University Regional Affair Ken-ichi Hara Yamagata University Hiroshi Yagi Toyama University Eiji Yodogawa ATR Syozo Yasui Kyushu Institute of Technology Supervisor Noboru Sugie Nagoya University Committee members Editorial Committee (Newsletter and mailing list) Takashi Omori Tokyo University of Agriculture and Technology Hideki Kawahara NTT Basic Research Labs. Itirou Tsuda Kyushu Institute of Technology Planning Committee Kazuyuki Aihara Tokyo Denki University Shigeru Shinomoto Kyoto University Keiji Tanaka The Institute of Physical and Chemical Research JNNS'90 Conference Organizing Committee Sei Miyake NHK Science and Technical Research Labs. General Chairman Keiji Tanaka The Institute of Physical and Chemical Research Program Chairman Shigeru Shinomoto Kyoto University Publicity Chairman Program Takayuki Ito NHK Science and Technical Research Labs. Takashi Omori Tokyo University of Agriculture and Technology Koji Kurata Osaka University Kenji Doya University of Tokyo Kazuhisa Niki Electrotechnical Laboratory Ryoko Futami Tohoku University Publicity Kazunari Nakane ATR Publication Hideki Kawahara NTT Basic Research Labs. Mahito Fujii NHK Science and Technical Research Labs. 
Treasurer Shin-ichi Kita University of Tokyo Manabu Sakakibara Toyohashi University of Technology Local Arrangement Shigeru Tanaka Fundamental Research Labs., NEC Makoto Mizuno Tamagawa University For more details, please contact: Japan Neural Network Society Office Faculty of Engineering, Tamagawa University 6-1-1 Tamagawa-Gakuen Machida, Tokyo 194, Japan Telephone: +81 427 28 3457 Facsimile: +81 427 28 3597
From schraudo%cs at ucsd.edu Sat Aug 4 15:43:20 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Sat, 4 Aug 90 12:43:20 PDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008041943.AA01622@beowulf.ucsd.edu> > From: Leonard Uhr > > Neural nets using backprop have only handled VERY SIMPLE images, usually in > 8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in > logarithmically converging nets, but I don't know of any nets with complete > connectivity from one layer to the next that are that big.) In sharp contrast, > pr/computer vision systems are designed to handle MUCH MORE COMPLEX images (eg > houses, furniture) in 128-by-128 or even larger inputs. So I've been really > surprised to read statements to the effect NN have proved to be much better. > What experimental evidence is there that NN recognize images as complex as > those handled by computer vision and pattern recognition approaches? Well, Gary Cottrell for instance has successfully used a standard (3-layer, fully interconnected) backprop net for various face recognition tasks from 64x64 images. While I agree with you that many NN architectures don't scale well to large input sizes, and that modular, heterogenous architectures have the potential to overcome this limitation, I don't understand why you insist that current NNs could only handle simple images - unless you consider any image with less than 16k pixels simple. Does face recognition qualify as a complex visual task with you? The whole point of using comparatively inefficient NN setups (such as fully interconnected backprop nets) is that they are general enough to solve complex problems without built-in heuristics. Modular NNs require either a lot of prior knowledge about the problem you are trying to solve, or a second adaptive system (such as a GA) to search the architecture space. In the former case the problem is comparatively easy, and in the latter computational complexity rears its ugly head again... having said that, I do believe that GA/NN hybrids will play an important role in the future.
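The "general enough" claim above, and the exchange that follows it, lean on the standard universal-approximation results for single-hidden-layer nets (Cybenko; Hornik, Stinchcombe and White). Roughly stated: for a fixed sigmoidal function sigma, any continuous f on a compact set K in R^n, and any epsilon > 0, there exist N, weight vectors w_i, biases b_i and output coefficients v_i such that

\[ \sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i \cdot x + b_i) \Bigr| < \varepsilon . \]

Note that this is an existence statement only: it bounds neither N nor the difficulty of finding the weights, which is exactly the distinction drawn in the replies below.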
I'm afraid I don't have a reference for Gary Cottrell's work - maybe someone else can post the details? -- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From honavar at cs.wisc.edu Sat Aug 4 20:43:56 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sat, 4 Aug 90 19:43:56 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008050043.AA05173@goat.cs.wisc.edu> >The whole point of using comparatively inefficient NN setups (such as fully >interconnected backprop nets) is that they are general enough to solve >complex problems without built-in heuristics. While I know of theoretical results that show that a feedforward neural net exists that can adequately encode any arbitrary real-valued function (Hornik, Stinchcombe, & White, 1988; Cybenko, 1988; Carroll & Dickinson, 1989), I am not aware of any results that suggest that such nets can LEARN any real-valued function using backpropagation (ignoring the issue of computational tractability). Heuristics (or architectural constraints) like those used by some researchers for some vision problems - locally linked multi-layer converging nets (probably one of the most successful demonstrations is the work of LeCun et al. on handwritten zip code recognition) - are interesting because they constrain (or bias) the network to develop particular types of representations. Also, they might enable efficient learning to take place in tasks that exhibit a certain intrinsic structure. The choice of a particular fixed neural network architecture (even if it is a fully interconnected backprop net) implies the use of a corresponding representational bias. Whether such a representational bias is in any sense more general than some other (e.g., a network of nodes with limited fan-in but sufficient depth) is questionable (for any given completely interconnected feedforward network, there exists a functionally equivalent feedforward network of nodes with limited fan-in - and for some problems, the latter may be more efficient). On a different note, how does one go about assessing the "generality" of a learning algorithm/architecture in practice? I would like to see a discussion on this issue. Vasant Honavar (honavar at cs.wisc.edu) From schraudo%cs at ucsd.edu Sun Aug 5 05:54:43 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Sun, 5 Aug 90 02:54:43 PDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008050954.AA00265@beowulf.ucsd.edu> > From honavar at cs.wisc.edu Sat Aug 4 17:45:01 1990 > > While I know of theoretical results that show that a feedforward > neural net exists that can adequately encode any arbitrary > real-valued function (Hornik, Stinchcombe, & White, 1988; > Cybenko, 1988; Carroll & Dickinson, 1989), I am not aware of > any results that suggest that such nets can LEARN any real-valued > function using backpropagation (ignoring the issue of > computational tractability). > It is my understanding that some of the latest work of Hal White et al. presents a learning algorithm - backprop plus a rule for adding hidden units - that can (in the limit) provably learn any function of interest. (Disclaimer: I don't have the mathematical proficiency required to fully appreciate White et al.'s proofs and thus have to rely on second-hand interpretations.) > On a different note, how does one go about assessing the > "generality" of a learning algorithm/architecture in practice?
> I would like to see a discussion on this issue. > I second this motion. As a starting point for discussion, would the Kolmogorov complexity of an architectural description be useful as a measure of architectural bias? -- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From aarons at cogs.sussex.ac.uk Sun Aug 5 07:57:52 1990 From: aarons at cogs.sussex.ac.uk (Aaron Sloman) Date: Sun, 5 Aug 90 12:57:52 +0100 Subject: Summary (long): pattern recognition comparisons Message-ID: <6816.9008051157@csuna.cogs.susx.ac.uk> > From: Leonard Uhr > > Neural nets using backprop have only handled VERY SIMPLE images..... > .......In sharp contrast, pr/computer vision systems are designed > to handle MUCH MORE COMPLEX images (eg houses, furniture) in > 128-by-128 or even larger inputs.... ..... > From: Nici Schraudolph > Well, Gary Cottrell for instance has successfully used a standard (3-layer, > fully interconnected) backprop net for various face recognition tasks from > 64x64 images. While I agree with you that many NN architectures don't scale > well to large input sizes, and that modular, heterogenous architectures have > the potential to overcome this limitation, I don't understand why you insist > that current NNs could only handle simple images - unless you consider any > image with less than 16k pixels simple. Does face recognition qualify as a > complex visual task with you? > ...... Characterising the complexity of the task in terms of the number of pixels seems to me to miss the most important points. Some (but by no means all) of the people working on NNs appear to have joined the field (the bandwagon?) without feeling obliged to study the AI literature on vision, perhaps because it is assumed that since the AI mechanisms are "wrong" all the literature must be irrelevant? On the contrary, good work in AI vision was concerned with understanding the nature of the task (or rather tasks) of a visual system, independently of the mechanisms postulated to perform those tasks. (When your programs fail you learn more about the nature of the task.) Recognition of isolated objects (e.g. face recognition) is just _one_ of the tasks of vision. Others include: (a) Interpreting a 2-D array (retinal array or optic array) in terms of 3-D structures and relationships. Seeing the 3-D structure of a face is a far more complex task than simply attaching a label: "Igor", "Bruce" or whatever. (b) Segmenting a complex scene into separate objects and describing the relationships between them (e.g. "houses, furniture"!). (The relationships include 2-D and 3-D spatial and functional relations.) Because evidence for boundaries is often unclear and ambiguous, and because recognition has to be based on combinations of features, the segmentation often cannot be done without recognition and recognition cannot be done without segmentation. This chicken and egg problem can lead to dreadful combinatorial searches. NNs offer the prospect of doing some of the searching in parallel by propagating constraints, but as far as I know they have not yet matched the more sophisticated AI visual systems. (It is important to distinguish segmentation, recognition and description of 2-D image fragments from segmentation, recognition and description of 3-D objects. The former seems to be what people in pattern recognition and NN research concentrate on most. 
The latter has been a major concern of AI vision work since the mid/late sixties, starting with L.G. Roberts I think, although some people in AI have continued trying to find 2-D cues to 3-D segmentation. Both 2-D and 3-D interpretations are important in human vision.) (c) Seeing events, processes and their relationships. Change "2-D" to "3-D" and "3-D" to "4-D" in (b) above. We are able to segment, recognize and describe events, processes and causal relationships as well as objects (e.g. following, entering, leaving, catching, bouncing, intercepting, grasping, sliding, supporting, stretching, compressing, twisting, untwisting, etc. etc.) Sometimes, as Johansson showed by attaching lights to human joints in a dark room, motion can be used to disambiguate 3-D structure. (d) Providing information and/or control signals for motor-control mechanisms: e.g. visual feedback is used (unconsciously) for posture control in sighted people, also controlling movement of arm, hand and fingers in grasping, etc. (I suspect that many such processes of fine tuning and control use changing 2-D "image" information rather than (or in addition to) 3-D structural information.) That's still only a partial list of the tasks of a visual system. For more detail see: A. Sloman, `On designing a visual system: Towards a Gibsonian computational model of vision', Journal of Experimental and Theoretical AI 1,4, 1989; Ballard, D.H. and C.M. Brown, Computer Vision, Englewood Cliffs: Prentice Hall, 1982. A system might be able to recognize isolated faces or other objects in an image by using mechanisms that would fail miserably in dealing with cluttered scenes where recognition and segmentation need to be combined. So a NN that recognised faces might tell us nothing about how it is done in natural visual systems, if the latter use more general mechanisms. One area in which I think neither AI nor NN work has made significant progress is shape perception. (I don't mean shape recognition!) People, and presumably many other animals, can see complex, intricate, irregular and varied shapes in a manner that supports a wide range of tasks, including recognizing, grasping, planning, controlling motion, predicting the consequences of motion, copying, building, etc. etc. Although a number of different kinds of shape representations have been explored in work on computer vision, CAD, graphics etc. (e.g. feature vectors; logical descriptions; networks of nodes and arcs; numbers representing co-ordinates, orientations, curvature etc; systems of equations for lines, planes, and other mathematically simple structures; fractals; etc. etc. etc.) they all seem capable of capturing only a superficial subset of what we can see when we look at kittens, sand dunes, crumpled paper, a human torso, a shrubbery, cloud formations, under-water scenes, etc. (Work on computer graphics is particularly misleading, because people are often tempted to think that a representation that _generates_ a natural looking image on a screen must capture what we see in the image, or in the scene that it depicts.) Does anyone have any idea what kind of breakthrough is needed in order to give a machine the kind of grasp of shape that can explain animal abilities to cope with real environments? Is there anything about NN shape representations that gives them an advantage over others that have been explored, and if so what are they?
I suspect that going for descriptions of static geometric structure is a dead end: seeing a shape really involves seeing potential processes involving that shape, and their limits (something like what J.J. Gibson meant by "affordances"?). I.e. a 3-D shape is inherently a vast array of 4-D possibilities and one of the tasks of a visual system is computing a large collection of those possibilities and making them readily available for a variety of subsequent processes. But that's much too vague an idea to be very useful. Or is it? Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England EMAIL aarons at cogs.sussex.ac.uk or: aarons%uk.ac.sussex.cogs at nsfnet-relay.ac.uk From honavar at cs.wisc.edu Sun Aug 5 15:48:37 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sun, 5 Aug 90 14:48:37 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008051948.AA00212@goat.cs.wisc.edu> >It is my understanding that some of the latest work of Hal White et al. >presents a learning algorithm - backprop plus a rule for adding hidden >units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully >appreciate White et al.'s proofs and thus have to rely on second-hand >interpretations.) I can see how allowing the addition of a (potentially unbounded) number of hidden units could enable a back-prop architecture to learn arbitrary functions. But in this sense, any procedure that builds up a look-up table or random-access memory (with some interpolation capability to cover the instances not explicitly stored) using an appropriate set of rules to add units is equally general (and probably more efficient than backprop in terms of time complexity of learning; cf. Baum's proposal for more powerful learning algorithms). However look-up tables can be combinatorially intractable in terms of memory (space) complexity. This brings us to the issue of searching the architectural space along with the weight space in an efficient manner. There has already been some work in this direction (Fahlman's cascade correlation architecture, Ash's DNC, Honavar & Uhr's generative learning, Hanson's meiosis networks, and some recent work on ga-nn hybrids). We have been investigating methods to constrain the search in the architectural space (using heuristic controls / representational bias :-) ). I would like to hear from others who might be working on related issues. Vasant Honavar (honavar at cs.wisc.edu) From galem at mcc.com Sun Aug 5 17:48:25 1990 From: galem at mcc.com (Gale Martin) Date: Sun, 5 Aug 90 16:48:25 CDT Subject: Summary (long): pattern recognition comparisons Message-ID: <9008052148.AA02989@sunkist.aca.mcc.com> Leonard Uhr states (about NN learning) "to make learning work, we need to cut down and direct explosive search at least as much as using any other approach." Certainly there is reason to agree with this in the general case, but I doubt its validity in important specific cases. I've spent the past couple of years working on backprop-based handwritten character recognition and find almost no supporting evidence of the need for explicitly cutting down on explosive search through the use of heuristics in these SPECIFIC cases and circumstances. We varied input character array size (10x16, 15x24, 20x32) to backprop nets and found no difference in the number of training samples required to achieve a given level of generalization performance for hand-printed letters.
In nets with one hidden layer, we increased the number of hidden nodes from 50 to 383 and found no increase in the number of training samples needed to achieve high generalization (in fact, generalization is worse for the 50 hidden node case). We experimented extensively with nets having local connectivity and locally-linked nets in this domain and find similarly little evidence to support the need for such heuristics. These results hold across two different types of handwritten character recognition tasks (hand-printed letters and digits). This domain/case-specific robustness across architectural parameters and input size is one way to characterize the generality of a learning algorithm and may recommend one algorithm over another for specific problems. Gale Martin Martin, G. L., & Pittman, J. A. Recognizing hand-printed letters and digits in D.S. Touretzky (Ed.) Advances in Neural Information Processing Systems 2, 1990. Martin, G.L., Leow, W.K. & Pittman, J. A. Function complexity effects on backpropagation learning. MCC Tech Report ACT-HI-062-90. From ganesh at cs.wisc.edu Sun Aug 5 17:59:23 1990 From: ganesh at cs.wisc.edu (Ganesh Mani) Date: Sun, 5 Aug 90 16:59:23 -0500 Subject: Paper Message-ID: <9008052159.AA21968@sharp.cs.wisc.edu> The following paper is available for ftp from the repository at Ohio State. Please backpropagate comments (and errors!) to ganesh at cs.wisc.edu. -Ganesh Mani _________________________________________________________________________ Learning by Gradient Descent in Function Space Ganesh Mani Computer Sciences Dept. University of Wisconsin---Madison ganesh at cs.wisc.edu Abstract Traditional connectionist networks have homogeneous nodes wherein each node executes the same function. Networks where each node executes a different function can be used to achieve efficient supervised learning. A modified back-propagation algorithm for such networks, which performs gradient descent in ``function space,'' is presented and its advantages are discussed. The benefits of the suggested paradigm include faster learning and ease of interpretation of the trained network. _________________________________________________________________________ The following can be used to ftp the paper. unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): neuron ftp> cd pub/neuroprose ftp> type binary ftp> get (remote-file) mani.function-space.ps.Z (local-file) mani.function-space.ps.Z ftp> quit unix> uncompress mani.function-space.ps.Z unix> lpr -P(your_local_postscript_printer) mani.function-space.ps From honavar at cs.wisc.edu Mon Aug 6 00:27:25 1990 From: honavar at cs.wisc.edu (Vasant Honavar) Date: Sun, 5 Aug 90 23:27:25 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008060427.AA00489@goat.cs.wisc.edu> We have found that with relatively small sample sizes, generalization performance is improved by local connectivity and weight sharing on simple 2-d patterns. For position-invariant recognition, local connectivity and weight-sharing give substantially better generalization performance than that obtained without local connectivity. Clearly this is a case where extensive empirical studies are needed to draw general conclusions.
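To make the note above concrete, here is a minimal C sketch of local connectivity with weight sharing: a single 5x5 weight patch is applied at every position of the input, so the number of free parameters is 25 regardless of image size, whereas a fully connected layer of the same output size needs tens of thousands. The sizes and names are illustrative assumptions, not taken from the experiments being discussed, and bias terms and the squashing nonlinearity are left out.

#include <stdio.h>

#define IN   16          /* input is IN x IN          */
#define K     5          /* receptive field is K x K  */
#define OUT (IN - K + 1) /* valid output positions    */

double input[IN][IN];    /* left at zero here; a real net would load an image */
double shared_w[K][K];   /* the single weight patch shared by all positions   */
double output[OUT][OUT];

/* slide the shared patch over the input: local connectivity + weight sharing */
void forward(void)
{
    int r, c, i, j;
    for (r = 0; r < OUT; r++)
        for (c = 0; c < OUT; c++) {
            double sum = 0.0;
            for (i = 0; i < K; i++)
                for (j = 0; j < K; j++)
                    sum += shared_w[i][j] * input[r + i][c + j];
            output[r][c] = sum;  /* add bias and squashing function as needed */
        }
}

int main(void)
{
    forward();
    printf("free parameters: %d (vs %d for a fully connected layer)\n",
           K * K, IN * IN * OUT * OUT);
    return 0;
}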
Vasant Honavar (honavar at cs.wisc.edu) From awyk at wapsyvax.oz.au Mon Aug 6 03:17:41 1990 From: awyk at wapsyvax.oz.au (Brian Aw) Date: Mon, 6 Aug 90 15:17:41+0800 Subject: No subject Message-ID: <9008060725.649@munnari.oz.au> Dear Sir/Mdm, Hello! My name is Brian Aw and my e-mail address is awyk at wapsyvax.oz. Would you kindly put me on both your address list and your mailing list for connectionist-related results. I am a Ph.D. student as well as a research officer in the Psychology Department of the University of Western Australia (UWA), Perth. I am working under the supervision of Prof. John Ross who has recently joined your lists. I am an enthusiastic worker in neural network theory. Currently, I am developing a neural network for feature classifications in images. This year, I have published a technical report in the Computer Science Department of UWA in this area. My work has also been accepted for presentation and publication in the forthcoming 4th Australian Joint Conference on Artificial Intelligence (AI'90). Working in this field which advances so rapidly, I certainly need the kind of fast-moving and up-to-date information which your system can provide. Thanking you in advance. brian. From erol at ehei.ehei.fr Mon Aug 6 08:07:39 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Mon, 6 Aug 90 12:09:39 +2 Subject: IJPRAI CALL FOR PAPERS Message-ID: <9008061041.AA24889@inria.inria.fr> Would you consider a paper on my "random network model"? Two papers have already appeared or are appearing in the journal Neural Computation. Best regards, Erol From erol at ehei.ehei.fr Mon Aug 6 05:47:31 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Mon, 6 Aug 90 09:49:31 +2 Subject: Visit to Poland Message-ID: <9008061014.AA24279@inria.inria.fr> I don't know about Poland, but you can contact me in Paris! Erol Gelenbe From INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU Sun Aug 5 15:56:00 1990 From: INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU (INS_ATGE%JHUVMS.BITNET@VMA.CC.CMU.EDU) Date: Sun, 5 Aug 90 14:56 EST Subject: Similarity to Cascade-Correlation Message-ID: As a side note on the problem of using backpropagation on large problems, it should be noted that using efficient error minimization methods (i.e. conjugate-gradient methods) as opposed to the "vanilla" backprop described in _Parallel_Distributed_Processing_ allows one to work with much larger problems, and also allows for much greater performance on problems the network was trained on. For example, an IR target threat detection problem I have been recently working on (with 127 or 254 inputs and 20 training patterns) failed miserably when trained with "vanilla" backprop (hours and hours on a Connection Machine without success). When a conjugate-gradient training program was used, the network was able to learn 100% of the training set perfectly in just a minute or two. >It is my understanding that some of the latest work of Hal White et al. presents a learning algorithm - backprop plus a rule for adding hidden units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully appreciate White et al.'s proofs and thus have to rely on second-hand interpretations.)
How does this new work compare with the Cascade Correlation method developed by Fahlman, where a new hidden unit is added by training its receptive weights to maximize the correlation between its output and the network error, and then trains the projective weights to the outputs to minimize the error (thus only allowing single-layer backprop learning at each iteration)? -Thomas Edwards The Johns Hopkins University / U.S. Naval Research Lab From erol at ehei.ehei.fr Mon Aug 6 11:44:10 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Mon, 6 Aug 90 15:46:10 +2 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008061444.AA05688@inria.inria.fr> I would like to draw your attention to two recent papers of mine (my name is Erol Gelenbe): Random networks with positive and negative signals and product form solutions, in Neural Computation, Vol. 1, No. 4 (1989); Stability of the random network model, in press in Neural Computation. The papers present a new model in which signals travel as "pulses". The quantity looked at in the model is the "neuron potential" in an arbitrarily connected network. I prove that these models have "product form" which means that their state can be computed simply and analytically. Comments and questions are welcome. erol at ehei.ehei.fr From fritz_dg%ncsd.dnet at gte.com Mon Aug 6 17:26:57 1990 From: fritz_dg%ncsd.dnet at gte.com (fritz_dg%ncsd.dnet@gte.com) Date: Mon, 6 Aug 90 17:26:57 -0400 Subject: neural network generators in Ada Message-ID: <9008062126.AA27920@bunny.gte.com> Are there any non-commercial Neural Network "generator programs" or such that are in Ada? (i.e. generates suitable NN code from a set of user-designated specifications, code suitable for embedding, etc.). I'm interested in
- experience developing and using same, lessons learned
- to what uses such have been put, successful?
- nature of; internal use of lists, arrays; what can be user specified, what can't; built-in limitations; level of HMI attached; compilers used; etc., etc.
- and other relevant info developing and applying such from those who have tried developing and using them
Am also interested in opinions on: If you were going to design a NN Maker _today_, how would you design it? If Ada were the language, what special things might be done? Motive should be transparent. My sincere thanks to all who respond. If there is interest, I'll turn the info (if any) around to the list in general. Dave Fritz fritz_dg%ncsd at gte.com (301) 738-8932 ---------------------------------------------------------------------- ---------------------------------------------------------------------- From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Mon Aug 6 23:20:09 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Mon, 06 Aug 90 23:20:09 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Sun, 05 Aug 90 14:56:00 -0500. Message-ID: >It is my understanding that some of the latest work of Hal White et al. presents a learning algorithm - backprop plus a rule for adding hidden units - that can (in the limit) provably learn any function of interest. >(Disclaimer: I don't have the mathematical proficiency required to fully appreciate White et al.'s proofs and thus have to rely on second-hand interpretations.)
How does this new work compare with the Cascade Correlation method developed by Fahlman, where a new hidden unit is added by training its receptive weights to maximize the correlation between its output and the network error, and then trains the projective weights to the outputs to minimize the error (thus only allowing single-layer backprop learning at each iteration)? -Thomas Edwards The Johns Hopkins University / U.S. Naval Research Lab I'll take a stab at answering this. Maybe we'll also hear something from Hal White or one of his colleagues -- especially if I somehow misrepresent their work. I believe that all of the published completeness results from White's group assume a single layer of hidden units. They show that this architecture can approximate any desired transfer function (assuming it has certain smoothness properties) to any desired accuracy if you add enough units in this single layer. It's rather like proving that a piecewise linear approximation can approach any desired curve with arbitrarily small error as long as you're willing to use enough tiny pieces. Unless I've missed something, their work does not attempt to say anything about the minimum number of hidden units you might need in this hidden layer. Cascade-Correlation produces a feed-forward network of sigmoid units, but it differs in a number of ways from the kinds of nets considered by White: 1. Cascade-Correlation is intended to be a practical learning algorithm that produces a relatively compact solution as fast as possible. 2. In a Cascade net, each new hidden unit can receive inputs from all pre-existing hidden units. Therefore, each new unit is potentially a new layer. White's results show that you don't really NEED more than a single hidden layer, but having more layers can sometimes result in a very dramatic reduction in the total number of units and weights needed to solve a given problem. 3. There is no convergence proof for Cascade-Correlation. The candidate training phase, in which we try to create new hidden units by hill-climbing in some correlation measure, can and does get stuck in local maxima of this function. That's one reason we use a pool of candidate units: by training many candidates at once, we can greatly reduce the probability of creating new units that do not contribute significantly to the solution, but with a finite candidate pool we can never totally eliminate this possibility. It would not be hard to modify Cascade-Correlation to guarantee that it will eventually grind out a solution. The hard part, for a practical learning algorithm, is to guarantee that you'll find a "reasonably good" solution, however you want to define that. The recent work of Gallant and of Frean are interesting steps in this direction, at least for binary-valued transfer functions and fixed, finite training sets. -- Scott From jamesp at chaos.cs.brandeis.edu Mon Aug 6 21:38:40 1990 From: jamesp at chaos.cs.brandeis.edu (James Pustejovsky) Date: Mon, 6 Aug 90 21:38:40 edt Subject: Visit to Poland In-Reply-To: erol@ehei.ehei.fr's message of Mon, 6 Aug 90 09:49:31 +2 <9008061014.AA24279@inria.inria.fr> Message-ID: <9008070138.AA17019@chaos.cs.brandeis.edu> please withdraw my name from the list. there is too much random and irrelevant noise around the occasional noteworthy bit. From ericj at starbase.MITRE.ORG Tue Aug 7 08:33:27 1990 From: ericj at starbase.MITRE.ORG (Eric Jenkins) Date: Tue, 7 Aug 90 08:33:27 EDT Subject: ref for conjugate-gradient... 
Message-ID: <9008071233.AA25689@starbase> Would someone please post a pointer to info on conjugate-gradient methods of error minimization? Thanks. Eric Jenkins (ericj at ai.mitre.org) From erol at ehei.ehei.fr Tue Aug 7 07:06:28 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Tue, 7 Aug 90 11:08:28 +2 Subject: Call for Papers - ICGA-91 Message-ID: <9008071511.AA21568@inria.inria.fr> Concerning the scope of the conference, could the program chairman indicate what the boundaries of the area of genetic algorithms are in the context of this meeting? This can be indicated by providing one or more references the conference chairman considers to be "typical" work in this area. Erol Gelenbe From erol at ehei.ehei.fr Tue Aug 7 10:42:33 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Tue, 7 Aug 90 14:44:33 +2 Subject: postdoc position available Message-ID: <9008071513.AA21597@inria.inria.fr> From jose at learning.siemens.com Tue Aug 7 19:55:05 1990 From: jose at learning.siemens.com (Steve Hanson) Date: Tue, 7 Aug 90 18:55:05 EST Subject: Similarity to Cascade-Correlation Message-ID: <9008072355.AA05108@learning.siemens.com.siemens.com> Scott: Isn't CC just Cart? Steve From schraudo%cs at ucsd.edu Tue Aug 7 15:05:35 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Tue, 7 Aug 90 12:05:35 PDT Subject: Similarity to Cascade-Correlation Message-ID: <9008071905.AA10253@beowulf.ucsd.edu> > From: INS_ATGE%JHUVMS.BITNET at VMA.CC.CMU.EDU > > How does [White et al.'s] new work compare with the Cascade Correlation > method developed by Fahlman [...]? In practical terms, very badly. Their algorithm's point is purely theoretical: they can prove convergence from only a very small base of assumptions about the function to be learned. Do any similar proofs exist for Cascade Correlation? That would be interesting. -- Nicol N. Schraudolph, C-014 nici%cs at ucsd.edu University of California, San Diego nici%cs at ucsd.bitnet La Jolla, CA 92093-0114 ...!ucsd!cs!nici From erol at ehei.ehei.fr Wed Aug 8 07:12:24 1990 From: erol at ehei.ehei.fr (erol@ehei.ehei.fr) Date: Wed, 8 Aug 90 11:14:24 +2 Subject: abstract Message-ID: <9008081009.AA23199@inria.inria.fr> I would be very interested to get a copy of this paper. Thank you in advance, Erol Gelenbe erol at ehei.ehei.fr From pkube at ucsd.edu Wed Aug 8 15:23:30 1990 From: pkube at ucsd.edu (pkube@ucsd.edu) Date: Wed, 08 Aug 90 13:23:30 MDT Subject: ref for conjugate-gradient... In-Reply-To: Your message of Tue, 07 Aug 90 08:33:27 EDT. <9008071233.AA25689@starbase> Message-ID: <9008082023.AA07129@kokoro.ucsd.edu> For understanding and implementing conjugate gradient and other optimization methods cleverer than vanilla backprop, I've found the following to be useful:
%A William H. Press
%T Numerical Recipes in C: The Art of Scientific Computing
%I Cambridge University Press
%D 1988
%A J. E. Dennis
%A R. B. Schnabel
%T Numerical Methods for Unconstrained Optimization and Nonlinear Equations
%I Prentice-Hall
%D 1983
%A R. Fletcher
%T Practical Methods of Optimization, Vol. 1: Unconstrained Optimization
%I John Wiley & Sons
%D 1980
--Paul Kube at ucsd.edu From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Wed Aug 8 10:09:48 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Wed, 08 Aug 90 10:09:48 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Wed, 08 Aug 90 08:39:55 -0500.
<9008081339.AA05550@learning.siemens.com.siemens.com> Message-ID: I got this clarification from Steve Hanson of his original query, which I found a bit cryptic: Isn't cascade Correlation a version (almost exact except for splitting rule--although I believe CART allows for other splitting rules) of CART---the decision tree with the hyperplane feature space cuts...? My memory of Cart is a bit fuzzy, but I think it's very different from Cascade-Correlation. Unless I'm confused, here are a couple of glaring differences: 1. In a decision-tree setup like CART, each new split works within one of the regions of space that you've already carved out -- that is, within only one branch of the tree. So for something like N-bit parity, you'd need 2^N hidden units (hyperplanes). In a single-layer backprop net, you need only N hidden units because they are shared. Because it creates higher-order units, Cascade-Correlation can generally do the job in less than N. (See the results in the Cascade-Correlation paper.) I don't remember if any version of CART makes serendipitous use of hyperplanes that were created earlier to split other branches. I am pretty sure, however, that it works on splitting just one branch at a time, and doesn't actively try to create hyperplanes that are useful in splitting many branches at once. 2. If you create all your new hidden units in a single layer, all you can do is create hyperplanes in the original space of input features. Because it builds up multiple layers, Cascade-Correlation can create higher-order units of great complexity, not just hyperplanes. If you have the tech report on Cascade-Correlation (the diagrams had to be cut from the NIPS version due to page limitations), look at the strange complex curves it creates in solving the two-spirals problem. If you prefer, Cascade-Correlation works by raising the dimensionality of the space and then drawing hyperplanes in this new complex space, but the projection back onto the original input space does not look like a straight line. I've never heard of anyone solving the two-spirals problem with a single layer of sigmoid or threshold units -- it would take an awful lot of them. I think that these two differences change the game entirely. The only resemblance I see between CART and Cascade-Correlation is that both build up a structure little by little, trying to add new nonlinear elements that eliminate some part of the remaining error. But the kinds of structures the two algorithms deal in is qualitatively different. -- Scott From pollack at cis.ohio-state.edu Wed Aug 8 02:11:16 1990 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Wed, 8 Aug 90 02:11:16 -0400 Subject: Cascade-Correlation, etc Message-ID: <9008080611.AA11352@dendrite.cis.ohio-state.edu> Scott's description of his method and the need for a convergence proof, reminded me of the line of research by Meir & Domany (Complex Sys 2 1988) and Nadal & Mezard (Int.Jrnl. Neural Sys 1,1,1989). In a paper definitely related to theirs (which I cannot find), someone proved (by construction) that each hidden unit added on top of a feedforward TLU network could monotonically decrease the number of errors for arbitrary-fan-in, single-output boolean functions. This result might be generalizable to CC networks. 
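For concreteness, the cascade wiring such a proof would have to cover looks like this: each new hidden unit receives the original inputs plus the outputs of every previously installed hidden unit, so the k-th addition is potentially a k-th layer. A minimal sketch in present-day Python/numpy (the function names, the logistic activation and the linear output are assumptions made for illustration -- this is not Fahlman's simulator code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cascade_forward(x, hidden_weights, output_weights):
        # x              : 1-D array of raw inputs
        # hidden_weights : list; the k-th entry (counting from zero) has length
        #                  n_inputs + k + 1, i.e. weights for the inputs, the k
        #                  earlier hidden units, and a trailing bias
        # output_weights : shape (n_outputs, n_inputs + n_hidden + 1)
        activations = list(x) + [1.0]            # inputs plus bias term
        for w in hidden_weights:
            h = sigmoid(np.dot(w, activations))  # sees everything built so far
            activations.insert(-1, h)            # and feeds every later unit
        return np.dot(output_weights, activations)

Because unit k sees unit k-1's output, the effective depth grows with every addition, which is what lets the network carve the curved two-spirals boundaries described above rather than plain hyperplanes in the input space.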
Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Fax/Phone: (614) 292-4890
From FEGROSS%WEIZMANN.BITNET at VMA.CC.CMU.EDU Thu Aug 9 01:51:08 1990 From: FEGROSS%WEIZMANN.BITNET at VMA.CC.CMU.EDU (Tal Grossman) Date: Thu, 09 Aug 90 08:51:08 +0300 Subject: Network Constructing Algorithms. Message-ID: Network constructing algorithms, i.e. learning algorithms which add units while training, receive a lot of interest these days. I've recently compiled a reference list of papers presenting such algorithms. I send this list as a small contribution to the last discussion. I hope people will find it relevant and useful. Of course, it is probably not exhaustive - and I'd like to hear about any other related work. Note that two refs. are quite old (Hopcroft and Cameron) - from the threshold logic days. A few papers include convergence proofs (Frean, Gallant, Mezard and Nadal, Marchand et al). Naturally, there is a significant overlap between some of the algorithms/architectures. I also apologize for the primitive TeX format. Tal Grossman < fegross at weizmann> Electronics Dept. Weizmann Inst. Rehovot 76100, ISRAEL.
-------------------------------------------------------------------------------
\centerline{\bf Network Generating Learning Algorithms - References.}
T. Ash, ``Dynamic Node Creation in Back-Propagation Networks", Tech.Rep.8901, Inst. for Cognitive Sci., Univ. of California, San-Diego.
Cameron S.H., ``The Generation of Minimal Threshold Nets by an Integer Program", IEEE TEC {\bf EC-13}, 299 (1964).
S.E. Fahlman and C.L. Lebiere, ``The Cascade-Correlation Learning Architecture", in {\it Advances in Neural Information Processing Systems 2}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1990), pp. 524.
M. Frean, ``The Upstart Algorithm: a Method for Constructing and Training Feed Forward Neural Networks", Neural Computation {\bf 2}:2 (1990).
S.I. Gallant, ``Perceptron-Based Learning Algorithms", IEEE Trans. on Neural Networks {\bf 1}, 179 (1990).
M. Golea and M. Marchand, ``A Growth Algorithm for Neural Network Decision Trees", EuroPhys.Lett. {\bf 12}, 205 (1990).
S.J. Hanson, ``Meiosis Networks", in {\it Advances in Neural Information Processing Systems 2}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1990), pp. 533.
Honavar V. and Uhr L., in the {\it Proc. of the 1988 Connectionist Models Summer School}, Touretzky D., Hinton G. and Sejnowski T. eds. (Morgan Kaufmann, San Mateo, 1988).
Hopcroft J.E. and Mattson R.L., ``Synthesis of Minimal Threshold Logic Networks", IEEE TEC {\bf EC-14}, 552 (1965).
Mezard M. and Nadal J.P., ``Learning in Feed Forward Layered Networks - The Tiling Algorithm", J.Phys.A {\bf 22}, 2129 (1989).
J. Moody, ``Fast Learning in Multi Resolution Hierarchies", in {\it Advances in Neural Information Processing Systems 1}, D.S. Touretzky ed. (Morgan Kaufmann, San Mateo 1989).
J.P. Nadal, ``Study of a Growth Algorithm for a Feed Forward Network", International J. of Neural Systems {\bf 1}, 55 (1989).
Rujan P. and Marchand M., ``Learning by Activating Neurons: A New Approach to Learning in Neural Networks", Complex Systems {\bf 3}, 229 (1989); and also in the {\it Proc. of the First International Joint Conference on Neural Networks - Washington D.C. 1989}, Vol. II, pp. 105.
J.A. Sirat and J.P. Nadal, ``Neural Trees: A New Tool for Classification", preprint, submitted to "Network", April 90.
\bye From LAUTRUP at nbivax.nbi.dk Thu Aug 9 05:19:00 1990 From: LAUTRUP at nbivax.nbi.dk (Benny Lautrup) Date: Thu, 9 Aug 90 11:19 +0200 (NBI, Copenhagen) Subject: International Journal of Neural Systems Message-ID: <510E1F38537FE1E6AD@nbivax.nbi.dk> Begin Message: ----------------------------------------------------------------------- INTERNATIONAL JOURNAL OF NEURAL SYSTEMS The International Journal of Neural Systems is a quarterly journal which covers information processing in natural and artificial neural systems. It publishes original contributions on all aspects of this broad subject which involves physics, biology, psychology, computer science and engineering. Contributions include research papers, reviews and short communications. The journal presents a fresh undogmatic attitude towards this multidisciplinary field with the aim to be a forum for novel ideas and improved understanding of collective and cooperative phenomena with computational capabilities. ISSN: 0129-0657 (IJNS) ---------------------------------- Contents of issue number 3 (1990): 1. A. S. Weigend, B. A. Huberman and D. E. Rumelhart: Predicting the future: A connectionist approach. 2. C. Chinchuan, M. Shanblatt and C. Maa: An artificial neural network algorithm for dynamic programming. 3. L. Fan and T. Li: Design of competition based neural networks for combinatorial optimization. 4. E. A. Ferran and R. P. J. Perazzo: Dislexic behaviour of feed-forward neural networks. 5. E. Milloti: Sigmoid versus step functions in feed-forward neural networks. 6. D. Horn and M. Usher: Excitatory-inhibitory networks with dynamical thresholds. 7. J. G. Sutherland: A holographic model of memory, learning and expression. 8. L. Xu: Adding top-down expectations into the learning procedure of self-organizing maps. 9. D. Stork: BOOK REVIEW ---------------------------------- Editorial board: B. Lautrup (Niels Bohr Institute, Denmark) (Editor-in-charge) S. Brunak (Technical Univ. of Denmark) (Assistant Editor-in-Charge) D. Stork (Stanford) (Book review editor) Associate editors: B. Baird (Berkeley) D. Ballard (University of Rochester) E. Baum (NEC Research Institute) S. Bjornsson (University of Iceland) J. M. Bower (CalTech) S. S. Chen (University of North Carolina) R. Eckmiller (University of Dusseldorf) J. L. Elman (University of California, San Diego) M. V. Feigelman (Landau Institute for Theoretical Physics) F. Fogelman-Soulie (Paris) K. Fukushima (Osaka University) A. Gjedde (Montreal Neurological Institute) S. Grillner (Nobel Institute for Neurophysiology, Stockholm) T. Gulliksen (University of Oslo) D. Hammerstroem (University of Oregon) J. Hounsgaard (University of Copenhagen) B. A. Huberman (XEROX PARC) L. B. Ioffe (Landau Institute for Theoretical Physics) P. I. M. Johannesma (Katholieke Univ. Nijmegen) M. Jordan (MIT) G. Josin (Neural Systems Inc.) I. Kanter (Princeton University) J. H. Kaas (Vanderbilt University) A. Lansner (Royal Institute of Technology, Stockholm) A. Lapedes (Los Alamos) B. McWhinney (Carnegie-Mellon University) M. Mezard (Ecole Normale Superieure, Paris) A. F. Murray (University of Edinburgh) J. P. Nadal (Ecole Normale Superieure, Paris) E. Oja (Lappeenranta University of Technology, Finland) N. Parga (Centro Atomico Bariloche, Argentina) S. Patarnello (IBM ECSEC, Italy) P. Peretto (Centre d'Etudes Nucleaires de Grenoble) C. Peterson (University of Lund) K. Plunkett (University of Aarhus) S. A. Solla (AT&T Bell Labs) M. A. Virasoro (University of Rome) D. J. Wallace (University of Edinburgh) D. 
Zipser (University of California, San Diego) ---------------------------------- CALL FOR PAPERS Original contributions consistent with the scope of the journal are welcome. Complete instructions as well as sample copies and subscription information are available from The Editorial Secretariat, IJNS World Scientific Publishing Co. Pte. Ltd. 73, Lynton Mead, Totteridge London N20 8DH ENGLAND Telephone: (44)1-446-2461 or World Scientific Publishing Co. Inc. 687 Hardwell St. Teaneck New Jersey 07666 USA Telephone: (1)201-837-8858 or World Scientific Publishing Co. Pte. Ltd. Farrer Road, P. O. Box 128 SINGAPORE 9128 Telephone (65)278-6188 ----------------------------------------------------------------------- End Message From tgd at turing.CS.ORST.EDU Thu Aug 9 01:36:10 1990 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Wed, 8 Aug 90 22:36:10 PDT Subject: Similarity to Cascade-Correlation In-Reply-To: Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU's message of Wed, 08 Aug 90 10:09:48 EDT <9008090152.AA19554@CS.ORST.EDU> Message-ID: <9008090536.AA01129@turing.CS.ORST.EDU> As someone with a lot of experience in decision-tree learning algorithms, I agree with Scott. The main similarity between Cascade-Correlation (CC) and decision tree algorithms like CART is that they are both greedy. CART and related algorithms (e.g., ID3, C4, CN2, GREEDY3) all work by choosing an (axis-parallel) hyperplane and then subdividing the training data along that hyperplane, whereas CC keeps all of the training data together and keeps retraining the output units as it incrementlly adds hidden units. There is an algorithm, called FRINGE, that learns a decision tree and then uses that tree to define new features which are then used to build a new tree (and this process can be repeated, of course). This is the best example I know of a non-connectionist (supervised) algorithm for defining new features. --Tom From jose at learning.siemens.com Thu Aug 9 10:14:39 1990 From: jose at learning.siemens.com (Steve Hanson) Date: Thu, 9 Aug 90 09:14:39 EST Subject: Similarity to Cascade-Correlation Message-ID: <9008091414.AA07343@learning.siemens.com.siemens.com> thanks for the clarification... however, as I understand CART, it is not required to construct an axis-parallel hyperplane (like ID3 etc..), like CC any hyperplane is possible. Now as I understand CC it does freeze the weights for each hidden unit once asymptotic learning takes place and takes as input to a next candidate hidden unit the frozen hidden unit output (ie hyperplane decision or discriminant function). Consequently, CC does not "...keep all of the training data together and retraining the output units (weights?) as it incrementlly adds hidden units". As to higher-order hidden units... I guess i see what you mean, however, don't units below simply send a decision concerning the subset of data which they have correctly classified? Consequently, units above see the usual input features and a newly learned hidden unit feature indicating that a some subset of the input vectors are on one side of its decision surface? right? Consequently the next hidden unit in the "cascade" can learn to ignore that subset of the input space and concentrate on other parts of the input space that requires yet another hyperplane? It seems as tho this would produce a branching tree of discriminantS similar to cart. n'est pas? 
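For reference, the score the candidate phase actually climbs is the magnitude of the covariance between a candidate's output and the residual output errors, taken over the whole training set rather than over one carved-out subset of it. A minimal sketch of that quantity (numpy-style Python, names invented; see the Cascade-Correlation paper for the exact normalization used):

    import numpy as np

    def candidate_score(candidate_out, residual_errors):
        # candidate_out   : shape (n_patterns,) -- candidate unit's output V_p
        # residual_errors : shape (n_patterns, n_outputs) -- current network
        #                   error E_po at each output for each training pattern
        # Returns  S = sum_o | sum_p (V_p - mean V)(E_po - mean_p E_o) |
        v = candidate_out - candidate_out.mean()
        e = residual_errors - residual_errors.mean(axis=0)
        return np.abs(np.dot(v, e)).sum()

Each candidate in the pool starts from different random input weights and hill-climbs this score; only the best-scoring one is installed, and after installation only its outgoing weights are trained further.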
Steve From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Thu Aug 9 11:38:51 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Thu, 09 Aug 90 11:38:51 EDT Subject: Similarity to Cascade-Correlation In-Reply-To: Your message of Thu, 09 Aug 90 09:14:39 -0500. <9008091414.AA07343@learning.siemens.com.siemens.com> Message-ID: Now as I understand CC it does freeze the weights for each hidden unit once asymptotic learning takes place and takes as input to a next candidate hidden unit the frozen hidden unit output (ie hyperplane decision or discriminant function). Right. The frozen hidden unit becomes available both for forming an output and as an input to subsequent hidden units. An aside: Instead of "freezing", I've decided to call this "tenure" from now on. When a candidate unit becomes tenured, it no longer has to learn any new behavior, and from that point on other units will pay attention to what it says. Consequently, CC does not "...keep all of the training data together and retraining the output units (weights?) as it incrementlly adds hidden units". How does this follow from the above? As to higher-order hidden units... I guess i see what you mean, however, don't units below simply send a decision concerning the subset of data which they have correctly classified? It's not just a decision. The unit's output can assume any value in its continuous range. Some hidden units develop big weights and tend to act like sharp-threshold units, while others do not. Consequently, units above see the usual input features and a newly learned hidden unit feature indicating that a some subset of the input vectors are on one side of its decision surface? right? Right, modulo the comment above. Consequently the next hidden unit in the "cascade" can learn to ignore that subset of the input space and concentrate on other parts of the input space that requires yet another hyperplane? It seems as tho this would produce a branching tree of discriminantS similar to cart. No, this doesn't follow at all. Typically there are still errors on both sides of the unit just created, so the next unit doesn't ignore either "branch". It produces some new cut that typically subdivides all (or many) of the regions created so far. Again, I suggest you look at the diagrams in the tech report to see the kinds of "cuts" are actually created. n'est pas? Only eagles nest in passes. Lesser birds hide among the branches of decision trees. :-) -- Scott From Connectionists-Request at CS.CMU.EDU Thu Aug 9 13:33:59 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Thu, 09 Aug 90 13:33:59 EDT Subject: Return addresses Message-ID: <24309.650223239@B.GP.CS.CMU.EDU> I have received several complaints from Connectionists members that they are not able 'reply' to messages because the original sender's address has been removed from the message header. This is a problem with the receiver's local mailer. Rather than having me try to remotely trouble shoot 150 different mailers, the problem could be solved by including a return email address as part of the body of any message sent to Connectionists. I would also like to remind subscribers that a copy of main mailing list is available in the Connectionists archives. 
Scott Crowder Connectionists-Request at cs.cmu.edu (ARPAnet) <- see, it isn't that hard ------------------------------------------------------------------------------- The CONNECTIONISTS Archive: --------------------------- All e-mail messages sent to "Connectionists at cs.cmu.edu" starting 27-Feb-88 are now available for public perusal. A separate file exists for each month. The files' names are: arch.yymm where yymm stand for the obvious thing. Thus the earliest available data are in the file: arch.8802 Files ending with .Z are compressed using the standard unix compress program. To browse through these files (as well as through other files, see below) you must FTP them to your local machine. ------------------------------------------------------------------------------- How to FTP Files from the CONNECTIONISTS Archive ------------------------------------------------ 1. Open an FTP connection to host B.GP.CS.CMU.EDU (Internet address 128.2.242.8). 2. Login as user anonymous with password your username. 3. 'cd' directly to one of the following directories: /usr/connect/connectionists/archives /usr/connect/connectionists/bibliographies 4. The archives and bibliographies directories are the ONLY ones you can access. You can't even find out whether any other directories exist. If you are using the 'cd' command you must cd DIRECTLY into one of these two directories. Access will be denied to any others, including their parent directory. 5. The archives subdirectory contains back issues of the mailing list. Some bibliographies are in the bibliographies subdirectory. Problems? - contact us at "Connectionists-Request at cs.cmu.edu". Happy Browsing Scott Crowder Connectionists-Request at cs.cmu.edu ------------------------------------------------------------------------------- From orjan at thalamus.sans.bion.kth.se Thu Aug 9 19:47:34 1990 From: orjan at thalamus.sans.bion.kth.se (Orjan Ekeberg) Date: Thu, 09 Aug 90 19:47:34 N Subject: Network Constructing Algorithms. In-Reply-To: Your message of Thu, 09 Aug 90 08:51:08 O. <9008091705.AAgarbo.bion.kth.se13977@garbo.bion.kth.se> Message-ID: <9008091747.AA12363@thalamus> I assume that some of the work that we have been doing would fit well in this context too. Based on a recurrent network, higher order units are added automatically. The new units become part of the recurrent set and helps to make the training patterns fixpoints of the network. A couple of references (in bibtex format): @inproceedings{sans:alaoe87, author = {Anders Lansner and {\"O}rjan Ekeberg}, year = 1987, title = {An Associative Network Solving the ``4-Bit ADDER Problem''}, booktitle = {Proceedings of the IEEE First Annual International Conference on Neural Networks}, pages = {II{-}549}, address = {San Diego, USA}, month = jun} @inproceedings{sans:paris88, author = {{\"O}rjan Ekeberg and Anders Lansner}, year = 1988, title = {Automatic Generation of Internal Representations in a Probabilistic Artificial Neural Network}, booktitle = {Neural Networks from Models to Applications}, editor = {L. Personnaz and G. Dreyfus}, publisher = {I.D.S.E.T.}, address = {Paris}, pages = {178--186}, note = {Proceedings of {nEuro}-88, The First European Conference on Neural Networks}, abstract = {In a one layer feedback perceptron type network, the connections can be viewed as coding the pairwise correlations between activity in the corresponding units. This can then be used to make statistical inference by means of a relaxation technique based on bayesian inferences. 
When such a network fails, it might be because the regularities are not visible as pairwise correlations. One cure would then be to use a different internal coding where selected higher order correlations are explicitly represented. A method for generating this representation automatically is reviewed and results from experiments regarding the resulting properties is presented with a special focus on the networks ability to generalize properly.}} +---------------------------------+-----------------------+ + Orjan Ekeberg + O---O---O + + Department of Computing Science + \ /|\ /| Studies of + + Royal Institute of Technology + O-O-O-O Artificial + + S-100 44 Stockholm, Sweden + |/ \ /| Neural + +---------------------------------+ O---O-O Systems + + EMail: orjan at bion.kth.se + SANS-project + +---------------------------------+-----------------------+ From pollack at cis.ohio-state.edu Thu Aug 9 12:14:19 1990 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Thu, 9 Aug 90 12:14:19 -0400 Subject: Cascade Correlation and Convergence Message-ID: <9008091614.AA14222@dendrite.cis.ohio-state.edu> Scott's description of his algorithm, and lack of convergence proof, reminded me of the line of research by Meir and Domany (Complex Systems 2, 1988) and Mezard and Nadal (Int J Neu Systems, 1,1 1989) on methods for directly constructing networks. In a related paper (which I cannot find), I'm quite sure that someone proved by construction that any (n input, 1 output) boolean function could be accomplished by a layering of TLU's, where each additional unit is guaranteed to decrease the number of mis-classified inputs. Perhaps this approach would help lead to some convergence proof for CC networks. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Fax/Phone: (614) 292-4890 From bgupta at aries.intel.com Thu Aug 9 19:19:58 1990 From: bgupta at aries.intel.com (Bhusan Gupta) Date: Thu, 9 Aug 90 16:19:58 PDT Subject: Job opening at Intel for NN IC designer Message-ID: <9008092319.AA04843@aries> The neural network group at Intel is looking for an engineer to participate in the development of neural networks. A qualified applicant should have a M.S. or PhD in electrical engineering or equivalent experience. The specialization required is in CMOS circuit design with an emphasis on digital design. Analog design experience is considered useful as well. Familiarity with neural network architectures, learning algorithms, and applications is desirable. The duties that are specific to this job are: Neural network design. Architecture definition and circuit design. Chip planning, layout supervision and verification. Testing and debugging silicon. The neural network design consists primarily of digital design with both a gate-level and transistor-level emphasis. The job is at the Santa Clara site and is currently open. Interested principals can email at bgupta at aries.intel.com until the end of August. Resumes in ascii are preferred. I will pass along all responses to the appropriate people. street address: Bhusan Gupta m/s sc9-40 2250 Mission College Blvd. P.O. Box 58125 Santa Clara, Ca 95052 Intel is an equal opportunity employer, etc. 
Bhusan Gupta From sg at corwin.ccs.northeastern.edu Thu Aug 9 14:34:35 1990 From: sg at corwin.ccs.northeastern.edu (steve gallant) Date: Thu, 9 Aug 90 14:34:35 EDT Subject: Cascade-Correlation, etc Message-ID: <9008091834.AA18306@corwin.CCS.Northeastern.EDU> To respond to Jordan's suggestion, if you copy the output cell from a stage in cascade correlation into your growing network, then the previous convergence results hold for boolean learning problems. This is true whether you copy at every stage or only occasionally. Scott tried a few simulations and there seemed to be some learning speed gain by occasional copying, perhaps 25% on the couple of tests he ran. Also, if I can add an early paper (that includes convergence) to Tal Grossman's list: Gallant, S. I\@. Three Constructive Algorithms for Network Learning. Proc.\ Eighth Annual Conference of the Cognitive Science Society, Amherst, Ma., Aug. 15-17, 1986, 652-660. Steve Gallant From marcus at cns.edinburgh.ac.uk Fri Aug 10 16:37:13 1990 From: marcus at cns.edinburgh.ac.uk (Marcus Frean) Date: Fri, 10 Aug 90 16:37:13 BST Subject: Convergence of constructive algorithms. Message-ID: <8340.9008101537@cns.ed.ac.uk> Jordan Pollack writes: > In a related paper (which I cannot find), I'm quite sure that someone > proved by construction that any (n input, 1 output) boolean function > could be accomplished by a layering of TLU's, where each additional > unit is guaranteed to decrease the number of mis-classified inputs. > Perhaps this approach would help lead to some convergence proof for CC > networks. There are several papers that show convergence via guaranteeing each unit reduces the output's errors by at least one. [NB: They all use linear threshold units, and require for convergence that the training set be composed of binary patterns (or at least convex: every pattern must be separable from all the others), since then the worst case is always that a new unit captures a single pattern and hence is able to correct the output unit by one.] These include The "Tower algorithm": Gallant,S.I. 1986a. Three Constructive Algorithms for Network Learning. Proc. 8th Annual Conf. of Cognitive Science Soc. p652-660. also discussed in Nadal,J. 1989. Study of a Growth Algorithm for Neural Networks International J. of Neural Systems, 1,1:55-59 The performance of this method closely matches that of the "Tiling" Algorithm of Mezard and Nadal, although the proof there is for reduction of at least one error per layer rather than per unit. The "neural decision tree" approach is shown to converge by M. Golea and M. Marchand, A Growth Algorithm for Neural Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). and also J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for Classification, preprint, submitted to "Network", April 90. The "Upstart" algorithm (my favourite....) Frean,M.R. 1990. The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks. Neural Computation. 2:2, 198-209. in which new units are devoted to correcting errors made by existing units (in this sense it has bears some resemblance to Cascade Correlation). A binary tree of units is constructed, but it is not a decision tree: "daughter" units correct their "parent", with the most senior parent being the output unit. Marcus. 
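The single-pattern worst case mentioned in the note above is easy to make concrete: for any binary pattern there is a threshold unit that fires on that pattern and on no other, which is what guarantees the "at least one error per unit" bound these proofs rely on. A small sketch (Python/numpy; the function name is invented):

    import itertools
    import numpy as np

    def grandmother_cell(target):
        # Returns (weights, bias) of a linear threshold unit with
        # weights.x + bias > 0 exactly when x == target, for x in {0,1}^n.
        target = np.asarray(target)
        weights = 2 * target - 1          # +1 where target is 1, else -1
        bias = 0.5 - target.sum()
        return weights, bias

    # exhaustive check on all 4-bit patterns
    w, b = grandmother_cell([1, 0, 1, 1])
    for x in itertools.product([0, 1], repeat=4):
        assert (np.dot(w, x) + b > 0) == (list(x) == [1, 0, 1, 1])

In the worst case each added unit peels off exactly one remaining misclassified pattern this way, and since the training set is finite the construction must terminate.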
--------------------------------------------------------------------- From fanty at cse.ogi.edu Fri Aug 10 13:00:03 1990 From: fanty at cse.ogi.edu (Mark Fanty) Date: Fri, 10 Aug 90 10:00:03 -0700 Subject: conjugate gradient optimization program available Message-ID: <9008101700.AA03174@cse.ogi.edu> The speech group at OGI uses conjugate-gradient optimization to train fully connected feed-forward networks. We have made the program (OPT) available for anonymous ftp: 1. ftp to cse.ogi.edu 2. login as "anonymous" with any password 3. cd to "pub/speech" 4. get opt.tar OPT was written by Etienne Barnard at Carnegie-Mellon University. Mark Fanty Computer Science and Engineering Oregon Graduate Institute fanty at cse.ogi.edu 196000 NW Von Neumann Drive (503) 690-1030 Beaverton, OR 97006-1999 From amini at tcville.hac.com Sun Aug 12 23:47:14 1990 From: amini at tcville.hac.com (Afshin Amini) Date: Sun, 12 Aug 90 20:47:14 PDT Subject: signal processing with neural nets Message-ID: <9008130347.AA02757@ai.spl> Hi there: I would like to explore possibilities of using neural nets in a signal processing environment. I would like to get familiar with usage of neural nets in the area of spectral estimation and classification. I have used the popular methods of high resolution spectral estimation such as AR modeling and such. I would like to get some reffrences to recent publications and books that contain specific algorithms that deploys neural networks to achieve such problems in signal processing. thanks, -A. Amini -- Afshin Amini Hughes Aircraft Co. voice: (213) 616-6558 Electro-Optical and Data Systems Group Signal Processing Lab fax: (213) 607-0918 P.O. Box 902, EO/E1/B108 email: El Segundo, CA 90245 smart: amini at tcville.hac.com Bldg. E1 Room b2316f dumb: amini%tcville at hac2arpa.hac.com uucp: hacgate!tcville!dave From nelsonde%avlab.dnet at wrdc.af.mil Mon Aug 13 10:10:04 1990 From: nelsonde%avlab.dnet at wrdc.af.mil (nelsonde%avlab.dnet@wrdc.af.mil) Date: Mon, 13 Aug 90 10:10:04 EDT Subject: Last Call for Papers for AGARD Conference Message-ID: <9008131410.AA08887@wrdc.af.mil> I N T E R O F F I C E M E M O R A N D U M Date: 13-Aug-1990 10:05am EST From: Dale E. Nelson NELSONDE Dept: AAAT-1 Tel No: 57646 From sankar at caip.rutgers.edu Sun Aug 12 21:15:24 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Sun, 12 Aug 90 21:15:24 EDT Subject: No subject Message-ID: <9008130115.AA08572@caip.rutgers.edu> >>There are several papers that show convergence via guaranteeing each >>unit reduces the output's errors by at least one. >> >> >>The "neural decision tree" approach is shown to converge by >> M. Golea and M. Marchand, A Growth Algorithm for Neural >> Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). >>and also >> J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for >> Classification, preprint, submitted to "Network", April 90. Add to this the following paper: A. Sankar and R.J. Mammone, " A fast learning algorithm for tree neural networks", presented at the 1990 Conference on Information Sciences and Systems, Princeton, NJ, March 21,22,23, 1990. This will appear in the conference proceedings. We also have a more detailed technical report on this research. For copies please contact Ananth Sankar CAIP 117 Brett and Bowser Roads Rutgers University P.O. 
Box 1390 Piscataway, NJ 08855-1390 From sankar at caip.rutgers.edu Mon Aug 13 13:48:53 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Mon, 13 Aug 90 13:48:53 EDT Subject: No subject Message-ID: <9008131748.AA07712@caip.rutgers.edu> An earlier attempt to mail this seems to have failed..my apologies to everyone who gets a duplicate copy. >>There are several papers that show convergence via guaranteeing each >>unit reduces the output's errors by at least one. >> >> >>The "neural decision tree" approach is shown to converge by >> M. Golea and M. Marchand, A Growth Algorithm for Neural >> Network Decision Trees, EuroPhys.Lett. 12, 205 (1990). >>and also >> J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for >> Classification, preprint, submitted to "Network", April 90. Add to this the following paper: A. Sankar and R.J. Mammone, " A fast learning algorithm for tree neural networks", presented at the 1990 Conference on Information Sciences and Systems, Princeton, NJ, March 21,22,23, 1990. This will appear in the conference proceedings. We also have a more detailed technical report on this research. For copies please contact Ananth Sankar CAIP 117 Brett and Bowser Roads Rutgers University P.O. Box 1390 Piscataway, NJ 08855-1390 From gary%cs at ucsd.edu Mon Aug 13 15:35:50 1990 From: gary%cs at ucsd.edu (Gary Cottrell) Date: Mon, 13 Aug 90 12:35:50 PDT Subject: Summary (long): pattern recognition comparisons In-Reply-To: Leonard Uhr's message of Fri, 3 Aug 90 14:18:11 -0500 <9008031918.AA23586@thor.cs.wisc.edu> Message-ID: <9008131935.AA19428@desi.ucsd.edu> Leonard Uhr says: >Neural nets using backprop have only handled VERY SIMPLE images, usually in >8-by-8 arrays. (We've used 32-by-32 arrays to investigate generation in >logarithmically converging nets, but I don't know of any nets with complete >connectivity from one layer to the next that are that big.) Mike Fleming and I used 64x64 inputs for face recognition. The system does auto-encoding as a preprocessing step, reducing the number of inputs to 80. See IJCNN-90, Vol II p65->. gary cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at cs.ucsd.edu (ARPA) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET) gcottrell at ucsd.edu (BITNET) From kuepper at ICSI.Berkeley.EDU Tue Aug 14 14:32:37 1990 From: kuepper at ICSI.Berkeley.EDU (Wolfgang Kuepper) Date: Tue, 14 Aug 90 11:32:37 PDT Subject: SIEMENS Job Announcement Message-ID: <9008141832.AA02344@icsib21.Berkeley.EDU> IMAGE UNDERSTANDING and ARTIFICIAL NEURAL NETWORKS The Corporate Research and Development Laboratories of Siemens AG, one of the largest companies worldwide in the electrical and elec- tronics industry, have research openings in the Computer Vision as well as in the Neural Network Groups. The groups do basic and applied studies in the areas of image understanding (document inter- pretation, object recognition, 3D modeling, application of neural networks) and artificial neural networks (models, implementations, selected applications). The Laboratory is located in Munich, an attractive city in the south of the Federal Republic of Germany. Connections exists with our sister laboratory, Siemens Corporate Research in Princeton, as well as with various research institutes and universities in Germany and in the U.S. including MIT, CMU and ICSI. 
Above and beyond the Laboratory facilities, the groups have a network of Sun and DEC workstations, Symbolics Lisp machines, file and compute servers, and dedicated image processing hardware. The successful candidate should have an M.S. or Ph.D. in Computer Science, Electrical Engineering, or any other AI-related or Cognitive Science field. He or she should preferably be able to communicate in German and English. Siemens is an equal opportunity employer. Please send your resume and a reference list to Peter Moeckel Siemens AG ZFE IS INF 1 Otto-Hahn-Ring 6 D-8000 Muenchen 83 West Germany e-mail: gm%bsun4 at ztivax.siemens.com Tel. +49-89-636-3372 FAX +49-89-636-2393 Inquiries may also be directed to Wolfgang Kuepper (on leave from Siemens until 8/91) International Computer Science Institute 1947 Center Street - Suite 600 Berkeley, CA 94704 e-mail: kuepper at icsi.berkeley.edu Tel. (415) 643-9153 FAX (415) 643-7684
From Connectionists-Request at CS.CMU.EDU Thu Aug 16 12:31:34 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Thu, 16 Aug 90 12:31:34 EDT Subject: patience is a virtue Message-ID: <4776.650824294@B.GP.CS.CMU.EDU> Recently a few people have worried that their posts were lost because of the long resend time for messages to the connectionists list. I would like for all users to exercise a little patience. CMU is happy to provide the resources and labor necessary to make the Connectionists list available to the worldwide connectionists community. However, we do have limited resources. The Connectionists redistribution machine is only a VAX 750. This machine also services several other large mailing lists. Delays of 4-6 hours are typical, but delays of >16 hours are possible during high traffic periods. If you are trying to debate an issue with another list member, but think the rest of the list would be interested in the debate, it is best to email directly to the other member and cc: Connectionists at cs.cmu.edu. This allows you to carry on your debate at normal email speeds and lets the rest of the community 'listen in' 6-16 hrs later. If you feel that the delays are a serious impediment to the research progress of the connectionists community, CMU would be happy to accept your donation of a new dedicated Connectionists redistribution machine. Scott Crowder Connectionists-Request at cs.cmu.edu (ARPAnet) PS If you have waited more than 24 hours and STILL haven't received your post, please contact me at Connectionists-Request at cs.cmu.edu.
From xiru at Think.COM Fri Aug 17 16:48:58 1990 From: xiru at Think.COM (xiru@Think.COM) Date: Fri, 17 Aug 90 16:48:58 EDT Subject: backprop for classification Message-ID: <9008172048.AA00756@yangtze.think.com> While we trained a standard backprop network for some classification task (one output unit for each class), we found that when the classes are not evenly distributed in the training set, e.g., 50% of the training data belong to one class, 10% belong to another, ... etc., then the network was always biased towards the classes that have the higher percentage in the training set. Thus, we had to post-process the output of the network, giving more weight to the classes that occur less frequently (in reverse proportion to their population). I wonder if other people have encountered the same problem, and if there are better ways to deal with this problem. Thanks in advance for any replies. - Xiru Zhang Thinking Machines Corp.
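A minimal sketch of the kind of post-processing described above (numpy-style Python, names invented; the replies that follow spell out when this correction is and is not the right thing to do):

    import numpy as np

    def reweight_outputs(net_outputs, train_priors, test_priors=None):
        # net_outputs  : shape (n_classes,) -- raw network outputs for one input
        # train_priors : shape (n_classes,) -- class frequencies in the training set
        # test_priors  : optional (n_classes,) -- priors expected in the field;
        #                leaving it out amounts to weighting classes in inverse
        #                proportion to their training-set population
        scores = np.asarray(net_outputs, dtype=float) / np.asarray(train_priors)
        if test_priors is not None:
            scores = scores * np.asarray(test_priors)
        return scores / scores.sum()

The input is then assigned to the class with the largest rescaled score.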
From John.Hampshire at SPEECH2.CS.CMU.EDU Sun Aug 19 13:48:06 1990 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Sun, 19 Aug 90 13:48:06 EDT Subject: backprop for classification Message-ID: Xiru Zhang of Thinking Machines Corp. writes: > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > Thus, we had to post-process the output of the network, giving more weights > to the classes that occur less frequently (in reverse proportion to their > population). > > I wonder if other people have encountered the same problem, and if there > are better ways to deal with this problem. Indeed, one can show that any classifier with sufficient functional capacity to model the class-conditional densities of the random vector X being classified (e.g., a MLP with sufficient connectivity to perform the input-to-output functional mapping necessary for robust classification) and trained with a "reasonable error measure" (a term originated by B. Pearlmutter) will yield outputs that are accurate estimates of the a posteriori probabilities of X, given an asymptotically large number of statistically independent training samples. Examples of "reasonable error measures" are mean-squared error (the one used by Xiru Zhang), Cross Entropy, Max. Mutual Info., Kullback-Liebler distance, Max. Likelihood... Unfortunately, one never has enough training data, and it's not always clear what constitutes sufficient but not excessive functional capacity in the classifier. So one ends up *estimating* the a posterioris with one's "reasonable error measure"-trained classifier. If one trains one's classifier with a disproportionately high number of samples belonging to one particular class, one will get precisely the behavior Xiru Zhang describes. ************** This is because the a posterioris depend on the class priors (you can prove this easily using Bayes' rule). If you bias the priors, you will bias the a posterioris accordingly. Your classifier will therefore learn to estimate the biased a posterioris. ************** The best way to fix the problem if you're using a "reasonable error measure" to train your classifier is to have a training set that reflects the true class priors. If this isn't possible, then you can post-process the classifier's outputs by correcting for the biased priors. Whether or not this fix really works depends a lot on the classifier you're using. MLPs tend to be over-parameterized, so they tend to yield binary outputs that won't be affected by this kind of post processing. Another approach might be to avoid using "reasonable error measures" to train your classifier. I have more info regarding such alternatives if anyone cares, but I've already blabbed too much. If you want refs., please send me email directly. 
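The Bayes'-rule step alluded to above is short enough to write out. With $p(x\mid c)$ the class-conditional density, $P(c)$ the class prior in the training set and $P'(c)$ the prior expected in the field,

    $$ P(c\mid x) = \frac{p(x\mid c)\,P(c)}{\sum_k p(x\mid k)\,P(k)} $$

so an output that estimates the training-set posterior $P(c\mid x)$ can be converted to the posterior under the field priors by

    $$ P'(c\mid x) \propto P(c\mid x)\,\frac{P'(c)}{P(c)} $$

renormalized over the classes. The correction is exact only when the outputs really are the training-set posteriors, which is the caveat raised above about over-parameterized MLPs whose outputs saturate at 0 or 1.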
Cheers, John
From niranjan at engineering.cambridge.ac.uk Sun Aug 19 10:11:29 1990 From: niranjan at engineering.cambridge.ac.uk (Mahesan Niranjan) Date: Sun, 19 Aug 90 10:11:29 BST Subject: backprop for classification Message-ID: <3447.9008190911@dsl.eng.cam.ac.uk>
> From: xiru at com.think
> Subject: backprop for classification
> Date: 19 Aug 90 00:26:28 GMT
>
> While we trained a standard backprop network for some classification task
> (one output unit for each class), we found that when the classes are not
> evenly distribed in the training set, e.g., 50% of the training data belong
> to one class, 10% belong to another, ... etc., then the network always biased
> towards the classes that have the higher percentage in the training set.
>
This often happens when the network is too small to load the training data. Your network, in this case, does not converge to negligible error. My suggestion is to start with a large network that can load your training data and gradually reduce the size of the net by pruning the weights giving small contributions to the output error. niranjan
From russ at dash.mitre.org Mon Aug 20 07:17:38 1990 From: russ at dash.mitre.org (Russell Leighton) Date: Mon, 20 Aug 90 07:17:38 EDT Subject: backprop for classification In-Reply-To: xiru@Think.COM's message of Fri, 17 Aug 90 16:48:58 EDT <9008172048.AA00756@yangtze.think.com> Message-ID: <9008201117.AA22280@dash.mitre.org> We have found backprop VERY sensitive to the probability of occurrence of each class. As long as you are aware of this you can use this to advantage. For example, if false alarms are a big concern, then by training with large amounts of "noise" you can bias the system to reduce the Pfa. This effect has been quantified analytically and experimentally for systems with no hidden layers in a paper being compiled now. The bottom line is that a no-hidden-layer system implements a classical Mini-Max test if the signal classes are represented equally in the training set. By varying the composition of the training sets, the network can be designed relative to a known maximum false alarm probability independent of signal-to-noise ratio. This work continues for multi-layer systems. An experimental account of how to exploit this effect for signal classification can be found in: Wieland, et al., `An Analysis of Noise Tolerance for a Neural Network Recognition System', Mitre Tech. Rep. MP-88W00021, 1988 and Wieland, et al., `Shaping Schedules as a Method of Accelerated Learning', Proceedings of the first INNS Meeting, 1988 Russ. NFSNET: russ at dash.mitre.org Russell Leighton MITRE Signal Processing Lab 7525 Colshire Dr. McLean, Va. 22102 USA
From wan at whirlwind.Stanford.EDU Mon Aug 20 14:07:39 1990 From: wan at whirlwind.Stanford.EDU (Eric A. Wan) Date: Mon, 20 Aug 90 11:07:39 PDT Subject: Survey of Second Order Techniques Message-ID: <9008201807.AA13338@whirlwind.Stanford.EDU> I am compiling a study on the extent to which researchers have gone beyond simple gradient descent (back-propagation) for training layered neural networks by applying more sophisticated classical techniques in non-linear optimization (e.g. Newton, Quasi-Newton, Conjugate-Gradient methods, etc.). Please e-mail me any comments and/or references that you have on the subject. I will summarize the responses. Thanks in advance.
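In present-day terms the experiment is easy to prototype: flatten the weights into one parameter vector and hand the error function to an off-the-shelf optimizer. A minimal sketch using scipy's conjugate-gradient routine (Python/numpy/scipy; the tiny XOR network and all names are invented for illustration, and in practice one would pass the analytic gradient via the jac argument instead of relying on finite differences):

    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # XOR inputs
    T = np.array([0., 1., 1., 0.])                           # XOR targets
    n_in, n_hid = 2, 4

    def loss(theta):
        # unpack a flat parameter vector into a 2-4-1 network with biases
        W1 = theta[:n_hid * (n_in + 1)].reshape(n_hid, n_in + 1)
        W2 = theta[n_hid * (n_in + 1):]
        Xb = np.hstack([X, np.ones((len(X), 1))])            # append bias input
        H = 1.0 / (1.0 + np.exp(-Xb @ W1.T))                 # hidden layer
        Hb = np.hstack([H, np.ones((len(X), 1))])
        Y = 1.0 / (1.0 + np.exp(-Hb @ W2))                   # output unit
        return 0.5 * np.sum((Y - T) ** 2)

    theta0 = 0.5 * np.random.randn(n_hid * (n_in + 1) + n_hid + 1)
    result = minimize(loss, theta0, method='CG')             # conjugate gradient
    print(result.fun)                                        # final sum-squared error

Swapping method='CG' for method='BFGS' gives a quasi-Newton run of the same experiment.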
Eric Wan wan at isl.stanford.edu From YVES%LAVALVM1.BITNET at vma.CC.CMU.EDU Mon Aug 20 11:36:47 1990 From: YVES%LAVALVM1.BITNET at vma.CC.CMU.EDU (Yves (Zip) Lacouture) Date: Mon, 20 Aug 90 11:36:47 HAE Subject: BP for categorization... Message-ID: > From: xiru at com.think > Subject: backprop for classification > Date: 19 Aug 90 00:26:28 GMT > > While we trained a standard backprop network for some classification task > (one output unit for each class), we found that when the classes are not > evenly distribed in the training set, e.g., 50% of the training data belong > to one class, 10% belong to another, ... etc., then the network always biased > towards the classes that have the higher percentage in the training set. > I encountered the same problem in a similar situation. This occur with limited resources (HU): the network tend to neglet a subset of the stimuli. The phenomenon is also observed when the stimuli have the same presentation probability and the resources are very limited. It helps to use a non-orthogonal representation (e.g. by activating neighbor units). To build a model of (human) simple identification I modified BP to incorporate a selective attention mechanism by which the adaptative modifications are made larger for the stimuli for which performances are worse. I expect to offer a TR on this topic soon. yves From chrisley at parc.xerox.com Mon Aug 20 13:35:08 1990 From: chrisley at parc.xerox.com (Ron Chrisley) Date: Mon, 20 Aug 90 10:35:08 PDT Subject: backprop for classification In-Reply-To: xiru@Think.COM's message of Fri, 17 Aug 90 16:48:58 EDT <9008172048.AA00756@yangtze.think.com> Message-ID: <9008201735.AA07158@owl.parc.xerox.com> Xiru, you wrote: "While we trained a standard backprop network for some classification task (one output unit for each class), we found that when the classes are not evenly distribed in the training set, e.g., 50% of the training data belong to one class, 10% belong to another, ... etc., then the network always biased towards the classes that have the higher percentage in the training set. Thus, we had to post-process the output of the network, giving more weights to the classes that occur less frequently (in reverse proportion to their population)." My suggestion: most BP classification paradigms will work best if you are using the same distribution for training as for testing. So only worry about uneven distribution of classes in the training data if the input on which the network will have to perform does not have that distribution. If rocks are 1000 times more common than mines, then given that something is completely qualitatively ambiguous with respect to the rock/mine distinction, it is best (in terms of minimizing # of misclassifications) to guess that the thing is a rock. So being biased toward rock classifications is a valid way to minimize misclassification. (Of course, once you start factoring in cost, this will be skewed dramatically: it is much better to have a false alarm about a mine than to falsely think a mine is a rock.) In summary, uneven distributions aren't, in themselves, bad for training, nor do they require any post-processing. However, distributions that differ from real-world ones will require some sort of post-processing, as you have done. But there is another issue here, I think. How were you using the network for classification? 
From your message, it sounds like you were training and interpreting the network in such a way that the activations of the output nodes were supposed to correspond to the conditional probabilities of the different classes, given the input. This would explain what you meant by your last sentence in the above quote. But there are other ways of using back-propagation. For instance, if one does not constrain the network to estimate conditional probabilities, but instead has it solve the more general problem of minimizing classification error, then it is possible that the network will come up with a solution that is not affected by differences in the prior probabilities of classes in the training and testing data. Since it is not solving the problem by classifying via maximum likelihood, its solutions will be based on the frequency-independent, qualitative structure of the inputs. In fact, humans often do something like this. The phenomenon is called "base rate neglect". The phenomenon is notorious in that when qualitative differences are not so marked between a rare and a common class, humans will always over-classify inputs into the rare class. That is, if the symptoms a patient has even *slightly* indicate a rare tropical disease over a common cold, humans will give the rare disease diagnosis, even though it is extremely unlikely that the patient has that disease. Of course, the issue of cost is again being ignored here. (See Gluck and Bower for a look at the relation between neural networks and base rate neglect). Such limitations aside, classification via means other than conditional probability estimation may be desirable for certain applications. For example, those in which you do not know the priors, or they change dramatically in an unpredictable way. And/or where there is a strong qualitative division between members of the classes. In such cases, you might get good classification performance, even when the distributions differ, by relying more on qualitative differences in the inputs than on the frequency of the classes. Does this sound right? Ron Chrisley chrisley at csli.stanford.edu Xerox PARC SSL New College Palo Alto, CA 94304 Oxford OX1 3BN, UK (415) 494-4728 (865) 793-484
From niranjan at engineering.cambridge.ac.uk Tue Aug 21 20:20:36 1990 From: niranjan at engineering.cambridge.ac.uk (Mahesan Niranjan) Date: Tue, 21 Aug 90 20:20:36 BST Subject: Backprop for classification Message-ID: <5229.9008211920@dsl.eng.cam.ac.uk>
> From: xiru at com.think
> Subject: backprop for classification
> Date: 19 Aug 90 00:26:28 GMT
>
> While we trained a standard backprop network for some classification task
> (one output unit for each class), we found that when the classes are not
> evenly distribed in the training set, e.g., 50% of the training data belong
> to one class, 10% belong to another, ... etc., then the network always biased
> towards the classes that have the higher percentage in the training set.
>
This often happens when the network is too small to load the training data. Your network, in this case, does not converge to negligible error. My suggestion is to start with a large network that can load your training data and gradually reduce the size of the net by pruning the weights giving small contributions to the output error.
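In its simplest form such pruning can be sketched as follows (Python/numpy, names invented; plain weight magnitude is used here as a crude stand-in for "contribution to the output error" -- a sensitivity-based criterion would be closer to what is suggested above):

    import numpy as np

    def prune_smallest(weights, fraction):
        # Zero out the given fraction of weights with the smallest magnitude
        # (ties at the threshold are pruned as well).  Returns the pruned copy
        # and a 0/1 mask that can be used to hold the pruned weights at zero
        # while the remaining net is retrained.
        flat = np.abs(weights).ravel()
        k = int(fraction * flat.size)
        if k == 0:
            return weights.copy(), np.ones_like(weights)
        threshold = np.sort(flat)[k - 1]
        mask = (np.abs(weights) > threshold).astype(weights.dtype)
        return weights * mask, mask

One would then alternate a few epochs of retraining with further rounds of pruning, stopping when the training error can no longer be driven back down.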
niranjan
From der%beren at Forsythe.Stanford.EDU Wed Aug 22 13:35:59 1990 From: der%beren at Forsythe.Stanford.EDU (Dave Rumelhart) Date: Wed, 22 Aug 90 10:35:59 PDT Subject: BP for categorization...relative frequency problem In-Reply-To: "Yves (Zip) Lacouture"'s message of Mon, 20 Aug 90 11:36:47 HAE <9008210406.AA11690@nprdc.navy.mil> Message-ID: <9008221735.AA07583@beren.> We have also encountered the problem. Since BP does gradient descent and since the contribution of any set of patterns depends in part on the relative frequency of those patterns, fewer resources are allocated to low frequency categories. Moreover, those resources are allocated later in the training -- probably after over-fitting has already become a problem for higher frequency categories. Of course, if your training distribution is the same as your testing distribution you will be getting the appropriate Bayesian estimate of the class probabilities. On the other hand, if the generalization distribution is unknown at test time, we may wish to factor out the relative category frequencies of the training set during training and add any known "priors" during generalization. There are two ways to do this. One way, suggested in one of the notes on this topic, is to "post process" our output data. That is, divide the output unit value by the relative frequency in the training set and multiply by the relative frequency in the test set. This will give you an estimate of the Bayesian probability for the test set. For a variety of reasons, this is less appropriate than correcting during training. In this case, the procedure is to effectively increase the learning rate in inverse proportion to the relative frequency of the category in the training set. Thus, we take bigger learning steps on low frequency categories. In a simple classification task, this is roughly equivalent to normalizing the data set by sampling each category set equally. In the case of cross-classification (in which a given input can be a member of more than one class), it is roughly equivalent to weighting each pattern inversely by the probability that that pattern would occur, given independence between the output classes. We have used this method successfully in a system designed to classify mass spectra. In this method an output of .5 means that the evidence for and against the category is equal. Whereas, in the normal training method, an output equal to the relative frequency in the training set means that the evidence for and against is equal. In some cases this can be very small. It is possible to add the priors in manually and compare performance on the training set with the original method. We find that we do only slightly worse on the training set with the two methods. We do much better in generalization on classes that were low frequency in the training set and slightly worse on classes which were high frequency in the training set. der
From hendler at cs.UMD.EDU Wed Aug 22 16:28:52 1990 From: hendler at cs.UMD.EDU (Jim Hendler) Date: Wed, 22 Aug 90 16:28:52 -0400 Subject: BP for categorization...relative frequency problem Message-ID: <9008222028.AA09120@dormouse.cs.UMD.EDU> Herve Bourlard and Nelson Morgan had to deal with this problem in a system being used in the context of continuous speech recognition. They solved the problem, to some extent, by dividing the output category strengths by the prior probabilities of the training set.
This avoided having to do anything terribly tricky in the network, and let them use classical back-propagation without extension (although I think they've also used some recurrences in one version). I know there have been several nice publications of their work in speech - various papers with the authors Bourlard, Wellekens, and Morgan in various combinations. Morgan is at ICSI, and is probably the most accessible of these authors for requesting reprints. -Jim Hendler UMCP From PSS001%VAXA.BANGOR.AC.UK at vma.CC.CMU.EDU Wed Aug 22 14:47:17 1990 From: PSS001%VAXA.BANGOR.AC.UK at vma.CC.CMU.EDU (PSS001%VAXA.BANGOR.AC.UK@vma.CC.CMU.EDU) Date: Wed, 22 AUG 90 18:47:17 GMT Subject: No subject Message-ID: Department of Psychology, University of Wales, Bangor and Department of Psychology, University of York CONNECTIONISM AND PSYCHOLOGY THREE POST-DOCTORAL RESEARCH FELLOWSHIPS Applications are invited for three post-doctoral research fellowships to work on the connectionist and psychological modelling of human short-term memory and spelling development. Two Fellowships are available for three years, on an ESRC- funded project concerned with the development and evaluation of a connectionist model of short-term memory. One Fellow will be based with Dr. Gordon Brown in the Cognitive Neurocomputation Unit at Bangor and will be responsible for implementing the model. The other Fellow, based at York with Dr. Charles Hulme, will be responsible for undertaking psychological experiments with children and adults to evaluate the model. Starting salary for both posts on research 1A grade up to # 13,495. One two-year Fellowship is available to work on an MRC-funded project to develop a sequential connectionist model of the development of spelling and phonemic awareness in children. This post is based in Bangor with Dr. Gordon Brown. Starting salary on research 1A grade up to # 14,744. Applicants should have postgraduate research experience or interest in cognitive psychology/cognitive science or connectionist/ neural network modelling and computer science. Good computing skills are essential for the posts based in Bangor, and experience in running psychological experiments is required for the York-based post. Excellent computational and research facilities will be available to the successful applicants. The appointments may commence from 1st. October 1990, but start could be delayed until 1st. January 1991. Closing date for applications is 7th. September 1990, but intending applicants should get in touch as soon as possible. Informal enquiries regarding the Bangor-based posts, and requests for further details of the posts and host departments, to Gordon Brown (0248 351151 Ext 2624; email PSS001 at uk.ac.bangor.vaxa); informal enquiries concerning the York-based post to Charles Hulme ( 0904 433145; email ch1 at uk.ac.york.vaxa). Applications (in the form of a curriculum vitae and the names and addresses of two referees) should be sent to Mr. Alan James, Personnel Office, University of Wales, Bangor, Gwynedd LL57 2DG, UK. 
(Apologies to anyone who receives this posting through more than one list or newsgroup) From MUSICO%BGERUG51.BITNET at vma.CC.CMU.EDU Thu Aug 23 17:22:00 1990 From: MUSICO%BGERUG51.BITNET at vma.CC.CMU.EDU (MUSICO%BGERUG51.BITNET@vma.CC.CMU.EDU) Date: Thu, 23 Aug 90 17:22 N Subject: signoff Message-ID: signoff From HKF218%DJUKFA11.BITNET at vma.CC.CMU.EDU Fri Aug 24 12:08:15 1990 From: HKF218%DJUKFA11.BITNET at vma.CC.CMU.EDU (Gregory Kohring) Date: Fri, 24 Aug 90 12:08:15 MES Subject: Preprints Message-ID: The following preprint is currently available. -- Greg Kohring Performance Enhancement of Willshaw Type Networks through the use of Limit Cycles G.A. Kohring HLRZ an der KFA Julich (Supercomputing Center at the KFA Julich) Simulation results of a Willshaw type model for storing sparsely coded patterns are presented. It is suggested that random patterns can be stored in Willshaw type models by transforming them into a set of sparsely coded patterns and retrieving this set as a limit cycle. In this way, the number of steps needed to recall a pattern will be a function of the amount of information the pattern contains. A general algorithm for simulating neural networks with sparsely coded patterns is also discussed, and, on a fully connected network of N=36 864 neurons (1.4 billion couplings), it is shown to achieve effective updating speeds as high as 160 billion coupling evaluations per second on one Cray-YMP processor. ================================================================== Additionally, the following short review article is also available. It is aimed at graduate students in computational physics who need an overview of the neural network literature from a computational sciences viewpoint, as well as some simple programming hints in order to get started with their neural network studies. It will shortly appear in World Scientific's Internationl Journal of Modern Physics C: Compuational Physics. LARGE SCALE NEURAL NETWORK SIMULATIONS G.A. Kohring HLRZ an der KFA Julich (Supercomputing Center at the KFA Julich) The current state of large scale, numerical simulations of neural networks is reviewed. Hardware and software improvements make it likely that biological size networks, i.e., networks with more than $10^{10}$ couplings, can be simulated in the near future. Sample programs for the efficient simulation of a few simple models are presented as an aid to researchers just entering the field. Send Correspondence and request for preprints to: G.A. Kohring HLRZ an der KFA Julich Postfach 1913 D-5170 Julich, West Germany e-mail: hkf218 at djukfa11.bitnet Address after September 1, 1990: Institut fur Theoretische Physik Universitat zu Koln D-5000 Koln 41, West Germany From Connectionists-Request at CS.CMU.EDU Fri Aug 24 10:31:02 1990 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Fri, 24 Aug 90 10:31:02 EDT Subject: Quantitative Linguistics Conference Announcement Message-ID: <10643.651508262@B.GP.CS.CMU.EDU> First QUANTITATIVE LINGUISTICS CONFERENCE (QUALICO) September 23 - 27, 1991 University of Trier, Germany organized by the GLDV - Gesellschaft fuer Linguistische Datenverarbeitung (German Society for Linguistic Computing) and the Editors of "Quantitative Linguistics" OBJECTIVES QUALICO is being held for the first time as an International Conference to demonstrate the state of the art in Quantitative Linguistics. 
This domain of language study and research is gaining considerable interest due to recent advances in linguistic modelling, particularly in computational linguistics, cognitive science, and developments in mathematics like nonlinear systems theory. Progress in hardware and software technology, together with ease of access to data and numerical processing, has provided new means of empirical data acquisition and the application of mathematical models of adequate complexity. The German Society for Linguistic Computation (Gesellschaft fuer Linguistische Datenverarbeitung - GLDV) and the editors of 'Quantitative Linguistics' have taken the initiative in preparing this conference to take place at the University of Trier, in Trier (Germany), September 23rd - 27th, 1991. In view of the stimulating new developments in Europe and the academic world, the organizers' aim is to encourage and promote the mutual exchange of ideas in this field of interest, which has been limited in the past. Challenging advances in interdisciplinary quantitative analyses, numerical modelling and experimental simulations from different linguistic domains will be reported on by the following keynote speakers: Gabriel Altmann (Bochum), Michail V. Arapov (Moscow) (pending acceptance), Hans Goebl (Salzburg), Mildred L.G. Shaw (Calgary), John S. Nicolis (Patras), Stuart M. Shieber (Harvard) (pending acceptance). CALL FOR PAPERS The International Program Committee invites communications (long papers: 20 minutes plus 10; short papers: 15 minutes plus 5; demonstrations and posters) on basic research and development as well as on operational applications of Quantitative Linguistics, including - but not limited to - the following topics: A. Methodology 1. Theory Construction - 2. Measurement, Scaling - 3. Taxonomy, Categorizing - 4. Simulation - 5. Statistics, Probabilistic Models, Stochastic Processes - 6. Fuzzy Theory: Possibilistic Models - 7. Language and Grammar Formalisms - 8. Systems Theory: Cybernetics and Information Theory, Synergetics, New Connectionism B. Linguistic Analysis and Modelling 1. Phonetics - 2. Phonemics - 3. Morphology - 4. Syntax - 5. Semantics - 6. Pragmatics - 7. Lexicology - 8. Dialectology - 9. Typology - 10. Text and Discourse - 11. Semiotics C. Applications 1. Speech Recognition and Synthesis - 2. Text Analysis and Generation - 3. Language Acquisition and Teaching - 4. Text Understanding and Knowledge Representation Authors are asked to submit extended abstracts (1500 words; 4 copies) of their papers in one of the conference's working languages (German, English) not later than December 31, 1990 to: QUALICO - The Program Committee University of Trier P.O. Box 3825 D-5500 TRIER Germany uucp: qualico at utrurt.uucp or: ..!unido!utrurt!qualico X.400: qualico at ldv.rz.uni-trier.dbp.de or: Notice of acceptance will be given by March 31, 1991, and full versions of invited and accepted papers (camera-ready) are due by June 30, 1991, so that the Conference Proceedings can be published in time to be available to participants at the beginning of QUALICO. This 'Call for Papers' is distributed world-wide in order to reach researchers active in universities and industry. SOCIAL PROGRAMME The oldest city in Germany, founded in 16 B.C. 
by the Romans as Augusta Treverorum in the Mosel valley, is now situated in the westernmost region of Germany, near both the French and Luxembourg borders. In the center of Europe, this ancient city will host the participants of QUALICO at the University of Trier, surrounded by the vineyards of the Mosel-Saar-Ruwer wine district at the start of the grape harvest. The excursion day scheduled midway through the conference (September 25, 1991) will provide an opportunity to visit points of historical interest in the city and its vicinity during a boat trip on the Mosel river. PROGRAM COMMITTEE Chair: B.B. Rieger, University of Trier S. Embleton, University of York D. Gibbon, University of Bielefeld R. Grotjahn, University of Bochum J. Haller, IAI Saarbruecken P. Hellwig, University of Heidelberg E. Hopkins, University of Bochum J. Kindermann, GMD Bonn-St.Augustin U. Klenk, University of Goettingen R. Koehler, University of Trier J.P. Koester, University of Trier J. Krause, University of Regensburg W. Lehfeldt, University of Konstanz W. Lenders, University of Bonn C. Lischka, GMD Bonn-St.Augustin W. Matthaeus, University of Bochum R.G. Piotrowski, University of Leningrad D. Roesner, FAW Ulm G. Ruge, Siemens AG, Muenchen B. Schaeder, University of Siegen H. Schnelle, University of Bochum J. Sambor, University of Warsaw ORGANIZING COMMITTEE Chair: R. Koehler, University of Trier CONFERENCE FEES Early registration (paid before July 31, 1991): DM 300,- - Members of supporting organizations DM 250,- - Students (without Proceedings) DM 150,- Registration (paid after July 31, 1991): DM 400,- - Members of supporting organizations DM 350,- - Students (without Proceedings) DM 250,- From Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU Fri Aug 24 12:36:28 1990 From: Scott.Fahlman at SEF1.SLISP.CS.CMU.EDU (Scott.Fahlman@SEF1.SLISP.CS.CMU.EDU) Date: Fri, 24 Aug 90 12:36:28 EDT Subject: Quantitative Linguistics??? Message-ID: Perhaps the people who sent out this conference announcement could follow up with a *brief* description of what quantitative linguistics is all about, and why they are so excited about new advances in the area. I'm not familiar with the term, and the conference announcement didn't make clear how quantitative linguistics differs from older (qualitative?) linguistic models, except maybe that the key researchers are all in Europe. And what does quantitative linguistics have to do with connectionism? -- Scott Fahlman, Carnegie-Mellon University From bms at dcs.leeds.ac.uk Fri Aug 24 13:26:57 1990 From: bms at dcs.leeds.ac.uk (B M Smith) Date: Fri, 24 Aug 90 13:26:57 BST Subject: Item for Distribution Message-ID: <1511.9008241226@csuna6.dcs.leeds.ac.uk> FINAL CALL FOR PAPERS AISB'91 8th SSAISB CONFERENCE ON ARTIFICIAL INTELLIGENCE University of Leeds, UK 16-19 April, 1991 The Society for the Study of Artificial Intelligence and Simulation of Behaviour (SSAISB) will hold its eighth biennial conference at Bodington Hall, University of Leeds, from 16 to 19 April 1991. There will be a Tutorial Programme on 16 April followed by the full Technical Programme. The Programme Chair will be Luc Steels (AI Lab, Vrije Universiteit Brussel). 
Scope: Papers are sought in all areas of Artificial Intelligence and Simulation of Behaviour, but especially on the following AISB91 special themes: * Emergent functionality in autonomous agents * Neural networks and self-organisation * Constraint logic programming * Knowledge level expert systems research Papers may describe theoretical or practical work but should make a significant and original contribution to knowledge about the field of Artificial Intelligence. A prize of 500 pounds for the best paper has been offered by British Telecom Computing (Advanced Technology Group). It is expected that the proceedings will be published as a book. Submission: All submissions should be in hardcopy in letter quality print and should be written in 12 point or pica typewriter face on A4 or 8.5" x 11" paper, and should be no longer than 10 sides, single-spaced. Each paper should contain an abstract of not more than 200 words and a list of up to four keywords or phrases describing the content of the paper. Five copies should be submitted. Papers must be written in English. Authors should give an electronic mail address where possible. Submission of a paper implies that all authors have obtained all necessary clearances from the institution and that an author will attend the conference to present the paper if it is accepted. Papers should describe work that will be unpublished on the date of the conference. Dates: Deadline for Submission: 1 October 1990 Notification of Acceptance: 7 December 1990 Deadline for camera ready copy: 16 January 1991 Location: Bodington Hall is on the edge of Leeds, in 14 acres of private grounds. The city of Leeds is two and a half hours by rail from London, and there are frequent flights to Leeds/Bradford Airport from London Heathrow, Amsterdam and Paris. The Yorkshire Dales National Park is close by, and the historic city of York is only 30 minutes away by rail. Information: Papers and all queries regarding the programme should be sent to Judith Dennison. All other correspondence and queries regarding the conference should go to the Local Organiser, Barbara Smith. Ms. Judith Dennison, Cognitive Sciences, University of Sussex, Falmer, Brighton BN1 9QN, UK. Tel: (+44) 273 678379. Email: judithd at cogs.sussex.ac.uk. Dr. Barbara Smith, Division of AI, School of Computer Studies, University of Leeds, Leeds LS2 9JT, UK. Tel: (+44) 532 334627. FAX: (+44) 532 335468. Email: aisb91 at ai.leeds.ac.uk From sankar at caip.rutgers.edu Fri Aug 24 17:19:35 1990 From: sankar at caip.rutgers.edu (ananth sankar) Date: Fri, 24 Aug 90 17:19:35 EDT Subject: No subject Message-ID: <9008242119.AA06389@caip.rutgers.edu> Rutgers University CAIP Center CAIP Neural Network Workshop 15-17 October 1990 A neural network workshop will be held during 15-17 October 1990 in East Brunswick, New Jersey under the sponsorship of the CAIP Center of Rutgers University. The theme of the workshop will be "Theory and impact of Neural Networks on future technology". Leaders in the field from government, industry and academia will present the state-of-the-art theory and applications of neural networks. Attendance will be limited to about 100 participants. A Partial List of Speakers and Panelists includes: J. Alspector, Bellcore A. Barto, University of Massachusetts R. Brockett, Harvard University L. Cooper, Brown University J. Cowan, University of Chicago K. Fukushima, Osaka University D. Glasser, University of California, Berkeley S. Grossberg, Boston University R. Hecht-Nielsen, HNN, San Diego J. 
Hopfield, California Institute of Technology L. Jackel, AT&T Bell Labs. S. Kirkpatrick, IBM, T.J. Watson Research Center S. Kung, Princeton University F. Pineda, JPL, California Institute of Technology R. Linsker, IBM, T.J. Watson Research Center J. Moody, Yale University E. Sontag, Rutgers University H. Stark, Illinois Institute of Technology B. Widrow, Stanford University Y. Zeevi, CAIP Center, Rutgers University and The Technion, Israel The workshop will begin with registration at 8:30 AM on Monday, 15 October and end at 7:00 PM on Wednesday, 17 October. There will be dinners on Tuesday and Wednesday evenings followed by special-topic discussion sessions. The $395 registration fee ($295 for participants from CAIP member organizations) includes the cost of the dinners. Participants are expected to remain in attendance throughout the entire period of the workshop. Proceedings of the workshop will subsequently be published in book form. Individuals wishing to participate in the workshop should fill out the attached form and mail it to the address indicated. If there are any questions, please contact Prof. Richard Mammone Department of Electrical and Computer Engineering Rutgers University P.O. Box 909 Piscataway, NJ 08854 Telephone: (201)932-5554 Electronic Mail: mammone at caip.rutgers.edu FAX: (201)932-4775 Telex: 6502497820 mci Rutgers University CAIP Center CAIP Neural Network Workshop 15-17 October 1990 I would like to register for the Neural Network Workshop. Title:________ Last:_________________ First:_______________ Middle:__________ Affiliation _________________________________________________________ Address _________________________________________________________ ______________________________________________________ Business Telephone: (___)________ FAX:(___)________ Electronic Mail:_______________________ Home Telephone:(___)________ I am particularly interested in the following aspects of neural networks: _______________________________________________________________________ _______________________________________________________________________ Fee enclosed $_______ Please bill me $_______ Please complete the above and mail this form to: Neural Network Workshop CAIP Center, Rutgers University Brett and Bowser Roads P.O. Box 1390 Piscataway, NJ 08855-1390 (USA) 
From tgd at turing.CS.ORST.EDU Fri Aug 24 17:55:56 1990 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Fri, 24 Aug 90 14:55:56 PDT Subject: Human confusability of phonemes Message-ID: <9008242155.AA06954@turing.CS.ORST.EDU> I am conducting a comparison study of several learning algorithms on the nettalk task. To make the comparisons fair, I would like to be able to rate the severity of prediction errors made by these algorithms. For example, if the desired phoneme is /k/ (the k in "key") and the phoneme produced by the learned network is /e/ (the a in "late"), then this is a bad error. On the other hand, substituting /x/ (the a in "pirate") for /@/ (the a in "cab") should probably not count as much of an error. Can any readers point me to research that has been done on the confusability of different phonemes (i.e., to what extent human listeners can confuse two phonemes or reliably detect their difference)? Thanks, Tom Dietterich Thomas G. Dietterich Department of Computer Science Dearborn Hall, 306 Oregon State University Corvallis, OR 97331-3202 From schraudo%cs at ucsd.edu Fri Aug 24 18:18:46 1990 From: schraudo%cs at ucsd.edu (Nici Schraudolph) Date: Fri, 24 Aug 90 15:18:46 PDT Subject: TR announcement (hardcopy and ftp) Message-ID: <9008242218.AA14587@beowulf.ucsd.edu> The following technical report is now available in print: -------- Dynamic Parameter Encoding for Genetic Algorithms ------------------------------------------------- Nicol N. Schraudolph Richard K. 
Belew The selection of fixed binary gene representations for real-valued parameters of the phenotype required by Holland's genetic algorithm (GA) forces either the sacrifice of representational precision for efficiency of search or vice versa. Dynamic Parameter Encoding (DPE) is a mechanism that avoids this dilemma by using convergence statistics derived from the GA population to adaptively control the mapping from fixed-length binary genes to real values. By reducing the length of genes, DPE causes the GA to focus its search on the interactions between genes rather than the details of allele selection within individual genes. DPE also highlights the general importance of the problem of premature convergence in GAs, explored here through two convergence models. -------- To obtain a hardcopy, request technical report LAUR 90-2795 via e-mail from office%bromine at LANL.GOV, or via plain mail from Technical Report Requests CNLS, MS-B258 Los Alamos National Laboratory Los Alamos, NM 87545 USA -------- As previously announced, the report is also available in compressed PostScript format for anonymous ftp from the Artificial Life archive server. To obtain a copy, use the following procedure: $ ftp iuvax.cs.indiana.edu % (or 129.79.254.192) login: anonymous password: ftp> cd pub/alife/papers ftp> binary ftp> get schrau90-dpe.ps.Z ftp> quit $ uncompress schrau90-dpe.ps.Z $ lpr schrau90-dpe.ps -------- The DPE algorithm is an option in the GENESIS 1.1ucsd GA simulator, which will be ready for distribution (via anonymous ftp) shortly. Procedures for obtaining 1.1ucsd will then be announced on this mailing list. -------- Nici Schraudolph, C-014 nschraudolph at ucsd.edu University of California, San Diego nschraudolph at ucsd.bitnet La Jolla, CA 92093 ...!ucsd!nschraudolph From mikek at wasteheat.colorado.edu Mon Aug 27 19:42:44 1990 From: mikek at wasteheat.colorado.edu (Mike Kranzdorf) Date: Mon, 27 Aug 90 17:42:44 -0600 Subject: Mactivation - new info Message-ID: <9008272342.AA25683@wasteheat.colorado.edu> ***Please note new physical address*** Mactivation is an introductory neural network simulator which runs on all Macintoshes. A graphical interface provides direct access to units, connections, and patterns. Basic concepts of associative memory and network operation can be explored, with many low level parameters available for modification. Back-propagation is not supported. A user's manual containing an introduction to connectionist networks and program documentation is included on one 800K Macintosh disk. The current version is 3.3. Mactivation is available from the author, Mike Kranzdorf. The program may be freely copied, including for classroom distribution. To obtain a copy, send your name and address and a check payable to Mike Kranzdorf for $5 (US). International orders should send either an international postal money order for five dollars US or ten (10) international postal coupons. Mactivation 3.2 is available via anonymous ftp on boulder.colorado.edu. Please don't ask me how to deal with ftp - that's why I offer it via snail mail. I will probably post version 3.3 soon; it depends on some politics here. Mike Kranzdorf P.O. 
Box 1379 Nederland, CO 80466-1379 From mikek at wasteheat.colorado.edu Tue Aug 28 12:24:52 1990 From: mikek at wasteheat.colorado.edu (Mike Kranzdorf) Date: Tue, 28 Aug 90 10:24:52 -0600 Subject: Mactivation ftp location Message-ID: <9008281624.AA26266@wasteheat.colorado.edu> Sorry, I forgot to include the ftp specifics: Machine: boulder.colorado.edu Directory: /pub File Name: mactivation.3.2.sit.hqx.Z I really will try to put version 3.3 there soon. Please send me comments if you use Mactivation. I am very responsive to good suggestions and will add them when possible. Back-prop will come in version 4.0, but that's a complete re-write. I can add smaller things to 3.3. --mike From pako at neuronstar.it.lut.fi Thu Aug 30 05:05:47 1990 From: pako at neuronstar.it.lut.fi (Pasi Koikkalainen) Date: Thu, 30 Aug 90 12:05:47 +0300 Subject: ICANN International Conference on Artificial Neural Networks Message-ID: <9008300905.AA01460@neuronstar.it.lut.fi> ICANN-91 INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS Helsinki University of Technology Espoo, Finland, June 24-28, 1991 Conference Chair: Teuvo Kohonen (Finland) Program Chair: Igor Aleksander (England) Conference Committee: Bernard Angeniol (France), Eduardo Caianiello (Italy), Rolf Eckmiller (FRG), John Hertz (Denmark), Luc Steels (Belgium) CALL FOR PAPERS =================== THE CONFERENCE: =============== Theories, implementations, and applications of Artificial Neural Networks are progressing at a growing speed both in Europe and elsewhere. The first commercial hardware for neural circuits and systems is emerging. This conference will be a major international contact forum for experts from academia and industry worldwide. Around 1000 participants are expected. ACTIVITIES: =========== - Tutorials - Invited talks - Oral and poster sessions - Prototype demonstrations - Video presentations - Industrial exhibition ------------------------------------------------------------------------- Complete papers of at most 6 pages are invited for oral or poster presentation in one of the sessions given below: 1. Mathematical theories of networks and dynamical systems 2. Neural network architectures and algorithms (including organizations and comparative studies) 3. Artificial associative memories 4. Pattern recognition and signal processing (especially vision and speech) 5. Self-organization and vector quantization 6. Robotics and control 7. "Neural" knowledge data bases and non-rule-based decision making 8. Software development (design tools, parallel algorithms, and software packages) 9. Hardware implementations (coprocessors, VLSI, optical, and molecular) 10. Commercial and industrial applications 11. Biological and physiological connection (synaptic and cell functions, sensory and motor functions, and memory) 12. Neural models for cognitive science and high-level brain functions 13. Physics connection (thermodynamical models, spin glasses, and chaos) -------------------------------------------------------------------------- Deadline for submitting manuscripts is January 15, 1991. The Conference Proceedings will be published as a book by Elsevier Science Publishers B.V. Deadline for sending final papers on the special forms is March 15, 1991. For more information and instructions for submitting manuscripts, please contact: Prof. 
Olli Simula ICANN-91 Organization Chairman Helsinki University of Technology SF-02150 Espoo, Finland Fax: +358 0 451 3277 Telex: 125161 HTKK SF Email (internet): icann91 at hutmc.hut.fi --------------------------------------------------------------------------- In addition to the scientific program, several social occasions will be included in the registration fee. Pre- and post-conference tours and excursions will also be arranged. For more information about registration and accommodation, please contact: Congress Management Systems P.O.Box 151 SF-00141 Helsinki, Finland Tel.: +358 0 175 355 Fax: +358 0 170 122 Telex: 123585 CMS SF From uhr at cs.wisc.edu Thu Aug 30 12:30:30 1990 From: uhr at cs.wisc.edu (Leonard Uhr) Date: Thu, 30 Aug 90 11:30:30 -0500 Subject: Summary (long): pattern recognition comparisons Message-ID: <9008301630.AA10562@thor.cs.wisc.edu> A quick response to the responses to my comments on the gap between nets and computer vision (I've been out of town, and now trying to catch up on mail): I certainly wasn't suggesting that the number of input nodes matters, but simply that complex images must be resolved in enough detail to be recognizable. Gary Cottrell's 64x64 images may be adequate for faces (tho I suspect finer resolution is needed as more people are used, with many different expressions (much less rotations) for each). But the point is that complete connectivity from layer to layer needs O(N**2) links, and the fact that "a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of how complete connectivity dominates. Once the preprocessor is handled by the net itself it will either need too many links or have ad hoc structure. It's surely better to use partial connectivity (e.g., local - which is a very general assumption motivated by physical interactions and brain structure) than some inevitably ad hoc preprocessing steps of unknown value. Evaluation is tedious and unrewarding, but without it we simply can't make claims or compare systems. I'm not arguing against nets - to the contrary, I think that highly parallel nets are the only possibility for handling really hard problems like recognition, language handling, and reasoning. But they'll need much better structure (or the ability to evolve and generate needed structures). And I was asking for objective evidence that 3-layer feed-forward nets with links between all nodes in adjacent layers actually handle complex images better than some of the large and powerful computer vision systems. True - we know that in theory they can do anything. But that's no better than knowing that random search through the space of all Turing machine programs can do anything. Len Uhr From ahmad at ICSI.Berkeley.EDU Thu Aug 30 16:20:13 1990 From: ahmad at ICSI.Berkeley.EDU (Subutai Ahmad) Date: Thu, 30 Aug 90 13:20:13 PDT Subject: Summary (long): pattern recognition comparisons In-Reply-To: Leonard Uhr's message of Thu, 30 Aug 90 11:30:30 -0500 <9008301630.AA10562@thor.cs.wisc.edu> Message-ID: <9008302020.AA02846@icsib18.Berkeley.EDU> >But the point is that >complete connectivity from layer to layer needs O(N**2) links, and the fact that >"a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of >how complete connectivity dominates. Once the preprocessor is handled by the >net itself it will either need too many links or have ad hoc structure. 
>It's surely better to use partial connectivity (e.g., local - which is a very >general assumption motivated by physical interactions and brain structure) >than some inevitably ad hoc preprocessing steps of unknown value. Systems with selective attention mechanisms provide yet another way of avoiding the combinatorics. In these models, you can route relevant feature values from arbitrary locations in the image to a central processor. The big advantage is that the central processor can now be quite complex (possibly fully connected) since it only has to deal with a relatively small number of inputs. --Subutai Ahmad ahmad at icsi.berkeley.edu References: Koch, C. and Ullman, S. Shifts in Selective Attention: towards the underlying neural circuitry. Human Neurobiology, Vol 4:219-227, 1985. Ahmad, S. and Omohundro, S. Equilateral Triangles: A Challenge for Connectionist Vision. In Proceedings of the 12th Annual meeting of the Cognitive Science Society, MIT, 1990. Ahmad, S. and Omohundro, S. A Network for Extracting the Locations of Point Clusters Using Selective Attention, ICSI Tech Report No. TR-90-011, 1990. From kawahara at av-convex.ntt.jp Fri Aug 31 10:43:46 1990 From: kawahara at av-convex.ntt.jp (Hideki KAWAHARA) Date: Fri, 31 Aug 90 23:43:46+0900 Subject: JNNS'90 Program Summary (long) Message-ID: <9008311443.AA11611@av-convex.ntt.jp> The first annual conference of the Japan Neural Network Society (JNNS'90) will be held from 10 to 12 September, 1990. The following is the program summary and related information on JNNS. There are 2 invited presentations, 23 oral presentations and 53 poster presentations. Unfortunately, a list of the presentation titles in English is not available yet, because many authors didn't provide English titles for their presentations (the official languages for the proceedings were Japanese and English, but only two articles were written in English). I will try to compile the English list by the end of September and would like to share it here. If you have any questions or comments, please e-mail to the following address. (Please *DON'T REPLY*.) kawahara at nttlab.ntt.jp - ---------------------------------------------- Hideki Kawahara NTT Basic Research Laboratories 3-9-11, Midori-cho Musashino, Tokyo 180, JAPAN Tel: +81 422 59 2276, Fax: +81 422 59 3393 - ---------------------------------------------- JNNS'90 1990 Annual Conference of Japan Neural Network Society September 10-12, 1990 Tamagawa University, 6-1-1 Tamagawa-Gakuen Machida, Tokyo 194, Japan Program Summary Monday, 10 September 1990 12:00 Registration 13:00 - 16:00 Oral Session O1: Learning 16:00 - 18:00 Poster Session P1: Learning, Motion and Architecture 18:00 Organization Committee Tuesday, 11 September 1990 9:00 - 12:00 Oral Session O2: Motion and Architecture 13:00 - 13:30 Plenary Session 13:30 - 15:30 Invited Talks: "Brain Codes of Shapes: Experiments and Models" by Keiji Tanaka; "Theories: from 1980's to 1990's" by Shigeru Shinomoto 15:30 - 18:30 Oral Session O3: Vision I 19:00 Reception Wednesday, 12 September 1990 9:00 - 12:00 Oral Session O4: Vision II, Time Series and Dynamics 13:00 - 15:00 Poster Session P2: Vision I, II, Time Series and Dynamics 15:00 - 16:45 Oral Session O5: Dynamics Room 450 is for the Oral Sessions, Plenary Session and Invited Talks. Rooms 322, 323, 324, 325 and 350 are for the Poster Sessions. 
Registration Fees for Conference Members 5000 yen Student members 3000 yen Otherwise 8000 yen Reception 19:00 Tuesday, 11 September 1990 Sakufuu-building Fee: 5000 yen JNNS Officers and Governing Board Kunihiko Fukushima Osaka University President Shun-ichi Amari University of Tokyo International Affair Secretary Minoru Tsukada Tamagawa University Takashi Nagano Hosei University Publication Shiro Usui Toyohashi University of Technology Yoichi Okabe University of Tokyo Sei Miyake NHK Science and Technical Research Labs. Planning Yuichiro Anzai Keio University Keisuke Toyama Kyoto Prefectural School of Medicine Nozomu Hoshimiya Tohoku University Treasurer Naohiro Ishii Nagoya Institute of Technology Hideaki Saito Tamagawa University Regional Affair Ken-ichi Hara Yamagata University Hiroshi Yagi Toyama University Eiji Yodogawa ATR Syozo Yasui Kyushu Institute of Technology Supervisor Noboru Sugie Nagoya University Committee members Editorial Committee (Newsletter and mailing list) Takashi Omori Tokyo University of Agriculture and Technology Hideki Kawahara NTT Basic Research Labs. Itirou Tsuda Kyushu Institute of Technology Planning Committee Kazuyuki Aihara Tokyo Denki University Shigeru Shinomoto Kyoto University Keiji Tanaka The Institute of Physical and Chemical Research JNNS'90 Conference Organizing Committee Sei Miyake NHK Science and Technical Research Labs. General Chairman Keiji Tanaka The Institute of Physical and Chemical Research Program Chairman Shigeru Shinomoto Kyoto University Publicity Chairman Program Takayuki Ito NHK Science and Technical Research Labs. Takashi Omori Tokyo University of Agriculture and Technology Koji Kurata Osaka University Kenji Doya University of Tokyo Kazuhisa Niki Electrotechnical Laboratory Ryoko Futami Tohoku University Publicity Kazunari Nakane ATR Publication Hideki Kawahara NTT Basic Research Labs. Mahito Fujii NHK Science and Technical Research Labs. Treasurer Shin-ichi Kita University of Tokyo Manabu Sakakibara Toyohashi University of Technology Local Arrangement Shigeru Tanaka Fundamental Research Labs., NEC Makoto Mizuno Tamagawa University For more details, please contact: Japan Neural Network Society Office Faculty of Engineering, Tamagawa University 6-1-1 Tamagawa-Gakuen Machida, Tokyo 194, Japan Telephone: +81 427 28 3457 Facsimile: +81 427 28 3597