From jbower at cns.caltech.edu Thu Jan 2 17:59:02 1992 From: jbower at cns.caltech.edu (Jim Bower) Date: Thu, 2 Jan 92 14:59:02 PST Subject: CNS*92 submissions Message-ID: <9201022259.AA16423@cns.caltech.edu> Submissions to CNS*92 To those of you who are preparing submissions for the first annual Computation and Neural Systems Meeting, July 26 - 31, San Francisco, California: we encourage you to submit materials via email. You can send your 100 word abstract and 1000 word summary to: cp at cns.caltech.edu You can also use this address to obtain additional information including conference registration forms. Jim Bower Program Chair CNS*92 From mosfet at mcc.com Fri Jan 3 09:57:21 1992 From: mosfet at mcc.com (Mosfeq Rashid) Date: Fri, 3 Jan 92 08:57:21 CST Subject: CNS*92 submissions In-Reply-To: `CNS*92 submissions' (2,<9201022259.AA16423@cns.caltech.edu>) by Jim Bower Message-ID: <9201031457.AA23305@avarice.aca.mcc.com> From schraudo at cs.UCSD.EDU Fri Jan 3 19:40:33 1992 From: schraudo at cs.UCSD.EDU (Nici Schraudolph) Date: Fri, 3 Jan 92 16:40:33 PST Subject: a hint for NIPS-4 authors using LaTeX Message-ID: <9201040040.AA02740@beowulf.ucsd.edu> [This message is only relevant for NIPS authors using LaTeX. My apologies for sending it to all connectionists, but I do feel that the benefit to some outweighs the slight inconvenience to many in this case.] The formatting instructions for NIPS-4 authors call for a third level heading for the reference section; however, both the \bibliography and \thebibliography macros produce a first level heading. Here's a quick fix for this: the line \renewcommand{\section}[2]{\subsubsection*{#2}} inserted right before the \bibliography or \thebibliography command will generate the correct format. Since the references are the last section of the paper, we can get away with such an ugly hack... Best regards, -- Nicol N. Schraudolph, CSE Dept. | work (619) 453-4364 | nici at cs.ucsd.edu Univ. of California, San Diego | FAX (619) 534-7029 | nici%cs at ucsd.bitnet La Jolla, CA 92093-0114, U.S.A. | home (619) 273-5261 | ...!ucsd!cs!nici From rkc at xn.ll.mit.edu Mon Jan 6 11:49:45 1992 From: rkc at xn.ll.mit.edu (rkc@xn.ll.mit.edu) Date: Mon, 6 Jan 92 11:49:45 EST Subject: [mike@psych.ualberta.ca: Connectionism & Motion] In-Reply-To: "Mike R. W. Dawson"'s message of Thu, 19 Dec 1991 20:27:42 -0700 Message-ID: <9201061649.AA06212@tremor.ll.mit.edu> Can I get a reprint of: Dawson, M.R.W. (1991). The how and why of what went where in apparent motion: Modeling solutions to the motion correspondence problem. Psychological Review, 98(4), 569-603. with any additional technical reports you've written on the matter? -Rob From ken at cns.caltech.edu Mon Jan 6 12:14:59 1992 From: ken at cns.caltech.edu (Ken Miller) Date: Mon, 6 Jan 92 09:14:59 PST Subject: summer program in Mathematical Physiology: update Message-ID: <9201061714.AA20536@cns.caltech.edu> I'm writing with respect to my previous posting on the summer program in Mathematical Physiology at MSRI (Mathematical Sciences Research Institute, Berkeley, CA). The first two weeks of this program, July 6-17, are on "Neurons in Networks", as described in that posting. The new information is: (1) The application deadline has been pushed back to Feb. 1 (2) Applications may be sent by e-mail. Send applications to abaxter at msri.org, and address the correspondence to Nancy Kopell and Michael Reed (3) All expenses will be covered for those who attend, *except*: there is a $450 limit on travel expenses.
So, those wishing to attend from overseas should indicate in their application whether they will be able to attend with that limit on travel support. (4) As mentioned before, women and minorities are encouraged to apply. If you *are* a member of a minority, or if you are a woman and it might not be obvious to us from your name, please be sure to note this in your application. I'll repeat here the basic information about applications: To apply to participate and for financial support or to obtain more information about the topics of the workshops, please write to: Nancy Kopell and Michael Reed Summer Program in Mathematical Physiology MSRI 1000 Centennial Drive Berkeley, CA 94720 Applicants should state clearly whether they wish to be long term participants or workshop participants and which workshops they wish to attend. Students should send a letter explaining their background and interests and arrange for one letter of recommendation to be sent. Researchers should indicate their interest and experience in mathematical biology and include a current vita and bibliography. Women and minorities are encouraged to apply. Ken From jim at gdstech.grumman.com Tue Jan 7 10:50:27 1992 From: jim at gdstech.grumman.com (Jim Eilbert) Date: Tue, 7 Jan 92 10:50:27 EST Subject: Mailing List Message-ID: <9201071550.AA20642@gdstech.grumman.com> Would you please add me to the Connectionist mailing list. Thanks. Jim Eilbert M/S A02-26 Grumman CRC Bethpage, NY 11714 516-575-4909 From ingber at umiacs.UMD.EDU Wed Jan 8 11:15:21 1992 From: ingber at umiacs.UMD.EDU (Lester Ingber) Date: Wed, 8 Jan 1992 11:15:21 EST Subject: Generic mesoscopic neural networks ... neocortical interactions Message-ID: <9201081615.AA01582@dweezil.umiacs.UMD.EDU> "Generic mesoscopic neural networks based on statistical mechanics of neocortical interactions," previously placed in the pub/neuroprose archive (anonymous ftp to archive.cis.ohio-state.edu [128.146.8.52]) as ingber.mnn.ps.Z, has been accepted for publication as a Rapid Communications in Physical Review A. For awhile, most-current drafts of this preprint and some related papers may be obtained by anonymous ftp to ftp.umiacs.umd.edu [128.8.120.23] in the directory pub/ingber. (Remember to set "binary" at the ">" prompt after logging in.) If you do not have access to ftp, send me an email request, and I'll send you a uuencoded-compressed PostScript file. Sorry, but I cannot take on the task of mailing out hardcopies of this paper. ------------------------------------------ | Prof. Lester Ingber | | ______________________ | | Science Transfer Corporation | | P.O. Box 857 703-759-2769 | | McLean, VA 22101 ingber at umiacs.umd.edu | ------------------------------------------ From rwl at bend.UCSD.EDU Wed Jan 8 16:17:48 1992 From: rwl at bend.UCSD.EDU (Ron Langacker) Date: Wed, 8 Jan 92 13:17:48 PST Subject: Mailing List Message-ID: <9201082117.AA28894@bend.UCSD.EDU> Please remove my name from the connectionist mailing list. Thank you. Ron Langacker UCSD From j_bonnet at inescn.pt Thu Jan 9 09:11:00 1992 From: j_bonnet at inescn.pt (Jose' M. Bonnet) Date: 9 Jan 92 9:11 Subject: Mailing List Message-ID: <104*j_bonnet@inescn.pt> Would you please add me to the Connectionist mailing list. Thanks. 
Jose Bonnet INESC-Porto, Largo Monpilher, 22 4000 PORTO PORTUGAL From marshall at cs.unc.edu Thu Jan 9 14:31:05 1992 From: marshall at cs.unc.edu (Jonathan Marshall) Date: Thu, 9 Jan 92 14:31:05 -0500 Subject: Paper available: Neural mechanisms for steering in visual motion Message-ID: <9201091931.AA19960@marshall.cs.unc.edu> The following paper is available via ftp from the neuroprose archive at Ohio State (instructions for retrieval follow the abstract). ---------------------------------------------------------------------- Challenges of Vision Theory: Self-Organization of Neural Mechanisms for Stable Steering of Object-Grouping Data in Visual Motion Perception Jonathan A. Marshall Department of Computer Science, CB 3175, Sitterson Hall University of North Carolina, Chapel Hill, NC 27599-3175, U.S.A. 919-962-1887, marshall at cs.unc.edu Invited paper, in Stochastic and Neural Methods in Signal Processing, Image Processing, and Computer Vision, Su-Shing Chen, Ed., Proceedings of the SPIE 1569, San Diego, July 1991, pp.200-215. ---------------------------------------------------------------------- ABSTRACT Psychophysical studies on motion perception suggest that human visual systems perform certain nonlocal operations. In some cases, data about one part of an image can influence the processing or perception of data about another part of the image, across a long spatial range. In others, data about nearby parts of an image can fail to influence one another strongly, despite their proximity. Several types of nonlocal interaction may underlie cortical processing for accurate, stable perception of visual motion, depth, and form: o trajectory-specific propagation of computed moving stimulus information to successive image locations where a stimulus is predicted to appear; o grouping operations (establishing linkages among perceptually related data); o scission operations (breaking linkages between unrelated data); and o steering operations, whereby visible portions of a visual group or object can control the representations of invisible or occluded portions of the same group. Nonlocal interactions like these could be mediated by long-range excitatory horizontal intrinsic connections (LEHICs), discovered in visual cortex of several animal species. LEHICs often span great distances across cortical image space. Typically, they have been found to interconnect regions of like specificity with regard to certain receptive field attributes, e.g., stimulus orientation. It has recently been shown that several visual processing mechanisms can self-organize in model recurrent neural networks using unsupervised "EXIN" (excitatory+inhibitory) learning rules. Because the same rules are used in each case, EXIN networks provide a means to unify explanations of how different visual processing modules acquire their structure and function. EXIN networks learn to multiplex (or represent simultaneously) multiple spatially overlapping components of complex scenes, in a context-sensitive fashion. Modeled LEHICs have been used together with the EXIN learning rules to show how visual experience can shape neural mechanisms for nonlocal, context-sensitive processing of visual motion data. 
---------------------------------------------------------------------- To get a copy of the paper, do the following: unix> ftp archive.cis.ohio-state.edu login: anonymous password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get marshall.steering.ps.Z ftp> quit unix> uncompress marshall.steering.ps.Z unix> lpr marshall.steering.ps If you have trouble printing the file on a Postscript-compatible printer, send me e-mail (marshall at cs.unc.edu) with your postal address, and I'll have a hardcopy mailed to you (may take several weeks for delivery, though). ---------------------------------------------------------------------- From flann at nick.cs.usu.edu Fri Jan 10 10:50:58 1992 From: flann at nick.cs.usu.edu (flann@nick.cs.usu.edu) Date: Fri, 10 Jan 92 08:50:58 -0700 Subject: NN package for IBM PC's Message-ID: <9201101550.AA04877@nick.cs.usu.edu> If any of you know of a public domain NN package that runs on an IBM PC (or equivalent) please let me know. Nick Flann, flann at nick.cs.usu.edu From harnad at Princeton.EDU Fri Jan 10 12:28:55 1992 From: harnad at Princeton.EDU (Stevan Harnad) Date: Fri, 10 Jan 92 12:28:55 EST Subject: Movement Systems: BBS Special Call for Commentators Message-ID: <9201101728.AA25367@clarity.Princeton.EDU> Below are the abstracts of 8 forthcoming target articles for a special issue on Movement Systems that will appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. This will be the first in a new series called "Controversies in Neuroscience," done in collaboration with Paul Cordo and the RS Dow Neurological Science Institute. Commentators must be current BBS Associates or nominated by a current BBS Associate. To be considered as a commentator on any of these articles, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at clarity.princeton.edu or harnad at pucc.bitnet or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] Please specify which article or articles you would like to comment on. (Commentators will be allotted 1000 words to comment on one of the articles, 750 words more to comment on two of them, 500 more for three and then 250 more for each additional one, for a maximum of 3500 words to comment on all eight target articles.) To help us put together a balanced list of commentators, please give some indication of the aspects of the topic on which you would bring your areas of expertise to bear if you were selected as a commentator. In the next week or so, electronic drafts of the full text of each article will be available for inspection by anonymous ftp according to the instructions that follow after the abstracts. These drafts are for inspection only; please do not prepare a commentary until you are formally invited to do so. ____________________________________________________________________ 1. Alexander GE, MR De Long, & MD Crutcher: DO CORTICAL AND BASAL GANGLIONIC MOTOR AREAS USE "MOTOR PROGRAMS" TO CONTROL MOVEMENT? bbs.alexander 2. Bizzi E, N Hogan, FA Mussa-Ivaldi & S Giszter: DOES THE NERVOUS SYSTEM USE EQUILIBRIUM-POINT CONTROL TO GUIDE SINGLE AND MULTIPLE JOINT MOVEMENTS? bbs.bizzi 3. Bloedel JR: DOES THE ONE-STRUCTURE/ONE-FUNCTION RULE APPLY TO THE CEREBELLUM? bbs.bloedel 4. Fetz EH: ARE MOVEMENT PARAMETERS RECOGNIZABLY CODED IN SINGLE NEURON ACTIVITY? bbs.fetz 5.
Gandevia SC & D Burke: DOES THE NERVOUS SYSTEM DEPEND ON KINESTHETIC INFORMATION TO CONTROL NATURAL LIMB MOVEMENTS? bbs.gandevia 6. McCrea DA: CAN SENSE BE MADE OF SPINAL INTERNEURON CIRCUITS? bbs.mccrea 7. Robinson DA: IMPLICATIONS OF NEURAL NETWORKS FOR HOW WE THINK ABOUT BRAIN FUNCTION bbs.robinson 8. Stein JF: POSTERIOR PARIETAL CORTEX AND EGOCENTRIC SPACE bbs.stein ---------------------------------------------------------------- 1. DO CORTICAL AND BASAL GANGLIONIC MOTOR AREAS USE "MOTOR PROGRAMS" TO CONTROL MOVEMENT? Garrett E. Alexander, Mahlon R. De Long, and Michael D. Crutcher Department of Neurology Emory University School of Medicine Atlanta, GA 30322 gea at vax3200.neuro.emory.edu KEYWORDS: basal ganglia, cortex, motor system, motor program, motor control, parallel processing, connectionism, neural network ABSTRACT: Prevailing engineering-inspired theories of motor control based on sequential/algorithmic or motor programming models are difficult to reconcile with what is known about the anatomy and physiology of the motor areas. This is partly because of certain problems with the theories themselves and partly because of features of the cortical and basal ganglionic motor circuits that seem ill-suited for most engineering analyses of motor control. Recent developments in computational neuroscience offer more realistic connectionist models of motor processing. The distributed, highly parallel, and nonalgorithmic processes in these models are inherently self-organizing and hence more plausible biologically than their more traditional algorithmic or motor-programming counterparts. The newer models also have the potential to explain some of the unique features of natural, brain-based motor behavior and to avoid some of the computational dilemmas associated with engineering approaches. ------------------------------------------------------------------- 2. DOES THE NERVOUS SYSTEM USE EQUILIBRIUM-POINT CONTROL TO GUIDE SINGLE AND MULTIPLE JOINT MOVEMENTS? E. Bizzi, N. Hogan, F.A. Mussa-Ivaldi and S. Giszter Department of Brain and Cognitive Sciences and Department of Mechanical Engineering Massachusetts Institute of Technology Cambridge, MA 02139 emilio at wheaties.ai.mit.edu KEYWORDS: spinal cord, force field, equilibrium point, microstimulation, multi-joint coordination, contact tasks, robotics, inverse dynamics, motor control. ABSTRACT: The hypothesis that the central nervous system (CNS) generates movement as a shift of the limb's equilibrium posture has been corroborated experimentally in single- and multi-joint motions. Posture may be controlled through the choice of muscle length tension curves that set agonist-antagonist torque-angle curves determining an equilibrium position for the limb and the stiffness about the joints. Arm trajectories seem to be generated through a control signal defining a series of equilibrium postures. The equilibrium-point hypothesis drastically simplifies the requisite computations for multijoint movements and mechanical interactions with complex dynamic objects in the environment. Because the neuromuscular system is springlike, the instantaneous difference between the arm's actual position and the equilibrium position specified by the neural activity can generate the requisite torques, avoiding the complex "inverse dynamic" problem of computing the torques at the joints.
The hypothesis provides a simple unified description of posture and movement as well as performance on contact control tasks, in which the limb must exert force stably and do work on objects in the environment. The latter is a surprisingly difficult problem, as robotic experience has shown. The prior evidence for the hypothesis came mainly from psychophysical and behavioral experiments. Our recent work has shown that microstimulation of the spinal cord's premotoneuronal network produces leg movements to various positions in the frog's motor space. The hypothesis can now be investigated in the neurophysiological machinery of the spinal cord. -------------------------------------------------------------------- 3. DOES THE ONE-STRUCTURE/ONE-FUNCTION RULE APPLY TO THE CEREBELLUM? James R. Bloedel Division of Neurobiology Barrow Neurological Institute Phoenix, AZ KEYWORDS: cerebellum; Purkinje cells; mossy fibres; movement; proprioception; body image; kinesthesis; robotics; posture. ABSTRACT: The premise explored in this target article is that the function of the cerebellum can be best understood in terms of the operation it performs across its structurally homogeneous subdivisions. The functional heterogeneity sometimes ascribed to these different regions reflects the many functions of the central targets receiving the outputs of different cerebellar regions. Recent studies by ourselves and others suggest that the functional unit of the cerebellum is its sagittal zone. It is hypothesized that the climbing fiber system produces a short-lasting modification in the gain of Purkinje cell responses to its other principal afferent input, the mossy fiber-granule cell-parallel fiber system. Because the climbing fiber inputs to sagittally aligned Purkinje cells can be activated under functionally specific conditions, they could select populations of Purkinje neurons that were most highly modulated by the distributed mossy fiber inputs responding to the same conditions. These operations may be critical for the on-line integration of inputs representing external target space with features of intended movement, proprioceptive and kinesthetic cues, and body image. ----------------------------------------------------------------- 4. ARE MOVEMENT PARAMETERS RECOGNIZABLY CODED IN SINGLE NEURON ACTIVITY? Eberhard E. Fetz Regional Primate Research Center University of Washington Seattle, WA 98195 fetz at locke.hs.washington.edu KEYWORDS: neural coding; representation; neural networks; cross-correlation; movement parameters; parallel distributed processing ABSTRACT: To investigate neural mechanisms of movement, physiologists have analyzed the activity of task-related neurons in behaving animals. The relative onset latencies of neural activity have been scrutinized for evidence of a functional hierarchy of sequentially recruited centers, but activity appears to change largely in parallel. Neurons whose activity covaries with movement parameters have been sought for evidence of explicit coding of parameters such as active force, limb displacement and behavioral set. Neurons with recognizable relations to the task are typically selected from a larger population, ignoring unmodulated cells as well as cells whose activity is not related to the task in a simple, easily recognized way.
Selective interpretations are also used to support the notion that different motor regions perform different motor functions; again, current evidence suggests that units with similar properties are widely distributed over different regions. These coding issues are re-examined for premotoneuronal (PreM) cells, whose correlational links with motoneurons are revealed by spike-triggered averages. PreM cells are recruited over long times relative to their target muscles. They show diverse response patterns relative to the muscle force they produce; functionally disparate PreM cells such as afferent fibers and descending corticomotoneuronal and rubromotoneuronal cells can exhibit similar patterns. Neural mechanisms have been further elucidated by neural network simulations of sensorimotor behavior; the pre-output hidden units typically show diverse responses relative to their targets. Thus, studies in which both the activity and the connectivity of the same units are known reveal that units with many kinds of relations to the task, simple and complex, contribute significantly to the output. This suggests that the search for explicit coding may be diverting us from understanding more distributed neural mechanisms that are more complex and less directly related to explicit movement parameters. ------------------------------------------------------------------ 5. DOES THE NERVOUS SYSTEM DEPEND ON KINESTHETIC INFORMATION TO CONTROL NATURAL LIMB MOVEMENTS? S.C. Gandevia and David Burke Department of Clinical Neurophysiology Institute of Neurological Sciences The Prince Henry Hospital P.O. Box 233 Matraville, N.S.W. 2036 Sydney, Australia KEYWORDS: kinesthesia, motor control, muscle, joint and cutaneous afferents, motor commands, deafferentation ABSTRACT: This target article draws together two groups of experimental studies on the control of human movement through peripheral feedback and centrally generated signals of motor command. First, during natural movement, feedback from muscle, joint and cutaneous afferents changes; in human subjects these changes have reflexive and kinesthetic consequences. Recent psychophysical and microneurographic evidence suggests that joint and even cutaneous afferents may have a proprioceptive role. Second, the role of centrally generated motor commands in the control of normal movements and movements following acute and chronic deafferentation is reviewed. There is increasing evidence that subjects can perceive their motor commands under various conditions, but this is inadequate for normal movement; deficits in motor performance arise when the reliance on proprioceptive feedback is abolished, either experimentally or because of pathology. During natural movement, the CNS appears to have access to functionally useful input from a range of receptors as well as from internally generated command signals. Remaining unanswered questions suggest a number of avenues for further research. ------------------------------------------------------------------ 6. CAN SENSE BE MADE OF SPINAL INTERNEURON CIRCUITS? David A. McCrea The Department of Physiology Faculty of Medicine University of Manitoba 770 Bannatyne Avenue Winnipeg, Manitoba, Canada R3E OW3 dave at scrc.umanitoba.ca KEYWORDS: interneuron, motor control, reflexes, spinal cord, flexion, muscle synergy, presynaptic inhibition. ABSTRACT: It is increasingly clear that spinal reflex systems cannot be described in terms of simple and constant reflex actions.
The extensive convergence of segmental and descending systems onto spinal interneurons suggests that spinal interneurons are not relay systems but rather form a crucial component in determining which muscles are activated during voluntary and reflex movements. The notion that descending systems simply modulate the gain of spinal interneuronal pathways has been tempered by the observation that spinal interneurons gate and distribute descending control to specific motoneurons. Spinal systems are complex, but current approaches will continue to provide insight into motor systems. During movement, several neural mechanisms act to reduce the functional complexity of motor systems by inhibiting some of the parallel reflex pathways available to segmental afferents and descending systems. The flexion reflex system is discussed as an example of the flexibility of spinal interneuron systems and as a useful construct. Examples are provided of the kinds of experiments that can be developed using current approaches to spinal interneuronal systems. -------------------------------------------------------------------- 7. IMPLICATIONS OF NEURAL NETWORKS FOR HOW WE THINK ABOUT BRAIN FUNCTION David A. Robinson Ophthalmology, Biomedical Engineering, and Neuroscience The Johns Hopkins University, School of Medicine Room 355 Woods Res. Bldg. The Wilmer Institute Baltimore, MD 21205 KEYWORDS: Neural networks, signal processing, oculomotor system, vestibulo-ocular reflex, pursuit eye movements, saccadic eye movements, coordinate transformations ABSTRACT: Engineers use neural networks to control systems too complex for conventional engineering analysis. To examine hidden unit behavior would defeat the purpose of this approach, because individual units would be largely uninterpretable. Yet neurophysiologists spend their careers doing just that! Hidden units contain bits and pieces of signals that yield only arcane hints of network function and no information about how the units process signals. Most of the literature on single-unit recordings attests to this grim fact. On the other hand, knowing system function and describing it with elegant mathematics tells one very little about what to expect of interneuron behavior. Examples of simple networks based on neurophysiology are taken from the oculomotor literature to suggest how single-unit interpretability might degrade with increasing task complexity. Trying to explain how any real neural network works on a cell-by-cell, reductionist basis is futile; we may have to be content with understanding the brain at higher levels of organization. ------------------------------------------------------------------- 8. POSTERIOR PARIETAL CORTEX AND EGOCENTRIC SPACE J.F. Stein University Laboratory of Physiology University of Oxford Oxford, England OX1 3PT stein at vax.oxford.ac.uk KEYWORDS: posterior parietal cortex; egocentric space; space perception; attention; coordinate transformations; distributed systems; neural networks. ABSTRACT: The posterior parietal cortex (PPC) is the most likely site where egocentric spatial relationships are represented in the brain. PPC cells receive visual, auditory, somaesthetic and vestibular sensory inputs, oculomotor, head, limb and body motor signals, and strong motivational projections from the limbic system. Their discharge increases not only when an animal moves towards a sensory target, but also when it directs its attention to it. PPC lesions have the opposite effect: sensory inattention and neglect.
PPC does not seem to contain a "map" of the location of objects in space but a distributed neural network for transforming one set of sensory vectors into other sensory reference frames or into various motor coordinate systems. Which set of transformation rules is used probably depends on attention, which selectively enhances the synapses needed for making a particular sensory comparison or aiming a particular movement. -------------------------------------------------------------- To help you decide whether you would be an appropriate commentator for any of these articles, a (nonfinal) draft of each will soon be retrievable by anonymous ftp from princeton.edu according to the instructions below (filenames will be of the form bbs.alexander, based on the name of the first author). Please do not prepare a commentary on this draft. Just let us know, after having inspected it, what relevant expertise you feel you would bring to bear on what aspect of the article. --------------------------------------------------------------- To retrieve a file by ftp from a Unix/Internet site, type either: ftp princeton.edu or ftp 128.112.128.1 When you are asked for your login, type: anonymous For your password, type your real name. Then change directories with: cd pub/harnad To show the available files, type: ls Next, retrieve the file you want with (for example): get bbs.alexander When you have the file(s) you want, type: quit JANET users can use the Internet file transfer utility at JANET node UK.AC.FT-RELAY to get BBS files. Use standard file transfer, setting the site to be UK.AC.FT-RELAY, the userid as anonymous at edu.princeton, the password as your own userid, and the remote filename to be the filename according to Unix conventions (e.g. pub/harnad/bbs.article). Lower case should be used where indicated, using quotes if necessary to avoid automatic translation into upper case. --------------------------------------------------------------- The above cannot be done from Bitnet directly, but there is a fileserver called bitftp at pucc.bitnet that will do it for you. Send it the one-line message "help" for instructions (which will be similar to the above, but will be in the form of a series of lines in an email message that bitftp will then execute for you). From andycl at syma.sussex.ac.uk Fri Jan 10 11:22:57 1992 From: andycl at syma.sussex.ac.uk (Andy Clark) Date: Fri, 10 Jan 92 16:22:57 GMT Subject: No subject Message-ID: <14634.9201101622@syma.sussex.ac.uk> bcc: andycl at cogs re: MA in Philosophy of Cognitive Science at Sussex University UNIVERSITY OF SUSSEX, BRIGHTON, ENGLAND SCHOOL OF COGNITIVE AND COMPUTING SCIENCES M.A. in the PHILOSOPHY OF COGNITIVE SCIENCE This is a one-year course which aims to foster the study of foundational issues in Cognitive Science and Computer Modelling. It is designed for students with a background in Philosophy although offers may be made to exceptional students whose background is in some other discipline related to Cognitive Science. Students would combine work towards a 20,000 word philosophy dissertation with subsidiary courses concerning aspects of A.I. and the other Cognitive Sciences. General Information. The course is based in the School of Cognitive and Computing Sciences. The School provides a highly active and interdisciplinary environment involving linguists, cognitive psychologists, philosophers and A.I. researchers.
The kinds of work undertaken in the school range from highly practical applications of new ideas in computing to the most abstract philosophical issues concerning the foundations of cognitive science. The school attracts a large number of research fellows and distinguished academic visitors, and interdisciplinary dialogue is encouraged by several weekly research seminars. Course Structure of the MA in Philosophy of Cognitive Science TERM 1 Compulsory Course: Philosophy of Cognitive Science Topic: The Representational Theory of Mind: From Fodor to Connectionism. and one out of: Artificial Intelligence Programming (Part I) Knowledge Representation Natural Language Syntax Psychology I Computer Science I Modern Analytic Philosophy (1) Modern European Philosophy (1) Artificial Intelligence and Creativity TERM 2 Compulsory Course: Philosophy of Cognitive Science (II) Topic: Code, Concept and Process: Philosophy, Neuropsychology and A.I. and one out of: Artificial Intelligence Programming (Part II) Natural Language Processing Computer Vision Neural Networks Intelligent Tutoring Systems Psychology II Computer Science II Social Implications of AI Modern Analytic Philosophy (2) Modern European Philosophy (2) TERM 3 Supervised work for the Philosophy of Cognitive Science dissertation (20,000 words) Courses are taught by one-hour lectures, two-hour seminars and one-hour tutorials. Choice of options is determined by student preference and content of first degree. Not all options will always be available and new options may be added according to faculty interests. CURRENT TEACHING FACULTY for the MA Dr A. Clark Philosophy of Cognitive Science I and II Mr R.Dammann Recent European Philosophy Dr M.Morris Recent Analytic Philosophy Dr S Wood and Mr R Lutz AI Programming I Dr B Katz Knowledge Representation Neural Networks Dr N Yuill Psychology I Dr M. Scaife Psychology II Prof M Boden Artificial Intelligence and Creativity Social Implications of AI Dr L Trask Natural Language Syntax & Semantics Dr S Easterbrook Computer Science I & II Dr D Weir Logics for Artificial Intelligence Dr D Young Computer Vision Dr B Keller Natural Language Processing Dr Y Rogers & Prof B du Boulay Intelligent Tutoring Systems ENTRANCE REQUIREMENTS These will be flexible. A first degree in Philosophy or one of the Cognitive Sciences would be the usual minimum requirement. FUNDING U.K. students may apply for British Academy funding for this course in the usual manner. Overseas students would need to be funded by home bodies. CONTACT For an application form, or further information, please write to Dr Allen Stoughton at the School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton BN1 9QH, or phone him on (0273) 606755 ext. 2882, or email - allen at cogs.sussex.ac.uk. From SCCS6082%IRUCCVAX.UCC.IE at BITNET.CC.CMU.EDU Mon Jan 13 04:46:00 1992 From: SCCS6082%IRUCCVAX.UCC.IE at BITNET.CC.CMU.EDU (SCCS6082%IRUCCVAX.UCC.IE@BITNET.CC.CMU.EDU) Date: Mon, 13 Jan 1992 09:46 GMT Subject: No subject Message-ID: <01GF9HC0EGGW0006LI@IRUCCVAX.UCC.IE> Hi, My request is similar to Nick Flann's: I'm looking for any SOURCE code for any type of neural network. I'm working on a simulator to compare learning times for different types. I already have quickprop, rcc and one or two others but I'm looking for more so I can have a good data set. Anything in C, C++, Pascal or LISP, for any machine, would be much appreciated. I won't be claiming credit in my thesis for anything I receive.
Also has anyone got a set of standard (or their standard) benchmarks that they use for training? I use various ones that people present in their papers when they propose a new algorithm, but there doesn't seem to be anything standard around. Thanking you in advance, Colin McCormack, University College Cork, Ireland. From DOLL%BROWNCOG.BITNET at BITNET.CC.CMU.EDU Mon Jan 13 08:22:00 1992 From: DOLL%BROWNCOG.BITNET at BITNET.CC.CMU.EDU (DOLL%BROWNCOG.BITNET@BITNET.CC.CMU.EDU) Date: Mon, 13 Jan 1992 08:22 EST Subject: Lost Mail Message-ID: <01GF9EFDGY1W000D8T@BROWNCOG.BITNET> From postmaster%gonzo.inescn.pt at CARNEGIE.BITNET Thu Jan 9 18:17:46 1992 From: postmaster%gonzo.inescn.pt at CARNEGIE.BITNET (PT Gateway) Date: Thu, 9 Jan 92 23:17:46 GMT Subject: PT Mail Network -- failed mail Message-ID: <9201092317.AA04593@gonzo.inescn.pt> ----- Mail failure diagnostics ----- From j_bonnet%inescn.pt at CARNEGIE.BITNET Thu Jan 9 09:11:00 1992 From: j_bonnet%inescn.pt at CARNEGIE.BITNET (Jose' M. Bonnet) Date: 9 Jan 92 9:11 Subject: Mailing List Message-ID: <104*j_bonnet@inescn.pt> Call for Participation in a Workshop on THE COGNITIVE SCIENCE OF NATURAL LANGUAGE PROCESSING 14-15 March, 1992 Dublin City University Guest Speakers: George McConkie University of Illinois at Urbana-Champaign Kim Plunkett University of Oxford Noel Sharkey University of Exeter Attendance at the CSNLP workshop will be by invitation on the basis of a submitted paper. Those wishing to be considered should send a paper (hardcopy, no e-mail submissions please) of not more than eight A4 pages to Ronan Reilly (e-mail: ronan_reilly at eurokom.ie), Educational Research Centre, St Patrick's College, Dublin 9, Ireland, not later than 3 February, 1992. Notification of acceptance along with registration and accommodation details will be sent out by 17 February, 1992. Submitting authors should also send their fax number and/or e-mail address to help speed up the selection process. The particular focus of the workshop will be on the computational modelling of human natural language processing (NLP), and preference will be given to papers that present empirically supported computational models of any aspect of human NLP. An additional goal in selecting papers will be to provide coverage of a range of NLP areas. This workshop is supported by the following organisations: Educational Research Centre, St Patrick's College, Dublin; Linguistics Institute of Ireland; Dublin City University; and the Commission of the European Communities through the DANDI ESPRIT Basic Research Action (No. 3351). From lazzaro at boom.CS.Berkeley.EDU Tue Jan 14 16:25:14 1992 From: lazzaro at boom.CS.Berkeley.EDU (John Lazzaro) Date: Tue, 14 Jan 92 13:25:14 PST Subject: No subject Message-ID: <9201142125.AA00345@boom.CS.Berkeley.EDU> An announcement of a NIPS-4 preprint on the neuroprose server ... Temporal Adaptation in a Silicon Auditory Nerve John Lazzaro, CS Division, UC Berkeley Abstract -------- Many auditory theorists consider the temporal adaptation of the auditory nerve a key aspect of speech coding in the auditory periphery. Experiments with models of auditory localization and pitch perception also suggest temporal adaptation is an important element of practical auditory processing. I have designed, fabricated, and successfully tested an analog integrated circuit that models many aspects of auditory nerve response, including temporal adaptation. ----- To retrieve ...
>ftp cheops.cis.ohio-state.edu >Name (cheops.cis.ohio-state.edu:lazzaro): anonymous >331 Guest login ok, send ident as password. >Password: your_username >230 Guest login ok, access restrictions apply. >cd pub/neuroprose >binary >get lazzaro.audnerve.ps.Z >quit %uncompress lazzaro.audnerve.ps.Z %lpr lazzaro.audnerve.ps ---- --john lazzaro lazzaro at boom.cs.berkeley.edu From marwan at ee.su.OZ.AU Tue Jan 14 19:44:58 1992 From: marwan at ee.su.OZ.AU (Marwan Jabri) Date: Wed, 15 Jan 1992 11:44:58 +1100 Subject: No subject Message-ID: <9201150044.AA23190@brutus.ee.su.OZ.AU> Sydney University Electrical Engineering RESEARCH FELLOW IN MACHINE INTELLIGENCE Applications are invited for a position as a Girling Watson Research Fellow to work for the Machine Intelligence Group in the area of information integration and multi-media. The applicant should have strong research and development experience, preferably with a background in machine intelligence, artificial neural networks or multi-media. The project involves research into multi-source knowledge representation, integration, machine learning and associated computing architectures. The applicant should have either a PhD or equivalent industry research and development experience. The appointment is available for a period of three years, subject to the submission of an annual progress report. Salary is in the range of Research Fellow: A$ 39,463 to A$ 48,688. Top of salary range unavailable until July 1992. For further information contact Dr M. Jabri, Tel: (+61-2) 692-2240, Fax: (+61-2) 660-1228. Membership of a superannuation scheme is a condition of employment for new appointees. Method of applications: Applications quoting Ref No: 02/16, including curriculum vitae and the names, addresses and phone nos of two referees, should be sent to the: Assistant Registrar (appointments) Staff Office (KO7), The University of Sydney, NSW 2006 Australia Closing: January 23, 1992. From gordon at AIC.NRL.Navy.Mil Wed Jan 15 15:59:56 1992 From: gordon at AIC.NRL.Navy.Mil (gordon@AIC.NRL.Navy.Mil) Date: Wed, 15 Jan 92 15:59:56 EST Subject: workshop announcement Message-ID: <9201152059.AA28490@sun25.aic.nrl.navy.mil> CALL FOR PAPERS Informal Workshop on "Biases in Inductive Learning" To be held after ML-92 Saturday, July 4, 1992 Aberdeen, Scotland All aspects of an inductive learning system can bias the learning process. Researchers to date have studied various biases in inductive learning such as algorithms, representations, background knowledge, and instance orders. The focus of this workshop is not to examine these biases in isolation. Instead, this workshop will examine how these biases influence each other and how they influence learning performance. For example, how can active selection of instances in concept learning influence PAC convergence? How might a domain theory affect an inductive learning algorithm? How does the choice of representational bias in a learner influence its algorithmic bias and vice versa? The purpose of this workshop is to draw researchers from diverse areas to discuss the issue of biases in inductive learning. The workshop topic is a unifying theme for researchers working in the areas of reformulation, constructive induction, inverse resolution, PAC learning, EBL-SBL learning, and other areas. This workshop does not encourage papers describing system comparisons.
Instead, the workshop encourages papers on the following topics: - Empirical and analytical studies comparing different biases in inductive learning and their quantitative and qualitative influence on each other or on learning performance - Studies of methods for dynamically adjusting biases, with a focus on the impact of these adjustments on other biases and on learning performance - Analyses of why certain biases are more suitable for particular applications of inductive learning - Issues that arise when integrating new biases into an existing inductive learning system - Theory of inductive bias Please send 4 hard copies of a paper (10-15 double-spaced pages, ML-92 format) or (if you do not wish to present a paper) a description of your current research to: Diana Gordon Naval Research Laboratory, Code 5510 4555 Overlook Ave. S.W. Washington, D.C. 20375-5000 USA Email submissions to gordon at aic.nrl.navy.mil are also acceptable, but they must be in postscript. FAX submissions will not be accepted. If you have any questions about the workshop, please send email to Diana Gordon at gordon at aic.nrl.navy.mil or call 202-767-2686. Important Dates: March 12 - Papers and research descriptions due May 1 - Acceptance notification June 1 - Final version of papers due Program Committee: Diana Gordon, Naval Research Laboratory Dennis Kibler, University of California at Irvine Larry Rendell, University of Illinois Jude Shavlik, University of Wisconsin William Spears, Naval Research Laboratory Devika Subramanian, Cornell University Paul Vitanyi, CWI and University of Amsterdam From jim at gdsnet.grumman.com Wed Jan 15 18:08:55 1992 From: jim at gdsnet.grumman.com (Jim Eilbert) Date: Wed, 15 Jan 92 18:08:55 EST Subject: Paper available: Neural mechanisms for steering in visual motion In-Reply-To: Jonathan Marshall's message of Thu, 9 Jan 92 14:31:05 -0500 <9201091931.AA19960@marshall.cs.unc.edu> Message-ID: <9201152308.AA22264@gdsnet.grumman.com> Jonathan, I am interested in getting a copy of your paper Challenges of Vision Theory: Self-Organization of Neural Mechanisms for Stable Steering of Object-Grouping Data in Visual Motion Perception Jonathan A. Marshall However, the host list on my computer does not know about ftp archive.cis.ohio-state.edu. Could you send me the network address of this computer? If that is not readily available, I'll wait for a hardcopy. Thanks, Jim Eilbert M/S A02-26 Grumman CRC Bethpage, NY 11714 From SAYEGH at CVAX.IPFW.INDIANA.EDU Wed Jan 15 20:52:40 1992 From: SAYEGH at CVAX.IPFW.INDIANA.EDU (SAYEGH@CVAX.IPFW.INDIANA.EDU) Date: Wed, 15 Jan 1992 20:52:40 EST Subject: Proceedings Announcement Message-ID: <920115205240.21a00cdd@CVAX.IPFW.INDIANA.EDU> The Proceedings of the Fourth Conference on Neural Networks and Parallel Distributed Processing at Indiana University-Purdue University at Fort Wayne, held April 11, 12, and 13, 1991, are now available. They can be ordered ($6 + $1 U.S. mail cost) from: Ms. Sandra Fisher, Physics Department Indiana University-Purdue University at Fort Wayne Fort Wayne, IN 46805-1499 FAX: (219) 481-6880 Voice: (219) 481-6306 OR 481-6157 email: proceedings at ipfwcvax.bitnet The following papers are included in the Proceedings: Optimization and genetic algorithms: J.L. Noyes, Wittenberg University Neural Network Optimization Methods Robert L.
Sedlmeyer, Indiana University-Purdue University at Fort Wayne A Genetic Algorithm to Estimate the Edge-Intergrity of Halin Graphs Omer Tunali & Ugur Halici, University of Missouri/Rolla A Boltzman Machine for Hypercube Embedding Problem William G. Frederick and Curt M. White, Indiana University-Purdue University at Fort Wayne Genetic Algorithms and a Variation on the Steiner Point Problem Network analysis: P.G. Madhavan, B. Xu, B. Stephens, Purdue University, Indianapolis On the Convergence Speed and the Generalization Ability of Tri-state Neural Networks Mohammad R. Sayeh, Southern Illinois University at Carbondale Dynamical-System Approach to Unsupervised Classifier Samir I. Sayegh, Indiana University-Purdue University at Fort Wayne Symbolic Manipulation and Neural Networks Zhenni Wang, Ming T. Tham & A.J. Morris, University of Newcastle upon Tyne Multilayer Neural Networks: Approximated Canonical Decomposition of Nonlinearity M.G. Royer & O.K. Ersoy, Purdue University, West Lafayette Classification Performance of Pshnn with BackPropagation Stages Sean Carroll, Tri-State University Single-Hidden-Layer Neural Nets Can Approximate B-Splines G. Allen Pugh, Indiana University-Purdue University at Fort Wayne Further Design Considerations for Back Propagation Biological aspects: R. Manalis, Indiana University-Purdue University at Fort Wayne Short Term Memory Implicated in Twitch Facilitation Edgar Erwin, K. Obermayer, University of Illinois Formation and Variability of Somatotopic Maps with Topological Mismatch T. Alvager, B. Humpert, P. Lu, and C. Roberts, Indiana State University DNA Sequence Analysis with a Neural Network Christel Kemke, DFKI, Germany Towards a Synthesis of Neural Network Behavior Arun Jagota, State University of New York at Buffalo A Forgetting Rule and Other Extensions to the Hopfield-Style Network Storage Rule and Their Applications applications: I.H. Shin and K.J. Cios, The University of Toledo A Neural Network Paradigm and Architecture for Image Pattern Recognition R.E. Tjia, K.J. Cios and B.N. Shabestari, The University of Toledo Neural Network in Identification of Car Wheels from Gray Level Images M.D. Tom and M.F. Tenorio, Purdue University, West Lafayette A Neuron Architecture with Short Term Memory S. Sayegh, C. Pomalaza-Raez, B. Beer and E. Tepper, Indiana University-Purdue University at Fort Wayne Pitch and Timbre Recognition Using Neural Network Jacek Witaszek & Colleen Brown, DePaul University Automatic Construction of Connectionist Expert Systems Robert Zerwekh, Northern Illinois University Modeling Learner Performance: Classifying Competence Levels Using Adaptive Resonance Theory tutorial lectures: Marc Clare, Lincoln National Corporation, Fort Wayne An Introduction to the Methodology of Building Neural Networks Ingrid Russell, University of Hartford Integrating Neural Networks into an AI Course Arun Jagota, State University of New York at Buffalo The Hopfield Model and Associative Memory Ingrid Russell, University of Hartford Self Organization and Adaptive Resonance Theory Models Note: Copies of the Proceedings of the Third Conference on NN&PDP are also available and can be ordered from the same address. 
From PSS001 at VAXA.BANGOR.AC.UK Thu Jan 16 05:55:37 1992 From: PSS001 at VAXA.BANGOR.AC.UK (PSS001@VAXA.BANGOR.AC.UK) Date: Thu, 16 JAN 92 10:55:37 GMT Subject: No subject Message-ID: <01GFDG54Y5SW8Y5CAT@BITNET.CC.CMU.EDU> Workshop on Neurodynamics and Psychology April 22nd - April 24th 1992 Cognitive Neurocomputation Unit, University of Wales, Bangor Session chairs and likely speakers include: Igor Aleksander (London) Alan Allport (Oxford) Jean-Pierre Changeux (Paris) Stanislas Dehaene (Paris) Glyn Humphreys (Birmingham) Marc Richelle (Liège) Tim Shallice (London) John Taylor (London) David Willshaw (Edinburgh) The purpose of this workshop is to bring together researchers to outline and define a new area of research that has arisen from work within such diverse disciplines as neurobiology, cognitive psychology, artificial intelligence and computer science. This area concerns the representation of time within natural and artificial neural systems, and the role these representations play in behaviours from spatial learning in animals to high-level cognitive functions such as language processing, problem solving, reasoning, and sequential pattern recognition in general. Attendance at this workshop will be limited to 50 to allow ample time for discussion. For further details contact: Mike Oaksford or Gordon Brown, Neurodynamics Workshop, Cognitive Neurocomputation Unit, Department of Psychology, University of Wales, Bangor, Gwynedd, LL57 2DG, United Kingdom. Tel: 0248 351151 Ext. 2211. Email: PSS//1 at uk.ac.bangor.vaxa Sponsored by the British Psychological Society (Welsh Branch) From hinton at ai.toronto.edu Thu Jan 16 10:07:26 1992 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Thu, 16 Jan 1992 10:07:26 -0500 Subject: NIPS preprint in neuroprose Message-ID: <92Jan16.100733edt.73@neuron.ai.toronto.edu> The following paper is available as hinton.handwriting.ps.Z in neuroprose ADAPTIVE ELASTIC MODELS FOR HAND-PRINTED CHARACTER RECOGNITION Geoffrey E. Hinton, Christopher K. I. Williams and Michael D. Revow Department of Computer Science, University of Toronto ABSTRACT Hand-printed digits can be modeled as splines that are governed by about 8 control points. For each known digit, the control points have preferred "home" locations, and deformations of the digit are generated by moving the control points away from their home locations. Images of digits can be produced by placing Gaussian ink generators uniformly along the spline. Real images can be recognized by finding the digit model most likely to have generated the data. For each digit model we use an elastic matching algorithm to minimize an energy function that includes both the deformation energy of the digit model and the log probability that the model would generate the inked pixels in the image. The model with the lowest total energy wins. If a uniform noise process is included in the model of image generation, some of the inked pixels can be rejected as noise as a digit model is fitting a poorly segmented image. The digit models learn by modifying the home locations of the control points. From becker at ai.toronto.edu Thu Jan 16 14:44:36 1992 From: becker at ai.toronto.edu (becker@ai.toronto.edu) Date: Thu, 16 Jan 1992 14:44:36 -0500 Subject: NIPS preprint in neuroprose Message-ID: <92Jan16.144445edt.10@neuron.ai.toronto.edu> The following paper is available as becker.prediction.ps.Z in neuroprose: LEARNING TO MAKE COHERENT PREDICTIONS IN DOMAINS WITH DISCONTINUITIES Suzanna Becker and Geoffrey E.
Hinton Department of Computer Science, University of Toronto ABSTRACT We have previously described an unsupervised learning procedure that discovers spatially coherent properties of the world by maximizing the information that parameters extracted from different parts of the sensory input convey about some common underlying cause. When given random dot stereograms of curved surfaces, this procedure learns to extract surface depth because that is the property that is coherent across space. It also learns how to interpolate the depth at one location from the depths at nearby locations (Becker and Hinton, 1992). In this paper, we propose two new models which handle surfaces with discontinuities. The first model attempts to detect cases of discontinuities and reject them. The second model develops a mixture of expert interpolators. It learns to detect the locations of discontinuities and to invoke specialized, asymmetric interpolators that do not cross the discontinuities. From rsun at orion.ssdc.honeywell.com Thu Jan 16 17:21:46 1992 From: rsun at orion.ssdc.honeywell.com (Ron Sun) Date: Thu, 16 Jan 92 16:21:46 CST Subject: No subject Message-ID: <9201162221.AA07989@orion.ssdc.honeywell.com> Call For Participation The AAAI Workshop on Integrating Neural and Symbolic Processes (the Cognitive Dimension) to be held at the Tenth National Conference on Artificial Intelligence July 12-17, 1992 San Jose, CA There has been a large amount of research in integrating neural and symbolic processes that uses networks of simple units. However, there is relatively little work so far in comparing and combining these fairly isolated efforts. This workshop will provide a forum for discussions and exchanges of ideas in this area, to foster cooperative work, and to produce synergistic results. The workshop will tackle important issues in integrating neural and symbolic processes, such as: What are the fundamental problems of integrating neural and symbolic processes? Why should we integrate them after all? What class of problems is well-suited to such integration? What are the relative advantages of each approach or technique in achieving such integrations? Is cognitive plausibility an important criterion? How do we judge the cognitive plausibility of existing approaches? what is the nature of psychological and/or biological evidence for existing models, if there is any? What role does emergent behavior vs. a localist approach play in integrating these processes? (Explicit symbol manipulation vs. their functional counterparts) Is it possible to synthesize various existing models? ----------------------- The workshop will include invited talks, presentations, questions and answers, and general discussion sessions. Currently invited speakers include: Jerome Feldman, ICSI; Stuart Dreyfus, IEOR, UC Berkeley. Jim Hendler, U Maryland. Research addressing connectionist rule-based reasoning, connectionist natural language processing, other high-level connectionist models, compositionality, connectionist knowledge representation is particularly relevant to the workshop. ------------------------ If you wish to present, submit an extended abstract (up to 5 pages); If you only wish to attend the workshop, send a one-page description of your interest; All submissions should include 4 hardcopies AND 1 electronic copy (via e-mail) by March 13, 1992 to the workshop chair: Dr. 
Ron Sun, Honeywell SSDC, 3660 Technology Drive, Minneapolis, MN 55418, rsun at orion.ssdc.honeywell.com, (612)-782-7379. Organizing Committee: Dr. Ron Sun, Honeywell SSDC; Dr. Lawrence Bookman, Brandeis University; Prof. Shashi Shekhar, University of Minnesota. From suddarth at cs.UMD.EDU Fri Jan 17 13:15:16 1992 From: suddarth at cs.UMD.EDU (Steven C. Suddarth) Date: Fri, 17 Jan 92 13:15:16 -0500 Subject: GNN 92 call for papers Message-ID: <9201171815.AA04780@mimsy.cs.UMD.EDU> The following is an announcement for the 3rd annual GNN meeting. It is one of the few refereed forums for applications. Last year we accepted about one out of four papers, and generally had an interesting meeting. The meeting was also useful for those seeking collaborators. If you have made any significant contributions to a neural-network oriented application, you may want to submit an abstract. Steve Suddarth suddarth at cs.umd.edu ***************************************************************** Government Neural Network Applications Workshop G N N 9 2 Dayton, Ohio, August 24-28, 1992 ----------------------------- C A L L F O R P A P E R S ----------------------------- The 1992 Government Neural Network Applications Workshop will be held from 24-28 August at the Hope Hotel, Wright-Patterson AFB, Ohio. The tentative schedule is: 24 Aug - Registration and Tutorial 25-27 Aug - Main Meeting (includes export-controlled session) 28 Aug - Classified Meeting (tentative) * Authors are invited to submit abstracts on any application-oriented topic of interest to government agencies; this includes: - Image/speech/signal processing - Man-machine interface - Detection and classification - Guidance and control - Medicine - Robotics * Presentations will be selected based upon two-page abstracts. Please note that ABSTRACTS LONGER THAN TWO PAGES WILL BE RETURNED. Also, the ABSTRACT DEADLINE IS APRIL 15, 1992. Abstracts should be accompanied by a cover letter stating the affiliation, address and phone number of the author. The cover letter should also state whether the presentation will be unclassified (open dissemination), unclassified (export controlled) or classified. Send abstracts to the following addresses. Unclassified: GNN 92, Maj. Steven K. Rogers, AFIT/ENG, WPAFB, OH 45433. Classified: GNN 92, Maj. Steven K. Rodgers, AFIT/ENA, WPAFB, OH 45433. * The export-controlled session will be open to U.S. citizens only. Please use this session for unclassified material that you wish to present in a limited way. * Classified abstracts must contain a description of the classified portion and authors must make clear why classified material is important to the presentation. Please note that any "classified" abstracts without well-marked classified content will be automatically rejected. Finally, the classified meeting will only take place if the classified program committee deems there to be a sufficient number of quality papers on worthwhile classified subjects. If authors are concerned about reducing the dissemination of unclassified material, they should use the export-controlled session of the main meeting. If the classified portion is unimportant to the main thrust of the abstract, it should be "sanitized" and submitted as unclassified. If accepted, classified authors will be allotted space in the proceedings for an (optional) unclassified paper. * Registration fees are not yet final, but they are expected to be in the $200 to $300 range, and they will include some meals.
* The all-day tutorial on August 24 will be conducted by AFIT faculty. It will be oriented toward those who are new to the field and would like sufficient background for the remainder of the meeting. There will be no extra cost for the tutorial. Thanks to the Army, last year's meeting was a big hit. It was one of the few selective application-oriented meetings in this field. We look forward to your presence, and we know that you will find this meeting informative and useful whether you are currently building neural network applications, are thinking about them, or are contributing to theories that underpin them. Conference chairs are: General Chair: Capt. Steve Suddarth Mathematics Department (AFOSR/NM) Air Force Office of Scientific Research Bolling AFB DC 20332-6448 Comm: (202) 767-5028 DSN: 295-5028 Fax: (202) 404-7496 suddarth at cs.umd.edu Program Chair: Maj. Steve Rogers School of Engineering (AFIT/ENG) Air Force Institute of Technology Wright-Patterson AFB OH 45433 Comm: (513) 255-9266 DSN: 785-9266 Fax: (513) 476-4055 rogers at blackbird.afit.af.mil From krogh at cse.ucsc.edu Fri Jan 17 15:02:35 1992 From: krogh at cse.ucsc.edu (Anders Krogh) Date: Fri, 17 Jan 92 12:02:35 -0800 Subject: NIPS paper in Neuroprose Message-ID: <9201172002.AA05878@spica.ucsc.edu> The following paper has been placed in the Neuroprose archive Title: A Simple Weight Decay Can Improve Generalization Authors: Anders Krogh and John A. Hertz Filename: krogh.weight-decay.ps.Z (To appear in proceedings from NIPS 91) Abstract: It has been observed in numerical simulations that a weight decay can improve generalization in a feed-forward neural network. This paper explains why. It is proven that a weight decay has two effects in a linear network. First, it suppresses any irrelevant components of the weight vector by choosing the smallest vector that solves the learning problem. Second, if the size is chosen right, a weight decay can suppress some of the effects of static noise on the targets, which improves generalization quite a lot. It is then shown how to extend these results to networks with hidden layers and non-linear units. Finally the theory is confirmed by some numerical simulations using the data from NetTalk. ---------------------------------------------------------------- FTP INSTRUCTIONS unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: anything ftp> cd pub/neuroprose ftp> binary ftp> get krogh.weight-decay.ps.Z ftp> bye unix> zcat krogh.weight-decay.ps.Z | lpr (or however you uncompress and print postscript) From bill at baku.eece.unm.edu Fri Jan 17 21:11:22 1992 From: bill at baku.eece.unm.edu (bill@baku.eece.unm.edu) Date: Fri, 17 Jan 92 19:11:22 MST Subject: Multi-layer threshold logic function question Message-ID: <9201180211.AA22025@baku.eece.unm.edu> Does anybody know of bounds on the number of hidden layer units required to implement an arbitrary logic function over n variables in a one-hidden layer MLP where each node uses hard-limiting nonlinearities? Obviously, 2^n is a upper bound: You can form each possible conjunction of n variables in the hidden layer, and then selectively combine them disjunctively with the output node. This seems like overkill though.... I've seen some other bounds in Muroga's book which allow for multiple layers and they are on the order of O(2^(n/2)), but I'm looking specifically for a bound for a one-hidden layer net. 
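A minimal sketch of the 2^n construction described in the question (illustration only; the helper names are invented): build one hard-limiting hidden unit per minterm, i.e. per input pattern on which the target function is 1, and let a threshold output unit OR the hidden activations.

# One-hidden-layer threshold network realizing an arbitrary boolean function
# via its minterms; at most 2^n hidden units, matching the upper bound above.
import itertools

def step(v):
    return 1 if v >= 0 else 0

def minterm_net(f, n):
    """Return a list of (weights, bias) hidden units for the boolean f on n bits."""
    hidden = []
    for x in itertools.product((0, 1), repeat=n):
        if f(x):
            w = [1 if b else -1 for b in x]   # +1 where x_i = 1, -1 where x_i = 0
            hidden.append((w, -sum(x)))       # this unit fires iff the input equals x exactly
    return hidden

def evaluate(hidden, x):
    h = [step(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in hidden]
    return step(sum(h) - 1)                   # output unit: OR of the hidden activations

parity = lambda x: sum(x) % 2                 # 3-bit parity: 4 true minterms -> 4 hidden units
net = minterm_net(parity, 3)
assert all(evaluate(net, x) == parity(x)
           for x in itertools.product((0, 1), repeat=3))

This uses one hidden unit per true minterm, so never more than 2^n; it says nothing about the tighter single-hidden-layer bound being asked for.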
-Bill =============================================================================== Bill Horne | email: bill at baku.eece.unm.edu Dept. of Electical and Computer Engineering | University of New Mexico | Phone: (505) 277-0805 Albuquerque, NM 87131 USA | Office: EECE 224D =============================================================================== From finton at cs.wisc.edu Fri Jan 17 22:16:34 1992 From: finton at cs.wisc.edu (David J. Finton) Date: Fri, 17 Jan 92 21:16:34 -0600 Subject: Lost Mail Message-ID: <9201180316.AA14631@lactic.cs.wisc.edu> q h From ross at psych.psy.uq.oz.au Sat Jan 18 08:13:20 1992 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Sun, 19 Jan 1992 00:13:20 +1100 Subject: trying to find JM SOPENA of BARCELONA Message-ID: <9201181313.AA23110@psych.psy.uq.oz.au> J.M. Sopena of the University of Barcelona posted notice of a paper on ESRP: a Distributed Connectionist Parser, some weeks back. The contact address was given as: d4pbjss0 at e0ub011.bitnet My mail to Sopena has been bounced by the bitnet gateway (cunyvm.bitnet) with a 'cannot find mailbox' message. Would Dr Sopena please contact me directly or perhaps someone wha HAS got through to Sopena might get in touch with me. MY APOLOGIES TO THE 99.9% OF THE LIST FOR WHOM THIS IS IRRELEVANT. Thankyou Ross Gayler ross at psych.psy.uq.oz.au From LZHAO at swift.cs.tcd.ie Sat Jan 18 19:54:00 1992 From: LZHAO at swift.cs.tcd.ie (LZHAO@swift.cs.tcd.ie) Date: Sun, 19 Jan 1992 01:54 +0100 Subject: I would like to put my name on your list Message-ID: <8A7B8495C0006D91@cs.tcd.ie> my e-mail address is lzhao at cs.tcd.ie From B344DSL at UTARLG.UTA.EDU Mon Jan 20 11:15:00 1992 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Mon, 20 Jan 1992 10:15 CST Subject: Registration and tentative program for a conference in Dallas, Feb.6-8. Message-ID: <01GFJAE9SU2C00018I@utarlg.uta.edu> TENTATIVE schedule for Optimality Conference, UT Dallas, Feb. 6-8, 1992 ORAL PRESENTATIONS -- Thursday, Feb. 6, AM: Daniel Levine, U. of Texas, Arlington -- Don't Just Stand There, Optimize Something! Samuel Leven, Radford U. -- (title to be announced) Mark Deyong, New Mexico State U. -- Properties of Optimality in Neural Networks Wesley Elsberry, Battelle Research Labs -- Putting Optimality inits Place: Argument on Context, Systems, and Neural Networks Graham Tattersall, University of East Anglia -- Optimal Generalisation in Artificial Neural Networks Thursday, Feb. 6, PM: Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model Ian Parberry, University of North Texas -- (title to be announced) Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network Friday, Feb. 7, AM: Gershom Rosenstein, Hebrew University -- For What are Brains Striving? Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of Nonstationary Data by Self-Organizing Neural Networks Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics Friday, Feb. 
7, PM: David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a Classical Conditioning Network Samy Bengio, Universit de Montral -- On the Optimization of a Synaptic Learning Rule Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Saturday, Feb. 8, AM: Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive Processes? Herve Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator Paul Prueitt, Georgetown University -- (title to be announced) Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Saturday, Feb. 8, PM: Panel discussion on the basic themes of the conference POSTERS Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by Competitive Neural Networks John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE (alphabetical by first author): Generalization of the Linear Auto-Associator Herve Abdi, Dominique Valentin, and Alice J. O'Toole University of Texas at Dallas The classical auto-associator can be used to model some processes in prototype abstraction. In particular, the eigenvectors of the auto-associative matrix have been interpreted as prototypes or macro-features (Anderson et al, 1977, Abdi, 1988, O'Toole and Abdi, 1989). It has also been noted that computing these eigenvectors is equivalent to performing the principal component analysis of the matrix of objects to be stored in the memory. This paper describes a generalization of the linear auto-associator in which units (i.e., cells) can be of differential importance, or can be non-independent or can have a bias. The stimuli to be stored in the memory can have also a differential importance (or can be non-independent). The constraints expressing response bias and differential importance of stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule is applied to the weight matrix in a generalized form which takes the bias and the differential importance constraints into account to compute the error. Conditions for the convergence of the learning rule are examined and convergence is shown to be dependent only on the ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix. The maximal responses of the memory correspond to generalized eigenvectors, these vectors are biased-orthogonal (i.e., they are orthogonal after the response bias is implemented). It is also shown that (with an appropriate choice of matrices for response biais and differential importance), the generalized auto-associator is able to implement the general linear model of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical correlation analysis, generalized principal component analysis, etc.) 
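For readers who want the baseline that this abstract generalizes, the ordinary linear auto-associator trained with the Widrow-Hoff (delta) rule updates its weights, for each stored stimulus x_k and learning constant eta, as (a textbook form, not taken from the abstract):

\[ W \leftarrow W + \eta \,(x_k - W x_k)\, x_k^{\mathsf T} \]

which drives the reconstruction error x_k - W x_k toward zero on the stored set; the generalized rule described above changes how that error is computed, through the positive semi-definite matrices encoding cell bias and differential stimulus importance.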
Applications and Monte Carlo simulation of the generalized auto-associator dealing with face processing will be presented and discussed. On the Optimization of a Synaptic Learning Rule Samy Bengio, Universit de Montral Yoshua Bengio, Massachusetts Institute of Technology Jocelyn Cloutier, Universit de Montral Jan Gecsei, Universit de Montral This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has local inputs, and is the same for many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms, in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, local optimization method (gradient descent) as well as global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as, of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments of classical conditioning for Aplysia yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimentional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Initial experiments can be found in [1, 2]. References [1] Bengio, Y. & Bengio, S. (1990). Learning a synaptic learning rule. Technical Report #751. Computer Science Department. Universit de Montral. [2] Bengio Y., Bengio S., & Cloutier, J. (1991). Learning a synaptic learning rule. IJCNN-91-Seattle. Complexity Optimized Data Clustering by Competitive Neural Networks Joachim Buhmann, Lawrence Livermore National Laboratory Hans Khnel, Technische Universitt Mnchen Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their occupation probabilities. An iterative version of complexity optimized clustering is imple- mented by an artificial neural network with winner-take-all connectivity. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy constrainted vector quantization or topological feature maps and competitive neural networks. 
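To fix ideas, generic maximum-entropy ("soft") clustering has the following flavor; the sketch below is background illustration only, with invented names, and it omits the complexity-cost term that is the point of the abstract above.

# Generic maximum-entropy ("soft") clustering sketch -- not the algorithm of the
# abstract above, which additionally optimizes explicit complexity costs.
import numpy as np

def soft_kmeans(X, k, beta=5.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared distortion from every point to every cluster center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        # Gibbs (maximum-entropy) assignment probabilities at inverse temperature beta
        p = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
        p /= p.sum(axis=1, keepdims=True)
        # re-estimate centers as weighted means under the soft assignments
        centers = (p.T @ X) / p.sum(axis=0)[:, None]
    return centers, p

# usage: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5.0])
centers, p = soft_kmeans(X, k=2)

In the winner-take-all limit (beta large) the assignments become hard and the loop reduces to ordinary K-means; annealing beta upward is the usual way such schemes avoid poor local minima.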
Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Sylvia Candelaria de Ram, New Mexico State University Context-sensitivity and rapidity of communication are two things that become ecological essentials as cognition advances. They become ``optimals'' as cognition develops into something elaborate, long-lasting, flexible, and social. For successful operation of language's default speech/gesture mode, articulation response must be rapid and context-sensitive. It does not follow that all linguistic cognitive function will or can be equally fast or re-equilibrating. But it may follow that articulation response mechanisms are specialized in different ways than those for other cognitive functions. The special properties of the varied mechanisms would then interact in language use. In actuality, our own architecture is of this sort [1,2,3,4]. Major formative effects on our language, society, and individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift (and adaptability) and hypercorrection effects (mitigated by emotion), so that we have multitudes of distinct but related languages and speech communities. Consider modelling the variety of co-functioning mechanisms for utterance and gesture articulation, interpretation, monitoring and selection. Wherein lies the source of the differing function in one and another mechanism? Suppose [parts of] mechanisms are treated as parts of a multi-layered, temporally parallel, staged architecture (like ours). The layers may be inter-connected selectively [6]. Any given portion may be designed to deal with particular sorts of excitation [7,8,9]. A multi-level belief/knowledge logic enhanced for such structures [10] has properties extraordinary for a logic, properties which point up some critical features of ``neural nets'' having optimization properties pertinent to intelligent, interactive systems. References [1] Candelaria de Ram, S. (1984). Genesis of the mechanism for sound change as suggested by auditory reflexes. Linguistic Association of the Southwest, El Paso. [2] Candelaria de Ram, S. (1988). Neural feedback and causation of evolving speech styles. New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherces mathmatiques, Montreal, October. [3] Candelaria de Ram, S. (1989). Sociolinguistic style shift and recent evidence on `prese- mantic' loci of attention to fine acoustic difference. New Ways of Analyzing Language Variation joint with American Dialect Society (NWAV-XVIII/ ADSC), Durham, NC, October. [4] Candelaria de Ram, S. (1991b). Language processing: mental access and sublanguages. Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept. 1991. [5] Candelaria de Ram, S. (1990b). The sensory basis of mind: feasibility and functionality of a phonetic sensory store. [Commentary on R. Ntnen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236. [6] Candelaria de Ram, S. (1990c). Sensors & concepts: Grounded cognition. Working Session on Algebraic Approaches to Problem Solving and Representation, June 27-29, Briarcliff, NY. [7] Candelaria de Ram, S. (1990a). Belief/knowledge dependency graphs with sensory groundings. Third Int. Symp. on Artificial Intelligence Applications of Engineering Design and Manufacturing in Industrialized and Developing Countries, Monterrey, Mexico, Oct. 22-26, pp. 103-110. [8] Candelaria de Ram, S. 
(1991a). From sensors to concepts: Pragmasemantic system constructivity. Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91, Sophia-Antipolis, France, April 22-24. Also in Knowledge Modeling and Expertise Transfer, IOS Publishing, Paris, 1991, pp. 433-448. [9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989). Reasoning using inheritance from a mixture of knowledge and beliefs. In S. Ramani, R. Chandrasekar, & K.S.R. Anjaneylu (Eds.), Knowledge Based Computer Systems. Delhi: Narosa, pp. 387-396; republished by Vedams Books International, New Delhi, 1990. Also in Lecture Notes in Computer Science series No. 444, Springer-Verlag, 1990. [10] Candelaria de Ram, S. (1991c). Why to enter into dialogue is to come out with changed speech: Cross-linked modalities, emotion, and language. Second Invitational Venaco Workshop and European Speech Communication Association Tutorial and Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept. 16-20, 1991. Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning Gail Carpenter, Boston University A neural network architecture for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors will be described. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, response, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator () and the MAX operator () of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights corresponds to increasing sizes of category "boxes." Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; (iv) a letter recognition database; and (v) a medical database. Properties of Optimality in Neural Networks Mark DeYong and Thomas Eskridge, New Mexico State University This presentation discusses issues concerning optimality in neural and cognitive functioning. We discuss these issues in terms of the tradeoffs they impose on the design of neural network systems. 
We illustrate the issues with example systems based on a novel VLSI neural processing element developed, fabricated, and tested by the first author. There are four general issues of interest:  Biological Realism vs. Computational Power. Many implementations of neurons sacrifice computational power for biological realism. Biological realism imposes a set of constraints on the structure and timing of certain operations in the neuron. Taken as an absolute requirement, these constraints, though realistic, reduce the computational power of individual neurons, and of systems built on those neurons. However, to ignore the biological characteristics of neurons is to ignore the best example of the type of device we are trying to implement. In particular, simple non-biologically inspired neurons perform a completely different style of processing than biologically inspired ones. Our work allows for biological realism in areas where it increases computational power, while ignoring the low-level details that are simply by-products of organic systems.  Task-Specific Architecture vs. Uniform Element, Massive Parallelism. A central issue in developing neural network systems is whether to design networks specific to a particular task or to adapt a general-purpose network to accomplish the task. Developing task- specific architectures allows for small, fast networks that approach optimality in performance, but require more effort during the design stage. General-purpose architectures approach optimality in design that merely needs to be adapted via weight modifications to a new problem, but suffer from performance inefficiencies due to unneeded and/or redundant nodes. Our work hypothesizes that task-specific architec- tures that use a building-block approach combined with fine-tuning by training will produce the greatest benefits in the tradeoff between design and performance optimality.  Time Independence vs. Time Dependence. Many neural networks assume that each input vector is independent of other inputs, and the job of the neural network is to extract patterns within the input vector that are sufficient to characterize it. For problems of this type, a network that assumes time independence will provide acceptable performance. However, if the input vectors cannot be assumed to be independent, the network must process the vector with respect to its temporal characteristics. Networks that assume time independence have a variety of well-known training and performance algorithms, but will be unwieldy when applied to a problem in which time independence does not hold. Although temporal characteristics can be converted into levels, there will be a loss of information that may be critical to solving the problem efficiently. Networks that assume time dependence have the advantage of being able to handle both time dependent and time independent data, but do not have well known, generally applicable training and performance algorithms. Our approach is to assume time dependence, with the goal of handling a larger range of problems rather than having general training and performance methods.  Hybrid Implementation vs. Analog or Digital Only. The optimality of hardware implementations of neural networks depends in part on the resolution of the second tradeoff mentioned above. Analog devices generally afford faster processing at a lower hardware overhead than digital, whereas digital devices provide noise immunity and a building-block approach to system design. 
Our work adopts a hybrid approach where the internal computation of the neuron is implemented in analog, and the extracellular communication is performed digitally. This gives the best of both worlds: the speed and low hardware overhead of analog and the noise immunity and building-block nature of digital components. Each of these issues has individual ramifications for neural network design, but optimality of the overall system must be viewed as their composite. Thus, design decisions made in one area will constrain the decisions that can be made in the other areas. Putting Optimality in its Place: Arguments on Context, Systems and Neural Networks Wesley Elsberry, Battelle Research Laboratories Determining the "optimality" of a particular neural network should be an exercise in multivariate analysis. Too often, performance concerning a narrowly defined problem has been accepted as prima facie evidence that some ANN architecture has a specific level of optimality. Taking a cue from the field of genetic algorithms (and the theory of natural selection from which GA's are derived), I offer the observation that optimality is selected in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the system of which it is a part. The context in which the evaluation of optimality is performed will influence the results of that evaluation greatly. While compartmentalized and specialized tests of ANN performance can offer insights, the construction of effective systems may require additional consideration to be given to the assumptions of such tests. Many benchmarks and other tests assume a static problem set, while many real-world applications offer dynamical problems. An ANN which performs "optimally" in a test may perform miserably in a putatively similar real-world application. Recognizing the assumptions which underlie evaluations is important for issues of optimal system design; recognizing the need for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is critical to proper placement of priority given to optimality of ANN's. Identifying a Neural Network's Computational Goals: A Statistical Optimization Perspective Richard M. Golden, University of Texas at Dallas The importance of identifying the computational goal of a neural network computation is first considered from the perspective of Marr's levels of descriptions theory and Simon's theory of satisficing. A "statistical optimization perspective" is proposed as a specific implementation of the more general theories of Marr and Simon. The empirical "testability" of the "statistical optimization perspective" is also considered. It is argued that although such a hypothesis is only occasionally empirically testable, such a hypothesis plays a fundamental role in understanding complex information processing systems. The usefulness of the above theoretical framework is then considered with respect to both artificial neural networks and biological neural networks. An argument is made that almost any artificial neural networks may be viewed as optimizing a statistical cost function. To support this claim, the large class of back-propagation feed-forward artificial neural networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed as optimizing specific statistical cost functions. 
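A standard textbook instance of this claim (an illustration, not drawn from the abstract): training a feed-forward network f_W on targets modeled as y_k = f_W(x_k) + eps_k with Gaussian noise eps_k ~ N(0, sigma^2 I) gives the negative log-likelihood

\[ -\log \prod_k p(y_k \mid x_k, W) \;=\; \frac{1}{2\sigma^2} \sum_k \lVert y_k - f_W(x_k) \rVert^2 \;+\; \mathrm{const} \]

so minimizing the usual sum-of-squares error is formally maximum-likelihood estimation under that statistical model; cross-entropy training corresponds in the same way to a Bernoulli or multinomial likelihood.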
Specific statistical tests for deciding whether the statistical environment of the neural network is "compatible" with the statistical cost function the network is presumably optimizing are also proposed. Next, some ideas regarding the applicability of such analyses to much more complicated artificial neural networks which are "closer approximations" to real biological neural networks will also be discussed. Vector Associative Maps: Self- organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-motor Control Stephen Grossberg, Boston University This talk describes a new class of neural models for unsupervised error-based learning. Such a Vector Associative Map, or VAM, is capable of autonomously calibrating the spatial maps and arm trajecgtory parameters used during visually guided reaching. VAMs illustrate how spatial and motor representations can self-organize using a unified computational format. They clarify how an autonomous agent can build a self-optimizing hierarchy of goal-oriented actions based upon more primitive, endogenously generated exploration of its environment. Computational properties of ART and VAM systems are complementary. This complementarity reflects different processing requirements of sensory- cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be used to design an autonomous behavioral agent. Problem Solving in a Connectionistic World Model Steven Hampson, University of California at Irvine Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S) models of problem solving are central to animal learning theory. When applicable, the procedural S-R and S-E models can be quite space efficient, as they can potentially learn compact generalizations over the functions they are taught to compute. On the other hand, learning these generalizations can be quite time consuming, and adjusting them when conditions change can take as long as learning them in the first place. In contrast, the S-S model developed here does not learn a particular input-to-output mapping, but simply records a series of "facts" about possible state transitions in the world. This declarative world model provides fast learning, easy update and flexible use, but is space expensive. The procedural/declarative distinction observed in biological behavior suggests that both types of mechanisms are available to an organism in its attempts to balance, if not optimize, both time and space requirements. The work presented here investigates the type of problems that are most effectively addressed in an S-S model. Efficient Optimising Dynamics in a Hopfield-style Network Arun Jagota, State University of New York at Buffalo Definition: A set of vertices (nodes, points) of a graph such that every pair is connected by an edge (arc, line) is called a clique. An example in the ANN context is a set of units such that all pairs are mutually excitatory. The following description applies to optimisation issues in any problems that can be modeled with cliques. Background I: We earlier proposed a variant (essentially a special case) of the Hopfield network which we called the Hopfield-Style Network (HSN). We showed that the stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of a local minimum (stable state) is directly proportional (although not linearly) to the size of the corresponding clique. Any graph can be made the underlying graph of HSN. 
These three facts suggest that HSN with suitable optimising dynamics can be applied to the CLIQUE (optimisation) problem, namely that of ``Finding the largest clique in any given graph''. Background II: The CLIQUE problem is NP-Complete, suggesting that it is most likely intractable. Recent results from Computer Science suggest that even approximately solving this problem is probably hard. Researchers have shown that on most (random) graphs, however, it can be approximated fairly well. The CLIQUE problem has many applications, including: (1) Content-addressable memories can be modeled as cliques. (2) Constraint-Satisfaction Problems (CSPs) can be represented as the CLIQUE problem. Many problems in AI from Computer Vision, NLP, KR, etc. have been cast as CSPs. (3) Certain object recognition problems in Computer Vision can be modeled as the CLIQUE problem. Given an image object A and a reference object B, one problem is to find a sub-object of B which ``matches'' A. This can be represented as the CLIQUE problem. Abstract: We will present details of the modeling of optimisation problems related to those described in Background II (and perhaps others) as the CLIQUE problem. We will discuss how HSN can be used to obtain optimal or approximate solutions. In particular, we will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation capabilities, and present theoretical and/or empirical evidence for such. The dynamics are: Discrete: (1) Steepest gradient-descent, (2) rho-annealing; Continuous: (3) Mean-field annealing. We will discuss characterising properties of these dynamics, including: (1) emulates a well-known graph algorithm, (2) is suited only for HSN, (3) originates from statistical mechanics and has gained wide attention for its optimisation properties. We will also discuss the continuous Hopfield network dynamics as a special case of (3). State Generators and Complex Neural Memories Subhash C. Kak, Louisiana State University The mechanism of self-indexing for feedback neural networks that generates memories from short subsequences is generalized so that a single bit together with an appropriate update order suffices for each memory. This mechanism can explain how stimulating an appropriate neuron can then recall a memory. Although the information is distributed in this model, our self-indexing mechanism [1] makes it appear localized. Also a new complex-valued neuron model is presented to generalize McCulloch-Pitts neurons. There are aspects to biological memory that are distributed [2] and others that are localized [3]. In the currently popular artificial neural network models the synaptic weights reflect the stored memories, which are thus distributed over the network. The question then arises whether these models can explain Penfield's observations on memory localization. This paper shows that such memory localization does occur in these models if self-indexing is used. It is also shown how a generalization of the McCulloch-Pitts model of neurons appears essential in order to account for certain aspects of distributed information processing. One particular generalization, described in the paper, allows one to deal with some recent findings of Optican & Richmond [4]. Consider the model of the mind where each mental event corresponds to some neural event. Neurons that deal with mental events may be called cognitive neurons. There would be other neurons that simply compute without cognitive function.
Consider now cognitive neurons dealing with sensory input that directly affects their behaviour. We now show that independent cognitive centers will lead to competing behaviour. Even non-competing cognitive centres would show odd behaviour since collective choice is associated with non-transitive logic. This is clear from the ordered choice paradox that occurs for any collection of cognitive individuals. This indicates that a scalar energy function cannot be associated with a neural network that performs logical processing. Because if that were possible then all choices made by a network could be defined in an unambiguous hierarchy, with at worst more than one choice having a particular value. The question of cyclicity of choices, as in the ordered choice paradox, will not arise. References [1] Kak, S.C. (1990a). Physics Letters A 143, 293. [2] Lashley, K.S. (1963). Brain Mechanisms and Learning. New York: Dover). [3] Penfield, P. & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press). [4] Optican, L.M. & Richmond, B.J. (1987). J. Neurophysiol. 57, 162. Don't Just Stand There, Optimize Something! Daniel Levine, University of Texas at Arlington Perspectives on optimization in a variety of disciplines, including physics, biology, psychology, and economics, are reviewed. The major debate is over whether optimization is a description of nature, a normative prescription, both or neither. The presenter leans toward a belief that optimization is a normative prescription and not a description of nature. In neural network theory, the attempt to explain all behavior as the optimization of some variable (no matter how tortuously defined the variable is!) has spawned some work that has been seminal to the field. This includes both the "hedonistic neuron" theory of Harry Klopf, which led to some important work in conditioning theory and robotics, and the "dynamic programming" of Paul Werbos which led to back propagation networks. Yet if all human behavior is optimal, this means that no improvement is possible on wasteful wars, environmental destruction, and unjust social systems. The presenter will review work on the effects of frontal lobe damage, specifically the dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty, and describe these effects as prototypes of non-optimal cognitive function. It can be argued (David Stork, personal communication) that lesion effects do not demonstrate non-optimality because they are the result of system malfunction. If that is so, then such malfunction is far more pervasive than generally believed and is not dependent on actual brain damage. Architectural principles such as associative learning, lateral inhibition, opponent processing, and resonant feedback, which enable us to interact with a complex environment also sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980). Even intact frontal lobes do not always perform their executive function (Pribram, 1991) with optimal efficiency. References Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press. Pribram, K. (1991). Brain and Perception. Erlbaum. For What Are Brains Striving? Gershom-Zvi Rosenstein, Hebrew University My aim is to outline a possibility of a unified approach to several yet unsolved problems of behavioral regulation, most of them related to the puzzle of schizophrenia. 
This Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only recently in the book of the present author [1]. One of the main problems the approach was applied to is the model of behavior disturbances. The income (the value of the goal-function of our model) is defined, by assumption, on the intensities of streams of impulses directed to the reward system. The income can be accumulated and spent on different activities of the model. The choices made by the model depend on the income they are expected to bring. Now the ICA is applied to the following problems: The catecholamine distribution change (CDC) in the schizophrenic brain. I try to prove the idea that CDC is caused by the same augmented (in comparison with the norm) stimulation of the reward system that was proposed by us earlier as a possible cause for the behavioral disturbance. The role of dopamine in the brain processing of information is discussed. The dopamine is seen as one of the forms of representation of income in the brain. The main difference between the psychology of "normal" and schizophrenic subjects, according to many researchers, is that in schizophrenics "observations prevail over expectations." This property can be shown to be a formal consequence of our model. It was used earlier to describe the behavior of schizophrenics versus normal people in delusion formation (as Scharpantje delusion, etc.). ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. In fact, that hypothesis can be concluded from ICA if some simple and natural assumptions are accepted. A hypothesis about the nature of stereotypes as an adjunctive type of behavior is proposed. They are seen as behaviors concerned not with the direct physiological needs of the organism but with the regulation of activity of its reward system. The proposition can be tested partly in animal experiments. The problem of origination of so-called "positive" and "negative" symptoms in schizophrenia is discussed. The positive symptoms are seen as attempts and sometimes means to produce an additional income for the brain whose external sources of income are severely limited. The negative symptoms are seen as behaviors chosen in the condition whereby the quantity of income that can be used to provide these behaviors is small and cannot be increased. The last part of the presentation is dedicated to the old problem of the relationship between "genius" and schizophrenia. It is a continuation of material introduced in [1]. The remark is made that the phenomenon of uric acid excess, thought by some investigators to be connected to high intellectual achievement, can be related to the uric acid excess found to be produced by augmented stimulation of the reward system in the self-stimulation paradigm. References [1] Rosenstein, G.-Z. (1991). Income and Choice in Biological Systems. Lawrence Erlbaum Associates. Non-optimality in Neurobiological Systems David Stork, Ricoh California Research Center I will argue strongly, in two ways, that neurobiological systems are "non-optimal." I note that "optimal" implies a match between some (human) notion of function (or structure,...) and the implementation itself. My first argument addresses the dubious approach which tries to impose notions of what is being optimized, i.e., stating what the desired function is. For instance, Gabor-function theorists claim that human visual receptive fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the spatial-frequency domains [1]. I demonstrate how such bandwidth notions have an implied measure, or metric, of localization; I examine the implied metric and find little or no justification for preferring it over any of a number of other plausible metrics [2]. I also show that the visual system has an overabundance of visual cortical cells (by a factor of 500) relative to what is implied by the Gabor approach; thus the Gabor approach makes this important fact hard to understand. Then I review arguments of others describing visual receptive fields as being "optimally" tuned to visual gratings [3], and show that here too an implied metric is unjustified [4]. These considerations lead to skepticism of the general approach of imposing or guessing the actual "true" function of neural systems, even in specific mathematical cases. Only in the most compelling cases can the function be stated confidently. My second criticism of the notion of optimality is that even if in such extreme cases the neurobiological function is known, biological systems generally do not implement it in an "optimal" way. I demonstrate this for a non-optimal ("useless") synapse in the crayfish tailflip circuit. Such non-optimality can be well explained by appealing to the process of preadaptation from evolutionary theory [5,6]. If a neural circuit (or organ, or behavior ...) which evolves to solve one problem is later called upon to solve a different problem, then the evolving circuit must be built upon the structure appropriate to the previous task. Thus, for instance, the non-optimal synapse in the crayfish tail-flipping circuit can be understood as a holdover from a previous evolutionary epoch in which the circuit was used instead for swimming. In other words, evolutionary processes are gradual, and even if locally optimal (i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. (This is analogous to local minima that plague some gradient-descent methods in mathematics.) Such an analysis highlights the role of evolutionary history in understanding the structure and function of current neurobiological systems, and along with our previous analysis, strongly argues against optimality in neurobiological systems. I therefore concur with the recent statement that in neural systems "... elegance of design counts for little." [7] References [1] Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2, 1160-1169. [2] Stork, D. G. & Wilson, H. R. (1990). Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7, 1362-1373. [3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980). Visual cortical neurons: Are bars or gratings the optimal stimuli? Science 207, 88-90. [4] Stork, D. G. & Levinson, J. Z. (1982). Receptive fields and the optimal stimulus. Science 216, 204-205. [5] Stork, D. G., Jackson, B., & Walker, S. (1991). "Non-optimality" via preadaptation in simple neural systems. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II. Addison-Wesley and Santa Fe Institute, pp. 409-429. [6] Stork, D. G. (1992, in press). Preadaptation and principles of organization in organisms. In A. Baskin & J.
Mittenthal (Eds.), Principles of Organization in Organisms. Addison-Wesley and Santa Fe Institute. [7] Dumont, J. P. C. & Robertson, R. M. (1986). Neuronal circuits: An evolutionary perspective. Science 233, 849-853. Why Do We Study Neural Nets on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Harold Szu, Naval Surface Warfare Center Neural nets, natural or artificial, are structured by the desire to accomplish certain information processing goals. An example of this occurs in the celebrated exclusive-OR computation. "You are as intelligent as you can hear and see," according to an ancient Chinese saying. Thus, the mismatch between human-created sensor data used for input knowledge representation and the nature-evolved brain-style computing architectures is one of the major impediments for neural net applications. After a review of classical neural nets with fixed layered architectures and small-perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one of the most challenging frontiers: the self-architecturing of neural nets. The singlet and pair correlation functions can be measured to define a hairy neuron model. The minimum set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form a selective network. Then, the convergence proof for self-architecturing hairy neurons will be given. A more powerful tool, however, is the wavelet transform, an adaptive wide-band Fourier analysis developed in 1985 by French oil explorers. This transform goes beyond the (preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a noisy environment. The utility of wavelets in brain-style computing can be recognized from two observations. First, the "cocktail party effect," namely, you hear what you wish to hear, can be explained by the wavelet matched filter, which can achieve a tremendous bandwidth noise reduction. Second, "end-cut" contour filling may be described by Gibbs overshooting in this wavelet manner. In other words, wavelets form a very natural way of describing real scenes and real signals. For this reason, it seems likely that the future of neural net applications may be in learning to do wavelet analyses by a self-learning of the "mother wavelet" that is most appropriate for a specific dynamic input-output medium. Optimal Generalisation in Artificial Neural Networks Graham Tattersall, University of East Anglia A key property of artificial neural networks is their ability to produce a useful output when presented with an input of previously unseen data, even if the network has only been trained on a small set of examples of the input-output function underlying the data. This process is called generalisation and is effectively a form of function completion. ANNs such as the MLP and RBF sometimes appear to work effectively as generalisers on this type of problem, but there is now widespread recognition that the form of generalisation which arises is very dependent on the architecture of the ANN, and is often completely inappropriate, particularly when dealing with symbolic data. This paper will argue that generalisation should be done in such a way that the chosen completion is the most probable and is consistent with the training examples.
These criteria dictate that the generalisation should not depend in any way upon the architecture or functionality of components of the generalising system, and that the generalisation will depend entirely on the statistics of the training exemplars. A practical method for generalising in accordance with the probability and consistency criteria, is to find the minimum entropy generalisation using the Shannon-Hartley relationship between entropy and spatial bandwidth. The usefulness of this approach can be demonstrated using a number of binary data functions which contain both first and higher order structure. However, this work has shown very clearly that, in the absence of an architecturally imposed generalisation strategy, many function completions are equally possible unless a very large proportion of all possible function domain points are contained in the training set. It therefore appears desirable to design generalising systems such as neural networks so that they generalise, not only in accordance with the optimal generalisation criteria of maximum probability and training set consistency, but also subject to a generalisation strategy which is specified by the user. Two approaches to the imposition of a generalisation strategy are described. In the first method, the characteristic autocorrelation function or functions belonging to a specified family are used as the weight set in a Kosko net. The second method uses Wiener Filtering to remove the "noise" implicit in an incomplete description of a function. The transfer function of the Wiener Filter is specific to a particular generalisation strategy. E-mail Addresses of Presenters Abdi abdi at utdallas.edu Bengio bengio at iro.umontreal.ca Bhaumik netearth!bhaumik at shakti.ernet.in Buhmann jb at s1.gov Candelaria de Ram sylvia at nmsu.edu Carpenter gail at park.bu.edu Chance u0503aa at vms.ucc.okstate.edu DeYong mdeyong at nmsu.edu Elsberry elsberry at hellfire.pnl.gov Golden golden at utdallas.edu Grossberg steve at park.bu.edu Hampson hampson at ics.uci.edu Jagota jagota at cs.buffalo.edu Johnson ecjdj at nve.mcsr.olemiss.edu Kak kak at max.ee.lsu.edu Leven (reach via Pribram, see below) Levine b344dsl at utarlg.uta.edu Ogmen elee52f at jetson.uh.edu Parberry ian at hercule.csci.unt.edu Pribram kpribram at ruacad.runet.edu Prueitt prueitt at guvax.georgetown.edu Rosenstein NONE Stork stork at crc.ricoh.com Szu btelfe at bagheera.nswc.navy.mil Tattersall ? From B344DSL at UTARLG.UTA.EDU Mon Jan 20 20:09:00 1992 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Mon, 20 Jan 1992 19:09 CST Subject: Registration form and tentative program for conference previously announced. Message-ID: <01GFJT21UXPC0001UF@utarlg.uta.edu> TENTATIVE schedule for Optimality Conference, UT Dallas, Feb. 6-8, 1992 ORAL PRESENTATIONS -- Thursday, Feb. 6, AM: Daniel Levine, U. of Texas, Arlington -- Don't Just Stand There, Optimize Something! Samuel Leven, Radford U. -- (title to be announced) Mark Deyong, New Mexico State U. -- Properties of Optimality in Neural Networks Wesley Elsberry, Battelle Research Labs -- Putting Optimality inits Place: Argument on Context, Systems, and Neural Networks Graham Tattersall, University of East Anglia -- Optimal Generalisation in Artificial Neural Networks Thursday, Feb. 6, PM: Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model Ian Parberry, University of North Texas -- (title to be announced) Richard Golden, U. 
of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network Friday, Feb. 7, AM: Gershom Rosenstein, Hebrew University -- For What are Brains Striving? Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of Nonstationary Data by Self-Organizing Neural Networks Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics Friday, Feb. 7, PM: David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a Classical Conditioning Network Samy Bengio, Universit de Montral -- On the Optimization of a Synaptic Learning Rule Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Saturday, Feb. 8, AM: Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive Processes? Herve Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator Paul Prueitt, Georgetown University -- (title to be announced) Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Saturday, Feb. 8, PM: Panel discussion on the basic themes of the conference POSTERS Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by Competitive Neural Networks John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE (alphabetical by first author): Generalization of the Linear Auto-Associator Herve Abdi, Dominique Valentin, and Alice J. O'Toole University of Texas at Dallas The classical auto-associator can be used to model some processes in prototype abstraction. In particular, the eigenvectors of the auto-associative matrix have been interpreted as prototypes or macro-features (Anderson et al, 1977, Abdi, 1988, O'Toole and Abdi, 1989). It has also been noted that computing these eigenvectors is equivalent to performing the principal component analysis of the matrix of objects to be stored in the memory. This paper describes a generalization of the linear auto-associator in which units (i.e., cells) can be of differential importance, or can be non-independent or can have a bias. The stimuli to be stored in the memory can have also a differential importance (or can be non-independent). The constraints expressing response bias and differential importance of stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule is applied to the weight matrix in a generalized form which takes the bias and the differential importance constraints into account to compute the error. 
Conditions for the convergence of the learning rule are examined and convergence is shown to be dependent only on the ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix. The maximal responses of the memory correspond to generalized eigenvectors, these vectors are biased-orthogonal (i.e., they are orthogonal after the response bias is implemented). It is also shown that (with an appropriate choice of matrices for response biais and differential importance), the generalized auto-associator is able to implement the general linear model of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical correlation analysis, generalized principal component analysis, etc.) Applications and Monte Carlo simulation of the generalized auto-associator dealing with face processing will be presented and discussed. On the Optimization of a Synaptic Learning Rule Samy Bengio, Universit de Montral Yoshua Bengio, Massachusetts Institute of Technology Jocelyn Cloutier, Universit de Montral Jan Gecsei, Universit de Montral This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has local inputs, and is the same for many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms, in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, local optimization method (gradient descent) as well as global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as, of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments of classical conditioning for Aplysia yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimentional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Initial experiments can be found in [1, 2]. References [1] Bengio, Y. & Bengio, S. (1990). Learning a synaptic learning rule. Technical Report #751. Computer Science Department. Universit de Montral. [2] Bengio Y., Bengio S., & Cloutier, J. (1991). Learning a synaptic learning rule. IJCNN-91-Seattle. Complexity Optimized Data Clustering by Competitive Neural Networks Joachim Buhmann, Lawrence Livermore National Laboratory Hans Khnel, Technische Universitt Mnchen Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. 
Complexity Optimized Data Clustering by Competitive Neural Networks
Joachim Buhmann, Lawrence Livermore National Laboratory
Hans Kühnel, Technische Universität München

Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their occupation probabilities. An iterative version of complexity optimized clustering is implemented by an artificial neural network with winner-take-all connectivity. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy-constrained vector quantization or topological feature maps and competitive neural networks.
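The maximum-entropy step mentioned above can be sketched in a few lines. The Python fragment below runs deterministic-annealing soft clustering with cluster occupation probabilities; the data, schedule and fixed number of clusters are illustrative assumptions, and the complexity costs that let the full method choose the number of clusters automatically are not reproduced here.

    import numpy as np

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(m, 0.3, size=(60, 2))
                   for m in ((0, 0), (3, 3), (0, 3))])   # three generating clusters

    K = 3
    centers = X[rng.choice(len(X), K, replace=False)]
    rho = np.full(K, 1.0 / K)                 # cluster occupation probabilities
    for T in np.geomspace(5.0, 0.1, 80):      # slowly lower the "temperature"
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        p = rho * np.exp(-d2 / T)             # maximum-entropy (soft) assignments
        p /= p.sum(1, keepdims=True)
        rho = p.mean(0)                       # re-estimate occupation probabilities
        centers = (p.T @ X) / p.sum(0)[:, None]

    print(np.round(centers, 2))               # final cluster positions
    print(np.round(rho, 2))                   # final occupation probabilities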
Interactive Sub-systems of Natural Language and the Treatment of Specialized Function
Sylvia Candelaria de Ram, New Mexico State University

Context-sensitivity and rapidity of communication are two things that become ecological essentials as cognition advances. They become ``optimals'' as cognition develops into something elaborate, long-lasting, flexible, and social. For successful operation of language's default speech/gesture mode, articulation response must be rapid and context-sensitive. It does not follow that all linguistic cognitive function will or can be equally fast or re-equilibrating. But it may follow that articulation response mechanisms are specialized in different ways than those for other cognitive functions. The special properties of the varied mechanisms would then interact in language use. In actuality, our own architecture is of this sort [1,2,3,4]. Major formative effects on our language, society, and individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift (and adaptability) and hypercorrection effects (mitigated by emotion), so that we have multitudes of distinct but related languages and speech communities. Consider modelling the variety of co-functioning mechanisms for utterance and gesture articulation, interpretation, monitoring and selection. Wherein lies the source of the differing function in one and another mechanism? Suppose [parts of] mechanisms are treated as parts of a multi-layered, temporally parallel, staged architecture (like ours). The layers may be inter-connected selectively [6]. Any given portion may be designed to deal with particular sorts of excitation [7,8,9]. A multi-level belief/knowledge logic enhanced for such structures [10] has properties extraordinary for a logic, properties which point up some critical features of ``neural nets'' having optimization properties pertinent to intelligent, interactive systems.

References
[1] Candelaria de Ram, S. (1984). Genesis of the mechanism for sound change as suggested by auditory reflexes. Linguistic Association of the Southwest, El Paso.
[2] Candelaria de Ram, S. (1988). Neural feedback and causation of evolving speech styles. New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherches mathématiques, Montreal, October.
[3] Candelaria de Ram, S. (1989). Sociolinguistic style shift and recent evidence on `presemantic' loci of attention to fine acoustic difference. New Ways of Analyzing Language Variation joint with American Dialect Society (NWAV-XVIII/ADSC), Durham, NC, October.
[4] Candelaria de Ram, S. (1991b). Language processing: mental access and sublanguages. Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept. 1991.
[5] Candelaria de Ram, S. (1990b). The sensory basis of mind: feasibility and functionality of a phonetic sensory store. [Commentary on R. Näätänen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236.
[6] Candelaria de Ram, S. (1990c). Sensors & concepts: Grounded cognition. Working Session on Algebraic Approaches to Problem Solving and Representation, June 27-29, Briarcliff, NY.
[7] Candelaria de Ram, S. (1990a). Belief/knowledge dependency graphs with sensory groundings. Third Int. Symp. on Artificial Intelligence Applications of Engineering Design and Manufacturing in Industrialized and Developing Countries, Monterrey, Mexico, Oct. 22-26, pp. 103-110.
[8] Candelaria de Ram, S. (1991a). From sensors to concepts: Pragmasemantic system constructivity. Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91, Sophia-Antipolis, France, April 22-24. Also in Knowledge Modeling and Expertise Transfer, IOS Publishing, Paris, 1991, pp. 433-448.
[9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989). Reasoning using inheritance from a mixture of knowledge and beliefs. In S. Ramani, R. Chandrasekar, & K.S.R. Anjaneylu (Eds.), Knowledge Based Computer Systems. Delhi: Narosa, pp. 387-396; republished by Vedams Books International, New Delhi, 1990. Also in Lecture Notes in Computer Science series No. 444, Springer-Verlag, 1990.
[10] Candelaria de Ram, S. (1991c). Why to enter into dialogue is to come out with changed speech: Cross-linked modalities, emotion, and language. Second Invitational Venaco Workshop and European Speech Communication Association Tutorial and Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept. 16-20, 1991.

Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning
Gail Carpenter, Boston University

A neural network architecture for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors will be described. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, response, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (∧) and the MAX operator (∨) of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights corresponds to increasing sizes of category "boxes." Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; (iv) a letter recognition database; and (v) a medical database.
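Complement coding and the fuzzy-AND category choice are easy to see in miniature. The Python sketch below implements a stripped-down Fuzzy ART module (the unsupervised core underlying Fuzzy ARTMAP); match tracking and the associative map field are omitted, and the choice parameter, vigilance value and fast-learning update shown are standard in the ART literature but are not taken from the talk itself.

    import numpy as np

    def complement_code(a):
        """Concatenate on-cells a and off-cells 1-a, so |x| is constant."""
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])

    alpha, rho = 0.001, 0.75          # choice and vigilance parameters
    weights = []                      # one weight vector per committed category

    def present(a):
        x = complement_code(a)
        # order categories by the choice function |x ^ w_j| / (alpha + |w_j|)
        order = sorted(range(len(weights)),
                       key=lambda j: -np.minimum(x, weights[j]).sum()
                                     / (alpha + weights[j].sum()))
        for j in order:
            match = np.minimum(x, weights[j]).sum() / x.sum()
            if match >= rho:                                  # vigilance test
                weights[j] = np.minimum(x, weights[j])        # fast learning
                return j
        weights.append(x.copy())                              # commit new category
        return len(weights) - 1

    for a in ([0.1, 0.2], [0.15, 0.25], [0.9, 0.8]):
        print(present(a))             # the first two inputs share a category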
Properties of Optimality in Neural Networks
Mark DeYong and Thomas Eskridge, New Mexico State University

This presentation discusses issues concerning optimality in neural and cognitive functioning. We discuss these issues in terms of the tradeoffs they impose on the design of neural network systems. We illustrate the issues with example systems based on a novel VLSI neural processing element developed, fabricated, and tested by the first author. There are four general issues of interest:

- Biological Realism vs. Computational Power. Many implementations of neurons sacrifice computational power for biological realism. Biological realism imposes a set of constraints on the structure and timing of certain operations in the neuron. Taken as an absolute requirement, these constraints, though realistic, reduce the computational power of individual neurons, and of systems built on those neurons. However, to ignore the biological characteristics of neurons is to ignore the best example of the type of device we are trying to implement. In particular, simple non-biologically inspired neurons perform a completely different style of processing than biologically inspired ones. Our work allows for biological realism in areas where it increases computational power, while ignoring the low-level details that are simply by-products of organic systems.

- Task-Specific Architecture vs. Uniform Element, Massive Parallelism. A central issue in developing neural network systems is whether to design networks specific to a particular task or to adapt a general-purpose network to accomplish the task. Developing task-specific architectures allows for small, fast networks that approach optimality in performance, but requires more effort during the design stage. General-purpose architectures approach optimality in design, since the design merely needs to be adapted via weight modifications to a new problem, but they suffer from performance inefficiencies due to unneeded and/or redundant nodes. Our work hypothesizes that task-specific architectures that use a building-block approach combined with fine-tuning by training will produce the greatest benefits in the tradeoff between design and performance optimality.

- Time Independence vs. Time Dependence. Many neural networks assume that each input vector is independent of other inputs, and the job of the neural network is to extract patterns within the input vector that are sufficient to characterize it. For problems of this type, a network that assumes time independence will provide acceptable performance. However, if the input vectors cannot be assumed to be independent, the network must process the vector with respect to its temporal characteristics. Networks that assume time independence have a variety of well-known training and performance algorithms, but will be unwieldy when applied to a problem in which time independence does not hold. Although temporal characteristics can be converted into levels, there will be a loss of information that may be critical to solving the problem efficiently. Networks that assume time dependence have the advantage of being able to handle both time dependent and time independent data, but do not have well known, generally applicable training and performance algorithms. Our approach is to assume time dependence, with the goal of handling a larger range of problems rather than having general training and performance methods.

- Hybrid Implementation vs. Analog or Digital Only. The optimality of hardware implementations of neural networks depends in part on the resolution of the second tradeoff mentioned above. Analog devices generally afford faster processing at a lower hardware overhead than digital, whereas digital devices provide noise immunity and a building-block approach to system design. Our work adopts a hybrid approach where the internal computation of the neuron is implemented in analog, and the extracellular communication is performed digitally. This gives the best of both worlds: the speed and low hardware overhead of analog and the noise immunity and building-block nature of digital components.

Each of these issues has individual ramifications for neural network design, but optimality of the overall system must be viewed as their composite. Thus, design decisions made in one area will constrain the decisions that can be made in the other areas.
Putting Optimality in its Place: Arguments on Context, Systems and Neural Networks
Wesley Elsberry, Battelle Research Laboratories

Determining the "optimality" of a particular neural network should be an exercise in multivariate analysis. Too often, performance concerning a narrowly defined problem has been accepted as prima facie evidence that some ANN architecture has a specific level of optimality. Taking a cue from the field of genetic algorithms (and the theory of natural selection from which GAs are derived), I offer the observation that optimality is selected in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the system of which it is a part. The context in which the evaluation of optimality is performed will influence the results of that evaluation greatly. While compartmentalized and specialized tests of ANN performance can offer insights, the construction of effective systems may require additional consideration to be given to the assumptions of such tests. Many benchmarks and other tests assume a static problem set, while many real-world applications offer dynamical problems. An ANN which performs "optimally" in a test may perform miserably in a putatively similar real-world application. Recognizing the assumptions which underlie evaluations is important for issues of optimal system design; recognizing the need for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is critical to proper placement of the priority given to optimality of ANNs.

Identifying a Neural Network's Computational Goals: A Statistical Optimization Perspective
Richard M. Golden, University of Texas at Dallas

The importance of identifying the computational goal of a neural network computation is first considered from the perspective of Marr's levels-of-description theory and Simon's theory of satisficing. A "statistical optimization perspective" is proposed as a specific implementation of the more general theories of Marr and Simon. The empirical "testability" of the "statistical optimization perspective" is also considered.
It is argued that although such a hypothesis is only occasionally empirically testable, such a hypothesis plays a fundamental role in understanding complex information processing systems. The usefulness of the above theoretical framework is then considered with respect to both artificial neural networks and biological neural networks. An argument is made that almost any artificial neural network may be viewed as optimizing a statistical cost function. To support this claim, the large class of back-propagation feed-forward artificial neural networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed as optimizing specific statistical cost functions. Specific statistical tests for deciding whether the statistical environment of the neural network is "compatible" with the statistical cost function the network is presumably optimizing are also proposed. Next, some ideas regarding the applicability of such analyses to much more complicated artificial neural networks which are "closer approximations" to real biological neural networks will also be discussed.

Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control
Stephen Grossberg, Boston University

This talk describes a new class of neural models for unsupervised error-based learning. Such a Vector Associative Map, or VAM, is capable of autonomously calibrating the spatial maps and arm trajectory parameters used during visually guided reaching. VAMs illustrate how spatial and motor representations can self-organize using a unified computational format. They clarify how an autonomous agent can build a self-optimizing hierarchy of goal-oriented actions based upon more primitive, endogenously generated exploration of its environment. Computational properties of ART and VAM systems are complementary. This complementarity reflects different processing requirements of sensory-cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be used to design an autonomous behavioral agent.

Problem Solving in a Connectionistic World Model
Steven Hampson, University of California at Irvine

Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S) models of problem solving are central to animal learning theory. When applicable, the procedural S-R and S-E models can be quite space efficient, as they can potentially learn compact generalizations over the functions they are taught to compute. On the other hand, learning these generalizations can be quite time consuming, and adjusting them when conditions change can take as long as learning them in the first place. In contrast, the S-S model developed here does not learn a particular input-to-output mapping, but simply records a series of "facts" about possible state transitions in the world. This declarative world model provides fast learning, easy update and flexible use, but is space expensive. The procedural/declarative distinction observed in biological behavior suggests that both types of mechanisms are available to an organism in its attempts to balance, if not optimize, both time and space requirements. The work presented here investigates the type of problems that are most effectively addressed in an S-S model.
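The procedural/declarative contrast drawn in the Hampson abstract can be illustrated with a toy declarative world model: transitions are stored as individual facts and plans are produced by search, so a single new fact immediately changes behaviour. The world, action names and breadth-first planner below are illustrative assumptions, not Hampson's model.

    from collections import defaultdict, deque

    transitions = defaultdict(set)        # state -> set of (action, next_state) facts

    def observe(state, action, next_state):
        """Declarative learning: one observation adds one fact (fast, easy to update)."""
        transitions[state].add((action, next_state))

    def plan(start, goal):
        """Breadth-first search over stored facts returns a shortest action sequence."""
        frontier, seen = deque([(start, [])]), {start}
        while frontier:
            state, path = frontier.popleft()
            if state == goal:
                return path
            for action, nxt in transitions[state]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [action]))
        return None

    for s in range(4):                    # record a few transitions in a 1-D world
        observe(s, "right", s + 1)
        observe(s + 1, "left", s)
    print(plan(0, 4))                     # ['right', 'right', 'right', 'right']
    observe(0, "jump", 4)                 # one new fact immediately updates the model
    print(plan(0, 4))                     # ['jump']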
Efficient Optimising Dynamics in a Hopfield-style Network
Arun Jagota, State University of New York at Buffalo

Definition: A set of vertices (nodes, points) of a graph such that every pair is connected by an edge (arc, line) is called a clique. An example in the ANN context is a set of units such that all pairs are mutually excitatory. The following description applies to optimisation issues in any problems that can be modeled with cliques.

Background I: We earlier proposed a variant (essentially a special case) of the Hopfield network which we called the Hopfield-Style Network (HSN). We showed that the stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of a local minimum (stable state) is directly proportional (although not linearly) to the size of the corresponding clique. Any graph can be made the underlying graph of HSN. These three facts suggest that HSN with suitable optimising dynamics can be applied to the CLIQUE (optimisation) problem, namely that of ``finding the largest clique in any given graph''.

Background II: The CLIQUE problem is NP-Complete, suggesting that it is most likely intractable. Recent results from Computer Science suggest that even approximately solving this problem is probably hard. Researchers have shown that on most (random) graphs, however, it can be approximated fairly well. The CLIQUE problem has many applications, including: (1) content-addressable memories can be modeled as cliques; (2) Constraint-Satisfaction Problems (CSPs) can be represented as the CLIQUE problem, and many problems in AI from Computer Vision, NLP, KR, etc. have been cast as CSPs; (3) certain object recognition problems in Computer Vision can be modeled as the CLIQUE problem: given an image object A and a reference object B, one problem is to find a sub-object of B which ``matches'' A, and this can be represented as the CLIQUE problem.

Abstract: We will present details of the modeling of optimisation problems related to those described in Background II (and perhaps others) as the CLIQUE problem. We will discuss how HSN can be used to obtain optimal or approximate solutions. In particular, we will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation capabilities, and present theoretical and/or empirical evidence for them. The dynamics are: Discrete: (1) steepest gradient-descent, (2) rho-annealing; Continuous: (3) mean-field annealing. We will discuss characterising properties of these dynamics, including: (1) emulates a well-known graph algorithm; (2) is suited only for HSN; (3) originates from statistical mechanics and has gained wide attention for its optimisation properties. We will also discuss the continuous Hopfield network dynamics as a special case of (3).
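For readers unfamiliar with the clique connection, the following short Python sketch grows a maximal clique of a random graph with a greedy, steepest-descent-style heuristic and verifies the result. It is a generic illustration of the optimisation problem, not the HSN dynamics themselves; the graph size and edge probability are arbitrary.

    import itertools
    import random

    random.seed(3)
    n, p = 30, 0.4
    adj = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if random.random() < p:
            adj[i].add(j)
            adj[j].add(i)

    def greedy_maximal_clique(adj):
        clique, candidates = set(), set(adj)
        while candidates:
            # pick the candidate with the most neighbours among remaining candidates
            v = max(candidates, key=lambda u: len(adj[u] & candidates))
            clique.add(v)
            candidates &= adj[v]          # keep only vertices adjacent to every member
        return clique

    c = greedy_maximal_clique(adj)
    assert all(j in adj[i] for i, j in itertools.combinations(c, 2))   # it is a clique
    print(len(c), sorted(c))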
State Generators and Complex Neural Memories
Subhash C. Kak, Louisiana State University

The mechanism of self-indexing for feedback neural networks that generates memories from short subsequences is generalized so that a single bit together with an appropriate update order suffices for each memory. This mechanism can explain how stimulating an appropriate neuron can then recall a memory. Although the information is distributed in this model, our self-indexing mechanism [1] makes it appear localized. Also, a new complex-valued neuron model is presented to generalize McCulloch-Pitts neurons. There are aspects to biological memory that are distributed [2] and others that are localized [3]. In the currently popular artificial neural network models the synaptic weights reflect the stored memories, which are thus distributed over the network. The question then arises whether these models can explain Penfield's observations on memory localization. This paper shows that such memory localization does occur in these models if self-indexing is used. It is also shown how a generalization of the McCulloch-Pitts model of neurons appears essential in order to account for certain aspects of distributed information processing. One particular generalization, described in the paper, allows one to deal with some recent findings of Optican & Richmond [4]. Consider the model of the mind where each mental event corresponds to some neural event. Neurons that deal with mental events may be called cognitive neurons. There would be other neurons that simply compute without cognitive function. Consider now cognitive neurons dealing with sensory input that directly affects their behaviour. We now show that independent cognitive centers will lead to competing behaviour. Even non-competing cognitive centres would show odd behaviour, since collective choice is associated with non-transitive logic. This is clear from the ordered choice paradox that occurs for any collection of cognitive individuals. This indicates that a scalar energy function cannot be associated with a neural network that performs logical processing, for if that were possible then all choices made by a network could be defined in an unambiguous hierarchy, with at worst more than one choice having a particular value, and the question of cyclicity of choices, as in the ordered choice paradox, would not arise.

References
[1] Kak, S.C. (1990a). Physics Letters A 143, 293.
[2] Lashley, K.S. (1963). Brain Mechanisms and Learning. New York: Dover.
[3] Penfield, W. & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press.
[4] Optican, L.M. & Richmond, B.J. (1987). J. Neurophysiol. 57, 162.

Don't Just Stand There, Optimize Something!
Daniel Levine, University of Texas at Arlington

Perspectives on optimization in a variety of disciplines, including physics, biology, psychology, and economics, are reviewed. The major debate is over whether optimization is a description of nature, a normative prescription, both, or neither. The presenter leans toward a belief that optimization is a normative prescription and not a description of nature. In neural network theory, the attempt to explain all behavior as the optimization of some variable (no matter how tortuously defined the variable is!) has spawned some work that has been seminal to the field. This includes both the "hedonistic neuron" theory of Harry Klopf, which led to some important work in conditioning theory and robotics, and the "dynamic programming" of Paul Werbos, which led to back propagation networks. Yet if all human behavior is optimal, this means that no improvement is possible on wasteful wars, environmental destruction, and unjust social systems. The presenter will review work on the effects of frontal lobe damage, specifically the dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty, and describe these effects as prototypes of non-optimal cognitive function. It can be argued (David Stork, personal communication) that lesion effects do not demonstrate non-optimality because they are the result of system malfunction.
If that is so, then such malfunction is far more pervasive than generally believed and is not dependent on actual brain damage. Architectural principles such as associative learning, lateral inhibition, opponent processing, and resonant feedback, which enable us to interact with a complex environment, also sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980). Even intact frontal lobes do not always perform their executive function (Pribram, 1991) with optimal efficiency.

References
Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.
Pribram, K. (1991). Brain and Perception. Erlbaum.

For What Are Brains Striving?
Gershom-Zvi Rosenstein, Hebrew University

My aim is to outline a possibility of a unified approach to several yet unsolved problems of behavioral regulation, most of them related to the puzzle of schizophrenia. This Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only recently in the present author's book [1]. One of the main problems the approach was applied to is the model of behavior disturbances. The income (the value of the goal-function of our model) is defined, by assumption, on the intensities of streams of impulses directed to the reward system. The income can be accumulated and spent on different activities of the model. The choices made by the model depend on the income they are expected to bring. Now the ICA is applied to the following problems: The catecholamine distribution change (CDC) in the schizophrenic brain. I try to prove the idea that CDC is caused by the same augmented (in comparison with the norm) stimulation of the reward system that was proposed by us earlier as a possible cause for the behavioral disturbance. The role of dopamine in the brain processing of information is discussed. The dopamine is seen as one of the forms of representation of income in the brain. The main difference between the psychology of "normal" and schizophrenic subjects, according to many researchers, is that in schizophrenics "observations prevail over expectations." This property can be shown to be a formal consequence of our model. It was used earlier to describe the behavior of schizophrenics versus normal people in delusion formation (as Scharpantje delusion, etc.). ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. In fact, that hypothesis can be deduced from ICA if some simple and natural assumptions are accepted. A hypothesis about the nature of stereotypes as an adjunctive type of behavior is proposed. They are seen as behaviors concerned not with the direct physiological needs of the organism but with the regulation of activity of its reward system. The proposition can be tested partly in animal experiments. The problem of origination of so-called "positive" and "negative" symptoms in schizophrenia is discussed. The positive symptoms are seen as attempts and sometimes means to produce an additional income for the brain whose external sources of income are severely limited. The negative symptoms are seen as behaviors chosen in the condition whereby the quantity of income that can be used to provide these behaviors is small and cannot be increased. The last part of the presentation is dedicated to the old problem of the relationship between "genius" and schizophrenia. It is a continuation of material introduced in [1].
The remark is made that the phenomenon of uric acid excess, thought by some investigators to be connected to high intellectual achievement, can be related to the uric acid excess found to be produced by augmented stimulation of the reward system in the self-stimulation paradigm.

References
[1] Rosenstein, G.-Z. (1991). Income and Choice in Biological Systems. Lawrence Erlbaum Associates.

Non-optimality in Neurobiological Systems
David Stork, Ricoh California Research Center

I will argue strongly, in two ways, that neurobiological systems are "non-optimal." I note that "optimal" implies a match between some (human) notion of function (or structure, ...) and the implementation itself. My first argument addresses the dubious approach which tries to impose notions of what is being optimized, i.e., stating what the desired function is. For instance, Gabor-function theorists claim that human visual receptive fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the spatial-frequency domains [1]. I demonstrate how such bandwidth notions have an implied measure, or metric, of localization; I examine the implied metric and find little or no justification for preferring it over any of a number of other plausible metrics [2]. I also show that the visual system has an overabundance of visual cortical cells (by a factor of 500) compared with what is implied by the Gabor approach; thus the Gabor approach makes this important fact hard to understand. Then I review arguments of others describing visual receptive fields as being "optimally" tuned to visual gratings [3], and show that here too an implied metric is unjustified [4]. These considerations lead to skepticism of the general approach of imposing or guessing the actual "true" function of neural systems, even in specific mathematical cases. Only in the most compelling cases can the function be stated confidently. My second criticism of the notion of optimality is that even if in such extreme cases the neurobiological function is known, biological systems generally do not implement it in an "optimal" way. I demonstrate this for a non-optimal ("useless") synapse in the crayfish tailflip circuit. Such non-optimality can be well explained by appealing to the process of preadaptation from evolutionary theory [5,6]. If a neural circuit (or organ, or behavior ...) which evolves to solve one problem is later called upon to solve a different problem, then the evolving circuit must be built upon the structure appropriate to the previous task. Thus, for instance, the non-optimal synapse in the crayfish tail flipping circuit can be understood as a holdover from a previous evolutionary epoch in which the circuit was used instead for swimming. In other words, evolutionary processes are gradual, and even if locally optimal (i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. (This is analogous to local minima that plague some gradient-descent methods in mathematics.) Such an analysis highlights the role of evolutionary history in understanding the structure and function of current neurobiological systems, and along with our previous analysis, strongly argues against optimality in neurobiological systems. I therefore concur with the recent statement that in neural systems "... elegance of design counts for little." [7]

References
[1] Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am.
A 2, 1160-1169.
[2] Stork, D. G. & Wilson, H. R. (1990). Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7, 1362-1373.
[3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980). Visual cortical neurons: Are bars or gratings the optimal stimuli? Science 207, 88-90.
[4] Stork, D. G. & Levinson, J. Z. (1982). Receptive fields and the optimal stimulus. Science 216, 204-205.
[5] Stork, D. G., Jackson, B., & Walker, S. (1991). "Non-optimality" via preadaptation in simple neural systems. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II. Addison-Wesley and Santa Fe Institute, pp. 409-429.
[6] Stork, D. G. (1992, in press). Preadaptation and principles of organization in organisms. In A. Baskin & J. Mittenthal (Eds.), Principles of Organization in Organisms. Addison-Wesley and Santa Fe Institute.
[7] Dumont, J. P. C. & Robertson, R. M. (1986). Neuronal circuits: An evolutionary perspective. Science 233, 849-853.

Why Do We Study Neural Nets on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing?
Harold Szu, Naval Surface Warfare Center

Neural nets, natural or artificial, are structured by the desire to accomplish certain information processing goals. An example of this occurs in the celebrated exclusive-OR computation. "You are as intelligent as you can hear and see," according to an ancient Chinese saying. Thus, the mismatch between human-created sensor data used for input knowledge representation and the nature-evolved brain-style computing architectures is one of the major impediments for neural net applications. After a review of classical neural nets with fixed layered architectures and small-perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one of the most challenging frontiers: the self-architecturing of neural nets. The singlet and pair correlation functions can be measured to define a hairy neuron model. The minimum set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form a selective network. Then, the convergence proof for self-architecturing hairy neurons will be given. A more powerful tool, however, is the wavelet transform, an adaptive wide-band Fourier analysis developed in 1985 by French oil explorers. This transform goes beyond the (preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a noisy environment. The utility of wavelets in brain-style computing can be recognized from two observations. First, the "cocktail party effect," namely, you hear what you wish to hear, can be explained by the wavelet matched filter which can achieve a tremendous bandwidth noise reduction. Second, "end-cut" contour filling may be described by Gibbs overshooting in this wavelet manner. In other words, wavelets form a very natural way of describing real scenes and real signals. For this reason, it seems likely that the future of neural net applications may be in learning to do wavelet analyses by a self-learning of the "mother wavelet" that is most appropriate for a specific dynamic input-output medium.
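The matched-filter observation in the Szu abstract is easy to reproduce numerically. The Python sketch below correlates a noisy signal with a Morlet-style "mother wavelet" and recovers the location of a transient burst; the sampling rate, wavelet width and carrier frequency are illustrative assumptions, not values from the talk.

    import numpy as np

    t = np.arange(0, 1, 1 / 1000.0)                   # 1 s sampled at 1 kHz
    burst = np.exp(-((t - 0.6) / 0.02) ** 2) * np.cos(2 * np.pi * 50 * (t - 0.6))
    rng = np.random.default_rng(4)
    signal = burst + 0.5 * rng.normal(size=t.size)    # burst buried in noise

    tau = np.arange(-0.1, 0.1, 1 / 1000.0)            # wavelet support
    morlet = np.exp(-(tau / 0.02) ** 2) * np.cos(2 * np.pi * 50 * tau)

    response = np.correlate(signal, morlet, mode="same")   # wavelet matched filter
    print("burst located near t =", t[np.argmax(np.abs(response))])   # about 0.6 s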
Optimal Generalisation in Artificial Neural Networks
Graham Tattersall, University of East Anglia

A key property of artificial neural networks is their ability to produce a useful output when presented with an input of previously unseen data, even if the network has only been trained on a small set of examples of the input-output function underlying the data. This process is called generalisation and is effectively a form of function completion. ANNs such as the MLP and RBF sometimes appear to work effectively as generalisers on this type of problem, but there is now widespread recognition that the form of generalisation which arises is very dependent on the architecture of the ANN, and is often completely inappropriate, particularly when dealing with symbolic data. This paper will argue that generalisation should be done in such a way that the chosen completion is the most probable and is consistent with the training examples. These criteria dictate that the generalisation should not depend in any way upon the architecture or functionality of components of the generalising system, and that the generalisation will depend entirely on the statistics of the training exemplars. A practical method for generalising in accordance with the probability and consistency criteria is to find the minimum entropy generalisation using the Shannon-Hartley relationship between entropy and spatial bandwidth. The usefulness of this approach can be demonstrated using a number of binary data functions which contain both first and higher order structure. However, this work has shown very clearly that, in the absence of an architecturally imposed generalisation strategy, many function completions are equally possible unless a very large proportion of all possible function domain points are contained in the training set. It therefore appears desirable to design generalising systems such as neural networks so that they generalise, not only in accordance with the optimal generalisation criteria of maximum probability and training set consistency, but also subject to a generalisation strategy which is specified by the user. Two approaches to the imposition of a generalisation strategy are described. In the first method, the characteristic autocorrelation function or functions belonging to a specified family are used as the weight set in a Kosko net. The second method uses Wiener Filtering to remove the "noise" implicit in an incomplete description of a function. The transfer function of the Wiener Filter is specific to a particular generalisation strategy.
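The bandwidth-based completion idea in the Tattersall abstract can be illustrated loosely as follows: complete a partially observed periodic function by least-squares fitting only its lowest spatial frequencies. This Python fragment conveys the spirit of a "minimum spatial bandwidth" completion only; it is not the paper's minimum-entropy procedure, and the function, sample points and cut-off frequency are arbitrary choices.

    import numpy as np

    N = 32
    x = np.arange(N)
    truth = np.sin(2 * np.pi * x / N) + 0.5 * np.cos(4 * np.pi * x / N)
    rng = np.random.default_rng(5)
    observed = rng.choice(N, size=12, replace=False)    # the training exemplars

    K = 3                                               # keep frequencies 0, 1, 2 only
    freqs = np.arange(K)
    design = np.hstack([np.cos(2 * np.pi * np.outer(x, freqs) / N),
                        np.sin(2 * np.pi * np.outer(x, freqs) / N)])
    coef, *_ = np.linalg.lstsq(design[observed], truth[observed], rcond=None)
    completion = design @ coef                          # band-limited completion

    unseen = np.setdiff1d(x, observed)
    print("max error on unseen points:", np.abs(completion - truth)[unseen].max())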
E-mail Addresses of Presenters
Abdi               abdi at utdallas.edu
Bengio             bengio at iro.umontreal.ca
Bhaumik            netearth!bhaumik at shakti.ernet.in
Buhmann            jb at s1.gov
Candelaria de Ram  sylvia at nmsu.edu
Carpenter          gail at park.bu.edu
Chance             u0503aa at vms.ucc.okstate.edu
DeYong             mdeyong at nmsu.edu
Elsberry           elsberry at hellfire.pnl.gov
Golden             golden at utdallas.edu
Grossberg          steve at park.bu.edu
Hampson            hampson at ics.uci.edu
Jagota             jagota at cs.buffalo.edu
Johnson            ecjdj at nve.mcsr.olemiss.edu
Kak                kak at max.ee.lsu.edu
Leven              (reach via Pribram, see below)
Levine             b344dsl at utarlg.uta.edu
Ogmen              elee52f at jetson.uh.edu
Parberry           ian at hercule.csci.unt.edu
Pribram            kpribram at ruacad.runet.edu
Prueitt            prueitt at guvax.georgetown.edu
Rosenstein         NONE
Stork              stork at crc.ricoh.com
Szu                btelfe at bagheera.nswc.navy.mil
Tattersall         ?

From terry at jeeves.UCSD.EDU Tue Jan 21 04:41:45 1992
From: terry at jeeves.UCSD.EDU (Terry Sejnowski)
Date: Tue, 21 Jan 92 01:41:45 PST
Subject: Neural Computation 4:1
Message-ID: <9201210941.AA11715@jeeves.UCSD.EDU>

Neural Computation Volume 4, Issue 1, January 1992

Review:
Neural Networks and the Bias/Variance Dilemma
Stuart Geman, Elie Bienenstock, and Rene Doursat

Article:
A Model for the Action of NMDA Conductances in the Visual Cortex
Kevin Fox and Nigel Daw

Letters:
Alternating and Synchronous Rhythms in Reciprocally Inhibitory Model Neurons
Xiao-Jing Wang and John Rinzel

Feature Extraction Using an Unsupervised Neural Network
Nathan Intrator

Speaker-Independent Digit Recognition Using a Neural Network with Time-Delayed Connections
K. P. Unnikrishnan, J. J. Hopfield, and D. W. Tank

Local Feedback Multilayered Networks
Paolo Frasconi, Marco Gori, and Giovanni Soda

Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks
Jurgen Schmidhuber

-----
SUBSCRIPTIONS - VOLUME 4 - BIMONTHLY (6 issues)
______ $40 Student
______ $65 Individual
______ $150 Institution
Add $12 for postage and handling outside USA (+7% for Canada).
(Back issues from Volumes 1-3 are available for $17 each.)
MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142. (617) 253-2889.
-----

From golden at utdallas.edu Tue Jan 21 12:55:06 1992
From: golden at utdallas.edu (Richard Golden)
Date: Tue, 21 Jan 1992 11:55:06 -0600
Subject: No subject
Message-ID: <92Jan21.115518cst.16234@utdallas.edu>

I apologize if this conference announcement was already sent out over the CONNECTIONISTS network.

CONFERENCE ANNOUNCEMENT: OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? AT UNIVERSITY OF TEXAS AT DALLAS

The following NEURAL NETWORK conference will be held at the University of Texas at Dallas February 6-9, 1992. It is sponsored by the Metroplex Institute for Neural Dynamics (MIND), the Texas SIG of the International Neural Network Society (INNS), and the University of Texas at Dallas.
-------------------
The conference will focus upon the following two themes: (1) How can particular neural functions be optimized in a network? (2) Are particular functions performed optimally by biological or artificial neural network architectures?
--------------------
Invited Speakers Include:
Gail Carpenter (Boston University)
Stephen Grossberg (Boston University)
Steven Hampson (U. of California Irvine)
Karl Pribram (Radford University)
David Stork (Ricoh Corporation)
Harold Szu (Naval Surface Warfare Center)
Graham Tattersall (University of East Anglia)
--------------------
The one-hour oral presentations will be non-overlapping.
Location: University of Texas at Dallas; Thursday and Friday at Conference Center, Saturday at Green Building Auditorium
Conference Hotel: Richardson Hilton and Towers
Conference Fees:
Student members of MIND or INNS or UTD: $10
Other students: $20
Non-student members of MIND or INNS: $80
Other non-students: $80
----------------------
Contacts:
Professor Dan Levine (University of Texas at Arlington)
Dr. Manuel Aparicio (IBM Research)
Professor Alice O'Toole (University of Texas at Dallas)
-------------------------
A registration form is included at the end of this email message after the tentative schedule. I am sure that "on-site" registration will be available as well but I do not know the details.
--------------------------------
TENTATIVE schedule for Optimality Conference, UT Dallas, Feb. 6-8, 1992

ORAL PRESENTATIONS --
Thursday, Feb. 6, AM:
Daniel Levine, U.
of Texas, Arlington -- Don't Just Stand There, Optimize Something! Samuel Leven, Radford U. -- (title to be announced) Mark Deyong, New Mexico State U. -- Properties of Optimality in Neural Networks Wesley Elsberry, Battelle Research Labs -- Putting Optimality inits Place: Argument on Context, Systems, and Neural Networks Graham Tattersall, University of East Anglia -- Optimal Generalisation in Artificial Neural Networks Thursday, Feb. 6, PM: Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model Ian Parberry, University of North Texas -- (title to be announced) Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network Friday, Feb. 7, AM: Gershom Rosenstein, Hebrew University -- For What are Brains Striving? Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of Nonstationary Data by Self-Organizing Neural Networks Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics Friday, Feb. 7, PM: David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a Classical Conditioning Network Samy Bengio, Universit de Montral -- On the Optimization of a Synaptic Learning Rule Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Saturday, Feb. 8, AM: Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive Processes? Herve Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator Paul Prueitt, Georgetown University -- (title to be announced) Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Saturday, Feb. 8, PM: Panel discussion on the basic themes of the conference POSTERS Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by Competitive Neural Networks John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE (alphabetical by first author): Generalization of the Linear Auto-Associator Herve Abdi, Dominique Valentin, and Alice J. O'Toole University of Texas at Dallas The classical auto-associator can be used to model some processes in prototype abstraction. In particular, the eigenvectors of the auto-associative matrix have been interpreted as prototypes or macro-features (Anderson et al, 1977, Abdi, 1988, O'Toole and Abdi, 1989). It has also been noted that computing these eigenvectors is equivalent to performing the principal component analysis of the matrix of objects to be stored in the memory. 
This paper describes a generalization of the linear auto-associator in which units (i.e., cells) can be of differential importance, can be non-independent, or can have a bias. The stimuli to be stored in the memory can also have a differential importance (or can be non-independent). The constraints expressing response bias and differential importance of stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule is applied to the weight matrix in a generalized form which takes the bias and the differential importance constraints into account to compute the error. Conditions for the convergence of the learning rule are examined, and convergence is shown to depend only on the ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix. The maximal responses of the memory correspond to generalized eigenvectors; these vectors are biased-orthogonal (i.e., they are orthogonal after the response bias is implemented). It is also shown that (with an appropriate choice of matrices for response bias and differential importance) the generalized auto-associator is able to implement the general linear model of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical correlation analysis, generalized principal component analysis, etc.). Applications and Monte Carlo simulation of the generalized auto-associator dealing with face processing will be presented and discussed.

On the Optimization of a Synaptic Learning Rule
Samy Bengio, Université de Montréal
Yoshua Bengio, Massachusetts Institute of Technology
Jocelyn Cloutier, Université de Montréal
Jan Gecsei, Université de Montréal

This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has local inputs and is the same for many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, a local optimization method (gradient descent) as well as a global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments on classical conditioning in Aplysia yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimensional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Initial experiments can be found in [1, 2].

References
[1] Bengio, Y. & Bengio, S. (1990). Learning a synaptic learning rule. Technical Report #751, Computer Science Department, Université de Montréal.
[2] Bengio, Y., Bengio, S., & Cloutier, J. (1991).
Learning a synaptic learning rule. IJCNN-91-Seattle. Complexity Optimized Data Clustering by Competitive Neural Networks Joachim Buhmann, Lawrence Livermore National Laboratory Hans Khnel, Technische Universitt Mnchen Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their occupation probabilities. An iterative version of complexity optimized clustering is imple- mented by an artificial neural network with winner-take-all connectivity. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy constrainted vector quantization or topological feature maps and competitive neural networks. Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Sylvia Candelaria de Ram, New Mexico State University Context-sensitivity and rapidity of communication are two things that become ecological essentials as cognition advances. They become ``optimals'' as cognition develops into something elaborate, long-lasting, flexible, and social. For successful operation of language's default speech/gesture mode, articulation response must be rapid and context-sensitive. It does not follow that all linguistic cognitive function will or can be equally fast or re-equilibrating. But it may follow that articulation response mechanisms are specialized in different ways than those for other cognitive functions. The special properties of the varied mechanisms would then interact in language use. In actuality, our own architecture is of this sort [1,2,3,4]. Major formative effects on our language, society, and individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift (and adaptability) and hypercorrection effects (mitigated by emotion), so that we have multitudes of distinct but related languages and speech communities. Consider modelling the variety of co-functioning mechanisms for utterance and gesture articulation, interpretation, monitoring and selection. Wherein lies the source of the differing function in one and another mechanism? Suppose [parts of] mechanisms are treated as parts of a multi-layered, temporally parallel, staged architecture (like ours). The layers may be inter-connected selectively [6]. Any given portion may be designed to deal with particular sorts of excitation [7,8,9]. A multi-level belief/knowledge logic enhanced for such structures [10] has properties extraordinary for a logic, properties which point up some critical features of ``neural nets'' having optimization properties pertinent to intelligent, interactive systems. References [1] Candelaria de Ram, S. (1984). Genesis of the mechanism for sound change as suggested by auditory reflexes. Linguistic Association of the Southwest, El Paso. [2] Candelaria de Ram, S. (1988). Neural feedback and causation of evolving speech styles. New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherces mathmatiques, Montreal, October. [3] Candelaria de Ram, S. (1989). 
Sociolinguistic style shift and recent evidence on `prese- mantic' loci of attention to fine acoustic difference. New Ways of Analyzing Language Variation joint with American Dialect Society (NWAV-XVIII/ ADSC), Durham, NC, October. [4] Candelaria de Ram, S. (1991b). Language processing: mental access and sublanguages. Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept. 1991. [5] Candelaria de Ram, S. (1990b). The sensory basis of mind: feasibility and functionality of a phonetic sensory store. [Commentary on R. Ntnen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236. [6] Candelaria de Ram, S. (1990c). Sensors & concepts: Grounded cognition. Working Session on Algebraic Approaches to Problem Solving and Representation, June 27-29, Briarcliff, NY. [7] Candelaria de Ram, S. (1990a). Belief/knowledge dependency graphs with sensory groundings. Third Int. Symp. on Artificial Intelligence Applications of Engineering Design and Manufacturing in Industrialized and Developing Countries, Monterrey, Mexico, Oct. 22-26, pp. 103-110. [8] Candelaria de Ram, S. (1991a). From sensors to concepts: Pragmasemantic system constructivity. Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91, Sophia-Antipolis, France, April 22-24. Also in Knowledge Modeling and Expertise Transfer, IOS Publishing, Paris, 1991, pp. 433-448. [9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989). Reasoning using inheritance from a mixture of knowledge and beliefs. In S. Ramani, R. Chandrasekar, & K.S.R. Anjaneylu (Eds.), Knowledge Based Computer Systems. Delhi: Narosa, pp. 387-396; republished by Vedams Books International, New Delhi, 1990. Also in Lecture Notes in Computer Science series No. 444, Springer-Verlag, 1990. [10] Candelaria de Ram, S. (1991c). Why to enter into dialogue is to come out with changed speech: Cross-linked modalities, emotion, and language. Second Invitational Venaco Workshop and European Speech Communication Association Tutorial and Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept. 16-20, 1991. Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning Gail Carpenter, Boston University A neural network architecture for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors will be described. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, response, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator () and the MAX operator () of fuzzy logic play complementary roles. 
Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights corresponds to increasing sizes of category "boxes." Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; (iv) a letter recognition database; and (v) a medical database. Properties of Optimality in Neural Networks Mark DeYong and Thomas Eskridge, New Mexico State University This presentation discusses issues concerning optimality in neural and cognitive functioning. We discuss these issues in terms of the tradeoffs they impose on the design of neural network systems. We illustrate the issues with example systems based on a novel VLSI neural processing element developed, fabricated, and tested by the first author. There are four general issues of interest:  Biological Realism vs. Computational Power. Many implementations of neurons sacrifice computational power for biological realism. Biological realism imposes a set of constraints on the structure and timing of certain operations in the neuron. Taken as an absolute requirement, these constraints, though realistic, reduce the computational power of individual neurons, and of systems built on those neurons. However, to ignore the biological characteristics of neurons is to ignore the best example of the type of device we are trying to implement. In particular, simple non-biologically inspired neurons perform a completely different style of processing than biologically inspired ones. Our work allows for biological realism in areas where it increases computational power, while ignoring the low-level details that are simply by-products of organic systems.  Task-Specific Architecture vs. Uniform Element, Massive Parallelism. A central issue in developing neural network systems is whether to design networks specific to a particular task or to adapt a general-purpose network to accomplish the task. Developing task- specific architectures allows for small, fast networks that approach optimality in performance, but require more effort during the design stage. General-purpose architectures approach optimality in design that merely needs to be adapted via weight modifications to a new problem, but suffer from performance inefficiencies due to unneeded and/or redundant nodes. Our work hypothesizes that task-specific architec- tures that use a building-block approach combined with fine-tuning by training will produce the greatest benefits in the tradeoff between design and performance optimality.  Time Independence vs. Time Dependence. Many neural networks assume that each input vector is independent of other inputs, and the job of the neural network is to extract patterns within the input vector that are sufficient to characterize it. For problems of this type, a network that assumes time independence will provide acceptable performance. 
However, if the input vectors cannot be assumed to be independent, the network must process the vector with respect to its temporal characteristics. Networks that assume time independence have a variety of well-known training and performance algorithms, but will be unwieldy when applied to a problem in which time independence does not hold. Although temporal characteristics can be converted into levels, there will be a loss of information that may be critical to solving the problem efficiently. Networks that assume time dependence have the advantage of being able to handle both time dependent and time independent data, but do not have well known, generally applicable training and performance algorithms. Our approach is to assume time dependence, with the goal of handling a larger range of problems rather than having general training and performance methods. (4) Hybrid Implementation vs. Analog or Digital Only. The optimality of hardware implementations of neural networks depends in part on the resolution of the second tradeoff mentioned above. Analog devices generally afford faster processing at a lower hardware overhead than digital, whereas digital devices provide noise immunity and a building-block approach to system design. Our work adopts a hybrid approach where the internal computation of the neuron is implemented in analog, and the extracellular communication is performed digitally. This gives the best of both worlds: the speed and low hardware overhead of analog and the noise immunity and building-block nature of digital components. Each of these issues has individual ramifications for neural network design, but optimality of the overall system must be viewed as their composite. Thus, design decisions made in one area will constrain the decisions that can be made in the other areas. Putting Optimality in its Place: Arguments on Context, Systems and Neural Networks Wesley Elsberry, Battelle Research Laboratories Determining the "optimality" of a particular neural network should be an exercise in multivariate analysis. Too often, performance concerning a narrowly defined problem has been accepted as prima facie evidence that some ANN architecture has a specific level of optimality. Taking a cue from the field of genetic algorithms (and the theory of natural selection from which GA's are derived), I offer the observation that optimality is selected in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the system of which it is a part. The context in which the evaluation of optimality is performed will influence the results of that evaluation greatly. While compartmentalized and specialized tests of ANN performance can offer insights, the construction of effective systems may require additional consideration to be given to the assumptions of such tests. Many benchmarks and other tests assume a static problem set, while many real-world applications offer dynamical problems. An ANN which performs "optimally" in a test may perform miserably in a putatively similar real-world application. Recognizing the assumptions which underlie evaluations is important for issues of optimal system design; recognizing the need for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is critical to proper placement of priority given to optimality of ANN's. Identifying a Neural Network's Computational Goals: A Statistical Optimization Perspective Richard M.
Golden, University of Texas at Dallas The importance of identifying the computational goal of a neural network computation is first considered from the perspective of Marr's levels of descriptions theory and Simon's theory of satisficing. A "statistical optimization perspective" is proposed as a specific implementation of the more general theories of Marr and Simon. The empirical "testability" of the "statistical optimization perspective" is also considered. It is argued that although such a hypothesis is only occasionally empirically testable, such a hypothesis plays a fundamental role in understanding complex information processing systems. The usefulness of the above theoretical framework is then considered with respect to both artificial neural networks and biological neural networks. An argument is made that almost any artificial neural networks may be viewed as optimizing a statistical cost function. To support this claim, the large class of back-propagation feed-forward artificial neural networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed as optimizing specific statistical cost functions. Specific statistical tests for deciding whether the statistical environment of the neural network is "compatible" with the statistical cost function the network is presumably optimizing are also proposed. Next, some ideas regarding the applicability of such analyses to much more complicated artificial neural networks which are "closer approximations" to real biological neural networks will also be discussed. Vector Associative Maps: Self- organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-motor Control Stephen Grossberg, Boston University This talk describes a new class of neural models for unsupervised error-based learning. Such a Vector Associative Map, or VAM, is capable of autonomously calibrating the spatial maps and arm trajecgtory parameters used during visually guided reaching. VAMs illustrate how spatial and motor representations can self-organize using a unified computational format. They clarify how an autonomous agent can build a self-optimizing hierarchy of goal-oriented actions based upon more primitive, endogenously generated exploration of its environment. Computational properties of ART and VAM systems are complementary. This complementarity reflects different processing requirements of sensory- cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be used to design an autonomous behavioral agent. Problem Solving in a Connectionistic World Model Steven Hampson, University of California at Irvine Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S) models of problem solving are central to animal learning theory. When applicable, the procedural S-R and S-E models can be quite space efficient, as they can potentially learn compact generalizations over the functions they are taught to compute. On the other hand, learning these generalizations can be quite time consuming, and adjusting them when conditions change can take as long as learning them in the first place. In contrast, the S-S model developed here does not learn a particular input-to-output mapping, but simply records a series of "facts" about possible state transitions in the world. This declarative world model provides fast learning, easy update and flexible use, but is space expensive. 
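[Editor's illustration.] To make the contrast drawn in the Hampson abstract concrete, here is a minimal, hypothetical sketch of a declarative S-S world model of the kind described above: transitions are simply recorded as facts and reused for planning, so learning is a single table update while storage grows with the number of observed transitions. All names and the toy world are illustrative assumptions, not Hampson's implementation.

from collections import defaultdict, deque

class SSWorldModel:
    """Declarative stimulus-stimulus model: store observed (state, action) -> state
    facts verbatim instead of learning a compressed input-output mapping."""
    def __init__(self):
        self.transitions = defaultdict(dict)   # state -> {action: next_state}

    def observe(self, state, action, next_state):
        # "Learning" is one table write: fast to acquire, trivial to update,
        # but memory grows with every distinct transition (space expensive).
        self.transitions[state][action] = next_state

    def plan(self, start, goal):
        """Breadth-first search over recorded facts; returns a list of actions."""
        frontier, seen = deque([(start, [])]), {start}
        while frontier:
            state, path = frontier.popleft()
            if state == goal:
                return path
            for action, nxt in self.transitions[state].items():
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [action]))
        return None   # goal not reachable from recorded experience

# Usage on a toy 3-state world.
m = SSWorldModel()
m.observe("A", "left", "B")
m.observe("B", "push", "C")
print(m.plan("A", "C"))   # -> ['left', 'push']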
The procedural/declarative distinction observed in biological behavior suggests that both types of mechanisms are available to an organism in its attempts to balance, if not optimize, both time and space requirements. The work presented here investigates the type of problems that are most effectively addressed in an S-S model. Efficient Optimising Dynamics in a Hopfield-style Network Arun Jagota, State University of New York at Buffalo Definition: A set of vertices (nodes, points) of a graph such that every pair is connected by an edge (arc, line) is called a clique. An example in the ANN context is a set of units such that all pairs are mutually excitatory. The following description applies to optimisation issues in any problems that can be modeled with cliques. Background I: We earlier proposed a variant (essentially a special case) of the Hopfield network which we called the Hopfield-Style Network (HSN). We showed that the stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of a local minimum (stable state) is directly proportional (although not linearly) to the size of the corresponding clique. Any graph can be made the underlying graph of HSN. These three facts suggest that HSN with suitable optimising dynamics can be applied to the CLIQUE (optimisation) problem, namely that of ``Finding the largest clique in any given graph''. Background II: The CLIQUE problem is NP-Complete, suggesting that it is most likely intractable. Recent results from Computer Science suggest that even approximately solving this problem is probably hard. Researchers have shown that on most (random) graphs, however, it can be approximated fairly well. The CLIQUE problem has many applications, including (1) Content-addressable memories can be modeled as cliques. (2) Constraint-Satisfaction Problems (CSPs) can be represented as the CLIQUE problem. Many problems in AI from Computer Vision, NLP, KR etc have been cast as CSPs. (3) Certain object recognition problems in Computer Vision can be modeled as the CLIQUE problem. Given an image object A and a reference object B, one problem is to find a sub-object of B which ``matches'' A. This can be represented as the CLIQUE problem. Abstract: We will present details of the modeling of optimisation problems related to those described in Background II (and perhaps others) as the CLIQUE problem. We will discuss how HSN can be used to obtain optimal or approximate solutions. In particular, we will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation capabilities, and present theoretical and/or empirical evidence for such. The dynamics are, Discrete: (1) Steepest gradient-descent (2) rho-annealing. Continuous: (3) Mean-field annealing. We will discuss characterising properties of these dynamics including, (1) emulates a well-known graph algorithm, (2) is suited only for HSN, (3) originates from statistical mechanics and has gained wide attention for its optimisation properties. We will also discuss the continuous Hopfield network dynamics as a special case of (3). State Generators and Complex Neural Memories Subhash C. Kak, Louisiana State University The mechanism of self-indexing for feedback neural networks that generates memories from short subsequences is generalized so that a single bit together with an appropriate update order suffices for each memory. This mechanism can explain how stimulating an appropriate neuron can then recall a memory.
Although the information is distributed in this model, our self-indexing mechanism [1] makes it appear localized. Also a new complex valued neuron model is presented to generalize McCulloch-Pitts neurons. There are aspects to biological memory that are distributed [2] and others that are localized [3]. In the currently popular artificial neural network models the synaptic weights reflect the stored memories, which are thus distributed over the network. The question then arises whether these models can explain Penfield's observations on memory localization. This paper shows that such a memory localization does occur in these models if self-indexing is used. It is also shown how a generalization of the McCulloch-Pitts model of neurons appears essential in order to account for certain aspects of distributed information processing. One particular generalization, described in the paper, allows one to deal with some recent findings of Optican & Richmond [4]. Consider the model of the mind where each mental event corresponds to some neural event. Neurons that deal with mental events may be called cognitive neurons. There would be other neurons that simply compute without cognitive function. Consider now cognitive neurons dealing with sensory input that directly affects their behaviour. We now show that independent cognitive centers will lead to competing behaviour. Even non-competing cognitive centres would show odd behaviour since collective choice is associated with non-transitive logic. This is clear from the ordered choice paradox that occurs for any collection of cognitive individuals. This indicates that a scalar energy function cannot be associated with a neural network that performs logical processing. Because if that were possible then all choices made by a network could be defined in an unambiguous hierarchy, with at worst more than one choice having a particular value. The question of cyclicity of choices, as in the ordered choice paradox, will not arise. References [1] Kak, S.C. (1990a). Physics Letters A 143, 293. [2] Lashley, K.S. (1963). Brain Mechanisms and Learning. New York: Dover. [3] Penfield, P. & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press. [4] Optican, L.M. & Richmond, B.J. (1987). J. Neurophysiol. 57, 162. Don't Just Stand There, Optimize Something! Daniel Levine, University of Texas at Arlington Perspectives on optimization in a variety of disciplines, including physics, biology, psychology, and economics, are reviewed. The major debate is over whether optimization is a description of nature, a normative prescription, both or neither. The presenter leans toward a belief that optimization is a normative prescription and not a description of nature. In neural network theory, the attempt to explain all behavior as the optimization of some variable (no matter how tortuously defined the variable is!) has spawned some work that has been seminal to the field. This includes both the "hedonistic neuron" theory of Harry Klopf, which led to some important work in conditioning theory and robotics, and the "dynamic programming" of Paul Werbos which led to back propagation networks. Yet if all human behavior is optimal, this means that no improvement is possible on wasteful wars, environmental destruction, and unjust social systems.
The presenter will review work on the effects of frontal lobe damage, specifically the dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty, and describe these effects as prototypes of non-optimal cognitive function. It can be argued (David Stork, personal communication) that lesion effects do not demonstrate non-optimality because they are the result of system malfunction. If that is so, then such malfunction is far more pervasive than generally believed and is not dependent on actual brain damage. Architectural principles such as associative learning, lateral inhibition, opponent processing, and resonant feedback, which enable us to interact with a complex environment, also sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980). Even intact frontal lobes do not always perform their executive function (Pribram, 1991) with optimal efficiency. References Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press. Pribram, K. (1991). Brain and Perception. Erlbaum. For What Are Brains Striving? Gershom-Zvi Rosenstein, Hebrew University My aim is to outline a possibility of a unified approach to several as yet unsolved problems of behavioral regulation, most of them related to the puzzle of schizophrenia. This Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only recently in the book of the present author [1]. One of the main problems the approach was applied to is the model of behavior disturbances. The income (the value of the goal-function of our model) is defined, by assumption, on the intensities of streams of impulses directed to the reward system. The income can be accumulated and spent on different activities of the model. The choices made by the model depend on the income they are expected to bring. Now the ICA is applied to the following problems: The catecholamine distribution change (CDC) in the schizophrenic brain. I try to prove the idea that CDC is caused by the same augmented (in comparison with the norm) stimulation of the reward system that was proposed by us earlier as a possible cause for the behavioral disturbance. The role of dopamine in the brain processing of information is discussed. The dopamine is seen as one of the forms of representation of income in the brain. The main difference between the psychology of "normal" and schizophrenic subjects, according to many researchers, is that in schizophrenics "observations prevail over expectations." This property can be shown to be a formal consequence of our model. It was used earlier to describe the behavior of schizophrenics versus normal people in delusion formation (as Scharpantje delusion, etc.). ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. In fact, that hypothesis can be concluded from ICA if some simple and natural assumptions are accepted. A hypothesis about the nature of stereotypes as an adjunctive type of behavior is proposed. They are seen as behaviors concerned not with the direct physiological needs of the organism but with the regulation of activity of its reward system. The proposition can be tested partly in animal experiments. The problem of origination of so-called "positive" and "negative" symptoms in schizophrenia is discussed. The positive symptoms are seen as attempts and sometimes means to produce an additional income for the brain whose external sources of income are severely limited.
The negative symptoms are seen as behaviors chosen in the condition whereby the quantity of income that can be used to provide these behaviors is small and cannot be increased. The last part of the presentation is dedicated to the old problem of the relationship between "genius" and schizophrenia. It is a continuation of material introduced in [1]. The remark is made that the phenomenon of uric acid excess, thought by some investigators to be connected to high intellectual achievement, can be related to the uric acid excess found to be produced by augmented stimulation of the reward system in the self-stimulation paradigm. References [1] Rosenstein, G.-Z. (1991). Income and Choice in Biological Systems. Lawrence Erlbaum Associates. Non-optimality in Neurobiological Systems David Stork, Ricoh California Research Center I will argue strongly, in two ways, that neurobiological systems are "non-optimal." I note that "optimal" implies a match between some (human) notion of function (or structure,...) and the implementation itself. My first argument addresses the dubious approach which tries to impose notions of what is being optimized, i.e., stating what the desired function is. For instance Gabor-function theorists claim that human visual receptive fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the spatial-frequency domains [1]. I demonstrate how such bandwidth notions have an implied measure, or metric, of localization; I examine the implied metric and find little or no justification for preferring it over any of a number of other plausible metrics [2]. I also show that the visual system has an overabundance of visual cortical cells (by a factor of 500) relative to what is implied by the Gabor approach; thus the Gabor approach makes this important fact hard to understand. Then I review arguments of others describing visual receptive fields as being "optimally" tuned to visual gratings [3], and show that here too an implied metric is unjustified [4]. These considerations lead to skepticism of the general approach of imposing or guessing the actual "true" function of neural systems, even in specific mathematical cases. Only in the most compelling cases can the function be stated confidently. My second criticism of the notion of optimality is that even if in such extreme cases the neurobiological function is known, biological systems generally do not implement it in an "optimal" way. I demonstrate this for a non-optimal ("useless") synapse in the crayfish tailflip circuit. Such non-optimality can be well explained by appealing to the process of preadaptation from evolutionary theory [5,6]. If a neural circuit (or organ, or behavior ...) which evolves to solve one problem is later called upon to solve a different problem, then the evolving circuit must be built upon the structure appropriate to the previous task. Thus, for instance, the non-optimal synapse in the crayfish tail flipping circuit can be understood as a holdover from a previous evolutionary epoch in which the circuit was used instead for swimming. In other words, evolutionary processes are gradual, and even if locally optimal (i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. (This is analogous to local minima that plague some gradient-descent methods in mathematics.)
Such an analysis highlights the role of evolutionary history in understanding the structure and function of current neurobiological systems, and along with our previous analysis, strongly argues against optimality in neurobiological systems. I therefore concur with the recent statement that in neural systems "... elegance of design counts for little." [7] References [1] Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2, 1160-1169. [2] Stork, D. G. & Wilson, H. R. Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7, 1362-1373. [3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980). Visual cortical neurons: Are bars or gratings the optimal stimuli? Science 207, 88-90. [4] Stork, D. G. & Levinson, J. Z. (1982). Receptive fields and the optimal stimulus. Science 216, 204-205. [5] Stork, D. G., Jackson, B., & Walker, S. (1991). "Non-optimality" via preadaptation in simple neural systems. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II. Addison-Wesley and Santa Fe Institute, pp. 409-429. [6] Stork, D. G. (1992, in press). Preadaptation and principles of organization in organisms. In A. Baskin & J. Mittenthal (Eds.), Principles of Organization in Organisms. Addison-Wesley and Santa Fe Institute. [7] Dumont, J. P. C. & Robertson, R. M. (1986). Neuronal circuits: An evolutionary perspective. Science 233, 849-853. Why Do We Study Neural Nets on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Harold Szu, Naval Surface Warfare Center Neural nets, natural or artificial, are structured by the desire to accomplish certain information processing goals. An example of this occurs in the celebrated exclusive-OR computation. "You are as intelligent as you can hear and see," according to an ancient Chinese saying. Thus, the mismatch between human-created sensor data used for input knowledge representation and the nature-evolved brain-style computing architectures is one of the major impediments for neural net applications. After a review of classical neural nets with fixed layered architectures and small-perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one of the most challenging frontiers: the self-architecturing of neural nets. The singlet and pair correlation functions can be measured to define a hairy neuron model. The minimum set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form a selective network. Then, the convergence proof for self-architecturing hairy neurons will be given. A more powerful tool, however, is the wavelet transform, an adaptive wide-band Fourier analysis developed in 1985 by French oil explorers. This transform goes beyond the (preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a noisy environment. The utility of wavelets in brain-style computing can be recognized from two observations. First, the "cocktail party effect," namely, you hear what you wish to hear, can be explained by the wavelet matched filter which can achieve a tremendous bandwidth noise reduction. Second, "end-cut" contour filling may be described by Gibbs overshooting in this wavelet manner.
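[Editor's illustration.] The wavelet matched filter mentioned just above can be illustrated with a minimal, hypothetical sketch (not Szu's system): a noisy signal is correlated with scaled copies of a Mexican-hat ("Ricker") mother wavelet, and the scale and position of the largest response pick out the buried event. The wavelet choice, widths, and noise level are all assumptions.

import numpy as np

def ricker(width, length=101):
    """Mexican-hat (Ricker) mother wavelet sampled on `length` points."""
    t = np.linspace(-5, 5, length) / width
    return (1 - t**2) * np.exp(-t**2 / 2)

def wavelet_matched_filter(signal, widths):
    """Correlate the signal with each scaled wavelet; return the response map."""
    return np.array([np.correlate(signal, ricker(w), mode="same") for w in widths])

# Toy usage: a single wavelet-shaped event hidden in broadband noise.
rng = np.random.default_rng(0)
signal = rng.normal(0, 0.5, 512)
signal[250:351] += ricker(1.5)                 # bury one event centered near 300
responses = wavelet_matched_filter(signal, widths=[0.5, 1.0, 1.5, 2.0, 3.0])
scale_idx, pos = np.unravel_index(np.argmax(responses), responses.shape)
print("best scale index:", scale_idx, "position:", pos)   # position should land near 300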
In other words, wavelets form a very natural way of describing real scenes and real signals. For this reason, it seems likely that the future of neural net applications may be in learning to do wavelet analyses by a self-learning of the "mother wavelet" that is most appropriate for a specific dynamic input-output medium. Optimal Generalisation in Artificial Neural Networks Graham Tattersall, University of East Anglia A key property of artificial neural networks is their ability to produce a useful output when presented with an input of previously unseen data even if the network has only been trained on a small set of examples of the input-output function underlying the data. This process is called generalisation and is effectively a form of function completion. ANNs such as the MLP and RBF sometimes appear to work effectively as generalisers on this type of problem, but there is now widespread recognition that the form of generalisation which arises is very dependent on the architecture of the ANN, and is often completely inappropriate, particularly when dealing with symbolic data. This paper will argue that generalisation should be done in such a way that the chosen completion is the most probable and is consistent with the training examples. These criteria dictate that the generalisation should not depend in any way upon the architecture or functionality of components of the generalising system, and that the generalisation will depend entirely on the statistics of the training exemplars. A practical method for generalising in accordance with the probability and consistency criteria is to find the minimum entropy generalisation using the Shannon-Hartley relationship between entropy and spatial bandwidth. The usefulness of this approach can be demonstrated using a number of binary data functions which contain both first and higher order structure. However, this work has shown very clearly that, in the absence of an architecturally imposed generalisation strategy, many function completions are equally possible unless a very large proportion of all possible function domain points are contained in the training set. It therefore appears desirable to design generalising systems such as neural networks so that they generalise, not only in accordance with the optimal generalisation criteria of maximum probability and training set consistency, but also subject to a generalisation strategy which is specified by the user. Two approaches to the imposition of a generalisation strategy are described. In the first method, the characteristic autocorrelation function or functions belonging to a specified family are used as the weight set in a Kosko net. The second method uses Wiener Filtering to remove the "noise" implicit in an incomplete description of a function. The transfer function of the Wiener Filter is specific to a particular generalisation strategy.
E-mail Addresses of Presenters
Abdi               abdi at utdallas.edu
Bengio             bengio at iro.umontreal.ca
Bhaumik            netearth!bhaumik at shakti.ernet.in
Buhmann            jb at s1.gov
Candelaria de Ram  sylvia at nmsu.edu
Carpenter          gail at park.bu.edu
Chance             u0503aa at vms.ucc.okstate.edu
DeYong             mdeyong at nmsu.edu
Elsberry           elsberry at hellfire.pnl.gov
Golden             golden at utdallas.edu
Grossberg          steve at park.bu.edu
Hampson            hampson at ics.uci.edu
Jagota             jagota at cs.buffalo.edu
Johnson            ecjdj at nve.mcsr.olemiss.edu
Kak                kak at max.ee.lsu.edu
Leven              (reach via Pribram, see below)
Levine             b344dsl at utarlg.uta.edu
Ogmen              elee52f at jetson.uh.edu
Parberry           ian at hercule.csci.unt.edu
Pribram            kpribram at ruacad.runet.edu
Prueitt            prueitt at guvax.georgetown.edu
Rosenstein         NONE
Stork              stork at crc.ricoh.com
Szu                btelfe at bagheera.nswc.navy.mil
Tattersall         ?
---------------------------------------------------
Registration for conference on Optimality in Biological and Artificial Networks.
Name (last, first, middle) _______________________________________________________________
Mailing Address ____________________________________________
                ____________________________________________
                ____________________________________________
                ____________________________________________
Affiliation ____________________________________________
Telephone ____________________________________________
Email (if any) ____________________________________________
Registration fee (please enclose check payable to MIND):
Registration Fees: Student members of MIND or INNS or UTD: $10; Other students: $20; Non-student members of MIND or INNS: $80; Other non-students: $80
Registration Fee Type (e.g., student member, non-student, etc.):
---------------------------------------
Amount of check _______________________________________
Hotel: Do you need a registration card (rooms are $59/night)? _________________
Do you wish to share a room? _____________________
--------------------------------------
Reduced fares are available to Dallas-Fort Worth on American Airlines. Call the airline and ask for Starfile S14227D, under the name of MIND. Preregistrants whose forms and payment checks are received by Jan. 31 will be mailed a preregistration package with a confirmation. This will include a complete schedule with times of presentations and directions to sites.
---------------------------------------
Please send registration form to:
Professor Daniel S. Levine
Department of Mathematics, Box 19408
University of Texas at Arlington
Arlington, TX 76019-0408
Office: 817-273-3598  FAX: 817-794-5802  email: b344dsl at utarlg.uta.edu
From uh311ae at sunmanager.lrz-muenchen.de Tue Jan 21 16:00:28 1992 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 21 Jan 92 22:00:28+0100 Subject: Bengio's paper on 'learning learning rules' Message-ID: <9201212100.AA01632@sunmanager.lrz-muenchen.de> I just ftp'ed and read the above paper, which is available on the neuroprose server. Essentially, it proposes to use an optimizer (like a gradient descent method or a genetic algorithm) to optimize very global network parameters, not least the learning rule itself. The latter might be accomplished by e.g. having a GA switch individual modules, which get combined into a heterogeneous rule, on and off. Unfortunately, the paper does not include any simulation data to support this idea. I did some experiments last year which might be of interest, because they do support Bengio's predictions.
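[Editor's illustration.] The experiment described in the next paragraph concerns searching for a good mixing factor between two learning rules together with a weight initialization range. Below is a minimal illustrative stand-in, not Klagges' setup: plain random search plays the role of the GA, the two rules are a supervised Hebbian term and a delta-rule term, and the toy task, constants, and names are all assumptions. It also averages several runs per candidate, as recommended above.

import numpy as np

def evaluate(alpha, init_range, runs=10, epochs=50, seed=0):
    """Fitness of a hybrid rule dw = alpha*Hebb + (1-alpha)*delta,
    averaged over several networks to smooth out runs that fail to converge."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, (40, 2))                 # toy linearly separable task
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    errors = []
    for _ in range(runs):
        w = rng.uniform(-init_range, init_range, 2)
        b = 0.0
        for _ in range(epochs):
            for x, t in zip(X, y):
                out = 1.0 / (1.0 + np.exp(-(w @ x + b)))
                hebb = t * x                        # supervised Hebbian term (target * input)
                delta = (t - out) * x               # error-correcting (delta-rule) term
                w += 0.1 * (alpha * hebb + (1 - alpha) * delta)
                b += 0.1 * (1 - alpha) * (t - out)
        out = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        errors.append(np.mean((out > 0.5) != y))
    return np.mean(errors)

# Crude random search over (mixing factor, init range); a GA would evolve these instead.
rng = np.random.default_rng(1)
best = min(((rng.uniform(0, 1), rng.uniform(0.01, 2.0)) for _ in range(20)),
           key=lambda p: evaluate(*p))
print("best (alpha, init_range):", np.round(best, 2))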
Due to the hunger for resources exhibited by a GA that has expensive function evaluations (network training tests), the results are based on the usual set of toy problems like xor and a 16-2-16 encoder. The problem was whether, and if so with what mixing factor, to couple two differing learning rules into a hybrid. This is not as straightforward as simply evaluating the mixing factor, because typically differing algorithms like different 'environments' to work in. More specifically, the two algorithms I considered are very sensitive to the initialization range of the network weights and prefer rather nonoverlapping values. This complicated the search for a good mixing factor into a multi-parameter nobody-knows problem, because I couldn't a priori rule out that a good hybrid would exist with unknown initialization parameters. One night of 36MHz R3000 sweat produced a nice hybrid with improved convergence for the tested problems, thus Bengio's claims get some support from me. I'd like to add, though, that more advanced searches are very likely to require very long and careful optimization runs, if the GA is to sample a sufficiently large part of the search space. A hint to the practitioner: It helps to introduce (either by hand, or dynamically) precision and range 'knobs' into the simulation, which makes it possible to start with low precision, large range. It is also helpful to average at least 10, better 20+ individual network runs into a single function evaluation. The GA could in principle deal with this noise, but is actually hard pressed when confronted with networks which sometimes do & sometimes don't converge. Cheers, Henrik Klagges IBM Research rick at vee.lrz-muenchen.de & henrik at mpci.llnl.gov From B344DSL at utarlg.uta.edu Mon Jan 20 11:15:00 1992 From: B344DSL at utarlg.uta.edu (B344DSL@utarlg.uta.edu) Date: Mon, 20 Jan 1992 10:15 CST Subject: Registration and tentative program for a conference in Dallas, Feb. 6-8. Message-ID: <01GFJAE9SU2C00018I@utarlg.uta.edu> TENTATIVE schedule for Optimality Conference, UT Dallas, Feb. 6-8, 1992
ORAL PRESENTATIONS --
Thursday, Feb. 6, AM:
Daniel Levine, U. of Texas, Arlington -- Don't Just Stand There, Optimize Something!
Samuel Leven, Radford U. -- (title to be announced)
Mark DeYong, New Mexico State U. -- Properties of Optimality in Neural Networks
Wesley Elsberry, Battelle Research Labs -- Putting Optimality in its Place: Arguments on Context, Systems, and Neural Networks
Graham Tattersall, University of East Anglia -- Optimal Generalisation in Artificial Neural Networks
Thursday, Feb. 6, PM:
Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model
Ian Parberry, University of North Texas -- (title to be announced)
Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective
Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style Network
Friday, Feb. 7, AM:
Gershom Rosenstein, Hebrew University -- For What are Brains Striving?
Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of Nonstationary Data by Self-Organizing Neural Networks
Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control
Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics
Friday, Feb. 7, PM:
David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems
David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a Classical Conditioning Network
Samy Bengio, Université de Montréal -- On the Optimization of a Synaptic Learning Rule
Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing?
Saturday, Feb. 8, AM:
Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive Processes?
Hervé Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator
Paul Prueitt, Georgetown University -- (title to be announced)
Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function
Saturday, Feb. 8, PM:
Panel discussion on the basic themes of the conference
POSTERS
Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours
Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by Competitive Neural Networks
John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems
Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories
Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers
ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE (alphabetical by first author): Generalization of the Linear Auto-Associator Hervé Abdi, Dominique Valentin, and Alice J. O'Toole, University of Texas at Dallas The classical auto-associator can be used to model some processes in prototype abstraction. In particular, the eigenvectors of the auto-associative matrix have been interpreted as prototypes or macro-features (Anderson et al, 1977, Abdi, 1988, O'Toole and Abdi, 1989). It has also been noted that computing these eigenvectors is equivalent to performing the principal component analysis of the matrix of objects to be stored in the memory. This paper describes a generalization of the linear auto-associator in which units (i.e., cells) can be of differential importance, or can be non-independent or can have a bias. The stimuli to be stored in the memory can also have a differential importance (or can be non-independent). The constraints expressing response bias and differential importance of stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule is applied to the weight matrix in a generalized form which takes the bias and the differential importance constraints into account to compute the error. Conditions for the convergence of the learning rule are examined and convergence is shown to be dependent only on the ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix. The maximal responses of the memory correspond to generalized eigenvectors; these vectors are biased-orthogonal (i.e., they are orthogonal after the response bias is implemented). It is also shown that (with an appropriate choice of matrices for response bias and differential importance) the generalized auto-associator is able to implement the general linear model of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical correlation analysis, generalized principal component analysis, etc.)
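[Editor's illustration.] For readers who want to see the basic machinery the Abdi, Valentin, and O'Toole abstract generalizes, here is a minimal sketch of an ordinary linear auto-associator trained with the Widrow-Hoff rule. The bias-weighted generalization described in the abstract is not reproduced here, and the learning constant, pattern set, and noise level are assumptions.

import numpy as np

def train_autoassociator(X, eta=0.05, epochs=200):
    """Linear auto-associator: learn W so that W x ~ x for each stored pattern x.
    Widrow-Hoff (delta) rule: W <- W + eta * (x - W x) x^T."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for _ in range(epochs):
        for x in X:
            W += eta * np.outer(x - W @ x, x)
    return W

# Toy usage: store two orthogonal patterns and recall from a noisy probe.
rng = np.random.default_rng(1)
X = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
X = X / np.linalg.norm(X, axis=1, keepdims=True)    # normalize stored patterns
W = train_autoassociator(X)
probe = X[0] + rng.normal(0, 0.1, 4)                 # degraded version of pattern 0
print(np.round(W @ probe, 2))                        # approximately recovers pattern 0

As in the abstract, convergence of this plain version depends on the learning constant relative to the eigenvalues of the pattern covariance, which is why the stored patterns are normalized here.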
Applications and Monte Carlo simulation of the generalized auto-associator dealing with face processing will be presented and discussed. On the Optimization of a Synaptic Learning Rule Samy Bengio, Université de Montréal Yoshua Bengio, Massachusetts Institute of Technology Jocelyn Cloutier, Université de Montréal Jan Gecsei, Université de Montréal This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has local inputs, and is the same for many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms, in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, a local optimization method (gradient descent) as well as a global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments of classical conditioning for Aplysia yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimensional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Initial experiments can be found in [1, 2]. References [1] Bengio, Y. & Bengio, S. (1990). Learning a synaptic learning rule. Technical Report #751, Computer Science Department, Université de Montréal. [2] Bengio Y., Bengio S., & Cloutier, J. (1991). Learning a synaptic learning rule. IJCNN-91-Seattle. Complexity Optimized Data Clustering by Competitive Neural Networks Joachim Buhmann, Lawrence Livermore National Laboratory Hans Kühnel, Technische Universität München Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their occupation probabilities. An iterative version of complexity optimized clustering is implemented by an artificial neural network with winner-take-all connectivity. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy constrained vector quantization or topological feature maps and competitive neural networks.
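[Editor's illustration.] As a rough illustration of the maximum-entropy flavour of clustering in the Buhmann and Kühnel abstract, here is a generic "soft" K-means sketch; it is not their complexity-cost algorithm, and the temperature parameter, data, and cluster count are assumptions.

import numpy as np

def soft_kmeans(X, k, beta=5.0, iters=50, seed=0):
    """Maximum-entropy ('soft') clustering: assignment probabilities are Gibbs
    weights exp(-beta * distance^2), and centers are probability-weighted means.
    As beta -> infinity this reduces to ordinary K-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (n, k)
        logits = -beta * d2
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)                           # soft assignments
        centers = (p.T @ X) / p.sum(axis=0)[:, None]                # weighted means
    return centers, p

# Toy usage: two well-separated blobs.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centers, p = soft_kmeans(X, k=2)
print(np.round(centers, 2))   # one center near (0, 0), one near (3, 3); order may vary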
Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Sylvia Candelaria de Ram, New Mexico State University Context-sensitivity and rapidity of communication are two things that become ecological essentials as cognition advances. They become ``optimals'' as cognition develops into something elaborate, long-lasting, flexible, and social. For successful operation of language's default speech/gesture mode, articulation response must be rapid and context-sensitive. It does not follow that all linguistic cognitive function will or can be equally fast or re-equilibrating. But it may follow that articulation response mechanisms are specialized in different ways than those for other cognitive functions. The special properties of the varied mechanisms would then interact in language use. In actuality, our own architecture is of this sort [1,2,3,4]. Major formative effects on our language, society, and individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift (and adaptability) and hypercorrection effects (mitigated by emotion), so that we have multitudes of distinct but related languages and speech communities. Consider modelling the variety of co-functioning mechanisms for utterance and gesture articulation, interpretation, monitoring and selection. Wherein lies the source of the differing function in one and another mechanism? Suppose [parts of] mechanisms are treated as parts of a multi-layered, temporally parallel, staged architecture (like ours). The layers may be inter-connected selectively [6]. Any given portion may be designed to deal with particular sorts of excitation [7,8,9]. A multi-level belief/knowledge logic enhanced for such structures [10] has properties extraordinary for a logic, properties which point up some critical features of ``neural nets'' having optimization properties pertinent to intelligent, interactive systems. References [1] Candelaria de Ram, S. (1984). Genesis of the mechanism for sound change as suggested by auditory reflexes. Linguistic Association of the Southwest, El Paso. [2] Candelaria de Ram, S. (1988). Neural feedback and causation of evolving speech styles. New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherces mathmatiques, Montreal, October. [3] Candelaria de Ram, S. (1989). Sociolinguistic style shift and recent evidence on `prese- mantic' loci of attention to fine acoustic difference. New Ways of Analyzing Language Variation joint with American Dialect Society (NWAV-XVIII/ ADSC), Durham, NC, October. [4] Candelaria de Ram, S. (1991b). Language processing: mental access and sublanguages. Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept. 1991. [5] Candelaria de Ram, S. (1990b). The sensory basis of mind: feasibility and functionality of a phonetic sensory store. [Commentary on R. Ntnen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236. [6] Candelaria de Ram, S. (1990c). Sensors & concepts: Grounded cognition. Working Session on Algebraic Approaches to Problem Solving and Representation, June 27-29, Briarcliff, NY. [7] Candelaria de Ram, S. (1990a). Belief/knowledge dependency graphs with sensory groundings. Third Int. Symp. on Artificial Intelligence Applications of Engineering Design and Manufacturing in Industrialized and Developing Countries, Monterrey, Mexico, Oct. 22-26, pp. 103-110. [8] Candelaria de Ram, S. 
(1991a). From sensors to concepts: Pragmasemantic system constructivity. Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91, Sophia-Antipolis, France, April 22-24. Also in Knowledge Modeling and Expertise Transfer, IOS Publishing, Paris, 1991, pp. 433-448. [9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989). Reasoning using inheritance from a mixture of knowledge and beliefs. In S. Ramani, R. Chandrasekar, & K.S.R. Anjaneylu (Eds.), Knowledge Based Computer Systems. Delhi: Narosa, pp. 387-396; republished by Vedams Books International, New Delhi, 1990. Also in Lecture Notes in Computer Science series No. 444, Springer-Verlag, 1990. [10] Candelaria de Ram, S. (1991c). Why to enter into dialogue is to come out with changed speech: Cross-linked modalities, emotion, and language. Second Invitational Venaco Workshop and European Speech Communication Association Tutorial and Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept. 16-20, 1991. Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning Gail Carpenter, Boston University A neural network architecture for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors will be described. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, response, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator () and the MAX operator () of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights corresponds to increasing sizes of category "boxes." Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; (iv) a letter recognition database; and (v) a medical database. Properties of Optimality in Neural Networks Mark DeYong and Thomas Eskridge, New Mexico State University This presentation discusses issues concerning optimality in neural and cognitive functioning. We discuss these issues in terms of the tradeoffs they impose on the design of neural network systems. 
We illustrate the issues with example systems based on a novel VLSI neural processing element developed, fabricated, and tested by the first author. There are four general issues of interest:  Biological Realism vs. Computational Power. Many implementations of neurons sacrifice computational power for biological realism. Biological realism imposes a set of constraints on the structure and timing of certain operations in the neuron. Taken as an absolute requirement, these constraints, though realistic, reduce the computational power of individual neurons, and of systems built on those neurons. However, to ignore the biological characteristics of neurons is to ignore the best example of the type of device we are trying to implement. In particular, simple non-biologically inspired neurons perform a completely different style of processing than biologically inspired ones. Our work allows for biological realism in areas where it increases computational power, while ignoring the low-level details that are simply by-products of organic systems.  Task-Specific Architecture vs. Uniform Element, Massive Parallelism. A central issue in developing neural network systems is whether to design networks specific to a particular task or to adapt a general-purpose network to accomplish the task. Developing task- specific architectures allows for small, fast networks that approach optimality in performance, but require more effort during the design stage. General-purpose architectures approach optimality in design that merely needs to be adapted via weight modifications to a new problem, but suffer from performance inefficiencies due to unneeded and/or redundant nodes. Our work hypothesizes that task-specific architec- tures that use a building-block approach combined with fine-tuning by training will produce the greatest benefits in the tradeoff between design and performance optimality.  Time Independence vs. Time Dependence. Many neural networks assume that each input vector is independent of other inputs, and the job of the neural network is to extract patterns within the input vector that are sufficient to characterize it. For problems of this type, a network that assumes time independence will provide acceptable performance. However, if the input vectors cannot be assumed to be independent, the network must process the vector with respect to its temporal characteristics. Networks that assume time independence have a variety of well-known training and performance algorithms, but will be unwieldy when applied to a problem in which time independence does not hold. Although temporal characteristics can be converted into levels, there will be a loss of information that may be critical to solving the problem efficiently. Networks that assume time dependence have the advantage of being able to handle both time dependent and time independent data, but do not have well known, generally applicable training and performance algorithms. Our approach is to assume time dependence, with the goal of handling a larger range of problems rather than having general training and performance methods.  Hybrid Implementation vs. Analog or Digital Only. The optimality of hardware implementations of neural networks depends in part on the resolution of the second tradeoff mentioned above. Analog devices generally afford faster processing at a lower hardware overhead than digital, whereas digital devices provide noise immunity and a building-block approach to system design. 
Our work adopts a hybrid approach where the internal computation of the neuron is implemented in analog, and the extracellular communication is performed digitally. This gives the best of both worlds: the speed and low hardware overhead of analog and the noise immunity and building-block nature of digital components. Each of these issues has individual ramifications for neural network design, but optimality of the overall system must be viewed as their composite. Thus, design decisions made in one area will constrain the decisions that can be made in the other areas. Putting Optimality in its Place: Arguments on Context, Systems and Neural Networks Wesley Elsberry, Battelle Research Laboratories Determining the "optimality" of a particular neural network should be an exercise in multivariate analysis. Too often, performance concerning a narrowly defined problem has been accepted as prima facie evidence that some ANN architecture has a specific level of optimality. Taking a cue from the field of genetic algorithms (and the theory of natural selection from which GA's are derived), I offer the observation that optimality is selected in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the system of which it is a part. The context in which the evaluation of optimality is performed will influence the results of that evaluation greatly. While compartmentalized and specialized tests of ANN performance can offer insights, the construction of effective systems may require additional consideration to be given to the assumptions of such tests. Many benchmarks and other tests assume a static problem set, while many real-world applications offer dynamical problems. An ANN which performs "optimally" in a test may perform miserably in a putatively similar real-world application. Recognizing the assumptions which underlie evaluations is important for issues of optimal system design; recognizing the need for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is critical to proper placement of priority given to optimality of ANN's. Identifying a Neural Network's Computational Goals: A Statistical Optimization Perspective Richard M. Golden, University of Texas at Dallas The importance of identifying the computational goal of a neural network computation is first considered from the perspective of Marr's levels of descriptions theory and Simon's theory of satisficing. A "statistical optimization perspective" is proposed as a specific implementation of the more general theories of Marr and Simon. The empirical "testability" of the "statistical optimization perspective" is also considered. It is argued that although such a hypothesis is only occasionally empirically testable, such a hypothesis plays a fundamental role in understanding complex information processing systems. The usefulness of the above theoretical framework is then considered with respect to both artificial neural networks and biological neural networks. An argument is made that almost any artificial neural networks may be viewed as optimizing a statistical cost function. To support this claim, the large class of back-propagation feed-forward artificial neural networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed as optimizing specific statistical cost functions. 
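[Editor's note.] A standard textbook instance of the kind of correspondence the Golden abstract appeals to (not the specific derivation given in the talk) is that minimizing a sum-of-squares error is equivalent to maximizing a Gaussian likelihood:

  -\log p(\mathcal{D} \mid w) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \| y_i - f(x_i; w) \|^2 + \mathrm{const}

so the squared-error cost of a back-propagation network is, up to constants, the negative log-likelihood of the training targets under an additive Gaussian noise model.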
Specific statistical tests for deciding whether the statistical environment of the neural network is "compatible" with the statistical cost function the network is presumably optimizing are also proposed. Next, some ideas regarding the applicability of such analyses to much more complicated artificial neural networks which are "closer approximations" to real biological neural networks will also be discussed. Vector Associative Maps: Self- organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-motor Control Stephen Grossberg, Boston University This talk describes a new class of neural models for unsupervised error-based learning. Such a Vector Associative Map, or VAM, is capable of autonomously calibrating the spatial maps and arm trajecgtory parameters used during visually guided reaching. VAMs illustrate how spatial and motor representations can self-organize using a unified computational format. They clarify how an autonomous agent can build a self-optimizing hierarchy of goal-oriented actions based upon more primitive, endogenously generated exploration of its environment. Computational properties of ART and VAM systems are complementary. This complementarity reflects different processing requirements of sensory- cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be used to design an autonomous behavioral agent. Problem Solving in a Connectionistic World Model Steven Hampson, University of California at Irvine Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S) models of problem solving are central to animal learning theory. When applicable, the procedural S-R and S-E models can be quite space efficient, as they can potentially learn compact generalizations over the functions they are taught to compute. On the other hand, learning these generalizations can be quite time consuming, and adjusting them when conditions change can take as long as learning them in the first place. In contrast, the S-S model developed here does not learn a particular input-to-output mapping, but simply records a series of "facts" about possible state transitions in the world. This declarative world model provides fast learning, easy update and flexible use, but is space expensive. The procedural/declarative distinction observed in biological behavior suggests that both types of mechanisms are available to an organism in its attempts to balance, if not optimize, both time and space requirements. The work presented here investigates the type of problems that are most effectively addressed in an S-S model. Efficient Optimising Dynamics in a Hopfield-style Network Arun Jagota, State University of New York at Buffalo Definition: A set of vertices (nodes, points) of a graph such that every pair is connected by an edge (arc, line) is called a clique. An example in the ANN context is a set of units such that all pairs are mutually excitatory. The following description applies to optimisation issues in any problems that can be modeled with cliques. Background I: We earlier proposed a variant (essentially a special case) of the Hopfield network which we called the Hopfield-Style Network (HSN). We showed that the stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of a local minimum (stable state) is directly proportional (although not linearly) to the size of the corresponding clique. Any graph can be made the underlying graph of HSN. 
These three facts suggest that HSN with suitable optimising dynamics can be applied to the CLIQUE (optimisation) problem, namely that of ``Finding the largest clique in any given graph''. Background II: The CLIQUE problem is NP-Complete, suggesting that it is most likely intractable. Recent results from Computer Science suggest that even approximately solving this problem is probably hard. Researchers have shown that on most (random) graphs, however, it can be approximated fairly well. The CLIQUE problem has many applications, including (1) Content-addressable memories can be modeled as cliques. (2) Constraint-Satisfaction Problems (CSPs) can be represented as the CLIQUE problem. Many problems in AI from Computer Vision, NLP, KR, etc. have been cast as CSPs. (3) Certain object recognition problems in Computer Vision can be modeled as the CLIQUE problem. Given an image object A and a reference object B, one problem is to find a sub-object of B which ``matches'' A. This can be represented as the CLIQUE problem. Abstract: We will present details of the modeling of optimisation problems related to those described in Background II (and perhaps others) as the CLIQUE problem. We will discuss how HSN can be used to obtain optimal or approximate solutions. In particular, we will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation capabilities, and present theoretical and/or empirical evidence for such. The dynamics are: Discrete: (1) Steepest gradient-descent (2) rho-annealing. Continuous: (3) Mean-field annealing. We will discuss characterising properties of these dynamics including: (1) emulates a well-known graph algorithm, (2) is suited only for HSN, (3) originates from statistical mechanics and has gained wide attention for its optimisation properties. We will also discuss the continuous Hopfield network dynamics as a special case of (3). State Generators and Complex Neural Memories Subhash C. Kak, Louisiana State University The mechanism of self-indexing for feedback neural networks that generates memories from short subsequences is generalized so that a single bit together with an appropriate update order suffices for each memory. This mechanism can explain how stimulating an appropriate neuron can then recall a memory. Although the information is distributed in this model, our self-indexing mechanism [1] makes it appear localized. Also a new complex-valued neuron model is presented to generalize McCulloch-Pitts neurons. There are aspects to biological memory that are distributed [2] and others that are localized [3]. In the currently popular artificial neural network models the synaptic weights reflect the stored memories, which are thus distributed over the network. The question then arises whether these models can explain Penfield's observations on memory localization. This paper shows that such a memory localization does occur in these models if self-indexing is used. It is also shown how a generalization of the McCulloch-Pitts model of neurons appears essential in order to account for certain aspects of distributed information processing. One particular generalization, described in the paper, allows one to deal with some recent findings of Optican & Richmond [4]. Consider the model of the mind where each mental event corresponds to some neural event. Neurons that deal with mental events may be called cognitive neurons. There would be other neurons that simply compute without cognitive function.
Consider now cognitive neurons dealing with sensory input that directly affects their behaviour. We now show that independent cognitive centers will lead to competing behaviour. Even non-competing cognitive centres would show odd behaviour since collective choice is associated with non-transitive logic. This is clear from the ordered choice paradox that occurs for any collection of cognitive individuals. This indicates that a scalar energy function cannot be associated with a neural network that performs logical processing: if that were possible, then all choices made by a network could be defined in an unambiguous hierarchy, with at worst more than one choice having a particular value. The question of cyclicity of choices, as in the ordered choice paradox, would not arise. References [1] Kak, S.C. (1990a). Physics Letters A 143, 293. [2] Lashley, K.S. (1963). Brain Mechanisms and Learning. New York: Dover. [3] Penfield, P. & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press. [4] Optican, L.M. & Richmond, B.J. (1987). J. Neurophysiol. 57, 162. Don't Just Stand There, Optimize Something! Daniel Levine, University of Texas at Arlington Perspectives on optimization in a variety of disciplines, including physics, biology, psychology, and economics, are reviewed. The major debate is over whether optimization is a description of nature, a normative prescription, both, or neither. The presenter leans toward a belief that optimization is a normative prescription and not a description of nature. In neural network theory, the attempt to explain all behavior as the optimization of some variable (no matter how tortuously defined the variable is!) has spawned some work that has been seminal to the field. This includes both the "hedonistic neuron" theory of Harry Klopf, which led to some important work in conditioning theory and robotics, and the "dynamic programming" of Paul Werbos, which led to back propagation networks. Yet if all human behavior is optimal, this means that no improvement is possible on wasteful wars, environmental destruction, and unjust social systems. The presenter will review work on the effects of frontal lobe damage, specifically the dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty, and describe these effects as prototypes of non-optimal cognitive function. It can be argued (David Stork, personal communication) that lesion effects do not demonstrate non-optimality because they are the result of system malfunction. If that is so, then such malfunction is far more pervasive than generally believed and is not dependent on actual brain damage. Architectural principles such as associative learning, lateral inhibition, opponent processing, and resonant feedback, which enable us to interact with a complex environment, also sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980). Even intact frontal lobes do not always perform their executive function (Pribram, 1991) with optimal efficiency. References Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press. Pribram, K. (1991). Brain and Perception. Erlbaum. For What Are Brains Striving? Gershom-Zvi Rosenstein, Hebrew University My aim is to outline a possibility of a unified approach to several as yet unsolved problems of behavioral regulation, most of them related to the puzzle of schizophrenia.
This Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only recently in the book of the present author [1]. One of the main problems the approach was applied to is the model of behavior disturbances. The income (the value of the goal-function of our model) is defined, by assumption, on the intensities of streams of impulses directed to the reward system. The income can be accumulated and spent on different activities of the model. The choices made by the model depend on the income they are expected to bring. Now the ICA is applied to the following problems: The catecholamine distribution change (CDC) in the schizophrenic brain. I try to prove the idea that CDC is caused by the same augmented (in comparison with the norm) stimulation of the reward system that was proposed by us earlier as a possible cause for the behavioral disturbance. The role of dopamine in the brain processing of information is discussed. The dopamine is seen as one of the forms of representation of income in the brain. The main difference between the psychology of "normal" and schizophrenic subjects, according to many researchers, is that in schizophrenics "observations prevail over expectations." This property can be shown to be a formal consequence of our model. It was used earlier to describe the behavior of schizophrenics versus normal people in delusion formation (as Scharpantje delusion, etc.). ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. In fact, that hypothesis can be concluded from ICA if some simple and natural assumptions are accepted. A hypothesis about the nature of stereotypes as an adjunctive type of behavior is proposed. They are seen as behaviors concerned not with the direct physiological needs of the organism but with the regulation of activity of its reward system. The proposition can be tested partly in animal experiments. The problem of origination of so-called "positive" and "negative" symptoms in schizophrenia is discussed. The positive symptoms are seen as attempts and sometimes means to produce an additional income for the brain whose external sources of income are severely limited. The negative symptoms are seen as behaviors chosen in the condition whereby the quantity of income that can be used to provide these behaviors is small and cannot be increased. The last part of the presentation is dedicated to the old problem of the relationship between "genius" and schizophrenia. It is a continuation of material introduced in [1]. The remark is made that the phenomenon of uric acid excess thought by some investigators to be connected to high intellectual achievement can be related to the uric acid excess found to be produced by augmented stimulation of the reward system in the self-stimulation paradigm. References [1] Rosenstein, G.-Z. (1991). Income and Choice in Biological Systems. Lawrence Erlbaum Associates. Non-optimality in Neurobiological Systems David Stork, Ricoh California Research Center I will in two ways argue strongly that neurobiological systems are "non-optimal." I note that "optimal" implies a match between some (human) notion of function (or structure,...) and the implementation itself. My first argument addresses the dubious approach which tries to impose notions of what is being optimized, i.e., stating what the desired function is.
For instance, Gabor-function theorists claim that human visual receptive fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the spatial-frequency domains [1]. I demonstrate how such bandwidth notions have an implied measure, or metric, of localization; I examine the implied metric and find little or no justification for preferring it over any of a number of other plausible metrics [2]. I also show that the visual system has an overabundance of visual cortical cells (by a factor of 500) relative to what is implied by the Gabor approach; thus the Gabor approach makes this important fact hard to understand. Then I review arguments of others describing visual receptive fields as being "optimally" tuned to visual gratings [3], and show that here too an implied metric is unjustified [4]. These considerations lead to skepticism of the general approach of imposing or guessing the actual "true" function of neural systems, even in specific mathematical cases. Only in the most compelling cases can the function be stated confidently. My second criticism of the notion of optimality is that even if in such extreme cases the neurobiological function is known, biological systems generally do not implement it in an "optimal" way. I demonstrate this for a non-optimal ("useless") synapse in the crayfish tailflip circuit. Such non-optimality can be well explained by appealing to the process of preadaptation from evolutionary theory [5,6]. If a neural circuit (or organ, or behavior ...) which evolves to solve one problem is later called upon to solve a different problem, then the evolving circuit must be built upon the structure appropriate to the previous task. Thus, for instance the non-optimal synapse in the crayfish tail flipping circuit can be understood as a holdover from a previous evolutionary epoch in which the circuit was used instead for swimming. In other words, evolutionary processes are gradual, and even if locally optimal (i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. (This is analogous to local minima that plague some gradient-descent methods in mathematics.) Such an analysis highlights the role of evolutionary history in understanding the structure and function of current neurobiological systems, and along with our previous analysis, strongly argues against optimality in neurobiological systems. I therefore concur with the recent statement that in neural systems "... elegance of design counts for little." [7] References [1] Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2, 1160-1169. [2] Stork, D. G. & Wilson, H. R. (1990). Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7, 1362-1373. [3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980). Visual cortical neurons: Are bars or gratings the optimal stimuli? Science 207, 88-90. [4] Stork, D. G. & Levinson, J. Z. (1982). Receptive fields and the optimal stimulus. Science 216, 204-205. [5] Stork, D. G., Jackson, B., & Walker, S. (1991). "Non-optimality" via preadaptation in simple neural systems. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II. Addison-Wesley and Santa Fe Institute, pp. 409-429. [6] Stork, D. G. (1992, in press). Preadaptation and principles of organization in organisms. In A. Baskin & J.
Mittenthal (Eds.), Principles of Organization in Organisms. Addison-Wesley and Santa Fe Institute. [7] Dumont, J. P. C. & Robertson, R. M. (1986). Neuronal circuits: An evolutionary perspective. Science 233, 849-853. Why Do We Study Neural Nets on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Harold Szu, Naval Surface Warfare Center Neural nets, natural or artificial, are structured by the desire to accomplish certain information processing goals. An example of this occurs in the celebrated exclusive-OR computation. "You are as intelligent as you can hear and see," according to an ancient Chinese saying. Thus, the mismatch between human-created sensor data used for input knowledge representation and the nature-evolved brain-style computing architectures is one of the major impediments for neural net applications. After a review of classical neural nets with fixed layered architectures and small-perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one of the most challenging frontiers: the self-architecturing of neural nets. The singlet and pair correlation functions can be measured to define a hairy neuron model. The minimum set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form a selective network. Then, the convergence proof for self-architecturing hairy neurons will be given. A more powerful tool, however, is the wavelet transform, an adaptive wide-band Fourier analysis developed in 1985 by French oil explorers. This transform goes beyond the (preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a noisy environment. The utility of wavelets in brain-style computing can be recognized from two observations. First, the "cocktail party effect," namely, you hear what you wish to hear, can be explained by the wavelet matched filter which can achieve a tremendous bandwidth noise reduction. Second, "end-cut" contour filling may be described by Gibbs overshooting in this wavelet manner. In other words, wavelets form a very natural way of describing real scenes and real signals. For this reason, it seems likely that the future of neural net applications may be in learning to do wavelet analyses by a self-learning of the "mother wavelet" that is most appropriate for a specific dynamic input-output medium. Optimal Generalisation in Artificial Neural Networks Graham Tattersall, University of East Anglia A key property of artificial neural networks is their ability to produce a useful output when presented with an input of previously unseen data even if the network has only been trained on a small set of examples of the input-output function underlying the data. This process is called generalisation and is effectively a form of function completion. ANNs such as the MLP and RBF sometimes appear to work effectively as generalisers on this type of problem, but there is now widespread recognition that the form of generalisation which arises is very dependent on the architecture of the ANN, and is often completely inappropriate, particularly when dealing with symbolic data. This paper will argue that generalisation should be done in such a way that the chosen completion is the most probable and is consistent with the training examples.
These criteria dictate that the generalisation should not depend in any way upon the architecture or functionality of components of the generalising system, and that the generalisation will depend entirely on the statistics of the training exemplars. A practical method for generalising in accordance with the probability and consistency criteria, is to find the minimum entropy generalisation using the Shannon-Hartley relationship between entropy and spatial bandwidth. The usefulness of this approach can be demonstrated using a number of binary data functions which contain both first and higher order structure. However, this work has shown very clearly that, in the absence of an architecturally imposed generalisation strategy, many function completions are equally possible unless a very large proportion of all possible function domain points are contained in the training set. It therefore appears desirable to design generalising systems such as neural networks so that they generalise, not only in accordance with the optimal generalisation criteria of maximum probability and training set consistency, but also subject to a generalisation strategy which is specified by the user. Two approaches to the imposition of a generalisation strategy are described. In the first method, the characteristic autocorrelation function or functions belonging to a specified family are used as the weight set in a Kosko net. The second method uses Wiener Filtering to remove the "noise" implicit in an incomplete description of a function. The transfer function of the Wiener Filter is specific to a particular generalisation strategy. E-mail Addresses of Presenters Abdi abdi at utdallas.edu Bengio bengio at iro.umontreal.ca Bhaumik netearth!bhaumik at shakti.ernet.in Buhmann jb at s1.gov Candelaria de Ram sylvia at nmsu.edu Carpenter gail at park.bu.edu Chance u0503aa at vms.ucc.okstate.edu DeYong mdeyong at nmsu.edu Elsberry elsberry at hellfire.pnl.gov Golden golden at utdallas.edu Grossberg steve at park.bu.edu Hampson hampson at ics.uci.edu Jagota jagota at cs.buffalo.edu Johnson ecjdj at nve.mcsr.olemiss.edu Kak kak at max.ee.lsu.edu Leven (reach via Pribram, see below) Levine b344dsl at utarlg.uta.edu Ogmen elee52f at jetson.uh.edu Parberry ian at hercule.csci.unt.edu Pribram kpribram at ruacad.runet.edu Prueitt prueitt at guvax.georgetown.edu Rosenstein NONE Stork stork at crc.ricoh.com Szu btelfe at bagheera.nswc.navy.mil Tattersall ? From uh311ae at sunmanager.lrz-muenchen.de Tue Jan 21 16:49:41 1992 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 21 Jan 92 22:49:41+0100 Subject: CascadeC variants, Sjogaard's paper Message-ID: <9201212149.AA01642@sunmanager.lrz-muenchen.de> Steen Sjogaard makes some interesting observations about the Cascade Corre- lation algorithm. He invents a variation of it, named 2CCA, which is using only two layers, but also relies on freezing weights, error covariance, candidate pools etc. as the CCA does. The whole purpose of this difference in the dynamic network construction strategy is to increase the network's ability to generalize. He invents a meaningful classification problem (the 'three disc'-problem) and goes through some lengths to present 2CCA's su- periority over CCA when confronted with the benchmark problem (specifically, 2CCA generalizes better by getting more test patterns right). 
Now, the prob- lem is that he also says that, even after creation of 100 hidden units, 2CCA only classifies only half of the points of the 'Two-Spirals-Problem' right, which is not better than chance. Everybody who has ever tried to solve the 'Two-Spirals' with _any_ algorithm knows how nasty it is and how good the CCA solution (as presented in Fahlmann's paper) really is. In this light, it is obviously not easy to accept Sjogaard's claims. However, I think that Sjogaard is making some good points, and I would like to hear your opinion: a) Is a solution that employs mostly low-order feature detectors (i.e., has _few_ and _populated_ hidden layers) typically generalizing better than one that uses fewer high-order ones ? b) How is it to be decided when to add a new hidden unit to an existing layer versus putting it into a new one ? Simply creating a second candi- date pool doesn't do the job: A new-layer hidden unit should _always_ do at least as good as an extra old-layer hidden one, so simple covariance comparison does not work. Use an 'allowable error difference' term ? Use test patterns to check generalization ? Return to a fixed architec- ture ? c) Would the addition of a little noise make HCCA perform even better ? I haven't heard of an experiment in this direction, but I suspect this is based on quickprop's relative dislike of changing training sets, not on fundamental issues. (HCCA = Sjogaard's term for 'high-order-CCA' = CCA). d) How to write a k-CCA, a 'in-between-order' CC algorithm ? e) If you read Sjogaard's neuroprose paper, did you like his formalization of generalization ? (I thought it is insightful) Cheers, Henrik Klagges IBM Research rick at vee.lrz-muenchen.de @ henrik at mpci.llnl.gov PS: I hope Steen gets a k-CCA paper out soon 8-) ! From david at cns.edinburgh.ac.uk Tue Jan 21 04:10:08 1992 From: david at cns.edinburgh.ac.uk (David Willshaw) Date: Tue, 21 Jan 92 09:10:08 GMT Subject: Contents of NETWORK - Vol. 3, no 1 Feb 1992 Message-ID: <24040.9201210910@subnode.cns.ed.ac.uk> --------------------------------------------------------------------- CONTENTS OF NETWORK - COMPUTATION IN NEURAL SYSTEMS Volume 3 Number 1 February 1992 Proceedings of the First Irish Neural Networks Conference held at The Queen's University of Belfast, 21 June 1991. Proceedings Editor: G Orchard PAPERS 1 Sharing interdisciplinary perspectives on neural networks: the First Irish Neural Networks Conference G ORCHARD 5 On the proper treatment of eliminative connectionism S MILLS 15 Cervical cell image inspection - a task for artificial neural networks I W RICKETTS 19 A geometric interpretation of hidden layer units in feedforward neural networks J MITCHELL 27 Comparison and evaluation of variants of the conjugate gradient method for efficient learning in feedforward neural networks with backward error propagation J A KINSELLA 37 A connectionist technique for on-line parsing R REILLY 47 Are artificial neural nets as smart as a rat? T SAVAGE & R COWIE 61 The principal components of natural images P J B HANCOCK, R J BADDELEY & L S SMITH 71 Information processing and neuromodulation in the visual system of frogs and toads P R LAMING 89 Neurodevelopmental events underlying information acquisition and storage E DOYLE, P NOLAN, R BELL & C M REGAN 95 ABSTRACTS SECTION 97 BOOK REVIEWS Network welcomes research Papers and Letters where the findings have demonstrable relevance across traditional disciplinary boundaries. Research Papers can be of any length, if that length can be justified by content. 
Rarely, however, is it expected that a length in excess of 10,000 words will be justified. 2,500 words is the expected limit for research Letters. Articles can be published from authors' TeX source codes. Macros can be supplied to produce papers in the form suitable for refereeing and for IOP house style. For more details contact the Editorial Services Manager at IOP Publishing, Techno House, Redcliffe Way, Bristol BS1 6NX, UK. Telephone: 0272 297481 Fax: 0272 294318 Telex: 449149 INSTP G Email Janet: IOPPL at UK.AC.RL.GB Subscription Information Frequency: quarterly Subscription rates: Institution 125.00 pounds (US$220.00) Individual (UK) 17.30 pounds (Overseas) 20.50 pounds (US$37.90) A microfiche edition is also available at 75.00 pounds (US$132.00) From poggio at atr-hr.atr.co.jp Wed Jan 22 02:00:50 1992 From: poggio at atr-hr.atr.co.jp (Poggio) Date: Wed, 22 Jan 92 16:00:50 +0900 Subject: NIPS preprint in neuroprose In-Reply-To: becker@ai.toronto.edu's message of Thu, 16 Jan 1992 14:44:36 -0500 <92Jan16.144445edt.10@neuron.ai.toronto.edu> Message-ID: <9201220700.AA02064@atr-hr.atr.co.jp> isbell at ai.mit.edu From uh311ae at sunmanager.lrz-muenchen.de Wed Jan 22 06:07:34 1992 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 22 Jan 92 12:07:34+0100 Subject: CascadeC in parallel Message-ID: <9201221107.AA02532@sunmanager.lrz-muenchen.de> It is less than obvious how to parallelize Cascade Correlation, especially if compared to stuff like backprop (which fits easily on specialized 'kill-them-all' machines). If one tries to decompose the CC-computation onto neurons, it is a quick failure: It is not obvious how to do any better than training the candidate pool in parallel. This suggests that the problem should be decomposed into another unit. I am too foggy at this time of the working day, however, to see another clever decomposition unit that would be homogeneous enough to fit on a SIMD architecture. CC abandons the inter-hidden-unit parallelism by training only one at a time. It also kills off most of the feeding calculation via freezing and caching. So, where does the long vector SIMD machine fit into the picture? It can't be a uniprocessor-RISC-only algorithm like 'Boot Unix on the gate-level simulator'. Of course I could do 10 complete networks in parallel, but ... Cheers, Henrik From B344DSL at utarlg.uta.edu Wed Jan 22 12:03:00 1992 From: B344DSL at utarlg.uta.edu (B344DSL@utarlg.uta.edu) Date: Wed, 22 Jan 1992 11:03 CST Subject: Last notice and registration form for conference Feb. 6-8, UT Dallas Message-ID: <01GFM4NT9W5C00037X@utarlg.uta.edu> The program and abstracts I sent for the Optimality conference Feb. 6-8, I believe, did not include a registration form. I am sorry for the error: since I have already sent out two mailings to Connectionists on this conference, this is the last general mailing I will send on it. Anybody desiring more information, e.g., abstracts that weren't included earlier, should contact me individually (my e-mail address is at the end of the registration form which I am including in this notice.) Hope to see some of you there. Dan Levine REGISTRATION FOR CONFERENCE ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS?
FEBRUARY 6 TO 8, 1992, UNIVERSITY OF TEXAS AT DALLAS Sponsored by Metroplex Institute for Neural Dynamics (MIND), Texas SIG of International Neural Network Society (INNS), and the University of Texas at Dallas Name _______________________________________________________ LAST FIRST MIDDLE Mailing Address ________________________________________________ ________________________________________________ ________________________________________________ ________________________________________________ Affiliation ________________________________________________ Telephone Number ____________________ e-mail if any ____________________ FAX if any ____________________ Registration fee (please enclose check payable to MIND): Non-student members of MIND or INNS, $70 _______ or UTD faculty or staff Other non-students $80 _______ Student members of MIND or INNS, $10 _______ or UTD students Other students $20 _______ Presenters (oral or poster) from outside Dallas-Ft. Worth FREE _______ (Note: Registration does not include meals) Hotel: Please check if you need a reservation card ______ (Rooms at the Richardson Hilton are $59 a night) Please check if you wish to share a room ______ Reduced fares are available to Dallas-Fort Worth on American Airlines. Call the airline and ask for StarFile S14227D, under the name of MIND. Preregistrants whose forms and payment checks are received by January 31 will be mailed a preregistration package with a confirmation. This will include a complete schedule with times of presentations and directions to the hotel and conference site. Please send this form to: Professor Daniel S. Levine Department of Mathematics Box 19408 University of Texas at Arlington Arlington, TX 76019-0408 Office: 817-273-3598; FAX: 817-794-5802; e-mail b344dsl at utarlg.uta.edu Conference program(still subject to minor change): ORAL PRESENTATIONS -- Thursday, Feb. 6, AM: Daniel Levine, U. of Texas, Arlington -- Don't Just Stand There, Optimize Something! Samuel Leven, Radford U. -- Man as Machine? Conflicting Optima, Dynamic Goals, and Hope Wesley Elsberry, Battelle Research Labs -- Putting Optimality in its Place: Argument on Context, Systems, and Neural Networks Graham Tattersall, U. of East Anglia -- Optimal Generalisation in Artificial Neural Networks Thursday, Feb. 6, PM: Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Network Formations on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network Friday, Feb. 7, AM: Gershom Rosenstein, Hebrew U. -- For What are Brains Striving? Gail Carpenter, Boston U. -- Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning Stephen Grossberg, Boston U. -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control Haluk Ogmen, U. of Houston -- Self-Organization via Active Exploration in Robotics Friday, Feb. 7, PM: David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems Ian Parberry, U. of North Texas -- Neural Networks and Computational Complexity David Chance, Central Oklahoma U. 
-- Real-time Neuronal Models Compared Within a Classical Conditioning Framework Samy Bengio, Universite de Montreal -- On the Optimization of a Synaptic Learning Rule Saturday, Feb. 8, AM: Karl Pribram, Radford U. -- The Least Action Principle: Does it Apply to Cognitive Processes? Paul Prueitt, Georgetown U. -- Control Hierarchies and the Return to Homeostasis Herve Abdi, U. of Texas, Dallas -- Generalization of the Linear Auto-Associator Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Saturday, Feb. 8, PM: Panel discussion on the basic themes of the conference POSTERS Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours John Johnson, U. of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories Brian Telfer, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers From jm2z+ at andrew.cmu.edu Wed Jan 22 12:25:25 1992 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Wed, 22 Jan 1992 12:25:25 -0500 (EST) Subject: TRs announcemnt Message-ID: **** DO NOT FORWARD TO OTHER GROUPS **** We have recently produced two technical reports, the first in a new series devoted to issues in PDP and Cognitive Neuroscience. They are described below, followed by instructions for obtaining copies. -------------------------------------------------------------- TOWARD A THEORY OF INFORMATION PROCESSING IN GRADED, RANDOM, INTERACTIVE NETWORKS James L. McClelland Technical Report PDP.CNS.91.1 A set of principles for information processing in parallel distributed processing systems is described. In brief, the principles state that processing is graded and gradual, interactive and competitive, and subject to intrinsic variability. Networks that adhere to these principles are called GRAIN networks. Four goals of a theory of information processing based on these principles are described: 1) to unify the asymptotic accuracy, reaction-time, and time accuracy paradigms; 2) to examine how simple general laws might emerge from systems conforming to the principles and to predict and/or explain cases in which the general laws do not hold; and 3) to explore the explanatory role of the various principles in different aspects of observed empirical phenomena. Two case studies are presented. In the first, a general regularity here called Morton's independence law of the joint effects of context and stimulus information on perceptual identification is shown to be an emergent property of Grain networks that obey a specific architectural constraint. In the second case study, the general shape of time-accuracy curves produced by networks that conform to the principles is examined. A regularity here called Wickelgren's law, concerning the approximate shape of time accuracy curves, is found to be consistent with some GRAIN networks. While the exact conditions that give rise to standard time accuracy curves remain to be established, one observation is made concerning conditions that can lead to a strong violation of Wickelgren's law, and an experimental study that can be interpreted as meeting this condition is simulated. In both case studies, the joint dependency of the emergent characteristics of information processing on the different principles is considered. 
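For readers who want a concrete picture of what "graded, gradual, interactive, and intrinsically variable" processing can look like in simulation, here is a rough sketch built only on those four principles (our own illustration; the report's actual equations should be taken from the report itself):

import numpy as np

def grain_step(a, W, ext, dt=0.05, noise_sd=0.1, rng=np.random):
    # One update of a network that is interactive (recurrent weights W),
    # graded (activations in (0,1)), gradual (small step dt toward a target),
    # and intrinsically variable (Gaussian noise added to the net input).
    net = W @ a + ext + rng.normal(0.0, noise_sd, size=a.shape)
    target = 1.0 / (1.0 + np.exp(-net))
    return a + dt * (target - a)

# Example: two mutually supportive units, one receiving a weak external input.
a = np.full(2, 0.5)
W = np.array([[0.0, 1.5], [1.5, 0.0]])
for _ in range(200):
    a = grain_step(a, W, ext=np.array([0.2, 0.0]))
print(a)   # both activations drift gradually upward; the time course, not just
           # the endpoint, is what reaction-time and time-accuracy analyses use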
--------------------------------------------------------------- LEARNING CONTINUOUS PROBABILITY DISTRIBUTIONS WITH THE CONTRASTIVE HEBBIAN ALGORITHM Javier R. Movellan & James L. McClelland Technical Report PDP.CNS.91.2 We show that contrastive Hebbian learning (CHL), a well known learning rule previously used to train Boltzmann machines and Hopfield models, can also be used to train networks that adhere to the principles of graded, random, and interactive propagation of information. We show that when applied to such networks, CHL performs gradient descent on a contrastive function which captures the difference between desired and obtained continuous multivariate probability distributions. This allows the learning algorithm to go beyond expected values of output units and to approximate complete probability distributions on continuous multivariate activation spaces. Simulations show that such networks can indeed be trained with the CHL rule to approximate discrete and continuous probability distributions of various types. We briefly discuss the implications of stochasticity in our interpretation of information processing concepts. -------------------------------------------------------------- To Obtain Copies: To minimize printing/mailing costs, we strongly encourage interested readers of this mailing list to get the reports via ftp. The filenames for the reports are * pdp.cns.91.1.ps.Z for the McClelland paper (42 pages, no figures) * pdp.cns.91.2.ps.Z for the Movellan and McClelland paper (42 pages, includes figures). Full instructions are given below. For those who do not have access to ftp, physical copies can be requested from: bd1q+ at andrew.cmu.edu You can also request just the figures of the McClelland paper. In your email please indicate exactly what you are requesting. Figures for the McClelland paper will be sent promptly; physical copies of either complete TR will be sent within a few weeks of receipt of the request. Instructions: To obtain copies via ftp use the following commands: unix> ftp 128.2.248.152 Name: anonymous Password: pdp.cns ftp> cd pub/pdp.cns ftp> binary ftp> get pdp.cns.91.1.ps.Z (or pdp.cns.91.2.ps.Z) ftp> quit unix> uncompress pdp.cns.91.1.ps.Z | lpr ------------------------------------------------------------------------------ If you need further help, please contact me: Javier R. Movellan...... jm2z at andrew.cmu.edu Department of Psychology..... 412/268-5145(voice) Carnegie Mellon University 412/268-5060(Fax) Pittsburgh, PA 15213-3890 - Javier From dlukas at park.bu.edu Wed Jan 22 15:14:27 1992 From: dlukas at park.bu.edu (dlukas@park.bu.edu) Date: Wed, 22 Jan 92 15:14:27 -0500 Subject: Call For Papers: Neural Networks for Learning, Recognition and Control Message-ID: <9201222014.AA21377@fenway.bu.edu> CALL FOR PAPERS International Conference on NEURAL NETWORKS FOR LEARNING, RECOGNITION, AND CONTROL May 14-16, 1992 Gail A. Carpenter and Stephen Grossberg CONFIRMED SPEAKERS: May 14: R. Shiffrin, R. Ratcliff, D. Rumelhart. May 15: M. Mishkin, L. Squire, S. Grossberg, T. Berger, M. Bear, G. Carpenter, A. Waxman, T. Caudell. May 16: G. Cybenko, E. Sontag, R. Brockett, B. Peterson, D. Bullock, J. Albus, K. Narendra, R. Pap. CONTRIBUTED PAPERS: A featured 3-hour poster session on neural network research related to learning, recognition, and control will be held on May 15, 1992. Attendees who wish to present a poster should submit three copies of an abstract (one single-space page), post-marked by March 1, 1992, for refereeing. 
Include a cover letter giving the name, address, and telephone number of the corresponding author. Mail to: Poster Session, Neural Networks Conference, Wang Institute of Boston University, 72 Tyng Road, Tyngsboro, MA 01879. Authors will be informed of abstract acceptance by March 31, 1992. A book of lecture and poster abstracts will be given to attendees at the conference. For information about registration and the two neural network tutorial courses being taught on May 9-14, call (508) 649-9731 (x255) or request a meeting brochure in writing when submitting your abstract. From jfj%FRLIM51.BITNET at BITNET.CC.CMU.EDU Thu Jan 23 10:14:16 1992 From: jfj%FRLIM51.BITNET at BITNET.CC.CMU.EDU (jfj%FRLIM51.BITNET@BITNET.CC.CMU.EDU) Date: Thu, 23 Jan 92 16:14:16 +0100 Subject: NNs & NLP Message-ID: <9201231514.AA14149@m53.limsi.fr> Hi. About a month ago, I posted a request for references concerning neural networs and natural language processing. Quite a few people have been kind enough to reply, and I am in the process of compiling a bibliography list. This should be completed soon (I'm still waiting for a few references to arrive by hard-mail), and I'll post the results. If anyone out there hasn't yet replied, and would like to do so, I'll be glad to add their contribution to the list. Thank you for your help, jfj From josh at flash.bellcore.com Thu Jan 23 09:54:10 1992 From: josh at flash.bellcore.com (Joshua Alspector) Date: Thu, 23 Jan 92 09:54:10 -0500 Subject: Postdoctoral position at Bellcore Message-ID: <9201231454.AA13262@flash.bellcore.com> The neural network research group at Bellcore is looking for a post-doctoral researcher for a period of 1 year. The start date is flexible but should be before August, 1992. Because of its inherently parallel nature, neural network technology is particularly suited for the types of real-time computation needed in telecommunications. This includes data compression, signal processing, optimization, and speech and pattern recognition. Neural network training is also suitable for knowledge-based systems where the rules are not known or where there are too many rules to incorporate in an expert system. The goal of our group is to bring neural network technology into the telecommunications network. Our approach to developing and using neural network technology to encode knowledge by learning includes work on the following: 1) Development, analysis, and simulation of learning algorithms and architectures. 2) Design and fabrication of prototype chips and boards suitable for parallel, high-speed, neural systems. 3) Telecommunications applications. We are interested in strong candidates in any of the above work areas but are especially encouraging people who can demonstrate the usefulness of the technology in telecommunications applications. The successful candidate should have a demonstrated record of accomplishment in neural network research, should be proficient at working in a UNIX/C environment, and should be able to work interactively in the applied research area at Bellcore. Apply in writing to: Joshua Alspector Bellcore, MRE 2E-378 445 South St. Morristown, NJ 07962-1910 Please enclose a resume, a copy of a recent paper, and the names, addresses, and phone numbers of three referees. 
From yoshua at psyche.mit.edu Thu Jan 23 12:55:40 1992 From: yoshua at psyche.mit.edu (Yoshua Bengio) Date: Thu, 23 Jan 92 12:55:40 EST Subject: Optimizing a learning rule Message-ID: <9201231755.AA15379@psyche.mit.edu> Hello, Recently, Henrik Klagges broadcast on the connectionists list results he obtained on optimizing synaptic learning rules, citing our tech report from last year [1] on this subject. This report [1] did not contain any simulation results. However, since then, we have been able to perform several series of experiments, with interesting results. Early results were presented last year at Snowbird and more recent results will be presented at the Conference on Optimality in Biological and Artificial Networks, to be held in Dallas, TX, Feb. 6-9. A preprint can be obtained from anonymous ftp at iros1.umontreal.ca in directory pub/IRO/its/bengio.optim.ps.Z (compressed postscript file) The title of the paper to be presented at Dallas is: On the optimization of a synaptic learning rule by Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. Abstract: This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has {\it local} inputs, and is the same in many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms, in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, a local optimization method (gradient descent) as well as a global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments on classical conditioning for {\it Aplysia} yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimensional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Previous work: [1] Bengio Y. and Bengio S. (1990). Learning a synaptic learning rule. Technical Report #751. Computer Science Department. Universite de Montreal. [2] Bengio Y., Bengio S., and Cloutier, J. (1991). Learning a synaptic learning rule. IJCNN-91-Seattle. Related work: [3] Chalmers D.J. (1990). The evolution of learning: an experiment in genetic connectionism. In: Connectionist Models: Proceedings of the 1990 Summer School, pp. 81-90. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Thu Jan 23 13:55:36 1992 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Thu, 23 Jan 92 13:55:36 EST Subject: CascadeC variants, Sjogaard's paper In-Reply-To: Your message of 21 Jan 92 22:49:41 +0100.
<9201212149.AA01642@sunmanager.lrz-muenchen.de> Message-ID: Steen Sjogaard makes some interesting observations about the Cascade Correlation algorithm. He invents a variation of it, named 2CCA, which is using only two layers, but also relies on freezing weights, error covariance, candidate pools etc. as the CCA does. Some people don't like to see much discussion on this mailing list, so I'll keep this response as brief as possible. Sjogaard's 2CCA algorithm is just like cascade-correlation except that it eliminates the "cascade" part: the candidate units receive connections only from the original inputs, and not from previously tenured hidden units. So it builds a net with a single hidden layer, plus shortcut connections from inputs to outputs. For some problems, a solution with one hidden layer is as good as any other, and for these problems 2CCA will learn a bit faster and generalize a bit better than Cascor. The extra degrees of freedom in cascor have nothing useful to do and they just get in the way. However, for other problems, such as two-spirals, you get a much better solution with more layers. A cascade architecture can solve this problem with 10 units, while a single hidden layer requires something like 50 or 60. My own conclusion is that 2CCA does work somewhat better for certain problems, but is terrible for others. If you don't know in advance what architecture your problem needs, you are probably better off sticking with the more general cascade architecture. Chris Lebiere looked briefly at the following option: create two pools of candidate units, one that receives connections from all pre-existing inputs and hidden units, and one that has no connections from the deepest layer created so far. If the correlation scores are pretty close, tenure goes to the best-scoring unit in the latter pool. This new unit is a sibling of the pre-existing units, not a descendant. Preliminary results were promising, but for various reasons Chris didn't follow up on this and it is still an open question what effect this might have on generalization. Scott Fahlman School of Computer Science Carnegie Mellon University From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Thu Jan 23 14:08:49 1992 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Thu, 23 Jan 92 14:08:49 EST Subject: CascadeC in parallel In-Reply-To: Your message of 22 Jan 92 12:07:34 +0100. <9201221107.AA02532@sunmanager.lrz-muenchen.de> Message-ID: It is less than obvious how to parallelize Cascade Correlation, especially if compared to stuff like backprop. There are several good ways to partition Cascade-Correlation for parallel execution. Of course, the choice depends on the dimensions of your problem, and the dimensions and communication structure of your machine. One obvious choice is indeed to run each of the candidate units on its own separate processor. There is little communication involved: just broadcasting the incoming values and error to be matched (if these are not stored locally) and occasional polling to determine quiescence and choose a winner. For hard problems, Cascor spends most of its time in the candidate training phase, so this can give good results. If there are many output units, they can also be trained separately, one per processor. It is possible to overlap candidate and output training to some degree. The other obvious choice is to simulate an identical copy of the whole architecture on different processors, each with 1/N of the training data.
Just before each weight-update phase, you sum up the computed derivative values from all the processors and update all the copies at once. Of course, there's much less total work to be done than with backprop, so you might need a bigger problem to get any real advantage from parallelism. But I find it hard to think of that as a disadvantage. Scott Fahlman School of Computer Science Carnegie Mellon University From seifert at csmil.umich.edu Thu Jan 23 17:03:41 1992 From: seifert at csmil.umich.edu (Colleen Seifert) Date: Thu, 23 Jan 92 17:03:41 -0500 Subject: position announcement Message-ID: <9201232203.AA17879@csmil.umich.edu> The University of Michigan Department of Psychology invites applications for a tenure-track position in the area of Cognitive Modelling. We seek candidates with primary interests and technical skills in cognitive psychology, with special preference for individuals with particular expertise in computational modelling (broadly defined, including connectionist modelling). Due to time constraints, please indicate interest via email (to "gmo at csmil.umich.edu") along with sending vita, references, publications, and statement of research and teaching interests to: Cognitive Processes Search Committee, Dept. of Psychology, University of Michigan, Ann Arbor, MI 48109. From russ at oceanus.mitre.org Fri Jan 24 07:25:35 1992 From: russ at oceanus.mitre.org (Russell Leighton) Date: Fri, 24 Jan 92 07:25:35 EST Subject: Aspirin/MIGRAINES V5.0 Message-ID: <9201241225.AA29557@oceanus.mitre.org> ------- OFFICIAL RELEASE! All pre-release 5.0 versions should be deleted ------- The following describes a neural network simulation environment made available free from the MITRE Corporation. The software contains a neural network simulation code generator which generates high performance C code implementations for backpropagation networks. Also included is an interface to visualization tools. FREE NEURAL NETWORK SIMULATOR AVAILABLE Aspirin/MIGRAINES Version 5.0 The Mitre Corporation is making available free to the public a neural network simulation environment called Aspirin/MIGRAINES. The software consists of a code generator that builds neural network simulations by reading a network description (written in a language called "Aspirin") and generates a C simulation. An interface (called "MIGRAINES") is provided to export data from the neural network to visualization tools. The system has been ported to a number of platforms: Apollo Convex Cray DecStation HP IBM RS/6000 Intel 486/386 (Unix System V) NeXT News Silicon Graphics Iris Sun4, Sun3 Coprocessors: Mercury i860 (40MHz) Coprocessors Meiko Computing Surface w/i860 (40MHz) Nodes Skystation i860 (40MHz) Coprocessors iWarp Cells Included with the software are "config" files for these platforms. Porting to other platforms may be done by choosing the "closest" platform currently supported and adapting the config files. Aspirin 5.0 ------------ The software that we are releasing now is for creating, and evaluating, feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, as well as at casual users who will wish to use the system with an absolute minimum of effort. 
From arseno at phy.ulaval.ca Fri Jan 24 09:58:25 1992 From: arseno at phy.ulaval.ca (Henri Arsenault) Date: Fri, 24 Jan 92 09:58:25 EST Subject: programs Message-ID: <9201241458.AA15022@einstein.phy.ulaval.ca> There are a lot of long meetings programs with abstracts and so on being transmitted on this network. Would it not be more economical to transmit a short abstract along with instructions on how to ftp the whole document? Almost every day I have to scroll through long documents of marginal interest to me. Is it really necessary to put such long documents so all the subscribers have to read them? Henri H.
Arsenault email: arseno at phy.ulaval.ca From Ye-Yi.Wang at DEAD.BOLTZ.CS.CMU.EDU Fri Jan 24 10:18:24 1992 From: Ye-Yi.Wang at DEAD.BOLTZ.CS.CMU.EDU (Ye-Yi.Wang@DEAD.BOLTZ.CS.CMU.EDU) Date: Fri, 24 Jan 92 10:18:24 EST Subject: connectionist text generation Message-ID: Coulod anyone give me a pointer to the references on neural network text generation systems? Thanks. Ye-Yi From russ at oceanus.mitre.org Fri Jan 24 13:15:23 1992 From: russ at oceanus.mitre.org (Russell Leighton) Date: Fri, 24 Jan 92 13:15:23 EST Subject: Free Neural Network Simulator (Aspirin V5.0) Message-ID: <9201241815.AA03381@oceanus.mitre.org> ------- OFFICIAL RELEASE! All pre-release 5.0 versions should be deleted ------- The following describes a neural network simulation environment made available free from the MITRE Corporation. The software contains a neural network simulation code generator which generates high performance C code implementations for backpropagation networks. Also included is an interface to visualization tools. FREE NEURAL NETWORK SIMULATOR AVAILABLE Aspirin/MIGRAINES Version 5.0 The Mitre Corporation is making available free to the public a neural network simulation environment called Aspirin/MIGRAINES. The software consists of a code generator that builds neural network simulations by reading a network description (written in a language called "Aspirin") and generates a C simulation. An interface (called "MIGRAINES") is provided to export data from the neural network to visualization tools. The system has been ported to a number of platforms: Apollo Convex Cray DecStation HP IBM RS/6000 Intel 486/386 (Unix System V) NeXT News Silicon Graphics Iris Sun4, Sun3 Coprocessors: Mercury i860 (40MHz) Coprocessors Meiko Computing Surface w/i860 (40MHz) Nodes Skystation i860 (40MHz) Coprocessors iWarp Cells Included with the software are "config" files for these platforms. Porting to other platforms may be done by choosing the "closest" platform currently supported and adapting the config files. Aspirin 5.0 ------------ The software that we are releasing now is for creating, and evaluating, feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, as well as at casual users who will wish to use the system with an absolute minimum of effort. Aspirin was originally conceived as ``a way of dealing with MIGRAINES.'' Our goal was to create an underlying system that would exist behind the graphics and provide the network modeling facilities. The system had to be flexible enough to allow research, that is, make it easy for a user to make frequent, possibly substantial, changes to network designs and learning algorithms. At the same time it had to be efficient enough to allow large ``real-world'' neural network systems to be developed. Aspirin uses a front-end parser and code generators to realize this goal. A high level declarative language has been developed to describe a network. This language was designed to make commonly used network constructs simple to describe, but to allow any network to be described. The Aspirin file defines the type of network, the size and topology of the network, and descriptions of the network's input and output. This file may also include information such as initial values of weights, names of user defined functions. The Aspirin language is based around the concept of a "black box". 
A black box is a module that (optionally) receives input and (necessarily) produces output. Black boxes are autonomous units that are used to construct neural network systems. Black boxes may be connected arbitrarily to create large, possibly heterogeneous network systems. As a simple example, pre- or post-processing stages of a neural network can be considered black boxes that do not learn. The output of the Aspirin parser is sent to the appropriate code generator that implements the desired neural network paradigm. The goal of Aspirin is to provide a common extendible front-end language and parser for different network paradigms. The publicly available software will include a backpropagation code generator that supports several variations of the backpropagation learning algorithm. For backpropagation networks and their variations, Aspirin supports a wide variety of capabilities:
1. feed-forward layered networks with arbitrary connections
2. ``skip level'' connections
3. one and two-dimensional weight tessellations
4. a few node transfer functions (as well as user defined)
5. connections to layers/inputs at arbitrary delays, also "Waibel style" time-delay neural networks
6. autoregressive nodes
7. line search and conjugate gradient optimization
The file describing a network is processed by the Aspirin parser and files containing C functions to implement that network are generated. This code can then be linked with an application which uses these routines to control the network. Optionally, a complete simulation may be automatically generated which is integrated with the MIGRAINES interface and can read data in a variety of file formats. Currently supported file formats are: Ascii, Type1, Type2, Type3, Type4, Type5 (simple floating point file formats), ProMatlab.

Examples
--------
A set of examples comes with the distribution:
xor: from Rumelhart and McClelland, et al., "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 330-334.
encode: from Rumelhart and McClelland, et al., "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 335-339.
detect: Detecting a sine wave in noise.
iris: The classic iris database.
characters: Learning to recognize 4 characters independent of rotation.
ring: Autoregressive network learns a decaying sinusoid impulse response.
sequence: Autoregressive network learns to recognize a short sequence of orthonormal vectors.
sonar: from Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.
spiral: from Kevin J. Lang and Michael J. Witbrock, "Learning to Tell Two Spirals Apart", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988.
ntalk: from Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168.
perf: a large network used only for performance testing.
monk: The backprop part of the monk paper. The MONK's problems were the basis of a first international comparison of learning algorithms. The result of this comparison is summarized in "The MONK's Problems - A Performance Comparison of Different Learning Algorithms" by S.B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich, H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, and J.
Zhang has been published as Technical Report CS-CMU-91-197, Carnegie Mellon University, in Dec. 1991.

Performance of Aspirin simulations
----------------------------------
The backpropagation code generator produces simulations that run very efficiently. Aspirin simulations do best on vector machines when the networks are large, as exemplified by the Cray's performance. All simulations were done using the Unix "time" function and include all simulation overhead. The connections per second rating was calculated by multiplying the number of iterations by the total number of connections in the network and dividing by the "user" time provided by the Unix time function. Two tests were performed. In the first, the network was simply run "forward" 100,000 times and timed. In the second, the network was timed in learning mode and run until convergence. Under both tests the "user" time included the time to read in the data and initialize the network.

Sonar: This network is a two-layer fully connected network with 60 inputs: 2-34-60.
Millions of Connections per Second
Forward:
SparcStation1: 1
IBM RS/6000 320: 2.8
HP9000/730: 4.0
Meiko i860 (40MHz): 4.4
Mercury i860 (40MHz): 5.6
Cray YMP: 21.9
Cray C90: 33.2
Forward/Backward:
SparcStation1: 0.3
IBM RS/6000 320: 0.8
Meiko i860 (40MHz): 0.9
HP9000/730: 1.1
Mercury i860 (40MHz): 1.3
Cray YMP: 7.6
Cray C90: 13.5
Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.

Nettalk: This network is a two-layer fully connected network with [29 x 7] inputs: 26-[15 x 8]-[29 x 7].
Millions of Connections per Second
Forward:
SparcStation1: 1
IBM RS/6000 320: 3.5
HP9000/730: 4.5
Mercury i860 (40MHz): 12.4
Meiko i860 (40MHz): 12.6
Cray YMP: 113.5
Cray C90: 220.3
Forward/Backward:
SparcStation1: 0.4
IBM RS/6000 320: 1.3
HP9000/730: 1.7
Meiko i860 (40MHz): 2.5
Mercury i860 (40MHz): 3.7
Cray YMP: 40
Cray C90: 65.6
Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168.

Perf: This network was only run on a few systems. It is very large with very long vectors. The performance on this network is in some sense a peak performance for a machine. This network is a two-layer fully connected network with 2000 inputs: 100-500-2000.
Millions of Connections per Second
Forward:
Cray YMP: 103.00
Cray C90: 220
Forward/Backward:
Cray YMP: 25.46
Cray C90: 59.3

MIGRAINES
---------
The MIGRAINES interface is a terminal-based interface that allows you to open Unix pipes to data in the neural network. This replaces the NeWS1.1 graphical interface in version 4.0 of the Aspirin/MIGRAINES software. The new interface is not as simple to use as the version 4.0 interface but is much more portable and flexible. The MIGRAINES interface allows users to output neural network weight and node vectors to disk or to other Unix processes. Users can display the data using either public or commercial graphics/analysis tools. Example filters are included that convert data exported through MIGRAINES to formats readable by:
- Gnuplot 3.0
- Matlab
- Mathematica
Most of the examples (see above) use the MIGRAINES interface to dump data to disk and display it using a public software package called Gnuplot3.0. Gnuplot3.0 can be obtained via anonymous ftp from: >>>> In general, Gnuplot 3.0 is available as the file gnuplot3.0.tar.Z. >>>> Please obtain gnuplot from the site nearest you.
Many of the major ftp >>>> archives world-wide have already picked up the latest version, so if >>>> you found the old version elsewhere, you might check there. >>>> >>>> >>>> USENET users: >>>> >>>> GNUPLOT 3.0 was posted to comp.sources.misc. >>>> >>>> >>>> NORTH AMERICA: >>>> >>>> Anonymous ftp to dartmouth.edu (129.170.16.4) >>>> Fetch >>>> pub/gnuplot/gnuplot3.0.tar.Z >>>> in binary mode. >>>>>>>> A special hack for NeXTStep may be found on 'sonata.cc.purdue.edu' >>>>>>>> in the directory /pub/next/submissions. The gnuplot3.0 distribution >>>>>>>> is also there (in that directory). >>>>>>>> >>>>>>>> There is a problem to be aware of--you will need to recompile. >>>>>>>> gnuplot has a minor bug, so you will need to compile the command.c >>>>>>>> file separately with the HELPFILE defined as the entire path name >>>>>>>> (including the help file name.) If you don't, the Makefile will >>>>>>>> override the def and help won't work (in fact it will bomb the program.)

NetTools
--------
We have included a simple set of analysis tools by Simon Dennis and Steven Phillips. They are used in some of the examples to illustrate the use of the MIGRAINES interface with analysis tools. The package contains three tools for network analysis:
gea - Group Error Analysis
pca - Principal Components Analysis
cda - Canonical Discriminants Analysis

How to get Aspirin/MIGRAINES
----------------------------
The software is available from two FTP sites, CMU's simulator collection and UCLA's cognitive science machines. The compressed tar file is a little less than 2 megabytes. Most of this space is taken up by the documentation and examples. The software is currently only available via anonymous FTP.

> To get the software from CMU's simulator collection:
1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu" (128.2.254.155).
2. Log in as user "anonymous" with your username as password.
3. Change remote directory to "/afs/cs/project/connect/code". Any subdirectories of this one should also be accessible. Parent directories should not be. ****You must do this in a single operation****: cd /afs/cs/project/connect/code
4. At this point FTP should be able to get a listing of files in this directory and fetch the ones you want. Problems? - contact us at "connectionists-request at cs.cmu.edu".
5. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT **
6. Get the file "am5.tar.Z"

> To get the software from UCLA's cognitive science machines:
1. Create an FTP connection to "polaris.cognet.ucla.edu" (128.97.50.3) (typically with the command "ftp 128.97.50.3")
2. Log in as user "anonymous" with your username as password.
3. Change remote directory to "alexis", by typing the command "cd alexis"
4. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT **
5. Get the file by typing the command "get am5.tar.Z"

How to unpack the software
--------------------------
After ftp'ing the file, make the directory in which you wish to install the software. Go to that directory and type:
zcat am5.tar.Z | tar xvf -
-or-
uncompress am5.tar.Z ; tar xvf am5.tar

How to print the manual
-----------------------
The user documentation is located in ./doc in a few compressed PostScript files. To print each file on a PostScript printer type:
uncompress *.Z
lpr -s *.ps

Why?
----
I have been asked why MITRE is giving away this software. MITRE is a non-profit organization funded by the U.S. federal government. MITRE does research and development into various technical areas.
Our research into neural network algorithms and applications has resulted in this software. Since MITRE is a publicly funded organization, it seems appropriate that the product of the neural network research be turned back into the technical community at large.

Thanks
------
Thanks to the beta sites for helping me get the bugs out and make this portable. Thanks to the folks at CMU and UCLA for the ftp sites.

Copyright and license agreement
-------------------------------
Since the Aspirin/MIGRAINES system is licensed free of charge, the MITRE Corporation provides absolutely no warranty. Should the Aspirin/MIGRAINES system prove defective, you must assume the cost of all necessary servicing, repair or correction. In no way will the MITRE Corporation be liable to you for damages, including any lost profits, lost monies, or other special, incidental or consequential damages arising out of the use or inability to use the Aspirin/MIGRAINES system. This software is the copyright of The MITRE Corporation. It may be freely used and modified for research and development purposes. We require a brief acknowledgement in any research paper or other publication where this software has made a significant contribution. If you wish to use it for commercial gain you must contact The MITRE Corporation for conditions of use. The MITRE Corporation provides absolutely NO WARRANTY for this software.

January, 1992
Russell Leighton
MITRE Signal Processing Center
7525 Colshire Dr.
McLean, Va. 22102, USA
INTERNET: russ at dash.mitre.org, leighton at mitre.org

From bap at james.psych.yale.edu Fri Jan 24 16:45:02 1992 From: bap at james.psych.yale.edu (Barak Pearlmutter) Date: Fri, 24 Jan 92 16:45:02 -0500 Subject: YANIPSPPI (Yet Another NIPS PrePrint by Internet) Message-ID: <9201242145.AA16059@james.psych.yale.edu> Because of the large number of requests for preprints of "Gradient descent: second-order momentum and saturating error", I am following the trend and making it available by FTP. I apologize to those who left their cards, but the effort and expense of distribution is prohibitive. If you cannot access the paper in this fashion but really must have a copy before the proceedings come out, please contact me.
FTP INSTRUCTIONS:
ftp JAMES.PSYCH.YALE.EDU
user anonymous
password state-your-name
cd pub/bap/asymp
binary
get nips91.PS.Z
quit
zcat nips91.PS.Z | lpr
Maybe next year, instead of contracting out the proceedings, we can require postscript from all contributors and everyone will print everything out at home. Money saved will be used to purchase giant staplers. Barak Pearlmutter Yale University Department of Psychology 11A Yale Station New Haven, CT 06520-7447 pearlmutter-barak at yale.edu From yoshua at psyche.mit.edu Fri Jan 24 20:34:38 1992 From: yoshua at psyche.mit.edu (Yoshua Bengio) Date: Fri, 24 Jan 92 20:34:38 EST Subject: optimization of learning rule: erratum Message-ID: <9201250134.AA26747@psyche.mit.edu> Hi, My previous message mentioned the availability of a preprint on the optimization of learning rules at an ftp site. There was a typo in the address.
The correct address is: iros1.iro.umontreal.ca or 132.204.32.21 and the compressed postscript file is in pub/IRO/its/bengio.optim.ps.Z Sorry, Yoshua Bengio From Dave_Touretzky at DST.BOLTZ.CS.CMU.EDU Sat Jan 25 23:49:31 1992 From: Dave_Touretzky at DST.BOLTZ.CS.CMU.EDU (Dave_Touretzky@DST.BOLTZ.CS.CMU.EDU) Date: Sat, 25 Jan 92 23:49:31 EST Subject: programs In-Reply-To: Your message of Fri, 24 Jan 92 09:58:25 -0500. <9201241458.AA15022@einstein.phy.ulaval.ca> Message-ID: <15398.696401371@DST.BOLTZ.CS.CMU.EDU> Henri Arsenault writes: > There are a lot of long meetings programs with abstracts and so on being > transmitted on this network. Would it not be more economical to transmit > a short abstract along with instructions on how to ftp the whole > document? Almost every day I have to scroll through long documents of > marginal interest to me. Is it really necessary to put such long > documents so all the subscribers have to read them? Meeting programs and abstracts are entirely appropriate materials for the CONNECTIONISTS list. Not all our subscribers have access to FTP, and most would consider it an unreasonable inconvenience to have to FTP such things. My advice to you is to learn how to use your mail-reading program correctly. In most mail readers, long messages that don't interest you can be skipped with a single keystroke. If you are scrolling through the whole message in order to get to the next one, go back and read the user's manual. Please, let's have no more discussion of this topic on the CONNECTIONISTS list. *That* would be a waste of bandwidth. People who feel they absolutely have to comment on this can send their remarks to the maintainers: Connectionists-Request at cs.cmu.edu. -- Dave Touretzky From U53076%UICVM.BITNET at bitnet.cc.cmu.edu Mon Jan 27 00:32:17 1992 From: U53076%UICVM.BITNET at bitnet.cc.cmu.edu (Bruce Lambert) Date: Sun, 26 Jan 92 23:32:17 CST Subject: Optimizing inductive bias Message-ID: <01GFSII5TIGG9YCJ0F@BITNET.CC.CMU.EDU> Hi folks, Recently Yoshua Bengio posted a note about using standard optimization techniques to set tunable parameters of neural nets. Dave Tcheng and I have been working on the same basic idea at a more general level for several years. Rather than optimizing just networks, we have developed a framework for using optimization to search a large inductive bias space defined by several different types of algorithms (e.g., decision tree builders, nets, exemplar based approaches, etc.). Given the omnipresent necessity of tweaking biases to get good performance, automation of the bias search seems very sensible. A couple of references to our work are given below. We hope you find them useful. -Bruce Lambert Department of Pharmacy Administration University of Illinois at Chicago Tcheng, D., Lambert, B., Lu, S. C-Y., & Rendell, L. (1989). Building robust learning systems by combining induction and optimization. In _Proc. 11th IJCAI_ (pp. 806-812). San Mateo, CA: Morgan Kaufmann. Tcheng, D., Lambert, B., Lu, S. C-Y., & Rendell, L. (1991). AIMS: An adaptive interactive modelling system for supporting engineering decision making. In L. Birnbaum & G. Collins (Eds.), _Machine learning: Proceedings of the eighth international workshop_ (pp. 645-649). San Mateo, CA: Morgan Kaufmann.
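[Illustration: the sketch below is not the AIMS system cited above, only a minimal C rendering of the general idea of an automated bias search. The train_and_score() routine is a hypothetical stand-in for a full train/validate cycle, and the learner names and settings are likewise hypothetical.]

/* Illustrative sketch only: exhaustive search over a small, discrete
   inductive-bias space.  train_and_score() stands in for training a
   learner with the given bias setting and returning its error on
   held-out data. */

#include <stdio.h>

struct bias {
    const char *learner;    /* e.g. decision tree, backprop net, exemplar */
    double      setting;    /* e.g. pruning level, hidden units, k */
};

static double train_and_score(const struct bias *b)
{
    /* toy surrogate so the sketch is self-contained; replace with a
       real train/validate cycle */
    double d = b->setting - 8.0;
    return d * d;
}

int main(void)
{
    static const struct bias space[] = {
        { "decision-tree",  4.0 },
        { "decision-tree",  8.0 },
        { "backprop-net",   8.0 },
        { "backprop-net",  16.0 },
        { "exemplar",       5.0 }
    };
    int i, best = 0;
    double err, best_err = 1e30;

    /* evaluate every candidate bias and keep the one with lowest error */
    for (i = 0; i < (int)(sizeof space / sizeof space[0]); i++) {
        err = train_and_score(&space[i]);
        printf("%-14s %5.1f  error = %g\n",
               space[i].learner, space[i].setting, err);
        if (err < best_err) { best_err = err; best = i; }
    }
    printf("best bias: %s with setting %.1f\n",
           space[best].learner, space[best].setting);
    return 0;
}

In practice the exhaustive loop would be replaced by a smarter optimizer over a much larger bias space, which is the point of the systems described in the references above.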
From T00BOR%DHHDESY3.BITNET at BITNET.CC.CMU.EDU Mon Jan 27 15:54:45 1992 From: T00BOR%DHHDESY3.BITNET at BITNET.CC.CMU.EDU (Stefan Bornholdt) Date: MON, 27 JAN 92 15:54:45 MEZ Subject: papers available --- ann, genetic algorithms Message-ID: <01GFT1UXLRMS9YCJ3G@BITNET.CC.CMU.EDU> papers available, hardcopies only. ------------------------------------------------------------------------ GENERAL ASYMMETRIC NEURAL NETWORKS AND STRUCTURE DESIGN BY GENETIC ALGORITHMS Stefan Bornholdt Deutsches Elektronen-Synchrotron DESY, Notkestr. 85, 2000 Hamburg 52 Dirk Graudenz Institut f\"ur Theoretische Physik, Lehrstuhl E, RWTH 5100 Aachen, Germany. A learning algorithm for neural networks based on genetic algorithms is proposed. The concept leads in a natural way to a model for the explanation of inherited behavior. Explicitly we study a simplified model for a brain with sensory and motor neurons. We use a general asymmetric network whose structure is solely determined by an evolutionary process. This system is simulated numerically. It turns out that the network obtained by the algorithm reaches a stable state after a small number of sweeps. Some results illustrating the learning capabilities are presented. [to appear in Neural Networks] preprints available from: Stefan Bornholdt, DESY-T, Notkestr. 85, 2000 Hamburg 52, Germany. Email: t00bor at dhhdesy3.bitnet (hardcopies only, all rights reserved) ------------------------------------------------------------------------ From lacher at NU.CS.FSU.EDU Tue Jan 28 12:47:08 1992 From: lacher at NU.CS.FSU.EDU (Chris Lacher) Date: Tue, 28 Jan 92 12:47:08 -0500 Subject: appropriate material Message-ID: <9201281747.AA02271@lambda.cs.fsu.edu> This is to express my *satisfaction* with the connectionists mailing list and the materials I get by being a subscriber. True, there are the occasional "unsubscribe me" and " Harry, will you give me a ride home" accidents, but in reality these are more humorous than annoying. It is not realistic to expect a list with many subscribers to be perfect. In general, I find the material (information on preprints, conference announcements, bibliographic endeavors, and scientific discussions) well worth the few uninteresting things that come out. And, as Dave Touretzky stated, learning to use the mailer makes it very easy and convenient to skip over things that are of no interest. So, my vote is to KEEP connectionists and the associated ftp servers running. We owe a big round of applause for the institutions and people who keep it going. Thanks! Chris Lacher From bogner at eleceng.adelaide.edu.au Wed Jan 29 00:18:05 1992 From: bogner at eleceng.adelaide.edu.au (bogner@eleceng.adelaide.edu.au) Date: Wed, 29 Jan 1992 16:18:05 +1100 Subject: papers available --- ann, genetic algorithms In-Reply-To: Stefan Bornholdt's message of MON, 27 JAN 92 15:54:45 MEZ <01GFT1UXLRMS9YCJ3G@BITNET.CC.CMU.EDU> Message-ID: <9201290518.16821@munnari.oz.au> Would greatly appreciate a copy of the paper offered. Prof. Robert E. Bogner Dept. of Elec. Eng., Thu University of Adelaide, Box 498, Adelaide, SOUTH AUSTRALIA 5001 bogner at eleceng.adelaide.edu.au From ringram at ncsa.uiuc.edu Wed Jan 29 08:23:31 1992 From: ringram at ncsa.uiuc.edu (ringram@ncsa.uiuc.edu) Date: Wed, 29 Jan 92 07:23:31 -0600 Subject: Mailing List Message-ID: <9201291323.AA27261@newton.ncsa.uiuc.edu> Please remove my name from your mailing list. 
Rich Ingram From lacher at NU.CS.FSU.EDU Wed Jan 29 14:36:23 1992 From: lacher at NU.CS.FSU.EDU (Chris Lacher) Date: Wed, 29 Jan 92 14:36:23 -0500 Subject: paper Message-ID: <9201291936.AA02761@lambda.cs.fsu.edu> The following paper has been placed in the neuroprose archives under the name 'lacher.rapprochement.ps.Z'. Retrieval, uncompress, and printing have been successfuly tested. Expert Networks: Paradigmatic Conflict, Technological Rapprochement^\dagger R. C. Lacher Florida State University lacher at cs.fsu.edu Abstract. A rule-based expert system is demonstrated to have both a symbolic computational network representation and a sub-symbolic connectionist representation. These alternate views enhance the usefulness of the original system by facilitating introduction of connectionist learning methods into the symbolic domain. The connectionist representation learns and stores metaknowledge in highly connected subnetworks and domain knowledge in a sparsely connected expert network superstructure. The total connectivity of the neural network representation approximates that of real neural systems which may be useful in avoiding scaling and memory stability problems associated with some other connectionist models. Keywords. symbolic AI, connectionist AI, connectionism, neural networks, learning, reasoning, expert networks, expert systems, symbolic models, sub-symbolic models. ------------------- ^\dagger Paper given to the symposium "Approaches to Cognition", the fifteenth annual Symposium in Philosophy held at the University of North Carolina, Greensboro, April 5-7, 1991. From yair at siren.arc.nasa.gov Wed Jan 29 11:55:15 1992 From: yair at siren.arc.nasa.gov (Yair Barniv) Date: Wed, 29 Jan 92 08:55:15 PST Subject: papers available --- ann, genetic algorithms In-Reply-To: Stefan Bornholdt's message of MON, 27 JAN 92 15:54:45 MEZ <01GFT1UXLRMS9YCJ3G@BITNET.CC.CMU.EDU> Message-ID: <9201291655.AA05759@siren.arc.nasa.gov.> Hello Dr. Bornholdt: I will appreciate obtaining a copy of the above work Thanks, Yair Barniv NASA/Ames, Mountain View, CA USA From Gripe at VEGA.FAC.CS.CMU.EDU Wed Jan 29 15:47:30 1992 From: Gripe at VEGA.FAC.CS.CMU.EDU (Gripe@VEGA.FAC.CS.CMU.EDU) Date: Wed, 29 Jan 92 15:47:30 EST Subject: workshop announcement :posted for Diane Gordon Message-ID: <2157.696718050@PULSAR.FAC.CS.CMU.EDU> From gordon at AIC.NRL.Navy.Mil Wed Jan 15 15:59:56 1992 From: gordon at AIC.NRL.Navy.Mil (gordon@AIC.NRL.Navy.Mil) Date: Wed, 15 Jan 92 15:59:56 EST Subject: workshop announcement Message-ID: ing process. Researchers to date have studied various biases in inductive learning such as algorithms, representations, background knowledge, and instance orders. The focus of this workshop is not to examine these biases in isolation. Instead, this workshop will examine how these biases influence each other and how they influence learning performance. For example, how can active selection of instances in concept learning influence PAC convergence? How might a domain theory affect an inductive learning algorithm? How does the choice of representational bias in a learner influence its algo- rithmic bias and vice versa? The purpose of this workshop is to draw researchers from diverse areas to discuss the issue of biases in inductive learning. The workshop topic is a unifying theme for researchers working in the areas of reformulation, constructive induction, inverse resolu- tion, PAC learning, EBL-SBL learning, and other areas. This workshop does not encourage papers describing system comparisons. 
Instead, the workshop encourages papers on the following topics: - Empirical and analytical studies comparing different biases in inductive learning and their quantitative and qualitative influ- ence on each other or on learning performance - Studies of methods for dynamically adjusting biases, with a focus on the impact of these adjustments on other biases and on learning performance - Analyses of why certain biases are more suitable for particular applications of inductive learning - Issues that arise when integrating new biases into an existing inductive learning system - Theory of inductive bias Please send 4 hard copies of a paper (10-15 double-spaced pages, ML-92 format) or (if you do not wish to present a paper) a descrip- tion of your current research to: Diana Gordon Naval Research Laboratory, Code 5510 4555 Overlook Ave. S.W. Washington, D.C. 20375-5000 USA Email submissions to gordon at aic.nrl.navy.mil are also acceptable, but they must be in postscript. FAX submissions will not be accepted. If you have any questions about the workshop, please send email to Diana Gordon at gordon at aic.nrl.navy.mil or call 202-767- 2686. Important Dates: March 12 - Papers and research descriptions due May 1 - Acceptance notification June 1 - Final version of papers due Program Committee: Diana Gordon, Naval Research Laboratory Dennis Kibler, University of California at Irvine Larry Rendell, University of Illinois Jude Shavlik, University of Wisconsin William Spears, Naval Research Laboratory Devika Subramanian, Cornell University Paul Vitanyi, CWI and University of Amsterdam From bogner at eleceng.adelaide.edu.au Wed Jan 29 20:21:54 1992 From: bogner at eleceng.adelaide.edu.au (bogner@eleceng.adelaide.edu.au) Date: Thu, 30 Jan 1992 12:21:54 +1100 Subject: Advertisement Message-ID: <9201300121.14027@munnari.oz.au> University of Adelaide SIGNAL PROCESSING AND NEURAL NETWORKS RESEARCH ASSOCIATE or RESEARCH OFFICER A research associate or research officer is required as soon as possible to work on projects supported by the University, The Department of Defence, and the Australian Research Council. Two prime projects are under consideration and the appointee may be required to work on either or both. The aim of the one project is to design an electronic sensor organ based on known principles of insect vision. The insect's eye has specialised preprocessing that provides measures of distance and velocity by evaluation of deformations of the perceived visual field. The work will entail novel electronic design in silicon or gallium arsenide, software simulation and experimental work to evaluate and demonstrate performance. This work is in collaboration with the Australian National University. The aim of the other project is to develop and investigate principles of artificial neural network for processing multiple signals obtained from over-the-horizon radars. Investigation of the wavelet functions for the representations of signals may be involved. The work is primarily in the area of exploration of algorithms and high-level computer software. This work is in conjunction with DSTO. d. DUTIES: In consultation with task leaders and specialist researchers to investigate alternative design approaches and to produce designs for microelectronic devices, based on established design procedures. Communicate designs to manufacturers and oversee the production of devices. Prepare data for experiments on applications of signal processing and artificial neural networks. Prepare software for testing algorithms. 
Assist with the preparation of reports. QUALIFICATIONS: For the Research Associate, a PhD or other suitable evidence of equivalent capability in research in engineering or computer science. Exceptionally, a candidate with less experience but outstanding ability might be considered. For the Research Officer, a degree in electrical engineering or computer science with a good level of achievement. Experience in signal processing would be an advantage. Demonstrated ability to communicate fluently in written and spoken English. PAY and CONDITIONS: will be in accordance with University of Adelaide policies, and will depend on the qualifications and experience. Suitable incumbents may be able to include some of the work undertaken for a higher degree if they do not hold such. Appointment may be made in scales from $25692 p.a. to $33017 for the Research Officer or $29600 to $38418 p.a. for the Research Associate. ENQUIRIES: Professor R. E. Bogner, Dept. of Electrical and Electronic Engineering, The University of Adelaide, Box 498, Adelaide, Phone (08) 228 5589, Fax (08) 224 0464, E-mail bogner at eleceng.adelaide.edu.au From 0005013469 at mcimail.com Thu Jan 30 03:30:00 1992 From: 0005013469 at mcimail.com (Jean-Bernard Condat) Date: Thu, 30 Jan 92 08:30 GMT Subject: papers available --- golden section Message-ID: <70920130083007/0005013469PK3EM@mcimail.com> Hallo! I work on the golden section in sciences and am looking for all possible references and/or articles related to this subject. If you know of one, could you please send me a copy and/or the reference? Thank you very much for your kind help. Jean-Bernard Condat CCCF B.P. 8005 69351 Lyon Cedex 08 France Fax.: +33 1 47 87 70 70 Phone: +33 1 47 87 40 83 DialMail #24064 MCI Mail #501-3469 From anshu at discovery.rutgers.edu Thu Jan 30 14:33:01 1992 From: anshu at discovery.rutgers.edu (anshu@discovery.rutgers.edu) Date: Thu, 30 Jan 92 14:33:01 EST Subject: No subject Message-ID: <9201301933.AA04183@discovery.rutgers.edu> Hi! I am working on the retention of knowledge by a neural network. A neural network tends to forget the past training when it is trained on new data points. I'll be thankful if you could suggest some references on this topic, or some interesting research topics which I could pursue further as my thesis. Thanks. Anshu Agarwal anshu at caip.rutgers.edu From ross at psych.psy.uq.oz.au Thu Jan 30 14:56:16 1992 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Fri, 31 Jan 1992 06:56:16 +1100 Subject: Where is J M SOPENA of BARCELONA? Message-ID: <9201301956.AA11031@psych.psy.uq.oz.au> J.M. Sopena of the University of Barcelona posted notice of a paper on ESRP: a Distributed Connectionist Parser, some weeks back. The contact address was given as: d4pbjss0 at e0ub011.bitnet My mail to Sopena has been bounced by the bitnet gateway (cunyvm.bitnet) with a 'cannot find mailbox' message. Would Dr Sopena please contact me directly, or perhaps someone who HAS got through to Sopena might get in touch with me. Almost a dozen people have contacted me since I last posted this message, to say that they also failed to contact Sopena and to please let them know the secret if I found it.
Thank you. Ross Gayler ross at psych.psy.uq.oz.au From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Jan 31 09:55:54 1992 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 31 Jan 92 09:55:54 EST Subject: No subject In-Reply-To: Your message of Thu, 30 Jan 92 14:33:01 -0500. <9201301933.AA04183@discovery.rutgers.edu> Message-ID: I am working on the retention of knowledge by a neural network. A neural network tends to forget the past training when it is trained on new data points... Geoff Hinton and his students did some early work on damaging nets and then re-training, and also on the use of fast-changing and slow-changing weights. Perhaps he can supply some references to this work and related work done by others. Cascade-Correlation has some interesting properties related to retention. If you train and then switch to a new training set, you mess up the output-layer weights, but you retain all of the old feature detectors (hidden units), and maybe build some new ones for the new data. Then if you return to the old training set or a composite set, re-learning is generally quite fast. This is demonstrated in my Recurrent Cascade-Correlation (RCC) paper that can be found in NIPS-3 or in neuroprose. I train a net to recognize Morse code by breaking the training into distinct (not cumulative) lessons, starting with the shortest codes, and then training on all codes at once. This works better than training on all codes right from the start. This opens up the possibility of a sort of long-lived, eclectic net that is trained in many different domains over its "lifetime" and that gradually accumulates a rich library of useful feature detectors. The current version of Cascor wouldn't be very good for this, since later hidden units would be too deep and would have too many inputs, but I think that this problem of excess connectivity may be easily solvable. -- Scott Scott E. Fahlman School of Computer Science Carnegie Mellon University From pellioni at pioneer.arc.nasa.gov Wed Jan 29 20:38:04 1992 From: pellioni at pioneer.arc.nasa.gov (Andras Pellionisz SL) Date: Wed, 29 Jan 92 17:38:04 -0800 Subject: Open letter to Dr. Sun-Ichi Amari Message-ID: [[ Editor's Note: I know many in the field regard Dr. Pellionisz as holding controversial opinions. He and I have corresponded and I feel he brings up some very valid points which should be the source of substantive debate. The letter below is the result. I encourage responses, either in support or refutation, to the following letter. The main issue, that of intellectual priority and proper citation, affects all of us in research and forms the foundation of the modern scientific tradition. Dr. Pellionisz' secondary issue, international competition versus cooperation, is also worthy of discussion, though I would request that responses to Neuron Digest remain factual and near the subject of neural networks. I also certainly hope that Dr. Amari responds to the rather serious charges in an appropriate forum. -PM ]] Dear Peter: according to our previous exchange, after long deliberation, I put together the "Open letter to Amari". Given the fact that my personal story is well in line with some global happenings, I trust that you will find this contribution worthy of distribution. Andras * "Tensor-Geometrical Approach to Neural Nets" in 1985 and 91* or OPEN LETTER TO DR. SUN-ICHI AMARI by Andras J.
Pellionisz Dear Readers: Many of you may know that I pioneered a tensor-geometrical approach to neural nets for over a decade, with dozens of publications on this subject. Many of you may have seen a recent paper on tensor-geometry of neural nets (by Dr. Amari) as "opening a new fertile field of neural network research" (in 1991!) WITHOUT referencing ONE of the pre-existing pioneering studies. Dr. Amari did not even cite his own paper (1985), in which he criticized my pioneering. This is unfair, especially since the majority of readers were uninitiated in tensor geometry in 85 and thus his early "criticism" greatly hampered the unfolding of the tensor geometry approach that he now takes. Unfortunately, Dr. Amari's paper appeared in a Journal in which he is a chief editor. Therefore, I am turning directly to you, with the copy of my letter (sent to Dr. Amari 21st Oct. 1991, no response to date). There may be two issues involved. Obviously, we are entering an era which will be characterized by fierce competition in R&D world-wide, especially between the US, Japan and Europe. The question of protocol of fair competition in such a complex endeavor may be too nascent or too overwhelming for me to address. The costliness of pioneering and fairness to long-existing standards of protocol in academia, acknowledgement of such initiatives, is a painful and personal enough problem for me to have to shoulder. =========================================== Dear Dr. Amari: Thank you for your response to my E-mail sent to you regarding your paper in the September issue (1991) of "Neural Networks", entitled "Dualistic geometry of the manifold of higher-order neurons". You offered two interpretations why you featured a Geometrical Approach in 1991 as "opening a new fertile field of neural network research". One can see two explanations why you wrote your paper without even mentioning any of my many publications, for a decade prior to yours, or without even mentioning your own paper (with Arbib, in which you criticized in 1985 my geometrical-tensorial approach that I took since 1980). I feel that one cannot accept both interpretations at the same time, since they contradict one another. Thus, I feel compelled to make a choice. The opening (2nd and 3rd) paragraphs of your letter say: "As you know very well, we criticized your idea of tensorial approach in our... paper with M.Arbib. The point is that, although the tensorial approach is welcome, it is too restrictive to think that the brain function is merely a transformation between contravariant vectors and covariant vectors; even if we use linear approximations, the transformation should be free of the positivity and symmetry. As you may understand these two are the essential restrictions of covariant-contravariant transformations. ...You have interests in analyzing a general but single neural network. Of course this is very important. However, what I am interested in is to know a geometrical structures of a set of neural networks (in other words, a set of brains). This is a new object of research." THIS FIRST INTERPRETATION, which you could have easily included in your 1991 paper, clearly features your work as a GENERALIZATION of my decade-old geometrical initiative, which you deem "too restrictive". I am happy that you still clearly detect some general features of my prior work, which you describe as targeting a "single neural network", while yours as being concerned with a "set of neural networks".
Still, it is a fact that my work was never restricted to e.g. a SINGLE cerebellum, but was a geometrical representation of the total "set of all cerebella", not even restricted to any single species (but, in full generality, the metric tensor of the spacetime geometry). Thus the characterization of your work as more general appears unsupported by your letter. However, even if your argument were fully supported, in a generalization of earlier studies an author would be expected to make references, according to standard protocol, to prior work which is being generalized (as my "too restrictive" studies preceded yours by a decade). In fact, you (implicitly) appear to accept this point by saying (later in your letter): "Indeed, when I wrote that paper, I thought to refer to your paper". Unfortunately, instead of doing so, you continue by offering a SECOND ALTERNATIVE INTERPRETATION of your omission of any reference to my work, by saying: "But if I did so, I could only state that it is nothing to do with this new approach". Regrettably, I find the two interpretations incompatible: that (1) your work is a GENERALIZATION of mine and (2) your geometrical approach has NOTHING TO DO with the geometrical approach that I initiated. Since I have just returned from a visit to Germany (a country that awarded to me the Alexander von Humboldt Prize honoring my geometrical approach to brain theory) I know that many in Germany as well as in the US are curious to see how THEIR INTERPRETATION of similarities of the two tensor-geometrical approaches compares to Amari's and/or Pellionisz's interpretation. I cannot run the risk of trying to slap into the face of the audience two diametrically opposing arguments (when they will press me requiring comparisons of your metric tensors used in 1991 and those that I used since 1980). On my part, I will therefore take the less offensive interpretation from those you offered, which claims that your geometrical approach is in some ways more general than my geometrical approach a decade before. As for you, I will leave it to you how you compare your approach to mine, if you become pressed by anyone to substantiate your claim over the comparison. I maintain the position proposed in my original letter, that it might be useful if such a public comparison is offered by you for the record at the earliest occasion of your choice. For now, I shall remain most cooperative to find ways to make sure that appropriate credit is given to my decade-old pioneering efforts (however "restrictive" you label the early papers and whether or not you have read any of those that I wrote since 1982, the date of manuscript of your 1985 critique). At this time, I would like to refer to the wide selection of options taken by workers in the past in similar situations. Since by December 7, 1991, I will have made a strong public impact by statements on this issue, I would most appreciate it if during the coming week or two you could indicate (which I have no reason to doubt at this time) your willingness to credit my costly pioneering efforts in some appropriate fashion. As you so well know yourself, a geometrical approach to brain theory is still not automatically taken by workers in 1991, and certainly was rather costly to me to initiate more than a decade ago, and to uphold, expand, experimentally prove in neuroscience, and firmly establish in neural net theory in spite of criticisms. Sincerely: Dr. Andras J.
Pellionisz ------------------------------ Neuron Digest Monday, 2 Mar 1992 Volume 9 : Issue 9 Today's Topics: Open Letter - Response Reply to Pellionisz' "Open Letter" ------------------------------
From j_bonnet at inescn.pt Thu Jan 9 09:11:00 1992 From: j_bonnet at inescn.pt (Jose' M. Bonnet) Date: 9 Jan 92 9:11 Subject: Mailing List Message-ID: <104*j_bonnet@inescn.pt> Would you please add me to the Connectionist mailing list. Thanks.
Jose Bonnet INESC-Porto, Largo Monpilher, 22 4000 PORTO PORTUGAL From marshall at cs.unc.edu Thu Jan 9 14:31:05 1992 From: marshall at cs.unc.edu (Jonathan Marshall) Date: Thu, 9 Jan 92 14:31:05 -0500 Subject: Paper available: Neural mechanisms for steering in visual motion Message-ID: <9201091931.AA19960@marshall.cs.unc.edu> The following paper is available via ftp from the neuroprose archive at Ohio State (instructions for retrieval follow the abstract). ---------------------------------------------------------------------- Challenges of Vision Theory: Self-Organization of Neural Mechanisms for Stable Steering of Object-Grouping Data in Visual Motion Perception Jonathan A. Marshall Department of Computer Science, CB 3175, Sitterson Hall University of North Carolina, Chapel Hill, NC 27599-3175, U.S.A. 919-962-1887, marshall at cs.unc.edu Invited paper, in Stochastic and Neural Methods in Signal Processing, Image Processing, and Computer Vision, Su-Shing Chen, Ed., Proceedings of the SPIE 1569, San Diego, July 1991, pp.200-215. ---------------------------------------------------------------------- ABSTRACT Psychophysical studies on motion perception suggest that human visual systems perform certain nonlocal operations. In some cases, data about one part of an image can influence the processing or perception of data about another part of the image, across a long spatial range. In others, data about nearby parts of an image can fail to influence one another strongly, despite their proximity. Several types of nonlocal interaction may underlie cortical processing for accurate, stable perception of visual motion, depth, and form: o trajectory-specific propagation of computed moving stimulus information to successive image locations where a stimulus is predicted to appear; o grouping operations (establishing linkages among perceptually related data); o scission operations (breaking linkages between unrelated data); and o steering operations, whereby visible portions of a visual group or object can control the representations of invisible or occluded portions of the same group. Nonlocal interactions like these could be mediated by long-range excitatory horizontal intrinsic connections (LEHICs), discovered in visual cortex of several animal species. LEHICs often span great distances across cortical image space. Typically, they have been found to interconnect regions of like specificity with regard to certain receptive field attributes, e.g., stimulus orientation. It has recently been shown that several visual processing mechanisms can self-organize in model recurrent neural networks using unsupervised "EXIN" (excitatory+inhibitory) learning rules. Because the same rules are used in each case, EXIN networks provide a means to unify explanations of how different visual processing modules acquire their structure and function. EXIN networks learn to multiplex (or represent simultaneously) multiple spatially overlapping components of complex scenes, in a context-sensitive fashion. Modeled LEHICs have been used together with the EXIN learning rules to show how visual experience can shape neural mechanisms for nonlocal, context-sensitive processing of visual motion data. 
---------------------------------------------------------------------- To get a copy of the paper, do the following: unix> ftp archive.cis.ohio-state.edu login: anonymous password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get marshall.steering.ps.Z ftp> quit unix> uncompress marshall.steering.ps.Z unix> lpr marshall.steering.ps If you have trouble printing the file on a Postscript-compatible printer, send me e-mail (marshall at cs.unc.edu) with your postal address, and I'll have a hardcopy mailed to you (may take several weeks for delivery, though). ---------------------------------------------------------------------- From flann at nick.cs.usu.edu Fri Jan 10 10:50:58 1992 From: flann at nick.cs.usu.edu (flann@nick.cs.usu.edu) Date: Fri, 10 Jan 92 08:50:58 -0700 Subject: NN package for IBM PC's Message-ID: <9201101550.AA04877@nick.cs.usu.edu> If any of you know of a public domain NN package that runs on an IBM PC (or equivalent) please let me know. Nick Flann, flann at nick.cs.usu.edu From harnad at Princeton.EDU Fri Jan 10 12:28:55 1992 From: harnad at Princeton.EDU (Stevan Harnad) Date: Fri, 10 Jan 92 12:28:55 EST Subject: Movement Systems: BBS Special Call for Commentators Message-ID: <9201101728.AA25367@clarity.Princeton.EDU> Below are the abstracts of 8 forthcoming target articles for a special issue on Movement Systems that will appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. This will be the first in a new series called "Controversies in Neuroscience," done in collaboration with Paul Cordo and the RS Dow Neurological Science Institute. Commentators must be current BBS Associates or nominated by a current BBS Associate. To be considered as a commentator on any of these articles, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at clarity.princeton.edu or harnad at pucc.bitnet or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] Please specify which article or articles you would like to comment on. (Commentators will be allotted 1000 words to comment on one of the articles, 750 words more to comment on two of them, 500 more for three and then 250 more for each additional one, for a maximum of 3500 words to comment on all eight target articles.) To help us put together a balanced list of commentators, please give some indication of the aspects of the topic on which you would bring your areas of expertise to bear if you were selected as a commentator. In the next week or so, electronic drafts of the full text of each article will be available for inspection by anonymous ftp according to the instructions that follow after the abstracts. These drafts are for inspection only; please do not prepare a commentary until you are formally invited to do so. ____________________________________________________________________ 1. Alexander GE, MR De Long, & MD Crutcher: DO CORTICAL AND BASAL GANGLIONIC MOTOR AREAS USE "MOTOR PROGRAMS" TO CONTROL MOVEMENT? bbs.alexander 2. Bizzi E, N Hogan, FA Mussa-Ivaldi & S Giszter: DOES THE NERVOUS SYSTEM USE EQUILIBRIUM-POINT CONTROL TO GUIDE SINGLE AND MULTIPLE JOINT MOVEMENTS? bbs.bizzi 3. Bloedel JR: DOES THE ONE-STRUCTURE/ONE-FUNCTION RULE APPLY TO THE CEREBELLUM? bbs.bloedel 4. Fetz EH: ARE MOVEMENT PARAMETERS RECOGNIZABLY CODED IN SINGLE NEURON ACTIVITY? bbs.fetz 5.
Gandevia SC & D Burke: DOES THE NERVOUS SYSTEM DEPEND ON KINESTHETIC INFORMATION TO CONTROL NATURAL LIMB MOVEMENTS? bbs.gandevia 6. McCrea DA: CAN SENSE BE MADE OF SPINAL INTERNEURON CIRCUITS? bbs.mccrea 7. Robinson DA: IMPLICATIONS OF NEURAL NETWORKS FOR HOW WE THINK ABOUT BRAIN FUNCTION bbs.robinson 8. Stein JF: POSTERIOR PARIETAL CORTEX AND EGOCENTRIC SPACE bbs.stein ---------------------------------------------------------------- 1. DO CORTICAL AND BASAL GANGLIONIC MOTOR AREAS USE "MOTOR PROGRAMS" TO CONTROL MOVEMENT? Garrett E. Alexander, Mahlon R. De Long, and Michael D. Crutcher Department of Neurology Emory University School of Medicine Atlanta, GA 30322 gea at vax3200.neuro.emory.edu KEYWORDS: basal ganglia, cortex, motor system, motor program, motor control, parallel processing, connectionism, neural network ABSTRACT: Prevailing engineering-inspired theories of motor control based on sequential/algorithmic or motor programming models are difficult to reconcile with what is known about the anatomy and physiology of the motor areas. This is partly because of certain problems with the theories themselves and partly because of features of the cortical and basal ganglionic motor circuits that seem ill-suited for most engineering analyses of motor control. Recent developments in computational neuroscience offer more realistic connectionist models of motor processing. The distributed, highly parallel, and nonalgorithmic processes in these models are inherently self-organizing and hence more plausible biologically than their more traditional algorithmic or motor-programming counterparts. The newer models also have the potential to explain some of the unique features of natural, brain-based motor behavior and to avoid some of the computational dilemmas associated with engineering approaches. ------------------------------------------------------------------- 2. DOES THE NERVOUS SYSTEM USE EQUILIBRIUM-POINT CONTROL TO GUIDE SINGLE AND MULTIPLE JOINT MOVEMENTS? E. Bizzi, N. Hogan, F.A. Mussa-Ivaldi and S. Giszter Department of Brain and Cognitive Sciences and Department of Mechanical Engineering Massachusetts Institute of Technology Cambridge, MA 02139 emilio at wheaties.ai.mit.edu KEYWORDS: spinal cord, force field, equilibrium point, microstimulation, multi-joint coordination, contact tasks, robotics, inverse dynamics, motor control. ABSTRACT: The hypothesis that the central nervous system (CNS) generates movement as a shift of the limb's equilibrium posture has been corroborated experimentally in single- and multi-joint motions. Posture may be controlled through the choice of muscle length tension curves that set agonist-antagonist torque-angle curves determining an equilibrium position for the limb and the stiffness about the joints. Arm trajectories seem to be generated through a control signal defining a series of equilibrium postures. The equilibrium-point hypothesis drastically simplifies the requisite computations for multijoint movements and mechanical interactions with complex dynamic objects in the environment. Because the neuromuscular system is springlike, the instantaneous difference between the arm's actual position and the equilibrium position specified by the neural activity can generate the requisite torques, avoiding the complex "inverse dynamic" problem of computing the torques at the joints.
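As a rough illustration of the springlike behavior just described (the notation here is illustrative and is not taken from the target article), a single joint driven toward an equilibrium posture can be written as

    \tau = K(\theta_{eq} - \theta) - B\dot{\theta}

where \theta is the actual joint angle, \theta_{eq} is the neurally specified equilibrium angle, K is the joint stiffness set by the agonist-antagonist length-tension curves, and B is a damping term; a movement then corresponds to shifting \theta_{eq} along a desired trajectory rather than to computing the joint torques explicitly.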
The hypothesis provides a simple unified description of posture and movement as well as performance on contact control tasks, in which the limb must exert force stably and do work on objects in the environment. The latter is a surprisingly difficult problem, as robotic experience has shown. The prior evidence for the hypothesis came mainly from psychophysical and behavioral experiments. Our recent work has shown that microstimulation of the spinal cord's premotoneuronal network produces leg movements to various positions in the frog's motor space. The hypothesis can now be investigated in the neurophysiological machinery of the spinal cord. -------------------------------------------------------------------- 3. DOES THE ONE-STRUCTURE/ONE-FUNCTION RULE APPLY TO THE CEREBELLUM? James R. Bloedel Division of Neurobiology Barrow Neurological Institute Phoenix, AZ KEYWORDS: cerebellum; Purkinje cells; mossy fibres; movement; proprioception; body image; kinesthesis; robotics; posture. ABSTRACT: The premise explored in this target article is that the function of the cerebellum can be best understood in terms of the operation it performs across its structurally homogeneous subdivisions. The functional heterogeneity sometimes ascribed to these different regions reflects the many functions of the central targets receiving the outputs of different cerebellar regions. Recent studies by ourselves and others suggest that the functional unit of the cerebellum is its sagittal zone. It is hypothesized that the climbing fiber system produces a short-lasting modification in the gain of Purkinje cell responses to its other principal afferent input, the mossy fiber-granule cell-parallel fiber system. Because the climbing fiber inputs to sagittally aligned Purkinje cells can be activated under functionally specific conditions, they could select populations of Purkinje neurons that were most highly modulated by the distributed mossy fiber inputs responding to the same conditions. These operations may be critical for the on-line integration of inputs representing external target space with features of intended movement, proprioceptive and kinesthetic cues, and body image. ----------------------------------------------------------------- 4. ARE MOVEMENT PARAMETERS RECOGNIZABLY CODED IN SINGLE NEURON ACTIVITY? Eberhard E. Fetz Regional Primate Research Center University of Washington Seattle, WA 98195 fetz at locke.hs.washington.edu KEYWORDS: neural coding; representation; neural networks; cross-correlation; movement parameters; parallel distributed processing ABSTRACT: To investigate neural mechanisms of movement, physiologists have analyzed the activity of task-related neurons in behaving animals. The relative onset latencies of neural activity have been scrutinized for evidence of a functional hierarchy of sequentially recruited centers, but activity appears to change largely in parallel. Neurons whose activity covaries with movement parameters have been sought for evidence of explicit coding of parameters such as active force, limb displacement and behavioral set. Neurons with recognizable relations to the task are typically selected from a larger population, ignoring unmodulated cells as well as cells whose activity is not related to the task in a simple, easily recognized way.
Selective interpretations are also used to support the notion that different motor regions perform different motor functions; again, current evidence suggests that units with similar properties are widely distributed over different regions. These coding issues are re-examined for premotoneuronal (PreM) cells, whose correlational links with motoneurons are revealed by spike-triggered averages. PreM cells are recruited over long times relative to their target muscles. They show diverse response patterns relative to the muscle force they produce; functionally disparate PreM cells such as afferent fibers and descending corticomotoneuronal and rubromotoneuronal cells can exhibit similar patterns. Neural mechanisms have been further elucidated by neural network simulations of sensorimotor behavior; the pre-output hidden units typically show diverse responses relative to their targets. Thus, studies in which both the activity and the connectivity of the same units are known reveal that units with many kinds of relations to the task, simple and complex, contribute significantly to the output. This suggests that the search for explicit coding may be diverting us from understanding more distributed neural mechanisms that are more complex and less directly related to explicit movement parameters. ------------------------------------------------------------------ 5. DOES THE NERVOUS SYSTEM DEPEND ON KINESTHETIC INFORMATION TO CONTROL NATURAL LIMB MOVEMENTS? S.C. Gandevia and David Burke Department of Clinical Neurophysiology Institute of Neurological Sciences The Prince Henry Hospital P.O. Box 233 Matraville, N.S.W. 2036 Sydney, Australia KEYWORDS: kinesthesia, motor control, muscle, joint and cutaneous afferents, motor commands, deafferentation ABSTRACT: This target article draws together two groups of experimental studies on the control of human movement through peripheral feedback and centrally generated signals of motor command. First, during natural movement, feedback from muscle, joint and cutaneous afferents changes; in human subjects these changes have reflexive and kinesthetic consequences. Recent psychophysical and microneurographic evidence suggests that joint and even cutaneous afferents may have a proprioceptive role. Second, the role of centrally generated motor commands in the control of normal movements and movements following acute and chronic deafferentation is reviewed. There is increasing evidence that subjects can perceive their motor commands under various conditions, but this is inadequate for normal movement; deficits in motor performance arise when the reliance on proprioceptive feedback is abolished, either experimentally or because of pathology. During natural movement, the CNS appears to have access to functionally useful input from a range of receptors as well as from internally generated command signals. Remaining unanswered questions suggest a number of avenues for further research. ------------------------------------------------------------------ 6. CAN SENSE BE MADE OF SPINAL INTERNEURON CIRCUITS? David A. McCrea The Department of Physiology Faculty of Medicine University of Manitoba 770 Bannatyne Avenue Winnipeg, Manitoba, Canada R3E OW3 dave at scrc.umanitoba.ca KEYWORDS: interneuron, motor control, reflexes, spinal cord, flexion, muscle synergy, presynaptic inhibition. ABSTRACT: It is increasingly clear that spinal reflex systems cannot be described in terms of simple and constant reflex actions.
The extensive convergence of segmental and descending systems onto spinal interneurons suggests that spinal interneurons are not relay systems but rather form a crucial component in determining which muscles are activated during voluntary and reflex movements. The notion that descending systems simply modulate the gain of spinal interneuronal pathways has been tempered by the observation that spinal interneurons gate and distribute descending control to specific motoneurons. Spinal systems are complex, but current approaches will continue to provide insight into motor systems. During movement, several neural mechanisms act to reduce the functional complexity of motor systems by inhibiting some of the parallel reflex pathways available to segmental afferents and descending systems. The flexion reflex system is discussed as an example of the flexibility of spinal interneuron systems and as a useful construct. Examples are provided of the kinds of experiments that can be developed using current approaches to spinal interneuronal systems. -------------------------------------------------------------------- 7. IMPLICATIONS OF NEURAL NETWORKS FOR HOW WE THINK ABOUT BRAIN FUNCTION David A. Robinson Ophthalmology, Biomedical Engineering, and Neuroscience The Johns Hopkins University, School of Medicine Room 355 Woods Res. Bldg. The Wilmer Institute Baltimore, MD 21205 KEYWORDS: Neural networks, signal processing, oculomotor system, vestibulo-ocular reflex, pursuit eye movements, saccadic eye movements, coordinate transformations ABSTRACT: Engineers use neural networks to control systems too complex for conventional engineering analysis. To examine hidden unit behavior would defeat the purpose of this approach, because individual units would be largely uninterpretable. Yet neurophysiologists spend their careers doing just that! Hidden units contain bits and pieces of signals that yield only arcane hints of network function and no information about how the units process signals. Most of the literature on single-unit recordings attests to this grim fact. On the other hand, knowing system function and describing it with elegant mathematics tells one very little about what to expect of interneuron behavior. Examples of simple networks based on neurophysiology are taken from the oculomotor literature to suggest how single-unit interpretability might degrade with increasing task complexity. Trying to explain how any real neural network works on a cell-by-cell, reductionist basis is futile; we may have to be content with understanding the brain at higher levels of organization. ------------------------------------------------------------------- 8. POSTERIOR PARIETAL CORTEX AND EGOCENTRIC SPACE J.F. Stein University Laboratory of Physiology University of Oxford Oxford, England OX1 3PT stein at vax.oxford.ac.uk KEYWORDS: posterior parietal cortex; egocentric space; space perception; attention; coordinate transformations; distributed systems; neural networks. ABSTRACT: The posterior parietal cortex (PPC) is the most likely site where egocentric spatial relationships are represented in the brain. PPC cells receive visual, auditory, somaesthetic and vestibular sensory inputs, oculomotor, head, limb and body motor signals, and strong motivational projections from the limbic system. Their discharge increases not only when an animal moves towards a sensory target, but also when it directs its attention to it. PPC lesions have the opposite effect: sensory inattention and neglect.
PPC does not seem to contain a "map" of the location of objects in space but a distributed neural network for transforming one set of sensory vectors into other sensory reference frames or into various motor coordinate systems. Which set of transformation rules is used probably depends on attention, which selectively enhances the synapses needed for making a particular sensory comparison or aiming a particular movement. -------------------------------------------------------------- To help you decide whether you would be an appropriate commentator for any of these articles, a (nonfinal) draft of each will soon be retrievable by anonymous ftp from princeton.edu according to the instructions below (filenames will be of the form bbs.alexander, based on the name of the first author). Please do not prepare a commentary on this draft. Just let us know, after having inspected it, what relevant expertise you feel you would bring to bear on what aspect of the article. --------------------------------------------------------------- To retrieve a file by ftp from a Unix/Internet site, type either: ftp princeton.edu or ftp 128.112.128.1 When you are asked for your login, type: anonymous For your password, type your real name. Then change directories with: cd pub/harnad To show the available files, type: ls Next, retrieve the file you want with (for example): get bbs.alexander When you have the file(s) you want, type: quit JANET users can use the Internet file transfer utility at JANET node UK.AC.FT-RELAY to get BBS files. Use standard file transfer, setting the site to be UK.AC.FT-RELAY, the userid as anonymous at edu.princeton, the password as your own userid, and the remote filename to be the filename according to Unix conventions (e.g. pub/harnad/bbs.article). Lower case should be used where indicated, using quotes if necessary to avoid automatic translation into upper case. --------------------------------------------------------------- The above cannot be done from Bitnet directly, but there is a fileserver called bitftp at pucc.bitnet that will do it for you. Send it the one-line message "help" for instructions (which will be similar to the above, but will be in the form of a series of lines in an email message that bitftp will then execute for you). From andycl at syma.sussex.ac.uk Fri Jan 10 11:22:57 1992 From: andycl at syma.sussex.ac.uk (Andy Clark) Date: Fri, 10 Jan 92 16:22:57 GMT Subject: No subject Message-ID: <14634.9201101622@syma.sussex.ac.uk> bcc: andycl at cogs re: MA in Philosophy of Cognitive Science at Sussex University UNIVERSITY OF SUSSEX, BRIGHTON, ENGLAND SCHOOL OF COGNITIVE AND COMPUTING SCIENCES M.A. in the PHILOSOPHY OF COGNITIVE SCIENCE This is a one-year course which aims to foster the study of foundational issues in Cognitive Science and Computer Modelling. It is designed for students with a background in Philosophy, although offers may be made to exceptional students whose background is in some other discipline related to Cognitive Science. Students would combine work towards a 20,000 word philosophy dissertation with subsidiary courses concerning aspects of A.I. and the other Cognitive Sciences. General Information. The course is based in the School of Cognitive and Computing Sciences. The School provides a highly active and interdisciplinary environment involving linguists, cognitive psychologists, philosophers and A.I. researchers.
The kinds of work undertaken in the school range from highly practical applications of new ideas in computing to the most abstract philosophical issues concerning the foundations of cognitive science. The school attracts a large number of research fellows and distinguished academic visitors, and interdisciplinary dialogue is encouraged by several weekly research seminars. Course Structure of the MA in Philosophy of Cognitive Science TERM 1 Compulsory Course: Philosophy of Cognitive Science Topic: The Representational Theory of Mind: From Fodor to Connectionism. and one out of: Artificial Intelligence Programming (Part I) Knowledge Representation Natural Language Syntax Psychology I Computer Science I Modern Analytic Philosophy (1) Modern European Philosophy (1) Artificial Intelligence and Creativity TERM 2 Compulsory Course: Philosophy of Cognitive Science (II) Topic: Code, Concept and Process: Philosophy, Neuropsychology and A.I. and one out of: Artificial Intelligence Programming (Part II) Natural Language Processing Computer Vision Neural Networks Intelligent Tutoring Systems Psychology II Computer Science II Social Implications of AI Modern Analytic Philosophy (2) Modern European Philosophy (2) TERM 3 Supervised work for the Philosophy of Cognitive Science dissertation (20,000 words) Courses are taught by one-hour lectures, two-hour seminars and one-hour tutorials. Choice of options is determined by student preference and content of first degree. Not all options will always be available and new options may be added according to faculty interests. CURRENT TEACHING FACULTY for the MA Dr A. Clark Philosophy of Cognitive Science I and II Mr R. Dammann Recent European Philosophy Dr M. Morris Recent Analytic Philosophy Dr S Wood and Mr R Lutz AI Programming I Dr B Katz Knowledge Representation Neural Networks Dr N Yuill Psychology I Dr M. Scaife Psychology II Prof M Boden Artificial Intelligence and Creativity Social Implications of AI Dr L Trask Natural Language Syntax & Semantics Dr S Easterbrook Computer Science I & II Dr D Weir Logics for Artificial Intelligence Dr D Young Computer Vision Dr B Keller Natural Language Processing Dr Y Rogers & Prof B du Boulay Intelligent Tutoring Systems ENTRANCE REQUIREMENTS These will be flexible. A first degree in Philosophy or one of the Cognitive Sciences would be the usual minimum requirement. FUNDING U.K. students may apply for British Academy funding for this course in the usual manner. Overseas students would need to be funded by home bodies. CONTACT For an application form, or further information, please write to Dr Allen Stoughton at the School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton BN1 9QH, or phone him on (0273) 606755 ext. 2882, or email - allen at cogs.sussex.ac.uk. From SCCS6082%IRUCCVAX.UCC.IE at BITNET.CC.CMU.EDU Mon Jan 13 04:46:00 1992 From: SCCS6082%IRUCCVAX.UCC.IE at BITNET.CC.CMU.EDU (SCCS6082%IRUCCVAX.UCC.IE@BITNET.CC.CMU.EDU) Date: Mon, 13 Jan 1992 09:46 GMT Subject: No subject Message-ID: <01GF9HC0EGGW0006LI@IRUCCVAX.UCC.IE> Hi, My request is similar to Nick Flann's: I'm looking for any SOURCE code for any type of neural network. I'm working on a simulator to compare learning times for different types. I already have quickprop, rcc and one or two others but I'm looking for more so I can have a good data set. Anything in C, C++, Pascal or LISP, for any machine, would be much appreciated. I won't be claiming credit in my thesis for anything I receive.
Also has anyone got a set of standard (or their standard) benchmarks that they use for training? I use various ones that people present in their papers when they propose a new algorithm, but there doesn't seem to be anything standard around. Thanking you in advance, Colin McCormack, University College Cork, Ireland. From DOLL%BROWNCOG.BITNET at BITNET.CC.CMU.EDU Mon Jan 13 08:22:00 1992 From: DOLL%BROWNCOG.BITNET at BITNET.CC.CMU.EDU (DOLL%BROWNCOG.BITNET@BITNET.CC.CMU.EDU) Date: Mon, 13 Jan 1992 08:22 EST Subject: Lost Mail Message-ID: <01GF9EFDGY1W000D8T@BROWNCOG.BITNET> From postmaster%gonzo.inescn.pt at CARNEGIE.BITNET Thu Jan 9 18:17:46 1992 From: postmaster%gonzo.inescn.pt at CARNEGIE.BITNET (PT Gateway) Date: Thu, 9 Jan 92 23:17:46 GMT Subject: PT Mail Network -- failed mail Message-ID: <9201092317.AA04593@gonzo.inescn.pt> ----- Mail failure diagnostics ----- From j_bonnet%inescn.pt at CARNEGIE.BITNET Thu Jan 9 09:11:00 1992 From: j_bonnet%inescn.pt at CARNEGIE.BITNET (Jose' M. Bonnet) Date: 9 Jan 92 9:11 Subject: Mailing List Message-ID: <104*j_bonnet@inescn.pt> Call for Participation in a Workshop on THE COGNITIVE SCIENCE OF NATURAL LANGUAGE PROCESSING 14-15 March, 1992 Dublin City University Guest Speakers: George McConkie University of Illinois at Urbana-Champaign Kim Plunkett University of Oxford Noel Sharkey University of Exeter Attendance at the CSNLP workshop will be by invitation on the basis of a submitted paper. Those wishing to be considered should send a paper (hardcopy, no e-mail submissions please) of not more than eight A4 pages to Ronan Reilly (e-mail: ronan_reilly at eurokom.ie), Educational Research Centre, St Patrick's College, Dublin 9, Ireland, not later than 3 February, 1992. Notification of acceptance along with registration and accommodation details will be sent out by 17 February, 1992. Submitting authors should also send their fax number and/or e-mail address to help speed up the selection process. The particular focus of the workshop will be on the computational modelling of human natural language processing (NLP), and preference will be given to papers that present empirically supported computational models of any aspect of human NLP. An additional goal in selecting papers will be to provide coverage of a range of NLP areas. This workshop is supported by the following organisations: Educational Research Centre, St Patrick's College, Dublin; Linguistics Institute of Ireland; Dublin City University; and the Commission of the European Communities through the DANDI ESPRIT Basic Research Action (No. 3351). From lazzaro at boom.CS.Berkeley.EDU Tue Jan 14 16:25:14 1992 From: lazzaro at boom.CS.Berkeley.EDU (John Lazzaro) Date: Tue, 14 Jan 92 13:25:14 PST Subject: No subject Message-ID: <9201142125.AA00345@boom.CS.Berkeley.EDU> An announcement of a NIPS-4 preprint on the neuroprose server ... Temporal Adaptation in a Silicon Auditory Nerve John Lazzaro, CS Division, UC Berkeley Abstract -------- Many auditory theorists consider the temporal adaptation of the auditory nerve a key aspect of speech coding in the auditory periphery. Experiments with models of auditory localization and pitch perception also suggest temporal adaptation is an important element of practical auditory processing. I have designed, fabricated, and successfully tested an analog integrated circuit that models many aspects of auditory nerve response, including temporal adaptation. ----- To retrieve ...
>ftp cheops.cis.ohio-state.edu >Name (cheops.cis.ohio-state.edu:lazzaro): anonymous >331 Guest login ok, send ident as password. >Password: your_username >230 Guest login ok, access restrictions apply. >cd pub/neuroprose >binary >get lazzaro.audnerve.ps.Z >quit %uncompress lazzaro.audnerve.ps.Z %lpr lazzaro.audnerve.ps ---- --john lazzaro lazzaro at boom.cs.berkeley.edu From marwan at ee.su.OZ.AU Tue Jan 14 19:44:58 1992 From: marwan at ee.su.OZ.AU (Marwan Jabri) Date: Wed, 15 Jan 1992 11:44:58 +1100 Subject: No subject Message-ID: <9201150044.AA23190@brutus.ee.su.OZ.AU> Sydney University Electrical Engineering RESEARCH FELLOW IN MACHINE INTELLIGENCE Applications are invited for a position as a Girling Watson Research Fellow to work for the Machine Intelligence Group in the area of information integration and multi-media. The applicant should have strong research and development experience, preferably with a background in machine intelligence, artificial neural networks or multi-media. The project involves research into multi-source knowledge representation, integration, machine learning and associated computing architectures. The applicant should have either a PhD or an equivalent industry research and development experience. The appointment is available for a period of three years, subject to the submission of an annual progress report. Salary is in the range of Research Fellow: A$ 39,463 to A$ 48,688. Top of salary range unavailable until July 1992. For further information contact Dr M. Jabri, Tel: (+61-2) 692-2240, Fax: (+61-2) 660-1228. Membership of a superannuation scheme is a condition of employment for new appointees. Method of applications: Applications quoting Ref No: 02/16, including curriculum vitae and the names, addresses and phone nos of two referees, should be sent to: Assistant Registrar (appointments), Staff Office (KO7), The University of Sydney, NSW 2006 Australia Closing: January 23, 1992. From gordon at AIC.NRL.Navy.Mil Wed Jan 15 15:59:56 1992 From: gordon at AIC.NRL.Navy.Mil (gordon@AIC.NRL.Navy.Mil) Date: Wed, 15 Jan 92 15:59:56 EST Subject: workshop announcement Message-ID: <9201152059.AA28490@sun25.aic.nrl.navy.mil> CALL FOR PAPERS Informal Workshop on "Biases in Inductive Learning" To be held after ML-92 Saturday, July 4, 1992 Aberdeen, Scotland All aspects of an inductive learning system can bias the learning process. Researchers to date have studied various biases in inductive learning such as algorithms, representations, background knowledge, and instance orders. The focus of this workshop is not to examine these biases in isolation. Instead, this workshop will examine how these biases influence each other and how they influence learning performance. For example, how can active selection of instances in concept learning influence PAC convergence? How might a domain theory affect an inductive learning algorithm? How does the choice of representational bias in a learner influence its algorithmic bias and vice versa? The purpose of this workshop is to draw researchers from diverse areas to discuss the issue of biases in inductive learning. The workshop topic is a unifying theme for researchers working in the areas of reformulation, constructive induction, inverse resolution, PAC learning, EBL-SBL learning, and other areas. This workshop does not encourage papers describing system comparisons.
Instead, the workshop encourages papers on the following topics: - Empirical and analytical studies comparing different biases in inductive learning and their quantitative and qualitative influence on each other or on learning performance - Studies of methods for dynamically adjusting biases, with a focus on the impact of these adjustments on other biases and on learning performance - Analyses of why certain biases are more suitable for particular applications of inductive learning - Issues that arise when integrating new biases into an existing inductive learning system - Theory of inductive bias Please send 4 hard copies of a paper (10-15 double-spaced pages, ML-92 format) or (if you do not wish to present a paper) a description of your current research to: Diana Gordon Naval Research Laboratory, Code 5510 4555 Overlook Ave. S.W. Washington, D.C. 20375-5000 USA Email submissions to gordon at aic.nrl.navy.mil are also acceptable, but they must be in PostScript. FAX submissions will not be accepted. If you have any questions about the workshop, please send email to Diana Gordon at gordon at aic.nrl.navy.mil or call 202-767-2686. Important Dates: March 12 - Papers and research descriptions due May 1 - Acceptance notification June 1 - Final version of papers due Program Committee: Diana Gordon, Naval Research Laboratory Dennis Kibler, University of California at Irvine Larry Rendell, University of Illinois Jude Shavlik, University of Wisconsin William Spears, Naval Research Laboratory Devika Subramanian, Cornell University Paul Vitanyi, CWI and University of Amsterdam From jim at gdsnet.grumman.com Wed Jan 15 18:08:55 1992 From: jim at gdsnet.grumman.com (Jim Eilbert) Date: Wed, 15 Jan 92 18:08:55 EST Subject: Paper available: Neural mechanisms for steering in visual motion In-Reply-To: Jonathan Marshall's message of Thu, 9 Jan 92 14:31:05 -0500 <9201091931.AA19960@marshall.cs.unc.edu> Message-ID: <9201152308.AA22264@gdsnet.grumman.com> Jonathan, I am interested in getting a copy of your paper Challenges of Vision Theory: Self-Organization of Neural Mechanisms for Stable Steering of Object-Grouping Data in Visual Motion Perception Jonathan A. Marshall However, the host list on my computer does not know about ftp archive.cis.ohio-state.edu Could you send me the network address of this computer? If that is not readily available, I'll wait for a hardcopy. Thanks, Jim Eilbert M/S A02-26 Grumman CRC Bethpage, NY 11714 From SAYEGH at CVAX.IPFW.INDIANA.EDU Wed Jan 15 20:52:40 1992 From: SAYEGH at CVAX.IPFW.INDIANA.EDU (SAYEGH@CVAX.IPFW.INDIANA.EDU) Date: Wed, 15 Jan 1992 20:52:40 EST Subject: Proceedings Announcement Message-ID: <920115205240.21a00cdd@CVAX.IPFW.INDIANA.EDU> The Proceedings of the Fourth Conference on Neural Networks and Parallel Distributed Processing at Indiana University-Purdue University at Fort Wayne, held April 11, 12, and 13, 1991 are now available. They can be ordered ($6 + $1 U.S. mail cost) from: Ms. Sandra Fisher, Physics Department Indiana University-Purdue University at Fort Wayne Fort Wayne, IN 46805-1499 FAX: (219) 481-6880 Voice: (219) 481-6306 OR 481-6157 email: proceedings at ipfwcvax.bitnet The following papers are included in the Proceedings: Optimization and genetic algorithms: J.L. Noyes, Wittenberg University Neural Network Optimization Methods Robert L.
Sedlmeyer, Indiana University-Purdue University at Fort Wayne A Genetic Algorithm to Estimate the Edge-Intergrity of Halin Graphs Omer Tunali & Ugur Halici, University of Missouri/Rolla A Boltzman Machine for Hypercube Embedding Problem William G. Frederick and Curt M. White, Indiana University-Purdue University at Fort Wayne Genetic Algorithms and a Variation on the Steiner Point Problem Network analysis: P.G. Madhavan, B. Xu, B. Stephens, Purdue University, Indianapolis On the Convergence Speed and the Generalization Ability of Tri-state Neural Networks Mohammad R. Sayeh, Southern Illinois University at Carbondale Dynamical-System Approach to Unsupervised Classifier Samir I. Sayegh, Indiana University-Purdue University at Fort Wayne Symbolic Manipulation and Neural Networks Zhenni Wang, Ming T. Tham & A.J. Morris, University of Newcastle upon Tyne Multilayer Neural Networks: Approximated Canonical Decomposition of Nonlinearity M.G. Royer & O.K. Ersoy, Purdue University, West Lafayette Classification Performance of Pshnn with BackPropagation Stages Sean Carroll, Tri-State University Single-Hidden-Layer Neural Nets Can Approximate B-Splines G. Allen Pugh, Indiana University-Purdue University at Fort Wayne Further Design Considerations for Back Propagation Biological aspects: R. Manalis, Indiana University-Purdue University at Fort Wayne Short Term Memory Implicated in Twitch Facilitation Edgar Erwin, K. Obermayer, University of Illinois Formation and Variability of Somatotopic Maps with Topological Mismatch T. Alvager, B. Humpert, P. Lu, and C. Roberts, Indiana State University DNA Sequence Analysis with a Neural Network Christel Kemke, DFKI, Germany Towards a Synthesis of Neural Network Behavior Arun Jagota, State University of New York at Buffalo A Forgetting Rule and Other Extensions to the Hopfield-Style Network Storage Rule and Their Applications applications: I.H. Shin and K.J. Cios, The University of Toledo A Neural Network Paradigm and Architecture for Image Pattern Recognition R.E. Tjia, K.J. Cios and B.N. Shabestari, The University of Toledo Neural Network in Identification of Car Wheels from Gray Level Images M.D. Tom and M.F. Tenorio, Purdue University, West Lafayette A Neuron Architecture with Short Term Memory S. Sayegh, C. Pomalaza-Raez, B. Beer and E. Tepper, Indiana University-Purdue University at Fort Wayne Pitch and Timbre Recognition Using Neural Network Jacek Witaszek & Colleen Brown, DePaul University Automatic Construction of Connectionist Expert Systems Robert Zerwekh, Northern Illinois University Modeling Learner Performance: Classifying Competence Levels Using Adaptive Resonance Theory tutorial lectures: Marc Clare, Lincoln National Corporation, Fort Wayne An Introduction to the Methodology of Building Neural Networks Ingrid Russell, University of Hartford Integrating Neural Networks into an AI Course Arun Jagota, State University of New York at Buffalo The Hopfield Model and Associative Memory Ingrid Russell, University of Hartford Self Organization and Adaptive Resonance Theory Models Note: Copies of the Proceedings of the Third Conference on NN&PDP are also available and can be ordered from the same address. 
From PSS001 at VAXA.BANGOR.AC.UK Thu Jan 16 05:55:37 1992 From: PSS001 at VAXA.BANGOR.AC.UK (PSS001@VAXA.BANGOR.AC.UK) Date: Thu, 16 JAN 92 10:55:37 GMT Subject: No subject Message-ID: <01GFDG54Y5SW8Y5CAT@BITNET.CC.CMU.EDU> Workshop on Neurodynamics and Psychology April 22nd - April 24th 1992 Cognitive Neurocomputation Unit, University of Wales, Bangor Session chairs and likely speakers include: Igor Aleksander (London) Alan Allport (Oxford) Jean-Pierre Changeux (Paris) Stanislas Dehaene (Paris) Glyn Humphreys (Birmingham) Marc Richelle (Liège) Tim Shallice (London) John Taylor (London) David Willshaw (Edinburgh) The purpose of this workshop is to bring together researchers to outline and define a new area of research that has arisen from work within such diverse disciplines as neurobiology, cognitive psychology, artificial intelligence and computer science. This area concerns the representation of time within natural and artificial neural systems, and the role these representations play in behaviours from spatial learning in animals to high-level cognitive functions such as language processing, problem solving, reasoning, and sequential pattern recognition in general. Attendance at this workshop will be limited to 50 to allow ample time for discussion. For further details contact: Mike Oaksford or Gordon Brown, Neurodynamics Workshop, Cognitive Neurocomputation Unit, Department of Psychology, University of Wales, Bangor, Gwynedd, LL57 2DG, United Kingdom. Tel: 0248 351151 Ext. 2211. Email: PSS//1 at uk.ac.bangor.vaxa Sponsored by the British Psychological Society (Welsh Branch) From hinton at ai.toronto.edu Thu Jan 16 10:07:26 1992 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Thu, 16 Jan 1992 10:07:26 -0500 Subject: NIPS preprint in neuroprose Message-ID: <92Jan16.100733edt.73@neuron.ai.toronto.edu> The following paper is available as hinton.handwriting.ps.Z in neuroprose ADAPTIVE ELASTIC MODELS FOR HAND-PRINTED CHARACTER RECOGNITION Geoffrey E. Hinton, Christopher K. I. Williams and Michael D. Revow Department of Computer Science, University of Toronto ABSTRACT Hand-printed digits can be modeled as splines that are governed by about 8 control points. For each known digit, the control points have preferred "home" locations, and deformations of the digit are generated by moving the control points away from their home locations. Images of digits can be produced by placing Gaussian ink generators uniformly along the spline. Real images can be recognized by finding the digit model most likely to have generated the data. For each digit model we use an elastic matching algorithm to minimize an energy function that includes both the deformation energy of the digit model and the log probability that the model would generate the inked pixels in the image. The model with the lowest total energy wins. If a uniform noise process is included in the model of image generation, some of the inked pixels can be rejected as noise as a digit model is fitting a poorly segmented image. The digit models learn by modifying the home locations of the control points. From becker at ai.toronto.edu Thu Jan 16 14:44:36 1992 From: becker at ai.toronto.edu (becker@ai.toronto.edu) Date: Thu, 16 Jan 1992 14:44:36 -0500 Subject: NIPS preprint in neuroprose Message-ID: <92Jan16.144445edt.10@neuron.ai.toronto.edu> The following paper is available as becker.prediction.ps.Z in neuroprose: LEARNING TO MAKE COHERENT PREDICTIONS IN DOMAINS WITH DISCONTINUITIES Suzanna Becker and Geoffrey E.
Hinton Department of Computer Science, University of Toronto ABSTRACT We have previously described an unsupervised learning procedure that discovers spatially coherent properties of the world by maximizing the information that parameters extracted from different parts of the sensory input convey about some common underlying cause. When given random dot stereograms of curved surfaces, this procedure learns to extract surface depth because that is the property that is coherent across space. It also learns how to interpolate the depth at one location from the depths at nearby locations (Becker and Hinton, 1992). In this paper, we propose two new models which handle surfaces with discontinuities. The first model attempts to detect cases of discontinuities and reject them. The second model develops a mixture of expert interpolators. It learns to detect the locations of discontinuities and to invoke specialized, asymmetric interpolators that do not cross the discontinuities. From rsun at orion.ssdc.honeywell.com Thu Jan 16 17:21:46 1992 From: rsun at orion.ssdc.honeywell.com (Ron Sun) Date: Thu, 16 Jan 92 16:21:46 CST Subject: No subject Message-ID: <9201162221.AA07989@orion.ssdc.honeywell.com> Call For Participation The AAAI Workshop on Integrating Neural and Symbolic Processes (the Cognitive Dimension) to be held at the Tenth National Conference on Artificial Intelligence July 12-17, 1992 San Jose, CA There has been a large amount of research in integrating neural and symbolic processes that uses networks of simple units. However, there is relatively little work so far in comparing and combining these fairly isolated efforts. This workshop will provide a forum for discussions and exchanges of ideas in this area, to foster cooperative work, and to produce synergistic results. The workshop will tackle important issues in integrating neural and symbolic processes, such as: What are the fundamental problems of integrating neural and symbolic processes? Why should we integrate them after all? What class of problems is well-suited to such integration? What are the relative advantages of each approach or technique in achieving such integrations? Is cognitive plausibility an important criterion? How do we judge the cognitive plausibility of existing approaches? what is the nature of psychological and/or biological evidence for existing models, if there is any? What role does emergent behavior vs. a localist approach play in integrating these processes? (Explicit symbol manipulation vs. their functional counterparts) Is it possible to synthesize various existing models? ----------------------- The workshop will include invited talks, presentations, questions and answers, and general discussion sessions. Currently invited speakers include: Jerome Feldman, ICSI; Stuart Dreyfus, IEOR, UC Berkeley. Jim Hendler, U Maryland. Research addressing connectionist rule-based reasoning, connectionist natural language processing, other high-level connectionist models, compositionality, connectionist knowledge representation is particularly relevant to the workshop. ------------------------ If you wish to present, submit an extended abstract (up to 5 pages); If you only wish to attend the workshop, send a one-page description of your interest; All submissions should include 4 hardcopies AND 1 electronic copy (via e-mail) by March 13, 1992 to the workshop chair: Dr. 
Ron Sun, Honeywell SSDC, 3660 Technology Drive, Minneapolis, MN 55418, rsun at orion.ssdc.honeywell.com, (612)-782-7379. Organizing Committee: Dr. Ron Sun, Honeywell SSDC; Dr. Lawrence Bookman, Brandeis University; Prof. Shashi Shekhar, University of Minnesota. From suddarth at cs.UMD.EDU Fri Jan 17 13:15:16 1992 From: suddarth at cs.UMD.EDU (Steven C. Suddarth) Date: Fri, 17 Jan 92 13:15:16 -0500 Subject: GNN 92 call for papers Message-ID: <9201171815.AA04780@mimsy.cs.UMD.EDU> The following is an announcement for the 3rd annual GNN meeting. It is one of the few refereed forums for applications. Last year we accepted about one out of four papers, and generally had an interesting meeting. The meeting was also useful for those seeking collaborators. If you have made any significant contributions to a neural-network oriented application, you may want to submit an abstract. Steve Suddarth suddarth at cs.umd.edu ***************************************************************** Government Neural Network Applications Workshop G N N 9 2 Dayton, Ohio, August 24-28, 1992 ----------------------------- C A L L F O R P A P E R S ----------------------------- The 1992 Government Neural Network Applications Workshop will be held from 24-28 August at the Hope Hotel, Wright-Patterson AFB, Ohio. The tentative schedule is: 24 Aug - Registration and Tutorial 25-27 Aug - Main Meeting (includes export-controlled session) 28 Aug - Classified Meeting (tentative) * Authors are invited to submit abstracts on any application-oriented topic of interest to government agencies; this includes: - Image/speech/signal processing - Man-machine interface - Detection and classification - Guidance and control - Medicine - Robotics * Presentations will be selected based upon two-page abstracts. Please note that ABSTRACTS LONGER THAN TWO PAGES WILL BE RETURNED. Also, the ABSTRACT DEADLINE IS APRIL 15, 1992. Abstracts should be accompanied by a cover letter stating the affiliation, address and phone number of the author. The cover letter should also state whether the presentation will be unclassified (open dissemination), unclassified (export controlled) or classified. Send abstracts to the following addresses. Unclassified: GNN 92, Maj. Steven K. Rogers, AFIT/ENG, WPAFB, OH 45433. Classified: GNN 92, Maj. Steven K. Rogers, AFIT/ENA, WPAFB, OH 45433. * The export-controlled session will be open to U.S. citizens only. Please use this session for unclassified material that you wish to present in a limited way. * Classified abstracts must contain a description of the classified portion and authors must make clear why classified material is important to the presentation. Please note that any "classified" abstracts without well marked classified content will be automatically rejected. Finally, the classified meeting will only take place if the classified program committee deems there to be a sufficient number of quality papers on worthwhile classified subjects. If authors are concerned about reducing the dissemination of unclassified material, they should use the export-controlled session of the main meeting. If the classified portion is unimportant to the main thrust of the abstract, it should be "sanitized" and submitted as unclassified. If accepted, classified authors will be allotted space in the proceedings for an (optional) unclassified paper. * Registration fees are not yet final, but they are expected to be in the $200 to $300 range, and they will include some meals.
* The all-day tutorial on August 24 will be conducted by AFIT faculty. It will be oriented toward those who are new to the field and would like sufficient background for the remainder of the meeting. There will be no extra cost for the tutorial. Thanks to the Army, last year's meeting was a big hit. It was one of the few selective application-oriented meetings in this field. We look forward to your presence, and we know that you will find this meeting informative and useful whether you are currently building neural network applications, are thinking about them, or are contributing to theories that underpin them. Conference chairs are: General Chair: Capt. Steve Suddarth Mathematics Department (AFOSR/NM) Air Force Office of Scientific Research Bolling AFB DC 20332-6448 Comm: (202) 767-5028 DSN: 295-5028 Fax: (202) 404-7496 suddarth at cs.umd.edu Program Chair: Maj. Steve Rogers School of Engineering (AFIT/ENG) Air Force Institute of Technology Wright-Patterson AFB OH 45433 Comm: (513) 255-9266 DSN: 785-9266 Fax: (513) 476-4055 rogers at blackbird.afit.af.mil From krogh at cse.ucsc.edu Fri Jan 17 15:02:35 1992 From: krogh at cse.ucsc.edu (Anders Krogh) Date: Fri, 17 Jan 92 12:02:35 -0800 Subject: NIPS paper in Neuroprose Message-ID: <9201172002.AA05878@spica.ucsc.edu> The following paper has been placed in the Neuroprose archive. Title: A Simple Weight Decay Can Improve Generalization Authors: Anders Krogh and John A. Hertz Filename: krogh.weight-decay.ps.Z (To appear in proceedings from NIPS 91) Abstract: It has been observed in numerical simulations that a weight decay can improve generalization in a feed-forward neural network. This paper explains why. It is proven that a weight decay has two effects in a linear network. First, it suppresses any irrelevant components of the weight vector by choosing the smallest vector that solves the learning problem. Second, if the size is chosen right, a weight decay can suppress some of the effects of static noise on the targets, which improves generalization quite a lot. It is then shown how to extend these results to networks with hidden layers and non-linear units. Finally the theory is confirmed by some numerical simulations using the data from NetTalk. ---------------------------------------------------------------- FTP INSTRUCTIONS unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: anything ftp> cd pub/neuroprose ftp> binary ftp> get krogh.weight-decay.ps.Z ftp> bye unix> zcat krogh.weight-decay.ps.Z | lpr (or however you uncompress and print postscript) From bill at baku.eece.unm.edu Fri Jan 17 21:11:22 1992 From: bill at baku.eece.unm.edu (bill@baku.eece.unm.edu) Date: Fri, 17 Jan 92 19:11:22 MST Subject: Multi-layer threshold logic function question Message-ID: <9201180211.AA22025@baku.eece.unm.edu> Does anybody know of bounds on the number of hidden layer units required to implement an arbitrary logic function over n variables in a one-hidden layer MLP where each node uses hard-limiting nonlinearities? Obviously, 2^n is an upper bound: You can form each possible conjunction of n variables in the hidden layer, and then selectively combine them disjunctively with the output node. This seems like overkill though.... I've seen some other bounds in Muroga's book which allow for multiple layers and they are on the order of O(2^(n/2)), but I'm looking specifically for a bound for a one-hidden layer net.
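A minimal sketch of the 2^n construction mentioned in the question above, assuming hard-limiting units with +/-1 weights and integer thresholds (the code and names are illustrative and are not part of the original posting): each hidden unit detects one input pattern on which the function is true, and the output unit simply ORs the hidden units together.

    import itertools

    def minterm_network(truth_table, n):
        # Build a one-hidden-layer network of hard-limiting units computing an
        # arbitrary Boolean function of n variables, given as a dict that maps
        # each 0/1 input tuple to 0 or 1. One hidden unit is made per true row.
        hidden = []
        for x in itertools.product((0, 1), repeat=n):
            if truth_table[x]:
                # This unit fires only on pattern x: weight +1 where x has a 1,
                # -1 where it has a 0, and threshold equal to the number of 1s.
                weights = [1 if bit else -1 for bit in x]
                hidden.append((weights, sum(x)))
        return hidden

    def evaluate(hidden, x):
        # Hidden layer: hard limiter on w.x >= threshold.
        h = [1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= theta else 0
             for w, theta in hidden]
        # Output unit: fires if any hidden unit fires (a disjunction).
        return 1 if sum(h) >= 1 else 0

For XOR on two inputs this produces two hidden units (for the patterns 01 and 10); in the worst case the construction uses one hidden unit per true row of the truth table, which is where the 2^n figure comes from.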
-Bill =============================================================================== Bill Horne | email: bill at baku.eece.unm.edu Dept. of Electrical and Computer Engineering | University of New Mexico | Phone: (505) 277-0805 Albuquerque, NM 87131 USA | Office: EECE 224D =============================================================================== From finton at cs.wisc.edu Fri Jan 17 22:16:34 1992 From: finton at cs.wisc.edu (David J. Finton) Date: Fri, 17 Jan 92 21:16:34 -0600 Subject: Lost Mail Message-ID: <9201180316.AA14631@lactic.cs.wisc.edu> q h From ross at psych.psy.uq.oz.au Sat Jan 18 08:13:20 1992 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Sun, 19 Jan 1992 00:13:20 +1100 Subject: trying to find JM SOPENA of BARCELONA Message-ID: <9201181313.AA23110@psych.psy.uq.oz.au> J.M. Sopena of the University of Barcelona posted notice of a paper on ESRP: a Distributed Connectionist Parser, some weeks back. The contact address was given as: d4pbjss0 at e0ub011.bitnet My mail to Sopena has been bounced by the bitnet gateway (cunyvm.bitnet) with a 'cannot find mailbox' message. Would Dr Sopena please contact me directly or perhaps someone who HAS got through to Sopena might get in touch with me. MY APOLOGIES TO THE 99.9% OF THE LIST FOR WHOM THIS IS IRRELEVANT. Thank you, Ross Gayler ross at psych.psy.uq.oz.au From LZHAO at swift.cs.tcd.ie Sat Jan 18 19:54:00 1992 From: LZHAO at swift.cs.tcd.ie (LZHAO@swift.cs.tcd.ie) Date: Sun, 19 Jan 1992 01:54 +0100 Subject: I would like to put my name on your list Message-ID: <8A7B8495C0006D91@cs.tcd.ie> my e-mail address is lzhao at cs.tcd.ie From B344DSL at UTARLG.UTA.EDU Mon Jan 20 11:15:00 1992 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Mon, 20 Jan 1992 10:15 CST Subject: Registration and tentative program for a conference in Dallas, Feb. 6-8. Message-ID: <01GFJAE9SU2C00018I@utarlg.uta.edu> TENTATIVE schedule for Optimality Conference, UT Dallas, Feb. 6-8, 1992 ORAL PRESENTATIONS -- Thursday, Feb. 6, AM: Daniel Levine, U. of Texas, Arlington -- Don't Just Stand There, Optimize Something! Samuel Leven, Radford U. -- (title to be announced) Mark Deyong, New Mexico State U. -- Properties of Optimality in Neural Networks Wesley Elsberry, Battelle Research Labs -- Putting Optimality in its Place: Argument on Context, Systems, and Neural Networks Graham Tattersall, University of East Anglia -- Optimal Generalisation in Artificial Neural Networks Thursday, Feb. 6, PM: Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model Ian Parberry, University of North Texas -- (title to be announced) Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network Friday, Feb. 7, AM: Gershom Rosenstein, Hebrew University -- For What are Brains Striving? Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of Nonstationary Data by Self-Organizing Neural Networks Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics Friday, Feb.
Friday, Feb. 7, PM:
David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems
David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a Classical Conditioning Network
Samy Bengio, Université de Montréal -- On the Optimization of a Synaptic Learning Rule
Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing?

Saturday, Feb. 8, AM:
Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive Processes?
Herve Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator
Paul Prueitt, Georgetown University -- (title to be announced)
Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function

Saturday, Feb. 8, PM:
Panel discussion on the basic themes of the conference

POSTERS
Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours
Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by Competitive Neural Networks
John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems
Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories
Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers

ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE (alphabetical by first author):

Generalization of the Linear Auto-Associator
Herve Abdi, Dominique Valentin, and Alice J. O'Toole
University of Texas at Dallas

The classical auto-associator can be used to model some processes in prototype abstraction. In particular, the eigenvectors of the auto-associative matrix have been interpreted as prototypes or macro-features (Anderson et al., 1977; Abdi, 1988; O'Toole and Abdi, 1989). It has also been noted that computing these eigenvectors is equivalent to performing the principal component analysis of the matrix of objects to be stored in the memory. This paper describes a generalization of the linear auto-associator in which units (i.e., cells) can be of differential importance, or can be non-independent, or can have a bias. The stimuli to be stored in the memory can also have a differential importance (or can be non-independent). The constraints expressing response bias and differential importance of stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule is applied to the weight matrix in a generalized form which takes the bias and the differential importance constraints into account to compute the error. Conditions for the convergence of the learning rule are examined, and convergence is shown to be dependent only on the ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix. The maximal responses of the memory correspond to generalized eigenvectors; these vectors are biased-orthogonal (i.e., they are orthogonal after the response bias is implemented). It is also shown that (with an appropriate choice of matrices for response bias and differential importance) the generalized auto-associator is able to implement the general linear model of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical correlation analysis, generalized principal component analysis, etc.)
Applications and Monte Carlo simulation of the generalized auto-associator dealing with face processing will be presented and discussed.

On the Optimization of a Synaptic Learning Rule
Samy Bengio, Université de Montréal
Yoshua Bengio, Massachusetts Institute of Technology
Jocelyn Cloutier, Université de Montréal
Jan Gecsei, Université de Montréal

This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has local inputs, and is the same for many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, a local optimization method (gradient descent) as well as a global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves as well as of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments on classical conditioning in Aplysia yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimensional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Initial experiments can be found in [1, 2].

References
[1] Bengio, Y. & Bengio, S. (1990). Learning a synaptic learning rule. Technical Report #751, Computer Science Department, Université de Montréal.
[2] Bengio, Y., Bengio, S., & Cloutier, J. (1991). Learning a synaptic learning rule. IJCNN-91-Seattle.

Complexity Optimized Data Clustering by Competitive Neural Networks
Joachim Buhmann, Lawrence Livermore National Laboratory
Hans Kühnel, Technische Universität München

Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their occupation probabilities. An iterative version of complexity optimized clustering is implemented by an artificial neural network with winner-take-all connectivity. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy-constrained vector quantization or topological feature maps and competitive neural networks.
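As a generic point of reference for the entropy-based clustering described in the abstract above: the following Python sketch is a plain maximum-entropy ("soft") K-means, not the complexity-optimized algorithm of the abstract; it only illustrates the entropy-smoothed assignments that such methods build on. The function name, temperature value, and toy data are illustrative.

import numpy as np

def soft_kmeans(X, k, T=0.5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared distances of every point to every center, shape (N, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        # Gibbs / maximum-entropy assignment probabilities at temperature T.
        logits = -d2 / T
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Re-estimate centers as probability-weighted means.
        centers = (p.T @ X) / p.sum(axis=0)[:, None]
    return centers, p

# Two well-separated blobs; with a small T the assignments approach hard K-means.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
centers, p = soft_kmeans(X, k=2, T=0.1)
print(np.round(centers, 2))

Lowering the temperature T sharpens the Gibbs assignment probabilities toward hard K-means labels; the complexity costs the abstract describes would enter as additional terms in the quantity being minimized.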
Interactive Sub-systems of Natural Language and the Treatment of Specialized Function
Sylvia Candelaria de Ram, New Mexico State University

Context-sensitivity and rapidity of communication are two things that become ecological essentials as cognition advances. They become ``optimals'' as cognition develops into something elaborate, long-lasting, flexible, and social. For successful operation of language's default speech/gesture mode, articulation response must be rapid and context-sensitive. It does not follow that all linguistic cognitive function will or can be equally fast or re-equilibrating. But it may follow that articulation response mechanisms are specialized in different ways than those for other cognitive functions. The special properties of the varied mechanisms would then interact in language use. In actuality, our own architecture is of this sort [1,2,3,4]. Major formative effects on our language, society, and individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift (and adaptability) and hypercorrection effects (mitigated by emotion), so that we have multitudes of distinct but related languages and speech communities. Consider modelling the variety of co-functioning mechanisms for utterance and gesture articulation, interpretation, monitoring and selection. Wherein lies the source of the differing function in one and another mechanism? Suppose [parts of] mechanisms are treated as parts of a multi-layered, temporally parallel, staged architecture (like ours). The layers may be inter-connected selectively [6]. Any given portion may be designed to deal with particular sorts of excitation [7,8,9]. A multi-level belief/knowledge logic enhanced for such structures [10] has properties extraordinary for a logic, properties which point up some critical features of ``neural nets'' having optimization properties pertinent to intelligent, interactive systems.

References
[1] Candelaria de Ram, S. (1984). Genesis of the mechanism for sound change as suggested by auditory reflexes. Linguistic Association of the Southwest, El Paso.
[2] Candelaria de Ram, S. (1988). Neural feedback and causation of evolving speech styles. New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherches mathématiques, Montreal, October.
[3] Candelaria de Ram, S. (1989). Sociolinguistic style shift and recent evidence on `presemantic' loci of attention to fine acoustic difference. New Ways of Analyzing Language Variation joint with American Dialect Society (NWAV-XVIII/ADSC), Durham, NC, October.
[4] Candelaria de Ram, S. (1991b). Language processing: mental access and sublanguages. Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept. 1991.
[5] Candelaria de Ram, S. (1990b). The sensory basis of mind: feasibility and functionality of a phonetic sensory store. [Commentary on R. Näätänen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236.
[6] Candelaria de Ram, S. (1990c). Sensors & concepts: Grounded cognition. Working Session on Algebraic Approaches to Problem Solving and Representation, June 27-29, Briarcliff, NY.
[7] Candelaria de Ram, S. (1990a). Belief/knowledge dependency graphs with sensory groundings. Third Int. Symp. on Artificial Intelligence Applications of Engineering Design and Manufacturing in Industrialized and Developing Countries, Monterrey, Mexico, Oct. 22-26, pp. 103-110.
[8] Candelaria de Ram, S.
(1991a). From sensors to concepts: Pragmasemantic system constructivity. Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91, Sophia-Antipolis, France, April 22-24. Also in Knowledge Modeling and Expertise Transfer, IOS Publishing, Paris, 1991, pp. 433-448.
[9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989). Reasoning using inheritance from a mixture of knowledge and beliefs. In S. Ramani, R. Chandrasekar, & K.S.R. Anjaneyulu (Eds.), Knowledge Based Computer Systems. Delhi: Narosa, pp. 387-396; republished by Vedams Books International, New Delhi, 1990. Also in Lecture Notes in Computer Science series No. 444, Springer-Verlag, 1990.
[10] Candelaria de Ram, S. (1991c). Why to enter into dialogue is to come out with changed speech: Cross-linked modalities, emotion, and language. Second Invitational Venaco Workshop and European Speech Communication Association Tutorial and Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept. 16-20, 1991.

Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning
Gail Carpenter, Boston University

A neural network architecture for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors will be described. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, response, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (∧) and the MAX operator (∨) of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights corresponds to increasing sizes of category "boxes." Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; (iv) a letter recognition database; and (v) a medical database.

Properties of Optimality in Neural Networks
Mark DeYong and Thomas Eskridge, New Mexico State University

This presentation discusses issues concerning optimality in neural and cognitive functioning. We discuss these issues in terms of the tradeoffs they impose on the design of neural network systems.
We illustrate the issues with example systems based on a novel VLSI neural processing element developed, fabricated, and tested by the first author. There are four general issues of interest:

* Biological Realism vs. Computational Power. Many implementations of neurons sacrifice computational power for biological realism. Biological realism imposes a set of constraints on the structure and timing of certain operations in the neuron. Taken as an absolute requirement, these constraints, though realistic, reduce the computational power of individual neurons, and of systems built on those neurons. However, to ignore the biological characteristics of neurons is to ignore the best example of the type of device we are trying to implement. In particular, simple non-biologically inspired neurons perform a completely different style of processing than biologically inspired ones. Our work allows for biological realism in areas where it increases computational power, while ignoring the low-level details that are simply by-products of organic systems.

* Task-Specific Architecture vs. Uniform Element, Massive Parallelism. A central issue in developing neural network systems is whether to design networks specific to a particular task or to adapt a general-purpose network to accomplish the task. Developing task-specific architectures allows for small, fast networks that approach optimality in performance, but require more effort during the design stage. General-purpose architectures approach optimality in design that merely needs to be adapted via weight modifications to a new problem, but suffer from performance inefficiencies due to unneeded and/or redundant nodes. Our work hypothesizes that task-specific architectures that use a building-block approach combined with fine-tuning by training will produce the greatest benefits in the tradeoff between design and performance optimality.

* Time Independence vs. Time Dependence. Many neural networks assume that each input vector is independent of other inputs, and the job of the neural network is to extract patterns within the input vector that are sufficient to characterize it. For problems of this type, a network that assumes time independence will provide acceptable performance. However, if the input vectors cannot be assumed to be independent, the network must process the vector with respect to its temporal characteristics. Networks that assume time independence have a variety of well-known training and performance algorithms, but will be unwieldy when applied to a problem in which time independence does not hold. Although temporal characteristics can be converted into levels, there will be a loss of information that may be critical to solving the problem efficiently. Networks that assume time dependence have the advantage of being able to handle both time dependent and time independent data, but do not have well-known, generally applicable training and performance algorithms. Our approach is to assume time dependence, with the goal of handling a larger range of problems rather than having general training and performance methods.

* Hybrid Implementation vs. Analog or Digital Only. The optimality of hardware implementations of neural networks depends in part on the resolution of the second tradeoff mentioned above. Analog devices generally afford faster processing at a lower hardware overhead than digital, whereas digital devices provide noise immunity and a building-block approach to system design.
Our work adopts a hybrid approach where the internal computation of the neuron is implemented in analog, and the extracellular communication is performed digitally. This gives the best of both worlds: the speed and low hardware overhead of analog and the noise immunity and building-block nature of digital components. Each of these issues has individual ramifications for neural network design, but optimality of the overall system must be viewed as their composite. Thus, design decisions made in one area will constrain the decisions that can be made in the other areas.

Putting Optimality in its Place: Arguments on Context, Systems and Neural Networks
Wesley Elsberry, Battelle Research Laboratories

Determining the "optimality" of a particular neural network should be an exercise in multivariate analysis. Too often, performance concerning a narrowly defined problem has been accepted as prima facie evidence that some ANN architecture has a specific level of optimality. Taking a cue from the field of genetic algorithms (and the theory of natural selection from which GAs are derived), I offer the observation that optimality is selected in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the system of which it is a part. The context in which the evaluation of optimality is performed will influence the results of that evaluation greatly. While compartmentalized and specialized tests of ANN performance can offer insights, the construction of effective systems may require additional consideration to be given to the assumptions of such tests. Many benchmarks and other tests assume a static problem set, while many real-world applications offer dynamical problems. An ANN which performs "optimally" in a test may perform miserably in a putatively similar real-world application. Recognizing the assumptions which underlie evaluations is important for issues of optimal system design; recognizing the need for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is critical to proper placement of priority given to optimality of ANNs.

Identifying a Neural Network's Computational Goals: A Statistical Optimization Perspective
Richard M. Golden, University of Texas at Dallas

The importance of identifying the computational goal of a neural network computation is first considered from the perspective of Marr's levels-of-description theory and Simon's theory of satisficing. A "statistical optimization perspective" is proposed as a specific implementation of the more general theories of Marr and Simon. The empirical "testability" of the "statistical optimization perspective" is also considered. It is argued that although such a hypothesis is only occasionally empirically testable, such a hypothesis plays a fundamental role in understanding complex information processing systems. The usefulness of the above theoretical framework is then considered with respect to both artificial neural networks and biological neural networks. An argument is made that almost any artificial neural network may be viewed as optimizing a statistical cost function. To support this claim, the large class of back-propagation feed-forward artificial neural networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed as optimizing specific statistical cost functions.
Specific statistical tests for deciding whether the statistical environment of the neural network is "compatible" with the statistical cost function the network is presumably optimizing are also proposed. Next, some ideas regarding the applicability of such analyses to much more complicated artificial neural networks which are "closer approximations" to real biological neural networks will also be discussed.

Vector Associative Maps: Self-organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-motor Control
Stephen Grossberg, Boston University

This talk describes a new class of neural models for unsupervised error-based learning. Such a Vector Associative Map, or VAM, is capable of autonomously calibrating the spatial maps and arm trajectory parameters used during visually guided reaching. VAMs illustrate how spatial and motor representations can self-organize using a unified computational format. They clarify how an autonomous agent can build a self-optimizing hierarchy of goal-oriented actions based upon more primitive, endogenously generated exploration of its environment. Computational properties of ART and VAM systems are complementary. This complementarity reflects different processing requirements of sensory-cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be used to design an autonomous behavioral agent.

Problem Solving in a Connectionistic World Model
Steven Hampson, University of California at Irvine

Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S) models of problem solving are central to animal learning theory. When applicable, the procedural S-R and S-E models can be quite space efficient, as they can potentially learn compact generalizations over the functions they are taught to compute. On the other hand, learning these generalizations can be quite time consuming, and adjusting them when conditions change can take as long as learning them in the first place. In contrast, the S-S model developed here does not learn a particular input-to-output mapping, but simply records a series of "facts" about possible state transitions in the world. This declarative world model provides fast learning, easy update and flexible use, but is space expensive. The procedural/declarative distinction observed in biological behavior suggests that both types of mechanisms are available to an organism in its attempts to balance, if not optimize, both time and space requirements. The work presented here investigates the type of problems that are most effectively addressed in an S-S model.

Efficient Optimising Dynamics in a Hopfield-style Network
Arun Jagota, State University of New York at Buffalo

Definition: A set of vertices (nodes, points) of a graph such that every pair is connected by an edge (arc, line) is called a clique. An example in the ANN context is a set of units such that all pairs are mutually excitatory. The following description applies to optimisation issues in any problems that can be modeled with cliques. Background I: We earlier proposed a variant (essentially a special case) of the Hopfield network which we called the Hopfield-Style Network (HSN). We showed that the stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of a local minimum (stable state) is directly proportional (although not linearly) to the size of the corresponding clique. Any graph can be made the underlying graph of HSN.
These three facts suggest that HSN with suitable optimising dynamics can be applied to the CLIQUE (optimisation) problem, namely that of ``Finding the largest clique in any given graph''. Background II: The CLIQUE problem is NP-Complete, suggesting that it is most likely intractable. Recent results from Computer Science suggest that even approximately solving this problem is probably hard. Researchers have shown that on most (random) graphs, however, it can be approximated fairly well. The CLIQUE problem has many applications, including (1) Content-addressable memories can be modeled as cliques. (2) Constraint-Satisfaction Problems (CSPs) can be represented as the CLIQUE problem. Many problems in AI, from Computer Vision, NLP, KR, etc., have been cast as CSPs. (3) Certain object recognition problems in Computer Vision can be modeled as the CLIQUE problem. Given an image object A and a reference object B, one problem is to find a sub-object of B which ``matches'' A. This can be represented as the CLIQUE problem. Abstract: We will present details of the modeling of optimisation problems related to those described in Background II (and perhaps others) as the CLIQUE problem. We will discuss how HSN can be used to obtain optimal or approximate solutions. In particular, we will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation capabilities, and present theoretical and/or empirical evidence for such. The dynamics are: Discrete: (1) Steepest gradient-descent (2) rho-annealing. Continuous: (3) Mean-field annealing. We will discuss characterising properties of these dynamics including: (1) emulates a well-known graph algorithm, (2) is suited only for HSN, (3) originates from statistical mechanics and has gained wide attention for its optimisation properties. We will also discuss the continuous Hopfield network dynamics as a special case of (3). (A toy sketch of such clique-finding descent appears after the e-mail address list below.)

State Generators and Complex Neural Memories
Subhash C. Kak, Louisiana State University

The mechanism of self-indexing for feedback neural networks that generates memories from short subsequences is generalized so that a single bit together with an appropriate update order suffices for each memory. This mechanism can explain how stimulating an appropriate neuron can then recall a memory. Although the information is distributed in this model, our self-indexing mechanism [1] makes it appear localized. Also, a new complex-valued neuron model is presented to generalize McCulloch-Pitts neurons. There are aspects to biological memory that are distributed [2] and others that are localized [3]. In the currently popular artificial neural network models the synaptic weights reflect the stored memories, which are thus distributed over the network. The question then arises whether these models can explain Penfield's observations on memory localization. This paper shows that such a memory localization does occur in these models if self-indexing is used. It is also shown how a generalization of the McCulloch-Pitts model of neurons appears essential in order to account for certain aspects of distributed information processing. One particular generalization, described in the paper, allows one to deal with some recent findings of Optican & Richmond [4]. Consider the model of the mind where each mental event corresponds to some neural event. Neurons that deal with mental events may be called cognitive neurons. There would be other neurons that simply compute without cognitive function.
Consider now cognitive neurons dealing with sensory input that directly affects their behaviour. We now show that independent cognitive centers will lead to competing behaviour. Even non-competing cognitive centres would show odd behaviour, since collective choice is associated with non-transitive logic. This is clear from the ordered choice paradox that occurs for any collection of cognitive individuals. This indicates that a scalar energy function cannot be associated with a neural network that performs logical processing, because if that were possible then all choices made by a network could be defined in an unambiguous hierarchy, with at worst more than one choice having a particular value. The question of cyclicity of choices, as in the ordered choice paradox, will not arise.

References
[1] Kak, S.C. (1990a). Physics Letters A 143, 293.
[2] Lashley, K.S. (1963). Brain Mechanisms and Learning. New York: Dover.
[3] Penfield, W. & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press.
[4] Optican, L.M. & Richmond, B.J. (1987). J. Neurophysiol. 57, 162.

Don't Just Stand There, Optimize Something!
Daniel Levine, University of Texas at Arlington

Perspectives on optimization in a variety of disciplines, including physics, biology, psychology, and economics, are reviewed. The major debate is over whether optimization is a description of nature, a normative prescription, both, or neither. The presenter leans toward a belief that optimization is a normative prescription and not a description of nature. In neural network theory, the attempt to explain all behavior as the optimization of some variable (no matter how tortuously defined the variable is!) has spawned some work that has been seminal to the field. This includes both the "hedonistic neuron" theory of Harry Klopf, which led to some important work in conditioning theory and robotics, and the "dynamic programming" of Paul Werbos, which led to back propagation networks. Yet if all human behavior is optimal, this means that no improvement is possible on wasteful wars, environmental destruction, and unjust social systems. The presenter will review work on the effects of frontal lobe damage, specifically the dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty, and describe these effects as prototypes of non-optimal cognitive function. It can be argued (David Stork, personal communication) that lesion effects do not demonstrate non-optimality because they are the result of system malfunction. If that is so, then such malfunction is far more pervasive than generally believed and is not dependent on actual brain damage. Architectural principles such as associative learning, lateral inhibition, opponent processing, and resonant feedback, which enable us to interact with a complex environment, also sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980). Even intact frontal lobes do not always perform their executive function (Pribram, 1991) with optimal efficiency.

References
Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.
Pribram, K. (1991). Brain and Perception. Erlbaum.

For What Are Brains Striving?
Gershom-Zvi Rosenstein, Hebrew University

My aim is to outline the possibility of a unified approach to several as yet unsolved problems of behavioral regulation, most of them related to the puzzle of schizophrenia.
This Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only recently in the book of the present author [1]. One of the main problems the approach was applied to is the model of behavior disturbances. The income (the value of the goal-function of our model) is defined, by assumption, on the intensities of streams of impulses directed to the reward system. The income can be accumulated and spent on different activities of the model. The choices made by the model depend on the income they are expected to bring. Now the ICA is applied to the following problems: The catecholamine distribution change (CDC) in the schizophrenic brain. I try to prove the idea that CDC is caused by the same augmented (in comparison with the norm) stimulation of the reward system that was proposed by us earlier as a possible cause for the behavioral disturbance. The role of dopamine in the brain processing of information is discussed. Dopamine is seen as one of the forms of representation of income in the brain. The main difference between the psychology of "normal" and schizophrenic subjects, according to many researchers, is that in schizophrenics "observations prevail over expectations." This property can be shown to be a formal consequence of our model. It was used earlier to describe the behavior of schizophrenics versus normal people in delusion formation (such as Scharpantje delusion, etc.). ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. In fact, that hypothesis can be concluded from ICA if some simple and natural assumptions are accepted. A hypothesis about the nature of stereotypes as an adjunctive type of behavior is proposed. They are seen as behaviors concerned not with the direct physiological needs of the organism but with the regulation of activity of its reward system. The proposition can be tested partly in animal experiments. The problem of origination of so-called "positive" and "negative" symptoms in schizophrenia is discussed. The positive symptoms are seen as attempts and sometimes means to produce an additional income for the brain whose external sources of income are severely limited. The negative symptoms are seen as behaviors chosen in the condition whereby the quantity of income that can be used to provide these behaviors is small and cannot be increased. The last part of the presentation is dedicated to the old problem of the relationship between "genius" and schizophrenia. It is a continuation of material introduced in [1]. The remark is made that the phenomenon of uric acid excess, thought by some investigators to be connected to high intellectual achievement, can be related to the uric acid excess found to be produced by augmented stimulation of the reward system in the self-stimulation paradigm.

References
[1] Rosenstein, G.-Z. (1991). Income and Choice in Biological Systems. Lawrence Erlbaum Associates.

Non-optimality in Neurobiological Systems
David Stork, Ricoh California Research Center

I will argue strongly, in two ways, that neurobiological systems are "non-optimal." I note that "optimal" implies a match between some (human) notion of function (or structure,...) and the implementation itself. My first argument addresses the dubious approach which tries to impose notions of what is being optimized, i.e., stating what the desired function is.
For instance, Gabor-function theorists claim that human visual receptive fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the spatial-frequency domains [1]. I demonstrate how such bandwidth notions have an implied measure, or metric, of localization; I examine the implied metric and find little or no justification for preferring it over any of a number of other plausible metrics [2]. I also show that the visual system has an overabundance of visual cortical cells (by a factor of 500) relative to what is implied by the Gabor approach; thus the Gabor approach makes this important fact hard to understand. Then I review arguments of others describing visual receptive fields as being "optimally" tuned to visual gratings [3], and show that here too an implied metric is unjustified [4]. These considerations lead to skepticism of the general approach of imposing or guessing the actual "true" function of neural systems, even in specific mathematical cases. Only in the most compelling cases can the function be stated confidently. My second criticism of the notion of optimality is that even if in such extreme cases the neurobiological function is known, biological systems generally do not implement it in an "optimal" way. I demonstrate this for a non-optimal ("useless") synapse in the crayfish tailflip circuit. Such non-optimality can be well explained by appealing to the process of preadaptation from evolutionary theory [5,6]. If a neural circuit (or organ, or behavior ...) which evolves to solve one problem is later called upon to solve a different problem, then the evolving circuit must be built upon the structure appropriate to the previous task. Thus, for instance, the non-optimal synapse in the crayfish tail flipping circuit can be understood as a holdover from a previous evolutionary epoch in which the circuit was used instead for swimming. In other words, evolutionary processes are gradual, and even if locally optimal (i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. (This is analogous to local minima that plague some gradient-descent methods in mathematics.) Such an analysis highlights the role of evolutionary history in understanding the structure and function of current neurobiological systems, and along with our previous analysis, strongly argues against optimality in neurobiological systems. I therefore concur with the recent statement that in neural systems "... elegance of design counts for little." [7]

References
[1] Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2, 1160-1169.
[2] Stork, D. G. & Wilson, H. R. (1990). Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7, 1362-1373.
[3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980). Visual cortical neurons: Are bars or gratings the optimal stimuli? Science 207, 88-90.
[4] Stork, D. G. & Levinson, J. Z. (1982). Receptive fields and the optimal stimulus. Science 216, 204-205.
[5] Stork, D. G., Jackson, B., & Walker, S. (1991). "Non-optimality" via preadaptation in simple neural systems. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II. Addison-Wesley and Santa Fe Institute, pp. 409-429.
[6] Stork, D. G. (1992, in press). Preadaptation and principles of organization in organisms. In A. Baskin & J.
Mittenthal (Eds.), Principles of Organization in Organisms. Addison-Wesley and Santa Fe Institute.
[7] Dumont, J. P. C. & Robertson, R. M. (1986). Neuronal circuits: An evolutionary perspective. Science 233, 849-853.

Why Do We Study Neural Nets on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing?
Harold Szu, Naval Surface Warfare Center

Neural nets, natural or artificial, are structured by the desire to accomplish certain information processing goals. An example of this occurs in the celebrated exclusive-OR computation. "You are as intelligent as you can hear and see," according to an ancient Chinese saying. Thus, the mismatch between human-created sensor data used for input knowledge representation and the nature-evolved brain-style computing architectures is one of the major impediments for neural net applications. After a review of classical neural nets with fixed layered architectures and small-perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one of the most challenging frontiers: the self-architecturing of neural nets. The singlet and pair correlation functions can be measured to define a hairy neuron model. The minimum set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form a selective network. Then, the convergence proof for self-architecturing hairy neurons will be given. A more powerful tool, however, is the wavelet transform, an adaptive wide-band Fourier analysis developed in 1985 by French oil explorers. This transform goes beyond the (preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a noisy environment. The utility of wavelets in brain-style computing can be recognized from two observations. First, the "cocktail party effect," namely, you hear what you wish to hear, can be explained by the wavelet matched filter which can achieve a tremendous bandwidth noise reduction. Second, "end-cut" contour filling may be described by Gibbs overshooting in this wavelet manner. In other words, wavelets form a very natural way of describing real scenes and real signals. For this reason, it seems likely that the future of neural net applications may be in learning to do wavelet analyses by a self-learning of the "mother wavelet" that is most appropriate for a specific dynamic input-output medium.

Optimal Generalisation in Artificial Neural Networks
Graham Tattersall, University of East Anglia

A key property of artificial neural networks is their ability to produce a useful output when presented with an input of previously unseen data, even if the network has only been trained on a small set of examples of the input-output function underlying the data. This process is called generalisation and is effectively a form of function completion. ANNs such as the MLP and RBF sometimes appear to work effectively as generalisers on this type of problem, but there is now widespread recognition that the form of generalisation which arises is very dependent on the architecture of the ANN, and is often completely inappropriate, particularly when dealing with symbolic data. This paper will argue that generalisation should be done in such a way that the chosen completion is the most probable and is consistent with the training examples.
These criteria dictate that the generalisation should not depend in any way upon the architecture or functionality of components of the generalising system, and that the generalisation will depend entirely on the statistics of the training exemplars. A practical method for generalising in accordance with the probability and consistency criteria is to find the minimum entropy generalisation using the Shannon-Hartley relationship between entropy and spatial bandwidth. The usefulness of this approach can be demonstrated using a number of binary data functions which contain both first and higher order structure. However, this work has shown very clearly that, in the absence of an architecturally imposed generalisation strategy, many function completions are equally possible unless a very large proportion of all possible function domain points are contained in the training set. It therefore appears desirable to design generalising systems such as neural networks so that they generalise, not only in accordance with the optimal generalisation criteria of maximum probability and training set consistency, but also subject to a generalisation strategy which is specified by the user. Two approaches to the imposition of a generalisation strategy are described. In the first method, the characteristic autocorrelation function or functions belonging to a specified family are used as the weight set in a Kosko net. The second method uses Wiener Filtering to remove the "noise" implicit in an incomplete description of a function. The transfer function of the Wiener Filter is specific to a particular generalisation strategy.

E-mail Addresses of Presenters
Abdi abdi at utdallas.edu
Bengio bengio at iro.umontreal.ca
Bhaumik netearth!bhaumik at shakti.ernet.in
Buhmann jb at s1.gov
Candelaria de Ram sylvia at nmsu.edu
Carpenter gail at park.bu.edu
Chance u0503aa at vms.ucc.okstate.edu
DeYong mdeyong at nmsu.edu
Elsberry elsberry at hellfire.pnl.gov
Golden golden at utdallas.edu
Grossberg steve at park.bu.edu
Hampson hampson at ics.uci.edu
Jagota jagota at cs.buffalo.edu
Johnson ecjdj at nve.mcsr.olemiss.edu
Kak kak at max.ee.lsu.edu
Leven (reach via Pribram, see below)
Levine b344dsl at utarlg.uta.edu
Ogmen elee52f at jetson.uh.edu
Parberry ian at hercule.csci.unt.edu
Pribram kpribram at ruacad.runet.edu
Prueitt prueitt at guvax.georgetown.edu
Rosenstein NONE
Stork stork at crc.ricoh.com
Szu btelfe at bagheera.nswc.navy.mil
Tattersall ?
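Following up the Jagota abstract above, here is a toy Python sketch of asynchronous steepest descent whose local minima are maximal cliques of a graph. It is not the HSN formulation itself; the energy weights, the penalty A, and the function names are choices made here purely for illustration.

import numpy as np

def energy(s, adj, A, b=0.5):
    # Pairwise energy of the 0/1 state s: -1 per edge among "on" vertices,
    # +A per non-edge among "on" vertices, minus a small bias b per "on" vertex.
    on = np.where(s == 1)[0]
    e = 0.0
    for idx, i in enumerate(on):
        for j in on[idx + 1:]:
            e += -1.0 if adj[i, j] else A
    return e - b * len(on)

def descend(adj, A=None, b=0.5, seed=0):
    n = len(adj)
    A = float(n) if A is None else A
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, size=n)
    improved = True
    while improved:                       # asynchronous single-bit descent
        improved = False
        for i in rng.permutation(n):
            t = s.copy()
            t[i] = 1 - t[i]
            if energy(t, adj, A, b) < energy(s, adj, A, b):
                s, improved = t, True
    return np.where(s == 1)[0]

# 5-vertex graph whose largest clique is {0, 1, 2}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((5, 5), dtype=bool)
for i, j in edges:
    adj[i, j] = adj[j, i] = True
print("maximal clique found:", descend(adj))

With the non-edge penalty A at least as large as the number of vertices and a small positive bias b, the descent can always lower the energy by adding a vertex compatible with the current clique or by dropping a vertex involved in a non-edge, so it halts only on maximal cliques; which one it finds depends on the random start.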
of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network Friday, Feb. 7, AM: Gershom Rosenstein, Hebrew University -- For What are Brains Striving? Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of Nonstationary Data by Self-Organizing Neural Networks Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics Friday, Feb. 7, PM: David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a Classical Conditioning Network Samy Bengio, Universit de Montral -- On the Optimization of a Synaptic Learning Rule Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Saturday, Feb. 8, AM: Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive Processes? Herve Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator Paul Prueitt, Georgetown University -- (title to be announced) Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Saturday, Feb. 8, PM: Panel discussion on the basic themes of the conference POSTERS Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by Competitive Neural Networks John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE (alphabetical by first author): Generalization of the Linear Auto-Associator Herve Abdi, Dominique Valentin, and Alice J. O'Toole University of Texas at Dallas The classical auto-associator can be used to model some processes in prototype abstraction. In particular, the eigenvectors of the auto-associative matrix have been interpreted as prototypes or macro-features (Anderson et al, 1977, Abdi, 1988, O'Toole and Abdi, 1989). It has also been noted that computing these eigenvectors is equivalent to performing the principal component analysis of the matrix of objects to be stored in the memory. This paper describes a generalization of the linear auto-associator in which units (i.e., cells) can be of differential importance, or can be non-independent or can have a bias. The stimuli to be stored in the memory can have also a differential importance (or can be non-independent). The constraints expressing response bias and differential importance of stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule is applied to the weight matrix in a generalized form which takes the bias and the differential importance constraints into account to compute the error. 
Conditions for the convergence of the learning rule are examined and convergence is shown to be dependent only on the ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix. The maximal responses of the memory correspond to generalized eigenvectors, these vectors are biased-orthogonal (i.e., they are orthogonal after the response bias is implemented). It is also shown that (with an appropriate choice of matrices for response biais and differential importance), the generalized auto-associator is able to implement the general linear model of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical correlation analysis, generalized principal component analysis, etc.) Applications and Monte Carlo simulation of the generalized auto-associator dealing with face processing will be presented and discussed. On the Optimization of a Synaptic Learning Rule Samy Bengio, Universit de Montral Yoshua Bengio, Massachusetts Institute of Technology Jocelyn Cloutier, Universit de Montral Jan Gecsei, Universit de Montral This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has local inputs, and is the same for many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms, in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, local optimization method (gradient descent) as well as global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as, of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments of classical conditioning for Aplysia yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimentional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Initial experiments can be found in [1, 2]. References [1] Bengio, Y. & Bengio, S. (1990). Learning a synaptic learning rule. Technical Report #751. Computer Science Department. Universit de Montral. [2] Bengio Y., Bengio S., & Cloutier, J. (1991). Learning a synaptic learning rule. IJCNN-91-Seattle. Complexity Optimized Data Clustering by Competitive Neural Networks Joachim Buhmann, Lawrence Livermore National Laboratory Hans Khnel, Technische Universitt Mnchen Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. 
The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their occupation probabilities. An iterative version of complexity optimized clustering is imple- mented by an artificial neural network with winner-take-all connectivity. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy constrainted vector quantization or topological feature maps and competitive neural networks. Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Sylvia Candelaria de Ram, New Mexico State University Context-sensitivity and rapidity of communication are two things that become ecological essentials as cognition advances. They become ``optimals'' as cognition develops into something elaborate, long-lasting, flexible, and social. For successful operation of language's default speech/gesture mode, articulation response must be rapid and context-sensitive. It does not follow that all linguistic cognitive function will or can be equally fast or re-equilibrating. But it may follow that articulation response mechanisms are specialized in different ways than those for other cognitive functions. The special properties of the varied mechanisms would then interact in language use. In actuality, our own architecture is of this sort [1,2,3,4]. Major formative effects on our language, society, and individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift (and adaptability) and hypercorrection effects (mitigated by emotion), so that we have multitudes of distinct but related languages and speech communities. Consider modelling the variety of co-functioning mechanisms for utterance and gesture articulation, interpretation, monitoring and selection. Wherein lies the source of the differing function in one and another mechanism? Suppose [parts of] mechanisms are treated as parts of a multi-layered, temporally parallel, staged architecture (like ours). The layers may be inter-connected selectively [6]. Any given portion may be designed to deal with particular sorts of excitation [7,8,9]. A multi-level belief/knowledge logic enhanced for such structures [10] has properties extraordinary for a logic, properties which point up some critical features of ``neural nets'' having optimization properties pertinent to intelligent, interactive systems. References [1] Candelaria de Ram, S. (1984). Genesis of the mechanism for sound change as suggested by auditory reflexes. Linguistic Association of the Southwest, El Paso. [2] Candelaria de Ram, S. (1988). Neural feedback and causation of evolving speech styles. New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherces mathmatiques, Montreal, October. [3] Candelaria de Ram, S. (1989). Sociolinguistic style shift and recent evidence on `prese- mantic' loci of attention to fine acoustic difference. New Ways of Analyzing Language Variation joint with American Dialect Society (NWAV-XVIII/ ADSC), Durham, NC, October. [4] Candelaria de Ram, S. (1991b). Language processing: mental access and sublanguages. Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept. 1991. [5] Candelaria de Ram, S. (1990b). The sensory basis of mind: feasibility and functionality of a phonetic sensory store. [Commentary on R. 
Ntnen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236. [6] Candelaria de Ram, S. (1990c). Sensors & concepts: Grounded cognition. Working Session on Algebraic Approaches to Problem Solving and Representation, June 27-29, Briarcliff, NY. [7] Candelaria de Ram, S. (1990a). Belief/knowledge dependency graphs with sensory groundings. Third Int. Symp. on Artificial Intelligence Applications of Engineering Design and Manufacturing in Industrialized and Developing Countries, Monterrey, Mexico, Oct. 22-26, pp. 103-110. [8] Candelaria de Ram, S. (1991a). From sensors to concepts: Pragmasemantic system constructivity. Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91, Sophia-Antipolis, France, April 22-24. Also in Knowledge Modeling and Expertise Transfer, IOS Publishing, Paris, 1991, pp. 433-448. [9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989). Reasoning using inheritance from a mixture of knowledge and beliefs. In S. Ramani, R. Chandrasekar, & K.S.R. Anjaneylu (Eds.), Knowledge Based Computer Systems. Delhi: Narosa, pp. 387-396; republished by Vedams Books International, New Delhi, 1990. Also in Lecture Notes in Computer Science series No. 444, Springer-Verlag, 1990. [10] Candelaria de Ram, S. (1991c). Why to enter into dialogue is to come out with changed speech: Cross-linked modalities, emotion, and language. Second Invitational Venaco Workshop and European Speech Communication Association Tutorial and Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept. 16-20, 1991. Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning Gail Carpenter, Boston University A neural network architecture for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors will be described. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, response, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator () and the MAX operator () of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights corresponds to increasing sizes of category "boxes." Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. 
Simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; (iv) a letter recognition database; and (v) a medical database. Properties of Optimality in Neural Networks Mark DeYong and Thomas Eskridge, New Mexico State University This presentation discusses issues concerning optimality in neural and cognitive functioning. We discuss these issues in terms of the tradeoffs they impose on the design of neural network systems. We illustrate the issues with example systems based on a novel VLSI neural processing element developed, fabricated, and tested by the first author. There are four general issues of interest:  Biological Realism vs. Computational Power. Many implementations of neurons sacrifice computational power for biological realism. Biological realism imposes a set of constraints on the structure and timing of certain operations in the neuron. Taken as an absolute requirement, these constraints, though realistic, reduce the computational power of individual neurons, and of systems built on those neurons. However, to ignore the biological characteristics of neurons is to ignore the best example of the type of device we are trying to implement. In particular, simple non-biologically inspired neurons perform a completely different style of processing than biologically inspired ones. Our work allows for biological realism in areas where it increases computational power, while ignoring the low-level details that are simply by-products of organic systems.  Task-Specific Architecture vs. Uniform Element, Massive Parallelism. A central issue in developing neural network systems is whether to design networks specific to a particular task or to adapt a general-purpose network to accomplish the task. Developing task- specific architectures allows for small, fast networks that approach optimality in performance, but require more effort during the design stage. General-purpose architectures approach optimality in design that merely needs to be adapted via weight modifications to a new problem, but suffer from performance inefficiencies due to unneeded and/or redundant nodes. Our work hypothesizes that task-specific architec- tures that use a building-block approach combined with fine-tuning by training will produce the greatest benefits in the tradeoff between design and performance optimality.  Time Independence vs. Time Dependence. Many neural networks assume that each input vector is independent of other inputs, and the job of the neural network is to extract patterns within the input vector that are sufficient to characterize it. For problems of this type, a network that assumes time independence will provide acceptable performance. However, if the input vectors cannot be assumed to be independent, the network must process the vector with respect to its temporal characteristics. Networks that assume time independence have a variety of well-known training and performance algorithms, but will be unwieldy when applied to a problem in which time independence does not hold. Although temporal characteristics can be converted into levels, there will be a loss of information that may be critical to solving the problem efficiently. 
Networks that assume time dependence have the advantage of being able to handle both time dependent and time independent data, but do not have well known, generally applicable training and performance algorithms. Our approach is to assume time dependence, with the goal of handling a larger range of problems rather than having general training and performance methods.  Hybrid Implementation vs. Analog or Digital Only. The optimality of hardware implementations of neural networks depends in part on the resolution of the second tradeoff mentioned above. Analog devices generally afford faster processing at a lower hardware overhead than digital, whereas digital devices provide noise immunity and a building-block approach to system design. Our work adopts a hybrid approach where the internal computation of the neuron is implemented in analog, and the extracellular communication is performed digitally. This gives the best of both worlds: the speed and low hardware overhead of analog and the noise immunity and building-block nature of digital components. Each of these issues has individual ramifications for neural network design, but optimality of the overall system must be viewed as their composite. Thus, design decisions made in one area will constrain the decisions that can be made in the other areas. Putting Optimality in its Place: Arguments on Context, Systems and Neural Networks Wesley Elsberry, Battelle Research Laboratories Determining the "optimality" of a particular neural network should be an exercise in multivariate analysis. Too often, performance concerning a narrowly defined problem has been accepted as prima facie evidence that some ANN architecture has a specific level of optimality. Taking a cue from the field of genetic algorithms (and the theory of natural selection from which GA's are derived), I offer the observation that optimality is selected in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the system of which it is a part. The context in which the evaluation of optimality is performed will influence the results of that evaluation greatly. While compartmentalized and specialized tests of ANN performance can offer insights, the construction of effective systems may require additional consideration to be given to the assumptions of such tests. Many benchmarks and other tests assume a static problem set, while many real-world applications offer dynamical problems. An ANN which performs "optimally" in a test may perform miserably in a putatively similar real-world application. Recognizing the assumptions which underlie evaluations is important for issues of optimal system design; recognizing the need for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is critical to proper placement of priority given to optimality of ANN's. Identifying a Neural Network's Computational Goals: A Statistical Optimization Perspective Richard M. Golden, University of Texas at Dallas The importance of identifying the computational goal of a neural network computation is first considered from the perspective of Marr's levels of descriptions theory and Simon's theory of satisficing. A "statistical optimization perspective" is proposed as a specific implementation of the more general theories of Marr and Simon. The empirical "testability" of the "statistical optimization perspective" is also considered. 
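A standard textbook illustration of the kind of correspondence Golden's abstract goes on to discuss, namely reading a network error function as a statistical cost function, is the equivalence between squared-error minimization and Gaussian maximum-likelihood estimation; the LaTeX below uses the editor's notation, not Golden's.

% With training pairs (x_i, t_i), network output f(x_i; w), and the assumption
% of i.i.d. Gaussian output noise with fixed variance \sigma^2, the negative
% log-likelihood of the weights is
\[
  -\log L(w) \;=\; \frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(t_i - f(x_i; w)\bigr)^2
  \;+\; \frac{N}{2}\log\!\bigl(2\pi\sigma^2\bigr),
\]
% which differs from the usual sum-of-squares error
\[
  E(w) \;=\; \tfrac{1}{2}\sum_{i=1}^{N}\bigl(t_i - f(x_i; w)\bigr)^2
\]
% only by a positive scale factor and an additive constant, so gradient descent
% on E(w) is maximum-likelihood estimation under this statistical model.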
It is argued that although such a hypothesis is only occasionally empirically testable, such a hypothesis plays a fundamental role in understanding complex information processing systems. The usefulness of the above theoretical framework is then considered with respect to both artificial neural networks and biological neural networks. An argument is made that almost any artificial neural networks may be viewed as optimizing a statistical cost function. To support this claim, the large class of back-propagation feed-forward artificial neural networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed as optimizing specific statistical cost functions. Specific statistical tests for deciding whether the statistical environment of the neural network is "compatible" with the statistical cost function the network is presumably optimizing are also proposed. Next, some ideas regarding the applicability of such analyses to much more complicated artificial neural networks which are "closer approximations" to real biological neural networks will also be discussed. Vector Associative Maps: Self- organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-motor Control Stephen Grossberg, Boston University This talk describes a new class of neural models for unsupervised error-based learning. Such a Vector Associative Map, or VAM, is capable of autonomously calibrating the spatial maps and arm trajecgtory parameters used during visually guided reaching. VAMs illustrate how spatial and motor representations can self-organize using a unified computational format. They clarify how an autonomous agent can build a self-optimizing hierarchy of goal-oriented actions based upon more primitive, endogenously generated exploration of its environment. Computational properties of ART and VAM systems are complementary. This complementarity reflects different processing requirements of sensory- cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be used to design an autonomous behavioral agent. Problem Solving in a Connectionistic World Model Steven Hampson, University of California at Irvine Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S) models of problem solving are central to animal learning theory. When applicable, the procedural S-R and S-E models can be quite space efficient, as they can potentially learn compact generalizations over the functions they are taught to compute. On the other hand, learning these generalizations can be quite time consuming, and adjusting them when conditions change can take as long as learning them in the first place. In contrast, the S-S model developed here does not learn a particular input-to-output mapping, but simply records a series of "facts" about possible state transitions in the world. This declarative world model provides fast learning, easy update and flexible use, but is space expensive. The procedural/declarative distinction observed in biological behavior suggests that both types of mechanisms are available to an organism in its attempts to balance, if not optimize, both time and space requirements. The work presented here investigates the type of problems that are most effectively addressed in an S-S model. 
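The declarative flavor of the S-S model Hampson describes can be suggested with a small Python sketch: transitions are stored as explicit facts (one update per observation, trivially revisable) and problem solving is search over those facts. The data structures and the breadth-first search below are the editor's assumptions, not Hampson's implementation.

# Declarative "stimulus-stimulus" world model: record transition facts, plan by search.
from collections import defaultdict, deque

class TransitionMemory:
    def __init__(self):
        self.next_states = defaultdict(set)    # state -> {(action, next_state)}

    def observe(self, state, action, next_state):
        """Fast 'learning': record one fact; adding or deleting a fact is trivial."""
        self.next_states[state].add((action, next_state))

    def plan(self, start, goal):
        """Flexible use: breadth-first search over the recorded transitions."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, actions = frontier.popleft()
            if state == goal:
                return actions
            for action, nxt in self.next_states[state]:
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, actions + [action]))
        return None    # no known path; storage grows with every fact (space expensive)

# Usage: m = TransitionMemory(); m.observe("A", "push", "B"); m.plan("A", "B") -> ["push"]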
Efficient Optimising Dynamics in a Hopfield-style Network
Arun Jagota, State University of New York at Buffalo

Definition: A set of vertices (nodes, points) of a graph such that every pair is connected by an edge (arc, line) is called a clique. An example in the ANN context is a set of units such that all pairs are mutually excitatory. The following description applies to optimisation issues in any problems that can be modeled with cliques.
Background I: We earlier proposed a variant (essentially a special case) of the Hopfield network which we called the Hopfield-Style Network (HSN). We showed that the stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of a local minimum (stable state) is directly proportional (although not linearly) to the size of the corresponding clique. Any graph can be made the underlying graph of HSN. These three facts suggest that HSN with suitable optimising dynamics can be applied to the CLIQUE (optimisation) problem, namely that of ``Finding the largest clique in any given graph''.
Background II: The CLIQUE problem is NP-Complete, suggesting that it is most likely intractable. Recent results from Computer Science suggest that even approximately solving this problem is probably hard. Researchers have shown that on most (random) graphs, however, it can be approximated fairly well. The CLIQUE problem has many applications, including (1) Content-addressable memories can be modeled as cliques. (2) Constraint-Satisfaction Problems (CSPs) can be represented as the CLIQUE problem. Many problems in AI from Computer Vision, NLP, KR, etc. have been cast as CSPs. (3) Certain object recognition problems in Computer Vision can be modeled as the CLIQUE problem. Given an image object A and a reference object B, one problem is to find a sub-object of B which ``matches'' A. This can be represented as the CLIQUE problem.
Abstract: We will present details of the modeling of optimisation problems related to those described in Background II (and perhaps others) as the CLIQUE problem. We will discuss how HSN can be used to obtain optimal or approximate solutions. In particular, we will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation capabilities, and present theoretical and/or empirical evidence for such. The dynamics are: Discrete: (1) Steepest gradient-descent (2) rho-annealing. Continuous: (3) Mean-field annealing. We will discuss characterising properties of these dynamics, including: (1) emulates a well-known graph algorithm, (2) is suited only for HSN, (3) originates from statistical mechanics and has gained wide attention for its optimisation properties. We will also discuss the continuous Hopfield network dynamics as a special case of (3).

State Generators and Complex Neural Memories
Subhash C. Kak, Louisiana State University

The mechanism of self-indexing for feedback neural networks that generates memories from short subsequences is generalized so that a single bit together with an appropriate update order suffices for each memory. This mechanism can explain how stimulating an appropriate neuron can then recall a memory. Although the information is distributed in this model, our self-indexing mechanism [1] makes it appear localized. Also a new complex-valued neuron model is presented to generalize McCulloch-Pitts neurons. There are aspects to biological memory that are distributed [2] and others that are localized [3].
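Relating to Jagota's Hopfield-style network abstract above: one simple way to make the stable states of asynchronous 0/1 threshold dynamics coincide with the maximal cliques of a graph is sketched below in Python. The specific weights, bias, and update schedule are the editor's illustration of "steepest gradient-descent" on such an energy and may differ from the HSN definition in Jagota's papers.

# Stable states = maximal cliques: +1 weight between adjacent units, a large
# negative weight between non-adjacent units, small positive bias, asynchronous
# threshold updates (each flip strictly lowers the Hopfield energy, so the
# dynamics terminate at a maximal clique, i.e. a local optimum of CLIQUE).

import numpy as np

def clique_descent(adj, seed=None):
    """adj: symmetric 0/1 adjacency matrix, zero diagonal. Returns a maximal clique."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    w = np.where(adj > 0, 1.0, -float(n))   # strong inhibition for non-edges
    np.fill_diagonal(w, 0.0)
    bias = 0.5
    s = rng.integers(0, 2, size=n).astype(float)     # random initial state
    changed = True
    while changed:                                   # sweep until no unit flips
        changed = False
        for i in rng.permutation(n):
            new = 1.0 if w[i] @ s + bias > 0 else 0.0
            if new != s[i]:
                s[i] = new
                changed = True
    return np.flatnonzero(s)    # active units form a maximal clique (locally optimal)

# Usage: adj = np.array([[0,1,1],[1,0,1],[1,1,0]]); clique_descent(adj) -> array([0, 1, 2])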
In the currently popular artificial neural network models the synaptic weights reflect the stored memories, which are thus distributed over the network. The question then arises whether these models can explain Penfield's observations on memory localization. This paper shows that such a memory localization does occur in these models if self-indexing is used. It is also shown how a generalization of the McCulloch-Pitts model of neurons appears essential in order to account for certain aspects of distributed information processing. One particular generalization, described in the paper, allows one to deal with some recent findings of Optican & Richmond [4]. Consider the model of the mind where each mental event corresponds to some neural event. Neurons that deal with mental events may be called cognitive neurons. There would be other neurons that simply compute without cognitive function. Consider now cognitive neurons dealing with sensory input that directly affects their behaviour. We now show that independent cognitive centers will lead to competing behaviour. Even non-competing cognitive centres would show odd behaviour since collective choice is associated with non-transitive logic. This is clear from the ordered choice paradox that occurs for any collection of cognitive individuals. This indicates that a scalar energy function cannot be associated with a neural network that performs logical processing, because if that were possible then all choices made by a network could be defined in an unambiguous hierarchy, with at worst more than one choice having a particular value. The question of cyclicity of choices, as in the ordered choice paradox, will not arise.

References
[1] Kak, S.C. (1990a). Physics Letters A 143, 293.
[2] Lashley, K.S. (1963). Brain Mechanisms and Learning. New York: Dover.
[3] Penfield, W. & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press.
[4] Optican, L.M. & Richmond, B.J. (1987). J. Neurophysiol. 57, 162.

Don't Just Stand There, Optimize Something!
Daniel Levine, University of Texas at Arlington

Perspectives on optimization in a variety of disciplines, including physics, biology, psychology, and economics, are reviewed. The major debate is over whether optimization is a description of nature, a normative prescription, both, or neither. The presenter leans toward a belief that optimization is a normative prescription and not a description of nature. In neural network theory, the attempt to explain all behavior as the optimization of some variable (no matter how tortuously defined the variable is!) has spawned some work that has been seminal to the field. This includes both the "hedonistic neuron" theory of Harry Klopf, which led to some important work in conditioning theory and robotics, and the "dynamic programming" of Paul Werbos, which led to back propagation networks. Yet if all human behavior is optimal, this means that no improvement is possible on wasteful wars, environmental destruction, and unjust social systems. The presenter will review work on the effects of frontal lobe damage, specifically the dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty, and describe these effects as prototypes of non-optimal cognitive function. It can be argued (David Stork, personal communication) that lesion effects do not demonstrate non-optimality because they are the result of system malfunction.
If that is so, then such malfunction is far more pervasive than generally believed and is not dependent on actual brain damage. Architectural principles such as associative learning, lateral inhibition, opponent processing, and resonant feedback, which enable us to interact with a complex environment, also sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980). Even intact frontal lobes do not always perform their executive function (Pribram, 1991) with optimal efficiency.

References
Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.
Pribram, K. (1991). Brain and Perception. Erlbaum.

For What Are Brains Striving?
Gershom-Zvi Rosenstein, Hebrew University

My aim is to outline a possibility of a unified approach to several as yet unsolved problems of behavioral regulation, most of them related to the puzzle of schizophrenia. This Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only recently in the book of the present author [1]. One of the main problems the approach was applied to is the model of behavior disturbances. The income (the value of the goal-function of our model) is defined, by assumption, on the intensities of streams of impulses directed to the reward system. The income can be accumulated and spent on different activities of the model. The choices made by the model depend on the income they are expected to bring. Now the ICA is applied to the following problems: The catecholamine distribution change (CDC) in the schizophrenic brain. I try to prove the idea that CDC is caused by the same augmented (in comparison with the norm) stimulation of the reward system that was proposed by us earlier as a possible cause for the behavioral disturbance. The role of dopamine in the brain's processing of information is discussed. Dopamine is seen as one of the forms of representation of income in the brain. The main difference between the psychology of "normal" and schizophrenic subjects, according to many researchers, is that in schizophrenics "observations prevail over expectations." This property can be shown to be a formal consequence of our model. It was used earlier to describe the behavior of schizophrenics versus normal people in delusion formation (as Scharpantje delusion, etc.). ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. In fact, that hypothesis can be concluded from ICA if some simple and natural assumptions are accepted. A hypothesis about the nature of stereotypes as an adjunctive type of behavior is proposed. They are seen as behaviors concerned not with the direct physiological needs of the organism but with the regulation of activity of its reward system. The proposition can be tested partly in animal experiments. The problem of origination of so-called "positive" and "negative" symptoms in schizophrenia is discussed. The positive symptoms are seen as attempts and sometimes means to produce an additional income for the brain whose external sources of income are severely limited. The negative symptoms are seen as behaviors chosen in the condition whereby the quantity of income that can be used to provide these behaviors is small and cannot be increased. The last part of the presentation is dedicated to the old problem of the relationship between "genius" and schizophrenia. It is a continuation of material introduced in [1].
The remark is made that the phenomenon of uric acid excess, thought by some investigators to be connected to high intellectual achievement, can be related to the uric acid excess found to be produced by augmented stimulation of the reward system in the self-stimulation paradigm.

References
[1] Rosenstein, G.-Z. (1991). Income and Choice in Biological Systems. Lawrence Erlbaum Associates.

Non-optimality in Neurobiological Systems
David Stork, Ricoh California Research Center

I will argue strongly, in two ways, that neurobiological systems are "non-optimal." I note that "optimal" implies a match between some (human) notion of function (or structure, ...) and the implementation itself. My first argument addresses the dubious approach which tries to impose notions of what is being optimized, i.e., stating what the desired function is. For instance, Gabor-function theorists claim that human visual receptive fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the spatial-frequency domains [1]. I demonstrate how such bandwidth notions have an implied measure, or metric, of localization; I examine the implied metric and find little or no justification for preferring it over any of a number of other plausible metrics [2]. I also show that the visual system has an overabundance of visual cortical cells (by a factor of 500) relative to what is implied by the Gabor approach; thus the Gabor approach makes this important fact hard to understand. Then I review arguments of others describing visual receptive fields as being "optimally" tuned to visual gratings [3], and show that here too an implied metric is unjustified [4]. These considerations lead to skepticism of the general approach of imposing or guessing the actual "true" function of neural systems, even in specific mathematical cases. Only in the most compelling cases can the function be stated confidently. My second criticism of the notion of optimality is that even if in such extreme cases the neurobiological function is known, biological systems generally do not implement it in an "optimal" way. I demonstrate this for a non-optimal ("useless") synapse in the crayfish tailflip circuit. Such non-optimality can be well explained by appealing to the process of preadaptation from evolutionary theory [5,6]. If a neural circuit (or organ, or behavior ...) which evolves to solve one problem is later called upon to solve a different problem, then the evolving circuit must be built upon the structure appropriate to the previous task. Thus, for instance, the non-optimal synapse in the crayfish tail flipping circuit can be understood as a holdover from a previous evolutionary epoch in which the circuit was used instead for swimming. In other words, evolutionary processes are gradual, and even if locally optimal (i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. (This is analogous to local minima that plague some gradient-descent methods in mathematics.) Such an analysis highlights the role of evolutionary history in understanding the structure and function of current neurobiological systems, and along with our previous analysis, strongly argues against optimality in neurobiological systems. I therefore concur with the recent statement that in neural systems "... elegance of design counts for little." [7]

References
[1] Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am.
A 2, 1160-1169.
[2] Stork, D. G. & Wilson, H. R. Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7, 1362-1373.
[3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980). Visual cortical neurons: Are bars or gratings the optimal stimuli? Science 207, 88-90.
[4] Stork, D. G. & Levinson, J. Z. (1982). Receptive fields and the optimal stimulus. Science 216, 204-205.
[5] Stork, D. G., Jackson, B., & Walker, S. (1991). "Non-optimality" via preadaptation in simple neural systems. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II. Addison-Wesley and Santa Fe Institute, pp. 409-429.
[6] Stork, D. G. (1992, in press). Preadaptation and principles of organization in organisms. In A. Baskin & J. Mittenthal (Eds.), Principles of Organization in Organisms. Addison-Wesley and Santa Fe Institute.
[7] Dumont, J. P. C. & Robertson, R. M. (1986). Neuronal circuits: An evolutionary perspective. Science 233, 849-853.

Why Do We Study Neural Nets on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing?
Harold Szu, Naval Surface Warfare Center

Neural nets, natural or artificial, are structured by the desire to accomplish certain information processing goals. An example of this occurs in the celebrated exclusive-OR computation. "You are as intelligent as you can hear and see," according to an ancient Chinese saying. Thus, the mismatch between human-created sensor data used for input knowledge representation and the nature-evolved brain-style computing architectures is one of the major impediments for neural net applications. After a review of classical neural nets with fixed layered architectures and small-perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one of the most challenging frontiers: the self-architecturing of neural nets. The singlet and pair correlation functions can be measured to define a hairy neuron model. The minimum set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form a selective network. Then, the convergence proof for self-architecturing hairy neurons will be given. A more powerful tool, however, is the wavelet transform, an adaptive wide-band Fourier analysis developed in 1985 by French oil explorers. This transform goes beyond the (preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a noisy environment. The utility of wavelets in brain-style computing can be recognized from two observations. First, the "cocktail party effect," namely, you hear what you wish to hear, can be explained by the wavelet matched filter, which can achieve a tremendous bandwidth noise reduction. Second, "end-cut" contour filling may be described by Gibbs overshooting in this wavelet manner. In other words, wavelets form a very natural way of describing real scenes and real signals. For this reason, it seems likely that the future of neural net applications may be in learning to do wavelet analyses by self-learning of the "mother wavelet" that is most appropriate for a specific dynamic input-output medium.
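As a small numerical illustration of the wavelet intuition in Szu's abstract, the Python sketch below correlates a noisy signal with a fixed, real Morlet-style mother wavelet at several scales, acting as a bank of matched band-pass filters; learning the mother wavelet itself, which is Szu's actual proposal, is not attempted, and the wavelet form, scales, and test signal are the editor's assumptions.

# Bare-bones wavelet analysis with a fixed Morlet-style mother wavelet.
import numpy as np

def morlet(width, scale, omega0=5.0):
    """Real Morlet-style wavelet sampled on [-width/2, width/2), L2-normalized."""
    t = np.arange(-width // 2, width // 2) / float(scale)
    psi = np.exp(-0.5 * t**2) * np.cos(omega0 * t)
    return psi / np.sqrt(np.sum(psi**2))

def wavelet_coefficients(signal, scales, width=128):
    """Correlate the signal with the wavelet at each scale ('same' alignment)."""
    return np.array([np.convolve(signal, morlet(width, s)[::-1], mode="same")
                     for s in scales])

# Usage: a periodic tone (period ~20 samples) buried in white noise.
rng = np.random.default_rng(0)
t = np.arange(2048)
signal = np.sin(2 * np.pi * t / 20.0) + rng.normal(0.0, 1.0, t.size)
coeffs = wavelet_coefficients(signal, scales=[4, 8, 16, 32])
# Mean squared coefficients should be largest at scale 16, whose pass band
# covers the tone: the matched band-pass filter suppresses most of the noise.
print([float(np.mean(c**2)) for c in coeffs])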
Optimal Generalisation in Artificial Neural Networks Graham Tattersall, University of East Anglia A key property of artificial neural networks is their ability to produce a useful output when presented with an input of previously unseen data even if the network has only been trained on a small set of examples of the input-output function underlying the data. This process is called generalisation and is effectively a form of function completion. ANNs such as the MLP and RBF sometimes appear to work effectively as generalisers on this type of problem, but there is now widespread recognition that the form of generalisation which arises is very dependent on the architecture of the ANN, and is often completely inappropriate, particularly when dealing with symbolic data. This paper will argue that generalisation should be done in such a way that the chosen completion is the most probable and is consistent with the tranining examples. These criteria dictate that the generalisation should not depend in any way upon the architecture or functionality of components of the generalising system, and that the generalisation will depend entirely on the statistics of the training exemplars. A practical method for generalising in accordance with the probability and consistency criteria, is to find the minimum entropy generalisation using the Shannon-Hartley relationship between entropy and spatial bandwidth. The usefulness of this approach can be demonstrated using a number of binary data functions which contain both first and higher order structure. However, this work has shown very clearly that, in the absence of an architecturally imposed generalisation strategy, many function completions are equally possible unless a very large proportion of all possible function domain points are contained in the training set. It therefore appears desirable to design generalising systems such as neural networks so that they generalise, not only in accordance with the optimal generalisation criteria of maximum probability and training set consistency, but also subject to a generalisation strategy which is specified by the user. Two approaches to the imposition of a generalisation strategy are described. In the first method, the characteristic autocorrelation function or functions belonging to a specified family are used as the weight set in a Kosko net. The second method uses Wiener Filtering to remove the "noise" implicit in an incomplete description of a function. The transfer function of the Wiener Filter is specific to a particular generalisation strategy. E-mail Addresses of Presenters Abdi abdi at utdallas.edu Bengio bengio at iro.umontreal.ca Bhaumik netearth!bhaumik at shakti.ernet.in Buhmann jb at s1.gov Candelaria de Ram sylvia at nmsu.edu Carpenter gail at park.bu.edu Chance u0503aa at vms.ucc.okstate.edu DeYong mdeyong at nmsu.edu Elsberry elsberry at hellfire.pnl.gov Golden golden at utdallas.edu Grossberg steve at park.bu.edu Hampson hampson at ics.uci.edu Jagota jagota at cs.buffalo.edu Johnson ecjdj at nve.mcsr.olemiss.edu Kak kak at max.ee.lsu.edu Leven (reach via Pribram, see below) Levine b344dsl at utarlg.uta.edu Ogmen elee52f at jetson.uh.edu Parberry ian at hercule.csci.unt.edu Pribram kpribram at ruacad.runet.edu Prueitt prueitt at guvax.georgetown.edu Rosenstein NONE Stork stork at crc.ricoh.com Szu btelfe at bagheera.nswc.navy.mil Tattersall ? 
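Returning to Tattersall's abstract above: his second method uses Wiener filtering to remove the "noise" implicit in an incomplete description of a function. A generic frequency-domain Wiener filter, H(f) = S(f)/(S(f) + N(f)), is sketched below in Python; the assumed signal and noise spectra, and the reading of an incomplete function description as "signal plus noise," are the editor's simplifications rather than details taken from the abstract.

# Generic frequency-domain Wiener filter applied to a noisy sequence.
import numpy as np

def wiener_filter(noisy, signal_psd, noise_psd):
    """Apply the gain S/(S+N) in the Fourier domain and return the filtered sequence."""
    spectrum = np.fft.rfft(noisy)
    gain = signal_psd / (signal_psd + noise_psd)
    return np.fft.irfft(gain * spectrum, n=len(noisy))

# Usage with an assumed low-pass "function" corrupted by white noise.
rng = np.random.default_rng(1)
n = 512
t = np.arange(n)
clean = np.sin(2 * np.pi * t / 32.0)
noisy = clean + rng.normal(0.0, 0.5, n)
freqs = np.fft.rfftfreq(n)
signal_psd = np.where(freqs < 0.05, 1.0, 1e-3)   # assumed low-pass signal spectrum
noise_psd = np.full_like(freqs, 0.25)            # assumed white-noise power level
restored = wiener_filter(noisy, signal_psd, noise_psd)
# Mean squared error drops after filtering (second number smaller than the first).
print(float(np.mean((noisy - clean) ** 2)), float(np.mean((restored - clean) ** 2)))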
From terry at jeeves.UCSD.EDU Tue Jan 21 04:41:45 1992 From: terry at jeeves.UCSD.EDU (Terry Sejnowski) Date: Tue, 21 Jan 92 01:41:45 PST Subject: Neural Computation 4:1 Message-ID: <9201210941.AA11715@jeeves.UCSD.EDU>

Neural Computation Volume 4, Issue 1, January 1992

Review:
Neural Networks and the Bias/Variance Dilemma
Stuart Geman, Elie Bienenstock, and Rene Doursat

Article:
A Model for the Action of NMDA Conductances in the Visual Cortex
Kevin Fox and Nigel Daw

Letters:
Alternating and Synchronous Rhythms in Reciprocally Inhibitory Model Neurons
Xiao-Jing Wang and John Rinzel

Feature Extraction Using an Unsupervised Neural Network
Nathan Intrator

Speaker-Independent Digit Recognition Using a Neural Network with Time-Delayed Connections
K. P. Unnikrishnan, J. J. Hopfield, and D. W. Tank

Local Feedback Multilayered Networks
Paolo Frasconi, Marco Gori, and Giovanni Soda

Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks
Jurgen Schmidhuber

-----
SUBSCRIPTIONS - VOLUME 4 - BIMONTHLY (6 issues)
______ $40 Student
______ $65 Individual
______ $150 Institution
Add $12 for postage and handling outside USA (+7% for Canada). (Back issues from Volumes 1-3 are available for $17 each.)
MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142. (617) 253-2889.
-----

From golden at utdallas.edu Tue Jan 21 12:55:06 1992 From: golden at utdallas.edu (Richard Golden) Date: Tue, 21 Jan 1992 11:55:06 -0600 Subject: No subject Message-ID: <92Jan21.115518cst.16234@utdallas.edu>

I apologize if this conference announcement was already sent out over the CONNECTIONISTS network.

CONFERENCE ANNOUNCEMENT: OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? AT UNIVERSITY OF TEXAS AT DALLAS

The following NEURAL NETWORK conference will be held at the University of Texas at Dallas February 6-9, 1992. It is sponsored by the Metroplex Institute for Neural Dynamics (MIND), the Texas SIG of the International Neural Network Society (INNS), and the University of Texas at Dallas.
-------------------
The conference will focus upon the following two themes: (1) How can particular neural functions be optimized in a network? (2) Are particular functions performed optimally by biological or artificial neural network architectures?
--------------------
Invited Speakers Include:
Gail Carpenter (Boston University)
Stephen Grossberg (Boston University)
Steven Hampson (U. of California Irvine)
Karl Pribram (Radford University)
David Stork (Ricoh Corporation)
Harold Szu (Naval Surface Warfare Center)
Graham Tattersall (University of East Anglia)
--------------------
The one-hour oral presentations will be non-overlapping.
Location: University of Texas at Dallas; Thursday and Friday at the Conference Center; Saturday at the Green Building Auditorium
Conference Hotel: Richardson Hilton and Towers
Conference Fees:
Student members of MIND or INNS or UTD: $10
Other students: $20
Non-student members of MIND or INNS: $80
Other non-students: $80
----------------------
Contacts:
Professor Dan Levine (University of Texas at Arlington)
Dr. Manuel Aparicio (IBM Research)
Professor Alice O'Toole (University of Texas at Dallas)
-------------------------
A registration form is included at the end of this email message after the tentative schedule. I am sure that "on-site" registration will be available as well but I do not know the details.
--------------------------------
TENTATIVE schedule for Optimality Conference, UT Dallas, Feb. 6-8, 1992
ORAL PRESENTATIONS -- Thursday, Feb. 6, AM: Daniel Levine, U.
of Texas, Arlington -- Don't Just Stand There, Optimize Something! Samuel Leven, Radford U. -- (title to be announced) Mark Deyong, New Mexico State U. -- Properties of Optimality in Neural Networks Wesley Elsberry, Battelle Research Labs -- Putting Optimality inits Place: Argument on Context, Systems, and Neural Networks Graham Tattersall, University of East Anglia -- Optimal Generalisation in Artificial Neural Networks Thursday, Feb. 6, PM: Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model Ian Parberry, University of North Texas -- (title to be announced) Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network Friday, Feb. 7, AM: Gershom Rosenstein, Hebrew University -- For What are Brains Striving? Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of Nonstationary Data by Self-Organizing Neural Networks Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics Friday, Feb. 7, PM: David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a Classical Conditioning Network Samy Bengio, Universit de Montral -- On the Optimization of a Synaptic Learning Rule Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Saturday, Feb. 8, AM: Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive Processes? Herve Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator Paul Prueitt, Georgetown University -- (title to be announced) Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Saturday, Feb. 8, PM: Panel discussion on the basic themes of the conference POSTERS Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by Competitive Neural Networks John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE (alphabetical by first author): Generalization of the Linear Auto-Associator Herve Abdi, Dominique Valentin, and Alice J. O'Toole University of Texas at Dallas The classical auto-associator can be used to model some processes in prototype abstraction. In particular, the eigenvectors of the auto-associative matrix have been interpreted as prototypes or macro-features (Anderson et al, 1977, Abdi, 1988, O'Toole and Abdi, 1989). It has also been noted that computing these eigenvectors is equivalent to performing the principal component analysis of the matrix of objects to be stored in the memory. 
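The equivalence noted at the end of the preceding paragraph, between the eigenvectors of a linear auto-associative memory and the principal component analysis of the stored stimuli, can be checked numerically with a few lines of Python; the simple Hebbian construction of the memory matrix below is the editor's shortcut, and the generalized (biased, differentially weighted) auto-associator of the abstract is not implemented.

# Leading eigenvectors of a linear auto-associative memory built from a set of
# stimuli coincide (up to sign) with the principal axes of the stimulus matrix.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 200))             # 50-dimensional stimuli, 200 of them (columns)

W = X @ X.T / X.shape[1]                   # Hebbian auto-associative memory matrix

eigvals, eigvecs = np.linalg.eigh(W)       # eigenvalues in ascending order
top_memory_axes = eigvecs[:, ::-1][:, :3]  # three leading eigenvectors of the memory

U, s, Vt = np.linalg.svd(X, full_matrices=False)
top_pca_axes = U[:, :3]                    # three leading principal axes of the stimuli

# Each pair of corresponding axes agrees up to sign: the overlaps print as 1.0.
for k in range(3):
    overlap = abs(float(top_memory_axes[:, k] @ top_pca_axes[:, k]))
    print(round(overlap, 6))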
This paper describes a generalization of the linear auto-associator in which units (i.e., cells) can be of differential importance, or can be non-independent, or can have a bias. The stimuli to be stored in the memory can also have a differential importance (or can be non-independent). The constraints expressing response bias and differential importance of stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule is applied to the weight matrix in a generalized form which takes the bias and the differential importance constraints into account to compute the error. Conditions for the convergence of the learning rule are examined, and convergence is shown to depend only on the ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix. The maximal responses of the memory correspond to generalized eigenvectors; these vectors are biased-orthogonal (i.e., they are orthogonal after the response bias is implemented). It is also shown that (with an appropriate choice of matrices for response bias and differential importance) the generalized auto-associator is able to implement the general linear model of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical correlation analysis, generalized principal component analysis, etc.). Applications and Monte Carlo simulations of the generalized auto-associator dealing with face processing will be presented and discussed.

On the Optimization of a Synaptic Learning Rule
Samy Bengio, Université de Montréal
Yoshua Bengio, Massachusetts Institute of Technology
Jocelyn Cloutier, Université de Montréal
Jan Gecsei, Université de Montréal

This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has local inputs, and is the same for many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, a local optimization method (gradient descent) as well as a global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments on classical conditioning in Aplysia yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimensional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Initial experiments can be found in [1, 2].

References
[1] Bengio, Y. & Bengio, S. (1990). Learning a synaptic learning rule. Technical Report #751, Computer Science Department, Université de Montréal.
[2] Bengio, Y., Bengio, S., & Cloutier, J. (1991).
Learning a synaptic learning rule. IJCNN-91-Seattle.

Complexity Optimized Data Clustering by Competitive Neural Networks
Joachim Buhmann, Lawrence Livermore National Laboratory
Hans Kühnel, Technische Universität München

Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their occupation probabilities. An iterative version of complexity optimized clustering is implemented by an artificial neural network with winner-take-all connectivity. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy-constrained vector quantization, or topological feature maps and competitive neural networks.

Interactive Sub-systems of Natural Language and the Treatment of Specialized Function
Sylvia Candelaria de Ram, New Mexico State University

Context-sensitivity and rapidity of communication are two things that become ecological essentials as cognition advances. They become ``optimals'' as cognition develops into something elaborate, long-lasting, flexible, and social. For successful operation of language's default speech/gesture mode, articulation response must be rapid and context-sensitive. It does not follow that all linguistic cognitive function will or can be equally fast or re-equilibrating. But it may follow that articulation response mechanisms are specialized in different ways than those for other cognitive functions. The special properties of the varied mechanisms would then interact in language use. In actuality, our own architecture is of this sort [1,2,3,4]. Major formative effects on our language, society, and individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift (and adaptability) and hypercorrection effects (mitigated by emotion), so that we have multitudes of distinct but related languages and speech communities. Consider modelling the variety of co-functioning mechanisms for utterance and gesture articulation, interpretation, monitoring and selection. Wherein lies the source of the differing function in one and another mechanism? Suppose [parts of] mechanisms are treated as parts of a multi-layered, temporally parallel, staged architecture (like ours). The layers may be inter-connected selectively [6]. Any given portion may be designed to deal with particular sorts of excitation [7,8,9]. A multi-level belief/knowledge logic enhanced for such structures [10] has properties extraordinary for a logic, properties which point up some critical features of ``neural nets'' having optimization properties pertinent to intelligent, interactive systems.

References
[1] Candelaria de Ram, S. (1984). Genesis of the mechanism for sound change as suggested by auditory reflexes. Linguistic Association of the Southwest, El Paso.
[2] Candelaria de Ram, S. (1988). Neural feedback and causation of evolving speech styles. New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherches mathématiques, Montreal, October.
[3] Candelaria de Ram, S. (1989).
Sociolinguistic style shift and recent evidence on `prese- mantic' loci of attention to fine acoustic difference. New Ways of Analyzing Language Variation joint with American Dialect Society (NWAV-XVIII/ ADSC), Durham, NC, October. [4] Candelaria de Ram, S. (1991b). Language processing: mental access and sublanguages. Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept. 1991. [5] Candelaria de Ram, S. (1990b). The sensory basis of mind: feasibility and functionality of a phonetic sensory store. [Commentary on R. Ntnen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236. [6] Candelaria de Ram, S. (1990c). Sensors & concepts: Grounded cognition. Working Session on Algebraic Approaches to Problem Solving and Representation, June 27-29, Briarcliff, NY. [7] Candelaria de Ram, S. (1990a). Belief/knowledge dependency graphs with sensory groundings. Third Int. Symp. on Artificial Intelligence Applications of Engineering Design and Manufacturing in Industrialized and Developing Countries, Monterrey, Mexico, Oct. 22-26, pp. 103-110. [8] Candelaria de Ram, S. (1991a). From sensors to concepts: Pragmasemantic system constructivity. Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91, Sophia-Antipolis, France, April 22-24. Also in Knowledge Modeling and Expertise Transfer, IOS Publishing, Paris, 1991, pp. 433-448. [9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989). Reasoning using inheritance from a mixture of knowledge and beliefs. In S. Ramani, R. Chandrasekar, & K.S.R. Anjaneylu (Eds.), Knowledge Based Computer Systems. Delhi: Narosa, pp. 387-396; republished by Vedams Books International, New Delhi, 1990. Also in Lecture Notes in Computer Science series No. 444, Springer-Verlag, 1990. [10] Candelaria de Ram, S. (1991c). Why to enter into dialogue is to come out with changed speech: Cross-linked modalities, emotion, and language. Second Invitational Venaco Workshop and European Speech Communication Association Tutorial and Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept. 16-20, 1991. Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning Gail Carpenter, Boston University A neural network architecture for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors will be described. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, response, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator () and the MAX operator () of fuzzy logic play complementary roles. 
Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights corresponds to increasing sizes of category "boxes." Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; (iv) a letter recognition database; and (v) a medical database. Properties of Optimality in Neural Networks Mark DeYong and Thomas Eskridge, New Mexico State University This presentation discusses issues concerning optimality in neural and cognitive functioning. We discuss these issues in terms of the tradeoffs they impose on the design of neural network systems. We illustrate the issues with example systems based on a novel VLSI neural processing element developed, fabricated, and tested by the first author. There are four general issues of interest:  Biological Realism vs. Computational Power. Many implementations of neurons sacrifice computational power for biological realism. Biological realism imposes a set of constraints on the structure and timing of certain operations in the neuron. Taken as an absolute requirement, these constraints, though realistic, reduce the computational power of individual neurons, and of systems built on those neurons. However, to ignore the biological characteristics of neurons is to ignore the best example of the type of device we are trying to implement. In particular, simple non-biologically inspired neurons perform a completely different style of processing than biologically inspired ones. Our work allows for biological realism in areas where it increases computational power, while ignoring the low-level details that are simply by-products of organic systems.  Task-Specific Architecture vs. Uniform Element, Massive Parallelism. A central issue in developing neural network systems is whether to design networks specific to a particular task or to adapt a general-purpose network to accomplish the task. Developing task- specific architectures allows for small, fast networks that approach optimality in performance, but require more effort during the design stage. General-purpose architectures approach optimality in design that merely needs to be adapted via weight modifications to a new problem, but suffer from performance inefficiencies due to unneeded and/or redundant nodes. Our work hypothesizes that task-specific architec- tures that use a building-block approach combined with fine-tuning by training will produce the greatest benefits in the tradeoff between design and performance optimality.  Time Independence vs. Time Dependence. Many neural networks assume that each input vector is independent of other inputs, and the job of the neural network is to extract patterns within the input vector that are sufficient to characterize it. For problems of this type, a network that assumes time independence will provide acceptable performance. 
However, if the input vectors cannot be assumed to be independent, the network must process the vector with respect to its temporal characteristics. Networks that assume time independence have a variety of well-known training and performance algorithms, but will be unwieldy when applied to a problem in which time independence does not hold. Although temporal characteristics can be converted into levels, there will be a loss of information that may be critical to solving the problem efficiently. Networks that assume time dependence have the advantage of being able to handle both time dependent and time independent data, but do not have well known, generally applicable training and performance algorithms. Our approach is to assume time dependence, with the goal of handling a larger range of problems rather than having general training and performance methods.  Hybrid Implementation vs. Analog or Digital Only. The optimality of hardware implementations of neural networks depends in part on the resolution of the second tradeoff mentioned above. Analog devices generally afford faster processing at a lower hardware overhead than digital, whereas digital devices provide noise immunity and a building-block approach to system design. Our work adopts a hybrid approach where the internal computation of the neuron is implemented in analog, and the extracellular communication is performed digitally. This gives the best of both worlds: the speed and low hardware overhead of analog and the noise immunity and building-block nature of digital components. Each of these issues has individual ramifications for neural network design, but optimality of the overall system must be viewed as their composite. Thus, design decisions made in one area will constrain the decisions that can be made in the other areas. Putting Optimality in its Place: Arguments on Context, Systems and Neural Networks Wesley Elsberry, Battelle Research Laboratories Determining the "optimality" of a particular neural network should be an exercise in multivariate analysis. Too often, performance concerning a narrowly defined problem has been accepted as prima facie evidence that some ANN architecture has a specific level of optimality. Taking a cue from the field of genetic algorithms (and the theory of natural selection from which GA's are derived), I offer the observation that optimality is selected in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the system of which it is a part. The context in which the evaluation of optimality is performed will influence the results of that evaluation greatly. While compartmentalized and specialized tests of ANN performance can offer insights, the construction of effective systems may require additional consideration to be given to the assumptions of such tests. Many benchmarks and other tests assume a static problem set, while many real-world applications offer dynamical problems. An ANN which performs "optimally" in a test may perform miserably in a putatively similar real-world application. Recognizing the assumptions which underlie evaluations is important for issues of optimal system design; recognizing the need for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is critical to proper placement of priority given to optimality of ANN's. Identifying a Neural Network's Computational Goals: A Statistical Optimization Perspective Richard M. 
Golden, University of Texas at Dallas The importance of identifying the computational goal of a neural network computation is first considered from the perspective of Marr's levels of descriptions theory and Simon's theory of satisficing. A "statistical optimization perspective" is proposed as a specific implementation of the more general theories of Marr and Simon. The empirical "testability" of the "statistical optimization perspective" is also considered. It is argued that although such a hypothesis is only occasionally empirically testable, such a hypothesis plays a fundamental role in understanding complex information processing systems. The usefulness of the above theoretical framework is then considered with respect to both artificial neural networks and biological neural networks. An argument is made that almost any artificial neural networks may be viewed as optimizing a statistical cost function. To support this claim, the large class of back-propagation feed-forward artificial neural networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed as optimizing specific statistical cost functions. Specific statistical tests for deciding whether the statistical environment of the neural network is "compatible" with the statistical cost function the network is presumably optimizing are also proposed. Next, some ideas regarding the applicability of such analyses to much more complicated artificial neural networks which are "closer approximations" to real biological neural networks will also be discussed. Vector Associative Maps: Self- organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-motor Control Stephen Grossberg, Boston University This talk describes a new class of neural models for unsupervised error-based learning. Such a Vector Associative Map, or VAM, is capable of autonomously calibrating the spatial maps and arm trajecgtory parameters used during visually guided reaching. VAMs illustrate how spatial and motor representations can self-organize using a unified computational format. They clarify how an autonomous agent can build a self-optimizing hierarchy of goal-oriented actions based upon more primitive, endogenously generated exploration of its environment. Computational properties of ART and VAM systems are complementary. This complementarity reflects different processing requirements of sensory- cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be used to design an autonomous behavioral agent. Problem Solving in a Connectionistic World Model Steven Hampson, University of California at Irvine Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S) models of problem solving are central to animal learning theory. When applicable, the procedural S-R and S-E models can be quite space efficient, as they can potentially learn compact generalizations over the functions they are taught to compute. On the other hand, learning these generalizations can be quite time consuming, and adjusting them when conditions change can take as long as learning them in the first place. In contrast, the S-S model developed here does not learn a particular input-to-output mapping, but simply records a series of "facts" about possible state transitions in the world. This declarative world model provides fast learning, easy update and flexible use, but is space expensive. 
The procedural/declarative distinction observed in biological behavior suggests that both types of mechanisms are available to an organism in its attempts to balance, if not optimize, both time and space requirements. The work presented here investigates the type of problems that are most effectively addressed in an S-S model. Efficient Optimising Dynamics in a Hopfield-style Network Arun Jagota, State University of New York at Buffalo Definition: A set of vertices (nodes, points) of a graph such that every pair is connected by an edge (arc, line) is called a clique. An example in the ANN context is a set of units such that all pairs are mutually excitatory. The following description applies to optimisation issues in any problems that can be modeled with cliques. Background I: We earlier proposed a variant (essentially a special case) of the Hopfield network which we called the Hopfield-Style Network (HSN). We showed that the stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of a local minimum (stable state) is directly proportional (although not linearly) to the size of the corresponding clique. Any graph can be made the underlying graph of HSN. These three facts suggest that HSN with suitable optimising dynamics can be applied to the CLIQUE (optimisation) problem, namely that of ``Finding the largest clique in any given graph''. Background II: The CLIQUE problem is NP-Complete, suggesting that it is most likely intractable. Recent results from Computer Science suggest that even approximately solving this problem is probably hard. Researchers have shown that on most (random) graphs, however, it can be approximated fairly well. The CLIQUE problem has many applications, including (1) Content-addressable memories can be modeled as cliques. (2) Constraint-Satisfaction Problems (CSPs) can be represented as the CLIQUE problem. Many problems in AI from Computer Vision, NLP, KR, etc., have been cast as CSPs. (3) Certain object recognition problems in Computer Vision can be modeled as the CLIQUE problem. Given an image object A and a reference object B, one problem is to find a sub-object of B which ``matches'' A. This can be represented as the CLIQUE problem. Abstract: We will present details of the modeling of optimisation problems related to those described in Background II (and perhaps others) as the CLIQUE problem. We will discuss how HSN can be used to obtain optimal or approximate solutions. In particular, we will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation capabilities, and present theoretical and/or empirical evidence for such. The dynamics are: Discrete: (1) Steepest gradient-descent; (2) rho-annealing. Continuous: (3) Mean-field annealing. We will discuss characterising properties of these dynamics, including: (1) emulates a well-known graph algorithm; (2) is suited only for HSN; (3) originates from statistical mechanics and has gained wide attention for its optimisation properties. We will also discuss the continuous Hopfield network dynamics as a special case of (3). State Generators and Complex Neural Memories Subhash C. Kak, Louisiana State University The mechanism of self-indexing for feedback neural networks that generates memories from short subsequences is generalized so that a single bit together with an appropriate update order suffices for each memory. This mechanism can explain how stimulating an appropriate neuron can then recall a memory.
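Both of the preceding abstracts build on Hopfield-style feedback dynamics. As a generic point of reference only (this is the textbook binary Hopfield network, not Jagota's HSN construction and not Kak's self-indexing scheme), the sketch below stores two patterns with a Hebbian outer-product rule and shows that the asynchronous update order can decide which stored memory an ambiguous cue settles into.

    import numpy as np

    def hebbian_weights(patterns):
        # Outer-product storage; zero diagonal, as usual for binary (+/-1) Hopfield nets.
        P = np.array(patterns, dtype=float)
        W = P.T @ P
        np.fill_diagonal(W, 0.0)
        return W

    def recall(W, state, order, sweeps=5):
        s = np.array(state, dtype=float)
        for _ in range(sweeps):
            for i in order:                      # asynchronous updates in the given order
                h = W[i] @ s
                s[i] = 1.0 if h >= 0 else -1.0   # threshold unit
        return s

    A = [1,  1, 1, -1, -1, -1]
    B = [1, -1, 1, -1,  1, -1]
    W = hebbian_weights([A, B])

    cue = [1, 1, 1, -1, 1, -1]   # one bit away from A and one bit away from B
    print(recall(W, cue, order=[1, 4, 0, 2, 3, 5]))  # settles to pattern B here
    print(recall(W, cue, order=[4, 1, 0, 2, 3, 5]))  # settles to pattern A here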
Although the information is distributed in this model, our self-indexing mechanism [1] makes it appear localized. A new complex-valued neuron model is also presented to generalize McCulloch-Pitts neurons. There are aspects to biological memory that are distributed [2] and others that are localized [3]. In the currently popular artificial neural network models the synaptic weights reflect the stored memories, which are thus distributed over the network. The question then arises whether these models can explain Penfield's observations on memory localization. This paper shows that such memory localization does occur in these models if self-indexing is used. It is also shown how a generalization of the McCulloch-Pitts model of neurons appears essential in order to account for certain aspects of distributed information processing. One particular generalization, described in the paper, allows one to deal with some recent findings of Optican & Richmond [4]. Consider the model of the mind where each mental event corresponds to some neural event. Neurons that deal with mental events may be called cognitive neurons. There would be other neurons that simply compute without cognitive function. Consider now cognitive neurons dealing with sensory input that directly affects their behaviour. We now show that independent cognitive centers will lead to competing behaviour. Even non-competing cognitive centres would show odd behaviour since collective choice is associated with non-transitive logic. This is clear from the ordered choice paradox that occurs for any collection of cognitive individuals. This indicates that a scalar energy function cannot be associated with a neural network that performs logical processing. For if that were possible, then all choices made by a network could be defined in an unambiguous hierarchy, with at worst more than one choice having a particular value. The question of cyclicity of choices, as in the ordered choice paradox, would not arise. References [1] Kak, S.C. (1990a). Physics Letters A 143, 293. [2] Lashley, K.S. (1963). Brain Mechanisms and Learning. New York: Dover. [3] Penfield, W. & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press. [4] Optican, L.M. & Richmond, B.J. (1987). J. Neurophysiol. 57, 162. Don't Just Stand There, Optimize Something! Daniel Levine, University of Texas at Arlington Perspectives on optimization in a variety of disciplines, including physics, biology, psychology, and economics, are reviewed. The major debate is over whether optimization is a description of nature, a normative prescription, both, or neither. The presenter leans toward a belief that optimization is a normative prescription and not a description of nature. In neural network theory, the attempt to explain all behavior as the optimization of some variable (no matter how tortuously defined the variable is!) has spawned some work that has been seminal to the field. This includes both the "hedonistic neuron" theory of Harry Klopf, which led to some important work in conditioning theory and robotics, and the "dynamic programming" of Paul Werbos, which led to back-propagation networks. Yet if all human behavior is optimal, this means that no improvement is possible on wasteful wars, environmental destruction, and unjust social systems.
The presenter will review work on the effects of frontal lobe damage, specifically the dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty, and describe these effects as prototypes of non-optimal cognitive function. It can be argued (David Stork, personal communication) that lesion effects do not demonstrate non-optimality because they are the result of system malfunction. If that is so, then such malfunction is far more pervasive than generally believed and is not dependent on actual brain damage. Architectural principles such as associative learning, lateral inhibition, opponent processing, and resonant feedback, which enable us to interact with a complex environment, also sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980). Even intact frontal lobes do not always perform their executive function (Pribram, 1991) with optimal efficiency. References Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press. Pribram, K. (1991). Brain and Perception. Erlbaum. For What Are Brains Striving? Gershom-Zvi Rosenstein, Hebrew University My aim is to outline a possible unified approach to several as yet unsolved problems of behavioral regulation, most of them related to the puzzle of schizophrenia. This Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only recently in the book of the present author [1]. One of the main problems the approach was applied to is the model of behavior disturbances. The income (the value of the goal-function of our model) is defined, by assumption, on the intensities of streams of impulses directed to the reward system. The income can be accumulated and spent on different activities of the model. The choices made by the model depend on the income they are expected to bring. Now the ICA is applied to the following problems: The catecholamine distribution change (CDC) in the schizophrenic brain. I try to prove the idea that CDC is caused by the same augmented (in comparison with the norm) stimulation of the reward system that was proposed by us earlier as a possible cause for the behavioral disturbance. The role of dopamine in the brain processing of information is discussed. Dopamine is seen as one of the forms of representation of income in the brain. The main difference between the psychology of "normal" and schizophrenic subjects, according to many researchers, is that in schizophrenics "observations prevail over expectations." This property can be shown to be a formal consequence of our model. It was used earlier to describe the behavior of schizophrenics versus normal people in delusion formation (such as the Scharpantje delusion, etc.). ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. In fact, that hypothesis can be concluded from ICA if some simple and natural assumptions are accepted. A hypothesis about the nature of stereotypes as an adjunctive type of behavior is proposed. They are seen as behaviors concerned not with the direct physiological needs of the organism but with the regulation of activity of its reward system. The proposition can be tested partly in animal experiments. The problem of origination of so-called "positive" and "negative" symptoms in schizophrenia is discussed. The positive symptoms are seen as attempts and sometimes means to produce an additional income for the brain whose external sources of income are severely limited.
The negative symptoms are seen as behaviors chosen in conditions where the quantity of income that can be used to provide these behaviors is small and cannot be increased. The last part of the presentation is dedicated to the old problem of the relationship between "genius" and schizophrenia. It is a continuation of material introduced in [1]. The remark is made that the phenomenon of uric acid excess, thought by some investigators to be connected to high intellectual achievement, can be related to the uric acid excess found to be produced by augmented stimulation of the reward system in the self-stimulation paradigm. References [1] Rosenstein, G.-Z. (1991). Income and Choice in Biological Systems. Lawrence Erlbaum Associates. Non-optimality in Neurobiological Systems David Stork, Ricoh California Research Center I will argue strongly, in two ways, that neurobiological systems are "non-optimal." I note that "optimal" implies a match between some (human) notion of function (or structure,...) and the implementation itself. My first argument addresses the dubious approach which tries to impose notions of what is being optimized, i.e., stating what the desired function is. For instance, Gabor-function theorists claim that human visual receptive fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the spatial-frequency domains [1]. I demonstrate how such bandwidth notions have an implied measure, or metric, of localization; I examine the implied metric and find little or no justification for preferring it over any of a number of other plausible metrics [2]. I also show that the visual system has an overabundance of visual cortical cells (by a factor of 500) relative to what is implied by the Gabor approach; thus the Gabor approach makes this important fact hard to understand. Then I review arguments of others describing visual receptive fields as being "optimally" tuned to visual gratings [3], and show that here too an implied metric is unjustified [4]. These considerations lead to skepticism of the general approach of imposing or guessing the actual "true" function of neural systems, even in specific mathematical cases. Only in the most compelling cases can the function be stated confidently. My second criticism of the notion of optimality is that even if in such extreme cases the neurobiological function is known, biological systems generally do not implement it in an "optimal" way. I demonstrate this for a non-optimal ("useless") synapse in the crayfish tailflip circuit. Such non-optimality can be well explained by appealing to the process of preadaptation from evolutionary theory [5,6]. If a neural circuit (or organ, or behavior ...) which evolves to solve one problem is later called upon to solve a different problem, then the evolving circuit must be built upon the structure appropriate to the previous task. Thus, for instance, the non-optimal synapse in the crayfish tail flipping circuit can be understood as a holdover from a previous evolutionary epoch in which the circuit was used instead for swimming. In other words, evolutionary processes are gradual, and even if locally optimal (i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. (This is analogous to local minima that plague some gradient-descent methods in mathematics.)
Such an analysis highlights the role of evolutionary history in understanding the structure and function of current neurobiological systems, and along with our previous analysis, strongly argues against optimality in neurobiological systems. I therefore concur with the recent statement that in neural systems "... elegance of design counts for little." [7] References [1] Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2, 1160-1169. [2] Stork, D. G. & Wilson, H. R. (1990). Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7, 1362-1373. [3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980). Visual cortical neurons: Are bars or gratings the optimal stimuli? Science 207, 88-90. [4] Stork, D. G. & Levinson, J. Z. (1982). Receptive fields and the optimal stimulus. Science 216, 204-205. [5] Stork, D. G., Jackson, B., & Walker, S. (1991). "Non-optimality" via preadaptation in simple neural systems. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II. Addison-Wesley and Santa Fe Institute, pp. 409-429. [6] Stork, D. G. (1992, in press). Preadaptation and principles of organization in organisms. In A. Baskin & J. Mittenthal (Eds.), Principles of Organization in Organisms. Addison-Wesley and Santa Fe Institute. [7] Dumont, J. P. C. & Robertson, R. M. (1986). Neuronal circuits: An evolutionary perspective. Science 233, 849-853. Why Do We Study Neural Nets on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Harold Szu, Naval Surface Warfare Center Neural nets, natural or artificial, are structured by the desire to accomplish certain information processing goals. An example of this occurs in the celebrated exclusive-OR computation. "You are as intelligent as you can hear and see," according to an ancient Chinese saying. Thus, the mismatch between human-created sensor data used for input knowledge representation and the nature-evolved brain-style computing architectures is one of the major impediments to neural net applications. After a review of classical neural nets with fixed layered architectures and small-perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one of the most challenging frontiers: the self-architecturing of neural nets. The singlet and pair correlation functions can be measured to define a hairy neuron model. The minimum set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form a selective network. Then, the convergence proof for self-architecturing hairy neurons will be given. A more powerful tool, however, is the wavelet transform, an adaptive wide-band Fourier analysis developed in 1985 by French oil explorers. This transform goes beyond the (preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a noisy environment. The utility of wavelets in brain-style computing can be recognized from two observations. First, the "cocktail party effect," namely, you hear what you wish to hear, can be explained by the wavelet matched filter which can achieve a tremendous bandwidth noise reduction. Second, "end-cut" contour filling may be described by Gibbs overshooting in this wavelet manner.
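For readers who have not met the wavelet transform, a minimal numerical sketch (a generic continuous wavelet transform with a Ricker "Mexican hat" mother wavelet, not the C.O.N. construction referred to above) is just a bank of scaled copies of one mother wavelet correlated against the signal; matched filtering against such a dictionary is what underlies the bandwidth and noise-reduction argument.

    import numpy as np

    def ricker(width, length):
        # "Mexican hat" mother wavelet (second derivative of a Gaussian), one common choice.
        t = np.arange(length) - (length - 1) / 2.0
        x = t / width
        return (1.0 - x**2) * np.exp(-x**2 / 2.0)

    def wavelet_transform(signal, widths, length=101):
        # Correlate the signal with scaled copies of the mother wavelet: one row per scale.
        rows = [np.convolve(signal, ricker(w, length), mode="same") for w in widths]
        return np.array(rows)

    # A noisy chirp-like test signal: the transform concentrates energy at the scale and
    # shift that best "match" the local oscillation, which is the matched-filter intuition.
    t = np.linspace(0, 1, 512)
    signal = np.sin(2 * np.pi * (5 + 20 * t) * t) + 0.3 * np.random.randn(t.size)
    coeffs = wavelet_transform(signal, widths=[2, 4, 8, 16, 32])
    print(coeffs.shape)          # (5, 512): five scales by 512 time positions
    print(np.abs(coeffs).max())  # the largest response picks out the best-matching scale/shift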
In other words, wavelets form a very natural way of describing real scenes and real signals. For this reason, it seems likely that the future of neural net applications may be in learning to do wavelet analyses by a self-learning of the "mother wavelet" that is most appropriate for a specific dynamic input-output medium. Optimal Generalisation in Artificial Neural Networks Graham Tattersall, University of East Anglia A key property of artificial neural networks is their ability to produce a useful output when presented with an input of previously unseen data even if the network has only been trained on a small set of examples of the input-output function underlying the data. This process is called generalisation and is effectively a form of function completion. ANNs such as the MLP and RBF sometimes appear to work effectively as generalisers on this type of problem, but there is now widespread recognition that the form of generalisation which arises is very dependent on the architecture of the ANN, and is often completely inappropriate, particularly when dealing with symbolic data. This paper will argue that generalisation should be done in such a way that the chosen completion is the most probable and is consistent with the training examples. These criteria dictate that the generalisation should not depend in any way upon the architecture or functionality of components of the generalising system, and that the generalisation will depend entirely on the statistics of the training exemplars. A practical method for generalising in accordance with the probability and consistency criteria is to find the minimum entropy generalisation using the Shannon-Hartley relationship between entropy and spatial bandwidth. The usefulness of this approach can be demonstrated using a number of binary data functions which contain both first and higher order structure. However, this work has shown very clearly that, in the absence of an architecturally imposed generalisation strategy, many function completions are equally possible unless a very large proportion of all possible function domain points are contained in the training set. It therefore appears desirable to design generalising systems such as neural networks so that they generalise, not only in accordance with the optimal generalisation criteria of maximum probability and training set consistency, but also subject to a generalisation strategy which is specified by the user. Two approaches to the imposition of a generalisation strategy are described. In the first method, the characteristic autocorrelation function or functions belonging to a specified family are used as the weight set in a Kosko net. The second method uses Wiener Filtering to remove the "noise" implicit in an incomplete description of a function. The transfer function of the Wiener Filter is specific to a particular generalisation strategy.
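As a rough illustration of the second method (a generic frequency-domain Wiener filter with made-up spectra, not anything derived from Tattersall's generalisation strategies), the filter attenuates each spatial frequency according to an assumed signal-to-noise ratio; applied to an incompletely specified function, the "noise" being suppressed is the uncertainty introduced by the missing samples. The smoothness-favouring SNR profile below is invented for the example.

    import numpy as np

    def wiener_smooth(samples, snr):
        """Frequency-domain Wiener filter: H(f) = S(f) / (S(f) + N(f)), parameterised here
        by an assumed per-frequency signal-to-noise ratio snr(f), so H = snr / (snr + 1)."""
        F = np.fft.rfft(samples)
        H = snr / (snr + 1.0)
        return np.fft.irfft(F * H, n=samples.size)

    # Incomplete description of a smooth binary-valued function on 64 points:
    # unknown points are filled with the mean value (0.5) before filtering.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
    truth = (np.sin(x) > 0).astype(float)
    observed = truth.copy()
    missing = rng.choice(64, size=32, replace=False)
    observed[missing] = 0.5

    # Assume most signal power sits at low spatial frequencies (a "smoothness" strategy).
    freqs = np.arange(33)
    snr = 100.0 / (1.0 + freqs**2)
    completed = wiener_smooth(observed, snr)
    print(np.round(completed[:8], 2))  # low-pass completion of the partially specified function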
E-mail Addresses of Presenters

Abdi                 abdi at utdallas.edu
Bengio               bengio at iro.umontreal.ca
Bhaumik              netearth!bhaumik at shakti.ernet.in
Buhmann              jb at s1.gov
Candelaria de Ram    sylvia at nmsu.edu
Carpenter            gail at park.bu.edu
Chance               u0503aa at vms.ucc.okstate.edu
DeYong               mdeyong at nmsu.edu
Elsberry             elsberry at hellfire.pnl.gov
Golden               golden at utdallas.edu
Grossberg            steve at park.bu.edu
Hampson              hampson at ics.uci.edu
Jagota               jagota at cs.buffalo.edu
Johnson              ecjdj at nve.mcsr.olemiss.edu
Kak                  kak at max.ee.lsu.edu
Leven                (reach via Pribram, see below)
Levine               b344dsl at utarlg.uta.edu
Ogmen                elee52f at jetson.uh.edu
Parberry             ian at hercule.csci.unt.edu
Pribram              kpribram at ruacad.runet.edu
Prueitt              prueitt at guvax.georgetown.edu
Rosenstein           NONE
Stork                stork at crc.ricoh.com
Szu                  btelfe at bagheera.nswc.navy.mil
Tattersall           ?

---------------------------------------------------
Registration for conference on Optimality in Biological and Artificial Networks.

Name (last, first, middle) _______________________________________________________________
Mailing Address ____________________________________________
                ____________________________________________
                ____________________________________________
                ____________________________________________
Affiliation ____________________________________________
Telephone ____________________________________________
Email (if any) ____________________________________________

Registration fee (please enclose check payable to MIND). Registration Fees:
Student members of MIND or INNS or UTD: $10
Other students: $20
Non-student members of MIND or INNS: $80
Other non-students: $80
Registration Fee Type (e.g., student member, non-student, etc.):
---------------------------------------
Amount of check _______________________________________

Hotel: Do you need a registration card (rooms are $59/night)? _________________
Do you wish to share a room? _____________________
--------------------------------------
Reduced fares are available to Dallas-Fort Worth on American Airlines. Call the airline and ask for Starfile S14227D, under the name of MIND. Preregistrants whose forms and payment checks are received by Jan. 31 will be mailed a preregistration package with a confirmation. This will include a complete schedule with times of presentations and directions to sites.
---------------------------------------
Please send registration form to:
Professor Daniel S. Levine
Department of Mathematics, Box 19408
University of Texas at Arlington
Arlington, TX 76019-0408
Office: 817-273-3598
FAX: 817-794-5802
email: b344dsl at utarlg.uta.edu

From uh311ae at sunmanager.lrz-muenchen.de Tue Jan 21 16:00:28 1992 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 21 Jan 92 22:00:28+0100 Subject: Bengio's paper on 'learning learning rules' Message-ID: <9201212100.AA01632@sunmanager.lrz-muenchen.de> I just ftp'ed and read the above paper, which is available on the neuroprose server. Essentially, it proposes to use an optimizer (like a gradient descent method or a genetic algorithm) to optimize very global network parameters, not least the learning rule itself. The latter might be accomplished by, e.g., having a GA switch individual modules, which get combined into a heterogeneous rule, on and off. Unfortunately, the paper does not include any simulation data to support this idea. I did some experiments last year which might be of interest, because they do support Bengio's predictions.
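To make the shape of such an experiment concrete, here is a deliberately tiny sketch of this kind of outer-loop search: a GA whose genome holds two global learning parameters (a learning rate and a weight-initialization range for a small backprop network), with fitness taken as the error after a short training run, averaged over several random seeds. It is only an illustration of the setup, not Bengio's proposal nor the hybrid-rule study described below; all numbers and names are invented.

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0., 1., 1., 0.])          # XOR targets (a standard toy problem)

    def fitness(genome, seeds=5, epochs=200):
        lr, init_range = genome
        errs = []
        for seed in range(seeds):            # average several runs: single runs are too noisy
            r = np.random.default_rng(seed)
            W1 = r.uniform(-init_range, init_range, (2, 4))
            W2 = r.uniform(-init_range, init_range, 4)
            for _ in range(epochs):          # plain backprop on a 2-4-1 tanh/sigmoid net
                h = np.tanh(X @ W1)
                out = 1.0 / (1.0 + np.exp(-(h @ W2)))
                d_out = (out - y) * out * (1.0 - out)
                W2 -= lr * h.T @ d_out
                W1 -= lr * X.T @ (np.outer(d_out, W2) * (1.0 - h**2))
            errs.append(np.mean((out - y) ** 2))
        return -float(np.mean(errs))         # GA maximizes fitness = negative error

    def evolve(pop_size=12, generations=15):
        pop = [np.array([rng.uniform(0.01, 2.0), rng.uniform(0.01, 2.0)])
               for _ in range(pop_size)]
        for _ in range(generations):
            scored = sorted(pop, key=fitness, reverse=True)
            parents = scored[: pop_size // 2]
            children = [np.clip(p + rng.normal(0, 0.1, 2), 0.001, 3.0) for p in parents]
            pop = parents + children          # mutation-only GA, kept minimal for brevity
        return max(pop, key=fitness)

    best = evolve()
    print("best (learning rate, init range):", np.round(best, 3))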
Due to the hunger for resources exhibited by a GA that has expensive function evaluations (network training tests), the results are based on the usual set of toy problems like XOR and a 16-2-16 encoder. The problem was whether, and if so with what mixing factor, to couple two differing learning rules into a hybrid. This is not as straightforward as simply evaluating the mixing factor, because differing algorithms typically like different 'environments' to work in. More specifically, the two algorithms I considered are very sensitive to the initialization range of the network weights and prefer rather nonoverlapping values. This complicated the search for a good mixing factor into a multi-parameter nobody-knows problem, because I couldn't a priori rule out that a good hybrid would exist with unknown initialization parameters. One night of 36MHz R3000 sweat produced a nice hybrid with improved convergence for the tested problems, thus Bengio's claims get some support from me. I'd like to add, though, that more advanced searches will very likely require very long and careful optimization runs, if the GA is to sample a sufficiently large part of the search space. A hint to the practitioner: It helps to introduce (either by hand, or dynamically) precision and range 'knobs' into the simulation, which makes it possible to start with low precision and large range. It is also helpful to average at least 10, better 20+, individual network runs into a single function evaluation. The GA could in principle deal with this noise, but is actually hard pressed when confronted with networks which sometimes do & sometimes don't converge. Cheers, Henrik Klagges IBM Research rick at vee.lrz-muenchen.de & henrik at mpci.llnl.gov

From B344DSL at utarlg.uta.edu Mon Jan 20 11:15:00 1992 From: B344DSL at utarlg.uta.edu (B344DSL@utarlg.uta.edu) Date: Mon, 20 Jan 1992 10:15 CST Subject: Registration and tentative program for a conference in Dallas, Feb.6-8. Message-ID: <01GFJAE9SU2C00018I@utarlg.uta.edu>

TENTATIVE schedule for Optimality Conference, UT Dallas, Feb. 6-8, 1992

ORAL PRESENTATIONS --

Thursday, Feb. 6, AM:
Daniel Levine, U. of Texas, Arlington -- Don't Just Stand There, Optimize Something!
Samuel Leven, Radford U. -- (title to be announced)
Mark Deyong, New Mexico State U. -- Properties of Optimality in Neural Networks
Wesley Elsberry, Battelle Research Labs -- Putting Optimality in its Place: Argument on Context, Systems, and Neural Networks
Graham Tattersall, University of East Anglia -- Optimal Generalisation in Artificial Neural Networks

Thursday, Feb. 6, PM:
Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model
Ian Parberry, University of North Texas -- (title to be announced)
Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective
Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network

Friday, Feb. 7, AM:
Gershom Rosenstein, Hebrew University -- For What are Brains Striving?
Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of Nonstationary Data by Self-Organizing Neural Networks
Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control
Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics

Friday, Feb. 7, PM:
David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems
David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a Classical Conditioning Network
Samy Bengio, Université de Montréal -- On the Optimization of a Synaptic Learning Rule
Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing?

Saturday, Feb. 8, AM:
Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive Processes?
Herve Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator
Paul Prueitt, Georgetown University -- (title to be announced)
Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function

Saturday, Feb. 8, PM:
Panel discussion on the basic themes of the conference

POSTERS
Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours
Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by Competitive Neural Networks
John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems
Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories
Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers

ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE (alphabetical by first author):

Generalization of the Linear Auto-Associator Herve Abdi, Dominique Valentin, and Alice J. O'Toole, University of Texas at Dallas The classical auto-associator can be used to model some processes in prototype abstraction. In particular, the eigenvectors of the auto-associative matrix have been interpreted as prototypes or macro-features (Anderson et al., 1977; Abdi, 1988; O'Toole & Abdi, 1989). It has also been noted that computing these eigenvectors is equivalent to performing the principal component analysis of the matrix of objects to be stored in the memory. This paper describes a generalization of the linear auto-associator in which units (i.e., cells) can be of differential importance, can be non-independent, or can have a bias. The stimuli to be stored in the memory can also have a differential importance (or can be non-independent). The constraints expressing response bias and differential importance of stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule is applied to the weight matrix in a generalized form which takes the bias and the differential importance constraints into account to compute the error. Conditions for the convergence of the learning rule are examined, and convergence is shown to be dependent only on the ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix. The maximal responses of the memory correspond to generalized eigenvectors; these vectors are biased-orthogonal (i.e., they are orthogonal after the response bias is implemented). It is also shown that, with an appropriate choice of matrices for response bias and differential importance, the generalized auto-associator is able to implement the general linear model of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical correlation analysis, generalized principal component analysis, etc.).
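For orientation, the plain (unconstrained) linear auto-associator that is being generalized here can be written in a few lines: the Widrow-Hoff rule drives W toward reproducing each stored stimulus, and the dominant eigenvectors of the resulting matrix play the role of prototypes or macro-features. The bias and differential-importance matrices of the generalized model are deliberately left out of this sketch, and the data are random stand-ins.

    import numpy as np

    def train_autoassociator(stimuli, eta=0.05, epochs=500):
        """Widrow-Hoff (delta rule) learning for a linear auto-associator:
        W <- W + eta * (x - W x) x^T for each stored stimulus x."""
        n = stimuli.shape[1]
        W = np.zeros((n, n))
        for _ in range(epochs):
            for x in stimuli:
                W += eta * np.outer(x - W @ x, x)
        return W

    rng = np.random.default_rng(2)
    stimuli = rng.normal(size=(5, 8))      # five 8-dimensional stimuli to store
    W = train_autoassociator(stimuli)

    # Recall: stored stimuli are (approximately) reproduced, and the memory's principal
    # responses line up with the eigenvectors of W, i.e. with the principal components.
    print(float(np.abs(W @ stimuli[0] - stimuli[0]).max()))   # small reconstruction error
    eigvals = np.linalg.eigvalsh((W + W.T) / 2)
    print(np.round(np.sort(eigvals)[::-1][:6], 3))            # near 1 for learned directions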
Applications and Monte Carlo simulation of the generalized auto-associator dealing with face processing will be presented and discussed. On the Optimization of a Synaptic Learning Rule Samy Bengio, Université de Montréal; Yoshua Bengio, Massachusetts Institute of Technology; Jocelyn Cloutier, Université de Montréal; Jan Gecsei, Université de Montréal This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has local inputs, and is the same for many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, a local optimization method (gradient descent) as well as a global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments on classical conditioning in Aplysia yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimensional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Initial experiments can be found in [1, 2]. References [1] Bengio, Y. & Bengio, S. (1990). Learning a synaptic learning rule. Technical Report #751. Computer Science Department, Université de Montréal. [2] Bengio, Y., Bengio, S., & Cloutier, J. (1991). Learning a synaptic learning rule. IJCNN-91-Seattle. Complexity Optimized Data Clustering by Competitive Neural Networks Joachim Buhmann, Lawrence Livermore National Laboratory; Hans Kühnel, Technische Universität München Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy which explicitly reflects the tradeoff between simplicity and precision of a data representation. The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their occupation probabilities. An iterative version of complexity optimized clustering is implemented by an artificial neural network with winner-take-all connectivity. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy-constrained vector quantization, or topological feature maps and competitive neural networks.
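As a generic point of reference (standard maximum-entropy "soft" clustering in the deterministic-annealing family, not Buhmann and Kühnel's complexity-optimized algorithm), hard winner-take-all assignments are replaced by Gibbs probabilities at inverse temperature beta, and an extra per-cluster term added to the distortion stands in for the kind of complexity cost their objective introduces. The particular cost weighting below is invented for the example.

    import numpy as np

    def soft_cluster(data, k, beta=4.0, complexity_cost=0.05, iters=50):
        """Maximum-entropy ("soft") clustering: Gibbs assignment probabilities over clusters,
        with an additional per-cluster cost that penalizes sparsely occupied clusters."""
        rng = np.random.default_rng(3)
        centers = data[rng.choice(len(data), k, replace=False)]
        occupancy = np.full(k, 1.0 / k)
        for _ in range(iters):
            # distortion of every point with respect to every cluster center
            d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            cost = d2 - complexity_cost * np.log(occupancy)[None, :]  # distortion + complexity
            logits = -beta * cost
            logits -= logits.max(axis=1, keepdims=True)               # numerical stability
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)                         # Gibbs probabilities
            occupancy = p.mean(axis=0) + 1e-12
            centers = (p.T @ data) / p.sum(axis=0)[:, None]           # weighted centroid update
        return centers, occupancy

    # Two well-separated blobs, deliberately over-provisioned with k=4 clusters; the printed
    # occupation probabilities show how the points distribute over the available clusters.
    rng = np.random.default_rng(4)
    data = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    centers, occupancy = soft_cluster(data, k=4)
    print(np.round(centers, 2))
    print(np.round(occupancy, 3))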
Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Sylvia Candelaria de Ram, New Mexico State University Context-sensitivity and rapidity of communication are two things that become ecological essentials as cognition advances. They become ``optimals'' as cognition develops into something elaborate, long-lasting, flexible, and social. For successful operation of language's default speech/gesture mode, articulation response must be rapid and context-sensitive. It does not follow that all linguistic cognitive function will or can be equally fast or re-equilibrating. But it may follow that articulation response mechanisms are specialized in different ways than those for other cognitive functions. The special properties of the varied mechanisms would then interact in language use. In actuality, our own architecture is of this sort [1,2,3,4]. Major formative effects on our language, society, and individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift (and adaptability) and hypercorrection effects (mitigated by emotion), so that we have multitudes of distinct but related languages and speech communities. Consider modelling the variety of co-functioning mechanisms for utterance and gesture articulation, interpretation, monitoring and selection. Wherein lies the source of the differing function in one and another mechanism? Suppose [parts of] mechanisms are treated as parts of a multi-layered, temporally parallel, staged architecture (like ours). The layers may be inter-connected selectively [6]. Any given portion may be designed to deal with particular sorts of excitation [7,8,9]. A multi-level belief/knowledge logic enhanced for such structures [10] has properties extraordinary for a logic, properties which point up some critical features of ``neural nets'' having optimization properties pertinent to intelligent, interactive systems. References [1] Candelaria de Ram, S. (1984). Genesis of the mechanism for sound change as suggested by auditory reflexes. Linguistic Association of the Southwest, El Paso. [2] Candelaria de Ram, S. (1988). Neural feedback and causation of evolving speech styles. New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherches mathématiques, Montreal, October. [3] Candelaria de Ram, S. (1989). Sociolinguistic style shift and recent evidence on `presemantic' loci of attention to fine acoustic difference. New Ways of Analyzing Language Variation joint with American Dialect Society (NWAV-XVIII/ADSC), Durham, NC, October. [4] Candelaria de Ram, S. (1991b). Language processing: mental access and sublanguages. Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept. 1991. [5] Candelaria de Ram, S. (1990b). The sensory basis of mind: feasibility and functionality of a phonetic sensory store. [Commentary on R. Näätänen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236. [6] Candelaria de Ram, S. (1990c). Sensors & concepts: Grounded cognition. Working Session on Algebraic Approaches to Problem Solving and Representation, June 27-29, Briarcliff, NY. [7] Candelaria de Ram, S. (1990a). Belief/knowledge dependency graphs with sensory groundings. Third Int. Symp. on Artificial Intelligence Applications of Engineering Design and Manufacturing in Industrialized and Developing Countries, Monterrey, Mexico, Oct. 22-26, pp. 103-110. [8] Candelaria de Ram, S.
(1991a). From sensors to concepts: Pragmasemantic system constructivity. Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91, Sophia-Antipolis, France, April 22-24. Also in Knowledge Modeling and Expertise Transfer, IOS Publishing, Paris, 1991, pp. 433-448. [9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989). Reasoning using inheritance from a mixture of knowledge and beliefs. In S. Ramani, R. Chandrasekar, & K.S.R. Anjaneylu (Eds.), Knowledge Based Computer Systems. Delhi: Narosa, pp. 387-396; republished by Vedams Books International, New Delhi, 1990. Also in Lecture Notes in Computer Science series No. 444, Springer-Verlag, 1990. [10] Candelaria de Ram, S. (1991c). Why to enter into dialogue is to come out with changed speech: Cross-linked modalities, emotion, and language. Second Invitational Venaco Workshop and European Speech Communication Association Tutorial and Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept. 16-20, 1991. Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning Gail Carpenter, Boston University A neural network architecture for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors will be described. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, response, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (∧) and the MAX operator (∨) of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights corresponds to increasing sizes of category "boxes." Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; (iv) a letter recognition database; and (v) a medical database. Properties of Optimality in Neural Networks Mark DeYong and Thomas Eskridge, New Mexico State University This presentation discusses issues concerning optimality in neural and cognitive functioning. We discuss these issues in terms of the tradeoffs they impose on the design of neural network systems.
We illustrate the issues with example systems based on a novel VLSI neural processing element developed, fabricated, and tested by the first author. There are four general issues of interest:  Biological Realism vs. Computational Power. Many implementations of neurons sacrifice computational power for biological realism. Biological realism imposes a set of constraints on the structure and timing of certain operations in the neuron. Taken as an absolute requirement, these constraints, though realistic, reduce the computational power of individual neurons, and of systems built on those neurons. However, to ignore the biological characteristics of neurons is to ignore the best example of the type of device we are trying to implement. In particular, simple non-biologically inspired neurons perform a completely different style of processing than biologically inspired ones. Our work allows for biological realism in areas where it increases computational power, while ignoring the low-level details that are simply by-products of organic systems.  Task-Specific Architecture vs. Uniform Element, Massive Parallelism. A central issue in developing neural network systems is whether to design networks specific to a particular task or to adapt a general-purpose network to accomplish the task. Developing task- specific architectures allows for small, fast networks that approach optimality in performance, but require more effort during the design stage. General-purpose architectures approach optimality in design that merely needs to be adapted via weight modifications to a new problem, but suffer from performance inefficiencies due to unneeded and/or redundant nodes. Our work hypothesizes that task-specific architec- tures that use a building-block approach combined with fine-tuning by training will produce the greatest benefits in the tradeoff between design and performance optimality.  Time Independence vs. Time Dependence. Many neural networks assume that each input vector is independent of other inputs, and the job of the neural network is to extract patterns within the input vector that are sufficient to characterize it. For problems of this type, a network that assumes time independence will provide acceptable performance. However, if the input vectors cannot be assumed to be independent, the network must process the vector with respect to its temporal characteristics. Networks that assume time independence have a variety of well-known training and performance algorithms, but will be unwieldy when applied to a problem in which time independence does not hold. Although temporal characteristics can be converted into levels, there will be a loss of information that may be critical to solving the problem efficiently. Networks that assume time dependence have the advantage of being able to handle both time dependent and time independent data, but do not have well known, generally applicable training and performance algorithms. Our approach is to assume time dependence, with the goal of handling a larger range of problems rather than having general training and performance methods.  Hybrid Implementation vs. Analog or Digital Only. The optimality of hardware implementations of neural networks depends in part on the resolution of the second tradeoff mentioned above. Analog devices generally afford faster processing at a lower hardware overhead than digital, whereas digital devices provide noise immunity and a building-block approach to system design. 
Our work adopts a hybrid approach where the internal computation of the neuron is implemented in analog, and the extracellular communication is performed digitally. This gives the best of both worlds: the speed and low hardware overhead of analog and the noise immunity and building-block nature of digital components. Each of these issues has individual ramifications for neural network design, but optimality of the overall system must be viewed as their composite. Thus, design decisions made in one area will constrain the decisions that can be made in the other areas. Putting Optimality in its Place: Arguments on Context, Systems and Neural Networks Wesley Elsberry, Battelle Research Laboratories Determining the "optimality" of a particular neural network should be an exercise in multivariate analysis. Too often, performance concerning a narrowly defined problem has been accepted as prima facie evidence that some ANN architecture has a specific level of optimality. Taking a cue from the field of genetic algorithms (and the theory of natural selection from which GA's are derived), I offer the observation that optimality is selected in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the system of which it is a part. The context in which the evaluation of optimality is performed will influence the results of that evaluation greatly. While compartmentalized and specialized tests of ANN performance can offer insights, the construction of effective systems may require additional consideration to be given to the assumptions of such tests. Many benchmarks and other tests assume a static problem set, while many real-world applications offer dynamical problems. An ANN which performs "optimally" in a test may perform miserably in a putatively similar real-world application. Recognizing the assumptions which underlie evaluations is important for issues of optimal system design; recognizing the need for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is critical to proper placement of priority given to optimality of ANN's. Identifying a Neural Network's Computational Goals: A Statistical Optimization Perspective Richard M. Golden, University of Texas at Dallas The importance of identifying the computational goal of a neural network computation is first considered from the perspective of Marr's levels of descriptions theory and Simon's theory of satisficing. A "statistical optimization perspective" is proposed as a specific implementation of the more general theories of Marr and Simon. The empirical "testability" of the "statistical optimization perspective" is also considered. It is argued that although such a hypothesis is only occasionally empirically testable, such a hypothesis plays a fundamental role in understanding complex information processing systems. The usefulness of the above theoretical framework is then considered with respect to both artificial neural networks and biological neural networks. An argument is made that almost any artificial neural networks may be viewed as optimizing a statistical cost function. To support this claim, the large class of back-propagation feed-forward artificial neural networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed as optimizing specific statistical cost functions. 
Specific statistical tests for deciding whether the statistical environment of the neural network is "compatible" with the statistical cost function the network is presumably optimizing are also proposed. Next, some ideas regarding the applicability of such analyses to much more complicated artificial neural networks which are "closer approximations" to real biological neural networks will also be discussed. Vector Associative Maps: Self- organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-motor Control Stephen Grossberg, Boston University This talk describes a new class of neural models for unsupervised error-based learning. Such a Vector Associative Map, or VAM, is capable of autonomously calibrating the spatial maps and arm trajecgtory parameters used during visually guided reaching. VAMs illustrate how spatial and motor representations can self-organize using a unified computational format. They clarify how an autonomous agent can build a self-optimizing hierarchy of goal-oriented actions based upon more primitive, endogenously generated exploration of its environment. Computational properties of ART and VAM systems are complementary. This complementarity reflects different processing requirements of sensory- cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be used to design an autonomous behavioral agent. Problem Solving in a Connectionistic World Model Steven Hampson, University of California at Irvine Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S) models of problem solving are central to animal learning theory. When applicable, the procedural S-R and S-E models can be quite space efficient, as they can potentially learn compact generalizations over the functions they are taught to compute. On the other hand, learning these generalizations can be quite time consuming, and adjusting them when conditions change can take as long as learning them in the first place. In contrast, the S-S model developed here does not learn a particular input-to-output mapping, but simply records a series of "facts" about possible state transitions in the world. This declarative world model provides fast learning, easy update and flexible use, but is space expensive. The procedural/declarative distinction observed in biological behavior suggests that both types of mechanisms are available to an organism in its attempts to balance, if not optimize, both time and space requirements. The work presented here investigates the type of problems that are most effectively addressed in an S-S model. Efficient Optimising Dynamics in a Hopfield-style Network Arun Jagota, State University of New York at Buffalo Definition: A set of vertices (nodes, points) of a graph such that every pair is connected by an edge (arc, line) is called a clique. An example in the ANN context is a set of units such that all pairs are mutually excitatory. The following description applies to optimisation issues in any problems that can be modeled with cliques. Background I: We earlier proposed a variant (essentially a special case) of the Hopfield network which we called the Hopfield-Style Network (HSN). We showed that the stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of a local minimum (stable state) is directly proportional (although not linearly) to the size of the corresponding clique. Any graph can be made the underlying graph of HSN. 
These three facts suggest that HSN with suitable optimising dynamics can be applied to the CLIQUE (optimisation) problem, namely that of ``Finding the largest clique in any given graph''. Background II: The CLIQUE problem is NP-Complete, suggesting that it is most likely intractable. Recent results from Computer Science suggest that even approximately solving this problem is probably hard. Researchers have shown that on most (random) graphs, however, it can be approximated fairly well. The CLIQUE problem has many applications, including (1) Contentaddressable memories can be modeled as cliques. (2) ConstraintSatisfaction Problems (CSPs) can be represented as the CLIQUE problem. Many problems in AI from Computer Vision, NLP, KR etc have been cast as CSPs. (3) Certain object recognition problems in Computer Vision can be modeled as the CLIQUE problem. Given an image object A and a reference object B, one problem is to find a sub-object of B which ``matches'' A. This can be represented as the CLIQUE problem. Abstract: We will present details of the modeling of optimisation problems related to those described in Background II (and perhaps others) as the CLIQUE problem. We will discuss how HSN can be used to obtain optimal or approximate solutions. In particular, we will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation capabilities, and present theoretical and/or empirical evidence for such. The dynamics are, Discrete: (1) Steepest gradient-descent (2) rho-annealing. Continuous: (3) Mean-field annealing. We will discuss characterising properties of these dynamics including, (1) emulates a well-known graph algorithm, (2) is suited only for HSN, (3) originates from statistical mechanics and has gained wide attention for its optimisation properties. We will also discuss the continuous Hopfield network dynamics as a special case of (3). State Generators and Complex Neural Memories Subhash C. Kak, Louisiana State University The mechanism of self-indexing for feedback neural networks that generates memories from short subsequences is generalized so that a single bit together with an appropriate update order suffices for each memory. This mechanism can explain how stimu- lating an appropriate neuron can then recall a memory. Although the information is distributed in this model, yet our self-indexing mechanism [1] makes it appear localized. Also a new complex valued neuron model is presented to generalize McCulloch-Pitts neurons. There are aspects to biological memory that are distributed [2] and others that are localized [3]. In the currently popular artificial neural network models the synaptic weights reflect the stored memories, which are thus distributed over the network. The question then arises when these models can explain Penfield's observations on memory localization. This paper shows that memory localization. This paper shows that such a memory localization does occur in these models if self-indexing is used. It is also shown how a generalization of the McCulloch-Pitts model of neurons appears essential in order to account for certain aspects of distributed information processing. One particular generalization, described in the paper, allows one to deal with some recent findings of Optican & Richmond [4]. Consider the model of the mind where each mental event corresponds to some neural event. Neurons that deal with mental events may be called cognitive neurons. There would be other neurons that simply compute without cognitive function. 
Consider now cognitive neurons dealing with sensory input that directly affects their behaviour. We now show that independent cognitive centers will lead to competing behaviour. Even non-competing cognitive centres would show odd behaviour since collective choice is associated with non-transitive logic. This is clear from the ordered choice paradox that occurs for any collection of cognitive individuals. This indicates that a scalar energy function cannot be associated with a neural network that performs logical processing. Because if that were possible then all choices made by a network could be defined in an unambiguous hierarchy, with at worst more than one choice having a particular value. The question of cyclicity of choices, as in the ordered choice paradox, will not arise. References [1] Kak, S.C. (1990a). Physics Letters A 143, 293. [2] Lashley, K.S. (1963). Brain Mechanisms and Learning. New York: Dover). [3] Penfield, P. & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press). [4] Optican, L.M. & Richmond, B.J. (1987). J. Neurophysiol. 57, 162. Don't Just Stand There, Optimize Something! Daniel Levine, University of Texas at Arlington Perspectives on optimization in a variety of disciplines, including physics, biology, psychology, and economics, are reviewed. The major debate is over whether optimization is a description of nature, a normative prescription, both or neither. The presenter leans toward a belief that optimization is a normative prescription and not a description of nature. In neural network theory, the attempt to explain all behavior as the optimization of some variable (no matter how tortuously defined the variable is!) has spawned some work that has been seminal to the field. This includes both the "hedonistic neuron" theory of Harry Klopf, which led to some important work in conditioning theory and robotics, and the "dynamic programming" of Paul Werbos which led to back propagation networks. Yet if all human behavior is optimal, this means that no improvement is possible on wasteful wars, environmental destruction, and unjust social systems. The presenter will review work on the effects of frontal lobe damage, specifically the dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty, and describe these effects as prototypes of non-optimal cognitive function. It can be argued (David Stork, personal communication) that lesion effects do not demonstrate non-optimality because they are the result of system malfunction. If that is so, then such malfunction is far more pervasive than generally believed and is not dependent on actual brain damage. Architectural principles such as associative learning, lateral inhibition, opponent processing, and resonant feedback, which enable us to interact with a complex environment also sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980). Even intact frontal lobes do not always perform their executive function (Pribram, 1991) with optimal efficiency. References Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press. Pribram, K. (1991). Brain and Perception. Erlbaum. For What Are Brains Striving? Gershom-Zvi Rosenstein, Hebrew University My aim is to outline a possibility of a unified approach to several yet unsolved problems of behavioral regulation, most of them related to the puzzle of schizophrenia. 
This Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only recently in the present author's book [1]. One of the main problems the approach was applied to is the modeling of behavior disturbances. The income (the value of the goal-function of our model) is defined, by assumption, on the intensities of streams of impulses directed to the reward system. The income can be accumulated and spent on different activities of the model. The choices made by the model depend on the income they are expected to bring. Now the ICA is applied to the following problems: The catecholamine distribution change (CDC) in the schizophrenic brain. I try to prove the idea that CDC is caused by the same augmented (in comparison with the norm) stimulation of the reward system that was proposed by us earlier as a possible cause for the behavioral disturbance. The role of dopamine in the brain's processing of information is discussed. Dopamine is seen as one of the forms of representation of income in the brain. The main difference between the psychology of "normal" and schizophrenic subjects, according to many researchers, is that in schizophrenics "observations prevail over expectations." This property can be shown to be a formal consequence of our model. It was used earlier to describe the behavior of schizophrenics versus normal people in delusion formation (as Scharpantje delusion, etc.). ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. In fact, that hypothesis can be concluded from ICA if some simple and natural assumptions are accepted. A hypothesis about the nature of stereotypes as an adjunctive type of behavior is proposed. They are seen as behaviors concerned not with the direct physiological needs of the organism but with the regulation of activity of its reward system. The proposition can be tested partly in animal experiments. The problem of the origination of so-called "positive" and "negative" symptoms in schizophrenia is discussed. The positive symptoms are seen as attempts, and sometimes means, to produce an additional income for the brain whose external sources of income are severely limited. The negative symptoms are seen as behaviors chosen in the condition whereby the quantity of income that can be used to provide these behaviors is small and cannot be increased. The last part of the presentation is dedicated to the old problem of the relationship between "genius" and schizophrenia. It is a continuation of material introduced in [1]. The remark is made that the phenomenon of uric acid excess, thought by some investigators to be connected to high intellectual achievement, can be related to the uric acid excess found to be produced by augmented stimulation of the reward system in the self-stimulation paradigm. References [1] Rosenstein, G.-Z. (1991). Income and Choice in Biological Systems. Lawrence Erlbaum Associates. Non-optimality in Neurobiological Systems David Stork, Ricoh California Research Center I will argue strongly, in two ways, that neurobiological systems are "non-optimal." I note that "optimal" implies a match between some (human) notion of function (or structure, ...) and the implementation itself. My first argument addresses the dubious approach which tries to impose notions of what is being optimized, i.e., stating what the desired function is.
For instance, Gabor-function theorists claim that human visual receptive fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the spatial-frequency domains [1]. I demonstrate how such bandwidth notions have an implied measure, or metric, of localization; I examine the implied metric and find little or no justification for preferring it over any of a number of other plausible metrics [2]. I also show that the visual system has an overabundance of visual cortical cells (by a factor of 500) relative to what is implied by the Gabor approach; thus the Gabor approach makes this important fact hard to understand. Then I review arguments of others describing visual receptive fields as being "optimally" tuned to visual gratings [3], and show that here too an implied metric is unjustified [4]. These considerations lead to skepticism of the general approach of imposing or guessing the actual "true" function of neural systems, even in specific mathematical cases. Only in the most compelling cases can the function be stated confidently. My second criticism of the notion of optimality is that even if in such extreme cases the neurobiological function is known, biological systems generally do not implement it in an "optimal" way. I demonstrate this for a non-optimal ("useless") synapse in the crayfish tailflip circuit. Such non-optimality can be well explained by appealing to the process of preadaptation from evolutionary theory [5,6]. If a neural circuit (or organ, or behavior ...) which evolves to solve one problem is later called upon to solve a different problem, then the evolving circuit must be built upon the structure appropriate to the previous task. Thus, for instance, the non-optimal synapse in the crayfish tail flipping circuit can be understood as a holdover from a previous evolutionary epoch in which the circuit was used instead for swimming. In other words, evolutionary processes are gradual, and even if locally optimal (i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. (This is analogous to the local minima that plague some gradient-descent methods in mathematics.) Such an analysis highlights the role of evolutionary history in understanding the structure and function of current neurobiological systems, and, along with our previous analysis, strongly argues against optimality in neurobiological systems. I therefore concur with the recent statement that in neural systems "... elegance of design counts for little." [7] References [1] Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2, 1160-1169. [2] Stork, D. G. & Wilson, H. R. (1990). Do Gabor functions provide appropriate descriptions of visual cortical receptive fields? J. Opt. Soc. Am. A 7, 1362-1373. [3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980). Visual cortical neurons: Are bars or gratings the optimal stimuli? Science 207, 88-90. [4] Stork, D. G. & Levinson, J. Z. (1982). Receptive fields and the optimal stimulus. Science 216, 204-205. [5] Stork, D. G., Jackson, B., & Walker, S. (1991). "Non-optimality" via preadaptation in simple neural systems. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II. Addison-Wesley and Santa Fe Institute, pp. 409-429. [6] Stork, D. G. (1992, in press). Preadaptation and principles of organization in organisms. In A. Baskin & J.
Mittenthal (Eds.), Principles of Organization in Organisms. Addison-Wesley and Santa Fe Institute. [7] Dumont, J. P. C. & Robertson, R. M. (1986). Neuronal circuits: An evolutionary perspective. Science 233, 849-853. Why Do We Study Neural Nets on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Harold Szu, Naval Surface Warfare Center Neural nets, natural or artificial, are structured by the desire to accomplish certain information processing goals. An example of this occurs in the celebrated exclusive-OR computation. "You are as intelligent as you can hear and see," according to an ancient Chinese saying. Thus, the mismatch between human-created sensor data used for input knowledge representation and the nature-evolved brain-style computing architectures is one of the major impediments for neural net applications. After a review of classical neural nets with fixed layered architectures and small-perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one of the most challenging frontiers: the self-architecturing of neural nets. The singlet and pair correlation functions can be measured to define a hairy neuron model. The minimum set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form a selective network. Then, the convergence proof for self-architecturing hairy neurons will be given. A more powerful tool, however, is the wavelet transform, an adaptive wide-band Fourier analysis developed in 1985 by French oil explorers. This transform goes beyond the (preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a noisy environment. The utility of wavelets in brain-style computing can be recognized from two observations. First, the "cocktail party effect," namely, you hear what you wish to hear, can be explained by the wavelet matched filter, which can achieve a tremendous bandwidth noise reduction. Second, "end-cut" contour filling may be described by Gibbs overshooting in this wavelet manner. In other words, wavelets form a very natural way of describing real scenes and real signals. For this reason, it seems likely that the future of neural net applications may be in learning to do wavelet analyses by a self-learning of the "mother wavelet" that is most appropriate for a specific dynamic input-output medium. Optimal Generalisation in Artificial Neural Networks Graham Tattersall, University of East Anglia A key property of artificial neural networks is their ability to produce a useful output when presented with an input of previously unseen data, even if the network has only been trained on a small set of examples of the input-output function underlying the data. This process is called generalisation and is effectively a form of function completion. ANNs such as the MLP and RBF sometimes appear to work effectively as generalisers on this type of problem, but there is now widespread recognition that the form of generalisation which arises is very dependent on the architecture of the ANN, and is often completely inappropriate, particularly when dealing with symbolic data. This paper will argue that generalisation should be done in such a way that the chosen completion is the most probable and is consistent with the training examples.
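To make the ``function completion'' view concrete with a toy count (the numbers and the small C program below are invented for illustration and are not from the paper): an n-input Boolean function has 2^n domain points, and a training set that fixes the output on k of them leaves 2^(2^n - k) completions that are all equally consistent with the data.

    #include <stdio.h>

    /* Toy illustration of generalisation as function completion:
       an n-input Boolean function has 2^n domain points; if the
       training set fixes the output on k of them, every assignment
       of the remaining 2^n - k outputs is a completion consistent
       with the training data.  n and k are made-up values. */
    int main(void)
    {
        unsigned long n = 4;                              /* input bits      */
        unsigned long k = 10;                             /* training points */
        unsigned long domain = 1UL << n;                  /* 2^4 = 16        */
        unsigned long completions = 1UL << (domain - k);  /* 2^6 = 64        */

        printf("%lu of %lu points fixed -> %lu consistent completions\n",
               k, domain, completions);
        return 0;
    }

Even with most of the domain pinned down, many completions remain, which is why the criteria that follow (maximum probability plus training-set consistency) and, later, a user-specified strategy are needed to single one out.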
These criteria dictate that the generalisation should not depend in any way upon the architecture or functionality of components of the generalising system, and that the generalisation will depend entirely on the statistics of the training exemplars. A practical method for generalising in accordance with the probability and consistency criteria is to find the minimum entropy generalisation using the Shannon-Hartley relationship between entropy and spatial bandwidth. The usefulness of this approach can be demonstrated using a number of binary data functions which contain both first and higher order structure. However, this work has shown very clearly that, in the absence of an architecturally imposed generalisation strategy, many function completions are equally possible unless a very large proportion of all possible function domain points are contained in the training set. It therefore appears desirable to design generalising systems such as neural networks so that they generalise, not only in accordance with the optimal generalisation criteria of maximum probability and training set consistency, but also subject to a generalisation strategy which is specified by the user. Two approaches to the imposition of a generalisation strategy are described. In the first method, the characteristic autocorrelation function or functions belonging to a specified family are used as the weight set in a Kosko net. The second method uses Wiener Filtering to remove the "noise" implicit in an incomplete description of a function. The transfer function of the Wiener Filter is specific to a particular generalisation strategy.
E-mail Addresses of Presenters
Abdi                abdi at utdallas.edu
Bengio              bengio at iro.umontreal.ca
Bhaumik             netearth!bhaumik at shakti.ernet.in
Buhmann             jb at s1.gov
Candelaria de Ram   sylvia at nmsu.edu
Carpenter           gail at park.bu.edu
Chance              u0503aa at vms.ucc.okstate.edu
DeYong              mdeyong at nmsu.edu
Elsberry            elsberry at hellfire.pnl.gov
Golden              golden at utdallas.edu
Grossberg           steve at park.bu.edu
Hampson             hampson at ics.uci.edu
Jagota              jagota at cs.buffalo.edu
Johnson             ecjdj at nve.mcsr.olemiss.edu
Kak                 kak at max.ee.lsu.edu
Leven               (reach via Pribram, see below)
Levine              b344dsl at utarlg.uta.edu
Ogmen               elee52f at jetson.uh.edu
Parberry            ian at hercule.csci.unt.edu
Pribram             kpribram at ruacad.runet.edu
Prueitt             prueitt at guvax.georgetown.edu
Rosenstein          NONE
Stork               stork at crc.ricoh.com
Szu                 btelfe at bagheera.nswc.navy.mil
Tattersall          ?
From uh311ae at sunmanager.lrz-muenchen.de Tue Jan 21 16:49:41 1992 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 21 Jan 92 22:49:41+0100 Subject: CascadeC variants, Sjogaard's paper Message-ID: <9201212149.AA01642@sunmanager.lrz-muenchen.de> Steen Sjogaard makes some interesting observations about the Cascade Correlation algorithm. He invents a variation of it, named 2CCA, which uses only two layers, but also relies on freezing weights, error covariance, candidate pools, etc., as the CCA does. The whole purpose of this difference in the dynamic network construction strategy is to increase the network's ability to generalize. He invents a meaningful classification problem (the 'three disc' problem) and goes to some lengths to present 2CCA's superiority over CCA when confronted with the benchmark problem (specifically, 2CCA generalizes better by getting more test patterns right).
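For readers who have not seen the cascade-correlation paper that this discussion assumes, the candidate score shared by CCA and 2CCA is, roughly, the summed magnitude of the covariance between a candidate unit's output and the residual output errors. The C sketch below uses invented array sizes and data and is only an illustration of that score, not code from either implementation.

    #include <math.h>
    #include <stdio.h>

    #define NPAT 4   /* training patterns (illustrative size) */
    #define NOUT 2   /* output units      (illustrative size) */

    /* Candidate score S = sum over outputs o of
       | sum over patterns p of (V_p - mean(V)) * (E_po - mean(E_o)) |,
       where V_p is the candidate's output on pattern p and E_po is
       the residual error at output o on pattern p. */
    double candidate_score(double v[NPAT], double e[NPAT][NOUT])
    {
        double vbar = 0.0, score = 0.0;
        int p, o;

        for (p = 0; p < NPAT; p++)
            vbar += v[p];
        vbar /= NPAT;

        for (o = 0; o < NOUT; o++) {
            double ebar = 0.0, cov = 0.0;
            for (p = 0; p < NPAT; p++)
                ebar += e[p][o];
            ebar /= NPAT;
            for (p = 0; p < NPAT; p++)
                cov += (v[p] - vbar) * (e[p][o] - ebar);
            score += fabs(cov);
        }
        return score;
    }

    int main(void)
    {
        double v[NPAT] = {0.1, 0.9, 0.8, 0.2};             /* made-up data */
        double e[NPAT][NOUT] = {{-0.3, 0.1}, {0.4, -0.2},
                                {0.5, 0.0}, {-0.2, 0.1}};  /* made-up data */
        printf("candidate score = %f\n", candidate_score(v, e));
        return 0;
    }

The two algorithms differ only in which units are allowed to feed the candidates, not in this score, which is why the freezing and candidate-pool machinery carries over unchanged.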
Now, the problem is that he also says that, even after creation of 100 hidden units, 2CCA classifies only half of the points of the 'Two-Spirals' problem right, which is not better than chance. Everybody who has ever tried to solve the 'Two-Spirals' with _any_ algorithm knows how nasty it is and how good the CCA solution (as presented in Fahlman's paper) really is. In this light, it is obviously not easy to accept Sjogaard's claims. However, I think that Sjogaard is making some good points, and I would like to hear your opinion:
a) Is a solution that employs mostly low-order feature detectors (i.e., has _few_ and _populated_ hidden layers) typically generalizing better than one that uses fewer high-order ones ?
b) How is it to be decided when to add a new hidden unit to an existing layer versus putting it into a new one ? Simply creating a second candidate pool doesn't do the job: a new-layer hidden unit should _always_ do at least as well as an extra old-layer hidden one, so simple covariance comparison does not work. Use an 'allowable error difference' term ? Use test patterns to check generalization ? Return to a fixed architecture ?
c) Would the addition of a little noise make HCCA perform even better ? I haven't heard of an experiment in this direction, but I suspect this is based on quickprop's relative dislike of changing training sets, not on fundamental issues. (HCCA = Sjogaard's term for 'high-order-CCA' = CCA).
d) How to write a k-CCA, an 'in-between-order' CC algorithm ?
e) If you read Sjogaard's neuroprose paper, did you like his formalization of generalization ? (I thought it was insightful)
Cheers, Henrik Klagges IBM Research rick at vee.lrz-muenchen.de @ henrik at mpci.llnl.gov PS: I hope Steen gets a k-CCA paper out soon 8-) !
From david at cns.edinburgh.ac.uk Tue Jan 21 04:10:08 1992 From: david at cns.edinburgh.ac.uk (David Willshaw) Date: Tue, 21 Jan 92 09:10:08 GMT Subject: Contents of NETWORK - Vol. 3, no 1 Feb 1992 Message-ID: <24040.9201210910@subnode.cns.ed.ac.uk> ---------------------------------------------------------------------
CONTENTS OF NETWORK - COMPUTATION IN NEURAL SYSTEMS
Volume 3 Number 1 February 1992
Proceedings of the First Irish Neural Networks Conference held at The Queen's University of Belfast, 21 June 1991. Proceedings Editor: G Orchard
PAPERS
1  Sharing interdisciplinary perspectives on neural networks: the First Irish Neural Networks Conference -- G ORCHARD
5  On the proper treatment of eliminative connectionism -- S MILLS
15 Cervical cell image inspection - a task for artificial neural networks -- I W RICKETTS
19 A geometric interpretation of hidden layer units in feedforward neural networks -- J MITCHELL
27 Comparison and evaluation of variants of the conjugate gradient method for efficient learning in feedforward neural networks with backward error propagation -- J A KINSELLA
37 A connectionist technique for on-line parsing -- R REILLY
47 Are artificial neural nets as smart as a rat? -- T SAVAGE & R COWIE
61 The principal components of natural images -- P J B HANCOCK, R J BADDELEY & L S SMITH
71 Information processing and neuromodulation in the visual system of frogs and toads -- P R LAMING
89 Neurodevelopmental events underlying information acquisition and storage -- E DOYLE, P NOLAN, R BELL & C M REGAN
95 ABSTRACTS SECTION
97 BOOK REVIEWS
Network welcomes research Papers and Letters where the findings have demonstrable relevance across traditional disciplinary boundaries. Research Papers can be of any length, if that length can be justified by content.
Rarely, however, is it expected that a length in excess of 10,000 words will be justified. 2,500 words is the expected limit for research Letters. Articles can be published from authors' TeX source code. Macros can be supplied to produce papers in the form suitable for refereeing and for IOP house style. For more details contact the Editorial Services Manager at IOP Publishing, Techno House, Redcliffe Way, Bristol BS1 6NX, UK. Telephone: 0272 297481 Fax: 0272 294318 Telex: 449149 INSTP G Email Janet: IOPPL at UK.AC.RL.GB Subscription Information Frequency: quarterly Subscription rates: Institution 125.00 pounds (US$220.00) Individual (UK) 17.30 pounds (Overseas) 20.50 pounds (US$37.90) A microfiche edition is also available at 75.00 pounds (US$132.00) From poggio at atr-hr.atr.co.jp Wed Jan 22 02:00:50 1992 From: poggio at atr-hr.atr.co.jp (Poggio) Date: Wed, 22 Jan 92 16:00:50 +0900 Subject: NIPS preprint in neuroprose In-Reply-To: becker@ai.toronto.edu's message of Thu, 16 Jan 1992 14:44:36 -0500 <92Jan16.144445edt.10@neuron.ai.toronto.edu> Message-ID: <9201220700.AA02064@atr-hr.atr.co.jp> isbell at ai.mit.edu From uh311ae at sunmanager.lrz-muenchen.de Wed Jan 22 06:07:34 1992 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 22 Jan 92 12:07:34+0100 Subject: CascadeC in parallel Message-ID: <9201221107.AA02532@sunmanager.lrz-muenchen.de> It is less than obvious how to parallelize Cascade Correlation, especially if compared to stuff like backprop (which fits easily on specialized 'kill-them-all' machines). If one tries to decompose the CC computation onto neurons, it quickly fails: it is not obvious how to do any better than training the candidate pool in parallel. This suggests that the problem should be decomposed into some other kind of unit. I am too foggy at this time of the working day, however, to see another clever decomposition unit that would be homogeneous enough to fit on a SIMD architecture. CC abandons the inter-hidden-unit parallelism by training only one at a time. It also kills off most of the feeding calculation via freezing and caching. So, where does the long vector SIMD machine fit into the picture ? It can't be a uniprocessor-RISC-only algorithm like 'Boot Unix on the gate-level simulator'. Of course I could do 10 complete networks in parallel, but ... Cheers, Henrik From B344DSL at utarlg.uta.edu Wed Jan 22 12:03:00 1992 From: B344DSL at utarlg.uta.edu (B344DSL@utarlg.uta.edu) Date: Wed, 22 Jan 1992 11:03 CST Subject: Last notice and registration form for conference Feb. 6-8, UT Dallas Message-ID: <01GFM4NT9W5C00037X@utarlg.uta.edu> The program and abstracts I sent for the Optimality conference Feb. 6-8, I believe, did not include a registration form. I am sorry for the error: since I have already sent out two mailings to Connectionists on this conference, this is the last general mailing I will send on it. Anybody desiring more information, e.g., abstracts that weren't included earlier, should contact me individually (my e-mail address is at the end of the registration form which I am including in this notice). Hope to see some of you there. Dan Levine REGISTRATION FOR CONFERENCE ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS?
FEBRUARY 6 TO 8, 1992, UNIVERSITY OF TEXAS AT DALLAS Sponsored by Metroplex Institute for Neural Dynamics (MIND), Texas SIG of International Neural Network Society (INNS), and the University of Texas at Dallas Name _______________________________________________________ LAST FIRST MIDDLE Mailing Address ________________________________________________ ________________________________________________ ________________________________________________ ________________________________________________ Affiliation ________________________________________________ Telephone Number ____________________ e-mail if any ____________________ FAX if any ____________________ Registration fee (please enclose check payable to MIND): Non-student members of MIND or INNS, $70 _______ or UTD faculty or staff Other non-students $80 _______ Student members of MIND or INNS, $10 _______ or UTD students Other students $20 _______ Presenters (oral or poster) from outside Dallas-Ft. Worth FREE _______ (Note: Registration does not include meals) Hotel: Please check if you need a reservation card ______ (Rooms at the Richardson Hilton are $59 a night) Please check if you wish to share a room ______ Reduced fares are available to Dallas-Fort Worth on American Airlines. Call the airline and ask for StarFile S14227D, under the name of MIND. Preregistrants whose forms and payment checks are received by January 31 will be mailed a preregistration package with a confirmation. This will include a complete schedule with times of presentations and directions to the hotel and conference site. Please send this form to: Professor Daniel S. Levine Department of Mathematics Box 19408 University of Texas at Arlington Arlington, TX 76019-0408 Office: 817-273-3598; FAX: 817-794-5802; e-mail b344dsl at utarlg.uta.edu Conference program(still subject to minor change): ORAL PRESENTATIONS -- Thursday, Feb. 6, AM: Daniel Levine, U. of Texas, Arlington -- Don't Just Stand There, Optimize Something! Samuel Leven, Radford U. -- Man as Machine? Conflicting Optima, Dynamic Goals, and Hope Wesley Elsberry, Battelle Research Labs -- Putting Optimality in its Place: Argument on Context, Systems, and Neural Networks Graham Tattersall, U. of East Anglia -- Optimal Generalisation in Artificial Neural Networks Thursday, Feb. 6, PM: Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals: a Statistical Optimization Perspective Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Network Formations on VLSI Chips and Why Are Wavelets More Natural for Brain-Style Computing? Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network Friday, Feb. 7, AM: Gershom Rosenstein, Hebrew U. -- For What are Brains Striving? Gail Carpenter, Boston U. -- Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning Stephen Grossberg, Boston U. -- Vector Associative Maps: Self-Organizing Neural Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control Haluk Ogmen, U. of Houston -- Self-Organization via Active Exploration in Robotics Friday, Feb. 7, PM: David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems Ian Parberry, U. of North Texas -- Neural Networks and Computational Complexity David Chance, Central Oklahoma U. 
-- Real-time Neuronal Models Compared Within a Classical Conditioning Framework Samy Bengio, Universite de Montreal -- On the Optimization of a Synaptic Learning Rule Saturday, Feb. 8, AM: Karl Pribram, Radford U. -- The Least Action Principle: Does it Apply to Cognitive Processes? Paul Prueitt, Georgetown U. -- Control Hierarchies and the Return to Homeostasis Herve Abdi, U. of Texas, Dallas -- Generalization of the Linear Auto-Associator Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural Language and the Treatment of Specialized Function Saturday, Feb. 8, PM: Panel discussion on the basic themes of the conference POSTERS Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for Determining Subjective Contours John Johnson, U. of Mississippi -- The Genetic Adaptive Neural Network Training Algorithm for Generic Feedforward Artificial Neural Systems Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories Brian Telfer, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural Classifiers From jm2z+ at andrew.cmu.edu Wed Jan 22 12:25:25 1992 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Wed, 22 Jan 1992 12:25:25 -0500 (EST) Subject: TRs announcemnt Message-ID: **** DO NOT FORWARD TO OTHER GROUPS **** We have recently produced two technical reports, the first in a new series devoted to issues in PDP and Cognitive Neuroscience. They are described below, followed by instructions for obtaining copies. -------------------------------------------------------------- TOWARD A THEORY OF INFORMATION PROCESSING IN GRADED, RANDOM, INTERACTIVE NETWORKS James L. McClelland Technical Report PDP.CNS.91.1 A set of principles for information processing in parallel distributed processing systems is described. In brief, the principles state that processing is graded and gradual, interactive and competitive, and subject to intrinsic variability. Networks that adhere to these principles are called GRAIN networks. Four goals of a theory of information processing based on these principles are described: 1) to unify the asymptotic accuracy, reaction-time, and time accuracy paradigms; 2) to examine how simple general laws might emerge from systems conforming to the principles and to predict and/or explain cases in which the general laws do not hold; and 3) to explore the explanatory role of the various principles in different aspects of observed empirical phenomena. Two case studies are presented. In the first, a general regularity here called Morton's independence law of the joint effects of context and stimulus information on perceptual identification is shown to be an emergent property of Grain networks that obey a specific architectural constraint. In the second case study, the general shape of time-accuracy curves produced by networks that conform to the principles is examined. A regularity here called Wickelgren's law, concerning the approximate shape of time accuracy curves, is found to be consistent with some GRAIN networks. While the exact conditions that give rise to standard time accuracy curves remain to be established, one observation is made concerning conditions that can lead to a strong violation of Wickelgren's law, and an experimental study that can be interpreted as meeting this condition is simulated. In both case studies, the joint dependency of the emergent characteristics of information processing on the different principles is considered. 
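For intuition only, here is one way to caricature ``graded, gradual, interactive, and intrinsically variable'' processing in a few lines of C: a leaky-integrator update over symmetric (interactive) connections with added noise. Every weight, input, and constant below is invented for the sketch; this is not the model or the equations of the technical report.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N  3        /* units (illustrative)                  */
    #define DT 0.05     /* small step size: "gradual" processing */

    /* Crude zero-mean, roughly Gaussian noise (sum of 12 uniforms);
       stands in for the "intrinsic variability" principle. */
    static double noise(double sd)
    {
        double s = 0.0;
        int i;
        for (i = 0; i < 12; i++)
            s += (double)rand() / RAND_MAX;
        return sd * (s - 6.0);
    }

    int main(void)
    {
        /* Symmetric weights give interactive (bidirectional) influence;
           the negative weight adds a competitive interaction.  Values
           are arbitrary, chosen only to make the sketch run. */
        double w[N][N] = {{0.0, 1.0, -1.0}, {1.0, 0.0, 0.5}, {-1.0, 0.5, 0.0}};
        double ext[N]  = {0.3, 0.0, -0.2};   /* constant external input */
        double a[N]    = {0.0, 0.0, 0.0};    /* graded activations      */
        double next[N];
        int t, i, j;

        for (t = 0; t < 200; t++) {
            for (i = 0; i < N; i++) {
                double net = ext[i];
                for (j = 0; j < N; j++)
                    net += w[i][j] * a[j];
                /* graded activation, gradual change, intrinsic noise */
                next[i] = a[i] + DT * (tanh(net) - a[i]) + noise(0.05);
            }
            for (i = 0; i < N; i++)
                a[i] = next[i];
        }
        printf("activations after 200 steps: %.2f %.2f %.2f\n", a[0], a[1], a[2]);
        return 0;
    }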
--------------------------------------------------------------- LEARNING CONTINUOUS PROBABILITY DISTRIBUTIONS WITH THE CONTRASTIVE HEBBIAN ALGORITHM Javier R. Movellan & James L. McClelland Technical Report PDP.CNS.91.2 We show that contrastive Hebbian learning (CHL), a well known learning rule previously used to train Boltzmann machines and Hopfield models, can also be used to train networks that adhere to the principles of graded, random, and interactive propagation of information. We show that when applied to such networks, CHL performs gradient descent on a contrastive function which captures the difference between desired and obtained continuous multivariate probability distributions. This allows the learning algorithm to go beyond expected values of output units and to approximate complete probability distributions on continuous multivariate activation spaces. Simulations show that such networks can indeed be trained with the CHL rule to approximate discrete and continuous probability distributions of various types. We briefly discuss the implications of stochasticity in our interpretation of information processing concepts. -------------------------------------------------------------- To Obtain Copies: To minimize printing/mailing costs, we strongly encourage interested readers of this mailing list to get the reports via ftp. The filenames for the reports are * pdp.cns.91.1.ps.Z for the McClelland paper (42 pages, no figures) * pdp.cns.91.2.ps.Z for the Movellan and McClelland paper (42 pages, includes figures). Full instructions are given below. For those who do not have access to ftp, physical copies can be requested from: bd1q+ at andrew.cmu.edu You can also request just the figures of the McClelland paper. In your email please indicate exactly what you are requesting. Figures for the McClelland paper will be sent promptly; physical copies of either complete TR will be sent within a few weeks of receipt of the request. Instructions: To obtain copies via ftp use the following commands: unix> ftp 128.2.248.152 Name: anonymous Password: pdp.cns ftp> cd pub/pdp.cns ftp> binary ftp> get pdp.cns.91.1.ps.Z (or pdp.cns.91.2.ps.Z) ftp> quit unix> uncompress pdp.cns.91.1.ps.Z | lpr ------------------------------------------------------------------------------ If you need further help, please contact me: Javier R. Movellan...... jm2z at andrew.cmu.edu Department of Psychology..... 412/268-5145(voice) Carnegie Mellon University 412/268-5060(Fax) Pittsburgh, PA 15213-3890 - Javier From dlukas at park.bu.edu Wed Jan 22 15:14:27 1992 From: dlukas at park.bu.edu (dlukas@park.bu.edu) Date: Wed, 22 Jan 92 15:14:27 -0500 Subject: Call For Papers: Neural Networks for Learning, Recognition and Control Message-ID: <9201222014.AA21377@fenway.bu.edu> CALL FOR PAPERS International Conference on NEURAL NETWORKS FOR LEARNING, RECOGNITION, AND CONTROL May 14-16, 1992 Gail A. Carpenter and Stephen Grossberg CONFIRMED SPEAKERS: May 14: R. Shiffrin, R. Ratcliff, D. Rumelhart. May 15: M. Mishkin, L. Squire, S. Grossberg, T. Berger, M. Bear, G. Carpenter, A. Waxman, T. Caudell. May 16: G. Cybenko, E. Sontag, R. Brockett, B. Peterson, D. Bullock, J. Albus, K. Narendra, R. Pap. CONTRIBUTED PAPERS: A featured 3-hour poster session on neural network research related to learning, recognition, and control will be held on May 15, 1992. Attendees who wish to present a poster should submit three copies of an abstract (one single-space page), post-marked by March 1, 1992, for refereeing. 
Include a cover letter giving the name, address, and telephone number of the corresponding author. Mail to: Poster Session, Neural Networks Conference, Wang Institute of Boston University, 72 Tyng Road, Tyngsboro, MA 01879. Authors will be informed of abstract acceptance by March 31, 1992. A book of lecture and poster abstracts will be given to attendees at the conference. For information about registration and the two neural network tutorial courses being taught on May 9-14, call (508) 649-9731 (x255) or request a meeting brochure in writing when submitting your abstract. From jfj%FRLIM51.BITNET at BITNET.CC.CMU.EDU Thu Jan 23 10:14:16 1992 From: jfj%FRLIM51.BITNET at BITNET.CC.CMU.EDU (jfj%FRLIM51.BITNET@BITNET.CC.CMU.EDU) Date: Thu, 23 Jan 92 16:14:16 +0100 Subject: NNs & NLP Message-ID: <9201231514.AA14149@m53.limsi.fr> Hi. About a month ago, I posted a request for references concerning neural networs and natural language processing. Quite a few people have been kind enough to reply, and I am in the process of compiling a bibliography list. This should be completed soon (I'm still waiting for a few references to arrive by hard-mail), and I'll post the results. If anyone out there hasn't yet replied, and would like to do so, I'll be glad to add their contribution to the list. Thank you for your help, jfj From josh at flash.bellcore.com Thu Jan 23 09:54:10 1992 From: josh at flash.bellcore.com (Joshua Alspector) Date: Thu, 23 Jan 92 09:54:10 -0500 Subject: Postdoctoral position at Bellcore Message-ID: <9201231454.AA13262@flash.bellcore.com> The neural network research group at Bellcore is looking for a post-doctoral researcher for a period of 1 year. The start date is flexible but should be before August, 1992. Because of its inherently parallel nature, neural network technology is particularly suited for the types of real-time computation needed in telecommunications. This includes data compression, signal processing, optimization, and speech and pattern recognition. Neural network training is also suitable for knowledge-based systems where the rules are not known or where there are too many rules to incorporate in an expert system. The goal of our group is to bring neural network technology into the telecommunications network. Our approach to developing and using neural network technology to encode knowledge by learning includes work on the following: 1) Development, analysis, and simulation of learning algorithms and architectures. 2) Design and fabrication of prototype chips and boards suitable for parallel, high-speed, neural systems. 3) Telecommunications applications. We are interested in strong candidates in any of the above work areas but are especially encouraging people who can demonstrate the usefulness of the technology in telecommunications applications. The successful candidate should have a demonstrated record of accomplishment in neural network research, should be proficient at working in a UNIX/C environment, and should be able to work interactively in the applied research area at Bellcore. Apply in writing to: Joshua Alspector Bellcore, MRE 2E-378 445 South St. Morristown, NJ 07962-1910 Please enclose a resume, a copy of a recent paper, and the names, addresses, and phone numbers of three referees. 
From yoshua at psyche.mit.edu Thu Jan 23 12:55:40 1992 From: yoshua at psyche.mit.edu (Yoshua Bengio) Date: Thu, 23 Jan 92 12:55:40 EST Subject: Optimizing a learning rule Message-ID: <9201231755.AA15379@psyche.mit.edu> Hello, Recently, Henrik Klagges broadcast on the connectionists list results he obtained on optimizing synaptic learning rules, citing our tech report from last year [1] on this subject. This report [1] did not contain any simulation results. However, since then, we have been able to perform several series of experiments, with interesting results. Early results were presented last year at Snowbird and more recent results will be presented at the Conference on Optimality in Biological and Artificial Networks, to be held in Dallas, TX, Feb. 6-9. A preprint can be obtained by anonymous ftp at iros1.umontreal.ca in directory pub/IRO/its/bengio.optim.ps.Z (compressed PostScript file). The title of the paper to be presented at Dallas is: On the optimization of a synaptic learning rule, by Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. Abstract: This paper presents an original approach to neural modeling based on the idea of tuning synaptic learning rules with optimization methods. This approach relies on the idea of considering the synaptic modification rule as a parametric function which has {\it local} inputs and is the same in many neurons. Because the space of learning algorithms is very large, we propose to use biological knowledge about synaptic mechanisms in order to design the form of such rules. The optimization methods used for this search do not have to be biologically plausible, although the net result of this search may be a biologically plausible learning rule. In the experiments described in this paper, a local optimization method (gradient descent) as well as a global optimization method (simulated annealing) were used to search for new learning rules. Estimation of parameters of synaptic modification rules consists of a joint global optimization of the rules themselves, as well as of multiple networks that learn to perform some tasks with these rules. Experiments are described in order to assess the feasibility of the proposed method for very simple tasks. Experiments on classical conditioning for {\it Aplysia} yielded a rule that allowed a network to reproduce five basic conditioning phenomena. Experiments with two-dimensional categorization problems yielded a rule for a network with a hidden layer that could be used to learn some simple but non-linearly separable classification tasks. The rule parameters were optimized for a set of classification tasks and the generalization was tested successfully on a different set of tasks. Previous work: [1] Bengio Y. and Bengio S. (1990). Learning a synaptic learning rule. Technical Report #751, Computer Science Department, Universite de Montreal. [2] Bengio Y., Bengio S., and Cloutier, J. (1991). Learning a synaptic learning rule. IJCNN-91-Seattle. Related work: [3] Chalmers D.J. (1990). The evolution of learning: an experiment in genetic connectionism. In: Connectionist Models: Proceedings of the 1990 Summer School, pp. 81-90. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Thu Jan 23 13:55:36 1992 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Thu, 23 Jan 92 13:55:36 EST Subject: CascadeC variants, Sjogaard's paper In-Reply-To: Your message of 21 Jan 92 22:49:41 +0100.
<9201212149.AA01642@sunmanager.lrz-muenchen.de> Message-ID: Steen Sjogaard makes some interesting observations about the Cascade Correlation algorithm. He invents a variation of it, named 2CCA, which uses only two layers, but also relies on freezing weights, error covariance, candidate pools, etc., as the CCA does. Some people don't like to see much discussion on this mailing list, so I'll keep this response as brief as possible. Sjogaard's 2CCA algorithm is just like cascade-correlation except that it eliminates the "cascade" part: the candidate units receive connections only from the original inputs, and not from previously tenured hidden units. So it builds a net with a single hidden layer, plus shortcut connections from inputs to outputs. For some problems, a solution with one hidden layer is as good as any other, and for these problems 2CCA will learn a bit faster and generalize a bit better than Cascor. The extra degrees of freedom in Cascor have nothing useful to do and they just get in the way. However, for other problems, such as two-spirals, you get a much better solution with more layers. A cascade architecture can solve this problem with 10 units, while a single hidden layer requires something like 50 or 60. My own conclusion is that 2CCA does work somewhat better for certain problems, but is terrible for others. If you don't know in advance what architecture your problem needs, you are probably better off sticking with the more general cascade architecture. Chris Lebiere looked briefly at the following option: create two pools of candidate units, one that receives connections from all pre-existing inputs and hidden units, and one that has no connections from the deepest layer created so far. If the correlation scores are pretty close, tenure goes to the best-scoring unit in the latter pool. This new unit is a sibling of the pre-existing units, not a descendant. Preliminary results were promising, but for various reasons Chris didn't follow up on this, and it is still an open question what effect this might have on generalization. Scott Fahlman School of Computer Science Carnegie Mellon University From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Thu Jan 23 14:08:49 1992 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Thu, 23 Jan 92 14:08:49 EST Subject: CascadeC in parallel In-Reply-To: Your message of 22 Jan 92 12:07:34 +0100. <9201221107.AA02532@sunmanager.lrz-muenchen.de> Message-ID: It is less than obvious how to parallelize Cascade Correlation, especially if compared to stuff like backprop. There are several good ways to partition Cascade-Correlation for parallel execution. Of course, the choice depends on the dimensions of your problem, and on the dimensions and communication structure of your machine. One obvious choice is indeed to run each of the candidate units on its own separate processor. There is little communication involved: just broadcasting the incoming values and the error to be matched (if these are not stored locally) and occasional polling to determine quiescence and choose a winner. For hard problems, Cascor spends most of its time in the candidate training phase, so this can give good results. If there are many output units, they can also be trained separately, one per processor. It is possible to overlap candidate and output training to some degree. The other obvious choice is to simulate identical copies of the whole architecture on different processors, each with 1/N of the training data.
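A minimal C sketch of this second, data-parallel arrangement (as the reply goes on to say, the per-copy derivatives are summed before every weight update so all copies stay identical); the sizes and the serial loop standing in for the inter-processor reduction are invented for illustration.

    #include <stdio.h>

    #define NCOPIES  4   /* identical network copies, one per processor (illustrative) */
    #define NWEIGHTS 8   /* weights in the shared architecture          (illustrative) */

    /* Each copy accumulates dError/dWeight over its 1/N share of the
       training set; summing the per-copy derivatives reproduces the
       full-batch derivatives, so every copy can then apply the same
       weight update. */
    void combine_derivatives(double local[NCOPIES][NWEIGHTS], double total[NWEIGHTS])
    {
        int c, w;
        for (w = 0; w < NWEIGHTS; w++) {
            total[w] = 0.0;
            for (c = 0; c < NCOPIES; c++)
                total[w] += local[c][w];
        }
    }

    int main(void)
    {
        double local[NCOPIES][NWEIGHTS] = {{0.0}};  /* filled by each copy's pass  */
        double total[NWEIGHTS];

        local[0][0] = 0.5;                          /* made-up partial derivatives */
        local[1][0] = -0.25;
        combine_derivatives(local, total);
        printf("summed dE/dw[0] = %f\n", total[0]);
        return 0;
    }

On a real machine the inner sum would be a broadcast-and-reduce exchange among processors rather than a serial loop, but the arithmetic is the same.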
Just before each weight-update phase, you sum up the computed derivative values from all the processors and update all the copies at once. Of course, there's much less total work to be done than with backprop, so you might need a bigger problem to get any real advantage from parallelism. But I find it hard to think of that as a disadvantage. Scott Fahlman School of Computer Science Carnegie Mellon University From seifert at csmil.umich.edu Thu Jan 23 17:03:41 1992 From: seifert at csmil.umich.edu (Colleen Seifert) Date: Thu, 23 Jan 92 17:03:41 -0500 Subject: position announcement Message-ID: <9201232203.AA17879@csmil.umich.edu> The University of Michigan Department of Psychology invites applications for a tenure-track position in the area of Cognitive Modelling. We seek candidates with primary interests and technical skills in cognitive psychology, with special preference for individuals with particular expertise in computational modelling (broadly defined, including connectionist modelling). Due to time constraints, please indicate interest via email (to "gmo at csmil.umich.edu") along with sending vita, references, publications, and statement of research and teaching interests to: Cognitive Processes Search Committee, Dept. of Psychology, University of Michigan, Ann Arbor, MI 48109. From russ at oceanus.mitre.org Fri Jan 24 07:25:35 1992 From: russ at oceanus.mitre.org (Russell Leighton) Date: Fri, 24 Jan 92 07:25:35 EST Subject: Aspirin/MIGRAINES V5.0 Message-ID: <9201241225.AA29557@oceanus.mitre.org> ------- OFFICIAL RELEASE! All pre-release 5.0 versions should be deleted ------- The following describes a neural network simulation environment made available free from the MITRE Corporation. The software contains a neural network simulation code generator which generates high performance C code implementations for backpropagation networks. Also included is an interface to visualization tools. FREE NEURAL NETWORK SIMULATOR AVAILABLE Aspirin/MIGRAINES Version 5.0 The Mitre Corporation is making available free to the public a neural network simulation environment called Aspirin/MIGRAINES. The software consists of a code generator that builds neural network simulations by reading a network description (written in a language called "Aspirin") and generates a C simulation. An interface (called "MIGRAINES") is provided to export data from the neural network to visualization tools. The system has been ported to a number of platforms: Apollo Convex Cray DecStation HP IBM RS/6000 Intel 486/386 (Unix System V) NeXT News Silicon Graphics Iris Sun4, Sun3 Coprocessors: Mercury i860 (40MHz) Coprocessors Meiko Computing Surface w/i860 (40MHz) Nodes Skystation i860 (40MHz) Coprocessors iWarp Cells Included with the software are "config" files for these platforms. Porting to other platforms may be done by choosing the "closest" platform currently supported and adapting the config files. Aspirin 5.0 ------------ The software that we are releasing now is for creating, and evaluating, feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, as well as at casual users who will wish to use the system with an absolute minimum of effort. 
Aspirin was originally conceived as ``a way of dealing with MIGRAINES.'' Our goal was to create an underlying system that would exist behind the graphics and provide the network modeling facilities. The system had to be flexible enough to allow research, that is, make it easy for a user to make frequent, possibly substantial, changes to network designs and learning algorithms. At the same time it had to be efficient enough to allow large ``real-world'' neural network systems to be developed. Aspirin uses a front-end parser and code generators to realize this goal. A high level declarative language has been developed to describe a network. This language was designed to make commonly used network constructs simple to describe, but to allow any network to be described. The Aspirin file defines the type of network, the size and topology of the network, and descriptions of the network's input and output. This file may also include information such as initial values of weights, names of user defined functions. The Aspirin language is based around the concept of a "black box". A black box is a module that (optionally) receives input and (necessarily) produces output. Black boxes are autonomous units that are used to construct neural network systems. Black boxes may be connected arbitrarily to create large possibly heterogeneous network systems. As a simple example, pre or post-processing stages of a neural network can be considered black boxes that do not learn. The output of the Aspirin parser is sent to the appropriate code generator that implements the desired neural network paradigm. The goal of Aspirin is to provide a common extendible front-end language and parser for different network paradigms. The publicly available software will include a backpropagation code generator that supports several variations of the backpropagation learning algorithm. For backpropagation networks and their variations, Aspirin supports a wide variety of capabilities: 1. feed-forward layered networks with arbitrary connections 2. ``skip level'' connections 3. one and two-dimensional weight tessellations 4. a few node transfer functions (as well as user defined) 5. connections to layers/inputs at arbitrary delays, also "Waibel style" time-delay neural networks 6. autoregressive nodes. 7. line search and conjugate gradient optimization The file describing a network is processed by the Aspirin parser and files containing C functions to implement that network are generated. This code can then be linked with an application which uses these routines to control the network. Optionally, a complete simulation may be automatically generated which is integrated with the MIGRAINES interface and can read data in a variety of file formats. Currently supported file formats are: Ascii Type1, Type2, Type3 Type4 Type5 (simple floating point file formats) ProMatlab Examples -------- A set of examples comes with the distribution: xor: from RumelHart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 330-334. encode: from RumelHart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 335-339. detect: Detecting a sine wave in noise. iris: The classic iris database. characters: Learing to recognize 4 characters independent of rotation. ring: Autoregressive network learns a decaying sinusoid impulse response. sequence: Autoregressive network learns to recognize a short sequence of orthonormal vectors. sonar: from Gorman, R. 
P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89. spiral: from Kevin J. Lang and Michael J, Witbrock, "Learning to Tell Two Spirals Apart", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988. ntalk: from Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168. perf: a large network used only for performance testing. monk: The backprop part of the monk paper. The MONK's problem were the basis of a first international comparison of learning algorithms. The result of this comparison is summarized in "The MONK's Problems - A Performance Comparison of Different Learning algorithms" by S.B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, and J. Zhang has been published as Technical Report CS-CMU-91-197, Carnegie Mellon University in Dec. 1991. Performance of Aspirin simulations ---------------------------------- The backpropagation code generator produces simulations that run very efficiently. Aspirin simulations do best on vector machines when the networks are large, as exemplified by the Cray's performance. All simulations were done using the Unix "time" function and include all simulation overhead. The connections per second rating was calculated by multiplying the number of iterations by the total number of connections in the network and dividing by the "user" time provided by the Unix time function. Two tests were performed. In the first, the network was simply run "forward" 100,000 times and timed. In the second, the network was timed in learning mode and run until convergence. Under both tests the "user" time included the time to read in the data and initialize the network. Sonar: This network is a two layer fully connected network with 60 inputs: 2-34-60. Millions of Connections per Second Forward: SparcStation1: 1 IBM RS/6000 320: 2.8 HP9000/730: 4.0 Meiko i860 (40MHz) : 4.4 Mercury i860 (40MHz) : 5.6 Cray YMP: 21.9 Cray C90: 33.2 Forward/Backward: SparcStation1: 0.3 IBM RS/6000 320: 0.8 Meiko i860 (40MHz) : 0.9 HP9000/730: 1.1 Mercury i860 (40MHz) : 1.3 Cray YMP: 7.6 Cray C90: 13.5 Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89. Nettalk: This network is a two layer fully connected network with [29 x 7] inputs: 26-[15 x 8]-[29 x 7] Millions of Connections per Second Forward: SparcStation1: 1 IBM RS/6000 320: 3.5 HP9000/730: 4.5 Mercury i860 (40MHz) : 12.4 Meiko i860 (40MHz) : 12.6 Cray YMP: 113.5 Cray C90: 220.3 Forward/Backward: SparcStation1: 0.4 IBM RS/6000 320: 1.3 HP9000/730: 1.7 Meiko i860 (40MHz) : 2.5 Mercury i860 (40MHz) : 3.7 Cray YMP: 40 Cray C90: 65.6 Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168. Perf: This network was only run on a few systems. It is very large with very long vectors. The performance on this network is in some sense a peak performance for a machine. 
This network is a two layer fully connected network with 2000 inputs: 100-500-2000 Millions of Connections per Second Forward: Cray YMP 103.00 Cray C90 220 Forward/Backward: Cray YMP 25.46 Cray C90 59.3 MIGRAINES ------------ The MIGRAINES interface is a terminal based interface that allows you to open Unix pipes to data in the neural network. This replaces the NeWS1.1 graphical interface in version 4.0 of the Aspirin/MIGRAINES software. The new interface is not a simple to use as the version 4.0 interface but is much more portable and flexible. The MIGRAINES interface allows users to output neural network weight and node vectors to disk or to other Unix processes. Users can display the data using either public or commercial graphics/analysis tools. Example filters are included that convert data exported through MIGRAINES to formats readable by: - Gnuplot 3.0 - Matlab - Mathematica Most of the examples (see above) use the MIGRAINES interface to dump data to disk and display it using a public software package called Gnuplot3.0. Gnuplot3.0 can be obtained via anonymous ftp from: >>>> In general, Gnuplot 3.0 is available as the file gnuplot3.0.tar.Z. >>>> Please obtain gnuplot from the site nearest you. Many of the major ftp >>>> archives world-wide have already picked up the latest version, so if >>>> you found the old version elsewhere, you might check there. >>>> >>>> >>>> USENET users: >>>> >>>> GNUPLOT 3.0 was posted to comp.sources.misc. >>>> >>>> >>>> NORTH AMERICA: >>>> >>>> Anonymous ftp to dartmouth.edu (129.170.16.4) >>>> Fetch >>>> pub/gnuplot/gnuplot3.0.tar.Z >>>> in binary mode. >>>>>>>> A special hack for NeXTStep may be found on 'sonata.cc.purdue.edu' >>>>>>>> in the directory /pub/next/submissions. The gnuplot3.0 distribution >>>>>>>> is also there (in that directory). >>>>>>>> >>>>>>>> There is a problem to be aware of--you will need to recompile. >>>>>>>> gnuplot has a minor bug, so you will need to compile the command.c >>>>>>>> file separately with the HELPFILE defined as the entire path name >>>>>>>> (including the help file name.) If you don't, the Makefile will over >>>>>>>> ride the def and help won't work (in fact it will bomb the program.) NetTools ----------- We have include a simple set of analysis tools by Simon Dennis and Steven Phillips. They are used in some of the examples to illustrate the use of the MIGRAINES interface with analysis tools. The package contains three tools for network analysis: gea - Group Error Analysis pca - Principal Components Analysis cda - Canonical Discriminants Analysis How to get Aspirin/MIGRAINES ----------------------- The software is available from two FTP sites, CMU's simulator collection and UCLA's cognitive science machines. The compressed tar file is a little less than 2 megabytes. Most of this space is taken up by the documentation and examples. The software is currently only available via anonymous FTP. > To get the software from CMU's simulator collection: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu" (128.2.254.155). 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "/afs/cs/project/connect/code". Any subdirectories of this one should also be accessible. Parent directories should not be. ****You must do this in a single operation****: cd /afs/cs/project/connect/code 4. At this point FTP should be able to get a listing of files in this directory and fetch the ones you want. Problems? - contact us at "connectionists-request at cs.cmu.edu". 5. 
Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 6. Get the file "am5.tar.Z" > To get the software from UCLA's cognitive science machines: 1. Create an FTP connection to "polaris.cognet.ucla.edu" (128.97.50.3) (typically with the command "ftp 128.97.50.3") 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "alexis", by typing the command "cd alexis" 4. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 5. Get the file by typing the command "get am5.tar.Z" How to unpack the software -------------------------- After ftp'ing the file make the directory you wish to install the software. Go to that directory and type: zcat am5.tar.Z | tar xvf - -or- uncompress am5.tar.Z ; tar xvf am5.tar How to print the manual ----------------------- The user documentation is located in ./doc in a few compressed PostScript files. To print each file on a PostScript printer type: uncompress *.Z lpr -s *.ps Why? ---- I have been asked why MITRE is giving away this software. MITRE is a non-profit organization funded by the U.S. federal government. MITRE does research and development into various technical areas. Our research into neural network algorithms and applications has resulted in this software. Since MITRE is a publically funded organization, it seems appropriate that the product of the neural network research be turned back into the technical community at large. Thanks ------ Thanks to the beta sites for helping me get the bugs out and make this portable. Thanks to the folks at CMU and UCLA for the ftp sites. Copyright and license agreement ------------------------------- Since the Aspirin/MIGRAINES system is licensed free of charge, the MITRE Corporation provides absolutely no warranty. Should the Aspirin/MIGRAINES system prove defective, you must assume the cost of all necessary servicing, repair or correction. In no way will the MITRE Corporation be liable to you for damages, including any lost profits, lost monies, or other special, incidental or consequential damages arising out of the use or in ability to use the Aspirin/MIGRAINES system. This software is the copyright of The MITRE Corporation. It may be freely used and modified for research and development purposes. We require a brief acknowledgement in any research paper or other publication where this software has made a significant contribution. If you wish to use it for commercial gain you must contact The MITRE Corporation for conditions of use. The MITRE Corporation provides absolutely NO WARRANTY for this software. January, 1992 Russell Leighton * * MITRE Signal Processing Center *** *** *** *** 7525 Colshire Dr. ****** *** *** ****** McLean, Va. 22102, USA ***************************************** ***** *** *** ****** INTERNET: russ at dash.mitre.org, ** *** *** *** leighton at mitre.org * * From arseno at phy.ulaval.ca Fri Jan 24 09:58:25 1992 From: arseno at phy.ulaval.ca (Henri Arsenault) Date: Fri, 24 Jan 92 09:58:25 EST Subject: programs Message-ID: <9201241458.AA15022@einstein.phy.ulaval.ca> There are a lot of long meetings programs with abstracts and so on being transmitted on this network. Would it not be more economical to transmit a short abstract along with instructions on how to ftp the whole document? Almost every day I have to scroll through long documents of marginal interest to me. Is it really necessary to put such long documents so all the subscribers have to read them? Henri H. 
From Ye-Yi.Wang at DEAD.BOLTZ.CS.CMU.EDU Fri Jan 24 10:18:24 1992 From: Ye-Yi.Wang at DEAD.BOLTZ.CS.CMU.EDU (Ye-Yi.Wang@DEAD.BOLTZ.CS.CMU.EDU) Date: Fri, 24 Jan 92 10:18:24 EST Subject: connectionist text generation Message-ID:

Could anyone give me a pointer to references on neural network text generation systems? Thanks. Ye-Yi

From russ at oceanus.mitre.org Fri Jan 24 13:15:23 1992 From: russ at oceanus.mitre.org (Russell Leighton) Date: Fri, 24 Jan 92 13:15:23 EST Subject: Free Neural Network Simulator (Aspirin V5.0) Message-ID: <9201241815.AA03381@oceanus.mitre.org>

------- OFFICIAL RELEASE! All pre-release 5.0 versions should be deleted -------

The following describes a neural network simulation environment made available free from the MITRE Corporation. The software contains a neural network simulation code generator which generates high-performance C code implementations for backpropagation networks. Also included is an interface to visualization tools.

FREE NEURAL NETWORK SIMULATOR AVAILABLE
Aspirin/MIGRAINES Version 5.0

The MITRE Corporation is making available free to the public a neural network simulation environment called Aspirin/MIGRAINES. The software consists of a code generator that builds neural network simulations by reading a network description (written in a language called "Aspirin") and generating a C simulation. An interface (called "MIGRAINES") is provided to export data from the neural network to visualization tools.

The system has been ported to a number of platforms:
Apollo
Convex
Cray
DecStation
HP
IBM RS/6000
Intel 486/386 (Unix System V)
NeXT
News
Silicon Graphics Iris
Sun4, Sun3

Coprocessors:
Mercury i860 (40MHz) Coprocessors
Meiko Computing Surface w/i860 (40MHz) Nodes
Skystation i860 (40MHz) Coprocessors
iWarp Cells

Included with the software are "config" files for these platforms. Porting to other platforms may be done by choosing the "closest" platform currently supported and adapting the config files.

Aspirin 5.0
------------

The software that we are releasing now is for creating and evaluating feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, and at casual users who wish to use the system with an absolute minimum of effort.

Aspirin was originally conceived as ``a way of dealing with MIGRAINES.'' Our goal was to create an underlying system that would exist behind the graphics and provide the network modeling facilities. The system had to be flexible enough to allow research, that is, make it easy for a user to make frequent, possibly substantial, changes to network designs and learning algorithms. At the same time it had to be efficient enough to allow large ``real-world'' neural network systems to be developed.

Aspirin uses a front-end parser and code generators to realize this goal. A high level declarative language has been developed to describe a network. This language was designed to make commonly used network constructs simple to describe, but to allow any network to be described. The Aspirin file defines the type of network, the size and topology of the network, and descriptions of the network's input and output. This file may also include information such as initial values of weights and names of user-defined functions.

The Aspirin language is based around the concept of a "black box". A black box is a module that (optionally) receives input and (necessarily) produces output. Black boxes are autonomous units that are used to construct neural network systems. Black boxes may be connected arbitrarily to create large, possibly heterogeneous network systems. As a simple example, pre- or post-processing stages of a neural network can be considered black boxes that do not learn.
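As a rough sketch of this composition idea (an illustration only, not Aspirin's actual interface or its generated code; every name below is hypothetical), a black box can be pictured in C as a module that owns its output vector and a compute step, so that boxes can be wired output-to-input and run in sequence:

    /* Illustration only: a conceptual "black box" in C.  Not Aspirin code. */
    typedef struct BlackBox {
        int n_in, n_out;
        const float *in;                          /* input vector (may be NULL) */
        float *out;                               /* output vector (required)   */
        void (*compute)(struct BlackBox *self);   /* produce out from in        */
    } BlackBox;

    /* A non-learning pre-processing stage: scale the input by a constant. */
    static void scale_compute(BlackBox *self)
    {
        for (int i = 0; i < self->n_out; i++)
            self->out[i] = 0.5f * self->in[i];
    }

    /* Wire box b so that it reads the output of box a. */
    static void connect(BlackBox *a, BlackBox *b)
    {
        b->in = a->out;
        b->n_in = a->n_out;
    }

    /* Run a chain of boxes in order; each one's output feeds the next. */
    static void run_chain(BlackBox **chain, int n)
    {
        for (int i = 0; i < n; i++)
            chain[i]->compute(chain[i]);
    }

A learning network, a normalization stage, or a file reader would each be just another compute function behind the same kind of interface.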
The output of the Aspirin parser is sent to the appropriate code generator that implements the desired neural network paradigm. The goal of Aspirin is to provide a common extendible front-end language and parser for different network paradigms. The publicly available software will include a backpropagation code generator that supports several variations of the backpropagation learning algorithm. For backpropagation networks and their variations, Aspirin supports a wide variety of capabilities:

1. feed-forward layered networks with arbitrary connections
2. ``skip level'' connections
3. one and two-dimensional weight tessellations
4. a few node transfer functions (as well as user-defined ones)
5. connections to layers/inputs at arbitrary delays, also "Waibel style" time-delay neural networks
6. autoregressive nodes
7. line search and conjugate gradient optimization

The file describing a network is processed by the Aspirin parser and files containing C functions to implement that network are generated. This code can then be linked with an application which uses these routines to control the network. Optionally, a complete simulation may be automatically generated which is integrated with the MIGRAINES interface and can read data in a variety of file formats. Currently supported file formats are:
Ascii
Type1, Type2, Type3, Type4, Type5 (simple floating point file formats)
ProMatlab

Examples
--------

A set of examples comes with the distribution:

xor: from Rumelhart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 330-334.
encode: from Rumelhart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 335-339.
detect: Detecting a sine wave in noise.
iris: The classic iris database.
characters: Learning to recognize 4 characters independent of rotation.
ring: Autoregressive network learns a decaying sinusoid impulse response.
sequence: Autoregressive network learns to recognize a short sequence of orthonormal vectors.
sonar: from Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.
spiral: from Kevin J. Lang and Michael J. Witbrock, "Learning to Tell Two Spirals Apart", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988.
ntalk: from Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168.
perf: a large network used only for performance testing.
monk: The backprop part of the monk paper. The MONK's problems were the basis of a first international comparison of learning algorithms. The result of this comparison is summarized in "The MONK's Problems - A Performance Comparison of Different Learning Algorithms" by S.B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich, H. Vafaie, W. Van de Velde, W. Wenzel, J. Wnek, and J. Zhang, published as Technical Report CMU-CS-91-197, Carnegie Mellon University, Dec. 1991.
Performance of Aspirin simulations
----------------------------------

The backpropagation code generator produces simulations that run very efficiently. Aspirin simulations do best on vector machines when the networks are large, as exemplified by the Cray's performance. All simulations were done using the Unix "time" function and include all simulation overhead. The connections per second rating was calculated by multiplying the number of iterations by the total number of connections in the network and dividing by the "user" time provided by the Unix time function. Two tests were performed. In the first, the network was simply run "forward" 100,000 times and timed. In the second, the network was timed in learning mode and run until convergence. Under both tests the "user" time included the time to read in the data and initialize the network.
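To make that arithmetic concrete, here is a small illustrative calculation (not part of the released code; the user time below is a hypothetical value chosen only to show the formula). The sonar network described next has 60 inputs, 34 hidden units and 2 outputs, i.e. 60*34 + 34*2 = 2108 connections, ignoring bias connections:

    /* Illustration of the connections-per-second (CPS) rating described above. */
    #include <stdio.h>

    static double cps(long iterations, long connections, double user_seconds)
    {
        return (double)iterations * (double)connections / user_seconds;
    }

    int main(void)
    {
        long connections = 60L * 34L + 34L * 2L;   /* sonar net: 2108 connections */
        long iterations  = 100000L;                /* the "forward" test          */
        double user_time = 210.8;                  /* hypothetical "time" output  */

        /* 100000 * 2108 / 210.8 = 1.0e6, i.e. about 1 million CPS. */
        printf("%.1f million connections per second\n",
               cps(iterations, connections, user_time) / 1e6);
        return 0;
    }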
Sonar: This network is a two layer fully connected network with 60 inputs: 2-34-60.

Millions of Connections per Second
Forward:
  SparcStation1:         1
  IBM RS/6000 320:       2.8
  HP9000/730:            4.0
  Meiko i860 (40MHz):    4.4
  Mercury i860 (40MHz):  5.6
  Cray YMP:             21.9
  Cray C90:             33.2
Forward/Backward:
  SparcStation1:         0.3
  IBM RS/6000 320:       0.8
  Meiko i860 (40MHz):    0.9
  HP9000/730:            1.1
  Mercury i860 (40MHz):  1.3
  Cray YMP:              7.6
  Cray C90:             13.5

Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.

Nettalk: This network is a two layer fully connected network with [29 x 7] inputs: 26-[15 x 8]-[29 x 7]

Millions of Connections per Second
Forward:
  SparcStation1:         1
  IBM RS/6000 320:       3.5
  HP9000/730:            4.5
  Mercury i860 (40MHz): 12.4
  Meiko i860 (40MHz):   12.6
  Cray YMP:            113.5
  Cray C90:            220.3
Forward/Backward:
  SparcStation1:         0.4
  IBM RS/6000 320:       1.3
  HP9000/730:            1.7
  Meiko i860 (40MHz):    2.5
  Mercury i860 (40MHz):  3.7
  Cray YMP:             40
  Cray C90:             65.6

Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168.

Perf: This network was only run on a few systems. It is very large with very long vectors. The performance on this network is in some sense a peak performance for a machine. This network is a two layer fully connected network with 2000 inputs: 100-500-2000

Millions of Connections per Second
Forward:          Cray YMP 103.00   Cray C90 220
Forward/Backward: Cray YMP  25.46   Cray C90  59.3

MIGRAINES
------------

The MIGRAINES interface is a terminal-based interface that allows you to open Unix pipes to data in the neural network. This replaces the NeWS1.1 graphical interface in version 4.0 of the Aspirin/MIGRAINES software. The new interface is not as simple to use as the version 4.0 interface but is much more portable and flexible. The MIGRAINES interface allows users to output neural network weight and node vectors to disk or to other Unix processes. Users can display the data using either public or commercial graphics/analysis tools. Example filters are included that convert data exported through MIGRAINES to formats readable by:
- Gnuplot 3.0
- Matlab
- Mathematica

Most of the examples (see above) use the MIGRAINES interface to dump data to disk and display it using a public software package called Gnuplot 3.0. Gnuplot 3.0 can be obtained via anonymous ftp from:

>>>> In general, Gnuplot 3.0 is available as the file gnuplot3.0.tar.Z.
>>>> Please obtain gnuplot from the site nearest you. Many of the major ftp
>>>> archives world-wide have already picked up the latest version, so if
>>>> you found the old version elsewhere, you might check there.
>>>>
>>>> USENET users: GNUPLOT 3.0 was posted to comp.sources.misc.
>>>>
>>>> NORTH AMERICA: Anonymous ftp to dartmouth.edu (129.170.16.4).
>>>> Fetch pub/gnuplot/gnuplot3.0.tar.Z in binary mode.
>>>>>>>> A special hack for NeXTStep may be found on 'sonata.cc.purdue.edu'
>>>>>>>> in the directory /pub/next/submissions. The gnuplot3.0 distribution
>>>>>>>> is also there (in that directory).
>>>>>>>>
>>>>>>>> There is a problem to be aware of--you will need to recompile.
>>>>>>>> gnuplot has a minor bug, so you will need to compile the command.c
>>>>>>>> file separately with the HELPFILE defined as the entire path name
>>>>>>>> (including the help file name). If you don't, the Makefile will
>>>>>>>> override the def and help won't work (in fact it will bomb the program).

NetTools
-----------

We have included a simple set of analysis tools by Simon Dennis and Steven Phillips. They are used in some of the examples to illustrate the use of the MIGRAINES interface with analysis tools. The package contains three tools for network analysis:
gea - Group Error Analysis
pca - Principal Components Analysis
cda - Canonical Discriminants Analysis

How to get Aspirin/MIGRAINES
-----------------------

The software is available from two FTP sites, CMU's simulator collection and UCLA's cognitive science machines. The compressed tar file is a little less than 2 megabytes. Most of this space is taken up by the documentation and examples. The software is currently only available via anonymous FTP.

> To get the software from CMU's simulator collection:
1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu" (128.2.254.155).
2. Log in as user "anonymous" with password your username.
3. Change remote directory to "/afs/cs/project/connect/code". Any subdirectories of this one should also be accessible. Parent directories should not be. ****You must do this in a single operation****: cd /afs/cs/project/connect/code
4. At this point FTP should be able to get a listing of files in this directory and fetch the ones you want. Problems? - contact us at "connectionists-request at cs.cmu.edu".
5. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT **
6. Get the file "am5.tar.Z"

> To get the software from UCLA's cognitive science machines:
1. Create an FTP connection to "polaris.cognet.ucla.edu" (128.97.50.3) (typically with the command "ftp 128.97.50.3")
2. Log in as user "anonymous" with password your username.
3. Change remote directory to "alexis", by typing the command "cd alexis"
4. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT **
5. Get the file by typing the command "get am5.tar.Z"

How to unpack the software
--------------------------

After ftp'ing the file, make the directory in which you wish to install the software. Go to that directory and type:

zcat am5.tar.Z | tar xvf -
-or-
uncompress am5.tar.Z ; tar xvf am5.tar

How to print the manual
-----------------------

The user documentation is located in ./doc in a few compressed PostScript files. To print each file on a PostScript printer type:

uncompress *.Z
lpr -s *.ps

Why?
----

I have been asked why MITRE is giving away this software. MITRE is a non-profit organization funded by the U.S. federal government. MITRE does research and development into various technical areas.
Our research into neural network algorithms and applications has resulted in this software. Since MITRE is a publicly funded organization, it seems appropriate that the product of the neural network research be turned back into the technical community at large.

Thanks
------

Thanks to the beta sites for helping me get the bugs out and make this portable. Thanks to the folks at CMU and UCLA for the ftp sites.

Copyright and license agreement
-------------------------------

Since the Aspirin/MIGRAINES system is licensed free of charge, the MITRE Corporation provides absolutely no warranty. Should the Aspirin/MIGRAINES system prove defective, you must assume the cost of all necessary servicing, repair or correction. In no way will the MITRE Corporation be liable to you for damages, including any lost profits, lost monies, or other special, incidental or consequential damages arising out of the use or inability to use the Aspirin/MIGRAINES system. This software is the copyright of The MITRE Corporation. It may be freely used and modified for research and development purposes. We require a brief acknowledgement in any research paper or other publication where this software has made a significant contribution. If you wish to use it for commercial gain you must contact The MITRE Corporation for conditions of use. The MITRE Corporation provides absolutely NO WARRANTY for this software.

January, 1992

Russell Leighton * * MITRE Signal Processing Center *** *** *** *** 7525 Colshire Dr. ****** *** *** ****** McLean, Va. 22102, USA ***************************************** ***** *** *** ****** INTERNET: russ at dash.mitre.org, ** *** *** *** leighton at mitre.org * *

From bap at james.psych.yale.edu Fri Jan 24 16:45:02 1992 From: bap at james.psych.yale.edu (Barak Pearlmutter) Date: Fri, 24 Jan 92 16:45:02 -0500 Subject: YANIPSPPI (Yet Another NIPS PrePrint by Internet) Message-ID: <9201242145.AA16059@james.psych.yale.edu>

Because of the large number of requests for preprints of "Gradient descent: second-order momentum and saturating error", I am following the trend and making it available by FTP. I apologize to those who left their cards, but the effort and expense of distribution is prohibitive. If you cannot access the paper in this fashion but really must have a copy before the proceedings come out, please contact me.

FTP INSTRUCTIONS:
ftp JAMES.PSYCH.YALE.EDU
user anonymous
password state-your-name
cd pub/bap/asymp
binary
get nips91.PS.Z
quit
zcat nips91.PS.Z | lpr

Maybe next year, instead of contracting out the proceedings, we can require PostScript from all contributors and everyone will print everything out at home. Money saved will be used to purchase giant staplers.

Barak Pearlmutter Yale University Department of Psychology 11A Yale Station New Haven, CT 06520-7447 pearlmutter-barak at yale.edu

From yoshua at psyche.mit.edu Fri Jan 24 20:34:38 1992 From: yoshua at psyche.mit.edu (Yoshua Bengio) Date: Fri, 24 Jan 92 20:34:38 EST Subject: optimization of learning rule: erratum Message-ID: <9201250134.AA26747@psyche.mit.edu>

Hi, My previous message mentioned the availability of a preprint on the optimization of learning rules at an ftp site. There was a typo in the address.
The correct address is: iros1.iro.umontreal.ca or 132.204.32.21 and the compressed PostScript file is in pub/IRO/its/bengio.optim.ps.Z Sorry, Yoshua Bengio

From Dave_Touretzky at DST.BOLTZ.CS.CMU.EDU Sat Jan 25 23:49:31 1992 From: Dave_Touretzky at DST.BOLTZ.CS.CMU.EDU (Dave_Touretzky@DST.BOLTZ.CS.CMU.EDU) Date: Sat, 25 Jan 92 23:49:31 EST Subject: programs In-Reply-To: Your message of Fri, 24 Jan 92 09:58:25 -0500. <9201241458.AA15022@einstein.phy.ulaval.ca> Message-ID: <15398.696401371@DST.BOLTZ.CS.CMU.EDU>

Henri Arsenault writes:
> There are a lot of long meeting programs with abstracts and so on being
> transmitted on this network. Would it not be more economical to transmit
> a short abstract along with instructions on how to ftp the whole
> document? Almost every day I have to scroll through long documents of
> marginal interest to me. Is it really necessary to put such long
> documents so all the subscribers have to read them?

Meeting programs and abstracts are entirely appropriate materials for the CONNECTIONISTS list. Not all our subscribers have access to FTP, and most would consider it an unreasonable inconvenience to have to FTP such things. My advice to you is to learn how to use your mail-reading program correctly. In most mail readers, long messages that don't interest you can be skipped with a single keystroke. If you are scrolling through the whole message in order to get to the next one, go back and read the user's manual. Please, let's have no more discussion of this topic on the CONNECTIONISTS list. *That* would be a waste of bandwidth. People who feel they absolutely have to comment on this can send their remarks to the maintainers: Connectionists-Request at cs.cmu.edu. -- Dave Touretzky

From U53076%UICVM.BITNET at bitnet.cc.cmu.edu Mon Jan 27 00:32:17 1992 From: U53076%UICVM.BITNET at bitnet.cc.cmu.edu (Bruce Lambert) Date: Sun, 26 Jan 92 23:32:17 CST Subject: Optimizing inductive bias Message-ID: <01GFSII5TIGG9YCJ0F@BITNET.CC.CMU.EDU>

Hi folks, Recently Yoshua Bengio posted a note about using standard optimization techniques to set the tunable parameters of neural nets. Dave Tcheng and I have been working on the same basic idea at a more general level for several years. Rather than optimizing just networks, we have developed a framework for using optimization to search a large inductive bias space defined by several different types of algorithms (e.g., decision tree builders, nets, exemplar-based approaches, etc.). Given the omnipresent necessity of tweaking biases to get good performance, automation of the bias search seems very sensible. A couple of references to our work are given below. We hope you find them useful.

-Bruce Lambert Department of Pharmacy Administration University of Illinois at Chicago

Tcheng, D., Lambert, B., Lu, S. C-Y., & Rendell, L. (1989). Building robust learning systems by combining induction and optimization. In _Proc. 11th IJCAI_ (pp. 806-812). San Mateo, CA: Morgan Kaufmann.
Tcheng, D., Lambert, B., Lu, S. C-Y., & Rendell, L. (1991). AIMS: An adaptive interactive modelling system for supporting engineering decision making. In L. Birnbaum & G. Collins (Eds.), _Machine learning: Proceedings of the eighth international workshop_ (pp. 645-649). San Mateo, CA: Morgan Kaufmann.
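A schematic sketch of the kind of bias search described above (this is not code from AIMS or from any of the cited papers; the bias space and the evaluate_bias function are hypothetical stand-ins for "train the chosen learner with this bias and estimate its accuracy"):

    /* Hypothetical sketch: random search over a small inductive-bias space,
       keeping the best-scoring setting.  evaluate_bias() is a stand-in for
       running the chosen learner and estimating generalization performance. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { DECISION_TREE, NEURAL_NET, EXEMPLAR_BASED } Algorithm;

    typedef struct {
        Algorithm alg;
        double    param;   /* e.g. pruning level, hidden units, or neighborhood size */
    } Bias;

    static double evaluate_bias(Bias b)
    {
        /* Toy scoring function standing in for cross-validated accuracy. */
        double d = b.param - 8.0;
        return 1.0 / (1.0 + d * d) + 0.05 * (double)b.alg;
    }

    int main(void)
    {
        Bias best = { DECISION_TREE, 1.0 };
        double best_score = evaluate_bias(best);

        srand(1992);
        for (int i = 0; i < 200; i++) {
            Bias b = { (Algorithm)(rand() % 3), (double)(rand() % 32) };
            double score = evaluate_bias(b);
            if (score > best_score) { best_score = score; best = b; }
        }
        printf("best bias: algorithm=%d parameter=%.0f score=%.3f\n",
               best.alg, best.param, best_score);
        return 0;
    }

More sophisticated optimizers (gradient-free search, genetic algorithms, and so on) would drop into the same loop in place of the random sampler.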
From T00BOR%DHHDESY3.BITNET at BITNET.CC.CMU.EDU Mon Jan 27 15:54:45 1992 From: T00BOR%DHHDESY3.BITNET at BITNET.CC.CMU.EDU (Stefan Bornholdt) Date: MON, 27 JAN 92 15:54:45 MEZ Subject: papers available --- ann, genetic algorithms Message-ID: <01GFT1UXLRMS9YCJ3G@BITNET.CC.CMU.EDU>

papers available, hardcopies only.
------------------------------------------------------------------------
GENERAL ASYMMETRIC NEURAL NETWORKS AND STRUCTURE DESIGN BY GENETIC ALGORITHMS

Stefan Bornholdt, Deutsches Elektronen-Synchrotron DESY, Notkestr. 85, 2000 Hamburg 52
Dirk Graudenz, Institut für Theoretische Physik, Lehrstuhl E, RWTH, 5100 Aachen, Germany.

A learning algorithm for neural networks based on genetic algorithms is proposed. The concept leads in a natural way to a model for the explanation of inherited behavior. Explicitly we study a simplified model for a brain with sensory and motor neurons. We use a general asymmetric network whose structure is solely determined by an evolutionary process. This system is simulated numerically. It turns out that the network obtained by the algorithm reaches a stable state after a small number of sweeps. Some results illustrating the learning capabilities are presented. [to appear in Neural Networks]

preprints available from: Stefan Bornholdt, DESY-T, Notkestr. 85, 2000 Hamburg 52, Germany. Email: t00bor at dhhdesy3.bitnet (hardcopies only, all rights reserved)
------------------------------------------------------------------------

From lacher at NU.CS.FSU.EDU Tue Jan 28 12:47:08 1992 From: lacher at NU.CS.FSU.EDU (Chris Lacher) Date: Tue, 28 Jan 92 12:47:08 -0500 Subject: appropriate material Message-ID: <9201281747.AA02271@lambda.cs.fsu.edu>

This is to express my *satisfaction* with the connectionists mailing list and the materials I get by being a subscriber. True, there are the occasional "unsubscribe me" and "Harry, will you give me a ride home" accidents, but in reality these are more humorous than annoying. It is not realistic to expect a list with many subscribers to be perfect. In general, I find the material (information on preprints, conference announcements, bibliographic endeavors, and scientific discussions) well worth the few uninteresting things that come out. And, as Dave Touretzky stated, learning to use the mailer makes it very easy and convenient to skip over things that are of no interest. So, my vote is to KEEP connectionists and the associated ftp servers running. We owe a big round of applause to the institutions and people who keep it going. Thanks! Chris Lacher

From bogner at eleceng.adelaide.edu.au Wed Jan 29 00:18:05 1992 From: bogner at eleceng.adelaide.edu.au (bogner@eleceng.adelaide.edu.au) Date: Wed, 29 Jan 1992 16:18:05 +1100 Subject: papers available --- ann, genetic algorithms In-Reply-To: Stefan Bornholdt's message of MON, 27 JAN 92 15:54:45 MEZ <01GFT1UXLRMS9YCJ3G@BITNET.CC.CMU.EDU> Message-ID: <9201290518.16821@munnari.oz.au>

Would greatly appreciate a copy of the paper offered. Prof. Robert E. Bogner, Dept. of Elec. Eng., The University of Adelaide, Box 498, Adelaide, SOUTH AUSTRALIA 5001 bogner at eleceng.adelaide.edu.au

From ringram at ncsa.uiuc.edu Wed Jan 29 08:23:31 1992 From: ringram at ncsa.uiuc.edu (ringram@ncsa.uiuc.edu) Date: Wed, 29 Jan 92 07:23:31 -0600 Subject: Mailing List Message-ID: <9201291323.AA27261@newton.ncsa.uiuc.edu>

Please remove my name from your mailing list. Rich Ingram
From lacher at NU.CS.FSU.EDU Wed Jan 29 14:36:23 1992 From: lacher at NU.CS.FSU.EDU (Chris Lacher) Date: Wed, 29 Jan 92 14:36:23 -0500 Subject: paper Message-ID: <9201291936.AA02761@lambda.cs.fsu.edu>

The following paper has been placed in the neuroprose archives under the name 'lacher.rapprochement.ps.Z'. Retrieval, uncompression, and printing have been successfully tested.

Expert Networks: Paradigmatic Conflict, Technological Rapprochement^\dagger
R. C. Lacher, Florida State University, lacher at cs.fsu.edu

Abstract. A rule-based expert system is demonstrated to have both a symbolic computational network representation and a sub-symbolic connectionist representation. These alternate views enhance the usefulness of the original system by facilitating introduction of connectionist learning methods into the symbolic domain. The connectionist representation learns and stores metaknowledge in highly connected subnetworks and domain knowledge in a sparsely connected expert network superstructure. The total connectivity of the neural network representation approximates that of real neural systems, which may be useful in avoiding scaling and memory stability problems associated with some other connectionist models.

Keywords. symbolic AI, connectionist AI, connectionism, neural networks, learning, reasoning, expert networks, expert systems, symbolic models, sub-symbolic models.
-------------------
^\dagger Paper given to the symposium "Approaches to Cognition", the fifteenth annual Symposium in Philosophy held at the University of North Carolina, Greensboro, April 5-7, 1991.

From yair at siren.arc.nasa.gov Wed Jan 29 11:55:15 1992 From: yair at siren.arc.nasa.gov (Yair Barniv) Date: Wed, 29 Jan 92 08:55:15 PST Subject: papers available --- ann, genetic algorithms In-Reply-To: Stefan Bornholdt's message of MON, 27 JAN 92 15:54:45 MEZ <01GFT1UXLRMS9YCJ3G@BITNET.CC.CMU.EDU> Message-ID: <9201291655.AA05759@siren.arc.nasa.gov.>

Hello Dr. Bornholdt: I would appreciate obtaining a copy of the above work. Thanks, Yair Barniv NASA/Ames, Mountain View, CA USA

From Gripe at VEGA.FAC.CS.CMU.EDU Wed Jan 29 15:47:30 1992 From: Gripe at VEGA.FAC.CS.CMU.EDU (Gripe@VEGA.FAC.CS.CMU.EDU) Date: Wed, 29 Jan 92 15:47:30 EST Subject: workshop announcement: posted for Diana Gordon Message-ID: <2157.696718050@PULSAR.FAC.CS.CMU.EDU>

From gordon at AIC.NRL.Navy.Mil Wed Jan 15 15:59:56 1992 From: gordon at AIC.NRL.Navy.Mil (gordon@AIC.NRL.Navy.Mil) Date: Wed, 15 Jan 92 15:59:56 EST Subject: workshop announcement Message-ID:

ing process. Researchers to date have studied various biases in inductive learning such as algorithms, representations, background knowledge, and instance orders. The focus of this workshop is not to examine these biases in isolation. Instead, this workshop will examine how these biases influence each other and how they influence learning performance. For example, how can active selection of instances in concept learning influence PAC convergence? How might a domain theory affect an inductive learning algorithm? How does the choice of representational bias in a learner influence its algorithmic bias and vice versa?

The purpose of this workshop is to draw researchers from diverse areas to discuss the issue of biases in inductive learning. The workshop topic is a unifying theme for researchers working in the areas of reformulation, constructive induction, inverse resolution, PAC learning, EBL-SBL learning, and other areas. This workshop does not encourage papers describing system comparisons.
Instead, the workshop encourages papers on the following topics:

- Empirical and analytical studies comparing different biases in inductive learning and their quantitative and qualitative influence on each other or on learning performance
- Studies of methods for dynamically adjusting biases, with a focus on the impact of these adjustments on other biases and on learning performance
- Analyses of why certain biases are more suitable for particular applications of inductive learning
- Issues that arise when integrating new biases into an existing inductive learning system
- Theory of inductive bias

Please send 4 hard copies of a paper (10-15 double-spaced pages, ML-92 format) or (if you do not wish to present a paper) a description of your current research to: Diana Gordon, Naval Research Laboratory, Code 5510, 4555 Overlook Ave. S.W., Washington, D.C. 20375-5000 USA. Email submissions to gordon at aic.nrl.navy.mil are also acceptable, but they must be in PostScript. FAX submissions will not be accepted. If you have any questions about the workshop, please send email to Diana Gordon at gordon at aic.nrl.navy.mil or call 202-767-2686.

Important Dates:
March 12 - Papers and research descriptions due
May 1 - Acceptance notification
June 1 - Final version of papers due

Program Committee:
Diana Gordon, Naval Research Laboratory
Dennis Kibler, University of California at Irvine
Larry Rendell, University of Illinois
Jude Shavlik, University of Wisconsin
William Spears, Naval Research Laboratory
Devika Subramanian, Cornell University
Paul Vitanyi, CWI and University of Amsterdam

From bogner at eleceng.adelaide.edu.au Wed Jan 29 20:21:54 1992 From: bogner at eleceng.adelaide.edu.au (bogner@eleceng.adelaide.edu.au) Date: Thu, 30 Jan 1992 12:21:54 +1100 Subject: Advertisement Message-ID: <9201300121.14027@munnari.oz.au>

University of Adelaide
SIGNAL PROCESSING AND NEURAL NETWORKS
RESEARCH ASSOCIATE or RESEARCH OFFICER

A research associate or research officer is required as soon as possible to work on projects supported by the University, the Department of Defence, and the Australian Research Council. Two prime projects are under consideration and the appointee may be required to work on either or both.

The aim of one project is to design an electronic sensor organ based on known principles of insect vision. The insect's eye has specialised preprocessing that provides measures of distance and velocity by evaluation of deformations of the perceived visual field. The work will entail novel electronic design in silicon or gallium arsenide, software simulation and experimental work to evaluate and demonstrate performance. This work is in collaboration with the Australian National University.

The aim of the other project is to develop and investigate principles of artificial neural networks for processing multiple signals obtained from over-the-horizon radars. Investigation of wavelet functions for the representation of signals may be involved. The work is primarily in the area of exploration of algorithms and high-level computer software. This work is in conjunction with DSTO.

DUTIES: In consultation with task leaders and specialist researchers, to investigate alternative design approaches and to produce designs for microelectronic devices, based on established design procedures. Communicate designs to manufacturers and oversee the production of devices. Prepare data for experiments on applications of signal processing and artificial neural networks. Prepare software for testing algorithms.
Assist with the preparation of reports.

QUALIFICATIONS: For the Research Associate, a PhD or other suitable evidence of equivalent capability in research in engineering or computer science. Exceptionally, a candidate with less experience but outstanding ability might be considered. For the Research Officer, a degree in electrical engineering or computer science with a good level of achievement. Experience in signal processing would be an advantage. Demonstrated ability to communicate fluently in written and spoken English.

PAY and CONDITIONS: Pay and conditions will be in accordance with University of Adelaide policies, and will depend on qualifications and experience. Suitable incumbents who do not already hold a higher degree may be able to include some of the work undertaken towards one. Appointment may be made in scales from $25692 p.a. to $33017 for the Research Officer or $29600 to $38418 p.a. for the Research Associate.

ENQUIRIES: Professor R. E. Bogner, Dept. of Electrical and Electronic Engineering, The University of Adelaide, Box 498, Adelaide, Phone (08) 228 5589, Fax (08) 224 0464, E-mail bogner at eleceng.adelaide.edu.au

bugI2A22.CHI 28-1-92

From 0005013469 at mcimail.com Thu Jan 30 03:30:00 1992 From: 0005013469 at mcimail.com (Jean-Bernard Condat) Date: Thu, 30 Jan 92 08:30 GMT Subject: papers available --- golden section Message-ID: <70920130083007/0005013469PK3EM@mcimail.com>

Hello! I work on the golden section in the sciences and am looking for all possible references and/or articles related to this subject. If you know of one, could you please send me a copy and/or the reference? Thank you very much for your kind help. Jean-Bernard Condat, CCCF, B.P. 8005, 69351 Lyon Cedex 08, France. Fax.: +33 1 47 87 70 70 Phone: +33 1 47 87 40 83 DialMail #24064 MCI Mail #501-3469

From anshu at discovery.rutgers.edu Thu Jan 30 14:33:01 1992 From: anshu at discovery.rutgers.edu (anshu@discovery.rutgers.edu) Date: Thu, 30 Jan 92 14:33:01 EST Subject: No subject Message-ID: <9201301933.AA04183@discovery.rutgers.edu>

Hi! I am working on the retention of knowledge by a neural network. A neural network tends to forget the past training when it is trained on new data points. I would be thankful if you could suggest some references on this topic, or some interesting research topics that I could pursue further for my thesis. Thanks. Anshu Agarwal anshu at caip.rutgers.edu

From ross at psych.psy.uq.oz.au Thu Jan 30 14:56:16 1992 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Fri, 31 Jan 1992 06:56:16 +1100 Subject: Where is J M SOPENA of BARCELONA? Message-ID: <9201301956.AA11031@psych.psy.uq.oz.au>

J.M. Sopena of the University of Barcelona posted notice of a paper on ESRP: a Distributed Connectionist Parser, some weeks back. The contact address was given as: d4pbjss0 at e0ub011.bitnet My mail to Sopena has been bounced by the bitnet gateway (cunyvm.bitnet) with a 'cannot find mailbox' message. Would Dr Sopena please contact me directly, or perhaps someone who HAS got through to Sopena might get in touch with me. Almost a dozen people have contacted me since I last posted this message, to say that they also failed to contact Sopena and to please let them know the secret if I found it. Thank you. Ross Gayler ross at psych.psy.uq.oz.au
From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Jan 31 09:55:54 1992 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 31 Jan 92 09:55:54 EST Subject: No subject In-Reply-To: Your message of Thu, 30 Jan 92 14:33:01 -0500. <9201301933.AA04183@discovery.rutgers.edu> Message-ID:

> I am working on the retention of knowledge by a neural network. A neural
> network tends to forget the past training when it is trained on new data
> points...

Geoff Hinton and his students did some early work on damaging nets and then re-training, and also on the use of fast-changing and slow-changing weights. Perhaps he can supply some references to this work and related work done by others.

Cascade-Correlation has some interesting properties related to retention. If you train and then switch to a new training set, you mess up the output-layer weights, but you retain all of the old feature detectors (hidden units), and maybe build some new ones for the new data. Then if you return to the old training set or a composite set, re-learning is generally quite fast.
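As a rough illustration of that retention property (a sketch only, not Cascade-Correlation itself and not Fahlman's code; the tiny network and all names here are hypothetical), the hidden units act as frozen feature detectors and only the output weights are re-trained when the task changes:

    /* Sketch: frozen hidden-unit "feature detectors" plus a re-trainable
       output layer.  Switching tasks re-trains out_w only; hid_w is left
       untouched, so returning to an old task re-learns quickly. */
    #include <math.h>

    #define N_IN  4
    #define N_HID 6

    static double hid_w[N_HID][N_IN + 1];   /* frozen after initial training */
    static double out_w[N_HID + 1];         /* re-trained for each new task  */

    static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

    static void hidden_features(const double *in, double *h)
    {
        for (int j = 0; j < N_HID; j++) {
            double s = hid_w[j][N_IN];                    /* bias */
            for (int i = 0; i < N_IN; i++) s += hid_w[j][i] * in[i];
            h[j] = sigmoid(s);
        }
    }

    static double output(const double *h)
    {
        double s = out_w[N_HID];                          /* bias */
        for (int j = 0; j < N_HID; j++) s += out_w[j] * h[j];
        return sigmoid(s);
    }

    /* One delta-rule step on the output weights only. */
    static void train_output_step(const double *in, double target, double lr)
    {
        double h[N_HID];
        hidden_features(in, h);
        double y = output(h);
        double delta = (target - y) * y * (1.0 - y);
        for (int j = 0; j < N_HID; j++) out_w[j] += lr * delta * h[j];
        out_w[N_HID] += lr * delta;
    }

Cascade-Correlation would additionally grow new hidden units for the new data; the sketch shows only the frozen-feature, fast-relearning part.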
That fast re-learning is demonstrated in my Recurrent Cascade-Correlation (RCC) paper, which can be found in NIPS-3 or in neuroprose. I train a net to recognize Morse code by breaking the training into distinct (not cumulative) lessons, starting with the shortest codes, and then training on all codes at once. This works better than training on all codes right from the start. This opens up the possibility of a sort of long-lived, eclectic net that is trained in many different domains over its "lifetime" and that gradually accumulates a rich library of useful feature detectors. The current version of Cascor wouldn't be very good for this, since later hidden units would be too deep and would have too many inputs, but I think that this problem of excess connectivity may be easily solvable. -- Scott

Scott E. Fahlman School of Computer Science Carnegie Mellon University

From pellioni at pioneer.arc.nasa.gov Wed Jan 29 20:38:04 1992 From: pellioni at pioneer.arc.nasa.gov (Andras Pellionisz SL) Date: Wed, 29 Jan 92 17:38:04 -0800 Subject: Open letter to Dr. Shun-ichi Amari Message-ID:

[[ Editor's Note: I know many in the field regard Dr. Pellionisz as holding controversial opinions. He and I have corresponded and I feel he brings up some very valid points which should be the source of substantive debate. The letter below is the result. I encourage responses, either in support or refutation, to the following letter. The main issue, that of intellectual priority and proper citation, affects all of us in research and forms the foundation of the modern scientific tradition. Dr. Pellionisz' secondary issue, international competition versus cooperation, is also worthy of discussion, though I would request that responses to Neuron Digest remain factual and near the subject of neural networks. I also certainly hope that Dr. Amari responds to the rather serious charges in an appropriate forum. -PM ]]

Dear Peter: according to our previous exchange, after long deliberation, I put together the "Open letter to Amari". Given the fact that my personal story is well in line with some global happenings, I trust that you will find this contribution worthy of distribution. Andras

* "Tensor-Geometrical Approach to Neural Nets" in 1985 and 91 *
or
OPEN LETTER TO DR. SHUN-ICHI AMARI
by Andras J. Pellionisz

Dear Readers: Many of you may know that I pioneered a tensor-geometrical approach to neural nets for over a decade, with dozens of publications on this subject. Many of you may have seen a recent paper on the tensor-geometry of neural nets (by Dr. Amari) presented as "opening a new fertile field of neural network research" (in 1991!) WITHOUT referencing ONE of the pre-existing pioneering studies. Dr. Amari did not even cite his own paper (1985), in which he criticized my pioneering. This is unfair, especially since the majority of readers were uninitiated in tensor geometry in '85 and thus his early "criticism" greatly hampered the unfolding of the tensor geometry approach that he now takes. Unfortunately, Dr. Amari's paper appeared in a journal in which he is a chief editor. Therefore, I am turning directly to you, with the copy of my letter (sent to Dr. Amari 21st Oct. 1991, no response to date).

There may be two issues involved. Obviously, we are entering an era which will be characterized by fierce competition in R&D world-wide, especially between the US, Japan and Europe. The question of protocol of fair competition in such a complex endeavor may be too nascent or too overwhelming for me to address. The costliness of pioneering and fairness to long-existing standards of protocol in academia, acknowledgement of such initiatives, is a painful and personal enough problem for me to have to shoulder.

===========================================

Dear Dr. Amari: Thank you for your response to my E-mail sent to you regarding your paper in the September issue (1991) of "Neural Networks", entitled "Dualistic geometry of the manifold of higher-order neurons". You offered two interpretations of why you featured a Geometrical Approach in 1991 as "opening a new fertile field of neural network research". One can see two explanations of why you wrote your paper without even mentioning any of my many publications, for a decade prior to yours, or without even mentioning your own paper (with Arbib, in which you criticized in 1985 my geometrical-tensorial approach that I took since 1980). I feel that one cannot accept both interpretations at the same time, since they contradict one another. Thus, I feel compelled to make a choice. The opening (2nd and 3rd) paragraphs of your letter say:

"As you know very well, we criticized your idea of tensorial approach in our... paper with M.Arbib. The point is that, although the tensorial approach is welcome, it is too restrictive to think that the brain function is merely a transformation between contravariant vectors and covariant vectors; even if we use linear approximations, the transformation should be free of the positivity and symmetry. As you may understand these two are the essential restrictions of covariant-contravariant transformations. ...You have interests in analyzing a general but single neural network. Of course this is very important. However, what I am interested in is to know a geometrical structures of a set of neural networks (in other words, a set of brains). This is a new object of research."

THIS FIRST INTERPRETATION, that you could have easily included in your 1991 paper, clearly features your work as a GENERALIZATION of my decade-old geometrical initiative, which you deem "too restrictive". I am happy that you still clearly detect some general features of my prior work, which you describe as targeting a "single neural network", while describing yours as being concerned with a "set of neural networks".
Still, it is a fact that my work was never restricted to e.g. a SINGLE cerebellum, but was a geometrical representation of the total "set of all cerebella", not even restricted to any single species (but, in full generality, the metric tensor of the spacetime geometry). Thus the characterization of your work as more general appears unsupported by your letter. However, even if your argument were fully supported, in a generalization of earlier studies an author would be expected to make references, according to standard protocol, to prior work which is being generalized (as my "too restrictive" studies preceded yours by a decade). In fact, you (implicitly) appear to accept this point by saying (later in your letter): "Indeed, when I wrote that paper, I thought to refer to your paper". Unfortunately, instead of doing so, you continue by offering a SECOND ALTERNATIVE INTERPRETATION of your omission of any reference to my work, by saying: "But if I did so, I could only state that it is nothing to do with this new approach". Regrettably, I find that the two interpretations are incompatible: that (1) your work is a GENERALIZATION of mine and (2) your geometrical approach has NOTHING TO DO with the geometrical approach that I initiated.

Since I have just returned from a visit to Germany (a country that awarded me the Alexander von Humboldt Prize honoring my geometrical approach to brain theory) I know that many in Germany as well as in the US are curious to see how THEIR INTERPRETATION of the similarities of the two tensor-geometrical approaches compares to Amari's and/or Pellionisz's interpretation. I cannot run the risk of trying to slap into the face of the audience two diametrically opposing arguments (when they press me for comparisons of your metric tensors used in 1991 and those that I used since 1980). On my part, I will therefore take the less offensive interpretation of those you offered, which claims that your geometrical approach is in some ways more general than my geometrical approach of a decade before. As for you, I will leave it to you how you compare your approach to mine, if you are pressed by anyone to substantiate your claim over the comparison. I maintain the position proposed in my original letter, that it might be useful if such a public comparison is offered by you for the record at the earliest occasion of your choice.

For now, I shall remain most cooperative in finding ways to make sure that appropriate credit is given to my decade-old pioneering efforts (however "restrictive" you label the early papers and whether or not you have read any of those that I wrote since 1982, the date of the manuscript of your 1985 critique). At this time, I would like to refer to the wide selection of options taken by workers in the past in similar situations. Since by December 7, 1991, I will have made a strong public impact by statements on this issue, I would most appreciate it if during the coming week or two you could indicate (which I have no reason to doubt at this time) your willingness to credit my costly pioneering efforts in some appropriate fashion. As you so well know yourself, a geometrical approach to brain theory is still not automatically taken by workers in 1991, and it certainly was rather costly for me to initiate more than a decade ago, and to uphold, expand, experimentally prove in neuroscience, and firmly establish in neural net theory in spite of criticisms.

Sincerely: Dr. Andras J. Pellionisz
------------------------------
Neuron Digest   Monday,  2 Mar 1992   Volume 9 : Issue 9

Today's Topics:
        Open Letter - Response
        Reply to Pellionisz' "Open Letter"

------------------------------