From karit at spine.hut.fi Mon Jun 3 14:07:02 1991 From: karit at spine.hut.fi (Kari Torkkola) Date: Mon, 3 Jun 91 14:07:02 DST Subject: Research positions in speech and image processing Message-ID: <9106031107.AA08981@spine.hut.fi.hut.fi> RESEARCH POSITIONS AVAILABLE The newly created "Institut Dalle Molle d'Intelligence Artificielle Perceptive" (IDIAP) in Martigny, Switzerland seeks to hire qualified researchers in the areas of speech recognition and image manipulation. Candidates should be able to conduct independent research in a UNIX environment on the basis of solid theoretical and applied knowledge. Salaries will be aligned with those offered by the Swiss government for equivalent positions. Laboratories are now being established in the newly renovated building that houses the Institute, and international network connections will soon be in place. Researchers are expected to begin activity during the academic year 1991-1992. IDIAP is the third institute of artificial intelligence supported by the Dalle Molle Foundation, the others being ISSCO (attached to the University of Geneva) and IDSIA (situated in Lugano). The new institute will maintain close contact with these latter centers as well as with the Polytechnical School of Lausanne and the University of Geneva. To apply for a research position at IDIAP, please send a curriculum vitae and technical reports to: Daniel Osherson, Directeur IDIAP Case Postale 609 CH-1920 Martigny Switzerland For further information by e-mail, contact: osherson at disuns2.epfl.ch From issnnet at park.bu.edu Mon Jun 3 11:27:36 1991 From: issnnet at park.bu.edu (issnnet@park.bu.edu) Date: Mon, 3 Jun 91 11:27:36 -0400 Subject: RFD: comp.org.issnnet Message-ID: <9106031527.AA06005@copley.bu.edu> REQUEST FOR DISCUSSION ---------------------- GROUP NAME: comp.org.issnnet STATUS: unmoderated CHARTER: The newsgroup shall serve as a medium for discussions pertaining to the International Student Society for Neural Networks (ISSNNet), Inc., and to its activities and programs as they pertain to the role of students in the field of neural networks. See details below. TARGET VOTING DATE: JUNE 20 - JULY 20, 1991 ****************************************************************************** PLEASE NOTE In agreement with USENET newsgroup guidelines for the creation of new newsgroups, this discussion period will continue until June 21, at which time voting will begin if deemed appropriate. ALL DISCUSSION SHOULD TAKE PLACE ON THE NEWSGROUP "news.groups" If you do not have access to USENET newsgroups but wish to contribute to the discussion, send your comments to: issnnet at park.bu.edu specifying whether you would like your message relayed to news.groups. A call for votes will be made to the same newsgroups and mailing lists that originally received this message. PLEASE DO NOT SEND REPLIES TO THIS MAILING LIST OR NEWSGROUP DIRECTLY! A call for votes will be broadcast in a timely fashion. Please do not send votes until then.
****************************************************************************** BACKGROUND AND INFORMATION: The purpose of the International Student Society for Neural Networks (ISSNNet) is to (1) provide a means of exchanging information among students and young professionals within the area of Neural Networks; (2) create an opportunity for interaction between students and professionals from academia and industry; (3) encourage support from academia and industry for the advancement of students in the area of Neural Networks; (4) ensure that the interest of all students in the area of Neural Networks is taken into consideration by other societies and institutions involved with Neural Networks; and (5) foster a spirit of international and interdisciplinary kinship among students as the study of Neural Networks develops into a self-contained discipline. Since its creation one year ago, ISSNNet has grown to over 300 members in more than 20 countries around the world. One of the biggest problems we have faced thus far is to efficiently communicate with all the members. To this end, a network of "governors" has been created. Each governor is in charge of distributing information (such as our newsletter) to all local members, collecting dues, notifying local members of relevant activities, etc. However, even this system has problems. Communication to a possibly very large number of members relies entirely on one individual, and given the typically erratic schedule of a student, it is often difficult to ensure prompt and timely distribution to all members. More to the point, up until this time all governors have been contacting a single person (yours truly), and that has been a problem. Regular discussions on the society and related matters become very difficult when routed through individuals in this fashion. The newsgroup would be primarily dedicated to discussion of items pertaining to the society. We are about to launch a massive call for nominations, in the hope that more students will step forward and take a leading role in the continued success of the society. In addition, ISSNNet is involved with a number of projects, many of which require extensive electronic mail discussions. For example, we are developing a sponsorship program for students presenting papers at NNet conferences. This alone has generated at least 100 mail messages to the ISSNNet account, most of which could have been answered by two or three "generic" postings. We have refrained from using some of the existing mailing lists and USENET newsgroups that deal with NNets because of the non-technical nature of our issues. In addition to messages that are strictly society-related, we feel that there are many messages posted to these existing bulletin boards for which our newsgroup would be a better forum. Here is a list of topics that frequently come up, which would be handled in comp.org.issnnet as part of our "sponsored" programs: "What graduate school should I go to?" Last year, ISSNNet compiled a list of graduate programs around the world. The list will be updated later this year to include a large number of new degree programs around the world. "What jobs are available?" We asked companies that attended last year's IJCNN-San Diego and INNC-Paris conferences to fill out a questionnaire on employment opportunities for NNet students. "Does anyone have such-and-such NNet simulator?" Many students have put together computer simulations of NNet paradigms and these could be shared by people on this group.
"When is the next IJCNN conference?" We have had a booth at past NNet conferences, and hope to continue doing this for more and more international and local meetings. We often have informal get-togethers at these conferences, where students and others have the opportunity to meet. ----------------------------------------------------------------------- For more information, please send e-mail to issnnet at park.bu.edu (ARPANET) write to: ISSNNet, Inc. PO Box 557, New Town Br. Boston, MA 02258 USA ISSNNet, Inc. is a non-profit corporation in the Commonwealth of Massachusetts. ISSNNet, Inc. P.O. Box 557, New Town Branch Boston, MA 02258 USA From dcp+ at cs.cmu.edu Mon Jun 3 15:51:50 1991 From: dcp+ at cs.cmu.edu (David Plaut) Date: Mon, 03 Jun 91 15:51:50 EDT Subject: Preprint: Effects of Word Abstractness in a Connectionist Model of Deep Dyslexia Message-ID: <1831.675978710@DWEEB.BOLTZ.CS.CMU.EDU> The following paper is available in the neuroprose archive as plaut.cogsci91.ps.Z. It will appear in this year's Cognitive Science Conference proceedings. A much longer paper presenting a wide range of related work is in preparation and will be announced shortly. Effects of Word Abstractness in a Connectionist Model of Deep Dyslexia David C. Plaut Tim Shallice School of Computer Science Department of Psychology Carnegie Mellon University University College, London dcp at cs.cmu.edu ucjtsts at ucl.ac.uk Deep dyslexics are patients with neurological damage who exhibit a variety of symptoms in oral reading, including semantic, visual and morphological effects in their errors, a part-of-speech effect, and better performance on concrete than abstract words. Extending work by Hinton & Shallice (1991), we develop a recurrent connectionist network that pronounces both concrete and abstract words via their semantics, defined so that abstract words have fewer semantic features. The behavior of this network under a variety of ``lesions'' reproduces the main effects of abstractness on deep dyslexic reading: better correct performance for concrete words, a tendency for error responses to be more concrete than stimuli, and a higher proportion of visual errors in response to abstract words. Surprisingly, severe damage within the semantic system yields better performance on *abstract* words, reminiscent of CAV, the single, enigmatic patient with ``concrete word dyslexia.'' To retrieve this from the neuroprose archive type the following: unix> ftp 128.146.8.62 Name: anonymous Password: neuron ftp> binary ftp> cd pub/neuroprose ftp> get plaut.cogsci91.ps.Z ftp> quit unix> zcat plaut.cogsci91.ps.Z | lpr ------------------------------------------------------------------------------- David Plaut dcp+ at cs.cmu.edu School of Computer Science 412/268-8102 Carnegie Mellon University Pittsburgh, PA 15213-3890 From mjolsness-eric at CS.YALE.EDU Wed Jun 5 15:50:55 1991 From: mjolsness-eric at CS.YALE.EDU (Eric Mjolsness) Date: Wed, 5 Jun 91 15:50:55 EDT Subject: TR: Bayesian Inference on Visual Grammars by NNs that Optimize Message-ID: <9106051951.AA25379@NEBULA.SYSTEMSZ.CS.YALE.EDU> The following paper is available in the neuroprose archive as mjolsness.grammar.ps.Z: Bayesian Inference on Visual Grammars by Neural Nets that Optimize Eric Mjolsness Department of Computer Science Yale University New Haven, CT 06520-2158 YALEU/DCS/TR854 May 1991 Abstract: We exhibit a systematic way to derive neural nets for vision problems. 
It involves formulating a vision problem as Bayesian inference or decision on a comprehensive model of the visual domain given by a probabilistic {\it grammar}. A key feature of this grammar is the way in which it eliminates model information, such as object labels, as it produces an image; correspondence problems and other noise removal tasks result. The neural nets that arise most directly are generalized assignment networks. Also there are transformations which naturally yield improved algorithms such as correlation matching in scale space and the Frameville neural nets for high-level vision. Deterministic annealing provides an effective optimization dynamics. The grammatical method of neural net design allows domain knowledge to enter from all levels of the grammar, including ``abstract'' levels remote from the final image data, and may permit new kinds of learning as well. The paper is 56 pages long. To get the file from neuroprose: unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get mjolsness.grammar.ps.Z ftp> quit unix> uncompress mjolsness.grammar.ps.Z unix> lpr mjolsness.grammar.ps (or however you print postscript) -Eric ------- From jm2z+ at andrew.cmu.edu Thu Jun 6 13:47:19 1991 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Thu, 6 Jun 91 13:47:19 -0400 (EDT) Subject: Are they really worth the effort ? Message-ID: <4cHbIbi00Uh_M2RVlH@andrew.cmu.edu> I'd like to have a debate about the advantages of distributed over local representations. I mean sure, distributed representations are great for they work in 2^n instead of n space, they degrade gracefully and all these PDP Bible type of things. But ... are they really that good? For one thing they make our life awfully difficult in terms of understanding and manipulating them ... Are they really worth the effort? Do you have concrete examples in your work where they did a better job than local representations? Javier From ogs0%dixie.dnet at gte.com Thu Jun 6 17:21:22 1991 From: ogs0%dixie.dnet at gte.com (Oliver G. Selfridge) Date: Thu, 6 Jun 91 17:21:22 -0400 Subject: Warren McCulloch's widow Message-ID: <9106062121.AA05259@bunny.gte.com> I sadly announce that Rook McCulloch, widow to Warren McCulloch, died last night at the age of 92. Warren himself, with Walter Pitts, wrote the revolutionary introduction to neural nets in the mid-1940s in two well-known papers. Rook maintained a bright and contributory life up to the end and we will all miss her. Oliver Selfridge From jm2z+ at andrew.cmu.edu Thu Jun 6 18:19:45 1991 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Thu, 6 Jun 91 18:19:45 -0400 (EDT) Subject: are they worth the effort II Message-ID: <4cHfI1q00WBK03HW0G@andrew.cmu.edu> Please send your thoughts to connectionists so that we all can be instructed about the advantages of distributed representations. By the way, I already got two responses that I will summarize below. Response number one provided the following arguments: 1- The brain uses distributed representations. He cites Lashley's (1929) experiments where rats showed graceful performance degradation when they were partially deprived of their cortex. 2- Distributed representations are more resistant to degradation. He claims this may have military implications (systems resistant to enemy fire type of thing). [ OK, does anybody out there have data showing that distributed representations are more noise resistant than local representations ?
I mean one can always clone the local representations and get noise resistance that way -Javier ] 3- He claims distributed representations performed very well in his research projects. [ Unfortunately he confuses distributed representations with backpropagation (BP). It is BP that worked well. It is always possible to force BP to develop local representations and perhaps it would work better that way. -Javier ] Response number two claims that *very* distributed representations are probably the wrong way to go. He said "slightly" distributed representations (like the ones used in Kruschke's ALCOVE model) are better. Unfortunately he does not provide any data supporting this point. I just got response # 3, which claims that distributed representations performed consistently better than local in the NETtalk domain and in isolated letter speech. [ Tom, could you send me some references? Thanks - Javier ] -- Javier From tgd at turing.CS.ORST.EDU Thu Jun 6 18:11:44 1991 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Thu, 6 Jun 91 15:11:44 PDT Subject: Are they really worth the effort ? In-Reply-To: Javier Movellan's message of Thu, 6 Jun 91 13:47:19 -0400 (EDT) <4cHbIbi00Uh_M2RVlH@andrew.cmu.edu> Message-ID: <9106062211.AA13213@turing.CS.ORST.EDU> In my studies of error-correcting output codes, I found that these codes---which are particularly neat distributed representations---performed consistently better than local representations in the NETtalk domain and in isolated letter speech recognition. --Tom Thomas G. Dietterich Department of Computer Science Dearborn Hall, 303 Oregon State University Corvallis, OR 97331-3102 503-737-5559 From Nigel.Goddard at B.GP.CS.CMU.EDU Thu Jun 6 19:06:13 1991 From: Nigel.Goddard at B.GP.CS.CMU.EDU (Nigel.Goddard@B.GP.CS.CMU.EDU) Date: Thu, 6 Jun 91 19:06:13 EDT Subject: distributed/local Message-ID: Both extremes are wrong for representing conceptual knowledge (i.e., one unit per concept versus all units participate in all concepts). Disadvantages of extreme local include no tolerance to failure (neurons die all the time), difficult to express nuance without impossibly large numbers of units. The big advantage is it is easy to see what is going on, to design structures. Disadvantages of extreme distributed include crosstalk when more than one item is active and difficulty communicating an active item from one part of the architecture to another (too many links required). The big advantages are fault-tolerance (graceful degradation) and generalization. The answer is something in between the extremes (not that this is news to anyone), depending on what the task is. Order log n units per concept for an n-unit net might be a good place to start. Feldman has a TR discussing these issues in much more depth (TR 189, "Neural Representation of Conceptual Knowledge", Computer Science Dpt, Univ. Rochester, NY 14627). Also published as a book chapter, I believe. Nigel Goddard From soller%asylum at cs.utah.edu Thu Jun 6 22:54:43 1991 From: soller%asylum at cs.utah.edu (Jerome Soller) Date: Thu, 6 Jun 91 20:54:43 -0600 Subject: Request for Information on Cognitive Science Curriculum Message-ID: <9106070254.AA24372@asylum.utah.edu> At the University of Utah, we are in the process of putting together a curriculum for Cognitive Science degrees at the undergraduate and graduate level. This faculty/student initiative is being led by Dr. Dick Burgess of Physiology.
We were wondering what classes and sequences are considered to form the core of established Cognitive Science degree-granting programs at graduate and undergraduate levels? Jerome Soller Department of C.S. U. of Utah soller at asylum.utah.edu From slehar at park.bu.edu Fri Jun 7 08:56:58 1991 From: slehar at park.bu.edu (Steve Lehar) Date: Fri, 7 Jun 91 08:56:58 -0400 Subject: Distributed Representations In-Reply-To: connectionists@c.cs.cmu.edu's message of 7 Jun 91 09:39:59 GM Message-ID: <9106071256.AA15832@park.bu.edu> I think the essence of this debate is in the nature of the input data. If your input is boolean in nature and reliably correct, then the processing performed on it can be similarly boolean and sequential with a great saving in time and space. It is when the input is fuzzy, ambiguous and distributed that the sequential logical boolean type of processing runs into problems. A perfect example is image understanding. No single local region of the image is sufficient for reliable identification. Try this yourself: punch a little hole in a big piece of paper and lay it on a randomly selected photograph and see how much you can recognize through that one local aperture. You have no way of knowing what the local feature is without the global context, but how do you know the global context without building it up out of the local pieces? Studies of the visual system suggest that in nature this problem is solved by a parallel optimization of all the local pieces together with many levels of global representations, such that the final interpretation is a kind of relaxation due to all of the constraints felt at all of the different representations all at the same time. This is the basic idea of Grossberg's BCS/FCS algorithm, and is in contrast to a more sequential "AI" approach where the local pieces are each evaluated independently, and the results passed on to the next stage. I would claim that such an approach can never work reliably with natural images. I would be happy to provide more information on the BCS/FCS and my implementations of it to interested parties. From hendler at cs.UMD.EDU Fri Jun 7 10:40:58 1991 From: hendler at cs.UMD.EDU (Jim Hendler) Date: Fri, 7 Jun 91 10:40:58 -0400 Subject: distributed/local In-Reply-To: Nigel.Goddard@B.GP.CS.CMU.EDU's message of Thu, 6 Jun 91 19:06:13 EDT <9106071428.AA09615@mimsy.UMD.EDU> Message-ID: <9106071440.AA23704@dormouse.cs.UMD.EDU> For what it's worth, some preliminary results showing a well-behaved relationship between local and distributed reps are in a paper I had at the NIPS conf (Advances in Neur. Info. Proc. Sys I - Touretzky (ed), 1989, p.553). I have followed up on this work a little, with a better analysis of the relationship described in last year's Cog. Sci. Conference, but the work is pretty preliminary. I've pretty much stopped pursuing this actively, but anyone wanting to pick up on it is welcome... -J. Hendler From hu at eceserv0.ece.wisc.edu Fri Jun 7 11:22:28 1991 From: hu at eceserv0.ece.wisc.edu (Yu Hu) Date: Fri, 7 Jun 91 10:22:28 -0500 Subject: What is distributed/local representation Message-ID: <9106071522.AA18585@eceserv0.ece.wisc.edu> While lots of buzzzzz words, such as graceful degradation, appear in the discussion, may I ask a rather naive question: Could someone give a mathematically (or .....ly) sound definition of distributed and local representation (of what?) before we proceed to discuss them? Suppose the representations are for a data vector in an N-dimensional space.
Does distributed representation refer to data with many non-zero elements, and local representation to the opposite? If not, what are they? Regards, Yu Hen Hu Department of Electrical and Computer Engr. (608)262-6724(phone) Univ. of Wisconsin - Madison (608)262-1267(fax) 1415 Johnson Drive hu at engr.wisc.edu Madison, WI 53706-1691 U.S.A. From indurkhy at paul.rutgers.edu Fri Jun 7 12:10:42 1991 From: indurkhy at paul.rutgers.edu (Nitin Indurkhya) Date: Fri, 7 Jun 91 12:10:42 EDT Subject: Are they really worth the effort ? Message-ID: <9106071610.AA17674@paul.rutgers.edu> >In my studies of error-correcting output codes, I found that these >codes---which are particularly neat distributed >representations---performed consistently better than local >representations in the NETtalk domain and in isolated letter speech >recognition. in our own studies with the NETtalk dataset that you gave us, we found that local representations were competitive. the results are reported in "reduced complexity rule induction" by weiss and indurkhya (to be presented at ijcai-91). --nitin From lina at mimosa.physio.nwu.edu Fri Jun 7 12:47:29 1991 From: lina at mimosa.physio.nwu.edu (Lina Massone) Date: Fri, 7 Jun 91 11:47:29 CDT Subject: No subject Message-ID: <9106071647.AA05357@mimosa.physio.nwu.edu> About distributed representations The concept of distributed representation is intimately related to the concept of redundancy. The central nervous system makes great use of redundant representations in the way receptive/projective fields are organized. I do not agree that distributed/redundant representations are primarily a protection against possible injuries or failures of the components; I'd rather consider that as a useful side-effect. To me the main values of redundancy are: greater sensitivity, higher resolution, improvement of signal-to-noise ratio, reduction of demand for stability of performance and for precision in ontogenesis. In general a comparison between the activity of a population of neurons and the activity of a single neuron will show that the population is sensitive to lower stimulus intensities, smaller increments, briefer events, higher frequencies, wider dynamic ranges than a single neuron and is less disturbed by independent drift and instability. As for the amount of redundancy, there is some physiological evidence that the coding of information in the CNS is a compromise between fully distributed and fully localized. Given that the available number of neurons is limited, an entity (a piece of information) cannot be represented over a very large population of neurons that overlaps almost completely with the population activated by a different entity; this would cause a high degree of interference and would correspond to a very inefficient memory storage system. To maintain some degree of orthogonality within a limited number of neurons, the CNS makes the number of neurons - active for each stimulus - low. In other words each entity is represented across an ensemble of neurons but the ensemble is of limited size. As for coarse coding, Ken Laws raised the issue of matching the structure of the data with the code. I agree with that. The CNS does that by having neighboring receptors stimulated by neighboring fractions of the impinging world, i.e. by means of a topological principle. An example of the computational advantages of this idea for control problems is given in L. Massone, E. Bizzi (1990) On the role of input representations in sensorimotor mapping, Proc.
IJCNN, Washington D.C. Lina Massone From tgd at turing.cs.orst.edu Fri Jun 7 12:45:26 1991 From: tgd at turing.cs.orst.edu (Tom Dietterich) Date: Fri, 7 Jun 91 09:45:26 PDT Subject: Distributed Representations In-Reply-To: Ken Laws's message of Thu 6 Jun 91 22:02:27-PDT <676270947.0.LAWS@AI.SRI.COM> Message-ID: <9106071645.AA16085@turing.CS.ORST.EDU> Date: Thu 6 Jun 91 22:02:27-PDT From: Ken Laws Mail-System-Version: I'm not sure this is the same concept, but there were several papers at the last IJCAI showing that neural networks worked better than decision trees. The reason seemed to be that neural decisions depend on all the data all the time, whereas local decisions use only part of the data at one time. This is not the same concept at all. You are worrying about locality in the input space, whereas distributed representations usually concern (lack of) locality in the output space or in some intermediate representation. I have applied decision trees to learn distributed representations of output classes, and in all of my experiments, the distributed representation performs better than learning either one large tree (to make a k-way discrimination) or learning k separate trees. I believe this is because a distributed representation is able to correct for errors made in learning any individual output unit. The paper "dietterich.error-correcting.ps.Z" in the neuroprose archive presents experimental support for this claim. I've never put much stock in the military reliability claims. A bullet through the chip or its power supply will be a real challenge. Noise tolerance is important, though, and I suspect that neural systems really are more tolerant. It isn't a neural vs. non-neural issue: distributed representations are more redundant, and hence, more resistant to (local) damage. Noise tolerance is also not a neural vs. non-neural issue. To achieve noise tolerance, you must control over-fitting. There are many ways to do this: low-dimensional representations, smoothness assumptions, minimum description length methods, cross-validation, etc. Terry Sejnowski's original NETtalk work has always bothered me. He used a neural network to set up a mapping from an input bit string to 27 output bits, if I recall. I have never seen a "control" experiment showing similar results for 27 separate discriminant analyses, or for a single multivariate discriminant. I suspect that the results would be far better. The wonder of the net was not that it worked so well, but that it worked at all. I think you should perform these studies before you make such claims. I myself doubt them very much, because the NETtalk task violates the assumptions of discriminant analysis. In my experience, backpropagation works quite well on the NETtalk task. We have found that Wolpert's HERBIE (which is a kind of weighted 4-nearest-neighbor method) and generalized radial basis functions do better than backpropagation, but everything else we have tried does worse (decision trees, perceptrons, Fringe). I have come to believe strongly in "coarse-coded" representations, which are somewhat distributed. (I have no insight as to whether fully distributed representations might be even better. I suspect that their power is similar to adding quadratic and higher-order terms to a standard statistical model.) The real win in coarse coding occurs if the structure of the code models structure in the data source (or perhaps in the problem to be solved). -- Ken Laws The real win in any problem comes from good modelling, of course. 
But since we can't guarantee a priori that our representations are good models, it is important to develop ways for recovering from inappropriate models. I believe distributed representations provide one such way. --Tom Dietterich From dhw at t13.Lanl.GOV Fri Jun 7 14:31:42 1991 From: dhw at t13.Lanl.GOV (David Wolpert) Date: Fri, 7 Jun 91 12:31:42 MDT Subject: No subject Message-ID: <9106071831.AA11289@t13.lanl.gov> Javier Movellan wonders about the relative "advantages of distributed over local representations". He asks of members of the net, "Do you have concrete examples in your work where they did a better job than local representations?" I have concrete examples in which they do worse - sometimes far worse. See references below. David Wolpert (dhw at tweety.lanl.gov) D. H. Wolpert, "A benchmark for how well neural nets generalize", Biological Cybernetics, 61 (1989), 303-315. D. H. Wolpert, "Constructing a generalizer superior to NETtalk via a mathematical theory of generalization", Neural Networks, 3 (1990), 445-452. D. H. Wolpert, "Improving the performance of generalizers via time-series-like pre-processing of the learning set", Los Alamos Report LA-UR-91-350, submitted to IEEE PAMI. From kukich at flash.bellcore.com Fri Jun 7 17:26:05 1991 From: kukich at flash.bellcore.com (Karen Kukich) Date: Fri, 7 Jun 91 17:26:05 -0400 Subject: distributed vs. local encoding schemes Message-ID: <9106072126.AA06750@flash.bellcore.com> I ran some back-prop spelling correction experiments a few years ago in which one of the control variables was the use of distributed vs. local encoding schemes for both input and output representations. Local encoding schemes were clear winners in both speed of learning and performance (correction accuracy for novel misspellings). To clarify, a local output scheme was simply a 1-of-n vector (n=200) where each node represented one word in the lexicon; a "semi-local" input scheme was a 15*30=450-unit vector where each 30-unit block locally encoded one letter in a word of up to 15 characters. This positionally-encoded input scheme was thus local w.r.t. individual letters in a word but distributed w.r.t. the whole word. (Incidentally, the nets took slightly longer to learn to correct the shift-variant insertion and deletion errors, but they eventually learned them as well as the shift-invariant substitution and transposition errors.) The distributed encoding schemes were m-distance lexicodes, where m is the Hamming distance between codes. Thus lexicode-1 is just a binary number code. I tried lexicodes of m=1,2,3 and 4 for both output words and input letters. Both speed of learning and correction accuracy improved linearly with increasing m. These results were published in a paper that appeared in the U.S. Post Office Advanced Technology Conference in May of 1988. My only interpretation of the results is that local encoding schemes simplify the learning task for nets; I'm convinced that distributed schemes are essential for cognitive processes such as semantic representation at least, due to the need for multi-dimensional semantic access and association. As an epilog, I ran a few more experiments afterward that left me with a small puzzle. In the above experiments I had also found that performance improved as the number of hidden nodes increased up to about n(=200) and then leveled off.
Afterwards, I tested the local net with the 450-unit positionally-encoded input scheme and NO hidden nodes and found performance equal to or better than any net with a hidden layer and much faster learning. But when I tried a shift-invariant input encoding scheme, in which misspellings were encoded by a 420-unit vector representing letter bigrams and unigrams, I found similarly good performance for nets with hidden layers but miserable performance for a net with no hidden layer. Apparently, the positionally-encoded input scheme yields a set of linearly-separable input classes but the shift-invariant scheme does not. It's still not clear to me why this is. Karen Kukich kukich at bellcore.com From ps_coltheart at vaxa.mqcc.mq.oz.au Sat Jun 8 10:51:22 1991 From: ps_coltheart at vaxa.mqcc.mq.oz.au (Max Coltheart) Date: Sat, 8 Jun 91 09:51:22 est Subject: distributed representations Message-ID: <9106072351.AA01618@macuni.mqcc.mq.oz.au> The original posting about this mentioned the property of graceful degradation as one of the virtues of systems that use distributed representations. In what way is this a virtue? For nets that are doing some engineering job such as character recognition, it would obviously be good if some damage or malfunction didn't much affect the net's performance. But for nets that are meant to be models of cognition, the hidden assumption seems to be that after brain damage there is graceful degradation of cognitive processing, so the fact that nets show graceful degradation too means they have promise for modelling cognition. But where's the evidence that brain damage degrades cognition gracefully? That is, the person just gets a little bit worse at a lot of things? Very commonly, exactly the opposite happens - the person remains normal at almost all kinds of cognitive processing, but some specific cognitive task suffers catastrophically. No graceful degradation here. I could give very many examples: I'll just give one (Semenza & Zettin, Cognitive Neuropsychology, 1988, 5, 711). This patient, after his stroke, had impaired language, but this impairment was confined to language production (comprehension was fine) and to the production of just one type of word: proper nouns. He could understand proper nouns normally, but could produce almost none whilst his production of other kinds of nouns was normal. What's graceful about this degradation of cognition? If cognition does *not* degrade gracefully, and neural nets do, what does this say about neural nets as models of cognition? Max Coltheart From dave at cogsci.indiana.edu Fri Jun 7 22:03:50 1991 From: dave at cogsci.indiana.edu (David Chalmers) Date: Fri, 7 Jun 91 21:03:50 EST Subject: distributed reps Message-ID: Properties like damage resistance, graceful degradation, etc, are all nice, useful, cognitively plausible possibilities, but I would have thought that by far the most important property of distributed representation is the potential for systematic processing. Obviously ultra-local systems (every possible concept represented by an arbitrary symbol) don't allow much systematic processing, as each symbol has to be handled by its own special rule: e.g. , (though things can be improved somewhat by connecting the symbols up, as e.g. in a semantic network). Things are much improved by using compositional representations, as e.g. found in standard AI. If you represent many concepts by compounding the basic tokens, then certain semantic properties can be reflected in internal structure -- e.g.
"LOVES(CAT, DOG)" and "LOVES(JOHN,BILL)" have relevantly similar internal structures -- opening the door to processing these structures in systematic ways. Distributed representations just take this idea a step further. One sees the systematicity made possible by giving representations internal structure as above, and says "why stop there?" e.g. why not give every representation internal structure (why should CATs and DOGs miss out?). Compositional representations as above only represent a limited range of semantic properties systematically in internal structure -- namely, compositional properties. All kinds of other semantic properties might be fair game. By moving to e.g. vectorial representation for every concept, then e.g. the similarity structure of the semantic space can be reflected in the similarity structure of the representational space, and so on. And it turns out that you can process compositional properties systematically too (though not quite as easily). The combination of a multi-dimensional space with a large supply of possible non-linear operations seems to open up a lot of possible kinds of systematic processing, essentially because these operations can chop up the space in ways that standard operations on compositional structures can't. The proof is in the pudding, i.e. the kinds of systematic processing that connectionist networks exhibit all the time. Most obviously, automatic generalization: new inputs are slotted into some representational form, hopefully leading to reasonable behaviour from the network. Similarly for dealing with old inputs in new contexts. By comparison, with ultra-local representations, generalization is right out (except by assimilating new inputs into an old category, e.g. by nearest neighbour methods). Using compositional representations, certain kinds of generalization are obviously possible, as with decision trees. These suffer a bit from having to deal directly with the original input space, rather than developing a new representational space as with dist reps: so you (a) don't get the very useful capacity to take a representation that's developed and use it for other purposes (e.g. as context for a recurrent network, or as input for some new network), and (b) are likely to have problems on very large input spaces (anyone using decision trees for vision?). Both (a) and (b) suggest that decision trees may be unlikely candidates for the backbone of a cognitive architecture (conversely, the ability of connectionist networks to transform one representational space into another is likely to be key to their success as a cognitive architecture). As for generalization performance, that's an empirical matter, but the results of Dietterich etc seem to indicate that decision trees don't do quite as well, presumably because of the limited ways in which they can chop up a representational space (nasty rectangular blocks vs groovy flexible curves). There's far too much else that could be said, so I'll stop here. Dave Chalmers. From tsejnowski at UCSD.EDU Fri Jun 7 22:48:11 1991 From: tsejnowski at UCSD.EDU (Terry Sejnowski) Date: Fri, 7 Jun 91 19:48:11 PDT Subject: distributed/local Message-ID: <9106080248.AA27620@sdbio2.UCSD.EDU> A nice paper that compares ID3 decision trees with backprop on NETtalk and other data sets: Shavlik, J. W., Mooney, R. J., and Towell, G. G. Symbolic and neural learning algorithms: An experimental comparison (revised). Univ. Wisconsin Dept Comp. Sci Tech Report #955 (to appear Machine Learning #6). 
Overall, backprop performed slightly better than ID3 but took longer to train. Backprop was also more effective in using distributed coding schemes for the inputs and outputs. An error-correcting code, or even a random code, works better than a local code or hand-crafted features. (Ghulum Bakiri and Tom Dietterich reached the same conclusion.) The code developed by the hidden units is also an interesting issue. In NETtalk, the intermediate code was semidistributed -- around 15% of the hidden units were used to represent each letter-to-sound correspondence. The vowels and the consonants were fairly well segregated, arguing for local coding at a gross population level (something seen in the brain) but distributed coding at the level of single units (also observed in the brain). The degree of coarseness clearly depends on the grain of the problem. In the original study, Charlie Rosenberg and I showed that backprop with hidden units outperformed perceptrons, and hence 26 independent linear discriminants. The NETtalk database is available to anyone who wants to benchmark their learning algorithm. For ftp access contact Scott.Fahlman at b.gp.cs.cmu.edu Terry From french at cogsci.indiana.edu Sat Jun 8 00:39:11 1991 From: french at cogsci.indiana.edu (Bob French) Date: Fri, 7 Jun 91 23:39:11 EST Subject: semi-distributed representations Message-ID: One simultaneous advantage and disadvantage of fully distributed representations is that one representation will affect many others. This phenomenon of interference is what allows networks to generalize but it is also what leads to the problem of catastrophic forgetting. It is reasonable to suppose that the amount of interference in backpropagation networks is directly proportional to the amount of overlap of representations in the hidden layer (the "overlap" of two representations can be defined as the dot product of their activation vectors). The greater the overlap (i.e., the more distributed the representations), the more the network will be affected by catastrophic forgetting, but the better it will be at generalizing. The less the overlap (i.e., the more local the representations), the less the network will be affected by catastrophic forgetting, but the worse it will be at generalizing. If we want nets that do not need to be retrained completely when new data is presented to them but still retain their ability to generalize, we must therefore use representations that are neither too local, nor too distributed, what I have called "semi-distributed" representations. I have a paper to appear in CogSci Proceedings 1991 that proposes this relationship between the amount of overlap of representations in the hidden layer and catastrophic forgetting and generalization. The paper outlines one simple method that allows a BP network to evolve its own semi-distributed representations as it learns. - Bob French Center for Research on Concepts and Cognition Indiana University From dcp+ at cs.cmu.edu Sun Jun 9 09:30:32 1991 From: dcp+ at cs.cmu.edu (David Plaut) Date: Sun, 09 Jun 91 09:30:32 EDT Subject: distributed representations In-Reply-To: Your message of Sat, 08 Jun 91 10:51:22 -0400. <9106072351.AA01618@macuni.mqcc.mq.oz.au> Message-ID: <2428.676474232@DWEEB.BOLTZ.CS.CMU.EDU> >But where's the evidence that brain damage degrades cognition gracefully? That >is, the person just gets a little bit worse at a lot of things?
Very commonly, >exactly the opposite happens - the person remains normal at almost all kinds >of cognitive processing, but some specific cognitive task suffers catastrophically. No graceful degradation here. I think the issue here is a matter of scale. "Graceful degradation" refers to the gradual loss of function with increasing severity of damage - it says nothing about how specific or general that function is. Connectionist models can be modular at a global scale, but use distributed representations and show graceful degradation *within* modules. I think you would agree that, within a particular domain, this is a reasonable characterization of the behavior of many types of patient (to the degree that we understand the modular organization of certain aspects of cognition and the nature of individual patients' damage). Of course, severe damage to a module might still produce catastrophic loss of its function, perhaps leaving the remaining functions relatively intact. On the other hand, the *degree* of specificity of impairment certainly places constraints on the modular organization and the nature of the representations within each module (although I think connectionist modeling illustrates the danger of the "specific impairment implies separate module" logic). Only specific modeling work can demonstrate whether connectionist architectures and representations can account for the behavior of specific patients in an informative way. -Dave ------------------------------------------------------------------------------- David Plaut dcp+ at cs.cmu.edu School of Computer Science 412/268-8102 Carnegie Mellon University Pittsburgh, PA 15213-3890 From gasser at bend.UCSD.EDU Sun Jun 9 00:58:26 1991 From: gasser at bend.UCSD.EDU (Michael Gasser) Date: Sat, 8 Jun 91 21:58:26 PDT Subject: Distributed representations and graceful degradation Message-ID: <9106090458.AA04907@bend.UCSD.EDU> Max Coltheart discusses how damage to real neural networks often results in more of a clumsy than a graceful sort of degradation. But isn't degradation under conditions of increasing task complexity a different matter? I'm thinking of the processing of increased levels of embedding or (possibly also) numbers of arguments in natural language. Fixed-length distributed representations of syntactic or semantic structure (e.g., RAAM, Elman nets) seem to model this behavior quite well, in comparison to the usual symbolic approach (you're no more likely to fail at 28 levels of embedding than at 2) and to localist connectionist approaches (you can handle sentences with 3 arguments, but 4 are out because you run out of units). Mike Gasser From siegelma at yoko.rutgers.edu Sun Jun 9 10:56:40 1991 From: siegelma at yoko.rutgers.edu (siegelma@yoko.rutgers.edu) Date: Sun, 9 Jun 91 10:56:40 EDT Subject: TR available from neuroprose; Turing equivalence Message-ID: <9106091456.AA12844@yoko.rutgers.edu> The following report is now available from the neuroprose archive: NEURAL NETS ARE UNIVERSAL COMPUTING DEVICES H. T. Siegelmann and E.D. Sontag. (13pp.) Abstract: It is folk knowledge that neural nets should be capable of simulating arbitrary computing devices. Past formalizations of this fact have been proved under the hypotheses that there are potentially infinitely many neurons available during a computation and/or that interconnections are multiplicative. In this work, we show the existence of a finite network, made up of sigmoidal neurons, which simulates a universal Turing machine.
It is composed of less than 100,000 synchronously evolving processors, interconnected linearly. -Hava ----------------------------------------------------------------------------- To obtain copies of the postscript file, please use Jordan Pollack's service: Example: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): ftp> cd pub/neuroprose ftp> binary ftp> get (remote-file) siegelman.turing.ps.Z (local-file) siegelman.turing.ps.Z ftp> quit unix> uncompress siegelman.turing.ps.Z unix> lpr -P(your_local_postscript_printer) siegelman.turing.ps ---------------------------------------------------------------------------- If you have any difficulties with the above, please send e-mail to siegelma at paul.rutgers.edu. DO NOT "reply" to this message, please. From jagota at cs.Buffalo.EDU Sun Jun 9 16:52:33 1991 From: jagota at cs.Buffalo.EDU (Arun Jagota) Date: Sun, 9 Jun 91 16:52:33 EDT Subject: Information Capacity and Local vs Distributed Message-ID: <9106092052.AA04177@sybil.cs.Buffalo.EDU> Dear Connectionists, I think Information Capacity* (IC) (Abu-Mostafa, Jacques 85) is a useful quantitative criterion for L vs D, illustrated by the following trivial example. You are given k pebbles, to be placed in k-of-n locations. location has pebble => `1', otherwise `0'. IC == # distinct vectors that can be stored = C(n,k) (n choose k) For this e.g., it's nice that the Binomial distribution quantifies IC for L vs D. The IC of k ~ n/2 (distributed) is by far superior. k = 1 ==> Local, IC = n k is n/2 ==> distributed, IC = C(n,n/2) is maximum k = n-1 ==> over-distributed, IC = n With (threshold-element) connectionist nets, the analogy holds, but the (hidden or output layer) units [locations] are not independent. I would think there is scope for theory and empirical work along these lines. I have seen IC work on symmetric nets but even here I am unaware of work on IC as a function of k. I am unaware (haven't looked) of any work on FF nets. * - IC is actually defined as the log of what I have shown Sincerely, Arun Jagota jagota at cs.buffalo.edu From peterc at chaos.cs.brandeis.edu Mon Jun 10 00:06:49 1991 From: peterc at chaos.cs.brandeis.edu (Peter Cariani) Date: Mon, 10 Jun 91 00:06:49 edt Subject: (the late) Rook McCulloch Message-ID: <9106100406.AA29926@chaos.cs.brandeis.edu> Rook McCulloch also edited a 4-volume set of Warren McCulloch's works, "The Collected Works of Warren S. McCulloch", published by Intersystems Press in 1989 (401 Victor Way #3, Salinas, CA 93907 USA; $84 for 4 volumes, paper). In addition to her foreword and Warren McCulloch's papers, the set also contains some very nice essays by Jerry Lettvin, Michael Arbib, F.S.C. Northrop, Heinz von Foerster, D.M. MacKay (and others). For those of us who never knew the McCullochs, this seems to be the best available source of information about what they thought and felt. Also of relevance is the book by Steve Heims on the Macy conferences and the origins of cybernetics ("The Cybernetics Group", MIT Press, 1991) in which Warren McCulloch's role is amply discussed.
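A minimal numerical sketch of the k-of-n information-capacity argument in Arun Jagota's posting above, assuming an arbitrary n = 16 chosen only for illustration:

from math import comb, log2   # requires Python 3.8+ for math.comb

n = 16
for k in (1, n // 2, n - 1):          # local, distributed, over-distributed
    patterns = comb(n, k)             # C(n, k) distinct k-of-n activity vectors
    print(f"k = {k:2d}: C({n},{k}) = {patterns:5d} patterns, IC = {log2(patterns):4.1f} bits")

For n = 16 this prints 16 patterns (4 bits) at k = 1, 12870 patterns (about 13.7 bits) at k = 8, and 16 patterns (4 bits) at k = 15, which is the Binomial-shaped capacity curve the posting describes, peaking at k ~ n/2.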
From bates at crl.ucsd.edu Mon Jun 10 12:25:10 1991 From: bates at crl.ucsd.edu (Elizabeth Bates) Date: Mon, 10 Jun 91 09:25:10 PDT Subject: response to max coltheart Message-ID: <9106101625.AA25405@crl.ucsd.edu> I respectfully disagree with Max Coltheart that brain damage usually or even often yields discrete and domain-specific performance decrements. to be sure, such cases have been reported -- and indeed, their "news value" often lies in the surprisingly discrete nature of the patient's profile. but such case studies typically fail to recognize issues like the peaks and valleys that might have been there premorbidly, i.e. in the "man that used to be". also, we often fail to recognize that by choosing those patients with "interesting" profiles against an unspecified number of background patients with "uninteresting" profiles, we are capitalizing on chance distributions across a number of noisy domains. given 1000 patients who are normally distributed across 100 tasks, I have a pretty solid chance of finding a good number of striking "double dissociations" and even more "single dissociations" entirely by chance. For a simulation that makes EXACTLY that point (coupled with a detailed critique of a "real" study of 20 patients that makes this very error), see Bates, Appelbaum and Allard, "Statistical constraints on the use of single case studies in neuropsychological research", in the last issue of Brain and Language. -liz bates From bates at crl.ucsd.edu Mon Jun 10 12:29:33 1991 From: bates at crl.ucsd.edu (Elizabeth Bates) Date: Mon, 10 Jun 91 09:29:33 PDT Subject: Distributed representations and graceful degradation Message-ID: <9106101629.AA25488@crl.ucsd.edu> Marcel Just and Patricia Carpenter have a paper coming out in Psychological Review that shows (reviewing quite a range of studies) how the ability of normal adults to handle (read, comprehend) various levels of grammatical complexity and ambiguity interacts with (1) that adult's working memory span, and (2) the effects of a cognitive load imposed by a secondary task. The notion of graceful degradation seems to apply to their work very well. You can obtain a preprint of their paper by contacting them at CMU (Psychology Department). -liz bates From cabestan at eel.upc.es Mon Jun 10 10:05:46 1991 From: cabestan at eel.upc.es (JOAN CABESTANY) Date: Mon, 10 Jun 1991 14:05:46 +0000 Subject: Call for Papers IWANN'91 Message-ID: <"155*/S=cabestan/OU=eel/O=upc/PRMD=iris/ADMD= /C=es/"@MHS> Dear Colleagues, Please find here the second Call for Papers for IWANN'91. Remember that the absolute limit date for work presentation is June 20th. IWANN'91 will be held in GRANADA next September.
****************************************************************** ****************************************************************** INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS IWANN'91 Second Announcement Granada, Spain September 17-19, 1991 ORGANISED AND SPONSORED BY Spanish Chapter of the Computer Society of IEEE, AEIA (IEEE Computer Society Affiliate), and Department of Electronic and Computer Technology. University of Granada. Spain. SCOPE Artificial Neural Networks (ANN) were first developed as structural or functional modelling systems of natural ones, featuring the ability to perform problem-solving tasks. They can be thought of as computing arrays consisting of a series of repetitive uniform processors (neuron-like elements) placed on a grid. Learning is achieved by changing the interconnections between these processing elements. Hence, these systems are also called connectionist models. ANNs have become a subject of widespread interest: they offer an odd scheme-based programming standpoint and exhibit higher computing speeds than conventional von-Neumann architectures, thus easing or even enabling the handling of complex tasks such as artificial vision, speech recognition, information recovery in noisy environments or general pattern recognition. In ANN systems, collective information management is achieved by means of parallel operation of neuron-like elements, into which information processing is distributed. It is intended to exploit this highly parallel processing capability as far as possible in complex problem-solving tasks. Cross-fertilization between the domains of artificial and real neural nets is desirable. The more genuine problems of biological computation and information processing in the nervous system still remain open and contributions in this line are more than welcome. Methodology, theoretical frames, structural and organizational principles in neuroscience, self-organizing and co-operative processes and knowledge-based descriptions of neural tissue are relevant topics to bridge the gap between the artificial and natural perspectives. The workshop intends to serve as a meeting place for engineers and scientists working in this area, so that present contacts and relationships can be further increased. The workshop will comprise two complementary activities: . scientific and technical conferences, and . scientific communications sessions. TOPICS The workshop is open to all aspects of artificial neural networks, including: 1. Neural network theories. Neural models. 2. Biological perspectives 3. Neural network architectures and algorithms. 4. Software developments and tools. 5. Hardware implementations 6. Applications. LOCATION Facultad de Ciencias Campus Universitario de Fuentenueva Universidad de Granada 18071 GRANADA. (SPAIN) LANGUAGES English and Spanish will be the official working languages. English is preferable as the working language. Simultaneous translation will be available. CALL FOR PAPERS The Programme Committee seeks original papers on the six above-mentioned areas. Survey papers on the various available approaches or particular application domains are also sought. In their submitted papers, authors should pay particular attention to explaining the theoretical and technical choices involved, to make clear the limitations encountered and to describe the current state of development of their work.
INSTRUCTIONS TO AUTHORS Three copies of submitted papers (not exceeding 8 pages in 21x29.7 cms (DIN-A4), with 1,6 cm. left, right, top and bottom margins) should be received by the Programme Chairman at the address below before June 20, 1991. The headlines should be centred and include: . the title of the paper in capitals . the name(s) of author(s) . the address(es) of author(s), and . a 10 line abstract. Three blank lines should be left between each of the above items, and four between the headlines and the body of the paper, written in English, single-spaced and not exceeding the 8-page limit. All papers received will be refereed by the Programme Committee. The Committee will communicate their decision to the authors on July 10. Accepted papers will be published in the proceedings to be distributed to workshop participants. In addition to the paper, one sheet should be attached including the following information: . the title of the paper, . the name(s) of author(s), . a list of five keywords, . a reference to which of the six topics the paper concerns, and . postal address of one of the authors, with phone and fax numbers, and E-mail (if available). . presentation language We intend to get in touch with various international publishers (such as Springer-Verlag and Prentice-Hall) for the final version of the proceedings. PROGRAM AND ORGANIZATION COMMITTEE Organization Chairman: Alberto Prieto (Unv. Granada. Spain) Programme Chairman: José Mira (UNED. Madrid. Spain) Senen Barro Unv. de Santiago (E) Francois Blayo Ecole Polytechnique Federale de Lausanne (S) Joan Cabestany Unv. Pltca. de Catalunya (E) Marie Cottrell Unv. Paris I (F) Jose Antonio Corrales Unv. Oviedo. (E) Gerard Dreyfus ESPCI Paris (F) Gregorio Fernandez Unv. Pltca. de Madrid (E) J. Simoes da Fonseca Unv. de Lisboa (P) Karl Goser Unv. Dortmund (G) Jeanny Herault INPG Grenoble (F) Jose Luis Huertas CNM- Universidad de Sevilla (E) Simon Jones Unv. Nottingham (UK) Christian Jutten INPG Grenoble (F) Antonio Lloris Unv. Granada (E) Panos A. Ligomenides Unv. of Maryland (USA) Javier Lopez Aligue Unv. de Extremadura. (E) Federico Moran Unv. Complutense. Madrid (E) Roberto Moreno Unv. Las Palmas Gran Canaria (E) Franz Pichler Johannes Kepler Univ. (Aus) Ulrich Rueckert Unv. Dortmund (G) Francisco Sandoval Unv. de Malaga (E) Carmen Torras Instituto de Cibernética. CSIC. Barcelona (E) V. Tryba Unv. Dortmund (G) Elena Valderrama CNM- Unv. Autonoma de Barcelona (E) Michel Weinfeld Ecole Polytechnique Paris (F) LOCAL ORGANIZING COMMITTEE (Universidad de Granada) Juan Julian Merelo Julio Ortega Francisco J. Pelayo Begona del Pino Alberto Prieto ORGANIZING ENTITIES: Spanish Chapter of the Computer Society of IEEE, AEIA (IEEE Computer Society Affiliate), and Department of Electronic and Computer Technology. University of Granada. Spain. SPONSORING ENTITIES: Ayuntamiento de Granada (Dto. de Congresos) Caja General Universidad de Granada SOME USEFUL INFORMATION Granada is a beautiful city that lies to the south of Spain, in which the mixture between Christian and Muslim culture reaches its architectural peak. The Alhambra is the most magnificent European Muslim fortress and palace conserved to date, and Granada nights are known in all Spain for their liveliness, due to the high proportion of students. The river Genil gives rise to the Vega or Valley of Granada, where the soil is fertile and bears the most varied crops.
It has small farms and beautiful villages, some as interesting as Santa Fe, where the voyage for the discovery of America was negotiated. From Granada it takes only one hour to get to the southernmost ski resort in Europe, Sierra Nevada, where Winter sports can be enjoyed. A wide road leads right up to the Veleta Peak, so that in Summer it can be reached by car. This road, at 3,428 m. above sea level, is the highest in Europe. 65 Km. from the city of Granada is Granada's Costa del Sol (so called Costa Tropical or Tropical Coast). The University of Granada is the third most important in Spain. It has 40,000 students, which makes up one sixth of the whole population. This is what gives the city a youthful and dynamic atmosphere, stimulating a "living culture". The weather during mid-September in Granada is warm, and temperatures of 30 degrees Centigrade are not unusual. Temperatures can lower during the night, so a pullover is advised. During the day, t-shirts or light shirts and trousers are the most suitable clothes. PRE AND POST WORKSHOP TOURS: A-EXCURSION: September 16: Trip to Alpujarra, typical mountain villages. Time: 9.00-20h. Price: 3500 ptas./per person (Includes Bus and lunch). B-EXCURSION: September 20: Trip to Costa del Sol, including Nerja with its wonderful caves and the seaside resorts of Almunecar and Salobre$a. Time: 9.00-20h. Price: 2000 ptas. (Includes Bus) SOCIAL ACTIVITIES: September 16: Pre Workshop tour (A-Excursion) September 17: 20:00 Reception at the Hospital Real (16th Century University Central Services Building). 22:00 Night visit to the Alhambra. September 18: 20:00 Reception at the "Palacio de los Cordova" (Albaic!n), given by the Granada City Hall (Congress Dept.). September 19: 21:00 Official dinner September 20: Post Workshop tour (B-Excursion) PROVISIONAL SCHEDULE September 17: 9:15 Opening session. 10:00-11:30 Lecture 1:Natural and Artificial Neural Nets; Prof. Dr. Roberto MORENO (Universidad de las Palmas de Gran Canaria) 11:30-12:00 Coffee-break. 12:00-13:30 Session 1. 16:00-17:30 Session 2. 17:30-18:00 Coffee-break. 18:00-19:30 Session 3 September 18: 09:30-11:00 Lecture 2: Application and Implementation of Neural Networks in Microelectronics; Prof. Dr. Ing. Karl GOSER (Universitt Dortmund) 11:00-11:30 Coffee-break. 11:30-13:30 Session 4. 16:00-17:30 Session 5. 17:30-18:00 Coffee-break. 18:00-19:30 Session 6. September 19: 09:00-11:00 Lecture 3: Cooperative Computing and Neural Networks; Prof. Panos A. LIGOMENIDES (University of Maryland) 11:00-11:30 Coffee-break. 11:30-13:30 Session 7. 16:00-17:30 Session 8. 17:30-18:00 Coffee-break. 18:00-19:30 Session 9. This form should be sent before July 25 to: Viajes Internacional Expreso (V.I.E.); Galerias Preciados; Carrera del Genil, s/n. 18005 GRANADA (Spain) Tnos. (34) 58-22.44.95, (34) 58-22.75.86, (34) 58-224944; Telex: 78525 The following hotels are available with special fees for the Workshop participants. The prices are per night and they include V.A.T. and continental breakfast: Hotel Cat. Single room Double room ______________________________________________________________ Condor *** 7700 10070 pts. Eurobecquer ** 4630 5820 Tour A ........... 3.500 pts. Tour B ........... 2.000 pts Please tick the appropriate box. Reservations can be guaranteed before July 25th. A list of other hotels is enclosed (Please address directly to them). Payment should be made in Spanish currency. I enclose a bank cheque payable to: V.I.E. 
INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS (IWANN'91) Granada, Spain, September 17-19, 1991 HOTEL BOOKING FORM SURNAME ______________________________________ FIRST NAME _______________________________ ORGANIZATION _______________________________________________________________ ADDRESS ____________________________________________________________________ CITY _____________________ POST CODE __________ COUNTRY______________________ TELEPHONE __________________ FAX _________________________ E-MAIL:_______________________ Accompanying person(s) ________________________________________________________ I want to reserve: _______ double room(s); ___________ single room(s) Arrival date:__________ Time: __________ Departure date:_________ Time:_________ INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS (IWANN'91) Granada, Spain, September 17-19, 1991 REGISTRATION FORM SURNAME ______________________________________ FIRST NAME _______________________________ ORGANIZATION ________________________________________________________________ ADDRESS ____________________________________________________________________ CITY _____________________ POST CODE __________ COUNTRY ______________________ TELEPHONE __________________ FAX _________________________ E-MAIL: _______________________ Fill in the appropriate box: Fee Before June 25th After June 25th ___________________________________________________________________ Regular 33.000 35.000 IEEE,AEIA,ATI members 28.000 30.000 Scholarship 4.000 5.000 This form should be sent as soon as possible to: Departamento de Electronica y Tecnologia de Computadores Facultad de Ciencias Universidad de Granada 18071 GRANADA (SPAIN) In order to avoid delays, please fax the registration form, together with a copy of the cheque or the bank transfer to: FAX: 34-58-24.32.30 or 34-58-27.42.58 INSCRIPTION PAYMENTS: Cheque payable to: IWANN'91 (16.142.512) or alternatively transfer to: IWANN'91 IWANN'91 account number: 16.142.512 account number: 007.01-450888 Caja Postal (Code: 2088-2037.1) or to Caja General Camino de Ronda, 138 Camino de Ronda, 156 18003 GRANADA (SPAIN) 18003 GRANADA (SPAIN) ************************************************************************ From ashley at spectrum.cs.unsw.oz.au Tue Jun 11 00:11:13 1991 From: ashley at spectrum.cs.unsw.oz.au (Ashley Aitken) Date: Tue, 11 Jun 91 0:11:13 AES Subject: distributed representations In-Reply-To: <9106072351.AA01618@macuni.mqcc.mq.oz.au>; from "Max Coltheart" at Jun 8, 91 9:51 am Message-ID: <9106101413.10651@munnari.oz.au> G'day, In the discussion of "Distributed Representations", Max Coltheart writes: > > But for nets that are meant to be > models of cognition, the hidden assumption seems to be that after brain damage > there is graceful degradation of cognitive processing, so the fact that nets > show graceful degradation too means they have promise for modelling cognition. > > But where's the evidence that brain damage degrades cognition gracefully? That > is, the person just gets a little bit worse at a lot of things? Very commonly, > exactly the opposite happens - the person remains normal at almost all kinds > of cognitive processing, but some specific cognitive task suffers catastroph- > ically. No graceful degradation here. I would suggest that Max is possibly confusing diffuse brain damage with catastrophic brain damage. Diffuse brain damage is the elimination of a small percentage of neurons diffusely from throughout the brain. 
Examples are the natural death of neurons throughout the brain and, perhaps, micro-lesions. The continual death of an immense number of neurons in the brain thankfully amounts to the death of only a very small percentage of the neurons in the brain. In any of the partitioned networks of the brain (say an area of the cortex) we would expect only a small number of neurons to die. If one considers that a neuron may receive in the order of thousands of synapses on its dendritic tree, it can be understood, I believe, how the network (thought of as a connectionist network) could continue to function if one or two of these were to be eliminated. I would suggest that this continual death of neurons in the brain, with its subtle and often unnoticed degradation of cognitive performance, is an example of (diffuse) brain damage degrading cognition gracefully. Hence, I believe this type of degradation does show neural networks have promise for modelling cognition. Of course, this does depend on the degradation seen in cognition being shown to be qualitatively the same as degradation seen in artificial neural networks. Catastrophic brain damage, on the other hand, is the gross elimination of neurons (usually relatively localized) from the brain. Examples are lesions resulting from head injuries or strokes, and ablation. It would seem that in this case one is most likely seeing the complete (or nearly complete) elimination of an entire network (or a critical part of it) and hence the elimination of its associated and dependent function(s). I don't believe anyone would suggest that the brain's function would degrade gracefully under such terrorist action. Max continues: > > I could give very many examples: I'll just give one (Semanza & Zettin, > Cognitive Neuropsychology, 1988 5 711). This patient, after his stroke, had > impaired language, but this impairment was confined to language production > (comprehension was fine) and to the production of just one type of word: proper > nouns. He could understand proper nouns normally, but could produce almost none > whilst his production of other kinds of nouns was normal. What's graceful about > this degradation of cognition? I am definitely no expert neuroscientist, but I would suggest that this is an example of catastrophic brain damage, not diffuse brain damage. Hence, I would not expect graceful degradation of cognitive performance. It seems to me that this would be too much to ask of all but the most completely holographic-like systems. The interesting point to be made from this example would then be that it appears to be evidence for a cortical region involved (directly or in-line) with the production of proper nouns only. Amazing! It would also be interesting to test if there is any subtle difference in our *understanding* of a noun depending upon whether we are receiving it (i.e. hearing or seeing it) or producing it (i.e. speaking or imagining it). If this diagnosis of catastrophic brain damage is correct, then I believe this example is moot with respect to whether or not the brain is functionally a Connectionist System. Still, the Connectionist System, in my opinion, gets the points for the diffuse brain damage. Hence Max's concluding suggestion, > If cognition does *not* degrade gracefully, and neural nets do, what does this > say about neural nets as models of cognition? becomes rather misplaced, because cognition does appear to degrade gracefully under diffuse brain damage and catastrophically under catastrophic brain damage. The former provides possible evidence for neural networks as models of cognition. Ashley ashley at spectrum.cs.unsw.oz.au 
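(A minimal illustrative sketch, not from the posting above, of the contrast just drawn, using Python/NumPy on a toy linear associator with arbitrarily chosen sizes: diffuse damage is simulated by deleting a small random fraction of connections, focal damage by silencing a contiguous block of output units.)

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy memory: a linear associator storing 20 key/value pattern pairs over 100 units,
    # with orthonormal keys so that undamaged recall is exact.
    Q, _ = np.linalg.qr(rng.standard_normal((100, 20)))
    keys, values = Q.T, rng.standard_normal((20, 100))
    W = values.T @ keys                          # superimposed storage of all pairs

    def recall_error(Wd):
        return np.mean((keys @ Wd.T - values) ** 2)

    diffuse = W * (rng.random(W.shape) > 0.05)   # delete 5% of connections at random
    focal = W.copy()
    focal[:20, :] = 0.0                          # silence a localized block of output units

    print(recall_error(W), recall_error(diffuse), recall_error(focal))

In this toy setting the diffuse deletion raises the recall error only slightly and spreads it across every stored pattern, while the focal lesion removes part of every recalled pattern outright - the same qualitative contrast being argued for above.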
From kbj at jupiter.risc.rockwell.com Mon Jun 10 13:57:41 1991 From: kbj at jupiter.risc.rockwell.com (Ken Johnson) Date: Mon, 10 Jun 91 10:57:41 PDT Subject: No subject Message-ID: <9106101757.AA10673@jupiter.risc.rockwell.com> In response to the debate on Distributed vs. Local Representations..... Everyone in this field has a viewpoint colored by their academic background. So here is mine. The fundamental issues associated with information representations were in many ways dealt with by Shannon. If we consider the neural activity spread across a vector of neurons a resource, then one can conjure up images of 'neural representation bandwidth'. The usage of this bandwidth is determined by noise power, integration time, and a bunch of other signal/system properties. In general, given a finite amount of noise, and a given number of neurons, a distributed representation is more 'efficient' than a local representation. In this case efficiency would be the ability to pack more codes into a given coding scheme or 'code space'. An equally important issue is that of neural code processing. Representation of the information is more or less useless without a processing system capable of transforming neural codes from one form to another in a hierarchical system such as the brain. In this case we have Ashby's Law of Requisite Variety. I can't find my copy of the reference, but it's by John Porter circa 1983-1987. In this work he goes into a discussion and analysis wherein he shows that a neural system's capacity for control and information processing cannot exceed its capabilities as a communication channel. Hence, he throws the ultimate capabilities of a neural processor back to Shannon's description. In addition to these philosophical and theoretical reasons for my preference of distributed codes I've got reams of data from Neocognitron simulations which clearly show that proper operation of the machine REQUIRES distributed codes that use the representation space wisely. References to this work can be found in the Proceedings of the IJCNN 1988 in San Diego, 1988 in Boston, and 1990 in Washington. What we found was an important dichotomy. Neural codes for similar features had to be close together in code space to be grouped into new features by higher level processes. Without this characteristic, pattern classification would not group very similar patterns together. On the other hand, differences between patterns had to be far apart in representation space in order to be discriminated accurately. Hence, we see proper code organization required similar codes to be close while different codes needed to be far apart. One should expect this property if the goal of the system is representational richness. The above arguments lead me to believe that neural coding is one of the fundamental issues that needs to be investigated more thoroughly. Correct utilization of neural representation bandwidth is something we don't do very well. In fact, I'll state that we don't use it at all. The notion of bandwidth immediately suggests time as a representational dimension we don't use. Feedforward systems don't use time to increase the bandwidth of a representation - they are static. Almost all feedback and recurrent systems we see are allowed to 'reach equilibrium' before the 'neural code' is interpreted. 
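(On the code-capacity point made earlier in this posting, a minimal sketch in Python - not part of the original posting - assuming binary units and ignoring noise: with n units, a strictly local one-unit-per-item code can distinguish at most n items, while a distributed binary code can in principle distinguish 2^n.)

    import math

    n = 16
    print(n, 2 ** n)                      # items distinguishable: local (one-hot) vs. distributed binary code
    m = 10000
    print(m, math.ceil(math.log2(m)))     # units needed for m items: m locally, ~log2(m) distributed

With noise, the usable capacity of the distributed code shrinks, since codewords must be kept far enough apart to remain discriminable - which is where the Shannon-style bandwidth argument above comes in.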
Why not use state trajectories, and temporal modulation as means of enhancing neural representation bandwidth and increasing the processing capabilities of neural systems? Ken Johnson kbj at risc.rockwell.com From ahmad at ICSI.Berkeley.EDU Mon Jun 10 18:58:12 1991 From: ahmad at ICSI.Berkeley.EDU (Subutai Ahmad) Date: Mon, 10 Jun 91 15:58:12 PDT Subject: Preprint Message-ID: <9106102258.AA16050@icsib18.Berkeley.EDU> The following paper (to appear in this Cognitive Science proceedings) is available from the neuroprose archives as ahmad.cogsci91.ps.Z (ftp instructions below). Efficient Visual Search: A Connectionist Solution by Subutai Ahmad & Stephen Omohundro International Computer Science Institute Abstract Searching for objects in scenes is a natural task for people and has been extensively studied by psychologists. In this paper we examine this task from a connectionist perspective. Computational complexity arguments suggest that parallel feed-forward networks cannot perform this task efficiently. One difficulty is that, in order to distinguish the target from distractors, a combination of features must be associated with a single object. Often called the binding problem, this requirement presents a serious hurdle for connectionist models of visual processing when multiple objects are present. Psychophysical experiments suggest that people use covert visual attention to get around this problem. In this paper we describe a psychologically plausible system which uses a focus of attention mechanism to locate target objects. A strategy that combines top-down and bottom-up information is used to minimize search time. The behavior of the resulting system matches the reaction time behavior of people in several interesting tasks. A postscript version of the paper can be obtained by ftp from cheops.cis.ohio-state.edu. The file is ahmad.cogsci91.ps.Z in the pub/neuroprose directory. You can either use the Getps script or follow these steps: unix:2> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. Name (cheops.cis.ohio-state.edu:): anonymous 331 Guest login ok, send ident as password. Password: neuron 230 Guest login ok, access restrictions apply. ftp> cd pub/neuroprose ftp> binary ftp> get ahmad.cogsci91.ps.Z ftp> quit unix:4> uncompress ahmad.cogsci91.ps.Z unix:5> lpr ahmad.cogsci91.ps --Subutai ahmad at icsi.berkeley.edu From crr at shum.huji.ac.il Mon Jun 10 15:12:11 1991 From: crr at shum.huji.ac.il (crr@shum.huji.ac.il) Date: Mon, 10 Jun 91 22:12:11 +0300 Subject: distributed vs. local encoding schemes Message-ID: <9106101912.AA28249@shum.huji.ac.il> Terry Sejnowski mentioned the kinds of hidden units that we found in NETtalk. As for the input/output representations, we ran a number of experiments using both local (one unit per letter/phoneme, but more than one unit on per window) and distributed representations (more than one unit on per letter/phoneme). Learning times are generally faster with distributed representations simply because the net inputs and resulting error gradients are larger. (However it might be possible to boost the learning rate for the local representation to match the distributed one. I don't know if this would affect generalization or not since I didn't try it.) Using a representation that "makes sense" for the particular domain (such as using an articulatory feature code for the phonemes -- or is this local because the units represent features?) 
also leads to faster learning, and is more resistant to damage than a "random" encoding of the phonemes. Charlie Rosenberg 
From CADEPS at BBRNSF11.BITNET Tue Jun 11 08:56:05 1991 From: CADEPS at BBRNSF11.BITNET (JANSSEN Jacques) Date: Tue, 11 Jun 91 14:56:05 +0200 Subject: No subject Message-ID: <5901C8A706400066@BITNET.CC.CMU.EDU> STEERABLE GenNets - A Query. Abstract : One can evolve a GenNet (a neural net evolved with the genetic algorithm) to display two separate behaviors depending upon the setting of a clamped input control variable. By using an intermediate control value one obtains an intermediate behavior. For example, let the behaviors be sinusoidal oscillations of periods T1 and T2, where the control settings are 0.5 and -0.5. By using a control value of 0.3, one will get a sinusoid with a period between T1 and T2. Why? Has anyone out there had any similar experiences (i.e. of this sort of generalised behavioral learning), and has anybody any idea why GenNets are capable of such a phenomenon? If I receive some interesting replies, I'll prepare a summary and report back. Further details. One of the great advantages of GenNets (= using the GA to teach your neural nets their behaviors) over traditional NN paradigms such as backprop, Hopfield, etc. is that the GA treats your NN as a black box, and it doesn't matter how complex the internal dynamics of the NN are. All that counts is the result. How well did the NN perform? If it did well, the bitstring which codes for the NN's weights will survive. This allows the creation of GenNets which can cope with both inputs and outputs which vary constantly. One does not need stationary output values a la Hopfield etc. Hence NNs become much more "dynamic", compared to the more "static" nature of traditional paradigms. One can thus evolve dynamics (behaviors) on NNs (GenNets). This opens up a new world of NN possibilities. If one can evolve a GenNet to express one behavior, why not two? If two, can one evolve a continuum of behaviors depending upon the setting of a controlled input value? The variable frequency generator GenNet mentioned above shows that this is possible. But I'm damned if I know why. What's going on? Have any of you had similar experiences? Any clues for a theoretical explanation for this extraordinary phenomenon? P.S. To evolve this GenNet, use a fully connected net, with all external inputs set at zero, except for two inputs. Clamp one at 0.5, and the other at 0.5 (and then -0.5 in the second "experiment"). The fitness is the inverse of the sum of the two sums (for the two expts) of the squares of the difference between the desired output at each clock cycle and the actual output. Assign one neuron to be the output neuron. Cheers, Hugo de Garis, University of Brussels, Belgium, George Mason University, VA, USA. 
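(A small sketch, not from the posting above, of the fitness computation described in the P.S., assuming one recorded output trace per experiment; the example targets and noise level are made up for illustration.)

    import numpy as np

    def gennet_fitness(desired, actual):
        # inverse of the summed squared per-cycle error, totalled over both experiments
        sse = sum(np.sum((np.asarray(d) - np.asarray(a)) ** 2) for d, a in zip(desired, actual))
        return np.inf if sse == 0 else 1.0 / sse

    t = np.arange(100)
    desired = [np.sin(2 * np.pi * t / 20), np.sin(2 * np.pi * t / 40)]   # targets with periods T1=20, T2=40
    rng = np.random.default_rng(1)
    actual = [d + 0.1 * rng.standard_normal(t.size) for d in desired]    # a candidate GenNet's output traces
    print(gennet_fitness(desired, actual))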
From thomasp at gshalle1.informatik.tu-muenchen.de Tue Jun 11 11:50:25 1991 From: thomasp at gshalle1.informatik.tu-muenchen.de (Thomas) Date: Tue, 11 Jun 1991 17:50:25 +0200 Subject: Research Position in SPAIN ? Message-ID: <9106111550.AA08800@gshalle1.informatik.tu-muenchen.de> I'm a graduate student in computer science at Munich Technical University and plan to work in a research position related to neural networks in SPAIN. I would greatly appreciate it if you could provide me with some information on university/private/company research institutes active or interested in the field of neural network research and located in the Madrid or Seville area. Preferably, I would like to start working in Spain in November 91 or, alternatively, in January/February 1992. Sincerely, Patrick Thomas Institute for Medical Psychology Goethestr. 31 8000 Munich 2 
From moeller at kiti.informatik.uni-bonn.de Thu Jun 13 03:50:34 1991 From: moeller at kiti.informatik.uni-bonn.de (Knut Moeller) Date: Thu, 13 Jun 91 09:50:34 +0200 Subject: TR available from neuroprose; learning algorithms Message-ID: <9106130750.AA01054@kiti.> The following report is now available from the neuroprose archive: LEARNING BY ERROR-DRIVEN DECOMPOSITION D.Fox V.Heinze K.Moeller S.Thrun G.Veenker (6pp.) Abstract: In this paper we describe a new selforganizing decomposition technique for learning high-dimensional mappings. Problem decomposition is performed in an error-driven manner, such that the resulting subtasks (patches) are equally well approximated. Our method combines an unsupervised learning scheme (Feature Maps [Koh84]) with a nonlinear approximator (Backpropagation [RHW86]). The resulting learning system is more stable and effective in changing environments than plain backpropagation and much more powerful than extended feature maps as proposed by [RMW89]. Extensions of our method give rise to active exploration strategies for autonomous agents facing unknown environments. The appropriateness of this technique is demonstrated with an example from mathematical function approximation. ----------------------------------------------------------------------------- To obtain copies of the postscript file, please use Jordan Pollack's service: Example: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): ftp> cd pub/neuroprose ftp> binary ftp> get (remote-file) fox.decomp.ps.Z (local-file) fox.decomp.ps.Z ftp> quit unix> uncompress fox.decomp.ps.Z unix> lpr -P(your_local_postscript_printer) fox.decomp.ps ---------------------------------------------------------------------------- If you have any difficulties with the above, please send e-mail to moeller at kiti.informatik.uni-bonn.de DO NOT "reply" to this message!! 
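(A loose illustrative sketch, not from the report, of the error-driven decomposition idea in the abstract, assuming a 1-D function-approximation task and using simple per-patch constant models in place of the feature-map/backpropagation combination described above.)

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.uniform(-3.0, 3.0, 400)
    Y = np.sin(X) + 0.05 * rng.standard_normal(400)

    prototypes = [0.0]                      # patch centres (the "feature map" nodes)
    for _ in range(8):                      # grow the decomposition in an error-driven way
        centres = np.array(prototypes)
        assign = np.argmin(np.abs(X[:, None] - centres[None, :]), axis=1)
        patch_error = []
        for k in range(len(prototypes)):
            m = assign == k
            pred = Y[m].mean() if m.any() else 0.0            # stand-in local model for the patch
            patch_error.append(float(((Y[m] - pred) ** 2).sum()) if m.any() else 0.0)
        worst = int(np.argmax(patch_error))
        m = assign == worst
        resid = np.abs(Y[m] - Y[m].mean())
        prototypes.append(float(X[m][np.argmax(resid)]))      # split the worst patch at its worst-fit point
    print([round(p, 2) for p in sorted(prototypes)])

The point of the sketch is only the control strategy: new patches are allocated where the current approximation error is largest, so the subtasks end up roughly equally well approximated.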
From thomasp at gshalle1.informatik.tu-muenchen.de Thu Jun 13 13:33:19 1991 From: thomasp at gshalle1.informatik.tu-muenchen.de (Thomas) Date: Thu, 13 Jun 1991 19:33:19 +0200 Subject: Gracias & Sorry Message-ID: <9106131733.AA19732@gshalle1.informatik.tu-muenchen.de> Sorry for the "garbage" and muchas gracias to all those helping out with addresses and conference announcements. 
Patrick From utans-joachim at CS.YALE.EDU Sat Jun 15 12:48:45 1991 From: utans-joachim at CS.YALE.EDU (Joachim Utans) Date: Sat, 15 Jun 91 12:48:45 EDT Subject: preprint available Message-ID: <9106151648.AA01689@SUNNY.SYSTEMSX.CS.YALE.EDU> The following preprint has been placed in the neuroprose archive at Ohio State University: Selecting Neural Network Architectures via the Prediction Risk: Application to Corporate Bond Rating Prediction Joachim Utans John Moody Department of Electrical Engineering Department of Computer Science Yale University Yale University New Haven, CT 06520 New Haven, CT 06520 Abstract: Intuitively, the notion of generalization is closely related to the ability of an estimator to perform well with new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select the optimal network architecture. The prediction risk needs to be estimated from the available data; here we approximate the prediction risk by v-fold cross-validation and asymtotic estimates of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of complete a priori information that could be used to impose a structure to the network architecture. To retrieve it by anonymous ftp: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): neuron ftp> cd pub/neuroprose ftp> binary ftp> get utans.bondrating.ps.Z ftp> quit unix> uncompress utans.bondrating.ps unix> lpr -P(your_local_postscript_printer) utans.bondrating.ps Joachim Utans From h1201kam at ella.hu Sun Jun 16 13:05:00 1991 From: h1201kam at ella.hu (Kampis Gyorgy) Date: Sun, 16 Jun 91 13:05:00 Subject: a new book; special issue on emergence; preprint availab Message-ID: <9106161115.AA13832@sztaki.hu> ANNOUNCEMENTS **************************************************************** 1. a new book 2. a Special Issue on emergence 3. preprint available **************************************************************** 1. the book George Kampis SELF-MODIFYING SYSTEMS IN BIOLOGY AND COGNITIVE SCIENCE: a New Framework for Dynamics, Information and Complexity Pergamon, Oxford-New York, March 1991, 546pp with 96 Figures About the book: The main theme of the book is the possibility of generating information by a recursive self-modification and self- redefinition in systems. The book offers technical discussions of a variety of systems (Turing machines, input-output systems, synergetic systems, connectionist networks, nonlinear dynamic systems, etc.) to contrast them with the systems capable of self-modification. What in the book are characterized as 'simple systems' involve a fixed definition of their internal modes of operations, with variables, parts, categories, etc. invariant. Such systems can be represented by single schemes, like computational models of the above kind. A relevant observation concerning model schemes is that any scheme grasps but one facet of material structure, and hence to every model there belongs a complexity excluded by it. In other words, to every simple system there belongs a complex one that is implicit. 
Self-modifying systems are 'complex' in the sense that they are characterized by the author as ones capable of accessing an implicate material complexity and turning it into the information-carrying variables of a process. An example of such a system would be a tape recorder which spontaneously accesses new modes of information processing (e.g. bits represented as knots on the tape). A thesis discussed in the book is that unlike current technical systems, many natural systems know how to do that trick, and make it their principle of functioning. The book develops the mathematics, philosophy and methodology for dealing with such systems, and explains how they work. A constructive theory of models is offered, with which the modeling of systems can be examined in terms of algorithmic information theory. This makes possible a novel treatment of various old issues like causation and determinism, symbolic and nonsymbolic systems, the origin of system complexity, and, finally, the notion of information. The book introduces technical concepts such as information sets, encoding languages, material implications, supports, and reading frames, to develop these topics, and a class of systems called 'component-systems', to give examples of self-modifying systems. As an application, it is discussed how the latter can be applied to understand aspects of evolution and cognition. 
From tgelder at phil.indiana.edu Mon Jun 17 11:45:58 1991 From: tgelder at phil.indiana.edu (Timothy van Gelder) Date: Mon, 17 Jun 91 10:45:58 EST Subject: distribution and its advantages Message-ID: Javier Movellan's question -- what are distributed representations good for, anyway? -- is, I think, an important one for connectionism and cognitive science generally. Trouble is, the way it was put, it presupposes that there is some one kind of representation that everyone is referring to when they talk about distribution. In fact, though most people have a reasonable idea what they themselves intend when they use the term "distributed", they usually don't realize that it's not the way many other people use it. This is immediately apparent if one takes an overview of the responses that actually came in. Various people took it that a representation is distributed if it utilizes many units rather than just one, with the "strength" of distribution increasing as the total number of units (or perhaps, the proportion of available units) used increases. Massone by contrast thought the key concept is that of redundancy, which I take roughly to mean that a given piece of input information is represented multiple times. This presumably requires that many units are used (i.e., that there is distribution in the previous sense) but is a significantly stronger requirement. Massone's position was echoed in some other responses. Chalmers claims that a distributed representation is one in which every representation, whether of a basic concept or a more complex one, has a kind of semantically significant internal structure. This definition also seems to presuppose the first kind of definition, but is different from redundancy. Proposing a somewhat different definition again, French suggested that distribution is a matter of the degree of "overlap" between representations of different entities. And so on. This lack of agreement over what distribution actually is, is at least partly responsible for the fact that no really clear and useful consensus on the advantages of distributed representation really emerged in the responses to the initial question. 
It manifests a wider lack of agreement over the concept of distribution in connectionism and cognitive science more generally. I once surveyed as many of the definitions and occurrences of "distribution", "distributed representation", etc., as I could find in the cognitive science literature, and found that there were at least 5 very different basic properties that people often refer to as distribution. These ranged from a very simple notion of "spread-out- ness" - each entity being represented by activity in many units rather than just one - at one extreme, to complete functional equipotentiality at the other. (A representation is functionally equipotential when any part of it can stand in for the whole thing. Holograms are famous for exhibiting a form of equipotentiality.) Authors often picked up multiple strands and ran them together in one characterization, or defined distribution differently on different occasions, sometimes even in the same work. Probably the two most common definitions are (1) the notion of simple extendedness just mentioned (i.e., using "many" units to represent a given item) and (2) superimposition of representations. We have superimposition when there are multiple items being represented at the same time, but no way of pointing to the discrete part of the representation which is responsible for item A, the discrete part which is responsible for item B, and so forth. Think of the weights in a standard feed-forward network. Here multiple input-output associations are represented at the same time, but there is (in general) no separate set of weights for each association. To see how these two senses simultaneously dominate connectionist discussions of distribution, think again of the answers to Movellan's question. Many of the answers took the form, roughly, that "when I used representations involving activity in many units rather than just one in such and such a network, I found better (or worse!) performance". Other responses, particularly those that made reference to the brain or neuropsychological results, were more concerned with the extent to which there is separate or discrete storage of the various components of our knowledge in a given circumscribed domain. (In these contexts, "graceful degradation" in performance is often thought to be a consequence of knowledge being stored in an inextricably superimposed fashion.) In one sense, it is not surprising that these are the two most common notions of distribution. Perhaps the only thing that is really clear about distribution is the opposition between distribution and localization: whatever distributed representations are, they are non-local. Trouble is, "local" turns out to be ambiguous. Sometimes "local" means restricted in extent (e.g., using only one unit rather than many), and sometimes it means not overlapping with the representation of anything else. The two most common senses of "distribution" mentioned a moment ago simply result from denying locality in these two distinct senses. It seems to me that a necessary condition for any significant progress on the question "what are distributed representations good for?" is that this general state of confusion over what "distributed" means be resolved. This means clearly laying out the different senses that are floating around, picking out the one that is the most central and most theoretically significant, and giving it a reasonably precise definition. 
I attempted this in Ch.1 of my PhD dissertation (Distributed Representation, University of Pittsburgh 1989); a shorter overview of some of the material from that chapter has recently appeared as "What is the D in PDP? An overview of the concept of distribution" in Stich, Ramsey & Rumelhart (eds) Philosophy and Connectionist Theory. In my opinion, the most important concept in the vicinity of distribution is that of superimposition of representations, and it is for this that the term "distributed" should really be reserved. One advantage of this strategy is that superimposition admits of a surprisingly clear and satisfying mathematical definition: Suppose R is a representation of multiple items. If the representings of the different items are fully superimposed, every part of the representation R must be implicated in representing each item. If this is achieved in a non-trivial way there must be some encoding process that generates R given the various items to be stored, and which makes R vary, at every point, as a function of each item. This process will be implementing a certain kind of transformation from items to representations. This suggests thinking of distribution more generally in terms of mathematical transformations exhibiting a certain abstract structure of dependency of the output on the input. More precisely, define any transformation from a function F to another function G as strongly distributing just in case the value of G at any point varies with the value of F at every point; the Fourier transform is a classic example. Similarly, a transformation from F to G is weakly distributing, relative to a division of the domain of F into a number of sub-domains, just in case the value of G at every point varies as a function of the value of F at at least one point in each sub-domain. The classic example here is the linear associator, in which a series of vector pairs are stored in a weight matrix by first forming, and then adding together, their respective outer products. Each element of the matrix varies with every stored vector, but only with one element of each of those vectors. (The "functions" F and G in this case describe the input vectors and the association matrix respectively; e.g., given an argument specifying a place in an input vector, F returns the value of the vector at that place.) Clearly, a given distributing transformation yields a whole space of functions resulting from applying that transformation to different inputs (i.e., different functions F). If we think of these output functions as descriptions of representations, and the input functions as descriptions of items to be represented, the distributing transformation is defining a whole space or scheme of distributed representations. To be a distributed representation, then, is to be a member of such a scheme; it is to be a representation R of a series of items C such that the encoding process which generates R on the basis of C implements a given distributing transformation. Basically, then, distributed representations are what you get from distributing transformations, which are transformations which make each part of the output (the representation) depend on every part of the input (what you're representing). Now, mathematically speaking, there is a vast number of different kinds of distributing transformations, and so there is a vast number of possible instantiations of distributed representation. 
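(A small numerical sketch, mine rather than the poster's, of the linear-associator example just mentioned, assuming orthonormal cue vectors so that recall is exact: the weight matrix is a sum of outer products, so every matrix element depends on every stored pair, yet each pair remains recoverable.)

    import numpy as np

    rng = np.random.default_rng(3)
    Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
    keys = Q.T                              # three orthonormal 8-dimensional cue vectors
    values = rng.standard_normal((3, 8))    # the three items to be associated with them

    # Superimposed storage: W is the sum of the outer products of all the pairs,
    # so no discrete part of W is responsible for any single association.
    W = sum(np.outer(v, k) for k, v in zip(keys, values))

    print(np.allclose(W @ keys[0], values[0]))   # True: the first pair is still recoverable

Each element of W varies with every stored pair but with only one component of each cue, which is exactly the "weakly distributing" case defined above.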
Connectionists can be seen as exploring that portion of the space of possible transformations that you can handle with n-dimensional vector operations, learning algorithms, etc. In other domains such as optics it is possible to implement other forms of distributing transformations and hence to get distributed representations with different properties. There are a number of reasons for wanting to define distributed representation in terms of superimposition generally, and distributed transformations in particular: (a) superimposition is certainly one of the most common of the standard senses of "distribution" in current usage, and so we remain as close as possible to that usage; (b) superimposition admits of a precise mathematical definition, so those who think clarity only comes from formalization should be kept happy; (c) various popular properties of distributed representation such as automatic generalization and graceful degradation are a natural consequence of distribution defined this way; (d) in practice, in a connectionist context, distribution in the sense of requiring many units rather than just one is a necessary precondition of this more full-blooded notion; hence any advantages that accrue to representations in virtue of utilizing many units also accrue to superimposed representations; (e) a number of other interesting theoretical results follow from defining distribution this way: in particular, it can be shown that distributed representations cannot be symbolic in nature, on a reasonably precise definition of "symbolic" (see e.g. my "Why distributed representation is inherently non- symbolic", in G. Dorffner (ed.) Konnektionismus in Artificial Intelligence und Kognitionsforschung. Berlin: Springer- Verlag, 1990; 58-66). On the basis of this kind of definition of what distributed representation is, what kind of answer can be given to the "what are distributed representations good for?" question? Well, the kind of answer you will find satisfying will depend very much on what your theoretical interests are. A connectionist whose concerns have more of an applied, engineering focus will want to know what specific processing benefits arise from using representations generated by distributing transformations. As mentioned in (c) above, I think that some of the favorite virtues of distribution are best seen as an immediate consequence of superimposition. The technical issues here still need much clarification, however. As a cognitive scientist, on the other hand, I'm interested in more general questions such as - what are the advantages of distribution for human knowledge representation? Here I don't have any actual answers ready to hand; the most I can do the moment is point to the kind of question that seems the most interesting. Speaking at the broadest possible level: various difficulties encountered in mainstream AI, combined with some philosophical reflections, suggest that everyday commonsense knowledge cannot be fully and effectively captured in any kind of purely symbolic format; that, in other words, symbolic representation is fundamentally the wrong medium for capturing at least certain kinds of human knowledge. Just above I mentioned that distributed representation (defined in terms of superimposition) can be shown to be intrinsically non-symbolic. The obvious suggestion then is: perhaps the most important advantage of distributed representation is that it (and it alone?) is capable of representing the kind of knowledge that underlies everyday human competence? 
Tim van Gelder From tsejnowski at UCSD.EDU Mon Jun 17 13:14:00 1991 From: tsejnowski at UCSD.EDU (Terry Sejnowski) Date: Mon, 17 Jun 91 10:14:00 PDT Subject: Santa Fe Time Series Competition Message-ID: <9106171714.AA23031@sdbio2.UCSD.EDU> A Time Series Prediction and Analysis Competition The Santa Fe Institute August 1, 1991 - December 31, 1991 A wide range of new techniques are now being applied to the time series analysis problems of predicting the future behavior of a system and deducing properties of the system that produced the time series. Such problems arise in most observational disciplines, including physics, biology, and economics; new tools, such as the use of connectionist models for forecasting, or the extraction of parameters of nonlinear systems with time-delay embedding, promise to provide results that are unobtainable with more traditional time series techniques. Unfortunately, the realization and evaluation of this promise has been hampered by the difficulty of making rigorous comparisons between competing techniques, particularly ones that come from different disciplines. In order to facilitate such comparisons and to foster contact among the relevant disciplines, the Santa Fe Institute is organizing a time series analysis and prediction competition. A few carefully chosen experimental time series will be made available through a computer at the Santa Fe Institute, and quantitative analyses of these data will be collected in the areas of forecasting, characterization (evaluating dynamical measures of the system such as the number of degrees of freedom and the information production rate), and system identification (inferring a model of the system's governing equations). At the close of the competition the performance of the techniques submitted will be compared and published, and the server will continue to operate as an archive of data, programs, and comparisons among algorithms. There will be no monetary prizes. A workshop is planned for the Spring of 1992 to explore the results of the competition. The competition does not require advance registration; to enter, simply retrieve the data and submit your analysis. The detailed description of the competition categories and instructions for retrieving the data and entering the competition will be available after August 1 through four routes: ACCESSING THE DATA --------- --- ---- ftp: Ftp to sfi.santafe.edu (192.12.12.1) as user "tsguest" and use "tsguest" for the password. Get the file "instructions". dial-up: There are two dial-up lines: 505-988-1705 (2400 baud), and 505-986-0252 (any speed to 9600 baud). The settings for both lines are no parity, 8 bit words, 1 stop bit. At the connect press return; at the prompt type "login tsguest" and use "tsguest" for the password. At the next prompt type "telnet sfi" and login as user "tsguest" (password "tsguest"). Using either "kermit" or "xmodem", retrieve the file instructions". When you are finished, logout from sfi and from the prompt. mail server: Send email to tserver at sfi.santafe.edu with the phrase "send time series instructions" in either the subject or the body of the message. The mailer will return a file with more detailed instructions for requesting the data and submitting analyses. pc disks: The data is available on disks in either IBM-PC or Mac formats. 
To cover the cost of distributing the data, send $25 to Time Series Competition Disks, The Santa Fe Institute, 1120 Canyon Road, Santa Fe, NM 87501, and specify the machine type, disk size, and disk density required. Instructions will be included with the disks on submitting a return disk with the analysis of the data. FOR MORE INFORMATION --- ---- ----------- Further questions about the competition, or inquiries about contributing data to be used in the competition, should be directed to: Time Series Competition Santa Fe Institute 1660 Old Pecos Trail, Suite A Santa Fe, NM 87501 (505) 984--8800 tserver at sfi.santafe.edu or to one of the organizers: Neil Gershenfeld Andreas Weigend Department of Physics Xerox Palo Alto Research Center Harvard University 3333 Coyote Hill Road 15 Oxford Street Palo Alto, CA 94304 Cambridge, MA 02138 (415) 322-4066 (617) 495-5641 andreas at sfi.santafe.edu neilg at sfi.santafe.edu ADVISORY BOARD -------- ----- Prof. Leon Glass Department of Physiology McGill University Prof. Clive W. J. Granger Center for Econometric Analysis Department of Economics University of California, San Diego Prof. William H. Press Department of Physics and Center for Astrophysics Harvard University Prof. Maurice B. Priestley Department of Mathematics The University of Manchester Institute of Science and Technology Prof. Itamar Procaccia Department of Chemical Physics The Weizmann Institute of Science Prof. T. Subba Rao Department of Mathematics The University of Manchester Institute of Science and Technology Prof. Harry L. Swinney Department of Physics University of Texas at Austin 
From pazzani%pan.ICS.UCI.EDU at VM.TCS.Tulane.EDU Tue Jun 18 14:10:12 1991 From: pazzani%pan.ICS.UCI.EDU at VM.TCS.Tulane.EDU (Michael Pazzani) Date: Tue, 18 Jun 91 11:10:12 -0700 Subject: Special Issue of Machine Learning Journal Message-ID: <9106181110.aa28419@PARIS.ICS.UCI.EDU> MACHINE LEARNING will be publishing a special issue on Computer Models of Human Learning. The ideal paper would describe an aspect of human learning, present a computational model of the learning behavior, evaluate how the performance of the model compares to the performance of human learners, and describe any additional predictions made by the computational model. Since it is hoped that the papers will be of interest to both cognitive psychologists and computer scientists, papers should be clearly written and provide the background information necessary to appreciate the contribution of the computational model. Manuscripts must be received by April 1, 1992, to assure full consideration. One copy should be mailed to the editor: Michael Pazzani Department of Information and Computer Science University of California, Irvine, CA 92717 USA In addition, four copies should be mailed to: Karen Cullen MACH Editorial Office Kluwer Academic Publishers 101 Philip Drive Assinippi Park Norwell, MA 02061 USA Papers will be subject to the standard review process. Please pass this announcement along to interested colleagues. 
The ideal paper would describe an aspect of human learning, present a computational model of the learning behavior, evaluate how the performance of the model compares to the performance of human learners, and describe any additional predictions made by the computational model. Since it is hoped that the papers will be of interest to both cognitive psychologists and computer scientists, papers should be clearly written and provide the background information necessary to appreciate the contribution of the computational model. Manuscripts must be received by April 1, 1992, to assure full consideration. One copy should be mailed to the editor: Michael Pazzani Department of Information and Computer Science University of California, Irvine, CA 92717 USA In addition, four copies should be mailed to: Karen Cullen MACH Editorial Office Kluwer Academic Publishers 101 Philip Drive Assinippi Park Norwell, MA 02061 USA Papers will be subject to the standard review process. Please pass this announcement along to interested colleagues. From pollack at cis.ohio-state.edu Tue Jun 18 11:28:35 1991 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Tue, 18 Jun 91 11:28:35 -0400 Subject: Neuroprose Turbulence Expected Message-ID: <9106181528.AA01029@dendrite.cis.ohio-state.edu> Cheops, the pyramid machine upon which NEUROPROSE resides, will be decommissioned. The Neuroprose archive will move, with luck, to a new Sparcserver at the same IP address, also called Cheops. But between today and July 1, all cis.ohio-state.edu systems (including email) will be pretty wobbly, so expect delays. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Phone: (614)292-4890 (then * to fax) From uh311ae at sunmanager.lrz-muenchen.de Tue Jun 18 18:31:23 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 19 Jun 91 00:31:23+0200 Subject: large SIMD nn machines, ASI Message-ID: <9106182231.AA12381@sunmanager.lrz-muenchen.de> Hello, I wonder whether there are any other beta testers of the ASI Cnaps machine out there who might want to share some experiences. Specifically, has anyone - implemented a non-local algorithm (CG, PCG), - implemented a good random number generator memory-efficient enough to be put into node memory / what do you think about tables or host communication for an alternative implementation? - thought about interfacing some hardware as a preprocessor, piping data in via DMA? - found a job for idle processors (small net sizes) - liked the 1-bit weight mode - ported the debugger to Irix - (other)? Some of these questions should be familiar to other SIMD programmers too (I have the Witbrock GF11 paper). Thank you for any hints. Cheers, Henrik (Rick at vee.lrz-muenchen.de) H. Klagges, Laser Institute Prof Haensch, PhysDep U of Munich, FRG + IBM Research Division, Binnig group From uh311ae at sunmanager.lrz-muenchen.de Tue Jun 18 18:44:50 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 19 Jun 91 00:44:50+0200 Subject: Backpercolation Message-ID: <9106182244.AA12446@sunmanager.lrz-muenchen.de> Hello, I wonder whether the backpercolation algorithm (see back articles in comp.ai.neural-nets) is important or not. I got some very preliminary results on very simple problems (an n-n-n linear channel with few (3-10) patterns) which look not bad, but complicated ones don't seem particularly zooming yet (yes, there are some bugs in my code left).
If anyone would like a C++ backperc server object (guaranteed to be broken) to avoid reinventing the wheel and to get some basic data structures, let me know. The only problem: Mark Jurik (mgj at cup.portal.com) wants you to sign a nondisclosure thing first before I can send it out to you. Anyway, if someone else has some first results, I would really like to see them. Cheers, Henrik (Rick at vee.lrz-muenchen.de) From ITGT500 at INDYCMS.BITNET Tue Jun 18 16:32:57 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Tue, 18 Jun 91 15:32:57 EST Subject: Distributed vs. local representation Message-ID: <25A077F07E800064@BITNET.CC.CMU.EDU> In the following I would like to state my views on distributed and local representations. All comments are more than welcome. I think that if we define a strict local representation as: "one object (or item, entity, etc.) is represented by one node (or unit, neuron, etc.) only, and one node represents only one object", then all the other situations probably can be classified as distributed representation (either semi- or fully distributed). In other words, only the one-to-one representation belongs to local representation. The others, multiple-to-one, one-to-multiple, and multiple-to-multiple representations, all belong to distributed representation. Therefore, distributed representation covers more senses than local representation. This may help reduce the confusion regarding these definitions. Because distributed representation covers a wider range than local representation, it can take many different forms. One point unnoticed up to now is the difference between the "binary representation" (the node takes binary values only) and the "analog representation" (the node takes analog values). In NETtalk and many other examples, the distributed representation used seems to be the binary one. However, the world seems to favor and to be taking the analog form. Therefore, analog distributed representation probably is the one that is working in and dominating our cognitive processes. I met one such problem in our work on a parabolic problem. We found that it would be very difficult, if not impossible, to use a (strict) local or binary distributed representation to solve the parabolic problem. It was only the analog distributed representation that worked well. We concluded that from the practical application viewpoint, both local and distributed representations would work if the training and test patterns were discrete and finite. However, if the training and/or test patterns were continuous and infinite, only distributed representation worked. -Bo From aam9n at hagar2.acc.Virginia.EDU Wed Jun 19 04:39:39 1991 From: aam9n at hagar2.acc.Virginia.EDU (Ali Ahmad Minai) Date: Wed, 19 Jun 91 04:39:39 EDT Subject: Distributed Representations Message-ID: <9106190839.AA00322@hagar2.acc.Virginia.EDU> We connectionists never tire of talking about "distributed representations", and with good reason. However, I have never come across a rigorous definition of the concept. Now, I realize that this notion, like most powerful ones, will necessarily be diminished in any process of definition, however inclusive that might be. That has not fazed us in trying to define entropy, information, complexity, learnability --- and probability! My question is: has anyone rigorously, or even empirically, tried to come up with a definition for distributed representations --- especially a way to quantify distributed-ness?
I suppose high-order statistics represent a way to look at this, but have there been any attempts to develop a definition specifically in the context of connectionist networks? And would that be such a bad thing? Ali Minai Dept of EE University of Virginia aam9n at Virginia.EDU From maureen at ai.toronto.edu Wed Jun 19 11:38:49 1991 From: maureen at ai.toronto.edu (Maureen Smith) Date: Wed, 19 Jun 1991 11:38:49 -0400 Subject: Announce new CRG Technical Report Message-ID: <91Jun19.113852edt.780@neuron.ai.toronto.edu> The following technical report is available for ftp from the neuroprose archive. A hardcopy may also be requested. (See below for details.) Though written for a statistics audience, this report should be of interest to connectionists and others interested in machine learning, as it reports a Bayesian solution for one type of "unsupervised concept learning". The technique employed is also related to that used in Boltzmann Machines. Bayesian Mixture Modeling by Monte Carlo Simulation Radford M. Neal Technical Report CRG-TR-91-2 Department of Computer Science University of Toronto It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation. This method exhibits the true Bayesian predictive distribution, implicitly integrating over the entire underlying parameter space. An infinite number of mixture components can be accommodated without difficulty, using a prior distribution for mixing proportions that selects a reasonable subset of components to explain any finite training set. The need to decide on a ``correct'' number of components is thereby avoided. The feasibility of the method is shown empirically for a simple classification task. To obtain a compressed PostScript version of this report from neuroprose, ftp to "cheops.cis.ohio-state.edu" (128.146.8.62), log in as "anonymous" with password "neuron", set the transfer mode to "binary", change to the directory "pub/neuroprose", and get the file "neal.bayes.ps.Z". Then use the command "uncompress neal.bayes.ps.Z" to convert the file to PostScript. To obtain a hardcopy version of the paper by physical mail, send mail to : Maureen Smith Department of Computer Science University of Toronto 6 King's College Road Toronto, Ontario M5A 1A4 From schraudo at cs.UCSD.EDU Wed Jun 19 21:39:56 1991 From: schraudo at cs.UCSD.EDU (Nici Schraudolph) Date: Wed, 19 Jun 91 18:39:56 PDT Subject: hertz.refs.bib patch Message-ID: <9106200139.AA29142@beowulf.ucsd.edu> In adding the "HKP:" prefix to the citation keys in the BibTeX version of the Hertz/Krogh/Palmer bibliography I forgot to modify the internal cross-citations accordingly. I've appended the necessary patch below; it only involves three lines, but those who don't feel up to the task can ftp the patched file (still called hertz.refs.bib.Z) from neuroprose. My apologies for the invonvenience, - Nici Schraudolph. Here's the patch: *** hertz.refs.bib Wed Jun 19 18:23:36 1991 *************** *** 73,80 **** @string{snowbird = "Neural Networks for Computing"} % -------------------------------- Books --------------------------------- ! @string{inAR = "Reprinted in \cite{Anderson88}"} ! @string{partinAR = "Partially reprinted in \cite{Anderson88}"} @string{pdp = "Parallel Distributed Processing"} % ------------------------------- Journals --------------------------------- --- 73,80 ---- @string{snowbird = "Neural Networks for Computing"} % -------------------------------- Books --------------------------------- ! 
@string{inAR = "Reprinted in \cite{HKP:Anderson88}"} ! @string{partinAR = "Partially reprinted in \cite{HKP:Anderson88}"} @string{pdp = "Parallel Distributed Processing"} % ------------------------------- Journals --------------------------------- *************** *** 3500,3506 **** pages = "75--112", journal = cogsci, volume = 9, ! note = "Reprinted in \cite[chapter 5]{Rumelhart86a}", year = 1985 } --- 3500,3506 ---- pages = "75--112", journal = cogsci, volume = 9, ! note = "Reprinted in \cite[chapter 5]{HKP:Rumelhart86a}", year = 1985 } From rich at gte.com Thu Jun 20 10:24:53 1991 From: rich at gte.com (Rich Sutton) Date: Thu, 20 Jun 91 10:24:53 -0400 Subject: Job Announcement - GTE Message-ID: <9106201424.AA29945@bunny> The connectionist machine learning project at GTE Laboratories is looking for a researcher in computational models of learning and adaptive control. Applications from highly-qualified candidates are solicited. A demonstrated ability to perform and publish world-class research is required. The ideal candidate would also be interested in pursuing applications of their research within GTE businesses. GTE is a large company with major businesses in local telphone operations, mobile communications, lighting, precision materials, and government systems. GTE Labs has had one of the largest machine learning research groups in industry for about seven years. A doctorate in Computer Science, Computer Engineering or Mathematics is required. A demonstrated ability to communicate effectively in writing and in technical and business presentations is also required. Please send resumes and correspondence to: June Pierce GTE Labs MS-44 40 Sylvan Road Waltham, MA 02254 USA From ga1043 at sdcc6.UCSD.EDU Thu Jun 20 12:48:06 1991 From: ga1043 at sdcc6.UCSD.EDU (ga1043) Date: Thu, 20 Jun 91 09:48:06 PDT Subject: Super-Turing discussion Message-ID: <9106201648.AA15438@sdcc6.UCSD.EDU> A couple of months ago, there was a discussion on the network about neural nets, their capabilities, super-Turing machines, etc. About five or six references were mentioned. Does anyone have a list of those refereces, or a copy of that discussion? If you could forward the information to me at ga1043 at sdcc6.ucsd.edu, I would appreciate it. Valerie Hardcastle From rstark at aipna.edinburgh.ac.uk Thu Jun 20 12:29:54 1991 From: rstark at aipna.edinburgh.ac.uk (rstark@aipna.edinburgh.ac.uk) Date: Thu, 20 Jun 91 12:29:54 BST Subject: Distributed vs. Localist Representations Message-ID: <4210.9106201129@fal.aipna.ed.ac.uk> One aspect of this issue which seems implicit in much of this discussion is the notion that distributed representation can be considered a *relative* property. Thus the "room schema" network is "distributed" relative to rooms, but "localist" relative to ovens. Likewise, the Jets and Sharks model, which is considered to be strictly localist in the sense that each unit explictly represents a single concept (eg. "is-in-thirties"), does produce representations that are distributed relative to individual gang members. Andy Clark notes this in Microcognition. Does this seem correct? Is anyone uncomfortable with calling the Jets and Sharks a "distributed" model since each individual is represented by a pattern over the units (one unit active in each competition network), even though each unit can be clearly labelled in a localist fashion? 
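The relative sense of "distributed" raised here can be made concrete with a small sketch. The sketch below (in Python) is purely hypothetical -- the unit labels, names, and feature assignments are invented for illustration and are not taken from the actual Jets and Sharks model:

# Each unit is labelled with exactly one feature value (localist units),
# yet an individual is identified only by a pattern across units
# (distributed relative to individuals). Names and features are invented.

units = ["Jets", "Sharks", "in-20s", "in-30s", "burglar", "bookie"]

# One active unit per competitive pool (gang, age, occupation).
individuals = {
    "Art":   {"Jets": 1,   "in-30s": 1, "bookie": 1},
    "Rick":  {"Sharks": 1, "in-30s": 1, "burglar": 1},
    "Lance": {"Jets": 1,   "in-20s": 1, "burglar": 1},
}

def activation_vector(person):
    """Binary activation pattern over the labelled units for one individual."""
    return [individuals[person].get(unit, 0) for unit in units]

for person in individuals:
    print(person, activation_vector(person))

Removing any single unit does not delete one individual outright; it merely blurs the distinctions among several of them, which is the sense in which the individuals are represented in a distributed fashion even though every unit carries a localist label.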
Note that this notion of relativity in distributed representation is (I believe) distinct from its continuous aspects (seen in references to "partially-" or "semi-" distributed representations), which may be quantifiable using e.g. Tim van Gelder's proposal of degree of superimposition. -Randall Stark --------------------------------------------------------------------------- Randall Stark TEL: (+44)-31-650-2725 | Dept of Artificial Intelligence JANET: rstark at uk.ac.ed.aipna | 80, South Bridge ARPA: rstark%uk.ac.ed.aipna at nsfnet-relay | University of Edinburgh UUCP: ...!uunet!mcsun!ukc!aipna!rstark | Edinburgh, EH1 1HN, UK --------------------------------------------------------------------------- From haffner at lannion.cnet.fr Fri Jun 21 11:36:19 1991 From: haffner at lannion.cnet.fr (Haffner Patrick) Date: 21 Jun 91 17:36:19+0200 Subject: POST-DOCTORAL VACANCY : Connectionism and Oral Dialogue Message-ID: <9106211536.AA02620@lsun26> Applications are invited for research assistantship(s) for post-doctoral or sabbatical candidates. Funding at the French National Telecommunications Research Centre (Centre National d'Etudes des Telecommunications, CNET) will commence in September '91 for a two-year period; the work location will be Lannion, Brittany, France. Experience is required in Natural Language Processing, especially Oral Dialogue Processing, by Connectionist methods. Applicants should specify the period between Sept '91 and Sept '93 which interests them. Applications, including CV/Resume, should be sent to: Mme Christel Sorin CNET LAA/TSS/RCP BP 40 22301 LANNION CEDEX FRANCE TEL : +33 96-05-31-40 FAX : +33 96-05-35-30 E-MAIL : sorin at lannion.cnet.fr From ITGT500 at INDYCMS.BITNET Thu Jun 20 11:55:32 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Thu, 20 Jun 91 10:55:32 EST Subject: Distributed Representations In-Reply-To: Your message of Wed, 19 Jun 91 04:39:39 EDT Message-ID: ----------------------------Original message---------------------------- Two days ago I mentioned (strict) local representation, binary distributed representation, and analog distributed representation. As an attempt to answer Ali Minai's question, I will try to give my understanding of representations as follows: (1). In my opinion, the key point underlying the definitions of representations is the correspondence between the objects (or items, entities, etc.) to be represented and the units (or nodes, neurons, etc.) of the network. The objects can be classified according to the properties they have. More than one object can possess the same property; in this case, these objects should be classified into the same group with that property. The units can represent different properties of the objects, or different objects within the same property group. As mentioned in my mail two days ago, there are four kinds of correspondences for the relationships between objects and units: one-to-one, multiple-to-one, one-to-multiple, and multiple-to-multiple. If we define the (strict) local representation as the one that represents the one-to-one correspondence only, then all the other three correspondences can be called distributed representations. However, since there are three different correspondences within distributed representation, the term "distributed representation" will probably be too broad and too general a concept if we try to use it to refer to all three correspondences.
It is perhaps this overly general word or concept that has brought about the confusion about the advantages and disadvantages of local representation vs. distributed representation. (2). In an attempt to clarify these confusions, I think it is necessary to give more specific definitions to all four of these correspondences. The following are my attempts to define these representations:

Local Representation ---- The one-to-one correspondence in which each object is represented by one unit, and each unit represents only one object. Units in local representation always take binary values.

Binary Distributed Representation ---- The one-to-multiple correspondence in which each object is represented by multiple units and each unit is employed to represent only one object. The unit takes only binary values here because it represents only one object, there is no need for it to take analog values.

Analog Distributed Representation ---- The multiple-to-one correspondence in which multiple objects with the same property are represented by one unit and each unit represents multiple objects with the same property only. Here the unit takes different analog values for different objects within this property group. Different analog values are used to differentiate these different objects within the same property group.

Mixed Distributed Representation ---- The multiple-to-multiple correspondence in which multiple objects of multiple properties are represented by one unit and each unit represents multiple objects with multiple properties. Here, the units take either binary or analog values depending on the properties and the object they represent.

I am not sure whether the above definitions clarify these concepts and reduce the confusion around these problems. Your comments on the above statements are welcome. Bo Xu Dept. of Physiology and Biophysics School of Medicine Indiana University ITGT500 at INDYCMS.BITNET From hwang at pierce.ee.washington.edu Fri Jun 21 14:54:47 1991 From: hwang at pierce.ee.washington.edu ( J. N. Hwang) Date: Fri, 21 Jun 91 11:54:47 PDT Subject: IJCNN'91 Presidents' Forum (new announcement from Prof. Marks) Message-ID: <9106211854.AA13350@pierce.ee.washington.edu.> News release IEEE NEURAL NETWORKS COUNCIL IS SPONSORING A PRESIDENTS' FORUM AT IJCNN `91 IN SEATTLE, WASHINGTON Robert J. Marks II, Professor at the University of Washington and President of the IEEE Neural Networks Council (NNC), has announced that for the first time the IEEE/NNC will be sponsoring a Presidents' Forum during IJCNN `91 in Seattle, Washington, July 8-12, 1991. The participants of the Presidents' Forum will be the Presidents of the major artificial neural network societies of the world, including the China Neural Networks Committee, the Joint European Neural Network Initiative, the Japanese Neural Networks Society and the Russian Neural Networks Society. The Forum will be open to conference attendees and the press on Wednesday evening, 6:30-8:30 pm, July 10, 1991, at the Washington State Convention Center in Seattle. Each President will give a short (15-20 minute) presentation of the activities of their society, followed by a short question/answer period. Robert J. Marks II will be this year's moderator.
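To make the four correspondences defined in Bo Xu's message above concrete, here is a small sketch (in Python); the objects, property groups, and code values are invented for illustration and are not taken from any of the postings:

# Hypothetical encodings of four objects under the four correspondences
# defined above. All names and values are invented.

# Local: one-to-one -- one binary unit per object.
local = {
    "apple":  [1, 0, 0, 0],
    "pear":   [0, 1, 0, 0],
    "squash": [0, 0, 1, 0],
    "carrot": [0, 0, 0, 1],
}

# Binary distributed: one-to-multiple -- each object owns several binary
# units (three here), giving redundancy, but no unit is shared across objects.
binary_distributed = {
    "apple":  [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "pear":   [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "squash": [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "carrot": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
}

# Analog distributed: multiple-to-one -- one unit per property group
# ("fruit", "vegetable"); an analog value separates objects within a group.
analog_distributed = {
    "apple":  [0.25, 0.0],
    "pear":   [0.75, 0.0],
    "squash": [0.0, 0.25],
    "carrot": [0.0, 0.75],
}

# Mixed distributed: multiple-to-multiple -- shared property units
# ("grows on a tree", "is sweet", "is orange"), each used by several objects.
mixed_distributed = {
    "apple":  [1.0, 0.8, 0.1],
    "pear":   [1.0, 0.9, 0.0],
    "squash": [0.0, 0.2, 0.3],
    "carrot": [0.0, 0.4, 1.0],
}

for name, code in [("local", local),
                   ("binary distributed", binary_distributed),
                   ("analog distributed", analog_distributed),
                   ("mixed distributed", mixed_distributed)]:
    n_units = len(next(iter(code.values())))
    print("%-20s %d objects over %d units" % (name, len(code), n_units))

Note how the same four objects need four, twelve, two, or three units depending on the correspondence chosen; the discussion that follows turns on what, beyond this bookkeeping, makes one of these encodings genuinely "distributed".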
From aam9n at honi4.acc.virginia.edu Thu Jun 20 17:39:38 1991 From: aam9n at honi4.acc.virginia.edu (aam9n) Date: Thu, 20 Jun 91 17:39:38 EDT Subject: Distributed Representations Message-ID: <9106202139.AA00551@honi4.acc.Virginia.EDU> Bo Xu presents a very interesting classification of representations in terms of their distribution over representational units. The definitions of each class are internally clear enough, but I have some comments about how "distributivity" is defined, and where it leads. Let's take the definitions that Bo Xu gives: >Local Representation ---- The one-to-one correspondence in which each object > is represented by one unit, and each unit represents only one object. > Units in local representation always take binary values. No quarrel about this one being a local representation. >Binary Distributed Representation ---- The one-to-multiple correspondence > in which each object is represented by multiple units and each unit > is employed to represent only one object. The unit takes only binary > values here because it represents only one object, there is no need > for it to take analog values. Suppose I have two objects --- an apple and a pear --- and six representational units r1.....r6. Then, if I read this definition correctly, a distributed representation might be 000111 <-> apple and 111000 <-> pear. Since the units are binary, they are presumably "on" if the object is present and "off" if it is not. No reference is made to "properties" defining the object, and so there is no semantic content in any unit beyond that of mere signification: each unit is, ideally, identical. The question is: why have three units signifying one object when they work as one? One reason might be to achieve redundancy, and consequent fault-tolerance, through a voting scheme (e.g. 101001 <-> pear). Is this a distributed representation, though? To decide that, I must have an *external* definition of what it means for a representation to be distributed. Tentatively, I say that "a representation is distributed over a group of units if no single unit's correct operation is critical to the representation". This certainly holds in the above example. It holds, indeed, in all error-correcting codes. In a binary distributed representation, then, I can define the "degree of distributivity" as the minimum Hamming distance of the code. This is quite consistent, if rather disappointingly mundane. >Analog Distributed Representation ---- The multiple-to-one correspondence > in which multiple objects with the same property are represented by > one unit and each unit represents multiple objects with the same > property only. Here the unit takes different analog values for > different objects within this property group. Different analog > values are used to differentiate these different objects within the > same property group. Here, under the obvious reading of this definition, I have two categories (units) called "fruits" and "vegetables". Each represents many objects with different values, but mutually exclusively. Thus, I might have apple <-> 0.1,0 and squash <-> 0,0.1, but no object will have the code 0.1,0.1. This is obviously equivalent to a binary representation with each unit replaced by, say, n binary units. The question is: does this code embody the principle of dispensibility? Not necessarily. One wrong bit could change an apple into a lemon, or even lose all information about the category of the object. 
Thus, in the general case, such a representation is "distributed" only in the physical sense of activating (or not activating) units in a group. Each unit is still functionally critical. >Mixed Distributed Representation ---- The multiple-to-multiple correspondence > in which multiple objects of multiple properties are represented by > one unit and each unit represents multiple objects with multiple > properties. Here, the units take either binary or analog values > depending on the properties and the object they represent. Now here we have what most people mean by "distributed representations". We have many properties, each represented by a unit, and many objects. Each object can be encoded in terms of its properties. If the set of properties does not have enough discrimination, multiple objects could have the same code. Even if the property set is sufficient for unique representation, it is possible that the malfunction of one unit may change one object to another. The question then is: is this dependency small or large? Does a small malfunction in a unit cause catastrophic change in the semantic content of the whole group of units? I can "distribute" my representation over all the atoms in the universe, but if that doesn't give me some protection from point failures, I have not truly "distributed" things at all --- merely multiplied the local representation. Now, of course, in the "real" world where things are uniformly or normally distributed and errors are uncorrelated, increasing the size of a representation over a set of independent units will almost always confer some degree of protection from catastrophic point failures. An important issue is how to *maximize* this. And to do that, we must be able to measure it. One way would be to minimize the average information each representational unit conveys about the represented objects, which is a simple maximum entropy formulation. This requirement must, of course, be balanced by an adequate representation imperative. Other formulations are certainly possible, and probably much better. In any case, many of the more interesting issues in distributed representation arise when the "object" being represented is only implicitly available, or when the representation is distributed over a hierarchy of units, not all of which are directly observable, and not all of which count in the final encoding. Comments? Ali Minai aam9n at Virginia.EDU
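Both measures floated in the message above can be computed in a few lines. The sketch below (in Python) is only an illustration under simplifying assumptions -- the codes are invented, objects are taken to be equiprobable, and each unit's 0/1 activation is assumed to be a deterministic function of the object, so that the information a unit conveys about object identity reduces to the unit's own entropy:

from itertools import combinations
from math import log2

def hamming(a, b):
    """Number of positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def min_hamming_distance(code):
    """Minimum pairwise Hamming distance -- the 'degree of distributivity'
    suggested above for binary distributed codes."""
    return min(hamming(a, b) for a, b in combinations(code, 2))

def per_unit_information(code):
    """Average information (bits) a single unit conveys about object identity,
    assuming equiprobable objects and deterministic binary activations."""
    n_objects, n_units = len(code), len(code[0])
    total = 0.0
    for j in range(n_units):
        p_on = sum(word[j] for word in code) / n_objects
        total += -sum(p * log2(p) for p in (p_on, 1.0 - p_on) if p > 0.0)
    return total / n_units

codes = {
    "one-hot (local)":      [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)],
    "triplicated (voting)": [(1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 1)],
    "dense distributed":    [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 1)],
    "constant (useless)":   [(0, 0, 0, 0)] * 4,
}

for name, code in codes.items():
    print("%-22s min Hamming distance = %d, per-unit information = %.3f bits"
          % (name, min_hamming_distance(code), per_unit_information(code)))

As the last row shows, the per-unit information criterion taken by itself is minimized by a code that represents nothing at all, which is exactly why the posting pairs it with an "adequate representation" (recoverability) requirement.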
From ITGT500 at INDYCMS.BITNET Sat Jun 22 11:38:17 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Sat, 22 Jun 91 10:38:17 EST Subject: Distributed Representations Message-ID: <29E19BB296800064@BITNET.CC.CMU.EDU> Ali Minai presented a good example of apple and pear. I am going to answer some questions he raised. Let's look at his statements first. >is not. No reference is made to "properties" defining the object, and so there >is no semantic content in any unit beyond that of mere signification: each This is a very good question. Generally speaking, there are many properties existed at the same time for each object. Let's take the apple as an example. An apple can be classified according to its taste, color, size, shape, or whether it is a fruit or not (as Ali Minai chose) etc. Different people will choose different criteria to meet the purpose of their applications. >unit is, ideally, identical. The question is: why have three units signifying >one object when they work as one? One reason might be to achieve redundancy, >and consequent fault-tolerance, through a voting scheme (e.g. 101001 <-> pear). Redundancy and fault-tolerance may be reasons for binary distributed representation. Another reason probably comes from the faster convergence rate consideration. Karen Kukich has done some interesting work and concludes that the advantage of local representation is the faster convergence rate (see K. Kukich, "Variations on a Back-Propagation Name Recognition Net" in the Proceedings of the United States Postal Service Advanced Technology Conference, Vol. 2, 722-735). The binary distributed representation is similar to local representation in that they all take binary values. However, as to why "three" instead of "five" or any other numbers, I also don't know. This question is probably similar to the question of "how many hidden units are needed for a specific task?". It may depend on to what degree the redundancy is needed. >Here, under the obvious reading of this definition, I have two categories >(units) called "fruits" and "vegetables". Each represents many objects >with different values, but mutually exclusively.
Thus, I might have >apple <-> 0.1,0 and squash <-> 0,0.1, but no object will have the code >0.1,0.1. This is obviously equivalent to a binary representation with >each unit replaced by, say, n binary units. The question is: does this >code embody the principle of dispensibility? Not necessarily. One wrong bit >could change an apple into a lemon, or even lose all information about the >category of the object. Thus, in the general case, such a representation >is "distributed" only in the physical sense of activating (or not activating) >units in a group. Each unit is still functionally critical. It is true that if there is a bit error, the apple will change into a lemon, etc. However, the key point here is that the neural net's fault-tolerance characteristic exists only after it is trained and has reached an accuracy criterion. If we are dealing with many objects and use 0.1 as a value to differentiate different objects, we will train the net to reach a criterion at least smaller than 0.1 (otherwise, the net will be of no use). Thus, for seen patterns, the error will not be so big that an apple will turn into a lemon. For unseen patterns, bigger errors probably will occur, and apples probably will turn into lemons or something else. However, in this case we cannot attribute the problem to the representation alone. This is related to the generalizability of the net, and the learning algorithm, the units' response characteristics, and even the topology of the net all probably play a role in the generalizability of the net. >Now here we have what most people mean by "distributed representations". We >nother. The question then is: is this dependency small or large? Does >small malfunction in a unit cause catastrophic change in the semantic >content of the whole group of units? I can "distribute" my representation When talking about representations, the graceful degradation of the brain is introduced as a criterion. However, since the neural net is still far away from a real brain model, some caution should be taken when relating the neural net to the brain. The first thing to be made clear is which layer of the neural net we are referring to. Most people refer to the interface layers (the input and output layers) of the neural net when they talk about local/distributed representations. However, they refer to all layers (both the interface layers and hidden layers) when they talk about graceful degradation. However, what is the justification for the interface layers to possess graceful degradation? If we say that the neural net resembles the brain in some aspects, then the resemblance most likely lies in the hidden layers instead of the interface layers. The criterion of graceful degradation should be applied to the hidden layers instead of the interface layers. In most current nets, the hidden layers are using mixed distributed representation, and thus possess the graceful degradation characteristics. As to the interface layers (input/output layers), we can demand that they possess the graceful degradation characteristics too. However, in my opinion, this will lead to many additional problems and confusions. The mixed distributed representation is good for hidden layers, not for interface layers. I think that for the interface layers, the analog distributed representation works best because: (1) Considerations at the interface layers should be practicality instead of graceful degradation. There is no justification and no need for the interface layers to possess graceful degradation. (2).
The analog distributed representation has classified the objects to be represented. The objects with the same property are classified into the same group. The differences between the objects in the same group are represented by different analog values of the unit representing this property group (eg, assume that there are four apples and three pears, then in analog distributed representation, two units should be used: unit A for apple and unit P for pear. The four apples can be represented by letting unit A take four different analog values. The three pears can be represented by letting unit P take three different analog values.). This is the most natural way when we deal with many objects. Why should we sacrifice the natural way (analog distributed representation) for the graceful degradation (which may not belong to the interface layers. The hidden layers are using mixed distributed representation and possess graceful degradation) when we are considering the interface layers? We used the analog distributed representation in a parabolic problem (a task mapping the parabola curve we used to compare the performances of BPNN and PPNN) and found that the analog distributed representation was the best and most natural representation for problems (such as the parabolic problem) which has continuous and infinite training/test patterns (objects). In sum, I think that we should be more specific when we talk about the representations and brain-like characteristics of neural nets: (1) For the interface layers (input/output layers), the analog distributed representation is the best choice because at the interface layers, the priority of consideration is practicality, and the analog distributed representation is the most natural one and most easily to be used in dealing with many objects. (2) For the hidden layers, the mixed distributed representation is the best choice because the graceful degradation requirement now is the priority to be taken into account of for hidden layers. Fortunately, most of the current network architechures have ensured such requirement for hidden layers. Bo Xu ITGT500 at INDYCMS.BITNET From aam9n at hagar3.acc.Virginia.EDU Sat Jun 22 21:49:33 1991 From: aam9n at hagar3.acc.Virginia.EDU (Ali Ahmad Minai) Date: Sat, 22 Jun 91 21:49:33 EDT Subject: Distributed Representations Message-ID: <9106230149.AA00465@hagar3.acc.Virginia.EDU> Bo Xu raises some questions about distributed representations in the context of feed-forward neural networks, particularly with regard to graceful degradation. I do not agree that to require graceful degradation is to imply "brain-like" networks. In my opinion, the very notion of distribution is fundamentally linked to the requirement that each representational unit be minimally loaded, and that each representation be as homogeneously distributed over all representational units as possible. That this produces graceful degradation is partly true (only to the first order, given the non-linearity of the system), but that is incidental. Speaking of which layers to apply the definition to, I think that in a feed-forward associative network (analog or binary), the hidden neurons (or all the weights) are the representational units. The input neurons merely distribute the prior part of the association, and the output neurons merely produce the posterior part. The latter are thus a "recovery mechanism" designed to "decode" the distributed representation of the hidden units and recover the "original" item. 
Of course, in a heteroassociative system, the "recovered original" is not the same as the "stored original". I realize that this is stretching the definition of "representation", but it seems quite natural to me. The issue of a "recovery mechanism" is quite fundamental to the question of representational distribution. Without a requirement for adequate recoverability, any finite medium could be "distributedly" loaded with a potentially infinite number of representations, without being able to reproduce any of them. To ensure adequate recoverability, however, representations must be "distinct", or mutually non-interacting, in some sense. Given the countervailing requirement of distributedness, the obvious route of separation by localization is not available, and we must arrive at some compromise principle of minimum mutual disturbance, such as a requirement for orthogonality or linear independence (rather artificial, if you ask me). My point is that defining distributed representations only in terms of unconstrained characteristics is a partial solution. Internal and external constraining factors must be included in the formulation to adequately ground the definition. These are provided by the requirements of maximum dispensibility and adequate recoverability. Zillions of issues remain unaddressed by this formulation too, especially those of consistent measurement. I feel that each domain and situation will have to supply its own specifics. I am not sure I understand Bo Xu's assertion that analog representations are "more natural". Certainly, to approximate a parabola (which I have done hundreds of times with different neural nets) would imply using an analog representation, but it is not clear if that is so natural for classifying apples and pears. Using different analog values to indicate intra-class variations is reasonable and, under specific circumstances, might even be provably better than a binary representation. But I would be very hesitant to generalize over all possible circumstances. In any case, a global characterization of distributed representation should depend of specifics only for details, and should apply to both discrete and analog representations. Ali Minai University of Virginia aam9n at Virginia.EDU From ross at psych.psy.uq.oz.au Sun Jun 23 01:52:51 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Sun, 23 Jun 1991 15:52:51 +1000 Subject: Distributed vs Localist Representations Message-ID: <9106230552.AA02343@psych.psy.uq.oz.au> Randall Stark (rstark at aipna.edinburgh.ac.uk) writes: >One aspect of this issue which seems implicit in much of this discussion >is the notion that distributed representation can be considered >a *relative* property. Thus the "room schema" network is "distributed" >relative to rooms, but "localist" relative to ovens. A related point was raised by Paul Smolensky in his work on variable binding using tensor representations. By his definition a representation is distributed if enitities of external interest (objects, attributes, values or whatever) are represented as patterns across multiple units. The point Paul makes is that in much connectionist work the variables are localised while the values are distributed. That is, the set of units is typically divided into disjoint groups that function as registers or variables. Each variable is able to hold a pattern of activations that is a distributed value. 
He proposed a mechanism in which the variables are not disjoint sets of units but instead are patterns that are bound to the patterns representing values. Using this scheme a binding of a variable with a value is itself represented as a pattern distributed over units and multiple bindings can be simultaneously represented on the same units. The nice point about this is that it puts variables and values on an equal footing, they are both patterns. In fact the system does not need to distinguish between them from a processing perspective. Whether something is a variable or a value is a question of how it is used, not how it is represented or implemented. Ross Gayler ross at psych.psy.uq.oz.au From aarons at cogs.sussex.ac.uk Sun Jun 23 16:13:31 1991 From: aarons at cogs.sussex.ac.uk (Aaron Sloman) Date: Sun, 23 Jun 91 21:13:31 +0100 Subject: Varieties of intelligence (long) Message-ID: <1666.9106232013@csrn.cogs.susx.ac.uk> A friend, Gerry Martin, is interested in "achievers", how they differ and the conditions that create them or enable them to achieve. I offered to try to find out if anyone knew of relevant work on different kinds of (human) intelligence, how they develop, what they are, and what (social) mechanisms if any enable them to be matched with opportunities for development or fulfilment. There's a collection of related questions. 1. To what extent does evolution produce variation in intellectual capabilities, motivations, etc.? How far is the observable variation due to environmental factors? This is an old question, of course, and very ill-defined (e.g. there is probably no meaningful metric for the contributions of genetic and environmental factors to individual development). It is clear that physical variability is inherent in evolutionary mechanisms: without this there could not be (Darwinian) evolution. The same must presumably be true for "mental" variability. Do genetic factors produce different kinds of differences: in intellectual capabilities, motivational patterns, perceptual abilities, memory abilities, problem solving abilities, etc. I think it was Waddington who offered the metaphor of the "epigenetic landscape" genetically determining the opportunities for development of an individual. The route actually taken through the landscape would depend on the individual's environment. So our question is how different are the landscapes (the sets of possible developmental routes) with which each human child is born, and to what extent do they determine different opportunities for mental, as well as physical development? (Obviously the two are linked: a blind child won't as easily become a great painter.) (Piaget suggested that all the human landscapes have a common structure, with well defined stages. I suspect this view will not survive close analysis.) For intelligent social animals, mental variability is more important than physical variability: a social system has more diversity of intellectual and motivational requirements in its "jobs" than diversity of physical requirements. (Perhaps not if you include the "jobs" done for us by other animals, plants, microorganisms, machines, etc., without which our society could not survive.) Anyhow, without variation in mental properties (whether produced genetically or not) it could be hard to achieve the division of labour that enables a complex social system to work. Aldous Huxley's book "Brave New World" takes this idea towards an unpalatable conclusion. 
The need for mental variability goes beyond infrastructure: without such variability all artists would be painters, or all would be composers, or all would be poets, and all scientists would be physicists, or biologists... Division of labour is required not only for the enabling mechanisms of society, but also for cultural richness. 2. What is the form of this variability? Folk psychology has it that there are different kinds of genius - musical geniuses, mathematical geniuses, geniuses in biology, great actors and actresses, etc. Could any of these have excelled in any other field? Would the right education have turned Mozart into a great mathematicion, or would his particular "gifts" never have engaged with advanced mathematics? Could a suitable background have made Newton a great composer? Does anyone have any insight into the genetic requirements for different kinds of creative excellence? We can distinguish two broad questions: (a) is there wide variability in DEGREE in innate capabilities (b) is there also wide variability in KIND (domain, field of application, or whatever)? In either case it would be interesting to know what kinds of mechanisms account for the differences? Could they be quantitative (as many naive scientists have supposed -- e.g. number of brain cells, number of connections, speed of transmission of signals, etc.) or are the relevant differences more likely to be structural -- i.e. differences in hardware or software organisation? It looks as if many ordinary human learning capabilities need specific pre-determined structures, providing the basis for learning abilities: e.g. learning languages with complex syntax, learning music, learning to control limbs, learning to see structured objects, learning to play games, learning mathematics, and so on. (Some of the structures creating these capabilities might be shared between different kinds of potential.) If these enabling structures are not "all-or-nothing" systems there could sometimes be partial structures at birth, giving some individuals subsets of "normal" capabilities. Are these all a result of pre-natal damage, or might the gene pool INHERENTLY generate such variety? (An unpalatable truth?) Does the gene pool also produce some individuals with powerful supersets of what is relatively common? Are there importantly different supersets, corresponding to distinct "gifts"? (E.g. Mozart, Newton, Shakespeare.) What are the additional mechanisms these individuals have? Can those born without be given them artificially? (E.g. through special training, hormone treatment, etc..) 3. To what extent do different approaches to AI (I include connectionism as a sub-field of AI) provide tools to model different sorts of mentalities? As far as I know, although there has been much empirical research (e.g. on twins) to find out what is and what is not determined genetically, there there has been very little discussion of mechanisms that might be related to such variability. >From an AI standpoint it is easy to speculate about ways in which learning systems could be designed that are initially highly sensitive to minor and subtle environmental differences and which, through various kinds of positive feedback, amplify differences so that even individuals that start off very similar could, in a rich and varied environment, end up very different. 
This sort of thing could be a consequence of multi-layered self-modifying architectures with thresholds of various kinds that get modified by "experience" and which thereby change the behaviour of systems which cause other thresholds to be modified. Even without thresholds, hierarchies of condition-action rules, where some of the actions create or alter other rules, would also provide for enormous variability. (As could hierarchies of pdp networks, some of which change the topology of others.) Cascades of such changes could produce huge qualitative variation in various kinds of intellectual capabilities as well as variation in motivational, emotional and personality traits, aesthetic tastes, etc. Such architectures might allow relatively small genetic differences as well as small environmental differences to produce vast differences in adult capabilities. Variation in tastes in food, or preferences for mates, despite common biological needs, seem to be partly a result of cultural feedback through such developmental mechanisms. But is it all environmental? I gather there are genetic factors that stop some people liking the tastes of certain foods. What about a taste for mathematics, or a general taste for intellectual achievement? 4. Does anyone have any notion of the kinds of differences in implementation that could account for differences in tastes, capabilities, etc. Would it require: (a) differences in underlying physical architectures (e.g. different divisions of brains into cooperative sub-nets, or different connection topologies among neurones?), (b) differences in the contents of "knowledge bases", "plan databases", skill databases, etc. (By "database" I include what can be stored in a trainable network.) (c) differences in numerical parameters. or something quite different? I suspect there's a huge variety of distinct ways in which qualitative differences in capability can emerge: some closer to hardware differences, some closer to software differences. The latter might in principle be easier to change, but not in practice, if for example, it requires de-compiling a huge and messy system. The only AI-related work that I know of that explicitly deals not only with the design or development of a single agent, but with variable populations, is work on genetic algorithms, which can produce a family of slightly different design solutions. Of course, it is premature for anyone to consider modelling evolutionary processes that would produce collections of "complete" intelligent agents (as opposed to collections of solutions to simple problems like planning problems, recognition problems, or whatever). But has anyone investigated general principles involved in mechanisms that could produce populations of agents with important MENTAL differences? Are there any general principles? (Are the mental epigenetic landscapes for a species importantly different in structure from the physical ones? Perhaps for some organisms, e.g. ants, there's a lot less difference than for others, e.g. chimpanzees?) 5. There are related questions about the need for or possibility of social engineering. (The questions are fraught with political and ethical problems.) In particular, if truly gifted individuals have narrowly targetted potential, are there mechanims that enable such potential to be matched with appropriate opportunities for development and application? Do rare needs have a way of "attracting" those with the rare ability to tackle them? 
What mechanisms can help to match individuals with unusual combinations of motives and capabilities, with tasks or roles that require those combinations? In a crude and only partly successful way the educational system and career advisory services attempt to do this. Special schools or special lessons for gifted children attempt to enhance the match-making. However, these formal institutions work only insofar as there are fairly broad and widely-recognized categories of individuals and of tasks. They don't address the problem of matching the potentially very high achievers to very specific opportunities and tasks that need them. Some job advertisements and recruitment services attempt to do this but there's no guarantee that they make contact with really suitable candidates, and we all know how difficult selection is. Also these mechanisms assume that the need has been identified. There was no institution that identified the need for a theory of gravity and recruited Newton, provided him with opportunties, etc. Was it pure chance then that he was "found"? Or were there many others who might have achieved what he did? Or were there unrecognized social mechanisms that "arranged" the match? If so, how far afield could he have been born without defeating the match-making? If the potentially very high acheivers only have very small areas in which their potential can be realized, and if each type is very rare, there may be no general way to set up conditions that bring them into the appropriate circumstances. An important example might turn out to be the problem of matching the particular collection of talents, knowledge, and opportunity that would enable a cure for AIDS to be found. In a homogenous global culture with richly integrated (electronic?) information systems it might be possible to reduce the risks of such lost opportunities, but only if there are ways of recognizing in advance that a particular individual is likely to be well suited to a particular task. The more narrowly defined and rare the task and the capabilities, the less likely it is that the match can be recognized in advance. Is the idea that there are important but extremely difficult tasks and challenges that only a very few individuals have the potential to cope with just a romantic myth? Or is every solvable problem, every achievable goal, solvable by a large subset of humanity, given the right training and opportunity? (Will we ever know whether nobody but Fermat had what it takes to prove his "last" theorem?) Even if the "romantic myth" is close to the truth, there may be no way of setting up social mechanisms with a good chance of bringing important opportunities and appropriately gifted individuals together: social systems are so complex that all attempts to control them, however well-meaning, invariably have a host of unintended, often undesirable, consequences, some of them long term and far less obvious than missiles that hit the wrong target. Could some variant of AI help here? It seems unlikely that connectionist pattern recognition techniques could work. (E.g. where would training sets come from?) Could some more abstract sort of expert system help? Neither could inform us that the person capable of solving a particular problem is an unknown child in a remote underdeveloped community. 
Perhaps there is nothing for it but to rely on chance, coincidence, or whatever combination of ill-understood biological and social processes has worked up to now in enabling humankind to achieve what distinguishes us from ants and apes (including our extremes of ecological vandalism). ----------------------------------------------------------------------- I don't know if I have captured Gerry's questions well: he hasn't seen this message. But if you have any relevant comments, including pointers to literature, information about work in progress, criticisms of the presuppositions of the questions, conjectures about the answers, etc., I'll be interested to receive them and to pass them on. I'll post this to connectionists and the comp.ai newsgroup. (Should it go to others?) Apologies for length. Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England EMAIL aarons at cogs.sussex.ac.uk After 18th July 1991: School of Computer Science. The University of Birmingham, UK. Email: A.Sloman at cs.bham.ac.uk From ITGT500 at INDYCMS.BITNET Mon Jun 24 10:45:52 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Mon, 24 Jun 91 09:45:52 EST Subject: Distributed Representations Message-ID: Ali Minai made a good point about where the representations are considered to reside. Let's look at his message first: >Speaking of which layers to apply the definition to, I think that in a >feed-forward associative network (analog or binary), the hidden neurons >(or all the weights) are the representational units. The input neurons >merely distribute the prior part of the association, and the output neurons >merely produce the posterior part. The latter are thus a "recovery mechanism" >designed to "decode" the distributed representation of the hidden units and >recover the "original" item. Of course, in a heteroassociative system, the >"recovered original" is not the same as the "stored original". I realize that >this is stretching the definition of "representation", but it seems quite >natural to me. I think that, according to the criterion of where representations exist, representations can be classified into two types: (1). External representations ---- the representations that exist at the interface layers (input and/or output layers). They are responsible for information transmission between the network and the outside world (coding the input information at the input layer and decoding the output information at the output layer). (2). Internal representations ---- the representations that exist at the hidden layers. These representations encode the mappings from the input field to the output field. The mappings are the core of the neural net. If I understand correctly, Ali Minai is referring to the internal representations only, and neglects the external representations. The internal representations are very important. However, these representations are determined by the topology of the network, and we cannot change them unless we change the network topology. The topology of most current networks ensures that the internal representations are mixed distributed representations (as I pointed out several days ago). Their working mechanisms are still a black box. Without changing the topology of the network, what we can choose and select are the external representations only. They should not be neglected. >Zillions of issues remain unaddressed by this formulation too, especially >those of consistent measurement.
I feel that each domain and situation >will have to supply its own specifics. >I am not sure I understand Bo Xu's assertion that analog representations >are "more natural". Certainly, to approximate a parabola (which I have >done hundreds of times with different neural nets) would imply using an >analog representation, but it is not clear if that is so natural for >classifying apples and pears. Using different analog values to indicate >intra-class variations is reasonable and, under specific circumstances, >might even be provably better than a binary representation. But I would >be very hesitant to generalize over all possible circumstances. In any >case, a global characterization of distributed representation should depend >of specifics only for details, and should apply to both discrete and analog >representations. It's true that there will be zillions of issues in practical applications. However, it is also because of this that it will be very difficult (if not impossible) to study all of these zillions of issues before drawing any conclusions. Some generalizations based on limited studies are probably necessary and helpful when facing such a situation. I want to thank Ali Minai for his comments. All of his comments are very valuable and thought-provoking. Bo Xu Indiana University ITGT500 at INDYCMS.BITNET From aam9n at hagar2.acc.Virginia.EDU Mon Jun 24 22:29:34 1991 From: aam9n at hagar2.acc.Virginia.EDU (Ali Ahmad Minai) Date: Mon, 24 Jun 91 22:29:34 EDT Subject: Distributed Representations Message-ID: <9106250229.AA00528@hagar2.acc.Virginia.EDU> This is in response to Bo Xu's last posting regarding distributed representations. I think one of the problems is a basic incompatibility in our notions of "representations" and where they exist. I would like to clarify my earlier posting somewhat on this point. I wrote: >>Speaking of which layers to apply the definition to, I think that in a >>feed-forward associative network (analog or binary), the hidden neurons >>(or all the weights) are the representational units. The input neurons >>merely distribute the prior part of the association, and the output neurons >>merely produce the posterior part. The latter are thus a "recovery mechanism" >>designed to "decode" the distributed representation of the hidden units and >>recover the "original" item. Of course, in a heteroassociative system, the >>"recovered original" is not the same as the "stored original". I realize that >>this is stretching the definition of "representation", but it seems quite >>natural to me. To which Bo replied: >I think according to the criterion of where representations exist, the >representations can be classified into two different types: > >(1). External representations ---- The representations existed at the > interface layers (input and/or output layers). They are > responsible for the information transmission between the network > and the outside world (coding the input information at the input > layer and decoding the output information at the output layer). > >(2). Internal representations ---- The representations existed at the > hidden layers. These representations are used to encode the > mappings from the input field to the output field. The mappings > are the core of the neural net. > >If I understand correctly, Ali Minai is referring to the internal >representations only, and neglect the external representations. The internal >representations are very important representations.
However, these >representations are determined by the topology of the network, and we cannot >change them unless we change the network topology. Most of the current >networks' topology ensure that the internal representations are mixed >distributed representations (as I pointed out several days ago). Their >working mechanisms are still a black-box. > >Without changing the topology of the network, what we can choose and >select are the external representations only. They should not be neglected. First, let me state what I meant by the "stored" and "recovered" representations in the heteroassociative case. We can see the process of the heteroassociation of an input vector U and output vector V in a feed-forward network as a process of encoding a representation of the vector UV over the hidden units of the network. This is what I call "storage". There is a special requirement here that, given U, a mechanism should be able to produce V over the output units, thus "completing the pattern". The process of doing this is what I call "recovery" (or "recall"). The way I see it (and I believe most other connectionists too) is that the representational part of the network consists of its "internals" --- either the weights, or the hidden units. Far from being uncontrollable, as Bo Xu states, these are *precisely* the things that we *do* control --- not in a micro sense, but through complex global schemes such as training algorithms. The prior to be stored, which Bo takes to be the representation, is, to me, just a given that has been through some unspecified preprocessing. It is the "object" to be represented (though I agree that all objects are themselves representations). From rosauer at ira.uka.de Tue Jun 25 14:27:57 1991 From: rosauer at ira.uka.de (Bernd Rosauer) Date: Tue, 25 Jun 91 14:27:57 MET DST Subject: genetic algorithms + neural networks Message-ID: I am interested in any kind of combination of genetic algorithms and neural network training. I am aware of the papers presented at * Connectionist Models Summer School, 1990 * First International Workshop on Parallel Problem Solving from Nature, 1990 * Third International Conference on Genetic Algorithms, 1989 * Advances in Neural Information Processing Systems 2, 1989. Please let me know if there is any further work on that topic. Post to , so I will summarize here. Thanks a lot Bernd From stork at GUALALA.CRC.RICOH.COM Mon Jun 24 20:36:49 1991 From: stork at GUALALA.CRC.RICOH.COM (David Stork) Date: Mon, 24 Jun 91 17:36:49 -0700 Subject: Job offer Message-ID: <9106250036.AA11456@cache.CRC.Ricoh.Com> The Ricoh California Research Center has an opening for a staff programmer or researcher in neural networks and connectionism. This opening is for a B.S. or possibly M.S.-level graduate in Physics, Computer Science, Math, Electrical Engineering, Cognitive Science, Psychology, or related fields. A background in some hardware design is a plus. The Ricoh California Research Center is located in Menlo Park, about one mile from Stanford University. Contact: Dr. David G.
Stork Ricoh California Research Center 2882 Sand Hill Road #115 Menlo Park, CA 94025-7022 stork at crc.ricoh.com From issnnet at park.bu.edu Tue Jun 25 15:39:29 1991 From: issnnet at park.bu.edu (issnnet@park.bu.edu) Date: Tue, 25 Jun 91 15:39:29 -0400 Subject: Call For Votes: comp.org.issnnet Message-ID: <9106251939.AA04607@copley.bu.edu> CALL FOR VOTES ---------------- GROUP NAME: comp.org.issnnet STATUS: unmoderated CHARTER: The newsgroup shall serve as a medium for discussions pertaining to the International Student Society for Neural Networks (ISSNNet), Inc., and to its activities and programs as they pertain to the role of students in the field of neural networks. Details were posted in the REQUEST FOR DISCUSSION, and can be requested from . VOTING PERIOD: JUNE 25 - JULY 25, 1991 ****************************************************************************** VOTING PROCESS If you wish to vote for or against the creation of comp.org.issnnet, please send your vote to: issnnet at park.bu.edu To facilitate collection and sorting of votes, please include one of these lines in your "subject:" entry: If you favor creation of comp.org.issnnet, your subject should read: YES - comp.org.issnnet If you DO NOT favor creation of comp.org.issnnet, use the subject: NO - comp.org.issnnet YOUR VOTE ONLY COUNTS IF SENT DIRECTLY TO THE ABOVE ADDRESS. ----------------------------------------------------------------------- For more information, please send e-mail to issnnet at park.bu.edu (ARPANET) or write to: ISSNNet, Inc. PO Box 557, New Town Br. Boston, MA 02258 USA ISSNNet, Inc. is a non-profit corporation in the Commonwealth of Massachusetts. NOTE -- NEW SURFACE ADDRESS: ISSNNet, Inc. P.O. Box 15661 Boston, MA 02215 USA From koch at CitIago.Bitnet Thu Jun 27 06:12:08 1991 From: koch at CitIago.Bitnet (Christof Koch) Date: Thu, 27 Jun 91 03:12:08 PDT Subject: Phase-locking without oscillations Message-ID: <910627031202.20402f6a@Iago.Caltech.Edu> The following paper is available by anonymous FTP from Ohio State University, in pub/neuroprose. The file is called "koch.syncron.ps.Z". A SIMPLE NETWORK SHOWING BURST SYNCHRONIZATION WITHOUT FREQUENCY-LOCKING Christof Koch and Heinz Schuster ABSTRACT: The dynamic behavior of a network model consisting of all-to-all excitatory coupled binary neurons with global inhibition is studied analytically and numerically. It is shown that for random input signals, the output of the network consists of synchronized bursts with apparently random intermissions of noisy activity. We introduce the fraction of simultaneously firing neurons as a measure for synchrony and prove that its temporal correlation function displays, besides a delta peak at zero indicating random processes, strongly damped oscillations. Our results suggest that synchronous bursts can be generated by a simple neuronal architecture which amplifies incoming coincident signals. This synchronization process is accompanied by damped oscillations which, by themselves, however, do not play any constructive role in this process and can therefore be considered an epiphenomenon. Key words: neuronal networks / stochastic activity / burst synchronization / phase-locking / oscillations For comments, send e-mail to koch at iago.caltech.edu. Christof
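For readers who want to experiment with the synchrony measure described in the abstract, the following is a minimal sketch in Python/NumPy. It is purely illustrative and not from the paper: it uses an i.i.d. random spike raster rather than the coupled network itself, so its correlation function should show only the delta peak at zero, not the damped oscillations.

import numpy as np

rng = np.random.default_rng(0)

# Toy spike raster: T time steps x N binary neurons.  This is just noise, standing in
# for the output of the model (all-to-all excitatory coupling with global inhibition).
T, N = 2000, 100
spikes = (rng.random((T, N)) < 0.1).astype(float)

# Synchrony measure from the abstract: fraction of simultaneously firing neurons.
f = spikes.mean(axis=1)

# Temporal autocorrelation of f (mean removed, normalized at lag zero).
f0 = f - f.mean()
acf = np.correlate(f0, f0, mode="full")[T - 1:]
acf /= acf[0]

print("mean synchrony:", round(float(f.mean()), 3))
print("autocorrelation at lags 0-5:", np.round(acf[:6], 3))

Substituting the actual network dynamics for the random raster, the same two quantities -- f and its autocorrelation -- are the ones that exhibit the damped oscillations described above.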
P.S. And this is how you can FTP and print the file: unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62) Name: anonymous Password: neuron ftp> cd pub/neuroprose (actually, cd neuroprose) ftp> binary ftp> get koch.syncron.ps.Z ftp> quit unix> uncompress koch.syncron.ps.Z unix> lpr koch.syncron.ps Read and be illuminated. From nowlan at helmholtz.sdsc.edu Thu Jun 27 14:38:58 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Thu, 27 Jun 91 11:38:58 MST Subject: Thesis/TR available Message-ID: <9106271838.AA27191@bose> The following technical report version of my thesis is now available from the School of Computer Science, Carnegie Mellon University: ------------------------------------------------------------------------------- Soft Competitive Adaptation: Neural Network Learning Algorithms based on Fitting Statistical Mixtures CMU-CS-91-126 Steven J. Nowlan School of Computer Science Carnegie Mellon University ABSTRACT In this thesis, we consider learning algorithms for neural networks which are based on fitting a mixture probability density to a set of data. We begin with an unsupervised algorithm which is an alternative to the classical winner-take-all competitive algorithms. Rather than updating only the parameters of the ``winner'' on each case, the parameters of all competitors are updated in proportion to their relative responsibility for the case. Use of such a ``soft'' competitive algorithm is shown to give better performance than the more traditional algorithms, with little additional cost. We then consider a supervised modular architecture in which a number of simple ``expert'' networks compete to solve distinct pieces of a large task. A soft competitive mechanism is used to determine how much an expert learns on a case, based on how well the expert performs relative to the other expert networks. At the same time, a separate gating network learns to weight the output of each expert according to a prediction of its relative performance based on the input to the system. Experiments on a number of tasks illustrate that this architecture is capable of uncovering interesting task decompositions and of generalizing better than a single network with small training sets. Finally, we consider learning algorithms in which we assume that the actual output of the network should fall into one of a small number of classes or clusters. The objective of learning is to make the variance of these classes as small as possible. In the classical decision-directed algorithm, we decide that an output belongs to the class it is closest to and minimize the squared distance between the output and the center (mean) of this closest class. In the ``soft'' version of this algorithm, we minimize the squared distance between the actual output and a weighted average of the means of all of the classes. The weighting factors are the relative probabilities that the output belongs to each class. This idea may also be used to model the weights of a network, to produce networks which generalize better from small training sets. ------------------------------------------------------------------------------- Unfortunately there is NOT an electronic version of this TR. Copies may be ordered by sending a request for TR CMU-CS-91-126 to: Computer Science Documentation School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 USA There will be a charge of $10.00 U.S. for orders from the U.S., Canada or Mexico and $15.00 U.S.
for overseas orders to cover copying and mailing costs (the TR is 314 pages in length). Checks and money orders should be made payable to Carnegie Mellon University. Note that if your institution is part of the Carnegie Mellon Technical Report Exchange Program there will be NO charge for this TR. REQUESTS SENT DIRECTLY TO MY E-MAIL ADDRESS WILL BE FILED IN /dev/null. - Steve (P.S. Please note my new e-mail address is nowlan at helmholtz.sdsc.edu). ------- End of Forwarded Message From D.M.Shumsheruddin at computer-science.birmingham.ac.uk Thu Jun 27 06:06:51 1991 From: D.M.Shumsheruddin at computer-science.birmingham.ac.uk (Dean Shumsheruddin) Date: Thu, 27 Jun 91 11:06:51 +0100 Subject: Request for references on navigation Message-ID: <961.9106271006@christopher-robin.cs.bham.ac.uk> I am looking for references to work on neural nets for navigation in graph-structured environments. I've already found the papers by Pomerleau and Bachrach in NIPS 3. I would greatly appreciate information about related work. If there is sufficient interest I'll post a summary to the list. Dean Shumsheruddin University of Birmingham, UK dms at cs.bham.ac.uk From russ at oceanus.mitre.org Fri Jun 28 10:47:09 1991 From: russ at oceanus.mitre.org (Russell Leighton) Date: Fri, 28 Jun 91 10:47:09 EDT Subject: Aspirin/MIGRAINES v4.0 Users Message-ID: <9106281447.AA13459@oceanus.mitre.org> Aspirin/MIGRAINES v4.0 Users Could those groups presently using the Aspirin/MIGRAINES v4.0 neural network simulator from MITRE please reply to this message. A brief description of your motivation for using this software would be useful but not necessary. We are compiling a list of users so that we may more easily distribute the next release of software (Aspirin/MIGRAINES v5.0). Thank you. Russell Leighton INTERNET: russ at dash.mitre.org Russell Leighton MITRE Signal Processing Lab 7525 Colshire Dr. McLean, Va. 22102 USA
From dcp+ at cs.cmu.edu Mon Jun 3 15:51:50 1991 From: dcp+ at cs.cmu.edu (David Plaut) Date: Mon, 03 Jun 91 15:51:50 EDT Subject: Preprint: Effects of Word Abstractness in a Connectionist Model of Deep Dyslexia Message-ID: <1831.675978710@DWEEB.BOLTZ.CS.CMU.EDU> The following paper is available in the neuroprose archive as plaut.cogsci91.ps.Z. It will appear in this year's Cognitive Science Conference proceedings. A much longer paper presenting a wide range of related work is in preparation and will be announced shortly. Effects of Word Abstractness in a Connectionist Model of Deep Dyslexia David C. Plaut Tim Shallice School of Computer Science Department of Psychology Carnegie Mellon University University College, London dcp at cs.cmu.edu ucjtsts at ucl.ac.uk Deep dyslexics are patients with neurological damage who exhibit a variety of symptoms in oral reading, including semantic, visual and morphological effects in their errors, a part-of-speech effect, and better performance on concrete than abstract words.
Extending work by Hinton & Shallice (1991), we develop a recurrent connectionist network that pronounces both concrete and abstract words via their semantics, defined so that abstract words have fewer semantic features. The behavior of this network under a variety of ``lesions'' reproduces the main effects of abstractness on deep dyslexic reading: better correct performance for concrete words, a tendency for error responses to be more concrete than stimuli, and a higher proportion of visual errors in response to abstract words. Surprisingly, severe damage within the semantic system yields better performance on *abstract* words, reminiscent of CAV, the single, enigmatic patient with ``concrete word dyslexia.'' To retrieve this from the neuroprose archive, type the following: unix> ftp 128.146.8.62 Name: anonymous Password: neuron ftp> binary ftp> cd pub/neuroprose ftp> get plaut.cogsci91.ps.Z ftp> quit unix> zcat plaut.cogsci91.ps.Z | lpr ------------------------------------------------------------------------------- David Plaut dcp+ at cs.cmu.edu School of Computer Science 412/268-8102 Carnegie Mellon University Pittsburgh, PA 15213-3890 From mjolsness-eric at CS.YALE.EDU Wed Jun 5 15:50:55 1991 From: mjolsness-eric at CS.YALE.EDU (Eric Mjolsness) Date: Wed, 5 Jun 91 15:50:55 EDT Subject: TR: Bayesian Inference on Visual Grammars by NNs that Optimize Message-ID: <9106051951.AA25379@NEBULA.SYSTEMSZ.CS.YALE.EDU> The following paper is available in the neuroprose archive as mjolsness.grammar.ps.Z: Bayesian Inference on Visual Grammars by Neural Nets that Optimize Eric Mjolsness Department of Computer Science Yale University New Haven, CT 06520-2158 YALEU/DCS/TR854 May 1991 Abstract: We exhibit a systematic way to derive neural nets for vision problems. It involves formulating a vision problem as Bayesian inference or decision on a comprehensive model of the visual domain given by a probabilistic {\it grammar}. A key feature of this grammar is the way in which it eliminates model information, such as object labels, as it produces an image; correspondence problems and other noise removal tasks result. The neural nets that arise most directly are generalized assignment networks. Also there are transformations which naturally yield improved algorithms such as correlation matching in scale space and the Frameville neural nets for high-level vision. Deterministic annealing provides an effective optimization dynamics. The grammatical method of neural net design allows domain knowledge to enter from all levels of the grammar, including ``abstract'' levels remote from the final image data, and may permit new kinds of learning as well. The paper is 56 pages long. To get the file from neuroprose: unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get mjolsness.grammar.ps.Z ftp> quit unix> uncompress mjolsness.grammar.ps.Z unix> lpr mjolsness.grammar.ps (or however you print postscript) -Eric ------- From jm2z+ at andrew.cmu.edu Thu Jun 6 13:47:19 1991 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Thu, 6 Jun 91 13:47:19 -0400 (EDT) Subject: Are they really worth the effort ? Message-ID: <4cHbIbi00Uh_M2RVlH@andrew.cmu.edu> I'd like to have a debate about the advantages of distributed over local representations. I mean sure, distributed representations are great: they work in 2^n instead of n space, they degrade gracefully, and all these PDP Bible type of things. But ... are they really that good?
For one thing, they make our life awfully difficult in terms of understanding and manipulating them... Are they really worth the effort? Do you have concrete examples in your work where they did a better job than local representations? Javier From ogs0%dixie.dnet at gte.com Thu Jun 6 17:21:22 1991 From: ogs0%dixie.dnet at gte.com (Oliver G. Selfridge) Date: Thu, 6 Jun 91 17:21:22 -0400 Subject: Warren McCulloch's widow Message-ID: <9106062121.AA05259@bunny.gte.com> I sadly announce that Rook McCulloch, widow to Warren McCulloch, died last night at the age of 92. Warren himself, with Walter Pitts, wrote the revolutionary introduction to neural nets in the mid-1940s in two well-known papers. Rook maintained a bright and contributory life up to the end and we will all miss her. Oliver Selfridge From jm2z+ at andrew.cmu.edu Thu Jun 6 18:19:45 1991 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Thu, 6 Jun 91 18:19:45 -0400 (EDT) Subject: are they worth the effort II Message-ID: <4cHfI1q00WBK03HW0G@andrew.cmu.edu> Please send your thoughts to connectionists so that we all can be instructed about the advantages of distributed representations. By the way, I already got two responses that I will summarize below. Response number one provided the following arguments: 1- The brain uses distributed representations. He cites Lashley's (1929) experiments where rats show graceful performance degradation when they were partially deprived of their cortex. 2- Distributed representations are more resistant to degradation. He claims this may have military implications (systems resistant to enemy fire type of thing). [ OK, does anybody out there have data showing that distributed representations are more noise-resistant than local representations? I mean, one can always clone the local representations and get noise resistance that way. -Javier ] 3- He claims distributed representations performed very well in his research projects. [ Unfortunately he confuses distributed representations with backpropagation (BP). It is BP that worked well. It is always possible to force BP to develop local representations and perhaps it would work better that way. -Javier ] Response number two claims that *very* distributed representations are probably the wrong way to go. He said "Slightly" distributed representations (like the ones used in Kruschke's ALCOVE model) are better. Unfortunately he does not provide any data supporting this point. I just got response # 3, which claims that distributed representations performed consistently better than local ones in the NETtalk domain and in isolated letter speech recognition. [ Tom, could you send me some references? Thanks - Javier ] -- Javier From tgd at turing.CS.ORST.EDU Thu Jun 6 18:11:44 1991 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Thu, 6 Jun 91 15:11:44 PDT Subject: Are they really worth the effort ? In-Reply-To: Javier Movellan's message of Thu, 6 Jun 91 13:47:19 -0400 (EDT) <4cHbIbi00Uh_M2RVlH@andrew.cmu.edu> Message-ID: <9106062211.AA13213@turing.CS.ORST.EDU> In my studies of error-correcting output codes, I found that these codes---which are particularly neat distributed representations---performed consistently better than local representations in the NETtalk domain and in isolated letter speech recognition. --Tom Thomas G.
Dietterich Department of Computer Science Dearborn Hall, 303 Oregon State University Corvallis, OR 97331-3102 503-737-5559 From Nigel.Goddard at B.GP.CS.CMU.EDU Thu Jun 6 19:06:13 1991 From: Nigel.Goddard at B.GP.CS.CMU.EDU (Nigel.Goddard@B.GP.CS.CMU.EDU) Date: Thu, 6 Jun 91 19:06:13 EDT Subject: distributed/local Message-ID: Both extremes are wrong for representing conceptual knowledge (i.e., one unit per concept versus all units participate in all concepts). Disadvantages of the extreme local scheme include no tolerance of failure (neurons die all the time) and difficulty expressing nuance without impossibly large numbers of units. The big advantage is that it is easy to see what is going on and to design structures. Disadvantages of the extreme distributed scheme include crosstalk when more than one item is active and difficulty communicating an active item from one part of the architecture to another (too many links required). The big advantages are fault-tolerance (graceful degradation) and generalization. The answer is something in between the extremes (not that this is news to anyone), depending on what the task is. Order log n units per concept for an n-unit net might be a good place to start. Feldman has a TR discussing these issues in much more depth (TR 189, "Neural Representation of Conceptual Knowledge", Computer Science Dpt, Univ. Rochester, NY 14627). Also published as a book chapter, I believe. Nigel Goddard From soller%asylum at cs.utah.edu Thu Jun 6 22:54:43 1991 From: soller%asylum at cs.utah.edu (Jerome Soller) Date: Thu, 6 Jun 91 20:54:43 -0600 Subject: Request for Information on Cognitive Science Curriculum Message-ID: <9106070254.AA24372@asylum.utah.edu> At the University of Utah, we are in the process of putting together a curriculum for Cognitive Science degrees at the undergraduate and graduate level. This faculty/student initiative is being led by Dr. Dick Burgess of Physiology. We were wondering what classes and sequences are considered to form the core of established Cognitive Science degree-granting programs at graduate and undergraduate levels. Jerome Soller Department of C.S. U. of Utah soller at asylum.utah.edu From slehar at park.bu.edu Fri Jun 7 08:56:58 1991 From: slehar at park.bu.edu (Steve Lehar) Date: Fri, 7 Jun 91 08:56:58 -0400 Subject: Distributed Representations In-Reply-To: connectionists@c.cs.cmu.edu's message of 7 Jun 91 09:39:59 GM Message-ID: <9106071256.AA15832@park.bu.edu> I think the essence of this debate is in the nature of the input data. If your input is boolean in nature and reliably correct, then the processing performed on it can be similarly boolean and sequential with a great saving in time and space. It is when the input is fuzzy, ambiguous and distributed that the sequential logical boolean type of processing runs into problems. A perfect example is image understanding. No single local region of the image is sufficient for reliable identification. Try this yourself: punch a little hole in a big piece of paper and lay it on a randomly selected photograph and see how much you can recognize through that one local aperture. You have no way of knowing what the local feature is without the global context, but how do you know the global context without building it up out of the local pieces?
Studies of the visual system suggest that in nature this problem is solved by a parallel optimization of all the local pieces together with many levels of global representations, such that the final interpretation is a kind of relaxation due to all of the constraints felt at all of the different representations all at the same time. This is the basic idea of Grossberg's BCS/FCS algorithm, and is in contrast to a more sequential "AI" approach where the local pieces are each evaluated independently, and the results passed on to the next stage. I would claim that such an approach can never work reliably with natural images. I would be happy to provide more information on the BCS/FCS and my implementations of it to interested parties. From hendler at cs.UMD.EDU Fri Jun 7 10:40:58 1991 From: hendler at cs.UMD.EDU (Jim Hendler) Date: Fri, 7 Jun 91 10:40:58 -0400 Subject: distributed/local In-Reply-To: Nigel.Goddard@B.GP.CS.CMU.EDU's message of Thu, 6 Jun 91 19:06:13 EDT <9106071428.AA09615@mimsy.UMD.EDU> Message-ID: <9106071440.AA23704@dormouse.cs.UMD.EDU> For what it's worth, some preliminary results showing a well-behaved relationship between local and distributed reps are in a paper I had at the NIPS conf (Advances in Neur. Info. Proc. Sys I - Touretzky (ed), 1989, p.553). I have followed up on this work a little, with a better analysis of the relationship described in last year's Cog. Sci. Conference, but the work is pretty preliminary. I've pretty much stopped pursuing this actively, but anyone wanting to pick up on it is welcome... -J. Hendler From hu at eceserv0.ece.wisc.edu Fri Jun 7 11:22:28 1991 From: hu at eceserv0.ece.wisc.edu (Yu Hu) Date: Fri, 7 Jun 91 10:22:28 -0500 Subject: What is distributed/local representation Message-ID: <9106071522.AA18585@eceserv0.ece.wisc.edu> While lots of buzzzzz words, such as graceful degradation, appear in the discussion, may I ask a rather naive question: could someone give a mathematically (or .....ly) sound definition of distributed and local representation (of what?) before we proceed to discuss them? Suppose the representations are for data vectors in an N-dimensional space. Does distributed representation refer to data with many non-zero elements, and local representation to the opposite? If not, what are they? Regards, Yu Hen Hu Department of Electrical and Computer Engr. (608)262-6724(phone) Univ. of Wisconsin - Madison (608)262-1267(fax) 1415 Johnson Drive hu at engr.wisc.edu Madison, WI 53706-1691 U.S.A. From indurkhy at paul.rutgers.edu Fri Jun 7 12:10:42 1991 From: indurkhy at paul.rutgers.edu (Nitin Indurkhya) Date: Fri, 7 Jun 91 12:10:42 EDT Subject: Are they really worth the effort ? Message-ID: <9106071610.AA17674@paul.rutgers.edu> >In my studies of error-correcting output codes, I found that these >codes---which are particularly neat distributed >representations---performed consistently better than local >representations in the NETtalk domain and in isolated letter speech >recognition. in our own studies with the NETtalk dataset that you gave us, we found that local representations were competitive. the results are reported in "reduced complexity rule induction" by weiss and indurkhya (to be presented at ijcai-91).
--nitin From lina at mimosa.physio.nwu.edu Fri Jun 7 12:47:29 1991 From: lina at mimosa.physio.nwu.edu (Lina Massone) Date: Fri, 7 Jun 91 11:47:29 CDT Subject: No subject Message-ID: <9106071647.AA05357@mimosa.physio.nwu.edu> About distributed representations The concept of distributed representation is intimately related to the concept of redundancy. The central nervous system makes a great use of redundant representations in the way receptive/projective fields are organized. I do not agree on the fact that distributed/redundant representations are primarily a protection against possible injuries or failures of the components; I'd rather consider that as a useful side-effect. To me the main values of redundancy are: greater sensitivity, higher resolution, improvement of signal-to-noise ratio, reduction of demand for stability of performance and for precision in ontogenesis. In general a comparison between the activity of a population of neurons and the activity of a single neuron will show that the population is sensitive to lower stimulus intensities, smaller increments, briefer events, higher frequencies, wider dynamic ranges than a single neuron and is less disturbed by independent drift and instability. As far as the amount of redundancy, there is some physiological evidence that the coding of information in the CNS is a compromise between fully distributed and fully localized. Given that the available number of neurons is limited, an entity (a piece of information) cannot be represented over a very large population of neurons that overlaps almost completely with the population activated by a different entity; this would cause a high degree of interference and would correspond to a very inefficient memory storage system. To maintain some degree of orthogonality within a limited number of neurons, the CNS makes the number of neurons - active for each stimulus - low. In other words each entity is represented across an ensemble of neurons but the ensemble is of limited size. As far as coarse coding, Ken Laws raised the issue of matching the structure of data with the code. I agree on that. The CNS does that by having neighboring receptors stimulated by neighboring fractions of the impinging world, i.e. by means of a topological principle. An example of the computational advantages of this idea for control problems is given in L. Massone, E. Bizzi (1990) On the role of input representations in sensorimotor mapping, Proc. IJCNN, Washington D.C. Lina Massone From tgd at turing.cs.orst.edu Fri Jun 7 12:45:26 1991 From: tgd at turing.cs.orst.edu (Tom Dietterich) Date: Fri, 7 Jun 91 09:45:26 PDT Subject: Distributed Representations In-Reply-To: Ken Laws's message of Thu 6 Jun 91 22:02:27-PDT <676270947.0.LAWS@AI.SRI.COM> Message-ID: <9106071645.AA16085@turing.CS.ORST.EDU> Date: Thu 6 Jun 91 22:02:27-PDT From: Ken Laws Mail-System-Version: I'm not sure this is the same concept, but there were several papers at the last IJCAI showing that neural networks worked better than decision trees. The reason seemed to be that neural decisions depend on all the data all the time, whereas local decisions use only part of the data at one time. This is not the same concept at all. You are worrying about locality in the input space, whereas distributed representations usually concern (lack of) locality in the output space or in some intermediate representation. 
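To make the output-space distinction concrete, here is a minimal Python sketch contrasting a local (1-of-k) output code with a distributed one whose codewords are spread over several units. The codewords are invented purely for illustration; they are not taken from any of the studies mentioned in this discussion.

import numpy as np

# Local (1-of-k) output code for k = 4 classes: one unit per class.
local_code = np.eye(4, dtype=int)

# A distributed output code for the same 4 classes: longer codewords, any two of
# which differ in several bits (hypothetical codewords, for illustration only).
distributed_code = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 1],
])

def decode(output_bits, codebook):
    # Assign the class whose codeword is nearest in Hamming distance.
    dists = np.abs(codebook - output_bits).sum(axis=1)
    return int(np.argmin(dists))

# Suppose the true class is 2 and one output unit is learned incorrectly.
noisy = distributed_code[2].copy()
noisy[0] ^= 1                              # flip one bit
print(decode(noisy, distributed_code))     # prints 2: the distributed code absorbs the error

noisy_local = local_code[2].copy()
noisy_local[0] ^= 1                        # the same single-bit error on the local code
print(decode(noisy_local, local_code))     # prints 0: the 1-of-k code cannot recover

Because the distributed codewords are far apart in Hamming distance, decoding to the nearest codeword recovers the correct class even when an individual output unit is wrong; a 1-of-k code has no such slack.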
I have applied decision trees to learn distributed representations of output classes, and in all of my experiments, the distributed representation performs better than learning either one large tree (to make a k-way discrimination) or learning k separate trees. I believe this is because a distributed representation is able to correct for errors made in learning any individual output unit. The paper "dietterich.error-correcting.ps.Z" in the neuroprose archive presents experimental support for this claim. I've never put much stock in the military reliability claims. A bullet through the chip or its power supply will be a real challenge. Noise tolerance is important, though, and I suspect that neural systems really are more tolerant. It isn't a neural vs. non-neural issue: distributed representations are more redundant, and hence, more resistant to (local) damage. Noise tolerance is also not a neural vs. non-neural issue. To achieve noise tolerance, you must control over-fitting. There are many ways to do this: low-dimensional representations, smoothness assumptions, minimum description length methods, cross-validation, etc. Terry Sejnowski's original NETtalk work has always bothered me. He used a neural network to set up a mapping from an input bit string to 27 output bits, if I recall. I have never seen a "control" experiment showing similar results for 27 separate discriminant analyses, or for a single multivariate discriminant. I suspect that the results would be far better. The wonder of the net was not that it worked so well, but that it worked at all. I think you should perform these studies before you make such claims. I myself doubt them very much, because the NETtalk task violates the assumptions of discriminant analysis. In my experience, backpropagation works quite well on the NETtalk task. We have found that Wolpert's HERBIE (which is a kind of weighted 4-nearest-neighbor method) and generalized radial basis functions do better than backpropagation, but everything else we have tried does worse (decision trees, perceptrons, Fringe). I have come to believe strongly in "coarse-coded" representations, which are somewhat distributed. (I have no insight as to whether fully distributed representations might be even better. I suspect that their power is similar to adding quadratic and higher-order terms to a standard statistical model.) The real win in coarse coding occurs if the structure of the code models structure in the data source (or perhaps in the problem to be solved). -- Ken Laws The real win in any problem comes from good modelling, of course. But since we can't guarantee a priori that our representations are good models, it is important to develop ways for recovering from inappropriate models. I believe distributed representations provide one such way. --Tom Dietterich From dhw at t13.Lanl.GOV Fri Jun 7 14:31:42 1991 From: dhw at t13.Lanl.GOV (David Wolpert) Date: Fri, 7 Jun 91 12:31:42 MDT Subject: No subject Message-ID: <9106071831.AA11289@t13.lanl.gov> Javier Movellan wonders about the relative "advantages of distributed over local representations". He asks of members of the net, "Do you have concrete examples in your work where they did a better job than local representations? I have concrete examples in which they do worse - sometimes far worse. See references below. David Wolpert (dhw at tweety.lanl.gov) D. H. Wolpert, "A benchmark for how well neural nets generalize", Biological Cybernetics, 61 (1989), 303-315. D. H. 
Wolpert, "Constructing a generalizer superior to NETtalk via a mathematical theory of generalization", Neural Networks, 3 (1990), 445-452. D. H. Wolpert, "Improving the performance of generalizers via time-series-like pre-processing of the learning set", Los Alamos Report LA-UR-91-350, submitted to IEEE PAMI. From kukich at flash.bellcore.com Fri Jun 7 17:26:05 1991 From: kukich at flash.bellcore.com (Karen Kukich) Date: Fri, 7 Jun 91 17:26:05 -0400 Subject: distributed vs. local encoding schemes Message-ID: <9106072126.AA06750@flash.bellcore.com> I ran some back-prop spelling correction experiments a few years ago in which one of the control variables was the use of distributed vs. local encoding schemes for both input and output representations. Local encodings schemes were clear winners in both speed of learning and performance (correction accuracy for novel misspellings). To clarify, a local output scheme was simply a 1-of-n vector (n=200) where each node represented one word in the lexicon; a "semi-local" input scheme was a 15*30=450-unit vector where each 30-unit block locally encoded one letter in a word of up to 15 characters. This positionally-encoded input scheme was thus local w.r.t individual letters in a word but distributed w.r.t the whole word. (Incidentally, the nets took slightly longer to learn to correct the shift-variant insertion and deletion errors, but they eventually learned them as well as the shift-invariant substitution and transposition errors.) The distributed encoding schemes were m-distance lexicodes, where m is the Hamming distance btwn codes. Thus lexicode-1 is just a binary number code. I tried lexicodes of m=1,2,3 and 4 for both output words and input letters. Both speed of learning and correction accuracy improved linearly with increasing m. These results were published in a paper that appeared in the U.S. Post Office Avanced Technology Conference in May of 1988. My only interpretation of the results is that local encoding schemes simplify the learning task for nets; I'm convinced that distributed schemes are essential for cognitive processes such as semantic representation at least, due to the need for multi-dimensional semantic access and association. As an epilog, I ran a few more experiments afterword that left me with a small puzzle. In the above experiments I had also found that performance improved as the number of hidden nodes increased up to about n(=200) and then leveled off. Afterwords, I tested the local net with the 450-unit positionally-encoded input scheme and NO hidden nodes and found performance equal to or better than any net with a hidden layer and much faster learning. But when I tried a shift-invariant input encoding scheme, in which misspellings were encoded by a 420-unit vector representing letter bigrams and unigrams, I found similarly good performance for nets with hidden layers but miserable performance for a net with no hidden layer. Apparently, the positionally-encoded input scheme yields a set of linearly- separable input classes but the shift-invariant scheme does not. It's still not clear to me why this is? 
Karen Kukich kukich at bellcore.com From ps_coltheart at vaxa.mqcc.mq.oz.au Sat Jun 8 10:51:22 1991 From: ps_coltheart at vaxa.mqcc.mq.oz.au (Max Coltheart) Date: Sat, 8 Jun 91 09:51:22 est Subject: distributed representations Message-ID: <9106072351.AA01618@macuni.mqcc.mq.oz.au> The original posting about this mentioned the property of graceful degradation as one of the virtues of systems that use distributed representations. In what way is this a virtue? For nets that are doing some engineering job such as character recognition, it would obviously be good if some damage or malfunction didn't much affect the net's performance. But for nets that are meant to be models of cognition, the hidden assumption seems to be that after brain damage there is graceful degradation of cognitive processing, so the fact that nets show graceful degradation too means they have promise for modelling cognition. But where's the evidence that brain damage degrades cognition gracefully? That is, the person just gets a little bit worse at a lot of things? Very commonly, exactly the opposite happens - the person remains normal at almost all kinds of cognitive processing, but some specific cognitive task suffers catastrophically. No graceful degradation here. I could give very many examples: I'll just give one (Semanza & Zettin, Cognitive Neuropsychology, 1988, 5, 711). This patient, after his stroke, had impaired language, but this impairment was confined to language production (comprehension was fine) and to the production of just one type of word: proper nouns. He could understand proper nouns normally, but could produce almost none, whilst his production of other kinds of nouns was normal. What's graceful about this degradation of cognition? If cognition does *not* degrade gracefully, and neural nets do, what does this say about neural nets as models of cognition? Max Coltheart From dave at cogsci.indiana.edu Fri Jun 7 22:03:50 1991 From: dave at cogsci.indiana.edu (David Chalmers) Date: Fri, 7 Jun 91 21:03:50 EST Subject: distributed reps Message-ID: Properties like damage resistance, graceful degradation, etc., are all nice, useful, cognitively plausible possibilities, but I would have thought that by far the most important property of distributed representation is the potential for systematic processing. Obviously ultra-local systems (every possible concept represented by an arbitrary symbol) don't allow much systematic processing, as each symbol has to be handled by its own special rule (though things can be improved somewhat by connecting the symbols up, as e.g. in a semantic network). Things are much improved by using compositional representations, as e.g. found in standard AI. If you represent many concepts by compounding the basic tokens, then certain semantic properties can be reflected in internal structure -- e.g. "LOVES(CAT, DOG)" and "LOVES(JOHN,BILL)" have relevantly similar internal structures -- opening the door to processing these structures in systematic ways. Distributed representations just take this idea a step further. One sees the systematicity made possible by giving representations internal structure as above, and says "why stop there?" e.g. why not give every representation internal structure (why should CATs and DOGs miss out?). Compositional representations as above only represent a limited range of semantic properties systematically in internal structure -- namely, compositional properties. All kinds of other semantic properties might be fair game. By moving to e.g.
vectorial representation for every concept, then e.g. the similarity structure of the semantic space can be reflected in the similarity structure of the representational space, and so on. And it turns out that you can process compositional properties systematically too (though not quite as easily). The combination of a multi-dimensional space with a large supply of possible non-linear operations seems to open up a lot of possible kinds of systematic processing, essentially because these operations can chop up the space in ways that standard operations on compositional structures can't. The proof is in the pudding, i.e. the kinds of systematic processing that connectionist networks exhibit all the time. Most obviously, automatic generalization: new inputs are slotted into some representational form, hopefully leading to reasonable behaviour from the network. Similarly for dealing with old inputs in new contexts. By comparison, with ultra-local representations, generalization is right out (except by assimilating new inputs into an old category, e.g. by nearest neighbour methods). Using compositional representations, certain kinds of generalization are obviously possible, as with decision trees. These suffer a bit from having to deal directly with the original input space, rather than developing a new representational space as with dist reps: so you (a) don't get the very useful capacity to take a representation that's developed and use it for other purposes (e.g. as context for a recurrent network, or as input for some new network), and (b) are likely to have problems on very large input spaces (anyone using decision trees for vision?). Both (a) and (b) suggest that decision trees may be unlikely candidates for the backbone of a cognitive architecture (conversely, the ability of connectionist networks to transform one representational space into another is likely to be key to their success as a cognitive architecture). As for generalization performance, that's an empirical matter, but the results of Dietterich etc seem to indicate that decision trees don't do quite as well, presumably because of the limited ways in which they can chop up a representational space (nasty rectangular blocks vs groovy flexible curves). There's far too much else that could be said, so I'll stop here. Dave Chalmers. From tsejnowski at UCSD.EDU Fri Jun 7 22:48:11 1991 From: tsejnowski at UCSD.EDU (Terry Sejnowski) Date: Fri, 7 Jun 91 19:48:11 PDT Subject: distributed/local Message-ID: <9106080248.AA27620@sdbio2.UCSD.EDU> A nice paper that compares ID3 decision trees with backprop on NETtalk and other data sets: Shavlik, J. W., Mooney, R. J., and Towell, G. G. Symbolic and neural learning algorithms: An experimental comparison (revised). Univ. Wisconsin Dept Comp. Sci Tech Report #955 (to appear Machine Learning #6). Overall, backprop performed slightly better than ID3 but took longer to train. Backprop was also more effective in using distributed coding schemes for the inputs and outputs. An error-correcting code, or even a random code, works better than a local code or hand-crafted features. (Ghulum Bakiri and Tom Dietterich reached the same conclusion). The issue of the code developed by the hidden units is also an interesting issue. In NETtalk, the intermediate code was semidistributed -- around 15% of the hidden units were used to represent each letter-to-sound correspondence. 
The vowels and the consonants were fairly well segregated, arguing for local coding at a gross population level (something seen in the brain) but distributed coding at the level of single units (also observed in the brain). The degree of coarseness clearly depends on the grain of the problem. In the original study Charlie Rosenberg and I showed that backprop with hidden units outperformed perceptrons, and hence 26 independent linear discriminants. The NETtalk database is available to anyone who wants to benchmark their learning algorithm. For ftp access contact Scott.Fahlman at b.gp.cs.cmu.edu Terry

From french at cogsci.indiana.edu Sat Jun 8 00:39:11 1991 From: french at cogsci.indiana.edu (Bob French) Date: Fri, 7 Jun 91 23:39:11 EST Subject: semi-distributed representations Message-ID:
One simultaneous advantage and disadvantage of fully distributed representations is that one representation will affect many others. This phenomenon of interference is what allows networks to generalize, but it is also what leads to the problem of catastrophic forgetting. It is reasonable to suppose that the amount of interference in backpropagation networks is directly proportional to the amount of overlap of representations in the hidden layer (the "overlap" of two representations can be defined as the dot product of their activation vectors). The greater the overlap (i.e., the more distributed the representations), the more the network will be affected by catastrophic forgetting, but the better it will be at generalizing. The less the overlap (i.e., the more local the representations), the less the network will be affected by catastrophic forgetting, but the worse it will be at generalizing. If we want nets that do not need to be retrained completely when new data is presented to them but still retain their ability to generalize, we must therefore use representations that are neither too local nor too distributed, what I have called "semi-distributed" representations. I have a paper to appear in CogSci Proceedings 1991 that proposes this relationship between the amount of overlap of representations in the hidden layer and catastrophic forgetting and generalization. The paper outlines one simple method that allows a BP network to evolve its own semi-distributed representations as it learns. - Bob French Center for Research on Concepts and Cognition Indiana University

From dcp+ at cs.cmu.edu Sun Jun 9 09:30:32 1991 From: dcp+ at cs.cmu.edu (David Plaut) Date: Sun, 09 Jun 91 09:30:32 EDT Subject: distributed representations In-Reply-To: Your message of Sat, 08 Jun 91 10:51:22 -0400. <9106072351.AA01618@macuni.mqcc.mq.oz.au> Message-ID: <2428.676474232@DWEEB.BOLTZ.CS.CMU.EDU>
>But where's the evidence that brain damage degrades cognition gracefully? That
>is, the person just gets a little bit worse at a lot of things? Very commonly,
>exactly the opposite happens - the person remains normal at almost all kinds
>of cognitive processing, but some specific cognitive task suffers catastroph-
>ically. No graceful degradation here.
I think the issue here is a matter of scale. "Graceful degradation" refers to the gradual loss of function with increasing severity of damage - it says nothing about how specific or general that function is. Connectionist models can be modular at a global scale, but use distributed representations and show graceful degradation *within* modules.
I think you would agree that, within a particular domain, this is a reasonable characterization of the behavior of many types of patient (to the degree that we understand the modular organization of certain aspects of cognition and the nature of individual patients' damage). Of course, severe damage to a module might still produce catastrophic loss of its function, perhaps leaving the remaining functions relatively intact. On the other hand, the *degree* of specificity of impairment certainly places constraints on the modular organization and the nature of the representations within each module (although I think connectionist modeling illustrates the danger of the "specific impairment implies separate module" logic). Only specific modeling work can demonstrate whether connectionist architectures and representations can account for the behavior of specific patients in an informative way. -Dave ------------------------------------------------------------------------------- David Plaut dcp+ at cs.cmu.edu School of Computer Science 412/268-8102 Carnegie Mellon University Pittsburgh, PA 15213-3890

From gasser at bend.UCSD.EDU Sun Jun 9 00:58:26 1991 From: gasser at bend.UCSD.EDU (Michael Gasser) Date: Sat, 8 Jun 91 21:58:26 PDT Subject: Distributed representations and graceful degradation Message-ID: <9106090458.AA04907@bend.UCSD.EDU>
Max Coltheart discusses how damage to real neural networks often results in more of a clumsy than a graceful sort of degradation. But isn't degradation under conditions of increasing task complexity a different matter? I'm thinking of the processing of increased levels of embedding or (possibly also) numbers of arguments in natural language. Fixed-length distributed representations of syntactic or semantic structure (e.g., RAAM, Elman nets) seem to model this behavior quite well, in comparison to the usual symbolic approach (you're no more likely to fail at 28 levels of embedding than at 2) and to localist connectionist approaches (you can handle sentences with 3 arguments, but 4 are out because you run out of units). Mike Gasser

From siegelma at yoko.rutgers.edu Sun Jun 9 10:56:40 1991 From: siegelma at yoko.rutgers.edu (siegelma@yoko.rutgers.edu) Date: Sun, 9 Jun 91 10:56:40 EDT Subject: TR available from neuroprose; Turing equivalence Message-ID: <9106091456.AA12844@yoko.rutgers.edu>
The following report is now available from the neuroprose archive: NEURAL NETS ARE UNIVERSAL COMPUTING DEVICES H. T. Siegelmann and E.D. Sontag. (13pp.) Abstract: It is folk knowledge that neural nets should be capable of simulating arbitrary computing devices. Past formalizations of this fact have been proved under the hypotheses that there are potentially infinitely many neurons available during a computation and/or that interconnections are multiplicative. In this work, we show the existence of a finite network, made up of sigmoidal neurons, which simulates a universal Turing machine. It is composed of less than 100,000 synchronously evolving processors, interconnected linearly.
-Hava
----------------------------------------------------------------------------- To obtain copies of the postscript file, please use Jordan Pollack's service: Example: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): ftp> cd pub/neuroprose ftp> binary ftp> get (remote-file) siegelman.turing.ps.Z (local-file) siegelman.turing.ps.Z ftp> quit unix> uncompress siegelman.turing.ps.Z unix> lpr -P(your_local_postscript_printer) siegelman.turing.ps ---------------------------------------------------------------------------- If you have any difficulties with the above, please send e-mail to siegelma at paul.rutgers.edu. DO NOT "reply" to this message, please.

From jagota at cs.Buffalo.EDU Sun Jun 9 16:52:33 1991 From: jagota at cs.Buffalo.EDU (Arun Jagota) Date: Sun, 9 Jun 91 16:52:33 EDT Subject: Information Capacity and Local vs Distributed Message-ID: <9106092052.AA04177@sybil.cs.Buffalo.EDU>
Dear Connectionists, I think Information Capacity* (IC) (Abu-Mostafa, Jacques 85) is a useful quantitative criterion for L vs D, illustrated by the following trivial example. You are given k pebbles, to be placed in k of n locations. A location with a pebble is a `1', otherwise a `0'. IC == # distinct vectors that can be stored = C(n,k) (n choose k). For this example, it's nice that the binomial distribution quantifies IC for L vs D. The IC of k ~ n/2 (distributed) is by far superior.
k = 1 ==> local, IC = n
k ~ n/2 ==> distributed, IC = C(n,n/2), the maximum
k = n-1 ==> over-distributed, IC = n
With (threshold-element) connectionist nets, the analogy holds, but the (hidden or output layer) units [locations] are not independent. I would think there is scope for theory and empirical work along these lines. I have seen IC work on symmetric nets, but even here I am unaware of work on IC as a function of k. I am unaware (haven't looked) of any work on FF nets.
* - IC is actually defined as the log of the count shown above.
Sincerely, Arun Jagota jagota at cs.buffalo.edu

From peterc at chaos.cs.brandeis.edu Mon Jun 10 00:06:49 1991 From: peterc at chaos.cs.brandeis.edu (Peter Cariani) Date: Mon, 10 Jun 91 00:06:49 edt Subject: (the late) Rook McCulloch Message-ID: <9106100406.AA29926@chaos.cs.brandeis.edu>
Rook McCulloch also edited a 4-volume set of Warren McCulloch's works, "The Collected Works of Warren S. McCulloch", published by Intersystems Press in 1989 (401 Victor Way #3, Salinas, CA 93907 USA; $84 for 4 volumes, paper). In addition to her foreword and Warren McCulloch's papers, the set also contains some very nice essays by Jerry Lettvin, Michael Arbib, F.S.C. Northrop, Heinz von Foerster, D.M. MacKay (and others). For those of us who never knew the McCullochs, this seems to be the best available source of information about what they thought and felt. Also of relevance is Steve Heims' book on the Macy conferences and the origins of cybernetics ("The Cybernetics Group", MIT Press, 1991), in which Warren McCulloch's role is amply discussed.
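As a quick numerical check of Arun Jagota's pebble example above -- a minimal sketch, assuming Python 3.8 or later for math.comb; the function name ic_bits is illustrative and not from any posting:

    # Information capacity of a k-of-n binary code, IC = log2 C(n,k),
    # following the pebble example: k active units among n locations.
    from math import comb, log2

    def ic_bits(n, k):
        """Bits needed to distinguish all C(n,k) distinct k-of-n patterns."""
        return log2(comb(n, k))

    n = 100
    for k in (1, n // 2, n - 1):        # local, distributed, over-distributed
        print(f"k={k:3d}  IC={ic_bits(n, k):6.1f} bits")

For n = 100, k = 1 and k = 99 both give log2(100), about 6.6 bits, while k = 50 gives about 96.3 bits -- the binomial peak Jagota points to.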
From bates at crl.ucsd.edu Mon Jun 10 12:25:10 1991 From: bates at crl.ucsd.edu (Elizabeth Bates) Date: Mon, 10 Jun 91 09:25:10 PDT Subject: response to max coltheart Message-ID: <9106101625.AA25405@crl.ucsd.edu>
I respectfully disagree with Max Coltheart that brain damage usually or even often yields discrete and domain-specific performance decrements. to be sure, such cases have been reported -- and indeed, their "news value" often lies in the surprisingly discrete nature of the patient's profile. but such case studies typically fail to recognize issues like the peaks and valleys that might have been there premorbidly, i.e. in the "man that used to be". also, we often fail to recognize that by choosing those patients with "interesting" profiles against an unspecified number of background patients with "uninteresting" profiles, we are capitalizing on chance distributions across a number of noisy domains. given 1000 patients who are normally distributed across 100 tasks, I have a pretty solid chance of finding a good number of striking "double dissociations" and even more "single dissociations" entirely by chance. For a simulation that makes EXACTLY that point (coupled with a detailed critique of a "real" study of 20 patients that make this very error), see Bates, Appelbaum and Allard, "Statistical constraints on the use of single case studies in neuropsychological research", in the last issue of Brain and Language. -liz bates

From bates at crl.ucsd.edu Mon Jun 10 12:29:33 1991 From: bates at crl.ucsd.edu (Elizabeth Bates) Date: Mon, 10 Jun 91 09:29:33 PDT Subject: Distributed representations and graceful degradation Message-ID: <9106101629.AA25488@crl.ucsd.edu>
Marcel Just and Patricia Carpenter have a paper coming out in Psychological Review that shows (reviewing quite a range of studies) how the ability of normal adults to handle (read, comprehend) various levels of grammatical complexity and ambiguity interacts with (1) that adult's working memory span, and (2) the effects of a cognitive load imposed by a secondary task. The notion of graceful degradation seems to apply to their work very well. You can obtain a preprint of their paper by contacting them at CMU (Psychology Department). -liz bates

From cabestan at eel.upc.es Mon Jun 10 10:05:46 1991 From: cabestan at eel.upc.es (JOAN CABESTANY) Date: Mon, 10 Jun 1991 14:05:46 +0000 Subject: Call for Papers IWANN'91 Message-ID: <"155*/S=cabestan/OU=eel/O=upc/PRMD=iris/ADMD= /C=es/"@MHS>
Dear Colleagues, Please find here the second Call for Papers for IWANN'91. Remember that the absolute limit date for work presentation is June 20th. IWANN'91 will be held in GRANADA next September.
****************************************************************** ****************************************************************** INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS IWANN'91 Second Announcement Granada, Spain September 17-19, 1991 ORGANISED AND SPONSORED BY Spanish Chapter of the Computer Society of IEEE, AEIA (IEEE Computer Society Affiliate), and Department of Electronic and Computer Technology. University of Granada. Spain. SCOPE Artificial Neural Networks (ANN) were first developed as structural or functional modelling systems of natural ones, featuring the ability to perform problem-solving tasks. They can be thought as computing arrays consisting of series of repetitive uniform processors (neuron-like elements) placed on a grid. Learning is achieved by changing the interconnections between these processing elements. Hence, these systems are also called connectionist models. ANN has become a subject of wide-spread interest: they offer an odd scheme-based programming standpoint and exhibit higher computing speeds than conventional von-Neumann architectures, thus easing or even enabling handling complex task such as artificial vision, speech recognition, information recovery in noisy environments or general pattern recognition. In ANN systems, collective information management is achieved by means of parallel operation of neuron-like elements, into which information processing is distributed. It is intended to exploit this highly parallel processing capability as far as possible in complex problem-solving tasks. Cross-fertilization between the domains of artificial and real neural nets is desirable. The more genuine problems of biological computation and information processing in the nervous system still remain open and contributions in this line are more than welcome. Methodology, theoretical frames, structural and organizational principles in neuroscience, self- organizing and co-operative processes and knowledge based descriptions of neural tissue are relevant topics to bridge the gap between the artificial and natural perspectives. The workshop intends to serve as a meeting place for engineers and scientists working in this area, so that present contacts and relationships can be further increased. The workshop will comprise two complementary activities: . scientific and technical conferences, and . scientific communications sessions. TOPICS The workshop is open to all aspects of artificial neural networks, including: 1. Neural network theories. Neural models. 2. Biological perspectives 3. Neural network architectures and algorithms. 4. Software developments and tools. 5. Hardware implementations 6. Applications. LOCATION Facultad de Ciencias Campus Universitario de Fuentenueva Universidad de Granada 18071 GRANADA. (SPAIN) LANGUAGES English and Spanish will be the official working languages. English is preferable as the working language. Simultaneous translation will be available. Simultaneous translation will be available. CALL FOR PAPERS The Programme Committee seeks original papers on the six above mentioned areas. Survey papers on the various available approaches or particular application domains are also sought. In their submitted papers, authors should pay particular attention to explaining the theoretical and technical choices involved, to make clear the limitations encountered and to describe the current state of development of their work. 
INSTRUCTIONS TO AUTHORS Three copies of submitted papers (not exceeding 8 pages in 21x29.7 cms (DIN-A4), with 1,6 cm. left, right, top and bottom margins) should be received by the Programme Chairman at the address below before June 20, 1991. The headlines should be centred and include: . the title of paper in capitals . the name(s) of author(s) . the address(es) of author(s), and . a 10 line abstract. Three blank lines should be left between each of the above items, and four between the headlines and the body of the paper, written in English, single-spaced and not exceeding the 8 pages limit. All papers received will be refereed by the Programme Committee. The Committee will communicate their decision to the authors on July 10. Accepted papers will be published in the proceedings to be distributed to workshop participants. In addition to the paper, one sheet should be attached including the following information: . the title of the paper, . the name(s) of author(s), . a list of five keywords, . a reference to which of the six topics the paper concerns, and . postal address of one of the authors, with phone and fax numbers, and E-mail (if available). . presentation language We intend to get in touch with various international publishers (such as Springer-Verlag and Prentice-Hall) for the final version of the proceedings. PROGRAM AND ORGANIZATION COMMITTEE Organization Chairman: Alberto Prieto (Unv. Granada. Spain) Programme Chairman: Jos Mira (UNED. Madrid. Spain) Senen Barro Unv. de Santiago (E) Francois Blayo Ecole Polytechnique Federale de Lausanne (S) Joan Cabestany Unv. Pltca. de Catalunya (E) Marie Cottrell Unv. Paris I (F) Jose Antonio Corrales Unv. Oviedo. (E) Gerard Dreyfus ESPCI Paris (F) Gregorio Fernandez Unv. Pltca. de Madrid (E) J. Simoes da Fonseca Unv. de Lisboa (P) Karl Goser Unv. Dortmund (G) Jeanny Herault INPG Grenoble (F) Jose Luis Huertas CNM- Universidad de Sevilla (E) Simon Jones Unv. Nottingham (UK) Chistian Jutten INPG Grenoble (F) Antonio Lloris Unv. Granada (E) Panos A. Ligomenides Unv. of Maryland (USA) Javier Lopez Aligue Unv. de Extremadura. (E) Federico Moran Unv. Complutense. Madrid (E) Roberto Moreno Unv. Las Palmas Gran Canaria (E) Franz Pichler Johannes Kepler Univ. (Aus) Ulrich Rueckert Unv. Dortmund (G) Francisco Sandoval Unv. de Malaga (E) Carmen Torras Instituto de Ciberntica. CSIC. Barcelona (E) V. Tryba Unv. Dortmund (G) Elena Valderrama CNM- Unv. Autonoma de Barcelona (E) Michel Weinfeld Ecole Polytechnique Paris (F) LOCAL ORGANIZING COMMITTEE (Universidad de Granada) Juan Julian Merelo Julio Ortega Francisco J. Pelayo Begona del Pino Alberto Prieto ORGANIZING ENTITIES: Spanish Chapter of the Computer Society of IEEE, AEIA (IEEE Computer Society Affiliate), and Department of Electronic and Computer Technology. University of Granada. Spain. SPONSORING ENTITIES: Ayuntamiento de Granada (Dto. de Congresos) Caja General Universidad de Granada SOME USEFUL INFORMATION Granada is a beautiful city that lies to the south of Spain, in which the mixture between Christian and Muslim culture reaches its architectural peak. The Alhambra is the most magnificent European Muslim fortress and palace conserved to-date, and Granada nights are known in all Spain for their liveliness, due to the high proportion of students. The river Genil gives rise to the Vega or Valley of Granada, where the soil is fertile and bears the most varied crops. 
It has small farms and beautiful villages, some as interesting as Santa Fe, where the voyage for the discovery of America was negotiated. From Granada it takes only one hour to get to the southernmost ski resort in Europe, Sierra Nevada, where Winter sports can be enjoyed. A wide road leads right up to the Veleta Peak, so that in Summer it can be reached by car. This road, at 3,428 m. above sea level, is the highest in Europe. 65 Km. from the city of Granada is Granada's Costa del Sol (so called Costa Tropical or Tropical Coast). The University of Granada is the third most important in Spain. It has 40,000 students, which makes up one sixth of the whole population. This is what gives the city a youthful and dynamic atmosphere, stimulating a "living culture". The weather during mid-September in Granada is warm, and temperatures of 30 degrees Centigrade are not unusual. Temperatures can lower during the night, so a pullover is advised. During the day, t-shirts or light shirts and trousers are the most suitable clothes. PRE AND POST WORKSHOP TOURS: A-EXCURSION: September 16: Trip to Alpujarra, typical mountain villages. Time: 9.00-20h. Price: 3500 ptas./per person (Includes Bus and lunch). B-EXCURSION: September 20: Trip to Costa del Sol, including Nerja with its wonderful caves and the seaside resorts of Almunecar and Salobre$a. Time: 9.00-20h. Price: 2000 ptas. (Includes Bus) SOCIAL ACTIVITIES: September 16: Pre Workshop tour (A-Excursion) September 17: 20:00 Reception at the Hospital Real (16th Century University Central Services Building). 22:00 Night visit to the Alhambra. September 18: 20:00 Reception at the "Palacio de los Cordova" (Albaic!n), given by the Granada City Hall (Congress Dept.). September 19: 21:00 Official dinner September 20: Post Workshop tour (B-Excursion) PROVISIONAL SCHEDULE September 17: 9:15 Opening session. 10:00-11:30 Lecture 1:Natural and Artificial Neural Nets; Prof. Dr. Roberto MORENO (Universidad de las Palmas de Gran Canaria) 11:30-12:00 Coffee-break. 12:00-13:30 Session 1. 16:00-17:30 Session 2. 17:30-18:00 Coffee-break. 18:00-19:30 Session 3 September 18: 09:30-11:00 Lecture 2: Application and Implementation of Neural Networks in Microelectronics; Prof. Dr. Ing. Karl GOSER (Universitt Dortmund) 11:00-11:30 Coffee-break. 11:30-13:30 Session 4. 16:00-17:30 Session 5. 17:30-18:00 Coffee-break. 18:00-19:30 Session 6. September 19: 09:00-11:00 Lecture 3: Cooperative Computing and Neural Networks; Prof. Panos A. LIGOMENIDES (University of Maryland) 11:00-11:30 Coffee-break. 11:30-13:30 Session 7. 16:00-17:30 Session 8. 17:30-18:00 Coffee-break. 18:00-19:30 Session 9. This form should be sent before July 25 to: Viajes Internacional Expreso (V.I.E.); Galerias Preciados; Carrera del Genil, s/n. 18005 GRANADA (Spain) Tnos. (34) 58-22.44.95, (34) 58-22.75.86, (34) 58-224944; Telex: 78525 The following hotels are available with special fees for the Workshop participants. The prices are per night and they include V.A.T. and continental breakfast: Hotel Cat. Single room Double room ______________________________________________________________ Condor *** 7700 10070 pts. Eurobecquer ** 4630 5820 Tour A ........... 3.500 pts. Tour B ........... 2.000 pts Please tick the appropriate box. Reservations can be guaranteed before July 25th. A list of other hotels is enclosed (Please address directly to them). Payment should be made in Spanish currency. I enclose a bank cheque payable to: V.I.E. 
INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS (IWANN'91) Granada, Spain, September 17-19, 1991 HOTEL BOOKING FORM SURNAME ______________________________________ FIRST NAME _______________________________ ORGANIZATION _______________________________________________________________ ADDRESS ____________________________________________________________________ CITY _____________________ POST CODE __________ COUNTRY______________________ TELEPHONE __________________ FAX _________________________ E-MAIL:_______________________ Accompanying person(s) ________________________________________________________ I want to reserve: _______ double room(s); ___________ single room(s) Arrival date:__________ Time: __________ Departure date:_________ Time:_________ INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS (IWANN'91) Granada, Spain, September 17-19, 1991 REGISTRATION FORM SURNAME ______________________________________ FIRST NAME _______________________________ ORGANIZATION ________________________________________________________________ ADDRESS ____________________________________________________________________ CITY _____________________ POST CODE __________ COUNTRY ______________________ TELEPHONE __________________ FAX _________________________ E-MAIL: _______________________ Fill in the appropriate box: Fee Before June 25th After June 25th ___________________________________________________________________ Regular 33.000 35.000 IEEE,AEIA,ATI members 28.000 30.000 Scholarship 4.000 5.000 This form should be sent as soon as possible to: Departamento de Electronica y Tecnologia de Computadores Facultad de Ciencias Universidad de Granada 18071 GRANADA (SPAIN) In order to avoid delays, please fax the registration form, together with a copy of the cheque or the bank transfer to: FAX: 34-58-24.32.30 or 34-58-27.42.58 INSCRIPTION PAYMENTS: Cheque payable to: IWANN'91 (16.142.512) or alternatively transfer to: IWANN'91 IWANN'91 account number: 16.142.512 account number: 007.01-450888 Caja Postal (Code: 2088-2037.1) or to Caja General Camino de Ronda, 138 Camino de Ronda, 156 18003 GRANADA (SPAIN) 18003 GRANADA (SPAIN) ************************************************************************ From ashley at spectrum.cs.unsw.oz.au Tue Jun 11 00:11:13 1991 From: ashley at spectrum.cs.unsw.oz.au (Ashley Aitken) Date: Tue, 11 Jun 91 0:11:13 AES Subject: distributed representations In-Reply-To: <9106072351.AA01618@macuni.mqcc.mq.oz.au>; from "Max Coltheart" at Jun 8, 91 9:51 am Message-ID: <9106101413.10651@munnari.oz.au> G'day, In the discussion of "Distributed Representations", Max Coltheart writes: > > But for nets that are meant to be > models of cognition, the hidden assumption seems to be that after brain damage > there is graceful degradation of cognitive processing, so the fact that nets > show graceful degradation too means they have promise for modelling cognition. > > But where's the evidence that brain damage degrades cognition gracefully? That > is, the person just gets a little bit worse at a lot of things? Very commonly, > exactly the opposite happens - the person remains normal at almost all kinds > of cognitive processing, but some specific cognitive task suffers catastroph- > ically. No graceful degradation here. I would suggest that Max is possibly confusing diffuse brain damage with catastrophic brain damage. Diffuse brain damage is the elimination of a small percentage of neurons diffusely from throughout the brain. 
Examples are the natural death of neurons throughout the brain and, perhaps, micro-lesions. The continual death of an immense number of neurons in the brain thankfully only really amounts to the death of a very small percentage of the neurons in the brain. In any of the partitioned networks of the brain (say an area of the cortex) we would expect only a small number of neurons to die. If one considers that a neuron may receive on the order of thousands of synapses on its dendritic tree, it can be understood, I believe, how the network (thought of as a connectionist network) could continue to function if one or two of these were to be eliminated. I would suggest that this continual death of neurons in the brain, with the subtle and often unnoticed degradation in cognitive performance, is an example of (diffuse) brain damage degrading cognition gracefully. Hence, I believe this type of degradation does show neural networks have promise for modelling cognition. Of course, this does depend on the degradation seen in cognition being shown to be qualitatively the same as the degradation seen in artificial neural networks. Catastrophic brain damage, on the other hand, is the gross elimination of neurons (usually relatively localized) from the brain. Examples are lesions resulting from head injuries or strokes, and ablation. It would seem that in this case one is most likely seeing the complete (or nearly complete) elimination of an entire network (or a critical part of it) and hence the elimination of its associated and dependent function(s). I don't believe anyone would suggest that the brain's function would degrade gracefully under such terrorist action. Max continues:
> I could give very many examples: I'll just give one (Semenza & Zettin,
> Cognitive Neuropsychology, 1988 5 711). This patient, after his stroke, had
> impaired language, but this impairment was confined to language production
> (comprehension was fine) and to the production of just one type of word: proper
> nouns. He could understand proper nouns normally, but could produce almost none
> whilst his production of other kinds of nouns was normal. What's graceful about
> this degradation of cognition?
I am definitely no expert neuroscientist, but I would suggest that this is an example of catastrophic brain damage, not diffuse brain damage. Hence, I would not expect graceful degradation of cognitive performance. It seems to me that this would be too much to ask of all but the most completely holographic-like systems. The interesting point to be made from this example would then be that it appears to be evidence for a cortical region involved (directly or in-line) with the speech of only nouns. Amazing! It would also be interesting to test if there is any subtle difference in our *understanding* of a noun depending upon whether we are receiving it (i.e. hearing or seeing it) or producing it (i.e. speaking or imagining it). If this diagnosis of catastrophic brain damage is correct, then I believe this example is moot on whether or not the brain is functionally a Connectionist System. Still, the Connectionist System, in my opinion, gets the points for the diffuse brain damage. Hence Max's concluding suggestion,
> If cognition does *not* degrade gracefully, and neural nets do, what does this
> say about neural nets as models of cognition?
becomes rather misplaced, because cognition does appear to degrade gracefully under diffuse brain damage and catastrophically under catastrophic brain damage.
The former providing possible evidence for neural networks as models of cognition. Ashley ashley at spectrum.cs.unsw.oz.au From kbj at jupiter.risc.rockwell.com Mon Jun 10 13:57:41 1991 From: kbj at jupiter.risc.rockwell.com (Ken Johnson) Date: Mon, 10 Jun 91 10:57:41 PDT Subject: No subject Message-ID: <9106101757.AA10673@jupiter.risc.rockwell.com> In response to the debate on Distributed vs. Local Representations..... Everyone in this field has a view point colored by their academic background. So here is mine. The fundamental issues associated with information representations was in many ways dealt with by Shannon. If we consider neural activity value spread across a vector of neurons a resource then one can conjur up images of 'neural representation bandwidth'.. The usage of this bandwidth is determined by noise power, integration time, and a bunch of other signal/system properties. In general, given a finite amount of noise, and a given number of neurons, a distributed representation is more 'efficient' than a local representation. In this case efficiency would be the ability to pack more codes into a given coding scheme or 'code space'. An equally important issue is that of neural code processing. Representation of the information is more or less useless without a processing system capable of transforming neural codes from one form to another form in a hierarchical system such as the brain. In this case we have Ashby's Law of Requisite Variety. I can't find my copy of the reference, but its by John Porter circa 1983-1987. In this work he goes into a discussion and analysis wherein he shows that a neural system's capacity for control and information processing cannot exceed its capabilities as a communication channel. Hence, he throws that ultimate capabilities of a neural processor back to Shannon's description. In addition to these philosophical and theoretical reasons for my preference of distributed codes I've got reams of data from Neocognitron simulations which clearly show that proper operation of the machine REQUIRES distributed codes that use the representation space wisely. References to this work can be found in the Proceedings of the IJCNN 1988 in SanDiego, 1988 in Boston, and 1990 Washington. What we found was an important dichotomy. Neural codes for similar features had to be close together in codes space to be grouped into new features by higher level processes. Without this characteristic pattern classification would not group very similar patterns together. On the other hand, differences between patterns had to be far apart in representation space in order to be discriminated accurately. Hence, we see proper code organization required similar codes be close while different codes needed to be far apart. One should expect this property if the goal of the system is representationaly richness rat The above arguments lead me to believe that neural coding is one of the fundamental issues that needs to be invesgtigated more throughly. Correct utilization of neural representation bandwidth is something we don't use very well. In fact, I'll state that we don't use it at all. The notion of bandwidth immediately suggests time as a representational dimension we don't use. Feedforward systems don't use time to increase the bandwidth of a representation - they are static. Almost all feedback and recurrent systems we see are allowed to 'reach equilibrium' before the 'neural code' is interpreted. Thus, the code is again static. 
Why not use state trajectories, and temporal modulation as means of enhancing neural representation bandwidth and increasing the processing capabilities of neural systems? Ken Johnson kbj at risc.rockwell.com From ahmad at ICSI.Berkeley.EDU Mon Jun 10 18:58:12 1991 From: ahmad at ICSI.Berkeley.EDU (Subutai Ahmad) Date: Mon, 10 Jun 91 15:58:12 PDT Subject: Preprint Message-ID: <9106102258.AA16050@icsib18.Berkeley.EDU> The following paper (to appear in this Cognitive Science proceedings) is available from the neuroprose archives as ahmad.cogsci91.ps.Z (ftp instructions below). Efficient Visual Search: A Connectionist Solution by Subutai Ahmad & Stephen Omohundro International Computer Science Institute Abstract Searching for objects in scenes is a natural task for people and has been extensively studied by psychologists. In this paper we examine this task from a connectionist perspective. Computational complexity arguments suggest that parallel feed-forward networks cannot perform this task efficiently. One difficulty is that, in order to distinguish the target from distractors, a combination of features must be associated with a single object. Often called the binding problem, this requirement presents a serious hurdle for connectionist models of visual processing when multiple objects are present. Psychophysical experiments suggest that people use covert visual attention to get around this problem. In this paper we describe a psychologically plausible system which uses a focus of attention mechanism to locate target objects. A strategy that combines top-down and bottom-up information is used to minimize search time. The behavior of the resulting system matches the reaction time behavior of people in several interesting tasks. A postscript version of the paper can be obtained by ftp from cheops.cis.ohio-state.edu. The file is ahmad.cogsci91.ps.Z in the pub/neuroprose directory. You can either use the Getps script or follow these steps: unix:2> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. Name (cheops.cis.ohio-state.edu:): anonymous 331 Guest login ok, send ident as password. Password: neuron 230 Guest login ok, access restrictions apply. ftp> cd pub/neuroprose ftp> binary ftp> get ahmad.cogsci91.ps.Z ftp> quit unix:4> uncompress ahmad.cogsci91.ps.Z unix:5> lpr ahmad.cogsci91.ps --Subutai ahmad at icsi.berkeley.edu From crr at shum.huji.ac.il Mon Jun 10 15:12:11 1991 From: crr at shum.huji.ac.il (crr@shum.huji.ac.il) Date: Mon, 10 Jun 91 22:12:11 +0300 Subject: distributed vs. local encoding schemes Message-ID: <9106101912.AA28249@shum.huji.ac.il> Terry Sejnowski mentioned the kinds of hidden units that we found in NETtalk. As for the input/output representations, we ran a number of experiments using both local (one unit per letter/phoneme, but more than one unit on per window) and distributed representations (more than one unit on per letter/phoneme). Learning times are generally faster with distributed representations simply because the net inputs and resulting error gradients are larger. (However it might be possible to boost the learning rate for the local representation to match the distributed one. I don't know if this would affect generalization or not since I didn't try it.) Using a representation that "makes sense" for the particular domain (such as using an articulatory feature code for the phonemes -- or is this local because the units represent features?) 
also leds to faster learning, and is more resistant to damage than a "random" encoding of the phonemes. Charlie Rosenberg From CADEPS at BBRNSF11.BITNET Tue Jun 11 08:56:05 1991 From: CADEPS at BBRNSF11.BITNET (JANSSEN Jacques) Date: Tue, 11 Jun 91 14:56:05 +0200 Subject: No subject Message-ID: <5901C8A706400066@BITNET.CC.CMU.EDU> STEERABLE GenNets - A Query. Abstract : One can evolve a GenNet (a neural net evolved with the genetic algorithm) to display two separate behaviors depending upon the setting of a clamped input control variable. By using an intermediate control value one obtains an intermediate behavior. For example, let the behaviors be sinusoidal oscillations of periods T1 and T2, where the control settings are 0.5 and -0.5 By using a control value of 0.3, one will get a sinusoid with a period between T1 and T2. Why? Has anyone out there had any similar experiences (i.e. of this sort of generalised behavioral learning), and has anybody any idea why GenNets are capable of such a phenomenon? If I receive some interesting replies, I'll prepare a summary and report back. Further details. One of the great advantages of GenNets (= using the GA to teach your neural nets their behaviors) over traditional NN paradigms such as backprop, Hopfield, etc is that the GA treats your NN as a black box, and it doesnt matter how complex the internal dynamics of the NN are. All that counts is the result. How well did the NN perform? If it did well, the bitstring which codes for the NN's weights will survive. This allows the creation of GenNets which can cope with both inputs and outputs which vary constantly. One does not need stationary output values a la Hopfield etc. Hence NNs become much more "dynamic", compared to the more "static" nature of traditional paradigms. One can thus evolve dynamics (behaviors) on NNs (GenNets). This opens up a new world of NN possibilities. If one can evolve a GenNet to express one behavior, why not two? If two, can one evolve a continuum of behaviors depending upon the setting of a controlled input value? The variable frequency generator GenNet mentioned above shows that this is possible. But I'm damned if I know why? Whats going on? Have any of you had similar experiences? Any clues for a theoretical explanation for this extraordinary phenomenon? P.S. To evolve this GenNet, use a fully connected net, with all external inputs set at zero, except for two inputs. Clamp one at 0.5, and the other at 0.5 (and then -0.5 in the second "experiment"). The fitness is the inverse of the sum of the two sums (for the two expts) of the squares of the difference between the desired output at each clock cycle and the actual output. Assign one neuron to be the output neuron. Cheers, Hugo de Garis, University of Brussels, Belgium, George Mason University, VA, USA. From thomasp at gshalle1.informatik.tu-muenchen.de Tue Jun 11 11:50:25 1991 From: thomasp at gshalle1.informatik.tu-muenchen.de (Thomas) Date: Tue, 11 Jun 1991 17:50:25 +0200 Subject: Research Position in SPAIN ? Message-ID: <9106111550.AA08800@gshalle1.informatik.tu-muenchen.de> I'm a graduate student in computer science at Munich Technical University and plan to work in a research position related to neural networks in SPAIN. I would extremely appreciate if you could provide me some information on university/private/company research institutes active or interested in the field of neural network research and located in the Madrid or Seville area. 
Preferably, I would like to start working in Spain in November 91 or, alternatively, in January/February 1992. Sincerely, Patrick Thomas Institute for Medical Psychology Goethestr. 31 8000 Munich 2
From moeller at kiti.informatik.uni-bonn.de Thu Jun 13 03:50:34 1991 From: moeller at kiti.informatik.uni-bonn.de (Knut Moeller) Date: Thu, 13 Jun 91 09:50:34 +0200 Subject: TR available from neuroprose; learning algorithms Message-ID: <9106130750.AA01054@kiti.>
The following report is now available from the neuroprose archive: LEARNING BY ERROR-DRIVEN DECOMPOSITION D.Fox V.Heinze K.Moeller S.Thrun G.Veenker (6pp.) Abstract: In this paper we describe a new selforganizing decomposition technique for learning high-dimensional mappings. Problem decomposition is performed in an error-driven manner, such that the resulting subtasks (patches) are equally well approximated. Our method combines an unsupervised learning scheme (Feature Maps [Koh84]) with a nonlinear approximator (Backpropagation [RHW86]). The resulting learning system is more stable and effective in changing environments than plain backpropagation and much more powerful than extended feature maps as proposed by [RMW89]. Extensions of our method give rise to active exploration strategies for autonomous agents facing unknown environments. The appropriateness of this technique is demonstrated with an example from mathematical function approximation.
----------------------------------------------------------------------------- To obtain copies of the postscript file, please use Jordan Pollack's service: Example: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): ftp> cd pub/neuroprose ftp> binary ftp> get (remote-file) fox.decomp.ps.Z (local-file) fox.decomp.ps.Z ftp> quit unix> uncompress fox.decomp.ps.Z unix> lpr -P(your_local_postscript_printer) fox.decomp.ps ---------------------------------------------------------------------------- If you have any difficulties with the above, please send e-mail to moeller at kiti.informatik.uni-bonn.de DO NOT "reply" to this message!!
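The patch idea in the abstract above can be pictured with a minimal sketch -- this is not the Fox et al. algorithm, just a generic error-driven decomposition in Python/NumPy, with per-patch linear least-squares fits standing in for small backprop approximators and nearest-prototype assignment standing in for the feature map:

    # Generic error-driven decomposition on a toy 1-D function:
    # assign inputs to the nearest prototype ("patch"), fit each patch
    # separately, then recruit a new prototype inside the worst patch.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 1))
    y = np.sin(3.0 * X[:, 0])                     # toy target function

    prototypes = [np.array([0.0])]                # start with a single patch
    for _ in range(6):                            # grow to 7 patches
        P = np.stack(prototypes)                  # (num_patches, 1)
        assign = np.argmin(np.abs(X - P.T), axis=1)   # nearest prototype per input
        errors = np.full(len(prototypes), -1.0)
        for j in range(len(prototypes)):
            idx = assign == j
            if idx.sum() < 2:
                continue
            A = np.hstack([X[idx], np.ones((idx.sum(), 1))])   # per-patch linear fit
            w, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
            errors[j] = np.mean((A @ w - y[idx]) ** 2)
        worst = int(np.argmax(errors))            # split the worst-approximated patch
        members = X[assign == worst]
        prototypes.append(members[rng.integers(len(members))].copy())

    print("patch centres:", sorted(float(p[0]) for p in prototypes))

The only point of the loop is that new prototypes are recruited where the current approximation is worst, so the subtasks end up roughly equally well approximated -- the "error-driven" part of the title.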
From thomasp at gshalle1.informatik.tu-muenchen.de Thu Jun 13 13:33:19 1991 From: thomasp at gshalle1.informatik.tu-muenchen.de (Thomas) Date: Thu, 13 Jun 1991 19:33:19 +0200 Subject: Gracias & Sorry Message-ID: <9106131733.AA19732@gshalle1.informatik.tu-muenchen.de>
Sorry for the "garbage" and muchas gracias to all those helping out with addresses and conference announcements.
Patrick From utans-joachim at CS.YALE.EDU Sat Jun 15 12:48:45 1991 From: utans-joachim at CS.YALE.EDU (Joachim Utans) Date: Sat, 15 Jun 91 12:48:45 EDT Subject: preprint available Message-ID: <9106151648.AA01689@SUNNY.SYSTEMSX.CS.YALE.EDU> The following preprint has been placed in the neuroprose archive at Ohio State University: Selecting Neural Network Architectures via the Prediction Risk: Application to Corporate Bond Rating Prediction Joachim Utans John Moody Department of Electrical Engineering Department of Computer Science Yale University Yale University New Haven, CT 06520 New Haven, CT 06520 Abstract: Intuitively, the notion of generalization is closely related to the ability of an estimator to perform well with new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select the optimal network architecture. The prediction risk needs to be estimated from the available data; here we approximate the prediction risk by v-fold cross-validation and asymtotic estimates of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of complete a priori information that could be used to impose a structure to the network architecture. To retrieve it by anonymous ftp: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): neuron ftp> cd pub/neuroprose ftp> binary ftp> get utans.bondrating.ps.Z ftp> quit unix> uncompress utans.bondrating.ps unix> lpr -P(your_local_postscript_printer) utans.bondrating.ps Joachim Utans From h1201kam at ella.hu Sun Jun 16 13:05:00 1991 From: h1201kam at ella.hu (Kampis Gyorgy) Date: Sun, 16 Jun 91 13:05:00 Subject: a new book; special issue on emergence; preprint availab Message-ID: <9106161115.AA13832@sztaki.hu> ANNOUNCEMENTS **************************************************************** 1. a new book 2. a Special Issue on emergence 3. preprint available **************************************************************** 1. the book George Kampis SELF-MODIFYING SYSTEMS IN BIOLOGY AND COGNITIVE SCIENCE: a New Framework for Dynamics, Information and Complexity Pergamon, Oxford-New York, March 1991, 546pp with 96 Figures About the book: The main theme of the book is the possibility of generating information by a recursive self-modification and self- redefinition in systems. The book offers technical discussions of a variety of systems (Turing machines, input-output systems, synergetic systems, connectionist networks, nonlinear dynamic systems, etc.) to contrast them with the systems capable of self-modification. What in the book are characterized as 'simple systems' involve a fixed definition of their internal modes of operations, with variables, parts, categories, etc. invariant. Such systems can be represented by single schemes, like computational models of the above kind. A relevant observation concerning model schemes is that any scheme grasps but one facet of material structure, and hence to every model there belongs a complexity excluded by it. In other words, to every simple system there belongs a complex one that is implicit. 
Self-modifying systems are 'complex' in the sense that they are characterized by the author as ones capable of accessing an implicate material complexity and turning it into the information-carrying variables of a process. An example of such a system would be a tape recorder which spontaneously accesses new modes of information processing (e.g. bits represented as knots on the tape). A thesis discussed in the book is that unlike current technical systems, many natural systems know how to do that trick, and make it their principle of functioning. The book develops the mathematics, philosophy and methodology for dealing with such systems, and explains how they work. A constructive theory of models is offered, with which the modeling of systems can be examined in terms of algorithmic information theory. This makes possible a novel treatment of various old issues like causation and determinism, symbolic and nonsymbolic systems, the origin of system complexity, and, finally, the notion of information. The book introduces technical concepts such as information sets, encoding languages, material implications, supports, and reading frames, to develop these topics, and a class of systems called 'component-systems', to give examples of self-modifying systems. As an application, the book discusses how the latter can be used to understand aspects of evolution and cognition. From tgelder at phil.indiana.edu Mon Jun 17 11:45:58 1991 From: tgelder at phil.indiana.edu (Timothy van Gelder) Date: Mon, 17 Jun 91 10:45:58 EST Subject: distribution and its advantages Message-ID: Javier Movellan's question -- what are distributed representations good for, anyway? -- is I think an important one for connectionism and cognitive science generally. Trouble is, the way it was put, it presupposes that there is some one kind of representation that everyone is referring to when they talk about distribution. In fact, though most people have a reasonable idea what they themselves intend when they use the term "distributed", they usually don't realize that it's not the way many other people use it. This is immediately apparent if one takes an overview of the responses that actually came in. Various people took it that a representation is distributed if it utilizes many units rather than just one, with the "strength" of distribution increasing as the total number of units (or perhaps, the proportion of available units) used increases. Massone by contrast thought the key concept is that of redundancy, which I take roughly to mean that a given piece of input information is represented multiple times. This presumably requires that many units are used (i.e., that there is distribution in the previous sense) but is a significantly stronger requirement. Massone's position was echoed in some other responses. Chalmers claims that a distributed representation is one in which every representation, whether of a basic concept or a more complex one, has a kind of semantically significant internal structure. This definition also seems to presuppose the first kind of definition, but is different from redundancy. Proposing a somewhat different definition again, French suggested that distribution is a matter of the degree of "overlap" between representations of different entities. And so on. This lack of agreement over what distribution actually is is at least partly responsible for the fact that no really clear and useful consensus on the advantages of distributed representation emerged in the responses to the initial question.
It manifests a wider lack of agreement over the concept of distribution in connectionism and cognitive science more generally. I once surveyed as many of the definitions and occurrences of "distribution", "distributed representation", etc., as I could find in the cognitive science literature, and found that there were at least 5 very different basic properties that people often refer to as distribution. These ranged from a very simple notion of "spread-out-ness" - each entity being represented by activity in many units rather than just one - at one extreme, to complete functional equipotentiality at the other. (A representation is functionally equipotential when any part of it can stand in for the whole thing. Holograms are famous for exhibiting a form of equipotentiality.) Authors often picked up multiple strands and ran them together in one characterization, or defined distribution differently on different occasions, sometimes even in the same work. Probably the two most common definitions are (1) the notion of simple extendedness just mentioned (i.e., using "many" units to represent a given item) and (2) superimposition of representations. We have superimposition when there are multiple items being represented at the same time, but no way of pointing to the discrete part of the representation which is responsible for item A, the discrete part which is responsible for item B, and so forth. Think of the weights in a standard feed-forward network. Here multiple input-output associations are represented at the same time, but there is (in general) no separate set of weights for each association. To see how these two senses simultaneously dominate connectionist discussions of distribution, think again of the answers to Movellan's question. Many of the answers took the form, roughly, that "when I used representations involving activity in many units rather than just one in such and such a network, I found better (or worse!) performance". Other responses, particularly those that made reference to the brain or neuropsychological results, were more concerned with the extent to which there is separate or discrete storage of the various components of our knowledge in a given circumscribed domain. (In these contexts, "graceful degradation" in performance is often thought to be a consequence of knowledge being stored in an inextricably superimposed fashion.) In one sense, it is not surprising that these are the two most common notions of distribution. Perhaps the only thing that is really clear about distribution is the opposition between distribution and localization: whatever distributed representations are, they are non-local. Trouble is, "local" turns out to be ambiguous. Sometimes "local" means restricted in extent (e.g., using only one unit rather than many), and sometimes it means not overlapping with the representation of anything else. The two most common senses of "distribution" mentioned a moment ago simply result from denying locality in these two distinct senses. It seems to me that a necessary condition for any significant progress on the question "what are distributed representations good for?" is that this general state of confusion over what "distributed" means be resolved. This means clearly laying out the different senses that are floating around, picking out the one that is the most central and most theoretically significant, and giving it a reasonably precise definition.
I attempted this in Ch.1 of my PhD dissertation (Distributed Representation, University of Pittsburgh 1989); a shorter overview of some of the material from that chapter has recently appeared as "What is the D in PDP? An overview of the concept of distribution" in Stich, Ramsey & Rumelhart (eds) Philosophy and Connectionist Theory. In my opinion, the most important concept in the vicinity of distribution is that of superimposition of representations, and it is for this that the term "distributed" should really be reserved. One advantage of this strategy is that superimposition admits of a surprisingly clear and satisfying mathematical definition: Suppose R is a representation of multiple items. If the representings of the different items are fully superimposed, every part of the representation R must be implicated in representing each item. If this is achieved in a non-trivial way there must be some encoding process that generates R given the various items to be stored, and which makes R vary, at every point, as a function of each item. This process will be implementing a certain kind of transformation from items to representations. This suggests thinking of distribution more generally in terms of mathematical transformations exhibiting a certain abstract structure of dependency of the output on the input. More precisely, define any transformation from a function F to another function G as strongly distributing just in case the value of G at any point varies with the value of F at every point; the Fourier transform is a classic example. Similarly, a transformation from F to G is weakly distributing, relative to a division of the domain of F into a number of sub-domains, just in case the value of G at every point varies as a function of the value of F at at least one point in each sub-domain. The classic example here is the linear associator, in which a series of vector pairs are stored in a weight matrix by first forming, and then adding together, their respective outer products. Each element of the matrix varies with every stored vector, but only with one element of each of those vectors. (The "functions" F and G in this case describe the input vectors and the association matrix respectively; e.g., given an argument specifying a place in an input vector, F returns the value of the vector at that place.) Clearly, a given distributing transformation yields a whole space of functions resulting from applying that transformation to different inputs (i.e., different functions F). If we think of these output functions as descriptions of representations, and the input functions as descriptions of items to be represented, the distributing transformation is defining a whole space or scheme of distributed representations. To be a distributed representation, then, is to be a member of such a scheme; it is to be a representation R of a series of items C such that the encoding process which generates R on the basis of C implements a given distributing transformation. Basically, then, distributed representations are what you get from distributing transformations, which are transformations which make each part of the output (the representation) depend on every part of the input (what you're representing). Now, mathematically speaking, there is a vast number of different kinds of distributing transformations, and so there is a vast number of possible instantiations of distributed representation. 
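To make the linear associator example above concrete, here is a minimal numerical sketch of a weakly distributing transformation in Python (assuming NumPy; the dimensions, variable names, and random seed are arbitrary illustrative choices, not anything from the original posting):

import numpy as np

rng = np.random.default_rng(0)

# Three key/value vector pairs (the "items") to be stored in one weight matrix.
keys = [rng.standard_normal(8) for _ in range(3)]
values = [rng.standard_normal(5) for _ in range(3)]

# Superimposed storage: the weight matrix is the sum of the outer products.
# Every element of W depends on every stored pair, but only on one element of
# each key and one element of each value -- a "weakly distributing"
# transformation in the sense described above.
W = sum(np.outer(v, k) for k, v in zip(keys, values))

# Recall by matrix-vector product: exact if the keys are orthonormal,
# approximate for random keys, with the error being crosstalk from the other
# superimposed pairs.
print(np.round(W @ keys[0], 2))
print(np.round(values[0], 2))

# Damaging one weight perturbs every stored association a little instead of
# destroying any single one.
W_damaged = W.copy()
W_damaged[0, 0] = 0.0
print(round(float(np.linalg.norm(W @ keys[0] - W_damaged @ keys[0])), 3))

Zeroing a single weight element degrades every stored association slightly rather than erasing any one of them, which is the superimposition property at issue.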
Connectionists can be seen as exploring that portion of the space of possible transformations that you can handle with n-dimensional vector operations, learning algorithms, etc. In other domains such as optics it is possible to implement other forms of distributing transformations and hence to get distributed representations with different properties. There are a number of reasons for wanting to define distributed representation in terms of superimposition generally, and distributing transformations in particular: (a) superimposition is certainly one of the most common of the standard senses of "distribution" in current usage, and so we remain as close as possible to that usage; (b) superimposition admits of a precise mathematical definition, so those who think clarity only comes from formalization should be kept happy; (c) various popular properties of distributed representation such as automatic generalization and graceful degradation are a natural consequence of distribution defined this way; (d) in practice, in a connectionist context, distribution in the sense of requiring many units rather than just one is a necessary precondition of this more full-blooded notion; hence any advantages that accrue to representations in virtue of utilizing many units also accrue to superimposed representations; (e) a number of other interesting theoretical results follow from defining distribution this way: in particular, it can be shown that distributed representations cannot be symbolic in nature, on a reasonably precise definition of "symbolic" (see e.g. my "Why distributed representation is inherently non-symbolic", in G. Dorffner (ed.) Konnektionismus in Artificial Intelligence und Kognitionsforschung. Berlin: Springer-Verlag, 1990; 58-66). On the basis of this kind of definition of what distributed representation is, what kind of answer can be given to the "what are distributed representations good for?" question? Well, the kind of answer you will find satisfying will depend very much on what your theoretical interests are. A connectionist whose concerns have more of an applied, engineering focus will want to know what specific processing benefits arise from using representations generated by distributing transformations. As mentioned in (c) above, I think that some of the favorite virtues of distribution are best seen as an immediate consequence of superimposition. The technical issues here still need much clarification, however. As a cognitive scientist, on the other hand, I'm interested in more general questions such as - what are the advantages of distribution for human knowledge representation? Here I don't have any actual answers ready to hand; the most I can do at the moment is point to the kind of question that seems the most interesting. Speaking at the broadest possible level: various difficulties encountered in mainstream AI, combined with some philosophical reflections, suggest that everyday commonsense knowledge cannot be fully and effectively captured in any kind of purely symbolic format; that, in other words, symbolic representation is fundamentally the wrong medium for capturing at least certain kinds of human knowledge. Just above I mentioned that distributed representation (defined in terms of superimposition) can be shown to be intrinsically non-symbolic. The obvious suggestion then is: perhaps the most important advantage of distributed representation is that it (and it alone?) is capable of representing the kind of knowledge that underlies everyday human competence?
Tim van Gelder From tsejnowski at UCSD.EDU Mon Jun 17 13:14:00 1991 From: tsejnowski at UCSD.EDU (Terry Sejnowski) Date: Mon, 17 Jun 91 10:14:00 PDT Subject: Santa Fe Time Series Competition Message-ID: <9106171714.AA23031@sdbio2.UCSD.EDU> A Time Series Prediction and Analysis Competition The Santa Fe Institute August 1, 1991 - December 31, 1991 A wide range of new techniques is now being applied to the time series analysis problems of predicting the future behavior of a system and deducing properties of the system that produced the time series. Such problems arise in most observational disciplines, including physics, biology, and economics; new tools, such as the use of connectionist models for forecasting, or the extraction of parameters of nonlinear systems with time-delay embedding, promise to provide results that are unobtainable with more traditional time series techniques. Unfortunately, the realization and evaluation of this promise has been hampered by the difficulty of making rigorous comparisons between competing techniques, particularly ones that come from different disciplines. In order to facilitate such comparisons and to foster contact among the relevant disciplines, the Santa Fe Institute is organizing a time series analysis and prediction competition. A few carefully chosen experimental time series will be made available through a computer at the Santa Fe Institute, and quantitative analyses of these data will be collected in the areas of forecasting, characterization (evaluating dynamical measures of the system such as the number of degrees of freedom and the information production rate), and system identification (inferring a model of the system's governing equations). At the close of the competition the performance of the techniques submitted will be compared and published, and the server will continue to operate as an archive of data, programs, and comparisons among algorithms. There will be no monetary prizes. A workshop is planned for the Spring of 1992 to explore the results of the competition. The competition does not require advance registration; to enter, simply retrieve the data and submit your analysis. The detailed description of the competition categories and instructions for retrieving the data and entering the competition will be available after August 1 through four routes: ACCESSING THE DATA --------- --- ---- ftp: Ftp to sfi.santafe.edu (192.12.12.1) as user "tsguest" and use "tsguest" for the password. Get the file "instructions". dial-up: There are two dial-up lines: 505-988-1705 (2400 baud), and 505-986-0252 (any speed to 9600 baud). The settings for both lines are no parity, 8 bit words, 1 stop bit. At connect, press return; at the prompt type "login tsguest" and use "tsguest" for the password. At the next prompt type "telnet sfi" and login as user "tsguest" (password "tsguest"). Using either "kermit" or "xmodem", retrieve the file "instructions". When you are finished, logout from sfi and from the prompt. mail server: Send email to tserver at sfi.santafe.edu with the phrase "send time series instructions" in either the subject or the body of the message. The mailer will return a file with more detailed instructions for requesting the data and submitting analyses. pc disks: The data is available on disks in either IBM-PC or Mac formats.
To cover the cost of distributing the data, send $25 to Time Series Competition Disks, The Santa Fe Institute, 1120 Canyon Road, Santa Fe, NM 87501, and specify the machine type, disk size, and disk density required. Instructions will be included with the disks on submitting a return disk with the analysis of the data. FOR MORE INFORMATION --- ---- ----------- Further questions about the competition, or inquiries about contributing data to be used in the competition, should be directed to: Time Series Competition, Santa Fe Institute, 1660 Old Pecos Trail, Suite A, Santa Fe, NM 87501, (505) 984-8800, tserver at sfi.santafe.edu, or to one of the organizers: Neil Gershenfeld, Department of Physics, Harvard University, 15 Oxford Street, Cambridge, MA 02138, (617) 495-5641, neilg at sfi.santafe.edu; Andreas Weigend, Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, (415) 322-4066, andreas at sfi.santafe.edu. ADVISORY BOARD -------- ----- Prof. Leon Glass Department of Physiology McGill University Prof. Clive W. J. Granger Center for Econometric Analysis Department of Economics University of California, San Diego Prof. William H. Press Department of Physics and Center for Astrophysics Harvard University Prof. Maurice B. Priestley Department of Mathematics The University of Manchester Institute of Science and Technology Prof. Itamar Procaccia Department of Chemical Physics The Weizmann Institute of Science Prof. T. Subba Rao Department of Mathematics The University of Manchester Institute of Science and Technology Prof. Harry L. Swinney Department of Physics University of Texas at Austin From pazzani%pan.ICS.UCI.EDU at VM.TCS.Tulane.EDU Tue Jun 18 14:10:12 1991 From: pazzani%pan.ICS.UCI.EDU at VM.TCS.Tulane.EDU (Michael Pazzani) Date: Tue, 18 Jun 91 11:10:12 -0700 Subject: Special Issue of Machine Learning Journal Message-ID: <9106181110.aa28419@PARIS.ICS.UCI.EDU> MACHINE LEARNING will be publishing a special issue on Computer Models of Human Learning. The ideal paper would describe an aspect of human learning, present a computational model of the learning behavior, evaluate how the performance of the model compares to the performance of human learners, and describe any additional predictions made by the computational model. Since it is hoped that the papers will be of interest to both cognitive psychologists and computer scientists, papers should be clearly written and provide the background information necessary to appreciate the contribution of the computational model. Manuscripts must be received by April 1, 1992, to assure full consideration. One copy should be mailed to the editor: Michael Pazzani Department of Information and Computer Science University of California, Irvine, CA 92717 USA In addition, four copies should be mailed to: Karen Cullen MACH Editorial Office Kluwer Academic Publishers 101 Philip Drive Assinippi Park Norwell, MA 02061 USA Papers will be subject to the standard review process. Please pass this announcement along to interested colleagues.
From pollack at cis.ohio-state.edu Tue Jun 18 11:28:35 1991 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Tue, 18 Jun 91 11:28:35 -0400 Subject: Neuroprose Turbulence Expected Message-ID: <9106181528.AA01029@dendrite.cis.ohio-state.edu> Cheops, the pyramid machine upon which NEUROPROSE resides, will be decommissioned. The Neuroprose archive will move, with luck, to a new Sparcserver at the same IP address also called Cheops. But between today and July 1, all cis.ohio-state.edu systems (including email) will be pretty wobbly, so expect delays. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Phone: (614)292-4890 (then * to fax) From uh311ae at sunmanager.lrz-muenchen.de Tue Jun 18 18:31:23 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 19 Jun 91 00:31:23+0200 Subject: large SIMD nn machines, ASI Message-ID: <9106182231.AA12381@sunmanager.lrz-muenchen.de> Hello, I wonder whether there are any other beta testers of the ASI Cnaps machine out there who might want to share some experiences. Specifically, has anyone - implemented a non-local algorithm (CG, PCG), - implemented a good random number generator memory efficient enough to be put into node memory / what do you think about tables or host communication for an alternative implementation ? - thought about interfacing some hardware as preprocessor, piping data in via DMA ? - found a job for idle processors (small net sizes) - liked the 1-bit weight mode - ported the debugger to Irix - (other) ? Some of these questions should be familiar to other SIMD programmers too (I have the Witbrock GF11 paper). Thank you for any hints. Cheers, Henrik (Rick at vee.lrz-muenchen.de) H. Klagges, Laser Institute Prof Haensch, PhysDep U of Munich, FRG + IBM Research Division, Binnig group From uh311ae at sunmanager.lrz-muenchen.de Tue Jun 18 18:44:50 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 19 Jun 91 00:44:50+0200 Subject: Backpercolation Message-ID: <9106182244.AA12446@sunmanager.lrz-muenchen.de> Hello, I wonder whether the backpercolation algorithm (see back articles in comp.ai.neural-nets) is important or not. I got some very preliminary results on very simple problems (n-n-n linear channel with few (3-10) patterns) which don't look bad, but complicated ones don't seem to be particularly zooming yet (yes, there are some bugs in my code left).
If anyone would like a C++ backperc server object (guaranteed to be broken) to avoid reinventing the wheel and to get some basic data structures, let me know. The only problem: Mark Jurik (mgj at cup.portal.com) wants you to sign a nondisclosure thing first before I can send it out to you. Anyway, if someone else has some first results, I would really like to see them. Cheers, Henrik (Rick at vee.lrz-muenchen.de) From ITGT500 at INDYCMS.BITNET Tue Jun 18 16:32:57 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Tue, 18 Jun 91 15:32:57 EST Subject: Distributed vs. local representation Message-ID: <25A077F07E800064@BITNET.CC.CMU.EDU> In the following I would like to state my views on distributed and local representations. All comments are more than welcome. I think that if we define a strict local representation as: "one object (or item, entity, etc.) is represented by one node (or unit, neuron, etc.) only, and one node represents only one object", then all the other situations probably can be classified as distributed representation (either semi- or full-distributed). In other words, only the one-to-one representation belongs to local representation. The others, multiple-to-one, one-to-multiple, and multiple-to-multiple representations, all belong to distributed representation. Therefore, distributed representation has more senses than local representation. This may help reduce the confusion regarding these definitions. Because distributed representation covers a wider range than local representation, there are many different appearances of distributed representation. One point unnoticed up to now is the difference between the "binary representation" (the node takes binary values only) and the "analog representation" (the node takes analog values). In NETtalk and many other examples, the distributed representation used seems to be the binary one. However, the world seems to favor and take the analog form. Therefore, analog distributed representation probably is the one that is working and dominating our cognitive processes. I met one such problem in our work on a parabolic problem. We found that it would be very difficult, if not impossible, to use (strict) local or binary distributed representation to solve the parabolic problem. It was only the analog distributed representation that worked well. We concluded that from the practical application viewpoint, both local and distributed representations would work if the training and test patterns were discrete and finite. However, if the training and/or test patterns were continuous and infinite, only distributed representation worked. -Bo From aam9n at hagar2.acc.Virginia.EDU Wed Jun 19 04:39:39 1991 From: aam9n at hagar2.acc.Virginia.EDU (Ali Ahmad Minai) Date: Wed, 19 Jun 91 04:39:39 EDT Subject: Distributed Representations Message-ID: <9106190839.AA00322@hagar2.acc.Virginia.EDU> We connectionists never tire of talking about "distributed representations", and with good reason. However, I have never come across a rigorous definition of the concept. Now, I realize that this notion, like most powerful ones, will necessarily be diminished in any process of definition, however inclusive that might be. That has not fazed us in trying to define entropy, information, complexity, learnability --- and probability! My question is: has anyone rigorously, or even empirically, tried to come up with a definition for distributed representations --- especially a way to quantify distributed-ness?
I suppose high-order statistics represent a way to look at this, but have there been any attempts to develop a definition specifically in the context of connectionist networks? And would that be such a bad thing? Ali Minai Dept of EE University of Virginia aam9n at Virginia.EDU From maureen at ai.toronto.edu Wed Jun 19 11:38:49 1991 From: maureen at ai.toronto.edu (Maureen Smith) Date: Wed, 19 Jun 1991 11:38:49 -0400 Subject: Announce new CRG Technical Report Message-ID: <91Jun19.113852edt.780@neuron.ai.toronto.edu> The following technical report is available for ftp from the neuroprose archive. A hardcopy may also be requested. (See below for details.) Though written for a statistics audience, this report should be of interest to connectionists and others interested in machine learning, as it reports a Bayesian solution for one type of "unsupervised concept learning". The technique employed is also related to that used in Boltzmann Machines. Bayesian Mixture Modeling by Monte Carlo Simulation Radford M. Neal Technical Report CRG-TR-91-2 Department of Computer Science University of Toronto It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation. This method exhibits the true Bayesian predictive distribution, implicitly integrating over the entire underlying parameter space. An infinite number of mixture components can be accommodated without difficulty, using a prior distribution for mixing proportions that selects a reasonable subset of components to explain any finite training set. The need to decide on a ``correct'' number of components is thereby avoided. The feasibility of the method is shown empirically for a simple classification task. To obtain a compressed PostScript version of this report from neuroprose, ftp to "cheops.cis.ohio-state.edu" (128.146.8.62), log in as "anonymous" with password "neuron", set the transfer mode to "binary", change to the directory "pub/neuroprose", and get the file "neal.bayes.ps.Z". Then use the command "uncompress neal.bayes.ps.Z" to convert the file to PostScript. To obtain a hardcopy version of the paper by physical mail, send mail to: Maureen Smith Department of Computer Science University of Toronto 6 King's College Road Toronto, Ontario M5A 1A4 From schraudo at cs.UCSD.EDU Wed Jun 19 21:39:56 1991 From: schraudo at cs.UCSD.EDU (Nici Schraudolph) Date: Wed, 19 Jun 91 18:39:56 PDT Subject: hertz.refs.bib patch Message-ID: <9106200139.AA29142@beowulf.ucsd.edu> In adding the "HKP:" prefix to the citation keys in the BibTeX version of the Hertz/Krogh/Palmer bibliography I forgot to modify the internal cross-citations accordingly. I've appended the necessary patch below; it only involves three lines, but those who don't feel up to the task can ftp the patched file (still called hertz.refs.bib.Z) from neuroprose. My apologies for the inconvenience, - Nici Schraudolph. Here's the patch: *** hertz.refs.bib Wed Jun 19 18:23:36 1991 *************** *** 73,80 **** @string{snowbird = "Neural Networks for Computing"} % -------------------------------- Books --------------------------------- ! @string{inAR = "Reprinted in \cite{Anderson88}"} ! @string{partinAR = "Partially reprinted in \cite{Anderson88}"} @string{pdp = "Parallel Distributed Processing"} % ------------------------------- Journals --------------------------------- --- 73,80 ---- @string{snowbird = "Neural Networks for Computing"} % -------------------------------- Books --------------------------------- !
@string{inAR = "Reprinted in \cite{HKP:Anderson88}"} ! @string{partinAR = "Partially reprinted in \cite{HKP:Anderson88}"} @string{pdp = "Parallel Distributed Processing"} % ------------------------------- Journals --------------------------------- *************** *** 3500,3506 **** pages = "75--112", journal = cogsci, volume = 9, ! note = "Reprinted in \cite[chapter 5]{Rumelhart86a}", year = 1985 } --- 3500,3506 ---- pages = "75--112", journal = cogsci, volume = 9, ! note = "Reprinted in \cite[chapter 5]{HKP:Rumelhart86a}", year = 1985 } From rich at gte.com Thu Jun 20 10:24:53 1991 From: rich at gte.com (Rich Sutton) Date: Thu, 20 Jun 91 10:24:53 -0400 Subject: Job Announcement - GTE Message-ID: <9106201424.AA29945@bunny> The connectionist machine learning project at GTE Laboratories is looking for a researcher in computational models of learning and adaptive control. Applications from highly-qualified candidates are solicited. A demonstrated ability to perform and publish world-class research is required. The ideal candidate would also be interested in pursuing applications of their research within GTE businesses. GTE is a large company with major businesses in local telphone operations, mobile communications, lighting, precision materials, and government systems. GTE Labs has had one of the largest machine learning research groups in industry for about seven years. A doctorate in Computer Science, Computer Engineering or Mathematics is required. A demonstrated ability to communicate effectively in writing and in technical and business presentations is also required. Please send resumes and correspondence to: June Pierce GTE Labs MS-44 40 Sylvan Road Waltham, MA 02254 USA From ga1043 at sdcc6.UCSD.EDU Thu Jun 20 12:48:06 1991 From: ga1043 at sdcc6.UCSD.EDU (ga1043) Date: Thu, 20 Jun 91 09:48:06 PDT Subject: Super-Turing discussion Message-ID: <9106201648.AA15438@sdcc6.UCSD.EDU> A couple of months ago, there was a discussion on the network about neural nets, their capabilities, super-Turing machines, etc. About five or six references were mentioned. Does anyone have a list of those refereces, or a copy of that discussion? If you could forward the information to me at ga1043 at sdcc6.ucsd.edu, I would appreciate it. Valerie Hardcastle From rstark at aipna.edinburgh.ac.uk Thu Jun 20 12:29:54 1991 From: rstark at aipna.edinburgh.ac.uk (rstark@aipna.edinburgh.ac.uk) Date: Thu, 20 Jun 91 12:29:54 BST Subject: Distributed vs. Localist Representations Message-ID: <4210.9106201129@fal.aipna.ed.ac.uk> One aspect of this issue which seems implicit in much of this discussion is the notion that distributed representation can be considered a *relative* property. Thus the "room schema" network is "distributed" relative to rooms, but "localist" relative to ovens. Likewise, the Jets and Sharks model, which is considered to be strictly localist in the sense that each unit explictly represents a single concept (eg. "is-in-thirties"), does produce representations that are distributed relative to individual gang members. Andy Clark notes this in Microcognition. Does this seem correct? Is anyone uncomfortable with calling the Jets and Sharks a "distributed" model since each individual is represented by a pattern over the units (one unit active in each competition network), even though each unit can be clearly labelled in a localist fashion? 
Note that this notion of relativity in distributed representation is (I believe) distinct from its continuous aspects (seen in references to "partially-" or "semi-" distributed representations), which may be quantifiable using e.g. Tim Van Gelder's proposal of degree of superimposition. -Randall Stark --------------------------------------------------------------------------- Randall Stark TEL: (+44)-31-650-2725 | Dept of Artificial Intelligence JANET: rstark at uk.ac.ed.aipna | 80, South Bridge ARPA: rstark%uk.ac.ed.aipna at nsfnet-relay | University of Edinburgh UUCP: ...!uunet!mcsun!ukc!aipna!rstark | Edinburgh, EH1 1HN, UK --------------------------------------------------------------------------- From haffner at lannion.cnet.fr Fri Jun 21 11:36:19 1991 From: haffner at lannion.cnet.fr (Haffner Patrick) Date: 21 Jun 91 17:36:19+0200 Subject: POST-DOCTORAL VACANCY : Connectionism and Oral Dialogue Message-ID: <9106211536.AA02620@lsun26> Applications are invited for research assistantship(s) for post-doctoral or sabbatical candidates. Funding at the French National Telecommunications Research Centre (Centre National d'Etudes des Telecommunications, CNET) will commence in September '91 for a two-year period; the work location will be Lannion, Brittany, France. Experience is required in Natural Language Processing, especially Oral Dialogue Processing, by Connectionist methods. Applicants should specify the period between Sept '91 and Sept '93 which interests them. Applications, including CV/Resume, should be sent to: Mme Christel Sorin CNET LAA/TSS/RCP BP 40 22301 LANNION CEDEX FRANCE TEL: +33 96-05-31-40 FAX: +33 96-05-35-30 E-MAIL: sorin at lannion.cnet.fr From ITGT500 at INDYCMS.BITNET Thu Jun 20 11:55:32 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Thu, 20 Jun 91 10:55:32 EST Subject: Distributed Representations In-Reply-To: Your message of Wed, 19 Jun 91 04:39:39 EDT Message-ID: ----------------------------Original message---------------------------- Two days ago I mentioned (strict) local representation, binary distributed representation, and analog distributed representation. As an attempt to answer Ali Minai's question, I will try to give my understanding of representations as follows: (1). In my opinion, the key points underlying the definitions of representations are the correspondences between the objects (or items, entities, etc.) to be represented and the units (or nodes, neurons, etc.) of the network. The objects can be classified according to the properties they have. More than one object can possess the same property; in this case, these objects should be classified into the same group with this property. The units can represent different properties of the objects, or different objects within the same property group. As mentioned in my mail two days ago, there are four kinds of correspondences for the relationships between objects and units: one-to-one, multiple-to-one, one-to-multiple, and multiple-to-multiple. If we define (strict) local representation as the one that represents the one-to-one correspondence only, then all the other three correspondences can be called distributed representations. However, since there are three different correspondences in distributed representation, "distributed representation" will probably be too broad or too general a concept if we try to use one definition to refer to all three correspondences.
It is perhaps this overly general term that brought about the confusion on the advantages and disadvantages of local representation vs. distributed representation. (2). In an attempt to clarify these confusions, I think it is necessary to give more specific definitions to all four of these correspondences. The following is my attempt to define these representations: Local Representation ---- The one-to-one correspondence in which each object is represented by one unit, and each unit represents only one object. Units in local representation always take binary values. Binary Distributed Representation ---- The one-to-multiple correspondence in which each object is represented by multiple units and each unit is employed to represent only one object. The unit takes only binary values here; because it represents only one object, there is no need for it to take analog values. Analog Distributed Representation ---- The multiple-to-one correspondence in which multiple objects with the same property are represented by one unit and each unit represents multiple objects with the same property only. Here the unit takes different analog values for different objects within this property group. Different analog values are used to differentiate these different objects within the same property group. Mixed Distributed Representation ---- The multiple-to-multiple correspondence in which multiple objects of multiple properties are represented by one unit and each unit represents multiple objects with multiple properties. Here, the units take either binary or analog values depending on the properties and the object they represent. I am not sure whether the above definitions clarify these concepts and reduce the confusion on these problems. Your comments on the above statements are welcome. Bo Xu Dept. of Physiology and Biophysics School of Medicine Indiana University ITGT500 at INDYCMS.BITNET From hwang at pierce.ee.washington.edu Fri Jun 21 14:54:47 1991 From: hwang at pierce.ee.washington.edu ( J. N. Hwang) Date: Fri, 21 Jun 91 11:54:47 PDT Subject: IJCNN'91 Presidents' Forum (new announcement from Prof. Marks) Message-ID: <9106211854.AA13350@pierce.ee.washington.edu.> News release IEEE NEURAL NETWORKS COUNCIL IS SPONSORING A PRESIDENTS' FORUM AT IJCNN `91 IN SEATTLE, WASHINGTON Robert J. Marks II, Professor at the University of Washington and President of the IEEE Neural Networks Council (NNC), has announced that for the first time the IEEE/NNC will be sponsoring a Presidents' Forum during IJCNN `91 in Seattle, Washington, July 8-12, 1991. The participants of the Presidents' Forum will be the Presidents of the major artificial neural network societies of the world, including the China Neural Networks Committee, the Joint European Neural Network Initiative, the Japanese Neural Networks Society and the Russian Neural Networks Society. The Forum will be open to conference attendees and the press on Wednesday evening, 6:30-8:30 pm, July 10, 1991, at the Washington State Convention Center in Seattle. Each President will give a short (15-20 minute) presentation of the activities of their society, followed by a short question/answer period. Robert J. Marks II will be this year's moderator.
From aam9n at honi4.acc.virginia.edu Thu Jun 20 17:39:38 1991 From: aam9n at honi4.acc.virginia.edu (aam9n) Date: Thu, 20 Jun 91 17:39:38 EDT Subject: Distributed Representations Message-ID: <9106202139.AA00551@honi4.acc.Virginia.EDU> Bo Xu presents a very interesting classification of representations in terms of their distribution over representational units. The definitions of each class are internally clear enough, but I have some comments about how "distributivity" is defined, and where it leads. Let's take the definitions that Bo Xu gives: >Local Representation ---- The one-to-one correspondence in which each object > is represented by one unit, and each unit represents only one object. > Units in local representation always take binary values. No quarrel about this one being a local representation. >Binary Distributed Representation ---- The one-to-multiple correspondence > in which each object is represented by multiple units and each unit > is employed to represent only one object. The unit takes only binary > values here because it represents only one object, there is no need > for it to take analog values. Suppose I have two objects --- an apple and a pear --- and six representational units r1.....r6. Then, if I read this definition correctly, a distributed representation might be 000111 <-> apple and 111000 <-> pear. Since the units are binary, they are presumably "on" if the object is present and "off" if it is not. No reference is made to "properties" defining the object, and so there is no semantic content in any unit beyond that of mere signification: each unit is, ideally, identical. The question is: why have three units signifying one object when they work as one? One reason might be to achieve redundancy, and consequent fault-tolerance, through a voting scheme (e.g. 101001 <-> pear). Is this a distributed representation, though? To decide that, I must have an *external* definition of what it means for a representation to be distributed. Tentatively, I say that "a representation is distributed over a group of units if no single unit's correct operation is critical to the representation". This certainly holds in the above example. It holds, indeed, in all error-correcting codes. In a binary distributed representation, then, I can define the "degree of distributivity" as the minimum Hamming distance of the code. This is quite consistent, if rather disappointingly mundane. >Analog Distributed Representation ---- The multiple-to-one correspondence > in which multiple objects with the same property are represented by > one unit and each unit represents multiple objects with the same > property only. Here the unit takes different analog values for > different objects within this property group. Different analog > values are used to differentiate these different objects within the > same property group. Here, under the obvious reading of this definition, I have two categories (units) called "fruits" and "vegetables". Each represents many objects with different values, but mutually exclusively. Thus, I might have apple <-> 0.1,0 and squash <-> 0,0.1, but no object will have the code 0.1,0.1. This is obviously equivalent to a binary representation with each unit replaced by, say, n binary units. The question is: does this code embody the principle of dispensability? Not necessarily. One wrong bit could change an apple into a lemon, or even lose all information about the category of the object.
Thus, in the general case, such a representation is "distributed" only in the physical sense of activating (or not activating) units in a group. Each unit is still functionally critical. >Mixed Distributed Representation ---- The multiple-to-multiple correspondence > in which multiple objects of multiple properties are represented by > one unit and each unit represents multiple objects with multiple > properties. Here, the units take either binary or analog values > depending on the properties and the object they represent. Now here we have what most people mean by "distributed representations". We have many properties, each represented by a unit, and many objects. Each object can be encoded in terms of its properties. If the set of properties does not have enough discrimination, multiple objects could have the same code. Even if the property set is sufficient for unique representation, it is possible that the malfunction of one unit may change one object to another. The question then is: is this dependency small or large? Does a small malfunction in a unit cause catastrophic change in the semantic content of the whole group of units? I can "distribute" my representation over all the atoms in the universe, but if that doesn't give me some protection from point failures, I have not truly "distributed" things at all --- merely multiplied the local representation. Now, of course, in the "real" world where things are uniformly or normally distributed and errors are uncorrelated, increasing the size of a representation over a set of independent units will almost always confer some degree of protection from catastrophic point failures. An important issue is how to *maximize* this. And to do that, we must be able to measure it. One way would be to minimize the average information each representational unit conveys about the represented objects, which is a simple maximum entropy formulation. This requirement must, of course, be balanced by an adequate representation imperative. Other formulations are certainly possible, and probably much better. In any case, many of the more interesting issues in distributed representation arise when the "object" being represented is only implicitly available, or when the representation is distributed over a hierarchy of units, not all of which are directly observable, and not all of which count in the final encoding. Comments? Ali Minai aam9n at Virginia.EDU
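The two quantitative handles suggested above - the minimum Hamming distance of a binary code and the average information a single unit carries about object identity - can be sketched in a few lines of Python. The toy codes below are illustrative inventions, not anything from the posting; note that for a deterministic code the mutual information between one unit and the object identity reduces to the entropy of that unit's marginal activity, which is why minimizing it must be balanced against an adequate-representation requirement, as noted above.

from itertools import combinations
from math import log2

# Toy binary codes for three objects over six units (invented for illustration).
codes = {
    "apple": (0, 0, 0, 1, 1, 1),
    "pear":  (1, 1, 1, 0, 0, 0),
    "lemon": (1, 0, 1, 0, 1, 0),
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Minimum pairwise Hamming distance: no single-unit fault can turn one code
# into another as long as this is at least 2.
min_dist = min(hamming(a, b) for a, b in combinations(codes.values(), 2))

def avg_bits_per_unit(codes):
    # Average information (bits) one unit carries about which object is
    # present, assuming equiprobable objects.  Because the code is
    # deterministic, this equals the entropy of the unit's marginal activity.
    names = list(codes)
    n_units = len(next(iter(codes.values())))
    total = 0.0
    for u in range(n_units):
        p_on = sum(codes[n][u] for n in names) / len(names)
        for p in (p_on, 1.0 - p_on):
            if p > 0.0:
                total -= p * log2(p)
    return total / n_units

print("minimum Hamming distance:", min_dist)
print("average bits per unit:", round(avg_bits_per_unit(codes), 3))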
From ITGT500 at INDYCMS.BITNET Sat Jun 22 11:38:17 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Sat, 22 Jun 91 10:38:17 EST Subject: Distributed Representations Message-ID: <29E19BB296800064@BITNET.CC.CMU.EDU> Ali Minai presented a good example of apple and pear. I am going to answer some questions he raised. Let's look at his statements first. >is not. No reference is made to "properties" defining the object, and so there >is no semantic content in any unit beyond that of mere signification: each This is a very good question. Generally speaking, there are many properties that exist at the same time for each object. Let's take the apple as an example. An apple can be classified according to its taste, color, size, shape, or whether it is a fruit or not (as Ali Minai chose), etc. Different people will choose different criteria to meet the purpose of their applications.
Thus, I might have >apple <-> 0.1,0 and squash <-> 0,0.1, but no object will have the code >0.1,0.1. This is obviously equivalent to a binary representation with >each unit replaced by, say, n binary units. The question is: does this >code embody the principle of dispensibility? Not necessarily. One wrong bit >could change an apple into a lemon, or even lose all information about the >category of the object. Thus, in the general case, such a representation >is "distributed" only in the physical sense of activating (or not activating) >units in a group. Each unit is still functionally critical. It is true if there is a bit of error, the apple will change to lemon etc. However, the key point here is that the neural net's fault-tolerance characteristic exists only after it is trained and has reached an accuracy criterion. If we are dealing with many objects and use 0.1 as a value to differentiate different objects, we will train the net to reach a criterion at least smaller than 0.1 (otherwise, the net will be of no use). Thus, for seen patterns, the error will not be so big that an apple will turn into a lemon. For unseen patterns, bigger errors probably will occur, and apples probably will turn to lemons or whatsoever. However, this time we may not attribute the problem to the representation used only. This is related to the generalizability of the net, and the learning algorithm, units responsive characteristics and even the topology of the net all probably are playing roles for the generalizability of the net. >Now here we have what most people mean by "distributed representations". We >nother. The question then is: is this dependency small or large? Does >small malfunction in a unit cause catastrophic change in the semantic >content of the whole group of units? I can "distribute" my representation When talking about the representations, the graceful degradation of brain is introduced as a criterion. However, since the neural net is still far away from a real brain model, some cautiousness should be taken when relating the neural net to brain. The first thing to be made clear is that which layer of neural net we are refering to. Most people refer to the interface layers (the input and output layers) of neural net when they talk about the local/distributed representations. However, they refer to all layers (both the interface layers and hidden layers) when they talk about the graceful degradation. However, what are the justices for the interface layers to possess graceful degradation? If we say that neural net resembles brain in some aspects, then the resemblance most likely lies in the hidden layers instead of the interface layers. The criterion of graceful degradation should be made on the hidden layers instead of the interface layers. In most of current nets, the hidden layers are using mixed distributed representation, and thus possess the graceful degradation characteristics. As to the interface layers (input/output layers), we can demand them to possess the graceful degradation characteristics too. However, in my opinion, this will lead to many additional problems and confusions. The mixed distributed representation is good for hidden layers, not for interface layers. I think for the interface layers, the analog distributed representation works best because: (1) Considerations at the interface layers should be practicality instead of graceful degradation. There is no justice and no need for the interface layers to possess the graceful degradation. (2). 
The analog distributed representation classifies the objects to be represented: objects with the same property are placed in the same group, and the differences between objects within a group are represented by different analog values of the unit representing that property group. (For example, suppose there are four apples and three pears. In an analog distributed representation, two units would be used: unit A for apples and unit P for pears. The four apples can be represented by letting unit A take four different analog values, and the three pears by letting unit P take three different analog values.) This is the most natural approach when we deal with many objects. Why should we sacrifice this natural approach (the analog distributed representation) for graceful degradation (which may not belong to the interface layers anyway; the hidden layers use a mixed distributed representation and already possess graceful degradation) when we are considering the interface layers?

We used the analog distributed representation in a parabolic problem (a task mapping the parabola curve, which we used to compare the performance of BPNN and PPNN) and found that it was the best and most natural representation for problems (such as the parabolic problem) that have continuous and effectively infinite training/test patterns (objects).

In sum, I think we should be more specific when we talk about the representations and the brain-like characteristics of neural nets: (1) For the interface layers (input/output layers), the analog distributed representation is the best choice, because the priority at the interface layers is practicality, and the analog distributed representation is the most natural and the easiest to use when dealing with many objects. (2) For the hidden layers, the mixed distributed representation is the best choice, because graceful degradation is the priority to be taken into account there. Fortunately, most current network architectures already ensure this for the hidden layers.

Bo Xu
ITGT500 at INDYCMS.BITNET

From aam9n at hagar3.acc.Virginia.EDU Sat Jun 22 21:49:33 1991
From: aam9n at hagar3.acc.Virginia.EDU (Ali Ahmad Minai)
Date: Sat, 22 Jun 91 21:49:33 EDT
Subject: Distributed Representations
Message-ID: <9106230149.AA00465@hagar3.acc.Virginia.EDU>

Bo Xu raises some questions about distributed representations in the context of feed-forward neural networks, particularly with regard to graceful degradation. I do not agree that to require graceful degradation is to imply "brain-like" networks. In my opinion, the very notion of distribution is fundamentally linked to the requirement that each representational unit be minimally loaded, and that each representation be as homogeneously distributed over all representational units as possible. That this produces graceful degradation is partly true (only to the first order, given the non-linearity of the system), but that is incidental.

Speaking of which layers to apply the definition to, I think that in a feed-forward associative network (analog or binary), the hidden neurons (or all the weights) are the representational units. The input neurons merely distribute the prior part of the association, and the output neurons merely produce the posterior part. The latter are thus a "recovery mechanism" designed to "decode" the distributed representation of the hidden units and recover the "original" item.
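Schematically, this division of labour might look like the following toy fragment (an illustrative sketch with random, untrained weights and invented sizes, not a model taken from any of the postings): the hidden activations are treated as the stored distributed code, and the output weights act as the decoder.

    # Toy picture of "hidden units as the representation": input units only
    # pass the prior in, hidden activations hold the code, output decodes it.
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 4, 8, 4
    W1 = rng.normal(size=(n_hid, n_in))    # input -> hidden (encoding weights)
    W2 = rng.normal(size=(n_out, n_hid))   # hidden -> output (decoding weights)

    def represent(u):
        """The input units merely distribute u; the hidden units hold the code."""
        return np.tanh(W1 @ u)

    def recover(h):
        """The output units decode the hidden code into the posterior pattern."""
        return np.tanh(W2 @ h)

    u = rng.normal(size=n_in)    # the "prior" part of an association
    h = represent(u)             # the distributed representation (hidden layer)
    v = recover(h)               # the recovered posterior (arbitrary here)
    print(h.round(2), v.round(2))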
Of course, in a heteroassociative system, the "recovered original" is not the same as the "stored original". I realize that this is stretching the definition of "representation", but it seems quite natural to me.

The issue of a "recovery mechanism" is quite fundamental to the question of representational distribution. Without a requirement for adequate recoverability, any finite medium could be "distributedly" loaded with a potentially infinite number of representations, without being able to reproduce any of them. To ensure adequate recoverability, however, representations must be "distinct", or mutually non-interacting, in some sense. Given the countervailing requirement of distributedness, the obvious route of separation by localization is not available, and we must arrive at some compromise principle of minimum mutual disturbance, such as a requirement for orthogonality or linear independence (rather artificial, if you ask me).

My point is that defining distributed representations only in terms of unconstrained characteristics is a partial solution. Internal and external constraining factors must be included in the formulation to adequately ground the definition. These are provided by the requirements of maximum dispensability and adequate recoverability. Zillions of issues remain unaddressed by this formulation too, especially those of consistent measurement. I feel that each domain and situation will have to supply its own specifics.

I am not sure I understand Bo Xu's assertion that analog representations are "more natural". Certainly, to approximate a parabola (which I have done hundreds of times with different neural nets) would imply using an analog representation, but it is not clear if that is so natural for classifying apples and pears. Using different analog values to indicate intra-class variations is reasonable and, under specific circumstances, might even be provably better than a binary representation. But I would be very hesitant to generalize over all possible circumstances. In any case, a global characterization of distributed representation should depend on specifics only for details, and should apply to both discrete and analog representations.

Ali Minai
University of Virginia
aam9n at Virginia.EDU

From ross at psych.psy.uq.oz.au Sun Jun 23 01:52:51 1991
From: ross at psych.psy.uq.oz.au (Ross Gayler)
Date: Sun, 23 Jun 1991 15:52:51 +1000
Subject: Distributed vs Localist Representations
Message-ID: <9106230552.AA02343@psych.psy.uq.oz.au>

Randall Stark (rstark at aipna.edinburgh.ac.uk) writes:

>One aspect of this issue which seems implicit in much of this discussion
>is the notion that distributed representation can be considered
>a *relative* property. Thus the "room schema" network is "distributed"
>relative to rooms, but "localist" relative to ovens.

A related point was raised by Paul Smolensky in his work on variable binding using tensor representations. By his definition a representation is distributed if entities of external interest (objects, attributes, values or whatever) are represented as patterns across multiple units. The point Paul makes is that in much connectionist work the variables are localised while the values are distributed. That is, the set of units is typically divided into disjoint groups that function as registers or variables. Each variable is able to hold a pattern of activations that is a distributed value.
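In schematic terms, that conventional arrangement looks something like this (a made-up miniature, not Smolensky's notation or anyone's actual code): each variable is a fixed block of units, and a value is bound to it simply by writing a distributed pattern into that block.

    # Variables as disjoint blocks of units ("registers"), values as
    # distributed patterns written into those blocks.  All names invented.
    import numpy as np

    n = 4                                        # units per register
    registers = {"agent": slice(0, n),           # variable = a fixed block
                 "action": slice(n, 2 * n)}
    values = {"John": np.array([1., 0., 1., 0.]),   # value = distributed pattern
              "runs": np.array([0., 1., 1., 0.])}

    units = np.zeros(2 * n)
    units[registers["agent"]] = values["John"]   # bind by writing the pattern
    units[registers["action"]] = values["runs"]  # into the variable's block
    print(units)  # variables are local (blocks), values are distributed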
He proposed a mechanism in which the variables are not disjoint sets of units but instead are patterns that are bound to the patterns representing values. Using this scheme, a binding of a variable with a value is itself represented as a pattern distributed over units, and multiple bindings can be simultaneously represented on the same units. The nice point about this is that it puts variables and values on an equal footing: both are patterns. In fact, the system does not need to distinguish between them from a processing perspective. Whether something is a variable or a value is a question of how it is used, not how it is represented or implemented.

Ross Gayler
ross at psych.psy.uq.oz.au

From aarons at cogs.sussex.ac.uk Sun Jun 23 16:13:31 1991
From: aarons at cogs.sussex.ac.uk (Aaron Sloman)
Date: Sun, 23 Jun 91 21:13:31 +0100
Subject: Varieties of intelligence (long)
Message-ID: <1666.9106232013@csrn.cogs.susx.ac.uk>

A friend, Gerry Martin, is interested in "achievers": how they differ, and the conditions that create them or enable them to achieve. I offered to try to find out if anyone knew of relevant work on different kinds of (human) intelligence, how they develop, what they are, and what (social) mechanisms, if any, enable them to be matched with opportunities for development or fulfilment. There's a collection of related questions.

1. To what extent does evolution produce variation in intellectual capabilities, motivations, etc.? How far is the observable variation due to environmental factors? This is an old question, of course, and very ill-defined (e.g. there is probably no meaningful metric for the contributions of genetic and environmental factors to individual development). It is clear that physical variability is inherent in evolutionary mechanisms: without this there could not be (Darwinian) evolution. The same must presumably be true for "mental" variability. Do genetic factors produce different kinds of differences: in intellectual capabilities, motivational patterns, perceptual abilities, memory abilities, problem solving abilities, etc.?

I think it was Waddington who offered the metaphor of the "epigenetic landscape" genetically determining the opportunities for development of an individual. The route actually taken through the landscape would depend on the individual's environment. So our question is: how different are the landscapes (the sets of possible developmental routes) with which each human child is born, and to what extent do they determine different opportunities for mental as well as physical development? (Obviously the two are linked: a blind child won't as easily become a great painter.) (Piaget suggested that all the human landscapes have a common structure, with well defined stages. I suspect this view will not survive close analysis.)

For intelligent social animals, mental variability is more important than physical variability: a social system has more diversity of intellectual and motivational requirements in its "jobs" than diversity of physical requirements. (Perhaps not if you include the "jobs" done for us by other animals, plants, microorganisms, machines, etc., without which our society could not survive.) Anyhow, without variation in mental properties (whether produced genetically or not) it could be hard to achieve the division of labour that enables a complex social system to work. Aldous Huxley's book "Brave New World" takes this idea towards an unpalatable conclusion.
The need for mental variability goes beyond infrastructure: without such variability all artists would be painters, or all would be composers, or all would be poets, and all scientists would be physicists, or biologists... Division of labour is required not only for the enabling mechanisms of society, but also for cultural richness.

2. What is the form of this variability? Folk psychology has it that there are different kinds of genius - musical geniuses, mathematical geniuses, geniuses in biology, great actors and actresses, etc. Could any of these have excelled in any other field? Would the right education have turned Mozart into a great mathematician, or would his particular "gifts" never have engaged with advanced mathematics? Could a suitable background have made Newton a great composer? Does anyone have any insight into the genetic requirements for different kinds of creative excellence?

We can distinguish two broad questions: (a) is there wide variability in DEGREE in innate capabilities? (b) is there also wide variability in KIND (domain, field of application, or whatever)? In either case it would be interesting to know what kinds of mechanisms account for the differences. Could they be quantitative (as many naive scientists have supposed -- e.g. number of brain cells, number of connections, speed of transmission of signals, etc.) or are the relevant differences more likely to be structural -- i.e. differences in hardware or software organisation?

It looks as if many ordinary human learning capabilities need specific pre-determined structures, providing the basis for learning abilities: e.g. learning languages with complex syntax, learning music, learning to control limbs, learning to see structured objects, learning to play games, learning mathematics, and so on. (Some of the structures creating these capabilities might be shared between different kinds of potential.) If these enabling structures are not "all-or-nothing" systems there could sometimes be partial structures at birth, giving some individuals subsets of "normal" capabilities. Are these all a result of pre-natal damage, or might the gene pool INHERENTLY generate such variety? (An unpalatable truth?) Does the gene pool also produce some individuals with powerful supersets of what is relatively common? Are there importantly different supersets, corresponding to distinct "gifts"? (E.g. Mozart, Newton, Shakespeare.) What are the additional mechanisms these individuals have? Can those born without be given them artificially? (E.g. through special training, hormone treatment, etc.)

3. To what extent do different approaches to AI (I include connectionism as a sub-field of AI) provide tools to model different sorts of mentalities? As far as I know, although there has been much empirical research (e.g. on twins) to find out what is and what is not determined genetically, there has been very little discussion of mechanisms that might be related to such variability.

From an AI standpoint it is easy to speculate about ways in which learning systems could be designed that are initially highly sensitive to minor and subtle environmental differences and which, through various kinds of positive feedback, amplify differences so that even individuals that start off very similar could, in a rich and varied environment, end up very different.
This sort of thing could be a consequence of multi-layered self-modifying architectures with thresholds of various kinds that get modified by "experience" and which thereby change the behaviour of systems which cause other thresholds to be modified. Even without thresholds, hierarchies of condition-action rules, where some of the actions create or alter other rules, would also provide for enormous variability. (As could hierarchies of pdp networks, some of which change the topology of others.) Cascades of such changes could produce huge qualitative variation in various kinds of intellectual capabilities as well as variation in motivational, emotional and personality traits, aesthetic tastes, etc. Such architectures might allow relatively small genetic differences as well as small environmental differences to produce vast differences in adult capabilities.

Variation in tastes in food, or preferences for mates, despite common biological needs, seems to be partly a result of cultural feedback through such developmental mechanisms. But is it all environmental? I gather there are genetic factors that stop some people liking the tastes of certain foods. What about a taste for mathematics, or a general taste for intellectual achievement?

4. Does anyone have any notion of the kinds of differences in implementation that could account for differences in tastes, capabilities, etc.? Would it require: (a) differences in underlying physical architectures (e.g. different divisions of brains into cooperative sub-nets, or different connection topologies among neurones?), (b) differences in the contents of "knowledge bases", "plan databases", skill databases, etc. (by "database" I include what can be stored in a trainable network), (c) differences in numerical parameters, or something quite different? I suspect there's a huge variety of distinct ways in which qualitative differences in capability can emerge: some closer to hardware differences, some closer to software differences. The latter might in principle be easier to change, but not in practice if, for example, it requires de-compiling a huge and messy system.

The only AI-related work that I know of that explicitly deals not only with the design or development of a single agent, but with variable populations, is work on genetic algorithms, which can produce a family of slightly different design solutions. Of course, it is premature for anyone to consider modelling evolutionary processes that would produce collections of "complete" intelligent agents (as opposed to collections of solutions to simple problems like planning problems, recognition problems, or whatever). But has anyone investigated general principles involved in mechanisms that could produce populations of agents with important MENTAL differences? Are there any general principles? (Are the mental epigenetic landscapes for a species importantly different in structure from the physical ones? Perhaps for some organisms, e.g. ants, there's a lot less difference than for others, e.g. chimpanzees?)

5. There are related questions about the need for or possibility of social engineering. (The questions are fraught with political and ethical problems.) In particular, if truly gifted individuals have narrowly targetted potential, are there mechanisms that enable such potential to be matched with appropriate opportunities for development and application? Do rare needs have a way of "attracting" those with the rare ability to tackle them?
What mechanisms can help to match individuals with unusual combinations of motives and capabilities with tasks or roles that require those combinations? In a crude and only partly successful way the educational system and career advisory services attempt to do this. Special schools or special lessons for gifted children attempt to enhance the match-making. However, these formal institutions work only insofar as there are fairly broad and widely-recognized categories of individuals and of tasks. They don't address the problem of matching the potentially very high achievers to very specific opportunities and tasks that need them. Some job advertisements and recruitment services attempt to do this, but there's no guarantee that they make contact with really suitable candidates, and we all know how difficult selection is. Also, these mechanisms assume that the need has been identified. There was no institution that identified the need for a theory of gravity and recruited Newton, provided him with opportunities, etc. Was it pure chance then that he was "found"? Or were there many others who might have achieved what he did? Or were there unrecognized social mechanisms that "arranged" the match? If so, how far afield could he have been born without defeating the match-making?

If the potentially very high achievers only have very small areas in which their potential can be realized, and if each type is very rare, there may be no general way to set up conditions that bring them into the appropriate circumstances. An important example might turn out to be the problem of matching the particular collection of talents, knowledge, and opportunity that would enable a cure for AIDS to be found. In a homogeneous global culture with richly integrated (electronic?) information systems it might be possible to reduce the risks of such lost opportunities, but only if there are ways of recognizing in advance that a particular individual is likely to be well suited to a particular task. The more narrowly defined and rare the task and the capabilities, the less likely it is that the match can be recognized in advance.

Is the idea that there are important but extremely difficult tasks and challenges that only a very few individuals have the potential to cope with just a romantic myth? Or is every solvable problem, every achievable goal, solvable by a large subset of humanity, given the right training and opportunity? (Will we ever know whether nobody but Fermat had what it takes to prove his "last" theorem?) Even if the "romantic myth" is close to the truth, there may be no way of setting up social mechanisms with a good chance of bringing important opportunities and appropriately gifted individuals together: social systems are so complex that all attempts to control them, however well-meaning, invariably have a host of unintended, often undesirable, consequences, some of them long term and far less obvious than missiles that hit the wrong target.

Could some variant of AI help here? It seems unlikely that connectionist pattern recognition techniques could work. (E.g. where would training sets come from?) Could some more abstract sort of expert system help? Neither could inform us that the person capable of solving a particular problem is an unknown child in a remote underdeveloped community.
Perhaps there is nothing for it but to rely on chance, co-incidence, or whatever combination of ill-understood biological and social processes has worked up to now in enabling humankind to achieve what distinguishes us from ants and apes (including our extremes of ecological vandalism).

-----------------------------------------------------------------------

I don't know if I have captured Gerry's questions well: he hasn't seen this message. But if you have any relevant comments, including pointers to literature, information about work in progress, criticisms of the presuppositions of the questions, conjectures about the answers, etc., I'll be interested to receive them and to pass them on. I'll post this to connectionists and the comp.ai newsgroup. (Should it go to others?) Apologies for length.

Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England
EMAIL aarons at cogs.sussex.ac.uk
After 18th July 1991: School of Computer Science, The University of Birmingham, UK. Email: A.Sloman at cs.bham.ac.uk

From ITGT500 at INDYCMS.BITNET Mon Jun 24 10:45:52 1991
From: ITGT500 at INDYCMS.BITNET (Bo Xu)
Date: Mon, 24 Jun 91 09:45:52 EST
Subject: Distributed Representations
Message-ID:

Ali Minai raised a good point about where the representations are to be considered. Let's look at his message first:

>Speaking of which layers to apply the definition to, I think that in a
>feed-forward associative network (analog or binary), the hidden neurons
>(or all the weights) are the representational units. The input neurons
>merely distribute the prior part of the association, and the output neurons
>merely produce the posterior part. The latter are thus a "recovery mechanism"
>designed to "decode" the distributed representation of the hidden units and
>recover the "original" item.

I think that, according to the criterion of where representations exist, representations can be classified into two different types:

(1). External representations ---- the representations that exist at the interface layers (input and/or output layers). They are responsible for the information transmission between the network and the outside world (coding the input information at the input layer and decoding the output information at the output layer).

(2). Internal representations ---- the representations that exist at the hidden layers. These representations are used to encode the mappings from the input field to the output field. The mappings are the core of the neural net.

If I understand correctly, Ali Minai is referring to the internal representations only, and neglecting the external representations. The internal representations are very important representations. However, these representations are determined by the topology of the network, and we cannot change them unless we change the network topology. Most of the current networks' topology ensures that the internal representations are mixed distributed representations (as I pointed out several days ago). Their working mechanisms are still a black box.

Without changing the topology of the network, what we can choose and select are the external representations only. They should not be neglected.

>Zillions of issues remain unaddressed by this formulation too, especially
>those of consistent measurement.
>I feel that each domain and situation
>will have to supply its own specifics.

>I am not sure I understand Bo Xu's assertion that analog representations
>are "more natural". Certainly, to approximate a parabola (which I have
>done hundreds of times with different neural nets) would imply using an
>analog representation, but it is not clear if that is so natural for
>classifying apples and pears. Using different analog values to indicate
>intra-class variations is reasonable and, under specific circumstances,
>might even be provably better than a binary representation. But I would
>be very hesitant to generalize over all possible circumstances. In any
>case, a global characterization of distributed representation should depend
>on specifics only for details, and should apply to both discrete and analog
>representations.

It's true that there will be zillions of issues in practical applications. However, precisely because of this, it would be very difficult (if not impossible) to study all of these issues before drawing any conclusions. Some generalization based on limited studies is probably necessary, and helpful, in such a situation.

I want to thank Ali Minai for his comments. All of his comments are very valuable and thought-provoking.

Bo Xu
Indiana University
ITGT500 at INDYCMS.BITNET

From aam9n at hagar2.acc.Virginia.EDU Mon Jun 24 22:29:34 1991
From: aam9n at hagar2.acc.Virginia.EDU (Ali Ahmad Minai)
Date: Mon, 24 Jun 91 22:29:34 EDT
Subject: Distributed Representations
Message-ID: <9106250229.AA00528@hagar2.acc.Virginia.EDU>

This is in response to Bo Xu's last posting regarding distributed representations. I think one of the problems is a basic incompatibility in our notions of "representations" and where they exist. I would like to clarify my earlier posting somewhat on this point. I wrote:

>>Speaking of which layers to apply the definition to, I think that in a
>>feed-forward associative network (analog or binary), the hidden neurons
>>(or all the weights) are the representational units. The input neurons
>>merely distribute the prior part of the association, and the output neurons
>>merely produce the posterior part. The latter are thus a "recovery mechanism"
>>designed to "decode" the distributed representation of the hidden units and
>>recover the "original" item. Of course, in a heteroassociative system, the
>>"recovered original" is not the same as the "stored original". I realize that
>>this is stretching the definition of "representation", but it seems quite
>>natural to me.

To which Bo replied:

>I think that, according to the criterion of where representations exist,
>representations can be classified into two different types:
>
>(1). External representations ---- the representations that exist at the
>     interface layers (input and/or output layers). They are
>     responsible for the information transmission between the network
>     and the outside world (coding the input information at the input
>     layer and decoding the output information at the output layer).
>
>(2). Internal representations ---- the representations that exist at the
>     hidden layers. These representations are used to encode the
>     mappings from the input field to the output field. The mappings
>     are the core of the neural net.
>
>If I understand correctly, Ali Minai is referring to the internal
>representations only, and neglecting the external representations. The internal
>representations are very important representations.
>However, these
>representations are determined by the topology of the network, and we cannot
>change them unless we change the network topology. Most of the current
>networks' topology ensures that the internal representations are mixed
>distributed representations (as I pointed out several days ago). Their
>working mechanisms are still a black box.
>
>Without changing the topology of the network, what we can choose and
>select are the external representations only. They should not be neglected.

First, let me state what I meant by the "stored" and "recovered" representations in the heteroassociative case. We can see the process of the heteroassociation of an input vector U and output vector V in a feed-forward network as a process of encoding a representation of the vector UV over the hidden units of the network. This is what I call "storage". There is a special requirement here that, given U, a mechanism should be able to produce V over the output units, thus "completing the pattern". The process of doing this is what I call "recovery" (or "recall").

The way I see it (and, I believe, the way most other connectionists see it too) is that the representational part of the network consists of its "internals" --- either the weights, or the hidden units. Far from being uncontrollable, as Bo Xu states, these are *precisely* the things that we *do* control --- not in a micro sense, but through complex global schemes such as training algorithms. The prior to be stored, which Bo takes to be the representation, is, to me, just a given that has been through some unspecified preprocessing. It is the "object" to be represented (though I agree that all objects are themselves representations).

From rosauer at ira.uka.de Tue Jun 25 14:27:57 1991
From: rosauer at ira.uka.de (Bernd Rosauer)
Date: Tue, 25 Jun 91 14:27:57 MET DST
Subject: genetic algorithms + neural networks
Message-ID:

I am interested in any kind of combination of genetic algorithms and neural network training. I am aware of the papers presented at
* Connectionist Models Summer School, 1990
* First International Workshop on Parallel Problem Solving from Nature, 1990
* Third International Conference on Genetic Algorithms, 1989
* Advances in Neural Information Processing Systems 2, 1989.
Please let me know if there is any further work on that topic. Post to , so I will summarize here. Thanks a lot

Bernd

From stork at GUALALA.CRC.RICOH.COM Mon Jun 24 20:36:49 1991
From: stork at GUALALA.CRC.RICOH.COM (David Stork)
Date: Mon, 24 Jun 91 17:36:49 -0700
Subject: Job offer
Message-ID: <9106250036.AA11456@cache.CRC.Ricoh.Com>

The Ricoh California Research Center has an opening for a staff programmer or researcher in neural networks and connectionism. This opening is for a B.S. or possibly M.S.-level graduate in Physics, Computer Science, Math, Electrical Engineering, Cognitive Science, Psychology, or a related field. A background in some hardware design is a plus. The Ricoh California Research Center is located in Menlo Park, about one mile from Stanford University.

Contact:
Dr. David G. Stork
Ricoh California Research Center
2882 Sand Hill Road #115
Menlo Park, CA 94025-7022
stork at crc.ricoh.com

From issnnet at park.bu.edu Tue Jun 25 15:39:29 1991
From: issnnet at park.bu.edu (issnnet@park.bu.edu)
Date: Tue, 25 Jun 91 15:39:29 -0400
Subject: Call For Votes: comp.org.issnnet
Message-ID: <9106251939.AA04607@copley.bu.edu>

CALL FOR VOTES
----------------

GROUP NAME: comp.org.issnnet
STATUS: unmoderated
CHARTER: The newsgroup shall serve as a medium for discussions pertaining to the International Student Society for Neural Networks (ISSNNet), Inc., and to its activities and programs as they pertain to the role of students in the field of neural networks. Details were posted in the REQUEST FOR DISCUSSION, and can be requested from .
VOTING PERIOD: JUNE 25 - JULY 25, 1991

******************************************************************************

VOTING PROCESS

If you wish to vote for or against the creation of comp.org.issnnet, please send your vote to: issnnet at park.bu.edu

To facilitate collection and sorting of votes, please include one of these lines in your "subject:" entry:

If you favor creation of comp.org.issnnet, your subject should read: YES - comp.org.issnnet

If you DO NOT favor creation of comp.org.issnnet, use the subject: NO - comp.org.issnnet

YOUR VOTE ONLY COUNTS IF SENT DIRECTLY TO THE ABOVE ADDRESS.

-----------------------------------------------------------------------

For more information, please send e-mail to issnnet at park.bu.edu (ARPANET) or write to: ISSNNet, Inc. PO Box 557, New Town Br. Boston, MA 02258 USA

ISSNNet, Inc. is a non-profit corporation in the Commonwealth of Massachusetts.

NOTE -- NEW SURFACE ADDRESS:
ISSNNet, Inc.
P.O. Box 15661
Boston, MA 02215 USA

From koch at CitIago.Bitnet Thu Jun 27 06:12:08 1991
From: koch at CitIago.Bitnet (Christof Koch)
Date: Thu, 27 Jun 91 03:12:08 PDT
Subject: Phase-locking without oscillations
Message-ID: <910627031202.20402f6a@Iago.Caltech.Edu>

The following paper is available by anonymous FTP from Ohio State University, in the pub/neuroprose directory. The file is called "koch.syncron.ps.Z".

A SIMPLE NETWORK SHOWING BURST SYNCHRONIZATION WITHOUT FREQUENCY-LOCKING

Christof Koch and Heinz Schuster

ABSTRACT: The dynamic behavior of a network model consisting of all-to-all excitatory coupled binary neurons with global inhibition is studied analytically and numerically. It is shown that for random input signals, the output of the network consists of synchronized bursts with apparently random intermissions of noisy activity. We introduce the fraction of simultaneously firing neurons as a measure of synchrony and prove that its temporal correlation function displays, besides a delta peak at zero indicating random processes, strongly damped oscillations. Our results suggest that synchronous bursts can be generated by a simple neuronal architecture which amplifies incoming coincident signals. This synchronization process is accompanied by damped oscillations which, by themselves, however, do not play any constructive role in this process and can therefore be considered an epiphenomenon.

Key words: neuronal networks / stochastic activity / burst synchronization / phase-locking / oscillations
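For readers who want a feel for the kind of behaviour being described, here is a toy simulation in a broadly similar spirit (this is NOT the model analysed in the paper; the update rule and every parameter below are invented purely for illustration): strong recurrent excitation amplifies coincident input, and a slowly decaying global inhibition terminates each burst.

    # Toy burst-synchronization demo -- not the Koch & Schuster model.
    import numpy as np

    rng = np.random.default_rng(0)
    N, T = 200, 400
    J, g, theta, eps = 2.0, 1.5, 0.3, 0.4   # excitation, inhibition, threshold, noise
    s = np.zeros(N)                         # binary neuron states
    inh = 0.0                               # slow global inhibition
    fraction = []                           # fraction of simultaneously firing cells

    for t in range(T):
        a = s.mean()                                     # current network activity
        drive = J * a - g * inh + eps * rng.random(N)    # excitation - inhibition + input
        s = (drive > theta).astype(float)
        inh = 0.8 * inh + a                              # inhibition tracks past activity
        fraction.append(s.mean())

    print("first burst peaks:", [t for t, f in enumerate(fraction) if f > 0.9][:10])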
For comments, send e-mail to koch at iago.caltech.edu.

Christof

P.S. And this is how you can FTP and print the file:

unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62)
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose (actually, cd neuroprose)
ftp> binary
ftp> get koch.syncron.ps.Z
ftp> quit
unix> uncompress koch.syncron.ps.Z
unix> lpr koch.syncron.ps

Read and be illuminated.

From nowlan at helmholtz.sdsc.edu Thu Jun 27 14:38:58 1991
From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan)
Date: Thu, 27 Jun 91 11:38:58 MST
Subject: Thesis/TR available
Message-ID: <9106271838.AA27191@bose>

The following technical report version of my thesis is now available from the School of Computer Science, Carnegie Mellon University:

-------------------------------------------------------------------------------

Soft Competitive Adaptation: Neural Network Learning Algorithms based on Fitting Statistical Mixtures

CMU-CS-91-126

Steven J. Nowlan
School of Computer Science
Carnegie Mellon University

ABSTRACT

In this thesis, we consider learning algorithms for neural networks which are based on fitting a mixture probability density to a set of data. We begin with an unsupervised algorithm which is an alternative to the classical winner-take-all competitive algorithms. Rather than updating only the parameters of the ``winner'' on each case, the parameters of all competitors are updated in proportion to their relative responsibility for the case. Use of such a ``soft'' competitive algorithm is shown to give better performance than the more traditional algorithms, with little additional cost.

We then consider a supervised modular architecture in which a number of simple ``expert'' networks compete to solve distinct pieces of a large task. A soft competitive mechanism is used to determine how much an expert learns on a case, based on how well the expert performs relative to the other expert networks. At the same time, a separate gating network learns to weight the output of each expert according to a prediction of its relative performance based on the input to the system. Experiments on a number of tasks illustrate that this architecture is capable of uncovering interesting task decompositions and of generalizing better than a single network with small training sets.

Finally, we consider learning algorithms in which we assume that the actual output of the network should fall into one of a small number of classes or clusters. The objective of learning is to make the variance of these classes as small as possible. In the classical decision-directed algorithm, we decide that an output belongs to the class it is closest to and minimize the squared distance between the output and the center (mean) of this closest class. In the ``soft'' version of this algorithm, we minimize the squared distance between the actual output and a weighted average of the means of all of the classes. The weighting factors are the relative probability that the output belongs to each class. This idea may also be used to model the weights of a network, to produce networks which generalize better from small training sets.
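A minimal sketch of the ``soft'' competitive update described in the first paragraph of the abstract (illustrative only; the Gaussian responsibility model, learning rate and data below are my own assumptions, not code from the thesis): every competitor moves toward each data point in proportion to its relative responsibility for that point, instead of the winner taking the whole update.

    # Soft competitive update sketch: all units adapt, weighted by responsibility.
    import numpy as np

    rng = np.random.default_rng(0)
    means = rng.normal(size=(3, 2))          # three competing units in 2-D
    lr = 0.1

    def responsibilities(x, means, var=1.0):
        """Relative responsibility of each unit for x (normalized Gaussians)."""
        d2 = ((means - x) ** 2).sum(axis=1)
        p = np.exp(-d2 / (2 * var))
        return p / p.sum()

    def soft_update(x, means):
        """Every unit moves toward x in proportion to its responsibility."""
        r = responsibilities(x, means)
        return means + lr * r[:, None] * (x - means)

    for x in rng.normal(size=(500, 2)) + np.array([2.0, 0.0]):
        means = soft_update(x, means)
    print(means.round(2))                    # all units migrate toward the data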
-------------------------------------------------------------------------------

Unfortunately there is NOT an electronic version of this TR. Copies may be ordered by sending a request for TR CMU-CS-91-126 to:

Computer Science Documentation
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213 USA

There will be a charge of $10.00 U.S. for orders from the U.S., Canada or Mexico and $15.00 U.S. for overseas orders to cover copying and mailing costs (the TR is 314 pages in length). Checks and money orders should be made payable to Carnegie Mellon University. Note that if your institution is part of the Carnegie Mellon Technical Report Exchange Program there will be NO charge for this TR.

REQUESTS SENT DIRECTLY TO MY E-MAIL ADDRESS WILL BE FILED IN /dev/null.

- Steve

(P.S. Please note my new e-mail address is nowlan at helmholtz.sdsc.edu).

------- End of Forwarded Message

From D.M.Shumsheruddin at computer-science.birmingham.ac.uk Thu Jun 27 06:06:51 1991
From: D.M.Shumsheruddin at computer-science.birmingham.ac.uk (Dean Shumsheruddin)
Date: Thu, 27 Jun 91 11:06:51 +0100
Subject: Request for references on navigation
Message-ID: <961.9106271006@christopher-robin.cs.bham.ac.uk>

I am looking for references to work on neural nets for navigation in graph-structured environments. I've already found the papers by Pomerleau and Bachrach in NIPS 3. I would greatly appreciate information about related work. If there is sufficient interest I'll post a summary to the list.

Dean Shumsheruddin
University of Birmingham, UK
dms at cs.bham.ac.uk

From russ at oceanus.mitre.org Fri Jun 28 10:47:09 1991
From: russ at oceanus.mitre.org (Russell Leighton)
Date: Fri, 28 Jun 91 10:47:09 EDT
Subject: Aspirin/MIGRAINES v4.0 Users
Message-ID: <9106281447.AA13459@oceanus.mitre.org>

Aspirin/MIGRAINES v4.0 Users

Could those groups presently using the Aspirin/MIGRAINES v4.0 neural network simulator from MITRE please reply to this message. A brief description of your motivation for using this software would be useful but not necessary. We are compiling a list of users so that we may more easily distribute the next release of software (Aspirin/MIGRAINES v5.0). Thank you.

Russell Leighton
INTERNET: russ at dash.mitre.org

Russell Leighton
MITRE Signal Processing Lab
7525 Colshire Dr.
McLean, Va. 22102 USA