From matsuoka at nttspch.ntt.jp Mon Apr 4 15:24:20 1988
From: matsuoka at nttspch.ntt.jp (Tatsuo Matsuoka)
Date: Mon, 4 Apr 88 15:24:20 JST
Subject: Could I have Technical Report ?
Message-ID: <8804040624.AA06459@nttspch.NTT>

Dear Colleague:

Please let me know the physical mail address of the CMU technical report
director. Or, if you have the paper which I would like to get, could you send
me a copy? I'd like to have the paper:

   David C. Plaut, Steven Nowlan, and Geoffrey Hinton
   "Experiments on Learning by Back Propagation,"
   Technical Report CMU-CS-86-126, Carnegie-Mellon University, 1986.

Thank you.

Tatsuo MATSUOKA
(who is now researching speech recognition using a connectionist model)

address: 320C Language Media Laboratory
         NTT Human Interface Laboratories
         1-2356 Take Yokosuka-shi Kanagawa 238 Japan
phone  : +81 468 59 2943

From munro at b.psy.cmu.edu Mon Apr 4 20:14:59 1988
From: munro at b.psy.cmu.edu (Paul Munro)
Date: Monday, 04 Apr 88 19:14:59 EST
Subject: Two Tech Reports
Message-ID:

The following TRs are available and can be obtained by writing to

   Ms. Susan Webreck
   LIS Building, 5th Floor
   University of Pittsburgh
   Pittsburgh PA 15260

Learning to Represent and Understand Locative Prepositional Phrases
Cynthia Cosic and Paul Munro

Abstract: The spatial juxtaposition observed between any two objects can
generally be expressed by a locative prepositional phrase of the form
noun-preposition-noun. A network is described which learns to associate
locative prepositions with the spatial relationships they express using
back-propagation. Since the mapping is context sensitive, input patterns to
the network include a representation of the objects: for example, the
preposition "on" expresses different relationships in the phrases "house on
lake" and "plate on table". The network is designed such that, during the
learning process, it must develop a small set of features for representing the
nouns appropriately. The problem is framed as a pattern completion task in
which the pattern to be completed consists of four components: two nouns, a
preposition, and the spatial relationship; the missing component is either the
preposition or the spatial relationship. After learning, the network was
analyzed in terms of [1] its ability to process novel pattern combinations,
[2] clustering of the noun representations, [3] the core meanings of the
prepositions and [4] the incremental influence of 0, 1, and 2 nouns on the
interpreted meaning of a preposition.

The above is TR IS88002. Below is TR IS88003:

Self-supervised Learning of Concepts by Single Units and "Weakly Local"
Representations
Paul Munro

Abstract: A mathematical system, self-supervised learning (SSL), is presented
that describes a form of learning for high-order "concept units" (C-units)
that learn to become sensitive to categories of stimuli associated by some
feature (the concept) that they share. Implicit in the SSL model is the
assumption that each C-unit receives input from at least two information
streams or "banks". Under SSL, each C-unit becomes very selective across one
of the streams, the training bank; that is, patterns in the training bank are
strongly filtered by the C-unit such that all of them are ignored, save one or
a few. The preferred stimulus pattern in the training bank serves as a "seed"
for concept formation, as an associative process causes the stimulus patterns
on the other banks to drive the C-unit to the extent that they are correlated
with the seed stimulus in the world.
The possibility that linguistic information may provide seed stimuli suggests
an approach via SSL for understanding the role of language in concept
formation.

From feldman at cs.rochester.edu Wed Apr 6 09:32:16 1988
From: feldman at cs.rochester.edu (feldman@cs.rochester.edu)
Date: Wed, 6 Apr 88 09:32:16 EDT
Subject: Could I have Technical Report ?
Message-ID: <8804061332.AA22578@wasat.cs.rochester.edu>

Write to the Computer Science Department, Technical Report Librarian, Carnegie
Mellon University, Pittsburgh, Pennsylvania 15213.

From SSUG5 at CLUSTER.SUSSEX.AC.UK Wed Apr 6 14:26:26 1988
From: SSUG5 at CLUSTER.SUSSEX.AC.UK (SSUG5@CLUSTER.SUSSEX.AC.UK)
Date: 6-APR-1988 18:26:26 GMT
Subject: No subject
Message-ID:

Hi. I am a Computing/AI student in the School of Cognitive Sciences at the
Univ. of Sussex, UK. Prof. S. E. Fahlman at CMU gave out this BB address as of
interest to people exploring connectionist/PDP/neural net models. I would be
very grateful if you could put me on your mailing list. My EMAIL addresses are

   JANET                  ssug5 at uk.ac.sussex.cluster
   BITNET                 ssug5 at cluster.sussex.ac.uk
   CSNET/ARPANET/MILNET   ssug5%ssusex.cluster at ucl-cs

(the last two being to the best of my knowledge!). Thanks

Richard Hall.

From HARRYF at VAX.OXFORD.AC.UK Tue Apr 12 11:55:37 1988
From: HARRYF at VAX.OXFORD.AC.UK (HARRYF@VAX.OXFORD.AC.UK)
Date: 12-APR-1988 15:55:37 GMT
Subject: Addition to mailing list
Message-ID:

Please relay this digest to me --- Harry Fearnley, Mathematical Institute, Oxford

From elman at sdamos.ling.ucsd.edu Tue Apr 12 13:02:22 1988
From: elman at sdamos.ling.ucsd.edu (Jeff Elman)
Date: Tue, 12 Apr 88 10:02:22 PDT
Subject: TR: Finding structure in time
Message-ID: <8804121702.AA25547@amos.ling.ucsd.edu>

   Center for Research in Language Technical Report 8801/April

   "Finding structure in time"

   Jeff Elman
   Department of Linguistics
   University of California, San Diego

Time underlies many interesting human behaviors. Thus, the question of how to
represent time in connectionist models is very important. One approach is to
represent time implicitly by its effects on processing rather than explicitly
(as in a spatial representation). The current report develops a proposal along
these lines, first described by Michael Jordan (1986), which involves the use
of recurrent links in order to provide networks with a dynamic memory. In this
approach, hidden unit patterns are fed back to themselves; the internal
representations which develop thus reflect task demands in the context of
prior internal states.

A set of simulations is described which range from relatively simple problems
(a temporal version of XOR) to discovering syntactic/semantic features for
words, to the problem of resolving pronominal reference for sentences which
conform to Reinhart's (1983) C-command formulation. In the latter case, it is
shown that a solution is possible which does not require the symbol-processing
invoked by C-command. It is suggested that some aspects of language behavior
can be profitably viewed as a complex sequential behavior; the problem of
discovering linguistic structure in these cases is then the problem of
discovering complex temporal structure.

----------------------------------------
Requests should be sent to J. Elman; Dept. of Linguistics, C-008; Univ. of
Calif., San Diego; La Jolla CA 92093-0108.
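As an illustration of the recurrent-feedback idea described in the abstract
(hidden unit patterns copied back as a "context" input on the next time step),
here is a minimal NumPy sketch. The dimensions, the particular temporal-XOR
task, and the one-step training scheme are invented for the example; this is
not Elman's code or his exact task.

   import numpy as np

   rng = np.random.default_rng(0)

   # Toy task: at each step the input is one bit; the target is the XOR of
   # the current and previous input bits, so the net needs a memory of t-1.
   T = 3000
   x = rng.integers(0, 2, size=T).astype(float)
   y = np.roll(x, 1) * (1 - x) + x * (1 - np.roll(x, 1))   # xor(x[t-1], x[t])

   n_in, n_hid, n_out = 1, 8, 1
   W_xh = rng.normal(0, 0.5, (n_hid, n_in))
   W_ch = rng.normal(0, 0.5, (n_hid, n_hid))   # context -> hidden (recurrent links)
   W_hy = rng.normal(0, 0.5, (n_out, n_hid))
   lr = 0.1

   def sigmoid(a):
       return 1.0 / (1.0 + np.exp(-a))

   context = np.zeros(n_hid)            # copy of the previous hidden pattern
   for t in range(1, T):
       inp = np.array([x[t]])
       hid = sigmoid(W_xh @ inp + W_ch @ context)
       out = sigmoid(W_hy @ hid)
       err = np.array([y[t]]) - out
       # Ordinary backprop; the copied-back context is treated as a fixed
       # input, so no gradient flows further back in time (simplest scheme).
       d_out = err * out * (1 - out)
       d_hid = (W_hy.T @ d_out) * hid * (1 - hid)
       W_hy += lr * np.outer(d_out, hid)
       W_xh += lr * np.outer(d_hid, inp)
       W_ch += lr * np.outer(d_hid, context)
       context = hid.copy()             # hidden pattern fed back on the next step

The point of the sketch is only the feedback loop: the internal representation
at each step is shaped both by the current input and by the prior internal
state, which is how the report proposes to represent time implicitly.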
Email: elman at amos.ling.ucsd.edu From netlist at psych.Stanford.EDU Thu Apr 14 09:52:50 1988 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Thu, 14 Apr 88 06:52:50 PDT Subject: TODAY: SU Adaptive Networks Colloquium Message-ID: REMINDER: TODAY Stanford University Interdisciplinary Colloquium Series: ADAPTIVE NETWORKS AND THEIR APPLICATIONS Co-sponsored by the Depts. of Psychology and Electrical Engineering Apr. 14 (Thursday, 3:15pm): DEMITRI PSALTIS "Optical Neural Computers" Caltech 116-18 Pasadena, CA 91125 Additional Information ---------------------- Location: Room 380-380C, which can be reached through the lower level courtyard between the Psychology and Mathematical Sciences buildings. For additional information: Contact Mark Gluck, Bldg. 420-316; (415) 725-2434 or email to gluck at psych.stanford.edu From pollack at nmsu.csnet Thu Apr 14 19:46:38 1988 From: pollack at nmsu.csnet (pollack@nmsu.csnet) Date: Thu, 14 Apr 88 17:46:38 MDT Subject: TR: Recursive Auto-Associative Memory Message-ID: Recursive Auto-Associative Memory: Devising Compositional Distributed Representations Jordan Pollack MCCS-88-124 Computing Research Laboratory New Mexico State University Las Cruces, NM 88003 A major outstanding problem for connectionist models is the representation of variable-sized recursive and sequen- tial data structures, such as trees and stacks, in fixed- resource systems. Some design work has been done on general-purpose distributed representations with some capa- city for sequential or recursive structures, but no system to date has developed its own. This paper presents connectionist mechanisms along with a general strategy for developing such representations automatically: Recursive Auto-associative Memory (RAAM). A modified autoassociative error-propagation learning regimen is used to develop fixed-width representations and access mechanisms for stacks and trees. The strategy involves the co-evolution of the training environment along with the access mechanisms and distributed representations. These representations are compositional, similarity-based, and recursive, and may lead to many new applications of neural networks to traditionally symbolic tasks. Several examples of its use are given. ________________________ To order, send $3 along with a request for "MCCS-88-124" to Technical Reports Librarian Computing Research Laboratory Box 3CRL New Mexico State University Las Cruces, NM 88003 To save the $3 you can bug me about it, or try to get a copy from the person at Yale, Brandeis, UCSD, CMU, UCLA, UMD, UMass, ICSI, or UPenn, who just recently visited Las Cruces. From WEILI%WPI.BITNET at husc6.harvard.edu Fri Apr 15 17:31:46 1988 From: WEILI%WPI.BITNET at husc6.harvard.edu (WEILI%WPI.BITNET@husc6.harvard.edu) Date: Fri, 15 Apr 88 16:31:46 est Subject: Analoge outputs Message-ID: <8804152131.AA18445@wpi.local> I wonder if there are some neural network models which could give analog outputs? From djb at thumper.bellcore.com Sat Apr 16 13:37:56 1988 From: djb at thumper.bellcore.com (David J. Burr) Date: Sat, 16 Apr 88 12:37:56 est Subject: Analog Outputs Message-ID: <8804161737.AA25586@lafite.bellcore.com> Regarding the question of networks with analog outputs I recommend some earlier papers on elastic mapping algorithms (below). These have true analog outputs since they are based on Green's function solutions in continuous media. 
In contrast to similar self-organized mapping schemes, they ex- hibit full symmetry on the input spaces in the sense that the distance from pattern A to B is equal to the distance from B to A. This is based on a generalization of the "winner-take-all" arrangement to a "symmetric winner-take- all". Extension to an "average of k winners" was suggested in the CGIP paper. Symmetry is important as it uses a "push-pull" force field for more accurate correspondence and hence faster conver- gence. This is useful for pulling together extremities of strokes in character recognition. Popular self-organizing feature maps can be viewed as a special case of elastic map- ping without the symmetry component. Symmetric elastic map- ping recently found optimum solutions to travelling salesman problems using fewer iterations than a competing analog method (Burr, Snowbird, April 1988). D. J. Burr, Computer Graphics and Image Processing, Vol. 15, 102-112, 1981. D. J. Burr, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-3, No. 6, 708-713, Nov. 1981. From nowlan at ai.toronto.edu Sat Apr 16 15:03:06 1988 From: nowlan at ai.toronto.edu (Steven J. Nowlan) Date: Sat, 16 Apr 88 15:03:06 EDT Subject: analog outputs In-Reply-To: Your message of Fri, 15 Apr 88 22:36:30 -0400. Message-ID: <88Apr16.150321edt.27298@ephemeral.ai.toronto.edu> With regards to Dave's suggestion that recurrent backprop nets could possibly provide analog CAM, I am just finishing a TR on an application in which a recurrent backprop net was trained to develop stable attractors for a particular set of vectors. The set of vectors corresponded to the set of solution vectors for different instances of the n queens problem, and the network may be treated as performing a constraint satisfaction search from some initial point, or as recovering a stored memory from a very noisy input. Although the sets of vectors stored in this application were binary (desired states 0.1 or 0.9), in principal the method can be used to store analog vectors, the major limitation is probably a constraint on the minimum euclidean distance between the stored vectors, to ensure the attraction basins for each remain distinct. - Steve From terry at cs.jhu.edu Sun Apr 17 01:12:12 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Sun, 17 Apr 88 01:12:12 edt Subject: analog outputs Message-ID: <8804170512.AA14658@crabcake.cs.jhu.edu> Fernando Pineda has recently shown at Snowbird that his recurrent backprop generalization can store analog vectors and retrieve them as content adressable memories. P. Simard and Dana Ballard presented a similar result. Terry ----- From harnad at Princeton.EDU Mon Apr 18 16:40:40 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Mon, 18 Apr 88 15:40:40 est Subject: Associative learning: Call for Commentators (BBS) Message-ID: <8804182040.AA18797@mind.Princeton.EDU> The following is the abstract of a target article to appear in Behavioral and Brain Sciences (BBS). All BBS articles are accompanied by "open peer commentary" from across disciplines and around the world. For information about serving as a commentator on this article, send email to harnad at mind.princeton.edu or write to BBS, 20 Nassau Street, #240, Princeton NJ 08540 [tel: 609-921-7771]. Specialists in the following areas are encouraged to contribute: connectionism/PDP, neural modeling, associative modeling, classical conditioning, operant conditioning, cognitive psychology, behavioral biology, neuroethology. 
CLASSICAL CONDITIONING: THE NEW HEGEMONY Jaylan Sheila Turkkan Division of Behavioral Biology Department of Psychiatry and Behavioral Sciences The Johns Hopkins University School of Medicine Converging data from different disciplines are showing that the role of classical [associative] conditioning processes in the elaboration of human and animal behavior is larger than previously supposed. Older restrictive views of classically conditioned responses as merely secretory, reflexive or emotional are giving way to a broader conception that includes problem-solving and other rule-governed behavior thought to be under the exclusive province of either operant conditioning or cognitive psychology. There have also been changes in the way conditioning is conducted and evaluated. Data from a number of seemingly unrelated phenomena such as postaddictive drug relapse, the placebo effect and immune system conditioning turn out to be related to classical conditioning. Classical conditioning has also been found in simpler and simpler organisms and has recently been demonstrated in brain slices in utero. This target article will integrate the diverse areas of classical conditioning research and theory; it will also challenge teleological interpretations of classically conditioned responses and will offer some basic principles to guide experimental testing in diverse areas. Stevan Harnad harnad at mind.princeton.edu (609)-921-7771 From gary%cs at ucsd.edu Mon Apr 18 15:46:33 1988 From: gary%cs at ucsd.edu (Gary Cottrell) Date: Mon, 18 Apr 88 12:46:33 PDT Subject: analog outputs Message-ID: <8804181946.AA02444@desi.UCSD.EDU> Does anyone have Pineda's net address or a pointer to a reference on this? g. gary cottrell Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at sdcsvax.ucsd.edu (ARPA) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!sdcsvax!gary (USENET) gwcottrell at ucsd.edu (BITNET) From pollack at nmsu.csnet Mon Apr 18 15:45:56 1988 From: pollack at nmsu.csnet (pollack@nmsu.csnet) Date: Mon, 18 Apr 88 13:45:56 MDT Subject: Analog Content-addressable Memories (ACAM) Message-ID: I have thought about this issue as well, and, at one point tried to build a "noisy auto- associative" network, where randomly perturbed input patterns were mapped back into the pure source. It didn't work too well, but it sounds like Nowlan has gotten something similar to work. One problem with using the normal form of back-prop is that the sigmoid function tends to be like BSB, pushing values to their extrema of 0 and 1. Something which would be really nice would be a generative memory, in the following sense. Given a finite training basis of analog patterns, the resultant ACAM would have a theoretically infinite number of attractor states, which were in some sense "similar" to the training patterns. Its possible that this type of memory already exists, but was considered a failed experiment by a dynamical systems researcher. (Similarly, "slow glass" may already exist in the failed experiments of a chemist at Poloroid!) This type of ACAM would be nice, say, if one were storing analog patterns which represented, say, sentence meanings. The resultant memory might be able to represent a much larger set of meanings as stable patterns. I have, as yet, no idea how to do this. Jordan From djb at thumper.bellcore.com Tue Apr 19 11:38:36 1988 From: djb at thumper.bellcore.com (David J. 
Burr) Date: Tue, 19 Apr 88 10:38:36 est Subject: ACAM Message-ID: <8804191538.AA28586@lafite.bellcore.com> Doesn't Hinton's weight decay scheme help to keep the activations from going to their extreme values of 0 and 1? This would seem to make a feed forward net more analog-like, since the sigmoids try to remain more linear rather than step-like. From alexis%yummy.mitre.org at gateway.mitre.org Tue Apr 19 16:47:46 1988 From: alexis%yummy.mitre.org at gateway.mitre.org (alexis%yummy.mitre.org@gateway.mitre.org) Date: Tue, 19 Apr 88 16:47:46 EDT Subject: ACAM Message-ID: <8804192047.AA00608@marzipan.mitre.org> > Doesn't Hinton's weight decay scheme help to keep the activations > from going to their extreme values of 0 and 1? This would seem > to make a feed forward net more analog-like, since the sigmoids > try to remain more linear rather than step-like. Actually not. Weight decay causes weights that are no longer being used to disappear; but unless it's turned up unduely high it does not stop you from getting out of the linear region of the nodes (which is good since generally a network of linear nodes is rather boring from a computational point of view). Weight decay does make it easier though, to develop analog mappings with a feedforward network. From terry at cs.jhu.edu Tue Apr 19 16:59:57 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Tue, 19 Apr 88 16:59:57 edt Subject: analog outputs Message-ID: <8804192059.AA25137@crabcake.cs.jhu.edu> A number of you have asked for more information about the recent results with recurrent back-prop and content-adressable analog memories. Pineda, F., Phys. Rev. Letters 59, 2229 (1987). The work he reported at Snowbird is available from him in reprint form: Dr. Fernando Pineda Applied Physics Laboratory Johns Hopkins University Johsn Hopkins Road Laurel, MD 20707 P. Simard and D. Ballard presented a paper at the Snowbird meeting earlier in April. I'm not sure if a preprint is available, but you can write to them at the Computer Science Department, University of Rochester, Rochester, NY Terry ----- From phil at mirror.TMC.COM Tue Apr 19 17:12:24 1988 From: phil at mirror.TMC.COM (Phil Madsen) Date: Tue, 19 Apr 88 17:12:24 EDT Subject: ACAM Message-ID: <8804192112.AA28475@prism.TMC.COM> OK. Sounds good. From marvit%hplpm at hplabs.HP.COM Wed Apr 20 16:25:38 1988 From: marvit%hplpm at hplabs.HP.COM (Peter Marvit) Date: Wed, 20 Apr 88 13:25:38 PDT Subject: One to Many? Message-ID: <19284.577571138@hplpm> Most (nay all) of the simple connectionist networks which allow come sort of pattern association are a one-to-one or many-to-one mapping. For example, given a set of inputs, light the "odd" or "even" output unit or produce *the* past tense of a word given its root. What simple architecture exist for allowing a one-to-many mapping (e.g., given a, b or c are allowed -- perhaps with equal frequency). Why would one want this? I'm trying to explore different mechanisms for pluralizing nouns (extending some of Rumelhart and McClelland's work in verb past tense generation.). Unfortunately, some words (e.g., "fish") have alternate forms. So far, I've just thrown away the "less common" plurals but it feels unsatisfactory in the general case. I'm also using the generalized delta rule with a standard back-propagation system; eventually, I'll experiment with other architectures. A few observations and random thoughts: A one-to-many may actually be a one-to-one with given context. Thus my example may prefer "fish" when speaking about a "school of f." 
but prefer "fishes" when talking about "three f." This seems to unduly restrict input, however. One could think of an extension of this problem as "give all strings in a list which match this substring." One could construct a network with some units acting as "multiple answer" flags which, if activated, would stochastically decide which "answer" to be the output pattern. This feels klugey and inelegant. It also violates my original request for a "simple architecture". Further, in the case of a back-propagation network, it introduces the sole element of chance into an otherwise completely deterministic system. Is such a mixed mode system realistic? It may be that the simpler architectures are unsuitable for this. If not, what would be the simplest "mixed architecture" which would handle the problem? What other applications would use a one-to-many? -Peter Marvit HP Labs (part time) U.C. Berkeley (part time) From WEILI%WPI.BITNET at husc6.harvard.edu Thu Apr 21 22:01:14 1988 From: WEILI%WPI.BITNET at husc6.harvard.edu (WEILI%WPI.BITNET@husc6.harvard.edu) Date: Thu, 21 Apr 88 21:01:14 est Subject: The Paper Message-ID: <8804220201.AA22110@wpi.local> Sorry to broadcast this mail, but I failed to send my mail to Eric Saund. Hi, Saund ( Saund at OZ.AI.MIT.EDU): I am very interested in the paper you mentioned in the mail and would like to get a copy of it. My address is: Wei Li Electrical Engineering Department Worcester Polytechnic Institute Worcester, MA. 01609 BTW, what are you doing for your research now? I will do something related to computer vision using neural network models. Thank you very much. -- Wei Li From 8414902 at UWACDC.ACS.WASHINGTON.EDU Thu Apr 21 15:31:00 1988 From: 8414902 at UWACDC.ACS.WASHINGTON.EDU (TERU) Date: Thu, 21 Apr 1988 12:31 PDT Subject: Novelty of Neural Net Approach Message-ID: .. a deeper methematical study of the nervous system ... will affect our understanding of the aspects of mathematics itself that are involved. - John von Neumann, The Computer and the brain. In the following is my attempt to list the differences between conventional systems and neural network (or connectionist) systems. The intent is to point out the novelty of neural network approach. Although simplistically stated, it may serve as a seed for discussion. Your comments, opinions and criticism are most welcomed. (The word "conventional" is used here in a very loose sense only to highlight the characteristics of neural net approach.) Conventional Systems Neural Net (Connectionist) Systems 1. linear ( in analog) pseudolinear logical ( in digital) softlogic 2. try to eliminate noise try to utilize noise (suffer from noise) (immune to noise) 3. usually need reliable work with unreliable components components 4. need designed fault- have built-in fault-tolerance tolerance 5. emphasize economy of emphasize redundancy operation 6. want sharp switches use dull switches (for digital) (with sigmoid function) 7. usually operate synchronously may work asynchronously easily under a global clock without a global clock 8. have complex structure have rather uniform structure 9. need to decompose (if possible) have built-in paralellism the process for parallel processing 10. designed or programmed with learn from examples; rules specifying the system may self-organize behavior In designing a system, engineers have been working very hard to linearlize the system, eliminate noises, produce reliable components, etc. The issues listed above are very important ones in system design today. 
It is interesting to see that neural network systems offer an alternative approach in every issue listed above. In some cases, the approach is opposite in its direction. Teru Homma Univ. of Washington, FT-10 Seattle, WA 98195 From harnad at Princeton.EDU Fri Apr 22 01:21:27 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Fri, 22 Apr 88 00:21:27 est Subject: Society for Philosophy and Psychology, Annual Meeting Message-ID: <8804220521.AA11680@mind.Princeton.EDU> Society for Philosophy and Psychology: 14th Annual Meeting Thursday May 19 - Sunday May 22 University of North Carolina, Chapel Hill Contributors will include Jerry Fodor, Ruth Millikan, Colin Beer, Robert Stalnaker, Paul Churchland, Lynn Nadel, Michael McCloskey, James Anderson, Alan Prince, Paul Smolensky, John Perry, William Lycan, Alvin Goldman Paper (PS) and Symposia (SS) on: Naturalism and Intentional Content (SS) Animal Communication (SS)_ The Child's Conception of Mind (PS) Cognitive Science and Mental State, Wide and Narrow (PS) Logic and Language (PS) Folk Psychology (PS) Current Controversies: Categorization and Connectionism (PS) Current Controversies: Rationality and Reflexivity (PS) Neuroscience and Philosophy of Mind (SS) Connectionism and Psychological Explanation (SS) Embodied vs Disembodies Approaches to Cognition (SS) Emotions, Cognition and Culture (SS) Naturalistic Semantics and Naturalized Epistemology (PS) Registration is $30 for SPP members and $40 for nonmembers. Write to Extension and Continuing Education CB # 3420, Abernethy Hall UNC-Chapel Hill Chapel Hill NC 27599-3420 Membership Information ($15 regular; $5 students): Professor Patricia Kitcher email: ir205%sdcc6 at sdcsvax.ucsd.edu Department of Philosophy B002 University of California - San Diego La Jolla CA 92093 From tap at nmsu.csnet Fri Apr 22 02:56:19 1988 From: tap at nmsu.csnet (tap@nmsu.csnet) Date: Fri, 22 Apr 88 00:56:19 MDT Subject: One to Many? In-Reply-To: Peter Marvit's message of Wed, 20 Apr 88 13:25:38 PDT <19284.577571138@hplpm> Message-ID: It seems that what you want could be supplied by any associative memory that was capable of pattern completion. Examples are Hopfield networks, Willshaw networks, and Anderson's "Brain state in a box" networks. These networks are all best used for this purpose as recurrent relaxation networks, in contrast to the feed-forward networks you seem to have considered. To see that these can be used to store a one-to-many mapping, let A, B, etc. be bit-vectors and then consider storing the following vectors: AP AQ BR BS BT Now, if A_ (where '_' means zero or random) was used as a key then there are two patterns which *might* be retrieved; AP or AQ, but *only one* of these *would* be retreived in a *particular* relaxation. If B_ was used as a key there are three patterns which might be retrieved. Another way of putting this is that there are multiple attractors which have A as part of their description. There might be some problems with trying to make different attractors out of states which are very close, but there are ways to cope with this. Kawamoto (Alan H. Kawamoto, 1986, "Resolution of Lexical Ambiguity Using a Network that Learns", Dept. of Psychology, CMU) used a brain state in a box model for associating spelling, phonetic, part-of-speech, and "semantic" information about words. The system coped with ambiguity, so two different words could have the same spelling, and the initial perturbations of the system ("context") would determine which state the system settled to. 
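Tony Plate's AP/AQ/BR/BS/BT completion example above can be made concrete with
a small Hopfield-style network. The following NumPy sketch uses invented field
sizes, random patterns and a particular tie-breaking noise; it is a toy
illustration of one-to-many retrieval by relaxation, not anyone's published
model.

   import numpy as np

   rng = np.random.default_rng(2)
   N = 40                                  # bits per field; a pattern is [key | value]

   def rand_pat():
       return rng.choice([-1.0, 1.0], size=N)

   A, B = rand_pat(), rand_pat()
   P, Q, R, S, T = (rand_pat() for _ in range(5))
   stored = [np.r_[A, P], np.r_[A, Q], np.r_[B, R], np.r_[B, S], np.r_[B, T]]

   # Hebbian outer-product storage (the usual Hopfield prescription).
   W = sum(np.outer(v, v) for v in stored) / (2 * N)
   np.fill_diagonal(W, 0.0)

   def complete(key, steps=3000):
       # The key half is given; the value half starts as small random noise,
       # the "initial perturbations" (context) that pick which answer you get.
       s = np.r_[key, 0.1 * rng.choice([-1.0, 1.0], size=N)]
       for _ in range(steps):              # asynchronous binary updates
           i = rng.integers(2 * N)
           s[i] = 1.0 if W[i] @ s >= 0 else -1.0
       return s

   out = complete(A)
   for name, v in zip(["AP", "AQ", "BR", "BS", "BT"], stored):
       print(name, np.mean(out == v))      # overlap with each stored pattern

Typically one of AP or AQ comes out with overlap near 1.0, and rerunning with
different initial noise can give the other: a one-to-many completion. Very
similar stored patterns can also blend into a mixture state, which is the
problem with close attractors mentioned above.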
If there were no initial perturbations, then the system would settle to the state it had been most frequently trained on. Easier-to-find descriptions of other associative networks can be found in the PDP books, for example, the room-"schema" example of Rumelhart, Smolensky, McClelland, and Hinton in chapter 14. Tony Plate ----------------------------- Tony Plate Computing Research Laboratory Box 30001 New Mexico State University Las Cruces, New Mexico 88003 (505) 646-5948 CSNET: tap%nmsu From goddard at aurel.caltech.edu Fri Apr 22 14:26:37 1988 From: goddard at aurel.caltech.edu (goddard@aurel.caltech.edu) Date: Fri, 22 Apr 88 11:26:37 -0700 Subject: [TERU: Novelty of Neural Net Approach] Message-ID: <8804221826.AA05192@aurel.caltech.edu> > Conventional Systems Neural Net (Connectionist) Systems > > >8. have complex structure have rather uniform structure >.. >10. designed or programmed with learn from examples; > rules specifying the system may self-organize > behavior Real (biological) neural networks have very complex computational structure. Ask any computational neuroscientist. Connectionist networks will need complex structure to solve hard problems. There is also a significant amount of genetically determined structure in real neural systems. We will also have to pre-structure artificial systems, to bootstrap the subsequent learning. Nigel Goddard From hinton at ai.toronto.edu Thu Apr 21 01:00:21 1988 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Thu, 21 Apr 88 01:00:21 EDT Subject: One to Many? In-Reply-To: Your message of Wed, 20 Apr 88 13:45:38 -0400. Message-ID: <88Apr22.112158edt.27106@ephemeral.ai.toronto.edu> The boltzmann machine architecture allows one-to-many mappings in an unkludgy way, and its a simple architecture. If you insist on being deterministic, you can get BP to give a probability distribution over a set of possible answers, provided each possible answer has its own output unit. Unlike the Boltzmann machine, BP cannot represent probabilites of combinations of output units. For example, a Boltzmann machine can give high probability to the output vectors 01 and 10 and low probability to 11 and 00. BP cannot do this. Geoff From tap at nmsu.csnet Fri Apr 22 20:33:12 1988 From: tap at nmsu.csnet (tap@nmsu.csnet) Date: Fri, 22 Apr 88 18:33:12 MDT Subject: Discussion: Numbers - Connectionist Symbols analogy Message-ID: Consider the following analogy between ways of representing numbers and connectionist representations (proposed to me by a professor of psychology at NMSU, Roger Schvanevelt). There are many ways of representing numbers. '238', '11101110', 'CCXXXIIX', 'two-hundred and thirty-eight', 'e^5.4722706736' (e^ln(238)), 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', and 'X', a pointer to any of the above, are all representations of 238. Some are more useful than others, and sometimes the usefulness of a representation depends on what we want to do with it. All of these representations for 238, with the exception of the last two, have some structure. What makes some of these representations useful in certain situations is that the structure of the representation itself makes immediately apparent the relevant properties of the thing it refers to. 
And when all properties of the symbol are irrelevant to what is being done with it, the pointer representation is a perfectly adequate representation. The roman numeral representation is horrible for arithmetic, (though quite suitable for some other tasks, such as labelling). In Roman times very few people knew how to multiply, and one reason was that the algorithm for multiplying with roman numbers is very long and tedious, and difficult to understand and remember. Some historians have suggested that the Roman's representation for numbers is the reason that their acheivements in arithmetic and mathematics did not match their technical acheivements in other areas. Now, taking language as a domain, and words and their meanings as the things to be represented, what is it that some people like about connectionist representations of them? I think it is that the connectionist representations make the relevant properties immediately apparent. This is the case in distributed 'micro-feature' representations of words and meanings. So, the upshot of this analogy is that doing AI with list-based representations is like doing arithmetic with roman numbers, i.e. possible, but difficult and a hindrance to the development of the field. And the final question is: Can connectionism provide the "positional base-encoding" for symbols that represent the objects that AI needs to manipulate?. All of these points have been made previously, but it seems to me that putting them in the context of this analogy adds a certain (false?) coherence and force to them. Comments? ----------------------------- Tony Plate Computing Research Laboratory Box 30001 New Mexico State University Las Cruces, New Mexico 88003 (505) 646-5948 CSNET: tap%nmsu From jose at tractatus.bellcore.com Mon Apr 25 10:19:16 1988 From: jose at tractatus.bellcore.com (Stephen J. Hanson) Date: Mon, 25 Apr 88 10:19:16 EDT Subject: One to Many? Message-ID: <8804251419.AA06160@tractatus.bellcore.com> .... Unlike the Boltzmann machine, BP cannot represent probabilites of combinations of output units. For example, a Boltzmann machine can give high probability to the output vectors 01 and 10 and low probability to 11 and 00. BP cannot do this. ------ However, this is just the sort of thing one can do with multiple hidden layers, in which the next hidden layer is making (certainty/uncertainty) decisions about combinations of units below....of course with the caveat that learning time increases and learning efficacy decreases...comparatively BMs take a bit of training time as well.. Steve (jose at bellcore.com) From tenorio at ee.ecn.purdue.edu Mon Apr 25 10:35:31 1988 From: tenorio at ee.ecn.purdue.edu (Manoel Fernando Tenorio) Date: Mon, 25 Apr 88 09:35:31 EST Subject: One to Many? In-Reply-To: Your message of Thu, 21 Apr 88 01:00:21 EDT. <88Apr22.112158edt.27106@ephemeral.ai.toronto.edu> Message-ID: <8804251435.AA14986@ee.ecn.purdue.edu> >> >> If you insist on being deterministic, you can get BP to give a probability >> distribution over a set of possible answers, provided each possible answer >> has its own output unit. Unlike the Boltzmann machine, BP cannot represent >> probabilites of combinations of output units. For example, a Boltzmann >> machine can give high probability to the output vectors 01 and 10 and low >> probability to 11 and 00. BP cannot do this. >> >> Geoff It seems that this BP characteristic has to do more with the architectural structure (number of links) than with the algorithm itself. 
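A toy numerical check of the point Hinton raised earlier in this thread, using
made-up numbers: a deterministic backprop net with an independent sigmoid unit
per output bit, trained on a one-to-many case (targets 01 and 10, each half
the time), converges to the marginal probabilities and so cannot distinguish
"01 or 10" from "00 or 11".

   import numpy as np

   rng = np.random.default_rng(3)
   sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

   x = np.array([1.0])                     # a single input "case"
   targets = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]   # 01 and 10, equally often

   W = rng.normal(0, 0.1, (2, 1))          # two independent sigmoid output units
   lr = 0.5
   for step in range(5000):
       t = targets[step % 2]
       y = sigmoid(W @ x)
       W += lr * np.outer(t - y, x)        # cross-entropy gradient step

   print(sigmoid(W @ x))                   # -> roughly [0.5, 0.5]

[0.5, 0.5] is also what the net would report if the targets had been 00 and
11: only the per-unit marginals are represented, not the correlation between
the output units. Giving each whole answer its own output unit (or using a
stochastic machine such as the Boltzmann machine) avoids this.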
An interesting experiment would be to enhance the architecture by allowing
links between units in the same layer (input or output), and BP, or even a
generalized version of it (i.e. Almeida ICNN87). If you allow the information
to be captured by the proper number of links, the network will make a good
attempt at it. This is the same message we got from using a hidden layer.

--ft.

From netlist at psych.Stanford.EDU Tue Apr 26 12:22:23 1988
From: netlist at psych.Stanford.EDU (Mark Gluck)
Date: Tue, 26 Apr 88 09:22:23 PDT
Subject: TODAY: SU Adaptive Networks Colloquium
Message-ID:

Stanford University Interdisciplinary Colloquium Series:
ADAPTIVE NETWORKS AND THEIR APPLICATIONS

**************************************************************************
Apr. 26 (Tuesday, 3:15pm):

   CHRISTOF KOCH
   "Computing Optical Flow in Man and Machine"
   Div of Biology; 216-76
   Calif. Inst. of Technology
   Pasadena, Ca 91125
**************************************************************************

ABSTRACT

The key problem in computing motion from the time-varying image intensity is
the aperture problem. Using a form of smoothness constraint, i.e. the computed
optical flow should be (1) compatible with the measured data, and (2) as
smooth as possible, leads to a variational formulation. The resulting energy
functional can be minimized using different networks. Choosing an "analog" or
"frequency" representation of velocity leads to a simple resistive network,
built out of linear resistances and current and voltage sources. We are
currently implementing these networks in VLSI circuits. Choosing the "place"
or "unit" representation, in agreement with cortical physiology, leads to a
different network, with a much higher connectivity and non-linear neurons. We
are mapping these neuronal networks onto the primate's visual system,
simulating the X and Y pathways into V1 and the subsequent motion field
computation in MT.

* * *

Location: Room 380-380C, which can be reached through the lower level
courtyard between the Psychology and Mathematical Sciences buildings.

Information: To be placed on an electronic mail distribution list for
information about these and other adaptive network events in the Stanford
area, send email to netlist at psych.stanford.edu. For additional information,
contact Mark Gluck, Bldg. 420-316; (415) 725-2434 or email to
gluck at psych.stanford.edu

From tenorio at ee.ecn.purdue.edu Tue Apr 26 12:58:42 1988
From: tenorio at ee.ecn.purdue.edu (Manoel Fernando Tenorio)
Date: Tue, 26 Apr 88 11:58:42 EST
Subject: One to Many?
In-Reply-To: Your message of Tue, 05 Apr 88 00:00:37 EDT.
             <88Apr26.120627edt.27126@ephemeral.ai.toronto.edu>
Message-ID: <8804261658.AA01192@ee.ecn.purdue.edu>

>> I disagree. Its nothing to do with the architecture. Its simply
>> that deterministic units cannot REPRESENT higher-order statistics
>> over the output units.
>>
>> Geoff

Let me explain my point of view. I hope I understand your argument as posed in
your first email. Basically, the stochastic machinery used in the BM captures
what would appear to be a covariance between the units. We have been trying to
extend the BM to the continuous case, and that seems to be true. Now, in the
case of deterministic units, given the proper transfer function
(non-linearity) and the proper information (required number of links between
units), one can design networks to capture a variety of different features.
Notice that deterministic units might not necessarily be using a sigmoid
function; they can use a series of more complex parameterized transfer
functions, such as the GMDH algorithm (Molnar ICNN87), or spherically and
polynomially graded units (Hansen and Burr gte TR 87), or even the
Multivariate Normal Distribution units that we are experimenting with. Some
problems, with special characteristics of the input pattern, allow the regular
quasi-integrator to define a function similar to a Bayes classifier which
optimizes MAP. Of course, if such restrictions on the input type are removed,
the transfer function has to be modified accordingly, and sometimes more links
are also required to capture certain statistical characteristics.

I really don't see how that is a function only of whether the net is DET or
STOCH, rather than of the unit transfer function and architectural
characteristics. If you modify the connection scheme in the BM, it would no
longer capture the same form of statistics, although the algorithm would
remain the same (sort of obvious, I guess). Similarly, if links are added
between output units in DET nets, interdependence would be more easily
captured. One could even imagine schemes where output unit activation would go
to a context unit, and then back to the output unit (similar to JLElman CRL
TR8801 UCSD), to capture temporal covariances. Even simpler would be interunit
links with a momentum term set for about 1 cycle.

--ft

From hinton at ai.toronto.edu Tue Apr 5 00:00:37 1988
From: hinton at ai.toronto.edu (Geoffrey Hinton)
Date: Tue, 5 Apr 88 00:00:37 EDT
Subject: One to Many?
In-Reply-To: Your message of Mon, 25 Apr 88 08:35:31 -0400.
Message-ID: <88Apr26.120627edt.27126@ephemeral.ai.toronto.edu>

I disagree. It's nothing to do with the architecture. It's simply that
deterministic units cannot REPRESENT higher-order statistics over the output
units.

Geoff

From mike%bucasb.bu.edu at bu-it.BU.EDU Tue Apr 26 18:50:55 1988
From: mike%bucasb.bu.edu at bu-it.BU.EDU (Michael Cohen)
Date: Tue, 26 Apr 88 18:50:55 EDT
Subject: INNS 89 Conference
Message-ID: <8804262250.AA20211@bucasb.bu.edu>

April 26, 1988

GOOD NEWS FOR THE NEURAL NETWORK COMMUNITY!

There are now over 2000 members of the International Neural Network Society
from 34 countries and 47 states of the U.S.A. The INNS is thus beginning to
fulfill its purpose of offering our community an intellectual home of its own.
In particular, over 500 abstracts were submitted to the 1988 First Annual INNS
meeting in Boston, to be held on September 6--10, 1988, at the Park Plaza
Hotel. The abstracts cover the full spectrum of topics in the neural network
field.

While many are working hard on the final program and plans for the 1988
meeting, we also needed to plan further ahead. Accordingly, the INNS Governing
Board approved holding the Second Annual INNS Meeting in Washington, DC, on
September 5--9, 1989, and we have negotiated a contract with the Omni Shoreham
Hotel.

See you in Boston in '88 and Washington in '89!

Steve Grossberg, President, INNS
Demetri Psaltis, Vice President, INNS
Harold Szu, Secretary-Treasurer, INNS

---- Michael Cohen ----
Center for Adaptive Systems, Boston University (617-353-7857)
Email: mike at bucasb.bu.edu
Smail: Michael Cohen
       Center for Adaptive Systems
       Department of Mathematics, Boston University
       111 Cummington Street
       Boston, Mass 02215

From tenorio at ee.ecn.purdue.edu Tue Apr 26 20:25:14 1988
From: tenorio at ee.ecn.purdue.edu (Manoel Fernando Tenorio)
Date: Tue, 26 Apr 88 19:25:14 EST
Subject: One to Many?
In-Reply-To: Your message of Tue, 05 Apr 88 00:00:37 EDT.
             <88Apr26.120627edt.27126@ephemeral.ai.toronto.edu>
Message-ID: <8804270025.AA14453@ee.ecn.purdue.edu>

I am sorry for the mistake in the reference of the previous message. The
second report is:

   Hanson, S. J. and Burr, D. J. Knowledge Representation in Connectionist
   Networks. Technical Report, Bell Communications Research, 435 So. St.,
   Morristown, NJ, 07960.

--ft.

From terry at cs.jhu.edu Wed Apr 27 09:40:06 1988
From: terry at cs.jhu.edu (Terry Sejnowski)
Date: Wed, 27 Apr 88 09:40:06 edt
Subject: NETtalk database
Message-ID: <8804271340.AA29415@crabcake.cs.jhu.edu>

There have been many requests for the NETtalk database. A training dictionary
of 20,000 words marked with phonemes and stresses is now available from:

   Kathy Yantis
   Cognitive Science Center
   Johns Hopkins University
   34th and Charles Streets
   Baltimore, MD 21218

Please specify the media you want:

   1/2" tape, 9 track, 1600, 3200 or 6250 bpi, UNIX or ANSI labelled
       (VMS compatible)
   1/4" Sun cartridge (Quick-11, TAR)
   5 1/4" 1.2 MB floppy (MS-DOS)

Enclose a check or money order for $50 to cover costs made out to:
Johns Hopkins Cognitive Science Center.

Terry Sejnowski

-----

From dad at cs.brown.edu Thu Apr 28 15:09:31 1988
From: dad at cs.brown.edu (David A. Durfee)
Date: Thu, 28 Apr 88 15:09:31 EDT
Subject: I'd like to join the mailing lists
Message-ID:

From mike%bucasb.bu.edu at bu-it.BU.EDU Wed Apr 27 20:24:30 1988
Received: by cs.brown.edu (5.51/1.00) id AA29649; Wed, 27 Apr 88 20:24:24 EDT
Received: from bucasb.bu.edu by bu-it.BU.EDU (4.0/4.7) id AA16976;
          Wed, 27 Apr 88 20:22:55 EDT
Return-Path:
Received: by bucasb.bu.edu (5.51/4.7) id AA08789; Wed, 27 Apr 88 20:23:27 EDT
Date: Wed, 27 Apr 88 20:23:27 EDT
From: mike%bucasb.bu.edu at bu-it.BU.EDU (Michael Cohen)
Message-Id: <8804280023.AA08789 at bucasb.bu.edu>
To: dad at cs.brown.edu
In-Reply-To: "David A. Durfee"'s message of Wed, 27 Apr 88 17:25:59 EDT
             <8804272329.AA16310 at bu-cs.bu.edu>
Subject: I'm very interested in Neural Nets

I don't manage a mailing list. Try to subscribe to one of the lists in my
message, like neuron at ti-csl.csc.ti.com or connectionists at c.cs.cmu.edu

Michael Cohen
---- Center for Adaptive Systems, Boston University (617-353-7857)
Email: mike at bucasb.bu.edu
Smail: Michael Cohen
       Center for Adaptive Systems
       Department of Mathematics, Boston University
       111 Cummington Street
       Boston, Mass 02215

From 8414902 at UWACDC.ACS.WASHINGTON.EDU Fri Apr 29 02:15:00 1988
From: 8414902 at UWACDC.ACS.WASHINGTON.EDU (TERU)
Date: Thu, 28 Apr 1988 23:15 PDT
Subject: Structure of Neural Nets
Message-ID:

>Real (biological) neural networks have very complex computational structure.
>Ask any computational neuroscientist. Connectionist networks will
>need complex structure to solve hard problems.
There is also a >significant amount of genetically determined structure in real neural >systems. We will also have to pre-structure artificial systems, to >bootstrap the subsequent learning. > >Nigel Goddard Surely biological neural nets have complex structure: several types of neurons, synapses, neurotransmitters, ion-channels, distinct layers, nuclei, commissures, etc. I agree that the artificial neural nets need these structural complexity to solve real-world problems. Certainly research activities are pointing to that direction. The benefit of layered structure has already been well-known. Modular structure is interesting so that it expands the net's heterogeneous structure while maintaining rather uniform structure in a layer within a module. Each module may be treated and controlled as a unit at another level. Another step, for example, will be to somehow build functionally different types of neurons in the net. I will appreciate comments and pointers concerning these issues of hierarchy, modularity and neuron-types in neural nets biological or artificial. - Teru Homma From dad at cs.brown.edu Fri Apr 29 07:10:14 1988 From: dad at cs.brown.edu (David A. Durfee) Date: Fri, 29 Apr 88 07:10:14 EDT Subject: could I be added to your mailing list? Message-ID: From mike%bucasb.bu.edu at bu-it.BU.EDU Wed Apr 27 20:24:30 1988 Received: by cs.brown.edu (5.51/1.00) id AA29649; Wed, 27 Apr 88 20:24:24 EDT Received: from bucasb.bu.edu by bu-it.BU.EDU (4.0/4.7) id AA16976; Wed, 27 Apr 88 20:22:55 EDT Return-Path: Received: by bucasb.bu.edu (5.51/4.7) id AA08789; Wed, 27 Apr 88 20:23:27 EDT Date: Wed, 27 Apr 88 20:23:27 EDT From: mike%bucasb.bu.edu at bu-it.BU.EDU (Michael Cohen) Message-Id: <8804280023.AA08789 at bucasb.bu.edu> To: dad at cs.brown.edu In-Reply-To: "David A. Durfee"'s message of Wed, 27 Apr 88 17:25:59 EDT <8804272329.AA16310 at bu-cs.bu.edu> Subject: I'm very interested in Neural Nets Status: RO I don't manage a mailing list. Try to subscribe to one of lists in my message like neuron at ti-csl.csc.ti.com or connectionists at c.cs.cmu.edu Michael Cohen ---- Center for Adaptive Systems Boston University (617-353-7857) Email: mike at bucasb.bu.edu Smail: Michael Cohen Center for Adaptive System Department of Mathematics, Boston University 111 Cummington Street Boston, Mass 02215 From BISON%HNYKUN53.BITNET at VMA.CC.CMU.EDU Fri Apr 29 15:56:00 1988 From: BISON%HNYKUN53.BITNET at VMA.CC.CMU.EDU (BISON%HNYKUN53.BITNET@VMA.CC.CMU.EDU) Date: Fri, 29 Apr 88 15:56 N Subject: subscription Message-ID: I would like to join the connectionist mailing list. At the moment I'm working on a connectionist model to simulate aphasic language production. Thanks, Pieter Bison Department of Psychology University of Nijmegen P/O box 9104 BITNET: bison at hnykun53 6500 HE Nijmegen, Netherlands From wfreeman%garnet.Berkeley.EDU at violet.berkeley.edu Fri Apr 29 23:58:16 1988 From: wfreeman%garnet.Berkeley.EDU at violet.berkeley.edu (wfreeman%garnet.Berkeley.EDU@violet.berkeley.edu) Date: Fri, 29 Apr 88 20:58:16 pdt Subject: No subject Message-ID: <8804300358.AA23282@garnet.berkeley.edu> To: connectionists 29 apr 88 From: wfreeman at garnet Re: a physiologist's view of connectionism i'd like some feedback on this essay before it gets frozen into print and invite commentary of any sort. thanks in advance, walter Why neural networks don't yet fly: inquiry into the neurodynamics of biological intelligence. 
Walter J Freeman Department of Physiology-Anatomy University of California Berkeley CA 94720 USA 2nd Annual Intern. Conf. on Neural Networks San Diego CA 23 - 27 July 1988 Abstract Sensory and perceptual information exists as space-time patterns of neural activity in cortex in two modes: axonal pulses and dendritic currents. Which one we observe depends on the experimental techniques we choose in order to make our observations. The brain does its analysis of sensory input, as in feature extraction and preprocessing, in the medium of action potentials as point processes in networks of individual neurons. It does syn- thesis of its sensory input with past experience and ex- pectancy of future action in the medium of dendritic in- tegration in local mean fields. Both kinds of activity are found to coexist in olfactory and visual cortex, each preceding and then following the other. The transforma- tion of information from the pulse mode to the dendritic mode involves a state transition of the cortical network that can be modeled by a Hopf bifurcation in both software and hardware embodiments. This state transition appears to be an essential precursor to an act of neural pattern classification. However, the models suggest that the classification of a given stimulus into one of several learned classes is done by a mapping of the stimulus into a landscape that has been shaped by prior learning, and that it is not done by a multiple bifurcation into one of a collection of limit cycle attractors at the moment of choice. Introduction The strongest justification at present for the study of neural net- works is the inspiration they draw from the performance characteristics of their biological cousins. Yet it is often unclear what is to be copied and what omitted. John Denker among others has pointed out that both birds and airplanes have wings, but that only birds have feathers. While it is true that brains and neural networks share certain structural features such as massive parallelism, biological networks solve complex problems easily and creatively, and existing neural networks do not. Whereas Wilbur and Orville Wright solved first the problems of lift and then of control in flight, neural networkers have solved only the problems of statics and not the problems of dynamic control. Neural networks have not yet begun to soar. One reason I will argue here is that most theoreticians have pursued a false goal of stability, and have not reckoned with the intrinsic instabil- ity of wetware brains that enables their remarkable adaptiveness. A related key limitation in many current approaches is the lack of application by engineers of the hierarchical modes in which wetware brains sustain information for storage, transformation, and other operations. There are two documented modes that in some senses are diametrically op- posed but in other senses are strongly complementary. Probably others ex- ist, but they need not concern us here. One is typified by the action po- tential and the point process, the other by the synaptic potential and the local mean field. In sensory systems the one is the basis for feature ex- traction, preprocessing and analysis. The other is the basis for experien- tial integration, classification and synthesis. Neither can supplant or function without the other. They coexist in the same layers of neurons, and whether we observe one or the other depends on how we acquire, process and measure our biological data. 
My aim in this brief review is to exemplify these two modes of infor- mation, describe how they are derived from brains and how they are convert- ed each to the other, and explain their significance for the design of new and more successful neural networks. Examples of biological information Information in biological networks of the kind I am concerned with here takes the form of space-time neural activity patterns. Each pattern is relational and neither symbolic nor representational. It does not "stand for" something outside the brain, as a letter does in an alphabet, nor does it reside in fixed form as a goal or a "teacher". It is a dynamic process that mediates adaptive behavior. It results from a stimulus and in some sense causes a response, but it also incorporates past experience and the intent of future action. These being private and unique to each brain, we cannot in principle as observers know the exact information content of each pattern or even the coordinate system in which it is embedded. What we can do is to establish statistically the relation of a given space-time pattern of neural activity to an antecedent or a consequent event in the outside world. We do this by repeatedly presenting each of two or more stimuli to a subject and then demonstrating some invariant con- tiguity between each stimulus and a consequent neural activity pattern. Because we do not know the metric of the internal computational spaces, we must collect numerous input-output pairs and rely on statistical invariants that emerge from one or another form of ensemble averaging. For the point process each ensemble is collected over time from one or more points, and for the mean field it is collected simultaneously at multiple points in space in the form of a set of recordings in time. The distinction is cru- cial though subtle. I will cite examples from the primary visual cortex and from the ol- factory bulb, a specialized form of sensory cortex that is located close to the input of the olfactory system. The paradigmatic experiment in the pulse mode in olfaction consists in locating a single neuron in the bulb with a microelectrode, presenting in succession odorants A, B, C,... at the same or different concentrations, and measuring the pulse firing rate of the neuron. This is repeated for neuron i, ii, iii,... at different spatial locations in the bulb. The results are presented in the form of a table, which shows that each odorant over some concentration range (typically narrow) excites some neurons but not most others, indicating that each odorant establishes a spatial pattern of selective activation in the bulb, putatively resembling a constellation of stars in the night sky, although each neurons typically responds to a variety of odorants. This is a form of labeled line coding, with pulse rate or probability as the state variable for each line, channel or axon.. The paradigmatic experiment in the wave mode is to record the elec- troencephalogram (EEG) from an array of macroelectrodes (optimally 64) placed on the surface of the bulb. All of the simultaneously recorded EEG traces contain a common waveform or carrier that differs across the spatial array in amplitude. Odorant-specific patterns of amplitude i, ii, iii,... are seen to recur on presentation of odorants A, B, C,..., but only if the subjects are trained to discriminate them each from the others and only if they are motivated to do so (Skarda & Freeman, 1987). Learning and arousal are both essential. 
The odorant information is expressed as a spatial am- plitude modulation of the common carrier for the duration of a sniff, on the order of 0.1 sec. It can be likened to a monochromatic half-tone pic- ture in a newspaper. The information density is spatially uniform, because no one dot in the picture carries by its size any more or less information than any other. The carrier is identified by making a spatial ensemble average of the 64 traces that are recorded during a sniff and then regress- ing this ensemble average onto each unaveraged EEG trace to derive its am- plitude coefficient. One cannot use time ensemble averaging over sniffs, because the spectrum of the carrier and its phase relations to the initiat- ing stimulus vary unpredictably across inhalations. The result of measure- ment is a 64x1 vector that expresses the spatial pattern of amplitude. These two kinds of information, pulse and wave, coexist in each area of the bulb, and in other stages of the olfactory system as well. Whether one sees the one or the other kind depends on the experimental procedures that one uses, which in turn depends on one's goals and hypotheses. Comparable results hold for the primary visual cortex. The well-known paradigm in the pulse mode is to measure the pulse rate of a single neuron while repeatedly presenting patterned light stimuli to the retina so as to define its receptor field. This is repeated for a large number of neurons, and the results are presented in the form of graphs showing the spatial structures of orientation and ocular dominance columns and the topographic mapping of the retina onto the dozen or more specialized subareas of the visual cortex for color, motion detection, etc. The inference is made that "features" of the visual world are extracted and mapped spatially onto the cortex in the firing rates of labeled lines. The information is said to be encoded in the pulse trains of single neurons. Activity in the wave mode is likewise recorded with arrays of ma- croelectrodes on the visual cortex of an awake, motivated, trained Rhesus monkey (Freeman & van Dijk, 1987). A common carrier is retrieved from the EEG traces by linear decomposition, and its spatial pattern is expressed in the matrix of coefficients that are obtained by fitting each trace to the spatial ensemble average. A specific, identifiable spatial pattern of am- plitude modulation is found to recur on each trial when the motivated sub- ject is inferred to be discriminating a specific visual cue. This is evi- dence for distributed coding of information similar to the wave mode of ol- factory coding. Evidence for this mode of activity has also been found re- cently in the visual cortex of the cat (Gray & Singer, 1988). Again, it is apparent that these two kinds of information in the pulse and wave modes coexist in the cortex. Neural mechanisms of transformation In this section I will consider the mechanisms by which the discrete activity in the pulse mode is transformed into the wave mode and then back again. In doing so I will draw on experiments in software (Freeman & Yao, 1988) and hardware (Freeman, Eisenberg & Burke, 1987) modeling of cortical dynamics. I will argue that the activity patterns in the pulse mode con- stitute the end result of stimulus analysis by neural preprocessing, and that the patterns in the wave mode manifest the results of spatial integra- tion of the pulse activity with past experience and present motivational state. The conversion from pulses to waves takes place at synapses. 
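The carrier-and-coefficient measurement described above for the olfactory EEG
(spatial ensemble average over the 64 traces of one sniff, then fitting each
unaveraged trace to that average) can be sketched numerically. The "EEG" below
is synthetic data invented for the illustration; the array size and noise
level are arbitrary, and this is not the actual analysis code.

   import numpy as np

   rng = np.random.default_rng(4)
   n_chan, n_samp = 64, 128                # 64 electrodes, one sniff of samples

   # Synthetic stand-in: a common carrier waveform whose amplitude varies
   # across the electrode array, plus activity not shared by the array.
   carrier_true = np.sin(np.linspace(0, 8 * np.pi, n_samp))
   amplitude_true = rng.uniform(0.5, 2.0, n_chan)
   eeg = np.outer(amplitude_true, carrier_true) + 0.3 * rng.normal(size=(n_chan, n_samp))

   # Step 1: estimate the common carrier as the spatial ensemble average of
   # the 64 traces recorded during the sniff (no averaging over sniffs).
   carrier = eeg.mean(axis=0)

   # Step 2: fit each unaveraged trace to the ensemble average (least-squares
   # regression coefficient), giving the 64x1 spatial amplitude pattern.
   coeffs = eeg @ carrier / (carrier @ carrier)

   print(coeffs.shape)                                  # (64,) amplitude vector
   print(np.corrcoef(coeffs, amplitude_true)[0, 1])     # near 1 on this toy data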
There are many kinds and locations of modifiable synapses, two of which are particularly important for information processing. One type is the primary synapse between an incoming axon and its target cortical neuron. It is subject to change in respect to recent use by nonspecific facilitation or by attenuation in respect to the local volume of input into a neighborhood. Attenuation is a multiplicative form of inhibition that operates in processes of dynamic range compression and signal normalization.

The other type supports the long range excitatory connections that form innumerable feedback loops of mutual excitation within a cortical layer. These secondary synapses among cortical neurons are subject to change in respect to associative learning in accordance with some variant of the Hebb rule. The matrix of numbers representing the strengths of synaptic action corresponds to the W matrix of Amari (1977) and the T matrix of Hopfield (1982). When an animal is trained to discriminate an odor A, B, C,... a Hebbian nerve cell assembly is formed among the cortical neurons by the strengthened synaptic connections between each pair of coactivated neurons (Freeman, 1968, 1975). This nerve cell assembly is a basis for the classification of odorants by trained subjects (Skarda & Freeman, 1987).

Conversion of waves to pulses occurs at trigger zones of neurons, where the sum of dendritic currents regulates the firing rate of each neuron. The relation between membrane current and pulse density, both of which are continuous variables, has the form of a sigmoid. The range between threshold (zero firing rate) and asymptotic maximum is much narrower than a comparable input-output sigmoid relation at the synapses, so that as a general rule the pulse-wave conversion takes place in a small-signal near-linear range. It follows from this and related considerations that the operation of the local neighborhood can be expressed as a linear time-invariant integrator cascaded with a static nonlinear bilateral saturation function (Freeman, 1967, 1968, 1975).

An important feature that distinguishes the biological sigmoid from its neural network cousins is the finding that the maximal slope of the curve and thereby the maximal gain of the local ensemble is displaced to the excitatory side (Freeman, 1979). Input not only excites neurons, it increases their forward gains. Furthermore, the slope of the curve is increased with factors that increase arousal and motivation in animals. When a stimulus is given to which a subject has been sensitized by discriminative training, so that a nerve cell assembly has been formed, the re-excitation within the assembly is enhanced by both the input-dependent nonlinearity and by arousal. The result is regenerative feedback in a high-gain system that leads to instability. The large collection of interconnected and interactive ensembles undergoes a state transition from a prestimulus state to a stimulated state.

The state prior to entry of input is low-amplitude and low-gain, so that neurons not interacting strongly with each other are free to react to input on incoming lines. When sufficient input arrives to one or another nerve cell assembly in an aroused subject, the amplitude and gain both increase, and the neurons strongly interact with each other.
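[Editorial sketch] To make the asymmetry concrete, here is a small sketch of a wave-to-pulse sigmoid of the general kind described above, with the gain maximum displaced to the excitatory side and raised by an arousal-like parameter. The particular formula, the clamp at zero, and the parameter values are illustrative assumptions, not the published curve from Freeman (1979).

import numpy as np

def pulse_density(v, q_max=5.0):
    """Illustrative asymmetric wave-to-pulse sigmoid.

    v     : dendritic current (wave-mode state variable), arbitrary units
    q_max : asymptotic maximum pulse density; here it doubles as an
            arousal-like parameter that raises the peak gain.
    """
    q = q_max * (1.0 - np.exp(-(np.exp(v) - 1.0) / q_max))
    return np.maximum(0.0, q)        # zero firing below threshold, saturation at q_max

v = np.linspace(-2.0, 4.0, 601)
for q_max in (2.0, 5.0, 10.0):
    q = pulse_density(v, q_max)
    gain = np.gradient(q, v)
    v_at_peak = v[np.argmax(gain)]           # falls near v = ln(q_max) > 0,
    print(q_max, round(v_at_peak, 2),        # i.e. on the excitatory side,
          round(gain.max(), 2))              # and the peak gain grows with q_max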
In this highly interactive state the information that each neuron received during the preceding input state is disseminated rapidly over the entire extent of an interactive area of neurons, apparently within a few milliseconds over regions comprising many square mm or cm of surface area and many millions of neurons. The spatial density of the information becomes uniform, just as it does in a 2-dimensional Fourier transform of a visual scene.

Simulations of these transitions have shown that the input information in the pulse mode is not degraded or lost in the conversion from the pulse mode to the wave mode. The point-wise input is mapped under spatial integration into a distribution of spatial activity that introduces the past experience of the subject through the nerve cell assemblies and the present state of expectation embodied in the factors relating to arousal and motivation, that is, brain state with respect to future action. When it is read out, both in the brain and in the models that simulate the process, the output is coarse-grained at the surface and summed in absolute value or by squaring to give a positive quantity at each channel, which re-establishes the pulse mode at the input to the next stage. In the process of coarse-graining the output from the preceding stage is "cleaned" under spatiotemporal integration to attenuate all activity that is not shared by the entire transmitting array of neurons. Only the cooperative activity that is shared by the whole is successfully injected into the next succeeding stage. This completes the inverse transformation back to the pulse mode.

In brief, input on labeled lines that is injected by axons into the cortex can destabilize the neural mass, depending on past experience and present motivation, and the cortex can converge to a distributed pattern of fluctuating activity that expresses that confluence of stimulus, experience and expectation. The key to understanding is the state transition that changes the properties of the cortex and extends the information from the local to the distributed mode. It is an input-induced transition from a low-energy disordered state to a high-energy more ordered state (Prigogine, 1984; Skarda & Freeman, 1987).

Transmission outside of the cortex has not yet been studied, but I postulate that similar state transitions may occur in subcortical masses, so that with each transfer of information from one brain mass to another, there is injection of information on labeled lines, transition to integration in the wave mode, and reconversion of the integrand to a labeled line pattern on the output channels, the last neural output being the discharges of motor neurons in the brainstem and spinal cord.

Implications for neural networks

The current theory and practice of neural networks has incorporated many of the important features of the static design of nervous systems, particularly those based on parallel feedforward nets, but has neglected the dynamics of real nervous systems in favor of unrealistic abstractions that do not do justice to the ceaseless fluctuations of neural activity. These are commonly designated as "noise" and removed by stimulus-locked time ensemble averaging in order to impose the ideal of an invariant baseline that precedes the stimulus arrival, or to which the system converges as it "learns" or "perceives".
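[Editorial sketch] A small sketch of the readout step just described, combined with the distance-based classification that the following passage attributes to the software and hardware models: the wave-mode output is coarse-grained per channel by squaring and summing over a frame, and the resulting amplitude vector is compared with stored class templates by Euclidean distance. The frame length, template construction, and synthetic data are placeholder assumptions, not the published procedure.

import numpy as np

rng = np.random.default_rng(2)

def channel_readout(wave_frame):
    """wave_frame: (n_channels, T) wave-mode activity for one frame.
    Square and sum over time to give one positive quantity per channel."""
    return np.sum(wave_frame ** 2, axis=1)

def classify(readout, templates):
    """Return the label of the template nearest in Euclidean distance."""
    return min(templates, key=lambda k: np.linalg.norm(readout - templates[k]))

def make_frame(pattern, T=50):
    # synthetic frame whose expected per-channel power follows `pattern`
    return rng.normal(size=(len(pattern), T)) * np.sqrt(pattern)[:, None]

patterns = {"odorant_A": rng.uniform(0.2, 1.0, 64),
            "odorant_B": rng.uniform(0.2, 1.0, 64)}
templates = {k: channel_readout(make_frame(p)) for k, p in patterns.items()}
test = make_frame(patterns["odorant_A"])
print(classify(channel_readout(test), templates))   # usually "odorant_A"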
It is this artificial creation of a stable equilibrium to represent the desired state of neural networks, the incorporation of it as a design criterion, that has crippled their performance. Wet nervous systems do not have equilibria except under deep anesthesia, surgical isolation, or near-terminal damage of one kind or another. These reduced states are in fact useful for measuring the open loop time and space constants of parts of wet nervous systems, but clearly, and in the case of general anesthesia by definition, there is no information processing in such reduced systems. Instead, nervous systems appear to be designed by evolution to be destabilized by input. They seek input as a means of inducing state changes that disseminate and integrate fresh input with past experience as the basis for impending action.

There is no intrinsic hardware or software barrier to constructing neural networks that have the properties of effecting these transformations. We have demonstrated the principles by which they occur in simple systems that are built with well-known components and algorithms, used in novel ways as dictated by the theory and the correspondence of performance to the wetware (Freeman, Eisenberg & Burke, 1987; Freeman & Yao, 1988). The key attributes are the biological sigmoid curve, the associational connectivity that is subject to modification by learning, the variable global gain under motivational factors, and, most importantly, the ability to change from a low-level receiving state to a high-level transmitting state.

In the low-level state the input injects information into the system. In the high-level state induced by the input in the prepared system, the information is integrated, globally distributed, and incorporated into a novel form of display. The forms of spatial integration of the output in the wetware brain are not yet known. We replace them with a simple Euclidean distance measure in n-space, where n is the number of channels simulated, each with its amplitude of output of the common carrier. The models show robust abilities for rapid classification of input into learned categories despite the presence of noise, incomplete inputs, overlap of templates and component variability in the case of the hardware embodiment.

In these models the input on labeled lines induces a global oscillation by a state transition corresponding to a Hopf bifurcation. The classification is performed by the use of a Euclidean distance measure in 64-space. The output is by step functions on labeled lines from a decision function operating on the distributed pattern in the wave mode. Convergence to a pattern depends on the input and not on the initial conditions. Classification succeeds well before asymptotic convergence to a steady state. Thereby the frame rate for successive input samples can exceed 10/sec, so that a fluctuating and unpredictable environment can be tracked by the rapidly adapting device.

References

Amari S (1977a) Neural theory of association and concept-formation. Biological Cybernetics 26: 175-185.
Amari S (1977b) Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics 27: 77-87.
Freeman WJ (1967) Analysis of function of cerebral cortex by use of control systems theory. Logistics Review 1: 5-40.
Freeman WJ (1968) Analog simulation of prepyriform cortex in the cat. Mathematical Biosciences 2: 181-190.
Freeman WJ (1975) Mass Action in the Nervous System. Academic Press, New York.
Freeman WJ (1979) Nonlinear dynamics of paleocortex manifested in the olfactory EEG. Biological Cybernetics 35: 21-37.
Freeman WJ, Eisenberg J & Burke B (1987) Hardware simulation of dynamics in learning: the SPOCK. Proceedings 1st Int. Conf. Neural Networks, San Diego, III: 435-442.
Freeman WJ & van Dijk B (1987) Spatial patterns of visual cortical fast EEG during conditioned reflex in a rhesus monkey. Brain Research 422: 267-276.
Freeman WJ, Yao Y & Burke B (1988) Central pattern generating and recognizing in olfactory bulb: a correlation rule. Neural Networks, in press.
Gray CM & Singer W (1988) Nonlinear cooperativity mediates oscillatory responses in orientation columns of cat visual cortex. Submitted.
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc. Nat'l. Acad. Sci. USA 79: 2554-2558.
Prigogine I (1984) From Being to Becoming. Freeman, New York.
Skarda CA & Freeman WJ (1987) How brains make chaos in order to make sense of the world. Behavioral and Brain Sciences 10: 161-195.

Supported by grants MH06686 from the National Institute of Mental Health and 87NE129 from the Air Force Office of Scientific Research.

From tap at nmsu.csnet Sat Apr 30 01:10:38 1988 From: tap at nmsu.csnet (tap@nmsu.csnet) Date: Fri, 29 Apr 88 23:10:38 MDT Subject: Structure of Neural Nets - control In-Reply-To: TERU's message of Thu, 28 Apr 1988 23:15 PDT <8804300128.AA26022@opus> Message-ID:

> Modular structure is interesting so that it expands the
> net's heterogeneous structure while maintaining rather
> uniform structure in a layer within a module. Each module
> may be treated and controlled as a unit at another level.
> - Teru Homma

This raises a whole lot of questions: Do we really want strict modular structure, where each module is controlled as a unit at another (higher) level? -- Some structural modularity may be necessary, but maybe the type of modularity that works best for dealing with the world is a flexible one, where the boundaries and interfaces of modules are flexible and context-dependent. How are these modules at that higher level controlled? - as units at an even higher level? Where does it end? -- it must end in some self-regulated unit or module. Will this result in the type of inflexible control that one sees in many discrete-symbol AI systems? Can we develop principles of self-regulation and apply them directly to all modules, rather than having a hierarchy of control? Is Grossberg's ART a step in this direction?

Nearly all work by connectionists that is related to AI has been on developing better representations for data. Most of the processing has been of a single step (single forward propagation, single relaxation, or single settling), and consequently the control has been very simple. Very little work has been done on problems that require more complex control of processing, such as planning, or analysing sequential input of unbounded length and complexity. The control structures used in discrete-symbol AI for doing this type of processing have the same inflexibilities as the discrete-symbol representations. Can connectionists do for control structures what they are doing for representations? i.e. decompose them and recompose them in a more flexible and accessible way. Can representation of control and representation of data become the same thing?
I mean this in a strong sense: I don't mean that they should just be representable using the same techniques, but rather that they be identical: the representation of data represents by virtue of recording (memory) and creating (recall, action) processes that that data has an effect upon (and there is NOTHING else). Is there any real and/or valuable distinction between process and control, or is it just a convenient way of interpreting complex systems? (and what is process and what is control?). I would claim that discrete-symbol AI has often made this distinction. It is most evident in systems which provide automatic backtracking: the control decisions (which goal to select next, how to backtrack) are usually made by the system and not by the program which is running in it (e.g. most Prologs). I also claim this is bad: systems whose designs contain explicit or implicit distinctions between control and process are doomed to inflexible hierarchical control. But can it be otherwise? -- yes -- some Prologs (e.g. Nu-Prolog) allow the data to affect control decisions - goal selection is dependent upon patterns of instantiation. I think this is a step in the right direction, and one that can be taken much further by connectionism. -- Tony Plate
From netlist at psych.Stanford.EDU Thu Apr 14 09:52:50 1988 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Thu, 14 Apr 88 06:52:50 PDT Subject: TODAY: SU Adaptive Networks Colloquium Message-ID: REMINDER: TODAY Stanford University Interdisciplinary Colloquium Series: ADAPTIVE NETWORKS AND THEIR APPLICATIONS Co-sponsored by the Depts. of Psychology and Electrical Engineering Apr. 14 (Thursday, 3:15pm): DEMETRI PSALTIS "Optical Neural Computers" Caltech 116-18 Pasadena, CA 91125 Additional Information ---------------------- Location: Room 380-380C, which can be reached through the lower level courtyard between the Psychology and Mathematical Sciences buildings. For additional information: Contact Mark Gluck, Bldg. 420-316; (415) 725-2434 or email to gluck at psych.stanford.edu

From pollack at nmsu.csnet Thu Apr 14 19:46:38 1988 From: pollack at nmsu.csnet (pollack@nmsu.csnet) Date: Thu, 14 Apr 88 17:46:38 MDT Subject: TR: Recursive Auto-Associative Memory Message-ID: Recursive Auto-Associative Memory: Devising Compositional Distributed Representations Jordan Pollack MCCS-88-124 Computing Research Laboratory New Mexico State University Las Cruces, NM 88003

A major outstanding problem for connectionist models is the representation of variable-sized recursive and sequential data structures, such as trees and stacks, in fixed-resource systems. Some design work has been done on general-purpose distributed representations with some capacity for sequential or recursive structures, but no system to date has developed its own. This paper presents connectionist mechanisms along with a general strategy for developing such representations automatically: Recursive Auto-associative Memory (RAAM). A modified autoassociative error-propagation learning regimen is used to develop fixed-width representations and access mechanisms for stacks and trees. The strategy involves the co-evolution of the training environment along with the access mechanisms and distributed representations. These representations are compositional, similarity-based, and recursive, and may lead to many new applications of neural networks to traditionally symbolic tasks. Several examples of its use are given.
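[Editorial sketch] A rough structural illustration of the RAAM idea in the abstract above; the network sizes, activation function, and random untrained weights are assumptions for the sketch, not Pollack's reported configuration. An encoder compresses the concatenation of two fixed-width child vectors into one vector of the same width, a decoder reconstructs the children, and applying the encoder recursively yields a fixed-width code for a whole binary tree.

import numpy as np

rng = np.random.default_rng(0)
N = 10                                    # width of every terminal or composite representation
W_enc = rng.normal(0, 0.5, (N, 2 * N))    # compressor: 2N -> N
W_dec = rng.normal(0, 0.5, (2 * N, N))    # reconstructor: N -> 2N

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def encode(left, right):
    return sigmoid(W_enc @ np.concatenate([left, right]))

def decode(parent):
    out = sigmoid(W_dec @ parent)
    return out[:N], out[N:]

# Encoding the tree ((A B) C) into a single fixed-width vector:
A, B, C = (rng.uniform(0, 1, N) for _ in range(3))
tree_code = encode(encode(A, B), C)

# Training (not shown) would adjust W_enc and W_dec by back-propagating the
# autoassociative error || decode(encode(l, r)) - (l, r) ||^2 over the corpus of
# (left, right) pairs, while the pairs themselves are regenerated from the
# evolving codes -- the "co-evolution of the training environment" noted above.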
________________________ To order, send $3 along with a request for "MCCS-88-124" to Technical Reports Librarian Computing Research Laboratory Box 3CRL New Mexico State University Las Cruces, NM 88003 To save the $3 you can bug me about it, or try to get a copy from the person at Yale, Brandeis, UCSD, CMU, UCLA, UMD, UMass, ICSI, or UPenn, who just recently visited Las Cruces.

From WEILI%WPI.BITNET at husc6.harvard.edu Fri Apr 15 17:31:46 1988 From: WEILI%WPI.BITNET at husc6.harvard.edu (WEILI%WPI.BITNET@husc6.harvard.edu) Date: Fri, 15 Apr 88 16:31:46 est Subject: Analog outputs Message-ID: <8804152131.AA18445@wpi.local> I wonder if there are some neural network models which could give analog outputs?

From djb at thumper.bellcore.com Sat Apr 16 13:37:56 1988 From: djb at thumper.bellcore.com (David J. Burr) Date: Sat, 16 Apr 88 12:37:56 est Subject: Analog Outputs Message-ID: <8804161737.AA25586@lafite.bellcore.com> Regarding the question of networks with analog outputs I recommend some earlier papers on elastic mapping algorithms (below). These have true analog outputs since they are based on Green's function solutions in continuous media. In contrast to similar self-organized mapping schemes, they exhibit full symmetry on the input spaces in the sense that the distance from pattern A to B is equal to the distance from B to A. This is based on a generalization of the "winner-take-all" arrangement to a "symmetric winner-take-all". Extension to an "average of k winners" was suggested in the CGIP paper. Symmetry is important as it uses a "push-pull" force field for more accurate correspondence and hence faster convergence. This is useful for pulling together extremities of strokes in character recognition. Popular self-organizing feature maps can be viewed as a special case of elastic mapping without the symmetry component. Symmetric elastic mapping recently found optimum solutions to travelling salesman problems using fewer iterations than a competing analog method (Burr, Snowbird, April 1988). D. J. Burr, Computer Graphics and Image Processing, Vol. 15, 102-112, 1981. D. J. Burr, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-3, No. 6, 708-713, Nov. 1981.

From nowlan at ai.toronto.edu Sat Apr 16 15:03:06 1988 From: nowlan at ai.toronto.edu (Steven J. Nowlan) Date: Sat, 16 Apr 88 15:03:06 EDT Subject: analog outputs In-Reply-To: Your message of Fri, 15 Apr 88 22:36:30 -0400. Message-ID: <88Apr16.150321edt.27298@ephemeral.ai.toronto.edu> With regards to Dave's suggestion that recurrent backprop nets could possibly provide analog CAM, I am just finishing a TR on an application in which a recurrent backprop net was trained to develop stable attractors for a particular set of vectors. The set of vectors corresponded to the set of solution vectors for different instances of the n queens problem, and the network may be treated as performing a constraint satisfaction search from some initial point, or as recovering a stored memory from a very noisy input. Although the sets of vectors stored in this application were binary (desired states 0.1 or 0.9), in principle the method can be used to store analog vectors; the major limitation is probably a constraint on the minimum Euclidean distance between the stored vectors, to ensure the attraction basins for each remain distinct.
- Steve

From terry at cs.jhu.edu Sun Apr 17 01:12:12 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Sun, 17 Apr 88 01:12:12 edt Subject: analog outputs Message-ID: <8804170512.AA14658@crabcake.cs.jhu.edu> Fernando Pineda has recently shown at Snowbird that his recurrent backprop generalization can store analog vectors and retrieve them as content-addressable memories. P. Simard and Dana Ballard presented a similar result. Terry -----

From harnad at Princeton.EDU Mon Apr 18 16:40:40 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Mon, 18 Apr 88 15:40:40 est Subject: Associative learning: Call for Commentators (BBS) Message-ID: <8804182040.AA18797@mind.Princeton.EDU> The following is the abstract of a target article to appear in Behavioral and Brain Sciences (BBS). All BBS articles are accompanied by "open peer commentary" from across disciplines and around the world. For information about serving as a commentator on this article, send email to harnad at mind.princeton.edu or write to BBS, 20 Nassau Street, #240, Princeton NJ 08540 [tel: 609-921-7771]. Specialists in the following areas are encouraged to contribute: connectionism/PDP, neural modeling, associative modeling, classical conditioning, operant conditioning, cognitive psychology, behavioral biology, neuroethology. CLASSICAL CONDITIONING: THE NEW HEGEMONY Jaylan Sheila Turkkan Division of Behavioral Biology Department of Psychiatry and Behavioral Sciences The Johns Hopkins University School of Medicine Converging data from different disciplines are showing that the role of classical [associative] conditioning processes in the elaboration of human and animal behavior is larger than previously supposed. Older restrictive views of classically conditioned responses as merely secretory, reflexive or emotional are giving way to a broader conception that includes problem-solving and other rule-governed behavior thought to be under the exclusive province of either operant conditioning or cognitive psychology. There have also been changes in the way conditioning is conducted and evaluated. Data from a number of seemingly unrelated phenomena such as postaddictive drug relapse, the placebo effect and immune system conditioning turn out to be related to classical conditioning. Classical conditioning has also been found in simpler and simpler organisms and has recently been demonstrated in brain slices in utero. This target article will integrate the diverse areas of classical conditioning research and theory; it will also challenge teleological interpretations of classically conditioned responses and will offer some basic principles to guide experimental testing in diverse areas. Stevan Harnad harnad at mind.princeton.edu (609)-921-7771

From gary%cs at ucsd.edu Mon Apr 18 15:46:33 1988 From: gary%cs at ucsd.edu (Gary Cottrell) Date: Mon, 18 Apr 88 12:46:33 PDT Subject: analog outputs Message-ID: <8804181946.AA02444@desi.UCSD.EDU> Does anyone have Pineda's net address or a pointer to a reference on this? g. gary cottrell Computer Science and Engineering C-014 UCSD, La Jolla, Ca.
92093 gary at sdcsvax.ucsd.edu (ARPA) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!sdcsvax!gary (USENET) gwcottrell at ucsd.edu (BITNET)

From pollack at nmsu.csnet Mon Apr 18 15:45:56 1988 From: pollack at nmsu.csnet (pollack@nmsu.csnet) Date: Mon, 18 Apr 88 13:45:56 MDT Subject: Analog Content-addressable Memories (ACAM) Message-ID: I have thought about this issue as well, and, at one point, tried to build a "noisy auto-associative" network, where randomly perturbed input patterns were mapped back into the pure source. It didn't work too well, but it sounds like Nowlan has gotten something similar to work. One problem with using the normal form of back-prop is that the sigmoid function tends to be like BSB, pushing values to their extrema of 0 and 1. Something which would be really nice would be a generative memory, in the following sense. Given a finite training basis of analog patterns, the resultant ACAM would have a theoretically infinite number of attractor states, which were in some sense "similar" to the training patterns. It's possible that this type of memory already exists, but was considered a failed experiment by a dynamical systems researcher. (Similarly, "slow glass" may already exist in the failed experiments of a chemist at Polaroid!) This type of ACAM would be nice, say, if one were storing analog patterns which represented, say, sentence meanings. The resultant memory might be able to represent a much larger set of meanings as stable patterns. I have, as yet, no idea how to do this. Jordan

From djb at thumper.bellcore.com Tue Apr 19 11:38:36 1988 From: djb at thumper.bellcore.com (David J. Burr) Date: Tue, 19 Apr 88 10:38:36 est Subject: ACAM Message-ID: <8804191538.AA28586@lafite.bellcore.com> Doesn't Hinton's weight decay scheme help to keep the activations from going to their extreme values of 0 and 1? This would seem to make a feed forward net more analog-like, since the sigmoids try to remain more linear rather than step-like.

From alexis%yummy.mitre.org at gateway.mitre.org Tue Apr 19 16:47:46 1988 From: alexis%yummy.mitre.org at gateway.mitre.org (alexis%yummy.mitre.org@gateway.mitre.org) Date: Tue, 19 Apr 88 16:47:46 EDT Subject: ACAM Message-ID: <8804192047.AA00608@marzipan.mitre.org>

> Doesn't Hinton's weight decay scheme help to keep the activations
> from going to their extreme values of 0 and 1? This would seem
> to make a feed forward net more analog-like, since the sigmoids
> try to remain more linear rather than step-like.

Actually not. Weight decay causes weights that are no longer being used to disappear; but unless it's turned up unduly high it does not stop you from getting out of the linear region of the nodes (which is good since generally a network of linear nodes is rather boring from a computational point of view). Weight decay does make it easier, though, to develop analog mappings with a feedforward network.

From terry at cs.jhu.edu Tue Apr 19 16:59:57 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Tue, 19 Apr 88 16:59:57 edt Subject: analog outputs Message-ID: <8804192059.AA25137@crabcake.cs.jhu.edu> A number of you have asked for more information about the recent results with recurrent back-prop and content-addressable analog memories. Pineda, F., Phys. Rev. Letters 59, 2229 (1987). The work he reported at Snowbird is available from him in reprint form: Dr. Fernando Pineda Applied Physics Laboratory Johns Hopkins University Johns Hopkins Road Laurel, MD 20707 P. Simard and D. Ballard presented a paper at the Snowbird meeting earlier in April.
I'm not sure if a preprint is available, but you can write to them at the Computer Science Department, University of Rochester, Rochester, NY. Terry -----

From phil at mirror.TMC.COM Tue Apr 19 17:12:24 1988 From: phil at mirror.TMC.COM (Phil Madsen) Date: Tue, 19 Apr 88 17:12:24 EDT Subject: ACAM Message-ID: <8804192112.AA28475@prism.TMC.COM> OK. Sounds good.

From marvit%hplpm at hplabs.HP.COM Wed Apr 20 16:25:38 1988 From: marvit%hplpm at hplabs.HP.COM (Peter Marvit) Date: Wed, 20 Apr 88 13:25:38 PDT Subject: One to Many? Message-ID: <19284.577571138@hplpm> Most (nay all) of the simple connectionist networks which allow some sort of pattern association are a one-to-one or many-to-one mapping. For example, given a set of inputs, light the "odd" or "even" output unit or produce *the* past tense of a word given its root. What simple architectures exist for allowing a one-to-many mapping (e.g., given a, either b or c is allowed as output -- perhaps with equal frequency)? Why would one want this? I'm trying to explore different mechanisms for pluralizing nouns (extending some of Rumelhart and McClelland's work in verb past tense generation). Unfortunately, some words (e.g., "fish") have alternate forms. So far, I've just thrown away the "less common" plurals but it feels unsatisfactory in the general case. I'm also using the generalized delta rule with a standard back-propagation system; eventually, I'll experiment with other architectures. A few observations and random thoughts: A one-to-many may actually be a one-to-one with given context. Thus my example may prefer "fish" when speaking about a "school of f." but prefer "fishes" when talking about "three f." This seems to unduly restrict input, however. One could think of an extension of this problem as "give all strings in a list which match this substring." One could construct a network with some units acting as "multiple answer" flags which, if activated, would stochastically decide which "answer" is to be the output pattern. This feels klugey and inelegant. It also violates my original request for a "simple architecture". Further, in the case of a back-propagation network, it introduces the sole element of chance into an otherwise completely deterministic system. Is such a mixed mode system realistic? It may be that the simpler architectures are unsuitable for this. If not, what would be the simplest "mixed architecture" which would handle the problem? What other applications would use a one-to-many? -Peter Marvit HP Labs (part time) U.C. Berkeley (part time)

From WEILI%WPI.BITNET at husc6.harvard.edu Thu Apr 21 22:01:14 1988 From: WEILI%WPI.BITNET at husc6.harvard.edu (WEILI%WPI.BITNET@husc6.harvard.edu) Date: Thu, 21 Apr 88 21:01:14 est Subject: The Paper Message-ID: <8804220201.AA22110@wpi.local> Sorry to broadcast this mail, but I failed to send my mail to Eric Saund. Hi, Saund ( Saund at OZ.AI.MIT.EDU): I am very interested in the paper you mentioned in the mail and would like to get a copy of it. My address is: Wei Li Electrical Engineering Department Worcester Polytechnic Institute Worcester, MA. 01609 BTW, what are you doing for your research now? I will do something related to computer vision using neural network models. Thank you very much. -- Wei Li

From 8414902 at UWACDC.ACS.WASHINGTON.EDU Thu Apr 21 15:31:00 1988 From: 8414902 at UWACDC.ACS.WASHINGTON.EDU (TERU) Date: Thu, 21 Apr 1988 12:31 PDT Subject: Novelty of Neural Net Approach Message-ID: .. a deeper mathematical study of the nervous system ...
will affect our understanding of the aspects of mathematics itself that are involved. - John von Neumann, The Computer and the Brain.

The following is my attempt to list the differences between conventional systems and neural network (or connectionist) systems. The intent is to point out the novelty of the neural network approach. Although simplistically stated, it may serve as a seed for discussion. Your comments, opinions and criticism are most welcome. (The word "conventional" is used here in a very loose sense only to highlight the characteristics of the neural net approach.)

    Conventional Systems                       Neural Net (Connectionist) Systems

 1. linear (in analog),                        pseudolinear,
    logical (in digital)                       softlogic
 2. try to eliminate noise                     try to utilize noise
    (suffer from noise)                        (immune to noise)
 3. usually need reliable components           work with unreliable components
 4. need designed fault-tolerance              have built-in fault-tolerance
 5. emphasize economy of operation             emphasize redundancy
 6. want sharp switches (for digital)          use dull switches (with sigmoid function)
 7. usually operate synchronously              may work asynchronously easily
    under a global clock                       without a global clock
 8. have complex structure                     have rather uniform structure
 9. need to decompose (if possible) the        have built-in parallelism
    process for parallel processing
10. designed or programmed with rules          learn from examples;
    specifying the system behavior             may self-organize

In designing a system, engineers have been working very hard to linearize the system, eliminate noise, produce reliable components, etc. The issues listed above are very important ones in system design today. It is interesting to see that neural network systems offer an alternative approach in every issue listed above. In some cases, the approach is opposite in its direction. Teru Homma Univ. of Washington, FT-10 Seattle, WA 98195

From harnad at Princeton.EDU Fri Apr 22 01:21:27 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Fri, 22 Apr 88 00:21:27 est Subject: Society for Philosophy and Psychology, Annual Meeting Message-ID: <8804220521.AA11680@mind.Princeton.EDU> Society for Philosophy and Psychology: 14th Annual Meeting Thursday May 19 - Sunday May 22 University of North Carolina, Chapel Hill Contributors will include Jerry Fodor, Ruth Millikan, Colin Beer, Robert Stalnaker, Paul Churchland, Lynn Nadel, Michael McCloskey, James Anderson, Alan Prince, Paul Smolensky, John Perry, William Lycan, Alvin Goldman

Papers (PS) and Symposia (SS) on:
Naturalism and Intentional Content (SS)
Animal Communication (SS)
The Child's Conception of Mind (PS)
Cognitive Science and Mental State, Wide and Narrow (PS)
Logic and Language (PS)
Folk Psychology (PS)
Current Controversies: Categorization and Connectionism (PS)
Current Controversies: Rationality and Reflexivity (PS)
Neuroscience and Philosophy of Mind (SS)
Connectionism and Psychological Explanation (SS)
Embodied vs Disembodied Approaches to Cognition (SS)
Emotions, Cognition and Culture (SS)
Naturalistic Semantics and Naturalized Epistemology (PS)

Registration is $30 for SPP members and $40 for nonmembers.
Write to Extension and Continuing Education CB # 3420, Abernethy Hall UNC-Chapel Hill Chapel Hill NC 27599-3420 Membership Information ($15 regular; $5 students): Professor Patricia Kitcher email: ir205%sdcc6 at sdcsvax.ucsd.edu Department of Philosophy B002 University of California - San Diego La Jolla CA 92093

From tap at nmsu.csnet Fri Apr 22 02:56:19 1988 From: tap at nmsu.csnet (tap@nmsu.csnet) Date: Fri, 22 Apr 88 00:56:19 MDT Subject: One to Many? In-Reply-To: Peter Marvit's message of Wed, 20 Apr 88 13:25:38 PDT <19284.577571138@hplpm> Message-ID: It seems that what you want could be supplied by any associative memory that was capable of pattern completion. Examples are Hopfield networks, Willshaw networks, and Anderson's "Brain state in a box" networks. These networks are all best used for this purpose as recurrent relaxation networks, in contrast to the feed-forward networks you seem to have considered. To see that these can be used to store a one-to-many mapping, let A, B, etc. be bit-vectors and then consider storing the following vectors: AP AQ BR BS BT Now, if A_ (where '_' means zero or random) was used as a key then there are two patterns which *might* be retrieved: AP or AQ, but *only one* of these *would* be retrieved in a *particular* relaxation. If B_ was used as a key there are three patterns which might be retrieved. Another way of putting this is that there are multiple attractors which have A as part of their description. There might be some problems with trying to make different attractors out of states which are very close, but there are ways to cope with this. Kawamoto (Alan H. Kawamoto, 1986, "Resolution of Lexical Ambiguity Using a Network that Learns", Dept. of Psychology, CMU) used a brain state in a box model for associating spelling, phonetic, part-of-speech, and "semantic" information about words. The system coped with ambiguity, so two different words could have the same spelling, and the initial perturbations of the system ("context") would determine which state the system settled to. If there were no initial perturbations, then the system would settle to the state it had been most frequently trained on. Easier-to-find descriptions of other associative networks can be found in the PDP books, for example, the room-"schema" example of Rumelhart, Smolensky, McClelland, and Hinton in chapter 14. Tony Plate ----------------------------- Tony Plate Computing Research Laboratory Box 30001 New Mexico State University Las Cruces, New Mexico 88003 (505) 646-5948 CSNET: tap%nmsu

From goddard at aurel.caltech.edu Fri Apr 22 14:26:37 1988 From: goddard at aurel.caltech.edu (goddard@aurel.caltech.edu) Date: Fri, 22 Apr 88 11:26:37 -0700 Subject: [TERU: Novelty of Neural Net Approach] Message-ID: <8804221826.AA05192@aurel.caltech.edu>

> Conventional Systems                  Neural Net (Connectionist) Systems
> ...
>  8. have complex structure            have rather uniform structure
> ...
> 10. designed or programmed with       learn from examples;
>     rules specifying the system       may self-organize
>     behavior

Real (biological) neural networks have very complex computational structure. Ask any computational neuroscientist. Connectionist networks will need complex structure to solve hard problems. There is also a significant amount of genetically determined structure in real neural systems. We will also have to pre-structure artificial systems, to bootstrap the subsequent learning.

Nigel Goddard
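[Editorial sketch] A small sketch of the pattern-completion mechanism in Tony Plate's reply above, using a Hopfield-style network with Hebbian (outer-product) storage; the vector sizes, update schedule, and random patterns are assumptions for illustration, not a tuned model. The A half of the state is clamped as the cue and the rest is relaxed from random noise, so different runs can settle into different stored completions (AP or AQ).

import numpy as np

rng = np.random.default_rng(0)
H = 32                                     # units per half; total network size 2*H

def rand_pat():
    return rng.choice([-1, 1], size=H)

A, B = rand_pat(), rand_pat()
P, Q, R, S, T = (rand_pat() for _ in range(5))
stored = [np.concatenate(p) for p in [(A, P), (A, Q), (B, R), (B, S), (B, T)]]

# Hebbian outer-product storage (zero diagonal).
W = sum(np.outer(v, v) for v in stored).astype(float)
np.fill_diagonal(W, 0.0)

def complete(cue_half, n_sweeps=20):
    """Clamp the first half to cue_half, relax the second half asynchronously."""
    x = np.concatenate([cue_half, rng.choice([-1, 1], size=H)])
    for _ in range(n_sweeps):
        for i in rng.permutation(np.arange(H, 2 * H)):   # only unclamped units update
            x[i] = 1 if W[i] @ x >= 0 else -1
    return x

for trial in range(5):
    out = complete(A)[H:]
    label = "P" if np.array_equal(out, P) else "Q" if np.array_equal(out, Q) else "other"
    print(trial, label)    # with small pattern sets this usually settles to P or Q,
                           # depending only on the initial noise -- a one-to-many mapping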
From hinton at ai.toronto.edu Thu Apr 21 01:00:21 1988 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Thu, 21 Apr 88 01:00:21 EDT Subject: One to Many? In-Reply-To: Your message of Wed, 20 Apr 88 13:45:38 -0400. Message-ID: <88Apr22.112158edt.27106@ephemeral.ai.toronto.edu> The Boltzmann machine architecture allows one-to-many mappings in an unkludgy way, and it's a simple architecture. If you insist on being deterministic, you can get BP to give a probability distribution over a set of possible answers, provided each possible answer has its own output unit. Unlike the Boltzmann machine, BP cannot represent probabilities of combinations of output units. For example, a Boltzmann machine can give high probability to the output vectors 01 and 10 and low probability to 11 and 00. BP cannot do this. Geoff

From tap at nmsu.csnet Fri Apr 22 20:33:12 1988 From: tap at nmsu.csnet (tap@nmsu.csnet) Date: Fri, 22 Apr 88 18:33:12 MDT Subject: Discussion: Numbers - Connectionist Symbols analogy Message-ID: Consider the following analogy between ways of representing numbers and connectionist representations (proposed to me by a professor of psychology at NMSU, Roger Schvaneveldt). There are many ways of representing numbers. '238', '11101110', 'CCXXXVIII', 'two-hundred and thirty-eight', 'e^5.4722706736' (e^ln(238)), 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', and 'X', a pointer to any of the above, are all representations of 238. Some are more useful than others, and sometimes the usefulness of a representation depends on what we want to do with it. All of these representations for 238, with the exception of the last two, have some structure. What makes some of these representations useful in certain situations is that the structure of the representation itself makes immediately apparent the relevant properties of the thing it refers to. And when all properties of the symbol are irrelevant to what is being done with it, the pointer representation is a perfectly adequate representation. The Roman numeral representation is horrible for arithmetic (though quite suitable for some other tasks, such as labelling). In Roman times very few people knew how to multiply, and one reason was that the algorithm for multiplying with Roman numerals is very long and tedious, and difficult to understand and remember. Some historians have suggested that the Romans' representation for numbers is the reason that their achievements in arithmetic and mathematics did not match their technical achievements in other areas. Now, taking language as a domain, and words and their meanings as the things to be represented, what is it that some people like about connectionist representations of them? I think it is that the connectionist representations make the relevant properties immediately apparent. This is the case in distributed 'micro-feature' representations of words and meanings. So, the upshot of this analogy is that doing AI with list-based representations is like doing arithmetic with Roman numerals, i.e. possible, but difficult and a hindrance to the development of the field. And the final question is: Can connectionism provide the "positional base-encoding" for symbols that represent the objects that AI needs to manipulate?
All of these points have been made previously, but it seems to me that putting them in the context of this analogy adds a certain (false?) coherence and force to them. Comments? ----------------------------- Tony Plate Computing Research Laboratory Box 30001 New Mexico State University Las Cruces, New Mexico 88003 (505) 646-5948 CSNET: tap%nmsu

From jose at tractatus.bellcore.com Mon Apr 25 10:19:16 1988 From: jose at tractatus.bellcore.com (Stephen J. Hanson) Date: Mon, 25 Apr 88 10:19:16 EDT Subject: One to Many? Message-ID: <8804251419.AA06160@tractatus.bellcore.com>

> .... Unlike the Boltzmann machine, BP cannot represent probabilities of
> combinations of output units. For example, a Boltzmann machine can give
> high probability to the output vectors 01 and 10 and low probability to
> 11 and 00. BP cannot do this.

However, this is just the sort of thing one can do with multiple hidden layers, in which the next hidden layer is making (certainty/uncertainty) decisions about combinations of units below... of course with the caveat that learning time increases and learning efficacy decreases... comparatively, BMs take a bit of training time as well. Steve (jose at bellcore.com)

From tenorio at ee.ecn.purdue.edu Mon Apr 25 10:35:31 1988 From: tenorio at ee.ecn.purdue.edu (Manoel Fernando Tenorio) Date: Mon, 25 Apr 88 09:35:31 EST Subject: One to Many? In-Reply-To: Your message of Thu, 21 Apr 88 01:00:21 EDT. <88Apr22.112158edt.27106@ephemeral.ai.toronto.edu> Message-ID: <8804251435.AA14986@ee.ecn.purdue.edu>

>> If you insist on being deterministic, you can get BP to give a probability
>> distribution over a set of possible answers, provided each possible answer
>> has its own output unit. Unlike the Boltzmann machine, BP cannot represent
>> probabilities of combinations of output units. For example, a Boltzmann
>> machine can give high probability to the output vectors 01 and 10 and low
>> probability to 11 and 00. BP cannot do this.
>>
>> Geoff

It seems that this BP characteristic has to do more with the architectural structure (number of links) than with the algorithm itself. An interesting experiment would be to enhance the architecture by allowing links between units in the same layer (input or output), and applying BP, or even a generalized version of it (e.g. Almeida, ICNN 87). If you allow the information to be captured by the proper number of links, the network will make a good attempt at it. This is the same message we got from using a hidden layer. --ft.

From netlist at psych.Stanford.EDU Tue Apr 26 12:22:23 1988 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Tue, 26 Apr 88 09:22:23 PDT Subject: TODAY: SU Adaptive Networks Colloquium Message-ID: Stanford University Interdisciplinary Colloquium Series: ADAPTIVE NETWORKS AND THEIR APPLICATIONS ************************************************************************** Apr. 26 (Tuesday, 3:15pm): CHRISTOF KOCH "Computing Optical Flow in Man and Machine" Div of Biology; 216-76 Calif. Inst. of Technology Pasadena, Ca 91125 ************************************************************************** ABSTRACT The key problem in computing motion from the time-varying image intensity is the aperture problem. Using a form of smoothness constraint, i.e. the computed optical flow should be (1) compatible with the measured data, and (2) as smooth as possible, leads to a variational formulation. The resulting energy functional can be minimized using different networks.
Choosing an "analog" or "frequency" representation of velocity leads to a simple resistive network, built out of linear resistances and current and voltage sources. We are currently implementing these networks into VLSI circuits. Choosing the "place" or "unit" representation in agreement with cortical physiology, leads to a different network, with a much higher connectivity and non-linear neurons. We are mapping these neuronal networks onto the primate's visual system, simulating the X and Y pathways into V1 and the subsequent motion field computation in MT. * * * Location: Room 380-380C which can be reached through the lower level courtyard between the Psychology and Mathematical Sciences buildings. Information: To be placed on an electronic mail distribution list for information about these and other adaptive network events in the Stanford area, send email to netlist at psych.stanford.edu. For additional information, contact Mark Gluck, Bldg. 420-316; (415) 725-2434 or email to gluck at psych.stanford.edu From tenorio at ee.ecn.purdue.edu Tue Apr 26 12:58:42 1988 From: tenorio at ee.ecn.purdue.edu (Manoel Fernando Tenorio) Date: Tue, 26 Apr 88 11:58:42 EST Subject: One to Many? In-Reply-To: Your message of Tue, 05 Apr 88 00:00:37 EDT. <88Apr26.120627edt.27126@ephemeral.ai.toronto.edu> Message-ID: <8804261658.AA01192@ee.ecn.purdue.edu> >> >> I disagree. Its nothing to do with the architecture. Its simply >> that deterministic units cannot REPRESENT higher-order statistics >> over the output units. >> >> Geoff >> Let me explain my point of view. I hope I understand you argument as posed by your first email. Basically the stochastic machinery used in the BM capture what would appear to be a covariance between the units. We have been try to extend the BM to the continuous case, and that seem to be true. Now in the case of deterministic units, given the proper transfer function (non-linearity)and the proper information (required number of links between units, one can design networks to capture a variety of different features. Notice that deterministic units might not necessary be using a sigmoid function, but they can used a series of more complex parameterized transfer functions, such as, the GMDH algorithm (Molnar ICNN87), or spherically and polynomial graded units (Hansen and Burr gte TR 87), or even the Multivariate Normal Distribution units that we are experimenting with. Some problems, with special characteritics of the input pattern, allows the regular quasi-integrator to define a function similar to a Bayes classifier which optimizes MAP. Of course, if such retrictions on the input type are removed, the transfer function has to adequately be modified and sometimes more links are also required to capture certain statistical characteristics. I really don't see how that is only a function of whether the net is DET or STOCH, but rather of the unit transfer function and architectural characteristics. If you modify the connection scheme in the BM, it would no longer capture the same form of statistics, although the algorithm you remain the same (sort of obvious, I guess). Similarly, if links are added between output units in DET units, interdependence would be more easily captured. Would could even imagine schemes where output unit activation would go to a context unit, and then back to the output unit (similar to JLElman CRL TR8801 UCSD), to capture temporal covariances. Even simpler would be interunit links with a momentum term set for about 1 cycle. 
From hinton at ai.toronto.edu Tue Apr 5 00:00:37 1988 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Tue, 5 Apr 88 00:00:37 EDT Subject: One to Many? In-Reply-To: Your message of Mon, 25 Apr 88 08:35:31 -0400. Message-ID: <88Apr26.120627edt.27126@ephemeral.ai.toronto.edu> I disagree. It's nothing to do with the architecture. It's simply that deterministic units cannot REPRESENT higher-order statistics over the output units. Geoff

From mike%bucasb.bu.edu at bu-it.BU.EDU Tue Apr 26 18:50:55 1988 From: mike%bucasb.bu.edu at bu-it.BU.EDU (Michael Cohen) Date: Tue, 26 Apr 88 18:50:55 EDT Subject: INNS 89 Conference Message-ID: <8804262250.AA20211@bucasb.bu.edu> April 26, 1988 GOOD NEWS FOR THE NEURAL NETWORK COMMUNITY! There are now over 2000 members of the International Neural Network Society from 34 countries and 47 states of the U.S.A. The INNS is thus beginning to fulfill its purpose of offering our community an intellectual home of its own. In particular, over 500 abstracts were submitted to the 1988 First Annual INNS meeting in Boston, to be held on September 6--10, 1988, at the Park Plaza Hotel. The abstracts cover the full spectrum of topics in the neural network field. While many are working hard on the final program and plans for the 1988 meeting, we also needed to plan further ahead. Accordingly, the INNS Governing Board approved holding the Second Annual INNS Meeting in Washington, DC, on September 5--9, 1989, and we have negotiated a contract with the Omni Shoreham Hotel. See you in Boston in '88 and Washington in '89! Steve Grossberg, President, INNS Demetri Psaltis, Vice President, INNS Harold Szu, Secretary-Treasurer, INNS ---- Michael Cohen ---- Center for Adaptive Systems Boston University (617-353-7857) Email: mike at bucasb.bu.edu Smail: Michael Cohen Center for Adaptive System Department of Mathematics, Boston University 111 Cummington Street Boston, Mass 02215

From tenorio at ee.ecn.purdue.edu Tue Apr 26 20:25:14 1988 From: tenorio at ee.ecn.purdue.edu (Manoel Fernando Tenorio) Date: Tue, 26 Apr 88 19:25:14 EST Subject: One to Many? In-Reply-To: Your message of Tue, 05 Apr 88 00:00:37 EDT. <88Apr26.120627edt.27126@ephemeral.ai.toronto.edu> Message-ID: <8804270025.AA14453@ee.ecn.purdue.edu> I am sorry for the mistake in the reference of the previous message. The second report is: Hanson, S. J. and Burr, D. J. Knowledge Representation in Connectionist Networks. Technical Report, Bell Communications Research, 435 So. St., Morristown, NJ, 07960. --ft.

From terry at cs.jhu.edu Wed Apr 27 09:40:06 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Wed, 27 Apr 88 09:40:06 edt Subject: NETtalk database Message-ID: <8804271340.AA29415@crabcake.cs.jhu.edu> There have been many requests for the NETtalk database. A training dictionary of 20,000 words marked with phonemes and stresses is now available from: Kathy Yantis Cognitive Science Center Johns Hopkins University 34th and Charles Streets Baltimore, MD 21218 Please specify the media you want: 1/2" tape, 9 track 1600, 3200 or 6250 bpi UNIX or ANSI labelled (VMS compatible) 1/4" Sun cartridge (Quick-11, TAR) 5 1/4" 1.2 MB floppy (MS-DOS) Enclose a check or money order for $50 to cover costs made out to: Johns Hopkins Cognitive Science Center. Terry Sejnowski -----
From dad at cs.brown.edu Thu Apr 28 15:09:31 1988
From: dad at cs.brown.edu (David A. Durfee)
Date: Thu, 28 Apr 88 15:09:31 EDT
Subject: I'd like to join the mailing lists
Message-ID:

From: mike%bucasb.bu.edu at bu-it.BU.EDU (Michael Cohen)
Date: Wed, 27 Apr 88 20:23:27 EDT
To: dad at cs.brown.edu
In-Reply-To: "David A. Durfee"'s message of Wed, 27 Apr 88 17:25:59 EDT <8804272329.AA16310 at bu-cs.bu.edu>
Subject: I'm very interested in Neural Nets

I don't manage a mailing list. Try to subscribe to one of the lists in my message, like neuron at ti-csl.csc.ti.com or connectionists at c.cs.cmu.edu

Michael Cohen
---- Center for Adaptive Systems
Boston University (617-353-7857)
Email: mike at bucasb.bu.edu
Smail: Michael Cohen, Center for Adaptive Systems, Department of Mathematics, Boston University, 111 Cummington Street, Boston, Mass 02215

From 8414902 at UWACDC.ACS.WASHINGTON.EDU Fri Apr 29 02:15:00 1988
From: 8414902 at UWACDC.ACS.WASHINGTON.EDU (TERU)
Date: Thu, 28 Apr 1988 23:15 PDT
Subject: Structure of Neural Nets
Message-ID:

>Real (biological) neural networks have very complex computational structure.
>Ask any computational neuroscientist. Connectionist networks will
>need complex structure to solve hard problems. There is also a
>significant amount of genetically determined structure in real neural
>systems. We will also have to pre-structure artificial systems, to
>bootstrap the subsequent learning.
>
>Nigel Goddard

Surely biological neural nets have complex structure: several types of neurons, synapses, neurotransmitters, ion channels, distinct layers, nuclei, commissures, etc. I agree that artificial neural nets need this structural complexity to solve real-world problems. Certainly research activities are pointing in that direction. The benefit of layered structure is already well known. Modular structure is interesting in that it expands the net's heterogeneous structure while maintaining rather uniform structure in a layer within a module. Each module may be treated and controlled as a unit at another level. Another step, for example, would be to somehow build functionally different types of neurons into the net. I would appreciate comments and pointers concerning these issues of hierarchy, modularity, and neuron types in neural nets, biological or artificial.

- Teru Homma

From dad at cs.brown.edu Fri Apr 29 07:10:14 1988
From: dad at cs.brown.edu (David A. Durfee)
Date: Fri, 29 Apr 88 07:10:14 EDT
Subject: could I be added to your mailing list?
Message-ID:

From BISON%HNYKUN53.BITNET at VMA.CC.CMU.EDU Fri Apr 29 15:56:00 1988
From: BISON%HNYKUN53.BITNET at VMA.CC.CMU.EDU (BISON%HNYKUN53.BITNET@VMA.CC.CMU.EDU)
Date: Fri, 29 Apr 88 15:56 N
Subject: subscription
Message-ID:

I would like to join the connectionist mailing list. At the moment I'm working on a connectionist model to simulate aphasic language production.

Thanks,
Pieter Bison
Department of Psychology
University of Nijmegen
P.O. Box 9104
6500 HE Nijmegen, Netherlands
BITNET: bison at hnykun53

From wfreeman%garnet.Berkeley.EDU at violet.berkeley.edu Fri Apr 29 23:58:16 1988
From: wfreeman%garnet.Berkeley.EDU at violet.berkeley.edu (wfreeman%garnet.Berkeley.EDU@violet.berkeley.edu)
Date: Fri, 29 Apr 88 20:58:16 pdt
Subject: No subject
Message-ID: <8804300358.AA23282@garnet.berkeley.edu>

To: connectionists
29 apr 88
From: wfreeman at garnet
Re: a physiologist's view of connectionism

I'd like some feedback on this essay before it gets frozen into print, and invite commentary of any sort. Thanks in advance, Walter

Why neural networks don't yet fly: inquiry into the neurodynamics of biological intelligence.

Walter J Freeman
Department of Physiology-Anatomy
University of California
Berkeley CA 94720 USA

2nd Annual Intern. Conf. on Neural Networks
San Diego CA 23 - 27 July 1988

Abstract

Sensory and perceptual information exists as space-time patterns of neural activity in cortex in two modes: axonal pulses and dendritic currents. Which one we observe depends on the experimental techniques we choose in order to make our observations. The brain does its analysis of sensory input, as in feature extraction and preprocessing, in the medium of action potentials as point processes in networks of individual neurons. It does synthesis of its sensory input with past experience and expectancy of future action in the medium of dendritic integration in local mean fields. Both kinds of activity are found to coexist in olfactory and visual cortex, each preceding and then following the other. The transformation of information from the pulse mode to the dendritic mode involves a state transition of the cortical network that can be modeled by a Hopf bifurcation in both software and hardware embodiments. This state transition appears to be an essential precursor to an act of neural pattern classification.
However, the models suggest that the classification of a given stimulus into one of several learned classes is done by a mapping of the stimulus into a landscape that has been shaped by prior learning, and that it is not done by a multiple bifurcation into one of a collection of limit cycle attractors at the moment of choice.

Introduction

The strongest justification at present for the study of neural networks is the inspiration they draw from the performance characteristics of their biological cousins. Yet it is often unclear what is to be copied and what omitted. John Denker, among others, has pointed out that both birds and airplanes have wings, but that only birds have feathers. While it is true that brains and neural networks share certain structural features such as massive parallelism, biological networks solve complex problems easily and creatively, and existing neural networks do not. Whereas Wilbur and Orville Wright solved first the problems of lift and then of control in flight, neural networkers have solved only the problems of statics and not the problems of dynamic control. Neural networks have not yet begun to soar.

One reason, I will argue here, is that most theoreticians have pursued a false goal of stability, and have not reckoned with the intrinsic instability of wetware brains that enables their remarkable adaptiveness. A related key limitation in many current approaches is the failure of engineers to apply the hierarchical modes in which wetware brains sustain information for storage, transformation, and other operations.

There are two documented modes that in some senses are diametrically opposed but in other senses are strongly complementary. Probably others exist, but they need not concern us here. One is typified by the action potential and the point process, the other by the synaptic potential and the local mean field. In sensory systems the one is the basis for feature extraction, preprocessing and analysis. The other is the basis for experiential integration, classification and synthesis. Neither can supplant or function without the other. They coexist in the same layers of neurons, and whether we observe one or the other depends on how we acquire, process and measure our biological data.

My aim in this brief review is to exemplify these two modes of information, describe how they are derived from brains and how they are converted each to the other, and explain their significance for the design of new and more successful neural networks.

Examples of biological information

Information in biological networks of the kind I am concerned with here takes the form of space-time neural activity patterns. Each pattern is relational and neither symbolic nor representational. It does not "stand for" something outside the brain, as a letter does in an alphabet, nor does it reside in fixed form as a goal or a "teacher". It is a dynamic process that mediates adaptive behavior. It results from a stimulus and in some sense causes a response, but it also incorporates past experience and the intent of future action. These being private and unique to each brain, we cannot in principle as observers know the exact information content of each pattern or even the coordinate system in which it is embedded. What we can do is to establish statistically the relation of a given space-time pattern of neural activity to an antecedent or a consequent event in the outside world.
We do this by repeatedly presenting each of two or more stimuli to a subject and then demonstrating some invariant contiguity between each stimulus and a consequent neural activity pattern. Because we do not know the metric of the internal computational spaces, we must collect numerous input-output pairs and rely on statistical invariants that emerge from one or another form of ensemble averaging. For the point process each ensemble is collected over time from one or more points, and for the mean field it is collected simultaneously at multiple points in space in the form of a set of recordings in time. The distinction is crucial though subtle.

I will cite examples from the primary visual cortex and from the olfactory bulb, a specialized form of sensory cortex that is located close to the input of the olfactory system. The paradigmatic experiment in the pulse mode in olfaction consists in locating a single neuron in the bulb with a microelectrode, presenting in succession odorants A, B, C,... at the same or different concentrations, and measuring the pulse firing rate of the neuron. This is repeated for neuron i, ii, iii,... at different spatial locations in the bulb. The results are presented in the form of a table, which shows that each odorant over some concentration range (typically narrow) excites some neurons but not most others, indicating that each odorant establishes a spatial pattern of selective activation in the bulb, putatively resembling a constellation of stars in the night sky, although each neuron typically responds to a variety of odorants. This is a form of labeled line coding, with pulse rate or probability as the state variable for each line, channel or axon.

The paradigmatic experiment in the wave mode is to record the electroencephalogram (EEG) from an array of macroelectrodes (optimally 64) placed on the surface of the bulb. All of the simultaneously recorded EEG traces contain a common waveform or carrier that differs across the spatial array in amplitude. Odorant-specific patterns of amplitude i, ii, iii,... are seen to recur on presentation of odorants A, B, C,..., but only if the subjects are trained to discriminate them each from the others and only if they are motivated to do so (Skarda & Freeman, 1987). Learning and arousal are both essential. The odorant information is expressed as a spatial amplitude modulation of the common carrier for the duration of a sniff, on the order of 0.1 sec. It can be likened to a monochromatic half-tone picture in a newspaper. The information density is spatially uniform, because no one dot in the picture carries by its size any more or less information than any other. The carrier is identified by making a spatial ensemble average of the 64 traces that are recorded during a sniff and then regressing this ensemble average onto each unaveraged EEG trace to derive its amplitude coefficient. One cannot use time ensemble averaging over sniffs, because the spectrum of the carrier and its phase relations to the initiating stimulus vary unpredictably across inhalations. The result of measurement is a 64x1 vector that expresses the spatial pattern of amplitude.

These two kinds of information, pulse and wave, coexist in each area of the bulb, and in other stages of the olfactory system as well. Whether one sees the one or the other kind depends on the experimental procedures that one uses, which in turn depend on one's goals and hypotheses. Comparable results hold for the primary visual cortex.
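The carrier-extraction procedure just described can be written out in a few lines (a sketch only, with synthetic data; the array shapes, variable names, and noise level are assumptions, not Freeman's analysis code): form the spatial ensemble average of the 64 simultaneously recorded traces, then regress that average onto each unaveraged trace to obtain its amplitude coefficient, giving the 64x1 spatial amplitude vector.

import numpy as np

# Synthetic stand-in for 64 simultaneously recorded EEG traces, each the
# common carrier scaled by a channel-specific amplitude, plus noise.
rng = np.random.default_rng(1)
T = 100
true_amp = rng.random(64)                       # hypothetical spatial pattern
carrier_wave = np.sin(2 * np.pi * 40 * np.arange(T) / 1000.0)
eeg = np.outer(true_amp, carrier_wave) + 0.05 * rng.normal(size=(64, T))

# 1. Spatial ensemble average over the 64 traces gives the common carrier.
carrier = eeg.mean(axis=0)                      # shape (T,)

# 2. Regress the carrier onto each unaveraged trace (least squares) to get
#    that trace's amplitude coefficient: the 64x1 spatial pattern vector.
coeffs = eeg @ carrier / (carrier @ carrier)    # shape (64,)

print(np.corrcoef(coeffs, true_amp)[0, 1])      # close to 1 for this toy case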
The well-known paradigm in the pulse mode is to measure the pulse rate of a single neuron while repeatedly presenting patterned light stimuli to the retina so as to define its receptive field. This is repeated for a large number of neurons, and the results are presented in the form of graphs showing the spatial structures of orientation and ocular dominance columns and the topographic mapping of the retina onto the dozen or more specialized subareas of the visual cortex for color, motion detection, etc. The inference is made that "features" of the visual world are extracted and mapped spatially onto the cortex in the firing rates of labeled lines. The information is said to be encoded in the pulse trains of single neurons.

Activity in the wave mode is likewise recorded with arrays of macroelectrodes on the visual cortex of an awake, motivated, trained rhesus monkey (Freeman & van Dijk, 1987). A common carrier is retrieved from the EEG traces by linear decomposition, and its spatial pattern is expressed in the matrix of coefficients that are obtained by fitting each trace to the spatial ensemble average. A specific, identifiable spatial pattern of amplitude modulation is found to recur on each trial when the motivated subject is inferred to be discriminating a specific visual cue. This is evidence for distributed coding of information similar to the wave mode of olfactory coding. Evidence for this mode of activity has also been found recently in the visual cortex of the cat (Gray & Singer, 1988). Again, it is apparent that these two kinds of information in the pulse and wave modes coexist in the cortex.

Neural mechanisms of transformation

In this section I will consider the mechanisms by which the discrete activity in the pulse mode is transformed into the wave mode and then back again. In doing so I will draw on experiments in software (Freeman & Yao, 1988) and hardware (Freeman, Eisenberg & Burke, 1987) modeling of cortical dynamics. I will argue that the activity patterns in the pulse mode constitute the end result of stimulus analysis by neural preprocessing, and that the patterns in the wave mode manifest the results of spatial integration of the pulse activity with past experience and present motivational state.

The conversion from pulses to waves takes place at synapses. There are many kinds and locations of modifiable synapses, two of which are particularly important for information processing. One type is the primary synapse between an incoming axon and its target cortical neuron. It is subject to change with respect to recent use, by nonspecific facilitation, or by attenuation with respect to the local volume of input into a neighborhood. Attenuation is a multiplicative form of inhibition that operates in processes of dynamic range compression and signal normalization.

The other type supports the long range excitatory connections that form innumerable feedback loops of mutual excitation within a cortical layer. These secondary synapses among cortical neurons are subject to change with respect to associative learning in accordance with some variant of the Hebb rule. The matrix of numbers representing the strengths of synaptic action corresponds to the W matrix of Amari (1977) and the T matrix of Hopfield (1982). When an animal is trained to discriminate an odor A, B, C,... a Hebbian nerve cell assembly is formed among the cortical neurons by the strengthened synaptic connections between each pair of coactivated neurons (Freeman, 1968, 1975).
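A toy version of the Hebbian assembly formation just described (a sketch under stated assumptions: binary co-activation, a simple additive rule with clipping, and invented parameter values; it is not the Freeman, Amari, or Hopfield formulation itself) strengthens the mutual-excitation weight between every pair of units that are active together while an odorant is being learned:

import numpy as np

# Toy Hebbian update: connections between pairs of co-activated neurons are
# strengthened, forming a "nerve cell assembly" for a learned odorant.
n, eta, w_max = 32, 0.1, 1.0            # illustrative sizes and rates
W = np.zeros((n, n))                    # mutual-excitation strengths

def hebb_update(W, activity):
    """Strengthen W[i, j] whenever units i and j are active together."""
    a = activity.astype(float)          # binary co-activation vector
    W = np.clip(W + eta * np.outer(a, a), 0.0, w_max)
    np.fill_diagonal(W, 0.0)            # no self-connections
    return W

rng = np.random.default_rng(2)
odorant_A = rng.random(n) > 0.7         # hypothetical activation pattern
for _ in range(20):                     # repeated sniffs during training
    W = hebb_update(W, odorant_A)

# Connections inside the assembly end up much stronger than elsewhere.
print(W[np.ix_(odorant_A, odorant_A)].mean(), W.mean())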
This nerve cell assembly is a basis for the classification of odorants by trained subjects (Skarda & Freeman, 1987).

Conversion of waves to pulses occurs at trigger zones of neurons, where the sum of dendritic currents regulates the firing rate of each neuron. The relation between membrane current and pulse density, both of which are continuous variables, has the form of a sigmoid. The range between threshold (zero firing rate) and asymptotic maximum is much narrower than a comparable input-output sigmoid relation at the synapses, so that as a general rule the pulse-wave conversion takes place in a small-signal, near-linear range. It follows from this and related considerations that the operation of the local neighborhood can be expressed as a linear time-invariant integrator cascaded with a static nonlinear bilateral saturation function (Freeman, 1967, 1968, 1975).

An important feature that distinguishes the biological sigmoid from its neural network cousins is the finding that the maximal slope of the curve, and thereby the maximal gain of the local ensemble, is displaced to the excitatory side (Freeman, 1979). Input not only excites neurons, it increases their forward gains. Furthermore, the slope of the curve is increased by factors that increase arousal and motivation in animals. When a stimulus is given to which a subject has been sensitized by discriminative training, so that a nerve cell assembly has been formed, the re-excitation within the assembly is enhanced both by the input-dependent nonlinearity and by arousal. The result is regenerative feedback in a high-gain system that leads to instability. The large collection of interconnected and interactive ensembles undergoes a state transition from a prestimulus state to a stimulated state.

The state prior to entry of input is low-amplitude and low-gain, so that neurons, not interacting strongly with each other, are free to react to input on incoming lines. When sufficient input arrives to one or another nerve cell assembly in an aroused subject, the amplitude and gain both increase, and the neurons strongly interact with each other. In this highly interactive state the information that each received during the preceding input state is disseminated rapidly over the entire extent of an interactive area of neurons, apparently within a few milliseconds over regions comprising many square mm or cm of surface area and many millions of neurons. The spatial density of the information becomes uniform, just as it does in a 2-dimensional Fourier transform of a visual scene.

Simulations of these transitions have shown that the input information in the pulse mode is not degraded or lost in the conversion from the pulse mode to the wave mode. The point-wise input is mapped under spatial integration into a distribution of spatial activity that introduces the past experience of the subject through the nerve cell assemblies and the present state of expectation embodied in the factors relating to arousal and motivation, that is, brain state with respect to future action. When it is read out, both in the brain and in the models that simulate the process, the output is coarse-grained at the surface and summed in absolute value or by squaring to give a positive quantity at each channel, which re-establishes the pulse mode at the input to the next stage.
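The asymmetric sigmoid discussed above, with its maximal slope displaced to the excitatory side, can be illustrated numerically (a sketch only; the particular formula and the value of Q_m below are illustrative choices of the general kind described, not necessarily the equation published in Freeman, 1979):

import numpy as np

# Illustrative asymmetric wave-to-pulse sigmoid: firing rate is clipped at
# zero below threshold (placed at v = 0 here for simplicity), saturates at
# Q_m, and has its steepest slope -- hence highest gain -- at positive v.
def asymmetric_sigmoid(v, Q_m=5.0):
    return np.maximum(0.0, Q_m * (1.0 - np.exp(-(np.exp(v) - 1.0) / Q_m)))

v = np.linspace(-3.0, 3.0, 601)
q = asymmetric_sigmoid(v)
gain = np.gradient(q, v)                # numerical slope = local gain
print(v[np.argmax(gain)])               # maximal gain falls on the excitatory
                                        # side, at v = ln(Q_m) for this choice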
In the process of coarse-graining, the output from the preceding stage is "cleaned" under spatiotemporal integration to attenuate all activity that is not shared by the entire transmitting array of neurons. Only the cooperative activity that is shared by the whole is successfully injected into the next succeeding stage. This completes the inverse transformation back to the pulse mode.

In brief, input on labeled lines that is injected by axons into the cortex can destabilize the neural mass, depending on past experience and present motivation, and the cortex can converge to a distributed pattern of fluctuating activity that expresses that confluence of stimulus, experience and expectation. The key to understanding is the state transition that changes the properties of the cortex and extends the information from the local to the distributed mode. It is an input-induced transition from a low-energy disordered state to a high-energy, more ordered state (Prigogine, 1984; Skarda & Freeman, 1987).

Transmission outside of the cortex has not yet been studied, but I postulate that similar state transitions may occur in subcortical masses, so that with each transfer of information from one brain mass to another, there is injection of information on labeled lines, transition to integration in the wave mode, and reconversion of the integrand to a labeled line pattern on the output channels, the last neural output being the discharges of motor neurons in the brainstem and spinal cord.

Implications for neural networks

The current theory and practice of neural networks has incorporated many of the important features of the static design of nervous systems, particularly those based on parallel feedforward nets, but has neglected the dynamics of real nervous systems in favor of unrealistic abstractions that do not do justice to the ceaseless fluctuations of neural activity. These are commonly designated as "noise" and removed by stimulus-locked time ensemble averaging in order to impose the ideal of an invariant baseline that precedes the stimulus arrival, or to which the system converges as it "learns" or "perceives".

It is this artificial creation of a stable equilibrium to represent the desired state of neural networks, and the incorporation of it as a design criterion, that has crippled their performance. Wet nervous systems do not have equilibria except under deep anesthesia, surgical isolation, or near-terminal damage of one kind or another. These reduced states are in fact useful for measuring the open loop time and space constants of parts of wet nervous systems, but clearly, and in the case of general anesthesia by definition, there is no information processing in such reduced systems. Instead, nervous systems appear to be designed by evolution to be destabilized by input. They seek input as a means of inducing state changes that disseminate and integrate fresh input with past experience as the basis for impending action.

There is no intrinsic hardware or software barrier to constructing neural networks that have the properties of effecting these transformations. We have demonstrated the principles by which they occur in simple systems that are built with well-known components and algorithms, used in novel ways as dictated by the theory and by the correspondence of their performance to the wetware (Freeman, Eisenberg & Burke, 1987; Freeman & Yao, 1988).
The key attributes are the biological sigmoid curve, the associational connectivity that is subject to modification by learning, the variable global gain under motivational factors, and, most importantly, the ability to change from a low-level receiving state to a high-level transmitting state. In the low-level state the input injects information into the system. In the high-level state induced by the input in the prepared system, the information is integrated, globally distributed, and incorporated into a novel form of display. The forms of spatial integration of the output in the wetware brain are not yet known. We replace them with a simple Euclidean distance measure in n-space, where n is the number of channels simulated, each with its amplitude of output of the common carrier. The models show robust abilities for rapid classification of input into learned categories despite the presence of noise, incomplete inputs, overlap of templates and, in the case of the hardware embodiment, component variability.

In these models the input on labeled lines induces a global oscillation by a state transition corresponding to a Hopf bifurcation. The classification is performed by the use of a Euclidean distance measure in 64-space. The output is by step functions on labeled lines from a decision function operating on the distributed pattern in the wave mode. Convergence to a pattern depends on the input and not on the initial conditions. Classification succeeds well before asymptotic convergence to a steady state. Thereby the frame rate for successive input samples can exceed 10/sec, so that a fluctuating and unpredictable environment can be tracked by the rapidly adapting device.

References

Amari S (1977a) Neural theory of association and concept-formation. Biological Cybernetics 26: 175-185.
Amari S (1977b) Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics 27: 77-87.
Freeman WJ (1967) Analysis of function of cerebral cortex by use of control systems theory. Logistics Review 1: 5-40.
Freeman WJ (1968) Analog simulation of prepyriform cortex in the cat. Mathematical Biosciences 2: 181-190.
Freeman WJ (1975) Mass Action in the Nervous System. Academic Press, New York.
Freeman WJ (1979) Nonlinear dynamics of paleocortex manifested in the olfactory EEG. Biological Cybernetics 35: 21-37.
Freeman WJ, Eisenberg J & Burke B (1987) Hardware simulation of dynamics in learning: the SPOCK. Proceedings 1st Int. Conf. Neural Networks, San Diego, III: 435-442.
Freeman WJ & van Dijk B (1987) Spatial patterns of visual cortical fast EEG during conditioned reflex in a rhesus monkey. Brain Research 422: 267-276.
Freeman WJ, Yao Y & Burke B (1988) Central pattern generating and recognizing in olfactory bulb: a correlation rule. Neural Networks, in press.
Gray CM & Singer W (1988) Nonlinear cooperativity mediates oscillatory responses in orientation columns of cat visual cortex. Submitted.
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc. Nat'l. Acad. Sci. USA 79: 3088-3092.
Prigogine I (1984) From Being to Becoming. Freeman, New York.
Skarda CA & Freeman WJ (1987) How brains make chaos in order to make sense of the world. Behavioral and Brain Sciences 10: 161-195.

Supported by grants MH06686 from the National Institute of Mental Health and 87NE129 from the Air Force Office of Scientific Research.
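The Euclidean read-out described above can be sketched as a nearest-template rule (synthetic templates and test pattern; the sizes, names, and the way "incomplete input" is simulated are assumptions for illustration, not the published procedure): each trial yields a 64-component spatial amplitude vector, which is assigned to the learned class whose stored template lies closest in Euclidean distance.

import numpy as np

# Nearest-template classification in 64-space, standing in for the
# Euclidean distance measure described in the text.
rng = np.random.default_rng(3)
n_channels, n_classes = 64, 3
templates = rng.random((n_classes, n_channels))     # learned amplitude patterns

def classify(amplitude_vector, templates):
    """Index of the learned class nearest in Euclidean distance."""
    distances = np.linalg.norm(templates - amplitude_vector, axis=1)
    return int(np.argmin(distances))

# A noisy, partly missing version of class 1 is still classified correctly
# most of the time, echoing the robustness to noise and incomplete inputs.
test = templates[1] + 0.1 * rng.normal(size=n_channels)
test[:16] = 0.0                                     # incomplete input
print(classify(test, templates))                    # usually prints 1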
From tap at nmsu.csnet Sat Apr 30 01:10:38 1988
From: tap at nmsu.csnet (tap@nmsu.csnet)
Date: Fri, 29 Apr 88 23:10:38 MDT
Subject: Structure of Neural Nets - control
In-Reply-To: TERU's message of Thu, 28 Apr 1988 23:15 PDT <8804300128.AA26022@opus>
Message-ID:

> Modular structure is interesting in that it expands the
> net's heterogeneous structure while maintaining rather
> uniform structure in a layer within a module. Each module
> may be treated and controlled as a unit at another level.
> - Teru Homma

This raises a whole lot of questions:

Do we really want strict modular structure, where each module is controlled as a unit at another (higher) level? Some structural modularity may be necessary, but maybe the type of modularity that works best for dealing with the world is a flexible one, where the boundaries and interfaces of modules are flexible and context-dependent.

How are these modules at that higher level controlled? As units at an even higher level? Where does it end? It must end in some self-regulated unit or module. Will this result in the type of inflexible control that one sees in many discrete-symbol AI systems? Can we develop principles of self-regulation and apply them directly to all modules, rather than having a hierarchy of control? Is Grossberg's ART a step in this direction?

Nearly all work by connectionists that is related to AI has been on developing better representations for data. Most of the processing has consisted of a single step (single forward propagation, single relaxation, or single settling), and consequently the control has been very simple. Very little work has been done on problems that require more complex control of processing, such as planning, or analysing sequential input of unbounded length and complexity. The control structures used in discrete-symbol AI for doing this type of processing have the same inflexibilities as the discrete-symbol representations.

Can connectionists do for control structures what they are doing for representations? That is, decompose them and recompose them in a more flexible and accessible way. Can representation of control and representation of data become the same thing? I mean this in a strong sense; I don't mean that they should just be representable using the same techniques, but rather that they be identical: the representation of data represents by virtue of recording (memory) and creating (recall, action) the processes that that data has effect upon (and there is NOTHING else).

Is there any real and/or valuable distinction between process and control, or is it just a convenient way of interpreting complex systems? (And what is process and what is control?) I would claim that discrete-symbol AI has often made this distinction. It is most evident in systems which provide automatic backtracking: the control decisions (which goal to select next, how to backtrack) are usually made by the system and not by the program which is running in it (e.g. most Prologs). I also claim this is bad: systems whose designs contain explicit or implicit distinctions between control and process are doomed to inflexible hierarchical control. But can it be otherwise? Yes: some Prologs (e.g. Nu-Prolog) allow the data to affect control decisions, in that goal selection is dependent upon patterns of instantiation. I think this is a step in the right direction, and one that can be taken much further by connectionism.

-- Tony Plate