From eric at mcc.com Mon Jan 2 17:02:06 1989 From: eric at mcc.com (Eric Hartman) Date: Mon, 2 Jan 89 16:02:06 CST Subject: Tech Report Announcement Message-ID: <8901022202.AA11713@legendre.aca.mcc.com> The following MCC Technical Report is now available. Requests may be sent to eric at mcc.com or Eric Hartman Microelectronics and Computer Technology Corporation 3500 West Balcones Center Drive Austin, TX 78759-6509 U.S.A. ------------------------------------------------------------------------ Explorations of the Mean Field Theory Learning Algorithm Carsten Peterson* and Eric Hartman Microelectronics and Computer Technology Corporation 3500 West Balcones Center Drive Austin, TX 78759-6509 MCC Technical Report Number: ACA-ST/HI-065-88 Abstract: The mean field theory (MFT) learning algorithm is elaborated and explored with respect to a variety of tasks. MFT is benchmarked against the back propagation learning algorithm (BP) on two different feature recognition problems: two-dimensional mirror symmetry and eight-dimensional statistical pattern classification. We find that while the two algorithms are very similar with respect to generalization properties, MFT normally requires a substantially smaller number of training epochs than BP. Since the MFT model is bidirectional, rather than feed-forward, its use can be extended naturally from purely functional mappings to a content addressable memory. A network with N visible and N hidden units can store up to approximately 2N patterns with good content-addressability. We stress an implementational advantage for MFT: it is natural for VLSI circuitry. Also, its inherent parallelism can be exploited with fully synchronous updating, allowing efficient simulations on SIMD architectures. *Present Address: Department of Theoretical Physics University of Lund Solvegatan 14A, S-22362 Lund, Sweden From Scott.Fahlman at B.GP.CS.CMU.EDU Mon Jan 2 21:57:10 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Mon, 02 Jan 89 21:57:10 EST Subject: Benchmarks mailing list Message-ID: Two or three weeks ago I sent a message to this mailing list announcing our intention to set up at CMU a collection of learning benchmarks, accessible via FTP from the Arpanet. The hope is that this collection will help the research community in our joint effort to characterize the speed and quality of various learning algorithms on a variety of different learning tasks. There were a few problems in getting the new mailing lists set up over the holidays, but I believe we're now ready to proceed. I anticipate that there will be considerable discussion about the usefulness of various benchmarks, how they should be run, results, etc. Rather than clog the "connectionists" mailing list with these benchmark-related messages, we have set up a new mailing list whose Arpanet address is "nn-bench at cs.cmu.edu". If you want to be added to this mailing list, send an "add me" message to "nn-bench-request at cs.cmu.edu". Please include a valid netmail address that we can reach from the Arpanet. If messages to the address you give start bouncing, we'll have to delete you from the list. The "nn-bench-request" address is also the proper destination for "delete me" requests, address changes, and other messages intended only for the mailing list and data base maintainers. Please do not send such messages to "nn-bench" -- you will inconvenience a lot of people and make yourself look like a fool. At present, the mailing list maintainers are Michael Witbrock and me. 
If you just want to access the benchmark collection and not participate in the related discussions, you don't have to join the "nn-bench" mailing list. Once there is a useful collection of files in one place, I will tell people on the "connectionists" mailing list how to access them. I suggest we wait until January 15 or so before we start discussing substantive issues on the "nn-bench" list. This will give people time to join the mailing list before the fun begins. We will archive old messages for those who join later. -- Scott Fahlman, CMU From kruschke at cogsci.berkeley.edu Tue Jan 3 03:30:12 1989 From: kruschke at cogsci.berkeley.edu (John Kruschke) Date: Tue, 3 Jan 89 00:30:12 PST Subject: No subject Message-ID: <8901030830.AA09915@cogsci.berkeley.edu> Here is the compilation of responses to my request for info on weight decay. I have kept editing to a minimum, so you can see exactly what the author of the reply said. Where appropriate, I have included some comments of my own, set off in square brackets. The responses are arranged into three broad topics: (1) Boltzmann-machine related; (2) back-prop related; (3) psychology related. Thanks to all, and happy new year! --John ----------------------------------------------------------------- ORIGINAL REQUEST: I'm interested in all the information I can get regarding WEIGHT DECAY in back-prop, or in other learning algorithms. *In return* I'll collate all the info contributed and send the compilation out to all contributors. Info might include the following: REFERENCES: - Applications which used weight decay - Theoretical treatments Please be as complete as possible in your citation. FIRST-HAND EXPERIENCE - Application domain, details of I/O patterns, etc. - exact decay procedure used, and results (Please send info directly to me: kruschke at cogsci.berkeley.edu Don't use the reply command.) T H A N K S ! --John Kruschke. ----------------------------------------------------------------- From: Geoffrey Hinton Date: Sun, 4 Dec 88 13:57:45 EST Weight-decay is a version of what statisticians call "Ridge Regression". We used weight-decay in Boltzmann machines to keep the energy barriers small. This is described in section 6.1 of: Hinton, G. E., Sejnowski, T. J., and Ackley, D. H. (1984) Boltzmann Machines: Constraint satisfaction networks that learn. Technical Report CMU-CS-84-119, Carnegie-Mellon University. I used weight decay in the family trees example. Weight decay was used to improve generalization and to make the weights easier to interpret (because, at equilibrium, the magnitude of a weight = its usefulness). This is in: Rumelhart, D.~E., Hinton, G.~E., and Williams, R.~J. (1986) Learning representations by back-propagating errors. {\it Nature}, {\bf 323}, 533--536. I used weight decay to achieve better generalization in a hard generalization task that is reported in: Hinton, G.~E. (1987) Learning translation invariant recognition in a massively parallel network. In Goos, G. and Hartmanis, J., editors, {\it PARLE: Parallel Architectures and Languages Europe}, pages~1--13, Lecture Notes in Computer Science, Springer-Verlag, Berlin. Weight-decay can also be used to keep "fast" weights small. The fast weights act as a temporary context. One use of such a context is described in: Hinton, G.~E. and Plaut, D.~C. (1987) Using fast weights to deblur old memories. {\it Proceedings of the Ninth Annual Conference of the Cognitive Science Society}, Seattle, WA. --Geoff
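-----------------------------------------------------------------
[ For concreteness, Geoff's "Ridge Regression" remark can be written out as a small worked equation. This is an illustrative note only; the setting (a one-layer linear net trained on squared error) is an assumption, not something taken from the replies. Ridge regression minimizes

E(w) = 1/2 \sum_p ( t_p - w \cdot x_p )^2 + \lambda/2 \sum_i w_i^2

and gradient descent on E gives

\Delta w_i = \eta [ \sum_p ( t_p - w \cdot x_p ) x_{p,i} - \lambda w_i ]

i.e. the usual LMS error-correction step plus a shrinkage of each weight by \eta \lambda w_i on every step, which is weight decay. For multilayer nets the correspondence holds at the level of the added penalty term \lambda/2 \sum w^2 rather than of the closed-form ridge solution. ]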
----------------------------------------------------------------- [In his lecture at the International Computer Science Institute, Berkeley CA, on 16-DEC-88, Geoff also mentioned that weight decay is good for wiping out the initial values of weights so that only the effects of learning remain. In particular, if the change (due to learning) on two weights is the same for all updates, then the two weights converge to the same value. This is one way to generate symmetric weights from non-symmetric starting values. --John] ----------------------------------------------------------------- From: Michael.Franzini at SPEECH2.CS.CMU.EDU Date: Sun, 4 Dec 1988 23:24-EST My first-hand experience confirms what I'm sure many other people have told you: that (in general) weight decay in backprop increases generalization. I've found that it's particularly important for small training sets, and its effect diminishes as the training set size increases. Weight decay was first used by Barak Pearlmutter. The first mention of weight decay is, I believe, in an early paper of Hinton's (possibly the Plaut, Nowlan, and Hinton CMU CS tech report), and it is attributed to "Barak Pearlmutter, Personal Communication" there. The version of weight decay that (I'm fairly sure) all of us at CMU use is one in which each weight is multiplied by 0.999 every epoch. Scott Fahlman has a more complicated version, which is described in his QUICKPROP tech report. [QuickProp is also described in his paper in the Proceedings of the 1988 Connectionist Models Summer School, published by Morgan Kaufmann. --John] The main motivation for using it is to eliminate spurious large weights which happen not to interfere with recognition of training data but would interfere with recognizing testing data. (This was Barak's motivation for trying it in the first place.) However, I have heard more theoretical justifications (which, unfortunately, I can't reproduce.) In case Barak didn't reply to your message, you might want to contact him directly at bap at cs.cmu.edu. --Mike ----------------------------------------------------------------- From: Barak.Pearlmutter at F.GP.CS.CMU.EDU Date: 8 Dec 1988 16:36-EST We first used weight decay as a way to keep weights in a boltzmann machine from growing too large. We added a term to the thing being minimized, G, so that G' = G + 1/2 h \sum_{i<j} w_{ij}^2. ----------------------------------------------------------------- Date: Tue, 6 Dec 88 09:34 CST Probably he will respond to you himself, but Alex Weiland of MITRE presented a paper at INNS in Boston on shaping, in which the order of presentation of examples in training a back-prop net was altered to reflect a simpler rule at first. Over a number of epochs he gradually changed the examples to slowly change the rule to the one desired. The nets learned much faster than if he just tossed the examples at the net in random order. He told me that it would not work without weight decay. He said their rule-of-thumb was the decay should give the weights a half-life of 2 to 3 dozen epochs (usually a value such as 0.9998). But I neglected to ask him if he felt that the number of epochs or the number of presentations was important. Perhaps if one had a significantly different training set size, that rule-of-thumb would be different? I have started some experiments similar to his shaping, using some random variation of the training data (where the random variation grows over time). Weiland also discussed this in his talk. I haven't yet compared decay with no-decay.
I did try (as a lark) using decay with a regular (non-shaping) training, and it did worse than we usually get (on same data and same network type/size/shape). Perhaps I was using a stupid decay value (0.9998 I think) for that situation. I hope to get back to this, but at the moment we are preparing for a software release to our shareholders (MCC is owned by 20 or so computer industry corporations). In the next several weeks a lot of people will go on Christmas vacation, so I will be able to run a bunch of nets all at once. They call me the machine vulture. ----------------------------------------------------------------- From: Tony Robinson Date: Sat, 3 Dec 88 11:10:20 GMT Just a quick note in reply to your message to `connectionists' to say that I have tried to use weight decay with back-prop on networks with order 24 i/p, 24 hidden, 11 o/p units. The problem was vowel recognition (I think), it was about 18 months ago, and the problem was of the unsolvable type (i.e. non-zero final energy). My conclusion was that weight decay only made matters worse, and my justification (to myself) for abandoning weight decay was that you are not even pretending to do gradient descent any more, and any good solution formed quickly becomes garbaged by scaling the weights. If you want to avoid hidden units sticking on their limiting values, why not use hidden units with no limiting values, for instance I find the activation function f(x) = x * x works better than f(x) = 1.0 / (1.0 + exp(-x)) anyway. Sorry I haven't got anything formal to offer, but I hope these notes help. Tony Robinson. ----------------------------------------------------------------- From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Sat, 3 Dec 88 11:54:02 EST Actually, "costs" or "penalty" functions are probably better terms. We had a poster last week at NIPS that discussed some of the pitfalls and advantages of two kinds of costs. I can send you the paper when we have a version available. Stephen J. Hanson (jose at bellcore.com) ----------------------------------------------------------------- [ In a conversation in his office on 06-DEC-88, Dave Rumelhart described to me several cost functions he has tried. The motive for the functions he has tried is different from the motive for standard weight decay. Standard weight decay, \sum_{i,j} w_{i,j}^2 , is used to *distribute* weights more evenly over the given connections, thereby increasing robustness (cf. earlier replies). He has tried several other cost functions in an attempt to *localize*, or concentrate, the weights on a small subset of the given connections. The goal is to improve generalization. His favorite is \sum_{i,j} ( w_{i,j}^2 / ( K + w_{i,j}^2 ) ) where K is a constant, around 1 or 2. Note that this function is negatively accelerating, whereas standard weight decay is positively accelerating. This function penalizes small weights (proportionally) more than large weights, just the opposite of standard weight decay. He has also tried, with less satisfying results, \sum_{i,j} ( 1 - \exp( -\alpha w_{i,j}^2 ) ) and \sum_{i,j} \ln ( K + w_{i,j}^2 ). Finally, he has tried a cost function designed to make all the fan-in weights of a single unit decay, when possible. That is, the unit is effectively cut out of the network. The function is \sum_i (\sum_j w_{i,j}^2) / ( K + \sum_j w_{i,j}^2 ). Each weight is thereby penalized (inversely) proportionally to the total fan-in weight of its node. --John ]
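-----------------------------------------------------------------
[ A minimal code sketch of the decay procedures described in the replies above, for concreteness. It is illustrative only: the function names, the learning rate, and the penalty strengths are assumptions; the 0.999 and 0.9998 scale factors are the values quoted in the replies. ]

import numpy as np

def epoch_update(W, grad_W, lr=0.1, decay=0.999):
    # Plain multiplicative decay: an ordinary back-prop step, then every
    # weight is scaled by a constant slightly below 1 once per epoch
    # (0.999 in the CMU variant; 0.9998 gives the longer half-life
    # mentioned in connection with shaping).
    W = W - lr * grad_W
    return decay * W

def epoch_update_penalty(W, grad_W, lr=0.1, h=0.01):
    # Equivalent view: add a penalty (h/2) * sum(W**2) to the objective.
    # Its gradient contributes h*W, so the update becomes
    # W <- (1 - lr*h)*W - lr*grad_W, i.e. decay by the factor (1 - lr*h).
    return W - lr * (grad_W + h * W)

def localizing_penalty_grad(W, K=1.0):
    # Gradient of the "localizing" cost sum_ij w_ij^2 / (K + w_ij^2)
    # described above: d/dw [ w^2/(K + w^2) ] = 2*K*w / (K + w^2)**2,
    # which penalizes small weights proportionally more than large ones.
    return 2.0 * K * W / (K + W ** 2) ** 2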
----------------------------------------------------------------- [ This is also a relevant place to mention my paper in the Proceedings of the 1988 Connectionist Models Summer School, "Creating local and distributed bottlenecks in back-propagation networks". I have since developed those ideas, and have expressed the localized bottleneck method as gradient descent on an additional cost term. The cost term is quite general, and some forms of decay are simply special cases of it. --John] ----------------------------------------------------------------- From: john moody Date: Sun, 11 Dec 88 22:54:11 EST Scalettar and Zee did some interesting work on weight decay with back prop for associative memory. They found that a Unary Representation emerged (see Baum, Moody, and Wilczek; Bio Cybernetics Aug or Sept 88 for info on Unary Reps). Contact Tony Zee at UCSB (805)961-4111 for info on weight decay paper. --John Moody ----------------------------------------------------------------- From: gluck at psych.Stanford.EDU (Mark Gluck) Date: Sat, 10 Dec 88 16:51:29 PST I'd appreciate a copy of your weight decay collation. I have a paper in MS form which illustrates how adding weight decay to the linear-LMS one-layer net improves its ability to predict human generalization in classification learning. mark gluck dept of psych stanford univ, stanford, ca 94305 ----------------------------------------------------------------- From: INAM000 (Tony Marley) Date: SUN 04 DEC 1988 11:16:00 EST I have been exploring some ideas re COMPETITIVE LEARNING with "noisy weights" in modeling simple psychophysics. The task is the classical one of identifying one of N signals by a simple (verbal) response -e.g. the stimuli might be squares of different sizes, and one has to identify the presented one by saying the appropriate integer. We know from classical experiments that people cannot perform this task perfectly once N gets larger than about 7, but performance degrades smoothly for larger N. I have been developing simulations where the mapping is learnt by competitive learning, with the weights decaying/varying over time when they are not reset by relevant inputs. I have not got too many results to date, as I have been taking the psychological data seriously, which means worrying about reaction times, sequential effects, "end effects" (stimuli at the end of the range more accurately identified), range effects (increasing the stimulus range has little effect), etc. Tony Marley ----------------------------------------------------------------- From: aboulanger at bbn.com (Albert Boulanger) Date: Fri, 2 Dec 88 19:43:14 EST This one concerns the Hopfield model. In James D Keeler, "Basin of Attraction of Neural Network Models", Snowbird Conference Proceedings (1986), 259-264, it is shown that the basins of attraction become very complicated as the number of stored patterns increases. He uses a weight modification method called "unlearning" to smooth out these basins. Albert Boulanger BBN Systems & Technologies Corp. aboulanger at bbn.com ----------------------------------------------------------------- From: Joerg Kindermann Date: Mon, 5 Dec 88 08:21:03 -0100 We used a form of weight decay not for learning but for recall in multilayer feedforward networks. See the following abstract. Input patterns are treated as ``weights'' coming from a constant valued external unit. If you would like a copy of the technical report, please send e-mail to joerg at gmdzi.uucp or write to: Dr.
Joerg Kindermann Gesellschaft fuer Mathematik und Datenverarbeitung Schloss Birlinghoven Postfach 1240 D-5205 St. Augustin 1 WEST GERMANY Detection of Minimal Microfeatures by Internal Feedback J. Kindermann & A. Linden Abstract We define the notion of minimal microfeatures and introduce a new method of internal feedback for multilayer networks. Error signals are used to modify the input of a net. When combined with input DECAY, internal feedback allows the detection of sets of minimal microfeatures, i.e. those subpatterns which the network actually uses for discrimination. Additional noise on the training data increases the number of minimal microfeatures for a given pattern. The detection of minimal microfeatures is a first step towards a subsymbolic system with the capability of self-explanation. The paper provides examples from the domain of letter recognition. ----------------------------------------------------------------- From: Helen M. Gigley Date: Mon, 05 Dec 88 11:03:23 -0500 I am responding to your request even though my use of decay is not with respect to learning in connectionist-like models. My focus has been on a functioning system that can be lesioned. One question I have is what is the behavioral association to weight decay? What aspects of learning is it intended to reflect? I can understand that activity decay over time of each cell is meaningful and reflects a cellular property, but what is weight decay in comparable terms? Now, I will send you offprints if you would like of my work and am including a list of several publications which you may be able to peruse. The model, HOPE, is a hand-tuned structural connectionist model that is designed to enable lesioning without redesign or reprogramming to study possible processing causes of aphasia. Decay factors as an integral part of dynamic time-dependent processes are one of several aspects of processing in a neural environment which potentially affect the global processing results even though they are defined only locally. If I can be of any additional help please let me know. Helen Gigley References: Gigley, H.M. Neurolinguistically Constrained Simulation of Sentence Comprehension: Integrating Artificial Intelligence and Brain Theory. Ph.D. Dissertation, UMass/Amherst, 1982. Available University Microfilms, Ann Arbor, MI. Gigley, H.M. HOPE--AI and the dynamic process of language behavior. In Cognition and Brain Theory 6(1): 39-88, 1983. Gigley, H.M. Grammar viewed as a functioning part of a cognitive system. Proceedings of ACL 23rd Annual Meeting, Chicago, 1985. Gigley, H.M. Computational Neurolinguistics -- What is it all about? In IJCAI Proceedings, Los Angeles, 1985. Gigley, H.M. Studies in Artificial Aphasia--experiments in processing change. In Journal of Computer Methods and Programs in Biomedicine, 22 (1): 43-50, 1986. Gigley, H.M. Process Synchronization, Lexical Ambiguity Resolution, and Aphasia. In Steven L. Small, Garrison Cottrell, and Michael Tanenhaus (eds.) Lexical Ambiguity Resolution, Morgan Kaufmann, 1988. ----------------------------------------------------------------- From: bharucha at eleazar.Dartmouth.EDU (Jamshed Bharucha) Date: Tue, 13 Dec 88 16:56:00 EST I haven't tried weight decay but am curious about it. I am working on back-prop learning of musical sequences using a Jordan-style net. The network develops a musical schema after learning lots of sequences that have culture-specific regularities. I.e., it learns to generate expectancies for tones following a sequential context.
I'm interested in knowing how to implement forgetting, whether short term or long term. Jamshed. ----------------------------------------------------------------- From will at ida.org Tue Jan 3 10:50:14 1989 From: will at ida.org (Craig Will) Date: Tue, 3 Jan 89 10:50:14 EST Subject: Copies of DARPA Request for Proposals Available Message-ID: <8901031550.AA16284@csed-1> Copies of DARPA Request for Proposals Available Copies of the DARPA Neural Network Request for Propo- sals are now available (free) upon request. This is the same text as that published December 16 in the Commerce Business Daily, but reformatted and with bigger type for easier reading. This version was sent as a 4-page "Special supplementary issue" to subscribers of Neural Network Review in the United States. To get a copy mailed to you, send your US postal address to either: Michele Clouse clouse at ida.org (milnet) or: Neural Network Review P. O. Box 427 Dunn Loring, VA 22027 From harnad at Princeton.EDU Wed Jan 4 10:12:06 1989 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 4 Jan 89 10:12:06 EST Subject: Connectionist Concepts: BBS Call for Commentators Message-ID: <8901041512.AA11296@psycho.Princeton.EDU> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Commentators must be current BBS Associates or nominated by a current BBS Associate. To be considered as a commentator on this article, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ THE CONNECTIONIST CONSTRUCTION OF CONCEPTS Adrian Cussins, New College, Oxford Keywords: connectionism, representation, cognition, perception, nonconceptual content, concepts, learning, objectivity, semantics Computational modelling of cognition depends on an underlying theory of representation. Classical cognitive science has exploited the syntax/semantics theory of representation derived from formal logic. As a consequence, the kind of psychological explanation supported by classical cognitive science is "conceptualist": psychological phenomena are modelled in terms of relations between concepts and between the sensors/effectors and concepts. This kind of explanation is inappropriate according to Smolensky's "Proper Treatment of Connectionism" [BBS 11(1) 1988]. Is there an alternative theory of representation that retains the advantages of classical theory but does not force psychological explanation into the conceptualist mold? I outline such an alternative by introducing an experience-based notion of nonconceptual content and by showing how a complex construction out of nonconceptual content can satisfy classical constraints on cognition. Cognitive structure is not interconceptual but intraconceptual. The theory of representational structure within concepts allows psychological phenomena to be explained as the progressive emergence of objectivity. This can be modelled computationally by transformations of nonconceptual content which progressively decrease its perspective-dependence through the formation of a cognitive map. 
Stevan Harnad ARPA/INTERNET harnad at confidence.princeton.edu harnad at princeton.edu harnad at mind.princeton.edu srh at flash.bellcore.com harnad at elbereth.rutgers.edu CSNET: harnad%mind.princeton.edu at relay.cs.net UUCP: harnad at princeton.uucp BITNET: harnad at pucc.bitnet harnad1 at umass.bitnet Phone: (609)-921-7771 From will at ida.org Wed Jan 4 10:59:54 1989 From: will at ida.org (Craig Will) Date: Wed, 4 Jan 89 10:59:54 EST Subject: Copies of DARPA Req for Prop Available Message-ID: <8901041559.AA13970@csed-1> Copies of DARPA Request for Proposals Available Copies of the DARPA Neural Network Request for Propo- sals are now available (free) upon request. This is the same text as that published December 16 in the Commerce Business Daily, but reformatted and with bigger type for easier reading. This version was sent as a 4-page "Special supplementary issue" to subscribers of Neural Network Review in the United States. To get a copy mailed to you, send your US postal address to either: Michele Clouse clouse at ida.org (milnet) or: Neural Network Review P. O. Box 427 Dunn Loring, VA 22027 From harnad at Princeton.EDU Wed Jan 4 10:18:00 1989 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 4 Jan 89 10:18:00 EST Subject: Speech Perception: BBS Multiple Book Review Message-ID: <8901041518.AA11306@psycho.Princeton.EDU> Below is the abstract of a book that will be multiply reviewed in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Reviewers must be current BBS Associates or nominated by a current BBS Associate. To be considered as a reviewer for this book, to suggest other appropriate reviewers, or for information about how to become a BBS Associate, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ BBS Multiple Book review of: SPEECH PERCEPTION BY EAR AND EYE: A PARADIGM FOR PSYCHOLOGICAL INQUIRY (Hillsdale NJ: LE Erlbaum Associates 1987) Dominic William Massaro Program in Experimental Psychology University of California, Santa Cruz Keywords: speech perception; vision; audition; categorical perception; connectionist models; fuzzy logic; sensory impairment; decision making This book is about the processing of information, particularly in face-to-face spoken communication where both audible and visible information are available. Experimental tasks were designed to manipulate many of these sources of information independently and to test mathematical fuzzy logical and other models of performance and the underlying stages of information processing. Multiple sources of information are evaluated and integrated to achieve speech perception. Graded information seems to be derived about the degree to which an input fits a given category rather than just all-or-none categorical information. Sources of information are evaluated independently, with the integration process insuring that the least ambiguous sources have the most impact on the judgment. The processes underlying speech-perception also occur in a variety of other behaviors, ranging from categorization to sentence interpretation, decision making and forming impressions about people. 
----- Stevan Harnad INTERNET harnad at confidence.princeton.edu harnad at princeton.edu harnad at mind.princeton.edu srh at flash.bellcore.com harnad at elbereth.rutgers.edu CSNET: harnad%mind.princeton.edu at relay.cs.net UUCP: harnad at princeton.uucp BITNET: harnad at pucc.bitnet harnad1 at umass.bitnet Phone: (609)-921-7771 From mesard at BBN.COM Thu Jan 5 09:37:12 1989 From: mesard at BBN.COM (mesard@BBN.COM) Date: Thu, 05 Jan 89 09:37:12 -0500 Subject: Tech Report Announcement In-Reply-To: Your message of Mon, 02 Jan 89 16:02:06 -0600. <8901022202.AA11713@legendre.aca.mcc.com> Message-ID: Please send me a copy of the tech report Explorations of the Mean Field Theory Learning Algorithm Thanks. Wayne Mesard Mesard at BBN.COM 70 Fawcett St. Cambridge, MA 02138 617-873-1878 From gluck at psych.Stanford.EDU Thu Jan 5 10:20:17 1989 From: gluck at psych.Stanford.EDU (Mark Gluck) Date: Thu, 5 Jan 89 07:20:17 PST Subject: Human Learning & Connectionist Models Message-ID: I would be grateful to receive information about people using connectionist/neural-net approaches within cognitive psychology to model human learning and memory data. Citations to published work, information about work in progress, and copies of reprints or preprints would be most welcome and appreciated. Mark Gluck Dept. of Psychology Jordan Hall; Bldg. 420 Stanford University Stanford, CA 94305 (415) 725-2434 gluck at psych.stanford.edu. From kanderso at BBN.COM Thu Jan 5 16:30:15 1989 From: kanderso at BBN.COM (kanderso@BBN.COM) Date: Thu, 05 Jan 89 16:30:15 -0500 Subject: No subject In-Reply-To: Your message of Tue, 03 Jan 89 00:30:12 -0800. <8901030830.AA09915@cogsci.berkeley.edu> Message-ID: I enjoyed John's summary of weight decay, but it raised a few questions. Just as John did, I'll be glad to summarize the responses to the group. 1. Geoff Hinton mentioned that "Weight-decay is a version of what statisticians call "Ridge Regression"." What do you mean by "version"? Is it exactly the same, or just slightly different? I think I know what Ridge Regression is, but I don't see an obvious strong connection. I see a weak one, and after I think about it more maybe I'll say something about it. The ideas behind Ridge regression probably came from Levenberg and Marquardt who used it in nonlinear least squares: Levenberg K., A Method for the solution of certain nonlinear problems in least squares, Q. Appl. Math, Vol 2, pages 164-168, 1944. Marquardt, D.W., An algorithm for least squares estimation of non-linear parameters, J. Soc. Industrial and Applied Math., 11:431-441, 1963. 2. John quoted Dave Rumelhart as saying that standard weight decay distributes weights more evenly over the given connections, thereby increasing robustness. Why does smearing out large weights increase robustness? What does robustness mean here, the ability to generalize? k From dreyfus at cogsci.berkeley.edu Thu Jan 5 21:04:34 1989 From: dreyfus at cogsci.berkeley.edu (Hubert L. Dreyfus) Date: Thu, 5 Jan 89 18:04:34 PST Subject: Connectionist Concepts: BBS Call for Commentators Message-ID: <8901060204.AA02484@cogsci.berkeley.edu> Stevan: Stuart and I would like to write a joint comment on Cussins' paper. Please send us the latest version by e-mail or regular mail whichever you prefer.
Hubert Dreyfus From daugman%charybdis at harvard.harvard.edu Fri Jan 6 10:41:42 1989 From: daugman%charybdis at harvard.harvard.edu (j daugman) Date: Fri, 6 Jan 89 10:41:42 EST Subject: Neural Networks in Natural and Artificial Vision Message-ID: For preparation of 1989 conference tutorials and reviews, I would be grateful to receive any available p\reprints reporting research on neural network models of human / biological vision and applications in artificial vision. Thanks in advance. John Daugman Harvard University 950 William James Hall Cambridge, Mass. 02138 From josh at flash.bellcore.com Fri Jan 6 14:32:55 1989 From: josh at flash.bellcore.com (Joshua Alspector) Date: Fri, 6 Jan 89 14:32:55 EST Subject: VLSI Implementations of Neural Networks Message-ID: <8901061932.AA07422@flash.bellcore.com> I will be giving a tuturial on the above topic at the Custom Integrated Circuits Conference. Vu grafs are due at the end of February and I would like to include as complete a description as possible of current efforts in the VLSI implementation of neural networks. I would appreciate receiving any preprints or hard copies of vu grafs regarding any work you are doing. E-mail reports are also acceptable. Please send to: Joshua Alspector Bellcore, MRE 2E-378 445 South St. Morristown, NJ 07960-1910 From neural!jsd Fri Jan 6 12:45:14 1989 From: neural!jsd (John Denker) Date: Fri, 6 Jan 89 12:45:14 EST Subject: confidence / runner-up activation Message-ID: <8901061744.AA10566@neural.UUCP> Yes, we've been using the activation level of the runner-up neurons to provide confidence information in our character recognizer for some time. The work was reported at the last San Diego mtg and at the last Denver mtg. --- jsd (John Denker) From netlist at psych.Stanford.EDU Tue Jan 10 09:43:16 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Tue, 10 Jan 89 06:43:16 PST Subject: Stanford Adaptive Networks Colloquium Message-ID: Stanford University Interdisciplinary Colloquium Series: ADAPTIVE NETWORKS AND THEIR APPLICATIONS Co-sponsored by the Departments of Psychology and Electrical Engineering Winter Quarter 1989 Schedule ---------------------------- Jan. 12th (Thursday, 3:30pm): ----------------------------- STEVEN PINKER CONNECTIONISM AND Department of Brain & Cognitive Sciences THE FACTS OF HUMAN LANGUAGE Massachusetts Institute of Technology email: steve at psyche.mit.edu (with commentary by David Rumelhart) Jan. 24th (Tuesday, 3:30pm): ---------------------------- LARRY MALONEY LEARNING BY ASSERTION: Department of Psychology CALIBRATING A SIMPLE VISUAL SYSTEM New York University email: ltm at xp.psych.nyu.edu Feb. 9th (Thursday, 3:30pm): ---------------------------- CARVER MEAD VLSI MODELS OF NEURAL NETWORKS Moore Professor of Computer Science California Institute of Technology Feb. 21st (Tuesday, 3:30pm): ---------------------------- PIERRE BALDI ON SPACE AND TIME IN NEURAL COMPUTATIONS Jet Propulsion Laboratory California Institute of Technology email: pfbaldi at caltech.bitnet Mar. 14th (Tuesday, 3:30pm): ---------------------------- ALAN LAPEDES NONLINEAR SIGNAL PROCESSING WITH NEURAL NETS Theoretical Division - MS B213 Los Alamos National Laboratory email: asl at lanl.gov Additional Information ---------------------- The talks (including discussion) last about one hour and fifteen minutes. Following each talk, there will be a reception. Unless otherwise noted, all talks will be held in room 380-380F, which is in the basement of the Mathematical Sciences buildings. 
To be placed on an electronic-mail distribution list for information about these and other adaptive network events in the Stanford area, send email to netlist at psych.stanford.edu. For additional information, contact: Mark Gluck, Department of Psychology, Bldg. 420, Stanford University, Stanford, CA 94305 (phone 415-725-2434 or email to gluck at psych.stanford.edu). Program Committee: Bernard Widrow (E.E.), David Rumelhart, Misha Pavel, Mark Gluck (Psychology). This series is supported by the Departments of Psychology and Electrical Engineering and by a gift from the Thomson-CSF Corporation. Coming this Spring: D. Parker, B. McNaughton, G. Lynch & R. Granger From hinton at ai.toronto.edu Tue Jan 10 10:09:11 1989 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Tue, 10 Jan 89 10:09:11 EST Subject: new tech report Message-ID: <89Jan10.100924est.10956@ephemeral.ai.toronto.edu> The following report can be obtained by sending an email request to carol at ai.toronto.edu If this fails try carol%ai.toronto.edu at relay.cs.net Please do not send email to me about it (so don't use "reply" or "answer"). "Deterministic Boltzmann Learning Performs Steepest Descent in Weight-space." Geoffrey E. Hinton Department of Computer Science University of Toronto Technical report CRG-TR-89-1 ABSTRACT The Boltzmann machine learning procedure has been successfully applied in deterministic networks of analog units that use a mean field approximation to efficiently simulate a truly stochastic system {Peterson and Anderson, 1987}. This type of ``deterministic Boltzmann machine'' (DBM) learns much faster than the equivalent ``stochastic Boltzmann machine'' (SBM), but since the learning procedure for DBM's is only based on an analogy with SBM's, there is no existing proof that it performs gradient descent in any function, and it has only been justified by simulations. By using the appropriate interpretation for the way in which a DBM represents the probability of an output vector given an input vector, it is shown that the DBM performs steepest descent in the same function as the original SBM, except at rare discontinuities. A very simple way of forcing the weights to become symmetrical is also described, and this makes the DBM more biologically plausible than back-propagation.
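For readers who have not seen the deterministic (mean field) version referred to in the abstract, the two phases can be sketched as follows. This is a generic illustration of a Peterson-and-Anderson style mean field rule, not the derivation in the report; the variable names, the temperature, and the settling schedule are all assumptions.

import numpy as np

def mean_field_settle(W, s, clamped, T=1.0, n_iter=50):
    # Iterate the mean field equations s_i = tanh((1/T) * sum_j W_ij * s_j),
    # holding the clamped units (a dict of index -> value) fixed.
    for _ in range(n_iter):
        s = np.tanh(W.dot(s) / T)
        for i, v in clamped.items():
            s[i] = v
    return s

def dbm_weight_step(W, s_plus, s_minus, lr=0.05):
    # Boltzmann-style learning from the two mean field equilibria:
    #   s_plus  : equilibrium with inputs and desired outputs clamped
    #   s_minus : equilibrium with only the inputs clamped
    # Co-activations seen in the clamped phase are reinforced and those of
    # the free-running phase are unlearned; dW is symmetric, so a symmetric
    # W stays symmetric.
    dW = np.outer(s_plus, s_plus) - np.outer(s_minus, s_minus)
    np.fill_diagonal(dW, 0.0)
    return W + lr * dW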
From netlist at psych.Stanford.EDU Wed Jan 11 09:29:01 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Wed, 11 Jan 89 06:29:01 PST Subject: Thurs (1/12): Steven Pinker on Language Models Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 12th (Thursday, 3:30pm): ----------------------------- ******************************************************************************** STEVEN PINKER CONNECTIONISM AND Department of Brain & Cognitive Sciences THE FACTS OF HUMAN LANGUAGE Massachusetts Institute of Technology email: steve at psyche.mit.edu (with commentary by David Rumelhart) ******************************************************************************** Abstract Connectionist modeling holds the promise of making important contributions to our understanding of human language. For example, such models can explore the role of parallel processing, constraint satisfaction, neurologically realistic architectures, and efficient pattern-matching in linguistic processes. However, the current connectionist program of language modeling seems to be motivated by a different set of goals: reviving classical associationism, eliminating levels of linguistic representation, and maximizing the role of top-down, knowledge-driven processing. I present evidence (developed in collaboration with Alan Prince) that these goals are ill-advised, because the empirical assumptions they make about human language are simply false. Specifically, evidence from adults' and children's abilities with morphology, semantics, and syntax suggests that people possess formal linguistic rules and autonomous linguistic representations, which are not based on the statistical correlations among microfeatures that current connectionist models rely on so heavily. Moreover, I suggest that treating the existence of mentally-represented rules and representations as an empirical question will lead to greater progress than rejecting them on a priori methodological grounds. The data suggest that some linguistic processes are saliently rule-like, and call for a suitable symbol-processing architecture, whereas others are associative, and can be insightfully modeled using connectionist mechanisms. Thus taking the facts of human language seriously can lead to an interesting rapprochement between standard psycholinguistics and connectionist modeling. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. For additional information, contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From unido!gmdzi!joerg at uunet.UU.NET Thu Jan 12 04:30:50 1989 From: unido!gmdzi!joerg at uunet.UU.NET (Joerg Kindermann) Date: Thu, 12 Jan 89 08:30:50 -0100 Subject: CALL FOR PARTICIPATION Message-ID: <8901120730.AA03021@gmdzi.UUCP> Workshop ``DANIP'' Distributed Adaptive Neural Information Processing. 24.-25.4.1989 Gesellschaft fuer Mathematik und Datenverarbeitung mbH Sankt Augustin Neural information processing is gaining increasing attention in many scientific areas. As a consequence the first ``Workshop Konnektionismus'' at the GMD was organized in February 1988. It gave an overview of research activities in neural networks and their applications to Artificial Intelligence. Now, almost a year later, the time has come to focus on the state of neural information processing itself. The aim of the workshop is to discuss TECHNICAL aspects of information processing in neural networks on the basis of personal contributions in one of the following areas: - new or improved learning algorithms (including evaluations) - self organization of structured (non-localist) neural networks - time series analysis by means of neural networks - adaptivity, e.g. the problem of relearning - adequate coding of information for neural processing - generalization - weight interpretation (correlative and other) Presentations which report on ``work in progress'' are encouraged. The size of the workshop will be limited to 15 contributions of 30 minutes in length.
A limited number of additional participants may attend the workshop and take part in the discussions. To apply for the workshop as a contributor, please send information about your contribution (1-2 pages in English or a relevant publication). If you want to participate without giving an oral presentation, please include a description of your background in the field of neural networks. Proceedings on the basis of workshop contributions will be published after the workshop. SCHEDULE: 28 February 1989: deadline for submission of applications 20 March 1989: notification of acceptance 24 - 25 April 1989: workshop ``DANIP'' 31 July 1989: deadline for submission of full papers to be included in the proceedings Applications should be sent to the following address: Dr. Joerg Kindermann or Alexander Linden Gesellschaft fuer Mathematik und Datenverarbeitung mbH - Schloss Birlinghoven - Postfach 1240 D-5205 Sankt Augustin 1 WEST GERMANY e-mail: joerg at gmdzi al at gmdzi From pwh at ece-csc.ncsu.edu Fri Jan 13 17:28:39 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Fri, 13 Jan 89 17:28:39 EST Subject: No subject Message-ID: <8901132228.AA05092@ece-csc.ncsu.edu> NEURAL NETWORKS CALL FOR PAPERS International Joint Conference on Neural Networks June 18-22, 1989 Washington, D.C. The 1989 IEEE/INNS International Joint Conference on Neural Networks (IJCNN-89) will be held at the Sheraton Washington Hotel in Washington, D.C., USA from June 18-22, 1989. IJCNN-89 is the first conference in a new series devoted to the technology and science of neurocomputing and neural networks in all of their aspects. The series replaces the previous IEEE ICNN and INNS Annual Meeting series and is jointly sponsored by the IEEE Technical Activities Board Neural Network Committee and the International Neural Network Society (INNS). IJCNN-89 will be the only major neural net- work meeting of 1989 (IEEE ICNN-89 and the 1989 INNS Annual Meeting have both been cancelled). Thus, it behooves all members of the neural network community who have important new results for presentation to prepare their papers now and submit them by the IJCNN-89 deadline of 1 FEBRUARY 1989. The Conference Proceedings will be distributed AT THE REGISTRATION DESK to all regular conference registrants as well as to all student registrants. The conference will include a day of tutorials (June 18), the exhibit hall (the neurocomputing industry's primary annual trade show), plenary talks, and social events. Mark your calendar today and plan to attend IJCNN-89 -- the definitive annual progress report on the neurocomputing revolution! DEADLINE FOR SUBMISSION OF PAPERS for IJCNN-89 is FEBRUARY 1, 1989. Papers of 8 pages or less are solicited in the following areas: -Real World Applications -Associative Memory -Supervised Learning Theory -Image Analysis -Reinforcement Learning Theory -Self-Organization -Robotics and Control -Neurobiological Models -Optical Neurocomputers -Vision -Optimization -Electronic Neurocomputers -Neural Network Architectures & Theory -Speech Recognition FULL PAPERS in camera-ready form (1 original on Author's Kit forms and 5 reduced 8 1/2" x 11" copies) should be submitted to Nomi Feldman, Confer- ence Coordinator, at the address below. 
For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, IJCNN-89 Conference Coordinator 3770 Tansy Street, San Diego, CA 92121 (619) 453-6222 From rudnick at cse.ogc.edu Sat Jan 14 18:05:27 1989 From: rudnick at cse.ogc.edu (Mike Rudnick) Date: Sat, 14 Jan 89 15:05:27 PST Subject: genetic search and neural nets Message-ID: <8901142305.AA07774@ogccse.OGC.EDU> I am a Ph.D. candidate in computer science at Oregon Graduate Center. My research interest is in using genetic search to tackle artificial neural network (ANN) scaling issues. My particular orientation is to view minimizing interconnections as a central issue, partly motivated by VLSI implementation issues. I am starting a mailing list for those interested in applying genetic search to/with/for ANNs. Mail a request to Neuro-evolution-request at cse.ogc.edu to have your name added to the list. A bibliography of work relating artificial neural networks (ANNs) and genetic search is available. It is organized/oriented for someone familiar with the ANN literature but unfamiliar with the genetic search literature. Send a request to Neuro-evolution-request at cse.ogc.edu for a copy. If there is sufficient interest I will post the bibliography here. -------------------------------------------------------------------------- Mike Rudnick CSnet: rudnick at cse.ogc.edu Computer Science & Eng. Dept. ARPAnet: rudnick%cse.ogc.edu at relay.cs.net Oregon Graduate Center BITNET: rudnick%cse.ogc.edu at relay.cs.net 19600 N.W. von Neumann Dr. UUCP: {tektronix,verdix}!ogccse!rudnick Beaverton, OR. 97006-1999 (503) 690-1121 X7390 -------------------------------------------------------------------------- From sontag at fermat.rutgers.edu Tue Jan 17 14:08:03 1989 From: sontag at fermat.rutgers.edu (sontag@fermat.rutgers.edu) Date: Tue, 17 Jan 89 14:08:03 EST Subject: Kolmogorov's superposition theorem Message-ID: <8901171908.AA00964@control.rutgers.edu> *** I am posting this for Professor Rui de Figueiredo, a researcher in Control Theory and Circuits who does not subscribe to this list. Please direct cc's of all responses to his e-mail address (see below). -eduardo s. *** KOLMOGOROV'S SUPERPOSITION THEOREM AND ARTIFICIAL NEURAL NETWORKS Rui J. P. de Figueiredo Dept. of Electrical and Computer Engineering Rice University, Houston, TX 77251-1892 e-mail: rui at zeta.rice.edu The implementation of the Kolmogorov-Arnold-Sprecher Superposition Theorem [1-3] in terms of artificial neural networks was first presented and fully discussed by me in 1980 [4]. I also discussed, then [4], applications of these structures to statistical pattern recognition and image and multi-dimensional signal processing. However, I did not use the words "neural networks" in defining the underlying networks. For this reason, the current researchers on neural nets including Robert Hecht-Nielsen [5] do not seem to be aware of my contribution [4]. I hope that this note will help correct history. Incidentally, there is a misprint in [4]. In [4], please insert "no" in the statement before eqn.(4). That statement should read: "Sprecher showed that lambda can be any nonzero number which satisfies no equation ..." [1] A.N. Kolmogorov, "On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition," Dokl. Akad. Nauk SSSR, Vol. 114, pp. 369-373, 1957. [2] V.I. Arnol'd, "On functions of three variables," Dokl. Akad. Nauk SSSR, Vol. 114, pp. 953-956, 1957.
[3] D.A. Sprecher, "An improvement in the superposition theorem of Kolmogorov," J. Math. Anal. Appl., Vol. 38, pp. 208-213, 1972. [4] Rui J.P. de Figueiredo, "Implications and applications of Kolmogorov's superposition theorem," IEEE Trans. Auto. Contr., Vol. AC-25, pp. 1227-1231, 1980. [5] R. Hecht-Nielsen, "Kolmogorov's mapping neural network existence theorem," IEEE 1st Int. Conf. on Neural Networks, San Diego, CA, June 21-24, 1987, paper III-11. From ncr-fc!avery at ncr-sd.sandiego.ncr.com Tue Jan 17 19:43:43 1989 From: ncr-fc!avery at ncr-sd.sandiego.ncr.com (ncr-fc!avery@ncr-sd.sandiego.ncr.com) Date: Tue, 17 Jan 89 17:43:43 MST Subject: new address Message-ID: <8901180043.AA19084@ncr-fc.FtCollins.NCR.com> I have a new e-mail address. Not the one in the reply field but this one. avery%ncr-fc at ncr-sd.sandiego.ncr.com Will you please get me back on the discussion group. From MUMME%IDCSVAX.BITNET at CUNYVM.CUNY.EDU Tue Jan 17 23:22:00 1989 From: MUMME%IDCSVAX.BITNET at CUNYVM.CUNY.EDU (MUMME%IDCSVAX.BITNET@CUNYVM.CUNY.EDU) Date: Tue, 17 Jan 89 20:22 PST Subject: Tech. Report Available Message-ID: The following tech. report is available from the University of Illinois Dept. of Computer Science: UIUCDCS-R-88-1485 STORAGE CAPACITY OF THE LINEAR ASSOCIATOR: BEGINNINGS OF A THEORY OF COMPUTATIONAL MEMORY by Dean C. Mumme May, 1988 ABSTRACT This thesis presents a characterization of a simple connectionist-system, the linear-associator, as both a memory and a classifier. Toward this end, a theory of memory based on information-theory is devised. The principles of the information-theory of memory are then used in conjunction with the dynamics of the linear-associator to discern its storage capacity and classification capabilities as they scale with system size. To determine storage capacity, a set of M vector-pairs called "items" are stored in an associator with N connection-weights. The number of bits of information stored by the system is then determined to be about (N/2)logM. The maximum number of items storable is found to be half the number of weights so that the information capacity of the system is quantified to be (N/2)logN. Classification capability is determined by allowing vectors not stored by the associator to appear at its input. Conditions necessary for the associator to make a correct response are derived from constraints of information theory and the geometry of the space of input-vectors. Results include derivation of the information-throughput of the associator, the amount of information that must be present in an input-vector and the number of vectors that can be classified by an associator of a given size with a given storage load. Figures of merit are obtained that allow comparison of capabilities of general memory/classifier systems. For an associator with a simple non-linearity on its output, the merit figures are evaluated and shown to be suboptimal. Constant attention is devoted to relative parameter size required to obtain the derived performance characteristics. Large systems are shown to perform nearest to the optimum performance limits and suggestions are made concerning system architecture needed for best results. Finally, avenues for extension of the theory to more general systems are indicated. This tech. report is essentially my Ph.D. thesis completed last May and can be obtained by sending e-mail to: erna at a.cs.uiuc.edu Please do not send requests to me since I now live in Idaho and don't have access to the tech. reports. When replying to this notice, please do not use REPLY or send a note to "CONNECTIONISTS...". Send your request directly to Erna. Comments, questions and suggestions about the work can be sent directly to me at the address below. Thank You! Dean C. Mumme bitnet: mumme at idcsvax Dept. of Computer Science University of Idaho Moscow, ID 83843
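For a concrete sense of scale, an illustrative calculation using the figures quoted in the abstract (with logs taken base 2, which the abstract does not specify): an associator with N = 10^4 connection weights can store at most M = N/2 = 5000 items, and its information capacity is about (N/2) \log_2 N \approx 5000 \times 13.3 \approx 6.6 \times 10^4 bits, i.e. roughly 13 bits per stored item.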
From poggio at wheaties.ai.mit.edu Tue Jan 17 22:47:17 1989 From: poggio at wheaties.ai.mit.edu (Tomaso Poggio) Date: Tue, 17 Jan 89 22:47:17 EST Subject: Kolmogorov's superposition theorem In-Reply-To: sontag@fermat.rutgers.edu's message of Tue, 17 Jan 89 14:08:03 EST <8901171908.AA00964@control.rutgers.edu> Message-ID: <8901180347.AA21088@rice-chex.ai.mit.edu> Kolmogorov's theorem and its relation to networks are discussed in Biol. Cyber., 37, 167-186, 1979. (On the representation of multi-input systems: computational properties of polynomial algorithms, Poggio and Reichardt). There are references there to older papers (see especially the two nice papers by H. Abelson). From mozer%neuron at boulder.Colorado.EDU Wed Jan 18 16:19:46 1989 From: mozer%neuron at boulder.Colorado.EDU (Michael C. Mozer) Date: Wed, 18 Jan 89 14:19:46 MST Subject: oh boy, more tech reports... Message-ID: <8901182119.AA00413@neuron> Please e-mail requests to "kate at boulder.colorado.edu". Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment Michael C. Mozer Paul Smolensky University of Colorado Department of Computer Science Tech Report # CU-CS-421-89 This paper proposes a means of using the knowledge in a network to determine the functionality or _relevance_ of individual units, both for the purpose of understanding the network's behavior and improving its performance. The basic idea is to iteratively train the network to a certain performance criterion, compute a measure of relevance that identifies which input or hidden units are most critical to performance, and automatically trim the least relevant units. This _skeletonization_ technique can be used to simplify networks by eliminating units that convey redundant information; to improve learning performance by first learning with spare hidden units and then trimming the unnecessary ones away, thereby constraining generalization; and to understand the behavior of networks in terms of minimal "rules." [An abridged version of this TR will appear in NIPS proceedings.] --------------------------------------------------------------------------- And while I'm at it, some other recent junk, I mean stuff... A Focused Back-Propagation Algorithm for Temporal Pattern Recognition Michael C. Mozer University of Toronto Connectionist Research Group Tech Report # CRG-TR-88-3 Time is at the heart of many pattern recognition tasks, e.g., speech recognition. However, connectionist learning algorithms to date are not well-suited for dealing with time-varying input patterns. This paper introduces a specialized connectionist architecture and corresponding specialization of the back-propagation learning algorithm that operates efficiently on temporal sequences. The key feature of the architecture is a layer of self-connected hidden units that integrate their current value with the new input at each time step to construct a static representation of the temporal input sequence. This architecture avoids two deficiencies found in other models of sequence recognition: first, it reduces the difficulty of temporal credit assignment by focusing the back propagated error signal; second, it eliminates the need for a buffer to hold the input sequence and/or intermediate activity levels. The latter property is due to the fact that during the forward (activation) phase, incremental activity _traces_ can be locally computed that hold all information necessary for back propagation in time. It is argued that this architecture should scale better than conventional recurrent architectures with respect to sequence length. The architecture has been used to implement a temporal version of Rumelhart and McClelland's verb past-tense model. The hidden units learn to behave something like Rumelhart and McClelland's "Wickelphones," a rich and flexible representation of temporal information.
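The integration step described in the abstract can be pictured with a short sketch. This is one possible reading of "self-connected hidden units that integrate their current value with the new input"; the per-unit decay weights d, the squashing function, and all names below are assumptions rather than the report's actual equations.

import numpy as np

def focused_context_step(c_prev, x, W_in, d):
    # One time step for a layer of self-connected context units: each unit
    # keeps a weighted trace of its own previous value (d holds the per-unit
    # self-connection strengths) and adds in the squashed contribution of
    # the current input.
    return d * c_prev + np.tanh(W_in.dot(x))

# After the whole sequence has been presented, the final context vector is a
# fixed-width summary on which an ordinary feed-forward classifier, trained
# by back-propagation, can operate.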
--------------------------------------------------------------------------- A Connectionist Model of Selective Attention in Visual Perception Michael C. Mozer University of Toronto Connectionist Research Group Tech Report # CRG-TR-88-4 This paper describes a model of selective attention that is part of a connectionist object recognition system called MORSEL. MORSEL is capable of identifying multiple objects presented simultaneously on its "retina," but because of capacity limitations, MORSEL requires attention to prevent it from trying to do too much at once. Attentional selection is performed by a network of simple computing units that constructs a variable-diameter "spotlight" on the retina, allowing sensory information within the spotlight to be preferentially processed. Simulations of the model demonstrate that attention is more critical for less familiar items and that attention can be used to reduce inter-item crosstalk. The model suggests four distinct roles of attention in visual information processing, as well as a novel view of attentional selection that has characteristics of both early and late selection theories. From Scott.Fahlman at B.GP.CS.CMU.EDU Wed Jan 18 13:54:02 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Wed, 18 Jan 89 13:54:02 EST Subject: Benchmark collection Message-ID: The mailing list "nn-bench at cs.cmu.edu" is now in operation. I believe that all the "add me" requests received prior to 1/17/89 have been serviced. Of course, it's possible that we messed up some of the requests. If you sent in a request more than a couple of days ago and if you have not yet seen any "nn-bench" mail, please contact "nn-bench-request at cs.cmu.edu" and we'll investigate. New requests should be sent to that same address. The list currently has about 80 subscribers, plus two rebroadcast sites. -- Scott Fahlman, CMU From pollack at cis.ohio-state.edu Fri Jan 20 15:40:09 1989 From: pollack at cis.ohio-state.edu (Jordan B. Pollack) Date: Fri, 20 Jan 89 15:40:09 EST Subject: Technical Report: LAIR 89-JP-NIPS Message-ID: <8901202040.AA13239@orange.cis.ohio-state.edu> Preprint of a NIPS paper is now available. Request LAIR 89-JP-NIPS From: Randy Miller CIS Dept/Ohio State University 2036 Neil Ave Columbus, OH 43210 or respond to this message but MODIFY THE To: AND Cc: LINES!!!!! ------------------------------------------------------------------------------ IMPLICATIONS OF RECURSIVE DISTRIBUTED REPRESENTATIONS Jordan B.
Pollack Laboratory for AI Research Ohio State University Columbus, OH 43210 I will describe my recent results on the automatic development of fixed-width recursive distributed representations of variable-sized hierarchical data structures. One implication of this work is that certain types of AI-style data-structures can now be represented in fixed-width analog vectors. Simple inferences can be performed using the type of pattern associations that neural networks excel at. Another implication arises from noting that these representations become self-similar in the limit. Once this door to chaos is opened, many interesting new questions about the representational basis of intelligence emerge, and can (and will) be discussed. From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Sat Jan 21 00:06:08 1989 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (thanasis kehagias) Date: Sat, 21 Jan 89 00:06:08 EST Subject: No subject Message-ID: i was reading through the abstracts of the Boston 1988 INNS conference and noticed H. Bourlard and C. Welleken's paper on the relations between Hidden Markov Models and Multi Layer Perceptron. Does anybody have any pointers to papers on the subject by the same (preferably) or other authors? or the e-mail address of these two authors? Thanasis Kehagias 
From netlist at psych.Stanford.EDU Sun Jan 22 18:23:16 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Sun, 22 Jan 89 15:23:16 PST Subject: (Tues. 1/24): Larry Maloney on Visual Calibration Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 24th (Tuesday, 3:30pm): ----------------------------- ******************************************************************************** Learning by Assertion: Calibrating a Simple Visual System LARRY MALONEY Department of Psychology 6 Washington Place; 8th Floor New York University New York, NY 10003 email: ltm at xp.psych.nyu.edu ******************************************************************************** Abstract An ideal visual system is calibrated if its estimates reflect the actual state of the scene: Straight lines, for example, should be judged to be straight. If an ideal visual system is modeled as a neural network, then it is calibrated only if the weights linking elements of the network are assigned correct values. I describe a method (`Learning by Assertion') for calibrating an ideal visual system by adjusting the weights. The method requires no explicit feedback or prior knowledge concerning the contents of the environment. This work is relevant to biological visual development and calibration, to the calibration of machine vision systems, and to the design of adaptive network algorithms. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. For additional information, contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From rsun at cs.brandeis.edu Sun Jan 22 17:02:48 1989 From: rsun at cs.brandeis.edu (Ron Sun) Date: Sun, 22 Jan 89 17:02:48 est Subject: Technical Report: LAIR 89-JP-NIPS Message-ID: Please send this TR to Ron Sun Brandeis U CS Waltham, MA 02254 Thank you. From koch%HAMLET.BITNET at VMA.CC.CMU.EDU Mon Jan 23 14:29:31 1989 From: koch%HAMLET.BITNET at VMA.CC.CMU.EDU (Christof Koch) Date: Mon, 23 Jan 89 11:29:31 PST Subject: Gimme a break! Message-ID: <890123112923.20203114@Hamlet.Caltech.Edu> re. "Call for papers IJCNN, the only major neural network meeting of 1989 [sic]" Neural Information Processing Systems 1989 at Denver will be held this year from November 28th until November 30th, followed by a workshop on December 1/2. This is the third annual meeting held under the auspices of the IEEE, Society of Neuroscience, and APS. For further information contact Scott Kirkpatrick, General Chairman (kirk at ibm.com) or wait for the Call for Papers which is in preparation. Christof From jbower at bek-mc.caltech.edu Mon Jan 23 16:20:39 1989 From: jbower at bek-mc.caltech.edu (Jim Bower) Date: Mon, 23 Jan 89 13:20:39 pst Subject: NIPS 89 Message-ID: <8901232120.AA14266@bek-mc.caltech.edu> To whom it may concern: A few days ago there was an announcement on the connectionist network that only one "major" neural network meeting would be held in 1989. 
While "major" in past meeting announcements for the INNS and the IEEE San Diego meetings has seemed most often to be equated with total attendance, and size of the exhibit area, an equally important measure might be the overall quality of the work presented and therefore, the importance of the meeting to the field. Accordingly, the previous announcement should probably be amended to include the fact that the Third annual Neural Information Processing Systems (NIPS) meeting will be held in late 1989 in Denver. While the objective of this meeting is not to be the biggest meeting ever, and submitted papers are refereed, authors might consider submitting important results to this meeting anyway. A call for papers will be announced, as usual, on this network. Jim Bower From jbower at bek-mc.caltech.edu Mon Jan 23 16:17:05 1989 From: jbower at bek-mc.caltech.edu (Jim Bower) Date: Mon, 23 Jan 89 13:17:05 pst Subject: NIPS Message-ID: <8901232117.AA14257@bek-mc.caltech.edu> To whom it may concern: A few days ago there was an announcement on the connectionist network that only one "major" neural network meeting would be held in 1989. While "major" in past meeting announcements for the INNS and the IEEE San Diego meetings has seemed most often to be equated with total attendance, and size of the exhibit area, an equally important measure might be the overall quality of the work presented and therefore, the importance of the meeting to the field. Accordingly, the previous announcement should probably be amended to include the fact that the Third annual Neural Information Processing Systems (NIPS) meeting will be held in late 1989 in Denver. While the objective of this meeting is not to be the biggest meeting ever, and submitted papers are refereed, authors might consider submitting important results to this meeting anyway. A call for papers will be announced, as usual, on this network. Jim Bower From Dave.Touretzky at B.GP.CS.CMU.EDU Mon Jan 23 18:46:25 1989 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Mon, 23 Jan 89 18:46:25 EST Subject: message from Jim Bower Message-ID: <331.601602385@DST.BOLTZ.CS.CMU.EDU> ================================================================ Date: Sun, 22 Jan 89 20:37:57 pst From: jbower at bek-mc.caltech.edu (Jim Bower) To: Connectionists-Request at q.cs.cmu.edu Subject: NIPS 89 To whom it may concern: A few days ago there was an announcement on the connectionist network that only one "major" neural network meeting would be held in 1989. While "major" in past meeting announcements for the INNS and the IEEE San Diego meetings has seemed most often to be equated with total attendance, and size of the exhibit area, an equally important measure might be the overall quality of the work presented and therefore, the importance of the meeting to the field. Accordingly, the previous announcement should probably be amended to include the fact that the Third annual Neural Information Processing Systems (NIPS) meeting will be held in late 1989 in Denver. While the objective of this meeting is not to be the biggest meeting ever, and submitted papers are refereed, authors might consider submitting important results to this meeting anyway. A call for papers will be announced, as usual, on this network. 
From movellan%garnet.Berkeley.EDU at violet.berkeley.edu Mon Jan 23 23:32:11 1989 From: movellan%garnet.Berkeley.EDU at violet.berkeley.edu (movellan%garnet.Berkeley.EDU@violet.berkeley.edu) Date: Mon, 23 Jan 89 20:32:11 pst Subject: Weight Decay Message-ID: <8901240432.AA18293@garnet.berkeley.edu> Referring to the compilation about weight decay from John: I cannot see the analogy between weight decay and ridge regression. The weight solutions in a linear network (Ordinary Least Squares) are the solutions to (I'I) W = I'T where: I is the input matrix (rows are # of patterns in epoch and columns are # of input units in net). T is the teacher matrix (rows are # of patterns in epoch and columns are # of teacher units in net). W is the matrix of weights (net is linear with only one layer!). The weight solutions in ridge regression would be given by (I'I + k<1>) W = I'T, where k is a "shrinkage" constant and <1> represents the identity matrix. Notice that k<1> has the same effect as increasing the variances of the inputs (diagonal of I'I) without increasing their covariances (rest of the I'I matrix). The final effect is biasing the W solutions but reducing the extreme variability to which they are subject when I'I is near singular (multicollinearity). Obviously collinearity may be a problem in nets with a large # of hidden units. I am presently studying how and why collinearity in the hidden layer affects generalization and whether ridge solutions may help in this situation. I cannot see though how these ridge solutions relate to weight decay. -Javier From ILPG0 at ccuab1.uab.es Tue Jan 24 09:23:00 1989 From: ILPG0 at ccuab1.uab.es (CORTO MALTESE) Date: Tue, 24 Jan 89 14:23 GMT Subject: Suscription Message-ID: Dear list owner, I should be grateful if you could add my name to the list of subscribers of Connectionists. My name is O. S. Vilageliu, and the e-mail address is: ilpg0 at ccuab1.uab.es I thank you beforehand, Sincerely yours, Olga Soler From pollack at cis.ohio-state.edu Tue Jan 24 11:51:15 1989 From: pollack at cis.ohio-state.edu (Jordan B. Pollack) Date: Tue, 24 Jan 89 11:51:15 EST Subject: Gimme a break! In-Reply-To: Christof Koch's message of Mon, 23 Jan 89 11:29:31 PST <890123112923.20203114@Hamlet.Caltech.Edu> Message-ID: <8901241651.AA02067@toto.cis.ohio-state.edu> Speaking of NIPS versus IJCNN, at least NIPS is pronounceable, even though, as Terry S pointed out, Nabisco already holds it as a trademark. If the international joint conference is to be as lasting a success as, say, IJCAI, then its acronym should smoothly roll off the tongue. Here are some of the alternatives I've just come up with:

Minor variations:
  JINNC   (Jink) Permute the word order
  IJCONN  Same name, but include the "ON"
  ICONN   Leave out the "Joint" (for a drug free meeting?)
  ICONS   International Conf. on Neural Systems (Hey! This is even a Word!)

The most elegant name is simply NN "Neural Networks", which can be spoken as either "N Squared" signifying both its size and technical nature, or "Double-N", signifying both the need for a big spread and the yearly "round-up" of research results like cattle...

Of course the search for acronyms usually generates useless debris:
  NIPSOID  Neural Information Processing Systems On an International Dimension
  MANIC    Most (of the) Artificially Neural International Community
  DNE      (Sounds like DNA?) Dear Neural Enthusiast...
  BNANA    Big Network of Artificial Neural Aficionados
  ARTIST   Adaptive Resonance Theory as International Science and Technology
  IBSH     I better Stop Here. 
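Returning to the ridge-regression question raised above by Javier: the claimed equivalence is easy to check numerically. The following is only an illustrative sketch, not anything from the original exchange: it is written in Python with NumPy, and every name in it (n_patterns, n_inputs, k, lrate, and so on) is invented for the example. It fits a one-layer linear net by gradient descent on squared error plus a weight-decay term kW'W and compares the result with the direct ridge solution (I'I + k<1>)W = I'T.

import numpy as np

rng = np.random.default_rng(0)
n_patterns, n_inputs, n_targets = 20, 5, 3
k = 0.1                                        # "shrinkage" / weight-decay constant

I = rng.normal(size=(n_patterns, n_inputs))    # input matrix (patterns x input units)
T = rng.normal(size=(n_patterns, n_targets))   # teacher matrix (patterns x teacher units)

# Direct ridge solution:  (I'I + k<1>) W = I'T
W_ridge = np.linalg.solve(I.T @ I + k * np.eye(n_inputs), I.T @ T)

# Gradient descent on  E = |T - IW|^2 + k|W|^2  (squared error plus weight decay)
W = np.zeros((n_inputs, n_targets))
lrate = 0.01
for _ in range(20000):
    grad = I.T @ (I @ W - T) + k * W           # dE/dW, dropping a factor of 2
    W = W - lrate * grad

print(abs(W - W_ridge).max())                  # effectively zero: the two agree

With the decay constant playing the role of the ridge constant, the two weight matrices agree to numerical precision, which is the algebraic point made in the next message.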
From kanderso at BBN.COM Tue Jan 24 13:54:04 1989 From: kanderso at BBN.COM (kanderso@BBN.COM) Date: Tue, 24 Jan 89 13:54:04 -0500 Subject: Weight Decay In-Reply-To: Your message of Mon, 23 Jan 89 20:32:11 -0800. <8901240432.AA18293@garnet.berkeley.edu> Message-ID: Date: Mon, 23 Jan 89 20:32:11 pst From: movellan%garnet.Berkeley.EDU at violet.berkeley.edu Message-Id: <8901240432.AA18293 at garnet.berkeley.edu> To: connectionists at cs.cmu.edu Subject: Weight Decay Referring to the compilation about weight decay from John: I cannot see the analogy between weight decay and ridge regression. The weight solutions in a linear network (Ordinary Least Squares) are the solutions to (I'I) W = I'T where: I is the input matrix (rows are # of patterns in epoch and columns are # of input units in net). T is the teacher matrix (rows are # of patterns in epoch and columns are # of teacher units in net). W is the matrix of weights (net is linear with only one layer!). The weight solutions in ridge regression would be given by (I'I + k<1>) W = I'T. Where k is a "shrinkage" constant and <1> represents the identity matrix. Notice that k<1> has the same effect as increasing the variances of the inputs (Diagonal of I'I) without increasing their covariances (rest of the I'I matrix). The final effect is biasing the W solutions but reducing the extreme variability to which they are subject when I'I is near singular (multicollinearity). Obviously collinearity may be a problem in nets with a large # of hidden units. I am presently studying how and why collinearity in the hidden layer affects generalization and whether ridge solutions may help in this situation. I cannot see though how these ridge solutions relate to weight decay. -Javier Yes i was confused by this too. Here is what the connection seems to be. Say we are trying to minimize an energy function E(w) of the weight vector for our network. If we add a constraint that also attempts to minimize the length of w we would add a term kw'w to our energy function. Taking your linear least squares problem we would have E = (T-IW)'(T-IW) + kW'W dE/dW = I'IW - I'T + kW setting dE/dW = 0 gives [I'I +k<1>]W = I'T, ie. Ridge Regression. W = [I'I + k<1>]^-1 I'T The covariance matrix is [I'I + k<1>]^-1 so the effect of increasing k 1. Make the matrix more invertable. 2. Reduces the covariance so that new training data will have less effect on your weights. 3. You loose some resolution in weight space. I agree that collinearity is probably very important, and i'll be glad to discuss that off line. k From jose at tractatus.bellcore.com Wed Jan 25 10:02:09 1989 From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Wed, 25 Jan 89 10:02:09 EST Subject: Weight Decay Message-ID: <8901251502.AA05090@tractatus.bellcore.com> actually I think the connection is more general--ridge regression is a special case of variance techniques in regression called "biased regression" (including principle components), biases are introduced in order to remove effects of collinearity as has been discussed and to attempt to achieve estimators that may have a lower variance then the theoretical best least squares unbiased estimator ("blue") since when assumptions of linearity and independence are violated LSE are not particularly attractive and will not necessarily achieve "blue"s. Conseqently nonlinear regression and ordinary linear least squares regression with collinear variables may be able to achieve lower variance estimators by entertaining biases. 
In the nonlinear case a bias term would enter as a "constraint" to be minimized along with the error, (y - yhat)^2. This constraint is actually a term that can push weights differentially towards zero--in terms of regression it is a bias; in terms of neural networks, weight decay. Ridge regression is a specific case in linear LSE where the off-diagonal terms of the correlation matrix are given less weight by adding a small constant to the diagonal in order to reduce the collinearity problem--it is still controversial in statistical arenas--not everyone subscribes to the notion of introducing biases--since it is hard a priori to know what bias might be optimal for a given problem. I have a paper with Lori Pratt that describes this relationship more generally; it was given at the last NIPS and should be available soon as a tech report. Steve Hanson From rui at rice.edu Wed Jan 25 18:34:38 1989 From: rui at rice.edu (Rui DeFigueiredo) Date: Wed, 25 Jan 89 17:34:38 CST Subject: No subject Message-ID: <8901252334.AA01804@zeta.rice.edu> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - In-Reply-To: poggio at wheaties.ai.mit.edu's message of Tue, 17 Jan 89 22:47:17 EST Subject: Kolmogorov's superposition theorem Kolmogorov's theorem and its relation to networks are discussed in Biol. Cyber., 37, 167-186, 1979. (On the representation of multi-input systems: computational properties of polynomial algorithms, Poggio and Reichardt). There are references there to older papers (see especially the two nice papers by H. Abelson). - - - - - - - - - - - - end of message - - - - - - - - - - - - Comment: Poggio and Reichardt's paper, "On the representation of multi-input systems: Computational properties of polynomial algorithms" (Biol. Cyber., 37, 167-186, 1980) appeared not earlier but in the same year as deFigueiredo's, "Implications and applications of Kolmogorov's superposition theorem" (IEEE Trans. on Automatic Control, AC-25, 1227-1231, 1980). From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 12:55:49 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 12:55:49 EST Subject: DARPA Program announcement (long, 2 of 3) Message-ID: Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT(BAA#89-04): NEURAL NETWORKS: HARDWARE TECHNOLOGY BASE DEVELOPMENT SOL BAA#89-04 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L. Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to develop hardware system components that capitalize on the inherent massive parallelism and expected robustness of neural network models. The objective of the present effort is to lay the groundwork for future construction of full-scale artificial neural network computing machines through the development of advanced hardware implementation technologies. DARPA does not intend to build full-scale machines at this stage of the program. Areas of interest include modifiable-weight synaptic connections, neuron processing unit devices, and scalable neural net architecture designs. The technologies proposed may be analog or digital, using silicon or other materials, and may be electronic, optoelectronic, optical, or other. The technology should be robust to manufacturing and environmental variability. 
It should be flexible and modular to accommodate evolving neural network system architectures and to allow for scale-up to large-sized systems through assembly/interconnection of smaller subsystems. It should be appropriate for future compact, low-power systems. It must accommodate the high fan-out/high fan-in properties characteristic of artificial neural network systems with high density interconnects, and it must have high throughput capability to achieve rapid processing of large volumes of data. Only those proposals that clearly delineate how the objective enumerated above are to be achieved and that demonstrate extensive prior experience in hardware design and fabrication will be favorably considered. If the proposal addresses a component technology, proposers should provide a detailed description of the interface features required for integration into a working artificial neural network system. Whether the proposed technology is adapted to a specific neural net model or, conversely, is applicable to a broad range of models, the proposer should clearly define the specific features of the proposed hardware that underlie its particular applicability. To the extent that availability of the proposed technology will facilitate the implementation of advanced systems other that artificial neural network systems, that potential impact should be described. Hardware developers are encouraged to work in close coordination with neural network modelers to better understand the range of current projected architectural requirements. DARPA will also entertain a limited number of proposals to develop near-term prototypes with high potential for demonstrating the expected power of artificial neural networks. This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28 month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research. Proprietary portions to the technical proposal should be specifically identified. Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of this principal investigators and other key personnel to be employed in the conduct of this research, with brief, resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. 
The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance; (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be sub- mitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 2209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA. From poggio at wheaties.ai.mit.edu Thu Jan 26 13:01:23 1989 From: poggio at wheaties.ai.mit.edu (Tomaso Poggio) Date: Thu, 26 Jan 89 13:01:23 EST Subject: No subject In-Reply-To: Rui DeFigueiredo's message of Wed, 25 Jan 89 17:34:38 CST <8901252334.AA01804@zeta.rice.edu> Message-ID: <8901261801.AA15158@wheat-chex.ai.mit.edu> ... From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 13:00:34 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 13:00:34 EST Subject: DARPA Program Announcement (long, 3 of 3) Message-ID: Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT(BAA#89-03): NEURAL NETWORKS: THEORY AND MODELING SOL BAA#89-03 DUE 030189 POC Douglas M. Pollock, Contracts(202)694-1771; Dr. Barbara L. Yoon, Technical(202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to develop and analyze new artificial neural network system architectures/structures and training procedures; define the requirements for scale-up to large-sized artificial neural networks; and characterize the properties, limitations, and data requirements of new and existing artificial neural network systems. 
Proposers are encouraged to submit proposals that deal with, but are not limited to, any combination of the following thrusts within these areas: (1) New artificial neural architectures with one or more of the following features: (a) Potential for addressing real-time sensory data processing and real-time sensorimonitor control; (b) Networks that incorporate features of sensory, motor, and perceptual processing in biological systems; (c) Nodal elements with increased processing capability, including sensitivity to temporal variations in synaptic inputs; (d) Modular networks composed of multiple interconnected subnets; (e) Hybrid systems combining neural and conventional information processing techniques; (f) Mechanisms to achieve modifications of network behavior in response to external consequences of initial actions; (g) Mechanisms that exhibit selective attention; (h) Strategies for developing conceptual systems and internal data representations well adapted to specific tasks; (i) Means for recognizing and producing sequences of temporal patterns. (2) Faster, more efficient training procedures that: (a) Are robust to noisy data and able to accommodate delayed feedback; (b) Minimize the need for external intervention for feedback; (c) Identify optimal choices of initial classification features or categories; (d) Generate internal models of the external world to guide appropriate responses to external stimuli. (3) Theoretical analyses that address; (a) Data representations; (b) Scaling properties for new and existing systems; (c) Matching of system complexity to the nature and amount of training data; (d) Tolerance to nodal element and synaptic failure; (e) Stability and convergence of new and existing systems; (f) Relationships between neural networks and conventional approaches. DARPA will also entertain a limited number of proposals to address special applications with high potential for demonstrating the expected power of artificial neural networks.This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28 month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research. Proprietary portions to the technical proposal should be specifically identified. 
Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of this principal investigators and other key personnel to be employed in the conduct of this research, with brief, resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance; (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be sub- mitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 2209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA. From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 12:49:35 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 12:49:35 EST Subject: DARPA Program announcement (long, 1 of 3) Message-ID: Barbara Yoon at DARPA has apparently been flooded with requests for the three DARPA program announcements in the neural network area. To lighten the load, she asked us to send out the full text of these announcements to members of this mailing list. The text in this and the following two messages is copied verbatim from the Commerce Business Daily. We have resisted the temptation to insert paragraph breaks to improve readability. I apologize for dumping so much text on people who alrady have copies of the announcements or who are not interested, but this seems the best way to get the word out to a large set of potentially interested people. 
Please don't contact us about this program -- the appropriate phone numbers and addresses are listed in the announcements. -- Scott Fahlman, CMU =========================================================================== Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT (BAA#89-02): NEURAL NETWORKS: COMPARATIVE PERFORMANCE MEASUREMENTS SOL BAA#89-02 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L. Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO is interested in receiving proposals to construct and test software simulations of artificial neural networks (or software simulations of hybrid systems incorporating artificial neural networks) that perform defined, complex classification tasks in the following application areas: (1) Automatic target recognition; (2) Continuous speech recognition; (3) Sonar signal discrimination; and (4) Seismic signal discrimination. The objectives of this program are to advance the state-of-the-art in application of artificial neural network approaches to classification problems; to investigate the optimal role of artificial neural networks in hybrid classification systems; and to measure the projected performance of artificial neural networks (or hybrid systems containing neural networks) in order to support a comparison with the performance of alternative, competing technologies. DARPA will provide application developers with a standard set of training data, appropriate to the application, to be used as the basis for training (or otherwise developing) their classification systems. The systems developed will then be evaluated independently in classification of standard sets of test data, distinct from the training set. The four application tasks are more fully described below. (1) Automatic target recognition: (a) Given a multi-spectral training set of time-correlated images of up to ten land vehicles (which may be partially obscured and in cluttered environments) with ground truth provided, identify and classify these vehicles in a new set of images (outside the training set); (b) Given images of two or more new land vehicles, recognize these vehicles as distinct from the original set and distinguish them from one another (with no system reprogramming or retraining); (c) Given a new training set of data on air vehicles, with system reprogramming and/or retraining, modify the system to identify and classify this new class of targets. (2) Continuous speech recognition: (a) Given a training set of 2800 spoken English sentences (with a 1000 word vocabulary), transcribe to written text spoken English sentences from a test set (outside the training set); (b) With no system reprogramming or retraining, transcribe to text spoken English sentences using vocabulary outside the initial vocabulary (given only the phonetic spelling of the new words); (c) Given training data on spoken foreign language sentences (with characteristics similar to those of the English sentence data base described in application (2)(a) above), with system programming and/or retraining, modify the system to transcribe to text spoken foreign language sentences. 
(3) Sonar signal discrimination: (a) Given a training set of several acoustic signature transients and passive marine acoustic signals (both signal types in noisy environments), detect and classify each signal type in a test set (outside the training set); (b) Given two or more new passive marine acoustic signals, with no system reprogramming or retraining, recognize these signals as distinct from the original set and distinguish them from one another; (c) Given a new training set of data on underwater echoes from active sonar returns, with system reprogramming and/or retraining, modify the system to detect and classify each signal type in this new class of signals and distinguish them from the original set of acoustic signals. (4) Seismic signal discrimination: (a) Given a training set of seismic signals (and associated parameters) from different types of seismic events of varying magnitudes, each event recorded at two or more seismic stations with ground truth provided, classify (as to signal type), locate, and estimate the magnitude of similar events in a test set of seismic signals (outside the training set); (b) Given one or more new types of seismic signals, recognize these signals as distinct from the original set (with no system reprogramming or retraining); (c) Given a new training set of seismic signals from seismic stations located in different geological regions from the original stations, with system reprogramming and/or retrain- ing, modify the system to classify and characterize this new set of signals. The criteria for evaluating the performance of the classification systems will include: (a) Classification accuracy (the appropriate accuracy metric for the task addressed, e.g., percentage or correct detections, identifications, and/or classifications, including false alarms where applicable; or total error rates); (b) System development time (the time required to develop and train the system); (c) Fault tolerance (the percentage of original performance when subjected to failure of some of the processing elements); (d) Generality (the accuracy of the system for new input data significantly outside the range of training data); (e) Adaptability (the time and effort required to modify the system to address similar classification problems with different classes of data); (f) Computational efficiency (the period solution speed when optimally implemented in hardware); (g) Size and power requirements (the projected size and power requirements of the computational hardware); (h) Performance vs training data (the rate of improvement in performance with increasing size of the training data set). This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28 month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research. 
Proprietary portions to the technical proposal should be specifically identified. Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of this principal investigators and other key personnel to be employed in the conduct of this research, with brief, resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance; (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be sub- mitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 2209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA. From pwh at ece-csc.ncsu.edu Thu Jan 26 17:31:04 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Thu, 26 Jan 89 17:31:04 EST Subject: No subject Message-ID: <8901262231.AA03761@ece-csc.ncsu.edu> REVISED SUBMISSION DEADLINE FOR IJCNN-89 PAPERS--FEBRUARY 15, 1989 International Joint Conference on Neural Networks June 18-22, 1989 Washington, D.C. DEADLINE FOR SUBMISSION OF PAPERS for IJCNN-89 has been revised to FEBRUARY 15, 1989. 
Papers of 8 pages or less are solicited in the following areas:

-Real World Applications
-Associative Memory
-Supervised Learning Theory
-Image Analysis
-Reinforcement Learning Theory
-Self-Organization
-Robotics and Control
-Neurobiological Models
-Optical Neurocomputers
-Vision
-Speech Processing and Recognition
-Electronic Neurocomputers
-Neural Network Architectures & Theory
-Optimization

FULL PAPERS in camera-ready form (1 original on Author's Kit forms and 5 reduced 8 1/2" x 11" copies) should be submitted to Nomi Feldman, Conference Coordinator, at the address below. For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, IJCNN-89 Conference Coordinator 3770 Tansy Street, San Diego, CA 92121 (619) 453-6222 From REXB%PURCCVM.BITNET at VMA.CC.CMU.EDU Fri Jan 27 12:55:00 1989 From: REXB%PURCCVM.BITNET at VMA.CC.CMU.EDU (Rex C. Bontrager) Date: Fri, 27 Jan 1989 12:55 EST Subject: INNS membership Message-ID: Who do I contact regarding INNS membership? (more precisely, to whom do I send my money?) Rex C. Bontrager Bitnet: rexb at purccvm Internet: rexb at vm.cc.purdue.edu Phone: (317) 494-1787 ext. 256 From neural!yann Wed Jan 25 15:13:58 1989 From: neural!yann (Yann le Cun) Date: Wed, 25 Jan 89 15:13:58 -0500 Subject: Weight Decay Message-ID: <8901252012.AA00971@neural.UUCP> Consider a single layer linear network with N inputs. When the number of training patterns is smaller than N, the set of solutions (in weight space) is a proper linear subspace. Adding weight decay will select the minimum norm solution in this subspace (if the weight decay coefficient is decreased with time). The minimum norm solution happens to be the solution given by the pseudo-inverse technique (cf. Kohonen), and the solution which optimally cancels out uncorrelated zero mean additive noise on the input. - Yann Le Cun From reggia at mimsy.umd.edu Fri Jan 27 19:41:19 1989 From: reggia at mimsy.umd.edu (James A. Reggia) Date: Fri, 27 Jan 89 19:41:19 EST Subject: call for papers Message-ID: <8901280041.AA04500@mimsy.umd.edu> CALL FOR PAPERS The 13th Annual Symposium on Computer Applications in Medical Care will have a track this year on applications of neural models (connectionist models, etc.) in medicine. The Symposium will be held in Washington DC, as in previous years, on November 5 - 8, 1989. Submissions are refereed and if accepted, appear in the Symposium Proceedings. Deadline for submission of manuscripts (six copies, double spaced, max. of 5000 words) is March 3, 1989. For further information and/or a copy of the detailed call for papers, contact: SCAMC Office of Continuing Medical Education George Washington University Medical Center 2300 K Street, NW Washington, DC 20037 The detailed call for papers includes author information sheets that must be returned with a manuscript. From elman at amos.ling.ucsd.edu Sat Jan 28 01:24:24 1989 From: elman at amos.ling.ucsd.edu (Jeff Elman) Date: Fri, 27 Jan 89 22:24:24 PST Subject: UCSD Cog Sci faculty opening Message-ID: <8901280624.AA11066@amos.ling.ucsd.edu> ASSISTANT PROFESSOR COGNITIVE SCIENCE UNIVERSITY OF CALIFORNIA, SAN DIEGO The Department of Cognitive Science at UCSD expects to receive permission to hire one person for a tenure-track position at the Assistant Professor level. The Department takes a broadly based approach to the study of cognition, including its neurological basis, in individuals and social groups, and machine intelligence. We seek someone whose interests cut across conventional disciplines. 
Interests in theory, computational modeling (especially PDP), or applications are encouraged. Candidates should send a vita, reprints, a short letter describing their background and interests, and names and addresses of at least three references to: Search Committee Cognitive Science, C-015-E University of California, San Diego La Jolla, CA 92093 Applications must be received prior to March 15, 1989. Salary will be commensurate with experience and qualifications, and will be based upon UC pay schedules. Women and minorities are especially encouraged to apply. The University of California, San Diego is an Affirmative Action/Equal Opportunity Employer. From Dave.Touretzky at B.GP.CS.CMU.EDU Sat Jan 28 07:14:37 1989 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Sat, 28 Jan 89 07:14:37 EST Subject: INNS membership In-Reply-To: Your message of Fri, 27 Jan 89 12:55:00 -0500. Message-ID: <462.601992877@DST.BOLTZ.CS.CMU.EDU> PLEASE: Do not send requests for general information (like how to join INNS) to the CONNECTIONISTS list! This list is intended for serious scientific discussion only. If you need help with an address or something equally trivial, send mail to connectionists-request if you must. Better yet, use the Neuron Digest. Don't waste people's time on CONNECTIONISTS. -- Dave From norman%cogsci at ucsd.edu Sun Jan 29 13:36:36 1989 From: norman%cogsci at ucsd.edu (Donald A Norman-UCSD Cog Sci Dept) Date: Sun, 29 Jan 89 10:36:36 PST Subject: addendum to UCSD Cog Sci faculty opening Message-ID: <8901291836.AA22314@sdics.COGSCI> Jef Ellman's posting of the job at UCSD in the Cognitive Science Department was legally and technically accurate, but he should have added one important sentence: Get the application -- or at least, a letter of interest -- to us immediately. We are very late in getting the word out, and decisions will have to be made quickly. The sooner we know of the pool of applicants, the better. (Actually, I now discover one inaccuracy -- the ad says we "expect to receive permission to hire ..." In fact, we now do have that permission. If you have future interests -- say you are interested not now, but in a year or two or three -- that too is important for us to know, so tell us. don norman From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Sun Jan 29 22:39:30 1989 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (thanasis kehagias) Date: Sun, 29 Jan 89 22:39:30 EST Subject: speech list? Message-ID: does anyone know of a mailing list where speech questions are discussed? (not necessarily as related to connectionist methods; just speech questions in general). thanks a lot, Thanasis From pwh at ece-csc.ncsu.edu Mon Jan 30 14:48:25 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Mon, 30 Jan 89 14:48:25 EST Subject: IJCNN Call for Papers Amendment Message-ID: <8901301948.AA25787@ece-csc.ncsu.edu> Amendment to IJCNN call for papers Sorry...Upon reflection the wording in the IJCNN call for papers did not convey the proper meaning. Perhaps a better way to say it would have been, "IJCNN-89 is replacing both the ICNN and INNS meetings in 1989." The intent was for people to realize that if they planned to submit to either ICNN or INNS or both in 1989, the joint conference is the only opportunity to do so. Part of the reason for extending the deadline is to allow for the short notice (no INNS call for papers had previously been issued, since the merger of the two conferences just occurred). The original text was meant to imply the above and nothing more. 
No offense should be taken because none was intended. By the way, I was at last year's NIPS conference and thought it was an excellent conference. I plan to be there again next year. Also there has been some confusion over the revised deadline for paper submissions to IJCNN. The revised deadline STILL STANDS as FEBRUARY 15. P.S. Following the precedent set at the IJCAI, my pronunciation of IJCNN is idge-kin. The acronyms were good though! Wes Snyder, Co-Chairman of the Organization Committee, IJCNN-89 January 30, 1989 
From kruschke at cogsci.berkeley.edu Tue Jan 3 03:30:12 1989 From: kruschke at cogsci.berkeley.edu (John Kruschke) Date: Tue, 3 Jan 89 00:30:12 PST Subject: No subject Message-ID: <8901030830.AA09915@cogsci.berkeley.edu> Here is the compilation of responses to my request for info on weight decay. I have kept editing to a minimum, so you can see exactly what the author of the reply said. Where appropriate, I have included some comments of my own, set off in square brackets. The responses are arranged into three broad topics: (1) Boltzmann-machine related; (2) back-prop related; (3) psychology related. Thanks to all, and happy new year! --John ----------------------------------------------------------------- ORIGINAL REQUEST: I'm interested in all the information I can get regarding WEIGHT DECAY in back-prop, or in other learning algorithms. *In return* I'll collate all the info contributed and send the compilation out to all contributors. Info might include the following: REFERENCES: - Applications which used weight decay - Theoretical treatments Please be as complete as possible in your citation. FIRST-HAND EXPERIENCE - Application domain, details of I/O patterns, etc. - exact decay procedure used, and results (Please send info directly to me: kruschke at cogsci.berkeley.edu Don't use the reply command.) T H A N K S ! --John Kruschke. ----------------------------------------------------------------- From: Geoffrey Hinton Date: Sun, 4 Dec 88 13:57:45 EST Weight-decay is a version of what statisticians call "Ridge Regression". We used weight-decay in Boltzmann machines to keep the energy barriers small. This is described in section 6.1 of: Hinton, G. E., Sejnowski, T. J., and Ackley, D. H. (1984) Boltzmann Machines: Constraint satisfaction networks that learn. Technical Report CMU-CS-84-119, Carnegie-Mellon University. I used weight decay in the family trees example. Weight decay was used to improve generalization and to make the weights easier to interpret (because, at equilibrium, the magnitude of a weight = its usefulness). This is in: Rumelhart, D.~E., Hinton, G.~E., and Williams, R.~J. (1986) Learning representations by back-propagating errors. {\it Nature}, {\bf 323}, 533--536. I used weight decay to achieve better generalization in a hard generalization task that is reported in: Hinton, G.~E. (1987) Learning translation invariant recognition in a massively parallel network. In Goos, G. 
and Hartmanis, J., editors, {\it PARLE: Parallel Architectures and Languages Europe}, pages~1--13, Lecture Notes in Computer Science, Springer-Verlag, Berlin. Weight-decay can also be used to keep "fast" weights small. The fast weights act as a temporary context. One use of such a context is described in: Hinton, G.~E. and Plaut, D.~C. (1987) Using fast weights to deblur old memories. {\it Proceedings of the Ninth Annual Conference of the Cognitive Science Society}, Seattle, WA. --Geoff ----------------------------------------------------------------- [In his lecture at the International Computer Science Institute, Berkeley CA, on 16-DEC-88, Geoff also mentioned that weight decay is good for wiping out the initial values of weights so that only the effects of learning remain. In particular, if the change (due to learning) on two weights is the same for all updates, then the two weights converge to the same value. This is one way to generate symmetric weights from non-symmetric starting values. --John] ----------------------------------------------------------------- From: Michael.Franzini at SPEECH2.CS.CMU.EDU Date: Sun, 4 Dec 1988 23:24-EST My first-hand experience confirms what I'm sure many other people have told you: that (in general) weight decay in backprop increases generalization. I've found that it's particularly important for small training sets, and its effect diminishes as the training set size increases. Weight decay was first used by Barak Pearlmutter. The first mention of weight decay is, I believe, in an early paper of Hinton's (possibly the Plaut, Nowlan, and Hinton CMU CS tech report), and it is attributed to "Barak Pearlmutter, Personal Communication" there. The version of weight decay that (I'm fairly sure) all of us at CMU use is one in which each weight is multiplied by 0.999 every epoch. Scott Fahlman has a more complicated version, which is described in his QUICKPROP tech report. [QuickProp is also described in his paper in the Proceedings of the 1988 Connectionist Models Summer School, published by Morgan Kaufmann. --John] The main motivation for using it is to eliminate spurious large weights which happen not to interfere with recognition of training data but would interfere with recognizing testing data. (This was Barak's motivation for trying it in the first place.) However, I have heard more theoretical justifications (which, unfortunately, I can't reproduce.) In case Barak didn't reply to your message, you might want to contact him directly at bap at cs.cmu.edu. --Mike ----------------------------------------------------------------- From: Barak.Pearlmutter at F.GP.CS.CMU.EDU Date: 8 Dec 1988 16:36-EST We first used weight decay as a way to keep weights in a Boltzmann machine from growing too large. We added a term to the thing being minimized, G, so that G' = G + 1/2 h \sum_{i<j} w_{ij}^2. ----------------------------------------------------------------- Date: Tue, 6 Dec 88 09:34 CST Probably he will respond to you himself, but Alex Weiland of MITRE presented a paper at INNS in Boston on shaping, in which the order of presentation of examples in training a back-prop net was altered to reflect a simpler rule at first. Over a number of epochs he gradually changed the examples to slowly change the rule to the one desired. The nets learned much faster than if he just tossed the examples at the net in random order. He told me that it would not work without weight decay. He said their rule-of-thumb was the decay should give the weights a half-life of 2 to 3 dozen epochs (usually a value such as 0.9998).
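[A quick check of that rule of thumb, on the assumption that the 0.9998 factor is applied once per pattern presentation rather than once per epoch: a per-step decay factor d gives a weight half-life of ln 2 / ln(1/d) steps, so d = 0.9998 works out to about 3466 presentations. With on the order of a hundred training patterns per epoch that is roughly 35 epochs, i.e. about three dozen, which matches the quoted half-life; applied literally once per epoch, 0.9998 would instead give a half-life of several thousand epochs.]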
But I neglected to ask him if he felt that the number of epochs or the number of presentations was important. Perhaps if one had a significantly different training set size, that rule-of-thumb would be different? I have started some experiments similar to his shaping, using some random variation of the training data (where the random variation grows over time). Weiland also discussed this in his talk. I haven't yet compared decay with no-decay. I did try (as a lark) using decay with regular (non-shaping) training, and it did worse than we usually get (on same data and same network type/size/shape). Perhaps I was using a stupid decay value (0.9998 I think) for that situation. I hope to get back to this, but at the moment we are preparing for a software release to our shareholders (MCC is owned by 20 or so computer industry corporations). In the next several weeks a lot of people will go on Christmas vacation, so I will be able to run a bunch of nets all at once. They call me the machine vulture. ----------------------------------------------------------------- From: Tony Robinson Date: Sat, 3 Dec 88 11:10:20 GMT Just a quick note in reply to your message to `connectionists' to say that I have tried to use weight decay with back-prop on networks with order 24 i/p, 24 hidden, 11 o/p units. The problem was vowel recognition (I think), it was about 18 months ago, and the problem was of the unsolvable type (i.e. non-zero final energy). My conclusion was that weight decay only made matters worse, and my justification (to myself) for abandoning weight decay was that you are not even pretending to do gradient descent any more, and any good solution formed quickly becomes garbaged by scaling the weights. If you want to avoid hidden units sticking on their limiting values, why not use hidden units with no limiting values? For instance, I find the activation function f(x) = x * x works better than f(x) = 1.0 / (1.0 + exp(- x)) anyway. Sorry I haven't got anything formal to offer, but I hope these notes help. Tony Robinson. ----------------------------------------------------------------- From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Sat, 3 Dec 88 11:54:02 EST Actually, "costs" or "penalty" functions are probably better terms. We had a poster last week at NIPS that discussed some of the pitfalls and advantages of two kinds of costs. I can send you the paper when we have a version available. Stephen J. Hanson (jose at bellcore.com) ----------------------------------------------------------------- [ In a conversation in his office on 06-DEC-88, Dave Rumelhart described to me several cost functions he has tried. The motive for the functions he has tried is different from the motive for standard weight decay. Standard weight decay, \sum_{i,j} w_{i,j}^2, is used to *distribute* weights more evenly over the given connections, thereby increasing robustness (cf. earlier replies). He has tried several other cost functions in an attempt to *localize*, or concentrate, the weights on a small subset of the given connections. The goal is to improve generalization. His favorite is \sum_{i,j} ( w_{i,j}^2 / ( K + w_{i,j}^2 ) ) where K is a constant, around 1 or 2. Note that this function is negatively accelerating, whereas standard weight decay is positively accelerating. This function penalizes small weights (proportionally) more than large weights, just the opposite of standard weight decay.
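[To spell out the comparison with a quick gradient check: standard decay contributes a gradient proportional to w itself, whereas the gradient of w^2 / ( K + w^2 ) is 2 K w / ( K + w^2 )^2, which behaves like (2/K) w for weights small relative to sqrt(K) but falls off roughly as 1/w^3 for large weights. So the localizing penalty keeps pushing small weights toward zero while leaving well-established large weights essentially alone.]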
He has also tried, with less satisfying results, \sum ( 1 - \exp - (\alpha w_{i,j}^2) ) and \sum \ln ( K + w_{i,j}^2 ). Finally, he has tried a cost function designed to make all the fan-in weights of a single unit decay, when possible. That is, the unit is effectively cut out of the network. The function is \sum_i (\sum_j w_{i,j}^2) / ( K + \sum_j w_{i,j}^2 ). Each weight is thereby penalized (inversely) proportionally to the total fan-in weight of its node. --John ] ----------------------------------------------------------------- [ This is also a relevant place to mention my paper in the Proceedings of the 1988 Connectionist Models Summer School, "Creating local and distributed bottlenecks in back-propagation networks". I have since developed those ideas, and have expressed the localized bottleneck method as gradient descent on an additional cost term. The cost term is quite general, and some forms of decay are simply special cases of it. --John] ----------------------------------------------------------------- From: john moody Date: Sun, 11 Dec 88 22:54:11 EST Scalettar and Zee did some interesting work on weight decay with back prop for associative memory. They found that a Unary Representation emerged (see Baum, Moody, and Wilczek; Bio Cybernetics Aug or Sept 88 for info on Unary Reps). Contact Tony Zee at UCSB (805)961-4111 for info on weight decay paper. --John Moody ----------------------------------------------------------------- From: gluck at psych.Stanford.EDU (Mark Gluck) Date: Sat, 10 Dec 88 16:51:29 PST I'd appreciate a copy of your weight decay collation. I have a paper in MS form which illustrates how adding weight decay to the linear-LMS one-layer net improves its ability to predict human generalization in classification learning. mark gluck dept of psych stanford univ, stanford, ca 94305 ----------------------------------------------------------------- From: INAM000 (Tony Marley) Date: SUN 04 DEC 1988 11:16:00 EST I have been exploring some ideas re COMPETITIVE LEARNING with "noisy weights" in modeling simple psychophysics. The task is the classical one of identifying one of N signals by a simple (verbal) response -e.g. the stimuli might be squares of different sizes, and one has to identify the presented one by saying the appropriate integer. We know from classical experiments that people cannot perform this task perfectly once N gets larger than about 7, but performance degrades smoothly for larger N. I have been developing simulations where the mapping is learnt by competitive learning, with the weights decaying/varying over time when they are not reset by relevant inputs. I have not got too many results to date, as I have been taking the psychological data seriously, which means worrying about reaction times, sequential effects, "end effects" (stimuli at the end of the range more accurately identified), range effects (increasing the stimulus range has little effect), etc.. Tony Marley ----------------------------------------------------------------- From: aboulanger at bbn.com (Albert Boulanger) Date: Fri, 2 Dec 88 19:43:14 EST This one concerns the Hopfield model. In James D Keeler, "Basin of Attraction of Neural Network Models", Snowbird Conference Proceedings (1986), 259-264, it is shown that the basins of attraction become very complicated as the number of stored patterns increase. He uses a weight modification method called "unlearning" to smooth out these basins. Albert Boulanger BBN Systems & Technologies Corp. 
aboulanger at bbn.com ----------------------------------------------------------------- From: Joerg Kindermann Date: Mon, 5 Dec 88 08:21:03 -0100 We used a form of weight decay not for learning but for recall in multilayer feedforward networks. See the following abstract. Input patterns are treated as ``weights'' coming from a constant valued external unit. If you would like a copy of the technical report, please send e-mail to joerg at gmdzi.uucp or write to: Dr. Joerg Kindermann Gesellschaft fuer Mathematik und Datenverarbeitung Schloss Birlinghoven Postfach 1240 D-5205 St. Augustin 1 WEST GERMANY Detection of Minimal Microfeatures by Internal Feedback J. Kindermann & A. Linden Abstract We define the notion of minimal microfeatures and introduce a new method of internal feedback for multilayer networks. Error signals are used to modify the input of a net. When combined with input DECAY, internal feedback allows the detection of sets of minimal microfeatures, i.e. those subpatterns which the network actually uses for discrimination. Additional noise on the training data increases the number of minimal microfeatures for a given pattern. The detection of minimal microfeatures is a first step towards a subsymbolic system with the capability of self-explanation. The paper provides examples from the domain of letter recognition. ----------------------------------------------------------------- From: Helen M. Gigley Date: Mon, 05 Dec 88 11:03:23 -0500 I am responding to your request even though my use of decay is not with respect to learning in connectionist-like models. My focus has been on a functioning system that can be lesioned. One question I have is what is the behavioral association to weight decay? What aspects of learning is it intended to reflect. I can understand that activity decay over time of each cell is meaningful and reflects a cellular property, but what is weight decay in comparable terms? Now, I will send you offprints if you would like of my work and am including a list of several publications which you may be able to peruse. The model, HOPE, is a hand-tuned structural connectionist model that is designed to enable lesioning without redesign or reprogramming to study possible processing causes of aphasia. Decay factors as an integral part of dynamic time-dependent processes are one of several aspects of processing in a neural environment which potentially affect the global processing results even though they are defined only locally. If I can be of any additional help please let me know. Helen Gigley References: Gigley, H.M. Neurolinguistically Constrained Simulation of Sentence Comprehension: Integrating Artificial Intelligence and Brain Theorym Ph.D. Dissertation, UMass/Amherst, 1982. Available University Microfilms, Ann Arbor, MI. Gigley, H.M. HOPE--AI and the dynamic process of language behavior. in Cognition and Brain Theory 6(1) :39-88, 1983. Gigley, H.M. Grammar viewed as a functioning part of of a cognitive system. Proceedings of ACL 23rd Annual Meeting, Chicago, 1985 . Gigley, H.M. Computational Neurolinguistics -- What is it all about? in IJCAI Proceedings, Los Angeles, 1985. Gigley, H.M. Studies in Artificial Aphasia--experiments in processing change. In Journal of Computer Methods and Programs in Biomedicine, 22 (1): 43-50, 1986. Gigley, H.M. Process Synchronization, Lexical Ambiguity Resolution, and Aphasia. In Steven L. Small, Garrison Cottrell, and Michael Tanenhaus (eds.) Lexical Ambiguity Resolution, Morgen Kaumann, 1988. 
----------------------------------------------------------------- From: bharucha at eleazar.Dartmouth.EDU (Jamshed Bharucha) Date: Tue, 13 Dec 88 16:56:00 EST I haven't tried weight decay but am curious about it. I am working on back-prop learning of musical sequences using a Jordan-style net. The network develops a musical schema after learning lots of sequences that have culture-specific regularities. I.e., it learns to generate expectancies for tones following a sequential context. I'm interested in knowing how to implement forgetting, whether short term or long term. Jamshed. ----------------------------------------------------------------- From will at ida.org Tue Jan 3 10:50:14 1989 From: will at ida.org (Craig Will) Date: Tue, 3 Jan 89 10:50:14 EST Subject: Copies of DARPA Request for Proposals Available Message-ID: <8901031550.AA16284@csed-1> Copies of DARPA Request for Proposals Available Copies of the DARPA Neural Network Request for Propo- sals are now available (free) upon request. This is the same text as that published December 16 in the Commerce Business Daily, but reformatted and with bigger type for easier reading. This version was sent as a 4-page "Special supplementary issue" to subscribers of Neural Network Review in the United States. To get a copy mailed to you, send your US postal address to either: Michele Clouse clouse at ida.org (milnet) or: Neural Network Review P. O. Box 427 Dunn Loring, VA 22027 From harnad at Princeton.EDU Wed Jan 4 10:12:06 1989 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 4 Jan 89 10:12:06 EST Subject: Connectionist Concepts: BBS Call for Commentators Message-ID: <8901041512.AA11296@psycho.Princeton.EDU> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Commentators must be current BBS Associates or nominated by a current BBS Associate. To be considered as a commentator on this article, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ THE CONNECTIONIST CONSTRUCTION OF CONCEPTS Adrian Cussins, New College, Oxford Keywords: connectionism, representation, cognition, perception, nonconceptual content, concepts, learning, objectivity, semantics Computational modelling of cognition depends on an underlying theory of representation. Classical cognitive science has exploited the syntax/semantics theory of representation derived from formal logic. As a consequence, the kind of psychological explanation supported by classical cognitive science is "conceptualist": psychological phenomena are modelled in terms of relations between concepts and between the sensors/effectors and concepts. This kind of explanation is inappropriate according to Smolensky's "Proper Treatment of Connectionism" [BBS 11(1) 1988]. Is there an alternative theory of representation that retains the advantages of classical theory but does not force psychological explanation into the conceptualist mold? 
I outline such an alternative by introducing an experience-based notion of nonconceptual content and by showing how a complex construction out of nonconceptual content can satisfy classical constraints on cognition. Cognitive structure is not interconceptual but intraconceptual. The theory of representational structure within concepts allows psychological phenomena to be explained as the progressive emergence of objectivity. This can be modelled computationally by transformations of nonconceptual content which progressively decrease its perspective-dependence through the formation of a cognitive map. Stevan Harnad ARPA/INTERNET harnad at confidence.princeton.edu harnad at princeton.edu harnad at mind.princeton.edu srh at flash.bellcore.com harnad at elbereth.rutgers.edu CSNET: harnad%mind.princeton.edu at relay.cs.net UUCP: harnad at princeton.uucp BITNET: harnad at pucc.bitnet harnad1 at umass.bitnet Phone: (609)-921-7771 From will at ida.org Wed Jan 4 10:59:54 1989 From: will at ida.org (Craig Will) Date: Wed, 4 Jan 89 10:59:54 EST Subject: Copies of DARPA Req for Prop Available Message-ID: <8901041559.AA13970@csed-1> Copies of DARPA Request for Proposals Available Copies of the DARPA Neural Network Request for Propo- sals are now available (free) upon request. This is the same text as that published December 16 in the Commerce Business Daily, but reformatted and with bigger type for easier reading. This version was sent as a 4-page "Special supplementary issue" to subscribers of Neural Network Review in the United States. To get a copy mailed to you, send your US postal address to either: Michele Clouse clouse at ida.org (milnet) or: Neural Network Review P. O. Box 427 Dunn Loring, VA 22027 From harnad at Princeton.EDU Wed Jan 4 10:18:00 1989 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 4 Jan 89 10:18:00 EST Subject: Speech Perception: BBS Multiple Book Review Message-ID: <8901041518.AA11306@psycho.Princeton.EDU> Below is the abstract of a book that will be multiply reviewed in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Reviewers must be current BBS Associates or nominated by a current BBS Associate. To be considered as a reviewer for this book, to suggest other appropriate reviewers, or for information about how to become a BBS Associate, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ BBS Multiple Book review of: SPEECH PERCEPTION BY EAR AND EYE: A PARADIGM FOR PSYCHOLOGICAL INQUIRY (Hillsdale NJ: LE Erlbaum Associates 1987) Dominic William Massaro Program in Experimental Psychology University of California, Santa Cruz Keywords: speech perception; vision; audition; categorical perception; connectionist models; fuzzy logic; sensory impairment; decision making This book is about the processing of information, particularly in face-to-face spoken communication where both audible and visible information are available. Experimental tasks were designed to manipulate many of these sources of information independently and to test mathematical fuzzy logical and other models of performance and the underlying stages of information processing. Multiple sources of information are evaluated and integrated to achieve speech perception. 
Graded information seems to be derived about the degree to which an input fits a given category rather than just all-or-none categorical information. Sources of information are evaluated independently, with the integration process insuring that the least ambiguous sources have the most impact on the judgment. The processes underlying speech-perception also occur in a variety of other behaviors, ranging from categorization to sentence interpretation, decision making and forming impressions about people. ----- Stevan Harnad INTERNET harnad at confidence.princeton.edu harnad at princeton.edu harnad at mind.princeton.edu srh at flash.bellcore.com harnad at elbereth.rutgers.edu CSNET: harnad%mind.princeton.edu at relay.cs.net UUCP: harnad at princeton.uucp BITNET: harnad at pucc.bitnet harnad1 at umass.bitnet Phone: (609)-921-7771 From mesard at BBN.COM Thu Jan 5 09:37:12 1989 From: mesard at BBN.COM (mesard@BBN.COM) Date: Thu, 05 Jan 89 09:37:12 -0500 Subject: Tech Report Announcement In-Reply-To: Your message of Mon, 02 Jan 89 16:02:06 -0600. <8901022202.AA11713@legendre.aca.mcc.com> Message-ID: Please send me a copy of the tech report Explorations of the Mean Field Theory Learning Algorithm Thanks. Wayne Mesard Mesard at BBN.COM 70 Fawcett St. Cambridge, MA 02138 617-873-1878 From gluck at psych.Stanford.EDU Thu Jan 5 10:20:17 1989 From: gluck at psych.Stanford.EDU (Mark Gluck) Date: Thu, 5 Jan 89 07:20:17 PST Subject: Human Learning & Connectionist Models Message-ID: I would be grateful to receive information about people using connectionist/neural-net approaches within cognitive psychology to model human learning and memory data. Citations to published work, information about work in progress, and copies of reprints or preprints would be most welcome and appreciated. Mark Gluck Dept. of Psychology Jordan Hall; Bldg. 420 Stanford University Stanford, CA 94305 (415) 725-2434 gluck at psych.stanford.edu. From kanderso at BBN.COM Thu Jan 5 16:30:15 1989 From: kanderso at BBN.COM (kanderso@BBN.COM) Date: Thu, 05 Jan 89 16:30:15 -0500 Subject: No subject In-Reply-To: Your message of Tue, 03 Jan 89 00:30:12 -0800. <8901030830.AA09915@cogsci.berkeley.edu> Message-ID: I enjoyed John's summary of weight decay, but it raised a few questions. Just as John did, I'll be glad to summarize the responses to the group. 1. Geoff Hinton mentioned that "Weight-decay is a version of what statisticians call "Ridge Regression"." What do you mean by "version"? Is it exactly the same, or just similar? I think I know what Ridge Regression is, but I don't see an obvious strong connection. I see a weak one, and after I think about it more maybe I'll say something about it. The ideas behind Ridge regression probably came from Levenberg and Marquardt who used it in nonlinear least squares: Levenberg K., A Method for the solution of certain nonlinear problems in least squares, Q. Appl. Math, Vol 2, pages 164-168, 1944. Marquardt, D.W., An algorithm for least squares estimation of non-linear parameters, J. Soc. Industrial and Applied Math., 11:431-441, 1963. 2. John quoted Dave Rumelhart as saying that standard weight decay distributes weights more evenly over the given connections, thereby increasing robustness. Why does smearing out large weights increase robustness? What does robustness mean here, the ability to generalize? k From dreyfus at cogsci.berkeley.edu Thu Jan 5 21:04:34 1989 From: dreyfus at cogsci.berkeley.edu (Hubert L.
Dreyfus) Date: Thu, 5 Jan 89 18:04:34 PST Subject: Connectionist Concepts: BBS Call for Commentators Message-ID: <8901060204.AA02484@cogsci.berkeley.edu> Stevan: Stuart and I would like to write a joint comment on Cussins' paper. Please send us the latest version by e-mail or regular mail whichever you prefer. Hubert Dreyfus From daugman%charybdis at harvard.harvard.edu Fri Jan 6 10:41:42 1989 From: daugman%charybdis at harvard.harvard.edu (j daugman) Date: Fri, 6 Jan 89 10:41:42 EST Subject: Neural Networks in Natural and Artificial Vision Message-ID: For preparation of 1989 conference tutorials and reviews, I would be grateful to receive any available p\reprints reporting research on neural network models of human / biological vision and applications in artificial vision. Thanks in advance. John Daugman Harvard University 950 William James Hall Cambridge, Mass. 02138 From josh at flash.bellcore.com Fri Jan 6 14:32:55 1989 From: josh at flash.bellcore.com (Joshua Alspector) Date: Fri, 6 Jan 89 14:32:55 EST Subject: VLSI Implementations of Neural Networks Message-ID: <8901061932.AA07422@flash.bellcore.com> I will be giving a tuturial on the above topic at the Custom Integrated Circuits Conference. Vu grafs are due at the end of February and I would like to include as complete a description as possible of current efforts in the VLSI implementation of neural networks. I would appreciate receiving any preprints or hard copies of vu grafs regarding any work you are doing. E-mail reports are also acceptable. Please send to: Joshua Alspector Bellcore, MRE 2E-378 445 South St. Morristown, NJ 07960-1910 From neural!jsd Fri Jan 6 12:45:14 1989 From: neural!jsd (John Denker) Date: Fri, 6 Jan 89 12:45:14 EST Subject: confidence / runner-up activation Message-ID: <8901061744.AA10566@neural.UUCP> Yes, we've been using the activation level of the runner-up neurons to provide confidence information in our character recognizer for some time. The work was reported at the last San Diego mtg and at the last Denver mtg. --- jsd (John Denker) From netlist at psych.Stanford.EDU Tue Jan 10 09:43:16 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Tue, 10 Jan 89 06:43:16 PST Subject: Stanford Adaptive Networks Colloquium Message-ID: Stanford University Interdisciplinary Colloquium Series: ADAPTIVE NETWORKS AND THEIR APPLICATIONS Co-sponsored by the Departments of Psychology and Electrical Engineering Winter Quarter 1989 Schedule ---------------------------- Jan. 12th (Thursday, 3:30pm): ----------------------------- STEVEN PINKER CONNECTIONISM AND Department of Brain & Cognitive Sciences THE FACTS OF HUMAN LANGUAGE Massachusetts Institute of Technology email: steve at psyche.mit.edu (with commentary by David Rumelhart) Jan. 24th (Tuesday, 3:30pm): ---------------------------- LARRY MALONEY LEARNING BY ASSERTION: Department of Psychology CALIBRATING A SIMPLE VISUAL SYSTEM New York University email: ltm at xp.psych.nyu.edu Feb. 9th (Thursday, 3:30pm): ---------------------------- CARVER MEAD VLSI MODELS OF NEURAL NETWORKS Moore Professor of Computer Science California Institute of Technology Feb. 21st (Tuesday, 3:30pm): ---------------------------- PIERRE BALDI ON SPACE AND TIME IN NEURAL COMPUTATIONS Jet Propulsion Laboratory California Institute of Technology email: pfbaldi at caltech.bitnet Mar. 
14th (Tuesday, 3:30pm): ---------------------------- ALAN LAPEDES NONLINEAR SIGNAL PROCESSING WITH NEURAL NETS Theoretical Division - MS B213 Los Alamos National Laboratory email: asl at lanl.gov Additional Information ---------------------- The talks (including discussion) last about one hour and fifteen minutes. Following each talk, there will be a reception. Unless otherwise noted, all talks will be held in room 380-380F, which is in the basement of the Mathematical Sciences buildings. To be placed on an electronic-mail distribution list for information about these and other adaptive network events in the Stanford area, send email to netlist at psych.stanford.edu. For additional information, contact: Mark Gluck, Department of Psychology, Bldg. 420, Stanford University, Stanford, CA 94305 (phone 415-725-2434 or email to gluck at psych.stanford.edu). Program Committe: Committee: Bernard Widrow (E.E.), David Rumelhart, Misha Pavel, Mark Gluck (Psychology). This series is supported by the Departments of Psychology and Electrical Engineering and by a gift from the Thomson-CSF Corporation. Coming this Spring: D. Parker, B. McNaughton, G. Lynch & R. Granger From hinton at ai.toronto.edu Tue Jan 10 10:09:11 1989 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Tue, 10 Jan 89 10:09:11 EST Subject: new tech report Message-ID: <89Jan10.100924est.10956@ephemeral.ai.toronto.edu> The following report can be obtained by sending an email request to carol at ai.toronto.edu If this fails try carol%ai.toronto.edu at relay.cs.net Please do not send email to me about it (so don't use "reply" or "answer"). "Deterministic Boltzmann Learning Performs Steepest Descent in Weight-space." Geoffrey E. Hinton Department of Computer Science University of Toronto Technical report CRG-TR-89-1 ABSTRACT The Boltzmann machine learning procedure has been successfully applied in deterministic networks of analog units that use a mean field approximation to efficiently simulate a truly stochastic system {Peterson and Anderson, 1987}. This type of ``deterministic Boltzmann machine'' (DBM) learns much faster than the equivalent ``stochastic Boltzmann machine'' (SBM), but since the learning procedure for DBM's is only based on an analogy with SBM's, there is no existing proof that it performs gradient descent in any function, and it has only been justified by simulations. By using the appropriate interpretation for the way in which a DBM represents the probability of an output vector given an input vector, it is shown that the DBM performs steepest descent in the same function as the original SBM, except at rare discontinuities. A very simple way of forcing the weights to become symmetrical is also described, and this makes the DBM more biologically plausible than back-propagation. From netlist at psych.Stanford.EDU Wed Jan 11 09:29:01 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Wed, 11 Jan 89 06:29:01 PST Subject: Thurs (1/12): Steven Pinker on Language Models Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 
12th (Thursday, 3:30pm): ----------------------------- ******************************************************************************** STEVEN PINKER CONNECTIONISM AND Department of Brain & Cognitive Sciences THE FACTS OF HUMAN LANGUAGE Massachusetts Institute of Technology email: steve at psyche.mit.edu (with commentary by David Rumelhart) ******************************************************************************** Abstract Connectionist modeling holds the promise of making important contributions to our understanding of human language. For example, such models can explore the role of parallel processing, constraint satisfaction, neurologically realistic architectures, and efficient pattern-matching in linguistic processes. However, the current connectionist program of language modeling seems to be motivated by a different set of goals: reviving classical associationism, elminating levels of linguistic representation, and maximizing the role of top-down, knowledge-driven processing. I present evidence (developed in collaboration with Alan Prince) that these goals are ill-advised, because the empirical assumptions they make about human language are simply false. Specifically, evidence from adults' and children's abilities with morphology, semantics, and syntax suggests that people possess formal linguistic rules and autonomous linguistic representations, which are not based on the statistical correlations among microfeatures that current connectionist models rely on so heavily. Moreover, I suggest that treating the existence of mentally-represented rules and representations as an empirical question will lead to greater progress than rejecting them on a priori methodological grounds. The data suggest that some linguistic processes are saliently rule-like, and call for a suitable symbol-processing architecture, whereas others are associative, and can be insightfully modeled using connectionist mechanisms. Thus taking the facts of human language seriously can lead to an interesting rapprochement between standard psycholinguistics and connectionist modeling. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. For additional information, or contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From unido!gmdzi!joerg at uunet.UU.NET Thu Jan 12 04:30:50 1989 From: unido!gmdzi!joerg at uunet.UU.NET (Joerg Kindermann) Date: Thu, 12 Jan 89 08:30:50 -0100 Subject: CALL FOR PARTICIPATION Message-ID: <8901120730.AA03021@gmdzi.UUCP> Workshop ``DANIP'' Distributed Adaptive Neural Information Processing. 24.-25.4.1989 Gesellschaft fuer Mathematik und Datenverarbeitung mbH Sankt Augustin Neural information processing is constantly gaining increasing attention in many scientific areas. As a consequence the first ``Workshop Konnektionismus'' at the GMD was organized in February 1988. It gave an overview of research activities in neural networks and their applications to Artificial Intelligence. 
Now, almost a year later, the time has come to focus on the state of neural information processing itself. The aim of the workshop is to discuss TECHNICAL aspects of information processing in neural networks on the basis of personal contributions in one of the following areas: - new or improved learning algorithms (including evaluations) - self organization of structured (non-localist) neural networks - time series analysis by means of neural networks - adaptivity, e.g the problem of relearning - adequate coding of information for neural processing - generalization - weight interpretation (correlative and other)} Presentations which report on ``work in progress'' are encouraged. The size of the workshop will be limited to 15 contributions of 30 minutes in length. A limited number of additional participants may attend the workshop and take part in the discussions. To apply for the workshop as a contributor, please send information about your contribution (1-2 pages in English or a relevant publication). If you want to participate without giving an oral presentation, please include a description of your background in the field of neural networks. Proceedings on the basis of workshop contributions will be published after the workshop. SCHEDULE: 28 February 1989: deadline for submission of applications 20 March 1989: notification of acceptance 24 - 25 April 1989: workshop ``DANIP'' 31 July 1989: deadline for submission of full papers to be included in the proceedings Applications should be sent to the following address: Dr. Joerg Kindermann or Alexander Linden Gesellschaft fuer Mathematik und Datenverarbeitung mbH - Schloss Birlinghoven - Postfach 1240 D-5205 Sankt Augustin 1 WEST GERMANY e-mail: joerg at gmdzi al at gmdzi From pwh at ece-csc.ncsu.edu Fri Jan 13 17:28:39 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Fri, 13 Jan 89 17:28:39 EST Subject: No subject Message-ID: <8901132228.AA05092@ece-csc.ncsu.edu> NEURAL NETWORKS CALL FOR PAPERS International Joint Conference on Neural Networks June 18-22, 1989 Washington, D.C. The 1989 IEEE/INNS International Joint Conference on Neural Networks (IJCNN-89) will be held at the Sheraton Washington Hotel in Washington, D.C., USA from June 18-22, 1989. IJCNN-89 is the first conference in a new series devoted to the technology and science of neurocomputing and neural networks in all of their aspects. The series replaces the previous IEEE ICNN and INNS Annual Meeting series and is jointly sponsored by the IEEE Technical Activities Board Neural Network Committee and the International Neural Network Society (INNS). IJCNN-89 will be the only major neural net- work meeting of 1989 (IEEE ICNN-89 and the 1989 INNS Annual Meeting have both been cancelled). Thus, it behooves all members of the neural network community who have important new results for presentation to prepare their papers now and submit them by the IJCNN-89 deadline of 1 FEBRUARY 1989. The Conference Proceedings will be distributed AT THE REGISTRATION DESK to all regular conference registrants as well as to all student registrants. The conference will include a day of tutorials (June 18), the exhibit hall (the neurocomputing industry's primary annual trade show), plenary talks, and social events. Mark your calendar today and plan to attend IJCNN-89 -- the definitive annual progress report on the neurocomputing revolution! DEADLINE FOR SUBMISSION OF PAPERS for IJCNN-89 is FEBRUARY 1, 1989. 
Papers of 8 pages or less are solicited in the following areas: -Real World Applications -Associative Memory -Supervised Learning Theory -Image Analysis -Reinforcement Learning Theory -Self-Organization -Robotics and Control -Neurobiological Models -Optical Neurocomputers -Vision -Optimization -Electronic Neurocomputers -Neural Network Architectures & Theory -Speech Recognition FULL PAPERS in camera-ready form (1 original on Author's Kit forms and 5 reduced 8 1/2" x 11" copies) should be submitted to Nomi Feldman, Confer- ence Coordinator, at the address below. For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, IJCNN-89 Conference Coordinator 3770 Tansy Street, San Diego, CA 92121 (619) 453-6222 From rudnick at cse.ogc.edu Sat Jan 14 18:05:27 1989 From: rudnick at cse.ogc.edu (Mike Rudnick) Date: Sat, 14 Jan 89 15:05:27 PST Subject: genetic search and neural nets Message-ID: <8901142305.AA07774@ogccse.OGC.EDU> I am a phd candidate in computer science at Oregon Graduate Center. My research interest is in using genetic search to tackle artificial neural network (ANN) scaling issues. My particular orientation is to view minimizing interconnections as a central issue, partly motivated by VLSI implementation issues. I am starting a mailing list for those interested in applying genetic search to/with/for ANNs. Mail a request to Neuro-evolution-request at cse.ogc.edu to have your name added to the list. A bibliography of work relating artificial neural networks (ANNs) and genetic search is available. It is organized/oriented for someone familiar with the ANN literature but unfamiliar with the genetic search literature. Send a request to Neuro-evolution-request at cse.ogc.edu for a copy. If there is sufficient interest I will post the bibliography here. -------------------------------------------------------------------------- Mike Rudnick CSnet: rudnick at cse.ogc.edu Computer Science & Eng. Dept. ARPAnet: rudnick%cse.ogc.edu at relay.cs.net Oregon Graduate Center BITNET: rudnick%cse.ogc.edu at relay.cs.net 19600 N.W. von Neumann Dr. UUCP: {tektronix,verdix}!ogccse!rudnick Beaverton, OR. 97006-1999 (503) 690-1121 X7390 -------------------------------------------------------------------------- From sontag at fermat.rutgers.edu Tue Jan 17 14:08:03 1989 From: sontag at fermat.rutgers.edu (sontag@fermat.rutgers.edu) Date: Tue, 17 Jan 89 14:08:03 EST Subject: Kolmogorov's superposition theorem Message-ID: <8901171908.AA00964@control.rutgers.edu> *** I am posting this for Professor Rui de Figuereido, a researcher in Control Theory and Circuits who does not subscribe to this list. Please direct cc's of all responses to his e-mail address (see below). -eduardo s. *** KOLMOGOROV'S SUPERPOSITION THEOREM AND ARTIFICIAL NEURAL NETWORKS Rui J. P. de Figueiredo Dept. of Electrical and Computer Engineering Rice University, Houston, TX 77251-1892 e-mail: rui at zeta.rice.edu The implementation of the Kolmogorov-Arnold-Sprecher Superposition Theorem [1-3] in terms of artificial neural networks was first presented and fully discussed by me in 1980 [4]. I also discussed, then [4], applications of these structures to statistical pattern recognition and image and multi- dimensional signal processing. However, I did not use the words "neural networks" in defining the underlying networks. For this reason, the current researchers on neural nets including Robert Hecht-Nielsen [5] do not seem to be aware of my contribution [4]. I hope that this note will help correct history. 
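For readers who have not seen it, the theorem in question can be stated (in the form usually quoted from [1]) as follows: every continuous function f of n variables on the unit cube can be written as f(x_1,...,x_n) = \sum_{q=1}^{2n+1} \Phi_q ( \sum_{p=1}^{n} \psi_{pq}(x_p) ), where the \Phi_q and the \psi_{pq} are continuous functions of a single variable and the inner functions \psi_{pq} do not depend on f. Read as a network, this is an exact representation of an arbitrary continuous mapping by a structure with a single layer of 2n+1 summation units between the inputs and the output, which is the reading taken in [4] and [5].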
Incidentally, there is a misprint in [4]. In [4], please insert "no" in the statement before eqn.(4). That statement should read: "Sprecher showed that lambda can be any nonzero number which satisfies no equation ..." [1] A. N. Kolmogorov, "On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition," Dokl. Akad. Nauk SSSR, Vol. 114, pp. 369-373, 1957. [2] V. I. Arnol'd, "On functions of three variables," Dokl. Akad. Nauk SSSR, Vol. 114, pp. 953-956, 1957. [3] D. A. Sprecher, "An improvement in the superposition theorem of Kolmogorov," J. Math. Anal. Appl., Vol. 38, pp. 208-213, 1972. [4] Rui J. P. de Figueiredo, "Implications and applications of Kolmogorov's superposition theorem," IEEE Trans. Auto. Contr., Vol. AC-25, pp. 1227-1231, 1980. [5] R. Hecht-Nielsen, "Kolmogorov's mapping neural network existence theorem," IEEE 1st Int. Conf. on Neural Networks, San Diego, CA, June 21-24, 1987, paper III-11. From ncr-fc!avery at ncr-sd.sandiego.ncr.com Tue Jan 17 19:43:43 1989 From: ncr-fc!avery at ncr-sd.sandiego.ncr.com (ncr-fc!avery@ncr-sd.sandiego.ncr.com) Date: Tue, 17 Jan 89 17:43:43 MST Subject: new address Message-ID: <8901180043.AA19084@ncr-fc.FtCollins.NCR.com> I have a new e-mail address. Not the one in the reply field but this one. avery%ncr-fc at ncr-sd.sandiego.ncr.com Will you please get me back on the discussion group. From MUMME%IDCSVAX.BITNET at CUNYVM.CUNY.EDU Tue Jan 17 23:22:00 1989 From: MUMME%IDCSVAX.BITNET at CUNYVM.CUNY.EDU (MUMME%IDCSVAX.BITNET@CUNYVM.CUNY.EDU) Date: Tue, 17 Jan 89 20:22 PST Subject: Tech. Report Available Message-ID: The following tech. report is available from the University of Illinois Dept. of Computer Science: UIUCDCS-R-88-1485 STORAGE CAPACITY OF THE LINEAR ASSOCIATOR: BEGINNINGS OF A THEORY OF COMPUTATIONAL MEMORY by Dean C. Mumme May, 1988 ABSTRACT This thesis presents a characterization of a simple connectionist-system, the linear-associator, as both a memory and a classifier. Toward this end, a theory of memory based on information-theory is devised. The principles of the information-theory of memory are then used in conjunction with the dynamics of the linear-associator to discern its storage capacity and classification capabilities as they scale with system size. To determine storage capacity, a set of M vector-pairs called "items" are stored in an associator with N connection-weights. The number of bits of information stored by the system is then determined to be about (N/2)logM. The maximum number of items storable is found to be half the number of weights so that the information capacity of the system is quantified to be (N/2)logN. Classification capability is determined by allowing vectors not stored by the associator to appear at its input. Conditions necessary for the associator to make a correct response are derived from constraints of information theory and the geometry of the space of input-vectors. Results include derivation of the information-throughput of the associator, the amount of information that must be present in an input-vector and the number of vectors that can be classified by an associator of a given size with a given storage load. Figures of merit are obtained that allow comparison of capabilities of general memory/classifier systems. For an associator with a simple non-linearity on its output, the merit figures are evaluated and shown to be suboptimal. Constant attention is devoted to relative parameter size required to obtain the derived performance characteristics.
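[For scale, taking logs to base 2: an associator with N = 10,000 connection-weights would on this analysis store at most about 5,000 item pairs, for an information capacity of roughly (10,000/2) log 5,000, i.e. on the order of 60,000 bits.]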
Large systems are shown to perform nearest the optimum performance limits and suggestions are made concerning system architecture needed for best results. Finally, avenues for extension of the theory to more general systems are indicated. This tech. report is essentially my Ph.D. thesis completed last May and can be obtained by sending e-mail to: erna at a.cs.uiuc.edu Please do not send requests to me since I now live in Idaho and don't have access to the tech. reports. When replying to this notice, please do not use REPLY or send a note to "CONNECTIONISTS...". Send your request directly to Erna. Comments, questions and suggestions about the work can be sent directly to me at the address below. Thank You! Dean C. Mumme bitnet: mumme at idcsvax Dept. of Computer Science University of Idaho Moscow, ID 83843 From poggio at wheaties.ai.mit.edu Tue Jan 17 22:47:17 1989 From: poggio at wheaties.ai.mit.edu (Tomaso Poggio) Date: Tue, 17 Jan 89 22:47:17 EST Subject: Kolmogorov's superposition theorem In-Reply-To: sontag@fermat.rutgers.edu's message of Tue, 17 Jan 89 14:08:03 EST <8901171908.AA00964@control.rutgers.edu> Message-ID: <8901180347.AA21088@rice-chex.ai.mit.edu> Kolmogorov 's theorem and its relation to networks are discussed in Biol. Cyber., 37, 167-186, 1979. (On the representation of multi-input systems: computational properties of polynomial algorithms, Poggio and Reichardt). There are references there to older papers (see especially the two nice papers by H. Abelson). From mozer%neuron at boulder.Colorado.EDU Wed Jan 18 16:19:46 1989 From: mozer%neuron at boulder.Colorado.EDU (Michael C. Mozer) Date: Wed, 18 Jan 89 14:19:46 MST Subject: oh boy, more tech reports... Message-ID: <8901182119.AA00413@neuron> Please e-mail requests to "kate at boulder.colorado.edu". Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment Michael C. Mozer Paul Smolensky University of Colorado Department of Computer Science Tech Report # CU-CS-421-89 This paper proposes a means of using the knowledge in a network to deter- mine the functionality or _relevance_ of individual units, both for the purpose of understanding the network's behavior and improving its perfor- mance. The basic idea is to iteratively train the network to a certain performance criterion, compute a measure of relevance that identifies which input or hidden units are most critical to performance, and automatically trim the least relevant units. This _skeletonization_ technique can be used to simplify networks by eliminating units that convey redundant infor- mation; to improve learning performance by first learning with spare hidden units and then trimming the unnecessary ones away, thereby constraining generalization; and to understand the behavior of networks in terms of minimal "rules." [An abridged version of this TR will appear in NIPS proceedings.] --------------------------------------------------------------------------- And while I'm at it, some other recent junk, I mean stuff... A Focused Back-Propagation Algorithm for Temporal Pattern Recognition Michael C. Mozer University of Toronto Connectionist Research Group Tech Report # CRG-TR-88-3 Time is at the heart of many pattern recognition tasks, e.g., speech recog- nition. However, connectionist learning algorithms to date are not well- suited for dealing with time-varying input patterns. 
This paper introduces a specialized connectionist architecture and corresponding specialization of the back-propagation learning algorithm that operates efficiently on temporal sequences. The key feature of the architecture is a layer of self-connected hidden units that integrate their current value with the new input at each time step to construct a static representation of the tem- poral input sequence. This architecture avoids two deficiencies found in other models of sequence recognition: first, it reduces the difficulty of temporal credit assignment by focusing the back propagated error signal; second, it eliminates the need for a buffer to hold the input sequence and/or intermediate activity levels. The latter property is due to the fact that during the forward (activation) phase, incremental activity _traces_ can be locally computed that hold all information necessary for back propagation in time. It is argued that this architecture should scale better than conventional recurrent architectures with respect to sequence length. The architecture has been used to implement a temporal version of Rumelhart and McClelland's verb past-tense model. The hidden units learn to behave something like Rumelhart and McClelland's "Wickelphones," a rich and flexible representation of temporal information. --------------------------------------------------------------------------- A Connectionist Model of Selective Attention in Visual Perception Michael C. Mozer University of Toronto Connectionist Research Group Tech Report # CRG-TR-88-4 This paper describes a model of selective attention that is part of a con- nectionist object recognition system called MORSEL. MORSEL is capable of identifying multiple objects presented simultaneously on its "retina," but because of capacity limitations, MORSEL requires attention to prevent it from trying to do too much at once. Attentional selection is performed by a network of simple computing units that constructs a variable-diameter "spotlight" on the retina, allowing sensory information within the spotlight to be preferentially processed. Simulations of the model demon- strate that attention is more critical for less familiar items and that at- tention can be used to reduce inter-item crosstalk. The model suggests four distinct roles of attention in visual information processing, as well as a novel view of attentional selection that has characteristics of both early and late selection theories. From Scott.Fahlman at B.GP.CS.CMU.EDU Wed Jan 18 13:54:02 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Wed, 18 Jan 89 13:54:02 EST Subject: Benchmark collection Message-ID: The mailing list "nn-bench at cs.cmu.edu" is now in operation. I believe that all the "add me" requests received prior to 1/17/89 have been serviced. Of course, it's possible that we messed up some of the requests. If you sent in a request more than a couple of days ago and if you have not yet seen any "nn-bench" mail, please contact "nn-bench-request at cs.cmu.edu" and we'll investigate. New requests should be sent to that same address. The list currently has about 80 subscribers, plus two rebroadcast sites. -- Scott Fahlman, CMU From pollack at cis.ohio-state.edu Fri Jan 20 15:40:09 1989 From: pollack at cis.ohio-state.edu (Jordan B. Pollack) Date: Fri, 20 Jan 89 15:40:09 EST Subject: Technical Report: LAIR 89-JP-NIPS Message-ID: <8901202040.AA13239@orange.cis.ohio-state.edu> Preprint of a NIPS paper is now available. 
Request LAIR 89-JP-NIPS From: Randy Miller CIS Dept/Ohio State University 2036 Neil Ave Columbus, OH 43210 or respond to this message but MODIFY THE To: AND Cc: LINES!!!!! ------------------------------------------------------------------------------ IMPLICATIONS OF RECURSIVE DISTRIBUTED REPRESENTATIONS Jordan B. Pollack Laboratory for AI Research Ohio State University Columbus, OH 43210 I will describe my recent results on the automatic development of fixed-width recursive distributed representations of variable-sized hierarchal data structures. One implication of this work is that certain types of AI-style data-structures can now be represented in fixed-width analog vectors. Simple inferences can be performed using the type of pattern associations that neural networks excel at. Another implication arises from noting that these representations become self-similar in the limit. Once this door to chaos is opened, many interesting new questions about the representational basis of intelligence emerge, and can (and will) be discussed. From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Sat Jan 21 00:06:08 1989 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (thanasis kehagias) Date: Sat, 21 Jan 89 00:06:08 EST Subject: No subject Message-ID: i was reading through the abstracts of the Boston 1988 INNS conference and noticed H. Bourlard and C. Welleken's paper on the relations between Hidden Markov Models and Multi Layer Perceptron. Does anybody have any pointers to papers on the subject by the same (preferrably) or other authors? or the e-mail address of these two authors? Thanasis Kehagias From netlist at psych.Stanford.EDU Sun Jan 22 18:16:33 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Sun, 22 Jan 89 15:16:33 PST Subject: Thurs (1/12): Steven Pinker on Language Models Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 24th (Tuesday, 3:30pm): ----------------------------- ******************************************************************************** Learning by Assertion: Calibrating a Simple Visual System LARRY MALONEY Deptartment of Psychology 6 Washington Place; 8th Floor New York University New York, NY 10003 email: ltm at xp.psych.nyu.edu ******************************************************************************** Abstract An ideal visual system is calibrated if its estimates reflect the actual state of the scene: Straight lines, for example, should be judged to be straight. If an ideal visual system is modeled as a neural network, then it is calibrated only if the weights linking elements of the the network are assigned correct values. I describe a method (`Learning by Assertion') for calibrating an ideal visual system by adjusting the weights. The method requires no explicit feedback or prior knowledge concerning the contents of the environment. This work is relevant to biological visual development and calibration, to the calibration of machine vision systems, and to the design of adaptive network algorithms. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. 
For additional information, or contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From netlist at psych.Stanford.EDU Sun Jan 22 18:23:16 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Sun, 22 Jan 89 15:23:16 PST Subject: (Tues. 1/24): Larry Maloney on Visual Calibration Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 24th (Tuesday, 3:30pm): ----------------------------- ******************************************************************************** Learning by Assertion: Calibrating a Simple Visual System LARRY MALONEY Deptartment of Psychology 6 Washington Place; 8th Floor New York University New York, NY 10003 email: ltm at xp.psych.nyu.edu ******************************************************************************** Abstract An ideal visual system is calibrated if its estimates reflect the actual state of the scene: Straight lines, for example, should be judged to be straight. If an ideal visual system is modeled as a neural network, then it is calibrated only if the weights linking elements of the the network are assigned correct values. I describe a method (`Learning by Assertion') for calibrating an ideal visual system by adjusting the weights. The method requires no explicit feedback or prior knowledge concerning the contents of the environment. This work is relevant to biological visual development and calibration, to the calibration of machine vision systems, and to the design of adaptive network algorithms. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. For additional information, or contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From rsun at cs.brandeis.edu Sun Jan 22 17:02:48 1989 From: rsun at cs.brandeis.edu (Ron Sun) Date: Sun, 22 Jan 89 17:02:48 est Subject: Technical Report: LAIR 89-JP-NIPS Message-ID: Please send this TR to Ron Sun Brandeis U CS Waltham, MA 02254 Thank you. From koch%HAMLET.BITNET at VMA.CC.CMU.EDU Mon Jan 23 14:29:31 1989 From: koch%HAMLET.BITNET at VMA.CC.CMU.EDU (Christof Koch) Date: Mon, 23 Jan 89 11:29:31 PST Subject: Gimme a break! Message-ID: <890123112923.20203114@Hamlet.Caltech.Edu> re. "Call for papers IJCNN, the only major neural network meeting of 1989 [sic]" Neural Information Processing Systems 1989 at Denver will be held this year from November 28-th until November 30-th followed by a workshop on December 1/2. This is the third annual meeting held under the auspices of the IEEE, Society of Neuroscience, and APS. For further information contact Scott Kirkpatrick, General Chairman (kirk at ibm.com) or wait for the Call for Papers which is in preparation. 
Christof

From jbower at bek-mc.caltech.edu Mon Jan 23 16:20:39 1989 From: jbower at bek-mc.caltech.edu (Jim Bower) Date: Mon, 23 Jan 89 13:20:39 pst Subject: NIPS 89 Message-ID: <8901232120.AA14266@bek-mc.caltech.edu> To whom it may concern: A few days ago there was an announcement on the connectionist network that only one "major" neural network meeting would be held in 1989. While "major" in past meeting announcements for the INNS and the IEEE San Diego meetings has seemed most often to be equated with total attendance and size of the exhibit area, an equally important measure might be the overall quality of the work presented and, therefore, the importance of the meeting to the field. Accordingly, the previous announcement should probably be amended to include the fact that the Third annual Neural Information Processing Systems (NIPS) meeting will be held in late 1989 in Denver. While the objective of this meeting is not to be the biggest meeting ever, and submitted papers are refereed, authors might consider submitting important results to this meeting anyway. A call for papers will be announced, as usual, on this network. Jim Bower
From movellan%garnet.Berkeley.EDU at violet.berkeley.edu Mon Jan 23 23:32:11 1989 From: movellan%garnet.Berkeley.EDU at violet.berkeley.edu (movellan%garnet.Berkeley.EDU@violet.berkeley.edu) Date: Mon, 23 Jan 89 20:32:11 pst Subject: Weight Decay Message-ID: <8901240432.AA18293@garnet.berkeley.edu> Referring to the compilation about weight decay from John: I cannot see the analogy between weight decay and ridge regression. The weight solutions in a linear network (Ordinary Least Squares) are the solutions to (I'I) W = I'T where: I is the input matrix (rows are # of patterns in epoch and columns are # of input units in net). T is the teacher matrix (rows are # of patterns in epoch and columns are # of teacher units in net). W is the matrix of weights (net is linear with only one layer!). The weight solutions in ridge regression would be given by (I'I + k<1>) W = I'T, where k is a "shrinkage" constant and <1> represents the identity matrix. Notice that k<1> has the same effect as increasing the variances of the inputs (diagonal of I'I) without increasing their covariances (rest of the I'I matrix). The final effect is biasing the W solutions but reducing the extreme variability to which they are subject when I'I is near singular (multicollinearity). Obviously collinearity may be a problem in nets with a large # of hidden units. I am presently studying how and why collinearity in the hidden layer affects generalization and whether ridge solutions may help in this situation. I cannot see, though, how these ridge solutions relate to weight decay. -Javier

From ILPG0 at ccuab1.uab.es Tue Jan 24 09:23:00 1989 From: ILPG0 at ccuab1.uab.es (CORTO MALTESE) Date: Tue, 24 Jan 89 14:23 GMT Subject: Subscription Message-ID: Dear list owner, I should be grateful if you could add my name to the list of subscribers of Connectionists. My name is O. S. Vilageliu, and my e-mail address is ilpg0 at ccuab1.uab.es. I thank you beforehand, Sincerely yours, Olga Soler

From pollack at cis.ohio-state.edu Tue Jan 24 11:51:15 1989 From: pollack at cis.ohio-state.edu (Jordan B. Pollack) Date: Tue, 24 Jan 89 11:51:15 EST Subject: Gimme a break! In-Reply-To: Christof Koch's message of Mon, 23 Jan 89 11:29:31 PST <890123112923.20203114@Hamlet.Caltech.Edu> Message-ID: <8901241651.AA02067@toto.cis.ohio-state.edu> Speaking of NIPS versus IJCNN: at least NIPS is pronounceable, even though, as Terry S pointed out, Nabisco already holds it as a trademark. If the international joint conference is to be as lasting a success as, say, IJCAI, then its acronym should smoothly roll off the tongue. Here are some of the alternatives I've just come up with. Minor variations: JINNC (Jink) -- permute the word order; IJCONN -- same name, but include the "ON"; ICONN -- leave out the "Joint" (for a drug-free meeting?); ICONS -- International Conf. on Neural Systems (Hey! This is even a word!). The most elegant name is simply NN "Neural Networks", which can be spoken as either "N Squared", signifying both its size and technical nature, or "Double-N", signifying both the need for a big spread and the yearly "round-up" of research results like cattle...
Of course the search for acronyms usually generates useless debris: NIPSOID -- Neural Information Processing Systems On an International Dimension; MANIC -- Most (of the) Artificially Neural International Community; DNE (sounds like DNA?) -- Dear Neural Enthusiast...; BNANA -- Big Network of Artificial Neural Aficionados; ARTIST -- Adaptive Resonance Theory as International Science and Technology; IBSH -- I Better Stop Here.

From kanderso at BBN.COM Tue Jan 24 13:54:04 1989 From: kanderso at BBN.COM (kanderso@BBN.COM) Date: Tue, 24 Jan 89 13:54:04 -0500 Subject: Weight Decay In-Reply-To: Your message of Mon, 23 Jan 89 20:32:11 -0800. <8901240432.AA18293@garnet.berkeley.edu> Message-ID: Yes, I was confused by this too. Here is what the connection seems to be. Say we are trying to minimize an energy function E(w) of the weight vector for our network. If we add a constraint that also attempts to minimize the length of w, we would add a term kw'w to our energy function. Taking your linear least squares problem, we would have E = (T-IW)'(T-IW) + kW'W, so dE/dW = I'IW - I'T + kW (dropping a common factor of 2); setting dE/dW = 0 gives [I'I + k<1>]W = I'T, i.e., ridge regression: W = [I'I + k<1>]^-1 I'T. The covariance matrix is [I'I + k<1>]^-1, so the effects of increasing k are: (1) it makes the matrix more invertible; (2) it reduces the covariance, so that new training data will have less effect on your weights; (3) you lose some resolution in weight space. I agree that collinearity is probably very important, and I'll be glad to discuss that off line. k
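To make the correspondence in the two preceding messages concrete, here is a small numerical sketch added for illustration (the data, sizes, learning rate, and the value of k are arbitrary). It fits a one-layer linear net by gradient descent on the penalized error E(W) = ||T - IW||^2 + k||W||^2 and compares the result with the closed-form ridge solution [I'I + k<1>]^-1 I'T, using the same I, T, W notation as above.

import numpy as np

rng = np.random.default_rng(0)
n_patterns, n_inputs, n_outputs = 40, 8, 3
k = 0.1                                          # "shrinkage" / weight-decay constant
I = rng.normal(size=(n_patterns, n_inputs))      # input matrix (patterns x input units)
T = rng.normal(size=(n_patterns, n_outputs))     # teacher matrix (patterns x teacher units)

# Closed-form ridge regression solution: [I'I + k<1>] W = I'T.
W_ridge = np.linalg.solve(I.T @ I + k * np.eye(n_inputs), I.T @ T)

# Gradient descent on the squared error plus the weight-decay penalty.
W = np.zeros((n_inputs, n_outputs))
lr = 2e-3
for _ in range(5000):
    grad = 2 * (I.T @ (I @ W - T) + k * W)       # dE/dW, decay term included
    W -= lr * grad

print(np.max(np.abs(W - W_ridge)))               # essentially zero: the two solutions coincide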
From jose at tractatus.bellcore.com Wed Jan 25 10:02:09 1989 From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Wed, 25 Jan 89 10:02:09 EST Subject: Weight Decay Message-ID: <8901251502.AA05090@tractatus.bellcore.com> Actually, I think the connection is more general--ridge regression is a special case of a class of variance-reduction techniques in regression called "biased regression" (including principal components). Biases are introduced in order to remove effects of collinearity, as has been discussed, and to attempt to achieve estimators that may have a lower variance than the theoretical best linear unbiased estimator ("BLUE"); when assumptions of linearity and independence are violated, least squares estimators are not particularly attractive and will not necessarily be BLUE. Consequently, nonlinear regression and ordinary linear least squares regression with collinear variables may be able to achieve lower-variance estimators by entertaining biases. In the nonlinear case a bias term would enter as a "constraint" to be minimized along with the error (y - yhat)^2. This constraint is actually a term that can push weights differentially towards zero--in regression terms a bias, in neural network terms weight decay. Ridge regression is a specific case in linear least squares where the off-diagonal terms of the correlation matrix are given less weight by adding a small constant to the diagonal in order to reduce the collinearity problem. It is still controversial in statistical arenas--not everyone subscribes to the notion of introducing biases, since it is hard to know a priori what bias might be optimal for a given problem. I have a paper with Lori Pratt, given at the last NIPS, that describes this relationship more generally; it should be available soon as a tech report. Steve Hanson
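A short note added here on why the penalty term is called "decay" (eta below denotes a learning rate; neither it nor this algebra appears in the message above). A gradient step on the penalized error E(w) + k w'w is

    w(t+1) = w(t) - eta * [ dE/dw + 2k w(t) ]
           = (1 - 2*eta*k) * w(t) - eta * dE/dw,

so every update first shrinks the current weights by the factor (1 - 2*eta*k), slightly less than one, and weights that the error gradient does not actively support are pushed differentially towards zero, as described above.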
From rui at rice.edu Wed Jan 25 18:34:38 1989 From: rui at rice.edu (Rui DeFigueiredo) Date: Wed, 25 Jan 89 17:34:38 CST Subject: No subject Message-ID: <8901252334.AA01804@zeta.rice.edu> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - In-Reply-To: poggio at wheaties.ai.mit.edu's message of Tue, 17 Jan 89 22:47:17 EST Subject: Kolmogorov's superposition theorem Kolmogorov's theorem and its relation to networks are discussed in Biol. Cyber., 37, 167-186, 1979. (On the representation of multi-input systems: computational properties of polynomial algorithms, Poggio and Reichardt). There are references there to older papers (see especially the two nice papers by H. Abelson). - - - - - - - - - - - - end of message - - - - - - - - - - - - Comment: Poggio and Reichardt's paper, "On the representation of multi-input systems: Computational properties of polynomial algorithms" (Biol. Cyber., 37, 167-186, 1980) appeared not earlier but in the same year as deFigueiredo's, "Implications and applications of Kolmogorov's superposition theorem" (IEEE Trans. on Automatic Control, AC-25, 1227-1231, 1980).

From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 12:55:49 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 12:55:49 EST Subject: DARPA Program announcement (long, 2 of 3) Message-ID: Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT (BAA#89-04): NEURAL NETWORKS: HARDWARE TECHNOLOGY BASE DEVELOPMENT SOL BAA#89-04 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L. Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to develop hardware system components that capitalize on the inherent massive parallelism and expected robustness of neural network models. The objective of the present effort is to lay the groundwork for future construction of full-scale artificial neural network computing machines through the development of advanced hardware implementation technologies. DARPA does not intend to build full-scale machines at this stage of the program. Areas of interest include modifiable-weight synaptic connections, neuron processing unit devices, and scalable neural net architecture designs. The technologies proposed may be analog or digital, using silicon or other materials, and may be electronic, optoelectronic, optical, or other. The technology should be robust to manufacturing and environmental variability. It should be flexible and modular to accommodate evolving neural network system architectures and to allow for scale-up to large-sized systems through assembly/interconnection of smaller subsystems. It should be appropriate for future compact, low-power systems. It must accommodate the high fan-out/high fan-in properties characteristic of artificial neural network systems with high-density interconnects, and it must have high throughput capability to achieve rapid processing of large volumes of data. Only those proposals that clearly delineate how the objectives enumerated above are to be achieved and that demonstrate extensive prior experience in hardware design and fabrication will be favorably considered. If the proposal addresses a component technology, proposers should provide a detailed description of the interface features required for integration into a working artificial neural network system. Whether the proposed technology is adapted to a specific neural net model or, conversely, is applicable to a broad range of models, the proposer should clearly define the specific features of the proposed hardware that underlie its particular applicability. To the extent that availability of the proposed technology will facilitate the implementation of advanced systems other than artificial neural network systems, that potential impact should be described. Hardware developers are encouraged to work in close coordination with neural network modelers to better understand the range of current and projected architectural requirements. DARPA will also entertain a limited number of proposals to develop near-term prototypes with high potential for demonstrating the expected power of artificial neural networks. This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28-month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research.
Proprietary portions of the technical proposal should be specifically identified. Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of the principal investigators and other key personnel to be employed in the conduct of this research, with brief resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance: (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program, and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be submitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 22209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some, or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA.

From poggio at wheaties.ai.mit.edu Thu Jan 26 13:01:23 1989 From: poggio at wheaties.ai.mit.edu (Tomaso Poggio) Date: Thu, 26 Jan 89 13:01:23 EST Subject: No subject In-Reply-To: Rui DeFigueiredo's message of Wed, 25 Jan 89 17:34:38 CST <8901252334.AA01804@zeta.rice.edu> Message-ID: <8901261801.AA15158@wheat-chex.ai.mit.edu> ...

From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 13:00:34 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 13:00:34 EST Subject: DARPA Program Announcement (long, 3 of 3) Message-ID: Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT (BAA#89-03): NEURAL NETWORKS: THEORY AND MODELING SOL BAA#89-03 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L.
Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to develop and analyze new artificial neural network system architectures/structures and training procedures; define the requirements for scale-up to large-sized artificial neural networks; and characterize the properties, limitations, and data requirements of new and existing artificial neural network systems. Proposers are encouraged to submit proposals that deal with, but are not limited to, any combination of the following thrusts within these areas: (1) New artificial neural architectures with one or more of the following features: (a) Potential for addressing real-time sensory data processing and real-time sensorimotor control; (b) Networks that incorporate features of sensory, motor, and perceptual processing in biological systems; (c) Nodal elements with increased processing capability, including sensitivity to temporal variations in synaptic inputs; (d) Modular networks composed of multiple interconnected subnets; (e) Hybrid systems combining neural and conventional information processing techniques; (f) Mechanisms to achieve modifications of network behavior in response to external consequences of initial actions; (g) Mechanisms that exhibit selective attention; (h) Strategies for developing conceptual systems and internal data representations well adapted to specific tasks; (i) Means for recognizing and producing sequences of temporal patterns. (2) Faster, more efficient training procedures that: (a) Are robust to noisy data and able to accommodate delayed feedback; (b) Minimize the need for external intervention for feedback; (c) Identify optimal choices of initial classification features or categories; (d) Generate internal models of the external world to guide appropriate responses to external stimuli. (3) Theoretical analyses that address: (a) Data representations; (b) Scaling properties for new and existing systems; (c) Matching of system complexity to the nature and amount of training data; (d) Tolerance to nodal element and synaptic failure; (e) Stability and convergence of new and existing systems; (f) Relationships between neural networks and conventional approaches. DARPA will also entertain a limited number of proposals to address special applications with high potential for demonstrating the expected power of artificial neural networks. This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28-month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research. Proprietary portions of the technical proposal should be specifically identified.
Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of the principal investigators and other key personnel to be employed in the conduct of this research, with brief resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance: (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program, and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be submitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 22209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some, or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA.

From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 12:49:35 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 12:49:35 EST Subject: DARPA Program announcement (long, 1 of 3) Message-ID: Barbara Yoon at DARPA has apparently been flooded with requests for the three DARPA program announcements in the neural network area. To lighten the load, she asked us to send out the full text of these announcements to members of this mailing list. The text in this and the following two messages is copied verbatim from the Commerce Business Daily. We have resisted the temptation to insert paragraph breaks to improve readability. I apologize for dumping so much text on people who already have copies of the announcements or who are not interested, but this seems the best way to get the word out to a large set of potentially interested people.
Please don't contact us about this program -- the appropriate phone numbers and addresses are listed in the announcements. -- Scott Fahlman, CMU =========================================================================== Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT (BAA#89-02): NEURAL NETWORKS: COMPARATIVE PERFORMANCE MEASUREMENTS SOL BAA#89-02 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L. Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to construct and test software simulations of artificial neural networks (or software simulations of hybrid systems incorporating artificial neural networks) that perform defined, complex classification tasks in the following application areas: (1) Automatic target recognition; (2) Continuous speech recognition; (3) Sonar signal discrimination; and (4) Seismic signal discrimination. The objectives of this program are to advance the state-of-the-art in application of artificial neural network approaches to classification problems; to investigate the optimal role of artificial neural networks in hybrid classification systems; and to measure the projected performance of artificial neural networks (or hybrid systems containing neural networks) in order to support a comparison with the performance of alternative, competing technologies. DARPA will provide application developers with a standard set of training data, appropriate to the application, to be used as the basis for training (or otherwise developing) their classification systems. The systems developed will then be evaluated independently in classification of standard sets of test data, distinct from the training set. The four application tasks are more fully described below. (1) Automatic target recognition: (a) Given a multi-spectral training set of time-correlated images of up to ten land vehicles (which may be partially obscured and in cluttered environments) with ground truth provided, identify and classify these vehicles in a new set of images (outside the training set); (b) Given images of two or more new land vehicles, recognize these vehicles as distinct from the original set and distinguish them from one another (with no system reprogramming or retraining); (c) Given a new training set of data on air vehicles, with system reprogramming and/or retraining, modify the system to identify and classify this new class of targets. (2) Continuous speech recognition: (a) Given a training set of 2800 spoken English sentences (with a 1000-word vocabulary), transcribe to written text spoken English sentences from a test set (outside the training set); (b) With no system reprogramming or retraining, transcribe to text spoken English sentences using vocabulary outside the initial vocabulary (given only the phonetic spelling of the new words); (c) Given training data on spoken foreign language sentences (with characteristics similar to those of the English sentence data base described in application (2)(a) above), with system reprogramming and/or retraining, modify the system to transcribe to text spoken foreign language sentences.
(3) Sonar signal discrimination: (a) Given a training set of several acoustic signature transients and passive marine acoustic signals (both signal types in noisy environments), detect and classify each signal type in a test set (outside the training set); (b) Given two or more new passive marine acoustic signals, with no system reprogramming or retraining, recognize these signals as distinct from the original set and distinguish them from one another; (c) Given a new training set of data on underwater echoes from active sonar returns, with system reprogramming and/or retraining, modify the system to detect and classify each signal type in this new class of signals and distinguish them from the original set of acoustic signals. (4) Seismic signal discrimination: (a) Given a training set of seismic signals (and associated parameters) from different types of seismic events of varying magnitudes, each event recorded at two or more seismic stations with ground truth provided, classify (as to signal type), locate, and estimate the magnitude of similar events in a test set of seismic signals (outside the training set); (b) Given one or more new types of seismic signals, recognize these signals as distinct from the original set (with no system reprogramming or retraining); (c) Given a new training set of seismic signals from seismic stations located in different geological regions from the original stations, with system reprogramming and/or retraining, modify the system to classify and characterize this new set of signals. The criteria for evaluating the performance of the classification systems will include: (a) Classification accuracy (the appropriate accuracy metric for the task addressed, e.g., percentage of correct detections, identifications, and/or classifications, including false alarms where applicable; or total error rates); (b) System development time (the time required to develop and train the system); (c) Fault tolerance (the percentage of original performance retained when subjected to failure of some of the processing elements); (d) Generality (the accuracy of the system for new input data significantly outside the range of training data); (e) Adaptability (the time and effort required to modify the system to address similar classification problems with different classes of data); (f) Computational efficiency (the projected solution speed when optimally implemented in hardware); (g) Size and power requirements (the projected size and power requirements of the computational hardware); (h) Performance vs. training data (the rate of improvement in performance with increasing size of the training data set). This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28-month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research.
Proprietary portions of the technical proposal should be specifically identified. Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of the principal investigators and other key personnel to be employed in the conduct of this research, with brief resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance: (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program, and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be submitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 22209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some, or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA.

From pwh at ece-csc.ncsu.edu Thu Jan 26 17:31:04 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Thu, 26 Jan 89 17:31:04 EST Subject: No subject Message-ID: <8901262231.AA03761@ece-csc.ncsu.edu> REVISED SUBMISSION DEADLINE FOR IJCNN-89 PAPERS--FEBRUARY 15, 1989 International Joint Conference on Neural Networks June 18-22, 1989 Washington, D.C. DEADLINE FOR SUBMISSION OF PAPERS for IJCNN-89 has been revised to FEBRUARY 15, 1989.
Papers of 8 pages or less are solicited in the following areas: -Real World Applications -Associative Memory -Supervised Learning Theory -Image Analysis -Reinforcement Learning Theory -Self-Organization -Robotics and Control -Neurobiological Models -Optical Neurocomputers -Vision -Speech Processing and Recognition -Electronic Neurocomputers -Neural Network Architectures & Theory -Optimization. FULL PAPERS in camera-ready form (1 original on Author's Kit forms and 5 reduced 8 1/2" x 11" copies) should be submitted to Nomi Feldman, Conference Coordinator, at the address below. For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, IJCNN-89 Conference Coordinator, 3770 Tansy Street, San Diego, CA 92121, (619) 453-6222.

From REXB%PURCCVM.BITNET at VMA.CC.CMU.EDU Fri Jan 27 12:55:00 1989 From: REXB%PURCCVM.BITNET at VMA.CC.CMU.EDU (Rex C. Bontrager) Date: Fri, 27 Jan 1989 12:55 EST Subject: INNS membership Message-ID: Who do I contact regarding INNS membership? (More precisely, to whom do I send my money?) Rex C. Bontrager Bitnet: rexb at purccvm Internet: rexb at vm.cc.purdue.edu Phone: (317) 494-1787 ext. 256

From neural!yann Wed Jan 25 15:13:58 1989 From: neural!yann (Yann le Cun) Date: Wed, 25 Jan 89 15:13:58 -0500 Subject: Weight Decay Message-ID: <8901252012.AA00971@neural.UUCP> Consider a single layer linear network with N inputs. When the number of training patterns is smaller than N, the set of solutions (in weight space) is a proper linear subspace. Adding weight decay will select the minimum norm solution in this subspace (if the weight decay coefficient is decreased with time). The minimum norm solution happens to be the solution given by the pseudo-inverse technique (cf. Kohonen), and the solution which optimally cancels out uncorrelated zero mean additive noise on the input. - Yann Le Cun
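A small numerical sketch of the limit Yann Le Cun describes, added for illustration (the data and dimensions are arbitrary): with fewer training patterns than inputs, the penalized solution [I'I + k<1>]^-1 I'T approaches the minimum-norm (pseudo-inverse) solution as the weight-decay constant k shrinks toward zero.

import numpy as np

rng = np.random.default_rng(1)
n_patterns, n_inputs = 5, 12                     # fewer patterns than inputs
I = rng.normal(size=(n_patterns, n_inputs))
T = rng.normal(size=(n_patterns, 1))

W_pinv = np.linalg.pinv(I) @ T                   # minimum-norm solution

for k in (1e-1, 1e-3, 1e-6):
    W_k = np.linalg.solve(I.T @ I + k * np.eye(n_inputs), I.T @ T)
    print(k, np.linalg.norm(W_k - W_pinv))       # the distance shrinks with k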
From reggia at mimsy.umd.edu Fri Jan 27 19:41:19 1989 From: reggia at mimsy.umd.edu (James A. Reggia) Date: Fri, 27 Jan 89 19:41:19 EST Subject: call for papers Message-ID: <8901280041.AA04500@mimsy.umd.edu> CALL FOR PAPERS The 13th Annual Symposium on Computer Applications in Medical Care will have a track this year on applications of neural models (connectionist models, etc.) in medicine. The Symposium will be held in Washington DC, as in previous years, on November 5 - 8, 1989. Submissions are refereed and, if accepted, appear in the Symposium Proceedings. Deadline for submission of manuscripts (six copies, double spaced, max. of 5000 words) is March 3, 1989. For further information and/or a copy of the detailed call for papers, contact: SCAMC Office of Continuing Medical Education George Washington University Medical Center 2300 K Street, NW Washington, DC 20037 The detailed call for papers includes author information sheets that must be returned with a manuscript.

From elman at amos.ling.ucsd.edu Sat Jan 28 01:24:24 1989 From: elman at amos.ling.ucsd.edu (Jeff Elman) Date: Fri, 27 Jan 89 22:24:24 PST Subject: UCSD Cog Sci faculty opening Message-ID: <8901280624.AA11066@amos.ling.ucsd.edu> ASSISTANT PROFESSOR COGNITIVE SCIENCE UNIVERSITY OF CALIFORNIA, SAN DIEGO The Department of Cognitive Science at UCSD expects to receive permission to hire one person for a tenure-track position at the Assistant Professor level. The Department takes a broadly based approach to the study of cognition, including its neurological basis, in individuals and social groups, and machine intelligence. We seek someone whose interests cut across conventional disciplines. Interests in theory, computational modeling (especially PDP), or applications are encouraged. Candidates should send a vita, reprints, a short letter describing their background and interests, and names and addresses of at least three references to: Search Committee Cognitive Science, C-015-E University of California, San Diego La Jolla, CA 92093 Applications must be received prior to March 15, 1989. Salary will be commensurate with experience and qualifications, and will be based upon UC pay schedules. Women and minorities are especially encouraged to apply. The University of California, San Diego is an Affirmative Action/Equal Opportunity Employer.

From Dave.Touretzky at B.GP.CS.CMU.EDU Sat Jan 28 07:14:37 1989 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Sat, 28 Jan 89 07:14:37 EST Subject: INNS membership In-Reply-To: Your message of Fri, 27 Jan 89 12:55:00 -0500. Message-ID: <462.601992877@DST.BOLTZ.CS.CMU.EDU> PLEASE: Do not send requests for general information (like how to join INNS) to the CONNECTIONISTS list! This list is intended for serious scientific discussion only. If you need help with an address or something equally trivial, send mail to connectionists-request if you must. Better yet, use the Neuron Digest. Don't waste people's time on CONNECTIONISTS. -- Dave

From norman%cogsci at ucsd.edu Sun Jan 29 13:36:36 1989 From: norman%cogsci at ucsd.edu (Donald A Norman-UCSD Cog Sci Dept) Date: Sun, 29 Jan 89 10:36:36 PST Subject: addendum to UCSD Cog Sci faculty opening Message-ID: <8901291836.AA22314@sdics.COGSCI> Jeff Elman's posting of the job at UCSD in the Cognitive Science Department was legally and technically accurate, but he should have added one important sentence: Get the application -- or at least a letter of interest -- to us immediately. We are very late in getting the word out, and decisions will have to be made quickly. The sooner we know of the pool of applicants, the better. (Actually, I now discover one inaccuracy -- the ad says we "expect to receive permission to hire ..." In fact, we now do have that permission.) If you have future interests -- say you are interested not now, but in a year or two or three -- that too is important for us to know, so tell us. don norman

From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Sun Jan 29 22:39:30 1989 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (thanasis kehagias) Date: Sun, 29 Jan 89 22:39:30 EST Subject: speech list? Message-ID: Does anyone know of a mailing list where speech questions are discussed? (Not necessarily as related to connectionist methods; just speech questions in general.) Thanks a lot, Thanasis

From pwh at ece-csc.ncsu.edu Mon Jan 30 14:48:25 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Mon, 30 Jan 89 14:48:25 EST Subject: IJCNN Call for Papers Amendment Message-ID: <8901301948.AA25787@ece-csc.ncsu.edu> Amendment to IJCNN call for papers Sorry... Upon reflection, the wording in the IJCNN call for papers did not convey the proper meaning. Perhaps a better way to say it would have been, "IJCNN-89 is replacing both the ICNN and INNS meetings in 1989." The intent was for people to realize that if they planned to submit to either ICNN or INNS or both in 1989, the joint conference is the only opportunity to do so. Part of the reason for extending the deadline is to allow for the short notice (no INNS call for papers had previously been issued, since the merger of the two conferences just occurred). The original text was meant to imply the above and nothing more.
No offense should be taken because none was intended. By the way, I was at last year's NIPS conference and thought it was an excellent conference. I plan to be there again next year. Also there has been some confusion over the revised deadline for paper submissions to IJCNN. The revised deadline STILL STANDS as FEBRUARY 15. P.S. Following the precedent set at the IJCAI, my pronunciation of IJCNN is idge-kin. The acronyms were good though! Wes Snyder, Co-Chairman of the Organization Committee, IJCNN-89 January 30, 1989