From smagt at fwi.uva.nl Wed Oct 2 04:28:55 1991 From: smagt at fwi.uva.nl (Patrick van der Smagt) Date: Wed, 2 Oct 91 09:28:55 +0100 Subject: reprint announcement Message-ID: <9110020828.AA20879@fwi.uva.nl> I mentioned a paper some time ago about neural robotics control. Popular demand made me decide to make it available by anonymous ftp from neuroprose. --------------------------------------------------------------------- The following reprint is available by ftp from the neuroprose archive at archive.cis.ohio-state.edu: A real-time learning neural robot controller P. Patrick van der Smagt Ben J. A. Kr\"ose Department of Computer Systems University of Amsterdam Kruislaan 403, 1098 SJ Amsterdam, The Netherlands ABSTRACT A neurally based adaptive controller for a 6 degrees of freedom (DOF) robot manipulator with only rotary joints and a hand-held camera is described. The task of the system is to place the manipulator directly above an object that is observed by the camera (i.e., 2D hand-eye coordination). The requirement of adaptivity results in a system which does not make use of any inverse kinematics formulas or other detailed knowledge of the plant; instead, it should be self-supervising and adapt on-line. The proposed neural system will directly translate the preprocessed sensory data to joint displacements. It controls the plant in a feedback loop. The robot arm may make a sequence of moves before the target is reached, when in the meantime the network learns from experience. The network is shown to adapt quickly (in only tens of trials) and form a correct mapping from input to output domain. Here's how to get the reprint from neuroprose: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get smagt.rtcontrol.ps.Z ftp> quit unix> uncompress smagt.rtcontrol.ps.Z unix> lpr smagt.rtcontrol.ps (or however you print postscript) Questions or comments can be sent to me at: Patrick van der Smagt Department of Computer Systems University of Amsterdam Kruislaan 403, 1098 SJ Amsterdam, The Netherlands email: smagt at fwi.uva.nl fax: +31 20 525 7490 phone: +31 20 525 7524 From LAUTRUP at nbivax.nbi.dk Wed Oct 2 04:05:00 1991 From: LAUTRUP at nbivax.nbi.dk (Benny Lautrup) Date: Wed, 2 Oct 1991 09:05 +0100 (NBI, Copenhagen) Subject: preprint Message-ID: <1F984C7800023236@nbivax.nbi.dk> New preprint Uniqueness of Parisi's Scheme for Replica Symmetry Breaking B. Lautrup Computational Neural Network Center The Niels Bohr Institute Blegdamsvej 17 2100 Copenhagen, Denmark Abstract: Replica symmetry breaking in spin glass models is investigated using elements of the theory of permutation groups. It is shown how the various types of symmetry breaking gives rise to special algebras and that Parisi's scheme may be uniquely characterized by two simple conditions on these algebras, namely transposition symmetry and simple extensibility. An alternative to the Parisi scheme is shown to be unacceptable. The paper may be retrieved by anonymous ftp from nbibel.nbi.dk (129.142.100.11) in the directory pub/neuroprose under the name lautrup.parisi.ps.Z It is a compressed postscript file. Regards Benny Lautrup From mclennan at cs.utk.edu Wed Oct 2 15:10:21 1991 From: mclennan at cs.utk.edu (mclennan@cs.utk.edu) Date: Wed, 2 Oct 91 15:10:21 -0400 Subject: report available Message-ID: <9110021910.AA12670@maclennan.cs.utk.edu> ** Please do not forward to other boards. Thank you. 
** The following technical report has been placed in the Neuroprose archives at Ohio State. Ftp instructions follow the abstract. N.B. The uncompressed file is long (2.07 MB), so you may have to use the -s (symbolic link) option on lpr to print it. ----------------------------------------------------- Gabor Representations of Spatiotemporal Visual Images Bruce MacLennan Computer Science Department University of Tennessee Knoxville, TN 37996 maclennan at cs.utk.edu Technical Report CS-91-144 ABSTRACT: We review Gabor's Uncertainty Principle and the limits it places on the representation of any signal. Representations in terms of Gabor elementary functions (Gaussian-modulated sinusoids), which are optimal in terms of this uncertainty principle, are compared with Fourier and wavelet representations. We also review Daugman's evidence for representations based on two-dimensional Gabor functions in mammalian visual cortex. We suggest three- dimensional Gabor elementary functions as a model for motion selectivity in complex and hypercomplex cells in visual cortex. This model also suggests a computational role for low frequency oscillations (such as the alpha rhythm) in visual cortex. A preliminary version of this paper was presented at the workshop ``Foundational Methods for Behavioral and Computational Neurosci- ences,'' Georgetown University, May 13-15, 1991. ----------------------------------------------------- FTP INSTRUCTIONS Either use the Getps script, or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclennan.gabor.ps.Z ftp> quit unix> uncompress maclennan.gabor.ps.Z unix> lpr -s maclennan.gabor.ps (or however you print postscript) If you need hardcopy, then send your request to: library at cs.utk.edu Bruce MacLennan Department of Computer Science 107 Ayres Hall The University of Tennessee Knoxville, TN 37996-1301 (615)974-0994/5067 FAX: (615)974-4404 maclennan at cs.utk.edu From M.Stannett at dcs.sheffield.ac.uk Wed Oct 2 16:30:06 1991 From: M.Stannett at dcs.sheffield.ac.uk (M.Stannett@dcs.sheffield.ac.uk) Date: Wed, 2 Oct 91 16:30:06 BST Subject: Concurrent semantics Message-ID: <9110021530.AA04587@sun5.dcs.sheffield.ac.uk> Dear All, IF THIS MESSAGE ISN'T RELEVANT TO YOU, PLEASE PASS IT TO SOMEONE TO WHOM IT IS. One of my major delights in computer science is the nature of concurrent semantics, and especially the "non-interleaving" models like Mazurkiewicz trace language and their analogues (these are models which represent so-called 'true' concurrency, rather than trying to flatten everything down into sequences of actions). Nonetheless, I readily admit that the more standard "interleaving" models are fascinating in their own right as well. In any case, I'm certain we're all trying to solve the same problems, but merely approaching them from slightly different angles - in ten years time, we'll be wondering what all the disagreement was about .... {{{ CONNECTIONISTS: concurrent semantics is concerned with working out what complex concurrent systems are actually doing, and how properly to represent their behaviour. Applying the standard sequential interpretations to concurrent systems can sometimes lead to misleading results. Consequently, I would argue that finding a deep understanding of the nature of complex networks probably involves exactly the same problems as are currently faced by concurrent semantics theorists. 
It might prove extremely fruitful to see some collaborations between the two fields }}} As far as I can work out, there seems to be only negligible contact between the many groups working in the area. I'd like to see some sort of electronic forum for discussing ideas in the area - even if we can't work together, at least we might be able to exchange ideas rapidly from time to time. Please let me know if you'd be interested in joining in a sort of loosely confederated "concurrency club" or whatever. Obviously, there'd be no funding to speak of, but then, given sufficient enthusiasm, we shouldn't need any. (At least, not yet). Provided the task isn't TOO time-consuming, I'll happily channel messages to interested parties for the time being. Thanks for reading! Mike Stannett ( M.Stannett @ uk.ac.sheffield.dcs )
From et at eng.cam.ac.uk Wed Oct 2 10:31:19 1991 From: et at eng.cam.ac.uk (E. Tzirkel-Hancock) Date: Wed, 2 Oct 91 15:31:19 +0100 Subject: Technical Report Available Message-ID: <24638.9110021431@tw700.eng.cam.ac.uk> The following report has been placed in the neuroprose archives at Ohio State University: STABLE CONTROL OF NONLINEAR SYSTEMS USING NEURAL NETWORKS Eli Tzirkel-Hancock & Frank Fallside Technical Report CUED/F-INFENG/TR.81 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract A neural network based direct control architecture is presented that achieves output tracking for a class of continuous time nonlinear plants, for which the nonlinearities are unknown. The controller employs neural networks to perform approximate input/output plant linearization. The network parameters are adapted according to a stability principle. The architecture is based on a modification of a method previously proposed by the authors, where the modification comprises adding a sliding control term to the controller. This modification serves two purposes: first, as suggested by Sanner and Slotine, sliding control compensates for plant uncertainties outside the state region where the networks are used, thus providing global stability; second, the sliding control compensates for inherent network approximation errors, hence improving tracking performance. A complete stability and tracking error convergence proof is given and the setting of the controller parameters is discussed. It is demonstrated that as a result of using sliding control, better use of the network's approximation ability can be achieved, and the asymptotic tracking error can be made dependent only on inherent network approximation errors and the frequency range of unmodeled dynamical modes. Two simulations are provided to demonstrate the features of the control method. ************************ How to obtain a copy ************************ a) via FTP: % ftp archive.cis.ohio-state.edu .. Name (archive.cis.ohio-state.edu): anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get tzirkel.control_tr81.ps.Z ftp> quit % uncompress tzirkel.control_tr81.ps.Z % lp tzirkel.control_tr81.ps b) via postal mail: Request a hardcopy from Eli Tzirkel, et at eng.cam.ac.uk Speech Laboratory Cambridge University Engineering Department Trumpington Street, Cambridge CB2 1PZ England
From STIVA%IRMKANT.BITNET at vma.cc.cmu.edu Thu Oct 3 11:41:47 1991 From: STIVA%IRMKANT.BITNET at vma.cc.cmu.edu (stefano nolfi) Date: Thu, 03 Oct 91 11:41:47 EDT Subject: Technical Report Available Message-ID: The following technical report is available.
Send request to STIVA at IRMKANT.BITNET DO NOT REPLY TO THIS MESSAGE ------------------------------------------------------------------------ Learning, Behavior, and Evolution Domenico Parisi Stefano Nolfi Federico Cecconi Institute of Psychology CNR - Rome e-mail: stiva at irmkant.Bitnet Abstract We present simulations of evolutionary processes operating on populations of neural networks to show how learning and behavior can influence evolution within a strictly Darwinian framework. Learning can accelerate the evolutionary process both when learning tasks are correlated with the fitness criterion and when random learning tasks are used. Furthermore, an ability to learn a task can emerge and be transmitted evolutionarily for both correlated and uncorrelated tasks. Finally, behavior that allows the individual to self-select the incoming stimuli can influence evolution by becoming one of the factors that determine the observed phenotypic fitness on which selective reproduction is based. For all the effects demonstrated, we advance a consistent explanation in terms of a multidimensional weight space for neural networks, a fitness surface for the evolutionary task, and a performance surface for the learning task. This paper will be presented at ECAL-91 - European Conference on Artificial Life, December 1991, Paris.
From mre1 at it-research-institute.brighton.ac.uk Thu Oct 3 09:20:50 1991 From: mre1 at it-research-institute.brighton.ac.uk (Mark Evans) Date: Thu, 3 Oct 91 09:20:50 BST Subject: IJCNN '91 Singapore - Request to share a room Message-ID: <1583.9110030820@itri.bton.ac.uk> I will be attending IJCNN '91 in Singapore on the 18-21 November where I will be presenting a paper. I would be interested in hearing from anyone who would like to share a twin room for the duration of the conference. (I am about to book myself a room or I could pay you if you have already booked a room.) I am a PhD student at Brighton Polytechnic, UK, working in the field of computer vision and neural networks. Anyone interested? ################################################# # # # M.R. Evans mre1 at itri.bton.ac.uk # # Research Assistant mre1 at itri.uucp # # # # ITRI, # # Brighton Polytechnic, # # Lewes Road, # # BRIGHTON, # # E. Sussex, # # BN2 4AT. # # # # Tel: +44 273 642915/642900 # # Fax: +44 273 606653 # # # #################################################
From kak at max.ee.lsu.edu Thu Oct 3 10:38:55 1991 From: kak at max.ee.lsu.edu (Dr. S. Kak) Date: Thu, 3 Oct 91 09:38:55 CDT Subject: TR's available Message-ID: <9110031438.AA14174@max.ee.lsu.edu> Please send me a copy of your report. Subhash Kak Professor of Electrical & Computer Engineering Louisiana State University Baton Rouge, LA 70803-5901
From M.Stannett at dcs.sheffield.ac.uk Fri Oct 4 16:12:17 1991 From: M.Stannett at dcs.sheffield.ac.uk (M.Stannett@dcs.sheffield.ac.uk) Date: Fri, 4 Oct 91 16:12:17 BST Subject: concurrent semantics mailing list Message-ID: <9110041512.AA06164@sun5.dcs.sheffield.ac.uk> Hello again! A number of subscribers to CONNECTIONISTS have indicated they haven't come across concurrent semantics (which may explain Chris Tofts' comments below). I'll send you a quick summary of the subject area in a few days' time, and try to show why it's relevant to connectionist researchers. Meanwhile ... two respondents have indicated that appropriate electronic fora already exist for the discussion of concurrent semantics, while others have demonstrated that (like me) they have no information about these fora.
Since there's no point setting up a third system in competition with the other two I now know about, I enclose the details below. (If the others are indeed distinct, perhaps they should consider merging ...) --- Included message #1 --- From: Miranda Mowbray Hello Mike, Yes, this is a very good idea [...] There is already a Concurrency mailing list and archive, specially designed as a forum for rapid exchange of ideas between different groups working in Concurrency. It's been running for some time now and I'm surprised you haven't heard of it. It's run by Albert Meyer at MIT. To join, send a message to concurrency at theory.lcs.mit.edu saying that you'd like to be on the mailing list. You'll get information about archive files available. This is a high quality forum and I recommend joining. I also recommend that you tell anyone else who replies to your message and wants to be in a concurrency club. I don't see why you should go to the trouble of setting up your own separate club when one already exists, unless your version has specific local interests which are not catered for by Albert Meyer's; in any case what you *mustn't* do is set up a second forum which will keep people ignorant of the first, after all the whole point is to get everyone together! Thank you for your public-spiritedness, Yours, Miranda. --- Included message(s) # 2/3 --- From: Chris Tofts Subject: Re: Concurrent semantics Hi Mike, interesting idea, at a symposium on complex systems in the States last year I suggested using ideas from algebraic concurrency theory to a collection of people working in neural nets etc.; they not only seemed remarkably uninterested but failed to see any link. It seems that any connections (sic) will have to be exposed from the theoretical side. There already exists a news group for concurrency which is used, are you suggesting something other than this?? All the best, Chris. From: C.Tofts at uk.ac.bath.gdr I believe it's mail.concurrency, at least that's what it's called in Edinburgh. Ask your local news guru. All the best, Chris. --- End of included messages ---
From wray at ptolemy.arc.nasa.gov Fri Oct 4 19:11:01 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Fri, 4 Oct 91 16:11:01 PDT Subject: tree classification code available for comparative studies Message-ID: <9110042311.AA01252@ptolemy.arc.nasa.gov> I've made the following report available on the Neuroprose Archive (cheops.cis.ohio-state.edu) as buntine.treecode.ps.Z not because I think connectionists are "deeply" interested in tree learning research but because I think it would be a handy resource for comparative studies: 1) systems such as CART/C4 are recognised programs for benchmarking supervised learning systems against 2) home-grown reimplementations can be buggy and a timesink 3) if your problem has some inherent structure and a few key indicator variables then trees may be a good thing to try as well 4) trees typically don't work well with purely numeric data or with problems with many variables all giving some minor contribution to the prediction being made The IND Tree Package we developed here incorporates some of early C4, most of the classification trees component of CART (no regression) along with some more recent Bayesian/MDL approaches that sometimes work better. You can obtain LaTeX source for the following introductory report if you email to: ind at kronos.arc.nasa.gov and ask for "About the IND Tree Package".
--------------------------------------- About the IND Tree Package Wray Buntine, RIACS NASA Ames Research Center Mail Stop 269-2 Moffett Field, CA 94035 September 29, 1991 This note introduces the IND Tree Package to prospective procurers and those users/installers looking at IND for the first time. IND does supervised learning using classification trees. IND integrates features from Breiman {\it et al.}'s CART and Quinlan's C4 with newer Bayesian and minimum encoding methods for growing classification trees, and provides an experimental control suite on top. The package comes with a manual, ``man'' entries, and a guide to tree methods and research. Information about obtaining IND, performance statistics, documentation, authorship, copyright, installation, etc., are given. IND is currently under development, although it has been used considerably since late 1989. IND is implemented in C under UNIX. ---------------------------------------- Wray Buntine RIACS (Research Inst. for Advanced Comp. Sc.) NASA Ames Research Center phone: (415) 604 3389 Mail Stop 244-17 fax: (415) 604 6997 Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov
From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Sun Oct 6 09:31:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Sun, 6 Oct 91 09:31 U Subject: Thank's for help. Message-ID: <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU>
From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Sun Oct 6 09:33:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Sun, 6 Oct 91 09:33 U Subject: Thank's for help Message-ID: <01GBEHHAOVDCD7QHLX@BITNET.CC.CMU.EDU>
From tackett at ipla00.dnet.hac.com Sun Oct 6 00:08:02 1991 From: tackett at ipla00.dnet.hac.com (Walter Alden Tackett) Date: Sun, 6 Oct 91 00:08:02 EDT Subject: tree classification code available for comparative studies Message-ID: <9110060708.AA10023@ipla00.ipl.hac.com> Wray Buntine writes: > not because I think connectionists are "deeply" interested in tree learning ...only in *dendritic* trees, maybe? ;-) -wt
From aboulang at BBN.COM Sun Oct 6 11:40:36 1991 From: aboulang at BBN.COM (aboulang@BBN.COM) Date: Sun, 6 Oct 91 11:40:36 EDT Subject: Detailed Balance In-Reply-To: 7923509%TWNCTU01.BITNET@bitnet.cc.cmu.edu's message of Sun, 6 Oct 91 09:31 U <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> Message-ID: The property (2) is called detailed balance, resulting in a Gibbs distribution for the probability of finding the system in a particular state. The rule (1) is an update procedure for the spin Sk which ensures detailed balance provided that E is an energy. Both principles are fundamental facts of the statistical mechanics of neural networks (or, if you prefer, result from a maximum entropy analysis of neural nets). The book by Hertz, Krogh and Palmer summarizes all that in a nice way. The book title is "Introduction to the Theory of Neural Computation". We really should be saying that detailed balance in sampling implies a Gibbs distribution, but that the Gibbs distribution does not imply the use of a sampling procedure with detailed balance. There is some new work on this: J. Marroquin & A. Ramerez "Stochastic Cellular Automata with Gibbsian Invariant Measures" IEEE Trans Information Theory May(*), 1991 * I can't find the paper so I may have the month wrong. This is potentially good news to people trying to get annealing-type algorithms to work for fine-grained MIMD parallelism.
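For readers who did not see the message being replied to (its body is not reproduced in this digest), the two properties referred to as (1) and (2) have the following standard textbook forms; the temperature T and the Glauber form of the single-spin update are assumptions here, not quotations from that message:

\documentclass{article}
\begin{document}
% Standard forms; T is the temperature and Z the partition function.
Detailed balance between states $s$ and $s'$ under transition probabilities $W$:
\[
  P(s)\,W(s \to s') \;=\; P(s')\,W(s' \to s) .
\]
With $E$ an energy, this is satisfied by the Gibbs distribution
\[
  P(s) \;=\; \frac{1}{Z}\,e^{-E(s)/T},
  \qquad
  Z \;=\; \sum_{s} e^{-E(s)/T} .
\]
A single-spin update of the Glauber type,
\[
  \Pr\bigl[S_k \leftarrow +1\bigr]
  \;=\;
  \frac{1}{1 + e^{-\Delta E_k / T}},
  \qquad
  \Delta E_k \;=\; E(S_k{=}{-}1) - E(S_k{=}{+}1),
\]
obeys detailed balance and therefore samples from the Gibbs distribution.
\end{document}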
Regards, Albert Boulanger aboulanger at bbn.com
From M.Stannett at dcs.sheffield.ac.uk Sun Oct 6 00:10:13 1991 From: M.Stannett at dcs.sheffield.ac.uk (Mike Stannett) Date: Sun, 6 Oct 91 00:10:13 BST Subject: summary of concurrent semantics Message-ID: <9110052310.AA15255@dcs.sheffield.ac.uk> ((This message is just over two pages of A4 long)) A very brief (incomplete) summary of concurrent semantics --------------------------------------------------------- (This description reflects my personal bias towards trace models; I apologise in advance to anyone who feels I've given an unbalanced account of the field.) You will recall Russell's demonstration that mathematics early this century was built on very dodgy ground. The search was on, and still is, for a formal theory of mathematics itself - why is it sensible to discuss some sets but not others? This purely mathematical problem led directly to many aspects of computer science that are now taken for granted. For example, Skolem (c. 1934) realised that the derivation of Russell's paradox could be avoided by introducing the notion of definition-by-recursion. Meanwhile, Church was developing the lambda-calculus, Post was working on his production systems, and Turing was introducing his machine models and computational AI. As a result, there is a wealth of structure available for discussing the underlying nature of computational processes themselves. This is essential in some cases. For example, we need to ensure that the code we produce will generate the same behaviour when compiled on two different systems; consequently, we need some way of describing the semantics of this code (i.e. what it's supposed to mean) which is machine-independent. There are several approaches to this problem, with perhaps the most mathematical being 'denotational semantics', under which all programs can be regarded as functional - a program becomes a function which maps abstract 'inputs' to abstract 'outputs'. For concurrent systems, this 'functional' view is insufficient. A standard example concerns the use of shared variables: from a purely sequential point of view, the two programs

    prog1: x=0; x++; x++
    prog2: x=0; x+=2

are identical, since they implement the same overall function. From the concurrent point of view, they are NOT identical, because they can interact with a third process in different ways. For example, if we run first prog1 and then prog2 in the context of

    prog3: x=10;

then the possible values of x on termination of the combined systems are different

    prog1 | prog3 : 2, 10, 11, 12, error
    prog2 | prog3 : 2, 10, 12, error

depending on precisely when prog3 gets executed. Accordingly, much of concurrent semantics is based on the idea that processes should be regarded as active agents which interact with each other. For example, we would reject the notion that the variable x is just a passive entity which is operated upon; instead it becomes an agent in its own right, which interacts with the processes that update it. Many solutions to the problem of correctly representing the semantics of concurrent systems have been developed, and can be roughly divided into two 'schools' - interleaving and non-interleaving. According to the interleaving version, the sequences of activities that might be performed by two systems running concurrently are just the interleavings of the sequences for the systems taken individually. This is the approach adopted in (the standard theories of) CCS and CSP.
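The outcome sets above can be checked mechanically by enumerating the statement-level interleavings. The following C sketch is only an illustration of the idea: the encoding of the statements as atomic operations is an assumption, and at this granularity every interleaving assigns x, so the finer-grained "error" outcome cannot appear.

#include <stdio.h>

typedef enum { SET0, INC, ADD2, SET10 } Op;

int apply(int x, Op op)
{
    switch (op) {
    case SET0:  return 0;        /* x = 0   */
    case INC:   return x + 1;    /* x++     */
    case ADD2:  return x + 2;    /* x += 2  */
    case SET10: return 10;       /* x = 10  */
    }
    return x;
}

/* Enumerate every interleaving of the two statement lists a and b,
   printing the final value of x reached by each one. */
void interleave(const Op *a, int na, const Op *b, int nb, int x)
{
    if (na == 0 && nb == 0) { printf("%d ", x); return; }
    if (na > 0) interleave(a + 1, na - 1, b, nb, apply(x, a[0]));
    if (nb > 0) interleave(a, na, b + 1, nb - 1, apply(x, b[0]));
}

int main(void)
{
    Op prog1[] = { SET0, INC, INC };   /* x=0; x++; x++ */
    Op prog2[] = { SET0, ADD2 };       /* x=0; x+=2     */
    Op prog3[] = { SET10 };            /* x=10          */

    /* The initial value 0 is arbitrary: every interleaving assigns x. */
    printf("prog1 | prog3 : ");
    interleave(prog1, 3, prog3, 1, 0);
    printf("\nprog2 | prog3 : ");
    interleave(prog2, 2, prog3, 1, 0);
    printf("\n");
    return 0;
}

Compiled and run, it prints one final value per interleaving: 10 11 12 2 for prog1 | prog3 and 10 12 2 for prog2 | prog3, in agreement with the sets quoted above (minus the finer-grained error case).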
The non-interleaving school argues that this representation is inappropriate, and indeed unnecessary, since models of 'true' concurrency are easy to develop (e.g. Petri nets). In the middle ground, there are models such as 'Mazurkiewicz trace theory' which consider the behaviour of a concurrent system to be represented by the collection of ALL its possible action-sequences (rather than accepting the notion that any one of these traces will do as a valid representation). Nor is this a complete list of the approaches used; for example, there is a growing tendency to use models based on category theory and general topology, but I can't reasonably include these in a short summary (besides, I don't know enough about them to represent them accurately). The key differences between the different approaches are in the way they treat the relationship between time and causality. Given that we are trying to describe a system based on the possible observations of its behaviour, we have to be careful when we impute relationships that may not exist. It may just happen, for example, that one event in a system is always followed by another - but this doesn't mean that they are causally related. Sometimes this doesn't matter, but problems can arise when we introduce additional processes with which to interact. It becomes very difficult to work out precisely how the models of individual processes should be 'stuck together' to get a valid model of the combined system. Presumably this problem is reflected in difficulties faced by connectionists in deciding what happens when large nets are considered to be made up of more manageable sub-nets. Do you have a general theory yet for deciding * what process is computed by a given net ? * what process is computed by a given combination of smaller nets ? If not, perhaps our two different disciplines could benefit from talking to one another. Some sources ============ Probably the best sources for results in semantics and concurrency are the many volumes of the "Lecture Notes in Computer Science" series from Springer-Verlag. In addition, CCS: The standard text is Robin Milner 1989 Communication and Concurrency Prentice-Hall International CSP: The standard text is C.A.R. (Tony) Hoare 1985 Communicating Sequential Processes Prentice Hall International A good collection of papers that demonstrates the relationships between the many approaches to concurrent semantics is Kwiatkowska M.Z., Shields M.W, and Thomas R.M. (eds) Semantics for concurrency, Leicester 1990 BCS/Springer 'Workshops in Computing' ISBN 3-540-19625-0 I've also got a couple of recent tech. reports concerning generalisations of trace theory for those who want them, but be warned that these are of a highly technical nature, and may not be of much relevance to you just yet. These are Kwiatkowska M.Z. and Stannett M. On transfinite traces CS-91-06 Stannett M. Trace convergence over infinite alphabets CS-91-08 Best wishes, Mike Stannett. From rba at vintage.bellcore.com Mon Oct 7 15:23:15 1991 From: rba at vintage.bellcore.com (Bob Allen) Date: Mon, 7 Oct 91 15:23:15 -0400 Subject: No subject Message-ID: <9110071923.AA12445@vintage.bellcore.com> Subject: Student Travel Grants for NIPS'91 Modest financial support for travel to the Neural Information Processing Systems (NIPS, Denver Dec 2-5, 1991) conference is available to students and other young researchers who are active in neural networks research. 
Those requesting support should send a one-page summary of their background and research interests, a cirriculum vitae, and their email address to: Dr. R.B. Allen NIPS Treasurer Bellcore MRE 2A-367 445 South Street Morristown, NJ 07960-1910 Travel grant check for those receiving awards will be available at the conference registration desk. From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Tue Oct 8 13:00:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Tue, 8 Oct 91 13:00 U Subject: Thank's Message-ID: <01GBHGORN4ZKD7Q01U@BITNET.CC.CMU.EDU> From aboulang%BBN.COM at CARNEGIE.BITNET Sun Oct 6 11:40:36 1991 From: aboulang%BBN.COM at CARNEGIE.BITNET (aboulang%BBN.COM@CARNEGIE.BITNET) Date: Sun, 6 Oct 91 11:40:36 EDT Subject: Detailed Balance In-Reply-To: 7923509%TWNCTU01.BITNET@bitnet.cc.cmu.edu's message of Sun, 6 Oct 91 09:31 U <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> Message-ID: <01GBFAUK6QK0D7QISN@BITNET.CC.CMU.EDU> We really should be saying that detailed balance in sampling implies a Gibbs distribution, but that the Gibbs distribution does not imply the use of a sampling procedure with detailed balance. There is some new work on this: J. Marroquin & A. Ramerez "Stochastic Cellular Automata with Gibbsian Invariant Measures" IEEE Trans Information Theory May(*), 1991 * I can't find the paper so I may have the month wrong. This is potentially good news to people trying to get annealing-type algorithms to work for fine-grained MIMD parallelism. Regrads, Albert Boulanger aboulanger at bbn.com From PAR%DM0MPI11.BITNET at BITNET.CC.CMU.EDU Tue Oct 8 12:03:11 1991 From: PAR%DM0MPI11.BITNET at BITNET.CC.CMU.EDU (Pal Ribarics) Date: Tue, 08 Oct 91 16:03:11 GMT Subject: NN Workshop Message-ID: <01GBI4QH7740D7POVG@BITNET.CC.CMU.EDU> ******************************************************************************* Dear Colleague , we would like to remind you of the deadline for sending abstracts to the topical workshop on Neural Networks within the Second International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics Talks will be selected by the Organizing Committee on the basis of a detailed abstract to be submitted before: 15 October, 1991. to the address below. You will also find a registration form which was sent to you in a prior mail. Best regards B. Denby C. Kiesling C. Peterson P. Ribarics ======================================================================== SECOND INTERNATIONAL WORKSHOP ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS FOR HIGH ENERGY AND NUCLEAR PHYSICS 1992 January 13 - 18 L'AGELONDE FRANCE-TELECOM LA LONDE LES MAURES BP 64 F-83250 REGISTRATION NAME: FIRSTNAME: LABORATORY: COUNTRY ADDRESS: TEL: FAX: TELEX: E-MAIL: HOTEL RESERVATION (Number of persons): In the following you are expected to answer with the corresponding number or character from the list above. However if your interest is not mentioned in the list give a full description. WOULD YOU BE INTERESTED TO JOIN A WORKING GROUP OF THE ASTEC PROJECT ? YES/NO GROUP: SUBGROUP: WOULD YOU LIKE TO ATTEND TOPICAL WORKSHOPS OR TUTORIALS ? WORKSHOPS: TUTORIALS: WOULD YOU LIKE TO PRESENT A TALK ? YES/NO TALK TITLE: To be considered by the organizing committee, send an extended abstract before Oct. 15, 1991 to: Michele Jouhet Marie-claude Fert CERN L.A.P.P. - IN2P3 PPE-ADM B.P. 
110 CH-1211 Geneve 23 F-74941 Annecy-Le-Vieux SWITZERLAND FRANCE Tel: (41) 22 767 21 23 Tel: (33) 50 23 32 45 Fax: (41) 22 767 65 55 Fax: (33) 50 27 94 95 Telex: 419 000 Telex: 385 180 F E-mail: jouhet at CERNVM Workshop fee : 700 FFr. Student : 500 FFr. Accommodation : 2000 FFr. Accompanying Person: +1200 FFr. To be paid by check: Title: International Workshop CREDIT LYONNAIS/Agence Internationale Bank: 30002 Guichet: 1000 Account: 909154 V Address: LYON REPUBLIQUE The accommodation includes: hotel room, breakfast, lunch and dinner for 6 days. Tennis, mountain bike and other activities will be available. Denis Perret-Gallix Tel: (41) 22 767 62 93 E-mail: Perretg at CERNVM Fax: (41) 22 782 89 23
From squires at cs.wisc.edu Wed Oct 9 03:22:37 1991 From: squires at cs.wisc.edu (Charles Squires) Date: Wed, 9 Oct 91 02:22:37 -0500 Subject: 3 reports available Message-ID: <9110090722.AA17071@mozzarella.cs.wisc.edu> *** PLEASE DO NOT FORWARD TO OTHER LISTS *** The following three working papers have been placed in the neuroprose archive: -Maclin, R. and Shavlik, J.W., Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding, Machine Learning Research Group Working Paper 91-2. Neuroprose file name: maclin.fskbann.ps.Z -Scott, G.M., Shavlik, J.W., and Ray, W.H., Refining PID Controllers using Neural Networks, Machine Learning Research Group Working Paper 91-3. Neuroprose file name: scott.nnpid.ps.Z -Towell, G.G. and Shavlik, J.W., The Extraction of Refined Rules from Knowledge-Based Neural Networks, Machine Learning Research Group Working Paper 91-4. Neuroprose file name: towell.interpretation.ps.Z The abstract of each paper and ftp instructions follow: ---------- Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding Richard Maclin Jude W. Shavlik Computer Sciences Dept. University of Wisconsin - Madison email: maclin at cs.wisc.edu We describe a method for using machine learning to refine algorithms represented as generalized finite-state automata. The knowledge in an automaton is translated into a corresponding artificial neural network, and then refined by applying backpropagation to a set of examples. Our technique for translating an automaton into a network extends the KBANN algorithm, a system that translates a set of propositional, non-recursive rules into a corresponding neural network. The topology and weights of the neural network are set by KBANN so that the network represents the knowledge in the rules. We present the extended system, FSKBANN, which augments the KBANN algorithm to handle finite-state automata. We employ FSKBANN to refine the Chou-Fasman algorithm, a method for predicting how globular proteins fold. The Chou-Fasman algorithm cannot be elegantly formalized using non-recursive rules, but can be concisely described as a finite-state automaton. Empirical evidence shows that the refined algorithm FSKBANN produces is statistically significantly more accurate than both the original Chou-Fasman algorithm and a neural network trained using the standard approach. We also provide extensive statistics on the type of errors each of the three approaches makes and discuss the need for better definitions of solution quality for the protein-folding problem. ---------- Refining PID Controllers using Neural Networks Gary M. Scott (Chemical Engineering) Jude W. Shavlik (Computer Sciences) W.
Harmon Ray (Chemical Engineering) University of Wisconsin The KBANN (Knowledge-Based Artificial Neural Networks) approach uses neural networks to refine knowledge that can be written in the form of simple propositional rules. We extend this idea further by presenting the MANNCON (Multivariable Artificial Neural Network Control) algorithm by which the mathematical equations governing a PID (Proportional-Integral-Derivative) controller determine the topology and initial weights of a network, which is further trained using backpropagation. We apply this method to the task of controlling the outflow and temperature of a water tank, producing statistically-significant gains in accuracy over both a standard neural network approach and a non-learning PID controller. Furthermore, using the PID knowledge to initialize the weights of the network produces statistically less variation in testset accuracy when compared to networks initialized with small random numbers. ---------- The Extraction of Refined Rules from Knowledge-Based Neural Networks Geoffrey G. Towell Jude W. Shavlik Department of Computer Science University of Wisconsin E-mail Address: towell at cs.wisc.edu Neural networks, despite their empirically-proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge in some form must be inserted into a neural network. Second, the network must be refined. Third, knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. In this paper, we propose and empirically evaluate a method for the final, and possibly most difficult, step. This method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules: (1) closely reproduce (and can even exceed) the accuracy of the network from which they are extracted; (2) are superior to the rules produced by methods that directly refine symbolic rules; (3) are superior to those produced by previous techniques for extracting rules from trained neural networks; (4) are ``human comprehensible.'' Thus, the method demonstrates that neural networks can be an effective tool for the refinement of symbolic knowledge. Moreover, the rule-extraction technique developed herein contributes to the understanding of how symbolic and connectionist approaches to artificial intelligence can be profitably integrated. ---------- FTP Instructions: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclin.fskbann.ps.Z OR... get scott.nnpid.ps.Z OR... get towell.interpretation.ps.Z ftp> quit unix> uncompress maclin.fskbann.ps.Z OR... uncompress scott.nnpid.ps.Z OR... uncompress towell.interpretation.ps.Z unix> lpr maclin.fskbann.ps OR... lpr scott.nnpid.ps OR... lpr towell.interpretation.ps (or use whatever command you use to print PostScript)
From danielg at cogs.sussex.ac.uk Wed Oct 9 07:07:27 1991 From: danielg at cogs.sussex.ac.uk (Daniel Glaser) Date: Wed, 9 Oct 91 12:07:27 +0100 Subject: Restrictions on recurrent learning Message-ID: <29747.9110091107@rsunx.cogs.susx.ac.uk> I have been working on some simple recurrent networks as defined by Jordan(1986) and Elman(1990), and am interested in the class of temporal regularities that they can learn.
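For concreteness, the structural difference between the two architectures can be sketched as follows; the layer sizes, tanh units and the Jordan decay factor are illustrative assumptions, not the exact formulations of the papers cited below.

#include <math.h>
#include <string.h>

#define NIN  2
#define NHID 4
#define NOUT 1
#define NCTX NHID   /* Elman: one context unit per hidden unit.          */
                    /* A pure Jordan net would instead use NCTX == NOUT. */

typedef struct {
    double wih[NHID][NIN];    /* input   -> hidden weights        */
    double wch[NHID][NCTX];   /* context -> hidden weights        */
    double who[NOUT][NHID];   /* hidden  -> output weights        */
    double ctx[NCTX];         /* state carried between time steps */
} SRN;

/* One forward time step; the only structural difference between the two
   architectures is what gets copied into the context units afterwards. */
void srn_step(SRN *n, const double x[NIN], double y[NOUT], int jordan)
{
    double h[NHID];
    for (int j = 0; j < NHID; j++) {
        double s = 0.0;
        for (int i = 0; i < NIN; i++)  s += n->wih[j][i] * x[i];
        for (int c = 0; c < NCTX; c++) s += n->wch[j][c] * n->ctx[c];
        h[j] = tanh(s);
    }
    for (int o = 0; o < NOUT; o++) {
        double s = 0.0;
        for (int j = 0; j < NHID; j++) s += n->who[o][j] * h[j];
        y[o] = tanh(s);
    }
    if (jordan) {
        /* Jordan-style feedback: copy the *outputs* back, here with an
           assumed decay of 0.5 on the previous context value. */
        for (int o = 0; o < NOUT; o++) n->ctx[o] = 0.5 * n->ctx[o] + y[o];
    } else {
        /* Elman-style feedback: copy the *hidden* activations back verbatim. */
        memcpy(n->ctx, h, sizeof h);
    }
}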
In particular, how do they compare with more general back propagation through time defined by the PDP group(1986) and Werbos(1990) ? In the Jordan/Elman nets, activation flows forward in time from `copies' of units from previous cycles, and thus, during learning, error only propagates backwards locally in time. Does anyone know of any theoretical or empirical work on what these different types of network can learn ? If replies are addressed to me personally, I will post a summary in due course. Thanks Daniel. References: Elman, J.~L. (1990). Finding structure in time. {\em Cognitive Science}, {\bf 14}:179--211. Jordan, M.~I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In {\em Proceedings of the Eighth Annual Meeting of the Cognitive Science Society}, Hillsdale, NJ. Erlbaum. Rumelhart, D.~E., McClelland, J.~L., \& Williams, R.~J. (1986). Learning internal representations by error propagation. In D.~E. Rumelhart \& J.~L. McClelland (Eds.), {\em Parallel Distributed Processing: Explorations in the Microstructure of Cognition}, volume~1 chapter~8. Cambridge, MA: MIT Press/Bradford Books. Werbos, P.~J. (1990). Backpropagation through time: What it does and how to do it. {\em Proceedings of the IEEE}, 78(10):1550--1560. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Wed Oct 9 14:05:33 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Wed, 09 Oct 91 14:05:33 -0400 Subject: Recurrent Cascade-Correlation Code Message-ID: Simulation code for the Recurrent Cascade-Correlation (RCC) algorithm, previously available only in Common Lisp, has now been translated into C by Conor Doherty of the University College of Dublin (Ireland). This code is a modification of the C program for original Cascade-Correlation, written by Scott Crowder of Carnegie Mellon. My thanks to Conor and Scott for their help in making these programs available to the barbarian hordes who speak only C. For a description of this algorithm, see Scott E. Fahlman, "The Recurrent Cascade-Correlation Architecture" in Advances in Neural Information Processing Systems 3, edited by R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Morgan Kaufmann Publishers, 1991. Alternatively, see the tech report mentioned below. The instructions for accessing any of this code via FTP are included at the end of this message. Scott E. Fahlman School of Computer Science Carnegie Mellon University =========================================================================== Public-domain simulation programs for the Quickprop, Cascade-Correlation, and Recurrent Cascade-Correlation learning algorithms are available via anonymous FTP on the Internet. This code is distributed without charge on an "as is" basis. There is no warranty of any kind by the authors or by Carnegie-Mellon University. Instructions for obtaining the code via FTP are included below. If you can't get it by FTP, contact me by E-mail (sef+ at cs.cmu.edu) and I'll try *once* to mail it to you. Specify whether you want the C or Lisp version. If it bounces or your mailer rejects such a large message, I don't have time to try a lot of other delivery methods. I am maintaining an E-mail list of people using this code so that I can notify them of any changes or problems that occur. I would appreciate hearing about any interesting applications of this code, and will try to help with any problems people run into. 
Of course, if the code is incorporated into any products or larger systems, I would appreciate an acknowledgement of where it came from. If for some reason these programs do not work for you, please contact me and I'll try to help. Common errors: (1) Some people don't notice that the symmetric sigmoid output units in cascor have a range of -0.5 to +0.5 (for reasons that are mostly historical). If you try to force this algorithm to produce an output of +1.0 or +37.3, it isn't going to work. (2) Note that quickprop (which is used inside of Cascade-Correlation) is designed to update the weights after every epoch, and it assumes that all the epochs are identical. If you try to run this code updating after every training case, you will lose badly. If you want to change the training set, it is important to zero out the PREV-SLOPES and DELTAS vectors, and also to re=build the caches in Cascade-Correlation. HOW TO GET IT: For people (at CMU, MIT, and soon some other places) with access to the Andrew File System (AFS), you can access the files directly from directory "/afs/cs.cmu.edu/project/connect/code". This file system uses the same syntactic conventions as BSD Unix: case sensitive names, slashes for subdirectories, no version numbers, etc. The protection scheme is a bit different, but that shouldn't matter to people just trying to read these files. For people accessing these files via FTP: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu". The internet address of this machine is 128.2.254.155, for those who need it. 2. Log in as user "anonymous" with your own ID as password. You may see an error message that says "filenames may not have /.. in them" or something like that. Just ignore it. 3. Change remote directory to "/afs/cs/project/connect/code". NOTE: you must do this in a single operation. 4. At this point FTP should be able to get a listing of files in this directory with DIR and fetch the ones you want with GET. (The exact FTP commands you use depend on your local FTP server.) Current contents: quickprop1.lisp Original Common Lisp version of Quickprop. quickprop1.c C version by Terry Regier, U. Cal. Berkeley. cascor1.lisp Original Common Lisp version of Cascade-Correlation. cascor1.c C version by Scott Crowder, Carnegie Mellon rcc1.lisp Common Lisp version of Recurrent Cascade-Correlation. rcc1.c C version, trans. by Conor Doherty, Univ. Coll. Dublin vowel.c Code for Tony Robinson's vowel benchmark. am4.tar.Z Aspirin/Migraine code from MITRE. backprop.lisp Overlay for quickprop1.lisp. Turns it into backprop. --------------------------------------------------------------------------- Tech reports describing these algorithms can also be obtained via FTP. These are Postscript files, processed with the Unix compress/uncompress program. Follow the steps for FTP access as above, but cd to directory unix> ftp pt.cs.cmu.edu (or 128.2.254.155) Name: anonymous Password: ftp> cd /afs/cs/project/connect/tr ftp> binary ftp> get filename.ps.Z ftp> quit unix> uncompress filename.ps.Z unix> lpr filename.ps (or however you print postscript files) For "filename", sustitute the following: cascor-tr Cascade-Correlation paper. qp-tr Paper on Quickprop and other backprop speedups. rcc-tr Recurrent Cascade-Correlation paper. precision Hoehfeld-Fahlman paper on Cascade-Correlation with limited numerical precision. 
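The per-epoch update convention flagged in the common errors above can be summarised with a simplified sketch. This is not the distributed code: the EPSILON and MU values are assumed, and the handling of slope sign changes and the weight-decay term of the real rule are omitted (see the qp-tr report or quickprop1.c for the actual algorithm).

#include <math.h>

#define EPSILON 0.55   /* plain gradient-descent contribution (assumed value) */
#define MU      1.75   /* maximum growth factor (assumed value)               */

/* slope[i]      : dE/dw_i accumulated over the *whole* epoch
   prev_slope[i] : the same quantity from the previous epoch
   delta[i]      : the step taken on the previous epoch
   Called once per epoch; per-pattern calls would violate the assumptions
   mentioned in the message above. */
void quickprop_update(double *w, double *slope, double *prev_slope,
                      double *delta, int n)
{
    for (int i = 0; i < n; i++) {
        double step;
        if (delta[i] != 0.0) {
            /* Jump toward the minimum of the parabola implied by the two
               slopes, but never grow the step by more than a factor of MU. */
            step = slope[i] / (prev_slope[i] - slope[i]) * delta[i];
            if (fabs(step) > MU * fabs(delta[i]))
                step = (step > 0 ? 1.0 : -1.0) * MU * fabs(delta[i]);
        } else {
            /* No previous step: fall back to ordinary gradient descent. */
            step = -EPSILON * slope[i];
        }
        w[i]         += step;
        delta[i]      = step;
        prev_slope[i] = slope[i];
        slope[i]      = 0.0;   /* re-accumulate from zero on the next epoch */
    }
}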
From B344DSL at UTARLG.UTA.EDU Wed Oct 9 23:55:00 1991 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Wed, 9 Oct 1991 22:55 CDT Subject: Announcement and call for abstracts for Feb. conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu> ANNOUNCEMENT AND CALL FOR ABSTRACTS WORKSHOP ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the Texas SIG of the International Neural Network Society (INNS). To be held at a location to be announced in the Dallas-Fort Worth area, Thursday through Saturday, February 6-8, 1992. Confirmed speakers include: Stephen Grossberg (Boston University) Stephen Hampson (University of California, Irvine) Karl Pribram (Radford University) Harold Szu (Naval Surface Warfare Center) Graham Tattersall (University of East Anglia) The focus of this conference will be twofold: (1) how to optimize different aspects of neural and cognitive function and (2) whether particular natural or artificial solutions to specific neural or cognitive problems are in fact optimal. Specific problems to which these optimality considerations are applied will be taken from many areas including goal direction and planning, adaptive categorization, sensory perception, and motor control. The talks will be an hour each for invited speakers and 45 minutes each for contributed speakers, with time afterwards for questions. Speakers will not be required to write a paper, but will be invited to contribute chapters to a book several months after the conference. Books based on two previous MIND conferences -- on Motivation, Emotion, and Goal Direction in Neural Networks and Neural Networks for Knowledge Representation and Inference -- are now being published by Lawrence Erlbaum Associates. Registration for the conference will be $80 for non-students, $20 for students, with a $10 rebate for MIND or Texas SIG membership. We will try to arrange for discounted air fares from American Airlines as we have done in the past. Those interested in presenting should send me a short (1-3 paragraph) abstract by December 1, 1991, using either e-mail, FAX, or snail mail. Notification of acceptance will be given December 15, 1991. We will not be holding parallel sessions, so there are limitations on the number of speakers. However, individuals who send high-quality abstracts that cannot be accommodated in actual talks will have space to present their work in posters at the conference, and will also be invited to contribute to the book. Prof. Daniel S. Levine Department of Mathematics University of Texas at Arlington Arlington, TX 76019-0408 e-mail: b344dsl at utarlg.uta.edu FAX: 817-794-5802 Telephone: 817-273-3598
From bessiere at imag.fr Thu Oct 10 12:48:37 1991 From: bessiere at imag.fr (Pierre Bessiere) Date: Thu, 10 Oct 1991 17:48:37 +0100 Subject: 4 reports available Message-ID: <9110101648.AA09388@imag.imag.fr> The following four papers/reports have been placed in the neuroprose archive: - Bessiere, P.; "Toward a synthetic cognitive paradigm: Probabilistic Inference"; Conference COGNITIVA90, Madrid, Spain, 1990 Neuroprose file name: bessiere.cognitiva90.ps.Z - Talbi, E-G. & Bessiere, P.; "A parallel genetic algorithm for the graph partitioning problem"; ACM-ICS91 (Conference on Super Computing), Cologne, Germany, 1991 Neuroprose file name: bessiere.acm-ics91.ps.Z - Bessiere, P., Chams, A.
& Muntean, T.; "A virtual machine model for artificial neural network programming"; INNC90 (International Neural Networks Conference), Paris, France, 1990 Neuroprose file name: bessiere.innc90.ps.Z - Bessiere, P., Chams, A. & Chol, P.; "MENTAL: a virtual machine approach to artificial neural networks programming"; ESPRIT B.R.A. project NERVES (3049), final report, 1991 The abstract of each paper and ftp instructions follow: ---------- TOWARD A SYNTHETIC COGNITIVE PARADIGM: PROBABILISTIC INFERENCE Cognitive science is a very active field of scientific interest. It turns out to be a "melting pot" of ideas coming from very different areas. One of the principal hopes is that some synthetic cognitive paradigms will emerge from this interdisciplinary "brain storming". The goal of this paper is to answer the question: "Given the state of the art, are there any hints indicating the emergence of such synthetic paradigms?" The main thesis of the paper is that there is a good candidate, namely, the probabilistic inference paradigm. In support of the above thesis the structure of the paper is as follows: - in a first part, we identify five criteria to qualify as a synthetic cognitive paradigm (validity, self consistency, competence, feasibility and mimetic power); - in the second paragraph, the principles of probabilistic inference are reviewed and justifications of validity and self consistency of this paradigm are given (Marr's computational level); - then, the competence criterion is discussed, considering the efficiency of probabilistic inference for dealing with the different classical cognitive riddles and analyzing the relationships of probabilistic inference with several of the usual connexionist formalisms (Marr's algorithmic level); - the criteria of feasibility (condition of computer implementation) and mimetic power (adequation with what is known of the architecture of the nervous system) are finally considered in the fourth part (Marr's implementation level). As a conclusion, it will appear that probabilistic inference is at least a very interesting framework to get a synthetic overview of a number of works in the area and to identify and formalize the most puzzling questions. Some of these questions will be listed. In fact, probabilistic inference will appear finally to be able to play the same role for computational cognitive science that formal logic has played for classical symbolic Artificial Intelligence: a sound mathematical foundation serving as a guide line, as a constant reference and as a source of inspiration. ---------- A PARALLEL GENETIC ALGORITHM FOR THE GRAPH PARTITIONING PROBLEM Genetic algorithms are stochastic search and optimization techniques which can be used for a wide range of applications. This paper addresses the application of genetic algorithms to the graph partitioning problem. Standard genetic algorithms with large populations suffer from lack of efficiency (quite high execution time). A massively parallel genetic algorithm is proposed, an implementation on a SuperNode of Transputers and results of various benchmarks are given. The parallel algorithm shows a superlinear speed-up, in the sense that when multiplying the number of processors by p, the time spent to reach a solution with a given score is divided by kp (k>1). A comparative analysis of our approach with hill-climbing algorithms and simulated annealing is also presented.
The experimental measures show that our algorithm gives better results concerning both the quality of the solution and the time needed to reach it. ---------- A VIRTUAL MACHINE MODEL FOR ARTIFICIAL NEURAL NETWORK PROGRAMMING This paper introduces the model of a virtual machine for A.N.N. (Artificial Neural Networks). The context of this work is a collaborative project to study new V.L.S.I. implementations and new architectures for neuronal machines. The work consists in the specification and a prototype implementation of a description language for A.N.N., of the associated virtual machine, of the compiler between them and of the compilers mapping the virtual machine on different highly parallel computers. In this short paper we present the virtual machine model, which combines the features of various parallel programming paradigms. Our model allows, in particular, the same A.N.N. program to run on both synchronous and asynchronous types of machine. In this framework a parallel architecture (S.M.A.R.T.) and a dynamically reconfigurable parallel machine of Transputers (SuperNode) are considered as target machines. ---------- MENTAL: A VIRTUAL MACHINE APPROACH TO ARTIFICIAL NEURAL NETWORKS PROGRAMMING (ATTENTION: 100 pages) This report treats (extensively) the same subject as the short paper described just above. Some parts are extracted from the three previously presented papers. ---------- These reports may be obtained by FTP from either the neuroprose archives or from my own server (IMAG): How to get files from the Neuroprose archives? ______________________________________________ Anonymous ftp on: - archive.cis.ohio-state.edu (128.146.8.52) mymachine>ftp archive.cis.ohio-state.edu Name: anonymous Password: yourname at youraddress ftp>cd pub/neuroprose ftp>binary ftp>get bessiere.foo.ps.Z ftp>quit mymachine>uncompress bessiere.foo.ps.Z How to get files from IMAG? ___________________________ Anonymous ftp on: - 129.88.32.1 mymachine>ftp 129.88.32.1 Name: anonymous Password: yourname at youraddress ftp>cd pub/SYMPA/NNandGA ftp>binary ftp>get bessiere.foo.ps.Z ftp>quit mymachine>uncompress bessiere.foo.ps.Z -- Pierre BESSIERE *************** IMAG/LGI phone: BP 53X Work: 33/76.51.45.72 38041 Grenoble Cedex Home: 33/76.51.16.15 FRANCE Fax: 33/76.44.66.75 Telex:UJF 980 134 F E-Mail: bessiere at imag.imag.fr It is to the modern scientist, more than to anyone else, that Kipling's austere advice applies: "If you can watch the work of your life suddenly collapse and set yourself back to work, if you can suffer, struggle and die without complaint, you will be a man, my son." Only in the work of science can one love what one destroys, continue the past by denying it, and venerate one's master by contradicting him. GASTON BACHELARD
From gary at cs.UCSD.EDU Thu Oct 10 13:26:04 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Thu, 10 Oct 91 10:26:04 PDT Subject: Restrictions on recurrent learning Message-ID: <9110101726.AA24233@desi.ucsd.edu> Fu-Sheng Tsung and I showed there were problems that a hidden-recurrent (Elman-style) net can learn that an output-recurrent Jordan net can't in our 1989 paper in IJCNN: Tsung, Fu-Sheng and Cottrell, G. (1989) A sequential adder using recurrent networks. In \fIProceedings of the International Joint Conference on Neural Networks\fP, Washington, D.C. A similar paper with some state space analysis is in: Cottrell, G. and Fu-sheng Tsung (1991). Learning simple arithmetic procedures. In J.A. Barnden & J.B.
Pollack (Eds), \fIAdvances in connectionist and neural computation theory, Vol 1: High-level connectionist models\fP, Norwood: Ablex. There are simple logical arguments that show that hidden-recurrent nets are more powerful than output-recurrent nets. The bottom line is that if there is a problem where the teaching signal forces "forgetting" of the input, then a Jordan-style output-recurrent network cannot respond to things that require remembering it. Hal White also believes Elman nets are strictly more powerful than Jordan nets, but I'm not sure he has a proof. gary cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at cs.ucsd.edu (INTERNET) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET) gcottrell at ucsd.edu (BITNET) From ECONEC at vax.oxford.ac.uk Fri Oct 11 11:39:00 1991 From: ECONEC at vax.oxford.ac.uk (ECONEC@vax.oxford.ac.uk) Date: Fri, 11 Oct 91 11:39 BST Subject: REQUEST FOR INFORMATION: NNs AND ECONOMICS Message-ID: REQUEST FOR INFORMATION I am studying for an MLitt/DPhil at the Oxford University and would be very grateful for some help. This message is being transmitted to several relevant lists and please feel free to forward it to anyone who might be interested. Apologies in advance to anyone who gets fed up with seeing it! 1) REQUEST: I am interested in references and names for work broadly in the area of AI techniques applied to economics. To narrow this down, I am interested in AI as a tool for developing alternative models of economic behaviour than the traditional view of man as a perfectly informed calculating machine! Because of the behavioural aspect and my preference for economic theory I am hoping to avoid work that simply uses AI techniques to solve traditional models faster. (GAs as function optimisers for instance.) Similarly I am not seeking information on decision support or Expert Systems unless they make some attempt (or claim) to emulate human decision making behaviour. (Default Logics? Frames?) Please err on the side of completeness! 2) OFFER: Obviously I can provide summaries of my findings to various lists in the usual way. (Perhaps you could say where you saw my post so I can keep the summaries relevant to each list.) What I would also like to do is find out whether there is any interest in an adhoc email list of people working in this area. Or if there is one already I would very much like to hear about it. I'm sure such things have been going for years in the US but information here in the UK seems very sparse. I would be quite happy to "maintain" an unofficial bulletin board or mailing list if one does not exist. Many thanks in advance for any help and please feel free to contact me on any aspect of this posting. Edmund Chattoe SNAIL: LADY MARGARET HALL OXFORD OXON OX2 6QA From lyn at dcs.exeter.ac.uk Fri Oct 11 13:56:49 1991 From: lyn at dcs.exeter.ac.uk (Lyn Shackleton) Date: Fri, 11 Oct 91 13:56:49 BST Subject: special deal for Connection Science Message-ID: <11273.9110111256@castor.dcs.exeter.ac.uk> ********** CONNECTION SCIENCE SPECIAL ISSUE ****************** CONNECTIONIST MODELLING OF PSYCHOLOGICAL PROCESSES VOLUME 3.2 (out now) EDITOR Noel Sharkey SPECIAL BOARD Jim Anderson Andy Barto Thomas Bever Glyn Humphreys Walter Kintsch Dennis Norris Kim Plunkett Ronan Reilly Dave Rumelhart Antony Sanford CONTENTS J R Levenick:NAPS: a connectionist implementation of cognitive maps. A Pouget & S J Thorpe: Connectionist models of orientation identification. 
D R Shanks: A connectionist account of base-rate biases in categorization. A J O'Toole, K Deffenbacher, H Abdi & J Bartlett: Simulating the "Other-race effect" as a problem in perceptual learning. S Kaplan, M Sonntag & E Chown: Tracing recurrent activity of cognitive elements (TRACE): a model of temporal dynamics in a cell assembly. Research Notes: A H Kawamoto & S N Kitzis: Time course of regular and irregular pronunciations. A VERY SPECIAL DEAL FOR MEMBERS OF THE CONNECTIONISTS MAILING. Prices for members of this list will now be: North America 44 US Dollars (reduced from 126 dollars) Elsewhere and U.K. 22 pounds sterling. (Sterling checks must be drawn on a UK bank) These rates start from 1st January 1992 (volume 4). Conditions: 1. Personal use only (i.e. non-institutional). 2. Must subscribe from your private address. You can receive a subscription form by emailing direct to the publisher: email: carfax at ibmpcug.co.uk Say for the attention of David Green and say CONNECTIONISTS MAILING LIST. noel From mclennan at cs.utk.edu Fri Oct 11 17:43:22 1991 From: mclennan at cs.utk.edu (mclennan@cs.utk.edu) Date: Fri, 11 Oct 91 17:43:22 -0400 Subject: report: contin. symbol systems Message-ID: <9110112143.AA01451@maclennan.cs.utk.edu> ** Please do not forward to other boards. Thank you. ** The following technical report has been placed in the Neuroprose archives at Ohio State. Ftp instructions follow the abstract. N.B. The uncompressed file is long (1.82 MB), so you may have to use the -s (symbolic link) option on lpr to print it. ----------------------------------------------------- Continuous Symbol Systems The Logic of Connectionism Bruce MacLennan Computer Science Department University of Tennessee Knoxville, TN 37996 maclennan at cs.utk.edu Technical Report CS-91-145 ABSTRACT: It has been long assumed that knowledge and thought are most naturally represented as _discrete_symbol_systems_ (calculi). Thus a major contribution of connectionism is that it provides an alternative model of knowledge and cognition that avoids many of the limitations of the traditional approach. But what idea serves for connectionism the same unifying role that the idea of a calculus served for the traditional theories? We claim it is the idea of a _continuous_symbol_system_. This paper presents a preliminary formulation of continuous sym- bol systems and indicates how they may aid the understanding and development of connectionist theories. It begins with a brief phenomenological analysis of the discrete and continuous; the aim of this analysis is to directly contrast the two kinds of symbols systems and identify their distinguishing characteristics. Next, based on the phenomenological analysis and on other observations of existing continuous symbol systems and connectionist models, I sketch a mathematical characterization of these systems. Finally the paper turns to some applications of the theory and to its implications for knowledge representation and the theory of com- putation in a connectionist context. Specific problems addressed include decomposition of connectionist spaces, representation of recursive structures, properties of connectionist categories, and decidability in continuous formal systems. A preliminary version of this paper was presented at the workshop "Neural Networks for Knowledge Representation, Fourth Annual Workshop of the Metroplex Institute for Neural Dynamics (MIND)," Westlake TX, October 4-6, 1990. 
Also presented at "ConnectFest 1990," sponsored by Indiana University Center for Research in Concepts and Cognition, November 3-4, 1990. ----------------------------------------------------- FTP INSTRUCTIONS Either use "Getps maclennan.css.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclennan.css.ps.Z ftp> quit unix> uncompress maclennan.css.ps.Z unix> lpr -s maclennan.css.ps (or however you print postscript) Note that the postscript version is missing three (nonessential) figures that have been pasted into the hardcopy version. If you need hardcopy, then send your request to: library at cs.utk.edu Bruce MacLennan Department of Computer Science 107 Ayres Hall The University of Tennessee Knoxville, TN 37996-1301 (615)974-0994/5067 FAX: (615)974-4404 maclennan at cs.utk.edu From david at cns.edinburgh.ac.uk Sun Oct 13 17:20:34 1991 From: david at cns.edinburgh.ac.uk (David Willshaw) Date: Sun, 13 Oct 91 17:20:34 BST Subject: Computational Neureoscientist post Message-ID: <4519.9110131620@subnode.cns.ed.ac.uk> UNIVERSITY OF OXFORD MRC Centre in Brain and Behaviour The Medical Research Council has awarded a 7-year grant to establish a Research Centre in Brain and Behaviour, based at the University of Oxford, and also involving scientists from other universities including Birmingham, Cambridge, Durham, Edinburgh and London. The main theme of the Research Centre is the organisation, function, development and disorders of the cerebral cortex, and central to this theme is the exploration of the cortex as an instrument of computation. To this end, the Centre carries out research involving many different methodologies, in the areas of sensory systems, learning and memory, and motor control. Applications are invited for the post of Computational Neuroscientist to work on theoretical aspects of learning and memory. The post will be based at the University of Edinburgh, where the post-holder will be expected to spend 80% of his/her time. The remaining time will be spent in linking with complementary work being carried out by other participants of the Centre, particularly at the universities of Oxford and Cambridge. A range of projects is available, and prospective applicants are encouraged to discuss their plans with Dr David Willshaw of the University of Edinburgh. Two possibilities which are compatible with present work are: 1) Development of a model of the mammalian hippocampal formation as an associative memory; 2) Investigation of associative and error-correcting models of cerebellar function as implemented in a biologically realistic form. This appointment, which is available from January 1992 for 2 years in the first instance and potentially renewable for a further 4 years, will be made on the RS1A scale (currently 11,969-19,073 pounds p.a. with a discretionary scale rising to 21,391 pounds p.a.). Applications (including the name and address of two referees) should be sent to Ms Catherine Greasley, Administrative Secretary, MRC Research Centre in Brain and Behaviour, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD (telephone (0865) 271364 - mornings only) no later than Friday 8 November 1991. 
The University of Oxford is an Equal Opportunities Employer David Willshaw Centre for Cognitive Science 2 Buccleuch Place Edinburgh EH8 9LW UK Tel: (+44) 31 650 4404 Fax: (+44) 31 650 4587 Email: d.willshaw at edinburgh.ac.uk From harnad at Princeton.EDU Sun Oct 13 19:51:05 1991 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 13 Oct 91 19:51:05 EDT Subject: Newell's Unified Theories of Cognition: BBS Call for Book Reviewers Message-ID: <9110132351.AA08163@psycho> Below is the abstract of a book that will be accorded multiple book review in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Commentators must be current BBS Associates or nominated by a current BBS Associate. To be considered as a commentator on this book, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at clarity.princeton.edu or harnad at pucc.bitnet or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] To help us put together a balanced list of commentators, please give some indication of the aspects of the topic on which you would bring your areas of expertise to bear if you are selected as a commentator. ____________________________________________________________________ BBS Multiple Book Review of: UNIFIED THEORIES OF COGNITION (Harvard University Press, 1990) Allen Newell School of Computer Science Carnegie-Mellon University This book presents the case that cognitive science should turn its attention to developing theories of human cognition that cover the full range of human perceptual, cognitive, and action phenomena. Cognitive science has now produced a massive number of high quality regularities with many microtheories that reveal important mechanisms. The need for integration is pressing and will continue to increase. Equally important, cognitive science now has the theoretical concepts and tools to support serious attempts at unified theories. The argument is made entirely by presenting an exemplar unified theory of cognition both to show what a real unified theory would be like and to provide convincing evidence that such theories are feasible. The exemplar is Soar, a cognitive architecture realized as a software system. After a detailed discussion of the architecture and its properties, with its relation to the constraints on cognition in the real world and to existing ideas in cognitive science, Soar is used as a theory for a wide range of cognitive phenomena: immediate responses (stimulus-response compatibility and the Sternberg phenomena); discrete motor skills (transcription typing); memory and learning (episodic memory and the acquisition of skill through practice); problem solving (cryptarithmetic puzzles and syllogistic reasoning); language (sentence verification and taking instructions); and development (transitions in the balance beam task). The treatments vary in depth and adequacy, but they clearly reveal a single, highly specific, operational theory that works over the entire range of human cognition. Soar is presented as an exemplar unified theory, not as the sole candidate. Cognitive science is not ready yet for a single theory -- there must be multiple attempts. But cognitive science must begin to work towards such unified theories. From kamil at apple.com Mon Oct 14 19:41:34 1991 From: kamil at apple.com (Kamil A. 
Grajski) Date: Mon, 14 Oct 91 16:41:34 -0700 Subject: batch-mode parallel implementations Message-ID: <9110142341.AA19545@apple.com> Hi folks, In reviewing some implementations of back-prop type algorithms on parallel machines, it is apparent that several such implementations obtain their high performance because of batch-mode training. What this means is that one operates on N independent training patterns simultaneously and then collects all the weight update information and reestimates once per N samples. Example where this has been used (among others) are the GF-111, MasPar, CM-2, Warp (I think, at least for a self-org feature map implementation), etc. In many papers, I have read passing references to the fact that real-time learning is preferred (in practice) over the theoretically indicated batch-mode (so-called "true gradient") learning. Some of the arguments given include "faster" convergence and "better" generalization. Are the convergence and generalization arguments linked at some deeper level of analysis? (You could have fast convergence which generalizes poorly, etc.) I have played with this just a little bit on small speech and other datasets without reaching any conclusive results. I am wondering whether there have been some definitive studies, theoretical and/or practical which really confront this issue? How big an issue is this for people? For example, would you NOT look at a parallel design which assumes batch-mode training? Kamil P.S. If this is a dead issue and I missed the funeral, I apologize. ================ Kamil A. Grajski Apple Computer (408) 974-1313 kamil at apple.com ================ From B344DSL at UTARLG.UTA.EDU Mon Oct 14 14:14:00 1991 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Mon, 14 Oct 1991 13:14 CDT Subject: Announcement of talk by Pribram at Georgetown, Oct. 18 Message-ID: <01GBQK3RGXY80003LS@utarlg.uta.edu> From: IN%"PRUEITT at guvax.georgetown.edu" "Paul S. Prueitt" 14-OCT-1991 12:29:48.05 To: IN%"kpribram at ruacad.ac.runet.edu" "kpribram" CC: IN%"duziakm at isnet.inmos.com" "duziakm", IN%"pwerbos at note.nsf.gov" "pwerbos", IN%"liwu at aic.nrl.navy.mil" "liwu", IN%"kugler at rucs2.sunlab.cs.runet.edu" "kugler", IN%"medsker at AUVM.BITNET" "medsker", IN%"b344dsl at UTARLG.UTA.EDU" "b344dsl", IN%"prueitt Subj: Pribram's Talk on Friday From PRUEITT at guvax.georgetown.edu Mon Oct 14 14:15:00 1991 From: PRUEITT at guvax.georgetown.edu (Paul S. Prueitt) Date: 14 Oct 91 13:15:00 EST Subject: Pribram's Talk on Friday Message-ID: <01GBQIANVZ28000315@utarlg.uta.edu> Please Communicate within your group *********************Please post and forward on E-mail******************* ******************** Georgetown University Physics Department and Neural Network Research Facility 1991-92 Colloquium Series on Behavioral and Computational Neuroscience Friday, October 18th 4:00 P.M. to 6:00 P.M. Auditorium Room 112 Reiss Building, Georgetown University Refreshments at 3:30 P.M. in Room 505 Dr. Karl Pribram **************** Center for Brain Research and Informational Sciences, Radford University Brain and Perception, Holonomy and Structure in Figural Processing Dr. Pribram will discuss topics from his new book; Brain and Perception, Holonomy and Structure in Figural Processing. A one hour prepared lecture is to be followed by a one hour discussion. The book is now available from Dr. Edward J. Finn, Chairman of the G.U. Physics Department or from Lawrence Erlbaum Associates. Professor Pribram will autograph copies of the book after the Colloquium. 
*************************************************************************

For additional information please call Edward Finn at 202-687-6231.
Parking: Use Georgetown Univ. Entrance One from Reservoir Road (Northern Boundary)

********************

From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 15 01:23:47 1991
From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU)
Date: Tue, 15 Oct 91 01:23:47 -0400
Subject: batch-mode parallel implementations
In-Reply-To: Your message of Mon, 14 Oct 91 16:41:34 -0800. <9110142341.AA19545@apple.com>
Message-ID:

I don't recall seeing any studies that claim better generalization for per-sample or continuous updating than for per-epoch or batch updating. Can you supply some citations? The only reason I can think of for better generalization in the per-sample case would be a weak sort of simulated-annealing effect, with the random variation among individual training samples helping to jiggle the system out of small local minima in the vicinity of the best answer.

As for speed of convergence, continuous updating clearly beats per-epoch updating if the training set is highly redundant. To see this, imagine taking a small set of training cases, duplicating that set 1000 times, and presenting the resulting huge set as the training set. Per-sample updating would probably have converged on a good set of weights before the first per-epoch weight adjustment is ever made.

Also, in some cases it just is not practical to use per-epoch updating. There may be a stream of ever-changing data going by, and it may be impractical to store a large set of samples from this data stream for repeated use.

On the other hand, it is rather dangerous to use continuous updating with high learning rates or with techniques that adjust the learning rate based on some sort of second-derivative estimate. If you are not very careful, a few atypical cases in a row can accelerate you right out of the solar system. Some techniques, such as quickprop and most of the conjugate gradient methods, depend on the ability to look at the same set of training examples more than once, so they inherently are per-epoch models.

In my opinion, the best solution in most situations is probably to use one of the accelerated convergence methods and to update the weights after an "epoch" whose size is chosen by the experimenter. It must be sufficiently large to give a reasonably stable picture of the overall gradient, but not so large that the gradient is computed many times over before a weight-update cycle occurs. However, I am sure that this view is not universally accepted: some people seem to believe that per-sample updating is superior in all cases.

-- Scott Fahlman

From castillo at eel.upc.es Tue Oct 15 11:16:28 1991
From: castillo at eel.upc.es (Francisco Castillo Cobo)
Date: Tue, 15 Oct 1991 11:16:28 UTC+0100
Subject: add to NEURAS-LIST
Message-ID: <"114*/S=castillo/OU=eel/O=upc/PRMD=iris/C=es/"@MHS>

Hi,

I am currently compiling a list of incremental (or growing) neural networks. I have some already identified, including RCE and Tiling. I am interested in receiving additional references on the matter and would be glad to summarize the responses and send them to anyone who might be interested.

Thanx!
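To make the per-sample versus per-epoch distinction in the batch-mode thread above concrete, here is a minimal sketch for a single linear unit trained with squared error. It is illustrative only: the function, the variable names, and the toy data are assumptions of this sketch (written in Python/NumPy), not code from any of the implementations mentioned in the thread.

import numpy as np

def train(X, y, mode="per-sample", lr=0.01, epochs=50):
    """Gradient descent on one linear unit with squared error.

    mode="per-sample": the weights change after every training case (on-line).
    mode="per-epoch":  the gradient is summed over the whole set and the
                       weights change once per pass (batch).
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        if mode == "per-sample":
            for x_i, y_i in zip(X, y):
                err = np.dot(w, x_i) - y_i
                w -= lr * err * x_i                   # update immediately
        else:
            grad = np.zeros_like(w)
            for x_i, y_i in zip(X, y):
                grad += (np.dot(w, x_i) - y_i) * x_i  # accumulate only
            w -= lr * grad                            # one update per epoch
    return w

# A highly redundant training set, as in the duplication argument above:
# three distinct cases repeated 1000 times each.  The exact solution is
# w = [2, -1].
X = np.tile(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), (1000, 1))
y = np.tile(np.array([2.0, -1.0, 1.0]), 1000)
print(train(X, y, mode="per-sample"))                   # close to [2, -1]
print(train(X, y, mode="per-epoch", lr=0.01 / len(X)))  # still far from it

The per-epoch run scales the learning rate by the set size so that the summed gradient does not blow up; even so, after the same number of passes it has made 50 weight updates where the per-sample run has made 150,000, which is the redundancy argument above. Nothing in this sketch bears on the generalization question.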
F.Castillo From ecal at cgref.cemagref.fr Tue Oct 15 08:06:13 1991 From: ecal at cgref.cemagref.fr (European Conference on Artificial Life) Date: Tue, 15 Oct 91 12:06:13 GMT Subject: ECAL91 programme Message-ID: <9110151206.AA11528@cgref> Please find enclosed an E-mail version of ECAL91 programme (more up-to-date than the paper programme). You can use the registration form enclosed, granted that you send your payment by regular mail at the given address. =====================CUT HERE=====================CUT HERE====================== 1st European Conference on Artificial Life ________________________________________________________________________________ PROGRAMME - PROGRAMME - PROGRAMME- PROGRAMME ________________________________________________________________________________ EEEEEEE CCCCCC AA LL 99 11 EE CC AA AA LL 99 99 11 EE CC AA AA LL 99 99 11 EE CC AA AA LL 99 99 11 EEEEE CC AAAAAAAAA LL 99999 11 EE CC AA AA LL 99 11 EE CC AA AA LL 99 11 EE CC AA AA LL 99 11 EEEEEEE CCCCC AA AA LLLLLLLL 9999 11 ________________________________________________________________________________ To be held on December 11-13 1991 in Centre des Congres de la Villette Salle Laser cite des Sciences et de l'Industrie Paris, France Publisher : MIT Press / Bradford Books Sponsors : la Cite CEMAGREF Banque de France CNR Fondation de France AFCET Electricite de France CREA Offilib ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Artificial life: a new scientific field Artificial life embodies a recent and important conceptual step in modern science: asserting that the core of intelligence and cognitive abilities is the same as the capacity for living. Metaphorically, artificial life would see in the modest insect rather than in the symbolic abilities of an expert the best prototype for intelligence . What needs to be understood and characterized is the class of processes that endow living creatures with their characteric autonomy, key properties such as viability, abduction and adaptability. The autonomy of the living beings is understood here both with regards to their actions and to the way in which they shape their world into significance. This exploration goes hand in hand with the theory, design and construction of simple autonomous agents. The recent surge of interest in 'artificial life' has to be understood in the context of the long tradition inaugurated with cybernetics, seeking common basis for the living and the artificial. Artificial life can take advantage of the years of research in the tradition of symbolic computation that still characterizes most of the research in artificial intelligence, as well as the more recent explosive development of neural networks and connectionist approaches. Artificial life also induces a renewal of a whole range of engineering traditions, such as control theory and robotics, beyond classical notions of goal and planning, into biologically inspired notions of viability and adaptation, situatedness and operational closure, thus putting evolutionary processes at the very center of the stage. The first European meeting intends to highlight the practice of such autonomous systems in all their forms, by hosting the presentation and discussion of the most recent research in the area. 
Beyond research results, another main intention of the meeting is to engage researchers and philosophers to examine the epistemological basis of this new trend. Only a sustained analysis of the main concepts and ideas can provide a fertile ground for important advances and a change of research paradigm. Conference Chairs : Paul Bourgine and Francisco Varela Programme Committee : H. Bersini, B Ch. G. Langton, USA R. Brooks, USA J. A. Meyer, F J. Demongeot, F H.Schwefel, FRG B. Goodwin, UK D. Parisi, I S. Kauffman, USA Organizing Committee : I. Alvarez V. Douzal L. Bochereau T. Fuhs G. Deffuant ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Wednesday December 11 8:00 REGISTRATION 9:30 WELCOME ADDRESS Paul BOURGINE, CEMAGREF - (F), Francisco VARELA, CREA - (F) 9:45 AUTONOMOUS ROBOTS (I) Invited lecture: Rodney BROOKS, MIT - (USA) "Robots and artificial life" Uwe SCHNEPF, GMD - (FRG), Mukesh J. PATEL, University of Sussex - (UK) "Concept formation as Emergent Phenomena" Rolf PFEIFER, Free University of Brussels - (B), Paul VERSCHURE, Univ. of California, Santa Cruz (USA) "Distributed adaptive control : a paradigm for autonomous agents" Break / refreshments Tim SMITHERS, University of Edimburgh - (UK) "Taking eliminative materialism seriously : a methodology for autonomous systems research" Leslie P. KAELBLING, Brown University - (USA) "An adaptable mobile robot" Pattie MAES, MIT - (USA) "Learning behavior networks from experience" 13:15 LUNCH 14:30 SWARM INTELLIGENCE Invited lecture: Jean-Louis DENEUBOURG, Free University of Brussels - (B) "Swarm-made architecture" Alberto COLORNI, Marco DORIGO, Vittorio MANIEZZO, Politecnico di Milano - (I) "Distributed optimization by ant colonies" Andrew M. ASSAD, Univ. of Illinois - (USA), Norman H. PACKARD, Inst. for Scientific Interchange - (I) "Emergent colonization in an artificial ecology" Gerardo BENI, Susan HACKWOOD, Univ. of California, Riverside - (USA) "The maximum entropy principle and sensing in swarm intelligence" Break / refreshments 17:00 EPISTEMOLOGICAL ISSUES Stefan HELMREICH, Stanford University - (USA) "The historical and epistemological ground of von Neumann's theory of self-reproducing automata and theory of games" Jean-Luc DORMOY, EDF - (F), Sylvie KORNMAN, LAFORIA - (F) "Meta-knowledge, autonomy and (artificial) evolution : some lessons learnt so far" 18:00 POSTERS AND DEMOS ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Thursday December 12 9:00 EPISTEMOLOGICAL ISSUES (Continued) R. Allen GARDNER, Beatrix T. 
GARDNER, University of Nevada - (USA) "A feedforward model of animal learning" Bernard MANDERICK, Free University of Brussels - (B) "Selectionist systems as cognitive systems" Break / refreshments 10:15 AUTONOMOUS ROBOTS (II) Ian HORSWILL, MIT - (USA) "Characterizing adaptation by constraint" Didier KEYMEULEN, Jo DECUYPER, Free University of Brussels - (B) "On the self-organizing properties of topological maps" Piet SPIESSENS, Jan TORREELE, Free University of Brussels - (B) "Massively parallel evolution of recurrent networks : an approach to temporal processing" Dave CLIFF, University of Sussex - (UK) "Neural networks for visual tracking in an artificial fly" 12:45 LUNCH 14:15 LEARNING AND EVOLUTION Invited lecture: Domenico PARISI, Stefano NOLFI, Federico CECCONI, CNR - (I) "Learning, behaviour, and evolution" Hugues BERSINI, Free University of Brussels - (B) "Immune network and adaptive control" Franck HOFFMEISTER, Thomas BACK , University of Dortmund - (FRG) "Genetic self-learning" Heinz MUHLENBEIN, GMD - (FRG) "Darwin's continent cycle theory and its simulation by the Prisoner's dilemna" Break / refreshments Melanie MITCHELL, John H. HOLLAND, University of Michigan - (USA), Stephanie FORREST, University of New Mexico - (USA) "The royal road for genetic algorithms : fitness landscapes and GA performance" Brad FULLMER, Risto MIIKKULAINEN, University of Texas - (USA) "Evolving finite state behaviour using marker-based genetic encoding of neural networks" 18:00 Invited lecture: Stuart KAUFMANN , University of Pennsylvania - (USA) "Waiting for Carnot" 20:30 DINNER ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Friday December 13 9:30 ADAPTIVE AND EVOLUTIONARY MECHANISMS Barry McMULLIN, Dublin City University - (UK) "The Holland alpha-Universes revisited" Robert J. COLLINS, David R. JEFFERSON, University of California - (USA) "The evolution of sexual selection and female choice" Filippo MENCZER, Domenico PARISI, CNR - (I) "A model for the emergence of sex in evolving networks : adaptive advantage or random drift ?" Break / refreshments Inman HARVEY, University of Sussex - (UK) "Species adaptation genetic algorithms : a basis for a continuing SAGA" Jakob SKIPPER, Niels Bohr Institute - (Dk) "The complete zoo evolution in a box" Jeffrey HORN, University of Illinois - (USA) "Measuring the evolving complexity of stimulus-response organisms" 13:15 LUNCH 14:30 CONCEPTUAL FOUNDATIONS Hugues BERSINI, Free University of Brussels - (B) "Animat's I" Claus EMMECHE, Institute of Computer and Systems Sciences - (Dk) "Life as an abstract phenomenon : is Artificial Life possible ?" John STEWART - Paris (F) "Life=cognition : the epistemological and ontological signifance of Artificial Life" Break / refreshments Peter CARIANI, Boston - (USA) "Some epistemological implications of devices which construct their own sensors and effectors" Mark A. 
BEDAU, Reed College - (USA) "Philosophical aspects of Articial Life" 17:30 CONCLUDING REMARKS ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ POSTER SESSION Petr KURKA, Charles University - (Cz) "Natural Selection in a population of automata" Thomas BACK, University of Dortmund - (FRG) "Self-adaptation in genetic algorithms" Robert DAVIDGE, University of Sussex - (UK) "Looking at life" Hugo de GARIS, Free University of Brussels - (B) "Streerable GenNets : the genetic programming of controllable behaviors in GenNets" Bruno MARCHAL, Free University of Brussels - (B) "Amoeba, planaria and dreaming machines" Alexis DROGOUL, Jacques FERBER, LAFORIA - (F) "A behavioural simulation model for the study of emergent social structures" Antonio RIZZO, CNR - (I), Neil BURGESS, University of Manchester - (UK) "Action based neural network for adaptive control : the tank case" John R. KOZA, Stanford University - (USA) "Evolving emergent wall following robotic behavior using the genetic programming paradigm" Bruno GAS, Rene NATOWICZ, ESIEE - (F) "A non-supervised continuous learning model of neural network for temporal sequence recognition" Eric DEDIEU, Emmanuel MAZER, IMAG - (F) "The SWALLOW modeler : an approach to sensory relevance" Gilles VENTURINI, ESIEE - (F) "Characterizing the adaptation abilities of a class of genetic base machine learning algorithms" Barbara WEBB, Tim SMITHERS, University of Edimburgh - (UK) "The connection between AI and biology in the study of behaviour" Ulrich NEHMZOW, Tim SMITHERS, University of Edimburgh - (UK) "Using motor actions for location recognition" Stephen TODD, Wiliam LATHAM, IBM - (UK) "Artificial life or surreal art?" R.C. PATON , H. S. NWANA, M. J. SHAVE, T. J. BENCH-CAPON, University of Liverpool - (UK) "Computing at the tissue/organ level (with particular reference to the liver)" Pierre BESSIERE, IMAG - (F) "Genetic Algorithms applied to formal neural networks : parallel genetic implementation of a Boltzmann machine and associated robotic experimentations" Karl SIMS, Thinking Machines Corp. - (USA) "Interactive evolution of dynamical systems" Nicolas MEULEAU, CEMAGREF - (F) "Co-evolution and mimetism : a program simulating road traffic" Christian NOTTOLA, Frederic LEROY, Banque de France - (F) "Dynamics of artificial markets M. SNAITH, 0.HOLLAND, TAG - (UK) "Application of the temporal difference learning to the neural control of quadrupede locomotion" Simon GOOS, Jean-Louis DENEUBOURG, Free University of Brussels - (B) "Harvesting by a group of robots" ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Registration Form Name : ...................... First name : ....................... Firm :.............................................................. Address : ............................................................ ...................................................................... Zip code : ............. City : ..................................... Country : ................................ Phone : ............ Fax : ............... Email : .................................. Invoice to be sent to : ................................ 
Registration fees Before 20/11/91 After 20/11/91 ________________________________________________________________________________ Students* o FF 750 o FF 750 University Members o FF 1500 o FF 1750 Others o FF 2200 o FF 2500 ________________________________________________________________________________ * Student status proof required These fees include all refreshments and lunches. Payment (in french francs only, foreign cheques accepted): o Cheques (to be sent to ECAL 91) please note that all charges, if any, must be at the participants' expense. o Banker's draft to the order of ECAL: Credit Lyonnais, bank account 30002 08948 0000079087X 55 Versailles StLouis, F-78000. PLease ask your bank to arrange the transfer at no cost for the beneficiary. Bank charges, if any, will be at the participants' expense. Travel Please, send me o Domestic railway discount ticket SNCF (20%) o Domestic flight discount ticket Air Inter (35%) Cancellations Refunds of 50 % will be made if a written request is received before November 30. No refunds will be made for cancellations received after this date. In case of conference cancellation beyond its control, ECAL organizing committee limits its liability to the registration fees already paid. Date Signature Send this form to : ECAL 91 17 allee Gabrielle d'Estrees F-75019 Paris FRANCE Further information concerning registration : Fax : (+33) 1 40 96 60 80 Voice : (+33) 1 40 96 61 79 E-mail : ecal at cemagref.fr ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ General Information ___________________ Language The conference will be conducted in English. Accommodation Hotel Forest Hill La Villette *** (5-minutes walk ) 26 av. Corentin Cariou, Paris. Tel : +33 1 44 72 15 30, fax: 33 1 44 72 15 80. Single or double rooms: 480FF, special price for ECAL participants. Hotel Arcade La Villette ** (5-minutes walk) Tel : +33 1 40 38 04 04 Single: 390FF, double room: 420FF. Please reserve at least 30 days in advance. Hotel Campanile Pantin ** (10-minutes walk) Tel : +33 1 48 91 32 76 Single or double rooms: 335FF. Please reserve at least 45 days in advance. Tel : +33 (1) 48 91 32 76 Reservation centers (other hotels): Tel: 33 1 47 27 15 15 (500 to 700FF rooms). Tel: 33 1 43 59 12 12. (Elysee 12 12). Tel: 33 1 42 56 30 00, fax 33 1 42 89 42 97 (Paris Sejour Reservations) Tourist information : 33 1 47 23 61 72 Cheaper accomodations are available at: Centre de sejour Eugene Henaff Tel 33 (1) 48 39 19 05 Entry visas ___________ For non European Community members, please check with the french consulate whether you need a Visa. Access to Paris cite des Sciences et de l'Industrie ___________________________________________________ La cite des Sciences et de l'Industrie is located in northeast Paris, at La Villette Park, 30, avenue Corentin Cariou, 75019 Paris. It is 40 minutes from Roissy and Orly airports. You can reach the Cite: by car: Circular highway, Porte de la Villette exit. Parking available at quai de la Charente and Boulevard Macdonald; by metro: Line 7, Porte de la Villette station; by bus: lines 150-152-250A-PC. For information about the cite des Sciences, call 33 1 46 42 13 13 (round-the-clock), or by Minitel: 3615 code Villette. 
From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Tue Oct 15 10:22:41 1991
From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu)
Date: Tue, 15 Oct 91 09:22:41 EST
Subject: Paper
Message-ID:

Following is the abstract of a paper accepted by IJCNN'91-SINGAPORE. The main purpose of this paper was to attack the problems of slow convergence, local minima, incapability of learning (under certain preset criteria), and other problems associated with the original back-propagation neural nets from an alternative viewpoint ---- topology ---- instead of the learning algorithm and the response characteristics of the units. It was shown in this paper that the topology is a very important factor limiting the performance of back-propagation neural networks, besides the already studied factors such as the learning algorithm and the unit characteristics. All comments are welcome.

PPNN: A Faster Learning and Better Generalizing Neural Net

Bo Xu, Indiana University
Liqing Zheng, Purdue University

Abstract----It was pointed out in this paper that the planar topology of the current back-propagation neural network (BPNN) limits its ability to overcome the slow convergence rate, local minima, and other problems associated with BPNN. The parallel probabilistic neural network (PPNN), using a new neural network topology, stereotopology, was proposed to overcome these problems. The learning ability and the generalization ability of BPNN and PPNN were compared on several problems. The simulation results show that PPNN was capable of learning all kinds of problems much faster than BPNN and generalized better than BPNN too. The analysis showed that the faster, more universal learnability of PPNN was due to the parallel characteristic of PPNN's stereotopology, and the better generalization ability came from the probabilistic characteristic of PPNN's memory retrieval rule.

Bo Xu
Indiana University
itgt500 at indycms.iupui.edu

From xiru at Think.COM Tue Oct 15 11:35:55 1991
From: xiru at Think.COM (xiru Zhang)
Date: Tue, 15 Oct 91 11:35:55 EDT
Subject: batch-mode parallel implementations
In-Reply-To: Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU's message of Tue, 15 Oct 91 01:23:47 -0400 <9110151441.AA10584@chaos.cs.brandeis.edu>
Message-ID: <9110151535.AA02757@yangtze.think.com>

From jcp at vaxserv.sarnoff.com Wed Oct 16 12:03:09 1991
From: jcp at vaxserv.sarnoff.com (John Pearson W343 x2385)
Date: Wed, 16 Oct 91 12:03:09 EDT
Subject: batch-mode parallel implementations
Message-ID: <9110161603.AA09000@sarnoff.sarnoff.com>

Xiru Zhang stated:

>From the point of view of implementation, if a network is not large, there
>is not much you can parallelize if you do per-sample training.

Even in per-sample training one may be able to efficiently exploit a parallel machine. Each processor simulates the same network but has a different set of initial weights. The convergence time and performance of a trained network can be very dependent on the initial weights. I would appreciate being sent references that discuss this last statement.

John Pearson
David Sarnoff Research Center
CN5300
Princeton, NJ 08543
609-734-2385
jcp at as1.sarnoff.com

From gary at cs.UCSD.EDU Wed Oct 16 13:05:51 1991
From: gary at cs.UCSD.EDU (Gary Cottrell)
Date: Wed, 16 Oct 91 10:05:51 PDT
Subject: batch-mode parallel implementations
Message-ID: <9110161705.AA27497@desi.ucsd.edu>

I tried implementing Elman's simple recurrent nets on an Intel Hypercube using data parallelism (a copy of the net at each node, each getting a part of the training set).
I found it was as fast as a bat out of h**l, but as many times faster as it was, it was also as many times SLOWER at converging, leading to a net gain of 0!

g.

PS I did not try conjugate gradient, or back-propping more steps in time, which probably would have helped convergence lots.

From orilex at crl.ucsd.edu Wed Oct 16 15:33:44 1991
From: orilex at crl.ucsd.edu (Roy Higginson)
Date: Wed, 16 Oct 91 12:33:44 PDT
Subject: address for Sanger
Message-ID: <9110161933.AA21258@crl.ucsd.edu>

Can someone give me an e-mail address for Dennis Sanger (AT&T Bell/Univ of CO at Boulder)?

Thanks,
Higginson

From ajr at eng.cam.ac.uk Wed Oct 16 17:48:31 1991
From: ajr at eng.cam.ac.uk (Tony Robinson)
Date: Wed, 16 Oct 91 17:48:31 BST
Subject: TR available: Phoneme recognition with recurrent networks
Message-ID: <16687.9110161648@dsl.eng.cam.ac.uk>

***Do not forward to other bboards***

I've recently completed a technical report on connectionist phoneme recognition which I would like to make available to interested researchers. It describes a series of changes which have been made to tidy up a previously published system. Copies of the technical report may be obtained courtesy of Jordan Pollack by anonymous ftp from archive.cis.ohio-state.edu in the directory /pub/neuroprose as file robinson-tr82.ps.Z. If this option is not available to you, or if you would like a reprint of the background article, please send me email giving your full address.

Tony [Robinson]
Cambridge University Engineering Department, Trumpington Street, Cambridge, UK

------------------------------------------------------------------------------

Several Improvements to a Recurrent Error Propagation Network Phone Recognition System

Tony Robinson
ajr at eng.cam.ac.uk
CUED/F-INFENG/TR.82
30 September 1991

Recurrent Error Propagation Networks have been shown to give good performance on the speaker-independent phone recognition task in comparison with other methods [Robinson and Fallside, Computer Speech and Language, July 1991]. This short report describes several recent improvements made to the existing recogniser for the TIMIT database. The improvements are: an addition to the preprocessor to represent voicing information; use of histogram normalisation on the input channels of the network; normalisation of the output channels to enforce unity sum; a change in the cost function to give equal weighting to each target symbol; a change in the representation of the outputs to reduce quantisation errors; retraining on the complete TIMIT training set; and better estimation of the HMM phone models. Most of these changes decrease the number of arbitrary parameters used and allow for the integration of the system with standard HMM techniques. The result of these changes is a decrease in the number of errors by about 16% (from 36.5% to 30.7% when all 61 TIMIT phones are used, and from 30.2% to 25.0% on a reduced 39-phone set).

From shams at maxwell.hrl.hac.com Wed Oct 16 17:23:42 1991
From: shams at maxwell.hrl.hac.com (shams@maxwell.hrl.hac.com)
Date: Wed, 16 Oct 91 14:23:42 PDT
Subject: batch-mode parallel implementations
Message-ID: <9110162123.AA08260@maxwell.hrl.hac.com>

We have exploited the "epoch" training method for implementing back-prop on a 2-D systolic array processor at Hughes [1,2]. There are two basic problems with this approach. First, there are only a limited number of models that allow for epoch training (e.g. back-prop).
Second, this type of parallelism is not useful during the recall or classification cycle, since there is only a single input pattern to be evaluated (unless the input data rate exceeds the processor throughput, enabling the input data to be buffered for batch processing). As the number of neurons used in real-world applications continues to increase, there will be enough computation to keep all the processors busy without having to use epoch parallelism.

[1] S. Shams and K. W. Przytula, "Mapping of Neural Networks onto Programmable Parallel Machines," Proceedings of the Intern. Symp. on Circuits and Systems, New Orleans, LA, Vol. 4, pp. 2613-2617, 1990.

[2] S. Shams and K. W. Przytula, "Implementation of Multilayer Neural Networks on Parallel Programmable Digital Computers," in Parallel Algorithms and Architectures for DSP Applications, ed. M. Bayoumi, Kluwer Academic Publishers, pp. 225-253, 1991.

Soheil Shams
Hughes Research Labs.

From karunani at CS.ColoState.EDU Wed Oct 16 22:23:31 1991
From: karunani at CS.ColoState.EDU (n karunanithi)
Date: Wed, 16 Oct 91 20:23:31 MDT
Subject: HowtoScale
Message-ID: <9110170223.AA05027@zappa>

Dear Connectionists,

Some time back I posted the following problem in this newsgroup and many people responded with suggestions and references. I am thankful to all of them. I have summarized their responses and am posting the summary here for others who might find it interesting. For completeness' sake I have included my original posting as well.

******Issue raised:

Background:
-----------
I have been using neural network models (both Feed-Forward Nets and Recurrent Nets) in a prediction application and I am getting pretty good results. In fact the neural network approach outperformed many well-known analytic models. Similar results have been reported by many researchers in (chaotic) time series prediction.

Suppose that X is the independent variable and Y is the dependent variable. Let (x(i),y(i)) represent a sequence of actual input/output values observed at time i = 0,1,2,..,t of a temporal process. Assume further that both the input and the output variables are one-dimensional and can take on a sequence of positive integers up to a maximum of 2000. Once we train a network with the history of the system up to time "t" we can use the network to predict outputs y(t+h), h=1,..,n for any future input x(t+h). In my application I already have the complete sequence and hence I know the maximum values of x and y. Using these maxima I normalized both X and Y over a 0.1 to 0.9 range. (Here I call such normalization "scaled representation".) Since I have the complete sequence, it is possible for me to evaluate how good the network's predictions are.

Now some basic issues:
---------------------
1) How to represent these variables if we don't know in advance what the maximum values are? Scaled representation presupposes the existence of a maximum value. Some may suggest that linear units can be used at the output layer to get rid of scaling. If so, how do I represent the input variable? The standard sigmoidal unit (with temp = 1.0) gets saturated (or railed to 1.0) when the sum is >= 14. However, one may suggest that changing the output range of the sigmoidal can help to get rid of the saturation effect. Is it a correct approach?

2) In such prediction applications, people (including me) compare the predictive accuracy of neural networks with that of parametric models (that are based on analytical reasoning).
But one main advantage of the parametric models is that their parameters can be calculated using any of the following parameter estimation techniques: least squares, maximum likelihood, Bayesian methods, Genetic Algorithms, or any other method. These parameter estimation techniques do not require any scaling, and hence there is no need to guess the maximum values in advance. However, with the scaled representation in neural networks one cannot proceed without making guesses about the maximum (or a future) input and/or output. In many real-life situations such guesses are infeasible or dangerous. How do we address this situation?

____________________________________________________________________________
N. KARUNANITHI                       E-Mail: karunani at CS.ColoState.EDU
Computer Science Dept,
Colorado State University,
Collins, CO 80523.
____________________________________________________________________________

******Responses Received:

1) Dr. Huang at CMU

Date: Thu, 26 Sep 1991 11:40-EDT
From: Xuedong.Huang at SPEECH2.CS.CMU.EDU

I have several papers addressing the issues you raised. See for example:

[1] Huang, X.: "A Study on Speaker-Adaptive Speech Recognition," DARPA Speech and Language Workshop, Feb. 1991, pp. 278-283.

[2] Huang, X., K. Lee and A. Waibel: "Connectionist speaker normalization and its applications to speech recognition," IEEE Workshop on NNSP, Princeton, Sept. 1991.

X.D. Huang, PhD
Research Computer Scientist
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
School of Computer Science           Tel: (412) 268 2329
Carnegie Mellon University           Fax: (412) 681 5739
Pittsburgh, PA 15213                 Email: xdh at cs.cmu.edu

=============================================================================

2) From Alexander at CUNY

Date: Thu, 26 Sep 91 14:45 EDT
From: TWOMBLY%JHUBARD.BITNET at CUNYVM.CUNY.EDU

In response to your question about scaling for sigmoidal units.....

I ran into the same problem of not knowing the maximum value that my input/output data would take at any particular time. There were no a priori bounds that could be reasonably set, so the solution (in this case) was to get rid of the sigmoidal activation function and replace it with one that did not require any scaling. The function I used was a clipped linear function - that is, f(x) = 0. for x<0., and f(x) = x for x>0. For my data this activation function worked as well as the sigmoidal units (in some cases better) because the hidden units never took advantage of the non-linearity in the upper range of the sigmoid function.

The only difficulty with this function is that it does not have a continuous derivative at 0. You can get around this problem by tacking on a 1/x type function for x<0 that drops off very quickly. This will provide a well-behaved, non-zero derivative for all parts of the activation function while adding a negligible value to the output for x<0. The actual function I use is:

f(x) = x;                      x > 0.
f(x) = 1/(10**2 - x*10**4);    x < 0.

I hope this helps.

-Alexander

=============================================================================

3) Dr. Fahlman at CMU

Date: Thu, 26 Sep 91 22:20:14 -0400
From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU

> 1) How to represent these variables if we don't know in advance what the
> maximum values are? Scaled representation presupposes the existence of a
> maximum value. Some may suggest that linear units can be used at the
> output layer to get rid of scaling.

Right, I was about to suggest that.

> If so, how do I represent the input variable?
> The standard sigmoidal unit (with temp = 1.0) gets saturated (or railed
> to 1.0) when the sum is >= 14. However, one may suggest that changing the
> output range of the sigmoidal can help to get rid of the saturation
> effect. Is it a correct approach?

For a non-recurrent network, the first layer of weights can and usually will scale the inputs for you. You save some learning time and possible traps if the inputs are in some reasonable range, but it really isn't essential. I'd advise adding a small constant (0.1 works well) to the derivative of the sigmoid for all units so that you can recover if the unit gets pinned to an extreme value.

I don't understand your second point, so I won't try to reply to it.

Scott Fahlman
Carnegie Mellon University

=============================================================================

4) Ian Fitchet at Birmingham University

Date: Fri, 27 Sep 91 03:43:40 +0100
From: Ian Fitchet

I'm no expert, but how about having two outputs: one is a control and has a (mostly) fixed value; the other is the output y(i), which is adjusted such that the one divided by the other gives the required result. Off the top of my head, have the control output 0.9 most of the time; when the value of y(i) goes above unity, have y(i) = 0.9 and let the control decrease, so that if the control equalled 0.45, say, then the real value of the output would be 0.9/0.45 = 2.0. Of course the question is then, how do I train the network to set the value of the control? But I leave that as an exercise... :-)

Cheers,
Ian

--
Ian Fitchet                    I.D.Fitchet at cs.bham.ac.uk
School of Computer Science
Univ. of Birmingham, UK, B15 2TT
"You run and you run to catch up with the sun, but it's sinking"  Pink Floyd

=============================================================================

5) From Dermot O'Brien at the University of Edinburgh

Date: Fri, 27 Sep 91 10:32:31 WET DST
Sender: dob at castle.edinburgh.ac.uk

You may be interested in the following references (if you haven't read them already):

@techreport{Lapedes:87,
  Author = "Alan S. Lapedes and Robert M. Farber",
  Title = "Nonlinear signal processing using neural networks: prediction and system modelling",
  Institution = "Los Alamos National Laboratory",
  Year = 1987,
  Number = "LA-UR-87-2662"}

@incollection{Lapedes:88,
  Author = "Alan S. Lapedes and Robert M. Farber",
  Title = "How Neural Nets Work",
  BookTitle = "Evolution, Learning, and Cognition",
  Pages = {331--346},
  Editor = "Y.C. Lee",
  Year = 1988,
  Publisher = "World Scientific",
  Address = "Singapore"}

The above papers analyse the behaviour of feed-forward neural networks applied to the problem of time series prediction, and make an interesting analogy with Fourier decomposition.

Cheers,

Dermot O'Brien
Physics Department
University of Edinburgh
The King's Buildings
Mayfield Road
Edinburgh EH9 3JZ
Scotland

=============================================================================

6) From: Tony Robinson

Date: Fri, 27 Sep 91 12:23:23 BST

My immediate advice is: Don't put the input through a nonlinearity at the start of the network. Use linear output units. Allow a linear path through the system so that if a linear solution to the problem is possible then this is a possible network solution. Then you will have no problems with maximum values.

Tony [Robinson]

=============================================================================

End of summary.
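For readers who want to try response 2 above, here is a minimal sketch of the clipped-linear activation and its derivative, using exactly the constants quoted in that reply (Python/NumPy; the function names and the test values are assumptions of this sketch, not part of the original message).

import numpy as np

def clipped_linear(x):
    # f(x) = x for x > 0; for x <= 0, the small, rapidly decaying
    # 1/x-type tail quoted in response 2, so the output stays tiny
    # but the derivative never vanishes.
    return np.where(x > 0.0, x, 1.0 / (1e2 - x * 1e4))

def clipped_linear_deriv(x):
    # Derivative: 1 for x > 0; 1e4 / (1e2 - 1e4*x)**2 for x <= 0,
    # which approaches 1 as x -> 0 from below and stays non-zero.
    return np.where(x > 0.0, 1.0, 1e4 / (1e2 - x * 1e4) ** 2)

# The positive branch is unbounded, so no output maximum has to be
# guessed in advance; the negative branch contributes at most 0.01.
xs = np.array([-2.0, -0.1, 0.0, 0.5, 3.0, 50.0])
print(clipped_linear(xs))
print(clipped_linear_deriv(xs))

Note that the tail gives 0.01 at 0 rather than 0, a small jump in the function value that the reply describes as negligible.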
____________________________________________________________________________
N. KARUNANITHI                       E-Mail: karunani at CS.ColoState.EDU
Computer Science Dept,
Colorado State University,
Collins, CO 80523.
____________________________________________________________________________

From thomasp at informatik.tu-muenchen.dbp.de Mon Oct 14 05:17:00 1991
From: thomasp at informatik.tu-muenchen.dbp.de (Thomas)
Date: 14 Oct 91 10:17 +0100
Subject: report available
Message-ID: <91Oct14.101724met.34256(a)gshalle1.informatik.tu-muenchen.de>

From khosla at latcs1.lat.oz.au Thu Oct 17 04:00:31 1991
From: khosla at latcs1.lat.oz.au (Rajiv Khosla)
Date: Thu, 17 Oct 91 18:00:31 +1000
Subject: Spatial crosstalk and modular NN architecture
Message-ID: <9110170800.AA00862@latcs1.lat.oz.au>

Dear Connectionists,

Can anyone enlighten me on the following? I have to model a problem with 28 discrete inputs (1's and 0's) and 26 discrete outputs. In fact, these 26 discrete outputs can also be represented by 5 normalized continuous outputs. Now, I have no problem modelling it as a 28-11-5 network using Scott Fahlman's quickprop. However, I get into all sorts of problems when I have to model a 28-?-26 network (? stands for any number of hidden units; I tried up to 104).

Some time back, I read a paper on modular NN architectures which suggested that, because of spatial crosstalk, one should have dedicated or independent links between hidden units and each output unit. This would result in faster training and better generalization. I tried this architecture by making suitable changes in the quickprop algorithm, but to no avail. There is no improvement over the standard architecture vis-a-vis training. In fact, things seemed to get slightly worse. I tried with 2, 3, and 4 sets of hidden units per output unit (that is, 52, 78, and 104 hidden units in all, respectively). I gave up after about 5000 epochs as I couldn't see any significant improvement in the total error.

Has anyone used the modular architecture in a similar situation, with a large number of output nodes, with positive results? Am I doing something wrong? Is there any other solution except making the outputs continuous and reducing the number of output nodes?

I have only recently started reading this group, so please excuse the naivety of the questions, if any. Please e-mail your replies to khosla at latcs1.lat.oz.au

Thanks in advance,
Rajiv

From neural!lamoon.neural!yann at att.att.com Thu Oct 17 10:46:39 1991
From: neural!lamoon.neural!yann at att.att.com (neural!lamoon.neural!yann@att.att.com)
Date: Thu, 17 Oct 91 10:46:39 -0400
Subject: batch-mode parallel implementations
Message-ID: <9110171446.AA19788@lamoon>

Several years ago, Steve Nowlan and I implemented a "batch-mode" vectorized backprop on a Cray. Just as in Gary Cottrell's story, the raw CUPS rate was high, but because batch mode converges so much more slowly than on-line, the net gain was 0. I think Patrick Haffner and Alex Waibel had a similar experience with their implementations of TDNNs on the Alliant.

Now, the larger and more redundant the dataset is, the larger the difference in convergence speed between on-line and batch. For small (and/or random) datasets, batch might be OK, but who cares. Also, if you need a very high accuracy solution (for function approximation, for example), a second-order batch technique will probably be better than on-line.

Sadly, almost all speedup techniques for backprop only apply to batch (or semi-batch) mode. That includes conjugate gradient, delta-bar-delta, most Newton or Quasi-Newton methods (BFGS...), etc...
I would love to see a clear demonstration that any of these methods beats a carefully tuned on-line gradient on a large pattern classification problem. I tried many of these methods several years ago, and failed. I think there are two interesting challenges here:
1 - Explain theoretically why on-line is so much faster than batch (something that goes beyond the "multiple copies" argument).
2 - Find more speedup methods that work with on-line training.

-- Yann Le Cun

From kamil at apple.com Thu Oct 17 12:59:21 1991
From: kamil at apple.com (Kamil A. Grajski)
Date: Thu, 17 Oct 91 09:59:21 -0700
Subject: batch & on-line training
Message-ID: <9110171659.AA23721@apple.com>

The consensus opinion seems to be that on-line learning is preferred for situations consisting of a classification problem with a large (possibly redundant) dataset. What appears to have been a common experience is that batch-mode training generates impressive MCUPS statistics, but convergence is slow enough that the net gain is 0. It is still difficult to make a scientific judgement, mostly because the evidence appears to be largely anecdotal, e.g., "I really tried hard to make one (batch, or on-line) work, and it beat the other."

It has been observed that several algorithms for accelerating convergence are designed for (semi-)batch mode. Were these to be seriously evaluated, would the net gain still be 0? On the other hand, with more work, could on-line methods widen their apparent superiority?

I don't think that we're splitting hairs by addressing this issue. One trend on the implementation side of NNs is to aim for the highest MCUPS performance. In several instances, this is achieved using mappings/architectures which rest on batch-mode training. I think that one might design a neurocomputer differently depending on which training mode is to be used, e.g., the communication vs. computation curves are different. So, at the moment, in certain instances, we've actually put the cart before the horse: we have fast batch implementations. Do we make batch-mode training better, or can we make on-line training so fast, and design a machine so optimally, that the issue is moot? (I'm ignoring the (possibly substantial) conflicting requirements between training & recognition modes here.)

In any event, it seems that folks are having success doing either in different situations. However, there doesn't seem to be a compelling argument for preferring one or the other IN PRINCIPLE.

Cheers,
Kamil

From dlukas at park.bu.edu Thu Oct 17 12:58:42 1991
From: dlukas at park.bu.edu (David Lukas)
Date: Thu, 17 Oct 91 12:58:42 -0400
Subject: Graduate study in Cognitive & Neural Systems at Boston University
Message-ID: <9110171658.AA15628@park.bu.edu>

(please post)

***********************************************
*                                             *
*               DEPARTMENT OF                 *
*     COGNITIVE AND NEURAL SYSTEMS (CNS)      *
*            AT BOSTON UNIVERSITY             *
*                                             *
***********************************************

Stephen Grossberg, Chairman

The Boston University Department of Cognitive and Neural Systems offers comprehensive advanced training in the neural and computational principles, mechanisms, and architectures that underlie human and animal behavior, and the application of neural network architectures to the solution of outstanding technological problems.

Applications for Fall 1992 admissions and financial aid are now being accepted for both the MA and PhD degree programs.
To obtain a brochure describing the CNS Program and a set of application materials, write or telephone: Department of Cognitive & Neural Systems Boston University 111 Cummington Street, Room 240 Boston, MA 02215 (617) 353-9481 or send a mailing address to: kellyd at cns.bu.edu Applications for admission and financial aid should be received by the Graduate School Admissions Office no later than January 15. Applicants are required to submit undergraduate (and, if applicable, graduate) transcripts, three letters of recommendation, and Graduate Record Examination (GRE) scores. The Advanced Test should be in the candidate's area of departmental specialization. GRE scores may be waived for MA candidates and, in exceptional cases, for PhD candidates, but absence of these scores may decrease an applicant's chances for admission and financial aid. Description of the CNS Department: The Department of Cognitive and Neural Systems (CNS) provides advanced training and research experience for graduate students interested in the neural and computational principles, mechanisms, and architectures that underlie human and animal behavior, and the application of neural network architectures to the solution of outstanding technological problems. Students are trained in a broad range of areas concerning cognitive and neural systems, including vision and image processing; speech and language understanding; adaptive pattern recognition; cognitive information processing; self-organization; associative learning and long-term memory; cooperative and competitive network dynamics and short-term memory; reinforcement, motivation, and attention; adaptive sensory-motor control and robotics; and biological rhythms; as well as the mathematical and computational methods needed to support advanced modeling research and applications. The CNS Department awards MA, PhD, and BA/MA degrees. The CNS Department embodies a number of unique features. It has developed a core curriculum that consists of ten interdisciplinary graduate courses each of which integrates the psychological, neurobiological, mathematical, and computational information needed to theoretically investigate fundamental issues concerning mind and brain processes and the applications of neural networks to technology. Additional advanced courses, including research seminars, are also offered. Each course is typically taught once a week in the evening to make the program available to qualified students, including working professionals, throughout the Boston area. Students develop a coherent area of expertise by designing a program that includes courses in areas such as Biology, Computer Science, Engineering, Mathematics, and Psychology, in addition to courses in the CNS core curriculum. The CNS Department prepares students for thesis research with scientists in one of several Boston University research centers or groups, and with Boston-area scientists collaborating with these centers. The unit most closely linked to the department is the Center for Adaptive Systems. The Center for Adaptive Systems is also part of the Boston Consortium for Behavioral and Neural Studies, a Boston-area multi-institutional Congressional Center of Excellence. Another multi-institutional Congressional Center of Excellence focused at Boston University is the Center for the Study of Rhythmic Processes. 
Other research resources include distinguished research groups in neurophysiology, neuroanatomy, and neuropharmacology at the Medical School and the Charles River campus; in sensory robotics, biomedical engineering, computer and systems engineering, and neuromuscular research within the Engineering School; in dynamical systems within the mathematics department; in theoretical computer science within the Computer Science Department; and in biophysics and computational physics within the Physics Department. 1991 FACULTY and STAFF of CNS and CAS: Daniel H. Bullock Nancy Kopell Gail A. Carpenter John W.L. Merrill Michael A. Cohen Ennio Mingolla H. Steven Colburn Alan Peters Paolo Gaudiano Adam Reeves Stephen Grossberg James T. Todd Thomas G. Kincaid Allen Waxman From MURTAGH at SCIVAX.STSCI.EDU Thu Oct 17 15:29:25 1991 From: MURTAGH at SCIVAX.STSCI.EDU (MURTAGH@SCIVAX.STSCI.EDU) Date: Thu, 17 Oct 1991 15:29:25 -0400 (EDT) Subject: Workshop: Par. Prob. Solving: Applns. in Statistics & Economics Message-ID: <911017152925.28c128fa@SCIVAX.STSCI.EDU> Workshop Announcement and Call for Papers: "Parallel Problem Solving From Nature: Applications in Statistics & Economics". ------------------------------------------------------------------------------- Interdisciplinary Project Center for Supercomputing, ETH, Zurich, Switzerland. December 10-11, 1991. Support/Sponsorship: DOSES/Statistical Office of the European Communities; IPS, ETH Zurich; Konjunkturforschungsstelle, ETH Zurich; MasPar Distributor AG Zurich; PAR, Schweizerische Informatiker Gesellschaft; Parsytec GmbH, Aachen; QT optec AG, Zug; Schweizerischer Bankverein, Basel, IBM Switzerland. Program Committee: J. Frain (Central Bank of Ireland), K. Kirchmayr (Schweizerischer Bankverein, Basel), F. Murtagh (Munotec Systems, Munich and Dublin), P. Van Nypelseer (DOSES/EUROSTAT, Luxembourg), U. Reimer (Rentenanstalt Zuerich), M.M. Richter (DFKI Kaiserslautern), W. Roth (Konjunkturforschungsstelle ETH, Zurich), D. Wuertz (IPS, ETH Zurich), and H.G. Zimmermann (Siemens, Munich). Invited Speakers: J. Bernasconi (ABB Corp. Research, Baden), A. Colin (Citibank, London), F. Fogelman-Soulie (MIMETICS, Chatenay Malabry), J. Frain (Central Bank of Ireland), H. Horner (Universitaet Heidelberg), H. Muehlenbein (GMD, Sankt Augustin, Bonn), F. Murtagh (Munotec Syst., Munich), M.B. Priestley (UMIST Manchester), R. Rohwer (CSTR University of Edinburgh), C. Schaefer (Rowland Inst. of Science, Cambridge MA), P. Treleaven (University College London), A. Varfis (Joint Research Center, Ispra), H.-M. Wallmeier (IBM Scientific Center, Heidelberg), D. Weers (Aspen Intellect, Zug), A. Weigend (Stanford University) D. Wuertz (IPS, ETH Zurich), H.G. Zimmermann (Siemens, Munich). Registration: SFr 400 for those from profit-making companies; otherwise SFr 150. A limited fund will be available to support younger participants who would not otherwise be able to attend. Late registration, after November 1, additional SFr 50. Remittance (only Swiss Francs) to: PASE-Workshop - Dr. Diethelm Wuertz, Schweizerischer Bankverein, Zurich. Acccount number: P0-206066.0. Accommodation requests: directly to: Verkehrsverein Zurich (VVZ), Kongressbuero, Postfach, CH-8023 Zurich, Switzerland (Tel: + 41 1 211-1256). Contact Point: Dr. Diethelm Wuertz, IPS ETH Zurich, ETH Zentrum, CLU B3, CH-8092 Zurich, Switzerland. Fax: + 41 1 252-0185. Email: wuertz at ips.ethz.ch or the undersigned. Abstract: 1 page, by November 1. F.D. 
Murtagh murtagh at scivax.stsci.edu From dominic at DEBUSSY.CODA.CS.CMU.EDU Thu Oct 17 16:21:08 1991 From: dominic at DEBUSSY.CODA.CS.CMU.EDU (Chioccioli) Date: Thu, 17 Oct 91 14:21:08 -0600 Subject: No subject Message-ID: <9110172021.AA24272@debussy.cs.colostate.edu> This posting briefly describes my interest in parallel learning algorithms for neural networks. Currently I am investigating the following two aspects of parallel reinforcement learning algorithms for sequential decision tasks: 1) Multiple nets on multiple task simulations. Our goal here is to combine multiple simultaneous experiences to reduce the wall-clock time required to learn a task. 2) Multiple nets on a single task simulation. This paradigm assumes that multiple simulations cannot be run; however, parallel search of the (single) experience space obtained from running a single simulation can be used to reduce the total number of trials (i.e. simulated experiences) required for learning. Several different algorithms will be attempted for both of the above tasks. I am interested in hearing from others who may also be doing research in parallel learning algorithms for neural networks. Pointers to relevant publications or references will be most helpful. Thanks in advance for any responses. I will post a summary of any references I receive, provided that this is not a repeated request and that sufficient response is forthcoming. Regards, Steve Dominic dominic at debussy.cs.colostate.edu Colorado State University Computer Science Dept. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Thu Oct 17 16:01:33 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Thu, 17 Oct 91 15:01:33 EST Subject: Paper Message-ID: Two days ago I posted the abstract of our paper "PPNN: A Faster Learning and Better Generalizing Neural Net". Because the paper will appear in the proceedings of IJCNN'91-SINGAPORE, I thought it would not be necessary to place it in neuroprose. However, since the posting, I have received a large number of messages requesting a copy of the paper, and requests are still coming in. Because I was not prepared for this, I was unable to answer all of the messages in time. Please excuse me for any possible delay and errors in replying to your requests. Thanks to many colleagues' suggestions, I am going to place the paper in the neuroprose archive. I will provide the procedures for retrieving it at cheops of Ohio State when it is ready. I will be happy to send hardcopy to those having no access to FTP. Bo Xu Indiana University itgt500 at indycms.iupui.edu From khosla at latcs1.lat.oz.au Thu Oct 17 20:49:38 1991 From: khosla at latcs1.lat.oz.au (khosla@latcs1.lat.oz.au) Date: Fri, 18 Oct 91 10:49:38 +1000 Subject: Pl. Ignore Message-ID: <9110180049.AA28265@latcs2.lat.oz.au> This is a test From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Oct 18 02:10:29 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 18 Oct 91 02:10:29 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Thu, 17 Oct 91 10:46:39 -0500. <9110171446.AA19788@lamoon> Message-ID: Yann LeCun writes: Now, the larger and more redundant the dataset is, the larger the difference in convergence speed between on-line and batch. For small (and/or random) datasets, batch might be OK, but who cares. I think that it may be misleading to lump together "large" and "redundant" as if they were the same thing, or as if they were inseparable.
I agree that for highly redundant datasets, continuous updating has an advantage. I also agree that for small datasets, we don't care much about speed. But it seems to me that it is possible to have a large, not-very-redundant data set, and that accelerated batch methods should have an advantage for these. I guess you could measure redundancy by seeing if some subset of the training data set produces essentially the same gradient vector as the full set. Probably statisticians have good ways of talking about this redundancy business -- unfortunately, I don't know the right vocabulary. In a data set with noise, you need a big enough training set to raise relatively rare but real features above the level of the random background noise. If you have roughly that much data, I bet fast batch techniques would win; if you have a training set that is several times this minimal size, then continuous updating would win. That's my suspicion, anyway. I would love to see a clear demonstration that any of these methods beats a carefully tuned on-line gradient on a large pattern classification problem. I tried many of these methods several years ago, and failed. Well, if my hypothesis above is right, we could demonstrate this by finding a dataset that is large enough to make you happy, but not highly redundant. I guess that we could create this by taking any large dataset, measuring its redundancy, and trimming it down to minimal size (assuming that the result still can be classified as large). Do you know of any big sets that would qualify? It should preferably a relatively "pure" N-input data-classification problem, without all the additional issues (e.g. translation invariance) that are present in image-processing and speech-processing tasks. I think there are two interesting challenges here: 1 - Explain theoretically why on-line is so much faster than batch (something that goes beyond the "multiple copies" argument). 2 - Find more speedup methods that work with on-line training. I have a hunch that if we work hard enough on speeding up online training, we'll end up with something whose NET EFFECT is equivalent to the following: 1. Accumulate gradient data for a length of time that is adaptively chosen: Large enough for the gradients to be stable and accurate, but not large enough to be redundant. 2. Use something equivalent to one of the batch-processing acceleration techniques on this smoothed gradient. That's not to say that the technique will necessary do this in an obvious way -- it may be twiddling the weights each time a sample goes by -- but I suspect this kind of accumulation, smoothing, and acceleration will be present at some level. As I said, for now this is just a hunch. -- Scott Fahlman P.S. I avoid using the term "on-line" for what I call "per-sample" or "continuous" updating of weights. For me, "online" means something else. At this moment, I am sitting at my workstation watching one of my batch-updating algorithms running "on-line" in front of me. From smagt at fwi.uva.nl Fri Oct 18 09:23:07 1991 From: smagt at fwi.uva.nl (Patrick van der Smagt) Date: Fri, 18 Oct 91 14:23:07 +0100 Subject: Spatial crosstalk and modular NN architechture Message-ID: <9110181323.AA28643@fwi.uva.nl> > I have to model a problem with 28 discrete inputs(1's and 0's) and >26 discrete outputs. Infact, these 26 discrete outputs can be represented by >5 normalized continous outputs also. If one would want to model any kind of function, why go for the least obvious solution via a neural network first? 
Since your problem is binary, too, I would first try a much simpler method such as k-nearest-neighbour or some binning approach, which would enable one to gain an understanding of the data and the overlap. Ten years ago this would have been a more standard approach, instead of using a black box (aka neural network). The reason that I would _not_ immediately grab a network to do some function approximation is that I have seen too many people choke on the fact that they do not understand their data, or the complexity of the data, a reasonable ratio of #degrees of freedom to #learning samples, etc. Patrick van der Smagt From xiru at Think.COM Fri Oct 18 10:42:04 1991 From: xiru at Think.COM (xiru Zhang) Date: Fri, 18 Oct 91 10:42:04 EDT Subject: batch & on-line training In-Reply-To: "Kamil A. Grajski"'s message of Thu, 17 Oct 91 09:59:21 -0700 <9110171659.AA23721@apple.com> Message-ID: <9110181442.AA03133@yangtze.think.com> Date: Thu, 17 Oct 91 09:59:21 -0700 From: "Kamil A. Grajski" The consensus opinion seems to be that on-line learning is preferred for situations consisting of a classification problem with a large (possibly redundant) dataset. What appears to have been a common experience is that batch-mode training generates impressive MCUP statistics, but convergence is enough slower that the net gain is 0. It is difficult to make a scientific judgement still, mostly because the evidence appears to be largely anecdotal, e.g., "I really tried hard to make one (batch, or on-line) work, and it beat the other." I have used per-epoch training on an auto-association network, to extract "features" of protein local structures, using as few hidden units as possible. I spent a lot of time fine-tuning the training process, such as using different learning rates at different stages of training, different momentum terms, different ranges of random weights at the beginning, how large each "batch" is, etc. At the end I got a pretty good convergence rate. (Maybe I did not spend enough effort to fine-tune the per-sample training.) My feeling is that training a large network with lots of examples is still an art. You can almost always improve it if you spend time on it. Per-epoch training may have somewhat different behavior than per-sample training, so a different training schedule is often needed, and it takes time to figure out what a good one is. It also critically depends on the particular problem you want to solve. Besides the issue of convergence rate, I wonder if people have compared networks trained by a per-epoch schedule and a per-sample schedule, to see if they have the same level of generalization. One thing I noticed in my work is that per-sample training tends to make certain weights much larger than in per-epoch training. But I am not sure if this is true in general. - Xiru Zhang From neural!lamoon.neural!yann at att.att.com Fri Oct 18 11:08:03 1991 From: neural!lamoon.neural!yann at att.att.com (neural!lamoon.neural!yann@att.att.com) Date: Fri, 18 Oct 91 11:08:03 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 02:10:29 -0400. Message-ID: <9110181508.AA00547@lamoon> Scott Fahlman writes: >I avoid using the term "on-line" for what I call "per-sample" or >"continuous" updating of weights. I personally prefer the phrase "stochastic gradient" to all of these. >I guess you could measure redundancy by seeing if some subset of the >training data set produces essentially the same gradient vector as the full >set.
Hmmm, I think any dataset for which you expect good generalization is redundant. Train your net on 30% of the dataset, and measure how many of the remaining 70% you get right. If you get a significant portion of them right, then accumulating gradients on these examples (without updating the weights) would be little more than a waste of time. This suggests the following (unverified) postulate: The better the generalization, the bigger the speed difference between on-line (per-sample, stochastic....) and batch. In other words, any dataset interesting enough to be learned (as opposed to stored) has to be redundant. There might be no such thing as a large non-redundant dataset that is worth learning. -- Yann From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Oct 18 12:38:38 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 18 Oct 91 12:38:38 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 11:08:03 -0500. <9110181508.AA00547@lamoon> Message-ID: Original-From: Yann le Cun I personally prefer the phrase "stochastic gradient" to all of these. That's a fine term, but it seems to me that it refers to one of the effects of per-sample updating, and not to the mechanism itself. You might get a "stochastic gradient" because you are updating after every randomly chosen sample, but you might also get it from noise in the samples themselves. So if you want to refer to the choice of updating mechanism, and not to the quality of the gradient, I think it's better to use a term like "per-sample updating" that is nearly impossible for the reader to misunderstand. >I guess you could measure redundancy by seeing if some subset of the >training data set produces essentially the same gradient vector as the full >set. Hmmm, I think any dataset for which you expect good generalization is redundant. Train your net on 30% of the dataset, and measure how many of the remaining 70% you get right. If you get a significant portion of them right, then accumulating gradients on these examples (without updating the weights) would be little more than a waste of time. This suggests the following (unverified) postulate: The better the generalization, the bigger the speed difference between on-line (per-sample, stochastic....) and batch. In other words, any dataset interesting enough to be learned (as opposed to stored) has to be redundant. There might be no such thing as a large non-redundant dataset that is worth learning. I think we may be talking about two different things here. Let's assume that there is some underlying distribution that we are trying to model, and that we take some number of samples from this distribution to use as a training set. It is clearly true that there must be some "redundancy" in the underlying distribution if it is to be worth modelling. In this case, I'm using the term "redundancy" to mean that there's some sort of regular statistical structure that is stable enough to be of predictive value. Put another way, the distribution must not be totally random-looking; it has less than the maximum possible information per sample. However, given one of these redundant underlying distributions, we want to choose a training set that is large enough to be representative of the distribution (and to separate signal from noise), but not so large as to be redundant itself. This training set is what I was referring to in my earlier message. 
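Fahlman's subset-gradient check can be made concrete with a short sketch. The linear-model gradient and the toy data below are stand-ins invented for illustration; any network's gradient routine could be substituted.

import numpy as np

def mse_grad(w, X, y):
    # gradient of mean squared error for the linear model y ~ X @ w (a stand-in)
    return 2.0 * X.T @ (X @ w - y) / len(y)

def subset_agreement(grad_fn, w, X, y, frac, rng):
    # cosine similarity between the gradient on a random subset and on the full set
    m = max(1, int(frac * len(X)))
    idx = rng.choice(len(X), size=m, replace=False)
    g_full = grad_fn(w, X, y)
    g_sub = grad_fn(w, X[idx], y[idx])
    return float(g_full @ g_sub /
                 (np.linalg.norm(g_full) * np.linalg.norm(g_sub) + 1e-12))

rng = np.random.RandomState(0)
X = rng.randn(2000, 20)
y = X @ rng.randn(20) + 0.1 * rng.randn(2000)
w = np.zeros(20)
for frac in (0.02, 0.05, 0.1, 0.3):
    print(frac, subset_agreement(mse_grad, w, X, y, frac, rng))

If the agreement is already close to 1 at small subset fractions, accumulating gradients over the rest of the set adds little new information before a weight update, which is the operational sense of redundancy being debated here.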
I think it is quite possible for the training set to be large, not internally redundant, and interesting in the sense that it models an predictable (redundant) underlying distribution. And this is the kind of case where I think that batch-updating has an advantage. -- Scott Fahlman From english at sun1.cs.ttu.edu Fri Oct 18 14:20:19 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Fri, 18 Oct 91 13:20:19 CDT Subject: batch-mode parallel implementations Message-ID: <9110181820.AA00593@sun1.cs.ttu.edu> Scott Fahlman remarked, > As for speed of convergence, continuous updating clearly beats per-epoch > updating if the training set is highly redundant. Another important factor is the autocorrelation of the training sequence. Consider a (highly redundant) training sequence that starts with 1000 examples of A and ends with 1000 examples of B. With continuous updating, there is a good chance that learning the B examples will cause the learned response to A examples to be lost. The obvious answer, in this contrived case, is to alternate presentations of A and B examples. Now for an uncontrived case: Suppose we are training a recurrent net for speaker-independent speech recognition, and that inputs to the net are power spectra extracted from the speech signal at fixed intervals. There are relatively long intervals in which the speech sound (spectrum) does not change much. There are even longer intervals in which the speaker does not change. Reordering the spectra for an utterance is clearly not an option, and continuous updating seems imprudent even though the redundancy of the training set is high. I'm sure there are plenty of nonstationary time series, other than speech, which present the same problems. In response to Scott's remark on the batch size used with an accelerated convergence procedure, > It must be sufficiently large to give a reasonably stable picture of > the overall gradient, but not so large that the gradient is computed > many times over before a weight-update cycle occurs. I would like to mention a case where, surprisingly, even large batches gave instability. The application was recognition of handwritten lower-case letters, and the network was of the LeCun variety. The training set comprised three batches of 1950 letter images (a total of 5850 images). This partition was chosen randomly. Fahlman's quickprop behaved poorly, and with some close inspection I found a number of weights for which the partial derivative was changing sign from one batch to the next. Further, the magnitudes of those partials were not always small. In short, the performance surfaces for the three batches differed considerably. The moral: You may have to make a single batch of the entire training set, even when working with fairly large training sets. -- Tom English english at sun1.cs.ttu.edu From nowlan at helmholtz.sdsc.edu Fri Oct 18 14:30:16 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Fri, 18 Oct 91 11:30:16 MST Subject: batch-mode parallel implementations In-Reply-To: Your message of Thu, 17 Oct 91 10:46:39 -0400. Message-ID: <9110181830.AA14145@bose> A couple of clarifications with regards to Yann's post: i) The dataset used in the comparison had a high degree of redundancy. ii) The "batch-mode" back-prop was vanilla fixed-step gradient descent, not a second order method. The issue of "batch" versus "on-line" is still a very open one. For relatively small problems (for me < ~5000 cases) I prefer conjugate gradient because of accuracy and no need to tune parameters. 
These techniques are also very easy to parallelize over cases. I have also implemented on a Cray a BP simulator that vectorized over connections rather than cases, and could implement on-line or batch techniques with ease. My experience here suggested that speed-ups could be obtained when the network had as few as a few thousand connections. - Steve From yoshua at psyche.mit.edu Sat Oct 19 12:55:19 1991 From: yoshua at psyche.mit.edu (Yoshua Bengio) Date: Sat, 19 Oct 91 12:55:19 EDT Subject: online parallel implementation Message-ID: <9110191655.AA12225@psyche.mit.edu> This message concerns an attempt to apply some parallelism to online back-propagation. I recently had access to N = 20 to 40 NeXT workstations on which I could perform learning experiments with back-propagation. My training database was huge (TIMIT, more than half a million patterns, but organized in sequences - sentences - of about 100 'frames' each), so I did not want to use a batch-based method. The idea I attempted to implement was the following: Split the database into N parts. Run N versions of the network, one on each of the N parts (on the N machines). Share weights _asynchronously_ among the networks, after 1 or more sequences. A 'server' program running on a separate machine received requests from any of the other machines to collect its contribution and return to it the current global moving average of the weights. Since I was running backpropagation through time, the weight update was performed only after each sequence even in the single-machine implementation, hence the update was not much less 'online' in the parallel implementation. Unfortunately, I no longer have access to these machines - because I have moved to a new institution - and I didn't have time to perform enough experiments and compare this approach with others. Yoshua Bengio MIT From honavar at iastate.edu Sat Oct 19 13:30:33 1991 From: honavar at iastate.edu (honavar@iastate.edu) Date: Sat, 19 Oct 91 12:30:33 CDT Subject: redundancy (was Re: batch-mode implementations) In-Reply-To: Your message of Fri, 18 Oct 91 11:08:03 -0400. <9110181508.AA00547@lamoon> Message-ID: <9110191730.AA07387@iastate.edu> Scott Fahlman wrote: >>I guess you could measure redundancy by seeing if some subset of the >>training data set produces essentially the same gradient vector as the full >>set. Yann Le Cun responded: > Hmmm, I think any dataset for which you expect good generalization is redundant. > Train your net on 30% of the dataset, and measure how many of the remaining > 70% you get right. If you get a significant portion of them right, then > accumulating gradients on these examples (without updating the weights) would > be little more than a waste of time. It is probably useful to distinguish between redundancy WITHIN the training set and the redundancy BETWEEN the training and test sets (or, redundancy in the combined training and test sets). I suspect Scott Fahlman was referring to the redundancy (R1) within the training set while Le Cun was referring to the redundancy (R2) in the set formed by the union of training set and test set (please correct me if I am wrong). I would expect the relationship between generalization and R1 to be quite different from the relationship between generalization and R2. Whether the two measures of redundancy will be the same or not will almost certainly depend on the method(s) (e.g., sampling procedures, sample size reduction techniques) used to arrive at the data actually given to the network during training.
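Le Cun's operational test quoted above (train on 30% of the data and see how much of the remaining 70% is already predicted correctly) can be sketched as follows. A 1-nearest-neighbour classifier and a two-cluster toy set stand in for "your net" and the real data; both are assumptions made purely for illustration.

import numpy as np

def holdout_redundancy(X, labels, train_frac=0.3, seed=0):
    # train on a fraction of the data, score accuracy on the held-out remainder
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    tr, ho = idx[:n_train], idx[n_train:]
    # classify each held-out pattern by its nearest training pattern
    dists = ((X[ho][:, None, :] - X[tr][None, :, :]) ** 2).sum(axis=2)
    pred = labels[tr][dists.argmin(axis=1)]
    return float((pred == labels[ho]).mean())

# toy usage: two well-separated Gaussian clusters
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(500, 5) + 3.0, rng.randn(500, 5) - 3.0])
labels = np.array([0] * 500 + [1] * 500)
print("held-out accuracy:", holdout_redundancy(X, labels))

A held-out score near 1.0 means most of the set can be predicted from a fraction of it, i.e. the set is redundant in the sense Le Cun uses above.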
In fact, if a training set T (obtained say, by random sampling from some underlying distribution) were to be preprocessed in some fashion (e.g., using statistical techniques) and reduced training set T' was obtained from T after eliminating the "redundant" samples, clearly the redundancy (R1') within the reduced training set T' will be much smaller than the redundancy (R1) in the original training set T although the overall redundancy (R2) in the set formed by the union of T and the test data may be more or less equal to the redundancy (R2') in the set formed by the union of T' and the test data. My guess is that the generalization on the test data will be more or less the same irrespective of whether T or T' is used for training the network. Vasant Honavar honavar at iastate.edu From nowlan at helmholtz.sdsc.edu Sat Oct 19 15:05:24 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Sat, 19 Oct 91 12:05:24 MST Subject: Paper Announcement (Neuroprose) Message-ID: <9110191905.AA15742@bose> ** Paper available via Neuroprose *************************************** ** Please do not forward to other mailing lists or boards. Thank you. ** The following paper has been placed in the Neuroprose archives at Ohio State. The file is nowlan.soft-share.ps.Z Ftp instructions follow the abstract. ----------------------------------------------------- Simplifying Neural Networks by Soft Weight-Sharing Steven J. Nowlan Computational Neuroscience Laboratory The Salk Institute P.O. Box 5800 San Diego, CA 92186-5800 Geoffrey E. Hinton Department of Computer Science University of Toronto Toronto, Canada M5S 1A4 ABSTRACT: One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. Simple versions of this approach include penalizing the sum of the squares of the weights or penalizing the number of non-zero weights. We propose a more complicated penalty term in which the distribution of weight values is modelled as a mixture of multiple gaussians. A set of weights is simple if the weights have high probability densities under the mixture model. This can be achieved by clustering the weights into subsets with the weights in each cluster having very similar values. Since we do not know the appropriate means or variances of the clusters in advance, we allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations on two different problems demonstrate that this complexity term is more effective than previous complexity terms. ----------------------------------------------------- FTP INSTRUCTIONS Either use "Getps nowlan.soft-share.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get nowlan.soft-share.ps.Z ftp> quit unix> uncompress nowlan.soft-share.ps.Z unix> lpr -s nowlan.soft-share.ps (or however you print postscript) Steven J. Nowlan Computational Neuroscience Laboratory The Salk Institute P.O. 
Box 85800 San Diego, CA 92186-5800 Work Phone: 619-453-4100 X463 e-mail: nowlan at helmholtz.sdsc.edu From tgd at guard.berkeley.edu Sat Oct 19 17:09:06 1991 From: tgd at guard.berkeley.edu (Tom Dietterich) Date: Sat, 19 Oct 91 14:09:06 -0700 Subject: batch-mode parallel implementations In-Reply-To: Tom English's message of Fri, 18 Oct 91 13:20:19 CDT <9110181820.AA00593@sun1.cs.ttu.edu> Message-ID: <9110192109.AA04626@guard.berkeley.edu> There has been a fair amount of work in decision-tree learning on the issue of breaking large training sets into smaller batches. In 1980, Quinlan introduced a method called "windowing" in which a small sample (or window) of the training data is initially drawn at random. The algorithm is trained on this window and then tested on the remainder of the data (that was excluded from the window). Then, some fraction of the misclassified examples (possibly all of them) are added to the window. Generally speaking, in noise-free domains, windowing works quite well. A very high-performing decision tree can be learned with a relatively small window. However, for noisy data, the general experience has been that the window eventually grows to include the entire training set. Jason Catlett (Sydney U) recently completed his dissertation on testing windowing and various other related tricks on datasets of roughly 100K examples (straight classification problems). I recommend his papers and thesis. His main conclusion is that if you want high performance, you need to look at all of the data. --Tom From ross at psych.psy.uq.oz.au Sat Oct 19 19:50:16 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Sun, 20 Oct 1991 09:50:16 +1000 Subject: batch & on-line training Message-ID: <9110192350.AA02282@psych.psy.uq.oz.au> On the topic of batch versus on-line training, Kamil at apple.com writes: > ... there doesn't seem to be a > compelling argument for preferring one or the other IN PRINCIPLE. I would like to turn the dichotomy into a trichotomy and argue that there is an 'in principle' reason for a preference. I want to add one-shot learning, which I define (on the spur of the moment) to be successful learning from one occasion of exposure to the input. This phenomenon is known to happen in animals (e.g. it can happen in taste aversion conditioning) and can happen in humans (e.g. recognition of an abstract painting seen only once before). One-shot learning becomes critical if you are trying to perform 'cognitive' tasks - when you learn the route to a new office you don't need hundreds or thousands of exposures to get it right. Obviously, one-shot learning can't be expected to happen in all circumstances: you have to be working in a constrained problem domain that can support it and the learner has to have the background knowledge that will support what is to be learned. Most of the work that is done with backprop and its relatives starts with near to a tabula rasa and all the time and effort goes into creating the universe from only the input data. Obviously, techniques do exist for one-shot learning: e.g. simple delta rule with a learning rate of 1. The problem is that they fail on the problems that people regard as interesting - inputs non-orthogonal and hidden units required. The challenge is to find a one-shot learning algorithm that can work on interesting problems. I believe that this will require strong architectural and problem data constraints. 
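As a toy illustration of the one-shot case Gayler mentions (a simple delta rule with a learning rate of 1), the sketch below stores five associations in a single pass when the input patterns are orthonormal, and shows the interference that appears once the inputs overlap. The code and its numbers are invented for illustration.

import numpy as np

def one_shot_train(patterns, targets, lr=1.0):
    # one pass, one delta-rule update per pattern, on a single linear layer
    W = np.zeros((targets.shape[1], patterns.shape[1]))
    for x, t in zip(patterns, targets):
        W += lr * np.outer(t - W @ x, x)      # delta rule: error times input
    return W

rng = np.random.RandomState(0)
targets = rng.randn(5, 3)

ortho = np.eye(8)[:5]                         # 5 orthonormal input patterns
W = one_shot_train(ortho, targets)
print("orthonormal inputs, max recall error:",
      float(np.max(np.abs(W @ ortho.T - targets.T))))       # 0: stored in one pass

overlap = np.eye(8)[:5] + 0.6                 # overlapping, non-orthogonal patterns
overlap /= np.linalg.norm(overlap, axis=1, keepdims=True)
W = one_shot_train(overlap, targets)
print("overlapping inputs, max recall error:",
      float(np.max(np.abs(W @ overlap.T - targets.T))))     # clearly nonzero: interference

The failure on the overlapping inputs is exactly the "inputs non-orthogonal" limitation noted above; the challenge Gayler poses is to get one-shot behaviour without that restriction.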
I see the current heavy use of gradient-descent techniques as analogous to the period in the history of AI when researchers looked for general problem solving techniques that were universally applicable. General techniques worked on toy problems but rapidly bogged down on real problems. In BP, we have a technique for learning arbitrary mappings, and we pay for it with excruciatingly slow learning. To summarise: IF you want to perform cognitive tasks THEN 'in principle' one shot learning is the only training regime that is acceptable (although slower learning may be required to get the net to the point where it can learn in one shot). All you have to do is invent a good one-shot learning scheme :-). Ross Gayler ross at psych.psy.uq.oz.au From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Oct 20 11:08:11 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 20 Oct 91 11:08:11 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 13:20:19 -0600. <9110181820.AA00593@sun1.cs.ttu.edu> Message-ID: I would like to mention a case where, surprisingly, even large batches gave instability. The application was recognition of handwritten lower-case letters, and the network was of the LeCun variety. The training set comprised three batches of 1950 letter images (a total of 5850 images). This partition was chosen randomly. Fahlman's quickprop behaved poorly, and with some close inspection I found a number of weights for which the partial derivative was changing sign from one batch to the next. Further, the magnitudes of those partials were not always small. In short, the performance surfaces for the three batches differed considerably. The moral: You may have to make a single batch of the entire training set, even when working with fairly large training sets. -- Tom English Note that it is OK to switch from one training set to another when using Quickprop, but that every time you change the training set you *must* zero out the prev-slopes and delta vectors. This prevents to quadratic part of the algorithm from trying to draw a parabola between two slopes that are not closely related. If you don't do this, that one step can badly mess up the weights you've laboriously accumulated so far. Of course, if you do this after every sample, the quadratic acceleration never kicks in and you end up with nothing more than plain old backprop without momentum. If you want to get any benefit from quickprop, you have to run each distinct training set for at least a few cycles. If you were aware of all that (it's unclear from your message) and still experienced instability, then I would say that the batches, even though they are fairly large, are not large enough to provide a fair representation of the underlying distribution. -- Scott From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Sun Oct 20 19:55:51 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Sun, 20 Oct 91 18:55:51 EST Subject: One-shot learning Message-ID: Ross Gayler wrote: >To summarise: IF you want to perform cognitive tasks THEN 'in principle' one >shot learning is the only training regime that is acceptable (although slower >learning may be required to get the net to the point where it can learn in >one shot). All you have to do is invent a good one-shot learning scheme :-). 
Although one-shot (one-trial) learning may not be the only mode of learning in our cognitive processes, it is true that learning in our cognitive processes does not take as many repetitions (epochs) as current BPNNs take. One-shot learning can serve as a goal and a criterion for learning schemes, both in cognitive learning processes and in learning systems for practical applications. Our work on PPNN (I posted the abstract several days ago) was originally driven by one-trial learning. Although PPNN has not reached one-trial learning, it has stepped closer to it. In order to isolate the topological effect, we constrained PPNN to be the same as BPNN in all aspects except the topology. It was shown that the stereotopology alone can improve the training time (epochs) by several orders of magnitude (due to the characteristics of PPNN's stereotopology, we used the average training time instead of epochs to measure the rate of convergence). It was found that the more difficult the problem is, the larger this gain is. This topological speedup lies in the fact that there is a cause of slowness in the original planar topology of BPNN that cannot be accounted for by the learning algorithm or unit characteristics (no matter what learning algorithm is used or what unit response characteristics are employed, this cause of slow learning always exists; it is inherent to the planar topology of BPNN). Bo Xu Indiana University itgt500 at indycms.iupui.edu From mmoller at daimi.aau.dk Mon Oct 21 08:13:06 1991 From: mmoller at daimi.aau.dk (mmoller@daimi.aau.dk) Date: Mon, 21 Oct 91 13:13:06 +0100 Subject: Batch methods versus stochastic methods... Message-ID: <9110211213.AA13826@sinope.daimi.aau.dk> --- Concerning the discussion about batch-update versus stochastic update. For about the last 6 months we have been working on online versus batch problems. A preprint of a paper, which tries to describe why the stochastic methods are in some instances better than the deterministic batch methods, will soon be available via the neuroprose archive. The paper also introduces a new algorithm which combines the good properties of the stochastic methods as well as the batch methods. Our results so far can be summarized as follows: The redundancy of the training set plays, as has been mentioned before, a very important role. It is not clear, however, how to define this redundancy in a proper way. The usual definition of redundancy taken from information theory can give a hint about the redundancy but cannot in any obvious way provide a precise definition, because this would involve the information content of the training set as well as the internal dynamics (the structure) of the network. So when we discuss the concept of redundancy we should be aware that redundancy in the context of learning in feedforward networks is not very well defined. Another issue, which I think is even more important than the concept of redundancy, is the structure of the error surface. The "true" error surface, which is given by the whole training set, is, as we know, often characterized by a large number of flat regions and very steep, narrow ravines. Batch methods operate in the true but very complex error surface, while stochastic methods operate in partial error surfaces which are only approximations to the true error surface. So stochastic methods make a noisy, stochastic search in the true error surface, which can help them through the flat regions.
One can think of the stochastic search as a kind of "simulated annealing" approach in which an increase of the error is also allowed. The algorithm we propose is based on a combination of the good properties of stochastic and batch algorithms. The main idea is to use a conjugate gradient algorithm on blocks of data (block-update or semi-batch update). Because the conjugate gradient algorithm updates weights with variable (and sometimes large) step sizes, a validation scheme is used to control the updates. Through a simple sampling technique we estimate the probability that an update will decrease the total error. This probability is then used to decide whether to update or not. The number of patterns needed in each block-update is variable and controlled by an adaptive optimization scheme during training. We have done some experiments with this approach on the nettalk problem. Our results so far show that the approach decreases the error faster per epoch than stochastic backpropagation. More computation is, however, needed per epoch. An interesting observation is that the number of blocks needed to make an update grows during learning, so that after a certain number of epochs the block size is equal to the number of patterns. When this happens the algorithm is equivalent to a traditional batch-mode algorithm and no validation is needed anymore. In order to be able to draw some definite conclusions we need a few more experiments on different training sets. Unfortunately, we do not have any datasets of the proper size. So I would appreciate it if anyone could tell me where to find big datasets that are publicly available. -- Martin M ----------------------------------------------------------------------- Martin F. Moller email: mmoller at daimi.aau.dk Computer Science Department phone: +45 86202711 5223 Aarhus University fax: +45 86135725 Ny Munkegade, Building 540 8000 Aarhus C Denmark ---------------------------------------------------------------------- From giles at research.nec.com Mon Oct 21 09:15:03 1991 From: giles at research.nec.com (Lee Giles) Date: Mon, 21 Oct 91 09:15:03 EDT Subject: Announcement of NIPS Workshop Message-ID: <9110211315.AA19197@fuzzy.nec.com> Announcement of NIPS Workshop: ************************************************************************** RECURRENT NETWORKS: THEORY AND APPLICATIONS Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains one of the important open issues in the neural network area. Training algorithms are very inefficient in terms of memory demands, computational needs or both. Little is known about convenient architectures for recurrent networks. The number of known successful applications is very limited. Even for static applications (operation in the "fixed point mode"), recurrent networks are more general, and therefore more powerful, in principle, than feedforward ones. However, once again, little is known about their actual (dis)advantages, convenient architectures, successful applications, etc. We welcome proposals for presentations (no more than one page in length) related to the theme of theory or applications of recurrent networks.
Subject to the number of received proposals, we envisage a two day workshop, one day theory, the next day applications, with 15-20 minute presentations, each followed by about 10 minutes of discussion. Please send proposals to Lee Giles. Organizers: Professor Luis Borges de Almeida INESC Rua Alves Redol, 9 Apartado 10105 1017 LISBOA CODEX PORTUGAL 351-1-544607 inesc!lba at relay.EU.net (or) lba at sara.inesc.pt C. Lee Giles NEC Research Institute 4 Independence Way Princeton, N.J. 08540 609-951-2642 FAX: 609-951-2482 giles at research.nj.nec.com Richard Rohwer Centre for Speech Technology Research Edinburgh University 80, South Bridge Edinburgh EH1 1HN, Scotland (44 or 0) (31) 650-2764 FAX: (44 or 0) (31) 226-2730 rr%ed.cstr at nsfnet-relay.ac.uk (or) rr at uk.ac.ed.cstr ************************************************************************** C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From DOW_ERNST at LILLY.COM Mon Oct 21 10:16:00 1991 From: DOW_ERNST at LILLY.COM (Ernst Dow, 276-9916) Date: Mon, 21 Oct 1991 09:16 EST Subject: one-shot learning Message-ID: <01GC03SM0RHC0000EE@GATEWAY.LILLY.COM> Ross Gayler writes: I want to add one-shot learning, which I define (on the spur of the moment) to be successful learning from one occasion of exposure to the input. This phenomenon is known to happen in animals (e.g. it can happen in taste aversion conditioning) and can happen in humans (e.g. recognition of an abstract painting seen only once before). etc. If it was a big enough event in your life, you will have memorized the event. If it was not so monumental, you can help your memory by replaying the event in your mind. But in this case, we are talking memorization, not generalization. You may be able to identify the painting you saw before, but could you make the leap to recognizing all other abstract paintings? Ernst Dow ernst at lilly.com From: DOW ERNST (MCVAX0::TC64566) From mike at psych.ualberta.ca Mon Oct 21 12:15:37 1991 From: mike at psych.ualberta.ca (Mike R. W. Dawson) Date: Mon, 21 Oct 1991 10:15:37 -0600 Subject: Open position in cognitive psychology Message-ID: <9110211613.AA01542@psych.ualberta.ca> I'd like to bring the following open position in cognitive psychology to the attention of anyone who might be modeling cognitive processes with their networks: ======================================================================= Cognitive or Developmental Psychologists The Department of Psychology, University of Alberta, invites applications for one and, subject to budgetary considerations, possibly two tenure track positions at the level of beginning Assistant Professor, salary range: $38,955-$55,755. Candidates with research expertise in either COGNITIVE PSYCHOLOGY or DEVELOPMENTAL PSYCHOLOGY will be considered. The position in Cognitive is open with respect to area of specialization. The position in Developmental is also open with respect to area, but there is some preference for individuals with interests in language development, conceptual development, mathematical cognition, reading, scientific reasoning, spelling, or writing. Current Developmental faculty conduct research on emergent literacy, reading, and arithmetic skill. Decisions will be made on the basis of demonstrated research excellence, interactions with colleagues, and teaching ability. 
Applications should include a curriculum vita, three letters of recommendation, and reprints or recent publications. These materials should be sent, as appropriate, to Cognitive Search Chair, Dr. Peter Dixon, or Developmental Search Chair, Dr. Jeffrey Bisanz, Department of Psychology, University of Alberta, Edmonton, Alberta, Canada T6G 2E9. To receive full consideration, all materials must be received by January 1, 1992. The University of Alberta is committed to the principle of equity in employment. The University encourages applications from aboriginal persons, disabled persons, members of visible minorities and women. ======================================================================== Michael R. W. Dawson email: mike at psych.ualberta.ca Department of Psychology University of Alberta Edmonton, Alberta Tel: +1 403 492 5175 T6G 2E9, Canada Fax: +1 403 492 1768 From bap at james.psych.yale.edu Mon Oct 21 13:41:35 1991 From: bap at james.psych.yale.edu (Barak Pearlmutter) Date: Mon, 21 Oct 91 13:41:35 -0400 Subject: Paper Announcement (Neuroprose) In-Reply-To: "Steven J. Nowlan"'s message of Sat, 19 Oct 91 12:05:24 MST <9110191905.AA15742@bose> Message-ID: <9110211741.AA03347@james.psych.yale.edu> The following paper has not been placed in the Neuroprose archives at Ohio State. The file is not pearlmutter.soft-share.soft-share.ps.Z. Ftp instructions follow the abstract. ----------------------------------------------------- Simplifying Neural Network Soft Weight-Sharing Measures by Soft Weight-Measure Soft Weight Sharing Barak Pearlmutter Department of Psychology P.O. Box 11A Yale Station New Haven, CT 06520-7447 ABSTRACT: It has been shown by Nowlan and Hinton (1991) that it is advantagious to construct weight complexity measures for use in weight regularization through the use of EM, instead of relying on some a-priori complexity measure, or even worse, neglecting regularization by assuming a uniform distribution. Their work can be regarded as a generalization of the "Optimal Brain Damage" of Le Cunn et al (1990), in which the distribution of weights is estimated with a histogram, a peculiar functional form for a distribution. Nowlan and Hinton assume a much simpler functional form for the distribution, avoiding overfitting and therefore overregularization. However, they disregard the issue of regularization of the regularizer itself. Just as certain weights might be considered a-priori quite unlikely, certain distributions of weights may be considered a-priori quite unlikely. To solve this problem, we introduce a regularization term on the parameters of the weight distribution being estimated. This regularization term is itself determined by a distribution over these distributional parameters. In this light, Nowlan and Hinton (1991) make the uniform distributional parameter distribution assumption. Here, we estimate the distribution of distributions by running an ensemble of networks, with EM used to estimate the weight distribtion of each network (following Nowlan and Hinton), but we then use EM to estimate the distribution of distributions across networks. Of course, each estimated distribution is used to regularize the parameters over which that distribution is defined, leading to regularization of the individual network regularizers. 
We do not consider how to estimate the a-priori distribution which might be used to regularize the distribution being used to regularize the distribution being used to regularize the weights being estimated from the data, which will be the explored in a future paper. ----------------------------------------------------- FTP INSTRUCTIONS Either use "getps pearlmutter.soft-share.soft-share.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get pearlmutter.soft-share.soft-share.ps.Z ftp> quit unix> uncompress pearlmutter.soft-share.soft-share.ps.Z unix> lpr -s pearlmutter.soft-share.soft-share.ps Barak Pearlmutter Department of Psychology P.O. Box 11A Yale Station New Haven, CT 06520-7447 Work Phone: 203 432-7011 From ANDERSON%BROWNCOG.BITNET at mitvma.mit.edu Mon Oct 21 15:46:00 1991 From: ANDERSON%BROWNCOG.BITNET at mitvma.mit.edu (ANDERSON%BROWNCOG.BITNET@mitvma.mit.edu) Date: Mon, 21 Oct 91 14:46 EST Subject: Technical Report Announcement Message-ID: Technical Report 91-3 available from: Department of Cognitive and Linguistic Sciences Box 1978, Brown University, Providence, RI 02912 A Study in Numerical Perversity: Teaching Arithmetic to a Neural Network James A. Anderson, Kathryn T. Spoehr, and David J. Bennett Department of Cognitive and Linguistic Sciences Box 1978 Brown University Providence, RI 02912 Abstract There are only a few hundred well-defined facts in elementary arithmetic, but humans find them hard to learn and hard to use. One reason for this difficulty is that the structure of elementary arithmetic lends itself to severe associative interference. If a neural network corresponds in any sense to brain-style computation, then we should expect similar difficulties teaching elementary arithmetic to a neural network. We find this observation is correct for a simple network that was taught the multiplication tables. We can enhance learning of arithmetic by forming a hybrid coding for the representation of number that contains a powerful analog or "sensory" component as well as a more abstract component. When the simple network uses a hybrid representation, many of the effects seen in human arithmetic learning are reproduced, including overall error patterns and response time patterns for false products. An extension of the arithmetic network is capable of being flexibly programmed to correctly answer questions involving terms such as "bigger" or "smaller." Problems can be answered correctly, even if the particular comparisons involved had not been learned previously. Such a system is genuinely creative and flexible, though only in a limited domain. It remains to be seen if the computational limitations of this approach are coincident with the limitations of human cognition. A version of this report will appear as a chapter in: "Neural Networks for Knowledge Representation and Inference" Edited by Daniel S. 
Levine and Manuel Aparicio, IV To be published by Lawrence Erlbaum Associates, Hillsdale, New Jersey Copies can be obtained by sending an email message to: LI700008 at brownvm.BITNET or to: anderson at browncog.BITNET From english at sun1.cs.ttu.edu Mon Oct 21 17:12:09 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Mon, 21 Oct 91 16:12:09 CDT Subject: batch-mode parallel implementations Message-ID: <9110212112.AA01265@sun1.cs.ttu.edu> With regard to my earlier posting on problems I encountered in applying Quickprop, Scott Fahlman has replied: Note that it is OK to switch from one training set to another when using Quickprop, but that every time you change the training set you *must* zero out the prev-slopes and delta vectors. If you want to get any benefit from quickprop, you have to run each distinct training set for at least a few cycles. If you were aware of all that (it's unclear from your message).... Well, I was not aware of what others were doing in practice. Scott's original tech report on Quickprop gave results only for the case of once-per-epoch weight updates. I apologize for referring to my implementation with once-per-batch weight updates and no zeroing between batches as "Fahlman's Quickprop." What I *did* understand was that Quickprop's attempt to approximate the error surface with a paraboloid was going to be fouled-up if the "pictures" of the error surface gleaned from different batches were substantially different. Training for multiple iterations with one batch, and then resetting the variables used in estimating the shape of the error surface before going on to the next batch would certainly eliminate the problem I described. The prospect of choosing the number of iterations per batch does not thrill me, however. In general, I hate parameter tweaking. From my perspective, the worst thing about parameter tweaking is that we don't really know how it affects the quality of the final network obtained. Also, exploring the effects of different parameter settings takes too much of *my* time. I want a procedure that does not require tweaking and that runs at a reasonable fraction of the speed of a "well-tuned" stochastic gradient descent procedure for a wide range of problems. (I haven't experimented with conjugate gradient descent yet, but it seems to fit my bill.) --Tom english at sun1.cs.ttu.edu From giles at research.nec.com Tue Oct 22 15:51:28 1991 From: giles at research.nec.com (Lee Giles) Date: Tue, 22 Oct 91 15:51:28 EDT Subject: Announcement of NIPS (Neural Information Processing Systems) Workshop Message-ID: <9110221951.AA21064@fuzzy.nec.com> Announcement of NIPS (Neural Information Processing Systems) Workshop: Dec 6-7, Vail, Colorado. ************************************************************************** RECURRENT NETWORKS: THEORY AND APPLICATIONS Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains one of the important open issues in the neural network area. Training algorithms are very inefficient in terms of memory demands, computational needs or both. Little is known about convenient architectures for recurrent networks. The number of known successful applications is very limited. 
Even for static applications (operation in the "fixed point mode"), recurrent networks are more general, and therefore more powerful, in principle, than feedforward ones. However, once again, little is known about their actual (dis)advantages, convenient architectures, successful applications, etc. We welcome proposals for presentations ( no more than one page in length) related to the theme of theory or applications of recurrent networks. Subject to the number of received proposals, we envisage a two day workshop, one day theory, the next day applications, with 15-20 minute presentations, each followed by about 10 minutes of discussion. Please send proposals to Lee Giles. Organizers: Professor Luis Borges de Almeida INESC Rua Alves Redol, 9 Apartado 10105 1017 LISBOA CODEX PORTUGAL 351-1-544607 inesc!lba at relay.EU.net (or) lba at sara.inesc.pt C. Lee Giles NEC Research Institute 4 Independence Way Princeton, N.J. 08540 609-951-2642 FAX: 609-951-2482 giles at research.nj.nec.com Richard Rohwer Centre for Speech Technology Research Edinburgh University 80, South Bridge Edinburgh EH1 1HN, Scotland (44 or 0) (31) 650-2764 FAX: (44 or 0) (31) 226-2730 rr%ed.cstr at nsfnet-relay.ac.uk (or) rr at uk.ac.ed.cstr ************************************************************************** C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From thsspxw at iitmax.iit.edu Tue Oct 22 19:10:57 1991 From: thsspxw at iitmax.iit.edu (Peter Wohl) Date: Tue, 22 Oct 91 18:10:57 CDT Subject: batch-mode parallel implementations In-Reply-To: <8431.688104706@B.GP.CS.CMU.EDU>; from "Connectionist_Research_Group@B.GP.CS.CMU.EDU" at Oct 22, 91 12:11 (midnight) Message-ID: <9110222311.AA09935@iitmax.iit.edu> Dear connectionists, I have some comments on several of these, so I decided not to include all the history of this discussion in my reply (you read it anyway). So here I go: 1. Given per-sample training, one still faces the problem of how to deal with really large networks (thousands of neurons and hundreds of thousands connections) on a parallel machine that has far fewer processors. What has been proposed: a) SIMD (don't cry for unused processors, as long as you can communicate fast enough); b) MIMD with clustering neurons somehow together, to increase granularity (SIMD also needs some), problem here being dependence on VERY particular nets (usually layers with powers of 2 neurons); c) re-writing the communication of the algorithm (see for example my paper this coming Nov at ICTAI'91). 2. I agree that epoch-training is probably desirable. How large is a "typical" epoch for a "large" net (thousands of neurons, fraction of million connections at least) ? Tens of vectors, hundreds ? I would say, no more than few hundreds. 3. "Recall" (forward propagation with no weight update) is far easier to parallelize, since there is no end-of-epoch bottleneck (barrier synch). In some results (to be published next year), we achieved (on 32 BBN Butterfly processors) almost 2 million connec-presen/sec with backprop., but over 5 million at recall. (2.5 million if you "adjust" forward-only by dividing by two, to match the backprop figure more closely). To summarize, I think the real problem of parallelizing ANNs applies when at least one of net-size or training-epoch-size is large (and thus slow when run sequentially). And don't forget: net architecture could change during training (e.g. 
cascade corr), and still keep it parallel. Thanks for your patience, Peter Wohl thsspxw at iitmax.iit.edu From spotter at darwin.bio.uci.edu Tue Oct 22 19:17:52 1991 From: spotter at darwin.bio.uci.edu (Steve Potter) Date: Tue, 22 Oct 91 16:17:52 PDT Subject: Continuous vs. Batch learning Message-ID: <9110222317.AA22627@sanger.bio.uci.edu> It is pretty clear to me that biological neural networks have all adapted to prefer the continuous learning technique, as we can verify for humans by remembering something that we only saw (or heard, etc.) once. One-trial learning paradigms abound in the behavioral literature. I cant think of any biological examples of batch learning, in which sensory data are saved until a certain number of them can be somehow averaged together and conclusions made and remembered. Any ideas? Anyway, perhaps we should take an example from nature, which has been optimizing things far longer than we have! Steve Potter UC Irvine Psychobiology dept. Irvine, CA 92717 spotter at darwin.bio.uci.edu From jbower at cns.caltech.edu Wed Oct 23 00:47:51 1991 From: jbower at cns.caltech.edu (Jim Bower) Date: Tue, 22 Oct 91 21:47:51 PDT Subject: CNS*92 Message-ID: <9110230447.AA01301@cns.caltech.edu> CALL FOR PAPERS First Annual Computation and Neural Systems Meeting CNS*92 Tuesday, July 26 through Sunday, July 31 1992 San Francisco, California This is the first annual meeting of an inter-disciplinary conference intended to address the broad range of research approaches and issues involved in the general field of computational neuroscience. The meeting itself has grown out of a workshop on "The Analysis and Modeling of Neural Systems" which has been held each of the last two years at the same site. The strong response to these previous meetings has suggested that it is now time for an annual open meeting on computational approaches to understanding neurobiological systems. CNS*92 is intended to bring together experimental and theoretical neurobiologists along with engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in understanding how neural systems compute. The meeting will equally emphasize experimental, model-based, and more abstract theoretical approaches to understanding neurobiological computation. The first day of the meeting (July 26) will be devoted to tutorial presentations and workshops focused on particular technical issues confronting computational neurobiology. The next three days will include the main technical program consisting of plenary, contributed and poster sessions. There will be no parallel sessions and the full text of presented papers will be published. Following the regular session, there will be two days of focused workshops at a site on the California coast (July 30-31). Participation in the workshops is restricted to 75 attendees. Technical Program: Plenary, contributed and poster sessions will be held. There will be no parallel sessions. The full text of presented papers will be published. Presentation categories: A. Theory and Analysis B. Modeling and Simulation C. Experimental D. Tools and Techniques Themes: A. Development B. Cell Biology C. Excitable Membranes and Synaptic Mechanisms D. Neurotransmitters, Modulators, Receptors E. Sensory Systems 1. Somatosensory 2. Visual 3. Auditory 4. Olfactory 5. Other F. Motor Systems and Sensory Motor Integration G. Behavior H. Cognitive I. Disease Submission Procedures: Original research contributions are solicited, and will be carefully refereed. 
Authors must submit six copies of both a 1000-word (or less) summary and six copies of a separate singlepage 50-100 word abstract clearly stating their results postmarked by January 7, 1992. Accepted abstracts will be published in the conference program. Summaries are for program committee use only. At the bottom of each abstract page and on the first summary page indicate preference for oral or poster presentation and specify at least one appropriate category and and theme. Also indicate preparation if applicable. Include addresses of all authors on the front of the summary and the abstract and indicate to which author correspondence should be addressed. Submissions will not be considered that lack category information, separate abstract sheets, the required six copies, author addresses, or are late. Mail Submissions To: Chris Ploegaert CNS*92 Submissions Division of Biology 216-76 Caltech Pasadena, CA. 91125 Mail For Registration Material To: Chris Ghinazzi Lawrence Livermore National Laboratories P.O. Box 808 Livermore CA. 94550 All submitting authors will be sent registration material automatically. Program committee decisions will be sent to the correspondence author only. CNS*92 Organizing Committee: Program Chair, James M. Bower, Caltech. Publicity Chair, Frank Eeckman, Lawrence Livermore Labs. Finances, John Miller, UC Berkeley and Nora Smiriga, Institute of Scientific Computing Res. Local Arrangements, Ted Lewis, UC Berkeley and Muriel Ross, NASA Ames. Program Committee: William Bialek, NEC Research Institute. James M. Bower, Caltech. Frank Eeckman, Lawrence Livermore Labs. Scott Fraser, Caltech. Christof Koch, Caltech. Ted Lewis, UC Berkeley. Eve Marder, Brandeis. Bruce McNaughton, University of Arizona. John Miller, UC Berkeley. Idan Segev, Hebrew University, Jerusalem Shihab Shamma, University of Maryland. Josef Skrzypek, UCLA. DEADLINE FOR SUMMARIES & ABSTRACTS IS January 7, 1992 please post From palmer at world.std.com Wed Oct 23 02:25:10 1991 From: palmer at world.std.com (Kent D Palmer) Date: Wed, 23 Oct 91 02:25:10 -0400 Subject: THINKNET NEWSLETTER ANNOUNCEMENT Message-ID: <9110230625.AA18459@world.std.com> ===========================START=OF=THINKNET=FILE============================ ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||| PLEASE POST ----- NEWSLETTER ANNOUNCEMENT |||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| /| ....... .. .. . . . . .==|........ ... .. .... . .... .. ._____. . * . . / ===|_ _. ..______________________________...... | | | | |\ | / ======== |\ ...| .... |.THINKNET:An Electronic.... | |---| | | \ |< ========== |. \ .|---- . |.Journal Of Philosophy,... | | | | | \| \ ======== |... \| ..... |.Meta-Theory, And Other.. | | | | | | \ ====== |.... |____.. |.Thoughtful Discussions.... .==| ........ .. .... .. ... .. . \| .... ... .. .. . . .. . . ----------------------------------------------------------------------------- OCTOBER 1991 ISSUE 001 VOLUME 1 NUMBER 1 ----------------------------------------------------------------------------- This is an announcement for Thinknet, an on-line magazine forum dedicated to thoughtfulness in the cybertime environment. Thinknet covers philosophy, systems theory, and meta-theoretical discussions within disciplines. It is your interdisciplinary window on to what significant information sources are available to foster thought provoking discussion. *CONTENTS* Publication Data Scope of newsletter. Rationale for newsletter. 
Subscriptions and Submittals address. Bulletin Boards where it may be found. Services offered by newsletter. Staff of this edition. Coda: call for participation. About Thinknet Discussion of goals of Thinknet Newsletter. Prospect for Philosophy and Systems Theory in Cybertime Is there a possibility for a renaissance for philosophy? The Philosophy Category on GEnie Review by Gordon Swobe with list of topics. Philosophy on the WELL Review by Jeff Dooley with list of topics. Origin Conference on the WELL Review by Bruce Schuman with list of topics Internet Philosophy Mailing Lists A review of all know philosophy oriented mailing lists by Stephen Clark. Books Of Note THE MATRIX !%@:: A DIRECTORY OF ELECTRONIC MAIL ADDRESSING & NETWORKS Other Publications BOARDWATCH MAGAZINE SOFTWARE ENGINEERING FOUNDATIONS [a work in progress] Books, Electronic Newsletters, and Cyber-Artifacts Received ARTCOM NEWSLETTER FACTSHEET FIVE Protocols for Meaningful Discussions: ARTICLE by Kent Palmer A consideration of how philosophy discussions might be made more useful and their history accessible by using a voluntary protocol. Thoughtful Communications: EDITORIAL Closing remarks. <<<<<<<<<<<>>>>>>>>>>>> ----------------------------------------------------------------------------- HOW TO GET YOUR COPY kdp ----------------------------------------------------------------------------- *Price* The electronic form is FREE. Hardcopies cost money for reproduction, postage, and handling. *Subscriptions* Send an e-mail message to the following address: thinknet at world.std.com Your message should be of the following form: SEND THINKNET TO YourFullName AT YourEmailAddress Some mailing lists do not include your return mailing address if you use the reply function of your mail reader so you must make sure your return e-mail address is in the body of your message. Thinknet file is long, about 1113 lines; 7136 words; 51795 bytes. You will be added to the thinknet subscription list. You will get all further issues unless you unsubscribe. *Bulletin Boards* Thinknet will be posted in the WELL philosophy conference in a topic. The WELL 27 Gate Five Road, Sausalito, CA 94965 modem 415-332-6106 voice 415-332-4335 Also on GEnie in the Philosophy category under the Religion and Ethics Bulletin Board. GEnie Client Services 1-800-638-9636 *PHILOS-L Listserver* You will eventually be able to get the thinknet newsletter from a listserver. Send the message 'GET THINKNET DOC' to 'LISTSERV at LIVERPOOL.AC.UK'. If you get an error message try the regular thinknet address. *Or if all else fails* THINKNET PO BOX 8383 ORANGE CA 92664-8383 UNITED STATES ==============================END=THINKNET=FILE============================= From ross at psych.psy.uq.oz.au Wed Oct 23 04:23:43 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Wed, 23 Oct 1991 18:23:43 +1000 Subject: one-shot learning Message-ID: <9110230823.AA28466@psych.psy.uq.oz.au> Ernst Dow (ernst at lilly.com) writes (in the context of one-shot or one-trial learning): >But in this case, we are talking memorization, not generalization. You may >be able to identify the painting you saw before, but could you make the >leap to recognizing all other abstract paintings? My interest is in analogical retrieval and not one-trial learning (except to the extent that it is necessary for 'truly cognitive' capabilities). The literature on analogy stresses the role that goals play in determining the apparent similarity (and hence generalisation) of entities. 
That is, in analogy the generalisation pattern emerges at recall time rather than being completely determined at storage time. For such a (post-hoc) generaliser it makes sense to attempt to memorise everything. This contrasts with the approach of most BP work where the system learns an internal representation (read that as set of hidden units and weights) that supports a particular pre-specified pattern of generalisation. I realise that there is more to life than analogical recall and some generalisation is based on literal similarity etc, but I am just stating the extreme position for simplicity. Ross Gayler ross at psych.psy.uq.oz.au From pluto at cs.UCSD.EDU Mon Oct 21 19:29:59 1991 From: pluto at cs.UCSD.EDU (Mark Plutowksi) Date: Mon, 21 Oct 91 16:29:59 PDT Subject: Redundancy Message-ID: <9110212329.AA12326@tournesol.ucsd.edu> Scott Fahlman writes: :: :: I guess you could measure redundancy by seeing if some subset of the :: training data set produces essentially the same gradient vector as the full :: set. Probably statisticians have good ways of talking about this :: redundancy business -- unfortunately, I don't know the right vocabulary. Indeed they do; however, they begin from a more general perspective: for a particular "n", where "n" is the number of exemplars we are going to train on, call a set of "n" exemplars optimal if better generalization can not be obtained by training on any other set of "n" exemplars. This criterion is called "Integrated Mean Squared Error." See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. Using appropriate approximations, we can use this to obtain what you suggest. Results for the case of clean data are currently available in Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD CSE department (see [Plutowski & White, 1991].) Basically, given a set of candidate training examples, we select a subset which if trained upon give a gradient highly correlated with the gradient obtained by training upon the entire set. This results in a concise set of exemplars representative (in a precise sense) of the entire set. Preliminary empirical results indicate that the end result is what we originally desired: training upon this well chosen subset results in generalization close to that obtained by training upon the entire set. Tom Dietterich writes: :: :: There has been a fair amount of work in decision-tree learning on the :: issue of breaking large training sets into smaller batches. In 1980, :: Quinlan introduced a method called "windowing" in which a small sample :: (or window) of the training data is initially drawn at random. The :: algorithm is trained on this window and then tested on the remainder of :: the data (that was excluded from the window). Then, some fraction of :: the misclassified examples (possibly all of them) are added to the :: window. :: :: Generally speaking, in noise-free domains, windowing works quite well. :: A very high-performing decision tree can be learned with a relatively :: small window. However, for noisy data, the general experience has :: been that the window eventually grows to include the entire training set. :: Jason Catlett (Sydney U) recently completed his dissertation on :: testing windowing and various other related tricks on datasets of :: roughly 100K examples (straight classification problems). I recommend :: his papers and thesis. :: :: His main conclusion is that if you want high performance, you need to :: look at all of the data. 
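In outline, the windowing loop described above looks like the sketch below; the fit/predict routines, the initial window size, and the fraction of misclassified examples moved into the window are placeholders rather than Quinlan's or Catlett's exact settings.

import numpy as np

def windowing_fit(X, y, fit, predict, init_size=100, add_frac=1.0, max_rounds=20):
    # Train on a small random window, test on everything outside it, then
    # move (a fraction of) the misclassified examples into the window and
    # retrain, stopping when nothing outside the window is misclassified.
    rng = np.random.default_rng(0)
    order = rng.permutation(len(X))
    window, outside = list(order[:init_size]), list(order[init_size:])
    model = None
    for _ in range(max_rounds):
        model = fit(X[window], y[window])
        wrong = [i for i in outside if predict(model, X[i]) != y[i]]
        if not wrong:
            break
        moved = wrong[:max(1, int(add_frac * len(wrong)))]
        window += moved
        outside = [i for i in outside if i not in moved]
    return model, window

On noisy data the list of misclassified outside examples rarely empties, which is exactly how the window ends up growing to cover the whole training set.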
Could you provide a reference to the work demonstrating the performance of windowing on clean data? And could you provide an e-mail address for Jason Catlett? I am in the process of setting up benchmarking experiments for the technique I mentioned above. Although I consider the more general task of fitting arbitrary functional mappings, these works seem relevant. Thanks, ================= == Mark Plutowski Computer Science and Engineering 0114 University of California, San Diego La Jolla, CA ----------- REFERENCES: ----------- Box,G., and N.Draper. 1987. {\bf Empirical Model-Building and Response Surfaces.} Wiley, New York. Khuri, A.I., and J.A.Cornell. 1987. {\bf Response Surfaces (Designs and Analyses)}. Marcel Dekker, Inc., New York. Myers, Raymond H., and A.I. Khuri, W.H. Carter, Jr. 1989. ``Response Surface Methodology: 1966-1988.'' {\em Technometrics}. vol.31, no.2. Plutowski, Mark E., and Halbert White. 1991. ``Active selection of training examples for network learning in noiseless environments.'' Technical Report No. CS91-180, Department of Computer Science and Engineering, The University of California, San Diego. 92093-0114. Accepted pending revision by IEEE Transactions on Neural Networks. ---- Here are some other related works: -------- Cohn, David, Les Atlas, and Richard Ladner. 1990. ``Training connectionist networks with queries and selective sampling.'' {\em Advances in Neural Information Processing Systems 2,} Proc. of the Neural Information Processing Systems Conference. Morgan Kaufmann, San Mateo, California. Hwang, Jenq-Neng, J.J. Choi, Seho Oh, and Robert J. Marks III. 1990. ``Query learning based on boundary search and gradient computation of trained multilayer perceptrons. '' {\em Proc. IJCNN 1990, San Diego. The International Joint Conference on Neural Networks.} IEEE press. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Oct 21 21:27:08 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Mon, 21 Oct 91 21:27:08 -0400 Subject: Redundancy In-Reply-To: Your message of Mon, 21 Oct 91 16:29:59 -0800. <9110212329.AA12326@tournesol.ucsd.edu> Message-ID: :: I guess you could measure redundancy by seeing if some subset of the :: training data set produces essentially the same gradient vector as the full :: set. Probably statisticians have good ways of talking about this :: redundancy business -- unfortunately, I don't know the right vocabulary. Indeed they do; however, they begin from a more general perspective: for a particular "n", where "n" is the number of exemplars we are going to train on, call a set of "n" exemplars optimal if better generalization can not be obtained by training on any other set of "n" exemplars. This criterion is called "Integrated Mean Squared Error." See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. Using appropriate approximations, we can use this to obtain what you suggest. Results for the case of clean data are currently available in Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD CSE department (see [Plutowski & White, 1991].) Basically, given a set of candidate training examples, we select a subset which if trained upon give a gradient highly correlated with the gradient obtained by training upon the entire set. This results in a concise set of exemplars representative (in a precise sense) of the entire set. 
Preliminary empirical results indicate that the end result is what we originally desired: training upon this well chosen subset results in generalization close to that obtained by training upon the entire set. Thanks for the references. This is a useful beginning, but doesn't seem to address the problem we were discussing. In many real-world problems, the following constraints hold: 1. We do not have direct access to "the entire set". In fact, this set may well be infinite. All we can do is collect some number of samples, and there is usually a cost for obtaining each sample. 2. Rather than hand-crafting a training set by choosing all its elements, we want to choose an appropriate "n" and then pick "n" samples at random from the set we are trying to model. Of course, if collecting samples is cheap and network training is expensive, you might throw some samples away and not use them in the training set. I don't *think* that this would ever improve generalization, but it might lead to faster training without hurting generalization. 3. The data may not be "clean". The structure we are trying to model may be masked by a lot of random noise. Do you know of any work on how to pick an optimal "n" under these conditions? I would guess that this sort of problem is already well-studied in statistics; if not, it seems like a good research topic for someone with the proper background. -- Scott Fahlman From pluto at cs.UCSD.EDU Mon Oct 21 21:54:29 1991 From: pluto at cs.UCSD.EDU (Mark Plutowksi) Date: Mon, 21 Oct 91 18:54:29 PDT Subject: Redundancy Message-ID: <9110220154.AA12390@tournesol.ucsd.edu> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= ..in response to your message, included here: =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= :: To: Mark Plutowksi :: Cc: connectionists at CS.CMU.EDU :: Subject: Re: Redundancy :: In-Reply-To: Your message of Mon, 21 Oct 91 16:29:59 -0800. :: <9110212329.AA12326 at tournesol.ucsd.edu> :: Date: Mon, 21 Oct 91 21:27:08 -0400 :: From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU :: :: :: I guess you could measure redundancy by seeing if some subset of the :: :: training data set produces essentially the same gradient vector as the full :: :: set. Probably statisticians have good ways of talking about this :: :: redundancy business -- unfortunately, I don't know the right vocabulary. :: :: Indeed they do; however, they begin from a more general perspective: :: for a particular "n", where "n" is the number of exemplars we are going to :: train on, call a set of "n" exemplars optimal if better generalization can :: not be obtained by training on any other set of "n" exemplars. :: This criterion is called "Integrated Mean Squared Error." :: See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. :: :: Using appropriate approximations, we can use this to obtain what you suggest. :: Results for the case of clean data are currently available in :: Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD :: CSE department (see [Plutowski & White, 1991].) Basically, given a set of :: candidate training examples, we select a subset which if trained upon :: give a gradient highly correlated with the gradient obtained by :: training upon the entire set. This results in a concise set of exemplars :: representative (in a precise sense) of the entire set. 
:: Preliminary empirical results indicate that the end result is what we :: originally desired: training upon this well chosen subset results in :: generalization close to that obtained by training upon the entire set. :: :: Thanks for the references. This is a useful beginning, but doesn't seem to :: address the problem we were discussing. In many real-world problems, the :: following constraints hold: :: :: 1. We do not have direct access to "the entire set". In fact, this set may :: well be infinite. All we can do is collect some number of samples, and :: there is usually a cost for obtaining each sample. :: :: 2. Rather than hand-crafting a training set by choosing all its elements, :: we want to choose an appropriate "n" and then pick "n" samples at random :: from the set we are trying to model. Of course, if collecting samples is :: cheap and network training is expensive, you might throw some samples away :: and not use them in the training set. I don't *think* that this would ever :: improve generalization, but it might lead to faster training without :: hurting generalization. :: :: 3. The data may not be "clean". The structure we are trying to model may :: be masked by a lot of random noise. :: :: Do you know of any work on how to pick an optimal "n" under these :: conditions? I would guess that this sort of problem is already :: well-studied in statistics; if not, it seems like a good research topic for :: someone with the proper background. :: :: -- Scott Fahlman :: I don't know of a feasible way of choosing such an "n". Instead, I obtain a greedy approximation to it. What we do (as reported in the tech report by Plutowski & White) is sequentially grow the training set, first finding an "optimal" training set of size 1, then fitting the network to this training set, appending the training set with a new exemplar selected from the set of available candidates, obtaining a training set of size 2 which is "approximately optimal", fitting this set, appending a third exemplar, etc, continuing the process until the network fit obtained by training over the exemplars fits the rest of the available examples within the desired tolerance. I have no idea as to how close the resulting training sets are to being truly IMSE-optimal. But, they are much more concise than the original set - and so far, at least on the toy problems I have tried so far, it has resulted in a computational benefit, apparently because training on the smaller set of exemplars provides an informative gradient at much lower cost than is required to obtain a gradient over all of the available examples. The more the redundancy in the data, the more the computational benefit. Of course, more extensive testing is required (and in progress.) = Mark Plutowski From 72247.2225 at CompuServe.COM Mon Oct 21 23:05:00 1991 From: 72247.2225 at CompuServe.COM (Larry Fast) Date: 21 Oct 91 23:05:00 EDT Subject: Backprop Feedback Gain Message-ID: <911022030500_72247.2225_EHL25-1@CompuServe.COM> I'm expanding the PDP Backprop program (McClelland&Rumlhart version 1.1) to compensate for the following problem: As Backprop passes the error back thru multiple layers, the gradient has a built in tendency to decay. At the output the maximum slope of the 1/( 1 + e(-sum)) activation function is 0.5. Each successive layer multiplies this slope by a maximum of 0.5. 
The maximum gains at various layers (where n is the output layer) are:
max slope at layer n = 0.5
max slope at layer n-1 = 0.25
max slope at layer n-2 = 0.125
max slope at layer n-3 = 0.0625
max slope at layer n-4 = 0.03125
....
It has been suggested (by a couple of sources) that an attempt should be made to have each layer learn at the same rate. To this end, I'm installing a gain factor on the error being backpropagated. The new error function is: errorPropGain * act * (1 - act) The nominal value that makes sense is 2 (or more). This would allow at least the maximum learning rate to propagate unattenuated. Has anyone else tried this, or any other method of flattening out the learning rate in deep layers? Any info regarding more recent releases of PDP or a users' group would also be helpful. Please respond directly to 72247.2225 at compuserve.com Thanks, Larry Fast From max.coltheart at mrc-apu.cam.ac.uk Mon Oct 21 23:04:38 1991 From: max.coltheart at mrc-apu.cam.ac.uk (max.coltheart@mrc-apu.cam.ac.uk) Date: Tue, 22 Oct 1991 11:04:38 +0800 Subject: redundancy and generalization Message-ID: <18650.9110221006@sirius.mrc-apu.cam.ac.uk> Consider the eight words PAT PAD CAT CAD POT POD COT COD. Give a net the task of translating these from letters to phonemes. Choose any subset of, say, four items as the training set and, after training to asymptote, test performance on the other four. Even with a training set that contains all the information needed for the test set (e.g. PAT POD CAT COD exemplifies every letter-phoneme pairing twice), the various architectures we have been trying score 0% on the generalization set (in this example, the net learns nothing about the third letter, so in the generalisation test it translates PAD as "pat", POT as "pod", COT as "cod" and CAD as "cat"). Is this problem, trivial for rule-learning algorithms, insoluble for any system that learns by error-correction? Tom Dietterich writes: >Generally speaking, in noise-free domains, windowing works quite well. >A very high-performing decision tree can be learned with a relatively >small window. However, for noisy data, the general experience has >been that the window eventually grows to include the entire training set. >Jason Catlett (Sydney U) recently completed his dissertation on >testing windowing and various other related tricks on datasets of >roughly 100K examples (straight classification problems). I recommend >his papers and thesis. > >His main conclusion is that if you want high performance, you need to >look at all of the data. "The window eventually grows to include the entire training set" = "the system is incapable of generalizing accurately". Note that noise isn't the problem. In my example, there's no noise, and no generalization. Max Coltheart max.coltheart at mrc-apu.cam.ac.uk From ahg at eng.cam.ac.uk Tue Oct 22 05:20:21 1991 From: ahg at eng.cam.ac.uk (A.H. Gee) Date: Tue, 22 Oct 91 10:20:21 +0100 Subject: No subject Message-ID: <22398.9110220920@tw700.eng.cam.ac.uk> ************** PLEASE DO NOT FORWARD TO OTHER NEWSGROUPS **************** The following technical report has been placed in the neuroprose archives at Ohio State University: NEURAL NETWORKS AND COMBINATORIAL OPTIMIZATION PROBLEMS - THE KEY TO A SUCCESSFUL MAPPING Andrew Gee, Sreeram Aiyer and Richard Prager Technical Report CUED/F-INFENG/TR 77 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract For several years now there has been much research interest in the use of Hopfield networks to solve combinatorial optimization problems.
Although initial results were disappointing, it has since been demonstrated how modified network dynamics and better problem mapping can greatly improve the solution quality. The aim of this paper is to build on this progress by presenting a new analytical framework in which problem mappings can be evaluated without recourse to purely experimental means. A linearized analysis of the Hopfield network's dynamics forms the main theory of the paper, followed by a series of experiments in which some problem mappings are investigated in the context of these dynamics. In all cases the experimental results are compatible with the linearized theory, and observed weaknesses in the mappings are fully explained within the framework. What emerges is a largely analytical technique for evaluating candidate problem mappings, without having to resort to the more usual trial and error. ************************ How to obtain a copy ************************ a) Via FTP: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get gee.opt_map.ps.Z ftp> quit unix> uncompress gee.opt_map.ps.Z unix> lpr gee.opt_map.ps (or however you print PostScript) Please note that a couple of the figures in the paper were produced on an Apple Mac, and the resulting PostScript is not quite standard. People using an Apple LaserWriter should have no problems though. b) Via postal mail: Request a hardcopy from Andrew Gee, Speech Laboratory, Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. or email me: ahg at eng.cam.ac.uk From dlb at ukc.ac.uk Wed Oct 23 08:10:16 1991 From: dlb at ukc.ac.uk (dlb@ukc.ac.uk) Date: Wed, 23 Oct 91 13:10:16 +0100 Subject: Research Fellowship (UK) Message-ID: Research Fellowship in Neural Networks: Investigation of Digitally Implemented Neural Networks Based on Novel Goal-Seeking Principles UNIVERSITY OF KENT AT CANTERBURY Electronic Engineering Laboratories Applications are invited for a Research Fellowship in the Electronic Engineering Laboratories at the University of Kent to work on an SERC-funded project on digitally implemented neural networks. The project, part of an on-going programme of work in neural networks, will investigate the properties and applications of novel artificial neural networks based on Boolean processing nodes and embodying local low-level goal-seeking principles. Applicants should have a good Honours degree in electronic engineering or computer science/engineering and should preferably hold a Ph.D. degree in an appropriate area. Applicants with previous experience in the field of neural networks or image analysis would be especially welcome. The Digital Systems Research Group in the Electronic Engineering Laboratories have a very strong research programme in computational architectures for pattern processing, with a particular emphasis on neural network architectures. Extensive facilities to support this work are available, including both central and in-house computing systems, and a dedicated workstation will be available for this project. Technician support will also be provided. The appointment is for a three year period and is available from 1st January 1992. The salary is on the scale 11969 - 14170 pounds. informal enquiries may be made to Dr. Michael Fairhurst or Dr. 
David Bisset on +44 227-764000, or by e-mail to dlb at ukc.ac.uk Further particulars and application forms are available from The Personnel Office, The University of Kent at Canterbury, Canterbury, Kent, CT2 7NZ, England, quoting reference A92/13. Telephone +44 227 475482 or 764000 x3915. The closing date is 1st November 1991. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Wed Oct 23 11:23:19 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Wed, 23 Oct 91 11:23:19 -0400 Subject: Continuous vs. Batch learning In-Reply-To: Your message of Tue, 22 Oct 91 16:17:52 -0800. <9110222317.AA22627@sanger.bio.uci.edu> Message-ID: It is pretty clear to me that biological neural networks have all adapted to prefer the continuous learning technique... Anyway, perhaps we should take an example from nature, which has been optimizing things far longer than we have! Sure, but with a totally different technology. Give me 10^9 processors, 10^13 active, complex connections, and 3-D packing, and make short-term memory scarce, slow, and unreliable, and I'd pick continuous learning as well. And it wouldn't even take me a billion years to make the decision. -- Scott Fahlman From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Wed Oct 23 14:13:31 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (BO XU) Date: Wed, 23 Oct 91 13:13:31 EST Subject: Paper Message-ID: Because this is the first time I place paper into neuroprose, I have brought lots of troubles to Jordan Pollack of Ohio State. We don't know whether it's due to my postscript file's problem (I generated the ps file on MacWrite II by pressing and holding the command key and the "F" or "K" key together before clicking the "OK" button in the print dialogue menu) or not, the ps file cannot be printed at Jordan's place. We retried it several times, and he still cannot see it after processing it. However, the ps file inside the Inbox can be traced from UNIX. So we decide to leave the paper inside the Inbox subdirectory and announce it with a caveat that it may not work. I am sorry for this delay and inconvenience, and I will be very glad to know more methods to generate ps files from MacWrite II which will have a good behavior at neuroprose archive. Thanks in advance. The procedure to get the ps file from the Inbox is as follows: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose/Inbox ftp> binary ftp> get ppnn.ps6 ftp> quit unix> lpr ppnn.ps6 I want to thank Jordan for his great help since last week. I appreciate very much his instructions and patience in retrying different versions of ps files I sent to him. Bo Xu Indiana University itgt500 at indycms.iupui.edu From steck at spock.wsu.ukans.edu Wed Oct 23 15:11:40 1991 From: steck at spock.wsu.ukans.edu (jim steck (ME) Date: Wed, 23 Oct 91 14:11:40 -0500 Subject: Batch Mode Parallel Implementations Message-ID: <9110231911.AA01043@spock.wsu.UKans.EDU> S. Kollias and D. Anastassiou presented an interesting approximate second order training algorithm using a Least Squares Estimation Technique at IJCNN 1988 (IEEE Transactions on Circuits and Systems vol 36 no. 8 ). This algorithm is interesting because it updates the weights with each training pair, but performs the update using information saved from all previous training pairs. The algorithm includes a parameter called a forgetting factor which causes information from the previous training pairs to slowly be discounted (or forgotten). 
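In rough outline, a per-pattern update of this kind can be sketched as exponentially-weighted recursive least squares for a single linear output unit. This is only an illustration of the general idea, not the Kollias-Anastassiou algorithm itself; the forgetting factor lam and the initialization of P are assumptions.

import numpy as np

def rls_step(w, P, x, target, lam=0.99):
    # One recursive least-squares update for a linear unit y = w.x:
    # information from all earlier training pairs is carried in w and P,
    # and is discounted by the forgetting factor lam at every new pair.
    err = target - w @ x
    Px = P @ x
    k = Px / (lam + x @ Px)        # gain vector
    w = w + err * k
    P = (P - np.outer(k, Px)) / lam
    return w, P

# Typical start: w = np.zeros(n), P = np.eye(n) / delta with delta small.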
This is essentially learning somewhere in between "batch learning" and "on line learning". As an appoximate second order method, it is somewhat computationally intensive; however, the method is easily and productively vectorized on parallel architectures. Jim Steck From wray at ptolemy.arc.nasa.gov Wed Oct 23 18:33:42 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Wed, 23 Oct 91 15:33:42 PDT Subject: Paper Announcement (Neuroprose) In-Reply-To: Barak Pearlmutter's message of Mon, 21 Oct 91 13:41:35 -0400 <9110211741.AA03347@james.psych.yale.edu> Message-ID: <9110232233.AA17716@ptolemy.arc.nasa.gov> > Simplifying Neural Network > Soft Weight-Sharing Measures > by > Soft Weight-Measure > Soft Weight Sharing > > Barak Pearlmutter > Department of Psychology > P.O. Box 11A Yale Station I enjoyed this take-off immensely. Determining good regularisers (or priors) is a major problem facing feed-forward network research (and related representations), so I also enjoyed the original Nowlan-Hinton paper. Dramatic performance improvements can be got by careful choice of regulariser/prior (I know this from my tree research), and its a bit of a black art right now, though I have some good directions. Nowlan & Hinton suggest a strong theoretical basis exists for their approach (see their section 8), so perhaps we'll see more of this style, and "cleaner" versions to keep the theoreticians happy. By the way, at CLNL in Berkeley in August I expressed the view that this problem: i.e. Regularizers ------------ for a given network/activation-function configuration, what are suitable parameterised families of regularizes, and how might the parameters be set from the knowledge of the particular application being addressed NB. the setting of the $\lambda$ tradeoff term in Nowlan & Hinton's equation (1) has several fairly elegant and practical solutions along with: Training -------- decision-theoretic/bounded-rationality approaches to batch vs. block (sub-batch) vs. pattern updates during gradient descent (i.e. of back-prop.) (i.e. the Fahlman-LeCunn-English-Grajski-et-al. discussion, or the batch update vs. stochastic update problem) and subsequent addition of second-order gradient methods as two of the most pressing problems to make feed-forward networks a "mature" technology that will then supercede many earlier non-neural methods. Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 244-17 fax: (415) 604 6997 Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov PS.thanks also to Martin Moller for adding some meat to the Training problem: > An interesting observation is that the number of blocks needed > to make an update is growing during learning so that after a certain > number of epochs the blocksize is equal to the number of patterns. > When this happens the algorithm is equal to a traditional batch-mode > algorithm and no validation is needed anymore. When explaining batch update vs. stochastic update to people, I always use this behaviour as an example of what a decision-theoretic training scheme **should** do, so I'm glad you've confirmed it experimentally. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Wed Oct 23 20:46:29 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (BO XU) Date: Wed, 23 Oct 91 19:46:29 EST Subject: Paper Message-ID: A moment ago I received a message from Jordan telling me that he can see the ppnn.ps6 file now and he has put it into neuroprose subdirectory named xu.ppnn.ps.Z. 
I am very glad to hear this news and also sorry for possible inconvenience to you. Please don't follow the procedure for ppnn.ps6 in Inbox (ppnn.ps6 may not be there anymore). Instead, following is the procedure to get the paper "PPNN: A Faster Learning and Better Generalizing Neural Net": unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get xu.ppnn.ps.Z ftp> quit unix> uncompress xu.ppnn.ps.Z unix> lpr xu.ppnn.ps (or however you print postscript) Thanks to Jordan again for his continuing efforts. Bo Xu Indiana University itgt500 at indycms.iupui.edu From karit at spine.hut.fi Thu Oct 24 05:53:53 1991 From: karit at spine.hut.fi (Kari Torkkola) Date: Thu, 24 Oct 91 11:53:53 +0200 Subject: Speech recognition research job in Switzerland (REPOST) Message-ID: <9110240953.AA01337@spine.hut.fi.hut.fi> ---------------------------------------------------------------------------- RESEARCH POSITIONS AVAILABLE IN SPEECH PROCESSING (repost) The newly created "Institut Dalle Molle d'Intelligence Artificielle Perceptive" (IDIAP) in Martigny, Switzerland seeks to hire qualified researchers in the area of automatic speech recognition. Candidates should be able to conduct independent research in a UNIX environment on the basis of solid theoretical and applied knowledge. Salaries will be aligned with those offered by the Swiss government for equivalent positions. Researchers are expected to begin activity in the beginning of 1992. IDIAP is supported by the Dalle Molle Foundation along with public-sector partners at the local and federal levels (in Switzerland). IDIAP is the third institute of artificial intelligence supported by the Dalle Molle Foundation, the others being ISSCO (attached to the University of Geneva) and IDSIA (situated in Lugano). The new institute maintains close contact with these latter centers as well as with the Polytechnical School of Lausanne and the University of Geneva. Applications for a research position at IDIAP should include the following elements: - a curriculum vitae - sample publications or technical reports - a brief description of the research programme that the candidate wishes to pursue - a list of personal references. Applications are due by December 1, 1991 and may be sent to the address below: Daniel Osherson IDIAP Case Postale 609 CH-1920 Martigny SWITZERLAND For further information by e-mail, contact: osherson at idiap.ch (Daniel Osherson, director) or karit at idiap.ch (Kari Torkkola, researcher) Please use the latter email address only for inquiries concerning speech recognition research. From prechelt at ira.uka.de Thu Oct 24 11:16:36 1991 From: prechelt at ira.uka.de (prechelt@ira.uka.de) Date: Thu, 24 Oct 91 16:16:36 +0100 Subject: Terminology (was: batch-mode parallel implementations) Message-ID: I noticed a lot of inconsistent use of terminology concerning the frequency of weight update in Backprop learning. I would like to make a suggestion for the meaning of certain terms, that is not based on the democratic aspect of what is used most often, but on investigations in a dictionary: There are three cases: (a) update after only ONE single example has been seen (b) update after ALL of the examples have been seen (c) something in between The terms used are epoch, block, batch, sample, continuous, on-line. An EPOCH is (thus saith my dictionary) not only a section of time or history (an "era"), but also a turning point. 
This should make EPOCH the preferred term for case (b), because the end of the training set clearly is such a turning point. A BATCH is a set of some size, a pile of things or so; with some inherent need for the information about its size. Thus it is a good candidate for case (c) and there should always be some indication of the size either as an absolute number, as a fraction of training set size or by some qualitative criterion. BLOCK could be a perhaps even better word for the same, for computer scientists, because blocks are always groups of a certain number of similar objects and the word does not have the danger of misunderstanding that stems from the term "batch-processing" from the early days of data processing, where everything was being executed completely, before you received the results. Unfortunately, for reasons of other connotations, confusion of Block with Epoch is nevertheless very likely. A SAMPLE is a part picked from a whole, usually for test purposes. Although it is not absolutely clear, that a sample is just a single object, in my ears the word tends to sound so. Thus it should be indicating case (a). CONTINOUS is a bad term to use, because the individual examples are not cut into parts, so BP is always discrete. ON-LINE usually means something like "available without physical action, merely by execution of software" and is of course completely inappropriate to learning, except perhaps where there is an infinite training set constantly floating through the machine. SUMMARY: -------- Let us use 'Epoch' for (b), 'Batch' for (c) and 'Sample' for (a). Let us avoid 'continous', 'on-line' and 'block' as much as possible. I think as scientists we should exercise some discipline in the use of language, especially when confusion is as close as in the area of learning systems... :-> Please direct all comments and flames to me. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-7500 Karlsruhe 1; Germany | they get (Voice: ++49/721/608-4317, FAX: ++49/721/697760) | less simple. From oden at herky.cs.uiowa.edu Thu Oct 24 12:11:12 1991 From: oden at herky.cs.uiowa.edu (Gregg Oden) Date: Thu, 24 Oct 91 11:11:12 -0500 Subject: Batch mode in nature? Message-ID: <9110241611.AA26933@herky.cs.uiowa.edu> Steve Potter asks > I cant think of any biological examples of batch learning, in which > sensory data are saved until a certain number of them can be somehow > averaged together and conclusions made and remembered. Any ideas? If by 'sensory data' you mean the most peripheral, unanalyzed input representations, then probably not. Otherwise, yes: it has been a long-term recurring theme in the psychological literature on the development of concepts that exemplars are remembered with a great deal of specific detail until a sufficient corpus of them have been acquired to support the abstraction of a general concept. (Subsequently, idiosyncratic details may be lost/suppressed through assimilation to the encompassing category.) This notion is supported by the intuitive experience of reflective recognition of regularities; i. e., insight. In recent years, it has also gained empirical support from experimental work, most notably by Lee Brooks and his colleagues. Some of this was briefly discussed in my chapter in the Annual Review of Psychology, 1987. 
(See also Oden & Lopes, "On the internal structure of fuzzy subjective categories" in Recent Developments in Fuzzy Set and Possibility Theory, R. Yager, ed., 1982.) Gregg Oden Psychology & Computer Science U. of Iowa From huyser at mithril.stanford.edu Thu Oct 24 18:27:42 1991 From: huyser at mithril.stanford.edu (Karen Huyser) Date: Thu, 24 Oct 91 15:27:42 PDT Subject: learning and memory Message-ID: <9110242227.AA27923@mithril.stanford.edu> It seems to me people are confusing very different things in the recent discussion of learning (one-shot, generalization, etc). A posting from Ross Gayler quotes Ernst Dow as saying (in the context of one-shot learning): > You may be able to identify the painting you saw before, but could you > make the leap to recognizing all other abstract paintings? To have the experience of seeing a painting and to be able to recall the memory of the experience is one kind of learning and memory. To be told by someone that the painting is of a type called "abstract" is to add a category label, another kind of learning and memory. However, to recognize another painting as abstract or imitate the painting style one must form a sufficiently rich concept to be able to make a category with the label "abstract" and the original painting as one member of the class. For most humans, this involves questions, insightful answers, and many more examples of paintings. As a completely separate conceptual skill, consider the learning and concept-formation task that goes on while doing research. How does it come about that one day we look at a set of phenomena in a new way, with new concepts and categories? There are many different skills that appear under the labels "learning" and "memory". Karen Huyser huyser at mojave.stanford.edu From bill at nsma.arizona.edu Thu Oct 24 23:04:21 1991 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Thu, 24 Oct 91 20:04:21 MST Subject: Continuous vs. Batch learning Message-ID: <9110250304.AA07667@nsma.arizona.edu> >It is pretty clear to me that biological neural networks have all adapted >to prefer the continuous learning technique, as we can verify for humans >by remembering something that we only saw (or heard, etc.) once. One-trial >learning paradigms abound in the behavioral literature. I cant think of >any biological examples of batch learning, in which sensory data are >saved until a certain number of them can be somehow averaged together >and conclusions made and remembered. Any ideas? David Marr's theory of the hippocampus proposed that it (the hippocampus) is an intermediate-term memory storage device, performing one-shot learning of experiences and then holding them for a period of days or weeks until they can be evaluated for significance and then gradually moved into the neocortex for permanent storage. In my humble opinion this is still the best available theory of what the hippocampus does. Some of the details have changed, but the basic idea still makes sense. Patrick Lynn has recently been exploring a more abstract version of Marr's idea, using a "buffer" of example patterns to train a recurrent back-prop net, with new patterns going into the buffer, hanging around for a while, then dropping out. He has found that under certain conditions buffering gives better performance than learning each pattern only when it is presented. (Reference: "Simple memory: a theory for archicortex." D. Marr, 1971, Phil Trans Roy Soc B 262: 23-81.) 
-- Bill Skaggs From gary at cs.UCSD.EDU Fri Oct 25 21:59:28 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Fri, 25 Oct 91 18:59:28 PDT Subject: Seminar abstract: The Sanguine Algorithm Message-ID: <9110260159.AA09259@desi.ucsd.edu> SEMINAR New approaches to learning in Connectionist Networks Garrison W. Cottrell Richard K. Belew Institute for Neural Declamation Condominium Community College of Southern California Previous approaches to learning in recurrent networks often involve batch learning: A large amount of effort is expended in deciding which way to move in weight space, then a little step is taken. We propose a new algorithm for learning in large networks which is orders of magnitude more efficient than batch learning. Based on the realization that many nearby points in weight space are worse than where we are now, we propose the sanguine algorithm. The basic idea is to become more happy with where we are, rather than going to all the work of moving. Hence the approach is quite simple: Randomly sample a nearby point in weight space. Compute the error functional based on that point. If it is better than the current point, repeat until we find a nearby point that is worse. Now, here's the real trick: Once we find a point worse off than where we are now, we stay where we are and increment a "happiness function". That is, we search until we find a place that we can "look down on" in weight space[1]. Now, in order to remain happy with where we are may involve a certain amount of minor work to keep this point in weight space looking good. For example, we could change the error functional until this point looks better than most other points we find. Towards this end, we can apply recent techniques (Nowlan & Hinton, 1991) to make the error functional soft and flabby. Then we can stretch the error any way we like. This approach can also be extended to replace computationally expensive "weight-sharing" techniques. If we make the weights soft and flabby, then lifting them becomes much easier since part of the weight always remains on the ground, and sharing the burden of large weights becomes unnecessary. Note that this can be done completely locally. We have applied this novel learning procedure to the problem of time series prediction. Using the Mackey-Glass equations with dimension 3.5, we give the network values at 0, 6, 12, and 18 time units back in time to predict the value of the time series 6 time units into the future. Using the Sanguine Algorithm, a network with only two hidden units rapidly converges to a soft error functional. Of course, the network has no idea of what value will come next; however, the happiness function shows it is quite blissful in its ignorance. We propose that this technique will have wide application in Republican approaches to government. ____________________ [1]Thus the pet name for our algorithm is the "Nyah Nyah Algo- rithm". From steck at spock.wsu.ukans.edu Sat Oct 26 13:49:10 1991 From: steck at spock.wsu.ukans.edu (jim steck (ME) Date: Sat, 26 Oct 91 12:49:10 -0500 Subject: Batch Learning and Parallel Implementation Message-ID: <9110261749.AA04481@spock.wsu.UKans.EDU> Regarding Parallel implementations of Batch and online learning.... S. Kollias and D. Anastassiou presented an interesting approximate second order training algorithm using a Least Squares Estimation Technique at IJCNN 1988 (IEEE Transactions on Circuits and Systems vol 36 no. 8 ). 
This algorithm is interesting because it updates the weights with each training pair, but performs the update using information saved from all previous training pairs. The algorithm includes a parameter called a forgetting factor which causes information from the previous training pairs to slowly be discounted (or forgotten). This is basically a type of learning somewhere inbetween "batch" learning and "on line" learning. As an appoximate second order method, it is somewhat computationally intensive; however, the method is easily and productively vectorized on parallel architectures. Jim Steck Wichita State University From todd at galadriel.stanford.edu Fri Oct 25 17:50:47 1991 From: todd at galadriel.stanford.edu (todd@galadriel.stanford.edu) Date: Fri, 25 Oct 91 14:50:47 PDT Subject: MUSIC AND CONNECTIONISM Book Announcement Message-ID: <9110252150.AA02708@galadriel.stanford.edu> BOOK ANNOUNCEMENT: MUSIC AND CONNECTIONISM edited by Peter M. Todd and D. Gareth Loy MUSIC AND CONNECTIONISM is now available from MIT Press. This 280-pp. book contains a wide variety of recent research in the applications of neural networks and other connectionist methods to the problems of musical listening and understanding, performance, composition, and aesthetics. It consists of a core of articles that originally appeared in the Computer Music Journal, along with several new articles by Kohonen, Mozer, Bharucha, and others, and new addenda to the original articles describing the authors' most recent work. Topics covered range from models of psychological processing of pitches, chords, and melodies, to algorithmic composition and performance factors. A wide variety of connectionist models are employed as well, including back-propagation in time, Kohonen feature maps, ART networks, and Jordan- and Elman-style networks. We've also included a discussion generated by the Computer Music Journal articles on the use and place of connectionist systems in artistic endeavors. A more detailed description of the book is provided below (from the jacket text), along with the complete table of contents. We hope this book will be of use to a wide variety of readers, including neural network researchers interested in a broad, challenging, and fun new area of application, cognitive scientists and music psychologists looking for robust new models of musical behavior, and artists seeking to learn more about a potentially very useful technology. MUSIC AND CONNECTIONISM can be found in bookstores that carry MIT Press publications, or can be purchased directly from MIT Press by calling their toll-free order number, 1-800-356-0343, and giving the operator this catalog number: 1CSAT 503, and this book code: TODMH. By phone and mail-order, the price is $39.95; in stores, it will probably be $45 (there is some confusion with the publisher on this point, so I wanted to give out the detailed information for phone orders to save people some money). Please drop me a line if you have any questions, and especially if you take up the gauntlet and pursue research or applications in this area! cheers, peter todd ***************************************************************************** Music and Connectionism edited by Peter M. Todd and D. Gareth Loy As one of our highest expressions of thought and creativity, music has always been a difficult realm to capture, model, and understand. 
The connectionist paradigm, now beginning to provide insights into many realms of human behavior, offers a new and unified viewpoint from which to investigate the subtleties of musical experience. \fIMusic and Connectionism\fP provides a fresh approach to both fields, using techniques of connectionism and parallel distributed processing to look at a wide range of topics in music research, from pitch perception to chord fingering to composition. The contributors, leading researchers in both music psychology and neural networks, address the challenges and opportunities of musical applications of network models. The result is a current and thorough survey that advances our understanding of musical perception, cognition, composition, and performance and of the design and analysis of networks. Music and Connectionism is based on a core of articles originally appearing as two special issues of the Computer Music Journal. These have been augmented with addenda covering more recent research by the authors. The book opens with tutorial chapters introducing neural networks in a musical context and relevant aspects of previous computer music research, making this a self-contained text. There are many new chapters, along with new section introductions, summaries of related work, and a final debate on the artistic implications of connectionist methods. Peter M. Todd is a doctoral candidate in the PDP Research Group of the Psychology Department at Stanford University. Gareth Loy DMA is an award-winning composer, member of the Board of Directors of the Computer Music Association, lecturer in the Music Department of UC San Diego, and member of the technical staff of Frox Inc. Contents: Preface and Introduction Peter M. Todd and D. Gareth Loy Part 1: Background Machine Tongues XII: Neural Networks Mark Dolson Connectionism and Musiconomy D. Gareth Loy Part 2: Perception and Cognition A Neural Net Model for Pitch Perception Hajime Sano and B. Keith Jenkins Connectionist Models for Tonal Analysis Don L. Scarborough, Ben O. Miller, and Jacqueline A. Jones The Representation of Pitch in a Neural Net Model of Chord Classification Bernice Laden and Douglas H. Keefe Pitch, Harmony, and Neural Nets: A Psychological Perspective Jamshed J. Bharucha The Ontogenesis of Tonal Semantics: Results of a Computer Study Marc Leman Modeling the Perception of Tonal Structure with Neural Nets Jamshed J. Bharucha and Peter M. Todd Using Connectionist Models to Explore Complex Musical Patterns Robert O. Gjerdingen The Quantization of Musical Time: A Connectionist Approach Peter Desain and Henkjan Honing Part 3: Applications A Connectionist Approach to Algorithmic Composition Peter M. Todd Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints Michael C. Mozer Creation By Refinement and the Problem of Algorithmic Music Composition J.P. Lewis A Nonheuristic Automatic Composing Method Teuvo Kohonen, Pauli Laine, Kalev Tiits, and Kari Torkkola Fingering for String Instruments with the Optimum Path Paradigm Samir I. Sayegh Part 4: Conclusions Letter from Otto Laske Responses to Laske by Todd and Loy Further Research and Directions Peter M. 
Todd List of Author Addresses From white at teetot.acusd.edu Fri Oct 25 19:49:14 1991 From: white at teetot.acusd.edu (Ray White) Date: Fri, 25 Oct 91 16:49:14 -0700 Subject: No subject Message-ID: <9110252349.AA27577@teetot.acusd.edu> Larry Fast writes: > I'm expanding the PDP Backprop program (McClelland&Rumlhart version 1.1) to > compensate for the following problem: > As Backprop passes the error back thru multiple layers, the gradient has > a built in tendency to decay. At the output the maximum slope of > the 1/( 1 + e(-sum)) activation function is 0.5. > Each successive layer multiplies this slope by a maximum of 0.5. ..... > It has been suggested (by a couple of sources) that an attempt should be > made to have each layer learn at the same rate. ... > The new error function is: errorPropGain * act * (1 - act) This suggests to me that we are too strongly wedded to precisely f(sum) = 1/( 1 + e(-sum)) as the squashing function. That function certainly does have a maximum slope of 0.25. A nice way to increase that maximum slope is to choose a slightly different squashing function. For example f(sum) = 1/( 1 + e(-4*sum)) would fill the bill, or if you'd rather have your output run from -1 to +1, then tanh(sum) would work. I think that such changes in the squashing function should automatically improve the maximum-slope situation, essentially by doing the "errorPropGain" bookkeeping for you. Such solutions are static fixes. I suggested a dynamic adjustment of the learning parameter for recurrent backprop at IJCNN - 90 in San Diego (The Learning Rate in Back-Propagation Systems: an Application of Newton's Method, IJCNN 90, vol I, p 679). The method amounts to dividing the learning rate parameter by the square of the gradient of the output function (subject to an empirical minimum divisor). One should be able to do something similar with feedforward systems, perhaps on a layer by layer basis. - Ray White (white at teetot.acusd.edu) Please respond directly to 72247.2225 at compuserve.com Thanks, Larry Fast From BUTUROVIC%BUEF78%yubgef51.bitnet at BITNET.CC.CMU.EDU Sun Oct 27 14:17:00 1991 From: BUTUROVIC%BUEF78%yubgef51.bitnet at BITNET.CC.CMU.EDU (BUTUROVIC%BUEF78%yubgef51.bitnet@BITNET.CC.CMU.EDU) Date: Sun, 27 Oct 1991 21:17 +0200 Subject: forward propagation Message-ID: <2B147310A0000F63@yubgef51.bitnet> I am interested in training a multi-layer perceptron without using back-propagation (BP) of the error. MLP training by means of the back-propagation (BP) algorithm is in fact minimization of the criterion function using the ordinary gradient-descent minimization algorithm. For this, the computation of derivatives is necessary. Now, it is of course possible to optimize a multi-variable function without computing derivatives. One effective algorithm of this type is the simplex algorithm [1], so it seems logical to utilize it for MLP training. There are two advantages in avoiding derivatives: first, the transfer functions of the individual neurons may be non-differentiable. Second, BP utilizes a criterion function that must be written in the form of the average squared difference between target and actual outputs (there are variants to this, but, for the purpose of this discussion, they vary insignificantly), and the derivative of this function with respect to the weights must be computable. Using simplex, i.e. not using derivatives, this limitation can be avoided, as long as the function to be minimized can be measured.
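As a concrete illustration of the idea just described, here is a minimal sketch of derivative-free MLP training using the Nelder-Mead simplex routine from SciPy. This is not Buturovic and Citkusev's code; the toy XOR task, the network size, and every name below are assumptions chosen only for the example.

import numpy as np
from scipy.optimize import minimize

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)        # toy XOR targets

N_IN, N_HID, N_OUT = 2, 3, 1
N_W = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT     # total number of weights

def unpack(w):
    # split the flat weight vector into layer matrices and bias vectors
    i = 0
    W1 = w[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = w[i:i + N_HID]; i += N_HID
    W2 = w[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = w[i:i + N_OUT]
    return W1, b1, W2, b2

def forward(w, x):
    W1, b1, W2, b2 = unpack(w)
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def criterion(w):
    # any measurable cost would do; the optimizer never asks for derivatives
    return float(np.mean((forward(w, X) - T) ** 2))

rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.5, size=N_W)
res = minimize(criterion, w0, method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-10})
print("final error:", res.fun)
print("network outputs:", forward(res.x, X).ravel())

Note that Nelder-Mead maintains a simplex of N+1 points in N dimensions, which is essentially the N*N storage cost discussed below.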
This can be important for applications in control where we are sometimes not able to express the criterion function as a function of the network parameters. There is one serious limitation regarding this algorithm, and it is spatial complexity. It requires roughly N*N memory locations, where N is the number of variables (network weights). In practice, this limits the size of the network to a couple of thousand weights. In order to verify the behavior of the algorithm, I performed extensive experiments with Ljubomir Citkusev of Boston University. We trained an MLP to perform classification tasks on three data sets. In short, the results obtained indicate that training of the network using simplex can be done successfully. However, BP is more effective, regarding both classification accuracy (i.e., function approximation accuracy) and computational complexity (number of iterations). We have not yet verified the ability of the algorithm to train networks with non-differentiable transfer functions or criterion functions that cannot be computed analytically. It is puzzling that in [2] Minsky and Papert claimed the training of perceptrons with hidden layers to be impossible, while at that time (1969) an effective algorithm for precisely that task was already available. While BP was shown to be superior in our experiments, they could have done quite satisfactory training of multi-layer networks when they wrote the book. I tried to talk to Minsky about this, but I couldn't do it. I would like to hear people's opinions on this idea. Also, it would be beneficial to know if anyone is aware of similar work. Thanks, Ljubomir Buturovic, University of Belgrade References [1] Nelder, J. A., and Mead, R. 1965, Computer Journal, vol. 7, p. 308. [2] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, 1969. From kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET Mon Oct 28 09:49:58 1991 From: kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET (Nitin Indurkhya) Date: Mon, 28 Oct 91 09:49:58 JST Subject: Robinson's vowel dataset Message-ID: <9110280049.AA00241@hcrlgw.crl.hitachi.co.jp> Does anyone have any NEW results on Robinson's vowel dataset? I am aware of the original results given in his thesis: A. Robinson. "Dynamic Error Propagation Networks", PhD Thesis, Cambridge Univ 1989. Please send me mail, thanks Nitin Indurkhya (nitin at crl.hitachi.co.jp) From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Oct 28 00:10:20 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Mon, 28 Oct 91 00:10:20 EST Subject: Announcement of NIPS Workshop Message-ID: The Neural Information Processing Systems Conference will be followed by a program of workshops in Vail, Colorado on December 6 and 7, 1991. The following one-day workshop will be offered on December 6: Constructive and Destructive Learning Algorithms Workshop Leader: Scott E. Fahlman School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Internet: fahlman at cs.cmu.edu Most existing neural network learning algorithms work by adjusting connection weights in a fixed network. Recently we have seen the emergence of new learning algorithms that alter the network's topology as they learn. Some of these algorithms start with excess connections and remove any that are not needed; others start with a sparse network and add hidden units as needed, sometimes in multiple layers; some algorithms do both.
These algorithms eliminate the need to guess in advance what network topology will best fit a given problem. In addition, some of these algorithms claim significant improvements in learning speed and generalization. A successful two-day workshop on this topic was presented at the NIPS-90 conference. A number of algorithms were presented by their authors and were critically evaluated. The past year has seen a great deal of additional work in this area, so a second workshop on this topic seems appropriate. We will briefly review the major algorithms presented last year. Then we will turn to more recent developments, including both new algorithms and experience gained in using the older ones. Finally, we will consider current trends and will try to identify open problems for future research. I would like to hear from people who are interested in presenting new algorithms or results at this workshop. I would particularly like to hear from people with application results or comparative studies using algorithms of this kind. The tentative plan, depending on the response we get, is allow 15-20 minutes for each presentation, with ample time for discussion. If you would like to present something, please send a short description to Scott Fahlman, at the internet address listed above. For Cascade-Correlation fans, I will be presenting a new variation called "Cascade 2" that performs better than the original in a number of situations, especially in problems with continuous analog outputs. From tesauro at watson.ibm.com Mon Oct 28 11:41:58 1991 From: tesauro at watson.ibm.com (Gerald Tesauro) Date: Mon, 28 Oct 91 11:41:58 EST Subject: Program information: NIPS91 Workshops Message-ID: The NIPS91 post-conference workshops will take place Dec. 5-7, 1991, at the Marriott Mark Resort Hotel in Vail, Colorado. The following message gives information on the program schedule and local arrangements, and is organized as follows: I. Summary schedule II. Workshop schedule III. Arrangements information IV. Workshop abstracts I. Summary Schedule: Thursday, Dec. 5th 5:00 pm Registration Open 7:00 pm Orientation Meeting 8:00 pm Reception Friday, Dec. 6th 7:00 am Breakfast 7:30 - 9:30 am Workshop Sessions 4:30 - 6:30 pm Workshop Sessions 7:00 pm Banquet Saturday, Dec. 7th 7:00 am Breakfast 7:30 - 9:30 am Workshop Sessions 4:30 - 6:30 pm Workshop Sessions 6:30 - 7:00 pm Wrap-up 7:30 pm Barbecue Dinner (optional) II. Workshop schedule: Friday, Dec. 6th: Character recognition Projection pursuit and neural networks Constructive and destructive learning algorithms II Modularity in connectionist models of cognition VLSI neural networks and neurocomputers (1st day) Recurrent networks: theory and applications (1st day) Active learning and control (1st day) Self-organization and unsupervised learning in vision (1st day) Developments in Bayesian methods for neural networks (1st day) Saturday, Dec. 7th: Oscillations and correlations in neural information processing Optimization of neural network architectures for speech recognition Genetic algorithms and neural networks Complexity issues in neural computation and learning Computer vision vs. network vision VLSI neural networks and neurocomputers (2nd day) Recurrent networks: theory and applications (2nd day) Active learning and control (2nd day) Self-organization and unsupervised learning in vision (2nd day) Developments in Bayesian methods for neural networks (2nd day) III. 
Arrangements information: Accommodations: The conference sessions will be held in the banquet area at Marriott Mark Resort, at Vail CO, 90 miles west of Denver. For accommodations, call the Marriott at (303)-476-4444. Our room rate is $74 (single or double). Condos for larger groups can be arranged through Destination Resorts, at (303)-476-1350. Registration: Registration fee for the workshops is $100 ($50 for students). Transportation: CME (Colorado Mountain Express) will be running special shuttles from the Sheraton in Denver up to the Marriott in Vail Thursday afternoon at a price of $31.00 per person. Call them at 1-800-525-6363, at least 24 hours in advance, to reserve and give a credit card number for prepayment. CME also runs shuttles down from Vail to the Denver airport, same price, on Sunday at many convenient times. The earlier you call CME, the more vans will be made available for our use. Be sure to mention our special group code "NIPS". Hertz has a desk in the Sheraton, and will rent cars at a weekend rate for the trip up to Vail and back to the airport in Denver. This is an unlimited mileage rate; prices start at $60 (three days, plus tax). To make reservations call the Sheraton at 1-800-552-7030 and ask for Kevin Kline at the Hertz desk. Skiing: Skiing at Vail can be expensive. The lift tickets this year were slated to rise to $40 per day. The conference has negotiated very attractive group rates for tickets bought in advance: $56 for a 2-day ticket, $84 for a 3-day ticket, $108 for a 4-day ticket. You can purchase these by sending a check to the conference registration office: NIPS*91 Registration, Siemens Research Center, 755 College Road East, Princeton, NJ 08540. The tickets will be printed for us, and available when we get to Vail on Thursday evening. There are several sources for rental boots and skis in Vail. The rental shop at the lifts and Banner Sports (located in the Marriott) are offering the following packages to those who identify themselves as NIPS attendees:
                         skis, boots, poles    skis, poles
    standard package          $8 / day           $6 / day
    performance package      $11 / day           $9 / day
Banner will, as extra incentives, stay open for us after the Thursday orientation meeting, and give a 10% discount on anything else in the store. Optional Gourmet barbecue dinner(!): Finally, besides the conference banquet, included in the registration fee, there will be an optional dinner on Saturday night at Booco's Station, a few miles outside of Vail and world famous for its barbecued meats and special sauces. Dinner will include transportation (if you need it), appetizers, all-you-can-eat barbecue, cornbread, vegetables, dessert, and more than 40 kinds of beer at the cash bar. Tickets will be on sale at the Sheraton and at the Marriott. Price: $27. IV. Workshop Abstracts: ========================================================================= Modularity in Connectionist Models of Cognition Organizer: Jordan Pollack, Ohio State Univ. Speakers: Michael Mozer, Univ of Colorado Robert Jacobs, MIT John Barnden, New Mexico State University Rik Belew, UCSD Abstract: Classical modular theories of mind presume mental "organs" - function specific, put in place by evolution - which communicate in a symbolic language of thought. In the 1980's, Connectionists radically rejected this view in favor of more integrated architectures, uniform learning systems which would be very tightly coupled and communicate through many feedforward and feedback connections.
However, as connectionist attempts at cognitive modeling have gotten more ambitious, ad-hoc modular structuring has become more prevalent. But there are concerns regarding how much architectural bias is allowable. There has been a flurry of work on resolving these concerns by seeking the principles by which modularity could arise in connectionist architectures. This will involve solving several major problems - data decomposition, structural credit assignment, and shared adaptive representations. This workshop will bring together proponents of modular connectionist architectures to discuss research direction, recent progress, and long-term challenges. ========================================================================= Character Recognition Organizers: C. L. Wilson and M. D. Garris, National Institute of Standards and Technology Speakers: Jon Hull, SUNY Buffalo Tom Vogl, ERIM Jim Keeler, MCC Chris Schofield, Nestor C. L. Wilson, NIST R. G. Casey, IBM Abstract: This workshop will consider issues related to present and future testing needs for character recognition including: 1) What is user experience in using the NIST and other publicly available databases? 2) What types of databases will be required in the future? 3) What are future testing needs, such as x-y coordinate stream or gray level data? 4) How can the evaluation of current research problems, such as segmentation, be enhanced through carefully designed databases, standard testing procedures, and automated evaluation methodologies. 5) Is the incorporation of context important in testing? 6) What other issues face the research and development of large scale recognition systems? The target audience includes those interested in and/or working on hand print recognition and developers who wish to include character recognition as part of systems to recognize documents. ========================================================================= Genetic Algorithms and Neural Networks Organizer: Rik Belew, Univ. of Calif. at San Diego Speakers: Rik Belew and Dave Rogers Abstract: This workshop will examine theoretical and algorithmic interactions between GA and NNet techniques, as well as models of the evolutionary constraints on nervous systems. Specific topics include: 1) Comparison and composition of global GA sampling techniques with the local (gradient) search of NNet methods. 2) Use of the GA to evolve additional higher-order function approximation terms (``hidden units''). 3) The dis/advantages of GA recombination and its impact on appropriate representations for NNets. 4) Trade-offs between NNet training time and GA generational time. 5) Parallel implementations of GAs that facilitate NNet simulation. 6) A role for ontogenesis between GA evolution and NNet learning. 7) The role optimality (doesn't!) play in evolution ========================================================================= Projection Pursuit and Neural Networks Organizers: Ying Zhao, Chris Atkeson and Peter Huber, MIT Speakers: R.Douglas Martin, University of Washington John Moody, Yale University Ying Zhao, MIT Andrew R. Barron, University of Illinois Nathan Intrator, Brown University Trevor Hastie, Bell Labs Abstract: Projection Pursuit is a nonparametric statistical technique to find "interesting" low dimensional projections of high dimensional data sets. 
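As a toy illustration of that basic idea (an editorial sketch, not any of the speakers' methods), the snippet below hunts for an "interesting" one-dimensional projection by crude random search, scoring each candidate direction with a simple departure-from-Gaussianity index; the data set and every constant in it are invented for the example.

import numpy as np

rng = np.random.default_rng(1)
n = 1000
# toy data: five nominally Gaussian coordinates, with a bimodal signal hidden along axis 2
hidden = np.where(rng.random(n) < 0.5, -3.0, 3.0) + rng.normal(size=n)
data = rng.normal(size=(n, 5))
data[:, 2] += hidden

def interestingness(z):
    # absolute excess kurtosis of a standardized 1-D projection (zero for a Gaussian)
    z = (z - z.mean()) / z.std()
    return abs(np.mean(z ** 4) - 3.0)

best_dir, best_score = None, -np.inf
for _ in range(5000):
    d = rng.normal(size=data.shape[1])
    d /= np.linalg.norm(d)                 # random unit direction
    score = interestingness(data @ d)
    if score > best_score:
        best_dir, best_score = d, score

print("best index value:", round(best_score, 3))
print("best direction:", np.round(best_dir, 2))   # should load heavily on axis 2

Real projection pursuit replaces the random search with numerical optimization of a carefully chosen index, and projection pursuit regression then fits a smooth ridge function along each selected direction.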
We hope to improve our understanding of neural networks and projection pursuit by discussing issues such as fast training algorithms based on PP, duality with kernel approximation, possible avoidance of the "curse of dimensionality", and the sample complexity for PP. ========================================================================= Constructive and Destructive Learning Algorithms II Organizer: Scott E. Fahlman, Carnegie Mellon University Speakers: TBA Abstract: Recently we have seen the emergence of new learning algorithms that alter the network's topology. Some of these algorithms start with excess connections and remove any that are not needed; others start with a sparse network and add hidden units as needed, sometimes in multiple layers; some algorithms do both. In a two-day workshop on this topic at NIPS-90, a number of learning algorithms that modify network topology were presented by their authors and were critically evaluated. The past year has seen a great deal of additional work in this area. We will briefly review the major algorithms presented last year. Then we will turn to more recent developments, including both new algorithms and experience gained in using the older ones. Finally, we will consider current trends and will try to identify open problems for future research. ========================================================================= Oscillations and Correlations in Neural Information Processing Organizer: Ernst Niebur, Caltech Speakers: Bard Ermentrout, U. of Pittsburgh Hennric Jokeit, U. of Munich Marius Usher, Weizmann Institute Ernst Niebur, Caltech Abstract: This workshop will address models proposed for tasks like tieing together the different parts of one object in the visual field or for binding the different representations of an object in different cortical areas. Both oscillation-based models as well as alternative models based on phase coherence (correlations) will be considered in the light of the latest experimental findings. ========================================================================= Optimization of Neural Network Architectures for Speech Recognition Organizers: Uli Bodenhausen, Universitaet Karlsruhe Alex Waibel, Carnegie Mellon University Speakers: Kenichi Iso, NEC Corporation, Japan Patrich Haffner, CNET, France Mike Franzini, Telefonica I + D, Spain Abstract: A variety of neural network algorithms have recently been applied to speech recognition tasks. Besides having learning algorithms for weights, optimization of the network architectures is required to achieve good performance. Also of critical importance is the optimization of neural network architectures within hybrid systems for best performance of the system as a whole. Parameters that have to be optimized within these constraints include the number of hidden units, number of hidden layers, time-delays, connectivity within the network, input windows, the number of network modules, number of states and others. The proposed workshop intends to discuss and evaluate the importance of these architectural parameters and different integration strategies for speech recognition systems. Participating researchers interested in speech recognition are welcome to present short case studies on the optimization of neural networks, preferably with an evaluation of the optimization steps. 
The workshop could also be of interest to researchers working on constructive/destructive learning algorithms because the relevance of different architectural parameters should be considered for the design of these algorithms. ========================================================================= SELF-ORGANIZATION AND UNSUPERVISED LEARNING IN VISION Organizer: Jonathan A. Marshall, Univ. of North Carolina Speakers: Suzanna Becker, University of Toronto Irving Biederman, University of Southern California Thomas H. Brown, Yale University Joachim M. Buhmann, Lawrence Livermore National Laboratory Heinrich Bulthoff, Brown University Edward Callaway, Duke University Allan Dobbins, McGill University Gillian Einstein, Duke University Charles Gilbert, The Rockefeller Universty John E. Hummel, UCLA Daniel Kersten, University of Minnesota David Knill, University of Minnesota Laurence T. Maloney, New York University Jonathan A. Marshall, University of North Carolina at Chapel Hill Paul Munro, University of Pittsburgh Albert L. Nigrin, American University Alice O'Toole, The University of Texas at Dallas Jurgen Schmidhuber, University of Colorado Nicol Schraudolph, University of California at San Diego Michael P. Stryker, University of California at San Francisco Patrick Thomas, Technische Universitat Muenchen Rich Zemel, University of Toronto Abstract: This workshop considers the role that unsupervised learning procedures (e.g. Hebb-type rules) may play in the self-organization of cortical structures involved in the processing of visual information. Researchers in visual neuroscience, visual psychophysics and neural network modeling will be brought together to address head-on the key issue of how animal visual systems got the way they are. We hope that this will lead to a better understanding of the factors that shape the structure of animal visual systems, as well as better models of the neurophysiological processes underlying vision. ========================================================================= Developments in Bayesian methods for neural networks Organizers: David MacKay, Caltech Steve Nowlan, Salk Institute Abstract: The first day of this workshop will be 50% tutorial in content, reviewing some new ways Bayesian methods may be applied to neural networks. The rest of the workshop will be devoted to discussions of the frontiers and challenges facing Bayesian work in neural networks, including issues such as Monte Carlo clustering, data selection, active query learning, prediction of generalisation, missing inputs, unlabelled data and discriminative training, Discussion will be moderated by John Bridle. Speakers: Radford Neal Jurgen Schmidhuber John Moody David Haussler + Michael Kearns Sara Solla + Esther Levin Steve Renals Reading up before the workshop ------------------------------ People intending to attend this workshop are encouraged to obtain preprints of relevant material before NIPS. A selection of preprints are available by anonymous ftp, as follows: unix> ftp hope.caltech.edu (or ftp 131.215.4.231) login: anonymous password: ftp> cd pub/mackay ftp> get README.NIPS ftp> quit Then read the file README.NIPS for further information. Problems? Contact David MacKay, mackay at hope.caltech.edu ========================================================================= Active Learning and Control Organizers: David Cohn, Univ. of Washington Don Sofge, MIT Speakers: C. Atkeson, MIT A. Barto, Univ. of Massachussetts, Amherst J. Hwang, Univ. of Washington M. Jordan, MIT A. Moore, MIT J. 
Schmidhuber, University of Colorado, Boulder R. Sutton, GTE S. Thrun, Carnegie-Mellon University Abstract: An "active" learning system is one that is not merely a passive observer of its environment, but instead plays an active role in determining its inputs. This definition includes classification networks that query for values in "interesting" parts of their domain, learning systems that actively "explore" their environment, and adaptive controllers that learn how to produce control outputs to achieve a goal. Common facets of these problems include building world models in complex domains, exploring a domain safely and efficiently, and planning future actions based on one's model. In this workshop, our main focus will be on addressing key unsolved problems which may be holding up progress, rather than on presenting polished, finished results. Our hope is that unsolved problems in one field may draw on insights from research in other fields. ========================================================================= Computer Vision vs Network Vision Organizers: John Mayhew and Terry Sejnowski Speakers: TBA Abstract: Computer vision has developed a methodology based on sound engineering practice: 1. Break the problem down into well-defined subproblems and mathematically analyze each part; 2. Develop efficient algorithms for each module; 3. Implement each algorithm with the best available technology. These are Marr's three levels: computational, algorithmic, and implementational. In contrast, proponents of neural networks have developed a different methodology: 1. Find a good representation for the input data that makes explicit the features needed to solve the problem; 2. Use learning algorithms to cluster and categorize the data; 3. Glue together networks that solve different parts of the problem with more learning. Networks are memory intensive and constraints from the hardware level are as important as constraints from the computational level. This workshop is intended to provoke a lively and free-wheeling discussion of the central issues in vision. ========================================================================= Complexity Issues in Neural Computation and Learning Organizers: Kai-Yeung Siu and Vwani Roychowdhury, Stanford Univ. Speakers: TBA Abstract: The goal of this workshop is to address recent developments in understanding the capabilities and limitations of various models for neural computation and learning. Topics will include: 1) circuit complexity of neural networks, 2) capacity of neural networks, and 3) complexity issues in learning algorithms. ========================================================================= RECURRENT NETWORKS: THEORY AND APPLICATIONS Organizers: Luis Borges de Almeida, INESC C. Lee Giles, NEC Research Institute Richard Rohwer, Edinburgh University Speakers: TBA Abstract: Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains an important open issue. Training algorithms are very inefficient in terms of memory and computational demands. Little is known about convenient architectures. The number of known successful applications is very limited. This is true even for static applications (operation in the "fixed point mode").
The first day of this two-day workshop will focus on the outstanding theoretical issues in recurrent neural networks, and the second day will examine existing and potential real-world applications. ========================================================================= VLSI Neural Networks and Neurocomputers Organizers: Clifford Lau, Office of Naval Research Jim Burr, Stanford University Speakers: TBA Abstract: This two-day workshop will address the latest advances in VLSI implementations of neural nets, and the design of high performance neurocomputers. We will present an updated list of currently available neurochips, and discuss a wide range of issues, including: 1) Design issues: Advantage and disadvantage of analog and digital approaches; how much arithmetic precision is necessary; which algorithms have been implemented; importantance of on-chip learning; neurochip design in existing CAD environment. 2) Performance issues: Critical factors to achieve robust performance; Tradeoffs between capacity and performance; scaling limits to constructing large neural networks. 3) Use of neurochips: What input/output devices are necessary; what programming support environment is necessary. 4) Application areas for supercomputing neurocomputers From zeiden at cs.wisc.edu Mon Oct 28 10:30:14 1991 From: zeiden at cs.wisc.edu (zeiden@cs.wisc.edu) Date: Mon, 28 Oct 91 09:30:14 CST Subject: tech report available in NEUROPROSE Message-ID: <9110281530.AA29229@ai.cs.wisc.edu> I have placed the following tech report in the NEUROPROSE ftp archive at Ohio State, under the name zeidenberg.containment.ps.Z Implementing Spatial Relations in Neural Nets: The Case of Figure/Ground and Containment Matthew Zeidenberg zeiden at cs.wisc.edu A neural network system that computes the relation of containment between objects in a retina-like input array is described. This system is multi-layer, and operates by recognizing and segmenting the objects in the input to place them in separated arrays. The figure of each object, that is, the set of all pixels on the perimeter of or contained in the object, is computed for each object, using a method that involves a connectionist implementation of a standard algorithm using parity networks. These figures are then used to compute containment relations between the objects in the input. ftp Instructions: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get zeidenberg.containment.ps.Z ftp> quit unix> uncompress zeidenberg.containment.ps.Z unix> lpr zeidenberg.containment.ps (or other command to print postscript) From black at seismo.CSS.GOV Mon Oct 28 12:01:00 1991 From: black at seismo.CSS.GOV (Mike Black) Date: Mon, 28 Oct 91 12:01:00 EST Subject: What is current technology in Analog Neural Nets? Message-ID: <9110281701.AA21092@beno.CSS.GOV> I've seen little discussion and have found no references to work in analog neural networks. If you can provide some references or indicate what your current work is I'll summarize. These are the goals for my current research: Given an analog data source (e.g. pulse generator): 1. Recognize pulses (for example a single shot square wave) and reject "noise" (i.e. triangular wave) at rates of at least 10MHz (that is, it should be able to deal with a minimum 100ns pulse width). 2. Provide the trigger for an external digitizer to grab the resultant "good" pulses. 3. Be software controllable (hardware should be able to be updated by remote control). 
Please forward any current work or capability in this area to: black at beno.css.gov >> ------------------------------------------------------------------------------- >> : usenet: black at beno.CSS.GOV : land line: 407-494-5853 : I want a computer: >> : real home: Melbourne, FL : home line: 407-242-8619 : that does it all!: >> ------------------------------------------------------------------------------- From lissie!botsec7!botsec1!dcl at uunet.UU.NET Mon Oct 28 13:54:54 1991 From: lissie!botsec7!botsec1!dcl at uunet.UU.NET (David Lambert) Date: Mon, 28 Oct 91 13:54:54 EST Subject: Resource Allocation Network (RAN) Message-ID: <9110281854.AA20399@botsec1.bot.COM> Dear Connectionists: Has anyone tried to implement the Resource Allocation Network of John Platt (NIPS 3 and Neural Computation V3 #2)? I have a first cut at an implementation, and so far I have not been able to approach his published results. I'd be very interested in corresponding with anyone who has tried this algorithm. Also, if anyone has a means of reaching John Platt, I'd love to hear about it. I've been calling Synaptics in San Jose for over a week now, and there don't seem to be any humans that work there...only voice mail. Thanks David Lambert dcl at object.com or dcl at panix.com From khosla at latcs1.lat.oz.au Mon Oct 28 22:32:44 1991 From: khosla at latcs1.lat.oz.au (Rajiv Khosla) Date: Tue, 29 Oct 91 14:32:44 +1100 Subject: Spatial crosstalk and modular NN architecture Message-ID: <9110290332.AA18704@latcs1.lat.oz.au> Dear Connectionists, This is regarding my problem of making a 28-11-26, binary input/output neural network work. Thanks to everyone who sent me the replies. Its working nice and kicking. Best results are achieved by connecting the input layer to the output layer. Thanks once again Rajiv From terry at jeeves.UCSD.EDU Tue Oct 29 02:35:07 1991 From: terry at jeeves.UCSD.EDU (Terry Sejnowski) Date: Mon, 28 Oct 91 23:35:07 PST Subject: Continuous vs. Batch learning Message-ID: <9110290735.AA01748@jeeves.UCSD.EDU> There is evidence that the hippocampus is doing something like batch mode teaching for neocortex. The hippocampus is needed for one-shot learning, also called declarative or episodic learning. It seems to be storing up a lot of examples and over a period of months transfers this informaiton to cortex, where it is stored in a more categorical representation. Terry ----- From smieja at jargon.gmd.de Tue Oct 29 05:14:40 1991 From: smieja at jargon.gmd.de (Frank Smieja) Date: Tue, 29 Oct 91 11:14:40 +0100 Subject: Batch methods versus stochastic methods... In-Reply-To: mmoller@daimi.aau.dk's message of Mon, 21 Oct 91 13:13:06 +0100 Message-ID: <9110291014.AA24169@jargon.gmd.de> -) Unfortunately, we do not have any datasets of the proper size. -) So I would appreciate if anyone could inform me about where to find big -) datasets that are public available. -) -) -- Martin M -) -) ----------------------------------------------------------------------- -) Martin F. 
Moller email: mmoller at daimi.aau.dk -) Computer Science Department phone: +45 86202711 5223 -) Aarhus University fax: +45 86135725 -) Ny Munkegade, Building 540 -) 8000 Aarhus C -) Denmark -) ---------------------------------------------------------------------- I demonstrated in my paper "MLP Solutions, Generalization and Hidden Unit Representations" in the DANIP (Distributed And Neural Information Processing) conference in Bonn, Germany, April 1989 (ed: Kindermann & Linden, pub: Oldenbourg Verlag), how one might "synthetically" construct a training set of inputs/outputs of any size that may be generalized, insofar as the "regularities" beloved by our networks are guaranteed to exist, since they are used to generate the training set pairs, but not visible to the network until the examples are seen, and the learning results in "emergent generalization". I used this method in the paper to study a small diagnosis problem, but scaling up is no problem. If you cannot get hold of this book, and would like to see the paper, I can make it available in the neuroprose archive (unfortunately without figures, but they are not needed to explain the method). If this is also difficult, I will send hard copies to interested parties. Please send such requests directly to me (smieja at gmdzi.uucp) and I will either reply directly or to the bboard. -Frank Smieja From joachim at gmdzi.gmd.de Tue Oct 29 12:57:47 1991 From: joachim at gmdzi.gmd.de (Joachim Diederich) Date: Tue, 29 Oct 91 16:57:47 -0100 Subject: New Paper Message-ID: <9110291557.AA14221@gmdzi.gmd.de> The following paper has been placed in the Neuroprose archives at Ohio State. The file is "diederich.hybrid.ps.Z." See ftp instructions below. Efficient Question Answering in a Hybrid System Joachim Diederich (1,2) & Debra L. Long (2) (1) German National Research Center for Computer Science (GMD) Schloss Birlinghoven, P.O. Box 1240 D-5205 St.Augustin 1, Germany (2) Department of Psychology University of California, Davis Davis, CA 95616, U.S.A. ABSTRACT: A connectionist model for answering open-class questions in the context of text processing is presented. The system answers questions from different question categories, such as "How," "Why," and "Consequence" questions. These question categories have been identified in several empirical studies (Graesser & Clark, 1985; Graesser, 1990). The system responds to a question by generating a set of possible answers that are weighted according to their plausibility. Search is performed by means of a massively parallel, directed spreading activation process. The search process operates on several knowledge sources (i.e., connectionist networks) that are learned or explicitly built-in. Spreading activation involves the use of signature messages (Lange & Dyer, 1989). Signature messages are numeric values that are propagated throughout the networks and identify a particular question category (this makes the system hybrid). Binder units that gate the flow of activation between textual units receive these signatures and change their states. That is, the binder units either block the spread of activation or allow the flow of activation in a certain direction. The process results in a pattern of activation that represents a set of candidate answers based on available knowledge sources. This paper will appear in the IJCNN-91 Singapore Proceedings.
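To make the gating mechanism easier to picture, here is a toy sketch of signature-gated spreading activation in the spirit of the abstract; the little graph, the signature values, and the decay constant are all invented for this illustration and are not the authors' networks.

# two invented question categories, identified by numeric signatures
HOW, WHY = 1, 2

# directed edges: (source, target, signatures allowed to pass through this binder)
edges = [
    ("ignition-fails", "battery-dead",    {WHY}),
    ("ignition-fails", "turn-the-key",    {HOW}),
    ("battery-dead",   "lights-left-on",  {WHY}),
    ("turn-the-key",   "check-gearshift", {HOW}),
]

def spread(source, signature, steps=3, decay=0.5):
    # spread activation from the question node; binder units pass it only
    # when the propagated signature matches their allowed set
    act = {source: 1.0}
    for _ in range(steps):
        new = dict(act)
        for src, dst, allowed in edges:
            if signature in allowed and act.get(src, 0.0) > 0.0:
                new[dst] = max(new.get(dst, 0.0), decay * act[src])
        act = new
    # candidate answers, strongest first, excluding the question node itself
    return sorted(((n, a) for n, a in act.items() if n != source),
                  key=lambda p: -p[1])

print("WHY question:", spread("ignition-fails", WHY))
print("HOW question:", spread("ignition-fails", HOW))

Blocking an edge simply means its binder never copies activation across, so the same network yields different candidate answer sets for different question categories.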
unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get diederich.hybrid.ps.Z ftp> quit unix> uncompress diederich.hybrid.ps.Z unix> lpr diederich.hybrid.ps Joachim Diederich German National Research Center for Computer Science (GMD) P.O. Box 1240 D-5205 St. Augustin 1 Germany From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 29 15:23:46 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 29 Oct 91 15:23:46 EST Subject: Robinson's vowel dataset In-Reply-To: Your message of Mon, 28 Oct 91 09:49:58 +0200. <9110280049.AA00241@hcrlgw.crl.hitachi.co.jp> Message-ID: Does anyone have any NEW results on Robinson's vowel dataset? I am aware of the original results given in his thesis: A. Robinson. "Dynamic Error Propagation Networks", PhD Thesis, Cambridge Univ 1989. I don't know of any more recent publications on this problem. I got some rather good results using Cascade-Correlation: (train 300 300 25)) SigOff 0.10, WtRng 1.00, WtMul 1.00 OMu 2.00, OEps 1.00, ODcy 0.0300, OPat 12, OChange 0.010 IMu 2.00, IEps 10.00, IDcy 0.0300, IPat 8, IChange 0.030 Utype :SIGMOID, Otype :SIGMOID, RawErr NIL, Pool 32
Trial 0: 181 of 462 cases wrong, 281 right, 60.82% @ 23 hidden
Trial 1: 174 of 462 cases wrong, 288 right, 62.34% @ 11 hidden
Trial 2: 193 of 462 cases wrong, 269 right, 58.23% @ 24 hidden
Trial 3: 183 of 462 cases wrong, 279 right, 60.39% @ 15 hidden
Trial 4: 180 of 462 cases wrong, 282 right, 61.04% @ 24 hidden
Trial 5: 186 of 462 cases wrong, 276 right, 59.74% @ 17 hidden
Trial 6: 188 of 462 cases wrong, 274 right, 59.31% @ 11 hidden
Trial 7: 174 of 462 cases wrong, 288 right, 62.34% @ 15 hidden
Trial 8: 173 of 462 cases wrong, 289 right, 62.55% @ 13 hidden
Trial 9: 170 of 462 cases wrong, 292 right, 63.20% @ 18 hidden
Avg: 180 of 462 cases wrong, 282 right, 61.03% @ 17 hidden
The test set was run after each output training phase and the best value obtained is the one reported. The best results obtained by Robinson were 260 right (56%) for nearest neighbor, and 253 right (55%) for 528 Gaussian nodes or 88 square nodes. Backprop with 88 sigmoids never got better than 234 (51%). I've never published these results, because I think they are a bit of a cheat. The problem is that I played around with the decay factor and other parameters until I got good results on the test set. It's not clear that the same setting would give equally good performance on a new test set that I had never seen. Also, in all cases the algorithm obtained a solid level of 59% or so, but then wandered up and down, in no particular pattern, as new units were added. I can get a good number -- up to 63% -- by grabbing the best point on this random walk, but I don't honestly believe that the network at that point would give equally good results on new test data drawn from the same distribution. What we really need is a much larger data set for this problem. Then we could split the set into training data (a larger set, offering much better generalization), cross-validation data (used to determine when training should stop), and final test data, never used in training. The current set is so small that it's not possible to split things up this way. -- Scott Fahlman From kak at max.ee.lsu.edu Tue Oct 29 16:36:52 1991 From: kak at max.ee.lsu.edu (Dr. S.
Kak) Date: Tue, 29 Oct 91 15:36:52 CST Subject: No subject Message-ID: <9110292136.AA01849@max.ee.lsu.edu> CALL FOR PAPERS Special Issue On NETWORKS FOR NEURAL PROCESSING Circuits, Systems, and Signal Processing Guest Editors: W.A. Porter, University of Alabama, Huntsville S.C. Kak, Louisiana State University, Baton Rouge Papers are solicited on the theoretical foundations, challenging applications and efficient parallel architectures for neural computing. Suggested topics include: training for generalization, use of higher order moments, rapid training algorithms, nonbinary design, optimization networks, and mapping networks. Papers which critique and/or compare recent developments in neural computation are also of interest. Papers should be prepared according to the Information for Contributors on the inside back cover of Circuits, Systems, and Signal Processing. Papers should be submitted in triplicate by January 20, 1992 in care of: Professor William A. Porter Department of Electrical and Computer Engineering The Univesity of Alabama in Huntsville Huntsville, AL 35899 [Tel. (205) 895-6858] For further information contact Professor S.C. Kak at kak at max.ee.lsu.edu or contact Professor W.A. Porter. From dlukas at PARK.BU.EDU Tue Oct 29 13:34:49 1991 From: dlukas at PARK.BU.EDU (David Lukas) Date: Tue, 29 Oct 91 13:34:49 -0500 Subject: Faculty position in Cognitive & Neural Systems at Boston University Message-ID: <9110291834.AA29864@cns.bu.edu> Assistant Professor Cognitive and Neural Systems Boston University Boston University seeks to hire a tenure track assistant professor starting in Fall 1992 for its graduate Department of Cognitive and Neural Systems. The Department offers an integrated curriculum offering the full range of psychological, neurobiological, and computational concepts, models, and methods in the fields of neural networks, computational neuroscience, parallel distributed processing, and biological information processing, in which Boston University is a leader. Candidates should have extensive analytic or computational research experience in modelling nonlinear neural networks, especially in one or more of the areas: learning, speech and language processing, adaptive pattern recognition, cognitive information processing, and adaptive sensory-motor control. Send a complete curriculum vitae and three letters of recommendation to Stephen Grossberg, Chairman, Search Committee, Department of Cognitive and Neural Systems, Room 240, 111 Cummington Street, Boston University, Boston, MA 02215, no later than January 1, 1992. Boston University is an Equal Opportunity/Affirmative Action employer. If you have questions or require further information, please reply to Carol Jefferson---caroly at cns.bu.edu. From demers at cs.UCSD.EDU Tue Oct 29 16:48:36 1991 From: demers at cs.UCSD.EDU (David DeMers) Date: Tue, 29 Oct 91 13:48:36 PST Subject: Generalization Message-ID: <9110292148.AA15810@beowulf.ucsd.edu> A short while back there was a discussion of generalization; I recall contributions by Wolpert and Goldfarb, among others. I didn't save the exchanges, however I'd like to look at them now. Unfortunately, I can't seem to connect up to the archive to retrieve the mailings. If anyone has most of the discussion still lying around, I'd appreciate it if you could mail it to me; also, I'd appreciate anyone's opinion on "what is generalization" in 250 words or less :-) I do have most of David Wolpert's papers, so don't need another copy of them... 
Thanks for any help, Dave From hcard at ee.UManitoba.CA Wed Oct 30 15:14:07 1991 From: hcard at ee.UManitoba.CA (hcard@ee.UManitoba.CA) Date: Wed, 30 Oct 91 14:14:07 CST Subject: batch learning Message-ID: <9110302014.AA00760@card.ee.umanitoba.ca> In the PDP books batch learning accumulates error derivatives from each pattern rather than simply their contributions to the total error, before making weight changes. It seems that gradient descent ought to add all the errors before taking any derivatives. Any comments? Howard Card From petsche at learning.siemens.com Wed Oct 30 15:33:38 1991 From: petsche at learning.siemens.com (Thomas Petsche) Date: Wed, 30 Oct 91 15:33:38 EST Subject: NIPS travel (limited cheap airfare) Message-ID: <9110302033.AA12077@learning.siemens.com> FYI: United has a special fare program available until tomorrow. We just booked a round trip from Newark to Denver (leave Monday morning and return Sunday morning) for $250. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 29 22:59:19 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 29 Oct 91 22:59:19 EST Subject: Resource Allocation Network (RAN) In-Reply-To: Your message of Mon, 28 Oct 91 13:54:54 -0500. <9110281854.AA20399@botsec1.bot.COM> Message-ID: Have you tried E-mail? I exchanged some mail with him a month or so ago: John Platt -- Scott From ruizdeangulo%ispgro.cern.ch at BITNET.CC.CMU.EDU Wed Oct 30 04:55:26 1991 From: ruizdeangulo%ispgro.cern.ch at BITNET.CC.CMU.EDU (ruizdeangulo%ispgro.cern.ch@BITNET.CC.CMU.EDU) Date: Wed, 30 Oct 91 10:55:26 +0100 Subject: batch-continous-one shot Message-ID: <9110300955.AA03462@dxmint.cern.ch> Referring to the batch-continuous-one-shot learning discussion, in the reference below we describe an algorithm that can be labeled as one-shot learning. I think it fits well with the Plutowski and White method described recently. >What we do (as reported in the tech report by Plutowski & White) >is sequentially grow the training set, first finding >an "optimal" training set of size 1, then fitting the network to this >training set, appending the training set with a new exemplar selected from >the set of available candidates, obtaining a training set of size 2 which >is "approximately optimal", fitting this set, appending a third exemplar, etc, >continuing the process until the network fit obtained by training over the >exemplars fits the rest of the available examples within the desired tolerance. The MDL (Minimal Disturbance Learning) algorithm introduces a new exemplar by minimizing an estimate of the loss function (error increment) over the old patterns. It makes a little search for this optimization, but whatever the stopping point (for this search), perfect recall of the new exemplar is obtained. The network is not forced to assume any special kind of local representation. Ruiz de Angulo, V., and Torras, C. (1991). Minimally Disturbing Learning. In Proceedings of IWANN 91. Springer Verlag. From edelman at wisdom.weizmann.ac.il Thu Oct 31 04:08:00 1991 From: edelman at wisdom.weizmann.ac.il (Shimon Edelman) Date: Thu, 31 Oct 91 11:08+0200 Subject: Resource Allocation Network (RAN) In-Reply-To: <9110281854.AA20399@botsec1.bot.COM> Message-ID: <19911031090807.2.EDELMAN@YAD.weizmann.ac.il> A similar technique of RBF center allocation, in conjunction with other modifications of RBF learning, was successful in replicating human performance in the difficult visual task of hyperacuity vernier discrimination.
See AI Memo 1271, "Synthesis of visual modules from examples: learning hyperacuity", by T. Poggio, M. Fahle and S. Edelman (January 1991). Center allocation is discussed there on p.7. -Shimon Edelman edelman at wisdom.weizmann.ac.il From dfausett at zach.fit.edu Thu Oct 31 09:48:43 1991 From: dfausett at zach.fit.edu ( Donald W. Fausett) Date: Thu, 31 Oct 91 09:48:43 -0500 Subject: What is current technology in Analog Neural Nets? Message-ID: <9110311448.AA02454@zach.fit.edu> Prof. Bernard Widrow at Stanford University (EE Dept) would be a likely source to steer you in the right direction. Locally, you might try Prof. Hal Brown at FIT (EE Dept). Good luck. -- Don Fausett From lissie!botsec7!botsec1!dcl at UUNET.uu.net Thu Oct 31 10:13:26 1991 From: lissie!botsec7!botsec1!dcl at UUNET.uu.net (David Lambert) Date: Thu, 31 Oct 91 10:13:26 EST Subject: Resource Allocation Network (RAN) Message-ID: <9110311513.AA24956@botsec1.bot.COM> Hi. Thanks to all respondents concerning my RAN question. I managed to get in touch with John Platt, and he was most helpful. John Platt writes: > Someone forwarded me your posting on the connectionist mailing list.. > Could you please follow up, and say that you have successfully used > RAN? It would be nice to leave an impression of a working algorithm... My sincere apologies for being lax in my courtesies, John. You're right, of course. I got RAN working just fine, and it works as well as (if not better than) advertised. To those who asked for a copy of the resulting code, I'll probably release it sometime soon, through one mechanism or another. Thanks again. David Lambert dcl at object.com or dcl at panix.com From B344DSL at UTARLG.UTA.edu Wed Oct 9 23:55:00 1991 From: B344DSL at UTARLG.UTA.edu (B344DSL@UTARLG.UTA.edu) Date: Wed, 9 Oct 1991 22:55 CDT Subject: Announcement and call for abstracts for Feb. conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu> ANNOUNCEMENT AND CALL FOR ABSTRACTS WORKSHOP ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the Texas SIG of the International Neural Network Society (INNS). To be held at a location to be announced in the Dallas-Fort Worth area, Thursday through Saturday, February 6-8, 1992. Confirmed speakers include: Stephen Grossberg (Boston University) Stephen Hampson (University of California, Irvine) Karl Pribram (Radford University) Harold Szu (Naval Surface Warfare Center) Graham Tattersall (University of East Anglia) The focus of this conference will be twofold: (1) how to optimize different aspects of neural and cognitive function and (2) whether particular natural or artificial solutions to specific neural or cognitive problems are in fact optimal. Specific problems to which these optimality considerations are applied will be taken from many areas including goal direction and planning, adaptive categorization, sensory perception, and motor control. The talks will be an hour each for invited speakers and 45 minutes each for contributed speakers, with time afterwards for questions. Speakers will not be required to write a paper, but will be invited to contribute chapters to a book several months after the conference. Books based on two previous MIND conferences -- on Motivation, Emotion, and Goal Direction in Neural Networks and Neural Networks for Knowledge Representation and Inference -- are now being published by Lawrence Erlbaum Associates.
Registration for the conference will be $80 for non-students, $20 for students, with a $10 rebate for MIND or Texas SIG membership. We will try to arrange for discounted air fares from American Airlines as we have done in the past. Those interested in presenting should send me a short (1-3 paragraph) abstract by December 1, 1991, using either e-mail, FAX, or snail mail. Notification of acceptance will be given December 15, 1991. We will not be holding parallel sessions, so there are limitations on the number of speakers. However, individuals who send high-quality abstracts that cannot be accommodated in actual talks will have space to present their work in posters at the conference, and will also be invited to contribute to the book. Prof. Daniel S. Levine Department of Mathematics University of Texas at Arlington Arlington, TX 76019-0408 e-mail: b344dsl at utarlg.uta.edu FAX: 817-794-5802 Telephone: 817-273-3598
From M.Stannett at dcs.sheffield.ac.uk Wed Oct 2 16:30:06 1991 From: M.Stannett at dcs.sheffield.ac.uk (M.Stannett@dcs.sheffield.ac.uk) Date: Wed, 2 Oct 91 16:30:06 BST Subject: Concurrent semantics Message-ID: <9110021530.AA04587@sun5.dcs.sheffield.ac.uk> Dear All, IF THIS MESSAGE ISN'T RELEVANT TO YOU, PLEASE PASS IT TO SOMEONE TO WHOM IT IS.
One of my major delights in computer science is the nature of concurrent semantics, and especially the "non-interleaving" models like Mazurkiewicz trace language and their analogues (these are models which represent so-called 'true' concurrency, rather than trying to flatten everything down into sequences of actions). Nonetheless, I readily admit that the more standard "interleaving" models are fascinating in their own right as well. In any case, I'm certain we're all trying to solve the same problems, but merely approaching them from slightly different angles - in ten years time, we'll be wondering what all the disagreement was about .... {{{ CONNECTIONISTS: concurrent semantics is concerned with working out what complex concurrent systems are actually doing, and how properly to represent their behaviour. Applying the standard sequential interpretations to concurrent systems can sometimes lead to misleading results. Consequently, I would argue that finding a deep understanding of the nature of complex networks probably involves exactly the same problems as are currently faced by concurrent semantics theorists. It might prove extremely fruitful to see some colloborations between the two fields }}} As far as I can work out, there seems to be only negligible contact between the many groups working in the area. I'd like to see some sort of elecronic forum for discussing ideas in the area - even if we can't work together, at least we might be able to exchange ideas rapidly from time to time. Please let me know if you'd be interested in joining in a sort of loosely confederated "concurrency club" or whatever. Obviously, there's be no funding to speak of, but then, given sufficient enthusiasm, we shouldn't need any. (At least, not yet). Provided the task isn't TOO time-consuming, I'll happily channel messages to interested parties for the time-being. Thanks for reading! Mike Stannett ( M.Stannett @ uk.ac.sheffield.dcs ) From et at eng.cam.ac.uk Wed Oct 2 10:31:19 1991 From: et at eng.cam.ac.uk (E. Tzirkel-Hancock) Date: Wed, 2 Oct 91 15:31:19 +0100 Subject: Technical Report Available Message-ID: <24638.9110021431@tw700.eng.cam.ac.uk> The following report has been placed in the neuroprose archives at Ohio State University: STABLE CONTROL OF NONLINEAR SYSTEMS USING NEURAL NETWORKS Eli Tzirkel-Hancock & Frank Fallside Technical Report CUED/F-INFENG/TR.81 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract A neural network based direct control architecture is presented, that achieves output tracking for a class of continuous time nonlinear plants, for which the nonlinearities are unknown. The controller employs neural networks to perform approximate input/output plant linearization. The network parameters are adapted according to a stability principle. The architecture is based on a modification of a method previously proposed by the authors, where the modification comprises adding a sliding control term to the controller. This modification serves two purposes: first, as suggested by Sanner and Slotine, sliding control compensates for plant uncertainties outside the state region where the networks are used, thus providing global stability; second, the sliding control compensates for inherent network approximation errors, hence improving tracking performance. A complete stability and tracking error convergence proof is given and the setting of the controller parameters is discussed. 
It is demonstrated that as a result of using sliding control, better use of the network's approximation ability can be achieved, and the asymptotic tracking error can be made dependent only on inherent network approximation errors and the frequency range of unmodeled dynamical modes. Two simulations are provided to demonstrate the features of the control method. ************************ How to obtain a copy ************************ a) via FTP: % ftp archive.cis.ohio-state.edu .. Name (archive.cis.ohio-state.edu): anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get tzirkel.control_tr81.ps.Z ftp> quit % uncompress tzirkel.control_tr81.ps.Z % lp tzirkel.control_tr81.ps b) via postal mail: Request a hardcopy from Eli Tzirkel, et at eng.cam.ac.uk Speech Laboratory Cambridge University Engineering Department Trumpington Street, Cambridge CB2 1PZ England From STIVA%IRMKANT.BITNET at vma.cc.cmu.edu Thu Oct 3 11:41:47 1991 From: STIVA%IRMKANT.BITNET at vma.cc.cmu.edu (stefano nolfi) Date: Thu, 03 Oct 91 11:41:47 EDT Subject: Technical Report Available Message-ID: The following technical report is available. Send request to STIVA at IRMKANT.BITNET DO NOT REPLAY TO THIS MESSAGE ------------------------------------------------------------------------ Learning, Behavior, and Evolution Domenico Parisi Stefano Nolfi Federico Cecconi Institute of Psychology CNR - Rome e-mail: stiva at irmkant.Bitnet Abstract We present simulations of evolutionary processes operating on populations of neural networks to show how learning and behavior can influence evolution within a strictly Darwinian framework. Learning can accelerate the evolutionary process both when learning tasks correlated with the fitness criterion and when random learning tasks are used. Furthermore, an ability to learn a task can emerge and be transmitted evolutionarily for both correlated and uncorrelated tasks. Finally, behavior that allows the individual to self-select the incoming stimuli can influence evolution by becoming one of the factors that determine the observed phenotypic fitness on which selective reproduction is based. For all the effects demonstrated, we advance a consistent explanation in terms of a multidimensional weight space for neural networks, a fitness surface for the evolutionary task, and a performance surface for the learning task. This paper will be presented at ECAL-91 - European Conference on Artificial Life, December 1991, Paris. From mre1 at it-research-institute.brighton.ac.uk Thu Oct 3 09:20:50 1991 From: mre1 at it-research-institute.brighton.ac.uk (Mark Evans) Date: Thu, 3 Oct 91 09:20:50 BST Subject: IJCNN '91 Singapore - Request to share a room Message-ID: <1583.9110030820@itri.bton.ac.uk> I will be attending IJCNN '91 in Singapore on the 18-21 November where I will be presenting a paper. I would be interested in hearing from anyone who would like to share a twin room for the duration of the conference. (I am about to book myself a room or I could pay you if you have already booked a room.) I am PhD student at Brighton Polytechnic, UK working in the field of computer vision and neural networks. Anyone interested ? ################################################# # # # M.R. Evans mre1 at itri.bton.ac.uk # # Research Assistant mre1 at itri.uucp # # # # ITRI, # # Brighton Polytechnic, # # Lewes Road, # # BRIGHTON, # # E. Sussex, # # BN2 4AT. 
# # # # Tel: +44 273 642915/642900 # # Fax: +44 273 606653 # # # ################################################# From kak at max.ee.lsu.edu Thu Oct 3 10:38:55 1991 From: kak at max.ee.lsu.edu (Dr. S. Kak) Date: Thu, 3 Oct 91 09:38:55 CDT Subject: TR's available Message-ID: <9110031438.AA14174@max.ee.lsu.edu> Please send me a copy of your report. Subhash Kak Professor of Electrical & Computer Engineering Louisiana State University Baton Rouge, LA 70803-5901 From M.Stannett at dcs.sheffield.ac.uk Fri Oct 4 16:12:17 1991 From: M.Stannett at dcs.sheffield.ac.uk (M.Stannett@dcs.sheffield.ac.uk) Date: Fri, 4 Oct 91 16:12:17 BST Subject: concurrent semantics mailing list Message-ID: <9110041512.AA06164@sun5.dcs.sheffield.ac.uk> Hello again! A number of subscribers to CONNECTIONISTS have indicated they haven't come across concurrent semantics (which may explain Chris Tofts' comments below). I'll send you a quick summary of the subject area in a few days' time, and try to show why it's relevant to connectionist researchers. Meanwhile ... two respondents have indicated that appropriate electronic fora already exist for the discussion of concurrent semantics, while others have demonstrated that (like me) they have no information about these fora. Since there's no point setting up a third system in competition with the other two I now know about, I enclose the details below. (If the other are indeed distinct, perhaps they should consider merging ...) --- Included message #1 --- From: Miranda Mowbray Hello Mike, Yes, this is a very good idea [...] There is already a Concurrency mailing list and archive, specially designed as a forum for rapid exchange of ideas between different groups working in Concurrency. It's been running for some time now and I'm surprised you haven't heard of it. It's run by Albert Meyer at MIT. To join, send a message to concurrency at theory.lcs.mit.edu saying that you'd like to be on the mailing list. You'll get information about archive files available. This is a high quality forum and I recommend joining. I also recommend that you tell anyone else who replies to your message and wants to be in a concurrency club. I don't see why you should go to the trouble of setting up your own separate club when one already exists, unless your version has specific local interests which are not catered for by Albert Meyer's; in any case what you *mustn't* do is set up a second forum which will keep people ignorant of the first, after all the whole point is to get everyone together! Thankyou for your public-spiritedness, Yours, Miranda. --- Included message(s) # 2/3 --- From: Chris Tofts Subject: Re: Concurrent semantics Hi Mike, interesting idea, at a symposium on complex systems in the states last year I suggested using ideas from algebraic concurrency theory to a collection of people working in neural nets etc, they not only seemed remarkable uniterested but failed to see any link. It seems that any connections (sic) will have to be exposed from the theoretical side. There already exists a news group for concurrency which is used, are you suggesting something other than this?? All the best, Chris. From: C.Tofts at uk.ac.bath.gdr I believe its mail.concurrency, at least that's what its called in edinburgh. Ask your local news guru, All the best, Chris. 
--- End of included messages --- From wray at ptolemy.arc.nasa.gov Fri Oct 4 19:11:01 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Fri, 4 Oct 91 16:11:01 PDT Subject: tree classification code available for comparative studies Message-ID: <9110042311.AA01252@ptolemy.arc.nasa.gov> I've made the following report available on the Neuroprose Archive (cheops.cis.ohio-state.edu) as buntine.treecode.ps.Z not because I think connectionists are "deeply" interested in tree learning research but because I think it would be a handy resource for comparative studies: 1) systems such as CART/C4 are recognised programs for benchmarking supervised learning systems against 2) home-grown reimplementations can be buggy and a timesink 3) if your problem has some inherent structure and a few key indicator variables then trees may be a good thing to try as well 4) trees typically don't work well with purely numeric data or with problems with many variables all giving some minor contribution to the prediction being made The IND Tree Package we developed here incorporates some of early C4, most of the classification trees component of CART (no regression) along with some more recent Bayesian/MDL approaches that sometimes work better. You can obtain LaTeX source for the following introductory report if you email to: ind at kronos.arc.nasa.gov and ask for "About the IND Tree Package". --------------------------------------- About the IND Tree Package Wray Buntine, RIACS NASA Ames Research Center Mail Stop 269-2 Moffet Field, CA 94035 September 29, 1991 This note introduces the IND Tree Package to prospective procurers and those users/installers looking at IND for the first time. IND does supervised learning using classification trees. IND integrates features from Breiman {\it et al.}'s CART and Quinlan's C4 with newer Bayesian and minimum encoding methods for growing classification trees, and provides an experimental control suite on top. The package comes with a manual, ``man'' entries, and a guide to tree methods and research. Information about obtaining IND, performance statistics, documentation, authorship, copyright, installation, etc., are given. IND is currently under development, although it has been used considerably since late 1989. IND is implemented in C under UNIX. ---------------------------------------- Wray Buntine RIACS (Research Inst. for Advanced Comp. Sc.) NASA Ames Research Center phone: (415) 604 3389 Mail Stop 244-17 fax: (415) 604 6997 Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Sun Oct 6 09:31:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Sun, 6 Oct 91 09:31 U Subject: Thank's for help. Message-ID: <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Sun Oct 6 09:33:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Sun, 6 Oct 91 09:33 U Subject: Thank's for help Message-ID: <01GBEHHAOVDCD7QHLX@BITNET.CC.CMU.EDU> From tackett at ipla00.dnet.hac.com Sun Oct 6 00:08:02 1991 From: tackett at ipla00.dnet.hac.com (Walter Alden Tackett) Date: Sun, 6 Oct 91 00:08:02 EDT Subject: tree classification code available for comparative studies Message-ID: <9110060708.AA10023@ipla00.ipl.hac.com> Wray Buntine writes: > not because I think connectionists are "deeply" interested in tree learning ...only in *dendritic* trees, maybe? 
;-) -wt From aboulang at BBN.COM Sun Oct 6 11:40:36 1991 From: aboulang at BBN.COM (aboulang@BBN.COM) Date: Sun, 6 Oct 91 11:40:36 EDT Subject: Detailed Balance In-Reply-To: 7923509%TWNCTU01.BITNET@bitnet.cc.cmu.edu's message of Sun, 6 Oct 91 09:31 U <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> Message-ID: The property (2) is called detailed balance, resulting in a Gibbs distribution for the probability to find the system in a particular state. The rule (1) is an update procedure for the spin Sk which ensures detailed balance provided that E is an energy. Both principles are fundamental facts of statistical mechanics of neural networks (or, if you prefer, result from a maximum entropy analysis of neural nets). The book by Hertz, Krogh and Palmer summarizes all that in a nice way. The book title is "Introduction to the Theory of Neural Computation". We really should be saying that detailed balance in sampling implies a Gibbs distribution, but that the Gibbs distribution does not imply the use of a sampling procedure with detailed balance. There is some new work on this: J. Marroquin & A. Ramirez "Stochastic Cellular Automata with Gibbsian Invariant Measures" IEEE Trans Information Theory May(*), 1991 * I can't find the paper so I may have the month wrong. This is potentially good news to people trying to get annealing-type algorithms to work for fine-grained MIMD parallelism. Regards, Albert Boulanger aboulanger at bbn.com From M.Stannett at dcs.sheffield.ac.uk Sun Oct 6 00:10:13 1991 From: M.Stannett at dcs.sheffield.ac.uk (Mike Stannett) Date: Sun, 6 Oct 91 00:10:13 BST Subject: summary of concurrent semantics Message-ID: <9110052310.AA15255@dcs.sheffield.ac.uk> ((This message is just over two pages of A4 long)) A very brief (incomplete) summary of concurrent semantics --------------------------------------------------------- (This description reflects my personal bias towards trace models; I apologise in advance to anyone who feels I've given an unbalanced account of the field.) You will recall Russell's demonstration that mathematics early this century was built on very dodgy ground. The search was on, and still is, for a formal theory of mathematics itself - why is it sensible to discuss some sets but not others? This purely mathematical problem led directly to many aspects of computer science that are now taken for granted. For example, Skolem (c. 1934) realised that the derivation of Russell's paradox could be avoided by introducing the notion of definition-by-recursion. Meanwhile, Church was developing the lambda-calculus, Post was working on his production systems, and Turing was introducing his machine models and computational AI. As a result, there is a wealth of structure available for discussing the underlying nature of computational processes themselves. This is essential in some cases. For example, we need to ensure that the code we produce will generate the same behaviour when compiled on two different systems; consequently, we need some way of describing the semantics of this code (i.e. what it's supposed to mean) which is machine-independent. There are several approaches to this problem, with perhaps the most mathematical being 'denotational semantics', under which all programs can be regarded as functional - a program becomes a function which maps abstract 'inputs' to abstract 'outputs'. For concurrent systems, this 'functional' view is insufficient.
A standard example concerns the use of shared variables: from a purely sequential point of view, the two programs

    prog1: x=0; x++; x++
    prog2: x=0; x+=2

are identical, since they implement the same overall function. From the concurrent point of view, they are NOT identical, because they can interact with a third process in different ways. For example, if we run first prog1 and then prog2 in the context of

    prog3: x=10;

then the possible values of x on termination of the combined systems are different

    prog1 | prog3 : 2, 10, 11, 12, error
    prog2 | prog3 : 2, 10, 12, error

depending on precisely when prog3 gets executed. Accordingly, much of concurrent semantics is based on the idea that processes should be regarded as active agents which interact with each other. For example, we would reject the notion that the variable x is just a passive entity which is operated upon; instead it becomes an agent in its own right, which interacts with the processes that update it.

Many solutions to the problem of correctly representing the semantics of concurrent systems have been developed, and can be roughly divided into two 'schools' - interleaving and non-interleaving. According to the interleaving version, the sequences of activities that might be performed by two systems running concurrently are just the interleavings of the sequences for the systems taken individually. This is the approach adopted in (the standard theories of) CCS and CSP. The non-interleaving school argues that this representation is inappropriate, and indeed unnecessary, since models of 'true' concurrency are easy to develop (e.g. Petri nets). In the middle ground, there are models such as 'Mazurkiewicz trace theory' which consider the behaviour of a concurrent system to be represented by the collection of ALL its possible action-sequences (rather than accepting the notion that any one of these traces will do as a valid representation). Nor is this a complete list of the approaches used; for example, there is a growing tendency to use models based on category theory and general topology, but I can't reasonably include these in a short summary (besides, I don't know enough about them to represent them accurately).

The key differences between the different approaches are in the way they treat the relationship between time and causality. Given that we are trying to describe a system based on the possible observations of its behaviour, we have to be careful when we impute relationships that may not exist. It may just happen, for example, that one event in a system is always followed by another - but this doesn't mean that they are causally related. Sometimes this doesn't matter, but problems can arise when we introduce additional processes with which to interact. It becomes very difficult to work out precisely how the models of individual processes should be 'stuck together' to get a valid model of the combined system. Presumably this problem is reflected in difficulties faced by connectionists in deciding what happens when large nets are considered to be made up of more manageable sub-nets. Do you have a general theory yet for deciding

    * what process is computed by a given net ?
    * what process is computed by a given combination of smaller nets ?

If not, perhaps our two different disciplines could benefit from talking to one another.

Some sources
============
Probably the best sources for results in semantics and concurrency are the many volumes of the "Lecture Notes in Computer Science" series from Springer-Verlag.
In addition, CCS: The standard text is Robin Milner 1989 Communication and Concurrency Prentice-Hall International CSP: The standard text is C.A.R. (Tony) Hoare 1985 Communicating Sequential Processes Prentice Hall International A good collection of papers that demonstrates the relationships between the many approaches to concurrent semantics is Kwiatkowska M.Z., Shields M.W, and Thomas R.M. (eds) Semantics for concurrency, Leicester 1990 BCS/Springer 'Workshops in Computing' ISBN 3-540-19625-0 I've also got a couple of recent tech. reports concerning generalisations of trace theory for those who want them, but be warned that these are of a highly technical nature, and may not be of much relevance to you just yet. These are Kwiatkowska M.Z. and Stannett M. On transfinite traces CS-91-06 Stannett M. Trace convergence over infinite alphabets CS-91-08 Best wishes, Mike Stannett. From rba at vintage.bellcore.com Mon Oct 7 15:23:15 1991 From: rba at vintage.bellcore.com (Bob Allen) Date: Mon, 7 Oct 91 15:23:15 -0400 Subject: No subject Message-ID: <9110071923.AA12445@vintage.bellcore.com> Subject: Student Travel Grants for NIPS'91 Modest financial support for travel to the Neural Information Processing Systems (NIPS, Denver Dec 2-5, 1991) conference is available to students and other young researchers who are active in neural networks research. Those requesting support should send a one-page summary of their background and research interests, a cirriculum vitae, and their email address to: Dr. R.B. Allen NIPS Treasurer Bellcore MRE 2A-367 445 South Street Morristown, NJ 07960-1910 Travel grant check for those receiving awards will be available at the conference registration desk. From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Tue Oct 8 13:00:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Tue, 8 Oct 91 13:00 U Subject: Thank's Message-ID: <01GBHGORN4ZKD7Q01U@BITNET.CC.CMU.EDU> From aboulang%BBN.COM at CARNEGIE.BITNET Sun Oct 6 11:40:36 1991 From: aboulang%BBN.COM at CARNEGIE.BITNET (aboulang%BBN.COM@CARNEGIE.BITNET) Date: Sun, 6 Oct 91 11:40:36 EDT Subject: Detailed Balance In-Reply-To: 7923509%TWNCTU01.BITNET@bitnet.cc.cmu.edu's message of Sun, 6 Oct 91 09:31 U <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> Message-ID: <01GBFAUK6QK0D7QISN@BITNET.CC.CMU.EDU> We really should be saying that detailed balance in sampling implies a Gibbs distribution, but that the Gibbs distribution does not imply the use of a sampling procedure with detailed balance. There is some new work on this: J. Marroquin & A. Ramerez "Stochastic Cellular Automata with Gibbsian Invariant Measures" IEEE Trans Information Theory May(*), 1991 * I can't find the paper so I may have the month wrong. This is potentially good news to people trying to get annealing-type algorithms to work for fine-grained MIMD parallelism. 
Regrads, Albert Boulanger aboulanger at bbn.com From PAR%DM0MPI11.BITNET at BITNET.CC.CMU.EDU Tue Oct 8 12:03:11 1991 From: PAR%DM0MPI11.BITNET at BITNET.CC.CMU.EDU (Pal Ribarics) Date: Tue, 08 Oct 91 16:03:11 GMT Subject: NN Workshop Message-ID: <01GBI4QH7740D7POVG@BITNET.CC.CMU.EDU> ******************************************************************************* Dear Colleague , we would like to remind you of the deadline for sending abstracts to the topical workshop on Neural Networks within the Second International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics Talks will be selected by the Organizing Committee on the basis of a detailed abstract to be submitted before: 15 October, 1991. to the address below. You will also find a registration form which was sent to you in a prior mail. Best regards B. Denby C. Kiesling C. Peterson P. Ribarics ======================================================================== SECOND INTERNATIONAL WORKSHOP ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS FOR HIGH ENERGY AND NUCLEAR PHYSICS 1992 January 13 - 18 L'AGELONDE FRANCE-TELECOM LA LONDE LES MAURES BP 64 F-83250 REGISTRATION NAME: FIRSTNAME: LABORATORY: COUNTRY ADDRESS: TEL: FAX: TELEX: E-MAIL: HOTEL RESERVATION (Number of persons): In the following you are expected to answer with the corresponding number or character from the list above. However if your interest is not mentioned in the list give a full description. WOULD YOU BE INTERESTED TO JOIN A WORKING GROUP OF THE ASTEC PROJECT ? YES/NO GROUP: SUBGROUP: WOULD YOU LIKE TO ATTEND TOPICAL WORKSHOPS OR TUTORIALS ? WORKSHOPS: TUTORIALS: WOULD YOU LIKE TO PRESENT A TALK ? YES/NO TALK TITLE: To be considered by the organizing committee, send an extended abstract before Oct. 15, 1991 to: Michele Jouhet Marie-claude Fert CERN L.A.P.P. - IN2P3 PPE-ADM B.P. 110 CH-1211 Geneve 23 F-74941 Annecy-Le-Vieux SWITZERLAND FRANCE Tel: (41) 22 767 21 23 Tel: (33) 50 23 32 45 Fax: (41) 22 767 65 55 Fax: (33) 50 27 94 95 Telex: 419 000 Telex: 385 180 F E-mail: jouhet at CERNVM Workshop fee : 700 FFr. Student : 500 FFr. Accommodation : 2000 FFr. Accompagning Person: +1200 FFr. To be paid by check: Title: International Workshop CREDIT LYONNAIS/Agence Internationale Bank: 30002 Guichet: 1000 Account: 909154 V Address: LYON REPUBLIQUE The accommodation includes: hotel-room, breakfast, lunch and dinner for 6 days. Tennis, mountain bike and other activities will be available. Denis Perret-Gallix Tel: (41) 22 767 62 93 E-mail: Perretg at CERNVM Fax: (41) 22 782 89 23 From squires at cs.wisc.edu Wed Oct 9 03:22:37 1991 From: squires at cs.wisc.edu (Charles Squires) Date: Wed, 9 Oct 91 02:22:37 -0500 Subject: 3 reports available Message-ID: <9110090722.AA17071@mozzarella.cs.wisc.edu> *** PLEASE DO NOT FORWARD TO OTHER LISTS *** The following three working papers have been placed in the neuroprose archive: -Maclin, R. and Shavlik, J.W., Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding, Machine Learning Research Group Working Paper 91-2. Neuroprose file name: maclin.fskbann.ps.Z -Scott, G.M., Shavlik, J.W., and Ray, W.H., Refining PID Controllers using Neural Networks, Machine Learning Research Group Working Paper 91-3. Neuroprose file name: scott.nnpid.ps.Z -Towell, G.G. and Shavlik, J.W., The Extraction of Refined Rules from Knowledge-Based Neural Networks, Machine Learning Research Group Working Paper 91-4. 
Neuroprose file name: towell.interpretation.ps.Z The abstract of each paper and ftp instructions follow: ---------- Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding Richard Maclin Jude W. Shavlik Computer Sciences Dept. University of Wisconsin - Madison email: maclin at cs.wisc.edu We describe a method for using machine learning to refine algorithms represented as generalized finite-state automata. The knowledge in an automaton is translated into a corresponding artificial neural network, and then refined by applying backpropagation to a set of examples. Our technique for translating an automaton into a network extends the KBANN algorithm, a system that translates a set of propositional, non- recursive rules into a corresponding neural network. The topology and weights of the neural network are set by KBANN so that the network represents the knowledge in the rules. We present the extended system, FSKBANN, which augments the KBANN algorithm to handle finite-state automata. We employ FSKBANN to refine the Chou-Fasman algorithm, a method for predicting how globular proteins fold. The Chou-Fasman algorithm cannot be elegantly formalized using non-recursive rules, but can be concisely described as a finite-state automaton. Empirical evidence shows that the refined algorithm FSKBANN produces is statistically significantly more accurate than both the original Chou-Fasman algorithm and a neural network trained using the standard approach. We also provide extensive statistics on the type of errors each of the three approaches makes and discuss the need for better definitions of solution quality for the protein- folding problem. ---------- Refining PID Controllers using Neural Networks Gary M. Scott (Chemical Engineering) Jude W. Shavlik (Computer Sciences) W. Harmon Ray (Chemical Engineering) University of Wisconsin The KBANN (Knowledge-Based Artificial Neural Networks) approach uses neural networks to refine knowledge that can be written in the form of simple propositional rules. We extend this idea further by presenting the MANNCON (Multivariable Artif- icial Neural Network Control) algorithm by which the mathematical equations governing a PID (Proportional-Integral-Derivative) con- troller determine the topology and initial weights of a network, which is further trained using backpropagation. We apply this method to the task of controlling the outflow and temperature of a water tank, producing statistically- significant gains in accu- racy over both a standard neural network approach and a non- learning PID controller. Furthermore, using the PID knowledge to initialize the weights of the network produces statistically less variation in testset accuracy when compared to networks initial- ized with small random numbers. ---------- The Extraction of Refined Rules from Knowledge-Based Neural Networks Geoffrey G. Towell Jude W. Shavlik Department of Computer Science University of Wisconsin E-mail Address: towell at cs.wisc.edu Neural networks, despite their empirically-proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge in some form must be inserted into a neural network. Second, the network must be refined. Third, knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. 
In this paper, we propose and empirically evaluate a method for the final, and possibly most difficult, step. This method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules: (1) closely reproduce (and can even exceed) the accuracy of the network from which they are extracted; (2) are superior to the rules produced by methods that directly refine symbolic rules; (3) are superior to those produced by previous techniques for extracting rules from trained neural networks; (4) are ``human comprehensible.'' Thus, the method demonstrates that neural networks can be an effective tool for the refinement of symbolic knowledge. Moreover, the rule-extraction technique developed herein contributes to the understanding of how symbolic and connectionist approaches to artificial intelligence can be profitably integrated. ---------- FTP Instructions: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclin.fskbann.ps.Z OR... get scott.nnpid.ps.Z OR... get towell.interpretation.ps.Z ftp> quit unix> uncompress maclin.fskbann.ps.Z OR... uncompress scott.nnpid.ps.Z OR... uncompress towell.interpretation.ps.Z unix> lpr maclin.fskbann.ps OR... lpr scott.nnpid.ps OR... lpr towell.interpretation.ps (or use whatever command you use to print PostScript) From danielg at cogs.sussex.ac.uk Wed Oct 9 07:07:27 1991 From: danielg at cogs.sussex.ac.uk (Daniel Glaser) Date: Wed, 9 Oct 91 12:07:27 +0100 Subject: Restrictions on recurrent learning Message-ID: <29747.9110091107@rsunx.cogs.susx.ac.uk> I have been working on some simple recurrent networks as defined by Jordan(1986) and Elman(1990), and am interested in the class of temporal regularities that they can learn. In particular, how do they compare with more general back propagation through time defined by the PDP group(1986) and Werbos(1990) ? In the Jordan/Elman nets, activation flows forward in time from `copies' of units from previous cycles, and thus, during learning, error only propagates backwards locally in time. Does anyone know of any theoretical or empirical work on what these different types of network can learn ? If replies are addressed to me personally, I will post a summary in due course. Thanks Daniel. References: Elman, J.~L. (1990). Finding structure in time. {\em Cognitive Science}, {\bf 14}:179--211. Jordan, M.~I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In {\em Proceedings of the Eighth Annual Meeting of the Cognitive Science Society}, Hillsdale, NJ. Erlbaum. Rumelhart, D.~E., McClelland, J.~L., \& Williams, R.~J. (1986). Learning internal representations by error propagation. In D.~E. Rumelhart \& J.~L. McClelland (Eds.), {\em Parallel Distributed Processing: Explorations in the Microstructure of Cognition}, volume~1 chapter~8. Cambridge, MA: MIT Press/Bradford Books. Werbos, P.~J. (1990). Backpropagation through time: What it does and how to do it. {\em Proceedings of the IEEE}, 78(10):1550--1560. 
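To make the architectural contrast behind the question concrete: an Elman net copies the previous hidden state into its context units, a Jordan net copies the previous output (usually with a decay term) into its state units, and backpropagation through time unrolls the network so that gradients flow through those copies instead of stopping at them. The forward passes below are a minimal sketch under assumed weight names and logistic units, not code taken from any of the papers cited above.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elman_step(x, h_prev, Wxh, Whh, Why):
    # Context units hold a copy of the previous HIDDEN state.
    h = sigmoid(Wxh @ x + Whh @ h_prev)
    return h, sigmoid(Why @ h)

def jordan_step(x, s_prev, y_prev, Wxh, Wsh, Why, decay=0.5):
    # State units hold a decayed copy of the previous OUTPUT.
    s = decay * s_prev + y_prev
    h = sigmoid(Wxh @ x + Wsh @ s)
    return s, h, sigmoid(Why @ h)

The training difference is where the gradient stops: the Elman and Jordan schemes treat h_prev and s_prev as frozen inputs, so error propagates only one step back in time, whereas backpropagation through time differentiates through them over the whole unrolled sequence (at the cost of storing the activation history), which is the natural place to look for differences in what the two families can learn.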
From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Wed Oct 9 14:05:33 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Wed, 09 Oct 91 14:05:33 -0400 Subject: Recurrent Cascade-Correlation Code Message-ID: Simulation code for the Recurrent Cascade-Correlation (RCC) algorithm, previously available only in Common Lisp, has now been translated into C by Conor Doherty of the University College of Dublin (Ireland). This code is a modification of the C program for original Cascade-Correlation, written by Scott Crowder of Carnegie Mellon. My thanks to Conor and Scott for their help in making these programs available to the barbarian hordes who speak only C. For a description of this algorithm, see Scott E. Fahlman, "The Recurrent Cascade-Correlation Architecture" in Advances in Neural Information Processing Systems 3, edited by R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Morgan Kaufmann Publishers, 1991. Alternatively, see the tech report mentioned below. The instructions for accessing any of this code via FTP are included at the end of this message. Scott E. Fahlman School of Computer Science Carnegie Mellon University =========================================================================== Public-domain simulation programs for the Quickprop, Cascade-Correlation, and Recurrent Cascade-Correlation learning algorithms are available via anonymous FTP on the Internet. This code is distributed without charge on an "as is" basis. There is no warranty of any kind by the authors or by Carnegie-Mellon University. Instructions for obtaining the code via FTP are included below. If you can't get it by FTP, contact me by E-mail (sef+ at cs.cmu.edu) and I'll try *once* to mail it to you. Specify whether you want the C or Lisp version. If it bounces or your mailer rejects such a large message, I don't have time to try a lot of other delivery methods. I am maintaining an E-mail list of people using this code so that I can notify them of any changes or problems that occur. I would appreciate hearing about any interesting applications of this code, and will try to help with any problems people run into. Of course, if the code is incorporated into any products or larger systems, I would appreciate an acknowledgement of where it came from. If for some reason these programs do not work for you, please contact me and I'll try to help. Common errors: (1) Some people don't notice that the symmetric sigmoid output units in cascor have a range of -0.5 to +0.5 (for reasons that are mostly historical). If you try to force this algorithm to produce an output of +1.0 or +37.3, it isn't going to work. (2) Note that quickprop (which is used inside of Cascade-Correlation) is designed to update the weights after every epoch, and it assumes that all the epochs are identical. If you try to run this code updating after every training case, you will lose badly. If you want to change the training set, it is important to zero out the PREV-SLOPES and DELTAS vectors, and also to re=build the caches in Cascade-Correlation. HOW TO GET IT: For people (at CMU, MIT, and soon some other places) with access to the Andrew File System (AFS), you can access the files directly from directory "/afs/cs.cmu.edu/project/connect/code". This file system uses the same syntactic conventions as BSD Unix: case sensitive names, slashes for subdirectories, no version numbers, etc. The protection scheme is a bit different, but that shouldn't matter to people just trying to read these files. 
For people accessing these files via FTP: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu". The internet address of this machine is 128.2.254.155, for those who need it. 2. Log in as user "anonymous" with your own ID as password. You may see an error message that says "filenames may not have /.. in them" or something like that. Just ignore it. 3. Change remote directory to "/afs/cs/project/connect/code". NOTE: you must do this in a single operation. 4. At this point FTP should be able to get a listing of files in this directory with DIR and fetch the ones you want with GET. (The exact FTP commands you use depend on your local FTP server.) Current contents: quickprop1.lisp Original Common Lisp version of Quickprop. quickprop1.c C version by Terry Regier, U. Cal. Berkeley. cascor1.lisp Original Common Lisp version of Cascade-Correlation. cascor1.c C version by Scott Crowder, Carnegie Mellon rcc1.lisp Common Lisp version of Recurrent Cascade-Correlation. rcc1.c C version, trans. by Conor Doherty, Univ. Coll. Dublin vowel.c Code for Tony Robinson's vowel benchmark. am4.tar.Z Aspirin/Migraine code from MITRE. backprop.lisp Overlay for quickprop1.lisp. Turns it into backprop. --------------------------------------------------------------------------- Tech reports describing these algorithms can also be obtained via FTP. These are Postscript files, processed with the Unix compress/uncompress program. Follow the steps for FTP access as above, but cd to directory unix> ftp pt.cs.cmu.edu (or 128.2.254.155) Name: anonymous Password: ftp> cd /afs/cs/project/connect/tr ftp> binary ftp> get filename.ps.Z ftp> quit unix> uncompress filename.ps.Z unix> lpr filename.ps (or however you print postscript files) For "filename", sustitute the following: cascor-tr Cascade-Correlation paper. qp-tr Paper on Quickprop and other backprop speedups. rcc-tr Recurrent Cascade-Correlation paper. precision Hoehfeld-Fahlman paper on Cascade-Correlation with limited numerical precision. From B344DSL at UTARLG.UTA.EDU Wed Oct 9 23:55:00 1991 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Wed, 9 Oct 1991 22:55 CDT Subject: Announcement and call for abstracts for Feb. conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu> ANNOUNCEMENT AND CALL FOR ABSTRACTS WORKSHOP ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the Texas SIG of the International Neural Network Society (INNS). To be held at a loca- tion to be announced in the Dallas-Fort Worth area, Thursday through Saturday, February 6-8, 1992. Confirmed speakers include: Stephen Grossberg (Boston University) Stephen Hampson (University of California, Irvine) Karl Pribram (Radford University) Harold Szu (Naval Surface Warfare Center) Graham Tattersall (University of East Anglia) The focus of this conference will be twofold: (1) how to optimize different aspects of neural and cognitive function and (2) whether particular natural or artificial solutions to specific neural or cognitive problems are in fact opti- mal. Specific problems to which these optimality considerations are applied will be taken from many areas including goal direction and planning, adaptive cat- egorization, sensory perception, and motor control. The talks will be an hour each for invited speakers and 45 minutes each for contributed speakers, with time afterwards for questions. 
Speakers will not be re- quired to write a paper, but will be invited to contribute chapters to a book several months after the conference. Books based on two previous MIND conferen- ces -- on Motivation, Emotion, and Goal Direction in Neural Networks and NeuralNetworks for Knowledge Representation and Inference -- are now being published by Lawrence Erlbaum Associates. Registration for the conference will be $80 for non-students, $20 for students, with a $10 rebate for MIND or Texas SIG membership. We will try to arrange for discounted air fares from American Airlines as we have done in the past. Those interested in presenting should send me a short (1-3 paragraph) abstract by December 1, 1991, using either e-mail, FAX, or snail mail. Notification of ac- ceptance will be given December 15, 1991. We will not be holding parallel ses- sions, so there are limitations on the number of speakers. However, individu- als who send high-quality abstracts that cannot be accommodated in actual talks will have space to present their work in posters at the conference, and will also be invited to contribute to the book. Prof. Daniel S. Levine Department of Mathematics University of Texas at Arlington Arlington, TX 76019-0408 e-mail: b344dsl at utarlg.uta.edu FAX: 817-794-5802 Telephone: 817-273-3598 From bessiere at imag.fr Thu Oct 10 12:48:37 1991 From: bessiere at imag.fr (Pierre Bessiere) Date: Thu, 10 Oct 1991 17:48:37 +0100 Subject: 4 reports available Message-ID: <9110101648.AA09388@imag.imag.fr> The following four papers/reports have been placed in the neuroprose archive: - Bessiere, P.; "Toward a synthetic cognitive paradigm: Probabilistic Inference"; Conference COGNITIVA90, Madrid, Spain, 1990 Neuroprose file name: bessiere.cognitiva90.ps.Z - Talbi, E-G. & Bessiere, P.; "A parallel genetic algorithm for the graph partitioning problem"; ACM-ICS91 (Conference on Super Computing), Cologne, Germany, 1991 Neuroprose file name: bessiere.acm-ics91.ps.Z - Bessiere, P., Chams, A. & Muntean, T.; "A virtual machine model for artificial neural network programming"; INNC90 (International Neural Networks Conference), Paris, France, 1990 Neuroprose file name: bessiere.innc90.ps.Z - Bessiere, P., Chams, A. & Chol, P.; "MENTAL: a virtual machine approach to artificial neural networks programming"; ESPRIT B.R.A. project NERVES (3049), final report, 1991 The abstract of each paper and ftp instructions follow: ---------- TOWARD A SYNTHETIC COGNITIVE PARADIGM: PROBABILISTIC INFERENCE Cognitive science is a very active field of scientific interest. It turns out to be a "melting pot" of ideas coming from very different areas. One of the principal hopes is that some synthetic cognitive paradigms will emerge from this interdisciplinary "brain storming". The goal of this paper is to answer the question: "Given the state of the art, is there any hints indicating the emergence of such synthetic paradigms?" The main thesis of the paper is that there is a good candidate, namely, the probabilistic inference paradigm. 
In support of the above thesis the structure of the paper is as follows: - in a first part, we identify five criteria to qualify as a synthetic cognitive paradigm (validity, self consistency, competence, feasibility and mimetic power); - in the second paragraph, the principles of probabilistic inference are reviewed and justifications of validity and self consistency of this paradigm are given (Marr's computational level); - then, the competence criterion is discussed, considering the efficiency of probabilistic inference for dealing with the different classical cognitive riddles and analyzing the relationships of probabilistic inference with several of the usual connexionist formalisms (Marr's algorithmic level); - the criteria of feasibility (condition of computer implementation) and mimetic power (adequation with what is known of the architecture of the nervous system) are finally considered in the fourth part (Marr's implementation level). As a conclusion, it will appear that probabilistic inference is at least a very interesting framework to get a synthetic overview of a number of works in the area and to identify and formalize the most puzzling questions. Some of these questions will be listed. In fact, probabilistic inference will appear finally to be able to play the same role for computational cognitive science that formal logic has played for classical symbolic Artificial Intelligence: a sound mathematical foundation serving as a guide line, as a constant reference and as a source of inspiration. ---------- A PARALLEL GENETIC ALGORITHM FOR THE GRAPH PARTITIONING PROBLEM Genetic algorithms are stochastic search and optimization techniques which can be usedf for a wide range of applications. This paper addresses the application of genetic algorithms to the graph partitioning problem. Standard genetic algorithms with large populations suffer from lack of efficiency (quite high execution time). A massively parallel genetic algorithm is proposed, an implementation on a SuperNode( of Transputers( and results of various benchmarks are given. The parallel algorithm shows a superlinear speed-up, in the sense that when multiplying the number of processors by p, the time spent to reach a solution with a given score, is divided by kp (k>1). A comparative analysis of our approach with hill-climbing algorithms and simulated annealing is also presented. The experimental measures show that our algorithm gives better results concerning both the quality of the solution and the time needed to reach it. ---------- A VIRTUAL MACHINE MODEL FOR ARTIFICIAL NEURAL NETWORK PROGRAMMING This paper introduces the model of a virtual machine for A.N.N. (Artificial Neural Networks). The context of this work is a collaborative project to study new V.L.S.I. implementations and new architectures for neuronal machines. The work consists in the specification and a prototype implementation of a description language for A.N.N., of the associated virtual machine, of the compiler between them and of the compilers mapping the virtual machine on different highly parallel computers. In this short paper we present the virtual machine model which combines the features of various parallel programming paradigms. Our model allows, in particular, to have the same A.N.N. program running on both synchronous or asynchronous type of machines. In this framework a parallel architecture (S.M.A.R.T.) and a dynamically reconfigurable parallel machine of Transputers (SuperNode) are considered as target machines. 
---------- MENTAL: A VIRTUAL MACHINE APPROACH TO ARTIFICIAL NEURAL NETWORKS PROGRAMMING (ATTENTION: 100 pages) This report treats (extensively) the same subject than the short paper described just above. Some parts are extracted from the three previouly presented papers. ---------- These reports may be FTP from either neuroprose archives or from my own server (IMAG): How to get files from the Neuroprose archives? ______________________________________________ Anonymous ftp on: - archive.cis.ohio-state.edu (128.146.8.52) mymachine>ftp archive.cis.ohio-state.edu Name: anonymous Password: yourname at youradress ftp>cd pub/neuroprose ftp>binary ftp>get bessiere.foo.ps.Z ftp>quit mymachine>uncompress bessiere.foo.ps.Z How to get files from IMAG? ___________________________ Anonymous ftp on: - 129.88.32.1 mymachine>ftp 129.88.32.1 Name: anonymous Password: yourname at youradress ftp>cd pub/SYMPA/NNandGA ftp>binary ftp>get bessiere.foo.ps.Z ftp>quit mymachine>uncompress bessiere.foo.ps.Z -- Pierre BESSIERE *************** IMAG/LGI phone: BP 53X Work: 33/76.51.45.72 38041 Grenoble Cedex Home: 33/76.51.16.15 FRANCE Fax: 33/76.44.66.75 Telex:UJF 980 134 F E-Mail: bessiere at imag.imag.fr C'est au savant moderne que convient, plus qu'a tout autre, l'austere conseil de Kipling: "Si tu peux voir s'ecrouler soudain l'ouvrage de ta vie, et te remettre au travail, si tu peux souffrir, lutter, mourrir sans murmurer, tu seras un homme , mon fils." Dans l'oeuvre de la science seulement on peut aimer ce qu on detruit, on peut continuer le passe en le niant, on peut venerer son maitre en le contredisant. GASTON BACHELARD From gary at cs.UCSD.EDU Thu Oct 10 13:26:04 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Thu, 10 Oct 91 10:26:04 PDT Subject: Restrictions on recurrent learning Message-ID: <9110101726.AA24233@desi.ucsd.edu> Fu-Sheng Tsung and I showed there were problems that a hidden-recurrent (Elman-style) net can learn that an output-recurrent Jordan net can't in our 1989 paper in IJCNN: Tsung, Fu-Sheng and Cottrell, G. (1989) A sequential adder using recurrent networks. In \fIProceedings of the International Joint Conference on Neural Networks\fP, Washington, D.C. A similar paper with some state space analysis is in: Cottrell, G. and Fu-sheng Tsung (1991). Learning simple arithmetic procedures. In J.A. Barnden & J.B. Pollack (Eds), \fIAdvances in connectionist and neural computation theory, Vol 1: High-level connectionist models\fP, Norwood: Ablex. There are simple logical arguments that show that hidden-recurrent nets are more powerful than output-recurrent nets. The bottom line is that if there is a problem where the teaching signal forces "forgetting" of the input, then a Jordan-style output-recurrent network cannot respond to things that require remembering it. Hal White also believes Elman nets are strictly more powerful than Jordan nets, but I'm not sure he has a proof. gary cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at cs.ucsd.edu (INTERNET) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET) gcottrell at ucsd.edu (BITNET) From ECONEC at vax.oxford.ac.uk Fri Oct 11 11:39:00 1991 From: ECONEC at vax.oxford.ac.uk (ECONEC@vax.oxford.ac.uk) Date: Fri, 11 Oct 91 11:39 BST Subject: REQUEST FOR INFORMATION: NNs AND ECONOMICS Message-ID: REQUEST FOR INFORMATION I am studying for an MLitt/DPhil at the Oxford University and would be very grateful for some help. 
This message is being transmitted to several relevant lists and please feel free to forward it to anyone who might be interested. Apologies in advance to anyone who gets fed up with seeing it! 1) REQUEST: I am interested in references and names for work broadly in the area of AI techniques applied to economics. To narrow this down, I am interested in AI as a tool for developing alternative models of economic behaviour than the traditional view of man as a perfectly informed calculating machine! Because of the behavioural aspect and my preference for economic theory I am hoping to avoid work that simply uses AI techniques to solve traditional models faster. (GAs as function optimisers for instance.) Similarly I am not seeking information on decision support or Expert Systems unless they make some attempt (or claim) to emulate human decision making behaviour. (Default Logics? Frames?) Please err on the side of completeness! 2) OFFER: Obviously I can provide summaries of my findings to various lists in the usual way. (Perhaps you could say where you saw my post so I can keep the summaries relevant to each list.) What I would also like to do is find out whether there is any interest in an adhoc email list of people working in this area. Or if there is one already I would very much like to hear about it. I'm sure such things have been going for years in the US but information here in the UK seems very sparse. I would be quite happy to "maintain" an unofficial bulletin board or mailing list if one does not exist. Many thanks in advance for any help and please feel free to contact me on any aspect of this posting. Edmund Chattoe SNAIL: LADY MARGARET HALL OXFORD OXON OX2 6QA From lyn at dcs.exeter.ac.uk Fri Oct 11 13:56:49 1991 From: lyn at dcs.exeter.ac.uk (Lyn Shackleton) Date: Fri, 11 Oct 91 13:56:49 BST Subject: special deal for Connection Science Message-ID: <11273.9110111256@castor.dcs.exeter.ac.uk> ********** CONNECTION SCIENCE SPECIAL ISSUE ****************** CONNECTIONIST MODELLING OF PSYCHOLOGICAL PROCESSES VOLUME 3.2 (out now) EDITOR Noel Sharkey SPECIAL BOARD Jim Anderson Andy Barto Thomas Bever Glyn Humphreys Walter Kintsch Dennis Norris Kim Plunkett Ronan Reilly Dave Rumelhart Antony Sanford CONTENTS J R Levenick:NAPS: a connectionist implementation of cognitive maps. A Pouget & S J Thorpe: Connectionist models of orientation identification. D R Shanks: A connectionist account of base-rate biases in categorization. A J O'Toole, K Deffenbacher, H Abdi & J Bartlett: Simulating the "Other-race effect" as a problem in perceptual learning. S Kaplan, M Sonntag & E Chown: Tracing recurrent activity of cognitive elements (TRACE): a model of temporal dynamics in a cell assembly. Research Notes: A H Kawamoto & S N Kitzis: Time course of regular and irregular pronunciations. A VERY SPECIAL DEAL FOR MEMBERS OF THE CONNECTIONISTS MAILING. Prices for members of this list will now be: North America 44 US Dollars (reduced from 126 dollars) Elsewhere and U.K. 22 pounds sterling. (Sterling checks must be drawn on a UK bank) These rates start from 1st January 1992 (volume 4). Conditions: 1. Personal use only (i.e. non-institutional). 2. Must subscribe from your private address. You can receive a subscription form by emailing direct to the publisher: email: carfax at ibmpcug.co.uk Say for the attention of David Green and say CONNECTIONISTS MAILING LIST. 
noel From mclennan at cs.utk.edu Fri Oct 11 17:43:22 1991 From: mclennan at cs.utk.edu (mclennan@cs.utk.edu) Date: Fri, 11 Oct 91 17:43:22 -0400 Subject: report: contin. symbol systems Message-ID: <9110112143.AA01451@maclennan.cs.utk.edu> ** Please do not forward to other boards. Thank you. ** The following technical report has been placed in the Neuroprose archives at Ohio State. Ftp instructions follow the abstract. N.B. The uncompressed file is long (1.82 MB), so you may have to use the -s (symbolic link) option on lpr to print it. ----------------------------------------------------- Continuous Symbol Systems The Logic of Connectionism Bruce MacLennan Computer Science Department University of Tennessee Knoxville, TN 37996 maclennan at cs.utk.edu Technical Report CS-91-145 ABSTRACT: It has been long assumed that knowledge and thought are most naturally represented as _discrete_symbol_systems_ (calculi). Thus a major contribution of connectionism is that it provides an alternative model of knowledge and cognition that avoids many of the limitations of the traditional approach. But what idea serves for connectionism the same unifying role that the idea of a calculus served for the traditional theories? We claim it is the idea of a _continuous_symbol_system_. This paper presents a preliminary formulation of continuous sym- bol systems and indicates how they may aid the understanding and development of connectionist theories. It begins with a brief phenomenological analysis of the discrete and continuous; the aim of this analysis is to directly contrast the two kinds of symbols systems and identify their distinguishing characteristics. Next, based on the phenomenological analysis and on other observations of existing continuous symbol systems and connectionist models, I sketch a mathematical characterization of these systems. Finally the paper turns to some applications of the theory and to its implications for knowledge representation and the theory of com- putation in a connectionist context. Specific problems addressed include decomposition of connectionist spaces, representation of recursive structures, properties of connectionist categories, and decidability in continuous formal systems. A preliminary version of this paper was presented at the workshop "Neural Networks for Knowledge Representation, Fourth Annual Workshop of the Metroplex Institute for Neural Dynamics (MIND)," Westlake TX, October 4-6, 1990. Also presented at "ConnectFest 1990," sponsored by Indiana University Center for Research in Concepts and Cognition, November 3-4, 1990. ----------------------------------------------------- FTP INSTRUCTIONS Either use "Getps maclennan.css.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclennan.css.ps.Z ftp> quit unix> uncompress maclennan.css.ps.Z unix> lpr -s maclennan.css.ps (or however you print postscript) Note that the postscript version is missing three (nonessential) figures that have been pasted into the hardcopy version. 
If you need hardcopy, then send your request to: library at cs.utk.edu Bruce MacLennan Department of Computer Science 107 Ayres Hall The University of Tennessee Knoxville, TN 37996-1301 (615)974-0994/5067 FAX: (615)974-4404 maclennan at cs.utk.edu From david at cns.edinburgh.ac.uk Sun Oct 13 17:20:34 1991 From: david at cns.edinburgh.ac.uk (David Willshaw) Date: Sun, 13 Oct 91 17:20:34 BST Subject: Computational Neureoscientist post Message-ID: <4519.9110131620@subnode.cns.ed.ac.uk> UNIVERSITY OF OXFORD MRC Centre in Brain and Behaviour The Medical Research Council has awarded a 7-year grant to establish a Research Centre in Brain and Behaviour, based at the University of Oxford, and also involving scientists from other universities including Birmingham, Cambridge, Durham, Edinburgh and London. The main theme of the Research Centre is the organisation, function, development and disorders of the cerebral cortex, and central to this theme is the exploration of the cortex as an instrument of computation. To this end, the Centre carries out research involving many different methodologies, in the areas of sensory systems, learning and memory, and motor control. Applications are invited for the post of Computational Neuroscientist to work on theoretical aspects of learning and memory. The post will be based at the University of Edinburgh, where the post-holder will be expected to spend 80% of his/her time. The remaining time will be spent in linking with complementary work being carried out by other participants of the Centre, particularly at the universities of Oxford and Cambridge. A range of projects is available, and prospective applicants are encouraged to discuss their plans with Dr David Willshaw of the University of Edinburgh. Two possibilities which are compatible with present work are: 1) Development of a model of the mammalian hippocampal formation as an associative memory; 2) Investigation of associative and error-correcting models of cerebellar function as implemented in a biologically realistic form. This appointment, which is available from January 1992 for 2 years in the first instance and potentially renewable for a further 4 years, will be made on the RS1A scale (currently 11,969-19,073 pounds p.a. with a discretionary scale rising to 21,391 pounds p.a.). Applications (including the name and address of two referees) should be sent to Ms Catherine Greasley, Administrative Secretary, MRC Research Centre in Brain and Behaviour, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD (telephone (0865) 271364 - mornings only) no later than Friday 8 November 1991. The University of Oxford is an Equal Opportunities Employer David Willshaw Centre for Cognitive Science 2 Buccleuch Place Edinburgh EH8 9LW UK Tel: (+44) 31 650 4404 Fax: (+44) 31 650 4587 Email: d.willshaw at edinburgh.ac.uk From harnad at Princeton.EDU Sun Oct 13 19:51:05 1991 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 13 Oct 91 19:51:05 EDT Subject: Newell's Unified Theories of Cognition: BBS Call for Book Reviewers Message-ID: <9110132351.AA08163@psycho> Below is the abstract of a book that will be accorded multiple book review in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Commentators must be current BBS Associates or nominated by a current BBS Associate. 
To be considered as a commentator on this book, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at clarity.princeton.edu or harnad at pucc.bitnet or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] To help us put together a balanced list of commentators, please give some indication of the aspects of the topic on which you would bring your areas of expertise to bear if you are selected as a commentator. ____________________________________________________________________ BBS Multiple Book Review of: UNIFIED THEORIES OF COGNITION (Harvard University Press, 1990) Allen Newell School of Computer Science Carnegie-Mellon University This book presents the case that cognitive science should turn its attention to developing theories of human cognition that cover the full range of human perceptual, cognitive, and action phenomena. Cognitive science has now produced a massive number of high quality regularities with many microtheories that reveal important mechanisms. The need for integration is pressing and will continue to increase. Equally important, cognitive science now has the theoretical concepts and tools to support serious attempts at unified theories. The argument is made entirely by presenting an exemplar unified theory of cognition both to show what a real unified theory would be like and to provide convincing evidence that such theories are feasible. The exemplar is Soar, a cognitive architecture realized as a software system. After a detailed discussion of the architecture and its properties, with its relation to the constraints on cognition in the real world and to existing ideas in cognitive science, Soar is used as a theory for a wide range of cognitive phenomena: immediate responses (stimulus-response compatibility and the Sternberg phenomena); discrete motor skills (transcription typing); memory and learning (episodic memory and the acquisition of skill through practice); problem solving (cryptarithmetic puzzles and syllogistic reasoning); language (sentence verification and taking instructions); and development (transitions in the balance beam task). The treatments vary in depth and adequacy, but they clearly reveal a single, highly specific, operational theory that works over the entire range of human cognition. Soar is presented as an exemplar unified theory, not as the sole candidate. Cognitive science is not ready yet for a single theory -- there must be multiple attempts. But cognitive science must begin to work towards such unified theories. From kamil at apple.com Mon Oct 14 19:41:34 1991 From: kamil at apple.com (Kamil A. Grajski) Date: Mon, 14 Oct 91 16:41:34 -0700 Subject: batch-mode parallel implementations Message-ID: <9110142341.AA19545@apple.com> Hi folks, In reviewing some implementations of back-prop type algorithms on parallel machines, it is apparent that several such implementations obtain their high performance because of batch-mode training. What this means is that one operates on N independent training patterns simultaneously and then collects all the weight update information and reestimates once per N samples. Example where this has been used (among others) are the GF-111, MasPar, CM-2, Warp (I think, at least for a self-org feature map implementation), etc. In many papers, I have read passing references to the fact that real-time learning is preferred (in practice) over the theoretically indicated batch-mode (so-called "true gradient") learning. 
Some of the arguments given include "faster" convergence and "better" generalization. Are the convergence and generalization arguments linked at some deeper level of analysis? (You could have fast convergence which generalizes poorly, etc.) I have played with this just a little bit on small speech and other datasets without reaching any conclusive results. I am wondering whether there have been some definitive studies, theoretical and/or practical which really confront this issue? How big an issue is this for people? For example, would you NOT look at a parallel design which assumes batch-mode training? Kamil P.S. If this is a dead issue and I missed the funeral, I apologize. ================ Kamil A. Grajski Apple Computer (408) 974-1313 kamil at apple.com ================ From B344DSL at UTARLG.UTA.EDU Mon Oct 14 14:14:00 1991 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Mon, 14 Oct 1991 13:14 CDT Subject: Announcement of talk by Pribram at Georgetown, Oct. 18 Message-ID: <01GBQK3RGXY80003LS@utarlg.uta.edu> From: IN%"PRUEITT at guvax.georgetown.edu" "Paul S. Prueitt" 14-OCT-1991 12:29:48.05 To: IN%"kpribram at ruacad.ac.runet.edu" "kpribram" CC: IN%"duziakm at isnet.inmos.com" "duziakm", IN%"pwerbos at note.nsf.gov" "pwerbos", IN%"liwu at aic.nrl.navy.mil" "liwu", IN%"kugler at rucs2.sunlab.cs.runet.edu" "kugler", IN%"medsker at AUVM.BITNET" "medsker", IN%"b344dsl at UTARLG.UTA.EDU" "b344dsl", IN%"prueitt Subj: Pribram's Talk on Friday From PRUEITT at guvax.georgetown.edu Mon Oct 14 14:15:00 1991 From: PRUEITT at guvax.georgetown.edu (Paul S. Prueitt) Date: 14 Oct 91 13:15:00 EST Subject: Pribram's Talk on Friday Message-ID: <01GBQIANVZ28000315@utarlg.uta.edu> Please Communicate within your group *********************Please post and forward on E-mail******************* ******************** Georgetown University Physics Department and Neural Network Research Facility 1991-92 Colloquium Series on Behavioral and Computational Neuroscience Friday, October 18th 4:00 P.M. to 6:00 P.M. Auditorium Room 112 Reiss Building, Georgetown University Refreshments at 3:30 P.M. in Room 505 Dr. Karl Pribram **************** Center for Brain Research and Informational Sciences, Radford University Brain and Perception, Holonomy and Structure in Figural Processing Dr. Pribram will discuss topics from his new book; Brain and Perception, Holonomy and Structure in Figural Processing. A one hour prepared lecture is to be followed by a one hour discussion. The book is now available from Dr. Edward J. Finn, Chairman of the G.U. Physics Department or from Lawrence Erlbaum Associates. Professor Pribram will autograph copies of the book after the Colloquium. ************************************************************************* For additional information please call Edward Finn at 202-687-6231. Parking: Use Georgetown Univ. Entrance One from Reservoir Road (Northern Boundary) ******************** From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 15 01:23:47 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 15 Oct 91 01:23:47 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Mon, 14 Oct 91 16:41:34 -0800. <9110142341.AA19545@apple.com> Message-ID: I don't recall seeing any studies that claim better generalization for per-sample or continuous updating than for per-epoch or batch updating. Can you supply some citations? 
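For concreteness, here is a minimal sketch of the two update regimes being compared in this thread -- per-sample ("on-line" or continuous) updating versus per-epoch ("batch") updating. It is only an illustration, written in Python on a toy single linear unit with squared error; all names and values are illustrative and are not drawn from any of the implementations mentioned in this thread.
-----------------------------------------------------
# Per-sample ("on-line") versus per-epoch ("batch") gradient descent on a
# toy problem: a single linear unit trained with squared error.
import numpy as np

def gradient(w, x, y):
    # Gradient of 0.5 * (w.x - y)^2 with respect to w.
    return (w @ x - y) * x

def train_online(w, data, lr=0.01, epochs=10):
    # Weights change after every pattern, so a redundant training set
    # yields many small steps per pass through the data.
    for _ in range(epochs):
        for x, y in data:
            w = w - lr * gradient(w, x, y)
    return w

def train_batch(w, data, lr=0.01, epochs=10):
    # Gradients are accumulated over the whole set; one step per pass.
    for _ in range(epochs):
        g = sum(gradient(w, x, y) for x, y in data)
        w = w - lr * g / len(data)
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
data = [(x, true_w @ x) for x in rng.normal(size=(200, 2))]
print("on-line:", train_online(np.zeros(2), data))
print("batch:  ", train_batch(np.zeros(2), data))
-----------------------------------------------------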
The only reason I can think of for better generalization in the per-sample case would be a weak sort of simulated-annealing effect, with the random variation among individual training samples helping to jiggle the system out of small local minima in the vicinity of the best answer. As for speed of convergence, continuous updating clearly beats per-epoch updating if the training set is highly redundant. To see this, imagine taking a small set of training cases, duplicating that set 1000 times, and presenting the resulting huge set as the training set. Per-sample updating would probably have converged on a good set of weights before the first per-epoch weight adjustment is ever made. Also, in some cases it just is not practical to use per-epoch updating. There may be a stream of ever-changing data going by, and it may be impractical to store a large set of samples from this data stream for repeated use. On the other hand, it is rather dangerous to use continuous updating with high learning rates or with techniques that adjust the learning rate based on some sort of second-derivative estimate. If you are not very careful, a few atypical cases in a row can accelerate you right out of the solar system. Some techniques, such as quickprop and most of the conjugate gradient methods, depend on the ability to look at the same set of training examples more than once, so they inherently are per-epoch models. In my opinion, the best solution in most situations is probably to use one of the accelerated convergence methods and to update the weights after an "epoch" whose size is chosen by the experimenter. It must be sufficiently large to give a reasonably stable picture of the overall gradient, but not so large that the gradient is computed many times over before a weight-update cycle occurs. However, I am sure that this view is not universally accepted: some people seem to believe that per-sample updating is superior in all cases. -- Scott Fahlman
From castillo at eel.upc.es Tue Oct 15 11:16:28 1991 From: castillo at eel.upc.es (Francisco Castillo Cobo) Date: Tue, 15 Oct 1991 11:16:28 UTC+0100 Subject: add to NEURAS-LIST Message-ID: <"114*/S=castillo/OU=eel/O=upc/PRMD=iris/C=es/"@MHS> Hi, I am currently compiling a list of incremental (or growing) neural networks; I have some already identified, including RCE and Tiling. I am interested in receiving additional references on the matter and would be glad to summarize the responses and send them to anyone who might be interested. Thanx! F.Castillo
From ecal at cgref.cemagref.fr Tue Oct 15 08:06:13 1991 From: ecal at cgref.cemagref.fr (European Conference on Artificial Life) Date: Tue, 15 Oct 91 12:06:13 GMT Subject: ECAL91 programme Message-ID: <9110151206.AA11528@cgref> Please find enclosed an E-mail version of the ECAL91 programme (more up-to-date than the paper programme). You can use the registration form enclosed, provided that you send your payment by regular mail to the given address.
=====================CUT HERE=====================CUT HERE====================== 1st European Conference on Artificial Life ________________________________________________________________________________ PROGRAMME - PROGRAMME - PROGRAMME- PROGRAMME ________________________________________________________________________________ EEEEEEE CCCCCC AA LL 99 11 EE CC AA AA LL 99 99 11 EE CC AA AA LL 99 99 11 EE CC AA AA LL 99 99 11 EEEEE CC AAAAAAAAA LL 99999 11 EE CC AA AA LL 99 11 EE CC AA AA LL 99 11 EE CC AA AA LL 99 11 EEEEEEE CCCCC AA AA LLLLLLLL 9999 11 ________________________________________________________________________________ To be held on December 11-13 1991 in Centre des Congres de la Villette Salle Laser cite des Sciences et de l'Industrie Paris, France Publisher : MIT Press / Bradford Books Sponsors : la Cite CEMAGREF Banque de France CNR Fondation de France AFCET Electricite de France CREA Offilib ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Artificial life: a new scientific field Artificial life embodies a recent and important conceptual step in modern science: asserting that the core of intelligence and cognitive abilities is the same as the capacity for living. Metaphorically, artificial life would see in the modest insect rather than in the symbolic abilities of an expert the best prototype for intelligence . What needs to be understood and characterized is the class of processes that endow living creatures with their characteric autonomy, key properties such as viability, abduction and adaptability. The autonomy of the living beings is understood here both with regards to their actions and to the way in which they shape their world into significance. This exploration goes hand in hand with the theory, design and construction of simple autonomous agents. The recent surge of interest in 'artificial life' has to be understood in the context of the long tradition inaugurated with cybernetics, seeking common basis for the living and the artificial. Artificial life can take advantage of the years of research in the tradition of symbolic computation that still characterizes most of the research in artificial intelligence, as well as the more recent explosive development of neural networks and connectionist approaches. Artificial life also induces a renewal of a whole range of engineering traditions, such as control theory and robotics, beyond classical notions of goal and planning, into biologically inspired notions of viability and adaptation, situatedness and operational closure, thus putting evolutionary processes at the very center of the stage. The first European meeting intends to highlight the practice of such autonomous systems in all their forms, by hosting the presentation and discussion of the most recent research in the area. Beyond research results, another main intention of the meeting is to engage researchers and philosophers to examine the epistemological basis of this new trend. Only a sustained analysis of the main concepts and ideas can provide a fertile ground for important advances and a change of research paradigm. Conference Chairs : Paul Bourgine and Francisco Varela Programme Committee : H. Bersini, B Ch. G. Langton, USA R. Brooks, USA J. A. Meyer, F J. Demongeot, F H.Schwefel, FRG B. Goodwin, UK D. Parisi, I S. Kauffman, USA Organizing Committee : I. Alvarez V. Douzal L. Bochereau T. Fuhs G. 
Deffuant ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Wednesday December 11 8:00 REGISTRATION 9:30 WELCOME ADDRESS Paul BOURGINE, CEMAGREF - (F), Francisco VARELA, CREA - (F) 9:45 AUTONOMOUS ROBOTS (I) Invited lecture: Rodney BROOKS, MIT - (USA) "Robots and artificial life" Uwe SCHNEPF, GMD - (FRG), Mukesh J. PATEL, University of Sussex - (UK) "Concept formation as Emergent Phenomena" Rolf PFEIFER, Free University of Brussels - (B), Paul VERSCHURE, Univ. of California, Santa Cruz (USA) "Distributed adaptive control : a paradigm for autonomous agents" Break / refreshments Tim SMITHERS, University of Edimburgh - (UK) "Taking eliminative materialism seriously : a methodology for autonomous systems research" Leslie P. KAELBLING, Brown University - (USA) "An adaptable mobile robot" Pattie MAES, MIT - (USA) "Learning behavior networks from experience" 13:15 LUNCH 14:30 SWARM INTELLIGENCE Invited lecture: Jean-Louis DENEUBOURG, Free University of Brussels - (B) "Swarm-made architecture" Alberto COLORNI, Marco DORIGO, Vittorio MANIEZZO, Politecnico di Milano - (I) "Distributed optimization by ant colonies" Andrew M. ASSAD, Univ. of Illinois - (USA), Norman H. PACKARD, Inst. for Scientific Interchange - (I) "Emergent colonization in an artificial ecology" Gerardo BENI, Susan HACKWOOD, Univ. of California, Riverside - (USA) "The maximum entropy principle and sensing in swarm intelligence" Break / refreshments 17:00 EPISTEMOLOGICAL ISSUES Stefan HELMREICH, Stanford University - (USA) "The historical and epistemological ground of von Neumann's theory of self-reproducing automata and theory of games" Jean-Luc DORMOY, EDF - (F), Sylvie KORNMAN, LAFORIA - (F) "Meta-knowledge, autonomy and (artificial) evolution : some lessons learnt so far" 18:00 POSTERS AND DEMOS ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Thursday December 12 9:00 EPISTEMOLOGICAL ISSUES (Continued) R. Allen GARDNER, Beatrix T. GARDNER, University of Nevada - (USA) "A feedforward model of animal learning" Bernard MANDERICK, Free University of Brussels - (B) "Selectionist systems as cognitive systems" Break / refreshments 10:15 AUTONOMOUS ROBOTS (II) Ian HORSWILL, MIT - (USA) "Characterizing adaptation by constraint" Didier KEYMEULEN, Jo DECUYPER, Free University of Brussels - (B) "On the self-organizing properties of topological maps" Piet SPIESSENS, Jan TORREELE, Free University of Brussels - (B) "Massively parallel evolution of recurrent networks : an approach to temporal processing" Dave CLIFF, University of Sussex - (UK) "Neural networks for visual tracking in an artificial fly" 12:45 LUNCH 14:15 LEARNING AND EVOLUTION Invited lecture: Domenico PARISI, Stefano NOLFI, Federico CECCONI, CNR - (I) "Learning, behaviour, and evolution" Hugues BERSINI, Free University of Brussels - (B) "Immune network and adaptive control" Franck HOFFMEISTER, Thomas BACK , University of Dortmund - (FRG) "Genetic self-learning" Heinz MUHLENBEIN, GMD - (FRG) "Darwin's continent cycle theory and its simulation by the Prisoner's dilemna" Break / refreshments Melanie MITCHELL, John H. 
HOLLAND, University of Michigan - (USA), Stephanie FORREST, University of New Mexico - (USA) "The royal road for genetic algorithms : fitness landscapes and GA performance" Brad FULLMER, Risto MIIKKULAINEN, University of Texas - (USA) "Evolving finite state behaviour using marker-based genetic encoding of neural networks" 18:00 Invited lecture: Stuart KAUFMANN , University of Pennsylvania - (USA) "Waiting for Carnot" 20:30 DINNER ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Friday December 13 9:30 ADAPTIVE AND EVOLUTIONARY MECHANISMS Barry McMULLIN, Dublin City University - (UK) "The Holland alpha-Universes revisited" Robert J. COLLINS, David R. JEFFERSON, University of California - (USA) "The evolution of sexual selection and female choice" Filippo MENCZER, Domenico PARISI, CNR - (I) "A model for the emergence of sex in evolving networks : adaptive advantage or random drift ?" Break / refreshments Inman HARVEY, University of Sussex - (UK) "Species adaptation genetic algorithms : a basis for a continuing SAGA" Jakob SKIPPER, Niels Bohr Institute - (Dk) "The complete zoo evolution in a box" Jeffrey HORN, University of Illinois - (USA) "Measuring the evolving complexity of stimulus-response organisms" 13:15 LUNCH 14:30 CONCEPTUAL FOUNDATIONS Hugues BERSINI, Free University of Brussels - (B) "Animat's I" Claus EMMECHE, Institute of Computer and Systems Sciences - (Dk) "Life as an abstract phenomenon : is Artificial Life possible ?" John STEWART - Paris (F) "Life=cognition : the epistemological and ontological signifance of Artificial Life" Break / refreshments Peter CARIANI, Boston - (USA) "Some epistemological implications of devices which construct their own sensors and effectors" Mark A. BEDAU, Reed College - (USA) "Philosophical aspects of Articial Life" 17:30 CONCLUDING REMARKS ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ POSTER SESSION Petr KURKA, Charles University - (Cz) "Natural Selection in a population of automata" Thomas BACK, University of Dortmund - (FRG) "Self-adaptation in genetic algorithms" Robert DAVIDGE, University of Sussex - (UK) "Looking at life" Hugo de GARIS, Free University of Brussels - (B) "Streerable GenNets : the genetic programming of controllable behaviors in GenNets" Bruno MARCHAL, Free University of Brussels - (B) "Amoeba, planaria and dreaming machines" Alexis DROGOUL, Jacques FERBER, LAFORIA - (F) "A behavioural simulation model for the study of emergent social structures" Antonio RIZZO, CNR - (I), Neil BURGESS, University of Manchester - (UK) "Action based neural network for adaptive control : the tank case" John R. 
KOZA, Stanford University - (USA) "Evolving emergent wall following robotic behavior using the genetic programming paradigm" Bruno GAS, Rene NATOWICZ, ESIEE - (F) "A non-supervised continuous learning model of neural network for temporal sequence recognition" Eric DEDIEU, Emmanuel MAZER, IMAG - (F) "The SWALLOW modeler : an approach to sensory relevance" Gilles VENTURINI, ESIEE - (F) "Characterizing the adaptation abilities of a class of genetic base machine learning algorithms" Barbara WEBB, Tim SMITHERS, University of Edimburgh - (UK) "The connection between AI and biology in the study of behaviour" Ulrich NEHMZOW, Tim SMITHERS, University of Edimburgh - (UK) "Using motor actions for location recognition" Stephen TODD, Wiliam LATHAM, IBM - (UK) "Artificial life or surreal art?" R.C. PATON , H. S. NWANA, M. J. SHAVE, T. J. BENCH-CAPON, University of Liverpool - (UK) "Computing at the tissue/organ level (with particular reference to the liver)" Pierre BESSIERE, IMAG - (F) "Genetic Algorithms applied to formal neural networks : parallel genetic implementation of a Boltzmann machine and associated robotic experimentations" Karl SIMS, Thinking Machines Corp. - (USA) "Interactive evolution of dynamical systems" Nicolas MEULEAU, CEMAGREF - (F) "Co-evolution and mimetism : a program simulating road traffic" Christian NOTTOLA, Frederic LEROY, Banque de France - (F) "Dynamics of artificial markets M. SNAITH, 0.HOLLAND, TAG - (UK) "Application of the temporal difference learning to the neural control of quadrupede locomotion" Simon GOOS, Jean-Louis DENEUBOURG, Free University of Brussels - (B) "Harvesting by a group of robots" ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Registration Form Name : ...................... First name : ....................... Firm :.............................................................. Address : ............................................................ ...................................................................... Zip code : ............. City : ..................................... Country : ................................ Phone : ............ Fax : ............... Email : .................................. Invoice to be sent to : ................................ Registration fees Before 20/11/91 After 20/11/91 ________________________________________________________________________________ Students* o FF 750 o FF 750 University Members o FF 1500 o FF 1750 Others o FF 2200 o FF 2500 ________________________________________________________________________________ * Student status proof required These fees include all refreshments and lunches. Payment (in french francs only, foreign cheques accepted): o Cheques (to be sent to ECAL 91) please note that all charges, if any, must be at the participants' expense. o Banker's draft to the order of ECAL: Credit Lyonnais, bank account 30002 08948 0000079087X 55 Versailles StLouis, F-78000. PLease ask your bank to arrange the transfer at no cost for the beneficiary. Bank charges, if any, will be at the participants' expense. Travel Please, send me o Domestic railway discount ticket SNCF (20%) o Domestic flight discount ticket Air Inter (35%) Cancellations Refunds of 50 % will be made if a written request is received before November 30. No refunds will be made for cancellations received after this date. 
In case of conference cancellation beyond its control, ECAL organizing committee limits its liability to the registration fees already paid. Date Signature Send this form to : ECAL 91 17 allee Gabrielle d'Estrees F-75019 Paris FRANCE Further information concerning registration : Fax : (+33) 1 40 96 60 80 Voice : (+33) 1 40 96 61 79 E-mail : ecal at cemagref.fr ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ General Information ___________________ Language The conference will be conducted in English. Accommodation Hotel Forest Hill La Villette *** (5-minutes walk ) 26 av. Corentin Cariou, Paris. Tel : +33 1 44 72 15 30, fax: 33 1 44 72 15 80. Single or double rooms: 480FF, special price for ECAL participants. Hotel Arcade La Villette ** (5-minutes walk) Tel : +33 1 40 38 04 04 Single: 390FF, double room: 420FF. Please reserve at least 30 days in advance. Hotel Campanile Pantin ** (10-minutes walk) Tel : +33 1 48 91 32 76 Single or double rooms: 335FF. Please reserve at least 45 days in advance. Tel : +33 (1) 48 91 32 76 Reservation centers (other hotels): Tel: 33 1 47 27 15 15 (500 to 700FF rooms). Tel: 33 1 43 59 12 12. (Elysee 12 12). Tel: 33 1 42 56 30 00, fax 33 1 42 89 42 97 (Paris Sejour Reservations) Tourist information : 33 1 47 23 61 72 Cheaper accomodations are available at: Centre de sejour Eugene Henaff Tel 33 (1) 48 39 19 05 Entry visas ___________ For non European Community members, please check with the french consulate whether you need a Visa. Access to Paris cite des Sciences et de l'Industrie ___________________________________________________ La cite des Sciences et de l'Industrie is located in northeast Paris, at La Villette Park, 30, avenue Corentin Cariou, 75019 Paris. It is 40 minutes from Roissy and Orly airports. You can reach the Cite: by car: Circular highway, Porte de la Villette exit. Parking available at quai de la Charente and Boulevard Macdonald; by metro: Line 7, Porte de la Villette station; by bus: lines 150-152-250A-PC. For information about the cite des Sciences, call 33 1 46 42 13 13 (round-the-clock), or by Minitel: 3615 code Villette. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Tue Oct 15 10:22:41 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Tue, 15 Oct 91 09:22:41 EST Subject: Paper Message-ID: Following is the abstract of a paper accepted by IJCNN'91-SINGAPORE. The main purpose of this paper was to attack the problems of slow rate of convergence, local minima, and incapability of learning (under certain preset criteria) etc problems associated with the original back-propagation neural nets from an alternative viewpoint ---- topology ---- instead of the learning algorithm and units responsive characteristics. It was shown in this paper that the topology is a very important factor limiting the performances of back-propagation neural networks besides the already studied factors such as the learning algorithm and the units characteristics. All comments are welcome. PPNN: A Faster Learning and Better Generalizing Neural Net Bo Xu Indiana University Liqing Zheng Purdue University Abstract----It was pointed out in this paper that the planar topology of current back-propagation neural network (BPNN) sets limits to solve the slow convergence rate problem, local minima, and other problems associated with BPNN. 
The parallel probabilistic neural network (PPNN) using a new neural network topology, stereotopology, was proposed to overcome these problems. The learning ability and the generalization ability of BPNN and PPNN were compared for several problems. The simulation results show that PPNN was capable of learning any kinds of problems much faster than BPNN and generalized better than BPNN too. It was analyzed that the faster, universal learnability of PPNN was due to the parallel characteristic of PPNN's stereotopology, and the better generalization ability came from the probabilistic characteristic of PPNN's memory retrieval rule. Bo Xu Indiana University itgt500 at indycms.iupui.edu From xiru at Think.COM Tue Oct 15 11:35:55 1991 From: xiru at Think.COM (xiru Zhang) Date: Tue, 15 Oct 91 11:35:55 EDT Subject: batch-mode parallel implementations In-Reply-To: Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU's message of Tue, 15 Oct 91 01:23:47 -0400 <9110151441.AA10584@chaos.cs.brandeis.edu> Message-ID: <9110151535.AA02757@yangtze.think.com> From jcp at vaxserv.sarnoff.com Wed Oct 16 12:03:09 1991 From: jcp at vaxserv.sarnoff.com (John Pearson W343 x2385) Date: Wed, 16 Oct 91 12:03:09 EDT Subject: batch-mode parallel implementations Message-ID: <9110161603.AA09000@sarnoff.sarnoff.com> Xiru Zhang stated: >From the point of view of implementation, if a network is not large, there >is not much you can parallelize if you do per-sample training. Even in per-sample training one may be able to efficiently exploit a parallel machine. Each processor simulates the same network but has a different set of initial weights. The convergence time and performance of a trained network can be very dependent on the initial weights. I would appreciate being sent references that discuss this last statement. John Pearson David Sarnoff Research Center CN5300 Princeton, NJ 08543 609-734-2385 jcp at as1.sarnoff.com From gary at cs.UCSD.EDU Wed Oct 16 13:05:51 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Wed, 16 Oct 91 10:05:51 PDT Subject: batch-mode parallel implementations Message-ID: <9110161705.AA27497@desi.ucsd.edu> I tried implementing Elman's simple recurrent nets on an Intel Hypercube using data parallelism (a copy of the net at each node, each getting a part of the training set). I found is was as fast as a bat out of h**l, but as many times faster as it was, it was also as many times SLOWER at converging, leading to a net gain of 0! g. PS I did not try conjugate gradient, or back propping more steps in time, which probably would have helped convergence lots. From orilex at crl.ucsd.edu Wed Oct 16 15:33:44 1991 From: orilex at crl.ucsd.edu (Roy Higginson) Date: Wed, 16 Oct 91 12:33:44 PDT Subject: address for Sanger Message-ID: <9110161933.AA21258@crl.ucsd.edu> Can someone give me an e-mail address for Dennis Sanger AT&T Bell/Univ of CO at Boulder? Thanks, Higginson From ajr at eng.cam.ac.uk Wed Oct 16 17:48:31 1991 From: ajr at eng.cam.ac.uk (Tony Robinson) Date: Wed, 16 Oct 91 17:48:31 BST Subject: TR available: Phoneme recognition with recurrent networks Message-ID: <16687.9110161648@dsl.eng.cam.ac.uk> ***Do not forward to other bboards*** I've recently completed a technical report on connectionist phoneme recognition which I would like to make available to interested researchers. It describes a series of changes which have been made to tidy up a previously published system. 
Copies of the technical report may be obtained courtesy of Jordan Pollack by anonymous ftp from archive.cis.ohio-state.edu in the directory /pub/neuroprose as file robinson-tr82.ps.Z. If this option is not available to you, or if you would like a reprint of the background article, please send me email giving your full address. Tony [Robinson] Cambridge University Engineering Department, Trumpington Street, Cambridge, UK ------------------------------------------------------------------------------ Several Improvements to a Recurrent Error Propagation Network Phone Recognition System Tony Robinson ajr at eng.cam.ac.uk CUED/F-INFENG/TR.82 30 September 1991 Recurrent Error Propagation Networks have been shown to give good performance on the speaker independent phone recognition task in comparison with other methods [Robinson and Fallside, Computer Speech and Language, July 1991]. This short report describes several recent improvements made to the existing recogniser for the TIMIT database. The improvements are: an addition to the preprocessor to represent voicing information; use of histogram normalisation on the input channels of the network; normalisation of the output channels to enforce unity sum; a change in the cost function to give equal weighting to each target symbol; a change in the representation of the outputs to reduce quantisation errors; retraining on the complete TIMIT training set; and the better estimation of HMM phone models. Most of these changes decrease the number of arbitrary parameters used and allow for the integration of the system with standard HMM techniques. The result of these changes is a decrease in the number of errors by about 16% (from 36.5% to 30.7% when all 61 TIMIT phones are used and from 30.2% to 25.0% on a reduced 39 phone set). From shams at maxwell.hrl.hac.com Wed Oct 16 17:23:42 1991 From: shams at maxwell.hrl.hac.com (shams@maxwell.hrl.hac.com) Date: Wed, 16 Oct 91 14:23:42 PDT Subject: batch-mode parallel implementations Message-ID: <9110162123.AA08260@maxwell.hrl.hac.com> We have exploited the "epoch" training method for implementing back-prop on a 2-D systolic array processor of Hughes [1,2]. There are two basic problems with this approach. First, there are only a limited number of models that allow for epoch training (e.g. back-prop). Second, this type of parallelism is not useful during recall or classification cycle since there is only a single input pattern to be evaluated (unless the input data rate exceeds the processor throughput enabling the input data to be buffered for batch processing). As the number of neurons used in real-world applications continue to increase, there would be enough computation to keep all the processors busy without having to use epoch parallelism. [1] S. Shams and K. W. Przytula, "Mapping of Neural Networks onto Programmable Parallel Machines," Proceedings of the Intern. Symp. on Circuits and Systems, New Orleans, LA, Vol. 4, pp. 2613-2617, 1990. [2] S. Shams and K. W. Przytula. "Implementation of Multilayer Neural Networks on Parallel Programmable Digital Computers." In Parallel Algorithms and Architectures for DSP Applications. Ed. M. Bayoumi, Kluwer Academic Publishers, pp. 225-253, 1991. Soheil Shams Hughes Research Labs. 
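Several postings in this thread describe the same data-parallel "epoch" scheme: a copy of the network on every processor, each working on its own share of the training set, with the partial gradients combined into one weight update per epoch. The following is a minimal sketch of that scheme, assuming a toy single linear unit with squared error; the worker loop stands in for what would run on separate processors, and all names are illustrative rather than taken from any of the implementations mentioned above.
-----------------------------------------------------
# Data-parallel "epoch" training: every worker holds the same weights,
# computes gradients on its own shard, and the partial gradients are summed
# before a single update per epoch.
import numpy as np

def shard_gradient(w, shard):
    # Gradient of 0.5 * sum((w.x - y)^2) accumulated over one worker's shard.
    g = np.zeros_like(w)
    for x, y in shard:
        g += (w @ x - y) * x
    return g

def epoch_parallel_train(w, data, n_workers=4, lr=0.01, epochs=20):
    shards = [data[i::n_workers] for i in range(n_workers)]
    for _ in range(epochs):
        # On a real parallel machine each call runs on its own processor and
        # the partial gradients are combined with a global sum (reduce).
        partials = [shard_gradient(w, shard) for shard in shards]
        w = w - lr * sum(partials) / len(data)
    return w

rng = np.random.default_rng(1)
true_w = np.array([1.5, -0.5, 2.0])
data = [(x, true_w @ x) for x in rng.normal(size=(400, 3))]
print(epoch_parallel_train(np.zeros(3), data))
-----------------------------------------------------
The recall-time limitation noted above is visible in this sketch as well: with a single input pattern there is only one gradient (or one forward pass) to compute, so the data-parallel dimension disappears.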
From karunani at CS.ColoState.EDU Wed Oct 16 22:23:31 1991 From: karunani at CS.ColoState.EDU (n karunanithi) Date: Wed, 16 Oct 91 20:23:31 MDT Subject: HowtoScale Message-ID: <9110170223.AA05027@zappa>
Dear Connectionist, Some time back I posted the following problem in this newsgroup and many people responded with suggestions and references. I am thankful to all of them. I have summarized their responses and am posting them here for others who might find them interesting. For completeness' sake I have included my original posting as well.
******Issue raised: Background: ----------- I have been using neural network models (both Feed-Forward Nets and Recurrent Nets) in a prediction application and I am getting pretty good results. In fact the neural network approach outperformed many well known analytic models. Similar results have been reported by many researchers in (chaotic) time series prediction. Suppose that X is the independent variable and Y is the dependent variable. Let (x(i),y(i)) represent a sequence of actual input/output values observed at time i = 0,1,2,..,t of a temporal process. Assume further that both the input and the output variables are one-dimensional and can take on a sequence of positive integers up to a maximum of 2000. Once we train a network with the history of the system up to time "t" we can use the network to predict outputs y(t+h), h=1,..,n for any future input x(t+h). In my application I already have the complete sequence and hence I know what the maximum values of x and y are. Using these maxima I normalized both X and Y over a 0.1 to 0.9 range. (Here I call such normalization "scaled representation".) Since I have the complete sequence it is possible for me to evaluate how good the networks' predictions are.
Now some basic issues: --------------------- 1) How do we represent these variables if we don't know in advance what the maximum values are? Scaled representation presupposes the existence of a maximum value. Some may suggest that linear units can be used at the output layer to get rid of scaling. If so, how do I represent the input variable? The standard sigmoidal unit (with temp = 1.0) gets saturated (or railed to 1.0) when the sum is >= 14. However, one may suggest that changing the output range of the sigmoid can help to get rid of the saturation effect. Is this a correct approach? 2) In such prediction applications, people (including me) compare the predictive accuracy of neural networks with that of parametric models (which are based on analytical reasoning). But one main advantage of the parametric models is that their parameters can be calculated using any of the following parameter estimation techniques: least squares, maximum likelihood, Bayesian methods, Genetic Algorithms, or any other method. These parameter estimation techniques do not require any scaling, and hence there is no need for guessing the maximum values in advance. However, with the scaled representation in neural networks one cannot proceed without making guesses about the maximum (or a future) input and/or output. In many real-life situations such guesses are infeasible or dangerous. How do we address this situation? ____________________________________________________________________________ N. KARUNANITHI E-Mail: karunani at CS.ColoState.EDU Computer Science Dept, Colorado State University, Collins, CO 80523.
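Before the responses, here is a minimal sketch of the "scaled representation" described in the posting above and of the difficulty it raises: the mapping into the 0.1 to 0.9 range is tied to a presumed maximum, so any later observation above that maximum falls outside the range the network was trained on. The maximum of 2000 matches the figure quoted above; the sketch is in Python and the function names are illustrative.
-----------------------------------------------------
# "Scaled representation": map raw values into [0.1, 0.9] using a presumed
# maximum, and map network outputs back. The guess of the maximum is the
# weak point: values above it land outside the training range.

def scale(v, v_max, lo=0.1, hi=0.9):
    return lo + (hi - lo) * v / v_max

def unscale(s, v_max, lo=0.1, hi=0.9):
    return (s - lo) / (hi - lo) * v_max

v_max = 2000.0                 # the presumed maximum
print(scale(500, v_max))       # 0.3  -- well inside the training range
print(scale(2000, v_max))      # 0.9  -- exactly at the presumed maximum
print(scale(3000, v_max))      # 1.3  -- outside the range once the guess fails
print(unscale(0.5, v_max))     # 1000.0 -- inverse mapping for network outputs
-----------------------------------------------------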
____________________________________________________________________________ ******Responses Received: 1) Dr Huang at CMU Date: Thu, 26 Sep 1991 11:40-EDT From: Xuedong.Huang at SPEECH2.CS.CMU.EDU I have several papers addressing the issues you raised. See for example: [1] Huang, X : A Study on Speaker-Adaptive Speech Recognition" DARPA Speech and Language Workshop, Feb , 1991, pp278-283 [2] Huang, X, K. Lee and A. Waibel: Connectionist speaker normlization and its applications to speech recognition", IEEE Workshop on NNSP, Princeton, Sept. 1991 X.D. Huang, PhD Research Computer Scientist ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ School of Computer Science Tel: (412) 268 2329 Carnegie Mellon University Fax: (412) 681 5739 Pittsburgh, PA 15213 Email: xdh at cs.cmu.edu ============================================================================= 2) From Alexander at CUNY Date: Thu, 26 Sep 91 14:45 EDT From: TWOMBLY%JHUBARD.BITNET at CUNYVM.CUNY.EDU In response to your question about scaling for sigmoidal units..... I ran into the same problem of not knowing the maximum value that my input/output data would take at any particular time. There were no a priori bounds that could be reasonably set, so the solution (in this case) was to get rid of the sigmoidal activation function and replace it with one that did not require any scaling. The function I used was a clipped linear function - that is, f(x) = 0. for x<0., and f(x) = x for x>0. For my data this activation function worked as well as the sigmoidal units (in some cases better) because the hidden units never took advantage of the non-linearity in the upper range of the sigmoid function. The only difficulty with this function is that it does not have a continuous derivative at 0. You can get around this problem by tacking on a 1/x type function for x<0 that drops off very quickly. This will provide a well behaved, non-zero derivative for all parts of the activation function while adding a negligable value to the output for x<0. The actual function I use is: f(x) = x; x > 0. f(x) = 1/(10**2 - x*10**4); x < 0. I hope this helps. -Alexander ============================================================================= 3) Dr. Fahlman at CMU Date: Thu, 26 Sep 91 22:20:14 -0400 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU 1) How to represent these variables if we don't know in advance what the maximum values are? Scaled representation presupposes the existence of a maximum value. Some may suggest that a linear units can be used at the output layer to get rid of scaling. Right, I was about to suggest that. If so how do I represent the input variable? The standard sigmoidal unit(with temp = 1.0) gets saturated(or railed to 1.0) when the sum is >= 14. However one may suggest that changing the output range of the sigmoidal can help to get rid of saturation effect. Is it a correct approach? For a non-recurrent network, the first layer of weights cand and usually will scale the inputs for you. You save some learning time and possible traps if the inputs are in some reasonable range, but it really isn't essential. I'd advise adding a small constant (0.1 works well) to the derivative of the sigmoid for all units so that you can recover if the unit gets pinned to an extreme value. I don't understand your second point, so I won't try to reply to it. 
Scott Fahlman Carnegie Mellon University ============================================================================= 4) Ian Fitchet at Birmingham University Date: Fri, 27 Sep 91 03:43:40 +0100 From: Ian Fitchet I'm no expert, but how about having two outputs: one is a control and has a (mostly) fixed value; the other is the output y(i) which is adjusted such that the one divided by the other gives the required result. Off the top of my head, have the control output 0.9 most of the time, when the value of y(i) goes above unity have y(i) = 0.9 and the control decrease, so that if the control equalled 0.45, say, then the real value of the output would be 0.9/0.45 = 2.0 . Of course the question is then, how do I train the nextwork to set the value of the control? But I leave that as an exercise... :-) Cheers, Ian -- Ian Fitchet I.D.Fitchet at cs.bham.ac.uk School of Computer Science Univ. of Birmingham, UK, B15 2TT "You run and you run to catch up with the sun, but it's sinking" Pink Floyd ============================================================================= 5) From Dermot O'Brien at the University of Edinburgh Date: Fri, 27 Sep 91 10:32:31 WET DST Sender: dob at castle.edinburgh.ac.uk You may be interested in the following references (if you havn't read them already): @techreport{Lapedes:87, Author = "Alan S. Lapedes and Robert M. Farber", Title = "Nonlinear signal processing using neural networks: prediction and system modelling", Institution = "Los Alamos National Laboratory", Year = 1987, Number = "LA-UR-87-2662"} @incollection{Lapedes:88, Author = "Alan S. Lapedes and Robert M. Farber", Title = "How Neural Nets Work", BookTitle = "Evolution, Learning, and Cognition", Pages = {331--346}, Editor = "Y.C Lee", Year = 1988, Publisher = "World Scientific", Address = "Singapore"} The above papers analyse the behaviour of feed-forward neural networks applied to the problem of time series prediction, and make an interesting analogy with Fourier decomposition. Cheers, Dermot O'Brien Physics Department University of Edinburgh The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland ============================================================================= 6) From: Tony Robinson Date: Fri, 27 Sep 91 12:23:23 BST My immediate advice is: Don't put the input through a nonlinearity at the start of the network. Use linear output units. Allow a linear path through the system so that if a linear solution to the problem is possible then this is a possible network solution. Then you will have no problems with maximum values. Tony [Robinson] ============================================================================= End of summary. ____________________________________________________________________________ N. KARUNANITHI E-Mail: karunani at CS.ColoState.EDU Computer Science Dept, Colorado State University, Collins, CO 80523. ____________________________________________________________________________ From thomasp at informatik.tu-muenchen.dbp.de Mon Oct 14 05:17:00 1991 From: thomasp at informatik.tu-muenchen.dbp.de (Thomas) Date: 14 Oct 91 10:17 +0100 Subject: report available Message-ID: <91Oct14.101724met.34256(a)gshalle1.informatik.tu-muenchen.de> From khosla at latcs1.lat.oz.au Thu Oct 17 04:00:31 1991 From: khosla at latcs1.lat.oz.au (Rajiv Khosla) Date: Thu, 17 Oct 91 18:00:31 +1000 Subject: Spatial crosstalk and modular NN architechture Message-ID: <9110170800.AA00862@latcs1.lat.oz.au> Dear Connectionists, Can anyone enlighten me on the following. 
I have to model a problem with 28 discrete inputs (1's and 0's) and 26 discrete outputs. In fact, these 26 discrete outputs can also be represented by 5 normalized continuous outputs. Now, I have no problem modelling it as a 28-11-5 network using Scott Fahlman's quickprop. However, I get into all sorts of problems when I have to model a 28-?-26 network (? stands for any number of hidden units; I tried up to 104). Some time back, I read a paper on modular NN architectures which suggested that because of spatial crosstalk one should have dedicated or independent links between hidden units and each output unit. This would result in faster training and better generalization. I tried this architecture by making suitable changes in the quickprop algorithm, but to no avail. There is no improvement over the standard architecture vis-a-vis training. In fact, things seemed to get slightly worse. I tried with 2, 3, and 4 sets (that is, 52, 78, and 104 hidden units respectively) of hidden units per output unit. I gave up after about 5000 epochs as I couldn't see any significant improvement in the total error. Has anyone used the modular architecture in a similar situation with a large number of output nodes, with positive results? Am I doing something wrong? Is there any other solution except making the outputs continuous and reducing the number of output nodes? I have only recently started reading this group, so please excuse the naivety of the questions, if any. Please e-mail your replies to khosla at latcs1.lat.oz.au Thanks in advance, Rajiv
From neural!lamoon.neural!yann at att.att.com Thu Oct 17 10:46:39 1991 From: neural!lamoon.neural!yann at att.att.com (neural!lamoon.neural!yann@att.att.com) Date: Thu, 17 Oct 91 10:46:39 -0400 Subject: batch-mode parallel implementations Message-ID: <9110171446.AA19788@lamoon>
Several years ago, Steve Nowlan and I implemented a "batch-mode" vectorized backprop on a Cray. Just as in Gary Cottrell's story, the raw CUPS rate was high, but because batch mode converges so much slower than on-line, the net gain was 0. I think Patrick Haffner and Alex Waibel had a similar experience with their implementations of TDNNs on the Alliant. Now, the larger and more redundant the dataset is, the larger the difference in convergence speed between on-line and batch. For small (and/or random) datasets, batch might be OK, but who cares. Also, if you need a very high accuracy solution (for function approximation for example), a second-order batch technique will probably be better than on-line. Sadly, almost all speedup techniques for backprop only apply to batch (or semi-batch) mode. That includes conjugate gradient, delta-bar-delta, most Newton or Quasi-Newton methods (BFGS...), etc... I would love to see a clear demonstration that any of these methods beats a carefully tuned on-line gradient on a large pattern classification problem. I tried many of these methods several years ago, and failed. I think there are two interesting challenges here: 1 - Explain theoretically why on-line is so much faster than batch (something that goes beyond the "multiple copies" argument). 2 - Find more speedup methods that work with on-line training. -- Yann Le Cun
From kamil at apple.com Thu Oct 17 12:59:21 1991 From: kamil at apple.com (Kamil A.
Grajski) Date: Thu, 17 Oct 91 09:59:21 -0700 Subject: batch & on-line training Message-ID: <9110171659.AA23721@apple.com> The consensus opinion seems to be that on-line learning is preferred for situations consisting of a classification problem with a large (possibly redundant) dataset. What appears to have been a common experience is that batch-mode training generates impressive MCUP statistics, but convergence is slower enough that the net gain is 0. It is difficult to make a scientific judgement still, mostly because the evidence appears to be largely anecdotal, e.g., "I really tried hard to make one (batch, or on-line) work, and it beat the other." It has been observed that several algorithms for accelerating convergence are designed for (semi-)batch mode. Were these to be seriously evaluated, would the net gain 0 still occur? On the other hand, with more work could on-line methods widen their apparent superiority? I don't think that we're splitting hairs by addressing this issue. One trend in the implementations side of NNs is to have the highest MCUPS performance. In several instances, this is achieved using mappings/architectures which rest on batch-mode training. I think that one might design a neurocomputer differently depending on which training mode is to be used, e.g., the communication vs computation curves are different. So, at the moment, in certain instances, we've actually put the cart before the horse. We have fast batch implemen- tations. Do we make batch-mode training better, or can we make on-line so fast and so optimally design a machine that the issue is moot? (I'm ignoring the (possibly substantial) conflicting requirements between training & recognition modes, here.) In any event, it seems that folks are having success doing either in different situations. However, there doesn't seem to be a compelling argument for preferring one or the other IN PRINCIPLE. Cheers, Kamil From dlukas at park.bu.edu Thu Oct 17 12:58:42 1991 From: dlukas at park.bu.edu (David Lukas) Date: Thu, 17 Oct 91 12:58:42 -0400 Subject: Graduate study in Cognitive & Neural Systems at Boston University Message-ID: <9110171658.AA15628@park.bu.edu> (please post) *********************************************** * * * DEPARTMENT OF * * COGNITIVE AND NEURAL SYSTEMS (CNS) * * AT BOSTON UNIVERSITY * * * *********************************************** Stephen Grossberg, Chairman The Boston University Department of Cognitive and Neural Systems offers comprehensive advanced training in the neural and computational principles, mechanisms, and architectures that underly human and animal behavior, and the application of neural network architectures to the solution of outstanding technological problems. Applications for Fall, 1992 admissions and financial aid are now being accepted for both the MA and PhD degree programs. To obtain a brochure describing the CNS Program and a set of application materials, write or telephone: Department of Cognitive & Neural Systems Boston University 111 Cummington Street, Room 240 Boston, MA 02215 (617) 353-9481 or send a mailing address to: kellyd at cns.bu.edu Applications for admission and financial aid should be received by the Graduate School Admissions Office no later than January 15. Applicants are required to submit undergraduate (and, if applicable, graduate) transcripts, three letters of recommendation, and Graduate Record Examination (GRE) scores. The Advanced Test should be in the candidate's area of departmental specialization. 
GRE scores may be waived for MA candidates and, in exceptional cases, for PhD candidates, but absence of these scores may decrease an applicant's chances for admission and financial aid. Description of the CNS Department: The Department of Cognitive and Neural Systems (CNS) provides advanced training and research experience for graduate students interested in the neural and computational principles, mechanisms, and architectures that underlie human and animal behavior, and the application of neural network architectures to the solution of outstanding technological problems. Students are trained in a broad range of areas concerning cognitive and neural systems, including vision and image processing; speech and language understanding; adaptive pattern recognition; cognitive information processing; self-organization; associative learning and long-term memory; cooperative and competitive network dynamics and short-term memory; reinforcement, motivation, and attention; adaptive sensory-motor control and robotics; and biological rhythms; as well as the mathematical and computational methods needed to support advanced modeling research and applications. The CNS Department awards MA, PhD, and BA/MA degrees. The CNS Department embodies a number of unique features. It has developed a core curriculum that consists of ten interdisciplinary graduate courses each of which integrates the psychological, neurobiological, mathematical, and computational information needed to theoretically investigate fundamental issues concerning mind and brain processes and the applications of neural networks to technology. Additional advanced courses, including research seminars, are also offered. Each course is typically taught once a week in the evening to make the program available to qualified students, including working professionals, throughout the Boston area. Students develop a coherent area of expertise by designing a program that includes courses in areas such as Biology, Computer Science, Engineering, Mathematics, and Psychology, in addition to courses in the CNS core curriculum. The CNS Department prepares students for thesis research with scientists in one of several Boston University research centers or groups, and with Boston-area scientists collaborating with these centers. The unit most closely linked to the department is the Center for Adaptive Systems. The Center for Adaptive Systems is also part of the Boston Consortium for Behavioral and Neural Studies, a Boston-area multi-institutional Congressional Center of Excellence. Another multi-institutional Congressional Center of Excellence focused at Boston University is the Center for the Study of Rhythmic Processes. Other research resources include distinguished research groups in neurophysiology, neuroanatomy, and neuropharmacology at the Medical School and the Charles River campus; in sensory robotics, biomedical engineering, computer and systems engineering, and neuromuscular research within the Engineering School; in dynamical systems within the mathematics department; in theoretical computer science within the Computer Science Department; and in biophysics and computational physics within the Physics Department. 1991 FACULTY and STAFF of CNS and CAS: Daniel H. Bullock Nancy Kopell Gail A. Carpenter John W.L. Merrill Michael A. Cohen Ennio Mingolla H. Steven Colburn Alan Peters Paolo Gaudiano Adam Reeves Stephen Grossberg James T. Todd Thomas G. 
Kincaid Allen Waxman From MURTAGH at SCIVAX.STSCI.EDU Thu Oct 17 15:29:25 1991 From: MURTAGH at SCIVAX.STSCI.EDU (MURTAGH@SCIVAX.STSCI.EDU) Date: Thu, 17 Oct 1991 15:29:25 -0400 (EDT) Subject: Workshop: Par. Prob. Solving: Applns. in Statistics & Economics Message-ID: <911017152925.28c128fa@SCIVAX.STSCI.EDU> Workshop Announcement and Call for Papers: "Parallel Problem Solving From Nature: Applications in Statistics & Economics". ------------------------------------------------------------------------------- Interdisciplinary Project Center for Supercomputing, ETH, Zurich, Switzerland. December 10-11, 1991. Support/Sponsorship: DOSES/Statistical Office of the European Communities; IPS, ETH Zurich; Konjunkturforschungsstelle, ETH Zurich; MasPar Distributor AG Zurich; PAR, Schweizerische Informatiker Gesellschaft; Parsytec GmbH, Aachen; QT optec AG, Zug; Schweizerischer Bankverein, Basel, IBM Switzerland. Program Committee: J. Frain (Central Bank of Ireland), K. Kirchmayr (Schweizerischer Bankverein, Basel), F. Murtagh (Munotec Systems, Munich and Dublin), P. Van Nypelseer (DOSES/EUROSTAT, Luxembourg), U. Reimer (Rentenanstalt Zuerich), M.M. Richter (DFKI Kaiserslautern), W. Roth (Konjunkturforschungsstelle ETH, Zurich), D. Wuertz (IPS, ETH Zurich), and H.G. Zimmermann (Siemens, Munich). Invited Speakers: J. Bernasconi (ABB Corp. Research, Baden), A. Colin (Citibank, London), F. Fogelman-Soulie (MIMETICS, Chatenay Malabry), J. Frain (Central Bank of Ireland), H. Horner (Universitaet Heidelberg), H. Muehlenbein (GMD, Sankt Augustin, Bonn), F. Murtagh (Munotec Syst., Munich), M.B. Priestley (UMIST Manchester), R. Rohwer (CSTR University of Edinburgh), C. Schaefer (Rowland Inst. of Science, Cambridge MA), P. Treleaven (University College London), A. Varfis (Joint Research Center, Ispra), H.-M. Wallmeier (IBM Scientific Center, Heidelberg), D. Weers (Aspen Intellect, Zug), A. Weigend (Stanford University) D. Wuertz (IPS, ETH Zurich), H.G. Zimmermann (Siemens, Munich). Registration: SFr 400 for those from profit-making companies; otherwise SFr 150. A limited fund will be available to support younger participants who would not otherwise be able to attend. Late registration, after November 1, additional SFr 50. Remittance (only Swiss Francs) to: PASE-Workshop - Dr. Diethelm Wuertz, Schweizerischer Bankverein, Zurich. Acccount number: P0-206066.0. Accommodation requests: directly to: Verkehrsverein Zurich (VVZ), Kongressbuero, Postfach, CH-8023 Zurich, Switzerland (Tel: + 41 1 211-1256). Contact Point: Dr. Diethelm Wuertz, IPS ETH Zurich, ETH Zentrum, CLU B3, CH-8092 Zurich, Switzerland. Fax: + 41 1 252-0185. Email: wuertz at ips.ethz.ch or the undersigned. Abstract: 1 page, by November 1. F.D. Murtagh murtagh at scivax.stsci.edu From dominic at DEBUSSY.CODA.CS.CMU.EDU Thu Oct 17 16:21:08 1991 From: dominic at DEBUSSY.CODA.CS.CMU.EDU (Chioccioli) Date: Thu, 17 Oct 91 14:21:08 -0600 Subject: No subject Message-ID: <9110172021.AA24272@debussy.cs.colostate.edu> This posting briefly describes my interest in parallel learning algorithms for neural networks. Currently I am investigating the following two aspects of parallel reinforcement learning algorithms for sequential decision tasks: 1) Multiple nets on multiple task simulations. Our goal here is to combine multiple-simultaneous experiences to reduce the wall-clock time required to learn a task. 2) Multiple nets on single task simulation. 
This paradigm assumes that multiple simulations cannot be run; however, parallel search of the (single) experience space obtained from running a single simulation can be used to reduce the total number of trials (i.e. simulated experiences) required for learning. Several different algorithms will be attempted for both of the above tasks. I am interested in hearing from others who may also be doing research in parallel learning algorithms for neural networks. Pointers to relevant publications or references will be most helpful. Thanks in advance for any responses. I will post a summary of any references I receive, provided that this is not a repeated request and that sufficient response is forthcoming. Regards, Steve Dominic dominic at debussy.cs.colostate.edu Colorado State University Computer Science Dept. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Thu Oct 17 16:01:33 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Thu, 17 Oct 91 15:01:33 EST Subject: Paper Message-ID: Two days ago I posted the abstract of our paper "PPNN: A Faster Learning and Better Generalizing Neural Net". Because the paper will appear in the proceedings of IJCNN'91-SINGAPORE, I thought it would not be necessary to place it in neuroprose. However, since the posting, I have received a large number of messages requesting a copy of the paper, and requests are still coming in. Because I had no preparation for this, I was unable to answer all of the messages in time. Please excuse any possible delay and errors in replying to your requests. Thanks to many colleagues' suggestions, I am going to place the paper in the neuroprose archive. I will provide the procedures for retrieving it from cheops at Ohio State when it is ready. I will be happy to send hardcopy to those having no access to FTP. Bo Xu Indiana University itgt500 at indycms.iupui.edu From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Oct 18 02:10:29 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 18 Oct 91 02:10:29 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Thu, 17 Oct 91 10:46:39 -0500. <9110171446.AA19788@lamoon> Message-ID: Yann LeCun writes: Now, the larger and more redundant the dataset is, the larger the difference in convergence speed between on-line and batch. For small (and/or random) datasets, batch might be OK, but who cares. I think that it may be misleading to lump together "large" and "redundant" as if they were the same thing, or as if they were inseparable. I agree that for highly redundant datasets, continuous updating has an advantage. I also agree that for small datasets, we don't care much about speed. But it seems to me that it is possible to have a large, not-very-redundant data set, and that accelerated batch methods should have an advantage for these. I guess you could measure redundancy by seeing if some subset of the training data set produces essentially the same gradient vector as the full set. Probably statisticians have good ways of talking about this redundancy business -- unfortunately, I don't know the right vocabulary. In a data set with noise, you need a big enough training set to raise relatively rare but real features above the level of the random background noise.
If you have roughly that much data, I bet fast batch techniques would win; if you have a training set that is several times this minimal size, then continuous updating would win. That's my suspicion, anyway. I would love to see a clear demonstration that any of these methods beats a carefully tuned on-line gradient on a large pattern classification problem. I tried many of these methods several years ago, and failed. Well, if my hypothesis above is right, we could demonstrate this by finding a dataset that is large enough to make you happy, but not highly redundant. I guess that we could create this by taking any large dataset, measuring its redundancy, and trimming it down to minimal size (assuming that the result can still be classified as large). Do you know of any big sets that would qualify? It should preferably be a relatively "pure" N-input data-classification problem, without all the additional issues (e.g. translation invariance) that are present in image-processing and speech-processing tasks. I think there are two interesting challenges here: 1 - Explain theoretically why on-line is so much faster than batch (something that goes beyond the "multiple copies" argument). 2 - Find more speedup methods that work with on-line training. I have a hunch that if we work hard enough on speeding up online training, we'll end up with something whose NET EFFECT is equivalent to the following: 1. Accumulate gradient data for a length of time that is adaptively chosen: large enough for the gradients to be stable and accurate, but not large enough to be redundant. 2. Use something equivalent to one of the batch-processing acceleration techniques on this smoothed gradient. That's not to say that the technique will necessarily do this in an obvious way -- it may be twiddling the weights each time a sample goes by -- but I suspect this kind of accumulation, smoothing, and acceleration will be present at some level. As I said, for now this is just a hunch. -- Scott Fahlman P.S. I avoid using the term "on-line" for what I call "per-sample" or "continuous" updating of weights. For me, "online" means something else. At this moment, I am sitting at my workstation watching one of my batch-updating algorithms running "on-line" in front of me. From smagt at fwi.uva.nl Fri Oct 18 09:23:07 1991 From: smagt at fwi.uva.nl (Patrick van der Smagt) Date: Fri, 18 Oct 91 14:23:07 +0100 Subject: Spatial crosstalk and modular NN architecture Message-ID: <9110181323.AA28643@fwi.uva.nl> > I have to model a problem with 28 discrete inputs (1's and 0's) and > 26 discrete outputs. In fact, these 26 discrete outputs can also be represented by > 5 normalized continuous outputs. If one wants to model any kind of function, why go for the least obvious solution, a neural network, first? Since your problem is binary, too, I would first try a much simpler method such as k-nearest-neighbour or any bin approach, which would enable one to gain an understanding of the data and the overlap. Ten years ago this would have been a more standard approach, instead of using a black box (aka neural network). The reason that I would _not_ immediately reach for a network to do some function approximation is that I have seen too many people choke on the fact that they do not understand their data, the complexity of the data, a reasonable ratio of #degrees of freedom to #learning samples, etc.
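For concreteness, a minimal nearest-neighbour baseline of the kind suggested above fits in a few lines (this is only an illustrative sketch; the array names and shapes are assumptions, not taken from the original posting):

    import numpy as np

    def knn_predict(X_train, y_train, x, k=3):
        # Hamming distance is a natural choice for binary (0/1) input vectors.
        dists = np.sum(X_train != x, axis=1)
        nearest = np.argsort(dists)[:k]
        # Majority vote among the k nearest training patterns.
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]

    # Hypothetical usage: X_train is an (N, 28) array of 0/1 inputs,
    # y_train an array of N class labels in {0, ..., 25}.

Such a baseline gives a quick feel for how separable the 26 classes are before any network is trained.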
Patrick van der Smagt From xiru at Think.COM Fri Oct 18 10:42:04 1991 From: xiru at Think.COM (xiru Zhang) Date: Fri, 18 Oct 91 10:42:04 EDT Subject: batch & on-line training In-Reply-To: "Kamil A. Grajski"'s message of Thu, 17 Oct 91 09:59:21 -0700 <9110171659.AA23721@apple.com> Message-ID: <9110181442.AA03133@yangtze.think.com> Date: Thu, 17 Oct 91 09:59:21 -0700 From: "Kamil A. Grajski" The consensus opinion seems to be that on-line learning is preferred for situations consisting of a classification problem with a large (possibly redundant) dataset. What appears to have been a common experience is that batch-mode training generates impressive MCUP statistics, but convergence is enough slower that the net gain is 0. It is still difficult to make a scientific judgement, mostly because the evidence appears to be largely anecdotal, e.g., "I really tried hard to make one (batch, or on-line) work, and it beat the other." I have used per-epoch training on an auto-association network, to extract "features" of protein local structures, using as few hidden units as possible. I spent a lot of time fine-tuning the training process, such as using different learning rates at different stages of training, different momentum terms, different ranges of random weights at the beginning, how large each "batch" is, etc. At the end I got a pretty good convergence rate. (Maybe I did not spend enough effort fine-tuning the per-sample training.) My feeling is that training a large network with lots of examples is still an art. You can almost always improve it if you spend time on it. Per-epoch training may have somewhat different behavior than per-sample training, so a different training schedule is often needed, and it takes time to figure out what a good one is. It also critically depends on the particular problem you want to solve. Besides the issue of convergence rate, I wonder if people have compared networks trained on a per-epoch schedule and on a per-sample schedule, to see if they have the same level of generalization. One thing I noticed in my work is that per-sample training tends to make certain weights much larger than in per-epoch training. But I am not sure if this is true in general. - Xiru Zhang From neural!lamoon.neural!yann at att.att.com Fri Oct 18 11:08:03 1991 From: neural!lamoon.neural!yann at att.att.com (neural!lamoon.neural!yann@att.att.com) Date: Fri, 18 Oct 91 11:08:03 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 02:10:29 -0400. Message-ID: <9110181508.AA00547@lamoon> Scott Fahlman writes: >I avoid using the term "on-line" for what I call "per-sample" or >"continuous" updating of weights. I personally prefer the phrase "stochastic gradient" to all of these. >I guess you could measure redundancy by seeing if some subset of the >training data set produces essentially the same gradient vector as the full >set. Hmmm, I think any dataset for which you expect good generalization is redundant. Train your net on 30% of the dataset, and measure how many of the remaining 70% you get right. If you get a significant portion of them right, then accumulating gradients on these examples (without updating the weights) would be little more than a waste of time. This suggests the following (unverified) postulate: the better the generalization, the bigger the speed difference between on-line (per-sample, stochastic, ...) and batch. In other words, any dataset interesting enough to be learned (as opposed to stored) has to be redundant.
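In code, the 30%/70% test described above amounts to something like the following sketch (illustrative only; train and accuracy stand in for whatever training and evaluation routines one already has):

    import numpy as np

    def generalization_redundancy(X, y, train, accuracy, frac=0.3, seed=0):
        # Train on a random 30% of the data; the better the resulting net
        # does on the held-out 70%, the more "redundant" (in this sense)
        # the full dataset is.
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))
        n = int(frac * len(X))
        net = train(X[idx[:n]], y[idx[:n]])
        return accuracy(net, X[idx[n:]], y[idx[n:]])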
There might be no such thing as a large non-redundant dataset that is worth learning. -- Yann From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Oct 18 12:38:38 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 18 Oct 91 12:38:38 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 11:08:03 -0500. <9110181508.AA00547@lamoon> Message-ID: Original-From: Yann le Cun I personally prefer the phrase "stochastic gradient" to all of these. That's a fine term, but it seems to me that it refers to one of the effects of per-sample updating, and not to the mechanism itself. You might get a "stochastic gradient" because you are updating after every randomly chosen sample, but you might also get it from noise in the samples themselves. So if you want to refer to the choice of updating mechanism, and not to the quality of the gradient, I think it's better to use a term like "per-sample updating" that is nearly impossible for the reader to misunderstand. >I guess you could measure redundancy by seeing if some subset of the >training data set produces essentially the same gradient vector as the full >set. Hmmm, I think any dataset for which you expect good generalization is redundant. Train your net on 30% of the dataset, and measure how many of the remaining 70% you get right. If you get a significant portion of them right, then accumulating gradients on these examples (without updating the weights) would be little more than a waste of time. This suggests the following (unverified) postulate: The better the generalization, the bigger the speed difference between on-line (per-sample, stochastic....) and batch. In other words, any dataset interesting enough to be learned (as opposed to stored) has to be redundant. There might be no such thing as a large non-redundant dataset that is worth learning. I think we may be talking about two different things here. Let's assume that there is some underlying distribution that we are trying to model, and that we take some number of samples from this distribution to use as a training set. It is clearly true that there must be some "redundancy" in the underlying distribution if it is to be worth modelling. In this case, I'm using the term "redundancy" to mean that there's some sort of regular statistical structure that is stable enough to be of predictive value. Put another way, the distribution must not be totally random-looking; it has less than the maximum possible information per sample. However, given one of these redundant underlying distributions, we want to choose a training set that is large enough to be representative of the distribution (and to separate signal from noise), but not so large as to be redundant itself. This training set is what I was referring to in my earlier message. I think it is quite possible for the training set to be large, not internally redundant, and interesting in the sense that it models an predictable (redundant) underlying distribution. And this is the kind of case where I think that batch-updating has an advantage. -- Scott Fahlman From english at sun1.cs.ttu.edu Fri Oct 18 14:20:19 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Fri, 18 Oct 91 13:20:19 CDT Subject: batch-mode parallel implementations Message-ID: <9110181820.AA00593@sun1.cs.ttu.edu> Scott Fahlman remarked, > As for speed of convergence, continuous updating clearly beats per-epoch > updating if the training set is highly redundant. 
Another important factor is the autocorrelation of the training sequence. Consider a (highly redundant) training sequence that starts with 1000 examples of A and ends with 1000 examples of B. With continuous updating, there is a good chance that learning the B examples will cause the learned response to A examples to be lost. The obvious answer, in this contrived case, is to alternate presentations of A and B examples. Now for an uncontrived case: Suppose we are training a recurrent net for speaker-independent speech recognition, and that inputs to the net are power spectra extracted from the speech signal at fixed intervals. There are relatively long intervals in which the speech sound (spectrum) does not change much. There are even longer intervals in which the speaker does not change. Reordering the spectra for an utterance is clearly not an option, and continuous updating seems imprudent even though the redundancy of the training set is high. I'm sure there are plenty of nonstationary time series, other than speech, which present the same problems. In response to Scott's remark on the batch size used with an accelerated convergence procedure, > It must be sufficiently large to give a reasonably stable picture of > the overall gradient, but not so large that the gradient is computed > many times over before a weight-update cycle occurs. I would like to mention a case where, surprisingly, even large batches gave instability. The application was recognition of handwritten lower-case letters, and the network was of the LeCun variety. The training set comprised three batches of 1950 letter images (a total of 5850 images). This partition was chosen randomly. Fahlman's quickprop behaved poorly, and with some close inspection I found a number of weights for which the partial derivative was changing sign from one batch to the next. Further, the magnitudes of those partials were not always small. In short, the performance surfaces for the three batches differed considerably. The moral: You may have to make a single batch of the entire training set, even when working with fairly large training sets. -- Tom English english at sun1.cs.ttu.edu From nowlan at helmholtz.sdsc.edu Fri Oct 18 14:30:16 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Fri, 18 Oct 91 11:30:16 MST Subject: batch-mode parallel implementations In-Reply-To: Your message of Thu, 17 Oct 91 10:46:39 -0400. Message-ID: <9110181830.AA14145@bose> A couple of clarifications with regards to Yann's post: i) The dataset used in the comparison had a high degree of redundancy. ii) The "batch-mode" back-prop was vanilla fixed-step gradient descent, not a second order method. The issue of "batch" versus "on-line" is still a very open one. For relatively small problems (for me < ~5000 cases) I prefer conjugate gradient because of accuracy and no need to tune parameters. These techniques are also very easy to parallelize over cases. I have also implemented on a Cray a BP simulator that vectorized over connections rather than cases, and could implement on-line or batch techniques with ease. My experience here suggested that speed-ups could be obtained when the network had as few as a few thousand connections. - Steve From yoshua at psyche.mit.edu Sat Oct 19 12:55:19 1991 From: yoshua at psyche.mit.edu (Yoshua Bengio) Date: Sat, 19 Oct 91 12:55:19 EDT Subject: online parallel implementation Message-ID: <9110191655.AA12225@psyche.mit.edu> This message concerns an attempt to apply some parallelism to online back-propagation. 
I recently had access to N = 20 to 40 NeXT workstations on which I could perform learning experiments with back-propagation. My training database was huge (TIMIT, more than half a million patterns, but organized in sequences - sentences - of about 100 'frames' each), so I did not want to use a batch-based method. The idea I attempted to implement was the following: Split the database into N copies. Run N versions of the network on each of the N copies (on the N machines). Share weights _asynchronously_ among the networks, after one or more sequences. A 'server' program running on a separate machine received requests from any of the other machines to collect its contribution and return to it the current global moving average of the weights. Since I was running backpropagation through time, the weight update was performed only after each sequence even in the single-machine implementation; hence the update was not much less 'online' in the parallel implementation. Unfortunately, I no longer have access to these machines - because I have moved to a new institution - and I didn't have time to perform enough experiments and compare this approach with others. Yoshua Bengio MIT From honavar at iastate.edu Sat Oct 19 13:30:33 1991 From: honavar at iastate.edu (honavar@iastate.edu) Date: Sat, 19 Oct 91 12:30:33 CDT Subject: redundancy (was Re: batch-mode implementations) In-Reply-To: Your message of Fri, 18 Oct 91 11:08:03 -0400. <9110181508.AA00547@lamoon> Message-ID: <9110191730.AA07387@iastate.edu> Scott Fahlman wrote: >>I guess you could measure redundancy by seeing if some subset of the >>training data set produces essentially the same gradient vector as the full >>set. Yann Le Cun responded: > Hmmm, I think any dataset for which you expect good generalization is redundant. > Train your net on 30% of the dataset, and measure how many of the remaining > 70% you get right. If you get a significant portion of them right, then > accumulating gradients on these examples (without updating the weights) would > be little more than a waste of time. It is probably useful to distinguish between the redundancy WITHIN the training set and the redundancy BETWEEN the training and test sets (or, redundancy in the combined training and test sets). I suspect Scott Fahlman was referring to the redundancy (R1) within the training set while Le Cun was referring to the redundancy (R2) in the set formed by the union of the training set and the test set (please correct me if I am wrong). I would expect the relationship between generalization and R1 to be quite different from the relationship between generalization and R2. Whether the two measures of redundancy will be the same or not will almost certainly depend on the method(s) (e.g., sampling procedures, sample size reduction techniques) used to arrive at the data actually given to the network during training. In fact, if a training set T (obtained, say, by random sampling from some underlying distribution) were to be preprocessed in some fashion (e.g., using statistical techniques) and a reduced training set T' were obtained from T by eliminating the "redundant" samples, clearly the redundancy (R1') within the reduced training set T' would be much smaller than the redundancy (R1) in the original training set T, although the overall redundancy (R2) in the set formed by the union of T and the test data may be more or less equal to the redundancy (R2') in the set formed by the union of T' and the test data.
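One rough way to operationalize the within-training-set redundancy R1, along the lines of the subset-gradient suggestion quoted above, is sketched below (a sketch only; grad is assumed to return the error gradient, as a flat vector, for a given set of patterns):

    import numpy as np

    def subset_gradient_agreement(grad, X, y, frac=0.1, seed=0):
        # Cosine between the gradient on a random subset and the gradient on
        # the full training set; a value near 1 suggests the subset already
        # carries most of the gradient information (high R1).
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))[:int(frac * len(X))]
        g_sub, g_full = grad(X[idx], y[idx]), grad(X, y)
        return float(np.dot(g_sub, g_full) /
                     (np.linalg.norm(g_sub) * np.linalg.norm(g_full)))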
My guess is that the generalization on the test data will be more or less the same irrespective of whether T or T' is used for training the network. Vasant Honavar honavar at iastate.edu From nowlan at helmholtz.sdsc.edu Sat Oct 19 15:05:24 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Sat, 19 Oct 91 12:05:24 MST Subject: Paper Announcement (Neuroprose) Message-ID: <9110191905.AA15742@bose> ** Paper available via Neuroprose *************************************** ** Please do not forward to other mailing lists or boards. Thank you. ** The following paper has been placed in the Neuroprose archives at Ohio State. The file is nowlan.soft-share.ps.Z Ftp instructions follow the abstract. ----------------------------------------------------- Simplifying Neural Networks by Soft Weight-Sharing Steven J. Nowlan Computational Neuroscience Laboratory The Salk Institute P.O. Box 5800 San Diego, CA 92186-5800 Geoffrey E. Hinton Department of Computer Science University of Toronto Toronto, Canada M5S 1A4 ABSTRACT: One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. Simple versions of this approach include penalizing the sum of the squares of the weights or penalizing the number of non-zero weights. We propose a more complicated penalty term in which the distribution of weight values is modelled as a mixture of multiple gaussians. A set of weights is simple if the weights have high probability densities under the mixture model. This can be achieved by clustering the weights into subsets with the weights in each cluster having very similar values. Since we do not know the appropriate means or variances of the clusters in advance, we allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations on two different problems demonstrate that this complexity term is more effective than previous complexity terms. ----------------------------------------------------- FTP INSTRUCTIONS Either use "Getps nowlan.soft-share.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get nowlan.soft-share.ps.Z ftp> quit unix> uncompress nowlan.soft-share.ps.Z unix> lpr -s nowlan.soft-share.ps (or however you print postscript) Steven J. Nowlan Computational Neuroscience Laboratory The Salk Institute P.O. Box 85800 San Diego, CA 92186-5800 Work Phone: 619-453-4100 X463 e-mail: nowlan at helmholtz.sdsc.edu From tgd at guard.berkeley.edu Sat Oct 19 17:09:06 1991 From: tgd at guard.berkeley.edu (Tom Dietterich) Date: Sat, 19 Oct 91 14:09:06 -0700 Subject: batch-mode parallel implementations In-Reply-To: Tom English's message of Fri, 18 Oct 91 13:20:19 CDT <9110181820.AA00593@sun1.cs.ttu.edu> Message-ID: <9110192109.AA04626@guard.berkeley.edu> There has been a fair amount of work in decision-tree learning on the issue of breaking large training sets into smaller batches. In 1980, Quinlan introduced a method called "windowing" in which a small sample (or window) of the training data is initially drawn at random. The algorithm is trained on this window and then tested on the remainder of the data (that was excluded from the window). Then, some fraction of the misclassified examples (possibly all of them) are added to the window. Generally speaking, in noise-free domains, windowing works quite well. A very high-performing decision tree can be learned with a relatively small window. 
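In outline, the windowing loop looks roughly like this (a sketch under the obvious assumptions; fit_tree and classify stand for any decision-tree learner and its prediction routine):

    import random

    def windowing(X, y, fit_tree, classify, init_size=200, seed=0):
        rng = random.Random(seed)
        window = set(rng.sample(range(len(X)), init_size))
        while True:
            tree = fit_tree([X[i] for i in window], [y[i] for i in window])
            # Test on the examples outside the window.
            missed = [i for i in range(len(X))
                      if i not in window and classify(tree, X[i]) != y[i]]
            if not missed:
                return tree
            # Add the misclassified examples (here all of them; some variants
            # add only a fraction) to the window and retrain.
            window.update(missed)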
However, for noisy data, the general experience has been that the window eventually grows to include the entire training set. Jason Catlett (Sydney U) recently completed his dissertation on testing windowing and various other related tricks on datasets of roughly 100K examples (straight classification problems). I recommend his papers and thesis. His main conclusion is that if you want high performance, you need to look at all of the data. --Tom From ross at psych.psy.uq.oz.au Sat Oct 19 19:50:16 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Sun, 20 Oct 1991 09:50:16 +1000 Subject: batch & on-line training Message-ID: <9110192350.AA02282@psych.psy.uq.oz.au> On the topic of batch versus on-line training, Kamil at apple.com writes: > ... there doesn't seem to be a > compelling argument for preferring one or the other IN PRINCIPLE. I would like to turn the dichotomy into a trichotomy and argue that there is an 'in principle' reason for a preference. I want to add one-shot learning, which I define (on the spur of the moment) to be successful learning from one occasion of exposure to the input. This phenomenon is known to happen in animals (e.g. it can happen in taste aversion conditioning) and can happen in humans (e.g. recognition of an abstract painting seen only once before). One-shot learning becomes critical if you are trying to perform 'cognitive' tasks - when you learn the route to a new office you don't need hundreds or thousands of exposures to get it right. Obviously, one-shot learning can't be expected to happen in all circumstances: you have to be working in a constrained problem domain that can support it and the learner has to have the background knowledge that will support what is to be learned. Most of the work that is done with backprop and its relatives starts with near to a tabula rasa and all the time and effort goes into creating the universe from only the input data. Obviously, techniques do exist for one-shot learning: e.g. simple delta rule with a learning rate of 1. The problem is that they fail on the problems that people regard as interesting - inputs non-orthogonal and hidden units required. The challenge is to find a one-shot learning algorithm that can work on interesting problems. I believe that this will require strong architectural and problem data constraints. I see the current heavy use of gradient-descent techniques as analogous to the period in the history of AI when researchers looked for general problem solving techniques that were universally applicable. General techniques worked on toy problems but rapidly bogged down on real problems. In BP, we have a technique for learning arbitrary mappings, and we pay for it with excruciatingly slow learning. To summarise: IF you want to perform cognitive tasks THEN 'in principle' one shot learning is the only training regime that is acceptable (although slower learning may be required to get the net to the point where it can learn in one shot). All you have to do is invent a good one-shot learning scheme :-). Ross Gayler ross at psych.psy.uq.oz.au From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Oct 20 11:08:11 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 20 Oct 91 11:08:11 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 13:20:19 -0600. <9110181820.AA00593@sun1.cs.ttu.edu> Message-ID: I would like to mention a case where, surprisingly, even large batches gave instability. 
The application was recognition of handwritten lower-case letters, and the network was of the LeCun variety. The training set comprised three batches of 1950 letter images (a total of 5850 images). This partition was chosen randomly. Fahlman's quickprop behaved poorly, and with some close inspection I found a number of weights for which the partial derivative was changing sign from one batch to the next. Further, the magnitudes of those partials were not always small. In short, the performance surfaces for the three batches differed considerably. The moral: You may have to make a single batch of the entire training set, even when working with fairly large training sets. -- Tom English Note that it is OK to switch from one training set to another when using Quickprop, but that every time you change the training set you *must* zero out the prev-slopes and delta vectors. This prevents the quadratic part of the algorithm from trying to draw a parabola between two slopes that are not closely related. If you don't do this, that one step can badly mess up the weights you've laboriously accumulated so far. Of course, if you do this after every sample, the quadratic acceleration never kicks in and you end up with nothing more than plain old backprop without momentum. If you want to get any benefit from quickprop, you have to run each distinct training set for at least a few cycles. If you were aware of all that (it's unclear from your message) and still experienced instability, then I would say that the batches, even though they are fairly large, are not large enough to provide a fair representation of the underlying distribution. -- Scott From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Sun Oct 20 19:55:51 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Sun, 20 Oct 91 18:55:51 EST Subject: One-shot learning Message-ID: Ross Gayler wrote: >To summarise: IF you want to perform cognitive tasks THEN 'in principle' one >shot learning is the only training regime that is acceptable (although slower >learning may be required to get the net to the point where it can learn in >one shot). All you have to do is invent a good one-shot learning scheme :-). Although one-shot (one-trial) learning may not be the only mode of learning in our cognitive processes, it is true that learning in our cognitive processes does not take as many repetitions (epochs) as current BPNNs take. One-shot learning can serve as a goal and a criterion for learning schemes both in cognitive learning processes and in learning systems for practical applications. Our work on PPNN (I posted the abstract several days ago) was originally driven by one-trial learning. Although PPNN has not reached one-trial learning, it has stepped closer to it. In order to isolate the topological effect, we constrained PPNN to be the same as BPNN in all aspects except the topology. It was shown that the stereotopology alone can reduce the training time (epochs) by several orders of magnitude (due to the characteristics of PPNN's stereotopology, we used the average training time instead of epochs to measure the rate of convergence). It was found that the more difficult the problem is, the larger the improvement is. This topological speedup lies in the fact that there is a cause of slowness in the original planar topology of BPNN that cannot be accounted for by the learning algorithm or unit characteristics (no matter what learning algorithm is used or what unit response characteristics are employed, this cause of slow learning always exists.
It is inherent to the planar topology of BPNN). Bo Xu Indiana University itgt500 at indycms.iupui.edu From mmoller at daimi.aau.dk Mon Oct 21 08:13:06 1991 From: mmoller at daimi.aau.dk (mmoller@daimi.aau.dk) Date: Mon, 21 Oct 91 13:13:06 +0100 Subject: Batch methods versus stochastic methods... Message-ID: <9110211213.AA13826@sinope.daimi.aau.dk> --- Concerning the discussion about batch update versus stochastic update: for about the last 6 months we have been working on online versus batch problems. A preprint of a paper, which tries to describe why the stochastic methods are in some instances better than the deterministic batch methods, will soon be available via the neuroprose archive. The paper also introduces a new algorithm which combines the good properties of the stochastic methods as well as the batch methods. Our results so far can be summarized as follows: The redundancy of the training set plays, as has been mentioned before, a very important role. It is not clear, however, how to define this redundancy in a proper way. The usual definition of redundancy taken from information theory can give a hint about the redundancy but cannot in any obvious way provide a precise definition, because this would involve the information content of the training set as well as the internal dynamics (the structure) of the network. So when we discuss the concept of redundancy we should be aware that redundancy in the context of learning in feedforward networks is not very well defined. Another issue, which I think is even more important than the concept of redundancy, is the structure of the error surface. The "true" error surface, which is given by the whole training set, is, as we know, often characterized by a large number of flat regions and very steep, narrow ravines. Batch methods operate in the true but very complex error surface, while stochastic methods operate in partial error surfaces which are only approximations to the true error surface. So stochastic methods make a noisy, stochastic search in the true error surface, which can help them through the flat regions. One can think of the stochastic search as a kind of "simulated annealing" approach in which an increase of the error is also allowed. The algorithm we propose is based on a combination of the good properties of stochastic and batch algorithms. The main idea is to use a conjugate gradient algorithm on blocks of data (block update or semi-batch update). Because the conjugate gradient algorithm updates weights with variable (and sometimes large) step sizes, a validation scheme is used to control the updates. Through a simple sampling technique we estimate the probability that an update will decrease the total error. This probability is then used to decide whether to update or not. The number of patterns needed in each block update is variable and controlled by an adaptive optimization scheme during training. We have done some experiments with this approach on the nettalk problem. Our results so far show that the approach decreases the error faster per epoch than stochastic backpropagation. More computation is, however, needed per epoch. An interesting observation is that the number of patterns needed to make an update grows during learning, so that after a certain number of epochs the block size is equal to the number of patterns. When this happens the algorithm is equal to a traditional batch-mode algorithm and no validation is needed anymore.
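The flavour of such a validation-gated block update can be sketched as follows (a schematic reading of the idea only, not the algorithm from the forthcoming paper; propose_step and total_error are assumed to exist):

    import numpy as np

    def validated_block_update(w, propose_step, total_error, X, y,
                               n_val=200, p_accept=0.75, seed=0):
        # propose_step: e.g. a conjugate-gradient step computed on one block.
        w_new = propose_step(w)
        # Estimate, from a random sample of patterns, the probability that
        # the step decreases the error, and apply it only if that estimate
        # is high enough.
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))[:n_val]
        better = np.mean([total_error(w_new, X[i:i+1], y[i:i+1]) <
                          total_error(w, X[i:i+1], y[i:i+1]) for i in idx])
        return (w_new, True) if better >= p_accept else (w, False)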
In order to be able to draw some definite conclusions we need a few more experiments on different training sets. Unfortunately, we do not have any datasets of the proper size. So I would appreciate it if anyone could inform me about where to find big datasets that are publicly available. -- Martin M ----------------------------------------------------------------------- Martin F. Moller email: mmoller at daimi.aau.dk Computer Science Department phone: +45 86202711 5223 Aarhus University fax: +45 86135725 Ny Munkegade, Building 540 8000 Aarhus C Denmark ---------------------------------------------------------------------- From giles at research.nec.com Mon Oct 21 09:15:03 1991 From: giles at research.nec.com (Lee Giles) Date: Mon, 21 Oct 91 09:15:03 EDT Subject: Announcement of NIPS Workshop Message-ID: <9110211315.AA19197@fuzzy.nec.com> Announcement of NIPS Workshop: ************************************************************************** RECURRENT NETWORKS: THEORY AND APPLICATIONS Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains one of the important open issues in the neural network area. Training algorithms are very inefficient in terms of memory demands, computational needs or both. Little is known about convenient architectures for recurrent networks. The number of known successful applications is very limited. Even for static applications (operation in the "fixed point mode"), recurrent networks are more general, and therefore more powerful, in principle, than feedforward ones. However, once again, little is known about their actual (dis)advantages, convenient architectures, successful applications, etc. We welcome proposals for presentations (no more than one page in length) related to the theme of theory or applications of recurrent networks. Subject to the number of received proposals, we envisage a two-day workshop, one day theory, the next day applications, with 15-20 minute presentations, each followed by about 10 minutes of discussion. Please send proposals to Lee Giles. Organizers: Professor Luis Borges de Almeida INESC Rua Alves Redol, 9 Apartado 10105 1017 LISBOA CODEX PORTUGAL 351-1-544607 inesc!lba at relay.EU.net (or) lba at sara.inesc.pt C. Lee Giles NEC Research Institute 4 Independence Way Princeton, N.J. 08540 609-951-2642 FAX: 609-951-2482 giles at research.nj.nec.com Richard Rohwer Centre for Speech Technology Research Edinburgh University 80, South Bridge Edinburgh EH1 1HN, Scotland (44 or 0) (31) 650-2764 FAX: (44 or 0) (31) 226-2730 rr%ed.cstr at nsfnet-relay.ac.uk (or) rr at uk.ac.ed.cstr ************************************************************************** C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From DOW_ERNST at LILLY.COM Mon Oct 21 10:16:00 1991 From: DOW_ERNST at LILLY.COM (Ernst Dow, 276-9916) Date: Mon, 21 Oct 1991 09:16 EST Subject: one-shot learning Message-ID: <01GC03SM0RHC0000EE@GATEWAY.LILLY.COM> Ross Gayler writes: I want to add one-shot learning, which I define (on the spur of the moment) to be successful learning from one occasion of exposure to the input. This phenomenon is known to happen in animals (e.g.
it can happen in taste aversion conditioning) and can happen in humans (e.g. recognition of an abstract painting seen only once before). etc. If it was a big enough event in your life, you will have memorized the event. If it was not so monumental, you can help your memory by replaying the event in your mind. But in this case, we are talking memorization, not generalization. You may be able to identify the painting you saw before, but could you make the leap to recognizing all other abstract paintings? Ernst Dow ernst at lilly.com From: DOW ERNST (MCVAX0::TC64566) From mike at psych.ualberta.ca Mon Oct 21 12:15:37 1991 From: mike at psych.ualberta.ca (Mike R. W. Dawson) Date: Mon, 21 Oct 1991 10:15:37 -0600 Subject: Open position in cognitive psychology Message-ID: <9110211613.AA01542@psych.ualberta.ca> I'd like to bring the following open position in cognitive psychology to the attention of anyone who might be modeling cognitive processes with their networks: ======================================================================= Cognitive or Developmental Psychologists The Department of Psychology, University of Alberta, invites applications for one and, subject to budgetary considerations, possibly two tenure track positions at the level of beginning Assistant Professor, salary range: $38,955-$55,755. Candidates with research expertise in either COGNITIVE PSYCHOLOGY or DEVELOPMENTAL PSYCHOLOGY will be considered. The position in Cognitive is open with respect to area of specialization. The position in Developmental is also open with respect to area, but there is some preference for individuals with interests in language development, conceptual development, mathematical cognition, reading, scientific reasoning, spelling, or writing. Current Developmental faculty conduct research on emergent literacy, reading, and arithmetic skill. Decisions will be made on the basis of demonstrated research excellence, interactions with colleagues, and teaching ability. Applications should include a curriculum vita, three letters of recommendation, and reprints or recent publications. These materials should be sent, as appropriate, to Cognitive Search Chair, Dr. Peter Dixon, or Developmental Search Chair, Dr. Jeffrey Bisanz, Department of Psychology, University of Alberta, Edmonton, Alberta, Canada T6G 2E9. To receive full consideration, all materials must be received by January 1, 1992. The University of Alberta is committed to the principle of equity in employment. The University encourages applications from aboriginal persons, disabled persons, members of visible minorities and women. ======================================================================== Michael R. W. Dawson email: mike at psych.ualberta.ca Department of Psychology University of Alberta Edmonton, Alberta Tel: +1 403 492 5175 T6G 2E9, Canada Fax: +1 403 492 1768 From bap at james.psych.yale.edu Mon Oct 21 13:41:35 1991 From: bap at james.psych.yale.edu (Barak Pearlmutter) Date: Mon, 21 Oct 91 13:41:35 -0400 Subject: Paper Announcement (Neuroprose) In-Reply-To: "Steven J. Nowlan"'s message of Sat, 19 Oct 91 12:05:24 MST <9110191905.AA15742@bose> Message-ID: <9110211741.AA03347@james.psych.yale.edu> The following paper has not been placed in the Neuroprose archives at Ohio State. The file is not pearlmutter.soft-share.soft-share.ps.Z. Ftp instructions follow the abstract. 
----------------------------------------------------- Simplifying Neural Network Soft Weight-Sharing Measures by Soft Weight-Measure Soft Weight Sharing Barak Pearlmutter Department of Psychology P.O. Box 11A Yale Station New Haven, CT 06520-7447 ABSTRACT: It has been shown by Nowlan and Hinton (1991) that it is advantageous to construct weight complexity measures for use in weight regularization through the use of EM, instead of relying on some a-priori complexity measure, or even worse, neglecting regularization by assuming a uniform distribution. Their work can be regarded as a generalization of the "Optimal Brain Damage" of Le Cun et al. (1990), in which the distribution of weights is estimated with a histogram, a peculiar functional form for a distribution. Nowlan and Hinton assume a much simpler functional form for the distribution, avoiding overfitting and therefore overregularization. However, they disregard the issue of regularization of the regularizer itself. Just as certain weights might be considered a-priori quite unlikely, certain distributions of weights may be considered a-priori quite unlikely. To solve this problem, we introduce a regularization term on the parameters of the weight distribution being estimated. This regularization term is itself determined by a distribution over these distributional parameters. In this light, Nowlan and Hinton (1991) make the uniform distributional parameter distribution assumption. Here, we estimate the distribution of distributions by running an ensemble of networks, with EM used to estimate the weight distribution of each network (following Nowlan and Hinton), but we then use EM to estimate the distribution of distributions across networks. Of course, each estimated distribution is used to regularize the parameters over which that distribution is defined, leading to regularization of the individual network regularizers. We do not consider how to estimate the a-priori distribution which might be used to regularize the distribution being used to regularize the distribution being used to regularize the weights being estimated from the data, which will be explored in a future paper. ----------------------------------------------------- FTP INSTRUCTIONS Either use "getps pearlmutter.soft-share.soft-share.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get pearlmutter.soft-share.soft-share.ps.Z ftp> quit unix> uncompress pearlmutter.soft-share.soft-share.ps.Z unix> lpr -s pearlmutter.soft-share.soft-share.ps Barak Pearlmutter Department of Psychology P.O. Box 11A Yale Station New Haven, CT 06520-7447 Work Phone: 203 432-7011 From ANDERSON%BROWNCOG.BITNET at mitvma.mit.edu Mon Oct 21 15:46:00 1991 From: ANDERSON%BROWNCOG.BITNET at mitvma.mit.edu (ANDERSON%BROWNCOG.BITNET@mitvma.mit.edu) Date: Mon, 21 Oct 91 14:46 EST Subject: Technical Report Announcement Message-ID: Technical Report 91-3 available from: Department of Cognitive and Linguistic Sciences Box 1978, Brown University, Providence, RI 02912 A Study in Numerical Perversity: Teaching Arithmetic to a Neural Network James A. Anderson, Kathryn T. Spoehr, and David J. Bennett Department of Cognitive and Linguistic Sciences Box 1978 Brown University Providence, RI 02912 Abstract There are only a few hundred well-defined facts in elementary arithmetic, but humans find them hard to learn and hard to use.
One reason for this difficulty is that the structure of elementary arithmetic lends itself to severe associative interference. If a neural network corresponds in any sense to brain-style computation, then we should expect similar difficulties teaching elementary arithmetic to a neural network. We find this observation is correct for a simple network that was taught the multiplication tables. We can enhance learning of arithmetic by forming a hybrid coding for the representation of number that contains a powerful analog or "sensory" component as well as a more abstract component. When the simple network uses a hybrid representation, many of the effects seen in human arithmetic learning are reproduced, including overall error patterns and response time patterns for false products. An extension of the arithmetic network is capable of being flexibly programmed to correctly answer questions involving terms such as "bigger" or "smaller." Problems can be answered correctly, even if the particular comparisons involved had not been learned previously. Such a system is genuinely creative and flexible, though only in a limited domain. It remains to be seen if the computational limitations of this approach are coincident with the limitations of human cognition. A version of this report will appear as a chapter in: "Neural Networks for Knowledge Representation and Inference" Edited by Daniel S. Levine and Manuel Aparicio, IV To be published by Lawrence Erlbaum Associates, Hillsdale, New Jersey Copies can be obtained by sending an email message to: LI700008 at brownvm.BITNET or to: anderson at browncog.BITNET From english at sun1.cs.ttu.edu Mon Oct 21 17:12:09 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Mon, 21 Oct 91 16:12:09 CDT Subject: batch-mode parallel implementations Message-ID: <9110212112.AA01265@sun1.cs.ttu.edu> With regard to my earlier posting on problems I encountered in applying Quickprop, Scott Fahlman has replied: Note that it is OK to switch from one training set to another when using Quickprop, but that every time you change the training set you *must* zero out the prev-slopes and delta vectors. If you want to get any benefit from quickprop, you have to run each distinct training set for at least a few cycles. If you were aware of all that (it's unclear from your message).... Well, I was not aware of what others were doing in practice. Scott's original tech report on Quickprop gave results only for the case of once-per-epoch weight updates. I apologize for referring to my implementation with once-per-batch weight updates and no zeroing between batches as "Fahlman's Quickprop." What I *did* understand was that Quickprop's attempt to approximate the error surface with a paraboloid was going to be fouled-up if the "pictures" of the error surface gleaned from different batches were substantially different. Training for multiple iterations with one batch, and then resetting the variables used in estimating the shape of the error surface before going on to the next batch would certainly eliminate the problem I described. The prospect of choosing the number of iterations per batch does not thrill me, however. In general, I hate parameter tweaking. From my perspective, the worst thing about parameter tweaking is that we don't really know how it affects the quality of the final network obtained. Also, exploring the effects of different parameter settings takes too much of *my* time. 
I want a procedure that does not require tweaking and that runs at a reasonable fraction of the speed of a "well-tuned" stochastic gradient descent procedure for a wide range of problems. (I haven't experimented with conjugate gradient descent yet, but it seems to fit my bill.) --Tom english at sun1.cs.ttu.edu From giles at research.nec.com Tue Oct 22 15:51:28 1991 From: giles at research.nec.com (Lee Giles) Date: Tue, 22 Oct 91 15:51:28 EDT Subject: Announcement of NIPS (Neural Information Processing Systems) Workshop Message-ID: <9110221951.AA21064@fuzzy.nec.com> Announcement of NIPS (Neural Information Processing Systems) Workshop: RECURRENT NETWORKS: THEORY AND APPLICATIONS, Dec 6-7, Vail, Colorado. C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From thsspxw at iitmax.iit.edu Tue Oct 22 19:10:57 1991 From: thsspxw at iitmax.iit.edu (Peter Wohl) Date: Tue, 22 Oct 91 18:10:57 CDT Subject: batch-mode parallel implementations In-Reply-To: <8431.688104706@B.GP.CS.CMU.EDU>; from "Connectionist_Research_Group@B.GP.CS.CMU.EDU" at Oct 22, 91 12:11 (midnight) Message-ID: <9110222311.AA09935@iitmax.iit.edu> Dear connectionists, I have some comments on several of these, so I decided not to include all the history of this discussion in my reply (you read it anyway). So here I go: 1.
Given per-sample training, one still faces the problem of how to deal with really large networks (thousands of neurons and hundreds of thousands connections) on a parallel machine that has far fewer processors. What has been proposed: a) SIMD (don't cry for unused processors, as long as you can communicate fast enough); b) MIMD with clustering neurons somehow together, to increase granularity (SIMD also needs some), problem here being dependence on VERY particular nets (usually layers with powers of 2 neurons); c) re-writing the communication of the algorithm (see for example my paper this coming Nov at ICTAI'91). 2. I agree that epoch-training is probably desirable. How large is a "typical" epoch for a "large" net (thousands of neurons, fraction of million connections at least) ? Tens of vectors, hundreds ? I would say, no more than few hundreds. 3. "Recall" (forward propagation with no weight update) is far easier to parallelize, since there is no end-of-epoch bottleneck (barrier synch). In some results (to be published next year), we achieved (on 32 BBN Butterfly processors) almost 2 million connec-presen/sec with backprop., but over 5 million at recall. (2.5 million if you "adjust" forward-only by dividing by two, to match the backprop figure more closely). To summarize, I think the real problem of parallelizing ANNs applies when at least one of net-size or training-epoch-size is large (and thus slow when run sequentially). And don't forget: net architecture could change during training (e.g. cascade corr), and still keep it parallel. Thanks for your patience, Peter Wohl thsspxw at iitmax.iit.edu From spotter at darwin.bio.uci.edu Tue Oct 22 19:17:52 1991 From: spotter at darwin.bio.uci.edu (Steve Potter) Date: Tue, 22 Oct 91 16:17:52 PDT Subject: Continuous vs. Batch learning Message-ID: <9110222317.AA22627@sanger.bio.uci.edu> It is pretty clear to me that biological neural networks have all adapted to prefer the continuous learning technique, as we can verify for humans by remembering something that we only saw (or heard, etc.) once. One-trial learning paradigms abound in the behavioral literature. I cant think of any biological examples of batch learning, in which sensory data are saved until a certain number of them can be somehow averaged together and conclusions made and remembered. Any ideas? Anyway, perhaps we should take an example from nature, which has been optimizing things far longer than we have! Steve Potter UC Irvine Psychobiology dept. Irvine, CA 92717 spotter at darwin.bio.uci.edu From jbower at cns.caltech.edu Wed Oct 23 00:47:51 1991 From: jbower at cns.caltech.edu (Jim Bower) Date: Tue, 22 Oct 91 21:47:51 PDT Subject: CNS*92 Message-ID: <9110230447.AA01301@cns.caltech.edu> CALL FOR PAPERS First Annual Computation and Neural Systems Meeting CNS*92 Tuesday, July 26 through Sunday, July 31 1992 San Francisco, California This is the first annual meeting of an inter-disciplinary conference intended to address the broad range of research approaches and issues involved in the general field of computational neuroscience. The meeting itself has grown out of a workshop on "The Analysis and Modeling of Neural Systems" which has been held each of the last two years at the same site. The strong response to these previous meetings has suggested that it is now time for an annual open meeting on computational approaches to understanding neurobiological systems. 
CNS*92 is intended to bring together experimental and theoretical neurobiologists along with engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in understanding how neural systems compute. The meeting will equally emphasize experimental, model-based, and more abstract theoretical approaches to understanding neurobiological computation. The first day of the meeting (July 26) will be devoted to tutorial presentations and workshops focused on particular technical issues confronting computational neurobiology. The next three days will include the main technical program consisting of plenary, contributed and poster sessions. There will be no parallel sessions and the full text of presented papers will be published. Following the regular session, there will be two days of focused workshops at a site on the California coast (July 30-31). Participation in the workshops is restricted to 75 attendees. Technical Program: Plenary, contributed and poster sessions will be held. There will be no parallel sessions. The full text of presented papers will be published. Presentation categories: A. Theory and Analysis B. Modeling and Simulation C. Experimental D. Tools and Techniques Themes: A. Development B. Cell Biology C. Excitable Membranes and Synaptic Mechanisms D. Neurotransmitters, Modulators, Receptors E. Sensory Systems 1. Somatosensory 2. Visual 3. Auditory 4. Olfactory 5. Other F. Motor Systems and Sensory Motor Integration G. Behavior H. Cognitive I. Disease Submission Procedures: Original research contributions are solicited, and will be carefully refereed. Authors must submit six copies of both a 1000-word (or less) summary and six copies of a separate singlepage 50-100 word abstract clearly stating their results postmarked by January 7, 1992. Accepted abstracts will be published in the conference program. Summaries are for program committee use only. At the bottom of each abstract page and on the first summary page indicate preference for oral or poster presentation and specify at least one appropriate category and and theme. Also indicate preparation if applicable. Include addresses of all authors on the front of the summary and the abstract and indicate to which author correspondence should be addressed. Submissions will not be considered that lack category information, separate abstract sheets, the required six copies, author addresses, or are late. Mail Submissions To: Chris Ploegaert CNS*92 Submissions Division of Biology 216-76 Caltech Pasadena, CA. 91125 Mail For Registration Material To: Chris Ghinazzi Lawrence Livermore National Laboratories P.O. Box 808 Livermore CA. 94550 All submitting authors will be sent registration material automatically. Program committee decisions will be sent to the correspondence author only. CNS*92 Organizing Committee: Program Chair, James M. Bower, Caltech. Publicity Chair, Frank Eeckman, Lawrence Livermore Labs. Finances, John Miller, UC Berkeley and Nora Smiriga, Institute of Scientific Computing Res. Local Arrangements, Ted Lewis, UC Berkeley and Muriel Ross, NASA Ames. Program Committee: William Bialek, NEC Research Institute. James M. Bower, Caltech. Frank Eeckman, Lawrence Livermore Labs. Scott Fraser, Caltech. Christof Koch, Caltech. Ted Lewis, UC Berkeley. Eve Marder, Brandeis. Bruce McNaughton, University of Arizona. John Miller, UC Berkeley. Idan Segev, Hebrew University, Jerusalem Shihab Shamma, University of Maryland. Josef Skrzypek, UCLA. 
DEADLINE FOR SUMMARIES & ABSTRACTS IS January 7, 1992 please post From palmer at world.std.com Wed Oct 23 02:25:10 1991 From: palmer at world.std.com (Kent D Palmer) Date: Wed, 23 Oct 91 02:25:10 -0400 Subject: THINKNET NEWSLETTER ANNOUNCEMENT Message-ID: <9110230625.AA18459@world.std.com> ===========================START=OF=THINKNET=FILE============================ ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||| PLEASE POST ----- NEWSLETTER ANNOUNCEMENT |||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| /| ....... .. .. . . . . .==|........ ... .. .... . .... .. ._____. . * . . / ===|_ _. ..______________________________...... | | | | |\ | / ======== |\ ...| .... |.THINKNET:An Electronic.... | |---| | | \ |< ========== |. \ .|---- . |.Journal Of Philosophy,... | | | | | \| \ ======== |... \| ..... |.Meta-Theory, And Other.. | | | | | | \ ====== |.... |____.. |.Thoughtful Discussions.... .==| ........ .. .... .. ... .. . \| .... ... .. .. . . .. . . ----------------------------------------------------------------------------- OCTOBER 1991 ISSUE 001 VOLUME 1 NUMBER 1 ----------------------------------------------------------------------------- This is an announcement for Thinknet, an on-line magazine forum dedicated to thoughtfulness in the cybertime environment. Thinknet covers philosophy, systems theory, and meta-theoretical discussions within disciplines. It is your interdisciplinary window on to what significant information sources are available to foster thought provoking discussion. *CONTENTS* Publication Data Scope of newsletter. Rationale for newsletter. Subscriptions and Submittals address. Bulletin Boards where it may be found. Services offered by newsletter. Staff of this edition. Coda: call for participation. About Thinknet Discussion of goals of Thinknet Newsletter. Prospect for Philosophy and Systems Theory in Cybertime Is there a possibility for a renaissance for philosophy? The Philosophy Category on GEnie Review by Gordon Swobe with list of topics. Philosophy on the WELL Review by Jeff Dooley with list of topics. Origin Conference on the WELL Review by Bruce Schuman with list of topics Internet Philosophy Mailing Lists A review of all know philosophy oriented mailing lists by Stephen Clark. Books Of Note THE MATRIX !%@:: A DIRECTORY OF ELECTRONIC MAIL ADDRESSING & NETWORKS Other Publications BOARDWATCH MAGAZINE SOFTWARE ENGINEERING FOUNDATIONS [a work in progress] Books, Electronic Newsletters, and Cyber-Artifacts Received ARTCOM NEWSLETTER FACTSHEET FIVE Protocols for Meaningful Discussions: ARTICLE by Kent Palmer A consideration of how philosophy discussions might be made more useful and their history accessible by using a voluntary protocol. Thoughtful Communications: EDITORIAL Closing remarks. <<<<<<<<<<<>>>>>>>>>>>> ----------------------------------------------------------------------------- HOW TO GET YOUR COPY kdp ----------------------------------------------------------------------------- *Price* The electronic form is FREE. Hardcopies cost money for reproduction, postage, and handling. *Subscriptions* Send an e-mail message to the following address: thinknet at world.std.com Your message should be of the following form: SEND THINKNET TO YourFullName AT YourEmailAddress Some mailing lists do not include your return mailing address if you use the reply function of your mail reader so you must make sure your return e-mail address is in the body of your message. 
Thinknet file is long, about 1113 lines; 7136 words; 51795 bytes. You will be added to the thinknet subscription list. You will get all further issues unless you unsubscribe. *Bulletin Boards* Thinknet will be posted in the WELL philosophy conference in a topic. The WELL 27 Gate Five Road, Sausalito, CA 94965 modem 415-332-6106 voice 415-332-4335 Also on GEnie in the Philosophy category under the Religion and Ethics Bulletin Board. GEnie Client Services 1-800-638-9636 *PHILOS-L Listserver* You will eventually be able to get the thinknet newsletter from a listserver. Send the message 'GET THINKNET DOC' to 'LISTSERV at LIVERPOOL.AC.UK'. If you get an error message try the regular thinknet address. *Or if all else fails* THINKNET PO BOX 8383 ORANGE CA 92664-8383 UNITED STATES ==============================END=THINKNET=FILE============================= From ross at psych.psy.uq.oz.au Wed Oct 23 04:23:43 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Wed, 23 Oct 1991 18:23:43 +1000 Subject: one-shot learning Message-ID: <9110230823.AA28466@psych.psy.uq.oz.au> Ernst Dow (ernst at lilly.com) writes (in the context of one-shot or one-trial learning): >But in this case, we are talking memorization, not generalization. You may >be able to identify the painting you saw before, but could you make the >leap to recognizing all other abstract paintings? My interest is in analogical retrieval and not one-trial learning (except to the extent that it is necessary for 'truly cognitive' capabilities). The literature on analogy stresses the role that goals play in determining the apparent similarity (and hence generalisation) of entities. That is, in analogy the generalisation pattern emerges at recall time rather than being completely determined at storage time. For such a (post-hoc) generaliser it makes sense to attempt to memorise everything. This contrasts with the approach of most BP work where the system learns an internal representation (read that as set of hidden units and weights) that supports a particular pre-specified pattern of generalisation. I realise that there is more to life than analogical recall and some generalisation is based on literal similarity etc, but I am just stating the extreme position for simplicity. Ross Gayler ross at psych.psy.uq.oz.au From pluto at cs.UCSD.EDU Mon Oct 21 19:29:59 1991 From: pluto at cs.UCSD.EDU (Mark Plutowksi) Date: Mon, 21 Oct 91 16:29:59 PDT Subject: Redundancy Message-ID: <9110212329.AA12326@tournesol.ucsd.edu> Scott Fahlman writes: :: :: I guess you could measure redundancy by seeing if some subset of the :: training data set produces essentially the same gradient vector as the full :: set. Probably statisticians have good ways of talking about this :: redundancy business -- unfortunately, I don't know the right vocabulary. Indeed they do; however, they begin from a more general perspective: for a particular "n", where "n" is the number of exemplars we are going to train on, call a set of "n" exemplars optimal if better generalization can not be obtained by training on any other set of "n" exemplars. This criterion is called "Integrated Mean Squared Error." See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. Using appropriate approximations, we can use this to obtain what you suggest. Results for the case of clean data are currently available in Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD CSE department (see [Plutowski & White, 1991].) 
Basically, given a set of candidate training examples, we select a subset which if trained upon give a gradient highly correlated with the gradient obtained by training upon the entire set. This results in a concise set of exemplars representative (in a precise sense) of the entire set. Preliminary empirical results indicate that the end result is what we originally desired: training upon this well chosen subset results in generalization close to that obtained by training upon the entire set. Tom Dietterich writes: :: :: There has been a fair amount of work in decision-tree learning on the :: issue of breaking large training sets into smaller batches. In 1980, :: Quinlan introduced a method called "windowing" in which a small sample :: (or window) of the training data is initially drawn at random. The :: algorithm is trained on this window and then tested on the remainder of :: the data (that was excluded from the window). Then, some fraction of :: the misclassified examples (possibly all of them) are added to the :: window. :: :: Generally speaking, in noise-free domains, windowing works quite well. :: A very high-performing decision tree can be learned with a relatively :: small window. However, for noisy data, the general experience has :: been that the window eventually grows to include the entire training set. :: Jason Catlett (Sydney U) recently completed his dissertation on :: testing windowing and various other related tricks on datasets of :: roughly 100K examples (straight classification problems). I recommend :: his papers and thesis. :: :: His main conclusion is that if you want high performance, you need to :: look at all of the data. Could you provide a reference to the work demonstrating the performance of windowing on clean data? And could you provide an e-mail address for Jason Catlett? I am in the process of setting up benchmarking experiments for the technique I mentioned above. Although I consider the more general task of fitting arbitrary functional mappings, these works seem relevant. Thanks, ================= == Mark Plutowski Computer Science and Engineering 0114 University of California, San Diego La Jolla, CA ----------- REFERENCES: ----------- Box,G., and N.Draper. 1987. {\bf Empirical Model-Building and Response Surfaces.} Wiley, New York. Khuri, A.I., and J.A.Cornell. 1987. {\bf Response Surfaces (Designs and Analyses)}. Marcel Dekker, Inc., New York. Myers, Raymond H., and A.I. Khuri, W.H. Carter, Jr. 1989. ``Response Surface Methodology: 1966-1988.'' {\em Technometrics}. vol.31, no.2. Plutowski, Mark E., and Halbert White. 1991. ``Active selection of training examples for network learning in noiseless environments.'' Technical Report No. CS91-180, Department of Computer Science and Engineering, The University of California, San Diego. 92093-0114. Accepted pending revision by IEEE Transactions on Neural Networks. ---- Here are some other related works: -------- Cohn, David, Les Atlas, and Richard Ladner. 1990. ``Training connectionist networks with queries and selective sampling.'' {\em Advances in Neural Information Processing Systems 2,} Proc. of the Neural Information Processing Systems Conference. Morgan Kaufmann, San Mateo, California. Hwang, Jenq-Neng, J.J. Choi, Seho Oh, and Robert J. Marks III. 1990. ``Query learning based on boundary search and gradient computation of trained multilayer perceptrons. '' {\em Proc. IJCNN 1990, San Diego. The International Joint Conference on Neural Networks.} IEEE press. 
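To make the selection criterion concrete, here is a rough sketch (not the procedure of the tech report, which interleaves selection with network fitting; the function name and the cosine-similarity score are purely illustrative), assuming per-example gradients at the current weights are available:

    import numpy as np

    def select_subset(per_example_grads, k):
        """Greedy sketch: pick k examples whose summed gradient stays most
        correlated (cosine similarity) with the gradient over all examples.
        per_example_grads: array of shape (n_examples, n_weights)."""
        full = per_example_grads.sum(axis=0)
        chosen, running = [], np.zeros_like(full)
        candidates = set(range(len(per_example_grads)))
        for _ in range(k):
            def score(i):
                g = running + per_example_grads[i]
                return np.dot(g, full) / (np.linalg.norm(g)
                                          * np.linalg.norm(full) + 1e-12)
            best = max(candidates, key=score)
            chosen.append(best)
            running += per_example_grads[best]
            candidates.remove(best)
        return chosen

The intent is only to show the shape of the idea: a small subset whose pooled gradient points in nearly the same direction as the full-set gradient is, in this sense, a concise stand-in for the whole set.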
From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Oct 21 21:27:08 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Mon, 21 Oct 91 21:27:08 -0400 Subject: Redundancy In-Reply-To: Your message of Mon, 21 Oct 91 16:29:59 -0800. <9110212329.AA12326@tournesol.ucsd.edu> Message-ID: :: I guess you could measure redundancy by seeing if some subset of the :: training data set produces essentially the same gradient vector as the full :: set. Probably statisticians have good ways of talking about this :: redundancy business -- unfortunately, I don't know the right vocabulary. Indeed they do; however, they begin from a more general perspective: for a particular "n", where "n" is the number of exemplars we are going to train on, call a set of "n" exemplars optimal if better generalization can not be obtained by training on any other set of "n" exemplars. This criterion is called "Integrated Mean Squared Error." See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. Using appropriate approximations, we can use this to obtain what you suggest. Results for the case of clean data are currently available in Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD CSE department (see [Plutowski & White, 1991].) Basically, given a set of candidate training examples, we select a subset which if trained upon give a gradient highly correlated with the gradient obtained by training upon the entire set. This results in a concise set of exemplars representative (in a precise sense) of the entire set. Preliminary empirical results indicate that the end result is what we originally desired: training upon this well chosen subset results in generalization close to that obtained by training upon the entire set. Thanks for the references. This is a useful beginning, but doesn't seem to address the problem we were discussing. In many real-world problems, the following constraints hold: 1. We do not have direct access to "the entire set". In fact, this set may well be infinite. All we can do is collect some number of samples, and there is usually a cost for obtaining each sample. 2. Rather than hand-crafting a training set by choosing all its elements, we want to choose an appropriate "n" and then pick "n" samples at random from the set we are trying to model. Of course, if collecting samples is cheap and network training is expensive, you might throw some samples away and not use them in the training set. I don't *think* that this would ever improve generalization, but it might lead to faster training without hurting generalization. 3. The data may not be "clean". The structure we are trying to model may be masked by a lot of random noise. Do you know of any work on how to pick an optimal "n" under these conditions? I would guess that this sort of problem is already well-studied in statistics; if not, it seems like a good research topic for someone with the proper background. -- Scott Fahlman From pluto at cs.UCSD.EDU Mon Oct 21 21:54:29 1991 From: pluto at cs.UCSD.EDU (Mark Plutowksi) Date: Mon, 21 Oct 91 18:54:29 PDT Subject: Redundancy Message-ID: <9110220154.AA12390@tournesol.ucsd.edu> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= ..in response to your message, included here: =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= :: To: Mark Plutowksi :: Cc: connectionists at CS.CMU.EDU :: Subject: Re: Redundancy :: In-Reply-To: Your message of Mon, 21 Oct 91 16:29:59 -0800. 
:: <9110212329.AA12326 at tournesol.ucsd.edu> :: Date: Mon, 21 Oct 91 21:27:08 -0400 :: From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU :: :: :: I guess you could measure redundancy by seeing if some subset of the :: :: training data set produces essentially the same gradient vector as the full :: :: set. Probably statisticians have good ways of talking about this :: :: redundancy business -- unfortunately, I don't know the right vocabulary. :: :: Indeed they do; however, they begin from a more general perspective: :: for a particular "n", where "n" is the number of exemplars we are going to :: train on, call a set of "n" exemplars optimal if better generalization can :: not be obtained by training on any other set of "n" exemplars. :: This criterion is called "Integrated Mean Squared Error." :: See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. :: :: Using appropriate approximations, we can use this to obtain what you suggest. :: Results for the case of clean data are currently available in :: Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD :: CSE department (see [Plutowski & White, 1991].) Basically, given a set of :: candidate training examples, we select a subset which if trained upon :: give a gradient highly correlated with the gradient obtained by :: training upon the entire set. This results in a concise set of exemplars :: representative (in a precise sense) of the entire set. :: Preliminary empirical results indicate that the end result is what we :: originally desired: training upon this well chosen subset results in :: generalization close to that obtained by training upon the entire set. :: :: Thanks for the references. This is a useful beginning, but doesn't seem to :: address the problem we were discussing. In many real-world problems, the :: following constraints hold: :: :: 1. We do not have direct access to "the entire set". In fact, this set may :: well be infinite. All we can do is collect some number of samples, and :: there is usually a cost for obtaining each sample. :: :: 2. Rather than hand-crafting a training set by choosing all its elements, :: we want to choose an appropriate "n" and then pick "n" samples at random :: from the set we are trying to model. Of course, if collecting samples is :: cheap and network training is expensive, you might throw some samples away :: and not use them in the training set. I don't *think* that this would ever :: improve generalization, but it might lead to faster training without :: hurting generalization. :: :: 3. The data may not be "clean". The structure we are trying to model may :: be masked by a lot of random noise. :: :: Do you know of any work on how to pick an optimal "n" under these :: conditions? I would guess that this sort of problem is already :: well-studied in statistics; if not, it seems like a good research topic for :: someone with the proper background. :: :: -- Scott Fahlman :: I don't know of a feasible way of choosing such an "n". Instead, I obtain a greedy approximation to it. 
What we do (as reported in the tech report by Plutowski & White) is sequentially grow the training set, first finding an "optimal" training set of size 1, then fitting the network to this training set, appending the training set with a new exemplar selected from the set of available candidates, obtaining a training set of size 2 which is "approximately optimal", fitting this set, appending a third exemplar, etc, continuing the process until the network fit obtained by training over the exemplars fits the rest of the available examples within the desired tolerance. I have no idea as to how close the resulting training sets are to being truly IMSE-optimal. But, they are much more concise than the original set - and so far, at least on the toy problems I have tried so far, it has resulted in a computational benefit, apparently because training on the smaller set of exemplars provides an informative gradient at much lower cost than is required to obtain a gradient over all of the available examples. The more the redundancy in the data, the more the computational benefit. Of course, more extensive testing is required (and in progress.) = Mark Plutowski From 72247.2225 at CompuServe.COM Mon Oct 21 23:05:00 1991 From: 72247.2225 at CompuServe.COM (Larry Fast) Date: 21 Oct 91 23:05:00 EDT Subject: Backprop Feedback Gain Message-ID: <911022030500_72247.2225_EHL25-1@CompuServe.COM> I'm expanding the PDP Backprop program (McClelland&Rumlhart version 1.1) to compensate for the following problem: As Backprop passes the error back thru multiple layers, the gradient has a built in tendency to decay. At the output the maximum slope of the 1/( 1 + e(-sum)) activation function is 0.5. Each successive layer multiplies this slope by a maximum of 0.5. The maximum gains at various layers (where n is the output layer) is: max slope at layer n = 0.5 max slope at layer n-2 = 0.125 max slope at layer n-3 = 0.0625 max slope at layer n-4 = 0.03125 .... It has been suggested (by a couple of sources) that an attempt should be made to have each layer learn at the same rate. To this end, I'm installing a gain factor on error being backpropagated. The new error function is: errorPropGain * act * (1 - act) The nominal value that makes sense is 2 (or more). This would allow at least the maximum learning rate to propagate unattenuated. Has anyone else tried this, or any other method of flattening out the learning rate in deep layers. Any info regarding more recent releases of PDP or a users' group would also be helpful. Please respond directly to 72247.2225 at compuserve.com Thanks, Larry Fast From max.coltheart at mrc-apu.cam.ac.uk Mon Oct 21 23:04:38 1991 From: max.coltheart at mrc-apu.cam.ac.uk (max.coltheart@mrc-apu.cam.ac.uk) Date: Tue, 22 Oct 1991 11:04:38 +0800 Subject: redundancy and generalization Message-ID: <18650.9110221006@sirius.mrc-apu.cam.ac.uk> Consider the eight words PAT PAD CAT CAD POT POD COT COD. Give a net the task of translating these from letters to phonemes. Choose any subset of, say, four items as the training set and after training to asymptote test performance on the other four. Even with a training set that contains all the information needed for the test set (e.g. PAT POD CAT COD exemplifies every letter-phoneme pairing twice), the various architectures we have been trying score 0% on the generalization set (in this example, the net learns nothing about the third letter so in the generalisation test translates PAD as "pat", POT as "pod", COT as "cod" and CAD as "cat". 
Is this problem, trivial for rule-learning algorithms, insoluble for any system that learns by error-correction? Tom Dietterich writes: >Generally speaking, in noise-free domains, windowing works quite well. >A very high-performing decision tree can be learned with a relatively >small window. However, for noisy data, the general experience has >been that the window eventually grows to include the entire training set. >Jason Catlett (Sydney U) recently completed his dissertation on >testing windowing and various other related tricks on datasets of >roughly 100K examples (straight classification problems). I recommend >his papers and thesis. > >His main conclusion is that if you want high performance, you need to >look at all of the data. "The window eventually grows to include the entire training set" = "the system is incapable of generalizing accurately ". Note that noise isn't the problem. In my example, there's no noise, and no generalization Max Coltheart max.coltheart at mrc-apu.cam.ac.uk From ahg at eng.cam.ac.uk Tue Oct 22 05:20:21 1991 From: ahg at eng.cam.ac.uk (A.H. Gee) Date: Tue, 22 Oct 91 10:20:21 +0100 Subject: No subject Message-ID: <22398.9110220920@tw700.eng.cam.ac.uk> ************** PLEASE DO NOT FORWARD TO OTHER NEWSGOUPS **************** The following technical report has been placed in the neuroprose archives at Ohio State University: NEURAL NETWORKS AND COMBINATORIAL OPTIMIZATION PROBLEMS - THE KEY TO A SUCCESSFUL MAPPING Andrew Gee, Sreeram Aiyer and Richard Prager Technical Report CUED/F-INFENG/TR 77 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract For several years now there has been much research interest in the use of Hopfield networks to solve combinatorial optimization problems. Although initial results were disappointing, it has since been demonstrated how modified network dynamics and better problem mapping can greatly improve the solution quality. The aim of this paper is to build on this progress by presenting a new analytical framework in which problem mappings can be evaluated without recourse to purely experimental means. A linearized analysis of the Hopfield network's dynamics forms the main theory of the paper, followed by a series of experiments in which some problem mappings are investigated in the context of these dynamics. In all cases the experimental results are compatible with the linearized theory, and observed weaknesses in the mappings are fully explained within the framework. What emerges is a largely analytical technique for evaluating candidate problem mappings, without having to resort to the more usual trial and error. ************************ How to obtain a copy ************************ a) Via FTP: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get gee.opt_map.ps.Z ftp> quit unix> uncompress gee.opt_map.ps.Z unix> lpr gee.opt_map.ps (or however you print PostScript) Please note that a couple of the figures in the paper were produced on an Apple Mac, and the resulting PostScript is not quite standard. People using an Apple LaserWriter should have no problems though. b) Via postal mail: Request a hardcopy from Andrew Gee, Speech Laboratory, Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. 
or email me: ahg at eng.cam.ac.uk From dlb at ukc.ac.uk Wed Oct 23 08:10:16 1991 From: dlb at ukc.ac.uk (dlb@ukc.ac.uk) Date: Wed, 23 Oct 91 13:10:16 +0100 Subject: Research Fellowship (UK) Message-ID: Research Fellowship in Neural Networks: Investigation of Digitally Implemented Neural Networks Based on Novel Goal-Seeking Principles UNIVERSITY OF KENT AT CANTERBURY Electronic Engineering Laboratories Applications are invited for a Research Fellowship in the Electronic Engineering Laboratories at the University of Kent to work on an SERC-funded project on digitally implemented neural networks. The project, part of an on-going programme of work in neural networks, will investigate the properties and applications of novel artificial neural networks based on Boolean processing nodes and embodying local low-level goal-seeking principles. Applicants should have a good Honours degree in electronic engineering or computer science/engineering and should preferably hold a Ph.D. degree in an appropriate area. Applicants with previous experience in the field of neural networks or image analysis would be especially welcome. The Digital Systems Research Group in the Electronic Engineering Laboratories have a very strong research programme in computational architectures for pattern processing, with a particular emphasis on neural network architectures. Extensive facilities to support this work are available, including both central and in-house computing systems, and a dedicated workstation will be available for this project. Technician support will also be provided. The appointment is for a three year period and is available from 1st January 1992. The salary is on the scale 11969 - 14170 pounds. informal enquiries may be made to Dr. Michael Fairhurst or Dr. David Bisset on +44 227-764000, or by e-mail to dlb at ukc.ac.uk Further particulars and application forms are available from The Personnel Office, The University of Kent at Canterbury, Canterbury, Kent, CT2 7NZ, England, quoting reference A92/13. Telephone +44 227 475482 or 764000 x3915. The closing date is 1st November 1991. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Wed Oct 23 11:23:19 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Wed, 23 Oct 91 11:23:19 -0400 Subject: Continuous vs. Batch learning In-Reply-To: Your message of Tue, 22 Oct 91 16:17:52 -0800. <9110222317.AA22627@sanger.bio.uci.edu> Message-ID: It is pretty clear to me that biological neural networks have all adapted to prefer the continuous learning technique... Anyway, perhaps we should take an example from nature, which has been optimizing things far longer than we have! Sure, but with a totally different technology. Give me 10^9 processors, 10^13 active, complex connections, and 3-D packing, and make short-term memory scarce, slow, and unreliable, and I'd pick continuous learning as well. And it wouldn't even take me a billion years to make the decision. -- Scott Fahlman From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Wed Oct 23 14:13:31 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (BO XU) Date: Wed, 23 Oct 91 13:13:31 EST Subject: Paper Message-ID: Because this is the first time I place paper into neuroprose, I have brought lots of troubles to Jordan Pollack of Ohio State. 
We don't know whether it's due to my postscript file's problem (I generated the ps file on MacWrite II by pressing and holding the command key and the "F" or "K" key together before clicking the "OK" button in the print dialogue menu) or not, the ps file cannot be printed at Jordan's place. We retried it several times, and he still cannot see it after processing it. However, the ps file inside the Inbox can be traced from UNIX. So we decide to leave the paper inside the Inbox subdirectory and announce it with a caveat that it may not work. I am sorry for this delay and inconvenience, and I will be very glad to know more methods to generate ps files from MacWrite II which will have a good behavior at neuroprose archive. Thanks in advance. The procedure to get the ps file from the Inbox is as follows: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose/Inbox ftp> binary ftp> get ppnn.ps6 ftp> quit unix> lpr ppnn.ps6 I want to thank Jordan for his great help since last week. I appreciate very much his instructions and patience in retrying different versions of ps files I sent to him. Bo Xu Indiana University itgt500 at indycms.iupui.edu From steck at spock.wsu.ukans.edu Wed Oct 23 15:11:40 1991 From: steck at spock.wsu.ukans.edu (jim steck (ME) Date: Wed, 23 Oct 91 14:11:40 -0500 Subject: Batch Mode Parallel Implementations Message-ID: <9110231911.AA01043@spock.wsu.UKans.EDU> S. Kollias and D. Anastassiou presented an interesting approximate second order training algorithm using a Least Squares Estimation Technique at IJCNN 1988 (IEEE Transactions on Circuits and Systems vol 36 no. 8 ). This algorithm is interesting because it updates the weights with each training pair, but performs the update using information saved from all previous training pairs. The algorithm includes a parameter called a forgetting factor which causes information from the previous training pairs to slowly be discounted (or forgotten). This is essentially learning somewhere in between "batch learning" and "on line learning". As an appoximate second order method, it is somewhat computationally intensive; however, the method is easily and productively vectorized on parallel architectures. Jim Steck From wray at ptolemy.arc.nasa.gov Wed Oct 23 18:33:42 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Wed, 23 Oct 91 15:33:42 PDT Subject: Paper Announcement (Neuroprose) In-Reply-To: Barak Pearlmutter's message of Mon, 21 Oct 91 13:41:35 -0400 <9110211741.AA03347@james.psych.yale.edu> Message-ID: <9110232233.AA17716@ptolemy.arc.nasa.gov> > Simplifying Neural Network > Soft Weight-Sharing Measures > by > Soft Weight-Measure > Soft Weight Sharing > > Barak Pearlmutter > Department of Psychology > P.O. Box 11A Yale Station I enjoyed this take-off immensely. Determining good regularisers (or priors) is a major problem facing feed-forward network research (and related representations), so I also enjoyed the original Nowlan-Hinton paper. Dramatic performance improvements can be got by careful choice of regulariser/prior (I know this from my tree research), and its a bit of a black art right now, though I have some good directions. Nowlan & Hinton suggest a strong theoretical basis exists for their approach (see their section 8), so perhaps we'll see more of this style, and "cleaner" versions to keep the theoreticians happy. By the way, at CLNL in Berkeley in August I expressed the view that this problem: i.e. 
Regularizers ------------ for a given network/activation-function configuration, what are suitable parameterised families of regularizes, and how might the parameters be set from the knowledge of the particular application being addressed NB. the setting of the $\lambda$ tradeoff term in Nowlan & Hinton's equation (1) has several fairly elegant and practical solutions along with: Training -------- decision-theoretic/bounded-rationality approaches to batch vs. block (sub-batch) vs. pattern updates during gradient descent (i.e. of back-prop.) (i.e. the Fahlman-LeCunn-English-Grajski-et-al. discussion, or the batch update vs. stochastic update problem) and subsequent addition of second-order gradient methods as two of the most pressing problems to make feed-forward networks a "mature" technology that will then supercede many earlier non-neural methods. Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 244-17 fax: (415) 604 6997 Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov PS.thanks also to Martin Moller for adding some meat to the Training problem: > An interesting observation is that the number of blocks needed > to make an update is growing during learning so that after a certain > number of epochs the blocksize is equal to the number of patterns. > When this happens the algorithm is equal to a traditional batch-mode > algorithm and no validation is needed anymore. When explaining batch update vs. stochastic update to people, I always use this behaviour as an example of what a decision-theoretic training scheme **should** do, so I'm glad you've confirmed it experimentally. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Wed Oct 23 20:46:29 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (BO XU) Date: Wed, 23 Oct 91 19:46:29 EST Subject: Paper Message-ID: A moment ago I received a message from Jordan telling me that he can see the ppnn.ps6 file now and he has put it into neuroprose subdirectory named xu.ppnn.ps.Z. I am very glad to hear this news and also sorry for possible inconvenience to you. Please don't follow the procedure for ppnn.ps6 in Inbox (ppnn.ps6 may not be there anymore). Instead, following is the procedure to get the paper "PPNN: A Faster Learning and Better Generalizing Neural Net": unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get xu.ppnn.ps.Z ftp> quit unix> uncompress xu.ppnn.ps.Z unix> lpr xu.ppnn.ps (or however you print postscript) Thanks to Jordan again for his continuing efforts. Bo Xu Indiana University itgt500 at indycms.iupui.edu From karit at spine.hut.fi Thu Oct 24 05:53:53 1991 From: karit at spine.hut.fi (Kari Torkkola) Date: Thu, 24 Oct 91 11:53:53 +0200 Subject: Speech recognition research job in Switzerland (REPOST) Message-ID: <9110240953.AA01337@spine.hut.fi.hut.fi> ---------------------------------------------------------------------------- RESEARCH POSITIONS AVAILABLE IN SPEECH PROCESSING (repost) The newly created "Institut Dalle Molle d'Intelligence Artificielle Perceptive" (IDIAP) in Martigny, Switzerland seeks to hire qualified researchers in the area of automatic speech recognition. Candidates should be able to conduct independent research in a UNIX environment on the basis of solid theoretical and applied knowledge. Salaries will be aligned with those offered by the Swiss government for equivalent positions. Researchers are expected to begin activity in the beginning of 1992. 
IDIAP is supported by the Dalle Molle Foundation along with public-sector partners at the local and federal levels (in Switzerland). IDIAP is the third institute of artificial intelligence supported by the Dalle Molle Foundation, the others being ISSCO (attached to the University of Geneva) and IDSIA (situated in Lugano). The new institute maintains close contact with these latter centers as well as with the Polytechnical School of Lausanne and the University of Geneva. Applications for a research position at IDIAP should include the following elements: - a curriculum vitae - sample publications or technical reports - a brief description of the research programme that the candidate wishes to pursue - a list of personal references. Applications are due by December 1, 1991 and may be sent to the address below: Daniel Osherson IDIAP Case Postale 609 CH-1920 Martigny SWITZERLAND For further information by e-mail, contact: osherson at idiap.ch (Daniel Osherson, director) or karit at idiap.ch (Kari Torkkola, researcher) Please use the latter email address only for inquiries concerning speech recognition research. From prechelt at ira.uka.de Thu Oct 24 11:16:36 1991 From: prechelt at ira.uka.de (prechelt@ira.uka.de) Date: Thu, 24 Oct 91 16:16:36 +0100 Subject: Terminology (was: batch-mode parallel implementations) Message-ID: I noticed a lot of inconsistent use of terminology concerning the frequency of weight update in Backprop learning. I would like to make a suggestion for the meaning of certain terms, that is not based on the democratic aspect of what is used most often, but on investigations in a dictionary: There are three cases: (a) update after only ONE single example has been seen (b) update after ALL of the examples have been seen (c) something in between The terms used are epoch, block, batch, sample, continuous, on-line. An EPOCH is (thus saith my dictionary) not only a section of time or history (an "era"), but also a turning point. This should make EPOCH the preferred term for case (b), because the end of the training set clearly is such a turning point. A BATCH is a set of some size, a pile of things or so; with some inherent need for the information about its size. Thus it is a good candidate for case (c) and there should always be some indication of the size either as an absolute number, as a fraction of training set size or by some qualitative criterion. BLOCK could be a perhaps even better word for the same, for computer scientists, because blocks are always groups of a certain number of similar objects and the word does not have the danger of misunderstanding that stems from the term "batch-processing" from the early days of data processing, where everything was being executed completely, before you received the results. Unfortunately, for reasons of other connotations, confusion of Block with Epoch is nevertheless very likely. A SAMPLE is a part picked from a whole, usually for test purposes. Although it is not absolutely clear, that a sample is just a single object, in my ears the word tends to sound so. Thus it should be indicating case (a). CONTINOUS is a bad term to use, because the individual examples are not cut into parts, so BP is always discrete. ON-LINE usually means something like "available without physical action, merely by execution of software" and is of course completely inappropriate to learning, except perhaps where there is an infinite training set constantly floating through the machine. 
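For concreteness, the three cases can be sketched as follows (the grad and update arguments are placeholders for whatever gradient and weight-update rule is in use):

    def train_one_pass(examples, weights, grad, update, batch_size=None):
        """One pass through the training data under the three schedules:
        batch_size=1             -> case (a), update per Sample
        batch_size=len(examples) -> case (b), update once per Epoch
        anything in between      -> case (c), update per Batch of that size"""
        n = len(examples)
        step = batch_size or n
        for start in range(0, n, step):
            g = grad(weights, examples[start:start + step])
            weights = update(weights, g)
        return weights
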
SUMMARY: -------- Let us use 'Epoch' for (b), 'Batch' for (c) and 'Sample' for (a). Let us avoid 'continous', 'on-line' and 'block' as much as possible. I think as scientists we should exercise some discipline in the use of language, especially when confusion is as close as in the area of learning systems... :-> Please direct all comments and flames to me. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-7500 Karlsruhe 1; Germany | they get (Voice: ++49/721/608-4317, FAX: ++49/721/697760) | less simple. From oden at herky.cs.uiowa.edu Thu Oct 24 12:11:12 1991 From: oden at herky.cs.uiowa.edu (Gregg Oden) Date: Thu, 24 Oct 91 11:11:12 -0500 Subject: Batch mode in nature? Message-ID: <9110241611.AA26933@herky.cs.uiowa.edu> Steve Potter asks > I cant think of any biological examples of batch learning, in which > sensory data are saved until a certain number of them can be somehow > averaged together and conclusions made and remembered. Any ideas? If by 'sensory data' you mean the most peripheral, unanalyzed input representations, then probably not. Otherwise, yes: it has been a long-term recurring theme in the psychological literature on the development of concepts that exemplars are remembered with a great deal of specific detail until a sufficient corpus of them have been acquired to support the abstraction of a general concept. (Subsequently, idiosyncratic details may be lost/suppressed through assimilation to the encompassing category.) This notion is supported by the intuitive experience of reflective recognition of regularities; i. e., insight. In recent years, it has also gained empirical support from experimental work, most notably by Lee Brooks and his colleagues. Some of this was briefly discussed in my chapter in the Annual Review of Psychology, 1987. (See also Oden & Lopes, "On the internal structure of fuzzy subjective categories" in Recent Developments in Fuzzy Set and Possibility Theory, R. Yager, ed., 1982.) Gregg Oden Psychology & Computer Science U. of Iowa From huyser at mithril.stanford.edu Thu Oct 24 18:27:42 1991 From: huyser at mithril.stanford.edu (Karen Huyser) Date: Thu, 24 Oct 91 15:27:42 PDT Subject: learning and memory Message-ID: <9110242227.AA27923@mithril.stanford.edu> It seems to me people are confusing very different things in the recent discussion of learning (one-shot, generalization, etc). A posting from Ross Gayler quotes Ernst Dow as saying (in the context of one-shot learning): > You may be able to identify the painting you saw before, but could you > make the leap to recognizing all other abstract paintings? To have the experience of seeing a painting and to be able to recall the memory of the experience is one kind of learning and memory. To be told by someone that the painting is of a type called "abstract" is to add a category label, another kind of learning and memory. However, to recognize another painting as abstract or imitate the painting style one must form a sufficiently rich concept to be able to make a category with the label "abstract" and the original painting as one member of the class. For most humans, this involves questions, insightful answers, and many more examples of paintings. As a completely separate conceptual skill, consider the learning and concept-formation task that goes on while doing research. How does it come about that one day we look at a set of phenomena in a new way, with new concepts and categories? 
There are many different skills that appear under the labels "learning" and "memory". Karen Huyser huyser at mojave.stanford.edu From bill at nsma.arizona.edu Thu Oct 24 23:04:21 1991 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Thu, 24 Oct 91 20:04:21 MST Subject: Continuous vs. Batch learning Message-ID: <9110250304.AA07667@nsma.arizona.edu> >It is pretty clear to me that biological neural networks have all adapted >to prefer the continuous learning technique, as we can verify for humans >by remembering something that we only saw (or heard, etc.) once. One-trial >learning paradigms abound in the behavioral literature. I cant think of >any biological examples of batch learning, in which sensory data are >saved until a certain number of them can be somehow averaged together >and conclusions made and remembered. Any ideas? David Marr's theory of the hippocampus proposed that it (the hippocampus) is an intermediate-term memory storage device, performing one-shot learning of experiences and then holding them for a period of days or weeks until they can be evaluated for significance and then gradually moved into the neocortex for permanent storage. In my humble opinion this is still the best available theory of what the hippocampus does. Some of the details have changed, but the basic idea still makes sense. Patrick Lynn has recently been exploring a more abstract version of Marr's idea, using a "buffer" of example patterns to train a recurrent back-prop net, with new patterns going into the buffer, hanging around for a while, then dropping out. He has found that under certain conditions buffering gives better performance than learning each pattern only when it is presented. (Reference: "Simple memory: a theory for archicortex." D. Marr, 1971, Phil Trans Roy Soc B 262: 23-81.) -- Bill Skaggs From gary at cs.UCSD.EDU Fri Oct 25 21:59:28 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Fri, 25 Oct 91 18:59:28 PDT Subject: Seminar abstract: The Sanguine Algorithm Message-ID: <9110260159.AA09259@desi.ucsd.edu> SEMINAR New approaches to learning in Connectionist Networks Garrison W. Cottrell Richard K. Belew Institute for Neural Declamation Condominium Community College of Southern California Previous approaches to learning in recurrent networks often involve batch learning: A large amount of effort is expended in deciding which way to move in weight space, then a little step is taken. We propose a new algorithm for learning in large networks which is orders of magnitude more efficient than batch learning. Based on the realization that many nearby points in weight space are worse than where we are now, we propose the sanguine algorithm. The basic idea is to become more happy with where we are, rather than going to all the work of moving. Hence the approach is quite simple: Randomly sample a nearby point in weight space. Compute the error functional based on that point. If it is better than the current point, repeat until we find a nearby point that is worse. Now, here's the real trick: Once we find a point worse off than where we are now, we stay where we are and increment a "happiness function". That is, we search until we find a place that we can "look down on" in weight space[1]. Now, in order to remain happy with where we are may involve a certain amount of minor work to keep this point in weight space looking good. For example, we could change the error functional until this point looks better than most other points we find. 
Towards this end, we can apply recent techniques (Nowlan & Hinton, 1991) to make the error functional soft and flabby. Then we can stretch the error any way we like. This approach can also be extended to replace computationally expensive "weight-sharing" techniques. If we make the weights soft and flabby, then lifting them becomes much easier since part of the weight always remains on the ground, and sharing the burden of large weights becomes unnecessary. Note that this can be done completely locally. We have applied this novel learning procedure to the problem of time series prediction. Using the Mackey-Glass equations with dimension 3.5, we give the network values at 0, 6, 12, and 18 time units back in time to predict the value of the time series 6 time units into the future. Using the Sanguine Algorithm, a network with only two hidden units rapidly converges to a soft error functional. Of course, the network has no idea of what value will come next; however, the happiness function shows it is quite blissful in its ignorance. We propose that this technique will have wide application in Republican approaches to government. ____________________ [1]Thus the pet name for our algorithm is the "Nyah Nyah Algo- rithm". From steck at spock.wsu.ukans.edu Sat Oct 26 13:49:10 1991 From: steck at spock.wsu.ukans.edu (jim steck (ME) Date: Sat, 26 Oct 91 12:49:10 -0500 Subject: Batch Learning and Parallel Implementation Message-ID: <9110261749.AA04481@spock.wsu.UKans.EDU> Regarding Parallel implementations of Batch and online learning.... S. Kollias and D. Anastassiou presented an interesting approximate second order training algorithm using a Least Squares Estimation Technique at IJCNN 1988 (IEEE Transactions on Circuits and Systems vol 36 no. 8 ). This algorithm is interesting because it updates the weights with each training pair, but performs the update using information saved from all previous training pairs. The algorithm includes a parameter called a forgetting factor which causes information from the previous training pairs to slowly be discounted (or forgotten). This is basically a type of learning somewhere inbetween "batch" learning and "on line" learning. As an appoximate second order method, it is somewhat computationally intensive; however, the method is easily and productively vectorized on parallel architectures. Jim Steck Wichita State University From todd at galadriel.stanford.edu Fri Oct 25 17:50:47 1991 From: todd at galadriel.stanford.edu (todd@galadriel.stanford.edu) Date: Fri, 25 Oct 91 14:50:47 PDT Subject: MUSIC AND CONNECTIONISM Book Announcement Message-ID: <9110252150.AA02708@galadriel.stanford.edu> BOOK ANNOUNCEMENT: MUSIC AND CONNECTIONISM edited by Peter M. Todd and D. Gareth Loy MUSIC AND CONNECTIONISM is now available from MIT Press. This 280-pp. book contains a wide variety of recent research in the applications of neural networks and other connectionist methods to the problems of musical listening and understanding, performance, composition, and aesthetics. It consists of a core of articles that originally appeared in the Computer Music Journal, along with several new articles by Kohonen, Mozer, Bharucha, and others, and new addenda to the original articles describing the authors' most recent work. Topics covered range from models of psychological processing of pitches, chords, and melodies, to algorithmic composition and performance factors. 
A wide variety of connectionist models are employed as well, including back-propagation in time, Kohonen feature maps, ART networks, and Jordan- and Elman-style networks. We've also included a discussion generated by the Computer Music Journal articles on the use and place of connectionist systems in artistic endeavors. A more detailed description of the book is provided below (from the jacket text), along with the complete table of contents. We hope this book will be of use to a wide variety of readers, including neural network researchers interested in a broad, challenging, and fun new area of application, cognitive scientists and music psychologists looking for robust new models of musical behavior, and artists seeking to learn more about a potentially very useful technology. MUSIC AND CONNECTIONISM can be found in bookstores that carry MIT Press publications, or can be purchased directly from MIT Press by calling their toll-free order number, 1-800-356-0343, and giving the operator this catalog number: 1CSAT 503, and this book code: TODMH. By phone and mail-order, the price is $39.95; in stores, it will probably be $45 (there is some confusion with the publisher on this point, so I wanted to give out the detailed information for phone orders to save people some money). Please drop me a line if you have any questions, and especially if you take up the gauntlet and pursue research or applications in this area! cheers, peter todd ***************************************************************************** Music and Connectionism edited by Peter M. Todd and D. Gareth Loy As one of our highest expressions of thought and creativity, music has always been a difficult realm to capture, model, and understand. The connectionist paradigm, now beginning to provide insights into many realms of human behavior, offers a new and unified viewpoint from which to investigate the subtleties of musical experience. \fIMusic and Connectionism\fP provides a fresh approach to both fields, using techniques of connectionism and parallel distributed processing to look at a wide range of topics in music research, from pitch perception to chord fingering to composition. The contributors, leading researchers in both music psychology and neural networks, address the challenges and opportunities of musical applications of network models. The result is a current and thorough survey that advances our understanding of musical perception, cognition, composition, and performance and of the design and analysis of networks. Music and Connectionism is based on a core of articles originally appearing as two special issues of the Computer Music Journal. These have been augmented with addenda covering more recent research by the authors. The book opens with tutorial chapters introducing neural networks in a musical context and relevant aspects of previous computer music research, making this a self-contained text. There are many new chapters, along with new section introductions, summaries of related work, and a final debate on the artistic implications of connectionist methods. Peter M. Todd is a doctoral candidate in the PDP Research Group of the Psychology Department at Stanford University. Gareth Loy DMA is an award-winning composer, member of the Board of Directors of the Computer Music Association, lecturer in the Music Department of UC San Diego, and member of the technical staff of Frox Inc. Contents: Preface and Introduction Peter M. Todd and D. 
Gareth Loy Part 1: Background Machine Tongues XII: Neural Networks Mark Dolson Connectionism and Musiconomy D. Gareth Loy Part 2: Perception and Cognition A Neural Net Model for Pitch Perception Hajime Sano and B. Keith Jenkins Connectionist Models for Tonal Analysis Don L. Scarborough, Ben O. Miller, and Jacqueline A. Jones The Representation of Pitch in a Neural Net Model of Chord Classification Bernice Laden and Douglas H. Keefe Pitch, Harmony, and Neural Nets: A Psychological Perspective Jamshed J. Bharucha The Ontogenesis of Tonal Semantics: Results of a Computer Study Marc Leman Modeling the Perception of Tonal Structure with Neural Nets Jamshed J. Bharucha and Peter M. Todd Using Connectionist Models to Explore Complex Musical Patterns Robert O. Gjerdingen The Quantization of Musical Time: A Connectionist Approach Peter Desain and Henkjan Honing Part 3: Applications A Connectionist Approach to Algorithmic Composition Peter M. Todd Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints Michael C. Mozer Creation By Refinement and the Problem of Algorithmic Music Composition J.P. Lewis A Nonheuristic Automatic Composing Method Teuvo Kohonen, Pauli Laine, Kalev Tiits, and Kari Torkkola Fingering for String Instruments with the Optimum Path Paradigm Samir I. Sayegh Part 4: Conclusions Letter from Otto Laske Responses to Laske by Todd and Loy Further Research and Directions Peter M. Todd List of Author Addresses From white at teetot.acusd.edu Fri Oct 25 19:49:14 1991 From: white at teetot.acusd.edu (Ray White) Date: Fri, 25 Oct 91 16:49:14 -0700 Subject: No subject Message-ID: <9110252349.AA27577@teetot.acusd.edu> Larry Fast writes: > I'm expanding the PDP Backprop program (McClelland&Rumlhart version 1.1) to > compensate for the following problem: > As Backprop passes the error back thru multiple layers, the gradient has > a built in tendency to decay. At the output the maximum slope of > the 1/( 1 + e(-sum)) activation function is 0.5. > Each successive layer multiplies this slope by a maximum of 0.5. ..... > It has been suggested (by a couple of sources) that an attempt should be > made to have each layer learn at the same rate. ... > The new error function is: errorPropGain * act * (1 - act) This suggests to me that we are too strongly wedded to precisely f(sum) = 1/( 1 + e(-sum)) as the squashing function. That function certainly does have a maximum slope of 0.25. A nice way to increase that maximum slope is to choose a slightly different squashing function. For example f(sum) = 1/( 1 + e(-4*sum)) would fill the bill, or if you'd rather have your output run from -1 to +1, then tanh(sum) would work. I think that such changes in the squashing function should automatically improve the maximum-slope situation, essentially by doing the "errorPropGain" bookkeeping for you. Such solutions are static fixes. I suggested a dynamic adjustment of the learning parameter for recurrent backprop at IJCNN - 90 in San Diego (The Learning Rate in Back-Propagation Systems: an Application of Newton's Method, IJCNN 90, vol I, p 679). The method amounts to dividing the learning rate parameter by the square of the gradient of the output function (subject to an empirical minimum divisor). One should be able to do something similar with feedforward systems, perhaps on a layer by layer basis. 
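As a quick check of the slopes mentioned above: the derivative of 1/(1 + e^(-k*sum)) is k*f*(1-f), whose maximum is k/4, so k = 4 gives a maximum slope of 1, as does tanh. The short Python fragment below verifies this numerically; it is an illustrative sketch only (the grid and step size are arbitrary choices, and it is not part of the original posting).

   import numpy as np

   x = np.linspace(-10.0, 10.0, 200001)          # grid spacing 1e-4

   def max_slope(f):
       y = f(x)                                  # finite-difference slope estimate
       return np.max(np.abs(np.diff(y) / np.diff(x)))

   logistic  = lambda s: 1.0 / (1.0 + np.exp(-s))         # max slope ~0.25
   logistic4 = lambda s: 1.0 / (1.0 + np.exp(-4.0 * s))   # max slope ~1.0
   print(max_slope(logistic), max_slope(logistic4), max_slope(np.tanh))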
- Ray White (white at teetot.acusd.edu) Please respond directly to 72247.2225 at compuserve.com Thanks, Larry Fast From BUTUROVIC%BUEF78%yubgef51.bitnet at BITNET.CC.CMU.EDU Sun Oct 27 14:17:00 1991 From: BUTUROVIC%BUEF78%yubgef51.bitnet at BITNET.CC.CMU.EDU (BUTUROVIC%BUEF78%yubgef51.bitnet@BITNET.CC.CMU.EDU) Date: Sun, 27 Oct 1991 21:17 +0200 Subject: forward propagation Message-ID: <2B147310A0000F63@yubgef51.bitnet> I am interested in training a multi-layer perceptron (MLP) without using back-propagation (BP) of the error. MLP training by means of the BP algorithm is in fact minimization of the criterion function using the ordinary gradient-descent minimization algorithm. For this, the computation of derivatives is necessary. Now, it is of course possible to optimize a multi-variable function without computing derivatives. One effective algorithm of this type is the simplex algorithm [1], so it seems logical to use it for MLP training. There are two advantages in avoiding derivatives: first, the transfer functions of the individual neurons may be non-differentiable. Second, BP uses a criterion function that must be written in the form of the average squared difference between target and actual outputs (there are variants of this, but, for the purpose of this discussion, they vary insignificantly), and the derivative of this function with respect to the weights must be computable. Using simplex, i.e., not using derivatives, this limitation can be avoided, as long as the function to be minimized can be measured. This can be important for applications in control where we are sometimes not able to express the criterion function as a function of the network parameters. There is one serious limitation of this algorithm, and that is spatial complexity. It requires roughly N*N memory locations, where N is the number of variables (network weights). In practice, this limits the size of the network to a couple of thousand weights. In order to verify the behavior of the algorithm, I performed extensive experiments with Ljubomir Citkusev of Boston University. We trained MLPs to perform classification tasks on three data sets. In short, the results obtained indicate that training the network using simplex can be done successfully. However, BP is more effective, regarding both classification accuracy (i.e., function approximation accuracy) and computational complexity (number of iterations). We have not yet verified the ability of the algorithm to train networks with non-differentiable transfer functions or criterion functions that cannot be computed analytically. It is puzzling that in [2] Minsky and Papert claimed the training of perceptrons with hidden layers to be impossible, while at that time (1969) an effective algorithm for precisely that task was already available. Although BP proved superior in our experiments, they could have done quite satisfactory training of multi-layer networks when they wrote the book. I tried to talk to Minsky about this, but without success. I would like to hear people's opinions on this idea. Also, it would be beneficial to know if anyone is aware of similar work. Thanks, Ljubomir Buturovic, University of Belgrade References [1] Nelder, J. A., and Mead, R. 1965, Computer Journal, vol. 7, p. 308. [2] M. Minsky, and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, 1969.
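To make the proposal concrete, here is a minimal sketch of derivative-free MLP training in present-day Python, using scipy's Nelder-Mead simplex routine (the library, the tiny 2-2-1 XOR network, and all parameter settings are illustrative assumptions, not the setup used in the experiments described above). Only values of the cost function are used, so neither the squashing functions nor the criterion ever need to be differentiated; a few random restarts may be needed before the simplex finds a solution.

   import numpy as np
   from scipy.optimize import minimize

   # 2-2-1 network for XOR; all weights and biases packed into one vector w.
   X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
   T = np.array([0., 1., 1., 0.])

   def forward(w, x):
       W1 = w[0:4].reshape(2, 2); b1 = w[4:6]        # input  -> hidden
       W2 = w[6:8];               b2 = w[8]          # hidden -> output
       h = np.tanh(x @ W1 + b1)
       return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

   def cost(w):                                      # only function *values*;
       return np.mean((forward(w, X) - T) ** 2)      # no derivatives anywhere

   rng = np.random.default_rng(0)
   res = minimize(cost, rng.normal(scale=0.5, size=9), method="Nelder-Mead",
                  options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-9})
   print(res.fun, np.round(forward(res.x, X), 2))

The simplex itself stores N+1 vertices of dimension N, which is the roughly N*N memory cost mentioned above.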
From kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET Mon Oct 28 09:49:58 1991 From: kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET (Nitin Indurkhya) Date: Mon, 28 Oct 91 09:49:58 JST Subject: Robinson's vowel dataset Message-ID: <9110280049.AA00241@hcrlgw.crl.hitachi.co.jp> Does anyone have any NEW results on Robinson's vowel dataset. I am aware of the original results given in his thesis: A. Robinson. "Dynamic Error Propagation Networks", PhD Thesis, Cambridge Univ 1989. Please send me mail, thanks Nitin Indurkhya (nitin at crl.hitachi.co.jp) From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Oct 28 00:10:20 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Mon, 28 Oct 91 00:10:20 EST Subject: Announcement of NIPS Workshop Message-ID: The Neural Information Processing Systems Conference will be followed by a program of workshops in Vail, Colorado on December 6 and 7, 1991. The following one-day workshop will be offered on December 6: Constructive and Destructive Learning Algorithms Workshop Leader: Scott E. Fahlman School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Internet: fahlman at cs.cmu.edu Most existing neural network learning algorithms work by adjusting connection weights in a fixed network. Recently we have seen the emergence of new learning algorithms that alter the network's topology as they learn. Some of these algorithms start with excess connections and remove any that are not needed; others start with a sparse network and add hidden units as needed, sometimes in multiple layers; some algorithms do both. These algorithms eliminate the need to guess in advance what network topology will best fit a given problem. In addition, some of these algorithms claim significant improvements in learning speed and generalization. A successful two-day workshop on this topic was presented at the NIPS-90 conference. A number of algorithms were presented by their authors and were critically evaluated. The past year has seen a great deal of additional work in this area, so a second workshop on this topic seems appropriate. We will briefly review the major algorithms presented last year. Then we will turn to more recent developments, including both new algorithms and experience gained in using the older ones. Finally, we will consider current trends and will try to identify open problems for future research. I would like to hear from people who are interested in presenting new algorithms or results at this workshop. I would particularly like to hear from people with application results or comparative studies using algorithms of this kind. The tentative plan, depending on the response we get, is allow 15-20 minutes for each presentation, with ample time for discussion. If you would like to present something, please send a short description to Scott Fahlman, at the internet address listed above. For Cascade-Correlation fans, I will be presenting a new variation called "Cascade 2" that performs better than the original in a number of situations, especially in problems with continuous analog outputs. From tesauro at watson.ibm.com Mon Oct 28 11:41:58 1991 From: tesauro at watson.ibm.com (Gerald Tesauro) Date: Mon, 28 Oct 91 11:41:58 EST Subject: Program information: NIPS91 Workshops Message-ID: The NIPS91 post-conference workshops will take place Dec. 5-7, 1991, at the Marriott Mark Resort Hotel in Vail, Colorado. 
The following message gives information on the program schedule and local arrangements, and is organized as follows: I. Summary schedule II. Workshop schedule III. Arrangements information IV. Workshop abstracts I. Summary Schedule: Thursday, Dec. 5th 5:00 pm Registration Open 7:00 pm Orientation Meeting 8:00 pm Reception Friday, Dec. 6th 7:00 am Breakfast 7:30 - 9:30 am Workshop Sessions 4:30 - 6:30 pm Workshop Sessions 7:00 pm Banquet Saturday, Dec. 7th 7:00 am Breakfast 7:30 - 9:30 am Workshop Sessions 4:30 - 6:30 pm Workshop Sessions 6:30 - 7:00 pm Wrap-up 7:30 pm Barbecue Dinner (optional) II. Workshop schedule: Friday, Dec. 6th: Character recognition Projection pursuit and neural networks Constructive and destructive learning algorithms II Modularity in connectionist models of cognition VLSI neural networks and neurocomputers (1st day) Recurrent networks: theory and applications (1st day) Active learning and control (1st day) Self-organization and unsupervised learning in vision (1st day) Developments in Bayesian methods for neural networks (1st day) Saturday, Dec. 7th: Oscillations and correlations in neural information processing Optimization of neural network architectures for speech recognition Genetic algorithms and neural networks Complexity issues in neural computation and learning Computer vision vs. network vision VLSI neural networks and neurocomputers (2nd day) Recurrent networks: theory and applications (2nd day) Active learning and control (2nd day) Self-organization and unsupervised learning in vision (2nd day) Developments in Bayesian methods for neural networks (2nd day) III. Arrangements information: Accomodations: The conference sessions will be held in the banquet area at Marriott Mark Resort, at Vail CO, 90 miles west of Denver. For accomodations, call the Mariott at (303)-476-4444. Our room rate is $74 (single or double). Condos for larger groups can be arranged through Destination Resorts, at (303)-476-1350. Registration: Registration fee for the workshops is $100 ($50 for students). Transportation: CME (Colorado Mountain Express) will be running special shuttles from the Sheraton in Denver up to the Marriott in Vail Thursday afternoon at a price of $31.00 per person. Call them at 1-800- 525-6363, at least 24 hours in advance, to reserve and give a credit card number for prepayment. CME also runs shuttles down from Vail to the Denver airport, same price, on Sunday at many convenient times. The earlier you call CME, the more vans will be made available for our use. Be sure to mention our special group code "NIPS". Hertz has a desk in the Sheraton, and will rent cars at a weekend rate for the trip up to Vail and back to the airport in Denver. This is an unlimited mileage rate; prices start at $60 (three days, plus tax). To make reservations call the Sheraton at 1-800-552-7030 and ask for Kevin Kline at the Hertz desk. Skiing: Skiing at Vail can be expensive. The lift tickets this year were slated to rise to $40 per day. The conference has negotiated very attractive group rates for tickets bought in advance: $56 for a 2-day ticket $84 for a 3-day ticket $108 for a 4-day ticket You can purchase these by sending a check to the conference registration office: NIPS*91 Registration, Siemens Research Center, 755 College Road East, Princeton, NJ 08540. The tickets will be printed for us, and available when we get to Vail on Thursday evening. There are several sources for rental boots and skis in Vail. 
The rental shop at the lifts and Banner Sports (located in the Marriott) are offering the following packages to those who identify themselves as NIPS attendees:

                           skis, boots, poles     skis, poles
    standard package            $8 / day            $6 / day
    performance package        $11 / day            $9 / day

Banner will, as extra incentives, stay open for us after the Thursday orientation meeting, and give a 10% discount on anything else in the store. Optional Gourmet barbecue dinner(!): Finally, besides the conference banquet, included in the registration fee, there will be an optional dinner on Saturday night at Booco's Station, a few miles outside of Vail and world famous for its barbecued meats and special sauces. Dinner will include transportation (if you need it), appetizers, all-you-can-eat barbecue, cornbread, vegetables, dessert, and more than 40 kinds of beer at the cash bar. Tickets will be on sale at the Sheraton and at the Marriott. Price: $27. IV. Workshop Abstracts: ========================================================================= Modularity in Connectionist Models of Cognition Organizer: Jordan Pollack, Ohio State Univ. Speakers: Michael Mozer, Univ of Colorado Robert Jacobs, MIT John Barnden, New Mexico State University Rik Belew, UCSD Abstract: Classical modular theories of mind presume mental "organs" - function specific, put in place by evolution - which communicate in a symbolic language of thought. In the 1980's, Connectionists radically rejected this view in favor of more integrated architectures, uniform learning systems which would be very tightly coupled and communicate through many feedforward and feedback connections. However, as connectionist attempts at cognitive modeling have gotten more ambitious, ad-hoc modular structuring has become more prevalent. But there are concerns regarding how much architectural bias is allowable. There has been a flurry of work on resolving these concerns by seeking the principles by which modularity could arise in connectionist architectures. This will involve solving several major problems - data decomposition, structural credit assignment, and shared adaptive representations. This workshop will bring together proponents of modular connectionist architectures to discuss research directions, recent progress, and long-term challenges. ========================================================================= Character Recognition Organizers: C. L. Wilson and M. D. Garris, National Institute of Standards and Technology Speakers: Jon Hull, SUNY Buffalo Tom Vogl, ERIM Jim Keeler, MCC Chris Schofield, Nestor C. L. Wilson, NIST R. G. Casey, IBM Abstract: This workshop will consider issues related to present and future testing needs for character recognition including: 1) What is user experience in using the NIST and other publicly available databases? 2) What types of databases will be required in the future? 3) What are future testing needs, such as x-y coordinate stream or gray level data? 4) How can the evaluation of current research problems, such as segmentation, be enhanced through carefully designed databases, standard testing procedures, and automated evaluation methodologies? 5) Is the incorporation of context important in testing? 6) What other issues face the research and development of large scale recognition systems? The target audience includes those interested in and/or working on hand print recognition and developers who wish to include character recognition as part of systems to recognize documents.
========================================================================= Genetic Algorithms and Neural Networks Organizer: Rik Belew, Univ. of Calif. at San Diego Speakers: Rik Belew and Dave Rogers Abstract: This workshop will examine theoretical and algorithmic interactions between GA and NNet techniques, as well as models of the evolutionary constraints on nervous systems. Specific topics include: 1) Comparison and composition of global GA sampling techniques with the local (gradient) search of NNet methods. 2) Use of the GA to evolve additional higher-order function approximation terms (``hidden units''). 3) The dis/advantages of GA recombination and its impact on appropriate representations for NNets. 4) Trade-offs between NNet training time and GA generational time. 5) Parallel implementations of GAs that facilitate NNet simulation. 6) A role for ontogenesis between GA evolution and NNet learning. 7) The role optimality (doesn't!) play in evolution ========================================================================= Projection Pursuit and Neural Networks Organizers: Ying Zhao, Chris Atkeson and Peter Huber, MIT Speakers: R.Douglas Martin, University of Washington John Moody, Yale University Ying Zhao, MIT Andrew R. Barron, University of Illinois Nathan Intrator, Brown University Trevor Hastie, Bell Labs Abstract: Projection Pursuit is a nonparametric statistical technique to find "interesting" low dimensional projections of high dimensional data sets. We hope to improve our understanding of neural networks and projection pursuit by discussing issues such as fast training algorithms based on PP, duality with kernel approximation, possible avoidance of the "curse of dimensionality", and the sample complexity for PP. ========================================================================= Constructive and Destructive Learning Algorithms II Organizer: Scott E. Fahlman, Carnegie Mellon University Speakers: TBA Abstract: Recently we have seen the emergence of new learning algorithms that alter the network's topology. Some of these algorithms start with excess connections and remove any that are not needed; others start with a sparse network and add hidden units as needed, sometimes in multiple layers; some algorithms do both. In a two-day workshop on this topic at NIPS-90, a number of learning algorithms that modify network topology were presented by their authors and were critically evaluated. The past year has seen a great deal of additional work in this area. We will briefly review the major algorithms presented last year. Then we will turn to more recent developments, including both new algorithms and experience gained in using the older ones. Finally, we will consider current trends and will try to identify open problems for future research. ========================================================================= Oscillations and Correlations in Neural Information Processing Organizer: Ernst Niebur, Caltech Speakers: Bard Ermentrout, U. of Pittsburgh Hennric Jokeit, U. of Munich Marius Usher, Weizmann Institute Ernst Niebur, Caltech Abstract: This workshop will address models proposed for tasks like tieing together the different parts of one object in the visual field or for binding the different representations of an object in different cortical areas. Both oscillation-based models as well as alternative models based on phase coherence (correlations) will be considered in the light of the latest experimental findings. 
========================================================================= Optimization of Neural Network Architectures for Speech Recognition Organizers: Uli Bodenhausen, Universitaet Karlsruhe Alex Waibel, Carnegie Mellon University Speakers: Kenichi Iso, NEC Corporation, Japan Patrich Haffner, CNET, France Mike Franzini, Telefonica I + D, Spain Abstract: A variety of neural network algorithms have recently been applied to speech recognition tasks. Besides having learning algorithms for weights, optimization of the network architectures is required to achieve good performance. Also of critical importance is the optimization of neural network architectures within hybrid systems for best performance of the system as a whole. Parameters that have to be optimized within these constraints include the number of hidden units, number of hidden layers, time-delays, connectivity within the network, input windows, the number of network modules, number of states and others. The proposed workshop intends to discuss and evaluate the importance of these architectural parameters and different integration strategies for speech recognition systems. Participating researchers interested in speech recognition are welcome to present short case studies on the optimization of neural networks, preferably with an evaluation of the optimization steps. The workshop could also be of interest to researchers working on constructive/destructive learning algorithms because the relevance of different architectural parameters should be considered for the design of these algorithms. ========================================================================= SELF-ORGANIZATION AND UNSUPERVISED LEARNING IN VISION Organizer: Jonathan A. Marshall, Univ. of North Carolina Speakers: Suzanna Becker, University of Toronto Irving Biederman, University of Southern California Thomas H. Brown, Yale University Joachim M. Buhmann, Lawrence Livermore National Laboratory Heinrich Bulthoff, Brown University Edward Callaway, Duke University Allan Dobbins, McGill University Gillian Einstein, Duke University Charles Gilbert, The Rockefeller Universty John E. Hummel, UCLA Daniel Kersten, University of Minnesota David Knill, University of Minnesota Laurence T. Maloney, New York University Jonathan A. Marshall, University of North Carolina at Chapel Hill Paul Munro, University of Pittsburgh Albert L. Nigrin, American University Alice O'Toole, The University of Texas at Dallas Jurgen Schmidhuber, University of Colorado Nicol Schraudolph, University of California at San Diego Michael P. Stryker, University of California at San Francisco Patrick Thomas, Technische Universitat Muenchen Rich Zemel, University of Toronto Abstract: This workshop considers the role that unsupervised learning procedures (e.g. Hebb-type rules) may play in the self-organization of cortical structures involved in the processing of visual information. Researchers in visual neuroscience, visual psychophysics and neural network modeling will be brought together to address head-on the key issue of how animal visual systems got the way they are. We hope that this will lead to a better understanding of the factors that shape the structure of animal visual systems, as well as better models of the neurophysiological processes underlying vision. 
========================================================================= Developments in Bayesian methods for neural networks Organizers: David MacKay, Caltech Steve Nowlan, Salk Institute Abstract: The first day of this workshop will be 50% tutorial in content, reviewing some new ways Bayesian methods may be applied to neural networks. The rest of the workshop will be devoted to discussions of the frontiers and challenges facing Bayesian work in neural networks, including issues such as Monte Carlo clustering, data selection, active query learning, prediction of generalisation, missing inputs, unlabelled data and discriminative training. Discussion will be moderated by John Bridle. Speakers: Radford Neal Jurgen Schmidhuber John Moody David Haussler + Michael Kearns Sara Solla + Esther Levin Steve Renals Reading up before the workshop ------------------------------ People intending to attend this workshop are encouraged to obtain preprints of relevant material before NIPS. A selection of preprints is available by anonymous ftp, as follows: unix> ftp hope.caltech.edu (or ftp 131.215.4.231) login: anonymous password: ftp> cd pub/mackay ftp> get README.NIPS ftp> quit Then read the file README.NIPS for further information. Problems? Contact David MacKay, mackay at hope.caltech.edu ========================================================================= Active Learning and Control Organizers: David Cohn, Univ. of Washington Don Sofge, MIT Speakers: C. Atkeson, MIT A. Barto, Univ. of Massachusetts, Amherst J. Hwang, Univ. of Washington M. Jordan, MIT A. Moore, MIT J. Schmidhuber, University of Colorado, Boulder R. Sutton, GTE S. Thrun, Carnegie-Mellon University Abstract: An "active" learning system is one that is not merely a passive observer of its environment, but instead plays an active role in determining its inputs. This definition includes classification networks that query for values in "interesting" parts of their domain, learning systems that actively "explore" their environment, and adaptive controllers that learn how to produce control outputs to achieve a goal. Common facets of these problems include building world models in complex domains, exploring a domain safely and efficiently, and planning future actions based on one's model. In this workshop, our main focus will be to address key unsolved problems which may be holding up progress on these problems, rather than presenting polished, finished results. Our hope is that unsolved problems in one field may be able to draw on insights from research in other fields. ========================================================================= Computer Vision vs Network Vision Organizers: John Mayhew and Terry Sejnowski Speakers: TBA Abstract: Computer vision has developed a methodology based on sound engineering practice: 1. Break the problem down into well-defined subproblems and mathematically analyze each part; 2. Develop efficient algorithms for each module; 3. Implement each algorithm with the best available technology. These are Marr's three levels: computational, algorithmic, and implementational. In contrast, proponents of neural networks have developed a different methodology: 1. Find a good representation for the input data that makes explicit the features needed to solve the problem; 2. Use learning algorithms to cluster and categorize the data; 3. Glue together networks that solve different parts of the problem with more learning.
Networks are memory intensive and constraints from the hardware level are as important as constraints from the computational level. This workshop is intended to provoke a lively and free-wheeling discussion of the central issues in vision. ========================================================================= Complexity Issues in Neural Computation and Learning Organizers: Kai-Yeung Sui and Vwani Roychowdhury, Stanford Univ. Speakers: TBA Abstract: The goal of this workshop is to address recent developments in understanding the capabilities and limitations of various models for neural computation and learning. Topics will include: 1) circuit complexity of neural networks, 2) capacity of neural networks, and 3) complexity issues in learning algorithms. ========================================================================= RECURRENT NETWORKS: THEORY AND APPLICATIONS Organizers: Luis Borges de Almeida, INESC C. Lee Giles, NEC Research Institute Richard Rohwer, Edinburgh University Speakers: TBA Abstract: Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains an important open issue. Training algorithms are very inefficient in terms of memory and computational demands. Little is known about convenient architectures. The number of known successful applications is very limited. This is true even for static applications (operation in the "fixed point mode"). The first day of this two-day workshop will focus on the outstanding theoretical issues in recurrent neural networks, and the second day will examine existing and potential real-world applications. ========================================================================= VLSI Neural Networks and Neurocomputers Organizers: Clifford Lau, Office of Naval Research Jim Burr, Stanford University Speakers: TBA Abstract: This two-day workshop will address the latest advances in VLSI implementations of neural nets, and the design of high performance neurocomputers. We will present an updated list of currently available neurochips, and discuss a wide range of issues, including: 1) Design issues: Advantage and disadvantage of analog and digital approaches; how much arithmetic precision is necessary; which algorithms have been implemented; importantance of on-chip learning; neurochip design in existing CAD environment. 2) Performance issues: Critical factors to achieve robust performance; Tradeoffs between capacity and performance; scaling limits to constructing large neural networks. 3) Use of neurochips: What input/output devices are necessary; what programming support environment is necessary. 4) Application areas for supercomputing neurocomputers From zeiden at cs.wisc.edu Mon Oct 28 10:30:14 1991 From: zeiden at cs.wisc.edu (zeiden@cs.wisc.edu) Date: Mon, 28 Oct 91 09:30:14 CST Subject: tech report available in NEUROPROSE Message-ID: <9110281530.AA29229@ai.cs.wisc.edu> I have placed the following tech report in the NEUROPROSE ftp archive at Ohio State, under the name zeidenberg.containment.ps.Z Implementing Spatial Relations in Neural Nets: The Case of Figure/Ground and Containment Matthew Zeidenberg zeiden at cs.wisc.edu A neural network system that computes the relation of containment between objects in a retina-like input array is described. 
This system is multi-layer, and operates by recognizing and segmenting the objects in the input to place them in separated arrays. The figure of each object, that is, the set of all pixels on the perimeter of or contained in the object, is computed for each object, using a method that involves a connectionist implementation of a standard algorithm using parity networks. These figures are then used to compute containment relations between the objects in the input. ftp Instructions: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get zeidenberg.containment.ps.Z ftp> quit unix> uncompress zeidenberg.containment.ps.Z unix> lpr zeidenberg.containment.ps (or other command to print postscript) From black at seismo.CSS.GOV Mon Oct 28 12:01:00 1991 From: black at seismo.CSS.GOV (Mike Black) Date: Mon, 28 Oct 91 12:01:00 EST Subject: What is current technology in Analog Neural Nets? Message-ID: <9110281701.AA21092@beno.CSS.GOV> I've seen little discussion and have found no references to work in analog neural networks. If you can provide some references or indicate what your current work is I'll summarize. These are the goals for my current research: Given an analog data source (e.g. pulse generator): 1. Recognize pulses (for example a single shot square wave) and reject "noise" (i.e. triangular wave) at rates of at least 10MHz (that is, it should be able to deal with a minimum 100ns pulse width). 2. Provide the trigger for an external digitizer to grab the resultant "good" pulses. 3. Be software controllable (hardware should be able to be updated by remote control). Please forward any current work or capability in this area to: black at beno.css.gov >> ------------------------------------------------------------------------------- >> : usenet: black at beno.CSS.GOV : land line: 407-494-5853 : I want a computer: >> : real home: Melbourne, FL : home line: 407-242-8619 : that does it all!: >> ------------------------------------------------------------------------------- From lissie!botsec7!botsec1!dcl at uunet.UU.NET Mon Oct 28 13:54:54 1991 From: lissie!botsec7!botsec1!dcl at uunet.UU.NET (David Lambert) Date: Mon, 28 Oct 91 13:54:54 EST Subject: Resource Allocation Network (RAN) Message-ID: <9110281854.AA20399@botsec1.bot.COM> Dear Connectionists: Has anyone tried to implement the Resource Allocation Network of John Platt (NIPS 3 and Neural Computation V3 #2)? I have a first cut at an implementation, and so far I have not been able to approach his published results. I'd be very interested in corresponding with anyone who has tried this algorithm. Also, if anyone has a means of reaching John Platt, I'd love to hear about it. I've been calling Synaptics in San Jose for over a week now, and there don't seem to be any humans that work there...only voice mail. Thanks David Lambert dcl at object.com or dcl at panix.com From khosla at latcs1.lat.oz.au Mon Oct 28 22:32:44 1991 From: khosla at latcs1.lat.oz.au (Rajiv Khosla) Date: Tue, 29 Oct 91 14:32:44 +1100 Subject: Spatial crosstalk and modular NN architecture Message-ID: <9110290332.AA18704@latcs1.lat.oz.au> Dear Connectionists, This is regarding my problem of making a 28-11-26, binary input/output neural network work. Thanks to everyone who sent me the replies. Its working nice and kicking. Best results are achieved by connecting the input layer to the output layer. 
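Concretely, "connecting the input layer to the output layer" means adding direct input-to-output weights alongside the usual 28-11-26 path, so the output units see both the hidden activations and the raw inputs. A minimal illustrative sketch of such a forward pass in Python (the initialisation and squashing functions are arbitrary choices, not the poster's actual code):

   import numpy as np

   n_in, n_hid, n_out = 28, 11, 26
   rng = np.random.default_rng(1)
   W_ih = rng.normal(scale=0.1, size=(n_in, n_hid))    # input  -> hidden
   W_ho = rng.normal(scale=0.1, size=(n_hid, n_out))   # hidden -> output
   W_io = rng.normal(scale=0.1, size=(n_in, n_out))    # input  -> output (skip)
   b_h, b_o = np.zeros(n_hid), np.zeros(n_out)

   def forward(x):
       h = np.tanh(x @ W_ih + b_h)
       # output layer sees both the hidden layer and the raw input
       return 1.0 / (1.0 + np.exp(-(h @ W_ho + x @ W_io + b_o)))

   x = rng.integers(0, 2, size=n_in).astype(float)     # one binary input pattern
   print(forward(x).shape)                             # -> (26,)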
Thanks once again Rajiv From terry at jeeves.UCSD.EDU Tue Oct 29 02:35:07 1991 From: terry at jeeves.UCSD.EDU (Terry Sejnowski) Date: Mon, 28 Oct 91 23:35:07 PST Subject: Continuous vs. Batch learning Message-ID: <9110290735.AA01748@jeeves.UCSD.EDU> There is evidence that the hippocampus is doing something like batch mode teaching for neocortex. The hippocampus is needed for one-shot learning, also called declarative or episodic learning. It seems to be storing up a lot of examples and over a period of months transfers this informaiton to cortex, where it is stored in a more categorical representation. Terry ----- From smieja at jargon.gmd.de Tue Oct 29 05:14:40 1991 From: smieja at jargon.gmd.de (Frank Smieja) Date: Tue, 29 Oct 91 11:14:40 +0100 Subject: Batch methods versus stochastic methods... In-Reply-To: mmoller@daimi.aau.dk's message of Mon, 21 Oct 91 13:13:06 +0100 Message-ID: <9110291014.AA24169@jargon.gmd.de> -) Unfortunately, we do not have any datasets of the proper size. -) So I would appreciate if anyone could inform me about where to find big -) datasets that are public available. -) -) -- Martin M -) -) ----------------------------------------------------------------------- -) Martin F. Moller email: mmoller at daimi.aau.dk -) Computer Science Department phone: +45 86202711 5223 -) Aarhus University fax: +45 86135725 -) Ny Munkegade, Building 540 -) 8000 Aarhus C -) Denmark -) ---------------------------------------------------------------------- I demonstrated in my paper "MLP Solutions, Generalization and Hidden Unit Representations" in the DANIP (Distributed And Neural Information Processing) conference in Bonn, Germany, April 1989 (ed: Kindermann & Linden, pub: Oldenbourg Verlag), how one might "synthetically" construct a training set of any size of inputs/outputs, that may be generalized, insofar that the "regularities" beloved by our networks are guaranteed to exist, since they are used to generate the training set pairs, but not visible to the network until the examples are seen, and the learning results in "emergent generalization". I used this method in the paper to study a small diagnosis problem, but scaling up is no problem. If you cannot get hold of this book, and would like to see the paper, I can make it available in the neuroprose archive (unfortunately without figures, but they are not needed to explain the method). If this is also difficult, I will send hard copies to interested parties. Please send such requests directly to me (smieja at gmdzi.uucp) and I will either reply directly or to the bboard. -Frank Smieja From joachim at gmdzi.gmd.de Tue Oct 29 12:57:47 1991 From: joachim at gmdzi.gmd.de (Joachim Diederich) Date: Tue, 29 Oct 91 16:57:47 -0100 Subject: New Paper Message-ID: <9110291557.AA14221@gmdzi.gmd.de> The following paper has been placed in the Neuroprose archives at Ohio State. The file is "diederich.hybrid.ps.Z." See ftp in- structions below. Efficient Question Answering in a Hybrid System Joachim Diederich (1,2) & Debra L. Long (2) (1) German National Research Center for Computer Science (GMD) Schloss Birlinghoven, P.O. Box 1240 D-5205 St.Augustin 1, Germany (2) Department of Psychology University of California, Davis Davis, CA 95616, U.S.A. ABSTRACT: A connectionist model for answering open-class questions in the context of text processing is presented. The system answers ques- tions from different question categories, such as "How," Why," and "Consequence" questions. 
These question categories have been identified in several empirical studies (Graesser & Clark, 1985; Graesser, 1990). The system responds to a question by generating a set of possible answers that are weighted according to their plausibility. Search is performed by means of a massively paral- lel, directed spreading activation process. The search process operates on several knowledge sources (i.e., connectionist net- works) that are learned or explicitly built-in. Spreading activa- tion involves the use of signature messages (Lange & Dyer, 1989). Signature messages are numeric values that are propagated throughout the networks and identify a particular question category (this makes the system hybrid). Binder units that gate the flow of activation between textual units receive these signa- tures and change their states. That is, the binder units either block the spread of activation or allow the flow of activation in a certain direction. The process results in a pattern of activa- tion that represents a set of candidate answers based on avail- able knowledge sources. This paper will appear in the IJCNN-91 Singapore Proceedings. unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get diederich.hybrid.ps.Z ftp> quit unix> uncompress diederich.hybrid.ps.Z unix> lpr diederich.hybrid.ps Joachim Diederich German National Research Center for Computer Science (GMD) P.O. Box 1240 D-5205 St. Augustin 1 Germany From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 29 15:23:46 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 29 Oct 91 15:23:46 EST Subject: Robinson's vowel dataset In-Reply-To: Your message of Mon, 28 Oct 91 09:49:58 +0200. <9110280049.AA00241@hcrlgw.crl.hitachi.co.jp> Message-ID: Does anyone have any NEW results on Robinson's vowel dataset. I am aware of the original results given in his thesis: A. Robinson. "Dynamic Error Propagation Networks", PhD Thesis, Cambridge Univ 1989. I don't know of any more recent publications on this problem. I got some rather good results using Cascade-Correlation: (train 300 300 25)) SigOff 0.10, WtRng 1.00, WtMul 1.00 OMu 2.00, OEps 1.00, ODcy 0.0300, OPat 12, OChange 0.010 IMu 2.00, IEps 10.00, IDcy 0.0300, IPat 8, IChange 0.030 Utype :SIGMOID, Otype :SIGMOID, RawErr NIL, Pool 32 Trial 0: 181 of 462 cases wrong, 281 right, 60.82% @ 23 hidden Trial 1: 174 of 462 cases wrong, 288 right, 62.34% @ 11 hidden Trial 2: 193 of 462 cases wrong, 269 right, 58.23% @ 24 hidden Trial 3: 174 of 462 cases wrong, 279 right, 60.39% @ 15 hidden Trial 4: 180 of 462 cases wrong, 282 right, 61.04% @ 24 hidden Trial 5: 186 of 462 cases wrong, 276 right, 59.74% @ 17 hidden Trial 6: 188 of 462 cases wrong, 274 right, 59.31% @ 11 hidden Trial 7: 174 of 462 cases wrong, 288 right, 62.34% @ 15 hidden Trial 8: 173 of 462 cases wrong, 289 right, 62.55% @ 13 hidden Trial 9: 170 of 462 cases wrong, 292 right, 63.20% @ 18 hidden Avg: 180 of 462 cases wrong, 282 right, 61.03% @ 17 hidden The test set was run after each output training phase and the best value obtained is the one reported. The best results obtained by Robinson were 260 right (56%) for nearest neighbor, and 253 right (55%) for 528 Gaussian nodes or 88 square nodes. Backprop with 88 sigmoids never got better than 234 (51%). I've never published these results, because I think they are a bit of a cheat. 
The problem is that I played around with the decay factor and other parameters until I got good results on the test set. It's not clear that the same setting would give equally good performance on a new test set that I had never seen. Also, in all cases the algorithm obtained a solid level of 59% or so, but then wandered up and down, in no particular pattern, as new units were added. I can get a good number -- up to 63% -- by grabbing the best point on this random walk, but I don't honestly believe that the network at that point would give equally good results on new test data drawn from the same distribution. What we really need is a much larger data set for this problem. Then we could split the set into training data (a larger set, offering much better generalization), cross-validation data (used to determine when training should stop), and final test data, never used in training. The current set is so small that it's not possible to split things up this way. -- Scott Fahlman From kak at max.ee.lsu.edu Tue Oct 29 16:36:52 1991 From: kak at max.ee.lsu.edu (Dr. S. Kak) Date: Tue, 29 Oct 91 15:36:52 CST Subject: No subject Message-ID: <9110292136.AA01849@max.ee.lsu.edu> CALL FOR PAPERS Special Issue On NETWORKS FOR NEURAL PROCESSING Circuits, Systems, and Signal Processing Guest Editors: W.A. Porter, University of Alabama, Huntsville S.C. Kak, Louisiana State University, Baton Rouge Papers are solicited on the theoretical foundations, challenging applications and efficient parallel architectures for neural computing. Suggested topics include: training for generalization, use of higher order moments, rapid training algorithms, nonbinary design, optimization networks, and mapping networks. Papers which critique and/or compare recent developments in neural computation are also of interest. Papers should be prepared according to the Information for Contributors on the inside back cover of Circuits, Systems, and Signal Processing. Papers should be submitted in triplicate by January 20, 1992 in care of: Professor William A. Porter Department of Electrical and Computer Engineering The University of Alabama in Huntsville Huntsville, AL 35899 [Tel. (205) 895-6858] For further information contact Professor S.C. Kak at kak at max.ee.lsu.edu or contact Professor W.A. Porter. From dlukas at PARK.BU.EDU Tue Oct 29 13:34:49 1991 From: dlukas at PARK.BU.EDU (David Lukas) Date: Tue, 29 Oct 91 13:34:49 -0500 Subject: Faculty position in Cognitive & Neural Systems at Boston University Message-ID: <9110291834.AA29864@cns.bu.edu> Assistant Professor Cognitive and Neural Systems Boston University Boston University seeks to hire a tenure-track assistant professor starting in Fall 1992 for its graduate Department of Cognitive and Neural Systems. The Department offers an integrated curriculum covering the full range of psychological, neurobiological, and computational concepts, models, and methods in the fields of neural networks, computational neuroscience, parallel distributed processing, and biological information processing, in which Boston University is a leader. Candidates should have extensive analytic or computational research experience in modelling nonlinear neural networks, especially in one or more of the areas: learning, speech and language processing, adaptive pattern recognition, cognitive information processing, and adaptive sensory-motor control.
Send a complete curriculum vitae and three letters of recommendation to Stephen Grossberg, Chairman, Search Committee, Department of Cognitive and Neural Systems, Room 240, 111 Cummington Street, Boston University, Boston, MA 02215, no later than January 1, 1992. Boston University is an Equal Opportunity/Affirmative Action employer. If you have questions or require further information, please reply to Carol Jefferson---caroly at cns.bu.edu. From demers at cs.UCSD.EDU Tue Oct 29 16:48:36 1991 From: demers at cs.UCSD.EDU (David DeMers) Date: Tue, 29 Oct 91 13:48:36 PST Subject: Generalization Message-ID: <9110292148.AA15810@beowulf.ucsd.edu> A short while back there was a discussion of generalization; I recall contributions by Wolpert and Goldfarb, among others. I didn't save the exchanges; however, I'd like to look at them now. Unfortunately, I can't seem to connect up to the archive to retrieve the mailings. If anyone has most of the discussion still lying around, I'd appreciate it if you could mail it to me; also, I'd appreciate anyone's opinion on "what is generalization" in 250 words or less :-) I do have most of David Wolpert's papers, so don't need another copy of them... Thanks for any help, Dave From hcard at ee.UManitoba.CA Wed Oct 30 15:14:07 1991 From: hcard at ee.UManitoba.CA (hcard@ee.UManitoba.CA) Date: Wed, 30 Oct 91 14:14:07 CST Subject: batch learning Message-ID: <9110302014.AA00760@card.ee.umanitoba.ca> In the PDP books batch learning accumulates error derivatives from each pattern, rather than simply their contributions to the total error, before making weight changes. It seems that gradient descent ought to add all the errors before taking any derivatives. Any comments? Howard Card From petsche at learning.siemens.com Wed Oct 30 15:33:38 1991 From: petsche at learning.siemens.com (Thomas Petsche) Date: Wed, 30 Oct 91 15:33:38 EST Subject: NIPS travel (limited cheap airfare) Message-ID: <9110302033.AA12077@learning.siemens.com> FYI: United has a special fare program available until tomorrow. We just booked a round trip from Newark to Denver (leave Monday morning and return Sunday morning) for $250. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 29 22:59:19 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 29 Oct 91 22:59:19 EST Subject: Resource Allocation Network (RAN) In-Reply-To: Your message of Mon, 28 Oct 91 13:54:54 -0500. <9110281854.AA20399@botsec1.bot.COM> Message-ID: Have you tried E-mail? I exchanged some mail with him a month or so ago: John Platt -- Scott From ruizdeangulo%ispgro.cern.ch at BITNET.CC.CMU.EDU Wed Oct 30 04:55:26 1991 From: ruizdeangulo%ispgro.cern.ch at BITNET.CC.CMU.EDU (ruizdeangulo%ispgro.cern.ch@BITNET.CC.CMU.EDU) Date: Wed, 30 Oct 91 10:55:26 +0100 Subject: batch-continuous-one shot Message-ID: <9110300955.AA03462@dxmint.cern.ch> Referring to the batch-continuous-one-shot learning discussion, in the reference below we describe an algorithm that can be labeled as one-shot learning. I think it fits well with the Plutowski and White method described recently.
>What we do (as reported in the tech report by Plutowski & White) >is sequentially grow the training set, first finding >an "optimal" training set of size 1, then fitting the network to this >training set, appending the training set with a new exemplar selected from >the set of available candidates, obtaining a training set of size 2 which >is "approximately optimal", fitting this set, appending a third exemplar, etc, >continuing the process until the network fit obtained by training over the >exemplars fits the rest of the available examples within the desired tolerance. The MDL (Minimal Disturbance Learning) algorithm introduces a new exemplar by minimizing an estimate of the loss function (error increment) over the old patterns. It makes a small search for this optimization, but whatever the stopping point of this search, perfect recall of the new exemplar is obtained. The network is not forced to assume any special kind of local representation. Ruiz de Angulo, V., and Torras, C. (1991). Minimally Disturbing Learning. In Proceedings of IWANN 91. Springer-Verlag. From edelman at wisdom.weizmann.ac.il Thu Oct 31 04:08:00 1991 From: edelman at wisdom.weizmann.ac.il (Shimon Edelman) Date: Thu, 31 Oct 91 11:08+0200 Subject: Resource Allocation Network (RAN) In-Reply-To: <9110281854.AA20399@botsec1.bot.COM> Message-ID: <19911031090807.2.EDELMAN@YAD.weizmann.ac.il> A similar technique of RBF center allocation, in conjunction with other modifications of RBF learning, was successful in replicating human performance in the difficult visual task of hyperacuity vernier discrimination. See AI Memo 1271, "Synthesis of visual modules from examples: learning hyperacuity", by T. Poggio, M. Fahle and S. Edelman (January 1991). Center allocation is discussed there on p.7. -Shimon Edelman edelman at wisdom.weizmann.ac.il From dfausett at zach.fit.edu Thu Oct 31 09:48:43 1991 From: dfausett at zach.fit.edu ( Donald W. Fausett) Date: Thu, 31 Oct 91 09:48:43 -0500 Subject: What is current technology in Analog Neural Nets? Message-ID: <9110311448.AA02454@zach.fit.edu> Prof. Bernard Widrow at Stanford University (EE Dept) would be a likely source to steer you in the right direction. Locally, you might try Prof. Hal Brown at FIT (EE Dept). Good luck. -- Don Fausett From lissie!botsec7!botsec1!dcl at UUNET.uu.net Thu Oct 31 10:13:26 1991 From: lissie!botsec7!botsec1!dcl at UUNET.uu.net (David Lambert) Date: Thu, 31 Oct 91 10:13:26 EST Subject: Resource Allocation Network (RAN) Message-ID: <9110311513.AA24956@botsec1.bot.COM> Hi. Thanks to all respondents concerning my RAN question. I managed to get in touch with John Platt, and he was most helpful. John Platt writes: > Someone forwarded me your posting on the connectionist mailing list.. > Could you please follow up, and say that you have successfully used > RAN? It would be nice to leave an impression of a working algorithm... My sincere apologies for being lax in my courtesies, John. You're right, of course. I got RAN working just fine, and it works as well as (if not better than) advertised. To those who asked for a copy of the resulting code, I'll probably release it sometime soon, through one mechanism or another. Thanks again. David Lambert dcl at object.com or dcl at panix.com From B344DSL at UTARLG.UTA.edu Wed Oct 9 23:55:00 1991 From: B344DSL at UTARLG.UTA.edu (B344DSL@UTARLG.UTA.edu) Date: Wed, 9 Oct 1991 22:55 CDT Subject: Announcement and call for abstracts for Feb.
conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu> ANNOUNCEMENT AND CALL FOR ABSTRACTS WORKSHOP ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the Texas SIG of the International Neural Network Society (INNS). To be held at a loca- tion to be announced in the Dallas-Fort Worth area, Thursday through Saturday, February 6-8, 1992. Confirmed speakers include: Stephen Grossberg (Boston University) Stephen Hampson (University of California, Irvine) Karl Pribram (Radford University) Harold Szu (Naval Surface Warfare Center) Graham Tattersall (University of East Anglia) The focus of this conference will be twofold: (1) how to optimize different aspects of neural and cognitive function and (2) whether particular natural or artificial solutions to specific neural or cognitive problems are in fact opti- mal. Specific problems to which these optimality considerations are applied will be taken from many areas including goal direction and planning, adaptive cat- egorization, sensory perception, and motor control. The talks will be an hour each for invited speakers and 45 minutes each for contributed speakers, with time afterwards for questions. Speakers will not be re- quired to write a paper, but will be invited to contribute chapters to a book several months after the conference. Books based on two previous MIND conferen- ces -- on Motivation, Emotion, and Goal Direction in Neural Networks and NeuralNetworks for Knowledge Representation and Inference -- are now being published by Lawrence Erlbaum Associates. Registration for the conference will be $80 for non-students, $20 for students, with a $10 rebate for MIND or Texas SIG membership. We will try to arrange for discounted air fares from American Airlines as we have done in the past. Those interested in presenting should send me a short (1-3 paragraph) abstract by December 1, 1991, using either e-mail, FAX, or snail mail. Notification of ac- ceptance will be given December 15, 1991. We will not be holding parallel ses- sions, so there are limitations on the number of speakers. However, individu- als who send high-quality abstracts that cannot be accommodated in actual talks will have space to present their work in posters at the conference, and will also be invited to contribute to the book. Prof. Daniel S. Levine Department of Mathematics University of Texas at Arlington Arlington, TX 76019-0408 e-mail: b344dsl at utarlg.uta.edu FAX: 817-794-5802 Telephone: 817-273-3598