From Connectionists-Request at CS.CMU.EDU Fri Nov 1 00:05:16 1991 From: Connectionists-Request at CS.CMU.EDU (Connectionists-Request@CS.CMU.EDU) Date: Fri, 01 Nov 91 00:05:16 EST Subject: Bi-monthly Reminder Message-ID: <17441.688971916@B.GP.CS.CMU.EDU> *** DO NOT FORWARD TO ANY OTHER LISTS *** This is an automatically posted bi-monthly reminder about how the CONNECTIONISTS list works and how to access various online resources. CONNECTIONISTS is not an edited forum like the Neuron Digest, or a free-for-all newsgroup like comp.ai.neural-nets. It's somewhere in between, relying on the self-restraint of its subscribers. Membership in CONNECTIONISTS is restricted to persons actively involved in neural net research. The following posting guidelines are designed to reduce the amount of irrelevant messages sent to the list. Before you post, please remember that this list is distributed to over a thousand busy people who don't want their time wasted on trivia. Also, many subscribers pay cash for each kbyte; they shouldn't be forced to pay for junk mail. Happy hacking. -- Dave Touretzky & Hank Wan --------------------------------------------------------------------- What to post to CONNECTIONISTS ------------------------------ - The list is primarily intended to support the discussion of technical issues relating to neural computation. - We encourage people to post the abstracts of their latest papers and tech reports. - Conferences and workshops may be announced on this list AT MOST twice: once to send out a call for papers, and once to remind non-authors about the registration deadline. A flood of repetitive announcements about the same conference is not welcome here. - Requests for ADDITIONAL references. This has been a particularly sensitive subject lately. Please try to (a) demonstrate that you have already pursued the quick, obvious routes to finding the information you desire, and (b) give people something back in return for bothering them. The easiest way to do both these things is to FIRST do the library work to find the basic references, then POST these as part of your query. Here's an example: WRONG WAY: "Can someone please mail me all references to cascade correlation?" RIGHT WAY: "I'm looking for references to work on cascade correlation. I've already read Fahlman's paper in NIPS 2, his NIPS 3 abstract, and found the code in the nn-bench archive. Is anyone aware of additional work with this algorithm? I'll summarize and post results to the list." - Announcements of job openings related to neural computation. - Short reviews of new text books related to neural computation. To send mail to everyone on the list, address it to Connectionists at CS.CMU.EDU ------------------------------------------------------------------- What NOT to post to CONNECTIONISTS: ----------------------------------- - Requests for addition to the list, change of address and other administrative matters should be sent to: "Connectionists-Request at cs.cmu.edu" (note the exact spelling: many "connectionists", one "request"). If you mention our mailing list to someone who may apply to be added to it, please make sure they use the above and NOT "Connectionists at cs.cmu.edu". - Requests for e-mail addresses of people who are believed to subscribe to CONNECTIONISTS should be sent to postmaster at appropriate-site. If the site address is unknown, send your request to Connectionists-Request at cs.cmu.edu and we'll do our best to help. A phone call to the appropriate institution may sometimes be simpler and faster. 
- Note that in many mail programs a reply to a message is automatically "CC"-ed to all the addresses on the "To" and "CC" lines of the original message. If the mailer you use has this property, please make sure your personal response (request for a Tech Report etc.) is NOT broadcast over the net. - Do NOT tell a friend about Connectionists at cs.cmu.edu. Tell him or her only about Connectionists-Request at cs.cmu.edu. This will save your friend from public embarrassment if she/he tries to subscribe. - Limericks should not be posted here. ------------------------------------------------------------------------------- The CONNECTIONISTS Archive: --------------------------- All e-mail messages sent to "Connectionists at cs.cmu.edu" starting 27-Feb-88 are now available for public perusal. A separate file exists for each month. The files' names are: arch.yymm where yymm stand for the obvious thing. Thus the earliest available data are in the file: arch.8802 Files ending with .Z are compressed using the standard unix compress program. To browse through these files (as well as through other files, see below) you must FTP them to your local machine. ------------------------------------------------------------------------------- How to FTP Files from the CONNECTIONISTS Archive ------------------------------------------------ 1. Open an FTP connection to host B.GP.CS.CMU.EDU (Internet address 128.2.242.8). 2. Login as user anonymous with password your username. 3. 'cd' directly to one of the following directories: /usr/connect/connectionists/archives /usr/connect/connectionists/bibliographies 4. The archives and bibliographies directories are the ONLY ones you can access. You can't even find out whether any other directories exist. If you are using the 'cd' command you must cd DIRECTLY into one of these two directories. Access will be denied to any others, including their parent directory. 5. The archives subdirectory contains back issues of the mailing list. Some bibliographies are in the bibliographies subdirectory. Problems? - contact us at "Connectionists-Request at cs.cmu.edu". ------------------------------------------------------------------------------- How to FTP Files from the Neuroprose Archive -------------------------------------------- Anonymous FTP on archive.cis.ohio-state.edu (128.146.8.52) pub/neuroprose directory This directory contains technical reports as a public service to the connectionist and neural network scientific community. Researchers may place electronic versions of their preprints or articles in this directory, announce availability, and other interested researchers can rapidly retrieve and print the postscripts. This saves copying, postage and handling, by having the interested reader supply the paper. To place a file, put it in the Inbox subdirectory, and send mail to pollack at cis.ohio-state.edu. Within a couple of days, I will move and protect it, and suggest a different name if necessary. Current naming convention is author.title.filetype[.Z] where title is enough to discriminate among the files of the same author. The filetype is usually "ps" for postscript, our desired universal printing format, but may be tex, which requires more local software than a spooler. Very large files (e.g. over 200k) must be squashed (with either a sigmoid function :) or the standard unix "compress" utility, which results in the .Z affix. To place or retrieve .Z files, make sure to issue the FTP command "BINARY" before transfering files. 
After retrieval, call the standard unix "uncompress" utility, which
removes the .Z affix. An example of placing a file is attached as an
appendix, and a shell script called Getps in the directory can perform
the necessary retrieval operations.

For further questions contact: Jordan Pollack
Email: pollack at cis.ohio-state.edu

Here is an example of naming and placing a file:

gvax> cp i-was-right.txt.ps rosenblatt.reborn.ps
gvax> compress rosenblatt.reborn.ps
gvax> ftp archive.cis.ohio-state.edu
Connected to archive.cis.ohio-state.edu.
220 archive.cis.ohio-state.edu FTP server ready.
Name: anonymous
331 Guest login ok, send ident as password.
Password:neuron
230 Guest login ok, access restrictions apply.
ftp> binary
200 Type set to I.
ftp> cd pub/neuroprose/Inbox
250 CWD command successful.
ftp> put rosenblatt.reborn.ps.Z
200 PORT command successful.
150 Opening BINARY mode data connection for rosenblatt.reborn.ps.Z
226 Transfer complete.
100000 bytes sent in 3.14159 seconds
ftp> quit
221 Goodbye.
gvax> mail pollack at cis.ohio-state.edu
Subject: file in Inbox.
Jordan, I just placed the file rosenblatt.reborn.ps.Z in the Inbox.
The INDEX sentence is "Boastful statements by the deceased leader of
the neurocomputing field." Please let me know when it is ready to
announce to Connectionists at cmu. BTW, I enjoyed reading your review
of the new edition of Perceptrons!
Frank

------------------------------------------------------------------------

How to FTP Files from the NN-Bench Collection
---------------------------------------------

1. Create an FTP connection from wherever you are to machine
   "pt.cs.cmu.edu" (128.2.254.155).

2. Log in as user "anonymous" with password your username.

3. Change remote directory to "/afs/cs/project/connect/bench". Any
   subdirectories of this one should also be accessible. Parent
   directories should not be.

4. At this point FTP should be able to get a listing of files in this
   directory and fetch the ones you want.

Problems? - contact us at "nn-bench-request at cs.cmu.edu".

From hinton at ai.toronto.edu Fri Nov 1 10:19:01 1991
From: hinton at ai.toronto.edu (Geoffrey Hinton)
Date: Fri, 1 Nov 1991 10:19:01 -0500
Subject: batch learning
In-Reply-To: Your message of Wed, 30 Oct 91 15:14:07 -0500.
Message-ID: <91Nov1.101916edt.204@neuron.ai.toronto.edu>

Differentiation is a linear operator. So the derivative of the sum of
the individual errors is the sum of the derivatives of the individual
errors.

The fact that differentiation is linear is actually helpful for some
fancier ideas. To improve the conditioning of the error surface, it
would be nice to convolve it with a gaussian blurring function so that
sharp curvatures across ravines are reduced, but gentle slopes along
ravines remain. Instead of convolving the error surface and then
differentiating, we can differentiate and then convolve if we want.
The momentum method is actually a funny version of this where we
convolve along the path using a one-sided exponentially decaying
filter.

PS: As Rick Szeliski pointed out years ago, convolving a quadratic
surface with a gaussian does not change its curvature, it just moves
the whole thing up a bit (I hope I got this right!). But of course,
our surfaces are not quadratic. They have plateaus, and convolution
with a gaussian causes nearby plateaus to smooth out nasty ravines,
and also allows the gradients in ravines to be "seen" on the plateaus.

Geoff
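A minimal NumPy sketch of the point above (an illustration added here, not
part of the post): the momentum "velocity" is a one-sided, exponentially
decaying sum of the gradients met along the path. The toy quadratic error
surface, step size and decay constant are assumptions chosen only to make
the example run.

import numpy as np

# Toy quadratic error E(w) = 0.5 * w^T A w with an ill-conditioned A
# (a "ravine": sharp curvature in one direction, gentle in the other).
A = np.diag([10.0, 0.1])

def grad(w):
    return A @ w                       # dE/dw for the toy surface

w = np.array([1.0, 1.0])
velocity = np.zeros_like(w)
alpha, beta = 0.01, 0.9                # step size and momentum (assumed values)

for t in range(500):
    g = grad(w)
    # velocity_t = sum_k beta^k * g_{t-k}: a one-sided exponentially
    # decaying filter applied to the gradient sequence along the path.
    velocity = beta * velocity + g
    w = w - alpha * velocity

print(w)                               # close to the minimum at the origin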
From bachmann at radar.nrl.navy.mil Fri Nov 1 11:11:07 1991
From: bachmann at radar.nrl.navy.mil (Charles Bachmann)
Date: Fri, 1 Nov 91 11:11:07 -0500
Subject: Resource Allocation Network (RAN)
Message-ID: <9111011611.AA22642@radar.nrl.navy.mil>

Who is John Platt? I can't remember why I should know him.

-chip

From yoshio at eniac.seas.upenn.edu Fri Nov 1 13:20:02 1991
From: yoshio at eniac.seas.upenn.edu (Yoshio Yamamoto)
Date: Fri, 1 Nov 91 13:20:02 EST
Subject: No subject
Message-ID: <9111011820.AA01432@eniac.seas.upenn.edu>

A friend and I have been working independently on applications of
neural networks to control problems. After a little discussion we came
across the following problem, which may be interesting from a
practical point of view.

Suppose you have two continuous input units whose data are normalized
between 0 and 1, several hidden units, and one continuous output unit.
Also suppose the input given to input unit A is totally unrelated to
the output; the input is a random number in [0,1]. Input unit B, on
the other hand, has a strong correlation with the output. What we need
is a trained network that shows no correlation between input A and the
output. This can be illustrated by an example in which input B is
fixed, input A varies at random in [0,1], and the network suppresses
the influence of input A to a minimum, ideally producing a constant
output regardless of the value of input A. In other words, we want the
output to be fully independent of input A. One obvious solution would
be for all weights from input A to the next hidden layer to converge
to zero or very small values through the training process.

Why is this interesting? It is useful in practical problems. Initially
you don't know which inputs are correlated with the outputs and which
aren't, so you use all available inputs anyway. If there is a nonsense
input, it should be identified as such by the network and its
influence automatically suppressed. The best solution we have in mind
is that if no correlation is identified, the weights associated with
that input will shrink to zero.

Is there any way to handle this problem? As a training tool we assume
backprop. Any suggestion will be greatly appreciated.

- Yoshio Yamamoto
General Robotics And Sensory Perception Laboratory (GRASP)
University of Pennsylvania

From hcard at ee.UManitoba.CA Fri Nov 1 17:00:02 1991
From: hcard at ee.UManitoba.CA (hcard@ee.UManitoba.CA)
Date: Fri, 1 Nov 91 16:00:02 CST
Subject: batch learning
Message-ID: <9111012200.AA03433@card.ee.umanitoba.ca>

I think my question concerning batch learning needs some
amplification. The question was whether to add all contributions to
the error from each pattern before taking derivatives. The point is
that the batch dE/dW can be estimated (given limited precision, noise,
etc.) more accurately than the sum of many small dE(pat)/dW
components. This will be particularly true towards the end of learning
when errors are small. There may be several ways to determine the
weight error derivative. The batch dE/dW would be most directly
determined by twiddling the weights individually and rerunning the
training set. I know this is expensive, but the issue is accuracy, not
efficiency.
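A small NumPy sketch (an added illustration, not part of the post above)
comparing the two routes to the batch derivative just described: summing
per-pattern backprop gradients versus twiddling each weight and rerunning
the training set. In exact arithmetic they agree because differentiation
is linear; any difference in practice comes from precision and noise. The
toy network, random data and step size eps are assumptions for
illustration only.

import numpy as np

# Tiny single-layer sigmoid net with squared error over a batch of patterns.
rng = np.random.default_rng(0)
X = rng.random((20, 3))               # 20 patterns, 3 inputs (made-up data)
T = rng.random((20, 1))               # targets
W = rng.standard_normal((3, 1)) * 0.1

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def total_error(W):
    Y = sigmoid(X @ W)
    return 0.5 * np.sum((Y - T) ** 2)

# (1) Batch gradient as the sum of per-pattern gradients (what backprop gives).
Y = sigmoid(X @ W)
per_pattern = (Y - T) * Y * (1 - Y)   # dE_p/da for each pattern
grad_sum = X.T @ per_pattern          # summed over patterns

# (2) Batch gradient by perturbing each weight and rerunning the training set.
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_fd[i, j] = (total_error(Wp) - total_error(Wm)) / (2 * eps)

print(np.max(np.abs(grad_sum - grad_fd)))   # agree up to round-off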
Howard Card From John.Hampshire at SPEECH2.CS.CMU.EDU Fri Nov 1 17:41:47 1991 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Fri, 1 Nov 91 17:41:47 EST Subject: Adding noise to training data Message-ID: As a follow-on to Geoff's post on this topic... Adding noise to the training set of any classifier (connectionist or other, linear or non-linear) has the statistical effect of convolving the PDF of the noise with the class-conditional densities of the RV that generated the training samples (assuming the noise and the RV are independent). This can (in principle) help generalization, because we typically have training sets that are so puny, we don't begin to have a sufficient sample size to estimate with any degree of precision the a-posteriori class distributions of the RV we're trying to classify. As a result, we get estimated a-posteriori distributions for a training set size of n that are usually n scaled Dirac delta functions distributed in feature space (continuous RV case). For discrete RV's the estimated distributions are made up of Kronecker deltas... OK, so if you add noise to that, you're convolving the deltas with the PDF of the noise (in the limit that you create an infinite number of noisy versions of each original training vector). This means that you have fabricated a NEW set of a-posteriori class distributions --- one that you hope will yield classification boundaries that are better estimates of the TRUE a-posteriori class distributions than all those original deltas. Whether or not you succeed depends critically on your choice of the PDF for the noise AND the covariance matrix of that PDF. In most cases the critical choice comes down to a largely arbitrary guess. So adding noise to improve generalization is something of an act of desperation in the face of uncertainty... uncertainty about what kind and how complex a classifier to build, uncertainty about the PDF of the data being classified... uncertainty about lots of things. John From polycarp at bode.usc.edu Fri Nov 1 18:38:30 1991 From: polycarp at bode.usc.edu (Marios Polycarpou) Date: Fri, 1 Nov 91 15:38:30 PST Subject: Tech. Rep. Available Message-ID: The following paper has been placed in the Neuroprose archives at Ohio State. The file is "polycarpou.stability.ps.Z." See ftp in- structions below. IDENTIFICATION AND CONTROL OF NONLINEAR SYSTEMS USING NEURAL NETWORK MODELS: DESIGN AND STABILITY ANALYSIS Marios M. Polycarpou and Petros A. Ioannou Department of Electrical Engineering - Systems University of Southern California, MC-2563 Los Angeles, CA 90089-2563, U.S.A Abstract: The feasibility of applying neural network learning techniques in problems of system identification and control has been demonstrated through several empirical studies. These studies are based for the most part on gradient techniques for deriving parameter adjustment laws. While such schemes perform well in many cases, in general, problems arise in attempting to prove stability of the overall system, or convergence of the output error to zero. This paper presents a stability theory approach to synthesizing and analyzing identification and control schemes for nonlinear dynamical systems using neural network models. The nonlinearities of the dynamical system are assumed to be unknown and are modelled by neural network architectures. Multilayer networks with sigmoidal activation functions and radial basis function networks are the two types of neural network models that are considered. 
These static network architectures are combined with dynamical
elements, in the form of stable filters, to construct a type of
recurrent network configuration which is shown to be capable of
approximating a large class of dynamical systems. Identification
schemes based on neural network models are developed using two
different techniques, namely, the Lyapunov synthesis approach and the
gradient method. Both identification schemes are shown to guarantee
stability, even in the presence of modelling errors. A novel network
architecture, referred to as dynamic radial basis function network, is
derived and shown to be useful in problems dealing with learning in
dynamic environments. For a class of nonlinear systems, a stable
neural network based control configuration is presented and analyzed.

unix> ftp archive.cis.ohio-state.edu
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose
ftp> binary
ftp> get polycarpou.stability.ps.Z
ftp> quit
unix> uncompress polycarpou.stability.ps.Z
unix> lpr polycarpou.stability.ps

Any comments are welcome!

Marios Polycarpou
e-mail: polycarp at bode.usc.edu

From John.Hampshire at SPEECH2.CS.CMU.EDU Sun Nov 3 12:05:15 1991
From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU)
Date: Sun, 3 Nov 91 12:05:15 EST
Subject: Adding noise to training data
Message-ID:

Sorry...

PDF = probability density function
RV = random vector (i.e., the probabilistic model generating the
     feature vectors of the training set).
class-conditional density = probability density of (feature vector | class)
     --- see for example Duda & Hart.

John

From uh311ae at sunmanager.lrz-muenchen.de Sun Nov 3 13:56:50 1991
From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges)
Date: 03 Nov 91 19:56:50+0100
Subject: The transposed weight matrix hassle
Message-ID: <9111031856.AA02137@sunmanager.lrz-muenchen.de>

There are some nasty things showing up if you want to fine-tune a
parallel architecture to algorithms such as backprop. E.g., you either
get the communications fast for the forward phase or the backward
phase - but if you want to use the same communication flow for both,
you have to transpose the weight matrices. This is on the order of
O(forget it). Has anybody cooked up an idea?

Cheers, Henrik

MPCI at LLNL
IBM Research
U. of Munich

From hughc at spectrum.cs.unsw.oz.au Mon Nov 4 15:59:28 1991
From: hughc at spectrum.cs.unsw.oz.au (Hugh Clapin)
Date: Mon, 4 Nov 91 15:59:28 AES
Subject: unsubscribe
Message-ID: <9111040510.23803@munnari.oz.au>

I sure hope this is the right address to send 'please unsubscribe me'
messages. If it isn't, could someone please flame me with the
appropriate address?

hugh clapin
hughc at spectrum.cs.unsw.oz.au

From CONNECT at nbivax.nbi.dk Mon Nov 4 03:22:00 1991
From: CONNECT at nbivax.nbi.dk (CONNECT@nbivax.nbi.dk)
Date: Mon, 4 Nov 1991 09:22 +0100 (NBI, Copenhagen)
Subject: Contents of IJNS Vol. 2, issue 3
Message-ID: <107B86DB60E0636A@nbivax.nbi.dk>

Begin Message:
-----------------------------------------------------------------------

INTERNATIONAL JOURNAL OF NEURAL SYSTEMS

The International Journal of Neural Systems is a quarterly journal
which covers information processing in natural and artificial neural
systems. It publishes original contributions on all aspects of this
broad subject which involves physics, biology, psychology, computer
science and engineering. Contributions include research papers,
reviews and short communications.
The journal presents a fresh undogmatic attitude towards this multidisciplinary field with the aim to be a forum for novel ideas and improved understanding of collective and cooperative phenomena with computational capabilities. ISSN: 0129-0657 (IJNS) ---------------------------------- Contents of Volume 2, issue number 3 (1991): 1. D.G. Stork: Sources of neural structure in speech and language processing. 2. L. Xu, A. Krzyzak and E. Oja: Neural nets for the dual subspace pattern recognition method. 3. P.J. Zwietering, E.H.L. Aarts and J. Wessels: The design and complexity of exact multi-layered perceptrons. 4. M.M. van Hulle: A goal programming network for mixed integer linear programming: A case study for the job-shop scheduling problem. 5. J-X. Wu and C. Chan: A three-layered adaptive network for pattern density estimation and classification. 6. L. Garrido and V. Gaitan: Use of neural nets to measure the tau-polarisation and its Bayesian interpretation. 7. C.M. Bishop: A fast procedure for retraining the multilayer perceptrons. 8. V. Menon and D.S. Tang: Population oscillations in neuronal groups. 9. V. Rodrigues and J. Skrzypek: Combining similarities and dissimilarities in supervised learning. ---------------------------------- Editorial board: B. Lautrup (Niels Bohr Institute, Denmark) (Editor-in-charge) S. Brunak (Technical Univ. of Denmark) (Assistant Editor-in-Charge) D. Stork (Stanford) (Book review editor) Associate editors: B. Baird (Berkeley) D. Ballard (University of Rochester) E. Baum (NEC Research Institute) S. Bjornsson (University of Iceland) J. M. Bower (CalTech) S. S. Chen (University of North Carolina) R. Eckmiller (University of Dusseldorf) J. L. Elman (University of California, San Diego) M. V. Feigelman (Landau Institute for Theoretical Physics) F. Fogelman-Soulie (Paris) K. Fukushima (Osaka University) A. Gjedde (Montreal Neurological Institute) S. Grillner (Nobel Institute for Neurophysiology, Stockholm) T. Gulliksen (University of Oslo) D. Hammerstrom (Oregon Graduate Institute) D. Horn (Tel Aviv University) J. Hounsgaard (University of Copenhagen) B. A. Huberman (XEROX PARC) L. B. Ioffe (Landau Institute for Theoretical Physics) P. I. M. Johannesma (Katholieke Univ. Nijmegen) M. Jordan (MIT) G. Josin (Neural Systems Inc.) I. Kanter (Princeton University) J. H. Kaas (Vanderbilt University) A. Lansner (Royal Institute of Technology, Stockholm) A. Lapedes (Los Alamos) B. McWhinney (Carnegie-Mellon University) M. Mezard (Ecole Normale Superieure, Paris) J. Moody (Yale, USA) A. F. Murray (University of Edinburgh) J. P. Nadal (Ecole Normale Superieure, Paris) E. Oja (Lappeenranta University of Technology, Finland) N. Parga (Centro Atomico Bariloche, Argentina) S. Patarnello (IBM ECSEC, Italy) P. Peretto (Centre d'Etudes Nucleaires de Grenoble) C. Peterson (University of Lund) K. Plunkett (University of Aarhus) S. A. Solla (AT&T Bell Labs) M. A. Virasoro (University of Rome) D. J. Wallace (University of Edinburgh) D. Zipser (University of California, San Diego) ---------------------------------- CALL FOR PAPERS Original contributions consistent with the scope of the journal are welcome. Complete instructions as well as sample copies and subscription information are available from The Editorial Secretariat, IJNS World Scientific Publishing Co. Pte. Ltd. 73, Lynton Mead, Totteridge London N20 8DH ENGLAND Telephone: (44)81-446-2461 or World Scientific Publishing Co. Inc. 
Suite 1B 1060 Main Street River Edge New Jersey 07661 USA Telephone: (1)201-487-9655 or World Scientific Publishing Co. Pte. Ltd. Farrer Road, P. O. Box 128 SINGAPORE 9128 Telephone (65)382-5663 ----------------------------------------------------------------------- End Message From xiru at Think.COM Mon Nov 4 11:07:02 1991 From: xiru at Think.COM (xiru Zhang) Date: Mon, 4 Nov 91 11:07:02 EST Subject: The transposed weight matrix hassle In-Reply-To: Henrik Klagges's message of 03 Nov 91 19:56:50+0100 <9111031856.AA02137@sunmanager.lrz-muenchen.de> Message-ID: <9111041607.AA01820@yangtze.think.com> Date: 03 Nov 91 19:56:50+0100 From: Henrik Klagges There are some nasty things showing up if you want to fine-tune a parallel architecture to algorithms such as backprop. E.g., you either get the communications fast fro the forwar phase or the backward phase - but if you want to use the same communication flow for both, you have to transpose the weight matrices. This is on the order of O(forget it). Has anybody cooked up an idea ? Cheers, Henrik MPCI at LLNL IBM Research U. of Munich This is definitely not true for our implementations on CM-2. We have several ways to run backprop: 1. one big network on the whole CM-2; 2. multiple copies of the same network on CM-2, each copy runs on a group of processors (i.e., "batch mode"); 3. multiple copies of the same network on CM-2, each copy runs on one processor. In none of the above cases we had the problem you mentioned in your message. - Xiru Zhang Thinking Machines Corp. From morgan at icsib.Berkeley.EDU Mon Nov 4 11:42:20 1991 From: morgan at icsib.Berkeley.EDU (Nelson Morgan) Date: Mon, 4 Nov 91 08:42:20 PST Subject: forward and backward (or, the transposed weight matrix hassle) Message-ID: <9111041642.AA21959@icsib.Berkeley.EDU> > From: Henrik Klagges > Message-Id: <9111031856.AA02137 at sunmanager.lrz-muenchen.de> > To: connectionists at cs.cmu.edu > Subject: The transposed weight matrix hassle > > There are some nasty things showing up if you want to fine-tune > a parallel architecture to algorithms such as backprop. E.g., you > either get the communications fast fro the forwar phase or the > backward phase - but if you want to use the same communication > flow for both, you have to transpose the weight matrices. This > is on the order of O(forget it). Has anybody cooked up an idea ? > > Cheers, Henrik > > MPCI at LLNL > IBM Research > U. of Munich > > > > ------- End of Forwarded Message > > Sure. We have had our parallel architecture, the Ring Array Processor (RAP) training up backprop nets for our speech recognition research for about a year and a half now. With a ring or a torus, you don't need to duplicate weight matrices. For instance, you can organize the weights so they are most convenient for the forward pass, and then during the backward pass just compute partial sums for all of the deltas; that is, on each processor just compute what you can out of every sum that has the local weights in it. Then pass around the partial sums systolically, updating cumulatively in each processor. If your computation is strongly virtualized (many more than 1 neuron per physical processor), and if your computation is efficient (we shift around the ring in one cycle, plus a few cycles overhead added to each complete shift around the ring), then this part of backprop is not a bad cost. I think this is described in our paper in Proceedings of ASAP '90. You can also send to info at icsi.berkeley.edu to ask about RAP TR's. 
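A minimal serial NumPy sketch of the idea described above (an added
illustration, not the actual RAP or CM-2 code): the backward pass
W^T * delta can be accumulated as partial sums over the same row-wise
weight layout used in the forward pass, so no explicit transpose is ever
formed. On the ring, the partial sums are what get passed between
processors; here everything runs serially, and the array sizes and
variable names are made up.

import numpy as np

# Weights stored row-wise: "processor" i holds row W[i, :], the fan-in
# weights of output unit i.
rng = np.random.default_rng(0)
n_out, n_in = 4, 6
W = rng.standard_normal((n_out, n_in))
x = rng.standard_normal(n_in)           # hidden activations
delta_out = rng.standard_normal(n_out)  # errors at the output units

# Forward pass: each processor uses its own row of W directly.
y = np.array([W[i, :] @ x for i in range(n_out)])    # equals W @ x

# Backward pass: each processor contributes the partial sum
# W[i, :] * delta_out[i]; accumulating these (one ring step per processor)
# yields the back-propagated error without building W.T.
delta_hidden = np.zeros(n_in)
for i in range(n_out):
    delta_hidden += W[i, :] * delta_out[i]

print(np.allclose(delta_hidden, W.T @ delta_out))    # True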
From hwang at pierce.ee.washington.edu Mon Nov 4 11:02:54 1991 From: hwang at pierce.ee.washington.edu ( J. N. Hwang) Date: Mon, 4 Nov 91 08:02:54 PST Subject: The transposed weight matrix hassle Message-ID: <9111041602.AA23287@pierce.ee.washington.edu.> The forward phase of BP is done by the matrix-vector multiplication, the backward phase is done by the vector-matrix multiplication consecutively (layer-by-layer). In addition, the weight updating itself is done by an outer product operation. All these three operations can be elegantly implemented by a "ring array architecture" with fully pipelining efficiency (pipeline rate = 1). Some references: 1) S. Y. Kung, J. N. Hwang, "Parallel architectures for artificial neural networks," ICNN'88, San Diego, 1988. 2) J. N. Hwang, J. A. Vlontzos, S. Y. Kung, "A systolic Neural Network Architecture for Hidden Markov Models," IEEE Trans. on ASSP, December 1989. 3) S. Y. Kung, J. N. Hwang, " A unified systolic architecture for artificial neural networks," Journal of parallel and distributed computing, Special issue on Neural Networks, March 1989. Jenq-Neng Hwang 11/04/91 From white at teetot.acusd.edu Mon Nov 4 14:44:24 1991 From: white at teetot.acusd.edu (Ray White) Date: Mon, 4 Nov 91 11:44:24 -0800 Subject: No subject Message-ID: <9111041944.AA10150@teetot.acusd.edu> From prechelt at ira.uka.de Tue Nov 5 03:14:16 1991 From: prechelt at ira.uka.de (prechelt@ira.uka.de) Date: Tue, 05 Nov 91 09:14:16 +0100 Subject: Generalization In-Reply-To: Your message of Tue, 29 Oct 91 13:48:36 -0800. <9110292148.AA15810@beowulf.ucsd.edu> Message-ID: > ... I'd appreciate it if you > could mail it to me; also, I'd appreciate anyone's opinion on > "what is generalization" in 250 words or less :-) Let's do it much shorter (less than 50 words): Generalization is the application of knowledge about a set C of cases from a certain domain to a not-before-seen case X from the same domain but not belonging to C allowing to handle that case correctly. Notes: ------ 1. This can be made a concrete definition if you say what the terms knowledge case domain handle correctly shall mean. 2. This definition is NOT Neural Network specific. It can become Neural Network specific, depending on how the above terms are being defined. 3. Strictly speaking this defines a process, not a property of a mapping or something like that. 4. This defines something that Neuralnetters sometimes call 'successful generalization' as opposed to what happens in the system when it tries to generalize, but as a result the wrong result results. :-> 5. If you can decide what 'correct' is and what not, you can compute the can-generalize-to(X) predicate. This enables to quantify generalization capabilities. Comments and flames welcome. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-7500 Karlsruhe 1; Germany | they get (Voice: ++49/721/608-4317, FAX: ++49/721/697760) | less simple. From shams at maxwell.hrl.hac.com Mon Nov 4 16:58:43 1991 From: shams at maxwell.hrl.hac.com (shams@maxwell.hrl.hac.com) Date: Mon, 4 Nov 91 13:58:43 PST Subject: The transposed weight matrix hassle Message-ID: <9111042158.AA00763@maxwell.hrl.hac.com> There are a couple of different methods used for dealing with this problem that areeffective to a certain extend. 
First, a three phase conflict-free routing method has been proposed [1] that implicitly implements the matrix inversion during the back-propagation learning phase. This method is generally applicable to fine-grain architectures and sparsely connected neural nets. The second mapping method proposed by Kung & Hwang [2], efficiently time-multiplexes the synaptic interconnections of a neural network onto the physical connections of a 1-D ring systolic array. In this mapping, the matrix inversion operation requiredduring the learning phase can be performed by communicating neuron activation values between the processors (as oppose to the partial sums used in the feed-forward case). [1] V. K. Prasanna Kumar and K. W. Przytula, "Algorithmic Mapping of Neural Network Models onto Parallel SIMD Machines," Proceedings of the Inter. Conf. on Appl. Spec. Array Proc., Princeton, NJ, Ed. S. Y. Kung, E. E. Swartzlander, J. A. B. Fortes and K. W. Przytula, 1990. [2] S. Y. Kung and J. N. Hwang, RA Unified Systolic Architecture for Artificial Neural Networks.S Journal of Parallel and Distributed Computing. 6: 358-387, 1989. Soheil Shams Hughes Research Labs From mackay at hope.caltech.edu Tue Nov 5 13:20:59 1991 From: mackay at hope.caltech.edu (David MacKay) Date: Tue, 5 Nov 91 10:20:59 PST Subject: Announcement of NIPS Bayesian workshop and associated ftp archive Message-ID: <9111051820.AA02763@hope.caltech.edu> One of the two day workshops at Vail this year will be: `Developments in Bayesian methods for neural networks' ------------------------------------------------------ David MacKay and Steve Nowlan, organizers The first day of this workshop will be 50% tutorial in content, reviewing some new ways Bayesian methods may be applied to neural networks. The rest of the workshop will be devoted to discussions of the frontiers and challenges facing Bayesian work in neural networks. Participants are encouraged to obtain preprints by anonymous ftp before the workshop. Instructions end this message. Discussion will be moderated by John Bridle. Day 1, Morning: Tutorial review. 0 Introduction to Bayesian data modelling. David MacKay 1 E-M, clustering and mixtures. Steve Nowlan 2 Bayesian model comparison and determination of regularization constants - application to regression networks. David MacKay 3 The use of mixture decay schemes in backprop networks. Steve Nowlan Day 1, Evening: Tutorial continued. 4 The `evidence' framework for classification networks. David MacKay Day 1, Evening: Frontier Discussion. Background: A: In many cases the true Bayesian posterior distribution over a hypothesis or parameter space is difficult to obtain analytically. Monte Carlo methods may provide a useful and computationally efficient way to estimate posterior distributions in such cases. B: There are many applications where training data is expensive to obtain, and it is desirable to select training examples so we can learn as much as possible from each one. This session will discuss approaches for selecting the next training point "optimally". The same approaches may also be useful for reducing the size of a large data set by omitting the uninformative data points. A Monte Carlo clustering Radford Neal B Data selection / active query learning Jurgen Schmidhuber David MacKay Day 2, morning discussion: C Prediction of generalisation Background: The Bayesian approach to model comparison evaluates how PROBABLE alternative models are given the data. 
In contrast, the real problem is often to estimate HOW WELL EACH MODEL IS EXPECTED TO GENERALISE. In this session we will hear about various approaches to predicting generalisation. It is hoped that the discussion will shed light on the questions: - How does Bayesian model comparison relate to generalisation? - Can we predict generalisation ability of one model assuming that the `truth' is in a different specified model class? - Is it possible to predict generalisation ability WITHOUT making implicit assumptions about the properties of the `truth'? - Can we interpret GCV (cross-validation) in terms of prediction of generalisation? 1 Prediction of generalisation with `GPE' John Moody 2 Prediction of generalisation - worst + average case analysis David Haussler + Michael Kearns 3 News from the statistical physics front Sara Solla Day 2, Evening discussion: (Note: There will probably be time in this final session for continued discussion from the other sessions.) D Missing inputs, unlabelled data and discriminative training Background: When training a classifier with a data set D_1 = {x,t}, a full probability model is one which assigns a parameterised probability P(x,t|w). However, many classifiers only produce a discriminant P(t|x,w), ie they do not model P(x). Furthermore, classifiers of the first type often yield better discriminative performance if they are trained as if they were only of the second type. This is called `discriminative training'. The problem with discriminative training is that it leaves us with no obvious way to use UNLABELLED data D_2 = {x}. Such data is usually cheap, but how can we integrate it with discriminative training? The same problem arises for most regression or classifier models when some of the input variables are missing from the input vector. What is the right thing to do? 1 Introduction: the problem of combining unlabelled data and discriminative training Steve Renals 2 Combining labelled and unlabelled data for the modem problem Steve Nowlan Reading up before the workshop ------------------------------ People intending to attend this workshop are encouraged to obtain preprints of relevant material before NIPS. A selection of preprints are available by anonymous ftp, as follows: unix> ftp hope.caltech.edu (or ftp 131.215.4.231) Name: anonymous Password: ftp> cd pub/mackay ftp> get README.NIPS ftp> quit Then read the file README.NIPS for further information. Problems? Contact David MacKay, mackay at hope.caltech.edu, or Steve Nowlan, nowlan at helmholtz.sdsc.edu --------------------------------------------------------------------------- From english at sun1.cs.ttu.edu Mon Nov 4 09:42:53 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Mon, 4 Nov 91 08:42:53 CST Subject: Adding noise to training data Message-ID: <9111041442.AA07265@sun1.cs.ttu.edu> At first blush, it seems there's a close relationship between Parzen estimation (Duda & Hart 1973) and training with noise added to the samples. If we were to use the noise function as the window function in Parzen estimation of the distribution from which the training set was drawn, wouldn't we would obtain precisely the noisy-sample distribution? And wouldn't a network minimizing squared error for the noisy training set asymptotically realize (i.e., as the number of noisy sample presentations approaches infinity) the Parzen estimator? The results of Hampshire and Perlmutter (1990) seem to be relevant here. 
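An illustrative NumPy sketch of the correspondence just mentioned (added
here, not part of the post): estimating the density of noise-corrupted
copies of the training samples gives, in the limit, a Parzen window
estimate whose window function is the noise PDF. The sample values, noise
level, test point and bin width are made-up assumptions.

import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=5)        # a tiny "training set" (made-up values)
sigma = 0.3                         # std. dev. of the additive Gaussian noise (assumed)

def parzen(x):
    # Parzen estimate: average of Gaussian windows centred on the samples,
    # with the noise PDF playing the role of the window function.
    return np.mean(np.exp(-0.5 * ((x - samples) / sigma) ** 2)
                   / (sigma * np.sqrt(2.0 * np.pi)))

# Empirical density of the noisy samples, from many noisy copies.
copies = 200000
noisy = (samples[None, :] + sigma * rng.standard_normal((copies, samples.size))).ravel()
x0, bin_width = 0.5, 0.05
empirical = np.mean(np.abs(noisy - x0) < bin_width / 2.0) / bin_width

print(parzen(x0), empirical)        # the two estimates agree closely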
> So adding noise to improve generalization is something of an act of > desperation in the face of uncertainty... uncertainty about what kind > and how complex a classifier to build, uncertainty about the PDF of the > data being classified... uncertainty about lots of things. I agree. But perhaps the "act of desperation" is of a familiar sort. Tom English Duda, R. O., and P. E. Hart. 1973. Pattern Classification and Scene Analysis. New York: Wiley & Sons. Hampshire, J. B., and B. A. Perlmutter. 1990?. Equivalence proofs for multi-layer perceptron classifiers and Bayesian discriminant function. In Proc. 1990 Connectionist Models Summer School. [Publisher?] From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Nov 5 14:55:45 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 05 Nov 91 14:55:45 EST Subject: No subject In-Reply-To: Your message of Mon, 04 Nov 91 11:44:24 -0800. <9111041944.AA10150@teetot.acusd.edu> Message-ID: Another idea is to calculate the matrix of second derivatives (grad(grad E)) as well as the first derivatives (grad E) and from this information calculate the (unique) parabolic surface in weight space that has the same derivatives. Then the weights should be updated so as to jump to the center (minimum) of the parabola. I haven't coded this idea yet, has anyone else looked at this kind of thing, and if so what are the results? My Quickprop algorithm is pretty close to what you describe here, excpet that it uses only the diagonal terms of the second derivative (i.e. it pretends that the weight updates do not affect one another). If you haven't seen the paper on this, it's in neuroprose as "fahlman.quickprop-tr.ps.Z" or something close to that. It works well -- in the few cases I have seen in which both quickprop and conjugate gradient were used on the same problems, quickprop is considerably faster (though in very high-dimensional spaces, CG might win). Yann LeCun has used a slightly different version of the same idea: he back-propagates second-derivative information for each case, and uses this to dynamically adjust the learning rate. -- Scott Fahlman From haussler at saturn.ucsc.edu Tue Nov 5 17:09:23 1991 From: haussler at saturn.ucsc.edu (David Haussler) Date: Tue, 5 Nov 91 14:09:23 -0800 Subject: call for papers for COLT '92 Message-ID: <9111052209.AA23475@saturn.ucsc.edu> CALL FOR PAPERS COLT '92 Fifth ACM Workshop on Computational Learning Theory University of Pittsburgh July 27-29, 1992 The fifth workshop on Computational Learning Theory will be held at the University of Pittsburgh, Pittsburgh Pennsylvania. The workshop is sponsored jointly by the ACM Special Interest Groups in Automata and Computability Theory and Artificial Intelligence. Registration is open, within the limits of the space available (about 150 people). We invite papers in all areas that relate directly to the analysis of learning algorithms and the theory of machine learning, including artificial and biological neural networks, robotics, pattern recognition, inductive inference, information theory, decision theory, Bayesian/MDL estimation, and cryptography. We look forward to a lively, interdisciplinary meeting. As part of our program, we are pleased to be presenting an invited talk by Prof. A. Barto of the University of Massachusetts on reinforcement learning. Other invited talks may be scheduled as well. 
Authors should submit an extended abstract that consists of: + A cover page with title, authors' names, (postal and e-mail) addresses, and a 200 word summary. + A body not longer than 10 pages in twelve-point font. Be sure to include a clear definition of the theoretical model used, an overview of the results, and some discussion of their significance, including comparison to other work. Proofs or proof sketches should be included in the technical section. Experimental results are welcome, but are expected to be supported by theoretical analysis. Authors should send 13 copies of their abstract to David Haussler, COLT '92, Computer and Information Sciences, University of California, Santa Cruz, CA 95064. The deadline for receiving submissions is February 10, 1992. This deadline is FIRM. Authors will be notified by April 10; final camera-ready papers will be due May 15. Chair: Bob Daley (C.S. Dept., U. Pittsburgh, PA 15260). Program committee: David Haussler (UC Santa Cruz, chair), Naoki Abe (NEC, Japan), Shai Ben-David (Technion), Tom Cover (Stanford), Rusins Freivalds (U. of Latvia), Lisa Hellerstein (Northwestern), Nick Littlestone (NEC, Princeton), Wolfgang Maass (Technical U., Graz, Austria), Lenny Pitt (U. Illinois), Robert Schapire (Bell Labs, Murray Hill), Carl Smith (U. Maryland), Naftali Tishby (Hebrew U.), Santosh Venkatesh (U. Penn.) Note: papers that have appeared in journals or other conferences, or that are being submitted to other conferences are not appropriate for submission to COLT. From jbower at cns.caltech.edu Wed Nov 6 00:55:04 1991 From: jbower at cns.caltech.edu (Jim Bower) Date: Tue, 5 Nov 91 21:55:04 PST Subject: CNS*92 Message-ID: <9111060555.AA04973@cns.caltech.edu> CALL FOR PAPERS First Annual Computation and Neural Systems Meeting CNS*92 Sunday, July 26 through Friday, July 31 1992 San Francisco, California This is the first annual meeting of an inter-disciplinary conference intended to address the broad range of research approaches and issues involved in the general field of computational neuroscience. The meeting itself has grown out of a workshop on "The Analysis and Modeling of Neural Systems" which has been held each of the last two years at the same site. The strong response to these previous meetings has suggested that it is now time for an annual open meeting on computational approaches to understanding neurobiological systems. CNS*92 is intended to bring together experimental and theoretical neurobiologists along with engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in understanding how neural systems compute. The meeting will equally emphasize experimental, model-based, and more abstract theoretical approaches to understanding neurobiological computation. The first day of the meeting (July 26) will be devoted to tutorial presentations and workshops focused on particular technical issues confronting computational neurobiology. The next three days will include the main technical program consisting of plenary, contributed and poster sessions. There will be no parallel sessions and the full text of presented papers will be published. Following the regular session, there will be two days of focused workshops at a site on the California coast (July 30-31). Participation in the workshops is restricted to 75 attendees. Technical Program: Plenary, contributed and poster sessions will be held. There will be no parallel sessions. The full text of presented papers will be published. Presentation categories: A. 
Theory and Analysis B. Modeling and Simulation C. Experimental D. Tools and Techniques Themes: A. Development B. Cell Biology C. Excitable Membranes and Synaptic Mechanisms D. Neurotransmitters, Modulators, Receptors E. Sensory Systems 1. Somatosensory 2. Visual 3. Auditory 4. Olfactory 5. Other F. Motor Systems and Sensory Motor Integration G. Behavior H. Cognitive I. Disease Submission Procedures: Original research contributions are solicited, and will be carefully refereed. Authors must submit six copies of both a 1000-word (or less) summary and six copies of a separate singlepage 50-100 word abstract clearly stating their results postmarked by January 7, 1992. Accepted abstracts will be published in the conference program. Summaries are for program committee use only. At the bottom of each abstract page and on the first summary page indicate preference for oral or poster presentation and specify at least one appropriate category and and theme. Also indicate preparation if applicable. Include addresses of all authors on the front of the summary and the abstract and indicate to which author correspondence should be addressed. Submissions will not be considered that lack category information, separate abstract sheets, the required six copies, author addresses, or are late. Mail Submissions To: Chris Ploegaert CNS*92 Submissions Division of Biology 216-76 Caltech Pasadena, CA. 91125 Mail For Registration Material To: Chris Ghinazzi Lawrence Livermore National Laboratories P.O. Box 808 Livermore CA. 94550 All submitting authors will be sent registration material automatically. Program committee decisions will be sent to the correspondence author only. CNS*92 Organizing Committee: Program Chair, James M. Bower, Caltech. Publicity Chair, Frank Eeckman, Lawrence Livermore Labs. Finances, John Miller, UC Berkeley and Nora Smiriga, Institute of Scientific Computing Res. Local Arrangements, Ted Lewis, UC Berkeley and Muriel Ross, NASA Ames. Program Committee: William Bialek, NEC Research Institute. James M. Bower, Caltech. Frank Eeckman, Lawrence Livermore Labs. Scott Fraser, Caltech. Christof Koch, Caltech. Ted Lewis, UC Berkeley. Eve Marder, Brandeis. Bruce McNaughton, University of Arizona. John Miller, UC Berkeley. Idan Segev, Hebrew University, Jerusalem Shihab Shamma, University of Maryland. Josef Skrzypek, UCLA. DEADLINE FOR SUMMARIES & ABSTRACTS IS January 7, 1992 please post From HOLMSTROM at csc.fi Mon Nov 4 12:41:00 1991 From: HOLMSTROM at csc.fi (HOLMSTROM@csc.fi) Date: Mon, 4 Nov 91 12:41 EET Subject: Adding noise to training data Message-ID: A note to John Hampshire's comment on this topic: Adding noise to the training vectors has been suggested and also used with some success by several authors. In a forthcoming article (Lasse Holmstrom and Petri Koistinen, "Using Additive Noise in Back-Propagation Training", IEEE Transactions on Neural Networks, January 1992) this method is discussed from the point of view of mathematical statistics. It is not claimed that better generalization is always achieved but mathematical insight is given to the choice of the characteristics of the additive noise density if using additive noise is attempted. A critical question is the level (variance) of the additive noise. One method to estimate a suitable noise level directly from data is to use a cross-validation method known from statistics. In a standard benchmark experiment (Kohonen- Barna-Chrisley, Neurocomputing 2) significant improvement in classification performance was achieved. 
The training method is also shown to be asymptotically consistent
provided the noise level is chosen appropriately.

Lasse Holmstrom

From patrick at magi.ncsl.nist.gov Wed Nov 6 14:29:04 1991
From: patrick at magi.ncsl.nist.gov (Patrick Grother)
Date: Wed, 6 Nov 91 14:29:04 EST
Subject: Parallel MLP's
Message-ID: <9111061929.AA17129@magi.ncsl.nist.gov>

Parallel Multilayer Perceptron

We have implemented batch mode conjugate gradient and backprop
algorithms on an AMT 510C array processor (32 x 32 8 bit SIMD
elements). As you know, the weight update (i.e. a straight vector
operation done every epoch) is a tiny fraction of the total cost
since, for realistically large (non redundant = noisy) training sets,
the forward and backward propagation time is dominant. Given that this
applies to both conjugate gradient and backprop, and that conjgrad
typically converges in 30 times fewer iterations than backprop,
conjgrad is undeniably the way to do it.

On line involves the forward pass of a single input vector through the
weights. This involves a matrix*vector operation and a sigmoid (or
whatever) evaluation. The latter is purely parallel. The matrix
operation involves a broadcast, a parallel multiplication and a
recursive doubling sum over rows (or columns).

Batch (or semi batch) passes many vectors through and is thus a
matrix*matrix operation. The literature on this operation in parallel
is huge and the best algorithm is inevitably dependent on the
(communications bandwidth of the) particular machine and on the size
of the matrices. On the array processor the outer product accumulation
algorithm is up to 5 times quicker than the inner product algorithm:

Outer: Given two matrices W(HIDS,INPS), P(INPS,PATS) obtain F(HIDS,PATS) thus

   F = 0
   do i = 1, INPS {
      col_replicas = col_broadcast(W(,i))   # replicate col i over cols
      row_replicas = row_broadcast(P(i,))   # replicate row i over rows
      F = F + row_replicas * col_replicas
   }

Inner: As above except

   do i = 1, HIDS {
      col_replicas = col_broadcast(W(i,))         # replicate row i over cols
      F(i,) = sum_over_rows( P * col_replicas )   # sum up the cols (ie over rows)
   }

Henrik Klagges' weight matrix transposition in backprop is not really
necessary. The output error is backpropagated through the final layer
weights using the algorithm above; the difference is merely one of
selecting rows of the weight matrix instead of columns.

Outer: With weights W(HIDS,INPS) and output error F(HIDS,PATS) obtain
the back-propagated error P(INPS,PATS) thus

   P = 0
   do i = 1, HIDS {
      col_replicas = col_broadcast(W(i,))   # replicate row i of W over cols
      row_replicas = row_broadcast(F(i,))   # replicate row i of F over rows
      P = P + row_replicas * col_replicas
   }

On the DAP this operation is just as fast as explicitly doing the
transpose. Transposition can be speeded greatly if the matrix
dimensions are powers of two, but the operation is inexpensive
compared to matrix multiplication anyway, for any size matrix.

A "recall" forward pass through a 32:32:10 MLP with 24 bit floats is
taking 79 microsecs per input vector. Through a 128:64:10 takes 305
microsecs and a 1024:512:32 takes 1237. The latter is equivalent to
17.4 million connection-presentations per second. Such speed permits
MLPs, trained from many initial random weight positions, to be
optimised.

The on-line versus batch problem is still unclear and I think that a
semi batch, conjugate gradient method looks a good compromise, in
which case parallel code as above applies.
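A serial NumPy transcription of the outer-product accumulation above
(an added illustration, not DAP code), just to check the algebra: the
forward product and the back-propagated error both come out of the same
row/column selections, with no explicit transpose. The matrix sizes are
arbitrary.

import numpy as np

HIDS, INPS, PATS = 4, 6, 5
rng = np.random.default_rng(0)
W = rng.standard_normal((HIDS, INPS))
P = rng.standard_normal((INPS, PATS))

# Forward: F = W * P accumulated one outer product per input unit (columns of W).
F = np.zeros((HIDS, PATS))
for i in range(INPS):
    F += np.outer(W[:, i], P[i, :])
print(np.allclose(F, W @ P))           # True

# Backward: Pback = W^T * F accumulated the same way, selecting rows of W
# instead of columns, so the transpose is never formed.
Pback = np.zeros((INPS, PATS))
for i in range(HIDS):
    Pback += np.outer(W[i, :], F[i, :])
print(np.allclose(Pback, W.T @ F))     # True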
Patrick Grother patrick at magi.ncsl.nist.gov Image Recognition Group Advanced Systems Division Computer Systems Laboratory Room A216 Building 225 NIST Gaithersburg MD 20899 From jon at cs.flinders.oz.au Wed Nov 6 18:38:37 1991 From: jon at cs.flinders.oz.au (jon@cs.flinders.oz.au) Date: Thu, 07 Nov 91 10:08:37 +1030 Subject: Quickprop. Message-ID: <9111062338.AA03721@degas> My Quickprop algorithm is pretty close to what you describe here, excpet that it uses only the diagonal terms of the second derivative (i.e. it pretends that the weight updates do not affect one another). If you haven't seen the paper on this, it's in neuroprose as "fahlman.quickprop-tr.ps.Z" or something close to that. It works well -- in the few cases I have seen in which both quickprop and conjugate gradient were used on the same problems, quickprop is considerably faster (though in very high-dimensional spaces, CG might win). Yann LeCun has used a slightly different version of the same idea: he back-propagates second-derivative information for each case, and uses this to dynamically adjust the learning rate. - -- Scott Fahlman Thanks for the info. I'll grab your paper out of Neuroprose and give it a read. Have you also done anything on keeping the magnitude of the error vector constant? Doing this makes a lot of sense to me as it is only the direction of the next jump in weight space that is important, and in particular if one uses delta(w) = - alpha*grad(E) then flat regions cause very slow progress and steep regions may cause one to move too fast. delta(w) = -alpha*grad(E)/||grad(E)|| gives one a lot more control over the learning rate. Jon Baxter From hal at asi.com Wed Nov 6 16:37:22 1991 From: hal at asi.com (Hal McCartor) Date: Wed, 6 Nov 91 13:37:22 PST Subject: Efficient parallel Backprop Message-ID: <9111062137.AA22811@asi.com> In response to the recent question about running BP on parallel hardware: The Backpropagation algorithm can be run quite efficiently on parallel hardware by maintaining a transpose of the output layer weights on the hidden node processors and updating them in the usual manner so that they are always maintained as the exact transpose of the output layer weights. The error on the output nodes is broadcast to all hidden nodes simultaneously where each multiplies it by the appropriate transpose weight to accumulate an error sum. The transpose weights can also be updated in parallel making the whole process quite efficient. This technique is further explained in Advances in Neural Information Processing, Volume 3, page 1028, in the paper, Back Propagation Implementation on the Adaptive Solutions CNAPS Neurocomputer Chip. Hal McCartor From kirk at watson.ibm.com Wed Nov 6 19:32:36 1991 From: kirk at watson.ibm.com (Scott Kirkpatrick) Date: Wed, 6 Nov 91 19:32:36 EST Subject: NIPS Workshop housing Message-ID: Because of strong early demand for rooms, the block of rooms which we had held at the Marriott for NIPS workshop attendees sold out even before the Nov. 4 date mentioned in the brochure. In fact, the whole hotel is now sold out during the workshops. The Marriott has arranged for two hotels, each in the next block, to offer rooms at the conference rate or close to it. These are the Evergreen Hotel, (303)-476-7810 at $74/night, and L'Ostello, (303)-476-2050 at $79/night. If you were unable to get a room while this was being sorted out, or haven't reserved yet, call one of these. 
You can also call the published Marriott number, (303)-476-4444, for pointers to these or additional hotels, should we run out again. From white at teetot.acusd.edu Wed Nov 6 19:39:03 1991 From: white at teetot.acusd.edu (Ray White) Date: Wed, 6 Nov 91 16:39:03 -0800 Subject: No subject Message-ID: <9111070039.AA01514@teetot.acusd.edu> Yoshio Yamamoto wrote: > ... > Suppose you have two continuous input units whose data are normalized between > 0 and 1, several hidden units, and one continuous output unit. > Also suppose the input given to the input unit A is totally unrelated with the > output; the input is a randomized number in [0,1]. The input unit B, on the > other hand, has a strong correlation with its corresponding output. > Therefore what we need is a trained network such that it shows no correlation > between the input A and the output. ... > Why is this interesting? This is useful in practical problem. > Initially you don't know which input has correlation with the outputs > doesn't. So you use all available inputs anyway. If there is a nonsense > input, then it should be identified so by a neural network and the influence > from the input should be automatically suppressed. > The best solution we have in mind is that if no correlation were identified, > then the weights associated with the input will shrink to zero. > Is there any way to handle this problem? > As a training tool we assume a backprop. > Any suggestion will be greatly appreciated. > - Yoshio Yamamoto > General Robotics And Sensory Perception Laboratory (GRASP) > University of Pennsylvania In reading this, I infer that there is a problem in training the net with backprop - and then not getting the desired behavior. I'm not enough of a backprop person to know if that inference is correct. But in any case, why use backprop, when the desired behavior is a natural outcome of training the hidden units with an optimizing algorithm, something similar to Hebbian learning, but Hebbian learning modified so that the learning is correlated with the desired output function? An example of such an algorithm is Competitive Hebbian Learning, which will be published in the first (or maybe the second) 1992 issue of NEURAL NETWORKS (Ray White, Competitive Hebbian Learning: Algorithm and Demonstrations). One trains the hidden units to compete with each other as well as with the inverse of the desired output function. I've tried it on Boolean functions and it works, though I haven't tried the precise problem with real-valued inputs. Other optimizing "soft-competition" algorithms may also work. One should get the best results for the output layer by training it with a delta rule (not backprop, since only the output layer training is still needed). Competitive Hebbian Learning may work for the output layer as well, but one should get better convergence to the desired output with delta-rule training. Ray White (white at teetot.acusd.edu) Depts. of Physics & Computer Science University of San Diego From FRANKLIN%lem.rug.ac.be at BITNET.CC.CMU.EDU Thu Nov 7 13:29:00 1991 From: FRANKLIN%lem.rug.ac.be at BITNET.CC.CMU.EDU (Franklin Vermeulen) Date: Thu, 7 Nov 91 13:29 N Subject: No subject Message-ID: <01GCO3MBA5SG001FXI@BGERUG51.BITNET> Dear fellow researcher: I am looking for names (and coordinates) of people (preferably in Europe) knowledgeable in the fields of statistics/neural networks, with the aim of estimating gray scale images (e.g., in the field of subtractive radiology). Thank you for your kind consideration of this request. 
If you intend to react, please do not postpone your answer. Sincerely. Franklin L. Vermeulen, Ph.D. E-Mail: Franklin at lem.rug.ac.be Medical imaging group, Electronics and Metrology Lab, Universiteit Gent Sint Pietersnieuwstraat 41, B-9000 Gent (BELGIUM) +32 (91) 64-3367 (Direct dial-in) From HOLMSTROM at csc.fi Thu Nov 7 13:06:00 1991 From: HOLMSTROM at csc.fi (HOLMSTROM@csc.fi) Date: Thu, 7 Nov 91 13:06 EET Subject: Adding noise to training data Message-ID: A note to a comment by Tom English on this topic: He writes > At first blush, it seems there's a close relationship between Parzen > estimation (Duda & Hart 1973) and training with noise added to the > samples. If we were to use the noise function as the window function > in Parzen estimation of the distribution from which the training set > was drawn, wouldn't we would obtain precisely the noisy-sample > distribution? Yes, this is correct. > And wouldn't a network minimizing squared error for the noisy training > set asymptotically realize (i.e., as the number of noisy sample > presentations approaches infinity) the Parzen estimator? The > results of Hampshire and Perlmutter (1990) seem to be relevant here. As I said in my earlier note to John Hampshire's comment, there will be a paper in the January issue of IEEE Transactions on Neural Networks that gives a statistical analysis of using additive noise in training. Several asymptotic results are given there. Lasse Holmstrom From GINZBERG at TAUNIVM.TAU.AC.IL Wed Nov 6 16:30:22 1991 From: GINZBERG at TAUNIVM.TAU.AC.IL (Iris Ginzberg) Date: Wed, 06 Nov 91 16:30:22 IST Subject: Roommate for NIPS Message-ID: Dear Connectionists, I'm looking for a roommate for NIPS and/or the workshop. I'll arrive at Denver on Sunday, leave on Thursday. Arrive at Vale on Thursday, leave on Monday or Tuesday. ,,,Iris my e-mail is GINZBERG @ TAUNIVM.BITNET From John.Hampshire at SPEECH2.CS.CMU.EDU Thu Nov 7 09:21:43 1991 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Thu, 7 Nov 91 09:21:43 EST Subject: Adding noise to training data Message-ID: Actually, I made my original post on this topic to connectionists by accident... Oh well. Tom English is right --- Parzen windows do (in effect) estimate classification boundaries by convolving the noise PDF with the training sample PDF (a bunch of deltas...). Again, the goodness of the result depends on the PDF of the noise and its covariance matrix. The difference between Parzen windows and adding noise to data which is then used to train a (say) connectionist classifier is that both techniques formulate a new estimated PDF of the training data in the basis of the additive noise, but the connectionist model THEN goes on to try to model this new PDF in its connections and ITS set of basis functions. This is what seems desperate to me, and I didn't mean to impugn Parzen windows. I expected someone to catch me on that! -John From jcp at vaxserv.sarnoff.com Thu Nov 7 13:09:26 1991 From: jcp at vaxserv.sarnoff.com (John Pearson W343 x2385) Date: Thu, 7 Nov 91 13:09:26 EST Subject: NIPS-91 reminder Message-ID: <9111071809.AA00563@sarnoff.sarnoff.com> Now's the time to register for the NIPS-91 conference and workshops! 
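A quick numerical check of the Parzen-window point made by Holmstrom, Hampshire and English above: adding Gaussian noise to each training sample and looking at the resulting density is, in the limit of many noisy presentations, the same as forming the Parzen estimate whose window function is the noise PDF. The tiny data set and bandwidth below are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

samples = np.array([-1.0, 0.2, 0.9, 2.5])    # tiny made-up "training set"
sigma = 0.3                                   # std. dev. of the additive noise

def parzen(x):
    # Parzen estimate: mean of Gaussian kernels centred on the training samples.
    k = np.exp(-(x - samples[:, None])**2 / (2 * sigma**2))
    return (k / (sigma * np.sqrt(2 * np.pi))).mean(axis=0)

# "Noisy-sample" density: a histogram of many jittered copies of the same set.
noisy = (samples[:, None] + sigma * rng.standard_normal((len(samples), 200000))).ravel()
hist, edges = np.histogram(noisy, bins=60, range=(-2.5, 4.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])

# The two densities agree, up to Monte Carlo and binning error.
print(np.max(np.abs(parzen(centres) - hist)))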
For those of you who don't know about NIPS, read on: NEURAL INFORMATION PROCESSING SYSTEMS: NATURAL AND SYNTHETIC Conference: Monday, December 2 - Thursday, December 5, 1991; Denver, Colorado Workshop: Friday, December 6 - Saturday, December 7, 1991; Vail, Colorado This is the fifth meeting of an inter-disciplinary conference which brings together neuroscientists, engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in all aspects of neural processing and computation. There will be an afternoon of tutorial presentations (Dec 2) preceding the regular session and two days of focused workshops will follow at a nearby ski area (Dec 6-7). The meeting is sponsored by the Institute of Electrical and Electronic Engineers Information Theory Group, the Society for Neuroscience, and the American Physical Society. Plenary, contributed, and poster sessions will be held. There will be no parallel sessions. The full text of presented papers will be published. Topical categories include: Neuroscience; Theory; Implementation and Simulations; Algorithms and Architectures; Cognitive Science and AI; Visual Processing; Speech and Signal Processing; Control, Navigation, and Planning; Applications. The format of the workshop is informal. Beyond reporting on past research, the goal is to provide a forum for scientists actively working in the field to freely discuss current issues of concern and interest. Sessions will meet in the morning and in the afternoon of both days, with free time in between for the ongoing individual exchange or outdoor activities. Specific open and/or controversial issues are encouraged and preferred as workshop topics. The deadline for submission of abstracts and workshop proposals is May 17th, 1991. For further information concerning the conference contact Dr. Stephen J. Hanson; NIPS*91 Information; Siemens Research Center; 755 College road East; Princeton NJ, 08540 From english at sun1.cs.ttu.edu Thu Nov 7 16:29:00 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Thu, 7 Nov 91 15:29:00 CST Subject: Adding noise to training data Message-ID: <9111072129.AA11674@sun1.cs.ttu.edu> With regard to the relationship of Parzen estimation and training with noise added to samples, John Hampshire writes > ... but the connectionist model THEN goes on to try to model > this new PDF in its connections and ITS set of basis functions. > This is what seems desperate to me.... In a sense, it IS desperate. But an important problem for direct-form implementations of Parzen estimators (e.g., Specht 1990) is the storage requirements. Adding noise to the training samples and training by back-propagation may be interpreted as a time-expensive approach to obtaining a space-economical Parzen estimator. (I'm assuming that the net requires less memory than direct-form implementations). Of course, we don't know in advance if a given network is equipped to realize the Parzen estimator. I suspect that someone could produce a case in which the "universal approximator" architecture (Hornik, Stinchcombe, and White 1989) would achieve a reasonable approximation only by using more memory than the direct-form implementation. My thanks to John for (accidentally) posting some interesting comments. --Tom English Specht, D. 1990. Probabilistic neural networks and the polynomial adaline as complementary techniques for classification. IEEE Tran. Neural Networks 1 (1), pp. 111-21. Hornik, K., Stinchcombe, M., and White, H. 1989. 
Multilayer feedforward nets are universal approximators. Neural Networks 2, pp. 359-66.

From patrick at magi.ncsl.nist.gov Thu Nov 7 16:38:15 1991 From: patrick at magi.ncsl.nist.gov (Patrick Grother) Date: Thu, 7 Nov 91 16:38:15 EST Subject: Scaled Conjugate Gradient Message-ID: <9111072138.AA22416@magi.ncsl.nist.gov>

The factor of 30 speed up of conjugate gradient over backprop that I quoted in my piece of November 6 is due to an excellent scaled conjugate gradient algorithm from Martin Moeller. Some conjugate gradient algorithms have been criticised on the grounds that a costly line search is performed per epoch. Moeller's method sidesteps this expense by means of an automatic Levenberg-Marquardt step size scaling at each iteration. This effectively regulates the indefiniteness of the Hessian matrix. Patrick Grother Advanced Systems Division NIST November 7

From at neural.att.com Thu Nov 7 17:36:00 1991 From: at neural.att.com (@neural.att.com) Date: Thu, 07 Nov 91 17:36:00 -0500 Subject: Efficient parallel Backprop In-Reply-To: Your message of Wed, 06 Nov 91 13:37:22 -0800. Message-ID: <9111072236.AA03913@lamoon>

The trick mentioned by Hal McCartor, which consists in storing each weight twice (one copy in the processor that takes care of the presynaptic unit, and one copy in the processor that takes care of the postsynaptic unit) and updating them both independently, is probably one of the best techniques. It does require making strong assumptions about the architecture of the network, and only costs you a factor of two in efficiency. It requires broadcasting the states, but in most cases there are far fewer states than weights. Unfortunately, it does not work so well in the case of shared-weight networks. I first heard about it from Leon Bottou (then at University of Paris-Orsay) in 1987. This trick was used in the L-NEURO backprop chip designed by Marc Duranton at the Philips Labs in Paris. -- Yann Le Cun

From thierry at elen.utah.edu Thu Nov 7 18:23:09 1991 From: thierry at elen.utah.edu (Thierry Guillerm) Date: Thu, 7 Nov 91 16:23:09 MST Subject: Instantaneous and Average performance measures Message-ID: <9111072323.AA20758@perseus.elen.utah.edu>

ABOUT INSTANTANEOUS AND AVERAGE PERFORMANCE MEASURES: A gradient descent learning based on an estimate of the performance measure U(w) (w = weights) can be represented as

    dw = -a0 grad( est[U(w)] ) dt

where a0 is the step size, w represents the weights, and t the time. The usual technique of moving the weights for each training sample can be represented as

    dw = -a0 grad( L(w,z) ) dt

where z represents the training sample and L(w,z) is the instantaneous performance measure. A good point about using an instantaneous performance measure L(w,z) in the gradient descent (instead of waiting a few epochs to estimate U(w) and then update the weights) is that noise is inherently added to the process. Under some conditions (which ones?), the instantaneous learning can be rewritten as

    dw = -a0 grad( U(w) ) dt + b0 dx

where x is a standard Brownian motion. This equation represents a diffusion process, which can be viewed as shaking the movement of the current weight point in the weight space. It is known that this process is a simulated annealing process. It is suspected that a minimum obtained this way will be better than with an averaging method. Has somebody done work on the quality of a solution obtained after a given time of running BackProp, or simulated annealing? Are there quantitative results about how long it takes to reach a given quality of solution?
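A small numerical illustration of the last equation above, dw = -a0 grad(U(w)) dt + b0 dx: a crude Euler discretization on an invented one-dimensional double-well error surface, with the noise amplitude slowly decayed as in simulated annealing. The surface, constants and schedule are all made up; the point is only that the noisy iterate can hop out of the shallow well, while the noise-free one cannot.

import numpy as np

rng = np.random.default_rng(2)

U      = lambda w: (w**2 - 1.0)**2 + 0.3 * w       # two wells; the one near -1 is deeper
grad_U = lambda w: 4.0 * w * (w**2 - 1.0) + 0.3

lr, steps = 0.02, 20000

# Averaged (noise-free) gradient descent: stays in whichever well it starts in.
w_plain = 1.0
for _ in range(steps):
    w_plain -= lr * grad_U(w_plain)

# Euler discretization of dw = -grad(U) dt + b dx, with the noise amplitude
# slowly decayed, i.e. a crude simulated-annealing schedule.
w_noisy, b = 1.0, 0.8
for _ in range(steps):
    w_noisy += -lr * grad_U(w_noisy) + b * np.sqrt(lr) * rng.standard_normal()
    b *= 0.9998

print("plain descent:", w_plain, " U =", U(w_plain))   # shallow minimum near +1
print("with noise   :", w_noisy, " U =", U(w_noisy))   # typically the deeper minimum near -1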
send email to: thierry at signus.utah.edu From alan_harget at pphub.aston.ac.uk Thu Nov 7 07:55:56 1991 From: alan_harget at pphub.aston.ac.uk (Alan Harget) Date: Thu, 7 Nov 91 12:55:56 GMT Subject: Vacancies Message-ID: Date 7/11/91 Subject Vacancies From Alan Harget To Connect Bulletin, Midlands KBS Subject: Time:12:52 am OFFICE MEMO Vacancies Date:7/11/91 ASTON UNIVERSITY BIRMINGHAM, ENGLAND DEPARTMENT OF COMPUTER SCIENCE AND APPLIED MATHEMATICS READERSHIPS/LECTURESHIPS IN COMPUTER SCIENCE Candidates are sought for the above posts to strengthen and expand the work of the Department. The key requirement is a proven ability to undertake high-quality research in Computer Science. Applicants for Readerships will have a substantial research record and have made a significant impact in their chosen area. Potential Lecturers will already have a publication record that demonstrates their research ability. Although areas of special interest are Neural Networks, Software Engineering, Database Techniques and Artificial Intelligence, excellent candidates in other fields will be considered. Anyone wishing to discuss the posts informally may call Professor David Bounds, Head of Department. Tel: 021-359-3611, ext 4285. Application forms and further particulars may be obtained from the Personnel Officer (Academic Staff), Aston University, Aston Triangle, Birmingham B4 7ET. Please quote appropriate Ref No: Readership (9139/EB); Lectureship (9140/EB). 24-hr answerphone: 021-359 -0870. Facsimile: 021-359-6470. Closing date for receipt of applications is 22 November 1991. From joe at cogsci.edinburgh.ac.uk Thu Nov 7 11:23:49 1991 From: joe at cogsci.edinburgh.ac.uk (Joe Levy) Date: Thu, 07 Nov 91 16:23:49 +0000 Subject: Research Post in Edinburgh UK Message-ID: <17169.9111071623@muriel.cogsci.ed.ac.uk> Research Assistant in Neural Networks University of Edinburgh Department of Psychology Applications are invited for a three year computational project studying the effects of damage on the performance of neural networks and other distributed systems, in order to elucidate the inferences that can be made about normal cognitive function from the patterns of breakdown in brain damaged patients. Previous experience in cognitive psychology and/or neural networks is desirable, preferably at the doctoral level. The post is funded by the Joint Councils Initiative in Cognitive Science/HCI awared to Dr Nick Chater. The closing date is 3 December, and the starting date will be between 1 January and 1 March 1992. A covering letter, CV and the names and addresses of 3 referees should be sent to University of Edinburgh, Personnel Department 1 Roxburgh Street, Edinburgh EH8 9TB. Informal enquiries to Nick Chater email: nicholas%cogsci.ed.ac.uk at nsfnet-relay.ac.uk From kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET Fri Nov 8 10:02:15 1991 From: kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET (Nitin Indurkhya) Date: Fri, 8 Nov 91 10:02:15 JST Subject: second derivatives Message-ID: <9111080102.AA17565@hcrlgw.crl.hitachi.co.jp> > Another idea is to calculate the matrix of second derivatives (grad(grad E)) as > well as the first derivatives (grad E) and from this information calculate the > (unique) parabolic surface in weight space that has the same derivatives. Then > the weights should be updated so as to jump to the center (minimum) of the > parabola. I haven't coded this idea yet, has anyone else looked at this kind > of thing, and if so what are the results? 
>-- Scott Fahlman

I don't know about the exact same idea or of the method used by Le Cun, but Dr. Sholom Weiss of Rutgers University (weiss at cs.rutgers.edu) has developed an efficient method for calculating the second derivatives using Monte Carlo methods. The second derivatives are then used within a stiff differential equation solver to optimize the weights by solving the BP differential equations directly. The results on different datasets (e.g., the Peterson-Barney vowel dataset and Robinson's vowel dataset) are superior to other results not only in terms of training, but also in terms of generalization. --nitin

From uli at ira.uka.de Fri Nov 8 12:36:02 1991 From: uli at ira.uka.de (Uli Bodenhausen) Date: Fri, 08 Nov 91 18:36:02 +0100 Subject: post NIPS workshop on speech Message-ID:

-------------------------------------------------------------------
Optimization of Neural Network Architectures for Speech Recognition
-------------------------------------------------------------------
Dec. 7, 1991, Vail, Colorado

Uli Bodenhausen, Universitaet Karlsruhe
Alex Waibel, Carnegie Mellon University

A variety of neural network algorithms have recently been applied to speech recognition tasks. Besides having learning algorithms for the weights, optimization of the network architectures is required to achieve good performance. Also of critical importance is the optimization of neural network architectures within hybrid systems for best performance of the system as a whole. Parameters that have to be optimized within these constraints include the number of hidden units, number of hidden layers, time-delays, connectivity within the network, input windows, the number of network modules, number of states and others. The workshop intends to discuss and evaluate the importance of these architectural parameters and different integration strategies for speech recognition systems. Participants are welcome to present short case studies on the optimization of neural networks, preferably with an evaluation of the optimization steps. It would also be nice to hear about some rather unconventional techniques of optimization (as long as it's not voodoo or the 'shake the disk during compilation' type of technique). The workshop could also be of interest to researchers working on constructive/destructive learning algorithms, because the relevance of different architectural parameters should be considered for the design of these algorithms.

The following speakers have already confirmed their participation:
Kenichi Iso, NEC Corporation, Japan
Patrick Haffner, CNET, France
Mike Franzini, Telefonica I + D, Spain
Allen Gorin, AT&T, USA
Yoshua Bengio, MIT
-----------------------------------------------------------------------
Further contributions are welcome. Please send mail to uli at ira.uka.de or uli at speech2.cs.cmu.edu.
------------------------------------------------------------------------

From white at teetot.acusd.edu Fri Nov 8 13:02:34 1991 From: white at teetot.acusd.edu (Ray White) Date: Fri, 8 Nov 91 10:02:34 -0800 Subject: No subject Message-ID: <9111081802.AA12659@teetot.acusd.edu>

In reply to:
> Manoel Fernando Tenorio
> --------
> How is that different from Sanger's principal component algorithm? (NIPS,90).
> --ft.
> Pls send answer to the net.

(Where "that" refers to my 'Competitive Hebbian Learning', to be published in Neural Networks, 1992, in response to Yoshio Yamamoto.) The Sanger paper that I think of in this connection is the 'Neural Networks' paper:
T. Sanger (1989) Optimal unsupervised learning..., Neural Networks, 2, 459-473.

There is certainly some relation, in that each is a modification of Hebbian learning. And I would think that one could also apply Sanger's algorithm to Yoshio Yamamoto's problem - training hidden units to ignore input components which are uncorrelated with the desired output. As I understand it, Sanger's 'Generalized Hebbian learning' trains units to find, successively, the principal components of the input, starting with the most important and working on down, depending on the number of units you use. Competitive Hebbian Learning, on the other hand, is a simpler algorithm which trains units to learn simultaneously (approximately) orthogonal linear combinations of the components of the input. With this algorithm, one does not get the principal components nicely separated out, but one does get trained units of roughly equal importance. For those interested, there is a shorter preliminary version of the paper in Jordan Pollack's neuroprose archive, where it is called white.comp-hebb.ps.Z. Unfortunately that version does not include the Boolean application which Yoshio Yamamoto's query suggested. Ray White (white at teetot.acusd.edu) Depts. of Physics & Computer Science University of San Diego

From shams at maxwell.hrl.hac.com Fri Nov 8 14:43:23 1991 From: shams at maxwell.hrl.hac.com (shams@maxwell.hrl.hac.com) Date: Fri, 8 Nov 91 11:43:23 PST Subject: "real-world" applications of neural nets Message-ID: <9111081943.AA27842@maxwell.hrl.hac.com>

Dear Connectionists, We are looking for "real-world" applications of neural networks to be used as benchmarks for evaluating the performance of our neurocomputer architecture. In particular, we are looking for applications using more than 200 neurons and a structured interconnection network, where the network is made up of several smaller components. The primary goal of our research is to demonstrate the effectiveness of our architecture in implementing neural networks without the assumption of full interconnectivity between the neurons. Any references to published work or a brief description of a specific network structure would be greatly appreciated. Soheil Shams shams at maxwell.hrl.hac.com

From announce at park.bu.edu Wed Nov 6 09:58:03 1991 From: announce at park.bu.edu (announce@park.bu.edu) Date: Wed, 6 Nov 91 09:58:03 -0500 Subject: Courses and Conference on Neural Networks, May 1992, Boston University Message-ID: <9111061458.AA17643@fenway.bu.edu>

BOSTON UNIVERSITY NEURAL NETWORK COURSES AND CONFERENCE

COURSE 1: INTRODUCTION AND FOUNDATIONS May 9 - 12, 1992 A systematic introductory course on neural networks.
COURSE 2: RESEARCH AND APPLICATIONS May 12 - 14, 1992 Eight tutorials on current research and applications.
CONFERENCE: NEURAL NETWORKS FOR LEARNING, RECOGNITION, AND CONTROL MAY 14 - 16, 1992 An international research conference presenting INVITED and CONTRIBUTED papers.

Sponsored by Boston University's Wang Institute, Center for Adaptive Systems, and Department of Cognitive and Neural Systems, with partial support from the Air Force Office of Scientific Research.

NEURAL NETWORK COURSES May 9 - 14, 1992
This self-contained, systematic, five-day course is based on the graduate curriculum in the technology, computation, mathematics, and biology of neural networks developed at the Center for Adaptive Systems (CAS) and the Department of Cognitive and Neural Systems (CNS) at Boston University.
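As an aside on the comparison a few messages back between Sanger's algorithm and Competitive Hebbian Learning, here is a compact sketch of Sanger's generalized Hebbian rule, dW = lr*(y x^T - LT[y y^T] W) with LT the lower-triangular part, run on an invented correlated Gaussian data set (the data, network size and learning constants are arbitrary choices for illustration). Its rows converge, in order, to the leading principal components, the 'successive' behaviour Ray White describes.

import numpy as np

rng = np.random.default_rng(1)

# Invented zero-mean, correlated 3-D Gaussian data.
C = np.array([[3.0, 1.0, 0.5],
              [1.0, 2.0, 0.3],
              [0.5, 0.3, 1.0]])
X = rng.standard_normal((5000, 3)) @ np.linalg.cholesky(C).T

k, d = 2, 3                               # extract the two leading components
W = 0.1 * rng.standard_normal((k, d))     # rows = the k linear output units
lr = 1e-3
for _ in range(20):                       # a few passes over the data
    for x in X:
        y = W @ x
        W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)   # Sanger's rule

# Rows of W should line up (up to sign) with the top eigenvectors of the
# sample covariance, in order of decreasing eigenvalue.
_, eigvecs = np.linalg.eigh(np.cov(X.T))
print(np.abs(W @ eigvecs[:, ::-1][:, :k]))   # approximately the identity matrix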
This year's curriculum refines and updates the successful course held at the Wang Institute of Boston University in May 1990 and 1991. A new two-course format permits both beginners and researchers to participate. The course will be taught by CAS/CNS faculty, as well as by distinguished guest lecturers at the beautiful and superbly equipped campus of the Wang Institute. An extraordinary range and depth of models, methods, and applications will be presented. Interaction with the lecturers and other participants will continue at the daily discussion sessions, meals, receptions, and coffee breaks that are included with registration. At the 1990 and 1991 courses, participants came from many countries and from all parts of the United States. Course Faculty from Boston University are Stephen Grossberg, Gail Carpenter, Ennio Mingolla, and Daniel Bullock. Course guest lecturers are John Daugman, Federico Faggin, Michael I. Jordan, Eric Schwartz, Alex Waibel, and Allen Waxman. COURSE 1 SCHEDULE SATURDAY, MAY 9, 1992 4:00 - 6:00 P.M. Registration 5:00 - 7:00 P.M. Reception SUNDAY, MAY 10, 1992 Professor Grossberg: Historical Overview, Cooperation and Competition, Content Addressable Memory, and Associative Learning. Professors Carpenter and Mingolla: Neocognitron, Perceptrons, and Introduction to Back Propagation. Professor Grossberg and Mingolla: Adaptive Pattern Recognition. MONDAY, MAY 11, 1992 Professor Grossberg: Introduction to Adaptive Resonance Theory. Professor Carpenter: ART 1, ART 2, and ART 3. Professors Grossberg and Mingolla: Vision and Image Processing. Professors Bullock and Grossberg: Adaptive Sensory-Motor Control and Robotics TUESDAY, MAY 12, 1992 Professor Bullock and Grossberg: Adaptive Sensory-Motor Control and Robotics, continued. Professor Grossberg: Speech Perception and Production, Reinforcement Learning and Prediction. End of Course 1 (12:30 P.M.) COURSE 2 SCHEDULE TUESDAY, MAY 12, 1992 11:30 A.M. - 1:30 P.M. Registration Professor Carpenter: Fuzzy Artmap. Dr. Waxman: Learning 3-D Objects from Temporal Sequences. WEDNESDAY, MAY 13, 1992 Professor Jordan: Recent Developments in Supervised Learning. Professor Waibel: Speech Recognition and Understanding. Professor Grossberg: Vision, Space, and Action. Professor Daugman: Signal Processing in Neural Networks. THURSDAY, MAY 14, 1992 Professor Schwartz: Active Vision. Dr. Faggin: Practical Implementation of Neural Networks. End of Course 2 (12:30 P.M.) RESEARCH CONFERENCE NEURAL NETWORKS FOR LEARNING, RECOGNITION, AND CONTROL MAY 14-16, 1992 This international research conference on topics of fundamental importance in science and technology will bring together leading experts from universities, government, and industry to present their results on learning, recognition, and control, in invited lectures and contributed posters. Topics range from cognitive science and neurobiology through computational modeling to technological applications. CALL FOR PAPERS: A featured poster session on neural network research related to learning, recognition, and control will be held on May 15, 1992. Attendees who wish to present a poster should submit three copies of an abstract (one single-spaced page), postmarked by March 1, 1992, for refereeing. Include a cover letter giving the name, address, and telephone number of the corresponding author. Mail to: Poster Session, Neural Networks Conference, Wang Institute of Boston University, 72 Tyng Road, Tyngsboro, MA 01879. Authors will be informed of abstract acceptance by March 31, 1992. 
A book of lecture and poster abstracts will be given to attendees at the conference. CONFERENCE PROGRAM THURSDAY, MAY 14, 1992 2:00 P.M. - 5:00 P.M. Registration 3:00 P.M. - 5:00 P.M. Reception Professor Richard Shiffrin, Indiana University: "The Relationship between Composition/Distribution and Forgetting" Professor Roger Ratcliff, Northwestern University: "Evaluating Memory Models" Professor David Rumelhart, Stanford University: "Learning and Generalization in a Connectionist Network" FRIDAY, MAY 15, 1992 Dr. Mortimer Mishkin, National Institute of Mental Health: "Two Cerebral Memory Systems" Professor Larry Squire, University of California, San Diego: "Brain Systems and the Structure of Memory" Professor Stephen Grossberg, Boston University, "Neural Dynamics of Adaptively Timed Learning and Memory" Professor Theodore Berger, University of Pittsburgh: "A Biological Neural Model for Learning and Memory" Professor Mark Bear, Brown University: "Mechanisms for Experience- Dependent Modification of Visual Cortex" Professor Gail Carpenter, Boston University: "Supervised Learning by Adaptive Resonance Networks" Dr. Allen Waxman, MIT Lincoln Laboratory: "Neural Networks for Mobile Robot Visual Navigation and Conditioning" Dr. Thomas Caudell, Boeing Company: "The Industrial Application of Neural Networks to Information Retrieval and Object Recognition at the Boeing Company" POSTER SESSION (Three hours) SATURDAY, MAY 16, 1992 Professor George Cybenko, University of Illinois: "The Impact of Memory Technology on Neurocomputing" Professor Eduardo Sontag, Rutgers University: "Some Mathematical Results on Feedforward Nets: Recognition and Control" Professor Roger Brockett, Harvard University: "A General Framework for Learning via Steepest Descent" Professor Barry Peterson, Northwestern University Medical School: "Approaches to Modeling a Plastic Vestibulo-ocular Reflex" Professor Daniel Bullock, Boston University: "Spino-Cerebellar Cooperation for Skilled Movement Execution" Dr. James Albus, National Institute of Standards and Technology: "A System Architecture for Learning, Recognition, and Control" Professor Kumpati Narendra, Yale University: "Adaptive Control of Nonlinear Systems Using Neural Networks" Dr. Robert Pap, Accurate Automation Company: "Neural Network Control of the NASA Space Shuttle Robot Arm" Discussion End of Research Conference (5:30 P.M.) HOW TO REGISTER: ...To register by telephone, call (508) 649-9731 (x 255) with VISA or Mastercard between 8:00-5:00 PM (EST). ...To register by fax, complete and fax back the Registration Form to (508) 649-6926. ...To register by mail, complete the registration form below and mail it with your payment as directed. ON-SITE REGISTRATION: Those who wish to register for the courses and the research conference on-site may do so on a space-available basis. SITE: The Wang Institute of Boston University possesses excellent conference facilities in a beautiful 220-acre setting. It is easily reached from Boston's Logan Airport and Route 128. 
REGISTRATION FORM: NEURAL NETWORKS COURSES AND CONFERENCE MAY 9-16, 1992 Name: ______________________________________________________________ Title: _____________________________________________________________ Organization: ______________________________________________________ Address: ___________________________________________________________ City: ____________________________ State: __________ Zip: __________ Country: ___________________________________________________________ Telephone: _______________________ FAX: ____________________________ Regular Attendee Full-time Student Course 1 ( ) $650 N/A Course 2 ( ) $650 N/A Courses 1 and 2 ( ) $985 ( )$275* Conference ( ) $110 ( ) $75* * Limited number of spaces. Student registrations must be received by April 15, 1992. Total payment:______________________________________________________ Form of payment: ( ) Check or money order (payable in U.S. dollars to Boston University). ( ) VISA ( ) Mastercard #_________________________________Exp.Date:__________________ Signature (as it appears on card): __________________________ Return to: Neural Networks Wang Institute of Boston University 72 Tyng Road Tyngsboro, MA 01879 YOUR REGISTRATION FEE INCLUDES: COURSES CONFERENCE Lectures Lectures Course notebooks Poster session Evening discussion sessions Book of lecture & poster Saturday reception abstracts Continental breakfasts Thursday reception Lunches Two continental breakfasts Dinners Two lunches Coffee breaks One dinner Coffee breaks CANCELLATION POLICY: Course fee, less $100, and the research conference fee, less $60, will be refunded upon receipt of a written request postmarked before March 31, 1992. After this date no refund will be made. Registrants who do not attend and who do not cancel in writing before March 31, 1991 are liable for the full amount of the registration fee. You must obtain a cancellation number from the registrar in order to make the cancellation valid. HOTEL RESERVATIONS: ...Sheraton Tara, Nashua, NH (603) 888-9970, $60/night, plus tax (single or double). ...Best Western, Nashua, NH (603) 888-1200, $44/night, single, plus tax, $49/night, double, plus tax. ...Stonehedge Inn, Tyngsboro, MA, (508) 649-4342, $84/night, plus tax (single or double). The special conference rate applies only if you mention the name and dates of the meeting when making the reservation. The hotels in Nashua are located approximately five miles from the Wang Institute; shuttle bus service will be provided to them. AIRLINE DISCOUNTS: American Airlines is the official airline for Neural Networks. Receive 45% off full fare with at least seven days advance purchase or 5% off discount fares. A 35% discount applies on full fare flights from Canada with an advance purchase of at least seven days. Call American Airlines Meeting Services Desk at (800) 433-1790 and be sure to reference STAR#SO252AM. Some restrictions apply. STUDENT REGISTRATION: A limited number of spaces at the courses and conference have been reserved at a subsidized rate for full-time students. These spaces will be assigned on a first-come, first-served basis. Completed registration form and payment for students who wish to be considered for the reduced student rates must be received by April 15, 1992. From steve at cogsci.edinburgh.ac.uk Fri Nov 8 13:34:00 1991 From: steve at cogsci.edinburgh.ac.uk (Steve Finch) Date: Fri, 08 Nov 91 18:34:00 +0000 Subject: Announcement of paper on learning syntactic categories. 
Message-ID: <16311.9111081834@scott.cogsci.ed.ac.uk>

I have submitted a copy of a paper Nick Chater and I have written to the neuroprose archive. It details a hybrid system comprising a statistically motivated network and a symbolic clustering mechanism which together automatically classify words into a syntactic hierarchy, by imposing a similarity metric over the contexts in which they are observed to have occurred in USENET newsgroup articles. The resulting categories are very linguistically intuitive. The abstract follows:

Symbolic and neural network architectures differ with respect to the representations they naturally handle. Typically, symbolic systems use trees, DAGs, lists and so on, whereas networks use high-dimensional vector spaces. Network learning methods may therefore appear to be inappropriate in domains, such as natural language, which are naturally modelled using symbolic methods. One reaction is to argue that network methods are able to {\it implicitly} capture this symbolic structure, thus obviating the need for explicit symbolic representation. However, we argue that the {\it explicit} representation of symbolic structure is an important goal, and can be learned using a hybrid approach, in which statistical structure extracted by a network is transformed into a symbolic representation. We apply this approach at several levels of linguistic structure, using as input unlabelled orthographic, phonological and word-level strings. We derive linguistically interesting categories such as `noun', `verb', `preposition', and so on from unlabeled text.

To get it by anonymous ftp, type
    ftp archive.cis.ohio-state.edu
When asked for a login name type anonymous; when asked for a password type neuron. Then type
    cd pub/neuroprose
    binary
    get finch.hybrid.ps.Z
    quit
Then uncompress it and lpr it.
------------------------------------------------------------------------------
Steven Finch Phone: +44 31 650 4435 | University of Edinburgh

From hwang at pierce.ee.washington.edu Sun Nov 10 14:14:44 1991 From: hwang at pierce.ee.washington.edu ( J. N. Hwang) Date: Sun, 10 Nov 91 11:14:44 PST Subject: IJCNN'91 Singapore Advanced Program Message-ID: <9111101914.AA08065@pierce.ee.washington.edu.>

For those of you who didn't receive the Advanced Program of IJCNN'91 Singapore, here is a brief summary of the program. Jenq-Neng Hwang, Publicity/Technical Committee IJCNN'91
----------------------------------------------------------
IJCNN'91 Singapore Preliminary Program Overview:

Sunday, 11/17/91
 4:00 pm -- 7:00 pm  Registration

Monday, 11/18/91
 8:00 am -- 5:30 pm  Registration
 9:00 am -- 4:00 pm  Tutorials
   1) Weightless Neural Nets (NNs)  2) Neural Computation: From Brain Research to Novel Computers
   3) Fuzzy Logic & Computational NNs  4) Neural Computing & Pattern Recognition
   5) Morphology of Biological Vision  6) Cancelled
   7) Successful NN Parallel Computing  8) A Logical Topology of NN

Tuesday, 11/19/91
 7:30 am -- 5:30 pm  Registration
 8:00 am -- 9:00 am  Plenary Session (T. Kohonen)
 9:15 am -- 12:15 pm  Technical Sessions
   1) Associative Memory (I)  2) Neurocognition (I)  3) Hybrid Systems (I)
   4) Supervised Learning (I)  5) Applications (I)  6) Image Processing/Maths (Poster 1)
 1:15 pm -- 3:15 pm  Technical Sessions
   1) Neurophysiology/Invertebrate  2) Sensation and Perception  3) Hybrid Systems (II)
   4) Supervised Learning (II)  5) Applications (II)  6) Supervised Learning (Poster 2)
 3:30 pm -- 6:00 pm  Technical Sessions
   1) Electrical Neurocomputer (I)  2) Image Processing (I)  3) Hybrid Systems (III)
   4) Supervised Learning (III)  5) Applications (III)
 6:00 pm -- 7:30 pm  Panel Discussion (G. Deboeck): Financial Applications of NNs

Wednesday, 11/20/91
 7:00 am -- 5:30 pm  Registration
 8:00 am -- 9:00 am  Plenary Session (Y. Nishikawa)
 9:15 am -- 12:15 pm  Technical Sessions
   1) Optimization (I)  2) Image Processing (II)  3) Robotics (I)
   4) Supervised Learning (IV)  5) Applications (IV)  6) Applications (Poster 3)
 1:15 pm -- 3:15 pm  Technical Sessions
   1) Mathematical Methods (I)  2) Machine Vision  3) Sensorimotor Control Systems
   4) Supervised Learning (V)  5) Applications (V)  6) Robotics (Poster 4)
 3:30 pm -- 6:00 pm  Technical Sessions
   1) Neurocomputer/Associative Memory  2) Neurocognition (II)  3) Unsupervised Learning (II)
   4) Supervised Learning (VI)  5) Applications (VI)
 5:00 pm -- 6:30 pm  Industrial Panel (Tom Caudell)
 7:00 pm -- 10:00 pm  IJCNN'91 Banquet

Thursday, 11/21/91
 7:30 am -- 12:15 pm  Registration
 8:00 am -- 9:00 am  Plenary Session (K. S. Narendra)
 9:15 am -- 12:15 pm  Technical Sessions
   1) Electrical Neurocomputer (II)  2) Neuro-Dynamics (I)  3) Robotics (II)
   4) Supervised Learning (VII)  5) Applications (VII)  6) Neurocomputers (Poster 5)
 1:15 pm -- 3:15 pm  Technical Sessions
   1) Associative Memory  2) Mathematical Methods (II)  3) Neuro-Dynamics (II)
   4) Supervised/Unsupervised Learning  5) Applications (VIII)  6) Optimization/Associative Memory (Poster 6)
 3:30 pm -- 6:00 pm  Technical Sessions
   1) Optimization (II)  2) Machine Vision (II)  3) Mathematical Methods (III)
   4) Unsupervised Learning (III)  5) Mathematical Methods/Supervised Learning

Welcome Reception: All speakers, authors, delegates, including students, and one-day registrants are invited to the "Welcome Reception" on the 18th November. Full details will be included in the Final Program.

Conference Registration (US$):
                    Members   Non-Members   Students (no Proc.)
  Before 8/31/91     $240       $280          $100
  After 8/31/91      $280       $330          $120
  On Site            $330       $380          $140

Tutorial Registration (only for registered conference participants), per tutorial:
  Pre-Register  US$120
  On Site       US$140
  Students      US$30

Conference Proceedings: Additional copies of the proceedings are available at the Conference at US$100.00 per set. Rates do not include postage and handling charges.

Travel Information: Please contact TRADEWINDS PTE LTD, 77 Robinson Road, #02-06 SIA Building, Singapore 0106. TEL: (65) 322-6845, FAX: (65) 224-1198. Attn: Ms Julie Gan

Banquet Information: 7:00 pm, 20th November 1991, Westin Stamford/Westin Plaza. An excellent 9-course Chinese Dinner will be served. Additional tickets: US$42.00 from the IJCNN'91 Secretariat.

From carlos at dove.caltech.edu Sun Nov 10 18:10:40 1991 From: carlos at dove.caltech.edu (Carlos Brody-Pellicer) Date: Sun, 10 Nov 91 15:10:40 PST Subject: Roommate for NIPS Message-ID: <9111102310.AA07033@dove.caltech.edu>

Would anybody out there like to share a room for NIPS? I'm going to arrive on Saturday and leave on Thursday (but I would expect us to share costs Sun-Thurs only). Please let me know if you are interested.
-Carlos (carlos at dove.caltech.edu) From pollack at cis.ohio-state.edu Mon Nov 11 16:25:52 1991 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Mon, 11 Nov 91 16:25:52 -0500 Subject: POSTNIPS cognitive science workshop Message-ID: <9111112125.AA11761@dendrite.cis.ohio-state.edu> A neuro-engineering friend claimed that cognitive scientists feared "steep gradient descents," and THAT'S why they didn't come to the post-NIPS workshops! I countered that there were just no topics which made strong enough attractors... ------------------------------------------------------ Modularity in Connectionist Models of Cognition Friday November 6th, Vectorized AI Laboratory, Colorado ------------------------------------------------------ Organizer: Jordan Pollack Confirmed Speakers: Michael Mozer Robert Jacobs John Barnden Rik Belew (There is room for a few more people to have confirmed 15 minute slots, but half of the workshop is reserved for open discussion.) ABSTRACT: Classical modular theories of mind presume mental "organs" - function specific, put in place by evolution - which communicate in a symbolic language of thought. In the 1980's, Connectionists radically rejected this view in favor of more integrated architectures, uniform learning systems which would be very tightly coupled and communicate through many feedforward and feedback connections. However, as connectionist attempts at cognitive modeling have gotten more ambitious, ad-hoc modular structuring has become more prevalent. But there are concerns regarding how much architectural bias is allowable. There has been a flurry of work on resolving these concerns by seeking the principles by which modularity could arise in connectionist architectures. This will involve solving several major problems - data decomposition, structural credit assignment, and shared adaptive representations. This workshop will bring together proponents and opponents of Modular Connectionist Architectures to discuss research direction, recent progress, and long-term challenges. ------------------------ Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Phone: (614)292-4890 (then * to fax) From gary at cs.UCSD.EDU Mon Nov 11 13:52:50 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Mon, 11 Nov 91 10:52:50 PST Subject: principal components Message-ID: <9111111852.AA26216@desi.ucsd.edu> In reply to: >Date: Fri, 8 Nov 91 10:02:34 -0800 >From: Ray White >To: Connectionists at CS.CMU.EDU > >(Where "that" refers to my 'Competitive Hebbian Learning', to be published >in Neural Networks, 1992, in response to Yoshio Yamamoto.) > >The Sanger paper that I think of in this connection is the 'Neural Networks ' >paper, T. Sanger (1989) Optimal unsupervised learning..., Neural Networks, 2, >459-473. >As I understand it, Sanger's 'Generalized Hebbian learning' trains units >to find successively, the principle components of the input, starting with >the most important and working on down, depending on the number of units >you use. > >Competitive Hebbian Learning, on the other hand, is a >simpler algorithm which trains units to learn simultaneously (approximately) >orthogonal linear combinations of the components of the input. With this >algorithm, one does not get the princple components nicely separated out, >but one does get trained units of roughly equal importance. > Ray White (white at teetot.acusd.edu) > Depts. 
of Physics & Computer Science > University of San Diego >

Back prop, when used with linear nets, does just this also. Since the optimal technique is PCA in the linear case with a quadratic cost function, bp is just a way of directly performing this and is not an improvement over Karhunen-Loeve (except, perhaps, in being space efficient). More recently, Mathilde Mougeot has used the fact that bp is doing PCA to discover a fast algorithm for the quadratic case, and she has also shown that bp can be effectively used for other norms.

References:
Baldi & Hornik (1988) Neural Networks and Principal Components Analysis: Learning from examples without local minima. Neural Networks, Vol 2, No 1.
Cottrell, G.W. and Munro, P. (1988) Principal components analysis of images via back propagation. Invited paper in Proceedings of the Society of Photo-Optical Instrumentation Engineers, Cambridge, MA.
Mougeot, M., Azencott, R. & Angeniol, B. (1991) Image compression with back propagation: Improvement of the visual restoration using different cost functions. Neural Networks, Vol 4, No 4, pp. 467-476.

onward and upward, Gary Cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering 0114 University of California San Diego La Jolla, Ca. 92093 gary at cs.ucsd.edu (INTERNET) gcottrell at ucsd.edu (BITNET, almost anything)

From rob at tlon.mit.edu Mon Nov 11 17:04:13 1991 From: rob at tlon.mit.edu (Rob Sanner) Date: Mon, 11 Nov 91 17:04:13 EST Subject: MIT NSL Reports on Adaptive Neurocontrol Message-ID:

The following are the titles and abstracts of three reports we have uploaded to the neuroprose archive. Due to a large number of recent requests for hardcopy reprints, these reports have now been made available electronically. They can also be obtained (under their NSL reference number) by anonymous ftp at tlon.mit.edu in the pub directory. These reports describe the results of research conducted at the MIT Nonlinear Systems Laboratory during the past year into algorithms for the stable adaptive tracking control of nonlinear systems using gaussian radial basis function networks. These papers are potentially interesting to researchers in both adaptive control and neural network theory. The research described starts by quantifying the relation between the network size and weights and the degree of uniform approximation accuracy a trained network can guarantee. On this basis, it develops a _constructive_ procedure for networks which ensures the required accuracy. These constructions are then exploited for the design of stable adaptive controllers for nonlinear systems. Any comments would be greatly appreciated and can be sent to either rob at tlon.mit.edu or jjs at athena.mit.edu. Robert M. Sanner and Jean-Jacques E. Slotine
------------------------------------------------------------------------------
on neuroprose: sanner.adcontrol_9103.ps.Z (NSL-910303, March 1991) Also appears: Proc. American Control Conference, June 1991.

Direct Adaptive Control Using Gaussian Networks
Robert M. Sanner and Jean-Jacques E. Slotine

Abstract: A direct adaptive tracking control architecture is proposed and evaluated for a class of continuous-time nonlinear dynamic systems for which an explicit linear parameterization of the uncertainty in the dynamics is either unknown or impossible. The architecture employs a network of gaussian radial basis functions to adaptively compensate for the plant nonlinearities.
Under mild assumptions about the degree of smoothness exhibited by the nonlinear functions, the algorithm is proven to be stable, with tracking errors converging to a neighborhood of zero. A constructive procedure is detailed, which directly translates the assumed smoothness properties of the nonlinearities involved into a specification of the network required to represent the plant to a chosen degree of accuracy. A stable weight adjustment mechanism is then determined using Lyapunov theory. The network construction and performance of the resulting controller are illustrated through simulations with an example system. ----------------------------------------------------------------------------- on neuroprose: sanner.adcontrol_9105.ps.Z (NSL-910503, May 1991) Gaussian Networks for Direct Adaptive Control Robert M. Sanner and Jean-Jacques E. Slotine Abstract: This report is a complete and formal exploration of the ideas originally presented in NSL-910303; as such it contains most of NSL-910303 as a subset. We detail a constructive procedure for a class of neural networks which can approximate to a prescribed accuracy the functions required for satisfaction of the control objectives. Since this approximation can be maintained only over a finite subset of the plant state space, to ensure global stability it is necessary to introduce an additional component into the control law, which is capable of stabilizing the dynamics as the neural approximation degrades. To unify these components into a single control law, we propose a novel technique of smoothly blending the two modes to provide a continuous transition from adaptive operation in the region of validity of the network approximation, to a nonadaptive operation in the regions where this approximation is inaccurate. Stable adaptation mechanisms are then developed using Lyapunov stability theory. Section 2 describes the setting of the control problem to be examined and illustrates the structure of conventional adaptive methods for its solution. Section 3 introduces the use of multivariable Fourier analysis and sampling theory as a method of translating assumed smoothness properties of the plant nonlinearities into a representation capable of uniformly approximating the plant over a compact set. This section then discusses the conditions under which these representations can be mapped onto a neural network with a finite number of components. Section 4 illustrates how these networks may be used as elements of an adaptive tracking control algorithm for a class of nonlinear systems, which will guarantee convergence of the tracking errors to a neighborhood of zero. Section 5 illustrates the method with two examples, and finally, Section 6 closes with some general observations about the proposed controller. ----------------------------------------------------------------------------- on neuroprose: sanner.adcontrol_9109.ps.Z (NSL-910901, Sept. 1991) To appear: IEEE Conf. on Decision and Control, Dec. 1991. Stable Adaptive Control and Recursive Identification Using Radial Gaussian Networks Robert M. Sanner and Jean-Jacques E. Slotine Abstract: Previous work has provided the theoretical foundations of a constructive design procedure for uniform approximation of smooth functions to a chosen degree of accuracy using networks of gaussian radial basis functions. This construction and the guaranteed uniform bounds were then shown to provide the basis for stable adaptive neurocontrol algorithms for a class of nonlinear plants. 
This paper details and extends these ideas in three directions: first some practical details of the construction are provided, explicitly illustrating the relation between the free parameters in the network design and the degree of approximation error on a particular set. Next, the original adaptive control algorithm is modified to permit incorporation of additional prior knowledge of the system dynamics, allowing the neurocontroller to operate in parallel with conventional fixed or adaptive controllers. Finally, it is shown how the gaussian network construction may also be utilized in recursive identification algorithms with similar guarantees of stability and convergence. The identification algorithm is evaluated on a chaotic time series and demonstrates the predicted convergence properties. From owens at eplrx7.es.duPont.com Tue Nov 12 12:03:25 1991 From: owens at eplrx7.es.duPont.com (Aaron Owens) Date: Tue, 12 Nov 1991 17:03:25 GMT Subject: second derivatives and the back propagation network References: <1991Nov10.133414.11341@eplrx7.es.duPont.com> Message-ID: <1991Nov12.170325.28731@eplrx7.es.duPont.com> RE: Second Derivatives and Stiff ODEs for Back Prop Training Several threads in this newsgroup recently have mentioned the use of second derivative information (i.e., the Hessian or Jacobian matrix) and/or stiff ordinary differential equations [ODEs] in the training of the back propagation network [BPN]. [-- Aside: Stiff differential equation solvers derive their speed and accuracy by specifically utilizing the information contained in the second-derivative Jacobian matrix. -- ] This is to confirm our experience that training the BPN using second-derivative methods in general, and stiff ODE solvers in particular, is extremely fast and efficient for problems which are small enough (i.e., up to about 1000 connection weights) to allow the Jacobian matrix [size = (number of weights)**2] to be stored in the computer's real memory. "Stiff" backprop is particularly well-suited to real-valued function mappings in which a high degree of accuracy is required. We have been using this method successfully in most of our production applications for several years. See the abtracts below of a paper presented at the 1989 IJCNN in Washington and of a recently-issued U. S. patent. It is possible -- and desirable -- to use the back error propagation methodology (i.e., the chain rule of calculus) to explicitly compute the second derivative of the sum_of_squared_prediction_error with respect to the weights (i.e., the Jacobian matrix) analytically. Using an analytic Jacobian, rather than computing the second derivatives numerically [or -- an UNVERIFIED personal hypothesis -- stochastically], increases the algorithm's speed and accuracy significantly. -- Aaron -- Aaron J. Owens Du Pont Neural Network Technology Center P. O. B. 80357 Wilmington, DE 19880-0357 Telephone Numbers: Office (302) 695-7341 (Phone & FAX) Home " 738-5413 Internet: owens at esvax.dnet.dupont.com ---------- IJCNN '89 paper abstract ------------ EFFICIENT TRAINING OF THE BACK PROPAGATION NETWORK BY SOLVING A SYSTEM OF STIFF ORDINARY DIFFERENTIAL EQUATIONS A. J. Owens and D. L. Filkin Central Research and Development Department P. O. Box 80320 E. I. du Pont de Nemours and Company (Inc.) Wilmington, DE 19880-0320 International Joint Conference on Neural Networks June 19-22, 1989, Washington, DC Volume II, pp. 381-386 Abstract. 
The training of back propagation networks involves adjusting the weights between the computing nodes in the artificial neural network to minimize the errors between the network's predictions and the known outputs in the training set. This least-squares minimization problem is conventionally solved by an iterative fixed-step technique, using gradient descent, which occa- sionally exhibits instabilities and converges slowly. We show that the training of the back propagation network can be expressed as a problem of solving coupled ordinary differential equations for the weights as a (continuous) function of time. These differential equations are usually mathematically stiff. The use of a stiff differential equation solver ensures quick convergence to the nearest least-squares minimum. Training proceeds at a rapidly accelerating rate as the accuracy of the predictions increases, in contrast with gradient descent and conjugate gradient methods. The number of presentations required for accurate training is reduced by up to several orders of magnitude over the conventional method. ---------- U. S. Patent No. 5,046,020 abstract ---------- DISTRIBUTED PARALLEL PROCESSING NETWORK WHEREIN THE CONNECTION WEIGHTS ARE GENERATED USING STIFF DIFFERENTIAL EQUATIONS Inventor: David L. Filkin Assignee: E. I. du Pont de Nemours and Company U. S. Patent Number 5,046,020 Sep. 3, 1991 Abstract. A parallel distributed processing network of the back propagation type is disclosed in which the weights of connection between processing elements in the various layers of the network are determined in accordance with the set of steady solutions of the stiff differential equations governing the relationship between the layers of the network. From tds at ai.mit.edu Tue Nov 12 17:54:54 1991 From: tds at ai.mit.edu (Terence D. Sanger) Date: Tue, 12 Nov 91 17:54:54 EST Subject: Algorithms for Principal Components Analysis In-Reply-To: Ray White's message of Fri, 8 Nov 91 10:02:34 -0800 <9111081802.AA12659@teetot.acusd.edu> Message-ID: <9111122254.AA11404@cauda-equina> Ray, Over the past few years there has been a great deal of interest in recursive algorithms for finding eigenvectors or linear combinations of them. Many of these algorithms are based on the Oja rule (1982) with modifications to find more than a single output. As might be expected, so many people working on a single type of algorithm has led to a certain amount of duplication of effort. Following is a list of the papers I know about, which I'm sure is incomplete. Anyone else working on this topic should feel free to add to this list! Cheers, Terry Sanger @article{sang89a, author="Terence David Sanger", title="Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Neural Network", year=1989, journal="Neural Networks", volume=2, pages="459--473"} @incollection{sang89c, author="Terence David Sanger", title="An Optimality Principle for Unsupervised Learning", year=1989, pages="11--19", booktitle="Advances in Neural Information Processing Systems 1", editor="David S. Touretzky", publisher="Morgan Kaufmann", address="San Mateo, {CA}", note="Proc. {NIPS'88}, Denver"} @article{sang89d, author="Terence David Sanger", title="Analysis of the Two-Dimensional Receptive Fields Learned by the Generalized {Hebbian} Algorithm in Response to Random Input", year=1990, journal="Biological Cybernetics", volume=63, pages="221--228"} @misc{sang90c, author="Terence D. 
Sanger", title="Optimal Hidden Units for Two-layer Nonlinear Feedforward Neural Networks", year=1991, note="{\it Int. J. Pattern Recognition and AI}, in press"} @inproceedings{broc89, author="Roger W. Brockett", title="Dynamical Systems that Sort Lists, Diagonalize Matrices, and Solve Linear Programming Problems", booktitle="Proc. 1988 {IEEE} Conference on Decision and Control", publisher="{IEEE}", address="New York", pages="799--803", year=1988} @ARTICLE{rubn90, AUTHOR = {J. Rubner and K. Schulten}, TITLE = {Development of Feature Detectors by Self-Organization}, JOURNAL = {Biol. Cybern.}, YEAR = {1990}, VOLUME = {62}, PAGES = {193--199} } @INCOLLECTION{krog90, AUTHOR = {Anders Krogh and John A. Hertz}, TITLE = {Hebbian Learning of Principal Components}, BOOKTITLE = {Parallel Processing in Neural Systems and Computers}, PUBLISHER = {Elsevier Science Publishers B.V.}, YEAR = {1990}, EDITOR = {R. Eckmiller and G. Hartmann and G. Hauske}, PAGES = {183--186}, ADDRESS = {North-Holland} } @INPROCEEDINGS{fold89, AUTHOR = {Peter Foldiak}, TITLE = {Adaptive Network for Optimal Linear Feature Extraction}, BOOKTITLE = {Proc. {IJCNN}}, YEAR = {1989}, PAGES = {401--406}, ORGANIZATION = {{IEEE/INNS}}, ADDRESS = {Washington, D.C.}, MONTH = {June} } @MISC{kung90, AUTHOR = {S. Y. Kung}, TITLE = {Neural networks for Extracting Constrained Principal Components}, YEAR = {1990}, NOTE = {submitted to {\it IEEE Trans. Neural Networks}} } @article{oja85, author="Erkki Oja and Juha Karhunen", title="On Stochastic Approximation of the Eigenvectors and Eigenvalues of the Expectation of a Random Matrix", journal="J. Math. Analysis and Appl.", volume=106, pages="69--84", year=1985} @book{oja83, author="Erkki Oja", title="Subspace Methods of Pattern Recognition", publisher="Research Studies Press", address="Letchworth, Hertfordshire UK", year=1983} @inproceedings{karh84b, author="Juha Karhunen", title="Adaptive Algorithms for Estimating Eigenvectors of Correlation Type Matrices", booktitle="{Proc. 1984 {IEEE} Int. Conf. on Acoustics, Speech, and Signal Processing}", publisher="{IEEE} Press", address="Piscataway, {NJ}", year=1984, pages="14.6.1--14.6.4"} @inproceedings{karh82, author="Juha Karhunen and Erkki Oja", title="New Methods for Stochastic Approximation of Truncated {Karhunen-Lo\`{e}ve} Expansions", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="{Springer}-{Verlag}", address="{NY}", month="October", pages="550--553"} @inproceedings{oja80, author="Erkki Oja and Juha Karhunen", title="Recursive Construction of {Karhunen-Lo\`{e}ve} Expansions for Pattern Recognition Purposes", booktitle="{Proc. 5th Int. Conf. on Pattern Recognition}", publisher="Springer-{Verlag}", address="{NY}", year=1980, month="December", pages="1215--1218"} @inproceedings{kuus82, author="Maija Kuusela and Erkki Oja", title="The Averaged Learning Subspace Method for Spectral Pattern Recognition", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="Springer-{Verlag}", address="{NY}", month="October", pages="134--137"} @phdthesis{karh84, author="Juha Karhunen", title="Recursive Estimation of Eigenvectors of Correlation Type Matrices for Signal Processing Applications", school="Helsinki Univ. Tech.", year=1984, address="Espoo, Finland"} @techreport{karh85, author="Juha Karhunen", title="Simple Gradient Type Algorithms for Data-Adaptive Eigenvector Estimation", institution="Helsinki Univ. 
Tech.", year=1985, number="TKK-F-A584"} @inproceedings{karh82, author="Juha Karhunen and Erkki Oja", title="New Methods for Stochastic Approximation of Truncated {Karhunen-Lo\`{e}ve} Expansions", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="{Springer}-{Verlag}", address="{NY}", month="October", pages="550--553"} @inproceedings{oja80, author="Erkki Oja and Juha Karhunen", title="Recursive Construction of {Karhunen-Lo\`{e}ve} Expansions for Pattern Recognition Purposes", booktitle="{Proc. 5th Int. Conf. on Pattern Recognition}", publisher="Springer-{Verlag}", address="{NY}", year=1980, month="December", pages="1215--1218"} @inproceedings{kuus82, author="Maija Kuusela and Erkki Oja", title="The Averaged Learning Subspace Method for Spectral Pattern Recognition", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="Springer-{Verlag}", address="{NY}", month="October", pages="134--137"} @phdthesis{karh84, author="Juha Karhunen", title="Recursive Estimation of Eigenvectors of Correlation Type Matrices for Signal Processing Applications", school="Helsinki Univ. Tech.", year=1984, address="Espoo, Finland"} @techreport{karh85, author="Juha Karhunen", title="Simple Gradient Type Algorithms for Data-Adaptive Eigenvector Estimation", institution="Helsinki Univ. Tech.", year=1985, number="TKK-F-A584"} @misc{ogaw86, author = "Hidemitsu Ogawa and Erkki Oja", title = "Can we Solve the Continuous Karhunen-Loeve Eigenproblem from Discrete Data?", note = "Proc. {IEEE} Eighth International Conference on Pattern Recognition, Paris", year = "1986"} @article{leen91, author = "Todd K Leen", title = "Dynamics of learning in linear feature-discovery networks", journal = "Network", volume = 2, year = "1991", pages = "85--105"} @incollection{silv91, author = "Fernando M. Silva and Luis B. Almeida", title = "A Distributed Decorrelation Algorithm", booktitle = "Neural Networks, Advances and Applications", editor = "Erol Gelenbe", publisher = "North-Holland", year = "1991", note = "to appear"} From gluck at pavlov.Rutgers.EDU Wed Nov 13 09:29:58 1991 From: gluck at pavlov.Rutgers.EDU (Mark Gluck) Date: Wed, 13 Nov 91 09:29:58 EST Subject: Adding noise to training -- A psychological perspective (Preprint) Message-ID: <9111131429.AA02765@pavlov.rutgers.edu> In a recent paper we have discussed the role of stochastic noise in training data for adaptive network models of human classification learning. We have shown how the incorporation of such noise (usually modelled as a stochastic sampling process on the external stimuli) improves generalization performance, especially with deterministic discriminations which underconstrain the set of possible solution-weights. The addition of noise to the training biases the network to find solutions (and generalizations) which more closely correspond to the behavior of humans in psychological experiments. The reference is: Gluck, M. A. (1991,in press). Stimulus sampling and distributed representations in adaptive network theories of learning. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), Festschrift for W. K. Estes. New Jersey: Lawrence Erlbaum Associates. Copies can be received by emailing to: ______________________________________________________________________ Dr. Mark A. Gluck Center for Molecular & Behavioral Neuroscience Rutgers University 197 University Ave. Newark, New Jersey 07102 Phone: (201) 648-1080 (Ext. 
3221) Fax: (201) 648-1272 Email: gluck at pavlov.rutgers.edu From jb at s1.gov Wed Nov 13 12:27:42 1991 From: jb at s1.gov (jb@s1.gov) Date: Wed, 13 Nov 91 09:27:42 PST Subject: Research Positions in NN/Speech Rec. Message-ID: <9111131727.AA03952@havana.s1.gov> A friend of mine asked me to post the following announcement of research positions. Please do not reply to this posting since I have nothing to do with the selection process, Joachim Buhmann ----------------------------- cut here ---------------------------------------- * Please post *** Please post *** Please post *** Please post *** Please post * --------------------------------------------------------------------------- POSITIONS IN NEURAL NETWORKS / SPEECH RECOGNITION AVAILABLE --------------------------------------------------------------------------- Technical University of Munich, Germany Physics Department, T35 The theoretical biopysics group of Dr. Paul Tavan offers research positions in the field of neural networks and speech recognition. The positions are funded by the German Federal Department of Research and Technology (BMFT) for a period of at least three years, starting in January 1992. Salaries are paid according to the tariffs for federal employees. Position offered include a postdoctoral fellow (BAT Ib) and positions for graduate students (BAT IIa/2). The latter includes the opportunity of pursuing a Ph.D. in physics. The project is part of a larger project aiming towards the development of neural algorithms and architectures for the transformation of continuous speech into symbolic code. It will be pursued in close cooperation with other german research groups working on different aspects of the problem. Within this cooperation our group is responsible for the definition and extraction of appropriate features and their optimal representation based on selforganizing algorithms and methods of statistical pattern recognition. Higher-level processing, e.g., the incorporation of semantic knowledge is *not* the central issue of our subproject. The postdoctoral position involves coordination and organization of the research projects within the group and the overall project. Applicants for this position therefore should have broad experience in the areas of neural networks and/or speech recognition. Applicants for the other positions should hold a masters degree in physics. Experience in the field of neural networks/pattern recognition would be valuable but is not necessarly required. Profound background in mathematics however is. All applicants should have programming experience in a workstation environment and should most of all be able to perform project oriented research in close teamwork with other group members. University policy requires all applicants to spend a limited amount of time on teaching activities at the physics-department of the TUM besides their research. Applications should be sent before December 1st, 1991 and include - a curriculum vitae, - sample publications, technical reports, thesis reprints etc. and - a brief description of the main fields of interest in the research of the applicant. Please direct application material or requests for further information to: Hans Kuehnel Physik-Department, T35 Ehem. Bauamt Boltzmannstr. 
W-8046 Garching GERMANY Phone: +49-89-3209-3766 Email: kuehnel at physik.tu-muenchen.de From BATTITI at ITNVAX.CINECA.IT Thu Nov 14 10:01:00 1991 From: BATTITI at ITNVAX.CINECA.IT (BATTITI@ITNVAX.CINECA.IT) Date: Thu, 14 NOV 91 10:01 N Subject: paper on 2nd order methods in Neuroprose Message-ID: <2274@ITNVAX.CINECA.IT> A new paper is available from the Neuroprose directory. FILE: battiti.second.ps.Z (ftp binary, uncompress, lpr (PostScript)) TITLE: "First and Second-Order Methods for Learning: between Steepest Descent and Newton's Method" AUTHOR: Roberto Battiti ABSTRACT: On-line first order backpropagation is sufficiently fast and effective for many large-scale classification problems but for very high precision mappings, batch processing may be the method of choice.This paper reviews first- and second-order optimization methods for learning in feed-forward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimiza- tion techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations. PS: the paper will be published in Neural Computation. PPSS: comments and/or new results welcome. ====================================================================== | | | | Roberto Battiti | e-mail: battiti at itnvax.cineca.it | | Dipartimento di Matematica | tel: (+39) - 461 - 88 - 1639 | | 38050 Povo (Trento) - ITALY | fax: (+39) - 461 - 88 - 1624 | | | | ====================================================================== From fcr at neura.inesc.pt Thu Nov 14 05:15:47 1991 From: fcr at neura.inesc.pt (Fernando Corte Real) Date: Thu, 14 Nov 91 10:15:47 GMT Subject: Algorithms for Principal Components Analysis In-Reply-To: "Terence D. Sanger"'s message of Tue, 12 Nov 91 17:54:54 EST <9111122254.AA11404@cauda-equina> Message-ID: <9111141015.AA29833@neura.inesc.pt> Just a short correction to Sanger's bibliographic list on compettitive Hebbian and related algorithms: The last work listed there was already published a few months ago. A shorter version can be found in [1] F.M. Silva, L. B. Almeida, "A Distributed Solution for Data Orthonormalization", in Proc. ICANN, Helsinki, 1991. The following reference may also be added to Sanger's list: [2] R. Williams, "Feature Discovery Trough Error-Correction Learning", ICS Report 8501, University of California, San Diego, 1985 Since there has been so much discussion in the net about 2nd. order algorithms, an aplication of [1] to the improvement of the so-called "diagonal" 2nd. order algorithms can be found in the following reference: [3] F.M. Silva, L. B. Almeida, "Speeding-Up Backpropagation by Data Orthonormalization", in Proc. ICANN, Helsinki, 1991. From LWCHAN at cucsd.cs.cuhk.hk Thu Nov 14 07:55:00 1991 From: LWCHAN at cucsd.cs.cuhk.hk (LAI-WAN CHAN) Date: Thu, 14 Nov 1991 20:55 +0800 Subject: research position available Message-ID: <4CF1F3A14040013E@CUCSD.CUHK.HK> Department of Electronic Engineering and Department of Computer Science Faculty of Engineering The Chinese University of Hong Kong Research Assistantship in Speech Technology We are seeking for a young, self-motivated and high calibre researcher to fill the above position. 
Applicants should have a degree in Electronic Engineering or Computer Science, with knowledge of speech processing and/or neural networks and preferably with programming experience in C under UNIX environment. A higher degree in relevant field will be an advantage. The appointee will join the Speech Processing Group and will contribute to an ongoing research effort focusing on R&D of speech recognition and synthesis technology. The Speech Processing Group within the Faculty of Engineering is very well equipped and has a strong background in theoretical and experimental aspects of speech processing and recognition and has recently been awarded a research grant by the Croucher Foundation. Computing facilities available for research in the Faculty consist of more than 150 DECstations, SPARCstations, SGI as well as numerous 386PCs. They are fully networked and are linked to both the Bitnet and Internet. There are several application software packages for speech and neural networks projects which include COMDISCO, MONARCH, ILS, Hypersignal Plus and NeuroExplorer. Also available is a quiet room, a KAY Computerized Speech Lab Model 4300, D/A and A/D converters as well as front end audio equipment for real time speech processing. Appointment for the successful applicant will be made on a 1-year contract initially and might be renewable for another year subject to satisfactory performance. The appointee might also register in our Ph.D. programme. The starting salary for degree holder will be approximately HKD$9,400 per month (1 USD = 7.8 HKD) or above depending on experience and qualification. Enquiries and/or applications should be directed to Dr. P.C. Ching, Department of Electronic Engineering, (Tel: 6096380, FAX: 6035558, e-mail: pcching at cuele.cuhk.hk) or Dr. L.W. Chan, Department of Computer Science, (Tel: 6098865, FAX: 6035024, e-mail: lwchan at cucsd.cuhk.hk), The Chinese University of Hong Kong, Shatin, Hong Kong. The deadline for the application is December 31, 1991. I will go to IJCNN-91-Singapore next week. If you are interested in this position, you can talk to me directly in IJCNN. Lai-Wan Chan, Computer Science Dept, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. Email : lwchan at cucsd.cuhk.hk Tel : (+852) 609 8865 FAX : (+852) 603 5024 From David_Plaut at K.GP.CS.CMU.EDU Thu Nov 14 09:58:02 1991 From: David_Plaut at K.GP.CS.CMU.EDU (David_Plaut@K.GP.CS.CMU.EDU) Date: Thu, 14 Nov 91 09:58:02 EST Subject: preprints and thesis TR available Message-ID: <11445.690130682@K.GP.CS.CMU.EDU> I've placed two papers in the neuroprose archive. Instructions for retrieving them are at the end of the message. (Thanks again to Jordan Pollack for maintaining the archive.) The first (plaut.thesis-summary.ps.Z) is a 15 page summary of my thesis, entitled "Connectionist Neuropsychology: The Breakdown and Recovery of Behavior in Lesioned Attractor Networks" (abstract below). For people who want more detail, the second paper (plaut.dyslexia.ps.Z) is a 119 page TR, co-authored with Tim Shallice, that presents a systematic analysis of work by Hinton & Shallice on modeling deep dyslexia, extending the approach to a more comprehensive account of the syndrome. FTP'ers should be forewarned that the file is about 0.5 Mbytes compressed, 1.8 Mbytes uncompressed. 
For true die-hards, the full thesis (325 pages) is available as CMU-CS-91-185 from Computer Science Documentation School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213-3890 reports at cs.cmu.edu To defray printing/mailing costs, requests for the thesis TR must be accompanied by a check or money order for US$10 (domestic) or US$15 (overseas) payable to "Carnegie Mellon University." Enjoy, -Dave Connectionist Neuropsychology: The Breakdown and Recovery of Behavior in Lesioned Attractor Networks David C. Plaut An often-cited advantage of connectionist networks is that they degrade gracefully under damage. Most demonstrations of the effects of damage and subsequent relearning in these networks have only looked at very general measures of performance. More recent studies suggest that damage in connectionist networks can reproduce the specific patterns of behavior of patients with neurological damage, supporting the claim that these networks provide insight into the neural implementation of cognitive processes. However, the existing demonstrations are not very general, and there is little understanding of what underlying principles are responsible for the results. This thesis investigates the effects of damage in connectionist networks in order to analyze their behavior more thoroughly and assess their effectiveness and generality in reproducing neuropsychological phenomena. We focus on connectionist networks that make familiar patterns of activity into stable ``attractors.'' Unit interactions cause similar but unfamiliar patterns to move towards the nearest familiar pattern, providing a type of ``clean-up.'' In unstructured tasks, in which inputs and outputs are arbitrarily related, the boundaries between attractors can help ``pull apart'' very similar inputs into very different final patterns. Errors arise when damage causes the network to settle into a neighboring but incorrect attractor. In this way, the pattern of errors produced by the damaged network reflects the layout of the attractors that develop through learning. In a series of simulations in the domain of reading via meaning, networks are trained to pronounce written words via a simplified representation of their semantics. This task is unstructured in the sense that there is no intrinsic relationship between a word and its meaning. Under damage, the networks produce errors that show a distribution of visual and semantic influences quite similar to that of brain-injured patients with ``deep dyslexia.'' Further simulations replicate other characteristics of these patients, including additional error types, better performance on concrete vs.\ abstract words, preserved lexical decision, and greater confidence in visual vs.\ semantic errors. A range of network architectures and learning procedures produce qualitatively similar results, demonstrating that the layout of attractors depends more on the nature of the task than on the architectural details of the network that enable the attractors to develop. Additional simulations address issues in relearning after damage: the speed of recovery, degree of generalization, and strategies for optimizing recovery. Relative differences in the degree of relearning and generalization for different network lesion locations can be understood in terms of the amount of structure in the subtasks performed by parts of the network. 
Finally, in the related domain of object recognition, a similar network is trained to generate semantic representations of objects from high-level visual representations. In addition to the standard weights, the network has correlational weights useful for implementing short-term associative memory. Under damage, the network exhibits the complex semantic and perseverative effects of patients with a visual naming disorder known as ``optic aphasia,'' in which previously presented objects influence the response to the current object. Like optic aphasics, the network produces predominantly semantic rather than visual errors because, in contrast to reading, there is some structure in the mapping from visual to semantic representations for objects. Taken together, the results of the thesis demonstrate that the breakdown and recovery of behavior in lesioned attractor networks reproduces specific neuropsychological phenomena by virtue of the way the structure of a task shapes the layout of attractors. unix> ftp 128.146.8.52 Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get plaut.thesis-summary.ps.Z (or plaut.dyslexia.ps.Z) ftp> quit unix> zcat plaut.thesis-summary.ps.Z | lpr ------------------------------------------------------------------------------ David Plaut dcp+ at cs.cmu.edu Department of Psychology 412/268-5145 Carnegie Mellon University Pittsburgh, PA 15213-3890 From michael at dasy.cbs.dk Thu Nov 14 08:59:03 1991 From: michael at dasy.cbs.dk (Michael Egmont-Petersen) Date: Thu, 14 Nov 91 14:59:03 +0100 Subject: Case frequency versus case distance during learning Message-ID: <9111141359.AA02755@dasy.cbs.dk> Dear connectionists During learning the gradient methods use a mix of case similarity (euclidian) and case frequency to direct each step in the hyperspace. Each weight is changed with a certain magnitude: delta W = (Target-Output) * f'(Act) * f(Act-1) (for Back-propagation) to adjust weights between the (last) hidden layer and the output layer. I have wondered how much emphasis Back-propagation puts on "case similarity" (euclidian) while determining delta W. The underlying problem is the following: * How large a role plays the number of cases (in each category) compared to their (euclidian) (dis)similarity in adjusting the weights? It is a relevant question to pose because other learning algorithms such as ID3 *only* rely on case frequencies and NOT on distance between patterns within a cluster as well as distance between patterns belonging to different clusters. My question might already have been answered by some one in a paper. Is this the case, then don't bother the other connectionists with it, but mail me directly. Otherwise, it is a highly relevant question to pose, because the input representation then plays a role for how fast a network learns and furthermore its ability to generalize. Best regards Michael Egmont-Petersen Institute for Computer and Systems Sciences Copenhagen Business School DK-1925 Frb. C. Denmark E-mail: michael at dasy.cbs.dk From deo at cs.pdx.edu Thu Nov 14 11:34:24 1991 From: deo at cs.pdx.edu (Steven Farber) Date: Thu, 14 Nov 91 8:34:24 PST Subject: Change of address Message-ID: <9111141634.AA21541@jove.cs.pdx.edu> My address has changed from "steven at m2xenix.rain.com" to "deo at jove.cs.pdx.edu". Could you please change the address listed in the mailing list so my mail comes here instead? 
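A minimal NumPy sketch may make the frequency-versus-similarity question in the message above concrete. It assumes a single sigmoid output layer and reads "f(Act-1)" as the activation of the preceding (hidden) layer; the function name, the toy arrays and the per-case loop are illustrative assumptions only, not anyone's posted code.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def output_layer_delta_w(H, T, W):
    """Accumulate the batch weight change for a sigmoid output layer.

    H : (n_cases, n_hidden) activations of the preceding (hidden) layer
    T : (n_cases, n_out)    target vectors
    W : (n_hidden, n_out)   current hidden-to-output weights

    Each case contributes (Target - Output) * f'(Act) * hidden activation,
    i.e. the delta-W formula quoted above.
    """
    delta_w = np.zeros_like(W)
    for h, t in zip(H, T):
        o = sigmoid(h @ W)                # forward pass for this case
        err = (t - o) * o * (1.0 - o)     # (Target - Output) * f'(Act)
        delta_w += np.outer(h, err)       # ... times activation of layer below
    return delta_w

# Toy usage: one case presented three times, another presented once.
H = np.array([[0.2, 0.9], [0.2, 0.9], [0.2, 0.9], [0.8, 0.1]])
T = np.array([[1.0], [1.0], [1.0], [0.0]])
print(output_layer_delta_w(H, T, np.zeros((2, 1))))

In this reading, case frequency enters only through repeated terms in the sum, while similarity enters through the hidden activations and through the (Target - Output) factor, which shrinks towards zero for cases the network already fits well; that appears to be the contrast with purely frequency-based learners such as ID3 that the question is after.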
From p-mehra at uiuc.edu Thu Nov 14 14:16:51 1991 From: p-mehra at uiuc.edu (Pankaj Mehra) Date: Thu, 14 Nov 91 13:16:51 CST Subject: Applications Message-ID: <9111141916.AA03685@hobbes> > > From: shams at maxwell.hrl.hac.com > Subject: "real-world" applications of neural nets > > We are looking for "real-world" applications of neural networks to be > used as benchmarks for evaluating the performance of our > neurocomputer architecture. I will recommend the following book -- actually the proceedings of the conf on ANNs in Engg held in St Louis this week -- to anyone interested in ``real-world'' applications: Cihan Dagli, et al. (eds.), "Intelligent Engineering Systems through Artificial Neural Networks," New York: ASME Press, 1991. Ordering Information: ISBN 0-7918-0026-1 The American Society of Mechanical Engineers 345 East 47th Street, New York, NY 10017, U.S.A. Of course, if you are out looking for OFFLINE data, then you are ignoring a big chunk of the real world. If you contact the authors from this book, you might be able to obtain online simulator software as well as offline benchmark data. -Pankaj From uharigop at plucky.eng.ua.edu Thu Nov 14 17:24:39 1991 From: uharigop at plucky.eng.ua.edu (Umeshram Harigopal) Date: Thu, 14 Nov 91 16:24:39 -0600 Subject: No subject Message-ID: <9111142224.AA19216@plucky.eng.ua.edu> I am a graduate student in CS in the University of Alabama. In the process of my reading some material in Neural Network Architectures, I am right now stuck at a point studying recurrent networks. Having understood a Hopfield network (the basic recurrent network if I am correct) and the generalization of backpropogation to recurrent networks as in William and Zipser's paper it remains a problem to me to understand what a 'higher order' recurrent network is. I will be very thankful if someone can help me in this. Thanking you -Umesh Harigopal E-mail : uharigop at buster.eng.ua.edu From jon at cs.flinders.oz.au Fri Nov 15 00:29:36 1991 From: jon at cs.flinders.oz.au (jon@cs.flinders.oz.au) Date: Fri, 15 Nov 91 15:59:36 +1030 Subject: Patents Message-ID: <9111150529.AA02911@turner> From: Aaron Owens : ---------- U. S. Patent No. 5,046,020 abstract ---------- DISTRIBUTED PARALLEL PROCESSING NETWORK WHEREIN THE CONNECTION WEIGHTS ARE GENERATED USING STIFF DIFFERENTIAL EQUATIONS Inventor: David L. Filkin Assignee: E. I. du Pont de Nemours and Company U. S. Patent Number 5,046,020 Sep. 3, 1991 Abstract. A parallel distributed processing network of the back propagation type is disclosed in which the weights of connection between processing elements in the various layers of the network are determined in accordance with the set of steady solutions of the stiff differential equations governing the relationship between the layers of the network. Its so nice to see people facilitating the free and unencumbered dissemination of knowledge by borrowing pubicly distributed ideas like Backprop, modifying them slightly and then patenting them for their own gain! Jon Baxter From thodberg at nn.meatre.dk Fri Nov 15 10:12:20 1991 From: thodberg at nn.meatre.dk (Hans Henrik Thodberg) Date: Fri, 15 Nov 91 16:12:20 +0100 Subject: Subtractive network design Message-ID: <9111151512.AA02320@nn.meatre.dk.meatre.dk> I would like a discussion on the virtue of subtractive versus additive methods in design of neural networks. It is widely accepted that if several networks process the training data correctly, the smallest of them will generalise best. 
The design problem is therefore to find these mimimal nets. Many workers have chosen to construct the networks by adding nodes or weights until the training data is processed correctly (cascade correlation, adding hidden units during training, meiosis). This philosophy is natural in our culture. We are used to custruct things by pieces. I would like to advocate an alternative method. One trains a (too) large network, and then SUBTRACTS nodes or weights (while retraining) until the network starts to fail to process the training data correctly. Neural networks are powerful because they can form global or distributed representations of a domain. The global structures are more economic, i.e they use fewer weights, and therefore generalise better. My point is that subtractive shemes are more likely to find these global descriptions. These structures so to speek condense out of the more complicated structures under the force of subtraction. I would like to hear your opinion on this claim! I give here some references on subtractive schemes: Y.Le Cun, J.S.Denker and S.A.Solla, "Optimal Brain Damage", NIPS 2, p.598-605 H.H.Thodberg, "Improving Generalization of Neural Networks through Pruning", Int. Journal of Neural Systems, 1, 317-326, (1991). A.S.Weigend, D.E.Rumelhart and B.A.Huberman, "Generalization by Weight Elimination with Application to Forecasting", NIPS 3, p. 877-882. ------------------------------------------------------------------ Hans Henrik Thodberg Email: thodberg at nn.meatre.dk Danish Meat Research Institute Phone: (+45) 42 36 12 00 Maglegaardsvej 2, Postboks 57 Fax: (+45) 42 36 48 36 DK-4000 Roskilde, Denmark ------------------------------------------------------------------ From COGSCI92 at ucs.indiana.edu Fri Nov 15 11:28:37 1991 From: COGSCI92 at ucs.indiana.edu (Cognitive Science Conference 1992) Date: Fri, 15 Nov 91 11:28:37 EST Subject: COG SCI 92: CALL FOR PAPERS Message-ID: please post ================================================================= CALL FOR PAPERS: The Fourteenth Annual Conference of The Cognitive Science Society July 29 -- August 1, 1992 Indiana University THE CONFERENCE: --------------- The Annual Conference of the Cognitive Science Society brings together researchers studying cognition in humans, animals or machines. The 1992 Conference will be held at Indiana University. Plenary speakers for the conference are: Elizabeth Bates John Holland Daniel Dennett Richard Shiffrin Martha Farah Michael Turvey Douglas Hofstadter The Conference will also feature evening entertainments: a welcoming reception (Wed), banquet (Thurs), poster reception (Fri), and concert (Sat). PAPER SUBMISSION INSTRUCTIONS: ------------------------------ Paper and poster submissions are encouraged in the areas of cognitive psychology, artificial intelligence, linguistics, cognitive anthropology, connectionist models, cognitive neuroscience, education, cognitive development, philosophical foundations, as well as any other area of relevance to cognitive science. Authors should submit five (5) copies of their papers in hard copy form to Cognitive Science 1992 Submissions Attn: Candace Shertzer Cognitive Science Program Psychology Building Indiana University Bloomington, IN 47405 All accepted papers will appear in the Conference Proceedings. Presentation format (talk or poster) will be decided by a review panel, unless the author specifically requests consideration for only one format. Electronic and FAX submissions cannot be accepted. 
David Marr Memorial Prizes for Excellent Student Papers: -------------------------------------------------------- To encourage even greater student participation in the Conference, papers that have a student as first author are eligible to compete for one of four David Marr Memorial Prizes. Student-authored papers will be judged by reviewers and the Program Committee for excellence in research and presentation. Each of the four Prizes is accompanied by a $300 honorarium. The David Marr Prize is funded by an anonymous donor. Appearance and length: ---------------------- Papers should be a maximum of six (6) pages long (excluding cover page, described below), have at least 1 inch margins on all sides, and use no smaller than 10pt type. Camera-ready versions will be required only after authors are notified of acceptance. Cover page: ----------- Each copy of the paper must include a cover page, separate from the body of the paper, that includes (in order): 1. Title of paper. 2. Full names, postal addresses, phone numbers and e-mail addresses (if available) of all authors. 3. An abstract of no more than 200 words. 4. The area and subarea in which the paper should be reviewed. 5. Preference of presentation format: Talk or poster; talk only; poster only. 6. A note stating whether the first author is a student and should therefore be considered for a Marr Prize. Papers submission deadline: --------------------------- Papers must be *received* by March 2, 1992. Notification of acceptance or rejection will be made by April 10. Camera ready versions of accepted papers are due May 8. SYMPOSIA: --------- Symposium submissions are also encouraged. Submissions should specify: 1. A brief description of the topic. 2. How the symposium would address a broad cognitive science audience. 3. Names of symposium organizer(s) and potential speakers and their topics. 4. Proposed format of symposium (e.g., all formal talks; brief talks plus panel discussion; open discussion; etc.). Symposia should be designed to last 1 hr 40 min. Symposium submission deadline: ------------------------------ Symposium submissions must be received by January 13, 1992, and should be sent as soon as possible. Note that the deadline for symposium submissions is earlier than for papers. TRAVEL: ------- By air, fly to Indianapolis (not Bloomington) where pre-arranged, inexpensive charter buses will take you on the 1-hour drive to Bloomington. Discount airfares are available from the conference airline, USAir, which has flights from Europe and Canada as well as within the continental US. Full details regarding travel, lodging and registration will be given in a subsequent announcement. FOR MORE INFORMATION CONTACT: ----------------------------- John K. Kruschke, Conference Chair e-mail: cogsci92 at ucs.indiana.edu Candace Shertzer, Cognitive Science Program Secretary phone: (812) 855-4658 e-mail: cshertze at silver.ucs.indiana.edu Cognitive Science Program Psychology Building Indiana University Bloomington, IN 47405 ================================================================= From sontag at control.rutgers.edu Fri Nov 15 18:08:50 1991 From: sontag at control.rutgers.edu (sontag@control.rutgers.edu) Date: Fri, 15 Nov 91 18:08:50 EST Subject: Patenting of algorithms Message-ID: <9111152308.AA24271@control.rutgers.edu> There was recently a message to this bboard regarding the patenting of neural net algorithms (as opposed to copyrighting of software). 
With permission, I am reprinting here a report prepared by the Mathematical Programming Society regarding the issue of patenting algorithms. (The report appears in the forthcoming issue of SIAM News.) I DO NOT want to generate a discussion of this general topic in connectionists; the purpose of reprinting this here is just to make people aware of the report and its strong recommendations, which are especially relevant for an area such as neural nets. (I suggest the use of comp.ai.neural-nets for discussion.) -eduardo PS: I have not included the Appendices that are referred to, as I did not obtain permission to reprint them. ---Copyright issue, NOT patent...! :-) PS_2: Note the irony: the first signatory of the report is George Dantzig, who in essence designed the most useful algorithm for (batch) perceptron learning. ---------------------------- cut here (TeX file) ------------------------------ \hyphenation{Tex-as} \def\subpar{\hfill\break\indent\indent} \centerline{\bf Report of the Committee on Algorithms and the Law} \medskip \centerline{Mathematical Programming Society} \beginsection Background and charge The Committee was appointed in the spring of 1990 by George Nemhauser, Chairman of the Mathematical Programming Society (MPS). Its charge follows: {\narrower \noindent ``The purpose of the committee should be to devise a position for MPS to adopt and publicize regarding the effects of patents on the advancement of research and education in our field. The committee may also wish to comment on the recent past history.'' \smallskip} This is the report of the Committee. It comprises a main body with our assumptions, findings of fact, conclusions, and recommendations. There are two appendices, prepared by others, containing a great deal of specific factual information and some additional analysis. \beginsection Assumptions MPS is a professional, scientific society whose members engage in research and teaching of the theory, implementation and practical use of optimization methods. It is within the purview of MPS to promote its activities (via publications, symposia, prizes, newsletter), to set standards by which that research can be measured (such as criteria for publication and prizes, guidelines for computational testing, etc.), and to take positions on issues which directly affect our profession. It is not within the purview of MPS to market software products, and MPS should not become involved in issues related to the commercial aspects of our profession except where it directly affects research and education. The Committee is unable to make expert legal analyses or to provide legal counsel. The main body of this report is therefore written from the perspective of practitioners of mathematical programming rather than from that of attorneys skilled in the law. MPS is an international society. However, the Committee has interpreted its charge as applying apecifically to U.S. patent law and its application to algorithms. All comments and conclusions of this report should be read with this fact in mind. \beginsection Facts about patents and copyrights The three principal forms of legal protection for intellectual property are the copyright, the patent, and the trade secret. Copyrights and patents are governed by federal law, trade secrets by state law. Setting aside the issue of trade secrets, some of the distinctions between copyrights and patents can be summarized as follows. {\it Type of property protected:\/} Patents protect ideas, principally ``nonobvious'' inventions and designs. 
It is well estabished that ``processes'' are patentable. The Patent Office currently grants patents on algorithms and software, on the basis of the ambiguous 1981 U.S. Supreme Court decision in {\it Diamond v. Diehr.} Copyrights do not protect ideas. Instead, they protect the {\it expression} of ideas, in ``original works of authorship in any tangible medium of expression.'' The principle that software is copyrightable appears to have been well established by the 1983 decision of the U.S. Court of Appeals in {\it Apple v. Franklin.} {\it How protection is obtained:\/} Federal law is now in essential conformity with the Bern Copyright Convention. As a consequence, international copyrights are created virtually automatically for most works of authorship. Government registration of copyrights is simple and inexpensive to obtain. By contrast, patents are issued by the U.S. Patent Office only after an examination procedure that is both lengthy (three years or more) and costly (\$10,000 and up in fees and legal expenses). An inventor must avoid public disclosure of his invention, at least until patent application is made, else the invention will be deemed to be in the public domain. Patent application proceedings are confidential, so that trade secret protection can be obtained if a patent is not granted. {\it Length of protection:\/} U.S. patents are for 17 years. Copyrights are for the lifetime of the individual plus 50 years or, in the case of corporations, 75-100 years. \beginsection Facts about algorithms Algorithms are typically designed and developed in a highly decentralized manner by single individuals or small groups working together. This requires no special equipment, few resources, and little cost. The number of people involved is also quite large compared to the needs of the marketplace. Independent rediscovery is a commonly occurring phenomenon. There is a long and distinguished history of public disclosure by developers of mathematical algorithms via the usual and widely-accepted channels of publication in scientific journals and talks at professional meetings. These disclosures include the theoretical underpinnings of the method, implementation details, computational results, and case studies of results on applied problems. Indeed, algorithm development is based on the tradition of building upon previous work by generalizing and improving solution principles from one situation to another. The commercial end product of an algorithm (if there is any) is generally a software package, where the algorithm is again generally implemented by a very small number of individuals. Of course, a larger group of people may be involved in building the package around the optimization software to handle the user interface, data processing, etc. Also, others may be involved to handle functions like marketing, distribution, and maintenance. Competition in the marketplace has been traditionally based on the performance of particular implementations and features provided by particular software products. The product is often treated like a ``black box'' with the specific algorithm used playing a rather minor role. The cost of producing, manufacturing, distributing and advertising optimization software is often quite small. Even when this is not the case, it is generally the implementation of algorithms that is costly, rather than their development. Software manufacturers have a need to protect their investment in implementation, but have little need to protect an investment in algorithmic development. 
In the absence of patents, algorithms--like all of mathematics and basic science-- are freely available for all to use. Traditionally, developers of optimization software have protected their investments by keeping the details of their implementation secret while allowing the general principles to become public. Software copyrights are also an appropriate form of protection, and are now widely used. Moreover, despite unresolved legal questions concerning the ``look and feel'' of software, the legal issues of copyright protection seem to be relatively well settled. Often an optimization package is a small (but important) part of an overall planning process. That process is often quite complex; it may require many resources and great cost to complete, and the potential benefits may be uncertain and distributed over a long time period. In such situations it is usually quite difficult to quantify the net financial impact made by the embedded optimization package. \beginsection Public policy issues {\it Will algorithm patents promote invention?} Article I, Section 8 of the U.S. Constitution empowers Congress ``To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.'' Inasmuch as patents are intended to provide an incentive for invention, it seems appropriate to inquire whether patenting of algorithms will, in fact, create an incentive for the invention of algorithms. Given the existing intensity of research and the rapid pace of algorithmic invention, it seems hard to argue that additional incentives are needed. In fact, there is good reason to believe that algorithm patents will inhibit research, in that free exchange of ideas will be curtailed, new developments will be held secret, and researchers will be subjected to undesired legal constraints. {\it Will algorithm patents provide needed protection for software manufacturers?} Copyright and trade secret protection appear to provide the sort of protection most needed by software manufacturers. By their nature, patents seem to offer a greater potential for legal confrontation than copyrights. Instead of providing protection, algorithm patents actually pose a threat to smaller software houses lacking the resources to defend themselves in costly patent litigation. It can be argued that patents encourage an oligarchical industrial structure and discourage competition. {\it Is the Patent Office able to deal with algorithm patents?} There is abundant evidence that the Patent Office is not up to the job. Many algorithmic ``inventions'' have been granted undeserved patents, greatly increasing the potential for legal entanglement and litigation. Moreover, it seems unlikely that there will be any substantial improvement in the quality of patent examinations. \beginsection Conclusions It seems clear from the previous discussion that the nature of work on algorithms is quite different from that in other fields where the principles of patents apply more readily. This in itself is a strong argument against patenting algorithms. In addition, we believe that the patenting of algorithms would have an extremely damaging effect on our research and on our teaching, particularly at the graduate level, far outweighing any imaginable commercial benefit. Here is a partial list of reasons for this view: \item{$\bullet$} Patents provide a protection which is not warranted given the nature of our work. 
\item{$\bullet$} Patents are filed secretly and would likely slow down the flow of information and the development of results in the field. \item{$\bullet$} Patents necessarily impose a long-term monopoly over inventions. This would likely restrict rather than enhance the availability of algorithms and software for optimization. \item{$\bullet$} Patents introduce tremendous uncertainty and add a large cost and risk factor to our work. This is unwarranted since our work does not generate large amounts of capital. \item{$\bullet$} Patents would not provide any additional source of public information about algorithms. \item{$\bullet$} Patents would largely be concentrated within large institutions as universities and industrial labs would likely become the owners of patents on algorithms produced by their researchers. \item{$\bullet$} Once granted, even a patent with obviously invalid claims would be difficult to overturn by persons in our profession due to high legal costs. \item{$\bullet$} If patents on algorithms were to become commonplace, it is likely that nearly all algorithms, new or old, would be patented to provide a defense against future lawsuits and as a potential revenue stream for future royalties. Such a situation would have a very negative effect on our profession. \beginsection Recommendations The practice of patenting algorithms is harmful to the progress of research and teaching in optimization, and therefore harmful to the vital interests of MPS. MPS should therefore take such actions as it can to help stop this practice, or to limit it if it cannot be stopped. In particular: \item{$\bullet$} The MPS Council should adopt a resolution opposing the patenting of algorithms on the grounds that it harms research and teaching. \item{$\bullet$} MPS should urge its sister societies ({\it e.g.,} SIAM, ACM, IEEE Computer Society, AMS) to take a similar forthright position against algorithm patents. \item{$\bullet$} MPS should publish information in one or more of its publications as to why patenting of algorithms is undesirable. \item{$\bullet$} The Chairman of MPS should write in his official capacity to urge members of Congress to pass a law declaring algorithms non-patentable (and, if possible, nullifying the effects of patents already granted on algorithms). \item{$\bullet$} MPS should support the efforts of other organizations to intervene in opposition to the patenting of algorithms (for example, as friends of the court or with Congress). It should do so by means such as providing factual information on mathematical programming issues and/or history, and commenting on the impact of the patent issue to our research and teaching in mathematical programming. MPS should urge its members to do likewise. \vskip .6 in \settabs 6 \columns \centerline{The Committee on Algorithms and the Law} \smallskip \+&&&George B. Dantzig\cr \+&&&Donald Goldfarb\cr \+&&&Eugene Lawler\cr \+&&&Clyde Monma\cr \+&&&Stephen M. Robinson (Chair)\cr \medskip \+&&&26 September 1990\cr \vfill\eject \end From uh311ae at sunmanager.lrz-muenchen.de Sun Nov 17 17:40:49 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 17 Nov 91 23:40:49+0100 Subject: Boltzmann machine, anyone ? Message-ID: <9111172240.AA12057@sunmanager.lrz-muenchen.de> I heard that some people failed to make an analog VLSI implementation. Is there any fast hardware out ? Would anybody still think it worth a try, at least in principle ? Thanks, Henrik MPCI at LLNL IBM Research U. 
of Munich From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Nov 17 18:03:27 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 17 Nov 91 18:03:27 EST Subject: Subtractive network design In-Reply-To: Your message of Fri, 15 Nov 91 16:12:20 +0100. <9111151512.AA02320@nn.meatre.dk.meatre.dk> Message-ID: My point is that subtractive shemes are more likely to find these global descriptions. These structures so to speek condense out of the more complicated structures under the force of subtraction. I would like to hear your opinion on this claim! For best generalization in supervised learning, the goal is to develop a separating surface that captures as much as possible of the "signal" in the data set, without capturing too much of the noise. If one assumes that the signal components are larger and more coherent than the noise, you can do this by restricting the complexity of the separating surface(s). This, in turn, can be accomplished by choosing a network architecture with exactly the right level of complexity or by stopping the training before the surface gets too contorted. (The excess degrees of freedom are still there, but tend to be redundant with one another in the early phases of training.) Since, in most cases, we can't guess in advance just what architecture is needed, we must make this selection dynamically. An architecture like Cascade-Correlation builds the best model it can without hidden units, then the best it can do with one, and so on. It's possible to stop the process as soon as the cross-validation performance begins to decline -- a sign that the signal has been exhausted and you're starting to model noise. One problem is that each new hidden unit receives connections from ALL available inputs. Normally, you don't really need all those free parameters at once, and the excess ones can hurt generalization in some problems. Various schemes have been proposed to eliminate these unneccessary degrees of freedom as the new units are being trained, and I think this problem will soon be solved. A subtractive scheme can also lead to a network of about the right complexity, and you cite a couple of excellent studies that demonstrate this. But I don't see why these should be better than additive methods (except for the problem noted above). You suggest that a larger net can somehow form a good global description (presumably one that models a lot of the noise as well as the signal), and that the good stuff is more likely to be retained as the net is compressed. I think it is equally likely that the global model will form some sort of description that blends signal and noise components in a very distributed manner, and that it is then hard to get rid of just the noisy parts by eliminating discrete chunks of network. That's my hunch, anyway -- maybe someone with more experience in subtractive methods can comment. I beleive that the subtractive schemes will be slower, other things being equal: you have to train a very large net, lop off something, retrain and evaluate the remainder, and iterate till done. It's quicker to build up small nets and to lock in useful sub-assemblies as you go. But I guess you wanted to focus only on generalization and not on speed. Well, opinions are cheap. If you really want to know the answer, why don't you run some careful comparative studies and tell the rest of us what you find out. 
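A minimal sketch of the stopping rule described above, namely grow the model one step at a time and stop as soon as cross-validation performance begins to decline, assuming a deliberately simple stand-in for a growing network. This is not Cascade-Correlation itself: the polynomial degree merely plays the role of the number of hidden units, and the toy data, names and random seed are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: a smooth signal plus noise, split into training and validation.
x = np.linspace(-1.0, 1.0, 60)
y = np.exp(2.0 * x) + 0.2 * rng.standard_normal(x.size)
x_tr, y_tr = x[::2], y[::2]      # training cases
x_va, y_va = x[1::2], y[1::2]    # cross-validation cases

def val_error(degree):
    """Fit the training set at the given capacity, score on validation."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    resid = np.polyval(coeffs, x_va) - y_va
    return float(np.mean(resid ** 2))

# Grow capacity one unit at a time; stop when held-out error gets worse.
best_deg, best_err = 0, val_error(0)
for deg in range(1, 20):
    err = val_error(deg)
    if err >= best_err:          # cross-validation performance declined:
        break                    # the extra capacity is modelling noise
    best_deg, best_err = deg, err

print("selected capacity:", best_deg, "validation MSE:", round(best_err, 4))

The same loop applies unchanged when the capacity being grown is the number of hidden units rather than the polynomial degree; the only requirement is a held-out set that never drives the weight updates themselves.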
Scott Fahlman School of Computer Science Carnegie Mellon University From KRUSCHKE at ucs.indiana.edu Sat Nov 16 15:38:42 1991 From: KRUSCHKE at ucs.indiana.edu (John K. Kruschke) Date: Sat, 16 Nov 91 15:38:42 EST Subject: Subtractive network design Message-ID: The dichotomy between additive and subtractive schemes for modifying network architectures is based on the notion that nodes which are not "in" the network consume no computation or memory; i.e., what gets added or subtracted is the *presence* of the node. An alternative construal is that what gets added or subtracted is not the node itself but its *participation* in the functionality of the network. As a trivial example, a node can be present but not participate if all the weights leading out of it are zero. Under the first construal (presence), subtractive schemes can be more expensive to implement in hardware or software than "additive" schemes, because the additive schemes spend nothing on nodes which aren't there yet. Under the second construal (functional participation), the two schemes consume equal amounts of resources, because all the nodes are processed all the time. In this latter case, arguments for or against one type of scheme must come from other constraints; e.g., ability to generalize, learning speed, neural plausibility, or even (gasp!) human performance. Architecture modification schemes can be both additive and subtractive. For example, Kruschke and Movellan (1991) described an algorithm in which individual nodes from a large pool of candidates can have their functional participation gradually suppressed (subtracted) or resurrected (added). Other methods for manipulating the functional participation of hidden nodes are described in the other papers listed below. Kruschke, J. K., & Movellan, J. R. (1991). Benefits of gain: Speeded learning and minimal hidden layers in back propagation networks. IEEE Transactions on Systems, Man and Cybernetics, v.21, pp.273-280. Kruschke, J. K. (1989b). Distributed bottlenecks for improved generalization in back-propagation networks. International Journal of Neural Networks Research and Applications, v.1, pp.187-193. Kruschke, J. K. (1989a). Improving generalization in back-propagation networks with distributed bottlenecks. In: Proceedings of the IEEE International Joint Conference on Neural Networks, Washington D.C. June 1989, v.1, pp.443-447. Kruschke, J. K. (1988). Creating local and distributed bottlenecks in hidden layers of back-propagation networks. In: D. Touretzky, G. Hinton, & T. Sejnowski (eds.), Proceedings of the 1988 Connectionist Models Summer School, pp.120-126. San Mateo, CA: Morgann Kaufmann. ------------------------------------------------------------------- John K. Kruschke Asst. Prof. of Psych. & Cog. Sci. Dept. of Psychology internet: kruschke at ucs.indiana.edu Indiana University bitnet: kruschke at iubacs Bloomington, IN 47405-4201 office: (812) 855-3192 USA lab: (812) 855-9613 =================================================================== From reiner at isy.liu.se Mon Nov 18 00:50:47 1991 From: reiner at isy.liu.se (Reiner Lenz) Date: Mon, 18 Nov 91 06:50:47 +0100 Subject: Algorithms for Principal Components Analysis In-Reply-To: "Terence D. Sanger" Tue, 12 Nov 91 17:54:54 EST Message-ID: <9111180545.AA04405@rainier.isy.liu.se> Here is our contribution to the computation of Principle Components. 
We developed 1) a system that learns the principal components in parallel @article{Len_proof:91, author ={Reiner Lenz and Mats \"Osterberg}, title="Computing the Karhunen-Loeve expansion with a parallel, unsupervised filter system", journal = "Neural Computation", year = "Accepted" } 2) Recently we modified this system to overcome some of the drawbacks of the standard principal components approach (such as mixing eigenvectors belonging to the same eigenvalue, etc.). @techreport{Len_4o:91, author ={Reiner Lenz and Mats \"Osterberg}, title="A new method for unsupervised linear feature extraction using fourth order moments", institution={Link\"oping University, ISY, S-58183 Link\"oping}, note="Internal Report", year="1991" } These systems are part of our work on group theoretical methods in image science as described in @article{Len_jos:89, author ="Reiner Lenz", title ="A Group Theoretical Model of Feature Extraction", journal="J. Optical Soc. America A", volume="6", number="6", pages="827-834", year = "1989" } @article{Len:90, author= "Reiner Lenz", title = "Group-Invariant Pattern Recognition", journal = "Pattern Recognition", volume="23", number="1/2", pages = "199-218", year = "1990" } @article{Len_nn:91, author ="Reiner Lenz", title="On probabilistic Invariance", journal = "Neural Networks", volume="4", number="5", year = "1991" } @book{Len:90ln, author= "Reiner Lenz", title = "Group Theoretical Methods in Image Processing", publisher = "Springer Verlag", series = "Lecture Notes in Computer Science (Vol. 413)", address = "Heidelberg, Berlin, New York", year = "1990" } From lba at sara.inesc.pt Sun Nov 17 16:33:29 1991 From: lba at sara.inesc.pt (Luis B. Almeida) Date: Sun, 17 Nov 91 20:33:29 -0100 Subject: Patents In-Reply-To: jon@cs.flinders.oz.au's message of Fri, 15 Nov 91 15:59:36 +1030 <9111150529.AA02911@turner> Message-ID: <9111172133.AA22872@sara.inesc.pt> I am sorry, but I am strongly in favor of the right to patent algorithms. In fact, I have recently patented the usual algorithm for multiplication of numbers in any base, including binary (you may not believe it, but it had not yet been patented). I am expecting to earn large sums of money, especially from computer and calculator manufacturers, and also from neural network people, who are said to be very fond of sums of products. I wouldn't like this large income to simply disappear. I am now in the process of patenting algorithms for addition, subtraction and division, and I already have an idea of a square root algorithm, which probably also is worth patenting. Luis B. Almeida INESC Phone: +351-1-544607 Apartado 10105 Fax: +351-1-525843 P-1017 Lisboa Codex Portugal lba at inesc.pt lba at inesc.uucp (if you have access to uucp) From srikanth at cs.tulane.edu Mon Nov 18 12:19:01 1991 From: srikanth at cs.tulane.edu (R Srikanth) Date: Mon, 18 Nov 91 11:19:01 CST Subject: Subtractive network design In-Reply-To: <9111180509.AA27873@rex.cs.tulane.edu>; from "Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU" at Nov 17, 91 6:03 pm Message-ID: <9111181719.AA00230@poseidon.cs.tulane.edu> > > > My point is that subtractive shemes are more likely to find > these global descriptions. These structures so to speek condense out of > the more complicated structures under the force of subtraction. > > I would like to hear your opinion on this claim! > > A subtractive scheme can also lead to a network of about the right > complexity, and you cite a couple of excellent studies that demonstrate > this.
But I don't see why these should be better than additive methods > (except for the problem noted above). You suggest that a larger net can > somehow form a good global description (presumably one that models a lot of > the noise as well as the signal), and that the good stuff is more likely to > be retained as the net is compressed. I think it is equally likely that > the global model will form some sort of description that blends signal and > noise components in a very distributed manner, and that it is then hard to > get rid of just the noisy parts by eliminating discrete chunks of network. > That's my hunch, anyway -- maybe someone with more experience in > subtractive methods can comment. > Also, there is the question of over-generalization. A larger network, given a set of m points with which to learn a parabola, may end up learning a higher-order polynomial, which is a case of over-generalization leading to poor performance. Of course, the reverse is also true. The question posed here is: do we need the first best fit or the most general fit? The answer may be different for different problems. Thus we may be able to generate opposite views in different problem spaces. srikanth -- srikanth at rex.cs.tulane.edu Dept of Computer Science, Tulane University, New Orleans, La - 70118 From P.Refenes at cs.ucl.ac.uk Mon Nov 18 13:09:31 1991 From: P.Refenes at cs.ucl.ac.uk (P.Refenes@cs.ucl.ac.uk) Date: Mon, 18 Nov 91 18:09:31 +0000 Subject: Subtractive network design In-Reply-To: Your message of "Sun, 17 Nov 91 18:03:27 EST." Message-ID: reduce the generality of a network and thus improve its generalisation. Depending on size and training times, fixed-geometry networks often develop (near-) duplicate and/or (near-) redundant functionality. Pruning techniques aim to remove this functionality from the network, and they do quite well here. There are however two problems: firstly, these are not the only cases of excess functionality, and secondly, the removal of near-zero connections often ignores the knock-on effects on generalisation due to the accumulated influence that these connections might have. It is often conjectured that hidden unit size is the culprit for bad generalisation. This is not strictly so. The true culprit is the high degree of freedom in exploring the search space, which also depends on other parameters such as training times. The solution proposed by Scott Fahlman, i.e. to use the cross-validation performance as an indicator of when to stop, is not complete, because as soon as you do this the cross-validation dataset becomes part of the training dataset (the fact that we are not using it for the backward pass is irrelevant). So any improvement in generalisation is probably due to the fact that we are using a larger training dataset (again, the fact that we are doing it manually should not divert us). My view is that this method should be treated as a "good code of professional practice" when reporting results, rather than as a panacea. Paul Refenes From ingber at umiacs.UMD.EDU Mon Nov 18 14:50:56 1991 From: ingber at umiacs.UMD.EDU (Lester Ingber) Date: Mon, 18 Nov 1991 14:50:56 EST Subject: Genetic algorithms and very fast simulated re-annealing: A comparison Message-ID: <9111181950.AA00292@dweezil.umiacs.UMD.EDU> connectionists at cs.cmu.edu *** Please do not forward to any other lists *** Bruce Rosen and I have written the following paper, and placed it in the Neuroprose archive as ingber.saga.ps.tar.Z. Please see the note at the end of this file to extract the text and figures.
Genetic algorithms and very fast simulated re-annealing: A comparison Lester Ingber Science Transfer Corporation, P.O. Box 857, McLean, VA 22101 ingber at umiacs.umd.edu and Bruce Rosen Department of Computer & Information Sciences, University of Delaware, Newark, DE 19716 brosen at cis.udel.edu We compare Genetic Algorithms (GA) with a functional search method, Very Fast Simulated Re-Annealing (VFSR) that not only is efficient in its search strategy, but also is statistically guaranteed to find the function optima. GA previously has been demonstrated to be competitive with other standard Boltzmann-type simulated annealing techniques. Presenting a suite of six stan- dard test functions to GA and VFSR codes from previous studies, without any additional fine tuning, strongly suggests that VFSR can be expected to be orders of magnitude more efficient than GA. To ftp this file from Neuroprose to your local machine, follow these directions, typing in commands between single quotes (without the quotes included). Start with 'cd /tmp' as noted below, so that you won't have to be concerned with deleting all these files after you're finished printing. local% cd /tmp local% 'ftp archive.cis.ohio-state.edu' [local% 'ftp 128.146.8.52'] Name (archive.cis.ohio-state.edu:yourloginname): 'anonymous' Password (archive.cis.ohio-state.edu:anonymous): 'yourloginname' ftp> 'cd pub/neuroprose' ftp> 'binary' ftp> 'get ingber.saga.ps.tar.Z' ftp> 'quit' local% Now, at your local machine: 'uncompress ingber.saga.ps.tar.Z' will leave "ingber.saga.ps.tar". 'tar xf ingber.saga.ps.tar' will leave a directory "saga.dir". 'cd saga.dir' will put you in a directory with the text.ps file and 6 figX.ps files, where X = 1-6. If you 'ls -l' you should get -rw-r--r-- 1 ingber 4928 Nov 17 06:49 fig1.ps -rw-r--r-- 1 ingber 6949 Nov 17 06:49 fig2.ps -rw-r--r-- 1 ingber 14432 Nov 17 06:50 fig3.ps -rw-r--r-- 1 ingber 5311 Nov 17 06:50 fig4.ps -rw-r--r-- 1 ingber 7552 Nov 17 06:50 fig5.ps -rw-r--r-- 1 ingber 6222 Nov 17 06:50 fig6.ps -rw-r--r-- 1 ingber 85945 Nov 17 06:52 text.ps (with your name instead of mine). Now you can 'lpr [-P..] *.ps' to a PostScript laserprinter. This will print out 18 pages: 12 pages of text + 6 graphs. If you'd like a copy of the final version when this paper is published, just drop me a note with the word sagareprint (all caps or all lower case O.K.) anyplace in your email, and I'll oblige. Lester Ingber ============================================================ ------------------------------------------ | | | | | | | Prof. Lester Ingber | | ______________________ | | | | | | Science Transfer Corporation | | P.O. Box 857 703-759-2769 | | McLean, VA 22101 ingber at umiacs.umd.edu | | | ------------------------------------------ ============================================================ From kolen-j at cis.ohio-state.edu Mon Nov 18 20:02:32 1991 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Mon, 18 Nov 91 20:02:32 -0500 Subject: No subject In-Reply-To: Umeshram Harigopal's message of Thu, 14 Nov 91 16:24:39 -0600 <9111142224.AA19216@plucky.eng.ua.edu> Message-ID: <9111190102.AA14728@retina.cis.ohio-state.edu> Higher order recurrent networks are recurrent networks with higher order connections, (i[1]*i[2]*w[1,2] instead of i[1]*w[1]). An example of a high order recurent network is Pollack's sequential cascaded networks which appear, I believe, in the latest issue of Machine Learning. This network can be described as two three-dimensional matrices, W and V, and the following equations. 
O[t] = Sigmoid( (W . S[t]) . I[t]) S[t+1]=Sigmoid( (V . S[t]) . I[t]) where I[t] is the input vector, O[t] is the output vector, and S[t] is the state vector, each at time t. ( . is inner product) John Kolen From soller%asylum at cs.utah.edu Mon Nov 18 23:34:22 1991 From: soller%asylum at cs.utah.edu (Jerome Soller) Date: Mon, 18 Nov 91 21:34:22 -0700 Subject: Research Position in Medical Information Systems at VA GRECC Message-ID: <9111190434.AA03143@asylum.utah.edu> The Salt Lake City VA GRECC requested that I post the following notice. Please forward this message to any interested researchers. To prevent accidentally sending a response to all people on either of these very large mailing lists, do not respond directly by e-mail. Jerome B. Soller Salt Lake City VA Regional Information Systems Center -------------------------- cut here ------------------------------------------ **** Please Post *** Please Post *** Please Post *** Please Post ***** ------------------------------------------------------------------------------ POSITION AVAILABLE IN MEDICAL INFORMATION SYSTEMS ------------------------------------------------------------------------------ The Salt Lake City Veterans Affairs GRECC(Geriatric, Research, Education, and Clinical Center) has an opening for a Ph.D. or MD level senior research position in the area of Medical Information Systems. This GRECC is one of 15 GRECCs nationally. The computer work at the GRECC is being supported by the Department of Veterans Affairs Salt Lake City VA Regional Information Systems Center (one of 7 national R and D centers for the VA specializing in information systems) and the Salt Lake City Veterans Affairs Hospital's IRMS division. The Department of Veterans Affairs has 172 hospitals nationwide, all combined into the DHCP database. Because these hospitals serve millions of patients each year, the opportunity exists for analysis of huge data sets, otherwise unavailable. The GRECC encourages its researchers to pursue joint research with other research groups at the U. of Utah. Opportunities for joint research include the following: 1) Computer Science(Strength areas include expert systems tools, graphics, integrated circuit design, robotic/vision, parallel numerical modelling, etc..,) 2) Medical Informatics(Projects include the Help Expert System, the Iliad Expert System, semantic networks, and support of the Library of Medicine's Unified Medical Language System. The Department of Nursing Informatics complements the Department of Medical Informatics, but with an emphasis on nursing systems.) 3) Bioengineering(Has many neuroprosthetic projects.) 4) Human Genetics(Has just established a 50 million dollar research center and has its own computer research group.) 5) Anesthesiology (Has an industrially supported neural network research group.) 6) The Center for Engineering Design(Creators of the Utah/MIT dextrous hand, the Utah Arm, and medical devices.) 7) The Cognitive Science Program, which is in its formative stages. Candidates for this position should have knowledge and demonstrated research in many of the following areas with an emphasis or potential applicability to medical applications: databases, digital signal processing, instrumentation, expert systems, statistics, time series analysis, fuzzy logic, neural networks, parallel computation, physiological and neural modelling, and data fusion. Candidates for this position must be U.S. citizens. To apply, send or fax your curriculum vitae to Dr. 
Gerald Rothstein, Director of the GRECC Mail Code 182 500 Foothill Boulevard Salt Lake City, Utah 84148 Phone Number: (801) 582-1565 extension 2475 Fax Number: (801) 583-7338 Dr. Gerald Rothstein Director, Salt Lake City VA GRECC From ken at cns.caltech.edu Tue Nov 19 01:18:33 1991 From: ken at cns.caltech.edu (Ken Miller) Date: Mon, 18 Nov 91 22:18:33 PST Subject: con etiquete Message-ID: <9111190618.AA21823@cns.caltech.edu> Hello, I enjoy receiving connectionists to keep abreast of technical issues, meetings, new papers. But lately the flow of messages has become very large and the signal to noise has noticeably decreased, and these trends have been sustained for a long enough time that they seem likely to represent a change rather than a fluctuation. I think the reason is a change to a more "conversational" mode, in which people feel free to post their very preliminary and not-very-substantiated thoughts, for example "i think maybe this is good, and i think maybe that is bad". I would like to suggest that we all collectively raise the threshold for what is to be broadcast to the 1000's of us on the net as a whole. Many of these "conversational" entries would in my opinion be better kept as private conversations among the small group of people involved. When some concrete conclusions emerge, or a concrete set of questions needing more investigation emerges, *then* a *distilled* post could be sent to the net as a whole. If you will: forward prop through private links, backward prop to the net as a whole. This also means that, when a concrete question is posted, answers unless very solid might best be sent to the poster of the question, who in turn may eventually send a distillate to the net. Along the same lines, I would like to make a suggestion that people *strongly* avoid publicly broadcasting dialogues; iterate privately, and if relevant send the net a distilled version of the final results. Finally I would also suggest that the urge to philosophical, as opposed to technical, discussions be strongly suppressed; please set the threshold very very high. In conclusion, I would like to urge everyone to think of connectionists as a place for distilled rather than raw, technical rather than philosophical, discussions. Thanks, Ken p.s. please send angry flames to me, not the net. ken at cns.caltech.edu From ibm at dit.upm.es Tue Nov 19 09:38:30 1991 From: ibm at dit.upm.es (Ignacio Bellido Montes) Date: Tue, 19 Nov 91 15:38:30 +0100 Subject: Patents Message-ID: <9111191438.AA01121@bosco.dit.upm.es> I'm not sure if Luis B. Almeida's message is just a joke or something else, in that case,m I think I can patent the weel. This is one of the most used things in the world, and not only by computer scientist, but by everybody, so.... I must chec first if it is patented and in that case, I can act... Anyway, I don't know about laws, but I think that nobody should be able to patent something (algorithm or not) previously used by other people, who didn't patented it. Ignacio Bellido ============================================================================ Ignacio Bellido Fernandez-Montes, Departamento de Ingenieria de Department of Telematic Sistemas Telematicos (dit), Systems Engineering, Laboratorio de Inteligencia Artificial. Artificial Intelligence Laboratory. Universidad Politecnica de Madrid. Madrid University of Technology. e-mail: ibellido at dit.upm.es Phone: Work: .. 34 - 1 - 5495700 ext 440 Home: .. 34 - 1 - 4358739 TELEX: 47430 ETSIT E Fax: .. 
34 - 1 - 5432077 ============================================================================ From pablo at cs.washington.edu Tue Nov 19 12:42:41 1991 From: pablo at cs.washington.edu (David Cohn) Date: Tue, 19 Nov 91 09:42:41 -0800 Subject: NIPS Workshop Announcement (and CFP) Message-ID: <9111191742.AA09972@june.cs.washington.edu> -------------------------------------------------------------- NIPS Workshop on Active Learning and Control Announcement (and call for participation) -------------------------------------------------------------- organizers: David Cohn, University of Washington Don Sofge, MIT AI Lab An "active" learning system is one that is not merely a passive observer of its environment, but instead play an active role in determining its inputs. This definition includes classification networks that query for values in "interesting" parts of their domain, learning systems that actively "explore" their environment, and adaptive controllers that learn how to produce control outputs to achieve a goal. Common facets of these problems include building world models in complex domains, exploring a domain to safely and efficiently, and, planning future actions based on one's model. In this workshop, our main focus will be to address key unsolved problems which may be holding up progress on these problems rather than presenting polished, finished results. Ours hopes are that unsolved problems in one field may be able to draw on insight from research in other fields. Each session of the workshop will begin with introductions to specific problems in the field by researchers in each area, with the second half of each session reserved for discussion. --------------------------------------------------------------------------- Current speakers include: Chris Atkeson, MIT AI Lab Tom Dietterich, Oregon State Univ. Michael Jordan, MIT Brain & Cognitive Sciences Michael Littman, BellCore Andrew Moore, MIT AI Lab Jurgen Schmidhuber, Univ. of Colorado, Boulder Satinder Singh, UMass Amherst Rich Sutton, GTE Sebastian Thrun, Carnegie-Mellon University A few open slots remain, so if you would be interested in discussing your "key unsolved problem" in active learning, exploration, planning or control, send email to David Cohn (pablo at cs.washington.edu) or Don Sofge (sofge at ai.mit.edu). --------------------------------------------------------------------------- Friday, 12/6, Morning Active Learning " " Afternoon Learning Control Saturday, 12/7, Morning Active Exploration " " Afternoon Planning --------------------------------------------------------------------------- From tony at aivru.sheffield.ac.uk Tue Nov 19 12:20:44 1991 From: tony at aivru.sheffield.ac.uk (Tony_Prescott) Date: Tue, 19 Nov 91 17:20:44 GMT Subject: connectionist map-building Message-ID: <9111191720.AA01072@aivru> Does anyone know the whereabouts of Martin Snaith of the Information Techonology group? He appeared on a BBC equinox program recently describing a mobile robot that navigates using a connectionist map. I would also be interested in hearing from anyone else who is using networks to generate maps for robot navigation. Thanks, Tony Prescott (AI vision research unit, Sheffield). From P.Refenes at cs.ucl.ac.uk Tue Nov 19 08:21:25 1991 From: P.Refenes at cs.ucl.ac.uk (P.Refenes@cs.ucl.ac.uk) Date: Tue, 19 Nov 91 13:21:25 +0000 Subject: Subtractive network design Message-ID: I have just realised that the first part of my earlier message on this was garbled. 
The first few sentecnes read: =============================== I agree with Scott Fahlman on this point. Both techniques try to reduce the generality of a network and thus improve its generalisation. Depending on size and training times, fixed geometry networks often develop (near-) duplicate and/or (near-) reduntant functionality. Prunning techniques aim to remove this functionality from the network and they do quite well here. There are however two problems: firstly, these are not the only cases of increased functionality, and secondly, the removal of near zero connections often ignores the knock-on effects on generalisation due the accumulated influence that these connections might have. It is often conjectured that hidden unit size is the culprit for bad generalisation. This is not strictly so. The true culprit is the high degree of freedom in exploring the search space which also depends on other parameters such as training times. The solution proposed by Scott Fahlman i.e. to use the cross-validation performance as an indicator of when to stop is not complete, because as soon as you do this the cross- validation dataset becomes part of the training dataset (the fact that we are not using it for the backward pass is irrelevant). So any improvement in generalisation is probably due to the fact that we are using a larger training dataset (again the fact that we are doing manually, should not divert us). My view is that this method should be treated as a "good code of professional practise" when reporting results, rather than as a panacea. Paul Refenes From hinton at ai.toronto.edu Tue Nov 19 16:28:12 1991 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Tue, 19 Nov 1991 16:28:12 -0500 Subject: Subtractive network design In-Reply-To: Your message of Mon, 18 Nov 91 13:09:31 -0500. Message-ID: <91Nov19.162826edt.530@neuron.ai.toronto.edu> You say "The solution proposed by Scott Fahlman i.e. to use the cross-validation performance as an indicator of when to stop is not complete, because as soon as you do this the cross- validation dataset becomes part of the training dataset ... So any improvement in generalisation is probably due to the fact that we are using a larger training dataset." I think this is wrong because you only get a single number (when to stop training) from the validation set. So even if you made the validation set contain infinitely many cases, you would still be limited by the size of the original training set. Quite apart from this point, pruning techniques such as the soft-weight sharing method recently advertised on this net by Steve Nowlan and me (Pearlmutter, 1999) seem to work noticeably better than using a validation set to decide when to stop training. However, the use of a validation set is much simpler and therefore a good thing to try for people in a hurry. Geoff From tackett at ipld01.hac.com Tue Nov 19 16:59:44 1991 From: tackett at ipld01.hac.com (Walter Alden Tackett) Date: Tue, 19 Nov 91 13:59:44 PST Subject: Patents Message-ID: <9111192159.AA06000@ipld01.hac.com> > In fact, I have recently patented the usual algorithm for > multiplication of numbers in any base, I am now in the process of > patenting algorithms for addition, subtraction and division > Luis B. Almeida ....Sorry pal: these algorithms are previously published in the public domain, e.g., elementary school texts and the like, for the past couple of hundred years. And even if you did patent them, we connectionists would just get bootleg copies published in mainland China. 
;-) -wt From Dave_Touretzky at DST.BOLTZ.CS.CMU.EDU Tue Nov 19 23:10:34 1991 From: Dave_Touretzky at DST.BOLTZ.CS.CMU.EDU (Dave_Touretzky@DST.BOLTZ.CS.CMU.EDU) Date: Tue, 19 Nov 91 23:10:34 EST Subject: making CONNECTIONISTS a moderated newsgroup Message-ID: <16991.690610234@DST.BOLTZ.CS.CMU.EDU> The CONNECTIONISTS list has grown too large to continue its current mode of operation. There are over 900 addresses in the CMU-maintained mailing list, and several dozen of those are actually redistribution points at various sites around the globe, rather than individuals. So we probably have a couple of thousand readers overall. I think it may be time to switch to a moderated newsgroup. The problems with the current mode of operation are: - Too many misdirected messages. The clueless will always be with us. But wading through dozens of subscribe, unsubscribe, and "please send me a copy of your tech report" messages has become tiresome. - Too much misuse of the list. Requests for people's mailing addresses, or for very elementary information about neural nets, are not appropriate for CONNECTIONISTS, but we keep getting them anyway. - It's too easy to start flamefests or off-topic discussions. - The load on the CMU mailer is incredible. There is a substantial delay in forwarding messages because we have to send out 900 copies of each one. Dozens of these bounce back due to temporary network outages, changes to host names, accounts being shut down at end of term, etc. - The load on the list maintainer(s) is increasing. Most of the time is now spent dealing with bounced mail messages. I propose converting CONNECTIONISTS to a moderated Usenet newsgroup. The moderators will review each message and forward only those that meet the stated criteria for appropriateness. The idea is to keep the list focused on informed technical discussion, plus relevant announcements of conferences, technical reports, and the like. Messages that the moderators deem inappropriate will be rejected. Note that there is already a Usenet newsgroup called comp.ai.neural-nets. This newsgroup is not moderated, and therefore has a very low signal to noise ratio. In other words, it's mostly junk. Messages that aren't appropriate for CONNECTIONISTS can always be sent there, where they will no doubt be eagerly read by thousands of people. For those readers who don't have Usenet access, we will continue to maintain a small mailing list here at CMU so you can continue to get CONNECTIONISTS by email. Most of you do have access to Usenet, and so the only two changes you should observe if we go ahead with this proposal are: (1) the signal to noise ratio will be greatly improved, and (2) you will have to use your system's newsreading software rather than email software to read and/or post to CONNECTIONISTS. **************** We are soliciting comments on this proposed change. Please send them to Connectionists-Request at cs.cmu.edu, where they will be collected by the list maintainer. Don't bother sending brief "yes" votes; we expect that most people already support this plan. But if you want to argue against the plan, or raise technical issues we may have overlooked, then by all means send us your comments. * Do not send them to me directly. Send them to Connectionists-Request. * Do not reply to the whole CONNECTIONISTS list... or else! 
-- Dave Touretzky and Hank Wan

From thodberg at NN.MEATRE.DK Wed Nov 20 06:14:16 1991 From: thodberg at NN.MEATRE.DK (Hans Henrik Thodberg) Date: Wed, 20 Nov 91 12:14:16 +0100 Subject: Subtractive network design Message-ID: <9111201114.AA03016@nn.meatre.dk.meatre.dk>

Scott_Fahlman writes (Mon Nov 18) "A subtractive scheme can also lead to a network of about the right complexity, and you cite a couple of excellent studies that demonstrate this. But I don't see why these should be better than additive methods"

Well, I agree that what is needed are comparative studies of additive and subtractive methods, so if anybody out there has such studies, please post them! Meanwhile, I believe that one can get some understanding by appealing to pictures and analogies.

My general picture of the mechanisms of a subtractive network design is the following: A network which has learned the training data and is too large is still rather unconstrained. The network is flexible towards rearranging its internal representations in response to some external pressure. This "polymorphic soup" is now subjected to pruning. My favourite pruning technique is brute and efficient (but also time-consuming). It removes one connection at a time tentatively. If the error after some retraining is no worse than before (apart from a small allowable error increase), the connection is considered pruned. Otherwise the network state prior to the removal is reestablished. This gradually forces the network to collapse into simpler networks. It is like an annealing process. By approaching the minimal solution from "above", i.e. from the more complicated networks, one is more likely to find the optimal network, since one is guided by the hopefully wide basin of attraction. Since the basin does not cover everything, one must train and prune again with new initial weights/topology (see Int. Journ. Neur. Syst. for more details).

An additive method does not have this nice pool of resources cooperating in a plastic manner. Suppose that you were to develop the first car in the world by additive methods. Adding one wheel at a time would not lead you to the Honda Civic, because a one- or two-wheeled Civic would be as bad as a zero-wheeled one. However, a twenty-wheeled polymorphic monster-car could be pruned to a Civic.

Another analogy to subtractive methods is brainstorming. Out of a wild discussion, where many complicated ideas are flying through the room, a simple and beautiful solution to the problem can suddenly emerge. The additive approach would correspond to a strict analytical incremental thought process.

I view the reluctance towards subtractive methods as part of the old discussion between AI and connectionism. We (certainly in Denmark) were brought up with LEGO bricks, learning that everything can be constructed from its parts. We are not used to projecting solutions out of chaos or complexity. We like to be in control, and it seems like a waste to throw away part of your model.

------------------------------------------------------------------ Hans Henrik Thodberg Email: thodberg at nn.meatre.dk Danish Meat Research Institute Phone: (+45) 42 36 12 00 Maglegaardsvej 2, Postboks 57 Fax: (+45) 42 36 48 36 DK-4000 Roskilde, Denmark ------------------------------------------------------------------

From P.Refenes at cs.ucl.ac.uk Wed Nov 20 06:13:34 1991 From: P.Refenes at cs.ucl.ac.uk (P.Refenes@cs.ucl.ac.uk) Date: Wed, 20 Nov 91 11:13:34 +0000 Subject: Subtractive network design In-Reply-To: Your message of "Tue, 19 Nov 91 16:28:12 EST."
<91Nov19.162826edt.530@neuron.ai.toronto.edu> Message-ID: You point out (quite correctly) that the validation set only gives a single number. Now, suppose we have a dataset of k training vectors. We divide this dataset into two subsets (N, M) of sizes n, m such that n+m=k. We use the first subset as the training set, and the second subset as the validation set. The only difference between N and M is that N is used during both passes whilst M is only used during the forward pass. My argument is that if we used M for both passes we would still get a better generalisation anyway because we have more points from which to approximate the polynomial, and more constraints to satisfy. The only case in which this is not true is when N is already sufficiently large (and representative) but this is hardly ever the case in practise. You also say: > I think this is wrong because you only get a single number > (when to stop training) from the validation set. So even if > you made the validation contain infinitely many cases, you > would still be limited by the size of the original training > set. My conjecture is that if you used these "infinitely many cases", for both passes (starting with a small network and increasing it gradually until convergence) you would get equally good, and perhaps better generalisation. Paul From frederic at neuretp.biol.ruu.nl Wed Nov 20 08:45:31 1991 From: frederic at neuretp.biol.ruu.nl (frederic@neuretp.biol.ruu.nl) Date: Wed, 20 Nov 91 14:45:31 +0100 Subject: Patents Message-ID: <911120.144531.1852@neuretp.biol.ruu.nl> > > > > I'm not sure if Luis B. Almeida's message is just a joke or something >else, in that case,m I think I can patent the weel. This is one of the most >used things in the world, and not only by computer scientist, but by >everybody, so.... I must chec first if it is patented and in that case, I can >act... > Anyway, I don't know about laws, but I think that nobody should be >able to patent something (algorithm or not) previously used by other people, >who didn't patented it. > > Ignacio Bellido I am glad that someone else thinks the same thing. It is either a joke, or something strange is going on. As I remember what I have heard about patent law, you cannot patent something that has been in the so called 'public domain', i.e. in general public usage (at least in the US). It must be something that is new and original, as well as useful, or a useful extension of a previously patented idea. There is also a condition that it not be 'obvious' to an expert in the field (which is a bit fuzzy, I think). Since the algorithms being refered to Almeida's message are in general 'public' use, I don't think that they would pass inspection of the patent department. If Mr. Almeida would clarify why the above reasoning is wrong, or the conditions for patentability are not what I remember, I would be grateful. >============================================================================ >Ignacio Bellido Fernandez-Montes, >============================================================================ Eric Fredericksen frederic at cs.unc.edu Dept. of Computer Science, UNC-Chapel Hill frederic at neuretp.biol.ruu Dept. of Neuroethology, Rijksuniveriteit Utrecht, The Netherlands From lba at sara.inesc.pt Wed Nov 20 13:18:23 1991 From: lba at sara.inesc.pt (Luis B. 
Almeida) Date: Wed, 20 Nov 91 17:18:23 -0100 Subject: Patents Message-ID: <9111201818.AA29024@sara.inesc.pt> From at risc.ua.edu Wed Nov 20 14:47:35 1991 From: at risc.ua.edu (@risc.ua.edu) Date: Wed, 20 Nov 91 13:47:35 CST Subject: GA VFSR paper Message-ID: <9111201947.AA01862@galab2.mh.ua.edu> (the following letter was sent to B. Rosen and L. Ingber, but I dropped a copy to connectionist as well, since I felt it might be of interest) Hi, I just finished reading your interesting paper on GA vs. VFSR. You may be interested in a technical report I distribute: Goldberg, D. E. (1990).A note on Boltzmann tournament selection for genetic algorithms and population-oriented simulated annealing (TCGA Report No. 90003). This paper deals with the combination of a SA-like selection procedure with a GA. Although the implementation is rather rough, the idea is provocative. Ultimately, SA is what GA researchers would view as a selection algorithm that should be able to compliment, rather than compete with GAs. This is an interesting area for future research, although I've never been able to get around to experimenting with these ideas. I'd be glad to send you a copy of the report. If it happens to spark any ideas, I'd love to get in on them. BTW, the GA selection scheme you use (roulette wheel selection) is known to be very noisy, and is not generally used in modern GAs. See: %&&##$$ @article{Baker:87, author = "Baker, J. E.", year = "1987", title = "Reducing Bias and Ineffiency in the Selection Algorithms", journal = "Proceedings of the Second International Conference on Genetic Algorithms", pages = "14--21"} Goldberg, D. E., & Deb, K. (1990). A comparative analysis of selection schemes used in genetic algorithms (TCGA Report No. 90007). It would also be interesting to examine more problems, since even DeJong has criticized the concentration of the GA community on his test suite. It would probably be good to consider problems that are constructed with various degrees of deception. See the following papers: Goldberg, D. E. (1988a). Genetic algorithms and Walsh functions: Part I, a gentle introduction (TCGA Report No. 88006). Goldberg, D. E. (1989). Genetic algorithms and Walsh functions: Part II, deception and its analysis (TCGA Report No. 89001). Goldberg, D. E. (1986). Simple genetic algorithms and the minimal, deceptive problem (TCGA Report No. 86003). Take Care, Rob Smith. ------------------------------------------- Robert Elliott Smith Department of Engineering of Mechanics Room 210 Hardaway Hall The University of Alabama Box 870278 Tuscaloosa, Alabama 35487 <> @ua1ix.ua.edu:rob at galab2.mh.ua.edu <> (205) 348-1618 <> (205) 348-8573 ------------------------------------------- From lss at compsci.stirling.ac.uk Wed Nov 20 07:01:29 1991 From: lss at compsci.stirling.ac.uk (Dr L S Smith (Staff)) Date: 20 Nov 91 12:01:29 GMT (Wed) Subject: No subject Message-ID: <9111201201.AA18333@uk.ac.stir.cs.tugrik> Subject: Flames on patents. Can I suggest that what will happen if people patent known and published algorithms is that patenting will simply fall into disrepute. Companies and private individuals will ignore patent law altogether. And the ONLY result will be even more joy for lawyers. I apologise for using bandwidth on this irrelevance. 
--Leslie Smith From rr at cstr.edinburgh.ac.uk Wed Nov 20 14:10:46 1991 From: rr at cstr.edinburgh.ac.uk (Richard Rohwer) Date: Wed, 20 Nov 91 19:10:46 GMT Subject: Patents Message-ID: <5394.9111201910@cstr.ed.ac.uk> Connectionists is probably not an appropriate forum for flaming about software patents, but I think that many connectionists feel strongly about it, so I would like to suggest a different forum, especially to people living in EEC countries. This is the European League for Programming Freedom list (contact elpf-request at castle.ed.ac.uk). This list is not meant for massive flaming, actually, but is intended for discussion and coordination of letter-writing and press campaigns. You can take positive, useful action even if you can only spare an hour or two to write a few letters. Right now this group is mainly working to soften the implementation by each EEC government of an EEC directive which threatens to pave the way for "Look and Feel" suits in Europe similar to those in the US. Richard Stallman (of GNU fame, rms at edu.mit.ai.gnu) is an important influence in this campaign. He probably can put Americans in touch with similar groups there. Richard Rohwer From thildebr at athos.csee.lehigh.edu Thu Nov 21 09:19:48 1991 From: thildebr at athos.csee.lehigh.edu (Thomas H. Hildebrandt ) Date: Thu, 21 Nov 91 09:19:48 -0500 Subject: Patents In-Reply-To: "Luis B. Almeida"'s message of Wed, 20 Nov 91 17:18:23 -0100 <9111201818.AA29024@sara.inesc.pt> Message-ID: <9111211419.AA12296@athos.csee.lehigh.edu> Dr. Almeida: My hat is off to you for your extremely dry wit! A good measure of wryness is the number of people you fool completely, of which those who were bold enough to post to the net is probably only a small number. I myself was fooled while reading the first 3 or 4 lines. . . . I thought about posting a message saying that I had applied for a patent on the process of constructing a neuron -- with a sufficient admixture of legal and neurobiological jargon to sound convincing. But in light of the recent (orthogonal) discussion regarding the nature of postings which are acceptable on CONNECTIONISTS, I was forced to reconsider. Even so, I would only have been emulating a master. Bravo! Thomas H. Hildebrandt From sun at umiacs.UMD.EDU Thu Nov 21 10:59:52 1991 From: sun at umiacs.UMD.EDU (Guo-Zheng Sun) Date: Thu, 21 Nov 91 10:59:52 -0500 Subject: con etiquete Message-ID: <9111211559.AA28328@neudec.umiacs.UMD.EDU> I agree with Ken that this network should not be a place to broadcast private conversations. Guo-Zheng Sun From shavlik at cs.wisc.edu Thu Nov 21 15:19:02 1991 From: shavlik at cs.wisc.edu (Jude Shavlik) Date: Thu, 21 Nov 91 14:19:02 -0600 Subject: validation sets Message-ID: <9111212019.AA07223@steves.cs.wisc.edu> The question of whether or not validation sets are useful can easily be answered, at least on specific datasets. We have run that experiment and found that devoting some training examples to validation is useful (ie, training on N examples does worse than training on N-k and validating on k). This same issue comes up with decision-tree learners (where the validation set is often called a "tuning set", as it is used to prune the decision tree). I believe there people have also found it is useful to devote some examples to pruning/validating. I think there is also an important point about "proper" experimental methodology lurking in the discussion. 
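To be concrete about what "training on N-k and validating on k" involves, here is a minimal sketch; the sizes, the linear model and the learning rate below are made up purely for illustration, and this is not the experiment referred to above.

import numpy as np

# Minimal sketch of "train on N-k, validate on k" with early stopping.
rng = np.random.RandomState(0)
N = 200                                         # total labelled examples
X = rng.randn(N, 10)
y = X.dot(rng.randn(10)) + 0.5 * rng.randn(N)   # noisy targets

k = 50                                          # examples withheld for validation
X_tr, y_tr = X[:N - k], y[:N - k]
X_va, y_va = X[N - k:], y[N - k:]

w = np.zeros(10)
best = (np.inf, w.copy(), 0)                    # (validation error, weights, epoch)
for epoch in range(500):
    grad = X_tr.T.dot(X_tr.dot(w) - y_tr) / len(y_tr)
    w -= 0.01 * grad                            # weight adjustment uses the N-k part only
    val_err = np.mean((X_va.dot(w) - y_va) ** 2)
    if val_err < best[0]:                       # the k held-out examples pick the stopping point
        best = (val_err, w.copy(), epoch)

print("kept weights from epoch", best[2], "validation MSE", round(best[0], 3))

The held-out k examples never enter the weight update; they only select which pass's weights are kept, which is exactly why they still deserve to be counted.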
If one is using N examples for weight adjustment (or whatever kind of learning one is doing) and also uses k examples for selecting among possible final answers, one should report that the test-set accuracy resulted from N+k training examples. Here's a brief argument for counting the validation examples just like "regular" ones. Let N=0 and k = infinity. Randomly guess some very large number of answers, return the answer that does best on the validation set. Most likely the answer returned will do well on the test set (and all we ever got from the tuning set was a single number). Certainly our algorithm didn't learn from zero examples!

Jude Shavlik University of Wisconsin shavlik at cs.wisc.edu

From lissie!botsec7!botsec1!dcl at uunet.UU.NET Thu Nov 21 14:58:26 1991 From: lissie!botsec7!botsec1!dcl at uunet.UU.NET (David Lambert) Date: Thu, 21 Nov 91 14:58:26 EST Subject: best output representation? Message-ID: <9111211958.AA11111@botsec1.bot.COM>

Hi all. Let's say you have a classification problem that can be answered with one of three mutually exclusive alternatives (e.g., yes, no, don't care). Of the following output representations:

1). One output value:

    answer   output
    ---------------
      +1       +1   (yes)
       0        0   (don't care)
      -1       -1   (no)

2). Two output values:

    answer   output1   output2
    --------------------------
      +1       +1        -1
       0       -1        -1
      -1       -1        +1

3). Three output values:

    answer   output1   output2   output3
    -------------------------------------
      +1       +1        -1        -1
       0       -1        -1        +1
      -1       -1        +1        -1

which is best, and for what reason? Thanks.

David Lambert dcl at botsec7.panix.com (best) dcl at object.com dcl at panix.com

From MCCAINKW at DUVM.OCS.DREXEL.EDU Fri Nov 22 07:55:25 1991 From: MCCAINKW at DUVM.OCS.DREXEL.EDU (kate McCain) Date: Fri, 22 Nov 91 08:55:25 EDT Subject: BBS Message-ID:

I am trying to locate a neural networks related computer conference. Have I reached one?? I have a "meta-interest" in neural networks research -- stemming from my current research devoted to understanding the formal and informal communication structure, subject diversity, etc. in the field. One of the goals of our research is an understanding of the information needs and access problems faced by NN researchers. Contact with a wider range of participants than I have access to in Philadelphia would be valuable.

Kate McCain Associate Professor College of Information Studies Drexel University Philadelphia, PA 19104

From kirk at watson.ibm.com Fri Nov 22 09:17:42 1991 From: kirk at watson.ibm.com (Scott Kirkpatrick) Date: Fri, 22 Nov 91 09:17:42 EST Subject: NIPS ticket (LGA-DEN-LGA) Message-ID:

A change in travel plans is forcing me to discard a perfectly good, but non-refundable ticket to NIPS. Can anybody use it?
12/2 UA 343 lv LGA 840 am ar DEN 1107 am (in time for two tutorials)
12/8 UA 166 lv DEN 629 pm ar LGA 1159 pm
This cost me $260.

From tap at ai.toronto.edu Fri Nov 22 14:18:26 1991 From: tap at ai.toronto.edu (Tony Plate) Date: Fri, 22 Nov 1991 14:18:26 -0500 Subject: Flames on patents. In-Reply-To: Your message of Wed, 20 Nov 91 07:01:29 -0500. <9111201201.AA18333@uk.ac.stir.cs.tugrik> Message-ID: <91Nov22.141837edt.569@neuron.ai.toronto.edu>

Some people appear to be concerned that known and published algorithms will be patented. This concern is mostly misplaced. There are 3 requirements that an invention must satisfy in order to be patentable:
(1) novelty (the invention must be new)
(2) utility (must be useful)
(3) non-obviousness (to a person versed in the appropriate art)
A patent can be invalidated by the existence of "prior art".
Any version of an algorithm, used or published before the date of the patent application, constitutes "prior art". A problem with proving the existence of prior art via implementation is that computer programs get deleted, espcially old ones that ran on systems no longer in existence. With published algorithms there is no such problem. Requirement (3) is particularly contentious in the field of patents on algorithms. Someone writes: > I apologise for using bandwidth on this irrelevance. I apologise too, and suggest that people interested in this issue use the newgroup "comp.patents" for further discussion. There has been much information posted in this newsgroup, including lists of software patents, and requests for examples of prior art. Tony Plate From siegel-micah at CS.YALE.EDU Fri Nov 22 16:33:07 1991 From: siegel-micah at CS.YALE.EDU (Micah Siegel) Date: Fri, 22 Nov 91 16:33:07 EST Subject: Analog VLSI mailing list Message-ID: <9111222133.AA18480@SUNED.ZOO.CS.YALE.EDU> *** Please DO NOT forward to other newsgroups or mailing lists *** ANNOUNCING the genesis of a mailing list devoted to the study of analog VLSI and neural networks. Relevant topics will include the instantiation of neural systems and other collective computations in silicon, analog VLSI design issues, analog VLSI design tools, Tech report announcements, etc... The analog-vlsi-nn mailing list has been created in conjunction with Yale University and its planned Center for Theoretical and Applied Neuroscience (CTAN). Please send subscription requests to analog-vlsi-nn-request at cs.yale.edu. To limit the analog-vlsi-nn mailing list to active researchers in the field, the following information must be provided with subscription requests: 1) Full name; 2) Email address; 3) Institutional affiliation; 4) One sentence summary of current research interests. Please direct mail to the appropriate address. Mailing list submissions (only): analog-vlsi-nn at cs.yale.edu Administrative requests/concerns: analog-vlsi-nn-request at cs.yale.edu --Micah Siegel Analog VLSI NN Moderator ======================================================================= Micah Siegel "for life's not a paragraph siegel at cs.yale.edu Yale University And death i think is no parenthesis" e.e.cummings ======================================================================= From zl at guinness.ias.edu Fri Nov 22 16:56:17 1991 From: zl at guinness.ias.edu (Zhaoping Li) Date: Fri, 22 Nov 91 16:56:17 EST Subject: No subject Message-ID: <9111222156.AA12609@guinness.ias.edu> POSTDOCTOROAL POSITIONS IN COMPUTATIONAL NEUROSCIENCE AT ROCKEFELLER UNIVERSITY, NEW YORK We anticipate the opening of one or two positions in computational neuroscience at Rockefeller University. The positions are at the postdoctoroal level for one year, starting in September 1992, with the possibility of renewal for a second year. Interested applicants should send a CV including a statement of their current research interests, and arrange for three letters of recommendation to be sent as soon as possible directly to Prof. Joseph Atick, Institute for Advanced Study, Princeton, NJ 08540. (Note that applications are to be sent to the Princeton address.) From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Nov 22 21:17:46 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 22 Nov 91 21:17:46 EST Subject: best output representation? In-Reply-To: Your message of Thu, 21 Nov 91 14:58:26 -0500. 
<9111211958.AA11111@botsec1.bot.COM> Message-ID: Hi all. Let's say you have a classification problem that can be answered with one of three mutually exclusive alternatives (eg, yes, no, don't care) Of the following output representations: ... which is best, and for what reason? I would say that the correct answer is "none of the above". What you want to do is use a single output, with +1 meaning "yes" and -1 meaning "no". Take all the "don't care" cases out of the training set, because you don't care what answer they give you. (And you may as well take them out of the test set as well, since they will always be correct.) This will make training faster because (a) the training set is smaller and (b) it allows the net to concentrate all its resources on getting the "do care" cases right. -- Scott Fahlman From zl at guinness.ias.edu Sat Nov 23 14:46:38 1991 From: zl at guinness.ias.edu (Zhaoping Li) Date: Sat, 23 Nov 91 14:46:38 EST Subject: No subject Message-ID: <9111231946.AA13273@guinness.ias.edu> POSTDOCTOROAL POSITIONS IN COMPUTATIONAL NEUROSCIENCE AT ROCKEFELLER UNIVERSITY, NEW YORK We anticipate the opening of one or two positions in computational neuroscience at Rockefeller University. The positions are at the postdoctoroal level for one year, starting in September 1992, with the possibility of renewal for a second year. Interested applicants should send a CV including a statement of their current research interests, and arrange for three letters of recommendation to be sent as soon as possible directly to Prof. Joseph Atick, Institute for Advanced Study, Princeton, NJ 08540. (Note that applications are to be sent to the Princeton address.) From gmk%idacrd at UUNET.UU.NET Sat Nov 23 16:58:27 1991 From: gmk%idacrd at UUNET.UU.NET (Gary M. Kuhn) Date: Sat, 23 Nov 91 16:58:27 EST Subject: Copenhagen: 1992 IEEE Workshop on NN for SP Message-ID: <9111232158.AA05282@> 1992 IEEE Workshop on Neural Networks for Signal Processing. August 31 - September 2, 1992 Copenhagen, Denmark In cooperation with the IEEE Signal Processing Society and sponsored by the Computational Neural Network Center (CONNECT) CALL FOR PAPERS The second of a series of IEEE workshops on Neural Networks for Signal Processing, the first of which was held in Princeton in October 1991, will be held in Copenhagen, Denmark, in August 1992. Papers are solicited for technical sessions on the following topics: System Identification and Spectral Estimation by Neural Networks. Non-linear Filtering by Neural Networks. Pattern Learning Theory and Algorithms. Application-Driven Neural Models. Application to Image Processing and Pattern Recognition. Application to Speech Recognition, Coding and Enhancement. Application to Adaptive Array Processing. Digital/Analog Systems for Signal Processing. Prospective authors are invited to submit 4 copies of extended summaries of no more than 5 pages. The top of the first page of the summary should include a title, authors' names, affiliations, address, telephone and fax numbers, and email address if any. Photo-ready full papers of accepted proposals will be published in a hard bound book by IEEE. General chairs S.Y. Kung Frank Fallside Department of Electrical Engineering Engineering Department Princeton University Cambridge University Princeton, NJ 08544, USA Cambridge CB2 1PZ, UK email: kung at princeton.edu email: fallside at eng.cam.ac.uk Program chair Proceedings Chair John Aasted Sorensen Candace Kamm Electronics Institute, Bldg. 
349 Box 1910 Technical University of Denmark Bellcore, 445 South St., Rm. 2E-256 DK-2800 Lyngby, Denmark Morristown, NJ 07960-1910, USA email: jaas at dthei.ei.dth.dk email: cak at thumper.bellcore.com Program Committee Ronald de Beer Jeng-Neng Hwang John E. Moody John Bridle Yu Hen Hu Carsten Peterson Erik Bruun B.H. Juang Sathyanarayan S. Rao Poul Dalsgaard S. Katagiri Peter Salamon Lee Giles Teuvo Kohonen Christian J. Wellekens Lars Kai Hansen Gary M. Kuhn Barbara Yoon Steffen Duus Hansen Benny Lautrup John Hertz Peter Koefoed Moeller Paper submissions and further information: Program Chair Tel: +4545931222 ext. 3895, Fax: +4542880117 Submission of extended summary February 15, l992 Notification of acceptance April 20, l992 Submission of photo-ready paper May 20, l992 From sg at corwin.CCS.Northeastern.EDU Sat Nov 23 17:01:28 1991 From: sg at corwin.CCS.Northeastern.EDU (steve gallant) Date: Sat, 23 Nov 91 17:01:28 -0500 Subject: Patent Fallacies Message-ID: <9111232201.AA02257@corwin.CCS.Northeastern.EDU> The issue of patents seems to have struck somewhat of a raw nerve, so I submit the following list of common fallacies about patents. (Warning: I am not a patent lawyer, so you should consult one before accepting any of the following as legal gospel.) 1. You can patent anything not yet patented / not in the public domain. A patent must be non-obvious to those "skilled in the art" at the time in question. It is also the responsibility of the applicant to submit all known applicable "prior art" regarding the proposed patent. In other words things that most of us know how to do are not properly patentable, regardless of whether they are in the "public domain" (a technical term). 2. Patents prevent free exchange of information. Patents are designed to ENCOURAGE free exchange of information. The basic deal is that if you teach the world something useful, you will be given a large amount of control over the usage of your invention for a limited time, after which everybody will be able to freely use it. It is important to consider the alternative to having patents, namely trade secrets. Anybody opposed to patents on principle should be able to say why they prefer having the information be a trade secret, with no knowledge/access by anybody outside the organization. 3. Patents hinder research. Everybody immediately has knowledge of patented information; trade secrets remain secret. I believe that a recent Supreme Court decision has ruled that patents cannot be used to prevent basic research.(?) 4. You cannot talk about your invention before filing a patent. For US patents, you can file up to 1 year after disclosing your invention. This rule does not apply to foreign patents, but they are so expensive and such a hassle that you should be especially careful (and have very deep pockets) before going down that path. Thus you can tell the world about your method, and still have a year to file for a US patent. 5. Patents make money. The majority of patents granted do not result in the inventor making back legal costs and filing fees. An application that is not granted is a clear loss. 6. Patents favor big corporations. This is a debatable point. If there were no patent protection, anybody who invented something would be giving that invention to whoever wanted to use it -- in many cases, only big corporations would profit from this. On the other hand, patents give the individual researcher some compensation for, and control over, his or her invention. (This has been very useful in my case.) 7. 
Software / algorithm (process) patents are different than other patents. Another debatable point. If one can get a patent on mixing chemical A with chemical B to make AB, a good fertilizer, how is this different than adding number A to number B + to factor number X more quickly than had been previously possible? It is hard to come up with an issue that applies to patenting software that does not apply to other types of patents. Of course there are some good arguments against software / algorithm (process) patents. It does seem to be true that the patent office is getting over their heads with a lot of this, and therefore letting things slip through that should not be allowed patents. However, this problem is also not unique to software. The above list reflects dozens of hours of working with patent lawyers, but the reader is again cautioned that I am not a patent lawyer. The first rule you should follow when considering patenting something is to consult a patent lawyer. By the way, they tend to be interesting people, with the challenging job of taking very technical information in a variety of fields, understanding it, and turning it into legal-speak. (Patent examiners have even more challenging jobs, the most famous one having been Albert Einstein.) Steve Gallant From ken at cns.caltech.edu Sat Nov 23 22:59:48 1991 From: ken at cns.caltech.edu (Ken Miller) Date: Sat, 23 Nov 91 19:59:48 PST Subject: con etiquete: the struggle continues (or: conetiquete vs. tek sass) Message-ID: <9111240359.AA00211@cns.caltech.edu> A quick summary of some of the responses I've received to my con etiquete note. Though Dave Touretzky's note about making this a moderated newsgroup partially supersedes this, the question remains as to what will be the criteria for acceptable discussions there or here. I got 9 notes saying "thank you/I agree", 1 saying "I disagree with almost everything you say". Other notes brought up new issues, as follows: Four people (dave at cogsci.indiana.edu,tgelder at phil.indiana.edu, thildebr at athos.csee.lehigh.edu,rick at csufres.CSUFresno.EDU), at least two themselves philosophers, opposed my "technical not philosophical" distinction. I responded Maybe the dichotomy philosophical vs. technical was the wrong one. How would you all feel if I had instead made the dichotomy one between speculation (or opinion) and knowledge? Anyone can have their opinion, but after it's expressed nothing has been changed. Nature alone knows the outcome. Whereas an opinion based on real experience or analysis (and analysis I presume could include hard philosophical analysis) is quite worthwhile. So I am talking about setting the threshold much higher between speculation and knowledge, the former being raw and the latter distilled. I should have known that wouldn't make it past the philosophers: Knowledge is interesting, but often too much distilled to carry the insight of the imparter. I think the real distinction we are seeking is the dichotomy between unsupported hypotheses and supported ones. ... I think what you are getting at is that we have a built-in credibility meter, and that its value for some recent postings to the net has been incredibly (pun intended) low. You would like to ask people to apply their own judgement to potential postings (as to how well the idea is supported) before taxing our credibility filters with the same task. thildebr at athos.csee.lehigh.edu The distinction between speculation or opinion and knowledge is not a pragmatically useful one. 
How do you know whether your opinions amount to "knowledge" and are therefore legitimately expressible on the network? ... As far as I can see, the only relevant distinction in the vicinity is that between carefully thought-out, well-grounded opinion vs raw, crudely-formed, not-well-grounded opinion. If what you really want to say is that we only want the former on the network, then I am in agreement. tgelder at phil.indiana.edu Scott Fahlman (Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU) "certainly agree(s) with setting the threshold higher", but emphasizes the importance to him of opinions of other practitioners: I think that if someone who has thought hard about an issue is willing to share their opinions, that's about the most valuable stuff that goes by. We can get the facts from published work, but in a field like this, opinion is all we have in many cases. The technology of neural nets is more art than science, and it's largely an unwritten art. And sometimes the conventional opinion is wrong or incomplete. If we can expose some of these opinions, and someone can show that they are wrong (or offer good arguments against them), then we all learn something useful. Maybe what we learn is that opinion is divided and it's time to go do some experiments, but that's a form of knowledge as well. What I'd like to ban or strongly discourage are (a) statements of opinion by people who have not yet earned the right to an opinion, (b) any message composed in less than 30 minutes per screenful, and (c) iteration to the point where people end up repeating themselves, just to get the last word in. I don't want to leave your message out there without any rebuttal, because if people were to take it as a consensus statement of etiquette, we'd lose a lot of the discussion that I feel is quite valuable. If there really is widespread support for your position, then the right move is to split the list into an announcement version and a discussion version. ashley at spectrum.cs.unsw.OZ.AU (Ashley Aitken) seems to take a similar position with some different arguments: I understand your point of view with regard "conversation" and "philosophy" but strongly disagree with raising the threshold too much. Let met tell you my situation. I am a graduate student at the University of New South Wales in Sydney, Australia. I am researching biological neural network models of the cortex. Unfortunately, there is no-one else really interested in anything like this in Australia (that I know of). ... Connectionists can give me the chance to listen in to "conversations" between some of the leading researchers in the field. This to me is extremely valuable - it provides an idea of where the research is, where it is going, and also provides a great deal of motivation. I don't think I could keep in touch with the research without it - connectionists is, in a way, my research group*. Sure we don't want to get into silly "philosophical" discussions which lead nowhere (like the ones that appear regularly in the news groups). However, there is a thin line between philosophy of today and research areas and theories of tomorrow. So: I would say there is general but not complete agreement that (1) the threshold has been too low lately, and (2) postings should be well-grounded and supported. 
The main disagreement seems to be how much room that leaves for experienced people to be offering their otherwise unsupported opinion, or contrariwise to what extent people --- including experienced people --- should restrict themselves to points on which they have specific analysis or experience/experiment to offer. We probably can't resolve these issues except by fiat of the list administrators. I hope that raising the issues, and assuming that everyone will think carefully about them before posting, will increase the signal to noise ratio. On the positive side, *nobody* disagreed that dialogues should take place off the net. Ken From harnad at Princeton.EDU Sun Nov 24 01:07:37 1991 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 24 Nov 91 01:07:37 EST Subject: Discussion I: Reading & Neural Nets Message-ID: <9111240607.AA10474@clarity.Princeton.EDU> Here is the first of two exchanges concerning the Target Article on Reading and Connectionism that appeared in PSYCOLOQUY 2.8.4 (retrievable by anonymous ftp from directory pub/harnad on princeton.edu). Further commentary is invited. All contributions will be refereed. Please submit to psyc at pucc.bitnet or psyc at pucc.princeton.edu -- NOT TO THIS LIST. PSYCOLOQUY V2 #9 (2.9.3 Commentary / Coltheart, Skoyles: 345 lines) PSYCOLOQUY ISSN 1055-0143 Sun, 24 Nov 91 Volume 2 : Issue 9.3 2.9.3.1 Commentary on Connectionism, Reading.... / Coltheart 2.9.3.2 Reply to Coltheart / Skoyles ---------------------------------------------------------------------- From: max.coltheart at mrc-applied-psychology.cambridge.ac.uk Subject: 2.9.3.1 Commentary on Connectionism, reading.... / Coltheart Connectionist modeling of human language processing: The case of reading (Commentary on Skoyles Connectionism, Reading and the Limits of Cognition PSYCOLOQUY 2.8.4 1991) Max Coltheart School of Behavioural Sciences Macquarie University Sydney, NSW, Australia max.coltheart at mrc-applied-psychology.cambridge.ac.uk Skoyles (1991) wrote in his Rationale (paragraph 8): "Connectionism shows that nonword reading can be done purely by processes trained on real words without the use of special grapheme-phoneme translation processes." This is not the case. The connectionist model in question, that of Seidenberg & McClelland (1989), reads nonwords very poorly after being trained on words. Besner, Twilley, McCann and Seergobin (1990) tested its reading of various sets of nonwords. The trained model got 51%, 59% and 65% correct; people get around 90%. The Seidenberg and McClelland paper itself does not report what rate of correct reading of nonwords the model can achieve. Skoyles also writes (paragraph 3): "Connectionist (PDP) neural network simulations of reading successfully explain many experimental facts found about word recognition (Seidenberg & McClelland, 1989)" I would like to see a list of facts about reading that the PDP model can explain; even more, I would like to see a list of facts about reading that the traditional non-PDP dual-route model (which uses rules and local representations) cannot explain but which the PDP model can. Here is a list of facts which are all discussed in the Seidenberg & McClelland paper, which can be explained by the dual-route model, but which cannot be explained by the PDP model: 1. People are very accurate at reading aloud pronounceable nonwords. This is done by using grapheme-phoneme rules in a dual-route model. As I've already mentioned, the PDP model is not accurate at reading nonwords aloud, so cannot explain why people are. 
2. People are very accurate at deciding whether or not a pronounceable letter string is a real word (lexical decision task). The PDP model is very inaccurate at this: In the paper by Besner et al (1990), it is shown that the model achieves a correct detection rate of about 6% (typical of people) at the expense of a false alarm rate of over 80% (not typical of people). So the PDP model cannot explain why people are so accurate at lexical decision. 3. After brain damage in some people reading is affected in the following way: nonword reading is still normal, but many exception words, even quite common ones, are wrongly read. In addition, the erroneous responses are the ones that would be predicted from applying spelling-sound rules (e.g. reading PINT as if it rhymed with "mint"). This is surface dyslexia; two of the clearest cases are patients MP (Bub, Cancelliere and Kertesz, 1985) and KT (McCarthy and Warrington, 1986). According to the dual-route explanation the lexical route for reading is damaged but the nonlexical (rule-based) route intact. Attempts have been made to simulate this by damaging the trained PDP model (e.g., by deleting hidden units). These attempts have not succeeded. It seems highly unlikely that they ever will succeed: Since the damaged patients are about 95% right at reading nonwords, and the intact model gets only around 60% right, is it likely that any form of "lesion" to the model will make it much BETTER at reading nonwords? 4. After brain damage in some people reading is affected in the following way: Word reading is still good, but nonword reading is very bad. This is phonological dyslexia. A clear case is that of Funnell (1983); her patient could not read any nonwords at all, but achieved scores of around 90% correct in tests of word reading. The dual-route explanation would be that there was abolition of the nonlexical route and sparing of the lexical route. Seidenberg & McClelland appeal to a way (not implemented in their model) of reading from orthography through meaning to phonology. This would of course fail for a meaningless letter string, so anyone reading solely by such a route would be able to read words but not nonwords. The explanation fails, however, because in the case of phonological dyslexia referred to above (Funnell, 1983), the patient also had a semantic impairment and would have shown semantic confusions in reading aloud if he had been reading semantically. He did not make such confusions. Therefore Seidenberg and McClelland's reconciliation of phonological dyslexia with their model cannot be correct. Pinker and Prince (1988) argued that any model which eschews explicit rules and local (word or morpheme) representations would fail to explain the data on children's learning of past tenses. I argue that any model which eschews explicit rules and local (word or morpheme) representations will fail to explain the data on adult skilled reading. NETtalk (Sejnowski and Rosenberg, 1986) might be offered as a counterexample to my claim, but it will not serve this purpose. First, Sejnowski and Rosenberg explicitly state that NETtalk is not meant to be a model of any human cognitive process. Second, perhaps the major computational problem in reading nonwords aloud - coping with the fact that the mapping of letters to phonemes is often many-to-one, so that the words AT, ATE, ACHE and EIGHT all have just two phonemes - is not dealt with by NETtalk. 
The input upon which the network operates is precoded by hand in such a way that there is always a one-to-one mapping of orthographic symbol to phoneme; so NETtalk does not have to try to solve this problem.

5. References

Besner, D., Twilley, L., McCann, R.S. and Seergobin, K. (1990) On the association between connectionism and data: are a few words necessary? Psychological Review, 97, 432-446.
Bub, D., Cancelliere, A. and Kertesz, A. (1985) Whole-word and analytic translation of spelling to sound in a non-semantic reader. In Patterson, K., Marshall, J.C. and Coltheart, M. (eds) (1985) Surface Dyslexia: Cognitive and Neuropsychological Studies of Phonological Reading. London: Lawrence Erlbaum Associates Ltd.
Funnell, E. (1983) Phonological processes in reading: New evidence from acquired dyslexia. British Journal of Psychology, 74, 159-180.
McCarthy, R. and Warrington, E.K. (1986) Phonological reading: phenomena and paradoxes. Cortex, 22, 359-380.
Pinker, S. and Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed model of language acquisition. Cognition, 28, 73-194.
Seidenberg, M. S. and McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
Sejnowski, T.J. and Rosenberg, C.R. (1986) NETtalk: A parallel network that learns to read aloud (EE and CS Technical Report No. JHU/EECS-86/01). Baltimore, Maryland: Johns Hopkins University.
Skoyles, J. (1991) Connectionism, Reading and the Limits of Cognition. PSYCOLOQUY 2.8.4.

----------------------------------------------------------------------

From: John R Skoyles
Subject: 2.9.3.2 Reply to Coltheart / Skoyles

The Success of PDP and the Dual Route Model: Time to Rethink the Phonological Route (Reply to Coltheart)

John R. Skoyles
Department of Psychology
University College London
London WC1E 6BT
ucjtprs at ucl.ac.uk

Max Coltheart makes a good defense of the dual route model (the view that there are separate phonological and nonphonological ways of recognising written words) but he appears to overlook the fact that I am attempting to do the same thing: to defend the existence of more than one route in reading. I am going about this in a completely different manner, however, and Coltheart does not seem to have spotted either how I do this or the degree to which we agree. My strategy is to take the main alternative to the dual route model -- PDP (a single "route" connectionist [network] account of phonological and nonphonological word recognition) -- and show that even if PDP is as good as its advocates claim, it is incomplete and needs a separate reading mechanism in order to come into existence. The problem is that the reading abilities of PDP models need to be tutored using error correction feedback. Where this feedback comes from, however, is left out of PDP accounts of reading. In my target article (Skoyles 1991) I showed that error correction feedback can only exist if there is some process independent of the PDP network which can identify words correctly and so judge whether or not the network has read them correctly. Without this, error correction feedback, and hence the reading abilities of PDP networks, cannot occur. I further show that research on child reading and dyslexia strongly suggests that this independent recognition of written words depends in human readers upon sounding out words and accessing oral knowledge of their pronunciation.
Coltheart essentially claims that I need not go so far: PDP simply cannot model the most interesting aspects of reading and so the above argument is premature. I cannot go along with his critique of PDP, although in many ways I would like to (I cannot be alone in longing for the good old days when the dual route model reigned supreme). It is not as easy as Coltheart implies to dismiss the phonological reading abilities shown by PDP networks. They do read a large number of nonwords correctly -- though Coltheart is right to note that they are not as good as skilled readers. Nonetheless, they do read some nonwords correctly, which is surprising given that PDP networks lack any specific knowledge of spelling-sound transcoding. These nonword reading skills are important even if they are not as good as those of proficient readers because we can no longer automatically assume that every time people read a nonword they do so using an independent grapheme-phoneme phonological route -- for they might instead be reading them (at least some of the time) by something like a PDP network.

My disagreement with Coltheart concerns whether there are one or two kinds of phonological reading -- I suggest at least two exist. The first process is attentive (such as when you have to stop reading War and Peace to work out the pronunciation of the names of the characters). Attentive decoding depends, I suggest, upon rule-like grapheme-phoneme decoding. The second process, nonattentive phonological decoding (when you read monosyllables which happen not to be real words, like VIZ), depends, I suggest, upon PDP networks. In contrast to attentive phonological decoding, nonattentive phonological decoding depends on generating phonology using the statistical regularities between spelling and pronunciation that are incidentally acquired by PDP networks when they are learning to read real words. The processes responsible for attentive and nonattentive phonological coding are independent of each other. Both attentive and nonattentive phonological decoding can produce phonological output that can be used to access the oral knowledge of word pronunciation contained in the speech system to identify words (perhaps along with semantic and sentence context information -- see note 1). The boundary between the two forms of phonological decoding in any individual will depend upon their reading experience and their innate phonological capacities -- a five-year-old will probably only be able to read a monosyllabic nonword attentively, whereas a linguist will have no difficulty nonattentively sounding out obscure polysyllabic Russian names.

My difference with Coltheart lies in our respective ways of defining the nonlexical reading route. Coltheart takes it to be a phonological route which reads nonwords through the use of explicit spelling-sound correspondence rules. I instead take it to be primarily a route using phonological decoding processes that can identify words by using the phonological information contained in word spelling to access a reader's oral knowledge of how words sound. Although nonattentive phonological processes can access oral knowledge, I suggest that this is much less likely than the use of attentive processes. If we focus on decoding a spelling to recognise the word behind its pronunciation, we are more likely to adopt attentive rather than nonattentive processes as a consequence of stopping and focusing.
Thus although we both support the existence of two reading routes, we have very different notions as to what they are. In this context, I will answer Coltheart's points one by one. I paraphrase his criticisms before describing my replies.

(1) "PDP models are not very accurate at reading nonwords ... people are." As noted, people use a mix of attentive and nonattentive phonological decoding, whereas PDP networks only simulate nonattentive ones.

(2) "People are very accurate at deciding whether or not a pronounceable letter string is a real word (lexical decision) ... [PDP models are not]." First, the nature of lexical decision is controversial, with some arguing that it involves access to lexical representations and others that it does not (Balota & Chumbley, 1984). In addition, in order for PDP models to simulate lexical decisions, new assumptions are added to them. PDP models are designed to give correct phonological output to a given spelling input and not to make lexical decisions. To model lexical decisions, their modelers have made the additional assumption that back activation from the hidden units to the input units reflects some measure of lexicality. This is an assumption added to the model; hence it could be this assumption as much as the model which is at fault.

(3) "Some brain lesions leave people with good nonword reading abilities with damaged lexical word recognition abilities -- surface dyslexia." Fine, such people are relying upon attentive phonological "sounding out" processes; their nonattentive processes are damaged along with their lexical reading processes.

(4) "Some brain lesions leave people with good lexical reading abilities with damaged phonological ones -- phonological dyslexia." Unfortunately, acquired phonological dyslexia is rather rare (Funnell's patient, whom Coltheart cites, is nearly a unique case). It is so rare that afflicted individuals might have had phonological reading problems prior to their brain damage (Van Orden, Pennington and Stone, 1990).

The difference between Coltheart and myself is that whereas he collapses nonattentive and attentive phonological reading together, I separate them. Can our two positions be tested? I think they can. If I am right, skilled readers should read nonwords with two levels of performance: First, they should display a high level of competence when they are free to use attentive phonological decoding. Second, they should show a lower level of success when they attempt to read nonwords while doing a secondary task which blocks their use of attentive phonological decoding and thereby confines their nonword reading to nonattentive processes. I suggest that this lower level of performance (if it exists) is the one against which PDP simulations of nonword reading should be compared, as this should reflect only nonattentive nonword reading -- the phonological ability modeled by PDP simulations of reading.

Note 1. It is possible that sentence and other contextual sources of information are used in accessing oral knowledge following phonological decoding: the hearing of words is highly context dependent and so I would expect any "inner ear" identification of words to be likewise.

References

Balota, D. A., & Chumbley, J. I. (1984). Where are the effects of frequency in visual word recognition tasks? Right where we said they were: Comment on Monsell, Doyle, and Haggard (1989). Journal of Experimental Psychology: General, 111, 231-237.
Van Orden, G. C., Pennington, B. F. & Stone, G. O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488-522.
------------------------------

PSYCOLOQUY is sponsored by the Science Directorate of the American Psychological Association (202) 955-7653
Co-Editors: Stevan Harnad (scientific discussion), Psychology Department, Princeton University; Perry London, Dean, and Cary Cherniss (Assoc. Ed.) (professional/clinical discussion), Graduate School of Applied and Professional Psychology, Rutgers University
Assistant Editor: Malcolm Bauer, Psychology Department, Princeton University

End of PSYCOLOQUY Digest
******************************

From harnad at Princeton.EDU Sun Nov 24 01:12:30 1991
From: harnad at Princeton.EDU (Stevan Harnad)
Date: Sun, 24 Nov 91 01:12:30 EST
Subject: Discussion II: Reading & Neural Nets
Message-ID: <9111240612.AA10506@clarity.Princeton.EDU>

Here is the second of two exchanges concerning the Target Article on Reading and Connectionism that appeared in PSYCOLOQUY 2.8.4 (retrievable by anonymous ftp from directory pub/harnad on princeton.edu). Further commentary is invited. All contributions will be refereed. Please submit to psyc at pucc.bitnet or psyc at pucc.princeton.edu -- NOT TO THIS LIST.

PSYCOLOQUY V2 #9 (2.9.4 Commentary: Reilly, Skoyles : 410 lines)
PSYCOLOQUY ISSN 1055-0143 Sun, 24 Nov 91 Volume 2 : Issue 9.4
2.9.4.1 Commentary on Skoyles Connectionism, Reading... / Reilly
2.9.4.2 Reply to Reilly / Skoyles

----------------------------------------------------------------------

From: Ronan Reilly ERC
Subject: 2.9.4.1 Commentary on Skoyles Connectionism, Reading... / Reilly

There's More to Connectionism than Feedforward and Backpropagation (Commentary on Skoyles Connectionism, Reading and the Limits of Cognition PSYCOLOQUY 2.8.4 1991)

Ronan Reilly
Educational Research Centre
St Patrick's College
Dublin 9
IRELAND
ronan_reilly at eurokom.ie

1. Introduction

I think Skoyles has presented a novel idea for modeling the learning of reading. The main aim of this commentary is to answer some of the questions he raised in his preamble, particularly those relating to connectionism, and finally to discuss some work I've done in the area that may provide a starting point for implementing Skoyles's proposal.

2. The Nature of Connectionist Training

There are, as I'm sure will be pointed out in other commentaries, more connectionist learning algorithms than error backpropagation and more connectionist learning paradigms than supervised learning. So I am a little puzzled by Skoyles's failure to find any research on issues relating to the nature of error correction feedback. For example, what about the research on reinforcement learning by Barto, Sutton, and Anderson (1983)? In this work, no detailed feedback is provided on the correctness of the output vector. The teaching signal simply indicates whether or not the output was correct.

On the issue of delayed error feedback: In order to deal with temporal disparities between input and error feedback, the network has to incorporate some form of memory that preserves sequential information. A standard feedforward network obviously has a memory, but it is one in which the temporal aspect of the input is discarded. Indeed, modelers usually go out of their way to discourage any temporal artifacts in training by randomising the order of input.
Elman (1990) devised a simple technique for giving feedforward networks a temporal memory. It involves taking a copy of the activation pattern of the hidden units at time t and using it as input at time t+1, in addition to whatever other input there might be. The weights connecting these copy units (or context units) to the hidden units are themselves modifiable, just like the other weights in the network. Consequently, these weights accrete information about the input sequence in diminishing amounts over a number of preceding time steps. In these simple recurrent networks it is possible, therefore, for the input at time t to affect the output of the network at time t+n, for relatively small n. The corollary to this is that it is possible for error feedback to have an effect on learning at a temporal remove from the input to which it relates.

Degraded error feedback is not a problem either. A number of connectionist paradigms have made use of so-called "moving target" learning. This occurs when the teaching vector (and even the input vector) are themselves modified during training. The most recent example of this is the recursive auto-associative memory (RAAM) of Pollack (1990). I won't dwell on the ins and outs of RAAMs, but suffice it to say that a key element in the training of such networks is the use of their own hidden unit vectors as both input and teaching vectors. Thus, the network is confronted with a very complex learning task, since every time its weights are changed, the input and teaching vectors also change. Nevertheless, networks such as these are capable of learning successfully. In many ways, the task of the RAAM network is not unlike that of the individual learning to read as characterized by Skoyles.

My final word on the topic of connectionist learning algorithms concerns their psychological status. I think it is important to emphasise that many aspects of backpropagation learning are psychologically unrealistic. Apart from the fact that the algorithm itself is biologically implausible, the level of specificity required of the teacher is just not found in most psychological learning contexts. Furthermore, the randomized nature of the training regime and the catastrophic interference that occurs when a network is trained on new associations do not correspond to many realistic learning situations (if any). What is important about connectionist learning is not the learning as such, but what gets learned. It is the nature of the representations embodied in the weight matrix of a network that gives connectionist models their explanatory and predictive power.
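To make the context-unit idea concrete, here is a minimal sketch of one time step of an Elman-style simple recurrent network (an illustration only, not Elman's code; the names, shapes and use of NumPy are assumptions of this sketch):

    import numpy as np

    def srn_step(x, context, W_in, W_context, W_out, b_h, b_o):
        # x is the current input vector; context is a copy of the hidden
        # activations from the previous time step, fed in like ordinary input.
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        hidden = sigmoid(W_in @ x + W_context @ context + b_h)
        output = sigmoid(W_out @ hidden + b_o)
        # The hidden vector returned here becomes the next step's context,
        # so W_context can accrete information about earlier inputs.
        return output, hidden

Because W_context is trained like any other weight matrix, an input at time t can influence the output, and be influenced by error feedback, several steps later.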
3. Phonetic Reading

In what follows, I assume that what Skoyles means by "phonetic reading" is phonologically mediated access to word meaning. I don't think it is yet possible to say that phonology plays no role in accessing the meaning of a word. However, Seidenberg (1989) has argued persuasively that much of the evidence in favor of phonological mediation can be accounted for by the simultaneous activation of both orthographic and phonological codes, and none of the evidence addresses the central issue of whether or not access is mediated by phonological codes. Personally, I am inclined to the view that access to meaning among skilled readers is direct from the orthography.

I was puzzled by Skoyles's description of the Seidenberg and McClelland (1989) model, first, as a model of reading, and second, as a model of non-phonetic reading. It certainly is not a model of reading, since in the implementation they discuss there is no access to meaning. Furthermore, how can it be considered to be nonphonetic when part of the model's training involves teaching it to pronounce words? In fact, Seidenberg and McClelland's model seems to be a red herring in the context of the issues Skoyles wishes to address.

4. A Modelling Framework

I am currently working on modeling the role of phonics in teaching reading using a connectionist framework (Reilly, 1991). The model I've developed might provide a suitable framework for addressing Skoyles's hypothesis. It consists of two components. The first is a speech component, which is trained first and learns to map a sequence of phonemes onto a lexical representation; the weights in this network are frozen after training. The second component is a network that maps an orthographic representation onto a lexical representation. This mapping can be either via the hidden units in the speech module (i.e., the phonological route), via a separate set of hidden units (i.e., the direct route), or via both sets of hidden units. I have operationalized different teaching emphases (e.g., phonics vs. whole-word vs. a mixed approach) by allowing or disallowing the training of the weights comprising the two lexical access routes. Preliminary results suggest that a mixed approach gives the best overall word recognition performance, but this has not proved entirely reliable over replications of training with different initial weight settings. I am currently working on various refinements to the model.

In addition to providing a testbed for the phonics issue, the model I've outlined might also provide a framework for implementing Skoyles's idea, and it might perhaps help derive some testable hypotheses from it. For example, it would be possible to use the lexical output produced as a result of taking the phonological route as a teaching signal for the direct route. I imagine that this might give rise to distinct forms of word recognition error, forms not found if a "correct" teaching signal were used.

5. Conclusion

I think that Skoyles's idea is interesting and worthy of exploration. I feel, however, that his view of current connectionist modeling is somewhat narrow. Contrary to the impression he appears to have, there are connectionist learning architectures and techniques available that address many of the issues he raises.

6. References

Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuron-like elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834-846.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Pollack, J. B. (1990). Recursive distributed representations. Artificial Intelligence, 46, 77-105.
Reilly, R. (1991). A connectionist exploration of the phonics issue in the teaching of reading: Re-using internal representations. In Working notes of the AAAI Spring Symposium on connectionist natural language processing. March, 1991, Stanford University, pp. 178-182.
Seidenberg, M. S. (1989). Visual word recognition and pronunciation. In W. Marslen-Wilson (Ed.), Lexical representation and process. Cambridge, MA: MIT Press, pp. 25-74.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
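A rough sketch of the two-route architecture Reilly describes may help readers picture it (an editor's illustration only: the names, layer sizes and the simple forward pass are assumptions, not details of Reilly's implementation):

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def lexical_output(orth, W_orth_phon, W_phon_lex, W_orth_direct, W_direct_lex,
                       phonological_route=True, direct_route=True):
        # orth: orthographic input vector.
        # W_orth_phon projects orthography onto the hidden units of the
        # previously trained (frozen) speech module; W_orth_direct projects
        # it onto a separate bank of hidden units.
        lex = np.zeros(W_phon_lex.shape[0])
        if phonological_route:
            phon_hidden = sigmoid(W_orth_phon @ orth)
            lex += W_phon_lex @ phon_hidden
        if direct_route:
            direct_hidden = sigmoid(W_orth_direct @ orth)
            lex += W_direct_lex @ direct_hidden
        return sigmoid(lex)

In this picture, the "phonics", "whole-word" and "mixed" teaching regimes correspond to which of the two routes' weights are allowed to change during training.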
------------------------------

From: John R Skoyles
Subject: 2.9.4.2 Reply to Reilly / Skoyles

The Limits of Connectionism and Cognition Revisited: All Reading Networks Need to be Trained (Reply to Reilly)

John R. Skoyles
Department of Psychology
University College London
London WC1E 6BT
ucjtprs at ucl.ac.uk

1. The nature of connectionist learning.

My argument against connectionist reading consists of two points. First, reading networks do not get a "free meal" -- something for nothing (Skoyles, 1988). To be able to read they have to be trained. To parallel the popular cliche "junk in, junk out," reading networks depend upon "mappings in, mappings out." What is called "reading" in these networks is a mapping, usually from a written word to its pronunciation (but potentially also to its meaning). To get to that state, however, they need to be trained on exemplar mappings -- the reading network does not get the information to make its mappings miraculously from nowhere but from mappings previously given to it. Error-correction is one way of doing exemplar training -- the network can only make an error in the context of a correct output for a given input. (Of course, reading networks create new mappings not given to them, but the information to do so derives from mappings with which they have been previously trained. So in a sense there is a free meal, but the network has to be fed something first.)

Second, the proponents of reading networks maintain that there is a free meal by focusing entirely upon the "mappings out," forgetting where they get the "mappings in" to train them. Instead of miracles, I suggest that phonological reading -- identifying a written word -- provides this information. This conjecture fits in with the evidence about phonological reading and learner readers and dyslexia. Improving the skill of a learner reader to identify words from their spelling enhances their progress in learning to read (Adams, 1990). Dyslexics lack phonological abilities and so find it difficult to identify words from their spelling (Snowling, 1987). These two facts make sense if the phonological identification of words is providing the "mappings in" to train the reading network.

1.1. Supervised learning.

In my target article (Skoyles 1991) I discussed Seidenberg and McClelland's (1989) model of reading, which uses supervised backpropagation learning. Reilly correctly points out that there is more to connectionism than backpropagation and supervised learning. I focused upon these because they are used by the published models of reading. This does not diminish the generality of my points. For example, Reilly correctly points out that Barto, Sutton, and Anderson (1983) have proposed a model of reinforcement training which contains no detailed information about the correctness of the output vector. As Reilly points out, however, they nonetheless use a teaching signal that indicates whether or not the output was correct. But how would a system training a network know whether or not its output was correct without some independent means of recognising words? My point applies not only to backpropagation but to any form of supervised learning (because to tutor the network the supervisor has to know something the network does not).

1.2. Unsupervised learning.

My point also applies to unsupervised networks -- for example, Boltzmann nets. These are given inputs and are apparently not corrected.
There is a stage in Boltzmann training, however, when the network's actual and desired output are calculated to form the objective function, and depending upon this the internal weights in the network are or are not retained. Thus, this unsupervised learning still begs the question of the availability of knowledge regarding the desired output of the network: without this the objective function cannot be calculated. Although the network may be unsupervised, it is not unregulated. It is given exemplar input and desired outputs. In the case of reading, the desired output will be the correct reading of a written word (its input). But the Boltzmann network cannot by itself know that any reading is correct and hence desired: something outside the system has to be able to read to do this. In other words, the same situation that I showed existed for supervised networks exists for unsupervised ones.

1.3. Auto-associative learning.

Reilly raises the possibility of auto-associative learning. Networks using this do not have to feed on information in the form of error-correction, nor do they require correct exemplar input-output pairs supplied from outside, because their input doubles as their desired output. I would question, however, whether a network dependent entirely upon auto-associative learning could learn to read. This may work well with categorization skills, but as far as I am aware, not with mapping tasks (such as reading) which involve learning a vocabulary. I would be very interested to see whether anyone can create such a net. Of course, there is no reason a network may not use auto-associative learning in combination with non-autoassociative training.

2. Biological plausibility.

I agree with Reilly's observation that backpropagation is biologically implausible. However, new learning procedures have been developed which are biologically feasible (Mazzoni, Andersen & Jordan, 1991). In addition, as noted above, my observation is a general one, which would apply much more widely than just to the cases of backpropagation and supervised learning. Although it is unlikely that the networks responsible for reading in the brain use backpropagation, it is likely that they are constrained by the same kind of constraints noted above and in my original target article.

3. Network learning vs. the internal representations of networks as objects of interest.

I am slightly concerned that Reilly suggests "What is important about connectionist learning is not the learning as such, but what gets learned. It is the nature of the representations embodied in the weight matrix of a network that gives connectionist models their explanatory and predictive power." This seems an abdication of responsibility. Connectionist models are described as learning models, not representation models. Their authors emphasise that their training is not an incidental means for their creation but something that might enlighten us about the process by which networks are acquired. Reilly's own simulation of reading is concerned not with what gets learnt but with which of three reading instruction methods (whole word, phonics or mixed whole word and phonics) trains reading networks best. In addition, if we get the mechanism by which networks develop wrong, can we be confident that their internal representations are going to be correct, and consequently of interest?

4. Phonetic reading.

As I note in my accompanying reply to Coltheart (1991; Skoyles 1991), phonetic reading can mean two things.
First, phonological decoding -- something measured by the ability to read nonwords. Second, the identification of written words using information about how they are spelt and orally pronounced. In the latter, a reader uses some kind of phonological decoding to access oral vocabulary to identify words -- so the two are associated. However, phonological decoding may be done through several means -- lexical analogies and even to some extent through the reading network (see my comments on this in my reply to Coltheart 1991). But whereas a reading network can phonologically decode words, it cannot recognise words by accessing the knowledge we have of how they are pronounced in oral vocabularies. Access to that information through phonological decoding is the critical thing I suggest for training networks -- not the phonological decoding involved.

Reilly rightly points out that Seidenberg and McClelland's (1989) model does not fully cover all aspects of reading, in particular, access to meaning. However, my observations would generalise to reading models which cover this. This is because my observation is about input/output mapping, and it does not matter if the output is not phonology but meaning. In this case, phonological reading accesses the meaning of words from oral vocabulary, which is then used to train the semantic output of the reading network. I did not develop this point simply because Seidenberg and McClelland's model, as Reilly notes, does not cover meaning.

5. Reilly's own model

Reilly only briefly describes his own contribution to understanding how reading networks develop. I am very interested in his suggestion that "it would be possible to use the lexical output produced as a result of taking the phonological route as a teaching signal for the direct route." As he notes, this might produce "distinct forms of word recognition error." Experiments in this area seem a good idea, though perhaps Reilly's network needs to be refined (he notes that it is not entirely reliable over replications with different initial weights). I would like to see whether his "phonic" route could take more account of the possibility of using units of pronunciation correspondence larger than the phoneme-grapheme one, because children seem to start with larger ones (the sort used in lexical analogies).

6. Conclusion

Reilly suggests that learning architectures and techniques are available which address the issues I raised in my original target article. With the exception of Reilly's own model, as I hope I have shown above, this is not true.

7. References

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.
Coltheart, M. (1991) Connectionist modeling of human language processing: The case of reading. PSYCOLOQUY 2.9.3.
Mazzoni, P., Andersen, R. A. & Jordan, M. I. (1991). A more biologically plausible learning rule for neural networks. Proceedings of the National Academy of Sciences USA, 88, 4433-4437.
Seidenberg, M. S. and McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
Skoyles, J. R. (1988) Training the brain using neural-network models. Nature, 333, 401.
Skoyles, J. R. (1991) Connectionism, reading and the limits of cognition. PSYCOLOQUY 2.8.4.
Skoyles, J. R. (1991) The success of PDP and the dual route model: Time to rethink the phonological route. PSYCOLOQUY 2.9.3.
Snowling, M. (1987). Dyslexia: A cognitive developmental perspective. Oxford: Basil Blackwell.
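The training regime that both Skoyles and Reilly gesture at -- the reader's own phonological identification of a word supplying the target on which the direct reading network is trained -- can be sketched schematically as follows (an illustration only: the one-layer network, the delta-rule update and the sound_out placeholder are assumptions of this sketch, not part of any published model):

    import numpy as np

    def train_direct_route(spellings, sound_out, n_in, n_lex, epochs=50, lr=0.1):
        # spellings: orthographic input vectors; sound_out(x) stands for the
        # independent phonological identification of the word -- the "mappings in".
        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.1, size=(n_lex, n_in))
        for _ in range(epochs):
            for x in spellings:
                target = sound_out(x)                    # teacher supplied by sounding out
                y = 1.0 / (1.0 + np.exp(-(W @ x)))       # the network's "mappings out"
                W += lr * np.outer((target - y) * y * (1 - y), x)   # delta-rule update
        return W

The point of the sketch is only that the error term exists because sound_out provides a correct reading independently of the network being trained; any supervised learning rule could stand in for the update line.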
------------------------------

PSYCOLOQUY is sponsored by the Science Directorate of the American Psychological Association (202) 955-7653
Co-Editors: Stevan Harnad (scientific discussion), Psychology Department, Princeton University; Perry London, Dean, and Cary Cherniss (Assoc. Ed.) (professional/clinical discussion), Graduate School of Applied and Professional Psychology, Rutgers University
Assistant Editor: Malcolm Bauer, Psychology Department, Princeton University

End of PSYCOLOQUY Digest
******************************

From gmk%idacrd at uunet.UU.NET Sun Nov 24 12:21:49 1991
From: gmk%idacrd at uunet.UU.NET (Gary M. Kuhn)
Date: Sun, 24 Nov 91 12:21:49 EST
Subject: Copenhagen: 1992 Workshop on NN for SP
Message-ID: <9111241721.AA05809@>

1992 IEEE Workshop on Neural Networks for Signal Processing
August 31 - September 2, 1992
Copenhagen, Denmark

In cooperation with the IEEE Signal Processing Society and sponsored by the Computational Neural Network Center (CONNECT)

CALL FOR PAPERS

The second of a series of IEEE workshops on Neural Networks for Signal Processing, the first of which was held in Princeton in October 1991, will be held in Copenhagen, Denmark, in August 1992. Papers are solicited for technical sessions on the following topics:

System Identification and Spectral Estimation by Neural Networks.
Non-linear Filtering by Neural Networks.
Pattern Learning Theory and Algorithms.
Application-Driven Neural Models.
Application to Image Processing and Pattern Recognition.
Application to Speech Recognition, Coding and Enhancement.
Application to Adaptive Array Processing.
Digital/Analog Systems for Signal Processing.

Prospective authors are invited to submit 4 copies of extended summaries of no more than 5 pages. The top of the first page of the summary should include a title, authors' names, affiliations, address, telephone and fax numbers, and email address if any. Photo-ready full papers of accepted proposals will be published in a hard bound book by IEEE.

General chairs:
S.Y. Kung, Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA, email: kung at princeton.edu
Frank Fallside, Engineering Department, Cambridge University, Cambridge CB2 1PZ, UK, email: fallside at eng.cam.ac.uk

Program chair:
John Aasted Sorensen, Electronics Institute, Bldg. 349, Technical University of Denmark, DK-2800 Lyngby, Denmark, email: jaas at dthei.ei.dth.dk

Proceedings chair:
Candace Kamm, Box 1910, Bellcore, 445 South St., Rm. 2E-256, Morristown, NJ 07960-1910, USA, email: cak at thumper.bellcore.com

Program Committee: Ronald de Beer, Jeng-Neng Hwang, John E. Moody, John Bridle, Yu Hen Hu, Carsten Peterson, Erik Bruun, B.H. Juang, Sathyanarayan S. Rao, Poul Dalsgaard, S. Katagiri, Peter Salamon, Lee Giles, Teuvo Kohonen, Christian J. Wellekens, Lars Kai Hansen, Gary M. Kuhn, Barbara Yoon, Steffen Duus Hansen, Benny Lautrup, John Hertz, Peter Koefoed Moeller

Paper submissions and further information: Program Chair, Tel: +4545931222 ext. 3895, Fax: +4542880117

Submission of extended summary: February 15, 1992
Notification of acceptance: April 20, 1992
Submission of photo-ready paper: May 20, 1992

From kris at psy.gla.ac.uk Thu Nov 21 12:33:08 1991
From: kris at psy.gla.ac.uk (Kris Doing)
Date: Thu, 21 Nov 91 17:33:08 GMT
Subject: Roommate for NIPS
Message-ID: <5473.9111211733@buzzard.psy.glasgow.ac.uk>

Dear Connectionists,

I am looking for a roommate for the NIPS Conference in Denver, Sunday to Thursday only.
Unfortunately I cannot receive email after Friday 22 Nov. If you are interested please contact me at my parents' house:

Kristina Doing Harris
c/o Park Doing
4411 Shady Crest Drive
Kettering, Ohio 45459
tel: 513-433-4336

Hope to hear from someone,
Kris Doing Harris
University of Glasgow, Scotland

From jbower at cns.caltech.edu Mon Nov 25 13:01:38 1991
From: jbower at cns.caltech.edu (Jim Bower)
Date: Mon, 25 Nov 91 10:01:38 PST
Subject: Postdoctoral work at Caltech
Message-ID: <9111251801.AA01995@cns.caltech.edu>

--------------------------------------------------------------------

Postdoctoral Position in Computational Neurobiology
Computation and Neural Systems Program
Caltech

A postdoctoral position in the laboratory of Dr. Gilles Laurent is available for up to 3 years. Applicants should have experience in modelling techniques and be interested in general problems of sensory-motor integration and/or single neuron computation.

One possible project would focus on somatosensory processing in insects, emphasizing the architecture of local circuits comprised of a few hundred identified neurons. These circuits are composed of 4 layers of neurons (sensory, interneuronal 1, interneuronal 2, motor), with a large degree of convergence and no known internal feedback connections. The task which these circuits perform is the mediation of leg reflexes, and the adaptation of these reflexes to external inputs or to internal constraints (e.g. centrally generated rhythm).

The second possible project would focus on the integrative properties of the 2 classes of local interneurons in those circuits. Both classes lack an axon (they are local neurons), but the first ones use action potentials whereas the second use graded potentials as modes of intra- and inter-cellular communication. The hypothesis which we are trying to test experimentally is that graded processing allows compartmentalization of function, thereby increasing the computational capabilities of single neurons.

For further information contact:
Gilles Laurent
Biology Division
CNS Program, MS 139-74
Caltech
Pasadena, CA 91125
(818) 397-2798
laurent at delphi.caltech.edu
*************************************************************************** SPECIAL ISSUE ON EMERGENCE of World Futures: the Journal of General Evolution *************************************************************************** The WF Spec Issue can be bought for USD 38 as a book, it's ISBN 2-88124-526-9, order from Gordon and Breach, POBox 786 Cooper Station New York, NY 10276 USA phone (212) 206 8900 FAX (212) 645 2459 George Kampis kampis at mailserv.zdv.uni-tuebingen.de From giles at research.nec.com Mon Nov 25 15:35:56 1991 From: giles at research.nec.com (Lee Giles) Date: Mon, 25 Nov 91 15:35:56 EST Subject: recurrent higher order neural networks Message-ID: <9111252035.AA01060@fuzzy.nec.com> Regarding higher order recurrent nets: John Kolen mentions: ***************************************** Higher order recurrent networks are recurrent networks with higher order connections, (i[1]*i[2]*w[1,2] instead of i[1]*w[1]). An example of a high order recurent network is Pollack's sequential cascaded networks which appear, I believe, in the latest issue of Machine Learning. This network can be described as two three-dimensional matrices, W and V, and the following equations. O[t] = Sigmoid( (W . S[t]) . I[t]) S[t+1]=Sigmoid( (V . S[t]) . I[t]) where I[t] is the input vector, O[t] is the output vector, and S[t] is the state vector, each at time t. ( . is inner product) ********************************************** For other references on higher-order recurrent nets, see the following: (This list is not meant to be inclusive, but to give some flavor of the diversity of work in this area.) Y.C. Lee, et.al,1986, Physica D. H.H. Chen, et.al, 1986, AIP conference proceedings on Neural Networks for Computing F. Pineda, 1988, AIP conference proceedings for NIPS Psaltis, et.al, 1988, Neural Networks. Giles, et al. 1990, NIPS2; and 1991 IJCNN proceedings Mozer and Bachrach, Machine Learning 1991 Hush, et.al., 1991 Proceedings for Neural Networks for Signal Processing. Watrous and Kuhn, 1992 Neural Computation In particular the papers by Giles, et.al use a 2nd order RTRL to learn grammars from grammatical strings. (Similar work has been done by Watrous and Kuhn.) What may be of interest is that using a heuristic extraction method, one can extract the grammar that the recurrent network learns (or is learning). It's worth noting that higher-order nets usually include sub-orders as special cases, i.e. 2nd includes 1st. In addition, sigma-pi units are just a subset of higher-order models and in many cases do not have the computational power of higher-order models. C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From wray at ptolemy.arc.nasa.gov Mon Nov 25 21:04:20 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Mon, 25 Nov 91 18:04:20 PST Subject: validation sets In-Reply-To: Jude Shavlik's message of Thu, 21 Nov 91 14:19:02 -0600 <9111212019.AA07223@steves.cs.wisc.edu> Message-ID: <9111260204.AA02393@ptolemy.arc.nasa.gov> Jude Shavlik says: > The question of whether or not validation sets are useful can easily be > answered, at least on specific datasets. We have run that experiment and > found that devoting some training examples to validation is useful (ie, > training on N examples does worse than training on N-k and validating on k). 
>
> This same issue comes up with decision-tree learners (where the validation set
> is often called a "tuning set", as it is used to prune the decision tree). I
> believe there people have also found it is useful to devote some examples to
> pruning/validating.

Sorry, Jude, but I couldn't let this one slip by. Use of a validation set in decision-tree learners produces great results ONLY when you have LOTS and LOTS of data. When you have less data, cross-validation or use of a well put together complexity/penalty term (i.e. carefully thought out MDL, weight decay/elimination, Bayesian maximum posterior, regularization, etc. etc. etc.) works much better. If the penalty term isn't well thought out (e.g. the early stuff on feed-forward networks such as weight decay/elimination was still toying with a new idea, so I'd call these not well thought out, although revolutionary for the time) then performance isn't as good. Best results with trees are obtained so far from doing "averaging", i.e. probabilistically combining the results from many different trees, i.e. experimental confirmation of the COLT-91 Haussler et al. style of results. NB. good penalty terms are discussed in Nowlan & Hinton, Buntine & Weigend and MacKay, and probably in lots of other places ...

Jude's comments:
> found that devoting some training examples to validation is useful (ie,
> training on N examples does worse than training on N-k and validating on k).

Only applies because they haven't included a reasonable penalty term. Get with it guys!

> I think there is also an important point about "proper" experimental
> methodology lurking in the discussion. If one is using N examples for weight
> adjustment (or whatever kind of learning one is doing) and also use k examples
> for selecting among possible final answers, one should report that their
> testset accuracy resulted from N+k training examples.

There's an interesting example of NOT doing this properly recently in the machine learning journal. See Mingers in Machine Learning 3(4), 1989, then see our experimental work in Buntine and Niblett 7, 1992. Mingers produced an otherwise *excellent* paper, but produced peculiar results (to those experienced in the area) because of mixing the "tuning set" with the "validation set".

Wray Buntine
NASA Ames Research Center          phone: (415) 604 3389
Mail Stop 269-2                    fax:   (415) 604 3594
Moffett Field, CA, 94035           email: wray at kronos.arc.nasa.gov

From slehar at park.bu.edu Tue Nov 26 00:23:29 1991
From: slehar at park.bu.edu (slehar@park.bu.edu)
Date: Tue, 26 Nov 91 00:23:29 -0500
Subject: Patent Fallacies
In-Reply-To: connectionists@c.cs.cmu.edu's message of 25 Nov 91 18:13:09 GM
Message-ID: <9111260523.AA08966@alewife.bu.edu>

Thanks to Steve Gallant for a very clear and convincing explanation of the patent issue. I was particularly impressed with the argument that patents are designed to ENCOURAGE free exchange of information, which was a new concept for me. I have a couple of questions still: when you say that the majority of patents granted do not result in the inventor making back legal costs and filing fees, is this because inventors have an unreasonably high esteem for their own creation and thus tend to patent things that should not have been patented? Or is this just "insurance" to cover the uncertainty of the prospects for the product, and constitutes a proper cost of the business of inventing? Or is it a way to purchase prestige for the organization that pays for the patent?
How much do patents typically cost, and where does that money really go to? Is this another tax, or are we really getting value for our money? You say that a patent must be non-obvious to those "skilled in the art". What if somebody releases some software to public domain as free software, and which is clearly the work of a genius? After release, can somebody else "steal" the idea and patent it for themselves, or is the public release sufficient education to those "skilled in the art" as to render it henceforth obvious and thereby unpatentable? Finally, is there not a growing practical issue that as things become easier to copy, the patent and copyright laws become progressively more difficult to enforce? Unenforcable laws are worse than useless, because they stimulate the spread of intrusive police measures and legal expenses in a futile attempt to stop the unstoppable. Computer software is currently the toughest problem in this regard, but the recent digital tape fiasco and growing problem of illegal photocopying are just the beginning- what happens when a patented life form gets bootlegged and starts replicating itself at will? Will the advance of technology not eventually make all copyrights and most patents worthless? From mmdf at gate.fzi.de Sun Nov 24 11:31:07 1991 From: mmdf at gate.fzi.de (FZI-mmdfmail) Date: Sun, 24 Nov 91 16:31:07 GMT Subject: resend Algorithms for Principal Components Analysis Message-ID: Ray, Over the past few years there has been a great deal of interest in recursive algorithms for finding eigenvectors or linear combinations of them. Many of these algorithms are based on the Oja rule (1982) with modifications to find more than a single output. As might be expected, so many people working on a single type of algorithm has led to a certain amount of duplication of effort. Following is a list of the papers I know about, which I'm sure is incomplete. Anyone else working on this topic should feel free to add to this list! Cheers, Terry Sanger @article{sang89a, author="Terence David Sanger", title="Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Neural Network", year=1989, journal="Neural Networks", volume=2, pages="459--473"} @incollection{sang89c, author="Terence David Sanger", title="An Optimality Principle for Unsupervised Learning", year=1989, pages="11--19", booktitle="Advances in Neural Information Processing Systems 1", editor="David S. Touretzky", publisher="Morgan Kaufmann", address="San Mateo, {CA}", note="Proc. {NIPS'88}, Denver"} @article{sang89d, author="Terence David Sanger", title="Analysis of the Two-Dimensional Receptive Fields Learned by the Generalized {Hebbian} Algorithm in Response to Random Input", year=1990, journal="Biological Cybernetics", volume=63, pages="221--228"} @misc{sang90c, author="Terence D. Sanger", title="Optimal Hidden Units for Two-layer Nonlinear Feedforward Neural Networks", year=1991, note="{\it Int. J. Pattern Recognition and AI}, in press"} @inproceedings{broc89, author="Roger W. Brockett", title="Dynamical Systems that Sort Lists, Diagonalize Matrices, and Solve Linear Programming Problems", booktitle="Proc. 1988 {IEEE} Conference on Decision and Control", publisher="{IEEE}", address="New York", pages="799--803", year=1988} @ARTICLE{rubn90, AUTHOR = {J. Rubner and K. Schulten}, TITLE = {Development of Feature Detectors by Self-Organization}, JOURNAL = {Biol. Cybern.}, YEAR = {1990}, VOLUME = {62}, PAGES = {193--199} } @INCOLLECTION{krog90, AUTHOR = {Anders Krogh and John A. 
Hertz}, TITLE = {Hebbian Learning of Principal Components}, BOOKTITLE = {Parallel Processing in Neural Systems and Computers}, PUBLISHER = {Elsevier Science Publishers B.V.}, YEAR = {1990}, EDITOR = {R. Eckmiller and G. Hartmann and G. Hauske}, PAGES = {183--186}, ADDRESS = {North-Holland} } @INPROCEEDINGS{fold89, AUTHOR = {Peter Foldiak}, TITLE = {Adaptive Network for Optimal Linear Feature Extraction}, BOOKTITLE = {Proc. {IJCNN}}, YEAR = {1989}, PAGES = {401--406}, ORGANIZATION = {{IEEE/INNS}}, ADDRESS = {Washington, D.C.}, MONTH = {June} } @MISC{kung90, AUTHOR = {S. Y. Kung}, TITLE = {Neural networks for Extracting Constrained Principal Components}, YEAR = {1990}, NOTE = {submitted to {\it IEEE Trans. Neural Networks}} } @article{oja85, author="Erkki Oja and Juha Karhunen", title="On Stochastic Approximation of the Eigenvectors and Eigenvalues of the Expectation of a Random Matrix", journal="J. Math. Analysis and Appl.", volume=106, pages="69--84", year=1985} @book{oja83, author="Erkki Oja", title="Subspace Methods of Pattern Recognition", publisher="Research Studies Press", address="Letchworth, Hertfordshire UK", year=1983} @inproceedings{karh84b, author="Juha Karhunen", title="Adaptive Algorithms for Estimating Eigenvectors of Correlation Type Matrices", booktitle="{Proc. 1984 {IEEE} Int. Conf. on Acoustics, Speech, and Signal Processing}", publisher="{IEEE} Press", address="Piscataway, {NJ}", year=1984, pages="14.6.1--14.6.4"} @inproceedings{karh82, author="Juha Karhunen and Erkki Oja", title="New Methods for Stochastic Approximation of Truncated {Karhunen-Lo\`{e}ve} Expansions", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="{Springer}-{Verlag}", address="{NY}", month="October", pages="550--553"} @inproceedings{oja80, author="Erkki Oja and Juha Karhunen", title="Recursive Construction of {Karhunen-Lo\`{e}ve} Expansions for Pattern Recognition Purposes", booktitle="{Proc. 5th Int. Conf. on Pattern Recognition}", publisher="Springer-{Verlag}", address="{NY}", year=1980, month="December", pages="1215--1218"} @inproceedings{kuus82, author="Maija Kuusela and Erkki Oja", title="The Averaged Learning Subspace Method for Spectral Pattern Recognition", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="Springer-{Verlag}", address="{NY}", month="October", pages="134--137"} @phdthesis{karh84, author="Juha Karhunen", title="Recursive Estimation of Eigenvectors of Correlation Type Matrices for Signal Processing Applications", school="Helsinki Univ. Tech.", year=1984, address="Espoo, Finland"} @techreport{karh85, author="Juha Karhunen", title="Simple Gradient Type Algorithms for Data-Adaptive Eigenvector Estimation", institution="Helsinki Univ. Tech.", year=1985, number="TKK-F-A584"} @misc{ogaw86, author = "Hidemitsu Ogawa and Erkki Oja", title = "Can we Solve the Continuous Karhunen-Loeve Eigenproblem from Discrete Data?", note = "Proc. {IEEE} Eighth International Conference on Pattern Recognition, Paris", year = "1986"} @article{leen91, author = "Todd K Leen", title = "Dynamics of learning in linear feature-discovery networks", journal = "Network", volume = 2, year = "1991", pages = "85--105"} @incollection{silv91, author = "Fernando M. Silva and Luis B. Almeida", title = "A Distributed Decorrelation Algorithm", booktitle = "Neural Networks, Advances and Applications", editor = "Erol Gelenbe", publisher = "North-Holland", year = "1991", note = "to appear"} From giles at research.nec.com Tue Nov 26 18:03:57 1991 From: giles at research.nec.com (Lee Giles) Date: Tue, 26 Nov 91 18:03:57 EST Subject: Higher-order recurrent neural networks Message-ID: <9111262303.AA03072@fuzzy.nec.com> More references for higher-order recurrent nets and some general comments: John Kolen mentions: ***************************************** Higher order recurrent networks are recurrent networks with higher order connections, (i[1]*i[2]*w[1,2] instead of i[1]*w[1]). An example of a high order recurent network is Pollack's sequential cascaded networks which appear, I believe, in the latest issue of Machine Learning. This network can be described as two three-dimensional matrices, W and V, and the following equations. O[t] = Sigmoid( (W . S[t]) . I[t]) S[t+1]=Sigmoid( (V . S[t]) . I[t]) where I[t] is the input vector, O[t] is the output vector, and S[t] is the state vector, each at time t. ( . is inner product) ********************************************** For other references on higher-order recurrent nets, see the following: (This list is not meant to be inclusive, but to give some flavor of the diversity of work in this area.) Y.C. Lee, et.al,1986, Physica D. H.H. Chen, et.al, 1986, AIP conference proceedings on Neural Networks for Computing F. Pineda, 1988, AIP conference proceedings for NIPS Psaltis, et.al, 1988, Neural Networks. Giles, et al. 1990, NIPS2; and 1991 IJCNN proceedings, Neural Computation, 1992. Mozer and Bachrach, Machine Learning 1991 Hush, et.al., 1991 Proceedings for Neural Networks for Signal Processing. Watrous and Kuhn, 1992 Neural Computation In particular the work by Giles, et.al. describes a 2nd order forward-propagation RTRL to learn grammars from grammatical strings.* What may be of interest is that using a heuristic extraction method, one can extract the "learned" grammar from the the recurrent network both during and after training. It's worth noting that higher-order nets usually include sub-orders as special cases, i.e. 2nd includes 1st.
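Kolen's two equations quoted above translate almost directly into code. The following is an illustrative rendering only (the index convention, shapes and NumPy usage are the editor's assumptions, not taken from any of the cited implementations):

    import numpy as np

    def second_order_step(W, V, state, inp):
        # W has shape (n_out, n_state, n_in), V has shape (n_state, n_state, n_in);
        # the weight W[k, i, j] multiplies the product state[i] * inp[j], which is
        # what makes the connections "second order".
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        output = sigmoid(np.einsum('kij,i,j->k', W, state, inp))      # O[t]
        new_state = sigmoid(np.einsum('kij,i,j->k', V, state, inp))   # S[t+1]
        return output, new_state

Run over a string of input vectors, the state S[t] carries the network's memory from symbol to symbol, which is what the grammar-learning work referred to above exploits.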
In addition, sigma-pi units are just a subset of higher-order models and in some cases do not have the representational power of higher-order models. For example, the term (using Kolen's notation above) S[i,t] . I[j,t] would have the same weight coefficient in the original sigma-pi notation as the term S[j,t] . I[i,t]. Higher-order notation would distinguish between these terms using the tensor weights W[k,i,j] and W[k,j,i]. *(Similar work has been done by Watrous & Kuhn and Pollack) C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482
From FJIMENEZ%ANDESCOL.BITNET at BITNET.CC.CMU.EDU Tue Nov 26 19:50:44 1991 From: FJIMENEZ%ANDESCOL.BITNET at BITNET.CC.CMU.EDU (Nestor) Date: Tue, 26 Nov 91 19:50:44 COL Subject: Change address Message-ID: <01GDF0BWLPT49EDE3C@BITNET.CC.CMU.EDU> Hello, can you tell me what the procedure is for changing my BITNET address? Thanks in advance, Nestor Ceron e-mail: fjimenez at andescol.bitnet Universidad de los Andes Santafe de Bogota - Colombia
From platt at synaptics.com Tue Nov 26 21:46:55 1991 From: platt at synaptics.com (John Platt) Date: Tue, 26 Nov 91 18:46:55 PST Subject: Neural Architect Position Offered Message-ID: <9111270246.AA27607@synaptx.synaptics.com> **********************DO NOT FORWARD TO OTHER BBOARDS************************* **********************DO NOT FORWARD TO OTHER BBOARDS************************* NEURAL NETWORK ARCHITECT WANTED Synaptics, Inc., is a small and growing neural network company, located in San Jose, California. We develop neural network architectures and analog VLSI chips to sense and process real-world data. Our architectures and unique hardware solutions enable our customers to create state-of-the-art systems in many different fields. There is an opening at Synaptics for a neural network architect. The job will consist of creating network architectures for real-world applications, such as optical character recognition. The architect will need to develop programs to train and test these architectures on real data. The architect will also have to map the architectures onto existing or new analog VLSI chips. Applicants should have a strong background in programming. Experience in C++ or LISP is especially valuable. Applicants should also be familiar with current research in neural networks and have experience in applying network models to real problems. Experience with VLSI (analog or digital) is desirable, but not necessary. Applicants should be multi-disciplinary, thorough experimentalists. They should be enthusiastic about neural networks, working with other researchers, inventing new ideas, and building a successful company by meeting customers' needs. If you are interested in this position, please send your resume to John Platt Synaptics 2860 Zanker Road, Suite 206 San Jose, CA 95134 or send a postscript or plain text resume to platt at synaptics.com I will be away at NIPS until Dec 10, so please do not expect an immediate reply.
From ang at hertz.njit.edu Tue Nov 26 22:03:40 1991 From: ang at hertz.njit.edu (Nirwan Ansari, 201-596-3670) Date: Tue, 26 Nov 1991 22:03:40 -0500 Subject: Wavelet Symposium Message-ID: <9111270303.AA03913@hertz.njit.edu> The following symposium on wavelets might arouse interest in the Neural Networks community.
********************************************************************* New Jersey Institute of Technology Department of Electrical and Computer Engineering Center for Communications and Signal Processing Research presents One-day Symposium on MULTIRESOLUTION IMAGE AND VIDEO PROCESSING: SUBBANDS AND WAVELETS Date: March 20, 1992 (Friday, just before ICASSP week) Place: NJIT, Newark, New Jersey Organizers: A.N. Akansu, NJIT M. Vetterli, Columbia U. J.W. Woods, RPI Program: 08.30-09.00 Registration and Coffee 09.00-09.10 Gary Thomas, Provost, NJIT: Welcoming Remarks 09.10-09.40 Edward H. Adelson, MIT: Steerable, Shiftable Subband Transforms 09.40-10.10 Ali N. Akansu, NJIT: Some Aspects of Optimal Filter Bank Design for Image-Video Coding 10.10-10.40 Arnaud Jacquin, AT&T Bell Labs.: Comparative Study of Different Filterbanks for Low Bit Rate Subband-based Video Coding 10.40-11.00 Coffee Break 11.00-11.30 Ronald Coifman, Yale U.: Adapted Image Coding with Wavelet-packets and Local Trigonometric Waveform Libraries 11.30-12.00 Philippe M. Cassereau, Aware Inc.: Wavelet Based Video Coding 12.00-12.30 Michele Barlaud, Nice U.: Image Coding Using Biorthogonal Wavelet Transform and Entropy Lattice Vector Quantization 12.30-01.30 Lunch 01.30-02.00 Jan Biemond, Delft U.: Hierarchical Subband Coding of HDTV 02.00-02.30 Martin Vetterli, Columbia U.: Multiresolution Joint Source-Channel Coding for HDTV Broadcast 02.30-03.00 John W. Woods, RPI: Compression Coding of Video Subbands 03.00-03.30 Rashid Ansari, Bellcore: Hierarchical Video Coding: Some Options and Comparisons ************************************* Registration Fee: $20, Lunch included Parking will be provided EARLY REGISTRATION ADVISED ************************************** For Early Registration: Send your check (payable to NJIT/CCSPR) to A.N. Akansu NJIT ECE Dept. University Heights Newark, NJ 07102 Tel: 201-596-5650 email: ali at hertz.njit.edu DIRECTIONS TO NJIT ****************** GARDEN STATE PARKWAY: Exit 145 to Route 280 East. Exit King Blvd. At traffic light turn right. Third traffic light is Central Ave. Turn right. One short block, turn left onto Summit Ave. Stop at guard house for parking directions. ROUTE 280 EAST: Follow directions outlined above. NEW JERSEY TURNPIKE: Exit 15W to Route 280 West. Stay in right-hand lane after metal bridge. Broad Street is second exit (landmarks: RR station on left, church spire on right). Turn left at foot of ramp. One short block to stop sign. Turn left onto King Blvd. At 4th light, turn right onto Central Ave. (stay left). One short block, turn left onto Summit Ave. Drive to guard house for parking directions. (If you miss the Broad St. exit, get off at Clinton Ave. Turn left at foot of ramp; left onto Central Ave; right onto Summit Ave.) ROUTE 280 WEST: Follow directions outlined above. FROM NEWARK AIRPORT: Take a taxi to NJIT campus. We are next to Rutgers Newark Campus. HAVE A SAFE TRIP!
From pratt at cs.rutgers.edu Wed Nov 27 15:33:37 1991 From: pratt at cs.rutgers.edu (pratt@cs.rutgers.edu) Date: Wed, 27 Nov 91 15:33:37 EST Subject: Subtractive methods / Cross validation (includes summary) Message-ID: <9111272033.AA13154@rags.rutgers.edu> Hi, FYI, I've summarized the recent discussion on subtractive methods below. A couple of comments: o [Ramachandran and Pratt, 1992] presents a new subtractive method, called Information Measure Based Skeletonisation (IMBS). IMBS induces a decision tree over the hidden unit hyperplanes in a learned network in order to detect which are superfluous.
Single train/test holdout experiments on three real-world problems (Deterding vowel recognition, Peterson-Barney vowel recognition, heart disease diagnosis) indicate that this method doesn't degrade generalization scores while it substantially reduces hidden unit counts. It's also very intuitive. o There seems to be some confusion between the very different goals of: (1) Evaluating the generalization ability of a network, and (2) Creating a network with the best possible generalization performance. Cross-validation is used for (1). However, as P. Refenes points out, once the generalization score has been estimated, you should use *all* training data to build the best network possible (see the sketch after the summary below). --Lori @incollection{ ramachandran-92, MYKEY = " ramachandran-92 : .con .bap", EDITOR = "D. S. Touretzky", BOOKTITLE = "{Advances in Neural Information Processing Systems 4}", AUTHOR = "Sowmya Ramachandran and Lorien Pratt", TITLE = "Discriminability Based Skeletonisation", ADDRESS = "San Mateo, CA", PUBLISHER = "Morgan Kaufmann", YEAR = 1992, NOTE = "(To appear)" } Summary of discussion so far: hht: Hans Henrik Thodberg sf: Scott_Fahlman at sef-pmax.slisp.cs.cmu.edu jkk: John K. Kruschke rs: R Srikanth pr: P.Refenes at cs.ucl.ac.uk gh: Geoffrey Hinton kl: Ken Laws js: Jude Shavlik hht~~: Request for discussion. Goal is good generalisation: achievable hht~~: if nets are of minimal size. Advocates subtractive methods hht~~: over additive ones. Gives Thodberg, Lecun, Weigend hht~~: references. sf~~: restricting complexity ==> better generalization only when sf~~: ``signal components are larger and more coherent than the noise'' sf~~: Describes what cascade correlation does. sf~~: Questions why a subtractive method should be superior to this. sf~~: Gives reasons to believe that subtractive methods might be slower sf~~: (because you have to train, chop, train, instead of just train) jkk~~: Distinguishes between removing a node and just removing its jkk~~: participation (by zeroing weights, for example). When nodes jkk~~: are indeed removed, subtractive schemes can be more expensive, jkk~~: since we are training nodes which will later be removed. jkk~~: Cites his work (w/Mavellan) on schemes which are both additive jkk~~: and subtractive. rs~~: Says that overgeneralization is bad: distinguishes best fit from rs~~: most general fit as potentially competing criteria. pr~~: Points out that pruning techniques are able to remove redundant pr~~: parts of the network. Also points out that using a cross-validation pr~~: set without a third set is ``training on the testing data''. gh~~: Points out that, though you might be doing some training on the testing gh~~: set, since you only get a single number as feedback from it, you aren't gh~~: really fully training on this set. gh~~: Also points out that techniques such as his work on soft-weight sharing gh~~: seem to work noticeably better than using a validation set to decide gh~~: when to stop training. hht~~: Agrees that comparative studies between subtractive and additive hht~~: methods would be a good thing. Describes a brute-force subtractive method. hht~~: Argues, by analogy to automobile construction and idea generation, why hht~~: subtractive methods are more appealing than additive ones. ~~pr: Argues that you'd get better generalization if you used more ~~pr: examples for training; in particular not just a subset of all ~~pr: training examples present. ~~kl: Points out the similarity between the additive/subtractive debate ~~kl: and stepwise-inclusion vs stepwise-deletion issues in multiple ~~kl: regression. ~~js: Points out that when reporting the number of examples used for ~~js: training, it's important to include the cross-validation examples ~~js: as well.
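To make the distinction above concrete (use cross-validation only to estimate generalization, then train the network you will actually use on all of the data), here is a generic sketch. The functions train_fn and error_fn are hypothetical placeholders for whatever fitting and scoring procedure is in use; nothing here is specific to any method discussed in the summary.
----------------------------------------------------------------------
import numpy as np

def crossval_score(train_fn, error_fn, X, y, k=5, seed=0):
    # Estimate generalization error by k-fold cross-validation.
    # train_fn(X, y) -> model and error_fn(model, X, y) -> scalar error
    # are placeholders supplied by the user.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train], y[train])
        errors.append(error_fn(model, X[test], y[test]))
    return float(np.mean(errors))

# (1) Estimate how well this model class generalizes:
#     est_error = crossval_score(train_fn, error_fn, X, y, k=5)
# (2) Then build the network to actually use from ALL of the training data:
#     final_model = train_fn(X, y)
----------------------------------------------------------------------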
From mclennan at cs.utk.edu Wed Nov 27 13:20:17 1991 From: mclennan at cs.utk.edu (mclennan@cs.utk.edu) Date: Wed, 27 Nov 91 13:20:17 -0500 Subject: report available Message-ID: <9111271820.AA10236@maclennan.cs.utk.edu> ** Please do not forward to other lists. Thank you. ** The following technical report has been placed in the Neuroprose archives at Ohio State. Ftp instructions follow the abstract. ----------------------------------------------------- Characteristics of Connectionist Knowledge Representation Bruce MacLennan Computer Science Department University of Tennessee Knoxville, TN 37996 maclennan at cs.utk.edu Technical Report CS-91-147* ABSTRACT: Connectionism -- the use of neural networks for knowledge representation and inference -- has profound implications for the representation and processing of information because it provides a fundamentally new view of knowledge. However, its progress is impeded by the lack of a unifying theoretical construct corresponding to the idea of a calculus (or formal system) in traditional approaches to knowledge representation. Such a construct, called a simulacrum, is proposed here, and its basic properties are explored. We find that although exact classification is impossible, several other useful, robust kinds of classification are permitted. The representation of structured information and constituent structure are considered, and we find a basis for more flexible rule-like processing than that permitted by conventional methods. We discuss briefly logical issues such as decidability and computability and show that they require reformulation in this new context. Throughout we discuss the implications for artificial intelligence and cognitive science of this new theoretical framework. * Modified slightly for electronic distribution. ----------------------------------------------------- FTP INSTRUCTIONS Either use Getps script, or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclennan.cckr.ps.Z ftp> quit unix> uncompress maclennan.cckr.ps.Z unix> lpr maclennan.cckr.ps (or however you print postscript) Note that the postscript version is missing a (nonessential) figure that has been pasted into the hardcopy version. If you need hardcopy, then send your request to: library at cs.utk.edu Your comments are welcome, Bruce MacLennan Department of Computer Science The University of Tennessee Knoxville, TN 37996-1301 (615)974-0994/5067 FAX: (615)974-4404 maclennan at cs.utk.edu
From soller%asylum at cs.utah.edu Thu Nov 28 18:18:19 1991 From: soller%asylum at cs.utah.edu (Jerome Soller) Date: Thu, 28 Nov 91 16:18:19 -0700 Subject: Looking for A Roommate at NIPS (have a room for two to share) Message-ID: <9111282318.AA27788@asylum.utah.edu> I am looking for someone to share a room I have reserved at the NIPS conference for two people (I am the only one now). It will be at the Sheraton on Sunday, Monday, Tuesday, and Wednesday evenings (Denver Sheraton). The cost is $66.00 (works out to $33.00 per person per night, plus local tax, if I can find someone). I apologize for the short notice, but some possible roommates fell through.
Please respond by e-mail or call me Friday in the middle of the day at (801) 582-1565 ext. 2475 and ask to be connected to my individual extension. Jerome B. Soller  From marshall at cs.unc.edu Fri Nov 29 20:46:55 1991 From: marshall at cs.unc.edu (Jonathan Marshall) Date: Fri, 29 Nov 91 20:46:55 -0500 Subject: Workshop on Self-Organization and Unsupervised Learning in Vision Message-ID: <9111300146.AA19204@marshall.cs.unc.edu> (Please post) PROGRAM: NIPS*91 Post-Conference Workshop on SELF-ORGANIZATION AND UNSUPERVISED LEARNING IN VISION December 6-7, 1991 in Vail, Colorado Workshop Chair: Jonathan A. Marshall Department of Computer Science, CB 3175, Sitterson Hall University of North Carolina, Chapel Hill, NC 27599-3175, U.S.A. 919-962-1887, marshall at cs.unc.edu Substantial neurophysiological and psychophysical evidence suggests that visual experience guides or directs the formation of much of the fine structure of animal visual systems. Simple unsupervised learning procedures (e.g., Hebbian rules) using winner-take-all or local k-winner networks have been applied with moderate success to show how visual experience can guide the self-organization of visual mechanisms sensitive to low-level attributes like orientation, contrast, color, stereo disparity, and motion. However, such simple networks lack the more sophisticated capabilities needed to demonstrate self-organized development of higher-level visual mechanisms for segmentation, grouping/binding, selective attention, representation of occluded or amodal visual features, resolution of uncertainty, generalization, context-sensitivity, and invariant object recognition. A variety of enhancements to the simple Hebbian model have been proposed. These include anti-Hebbian rules, maximization of mutual information, oscillatory interactions, intraneuronal interactions, steerable receptive fields, pre- vs. post-synaptic learning rules, covariance rules, addition of behavioral (motor) information, and attentional gating. Are these extensions to unsupervised learning sufficiently powerful to model the important aspects of neurophysiological development of higher-level visual functions? Some of the specific questions that the workshop will address are: o Does our visual environment provide enough information to direct the formation of higher-level visual processing mechanisms? o What kinds of information (e.g., correlations, constraints, coherence, and affordances) can be discovered in our visual world, using unsupervised learning? o Can such higher-level visual processing mechanisms be formed by unsupervised learning? Or is it necessary to appeal to external mechanisms such as evolution (genetic algorithms)? o Are there further enhancements that can be made to improve the performance and capabilities of unsupervised learning rules for vision? o What neurophysiological evidence is available regarding these possible enhancements to models of unsupervised learning? o What aspects of the development of visual systems must be genetically pre-wired, and what aspects can be guided or directed by visual experience? o How is the output of an unsupervised network stage used in subsequent stages of processing? o How can behaviorally relevant (sensorimotor) criteria become incorporated into visual processing mechanisms, using unsupervised learning? This 2-day informal workshop brings together researchers in visual neuroscience, visual psychophysics, and neural network modeling. 
Invited speakers from these communities will briefly discuss their views and results on relevant topics. In discussion periods, we will examine and compare these results in detail. The workshop topic is crucial to our understanding of how animal visual systems got the way they are. By addressing this issue head-on, we may come to understand better the factors that shape the structure of animal visual systems, and we may become able to build better computational models of the neurophysiological processes underlying vision. ---------------------------------------------------------------------- PROGRAM FRIDAY MORNING, December 6, 7:30-9:30 a.m. Daniel Kersten, Department of Psychology, University of Minnesota. "Environmental structure and scene perception: Perceptual representation of material, shape, and lighting" David C. Knill, Center for Research in Learning, Perception, and Cognition, University of Minnesota. "Environmental structure and scene perception: The nature of visual cues for 3-D scene structure" DISCUSSION Edward M. Callaway, Department of Neurobiology, Duke University. "Development of clustered intrinsic connections in cat striate cortex" Michael P. Stryker, Department of Physiology, University of California at San Francisco. "Problems and promise of relating theory to experiment in models for the development of visual cortex" DISCUSSION FRIDAY AFTERNOON, December 6, 4:30-6:30 p.m. Joachim M. Buhmann, Lawrence Livermore National Laboratory. "Complexity optimized data clustering by competitive neural networks" Nicol G. Schraudolph, Department of Computer Science, University of California at San Diego. "The information transparency of sigmoidal nodes" DISCUSSION Heinrich H. Bulthoff, Department of Cognitive and Linguistic Sciences, Brown University. "Psychophysical support for a 2D view interpolation theory of object recognition" John E. Hummel, Department of Psychology, University of California at Los Angeles. "Structural description and self organizing object classification" DISCUSSION SATURDAY MORNING, December 7, 7:30-9:30 a.m. Allan Dobbins, Computer Vision and Robotics Laboratory, McGill University. "Local estimation of binocular optic flow" Alice O'Toole, School of Human Development, The University of Texas at Dallas. "Recent psychophysics suggesting a reformulation of the computational problem of structure-from-stereopsis" DISCUSSION Jonathan A. Marshall, Department of Computer Science, University of North Carolina at Chapel Hill. "Development of perceptual context-sensitivity in unsupervised neural networks: Parsing, grouping, and segmentation" Suzanna Becker, Department of Computer Science, University of Toronto. "Learning perceptual invariants in unsupervised connectionist networks" Albert L. Nigrin, Department of Computer Science and Information Systems, American University. "Using Presynaptic Inhibition to Allow Neural Networks to Perform Translational Invariant Recognition" DISCUSSION SATURDAY AFTERNOON, December 7, 4:30-7:00 p.m. Jurgen Schmidhuber, Department of Computer Science, University of Colorado. "Learning non-redundant codes by predictability minimization" Laurence T. Maloney, Center for Neural Science, New York University. "Geometric calibration of a simple visual system" DISCUSSION Paul Munro, Department of Information Science, University of Pittsburgh. "Self-supervised learning of concepts" Richard Zemel, Department of Computer Science, University of Toronto. "Learning to encode parts of objects" DISCUSSION WRAP-UP, 6:30-7:00 ---------------------------------------------------------------------- (Please post)
From D.M.Peterson at computer-science.birmingham.ac.uk Fri Nov 29 10:10:40 1991 From: D.M.Peterson at computer-science.birmingham.ac.uk (D.M.Peterson@computer-science.birmingham.ac.uk) Date: Fri, 29 Nov 91 15:10:40 GMT Subject: Cognitive Science at Birmingham Message-ID: ============================================================================ University of Birmingham Graduate Studies in COGNITIVE SCIENCE ============================================================================ The Cognitive Science Research Centre at the University of Birmingham comprises staff from the Departments/Schools of Psychology, Computer Science, Philosophy and English, and supports teaching and research in the inter-disciplinary investigation of mind and cognition. The Centre offers both MSc and PhD programmes. MSc in Cognitive Science The MSc programme is a 12 month conversion course, including a 4 month supervised project. The course places a particular stress on the relation between biological and computational architectures. Compulsory courses: AI Programming, Overview of Cognitive Science, Knowledge Representation Inference and Expert Systems, General Linguistics, Human Information Processing, Structures for Data and Knowledge, Philosophical Questions in Cognitive Science, Human-Computer Interaction, Biological and Computational Architectures, The Computer and the Mind, Current Issues in Cognitive Science. Option courses: Artificial and Natural Perceptual Systems, Speech and Natural Language, Parallel Distributed Processing. It is expected that students will have a good first degree --- psychology, computing, philosophy or linguistics being especially relevant. Funding is available through SERC and HTNT. PhD in Cognitive Science For 1992 studentships are expected for PhD level research into a range of topics including: o computational modelling of emotion o computational modelling of cognition o interface design o computational and psychophysical approaches to vision Computing Facilities Students have access to ample computing facilities, including networks of Hewlett-Packard, Sun and Sparc workstations in the Schools of Computer Science and Psychology. Contact For further details, contact: The Admissions Tutor, Cognitive Science, School of Psychology, University of Birmingham, PO Box 363, Edgbaston, Birmingham B15 2TT, UK. Phone: (021) 414 3683 Email: cogsci at bham.ac.uk
From andreu at esaii.upc.es Sat Nov 30 09:48:31 1991 From: andreu at esaii.upc.es (andreu@esaii.upc.es) Date: Sat, 30 Nov 1991 9:48:31 UTC+0200 Subject: Announcement and call for abstracts for Feb. conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu>
From hinton at ai.toronto.edu Fri Nov 1 10:19:01 1991 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Fri, 1 Nov 1991 10:19:01 -0500 Subject: batch learning In-Reply-To: Your message of Wed, 30 Oct 91 15:14:07 -0500. Message-ID: <91Nov1.101916edt.204@neuron.ai.toronto.edu> Differentiation is a linear operator. So the derivative of the sum of the individual errors is the sum of the derivatives of the individual errors. The fact that differentiation is linear is actually helpful for some fancier ideas. To improve the conditioning of the error surface, it would be nice to convolve it with a gaussian blurring function so that sharp curvatures across ravines are reduced, but gentle slopes along ravines remain. Instead of convolving the error surface and then differentiating, we can differentiate and then convolve if we want. The momentum method is actually a funny version of this where we convolve along the path using a one-sided exponentially decaying filter. PS: As Rick Szeliski pointed out years ago, convolving a quadratic surface with a gaussian does not change its curvature, it just moves the whole thing up a bit (I hope I got this right!). But of course, our surfaces are not quadratic. They have plateaus, and convolution with a gaussian causes nearby plateaus to smooth out nasty ravines, and also allows the gradients in ravines to be "seen" on the plateaus. Geoff
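A small numerical illustration of the last point: classical momentum accumulates the gradients seen along the optimization path with a one-sided, exponentially decaying filter. The toy quadratic "ravine" and the step-size and decay constants below are arbitrary choices made for illustration only.
----------------------------------------------------------------------
import numpy as np

def grad(w):
    # Gradient of a toy error surface E(w) = 0.5 * w' A w with very
    # different curvatures along the two axes (a "ravine").
    A = np.diag([100.0, 1.0])
    return A @ w

w = np.array([1.0, 1.0])
v = np.zeros_like(w)          # filtered (smoothed) gradient
lr, beta = 0.01, 0.9          # step size and decay of the one-sided filter

for t in range(200):
    g = grad(w)
    # Momentum: v[t] = beta*v[t-1] + g[t], i.e. v[t] = sum_k beta^k * g[t-k],
    # so the gradient sequence is convolved with a one-sided exponential kernel.
    v = beta * v + g
    w = w - lr * v
----------------------------------------------------------------------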
From bachmann at radar.nrl.navy.mil Fri Nov 1 11:11:07 1991 From: bachmann at radar.nrl.navy.mil (Charles Bachmann) Date: Fri, 1 Nov 91 11:11:07 -0500 Subject: Resource Allocation Network (RAN) Message-ID: <9111011611.AA22642@radar.nrl.navy.mil> who is john platt? I can't remember why I should know him? -chip
From yoshio at eniac.seas.upenn.edu Fri Nov 1 13:20:02 1991 From: yoshio at eniac.seas.upenn.edu (Yoshio Yamamoto) Date: Fri, 1 Nov 91 13:20:02 EST Subject: No subject Message-ID: <9111011820.AA01432@eniac.seas.upenn.edu> One of my friends and I have been working independently on applications of neural networks to control problems. After a little discussion we came across the following problem, which may be interesting from a practical point of view. Suppose you have two continuous input units whose data are normalized between 0 and 1, several hidden units, and one continuous output unit. Also suppose the input given to the input unit A is totally unrelated to the output; the input is a randomized number in [0,1]. The input unit B, on the other hand, has a strong correlation with its corresponding output. Therefore what we need is a trained network such that it shows no correlation between the input A and the output. This can be illustrated by an example in which the input B is fixed, the input A varies at random in [0,1], and the network suppresses the influence from the input A to a minimum, ideally a constant output regardless of the values in the input A. In other words, we want the output to be fully independent of the input A. Then one obvious solution would be that all weights directed from the input A to the next hidden layer converge to zero or very small values through the training process. Why is this interesting? It is useful in practical problems. Initially you don't know which input has a correlation with the outputs and which doesn't. So you use all available inputs anyway. If there is a nonsense input, then it should be identified as such by the neural network and its influence should be automatically suppressed. The best solution we have in mind is that if no correlation were identified, then the weights associated with the input would shrink to zero. Is there any way to handle this problem? As a training tool we assume backprop. Any suggestion will be greatly appreciated. - Yoshio Yamamoto General Robotics And Sensory Perception Laboratory (GRASP) University of Pennsylvania
From hcard at ee.UManitoba.CA Fri Nov 1 17:00:02 1991 From: hcard at ee.UManitoba.CA (hcard@ee.UManitoba.CA) Date: Fri, 1 Nov 91 16:00:02 CST Subject: batch learning Message-ID: <9111012200.AA03433@card.ee.umanitoba.ca> I think my question concerning batch learning needs some amplification. The question was whether to add all contributions to the error from each pattern before taking derivatives. The point is that the batch dE/dW can be estimated (given limited precision, noise, etc) more accurately than the sum of many small dE(pat)/dW components. This will be particularly true towards the end of learning when errors are small. There may be several ways to determine the weight error derivative. The batch dE/dW would be most directly determined by twiddling the weights individually and rerunning the training set. I know this is expensive but the issue is accuracy not efficiency. Howard Card
From John.Hampshire at SPEECH2.CS.CMU.EDU Fri Nov 1 17:41:47 1991 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Fri, 1 Nov 91 17:41:47 EST Subject: Adding noise to training data Message-ID: As a follow-on to Geoff's post on this topic... Adding noise to the training set of any classifier (connectionist or other, linear or non-linear) has the statistical effect of convolving the PDF of the noise with the class-conditional densities of the RV that generated the training samples (assuming the noise and the RV are independent). This can (in principle) help generalization, because we typically have training sets that are so puny, we don't begin to have a sufficient sample size to estimate with any degree of precision the a-posteriori class distributions of the RV we're trying to classify. As a result, we get estimated a-posteriori distributions for a training set size of n that are usually n scaled Dirac delta functions distributed in feature space (continuous RV case). For discrete RV's the estimated distributions are made up of Kronecker deltas... OK, so if you add noise to that, you're convolving the deltas with the PDF of the noise (in the limit that you create an infinite number of noisy versions of each original training vector). This means that you have fabricated a NEW set of a-posteriori class distributions --- one that you hope will yield classification boundaries that are better estimates of the TRUE a-posteriori class distributions than all those original deltas. Whether or not you succeed depends critically on your choice of the PDF for the noise AND the covariance matrix of that PDF. In most cases the critical choice comes down to a largely arbitrary guess. So adding noise to improve generalization is something of an act of desperation in the face of uncertainty... uncertainty about what kind and how complex a classifier to build, uncertainty about the PDF of the data being classified... uncertainty about lots of things. John
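A minimal sketch of the procedure being analyzed: replace each training vector by many noisy copies, which in the limit convolves the empirical delta-function class-conditional densities with the noise PDF. The zero-mean Gaussian noise model and its width below are exactly the kind of largely arbitrary choice the message warns about; the data are toy values.
----------------------------------------------------------------------
import numpy as np

def add_noise(X, y, copies=10, sigma=0.1, seed=0):
    # Augment a training set with noisy copies of each sample.
    # Zero-mean Gaussian noise with standard deviation sigma is one
    # (arbitrary) choice of noise PDF; in the limit of many copies the
    # empirical class-conditional densities are convolved with it.
    rng = np.random.default_rng(seed)
    X_rep = np.repeat(X, copies, axis=0)
    y_rep = np.repeat(y, copies, axis=0)
    return X_rep + sigma * rng.standard_normal(X_rep.shape), y_rep

# Example: 3 two-dimensional training vectors become 30 noisy ones.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 1, 1])
X_noisy, y_noisy = add_noise(X, y, copies=10, sigma=0.05)
----------------------------------------------------------------------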
From polycarp at bode.usc.edu Fri Nov 1 18:38:30 1991 From: polycarp at bode.usc.edu (Marios Polycarpou) Date: Fri, 1 Nov 91 15:38:30 PST Subject: Tech. Rep. Available Message-ID: The following paper has been placed in the Neuroprose archives at Ohio State. The file is "polycarpou.stability.ps.Z." See ftp instructions below. IDENTIFICATION AND CONTROL OF NONLINEAR SYSTEMS USING NEURAL NETWORK MODELS: DESIGN AND STABILITY ANALYSIS Marios M. Polycarpou and Petros A. Ioannou Department of Electrical Engineering - Systems University of Southern California, MC-2563 Los Angeles, CA 90089-2563, U.S.A Abstract: The feasibility of applying neural network learning techniques in problems of system identification and control has been demonstrated through several empirical studies. These studies are based for the most part on gradient techniques for deriving parameter adjustment laws. While such schemes perform well in many cases, in general, problems arise in attempting to prove stability of the overall system, or convergence of the output error to zero. This paper presents a stability theory approach to synthesizing and analyzing identification and control schemes for nonlinear dynamical systems using neural network models. The nonlinearities of the dynamical system are assumed to be unknown and are modelled by neural network architectures. Multilayer networks with sigmoidal activation functions and radial basis function networks are the two types of neural network models that are considered. These static network architectures are combined with dynamical elements, in the form of stable filters, to construct a type of recurrent network configuration which is shown to be capable of approximating a large class of dynamical systems. Identification schemes based on neural network models are developed using two different techniques, namely, the Lyapunov synthesis approach and the gradient method. Both identification schemes are shown to guarantee stability, even in the presence of modelling errors. A novel network architecture, referred to as dynamic radial basis function network, is derived and shown to be useful in problems dealing with learning in dynamic environments. For a class of nonlinear systems, a stable neural network based control configuration is presented and analyzed.
unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get polycarpou.stability.ps.Z ftp> quit unix> uncompress polycarpou.stability.ps.Z unix> lpr polycarpou.stability.ps Any comments are welcome! Marios Polycarpou e-mail: polycarp at bode.usc.edu From John.Hampshire at SPEECH2.CS.CMU.EDU Sun Nov 3 12:05:15 1991 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Sun, 3 Nov 91 12:05:15 EST Subject: Adding noise to training data Message-ID: Sorry... PDF = probability density function RV = random vector (i.e., the probabilistic model generating the feature vectors of the training set). class-conditional density = probability density of (feature vector | class) --- see for example Duda & Hart. John From uh311ae at sunmanager.lrz-muenchen.de Sun Nov 3 13:56:50 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 03 Nov 91 19:56:50+0100 Subject: The transposed weight matrix hassle Message-ID: <9111031856.AA02137@sunmanager.lrz-muenchen.de> There are some nasty things showing up if you want to fine-tune a parallel architecture to algorithms such as backprop. E.g., you either get the communications fast fro the forwar phase or the backward phase - but if you want to use the same communication flow for both, you have to transpose the weight matrices. This is on the order of O(forget it). Has anybody cooked up an idea ? Cheers, Henrik MPCI at LLNL IBM Research U. of Munich From hughc at spectrum.cs.unsw.oz.au Mon Nov 4 15:59:28 1991 From: hughc at spectrum.cs.unsw.oz.au (Hugh Clapin) Date: Mon, 4 Nov 91 15:59:28 AES Subject: unsubscribe Message-ID: <9111040510.23803@munnari.oz.au> I sure hope this is the right address to send 'please unsubscribe me' messages. If it isn't, could someone please flame me with the appropriate addresss? hugh clapin hughc at spectrum.cs.unsw.oz.au From CONNECT at nbivax.nbi.dk Mon Nov 4 03:22:00 1991 From: CONNECT at nbivax.nbi.dk (CONNECT@nbivax.nbi.dk) Date: Mon, 4 Nov 1991 09:22 +0100 (NBI, Copenhagen) Subject: Contents of IJNS Vol. 2, issue 3 Message-ID: <107B86DB60E0636A@nbivax.nbi.dk> Begin Message: ----------------------------------------------------------------------- INTERNATIONAL JOURNAL OF NEURAL SYSTEMS The International Journal of Neural Systems is a quarterly journal which covers information processing in natural and artificial neural systems. It publishes original contributions on all aspects of this broad subject which involves physics, biology, psychology, computer science and engineering. Contributions include research papers, reviews and short communications. The journal presents a fresh undogmatic attitude towards this multidisciplinary field with the aim to be a forum for novel ideas and improved understanding of collective and cooperative phenomena with computational capabilities. ISSN: 0129-0657 (IJNS) ---------------------------------- Contents of Volume 2, issue number 3 (1991): 1. D.G. Stork: Sources of neural structure in speech and language processing. 2. L. Xu, A. Krzyzak and E. Oja: Neural nets for the dual subspace pattern recognition method. 3. P.J. Zwietering, E.H.L. Aarts and J. Wessels: The design and complexity of exact multi-layered perceptrons. 4. M.M. van Hulle: A goal programming network for mixed integer linear programming: A case study for the job-shop scheduling problem. 5. J-X. Wu and C. Chan: A three-layered adaptive network for pattern density estimation and classification. 6. L. Garrido and V. 
Gaitan: Use of neural nets to measure the tau-polarisation and its Bayesian interpretation. 7. C.M. Bishop: A fast procedure for retraining the multilayer perceptrons. 8. V. Menon and D.S. Tang: Population oscillations in neuronal groups. 9. V. Rodrigues and J. Skrzypek: Combining similarities and dissimilarities in supervised learning. ---------------------------------- Editorial board: B. Lautrup (Niels Bohr Institute, Denmark) (Editor-in-charge) S. Brunak (Technical Univ. of Denmark) (Assistant Editor-in-Charge) D. Stork (Stanford) (Book review editor) Associate editors: B. Baird (Berkeley) D. Ballard (University of Rochester) E. Baum (NEC Research Institute) S. Bjornsson (University of Iceland) J. M. Bower (CalTech) S. S. Chen (University of North Carolina) R. Eckmiller (University of Dusseldorf) J. L. Elman (University of California, San Diego) M. V. Feigelman (Landau Institute for Theoretical Physics) F. Fogelman-Soulie (Paris) K. Fukushima (Osaka University) A. Gjedde (Montreal Neurological Institute) S. Grillner (Nobel Institute for Neurophysiology, Stockholm) T. Gulliksen (University of Oslo) D. Hammerstrom (Oregon Graduate Institute) D. Horn (Tel Aviv University) J. Hounsgaard (University of Copenhagen) B. A. Huberman (XEROX PARC) L. B. Ioffe (Landau Institute for Theoretical Physics) P. I. M. Johannesma (Katholieke Univ. Nijmegen) M. Jordan (MIT) G. Josin (Neural Systems Inc.) I. Kanter (Princeton University) J. H. Kaas (Vanderbilt University) A. Lansner (Royal Institute of Technology, Stockholm) A. Lapedes (Los Alamos) B. McWhinney (Carnegie-Mellon University) M. Mezard (Ecole Normale Superieure, Paris) J. Moody (Yale, USA) A. F. Murray (University of Edinburgh) J. P. Nadal (Ecole Normale Superieure, Paris) E. Oja (Lappeenranta University of Technology, Finland) N. Parga (Centro Atomico Bariloche, Argentina) S. Patarnello (IBM ECSEC, Italy) P. Peretto (Centre d'Etudes Nucleaires de Grenoble) C. Peterson (University of Lund) K. Plunkett (University of Aarhus) S. A. Solla (AT&T Bell Labs) M. A. Virasoro (University of Rome) D. J. Wallace (University of Edinburgh) D. Zipser (University of California, San Diego) ---------------------------------- CALL FOR PAPERS Original contributions consistent with the scope of the journal are welcome. Complete instructions as well as sample copies and subscription information are available from The Editorial Secretariat, IJNS World Scientific Publishing Co. Pte. Ltd. 73, Lynton Mead, Totteridge London N20 8DH ENGLAND Telephone: (44)81-446-2461 or World Scientific Publishing Co. Inc. Suite 1B 1060 Main Street River Edge New Jersey 07661 USA Telephone: (1)201-487-9655 or World Scientific Publishing Co. Pte. Ltd. Farrer Road, P. O. Box 128 SINGAPORE 9128 Telephone (65)382-5663 ----------------------------------------------------------------------- End Message From xiru at Think.COM Mon Nov 4 11:07:02 1991 From: xiru at Think.COM (xiru Zhang) Date: Mon, 4 Nov 91 11:07:02 EST Subject: The transposed weight matrix hassle In-Reply-To: Henrik Klagges's message of 03 Nov 91 19:56:50+0100 <9111031856.AA02137@sunmanager.lrz-muenchen.de> Message-ID: <9111041607.AA01820@yangtze.think.com> Date: 03 Nov 91 19:56:50+0100 From: Henrik Klagges There are some nasty things showing up if you want to fine-tune a parallel architecture to algorithms such as backprop. E.g., you either get the communications fast fro the forwar phase or the backward phase - but if you want to use the same communication flow for both, you have to transpose the weight matrices. 
This is on the order of O(forget it). Has anybody cooked up an idea ? Cheers, Henrik MPCI at LLNL IBM Research U. of Munich This is definitely not true for our implementations on CM-2. We have several ways to run backprop: 1. one big network on the whole CM-2; 2. multiple copies of the same network on CM-2, each copy runs on a group of processors (i.e., "batch mode"); 3. multiple copies of the same network on CM-2, each copy runs on one processor. In none of the above cases we had the problem you mentioned in your message. - Xiru Zhang Thinking Machines Corp. From morgan at icsib.Berkeley.EDU Mon Nov 4 11:42:20 1991 From: morgan at icsib.Berkeley.EDU (Nelson Morgan) Date: Mon, 4 Nov 91 08:42:20 PST Subject: forward and backward (or, the transposed weight matrix hassle) Message-ID: <9111041642.AA21959@icsib.Berkeley.EDU> > From: Henrik Klagges > Message-Id: <9111031856.AA02137 at sunmanager.lrz-muenchen.de> > To: connectionists at cs.cmu.edu > Subject: The transposed weight matrix hassle > > There are some nasty things showing up if you want to fine-tune > a parallel architecture to algorithms such as backprop. E.g., you > either get the communications fast fro the forwar phase or the > backward phase - but if you want to use the same communication > flow for both, you have to transpose the weight matrices. This > is on the order of O(forget it). Has anybody cooked up an idea ? > > Cheers, Henrik > > MPCI at LLNL > IBM Research > U. of Munich > > > > ------- End of Forwarded Message > > Sure. We have had our parallel architecture, the Ring Array Processor (RAP) training up backprop nets for our speech recognition research for about a year and a half now. With a ring or a torus, you don't need to duplicate weight matrices. For instance, you can organize the weights so they are most convenient for the forward pass, and then during the backward pass just compute partial sums for all of the deltas; that is, on each processor just compute what you can out of every sum that has the local weights in it. Then pass around the partial sums systolically, updating cumulatively in each processor. If your computation is strongly virtualized (many more than 1 neuron per physical processor), and if your computation is efficient (we shift around the ring in one cycle, plus a few cycles overhead added to each complete shift around the ring), then this part of backprop is not a bad cost. I think this is described in our paper in Proceedings of ASAP '90. You can also send to info at icsi.berkeley.edu to ask about RAP TR's. From hwang at pierce.ee.washington.edu Mon Nov 4 11:02:54 1991 From: hwang at pierce.ee.washington.edu ( J. N. Hwang) Date: Mon, 4 Nov 91 08:02:54 PST Subject: The transposed weight matrix hassle Message-ID: <9111041602.AA23287@pierce.ee.washington.edu.> The forward phase of BP is done by the matrix-vector multiplication, the backward phase is done by the vector-matrix multiplication consecutively (layer-by-layer). In addition, the weight updating itself is done by an outer product operation. All these three operations can be elegantly implemented by a "ring array architecture" with fully pipelining efficiency (pipeline rate = 1). Some references: 1) S. Y. Kung, J. N. Hwang, "Parallel architectures for artificial neural networks," ICNN'88, San Diego, 1988. 2) J. N. Hwang, J. A. Vlontzos, S. Y. Kung, "A systolic Neural Network Architecture for Hidden Markov Models," IEEE Trans. on ASSP, December 1989. 3) S. Y. Kung, J. N. 
Hwang, " A unified systolic architecture for artificial neural networks," Journal of parallel and distributed computing, Special issue on Neural Networks, March 1989. Jenq-Neng Hwang 11/04/91 From white at teetot.acusd.edu Mon Nov 4 14:44:24 1991 From: white at teetot.acusd.edu (Ray White) Date: Mon, 4 Nov 91 11:44:24 -0800 Subject: No subject Message-ID: <9111041944.AA10150@teetot.acusd.edu> From prechelt at ira.uka.de Tue Nov 5 03:14:16 1991 From: prechelt at ira.uka.de (prechelt@ira.uka.de) Date: Tue, 05 Nov 91 09:14:16 +0100 Subject: Generalization In-Reply-To: Your message of Tue, 29 Oct 91 13:48:36 -0800. <9110292148.AA15810@beowulf.ucsd.edu> Message-ID: > ... I'd appreciate it if you > could mail it to me; also, I'd appreciate anyone's opinion on > "what is generalization" in 250 words or less :-) Let's do it much shorter (less than 50 words): Generalization is the application of knowledge about a set C of cases from a certain domain to a not-before-seen case X from the same domain but not belonging to C allowing to handle that case correctly. Notes: ------ 1. This can be made a concrete definition if you say what the terms knowledge case domain handle correctly shall mean. 2. This definition is NOT Neural Network specific. It can become Neural Network specific, depending on how the above terms are being defined. 3. Strictly speaking this defines a process, not a property of a mapping or something like that. 4. This defines something that Neuralnetters sometimes call 'successful generalization' as opposed to what happens in the system when it tries to generalize, but as a result the wrong result results. :-> 5. If you can decide what 'correct' is and what not, you can compute the can-generalize-to(X) predicate. This enables to quantify generalization capabilities. Comments and flames welcome. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-7500 Karlsruhe 1; Germany | they get (Voice: ++49/721/608-4317, FAX: ++49/721/697760) | less simple. From shams at maxwell.hrl.hac.com Mon Nov 4 16:58:43 1991 From: shams at maxwell.hrl.hac.com (shams@maxwell.hrl.hac.com) Date: Mon, 4 Nov 91 13:58:43 PST Subject: The transposed weight matrix hassle Message-ID: <9111042158.AA00763@maxwell.hrl.hac.com> There are a couple of different methods used for dealing with this problem that areeffective to a certain extend. First, a three phase conflict-free routing method has been proposed [1] that implicitly implements the matrix inversion during the back-propagation learning phase. This method is generally applicable to fine-grain architectures and sparsely connected neural nets. The second mapping method proposed by Kung & Hwang [2], efficiently time-multiplexes the synaptic interconnections of a neural network onto the physical connections of a 1-D ring systolic array. In this mapping, the matrix inversion operation requiredduring the learning phase can be performed by communicating neuron activation values between the processors (as oppose to the partial sums used in the feed-forward case). [1] V. K. Prasanna Kumar and K. W. Przytula, "Algorithmic Mapping of Neural Network Models onto Parallel SIMD Machines," Proceedings of the Inter. Conf. on Appl. Spec. Array Proc., Princeton, NJ, Ed. S. Y. Kung, E. E. Swartzlander, J. A. B. Fortes and K. W. Przytula, 1990. [2] S. Y. Kung and J. N. 
From mackay at hope.caltech.edu Tue Nov 5 13:20:59 1991 From: mackay at hope.caltech.edu (David MacKay) Date: Tue, 5 Nov 91 10:20:59 PST Subject: Announcement of NIPS Bayesian workshop and associated ftp archive Message-ID: <9111051820.AA02763@hope.caltech.edu> One of the two-day workshops at Vail this year will be: `Developments in Bayesian methods for neural networks' ------------------------------------------------------ David MacKay and Steve Nowlan, organizers The first day of this workshop will be 50% tutorial in content, reviewing some new ways Bayesian methods may be applied to neural networks. The rest of the workshop will be devoted to discussions of the frontiers and challenges facing Bayesian work in neural networks. Participants are encouraged to obtain preprints by anonymous ftp before the workshop. Instructions appear at the end of this message. Discussion will be moderated by John Bridle. Day 1, Morning: Tutorial review. 0 Introduction to Bayesian data modelling. David MacKay 1 E-M, clustering and mixtures. Steve Nowlan 2 Bayesian model comparison and determination of regularization constants - application to regression networks. David MacKay 3 The use of mixture decay schemes in backprop networks. Steve Nowlan Day 1, Evening: Tutorial continued. 4 The `evidence' framework for classification networks. David MacKay Day 1, Evening: Frontier Discussion. Background: A: In many cases the true Bayesian posterior distribution over a hypothesis or parameter space is difficult to obtain analytically. Monte Carlo methods may provide a useful and computationally efficient way to estimate posterior distributions in such cases. B: There are many applications where training data is expensive to obtain, and it is desirable to select training examples so we can learn as much as possible from each one. This session will discuss approaches for selecting the next training point "optimally". The same approaches may also be useful for reducing the size of a large data set by omitting the uninformative data points. A Monte Carlo clustering Radford Neal B Data selection / active query learning Jurgen Schmidhuber David MacKay Day 2, morning discussion: C Prediction of generalisation Background: The Bayesian approach to model comparison evaluates how PROBABLE alternative models are given the data. In contrast, the real problem is often to estimate HOW WELL EACH MODEL IS EXPECTED TO GENERALISE. In this session we will hear about various approaches to predicting generalisation. It is hoped that the discussion will shed light on the questions: - How does Bayesian model comparison relate to generalisation? - Can we predict generalisation ability of one model assuming that the `truth' is in a different specified model class? - Is it possible to predict generalisation ability WITHOUT making implicit assumptions about the properties of the `truth'? - Can we interpret GCV (cross-validation) in terms of prediction of generalisation? 1 Prediction of generalisation with `GPE' John Moody 2 Prediction of generalisation - worst + average case analysis David Haussler + Michael Kearns 3 News from the statistical physics front Sara Solla Day 2, Evening discussion: (Note: There will probably be time in this final session for continued discussion from the other sessions.)
D Missing inputs, unlabelled data and discriminative training Background: When training a classifier with a data set D_1 = {x,t}, a full probability model is one which assigns a parameterised probability P(x,t|w). However, many classifiers only produce a discriminant P(t|x,w), ie they do not model P(x). Furthermore, classifiers of the first type often yield better discriminative performance if they are trained as if they were only of the second type. This is called `discriminative training'. The problem with discriminative training is that it leaves us with no obvious way to use UNLABELLED data D_2 = {x}. Such data is usually cheap, but how can we integrate it with discriminative training? The same problem arises for most regression or classifier models when some of the input variables are missing from the input vector. What is the right thing to do? 1 Introduction: the problem of combining unlabelled data and discriminative training Steve Renals 2 Combining labelled and unlabelled data for the modem problem Steve Nowlan Reading up before the workshop ------------------------------ People intending to attend this workshop are encouraged to obtain preprints of relevant material before NIPS. A selection of preprints are available by anonymous ftp, as follows: unix> ftp hope.caltech.edu (or ftp 131.215.4.231) Name: anonymous Password: ftp> cd pub/mackay ftp> get README.NIPS ftp> quit Then read the file README.NIPS for further information. Problems? Contact David MacKay, mackay at hope.caltech.edu, or Steve Nowlan, nowlan at helmholtz.sdsc.edu --------------------------------------------------------------------------- From english at sun1.cs.ttu.edu Mon Nov 4 09:42:53 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Mon, 4 Nov 91 08:42:53 CST Subject: Adding noise to training data Message-ID: <9111041442.AA07265@sun1.cs.ttu.edu> At first blush, it seems there's a close relationship between Parzen estimation (Duda & Hart 1973) and training with noise added to the samples. If we were to use the noise function as the window function in Parzen estimation of the distribution from which the training set was drawn, wouldn't we would obtain precisely the noisy-sample distribution? And wouldn't a network minimizing squared error for the noisy training set asymptotically realize (i.e., as the number of noisy sample presentations approaches infinity) the Parzen estimator? The results of Hampshire and Perlmutter (1990) seem to be relevant here. > So adding noise to improve generalization is something of an act of > desperation in the face of uncertainty... uncertainty about what kind > and how complex a classifier to build, uncertainty about the PDF of the > data being classified... uncertainty about lots of things. I agree. But perhaps the "act of desperation" is of a familiar sort. Tom English Duda, R. O., and P. E. Hart. 1973. Pattern Classification and Scene Analysis. New York: Wiley & Sons. Hampshire, J. B., and B. A. Perlmutter. 1990?. Equivalence proofs for multi-layer perceptron classifiers and Bayesian discriminant function. In Proc. 1990 Connectionist Models Summer School. [Publisher?] From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Nov 5 14:55:45 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 05 Nov 91 14:55:45 EST Subject: No subject In-Reply-To: Your message of Mon, 04 Nov 91 11:44:24 -0800. 
<9111041944.AA10150@teetot.acusd.edu> Message-ID: Another idea is to calculate the matrix of second derivatives (grad(grad E)) as well as the first derivatives (grad E) and from this information calculate the (unique) parabolic surface in weight space that has the same derivatives. Then the weights should be updated so as to jump to the center (minimum) of the parabola. I haven't coded this idea yet, has anyone else looked at this kind of thing, and if so what are the results? My Quickprop algorithm is pretty close to what you describe here, excpet that it uses only the diagonal terms of the second derivative (i.e. it pretends that the weight updates do not affect one another). If you haven't seen the paper on this, it's in neuroprose as "fahlman.quickprop-tr.ps.Z" or something close to that. It works well -- in the few cases I have seen in which both quickprop and conjugate gradient were used on the same problems, quickprop is considerably faster (though in very high-dimensional spaces, CG might win). Yann LeCun has used a slightly different version of the same idea: he back-propagates second-derivative information for each case, and uses this to dynamically adjust the learning rate. -- Scott Fahlman From haussler at saturn.ucsc.edu Tue Nov 5 17:09:23 1991 From: haussler at saturn.ucsc.edu (David Haussler) Date: Tue, 5 Nov 91 14:09:23 -0800 Subject: call for papers for COLT '92 Message-ID: <9111052209.AA23475@saturn.ucsc.edu> CALL FOR PAPERS COLT '92 Fifth ACM Workshop on Computational Learning Theory University of Pittsburgh July 27-29, 1992 The fifth workshop on Computational Learning Theory will be held at the University of Pittsburgh, Pittsburgh Pennsylvania. The workshop is sponsored jointly by the ACM Special Interest Groups in Automata and Computability Theory and Artificial Intelligence. Registration is open, within the limits of the space available (about 150 people). We invite papers in all areas that relate directly to the analysis of learning algorithms and the theory of machine learning, including artificial and biological neural networks, robotics, pattern recognition, inductive inference, information theory, decision theory, Bayesian/MDL estimation, and cryptography. We look forward to a lively, interdisciplinary meeting. As part of our program, we are pleased to be presenting an invited talk by Prof. A. Barto of the University of Massachusetts on reinforcement learning. Other invited talks may be scheduled as well. Authors should submit an extended abstract that consists of: + A cover page with title, authors' names, (postal and e-mail) addresses, and a 200 word summary. + A body not longer than 10 pages in twelve-point font. Be sure to include a clear definition of the theoretical model used, an overview of the results, and some discussion of their significance, including comparison to other work. Proofs or proof sketches should be included in the technical section. Experimental results are welcome, but are expected to be supported by theoretical analysis. Authors should send 13 copies of their abstract to David Haussler, COLT '92, Computer and Information Sciences, University of California, Santa Cruz, CA 95064. The deadline for receiving submissions is February 10, 1992. This deadline is FIRM. Authors will be notified by April 10; final camera-ready papers will be due May 15. Chair: Bob Daley (C.S. Dept., U. Pittsburgh, PA 15260). 
Program committee: David Haussler (UC Santa Cruz, chair), Naoki Abe (NEC, Japan), Shai Ben-David (Technion), Tom Cover (Stanford), Rusins Freivalds (U. of Latvia), Lisa Hellerstein (Northwestern), Nick Littlestone (NEC, Princeton), Wolfgang Maass (Technical U., Graz, Austria), Lenny Pitt (U. Illinois), Robert Schapire (Bell Labs, Murray Hill), Carl Smith (U. Maryland), Naftali Tishby (Hebrew U.), Santosh Venkatesh (U. Penn.) Note: papers that have appeared in journals or other conferences, or that are being submitted to other conferences are not appropriate for submission to COLT. From jbower at cns.caltech.edu Wed Nov 6 00:55:04 1991 From: jbower at cns.caltech.edu (Jim Bower) Date: Tue, 5 Nov 91 21:55:04 PST Subject: CNS*92 Message-ID: <9111060555.AA04973@cns.caltech.edu> CALL FOR PAPERS First Annual Computation and Neural Systems Meeting CNS*92 Sunday, July 26 through Friday, July 31 1992 San Francisco, California This is the first annual meeting of an inter-disciplinary conference intended to address the broad range of research approaches and issues involved in the general field of computational neuroscience. The meeting itself has grown out of a workshop on "The Analysis and Modeling of Neural Systems" which has been held each of the last two years at the same site. The strong response to these previous meetings has suggested that it is now time for an annual open meeting on computational approaches to understanding neurobiological systems. CNS*92 is intended to bring together experimental and theoretical neurobiologists along with engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in understanding how neural systems compute. The meeting will equally emphasize experimental, model-based, and more abstract theoretical approaches to understanding neurobiological computation. The first day of the meeting (July 26) will be devoted to tutorial presentations and workshops focused on particular technical issues confronting computational neurobiology. The next three days will include the main technical program consisting of plenary, contributed and poster sessions. There will be no parallel sessions and the full text of presented papers will be published. Following the regular session, there will be two days of focused workshops at a site on the California coast (July 30-31). Participation in the workshops is restricted to 75 attendees. Technical Program: Plenary, contributed and poster sessions will be held. There will be no parallel sessions. The full text of presented papers will be published. Presentation categories: A. Theory and Analysis B. Modeling and Simulation C. Experimental D. Tools and Techniques Themes: A. Development B. Cell Biology C. Excitable Membranes and Synaptic Mechanisms D. Neurotransmitters, Modulators, Receptors E. Sensory Systems 1. Somatosensory 2. Visual 3. Auditory 4. Olfactory 5. Other F. Motor Systems and Sensory Motor Integration G. Behavior H. Cognitive I. Disease Submission Procedures: Original research contributions are solicited, and will be carefully refereed. Authors must submit six copies of both a 1000-word (or less) summary and six copies of a separate singlepage 50-100 word abstract clearly stating their results postmarked by January 7, 1992. Accepted abstracts will be published in the conference program. Summaries are for program committee use only. 
At the bottom of each abstract page and on the first summary page indicate preference for oral or poster presentation and specify at least one appropriate category and and theme. Also indicate preparation if applicable. Include addresses of all authors on the front of the summary and the abstract and indicate to which author correspondence should be addressed. Submissions will not be considered that lack category information, separate abstract sheets, the required six copies, author addresses, or are late. Mail Submissions To: Chris Ploegaert CNS*92 Submissions Division of Biology 216-76 Caltech Pasadena, CA. 91125 Mail For Registration Material To: Chris Ghinazzi Lawrence Livermore National Laboratories P.O. Box 808 Livermore CA. 94550 All submitting authors will be sent registration material automatically. Program committee decisions will be sent to the correspondence author only. CNS*92 Organizing Committee: Program Chair, James M. Bower, Caltech. Publicity Chair, Frank Eeckman, Lawrence Livermore Labs. Finances, John Miller, UC Berkeley and Nora Smiriga, Institute of Scientific Computing Res. Local Arrangements, Ted Lewis, UC Berkeley and Muriel Ross, NASA Ames. Program Committee: William Bialek, NEC Research Institute. James M. Bower, Caltech. Frank Eeckman, Lawrence Livermore Labs. Scott Fraser, Caltech. Christof Koch, Caltech. Ted Lewis, UC Berkeley. Eve Marder, Brandeis. Bruce McNaughton, University of Arizona. John Miller, UC Berkeley. Idan Segev, Hebrew University, Jerusalem Shihab Shamma, University of Maryland. Josef Skrzypek, UCLA. DEADLINE FOR SUMMARIES & ABSTRACTS IS January 7, 1992 please post From HOLMSTROM at csc.fi Mon Nov 4 12:41:00 1991 From: HOLMSTROM at csc.fi (HOLMSTROM@csc.fi) Date: Mon, 4 Nov 91 12:41 EET Subject: Adding noise to training data Message-ID: A note to John Hampshire's comment on this topic: Adding noise to the training vectors has been suggested and also used with some success by several authors. In a forthcoming article (Lasse Holmstrom and Petri Koistinen, "Using Additive Noise in Back-Propagation Training", IEEE Transactions on Neural Networks, January 1992) this method is discussed from the point of view of mathematical statistics. It is not claimed that better generalization is always achieved but mathematical insight is given to the choice of the characteristics of the additive noise density if using additive noise is attempted. A critical question is the level (variance) of the additive noise. One method to estimate a suitable noise level directly from data is to use a cross-validation method known from statistics. In a standard benchmark experiment (Kohonen- Barna-Chrisley, Neurocomputing 2) significant improvement in classification performance was achieved. The training method is also shown to be asymptotically consistent provided the noise level is chosen appropriately. Lasse Holmstrom From patrick at magi.ncsl.nist.gov Wed Nov 6 14:29:04 1991 From: patrick at magi.ncsl.nist.gov (Patrick Grother) Date: Wed, 6 Nov 91 14:29:04 EST Subject: Parallel MLP's Message-ID: <9111061929.AA17129@magi.ncsl.nist.gov> Parallel Multilayer Perceptron We have implemented batch mode conjugate gradient and backprop algorithms on a AMT 510C array processor (32 x 32 8 bit SIMD elements). As you know the weight update (i.e. straight vector operation done every epoch) is a tiny fraction of the total cost since, for realistically large (non redundant = noisy) training sets, the forward and backward propagation time is dominant. 
Given that this applies to both conjugate gradient and backprop, and that conjgrad typically converges in 30 times fewer iterations than backprop, conjgrad is undeniably the way to do it. On-line training involves the forward pass of a single input vector through the weights. This involves a matrix*vector operation and a sigmoid (or whatever) evaluation. The latter is purely parallel. The matrix operation involves a broadcast, a parallel multiplication and a recursive doubling sum over rows (or columns). Batch (or semi-batch) training passes many vectors through and is thus a matrix*matrix operation. The literature on this operation in parallel is huge and the best algorithm is inevitably dependent on the (communications bandwidth of the) particular machine and on the size of the matrices. On the array processor the outer product accumulation algorithm is up to 5 times quicker than the inner product algorithm:

Outer: Given two matrices W(HIDS,INPS), P(INPS,PATS) obtain F(HIDS,PATS) thus

   F = 0
   do i = 1, INPS {
      col_replicas = col_broadcast(W(,i))   # replicate col i over cols
      row_replicas = row_broadcast(P(i,))   # replicate row i over rows
      F = F + row_replicas * col_replicas
   }

Inner: As above except

   do i = 1, HIDS {
      col_replicas = col_broadcast(W(i,))           # replicate row i over cols
      F(i,) = sum_over_rows( P * col_replicas )     # sum up the cols (ie over rows)
   }

Henrik Klagges' weight matrix transposition in backprop is not really necessary. The output error is backpropagated through the final layer weights using the algorithm above; the difference is merely whether rows or columns of the weight matrix are selected.

Outer: With weights W(HIDS,INPS) and output deltas F(HIDS,PATS), obtain the input deltas P(INPS,PATS) thus

   P = 0
   do i = 1, HIDS {
      col_replicas = col_broadcast(W(i,))   # replicate row i of W over cols
      row_replicas = row_broadcast(F(i,))   # replicate row i of F over rows
      P = P + row_replicas * col_replicas
   }

On the DAP this operation is just as fast as explicitly doing the transpose. Transposition can be speeded up greatly if the matrix dimensions are powers of two, but the operation is inexpensive compared to matrix multiplication anyway, for any size matrix. A "recall" forward pass through a 32:32:10 MLP with 24 bit floats takes 79 microsecs per input vector. Through a 128:64:10 it takes 305 microsecs, and a 1024:512:32 takes 1237 microsecs. The latter is equivalent to 17.4 million connection-presentations per second. Such speed permits MLPs, trained from many initial random weight positions, to be optimised. The on-line versus batch problem is still unclear, and I think that a semi-batch, conjugate gradient method looks like a good compromise, in which case parallel code as above applies. Patrick Grother patrick at magi.ncsl.nist.gov Image Recognition Group Advanced Systems Division Computer Systems Laboratory Room A216 Building 225 NIST Gaithersburg MD 20899 From jon at cs.flinders.oz.au Wed Nov 6 18:38:37 1991 From: jon at cs.flinders.oz.au (jon@cs.flinders.oz.au) Date: Thu, 07 Nov 91 10:08:37 +1030 Subject: Quickprop. Message-ID: <9111062338.AA03721@degas> My Quickprop algorithm is pretty close to what you describe here, except that it uses only the diagonal terms of the second derivative (i.e. it pretends that the weight updates do not affect one another). If you haven't seen the paper on this, it's in neuroprose as "fahlman.quickprop-tr.ps.Z" or something close to that. It works well -- in the few cases I have seen in which both quickprop and conjugate gradient were used on the same problems, quickprop is considerably faster (though in very high-dimensional spaces, CG might win).
Yann LeCun has used a slightly different version of the same idea: he back-propagates second-derivative information for each case, and uses this to dynamically adjust the learning rate. - -- Scott Fahlman Thanks for the info. I'll grab your paper out of Neuroprose and give it a read. Have you also done anything on keeping the magnitude of the error vector constant? Doing this makes a lot of sense to me as it is only the direction of the next jump in weight space that is important, and in particular if one uses delta(w) = - alpha*grad(E) then flat regions cause very slow progress and steep regions may cause one to move too fast. delta(w) = -alpha*grad(E)/||grad(E)|| gives one a lot more control over the learning rate. Jon Baxter From hal at asi.com Wed Nov 6 16:37:22 1991 From: hal at asi.com (Hal McCartor) Date: Wed, 6 Nov 91 13:37:22 PST Subject: Efficient parallel Backprop Message-ID: <9111062137.AA22811@asi.com> In response to the recent question about running BP on parallel hardware: The Backpropagation algorithm can be run quite efficiently on parallel hardware by maintaining a transpose of the output layer weights on the hidden node processors and updating them in the usual manner so that they are always maintained as the exact transpose of the output layer weights. The error on the output nodes is broadcast to all hidden nodes simultaneously where each multiplies it by the appropriate transpose weight to accumulate an error sum. The transpose weights can also be updated in parallel making the whole process quite efficient. This technique is further explained in Advances in Neural Information Processing, Volume 3, page 1028, in the paper, Back Propagation Implementation on the Adaptive Solutions CNAPS Neurocomputer Chip. Hal McCartor From kirk at watson.ibm.com Wed Nov 6 19:32:36 1991 From: kirk at watson.ibm.com (Scott Kirkpatrick) Date: Wed, 6 Nov 91 19:32:36 EST Subject: NIPS Workshop housing Message-ID: Because of strong early demand for rooms, the block of rooms which we had held at the Marriott for NIPS workshop attendees sold out even before the Nov. 4 date mentioned in the brochure. In fact, the whole hotel is now sold out during the workshops. The Marriott has arranged for two hotels, each in the next block, to offer rooms at the conference rate or close to it. These are the Evergreen Hotel, (303)-476-7810 at $74/night, and L'Ostello, (303)-476-2050 at $79/night. If you were unable to get a room while this was being sorted out, or haven't reserved yet, call one of these. You can also call the published Marriott number, (303)-476-4444, for pointers to these or additional hotels, should we run out again. From white at teetot.acusd.edu Wed Nov 6 19:39:03 1991 From: white at teetot.acusd.edu (Ray White) Date: Wed, 6 Nov 91 16:39:03 -0800 Subject: No subject Message-ID: <9111070039.AA01514@teetot.acusd.edu> Yoshio Yamamoto wrote: > ... > Suppose you have two continuous input units whose data are normalized between > 0 and 1, several hidden units, and one continuous output unit. > Also suppose the input given to the input unit A is totally unrelated with the > output; the input is a randomized number in [0,1]. The input unit B, on the > other hand, has a strong correlation with its corresponding output. > Therefore what we need is a trained network such that it shows no correlation > between the input A and the output. ... > Why is this interesting? This is useful in practical problem. > Initially you don't know which input has correlation with the outputs > doesn't. 
So you use all available inputs anyway. If there is a nonsense > input, then it should be identified so by a neural network and the influence > from the input should be automatically suppressed. > The best solution we have in mind is that if no correlation were identified, > then the weights associated with the input will shrink to zero. > Is there any way to handle this problem? > As a training tool we assume a backprop. > Any suggestion will be greatly appreciated. > - Yoshio Yamamoto > General Robotics And Sensory Perception Laboratory (GRASP) > University of Pennsylvania In reading this, I infer that there is a problem in training the net with backprop - and then not getting the desired behavior. I'm not enough of a backprop person to know if that inference is correct. But in any case, why use backprop, when the desired behavior is a natural outcome of training the hidden units with an optimizing algorithm, something similar to Hebbian learning, but Hebbian learning modified so that the learning is correlated with the desired output function? An example of such an algorithm is Competitive Hebbian Learning, which will be published in the first (or maybe the second) 1992 issue of NEURAL NETWORKS (Ray White, Competitive Hebbian Learning: Algorithm and Demonstrations). One trains the hidden units to compete with each other as well as with the inverse of the desired output function. I've tried it on Boolean functions and it works, though I haven't tried the precise problem with real-valued inputs. Other optimizing "soft-competition" algorithms may also work. One should get the best results for the output layer by training it with a delta rule (not backprop, since only the output layer training is still needed). Competitive Hebbian Learning may work for the output layer as well, but one should get better convergence to the desired output with delta-rule training. Ray White (white at teetot.acusd.edu) Depts. of Physics & Computer Science University of San Diego From FRANKLIN%lem.rug.ac.be at BITNET.CC.CMU.EDU Thu Nov 7 13:29:00 1991 From: FRANKLIN%lem.rug.ac.be at BITNET.CC.CMU.EDU (Franklin Vermeulen) Date: Thu, 7 Nov 91 13:29 N Subject: No subject Message-ID: <01GCO3MBA5SG001FXI@BGERUG51.BITNET> Dear fellow researcher: I am looking for names (and coordinates) of people (preferably in Europe) knowledgeable in the fields of statistics/neural networks, with the aim of estimating gray scale images (e.g., in the field of subtractive radiology). Thank you for your kind consideration of this request. If you intend to react, please do not postpone your answer. Sincerely. Franklin L. Vermeulen, Ph.D. E-Mail: Franklin at lem.rug.ac.be Medical imaging group, Electronics and Metrology Lab, Universiteit Gent Sint Pietersnieuwstraat 41, B-9000 Gent (BELGIUM) +32 (91) 64-3367 (Direct dial-in) From HOLMSTROM at csc.fi Thu Nov 7 13:06:00 1991 From: HOLMSTROM at csc.fi (HOLMSTROM@csc.fi) Date: Thu, 7 Nov 91 13:06 EET Subject: Adding noise to training data Message-ID: A note to a comment by Tom English on this topic: He writes > At first blush, it seems there's a close relationship between Parzen > estimation (Duda & Hart 1973) and training with noise added to the > samples. If we were to use the noise function as the window function > in Parzen estimation of the distribution from which the training set > was drawn, wouldn't we would obtain precisely the noisy-sample > distribution? Yes, this is correct. 
> And wouldn't a network minimizing squared error for the noisy training > set asymptotically realize (i.e., as the number of noisy sample > presentations approaches infinity) the Parzen estimator? The > results of Hampshire and Perlmutter (1990) seem to be relevant here. As I said in my earlier note to John Hampshire's comment, there will be a paper in the January issue of IEEE Transactions on Neural Networks that gives a statistical analysis of using additive noise in training. Several asymptotic results are given there. Lasse Holmstrom From GINZBERG at TAUNIVM.TAU.AC.IL Wed Nov 6 16:30:22 1991 From: GINZBERG at TAUNIVM.TAU.AC.IL (Iris Ginzberg) Date: Wed, 06 Nov 91 16:30:22 IST Subject: Roommate for NIPS Message-ID: Dear Connectionists, I'm looking for a roommate for NIPS and/or the workshop. I'll arrive at Denver on Sunday, leave on Thursday. Arrive at Vale on Thursday, leave on Monday or Tuesday. ,,,Iris my e-mail is GINZBERG @ TAUNIVM.BITNET From John.Hampshire at SPEECH2.CS.CMU.EDU Thu Nov 7 09:21:43 1991 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Thu, 7 Nov 91 09:21:43 EST Subject: Adding noise to training data Message-ID: Actually, I made my original post on this topic to connectionists by accident... Oh well. Tom English is right --- Parzen windows do (in effect) estimate classification boundaries by convolving the noise PDF with the training sample PDF (a bunch of deltas...). Again, the goodness of the result depends on the PDF of the noise and its covariance matrix. The difference between Parzen windows and adding noise to data which is then used to train a (say) connectionist classifier is that both techniques formulate a new estimated PDF of the training data in the basis of the additive noise, but the connectionist model THEN goes on to try to model this new PDF in its connections and ITS set of basis functions. This is what seems desperate to me, and I didn't mean to impugn Parzen windows. I expected someone to catch me on that! -John From jcp at vaxserv.sarnoff.com Thu Nov 7 13:09:26 1991 From: jcp at vaxserv.sarnoff.com (John Pearson W343 x2385) Date: Thu, 7 Nov 91 13:09:26 EST Subject: NIPS-91 reminder Message-ID: <9111071809.AA00563@sarnoff.sarnoff.com> Now's the time to register for the NIPS-91 conference and workshops! For those of you who don't know about NIPS, read on: NEURAL INFORMATION PROCESSING SYSTEMS: NATURAL AND SYNTHETIC Conference: Monday, December 2 - Thursday, December 5, 1991; Denver, Colorado Workshop: Friday, December 6 - Saturday, December 7, 1991; Vail, Colorado This is the fifth meeting of an inter-disciplinary conference which brings together neuroscientists, engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in all aspects of neural processing and computation. There will be an afternoon of tutorial presentations (Dec 2) preceding the regular session and two days of focused workshops will follow at a nearby ski area (Dec 6-7). The meeting is sponsored by the Institute of Electrical and Electronic Engineers Information Theory Group, the Society for Neuroscience, and the American Physical Society. Plenary, contributed, and poster sessions will be held. There will be no parallel sessions. The full text of presented papers will be published. 
Topical categories include: Neuroscience; Theory; Implementation and Simulations; Algorithms and Architectures; Cognitive Science and AI; Visual Processing; Speech and Signal Processing; Control, Navigation, and Planning; Applications. The format of the workshop is informal. Beyond reporting on past research, the goal is to provide a forum for scientists actively working in the field to freely discuss current issues of concern and interest. Sessions will meet in the morning and in the afternoon of both days, with free time in between for the ongoing individual exchange or outdoor activities. Specific open and/or controversial issues are encouraged and preferred as workshop topics. The deadline for submission of abstracts and workshop proposals is May 17th, 1991. For further information concerning the conference contact Dr. Stephen J. Hanson; NIPS*91 Information; Siemens Research Center; 755 College road East; Princeton NJ, 08540 From english at sun1.cs.ttu.edu Thu Nov 7 16:29:00 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Thu, 7 Nov 91 15:29:00 CST Subject: Adding noise to training data Message-ID: <9111072129.AA11674@sun1.cs.ttu.edu> With regard to the relationship of Parzen estimation and training with noise added to samples, John Hampshire writes > ... but the connectionist model THEN goes on to try to model > this new PDF in its connections and ITS set of basis functions. > This is what seems desperate to me.... In a sense, it IS desperate. But an important problem for direct-form implementations of Parzen estimators (e.g., Specht 1990) is the storage requirements. Adding noise to the training samples and training by back-propagation may be interpreted as a time-expensive approach to obtaining a space-economical Parzen estimator. (I'm assuming that the net requires less memory than direct-form implementations). Of course, we don't know in advance if a given network is equipped to realize the Parzen estimator. I suspect that someone could produce a case in which the "universal approximator" architecture (Hornik, Stinchcombe, and White 1989) would achieve a reasonable approximation only by using more memory than the direct-form implementation. My thanks to John for (accidentally) posting some interesting comments. --Tom English Specht, D. 1990. Probabilistic neural networks and the polynomial adaline as complementary techniques for classification. IEEE Tran. Neural Networks 1 (1), pp. 111-21. Hornik, K., Stinchcombe, M., and White, H. 1989. Multilayer feedforward nets are universal approximators. Neural Networks 2, pp. 359-66. From patrick at magi.ncsl.nist.gov Thu Nov 7 16:38:15 1991 From: patrick at magi.ncsl.nist.gov (Patrick Grother) Date: Thu, 7 Nov 91 16:38:15 EST Subject: Scaled Conjugate Gradient Message-ID: <9111072138.AA22416@magi.ncsl.nist.gov> The factor of 30 speed up of conjugate gradient over backprop that I quoted in my piece of November 6 is due to an excellent scaled conjugate gradient algorithm from Martin Moeller. Some conjgrad algorithms have been criticised on the basis that a costly line search is performed per epoch. Moeller's method sidesteps this expense by means of an automatic Levenberg-Marquardt step size scaling at each iteration. This effectively regulates the indefiniteness of the Hessian matrix. 
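For readers unfamiliar with the Levenberg-Marquardt scaling mentioned above, the sketch below illustrates the basic damping idea on a toy quadratic error surface: a damping term lambda is added to the curvature, increased when a step fails and decreased when it succeeds, so that no line search is needed. This is only an illustration of the principle, not Moller's scaled conjugate gradient itself (which estimates curvature along the search direction with a finite difference and adjusts a scalar lambda at each iteration); the toy problem, sizes, and adaptation constants are assumptions.

  import numpy as np

  # Toy illustration of Levenberg-Marquardt damping (not Moller's SCG itself).
  # All sizes and constants below are assumptions made for the example.
  rng = np.random.default_rng(1)
  A = rng.normal(size=(5, 5))
  H = A @ A.T + 0.1 * np.eye(5)                 # toy, possibly ill-conditioned, curvature matrix
  b = rng.normal(size=5)
  E    = lambda w: 0.5 * w @ H @ w - b @ w      # quadratic stand-in for the network error
  grad = lambda w: H @ w - b

  w, lam = np.zeros(5), 1.0                     # lam is the damping / scaling term
  for step in range(50):
      g  = grad(w)
      dw = -np.linalg.solve(H + lam * np.eye(5), g)   # Newton-like for small lam,
                                                      # small scaled-gradient step for large lam
      if E(w + dw) < E(w):
          w, lam = w + dw, max(0.5 * lam, 1e-8)  # success: accept step, trust curvature more
      else:
          lam *= 2.0                             # failure: reject step, increase damping
  print("final error:", E(w))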
Patrick Grother Advanced Systems Division NIST November 7 From at neural.att.com Thu Nov 7 17:36:00 1991 From: at neural.att.com (@neural.att.com) Date: Thu, 07 Nov 91 17:36:00 -0500 Subject: Efficient parallel Backprop In-Reply-To: Your message of Wed, 06 Nov 91 13:37:22 -0800. Message-ID: <9111072236.AA03913@lamoon> The trick mentioned by Hal McCartor which consists in storing each weight twice (one copy in the processor that takes care of the presynaptic unit, and one copy in the processor that takes care of the postsynaptic unit) and update them both independently is probably one of the best techniques. It does require to make strong assumptions about the architecture of the network, and only costs you a factor of two in efficiency. It requires to broadcast the states, but in most cases there is a lot less states than weights. Unfortunately, it does not work so well in the case of shared-weight networks. I first heard about it from Leon Bottou (then at University of Paris-Orsay) in 1987. This trick was used in the L-NEURO backprop chip designed by Marc Duranton at the Philips Labs in Paris. -- Yann Le Cun From thierry at elen.utah.edu Thu Nov 7 18:23:09 1991 From: thierry at elen.utah.edu (Thierry Guillerm) Date: Thu, 7 Nov 91 16:23:09 MST Subject: Instantaneous and Average performance measures Message-ID: <9111072323.AA20758@perseus.elen.utah.edu> ABOUT INSTANTANEOUS AND AVERAGE PERFORMANCE MEASURE: A gradient descent learning based on an estimate of the performance measure U(w) (w = weights) can be represented as dw = -a0 grad( est[U(w)] ) dt where a0 is the step size, w represents the weights, t the time. The usual technique of moving the weights for each training sample can be represented as dw = - a0 grad( L(w,z) ) dt where z reprents the training sample, L(w,z) is the instantaneous performance measure. A good point about using an instantaneous performance measure L(w,z) in the gradient descent, (instead of waiting a few epochs for estimating U(w) and upgrade the weights) is that noise is inherently added to the process. Under some conditions (which ones?), the instantaneous learning can be rewritten as dw = - a0 grad( U(w) ) dt + b0 dx where x is a standard Brownian motion. This equation represents a diffusion process, which can be viewed as shaking the movement of the current weight point in the weight space. It is known that this process is a simulated annealing process. It is suspected that a minimum obtained this way will be better than with an averaging method. Has somebody done work on the quality of a solution obtained after a given time of running BackProp, or simulated annealing? Are there quantitative results about how long it takes to reach a given quality of solution? send email to: thierry at signus.utah.edu From alan_harget at pphub.aston.ac.uk Thu Nov 7 07:55:56 1991 From: alan_harget at pphub.aston.ac.uk (Alan Harget) Date: Thu, 7 Nov 91 12:55:56 GMT Subject: Vacancies Message-ID: Date 7/11/91 Subject Vacancies From Alan Harget To Connect Bulletin, Midlands KBS Subject: Time:12:52 am OFFICE MEMO Vacancies Date:7/11/91 ASTON UNIVERSITY BIRMINGHAM, ENGLAND DEPARTMENT OF COMPUTER SCIENCE AND APPLIED MATHEMATICS READERSHIPS/LECTURESHIPS IN COMPUTER SCIENCE Candidates are sought for the above posts to strengthen and expand the work of the Department. The key requirement is a proven ability to undertake high-quality research in Computer Science. Applicants for Readerships will have a substantial research record and have made a significant impact in their chosen area. 
Potential Lecturers will already have a publication record that demonstrates their research ability. Although areas of special interest are Neural Networks, Software Engineering, Database Techniques and Artificial Intelligence, excellent candidates in other fields will be considered. Anyone wishing to discuss the posts informally may call Professor David Bounds, Head of Department. Tel: 021-359-3611, ext 4285. Application forms and further particulars may be obtained from the Personnel Officer (Academic Staff), Aston University, Aston Triangle, Birmingham B4 7ET. Please quote appropriate Ref No: Readership (9139/EB); Lectureship (9140/EB). 24-hr answerphone: 021-359 -0870. Facsimile: 021-359-6470. Closing date for receipt of applications is 22 November 1991. From joe at cogsci.edinburgh.ac.uk Thu Nov 7 11:23:49 1991 From: joe at cogsci.edinburgh.ac.uk (Joe Levy) Date: Thu, 07 Nov 91 16:23:49 +0000 Subject: Research Post in Edinburgh UK Message-ID: <17169.9111071623@muriel.cogsci.ed.ac.uk> Research Assistant in Neural Networks University of Edinburgh Department of Psychology Applications are invited for a three year computational project studying the effects of damage on the performance of neural networks and other distributed systems, in order to elucidate the inferences that can be made about normal cognitive function from the patterns of breakdown in brain damaged patients. Previous experience in cognitive psychology and/or neural networks is desirable, preferably at the doctoral level. The post is funded by the Joint Councils Initiative in Cognitive Science/HCI awared to Dr Nick Chater. The closing date is 3 December, and the starting date will be between 1 January and 1 March 1992. A covering letter, CV and the names and addresses of 3 referees should be sent to University of Edinburgh, Personnel Department 1 Roxburgh Street, Edinburgh EH8 9TB. Informal enquiries to Nick Chater email: nicholas%cogsci.ed.ac.uk at nsfnet-relay.ac.uk From kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET Fri Nov 8 10:02:15 1991 From: kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET (Nitin Indurkhya) Date: Fri, 8 Nov 91 10:02:15 JST Subject: second derivatives Message-ID: <9111080102.AA17565@hcrlgw.crl.hitachi.co.jp> > Another idea is to calculate the matrix of second derivatives (grad(grad E)) as > well as the first derivatives (grad E) and from this information calculate the > (unique) parabolic surface in weight space that has the same derivatives. Then > the weights should be updated so as to jump to the center (minimum) of the > parabola. I haven't coded this idea yet, has anyone else looked at this kind > of thing, and if so what are the results? >-- Scott Fahlman I don't know about the exact same idea or of the method used by Le Cun, but Dr. Sholom Weiss of Rutgers University (weiss at cs.rutgers.edu) has developed an efficient method for calculating the second derivatives using monte-carlo methods. The second derivatives are then used within a stiff differential equation solver to optimize the weights by solving the BP differential eqns directly. the results on different datasets (e.g. peterson-barney's vowel dataset, robinson's vowel dataset) are superior to other results not only in terms of training, but also in terms of generalization. 
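Several messages in this thread (Fahlman's parabola idea, Quickprop's diagonal approximation, Le Cun's per-weight learning rates, and the approach described above) use second-derivative information to set the step taken for each weight. The fragment below is a generic sketch of the diagonal-curvature idea on a toy least-squares problem; it is none of those specific methods, and the data, names, and constants are assumptions made purely for illustration.

  import numpy as np

  # Generic diagonal (per-weight) curvature step on a toy linear least-squares problem.
  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 3))
  w_true = np.array([1.5, -2.0, 0.5])
  y = X @ w_true                                  # toy "network": a linear map

  w, mu = np.zeros(3), 1e-3                       # mu guards against tiny curvature
  for epoch in range(100):
      err = X @ w - y
      g = X.T @ err / len(X)                      # first derivatives of 0.5 * mean squared error
      h = (X ** 2).mean(axis=0)                   # diagonal second derivatives (exact here)
      w -= g / (np.abs(h) + mu)                   # per-weight pseudo-Newton step, cross terms ignored
  print("recovered weights:", np.round(w, 3))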
--nitin From uli at ira.uka.de Fri Nov 8 12:36:02 1991 From: uli at ira.uka.de (Uli Bodenhausen) Date: Fri, 08 Nov 91 18:36:02 +0100 Subject: post NIPS workshop on speech Message-ID: ------------------------------------------------------------------- Optimization of Neural Network Architectures for Speech Recognition ------------------------------------------------------------------- Dec. 7, 1991, Vail, Colorado Uli Bodenhausen, Universitaet Karlsruhe Alex Waibel, Carnegie Mellon University A variety of neural network algorithms have recently been applied to speech recognition tasks. Besides having learning algorithms for weights, optimization of the network architectures is required to achieve good performance. Also of critical importance is the optimization of neural network architectures within hybrid systems for best performance of the system as a whole. Parameters that have to be optimized within these constraints include the number of hidden units, number of hidden layers, time-delays, connectivity within the network, input windows, the number of network modules, number of states and others. The workshop intends to discuss and evaluate the importance of these architectural parameters and different integration strategies for speech recognition systems. Participants are welcome to present short case studies on the optimization of neural networks, preferably with an evaluation of the optimization steps. It would also be nice to hear about some rather unconventional techniques of optimization (as long as its not vodoo or the 'shake the disk during compilation' type of technique). The workshop could also be of interest to researchers working on constructive/destructive learning algorithms because the relevance of different architectural parameters should be considered for the design of these algorithms. The following speakers have already confirmed their participation: Kenichi Iso, NEC Corporation, Japan Patrich Haffner, CNET, France Mike Franzini, Telefonica I + D, Spain Allen Gorin, AT&T, USA Yoshua Bengio, MIT ----------------------------------------------------------------------- Further contributions are welcome. Please send mail to uli at ira.uka.de or uli at speech2.cs.cmu.edu. ------------------------------------------------------------------------ From white at teetot.acusd.edu Fri Nov 8 13:02:34 1991 From: white at teetot.acusd.edu (Ray White) Date: Fri, 8 Nov 91 10:02:34 -0800 Subject: No subject Message-ID: <9111081802.AA12659@teetot.acusd.edu> In reply to: > Manoel Fernando Tenorio > -------- > How is that different from Sanger's principle component algorithm? (NIPS,90). > --ft. > Pls send answer to the net. (Where "that" refers to my 'Competitive Hebbian Learning', to be published in Neural Networks, 1992, in response to Yoshio Yamamoto.) The Sanger paper that I think of in this connection is the 'Neural Networks ' paper, T. Sanger (1989) Optimal unsupervised learning..., Neural Networks, 2, 459-473. There is certainly some relation, in that each is a modification of Hebbian learning. And I would think that one could also apply Sanger's algorithm to Yoshio Yamamoto's problem - training hidden units to ignore input components which are uncorrelated with the desired output. As I understand it, Sanger's 'Generalized Hebbian learning' trains units to find successively, the principle components of the input, starting with the most important and working on down, depending on the number of units you use. 
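For concreteness, here is a minimal NumPy sketch of Sanger's rule as just described; the input dimension, number of units, learning rate, and data are assumptions chosen so that the result can be checked against a known covariance. See Sanger (1989) for the algorithm itself.

  import numpy as np

  # Sanger's rule ("Generalized Hebbian Algorithm"): units learn the leading principal
  # components of the input in order. Sizes and constants are illustrative assumptions.
  rng = np.random.default_rng(0)
  C = np.diag([5.0, 2.0, 1.0, 0.5])               # toy input covariance with known components
  X = rng.multivariate_normal(np.zeros(4), C, size=5000)

  n_units, lr = 2, 0.01
  W = rng.normal(scale=0.1, size=(n_units, 4))    # one output unit per row

  for x in X:
      y = W @ x
      # Sanger's rule: dW = lr * ( y x^T - LT[y y^T] W ), LT = lower-triangular part.
      W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

  print(np.round(W, 2))   # rows approach (up to sign) the leading eigenvectors e1, e2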
Competitive Hebbian Learning, on the other hand, is a simpler algorithm which trains units to learn simultaneously (approximately) orthogonal linear combinations of the components of the input. With this algorithm, one does not get the princple components nicely separated out, but one does get trained units of roughly equal importance. For those interested there is a shorter preliminary version of the paper in the Jordan Pollack's neuroprose archive, where it is called white.comp-hebb.ps.Z. Unfortunately that version does not include the Boolean application which Yoshio Yamamoto's query suggested. Ray White (white at teetot.acusd.edu) Depts. of Physics & Computer Science University of San Diego From shams at maxwell.hrl.hac.com Fri Nov 8 14:43:23 1991 From: shams at maxwell.hrl.hac.com (shams@maxwell.hrl.hac.com) Date: Fri, 8 Nov 91 11:43:23 PST Subject: "real-world" applications of neural nets Message-ID: <9111081943.AA27842@maxwell.hrl.hac.com> Dear Connectionists, We are looking for "real-world" applications of neural networks to be used as benchmarks for evaluating the performance of our neurocomputer architecture. In particular we are looking for applications using greater than 200 neurons using a structured interconnection network where the network is made up of several smaller components. The primary goal of our research is to demonstrate the effectiveness of our architecture in implementing neural networks without the assumption of full interconnectivity between the neurons. Any references to published work or a brief description of a specific network structure would greatly be appreciated. Soheil Shams shams at maxwell.hrl.hac.com From announce at park.bu.edu Wed Nov 6 09:58:03 1991 From: announce at park.bu.edu (announce@park.bu.edu) Date: Wed, 6 Nov 91 09:58:03 -0500 Subject: Courses and Conference on Neural Networks, May 1992, Boston University Message-ID: <9111061458.AA17643@fenway.bu.edu> BOSTON UNIVERSITY NEURAL NETWORK COURSES AND CONFERENCE COURSE 1: INTRODUCTION AND FOUNDATIONS May 9 - 12, 1992 A systematic introductory course on neural networks. COURSE 2: RESEARCH AND APPLICATIONS May 12 - 14, 1992 Eight tutorials on current research and applications. CONFERENCE: NEURAL NETWORKS FOR LEARNING, RECOGNITION, AND CONTROL MAY 14 - 16, 1992 An international research conference presenting INVITED and CONTRIBUTED papers. Sponsored by Boston University's Wang Institute, Center for Adaptive Systems, and Department of Cognitive and Neural Systems, with partial support from the Air Force Office of Scientific Research. NEURAL NETWORK COURSES May 9 - 14, 1992 This self-contained, systematic, five-day course is based on the graduate curriculum in the technology, computation, mathematics, and biology of neural networks developed at the Center for Adaptive Systems (CAS) and the Department of Cognitive and Neural Systems (CNS) at Boston University. This year's curriculum refines and updates the successful course held at the Wang Institute of Boston University in May 1990 and 1991. A new two-course format permits both beginners and researchers to participate. The course will be taught by CAS/CNS faculty, as well as by distinguished guest lecturers at the beautiful and superbly equipped campus of the Wang Institute. An extraordinary range and depth of models, methods, and applications will be presented. Interaction with the lecturers and other participants will continue at the daily discussion sessions, meals, receptions, and coffee breaks that are included with registration. 
At the 1990 and 1991 courses, participants came from many countries and from all parts of the United States. Course Faculty from Boston University are Stephen Grossberg, Gail Carpenter, Ennio Mingolla, and Daniel Bullock. Course guest lecturers are John Daugman, Federico Faggin, Michael I. Jordan, Eric Schwartz, Alex Waibel, and Allen Waxman. COURSE 1 SCHEDULE SATURDAY, MAY 9, 1992 4:00 - 6:00 P.M. Registration 5:00 - 7:00 P.M. Reception SUNDAY, MAY 10, 1992 Professor Grossberg: Historical Overview, Cooperation and Competition, Content Addressable Memory, and Associative Learning. Professors Carpenter and Mingolla: Neocognitron, Perceptrons, and Introduction to Back Propagation. Professor Grossberg and Mingolla: Adaptive Pattern Recognition. MONDAY, MAY 11, 1992 Professor Grossberg: Introduction to Adaptive Resonance Theory. Professor Carpenter: ART 1, ART 2, and ART 3. Professors Grossberg and Mingolla: Vision and Image Processing. Professors Bullock and Grossberg: Adaptive Sensory-Motor Control and Robotics TUESDAY, MAY 12, 1992 Professor Bullock and Grossberg: Adaptive Sensory-Motor Control and Robotics, continued. Professor Grossberg: Speech Perception and Production, Reinforcement Learning and Prediction. End of Course 1 (12:30 P.M.) COURSE 2 SCHEDULE TUESDAY, MAY 12, 1992 11:30 A.M. - 1:30 P.M. Registration Professor Carpenter: Fuzzy Artmap. Dr. Waxman: Learning 3-D Objects from Temporal Sequences. WEDNESDAY, MAY 13, 1992 Professor Jordan: Recent Developments in Supervised Learning. Professor Waibel: Speech Recognition and Understanding. Professor Grossberg: Vision, Space, and Action. Professor Daugman: Signal Processing in Neural Networks. THURSDAY, MAY 14, 1992 Professor Schwartz: Active Vision. Dr. Faggin: Practical Implementation of Neural Networks. End of Course 2 (12:30 P.M.) RESEARCH CONFERENCE NEURAL NETWORKS FOR LEARNING, RECOGNITION, AND CONTROL MAY 14-16, 1992 This international research conference on topics of fundamental importance in science and technology will bring together leading experts from universities, government, and industry to present their results on learning, recognition, and control, in invited lectures and contributed posters. Topics range from cognitive science and neurobiology through computational modeling to technological applications. CALL FOR PAPERS: A featured poster session on neural network research related to learning, recognition, and control will be held on May 15, 1992. Attendees who wish to present a poster should submit three copies of an abstract (one single-spaced page), postmarked by March 1, 1992, for refereeing. Include a cover letter giving the name, address, and telephone number of the corresponding author. Mail to: Poster Session, Neural Networks Conference, Wang Institute of Boston University, 72 Tyng Road, Tyngsboro, MA 01879. Authors will be informed of abstract acceptance by March 31, 1992. A book of lecture and poster abstracts will be given to attendees at the conference. CONFERENCE PROGRAM THURSDAY, MAY 14, 1992 2:00 P.M. - 5:00 P.M. Registration 3:00 P.M. - 5:00 P.M. Reception Professor Richard Shiffrin, Indiana University: "The Relationship between Composition/Distribution and Forgetting" Professor Roger Ratcliff, Northwestern University: "Evaluating Memory Models" Professor David Rumelhart, Stanford University: "Learning and Generalization in a Connectionist Network" FRIDAY, MAY 15, 1992 Dr. 
Mortimer Mishkin, National Institute of Mental Health: "Two Cerebral Memory Systems" Professor Larry Squire, University of California, San Diego: "Brain Systems and the Structure of Memory" Professor Stephen Grossberg, Boston University, "Neural Dynamics of Adaptively Timed Learning and Memory" Professor Theodore Berger, University of Pittsburgh: "A Biological Neural Model for Learning and Memory" Professor Mark Bear, Brown University: "Mechanisms for Experience- Dependent Modification of Visual Cortex" Professor Gail Carpenter, Boston University: "Supervised Learning by Adaptive Resonance Networks" Dr. Allen Waxman, MIT Lincoln Laboratory: "Neural Networks for Mobile Robot Visual Navigation and Conditioning" Dr. Thomas Caudell, Boeing Company: "The Industrial Application of Neural Networks to Information Retrieval and Object Recognition at the Boeing Company" POSTER SESSION (Three hours) SATURDAY, MAY 16, 1992 Professor George Cybenko, University of Illinois: "The Impact of Memory Technology on Neurocomputing" Professor Eduardo Sontag, Rutgers University: "Some Mathematical Results on Feedforward Nets: Recognition and Control" Professor Roger Brockett, Harvard University: "A General Framework for Learning via Steepest Descent" Professor Barry Peterson, Northwestern University Medical School: "Approaches to Modeling a Plastic Vestibulo-ocular Reflex" Professor Daniel Bullock, Boston University: "Spino-Cerebellar Cooperation for Skilled Movement Execution" Dr. James Albus, National Institute of Standards and Technology: "A System Architecture for Learning, Recognition, and Control" Professor Kumpati Narendra, Yale University: "Adaptive Control of Nonlinear Systems Using Neural Networks" Dr. Robert Pap, Accurate Automation Company: "Neural Network Control of the NASA Space Shuttle Robot Arm" Discussion End of Research Conference (5:30 P.M.) HOW TO REGISTER: ...To register by telephone, call (508) 649-9731 (x 255) with VISA or Mastercard between 8:00-5:00 PM (EST). ...To register by fax, complete and fax back the Registration Form to (508) 649-6926. ...To register by mail, complete the registration form below and mail it with your payment as directed. ON-SITE REGISTRATION: Those who wish to register for the courses and the research conference on-site may do so on a space-available basis. SITE: The Wang Institute of Boston University possesses excellent conference facilities in a beautiful 220-acre setting. It is easily reached from Boston's Logan Airport and Route 128. REGISTRATION FORM: NEURAL NETWORKS COURSES AND CONFERENCE MAY 9-16, 1992 Name: ______________________________________________________________ Title: _____________________________________________________________ Organization: ______________________________________________________ Address: ___________________________________________________________ City: ____________________________ State: __________ Zip: __________ Country: ___________________________________________________________ Telephone: _______________________ FAX: ____________________________ Regular Attendee Full-time Student Course 1 ( ) $650 N/A Course 2 ( ) $650 N/A Courses 1 and 2 ( ) $985 ( )$275* Conference ( ) $110 ( ) $75* * Limited number of spaces. Student registrations must be received by April 15, 1992. Total payment:______________________________________________________ Form of payment: ( ) Check or money order (payable in U.S. dollars to Boston University). 
( ) VISA ( ) Mastercard #_________________________________Exp.Date:__________________ Signature (as it appears on card): __________________________ Return to: Neural Networks Wang Institute of Boston University 72 Tyng Road Tyngsboro, MA 01879
YOUR REGISTRATION FEE INCLUDES:
COURSES: Lectures, Course notebooks, Evening discussion sessions, Saturday reception, Continental breakfasts, Lunches, Dinners, Coffee breaks.
CONFERENCE: Lectures, Poster session, Book of lecture & poster abstracts, Thursday reception, Two continental breakfasts, Two lunches, One dinner, Coffee breaks.
CANCELLATION POLICY: Course fee, less $100, and the research conference fee, less $60, will be refunded upon receipt of a written request postmarked before March 31, 1992. After this date no refund will be made. Registrants who do not attend and who do not cancel in writing before March 31, 1992 are liable for the full amount of the registration fee. You must obtain a cancellation number from the registrar in order to make the cancellation valid. HOTEL RESERVATIONS: ...Sheraton Tara, Nashua, NH (603) 888-9970, $60/night, plus tax (single or double). ...Best Western, Nashua, NH (603) 888-1200, $44/night, single, plus tax, $49/night, double, plus tax. ...Stonehedge Inn, Tyngsboro, MA, (508) 649-4342, $84/night, plus tax (single or double). The special conference rate applies only if you mention the name and dates of the meeting when making the reservation. The hotels in Nashua are located approximately five miles from the Wang Institute; shuttle bus service will be provided to them. AIRLINE DISCOUNTS: American Airlines is the official airline for Neural Networks. Receive 45% off full fare with at least seven days advance purchase or 5% off discount fares. A 35% discount applies on full fare flights from Canada with an advance purchase of at least seven days. Call American Airlines Meeting Services Desk at (800) 433-1790 and be sure to reference STAR#SO252AM. Some restrictions apply. STUDENT REGISTRATION: A limited number of spaces at the courses and conference have been reserved at a subsidized rate for full-time students. These spaces will be assigned on a first-come, first-served basis. Completed registration form and payment for students who wish to be considered for the reduced student rates must be received by April 15, 1992. From steve at cogsci.edinburgh.ac.uk Fri Nov 8 13:34:00 1991 From: steve at cogsci.edinburgh.ac.uk (Steve Finch) Date: Fri, 08 Nov 91 18:34:00 +0000 Subject: Announcement of paper on learning syntactic categories. Message-ID: <16311.9111081834@scott.cogsci.ed.ac.uk> I have submitted a copy of a paper Nick Chater and I have written to the neuroprose archive. It details a hybrid system comprising a statistically motivated network and a symbolic clustering mechanism which together automatically classify words into a syntactic hierarchy by imposing a similarity metric over the contexts in which they are observed to have occurred in USENET newsgroup articles. The resulting categories are very linguistically intuitive. The abstract follows: Symbolic and neural network architectures differ with respect to the representations they naturally handle. Typically, symbolic systems use trees, DAGs, lists and so on, whereas networks typically use high dimensional vector spaces. Network learning methods may therefore appear to be inappropriate in domains, such as natural language, which are naturally modelled using symbolic methods.
One reaction is to argue that network methods are able to {\it implicitly} capture this symbolic structure, thus obviating the need for explicit symbolic representation. However, we argue that the {\it explicit} representation of symbolic structure is an important goal, and can be learned using a hybrid approach, in which statistical structure extracted by a network is transformed into a symbolic representation. We apply this approach at several levels of linguistic structure, using as input unlabelled orthographic, phonological and word-level strings. We derive linguistically interesting categories such as `noun', `verb', `preposition', and so on from unlabeled text. To get it by anonymous ftp type ftp archive.cis.ohio-state.edu when asked for login name type anonymous; when asked for password type neuron. Then type cd pub/neuroprose binary get finch.hybrid.ps.Z quit Then uncompress it and lpr it. ------------------------------------------------------------------------------ Steven Finch Phone: +44 31 650 4435 | University of Edinburgh From hwang at pierce.ee.washington.edu Sun Nov 10 14:14:44 1991 From: hwang at pierce.ee.washington.edu ( J. N. Hwang) Date: Sun, 10 Nov 91 11:14:44 PST Subject: IJCNN'91 Singapore Advanced Program Message-ID: <9111101914.AA08065@pierce.ee.washington.edu.> For those of you who didn't receive the Advanced Program of IJCNN'91 Singapore. Here is a brief summary of the program. Jenq-Neng Hwang, Publicity/Technical Committee IJCNN'91 ---------------------------------------------------------- IJCNN'91 Singapore Preliminary Program Overview: Sunday, 11/17/91 4:00 pm -- 7:00 pm Registration Monday, 11/18/91 8:00 am -- 5:30 pm Registration 9:00 am -- 4:00 pm Tutorials 1) Weightless Neural Nets (NNs) 2) Neural Computation: From Brain Research to Novel Computers 3) Fuzzy Logic & Computational NNs 4) Neural Computing & Pattern Recognition 5) Morphology of Biological Vision 6) Cancelled 7) Successful NN Parallel Computing 8) A Logical Topology of NN Tuesday, 11/19/91 7:30 am -- 5:30 pm Registration 8:00 am -- 9:00 am Plenary Session (T. Kohonen) 9:15 am -- 12.15 pm Technical Sessions 1) Associative Memory (I) 2) Neurocognition (I) 3) Hybrid Systems (I) 4) Supervised Learning (I) 5) Applications (I) 6) Image Processing/Maths (Poster 1) 1:15 pm -- 3.15 pm Technical Sessions 1) Neurophysiology/Invertebrate 2) Sensation and Perception 3) Hybrid Systems (II) 4) Supervised Learning (II) 5) Applicatiions (II) 6) Supervised Learning (Poster 2) 3:30 pm -- 6:00 pm Technical Sessions 1) Electrical Neurocomputer (I) 2) Image Processing (I) 3) Hybrid Systems (III) 4) Supervised learning (III) 5) Applicatioins (III) 6:00 pm -- 7:30 pm Panel Discussion (G. Deboeck): Financial Applicatiions of NNs Wednesday, 11/20/91 7:00 am -- 5:30 pm Registration 8:00 am -- 9:00 am Plenary Session (Y. 
Nishikawa) 9:15 am -- 12:15 pm Technical Sessions 1) Optimization (I) 2) Image Processing (II) 3) Robotics (I) 4) Supervised Learning (IV) 5) Applications (IV) 6) Applications (Poster 3) 1:15 pm -- 3:15 pm Technical Sessions 1) Mathematical Methods (I) 2) Machine Vision 3) Sensorimotor Control Systems 4) Supervised Learning (V) 5) Applications (V) 6) Robotics (Poster 4) 3:30 pm -- 6:00 pm Technical Sessions 1) Neurocomputer/Associative Memory 2) Neurocognition (II) 3) Unsupervised Learning (II) 4) Supervised Learning (VI) 5) Applications (VI) 5:00 pm -- 6:30 pm Industrial Panel (Tom Caudell) 7:00 pm -- 10:00 pm IJCNN'91 Banquet Thursday, 11/21/91 7:30 am -- 12:15 pm Registration 8:00 am -- 9:00 am Plenary Session (K. S. Narendra) 9:15 am -- 12:15 pm Technical Sessions 1) Electrical Neurocomputer (II) 2) Neuro-Dynamics (I) 3) Robotics (II) 4) Supervised Learning (VII) 5) Applications (VII) 6) Neurocomputers (Poster 5) 1:15 pm -- 3:15 pm Technical Sessions 1) Associative Memory 2) Mathematical Methods (II) 3) Neuro-Dynamics (II) 4) Supervised/Unsupervised Learning 5) Applications (VIII) 6) Optimization/Associative Memory (Poster 6) 3:30 pm -- 6:00 pm Technical Sessions 1) Optimization (II) 2) Machine Vision (II) 3) Mathematical Methods (III) 4) Unsupervised Learning (III) 5) Mathematical Methods/Supervised Learning Welcome Reception: All speakers, authors, delegates, including students, and one-day registrants are invited to the "Welcome Reception" on the 18th November. Full details will be included in the Final Program.
Conference Registration (Members / Non-Members / Students, no Proc.):
Before 8/31/91: US$240 / US$280 / US$100
After 8/31/91: US$280 / US$330 / US$120
On Site: US$330 / US$380 / US$140
Tutorial Registration: Only for Registered Conference Participants. Registration Fee Per Tutorial: US$120 Pre-Register, US$140 On Site, US$30 Students. Conference Proceedings: Additional copies of the proceedings are available at the Conference at US$100.00 per set. Rates do not include postage and handling charges. Travel Information: Please Contact TRADEWINDS PTE LTD 77 Robinson Road, #02-06 SIA Building Singapore 0106 TEL: (65) 322-6845, FAX: (65) 224-1198 Attn: Ms Julie Gan Banquet Information: 7:00 pm, 20th November 1991 Westin Stamford/Westin Plaza An excellent 9-course Chinese Dinner will be served. Additional ticket: US$42.00 from IJCNN'91 Secretariat. From carlos at dove.caltech.edu Sun Nov 10 18:10:40 1991 From: carlos at dove.caltech.edu (Carlos Brody-Pellicer) Date: Sun, 10 Nov 91 15:10:40 PST Subject: Roommate for NIPS Message-ID: <9111102310.AA07033@dove.caltech.edu> Would anybody out there like to share a room for NIPS? I'm going to arrive on Saturday and leave on Thursday (but I would expect us to share costs Sun-Thurs only). Please let me know if you are interested. -Carlos (carlos at dove.caltech.edu) From pollack at cis.ohio-state.edu Mon Nov 11 16:25:52 1991 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Mon, 11 Nov 91 16:25:52 -0500 Subject: POSTNIPS cognitive science workshop Message-ID: <9111112125.AA11761@dendrite.cis.ohio-state.edu> A neuro-engineering friend claimed that cognitive scientists feared "steep gradient descents," and THAT'S why they didn't come to the post-NIPS workshops! I countered that there were just no topics which made strong enough attractors...
------------------------------------------------------ Modularity in Connectionist Models of Cognition Friday November 6th, Vectorized AI Laboratory, Colorado ------------------------------------------------------ Organizer: Jordan Pollack Confirmed Speakers: Michael Mozer Robert Jacobs John Barnden Rik Belew (There is room for a few more people to have confirmed 15 minute slots, but half of the workshop is reserved for open discussion.) ABSTRACT: Classical modular theories of mind presume mental "organs" - function specific, put in place by evolution - which communicate in a symbolic language of thought. In the 1980's, Connectionists radically rejected this view in favor of more integrated architectures, uniform learning systems which would be very tightly coupled and communicate through many feedforward and feedback connections. However, as connectionist attempts at cognitive modeling have gotten more ambitious, ad-hoc modular structuring has become more prevalent. But there are concerns regarding how much architectural bias is allowable. There has been a flurry of work on resolving these concerns by seeking the principles by which modularity could arise in connectionist architectures. This will involve solving several major problems - data decomposition, structural credit assignment, and shared adaptive representations. This workshop will bring together proponents and opponents of Modular Connectionist Architectures to discuss research direction, recent progress, and long-term challenges. ------------------------ Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Phone: (614)292-4890 (then * to fax) From gary at cs.UCSD.EDU Mon Nov 11 13:52:50 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Mon, 11 Nov 91 10:52:50 PST Subject: principal components Message-ID: <9111111852.AA26216@desi.ucsd.edu> In reply to: >Date: Fri, 8 Nov 91 10:02:34 -0800 >From: Ray White >To: Connectionists at CS.CMU.EDU > >(Where "that" refers to my 'Competitive Hebbian Learning', to be published >in Neural Networks, 1992, in response to Yoshio Yamamoto.) > >The Sanger paper that I think of in this connection is the 'Neural Networks ' >paper, T. Sanger (1989) Optimal unsupervised learning..., Neural Networks, 2, >459-473. >As I understand it, Sanger's 'Generalized Hebbian learning' trains units >to find successively, the principle components of the input, starting with >the most important and working on down, depending on the number of units >you use. > >Competitive Hebbian Learning, on the other hand, is a >simpler algorithm which trains units to learn simultaneously (approximately) >orthogonal linear combinations of the components of the input. With this >algorithm, one does not get the princple components nicely separated out, >but one does get trained units of roughly equal importance. > Ray White (white at teetot.acusd.edu) > Depts. of Physics & Computer Science > University of San Diego > Back prop when used with linear nets, does just this also. Since the optimal technique is PCA in the linear case with a quadratic cost function, bp is just a way of directly performing this and is not an improvement over Karhunen-Loeve (except, perhaps in being space efficient). More recently, Mathilde Mougeot has used the fact that bp is doing PCA to discover a fast algorithm for the quadratic case, and she also has shown that bp can be effectively used for other norms. 
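(For readers who want to see this relation concretely, here is a minimal NumPy sketch -- an illustration only, not the procedure of the papers cited below. A linear "autoencoder" trained by plain gradient descent on squared reconstruction error ends up spanning the same subspace as the leading principal components, while Sanger's Generalized Hebbian Algorithm recovers the ordered components themselves.)

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 10))
X -= X.mean(axis=0)                                  # centred 10-d data

_, _, Vt = np.linalg.svd(X, full_matrices=False)
PC = Vt[:2]                                          # top two principal directions

# (1) Linear "autoencoder": minimise ||X - X W W'||^2 by gradient descent.
W = 0.01 * rng.standard_normal((10, 2))
lr = 0.1 / np.linalg.norm(X.T @ X, 2)
for _ in range(4000):
    E = X - X @ W @ W.T                              # reconstruction error
    W += lr * 2.0 * (X.T @ E @ W + E.T @ X @ W)      # negative gradient step

Q, _ = np.linalg.qr(W)                               # orthonormal basis of learned subspace
cosines = np.clip(np.linalg.svd(PC @ Q)[1], -1, 1)   # cosines of principal angles
print("principal angles (deg):", np.round(np.degrees(np.arccos(cosines)), 2))
# the angles should come out near zero: same subspace as the top two PCs

# (2) Sanger's Generalized Hebbian Algorithm: recovers the ordered components.
Wg = 0.01 * rng.standard_normal((2, 10))
for _ in range(30):                                  # epochs over the data
    for x in X:
        y = Wg @ x
        Wg += 1e-3 * (np.outer(y, x) - np.tril(np.outer(y, y)) @ Wg)
Wn = Wg / np.linalg.norm(Wg, axis=1, keepdims=True)
print("|cos(GHA unit, true PC)|:", np.round(np.abs(np.sum(Wn * PC, axis=1)), 3))
# both values should be close to 1 (up to sign)

The first half illustrates the Baldi & Hornik result referenced below (the bottleneck weights span the principal subspace but need not be the eigenvectors themselves); the second half shows why the upper-triangular term in Sanger's rule is what pins down the individual, ordered components.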
References: Baldi & Hornik, (1988) Neural Networks and Principal Components Analysis: Learning from examples without local minima. Neural Networks, Vol 2, No 1. Cottrell, G.W. and Munro, P. (1988) Principal components analysis of images via back propagation. Invited paper in \fIProceedings of the Society of Photo-Optical Instrumentation Engineers\fP, Cambridge, MA. Mougeot, M., Azencott, R. & Angeniol, B. (1991) Image compression with back propagation: Improvement of the visual restoration using different cost functions. Neural Networks vol 4. number 4 pp 467-476. onward and upward, Gary Cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering 0114 University of California San Diego La Jolla, Ca. 92093 gary at cs.ucsd.edu (INTERNET) gcottrell at ucsd.edu (BITNET, almost anything) From rob at tlon.mit.edu Mon Nov 11 17:04:13 1991 From: rob at tlon.mit.edu (Rob Sanner) Date: Mon, 11 Nov 91 17:04:13 EST Subject: MIT NSL Reports on Adaptive Neurocontrol Message-ID: The following are the titles and abstracts of three reports we have uploaded to the neuroprose archive. Due to a large number of recent requests for hardcopy reprints, these reports have now been made available electronically. They can also be obtained (under their NSL reference number) by anonymous ftp at tlon.mit.edu in the pub directory. These reports describe the results of research conducted at the MIT Nonlinear Systems Laboratory during the past year into algorithms for the stable adaptive tracking control of nonlinear systems using gaussian radial basis function networks. These papers are potentially interesting to researchers in both adaptive control and neural network theory. The research described starts by quantifying the relation between the network size and weights and the degree of uniform approximation accuracy a trained network can guarantee. On this basis, it develops a _constructive_ procedure for networks which ensures the required accuracy. These constructions are then exploited for the design of stable adaptive controllers for nonlinear systems. Any comments would be greatly appreciated and can be sent to either rob at tlon.mit.edu or jjs at athena.mit.edu. Robert M. Sanner and Jean-Jacques E. Slotine ------------------------------------------------------------------------------ on neuroprose: sanner.adcontrol_9103.ps.Z (NSL-910303, March 1991) Also appears: Proc. American Control Conference, June 1991. Direct Adaptive Control Using Gaussian Networks Robert M. Sanner and Jean-Jacques E. Slotine Abstract: A direct adaptive tracking control architecture is proposed and evaluated for a class of continuous-time nonlinear dynamic systems for which an explicit linear parameterization of the uncertainty in the dynamics is either unknown or impossible. The architecture employs a network of gaussian radial basis functions to adaptively compensate for the plant nonlinearities. Under mild assumptions about the degree of smoothness exhibited by the nonlinear functions, the algorithm is proven to be stable, with tracking errors converging to a neighborhood of zero. A constructive procedure is detailed, which directly translates the assumed smoothness properties of the nonlinearities involved into a specification of the network required to represent the plant to a chosen degree of accuracy. A stable weight adjustment mechanism is then determined using Lyapunov theory. 
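(To give a feel for the kind of construction being described -- this is only a toy sketch assuming NumPy, not the procedure of the report -- one can place Gaussian radial basis functions on a regular grid over a compact set, tie their width to the grid spacing, and fit the output weights; refining the grid drives down the uniform approximation error for a smooth target.)

import numpy as np

f = lambda x: np.sin(3 * x) + 0.5 * x**2          # stand-in for a smooth plant nonlinearity
xs = np.linspace(-1.0, 1.0, 400)                  # the compact set of interest

for m in (9, 17, 33):                             # successively finer grids of centres
    centers = np.linspace(-1.0, 1.0, m)
    sigma = centers[1] - centers[0]               # width tied to the grid spacing
    Phi = np.exp(-0.5 * ((xs[:, None] - centers) / sigma) ** 2)
    w, *_ = np.linalg.lstsq(Phi, f(xs), rcond=None)   # least-squares output weights
    print("%2d centres: max |f - net| on [-1,1] = %.1e"
          % (m, np.max(np.abs(Phi @ w - f(xs)))))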
The network construction and performance of the resulting controller are illustrated through simulations with an example system. ----------------------------------------------------------------------------- on neuroprose: sanner.adcontrol_9105.ps.Z (NSL-910503, May 1991) Gaussian Networks for Direct Adaptive Control Robert M. Sanner and Jean-Jacques E. Slotine Abstract: This report is a complete and formal exploration of the ideas originally presented in NSL-910303; as such it contains most of NSL-910303 as a subset. We detail a constructive procedure for a class of neural networks which can approximate to a prescribed accuracy the functions required for satisfaction of the control objectives. Since this approximation can be maintained only over a finite subset of the plant state space, to ensure global stability it is necessary to introduce an additional component into the control law, which is capable of stabilizing the dynamics as the neural approximation degrades. To unify these components into a single control law, we propose a novel technique of smoothly blending the two modes to provide a continuous transition from adaptive operation in the region of validity of the network approximation, to a nonadaptive operation in the regions where this approximation is inaccurate. Stable adaptation mechanisms are then developed using Lyapunov stability theory. Section 2 describes the setting of the control problem to be examined and illustrates the structure of conventional adaptive methods for its solution. Section 3 introduces the use of multivariable Fourier analysis and sampling theory as a method of translating assumed smoothness properties of the plant nonlinearities into a representation capable of uniformly approximating the plant over a compact set. This section then discusses the conditions under which these representations can be mapped onto a neural network with a finite number of components. Section 4 illustrates how these networks may be used as elements of an adaptive tracking control algorithm for a class of nonlinear systems, which will guarantee convergence of the tracking errors to a neighborhood of zero. Section 5 illustrates the method with two examples, and finally, Section 6 closes with some general observations about the proposed controller. ----------------------------------------------------------------------------- on neuroprose: sanner.adcontrol_9109.ps.Z (NSL-910901, Sept. 1991) To appear: IEEE Conf. on Decision and Control, Dec. 1991. Stable Adaptive Control and Recursive Identification Using Radial Gaussian Networks Robert M. Sanner and Jean-Jacques E. Slotine Abstract: Previous work has provided the theoretical foundations of a constructive design procedure for uniform approximation of smooth functions to a chosen degree of accuracy using networks of gaussian radial basis functions. This construction and the guaranteed uniform bounds were then shown to provide the basis for stable adaptive neurocontrol algorithms for a class of nonlinear plants. This paper details and extends these ideas in three directions: first some practical details of the construction are provided, explicitly illustrating the relation between the free parameters in the network design and the degree of approximation error on a particular set. Next, the original adaptive control algorithm is modified to permit incorporation of additional prior knowledge of the system dynamics, allowing the neurocontroller to operate in parallel with conventional fixed or adaptive controllers. 
Finally, it is shown how the gaussian network construction may also be utilized in recursive identification algorithms with similar guarantees of stability and convergence. The identification algorithm is evaluated on a chaotic time series and demonstrates the predicted convergence properties. From owens at eplrx7.es.duPont.com Tue Nov 12 12:03:25 1991 From: owens at eplrx7.es.duPont.com (Aaron Owens) Date: Tue, 12 Nov 1991 17:03:25 GMT Subject: second derivatives and the back propagation network References: <1991Nov10.133414.11341@eplrx7.es.duPont.com> Message-ID: <1991Nov12.170325.28731@eplrx7.es.duPont.com> RE: Second Derivatives and Stiff ODEs for Back Prop Training Several threads in this newsgroup recently have mentioned the use of second derivative information (i.e., the Hessian or Jacobian matrix) and/or stiff ordinary differential equations [ODEs] in the training of the back propagation network [BPN]. [-- Aside: Stiff differential equation solvers derive their speed and accuracy by specifically utilizing the information contained in the second-derivative Jacobian matrix. -- ] This is to confirm our experience that training the BPN using second-derivative methods in general, and stiff ODE solvers in particular, is extremely fast and efficient for problems which are small enough (i.e., up to about 1000 connection weights) to allow the Jacobian matrix [size = (number of weights)**2] to be stored in the computer's real memory. "Stiff" backprop is particularly well-suited to real-valued function mappings in which a high degree of accuracy is required. We have been using this method successfully in most of our production applications for several years. See the abstracts below of a paper presented at the 1989 IJCNN in Washington and of a recently-issued U. S. patent. It is possible -- and desirable -- to use the back error propagation methodology (i.e., the chain rule of calculus) to explicitly compute the second derivative of the sum_of_squared_prediction_error with respect to the weights (i.e., the Jacobian matrix) analytically. Using an analytic Jacobian, rather than computing the second derivatives numerically [or -- an UNVERIFIED personal hypothesis -- stochastically], increases the algorithm's speed and accuracy significantly. -- Aaron -- Aaron J. Owens Du Pont Neural Network Technology Center P. O. B. 80357 Wilmington, DE 19880-0357 Telephone Numbers: Office (302) 695-7341 (Phone & FAX) Home " 738-5413 Internet: owens at esvax.dnet.dupont.com ---------- IJCNN '89 paper abstract ------------ EFFICIENT TRAINING OF THE BACK PROPAGATION NETWORK BY SOLVING A SYSTEM OF STIFF ORDINARY DIFFERENTIAL EQUATIONS A. J. Owens and D. L. Filkin Central Research and Development Department P. O. Box 80320 E. I. du Pont de Nemours and Company (Inc.) Wilmington, DE 19880-0320 International Joint Conference on Neural Networks June 19-22, 1989, Washington, DC Volume II, pp. 381-386 Abstract. The training of back propagation networks involves adjusting the weights between the computing nodes in the artificial neural network to minimize the errors between the network's predictions and the known outputs in the training set. This least-squares minimization problem is conventionally solved by an iterative fixed-step technique, using gradient descent, which occasionally exhibits instabilities and converges slowly. We show that the training of the back propagation network can be expressed as a problem of solving coupled ordinary differential equations for the weights as a (continuous) function of time.
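(As a rough illustration of this reformulation -- not the Du Pont implementation -- the weight trajectory dw/dt = -dE/dw can be handed to an off-the-shelf stiff integrator. The sketch below assumes NumPy/SciPy and uses SciPy's BDF method on a toy 2-2-1 network for XOR.)

import numpy as np
from scipy.integrate import solve_ivp

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 0.])                    # XOR targets

def unpack(w):
    return w[:4].reshape(2, 2), w[4:6], w[6:8], w[8]

def grad(w):                                       # dE/dw by ordinary backpropagation
    W1, b1, W2, b2 = unpack(w)
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))       # hidden activations
    y = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))       # outputs
    dy = (y - T) * y * (1.0 - y)                   # output-layer deltas
    dh = (dy[:, None] * W2) * h * (1.0 - h)        # hidden-layer deltas
    return np.concatenate([(X.T @ dh).ravel(), dh.sum(0), h.T @ dy, [dy.sum()]])

# Treat training as the initial-value problem dw/dt = -dE/dw and let the stiff
# (BDF) integrator choose its own step sizes.
rng = np.random.default_rng(1)
sol = solve_ivp(lambda t, w: -grad(w), (0.0, 1e4), rng.normal(0, 0.5, 9), method="BDF")

W1, b1, W2, b2 = unpack(sol.y[:, -1])
h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
y = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
print("steady-state outputs:", np.round(y, 3))     # near a (possibly local) least-squares minimum

In this sketch BDF estimates the Jacobian of the gradient by finite differences; supplying it analytically, as advocated above, is what makes the approach efficient on larger problems.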
These differential equations are usually mathematically stiff. The use of a stiff differential equation solver ensures quick convergence to the nearest least-squares minimum. Training proceeds at a rapidly accelerating rate as the accuracy of the predictions increases, in contrast with gradient descent and conjugate gradient methods. The number of presentations required for accurate training is reduced by up to several orders of magnitude over the conventional method. ---------- U. S. Patent No. 5,046,020 abstract ---------- DISTRIBUTED PARALLEL PROCESSING NETWORK WHEREIN THE CONNECTION WEIGHTS ARE GENERATED USING STIFF DIFFERENTIAL EQUATIONS Inventor: David L. Filkin Assignee: E. I. du Pont de Nemours and Company U. S. Patent Number 5,046,020 Sep. 3, 1991 Abstract. A parallel distributed processing network of the back propagation type is disclosed in which the weights of connection between processing elements in the various layers of the network are determined in accordance with the set of steady solutions of the stiff differential equations governing the relationship between the layers of the network. From tds at ai.mit.edu Tue Nov 12 17:54:54 1991 From: tds at ai.mit.edu (Terence D. Sanger) Date: Tue, 12 Nov 91 17:54:54 EST Subject: Algorithms for Principal Components Analysis In-Reply-To: Ray White's message of Fri, 8 Nov 91 10:02:34 -0800 <9111081802.AA12659@teetot.acusd.edu> Message-ID: <9111122254.AA11404@cauda-equina> Ray, Over the past few years there has been a great deal of interest in recursive algorithms for finding eigenvectors or linear combinations of them. Many of these algorithms are based on the Oja rule (1982) with modifications to find more than a single output. As might be expected, so many people working on a single type of algorithm has led to a certain amount of duplication of effort. Following is a list of the papers I know about, which I'm sure is incomplete. Anyone else working on this topic should feel free to add to this list! Cheers, Terry Sanger @article{sang89a, author="Terence David Sanger", title="Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Neural Network", year=1989, journal="Neural Networks", volume=2, pages="459--473"} @incollection{sang89c, author="Terence David Sanger", title="An Optimality Principle for Unsupervised Learning", year=1989, pages="11--19", booktitle="Advances in Neural Information Processing Systems 1", editor="David S. Touretzky", publisher="Morgan Kaufmann", address="San Mateo, {CA}", note="Proc. {NIPS'88}, Denver"} @article{sang89d, author="Terence David Sanger", title="Analysis of the Two-Dimensional Receptive Fields Learned by the Generalized {Hebbian} Algorithm in Response to Random Input", year=1990, journal="Biological Cybernetics", volume=63, pages="221--228"} @misc{sang90c, author="Terence D. Sanger", title="Optimal Hidden Units for Two-layer Nonlinear Feedforward Neural Networks", year=1991, note="{\it Int. J. Pattern Recognition and AI}, in press"} @inproceedings{broc89, author="Roger W. Brockett", title="Dynamical Systems that Sort Lists, Diagonalize Matrices, and Solve Linear Programming Problems", booktitle="Proc. 1988 {IEEE} Conference on Decision and Control", publisher="{IEEE}", address="New York", pages="799--803", year=1988} @ARTICLE{rubn90, AUTHOR = {J. Rubner and K. Schulten}, TITLE = {Development of Feature Detectors by Self-Organization}, JOURNAL = {Biol. Cybern.}, YEAR = {1990}, VOLUME = {62}, PAGES = {193--199} } @INCOLLECTION{krog90, AUTHOR = {Anders Krogh and John A. 
Hertz}, TITLE = {Hebbian Learning of Principal Components}, BOOKTITLE = {Parallel Processing in Neural Systems and Computers}, PUBLISHER = {Elsevier Science Publishers B.V.}, YEAR = {1990}, EDITOR = {R. Eckmiller and G. Hartmann and G. Hauske}, PAGES = {183--186}, ADDRESS = {North-Holland} } @INPROCEEDINGS{fold89, AUTHOR = {Peter Foldiak}, TITLE = {Adaptive Network for Optimal Linear Feature Extraction}, BOOKTITLE = {Proc. {IJCNN}}, YEAR = {1989}, PAGES = {401--406}, ORGANIZATION = {{IEEE/INNS}}, ADDRESS = {Washington, D.C.}, MONTH = {June} } @MISC{kung90, AUTHOR = {S. Y. Kung}, TITLE = {Neural networks for Extracting Constrained Principal Components}, YEAR = {1990}, NOTE = {submitted to {\it IEEE Trans. Neural Networks}} } @article{oja85, author="Erkki Oja and Juha Karhunen", title="On Stochastic Approximation of the Eigenvectors and Eigenvalues of the Expectation of a Random Matrix", journal="J. Math. Analysis and Appl.", volume=106, pages="69--84", year=1985} @book{oja83, author="Erkki Oja", title="Subspace Methods of Pattern Recognition", publisher="Research Studies Press", address="Letchworth, Hertfordshire UK", year=1983} @inproceedings{karh84b, author="Juha Karhunen", title="Adaptive Algorithms for Estimating Eigenvectors of Correlation Type Matrices", booktitle="{Proc. 1984 {IEEE} Int. Conf. on Acoustics, Speech, and Signal Processing}", publisher="{IEEE} Press", address="Piscataway, {NJ}", year=1984, pages="14.6.1--14.6.4"} @inproceedings{karh82, author="Juha Karhunen and Erkki Oja", title="New Methods for Stochastic Approximation of Truncated {Karhunen-Lo\`{e}ve} Expansions", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="{Springer}-{Verlag}", address="{NY}", month="October", pages="550--553"} @inproceedings{oja80, author="Erkki Oja and Juha Karhunen", title="Recursive Construction of {Karhunen-Lo\`{e}ve} Expansions for Pattern Recognition Purposes", booktitle="{Proc. 5th Int. Conf. on Pattern Recognition}", publisher="Springer-{Verlag}", address="{NY}", year=1980, month="December", pages="1215--1218"} @inproceedings{kuus82, author="Maija Kuusela and Erkki Oja", title="The Averaged Learning Subspace Method for Spectral Pattern Recognition", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="Springer-{Verlag}", address="{NY}", month="October", pages="134--137"} @phdthesis{karh84, author="Juha Karhunen", title="Recursive Estimation of Eigenvectors of Correlation Type Matrices for Signal Processing Applications", school="Helsinki Univ. Tech.", year=1984, address="Espoo, Finland"} @techreport{karh85, author="Juha Karhunen", title="Simple Gradient Type Algorithms for Data-Adaptive Eigenvector Estimation", institution="Helsinki Univ. Tech.", year=1985, number="TKK-F-A584"} @misc{ogaw86, author = "Hidemitsu Ogawa and Erkki Oja", title = "Can we Solve the Continuous Karhunen-Loeve Eigenproblem from Discrete Data?", note = "Proc. {IEEE} Eighth International Conference on Pattern Recognition, Paris", year = "1986"} @article{leen91, author = "Todd K Leen", title = "Dynamics of learning in linear feature-discovery networks", journal = "Network", volume = 2, year = "1991", pages = "85--105"} @incollection{silv91, author = "Fernando M. Silva and Luis B. Almeida", title = "A Distributed Decorrelation Algorithm", booktitle = "Neural Networks, Advances and Applications", editor = "Erol Gelenbe", publisher = "North-Holland", year = "1991", note = "to appear"} From gluck at pavlov.Rutgers.EDU Wed Nov 13 09:29:58 1991 From: gluck at pavlov.Rutgers.EDU (Mark Gluck) Date: Wed, 13 Nov 91 09:29:58 EST Subject: Adding noise to training -- A psychological perspective (Preprint) Message-ID: <9111131429.AA02765@pavlov.rutgers.edu> In a recent paper we have discussed the role of stochastic noise in training data for adaptive network models of human classification learning. We have shown how the incorporation of such noise (usually modelled as a stochastic sampling process on the external stimuli) improves generalization performance, especially with deterministic discriminations which underconstrain the set of possible solution-weights. The addition of noise to the training biases the network to find solutions (and generalizations) which more closely correspond to the behavior of humans in psychological experiments. The reference is: Gluck, M. A. (1991,in press). Stimulus sampling and distributed representations in adaptive network theories of learning. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), Festschrift for W. K. Estes. New Jersey: Lawrence Erlbaum Associates. Copies can be received by emailing to: ______________________________________________________________________ Dr. Mark A. Gluck Center for Molecular & Behavioral Neuroscience Rutgers University 197 University Ave. Newark, New Jersey 07102 Phone: (201) 648-1080 (Ext. 3221) Fax: (201) 648-1272 Email: gluck at pavlov.rutgers.edu From jb at s1.gov Wed Nov 13 12:27:42 1991 From: jb at s1.gov (jb@s1.gov) Date: Wed, 13 Nov 91 09:27:42 PST Subject: Research Positions in NN/Speech Rec. Message-ID: <9111131727.AA03952@havana.s1.gov> A friend of mine asked me to post the following announcement of research positions.
Please do not reply to this posting since I have nothing to do with the selection process, Joachim Buhmann ----------------------------- cut here ---------------------------------------- * Please post *** Please post *** Please post *** Please post *** Please post * --------------------------------------------------------------------------- POSITIONS IN NEURAL NETWORKS / SPEECH RECOGNITION AVAILABLE --------------------------------------------------------------------------- Technical University of Munich, Germany Physics Department, T35 The theoretical biopysics group of Dr. Paul Tavan offers research positions in the field of neural networks and speech recognition. The positions are funded by the German Federal Department of Research and Technology (BMFT) for a period of at least three years, starting in January 1992. Salaries are paid according to the tariffs for federal employees. Position offered include a postdoctoral fellow (BAT Ib) and positions for graduate students (BAT IIa/2). The latter includes the opportunity of pursuing a Ph.D. in physics. The project is part of a larger project aiming towards the development of neural algorithms and architectures for the transformation of continuous speech into symbolic code. It will be pursued in close cooperation with other german research groups working on different aspects of the problem. Within this cooperation our group is responsible for the definition and extraction of appropriate features and their optimal representation based on selforganizing algorithms and methods of statistical pattern recognition. Higher-level processing, e.g., the incorporation of semantic knowledge is *not* the central issue of our subproject. The postdoctoral position involves coordination and organization of the research projects within the group and the overall project. Applicants for this position therefore should have broad experience in the areas of neural networks and/or speech recognition. Applicants for the other positions should hold a masters degree in physics. Experience in the field of neural networks/pattern recognition would be valuable but is not necessarly required. Profound background in mathematics however is. All applicants should have programming experience in a workstation environment and should most of all be able to perform project oriented research in close teamwork with other group members. University policy requires all applicants to spend a limited amount of time on teaching activities at the physics-department of the TUM besides their research. Applications should be sent before December 1st, 1991 and include - a curriculum vitae, - sample publications, technical reports, thesis reprints etc. and - a brief description of the main fields of interest in the research of the applicant. Please direct application material or requests for further information to: Hans Kuehnel Physik-Department, T35 Ehem. Bauamt Boltzmannstr. W-8046 Garching GERMANY Phone: +49-89-3209-3766 Email: kuehnel at physik.tu-muenchen.de From BATTITI at ITNVAX.CINECA.IT Thu Nov 14 10:01:00 1991 From: BATTITI at ITNVAX.CINECA.IT (BATTITI@ITNVAX.CINECA.IT) Date: Thu, 14 NOV 91 10:01 N Subject: paper on 2nd order methods in Neuroprose Message-ID: <2274@ITNVAX.CINECA.IT> A new paper is available from the Neuroprose directory. 
FILE: battiti.second.ps.Z (ftp binary, uncompress, lpr (PostScript)) TITLE: "First and Second-Order Methods for Learning: between Steepest Descent and Newton's Method" AUTHOR: Roberto Battiti ABSTRACT: On-line first order backpropagation is sufficiently fast and effective for many large-scale classification problems but for very high precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feed-forward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations. PS: the paper will be published in Neural Computation. PPSS: comments and/or new results welcome. ====================================================================== | | | | Roberto Battiti | e-mail: battiti at itnvax.cineca.it | | Dipartimento di Matematica | tel: (+39) - 461 - 88 - 1639 | | 38050 Povo (Trento) - ITALY | fax: (+39) - 461 - 88 - 1624 | | | | ====================================================================== From fcr at neura.inesc.pt Thu Nov 14 05:15:47 1991 From: fcr at neura.inesc.pt (Fernando Corte Real) Date: Thu, 14 Nov 91 10:15:47 GMT Subject: Algorithms for Principal Components Analysis In-Reply-To: "Terence D. Sanger"'s message of Tue, 12 Nov 91 17:54:54 EST <9111122254.AA11404@cauda-equina> Message-ID: <9111141015.AA29833@neura.inesc.pt> Just a short correction to Sanger's bibliographic list on competitive Hebbian and related algorithms: The last work listed there was already published a few months ago. A shorter version can be found in [1] F.M. Silva, L. B. Almeida, "A Distributed Solution for Data Orthonormalization", in Proc. ICANN, Helsinki, 1991. The following reference may also be added to Sanger's list: [2] R. Williams, "Feature Discovery Through Error-Correction Learning", ICS Report 8501, University of California, San Diego, 1985 Since there has been so much discussion in the net about 2nd-order algorithms, an application of [1] to the improvement of the so-called "diagonal" 2nd-order algorithms can be found in the following reference: [3] F.M. Silva, L. B. Almeida, "Speeding-Up Backpropagation by Data Orthonormalization", in Proc. ICANN, Helsinki, 1991. From LWCHAN at cucsd.cs.cuhk.hk Thu Nov 14 07:55:00 1991 From: LWCHAN at cucsd.cs.cuhk.hk (LAI-WAN CHAN) Date: Thu, 14 Nov 1991 20:55 +0800 Subject: research position available Message-ID: <4CF1F3A14040013E@CUCSD.CUHK.HK> Department of Electronic Engineering and Department of Computer Science Faculty of Engineering The Chinese University of Hong Kong Research Assistantship in Speech Technology We are seeking a young, self-motivated and high calibre researcher to fill the above position. Applicants should have a degree in Electronic Engineering or Computer Science, with knowledge of speech processing and/or neural networks and preferably with programming experience in C under a UNIX environment. A higher degree in a relevant field will be an advantage. The appointee will join the Speech Processing Group and will contribute to an ongoing research effort focusing on R&D of speech recognition and synthesis technology.
The Speech Processing Group within the Faculty of Engineering is very well equipped and has a strong background in theoretical and experimental aspects of speech processing and recognition and has recently been awarded a research grant by the Croucher Foundation. Computing facilities available for research in the Faculty consist of more than 150 DECstations, SPARCstations, SGI as well as numerous 386PCs. They are fully networked and are linked to both the Bitnet and Internet. There are several application software packages for speech and neural networks projects which include COMDISCO, MONARCH, ILS, Hypersignal Plus and NeuroExplorer. Also available is a quiet room, a KAY Computerized Speech Lab Model 4300, D/A and A/D converters as well as front end audio equipment for real time speech processing. Appointment for the successful applicant will be made on a 1-year contract initially and might be renewable for another year subject to satisfactory performance. The appointee might also register in our Ph.D. programme. The starting salary for degree holder will be approximately HKD$9,400 per month (1 USD = 7.8 HKD) or above depending on experience and qualification. Enquiries and/or applications should be directed to Dr. P.C. Ching, Department of Electronic Engineering, (Tel: 6096380, FAX: 6035558, e-mail: pcching at cuele.cuhk.hk) or Dr. L.W. Chan, Department of Computer Science, (Tel: 6098865, FAX: 6035024, e-mail: lwchan at cucsd.cuhk.hk), The Chinese University of Hong Kong, Shatin, Hong Kong. The deadline for the application is December 31, 1991. I will go to IJCNN-91-Singapore next week. If you are interested in this position, you can talk to me directly in IJCNN. Lai-Wan Chan, Computer Science Dept, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. Email : lwchan at cucsd.cuhk.hk Tel : (+852) 609 8865 FAX : (+852) 603 5024 From David_Plaut at K.GP.CS.CMU.EDU Thu Nov 14 09:58:02 1991 From: David_Plaut at K.GP.CS.CMU.EDU (David_Plaut@K.GP.CS.CMU.EDU) Date: Thu, 14 Nov 91 09:58:02 EST Subject: preprints and thesis TR available Message-ID: <11445.690130682@K.GP.CS.CMU.EDU> I've placed two papers in the neuroprose archive. Instructions for retrieving them are at the end of the message. (Thanks again to Jordan Pollack for maintaining the archive.) The first (plaut.thesis-summary.ps.Z) is a 15 page summary of my thesis, entitled "Connectionist Neuropsychology: The Breakdown and Recovery of Behavior in Lesioned Attractor Networks" (abstract below). For people who want more detail, the second paper (plaut.dyslexia.ps.Z) is a 119 page TR, co-authored with Tim Shallice, that presents a systematic analysis of work by Hinton & Shallice on modeling deep dyslexia, extending the approach to a more comprehensive account of the syndrome. FTP'ers should be forewarned that the file is about 0.5 Mbytes compressed, 1.8 Mbytes uncompressed. For true die-hards, the full thesis (325 pages) is available as CMU-CS-91-185 from Computer Science Documentation School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213-3890 reports at cs.cmu.edu To defray printing/mailing costs, requests for the thesis TR must be accompanied by a check or money order for US$10 (domestic) or US$15 (overseas) payable to "Carnegie Mellon University." Enjoy, -Dave Connectionist Neuropsychology: The Breakdown and Recovery of Behavior in Lesioned Attractor Networks David C. Plaut An often-cited advantage of connectionist networks is that they degrade gracefully under damage. 
Most demonstrations of the effects of damage and subsequent relearning in these networks have only looked at very general measures of performance. More recent studies suggest that damage in connectionist networks can reproduce the specific patterns of behavior of patients with neurological damage, supporting the claim that these networks provide insight into the neural implementation of cognitive processes. However, the existing demonstrations are not very general, and there is little understanding of what underlying principles are responsible for the results. This thesis investigates the effects of damage in connectionist networks in order to analyze their behavior more thoroughly and assess their effectiveness and generality in reproducing neuropsychological phenomena. We focus on connectionist networks that make familiar patterns of activity into stable ``attractors.'' Unit interactions cause similar but unfamiliar patterns to move towards the nearest familiar pattern, providing a type of ``clean-up.'' In unstructured tasks, in which inputs and outputs are arbitrarily related, the boundaries between attractors can help ``pull apart'' very similar inputs into very different final patterns. Errors arise when damage causes the network to settle into a neighboring but incorrect attractor. In this way, the pattern of errors produced by the damaged network reflects the layout of the attractors that develop through learning. In a series of simulations in the domain of reading via meaning, networks are trained to pronounce written words via a simplified representation of their semantics. This task is unstructured in the sense that there is no intrinsic relationship between a word and its meaning. Under damage, the networks produce errors that show a distribution of visual and semantic influences quite similar to that of brain-injured patients with ``deep dyslexia.'' Further simulations replicate other characteristics of these patients, including additional error types, better performance on concrete vs.\ abstract words, preserved lexical decision, and greater confidence in visual vs.\ semantic errors. A range of network architectures and learning procedures produce qualitatively similar results, demonstrating that the layout of attractors depends more on the nature of the task than on the architectural details of the network that enable the attractors to develop. Additional simulations address issues in relearning after damage: the speed of recovery, degree of generalization, and strategies for optimizing recovery. Relative differences in the degree of relearning and generalization for different network lesion locations can be understood in terms of the amount of structure in the subtasks performed by parts of the network. Finally, in the related domain of object recognition, a similar network is trained to generate semantic representations of objects from high-level visual representations. In addition to the standard weights, the network has correlational weights useful for implementing short-term associative memory. Under damage, the network exhibits the complex semantic and perseverative effects of patients with a visual naming disorder known as ``optic aphasia,'' in which previously presented objects influence the response to the current object. Like optic aphasics, the network produces predominantly semantic rather than visual errors because, in contrast to reading, there is some structure in the mapping from visual to semantic representations for objects. 
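(The networks described here are trained attractor networks with distributed semantics, not Hopfield nets, but the basic phenomenon -- damage causing the state to settle away from the correct attractor -- can be seen even in a minimal Hopfield-style memory. The sketch below assumes NumPy and is purely illustrative.)

import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 5
patterns = rng.choice([-1, 1], size=(P, N))        # the "familiar" patterns
W = patterns.T @ patterns / N                      # Hebbian weights
np.fill_diagonal(W, 0)

def settle(s, W, steps=20):                        # crude synchronous settling
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

cue = patterns[0].copy()
cue[:15] *= -1                                     # a degraded version of pattern 0

for frac in (0.0, 0.4, 0.7, 0.9):                  # remove a growing fraction of connections
    lesioned = W * (rng.random((N, N)) > frac)
    out = settle(cue.copy(), lesioned)
    print("lesion %.0f%%  overlaps with stored patterns:" % (100 * frac),
          np.round(patterns @ out / N, 2))
# With light damage the cue is still cleaned up to pattern 0; with heavy damage
# the state settles elsewhere, and the overlaps show where it ends up.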
Taken together, the results of the thesis demonstrate that the breakdown and recovery of behavior in lesioned attractor networks reproduces specific neuropsychological phenomena by virtue of the way the structure of a task shapes the layout of attractors. unix> ftp 128.146.8.52 Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get plaut.thesis-summary.ps.Z (or plaut.dyslexia.ps.Z) ftp> quit unix> zcat plaut.thesis-summary.ps.Z | lpr ------------------------------------------------------------------------------ David Plaut dcp+ at cs.cmu.edu Department of Psychology 412/268-5145 Carnegie Mellon University Pittsburgh, PA 15213-3890 From michael at dasy.cbs.dk Thu Nov 14 08:59:03 1991 From: michael at dasy.cbs.dk (Michael Egmont-Petersen) Date: Thu, 14 Nov 91 14:59:03 +0100 Subject: Case frequency versus case distance during learning Message-ID: <9111141359.AA02755@dasy.cbs.dk> Dear connectionists During learning the gradient methods use a mix of case similarity (euclidian) and case frequency to direct each step in the hyperspace. Each weight is changed with a certain magnitude: delta W = (Target-Output) * f'(Act) * f(Act-1) (for Back-propagation) to adjust weights between the (last) hidden layer and the output layer. I have wondered how much emphasis Back-propagation puts on "case similarity" (euclidian) while determining delta W. The underlying problem is the following: * How large a role plays the number of cases (in each category) compared to their (euclidian) (dis)similarity in adjusting the weights? It is a relevant question to pose because other learning algorithms such as ID3 *only* rely on case frequencies and NOT on distance between patterns within a cluster as well as distance between patterns belonging to different clusters. My question might already have been answered by some one in a paper. Is this the case, then don't bother the other connectionists with it, but mail me directly. Otherwise, it is a highly relevant question to pose, because the input representation then plays a role for how fast a network learns and furthermore its ability to generalize. Best regards Michael Egmont-Petersen Institute for Computer and Systems Sciences Copenhagen Business School DK-1925 Frb. C. Denmark E-mail: michael at dasy.cbs.dk From deo at cs.pdx.edu Thu Nov 14 11:34:24 1991 From: deo at cs.pdx.edu (Steven Farber) Date: Thu, 14 Nov 91 8:34:24 PST Subject: Change of address Message-ID: <9111141634.AA21541@jove.cs.pdx.edu> My address has changed from "steven at m2xenix.rain.com" to "deo at jove.cs.pdx.edu". Could you please change the address listed in the mailing list so my mail comes here instead? From p-mehra at uiuc.edu Thu Nov 14 14:16:51 1991 From: p-mehra at uiuc.edu (Pankaj Mehra) Date: Thu, 14 Nov 91 13:16:51 CST Subject: Applications Message-ID: <9111141916.AA03685@hobbes> > > From: shams at maxwell.hrl.hac.com > Subject: "real-world" applications of neural nets > > We are looking for "real-world" applications of neural networks to be > used as benchmarks for evaluating the performance of our > neurocomputer architecture. I will recommend the following book -- actually the proceedings of the conf on ANNs in Engg held in St Louis this week -- to anyone interested in ``real-world'' applications: Cihan Dagli, et al. (eds.), "Intelligent Engineering Systems through Artificial Neural Networks," New York: ASME Press, 1991. Ordering Information: ISBN 0-7918-0026-1 The American Society of Mechanical Engineers 345 East 47th Street, New York, NY 10017, U.S.A. 
Of course, if you are out looking for OFFLINE data, then you are ignoring a big chunk of the real world. If you contact the authors from this book, you might be able to obtain online simulator software as well as offline benchmark data. -Pankaj From uharigop at plucky.eng.ua.edu Thu Nov 14 17:24:39 1991 From: uharigop at plucky.eng.ua.edu (Umeshram Harigopal) Date: Thu, 14 Nov 91 16:24:39 -0600 Subject: No subject Message-ID: <9111142224.AA19216@plucky.eng.ua.edu> I am a graduate student in CS at the University of Alabama. In the process of reading some material on Neural Network Architectures, I am currently stuck on a point concerning recurrent networks. Having understood the Hopfield network (the basic recurrent network, if I am correct) and the generalization of backpropagation to recurrent networks as in Williams and Zipser's paper, it remains a problem for me to understand what a 'higher order' recurrent network is. I will be very thankful if someone can help me with this. Thanking you -Umesh Harigopal E-mail : uharigop at buster.eng.ua.edu From jon at cs.flinders.oz.au Fri Nov 15 00:29:36 1991 From: jon at cs.flinders.oz.au (jon@cs.flinders.oz.au) Date: Fri, 15 Nov 91 15:59:36 +1030 Subject: Patents Message-ID: <9111150529.AA02911@turner> From: Aaron Owens : ---------- U. S. Patent No. 5,046,020 abstract ---------- DISTRIBUTED PARALLEL PROCESSING NETWORK WHEREIN THE CONNECTION WEIGHTS ARE GENERATED USING STIFF DIFFERENTIAL EQUATIONS Inventor: David L. Filkin Assignee: E. I. du Pont de Nemours and Company U. S. Patent Number 5,046,020 Sep. 3, 1991 Abstract. A parallel distributed processing network of the back propagation type is disclosed in which the weights of connection between processing elements in the various layers of the network are determined in accordance with the set of steady solutions of the stiff differential equations governing the relationship between the layers of the network. It's so nice to see people facilitating the free and unencumbered dissemination of knowledge by borrowing publicly distributed ideas like Backprop, modifying them slightly, and then patenting them for their own gain! Jon Baxter From thodberg at nn.meatre.dk Fri Nov 15 10:12:20 1991 From: thodberg at nn.meatre.dk (Hans Henrik Thodberg) Date: Fri, 15 Nov 91 16:12:20 +0100 Subject: Subtractive network design Message-ID: <9111151512.AA02320@nn.meatre.dk.meatre.dk> I would like a discussion on the virtue of subtractive versus additive methods in the design of neural networks. It is widely accepted that if several networks process the training data correctly, the smallest of them will generalise best. The design problem is therefore to find these minimal nets. Many workers have chosen to construct the networks by adding nodes or weights until the training data is processed correctly (cascade correlation, adding hidden units during training, meiosis). This philosophy is natural in our culture. We are used to constructing things piece by piece. I would like to advocate an alternative method. One trains a (too) large network, and then SUBTRACTS nodes or weights (while retraining) until the network starts to fail to process the training data correctly. Neural networks are powerful because they can form global or distributed representations of a domain. The global structures are more economic, i.e. they use fewer weights, and therefore generalise better. My point is that subtractive schemes are more likely to find these global descriptions.
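(Concretely, the simplest subtractive scheme is magnitude pruning with retraining. Below is a rough NumPy sketch of the loop, an illustration only and not the procedure of the references that follow -- Optimal Brain Damage, for instance, replaces the magnitude criterion with a second-derivative saliency estimate.)

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
T = (X @ np.array([1.0, 0.6]) > 0).astype(float)       # a toy task for illustration

def train(W1, W2, mask1, mask2, epochs=2000, lr=0.5):
    W1, W2 = W1 * mask1, W2 * mask2                    # keep pruned weights at zero
    for _ in range(epochs):
        H = np.tanh(X @ W1)
        Y = 1.0 / (1.0 + np.exp(-(H @ W2)))
        dY = (Y - T) * Y * (1.0 - Y) / len(X)
        gW2 = H.T @ dY
        gW1 = X.T @ ((dY[:, None] * W2) * (1.0 - H**2))
        W1 -= lr * gW1 * mask1
        W2 -= lr * gW2 * mask2
    return W1, W2, np.mean((Y > 0.5) == (T > 0.5))

# Deliberately oversized net (2 -> 12 -> 1), trained first, then made smaller.
W1 = 0.5 * rng.standard_normal((2, 12))
W2 = 0.5 * rng.standard_normal(12)
mask1, mask2 = np.ones_like(W1), np.ones_like(W2)
W1, W2, base = train(W1, W2, mask1, mask2)

acc = base
while acc >= base and mask1.sum() > 1:                 # only first-layer weights pruned, for brevity
    alive = np.flatnonzero(mask1.ravel())
    idx = alive[np.argmin(np.abs(W1.ravel()[alive]))]  # smallest surviving weight
    mask1.ravel()[idx] = 0                             # subtract it ...
    W1, W2, acc = train(W1, W2, mask1, mask2)          # ... and retrain
    print(int(mask1.sum()), "first-layer weights left, training accuracy", acc)
# The loop stops when the shrinking network first fails to fit the training data.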
These structures so to speek condense out of the more complicated structures under the force of subtraction. I would like to hear your opinion on this claim! I give here some references on subtractive schemes: Y.Le Cun, J.S.Denker and S.A.Solla, "Optimal Brain Damage", NIPS 2, p.598-605 H.H.Thodberg, "Improving Generalization of Neural Networks through Pruning", Int. Journal of Neural Systems, 1, 317-326, (1991). A.S.Weigend, D.E.Rumelhart and B.A.Huberman, "Generalization by Weight Elimination with Application to Forecasting", NIPS 3, p. 877-882. ------------------------------------------------------------------ Hans Henrik Thodberg Email: thodberg at nn.meatre.dk Danish Meat Research Institute Phone: (+45) 42 36 12 00 Maglegaardsvej 2, Postboks 57 Fax: (+45) 42 36 48 36 DK-4000 Roskilde, Denmark ------------------------------------------------------------------ From COGSCI92 at ucs.indiana.edu Fri Nov 15 11:28:37 1991 From: COGSCI92 at ucs.indiana.edu (Cognitive Science Conference 1992) Date: Fri, 15 Nov 91 11:28:37 EST Subject: COG SCI 92: CALL FOR PAPERS Message-ID: please post ================================================================= CALL FOR PAPERS: The Fourteenth Annual Conference of The Cognitive Science Society July 29 -- August 1, 1992 Indiana University THE CONFERENCE: --------------- The Annual Conference of the Cognitive Science Society brings together researchers studying cognition in humans, animals or machines. The 1992 Conference will be held at Indiana University. Plenary speakers for the conference are: Elizabeth Bates John Holland Daniel Dennett Richard Shiffrin Martha Farah Michael Turvey Douglas Hofstadter The Conference will also feature evening entertainments: a welcoming reception (Wed), banquet (Thurs), poster reception (Fri), and concert (Sat). PAPER SUBMISSION INSTRUCTIONS: ------------------------------ Paper and poster submissions are encouraged in the areas of cognitive psychology, artificial intelligence, linguistics, cognitive anthropology, connectionist models, cognitive neuroscience, education, cognitive development, philosophical foundations, as well as any other area of relevance to cognitive science. Authors should submit five (5) copies of their papers in hard copy form to Cognitive Science 1992 Submissions Attn: Candace Shertzer Cognitive Science Program Psychology Building Indiana University Bloomington, IN 47405 All accepted papers will appear in the Conference Proceedings. Presentation format (talk or poster) will be decided by a review panel, unless the author specifically requests consideration for only one format. Electronic and FAX submissions cannot be accepted. David Marr Memorial Prizes for Excellent Student Papers: -------------------------------------------------------- To encourage even greater student participation in the Conference, papers that have a student as first author are eligible to compete for one of four David Marr Memorial Prizes. Student-authored papers will be judged by reviewers and the Program Committee for excellence in research and presentation. Each of the four Prizes is accompanied by a $300 honorarium. The David Marr Prize is funded by an anonymous donor. Appearance and length: ---------------------- Papers should be a maximum of six (6) pages long (excluding cover page, described below), have at least 1 inch margins on all sides, and use no smaller than 10pt type. Camera-ready versions will be required only after authors are notified of acceptance. 
Cover page: ----------- Each copy of the paper must include a cover page, separate from the body of the paper, that includes (in order): 1. Title of paper. 2. Full names, postal addresses, phone numbers and e-mail addresses (if available) of all authors. 3. An abstract of no more than 200 words. 4. The area and subarea in which the paper should be reviewed. 5. Preference of presentation format: Talk or poster; talk only; poster only. 6. A note stating whether the first author is a student and should therefore be considered for a Marr Prize. Papers submission deadline: --------------------------- Papers must be *received* by March 2, 1992. Notification of acceptance or rejection will be made by April 10. Camera ready versions of accepted papers are due May 8. SYMPOSIA: --------- Symposium submissions are also encouraged. Submissions should specify: 1. A brief description of the topic. 2. How the symposium would address a broad cognitive science audience. 3. Names of symposium organizer(s) and potential speakers and their topics. 4. Proposed format of symposium (e.g., all formal talks; brief talks plus panel discussion; open discussion; etc.). Symposia should be designed to last 1 hr 40 min. Symposium submission deadline: ------------------------------ Symposium submissions must be received by January 13, 1992, and should be sent as soon as possible. Note that the deadline for symposium submissions is earlier than for papers. TRAVEL: ------- By air, fly to Indianapolis (not Bloomington) where pre-arranged, inexpensive charter buses will take you on the 1-hour drive to Bloomington. Discount airfares are available from the conference airline, USAir, which has flights from Europe and Canada as well as within the continental US. Full details regarding travel, lodging and registration will be given in a subsequent announcement. FOR MORE INFORMATION CONTACT: ----------------------------- John K. Kruschke, Conference Chair e-mail: cogsci92 at ucs.indiana.edu Candace Shertzer, Cognitive Science Program Secretary phone: (812) 855-4658 e-mail: cshertze at silver.ucs.indiana.edu Cognitive Science Program Psychology Building Indiana University Bloomington, IN 47405 ================================================================= From sontag at control.rutgers.edu Fri Nov 15 18:08:50 1991 From: sontag at control.rutgers.edu (sontag@control.rutgers.edu) Date: Fri, 15 Nov 91 18:08:50 EST Subject: Patenting of algorithms Message-ID: <9111152308.AA24271@control.rutgers.edu> There was recently a message to this bboard regarding the patenting of neural net algorithms (as opposed to copyrighting of software). With permission, I am reprinting here a report prepared by the Mathematical Programming Society regarding the issue of patenting algorithms. (The report appears in the forthcoming issue of SIAM News.) I DO NOT want to generate a discussion of this general topic in connectionists; the purpose of reprinting this here is just to make people aware of the report and its strong recommendations, which are especially relevant for an area such as neural nets. (I suggest the use of comp.ai.neural-nets for discussion.) -eduardo PS: I have not included the Appendices that are referred to, as I did not obtain permission to reprint them. ---Copyright issue, NOT patent...! :-) PS_2: Note the irony: the first signatory of the report is George Dantzig, who in essence designed the most useful algorithm for (batch) perceptron learning. 
---------------------------- cut here (TeX file) ------------------------------ \hyphenation{Tex-as} \def\subpar{\hfill\break\indent\indent} \centerline{\bf Report of the Committee on Algorithms and the Law} \medskip \centerline{Mathematical Programming Society} \beginsection Background and charge The Committee was appointed in the spring of 1990 by George Nemhauser, Chairman of the Mathematical Programming Society (MPS). Its charge follows: {\narrower \noindent ``The purpose of the committee should be to devise a position for MPS to adopt and publicize regarding the effects of patents on the advancement of research and education in our field. The committee may also wish to comment on the recent past history.'' \smallskip} This is the report of the Committee. It comprises a main body with our assumptions, findings of fact, conclusions, and recommendations. There are two appendices, prepared by others, containing a great deal of specific factual information and some additional analysis. \beginsection Assumptions MPS is a professional, scientific society whose members engage in research and teaching of the theory, implementation and practical use of optimization methods. It is within the purview of MPS to promote its activities (via publications, symposia, prizes, newsletter), to set standards by which that research can be measured (such as criteria for publication and prizes, guidelines for computational testing, etc.), and to take positions on issues which directly affect our profession. It is not within the purview of MPS to market software products, and MPS should not become involved in issues related to the commercial aspects of our profession except where it directly affects research and education. The Committee is unable to make expert legal analyses or to provide legal counsel. The main body of this report is therefore written from the perspective of practitioners of mathematical programming rather than from that of attorneys skilled in the law. MPS is an international society. However, the Committee has interpreted its charge as applying apecifically to U.S. patent law and its application to algorithms. All comments and conclusions of this report should be read with this fact in mind. \beginsection Facts about patents and copyrights The three principal forms of legal protection for intellectual property are the copyright, the patent, and the trade secret. Copyrights and patents are governed by federal law, trade secrets by state law. Setting aside the issue of trade secrets, some of the distinctions between copyrights and patents can be summarized as follows. {\it Type of property protected:\/} Patents protect ideas, principally ``nonobvious'' inventions and designs. It is well estabished that ``processes'' are patentable. The Patent Office currently grants patents on algorithms and software, on the basis of the ambiguous 1981 U.S. Supreme Court decision in {\it Diamond v. Diehr.} Copyrights do not protect ideas. Instead, they protect the {\it expression} of ideas, in ``original works of authorship in any tangible medium of expression.'' The principle that software is copyrightable appears to have been well established by the 1983 decision of the U.S. Court of Appeals in {\it Apple v. Franklin.} {\it How protection is obtained:\/} Federal law is now in essential conformity with the Bern Copyright Convention. As a consequence, international copyrights are created virtually automatically for most works of authorship. 
Government registration of copyrights is simple and inexpensive to obtain. By contrast, patents are issued by the U.S. Patent Office only after an examination procedure that is both lengthy (three years or more) and costly (\$10,000 and up in fees and legal expenses). An inventor must avoid public disclosure of his invention, at least until patent application is made, else the invention will be deemed to be in the public domain. Patent application proceedings are confidential, so that trade secret protection can be obtained if a patent is not granted. {\it Length of protection:\/} U.S. patents are for 17 years. Copyrights are for the lifetime of the individual plus 50 years or, in the case of corporations, 75-100 years. \beginsection Facts about algorithms Algorithms are typically designed and developed in a highly decentralized manner by single individuals or small groups working together. This requires no special equipment, few resources, and little cost. The number of people involved is also quite large compared to the needs of the marketplace. Independent rediscovery is a commonly occurring phenomenon. There is a long and distinguished history of public disclosure by developers of mathematical algorithms via the usual and widely-accepted channels of publication in scientific journals and talks at professional meetings. These disclosures include the theoretical underpinnings of the method, implementation details, computational results, and case studies of results on applied problems. Indeed, algorithm development is based on the tradition of building upon previous work by generalizing and improving solution principles from one situation to another. The commercial end product of an algorithm (if there is any) is generally a software package, where the algorithm is again generally implemented by a very small number of individuals. Of course, a larger group of people may be involved in building the package around the optimization software to handle the user interface, data processing, etc. Also, others may be involved to handle functions like marketing, distribution, and maintenance. Competition in the marketplace has been traditionally based on the performance of particular implementations and features provided by particular software products. The product is often treated like a ``black box'' with the specific algorithm used playing a rather minor role. The cost of producing, manufacturing, distributing and advertising optimization software is often quite small. Even when this is not the case, it is generally the implementation of algorithms that is costly, rather than their development. Software manufacturers have a need to protect their investment in implementation, but have little need to protect an investment in algorithmic development. In the absence of patents, algorithms--like all of mathematics and basic science-- are freely available for all to use. Traditionally, developers of optimization software have protected their investments by keeping the details of their implementation secret while allowing the general principles to become public. Software copyrights are also an appropriate form of protection, and are now widely used. Moreover, despite unresolved legal questions concerning the ``look and feel'' of software, the legal issues of copyright protection seem to be relatively well settled. Often an optimization package is a small (but important) part of an overall planning process. 
That process is often quite complex; it may require many resources and great cost to complete, and the potential benefits may be uncertain and distributed over a long time period. In such situations it is usually quite difficult to quantify the net financial impact made by the embedded optimization package. \beginsection Public policy issues {\it Will algorithm patents promote invention?} Article I, Section 8 of the U.S. Constitution empowers Congress ``To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.'' Inasmuch as patents are intended to provide an incentive for invention, it seems appropriate to inquire whether patenting of algorithms will, in fact, create an incentive for the invention of algorithms. Given the existing intensity of research and the rapid pace of algorithmic invention, it seems hard to argue that additional incentives are needed. In fact, there is good reason to believe that algorithm patents will inhibit research, in that free exchange of ideas will be curtailed, new developments will be held secret, and researchers will be subjected to undesired legal constraints. {\it Will algorithm patents provide needed protection for software manufacturers?} Copyright and trade secret protection appear to provide the sort of protection most needed by software manufacturers. By their nature, patents seem to offer a greater potential for legal confrontation than copyrights. Instead of providing protection, algorithm patents actually pose a threat to smaller software houses lacking the resources to defend themselves in costly patent litigation. It can be argued that patents encourage an oligarchical industrial structure and discourage competition. {\it Is the Patent Office able to deal with algorithm patents?} There is abundant evidence that the Patent Office is not up to the job. Many algorithmic ``inventions'' have been granted undeserved patents, greatly increasing the potential for legal entanglement and litigation. Moreover, it seems unlikely that there will be any substantial improvement in the quality of patent examinations. \beginsection Conclusions It seems clear from the previous discussion that the nature of work on algorithms is quite different from that in other fields where the principles of patents apply more readily. This in itself is a strong argument against patenting algorithms. In addition, we believe that the patenting of algorithms would have an extremely damaging effect on our research and on our teaching, particularly at the graduate level, far outweighing any imaginable commercial benefit. Here is a partial list of reasons for this view: \item{$\bullet$} Patents provide a protection which is not warranted given the nature of our work. \item{$\bullet$} Patents are filed secretly and would likely slow down the flow of information and the development of results in the field. \item{$\bullet$} Patents necessarily impose a long-term monopoly over inventions. This would likely restrict rather than enhance the availability of algorithms and software for optimization. \item{$\bullet$} Patents introduce tremendous uncertainty and add a large cost and risk factor to our work. This is unwarranted since our work does not generate large amounts of capital. \item{$\bullet$} Patents would not provide any additional source of public information about algorithms. 
\item{$\bullet$} Patents would largely be concentrated within large institutions as universities and industrial labs would likely become the owners of patents on algorithms produced by their researchers. \item{$\bullet$} Once granted, even a patent with obviously invalid claims would be difficult to overturn by persons in our profession due to high legal costs. \item{$\bullet$} If patents on algorithms were to become commonplace, it is likely that nearly all algorithms, new or old, would be patented to provide a defense against future lawsuits and as a potential revenue stream for future royalties. Such a situation would have a very negative effect on our profession. \beginsection Recommendations The practice of patenting algorithms is harmful to the progress of research and teaching in optimization, and therefore harmful to the vital interests of MPS. MPS should therefore take such actions as it can to help stop this practice, or to limit it if it cannot be stopped. In particular: \item{$\bullet$} The MPS Council should adopt a resolution opposing the patenting of algorithms on the grounds that it harms research and teaching. \item{$\bullet$} MPS should urge its sister societies ({\it e.g.,} SIAM, ACM, IEEE Computer Society, AMS) to take a similar forthright position against algorithm patents. \item{$\bullet$} MPS should publish information in one or more of its publications as to why patenting of algorithms is undesirable. \item{$\bullet$} The Chairman of MPS should write in his official capacity to urge members of Congress to pass a law declaring algorithms non-patentable (and, if possible, nullifying the effects of patents already granted on algorithms). \item{$\bullet$} MPS should support the efforts of other organizations to intervene in opposition to the patenting of algorithms (for example, as friends of the court or with Congress). It should do so by means such as providing factual information on mathematical programming issues and/or history, and commenting on the impact of the patent issue to our research and teaching in mathematical programming. MPS should urge its members to do likewise. \vskip .6 in \settabs 6 \columns \centerline{The Committee on Algorithms and the Law} \smallskip \+&&&George B. Dantzig\cr \+&&&Donald Goldfarb\cr \+&&&Eugene Lawler\cr \+&&&Clyde Monma\cr \+&&&Stephen M. Robinson (Chair)\cr \medskip \+&&&26 September 1990\cr \vfill\eject \end From uh311ae at sunmanager.lrz-muenchen.de Sun Nov 17 17:40:49 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 17 Nov 91 23:40:49+0100 Subject: Boltzmann machine, anyone ? Message-ID: <9111172240.AA12057@sunmanager.lrz-muenchen.de> I heard that some people failed to make an analog VLSI implementation. Is there any fast hardware out ? Would anybody still think it worth a try, at least in principle ? Thanks, Henrik MPCI at LLNL IBM Research U. of Munich From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Nov 17 18:03:27 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 17 Nov 91 18:03:27 EST Subject: Subtractive network design In-Reply-To: Your message of Fri, 15 Nov 91 16:12:20 +0100. <9111151512.AA02320@nn.meatre.dk.meatre.dk> Message-ID: My point is that subtractive shemes are more likely to find these global descriptions. These structures so to speek condense out of the more complicated structures under the force of subtraction. I would like to hear your opinion on this claim! 
For best generalization in supervised learning, the goal is to develop a separating surface that captures as much as possible of the "signal" in the data set, without capturing too much of the noise. If one assumes that the signal components are larger and more coherent than the noise, one can do this by restricting the complexity of the separating surface(s). This, in turn, can be accomplished by choosing a network architecture with exactly the right level of complexity or by stopping the training before the surface gets too contorted. (The excess degrees of freedom are still there, but tend to be redundant with one another in the early phases of training.) Since, in most cases, we can't guess in advance just what architecture is needed, we must make this selection dynamically. An architecture like Cascade-Correlation builds the best model it can without hidden units, then the best it can do with one, and so on. It's possible to stop the process as soon as the cross-validation performance begins to decline -- a sign that the signal has been exhausted and you're starting to model noise. One problem is that each new hidden unit receives connections from ALL available inputs. Normally, you don't really need all those free parameters at once, and the excess ones can hurt generalization in some problems. Various schemes have been proposed to eliminate these unnecessary degrees of freedom as the new units are being trained, and I think this problem will soon be solved. A subtractive scheme can also lead to a network of about the right complexity, and you cite a couple of excellent studies that demonstrate this. But I don't see why these should be better than additive methods (except for the problem noted above). You suggest that a larger net can somehow form a good global description (presumably one that models a lot of the noise as well as the signal), and that the good stuff is more likely to be retained as the net is compressed. I think it is equally likely that the global model will form some sort of description that blends signal and noise components in a very distributed manner, and that it is then hard to get rid of just the noisy parts by eliminating discrete chunks of network. That's my hunch, anyway -- maybe someone with more experience in subtractive methods can comment. I believe that the subtractive schemes will be slower, other things being equal: you have to train a very large net, lop off something, retrain and evaluate the remainder, and iterate till done. It's quicker to build up small nets and to lock in useful sub-assemblies as you go. But I guess you wanted to focus only on generalization and not on speed. Well, opinions are cheap. If you really want to know the answer, why don't you run some careful comparative studies and tell the rest of us what you find out? Scott Fahlman School of Computer Science Carnegie Mellon University From KRUSCHKE at ucs.indiana.edu Sat Nov 16 15:38:42 1991 From: KRUSCHKE at ucs.indiana.edu (John K. Kruschke) Date: Sat, 16 Nov 91 15:38:42 EST Subject: Subtractive network design Message-ID: The dichotomy between additive and subtractive schemes for modifying network architectures is based on the notion that nodes which are not "in" the network consume no computation or memory; i.e., what gets added or subtracted is the *presence* of the node. An alternative construal is that what gets added or subtracted is not the node itself but its *participation* in the functionality of the network.
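One way to picture the participation construal (a purely hypothetical sketch, not taken from any of the papers listed below; the two-layer net and the gate vector g are assumptions) is to multiply each hidden unit's output by a gate in [0,1], so that a unit with a zero gate is still allocated and still computed, but contributes nothing to the network's function:

    import numpy as np

    def forward(x, W1, b1, W2, b2, g):
        # g: participation gates, one per hidden unit
        # (1.0 = fully participating, 0.0 = present but functionally absent)
        h = np.tanh(W1 @ x + b1)       # every hidden unit is still computed ...
        return W2 @ (g * h) + b2       # ... but gated before it can influence the output

    # example: a three-hidden-unit net in which unit 1 is present but not participating
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
    g = np.array([1.0, 0.0, 1.0])
    y = forward(rng.normal(size=2), W1, b1, W2, b2, g)
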
As a trivial example, a node can be present but not participate if all the weights leading out of it are zero. Under the first construal (presence), subtractive schemes can be more expensive to implement in hardware or software than "additive" schemes, because the additive schemes spend nothing on nodes which aren't there yet. Under the second construal (functional participation), the two schemes consume equal amounts of resources, because all the nodes are processed all the time. In this latter case, arguments for or against one type of scheme must come from other constraints; e.g., ability to generalize, learning speed, neural plausibility, or even (gasp!) human performance. Architecture modification schemes can be both additive and subtractive. For example, Kruschke and Movellan (1991) described an algorithm in which individual nodes from a large pool of candidates can have their functional participation gradually suppressed (subtracted) or resurrected (added). Other methods for manipulating the functional participation of hidden nodes are described in the other papers listed below. Kruschke, J. K., & Movellan, J. R. (1991). Benefits of gain: Speeded learning and minimal hidden layers in back propagation networks. IEEE Transactions on Systems, Man and Cybernetics, v.21, pp.273-280. Kruschke, J. K. (1989b). Distributed bottlenecks for improved generalization in back-propagation networks. International Journal of Neural Networks Research and Applications, v.1, pp.187-193. Kruschke, J. K. (1989a). Improving generalization in back-propagation networks with distributed bottlenecks. In: Proceedings of the IEEE International Joint Conference on Neural Networks, Washington D.C. June 1989, v.1, pp.443-447. Kruschke, J. K. (1988). Creating local and distributed bottlenecks in hidden layers of back-propagation networks. In: D. Touretzky, G. Hinton, & T. Sejnowski (eds.), Proceedings of the 1988 Connectionist Models Summer School, pp.120-126. San Mateo, CA: Morgan Kaufmann. ------------------------------------------------------------------- John K. Kruschke Asst. Prof. of Psych. & Cog. Sci. Dept. of Psychology internet: kruschke at ucs.indiana.edu Indiana University bitnet: kruschke at iubacs Bloomington, IN 47405-4201 office: (812) 855-3192 USA lab: (812) 855-9613 =================================================================== From reiner at isy.liu.se Mon Nov 18 00:50:47 1991 From: reiner at isy.liu.se (Reiner Lenz) Date: Mon, 18 Nov 91 06:50:47 +0100 Subject: Algorithms for Principal Components Analysis In-Reply-To: "Terence D. Sanger" Tue, 12 Nov 91 17:54:54 EST Message-ID: <9111180545.AA04405@rainier.isy.liu.se> Here is our contribution to the computation of Principal Components. We developed 1) a system that learns the principal components in parallel @article{Len_proof:91, author ={Reiner Lenz and Mats \"Osterberg}, title="Computing the Karhunen-Loeve expansion with a parallel, unsupervised filter system", journal = "Neural Computations", year = "Accepted" } 2) Recently we modified this system to overcome some of the drawbacks of the standard principal components approach (such as mixing eigenvectors belonging to the same eigenvalue, etc.).
@techreport{Len_4o:91, author ={Reiner Lenz and Mats \"Osterberg}, title="A new method for unsupervised linear feature extraction using forth order moments", institution={Link\"oping University, ISY, S-58183 Link\"oping}, note="Internal Report", year="1991" } These systems are part of our work on group theoretical methods in image science as described in @article{Len_jos:89, author ="Reiner Lenz", title ="A Group Theoretical Model of Feature Extraction", journal="J. Optical Soc. America A", volume="6", number="6", pages="827-834", year = "1989" } @article{Len:90, author= "Reiner Lenz", title = "Group-Invariant Pattern Recognition", journal = "Pattern Recognition", volume="23", number="1/2", pages = "199-218", year = "1990" } @article{Len_nn:91, author ="Reiner Lenz", title="On probabilistic Invariance", journal = "Neural Networks", volume="4", number="5", year = "1991" } @book{Len:90ln, author= "Reiner Lenz", title = "Group Theoretical Methods in Image Processing", publisher = "Springer Verlag", series = "Lecture Notes in Computer Science (Vol. 413)", address = "Heidelberg, Berlin, New York", year = "1990" } From lba at sara.inesc.pt Sun Nov 17 16:33:29 1991 From: lba at sara.inesc.pt (Luis B. Almeida) Date: Sun, 17 Nov 91 20:33:29 -0100 Subject: Patents In-Reply-To: jon@cs.flinders.oz.au's message of Fri, 15 Nov 91 15:59:36 +1030 <9111150529.AA02911@turner> Message-ID: <9111172133.AA22872@sara.inesc.pt> I am sorry, but I am strongly in favor of the right to patent algorithms. In fact, I have recently patented the usual algorithm for multiplication of numbers in any base, including binary (you may not believe it, but it had not yet been patented). I am expecting to earn large sums of money, especially from computer and calculator manufacturers, and also from neural network people, who are said to be very fond of sums of products. I wouldn't like this large income to simply disappear. I am now in the process of patenting algorithms for addition, subtraction and division, and I already have an idea of a square root algorithm, which probably also is worth patenting. Luis B. Almeida INESC Phone: +351-1-544607 Apartado 10105 Fax: +351-1-525843 P-1017 Lisboa Codex Portugal lba at inesc.pt lba at inesc.uucp (if you have access to uucp) From srikanth at cs.tulane.edu Mon Nov 18 12:19:01 1991 From: srikanth at cs.tulane.edu (R Srikanth) Date: Mon, 18 Nov 91 11:19:01 CST Subject: Subtractive network design In-Reply-To: <9111180509.AA27873@rex.cs.tulane.edu>; from "Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU" at Nov 17, 91 6:03 pm Message-ID: <9111181719.AA00230@poseidon.cs.tulane.edu> > > > My point is that subtractive shemes are more likely to find > these global descriptions. These structures so to speek condense out of > the more complicated structures under the force of subtraction. > > I would like to hear your opinion on this claim! > > A subtractive scheme can also lead to a network of about the right > complexity, and you cite a couple of excellent studies that demonstrate > this. But I don't see why these should be better than additive methods > (except for the problem noted above). You suggest that a larger net can > somehow form a good global description (presumably one that models a lot of > the noise as well as the signal), and that the good stuff is more likely to > be retained as the net is compressed. 
I think it is equally likely that > the global model will form some sort of description that blends signal and > noise components in a very distributed manner, and that it is then hard to > get rid of just the noisy parts by eliminating discrete chunks of network. > That's my hunch, anyway -- maybe someone with more experience in > subtractive methods can comment. > Also there is a question of over-generalization. A larger network, say one given a set of m points to learn a parabola, may end up learning a higher-order polynomial. This is a case of over-generalization leading to poor performance. Of course the vice versa is also true. The question posed here is: do we need a first best fit or the most general fit? The answer may be different for different problems. Thus we may be able to generate opposite views in different problem spaces. srikanth -- srikanth at rex.cs.tulane.edu Dept of Computer Science, Tulane University, New Orleans, La - 70118 From P.Refenes at cs.ucl.ac.uk Mon Nov 18 13:09:31 1991 From: P.Refenes at cs.ucl.ac.uk (P.Refenes@cs.ucl.ac.uk) Date: Mon, 18 Nov 91 18:09:31 +0000 Subject: Subtractive network design In-Reply-To: Your message of "Sun, 17 Nov 91 18:03:27 EST." Message-ID: reduce the generality of a network and thus improve its generalisation. Depending on size and training times, fixed geometry networks often develop (near-) duplicate and/or (near-) redundant functionality. Pruning techniques aim to remove this functionality from the network and they do quite well here. There are however two problems: firstly, these are not the only cases of increased functionality, and secondly, the removal of near-zero connections often ignores the knock-on effects on generalisation due to the accumulated influence that these connections might have. It is often conjectured that hidden unit size is the culprit for bad generalisation. This is not strictly so. The true culprit is the high degree of freedom in exploring the search space, which also depends on other parameters such as training times. The solution proposed by Scott Fahlman, i.e. to use the cross-validation performance as an indicator of when to stop, is not complete, because as soon as you do this the cross-validation dataset becomes part of the training dataset (the fact that we are not using it for the backward pass is irrelevant). So any improvement in generalisation is probably due to the fact that we are using a larger training dataset (again, the fact that we are doing it manually should not divert us). My view is that this method should be treated as a "good code of professional practice" when reporting results, rather than as a panacea. Paul Refenes From ingber at umiacs.UMD.EDU Mon Nov 18 14:50:56 1991 From: ingber at umiacs.UMD.EDU (Lester Ingber) Date: Mon, 18 Nov 1991 14:50:56 EST Subject: Genetic algorithms and very fast simulated re-annealing: A comparison Message-ID: <9111181950.AA00292@dweezil.umiacs.UMD.EDU> connectionists at cs.cmu.edu *** Please do not forward to any other lists *** Bruce Rosen and I have written the following paper, and placed it in the Neuroprose archive as ingber.saga.ps.tar.Z. Please see the note at the end of this file to extract the text and figures. Genetic algorithms and very fast simulated re-annealing: A comparison Lester Ingber Science Transfer Corporation, P.O.
Box 857, McLean, VA 22101 ingber at umiacs.umd.edu and Bruce Rosen Department of Computer & Information Sciences, University of Delaware, Newark, DE 19716 brosen at cis.udel.edu We compare Genetic Algorithms (GA) with a functional search method, Very Fast Simulated Re-Annealing (VFSR) that not only is efficient in its search strategy, but also is statistically guaranteed to find the function optima. GA previously has been demonstrated to be competitive with other standard Boltzmann-type simulated annealing techniques. Presenting a suite of six stan- dard test functions to GA and VFSR codes from previous studies, without any additional fine tuning, strongly suggests that VFSR can be expected to be orders of magnitude more efficient than GA. To ftp this file from Neuroprose to your local machine, follow these directions, typing in commands between single quotes (without the quotes included). Start with 'cd /tmp' as noted below, so that you won't have to be concerned with deleting all these files after you're finished printing. local% cd /tmp local% 'ftp archive.cis.ohio-state.edu' [local% 'ftp 128.146.8.52'] Name (archive.cis.ohio-state.edu:yourloginname): 'anonymous' Password (archive.cis.ohio-state.edu:anonymous): 'yourloginname' ftp> 'cd pub/neuroprose' ftp> 'binary' ftp> 'get ingber.saga.ps.tar.Z' ftp> 'quit' local% Now, at your local machine: 'uncompress ingber.saga.ps.tar.Z' will leave "ingber.saga.ps.tar". 'tar xf ingber.saga.ps.tar' will leave a directory "saga.dir". 'cd saga.dir' will put you in a directory with the text.ps file and 6 figX.ps files, where X = 1-6. If you 'ls -l' you should get -rw-r--r-- 1 ingber 4928 Nov 17 06:49 fig1.ps -rw-r--r-- 1 ingber 6949 Nov 17 06:49 fig2.ps -rw-r--r-- 1 ingber 14432 Nov 17 06:50 fig3.ps -rw-r--r-- 1 ingber 5311 Nov 17 06:50 fig4.ps -rw-r--r-- 1 ingber 7552 Nov 17 06:50 fig5.ps -rw-r--r-- 1 ingber 6222 Nov 17 06:50 fig6.ps -rw-r--r-- 1 ingber 85945 Nov 17 06:52 text.ps (with your name instead of mine). Now you can 'lpr [-P..] *.ps' to a PostScript laserprinter. This will print out 18 pages: 12 pages of text + 6 graphs. If you'd like a copy of the final version when this paper is published, just drop me a note with the word sagareprint (all caps or all lower case O.K.) anyplace in your email, and I'll oblige. Lester Ingber ============================================================ ------------------------------------------ | | | | | | | Prof. Lester Ingber | | ______________________ | | | | | | Science Transfer Corporation | | P.O. Box 857 703-759-2769 | | McLean, VA 22101 ingber at umiacs.umd.edu | | | ------------------------------------------ ============================================================ From kolen-j at cis.ohio-state.edu Mon Nov 18 20:02:32 1991 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Mon, 18 Nov 91 20:02:32 -0500 Subject: No subject In-Reply-To: Umeshram Harigopal's message of Thu, 14 Nov 91 16:24:39 -0600 <9111142224.AA19216@plucky.eng.ua.edu> Message-ID: <9111190102.AA14728@retina.cis.ohio-state.edu> Higher order recurrent networks are recurrent networks with higher order connections, (i[1]*i[2]*w[1,2] instead of i[1]*w[1]). An example of a high order recurent network is Pollack's sequential cascaded networks which appear, I believe, in the latest issue of Machine Learning. This network can be described as two three-dimensional matrices, W and V, and the following equations. O[t] = Sigmoid( (W . S[t]) . I[t]) S[t+1]=Sigmoid( (V . S[t]) . 
I[t]) where I[t] is the input vector, O[t] is the output vector, and S[t] is the state vector, each at time t. ( . is inner product) John Kolen From soller%asylum at cs.utah.edu Mon Nov 18 23:34:22 1991 From: soller%asylum at cs.utah.edu (Jerome Soller) Date: Mon, 18 Nov 91 21:34:22 -0700 Subject: Research Position in Medical Information Systems at VA GRECC Message-ID: <9111190434.AA03143@asylum.utah.edu> The Salt Lake City VA GRECC requested that I post the following notice. Please forward this message to any interested researchers. To prevent accidentally sending a response to all people on either of these very large mailing lists, do not respond directly by e-mail. Jerome B. Soller Salt Lake City VA Regional Information Systems Center -------------------------- cut here ------------------------------------------ **** Please Post *** Please Post *** Please Post *** Please Post ***** ------------------------------------------------------------------------------ POSITION AVAILABLE IN MEDICAL INFORMATION SYSTEMS ------------------------------------------------------------------------------ The Salt Lake City Veterans Affairs GRECC(Geriatric, Research, Education, and Clinical Center) has an opening for a Ph.D. or MD level senior research position in the area of Medical Information Systems. This GRECC is one of 15 GRECCs nationally. The computer work at the GRECC is being supported by the Department of Veterans Affairs Salt Lake City VA Regional Information Systems Center (one of 7 national R and D centers for the VA specializing in information systems) and the Salt Lake City Veterans Affairs Hospital's IRMS division. The Department of Veterans Affairs has 172 hospitals nationwide, all combined into the DHCP database. Because these hospitals serve millions of patients each year, the opportunity exists for analysis of huge data sets, otherwise unavailable. The GRECC encourages its researchers to pursue joint research with other research groups at the U. of Utah. Opportunities for joint research include the following: 1) Computer Science(Strength areas include expert systems tools, graphics, integrated circuit design, robotic/vision, parallel numerical modelling, etc..,) 2) Medical Informatics(Projects include the Help Expert System, the Iliad Expert System, semantic networks, and support of the Library of Medicine's Unified Medical Language System. The Department of Nursing Informatics complements the Department of Medical Informatics, but with an emphasis on nursing systems.) 3) Bioengineering(Has many neuroprosthetic projects.) 4) Human Genetics(Has just established a 50 million dollar research center and has its own computer research group.) 5) Anesthesiology (Has an industrially supported neural network research group.) 6) The Center for Engineering Design(Creators of the Utah/MIT dextrous hand, the Utah Arm, and medical devices.) 7) The Cognitive Science Program, which is in its formative stages. Candidates for this position should have knowledge and demonstrated research in many of the following areas with an emphasis or potential applicability to medical applications: databases, digital signal processing, instrumentation, expert systems, statistics, time series analysis, fuzzy logic, neural networks, parallel computation, physiological and neural modelling, and data fusion. Candidates for this position must be U.S. citizens. To apply, send or fax your curriculum vitae to Dr. 
Gerald Rothstein, Director of the GRECC Mail Code 182 500 Foothill Boulevard Salt Lake City, Utah 84148 Phone Number: (801) 582-1565 extension 2475 Fax Number: (801) 583-7338 Dr. Gerald Rothstein Director, Salt Lake City VA GRECC From ken at cns.caltech.edu Tue Nov 19 01:18:33 1991 From: ken at cns.caltech.edu (Ken Miller) Date: Mon, 18 Nov 91 22:18:33 PST Subject: con etiquete Message-ID: <9111190618.AA21823@cns.caltech.edu> Hello, I enjoy receiving connectionists to keep abreast of technical issues, meetings, new papers. But lately the flow of messages has become very large and the signal-to-noise ratio has noticeably decreased, and these trends have been sustained for a long enough time that they seem likely to represent a change rather than a fluctuation. I think the reason is a change to a more "conversational" mode, in which people feel free to post their very preliminary and not-very-substantiated thoughts, for example "i think maybe this is good, and i think maybe that is bad". I would like to suggest that we all collectively raise the threshold for what is to be broadcast to the 1000's of us on the net as a whole. Many of these "conversational" entries would in my opinion be better kept as private conversations among the small group of people involved. When some concrete conclusions emerge, or a concrete set of questions needing more investigation emerges, *then* a *distilled* post could be sent to the net as a whole. If you will: forward prop through private links, backward prop to the net as a whole. This also means that, when a concrete question is posted, answers, unless very solid, might best be sent to the poster of the question, who in turn may eventually send a distillate to the net. Along the same lines, I would like to make a suggestion that people *strongly* avoid publicly broadcasting dialogues; iterate privately, and if relevant send the net a distilled version of the final results. Finally I would also suggest that the urge toward philosophical, as opposed to technical, discussions be strongly suppressed; please set the threshold very very high. In conclusion, I would like to urge everyone to think of connectionists as a place for distilled rather than raw, technical rather than philosophical, discussions. Thanks, Ken p.s. please send angry flames to me, not the net. ken at cns.caltech.edu From ibm at dit.upm.es Tue Nov 19 09:38:30 1991 From: ibm at dit.upm.es (Ignacio Bellido Montes) Date: Tue, 19 Nov 91 15:38:30 +0100 Subject: Patents Message-ID: <9111191438.AA01121@bosco.dit.upm.es> I'm not sure if Luis B. Almeida's message is just a joke or something else; in that case, I think I can patent the wheel. This is one of the most used things in the world, and not only by computer scientists, but by everybody, so.... I must check first if it is patented and, if it is not, I can act... Anyway, I don't know about laws, but I think that nobody should be able to patent something (algorithm or not) previously used by other people, who didn't patent it. Ignacio Bellido ============================================================================ Ignacio Bellido Fernandez-Montes, Departamento de Ingenieria de Department of Telematic Sistemas Telematicos (dit), Systems Engineering, Laboratorio de Inteligencia Artificial. Artificial Intelligence Laboratory. Universidad Politecnica de Madrid. Madrid University of Technology. e-mail: ibellido at dit.upm.es Phone: Work: .. 34 - 1 - 5495700 ext 440 Home: .. 34 - 1 - 4358739 TELEX: 47430 ETSIT E Fax: .. 34 - 1 - 5432077 ============================================================================ From pablo at cs.washington.edu Tue Nov 19 12:42:41 1991 From: pablo at cs.washington.edu (David Cohn) Date: Tue, 19 Nov 91 09:42:41 -0800 Subject: NIPS Workshop Announcement (and CFP) Message-ID: <9111191742.AA09972@june.cs.washington.edu> -------------------------------------------------------------- NIPS Workshop on Active Learning and Control Announcement (and call for participation) -------------------------------------------------------------- organizers: David Cohn, University of Washington Don Sofge, MIT AI Lab An "active" learning system is one that is not merely a passive observer of its environment, but instead plays an active role in determining its inputs. This definition includes classification networks that query for values in "interesting" parts of their domain, learning systems that actively "explore" their environment, and adaptive controllers that learn how to produce control outputs to achieve a goal. Common facets of these problems include building world models in complex domains, exploring a domain safely and efficiently, and planning future actions based on one's model. In this workshop, our main focus will be to address key unsolved problems which may be holding up progress on these problems, rather than presenting polished, finished results. Our hope is that unsolved problems in one field may be able to draw on insight from research in other fields. Each session of the workshop will begin with introductions to specific problems in the field by researchers in each area, with the second half of each session reserved for discussion. --------------------------------------------------------------------------- Current speakers include: Chris Atkeson, MIT AI Lab Tom Dietterich, Oregon State Univ. Michael Jordan, MIT Brain & Cognitive Sciences Michael Littman, BellCore Andrew Moore, MIT AI Lab Jurgen Schmidhuber, Univ. of Colorado, Boulder Satinder Singh, UMass Amherst Rich Sutton, GTE Sebastian Thrun, Carnegie-Mellon University A few open slots remain, so if you would be interested in discussing your "key unsolved problem" in active learning, exploration, planning or control, send email to David Cohn (pablo at cs.washington.edu) or Don Sofge (sofge at ai.mit.edu). --------------------------------------------------------------------------- Friday, 12/6, Morning Active Learning " " Afternoon Learning Control Saturday, 12/7, Morning Active Exploration " " Afternoon Planning --------------------------------------------------------------------------- From tony at aivru.sheffield.ac.uk Tue Nov 19 12:20:44 1991 From: tony at aivru.sheffield.ac.uk (Tony_Prescott) Date: Tue, 19 Nov 91 17:20:44 GMT Subject: connectionist map-building Message-ID: <9111191720.AA01072@aivru> Does anyone know the whereabouts of Martin Snaith of the Information Technology group? He appeared on a BBC Equinox program recently describing a mobile robot that navigates using a connectionist map. I would also be interested in hearing from anyone else who is using networks to generate maps for robot navigation. Thanks, Tony Prescott (AI vision research unit, Sheffield). From P.Refenes at cs.ucl.ac.uk Tue Nov 19 08:21:25 1991 From: P.Refenes at cs.ucl.ac.uk (P.Refenes@cs.ucl.ac.uk) Date: Tue, 19 Nov 91 13:21:25 +0000 Subject: Subtractive network design Message-ID: I have just realised that the first part of my earlier message on this was garbled.
The first few sentences read: =============================== I agree with Scott Fahlman on this point. Both techniques try to reduce the generality of a network and thus improve its generalisation. Depending on size and training times, fixed geometry networks often develop (near-) duplicate and/or (near-) redundant functionality. Pruning techniques aim to remove this functionality from the network and they do quite well here. There are however two problems: firstly, these are not the only cases of increased functionality, and secondly, the removal of near-zero connections often ignores the knock-on effects on generalisation due to the accumulated influence that these connections might have. It is often conjectured that hidden unit size is the culprit for bad generalisation. This is not strictly so. The true culprit is the high degree of freedom in exploring the search space, which also depends on other parameters such as training times. The solution proposed by Scott Fahlman, i.e. to use the cross-validation performance as an indicator of when to stop, is not complete, because as soon as you do this the cross-validation dataset becomes part of the training dataset (the fact that we are not using it for the backward pass is irrelevant). So any improvement in generalisation is probably due to the fact that we are using a larger training dataset (again, the fact that we are doing it manually should not divert us). My view is that this method should be treated as a "good code of professional practice" when reporting results, rather than as a panacea. Paul Refenes From hinton at ai.toronto.edu Tue Nov 19 16:28:12 1991 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Tue, 19 Nov 1991 16:28:12 -0500 Subject: Subtractive network design In-Reply-To: Your message of Mon, 18 Nov 91 13:09:31 -0500. Message-ID: <91Nov19.162826edt.530@neuron.ai.toronto.edu> You say "The solution proposed by Scott Fahlman, i.e. to use the cross-validation performance as an indicator of when to stop, is not complete, because as soon as you do this the cross-validation dataset becomes part of the training dataset ... So any improvement in generalisation is probably due to the fact that we are using a larger training dataset." I think this is wrong because you only get a single number (when to stop training) from the validation set. So even if you made the validation set contain infinitely many cases, you would still be limited by the size of the original training set. Quite apart from this point, pruning techniques such as the soft-weight sharing method recently advertised on this net by Steve Nowlan and me (Pearlmutter, 1999) seem to work noticeably better than using a validation set to decide when to stop training. However, the use of a validation set is much simpler and therefore a good thing to try for people in a hurry. Geoff From tackett at ipld01.hac.com Tue Nov 19 16:59:44 1991 From: tackett at ipld01.hac.com (Walter Alden Tackett) Date: Tue, 19 Nov 91 13:59:44 PST Subject: Patents Message-ID: <9111192159.AA06000@ipld01.hac.com> > In fact, I have recently patented the usual algorithm for > multiplication of numbers in any base, I am now in the process of > patenting algorithms for addition, subtraction and division > Luis B. Almeida ....Sorry pal: these algorithms were previously published in the public domain, e.g., in elementary school texts and the like, for the past couple of hundred years. And even if you did patent them, we connectionists would just get bootleg copies published in mainland China.
;-) -wt From Dave_Touretzky at DST.BOLTZ.CS.CMU.EDU Tue Nov 19 23:10:34 1991 From: Dave_Touretzky at DST.BOLTZ.CS.CMU.EDU (Dave_Touretzky@DST.BOLTZ.CS.CMU.EDU) Date: Tue, 19 Nov 91 23:10:34 EST Subject: making CONNECTIONISTS a moderated newsgroup Message-ID: <16991.690610234@DST.BOLTZ.CS.CMU.EDU> The CONNECTIONISTS list has grown too large to continue its current mode of operation. There are over 900 addresses in the CMU-maintained mailing list, and several dozen of those are actually redistribution points at various sites around the globe, rather than individuals. So we probably have a couple of thousand readers overall. I think it may be time to switch to a moderated newsgroup. The problems with the current mode of operation are: - Too many misdirected messages. The clueless will always be with us. But wading through dozens of subscribe, unsubscribe, and "please send me a copy of your tech report" messages has become tiresome. - Too much misuse of the list. Requests for people's mailing addresses, or for very elementary information about neural nets, are not appropriate for CONNECTIONISTS, but we keep getting them anyway. - It's too easy to start flamefests or off-topic discussions. - The load on the CMU mailer is incredible. There is a substantial delay in forwarding messages because we have to send out 900 copies of each one. Dozens of these bounce back due to temporary network outages, changes to host names, accounts being shut down at end of term, etc. - The load on the list maintainer(s) is increasing. Most of the time is now spent dealing with bounced mail messages. I propose converting CONNECTIONISTS to a moderated Usenet newsgroup. The moderators will review each message and forward only those that meet the stated criteria for appropriateness. The idea is to keep the list focused on informed technical discussion, plus relevant announcements of conferences, technical reports, and the like. Messages that the moderators deem inappropriate will be rejected. Note that there is already a Usenet newsgroup called comp.ai.neural-nets. This newsgroup is not moderated, and therefore has a very low signal to noise ratio. In other words, it's mostly junk. Messages that aren't appropriate for CONNECTIONISTS can always be sent there, where they will no doubt be eagerly read by thousands of people. For those readers who don't have Usenet access, we will continue to maintain a small mailing list here at CMU so you can continue to get CONNECTIONISTS by email. Most of you do have access to Usenet, and so the only two changes you should observe if we go ahead with this proposal are: (1) the signal to noise ratio will be greatly improved, and (2) you will have to use your system's newsreading software rather than email software to read and/or post to CONNECTIONISTS. **************** We are soliciting comments on this proposed change. Please send them to Connectionists-Request at cs.cmu.edu, where they will be collected by the list maintainer. Don't bother sending brief "yes" votes; we expect that most people already support this plan. But if you want to argue against the plan, or raise technical issues we may have overlooked, then by all means send us your comments. * Do not send them to me directly. Send them to Connectionists-Request. * Do not reply to the whole CONNECTIONISTS list... or else! 
-- Dave Touretzky and Hank Wan From thodberg at NN.MEATRE.DK Wed Nov 20 06:14:16 1991 From: thodberg at NN.MEATRE.DK (Hans Henrik Thodberg) Date: Wed, 20 Nov 91 12:14:16 +0100 Subject: Subtractive network design Message-ID: <9111201114.AA03016@nn.meatre.dk.meatre.dk> Scott_Fahlman writes (Mon Nov 18) "A subtractive scheme can also lead to a network of about the right complexity, and you cite a couple of excellent studies that demonstrate this. But I don't see why these should be better than additive methods" Well, I agree that what is needed is comparative studies of additive and subtractive methods, so if anybody out there has this, please post it! Meanwhile, I believe that one can get some understanding by appealing to pictures and analogies: My general picture of the mechanisms of a subtractive network design is the following: A network which has learned the training data and is too large is still rather unconstrained. The network is flexible towards rearranging its internal representations in response to some external pressure. This "polymorphic soup" is now subjected to the pruning. My favourite pruning technique is brute-force and efficient (but also time-consuming). It removes one connection at a time tentatively. If the error after some retraining is no worse than before (apart from a small allowable error increase), the connection is considered pruned. Otherwise the network state prior to the removal is reestablished. This gradually forces the network to collapse into simpler networks. It is like an annealing process. By approaching the minimal solution from "above", i.e. from the more complicated networks, one is more likely to find the optimal network, since one is guided by the hopefully wide basin of attraction. Since the basin does not cover everything, one must train and prune with new initial weights/topology (see Int. Journ. Neur. Syst. for more details). An additive method does not have this nice pool of resources cooperating in a plastic manner. Suppose that you were to develop the first car in the world by additive methods. Adding one wheel at a time would not lead you to the Honda Civic, because a one- or two-wheeled Civic would be as bad as a zero-wheeled one. However, a twenty-wheeled polymorphic monster-car could be pruned to a Civic. Another analogy to subtractive methods is brainstorming. Out of a wild discussion, where many complicated ideas are flying through the room, a simple and beautiful solution to the problem can suddenly emerge. The additive approach would correspond to a strict analytical incremental thought process. I view the reluctance towards subtractive methods as part of the old discussion between AI and connectionism. We (certainly in Denmark) were brought up with LEGO bricks, learning that everything can be constructed from its parts. We are not used to projecting solutions out of chaos or complexity. We like to be in control, and it seems like a waste to throw away part of your model. ------------------------------------------------------------------ Hans Henrik Thodberg Email: thodberg at nn.meatre.dk Danish Meat Research Institute Phone: (+45) 42 36 12 00 Maglegaardsvej 2, Postboks 57 Fax: (+45) 42 36 48 36 DK-4000 Roskilde, Denmark ------------------------------------------------------------------ From P.Refenes at cs.ucl.ac.uk Wed Nov 20 06:13:34 1991 From: P.Refenes at cs.ucl.ac.uk (P.Refenes@cs.ucl.ac.uk) Date: Wed, 20 Nov 91 11:13:34 +0000 Subject: Subtractive network design In-Reply-To: Your message of "Tue, 19 Nov 91 16:28:12 EST."
<91Nov19.162826edt.530@neuron.ai.toronto.edu> Message-ID: You point out (quite correctly) that the validation set only gives a single number. Now, suppose we have a dataset of k training vectors. We divide this dataset into two subsets (N, M) of sizes n, m such that n+m=k. We use the first subset as the training set, and the second subset as the validation set. The only difference between N and M is that N is used during both passes whilst M is only used during the forward pass. My argument is that if we used M for both passes we would still get better generalisation anyway, because we have more points from which to approximate the polynomial, and more constraints to satisfy. The only case in which this is not true is when N is already sufficiently large (and representative), but this is hardly ever the case in practice. You also say: > I think this is wrong because you only get a single number > (when to stop training) from the validation set. So even if > you made the validation set contain infinitely many cases, you > would still be limited by the size of the original training > set. My conjecture is that if you used these "infinitely many cases" for both passes (starting with a small network and increasing it gradually until convergence) you would get equally good, and perhaps better, generalisation. Paul From frederic at neuretp.biol.ruu.nl Wed Nov 20 08:45:31 1991 From: frederic at neuretp.biol.ruu.nl (frederic@neuretp.biol.ruu.nl) Date: Wed, 20 Nov 91 14:45:31 +0100 Subject: Patents Message-ID: <911120.144531.1852@neuretp.biol.ruu.nl> > > > > I'm not sure if Luis B. Almeida's message is just a joke or something >else; in that case, I think I can patent the wheel. This is one of the most >used things in the world, and not only by computer scientists, but by >everybody, so.... I must check first if it is patented and, if it is not, I can >act... > Anyway, I don't know about laws, but I think that nobody should be >able to patent something (algorithm or not) previously used by other people, >who didn't patent it. > > Ignacio Bellido I am glad that someone else thinks the same thing. It is either a joke, or something strange is going on. As I remember what I have heard about patent law, you cannot patent something that has been in the so-called 'public domain', i.e. in general public usage (at least in the US). It must be something that is new and original, as well as useful, or a useful extension of a previously patented idea. There is also a condition that it not be 'obvious' to an expert in the field (which is a bit fuzzy, I think). Since the algorithms referred to in Almeida's message are in general 'public' use, I don't think that they would pass inspection by the patent department. If Mr. Almeida would clarify why the above reasoning is wrong, or the conditions for patentability are not what I remember, I would be grateful. >============================================================================ >Ignacio Bellido Fernandez-Montes, >============================================================================ Eric Fredericksen frederic at cs.unc.edu Dept. of Computer Science, UNC-Chapel Hill frederic at neuretp.biol.ruu Dept. of Neuroethology, Rijksuniveriteit Utrecht, The Netherlands From lba at sara.inesc.pt Wed Nov 20 13:18:23 1991 From: lba at sara.inesc.pt (Luis B.
Almeida) Date: Wed, 20 Nov 91 17:18:23 -0100 Subject: Patents Message-ID: <9111201818.AA29024@sara.inesc.pt> From at risc.ua.edu Wed Nov 20 14:47:35 1991 From: at risc.ua.edu (@risc.ua.edu) Date: Wed, 20 Nov 91 13:47:35 CST Subject: GA VFSR paper Message-ID: <9111201947.AA01862@galab2.mh.ua.edu> (the following letter was sent to B. Rosen and L. Ingber, but I dropped a copy to connectionists as well, since I felt it might be of interest) Hi, I just finished reading your interesting paper on GA vs. VFSR. You may be interested in a technical report I distribute: Goldberg, D. E. (1990). A note on Boltzmann tournament selection for genetic algorithms and population-oriented simulated annealing (TCGA Report No. 90003). This paper deals with the combination of an SA-like selection procedure with a GA. Although the implementation is rather rough, the idea is provocative. Ultimately, SA is what GA researchers would view as a selection algorithm, one that should be able to complement, rather than compete with, GAs. This is an interesting area for future research, although I've never been able to get around to experimenting with these ideas. I'd be glad to send you a copy of the report. If it happens to spark any ideas, I'd love to get in on them. BTW, the GA selection scheme you use (roulette wheel selection) is known to be very noisy, and is not generally used in modern GAs. See: @article{Baker:87, author = "Baker, J. E.", year = "1987", title = "Reducing Bias and Inefficiency in the Selection Algorithms", journal = "Proceedings of the Second International Conference on Genetic Algorithms", pages = "14--21"} Goldberg, D. E., & Deb, K. (1990). A comparative analysis of selection schemes used in genetic algorithms (TCGA Report No. 90007). It would also be interesting to examine more problems, since even DeJong has criticized the concentration of the GA community on his test suite. It would probably be good to consider problems that are constructed with various degrees of deception. See the following papers: Goldberg, D. E. (1988a). Genetic algorithms and Walsh functions: Part I, a gentle introduction (TCGA Report No. 88006). Goldberg, D. E. (1989). Genetic algorithms and Walsh functions: Part II, deception and its analysis (TCGA Report No. 89001). Goldberg, D. E. (1986). Simple genetic algorithms and the minimal, deceptive problem (TCGA Report No. 86003). Take Care, Rob Smith. ------------------------------------------- Robert Elliott Smith Department of Engineering of Mechanics Room 210 Hardaway Hall The University of Alabama Box 870278 Tuscaloosa, Alabama 35487 <> @ua1ix.ua.edu:rob at galab2.mh.ua.edu <> (205) 348-1618 <> (205) 348-8573 ------------------------------------------- From lss at compsci.stirling.ac.uk Wed Nov 20 07:01:29 1991 From: lss at compsci.stirling.ac.uk (Dr L S Smith (Staff)) Date: 20 Nov 91 12:01:29 GMT (Wed) Subject: No subject Message-ID: <9111201201.AA18333@uk.ac.stir.cs.tugrik> Subject: Flames on patents. Can I suggest that what will happen if people patent known and published algorithms is that patenting will simply fall into disrepute? Companies and private individuals will ignore patent law altogether. And the ONLY result will be even more joy for lawyers. I apologise for using bandwidth on this irrelevance.
--Leslie Smith From rr at cstr.edinburgh.ac.uk Wed Nov 20 14:10:46 1991 From: rr at cstr.edinburgh.ac.uk (Richard Rohwer) Date: Wed, 20 Nov 91 19:10:46 GMT Subject: Patents Message-ID: <5394.9111201910@cstr.ed.ac.uk> Connectionists is probably not an appropriate forum for flaming about software patents, but I think that many connectionists feel strongly about it, so I would like to suggest a different forum, especially to people living in EEC countries. This is the European League for Programming Freedom list (contact elpf-request at castle.ed.ac.uk). This list is not meant for massive flaming, actually, but is intended for discussion and coordination of letter-writing and press campaigns. You can take positive, useful action even if you can only spare an hour or two to write a few letters. Right now this group is mainly working to soften the implementation by each EEC government of an EEC directive which threatens to pave the way for "Look and Feel" suits in Europe similar to those in the US. Richard Stallman (of GNU fame, rms at edu.mit.ai.gnu) is an important influence in this campaign. He probably can put Americans in touch with similar groups there. Richard Rohwer From thildebr at athos.csee.lehigh.edu Thu Nov 21 09:19:48 1991 From: thildebr at athos.csee.lehigh.edu (Thomas H. Hildebrandt ) Date: Thu, 21 Nov 91 09:19:48 -0500 Subject: Patents In-Reply-To: "Luis B. Almeida"'s message of Wed, 20 Nov 91 17:18:23 -0100 <9111201818.AA29024@sara.inesc.pt> Message-ID: <9111211419.AA12296@athos.csee.lehigh.edu> Dr. Almeida: My hat is off to you for your extremely dry wit! A good measure of wryness is the number of people you fool completely, of whom those bold enough to post to the net are probably only a small fraction. I myself was fooled while reading the first 3 or 4 lines. . . . I thought about posting a message saying that I had applied for a patent on the process of constructing a neuron -- with a sufficient admixture of legal and neurobiological jargon to sound convincing. But in light of the recent (orthogonal) discussion regarding the nature of postings which are acceptable on CONNECTIONISTS, I was forced to reconsider. Even so, I would only have been emulating a master. Bravo! Thomas H. Hildebrandt From sun at umiacs.UMD.EDU Thu Nov 21 10:59:52 1991 From: sun at umiacs.UMD.EDU (Guo-Zheng Sun) Date: Thu, 21 Nov 91 10:59:52 -0500 Subject: con etiquete Message-ID: <9111211559.AA28328@neudec.umiacs.UMD.EDU> I agree with Ken that this network should not be a place to broadcast private conversations. Guo-Zheng Sun From shavlik at cs.wisc.edu Thu Nov 21 15:19:02 1991 From: shavlik at cs.wisc.edu (Jude Shavlik) Date: Thu, 21 Nov 91 14:19:02 -0600 Subject: validation sets Message-ID: <9111212019.AA07223@steves.cs.wisc.edu> The question of whether or not validation sets are useful can easily be answered, at least on specific datasets. We have run that experiment and found that devoting some training examples to validation is useful (i.e., training on N examples does worse than training on N-k and validating on k). This same issue comes up with decision-tree learners (where the validation set is often called a "tuning set", as it is used to prune the decision tree). I believe people there have also found it useful to devote some examples to pruning/validation. I think there is also an important point about "proper" experimental methodology lurking in the discussion. If one is using N examples for weight adjustment (or whatever kind of learning one is doing) and also uses k examples for selecting among possible final answers, one should report that the testset accuracy resulted from N+k training examples. Here's a brief argument for counting the validation examples just like "regular" ones. Let N=0 and let k be large. Randomly guess some very large number of answers, and return the answer that does best on the validation set. Most likely the answer returned will do well on the testset (and all we ever got from the tuning set was a single number). Certainly our algorithm didn't learn from zero examples! Jude Shavlik University of Wisconsin shavlik at cs.wisc.edu
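Shavlik's thought experiment (let N=0, randomly guess a very large number of answers, and return the one that does best on the validation set) is easy to reproduce on a toy problem. The Python sketch below uses an invented one-dimensional threshold task; it illustrates the argument only and is not drawn from the experiments Shavlik or Refenes report.

    import random

    random.seed(0)

    def make_data(n):
        # Toy task, invented for this example: x is uniform on [0, 1], label is 1 iff x > 0.6.
        return [(x, int(x > 0.6)) for x in (random.random() for _ in range(n))]

    def accuracy(threshold, data):
        return sum((x > threshold) == bool(y) for x, y in data) / len(data)

    valid = make_data(50)       # the k validation examples; N = 0 training examples
    test = make_data(5000)

    # "Randomly guess some very large number of answers" -- here, candidate thresholds --
    guesses = [random.random() for _ in range(10000)]
    # -- and return the one that does best on the validation set.
    best = max(guesses, key=lambda t: accuracy(t, valid))

    print("chosen threshold:", round(best, 3))
    print("test accuracy   :", round(accuracy(best, test), 3))
    # The returned answer does well on the test set even though no "training" examples
    # were used, which is the point: the k validation examples did real learning work,
    # so they belong in the count of examples the method consumed.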
From lissie!botsec7!botsec1!dcl at uunet.UU.NET Thu Nov 21 14:58:26 1991 From: lissie!botsec7!botsec1!dcl at uunet.UU.NET (David Lambert) Date: Thu, 21 Nov 91 14:58:26 EST Subject: best output representation? Message-ID: <9111211958.AA11111@botsec1.bot.COM> Hi all. Let's say you have a classification problem that can be answered with one of three mutually exclusive alternatives (e.g., yes, no, don't care). Of the following output representations:

1). One output value:

    answer   output
    ----------------
      +1       +1     (yes)
       0        0     (don't care)
      -1       -1     (no)

2). Two output values:

    answer   output1   output2
    ----------------------------
      +1       +1        -1
       0       -1        -1
      -1       -1        +1

3). Three output values:

    answer   output1   output2   output3
    --------------------------------------
      +1       +1        -1        -1
       0       -1        -1        +1
      -1       -1        +1        -1

which is best, and for what reason? Thanks. David Lambert dcl at botsec7.panix.com (best) dcl at object.com dcl at panix.com From MCCAINKW at DUVM.OCS.DREXEL.EDU Fri Nov 22 07:55:25 1991 From: MCCAINKW at DUVM.OCS.DREXEL.EDU (kate McCain) Date: Fri, 22 Nov 91 08:55:25 EDT Subject: BBS Message-ID: I am trying to locate a neural-networks-related computer conference. Have I reached one? I have a "meta-interest" in neural networks research -- stemming from my current research devoted to understanding the formal and informal communication structure, subject diversity, etc. in the field. One of the goals of our research is an understanding of the information needs and access problems faced by NN researchers. Contact with a wider range of participants than I have access to in Philadelphia would be valuable. Kate McCain Associate Professor College of Information Studies Drexel University Philadelphia, PA 19104 From kirk at watson.ibm.com Fri Nov 22 09:17:42 1991 From: kirk at watson.ibm.com (Scott Kirkpatrick) Date: Fri, 22 Nov 91 09:17:42 EST Subject: NIPS ticket (LGA-DEN-LGA) Message-ID: A change in travel plans is forcing me to discard a perfectly good, but non-refundable, ticket to NIPS. Can anybody use it?

    12/2  UA 343  lv LGA 8:40 am  ar DEN 11:07 am  (in time for two tutorials)
    12/8  UA 166  lv DEN 6:29 pm  ar LGA 11:59 pm

This cost me $260. From tap at ai.toronto.edu Fri Nov 22 14:18:26 1991 From: tap at ai.toronto.edu (Tony Plate) Date: Fri, 22 Nov 1991 14:18:26 -0500 Subject: Flames on patents. In-Reply-To: Your message of Wed, 20 Nov 91 07:01:29 -0500. <9111201201.AA18333@uk.ac.stir.cs.tugrik> Message-ID: <91Nov22.141837edt.569@neuron.ai.toronto.edu> Some people appear to be concerned that known and published algorithms will be patented. This concern is mostly misplaced. There are 3 requirements that an invention must satisfy in order to be patentable: (1) novelty (the invention must be new); (2) utility (it must be useful); (3) non-obviousness (to a person versed in the appropriate art). A patent can be invalidated by the existence of "prior art".
Any version of an algorithm, used or published before the date of the patent application, constitutes "prior art". A problem with proving the existence of prior art via implementation is that computer programs get deleted, especially old ones that ran on systems no longer in existence. With published algorithms there is no such problem. Requirement (3) is particularly contentious in the field of patents on algorithms. Someone writes: > I apologise for using bandwidth on this irrelevance. I apologise too, and suggest that people interested in this issue use the newsgroup "comp.patents" for further discussion. There has been much information posted in this newsgroup, including lists of software patents, and requests for examples of prior art. Tony Plate From siegel-micah at CS.YALE.EDU Fri Nov 22 16:33:07 1991 From: siegel-micah at CS.YALE.EDU (Micah Siegel) Date: Fri, 22 Nov 91 16:33:07 EST Subject: Analog VLSI mailing list Message-ID: <9111222133.AA18480@SUNED.ZOO.CS.YALE.EDU> *** Please DO NOT forward to other newsgroups or mailing lists *** ANNOUNCING the genesis of a mailing list devoted to the study of analog VLSI and neural networks. Relevant topics will include the instantiation of neural systems and other collective computations in silicon, analog VLSI design issues, analog VLSI design tools, tech report announcements, etc. The analog-vlsi-nn mailing list has been created in conjunction with Yale University and its planned Center for Theoretical and Applied Neuroscience (CTAN). Please send subscription requests to analog-vlsi-nn-request at cs.yale.edu. To limit the analog-vlsi-nn mailing list to active researchers in the field, the following information must be provided with subscription requests: 1) Full name; 2) Email address; 3) Institutional affiliation; 4) One-sentence summary of current research interests. Please direct mail to the appropriate address. Mailing list submissions (only): analog-vlsi-nn at cs.yale.edu Administrative requests/concerns: analog-vlsi-nn-request at cs.yale.edu --Micah Siegel Analog VLSI NN Moderator ======================================================================= Micah Siegel "for life's not a paragraph siegel at cs.yale.edu Yale University And death i think is no parenthesis" e.e.cummings ======================================================================= From zl at guinness.ias.edu Fri Nov 22 16:56:17 1991 From: zl at guinness.ias.edu (Zhaoping Li) Date: Fri, 22 Nov 91 16:56:17 EST Subject: No subject Message-ID: <9111222156.AA12609@guinness.ias.edu> POSTDOCTORAL POSITIONS IN COMPUTATIONAL NEUROSCIENCE AT ROCKEFELLER UNIVERSITY, NEW YORK We anticipate the opening of one or two positions in computational neuroscience at Rockefeller University. The positions are at the postdoctoral level for one year, starting in September 1992, with the possibility of renewal for a second year. Interested applicants should send a CV including a statement of their current research interests, and arrange for three letters of recommendation to be sent as soon as possible directly to Prof. Joseph Atick, Institute for Advanced Study, Princeton, NJ 08540. (Note that applications are to be sent to the Princeton address.) From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Nov 22 21:17:46 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 22 Nov 91 21:17:46 EST Subject: best output representation? In-Reply-To: Your message of Thu, 21 Nov 91 14:58:26 -0500.
<9111211958.AA11111@botsec1.bot.COM> Message-ID: Hi all. Let's say you have a classification problem that can be answered with one of three mutually exclusive alternatives (eg, yes, no, don't care) Of the following output representations: ... which is best, and for what reason? I would say that the correct answer is "none of the above". What you want to do is use a single output, with +1 meaning "yes" and -1 meaning "no". Take all the "don't care" cases out of the training set, because you don't care what answer they give you. (And you may as well take them out of the test set as well, since they will always be correct.) This will make training faster because (a) the training set is smaller and (b) it allows the net to concentrate all its resources on getting the "do care" cases right. -- Scott Fahlman From zl at guinness.ias.edu Sat Nov 23 14:46:38 1991 From: zl at guinness.ias.edu (Zhaoping Li) Date: Sat, 23 Nov 91 14:46:38 EST Subject: No subject Message-ID: <9111231946.AA13273@guinness.ias.edu> POSTDOCTOROAL POSITIONS IN COMPUTATIONAL NEUROSCIENCE AT ROCKEFELLER UNIVERSITY, NEW YORK We anticipate the opening of one or two positions in computational neuroscience at Rockefeller University. The positions are at the postdoctoroal level for one year, starting in September 1992, with the possibility of renewal for a second year. Interested applicants should send a CV including a statement of their current research interests, and arrange for three letters of recommendation to be sent as soon as possible directly to Prof. Joseph Atick, Institute for Advanced Study, Princeton, NJ 08540. (Note that applications are to be sent to the Princeton address.) From gmk%idacrd at UUNET.UU.NET Sat Nov 23 16:58:27 1991 From: gmk%idacrd at UUNET.UU.NET (Gary M. Kuhn) Date: Sat, 23 Nov 91 16:58:27 EST Subject: Copenhagen: 1992 IEEE Workshop on NN for SP Message-ID: <9111232158.AA05282@> 1992 IEEE Workshop on Neural Networks for Signal Processing. August 31 - September 2, 1992 Copenhagen, Denmark In cooperation with the IEEE Signal Processing Society and sponsored by the Computational Neural Network Center (CONNECT) CALL FOR PAPERS The second of a series of IEEE workshops on Neural Networks for Signal Processing, the first of which was held in Princeton in October 1991, will be held in Copenhagen, Denmark, in August 1992. Papers are solicited for technical sessions on the following topics: System Identification and Spectral Estimation by Neural Networks. Non-linear Filtering by Neural Networks. Pattern Learning Theory and Algorithms. Application-Driven Neural Models. Application to Image Processing and Pattern Recognition. Application to Speech Recognition, Coding and Enhancement. Application to Adaptive Array Processing. Digital/Analog Systems for Signal Processing. Prospective authors are invited to submit 4 copies of extended summaries of no more than 5 pages. The top of the first page of the summary should include a title, authors' names, affiliations, address, telephone and fax numbers, and email address if any. Photo-ready full papers of accepted proposals will be published in a hard bound book by IEEE. General chairs S.Y. Kung Frank Fallside Department of Electrical Engineering Engineering Department Princeton University Cambridge University Princeton, NJ 08544, USA Cambridge CB2 1PZ, UK email: kung at princeton.edu email: fallside at eng.cam.ac.uk Program chair Proceedings Chair John Aasted Sorensen Candace Kamm Electronics Institute, Bldg. 
349 Box 1910 Technical University of Denmark Bellcore, 445 South St., Rm. 2E-256 DK-2800 Lyngby, Denmark Morristown, NJ 07960-1910, USA email: jaas at dthei.ei.dth.dk email: cak at thumper.bellcore.com Program Committee Ronald de Beer Jeng-Neng Hwang John E. Moody John Bridle Yu Hen Hu Carsten Peterson Erik Bruun B.H. Juang Sathyanarayan S. Rao Poul Dalsgaard S. Katagiri Peter Salamon Lee Giles Teuvo Kohonen Christian J. Wellekens Lars Kai Hansen Gary M. Kuhn Barbara Yoon Steffen Duus Hansen Benny Lautrup John Hertz Peter Koefoed Moeller Paper submissions and further information: Program Chair Tel: +4545931222 ext. 3895, Fax: +4542880117 Submission of extended summary February 15, l992 Notification of acceptance April 20, l992 Submission of photo-ready paper May 20, l992 From sg at corwin.CCS.Northeastern.EDU Sat Nov 23 17:01:28 1991 From: sg at corwin.CCS.Northeastern.EDU (steve gallant) Date: Sat, 23 Nov 91 17:01:28 -0500 Subject: Patent Fallacies Message-ID: <9111232201.AA02257@corwin.CCS.Northeastern.EDU> The issue of patents seems to have struck somewhat of a raw nerve, so I submit the following list of common fallacies about patents. (Warning: I am not a patent lawyer, so you should consult one before accepting any of the following as legal gospel.) 1. You can patent anything not yet patented / not in the public domain. A patent must be non-obvious to those "skilled in the art" at the time in question. It is also the responsibility of the applicant to submit all known applicable "prior art" regarding the proposed patent. In other words things that most of us know how to do are not properly patentable, regardless of whether they are in the "public domain" (a technical term). 2. Patents prevent free exchange of information. Patents are designed to ENCOURAGE free exchange of information. The basic deal is that if you teach the world something useful, you will be given a large amount of control over the usage of your invention for a limited time, after which everybody will be able to freely use it. It is important to consider the alternative to having patents, namely trade secrets. Anybody opposed to patents on principle should be able to say why they prefer having the information be a trade secret, with no knowledge/access by anybody outside the organization. 3. Patents hinder research. Everybody immediately has knowledge of patented information; trade secrets remain secret. I believe that a recent Supreme Court decision has ruled that patents cannot be used to prevent basic research.(?) 4. You cannot talk about your invention before filing a patent. For US patents, you can file up to 1 year after disclosing your invention. This rule does not apply to foreign patents, but they are so expensive and such a hassle that you should be especially careful (and have very deep pockets) before going down that path. Thus you can tell the world about your method, and still have a year to file for a US patent. 5. Patents make money. The majority of patents granted do not result in the inventor making back legal costs and filing fees. An application that is not granted is a clear loss. 6. Patents favor big corporations. This is a debatable point. If there were no patent protection, anybody who invented something would be giving that invention to whoever wanted to use it -- in many cases, only big corporations would profit from this. On the other hand, patents give the individual researcher some compensation for, and control over, his or her invention. (This has been very useful in my case.) 7. 
Software / algorithm (process) patents are different than other patents. Another debatable point. If one can get a patent on mixing chemical A with chemical B to make AB, a good fertilizer, how is this different than adding number A to number B + to factor number X more quickly than had been previously possible? It is hard to come up with an issue that applies to patenting software that does not apply to other types of patents. Of course there are some good arguments against software / algorithm (process) patents. It does seem to be true that the patent office is getting over their heads with a lot of this, and therefore letting things slip through that should not be allowed patents. However, this problem is also not unique to software. The above list reflects dozens of hours of working with patent lawyers, but the reader is again cautioned that I am not a patent lawyer. The first rule you should follow when considering patenting something is to consult a patent lawyer. By the way, they tend to be interesting people, with the challenging job of taking very technical information in a variety of fields, understanding it, and turning it into legal-speak. (Patent examiners have even more challenging jobs, the most famous one having been Albert Einstein.) Steve Gallant From ken at cns.caltech.edu Sat Nov 23 22:59:48 1991 From: ken at cns.caltech.edu (Ken Miller) Date: Sat, 23 Nov 91 19:59:48 PST Subject: con etiquete: the struggle continues (or: conetiquete vs. tek sass) Message-ID: <9111240359.AA00211@cns.caltech.edu> A quick summary of some of the responses I've received to my con etiquete note. Though Dave Touretzky's note about making this a moderated newsgroup partially supersedes this, the question remains as to what will be the criteria for acceptable discussions there or here. I got 9 notes saying "thank you/I agree", 1 saying "I disagree with almost everything you say". Other notes brought up new issues, as follows: Four people (dave at cogsci.indiana.edu,tgelder at phil.indiana.edu, thildebr at athos.csee.lehigh.edu,rick at csufres.CSUFresno.EDU), at least two themselves philosophers, opposed my "technical not philosophical" distinction. I responded Maybe the dichotomy philosophical vs. technical was the wrong one. How would you all feel if I had instead made the dichotomy one between speculation (or opinion) and knowledge? Anyone can have their opinion, but after it's expressed nothing has been changed. Nature alone knows the outcome. Whereas an opinion based on real experience or analysis (and analysis I presume could include hard philosophical analysis) is quite worthwhile. So I am talking about setting the threshold much higher between speculation and knowledge, the former being raw and the latter distilled. I should have known that wouldn't make it past the philosophers: Knowledge is interesting, but often too much distilled to carry the insight of the imparter. I think the real distinction we are seeking is the dichotomy between unsupported hypotheses and supported ones. ... I think what you are getting at is that we have a built-in credibility meter, and that its value for some recent postings to the net has been incredibly (pun intended) low. You would like to ask people to apply their own judgement to potential postings (as to how well the idea is supported) before taxing our credibility filters with the same task. thildebr at athos.csee.lehigh.edu The distinction between speculation or opinion and knowledge is not a pragmatically useful one. 
How do you know whether your opinions amount to "knowledge" and are therefore legitimately expressible on the network? ... As far as I can see, the only relevant distinction in the vicinity is that between carefully thought-out, well-grounded opinion vs raw, crudely-formed, not-well-grounded opinion. If what you really want to say is that we only want the former on the network, then I am in agreement. tgelder at phil.indiana.edu Scott Fahlman (Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU) "certainly agree(s) with setting the threshold higher", but emphasizes the importance to him of opinions of other practitioners: I think that if someone who has thought hard about an issue is willing to share their opinions, that's about the most valuable stuff that goes by. We can get the facts from published work, but in a field like this, opinion is all we have in many cases. The technology of neural nets is more art than science, and it's largely an unwritten art. And sometimes the conventional opinion is wrong or incomplete. If we can expose some of these opinions, and someone can show that they are wrong (or offer good arguments against them), then we all learn something useful. Maybe what we learn is that opinion is divided and it's time to go do some experiments, but that's a form of knowledge as well. What I'd like to ban or strongly discourage are (a) statements of opinion by people who have not yet earned the right to an opinion, (b) any message composed in less than 30 minutes per screenful, and (c) iteration to the point where people end up repeating themselves, just to get the last word in. I don't want to leave your message out there without any rebuttal, because if people were to take it as a consensus statement of etiquette, we'd lose a lot of the discussion that I feel is quite valuable. If there really is widespread support for your position, then the right move is to split the list into an announcement version and a discussion version. ashley at spectrum.cs.unsw.OZ.AU (Ashley Aitken) seems to take a similar position with some different arguments: I understand your point of view with regard "conversation" and "philosophy" but strongly disagree with raising the threshold too much. Let met tell you my situation. I am a graduate student at the University of New South Wales in Sydney, Australia. I am researching biological neural network models of the cortex. Unfortunately, there is no-one else really interested in anything like this in Australia (that I know of). ... Connectionists can give me the chance to listen in to "conversations" between some of the leading researchers in the field. This to me is extremely valuable - it provides an idea of where the research is, where it is going, and also provides a great deal of motivation. I don't think I could keep in touch with the research without it - connectionists is, in a way, my research group*. Sure we don't want to get into silly "philosophical" discussions which lead nowhere (like the ones that appear regularly in the news groups). However, there is a thin line between philosophy of today and research areas and theories of tomorrow. So: I would say there is general but not complete agreement that (1) the threshold has been too low lately, and (2) postings should be well-grounded and supported. 
The main disagreement seems to be how much room that leaves for experienced people to be offering their otherwise unsupported opinion, or contrariwise to what extent people --- including experienced people --- should restrict themselves to points on which they have specific analysis or experience/experiment to offer. We probably can't resolve these issues except by fiat of the list administrators. I hope that raising the issues, and assuming that everyone will think carefully about them before posting, will increase the signal to noise ratio. On the positive side, *nobody* disagreed that dialogues should take place off the net. Ken From harnad at Princeton.EDU Sun Nov 24 01:07:37 1991 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 24 Nov 91 01:07:37 EST Subject: Discussion I: Reading & Neural Nets Message-ID: <9111240607.AA10474@clarity.Princeton.EDU> Here is the first of two exchanges concerning the Target Article on Reading and Connectionism that appeared in PSYCOLOQUY 2.8.4 (retrievable by anonymous ftp from directory pub/harnad on princeton.edu). Further commentary is invited. All contributions will be refereed. Please submit to psyc at pucc.bitnet or psyc at pucc.princeton.edu -- NOT TO THIS LIST. PSYCOLOQUY V2 #9 (2.9.3 Commentary / Coltheart, Skoyles: 345 lines) PSYCOLOQUY ISSN 1055-0143 Sun, 24 Nov 91 Volume 2 : Issue 9.3 2.9.3.1 Commentary on Connectionism, Reading.... / Coltheart 2.9.3.2 Reply to Coltheart / Skoyles ---------------------------------------------------------------------- From: max.coltheart at mrc-applied-psychology.cambridge.ac.uk Subject: 2.9.3.1 Commentary on Connectionism, reading.... / Coltheart Connectionist modeling of human language processing: The case of reading (Commentary on Skoyles Connectionism, Reading and the Limits of Cognition PSYCOLOQUY 2.8.4 1991) Max Coltheart School of Behavioural Sciences Macquarie University Sydney, NSW, Australia max.coltheart at mrc-applied-psychology.cambridge.ac.uk Skoyles (1991) wrote in his Rationale (paragraph 8): "Connectionism shows that nonword reading can be done purely by processes trained on real words without the use of special grapheme-phoneme translation processes." This is not the case. The connectionist model in question, that of Seidenberg & McClelland (1989), reads nonwords very poorly after being trained on words. Besner, Twilley, McCann and Seergobin (1990) tested its reading of various sets of nonwords. The trained model got 51%, 59% and 65% correct; people get around 90%. The Seidenberg and McClelland paper itself does not report what rate of correct reading of nonwords the model can achieve. Skoyles also writes (paragraph 3): "Connectionist (PDP) neural network simulations of reading successfully explain many experimental facts found about word recognition (Seidenberg & McClelland, 1989)" I would like to see a list of facts about reading that the PDP model can explain; even more, I would like to see a list of facts about reading that the traditional non-PDP dual-route model (which uses rules and local representations) cannot explain but which the PDP model can. Here is a list of facts which are all discussed in the Seidenberg & McClelland paper, which can be explained by the dual-route model, but which cannot be explained by the PDP model: 1. People are very accurate at reading aloud pronounceable nonwords. This is done by using grapheme-phoneme rules in a dual-route model. As I've already mentioned, the PDP model is not accurate at reading nonwords aloud, so cannot explain why people are. 
2. People are very accurate at deciding whether or not a pronounceable letter string is a real word (lexical decision task). The PDP model is very inaccurate at this: In the paper by Besner et al (1990), it is shown that the model achieves a correct detection rate of about 6% (typical of people) at the expense of a false alarm rate of over 80% (not typical of people). So the PDP model cannot explain why people are so accurate at lexical decision. 3. After brain damage in some people reading is affected in the following way: nonword reading is still normal, but many exception words, even quite common ones, are wrongly read. In addition, the erroneous responses are the ones that would be predicted from applying spelling-sound rules (e.g. reading PINT as if it rhymed with "mint"). This is surface dyslexia; two of the clearest cases are patients MP (Bub, Cancelliere and Kertesz, 1985) and KT (McCarthy and Warrington, 1986). According to the dual-route explanation the lexical route for reading is damaged but the nonlexical (rule-based) route intact. Attempts have been made to simulate this by damaging the trained PDP model (e.g., by deleting hidden units). These attempts have not succeeded. It seems highly unlikely that they ever will succeed: Since the damaged patients are about 95% right at reading nonwords, and the intact model gets only around 60% right, is it likely that any form of "lesion" to the model will make it much BETTER at reading nonwords? 4. After brain damage in some people reading is affected in the following way: Word reading is still good, but nonword reading is very bad. This is phonological dyslexia. A clear case is that of Funnell (1983); her patient could not read any nonwords at all, but achieved scores of around 90% correct in tests of word reading. The dual-route explanation would be that there was abolition of the nonlexical route and sparing of the lexical route. Seidenberg & McClelland appeal to a way (not implemented in their model) of reading from orthography through meaning to phonology. This would of course fail for a meaningless letter string, so anyone reading solely by such a route would be able to read words but not nonwords. The explanation fails, however, because in the case of phonological dyslexia referred to above (Funnell, 1983), the patient also had a semantic impairment and would have shown semantic confusions in reading aloud if he had been reading semantically. He did not make such confusions. Therefore Seidenberg and McClelland's reconciliation of phonological dyslexia with their model cannot be correct. Pinker and Prince (1988) argued that any model which eschews explicit rules and local (word or morpheme) representations would fail to explain the data on children's learning of past tenses. I argue that any model which eschews explicit rules and local (word or morpheme) representations will fail to explain the data on adult skilled reading. NETtalk (Sejnowski and Rosenberg, 1986) might be offered as a counterexample to my claim, but it will not serve this purpose. First, Sejnowski and Rosenberg explicitly state that NETtalk is not meant to be a model of any human cognitive process. Second, perhaps the major computational problem in reading nonwords aloud - coping with the fact that the mapping of letters to phonemes is often many-to-one, so that the words AT, ATE, ACHE and EIGHT all have just two phonemes - is not dealt with by NETtalk. 
The input upon which the network operates is precoded by hand in such a way that there is always a one-to-one mapping of orthographic symbol to phoneme; so NETtalk does not have to try to solve this problem. 5. References Besner, D., Twilley, L., McCann, R.S. and Seergobin, K. (1990) On the association between connectionism and data: are a few words necessary? Psychological Review, 97, 432-446. Bub, D., Cancelliere, A. and Kertesz, A. (1985) Whole-word and analytic translation of spelling to sound in a non-semantic reader. In Patterson, K., Marshall, J.C. and Coltheart, M. (eds) (1985) Surface Dyslexia: Cognitive and Neuropsychological Studies of Phonological Reading. London: Lawrence Erlbaum Associates Ltd. Funnell, E. (1983) Phonological processes in reading: New evidence from acquired dyslexia. British Journal of Psychology, 74, 159-180. McCarthy, R. and Warrington, E.K. (1986) Phonological reading: phenomena and paradoxes. Cortex, 22, 359-380. Pinker, S. and Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed model of language acquisition. Cognition, 28, 73-194. Seidenberg, M. S. and McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568. Sejnowski, T.J. and Rosenberg, C.R. (1986) NETtalk: A parallel network that learns to read aloud (EE and CS Technical Report No. JHU/EECS-86/01). Baltimore, Maryland: Johns Hopkins University. Skoyles, J. (1991) Connectionism, Reading and the Limits of Cognition. PSYCOLOQUY 2.8.4. ---------------------------------------------------------------------- From: John R Skoyles Subject: 2.9.3.2 Reply to Coltheart / Skoyles The Success of PDP and the Dual Route Model: Time to Rethink the Phonological Route (Reply to Coltheart) John R. Skoyles Department of Psychology University College London London WC1E 6BT ucjtprs at ucl.ac.uk Max Coltheart makes a good defense of the dual route model (the view that there are separate phonological and nonphonological ways of recognising written words) but he appears to overlook the fact that I am attempting to do the same thing: to defend the existence of more than one route in reading. I am going about this in a completely different manner, however, and Coltheart does not seem to have spotted either how I do this or the degree to which we agree. My strategy is to take the main alternative to the dual route model -- PDP (a single "route" connectionist [network] account of phonological and nonphonological word recognition) -- and show that even if PDP is as good as its advocates claim, it is incomplete and needs a separate reading mechanism to come into existence. The problem is that the reading abilities of PDP models need to be tutored using error correction feedback. Where this feedback comes from, however, is left out of PDP accounts of reading. In my target article (Skoyles 1991) I showed that error correction feedback can only exist if there is some process independent of the PDP network which can identify words correctly and so judge whether or not the network has read them correctly. Without this, error correction feedback, and hence the reading abilities of PDP networks, cannot occur. I further show that research on child reading and dyslexia strongly suggests that this independent recognition of written words depends in human readers upon sounding out words and accessing oral knowledge of their pronunciation.
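Skoyles's claim in the preceding paragraph -- that error-correction feedback presupposes some process, outside the network being trained, that can already identify the word -- can be made concrete with a small Python sketch. Everything in it (the oral vocabulary, the letter-sound table, the toy network guesses) is an invented placeholder; it is not code from, nor a claim about, any of the models under discussion.

    ORAL_VOCABULARY = {"kat", "dog", "sun"}          # pronunciations already known from spoken language

    LETTER_SOUNDS = {"c": "k", "a": "a", "t": "t", "d": "d",
                     "o": "o", "g": "g", "s": "s", "u": "u", "n": "n"}

    def sound_out(spelling):
        # Crude letter-by-letter phonological decoding.
        return "".join(LETTER_SOUNDS.get(ch, "?") for ch in spelling)

    def teaching_signal(spelling):
        # The independent identification step: the decoded form only becomes a usable
        # target if it matches something the reader already knows from oral vocabulary.
        candidate = sound_out(spelling)
        return candidate if candidate in ORAL_VOCABULARY else None

    def error_signal(network_guess, spelling):
        target = teaching_signal(spelling)
        if target is None:
            return None          # no independent identification -> nothing to correct against
        return network_guess != target

    print(error_signal("kat", "cat"))   # False: the network's reading matches the target
    print(error_signal("kot", "cat"))   # True:  a correctable error exists
    print(error_signal("zix", "qwv"))   # None:  no process outside the network supplies a target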
Coltheart essentially claims that I need not go so far: PDP simply cannot model the most interesting aspects of reading and so the above argument is premature. I cannot go along with his critique of PDP, although in many ways I would like to (I cannot be alone in longing for the good old days when the dual route model reigned supreme). It is not as easy as Coltheart implies to dismiss the phonological reading abilities shown by PDP networks. They do read a large number of nonwords correctly -- though Coltheart is right to note that they are not as good as skilled readers. Nonetheless, they do read some words correctly which is surprising given that PDP networks lack any specific knowledge of spelling-sound transcoding. These nonword reading skills are important even if they are not as good as those of proficient readers because we can no longer automatically assume that every time people read a nonword they do so using an independent grapheme-phoneme phonological route -- for they might instead be reading them (at least some of the time) by something like a PDP network. My disagreement with Coltheart concerns whether there are one or two kinds of phonological reading -- I suggest at least two exist. The first process is attentive (such as when you have to stop reading War and Peace to work out the pronunciation of the names of the characters). Attentive decoding depends, I suggest, upon rule-like grapheme-phoneme decoding. The second process, nonattentive phonological decoding (when you read monosyllables which happen not to be real words like VIZ), depends, I suggest, upon PDP networks. In contrast to attentive phonological decoding, nonattentive phonological decoding depends on generating phonology using the statistical regularities between spelling and pronunciation that are incidentally acquired by PDP networks when they are learning to read real words. The processes responsible for attentive and nonattentive phonological coding are independent of each other. Both attentive and nonattentive phonological decoding can produce phonological output that can be used to access the oral knowledge of word pronunciation contained in the speech system to identify words (perhaps along with with semantic and sentence context information -- see note 1). The boundary between the two forms of phonological decoding in any individual will depend upon their reading experience and their innate phonological capacities -- a five-year-old will probably only be able to read a monosyllabic nonword attentively, whereas a linguist will have no difficulty nonattentively sounding out obscure polysyllabic Russian names. My difference with Coltheart lies in our respective ways of defining the nonlexical reading route. Coltheart takes it to be a phonological route which reads nonwords through the use of explicit spelling-sound correspondence rules. I instead take it to be primarily a route using phonological decoding processes that can identify words by using the phonological information contained in word spelling to access a reader's oral knowledge of how words sound. Although nonattentive phonological processes can access oral knowledge, I suggest that this is much less likely than the use of attentive processes. If we focus on decoding a spelling to recognise the word behind its pronunciation we are more likely to adopt attentive rather than nonattentive processes as a consequence of stopping and focusing. 
Thus although we both support the existence of two reading routes, we have very different notions as to what they are. In this context, I will answer Coltheart's points one by one. I paraphrase his criticisms before describing my replies. (1) "PDP models are not very accurate at reading nonwords ... people are." As noted, people use a mix of attentive and nonattentive phonological decoding, whereas PDP networks only simulate nonattentive ones. (2) "People are very accurate at deciding whether or not a pronounceable letter string is a real word (lexical decision) ... [PDP models are not]." First, the nature of lexical decision is controversial, with some arguing that it involves access to lexical representations and others that it does not (Balota & Chumbley, 1984). In addition, in order for PDP models to simulate lexical decisions, new assumptions are added to them. PDP models are designed to give correct phonological output to a given spelling input and not to make lexical decisions. To model lexical decisions, their modelers have made the additional assumption that back activation from the hidden units to the input units reflects some measure of lexicality. This is an assumption added to the model; hence it could be this assumption as much as the model which is at fault. (3) "Some brain lesions leave people with good nonword reading abilities with damaged lexical word recognition abilities -- surface dyslexia." Fine; such people are relying upon attentive phonological "sounding out" processes; their nonattentive processes are damaged along with their lexical reading processes. (4) "Some brain lesions leave people with good lexical reading abilities with damaged phonological ones -- phonological dyslexia." Unfortunately, acquired phonological dyslexia is rather rare (Funnell's patient, whom Coltheart cites, is nearly a unique case). It is so rare that afflicted individuals might have had phonological reading problems prior to their brain damage (Van Orden, Pennington and Stone, 1990). The difference between Coltheart and myself is that whereas he collapses nonattentive and attentive phonological reading together, I separate them. Can our two positions be tested? I think they can. If I am right, skilled readers should read nonwords with two levels of performance: First, they should display a high level of competence when they are free to use attentive phonological decoding. Second, they should show a lower level of success when they attempt to read nonwords while doing a secondary task which blocks their use of attentive phonological decoding and thereby confines their nonword reading to nonattentive processes. I suggest that this lower level of performance (if it exists) is the one against which PDP simulations of nonword reading should be compared, as this should reflect only nonattentive nonword reading -- the phonological ability modeled by PDP simulations of reading. Note 1. It is possible that sentence and other contextual sources of information are used in accessing oral knowledge following phonological decoding: the hearing of words is highly context dependent and so I would expect any "inner ear" identification of words to be likewise. References Balota, D. A., & Chumbley, J. I. (1984). Where are the effects of frequency in visual word recognition tasks? Right where we said they were: Comment on Monsell, Doyle, and Haggard (1989). Journal of Experimental Psychology: General, 111, 231-237. Van Orden, G. C., Stone, G. O. & Pennington, B. F. (1990).
Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488-522. ------------------------------ PSYCOLOQUY is sponsored by the Science Directorate of the American Psychological Association (202) 955-7653 Co-Editors: Stevan Harnad, Psychology Department, Princeton University (scientific discussion); Perry London, Dean, and Cary Cherniss (Assoc. Ed.), Graduate School of Applied and Professional Psychology, Rutgers University (professional/clinical discussion). Assistant Editor: Malcolm Bauer, Psychology Department, Princeton University. End of PSYCOLOQUY Digest ****************************** From harnad at Princeton.EDU Sun Nov 24 01:12:30 1991 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 24 Nov 91 01:12:30 EST Subject: Discussion II: Reading & Neural Nets Message-ID: <9111240612.AA10506@clarity.Princeton.EDU> Here is the second of two exchanges concerning the Target Article on Reading and Connectionism that appeared in PSYCOLOQUY 2.8.4 (retrievable by anonymous ftp from directory pub/harnad on princeton.edu). Further commentary is invited. All contributions will be refereed. Please submit to psyc at pucc.bitnet or psyc at pucc.princeton.edu -- NOT TO THIS LIST. Subject: PSYCOLOQUY V2 #9 (2.9.4 Commentary: Reilly, Skoyles : 410 lines) PSYCOLOQUY ISSN 1055-0143 Sun, 24 Nov 91 Volume 2 : Issue 9.4 2.9.4.1 Commentary on Skoyles Connectionism, Reading... / Reilly 2.9.4.2 Reply to Reilly / Skoyles ---------------------------------------------------------------------- From: Ronan Reilly ERC Subject: 2.9.4.1 Commentary on Skoyles Connectionism, Reading... / Reilly There's More to Connectionism than Feedforward and Backpropagation (Commentary on Skoyles Connectionism, Reading and the Limits of Cognition PSYCOLOQUY 2.8.4 1991) Ronan Reilly Educational Research Centre St Patrick's College Dublin 9 IRELAND ronan_reilly at eurokom.ie 1. Introduction I think Skoyles has presented a novel idea for modeling the learning of reading. The main aim of this commentary is to answer some of the questions he raised in his preamble, particularly those relating to connectionism, and finally to discuss some work I've done in the area that may provide a starting point for implementing Skoyles's proposal. 2. The Nature of Connectionist Training There are, as I'm sure will be pointed out in other commentaries, more connectionist learning algorithms than error backpropagation and more connectionist learning paradigms than supervised learning. So I am a little puzzled by Skoyles's failure to find any research on issues relating to the nature of error correction feedback. For example, what about the research on reinforcement learning by Barto, Sutton, and Anderson (1983)? In this work, no detailed feedback is provided on the correctness of the output vector. The teaching signal simply indicates whether or not the output was correct. On the issue of delayed error feedback: In order to deal with temporal disparities between input and error feedback, the network has to incorporate some form of memory that preserves sequential information. A standard feedforward network obviously has a memory, but it is one in which the temporal aspect of the input is discarded. Indeed, modelers usually go out of their way to discourage any temporal artifacts in training by randomising the order of input.
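The contrast Reilly draws above between backpropagation's detailed teacher and the bare right/wrong signal of reinforcement learning can be put in a few lines of Python. The vectors and functions below are invented for the illustration; they are not the Barto, Sutton and Anderson (1983) architecture itself.

    def supervised_feedback(output, target):
        # Backpropagation-style teacher: a full error vector, one value per output unit.
        return [t - o for o, t in zip(output, target)]

    def reinforcement_feedback(output, target):
        # Reinforcement-style teacher: a single scalar saying only whether the output was right.
        correct = all(round(o) == t for o, t in zip(output, target))
        return 1.0 if correct else -1.0

    output = [0.75, 0.25, 0.0]    # what the network produced
    target = [1, 0, 0]            # what it should have produced

    print(supervised_feedback(output, target))     # [0.25, -0.25, 0.0] -- detailed, per-unit correction
    print(reinforcement_feedback(output, target))  # 1.0 -- just "that was right"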
Elman (1990) devised a simple technique for giving feedforward networks a temporal memory. It involves taking a copy of the activation pattern of the hidden units at time t and using it as input at time t+1, in addition to whatever other input there might be. The weights connecting these copy units (or context units) to the hidden units are themselves modifiable, just like the other weights in the network. Consequently, these weights accrete information about the input sequence in diminishing amounts over a number of preceding time steps. In these simple recurrent networks it is possible, therefore, for the input at time t to affect the output of the network at time t+n, for relatively small n. The corollary to this is that it is possible for error feedback to have an effect on learning at a temporal remove from the input to which it relates. Degraded error feedback is not a problem either. A number of connectionist paradigms have made use of so-called "moving target" learning. This occurs when the teaching vector (and even the input vector) are themselves modified during training. The most recent example of this is the recursive auto-associative memory (RAAM) of Pollack (1990). I won't dwell on the ins and outs of RAAMs, but suffice to say that a key element in the training of such networks is the use of their own hidden unit vectors as both input and teaching vectors. Thus, the network is confronted with a very complex learning task, since every time its weights are changed, the input and teaching vectors also change. Nevertheless, networks such as these are capable of learning successfully. In many ways, the task of the RAAM network is not unlike that of the individual learning to read as characterized by Skoyles. My final word on the topic of connectionist learning algorithms concerns their psychological status. I think it is important to emphasise that many aspects of backpropagation learning are psychologically unrealistic. Apart from the fact that the algorithm itself is biologically implausible, the level of specificity required of the teacher is just not found in most psychological learning contexts. Furthermore, the randomized nature of the training regime and the catastrophic interference that occurs when a network is trained on new associations does not correspond to many realistic learning situations (if any). What is important about connectionist learning is not the learning as such, but what gets learned. It is the nature of the representations embodied in the weight matrix of a network that gives connectionist models their explanatory and predictive power. 3. Phonetic Reading In what follows, I assume that what Skoyles means by "phonetic reading" is phonologically mediated access to word meaning. I don't think it is yet possible to say that phonology plays no role in accessing the meaning of a word. However, Seidenberg (1989) has argued persuasively that much of the evidence in favor of phonological mediation can be accounted for by the simultaneous activation of both orthographic and phonological codes, and none of the evidence addresses the central issue of whether or not access is mediated by phonological codes. Personally, I am inclined to the view that access to meaning among skilled readers is direct from the orthography. I was puzzled by Skoyles's description of the Seidenberg and McClelland (1989) model, first, as a model of reading, and second, as a model of non-phonetic reading. 
It certainly is not a model of reading, since in the implementation they discuss there is no access to meaning. Furthermore, how can it be considered to be nonphonetic when part of the model's training involves teaching it to pronounce words? In fact, Seidenberg and McClelland's model seems to be a red herring in the context of the issues Skoyles wishes to address. 4. A Modelling Framework I am currently working on modeling the role of phonics in teaching reading using a connectionist framework (Reilly, 1991). The model I've developed might provide a suitable framework for addressing Skoyles's hypothesis. It consists of two components, a speech component which is trained first and learns to map a sequence of phonemes onto a lexical representation. The weights in this network are frozen after training. The second component is a network that maps an orthographic representation onto a lexical representation. This mapping can be either via the hidden units in the speech module (i.e., the phonological route), via a separate set of hidden units (i.e., the direct route), or via both sets of hidden units. I have operationalized different teaching emphases (e.g., phonics vs. whole- word vs. a mixed approach) by allowing or disallowing the training of the weights comprising the two lexical access routes. Preliminary results suggest that a mixed approach gives the best overall word recognition performance, but this has not proved entirely reliable over replications of training with different initial weight settings. I am currently working on various refinements to the model. In addition to providing a testbed for the phonics issue, the model I've outlined might also provide a framework for implementing Skoyles's idea, and it might perhaps help derive some testable hypotheses from it. For example, it would be possible to use the lexical output produced as a result of taking the phonological route as a teaching signal for the direct route. I imagine that this might give rise to distinct forms of word recognition error, forms not found if a "correct" teaching signal were used. 5. Conclusion I think that Skoyles's idea is interesting and worthy of exploration. I feel, however, that his view of current connectionist modeling is somewhat narrow. Contrary to the impression he appears to have, there are connectionist learning architectures and techniques available that address many of the issues he raises. 6. References Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuron-like elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834-846. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211. Pollack, J. B. (1990). Recursive distributed representations. Artificial Intelligence, 46, I77-105. Reilly, R. (1991). A Connectionist exploration of the phonics issue in the teaching of reading: Re-using internal representations. In Working notes of AAAI Spring Symposium on connectionist natural language processing. March, 1991, Stanford University, pp. 178-182. Seidenberg, M. S. (1989). Visual word recognition and pronunciation. In W. Marslen-Wilson (Ed.), Lexical representation and process. Cambridge, MA: MIT Press, pp. 25-74. Seidenberg, M. S., & McClelland, J. L. (1989). A distributed developmental model of visual word recognition. Psychological Review, 96, 523-568. 
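Reilly's closing suggestion in section 4 -- letting the lexical output of the phonological route act as the teaching signal for the direct route -- can be pictured with a very small Python sketch. The lookup-table "routes" and the regularised misreading of "pint" are invented placeholders; the sketch shows only the flow of the teaching signal, not Reilly's implementation.

    PHONOLOGICAL_ROUTE = {            # frozen, speech-based route: spelling -> lexical code
        "cat": "CAT",
        "dog": "DOG",
        "pint": "PINNT",              # deliberately wrong: a regularised decoding of "pint"
    }

    direct_route = {}                 # the route being trained

    def train_direct_route(spellings):
        for spelling in spellings:
            teacher = PHONOLOGICAL_ROUTE.get(spelling)   # teaching signal comes from the other route
            if teacher is not None:
                direct_route[spelling] = teacher         # "learn" the association

    train_direct_route(["cat", "dog", "pint"])
    print(direct_route)
    # The direct route inherits the phonological route's error on "pint", which is the
    # sort of distinctive error pattern Reilly predicts when the teaching signal is
    # generated internally rather than supplied as an externally correct label.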
------------------------------ From: John R Skoyles Subject: 2.9.4.2 Reply to Reilly / Skoyles The Limits of Connectionism and Cognition Revisited: All Reading Networks Need to Be Trained (Reply to Reilly) John R. Skoyles Department of Psychology University College London London WC1E 6BT ucjtprs at ucl.ac.uk 1. The nature of connectionist learning. My argument against connectionist reading consists of two points. First, reading networks do not get a "free meal" -- something for nothing (Skoyles, 1988). To be able to read they have to be trained. To parallel the popular cliche "junk in, junk out," reading networks depend upon "mappings in, mappings out." What is called "reading" in these networks is a mapping, usually from a written word to its pronunciation (but potentially also to its meaning). To get to that state, however, they need to be trained on exemplar mappings -- the reading network does not get the information to make its mappings miraculously from nowhere but from mappings previously given to it. Error-correction is one way of doing exemplar training -- the network can only make an error in the context of a correct output for a given input. (Of course, reading networks create new mappings not given to them, but the information to do so derives from mappings with which they have been previously trained. So in a sense there is a free meal, but the network has to be fed something first.) Second, the proponents of reading networks maintain that there is a free meal by focusing entirely upon the "mappings out," forgetting where they get the "mappings in" to train them. Instead of miracles, I suggest that phonological reading -- identifying a written word -- provides this information. This conjecture fits in with the evidence about phonological reading in learner readers and dyslexia. Improving the skill of a learner reader to identify words from their spelling enhances their progress in learning to read (Adams, 1990). Dyslexics lack phonological abilities and so find it difficult to identify words from their spelling (Snowling, 1987). These two facts make sense if the phonological identification of words is providing the "mappings in" to train the reading network. 1.1. Supervised learning. In my target article (Skoyles 1991) I discussed McClelland and Seidenberg's (1989) model of reading, which uses supervised backpropagation learning. Reilly correctly points out that there is more to connectionism than backpropagation and supervised learning. I focused upon these because they are used by the published models of reading. This does not diminish the generality of my points. For example, Reilly points to the model of reinforcement training proposed by Barto, Sutton, and Anderson (1983), in which no detailed information about the correctness of the output vector is given. As Reilly notes, however, they nonetheless use a teaching signal that indicates whether or not the output was correct. But how would a system training a network know whether or not its output was correct without some independent means of recognising words? My point applies not only to backpropagation but to any form of supervised learning (because to tutor the network the supervisor has to know something the network does not). 1.2. Unsupervised learning. My point also applies to unsupervised networks -- for example Boltzmann nets. These are given inputs and are apparently not corrected.
There is a stage in Boltzmann training, however, when the network's actual and desired outputs are compared to form the objective function, and depending upon this the internal weights in the network are or are not retained. Thus, this unsupervised learning still begs the question of the availability of knowledge regarding the desired output of the network: Without this the objective function cannot be calculated. Although the network may be unsupervised, it is not unregulated. It is given exemplar inputs and desired outputs. In the case of reading, the desired output will be the correct reading of a written word (its input). But the Boltzmann network cannot by itself know that any reading is correct and hence desired: Something outside the system has to be able to read to do this. In other words, the same situation that I showed exists for supervised networks exists for unsupervised ones.

1.3. Auto-associative learning.

Reilly raises the possibility of auto-associative learning. Networks using this do not have to feed on information in the form of error-correction, nor do they need correct exemplar input-output pairs supplied from outside, because their input doubles as their desired output. I would question, however, whether a network dependent entirely upon auto-associative learning could learn to read. This may work well with categorization skills, but as far as I am aware, not with mapping tasks (such as reading) which involve learning a vocabulary. I would be very interested to see whether anyone can create such a net. Of course, there is no reason a network may not use auto-associative learning in combination with non-autoassociative training.

2. Biological plausibility.

I agree with Reilly's observation that backpropagation is biologically implausible. However, new learning procedures have been developed which are biologically feasible (Mazzoni, Anderson & Jordan, 1991). In addition, as noted above, my observation is a general one, which would apply much more widely than just to the cases of backpropagation and supervised learning. Although it is unlikely that the networks responsible for reading in the brain use backpropagation, it is likely that they are subject to the same kinds of constraints noted above and in my original target article.

3. Network learning vs the internal representations of networks as objects of interest.

I am slightly concerned that Reilly suggests "What is important about connectionist learning is not the learning as such, but what gets learned. It is the nature of the representations embodied in the weight matrix of a network that gives connectionist models their explanatory and predictive power." This seems an abdication of responsibility. Connectionist models are described as learning models, not representation models. Their authors emphasise that their training is not an incidental means for their creation but something that might enlighten us about the process by which networks are acquired. Reilly's own simulation of reading is concerned not with what gets learnt but with which of three reading instruction methods (whole word, phonic or mixed whole word and phonics) trains reading networks best. In addition, if we get the mechanism by which networks develop wrong, can we be confident that their internal representations are going to be correct, and consequently of interest?

4. Phonetic reading.

As I note in my accompanying reply to Coltheart (1991; Skoyles 1991), phonetic reading can mean two things.
First, phonological decoding -- something measured by the ability to read nonwords. Second, the identification of written words using information about how they are spelt and orally pronounced. In the latter, a reader uses some kind of phonological decoding to access oral vocabulary to identify words -- so they are associated. However, phonological decoding may be done through several means -- lexical analogies and even to some extent through the reading network (see my comments on this in my reply to Coltheart 1991). But whereas a reading network can phonologically decode words, it cannot recognise words by accessing the knowledge we have of how they are pronounced in oral vocabularies. Access to that information through phonological decoding is the critical thing I suggest for training networks -- not the phonological decoding involved.

Reilly rightly points out that Seidenberg and McClelland's (1989) model does not fully cover all aspects of reading, in particular, access to meaning. However, my observations would generalise to reading models which cover this. This is because my observation is about input/output mapping and it does not matter if the output is not phonology but meaning. In this case, phonological reading accesses the meaning of words from oral vocabulary, which is then used to train the semantic output of the reading network. I did not develop this point simply because Seidenberg and McClelland's model, as Reilly notes, does not cover meaning.

5. Reilly's own model

Reilly only briefly describes his own contribution to understanding how reading networks develop. I am very interested in his suggestion that "it would be possible to use the lexical output produced as a result of taking the phonological route as a teaching signal for the direct route." As he notes, this might produce "distinct forms of word recognition error." Experiments in this area seem a good idea, though perhaps Reilly's network needs to be refined (he notes that it is not entirely reliable over replications with different initial weights). I would like to see whether his "phonic" route could take more account of the possibility of using units of pronunciation correspondence larger than the phoneme-grapheme one, because children seem to start with larger ones (the sort used in lexical analogies).

6. Conclusion

Reilly suggests that learning architectures and techniques are available which address the issues I raised in my original target article. With the exception of Reilly's own model, as I hope I have shown above, this is not true.

7. References

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.
Coltheart, M. (1991). Connectionist modeling of human language processing: The case of reading. PSYCOLOQUY 2.9.3.
Mazzoni, P., Anderson, R. A. & Jordan, M. I. (1991). A more biologically plausible learning rule for neural networks. Proceedings of the National Academy of Sciences USA, 88, 4433-4437.
Seidenberg, M. S. and McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
Skoyles, J. R. (1988). Training the brain using neural-network models. Nature, 333, 401.
Skoyles, J. R. (1991). Connectionism, reading and the limits of cognition. PSYCOLOQUY 2.8.4.
Skoyles, J. R. (1991). The success of PDP and the dual route model: Time to rethink the phonological route. PSYCOLOQUY 2.9.3.
Snowling, M. (1987). Dyslexia: A cognitive developmental perspective. Oxford: Basil Blackwell.
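To make the argument of sections 1.1-1.2 concrete, here is a small sketch (mine, not Skoyles's, and not the learning rule of any published reading model): whether the update is written as supervised error correction or as an "objective function" to be minimised, the quantity called "target" below -- the correct reading of the word -- has to come from outside the network. The delta rule, the vector shapes, and the variable names are illustrative assumptions.

# Illustrative sketch in Python/numpy: one training step on a single written word.
import numpy as np

def train_step(W, letters, target, lr=0.1):
    """One delta-rule step.

    letters : orthographic input vector
    target  : desired output (the word's pronunciation or identity).
              In Skoyles's account this teaching signal would be supplied by
              phonologically identifying the word, not by the network itself.
    """
    output = 1.0 / (1.0 + np.exp(-(W @ letters)))   # the network's current reading
    error = target - output                         # cannot be formed without target
    objective = np.sum(error ** 2)                  # an "unsupervised" objective of this
                                                    # kind has the same dependence
    W = W + lr * np.outer(error, letters)           # error-correction update
    return W, objective

# e.g. start with W = np.zeros((n_outputs, n_letters)); every call needs an
# externally supplied (letters, target) exemplar pair.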
------------------------------

PSYCOLOQUY is sponsored by the Science Directorate of the American Psychological Association (202) 955-7653

Co-Editors:
Stevan Harnad (scientific discussion), Psychology Department, Princeton University
Perry London, Dean, Graduate School of Applied and Professional Psychology, Rutgers University (professional/clinical discussion)
Cary Cherniss (Assoc. Ed.), Graduate School of Applied and Professional Psychology, Rutgers University (professional/clinical discussion)

Assistant Editor:
Malcolm Bauer, Psychology Department, Princeton University

End of PSYCOLOQUY Digest
******************************

From gmk%idacrd at uunet.UU.NET Sun Nov 24 12:21:49 1991
From: gmk%idacrd at uunet.UU.NET (Gary M. Kuhn)
Date: Sun, 24 Nov 91 12:21:49 EST
Subject: Copenhagen: 1992 Workshop on NN for SP
Message-ID: <9111241721.AA05809@>

1992 IEEE Workshop on Neural Networks for Signal Processing
August 31 - September 2, 1992
Copenhagen, Denmark

In cooperation with the IEEE Signal Processing Society and sponsored by the Computational Neural Network Center (CONNECT)

CALL FOR PAPERS

The second of a series of IEEE workshops on Neural Networks for Signal Processing, the first of which was held in Princeton in October 1991, will be held in Copenhagen, Denmark, in August 1992. Papers are solicited for technical sessions on the following topics:
System Identification and Spectral Estimation by Neural Networks.
Non-linear Filtering by Neural Networks.
Pattern Learning Theory and Algorithms.
Application-Driven Neural Models.
Application to Image Processing and Pattern Recognition.
Application to Speech Recognition, Coding and Enhancement.
Application to Adaptive Array Processing.
Digital/Analog Systems for Signal Processing.

Prospective authors are invited to submit 4 copies of extended summaries of no more than 5 pages. The top of the first page of the summary should include a title, authors' names, affiliations, address, telephone and fax numbers, and email address if any. Photo-ready full papers of accepted proposals will be published in a hard bound book by IEEE.

General chairs:
S.Y. Kung, Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA, email: kung at princeton.edu
Frank Fallside, Engineering Department, Cambridge University, Cambridge CB2 1PZ, UK, email: fallside at eng.cam.ac.uk

Program chair:
John Aasted Sorensen, Electronics Institute, Bldg. 349, Technical University of Denmark, DK-2800 Lyngby, Denmark, email: jaas at dthei.ei.dth.dk

Proceedings chair:
Candace Kamm, Box 1910, Bellcore, 445 South St., Rm. 2E-256, Morristown, NJ 07960-1910, USA, email: cak at thumper.bellcore.com

Program Committee: Ronald de Beer, Jeng-Neng Hwang, John E. Moody, John Bridle, Yu Hen Hu, Carsten Peterson, Erik Bruun, B.H. Juang, Sathyanarayan S. Rao, Poul Dalsgaard, S. Katagiri, Peter Salamon, Lee Giles, Teuvo Kohonen, Christian J. Wellekens, Lars Kai Hansen, Gary M. Kuhn, Barbara Yoon, Steffen Duus Hansen, Benny Lautrup, John Hertz, Peter Koefoed Moeller

Paper submissions and further information: Program Chair, Tel: +4545931222 ext. 3895, Fax: +4542880117

Submission of extended summary: February 15, 1992
Notification of acceptance: April 20, 1992
Submission of photo-ready paper: May 20, 1992

From kris at psy.gla.ac.uk Thu Nov 21 12:33:08 1991
From: kris at psy.gla.ac.uk (Kris Doing)
Date: Thu, 21 Nov 91 17:33:08 GMT
Subject: Roommate for NIPS
Message-ID: <5473.9111211733@buzzard.psy.glasgow.ac.uk>

Dear Connectionists, I am looking for a roommate for the NIPS Conference in Denver, Sunday to Thursday only.
Unfortunately I cannot receive email after Friday 22 Nov. If you are interested please contact me at my parents' house:

Kristina Doing Harris, c/o Park Doing, 4411 Shady Crest Drive, Kettering, Ohio 45459, tel: 513-433-4336

Hope to hear from someone,
Kris Doing Harris
University of Glasgow, Scotland

From jbower at cns.caltech.edu Mon Nov 25 13:01:38 1991
From: jbower at cns.caltech.edu (Jim Bower)
Date: Mon, 25 Nov 91 10:01:38 PST
Subject: Postdoctoral work at Caltech
Message-ID: <9111251801.AA01995@cns.caltech.edu>

--------------------------------------------------------------------
Postdoctoral Position in Computational Neurobiology
Computation and Neural Systems Program
Caltech

A post-doctoral position in the laboratory of Dr. Gilles Laurent is available for up to 3 years. Applicants should have experience in modelling techniques and be interested in general problems of sensory-motor integration and/or single neuron computation. One possible project would focus on somatosensory processing in insects, emphasizing the architecture of local circuits comprised of a few hundred identified neurons. These circuits are composed of 4 layers of neurons (sensory, interneuronal 1, interneuronal 2, motor), with a large degree of convergence and no known internal feedback connections. The task which these circuits perform is the mediation of leg reflexes, and the adaptation of these reflexes to external inputs or to internal constraints (e.g. centrally generated rhythm). The second possible project would focus on the integrative properties of the 2 classes of local interneurons in those circuits. Both classes lack an axon (they are local neurons), but the first ones use action potentials whereas the second use graded potentials as modes of intra- and inter-cellular communication. The hypothesis which we are trying to test experimentally is that graded processing allows compartmentalization of function, thereby increasing the computational capabilities of single neurons.

For further information contact: Gilles Laurent, Biology Division, CNS Program, MS 139-74, Caltech, Pasadena, CA 91125, (818) 397-2798, laurent at delphi.caltech.edu

From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Nov 25 14:28:50 1991
From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU)
Date: Mon, 25 Nov 91 14:28:50 EST
Subject: Patent Fallacies
In-Reply-To: Your message of Sat, 23 Nov 91 17:01:28 -0500. <9111232201.AA02257@corwin.CCS.Northeastern.EDU>
Message-ID:

This is not the proper forum for a discussion of software patents, so I'll keep this very brief: Before accepting Steve Gallant's mostly pro-patent analysis at face value, you should contact the League for Programming Freedom, "league at prep.ai.mit.edu", and ask them to send you their position papers on the subject, which they will send by E-mail. This is a good presentation of the arguments for elimination of software/algorithm patents. (They also have a position statement opposing look-and-feel copyrights.)

-- Scott Fahlman

From ctika01 at mailserv.zdv.uni-tuebingen.de Mon Nov 25 14:52:43 1991
From: ctika01 at mailserv.zdv.uni-tuebingen.de (George Kampis)
Date: Mon, 25 Nov 91 20:52:43 +0100
Subject: Special Issue on Emergence II
Message-ID: <9111251952.AA07602@mailserv.zdv.uni-tuebingen.de>

If you get this more than once, I apologize. There have been numerous requests for the exact co-ordinates of a journal Special Issue I announced a while ago.
*************************************************************************** SPECIAL ISSUE ON EMERGENCE of World Futures: the Journal of General Evolution *************************************************************************** The WF Spec Issue can be bought for USD 38 as a book, it's ISBN 2-88124-526-9, order from Gordon and Breach, POBox 786 Cooper Station New York, NY 10276 USA phone (212) 206 8900 FAX (212) 645 2459 George Kampis kampis at mailserv.zdv.uni-tuebingen.de From giles at research.nec.com Mon Nov 25 15:35:56 1991 From: giles at research.nec.com (Lee Giles) Date: Mon, 25 Nov 91 15:35:56 EST Subject: recurrent higher order neural networks Message-ID: <9111252035.AA01060@fuzzy.nec.com> Regarding higher order recurrent nets: John Kolen mentions: ***************************************** Higher order recurrent networks are recurrent networks with higher order connections, (i[1]*i[2]*w[1,2] instead of i[1]*w[1]). An example of a high order recurent network is Pollack's sequential cascaded networks which appear, I believe, in the latest issue of Machine Learning. This network can be described as two three-dimensional matrices, W and V, and the following equations. O[t] = Sigmoid( (W . S[t]) . I[t]) S[t+1]=Sigmoid( (V . S[t]) . I[t]) where I[t] is the input vector, O[t] is the output vector, and S[t] is the state vector, each at time t. ( . is inner product) ********************************************** For other references on higher-order recurrent nets, see the following: (This list is not meant to be inclusive, but to give some flavor of the diversity of work in this area.) Y.C. Lee, et.al,1986, Physica D. H.H. Chen, et.al, 1986, AIP conference proceedings on Neural Networks for Computing F. Pineda, 1988, AIP conference proceedings for NIPS Psaltis, et.al, 1988, Neural Networks. Giles, et al. 1990, NIPS2; and 1991 IJCNN proceedings Mozer and Bachrach, Machine Learning 1991 Hush, et.al., 1991 Proceedings for Neural Networks for Signal Processing. Watrous and Kuhn, 1992 Neural Computation In particular the papers by Giles, et.al use a 2nd order RTRL to learn grammars from grammatical strings. (Similar work has been done by Watrous and Kuhn.) What may be of interest is that using a heuristic extraction method, one can extract the grammar that the recurrent network learns (or is learning). It's worth noting that higher-order nets usually include sub-orders as special cases, i.e. 2nd includes 1st. In addition, sigma-pi units are just a subset of higher-order models and in many cases do not have the computational power of higher-order models. C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From wray at ptolemy.arc.nasa.gov Mon Nov 25 21:04:20 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Mon, 25 Nov 91 18:04:20 PST Subject: validation sets In-Reply-To: Jude Shavlik's message of Thu, 21 Nov 91 14:19:02 -0600 <9111212019.AA07223@steves.cs.wisc.edu> Message-ID: <9111260204.AA02393@ptolemy.arc.nasa.gov> Jude Shavlik says: > The question of whether or not validation sets are useful can easily be > answered, at least on specific datasets. We have run that experiment and > found that devoting some training examples to validation is useful (ie, > training on N examples does worse than training on N-k and validating on k). 
>
> This same issue comes up with decision-tree learners (where the validation set
> is often called a "tuning set", as it is used to prune the decision tree). I
> believe there, people have also found it is useful to devote some examples to
> pruning/validating.

Sorry, Jude, but I couldn't let this one slip by. Use of a validation set in decision-tree learners produces great results ONLY when you have LOTS and LOTS of data. When you have less data, cross-validation or use of a well-put-together complexity/penalty term (i.e. carefully thought out MDL, weight decay/elimination, Bayesian maximum posterior, regularization, etc. etc. etc.) works much better. If the penalty term isn't well thought out (e.g. the early stuff on feed-forward networks such as weight decay/elimination was still toying with a new idea, so I'd call these not well thought out, although revolutionary for the time) then performance isn't as good. Best results with trees are obtained so far from doing "averaging", i.e. probabilistically combining the results from many different trees -- experimental confirmation of the COLT-91 Haussler et al. style of results. NB: good penalty terms are discussed in Nowlan & Hinton, Buntine & Weigend and MacKay, and probably in lots of other places ...

Jude's comments:

> found that devoting some training examples to validation is useful (ie,
> training on N examples does worse than training on N-k and validating on k).

These only apply because they haven't included a reasonable penalty term. Get with it guys!

> I think there is also an important point about "proper" experimental
> methodology lurking in the discussion. If one is using N examples for weight
> adjustment (or whatever kind of learning one is doing) and also use k examples
> for selecting among possible final answers, one should report that their
> testset accuracy resulted from N+k training examples.

There's an interesting example of NOT doing this properly recently in the machine learning journal. See Mingers in Machine Learning 3(4), 1989, then see our experimental work in Buntine and Niblett, Machine Learning 7, 1992. Mingers produced an otherwise *excellent* paper, but got peculiar results (to those experienced in the area) because of mixing the "tuning set" with the "validation set".

Wray Buntine
NASA Ames Research Center
Mail Stop 269-2
Moffett Field, CA, 94035
phone: (415) 604 3389
fax: (415) 604 3594
email: wray at kronos.arc.nasa.gov

From slehar at park.bu.edu Tue Nov 26 00:23:29 1991
From: slehar at park.bu.edu (slehar@park.bu.edu)
Date: Tue, 26 Nov 91 00:23:29 -0500
Subject: Patent Fallacies
In-Reply-To: connectionists@c.cs.cmu.edu's message of 25 Nov 91 18:13:09 GM
Message-ID: <9111260523.AA08966@alewife.bu.edu>

Thanks to Steve Gallant for a very clear and convincing explanation of the patent issue. I was particularly impressed with the argument that patents are designed to ENCOURAGE free exchange of information, which was a new concept for me. I have a couple of questions still. When you say that the majority of patents granted do not result in the inventor making back legal costs and filing fees, is this because inventors have an unreasonably high esteem for their own creation and thus tend to patent things that should not have been patented? Or is this just "insurance" to cover the uncertainty of the prospects for the product, and constitutes a proper cost of the business of inventing? Or is it a way to purchase prestige for the organization that pays for the patent?
How much do patents typically cost, and where does that money really go to? Is this another tax, or are we really getting value for our money? You say that a patent must be non-obvious to those "skilled in the art". What if somebody releases some software to public domain as free software, and which is clearly the work of a genius? After release, can somebody else "steal" the idea and patent it for themselves, or is the public release sufficient education to those "skilled in the art" as to render it henceforth obvious and thereby unpatentable? Finally, is there not a growing practical issue that as things become easier to copy, the patent and copyright laws become progressively more difficult to enforce? Unenforcable laws are worse than useless, because they stimulate the spread of intrusive police measures and legal expenses in a futile attempt to stop the unstoppable. Computer software is currently the toughest problem in this regard, but the recent digital tape fiasco and growing problem of illegal photocopying are just the beginning- what happens when a patented life form gets bootlegged and starts replicating itself at will? Will the advance of technology not eventually make all copyrights and most patents worthless? From mmdf at gate.fzi.de Sun Nov 24 11:31:07 1991 From: mmdf at gate.fzi.de (FZI-mmdfmail) Date: Sun, 24 Nov 91 16:31:07 GMT Subject: resend Algorithms for Principal Components Analysis Message-ID: Ray, Over the past few years there has been a great deal of interest in recursive algorithms for finding eigenvectors or linear combinations of them. Many of these algorithms are based on the Oja rule (1982) with modifications to find more than a single output. As might be expected, so many people working on a single type of algorithm has led to a certain amount of duplication of effort. Following is a list of the papers I know about, which I'm sure is incomplete. Anyone else working on this topic should feel free to add to this list! Cheers, Terry Sanger @article{sang89a, author="Terence David Sanger", title="Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Neural Network", year=1989, journal="Neural Networks", volume=2, pages="459--473"} @incollection{sang89c, author="Terence David Sanger", title="An Optimality Principle for Unsupervised Learning", year=1989, pages="11--19", booktitle="Advances in Neural Information Processing Systems 1", editor="David S. Touretzky", publisher="Morgan Kaufmann", address="San Mateo, {CA}", note="Proc. {NIPS'88}, Denver"} @article{sang89d, author="Terence David Sanger", title="Analysis of the Two-Dimensional Receptive Fields Learned by the Generalized {Hebbian} Algorithm in Response to Random Input", year=1990, journal="Biological Cybernetics", volume=63, pages="221--228"} @misc{sang90c, author="Terence D. Sanger", title="Optimal Hidden Units for Two-layer Nonlinear Feedforward Neural Networks", year=1991, note="{\it Int. J. Pattern Recognition and AI}, in press"} @inproceedings{broc89, author="Roger W. Brockett", title="Dynamical Systems that Sort Lists, Diagonalize Matrices, and Solve Linear Programming Problems", booktitle="Proc. 1988 {IEEE} Conference on Decision and Control", publisher="{IEEE}", address="New York", pages="799--803", year=1988} @ARTICLE{rubn90, AUTHOR = {J. Rubner and K. Schulten}, TITLE = {Development of Feature Detectors by Self-Organization}, JOURNAL = {Biol. Cybern.}, YEAR = {1990}, VOLUME = {62}, PAGES = {193--199} } @INCOLLECTION{krog90, AUTHOR = {Anders Krogh and John A. 
Hertz}, TITLE = {Hebbian Learning of Principal Components}, BOOKTITLE = {Parallel Processing in Neural Systems and Computers}, PUBLISHER = {Elsevier Science Publishers B.V.}, YEAR = {1990}, EDITOR = {R. Eckmiller and G. Hartmann and G. Hauske}, PAGES = {183--186}, ADDRESS = {North-Holland} }
@INPROCEEDINGS{fold89, AUTHOR = {Peter Foldiak}, TITLE = {Adaptive Network for Optimal Linear Feature Extraction}, BOOKTITLE = {Proc. {IJCNN}}, YEAR = {1989}, PAGES = {401--406}, ORGANIZATION = {{IEEE/INNS}}, ADDRESS = {Washington, D.C.}, MONTH = {June} }
@MISC{kung90, AUTHOR = {S. Y. Kung}, TITLE = {Neural networks for Extracting Constrained Principal Components}, YEAR = {1990}, NOTE = {submitted to {\it IEEE Trans. Neural Networks}} }
@article{oja85, author="Erkki Oja and Juha Karhunen", title="On Stochastic Approximation of the Eigenvectors and Eigenvalues of the Expectation of a Random Matrix", journal="J. Math. Analysis and Appl.", volume=106, pages="69--84", year=1985}
@book{oja83, author="Erkki Oja", title="Subspace Methods of Pattern Recognition", publisher="Research Studies Press", address="Letchworth, Hertfordshire UK", year=1983}
@inproceedings{karh84b, author="Juha Karhunen", title="Adaptive Algorithms for Estimating Eigenvectors of Correlation Type Matrices", booktitle="{Proc. 1984 {IEEE} Int. Conf. on Acoustics, Speech, and Signal Processing}", publisher="{IEEE} Press", address="Piscataway, {NJ}", year=1984, pages="14.6.1--14.6.4"}
@inproceedings{karh82, author="Juha Karhunen and Erkki Oja", title="New Methods for Stochastic Approximation of Truncated {Karhunen-Lo\`{e}ve} Expansions", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="{Springer}-{Verlag}", address="{NY}", month="October", pages="550--553"}
@inproceedings{oja80, author="Erkki Oja and Juha Karhunen", title="Recursive Construction of {Karhunen-Lo\`{e}ve} Expansions for Pattern Recognition Purposes", booktitle="{Proc. 5th Int. Conf. on Pattern Recognition}", publisher="Springer-{Verlag}", address="{NY}", year=1980, month="December", pages="1215--1218"}
@inproceedings{kuus82, author="Maija Kuusela and Erkki Oja", title="The Averaged Learning Subspace Method for Spectral Pattern Recognition", booktitle="{Proc. 6th Int. Conf. on Pattern Recognition}", year=1982, publisher="Springer-{Verlag}", address="{NY}", month="October", pages="134--137"}
@phdthesis{karh84, author="Juha Karhunen", title="Recursive Estimation of Eigenvectors of Correlation Type Matrices for Signal Processing Applications", school="Helsinki Univ. Tech.", year=1984, address="Espoo, Finland"}
@techreport{karh85, author="Juha Karhunen", title="Simple Gradient Type Algorithms for Data-Adaptive Eigenvector Estimation", institution="Helsinki Univ. Tech.", year=1985, number="TKK-F-A584"}
@misc{ogaw86, author = "Hidemitsu Ogawa and Erkki Oja", title = "Can we Solve the Continuous Karhunen-Loeve Eigenproblem from Discrete Data?", note = "Proc. {IEEE} Eighth International Conference on Pattern Recognition, Paris", year = "1986"}
@article{leen91, author = "Todd K Leen", title = "Dynamics of learning in linear feature-discovery networks", journal = "Network", volume = 2, year = "1991", pages = "85--105"}
@incollection{silv91, author = "Fernando M. Silva and Luis B. Almeida", title = "A Distributed Decorrelation Algorithm", booktitle = "Neural Networks, Advances and Applications", editor = "Erol Gelenbe", publisher = "North-Holland", year = "1991", note = "to appear"}

From giles at research.nec.com Tue Nov 26 18:03:57 1991
From: giles at research.nec.com (Lee Giles)
Date: Tue, 26 Nov 91 18:03:57 EST
Subject: Higher-order recurrent neural networks
Message-ID: <9111262303.AA03072@fuzzy.nec.com>

More references for higher-order recurrent nets and some general comments:

John Kolen mentions:
*****************************************
Higher order recurrent networks are recurrent networks with higher order connections, (i[1]*i[2]*w[1,2] instead of i[1]*w[1]). An example of a high order recurrent network is Pollack's sequential cascaded networks which appear, I believe, in the latest issue of Machine Learning. This network can be described as two three-dimensional matrices, W and V, and the following equations. O[t] = Sigmoid( (W . S[t]) . I[t]) S[t+1]=Sigmoid( (V . S[t]) . I[t]) where I[t] is the input vector, O[t] is the output vector, and S[t] is the state vector, each at time t. ( . is inner product)
**********************************************

For other references on higher-order recurrent nets, see the following: (This list is not meant to be inclusive, but to give some flavor of the diversity of work in this area.)

Y.C. Lee, et al., 1986, Physica D.
H.H. Chen, et al., 1986, AIP conference proceedings on Neural Networks for Computing
F. Pineda, 1988, AIP conference proceedings for NIPS
Psaltis, et al., 1988, Neural Networks.
Giles, et al., 1990, NIPS2; and 1991 IJCNN proceedings, Neural Computation, 1992.
Mozer and Bachrach, Machine Learning 1991
Hush, et al., 1991 Proceedings for Neural Networks for Signal Processing.
Watrous and Kuhn, 1992 Neural Computation

In particular the work by Giles, et al. describes a 2nd order forward-propagation RTRL to learn grammars from grammatical strings.* What may be of interest is that using a heuristic extraction method, one can extract the "learned" grammar from the recurrent network both during and after training. It's worth noting that higher-order nets usually include sub-orders as special cases, i.e. 2nd includes 1st.
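For concreteness, here is a minimal sketch in Python/numpy of the second-order forward pass in the equations quoted above (a Pollack-style sequential cascaded network). The dimensions, random weights, and one-hot input encoding are illustrative assumptions, not anyone's published implementation.

# Second-order recurrent forward pass: O[t] = g((W . S[t]) . I[t]),
# S[t+1] = g((V . S[t]) . I[t]), with W and V three-dimensional weight tensors.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_state, n_out = 3, 4, 2                      # assumed sizes
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, (n_out, n_state, n_in))    # output tensor W[k, j, i]
V = rng.normal(0.0, 0.5, (n_state, n_state, n_in))  # state tensor  V[k, j, i]

def run(inputs, s0=None):
    """Run the network over a sequence of input vectors I[0], ..., I[T-1]."""
    s = np.full(n_state, 0.5) if s0 is None else s0
    outputs = []
    for i_t in inputs:
        # Contracting W with S[t] leaves an (n_out, n_in) matrix of input
        # weights "programmed" by the current state; the second contraction
        # applies it to I[t].  The state update with V works the same way.
        o = sigmoid(np.einsum('kji,j,i->k', W, s, i_t))
        s = sigmoid(np.einsum('kji,j,i->k', V, s, i_t))
        outputs.append(o)
    return np.array(outputs), s

# Example: a length-5 sequence of one-hot symbols over a 3-letter alphabet.
seq = np.eye(n_in)[rng.integers(0, n_in, size=5)]
O, final_state = run(seq)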
In addition, sigma-pi units are just a subset of higher-order models and in some cases do not have the computational representational power of higher-order models. For example, the term (using Kolen's notation above) S[i,t] . I[j,t] would have the same weight coefficient in the original sigma-pi notation as the term S[j,t] . I[i,t]. Higher-order notation would distinguish between these terms using the tensor weights W[k,i,j] and W[k,j,i].

*(Similar work has been done by Watrous & Kuhn and Pollack)

C. Lee Giles
NEC Research Institute
4 Independence Way
Princeton, NJ 08540 USA
Internet: giles at research.nj.nec.com
UUCP: princeton!nec!giles
PHONE: (609) 951-2642
FAX: (609) 951-2482

From FJIMENEZ%ANDESCOL.BITNET at BITNET.CC.CMU.EDU Tue Nov 26 19:50:44 1991
From: FJIMENEZ%ANDESCOL.BITNET at BITNET.CC.CMU.EDU (Nestor)
Date: Tue, 26 Nov 91 19:50:44 COL
Subject: Change address
Message-ID: <01GDF0BWLPT49EDE3C@BITNET.CC.CMU.EDU>

Hello, can you tell me what the procedure is for changing my BITNET address? Thanks in advance,

Nestor Ceron
e-mail: fjimenez at andescol.bitnet
Universidad de los Andes
Santafe de Bogota - Colombia

From platt at synaptics.com Tue Nov 26 21:46:55 1991
From: platt at synaptics.com (John Platt)
Date: Tue, 26 Nov 91 18:46:55 PST
Subject: Neural Architect Position Offered
Message-ID: <9111270246.AA27607@synaptx.synaptics.com>

**********************DO NOT FORWARD TO OTHER BBOARDS*************************
**********************DO NOT FORWARD TO OTHER BBOARDS*************************

NEURAL NETWORK ARCHITECT WANTED

Synaptics, Inc., is a small and growing neural network company, located in San Jose, California. We develop neural network architectures and analog VLSI chips to sense and process real-world data. Our architectures and unique hardware solutions enable our customers to create state-of-the-art systems in many different fields. There is an opening at Synaptics for a neural network architect. The job will consist of creating network architectures for real-world applications, such as optical character recognition. The architect will need to develop programs to train and test these architectures on real data. The architect will also have to map the architectures onto existing or new analog VLSI chips. Applicants should have a strong background in programming. Experience in C++ or LISP is especially valuable. Applicants should also be familiar with current research in neural networks and have experience in applying network models to real problems. Experience with VLSI (analog or digital) is desirable, but not necessary. Applicants should be multi-disciplinary, thorough experimentalists. They should be enthusiastic about neural networks, working with other researchers, inventing new ideas, and building a successful company by meeting customers' needs.

If you are interested in this position, please send your resume to John Platt, Synaptics, 2860 Zanker Road, Suite 206, San Jose, CA 95134, or send a postscript or plain text resume to platt at synaptics.com. I will be away at NIPS until Dec 10, so please do not expect an immediate reply.

From ang at hertz.njit.edu Tue Nov 26 22:03:40 1991
From: ang at hertz.njit.edu (Nirwan Ansari, 201-596-3670)
Date: Tue, 26 Nov 1991 22:03:40 -0500
Subject: Wavelet Symposium
Message-ID: <9111270303.AA03913@hertz.njit.edu>

The following symposium on wavelets might arouse interest in the Neural Networks community.
********************************************************************* New Jersey Institute of Technology Department of Electrical and Computer Engineering Center for Communications and Signal Processing Research presents One-day Symposium on MULTIRESOLUTION IMAGE AND VIDEO PROCESSING: SUBBANDS AND WAVELETS Date: March 20, 1992 (Friday, just before ICASSP week) Place: NJIT, Newark, New Jersey Organizers: A.N. Akansu, NJIT M. Vetterli, Columbia U. J.W. Woods, RPI Program: 08.30-09.00 Registration and Coffee 09.00-09.10 Gary Thomas, Provost, NJIT: Welcoming Remarks 09.10-09.40 Edward H. Adelson, MIT: Steerable, Shiftable Subband Transforms 09.40-10.10 Ali N. Akansu, NJIT: Some Aspects of Optimal Filter Bank Design for Image-Video Coding 10.10-10.40 Arnaud Jacquin, AT&T Bell Labs.: Comparative Study of Different Filterbanks for Low Bit Rate Subband-based Video Coding 10.40-11.00 Coffee Break 11.00-11.30 Ronald Coifman, Yale U.: Adapted Image Coding with Wavelet-packets and Local Trigonometric Waveform Libraries 11.30-12.00 Philippe M. Cassereau, Aware Inc.: Wavelet Based Video Coding 12.00-12.30 Michele Barlaud, Nice U.: Image Coding Using Biorthogonal Wavelet Transform and Entropy Lattice Vector Quantization 12.30-01.30 Lunch 01.30-02.00 Jan Biemond, Delft U.: Hierarchical Subband Coding of HDTV 02.00-02.30 Martin Vetterli, Columbia U.: Multiresolution Joint Source-Channel Coding for HDTV Broadcast 02.30-03.00 John W. Woods, RPI: Compression Coding of Video Subbands 03.00-03.30 Rashid Ansari, Bellcore: Hierarchical Video Coding: Some Options and Comparisons ************************************* Registration Fee: $20, Lunch included Parking will be provided EARLY REGISTRATION ADVISED ************************************** For Early Registration: Send your check to(payable to NJIT/CCSPR) A.N. Akansu NJIT ECE Dept. University Heights Newark, NJ 07102 Tel:201-5965650 email:ali at hertz.njit.edu DIRECTIONS TO NJIT ****************** GARDEN STATE PARKWAY: Exit 145 to Route 280 East. Exit King Blvd. At traffic light turn right. Third traffic light is Central Ave. Turn right. One short block, turn left onto Summit Ave. Stop at guard house for parking directions. ROUTE 280 EAST: Follow directions outlined above. NEW JERSEY TURNPIKE: Exit 15W to Route 280 West. Stay in right-hand lane after metal bridge. Broad Street is second exit(landmarks-RR station on left, church spire on right). Turn left at foot of ramp. One short block to stop sign. Turn left onto King Blvd. At 4th light, turn right onto Central Ave. (stay left). One short block, turn left onto Summit Ave. Drive to guard house for parking directions.(If you miss the Broad St. exit, get off at Clinton Ave. Turn left at foot of ramp; left onto Central Ave; right onto Summit Ave.) ROUTE 280 WEST: Follow directions outlined above. FROM NEWARK AIRPORT: Take a taxi to NJIT campus. We are next to Rutgers Newark Campus. HAVE A SAFE TRIP! From pratt at cs.rutgers.edu Wed Nov 27 15:33:37 1991 From: pratt at cs.rutgers.edu (pratt@cs.rutgers.edu) Date: Wed, 27 Nov 91 15:33:37 EST Subject: Subtractive methods / Cross validation (includes summary) Message-ID: <9111272033.AA13154@rags.rutgers.edu> Hi, FYI, I've summarized the recent discussion on subtractive methods below. A couple of comments: o [Ramachandran and Pratt, 1992] presents a new subtractive method, called Information Measure Based Skeletonisation (IMBS). IMBS induces a decision tree hidden unit hyperplanes in a learned network in order to detect which are superfluous. 
Single train/test holdout experiments on three real-world problems (Deterding vowel recognition, Peterson-Barney vowel recognition, heart disease diagnosis) indicate that this method doesn't degrade generalization scores while it substantially reduces hidden unit counts. It's also very intuitive.

o There seems to be some confusion between the very different goals of: (1) Evaluating the generalization ability of a network, and (2) Creating a network with the best possible generalization performance. Cross-validation is used for (1). However, as P. Refenes points out, once the generalization score has been estimated, you should use *all* training data to build the best network possible.

--Lori

@incollection{ ramachandran-92, MYKEY = " ramachandran-92 : .con .bap", EDITOR = "D. S. Touretzky", BOOKTITLE = "{Advances in Neural Information Processing Systems 4}", AUTHOR = "Sowmya Ramachandran and Lorien Pratt", TITLE = "Discriminability Based Skeletonisation", ADDRESS = "San Mateo, CA", PUBLISHER = "Morgan Kaufmann", YEAR = 1992, NOTE = "(To appear)" }

Summary of discussion so far:

hht: Hans Henrik Thodberg
sf: Scott_Fahlman at sef-pmax.slisp.cs.cmu.edu
jkk: John K. Kruschke
rs: R Srikanth
pr: P.Refenes at cs.ucl.ac.uk
gh: Geoffrey Hinton
kl: Ken Laws
js: Jude Shavlik

hht~~: Request for discussion. Goal is good generalisation: achievable
hht~~: if nets are of minimal size. Advocates subtractive methods
hht~~: over additive ones. Gives Thodberg, Lecun, Weigend
hht~~: references.
sf~~: restricting complexity ==> better generalization only when
sf~~: ``signal components are larger and more coherent than the noise''
sf~~: Describes what cascade correlation does.
sf~~: Questions why a subtractive method should be superior to this.
sf~~: Gives reasons to believe that subtractive methods might be slower
sf~~: (because you have to train, chop, train, instead of just train)
jkk~~: Distinguishes between removing a node and just removing its
jkk~~: participation (by zeroing weights, for example). When nodes
jkk~~: are indeed removed, subtractive schemes can be more expensive,
jkk~~: since we are training nodes which will later be removed.
jkk~~: Cites his work (w/Movellan) on schemes which are both additive
jkk~~: and subtractive.
rs~~: Says that overgeneralization is bad: distinguishes best fit from
rs~~: most general fit as potentially competing criteria.
pr~~: Points out that pruning techniques are able to remove redundant
pr~~: parts of the network. Also points out that using a cross-validation
pr~~: set without a third set is ``training on the testing data''.
gh~~: Points out that, though you might be doing some training on the testing
gh~~: set, since you only get a single number as feedback from it, you aren't
gh~~: really fully training on this set.
gh~~: Also points out that techniques such as his work on soft-weight sharing
gh~~: seem to work noticeably better than using a validation set to decide
gh~~: when to stop training.
hht~~: Agrees that comparative studies between subtractive and additive
hht~~: methods would be a good thing. Describes a brute-force subtractive method.
hht~~: Argues, by analogy to automobile construction and idea generation, why
hht~~: subtractive methods are more appealing than additive ones.
~~pr: Argues that you'd get better generalization if you used more
~~pr: examples for training; in particular not just a subset of all
~~pr: training examples present.
~~kl: Points out the similarity between the additive/subtractive debate
~~kl: and stepwise-inclusion vs stepwise-deletion issues in multiple
~~kl: regression.
~~js: Points out that when reporting the number of examples used for
~~js: training, it's important to include the cross-validation examples
~~js: as well.

From mclennan at cs.utk.edu Wed Nov 27 13:20:17 1991
From: mclennan at cs.utk.edu (mclennan@cs.utk.edu)
Date: Wed, 27 Nov 91 13:20:17 -0500
Subject: report available
Message-ID: <9111271820.AA10236@maclennan.cs.utk.edu>

** Please do not forward to other lists. Thank you. **

The following technical report has been placed in the Neuroprose archives at Ohio State. Ftp instructions follow the abstract.

-----------------------------------------------------
Characteristics of Connectionist Knowledge Representation

Bruce MacLennan
Computer Science Department
University of Tennessee
Knoxville, TN 37996
maclennan at cs.utk.edu

Technical Report CS-91-147*

ABSTRACT: Connectionism -- the use of neural networks for knowledge representation and inference -- has profound implications for the representation and processing of information because it provides a fundamentally new view of knowledge. However, its progress is impeded by the lack of a unifying theoretical construct corresponding to the idea of a calculus (or formal system) in traditional approaches to knowledge representation. Such a construct, called a simulacrum, is proposed here, and its basic properties are explored. We find that although exact classification is impossible, several other useful, robust kinds of classification are permitted. The representation of structured information and constituent structure are considered, and we find a basis for more flexible rule-like processing than that permitted by conventional methods. We discuss briefly logical issues such as decidability and computability and show that they require reformulation in this new context. Throughout we discuss the implications for artificial intelligence and cognitive science of this new theoretical framework.

* Modified slightly for electronic distribution.
-----------------------------------------------------

FTP INSTRUCTIONS

Either use the Getps script, or do the following:

unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52)
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose
ftp> binary
ftp> get maclennan.cckr.ps.Z
ftp> quit
unix> uncompress maclennan.cckr.ps.Z
unix> lpr maclennan.cckr.ps (or however you print postscript)

Note that the postscript version is missing a (nonessential) figure that has been pasted into the hardcopy version. If you need hardcopy, then send your request to: library at cs.utk.edu

Your comments are welcome,

Bruce MacLennan
Department of Computer Science
The University of Tennessee
Knoxville, TN 37996-1301
(615)974-0994/5067
FAX: (615)974-4404
maclennan at cs.utk.edu

From soller%asylum at cs.utah.edu Thu Nov 28 18:18:19 1991
From: soller%asylum at cs.utah.edu (Jerome Soller)
Date: Thu, 28 Nov 91 16:18:19 -0700
Subject: Looking for A Roommate at NIPS (have a room for two to share)
Message-ID: <9111282318.AA27788@asylum.utah.edu>

I am looking for someone to share a room I have reserved at the NIPS conference for two people (I am the only one now). It will be at the Sheraton on Sunday, Monday, Tuesday, and Wednesday evenings (Denver Sheraton). The cost is $66.00 (which works out to $33.00 per person per night, plus local tax, if I can find someone). I apologize for the short notice, but some possible roommates fell through.
Please respond by e-mail or call me Friday in the middle of the day at (801) 582-1565 ext. 2475 and ask to be connected to my individual extension. Jerome B. Soller  From marshall at cs.unc.edu Fri Nov 29 20:46:55 1991 From: marshall at cs.unc.edu (Jonathan Marshall) Date: Fri, 29 Nov 91 20:46:55 -0500 Subject: Workshop on Self-Organization and Unsupervised Learning in Vision Message-ID: <9111300146.AA19204@marshall.cs.unc.edu> (Please post) PROGRAM: NIPS*91 Post-Conference Workshop on SELF-ORGANIZATION AND UNSUPERVISED LEARNING IN VISION December 6-7, 1991 in Vail, Colorado Workshop Chair: Jonathan A. Marshall Department of Computer Science, CB 3175, Sitterson Hall University of North Carolina, Chapel Hill, NC 27599-3175, U.S.A. 919-962-1887, marshall at cs.unc.edu Substantial neurophysiological and psychophysical evidence suggests that visual experience guides or directs the formation of much of the fine structure of animal visual systems. Simple unsupervised learning procedures (e.g., Hebbian rules) using winner-take-all or local k-winner networks have been applied with moderate success to show how visual experience can guide the self-organization of visual mechanisms sensitive to low-level attributes like orientation, contrast, color, stereo disparity, and motion. However, such simple networks lack the more sophisticated capabilities needed to demonstrate self-organized development of higher-level visual mechanisms for segmentation, grouping/binding, selective attention, representation of occluded or amodal visual features, resolution of uncertainty, generalization, context-sensitivity, and invariant object recognition. A variety of enhancements to the simple Hebbian model have been proposed. These include anti-Hebbian rules, maximization of mutual information, oscillatory interactions, intraneuronal interactions, steerable receptive fields, pre- vs. post-synaptic learning rules, covariance rules, addition of behavioral (motor) information, and attentional gating. Are these extensions to unsupervised learning sufficiently powerful to model the important aspects of neurophysiological development of higher-level visual functions? Some of the specific questions that the workshop will address are: o Does our visual environment provide enough information to direct the formation of higher-level visual processing mechanisms? o What kinds of information (e.g., correlations, constraints, coherence, and affordances) can be discovered in our visual world, using unsupervised learning? o Can such higher-level visual processing mechanisms be formed by unsupervised learning? Or is it necessary to appeal to external mechanisms such as evolution (genetic algorithms)? o Are there further enhancements that can be made to improve the performance and capabilities of unsupervised learning rules for vision? o What neurophysiological evidence is available regarding these possible enhancements to models of unsupervised learning? o What aspects of the development of visual systems must be genetically pre-wired, and what aspects can be guided or directed by visual experience? o How is the output of an unsupervised network stage used in subsequent stages of processing? o How can behaviorally relevant (sensorimotor) criteria become incorporated into visual processing mechanisms, using unsupervised learning? This 2-day informal workshop brings together researchers in visual neuroscience, visual psychophysics, and neural network modeling. 
Invited speakers from these communities will briefly discuss their views and results on relevant topics. In discussion periods, we will examine and compare these results in detail. The workshop topic is crucial to our understanding of how animal visual systems got the way they are. By addressing this issue head-on, we may come to understand better the factors that shape the structure of animal visual systems, and we may become able to build better computational models of the neurophysiological processes underlying vision.

----------------------------------------------------------------------
PROGRAM

FRIDAY MORNING, December 6, 7:30-9:30 a.m.

Daniel Kersten, Department of Psychology, University of Minnesota. "Environmental structure and scene perception: Perceptual representation of material, shape, and lighting"
David C. Knill, Center for Research in Learning, Perception, and Cognition, University of Minnesota. "Environmental structure and scene perception: The nature of visual cues for 3-D scene structure"
DISCUSSION
Edward M. Callaway, Department of Neurobiology, Duke University. "Development of clustered intrinsic connections in cat striate cortex"
Michael P. Stryker, Department of Physiology, University of California at San Francisco. "Problems and promise of relating theory to experiment in models for the development of visual cortex"
DISCUSSION

FRIDAY AFTERNOON, December 6, 4:30-6:30 p.m.

Joachim M. Buhmann, Lawrence Livermore National Laboratory. "Complexity optimized data clustering by competitive neural networks"
Nicol G. Schraudolph, Department of Computer Science, University of California at San Diego. "The information transparency of sigmoidal nodes"
DISCUSSION
Heinrich H. Bulthoff, Department of Cognitive and Linguistic Sciences, Brown University. "Psychophysical support for a 2D view interpolation theory of object recognition"
John E. Hummel, Department of Psychology, University of California at Los Angeles. "Structural description and self organizing object classification"
DISCUSSION

SATURDAY MORNING, December 7, 7:30-9:30 a.m.

Allan Dobbins, Computer Vision and Robotics Laboratory, McGill University. "Local estimation of binocular optic flow"
Alice O'Toole, School of Human Development, The University of Texas at Dallas. "Recent psychophysics suggesting a reformulation of the computational problem of structure-from-stereopsis"
DISCUSSION
Jonathan A. Marshall, Department of Computer Science, University of North Carolina at Chapel Hill. "Development of perceptual context-sensitivity in unsupervised neural networks: Parsing, grouping, and segmentation"
Suzanna Becker, Department of Computer Science, University of Toronto. "Learning perceptual invariants in unsupervised connectionist networks"
Albert L. Nigrin, Department of Computer Science and Information Systems, American University. "Using Presynaptic Inhibition to Allow Neural Networks to Perform Translational Invariant Recognition"
DISCUSSION

SATURDAY AFTERNOON, December 7, 4:30-7:00 p.m.

Jurgen Schmidhuber, Department of Computer Science, University of Colorado. "Learning non-redundant codes by predictability minimization"
Laurence T. Maloney, Center for Neural Science, New York University. "Geometric calibration of a simple visual system"
DISCUSSION
Paul Munro, Department of Information Science, University of Pittsburgh. "Self-supervised learning of concepts"
Richard Zemel, Department of Computer Science, University of Toronto. "Learning to encode parts of objects"
DISCUSSION

WRAP-UP, 6:30-7:00
----------------------------------------------------------------------
(Please post)

From D.M.Peterson at computer-science.birmingham.ac.uk Fri Nov 29 10:10:40 1991
From: D.M.Peterson at computer-science.birmingham.ac.uk (D.M.Peterson@computer-science.birmingham.ac.uk)
Date: Fri, 29 Nov 91 15:10:40 GMT
Subject: Cognitive Science at Birmingham
Message-ID:

============================================================================
University of Birmingham
Graduate Studies in COGNITIVE SCIENCE
============================================================================

The Cognitive Science Research Centre at the University of Birmingham comprises staff from the Departments/Schools of Psychology, Computer Science, Philosophy and English, and supports teaching and research in the inter-disciplinary investigation of mind and cognition. The Centre offers both MSc and PhD programmes.

MSc in Cognitive Science

The MSc programme is a 12 month conversion course, including a 4 month supervised project. The course places a particular stress on the relation between biological and computational architectures.

Compulsory courses: AI Programming, Overview of Cognitive Science, Knowledge Representation Inference and Expert Systems, General Linguistics, Human Information Processing, Structures for Data and Knowledge, Philosophical Questions in Cognitive Science, Human-Computer Interaction, Biological and Computational Architectures, The Computer and the Mind, Current Issues in Cognitive Science.

Option courses: Artificial and Natural Perceptual Systems, Speech and Natural Language, Parallel Distributed Processing.

It is expected that students will have a good first degree --- psychology, computing, philosophy or linguistics being especially relevant. Funding is available through SERC and HTNT.

PhD in Cognitive Science

For 1992 studentships are expected for PhD level research into a range of topics including:
o computational modelling of emotion
o computational modelling of cognition
o interface design
o computational and psychophysical approaches to vision

Computing Facilities

Students have access to ample computing facilities, including networks of Hewlett-Packard, Sun and Sparc workstations in the Schools of Computer Science and Psychology.

Contact

For further details, contact: The Admissions Tutor, Cognitive Science, School of Psychology, University of Birmingham, PO Box 363, Edgbaston, Birmingham B15 2TT, UK. Phone: (021) 414 3683 Email: cogsci at bham.ac.uk

From andreu at esaii.upc.es Sat Nov 30 09:48:31 1991
From: andreu at esaii.upc.es (andreu@esaii.upc.es)
Date: Sat, 30 Nov 1991 9:48:31 UTC+0200
Subject: Announcement and call for abstracts for Feb. conference
Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu>