From stefano at kant.irmkant.rm.cnr.it Mon Feb 1 03:41:40 1993 From: stefano at kant.irmkant.rm.cnr.it (stefano@kant.irmkant.rm.cnr.it) Date: Mon, 1 Feb 1993 02:41:40 -0600 Subject: No subject Message-ID: <9302010841.AA11465@kant.irmkant.rm.cnr.it> The following paper has been placed in the neuroprose archive as nolfi.self-sel.ps.Z Instructions for retrieving and printing follow the abstract. Self-selection of Input Stimuli for Improving Performance Stefano Nolfi Domenico Parisi Institute of Psychology, CNR V.le Marx 15, 00137 Rome - Italy E-mail: stiva at irmkant.Bitnet domenico at irmkant.Bitnet Abstract A system which behaves in an environment can increase its performance level in two different ways. It can improve its ability to react efficiently to any stimulus that may come from the environment or it can acquire an ability to expose itself only to a sub-class of stimuli to which it knows how to respond efficiently. The possibility that a system can solve a task by selecting favourable stimuli is rarely considered in designing intelligent systems. In this paper we show that this type of ability can play a very powerful role in explaining a system's performance. The paper has been published in: G. A. Bekey (1993), Neural Networks and Robotics, Kluwer Academic Publisher. Sorry, no hard copies are available Comments are welcome. Stefano Nolfi Institute of Psychology, CNR V.le Marx, 15 00137 - Rome - Italy email stiva at irmkant.Bitnet _______________________________________________________________________ Here is an example of how to retrieve this file: gvax> ftp archive.cis.ohio-state.edu (or ftp 128.146.8.52) Connected to archive.cis.ohio-state.edu. 220 archive.cis.ohio-state.edu FTP server ready. Name: anonymous 331 Guest login ok, send ident as password. Password:neuron at wherever 230 Guest login ok, access restrictions apply. ftp> binary 200 Type set to I. ftp> cd pub/neuroprose 250 CWD command successful. ftp> get nolfi.self-sel.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for nolfi.self-sel.ps.Z 226 Transfer complete. ftp> quit 221 Goodbye. gvax> uncompress nolfi.self-sel.ps.Z gvax> lpr nolfi.self-sel.ps  From sbcho%gorai.kaist.ac.kr at DAIDUK.KAIST.AC.KR Tue Feb 2 11:31:50 1993 From: sbcho%gorai.kaist.ac.kr at DAIDUK.KAIST.AC.KR (Sung-Bae Cho) Date: Tue, 2 Feb 93 11:31:50 KST Subject: Paper Announcement Message-ID: <9302020231.AA01990@gorai.kaist.ac.kr.noname> Feedforward Neural Network Architectures for Complex Classification Problems To appear in the Fuzzy Systems & AI journal (Romanian Academia Publishing House). The idea of this paper was presented at the 2nd International Conference on Fuzzy Logic & Neural Networks, Iizuka-92. Sung-Bae Cho (sbcho at gorai.kaist.ac.kr) and Jin H. Kim Center for Artificial Intelligence Research and Computer Science Department Korea Advanced Institute of Science and Technology 373-1, Koosung-dong, Yoosung-ku, Taejeon 305-701, Republic of Korea Abstract This paper presents two neural network design strategies for incorporating a priori knowledge about a given problem into the feedforward neural networks. These strategies aim at obtaining tractability and reliability for solving complex classification problems by neural networks. The first type strategy based on multistage scheme decomposes the problem into manageable ones for reducing the complexity of the problem, and the second type strategy on multiple network scheme combines incomplete decisions from several copies of networks for reliable decision-making. 
A preliminary experiment of recognizing on-line handwriting characters confirms the superiority relative to a single large neural network classifier. Key words: neural network architecture design, multistage neural network, multiple neural networks, synthesis method, voting method, expert judgement, handwriting character recognition ----- Now available in the neuroprose archive: archive.cis.ohio-state.edu (128.146.8.52) pub/neuroprose directory under the file name sbcho.nn_architects.ps.Z (compressed PostScript).  From ro2m at crab.psy.cmu.edu Mon Feb 1 11:34:06 1993 From: ro2m at crab.psy.cmu.edu (Randall C. O'Reilly) Date: Mon, 1 Feb 93 11:34:06 EST Subject: 2 pdp.cns TR's available Message-ID: <9302011634.AA06379@crab.psy.cmu.edu.noname> The following two (related) TR's are now available for electronic ftp or by hardcopy. Instructions follow the abstracts. >>> NOTE THAT THE FTP SITE IS OUR OWN, NOT NEUROPROSE <<< Object Recognition and Sensitive Periods: A Computational Analysis of Visual Imprinting Randall C. O'Reilly Mark H. Johnson Technical Report PDP.CNS.93.1 (Submitted to Neural Computation) Abstract: Evidence from a variety of methods suggests that a localized portion of the domestic chick brain, the Intermediate and Medial Hyperstriatum Ventrale (IMHV), is critical for filial imprinting. Data further suggest that IMHV is performing the object recognition component of imprinting, as chicks with IMHV lesions are impaired on other tasks requiring object recognition. We present a neural network model of translation invariant object recognition developed from computational and neurobiological considerations that incorporates some features of the known local circuitry of IMHV. In particular, we propose that the recurrent excitatory and lateral inhibitory circuitry in the model, and observed in IMHV, produces hysteresis on the activation state of the units in the model and the principal excitatory neurons in IMHV. Hysteresis, when combined with a simple Hebbian covariance learning mechanism, has been shown in earlier work to produce translation invariant visual representations. To test the idea that IMHV might be implementing this type of object recognition algorithm, we have used a simple neural network model to simulate a variety of different empirical phenomena associated with the imprinting process. These phenomena include reversibility, sensitive periods, generalization, and temporal contiguity effects observed in behavioral studies of chicks. In addition to supporting the notion that these phenomena, and imprinting itself, result from the IMHV properties captured in the simplified model, the simulations also generate several predictions and clarify apparent contradictions in the behavioral data. ----------------------------------------------------------------------- The Self-Organization of Spatially Invariant Representations Randall C. O'Reilly James L. McClelland Technical Report PDP.CNS.92.5 Abstract: The problem of computing object-based visual representations can be construed as the development of invariancies to visual dimensions irrelevant for object identity. This view, when implemented in a neural network, suggests a different set of algorithms for computing object-based visual representations than the ``traditional'' approach pioneered by Marr, 1981. A biologically plausible self-organizing neural network model that develops spatially invariant representations is presented. 
There are four features of the self-organizing algorithm that contribute to the development of spatially invariant representations: temporal continuity of environmental stimuli, hysteresis of the activation state (via recurrent activation loops and lateral inhibition in an interactive network), Hebbian learning, and a split pathway between ``what'' and ``where'' representations. These constraints are tested with a backprop network, which allows for the evaluation of the individual contributions of each constraint on the development of spatially invariant representations. Subsequently, a complete model embodying a modified Hebbian learning rule and interactive connectivity is developed from biological and computational considerations. The activational stability and weight function maximization properties of this interactive network are analyzed using a Lyapunov function approach. The model is tested first on the same simple stimuli used in the backprop simulation, and then with a more complex environment consisting of right and left diagonal lines. The results indicate that the hypothesized constraints, implemented in a Hebbian network, were capable of producing spatially invariant representations. Further, evidence for the gradual integration of both featural complexity and spatial invariance over increasing layers in the network, thought to be important for real-world applications, was obtained. As the approach is generalizable to other dimensions such as orientation and size, it could provide the basis of a more complete biologically plausible object recognition system. Indeed, this work forms the basis of a recent model of object recognition in the domestic chick (O'Reilly & Johnson, 1993, TR PDP.CNS.93.1). ----------------------------------------------------------------------- Retrieval information for pdp.cns TRs: unix> ftp 128.2.248.152 # hydra.psy.cmu.edu Name: anonymous Password: ftp> cd pub/pdp.cns ftp> binary ftp> get pdp.cns.93.1.ps.Z # or, and ftp> get pdp.cns.92.5.ps.Z ftp> quit unix> zcat pdp.cns.93.1.ps.Z | lpr # or however you print postscript unix> zcat pdp.cns.92.5.ps.Z | lpr For those who do not have FTP access, physical copies can be requested from Barbara Dorney .  From tresp at inf21.zfe.siemens.de Tue Feb 2 12:29:10 1993 From: tresp at inf21.zfe.siemens.de (Volker Tresp) Date: Tue, 2 Feb 1993 18:29:10 +0100 Subject: paper in neuroprose Message-ID: <199302021729.AA24088@inf21.zfe.siemens.de> The following paper has been placed in the neuroprose archive as tresp.rules.ps.Z Instructions for retrieving and printing follow the abstract. ----------------------------------------------------------------- NETWORK STRUCTURING AND TRAINING USING RULE-BASED KNOWLEDGE ----------------------------------------------------------------- Volker Tresp, Siemens, Central Research Juergen Hollatz, TU Muenchen Subutai Ahmad, Siemens, Central Research Abstract We demonstrate in this paper how certain forms of rule-based knowledge can be used to prestructure a neural network of normalized basis functions and give a probabilistic interpretation of the network architecture. We describe several ways to assure that rule-based knowledge is preserved during training and present a method for complexity reduction that tries to minimize the number of rules and the number of conjuncts. After training, the refined rules are extracted and analyzed. To appear in: S. J. Hanson, J. D. Cowan, and C. L. Giles (Eds.), Advances in Neural Information Processing Systems 5. San Mateo CA: Morgan Kaufmann. 
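As a concrete (if highly simplified) picture of a network of normalized basis functions of the kind described in the abstract, here is a short Python sketch of the forward pass, with each basis unit playing the role of one rule. The rule centres, widths and conclusions are made-up values, and the rule-to-unit mapping shown is only one plausible reading of the abstract, not necessarily the construction used in the paper.

import numpy as np

# Each "rule" of the form  IF x is near c_i THEN y = w_i  is encoded as one
# basis unit with centre c_i, width s_i and output weight w_i.  The values
# below are illustrative; in the paper the rules come from prior knowledge
# and are preserved/refined during training.
centres = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # one row per rule
widths  = np.array([0.5, 0.5, 0.5])                         # one width per rule
weights = np.array([0.0, 1.0, 0.5])                         # rule conclusions

def normalized_basis_output(x):
    """Forward pass of a normalized-basis-function network.

    The unnormalized activations are Gaussians around the rule centres; the
    normalization step makes them sum to one, which is what supports a
    probabilistic (mixture-like) interpretation of the architecture.
    """
    d2 = np.sum((centres - x) ** 2, axis=1)
    act = np.exp(-d2 / (2.0 * widths ** 2))
    act = act / np.sum(act)          # normalization across basis units
    return float(act @ weights)      # weighted sum of rule conclusions

print(normalized_basis_output(np.array([0.9, 0.9])))   # dominated by rule 2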
---- Volker Tresp Siemens AG, Central Research, Phone: +49 89 636-49408 Otto-Hahn-Ring 6, FAX: +49 89 636-3320 W-8000 Munich 83, Germany E-mail: tresp at zfe.siemens.de unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get tresp.rules.ps.Z ftp> quit unix> uncompress tresp.rules.ps.Z unix> lpr -s tresp.rules.ps (or however you print postscript)  From denis at psy.ox.ac.uk Wed Feb 3 10:47:36 1993 From: denis at psy.ox.ac.uk (Denis Mareschal) Date: Wed, 3 Feb 93 15:47:36 GMT Subject: visual tracking Message-ID: <9302031547.AA09779@dragon.psych.pdp> Hi, A couple of months ago I sent around a request for further information concerning higher level connectionist approaches to the development of visual tracking. I received a number of replies spanning the broad range of fields in which neural network research is being conducted. I also received a significant number of requests for the resulting compiled list of references. I am thus posting a list of references resulting directly and indirectly from my original request. I have also included a few relevant psychology review papers. Thanks to all those who replied. Clearly this list is not exhaustive and if anyone reading it notices an ommission which may be of interest I would greatly appreciate hearing from them. Cheers, Denis Mareschal Department of Experimental Psychology South Parks Road Oxford University Oxford OX1 3UD maresch at black.ox.ac.uk REFERENCES: Allen, R. B. (1988), Sequential connectionist networks for answering simple questions about a microworld. In: Proceedings of the Tenth Annual Conference of the Cognitive Science Society, pp. 489-495, Hillsdale, NJ: Erlbaum. Baloch, A. A. & Waxman A. M. (1991). Visual learning, adaptive expectations and behavioral conditioning of the mobile robot MAVIN, Neural Networks, vol. 4, pp. 271-302. Buck, D. S. & Nelson D. E. (1992). Applying the abductory induction mechanism (AIM) to the extrapolation of chaotic time series. In: Proceedings of the National Aerospace Electronics Conference (NAECON), 18-22 May, Dayton, Ohio, vol. 3, pp 910-915. Bremner, J. G. (1985). Object tracking and search in infancy: A review of data and a theoretical evaluation, Developmental Review, 5, pp. 371-396 Carpenter, G. A. & Grossberg, S. (1992). Neural Networks for Vision and Image Processing, Cambridge, MA: MIT Press. Cleermans, A., Servan-Schreiber, D. & McClelland, J. L. (1989). Finite state automata and simple recurrent networks, Neural Computation,1, pp 372- 381. Deno, D. C., Keller, E. L. & Crandall, W. F. (1989). Dynamical neural network organization of the visual pursuit system, IEEE Transactions on Biomedical Engineering, vol. 36, pp. 85-91. Dobnikar, A., Likar, A. & Podbregar, D. (1989). Optimal visual tracking with artificial neural network. In: First I.E.E. International Conference on Artificial Neural Networks (conf. Publ. 313), pp 275-279. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, pp. 179-211. Ensley, D. & Nelson, D. E. (1992). Applying Cascade-correlation to the extrapolation of chaotic time series. Proceedings of the Third Workshop on Neural Networks: Academic/Industrial/NASA/Defense; 10-12 February, Auburn, Alabama. Fay, D. A. & Waxman, A. M. (1992). Neurodynamics of real-time image velocity extraction. In: G. A. Carpenter & S. Grossberg (Eds), Neural Networks for Vision and Image Processing, pp 221-246, Cambridge, MA: MIT Press. Gordon, Steele, & Rossmiller (1991). 
Predicting trajectories using recurrent neural networks. In: Dagli, Kumara, & Shin (Eds), Intelligent Systems Through Artificial Neural Networks, ASME Press. (Sorry that's the best I can do for this reference) Grossberg, S. & Rudd (1989). A neural architecture for visual motion perception: Neural Networks, 2, pp. 421-450. Koch, C. & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4, pp. 219-227. Lisberger, S. G., Morris, E. J. & Tychsen, L. (1987). Visual motion processing and sensory-motor integration for smooth pursuit eye movements, Annual Review of Neuroscience, 10, pp. 97-129. Lumer, E. D. (1992). The phase tracker of attention. In: Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pp 962-967, Hillsdale, NJ: Erlbaum. Neilson, P. D., Neilson, M. D. & O'Dwyer, N. J. (1993, in press). What limits high speed tracking performance?, Human Movement Science, 12. Nelson, D. E., Ensley, D. D. & Rogers, S. K. (1992). Prediction of chaotic time series using Cascade Correlation: Effects of number of inputs and training set size. In: The Society for Optical Engineering (SPIE), Proceedings of the Applications of Artificial Neural Networks III Conference, 21-24 April, Orlando, Florida, vol. 1709, pp 823-829. Marshall, J. A. (1990). Self-organizing neural networks for perception of visual motion, Neural Networks, 3, pp. 45-74. Martin, W. N. & Aggarwal, J. K. (Eds) (1988). Motion Understanding: Robot and Human Vision. Boston: Kluwer Academic Publishers. Metzgen, Y. & Lehmann, D. (1990). Learning temporal sequences by local synaptic changes, Network, 1, pp. 271-302. Nakayama, K. (1985). Biological image motion processing: A review. Vision Research, 25, pp 625-660. Parisi, D., Cecconi, F. & Nolfi, S. (1990). Econets: Neural networks that learn in an environment, Network, 1, pp. 149-168. Pearlmutter, B. A. (1989). Learning state space trajectories in recurrent networks, Neural Computation, 1, pp. 263-269. Regier, T. (1992). The acquisition of lexical semantics for spatial terms: A connectionist model of perceptual categorization. International Computer Science Institute (ICSI) Technical Report TR-92-062, Berkeley. Schmidhuber, J. & Huber, R. (1991). Using adaptive sequential neurocontrol for efficient learning of translation and rotation invariance. In: T. Kohonen, K. Makisara, O. Simula & J. Kangas (Eds), Artificial Neural Networks, pp 315-320, North Holland: Elsevier Science. Schmidhuber, J. & Huber, R. (1991). Learning to generate artificial foveal trajectories for target detection. International Journal of Neural Systems, 2, pp. 135-141. Schmidhuber, J. & Wahnsiedler, R. (1992). Planning simple trajectories using neural subgoal generators. Second International Conference on Simulations of Adaptive Behavior (SAB92). (Available by ftp from Jordan Pollack's Neuroprose Archive). Sereno, M. E. (1986). Neural network model for the measurement of visual motion. Journal of the Optical Society of America A, 3, pp 72. Sereno, M. E. (1987). Implementing stages of motion analysis in neural. Program of the Ninth Annual Conference of the Cognitive Science Society, pp. 405-416, Hillsdale, NJ: Erlbaum. Servan-Schreiber, D., Cleeremans, A. & McClelland, J. L. (1991). Graded state machines: The representation of temporal contingencies in simple recurrent networks, Machine Learning, 7, pp. 161-193. Shimohara, K., Uchiyama, T. & Tokunaya, Y. (1988). Back propagation networks for event-driven temporal sequence processing.
In: IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 665-672, NY: IEEE. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences, Machine Learning, 3, pp 9-44. Tolg, S. (1991). A biologically motivated system to track moving objects by active camera control. In: T. Kohonen, K. Makisara, O. Simula & J. Kangas (Eds), Artificial Neural Networks, pp 1237-1240, North Holland: Elsevier Science. Wechsler, H. (Ed) (1991). Neural Networks for Human and Machine Perception, New York: Academic Press.  From gluck at pavlov.rutgers.edu Wed Feb 3 09:13:20 1993 From: gluck at pavlov.rutgers.edu (Mark Gluck) Date: Wed, 3 Feb 93 09:13:20 EST Subject: Preprint: Computational Models of the Neural Bases of Learning and Memory Message-ID: <9302031413.AA24540@james.rutgers.edu> For (hard copy) preprints of the following article: Gluck, M. A. & Granger, R. C. (1993). Computational models of the neural bases of learning and memory. Annual Review of Neuroscience. 16: 667-706 ABSTRACT: Advances in computational analyses of parallel-processing have made computer simulation of learning systems an increasingly useful tool in understanding complex aggregate functional effects of changes in neural systems. In this article, we review current efforts to develop computational models of the neural bases of learning and memory, with a focus on the behavioral implications of network-level characterizations of synaptic change in three anatomical regions: olfactory (piriform) cortex, cerebellum, and the hippocampal formation. ____________________________________ Send US-mail address to: Mark Gluck (Center for Neuroscience, Rutgers-Newark) gluck at pavlov.rutgers.edu  From robtag at udsab.dia.unisa.it Wed Feb 3 13:22:31 1993 From: robtag at udsab.dia.unisa.it (Tagliaferri Roberto) Date: Wed, 3 Feb 1993 19:22:31 +0100 Subject: course on Hybrid Systems Message-ID: <199302031822.AA08460@udsab.dia.unisa.it> **************** IIASS 1993 February Courses ************** **************** Last Announcement ************** A short course on "Hybrid Systems: Neural Nets, Fuzzy Sets and A.I. Systems" February 9 - 12 Lecturers: Dr. Silvano Colombano, NASA Research Center, CA Prof. Piero Morasso, Univ. Genova, Italia ----------------------------------------------------------------- Dr. Silvano Colombano (4 hours) Introduction: extending the representational power of connectionism The interim approach: hybrid symbolic connectionist systems - Distributed - Localist - Mixed localist and distributed (3 hours) Hybrid Fuzzy Logic connectionist systems - Classification - Control - Reasoning (2 hours) A competing approach: classifier systems Future directions Prof. Piero Morasso (2 hours) Self-organizing Systems and Hybrid Systems Course schedule February 9 3 pm - 6 pm Dr. S. Colombano February 10 3 pm - 6 pm Dr. S. Colombano February 11 3 pm - 6 pm Dr. S. Colombano February 12 3 pm - 5 pm Prof. P. Morasso The course will be held at IIASS, via G. Pellegrino, Vietri s/m (Sa) Italia. Participants will pay their own fare and travel expenses. No fees to be paid. The short course is sponsored by Progetto Finalizzato CNR "Sistemi Informatici e Calcolo Parallelo" and by Contratto quinquennale CNR-IIASS For any information about the short course, please contact the IIASS secretariat I.I.A.S.S Via G.Pellegrino, 19 I-84019 Vietri Sul Mare (SA) ITALY Tel. +39 89 761167 Fax +39 89 761189 or Dr.
Roberto Tagliaferri E-Mail robtag at udsab.dia.unisa.it  From uli at ira.uka.de Thu Feb 4 12:06:41 1993 From: uli at ira.uka.de (Uli Bodenhausen) Date: Thu, 04 Feb 93 18:06:41 +0100 Subject: new papers in the neuroprose archive Message-ID: The following papers have been placed in the neuroprose archive as bodenhausen.application_oriented.ps.Z bodenhausen.architectural_learning.ps.Z Instructions for retrieving and printing follow the abstracts. 1.) CONNECTIONIST ARCHITECTURAL LEARNING FOR HIGH PERFORMANCE CHARACTER AND SPEECH RECOGNITION Ulrich Bodenhausen and Stefan Manke University of Karlsruhe and Carnegie Mellon University Highly structured neural networks like the Time-Delay Neural Network (TDNN) can achieve very high recognition accuracies in real world applications like handwritten character and speech recognition systems. Achieving the best possible performance greatly depends on the optimization of all structural parameters for the given task and amount of training data. We propose an Automatic Structure Optimization (ASO) algorithm that avoids time-consuming manual optimization and apply it to Multi State Time-Delay Neural Networks, a recent extension of the TDNN. We show that the ASO algorithm can construct efficient architec tures in a single training run that achieve very high recognition accuracies for two handwritten character recognition tasks and one speech recognition task. (only 4 pages!) To appear in the proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 93, Minneapolis -------------------------------------------------------------------------- 2.) Application Oriented Automatic Structuring of Time-Delay Neural Networks for High Performance Character and Speech Recognition Ulrich Bodenhausen and Alex Waibel University of Karlsruhe and Carnegie Mellon University Highly structured artificial neural networks have been shown to be superior to fully connected networks for real-world applications like speech recognition and handwritten character recognition. These structured networks can be optimized in many ways, and have to be optimized for optimal performance. This makes the manual optimization very time consuming. A highly structured approach is the Multi State Time Delay Neural Network (MSTDNN) which uses shifted input windows and allows the recognition of sequences of ordered events that have to be observed jointly. In this paper we propose an Automatic Structure Optimization (ASO) algorithm and apply it to MSTDNN type networks. The ASO algorithm optimizes all relevant parameters of MSTDNNs automatically and was successfully tested with three different tasks and varying amounts of training data. (6 pages, more detailed than the first paper) To appear in the ICNN 93 proceedings, San Francisco. -------------------------------------------------------------------------- unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get bodenhausen.application_oriented.ps.Z ftp> get bodenhausen.architectural_learning.ps.Z ftp> quit unix> uncompress bodenhausen.application_oriented.ps.Z unix> uncompress bodenhausen.architectural_learning.ps.Z unix> lpr -s bodenhausen.application_oriented.ps (or however you print postscript) unix> lpr -s bodenhausen.architectural_learning.ps Thanks to Jordan Pollack for providing this service!  
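For readers unfamiliar with the shifted-input-window idea behind the TDNN and MSTDNN architectures described in the abstracts above, here is a minimal single-layer sketch in Python. The window length, feature sizes and random weights are arbitrary illustrative choices; a real recognizer would stack several such layers and train the shared weights, and the structural parameters are exactly what the ASO algorithm above optimizes automatically.

import numpy as np

def time_delay_layer(frames, W, b):
    """Apply one time-delay layer to a sequence of input frames.

    frames : array of shape (T, n_in)        -- e.g. spectral or pen frames
    W      : array of shape (d, n_in, n_out) -- weights shared over a window of d frames
    b      : array of shape (n_out,)
    The same weights are applied to every shifted window, which is what
    gives the layer its shift (time) invariance.
    """
    d, n_in, n_out = W.shape
    T = frames.shape[0]
    out = np.zeros((T - d + 1, n_out))
    for t in range(T - d + 1):
        window = frames[t:t + d]                            # shifted input window
        out[t] = np.tanh(np.einsum('di,dio->o', window, W) + b)
    return out

# toy usage with made-up sizes: 20 frames of 16 features, window of 3, 8 hidden units
rng = np.random.default_rng(0)
h = time_delay_layer(rng.normal(size=(20, 16)),
                     rng.normal(scale=0.1, size=(3, 16, 8)),
                     np.zeros(8))
print(h.shape)   # (18, 8)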
From moody at chianti.cse.ogi.edu Thu Feb 4 20:38:08 1993 From: moody at chianti.cse.ogi.edu (John Moody) Date: Thu, 4 Feb 93 17:38:08 -0800 Subject: NATO ASI: March 5 Deadline Approaching Message-ID: <9302050138.AA00659@chianti.cse.ogi.edu> As the March 5th application deadline is now four weeks away, I am posting this notice again. NATO Advanced Studies Institute (ASI) on Statistics and Neural Networks June 21 - July 2, 1993, Les Arcs, France Directors: Professor Vladimir Cherkassky, Department of Electrical Eng., University of Minnesota, Minneapolis, MN 55455, tel.(612)625-9597, fax (612)625- 4583, email cherkass at ee.umn.edu Professor Jerome H. Friedman, Statistics Department, Stanford University, Stanford, CA 94309 tel(415)723-9329, fax(415)926-3329, email jhf at playfair.stanford.edu Professor Harry Wechsler, Computer Science Department, George Mason University, Fairfax VA22030, tel(703)993-1533, fax(703)993-1521, email wechsler at gmuvax2.gmu.edu List of invited lecturers: I. Alexander, L. Almeida, A. Barron, A. Buja, E. Bienenstock, G. Carpenter, V. Cherkassky, T. Hastie, F. Fogelman, J. Friedman, H. Freeman, F. Girosi, S. Grossberg, J. Kittler, R. Lippmann, J. Moody, G. Palm, R. Tibshirani, H. Wechsler, C. Wellekens Objective, Agenda and Participants: Nonparametric estimation is a problem of fundamental importance for many applications involving pattern classification and discrimination. This problem has been addressed in Statistics, Pattern Recognition, Chaotic Systems Theory, and more recently in Artificial Neural Network (ANN) research. This ASI will bring together leading researchers from these fields to present an up-to-date review of the current state-of-the art, to identify fundamental concepts and trends for future development, to assess the relative advantages and limitations of statistical vs neural network techniques for various pattern recognition applications, and to develop a coherent framework for the joint study of Statistics and ANNs. Topics range from theoretical modeling and adaptive computational methods to empirical comparisons between statistical and neural network techniques. Lectures will be presented in a tutorial manner to benefit the participants of ASI. A two-week programme is planned, complete with lectures, industrial/government sessions, poster sessions and social events. It is expected that over seventy students (which can be researchers or practitioners at the post-graduate or graduate level) will attend, drawn from each NATO country and from Central and Eastern Europe. The proceedings of ASI will be published by Springer-Verlag. Applications: Applications for participation at the ASI are sought. Prospective students, industrial or government participants should send a brief statement of what they intend to accomplish and what form their participation would take. Each application should include a curriculum vitae, with a brief summary of relevant scientific or professional accomplishments, and a documented statement of financial need (if funds are applied for). Optionally, applications may include a one page summary for making a short presentation at the poster session. Poster presentations focusing on comparative evaluation of statistical and neural network methods and application studies are especially sought. For junior applicants, support letters from senior members of the professional community familiar with the applicant's work would strengthen the application. 
Prospective participants from Greece, Portugal and Turkey are especially encouraged to apply. Costs and Funding: The estimated cost of hotel accommodations and meals for the two-week duration of the ASI is US$1,600. In addition, participants from industry will be charged an industrial registration fee, not to exceed US$1,000. Participants representing industrial sponsors will be exempt from the fee. We intend to subsidize costs of participants to the maximum extent possible by available funding. Prospective participants should also seek support from their national scientific funding agencies. The agencies, such as the American NSF or the German DFG, may provide some ASI travel funds upon the recommendation of an ASI director. Additional funds exist for students from Greece, Portugal and Turkey. We are also seeking additional sponsorship of ASI. Every sponsor will be fully acknowledged at the ASI site as well as in the printed proceedings. Correspondence and Registration: Applications should be forwarded to Dr. Cherkassky at the above address. Applications arriving after March 5, 1993 may not be considered. All approved applicants will be informed of the exact registration arrangements. Informal email inquiries can be addressed to Dr. Cherkassky at nato_asi at ee.umn.edu  From takagi at diva.berkeley.edu Thu Feb 4 21:48:15 1993 From: takagi at diva.berkeley.edu (Hideyuki Takagi) Date: Thu, 4 Feb 93 18:48:15 -0800 Subject: BISC Special Seminar Message-ID: <9302050248.AA02922@diva.Berkeley.EDU> Dear Colleagues: We will hold the BISC Special Seminar at UC Berkeley one day before FUZZ-IEEE'93/ICNN'93. Please forward the following announcement to widely. Hideyuki TAKAGI ----------------------------------------------------------------------- EXTENDED BISC SPECIAL SEMINAR 10:30AM-5:45PM, March 28 (Sunday), 1993 Sibley Auditorium (210) in Bechtel Hall University of California, Berkeley CA 94720 BISC (Berkeley Initiative for Soft Computing) of UC Berkeley will hold a Special Seminar to take advantage of the presence in the San Francisco area of the luminaries attending FUZZ-IEEE'93/ICNN'93. We hope that your schedule will allow you to participate. PROGRAM: 10:30-11:00 Lotfi A. Zadeh (Univ. of California, Berkeley) Soft Computing 11:00-12:00 Hidetomo Ichihashi / Univ. of Osaka Prefecture Neuro-Fuzzy Approaches to Optimization and Inverse Problems 12:00- 1:30 (lunch) 1:30- 2:30 Philippe Smets (Iridia Universite Libre de Bruxelles) Imperfect information : Imprecision - Uncertainty 2:30- 3:30 Teuvo Kohonen (Helsinki University of Technology) Competitive-Learning Neural Networks are closest to Biology 3:30- 3:45 (break) 3:45- 4:45 Michio Sugeno (Tokyo Institute of Technology) Fuzzy Modeling towards Qualitative Modeling 4:45- 5:45 Hugues Bersini (Iridia Universite Libre de Bruxelles) The Immune Learning Mechanisms: Reinforcement, Recruitment and their Applications REGISTRATION: Attendance is free and registration is not required. HOW TO GET HERE: [BART subway from San Francisco downtown] The closest station to the SF Hilton Hotel is the Powell Str. Station. Berkeley is a safe 24 minute ride from the Powell Str. Station. You must catch the Concord bound train and transfer onto a Richmond bound train at the Oakland City Center-12th Str. Station. Trains on Sunday rendezvous every 20 minutes as indicated below. Powell 12th Str. 
Berkeley 8:17 ---- 8:31 8:31 ---- 8:41 8:37 ---- 8:51 8:51 ---- 9:01 8:57 ---- 9:11 9:11 ---- 9:21 9:17 ---- 9:31 9:31 ---- 9:41 9:37 ---- 9:51 9:51 ---- 10:01 It takes 15-20 minutes on foot from the Berkeley BART Station to reach Bechtel Hall, which is located on the North-East part of campus. Bechtel Hall is just North of Evans Hall, home of the Computer Science Division. North Gate is the nearest campus gate. [TAXI] You can take a taxi from the front of the Berkeley BART Station. Ask the taxi driver to enter from East Gate on campus and let you off at Mining Circle. The tallest building adjacent to the circle is Evans Hall. Bechtel Hall is just north of the Evans. [CAR] Get off at the University Ave. exit from Interstate 80. The east end of University Ave. is the West Gate to UC Berkeley. Most street parking is free on Sunday, but it may be scarce and remember to read the signs. If you feel you must park in a lot, we recommend UCB Parking Structure H which is located at the corner of Hearst and La Loma Avenues. You must buy an all day parking ticket from the vending machine located on the 2nd level (the only one in the structure). You need to prepare 12 quarters. Illegal parking in Berkeley is expensive. CONTACT ADDRESS: Hideyuki TAKAGI, Coordinator of this seminar (takagi at cs.berkeley.edu) Lotfi A. Zadeh, Director of BISC (zadeh at cs.berkeley.edu) Computer Science Division University of California at Berkeley Berkeley, CA 94720 FAX <+1>510-642-5775  From ira at linus.mitre.org Fri Feb 5 10:06:53 1993 From: ira at linus.mitre.org (ira@linus.mitre.org) Date: Fri, 5 Feb 93 10:06:53 -0500 Subject: vision position posting Message-ID: <9302051506.AA09737@ellington.mitre.org> Neural Network Vision Research Position The MITRE Corporation is looking for a Vision Modeler with an excellent math background, knowledge of signal processing techniques, considerable experience modeling biological low-level vision processes and broad knowledge of current neural network learning algorithm research. This is an *applied* research position which has as its goal the application of vision modeling techniques to real tasks such as 2D and 3D object recognition in synthetic and real world imagery. This position requires software implementation of models in C language. The position may also involve management responsibilities. The position is located in Bedford, Massachusetts. We are looking for someone with availability within the next two months. Interested applicants should send a resume and representative publications to: Ira Smotroff Lead Scientist The MITRE Corporation MS K331 202 Burlington Rd. Bedford, MA 01730-1420  From heiniw at sun1.eeb.ele.tue.nl Fri Feb 5 09:56:17 1993 From: heiniw at sun1.eeb.ele.tue.nl (Heini Withagen) Date: Fri, 5 Feb 1993 15:56:17 +0100 (MET) Subject: Does backprop need the derivative ?? Message-ID: <9302051456.AA02038@sun1.eeb.ele.tue.nl> A non-text attachment was scrubbed... Name: not available Type: text Size: 1054 bytes Desc: not available Url : https://mailman.srv.cs.cmu.edu/mailman/private/connectionists/attachments/00000000/b760eda0/attachment.ksh From wallyn at capsogeti.fr Fri Feb 5 13:05:20 1993 From: wallyn at capsogeti.fr (Alexandre Wallyn) Date: Fri, 5 Feb 93 19:05:20 +0100 Subject: Neural networks in Product modelling Message-ID: <9302051805.AA13434@gizmo> I am trying to evaluate the state of the art in the connectionist applications in Product Modelling (or engineering design). 
After looking in several journals (Neural Networks, IJCNN proceedings, Neuro-Nimes, and some history of connectionist mailing list), I only found: "Neural Network in Engineering Design" (H.Adeli, IJCNN 1990) (very general) Indirect quotations of general work in AI Wright University (1988) Modelling of MOS components in University of Dortmund (1990) and CadChem product of AIWare for product modelling and chemical formulation (seem to be uses by General Tire and Good Year). Are these applications in product modelling so scarce, or are they published in other forums ? I thank you in advance for your help. I will, of course, publish a summary of the replies. Alexandre Wallyn CAP GEMINI INNOVATION 86-90, rue Thiers 92513 BOULOGNE FRANCE wallyn at capsogeti.fr  From ira at linus.mitre.org Fri Feb 5 10:14:46 1993 From: ira at linus.mitre.org (ira@linus.mitre.org) Date: Fri, 5 Feb 93 10:14:46 -0500 Subject: vision position: US Citizens only Message-ID: <9302051514.AA09747@ellington.mitre.org> Sorry to clutter your mail boxes. The Neural Network Vision Position at The MITRE Corporation is open only to US Citizens. Ira Smotroff  From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Feb 5 22:55:28 1993 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 05 Feb 93 22:55:28 EST Subject: Does backprop need the derivative ?? In-Reply-To: Your message of Fri, 05 Feb 93 15:56:17 +0100. <9302051456.AA02038@sun1.eeb.ele.tue.nl> Message-ID: In his paper, 'An Empirical Study of Learning Speed in Back-Propagation Networks', Scott E. Fahlmann shows that with the encoder/decoder problem it is possible to replace the derivative of the transfer function by a constant. I have been able to reproduce this example. However, for several other examples, it was not possible to get the network converged using a constant for the derivative. Interesting. I just tried this on encoder problems and a couple of other simple things, and leapt to the conclusion that it was a general phenomenon. It seems plausible to me that any "derivative" function that preserves the sign of the error and doesn't have a "flat spot" (stable point of 0 derivative) would work OK, but I don't know of anyone who has made an extensive study of this. I'd be interested in hearing more about the problems you've encountered and about any results others send to you. -- Scott =========================================================================== Scott E. Fahlman Internet: sef+ at cs.cmu.edu Senior Research Scientist Phone: 412 268-2575 School of Computer Science Fax: 412 681-5739 Carnegie Mellon University Latitude: 40:26:33 N 5000 Forbes Avenue Longitude: 79:56:48 W Pittsburgh, PA 15213 ===========================================================================  From marwan at sedal.su.oz.au Sat Feb 6 07:49:53 1993 From: marwan at sedal.su.oz.au (Marwan Jabri) Date: Sat, 6 Feb 1993 23:49:53 +1100 Subject: Does backprop need the derivative ?? Message-ID: <9302061249.AA17234@sedal.sedal.su.OZ.AU> As the intention of the inquirer is the analog implementation of backprop, I see two problems: 1- the question whether the derivative can be replaced by a constant, and more importantly 2- whether the precision of the analog implementation will be high enough for backprop to work. Regarding (1), it is likely as Scott Fahlman suggested any derivative that "preserves" the error sign may do the job. 
The question however is the implication in terms of convergence speed, and the comparison thereof with perturbation type training methods. Regarding (2), there has been several reports indicating that backpropagation simply does not work when the number of bits is reduced towards 6-8 bits! Marwan ------------------------------------------------------------------- Marwan Jabri Email: marwan at sedal.su.oz.au Senior Lecturer Tel: (+61-2) 692-2240 SEDAL, Electrical Engineering, Fax: 660-1228 Sydney University, NSW 2006, Australia Mobile: (+61-18) 259-086  From jlm at crab.psy.cmu.edu Sat Feb 6 08:39:43 1993 From: jlm at crab.psy.cmu.edu (James L. McClelland) Date: Sat, 6 Feb 93 08:39:43 EST Subject: Does backprop need the derivative ?? In-Reply-To: Scott_Fahlman@sef-pmax.slisp.cs.cmu.edu's message of Fri, 05 Feb 93 22:55:28 EST Message-ID: <9302061339.AA19977@crab.psy.cmu.edu.noname> Re the discussion concerning replacing the derivative of the activations of units with a constant: Some work has been done using the activation rather than the derivative of the activation by Nestor Schmajuk. He is interested in biologically plausible models and tends to keep hidden units in the bottom half of the sigmoid. In that case they can be approximated by exponentials and so the derivative can be approximated by the activation. Approx ref: Schmajuk and DiCarlo, Psychological Review, 1992 - Jay McClelland  From ljubomir at darwin.bu.edu Sat Feb 6 11:17:56 1993 From: ljubomir at darwin.bu.edu (Ljubomir Buturovic) Date: Sat, 6 Feb 93 11:17:56 -0500 Subject: Does backprop need the derivative ?? Message-ID: <9302061617.AA13641@darwin.bu.edu> Mr. Heini Withagen says: > I am working on an analog chip implementing a feedforward > network and I am planning to incorporate backpropagation learning > on the chip. If it would be the case that the backpropagation > algorithm doesn't need the derivative, it would simplify the > design enormously. We have trained multilayer perceptron without derivatives, using simplex algorithm for multidimensional optimization (not to be confused with simplex algorithm for linear programming). From our experiments, it turns out that it can be done, however the number of weights is seriously limited, since the memory complexity of simplex is N^2, where N is the total number of variable weights in the network. See reference for further details (the reference is available as a LaTeX file from ljubomir at darwin.bu.edu). Lj. Buturovic, Lj. Citkusev, ``Back Propagation and Forward Propagation,'' in Proc. Int. Joint Conf. Neural Networks, (Baltimore, MD), 1992, pp. IV-486 -- IV-491. Ljubomir Buturovic Boston University BioMolecular Engineering Research Center 36 Cummington Street, 3rd Floor Boston, MA 02215 office: 617-353-7123 home: 617-738-6487  From gary at cs.ucsd.edu Sat Feb 6 11:20:57 1993 From: gary at cs.ucsd.edu (Gary Cottrell) Date: Sat, 6 Feb 93 08:20:57 -0800 Subject: Does backprop need the derivative ?? Message-ID: <9302061620.AA29550@odin.ucsd.edu> I happen to know it doesn't work for a more complicated encoder problem: Image compression. When Paul Munro & I were first doing image compression back in 86, the error would go down and then back up! Rumelhart said: "there's a bug in your code" and indeed there was: we left out the derivative on the hidden units. -g.  From radford at cs.toronto.edu Sun Feb 7 12:24:15 1993 From: radford at cs.toronto.edu (Radford Neal) Date: Sun, 7 Feb 1993 12:24:15 -0500 Subject: Does backprop need the derivative? 
Message-ID: <93Feb7.122429edt.227@neuron.ai.toronto.edu> Other posters have discussed, regarding backprop... > ... the question whether the derivative can be replaced by a constant, To clarify, I believe the intent is that the "constant" have the same sign as the derivative, but have constant magnitude. Marwan Jabri says... > Regarding (1), it is likely as Scott Fahlman suggested any derivative > that "preserves" the error sign may do the job. One would expect this to work only for BATCH training. On-line training approximates the batch result only if the net result of updating the weights on many training cases mimics the summing of derivatives in the batch scheme. This will not be the case if a training case where the derivative is +0.00001 counts as much as one where it is +10000. This is not to say it might not work in some cases. There's just no reason to think that it will work generally. Radford Neal  From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Feb 7 12:56:03 1993 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 07 Feb 93 12:56:03 EST Subject: Does backprop need the derivative ?? In-Reply-To: Your message of Sat, 06 Feb 93 23:49:53 +1100. <9302061249.AA17234@sedal.sedal.su.OZ.AU> Message-ID: As the intention of the inquirer is the analog implementation of backprop, I see two problems: 1- the question whether the derivative can be replaced by a constant, and more importantly 2- whether the precision of the analog implementation will be high enough for backprop to work. ... Regarding (2), there has been several reports indicating that backpropagation simply does not work when the number of bits is reduced towards 6-8 bits! It is true that several studies show a sudden failure of backprop learning when you use fixnum arithmetic and reduce the number of bits per word. The point of failure seems to be problem-specific, but is often around 10-14 bits (including sign). Marcus Hoehfeld and I studied this issue and found that the source of the failure was a quantization effect: the learning algorithm needs to accumulate lots of small steps, for weight-update or whatever, and since these are smaller than half the low-order bit, it ends up accumulating a lot of zeros instead. We showed that if a form of probabilistic rounding (dithering) is used to smooth over these quantization steps, learning continues on down to 4 bits or fewer, with only a gradual degradation in learning time, number of units/weights required, and quality of the result. This study used Cascor, but we believe that the results hold for backprop as well. Marcus Hoehfeld and Scott E. Fahlman (1992) "Learning with Limited Numerical Precision Using the Cascade-Correlation Learning Algorithm" in IEEE Transactions on Neural Networks, Vol. 3, no. 4, July 1992, pp. 602-611. Of course, a learning system implemented in analog hardware might have only a few bits of accuracy due to noise and nonlinearity in the circuits, but it wouldn't suffer from this quantization effect, since you get a sort of probabilistic dithering for free. -- Scott =========================================================================== Scott E.
Fahlman Internet: sef+ at cs.cmu.edu Senior Research Scientist Phone: 412 268-2575 School of Computer Science Fax: 412 681-5739 Carnegie Mellon University Latitude: 40:26:33 N 5000 Forbes Avenue Longitude: 79:56:48 W Pittsburgh, PA 15213 ===========================================================================  From kolen-j at cis.ohio-state.edu Sun Feb 7 11:31:20 1993 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Sun, 7 Feb 93 11:31:20 -0500 Subject: Does backprop need the derivative ?? In-Reply-To: "James L. McClelland"'s message of Sat, 6 Feb 93 08:39:43 EST <9302061339.AA19977@crab.psy.cmu.edu.noname> Message-ID: <9302071631.AA19877@pons.cis.ohio-state.edu> Back prop does not need THE derivative. I have some empirical results which show that most of the internal mathematical operators of back prop can be replaced by qualitatively similar operators. I'm not talking about reducing bit width, as most of the literature does. I was interested in what happens when you replace multiplication with maximum, the sigmoid with a generic bump, etc. What was suprising was that all the tweeks basically worked. Back prop is "functionally" stable in the sense that the learning functional ability remains regardless of minor shifts in internal organization. The reason that the reduced accuracy results are the way that they are can be traced to the loss of continuity rather than the loss of bits. John Kolen  From gary at cs.UCSD.EDU Sun Feb 7 13:09:19 1993 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Sun, 7 Feb 93 10:09:19 -0800 Subject: Does backprop need the derivative ?? Message-ID: <9302071809.AA00283@odin.ucsd.edu> The sign is always positive. Hence not using it is an approximation that preserves the sign. -g.  From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Feb 7 13:02:42 1993 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 07 Feb 93 13:02:42 EST Subject: Does backprop need the derivative ?? In-Reply-To: Your message of Sat, 06 Feb 93 08:20:57 -0800. <9302061620.AA29550@odin.ucsd.edu> Message-ID: I happen to know it doesn't work for a more complicated encoder problem: Image compression. When Paul Munro & I were first doing image compression back in 86, the error would go down and then back up! Rumelhart said: "there's a bug in your code" and indeed there was: we left out the derivative on the hidden units. -g. I can see why not using the true derivative of the sigmoid, but just an approximation that preserves the sign, might cause learning to bog down, but I don't offhand see how it could cause the error to go up, at least in a net with only one hidden layer and with a monotonic activation function. I wonder if this problem would also occur in a net using the "sigmoid prime offset", which adds a small constant to the derivative of the sigmoid. I haven't seen it. -- Scott =========================================================================== Scott E. Fahlman Internet: sef+ at cs.cmu.edu Senior Research Scientist Phone: 412 268-2575 School of Computer Science Fax: 412 681-5739 Carnegie Mellon University Latitude: 40:26:33 N 5000 Forbes Avenue Longitude: 79:56:48 W Pittsburgh, PA 15213 ===========================================================================  From marwan at sedal.su.oz.au Sun Feb 7 18:13:36 1993 From: marwan at sedal.su.oz.au (Marwan Jabri) Date: Mon, 8 Feb 1993 10:13:36 +1100 Subject: Does backprop need the derivative ?? 
Message-ID: <9302072313.AA24874@sedal.sedal.su.OZ.AU> > It is true that several studies show a sudden failure of backprop learning > when you use fixnum arithmetic and reduce the number of bits per word. The > point of failure seems to be problem-specific, but is often around 10-14 > bits (including sign). > > Marcus Hoehfeld and I studied this issue and found that the source of the > failure was a quantization effect: the learning algorithm needs to > accumulate lots of small steps, for weight-update or whatever, and since > these are smaller than half the low-order bit, it ends up accumulating a > lot of zeros instead. We showed that if a form of probabilistic rounding > (dithering) is used to smooth over these quantization steps, learning > continues on down to 4 bits or fewer, with only a gradual degradation in > learning time, number of units/weights required, and quality of the result. > This study used Cascor, but we believe that the results hold for backprop > as well. > > Marcus Hoehfeld and Scott E. Fahlman (1992) "Learning with Limited > Numerical Precision Using the Cascade-Correlation Learning Algorithm" > in IEEE Transactions on Neural Networks, Vol. 3, no. 4, July 1992, pp. > 602-611. > Yun Xie and I have tried similar experiments on the Sonar and ECG data, and it is fair to say that standard backprop gives up at about 10 bits [2]. In a closer look at the quantisation effects you would find that the signal/noise ratio depends on the number of layers [1]. As you go deeper you require less precision. This would be a source of variation between backprop and cascor. > Of course, a learning system implemented in analog hardware might have only > a few bits of accuracy due to noise and nonlinearity in the circuits, but > it wouldn't suffer from this quantization effect, since you get a sort of > probabilistic dithering for free. > Hmmm... precision also suffers from the number of operations in analog implementations. The free dithering you get is everywhere, including in your errors! The gradient descent turns into a yoyo. This is well explained in [2, 3]. The best way of using backprop or, more efficiently, conjugate gradient is to do the training off-chip and then to download the (truncated) weights. Our experience in the training of real analog chips shows that some further in-loop training is required. Note our chips were ultra low power and you may have fewer problems with strong inversion implementations. Regarding the idea of Simplex that has been suggested: the inquirer was talking about on-chip learning. Have you in your experiments done a limited precision Simplex? Have you tried it on a chip in in-loop mode? Philip Leong here has tried a similar idea (I think) a while back. The problem with this approach is that you need to have a very good guess at your starting point as the Simplex will move you from one vertex (feasible solution) to another while expanding the weight solution space. Philip's experience is that it does work for small problems when you have a good guess! At the last NIPS, there were 4 posters about learning in or for analog chips. The inquirer may wish to consult these papers (two at least were advertised as deposited in the neuroprose archive, one by Gert Cauwenberghs and one by Barry Flower and I). So far, for us, the most reliable analog chip training algorithm has been the combined search algorithm (modified weight perturbation and partial random search) [3]. I will be very interested in hearing more about experiments where analog chips are trained.
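As a concrete illustration of the two alternatives discussed in this thread -- replacing the logistic derivative inside backprop by a sign-preserving constant, and estimating gradients by weight perturbation so that no derivatives are needed at all -- here is a minimal Python sketch. The toy two-layer network, the learning rates, the constant 0.25 and the perturbation size are arbitrary choices for illustration; in particular, the second function is plain weight perturbation, not the combined search algorithm of [3].

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(W1, W2, x, t, lr=0.5, constant_deriv=False):
    """One on-line backprop step for a 2-layer sigmoid net (squared error).
    If constant_deriv is True, the true logistic derivative y*(1-y) is
    replaced by the sign-preserving constant 0.25 (its maximum value)."""
    xb = np.append(x, 1.0)                  # input plus bias
    h  = sigmoid(xb @ W1)
    hb = np.append(h, 1.0)
    y  = sigmoid(hb @ W2)
    dy = 0.25 if constant_deriv else y * (1.0 - y)
    dh = 0.25 if constant_deriv else h * (1.0 - h)
    delta_out = (t - y) * dy
    delta_hid = (W2[:-1] @ delta_out) * dh
    W2 += lr * np.outer(hb, delta_out)
    W1 += lr * np.outer(xb, delta_hid)
    return W1, W2

def weight_perturbation_step(forward, w, X, T, delta=0.01, lr=0.2):
    """Derivative-free alternative: perturb each weight in turn, measure the
    resulting change in error through the (possibly analog, in-loop) forward
    pass, and update using the finite-difference estimate of the gradient."""
    def err(wv):
        return float(np.mean((forward(wv, X) - T) ** 2))
    base = err(w)
    grad = np.zeros_like(w)
    for i in range(w.size):
        wp = w.copy()
        wp.flat[i] += delta                 # perturb one weight at a time
        grad.flat[i] = (err(wp) - base) / delta
    return w - lr * grad, base

# toy usage: a 2-2-1 net on XOR with the constant-derivative variant;
# whether this converges as well as the exact-derivative version is
# exactly the question raised in this thread.
rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, (3, 2))
W2 = rng.uniform(-1, 1, (3, 1))
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
for _ in range(5000):
    for x, t in zip(X, T):
        W1, W2 = backprop_step(W1, W2, x, t, constant_deriv=True)

In the weight-perturbation case the forward pass is treated as a black box, which is why this family of methods is attractive for in-loop training of limited-precision analog hardware, where analytic derivatives and very small weight updates are unreliable.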
Marwan [1] Yun Xie and M. Jabri, Analysis of the Effects of Quantization in Multi-layer Neural Networks Using A Statistical Model, IEEE Transactions on Neural Networks, Vol. 3, No. 2, pp. 334-338, March, 1992. [2] M. Jabri, S. Pickard, P. Leong and Y. Xie, Algorithms and Implementation Issues in Analog Low Power Learning Neural Network Chips, To appear in the International Journal on VLSI Signal Processing, early 1993, USA. [3] Y. Xie and M. Jabri, On the Training of Limited Precision Multi-layer Perceptrons. Proceedings of the International Joint Conference on Neural Networks, pp III-942-947, July 1992, Baltimore, USA. ------------------------------------------------------------------- Marwan Jabri Email: marwan at sedal.su.oz.au Senior Lecturer Tel: (+61-2) 692-2240 SEDAL, Electrical Engineering, Fax: 660-1228 Sydney University, NSW 2006, Australia Mobile: (+61-18) 259-086  From takagi at diva.berkeley.edu Sun Feb 7 14:36:59 1993 From: takagi at diva.berkeley.edu (Hideyuki Takagi) Date: Sun, 7 Feb 93 11:36:59 -0800 Subject: attendance restriction at BISC Special Seminar Message-ID: <9302071936.AA00803@diva.Berkeley.EDU> ORGANIZATIONAL CHANGE in Extended BISC Special Seminar 10:30AM-5:45PM, March 28 (Sunday), 1993 Sibley Auditorium (210) in Bechtel Hall University of California, Berkeley CA 94720 Dear Colleagues: This is to inform you of an organizational change in the Extended BISC Special Seminar which was announced on February 4. Most of the speakers in the regular BISC Seminar are associated with companies and universities in the Bay area. The motivation for the Extended BISC Seminar was to take advantage of the presence in the Bay area of some of the leading contributors to fuzzy logic and neural network theory from abroad, who will be participating in FUZZ-IEEE'93 / ICNN'93. A problem which became apparent is that because both the Extended BISC Seminar and the FUZZ-IEEE'93/ICNN'93 tutorials are scheduled to take place on the same day, the BISC Seminar may have an adverse effect on registration for the conference tutorials. To resolve this problem, it was felt that it may be necessary to restrict attendance at the Extended BISC Seminar to students and faculty in the Bay area who normally attend the BISC Seminar. In this way, the Extended BISC Seminar would serve its usual role and at the same time bring to the Berkeley Campus some of the leading contributors to soft computing. The publicity for the Extended BISC Seminar will state that attendance is limited to students and faculty in the Bay area. Sincerely, BISC (Berkeley Initiative for Soft Computing) ---------------------------------------------  From mav at cs.uq.oz.au Sun Feb 7 19:33:21 1993 From: mav at cs.uq.oz.au (Simon Dennis) Date: Mon, 08 Feb 93 10:33:21 +1000 Subject: Learning in Memory Technical Report Message-ID: <9302080033.AA10081@uqcspe.cs.uq.oz.au> The following technical report is available for anonymous ftp. TITLE: Integrating Learning into Models of Human Memory: The Hebbian Recurrent Network AUTHORS: Simon Dennis and Janet Wiles ABSTRACT: We develop an interactive model of human memory called the Hebbian Recurrent Network (HRN) which integrates work in the mathematical modeling of memory with that in error correcting connectionist networks. It incorporates the matrix model (Pike, 1984; Humphreys, Bain & Pike, 1989) into the Simple Recurrent Network (SRN, Elman, 1989).
The result is an architecture which has the desirable memory characteristics of the matrix model such as low interference and massive generalization but which is able to learn appropriate encodings for items, decision criteria and the control functions of memory which have traditionally been chosen a priori in the mathematical memory literature. Simulations demonstrate that the HRN is well suited to a recognition task inspired by typical memory paradigms. When compared against the SRN the HRN is able to learn longer lists, generalizes from smaller training sets, and is not degraded significantly by increasing the vocabulary size. Please mail correspondence to mav at cs.uq.oz.au Ftp Instructions: $ ftp exstream.cs.uq.oz.au Connected to exstream.cs.uq.oz.au. 220 exstream FTP server (Version 6.12 Fri May 8 16:33:17 EST 1992) ready. Name (exstream.cs.uq.oz.au:mav): anonymous 331 Guest login ok, send e-mail address as password. Password: 230- Welcome to ftp.cs.uq.oz.au 230-This is the University of Queensland Computer Science Anonymous FTP server. 230-For people outside of the department, please restrict your usage to outside 230-of the hours 8am to 6pm. 230- 230-The local time is Mon Feb 8 10:26:05 1993 230- 230 Guest login ok, access restrictions apply. ftp> cd pub/TECHREPORTS/department 250 CWD command successful. ftp> bin 200 Type set to I. ftp> get TR0252.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for TR0252.ps.Z (160706 bytes). 226 Transfer complete. local: TR0252.ps.Z remote: TR0252.ps.Z 160706 bytes received in 0.71 seconds (2.2e+02 Kbytes/s) ftp> quit 221 Goodbye. $ Printing Instructions: $ zcat TR0252.ps.Z | lpr  From efiesler at idiap.ch Mon Feb 8 03:22:31 1993 From: efiesler at idiap.ch (E. Fiesler) Date: Mon, 8 Feb 93 09:22:31 +0100 Subject: Does backprop need the derivative ?? Message-ID: <9302080822.AA22484@idiap.ch> Marwan Jabri wrote: > Date: Sat, 6 Feb 1993 23:49:53 +1100 > From: Marwan Jabri > Subject: Re: Does backprop need the derivative ?? > > As the intention of the inquirer is the analog implementation of > backprop, I see two problems: 1- the question whether the derivative can > be replaced by a constant, and more importantly 2- whether the precision > of the analog implementation will be high enough for backprop to work. > > Regarding (1), ... > > Regarding (2), there has been several reports indicating that > backpropagation simply does not work when the number of bits is reduced > towards 6-8 bits! This is often reported for standard backpropagation. However, a simple extension of backpropagation can make it work for any precision; up to 1-2 bits. I'll append the reference(s) below. E. Fiesler Directeur de Recherche IDIAP Case postale 609 CH-1920 Martigny Switzerland @InProceedings{Fiesler-90, Author = "E. Fiesler and A. Choudry and H. J. Caulfield", Title = "A Weight Discretization Paradigm for Optical Neural Networks", BookTitle = "Proceedings of the International Congress on Optical Science and Engineering", Volume = "SPIE-1281", Pages = "164--173", Publisher = "The International Society for Optical Engineering Proceedings", Address = "Bellingham, Washington, U.S.A.", Year = "1990", ISBN = "0-8194-0328-8", Language = "English" } @Article{Fiesler-93, Author = "E. Fiesler and A. Choudry and H. J. 
Caulfield", Title = "A Universal Weight Discretization Method for Multi-Layer Neural Networks", Journal = "IEEE Transactions on Systems, Man, and Cybernetics (IEEE-SMC)", Publisher = "The Institute of Electrical and Electronics Engineers (IEEE), Inc.", Address = "New York, New York", Year = "1993", ISSN = "0018-9472", Language = "English", Note = "Accepted for publication." }  From annette at cdu.ucl.ac.uk Mon Feb 8 05:13:06 1993 From: annette at cdu.ucl.ac.uk (Annette Karmiloff-Smith) Date: Mon, 8 Feb 93 10:13:06 GMT Subject: Cognitive Development for Connectionists Message-ID: <9302081013.AA14475@cdu.ucl.ac.uk> Below are details of two articles and a book which may be of interest to connectionists: A.Karmiloff-Smith (1992), Connection Science, Vo.4, Nos. 3 & 4, 253- 269. NATURE, NURTURE ANDS PDP: Preposterous Developmental Postulates? (N.B. the question mark - I end on: Promising Developmental Postulates!) Abstract: In this article I discuss the nature/nurture debate in terms of evidence and theorizing from the field of cognitive development, and pinpoint various problems where the Connectionist framework needs to be further explored from this perspective. Evidence from normal and abnormal developmental phenotypes points to some domain-specific constraints on early learning. Yet, by invoking the dynamics of epigenesis, I avoid recourse to a strong Nativist stance and remain within the general spirit of Connectionism. _____________________________________________________________ A. Karmiloff-Smith (1992) Technical Report TR.PDP.CNS.92.7, Carnegie Mellon University, Pittsburgh. ABNORMAL PHENOTYPES AND THE CHALLENGES THEY POSE TO CONNECTIONIST MODELS OF DEVELOPMENT Abstract: The comparison of different abnormal phenotypes (e.g. Williams syndrome, Down syndrome, autism, hydrocephalus with associated myelomeningocele) raises a number of questions about domain-general versus domain-specific processes and suggests that development stems from domain-specific predispositions which channel infantsU attention to proprietary inputs. This is not to be confused with a strong Nativist position. Genetically fully specified modules are not the starting point of development. Rather, a process of gradual modularization builds on skeletal domain-specific predispositions (architectural and/or representational) which give the normal infant a small but significant head-start. It is argued that Down syndrome infants may lack these head-starts, whereas individuals with Williams syndrome, autism and hydrocephalus with associated myelomeningocele have a head-start in selected domains only, leading to different cognitive profiles despite equivalent input. Stress is placed on the importance of exploring a developing system, rather than a lesioned adult system. The position developed in the paper not only contrasts with the strong Nativist stance, but also with the view that domain-general processes are simply applied to whatever inputs the child encounters. The comparison of different phenotypical outcomes is shown to pose interesting challenges to connectionist simulations of development. ______________________________________________________________ A.Karmiloff-Smith (1992) BEYOND MODULARITY: A DEVELOPMENTAL PERSPECTIVE ON COGNITIVE SCIENCE. MIT Press/Bradford Books. A book intended to excite connectionists and other non- developmentalists about the essential role that a developmental perspective has in understanding the special nature of human cognition compared to other species. Contents: 1. 
Taking development seriously 2. The child as a linguist 3. The child as a physicist 4. The child as a mathematician 5. The child as a psychologist 6. The child as a notator 7. Nativism, domain specificity and PiagetUs constructivism 8. Modelling development: representational redescription and connectionism 9. Concluding speculations Reprints of articles obtainable from: Annette Karmiloff-Smith Medical Research Council Cognitive Development Unit London WC1H 0AH. U.K.  From SCHOLTES at ALF.LET.UVA.NL Mon Feb 8 06:19:00 1993 From: SCHOLTES at ALF.LET.UVA.NL (SCHOLTES@ALF.LET.UVA.NL) Date: 08 Feb 1993 12:19 +0100 (MET) Subject: PhD Dissertation Available Message-ID: <346B17ED606070C5@VAX1.SARA.NL> =================================================================== Ph.D. DISSERTATION AVAILABLE on Neural Networks, Natural Language Processing, Information Retrieval 292 pages and over 350 references =================================================================== A Copy of the dissertation "Neural Networks in Natural Language Processing and Information Retrieval" by Johannes C. Scholtes can be obtained for cost price and fast airmail- delivery at US$ 25,-. Payment by Major Creditcards (VISA, AMEX, MC, Diners) is accepted and encouraged. Please include Name on Card, Number and Exp. Date. Your Credit card will be charged for Dfl. 47,50. Within Europe one can also send a Euro-Cheque for Dfl. 47,50 to: University of Amsterdam J.C. Scholtes Dufaystraat 1 1075 GR Amsterdam The Netherlands Do not forget to mention a surface shipping address. Please allow 2-4 weeks for delivery. Abstract 1.0 Machine Intelligence For over fifty years the two main directions in machine intelligence (MI), neural networks (NN) and artificial intelligence (AI), have been studied by various persons with many different backgrounds. NN and AI seemed to conflict with many of the traditional sciences as well as with each other. The lack of a long research history and well defined foundations has always been an obstacle for the general acceptance of machine intelligence by other fields. At the same time, traditional schools of science such as mathematics and physics developed their own tradition of new or "intelligent" algorithms. Progress made in the field of statistical reestimation techniques such as the Hidden Markov Models (HMM) started a new phase in speech recognition. Another application of the progress of mathematics can be found in the application of the Kalman filter in the interpretation of sonar and radar signals. Much more examples of such "intelligent" algorithms can be found in the statistical classification en filtering techniques of the study of pattern recognition (PR). Here, the field of neural networks is studied with that of pattern recognition in mind. Although only global qualitative comparisons are made, the importance of the relation between them is not to be underestimated. In addition it is argued that neural networks do indeed add something to the fields of MI and PR, instead of competing or conflicting with them. 2.0 Natural Language Processing The study of natural language processing (NLP) exists even longer than that of MI. Already in the beginning of this century people tried to analyse human language with machines. However, serious efforts had to wait until the development of the digital computer in the 1940s, and even then, the possibilities were limited. For over 40 years, symbolic AI has been the most important approach in the study of NLP. 
That this has not always been the case, may be concluded from the early work on NLP by Harris. As a matter of fact, Chomsky's Syntactic Structures was an attack on the lack of structural properties in the mathematical methods used in those days. But, as the latter's work remained the standard in NLP, the former has been forgotten completely until recently. As the scientific community in NLP devoted all its attention to the symbolic AI-like theories, the only useful practical implementation of NLP systems were those that were based on statistics rather than on linguistics. As a result, more and more scientists are redirecting their attention towards the statistical techniques available in NLP. The field of connectionist NLP can be considered as a special case of these mathematical methods in NLP. More than one reason can be given to explain this turn in approach. On the one hand, many problems in NLP have never been addressed properly by symbolic AI. Some examples are robust behavior in noisy environments, disambiguation driven by different kinds of knowledge, commensense generalizations, and learning (or training) abilities. On the other hand, mathematical methods have become much stronger and more sensitive to specific properties of language such as hierarchical structures. Last but not least, the relatively high degree of success of mathematical techniques in commercial NLP systems might have set the trend towards the implementation of simple, but straightforward algorithms. In this study, the implementation of hierarchical structures and semantical features in mathematical objects such as vectors and matrices is given much attention. These vectors can then be used in models such as neural networks, but also in sequential statistical procedures implementing similar characteristics. 3.0 Information Retrieval The study of information retrieval (IR) was traditionally related to libraries on the one hand and military applications on the other. However, as PC's grew more popular, most common users loose track of the data they produced over the last couple of years. This, together with the introduction of various "small platform" computer programs made the field of IR relevant to ordinary users. However, most of these systems still use techniques that have been developed over thirty years ago and that implement nothing more than a global surface analysis of the textual (layout) properties. No deep structure whatsoever, is incorporated in the decision whether or not to retrieve a text. There is one large dilemma in IR research. On the one hand, the data collections are so incredibly large, that any method other than a global surface analysis would fail. On the other hand, such a global analysis could never implement a contextually sensitive method to restrict the number of possible candidates returned by the retrieval system. As a result, all methods that use some linguistic knowledge exist only in laboratories and not in the real world. Conversely, all methods that are used in the real world are based on technological achievements from twenty to thirty years ago. Therefore, the field of information retrieval would be greatly indebted to a method that could incorporate more context without slowing down. As computers are only capable of processing numbers within reasonable time limits, such a method should be based on vectors of numbers rather than on symbol manipulations. This is exactly where the challenge is: on the one hand keep up the speed, and on the other hand incorporate more context. 
If possible, the data representation of the contextual information must not be restricted to a single type of media. It should be possible to incorporate symbolic language as well as sound, pictures and video concurrently in the retrieval phase, although one does not know exactly how yet... Here, the emphasis is more on real-time filtering of large amounts of dynamic data than on document retrieval from large (static) data bases. By incorporating more contextual information, it should be possible to implement a model that can process large amounts of unstructured text without providing the end-user with an overkill of information. 4.0 The Combination As this study is a very multi-disciplinary one, the risk exists that it remains restricted to a surface discussion of many different problems without analyzing one in depth. To avoid this, some central themes, applications and tools are chosen. The themes in this work are self-organization, distributed data representations and context. The applications are NLP and IR, the tools are (variants of) Kohonen feature maps, a well known model from neural network research. Self-organization and context are more related to each other than one may suspect. First, without the proper natural context, self-organization shall not be possible. Next, self-organization enables one to discover contextual relations that were not known before. Distributed data representation may solve many of the unsolved problems in NLP and IR by introducing a powerful and efficient knowledge integration and generalization tool. However, distributed data representation and self-organization trigger new problems that should be solved in an elegant manner. Both NLP and IR work on symbolic language. Both have properties in common but both focus on different features of language. In NLP hierarchical structures and semantical features are important. In IR the amount of data sets the limitations of the methods used. However, as computers grow more powerful and the data sets get larger and larger, both approaches get more and more common ground. By using the same models on both applications, a better understanding of both may be obtained. Both neural networks and statistics would be able to implement self-organization, distrib- uted data and context in the same manner. In this thesis, the emphasis is on Kohonen feature maps rather than on statistics. However, it may be possible to implement many of the techniques used with regular sequential mathematical algorithms. So, the true aim of this work can be formulated as the understanding of self-organization, distributed data representation, and context in NLP and IR, by in depth analysis of Kohonen feature maps. ==============================================================================  From george at psychmips.york.ac.uk Mon Feb 8 08:32:38 1993 From: george at psychmips.york.ac.uk (George Bolt) Date: Mon, 8 Feb 93 13:32:38 +0000 (GMT) Subject: Does backprop need the derivative ?? Message-ID: Heini Withagen wrote: In his paper, 'An Empirical Study of Learning Speed in Back-Propagation Networks', Scott E. Fahlmann shows that with the encoder/decoder problem it is possible to replace the derivative of the transfer function by a constant. I have been able to reproduce this example. However, for several other examples, it was not possible to get the network converged using a constant for the derivative. - end quote - I've looked at BP learning in MLP's w.r.t. 
fault tolerance and found that the derivative of the transfer function is used to *stop* learning. Once a unit's weights for some particular input (to that unit rather than the network) are sufficiently developed for it to decide whether to output 0 or 1, then weight changes are approximately zero due to this derivative. I would imagine that by setting it to a constant, then a MLP will over- learn certain patterns and be unable to converge to a state of equilibrium, i.e. all patterns are matched to some degree. A better route would be to set the derivative function to a constant over a range [-r,+r], where f[r] - (sorry) f( |r| ) -> 1.0. To make individual units robust with respect to weights, make r=c.a where f( |a| ) -> 1.0 and c is a small constant multiplicative value. - George Bolt University of York, U.K.  From movellan at cogsci.UCSD.EDU Mon Feb 8 20:33:19 1993 From: movellan at cogsci.UCSD.EDU (Javier Movellan) Date: Mon, 8 Feb 93 17:33:19 PST Subject: Does backprop need the derivative ?? In-Reply-To: Marwan Jabri's message of Sat, 6 Feb 1993 23:49:53 +1100 <9302061249.AA17234@sedal.sedal.su.OZ.AU> Message-ID: <9302090133.AA16068@cogsci.UCSD.EDU> My experience with Boltzmann machines and GRAIN/diffusion networks (the continuous stochastic version of the Boltzmann machine) has been that replacing the real gradient by its sign times a constant accelerates learning DRAMATICALLY. I first saw this technique in one of the original CMU tech reports on the Boltzmann machine. I believe Peterson and Hartman and Peterson and Anderson also used this technique, which they called "Manhattan updating", with the deterministic Mean Field learning algorithm. I believe they had an article in "Complex Systems" comparing Backprop and Mean-Field with both with standard gradient descent and with Manhattan updating. It is my understanding that the Mean-Field/Boltzmann chip developed at Bellcore uses "Manhattan Updating" as its default training method. Josh Allspector is the person to contact about this. At this point I've tried 4 different learning algorithms with continuous and discrete stochastic networks and in all cases Manhattan Updating worked better than straight gradient descent.The question is why Manhattan updating works so well (at least in stochastic and Mean-Field networks) ? One possible interpreation is that Manhattan updating limits the influence of outliers and thus it performs something similar to robust regression. Another interpretation is that Manhattan updating avoids the saturation regions, where the error space becomes almost flat in some dimensions, slowing down learning. One of the disadvantages of Manhattan updating is that sometimes one needs to reduce the weight change constant at the end of learning. But sometimes we also do this in standard gradient descent anyway. -Javier  From oby%firenze%venezia.ROCKEFELLER.EDU at ROCKVAX.ROCKEFELLER.EDU Mon Feb 8 20:42:08 1993 From: oby%firenze%venezia.ROCKEFELLER.EDU at ROCKVAX.ROCKEFELLER.EDU (Klaus Obermayer) Date: Mon, 8 Feb 93 20:42:08 -0500 Subject: No subject Message-ID: <9302090142.AA01612@firenze> The following article is available as a (hardcopy) preprint: Obermayer K. and Blasdel G.G. (1993), Geometry of Orientation and Ocular Dominance Columns in Monkey Striate Cortex, J. Neurosci., in press. 
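For concreteness, a minimal sketch (in Python) of the sign-based "Manhattan updating" rule discussed by Javier Movellan earlier in this digest: the exact gradient is computed as usual, but only the sign of each component is kept and a fixed-size step is taken, with the step slowly reduced late in training. The toy network, data and step sizes are illustrative assumptions, not code from any of the papers or chips mentioned.

import numpy as np

def manhattan_update(w, grad, step):
    # Keep only the sign of each gradient component and take a fixed-size step;
    # components with exactly zero gradient are left unchanged.
    return w - step * np.sign(grad)

# Illustrative use: a single logistic unit trained on made-up data with squared error.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                     # assumed toy inputs
t = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.0]) > 0).astype(float)  # assumed toy targets
w = np.zeros(5)
step = 0.05                                                       # illustrative fixed step
for epoch in range(200):
    y = 1.0 / (1.0 + np.exp(-(X @ w)))              # logistic output
    grad = X.T @ ((y - t) * y * (1.0 - y))          # exact batch gradient of 0.5*sum((y-t)^2)
    w = manhattan_update(w, grad, step)
    step *= 0.99                                    # shrink the step near the end of training

Because every weight moves by the same amount, outliers and large, unbalanced gradient components no longer dictate the effective learning rate, which is one of the interpretations offered in the discussion.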
Abstract: In addition to showing that ocular dominance is organized in slabs and that orientation preferences are organized in linear sequences likely to reflect slabs, Hubel and Wiesel (1974) discussed the intriguing possibility that slabs of orientation might intersect slabs of ocular dominance at some consistent angle. Advances in optical imaging now make it possible to test this possibility directly. When maps of orientation are analyzed quantitatively, they appear to arise from a combination of at least two competing themes: one where orientation preferences change linearly along straight axes, remaining constant along perpendicular axes and forming iso-orientation slabs along the way, and one where orientation preferences change continuously along circular axes, remaining constant along radial axes and forming singularities at the centers of the spaces enclosed. When orientation patterns are compared with ocular dominance patterns from the same cortical regions, quantitative measures reveal: 1) that singularities tend to lie at the centers of ocular dominance columns, 2) that linear zones (arising where orientation preferences change along straight axes) tend to lie at the edges of ocular dominance columns, and 3) that the short iso-orientation bands within each linear zone tend to intersect the borders of ocular dominance slabs at angles of approximately 90$^o$. ----------------------------------------------------------------- The original article contains color figures which - for cost reasons - have to be reproduced black and white. If you would like to obtain a copy, please send your surface mail address to: Klaus Obermayer The Rockefeller University oby at rockvax.rockefeller.edu -----------------------------------------------------------------  From thgoh at iss.nus.sg Tue Feb 9 01:05:52 1993 From: thgoh at iss.nus.sg (Goh Tiong Hwee) Date: Tue, 9 Feb 1993 14:05:52 +0800 (WST) Subject: Does Backprop need the derivative Message-ID: <9302090605.AA08961@iss.nus.sg> From fellous%hyla.usc.edu at usc.edu Wed Feb 10 21:48:50 1993 From: fellous%hyla.usc.edu at usc.edu (Jean-Marc Fellous) Date: Wed, 10 Feb 93 18:48:50 PST Subject: CNE / USC Workshop Reminder and Update. Message-ID: <9302110248.AA01295@hyla.usc.edu> Thank you for posting the following final announcement: *********************** Last Reminder and Update ************************ SCHEMAS AND NEURAL NETWORKS INTEGRATING SYMBOLIC AND SUBSYMBOLIC APPROACHES TO COOPERATIVE COMPUTATION A Workshop sponsored by the Center for Neural Engineering University of Southern California Los Angeles, CA 90089-2520 April 13th and 14th, 1993 Program Committee: Michael Arbib (Organizer), John Barnden, George Bekey, Francisco Cervantes-Perez, Damian Lyons, Paul Rosenbloom, Ron Sun, Akinori Yonezawa A previous announcement (reproduced below) announced a registra- tion fee of $150 and advertised the availability of hotel accom- modation at $70/night. To encourage the participation of qualified students we have made 3 changes: 1) We have appointed Jean-Marc Fellous as Student Chair for the meeting to coordinate the active involvement of such students. 2) We offer a Student Registration Fee of only $40 to students whose application is accompanied by a letter from their supervi- sor attesting to their student status. 3) Mr. 
Fellous has identified a number of lower-cost housing op- tions, and will respond to queries to fellous at rana.usc.edu The original announcement - with updated registration form - fol- lows: To design complex technological systems and to analyze complex biological and cognitive systems, we need a multilevel methodolo- gy which combines a coarse-grain analysis of cooperative or dis- tributed computation (we shall refer to the computing agents at this level as "schemas") with a fine-grain model of flexible, adaptive computation (for which neural networks provide a power- ful general paradigm). Schemas provide a language for distri- buted artificial intelligence, perceptual robotics, cognitive modeling, and brain theory which is "in the style of the brain", but at a relatively high level of abstraction relative to neural networks. The proposed workshop will provide a 2-hour introductory tutorial and problem statement by Michael Arbib, and sessions in which an invited paper will be followed by several contributed papers, selected from those submitted in response to this call for pa- pers. Preference will be given to papers which present practical examples of, theory of, and/or methodology for the design and analysis of complex systems in which the overall specification or analysis is conducted in terms of schemas, and where some but not necessarily all of the schemas are implemented in neural net- works. A list of sample topics for contributions is as follows, where a hybrid approach means one in which the abstract schema level is integrated with neural or other lower level models: Schema Theory as a description language for neural networks Modular neural networks Linking DAI to Neural Networks to Hybrid Architecture Formal Theories of Schemas Hybrid approaches to integrating planning & reaction Hybrid approaches to learning Hybrid approaches to commonsense reasoning by integrating neural networks and rule- based reasoning (using schema for the integration) Programming Languages for Schemas and Neural Networks Concurrent Object-Oriented Programming for Distributed AI and Neural Networks Schema Theory Applied in Cognitive Psychology, Linguistics, Robotics, AI and Neuroscience Prospective contributors should send a hard copy of a five-page extended abstract, including figures with informative captions and full references (either by regular mail or fax) by February 15, 1993 to: Michael Arbib, Center for Neural Engineering University of Southern California Los Angeles, CA 90089-2520 USA Tel: (213) 740-9220 Fax: (213) 746-2863 arbib at pollux.usc.edu] Please include your full address, including fax and email, on the paper. Notification of acceptance or rejection will be sent by email no later than March 1, 1993. There are currently no plans to issue a formal proceedings of full papers, but revised versions of ac- cepted abstracts received prior to April 1, 1993 will be collect- ed with the full text of the Tutorial in a CNE Technical Report which will be made available to registrants at the start of the meeting. [A useful way to structure such an abstract is in short numbered sections, where each section presents (in a small type face!) the material corresponding to one transparency/slide in a verbal presentation. This will make it easy for an audi- ence to take notes if they have a copy of the abstract at your presentation.] 
Hotel Information: Attendees may register at the hotel of their choice, but the closest hotel to USC is the University Hilton, 3540 South Figueroa Street, Los Angeles, CA 90007, Phone: (213) 748- 4141, Reservation: (800) 872-1104, Fax: (213) 748- 0043. A single room costs $70/night while a double room costs $75/night. Workshop participants must specify that they are "Schemas and Neural Networks Workshop" attendees to avail of the above rates. Information on student accommodation may be ob- tained from the Student Chair, Jean-Marc Fellous, fellous at rana.usc.edu. The registration fee of $150 ($40 for qualified students who in- clude a "certificate of student status" from their advisor) in- cludes a copy of the abstracts, coffee breaks, and a dinner to be held on the evening of April 13th. Those wishing to register should send a check payable to "Center for Neural Engineering, USC" for $150 ($40 for students) together with the following information to: Paulina Tagle Center for Neural Engineering University of Southern California University Park Los Angeles, CA 90089-2520 USA ---------------------------------------------------------- SCHEMAS AND NEURAL NETWORKS Center for Neural Engineering USC April 13 - 14, 1993 NAME: ___________________________________________ ADDRESS: _________________________________________ PHONE NO.: _______________ FAX:___________________ EMAIL: ___________________________________________ I intend to submit a paper: YES [ ] NO [ ]  From ljubomir at darwin.bu.edu Wed Feb 10 21:12:30 1993 From: ljubomir at darwin.bu.edu (Ljubomir Buturovic) Date: Wed, 10 Feb 93 21:12:30 -0500 Subject: Does backprop need the derivative? Message-ID: <9302110212.AA07255@darwin.bu.edu> Marwan Jabri: > Regarding the idea of Simplex that has been suggested. The inquirer was > talking about on-chip learning. Have you in your experiments done a > limited precision Simplex? Have you tried it on a chip in in-loop mode? > Philip Leong here has tried a similar idea (I think) a while back. The > problem with this approach is that you need to a have a very good guess at > your starting point as the Simplex will move you from one vertex (feasible > solution) to another while expanding the weight solution space. > Philip's experience is that it does work for small problems when you have > a good guess! No, we did not try limited precision Simplex, since the method has another serious limitation, which is memory complexity. So there is no point performing such refined studies until this problem is resolved, let alone on-chip implementation. The biggest problem we tried it on succesfully was 11-dimensional (i. e., input samples were 11-dimensional vectors). The initial guess was pseudo-random, like in back-propagation. In another, 12-dimensional example, it did not do well (neither did back-prop, but Simplex was much worse), so it might be true that it needs a good starting point. Ljubomir Buturovic Boston University BioMolecular Engineering Research Center 36 Cummington Street, 3rd Floor Boston, MA 02215 office: 617-353-7123 home: 617-738-6487  From mozer at dendrite.cs.colorado.edu Thu Feb 11 23:47:27 1993 From: mozer at dendrite.cs.colorado.edu (Michael C. 
Mozer) Date: Thu, 11 Feb 1993 21:47:27 -0700
Subject: Preprint: Neural net architectures for temporal sequence processing
Message-ID: <199302120447.AA06812@neuron.cs.colorado.edu>

-.--.---.----.-----.------.-------.--------.-------.------.-----.----.---.--.-
PLEASE DO NOT POST TO OTHER BOARDS
-.--.---.----.-----.------.-------.--------.-------.------.-----.----.---.--.-

Neural net architectures for temporal sequence processing

Michael C. Mozer
Department of Computer Science
University of Colorado

I present a general taxonomy of neural net architectures for processing time-varying patterns. This taxonomy subsumes many existing architectures in the literature, and points to several promising architectures that have yet to be examined. Any architecture that processes time-varying patterns requires two conceptually distinct components: a short-term memory that holds on to relevant past events and an associator that uses the short-term memory to classify or predict. The taxonomy is based on a characterization of short-term memory models along the dimensions of form, content, and adaptability. Experiments on predicting future values of a financial time series (US dollar-Swiss franc exchange rates) are presented using several alternative memory models. The results of these experiments serve as a baseline against which more sophisticated architectures can be compared.

To appear in: A. S. Weigend & N. A. Gershenfeld (Eds.), _Predicting the future and understanding the past_. Redwood City, CA: Addison-Wesley. Spring 1993.

-.--.---.----.-----.------.-------.--------.-------.------.-----.----.---.--.-

To retrieve:
unix> ftp archive.cis.ohio-state.edu
Name: anonymous
230 Guest login ok, access restrictions apply.
ftp> cd pub/neuroprose
ftp> binary
ftp> get mozer.architectures.ps.Z
200 PORT command successful.
ftp> quit
unix> zcat mozer.architectures.ps.Z | lpr

Warning: May not print on wimpy laser printers.

From kolen-j at cis.ohio-state.edu Tue Feb 9 07:51:53 1993
From: kolen-j at cis.ohio-state.edu (john kolen)
Date: Tue, 9 Feb 93 07:51:53 -0500
Subject: Does backprop need the derivative?
In-Reply-To: Radford Neal's message of Sun, 7 Feb 1993 12:24:15 -0500 <93Feb7.122429edt.227@neuron.ai.toronto.edu>
Message-ID: <9302091251.AA27813@pons.cis.ohio-state.edu>

The sign of the derivative is always positive (remember o(1-o) and 0 < o < 1).

>Other posters have discussed, regarding backprop...
>
>> ... the question whether the derivative can be replaced by a constant,
>
>To clarify, I believe the intent is that the "constant" have the same
>sign as the derivative, but have constant magnitude.

I haven't been following this thread, but the following reference may be helpful to those that are. Blum (Annals of Math. Statistics vol. 25 1954 p.385) shows that if the "constant magnitude" is going to zero (so that the system is convergent) the convergence is not to a minimum of the expected error (this is usually what we want backprop to do), but to a minimum of the *median* of the error.

Chris Darken darken at learning.scr.siemens.com

From munro at lis.pitt.edu Thu Feb 11 11:14:44 1993
From: munro at lis.pitt.edu (fac paul munro)
Date: Thu, 11 Feb 93 11:14:44 EST
Subject: Summary of "Does backprop need the derivative ??"
In-Reply-To: Mail from 'Heini Withagen ' dated: Tue, 9 Feb 1993 11:46:06 +0100 (MET)
Message-ID: <9302111614.AA15497@icarus.lis.pitt.edu>

Forgive the review of college math, but there are a few issues that, while obvious to many, might be worth reviewing here...
[1] The gradient of a well-behaved single-valued function of N variables (here the error as a function of the weights) is generally orthogonal to an N-1 dimensional manifold on which the function is constant (an iso-error surface) [2] The effect of infinitesimal motion in the space on the function can be computed as the inner (dot) product of the gradient vector with the movement vector; thus, as long as the dot product between the gradient and the delta-w vector is negative, the error will decrease. That is, the new iso-error surface will correspond to a lower error value. [3] This implies that the signs of the errors is adequate to reduce the error, assuming the learning rate is sufficiently small, since any two vectors with all components the same sign must have a positive inner product! [They lie in the same orthant of the space] Having said all this, I must point out that the argument pertains only to single patterns. That is, eliminating the derivative term, is guaranteed to reduce the error for the pattern that is presented. Its effect on the error summed over the training set is not guaranteed, even for batch learning... One more caveat: Of course, if the nonlinear part of the units' transfer function is non-monotonic (i.e., the sign of the derivative varies), be sure to throw the derivative back in! - Paul Munro  From dhw at t13.Lanl.GOV Thu Feb 11 17:19:13 1993 From: dhw at t13.Lanl.GOV (David Wolpert) Date: Thu, 11 Feb 93 15:19:13 MST Subject: new paper Message-ID: <9302112219.AA23017@t13.lanl.gov> *************************************************************** DO NOT FORWARD TO OTHER BOARDS OR LISTS *************************************************************** The following paper has been placed in neuroprose, under the name wolpert.nips92.ps.Z. It is a major revision of an earlier preprint on the same topic. An abbreviated version (2 fewer pages) will appear in the proceedings of NIPS 92. 0N THE USE OF EVIDENCE IN NEURAL NETWORKS. David H. Wolpert, Santa Fe Institute Abstract: The Bayesian evidence approximation, which is closely related to generalized maximum likelihood, has recently been employed to determine the noise and weight-penalty terms for training neural nets. This paper shows that it is far simpler to perform the exact calculation than it is to set up the evidence approximation. Moreover, unlike that approximation, the exact result does not have to be re-calculated for every new data set. Nor does it require the running of complex numerical computer code (the exact result is closed form). In addition, it turns out that for neural nets, the evidence procedure's MAP estimate is *in toto* approximation error. Another advantage of the exact analysis is that it does not lead to incorrect intuition, like the claim that one can "evaluate different priors in light of the data". This paper ends by discussing sufficiency conditions for the evidence approximation to hold, along with the implications of those conditions. Although couched in terms of neural nets, the analysis of this paper holds for any Bayesian interpolation problem. Recover the file in the usual way: unix> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. 220 cheops.cis.ohio-state.edu FTP server ready. Name: anonymous 331 Guest login ok, send ident as password. Password: {your address} 230 Guest login ok, access restrictions apply. ftp> binary 200 Type set to I. ftp> cd pub/neuroprose 250 CWD command successful. ftp> get wolpert.nips92.ps.Z 200 PORT command successful. 
150 Opening BINARY mode data connection for rosenblatt.reborn.ps.Z 226 Transfer complete. 100000 bytes sent in 3.14159 seconds ftp> quit 221 Goodbye unix> uncompress wolpert.nips92.ps.Z unix> lpr wolpert.nips92.ps (or however you print postscript  From mozer at dendrite.cs.colorado.edu Fri Feb 12 00:10:05 1993 From: mozer at dendrite.cs.colorado.edu (Michael C. Mozer) Date: Thu, 11 Feb 1993 22:10:05 -0700 Subject: connectionist models summer school -- final call for applications Message-ID: <199302120510.AA06977@neuron.cs.colorado.edu> FINAL CALL FOR APPLICATIONS CONNECTIONIST MODELS SUMMER SCHOOL The University of Colorado will host the 1993 Connectionist Models Summer School from June 21 to July 3, 1993. The purpose of the summer school is to provide training to promising young researchers in connectionism (neural networks) by leaders of the field and to foster interdisciplinary collaboration. This will be the fourth such program in a series that was held at Carnegie-Mellon in 1986 and 1988 and at UC San Diego in 1990. Previous summer schools have been extremely successful and we look forward to the 1993 session with anticipation of another exciting event. The summer school will offer courses in many areas of connectionist modeling, with emphasis on artificial intelligence, cognitive neuroscience, cognitive science, computational methods, and theoretical foundations. Visiting faculty (see list of invited faculty below) will present daily lectures and tutorials, coordinate informal workshops, and lead small discussion groups. The summer school schedule is designed to allow for significant interaction among students and faculty. As in previous years, a proceedings of the summer school will be published. Applications will be considered only from graduate students currently enrolled in Ph.D. programs. About 50 students will be accepted. Admission is on a competitive basis. Tuition will be covered for all students, and we expect to have scholarships available to subsidize housing and meal costs, but students are responsible for their own travel arrangements. Applications should include the following materials: * a vita, including mailing address, phone number, electronic mail address, academic history, list of publications (if any), and relevant courses taken with instructors' names and grades received; * a one-page statement of purpose, explaining major areas of interest and prior background in connectionist modeling and neural networks; * two letters of recommendation from individuals familiar with the applicants' work (either mailed separately or in sealed envelopes); and * a statement from the applicant describing potential sources of financial support available (department, advisor, etc.) for travel expenses. Applications should be sent to: Connectionist Models Summer School c/o Institute of Cognitive Science Campus Box 344 University of Colorado Boulder, CO 80309 All application materials must be received by March 1, 1993. Admission decisions will be announced around April 15. If you have specific questions, please write to the address above or send e-mail to "cmss at cs.colorado.edu". Application materials cannot be accepted via e-mail. 
Organizing Committee Jeff Elman (UC San Diego) Mike Mozer (University of Colorado) Paul Smolensky (University of Colorado) Dave Touretzky (Carnegie Mellon) Andreas Weigend (Xerox PARC and University of Colorado) Additional faculty will include: Yaser Abu-Mostafa (Cal Tech) Sue Becker (McMaster University) Andy Barto (University of Massachusetts, Amherst) Jack Cowan (University of Chicago) Peter Dayan (Salk Institute) Mary Hare (Birkbeck College) Cathy Harris (Boston University) David Haussler (UC Santa Cruz) Geoff Hinton (University of Toronto) Mike Jordan (MIT) John Kruschke (Indiana University) Jay McClelland (Carnegie Mellon) Ennio Mingolla (Boston University) Steve Nowlan (Salk Institute) Dave Plaut (Carnegie Mellon) Jordan Pollack (Ohio State) Dean Pomerleau (Carnegie Mellon) Dave Rumelhart (Stanford) Patrice Simard (ATT Bell Labs) Terry Sejnowski (UC San Diego and Salk Institute) Sara Solla (ATT Bell Labs) Janet Wiles (University of Queensland) The Summer School is sponsored by the American Association for Artificial Intelligence, the National Science Foundation, Siemens Research Center, and the University of Colorado Institute of Cognitive Science. Colorado has recently passed a law explicitly denying protection for lesbians, gays, and bisexuals. However, the Summer School does not discriminate in admissions on the basis of age, sex, race, national origin, religion, disability, veteran status, or sexual orientation.  From heiniw at sun1.eeb.ele.tue.nl Tue Feb 9 05:46:06 1993 From: heiniw at sun1.eeb.ele.tue.nl (Heini Withagen) Date: Tue, 9 Feb 1993 11:46:06 +0100 (MET) Subject: Summary of "Does backprop need the derivative ??" Message-ID: <9302091046.AA08161@sun1.eeb.ele.tue.nl> A non-text attachment was scrubbed... Name: not available Type: text Size: 191 bytes Desc: not available Url : https://mailman.srv.cs.cmu.edu/mailman/private/connectionists/attachments/00000000/7e2fa3eb/attachment.ksh From kolen-j at cis.ohio-state.edu Tue Feb 9 08:46:53 1993 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Tue, 9 Feb 93 08:46:53 -0500 Subject: Does backprop need the derivative ?? In-Reply-To: Scott_Fahlman@sef-pmax.slisp.cs.cmu.edu's message of Sun, 07 Feb 93 12:56:03 EST <9302091257.AA06456@everest.eng.ohio-state.edu> Message-ID: <9302091346.AA28166@pons.cis.ohio-state.edu> From: Scott_Fahlman at sef-pmax.slisp.cs.cmu.edu Of course, a learning system implemented in analog hardware might have only a few bits of accuracy due to noise and nonlinearity in the circuits, but it wouldn't suffer from this quantization effect, since you get a sort of probabilistic dithering for free. This assumes, of course, that the mechanism is actually "computing" using the available bits. Bits are the result of binary measurements. An analog device does not normally convert voltages or currents into a binary representation and then operate on it. An analog mechanism sloppilly implementing backprop should be able to tweak the weights in the general direction, but not necessarily the same direction as theoretical backprop. John Kolen  From KRUSCHKE at ucs.indiana.edu Tue Feb 9 09:45:45 1993 From: KRUSCHKE at ucs.indiana.edu (John K. Kruschke) Date: Tue, 9 Feb 93 09:45:45 EST Subject: postdoctoral traineeships available Message-ID: POST-DOCTORAL FELLOWSHIPS AT INDIANA UNIVERSITY Postdoctoral Traineeships in MODELING OF COGNITIVE PROCESSES Please call this notice to the attention of all interested parties. 
The Psychology Department and Cognitive Science Programs at Indiana University are pleased to announce the availability of one or more Postdoctoral Traineeships in the area of Modeling of Cognitive Processes. The appointment will pay rates appropriate for a new PhD (about $18,800), and will be for one year, starting after July 1, 1993. The duration could be extended to two years if a training grant from NIH is funded as anticipated (we should receive final notification by May 1). Post-docs are offered to qualified individuals who wish to further their training in mathematical modeling or computer simulation modeling, in any substantive area of cognitive psychology or Cognitive Science. We are particularly interested in applicants with strong mathematical, scientific, and research credentials. Indiana University has superb computational and research facilities, and faculty with outstanding credentials in this area of research, including Richard Shiffrin and James Townsend, co-directors of the training program, and Robert Nosofsky, Donald Robinson, John Castellan, John Kruschke, Robert Goldstone, Geoffrey Bingham, and Robert Port. Trainees will be expected to carry out original theoretical and empirical research in association with one or more of these faculty and their laboratories, and to interact with other relevant faculty and the other pre- and postdoctoral trainees. Interested applicants should send an up to date vitae, personal letter describing their specific research interests, relevant background, goals, and career plans, and reference letters from two individuals. Relevant reprints and preprints should also be sent. Women, minority group members, and handicapped individuals are urged to apply. PLEASE NOTE: The conditions of our anticipated grant restrict awards to US citizens, or current green card holders. Awards will also have a 'payback' provision, generally requiring awardees to carry out research or teach for an equivalent period after termination of the traineeship. Send all materials to: Professors Richard Shiffrin and James Townsend, Program Directors Department of Psychology, Room 376B Indiana University Bloomington, IN 47405 We may be contacted at: 812-855-2722; Fax: 812-855-4691 email: shiffrin at ucs.indiana.edu Indiana University is an Affirmative Action Employer  From kenm at prodigal.psych.rochester.edu Tue Feb 9 10:50:49 1993 From: kenm at prodigal.psych.rochester.edu (Ken McRae) Date: Tue, 9 Feb 93 10:50:49 EST Subject: paper available Message-ID: <9302091550.AA20269@prodigal.psych.rochester.edu> The following paper is now available in pub/neuroprose. Catastrophic Interference is Eliminated in Pretrained Networks Ken McRae University of Rochester & Phil A. Hetherington McGill University When modeling strictly sequential experimental memory tasks, such as serial list learning, connectionist networks appear to experience excessive retroactive interference, known as catastrophic interference (McCloskey & Cohen,1989; Ratcliff, 1990). The main cause of this interference is overlap among representations at the hidden unit layer (French, 1991; Hetherington,1991; Murre, 1992). This can be alleviated by constraining the number of hidden units allocated to representing each item, thus reducing overlap and interference (French, 1991; Kruschke, 1992). When human subjects perform a laboratory memory experiment, they arrive with a wealth of prior knowledge that is relevant to performing the task. 
If a network is given the benefit of relevant prior knowledge, the representation of new items is constrained naturally, so that a sequential task involving novel items can be performed with little interference. Three laboratory memory experiments (ABA free recall, serial list, and ABA paired-associate learning) are used to show that little or no interference is found in networks that have been pretrained with a simple and relevant knowledge base. Thus, catastrophic interference is eliminated when critical aspects of simulations are made to be more analogous to the corresponding human situation. Thanks again to Jordan Pollack for maintaining this electronic library. An example of how to retrieve mcrae.pretrained.ps.Z: your machine> ftp archive.cis.ohio-state.edu Connected to archive.cis.ohio-state.edu. 220 archive FTP server (Version 6.15 Thu Apr 23 15:28:03 EDT 1992) ready. Name (archive.cis.ohio-state.edu:kenm): anonymous 331 Guest login ok, send e-mail address as password. Password: 230 Guest login ok, access restrictions apply. ftp> cd pub/neuroprose 250-Please read the file README 250- it was last modified on Mon Feb 17 15:51:43 1992 - 357 days ago 250-Please read the file README~ 250- it was last modified on Wed Feb 6 16:41:29 1991 - 733 days ago 250 CWD command successful. ftp> binary 200 Type set to I. ftp> get mcrae.pretrained.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for mcrae.pretrained.ps.Z (129046 bytes). 226 Transfer complete. local: mcrae.pretrained.ps.Z remote: mcrae.pretrained.ps.Z 129046 bytes received in 30 seconds (4.2 Kbytes/s) ftp> quit 221 Goodbye. your machine> uncompress mcrae.pretrained.ps.Z your machine> then print the file  From kolen-j at cis.ohio-state.edu Tue Feb 9 13:31:43 1993 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Tue, 9 Feb 93 13:31:43 -0500 Subject: Test & Derivatives in Backprop Message-ID: <9302091831.AA00142@pons.cis.ohio-state.edu> [I hope that this makes it to connectionists, the last couple of postings haven't made it back. So I have summarized these replies in one message for general consumption.] Regarding the latest talk about derivatives in backprop, I had looked into replacing the different mathematical operations with other, more implementation-amenable operations. This included replacing the derivative of the squashing function with d(x)=min(x,1-x). The results of these tests show that backprop is pretty stable as long as the qualitative shape of the operations are maintained. If you replace the derivative with a constant or linear (wrt activation) function it doesn't work at all for the learning tasks I considered. As long as the derivative replacement is minimal in the extreme activations and maximal at 0.5 (wrt the traditional sigmoid), the operation will not suffer dramatically. After reading Fahlman's observation about loosing bits to noise I had the following response. Bits come from binary decisions. Analog systems don't do that in normal processing, normally some continuous value affects another continuous value. No where do they perform A/D conversion and then operate on the bits. If there is no measurement device, then talking about bits doesn't make sense. John Kolen  From guy at cs.uq.oz.au Tue Feb 9 17:25:35 1993 From: guy at cs.uq.oz.au (guy@cs.uq.oz.au) Date: Wed, 10 Feb 93 08:25:35 +1000 Subject: Does backprop need the derivative ?? 
Message-ID: <9302092225.AA06661@client> The question has been asked whether the full derivative is needed for backprop to work, or whether the sign of the derivative is sufficient. As far as I am aware, the discussion has not defined at what point the derivative is truncated to +/-1. This might occur (1) for each input/output pair when the error is fed into the output layer, (2) in epoch based learning, the exact derivative of each weight over the training set might be computed, but the update to the weight truncated, or (3...) many intermediate cases. I believe one problem with limited precision weights is as follows. The magnitude of the update may be smaller than the limit of precision on the weight (which has much greater magnitude). If the machine arithmetic then rounds the updated weight to the nearest representable value, the updated weight will be rounded to its old value, and no learning will occur. I am co-author of a technical report which addressed this problem. In our algorithm, weights had very limited precision but their derivatives over the whole training set were computed exactly. The weight update step would shift the weight value to the next representable value with a probability proportional to the size of the derivative. In our inexhaustive testing, we found that very limited precision weights and activations could be used. The technical report is available in hardcopy (limited numbers) and postscript. My addresses are "guy at cs.uq.oz.au" and "Guy Smith, Department of Computer Science, The University of Queensland, St Lucia 4072, Australia". Guy Smith.  From meng at spring.kuee.kyoto-u.ac.jp Wed Feb 10 11:58:19 1993 From: meng at spring.kuee.kyoto-u.ac.jp (meng@spring.kuee.kyoto-u.ac.jp) Date: Wed, 10 Feb 93 11:58:19 JST Subject: Does backprop need the derivative ?? In-Reply-To: Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU's message of Sun, 07 Feb 93 13:02:42 EST <9302091925.AA12414@ntt-sh.ntt.jp> 9 Feb 93 1:36:51 EST 9 Feb 93 1:35:08 EST 7 Feb 93 13:03:24 EST Message-ID: <9302100258.AA20634@spring.kuee.kyoto-u.ac.jp> Thinking about it, it seems that the derivative always can be replaced by a sufficiently small constant. I.e., for a certain training set and a certain requirement of precision on the ouput units, you can find a constant that is smaller than a certain constant that, with the same starting point, will find the same minimum for the same network as an algorithm that is using the derivative. The problem with this of course is that the constant may be so small that the training time may be prohibitive, while the motivation to such a constant is to speed up training. The reason that this works in a lot of instances is, I think, that the requirement of precision is wide enough to let the network jump into a region that is sufficiently close to a minimum. A situation where it wouldn't work, would be a situation where the network is moving in the right direction, but jumping too far, i.e. jumping from one side of a valley to the other alternately, never landing within a region that would give convergence within the requirements set. The use of the derivative solves this by getting smaller when approaching a minimum. Another possibility is that using a constant the network might settle in another minimum (or try to settle in another ("wider") minimum) by virtue of "seeing" the error surface as more coarse grained than the version using a derivative. In some cases, if you're lucky (i.e. 
has a good initial state in relation to a minimum and the constant you're using) you might hit bull's eye; with another initial state you might be oscillating around the solution (i.e. having the error go up and down without getting within the required limit). In such a case you could switch to using the derivative or simply decrease the constant (maybe how much could be computed on the basis of the increase in error? Just an idea). These are just some thoughts on the subject, no empirical study undertaken.

Tore

From "MVUB::BLACK%hermes.mod.uk" at relay.MOD.UK Fri Feb 12 09:50:00 1993
From: "MVUB::BLACK%hermes.mod.uk" at relay.MOD.UK (John V. Black @ DRA Malvern)
Date: Fri, 12 Feb 93 14:50 GMT
Subject: IEE Third International Conference on ANN's (Registration Announcement)
Message-ID:

CONFERENCE ANNOUNCEMENT
=======================
IEE Third International Conference on Artificial Neural Networks
Brighton, UK, 25-27 May 1993.
-----------------------------------------------------
This conference, organised by the Institute of Electrical Engineers, will cover up-to-date reports on the current state of research on Artificial Neural Networks, including theoretical understanding of fundamental structures, learning algorithms, implementation and applications. Over 70 papers will be presented in formal and poster sessions under the following headings: APPLICATIONS, ARCHITECTURES, VISION, CONTROL & ROBOTICS, MEDICAL SYSTEMS, NETWORK ANALYSIS.

In addition there will be a small exhibition and publishers' display, Civic Reception and Conference Dinner.

Registration fees are as follows:
Member (IEE/associated societies)  235 pounds sterling (inc 35 pounds VAT)
Non-member                         294 pounds sterling (inc 43.79 pounds VAT)
Research Student or Retired         83 pounds sterling (inc 12.36 pounds VAT)

Further information including full programme available from:
Sheila Griffiths
ANN93 Secretariat
Conference Services
Institute of Electrical Engineers
Savoy Place
London WC2R 0BL, UK
Tel: 071 344 5478/5477 Fax: 071 497 3633 Telex: 261776 IEE LDN G

John Black (jvb%hermes.mod.uk at relay.mod.uk) E-mailing for David Lowe

From kolen-j at cis.ohio-state.edu Fri Feb 12 08:11:58 1993
From: kolen-j at cis.ohio-state.edu (john kolen)
Date: Fri, 12 Feb 93 08:11:58 -0500
Subject: Does backprop need the derivative ??
In-Reply-To: Mark Evans's message of Thu, 11 Feb 93 10:26:03 GMT <3468.9302111026@it-research-institute.brighton.ac.uk>
Message-ID: <9302121311.AA20446@pons.cis.ohio-state.edu>

When I used the term stable in my previous posting, I did not mean the mathematical notion of stability as applied to a control system. What I meant was that the apparent behavior of the network, learning a set of associations of patterns, was unaffected by quantitative changes in these operations. An analogy I often use is the symbolic dynamics of unimodal iterated function systems. As long as a small number of qualitative conditions are true, then the system will exhibit the same symbol dynamics as other functions for which the conditions hold, regardless of the numerical differences between functions. Thus the bifurcation diagrams of rx(1-x) and a bump made up of sigmoids will exhibit the same type of period doubling cascades. Even if it wasn't mathematically stable, but was guaranteed to pass through a region of weight space with usable weights, most of the NN community would find it useful.

John

From shim at marlin.nosc.mil Fri Feb 12 13:00:08 1993
From: shim at marlin.nosc.mil (Randy L. Shimabukuro)
Date: Fri, 12 Feb 93 10:00:08 -0800
Subject: Summary of "Does backprop need the derivative ??"
Message-ID: <9302121800.AA01359@marlin.nosc.mil>

Congratulations on initiating a very lively discussion. From reading the responses though, it appears that people are interpreting your question differently. At the risk of adding to the confusion let me try to explain. It seems that some people are talking about the derivative of the transfer function (F') while others are talking about the gradient of the error function. We have looked at both cases:

We approximate F' in a manner similar to that suggested by George Bolt, letting F'(|x|) -> 1 for |x| < r and F'(|x|) -> a for |x| >= r, where a is a small positive constant and r is a point where F(r) is approximately 1. We have also, in a sense, approximated the gradient of the error function by quantizing the weight updates. This is similar to what Peterson and Hartman call "Manhattan updating". In this case it is important to preserve the sign of the derivative. We have found that the first type of approximation has very little effect on back propagation. Depending on the problem, the second type sometimes shortens the learning time and sometimes prevents the network from learning. In some cases it helps to decrease the size of the updates as learning progresses.

Randy Shimabukuro

From hartman%pav.mcc.com at mcc.com Sat Feb 13 17:36:04 1993
From: hartman%pav.mcc.com at mcc.com (E. Hartman)
Date: Sat, 13 Feb 93 16:36:04 CST
Subject: Re. does bp need the derivative?
Message-ID: <9302132236.AA01583@energy.pav.mcc.com>

Re. the question of the derivative in backprop, Javier Movellan and Randy Shimabukuro mentioned the "Manhattan updating" discussed in Peterson and Hartman ("Explorations of the Mean Field Theory Learning Algorithm", Neural Networks, Vol. 2, pp. 475-494, 1989). This technique computes the gradient exactly, but then keeps only the signs of the components and takes fixed-size weight steps (each weight is changed by a fixed amount, either up or down). We used this technique to advantage, both in backprop and mean field theory nets, on problems with inconsistent data -- data containing exemplars with identical inputs but differing outputs (one-to-many mapping). (The problem in the paper was a classification problem drawn from overlapping Gaussian distributions.)

The reason that this technique helped on this kind of problem is the following. Since the data was highly inconsistent, we found that before taking a step in weight space, it helped to average out the data inconsistencies by accumulating the gradient over a large number of patterns (large batch training). But, typically, it happens that some components of the gradient don't "average out" nicely and instead become very large. So the components of the gradient vary greatly in magnitude, which makes choosing a good learning rate difficult. "Manhattan updating" makes all the components equal in magnitude. We found it necessary to slowly reduce the step size as training proceeds.

Eric Hartman

From marwan at sedal.su.oz.au Sat Feb 13 03:03:55 1993
From: marwan at sedal.su.oz.au (Marwan Jabri)
Date: Sat, 13 Feb 1993 19:03:55 +1100
Subject: Test & Derivatives in Backprop
Message-ID: <9302130803.AA03429@sedal.sedal.su.OZ.AU>

> From: john kolen > > [I hope that this makes it to connectionists, the last couple of postings > haven't made it back. So I have summarized these replies in one message > for general consumption.] > > Regarding the latest talk about derivatives in backprop, I had looked into > replacing the different mathematical operations with other, more > implementation-amenable operations.
This included replacing the > derivative of the squashing function with d(x)=min(x,1-x). The results of > these tests show that backprop is pretty stable as long as the qualitative > shape of the operations is maintained. If you replace the derivative with > a constant or linear (wrt activation) function it doesn't work at all for > the learning tasks I considered. As long as the derivative replacement is > minimal in the extreme activations and maximal at 0.5 (wrt the traditional > sigmoid), the operation will not suffer dramatically. > > After reading Fahlman's observation about losing bits to noise I had the > following response. Bits come from binary decisions. Analog systems > don't do that in normal processing; normally some continuous value affects > another continuous value. Nowhere do they perform A/D conversion and then > operate on the bits. If there is no measurement device, then talking about > bits doesn't make sense. > > John Kolen > Are we talking about analog implementations? I hope so, because I am. If not, then forget this message. The derivative issue boils down to whether you can cheaply implement whatever approximation you use. The implication for training speed depends on how good your gradient approximations are. The bit-width issue boils down to how you will implement your storage (weights). Whether you use analog EEPROM, RAM converted with DACs or whatever, you have to deal with bit effects, unless you have a new analog high-precision storage device that can be implemented cheaply, in which case I will be eager to learn about it. If you have the analog dream device, then your next problem in analog implementation is the signal/noise ratio, unless your analog circuits are noiseless. Marwan ------------------------------------------------------------------- Marwan Jabri Email: marwan at sedal.su.oz.au Senior Lecturer Tel: (+61-2) 692-2240 SEDAL, Electrical Engineering, Fax: 660-1228 Sydney University, NSW 2006, Australia Mobile: (+61-18) 259-086  From miller at picard.ads.com Mon Feb 15 11:32:44 1993 From: miller at picard.ads.com (Kenyon Miller) Date: Mon, 15 Feb 93 11:32:44 EST Subject: Summary of "Does backprop need the derivative ??" Message-ID: <9302151632.AA03270@picard.ads.com> Paul Munro writes: > [3] This implies that the signs of the errors is adequate to reduce > the error, assuming the learning rate is sufficiently small, > since any two vectors with all components the same sign > must have a positive inner product! [They lie in the same > orthant of the space] I believe a critical point is being missed, that is, the derivative is being replaced by its sign at every stage in applying the chain rule, not just in the initial backpropagation of the error. Consider the following example:

     ----n2-----
    /           \
w--n1            n4
    \           /
     ----n3-----

In other words, there is an output neuron n4 which is connected to two neurons n2 and n3, each of which is connected to neuron n1, which has a weight w. Suppose the weight connecting n2 to n4 is negative and all other connections in the diagram are positive. Suppose further that n2 is saturated and none of the other neurons are saturated. Now, suppose that n4 must be decreased in order to reduce the error. Backpropagating along the n4-n2-n1 path, w receives an error term which would tend to increase n1, while backpropagating along the n4-n3-n1 path would result in a term which would tend to decrease n1.
If the true sigmoid derivative were used, the force to increase n1 would be dampened because n2 is saturated, and the net result would be to increase w and therefore increase n1 and n3 and decrease n4. However, replacing the sigmoid derivative with a constant could easily allow the n4-n2-n1 path to dominate, and the error at the output would increase. Thus, it is not a sound thing to do regardless of how many patterns are used for training. -Ken Miller.  From kanal at cs.UMD.EDU Mon Feb 15 12:35:27 1993 From: kanal at cs.UMD.EDU (Laveen N. Kanal) Date: Mon, 15 Feb 93 12:35:27 -0500 Subject: non-Turing machines? Message-ID: <9302151735.AA10355@mimsy.cs.UMD.EDU> I have only tuned into part of the quantum computers discussion and so I don't know if the following references have been mentioned in the discussion. Having speculated about natural perception not being modelable by Turing machines, I was not surprised to find similar speculation in the book Renewing Philosophy by Hilary Putnam (Harvard Univ. Press, 1992) which I picked up at the bookstore the other day. But Putnam does cite two specific references which may be of interest in this context. Marian Boykan Pour-El and Ian Richards, "The Wave Equation with Computable Initial Data Such That Its Unique Solution Is Not Computable," Advances in Mathematics, 39 (1981) p. 215-239. Georg Kreisel's review of the above paper in The Journal of Symbolic Logic, 47, No. 4, (1982) p. 900-902.  From ala at sans.kth.se Tue Feb 16 07:51:57 1993 From: ala at sans.kth.se (Anders Lansner) Date: Tue, 16 Feb 1993 13:51:57 +0100 Subject: MCPA'93 Call for Contributions Message-ID: <199302161251.AA02772@occipitalis.sans.kth.se> MCPA'93 Final Call **************************************************************************** * Invitation to * * International Workshop on Mechatronical Computer Systems * * for Perception and Action, June 1-3, 1993 * * Halmstad University, Sweden * * * * Final Call for Contributions * **************************************************************************** Mechatronical Computer Systems that Perceive and Act - A New Generation ======================================================================= Mechatronical computer systems, which we will see in advanced products and production equipment of tomorrow, are designed to do much more than calculate. The interaction with the environment and the integration of computational modules in every part of the equipment, engaging in every aspect of its functioning, put new, and conceptually different, demands on the computer system. A development towards a complete integration between the mechanical system, advanced sensors and actuators, and a multitude of processing modules can be foreseen. At the systems level, powerful algorithms for perceptual integration, goal-direction and action planning in real time will be critical components. The resulting action-oriented systems may interact with their environments by means of sophisticated sensors and actuators, often with a high degree of parallelism, and may be able to learn and adapt to different circumstances and environments. Perceiving the objects and events of the external world and acting upon the situation in accordance with an appropriate behaviour, whether programmed, trained, or learned, are key functions of these next-generation computer systems.
The aim of this first International Workshop on Mechatronical Computer Systems for Perception and Action is to gather researchers and industrial development engineers, who work with different aspects of this exciting new generation of computing systems and computer-based applications, to a fruitful exchange of ideas and results and, often interdisciplinary, discussions. Workshop Form ============= One of the days of the workshop will be devoted to true workshop activities. The objective is to identify and propose research directions and key problem areas in mechatronical computing systems for perception and action. In the morning session, invited speakers, as well as other workshop delegates, will give their perspectives on the theme of the workshop. The work will proceed in smaller working groups during the afternoon, after which the conclusions will be presented in a plenary session. The scientific programme will also include presentations of research results in oral or poster form, or as demonstrations. Subject Areas ============= Relevant subject areas are e.g.: Real-Time Systems Architecture and Real-Time Software. Sensor Systems and Sensory/Motor Coordination. Biologically Inspired Systems. Applications of Unsupervised and Reinforcement Learning. Real-Time Decision Making and Action Planning. Parallel Processor Architectures for Embedded Systems. Development Tools and Support Systems for Mechatronical Computer Systems and Applications. Dependable Computer Systems. Robotics and Machine Vision. Neural Networks in Real-Time Applications. Advanced Mechatronical Computing Demands in Industry. Contributions to the Workshop ============================= The programme committee welcomes all kinds of contributions - papers to be presented orally or as posters, demonstrations, etc. - in the areas listed above, as well as other areas of relevance to the theme of the workshop. From the workshop point of view, it is NOT essential that contributions contain only new, unpublished results. Rather, the new, interdisciplinary collection of delegates that can be expected at the workshop may motivate presentations of earlier published results. Specifically, we invite delegates to state their view of the workshop theme, including identification of key research issues and research directions. The planning of the workshop day will be based on these submitted statements, some of which will be presented in the plenary session, some of which in the smaller working groups. DEADLINES ========= Febr. 26, 1993: Submissions of extended abstracts or full papers. Submissions of statements regarding perspectives on the conference theme, that the delegate would like to present at the workshop (4 pages max). Submissions of descriptions of demonstrations, etc. March 19, 1993: Notification of acceptance. Preliminary final programme. May 1, 1993: Final papers and statements. All submissions shall be sent to the workshop secretariat, see address box. Please send two copies. Submissions must include name(s) and affiliation(s) of author(s) and full address, including phone and fax number and electronic mail address (if possible). The accepted papers and statements will be assembled into a Proceedings book given to the Workshop attendees. After the workshop a revised version of the proceedings, including results of the workshop discussions, will be published by an international publisher. Invited speakers ================ Prof. John A. Stankovic, University of Massachusetts, USA, and Scuola Superiore S.
Anna, Pisa, Italy: "Major Real-Time Challenges for Mechatronical Systems" Prof. Jan-Olof Eklundh, CVAP, Royal Institute of Technology, Stockholm, Sweden: "Computer Vision and Seeing Systems" Prof. Dave Cliff, School of Cognitive and Computing Sciences and Neuroscience IRC, University of Sussex, U.K. "Animate Vision in an Artificial Fly: A Study in Computational Neuroethology" & "Visual Sensory-Motor Networks Without Design: Evolving Visually Guided Robots" (More invited speakers to be confirmed.) ORGANISERS ========== The workshop is arranged by CCA, the Centre for Computer Architecture at Halmstad University, Sweden, in cooperation with the DAMEK Mechatronics Research Group and the SANS (Studies of Artificial Neural Systems) Research Group, both at the Royal Institute of Technology (KTH), Stockholm, Sweden, and the Department of Computer Engineering, Chalmers University of Technology, Gothenburg, Sweden. The Organising Committee includes: Lars Bengtsson, CCA, Organising Chair Anders Lansner, SANS Kenneth Nilsson, CCA Bertil Svensson, Chalmers University of Technology and CCA, Programme and Conference Chair Per-Arne Wiberg, CCA Jan Wikander, DAMEK The workshop is supported by SNNS, the Swedish Neural Network Society. It is financially supported by Halmstad University, the County Administration of Halland, Swedish industries and NUTEK (the Swedish National Board for Industrial and Technical Development). Programme Committee =================== Bertil Svensson, Sweden (chair) Paolo Ancilotti, Italy Lars Bengtsson, Sweden Giorgio Buttazzo, Italy Robert Forchheimer, Sweden Anders Lansner, Sweden Kenneth Nilsson, Sweden John Stankovic, Italy and USA Jan Torin, Sweden Hendrik van Brussel, Belgium Per-Arne Wiberg, Sweden Jan Wikander, Sweden Workshop Language: English Workshop fee: SEK 2 000, incl. proceedings, lunch- eons, reception and workshop dinner. Early registration (before April 20) SEK 1750. The number of attendees to the workshop is limited. Among those not submitting a contribution attendance will be given on a first-come, first-served basis. Social Activities ================= Reception, workshop dinner. Deep sea fishing tour or a visit at Varberg castle/fortress and museum. Bring your family, a programme for accompanying persons will be arranged. How to get there ================ Halmstad is situated on the west coast of Sweden between Copenhagen and Gothenburg (major international airports). With a distance of 150 kilometres to each of these cities it is easy and convenient to reach Halmstad by train, bus or car. Halmstad Airport is linked to Stockholm International Airport (Arlanda). Flight time Stockholm - Halmstad is 50 minutes. Accomodation ============ Arrangements will be made with local hotels, both downtown Halmstad and at the seaside. Different price categories will be available. Please let us know what price category and loca- tion you prefer and we help you with the booking. Payment is made directly to the hotel. Prices (breakfast included) in SEK: CATEGORY 1: SEK 750-850 single room, 750-950 double room Downtown Single room ( ) Double room ( ) Seaside Single room ( ) Double room ( ) CATEGORY 2: SEK 400 single room, 450 double room Near town Single room ( ) Double room ( ) Transportation between the hotels and the University will be arranged. ( ) I register already now. Send preliminary programme when available. ( ) I do not register yet but want the preliminary programme when available. Name ................................................... 
...................................................... Address................................................. ....................................................... ....................................................... Tel., Fax, e-mail .................................... ........................................................ ------------------------------------------------------------------------- MCPA Workshop Centre for Computer Architecture Halmstad University Box 823 S-30118 HALMSTAD Sweden Tel. +46 35 153134 (Lars Bengtsson) Fax. +46 35 157387 email: mcpa at cca.hh.se ------------------------------------------------------------------------ END OF MESSAGE  From harris at ai.mit.edu Tue Feb 16 18:50:28 1993 From: harris at ai.mit.edu (John G. Harris) Date: Tue, 16 Feb 93 18:50:28 EST Subject: Postdoc position in computational/biological vision (learning) Message-ID: <9302162350.AA05713@portofino> One (or possibly two) postdoctoral positions are available for one or two years in computational vision starting September 1993 (flexible). The postdoc will work in Lucia Vaina's laboratory at Boston University, College of Engineering, to conduct research in learning the direction in global motion. The researchers currently involved in this project are Lucia M. Vaina, John Harris, Charlie Chubb, Bob Sekuler, and Federico Girosi. Requirements are PhD in CS or related area with experience in visual modeling or psychophysics. Knowledge of biologically relevant neural models is desirable. Stipend ranges from $28,000 to $35,000 depending upon qualifications. Deadline for application is March 1, 1993. Two letter of recommendation, description of current research and an up to date CV are required. In the research we combine computational psychophysics, neural networks modeling and analog VLSI to study visual learning specifically applied to direction in global motion. The global motion problem requires estimation of the direction and magnitude of coherent motion in the presence of noise. We are proposing a set of psychophysical experiments in which the subject, or the network must integrate noisy, spatially local motion information from across the visual field in order to generate a response. We will study the classes of neural networks which best approximate the pattern of learning demonstrated in psychophysical tasks. We will explore Hebbian learning, multilayer perceptrons (e.g. backpropagation), cooperative networks, Radial Basis Function and Hyper-Basis Functions. The various strategies and their implementation will be evaluated on the basis of their performance and their biological plausibility. For more details, contact Prof. Lucia M. Vaina at vaina at buenga.bu.edu or lmv at ai.mit.edu.  From learn at galaxy.huji.ac.il Wed Feb 17 09:37:58 1993 From: learn at galaxy.huji.ac.il (learn conference) Date: Wed, 17 Feb 93 16:37:58 +0200 Subject: Learning Days in Jerusalem Message-ID: <9302171437.AA04425@galaxy.huji.ac.il> ========== DEADLINE FOR SUBMISSIONS: March 1, 1993 ========================== THE HEBREW UNIVERSITY OF JERUSALEM THE CENTER FOR NEURAL COMPUTATION LEARNING DAYS IN JERUSALEM Workshop on Fundamental Issues in Biological and Machine Learning May 30 - June 4, 1993 Hebrew University, Jerusalem, Israel The Center for Neural Computation at the Hebrew University is a new multi- disciplinary research center for collaborative investigation of the principles underlying computation and information processing in the brain and in neuron- like artificial computing systems. 
The Center's activities span theoretical studies of neural networks in physics, biology and computer science; experimental investigations in neurophysiology, psychophysics and cognitive psychology; and applied research on software and hardware implementations. The first international symposium sponsored by the Center will be held in the spring of 1993, at the Hebrew University of Jerusalem. It will focus on theoretical, experimental and practical aspects of learning in natural and artificial systems. Topics for the meeting include: Theoretical Issues in Supervised and Unsupervised Learning Neurophysiological Mechanisms Underlying Learning Cognitive Psychology and Learning Psychophysics Applications of Machine and Neural Network Learning Invited speakers include: Moshe Abeles (Hebrew U.) Yann LeCun (AT&T) Aharon Agranat (Hebrew U.) Joseph LeDoux (NYU) Ehud Ahissar (Weizmann Inst.) Christoph von der Malsburg (U. Bochum) Asher Cohen (Hebrew U.) Yishai Mansour (Tel Aviv U.) Yuval Davidor (Weizmann Inst.) Bruce McNaughton (U. of Arizona) Yadin Dudai (Weizmann Inst.) Helge Ritter (U. Bielefeld) Martha Farah (U. Penn) David Rumelhart (Stanford) David Haussler (UCSC) Dov Sagi (Weizmann Inst.) Nathan Intrator (Tel Aviv U.) Menachem Segal (Weizmann Inst.) Larry Jacoby (McMaster U.) Alex Waibel (CMU, U. Karlsruhe) Michael Jordan (MIT) Norman Weinberger (U.C. Irvine) Participation in the Workshop is limited to 100. A small number of contributed papers will be accepted. Interested researchers and students are asked to submit registration forms by **** March 1, 1993,***** to: Sari Steinberg Bchiri Tel: (972) 2 584563 Center for Neural Computation Fax: (972) 2 584437 c/o Racah Institute of Physics E-mail: learn at galaxy.huji.ac.il Hebrew University 91904 Jerusalem, Israel To ensure participation, please send a copy of the registration form by e-mail or fax as soon as possible. Organizing Committee: Shaul Hochstein, Haim Sompolinsky, Naftali Tishby. -------------------------------------------------------------------------------- REGISTRATION FORM Please complete the following form. To ensure participation, please send a copy of this form by e-mail or fax as soon as possible to: Sari Steinberg Bchiri E-MAIL: learn at galaxy.huji.ac.il Center for Neural Computation TEL: 972-2-584563 c/o Racah Institute of Physics FAX: 972-2-584437 Hebrew University 91904 Jerusalem, Israel Registration will be confirmed by e-mail. CONFERENCE REGISTRATION Name: _________________________________________________________________________ Affiliation: __________________________________________________________________ Address: ______________________________________________________________________ City: __________________ State: ______________ Zip: _________ Country: ________ Telephone: (____)________________ E-mail address: ____________________________ REGISTRATION FEES ____ Regular registration (before March 1): $100 ____ Student registration (before March 1): $50 ____ Late registration (after March 1): $150 ____ Student late registration (after March 1): $75 Please send payment by check or international money order in US dollars made payable to: Learning Workshop with this form by March 1, 1993 to avoid late fee. ACCOMMODATIONS If you are interested in assistance in reserving hotel accommodation for the duration of the Workshop, please indicate your preferences below: I wish to reserve a single/double (circle one) room from __________ to __________, for a total of _______ nights. 
CONTRIBUTED PAPERS A very limited number of contributed papers will be accepted. Participants interested in submitting papers should complete the following and enclose a 250-word abstract. Poster/Talk (circle one) Title: __________________________________________________________________ __________________________________________________________________  From kak at max.ee.lsu.edu Wed Feb 17 13:26:28 1993 From: kak at max.ee.lsu.edu (Dr. S. Kak) Date: Wed, 17 Feb 93 12:26:28 CST Subject: Reprints Message-ID: <9302171826.AA05612@max.ee.lsu.edu> Reprints of the following article are now available: ---------------------------------------------------------------- Ciruits, Systems, & Signal Processing, vol. 12, 1993, pp. 263-278 ---------------------------------------------------------------- Feedback Neural Networks: New Characteristics and a Generalization Subhash C. Kak Department of Electrical and Computer Engineering Louisiana State University, Baton Rouge, LA 70803, USA ABSTRACT New characteristics of feedback neural networks are studied. We discuss in detail the question of updating of neurons given incomplete information about the state of the neural network. We show how the mechanism of self-indexing [Self-indexing of neural memories, Physics Letters A, Vol. 143, 293-296, 1990.] for such updating provides better results than assigning 'don't know' values to the missing parts of the state vector. Issues related to the choice of the neural model for a feedback network are also considered. Properties of a new complex valued neuron model that generalizes McCulloch-Pitts neurons are examined. ----- Note: This issue of the journal is devoted exclusively to articles on neural networks.  From radford at cs.toronto.edu Wed Feb 17 15:15:58 1993 From: radford at cs.toronto.edu (Radford Neal) Date: Wed, 17 Feb 1993 15:15:58 -0500 Subject: Paper on "A new view of the EM algorithm" Message-ID: <93Feb17.151609edt.555@neuron.ai.toronto.edu> The following paper has been placed in the neuroprose archive, as the file 'neal.em.ps.Z': A NEW VIEW OF THE EM ALGORITHM THAT JUSTIFIES INCREMENTAL AND OTHER VARIANTS Radford M. Neal and Geoffrey E. Hinton Department of Computer Science University of Toronto We present a new view of the EM algorithm for maximum likelihood estimation in situations with unobserved variables. In this view, both the E and the M steps of the algorithm are seen as maximizing a joint function of the model parameters and of the distribution over unobserved variables. From this perspective, it is easy to justify an incremental variant of the algorithm in which the distribution for only one of the unobserved variables is recalculated in each E step. This variant is shown empirically to give faster convergence in a mixture estimation problem. A wide range of other variant algorithms are also seen to be possible. The PostScript for this paper may be retrieved in the usual fashion: unix> ftp archive.cis.ohio-state.edu (log in as user 'anonymous', e-mail address as password) ftp> binary ftp> cd pub/neuroprose ftp> get neal.em.ps.Z ftp> quit unix> uncompress neal.em.ps.Z unix> lpr neal.em.ps (or however you print PostScript files) Many thanks to Jordan Pollack for providing this service! 
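To make the incremental variant concrete, here is a minimal sketch for a two-component, unit-variance Gaussian mixture with equal priors (illustrative Python/NumPy only; the names and setup are invented here and are not code from the paper):

import numpy as np

def incremental_em(x, n_iters=20, seed=0):
    # Illustrative sketch: incremental E steps with running sufficient statistics.
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = np.array([x.min(), x.max()], dtype=float)  # two component means; unit variances, equal priors assumed
    resp = np.full((n, 2), 0.5)                     # responsibilities: distributions over the hidden labels
    s_n = resp.sum(axis=0)                          # sufficient statistics, kept up to date incrementally
    s_x = resp.T @ x
    for _ in range(n_iters):
        for i in rng.permutation(n):
            # E step for a single point: remove its old contribution, recompute its responsibility
            s_n -= resp[i]
            s_x -= resp[i] * x[i]
            lik = np.exp(-0.5 * (x[i] - mu) ** 2)
            resp[i] = lik / lik.sum()
            s_n += resp[i]
            s_x += resp[i] * x[i]
            # immediate partial M step from the updated sufficient statistics
            mu = s_x / s_n
    return mu

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
print(incremental_em(data))  # means recovered near -2 and 3

The point of the variant is that each visit recomputes the distribution for only one unobserved variable and re-estimates the parameters immediately from running statistics, rather than sweeping the whole data set before every M step.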
Radford Neal  From gem at cogsci.indiana.edu Thu Feb 18 09:03:23 1993 From: gem at cogsci.indiana.edu (Gary McGraw) Date: Thu, 18 Feb 93 09:03:23 EST Subject: Letter Spirit technical report available Message-ID: The following technical report from the Center for Research on Concepts and Cognition is available by ftp (only). Although the project described in the paper is not connectionism per se, it shares many of the same philosophical convictions. ---------------------------------------------------------------------- Letter Spirit: An Emergent Model of the Perception and Creation of Alphabetic Style Douglas Hofstadter & Gary McGraw The Letter Spirit project explores the creative act of artistic letter-design. The aim is to model how the $26$ lowercase letters of the roman alphabet can be rendered in many different but internally coherent styles. Viewed from a distance, the behavior of the program can be seen to result from the interaction of four emergent agents working together to form a coherent style and to design a complete alphabet: the Imaginer (which plays with the concepts behind letterforms), the Drafter (which converts ideas for letterforms into graphical realizations), the Examiner (which combines bottom-up and top-down processing to perceive and categorize letterforms), and the Adjudicator (which perceives and dynamically builds a representation of the evolving style). Creating a gridfont is an iterative process of guesswork and evaluation carried out by the four agents. This process is the ``central feedback loop of creativity''. Implementation of Letter Spirit is just beginning. This paper outlines our goals and plans for the project. --------------------------------------------------------------------------- The paper is available by anonymous ftp from: cogsci.indiana.edu (129.79.238.12) as pub/hofstadter+mcgraw.letter-spirit.ps.Z and in neuroprose: archive.cis.ohio-state.edu (128.146.8.52) as pub/neuroprose/hofstadter.letter-spirit.ps.Z Unfortunately, we are not able to distribute hardcopy at this time. *---------------------------------------------------------------------------* | Gary McGraw gem at cogsci.indiana.edu | (__) | |--------------------------------------------------| (oo) | | Center for Research on Concepts and Cognition | /-------\/ | | Department of Computer Science | / | || | | Indiana University | * ||----|| | | mcgrawg at moose.indiana.edu | ^^ ^^ | *---------------------------------------------------------------------------*  From mwitten at hermes.chpc.utexas.edu Thu Feb 18 10:00:41 1993 From: mwitten at hermes.chpc.utexas.edu (mwitten@hermes.chpc.utexas.edu) Date: Thu, 18 Feb 93 9:00:41 CST Subject: Computational Neurosciences Workshop Message-ID: <9302181500.AA03619@morpheus.chpc.utexas.edu> *********************************************************************** ** ** ** UNIVERSITY OF TEXAS SYSTEM CENTER FOR HIGH PERFORMANCE COMPUTING ** ** ** ** Workshop Series In Computational Medicine And Public Health** ** ** ** Announces ** ** ** ** A Workshop On Computational Neurosciences ** ** ** ** 14-15 May 1993 ** ** ** ** Austin, Texas ** ** ** *********************************************************************** Workshop Director: ----------------- Dr. 
Matthew Witten Associate Director, University of Texas System - CHPC Balcones Research Center 10100 Burnet Road, CMS 1.154 Austin, TX 78758-4497 USA Phone: (512) 471-2472 or (800) 262-2472 Fax : (512) 471-2445 email: m.witten at chpc.utexas.edu m.witten at uthermes.bitnet ***** Peliminary Program ***** List Of Current Speakers: ------------------------- Dr. Peter Fox, Director Research Imaging Center, UT HSC San Antonio Dr. Terry Mikiten, Associate Dean, Grad School of Biomedical Sciences, UT HSC San Antonio Dr. Robert Wyatt, Director, Institute For Theoretical Chemistry, UT Austin Dr. Elizabeth Thomas, Department of Chemistry, UT Austin Dr. George Adomian, Director, General Analytics Corporation, Athens, Georgia Dr. George Moore, Department of Biomedical Engineering, University of Southern California, Los Angeles, CA Dr. William Softky, California Institute of Technology, Pasadena, CA Dr. Cathy Wu, Department of Biomathematics and Computer Science, UT Health Center, Tyler, TX Dr. Dan Levine, Department of Mathematics, University of Texas at Arlington, Arlington, TX Dr. Michael Liebman, Senior Scientist, Amoco Technology Company, Naperville, Illinois Dr. George Stanford, Learning Abilities Center, UT Austin Dr. Tom Oakland, School of Education, UT Austin Dr. Matthew Witten, Associate Director, UT System - CHPC Objective, Agenda and Participants: ---------------------------------- The 1990's have been declared the Decade of the Mind. Understanding the mind requires the understanding of a wide variety of topics in the neurosciences. This Workshop is part of an ongoing series of workshops being held at the UT System Center For High Performance Computing; addressing issues of high performance computing and its role in medicine, dentistry, allied health disciplines, and public health. Prior workshops have covered Computational Chemistry and Molecular Design, and Computational Issues in the Life Sciences and Medicine. Upcoming workshops will focus on the subject areas of Computational Molecular Biology and Genetics, Biomechanics, and Physiological Modeling and Simulation. The purpose of this Workshop On Computational Neurosciences is to bring together interested scientists for the purposes of introducing them to state-of-the-art thinking and applications in the domain of neuroscience. Topics to be discussed range across the disciplines of neurosimulation, cognitive neuroscience, neural nets and their theory/application to a variety of problems, methods for solving numerical problems arising in neurology, learning abilities and disabilities, and neurological imaging. Lectures will be presented in a tutorial fashion, and time for questions and answers will be allowed. Attendence is open to anyone. A background in the neurosciences is not required. The size of the workshop is limited due to seating constraints. It is best to register as soon as possible. Schedule: -------- 14 May 1993 - Friday 8:00am - 9:00am Registration and Refreshments 9:00am - 9:15am Opening Remarks - Dr. James C. Almond, Director, UT System CHPC 9:15am - 10:00am Conference Overview - Dr. Matthew Witten 10:00am - 11:00am Dr. Peter Fox 11:00am - 11:30am Coffee Break 11:30am - 12:30pm Dr. Dan Levine 12:30pm - 1:30pm Lunch Break 1:30pm - 2:30pm Dr. Michael Liebman 2:30pm - 3:30pm Dr. Cathy Wu 3:30pm - 4:00pm Coffee Break 4:00pm - 5:00pm Dr. Terry Mikiten 15 May 1993 - Saturday 8:00am - 9:00am Registration and Refreshments 9:00am - 10:00am Dr. George Moore 10:00am - 11:00am Dr. Robert Wyatt and Dr. 
Elizabeth Thomas 11:00am - 11:30am Coffee Break 11:30am - 12:30pm Dr. George Adomian 12:30am - 1:30pm Lunch Break 1:30am - 2:30pm Dr. George Stanford and Dr. Tom Oakland 2:30am - 3:30pm Dr. William Softky 3:30pm - 4:00pm Coffee Break 4:00pm - 5:00pm Closing Discussion and Remarks Poster Sessions: ---------------- While no poster sessions are planned, if enough conference participants indicate a desire to present a poster, we will make every attempt to accommodate the requests. If you are interested in presenting a poster presentation at this meeting, please contact the workshop director. Conference Proceedings: ---------------------- We will make every attempt to have a publication quality conference proceedings. All of the speakers have been asked to submit a paper covering the talk material. The proceedings will appear as a special issue of the series Advances In Mathematics And Computers In Medicine, which is part of the International Journal of Computers and Mathematics With Applications (Pergamon Press). Individuals wishing to have an appropriate paper included in this proceedings should contact the workshop director for manuscript details and deadlines. Conference Costs and Funding: ----------------------------- A nominal registration fee of US $50.00 will be charged by 1 April 93, and US $60.00 after that date. The conference proceedings will be an additional US $10.00 . The conference registration fee includes luncheon and refreshments for both days of the workshop. Accomodations: ------------- There are a number of very reasonable hotels near the UT System CHPC. Additional information may be obtained by contacting the workshop coordinator at the address below. Registration and Information: ---------------------------- Registration requests and further questions should be directed to: Ms. Leslie Bockoven Administrative Associate Workshop On Computational NeuroSciences UT System - CHPC Balcones Research Center 10100 Burnet Road, CMS 1.154 Austin, TX 78758-4497 Phone: (512) 471-2472 or (800) 262-2472 Fax : (512) 471-2445 Email: neuro93 at chpc.utexas.edu neuro93 at uthermes.bitnet ============ REGISTRATION FORM FOLLOWS - CUT HERE ========== NAME (As will appear on badge): AFFILIATION (As will appear on badge): ADDRESS: PHONE: FAX : EMAIL: Please answer the following questions as appropriate: Do you wish to purchase a copy of the conference proceedings? If yes, make sure to include the proceedings purchase fee. Do you have any special dietary requirements? If yes, what are they? Do you wish to present a poster? If yes, what will the proposed title be? Do you wish to include a manuscript in the conference proceedings? If yes, what will the proposed topic be? Do you wish to be on our Workshop Series mailing list? If yes, please give the address for announcements (email is okay) Do you need a hotel reservation? Do you anticipate needing local transportation? ==================== END OF REGISTRATION FORM ============================  From gary at psyche.mit.edu Wed Feb 17 18:42:21 1993 From: gary at psyche.mit.edu (Gary Marcus) Date: Wed, 17 Feb 93 18:42:21 EST Subject: MIT Center for Cognitive Science Occasional Paper #47 Message-ID: <9302172342.AA04329@psyche.mit.edu> Would you please post the following announcement? Thank you very much. Sincerely, Gary Marcus ---- The following technical report is now available: MIT CENTER FOR COGNITIVE SCIENCE OCCASIONAL PAPER #47 German Inflection: The Exception that Proves the Rule Gary F. 
Marcus MIT Ursula Brinkmann Max-Planck-Institut fuer Psycholinguistik Harald Clahsen Richard Wiese Andreas Woest Universitaet Duesseldorf. Steven Pinker MIT ABSTRACT Language is often explained by generative rules and a memorized lexicon. For example, most English verbs take a regular past tense suffix (ask-asked), which is applied to new verbs (faxed, wugged), suggesting the mental rule "add -d to a Verb." Irregular verbs (break-broke, go-went) would be listed in memory. Connectionists argue instead that a pattern associator memory can store and generalize all past tense forms; irregular and regular patterns differ only because of their different numbers of verbs. We present evidence that mental rules are indispensable. A rule concatenates a suffix to a symbol for verbs, so it does not require access to memorized verbs or their sounds, but applies as the "default," whenever memory access fails. We find 20 such circumstances, including novel, unusual-sounding, and derived words; in every case, people inflect them regularly (explaining quirks like flied out, sabre-tooths, walkmans). Contrary to connectionist accounts, these effects are not due to regular words being in the majority. The German participle -t and plural -s apply to minorities of words. Two experiments eliciting ratings of novel German words show that the affixes behave like their English counterparts, as defaults. Thus default suffixation is not due to numerous regular words reinforcing a pattern in associative memory, but to a memory-independent, symbol-concatenating mental operation. --------------------------------------------------------------------------- Copies of the postscript file german.ps.Z may be obtained electronically from psyche.mit.edu as follows: unix-1> ftp psyche.mit.edu (or ftp 18.88.0.85) Connected to psyche.mit.edu. Name (psyche:): anonymous 331 Guest login ok, sent ident as password. Password: yourname 230 Guest login ok, access restrictions apply. ftp> cd pub 250 CWD command successful. ftp> binary 200 Type set to I. ftp> get german.ps.Z 200 PORT command successful. 150 Opening data connection for german.ps.Z (18.88.0.154,1500) (253471 bytes). 226 Transfer complete. local: german.ps.Z remote: german.ps.Z 166433 bytes received in 4.2 seconds (39 Kbytes/s) ftp> quit unix-2> uncompress german.ps.Z unix-3> lpr -P(your_local_postscript_printer) german.ps Or, order a hardcopy by sending your physical mail address to Eleanor Bonsaint (bonsaint at psyche.mit.edu), asking for Occasional Paper #47. Please do this only if you cannot use the ftp method described above.  From josh at faline.bellcore.com Thu Feb 18 10:59:52 1993 From: josh at faline.bellcore.com (Joshua Alspector) Date: Thu, 18 Feb 93 10:59:52 EST Subject: Workshop on applications of neural networks to telecommunications Message-ID: <9302181559.AA02043@faline.bellcore.com> CALL FOR PAPERS International Workshop on Applications of Neural Networks to Telecommunications Princeton, NJ October 18-20, 1993 You are invited to submit a paper to an international workshop on applications of neural networks to problems in telecommunications. The workshop will be held in Princeton, New Jersey on October 18-20, 1993. This workshop will bring together active researchers in neural networks with potential users in the telecommunications industry in a forum for discussion of applications issues. Applications will be identified, experiences shared, and directions for future work explored.
Suggested Topics: Application of Neural Networks in: Network Management Congestion Control Adaptive Equalization Speech Recognition Security Verification Language ID/Translation Information Filtering Dynamic Routing Software Reliability Fraud Detection Financial and Market Prediction Adaptive User Interfaces Fault Identification and Prediction Character Recognition Adaptive Control Data Compression Please submit 6 copies of both a 50 word abstract and a 1000 word summary of your paper by May 14, 1993. Mail papers to the conference administrator: Betty Greer Bellcore, MRE 2P-295 445 South St. Morristown, NJ 07960 (201) 829-4993 (fax) 829-5888 bg1 at faline.bellcore.com Abstract and Summary Due: May 14 Author Notification of Acceptance: June 18 Camera-Ready Copy of Paper Due: August 13 Organizing Committee: General Chair Josh Alspector Bellcore, MRE 2P-396 445 South St. Morristown, NJ 07960-6438 (201) 829-4342 josh at bellcore.com Program Chair Rod Goodman Caltech 116-81 Pasadena, CA 91125 (818) 356-3677 rogo at micro.caltech.edu Publications Chair Timothy X Brown Bellcore, MRE 2E-378 445 South St. Morristown, NJ 07960-6438 (201) 829-4314 timxb at faline.bellcore.com Treasurer Anthony Jayakumar, Bellcore Events Coordinator Larry Jackel, AT&T Bell Laboratories University Liaison S Y Kung, Princeton INNS Liaison Bernie Widrow, Stanford University IEEE Liaison Steve Weinstein, Bellcore Industry Liaisons Miklos Boda, Ellemtel Atul Chhabra, NYNEX Michael Gell, British Telecom Lee Giles, NEC Thomas John, Southwest Bell Adam Kowalczyk, Telecom Australia Conference Administrator Betty Greer Bellcore, MRE 2P-295 445 South St. Morristown, NJ 07960 (201) 829-4993 (fax) 829-5888 bg1 at faline.bellcore.com ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- International Workshop on Applications of Neural Networks to Telecommunications Princeton, NJ October 18-20, 1993 Registration Form Name: _____________________________________________________________ Institution: __________________________________________________________ Mailing Address: ___________________________________________________________________ ___________________________________________________________________ ___________________________________________________________________ ___________________________________________________________________ Telephone: ______________________________ Fax: ____________________________________ E-mail: _____________________________________________________________ I will attend | | Send more information | | Paper enclosed | | Registration Fee Enclosed ($350) | | (please make sure your name is on the check) Registration includes Monday night reception, Tuesday night banquet, and proceedings available at the conference. Mail to: Betty Greer Bellcore, MRE 2P-295 445 South St. Morristown, NJ 07960 (201) 829-4993 (fax) 829-5888 bg1 at faline.bellcore.com Deadline for submissions: May 14, 1993 Author Notification of Acceptance: June 18, 1993 Camera-Ready Copy of Paper Due: August 13, 1993  From miller at picard.ads.com Thu Feb 18 11:51:18 1993 From: miller at picard.ads.com (Kenyon Miller) Date: Thu, 18 Feb 93 11:51:18 EST Subject: correction to backprop example Message-ID: <9302181651.AA02454@picard.ads.com> For those of you who have lost interest in the backprop debate about replacing the sigmoid derivative with a constant, please disregard this message. 
It was recently pointed out to me that my backprop example was incomplete (I don't know the name of the sender): > The error need not be increased although w increased because W1-3 decreased > and W3-4 decreased. With 2 decreases and 1 increase, one could still expect > the N4 to decrease and also the error. > Rgds, > TH My original example (with typographical corrections) was: Consider the following example:

     ----n2-----
    /           \
w--n1            n4
    \           /
     ----n3-----

In other words, there is an output neuron n4 which is connected to two neurons n2 and n3, each of which is connected to neuron n1, which has a weight w. Suppose the weight connecting n2 to n4 is negative and all other connections in the diagram are positive. Suppose further that n2 is saturated and none of the other neurons are saturated. Now, suppose that n4 must be decreased in order to reduce the error. Backpropagating along the n4-n2-n1 path, w receives an error term which would tend to increase n1, while backpropagating along the n4-n3-n1 path would result in a term which would tend to decrease n1. If the true sigmoid derivative were used, the force to increase n1 would be dampened because n2 is saturated, and the net result would be to decrease w and therefore decrease n1, n3, n4, and the error. However, replacing the sigmoid derivative with a constant could easily allow the n4-n2-n1 path to dominate, and the error at the output would increase. The conclusion was that replacing the sigmoid derivative with a constant can result in increasing the error, and is therefore undesirable. CORRECTION TO THE EXAMPLE: The original example did not take into account the perturbation on W1-3 and W3-4, but the argument still holds with the following modification. Whatever the perturbation on W1-3 and W3-4, there exists (or at least a situation can be constructed such that there exists) some positive perturbation on w which will counteract those perturbations and result in an increase in the output error. Now replicate the n1-n2-n4 path as necessary by adding an n1-n5-n4 path, an n1-n6-n4 path etc. Each new path results in incrementing w by some constant delta, so there must exist some number of paths which results in a sufficient increase in w to cause an increase in the output error of the network. Thus, an example can be constructed in which the error increases, so the method cannot be considered theoretically sound. However, you can get virtually all of the benefit without any of the theoretical problems by using the derivative of the piecewise-linear function

          -------------------
         /
        /
       /
---------

which involves using a constant or zero for the derivative, depending on a simple range test. -Ken Miller.  From georgiou at silicon.csci.csusb.edu Thu Feb 18 13:04:55 1993 From: georgiou at silicon.csci.csusb.edu (George M. Georgiou) Date: Thu, 18 Feb 1993 10:04:55 -0800 Subject: Multivalued and Continuous Perceptrons (Preprint) Message-ID: <9302181804.AA24680@silicon.csci.csusb.edu> Rosenblatt's Perceptron Theorem guarantees us that a linearly separable function (R^n --> {0,1}) can be learned in finite time. Question: Is it possible to guarantee learning of a continuous-valued function (R^n --> (0,1)) which can be represented on a perceptron in finite time? This paper answers this question (and other ones too) in the affirmative: The Multivalued and Continuous Perceptrons by George M. Georgiou Rosenblatt's perceptron is extended to (1) a multivalued perceptron and (2) a continuous-valued perceptron.
It shown that any function that can be represented by the multivalued perceptron can be learned in a finite number of steps, and any function that can be represented by the continuous perceptron can be learned with arbitrary accuracy in a finite number of steps. The whole apparatus is defined in the complex domain. With these perceptrons learnability is extended to more complicated functions than the usual linearly separable ones. The complex domain promises to be a fertile ground for neural networks research. The file in the neuroprose is georgiou.perceptrons.ps.Z . Comments and questions on the proofs are welcome. --------------------------------------------------------------------- Sample session to get the file: unix> ftp archive.cis.ohio-state.edu (log in as user 'anonymous', e-mail address as password) ftp> binary ftp> cd pub/neuroprose ftp> get georgiou.perceptrons.ps.Z ftp> quit unix> uncompress georgiou.perceptrons.ps.Z unix> lpr georgiou.perceptrons.ps (or however you print PostScript files) Thanks to Jordan Pollack for providing this service! --George ---------------------------------------------------- Dr. George M. Georgiou E-mail: georgiou at wiley.csusb.edu Computer Science Department TEL: (909) 880-5332 California State University FAX: (909) 880-7004 5500 University Pkwy San Bernardino, CA 92407, USA  From rangarajan-anand at CS.YALE.EDU Thu Feb 18 13:18:40 1993 From: rangarajan-anand at CS.YALE.EDU (Anand Rangarajan) Date: Thu, 18 Feb 1993 13:18:40 -0500 Subject: No subject Message-ID: <199302181818.AA24890@COMPOSITION.SYSTEMSZ.CS.YALE.EDU> Programmer/Analyst Position in Artificial Neural Networks The Yale Center for Theoretical and Applied Neuroscience (CTAN) and the Department of Computer Science Yale University, New Haven, CT We are offering a challenging position in software engineering in support of new techniques in image processing and computer vision using artificial neural networks (ANNs). 1. Basic Function: Designer and programmer for computer vision and neural network software at CTAN and the Computer Science department. 2. Major duties: (a) To implement computer vision algorithms using a Khoros (or similar) type of environment. (b) Use the aforementioned tools and environment to run and analyze computer experiments in specific image processing and vision application areas. (c) To facilitate the improvement of neural network algorithms and architectures for vision and image processing. 3. Position Specifications: (a) Education: BA, including linear algebra, differential equations, calculus. helpful: mathematical optimization. (b) Experience: programming experience in C++ (or C) under UNIX. some of the following: neural networks, vision or image processing applications, scientific computing, workstation graphics, image processing environments, parallel computing, computer algebra and object-oriented design. Preferred starting date: March 1, 1993. For information or to submit an application, please write: Eric Mjolsness Department of Computer Science Yale University P. O. Box 2158 Yale Station New Haven, CT 06520-2158 e-mail: mjolsness-eric at cs.yale.edu Any application must also be submitted to: Jeffrey Drexler Department of Human Resources Yale University 155 Whitney Ave. 
New Haven, CT 06520 -Eric Mjolsness and Anand Rangarajan (prospective supervisors)  From pjs at bvd.Jpl.Nasa.Gov Thu Feb 18 14:49:36 1993 From: pjs at bvd.Jpl.Nasa.Gov (Padhraic Smyth) Date: Thu, 18 Feb 93 11:49:36 PST Subject: Position Available at JPL Message-ID: <9302181949.AA26236@bvd.jpl.nasa.gov> We currently have an opening in our group for a new PhD graduate in the general area of signal processing and pattern recognition. While the job description does not mention neural computation per se, it may be of interest to some members of the connectionist mailing list. For details see below. Padhraic Smyth, JPL RESEARCH POSITION AVAILABLE AT THE JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY The Communications Systems Research Section at JPL has an immediate opening for a permanent member of technical staff in the area of adaptive signal processing and statistical pattern recognition. The position requires a PhD in Electrical Engineering or a closely related field and applicants should have a demonstrated ability to perform independent research. A background in statistical signal processing is highly desirable. Background in information theory, estimation and detection, advanced statistical methods, and pattern recognition would also be a plus. Current projects within the group include the use of hidden Markov models for change detection in time series, and statistical methods for geologic feature detection in remotely sensed image data. The successful applicant will be expected to perform both basic and applied research and to propose and initiate new research projects. Permanent residency or U.S. citizenship is not a strict requirement - however, candidates not in either of these categories should be aware that their applications will only be considered in exceptional cases. Interested applicants should send their resume (plus any supporting background material such as recent relevant papers) to: Dr. Stephen Townes JPL 238-420 4800 Oak Grove Drive Pasadena, CA 91109. (email: townes at bvd.jpl.nasa.gov)  From mpp at cns.brown.edu Thu Feb 18 15:42:34 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Thu, 18 Feb 93 15:42:34 EST Subject: A computationally efficient squashing function Message-ID: <9302182042.AA03424@cns.brown.edu> Recently on the comp.ai.neural-nets bboard, there has been a discussion of more computationally efficient squashing functions. Some colleagues of mine suggested that many members of the Connectionist mailing list may not have access to the comp.ai.neural-nets bboard; so I have included a summary below. Michael ------------------------------------------------------ David L. Elliott mentioned using the following neuron activation function: f(x) = x / (1 + |x|) He argues that this function has the same qualitative properties as the hyperbolic tangent function but is in practice faster to calculate. I have suggested a similar speed-up for radial basis function networks: f(x) = 1 / (1 + x^2) which avoids the transcendental calculation associated with gaussian RBF nets. I have run simulations using the above squashing function in various backprop networks. The performance is comparable (sometimes worse, sometimes better) to the usual training using hyperbolic tangents. I also found that the performance of networks varied very little when the activation functions were switched (i.e. two networks with identical weights but different activation functions will have comparable performance on the same data).
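For concreteness, here is a minimal sketch (Python/NumPy, illustrative only; this is not the simulator used for the results described here) of the two rational functions next to the transcendental ones they replace:

import numpy as np

def fast_squash(x):
    # x / (1 + |x|): bounded in (-1, 1), steepest at 0, no exponential needed
    return x / (1.0 + np.abs(x))

def fast_squash_deriv(x):
    # derivative 1 / (1 + |x|)^2, also free of transcendentals
    return 1.0 / (1.0 + np.abs(x)) ** 2

def fast_rbf(x):
    # 1 / (1 + x^2): a rational stand-in for the Gaussian exp(-x^2)
    return 1.0 / (1.0 + x * x)

x = np.linspace(-6.0, 6.0, 7)
print(np.round(fast_squash(x), 3))        # compare shapes, not values:
print(np.round(np.tanh(x), 3))            # both saturate toward +/-1
print(np.round(fast_squash_deriv(x), 3))  # peaks at 0, vanishes at the extremes
print(np.round(fast_rbf(x), 3))           # compare with the Gaussian bump
print(np.round(np.exp(-x * x), 3))

The agreement is qualitative rather than numerical: the rational forms saturate and decay polynomially rather than exponentially, which is exactly what makes them cheap to evaluate.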
I tested these results on two databases: the NIST OCR database (preprocessed by Nestor Inc.) and the Turk and Pentland human face database. -------------------------------------------------------------------------------- Michael P. Perrone Email: mpp at cns.brown.edu Institute for Brain and Neural Systems Tel: 401-863-3920 Brown University Fax: 401-863-3934 Providence, RI 02912  From henrik at robots.ox.ac.uk Fri Feb 19 11:47:16 1993 From: henrik at robots.ox.ac.uk (henrik@robots.ox.ac.uk) Date: Fri, 19 Feb 93 16:47:16 GMT Subject: Squashing functions Message-ID: <9302191647.AA05729@cato.robots.ox.ac.uk> Any interesting squashing function can be stored in a table of negligible size (eg 256) with very high accuracy if linear (or higher) interpolation is used. So, on a RISC workstation, there is no need for improvements. If you deal with analog VLSI, anything goes, though ... Cheers, henrik at robots.ox.ac.uk  From cateau at tkyux.phys.s.u-tokyo.ac.jp Sat Feb 20 01:11:11 1993 From: cateau at tkyux.phys.s.u-tokyo.ac.jp (Hideyuki Cateau) Date: Sat, 20 Feb 93 15:11:11 +0900 Subject: TR:Univeral Power law Message-ID: <9302200611.AA21000@tkyux.phys.s.u-tokyo.ac.jp> I and my collaborators previously reported that there is a beautiful power law in the pace of the memory of Back Prop. We found a reaction from one of networkers that the law was established only in the special model. This time we performed an extensive simulation to show the law is fairly universal in the technical report:cateau.univ.tar.Z, Universal Power law in feed forward networks H.Cateau Department of Physics University of Tokyo Abstract: The power law in the pace of the memory, which was previously reported for the encoder, is shown to hold universally for general feed forward networks. An extensive simulation on wide variety of feed forward networks shows this and reveals a lot of interesting new observations. The PostScript for this paper may be retrieved in the usual fashion: unix> ftp archive.cis.ohio-state.edu (log in as user 'anonymous', e-mail address as password) ftp> binary ftp> cd pub/neuroprose ftp> get cateau.univ.tar.Z ftp> quit unix> uncompress cateau.univ.tar.Z unix> tar xvfo cateau.univ.tar Then you get three PS files:short.ps fig1.ps fig2.ps unix> lpr short.ps unix> lpr fig1.ps unix> lpr fig2.ps Hideyuki Cateau Particle theory group, Department of Physics,University of Tokyo,7-3-1, Hongo,Bunkyoku,113 Japan e-mail:cateau at tkyux.phys.s.u-tokyo.ac.jp  From soller at asylum.cs.utah.edu Fri Feb 19 16:09:43 1993 From: soller at asylum.cs.utah.edu (Jerome Soller) Date: Fri, 19 Feb 93 14:09:43 -0700 Subject: Industrial Position in Artificial Intelligence and/or Neural Networks Message-ID: <9302192109.AA22408@asylum.cs.utah.edu> I have just been made aware of a job opening in artificial intelligence and/or neural networks in southeast Ogden, UT. This company maintains strong technical interaction with existing industrial, U.S. government laboratory, and university strengths in Utah. Ogden is a half hour to 45 minute drive from Salt Lake City, UT. For further information, contact Dale Sanders at 801-625-8343 or dsanders at bmd.trw.com . The full job description is listed below. Sincerely, Jerome Soller U. of Utah Department of Computer Science and VA Geriatric, Research, Education and Clinical Center Knowledge engineering and expert systems development. Requires five years formal software development experience, including two years expert systems development. 
Requires experience implementing at least one working expert system. Requires familiarity with expert systems development tools and DoD specification practices. Experience with neural nets or fuzzy logic systems may qualify as equivalent experience to expert systems development. Familiarity with Ada, C/C++, database design, and probabilistic risk assessment strongly desired. Requires strong communication and customer interface skills. Minimum degree: BS in computer science, engineering, math, or physical science. M.S. or Ph.D. preferred. U.S. Citizenship is required. Relocation funding is limited.  From delliott at eng.umd.edu Fri Feb 19 15:22:38 1993 From: delliott at eng.umd.edu (David L. Elliott) Date: Fri, 19 Feb 1993 15:22:38 -0500 Subject: Abstract Message-ID: <199302192022.AA03327@verdi.eng.umd.edu> ABSTRACT A BETTER ACTIVATION FUNCTION FOR ARTIFICIAL NEURAL NETWORKS TR 93-8, Institute for Systems Research, University of Maryland by David L. Elliott-- ISR, NeuroDyne, Inc., and Washington University January 29, 1993 The activation function s(x) = x/(1 + |x|) is proposed for use in digital simulation of neural networks, on the grounds that the computational operation count for this function is much smaller than for those using exponentials and that it satisfies the simple differential equation s' = (1 + |s|)^2, which generalizes the logistic equation. The full report, a work-in-progress, is available in LaTeX or PostScript form (two pages + titlepage) by request to delliott at src.umd.edu.  From tony at aivru.shef.ac.uk Fri Feb 19 05:59:46 1993 From: tony at aivru.shef.ac.uk (Tony_Prescott) Date: Fri, 19 Feb 93 10:59:46 GMT Subject: lectureship Message-ID: <9302191059.AA23937@aivru> LECTURESHIP IN COGNITIVE SCIENCE University of Sheffield, UK. Applications are invited for the above post tenable from 1st October 1993 for three years in the first instance but with expectation of renewal. Preference will be given to candidates with a PhD in Cognitive Science, Artificial Intelligence, Cognitive Psychology, Computer Science, Robotics, or related disciplines. The Cognitive Science degree is an integrated course taught by the departments of Psychology and Computer Science. Research in Cognitive Science was highly evaluated in the recent UFC research evaluation exercise, special areas of interest being vision, speech, language, neural networks, and learning. The successful candidate will be expected to undertake research vigorously. Supervision of programming projects will be required, hence considerable experience with Lisp, Prolog, and/or C is essential. It is expected that the appointment will be made on the Lecturer A scale (13,400-18,576 pounds(uk) p.a.) according to age and experience but enquiries from more experienced staff able to bring research resources are welcomed. Informal enquiries to Professor John P Frisby 044-(0)742-826538 or e-mail jpf at aivru.sheffield.ac.uk. Further particulars from the director of Personnel Services, The University, Sheffield S10 2TN, UK, to whom all applications including a cv and the names and addresses of three referees (6 copies of all documents) should be sent by 1 April 1993. Short-listed candidates will be invited to Sheffield for interview for which travel expenses (within the UK only) will be funded. 
Current permanent research staff in Cognitive Science at Sheffield include: Prof John Frisby (visual psychophysics), Prof John Mayhew (computer vision, robotics, neural networks) Prof Yorick Wilks (natural language understanding) Dr Phil Green (speech recognition) Dr John Porrill (computer vision) Dr Paul McKevitt (natural language understanding) Dr Peter Scott (computer assisted learning) Dr Rod Nicolson (human learning) Dr Paul Dean (neuroscience, neural networks) Mr Tony Prescott (neural networks, comparative cog sci)  From delliott at src.umd.edu Sat Feb 20 15:23:57 1993 From: delliott at src.umd.edu (David L. Elliott) Date: Sat, 20 Feb 1993 15:23:57 -0500 Subject: Corrected Abstract Message-ID: <199302202023.AA12407@newra.src.umd.edu> ABSTRACT [corrected] A BETTER ACTIVATION FUNCTION FOR ARTIFICIAL NEURAL NETWORKS TR 93-8, Institute for Systems Research, University of Maryland by David L. Elliott-- ISR, NeuroDyne, Inc., and Washington University January 29, 1993 The activation function s(x) = x/(1 + |x|) is proposed for use in digital simulation of neural networks, on the grounds that the computational operation count for this function is much smaller than for those using exponentials and that it satisfies the simple differential equation s' = (1 - |s|)^2, which generalizes the logistic equation. The full report, a work-in-progress, is available in LaTeX or PostScript form (two pages + titlepage) by request to delliott at src.umd.edu. Thanks to Michael Perrone for calling my attention to the typo in s'.  From raina at max.ee.lsu.edu Sat Feb 20 17:37:45 1993 From: raina at max.ee.lsu.edu (Praveen Raina) Date: Sat, 20 Feb 93 16:37:45 CST Subject: No subject Message-ID: <9302202237.AA13139@max.ee.lsu.edu> The following comparison between the backpropagation and Kak algorithms for training feedforward networks will be of interest to many. We took 52 training samples, each having 25 input neurons and 3 output neurons. The training data were the monthly price index of a commodity for 60 months. Monthly prices were normalised and quantized into a 3-bit binary sequence. Each training sample represented prices taken over a period of 8 months (8x3 = 24 input neurons + 1 neuron for bias). The size of the learning window was fixed as 1 month. Binary values were used as the input for both the BP and Kak algorithms. For BP, the learning rate was taken as 0.45 and the momentum as 0.55. Training was performed on an IBM RISC 6000 machine. The training time for backpropagation was 4 minutes 5 seconds and the total number of iterations was 6101. The training time for the Kak algorithm was 5 seconds and the total number of iterations was 875. Thus, for this example the learning advantage of the Kak algorithm is a factor of 49. For larger examples the advantage becomes even greater. - Praveen Raina.  From unni at neuro.cs.gmr.com Sat Feb 20 14:57:13 1993 From: unni at neuro.cs.gmr.com (K.P.Unnikrishnan) Date: Sat, 20 Feb 93 14:57:13 EST Subject: A NEURAL COMPUTATION course reading list Message-ID: <9302201957.AA22392@neuro.cs.gmr.com> Folks: Here is the reading list for a course I offered last semester at Univ. of Michigan. Unnikrishnan --------------------------------------------------------------- READING LIST FOR THE COURSE "NEURAL COMPUTATION" EECS-598-6 (FALL 1992), UNIVERSITY OF MICHIGAN INSTRUCTOR: K. P. UNNIKRISHNAN ----------------------------------------------- A. COMPUTATION AND CODING IN THE NERVOUS SYSTEM 1. Hodgkin, A.L., and Huxley, A.F.
A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500-544 (1952). 2a. Del Castillo, J., and Katz, B. Quantal components of the end-plate potential. J. Physiol. 124, 560-573 (1954). 2b. Del Castillo, J., and Katz, B. Statistical factors involved in neuromuscular facilitation and depression. J. Physiol. 124, 574-585 (1954). 3. Rall, W. Cable theory for dendritic neurons. In: Methods in neural modeling (Koch and Segev, eds.) pp. 9-62 (1989). 4. Koch, C., and Poggio, T. Biophysics of computation: neurons, synapses and membranes. In: Synaptic function (Edelman, Gall, and Cowan, eds.) pp. 637-698 (1987). B. SENSORY PROCESSING IN VISUAL AND AUDITORY SYSTEMS 1. Werblin, F.S., and Dowling, J.E. Organization of the retina of the mudpuppy, Necturus maculosus: II. Intracellular recording. J. Neurophysiol. 32, 339-355 (1969). 2a. Barlow, H.B., and Levick, W.R. The mechanism of directionally selective units in rabbit's retina. J. Physiol. 178, 477-504 (1965). 2b. Lettvin, J.Y., Maturana, H.R., McCulloch, W.S., and Pitts, W.H. What the frog's eye tells the frog's brain. Proc. IRE 47, 1940-1951 (1959). 3. Hubel, D.H., and Wiesel, T.N. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106-154 (1962). 4a. Suga, N. Cortical computational maps for auditory imaging. Neural Networks, 3, 3-21 (1990). 4b. Simmons, J.A. A view of the world through the bat's ear: the formation of acoustic images in echolocation. Cognition, 33, 155-199 (1989). C. MODELS OF SENSORY SYSTEMS 1. Hecht, S., Shlaer, S., and Pirenne, M.H. Energy, quanta, and vision. J. Gen. Physiol. 25, 819-840 (1942). 2. Julesz, B., and Bergen, J.R. Textons, the fundamental elements in preattentive vision and perception of textures. Bell Sys. Tech. J. 62, 1619-1645 (1983). 3a. Harth, E., Unnikrishnan, K.P., and Pandya, A.S. The inversion of sensory processing by feedback pathways: a model of visual cognitive functions. Science 237, 184-187 (1987). 3b. Harth, E., Pandya, A.S., and Unnikrishnan, K.P. Optimization of cortical responses by feedback modification and synthesis of sensory afferents. A model of perception and REM sleep. Concepts Neurosci. 1, 53-68 (1990). 3c. Koch, C. The action of the corticofugal pathway on sensory thalamic nuclei: A hypothesis. Neurosci. 23, 399-406 (1987). 4a. Singer, W. et al., Formation of cortical cell assemblies. In: CSH Symposia on Quant. Biol. 55, pp. 939-952 (1990). 4b. Eckhorn, R., Reitboeck, H.J., Arndt, M., and Dicke, P. Feature linking via synchronization among distributed assemblies: Simulations of results from cat visual cortex. Neural Comp. 293-307 (1990). 5. Reichardt, W., and Poggio, T. Visual control of orientation behavior in the fly. Part I. A quantitative analysis. Q. Rev. Biophys. 9, 311-375 (1976). D. ARTIFICIAL NEURAL NETWORKS 1a. Block, H.D. The perceptron: a model for brain functioning. Rev. Mod. Phy. 34, 123-135 (1962). 1b. Minsky, M.L., and Papert, S.A. Perceptrons. pp. 62-68 (1988). 2a. Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359-366 (1989). 2b. Lapedes, A., and Farber, R. How neural nets work. In: Neural Info. Proc. Sys. (Anderson, ed.) pp. 442-456 (1987). 3a. Ackley, D.H., Hinton, G.E., and Sejnowski, T.J. A learning algorithm for Boltzmann machines. Cog. Sci. 9, 147-169 (1985). 3b. Hopfield, J.J.
Learning algorithms and probability distributions in feed-forward and feed-back networks. PNAS, USA. 84, 8429-8433 (1987). 4. Tank, D.W., and Hopfield, J.J. Simple neural optimization networks: An A/D converter, signal decision circuit, and linear programming circuit. IEEE Tr. Cir. Sys. 33, 533-541 (1986). E. NEURAL NETWORK APPLICATIONS 1. LeCun, Y., et al., Backpropagation applied to handwritten zip code recognition. Neural Comp. 1, 541-551 (1990). 2. Lapedes, A., and Farber, R. Nonlinear signal processing using neural networks. LA-UR-87-2662, Los Alamos Natl. Lab. (1987). 3. Unnikrishnan, K.P., Hopfield, J.J., and Tank, D.W. Connected-digit speaker-dependent speech recognition using a neural network with time-delayed connections. IEEE Tr. ASSP. 39, 698-713 (1991). 4a. De Vries, B., and Principe, J.C. The gamma model - a new neural model for temporal processing. Neural Networks 5, 565-576 (1992). 4b. Poddar, P., and Unnikrishnan, K.P. Memory neuron networks: a prolegomenon. GMR-7493, GM Res. Labs. (1991). 5. Narendra, K.S., and Parthasarathy, K. Gradient methods for the optimization of dynamical systems containing neural networks. IEEE Tr. NN 2, 252-262 (1991). F. HARDWARE IMPLEMENTATIONS 1a. Mahowald, M.A., and Mead, C. Silicon retina. In: Analog VLSI and neural systems (Mead). pp. 257-278 (1989). 1b. Mahowald, M.A., and Douglas, R. A silicon neuron. Nature 354, 515-518 (1991). 2. Mueller, P. et al. Design and fabrication of VLSI components for a general purpose analog computer. In: Proc. IEEE workshop VLSI neural sys. (Mead, ed.) pp. xx-xx (1989). 3. Graf, H.P., Jackel, L.D., and Hubbard, W.E. VLSI implementation of a neural network model. Computer 2, 41-49 (1988). G. ISSUES ON LEARNING 1. Geman, S., Bienenstock, E., and Doursat, R. Neural networks and the bias/variance dilemma. Neural Comp. 4, 1-58 (1992). 2. Brown, T.H., Kairiss, E.W., and Keenan, C.L. Hebbian synapses: Biophysical mechanisms and algorithms. Ann. Rev. Neurosci. 13, 475-511 (1990). 3. Haussler, D. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. AI 36, 177-221 (1988). 4. Reeke, G.N. Jr., and Edelman, G.M. Real brains and artificial intelligence. Daedalus 117, 143-173 (1988). 5. White, H. Learning in artificial neural networks: a statistical perspective. Neural Comp. 1, 425-464 (1989). ---------------------------------------------------------------------- SUPPLEMENTAL READING Neher, E., and Sakmann, B. Single channel currents recorded from membrane of denervated frog muscle fibers. Nature 260, 779-781 (1976). Rall, W. Core conductor theory and cable properties of neurons. In: Handbook Physiol. (Brookhart, Mountcastle, and Kandel eds.) pp. 39-97 (1977). Shepherd, G.M., and Koch, C. Introduction to synaptic circuits. In: The synaptic organization of the brain (Shepherd, ed.) pp. 3-31 (1990). Junge, D. Synaptic transmission. In: Nerve and muscle excitation (Junge) pp. 149-178 (1981). Scott, A.C. The electrophysics of a nerve fiber. Rev. Mod. Phy. 47, 487-533 (1975). Enroth-Cugell, C., and Robson, J.G. The contrast sensitivity of retinal ganglion cells of the cat. J. Physiol. 187, 517-552 (1966). Felleman, D.J., and Van Essen, D.C. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1-47 (1991). Julesz, B. Early vision and focal attention. Rev. Mod. Phy. 63, 735-772 (1991). Sejnowski, T.J., Koch, C., and Churchland, P.S. Computational neuroscience. Science 241, 1299-1302 (1988). Churchland, P.S., and Sejnowski, T.J. Perspectives on Cognitive Neuroscience.
Science 242, 741-745 (1988). McCulloch, W.S., and Pitts, W. A logical calculus of ideas immanent in nervous activity. Bull. Math. Biophy. 5, 115-133 (1943). Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. PNAS, USA. 79, 2554-2558 (1982). Hopfield, J.J. Neurons with graded responses have collective computational properties like those of two-state neurons. PNAS, USA. 81, 3088-3092 (1984). Hinton, G.E., and Sejnowski, T.J. Optimal perceptual inference. Proc. IEEE CVPR. 448-453 (1983). Rumelhart, D.E., Hinton, G.E., and Williams, R.J. Learning representations by back-propagating errors. Nature 323, 533-536 (1986). Unnikrishnan, K.P., and Venugopal, K.P. Learning in connectionist networks using the Alopex algorithm. Proc. IEEE IJCNN. I-926 - I-931 (1992). Cowan, J.D., and Sharp, D.H. Neural nets. Quart. Rev. Biophys. 21, 365-427 (1988). Lippmann, R.P. An introduction to computing with neural nets. IEEE ASSP Mag. 4, 4-22 (1987). Sompolinsky, H. Statistical mechanics of neural networks. Phy. Today 41, 70-80 (1988). Hinton, G.E. Connectionist learning procedures. Art. Intel. 40, 185-234 (1989).  From demers at cs.ucsd.edu Sun Feb 21 13:45:24 1993 From: demers at cs.ucsd.edu (David DeMers) Date: Sun, 21 Feb 93 10:45:24 -0800 Subject: NIPS-5 papers: Nonlinear dimensionality reduction / Inverse kinematics Message-ID: <9302211845.AA24988@beowulf> Non-Linear Dimensionality Reduction David DeMers & Garrison Cottrell ABSTRACT -------- A method for creating a non-linear encoder-decoder for multidimensional data with compact representations is presented. The commonly used technique of autoassociation is extended to allow non-linear representations, and an objective function which penalizes activations of individual hidden units is shown to result in minimum dimensional encodings with respect to allowable error in reconstruction. ============================================================ Global Regularization of Inverse Kinematics for Redundant Manipulators David DeMers & Kenneth Kreutz-Delgado ABSTRACT -------- The inverse kinematics problem for redundant manipulators is ill-posed and nonlinear. There are two fundamentally different issues which result in the need for some form of regularization: the existence of multiple solution branches (global ill-posedness) and the existence of excess degrees of freedom (local ill-posedness). For certain classes of manipulators, learning methods applied to input-output data generated from the forward function can be used to globally regularize the problem by partitioning the domain of the forward mapping into a finite set of regions over which the inverse problem is well-posed. Local regularization can be accomplished by an appropriate parameterization of the redundancy consistently over each region. As a result, the ill-posed problem can be transformed into a finite set of well-posed problems. Each can then be solved separately to construct approximate direct inverse functions. ============================================================= Preprints are available from the neuroprose archive Retrievable in the usual way: unix> ftp archive.cis.ohio-state.edu (128.146.8.52) login as "anonymous", password = ftp> cd pub/neuroprose ftp> binary ftp> get demers.nips92-nldr.ps.Z ftp> get demers.nips92-robot.ps.Z ftp> bye unix> uncompress demers.*.ps.Z unix> lpr -s demers.nips92-nldr.ps unix> lpr -s demers.nips92-robot.ps (or however you print *LARGE* PostScript files) These papers will appear in S.J.
Hanson, J.E. Moody & C.L. Giles, eds, Advances in Neural Information Processing Systems 5 (Morgan Kaufmann, 1993). Dave DeMers demers at cs.ucsd.edu Computer Science & Engineering 0114 demers%cs at ucsd.bitnet UC San Diego ...!ucsd!cs!demers La Jolla, CA 92093-0114 (619) 534-0688, or -8187, FAX: (619) 534-7029  From srikanth at rex.cs.tulane.edu Sun Feb 21 14:41:45 1993 From: srikanth at rex.cs.tulane.edu (R. Srikanth) Date: Sun, 21 Feb 93 13:41:45 CST Subject: Abstract, New Squashing function... In-Reply-To: <199302192022.AA03327@verdi.eng.umd.edu>; from "David L. Elliott" at Feb 19, 93 3:22 pm Message-ID: <9302211941.AA17332@hercules.cs.tulane.edu> > > ABSTRACT > > A BETTER ACTIVATION FUNCTION FOR ARTIFICIAL NEURAL NETWORKS > > TR 93-8, Institute for Systems Research, University of Maryland > > by David L. Elliott-- ISR, NeuroDyne, Inc., and Washington University > January 29, 1993 > The activation function s(x) = x/(1 + |x|) is proposed for use in > digital simulation of neural networks, on the grounds that the > computational operation count for this function is much smaller than > for those using exponentials and that it satisfies the simple differential > equation s' = (1 + |s|)^2, which generalizes the logistic equation. > The full report, a work-in-progress, is available in LaTeX or PostScript > form (two pages + titlepage) by request to delliott at src.umd.edu. > > This squashing function, while not widely used, has been used by a few others. George Georgiou uses it for a complex back-propagation network. Not only does this activation function enable him to build a complex-valued backpropagation network, but it also seems to lend itself to easier implementation. For more information on complex domain backprop, contact Dr. George Georgiou at georgiou at meridian.csci.csusb.edu -- srikanth at cs.tulane.edu Dept of Computer Science, Tulane University, New Orleans, La - 70118  From delliott at src.umd.edu Sun Feb 21 15:00:03 1993 From: delliott at src.umd.edu (David L. Elliott) Date: Sun, 21 Feb 1993 15:00:03 -0500 Subject: Response Message-ID: <199302212000.AA17583@newra.src.umd.edu> Henrik- Thanks for your comment; you wrote: "Any interesting squashing function can be stored in a table of negligible size (eg 256) with very high accuracy if linear (or higher) interpolation is used." I think you are right *if the domain of the map is compact* a priori. Otherwise the approximation must eventually become constant for large x, and this has bad consequences for backpropagation algorithms. For some other training methods, perhaps not. David  From gluck at pavlov.rutgers.edu Mon Feb 22 08:05:05 1993 From: gluck at pavlov.rutgers.edu (Mark Gluck) Date: Mon, 22 Feb 93 08:05:05 EST Subject: Neural Computation & Cognition: Opening for NN Programmer Message-ID: <9302221305.AA04474@james.rutgers.edu> POSITION AVAILABLE: NEURAL-NETWORK RESEARCH PROGRAMMER At the Center for Neuroscience at Rutgers-Newark, we have an opening for a full or part-time research programmer to assist in developing neural-network simulations. The research involves integrated experimental and theoretical analyses of the cognitive and neural bases of learning and memory. The focus of this research is on understanding the underlying neurobiological mechanisms for complex learning behaviors in both animals and humans. Substantial prior experience and understanding of neural-network theories and algorithms is required. Applicants should have a high level of programming experience (C or Pascal), and familiarity with Macintosh and/or UNIX.
Strong English-language communication and writing skills are essential. *** This position would be particularly appropriate for a graduating college senior who seeks "hands-on" research experience prior to graduate school in the cognitive, neural, or computational sciences *** Applications are being accepted now for an immediate start-date or for starting in June or September of this year. NOTE TO N. CALIF. APPLICANTS: Interviews for applicants from the San Francisco/Silicon Valley area will be conducted at Stanford in late March. The Neuroscience Center is located 20 minutes outside of New York City in northern New Jersey. For further information, please send an email or hard-copy letter describing your relevant background, experience, and career goals to: ______________________________________________________________________ Dr. Mark A. Gluck Center for Molecular & Behavioral Neuroscience Rutgers University 197 University Ave. Newark, New Jersey 07102 Phone: (201) 648-1080 (Ext. 3221) Fax: (201) 648-1272 Email: gluck at pavlov.rutgers.edu  From peleg at cs.huji.ac.il Tue Feb 23 15:38:02 1993 From: peleg at cs.huji.ac.il (Shmuel Peleg) Date: Tue, 23 Feb 93 22:38:02 +0200 Subject: CFP: 12-ICPR, Int Conf Pattern Recognition, Jerusalem, 1994 Message-ID: <9302232038.AA28915@humus.cs.huji.ac.il> =============================================================================== CALL FOR PAPERS - 12th ICPR International Conferences on Pattern Recognition Oct 9-13, 1994, Jerusalem, Israel The 12th ICPR of the International Association for Pattern Recognition will be organized as a set of four conferences, each dealing with a special topic. The program for each individual conference will be organized by its own Program Committee. Papers describing applications are encouraged, and will be reviewed by a special Applications Committee. An award will be given for the best industry-related paper presented at the conference. Considerations for this award will include innovative applications, robust performance, and contributions to industrial progress. An exhibition will also be held. The conference proceedings are published by the IEEE Computer Society Press. GENERAL CO-CHAIRS: S. Ullman - Weizmann Inst. (shimon at wisdom.weizmann.ac.il) S. Peleg - The Hebrew University (peleg at cs.huji.ac.il) LOCAL ARRANGEMENTS: Y. Yeshurun - Tel-Aviv University (hezy at math.tau.ac.il) INDUSTRIAL & APPLICATIONS LIAISON: M. Ejiri - Hitachi (ejiri at crl.hitachi.co.jp) CONFERENCE DESCRIPTIONS 1. COMPUTER VISION AND IMAGE PROCESSING, T. Huang - University of Illinois Early vision and segmentation; image representation; shape and texture analysis; motion and stereo; range imaging and remote sensing; color; 3D representation and recognition. 2. PATTERN RECOGNITION AND NEURAL NETWORKS, N. Tishby - The Hebrew University Statistical, syntactic, and hybrid pattern recognition techniques; neural networks for associative memory, classification, and temporal processing; biologically oriented neural network models; biomedical applications. 3. SIGNAL PROCESSING, D. Malah - Technion, Israel Institute of Technology Analysis, representation, coding, and recognition of signals; signal and image enhancement and restoration; scale-space and joint time-frequency analysis and representation; speech coding and recognition; image and video coding; auditory scene analysis. 4. PARALLEL COMPUTING, S.
Tanimoto - University of Washington Parallel architectures and algorithms for pattern recognition, vision, and signal processing; special languages, programming tools, and applications of multiprocessor and distributed methods; design of chips, real-time hardware, and neural networks; recognition using multiple sensory modalities. PAPER SUBMISSION DEADLINE: February 1, 1994. Notification of Acceptance: May 1994. Camera-Ready Copy: June 1994. Send four copies of paper to: 12th ICPR, c/o International, 10 Rothschild blvd, 65121 Tel Aviv, ISRAEL. Tel. +972(3)510-2538, Fax +972(3)660-604 Each manuscript should include the following: 1. A Summary Page addressing these topics: - To which of the four conferences is the paper submitted? - What is the paper about? - What is the original contribution of this work? - Does the paper mainly describe an application, and should be reviewed by the applications committee? 2. The paper, limited in length to 4000 words. This is the estimated length of the proceedings version. For further information contact the secretariat at the above address, or use E-mail: icpr at math.tau.ac.il . ===============================================================================  From prechelt at ira.uka.de Tue Feb 23 08:55:11 1993 From: prechelt at ira.uka.de (prechelt@ira.uka.de) Date: Tue, 23 Feb 93 14:55:11 +0100 Subject: Squashing functions In-Reply-To: Your message of Fri, 19 Feb 93 16:47:16 +0000. <9302191647.AA05729@cato.robots.ox.ac.uk> Message-ID: > Any interesting squashing function can be stored in a table of negligible size > (eg 256) with very high accuracy if linear (or higher) interpolation is used. 256 points are not always negligible: On a fine-grain massively parallel machine such as the MasPar MP-1, the 256*4 bytes needed to store it can consume a considerable amount of the available memory. Our MP-1216A has 16384 processors with only 16 kB memory each. Another point: On this machine, I am not sure whether interpolating from such a table would really be faster than, say, a third order Taylor approximation of the sigmoid. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-7500 Karlsruhe 1; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple.  From henrik at robots.ox.ac.uk Tue Feb 23 13:56:11 1993 From: henrik at robots.ox.ac.uk (henrik@robots.ox.ac.uk) Date: Tue, 23 Feb 93 18:56:11 GMT Subject: Squashing functions (continued) Message-ID: <9302231856.AA22594@cato.robots.ox.ac.uk> The saturation problem ('the activation function gets constant for large |x|') can usually be solved by putting the derivative of the act. function into a table as well. You can then cheat a bit by not setting it to zero at large |x|. Concerning memory requirements (eg, MasPar MP1): I don't see why I need 4 bytes per table entry. According to the paper by Fahlman & Hoehfeld on limited precision, the quantization can be done with very few bits (less than 8 if tricks are used). With interpolation you can get a pretty decent 16-bit act. value out of an 8-bit wide table. Apart from that, it seems to be quite complicated to put a nn on 16K processors ... how do you do that ?
Cheers, henrik at robots.ox.ac.uk  From xueh at microsoft.com Wed Feb 24 01:19:47 1993 From: xueh at microsoft.com (Xuedong Huang) Date: Tue, 23 Feb 93 22:19:47 PST Subject: Microsoft Speech Research Message-ID: <9302240620.AA07680@netmail.microsoft.com> As you may know, I've started a new speech group here at Microsoft. For your information, I have enclosed the full advertisement we have been using to publicize the openings. If you are interested in joining MS, I strongly encourage you to apply and we will look forward to following up with you. ------------------------------------------------------------ THE FUTURE IS HERE. Speech Recognition. Intuitive Graphical Interfaces. Sophisticated User Agents. Advanced Operating Systems. Robust Environments. World Class Applications. Who's Pulling It All Together? Microsoft. We're setting the stage for the future of computing, building a world class research group and leveraging a solid foundation of object based technology and scalable operating systems. What's more, we're extending the recognition paradigm, employing advanced processor and RISC-based architecture, and harnessing distributed networks to connect users to worlds of information. We want to see more than just our own software running. We want to see a whole generation of users realize the future of computing. Realize your future with a position in our Speech Recognition group. Research Software Design Engineers, Speech Recognition. Primary responsibilities include designing and developing User Interface and systems level software for an advanced speech recognition system. A minimum of 3 years demonstrated microcomputer software design and development experience in C is required. Knowledge of Windows programming, speech recognition systems, hidden Markov model theory, statistics, DSP, or user interface development is preferred. A BA/BS in computer science or related discipline is required. An advanced degree (MS or Ph.D.) in a related discipline is preferred. Researchers, Speech Recognition. Primary responsibilities include research on stochastic modeling techniques to be applied to an advanced speech recognition system. A minimum of 4 years demonstrated research excellence in the area of speech recognition or spoken language understanding systems is required. Knowledge of Windows and real-time C programming for microcomputers, hidden Markov model theory, decoder systems design, DSP, and spoken language understanding is preferred. A MA/MS in CS or related discipline is required. A PhD degree in CS, EE, or related discipline is preferred. Make The Most of Your Future. At Microsoft, our technical leadership and strong Software Developers and Researchers stay ahead of the times, creating vision and turning it into reality. To apply, send your resume and cover letter, noting "ATTN: N5935-0223" to: Surface: Microsoft Recruiting ATTN: N5935-0223 One Microsoft Way Redmond, WA 98052-6399 Email: ASCII ONLY y-wait at microsoft.com.us Microsoft is an equal opportunity employer working to increase workforce diversity.  
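The squashing-function exchange above (a small interpolated lookup table versus Elliott's rational activation s(x) = x/(1 + |x|)) is easy to make concrete. The following C sketch is illustrative only: the table size (256), the tabulated range [-8, 8] and the choice of tanh as the tabulated function are assumptions made for the example, not values taken from any of the posts.

/*
 * Two cheap squashing functions from the thread above:
 *  - Elliott's rational activation s(x) = x / (1 + |x|), whose derivative
 *    can be written through the output as (1 - |s|)^2, so no exponential
 *    is needed;
 *  - a small tanh lookup table with linear interpolation, which is exact
 *    enough inside its range but saturates to a constant outside it.
 * Table size and range are illustrative assumptions.
 */
#include <stdio.h>
#include <math.h>

#define TABLE_SIZE 256
#define TABLE_MIN  (-8.0)
#define TABLE_MAX  ( 8.0)

static double tanh_table[TABLE_SIZE];

/* Elliott activation: output lies in (-1, 1), no exponential required. */
double elliott(double x)
{
    return x / (1.0 + fabs(x));
}

/* Its derivative, expressed through the output s as (1 - |s|)^2. */
double elliott_deriv(double s)
{
    double t = 1.0 - fabs(s);
    return t * t;
}

/* Fill the table with exact tanh values over [TABLE_MIN, TABLE_MAX]. */
void build_table(void)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++) {
        double x = TABLE_MIN + (TABLE_MAX - TABLE_MIN) * i / (TABLE_SIZE - 1);
        tanh_table[i] = tanh(x);
    }
}

/* Lookup with linear interpolation; clamps (saturates) outside the range. */
double tanh_lookup(double x)
{
    double pos, frac;
    int i;
    if (x <= TABLE_MIN) return tanh_table[0];
    if (x >= TABLE_MAX) return tanh_table[TABLE_SIZE - 1];
    pos  = (x - TABLE_MIN) / (TABLE_MAX - TABLE_MIN) * (TABLE_SIZE - 1);
    i    = (int)pos;
    frac = pos - i;
    return tanh_table[i] + frac * (tanh_table[i + 1] - tanh_table[i]);
}

int main(void)
{
    double x;
    build_table();
    for (x = -2.0; x <= 2.0; x += 1.0)
        printf("x=%5.2f  elliott=%8.5f  d/dx=%8.5f  tanh~=%8.5f  tanh=%8.5f\n",
               x, elliott(x), elliott_deriv(elliott(x)),
               tanh_lookup(x), tanh(x));
    return 0;
}

As Elliott's reply in the thread points out, the table version necessarily becomes constant outside the tabulated range, which is the behaviour that can hurt gradient-based training when the inputs are not known to be bounded; the rational activation avoids that at the cost of a division per unit.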
From john at cs.uow.edu.au Fri Feb 26 13:56:21 1993 From: john at cs.uow.edu.au (John Fulcher) Date: Fri, 26 Feb 93 13:56:21 EST Subject: submission Message-ID: <199302260256.AA25570@wraith.cs.uow.edu.au> COMPUTER STANDARDS & INTERFACES (North-Holland) Forthcoming Special Issue on ANN Standards ADDENDUM TO ORIGINAL POSTING Prompted by enquiries from several people regarding my original Call for Papers posting, I felt I should offer the following additional information (clarification). By ANN "Standards" we do not mean exclusively formal standards (in the ISO, IEEE, ANSI, CCITT etc. sense), although naturally enough we will be including papers on activities in these areas. "Standards" should be interpreted in its most general sense, namely as standard APPROACHES (e.g. the backpropagation algorithm & its many variants). Thus if you have a paper on some (any?) aspect of ANNs, provided it is prefaced by a summary of the standard approach(es) in that particular area, it could well be suitable for inclusion in this special issue of CS&I. If in doubt, post fax or email a copy by April 30th to: John Fulcher, Department of Computer Science, University of Wollongong, Northfields Avenue, Wollongong NSW 2522, Australia. fax: +61 42 213262 email: john at cs.uow.edu.au.oz  From terry at helmholtz.sdsc.edu Thu Feb 25 14:57:05 1993 From: terry at helmholtz.sdsc.edu (Terry Sejnowski) Date: Thu, 25 Feb 93 11:57:05 PST Subject: Neural Computation 5:2 Message-ID: <9302251957.AA14806@helmholtz.sdsc.edu> NEURAL COMPUTATION - Volume 5 - Issue 2 - March 1993 Review Neural Networks and Non-Linear Adaptive Filtering: Unifying Concepts and New Algorithms O. Nerrand, P. Roussel-Ragot, L. Personnaz, G. Dreyfus and S. Marcos Notes Fast Calculation of Synaptic Conductances Rajagopal Srinivasan and Hillel J. Chiel The Variance of Covariance Rules for Associative Matrix Memories and Reinforcement Learning Peter Dayan and Terrence J. Sejnowski Optimal Network Construction by Minimum Description Length Gary D. Kendall and Trevor J. Hall Letters A Neural Network Model of Inhibitory Information Processing in Aplysia Diana E.J. Blazis, Thomas M. Fischer and Thomas J. Carew Computational Diversity in a Formal Model of the Insect Olfactory Macroglomerulus C. Linster, C. Masson, M. Kerszberg, L. Personnaz and G. Dreyfus Learning Competition and Cooperation Sungzoon Cho and James A. Reggia Constraints on Synchronizing Oscillator Networks David E. Cairns, Roland J. Baddeley and Leslie S. Smith Learning Mixture Models of Spatial Coherence Suzanna Becker and Geoffrey E. Hinton Hints and the VC Dimension Yaser S. Abu-Mostafa Redundancy Reduction as a Strategy for Unsupervised Learning A. Norman Redlich Approximation and Radial-Basis-Function Networks Jooyoung Park and Irwin W. Sandberg A Polynomial Time Algorithm for Generating Neural Networks for Pattern Classification - its Stability Properties and Some Test Results Somnath Mukhopadhyay, Asim Roy, Lark Sang Kim and Sandeep Govil Neural Networks for Optimization Problems with Inequality Constraints - The Knapsack Problem Mattias Ohlsson, Carsten Peterson and Bo Soderberg ----- SUBSCRIPTIONS - VOLUME 5 - BIMONTHLY (6 issues) ______ $40 Student ______ $65 Individual ______ $156 Institution Add $22 for postage and handling outside USA (+7% GST for Canada). (Back issues from Volumes 1-4 are regularly available for $28 each to institutions and $14 each for individuals Add $5 for postage per issue outside USA (+7% GST for Canada) MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142. 
Tel: (617) 253-2889 FAX: (617) 258-6779 e-mail: hiscox at mitvma.mit.edu -----  From mark at dcs.kcl.ac.uk Fri Feb 26 08:25:01 1993 From: mark at dcs.kcl.ac.uk (Mark Plumbley) Date: Fri, 26 Feb 93 13:25:01 GMT Subject: King's College London Neural Networks MSc and PhD courses Message-ID: <17179.9302261325@xenon.dcs.kcl.ac.uk> Fellow Neural Networkers, Please post or forward this announcement about our M.Sc. and Ph.D. courses in Neural Networks to anyone who might be interested. Thanks, Mark Plumbley ------------------------------------------------------------------------- Dr. Mark D. Plumbley Tel: +44 71 873 2241 Centre for Neural Networks Fax: +44 71 873 2017 Department of Mathematics/King's College London/Strand/London WC2R 2LS/UK ------------------------------------------------------------------------- CENTRE FOR NEURAL NETWORKS and DEPARTMENT OF MATHEMATICS King's College London Strand London WC2R 2LS, UK M.Sc. AND Ph.D. COURSES IN NEURAL NETWORKS --------------------------------------------------------------------- M.Sc. in INFORMATION PROCESSING and NEURAL NETWORKS --------------------------------------------------- A ONE YEAR COURSE CONTENTS Dynamical Systems Theory Fourier Analysis Biosystems Theory Advanced Neural Networks Control Theory Combinatorial Models of Computing Digital Learning Digital Signal Processing Theory of Information Processing Communications Neurobiology REQUIREMENTS First Degree in Physics, Mathematics, Computing or Engineering NOTE: For 1993/94 we have 3 SERC quota awards for this course. --------------------------------------------------------------------- Ph.D. in NEURAL COMPUTING ------------------------- A 3-year Ph.D. programme in NEURAL COMPUTING is offered to applicants with a First degree in Mathematics, Computing, Physics or Engineering (others will also be considered). The first year consists of courses given under the M.Sc. in Information Processing and Neural Networks (see attached notice). Second and third year research will be supervised in one of the various programmes in the development and application of temporal, non-linear and stochastic features of neurons in visual, auditory and speech processing. There is also work in higher level category and concept formation and episodic memory storage. Analysis and simulation are used, both on PC's SUNs and main frame machines, and there is a programme on the development and use of adaptive hardware chips in VLSI for pattern and speed processing. This work is part of the activities of the Centre for Neural Networks in the School of Physical Sciences and Engineering, which has over 40 researchers in Neural Networks. It is one of the main centres of the subject in the U.K. --------------------------------------------------------------------- For further information on either of these courses please contact: Postgraduate Secretary Department of Mathematics King's College London Strand London WC2R 2LS, UK MATHS at OAK.CC.KCL.AC.UK  From ro2m at crab.psy.cmu.edu Mon Feb 1 11:34:06 1993 From: ro2m at crab.psy.cmu.edu (Randall C. O'Reilly) Date: Mon, 1 Feb 93 11:34:06 EST Subject: 2 pdp.cns TR's available Message-ID: <9302011634.AA06379@crab.psy.cmu.edu.noname> The following two (related) TR's are now available for electronic ftp or by hardcopy. Instructions follow the abstracts. >>> NOTE THAT THE FTP SITE IS OUR OWN, NOT NEUROPROSE <<< Object Recognition and Sensitive Periods: A Computational Analysis of Visual Imprinting Randall C. O'Reilly Mark H. Johnson Technical Report PDP.CNS.93.1 (Submitted to Neural Computation) Abstract: Evidence from a variety of methods suggests that a localized portion of the domestic chick brain, the Intermediate and Medial Hyperstriatum Ventrale (IMHV), is critical for filial imprinting. Data further suggest that IMHV is performing the object recognition component of imprinting, as chicks with IMHV lesions are impaired on other tasks requiring object recognition. We present a neural network model of translation invariant object recognition developed from computational and neurobiological considerations that incorporates some features of the known local circuitry of IMHV. In particular, we propose that the recurrent excitatory and lateral inhibitory circuitry in the model, and observed in IMHV, produces hysteresis on the activation state of the units in the model and the principal excitatory neurons in IMHV. Hysteresis, when combined with a simple Hebbian covariance learning mechanism, has been shown in earlier work to produce translation invariant visual representations. To test the idea that IMHV might be implementing this type of object recognition algorithm, we have used a simple neural network model to simulate a variety of different empirical phenomena associated with the imprinting process. These phenomena include reversibility, sensitive periods, generalization, and temporal contiguity effects observed in behavioral studies of chicks. In addition to supporting the notion that these phenomena, and imprinting itself, result from the IMHV properties captured in the simplified model, the simulations also generate several predictions and clarify apparent contradictions in the behavioral data. ----------------------------------------------------------------------- The Self-Organization of Spatially Invariant Representations Randall C. O'Reilly James L. McClelland Technical Report PDP.CNS.92.5 Abstract: The problem of computing object-based visual representations can be construed as the development of invariancies to visual dimensions irrelevant for object identity. This view, when implemented in a neural network, suggests a different set of algorithms for computing object-based visual representations than the ``traditional'' approach pioneered by Marr, 1981. A biologically plausible self-organizing neural network model that develops spatially invariant representations is presented.
There are four features of the self-organizing algorithm that contribute to the development of spatially invariant representations: temporal continuity of environmental stimuli, hysteresis of the activation state (via recurrent activation loops and lateral inhibition in an interactive network), Hebbian learning, and a split pathway between ``what'' and ``where'' representations. These constraints are tested with a backprop network, which allows for the evaluation of the individual contributions of each constraint on the development of spatially invariant representations. Subsequently, a complete model embodying a modified Hebbian learning rule and interactive connectivity is developed from biological and computational considerations. The activational stability and weight function maximization properties of this interactive network are analyzed using a Lyapunov function approach. The model is tested first on the same simple stimuli used in the backprop simulation, and then with a more complex environment consisting of right and left diagonal lines. The results indicate that the hypothesized constraints, implemented in a Hebbian network, were capable of producing spatially invariant representations. Further, evidence for the gradual integration of both featural complexity and spatial invariance over increasing layers in the network, thought to be important for real-world applications, was obtained. As the approach is generalizable to other dimensions such as orientation and size, it could provide the basis of a more complete biologically plausible object recognition system. Indeed, this work forms the basis of a recent model of object recognition in the domestic chick (O'Reilly & Johnson, 1993, TR PDP.CNS.93.1). ----------------------------------------------------------------------- Retrieval information for pdp.cns TRs: unix> ftp 128.2.248.152 # hydra.psy.cmu.edu Name: anonymous Password: ftp> cd pub/pdp.cns ftp> binary ftp> get pdp.cns.93.1.ps.Z # or, and ftp> get pdp.cns.92.5.ps.Z ftp> quit unix> zcat pdp.cns.93.1.ps.Z | lpr # or however you print postscript unix> zcat pdp.cns.92.5.ps.Z | lpr For those who do not have FTP access, physical copies can be requested from Barbara Dorney .  From tresp at inf21.zfe.siemens.de Tue Feb 2 12:29:10 1993 From: tresp at inf21.zfe.siemens.de (Volker Tresp) Date: Tue, 2 Feb 1993 18:29:10 +0100 Subject: paper in neuroprose Message-ID: <199302021729.AA24088@inf21.zfe.siemens.de> The following paper has been placed in the neuroprose archive as tresp.rules.ps.Z Instructions for retrieving and printing follow the abstract. ----------------------------------------------------------------- NETWORK STRUCTURING AND TRAINING USING RULE-BASED KNOWLEDGE ----------------------------------------------------------------- Volker Tresp, Siemens, Central Research Juergen Hollatz, TU Muenchen Subutai Ahmad, Siemens, Central Research Abstract We demonstrate in this paper how certain forms of rule-based knowledge can be used to prestructure a neural network of normalized basis functions and give a probabilistic interpretation of the network architecture. We describe several ways to assure that rule-based knowledge is preserved during training and present a method for complexity reduction that tries to minimize the number of rules and the number of conjuncts. After training, the refined rules are extracted and analyzed. To appear in: S. J. Hanson, J. D. Cowan, and C. L. Giles (Eds.), Advances in Neural Information Processing Systems 5. San Mateo CA: Morgan Kaufmann. 
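One way to picture the prestructuring described in the abstract above is as a network of normalized basis functions in which each initial basis function is derived from one rule. The C sketch below is a generic normalized Gaussian basis function forward pass under that reading; the Rule fields (centre, width, conclusion), the Gaussian form and the toy rules in main() are assumptions made for illustration and are not the authors' exact formulation.

/*
 * Minimal sketch of a network of normalized Gaussian basis functions, with
 * each basis function initialized from a "rule".  All concrete values here
 * are illustrative assumptions.
 */
#include <stdio.h>
#include <math.h>

#define N_INPUTS 2
#define N_RULES  3

typedef struct {
    double centre[N_INPUTS];  /* premise: "x is approximately centre" */
    double width;             /* how sharply the premise is satisfied */
    double conclusion;        /* output value proposed by the rule    */
} Rule;

/* Unnormalized Gaussian activation of one rule for input x. */
static double basis(const Rule *r, const double *x)
{
    double d2 = 0.0;
    int i;
    for (i = 0; i < N_INPUTS; i++) {
        double d = x[i] - r->centre[i];
        d2 += d * d;
    }
    return exp(-d2 / (2.0 * r->width * r->width));
}

/* Network output: rule conclusions weighted by normalized activations,
 * so the activations behave like posterior probabilities of the rules. */
double rule_network(const Rule *rules, int n, const double *x)
{
    double num = 0.0, den = 1e-12;   /* small constant avoids division by zero */
    int i;
    for (i = 0; i < n; i++) {
        double b = basis(&rules[i], x);
        num += b * rules[i].conclusion;
        den += b;
    }
    return num / den;
}

int main(void)
{
    /* Three toy rules on a 2-d input, purely for illustration. */
    Rule rules[N_RULES] = {
        { {0.0, 0.0}, 0.5, -1.0 },
        { {1.0, 0.0}, 0.5,  0.0 },
        { {1.0, 1.0}, 0.5,  1.0 }
    };
    double x[N_INPUTS] = {0.8, 0.2};
    printf("output at (%.1f, %.1f) = %f\n", x[0], x[1],
           rule_network(rules, N_RULES, x));
    return 0;
}

In such a prestructured network the centres, widths and conclusions are ordinary parameters, so gradient training can refine them, and the refined basis functions can afterwards be read back as rules, which is the extraction step the abstract mentions.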
---- Volker Tresp Siemens AG, Central Research, Phone: +49 89 636-49408 Otto-Hahn-Ring 6, FAX: +49 89 636-3320 W-8000 Munich 83, Germany E-mail: tresp at zfe.siemens.de unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get tresp.rules.ps.Z ftp> quit unix> uncompress tresp.rules.ps.Z unix> lpr -s tresp.rules.ps (or however you print postscript)  From denis at psy.ox.ac.uk Wed Feb 3 10:47:36 1993 From: denis at psy.ox.ac.uk (Denis Mareschal) Date: Wed, 3 Feb 93 15:47:36 GMT Subject: visual tracking Message-ID: <9302031547.AA09779@dragon.psych.pdp> Hi, A couple of months ago I sent around a request for further information concerning higher level connectionist approaches to the development of visual tracking. I received a number of replies spanning the broad range of fields in which neural network research is being conducted. I also received a significant number of requests for the resulting compiled list of references. I am thus posting a list of references resulting directly and indirectly from my original request. I have also included a few relevant psychology review papers. Thanks to all those who replied. Clearly this list is not exhaustive and if anyone reading it notices an ommission which may be of interest I would greatly appreciate hearing from them. Cheers, Denis Mareschal Department of Experimental Psychology South Parks Road Oxford University Oxford OX1 3UD maresch at black.ox.ac.uk REFERENCES: Allen, R. B. (1988), Sequential connectionist networks for answering simple questions about a microworld. In: Proceedings of the Tenth Annual Conference of the Cognitive Science Society, pp. 489-495, Hillsdale, NJ: Erlbaum. Baloch, A. A. & Waxman A. M. (1991). Visual learning, adaptive expectations and behavioral conditioning of the mobile robot MAVIN, Neural Networks, vol. 4, pp. 271-302. Buck, D. S. & Nelson D. E. (1992). Applying the abductory induction mechanism (AIM) to the extrapolation of chaotic time series. In: Proceedings of the National Aerospace Electronics Conference (NAECON), 18-22 May, Dayton, Ohio, vol. 3, pp 910-915. Bremner, J. G. (1985). Object tracking and search in infancy: A review of data and a theoretical evaluation, Developmental Review, 5, pp. 371-396 Carpenter, G. A. & Grossberg, S. (1992). Neural Networks for Vision and Image Processing, Cambridge, MA: MIT Press. Cleermans, A., Servan-Schreiber, D. & McClelland, J. L. (1989). Finite state automata and simple recurrent networks, Neural Computation,1, pp 372- 381. Deno, D. C., Keller, E. L. & Crandall, W. F. (1989). Dynamical neural network organization of the visual pursuit system, IEEE Transactions on Biomedical Engineering, vol. 36, pp. 85-91. Dobnikar, A., Likar, A. & Podbregar, D. (1989). Optimal visual tracking with artificial neural network. In: First I.E.E. International Conference on Artificial Neural Networks (conf. Publ. 313), pp 275-279. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, pp. 179-211. Ensley, D. & Nelson, D. E. (1992). Applying Cascade-correlation to the extrapolation of chaotic time series. Proceedings of the Third Workshop on Neural Networks: Academic/Industrial/NASA/Defense; 10-12 February, Auburn, Alabama. Fay, D. A. & Waxman, A. M. (1992). Neurodynamics of real-time image velocity extraction. In: G. A. Carpenter & S. Grossberg (Eds), Neural Networks for Vision and Image Processing, pp 221-246, Cambridge, MA: MIT Press. Gordon, Steele, & Rossmiller (1991). 
Predicting trajectories using recurrent neural networks. In: Dagli, Kumara, & Shin (Eds), Intelligent Systems Through Artificial Neural Networks, ASME Press. (Sorry that's the best I can do for this reference) Grossberg, S. & Rudd (1989). A neural architecture for visual motion perception: Neural Networks, 2, pp. 421-450. Koch, C. & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4, pp. 219-227. Lisberger, S. G., Morris, E. J. & Tychsen, L. (1987). Visual motion processing and sensory-motor integration for smooth pursuit eye movements, Annual Review of Neuroscience, 10, pp. 97-129. Lumer, E. D. (1992). The phase tracker of attention. In: Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pp 962-967, Hillsdale, NJ: Erlbaum. Neilson, P. D., Neilson, M. D. & O'Dwyer, N. J. (1993, in press). What limits high speed tracking performance?, Human Movement Science, 12. Nelson, D. E., Ensley, D. D. & Rogers, S. K. (1992). Prediction of chaotic time series using Cascade Correlation: Effects of number of inputs and training set size. In: The Society for Optical Engineering (SPIE), Proceedings of the Applications of Artificial Neural Networks III Conference, 21-24 April, Orlando, Florida, vol. 1709, pp 823-829. Marshall, J. A. (1990). Self-organizing neural networks for perception of visual motion, Neural Networks, 3, pp. 45-74. Martin, W. N. & Aggarwal, J. K. (Eds) (1988). Motion Understanding: Robot and Human Vision. Boston: Kluwer Academic Publishers. Metzgen, Y. & Lehmann D. (1990). Learning temporal sequences by local synaptic changes, Network, 1, pp. 271-302. Nakayama, K. (1985). Biological image motion processing: A review. Vision Research 25, pp 625-660. Parisi, D., Cecconi, F. & Nolfi, S. (1990). Econets: Neural networks that learn in an environment, Network, 1, pp. 149-168. Pearlmutter, B. A. (1989). Learning state space trajectories in recurrent networks, Neural Computation, 1, pp. 263-269. Regier, T. (1992). The acquisition of lexical semantics for spatial terms: A connectionist model of perceptual categorization. International Computer Science Institute (ICSI) Technical Report TR-92-062, Berkeley. Schmidhuber, J. & Huber, R. (1991). Using adaptive sequential neurocontrol for efficient learning of translation and rotation invariance. In: T. Kohonen, K. Makisara, O. Simula & J. Kangas (Eds), Artificial Neural Networks, pp 315-320, North Holland: Elsevier Science. Schmidhuber, J. & Huber, R. (1991). Learning to generate artificial foveal trajectories for target detection. International Journal of Neural Systems, 2, pp. 135-141. Schmidhuber, J. & Wahnsiedler, R. (1992). Planning simple trajectories using neural subgoal generators. Second International Conference on Simulations of Adaptive Behavior (SAB92). (Available by ftp from Jordan Pollack's Neuroprose Archive). Sereno, M. E. (1986). Neural network model for the measurement of visual motion. Journal of the Optical Society of America A, 3, pp 72. Sereno, M. E. (1987). Implementing stages of motion analysis in neural. Program of the Ninth Annual Conference of the Cognitive Science Society, pp. 405-416, Hillsdale, NJ: Erlbaum. Servan-Schreiber, D., Cleermans, A. & McClelland, J. L. (1991). Graded state machines: The representation of temporal contingencies in simple recurrent networks, 7, pp. 161-193. Shimohara, K., Uchiyama T. & Tokunaya Y. (1988). Back propagation networks for event-driven temporal sequence processing.
In: IEEE International Conference on Neural Networks (San Diego), vol. 1, pp. 665-672, NY: IEEE. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences, Machine Learning, 3, pp 9-44. Tolg, S. (1991). A biologically motivated system to track moving objects by active camera control. In: T. Kohonen, K. Makisara, O. Simula & J. Kangas (Eds), Artificial Neural Networks, pp 1237-1240, North Holland: Elsevier Science. Wechsler, H. (Ed) (1991). Neural Networks for Human and Machine Perception, New York: Academic Press.  From gluck at pavlov.rutgers.edu Wed Feb 3 09:13:20 1993 From: gluck at pavlov.rutgers.edu (Mark Gluck) Date: Wed, 3 Feb 93 09:13:20 EST Subject: Preprint: Computational Models of the Neural Bases of Learning and Memory Message-ID: <9302031413.AA24540@james.rutgers.edu> For (hard copy) preprints of the following article: Gluck, M. A. & Granger, R. C. (1993). Computational models of the neural bases of learning and memory. Annual Review of Neuroscience. 16: 667-706 ABSTRACT: Advances in computational analyses of parallel processing have made computer simulation of learning systems an increasingly useful tool in understanding complex aggregate functional effects of changes in neural systems. In this article, we review current efforts to develop computational models of the neural bases of learning and memory, with a focus on the behavioral implications of network-level characterizations of synaptic change in three anatomical regions: olfactory (piriform) cortex, cerebellum, and the hippocampal formation. ____________________________________ Send US-mail address to: Mark Gluck (Center for Neuroscience, Rutgers-Newark) gluck at pavlov.rutgers.edu  From robtag at udsab.dia.unisa.it Wed Feb 3 13:22:31 1993 From: robtag at udsab.dia.unisa.it (Tagliaferri Roberto) Date: Wed, 3 Feb 1993 19:22:31 +0100 Subject: course on Hybrid Systems Message-ID: <199302031822.AA08460@udsab.dia.unisa.it> **************** IIASS 1993 February Courses ************** **************** Last Announcement ************** A short course on "Hybrid Systems: Neural Nets, Fuzzy Sets and A.I. Systems" February 9 - 12 Lecturers: Dr. Silvano Colombano, NASA Research Center, CA Prof. Piero Morasso, Univ. Genova, Italia ----------------------------------------------------------------- Dr. Silvano Colombano (4 hours) Introduction: extending the representational power of connectionism The interim approach: hybrid symbolic connectionist systems - Distributed - Localist - Mixed localist and distributed (3 hours) Hybrid Fuzzy Logic connectionist systems - Classification - Control - Reasoning (2 hours) A competing approach: classifier systems Future directions Prof. Piero Morasso (2 hours) Self-organizing Systems and Hybrid Systems Course schedule February 9 3 pm - 6 pm Dr. S. Colombano February 10 3 pm - 6 pm Dr. S. Colombano February 11 3 pm - 6 pm Dr. S. Colombano February 12 3 pm - 5 pm Prof. P. Morasso The course will be held at IIASS, via G. Pellegrino, Vietri s/m (Sa) Italia. Participants will pay their own fare and travel expenses. No fees to be paid. The short course is sponsored by Progetto Finalizzato CNR "Sistemi Informatici e Calcolo Parallelo" and by Contratto quinquennale CNR-IIASS For any information about the short course, please contact the IIASS secretariat I.I.A.S.S Via G.Pellegrino, 19 I-84019 Vietri Sul Mare (SA) ITALY Tel. +39 89 761167 Fax +39 89 761189 or Dr.
Roberto Tagliaferri E-Mail robtag at udsab.dia.unisa.it  From uli at ira.uka.de Thu Feb 4 12:06:41 1993 From: uli at ira.uka.de (Uli Bodenhausen) Date: Thu, 04 Feb 93 18:06:41 +0100 Subject: new papers in the neuroprose archive Message-ID: The following papers have been placed in the neuroprose archive as bodenhausen.application_oriented.ps.Z bodenhausen.architectural_learning.ps.Z Instructions for retrieving and printing follow the abstracts. 1.) CONNECTIONIST ARCHITECTURAL LEARNING FOR HIGH PERFORMANCE CHARACTER AND SPEECH RECOGNITION Ulrich Bodenhausen and Stefan Manke University of Karlsruhe and Carnegie Mellon University Highly structured neural networks like the Time-Delay Neural Network (TDNN) can achieve very high recognition accuracies in real world applications like handwritten character and speech recognition systems. Achieving the best possible performance greatly depends on the optimization of all structural parameters for the given task and amount of training data. We propose an Automatic Structure Optimization (ASO) algorithm that avoids time-consuming manual optimization and apply it to Multi State Time-Delay Neural Networks, a recent extension of the TDNN. We show that the ASO algorithm can construct efficient architectures in a single training run that achieve very high recognition accuracies for two handwritten character recognition tasks and one speech recognition task. (only 4 pages!) To appear in the proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 93, Minneapolis -------------------------------------------------------------------------- 2.) Application Oriented Automatic Structuring of Time-Delay Neural Networks for High Performance Character and Speech Recognition Ulrich Bodenhausen and Alex Waibel University of Karlsruhe and Carnegie Mellon University Highly structured artificial neural networks have been shown to be superior to fully connected networks for real-world applications like speech recognition and handwritten character recognition. These structured networks can be optimized in many ways, and have to be optimized for optimal performance. This makes the manual optimization very time consuming. A highly structured approach is the Multi State Time Delay Neural Network (MSTDNN) which uses shifted input windows and allows the recognition of sequences of ordered events that have to be observed jointly. In this paper we propose an Automatic Structure Optimization (ASO) algorithm and apply it to MSTDNN type networks. The ASO algorithm optimizes all relevant parameters of MSTDNNs automatically and was successfully tested with three different tasks and varying amounts of training data. (6 pages, more detailed than the first paper) To appear in the ICNN 93 proceedings, San Francisco. -------------------------------------------------------------------------- unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get bodenhausen.application_oriented.ps.Z ftp> get bodenhausen.architectural_learning.ps.Z ftp> quit unix> uncompress bodenhausen.application_oriented.ps.Z unix> uncompress bodenhausen.architectural_learning.ps.Z unix> lpr -s bodenhausen.application_oriented.ps (or however you print postscript) unix> lpr -s bodenhausen.architectural_learning.ps Thanks to Jordan Pollack for providing this service!
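The "shifted input windows" that both abstracts above refer to are the defining feature of time-delay architectures: one set of weights is applied to a window of consecutive input frames, and the window is slid over the whole sequence (a 1-d convolution over time). The C sketch below shows a single such layer; the frame size, window width, unit count and test data are illustrative assumptions, and the ASO algorithm that tunes such parameters automatically is not reproduced here.

/*
 * One time-delay layer: the same weight kernel is applied at every
 * position of the window as it is shifted over the input frames, followed
 * by a squashing function.  All sizes are illustrative assumptions.
 */
#include <stdio.h>
#include <math.h>

#define N_FRAMES   10   /* length of the input sequence             */
#define FRAME_DIM   3   /* coefficients per input frame             */
#define WINDOW      3   /* frames seen by one unit (the time delay) */
#define N_UNITS     2   /* feature detectors in this layer          */
#define N_OUT  (N_FRAMES - WINDOW + 1)

static double squash(double x) { return tanh(x); }

void tdnn_layer(const double input[N_FRAMES][FRAME_DIM],
                const double w[N_UNITS][WINDOW][FRAME_DIM],
                const double bias[N_UNITS],
                double output[N_OUT][N_UNITS])
{
    int t, u, d, k;
    for (t = 0; t < N_OUT; t++)
        for (u = 0; u < N_UNITS; u++) {
            double sum = bias[u];
            for (d = 0; d < WINDOW; d++)
                for (k = 0; k < FRAME_DIM; k++)
                    sum += w[u][d][k] * input[t + d][k];
            output[t][u] = squash(sum);
        }
}

int main(void)
{
    double input[N_FRAMES][FRAME_DIM];
    double w[N_UNITS][WINDOW][FRAME_DIM];
    double bias[N_UNITS] = {0.0, 0.1};
    double output[N_OUT][N_UNITS];
    int t, u, d, k;

    /* Fill the input and weights with simple deterministic values. */
    for (t = 0; t < N_FRAMES; t++)
        for (k = 0; k < FRAME_DIM; k++)
            input[t][k] = sin(0.5 * t + k);
    for (u = 0; u < N_UNITS; u++)
        for (d = 0; d < WINDOW; d++)
            for (k = 0; k < FRAME_DIM; k++)
                w[u][d][k] = 0.1 * (u + 1) * (d - 1 + 0.5 * k);

    tdnn_layer(input, w, bias, output);
    for (t = 0; t < N_OUT; t++)
        printf("t=%d  unit0=%7.4f  unit1=%7.4f\n", t, output[t][0], output[t][1]);
    return 0;
}

The structural parameters that ASO is described as tuning (window widths, numbers of units and states, and so on) correspond here to the fixed constants WINDOW, N_UNITS and FRAME_DIM; making them trainable or searchable is exactly what distinguishes the papers' approach from a hand-tuned layer like this one.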
From moody at chianti.cse.ogi.edu Thu Feb 4 20:38:08 1993 From: moody at chianti.cse.ogi.edu (John Moody) Date: Thu, 4 Feb 93 17:38:08 -0800 Subject: NATO ASI: March 5 Deadline Approaching Message-ID: <9302050138.AA00659@chianti.cse.ogi.edu> As the March 5th application deadline is now four weeks away, I am posting this notice again. NATO Advanced Studies Institute (ASI) on Statistics and Neural Networks June 21 - July 2, 1993, Les Arcs, France Directors: Professor Vladimir Cherkassky, Department of Electrical Eng., University of Minnesota, Minneapolis, MN 55455, tel.(612)625-9597, fax (612)625- 4583, email cherkass at ee.umn.edu Professor Jerome H. Friedman, Statistics Department, Stanford University, Stanford, CA 94309 tel(415)723-9329, fax(415)926-3329, email jhf at playfair.stanford.edu Professor Harry Wechsler, Computer Science Department, George Mason University, Fairfax VA22030, tel(703)993-1533, fax(703)993-1521, email wechsler at gmuvax2.gmu.edu List of invited lecturers: I. Alexander, L. Almeida, A. Barron, A. Buja, E. Bienenstock, G. Carpenter, V. Cherkassky, T. Hastie, F. Fogelman, J. Friedman, H. Freeman, F. Girosi, S. Grossberg, J. Kittler, R. Lippmann, J. Moody, G. Palm, R. Tibshirani, H. Wechsler, C. Wellekens Objective, Agenda and Participants: Nonparametric estimation is a problem of fundamental importance for many applications involving pattern classification and discrimination. This problem has been addressed in Statistics, Pattern Recognition, Chaotic Systems Theory, and more recently in Artificial Neural Network (ANN) research. This ASI will bring together leading researchers from these fields to present an up-to-date review of the current state-of-the art, to identify fundamental concepts and trends for future development, to assess the relative advantages and limitations of statistical vs neural network techniques for various pattern recognition applications, and to develop a coherent framework for the joint study of Statistics and ANNs. Topics range from theoretical modeling and adaptive computational methods to empirical comparisons between statistical and neural network techniques. Lectures will be presented in a tutorial manner to benefit the participants of ASI. A two-week programme is planned, complete with lectures, industrial/government sessions, poster sessions and social events. It is expected that over seventy students (which can be researchers or practitioners at the post-graduate or graduate level) will attend, drawn from each NATO country and from Central and Eastern Europe. The proceedings of ASI will be published by Springer-Verlag. Applications: Applications for participation at the ASI are sought. Prospective students, industrial or government participants should send a brief statement of what they intend to accomplish and what form their participation would take. Each application should include a curriculum vitae, with a brief summary of relevant scientific or professional accomplishments, and a documented statement of financial need (if funds are applied for). Optionally, applications may include a one page summary for making a short presentation at the poster session. Poster presentations focusing on comparative evaluation of statistical and neural network methods and application studies are especially sought. For junior applicants, support letters from senior members of the professional community familiar with the applicant's work would strengthen the application. 
Prospective participants from Greece, Portugal and Turkey are especially encouraged to apply. Costs and Funding: The estimated cost of hotel accommodations and meals for the two-week duration of the ASI is US$1,600. In addition, participants from industry will be charged an industrial registration fee, not to exceed US$1,000. Participants representing industrial sponsors will be exempt from the fee. We intend to subsidize costs of participants to the maximum extent possible by available funding. Prospective participants should also seek support from their national scientific funding agencies. The agencies, such as the American NSF or the German DFG, may provide some ASI travel funds upon the recommendation of an ASI director. Additional funds exist for students from Greece, Portugal and Turkey. We are also seeking additional sponsorship of ASI. Every sponsor will be fully acknowledged at the ASI site as well as in the printed proceedings. Correspondence and Registration: Applications should be forwarded to Dr. Cherkassky at the above address. Applications arriving after March 5, 1993 may not be considered. All approved applicants will be informed of the exact registration arrangements. Informal email inquiries can be addressed to Dr. Cherkassky at nato_asi at ee.umn.edu  From takagi at diva.berkeley.edu Thu Feb 4 21:48:15 1993 From: takagi at diva.berkeley.edu (Hideyuki Takagi) Date: Thu, 4 Feb 93 18:48:15 -0800 Subject: BISC Special Seminar Message-ID: <9302050248.AA02922@diva.Berkeley.EDU> Dear Colleagues: We will hold the BISC Special Seminar at UC Berkeley one day before FUZZ-IEEE'93/ICNN'93. Please forward the following announcement widely. Hideyuki TAKAGI ----------------------------------------------------------------------- EXTENDED BISC SPECIAL SEMINAR 10:30AM-5:45PM, March 28 (Sunday), 1993 Sibley Auditorium (210) in Bechtel Hall University of California, Berkeley CA 94720 BISC (Berkeley Initiative for Soft Computing) of UC Berkeley will hold a Special Seminar to take advantage of the presence in the San Francisco area of the luminaries attending FUZZ-IEEE'93/ICNN'93. We hope that your schedule will allow you to participate. PROGRAM: 10:30-11:00 Lotfi A. Zadeh (Univ. of California, Berkeley) Soft Computing 11:00-12:00 Hidetomo Ichihashi / Univ. of Osaka Prefecture Neuro-Fuzzy Approaches to Optimization and Inverse Problems 12:00- 1:30 (lunch) 1:30- 2:30 Philippe Smets (Iridia Universite Libre de Bruxelles) Imperfect information: Imprecision - Uncertainty 2:30- 3:30 Teuvo Kohonen (Helsinki University of Technology) Competitive-Learning Neural Networks are closest to Biology 3:30- 3:45 (break) 3:45- 4:45 Michio Sugeno (Tokyo Institute of Technology) Fuzzy Modeling towards Qualitative Modeling 4:45- 5:45 Hugues Bersini (Iridia Universite Libre de Bruxelles) The Immune Learning Mechanisms: Reinforcement, Recruitment and their Applications REGISTRATION: Attendance is free and registration is not required. HOW TO GET HERE: [BART subway from San Francisco downtown] The closest station to the SF Hilton Hotel is the Powell Str. Station. Berkeley is a safe 24 minute ride from the Powell Str. Station. You must catch the Concord bound train and transfer onto a Richmond bound train at the Oakland City Center-12th Str. Station. Trains on Sunday rendezvous every 20 minutes as indicated below.
Powell      12th Str.    Berkeley
 8:17         8:31          8:41
 8:37         8:51          9:01
 8:57         9:11          9:21
 9:17         9:31          9:41
 9:37         9:51         10:01
It takes 15-20 minutes on foot from the Berkeley BART Station to reach Bechtel Hall, which is located on the North-East part of campus. Bechtel Hall is just North of Evans Hall, home of the Computer Science Division. North Gate is the nearest campus gate. [TAXI] You can take a taxi from the front of the Berkeley BART Station. Ask the taxi driver to enter from East Gate on campus and let you off at Mining Circle. The tallest building adjacent to the circle is Evans Hall. Bechtel Hall is just north of the Evans. [CAR] Get off at the University Ave. exit from Interstate 80. The east end of University Ave. is the West Gate to UC Berkeley. Most street parking is free on Sunday, but it may be scarce and remember to read the signs. If you feel you must park in a lot, we recommend UCB Parking Structure H which is located at the corner of Hearst and La Loma Avenues. You must buy an all day parking ticket from the vending machine located on the 2nd level (the only one in the structure). You need to prepare 12 quarters. Illegal parking in Berkeley is expensive. CONTACT ADDRESS: Hideyuki TAKAGI, Coordinator of this seminar (takagi at cs.berkeley.edu) Lotfi A. Zadeh, Director of BISC (zadeh at cs.berkeley.edu) Computer Science Division University of California at Berkeley Berkeley, CA 94720 FAX <+1>510-642-5775  From ira at linus.mitre.org Fri Feb 5 10:06:53 1993 From: ira at linus.mitre.org (ira@linus.mitre.org) Date: Fri, 5 Feb 93 10:06:53 -0500 Subject: vision position posting Message-ID: <9302051506.AA09737@ellington.mitre.org> Neural Network Vision Research Position The MITRE Corporation is looking for a Vision Modeler with an excellent math background, knowledge of signal processing techniques, considerable experience modeling biological low-level vision processes and broad knowledge of current neural network learning algorithm research. This is an *applied* research position which has as its goal the application of vision modeling techniques to real tasks such as 2D and 3D object recognition in synthetic and real world imagery. This position requires software implementation of models in C language. The position may also involve management responsibilities. The position is located in Bedford, Massachusetts. We are looking for someone with availability within the next two months. Interested applicants should send a resume and representative publications to: Ira Smotroff Lead Scientist The MITRE Corporation MS K331 202 Burlington Rd. Bedford, MA 01730-1420  From heiniw at sun1.eeb.ele.tue.nl Fri Feb 5 09:56:17 1993 From: heiniw at sun1.eeb.ele.tue.nl (Heini Withagen) Date: Fri, 5 Feb 1993 15:56:17 +0100 (MET) Subject: Does backprop need the derivative ?? Message-ID: <9302051456.AA02038@sun1.eeb.ele.tue.nl> A non-text attachment was scrubbed... Name: not available Type: text Size: 1054 bytes Desc: not available Url : https://mailman.srv.cs.cmu.edu/mailman/private/connectionists/attachments/00000000/b760eda0/attachment-0001.ksh From wallyn at capsogeti.fr Fri Feb 5 13:05:20 1993 From: wallyn at capsogeti.fr (Alexandre Wallyn) Date: Fri, 5 Feb 93 19:05:20 +0100 Subject: Neural networks in Product modelling Message-ID: <9302051805.AA13434@gizmo> I am trying to evaluate the state of the art in the connectionist applications in Product Modelling (or engineering design).
After looking in several journals (Neural Networks, IJCNN proceedings, Neuro-Nimes, and some history of connectionist mailing list), I only found: "Neural Network in Engineering Design" (H.Adeli, IJCNN 1990) (very general) Indirect quotations of general work in AI Wright University (1988) Modelling of MOS components in University of Dortmund (1990) and CadChem product of AIWare for product modelling and chemical formulation (seem to be uses by General Tire and Good Year). Are these applications in product modelling so scarce, or are they published in other forums ? I thank you in advance for your help. I will, of course, publish a summary of the replies. Alexandre Wallyn CAP GEMINI INNOVATION 86-90, rue Thiers 92513 BOULOGNE FRANCE wallyn at capsogeti.fr  From ira at linus.mitre.org Fri Feb 5 10:14:46 1993 From: ira at linus.mitre.org (ira@linus.mitre.org) Date: Fri, 5 Feb 93 10:14:46 -0500 Subject: vision position: US Citizens only Message-ID: <9302051514.AA09747@ellington.mitre.org> Sorry to clutter your mail boxes. The Neural Network Vision Position at The MITRE Corporation is open only to US Citizens. Ira Smotroff  From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Feb 5 22:55:28 1993 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 05 Feb 93 22:55:28 EST Subject: Does backprop need the derivative ?? In-Reply-To: Your message of Fri, 05 Feb 93 15:56:17 +0100. <9302051456.AA02038@sun1.eeb.ele.tue.nl> Message-ID: In his paper, 'An Empirical Study of Learning Speed in Back-Propagation Networks', Scott E. Fahlmann shows that with the encoder/decoder problem it is possible to replace the derivative of the transfer function by a constant. I have been able to reproduce this example. However, for several other examples, it was not possible to get the network converged using a constant for the derivative. Interesting. I just tried this on encoder problems and a couple of other simple things, and leapt to the conclusion that it was a general phenomenon. It seems plausible to me that any "derivative" function that preserves the sign of the error and doesn't have a "flat spot" (stable point of 0 derivative) would work OK, but I don't know of anyone who has made an extensive study of this. I'd be interested in hearing more about the problems you've encountered and about any results others send to you. -- Scott =========================================================================== Scott E. Fahlman Internet: sef+ at cs.cmu.edu Senior Research Scientist Phone: 412 268-2575 School of Computer Science Fax: 412 681-5739 Carnegie Mellon University Latitude: 40:26:33 N 5000 Forbes Avenue Longitude: 79:56:48 W Pittsburgh, PA 15213 ===========================================================================  From marwan at sedal.su.oz.au Sat Feb 6 07:49:53 1993 From: marwan at sedal.su.oz.au (Marwan Jabri) Date: Sat, 6 Feb 1993 23:49:53 +1100 Subject: Does backprop need the derivative ?? Message-ID: <9302061249.AA17234@sedal.sedal.su.OZ.AU> As the intention of the inquirer is the analog implementation of backprop, I see two problems: 1- the question whether the derivative can be replaced by a constant, and more importantly 2- whether the precision of the analog implementation will be high enough for backprop to work. Regarding (1), it is likely as Scott Fahlman suggested any derivative that "preserves" the error sign may do the job. 
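As a concrete illustration of that suggestion, here is a minimal sketch of plain batch backprop on XOR in which the sigmoid derivative o(1-o) can be swapped for a sign-preserving constant; the same idea, applied to the whole weight gradient rather than the unit derivative, is what is later called "Manhattan updating" in this thread. The 2-2-1 network, the constant 0.25, the learning rate and the XOR task are illustrative assumptions, not anything the posters specified.

# Minimal sketch: batch backprop on XOR where the sigmoid derivative o*(1-o)
# can be replaced by a sign-preserving constant (assumed value 0.25).
# Network size, learning rate and epoch count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 2)); b1 = np.zeros(2)   # input -> hidden
W2 = rng.normal(scale=0.5, size=(2, 1)); b2 = np.zeros(1)   # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

USE_CONSTANT = True   # True: constant 0.25 everywhere; False: true derivative

def deriv(o):
    return np.full_like(o, 0.25) if USE_CONSTANT else o * (1.0 - o)

lr = 0.5
for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)                   # forward pass
    y = sigmoid(h @ W2 + b2)
    delta_out = (y - T) * deriv(y)             # backward pass, squared error
    delta_hid = (delta_out @ W2.T) * deriv(h)
    W2 -= lr * (h.T @ delta_out); b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * (X.T @ delta_hid); b1 -= lr * delta_hid.sum(axis=0)

print("outputs after training:", y.ravel())

With USE_CONSTANT set to False the same loop is ordinary backprop, so the two variants can be compared directly on a given task.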
The question however is the implication in terms of convergence speed, and the comparison thereof with perturbation type training methods. Regarding (2), there has been several reports indicating that backpropagation simply does not work when the number of bits is reduced towards 6-8 bits! Marwan ------------------------------------------------------------------- Marwan Jabri Email: marwan at sedal.su.oz.au Senior Lecturer Tel: (+61-2) 692-2240 SEDAL, Electrical Engineering, Fax: 660-1228 Sydney University, NSW 2006, Australia Mobile: (+61-18) 259-086  From jlm at crab.psy.cmu.edu Sat Feb 6 08:39:43 1993 From: jlm at crab.psy.cmu.edu (James L. McClelland) Date: Sat, 6 Feb 93 08:39:43 EST Subject: Does backprop need the derivative ?? In-Reply-To: Scott_Fahlman@sef-pmax.slisp.cs.cmu.edu's message of Fri, 05 Feb 93 22:55:28 EST Message-ID: <9302061339.AA19977@crab.psy.cmu.edu.noname> Re the discussion concerning replacing the derivative of the activations of units with a constant: Some work has been done using the activation rather than the derivative of the activation by Nestor Schmajuk. He is interested in biologically plausible models and tends to keep hidden units in the bottom half of the sigmoid. In that case they can be approximated by exponentials and so the derivative can be approximated by the activation. Approx ref: Schmajuk and DiCarlo, Psychological Review, 1992 - Jay McClelland  From ljubomir at darwin.bu.edu Sat Feb 6 11:17:56 1993 From: ljubomir at darwin.bu.edu (Ljubomir Buturovic) Date: Sat, 6 Feb 93 11:17:56 -0500 Subject: Does backprop need the derivative ?? Message-ID: <9302061617.AA13641@darwin.bu.edu> Mr. Heini Withagen says: > I am working on an analog chip implementing a feedforward > network and I am planning to incorporate backpropagation learning > on the chip. If it would be the case that the backpropagation > algorithm doesn't need the derivative, it would simplify the > design enormously. We have trained multilayer perceptron without derivatives, using simplex algorithm for multidimensional optimization (not to be confused with simplex algorithm for linear programming). From our experiments, it turns out that it can be done, however the number of weights is seriously limited, since the memory complexity of simplex is N^2, where N is the total number of variable weights in the network. See reference for further details (the reference is available as a LaTeX file from ljubomir at darwin.bu.edu). Lj. Buturovic, Lj. Citkusev, ``Back Propagation and Forward Propagation,'' in Proc. Int. Joint Conf. Neural Networks, (Baltimore, MD), 1992, pp. IV-486 -- IV-491. Ljubomir Buturovic Boston University BioMolecular Engineering Research Center 36 Cummington Street, 3rd Floor Boston, MA 02215 office: 617-353-7123 home: 617-738-6487  From gary at cs.ucsd.edu Sat Feb 6 11:20:57 1993 From: gary at cs.ucsd.edu (Gary Cottrell) Date: Sat, 6 Feb 93 08:20:57 -0800 Subject: Does backprop need the derivative ?? Message-ID: <9302061620.AA29550@odin.ucsd.edu> I happen to know it doesn't work for a more complicated encoder problem: Image compression. When Paul Munro & I were first doing image compression back in 86, the error would go down and then back up! Rumelhart said: "there's a bug in your code" and indeed there was: we left out the derivative on the hidden units. -g.  From radford at cs.toronto.edu Sun Feb 7 12:24:15 1993 From: radford at cs.toronto.edu (Radford Neal) Date: Sun, 7 Feb 1993 12:24:15 -0500 Subject: Does backprop need the derivative? 
Message-ID: <93Feb7.122429edt.227@neuron.ai.toronto.edu> Other posters have discussed, regarding backprop... > ... the question whether the derivative can be replaced by a constant, To clarify, I believe the intent is that the "constant" have the same sign as the derivative, but have constant magnitude. Marwan Jabri says... > Regarding (1), it is likely as Scott Fahlman suggested any derivative > that "preserves" the error sign may do the job. One would expect this to work only for BATCH training. On-line training approximates the batch result only if the net result of updating the weights on many training cases mimics the summing of derivatives in the batch scheme. This will not be the case if a training case where the derivative is +0.00001 counts as much as one where it is +10000. This is not to say it might not work in some cases. There's just no reason to think that it will work generally. Radford Neal  From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Feb 7 12:56:03 1993 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 07 Feb 93 12:56:03 EST Subject: Does backprop need the derivative ?? In-Reply-To: Your message of Sat, 06 Feb 93 23:49:53 +1100. <9302061249.AA17234@sedal.sedal.su.OZ.AU> Message-ID: As the intention of the inquirer is the analog implementation of backprop, I see two problems: 1- the question whether the derivative can be replaced by a constant, and more importantly 2- whether the precision of the analog implementation will be high enough for backprop to work. ... Regarding (2), there has been several reports indicating that backpropagation simply does not work when the number of bits is reduced towards 6-8 bits! It is true that several studies show a sudden failure of backprop learning when you use fixnum arithmetic and reduce the number of bits per word. The point of failure seems to be problem-specific, but is often around 10-14 bits (incuding sign). Marcus Hoehfeld and I studied this issue and found that the source of the failure was a quantization effect: the learning algorithm needs to accumulate lots of small steps, for weight-update or whatever, and since these are smaller than half the low-order bit, it ends up accumulating a lot of zeros instead. We showed that if a form of probabilisitic rounding (dithering) is used to smooth over these quantization steps, learning continues on down to 4 bits or fewer, with only a gradual degradation in learning time, number of units/weights required, and quality of the result. This study used Cascor, but we believe that the results hold for backprop as well. Marcus Hoehfeld and Scott E. Fahlman (1992) "Learning with Limited Numerical Precision Using the Cascade-Correlation Learning Algorithm" in IEEE Transactions on Neural Networks, Vol. 3, no. 4, July 1992, pp. 602-611. Of course, a learning system implemented in analog hardware might have only a few bits of accuracy due to noise and nonlinearity in the circuits, but it wouldn't suffer from this quantization effect, since you get a sort of probabilistic dithering for free. -- Scott =========================================================================== Scott E. 
Fahlman Internet: sef+ at cs.cmu.edu Senior Research Scientist Phone: 412 268-2575 School of Computer Science Fax: 412 681-5739 Carnegie Mellon University Latitude: 40:26:33 N 5000 Forbes Avenue Longitude: 79:56:48 W Pittsburgh, PA 15213 ===========================================================================  From kolen-j at cis.ohio-state.edu Sun Feb 7 11:31:20 1993 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Sun, 7 Feb 93 11:31:20 -0500 Subject: Does backprop need the derivative ?? In-Reply-To: "James L. McClelland"'s message of Sat, 6 Feb 93 08:39:43 EST <9302061339.AA19977@crab.psy.cmu.edu.noname> Message-ID: <9302071631.AA19877@pons.cis.ohio-state.edu> Back prop does not need THE derivative. I have some empirical results which show that most of the internal mathematical operators of back prop can be replaced by qualitatively similar operators. I'm not talking about reducing bit width, as most of the literature does. I was interested in what happens when you replace multiplication with maximum, the sigmoid with a generic bump, etc. What was suprising was that all the tweeks basically worked. Back prop is "functionally" stable in the sense that the learning functional ability remains regardless of minor shifts in internal organization. The reason that the reduced accuracy results are the way that they are can be traced to the loss of continuity rather than the loss of bits. John Kolen  From gary at cs.UCSD.EDU Sun Feb 7 13:09:19 1993 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Sun, 7 Feb 93 10:09:19 -0800 Subject: Does backprop need the derivative ?? Message-ID: <9302071809.AA00283@odin.ucsd.edu> The sign is always positive. Hence not using it is an approximation that preserves the sign. -g.  From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Feb 7 13:02:42 1993 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 07 Feb 93 13:02:42 EST Subject: Does backprop need the derivative ?? In-Reply-To: Your message of Sat, 06 Feb 93 08:20:57 -0800. <9302061620.AA29550@odin.ucsd.edu> Message-ID: I happen to know it doesn't work for a more complicated encoder problem: Image compression. When Paul Munro & I were first doing image compression back in 86, the error would go down and then back up! Rumelhart said: "there's a bug in your code" and indeed there was: we left out the derivative on the hidden units. -g. I can see why not using the true derivative of the sigmoid, but just an approximation that preserves the sign, might cause learning to bog down, but I don't offhand see how it could cause the error to go up, at least in a net with only one hidden layer and with a monotonic activation function. I wonder if this problem would also occur in a net using the "sigmoid prime offset", which adds a small constant to the derivative of the sigmoid. I haven't seen it. -- Scott =========================================================================== Scott E. Fahlman Internet: sef+ at cs.cmu.edu Senior Research Scientist Phone: 412 268-2575 School of Computer Science Fax: 412 681-5739 Carnegie Mellon University Latitude: 40:26:33 N 5000 Forbes Avenue Longitude: 79:56:48 W Pittsburgh, PA 15213 ===========================================================================  From marwan at sedal.su.oz.au Sun Feb 7 18:13:36 1993 From: marwan at sedal.su.oz.au (Marwan Jabri) Date: Mon, 8 Feb 1993 10:13:36 +1100 Subject: Does backprop need the derivative ?? 
Message-ID: <9302072313.AA24874@sedal.sedal.su.OZ.AU> > It is true that several studies show a sudden failure of backprop learning > when you use fixnum arithmetic and reduce the number of bits per word. The > point of failure seems to be problem-specific, but is often around 10-14 > bits (incuding sign). > > Marcus Hoehfeld and I studied this issue and found that the source of the > failure was a quantization effect: the learning algorithm needs to > accumulate lots of small steps, for weight-update or whatever, and since > these are smaller than half the low-order bit, it ends up accumulating a > lot of zeros instead. We showed that if a form of probabilisitic rounding > (dithering) is used to smooth over these quantization steps, learning > continues on down to 4 bits or fewer, with only a gradual degradation in > learning time, number of units/weights required, and quality of the result. > This study used Cascor, but we believe that the results hold for backprop > as well. > > Marcus Hoehfeld and Scott E. Fahlman (1992) "Learning with Limited > Numerical Precision Using the Cascade-Correlation Learning Algorithm" > in IEEE Transactions on Neural Networks, Vol. 3, no. 4, July 1992, pp. > 602-611. > Yun Xie and I have tried similar experiments on the Sonar and ECG data, and it is fair to say that standard backprop gives up at about 10 bits [2]. In a closer look at the quantisation effects you would find that the signal/noise ratio depends on the number of layers [1]. As you go deeper you require less precision. This would be a source of variation between backprop and cascor. > Of course, a learning system implemented in analog hardware might have only > a few bits of accuracy due to noise and nonlinearity in the circuits, but > it wouldn't suffer from this quantization effect, since you get a sort of > probabilistic dithering for free. > Hmmm... precision also suffers from the number of operations in analog implementations. The free dithering you get is everywhere, including in your errors! The gradient descent turns into a yoyo. This is well explained in [2, 3]. The best way of using backprop, or more efficiently conjugate gradient, is to do the training off-chip and then to download the (truncated) weights. Our experience in the training of real analog chips shows that some further in-loop training is required. Note our chips were ultra low power and you may have fewer problems with strong inversion implementations. Regarding the idea of Simplex that has been suggested: the inquirer was talking about on-chip learning. Have you in your experiments done a limited precision Simplex? Have you tried it on a chip in in-loop mode? Philip Leong here has tried a similar idea (I think) a while back. The problem with this approach is that you need to have a very good guess at your starting point as the Simplex will move you from one vertex (feasible solution) to another while expanding the weight solution space. Philip's experience is that it does work for small problems when you have a good guess! At the last NIPS, there were 4 posters about learning in or for analog chips. The inquirer may wish to consult these papers (two at least were advertised as deposited in the neuroprose archive, one by Gert Cauwenberghs and one by Barry Flower and me). So far, for us, the most reliable analog chip training algorithm has been the combined search algorithm (modified weight perturbation and partial random search) [3]. I will be very interested in hearing more about experiments where analog chips are trained.
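For readers who want to see what the "probabilistic rounding" (dithering) quoted above amounts to, here is a small generic sketch. It is not the Hoehfeld & Fahlman code; the quantization step, the place where rounding is applied and the toy numbers are assumptions made only for illustration.

# Generic sketch of stochastic ("probabilistic") rounding of weight updates to
# a fixed-point grid, the idea quoted above for limited-precision learning.
# The grid step and toy update values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(delta_w, step):
    """Round each update to a multiple of `step`, rounding up with probability
    equal to the fractional remainder, so the expected rounded update equals
    the true update."""
    scaled = delta_w / step
    lower = np.floor(scaled)
    frac = scaled - lower
    return (lower + (rng.random(delta_w.shape) < frac)) * step

step = 0.01                            # pretend the weight memory resolves 0.01
updates = np.full(1000, 0.001)         # a thousand tiny updates of +0.001 each
w_det = np.sum(np.round(updates / step) * step)   # deterministic rounding: 0.0
w_sto = np.sum(stochastic_round(updates, step))   # roughly +1.0 on average
print(w_det, w_sto)

Deterministic rounding silently discards every update smaller than half a grid step, which is the accumulation-of-zeros failure described in the quoted passage; the stochastic version preserves the updates in expectation at the cost of extra noise.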
Marwan [1] Yun Xie and M. Jabri, Analysis of the Effects of Quantization in Multi-layer Neural Networks Using A Statistical Model, IEEE Transactions on Neural Networks, Vol. 3, No. 2, pp. 334-338, March, 1992. [2] M. Jabri, S. Pickard, P. Leong and Y. Xie, Algorithms and Implementation Issues in Analog Low Power Learning Neural Network Chips, To appear in the International Journal on VLSI Signal Processing, early 1993, USA. [3] Y. Xie and M. Jabri, On the Training of Limited Precision Multi-layer Perceptrons. Proceedings of the International Joint Conference on Neural Networks, pp III-942-947, July 1992, Baltimore, USA. ------------------------------------------------------------------- Marwan Jabri Email: marwan at sedal.su.oz.au Senior Lecturer Tel: (+61-2) 692-2240 SEDAL, Electrical Engineering, Fax: 660-1228 Sydney University, NSW 2006, Australia Mobile: (+61-18) 259-086  From takagi at diva.berkeley.edu Sun Feb 7 14:36:59 1993 From: takagi at diva.berkeley.edu (Hideyuki Takagi) Date: Sun, 7 Feb 93 11:36:59 -0800 Subject: attendance restriction at BISC Special Seminar Message-ID: <9302071936.AA00803@diva.Berkeley.EDU> ORGANIZATIONAL CHANGE in Extended BISC Special Seminar 10:30AM-5:45PM, March 28 (Sunday), 1993 Sibley Auditorium (210) in Bechtel Hall University of California, Berkeley CA 94720 Dear Colleagues: This is to inform you of an organizational change in the Extended BISC Special Seminar which was announced on February 4. Most of the speakers in the regular BISC Seminar are associated with companies and universities in the Bay area. The motivation for the Extended BISC Seminar was to take advantage of the presence in the Bay area of some of the leading contributors to fuzzy logic and neural network theory from abroad, who will be participating in FUZZ-IEEE'93 / ICNN'93. A problem which became apparent is that because both the Extended BISC Seminar and the FUZZ-IEEE'93/ICNN'93 tutorials are scheduled to take place on the same day, the BISC Seminar may have an adverse effect on registration for the conference tutorials. To resolve this problem, it was felt that it may be necessary to restrict attendance at the Extended BISC Seminar to students and faculty in the Bay area who normally attend the BISC Seminar. In this way, the Extended BISC Seminar would serve its usual role and at the same time bring to the Berkeley Campus some of the leading contributors to soft computing. The publicity for the Extended BISC Seminar will state that attendance is limited to students and faculty in the Bay area. Sincerely, BISC (Berkeley Initiative for Soft Computing) ---------------------------------------------  From mav at cs.uq.oz.au Sun Feb 7 19:33:21 1993 From: mav at cs.uq.oz.au (Simon Dennis) Date: Mon, 08 Feb 93 10:33:21 +1000 Subject: Learning in Memory Technical Report Message-ID: <9302080033.AA10081@uqcspe.cs.uq.oz.au> The following technical report is available for anonymous ftp. TITLE: Integrating Learning into Models of Human Memory: The Hebbian Recurrent Network AUTHORS: Simon Dennis and Janet Wiles ABSTRACT: We develop an interactive model of human memory called the Hebbian Recurrent Network (HRN) which integrates work in the mathematical modeling of memory with that in error correcting connectionist networks. It incorporates the matrix model (Pike, 1984; Humphreys, Bain & Pike, 1989) into the Simple Recurrent Network (SRN, Elman, 1989).
The result is an architecture which has the desirable memory characteristics of the matrix model such as low interference and massive generalization but which is able to learn appropriate encodings for items, decision criteria and the control functions of memory which have traditionally been chosen a priori in the mathematical memory literature. Simulations demonstrate that the HRN is well suited to a recognition task inspired by typical memory paradigms. When compared against the SRN the HRN is able to learn longer lists, generalizes from smaller training sets, and is not degraded significantly by increasing the vocabulary size. Please mail correspondence to mav at cs.uq.oz.au Ftp Instructions: $ ftp exstream.cs.uq.oz.au Connected to exstream.cs.uq.oz.au. 220 exstream FTP server (Version 6.12 Fri May 8 16:33:17 EST 1992) ready. Name (exstream.cs.uq.oz.au:mav): anonymous 331 Guest login ok, send e-mail address as password. Password: 230- Welcome to ftp.cs.uq.oz.au 230-This is the University of Queensland Computer Science Anonymous FTP server. 230-For people outside of the department, please restrict your usage to outside 230-of the hours 8am to 6pm. 230- 230-The local time is Mon Feb 8 10:26:05 1993 230- 230 Guest login ok, access restrictions apply. ftp> cd pub/TECHREPORTS/department 250 CWD command successful. ftp> bin 200 Type set to I. ftp> get TR0252.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for TR0252.ps.Z (160706 bytes). 226 Transfer complete. local: TR0252.ps.Z remote: TR0252.ps.Z 160706 bytes received in 0.71 seconds (2.2e+02 Kbytes/s) ftp> quit 221 Goodbye. $ Printing Instructions: $ zcat TR0252.ps.Z | lpr  From efiesler at idiap.ch Mon Feb 8 03:22:31 1993 From: efiesler at idiap.ch (E. Fiesler) Date: Mon, 8 Feb 93 09:22:31 +0100 Subject: Does backprop need the derivative ?? Message-ID: <9302080822.AA22484@idiap.ch> Marwan Jabri wrote: > Date: Sat, 6 Feb 1993 23:49:53 +1100 > From: Marwan Jabri > Subject: Re: Does backprop need the derivative ?? > > As the intention of the inquirer is the analog implementation of > backprop, I see two problems: 1- the question whether the derivative can > be replaced by a constant, and more importantly 2- whether the precision > of the analog implementation will be high enough for backprop to work. > > Regarding (1), ... > > Regarding (2), there has been several reports indicating that > backpropagation simply does not work when the number of bits is reduced > towards 6-8 bits! This is often reported for standard backpropagation. However, a simple extension of backpropagation can make it work for any precision; up to 1-2 bits. I'll append the reference(s) below. E. Fiesler Directeur de Recherche IDIAP Case postale 609 CH-1920 Martigny Switzerland @InProceedings{Fiesler-90, Author = "E. Fiesler and A. Choudry and H. J. Caulfield", Title = "A Weight Discretization Paradigm for Optical Neural Networks", BookTitle = "Proceedings of the International Congress on Optical Science and Engineering", Volume = "SPIE-1281", Pages = "164--173", Publisher = "The International Society for Optical Engineering Proceedings", Address = "Bellingham, Washington, U.S.A.", Year = "1990", ISBN = "0-8194-0328-8", Language = "English" } @Article{Fiesler-93, Author = "E. Fiesler and A. Choudry and H. J. 
Caulfield", Title = "A Universal Weight Discretization Method for Multi-Layer Neural Networks", Journal = "IEEE Transactions on Systems, Man, and Cybernetics (IEEE-SMC)", Publisher = "The Institute of Electrical and Electronics Engineers (IEEE), Inc.", Address = "New York, New York", Year = "1993", ISSN = "0018-9472", Language = "English", Note = "Accepted for publication." }  From annette at cdu.ucl.ac.uk Mon Feb 8 05:13:06 1993 From: annette at cdu.ucl.ac.uk (Annette Karmiloff-Smith) Date: Mon, 8 Feb 93 10:13:06 GMT Subject: Cognitive Development for Connectionists Message-ID: <9302081013.AA14475@cdu.ucl.ac.uk> Below are details of two articles and a book which may be of interest to connectionists: A.Karmiloff-Smith (1992), Connection Science, Vo.4, Nos. 3 & 4, 253- 269. NATURE, NURTURE ANDS PDP: Preposterous Developmental Postulates? (N.B. the question mark - I end on: Promising Developmental Postulates!) Abstract: In this article I discuss the nature/nurture debate in terms of evidence and theorizing from the field of cognitive development, and pinpoint various problems where the Connectionist framework needs to be further explored from this perspective. Evidence from normal and abnormal developmental phenotypes points to some domain-specific constraints on early learning. Yet, by invoking the dynamics of epigenesis, I avoid recourse to a strong Nativist stance and remain within the general spirit of Connectionism. _____________________________________________________________ A. Karmiloff-Smith (1992) Technical Report TR.PDP.CNS.92.7, Carnegie Mellon University, Pittsburgh. ABNORMAL PHENOTYPES AND THE CHALLENGES THEY POSE TO CONNECTIONIST MODELS OF DEVELOPMENT Abstract: The comparison of different abnormal phenotypes (e.g. Williams syndrome, Down syndrome, autism, hydrocephalus with associated myelomeningocele) raises a number of questions about domain-general versus domain-specific processes and suggests that development stems from domain-specific predispositions which channel infantsU attention to proprietary inputs. This is not to be confused with a strong Nativist position. Genetically fully specified modules are not the starting point of development. Rather, a process of gradual modularization builds on skeletal domain-specific predispositions (architectural and/or representational) which give the normal infant a small but significant head-start. It is argued that Down syndrome infants may lack these head-starts, whereas individuals with Williams syndrome, autism and hydrocephalus with associated myelomeningocele have a head-start in selected domains only, leading to different cognitive profiles despite equivalent input. Stress is placed on the importance of exploring a developing system, rather than a lesioned adult system. The position developed in the paper not only contrasts with the strong Nativist stance, but also with the view that domain-general processes are simply applied to whatever inputs the child encounters. The comparison of different phenotypical outcomes is shown to pose interesting challenges to connectionist simulations of development. ______________________________________________________________ A.Karmiloff-Smith (1992) BEYOND MODULARITY: A DEVELOPMENTAL PERSPECTIVE ON COGNITIVE SCIENCE. MIT Press/Bradford Books. A book intended to excite connectionists and other non- developmentalists about the essential role that a developmental perspective has in understanding the special nature of human cognition compared to other species. Contents: 1. 
Taking development seriously 2. The child as a linguist 3. The child as a physicist 4. The child as a mathematician 5. The child as a psychologist 6. The child as a notator 7. Nativism, domain specificity and PiagetUs constructivism 8. Modelling development: representational redescription and connectionism 9. Concluding speculations Reprints of articles obtainable from: Annette Karmiloff-Smith Medical Research Council Cognitive Development Unit London WC1H 0AH. U.K.  From SCHOLTES at ALF.LET.UVA.NL Mon Feb 8 06:19:00 1993 From: SCHOLTES at ALF.LET.UVA.NL (SCHOLTES@ALF.LET.UVA.NL) Date: 08 Feb 1993 12:19 +0100 (MET) Subject: PhD Dissertation Available Message-ID: <346B17ED606070C5@VAX1.SARA.NL> =================================================================== Ph.D. DISSERTATION AVAILABLE on Neural Networks, Natural Language Processing, Information Retrieval 292 pages and over 350 references =================================================================== A Copy of the dissertation "Neural Networks in Natural Language Processing and Information Retrieval" by Johannes C. Scholtes can be obtained for cost price and fast airmail- delivery at US$ 25,-. Payment by Major Creditcards (VISA, AMEX, MC, Diners) is accepted and encouraged. Please include Name on Card, Number and Exp. Date. Your Credit card will be charged for Dfl. 47,50. Within Europe one can also send a Euro-Cheque for Dfl. 47,50 to: University of Amsterdam J.C. Scholtes Dufaystraat 1 1075 GR Amsterdam The Netherlands Do not forget to mention a surface shipping address. Please allow 2-4 weeks for delivery. Abstract 1.0 Machine Intelligence For over fifty years the two main directions in machine intelligence (MI), neural networks (NN) and artificial intelligence (AI), have been studied by various persons with many different backgrounds. NN and AI seemed to conflict with many of the traditional sciences as well as with each other. The lack of a long research history and well defined foundations has always been an obstacle for the general acceptance of machine intelligence by other fields. At the same time, traditional schools of science such as mathematics and physics developed their own tradition of new or "intelligent" algorithms. Progress made in the field of statistical reestimation techniques such as the Hidden Markov Models (HMM) started a new phase in speech recognition. Another application of the progress of mathematics can be found in the application of the Kalman filter in the interpretation of sonar and radar signals. Much more examples of such "intelligent" algorithms can be found in the statistical classification en filtering techniques of the study of pattern recognition (PR). Here, the field of neural networks is studied with that of pattern recognition in mind. Although only global qualitative comparisons are made, the importance of the relation between them is not to be underestimated. In addition it is argued that neural networks do indeed add something to the fields of MI and PR, instead of competing or conflicting with them. 2.0 Natural Language Processing The study of natural language processing (NLP) exists even longer than that of MI. Already in the beginning of this century people tried to analyse human language with machines. However, serious efforts had to wait until the development of the digital computer in the 1940s, and even then, the possibilities were limited. For over 40 years, symbolic AI has been the most important approach in the study of NLP. 
That this has not always been the case, may be concluded from the early work on NLP by Harris. As a matter of fact, Chomsky's Syntactic Structures was an attack on the lack of structural properties in the mathematical methods used in those days. But, as the latter's work remained the standard in NLP, the former has been forgotten completely until recently. As the scientific community in NLP devoted all its attention to the symbolic AI-like theories, the only useful practical implementation of NLP systems were those that were based on statistics rather than on linguistics. As a result, more and more scientists are redirecting their attention towards the statistical techniques available in NLP. The field of connectionist NLP can be considered as a special case of these mathematical methods in NLP. More than one reason can be given to explain this turn in approach. On the one hand, many problems in NLP have never been addressed properly by symbolic AI. Some examples are robust behavior in noisy environments, disambiguation driven by different kinds of knowledge, commensense generalizations, and learning (or training) abilities. On the other hand, mathematical methods have become much stronger and more sensitive to specific properties of language such as hierarchical structures. Last but not least, the relatively high degree of success of mathematical techniques in commercial NLP systems might have set the trend towards the implementation of simple, but straightforward algorithms. In this study, the implementation of hierarchical structures and semantical features in mathematical objects such as vectors and matrices is given much attention. These vectors can then be used in models such as neural networks, but also in sequential statistical procedures implementing similar characteristics. 3.0 Information Retrieval The study of information retrieval (IR) was traditionally related to libraries on the one hand and military applications on the other. However, as PC's grew more popular, most common users loose track of the data they produced over the last couple of years. This, together with the introduction of various "small platform" computer programs made the field of IR relevant to ordinary users. However, most of these systems still use techniques that have been developed over thirty years ago and that implement nothing more than a global surface analysis of the textual (layout) properties. No deep structure whatsoever, is incorporated in the decision whether or not to retrieve a text. There is one large dilemma in IR research. On the one hand, the data collections are so incredibly large, that any method other than a global surface analysis would fail. On the other hand, such a global analysis could never implement a contextually sensitive method to restrict the number of possible candidates returned by the retrieval system. As a result, all methods that use some linguistic knowledge exist only in laboratories and not in the real world. Conversely, all methods that are used in the real world are based on technological achievements from twenty to thirty years ago. Therefore, the field of information retrieval would be greatly indebted to a method that could incorporate more context without slowing down. As computers are only capable of processing numbers within reasonable time limits, such a method should be based on vectors of numbers rather than on symbol manipulations. This is exactly where the challenge is: on the one hand keep up the speed, and on the other hand incorporate more context. 
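As a deliberately simple illustration of what a retrieval method "based on vectors of numbers" looks like, the sketch below turns documents and a query into term-count vectors and ranks them by cosine similarity. This is a generic textbook-style example, not material from the dissertation itself (which studies Kohonen feature maps); the toy documents, the bag-of-words representation and the cosine measure are assumptions made only for illustration.

# Generic vector-space retrieval sketch: documents and query become term-count
# vectors and are ranked by cosine similarity. Purely illustrative; the
# dissertation itself studies Kohonen-feature-map representations instead.
import numpy as np
from collections import Counter

docs = [
    "neural networks for natural language processing",
    "information retrieval with vector representations",
    "self organization in kohonen feature maps",
]
query = "retrieval of information with vectors"

vocab = sorted({w for text in docs + [query] for w in text.split()})

def to_vector(text):
    counts = Counter(text.split())
    return np.array([counts[w] for w in vocab], dtype=float)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

q = to_vector(query)
scores = [(cosine(to_vector(d), q), d) for d in docs]
for score, d in sorted(scores, reverse=True):
    print(f"{score:.3f}  {d}")

Everything in this sketch is a global surface analysis in the sense used above; the question raised here is how far such numerical representations can be pushed to carry more context without giving up that kind of speed.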
If possible, the data representation of the contextual information must not be restricted to a single type of media. It should be possible to incorporate symbolic language as well as sound, pictures and video concurrently in the retrieval phase, although one does not know exactly how yet... Here, the emphasis is more on real-time filtering of large amounts of dynamic data than on document retrieval from large (static) data bases. By incorporating more contextual information, it should be possible to implement a model that can process large amounts of unstructured text without providing the end-user with an overkill of information. 4.0 The Combination As this study is a very multi-disciplinary one, the risk exists that it remains restricted to a surface discussion of many different problems without analyzing one in depth. To avoid this, some central themes, applications and tools are chosen. The themes in this work are self-organization, distributed data representations and context. The applications are NLP and IR, the tools are (variants of) Kohonen feature maps, a well known model from neural network research. Self-organization and context are more related to each other than one may suspect. First, without the proper natural context, self-organization shall not be possible. Next, self-organization enables one to discover contextual relations that were not known before. Distributed data representation may solve many of the unsolved problems in NLP and IR by introducing a powerful and efficient knowledge integration and generalization tool. However, distributed data representation and self-organization trigger new problems that should be solved in an elegant manner. Both NLP and IR work on symbolic language. Both have properties in common but both focus on different features of language. In NLP hierarchical structures and semantical features are important. In IR the amount of data sets the limitations of the methods used. However, as computers grow more powerful and the data sets get larger and larger, both approaches get more and more common ground. By using the same models on both applications, a better understanding of both may be obtained. Both neural networks and statistics would be able to implement self-organization, distrib- uted data and context in the same manner. In this thesis, the emphasis is on Kohonen feature maps rather than on statistics. However, it may be possible to implement many of the techniques used with regular sequential mathematical algorithms. So, the true aim of this work can be formulated as the understanding of self-organization, distributed data representation, and context in NLP and IR, by in depth analysis of Kohonen feature maps. ==============================================================================  From george at psychmips.york.ac.uk Mon Feb 8 08:32:38 1993 From: george at psychmips.york.ac.uk (George Bolt) Date: Mon, 8 Feb 93 13:32:38 +0000 (GMT) Subject: Does backprop need the derivative ?? Message-ID: Heini Withagen wrote: In his paper, 'An Empirical Study of Learning Speed in Back-Propagation Networks', Scott E. Fahlmann shows that with the encoder/decoder problem it is possible to replace the derivative of the transfer function by a constant. I have been able to reproduce this example. However, for several other examples, it was not possible to get the network converged using a constant for the derivative. - end quote - I've looked at BP learning in MLP's w.r.t. 
fault tolerance and found that the derivative of the transfer function is used to *stop* learning. Once a unit's weights for some particular input (to that unit rather than the network) are sufficiently developed for it to decide whether to output 0 or 1, then weight changes are approximately zero due to this derivative. I would imagine that by setting it to a constant, an MLP will over-learn certain patterns and be unable to converge to a state of equilibrium, i.e. all patterns are matched to some degree. A better route would be to set the derivative function to a constant over a range [-r,+r], where f(|r|) -> 1.0. To make individual units robust with respect to weights, make r = c.a where f(|a|) -> 1.0 and c is a small constant multiplicative value. - George Bolt University of York, U.K.  From movellan at cogsci.UCSD.EDU Mon Feb 8 20:33:19 1993 From: movellan at cogsci.UCSD.EDU (Javier Movellan) Date: Mon, 8 Feb 93 17:33:19 PST Subject: Does backprop need the derivative ?? In-Reply-To: Marwan Jabri's message of Sat, 6 Feb 1993 23:49:53 +1100 <9302061249.AA17234@sedal.sedal.su.OZ.AU> Message-ID: <9302090133.AA16068@cogsci.UCSD.EDU> My experience with Boltzmann machines and GRAIN/diffusion networks (the continuous stochastic version of the Boltzmann machine) has been that replacing the real gradient by its sign times a constant accelerates learning DRAMATICALLY. I first saw this technique in one of the original CMU tech reports on the Boltzmann machine. I believe Peterson and Hartman and Peterson and Anderson also used this technique, which they called "Manhattan updating", with the deterministic Mean Field learning algorithm. I believe they had an article in "Complex Systems" comparing Backprop and Mean-Field, both with standard gradient descent and with Manhattan updating. It is my understanding that the Mean-Field/Boltzmann chip developed at Bellcore uses "Manhattan Updating" as its default training method. Josh Alspector is the person to contact about this. At this point I've tried 4 different learning algorithms with continuous and discrete stochastic networks and in all cases Manhattan Updating worked better than straight gradient descent. The question is why Manhattan updating works so well (at least in stochastic and Mean-Field networks)? One possible interpretation is that Manhattan updating limits the influence of outliers and thus it performs something similar to robust regression. Another interpretation is that Manhattan updating avoids the saturation regions, where the error space becomes almost flat in some dimensions, slowing down learning. One of the disadvantages of Manhattan updating is that sometimes one needs to reduce the weight change constant at the end of learning. But sometimes we also do this in standard gradient descent anyway. -Javier  From oby%firenze%venezia.ROCKEFELLER.EDU at ROCKVAX.ROCKEFELLER.EDU Mon Feb 8 20:42:08 1993 From: oby%firenze%venezia.ROCKEFELLER.EDU at ROCKVAX.ROCKEFELLER.EDU (Klaus Obermayer) Date: Mon, 8 Feb 93 20:42:08 -0500 Subject: No subject Message-ID: <9302090142.AA01612@firenze> The following article is available as a (hardcopy) preprint: Obermayer K. and Blasdel G.G. (1993), Geometry of Orientation and Ocular Dominance Columns in Monkey Striate Cortex, J. Neurosci., in press.
Abstract: In addition to showing that ocular dominance is organized in slabs and that orientation preferences are organized in linear sequences likely to reflect slabs, Hubel and Wiesel (1974) discussed the intriguing possibility that slabs of orientation might intersect slabs of ocular dominance at some consistent angle. Advances in optical imaging now make it possible to test this possibility directly. When maps of orientation are analyzed quantitatively, they appear to arise from a combination of at least two competing themes: one where orientation preferences change linearly along straight axes, remaining constant along perpendicular axes and forming iso-orientation slabs along the way, and one where orientation preferences change continuously along circular axes, remaining constant along radial axes and forming singularities at the centers of the spaces enclosed. When orientation patterns are compared with ocular dominance patterns from the same cortical regions, quantitative measures reveal: 1) that singularities tend to lie at the centers of ocular dominance columns, 2) that linear zones (arising where orientation preferences change along straight axes) tend to lie at the edges of ocular dominance columns, and 3) that the short iso-orientation bands within each linear zone tend to intersect the borders of ocular dominance slabs at angles of approximately 90$^o$. ----------------------------------------------------------------- The original article contains color figures which - for cost reasons - have to be reproduced black and white. If you would like to obtain a copy, please send your surface mail address to: Klaus Obermayer The Rockefeller University oby at rockvax.rockefeller.edu -----------------------------------------------------------------  From thgoh at iss.nus.sg Tue Feb 9 01:05:52 1993 From: thgoh at iss.nus.sg (Goh Tiong Hwee) Date: Tue, 9 Feb 1993 14:05:52 +0800 (WST) Subject: Does Backprop need the derivative Message-ID: <9302090605.AA08961@iss.nus.sg> From fellous%hyla.usc.edu at usc.edu Wed Feb 10 21:48:50 1993 From: fellous%hyla.usc.edu at usc.edu (Jean-Marc Fellous) Date: Wed, 10 Feb 93 18:48:50 PST Subject: CNE / USC Workshop Reminder and Update. Message-ID: <9302110248.AA01295@hyla.usc.edu> Thank you for posting the following final announcement: *********************** Last Reminder and Update ************************ SCHEMAS AND NEURAL NETWORKS INTEGRATING SYMBOLIC AND SUBSYMBOLIC APPROACHES TO COOPERATIVE COMPUTATION A Workshop sponsored by the Center for Neural Engineering University of Southern California Los Angeles, CA 90089-2520 April 13th and 14th, 1993 Program Committee: Michael Arbib (Organizer), John Barnden, George Bekey, Francisco Cervantes-Perez, Damian Lyons, Paul Rosenbloom, Ron Sun, Akinori Yonezawa A previous announcement (reproduced below) announced a registra- tion fee of $150 and advertised the availability of hotel accom- modation at $70/night. To encourage the participation of qualified students we have made 3 changes: 1) We have appointed Jean-Marc Fellous as Student Chair for the meeting to coordinate the active involvement of such students. 2) We offer a Student Registration Fee of only $40 to students whose application is accompanied by a letter from their supervi- sor attesting to their student status. 3) Mr. 
Fellous has identified a number of lower-cost housing op- tions, and will respond to queries to fellous at rana.usc.edu The original announcement - with updated registration form - fol- lows: To design complex technological systems and to analyze complex biological and cognitive systems, we need a multilevel methodolo- gy which combines a coarse-grain analysis of cooperative or dis- tributed computation (we shall refer to the computing agents at this level as "schemas") with a fine-grain model of flexible, adaptive computation (for which neural networks provide a power- ful general paradigm). Schemas provide a language for distri- buted artificial intelligence, perceptual robotics, cognitive modeling, and brain theory which is "in the style of the brain", but at a relatively high level of abstraction relative to neural networks. The proposed workshop will provide a 2-hour introductory tutorial and problem statement by Michael Arbib, and sessions in which an invited paper will be followed by several contributed papers, selected from those submitted in response to this call for pa- pers. Preference will be given to papers which present practical examples of, theory of, and/or methodology for the design and analysis of complex systems in which the overall specification or analysis is conducted in terms of schemas, and where some but not necessarily all of the schemas are implemented in neural net- works. A list of sample topics for contributions is as follows, where a hybrid approach means one in which the abstract schema level is integrated with neural or other lower level models: Schema Theory as a description language for neural networks Modular neural networks Linking DAI to Neural Networks to Hybrid Architecture Formal Theories of Schemas Hybrid approaches to integrating planning & reaction Hybrid approaches to learning Hybrid approaches to commonsense reasoning by integrating neural networks and rule- based reasoning (using schema for the integration) Programming Languages for Schemas and Neural Networks Concurrent Object-Oriented Programming for Distributed AI and Neural Networks Schema Theory Applied in Cognitive Psychology, Linguistics, Robotics, AI and Neuroscience Prospective contributors should send a hard copy of a five-page extended abstract, including figures with informative captions and full references (either by regular mail or fax) by February 15, 1993 to: Michael Arbib, Center for Neural Engineering University of Southern California Los Angeles, CA 90089-2520 USA Tel: (213) 740-9220 Fax: (213) 746-2863 arbib at pollux.usc.edu] Please include your full address, including fax and email, on the paper. Notification of acceptance or rejection will be sent by email no later than March 1, 1993. There are currently no plans to issue a formal proceedings of full papers, but revised versions of ac- cepted abstracts received prior to April 1, 1993 will be collect- ed with the full text of the Tutorial in a CNE Technical Report which will be made available to registrants at the start of the meeting. [A useful way to structure such an abstract is in short numbered sections, where each section presents (in a small type face!) the material corresponding to one transparency/slide in a verbal presentation. This will make it easy for an audi- ence to take notes if they have a copy of the abstract at your presentation.] 
Hotel Information: Attendees may register at the hotel of their choice, but the closest hotel to USC is the University Hilton, 3540 South Figueroa Street, Los Angeles, CA 90007, Phone: (213) 748- 4141, Reservation: (800) 872-1104, Fax: (213) 748- 0043. A single room costs $70/night while a double room costs $75/night. Workshop participants must specify that they are "Schemas and Neural Networks Workshop" attendees to avail of the above rates. Information on student accommodation may be ob- tained from the Student Chair, Jean-Marc Fellous, fellous at rana.usc.edu. The registration fee of $150 ($40 for qualified students who in- clude a "certificate of student status" from their advisor) in- cludes a copy of the abstracts, coffee breaks, and a dinner to be held on the evening of April 13th. Those wishing to register should send a check payable to "Center for Neural Engineering, USC" for $150 ($40 for students) together with the following information to: Paulina Tagle Center for Neural Engineering University of Southern California University Park Los Angeles, CA 90089-2520 USA ---------------------------------------------------------- SCHEMAS AND NEURAL NETWORKS Center for Neural Engineering USC April 13 - 14, 1993 NAME: ___________________________________________ ADDRESS: _________________________________________ PHONE NO.: _______________ FAX:___________________ EMAIL: ___________________________________________ I intend to submit a paper: YES [ ] NO [ ]  From ljubomir at darwin.bu.edu Wed Feb 10 21:12:30 1993 From: ljubomir at darwin.bu.edu (Ljubomir Buturovic) Date: Wed, 10 Feb 93 21:12:30 -0500 Subject: Does backprop need the derivative? Message-ID: <9302110212.AA07255@darwin.bu.edu> Marwan Jabri: > Regarding the idea of Simplex that has been suggested. The inquirer was > talking about on-chip learning. Have you in your experiments done a > limited precision Simplex? Have you tried it on a chip in in-loop mode? > Philip Leong here has tried a similar idea (I think) a while back. The > problem with this approach is that you need to a have a very good guess at > your starting point as the Simplex will move you from one vertex (feasible > solution) to another while expanding the weight solution space. > Philip's experience is that it does work for small problems when you have > a good guess! No, we did not try limited precision Simplex, since the method has another serious limitation, which is memory complexity. So there is no point performing such refined studies until this problem is resolved, let alone on-chip implementation. The biggest problem we tried it on succesfully was 11-dimensional (i. e., input samples were 11-dimensional vectors). The initial guess was pseudo-random, like in back-propagation. In another, 12-dimensional example, it did not do well (neither did back-prop, but Simplex was much worse), so it might be true that it needs a good starting point. Ljubomir Buturovic Boston University BioMolecular Engineering Research Center 36 Cummington Street, 3rd Floor Boston, MA 02215 office: 617-353-7123 home: 617-738-6487  From mozer at dendrite.cs.colorado.edu Thu Feb 11 23:47:27 1993 From: mozer at dendrite.cs.colorado.edu (Michael C. 
Mozer) Date: Thu, 11 Feb 1993 21:47:27 -0700 Subject: Preprint: Neural net architectures for temporal sequence processing Message-ID: <199302120447.AA06812@neuron.cs.colorado.edu>
-.--.---.----.-----.------.-------.--------.-------.------.-----.----.---.--.- PLEASE DO NOT POST TO OTHER BOARDS -.--.---.----.-----.------.-------.--------.-------.------.-----.----.---.--.-
Neural net architectures for temporal sequence processing
Michael C. Mozer Department of Computer Science University of Colorado
I present a general taxonomy of neural net architectures for processing time-varying patterns. This taxonomy subsumes many existing architectures in the literature, and points to several promising architectures that have yet to be examined. Any architecture that processes time-varying patterns requires two conceptually distinct components: a short-term memory that holds on to relevant past events and an associator that uses the short-term memory to classify or predict. The taxonomy is based on a characterization of short-term memory models along the dimensions of form, content, and adaptability. Experiments on predicting future values of a financial time series (US dollar-Swiss franc exchange rates) are presented using several alternative memory models. The results of these experiments serve as a baseline against which more sophisticated architectures can be compared.
To appear in: A. S. Weigend & N. A. Gershenfeld (Eds.), _Predicting the future and understanding the past_. Redwood City, CA: Addison-Wesley. Spring 1993.
-.--.---.----.-----.------.-------.--------.-------.------.-----.----.---.--.-
To retrieve: unix> ftp archive.cis.ohio-state.edu Name: anonymous 230 Guest login ok, access restrictions apply. ftp> cd pub/neuroprose ftp> binary ftp> get mozer.architectures.ps.Z 200 PORT command successful. ftp> quit unix> zcat mozer.architectures.ps.Z | lpr Warning: May not print on wimpy laser printers.
From kolen-j at cis.ohio-state.edu Tue Feb 9 07:51:53 1993 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Tue, 9 Feb 93 07:51:53 -0500 Subject: Does backprop need the derivative? In-Reply-To: Radford Neal's message of Sun, 7 Feb 1993 12:24:15 -0500 <93Feb7.122429edt.227@neuron.ai.toronto.edu> Message-ID: <9302091251.AA27813@pons.cis.ohio-state.edu>
The sign of the derivative is always positive (remember o(1-o) and 0 < o < 1).
>Other posters have discussed, regarding backprop... > >> ... the question whether the derivative can be replaced by a constant, > >To clarify, I believe the intent is that the "constant" have the same >sign as the derivative, but have constant magnitude.
I haven't been following this thread, but the following reference may be helpful to those that are. Blum (Annals of Math. Statistics, vol. 25, 1954, p. 385) shows that if the "constant magnitude" is going to zero (so that the system is convergent) the convergence is not to a minimum of the expected error (this is usually what we want backprop to do), but to a minimum of the *median* of the error. Chris Darken darken at learning.scr.siemens.com
From munro at lis.pitt.edu Thu Feb 11 11:14:44 1993 From: munro at lis.pitt.edu (fac paul munro) Date: Thu, 11 Feb 93 11:14:44 EST Subject: Summary of "Does backprop need the derivative ??" In-Reply-To: Mail from 'Heini Withagen ' dated: Tue, 9 Feb 1993 11:46:06 +0100 (MET) Message-ID: <9302111614.AA15497@icarus.lis.pitt.edu>
Forgive the review of college math, but there are a few issues that, while obvious to many, might be worth reviewing here...
[1] The gradient of a well-behaved single-valued function of N variables (here the error as a function of the weights) is generally orthogonal to an N-1 dimensional manifold on which the function is constant (an iso-error surface).
[2] The effect of infinitesimal motion in the space on the function can be computed as the inner (dot) product of the gradient vector with the movement vector; thus, as long as the dot product between the gradient and the delta-w vector is negative, the error will decrease. That is, the new iso-error surface will correspond to a lower error value.
[3] This implies that the signs of the error derivatives are adequate to reduce the error, assuming the learning rate is sufficiently small, since any two vectors with all components of the same sign must have a positive inner product! [They lie in the same orthant of the space]
Having said all this, I must point out that the argument pertains only to single patterns. That is, eliminating the derivative term is guaranteed to reduce the error for the pattern that is presented. Its effect on the error summed over the training set is not guaranteed, even for batch learning... One more caveat: Of course, if the nonlinear part of the units' transfer function is non-monotonic (i.e., the sign of the derivative varies), be sure to throw the derivative back in! - Paul Munro
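A minimal numerical sketch of points [2] and [3]: for a single pattern and a small enough step, a weight change built only from the signs of the gradient components still has a negative inner product with the gradient, so that pattern's error drops. The network size, random seed, target and learning rate below are illustrative assumptions, not values taken from any message in this thread, and the gradient is obtained by central differences purely to keep the sketch short.

import numpy as np

rng = np.random.default_rng(0)

# A tiny one-hidden-layer sigmoid net and a single training pattern (all values illustrative).
W1 = rng.normal(scale=0.5, size=(3, 4))    # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))    # hidden -> output weights
x  = rng.normal(size=3)
t  = np.array([0.7])                       # target for this one pattern

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(ws):
    h = sigmoid(x @ ws[0])
    y = sigmoid(h @ ws[1])
    return 0.5 * np.sum((y - t) ** 2)      # error for this single pattern

def numeric_grad(ws, eps=1e-6):
    # Central-difference gradient of the single-pattern error w.r.t. every weight.
    grads = []
    for k, w in enumerate(ws):
        g = np.zeros_like(w)
        for idx in np.ndindex(*w.shape):
            wp = [u.copy() for u in ws]; wp[k][idx] += eps
            wm = [u.copy() for u in ws]; wm[k][idx] -= eps
            g[idx] = (error(wp) - error(wm)) / (2 * eps)
        grads.append(g)
    return grads

eta = 1e-3                                 # the "sufficiently small" learning rate of point [3]
grads = numeric_grad([W1, W2])
signed = [w - eta * np.sign(g) for w, g in zip([W1, W2], grads)]

print("error before sign-only step: %.6f" % error([W1, W2]))
print("error after  sign-only step: %.6f" % error(signed))

As the caveat above notes, this says nothing about the error summed over a whole training set; only the presented pattern's error is guaranteed to fall.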
From dhw at t13.Lanl.GOV Thu Feb 11 17:19:13 1993 From: dhw at t13.Lanl.GOV (David Wolpert) Date: Thu, 11 Feb 93 15:19:13 MST Subject: new paper Message-ID: <9302112219.AA23017@t13.lanl.gov>
*************************************************************** DO NOT FORWARD TO OTHER BOARDS OR LISTS ***************************************************************
The following paper has been placed in neuroprose, under the name wolpert.nips92.ps.Z. It is a major revision of an earlier preprint on the same topic. An abbreviated version (2 fewer pages) will appear in the proceedings of NIPS 92.
ON THE USE OF EVIDENCE IN NEURAL NETWORKS. David H. Wolpert, Santa Fe Institute
Abstract: The Bayesian evidence approximation, which is closely related to generalized maximum likelihood, has recently been employed to determine the noise and weight-penalty terms for training neural nets. This paper shows that it is far simpler to perform the exact calculation than it is to set up the evidence approximation. Moreover, unlike that approximation, the exact result does not have to be re-calculated for every new data set. Nor does it require the running of complex numerical computer code (the exact result is closed form). In addition, it turns out that for neural nets, the evidence procedure's MAP estimate is *in toto* approximation error. Another advantage of the exact analysis is that it does not lead to incorrect intuition, like the claim that one can "evaluate different priors in light of the data". This paper ends by discussing sufficiency conditions for the evidence approximation to hold, along with the implications of those conditions. Although couched in terms of neural nets, the analysis of this paper holds for any Bayesian interpolation problem.
Recover the file in the usual way: unix> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. 220 cheops.cis.ohio-state.edu FTP server ready. Name: anonymous 331 Guest login ok, send ident as password. Password: {your address} 230 Guest login ok, access restrictions apply. ftp> binary 200 Type set to I. ftp> cd pub/neuroprose 250 CWD command successful. ftp> get wolpert.nips92.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for wolpert.nips92.ps.Z 226 Transfer complete. 100000 bytes sent in 3.14159 seconds ftp> quit 221 Goodbye. unix> uncompress wolpert.nips92.ps.Z unix> lpr wolpert.nips92.ps (or however you print PostScript files)
From mozer at dendrite.cs.colorado.edu Fri Feb 12 00:10:05 1993 From: mozer at dendrite.cs.colorado.edu (Michael C. Mozer) Date: Thu, 11 Feb 1993 22:10:05 -0700 Subject: connectionist models summer school -- final call for applications Message-ID: <199302120510.AA06977@neuron.cs.colorado.edu>
FINAL CALL FOR APPLICATIONS
CONNECTIONIST MODELS SUMMER SCHOOL
The University of Colorado will host the 1993 Connectionist Models Summer School from June 21 to July 3, 1993. The purpose of the summer school is to provide training to promising young researchers in connectionism (neural networks) by leaders of the field and to foster interdisciplinary collaboration. This will be the fourth such program in a series that was held at Carnegie-Mellon in 1986 and 1988 and at UC San Diego in 1990. Previous summer schools have been extremely successful and we look forward to the 1993 session with anticipation of another exciting event.
The summer school will offer courses in many areas of connectionist modeling, with emphasis on artificial intelligence, cognitive neuroscience, cognitive science, computational methods, and theoretical foundations. Visiting faculty (see list of invited faculty below) will present daily lectures and tutorials, coordinate informal workshops, and lead small discussion groups. The summer school schedule is designed to allow for significant interaction among students and faculty. As in previous years, a proceedings of the summer school will be published.
Applications will be considered only from graduate students currently enrolled in Ph.D. programs. About 50 students will be accepted. Admission is on a competitive basis. Tuition will be covered for all students, and we expect to have scholarships available to subsidize housing and meal costs, but students are responsible for their own travel arrangements.
Applications should include the following materials:
* a vita, including mailing address, phone number, electronic mail address, academic history, list of publications (if any), and relevant courses taken with instructors' names and grades received;
* a one-page statement of purpose, explaining major areas of interest and prior background in connectionist modeling and neural networks;
* two letters of recommendation from individuals familiar with the applicant's work (either mailed separately or in sealed envelopes); and
* a statement from the applicant describing potential sources of financial support available (department, advisor, etc.) for travel expenses.
Applications should be sent to: Connectionist Models Summer School c/o Institute of Cognitive Science Campus Box 344 University of Colorado Boulder, CO 80309
All application materials must be received by March 1, 1993. Admission decisions will be announced around April 15. If you have specific questions, please write to the address above or send e-mail to "cmss at cs.colorado.edu". Application materials cannot be accepted via e-mail.
Organizing Committee Jeff Elman (UC San Diego) Mike Mozer (University of Colorado) Paul Smolensky (University of Colorado) Dave Touretzky (Carnegie Mellon) Andreas Weigend (Xerox PARC and University of Colorado) Additional faculty will include: Yaser Abu-Mostafa (Cal Tech) Sue Becker (McMaster University) Andy Barto (University of Massachusetts, Amherst) Jack Cowan (University of Chicago) Peter Dayan (Salk Institute) Mary Hare (Birkbeck College) Cathy Harris (Boston University) David Haussler (UC Santa Cruz) Geoff Hinton (University of Toronto) Mike Jordan (MIT) John Kruschke (Indiana University) Jay McClelland (Carnegie Mellon) Ennio Mingolla (Boston University) Steve Nowlan (Salk Institute) Dave Plaut (Carnegie Mellon) Jordan Pollack (Ohio State) Dean Pomerleau (Carnegie Mellon) Dave Rumelhart (Stanford) Patrice Simard (ATT Bell Labs) Terry Sejnowski (UC San Diego and Salk Institute) Sara Solla (ATT Bell Labs) Janet Wiles (University of Queensland) The Summer School is sponsored by the American Association for Artificial Intelligence, the National Science Foundation, Siemens Research Center, and the University of Colorado Institute of Cognitive Science. Colorado has recently passed a law explicitly denying protection for lesbians, gays, and bisexuals. However, the Summer School does not discriminate in admissions on the basis of age, sex, race, national origin, religion, disability, veteran status, or sexual orientation.  From heiniw at sun1.eeb.ele.tue.nl Tue Feb 9 05:46:06 1993 From: heiniw at sun1.eeb.ele.tue.nl (Heini Withagen) Date: Tue, 9 Feb 1993 11:46:06 +0100 (MET) Subject: Summary of "Does backprop need the derivative ??" Message-ID: <9302091046.AA08161@sun1.eeb.ele.tue.nl> A non-text attachment was scrubbed... Name: not available Type: text Size: 191 bytes Desc: not available Url : https://mailman.srv.cs.cmu.edu/mailman/private/connectionists/attachments/00000000/7e2fa3eb/attachment-0001.ksh From kolen-j at cis.ohio-state.edu Tue Feb 9 08:46:53 1993 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Tue, 9 Feb 93 08:46:53 -0500 Subject: Does backprop need the derivative ?? In-Reply-To: Scott_Fahlman@sef-pmax.slisp.cs.cmu.edu's message of Sun, 07 Feb 93 12:56:03 EST <9302091257.AA06456@everest.eng.ohio-state.edu> Message-ID: <9302091346.AA28166@pons.cis.ohio-state.edu> From: Scott_Fahlman at sef-pmax.slisp.cs.cmu.edu Of course, a learning system implemented in analog hardware might have only a few bits of accuracy due to noise and nonlinearity in the circuits, but it wouldn't suffer from this quantization effect, since you get a sort of probabilistic dithering for free. This assumes, of course, that the mechanism is actually "computing" using the available bits. Bits are the result of binary measurements. An analog device does not normally convert voltages or currents into a binary representation and then operate on it. An analog mechanism sloppilly implementing backprop should be able to tweak the weights in the general direction, but not necessarily the same direction as theoretical backprop. John Kolen  From KRUSCHKE at ucs.indiana.edu Tue Feb 9 09:45:45 1993 From: KRUSCHKE at ucs.indiana.edu (John K. Kruschke) Date: Tue, 9 Feb 93 09:45:45 EST Subject: postdoctoral traineeships available Message-ID: POST-DOCTORAL FELLOWSHIPS AT INDIANA UNIVERSITY Postdoctoral Traineeships in MODELING OF COGNITIVE PROCESSES Please call this notice to the attention of all interested parties. 
The Psychology Department and Cognitive Science Programs at Indiana University are pleased to announce the availability of one or more Postdoctoral Traineeships in the area of Modeling of Cognitive Processes. The appointment will pay rates appropriate for a new PhD (about $18,800), and will be for one year, starting after July 1, 1993. The duration could be extended to two years if a training grant from NIH is funded as anticipated (we should receive final notification by May 1). Post-docs are offered to qualified individuals who wish to further their training in mathematical modeling or computer simulation modeling, in any substantive area of cognitive psychology or Cognitive Science. We are particularly interested in applicants with strong mathematical, scientific, and research credentials. Indiana University has superb computational and research facilities, and faculty with outstanding credentials in this area of research, including Richard Shiffrin and James Townsend, co-directors of the training program, and Robert Nosofsky, Donald Robinson, John Castellan, John Kruschke, Robert Goldstone, Geoffrey Bingham, and Robert Port. Trainees will be expected to carry out original theoretical and empirical research in association with one or more of these faculty and their laboratories, and to interact with other relevant faculty and the other pre- and postdoctoral trainees. Interested applicants should send an up to date vitae, personal letter describing their specific research interests, relevant background, goals, and career plans, and reference letters from two individuals. Relevant reprints and preprints should also be sent. Women, minority group members, and handicapped individuals are urged to apply. PLEASE NOTE: The conditions of our anticipated grant restrict awards to US citizens, or current green card holders. Awards will also have a 'payback' provision, generally requiring awardees to carry out research or teach for an equivalent period after termination of the traineeship. Send all materials to: Professors Richard Shiffrin and James Townsend, Program Directors Department of Psychology, Room 376B Indiana University Bloomington, IN 47405 We may be contacted at: 812-855-2722; Fax: 812-855-4691 email: shiffrin at ucs.indiana.edu Indiana University is an Affirmative Action Employer  From kenm at prodigal.psych.rochester.edu Tue Feb 9 10:50:49 1993 From: kenm at prodigal.psych.rochester.edu (Ken McRae) Date: Tue, 9 Feb 93 10:50:49 EST Subject: paper available Message-ID: <9302091550.AA20269@prodigal.psych.rochester.edu> The following paper is now available in pub/neuroprose. Catastrophic Interference is Eliminated in Pretrained Networks Ken McRae University of Rochester & Phil A. Hetherington McGill University When modeling strictly sequential experimental memory tasks, such as serial list learning, connectionist networks appear to experience excessive retroactive interference, known as catastrophic interference (McCloskey & Cohen,1989; Ratcliff, 1990). The main cause of this interference is overlap among representations at the hidden unit layer (French, 1991; Hetherington,1991; Murre, 1992). This can be alleviated by constraining the number of hidden units allocated to representing each item, thus reducing overlap and interference (French, 1991; Kruschke, 1992). When human subjects perform a laboratory memory experiment, they arrive with a wealth of prior knowledge that is relevant to performing the task. 
If a network is given the benefit of relevant prior knowledge, the representation of new items is constrained naturally, so that a sequential task involving novel items can be performed with little interference. Three laboratory memory experiments (ABA free recall, serial list, and ABA paired-associate learning) are used to show that little or no interference is found in networks that have been pretrained with a simple and relevant knowledge base. Thus, catastrophic interference is eliminated when critical aspects of simulations are made to be more analogous to the corresponding human situation. Thanks again to Jordan Pollack for maintaining this electronic library. An example of how to retrieve mcrae.pretrained.ps.Z: your machine> ftp archive.cis.ohio-state.edu Connected to archive.cis.ohio-state.edu. 220 archive FTP server (Version 6.15 Thu Apr 23 15:28:03 EDT 1992) ready. Name (archive.cis.ohio-state.edu:kenm): anonymous 331 Guest login ok, send e-mail address as password. Password: 230 Guest login ok, access restrictions apply. ftp> cd pub/neuroprose 250-Please read the file README 250- it was last modified on Mon Feb 17 15:51:43 1992 - 357 days ago 250-Please read the file README~ 250- it was last modified on Wed Feb 6 16:41:29 1991 - 733 days ago 250 CWD command successful. ftp> binary 200 Type set to I. ftp> get mcrae.pretrained.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for mcrae.pretrained.ps.Z (129046 bytes). 226 Transfer complete. local: mcrae.pretrained.ps.Z remote: mcrae.pretrained.ps.Z 129046 bytes received in 30 seconds (4.2 Kbytes/s) ftp> quit 221 Goodbye. your machine> uncompress mcrae.pretrained.ps.Z your machine> then print the file  From kolen-j at cis.ohio-state.edu Tue Feb 9 13:31:43 1993 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Tue, 9 Feb 93 13:31:43 -0500 Subject: Test & Derivatives in Backprop Message-ID: <9302091831.AA00142@pons.cis.ohio-state.edu> [I hope that this makes it to connectionists, the last couple of postings haven't made it back. So I have summarized these replies in one message for general consumption.] Regarding the latest talk about derivatives in backprop, I had looked into replacing the different mathematical operations with other, more implementation-amenable operations. This included replacing the derivative of the squashing function with d(x)=min(x,1-x). The results of these tests show that backprop is pretty stable as long as the qualitative shape of the operations are maintained. If you replace the derivative with a constant or linear (wrt activation) function it doesn't work at all for the learning tasks I considered. As long as the derivative replacement is minimal in the extreme activations and maximal at 0.5 (wrt the traditional sigmoid), the operation will not suffer dramatically. After reading Fahlman's observation about loosing bits to noise I had the following response. Bits come from binary decisions. Analog systems don't do that in normal processing, normally some continuous value affects another continuous value. No where do they perform A/D conversion and then operate on the bits. If there is no measurement device, then talking about bits doesn't make sense. John Kolen  From guy at cs.uq.oz.au Tue Feb 9 17:25:35 1993 From: guy at cs.uq.oz.au (guy@cs.uq.oz.au) Date: Wed, 10 Feb 93 08:25:35 +1000 Subject: Does backprop need the derivative ?? 
Message-ID: <9302092225.AA06661@client>
The question has been asked whether the full derivative is needed for backprop to work, or whether the sign of the derivative is sufficient. As far as I am aware, the discussion has not defined at what point the derivative is truncated to +/-1. This might occur (1) for each input/output pair, when the error is fed into the output layer; (2) in epoch-based learning, where the exact derivative of each weight over the training set might be computed but the update to the weight truncated; or (3...) in many intermediate cases.
I believe one problem with limited-precision weights is as follows. The magnitude of the update may be smaller than the limit of precision on the weight (which has much greater magnitude). If the machine arithmetic then rounds the updated weight to the nearest representable value, the updated weight will be rounded to its old value, and no learning will occur.
I am co-author of a technical report which addressed this problem. In our algorithm, weights had very limited precision but their derivatives over the whole training set were computed exactly. The weight update step would shift the weight value to the next representable value with a probability proportional to the size of the derivative. In our non-exhaustive testing, we found that very limited precision weights and activations could be used. The technical report is available in hardcopy (limited numbers) and PostScript. My addresses are "guy at cs.uq.oz.au" and "Guy Smith, Department of Computer Science, The University of Queensland, St Lucia 4072, Australia". Guy Smith.
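A small sketch of the update rule as it is described above: weights live on a coarse grid, and each weight either stays put or moves one grid step downhill, with probability proportional to the magnitude of its exactly computed batch derivative. The grid spacing, the probability scaling and the example numbers are assumptions made only for illustration, not details from the technical report.

import numpy as np

rng = np.random.default_rng(1)

def stochastic_grid_update(w, grad, grid=0.125, eta=1.0):
    # Move a weight one grid step downhill with probability proportional to
    # the magnitude of its derivative (clipped at 1); otherwise leave it alone.
    p = np.clip(eta * np.abs(grad) / grid, 0.0, 1.0)
    move = rng.random(size=w.shape) < p
    return w - grid * np.sign(grad) * move

w = np.array([0.250, -0.125, 0.500])       # weights confined to a 0.125 grid
g = np.array([0.030, -0.004, 0.000])       # exactly computed batch derivatives
for _ in range(3):
    w = stochastic_grid_update(w, g)
    print(w)

In expectation each move equals an ordinary gradient step of size eta, which is why learning can still proceed even though every individual update is either zero or a full grid step.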
From meng at spring.kuee.kyoto-u.ac.jp Wed Feb 10 11:58:19 1993 From: meng at spring.kuee.kyoto-u.ac.jp (meng@spring.kuee.kyoto-u.ac.jp) Date: Wed, 10 Feb 93 11:58:19 JST Subject: Does backprop need the derivative ?? In-Reply-To: Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU's message of Sun, 07 Feb 93 13:02:42 EST <9302091925.AA12414@ntt-sh.ntt.jp> 9 Feb 93 1:36:51 EST 9 Feb 93 1:35:08 EST 7 Feb 93 13:03:24 EST Message-ID: <9302100258.AA20634@spring.kuee.kyoto-u.ac.jp>
Thinking about it, it seems that the derivative can always be replaced by a sufficiently small constant. That is, for a certain training set and a certain requirement of precision on the output units, you can find a constant small enough that, with the same starting point, it will find the same minimum for the same network as an algorithm that uses the derivative. The problem with this, of course, is that the constant may be so small that the training time becomes prohibitive, while the motivation for using such a constant is to speed up training. The reason this works in a lot of instances is, I think, that the requirement of precision is wide enough to let the network jump into a region that is sufficiently close to a minimum. A situation where it wouldn't work would be one where the network is moving in the right direction but jumping too far, i.e. jumping from one side of a valley to the other alternately, never landing within a region that would give convergence within the requirements set. The use of the derivative solves this by getting smaller when approaching a minimum. Another possibility is that, using a constant, the network might settle in another minimum (or try to settle in another ("wider") minimum) by virtue of "seeing" the error surface as more coarse grained than the version using a derivative. In some cases, if you're lucky (i.e. the network has a good initial state in relation to a minimum and the constant you're using), you might hit the bull's eye; with another initial state you might be oscillating around the solution (i.e. the error goes up and down without getting within the required limit). In such a case you could switch to using the derivative or simply decrease the constant (maybe how much could be computed on the basis of the increase in error? Just an idea). These are just some thoughts on the subject, no empirical study undertaken. Tore
From "MVUB::BLACK%hermes.mod.uk" at relay.MOD.UK Fri Feb 12 09:50:00 1993 From: "MVUB::BLACK%hermes.mod.uk" at relay.MOD.UK (John V. Black @ DRA Malvern) Date: Fri, 12 Feb 93 14:50 GMT Subject: IEE Third International Conference on ANN's (Registration Announcement) Message-ID:
CONFERENCE ANNOUNCEMENT =======================
IEE Third International Conference on Artificial Neural Networks Brighton, UK, 25-27 May 1993. -----------------------------------------------------
This conference, organised by the Institution of Electrical Engineers, will cover up-to-date reports on the current state of research on Artificial Neural Networks, including theoretical understanding of fundamental structures, learning algorithms, implementation and applications. Over 70 papers will be presented in formal and poster sessions under the following headings: APPLICATIONS, ARCHITECTURES, VISION, CONTROL & ROBOTICS, MEDICAL SYSTEMS, NETWORK ANALYSIS. In addition there will be a small exhibition and publishers' display, a Civic Reception and a Conference Dinner.
Registration fees are as follows: Member (IEE/associated societies) 235 pounds sterling (inc 35 pounds VAT); Non-member 294 pounds sterling (inc 43.79 pounds VAT); Research Student or Retired 83 pounds sterling (inc 12.36 pounds VAT).
Further information, including the full programme, is available from: Sheila Griffiths, ANN93 Secretariat, Conference Services, Institution of Electrical Engineers, Savoy Place, London WC2R 0BL, UK. Tel: 071 344 5478/5477 Fax: 071 497 3633 Telex: 261776 IEE LDN G
John Black (jvb%hermes.mod.uk at relay.mod.uk) E-mailing for David Lowe
From kolen-j at cis.ohio-state.edu Fri Feb 12 08:11:58 1993 From: kolen-j at cis.ohio-state.edu (john kolen) Date: Fri, 12 Feb 93 08:11:58 -0500 Subject: Does backprop need the derivative ?? In-Reply-To: Mark Evans's message of Thu, 11 Feb 93 10:26:03 GMT <3468.9302111026@it-research-institute.brighton.ac.uk> Message-ID: <9302121311.AA20446@pons.cis.ohio-state.edu>
When I used the term stable in my previous posting, I did not intend the mathematical notion of stability as applied to a control system. What I meant was that the apparent behavior of the network, learning a set of associations of patterns, was unaffected by quantitative changes in these operations. An analogy I often use is the symbolic dynamics of unimodal iterated function systems. As long as a small number of qualitative conditions are true, the system will exhibit the same symbol dynamics as other functions for which the conditions hold, regardless of the numerical differences between the functions. Thus the bifurcation diagrams of rx(1-x) and a bump made up of sigmoids will exhibit the same type of period-doubling cascades. Even if it weren't mathematically stable, but were guaranteed to pass through a region of weight space with usable weights, most of the NN community would find it useful. John
From shim at marlin.nosc.mil Fri Feb 12 13:00:08 1993 From: shim at marlin.nosc.mil (Randy L. Shimabukuro) Date: Fri, 12 Feb 93 10:00:08 -0800 Subject: Summary of "Does backprop need the derivative ??"
Message-ID: <9302121800.AA01359@marlin.nosc.mil>
Congratulations on initiating a very lively discussion. From reading the responses though, it appears that people are interpreting your question differently. At the risk of adding to the confusion let me try to explain. It seems that some people are talking about the derivative of the transfer function (F') while others are talking about the gradient of the error function. We have looked at both cases:
We approximate F' in a manner similar to that suggested by George Bolt, letting F'(|x|) -> 1 for |x| < r and F'(|x|) -> a for |x| >= r, where a is a small positive constant and r is a point where F'(r) is approximately 1.
We have also, in a sense, approximated the gradient of the error function by quantizing the weight updates. This is similar to what Peterson and Hartman call "Manhattan updating". In this case it is important to preserve the sign of the derivative.
We have found that the first type of approximation has very little effect on back propagation. Depending on the problem, the second type sometimes shortens the learning time and sometimes prevents the network from learning. In some cases it helps to decrease the size of the updates as learning progresses. Randy Shimabukuro
From hartman%pav.mcc.com at mcc.com Sat Feb 13 17:36:04 1993 From: hartman%pav.mcc.com at mcc.com (E. Hartman) Date: Sat, 13 Feb 93 16:36:04 CST Subject: Re. does bp need the derivative? Message-ID: <9302132236.AA01583@energy.pav.mcc.com>
Re. the question of the derivative in backprop, Javier Movellan and Randy Shimabukuro mentioned the "Manhattan updating" discussed in Peterson and Hartman ("Explorations of the Mean Field Theory Learning Algorithm", Neural Networks, Vol. 2, pp. 475-494, 1989). This technique computes the gradient exactly, but then keeps only the signs of the components and takes fixed-size weight steps (each weight is changed by a fixed amount, either up or down).
We used this technique to advantage, both in backprop and mean field theory nets, on problems with inconsistent data -- data containing exemplars with identical inputs but differing outputs (one-to-many mapping). (The problem in the paper was a classification problem drawn from overlapping Gaussian distributions.)
The reason that this technique helped on this kind of problem is the following. Since the data was highly inconsistent, we found that before taking a step in weight space, it helped to average out the data inconsistencies by accumulating the gradient over a large number of patterns (large batch training). But, typically, it happens that some components of the gradient don't "average out" nicely and instead become very large. So the components of the gradient vary greatly in magnitude, which makes choosing a good learning rate difficult. "Manhattan updating" makes all the components equal in magnitude. We found it necessary to slowly reduce the step size as training proceeds. Eric Hartman
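A compact sketch of the batch "Manhattan updating" recipe described in these two messages: accumulate the exact gradient over the whole batch, keep only the signs, step every weight by the same fixed amount, and slowly shrink that amount. The single logistic unit, the label noise standing in for inconsistent data, and the step-size schedule are assumptions for illustration only, not the setup of the Peterson and Hartman paper.

import numpy as np

rng = np.random.default_rng(0)

# Noisy, overlapping two-class data: similar inputs can carry different labels,
# a crude stand-in for the "inconsistent data" discussed above.
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=2.0, size=200) > 0).astype(float)

w, step = np.zeros(3), 0.05
for epoch in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))     # forward pass over the whole batch
    grad = X.T @ (p - y) / len(y)          # exact batch gradient (logistic loss)
    w -= step * np.sign(grad)              # fixed-size step, signs only
    step *= 0.99                           # slowly reduce the step size
print("learned weights:", np.round(w, 2))

Because every component moves by the same amount, an occasional very large gradient component no longer dictates the learning rate, and the decaying step size takes over the role that the shrinking derivative normally plays near a minimum.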
From marwan at sedal.su.oz.au Sat Feb 13 03:03:55 1993 From: marwan at sedal.su.oz.au (Marwan Jabri) Date: Sat, 13 Feb 1993 19:03:55 +1100 Subject: Test & Derivatives in Backprop Message-ID: <9302130803.AA03429@sedal.sedal.su.OZ.AU>
> From: john kolen
>
> [I hope that this makes it to connectionists, the last couple of postings
> haven't made it back. So I have summarized these replies in one message
> for general consumption.]
>
> Regarding the latest talk about derivatives in backprop, I had looked into
> replacing the different mathematical operations with other, more
> implementation-amenable operations. This included replacing the
> derivative of the squashing function with d(x)=min(x,1-x). The results of
> these tests show that backprop is pretty stable as long as the qualitative
> shape of the operations are maintained. If you replace the derivative with
> a constant or linear (wrt activation) function it doesn't work at all for
> the learning tasks I considered. As long as the derivative replacement is
> minimal in the extreme activations and maximal at 0.5 (wrt the traditional
> sigmoid), the operation will not suffer dramatically.
>
> After reading Fahlman's observation about loosing bits to noise I had the
> following response. Bits come from binary decisions. Analog systems
> don't do that in normal processing, normally some continuous value affects
> another continuous value. No where do they perform A/D conversion and then
> operate on the bits. If there is no measurement device, then talking about
> bits doesn't make sense.
>
> John Kolen
>
Are we talking about analog implementations? I hope so, because I am. If not, then forget this message. The derivative issue boils down to whether you can implement the approximation cheaply, whatever it is. The implication for training speed depends on how good your gradient approximations are. The bit-width issue boils down to how you will implement your storage (weights). Whether you use analog EEPROM, RAM converted with DACs or whatever, you have to deal with bit effects. That is, unless you have a new analog high-precision storage device that can be implemented cheaply, in which case I will be eager to learn about it. If you have the analog dream device, then your next problem in analog implementation is the signal/noise ratio. That is, unless your analog circuits are noiseless. Marwan
------------------------------------------------------------------- Marwan Jabri Email: marwan at sedal.su.oz.au Senior Lecturer Tel: (+61-2) 692-2240 SEDAL, Electrical Engineering, Fax: 660-1228 Sydney University, NSW 2006, Australia Mobile: (+61-18) 259-086
From miller at picard.ads.com Mon Feb 15 11:32:44 1993 From: miller at picard.ads.com (Kenyon Miller) Date: Mon, 15 Feb 93 11:32:44 EST Subject: Summary of "Does backprop need the derivative ??" Message-ID: <9302151632.AA03270@picard.ads.com>
Paul Munro writes:
> [3] This implies that the signs of the error derivatives are adequate to reduce
> the error, assuming the learning rate is sufficiently small,
> since any two vectors with all components of the same sign
> must have a positive inner product! [They lie in the same
> orthant of the space]
I believe a critical point is being missed, that is, the derivative is being replaced by its sign at every stage in applying the chain rule, not just to the initial backpropagation of the error. Consider the following example:

        ----n2-----
       /           \
  w--n1             n4
       \           /
        ----n3-----

In other words, there is an output neuron n4 which is connected to two neurons n2 and n3, each of which is connected to neuron n1, which has a weight w. Suppose the weight connecting n2 to n4 is negative and all other connections in the diagram are positive. Suppose further that n2 is saturated and none of the other neurons are saturated. Now, suppose that n4 must be decreased in order to reduce the error. Backpropagating along the n4-n2-n1 path, w receives an error term which would tend to increase n1, while backpropagating along the n4-n3-n1 path would result in a term which would tend to decrease n1. If the true sigmoid derivative were used, the force to increase n1 would be dampened because n2 is saturated, and the net result would be to decrease w and therefore decrease n1 and n3 and decrease n4. However, replacing the sigmoid derivative with a constant could easily allow the n4-n2-n1 path to dominate, and the error at the output would increase. Thus, it is not a sound thing to do regardless of how many patterns are used for training. -Ken Miller.
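A small numeric check of this example, with weights chosen (as pure illustration) so that n2 sits deep in saturation: the exact chain rule and a constant-derivative chain rule assign opposite signs to dE/dw, and a step taken under the constant rule raises this pattern's error.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights for the four-unit example above; w24 is the one negative
# connection, and w12 is large so that n2 is strongly saturated.
x, w = 1.0, 2.2            # input and the weight feeding n1
w12, w13 = 6.0, 1.0        # n1 -> n2, n1 -> n3
w24, w34 = -5.0, 1.0       # n2 -> n4, n3 -> n4
target = 0.0               # we want the output n4 to come down

def forward(w):
    n1 = sigmoid(w * x)
    n2 = sigmoid(w12 * n1)
    n3 = sigmoid(w13 * n1)
    n4 = sigmoid(w24 * n2 + w34 * n3)
    return n1, n2, n3, n4

def dE_dw(w, deriv):
    # Backprop of E = 0.5*(n4 - target)^2, with `deriv` standing in for sigmoid'.
    n1, n2, n3, n4 = forward(w)
    d4 = (n4 - target) * deriv(n4)
    d2 = d4 * w24 * deriv(n2)
    d3 = d4 * w34 * deriv(n3)
    d1 = (d2 * w12 + d3 * w13) * deriv(n1)
    return d1 * x

exact = dE_dw(w, deriv=lambda o: o * (1.0 - o))   # true sigmoid derivative
const = dE_dw(w, deriv=lambda o: 1.0)             # derivative replaced by a constant

print("exact dE/dw:          %+.2e" % exact)      # positive: w should decrease
print("constant-rule dE/dw:  %+.2e" % const)      # negative: rule says increase w

eta = 0.25
E0 = 0.5 * (forward(w)[3] - target) ** 2
E1 = 0.5 * (forward(w - eta * const)[3] - target) ** 2
print("error before %.6e, after constant-rule step %.6e" % (E0, E1))

The exact rule damps the n4-n2-n1 path by the near-zero derivative at the saturated n2, so the n4-n3-n1 path wins and w is pushed down; the constant rule lets the saturated path dominate and pushes w the wrong way, which is exactly the point made above.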
From kanal at cs.UMD.EDU Mon Feb 15 12:35:27 1993 From: kanal at cs.UMD.EDU (Laveen N. Kanal) Date: Mon, 15 Feb 93 12:35:27 -0500 Subject: non-Turing machines? Message-ID: <9302151735.AA10355@mimsy.cs.UMD.EDU>
I have only tuned into part of the quantum computers discussion and so I don't know if the following references have been mentioned in the discussion. Having speculated about natural perception not being modelable by Turing machines, I was not surprised to find similar speculation in the book Renewing Philosophy by Hilary Putnam (Harvard Univ. Press, 1992), which I picked up at the bookstore the other day. But Putnam does cite two specific references which may be of interest in this context.
Marian Boykan Pour-El and Ian Richards, "The Wave Equation with Computable Initial Data Such That Its Unique Solution Is Not Computable," Advances in Mathematics, 39 (1981), pp. 215-239.
Georg Kreisel's review of the above paper in The Journal of Symbolic Logic, 47, No. 4 (1982), pp. 900-902.
From ala at sans.kth.se Tue Feb 16 07:51:57 1993 From: ala at sans.kth.se (Anders Lansner) Date: Tue, 16 Feb 1993 13:51:57 +0100 Subject: MCPA'93 Call for Contributions Message-ID: <199302161251.AA02772@occipitalis.sans.kth.se>
MCPA'93 Final Call
**************************************************************************** * Invitation to * * International Workshop on Mechatronical Computer Systems * * for Perception and Action, June 1-3, 1993 * * Halmstad University, Sweden * * * * Final Call for Contributions * ****************************************************************************
Mechatronical Computer Systems that Perceive and Act - A New Generation =======================================================================
Mechatronical computer systems, which we will see in advanced products and production equipment of tomorrow, are designed to do much more than calculate. The interaction with the environment and the integration of computational modules in every part of the equipment, engaging in every aspect of its functioning, put new, and conceptually different, demands on the computer system. A development towards a complete integration between the mechanical system, advanced sensors and actuators, and a multitude of processing modules can be foreseen. At the systems level, powerful algorithms for perceptual integration, goal-direction and action planning in real time will be critical components. The resulting action-oriented systems may interact with their environments by means of sophisticated sensors and actuators, often with a high degree of parallelism, and may be able to learn and adapt to different circumstances and environments. Perceiving the objects and events of the external world and acting upon the situation in accordance with an appropriate behaviour, whether programmed, trained, or learned, are key functions of these next-generation computer systems.
The aim of this first International Workshop on Mechatronical Computer Systems for Perception and Action is to gather researchers and industrial development engineers, who work with different aspects of this exciting new generation of com- puting systems and computer-based applications, to a fruitful exchange of ideas and results and, often interdisciplinary, dis- cussions. Workshop Form ============= One of the days of the workshop will be devoted to true work- shop activities. The objective is to identify and propose research directions and key problem areas in mechatronical computing systems for perception and action. In the morning session, invited speakers, as well as other workshop dele- gates, will give their perspectives on the theme of the work- shop. The work will proceed in smaller working groups during the afternoon, after which the conclusions will be presented in a plenary session. The scientific programme will also include presentations of research results in oral or poster form, or as demonstrations. Subject Areas ============= Relevant subject areas are e.g.: Real-Time Systems Architecture and Real-Time Software. Sensor Systems and Sensory/Motor Coordination. Biologically Inspired Systems. Applications of Unsupervised and Reinforcement Learning. Real-Time Decision Making and Action Planning. Parallel Processor Architectures for Embedded Systems. Development Tools and Support Systems for Mechatronical Computer Systems and Applications. Dependable Computer Systems. Robotics and Machine Vision. Neural Networks in Real-Time Applications. Advanced Mechatronical Computing Demands in Industry. Contributions to the Workshop ============================= The programme committee welcomes all kinds of contribu- tions - papers to be presented orally or as posters, demon- strations, etc. - in the areas listed above, as well as other areas of relevance to the theme of the workshop. >From the workshop point of view, it is NOT essential that con- tributions contain only new, unpublished results. Rather, the new, interdisciplinary collection of delegates that can be expected at the workshop may motivate presentations of ear- lier published results. Specifically, we invite delegates to state their view of the workshop theme, including identification of key research issues and research directions. The planning of the workshop day will be based on these submitted statements , some of which will be presented in the plenary session, some of which in the smaller working groups. DEADLINES ========= Febr. 26, 1993: Submissions of extended abstracts or full papers. Submissions of statements regarding perspectives on the conference theme, that the delegate would like to present at the workshop (4 pages max). Submissions of descriptions of demonstrations, etc. March 19, 1993: Notification of acceptance. Preliminary final programme. May 1, 1993: Final papers and statements. All submissions shall be sent to the workshop secretariat, see address box. Please send two copies. Submissions must include name(s) and affiliation(s) of author(s) and full address, including phone and fax number and electronic mail address (if possible). The accepted papers and statements will be assembled into a Proceedings book given to the Workshop attendees. After the workshop a revised version of the proceedings, including results of the workshop discussions, will be published by an international publisher. Invited speakers ================ Prof. John A. Stankovic, University of Massachusetts, USA, and Scuola Superiore S. 
Anna, Pisa, Italy: "Major Real-Time Challenges for Mechatronical Systems" Prof. Jan-Olof Eklundh, CVAP, Royal Institute of Technology, Stockholm, Sweden: "Computer Vision and Seeing Systems" Prof. Dave Cliff, School of Cognitive and Computing Sciences and Neuroscience IRC, University of Sussex, U.K. "Animate Vision in an Artificial Fly: A Study in Computational Neuroethology" & "Visual Sensory-Motor Networks Without Design: Evolving Visually Guided Robots" (More invited speakers to be confirmed.) ORGANISERS ========== The workshop is arranged by CCA, the Centre for Computer Architecture at Halmstad University, Sweden, in cooperation with the DAMEK Mechatronics Research Group and the SANS (Studies of Artificial Neural Systems) Research Group, both at the Royal Institute of Technology (KTH), Stockholm, Sweden, and the Department of Computer Engineering, Chalmers University of Technology, Gothenburg, Sweden. The Organising Committee includes: Lars Bengtsson, CCA, Organising Chair Anders Lansner, SANS Kenneth Nilsson, CCA Bertil Svensson, Chalmers University of Technology and CCA, Programme and Conference Chair Per-Arne Wiberg, CCA Jan Wikander, DAMEK The workshop is supported by SNNS, the Swedish Neural Network Society. It is financially supported by Halmstad University, the County Administration of Halland, Swedish industries and NUTEK (the Swedish National Board for Industrial and Technical Development). Programme Committee =================== Bertil Svensson, Sweden (chair) Paolo Ancilotti, Italy Lars Bengtsson, Sweden Giorgio Buttazzo, Italy Robert Forchheimer, Sweden Anders Lansner, Sweden Kenneth Nilsson, Sweden John Stankovic, Italy and USA Jan Torin, Sweden Hendrik van Brussel, Belgium Per-Arne Wiberg, Sweden Jan Wikander, Sweden Workshop Language: English Workshop fee: SEK 2 000, incl. proceedings, lunch- eons, reception and workshop dinner. Early registration (before April 20) SEK 1750. The number of attendees to the workshop is limited. Among those not submitting a contribution attendance will be given on a first-come, first-served basis. Social Activities ================= Reception, workshop dinner. Deep sea fishing tour or a visit at Varberg castle/fortress and museum. Bring your family, a programme for accompanying persons will be arranged. How to get there ================ Halmstad is situated on the west coast of Sweden between Copenhagen and Gothenburg (major international airports). With a distance of 150 kilometres to each of these cities it is easy and convenient to reach Halmstad by train, bus or car. Halmstad Airport is linked to Stockholm International Airport (Arlanda). Flight time Stockholm - Halmstad is 50 minutes. Accomodation ============ Arrangements will be made with local hotels, both downtown Halmstad and at the seaside. Different price categories will be available. Please let us know what price category and loca- tion you prefer and we help you with the booking. Payment is made directly to the hotel. Prices (breakfast included) in SEK: CATEGORY 1: SEK 750-850 single room, 750-950 double room Downtown Single room ( ) Double room ( ) Seaside Single room ( ) Double room ( ) CATEGORY 2: SEK 400 single room, 450 double room Near town Single room ( ) Double room ( ) Transportation between the hotels and the University will be arranged. ( ) I register already now. Send preliminary programme when available. ( ) I do not register yet but want the preliminary programme when available. Name ................................................... 
...................................................... Address................................................. ....................................................... ....................................................... Tel., Fax, e-mail .................................... ........................................................ ------------------------------------------------------------------------- MCPA Workshop Centre for Computer Architecture Halmstad University Box 823 S-30118 HALMSTAD Sweden Tel. +46 35 153134 (Lars Bengtsson) Fax. +46 35 157387 email: mcpa at cca.hh.se ------------------------------------------------------------------------ END OF MESSAGE  From harris at ai.mit.edu Tue Feb 16 18:50:28 1993 From: harris at ai.mit.edu (John G. Harris) Date: Tue, 16 Feb 93 18:50:28 EST Subject: Postdoc position in computational/biological vision (learning) Message-ID: <9302162350.AA05713@portofino> One (or possibly two) postdoctoral positions are available for one or two years in computational vision starting September 1993 (flexible). The postdoc will work in Lucia Vaina's laboratory at Boston University, College of Engineering, to conduct research in learning the direction in global motion. The researchers currently involved in this project are Lucia M. Vaina, John Harris, Charlie Chubb, Bob Sekuler, and Federico Girosi. Requirements are PhD in CS or related area with experience in visual modeling or psychophysics. Knowledge of biologically relevant neural models is desirable. Stipend ranges from $28,000 to $35,000 depending upon qualifications. Deadline for application is March 1, 1993. Two letter of recommendation, description of current research and an up to date CV are required. In the research we combine computational psychophysics, neural networks modeling and analog VLSI to study visual learning specifically applied to direction in global motion. The global motion problem requires estimation of the direction and magnitude of coherent motion in the presence of noise. We are proposing a set of psychophysical experiments in which the subject, or the network must integrate noisy, spatially local motion information from across the visual field in order to generate a response. We will study the classes of neural networks which best approximate the pattern of learning demonstrated in psychophysical tasks. We will explore Hebbian learning, multilayer perceptrons (e.g. backpropagation), cooperative networks, Radial Basis Function and Hyper-Basis Functions. The various strategies and their implementation will be evaluated on the basis of their performance and their biological plausibility. For more details, contact Prof. Lucia M. Vaina at vaina at buenga.bu.edu or lmv at ai.mit.edu.  From learn at galaxy.huji.ac.il Wed Feb 17 09:37:58 1993 From: learn at galaxy.huji.ac.il (learn conference) Date: Wed, 17 Feb 93 16:37:58 +0200 Subject: Learning Days in Jerusalem Message-ID: <9302171437.AA04425@galaxy.huji.ac.il> ========== DEADLINE FOR SUBMISSIONS: March 1, 1993 ========================== THE HEBREW UNIVERSITY OF JERUSALEM THE CENTER FOR NEURAL COMPUTATION LEARNING DAYS IN JERUSALEM Workshop on Fundamental Issues in Biological and Machine Learning May 30 - June 4, 1993 Hebrew University, Jerusalem, Israel The Center for Neural Computation at the Hebrew University is a new multi- disciplinary research center for collaborative investigation of the principles underlying computation and information processing in the brain and in neuron- like artificial computing systems. 
The Center's activities span theoretical studies of neural networks in physics, biology and computer science; experimental investigations in neurophysiology, psychophysics and cognitive psychology; and applied research on software and hardware implementations. The first international symposium sponsored by the Center will be held in the spring of 1993, at the Hebrew University of Jerusalem. It will focus on theoretical, experimental and practical aspects of learning in natural and artificial systems. Topics for the meeting include: Theoretical Issues in Supervised and Unsupervised Learning Neurophysiological Mechanisms Underlying Learning Cognitive Psychology and Learning Psychophysics Applications of Machine and Neural Network Learning Invited speakers include: Moshe Abeles (Hebrew U.) Yann LeCun (AT&T) Aharon Agranat (Hebrew U.) Joseph LeDoux (NYU) Ehud Ahissar (Weizmann Inst.) Christoph von der Malsburg (U. Bochum) Asher Cohen (Hebrew U.) Yishai Mansour (Tel Aviv U.) Yuval Davidor (Weizmann Inst.) Bruce McNaughton (U. of Arizona) Yadin Dudai (Weizmann Inst.) Helge Ritter (U. Bielefeld) Martha Farah (U. Penn) David Rumelhart (Stanford) David Haussler (UCSC) Dov Sagi (Weizmann Inst.) Nathan Intrator (Tel Aviv U.) Menachem Segal (Weizmann Inst.) Larry Jacoby (McMaster U.) Alex Waibel (CMU, U. Karlsruhe) Michael Jordan (MIT) Norman Weinberger (U.C. Irvine) Participation in the Workshop is limited to 100. A small number of contributed papers will be accepted. Interested researchers and students are asked to submit registration forms by **** March 1, 1993,***** to: Sari Steinberg Bchiri Tel: (972) 2 584563 Center for Neural Computation Fax: (972) 2 584437 c/o Racah Institute of Physics E-mail: learn at galaxy.huji.ac.il Hebrew University 91904 Jerusalem, Israel To ensure participation, please send a copy of the registration form by e-mail or fax as soon as possible. Organizing Committee: Shaul Hochstein, Haim Sompolinsky, Naftali Tishby. -------------------------------------------------------------------------------- REGISTRATION FORM Please complete the following form. To ensure participation, please send a copy of this form by e-mail or fax as soon as possible to: Sari Steinberg Bchiri E-MAIL: learn at galaxy.huji.ac.il Center for Neural Computation TEL: 972-2-584563 c/o Racah Institute of Physics FAX: 972-2-584437 Hebrew University 91904 Jerusalem, Israel Registration will be confirmed by e-mail. CONFERENCE REGISTRATION Name: _________________________________________________________________________ Affiliation: __________________________________________________________________ Address: ______________________________________________________________________ City: __________________ State: ______________ Zip: _________ Country: ________ Telephone: (____)________________ E-mail address: ____________________________ REGISTRATION FEES ____ Regular registration (before March 1): $100 ____ Student registration (before March 1): $50 ____ Late registration (after March 1): $150 ____ Student late registration (after March 1): $75 Please send payment by check or international money order in US dollars made payable to: Learning Workshop with this form by March 1, 1993 to avoid late fee. ACCOMMODATIONS If you are interested in assistance in reserving hotel accommodation for the duration of the Workshop, please indicate your preferences below: I wish to reserve a single/double (circle one) room from __________ to __________, for a total of _______ nights. 
CONTRIBUTED PAPERS A very limited number of contributed papers will be accepted. Participants interested in submitting papers should complete the following and enclose a 250-word abstract.
Poster/Talk (circle one)
Title: __________________________________________________________________ __________________________________________________________________
From kak at max.ee.lsu.edu Wed Feb 17 13:26:28 1993 From: kak at max.ee.lsu.edu (Dr. S. Kak) Date: Wed, 17 Feb 93 12:26:28 CST Subject: Reprints Message-ID: <9302171826.AA05612@max.ee.lsu.edu>
Reprints of the following article are now available:
---------------------------------------------------------------- Circuits, Systems, & Signal Processing, vol. 12, 1993, pp. 263-278 ----------------------------------------------------------------
Feedback Neural Networks: New Characteristics and a Generalization
Subhash C. Kak Department of Electrical and Computer Engineering Louisiana State University, Baton Rouge, LA 70803, USA
ABSTRACT New characteristics of feedback neural networks are studied. We discuss in detail the question of updating of neurons given incomplete information about the state of the neural network. We show how the mechanism of self-indexing [Self-indexing of neural memories, Physics Letters A, Vol. 143, 293-296, 1990] for such updating provides better results than assigning 'don't know' values to the missing parts of the state vector. Issues related to the choice of the neural model for a feedback network are also considered. Properties of a new complex-valued neuron model that generalizes McCulloch-Pitts neurons are examined.
----- Note: This issue of the journal is devoted exclusively to articles on neural networks.
From radford at cs.toronto.edu Wed Feb 17 15:15:58 1993 From: radford at cs.toronto.edu (Radford Neal) Date: Wed, 17 Feb 1993 15:15:58 -0500 Subject: Paper on "A new view of the EM algorithm" Message-ID: <93Feb17.151609edt.555@neuron.ai.toronto.edu>
The following paper has been placed in the neuroprose archive, as the file 'neal.em.ps.Z':
A NEW VIEW OF THE EM ALGORITHM THAT JUSTIFIES INCREMENTAL AND OTHER VARIANTS
Radford M. Neal and Geoffrey E. Hinton Department of Computer Science University of Toronto
We present a new view of the EM algorithm for maximum likelihood estimation in situations with unobserved variables. In this view, both the E and the M steps of the algorithm are seen as maximizing a joint function of the model parameters and of the distribution over unobserved variables. From this perspective, it is easy to justify an incremental variant of the algorithm in which the distribution for only one of the unobserved variables is recalculated in each E step. This variant is shown empirically to give faster convergence in a mixture estimation problem. A wide range of other variant algorithms are also seen to be possible.
The PostScript for this paper may be retrieved in the usual fashion: unix> ftp archive.cis.ohio-state.edu (log in as user 'anonymous', e-mail address as password) ftp> binary ftp> cd pub/neuroprose ftp> get neal.em.ps.Z ftp> quit unix> uncompress neal.em.ps.Z unix> lpr neal.em.ps (or however you print PostScript files)
Many thanks to Jordan Pollack for providing this service!
Radford Neal
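A minimal sketch of the incremental variant described in this abstract, for a toy one-dimensional mixture of two Gaussians: the E step recomputes responsibilities for one data point at a time, the running sufficient statistics are patched accordingly, and the M step is applied immediately. The data, initial parameter values and number of sweeps are assumptions invented for the illustration and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])  # toy data
n, K = len(x), 2

mix = np.full(K, 1.0 / K)          # mixing proportions
mu  = np.array([-1.0, 1.0])        # component means (rough initial guesses)
var = np.ones(K)                   # component variances
R   = np.full((n, K), 1.0 / K)     # current responsibilities, one row per point

def responsibilities(xi):
    p = mix * np.exp(-0.5 * (xi - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    return p / p.sum()

# Sufficient statistics implied by the current responsibility matrix R
N, S1, S2 = R.sum(axis=0), R.T @ x, R.T @ (x ** 2)

for sweep in range(5):
    for i in range(n):
        r_new = responsibilities(x[i])
        # Incremental E step: swap point i's old contribution for its new one
        N  += r_new - R[i]
        S1 += (r_new - R[i]) * x[i]
        S2 += (r_new - R[i]) * x[i] ** 2
        R[i] = r_new
        # M step, applied immediately, from the running statistics
        mix, mu, var = N / n, S1 / N, S2 / N - (S1 / N) ** 2

print("mixing proportions:", np.round(mix, 2))
print("means:", np.round(mu, 2))
print("variances:", np.round(var, 2))

Because the statistics are corrected point by point, the parameters can move after every data item instead of once per pass over the data, which is the intuition behind the faster convergence mentioned in the abstract.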
Radford Neal  From gem at cogsci.indiana.edu Thu Feb 18 09:03:23 1993 From: gem at cogsci.indiana.edu (Gary McGraw) Date: Thu, 18 Feb 93 09:03:23 EST Subject: Letter Spirit technical report available Message-ID: The following technical report from the Center for Research on Concepts and Cognition is available by ftp (only). Although the project described in the paper is not connectionism per se, it shares many of the same philosophical convictions. ---------------------------------------------------------------------- Letter Spirit: An Emergent Model of the Perception and Creation of Alphabetic Style Douglas Hofstadter & Gary McGraw The Letter Spirit project explores the creative act of artistic letter-design. The aim is to model how the $26$ lowercase letters of the roman alphabet can be rendered in many different but internally coherent styles. Viewed from a distance, the behavior of the program can be seen to result from the interaction of four emergent agents working together to form a coherent style and to design a complete alphabet: the Imaginer (which plays with the concepts behind letterforms), the Drafter (which converts ideas for letterforms into graphical realizations), the Examiner (which combines bottom-up and top-down processing to perceive and categorize letterforms), and the Adjudicator (which perceives and dynamically builds a representation of the evolving style). Creating a gridfont is an iterative process of guesswork and evaluation carried out by the four agents. This process is the ``central feedback loop of creativity''. Implementation of Letter Spirit is just beginning. This paper outlines our goals and plans for the project. --------------------------------------------------------------------------- The paper is available by anonymous ftp from: cogsci.indiana.edu (129.79.238.12) as pub/hofstadter+mcgraw.letter-spirit.ps.Z and in neuroprose: archive.cis.ohio-state.edu (128.146.8.52) as pub/neuroprose/hofstadter.letter-spirit.ps.Z Unfortunately, we are not able to distribute hardcopy at this time. *---------------------------------------------------------------------------* | Gary McGraw gem at cogsci.indiana.edu | (__) | |--------------------------------------------------| (oo) | | Center for Research on Concepts and Cognition | /-------\/ | | Department of Computer Science | / | || | | Indiana University | * ||----|| | | mcgrawg at moose.indiana.edu | ^^ ^^ | *---------------------------------------------------------------------------*  From mwitten at hermes.chpc.utexas.edu Thu Feb 18 10:00:41 1993 From: mwitten at hermes.chpc.utexas.edu (mwitten@hermes.chpc.utexas.edu) Date: Thu, 18 Feb 93 9:00:41 CST Subject: Computational Neurosciences Workshop Message-ID: <9302181500.AA03619@morpheus.chpc.utexas.edu> *********************************************************************** ** ** ** UNIVERSITY OF TEXAS SYSTEM CENTER FOR HIGH PERFORMANCE COMPUTING ** ** ** ** Workshop Series In Computational Medicine And Public Health** ** ** ** Announces ** ** ** ** A Workshop On Computational Neurosciences ** ** ** ** 14-15 May 1993 ** ** ** ** Austin, Texas ** ** ** *********************************************************************** Workshop Director: ----------------- Dr. 
Matthew Witten Associate Director, University of Texas System - CHPC Balcones Research Center 10100 Burnet Road, CMS 1.154 Austin, TX 78758-4497 USA Phone: (512) 471-2472 or (800) 262-2472 Fax : (512) 471-2445 email: m.witten at chpc.utexas.edu m.witten at uthermes.bitnet ***** Preliminary Program ***** List Of Current Speakers: ------------------------- Dr. Peter Fox, Director, Research Imaging Center, UT HSC San Antonio Dr. Terry Mikiten, Associate Dean, Grad School of Biomedical Sciences, UT HSC San Antonio Dr. Robert Wyatt, Director, Institute For Theoretical Chemistry, UT Austin Dr. Elizabeth Thomas, Department of Chemistry, UT Austin Dr. George Adomian, Director, General Analytics Corporation, Athens, Georgia Dr. George Moore, Department of Biomedical Engineering, University of Southern California, Los Angeles, CA Dr. William Softky, California Institute of Technology, Pasadena, CA Dr. Cathy Wu, Department of Biomathematics and Computer Science, UT Health Center, Tyler, TX Dr. Dan Levine, Department of Mathematics, University of Texas at Arlington, Arlington, TX Dr. Michael Liebman, Senior Scientist, Amoco Technology Company, Naperville, Illinois Dr. George Stanford, Learning Abilities Center, UT Austin Dr. Tom Oakland, School of Education, UT Austin Dr. Matthew Witten, Associate Director, UT System - CHPC Objective, Agenda and Participants: ---------------------------------- The 1990's have been declared the Decade of the Mind. Understanding the mind requires the understanding of a wide variety of topics in the neurosciences. This Workshop is part of an ongoing series of workshops being held at the UT System Center For High Performance Computing, addressing issues of high performance computing and its role in medicine, dentistry, allied health disciplines, and public health. Prior workshops have covered Computational Chemistry and Molecular Design, and Computational Issues in the Life Sciences and Medicine. Upcoming workshops will focus on the subject areas of Computational Molecular Biology and Genetics, Biomechanics, and Physiological Modeling and Simulation. The purpose of this Workshop On Computational Neurosciences is to bring together interested scientists for the purpose of introducing them to state-of-the-art thinking and applications in the domain of neuroscience. Topics to be discussed range across the disciplines of neurosimulation, cognitive neuroscience, neural nets and their theory/application to a variety of problems, methods for solving numerical problems arising in neurology, learning abilities and disabilities, and neurological imaging. Lectures will be presented in a tutorial fashion, and time for questions and answers will be allowed. Attendance is open to anyone. A background in the neurosciences is not required. The size of the workshop is limited due to seating constraints. It is best to register as soon as possible. Schedule: -------- 14 May 1993 - Friday 8:00am - 9:00am Registration and Refreshments 9:00am - 9:15am Opening Remarks - Dr. James C. Almond, Director, UT System CHPC 9:15am - 10:00am Conference Overview - Dr. Matthew Witten 10:00am - 11:00am Dr. Peter Fox 11:00am - 11:30am Coffee Break 11:30am - 12:30pm Dr. Dan Levine 12:30pm - 1:30pm Lunch Break 1:30pm - 2:30pm Dr. Michael Liebman 2:30pm - 3:30pm Dr. Cathy Wu 3:30pm - 4:00pm Coffee Break 4:00pm - 5:00pm Dr. Terry Mikiten 15 May 1993 - Saturday 8:00am - 9:00am Registration and Refreshments 9:00am - 10:00am Dr. George Moore 10:00am - 11:00am Dr. Robert Wyatt and Dr.
Elizabeth Thomas 11:00am - 11:30am Coffee Break 11:30am - 12:30pm Dr. George Adomian 12:30pm - 1:30pm Lunch Break 1:30pm - 2:30pm Dr. George Stanford and Dr. Tom Oakland 2:30pm - 3:30pm Dr. William Softky 3:30pm - 4:00pm Coffee Break 4:00pm - 5:00pm Closing Discussion and Remarks Poster Sessions: ---------------- While no poster sessions are planned, if enough conference participants indicate a desire to present a poster, we will make every attempt to accommodate the requests. If you are interested in presenting a poster at this meeting, please contact the workshop director. Conference Proceedings: ---------------------- We will make every attempt to have a publication-quality conference proceedings. All of the speakers have been asked to submit a paper covering the talk material. The proceedings will appear as a special issue of the series Advances In Mathematics And Computers In Medicine, which is part of the International Journal of Computers and Mathematics With Applications (Pergamon Press). Individuals wishing to have an appropriate paper included in this proceedings should contact the workshop director for manuscript details and deadlines. Conference Costs and Funding: ----------------------------- A nominal registration fee of US $50.00 will be charged by 1 April 93, and US $60.00 after that date. The conference proceedings will be an additional US $10.00. The conference registration fee includes luncheon and refreshments for both days of the workshop. Accommodations: ------------- There are a number of very reasonable hotels near the UT System CHPC. Additional information may be obtained by contacting the workshop coordinator at the address below. Registration and Information: ---------------------------- Registration requests and further questions should be directed to: Ms. Leslie Bockoven Administrative Associate Workshop On Computational NeuroSciences UT System - CHPC Balcones Research Center 10100 Burnet Road, CMS 1.154 Austin, TX 78758-4497 Phone: (512) 471-2472 or (800) 262-2472 Fax : (512) 471-2445 Email: neuro93 at chpc.utexas.edu neuro93 at uthermes.bitnet ============ REGISTRATION FORM FOLLOWS - CUT HERE ========== NAME (As will appear on badge): AFFILIATION (As will appear on badge): ADDRESS: PHONE: FAX : EMAIL: Please answer the following questions as appropriate: Do you wish to purchase a copy of the conference proceedings? If yes, make sure to include the proceedings purchase fee. Do you have any special dietary requirements? If yes, what are they? Do you wish to present a poster? If yes, what will the proposed title be? Do you wish to include a manuscript in the conference proceedings? If yes, what will the proposed topic be? Do you wish to be on our Workshop Series mailing list? If yes, please give the address for announcements (email is okay) Do you need a hotel reservation? Do you anticipate needing local transportation? ==================== END OF REGISTRATION FORM ============================  From gary at psyche.mit.edu Wed Feb 17 18:42:21 1993 From: gary at psyche.mit.edu (Gary Marcus) Date: Wed, 17 Feb 93 18:42:21 EST Subject: MIT Center for Cognitive Science Occasional Paper #47 Message-ID: <9302172342.AA04329@psyche.mit.edu> Would you please post the following announcement? Thank you very much. Sincerely, Gary Marcus ---- The following technical report is now available: MIT CENTER FOR COGNITIVE SCIENCE OCCASIONAL PAPER #47 German Inflection: The Exception that Proves the Rule Gary F.
Marcus MIT Ursula Brinkmann Max-Planck-Institut fuer Psycholinguistik Harald Clahsen Richard Wiese Andreas Woest Universitaet Duesseldorf Steven Pinker MIT ABSTRACT Language is often explained by generative rules and a memorized lexicon. For example, most English verbs take a regular past tense suffix (ask-asked), which is applied to new verbs (faxed, wugged), suggesting the mental rule "add -d to a Verb." Irregular verbs (break-broke, go-went) would be listed in memory. Connectionists argue instead that a pattern associator memory can store and generalize all past tense forms; irregular and regular patterns differ only because of their different numbers of verbs. We present evidence that mental rules are indispensable. A rule concatenates a suffix to a symbol for verbs, so it does not require access to memorized verbs or their sounds, but applies as the "default" whenever memory access fails. We find 20 such circumstances, including novel, unusual-sounding, and derived words; in every case, people inflect them regularly (explaining quirks like flied out, sabre-tooths, walkmans). Contrary to connectionist accounts, these effects are not due to regular words being in the majority. The German participle -t and plural -s apply to minorities of words. Two experiments eliciting ratings of novel German words show that the affixes behave like their English counterparts, as defaults. Thus default suffixation is not due to numerous regular words reinforcing a pattern in associative memory, but to a memory-independent, symbol-concatenating mental operation. --------------------------------------------------------------------------- Copies of the postscript file german.ps.Z may be obtained electronically from psyche.mit.edu as follows: unix-1> ftp psyche.mit.edu (or ftp 18.88.0.85) Connected to psyche.mit.edu. Name (psyche:): anonymous 331 Guest login ok, send ident as password. Password: yourname 230 Guest login ok, access restrictions apply. ftp> cd pub 250 CWD command successful. ftp> binary 200 Type set to I. ftp> get german.ps.Z 200 PORT command successful. 150 Opening data connection for german.ps.Z (18.88.0.154,1500) (253471 bytes). 226 Transfer complete. local: german.ps.Z remote: german.ps.Z 166433 bytes received in 4.2 seconds (39 Kbytes/s) ftp> quit unix-2> uncompress german.ps.Z unix-3> lpr -P(your_local_postscript_printer) german.ps Or, order a hardcopy by sending your physical mail address to Eleanor Bonsaint (bonsaint at psyche.mit.edu), asking for Occasional Paper #47. Please do this only if you cannot use the ftp method described above.  From josh at faline.bellcore.com Thu Feb 18 10:59:52 1993 From: josh at faline.bellcore.com (Joshua Alspector) Date: Thu, 18 Feb 93 10:59:52 EST Subject: Workshop on applications of neural networks to telecommunications Message-ID: <9302181559.AA02043@faline.bellcore.com> CALL FOR PAPERS International Workshop on Applications of Neural Networks to Telecommunications Princeton, NJ October 18-20, 1993 You are invited to submit a paper to an international workshop on applications of neural networks to problems in telecommunications. The workshop will be held in Princeton, New Jersey on October 18-20, 1993. This workshop will bring together active researchers in neural networks with potential users in the telecommunications industry in a forum for discussion of applications issues. Applications will be identified, experiences shared, and directions for future work explored.
Suggested Topics: Application of Neural Networks in: Network Management Congestion Control Adaptive Equalization Speech Recognition Security Verification Language ID/Translation Information Filtering Dynamic Routing Software Reliability Fraud Detection Financial and Market Prediction Adaptive User Interfaces Fault Identification and Prediction Character Recognition Adaptive Control Data Compression Please submit 6 copies of both a 50 word abstract and a 1000 word summary of your paper by May 14, 1993. Mail papers to the conference administrator: Betty Greer Bellcore, MRE 2P-295 445 South St. Morristown, NJ 07960 (201) 829-4993 (fax) 829-5888 bg1 at faline.bellcore.com Abstract and Summary Due: May 14 Author Notification of Acceptance: June 18 Camera-Ready Copy of Paper Due: August 13 Organizing Committee: General Chair Josh Alspector Bellcore, MRE 2P-396 445 South St. Morristown, NJ 07960-6438 (201) 829-4342 josh at bellcore.com Program Chair Rod Goodman Caltech 116-81 Pasadena, CA 91125 (818) 356-3677 rogo at micro.caltech.edu Publications Chair Timothy X Brown Bellcore, MRE 2E-378 445 South St. Morristown, NJ 07960-6438 (201) 829-4314 timxb at faline.bellcore.com Treasurer Anthony Jayakumar, Bellcore Events Coordinator Larry Jackel, AT&T Bell Laboratories University Liaison S Y Kung, Princeton INNS Liaison Bernie Widrow, Stanford University IEEE Liaison Steve Weinstein, Bellcore Industry Liaisons Miklos Boda, Ellemtel Atul Chhabra, NYNEX Michael Gell, British Telecom Lee Giles, NEC Thomas John, Southwest Bell Adam Kowalczyk, Telecom Australia Conference Administrator Betty Greer Bellcore, MRE 2P-295 445 South St. Morristown, NJ 07960 (201) 829-4993 (fax) 829-5888 bg1 at faline.bellcore.com ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- International Workshop on Applications of Neural Networks to Telecommunications Princeton, NJ October 18-20, 1993 Registration Form Name: _____________________________________________________________ Institution: __________________________________________________________ Mailing Address: ___________________________________________________________________ ___________________________________________________________________ ___________________________________________________________________ ___________________________________________________________________ Telephone: ______________________________ Fax: ____________________________________ E-mail: _____________________________________________________________ I will attend | | Send more information | | Paper enclosed | | Registration Fee Enclosed ($350) | | (please make sure your name is on the check) Registration includes Monday night reception, Tuesday night banquet, and proceedings available at the conference. Mail to: Betty Greer Bellcore, MRE 2P-295 445 South St. Morristown, NJ 07960 (201) 829-4993 (fax) 829-5888 bg1 at faline.bellcore.com Deadline for submissions: May 14, 1993 Author Notification of Acceptance: June 18, 1993 Camera-Ready Copy of Paper Due: August 13, 1993  From miller at picard.ads.com Thu Feb 18 11:51:18 1993 From: miller at picard.ads.com (Kenyon Miller) Date: Thu, 18 Feb 93 11:51:18 EST Subject: correction to backprop example Message-ID: <9302181651.AA02454@picard.ads.com> For those of you who have lost interest in the backprop debate about replacing the sigmoid derivative with a constant, please disregard this message. 
It was recently pointed out to me that my backprop example was incomplete (I don't know the name of the sender):

> The error need not be increased although w increased because W1-3 decreased
> and W3-4 decreased. With 2 decreases and 1 increase, one could still expect
> the N4 to decrease and also the error.
> Rgds,
> TH

My original example (with typographical corrections) was: Consider the following example:

         ----n2-----
        /           \
  w--n1              n4
        \           /
         ----n3-----

In other words, there is an output neuron n4 which is connected to two neurons n2 and n3, each of which is connected to neuron n1, which has a weight w. Suppose the weight connecting n2 to n4 is negative and all other connections in the diagram are positive. Suppose further that n2 is saturated and none of the other neurons are saturated. Now, suppose that n4 must be decreased in order to reduce the error. Backpropagating along the n4-n2-n1 path, w receives an error term which would tend to increase n1, while backpropagating along the n4-n3-n1 path would result in a term which would tend to decrease n1. If the true sigmoid derivative were used, the force to increase n1 would be dampened because n2 is saturated, and the net result would be to decrease w and therefore decrease n1, n3, n4, and the error. However, replacing the sigmoid derivative with a constant could easily allow the n4-n2-n1 path to dominate, and the error at the output would increase. The conclusion was that replacing the sigmoid derivative with a constant can result in increasing the error, and is therefore undesirable. CORRECTION TO THE EXAMPLE: The original example did not take into account the perturbation on W1-3 and W3-4, but the argument still holds with the following modification. Whatever the perturbation on W1-3 and W3-4, there exists (or at least a situation can be constructed such that there exists) some positive perturbation on w which will counteract those perturbations and result in an increase in the output error. Now replicate the n1-n2-n4 path as necessary by adding an n1-n5-n4 path, an n1-n6-n4 path etc. Each new path results in incrementing w by some constant delta, so there must exist some number of paths which results in a sufficient increase in w to cause an increase in the output error of the network. Thus, an example can be constructed in which the error increases, so the method cannot be considered theoretically sound. However, you can get virtually all of the benefit without any of the theoretical problems by using the derivative of the piecewise-linear function

                -------------------
               /
              /
             /
    ---------

which involves using a constant or zero for the derivative, depending on a simple range test. -Ken Miller.  From georgiou at silicon.csci.csusb.edu Thu Feb 18 13:04:55 1993 From: georgiou at silicon.csci.csusb.edu (George M. Georgiou) Date: Thu, 18 Feb 1993 10:04:55 -0800 Subject: Multivalued and Continuous Perceptrons (Preprint) Message-ID: <9302181804.AA24680@silicon.csci.csusb.edu> Rosenblatt's Perceptron Theorem guarantees us that a linearly separable function (R^n --> {0,1}) can be learned in finite time. Question: Is it possible to guarantee learning of a continuous-valued function (R^n --> (0,1)) which can be represented on a perceptron in finite time? This paper answers this question (and other ones too) in the affirmative: The Multivalued and Continuous Perceptrons by George M. Georgiou Rosenblatt's perceptron is extended to (1) a multivalued perceptron and (2) to a continuous-valued perceptron.
It is shown that any function that can be represented by the multivalued perceptron can be learned in a finite number of steps, and any function that can be represented by the continuous perceptron can be learned with arbitrary accuracy in a finite number of steps. The whole apparatus is defined in the complex domain. With these perceptrons, learnability is extended to more complicated functions than the usual linearly separable ones. The complex domain promises to be a fertile ground for neural networks research. The file in the neuroprose archive is georgiou.perceptrons.ps.Z. Comments and questions on the proofs are welcome. --------------------------------------------------------------------- Sample session to get the file: unix> ftp archive.cis.ohio-state.edu (log in as user 'anonymous', e-mail address as password) ftp> binary ftp> cd pub/neuroprose ftp> get georgiou.perceptrons.ps.Z ftp> quit unix> uncompress georgiou.perceptrons.ps.Z unix> lpr georgiou.perceptrons.ps (or however you print PostScript files) Thanks to Jordan Pollack for providing this service! --George ---------------------------------------------------- Dr. George M. Georgiou E-mail: georgiou at wiley.csusb.edu Computer Science Department TEL: (909) 880-5332 California State University FAX: (909) 880-7004 5500 University Pkwy San Bernardino, CA 92407, USA  From rangarajan-anand at CS.YALE.EDU Thu Feb 18 13:18:40 1993 From: rangarajan-anand at CS.YALE.EDU (Anand Rangarajan) Date: Thu, 18 Feb 1993 13:18:40 -0500 Subject: No subject Message-ID: <199302181818.AA24890@COMPOSITION.SYSTEMSZ.CS.YALE.EDU> Programmer/Analyst Position in Artificial Neural Networks The Yale Center for Theoretical and Applied Neuroscience (CTAN) and the Department of Computer Science Yale University, New Haven, CT We are offering a challenging position in software engineering in support of new techniques in image processing and computer vision using artificial neural networks (ANNs). 1. Basic Function: Designer and programmer for computer vision and neural network software at CTAN and the Computer Science department. 2. Major duties: (a) To implement computer vision algorithms using a Khoros (or similar) type of environment. (b) Use the aforementioned tools and environment to run and analyze computer experiments in specific image processing and vision application areas. (c) To facilitate the improvement of neural network algorithms and architectures for vision and image processing. 3. Position Specifications: (a) Education: BA, including linear algebra, differential equations, calculus. Helpful: mathematical optimization. (b) Experience: programming experience in C++ (or C) under UNIX. Some of the following: neural networks, vision or image processing applications, scientific computing, workstation graphics, image processing environments, parallel computing, computer algebra and object-oriented design. Preferred starting date: March 1, 1993. For information or to submit an application, please write: Eric Mjolsness Department of Computer Science Yale University P. O. Box 2158 Yale Station New Haven, CT 06520-2158 e-mail: mjolsness-eric at cs.yale.edu Any application must also be submitted to: Jeffrey Drexler Department of Human Resources Yale University 155 Whitney Ave.
New Haven, CT 06520 -Eric Mjolsness and Anand Rangarajan (prospective supervisors)  From pjs at bvd.Jpl.Nasa.Gov Thu Feb 18 14:49:36 1993 From: pjs at bvd.Jpl.Nasa.Gov (Padhraic Smyth) Date: Thu, 18 Feb 93 11:49:36 PST Subject: Position Available at JPL Message-ID: <9302181949.AA26236@bvd.jpl.nasa.gov> We currently have an opening in our group for a new PhD graduate in the general area of signal processing and pattern recognition. While the job description does not mention neural computation per se, it may be of interest to some members of the connectionist mailing list. For details see below. Padhraic Smyth, JPL RESEARCH POSITION AVAILABLE AT THE JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY The Communications Systems Research Section at JPL has an immediate opening for a permanent member of technical staff in the area of adaptive signal processing and statistical pattern recognition. The position requires a PhD in Electrical Engineering or a closely related field and applicants should have a demonstrated ability to perform independent research. A background in statistical signal processing is highly desirable. Background in information theory, estimation and detection, advanced statistical methods, and pattern recognition, would also be a plus. Current projects within the group include the use of hidden Markov models for change detection in time series, and statistical methods for geologic feature detection in remotely sensed image data. The successful applicant will be expected to perform both basic and applied research and to propose and initiate new research projects. Permanent residency or U.S. citizenship is not a strict requirement - however, candidates not in either of these categories should be aware that their applications will only be considered in exceptional cases. Interested applicants should send their resume (plus any supporting background material such as recent relevant papers) to: Dr. Stephen Townes JPL 238-420 4800 Oak Grove Drive Pasadena, CA 91109. (email: townes at bvd.jpl.nasa.gov)  From mpp at cns.brown.edu Thu Feb 18 15:42:34 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Thu, 18 Feb 93 15:42:34 EST Subject: A computationally efficient squashing function Message-ID: <9302182042.AA03424@cns.brown.edu> Recently on the comp.ai.neural-nets bboard, there has been a discussion of more computationally efficient squashing functions. Some colleagues of mine suggested that many members of the Connectionist mailing list may not have access to the comp.ai.neural-nets bboard; so I have included a summary below. Michael ------------------------------------------------------ David L. Elliott mentioned using the following neuron activation function: f(x) = x / (1 + |x|) He argues that this function has the same qualitative properties as the hyperbolic tangent function but is in practice faster to calculate. I have suggested a similar speed-up for radial basis function networks: f(x) = 1 / (1 + x^2) which avoids the transcendental calculation associated with Gaussian RBF nets. I have run simulations using the above squashing function in various backprop networks. The performance is comparable (sometimes worse, sometimes better) to usual training using hyperbolic tangents. I also found that the performance of networks varied very little when the activation functions were switched (i.e. two networks with identical weights but different activation functions will have comparable performance on the same data).
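For readers who want to see the shapes side by side, here is a minimal C sketch that tabulates the two rational functions above against their transcendental counterparts; the function names and sample points are illustrative choices, not taken from the posting.

/* Compare the rational squashing functions with tanh and a Gaussian.
   Names and sample points are illustrative choices only. */
#include <stdio.h>
#include <math.h>

/* s(x) = x / (1 + |x|): ranges over (-1, 1) like tanh, but needs no
   exponential; its derivative is 1 / (1 + |x|)^2. */
double fast_squash(double x) { return x / (1.0 + fabs(x)); }

/* g(x) = 1 / (1 + x^2): a rational stand-in for a Gaussian RBF response. */
double fast_rbf(double x) { return 1.0 / (1.0 + x * x); }

int main(void)
{
    double x;
    printf("    x   tanh(x)  x/(1+|x|)  exp(-x^2)  1/(1+x^2)\n");
    for (x = -3.0; x <= 3.0; x += 1.0)
        printf("%5.1f  %7.3f  %9.3f  %9.3f  %9.3f\n",
               x, tanh(x), fast_squash(x), exp(-x * x), fast_rbf(x));
    return 0;
}

Compiled with the math library (for example, cc compare.c -lm), this prints a small table showing that both rational functions approach their asymptotes more slowly than the corresponding transcendental ones.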
I tested these results on two databases: the NIST OCR database (preprocessed by Nestor Inc.) and the Turk and Pentland human face database. -------------------------------------------------------------------------------- Michael P. Perrone Email: mpp at cns.brown.edu Institute for Brain and Neural Systems Tel: 401-863-3920 Brown University Fax: 401-863-3934 Providence, RI 02912  From henrik at robots.ox.ac.uk Fri Feb 19 11:47:16 1993 From: henrik at robots.ox.ac.uk (henrik@robots.ox.ac.uk) Date: Fri, 19 Feb 93 16:47:16 GMT Subject: Squashing functions Message-ID: <9302191647.AA05729@cato.robots.ox.ac.uk> Any interesting squashing function can be stored in a table of negligible size (eg 256) with very high accuracy if linear (or higher) interpolation is used. So, on a RISC workstation, there is no need for improvements. If you deal with analog VLSI, anything goes, though ... Cheers, henrik at robots.ox.ac.uk  From cateau at tkyux.phys.s.u-tokyo.ac.jp Sat Feb 20 01:11:11 1993 From: cateau at tkyux.phys.s.u-tokyo.ac.jp (Hideyuki Cateau) Date: Sat, 20 Feb 93 15:11:11 +0900 Subject: TR: Universal Power law Message-ID: <9302200611.AA21000@tkyux.phys.s.u-tokyo.ac.jp> My collaborators and I previously reported that there is a beautiful power law in the pace of memorization in back-propagation networks. One of the networkers reacted that the law had been established only for that special model. This time we have performed an extensive simulation, reported in the technical report cateau.univ.tar.Z, to show that the law is fairly universal: Universal Power law in feed forward networks H.Cateau Department of Physics University of Tokyo Abstract: The power law in the pace of the memory, which was previously reported for the encoder, is shown to hold universally for general feed forward networks. An extensive simulation on a wide variety of feed forward networks shows this and reveals a number of interesting new observations. The PostScript for this paper may be retrieved in the usual fashion: unix> ftp archive.cis.ohio-state.edu (log in as user 'anonymous', e-mail address as password) ftp> binary ftp> cd pub/neuroprose ftp> get cateau.univ.tar.Z ftp> quit unix> uncompress cateau.univ.tar.Z unix> tar xvfo cateau.univ.tar Then you get three PS files: short.ps fig1.ps fig2.ps unix> lpr short.ps unix> lpr fig1.ps unix> lpr fig2.ps Hideyuki Cateau Particle theory group, Department of Physics, University of Tokyo, 7-3-1, Hongo, Bunkyoku, 113 Japan e-mail: cateau at tkyux.phys.s.u-tokyo.ac.jp  From soller at asylum.cs.utah.edu Fri Feb 19 16:09:43 1993 From: soller at asylum.cs.utah.edu (Jerome Soller) Date: Fri, 19 Feb 93 14:09:43 -0700 Subject: Industrial Position in Artificial Intelligence and/or Neural Networks Message-ID: <9302192109.AA22408@asylum.cs.utah.edu> I have just been made aware of a job opening in artificial intelligence and/or neural networks in southeast Ogden, UT. This company maintains strong technical interaction with existing industrial, U.S. government laboratory, and university strengths in Utah. Ogden is a half hour to 45 minute drive from Salt Lake City, UT. For further information, contact Dale Sanders at 801-625-8343 or dsanders at bmd.trw.com. The full job description is listed below. Sincerely, Jerome Soller U. of Utah Department of Computer Science and VA Geriatric, Research, Education and Clinical Center Knowledge engineering and expert systems development. Requires five years formal software development experience, including two years expert systems development.
Requires experience implementing at least one working expert system. Requires familiarity with expert systems development tools and DoD specification practices. Experience with neural nets or fuzzy logic systems may qualify as equivalent experience to expert systems development. Familiarity with Ada, C/C++, database design, and probabilistic risk assessment strongly desired. Requires strong communication and customer interface skills. Minimum degree: BS in computer science, engineering, math, or physical science. M.S. or Ph.D. preferred. U.S. Citizenship is required. Relocation funding is limited.  From delliott at eng.umd.edu Fri Feb 19 15:22:38 1993 From: delliott at eng.umd.edu (David L. Elliott) Date: Fri, 19 Feb 1993 15:22:38 -0500 Subject: Abstract Message-ID: <199302192022.AA03327@verdi.eng.umd.edu> ABSTRACT A BETTER ACTIVATION FUNCTION FOR ARTIFICIAL NEURAL NETWORKS TR 93-8, Institute for Systems Research, University of Maryland by David L. Elliott-- ISR, NeuroDyne, Inc., and Washington University January 29, 1993 The activation function s(x) = x/(1 + |x|) is proposed for use in digital simulation of neural networks, on the grounds that the computational operation count for this function is much smaller than for those using exponentials and that it satisfies the simple differential equation s' = (1 + |s|)^2, which generalizes the logistic equation. The full report, a work-in-progress, is available in LaTeX or PostScript form (two pages + titlepage) by request to delliott at src.umd.edu.  From tony at aivru.shef.ac.uk Fri Feb 19 05:59:46 1993 From: tony at aivru.shef.ac.uk (Tony_Prescott) Date: Fri, 19 Feb 93 10:59:46 GMT Subject: lectureship Message-ID: <9302191059.AA23937@aivru> LECTURESHIP IN COGNITIVE SCIENCE University of Sheffield, UK. Applications are invited for the above post tenable from 1st October 1993 for three years in the first instance but with expectation of renewal. Preference will be given to candidates with a PhD in Cognitive Science, Artificial Intelligence, Cognitive Psychology, Computer Science, Robotics, or related disciplines. The Cognitive Science degree is an integrated course taught by the departments of Psychology and Computer Science. Research in Cognitive Science was highly evaluated in the recent UFC research evaluation exercise, special areas of interest being vision, speech, language, neural networks, and learning. The successful candidate will be expected to undertake research vigorously. Supervision of programming projects will be required, hence considerable experience with Lisp, Prolog, and/or C is essential. It is expected that the appointment will be made on the Lecturer A scale (13,400-18,576 pounds(uk) p.a.) according to age and experience but enquiries from more experienced staff able to bring research resources are welcomed. Informal enquiries to Professor John P Frisby 044-(0)742-826538 or e-mail jpf at aivru.sheffield.ac.uk. Further particulars from the director of Personnel Services, The University, Sheffield S10 2TN, UK, to whom all applications including a cv and the names and addresses of three referees (6 copies of all documents) should be sent by 1 April 1993. Short-listed candidates will be invited to Sheffield for interview for which travel expenses (within the UK only) will be funded. 
Current permanent research staff in Cognitive Science at Sheffield include: Prof John Frisby (visual psychophysics), Prof John Mayhew (computer vision, robotics, neural networks) Prof Yorik Wilks (natural language understanding) Dr Phil Green (speech recognition) Dr John Porrill (computer vision) Dr Paul McKevitt (natural language understanding) Dr Peter Scott (computer assisted learning) Dr Rod Nicolson (human learning) Dr Paul Dean (neuroscience, neural networks) Mr Tony Prescott (neural networks, comparative cog sci)  From delliott at src.umd.edu Sat Feb 20 15:23:57 1993 From: delliott at src.umd.edu (David L. Elliott) Date: Sat, 20 Feb 1993 15:23:57 -0500 Subject: Corrected Abstract Message-ID: <199302202023.AA12407@newra.src.umd.edu> ABSTRACT [corrected] A BETTER ACTIVATION FUNCTION FOR ARTIFICIAL NEURAL NETWORKS TR 93-8, Institute for Systems Research, University of Maryland by David L. Elliott-- ISR, NeuroDyne, Inc., and Washington University January 29, 1993 The activation function s(x) = x/(1 + |x|) is proposed for use in digital simulation of neural networks, on the grounds that the computational operation count for this function is much smaller than for those using exponentials and that it satisfies the simple differential equation s' = (1 - |s|)^2, which generalizes the logistic equation. The full report, a work-in-progress, is available in LaTeX or PostScript form (two pages + titlepage) by request to delliott at src.umd.edu. Thanks to Michael Perrone for calling my attention to the typo in s'.  From raina at max.ee.lsu.edu Sat Feb 20 17:37:45 1993 From: raina at max.ee.lsu.edu (Praveen Raina) Date: Sat, 20 Feb 93 16:37:45 CST Subject: No subject Message-ID: <9302202237.AA13139@max.ee.lsu.edu> The following comparison between the backpropagation and Kak algorithm for training feedforward networks will be of interest to many. We took 52 training samples each having 25 input neurons and 3 output neurons.The training data taken was monthly price index of a commodity for 60 months. Monthly prices were normalised and quantized into 3 bit binary sequence. Each training sample represented prices taken over a period of 8 months (8X3=24 input neurons + 1 neuron for bias).The size of the learning window was fixed as 1 month.Binary values were used as the input for both BP and Kak algorithm. For BP the learning rate was taken as 0.45 and momentum equal to 0.55. The training samples were trained on IBM RISC 6000 machine. The training time for backpropagation was 4 minutes 5 seconds and the total number of iterations was 6101.The training time for the Kak algorithm was 5 seconds and the total number of iterations was 875. Thus, for this example the learning advantage in the Kak algorithm is 49. For larger examples the advantage becomes even greater. - Praveen Raina.  From unni at neuro.cs.gmr.com Sat Feb 20 14:57:13 1993 From: unni at neuro.cs.gmr.com (K.P.Unnikrishnan) Date: Sat, 20 Feb 93 14:57:13 EST Subject: A NEURAL COMPUTATION course reading list Message-ID: <9302201957.AA22392@neuro.cs.gmr.com> Folks: Here is the reading list for a course I offered last semester at Univ. of Michigan. Unnikrishnan --------------------------------------------------------------- READING LIST FOR THE COURSE "NEURAL COMPUTATION" EECS-598-6 (FALL 1992), UNIVERSITY OF MICHIGAN INSTRUCTOR: K. P. UNNIKRISHNAN ----------------------------------------------- A. COMPUTATION AND CODING IN THE NERVOUS SYSTEM 1. Hodgkin, A.L., and Huxley, A.F. 
A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500-544 (1952). 2a. Del Castillo, J., and Katz, B. Quantal components of the end-plate potential. J. Physiol. 124, 560-573 (1954). 2b. Del Castillo, J., and Katz, B. Statistical factors involved in neuromuscular facilitation and depression. J. Physiol. 124, 574-585 (1954). 3. Rall, W. Cable theory for dendritic neurons. In: Methods in neural modeling (Koch and Segev, eds.) pp. 9-62 (1989). 4. Koch, C., and Poggio, T. Biophysics of computation: neurons, synapses and membranes. In: Synaptic function (Edelman, Gall, and Cowan, eds.) pp. 637-698 (1987). B. SENSORY PROCESSING IN VISUAL AND AUDITORY SYSTEMS 1. Werblin, F.S., and Dowling, J.E. Organization of the retina of the mudpuppy, Necturus maculosus: II. Intracellular recording. J. Neurophysiol. 32, 339-355 (1969). 2a. Barlow H.B., and Levick, W.R. The mechanism of directionally selective units in rabbit's retina. J. Physiol. 178, 477-504 (1965). 2b. Lettvin, J.Y., Maturana, H.R., McCulloch, W.S., and Pitts, W.H. What the frog's eye tells the frogs's brain. Proc. IRE 47, 1940-1951 (1959). 3. Hubel, D.H., and Wiesel, T.N. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106-154 (1962). 4a. Suga, N. Cortical computational maps for auditory imaging. Neural Networks, 3, 3-21 (1990). 4b. Simmons, J.A. A view of the world through the bat's ear: the formation of acoustic images in echolocation. Cognition, 33 155-199 (1989). C. MODELS OF SENSORY SYSTEMS 1. Hect,S., Shlaer, S., and Pirenne, M.H. Energy, quanta, and vision. J. Gen. Physiol. 25, 819-840 (1942). 2. Julesz, B., and Bergen, J.R. Textons, the fundamental elements in preattentive vision and perception of textures. Bell Sys. Tech. J. 62, 1619-1645 (1983). 3a. Harth, E., Unnikrishnan, K.P., and Pandya, A.S. The inversion of sensory processing by feedback pathways: a model of visual cognitive functions. science 237, 184-187 (1987). 3b. Harth, E., Pandya, A.S., and Unnikrishnan, K.P. Optimization of cortical responses by feedback modification and synthesis of sensory afferents. A model of perception and rem sleep. Concepts Neurosci. 1, 53-68 (1990). 3c. Koch, C. The action of the corticofugal pathway on sensory thalamic nuclei: A hypothesis. Neurosci. 23, 399-406 (1987). 4a. Singer, W. et al., Formation of cortical cell assemblies. In: CSH Symposia on Quant. Biol. 55, pp. 939-952 (1990). 4b. Eckhorn, R., Reitboeck, H.J., Arndt, M., and Dicke, P. Feature linking via synchronization among distributed assemblies: Simulations of results from cat visual cortex. Neural Comp. 293-307 (1990). 5. Reichardt, W., and Poggio, T. Visual control of orientation behavior in the fly. Part I. A quantitative analysis. Q. Rev. Biophys. 9, 311-375 (1976). D. ARTIFICIAL NEURAL NETWORKS 1a. Block, H.D. The perceptron: a model for brain functioning. Rev. Mod. Phy. 34, 123-135 (1962). 1b. Minsky, M.L., and Papert, S.A. Perceptrons. pp. 62-68 (1988). 2a. Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359-366 (1989). 2b. Lapedes, A., and Farber, R. How neural nets work. In: Neural Info. Proc. Sys. (Anderson, ed.) pp. 442-456 (1987). 3a. Ackley, D.H., Hinton, G.E., and Sejnowski, T.J. A learning algorithm for boltzmann machines. Cog. Sci. 9, 147-169 (1985). 3b. Hopfield, J.J. 
Learning algorithms and probability distributions in feed-forward and feed-back networks. PNAS, USA. 84, 8429-8433 (1987). 4. Tank, D.W., and Hopfield, J.J. Simple neural optimization networks: An A/D converter, signal decision circuit, and linear programming circuit. IEEE Tr. Cir. Sys. 33, 533-541 (1986). E. NEURAL NETWORK APPLICATIONS 1. LeCun, Y., et al., Backpropagation applied to handwritten zip code recognition. Neural Comp. 1, 541-551 (1990). 2. Lapedes, A., and Farber, R. Nonlinear signal processing using neural networks. LA-UR-87-2662, Los Alamos Natl. Lab. (1987). 3. Unnikrishnan, K.P., Hopfield, J.J., and Tank, D.W. Connected-digit speaker-dependent speech recognition using a neural network with time-delayed connections. IEEE Tr. ASSP. 39, 698-713 (1991). 4a. De Vries, B., and Principe, J.C. The gamma model - a new neural model for temporal processing. Neural Networks 5, 565-576 (1992). 4b. Poddar, P., and Unnikrishnan, K.P. Memory neuron networks: a prolegomenon. GMR-7493, GM Res. Labs. (1991). 5. Narendra, K.S., and Parthasarathy, K. Gradient methods for the optimization of dynamical systems containing neural networks. IEEE Tr. NN 2, 252-262 (1991). F. HARDWARE IMPLEMENTATIONS 1a. Mahowald, M.A., and Mead, C. Silicon retina. In: Analog VLSI and neural systems (Mead). pp. 257-278 (1989). 1b. Mahowald, M.A., and Douglas, R. A silicon neuron. Nature 354, 515-518 (1991). 2. Mueller, P. et al. Design and fabrication of VLSI components for a general purpose analog computer. In: Proc. IEEE workshop VLSI neural sys. (Mead, ed.) pp. xx-xx (1989). 3. Graf, H.P., Jackel, L.D., and Hubbard, W.E. VLSI implementation of a neural network model. Computer 2, 41-49 (1988). G. ISSUES ON LEARNING 1. Geman, S., Bienenstock, E., and Doursat, R. Neural networks and the bias/variance dilemma. Neural Comp. 4, 1-58 (1992). 2. Brown, T.H., Kairiss, E.W., and Keenan, C.L. Hebbian synapses: Biophysical mechanisms and algorithms. Ann. Rev. Neurosci. 13, 475-511 (1990). 3. Haussler, D. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. AI 36, 177-221 (1988). 4. Reeke, G.N. Jr., and Edelman, G.M. Real brains and artificial intelligence. Daedalus 117, 143-173 (1988). 5. White, H. Learning in artificial neural networks: a statistical perspective. Neural Comp. 1, 425-464 (1989). ---------------------------------------------------------------------- SUPPLEMENTAL READING Neher, E., and Sakmann, B. Single channel currents recorded from membrane of denervated frog muscle fibers. Nature 260, 779-781 (1976). Rall, W. Core conductor theory and cable properties of neurons. In: Handbook Physiol. (Brookhart, Mountcastle, and Kandel eds.) pp. 39-97 (1977). Shepherd, G.M., and Koch, C. Introduction to synaptic circuits. In: The synaptic organization of the brain (Shepherd, ed.) pp. 3-31 (1990). Junge, D. Synaptic transmission. In: Nerve and muscle excitation (Junge) pp. 149-178 (1981). Scott, A.C. The electrophysics of a nerve fiber. Rev. Mod. Phy. 47, 487-533 (1975). Enroth-Cugell, C., and Robson, J.G. The contrast sensitivity of retinal ganglion cells of the cat. J. Physiol. 187, 517-552 (1966). Felleman, D.J., and Van Essen, D.C. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1-47 (1991). Julesz, B. Early vision and focal attention. Rev. Mod. Phy. 63, 735-772 (1991). Sejnowski, T.J., Koch, C., and Churchland, P.S. Computational neuroscience. Science 241, 1299-1302 (1988). Churchland, P.S., and Sejnowski, T.J. Perspectives on Cognitive Neuroscience.
Science 242, 741-745 (1988). McCulloch, W.S., and Pitts, W. A logical calculus of ideas immanent in nervous activity. Bull. Math. Biophy. 5, 115-133 (1943). Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. PNAS, USA. 79, 2554-2558 (1982). Hopfield, J.J. Neurons with graded responses have collective computational properties like those of two-state neurons. PNAS, USA. 81, 3088-3092 (1984). Hinton, G.E., and Sejnowski, T.J. Optimal perceptual inference. Proc. IEEE CVPR. 448-453 (1983). Rumelhart, D.E., Hinton, G.E., and Williams, R.J. Learning representations by back-propagating errors. Nature 323, 533-536 (1986). Unnikrishnan, K.P., and Venugopal, K.P. Learning in connectionist networks using the Alopex algorithm. Proc. IEEE IJCNN. I-926 - I-931 (1992). Cowan, J.D., and Sharp, D.H. Neural nets. Quart. Rev. Biophys. 21, 365-427 (1988). Lippmann, R.P. An introduction to computing with neural nets. IEEE ASSP Mag. 4, 4-22 (1987). Sompolinsky, H. Statistical mechanics of neural networks. Phy. Today 41, 70-80 (1988). Hinton, G.E. Connectionist learning procedures. Art. Intel. 40, 185-234 (1989).  From demers at cs.ucsd.edu Sun Feb 21 13:45:24 1993 From: demers at cs.ucsd.edu (David DeMers) Date: Sun, 21 Feb 93 10:45:24 -0800 Subject: NIPS-5 papers: Nonlinear dimensionallity reduction / Inverse kinematics Message-ID: <9302211845.AA24988@beowulf> Non-Linear Dimensionality Reduction David DeMers & Garrison Cottrell ABSTRACT -------- A method for creating a non--linear encoder--decoder for multidimensional data with compact representations is presented. The commonly used technique of autoassociation is extended to allow non--linear representations, and an objective function which penalizes activations of individual hidden units is shown to result in minimum dimensional encodings with respect to allowable error in reconstruction. ============================================================ Global Regularization of Inverse Kinematics for Redundant Manipulators David DeMers & Kenneth Kreutz-Delgado ABSTRACT -------- The inverse kinematics problem for redundant manipulators is ill--posed and nonlinear. There are two fundamentally different issues which result in the need for some form of regularization; the existence of multiple solution branches (global ill--posedness) and the existence of excess degrees of freedom (local ill--posedness). For certain classes of manipulators, learning methods applied to input--output data generated from the forward function can be used to globally regularize the problem by partitioning the domain of the forward mapping into a finite set of regions over which the inverse problem is well--posed. Local regularization can be accomplished by an appropriate parameterization of the redundancy consistently over each region. As a result, the ill--posed problem can be transformed into a finite set of well--posed problems. Each can then be solved separately to construct approximate direct inverse functions. ============================================================= Preprints are available from the neuroprose archive Retrievable in the usual way: unix> ftp archive.cis.ohio-state.edu (128.146.8.52) login as "anonymous", password = ftp> cd pub/neuroprose ftp> binary ftp> get demers.nips92-nldr.ps.Z ftp> get demers.nips92-robot.ps.Z ftp> bye unix> uncompress demers.*.ps.Z unix> lpr -s demers.nips92-nldr.ps.Z unix> lpr -s demers.nips92-robot.ps.Z (or however you print *LARGE* PostScript files) These papers will appear in S.J. 
Hanson, J.E. Moody & C.L. Giles, eds, Advances in Neural Information Processing Systems 5 (Morgan Kaufmann, 1993). Dave DeMers demers at cs.ucsd.edu Computer Science & Engineering 0114 demers%cs at ucsd.bitnet UC San Diego ...!ucsd!cs!demers La Jolla, CA 92093-0114 (619) 534-0688, or -8187, FAX: (619) 534-7029  From srikanth at rex.cs.tulane.edu Sun Feb 21 14:41:45 1993 From: srikanth at rex.cs.tulane.edu (R. Srikanth) Date: Sun, 21 Feb 93 13:41:45 CST Subject: Abstract, New Squashing function... In-Reply-To: <199302192022.AA03327@verdi.eng.umd.edu>; from "David L. Elliott" at Feb 19, 93 3:22 pm Message-ID: <9302211941.AA17332@hercules.cs.tulane.edu>

>
> ABSTRACT
>
> A BETTER ACTIVATION FUNCTION FOR ARTIFICIAL NEURAL NETWORKS
>
> TR 93-8, Institute for Systems Research, University of Maryland
>
> by David L. Elliott-- ISR, NeuroDyne, Inc., and Washington University
> January 29, 1993
> The activation function s(x) = x/(1 + |x|) is proposed for use in
> digital simulation of neural networks, on the grounds that the
> computational operation count for this function is much smaller than
> for those using exponentials and that it satisfies the simple differential
> equation s' = (1 + |s|)^2, which generalizes the logistic equation.
> The full report, a work-in-progress, is available in LaTeX or PostScript
> form (two pages + titlepage) by request to delliott at src.umd.edu.
>
>

This squashing function, while not widely used, has been used by a few others. George Georgiou uses it for a complex back propagation network. Not only does the activation function enable him to model a complex BP network, but it also seems to lend itself to easier implementation. For more information on complex domain backprop, contact Dr. George Georgiou at georgiou at meridian.csci.csusb.edu -- srikanth at cs.tulane.edu Dept of Computer Science, Tulane University, New Orleans, La - 70118  From delliott at src.umd.edu Sun Feb 21 15:00:03 1993 From: delliott at src.umd.edu (David L. Elliott) Date: Sun, 21 Feb 1993 15:00:03 -0500 Subject: Response Message-ID: <199302212000.AA17583@newra.src.umd.edu> Henrik- Thanks for your comment; you wrote: "Any interesting squashing function can be stored in a table of negligible size (eg 256) with very high accuracy if linear (or higher) interpolation is used." I think you are right *if the domain of the map is compact* a priori. Otherwise the approximation must eventually become constant for large x, and this has bad consequences for backpropagation algorithms. For some other training methods, perhaps not. David  From gluck at pavlov.rutgers.edu Mon Feb 22 08:05:05 1993 From: gluck at pavlov.rutgers.edu (Mark Gluck) Date: Mon, 22 Feb 93 08:05:05 EST Subject: Neural Computation & Cognition: Opening for NN Programmer Message-ID: <9302221305.AA04474@james.rutgers.edu> POSITION AVAILABLE: NEURAL-NETWORK RESEARCH PROGRAMMER At the Center for Neuroscience at Rutgers-Newark, we have an opening for a full or part-time research programmer to assist in developing neural-network simulations. The research involves integrated experimental and theoretical analyses of the cognitive and neural bases of learning and memory. The focus of this research is on understanding the underlying neurobiological mechanisms for complex learning behaviors in both animals and humans. Substantial prior experience and understanding of neural-network theories and algorithms is required. Applicants should have a high level of programming experience (C or Pascal), and familiarity with Macintosh and/or UNIX.
Strong English-language communication and writing skills are essential. *** This position would be particularly appropriate for a graduating college senior who seeks "hands-on" research experience prior to graduate school in the cognitive, neural, or computational sciences *** Applications are being accepted now for an immediate start-date or for starting in June or September of this year. NOTE TO N. CALIF. APPLICANTS: Interviews for applicants from the San Francisco/Silicon Valley area will be conducted at Stanford in late March. The Neuroscience Center is located 20 minutes outside of New York City in northern New Jersey. For further information, please send an email or hard-copy letter describe your relevant background, experience, and career goals to: ______________________________________________________________________ Dr. Mark A. Gluck Center for Molecular & Behavioral Neuroscience Rutgers University 197 University Ave. Newark, New Jersey 07102 Phone: (201) 648-1080 (Ext. 3221) Fax: (201) 648-1272 Email: gluck at pavlov.rutgers.edu  From peleg at cs.huji.ac.il Tue Feb 23 15:38:02 1993 From: peleg at cs.huji.ac.il (Shmuel Peleg) Date: Tue, 23 Feb 93 22:38:02 +0200 Subject: CFP: 12-ICPR, Int Conf Pattern Recognition, Jerusalem, 1994 Message-ID: <9302232038.AA28915@humus.cs.huji.ac.il> =============================================================================== CALL FOR PAPERS - 12th ICPR International Conferences on Pattern Recognition Oct 9-13, 1994, Jerusalem, Israel The 12th ICPR of the International Association for Pattern Recognition will be organized as a set of four conferences, each dealing with a special topic. The program for each individual conference will be organized by its own Program Committee. Papers describing applications are encouraged, and will be reviewed by a special Applications Committee. An award will be given for the best industry-related paper presented at the conference. Considerations for this award will include innovative applications, robust performance, and contributions to industrial progress. An exhibition will also be held. The conference proceedings are published by the IEEE Computer Society Press. GENERAL CO-CHAIRS: S. Ullman - Weizmann Inst. (shimon at wisdom.weizmann.ac.il) S. Peleg - The Hebrew University (peleg at cs.huji.ac.il) LOCAL ARRANGEMENTS: Y. Yeshurun - Tel-Aviv University (hezy at math.tau.ac.il) INDUSTRIAL & APPLICATIONS LIAISON: M. Ejiri - Hitachi (ejiri at crl.hitachi.co.jp) CONFERENCE DESCRIPTIONS 1. COMPUTER VISION AND IMAGE PROCESSING, T. Huang - University of Illinois Early vision and segmentation; image representation; shape and texture analysis; motion and stereo; range imaging and remote sensing; color; 3D representation and recognition. 2. PATTERN RECOGNITION AND NEURAL NETWORKS, N. Tishby - The Hebrew University Statistical, syntactic, and hybrid pattern recognition techniques; neural networks for associative memory, classification, and temporal processing; biologically oriented neural networks models; biomedical applications. 3. SIGNAL PROCESSING, D. Malah - Technion, Israel Institute of Technology Analysis, representation, coding, and recognition of signals; signal and image enhancement and restoration; scale-space and joint time-frequency analysis and representation; speech coding and recognition; image and video coding; auditory scene analysis. 4. PARALLEL COMPUTING, S. 
Tanimoto - University of Washington Parallel architectures and algorithms for pattern recognition, vision, and signal processing; special languages, programming tools, and applications of multiprocessor and distributed methods; design of chips, real-time hardware, and neural networks; recognition using multiple sensory modalities. PAPER SUBMISSION DEADLINE: February 1, 1994. Notification of Acceptance: May 1994. Camera-Ready Copy: June 1994. Send four copies of paper to: 12th ICPR, c/o International, 10 Rothschild blvd, 65121 Tel Aviv, ISRAEL. Tel. +972(3)510-2538, Fax +972(3)660-604 Each manuscript should include the following: 1. A Summary Page addressing these topics: - To which of the four conferences is the paper submitted? - What is the paper about? - What is the original contribution of this work? - Does the paper mainly describe an application, and should be reviewed by the applications committee? 2. The paper, limited in length to 4000 words. This is the estimated length of the proceedings version. For further information contact the secretariat at the above address, or use E-mail: icpr at math.tau.ac.il. ===============================================================================  From prechelt at ira.uka.de Tue Feb 23 08:55:11 1993 From: prechelt at ira.uka.de (prechelt@ira.uka.de) Date: Tue, 23 Feb 93 14:55:11 +0100 Subject: Squashing functions In-Reply-To: Your message of Fri, 19 Feb 93 16:47:16 +0000. <9302191647.AA05729@cato.robots.ox.ac.uk> Message-ID:

> Any interesting squashing function can be stored in a table of negligible size
> (eg 256) with very high accuracy if linear (or higher) interpolation is used.

256 points are not always negligible: On a fine-grain massively parallel machine such as the MasPar MP-1, the 256*4 bytes needed to store it can consume a considerable amount of the available memory. Our MP-1216A has 16384 processors with only 16 kB memory each. Another point: On this machine, I am not sure whether interpolating from such a table would really be faster than, say, a third order Taylor approximation of the sigmoid. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-7500 Karlsruhe 1; Germany | they get (Voice: ++49/721/608-4068, FAX: ++49/721/694092) | less simple.  From henrik at robots.ox.ac.uk Tue Feb 23 13:56:11 1993 From: henrik at robots.ox.ac.uk (henrik@robots.ox.ac.uk) Date: Tue, 23 Feb 93 18:56:11 GMT Subject: Squashing functions (continued) Message-ID: <9302231856.AA22594@cato.robots.ox.ac.uk> The saturation problem ('the activation function gets constant for large |x|') can usually be solved by putting the derivative of the act. function into a table as well. You can then cheat a bit by not setting it to zero at large |x|. Concerning memory requirements (eg, MasPar MP1). I don't see why I need 4 bytes per table entry. According to the paper by Fahlman & Hoehfeld on limited precision, the quantization can be done with very few bits (less than 8 if tricks are used). With interpolation you can get a pretty decent 16 bit act. value out of an 8-bit wide table. Apart from that, it seems to be quite complicated to put a NN on 16K processors ... how do you do that?
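A minimal C sketch of the table-plus-interpolation scheme being debated in this thread appears below; the table size, the clamping range, and all names are illustrative assumptions, not code from any of the posters.

/* Lookup-table approximation of tanh with linear interpolation.
   Table size, clamping range and names are illustrative choices. */
#include <stdio.h>
#include <math.h>

#define TABLE_SIZE 256
#define X_MAX      8.0                /* inputs clamped to [-X_MAX, X_MAX] */

static double table[TABLE_SIZE + 1];  /* one extra entry for interpolation */

static void build_table(void)
{
    int i;
    for (i = 0; i <= TABLE_SIZE; i++)
        table[i] = tanh(-X_MAX + 2.0 * X_MAX * i / TABLE_SIZE);
}

static double tanh_lookup(double x)
{
    double pos, frac;
    int i;

    if (x <= -X_MAX) return table[0];
    if (x >=  X_MAX) return table[TABLE_SIZE];
    pos  = (x + X_MAX) * (TABLE_SIZE / (2.0 * X_MAX));
    i    = (int)pos;
    frac = pos - i;
    return table[i] + frac * (table[i + 1] - table[i]);  /* linear interpolation */
}

int main(void)
{
    double x, worst = 0.0;
    build_table();
    for (x = -X_MAX; x <= X_MAX; x += 0.001) {
        double err = fabs(tanh_lookup(x) - tanh(x));
        if (err > worst) worst = err;
    }
    printf("worst absolute error over [-%.0f, %.0f]: %g\n", X_MAX, X_MAX, worst);
    return 0;
}

With these particular choices the reported worst-case error comes out on the order of 10^-4, which is the sort of accuracy the table-based approach is claimed to give; storing the derivative in a second table, as suggested above, follows the same pattern.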
From xueh at microsoft.com Wed Feb 24 01:19:47 1993
From: xueh at microsoft.com (Xuedong Huang)
Date: Tue, 23 Feb 93 22:19:47 PST
Subject: Microsoft Speech Research
Message-ID: <9302240620.AA07680@netmail.microsoft.com>

As you may know, I've started a new speech group here at Microsoft. For your
information, I have enclosed the full advertisement we have been using to
publicize the openings. If you are interested in joining MS, I strongly
encourage you to apply, and we will look forward to following up with you.

------------------------------------------------------------

THE FUTURE IS HERE.

Speech Recognition. Intuitive Graphical Interfaces. Sophisticated User
Agents. Advanced Operating Systems. Robust Environments. World Class
Applications. Who's Pulling It All Together? Microsoft.

We're setting the stage for the future of computing, building a world-class
research group and leveraging a solid foundation of object-based technology
and scalable operating systems. What's more, we're extending the recognition
paradigm, employing advanced processor and RISC-based architecture, and
harnessing distributed networks to connect users to worlds of information.
We want to see more than just our own software running. We want to see a
whole generation of users realize the future of computing. Realize your
future with a position in our Speech Recognition group.

Research Software Design Engineers, Speech Recognition.
Primary responsibilities include designing and developing user interface and
systems-level software for an advanced speech recognition system. A minimum
of 3 years' demonstrated microcomputer software design and development
experience in C is required. Knowledge of Windows programming, speech
recognition systems, hidden Markov model theory, statistics, DSP, or user
interface development is preferred. A BA/BS in computer science or a related
discipline is required. An advanced degree (MS or Ph.D.) in a related
discipline is preferred.

Researchers, Speech Recognition.
Primary responsibilities include research on stochastic modeling techniques
to be applied to an advanced speech recognition system. A minimum of 4
years' demonstrated research excellence in the area of speech recognition or
spoken language understanding systems is required. Knowledge of Windows and
real-time C programming for microcomputers, hidden Markov model theory,
decoder systems design, DSP, and spoken language understanding is preferred.
An MA/MS in CS or a related discipline is required. A PhD in CS, EE, or a
related discipline is preferred.

Make The Most of Your Future.
At Microsoft, our technical leadership and strong Software Developers and
Researchers stay ahead of the times, creating vision and turning it into
reality. To apply, send your resume and cover letter, noting
"ATTN: N5935-0223", to:

  Surface: Microsoft Recruiting
           ATTN: N5935-0223
           One Microsoft Way
           Redmond, WA 98052-6399

  Email (ASCII ONLY): y-wait at microsoft.com.us

Microsoft is an equal opportunity employer working to increase workforce
diversity.
From john at cs.uow.edu.au Fri Feb 26 13:56:21 1993
From: john at cs.uow.edu.au (John Fulcher)
Date: Fri, 26 Feb 93 13:56:21 EST
Subject: submission
Message-ID: <199302260256.AA25570@wraith.cs.uow.edu.au>

COMPUTER STANDARDS & INTERFACES (North-Holland)
Forthcoming Special Issue on ANN Standards

ADDENDUM TO ORIGINAL POSTING

Prompted by enquiries from several people regarding my original Call for
Papers posting, I felt I should offer the following additional information
(clarification).

By ANN "Standards" we do not mean exclusively formal standards (in the ISO,
IEEE, ANSI, CCITT, etc. sense), although naturally enough we will be
including papers on activities in these areas. "Standards" should be
interpreted in its most general sense, namely as standard APPROACHES (e.g.
the backpropagation algorithm and its many variants). Thus if you have a
paper on some (any?) aspect of ANNs, provided it is prefaced by a summary of
the standard approach(es) in that particular area, it could well be suitable
for inclusion in this special issue of CS&I.

If in doubt, post, fax, or email a copy by April 30th to:

  John Fulcher
  Department of Computer Science
  University of Wollongong
  Northfields Avenue, Wollongong NSW 2522, Australia

  fax: +61 42 213262
  email: john at cs.uow.edu.au.oz


From terry at helmholtz.sdsc.edu Thu Feb 25 14:57:05 1993
From: terry at helmholtz.sdsc.edu (Terry Sejnowski)
Date: Thu, 25 Feb 93 11:57:05 PST
Subject: Neural Computation 5:2
Message-ID: <9302251957.AA14806@helmholtz.sdsc.edu>

NEURAL COMPUTATION - Volume 5 - Issue 2 - March 1993

Review

  Neural Networks and Non-Linear Adaptive Filtering: Unifying Concepts and
  New Algorithms
    O. Nerrand, P. Roussel-Ragot, L. Personnaz, G. Dreyfus and S. Marcos

Notes

  Fast Calculation of Synaptic Conductances
    Rajagopal Srinivasan and Hillel J. Chiel

  The Variance of Covariance Rules for Associative Matrix Memories and
  Reinforcement Learning
    Peter Dayan and Terrence J. Sejnowski

  Optimal Network Construction by Minimum Description Length
    Gary D. Kendall and Trevor J. Hall

Letters

  A Neural Network Model of Inhibitory Information Processing in Aplysia
    Diana E.J. Blazis, Thomas M. Fischer and Thomas J. Carew

  Computational Diversity in a Formal Model of the Insect Olfactory
  Macroglomerulus
    C. Linster, C. Masson, M. Kerszberg, L. Personnaz and G. Dreyfus

  Learning Competition and Cooperation
    Sungzoon Cho and James A. Reggia

  Constraints on Synchronizing Oscillator Networks
    David E. Cairns, Roland J. Baddeley and Leslie S. Smith

  Learning Mixture Models of Spatial Coherence
    Suzanna Becker and Geoffrey E. Hinton

  Hints and the VC Dimension
    Yaser S. Abu-Mostafa

  Redundancy Reduction as a Strategy for Unsupervised Learning
    A. Norman Redlich

  Approximation and Radial-Basis-Function Networks
    Jooyoung Park and Irwin W. Sandberg

  A Polynomial Time Algorithm for Generating Neural Networks for Pattern
  Classification - its Stability Properties and Some Test Results
    Somnath Mukhopadhyay, Asim Roy, Lark Sang Kim and Sandeep Govil

  Neural Networks for Optimization Problems with Inequality Constraints -
  The Knapsack Problem
    Mattias Ohlsson, Carsten Peterson and Bo Soderberg

-----

SUBSCRIPTIONS - VOLUME 5 - BIMONTHLY (6 issues)

  ______ $40  Student
  ______ $65  Individual
  ______ $156 Institution

Add $22 for postage and handling outside USA (+7% GST for Canada).

(Back issues from Volumes 1-4 are regularly available for $28 each to
institutions and $14 each to individuals. Add $5 for postage per issue
outside USA (+7% GST for Canada).)

MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142.
Tel: (617) 253-2889   FAX: (617) 258-6779   e-mail: hiscox at mitvma.mit.edu

-----


From mark at dcs.kcl.ac.uk Fri Feb 26 08:25:01 1993
From: mark at dcs.kcl.ac.uk (Mark Plumbley)
Date: Fri, 26 Feb 93 13:25:01 GMT
Subject: King's College London Neural Networks MSc and PhD courses
Message-ID: <17179.9302261325@xenon.dcs.kcl.ac.uk>

Fellow Neural Networkers,

Please post or forward this announcement about our M.Sc. and Ph.D. courses
in Neural Networks to anyone who might be interested.

Thanks,

Mark Plumbley

-------------------------------------------------------------------------
Dr. Mark D. Plumbley                                 Tel: +44 71 873 2241
Centre for Neural Networks                           Fax: +44 71 873 2017
Department of Mathematics/King's College London/Strand/London WC2R 2LS/UK
-------------------------------------------------------------------------

CENTRE FOR NEURAL NETWORKS and DEPARTMENT OF MATHEMATICS
King's College London
Strand
London WC2R 2LS, UK

M.Sc. AND Ph.D. COURSES IN NEURAL NETWORKS

---------------------------------------------------------------------
M.Sc. in INFORMATION PROCESSING and NEURAL NETWORKS
---------------------------------------------------
A ONE YEAR COURSE

CONTENTS
  Dynamical Systems Theory
  Fourier Analysis
  Biosystems Theory
  Advanced Neural Networks
  Control Theory
  Combinatorial Models of Computing
  Digital Learning
  Digital Signal Processing
  Theory of Information Processing
  Communications
  Neurobiology

REQUIREMENTS
  First Degree in Physics, Mathematics, Computing or Engineering

NOTE: For 1993/94 we have 3 SERC quota awards for this course.

---------------------------------------------------------------------
Ph.D. in NEURAL COMPUTING
-------------------------
A 3-year Ph.D. programme in NEURAL COMPUTING is offered to applicants with a
First degree in Mathematics, Computing, Physics or Engineering (others will
also be considered). The first year consists of courses given under the
M.Sc. in Information Processing and Neural Networks (see attached notice).
Second- and third-year research will be supervised in one of the various
programmes in the development and application of temporal, non-linear and
stochastic features of neurons in visual, auditory and speech processing.
There is also work in higher-level category and concept formation and
episodic memory storage. Analysis and simulation are used, on PCs, SUNs and
mainframe machines, and there is a programme on the development and use of
adaptive hardware chips in VLSI for pattern and speed processing.

This work is part of the activities of the Centre for Neural Networks in the
School of Physical Sciences and Engineering, which has over 40 researchers
in Neural Networks. It is one of the main centres of the subject in the U.K.

---------------------------------------------------------------------
For further information on either of these courses please contact:

  Postgraduate Secretary
  Department of Mathematics
  King's College London
  Strand
  London WC2R 2LS, UK

  MATHS at OAK.CC.KCL.AC.UK