From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Tue Nov 1 12:02:32 1988 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (Thanasis Kehagias) Date: Tue, 01 Nov 88 12:02:32 EST Subject: proceedings of Pittsburgh School Message-ID: Are the proceedings of the Pitt. Summer School out already? do people that contributed get a free copy, or do we have to order our own? where from? Thanks, Thanasis From dyer at CS.UCLA.EDU Tue Nov 1 14:24:52 1988 From: dyer at CS.UCLA.EDU (Dr Michael G Dyer) Date: Tue, 1 Nov 88 11:24:52 PST Subject: Technical Report Available Message-ID: <881101.192452z.05183.dyer@lanai.cs.ucla.edu> Symbolic NeuroEngineering for Natural Language Processing: A Multilevel Research Approach. Michael G. Dyer Tech. Rep. UCLA-AI-88-14 Abstract: Natural language processing (NLP) research has been built on the assumption that natural language tasks, such as comprehension, generation, argumentation, acquisition, and question answering, are fundamentally symbolic in nature. Recently, an alternative, subsymbolic paradigm has arisen, inspired by neural mechanisms and based on parallel processing over distributed representations. In this paper, the assumptions of these two paradigms are compared and contrasted, resulting in the observation that each paradigm possesses strengths exactly where the other is weak, and vice versa. This observation serves as a strong motivation for synthesis. A multilevel research approach is proposed, involving the construction of hybrid models, to achieve the long-term goal of mapping high-level cognitive function into neural mechanisms and brain architecture. Four levels of modeling are discussed: knowledge engineering level, localist connectionist level, distributed processing level, and artificial neural systems dynamics level. The two major goals of research at each level are (a) to explore its scope and limits and (b) to find mappings to the levels above and below it. In this paper the capabilities of several NLP models, at each level, are described, along with major research questions remaining to be resolved and major techniques currently being used in an attempt to complete the mappings. Techniques include: (1) forming hybrid systems with spreading activation, thresholds and markers to propagate bindings, (2) using extended back-error propagation in reactive training environments to eliminate microfeature representations, (3) transforming weight matrices into patterns of activation to create virtual semantic networks, (4) using conjunctive codings to implement role bindings, and (5) employing firing patterns and time-varying action potential to represent and associate verbal with visual sequences. (This report to appear in J. Barnden and J. Pollack (Eds.) Advances in Connectionist and Neural Computation Theory. Ablex Publ. An initial version of this report was presented at the AAAI & ONR sponsored Workshop on HIgh-Level Connectionism, held at New Mexico State University, April 9-11, 1988.) For copies of this tech. rep., please send requests to: Valerie at CS.UCLA.EDU or Valerie Aylett 3532 Boelter Hall Computer Science Dept. UCLA, Los Angeles, CA 90024 From jbower at bek-mc.caltech.edu Tue Nov 1 16:38:34 1988 From: jbower at bek-mc.caltech.edu (Jim Bower) Date: Tue, 1 Nov 88 13:38:34 pst Subject: NIPS computer demos Message-ID: <8811012138.AA02291@bek-mc.caltech.edu> Concerning: Software demonstrations at NIPS Authors presenting papers at NIPS are invited to demo any relevant software either at the meeting itself, or during the post-meeting workshop. 
The organizers have arranged for several IBMs and SUN workstations to be available. For information on the IBMs contact Scott Kirkpatrick at Kirk at IBM.COM. Two SUN 386i workstations will be available. Each will have a 1/4 cartrage tape drive as well as the standard hard floppies. The machines each have 8 MBytes of memory and color monitors. SUN windows as well as X windows (version 11.3) will be supported. The Caltech neural network simulator GENESIS will be available. For further information on the SUN demos contact: John Uhley (Uhley at Caltech.bitnet) From Dave.Touretzky at B.GP.CS.CMU.EDU Tue Nov 1 21:08:54 1988 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Tue, 01 Nov 88 21:08:54 EST Subject: proceedings of Pittsburgh School In-Reply-To: Your message of Tue, 01 Nov 88 12:02:32 -0500. Message-ID: <2673.594439734@DST.BOLTZ.CS.CMU.EDU> The proceedings are due back from the printer this week. Morgan Kaufmann will be mailing you your free copy shortly. Additional copies can be ordered from them for $24.95. They will have a booth at the NIPS conference and will be selling the proceedings there. -- Dave From neural!jsd Wed Nov 2 11:25:35 1988 From: neural!jsd (John Denker) Date: Wed, 2 Nov 88 11:25:35 EST Subject: LMS fails even in separable cases Message-ID: <8811021624.AA19399@neural.UUCP> ((I apologize for possible duplication. I posted this 5 days ago with no discernible effect, so I'm trying again.)) Yes, we noticed that a Least-Mean-Squares (LMS) network even with no hidden units fails to separate some problems. Ben Wittner spoke at the IEEE NIPS meeting in Denver, November 1987, describing !two! failings of this type. He gave an example of a situation in which LMS algorithms (including ordinary versions of back-prop) are metastable, i.e. they fail to separate the data for certain initial configurations of the weights. He went on to describe another case in which the algorithm actually !leaves! the solution region after starting within it. He also pointed out that this can lead to learning sessions in which the categorization performance of back-prop nets (with or without hidden units) is not a monotonically improving function of learning time. Finally, he presented a couple of ways of modifying the algorithm to prevent these problems, and proved a convergence theorem for the modified algorithms. One of the key ideas is something that has been mentioned in several recent postings, namely, to have zero penalty when the training pattern is well-classified or "beyond". We cited Minsky & Papert as well as Duda & Hart; we believe they were more-or-less aware of these bugs in LMS, although they never presented explicit examples of the failure modes. Here is the abstract of our paper in the proceedings, _Neural Information Processing Systems -- Natural and Synthetic_, Denver, Colorado, November 8-12, 1987, Dana Anderson Ed., AIP Press. We posted the abstract back in January '88, but apparently it didn't get through to everybody. Reprints of the whole paper are available. Strategies for Teaching Layered Networks Classification Tasks Ben S. Wittner (1) John S. Denker AT&T Bell Laboratories Holmdel, New Jersey 07733 ABSTRACT: There is a widespread misconception that the delta-rule is in some sense guaranteed to work on networks without hidden units. As previous authors have mentioned, there is no such guarantee for classification tasks. 
We will begin by presenting explicit counter-examples illustrating two different interesting ways in which the delta rule can fail. We go on to provide conditions which do guarantee that gradient descent will successfully train networks without hidden units to perform two-category classification tasks. We discuss the generalization of our ideas to networks with hidden units and to multi-category classification tasks. (1) Currently at NYNEX Science and Technology / 500 Westchester Ave. White Plains, NY 10604 From subutai at cougar.ccsr.uiuc.edu Thu Nov 3 14:19:20 1988 From: subutai at cougar.ccsr.uiuc.edu (Subutai Ahmad) Date: Thu, 3 Nov 88 13:19:20 CST Subject: Scaling and Generalization in Neural Networks Message-ID: <8811031919.AA01584@cougar.ccsr.uiuc.edu> The following Technical Report is avaiable. For a copy please send requests to subutai at complex.ccsr.uiuc.edu or: Subutai Ahmad Center for Complex Systems Research, 508 S. 6th St. Champaign, IL 61820 USA A Study of Scaling and Generalization in Neural Networks Subutai Ahmad Technical Report UIUCDCS-R-88-1454 Abstract The issues of scaling and generalization have emerged as key issues in current studies of supervised learning from examples in neural networks. Questions such as how many training patterns and training cycles are needed for a problem of a given size and difficulty, how to best represent the input, and how to choose useful training exemplars, are of considerable theoretical and practical importance. Several intuitive rules of thumb have been obtained from empirical studies, although as yet there are few rigorous results. In this paper we present a careful study of generalization in the simplest possible case--perceptron networks learning linearly separable functions. The task chosen was the majority function (i.e. return a 1 if a majority of the input units are on), a predicate with a number of useful properties. We find that many aspects of generalization in multilayer networks learning large, difficult tasks are reproduced in this simple domain, in which concrete numerical results and even some analytic understanding can be achieved. For a network with d input units trained on a set of S random training patterns, we find that the failure rate, the fraction of misclassified test instances, falls off exponentially as a function of S. In addition, with S = alpha d, for fixed values of alpha, our studies show that the failure rate remains constant independent of d. This implies that the number of training patterns required to achieve a given performance level scales linearly with d. We also discuss various ways in which this performance can be altered, with an emphasis on the effects of the input representation and the specific patterns used to train the network. We demonstrate a small change in the representation that can lead to a jump in the performance level. We also show that the most useful training instances are the ones closest to the separating surface. With a training set consisting only of such ``borderline'' training patterns, the failure rate decreases faster than exponentially, and for a given training set size, the performance of the network is significantly better than when trained with random patterns. Finally, we compare the effects of the initial state of the network and the training patterns on the final state. From emo at iuvax.cs.indiana.edu Thu Nov 3 22:32:31 1988 From: emo at iuvax.cs.indiana.edu (Eric Ost) Date: Thu, 3 Nov 88 22:32:31 EST Subject: Mail problems... 
*** end message *** From unido!gmdzi!joerg at uunet.UU.NET Fri Nov 4 10:07:18 1988 From: unido!gmdzi!joerg at uunet.UU.NET (Joerg Kindermann) Date: Fri, 4 Nov 88 14:07:18 -0100 Subject: Tech. report available Message-ID: <8811041307.AA07221@gmdzi.UUCP> Detection of Minimal Microfeatures by Internal Feedback J. Kindermann & A. Linden e-mail: joerg at gmdzi, al at gmdzi Gesellschaft fuer Mathematik und Datenverarbeitung mbH Postfach 1240 D-5205 Sankt Augustin 1 Abstract We define the notion of minimal microfeatures and introduce a new method of internal feedback for multilayer networks. Error signals are used to modify the input of a net. When combined with input decay, internal feedback allows the detection of sets of minimal microfeatures, i.e. those subpatterns which the network actually uses for discrimination. Additional noise on the training data increases the number of minimal microfeatures for a given pattern. The detection of minimal microfeatures is a first step towards a subsymbolic system with the capability of self-explanation. The paper provides examples from the domain of letter recognition. Keywords: minimal microfeatures, neural networks, parallel distributed processing, backpropagation, self-explanation. ************************ If you would like a copy of the above technical report, please send e-mail to joerg at gmdzi.uucp or write to: Dr. Joerg Kindermann Gesellschaft fuer Mathematik und Datenverarbeitung Schloss Birlinghoven Postfach 1240 D-5205 St. Augustin 1 WEST GERMANY Please remember: no reply or Cc to connectionists at .. ************************* From terry at cs.jhu.edu Tue Nov 8 19:41:15 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Tue, 8 Nov 88 19:41:15 est Subject: 88 Connectionist Proceedings Message-ID: <8811090041.AA18238@crabcake.cs.jhu.edu> NOW AVAILABLE: Proceedings of the 1988 Connectionist Models Summer School, edited by David Touretzky, Geoffrey Hinton, and Terrence Sejnowski. Available from: Morgan Kaufmann Publishers, Inc. Order Fulfillment Center P.O. Box 50490 Palo Alto, CA 94303-9953 tel. 415-965-4081 Cost is $24.95 plus $2.25 postage and handling ($4.00 for foreign orders.) For each additional volume ordered, increase postage by $1.00 (foreign, $3.00). Enclose full payment by check or money order. California residents please add sales tax. Terry ----- From David.Servan-Schreiber at A.GP.CS.CMU.EDU Wed Nov 9 00:05:00 1988 From: David.Servan-Schreiber at A.GP.CS.CMU.EDU (David.Servan-Schreiber@A.GP.CS.CMU.EDU) Date: Wed, 09 Nov 88 00:05:00 EST Subject: Technical report announcement Message-ID: <27375.595055100@A.GP.CS.CMU.EDU> The following technical report is available upon request: ENCODING SEQUENTIAL STRUCTURE IN SIMPLE RECURRENT NETWORKS David Servan-Schreiber, Axel Cleeremans & James L. McClelland CMU-CS-88-183 We explore a network architecture introduced by Elman (1988) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time- step t-1, together with element t, to predict element t+1. When the network is trained with strings from a particular finite- state grammar, it can learn to be a perfect finite-state recognizer for the grammar. When the net has a minimal number of hidden units, patterns on the hidden units come to correspond to the nodes of the grammar; however, this correspondence is not necessary for the network to act as a perfect finite-state recognizer. 
We explore the conditions under which the network can carry information about distant sequential contingencies across intervening elements to distant elements. Such information is maintained with relative ease if it is relevant at each intermediate step; it tends to be lost when intervening elements do not depend on it. At first glance this may suggest that such networks are not relevant to natural language, in which dependencies may span indefinite distances. However, embeddings in natural language are not completely independent of earlier information. The final simulation shows that long distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information. Send surface mail to : Department of Computer Science Carnegie Mellon University Pittsburgh, PA. 15213-3890 U.S.A or electronic mail to Ms. Terina Jett: Jett at CS.CMU.EDU (ARPA net) Ask for technical report CMU-CS-88-183. From pratt at paul.rutgers.edu Wed Nov 9 14:37:50 1988 From: pratt at paul.rutgers.edu (Lorien Y. Pratt) Date: Wed, 9 Nov 88 14:37:50 EST Subject: E. Tzakou to speak on ALOPEX Message-ID: <8811091937.AA00231@paul.rutgers.edu> Fall, 1988 Neural Networks Colloquium Series at Rutgers ALOPEX: Another optimization method ----------------------------------- E. Tzanakou Rutgers University Biomedical Engineering Room 705 Hill center, Busch Campus Friday November 18, 1988 at 11:10 am Refreshments served before the talk Abstract The ALOPEX process was developed in the early 70's by Harth and Tzanakou as an automated method of mapping Visual Receptive Fields in the Visual Pathway of animals. Since then it has been used as a "universal" optimization method that lends itself to a number of optimization problems. The method uses a cost function that is calculated by the simultaneous convergence of a large number of parameters. It is iterative and stochastic in nature and has the tendency to avoid local extrema. Computing times largely depend on the number of iterations required for convergence and on times required to compute the cost function. As such they are problem dependent. On the other hand ALOPEX has a unique inherent feature i.e it can run in a parallel manner by which the computing times can be reduced. Several applications of the method in physical, physiological and pattern recognition problems will be discussed. From dewan at paul.rutgers.edu Wed Nov 9 18:35:46 1988 From: dewan at paul.rutgers.edu (Hasanat Dewan) Date: Wed, 9 Nov 88 18:35:46 EST Subject: Dave Handelman at Sarnoff on knowledge-based + nnets for robots Message-ID: <8811092335.AA05411@surfers.rutgers.edu> From terry at cs.jhu.edu Wed Nov 9 19:06:29 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Wed, 9 Nov 88 19:06:29 est Subject: Frontiers in Neuroscience Message-ID: <8811100006.AA28449@crabcake.cs.jhu.edu> The latest issue of Science (4 November) has a special section on Frontiers in Neuroscience. The cover is a spectacular image of a Purkinje cell by Dave Tank. Four of the major reviews in the issue make contact with network modeling: Tom Brown et al. on Long-Term Synatpic Potentiation; Steve Lisberger on The Neural Basis for Learning of Simple Motor Skills; Steve Wise and Bob Desimone on Insights into Seeing and Grasping; and Pat Churchland and Terry Sejnowski on Perspectives on Cognitive Neuroscience. See also the letter by Dave Tank et al. on Spatially Resolved Calcium Dynamics of Mammalian Purkinje Cells in Cerebellar Slice. 
This issue was timed to coincide with the Annual Meeting of the Society for Neuroscience in Toronto next week. Terry ----- From rba at flash.bellcore.com Thu Nov 10 16:52:24 1988 From: rba at flash.bellcore.com (Robert B Allen) Date: Thu, 10 Nov 88 16:52:24 EST Subject: No subject Message-ID: <8811102152.AA21305@flash.bellcore.com> Subject: Report Available - Connectionist State Machines Connectionist State Machines Robert B. Allen Bellcore, November 1988 Performance of sequential adaptive networks on a number of tasks was explored. For example the ability to respond to continuous sequences was demonstrated first with a network which was trained to flag a given subsequence and, in a second study, to generate responses to transitions conditional upon previous transitions. Another set of studies demonstrated that the networks are able to recognize legal strings drawn from simple context-free grammars and regular expressions. Finally, sequential networks were also shown to be able to be trained to generate long strings. In some cases, adaptive schedules were introduced to gradually extend the network's processing of strings. Contact: rba at bllcore.com Robert B. Allen 2A-367 Bellcore Morristown, NJ 07960-1910 From harnad at confidence.Princeton.EDU Fri Nov 11 02:32:57 1988 From: harnad at confidence.Princeton.EDU (Stevan Harnad) Date: Fri, 11 Nov 88 02:32:57 EST Subject: BBS Call For Commentators: The Tag Assignment Problem Message-ID: <8811110732.AA00839@psycho.Princeton.EDU> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal providing Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. To be considered as a commentator or to suggest other appropriate commentators, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ A SOLUTION TO THE TAG-ASSIGNMENT PROBLEM FOR NEURAL NETWORKS Gary W. Strong Bruce A. Whitehead College of Information Studies Computer Science Program Drexel University University of Tennessee Space Institute Philadelphia, PA 19104 USA Tullahoma, TN 37388 USA ABSTRACT: Purely parallel neural networks can model object recognition in brief displays -- the same conditions under which illusory conjunctions (the incorrect combination of features into perceived objects in a stimulus array) have been demonstrated empirically (Treisman & Gelade 1980; Treisman 1986). Correcting errors of illusory conjunction is the "tag-assignment" problem for a purely parallel processor: the problem of assigning a spatial tag to nonspatial features, feature combinations and objects. This problem must be solved to model human object recognition over a longer time scale. A neurally plausible model has been constructed which simulates both the parallel processes that may give rise to illusory conjunctions and the serial processes that may solve the tag-assignment problem in normal perception. One component of the model extracts pooled features and another provides attentional tags that can correct illusory conjunctions. Our approach addresses two questions: (i) How can objects be identified from simultaneously attended features in a parallel, distributed representation? 
(ii) How can the spatial selection requirements of such an attentional process be met by a separation of pathways between spatial and nonspatial processing? Analysis of these questions yields a neurally plausible simulation model of tag assignment, based on synchronization of neural activity for features within a spatial focus of attention. KEYWORDS: affordance; attention; connectionist network; eye movements; illusory conjunction; neural network; object recognition; retinotopic representations; saccades; spatial localization From rr%eusip.edinburgh.ac.uk at NSS.Cs.Ucl.AC.UK Fri Nov 11 06:29:38 1988 From: rr%eusip.edinburgh.ac.uk at NSS.Cs.Ucl.AC.UK (Richard Rohwer) Date: Fri, 11 Nov 88 11:29:38 GMT Subject: seperability and unbalanced data discussion Message-ID: <11289.8811111129@eusip.ed.ac.uk> In a close inspection of convergence ailments afflicting a multilayer net, I found that the problem boiled down to a layer which needed to learn the separable AND function, but wasn't. So I had a close look at the LMS error function for AND, in terms of the the weights from each of the two inputs, the bias weight, and the multiplicities of each of the 4 exemplars in the truth table. It turns out that the error can not be made exactly 0 (with finite weights), so minimization of the error involves a tradeoff between the contributions of the 4 exemplars, and this tradeoff is strongly influenced by the multiplicities. It is not difficult to find the minimum analytically in this problem, so I was able to verify that with my highly unbalanced training data, the actual minimum was precisely where the LMS algorithm had terminated, miles away from a reasonable solution for AND. I also found that balanced data puts the minimum where it "belongs". The relative importance of the different exemplars in the LMS error function runs as the square root of the ratio of their multiplicities. So I solved my particular problem by turning to a quartic error function, for which it is the 4th root of this ratio that matters. (The p-norm, p-th root of the sum of the p-th powers, approaches the MAX norm as p approaches infinity, and 4 is much closer to infinity than 2.) ---Richard Rohwer, CSTR, Edinburgh From jose at tractatus.bellcore.com Sun Nov 13 13:01:41 1988 From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Sun, 13 Nov 88 13:01:41 EST Subject: seperability and unbalanced data discussion Message-ID: <8811131801.AA08242@tractatus.bellcore.com> See Hanson S. J. & Burr, D., Minkowski-r Backpropagation: Learning in Connectionist Networks with non-euclidean error metrics, in D. Anderson, Neural Information Processing: Natural and Synthetic, AIP, 1988. We look at some similar cases.. Steve From russ%yummy at gateway.mitre.org Mon Nov 14 09:36:19 1988 From: russ%yummy at gateway.mitre.org (russ%yummy@gateway.mitre.org) Date: Mon, 14 Nov 88 09:36:19 EST Subject: Paper Request In-Reply-To: Robert B Allen's message of Thu, 10 Nov 88 16:52:24 EST <8811102152.AA21305@flash.bellcore.com> Message-ID: <8811141436.AA11255@baklava.mitre.org> Please send me a copy of the paper: Connectionist State Machines Robert B. Allen Bellcore, November 1988 Thanks, Russ. ARPA: russ%yummy at gateway.mitre.org Russell Leighton MITRE Signal Processing Lab 7525 Colshire Dr. McLean, Va. 
22102 USA From sylvie at wildcat.caltech.edu Mon Nov 14 19:48:46 1988 From: sylvie at wildcat.caltech.edu (sylvie ryckebusch) Date: Mon, 14 Nov 88 16:48:46 pst Subject: Roommate for NIPS 88 Message-ID: <8811150048.AA12205@wildcat.caltech.edu> Zhaoping Li, a graduate student at Caltech, is looking for a woman to share a room at the NIPS conference in Denver. Anyone who is interested can contact Zhaoping by sending her mail at: zl at aurel.caltech.edu. From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Wed Nov 16 15:22:35 1988 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (Thanasis Kehagias) Date: Wed, 16 Nov 88 15:22:35 EST Subject: the Rochester Connectionist Simulator Message-ID: i believe there is a package of software called the Rochester Connectionist Simulator. i have the following questions about it: 1. will it run on a SUN? 2. what operating system does it need? 3. is it public domain? 4. can i get it by ftp? 5. how big is it? 6. is it source code (in C)? or object code? 7. does it come with documentation? you can mail me the answers; i will summarize and post ... Thanasis From steve at psyche.mit.edu Mon Nov 14 14:45:01 1988 From: steve at psyche.mit.edu (Steve Pinker) Date: Mon, 14 Nov 88 14:45:01 est Subject: Assistant Professor Position at MIT Message-ID: <8811141953.AA18508@ATHENA.MIT.EDU> November 8, 1988 JOB ANNOUNCEMENT The Department of Brain and Cognitive Sciences (formerly the Department of Psychology) of the Massachusetts Institute of Technology is seeking applicants for a nontenured, tenure-track position in Cognitive Science, with a preferred specialization in psycholinguistics, reasoning, or knowledge representation. The candidate must show promise of developing a distinguished research program, preferably one that combines human experimentation with computational modeling or formal analysis, and must be a skilled teacher. He or she will be expected to participate in department's educational programs in cognitive science at the undergraduate and graduate levels, including supervising students' experimental research and offering courses in Cognitive Science or Psycholinguistics. Applications must include a brief cover letter stating the candidate's research and teaching interests, a resume, and at least three letters of recommendation, which must arrive by January 1, 1989. Address applications to: Cognitive Science Search Committee Attn: Steven Pinker, Chair E10-018 Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02139 From strom at ogccse.ogc.edu Mon Nov 14 18:03:22 1988 From: strom at ogccse.ogc.edu (Dan Hammerstrom) Date: Mon, 14 Nov 88 15:03:22 PST Subject: ReEstablish Connection Message-ID: <8811142303.AA04135@ogccse.OGC.EDU> After having been on the connectionists mailing list for a couple of years, I seem to have been removed recently. No doubt the work of some nasty virus. Could you please put my name back on the list. Thanks. Dan Hammerstrom: strom at cse.ogc.edu From netlist at psych.Stanford.EDU Thu Nov 17 20:06:02 1988 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Thu, 17 Nov 88 17:06:02 PST Subject: Stanford Adaptive Networks Colloquium Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Nov. 
22nd (Tuesday, 3:15pm)
**************************************************************************
Toward a model of speech acquisition: Supervised learning and systems with excess degrees of freedom
MICHAEL JORDAN
E10-034C Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
Cambridge, MA 02139
**************************************************************************
Abstract
The acquisition of speech production is an interesting domain for the development of connectionist learning methods. In this talk, I will focus on a particular component of the speech learning problem, namely, that of finding an inverse of the function that relates articulatory events to perceptual events. A problem for the learning of such an inverse is that the forward function is many-to-one and nonlinear. That is, there are many possible target vectors corresponding to each perceptual input, but the average target is not in general a solution. I will argue that this problem is best resolved if targets are specified implicitly with sets of constraints, rather than as particular vectors (as in direct inverse system identification). Two classes of constraints are distinguished---paradigmatic constraints, which implicitly specify inverse images in articulatory space, and syntagmatic constraints, which define relationships between outputs produced at different points in time. (The latter include smoothness constraints on articulatory representations, and distinctiveness constraints on perceptual representations). I will discuss how the interactions between these classes of constraints may account for two kinds of variability in speech: coarticulation and historical change.
**************************************************************************
Location: Room 380-380W, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings.
Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material.
Information: To be added to the network mailing list, netmail to netlist at psych.stanford.edu For additional information, contact Mark Gluck (gluck at psych.stanford.edu).
Upcoming talks: Dec. 6: Ralph Linsker (IBM)
Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ.
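The abstract's observation that "the average target is not in general a solution" when the forward function is many-to-one can be made concrete with a toy example. The short Python sketch below is only an illustration under assumed details (a planar two-joint arm with unit link lengths standing in for the articulatory-to-perceptual map; the target position is arbitrary); it is not taken from the talk. Two distinct joint configurations reach the same "perceptual" target, yet their average does not.

    import numpy as np

    # Hypothetical forward model: a planar two-joint arm with unit link lengths.
    # The joint angles play the role of articulatory variables; the hand
    # position plays the role of the perceptual outcome.
    def forward(theta):
        t1, t2 = theta
        return np.array([np.cos(t1) + np.cos(t1 + t2),
                         np.sin(t1) + np.sin(t1 + t2)])

    target = np.array([1.2, 0.8])                  # desired "perceptual" output

    # Standard two-link inverse kinematics gives two valid configurations
    # (elbow-up and elbow-down), so the forward map is many-to-one.
    r2 = target @ target
    elbow = np.arccos((r2 - 2.0) / 2.0)
    base = np.arctan2(target[1], target[0])
    offset = np.arctan2(np.sin(elbow), 1.0 + np.cos(elbow))
    sol_down = np.array([base - offset,  elbow])
    sol_up   = np.array([base + offset, -elbow])

    print(forward(sol_down))                 # ~ [1.2, 0.8]
    print(forward(sol_up))                   # ~ [1.2, 0.8]
    print(forward((sol_down + sol_up) / 2))  # ~ [1.66, 1.11] -- the average of
                                             # two valid targets is not a solution

Averaging the two valid configurations straightens the arm and overshoots the target, which is the kind of failure that motivates specifying targets implicitly by constraints rather than as particular vectors.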
From rsun at cs.brandeis.edu Thu Nov 17 09:47:27 1988
From: rsun at cs.brandeis.edu (Ron Sun)
Date: Thu, 17 Nov 88 09:47:27 est
Subject: paper
Message-ID:
Could you send me a copy of your paper: connectionist state machine? Thank you.
Ron Sun
Brandeis Univ. CS dept.
Waltham, MA 02254

From keeler at mcc.com Tue Nov 22 20:04:48 1988
From: keeler at mcc.com (Jim Keeler)
Date: Tue, 22 Nov 88 19:04:48 CST
Subject: neural net job opening at MCC
Message-ID: <8811230104.AA08091@r2d2.aca.mcc.com>
WANTED: CONNECTIONIST/NEURAL NET RESEARCHERS
MCC (Microelectronics and Computer Technology Corporation, Austin Texas) is looking for research scientists to join our newly formed neural network research team. We are looking for researchers with strong theoretical skills in Physics, Electrical Engineering or Computer Science (Ph. D. level or above preferred). The research will focus on (non-military), fundamental questions about neural networks including
-Scaling and improvement of existing algorithms
-Development of new learning algorithms
-Temporal pattern recognition and processing
-Reverse engineering of biological networks
-Optical neural network architectures
MCC offers competitive salaries and a very stimulating, academic-like research environment. Contact Jim Keeler at jdk.mcc.com or Haran Boral at haran.mcc.com Or contact Jim Keeler at the NIPS conference in Denver.

From watrous at linc.cis.upenn.edu Wed Nov 23 15:53:30 1988
From: watrous at linc.cis.upenn.edu (Raymond Watrous)
Date: Wed, 23 Nov 88 15:53:30 EST
Subject: Tech Report - Connectionist Speech Recognition
Message-ID: <8811232053.AA26325@linc.cis.upenn.edu>
The following technical report is available from the Department of Computer and Information Science, University of Pennsylvania:
Speech Recognition Using Connectionist Networks
Raymond L. Watrous
MS-CIS-88-96 LINC LAB 138
Abstract
The use of connectionist networks for speech recognition is assessed using a set of representative phonetic discrimination problems. The problems are chosen with respect to the physiological theory of phonetics in order to give broad coverage to the space of articulatory phonetics. Separate network solutions are sought to each phonetic discrimination problem. A connectionist network model called the Temporal Flow Model is defined which consists of simple processing units with single valued outputs interconnected by links of variable weight.
The model represents temporal relationships using delay links and permits general patterns of connectivity including feedback. It is argued that the model has properties appropriate for time varying signals such as speech. Methods for selecting network architectures for different recognition problems are presented. The architectures discussed include random networks, minimally structured networks, hand crafted networks and networks automatically generated based on samples of speech data. Networks are trained by modifying their weight parameters so as to minimize the mean squared error between the actual and the desired response of the output units. The desired output unit response is specified by a target function. Training is accomplished by a second order method of iterative nonlinear optimization by gradient descent which incorporates a method for computing the complete gradient of recurrent networks. Network solutions are demonstrated for all eight phonetic discrimination problems for one male speaker. The network solutions are analyzed carefully and are shown in every case to make use of known acoustic phonetic cues. The network solutions vary in the degree to which they make use of context dependent cues to achieve phoneme recognition. The network solutions were tested on data not used for training and achieved an average accuracy of 99.5%. Methods for extending these results to a single network for recognizing the complete phoneme set from continuous speech obtained from different speakers are outlined. It is concluded that acoustic phonetic speech recognition can be accomplished using connectionist networks. +++++++++++++++++++++++++++++++++++++++++++++++++++++ This report is available from: James Lotkowski Technical Report Facility Room 269/Moore Building Computer Science Department University of Pennsylvania 200 South 33rd Street Philadelphia, PA 19104-6389 or james at central.cis.upenn.edu Please do not request copies of this report from me. Copies of the report cost approximately $19.00 which covers duplication (300 pages) and postage. I will bring a 'desk copy' to NIPS. As of December 1, I will be affiliated with the University of Toronto. My address will be: Department of Computer Science University of Toronto 10 King's College Road Toronto, Canada M5S 1A4 watrous at ai.toronto.edu From niranjan%digsys.engineering.cambridge.ac.uk at NSS.Cs.Ucl.AC.UK Wed Nov 23 07:42:17 1988 From: niranjan%digsys.engineering.cambridge.ac.uk at NSS.Cs.Ucl.AC.UK (M. Niranjan) Date: Wed, 23 Nov 88 12:42:17 GMT Subject: Radial basis functions etc. Message-ID: <11560.8811231242@dsl.eng.cam.ac.uk> Some recent comments on RBFs that you might find interesting! niranjan ============================== ONE ===================================== Date: Wed, 16 Nov 88 09:41:13 +0200 From: Dario Ringach To: ajr Subject: Comments on TR.25 Thanks a lot for the papers! I'd like to share a few thoughts on TR.25 "Generalising the Nodes of the Error Propagation Network". What are the advantages of choosing radial basis functions (or Gaussian nodes) in *general* discrimination tasks? It seems clear to me, that the results presented in Table 1 are due to the fact that the spectral distribution of steady state vowels can be closely represented by normal/radial distributions. If I have no a-priori information about the distribution of the classes then how can I know which kind of nodes will perform better? 
I think that in this case the best we can do is to look at the combinatorical problem of how many partitions of the n-dimensional Euclidean space can be obtained using N (proposed shape) boundaries. This is closely related to obtaining the Vapnik-Chervonenkis Dimension of the boundary class. In the case of n-dimensional hyperplanes and hypershperes, both have VC-dimension n+1, so I think there is really no difference in using hyperplanes or hyperspheres in *general* discrimination problems. Don't you agree? Thanks again for the papers! Dario ============================== TWO ===================================== From: M. Niranjan Date: Mon, 21 Nov 88 14:03:16 GMT To: dario at bitnet.techunix Subject: RBF etc Cc: ajr, dr, fallside, idb, jpmg, lwc, mdp, mwhc, niranjan, psts, rwp, tpm, visakan, vsa10 With RBFs of the Gaussian type, the class conditional density function is approximated by a mixture of multiple Gaussians. But the parameters of the mixture are estimated to maximise the discrimination rather than modelling the individual probability densities. > If I have no a-priori information about the distribution of the classes > then how can I know which kind of nodes will perform better? There is no way other than by a set of experiments. In small scale problems, we can probably plot cross sections of the feature space, or even projections of it on a linear discriminant plane and get some rough idea. > problem of how many partitions of the n-dimensional Euclidean space > can be obtained using N (proposed shape) boundaries. It is not how many different partitions; I think our problem in pattern classification is dealing with breakpoints of class boundary. It is this capability that is the power in MLPs (and RBFs). In a two class problem, we still partition the input space into two using N boundary segments (or splines), with N-1 break-points. What I like about RBFs is that you can have a probabilistic interpretation. With standard MLPs this is not very obvious and what happens is more like a functional interpolation. > both have VC-dimension n+1, so I think there is really no difference I dont know what VC-dimension is. Any reference please? Best wishes niranjan ============================ THREE ======================================= Date: Tue, 22 Nov 88 08:22:53 +0200 From: Dario Ringach To: M. Niranjan Subject: Re: RBF etc Thanks for your Re! [some stuff deleted] > > I think that in this case the best we can do is to look at the combinatoric al > > problem of how many partitions of the n-dimensional Euclidean space > > can be obtained using N (proposed shape) boundaries. > > It is not how many different partitions; I think our problem in pattern > classification is dealing with breakpoints of class boundary. It is this > capability that is the power in MLPs (and RBFs). In a two class problem, > we still partition the input space into two using N boundary segments > (or splines), with N-1 break-points. Sure, I agree. But if you address the question of how many hidden units of a determined type you need to classify the input vector into one of N distinct classes, and consider it a rough measure of the complexity of the boundary class proposed for the units, then the problem seems to be the one of partitioning the input space. Note that I don't care about the nature of the class shapes in real world problems, in this case I must agree with you that the issue of breakpoints of the class boundary becomes of real importance. [...] > > I dont know what VC-dimension is. Any reference please? 
> An earlier draft is "Classifying Learnable Geometric Concepts with the Vapnik-Chervonenkis Dimension" by D. Haussler et al., FOCS '86, pp 273-282. But if you don't know what Valiant's learnability model is, take a look at "A Theory of the Learnable" by L. Valiant, CACM 27(11), 1984, pp 1134-42. The original article by Vapnik and Chervonenkis is "On the Uniform Convergence of Relative Frequencies of Events to their Probabilities", Th. Prob. and its Appl., 16(2), 1971, pp 264-80. More up-to-date papers dealing with the VC-dimension can be found in the Proc. of the first Workshop on Computational Learning Theory, COLT '88, held at MIT last June. --Dario. =========================== THE END ===================================== From pwh at ece-csc.ncsu.edu Fri Nov 25 14:21:29 1988 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Fri, 25 Nov 88 14:21:29 EST Subject: No subject Message-ID: <8811251921.AA29034@ece-csc.ncsu.edu> NEURAL NETWORKS CALL FOR PAPERS IEEE International Conference on Neural Networks June 19-22, 1989 Washington, D.C. The 1989 IEEE International Conference on Neural Networks (ICNN-89) will be held at the Sheraton Washington Hotel in Washington, D.C., USA from June 19-22, 1989. ICNN-89 is the third annual conference in a series devoted to the technology of neurocomputing in its academic, industrial, commercial, consumer, and biomedical engineering aspects. The series is sponsored by the IEEE Technical Activities Board Neural Network Committee, created Spring 1988. ICNN-87 and 88 were huge successes, both in terms of large attendance and high quality of the technical presentations. ICNN-89 continues this tradition. It will be by far the largest and most important neural network meeting of 1989. As in the past, the full text of papers presented orally in the technical sessions will be published in the Conference Proceedings (along with some particularly outstanding papers from the Poster Sessions). The Abstract portions of all poster papers not published in full will also be published in the Proceedings. The Conference Proceedings will be distributed at the registration desk to all regular conference registrants as well as to all student registrants. This gives conference participants the full text of every paper presented in each technical session -- which greatly increases the value of the conference. ICNN is the only major neural network conference in the world to offer this feature. As is now the tradition, ICNN-89 will include a day of tutorials (June 18), the exhibit hall (the neurocomputing industry's primary annual tradeshow), plenary talks, and social events. Mark your calendar today and plan to attend IEEE ICNN-89 -- the definitive annual progress report on the neurocomputing revolution! DEADLINE FOR SUBMISSION OF PAPERS for ICNN-89 is February 1, 1989. Papers of 8 pages or less are solicited in the following areas: -Real World Applications -Associative Memory -Supervised Learning Theory -Image Processing -Reinforcement Learning Theory -Self-Organization -Robotics and Control -Neurobiological Models -Optical Neurocomputers -Vision -Optimization -Electronic Neurocomputers -Neural Network Theory & Architectures Papers should be prepared in standard IEEE Conference Proceedings Format, and typed on the special forms provided in the Author's Kit. The Title, Author Name, Affiliation, and Abstract portions of the first page of the paper must be less than a half page in length.
Indicate in your cover letter which of the above subject areas you wish your paper included in and whether you wish your paper to be considered for oral presentation, presentation as a poster, or both. For papers with multiple authors, indicate the name and address of the author to whom correspondence should be sent. Papers submitted for oral presentation may, at the referees' discretion, be designated for poster presentation instead, if they feel this would be more appropriate. FULL PAPERS in camera-ready form (1 original and 5 copies) should be submitted to Nomi Feldman, Conference Coordinator, at the address below. For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, ICNN-89 Conference Coordinator 3770 Tansy Street San Diego, CA 92121 (619) 453-6222 From moody-john at YALE.ARPA Fri Nov 25 17:01:03 1988 From: moody-john at YALE.ARPA (john moody) Date: Fri, 25 Nov 88 17:01:03 EST Subject: car pooling from Denver Airport to NIPS conference hotel Message-ID: <8811252159.AA02309@NEBULA.SUN3.CS.YALE.EDU> I'm arriving at Denver Airport at 10:15 PM Monday night (after the last shuttle leaves the airport for the hotel) and will probably have to rent a Hertz car to get to the hotel. Would anyone out there arriving Monday night like to car pool with me and possibly split the cost of a one-day car rental? (Starving students are welcome to tag along for free.) If interested, please reply ASAP. --John Moody (203)432-6493 ------- From harnad at Princeton.EDU Sun Nov 27 12:35:11 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 27 Nov 88 12:35:11 EST Subject: Explanatory Coherence: BBS Call for Commentators Message-ID: <8811271735.AA08252@psycho.Princeton.EDU> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal providing Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. To be considered as a commentator or to suggest other appropriate commentators, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ EXPLANATORY COHERENCE Paul Thagard Cognitive Science Laboratory Princeton University Princeton NJ 08542 Keywords: Connectionist models, artificial intelligence, explanation, coherence, reasoning, decision theory, philosophy of science This paper presents a new computational theory of explanatory coherence that applies both to the acceptance and rejection of scientific hypotheses and to reasoning in everyday life. The theory consists of seven principles that establish relations of local coherence between a hypothesis and other propositions that explain it, are explained by it, or contradict it. An explanatory hypothesis is accepted if it coheres better overall than its competitors. The power of the seven principles is shown by their implementation in a connectionist program called ECHO, which has been applied to such important scientific cases as Lavoisier's argument for oxygen against the phlogiston theory and Darwin's argument for evolution against creationism, and also to cases of legal reasoning. The theory of explanatory coherence has implications for artificial intelligence, psychology, and philosophy.
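The constraint-satisfaction idea described in the abstract can be made concrete with a small sketch. The program below is not Thagard's ECHO (whose seven principles and parameters are given in the target article); it is a minimal, assumed stand-in in which propositions are units, explanatory relations are symmetric excitatory links, contradictions are inhibitory links, and the network settles so that the better-supported hypothesis ends up more active. All weights, inputs, and the update rule are illustrative choices.

    import numpy as np

    # Units 0 and 1 are competing hypotheses H1 and H2; units 2-4 are evidence E1-E3.
    n = 5
    W = np.zeros((n, n))
    def link(i, j, w):
        W[i, j] = W[j, i] = w                            # coherence relations are symmetric
    link(0, 2, 0.4); link(0, 3, 0.4); link(0, 4, 0.4)    # H1 explains E1, E2 and E3
    link(1, 2, 0.4)                                      # H2 explains only E1
    link(0, 1, -0.6)                                     # H1 and H2 contradict each other

    external = np.array([0.0, 0.0, 0.5, 0.5, 0.5])       # evidence units receive outside support
    a = np.zeros(n)
    for _ in range(200):                                 # settle by repeated updating
        a = np.clip(0.8 * a + 0.2 * np.tanh(W @ a + external), -1.0, 1.0)

    print("H1 activation: %+.2f" % a[0])                 # settles clearly positive (accepted)
    print("H2 activation: %+.2f" % a[1])                 # settles lower (rejected)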
From John.Hampshire at SPEECH2.CS.CMU.EDU Mon Nov 28 14:02:50 1988 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Mon, 28 Nov 88 14:02:50 EST Subject: NIPS Speech Workshop Message-ID: This is a preliminary outline for those planning to attend the speech workshop following NIPS 88 in Keystone, CO. For answers to questions/details, please contact Alex Waibel.
Speech Workshop ------------------ ----Dec. 1, eve: Overview. All Groups meet. --------------------------- ----Dec. 2, Neural Nets (NN) and Hidden Markov Models (HMM)--------------- 7:30 - 9:30 Introduction. Short Informal Presentations (15 mins each). Connectionist Speech. The HMM/NN Debate. (Alex Waibel, CMU) State of the Art in HMMs (Rich Schwartz, BBN) Links between HMMs and NNs (Herve Bourlard, ICSI) Commonalities, Differences, HMMs, NNs. (John Bridle, RSRE) NNs and HMMs (Richard Lippmann, Lincoln Labs) Brief Questions and Answers. 4:30 - 6:30 Discussion. NNs, HMMs. Strengths and Weaknesses, Commonalities. Comparisons. Performance, Computational Needs, Extensions. Hybrid Approaches. Evening: Highlights. ----Dec. 3, Directions for Connectionist Speech Understanding.----------- 7:30 - 9:30 Introduction. Phoneme Recognition. Word Recognition. Syntax. Semantics. Pragmatics. Integral System Design. Learning Algorithms. Computational Needs/ Limitations. Large Scale Neural System Design. Modularity. Instruction. Heuristic Knowledge. 4:30 - 6:30 Discussion. Extensions. Evening: Highlights. Summary. --------------------------------------------------------- From honavar at cs.wisc.edu Wed Nov 30 18:23:01 1988 From: honavar at cs.wisc.edu (A Buggy AI Program) Date: Wed, 30 Nov 88 17:23:01 CST Subject: Tech report abstracts Message-ID: <8811302323.AA10286@ai.cs.wisc.edu> The following technical reports are now available. Requests for copies may be sent to: Linda McConnell Technical reports librarian Computer Sciences Department University of Wisconsin-Madison 1210 W. Dayton St. Madison, WI 53706. USA. or by e-mail, to: linda at shorty.cs.wisc.edu PLEASE DO NOT REPLY TO THIS MESSAGE, BUT WRITE TO THE TECH REPORTS LIBRARIAN FOR COPIES. -- Vasant ------------------------------------------------------------------- Computer Sciences TR 793 (also in the proceedings of the 1988 connectionist models summer school, (eds.) Sejnowski, Hinton, and Touretzky, Morgan Kaufmann, San Mateo, CA) A NETWORK OF NEURON-LIKE UNITS THAT LEARNS TO PERCEIVE BY GENERATION AS WELL AS REWEIGHTING OF ITS LINKS Vasant Honavar and Leonard Uhr Computer Sciences Department University of Wisconsin-Madison Madison, WI 53706. U.S.A. Abstract Learning in connectionist models typically involves the modification of weights associated with the links between neuron-like units; but the topology of the network does not change. This paper describes a new connectionist learning mechanism for generation in a network of neuron-like elements that enables the network to modify its own topology by growing links and recruiting units as needed (possibly from a pool of available units). A combination of generation and reweighting of links, and appropriate brain-like constraints on network topology, together with regulatory mechanisms and neuronal structures that monitor the network's performance and enable the network to decide when to generate, is shown capable of discovering, through feedback-aided learning, substantially more powerful, and potentially more practical, networks for perceptual recognition than those obtained through reweighting alone. The recognition cones model of perception (Uhr1972, Honavar1987, Uhr1987) is used to demonstrate the feasibility of the approach.
Results of simulations of carefully pre-designed recognition cones illustrate the usefulness of brain-like topological constraints such as near-neighbor connectivity and converging-diverging heterarchies for the perception of complex objects (such as houses) from digitized TV images. In addition, preliminary results indicate that brain-structured recognition cone networks can successfully learn to recognize simple patterns (such as letters of the alphabet, drawings of objects like cups and apples), using generation-discovery as well as reweighting, whereas systems that attempt to learn using reweighting alone fail to learn. ------------------------------------------------------------------- Computer Sciences TR 805 Experimental Results Indicate that Generation, Local Receptive Fields and Global Convergence Improve Perceptual Learning in Connectionist Networks Vasant Honavar and Leonard Uhr Computer Sciences Department University of Wisconsin-Madison Abstract This paper presents and compares results for three types of connectionist networks: [A] Multi-layered converging networks of neuron-like units, with each unit connected to a small randomly chosen subset of units in the adjacent layers, that learn by re-weighting of their links; [B] Networks of neuron-like units structured into successively larger modules under brain-like topological constraints (such as layered, converging-diverging heterarchies and local receptive fields) that learn by re-weighting of their links; [C] Networks with brain-like structures that learn by generation-discovery, which involves the growth of links and recruiting of units in addition to re-weighting of links. Preliminary empirical results from simulation of these networks for perceptual recognition tasks show large improvements in learning from using brain-like structures (e.g., local receptive fields, global convergence) over networks that lack such structure; further substantial improvements in learning result from the use of generation in addition to reweighting of links. We examine some of the implications of these results for perceptual learning in connectionist networks. -------------------------------------------------------------------
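Purely as an illustrative sketch of the generation-plus-reweighting idea in the two abstracts above (the task, the recruitment rule, and all thresholds below are assumptions made for the example; this is not the recognition cones model), the toy program below reweights output links by the delta rule and, whenever a crude performance monitor sees learning stall, generates a new hidden unit with randomly grown input links.

    import numpy as np
    rng = np.random.default_rng(0)

    # Toy task: XOR.  With no hidden units (bias only) reweighting alone stalls,
    # so the monitor keeps recruiting hidden units until the error can be driven down.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([0.0, 1.0, 1.0, 0.0])

    hidden = []                                   # each entry: (input weights, threshold)

    def hidden_layer(X):
        cols = [np.ones(len(X))]                  # bias unit
        cols += [(X @ w > th).astype(float) for w, th in hidden]
        return np.column_stack(cols)

    w_out = np.zeros(1)
    lr, prev_err = 0.1, np.inf
    for epoch in range(4000):
        H = hidden_layer(X)
        if w_out.size != H.shape[1]:              # a unit was recruited: extend output links
            w_out = np.concatenate([w_out, np.zeros(H.shape[1] - w_out.size)])
        y = H @ w_out
        err = np.mean((y - T) ** 2)
        w_out -= lr * H.T @ (y - T) / len(X)      # reweighting (delta rule)
        if epoch % 500 == 499:                    # crude performance monitor
            if prev_err - err < 1e-4 and err > 1e-3:
                w_new = rng.choice([-1.0, 1.0], size=2)          # generation: grow random links
                hidden.append((w_new, rng.uniform(-1.5, 1.5)))   # ... and a random threshold
            prev_err = err

    print("hidden units recruited:", len(hidden))
    print("final mean squared error: %.4f" % err)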
From subutai at cougar.ccsr.uiuc.edu Thu Nov 3 14:19:20 1988 From: subutai at cougar.ccsr.uiuc.edu (Subutai Ahmad) Date: Thu, 3 Nov 88 13:19:20 CST Subject: Scaling and Generalization in Neural Networks Message-ID: <8811031919.AA01584@cougar.ccsr.uiuc.edu> The following Technical Report is available. For a copy please send requests to subutai at complex.ccsr.uiuc.edu or: Subutai Ahmad Center for Complex Systems Research, 508 S. 6th St. Champaign, IL 61820 USA A Study of Scaling and Generalization in Neural Networks Subutai Ahmad Technical Report UIUCDCS-R-88-1454 Abstract The issues of scaling and generalization have emerged as key issues in current studies of supervised learning from examples in neural networks.
Questions such as how many training patterns and training cycles are needed for a problem of a given size and difficulty, how to best represent the input, and how to choose useful training exemplars, are of considerable theoretical and practical importance. Several intuitive rules of thumb have been obtained from empirical studies, although as yet there are few rigorous results. In this paper we present a careful study of generalization in the simplest possible case--perceptron networks learning linearly separable functions. The task chosen was the majority function (i.e. return a 1 if a majority of the input units are on), a predicate with a number of useful properties. We find that many aspects of generalization in multilayer networks learning large, difficult tasks are reproduced in this simple domain, in which concrete numerical results and even some analytic understanding can be achieved. For a network with d input units trained on a set of S random training patterns, we find that the failure rate, the fraction of misclassified test instances, falls off exponentially as a function of S. In addition, with S = alpha d, for fixed values of alpha, our studies show that the failure rate remains constant independent of d. This implies that the number of training patterns required to achieve a given performance level scales linearly with d. We also discuss various ways in which this performance can be altered, with an emphasis on the effects of the input representation and the specific patterns used to train the network. We demonstrate a small change in the representation that can lead to a jump in the performance level. We also show that the most useful training instances are the ones closest to the separating surface. With a training set consisting only of such ``borderline'' training patterns, the failure rate decreases faster than exponentially, and for a given training set size, the performance of the network is significantly better than when trained with random patterns. Finally, we compare the effects of the initial state of the network and the training patterns on the final state.
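To make the experimental setup concrete, here is a rough present-day re-creation for illustration only; it is not the code or the exact protocol of the report, and the tie-breaking rule, learning parameters, and test-set size are assumptions. It trains a perceptron on S = alpha*d random patterns labelled by the majority function and estimates the failure rate on fresh random patterns, so the scaling with d at fixed alpha can be inspected directly.

    import numpy as np
    rng = np.random.default_rng(0)

    def majority_failure_rate(d, alpha, n_test=2000, epochs=100):
        S = int(alpha * d)
        def label(X):                                    # majority function; ties labelled -1
            return np.where(X.sum(axis=1) > d / 2, 1, -1)
        Xtr = rng.integers(0, 2, size=(S, d)).astype(float)
        Xte = rng.integers(0, 2, size=(n_test, d)).astype(float)
        ttr, tte = label(Xtr), label(Xte)
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):                          # perceptron learning rule
            for x, t in zip(Xtr, ttr):
                if t * (w @ x + b) <= 0:
                    w += t * x
                    b += t
        return np.mean(np.sign(Xte @ w + b) != tte)      # fraction of misclassified test patterns

    for d in (25, 50, 100):
        print("d = %3d   failure rate = %.3f" % (d, majority_failure_rate(d, alpha=2.0)))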
From unido!gmdzi!joerg at uunet.UU.NET Fri Nov 4 10:07:18 1988 From: unido!gmdzi!joerg at uunet.UU.NET (Joerg Kindermann) Date: Fri, 4 Nov 88 14:07:18 -0100 Subject: Tech. report available Message-ID: <8811041307.AA07221@gmdzi.UUCP> Detection of Minimal Microfeatures by Internal Feedback J. Kindermann & A. Linden e-mail: joerg at gmdzi, al at gmdzi Gesellschaft fuer Mathematik und Datenverarbeitung mbH Postfach 1240 D-5205 Sankt Augustin 1 Abstract We define the notion of minimal microfeatures and introduce a new method of internal feedback for multilayer networks. Error signals are used to modify the input of a net. When combined with input decay, internal feedback allows the detection of sets of minimal microfeatures, i.e. those subpatterns which the network actually uses for discrimination. Additional noise on the training data increases the number of minimal microfeatures for a given pattern. The detection of minimal microfeatures is a first step towards a subsymbolic system with the capability of self-explanation. The paper provides examples from the domain of letter recognition.
Keywords: minimal microfeatures, neural networks, parallel distributed processing, backpropagation, self-explanation. ************************ If you would like a copy of the above technical report, please send e-mail to joerg at gmdzi.uucp or write to: Dr. Joerg Kindermann Gesellschaft fuer Mathematik und Datenverarbeitung Schloss Birlinghoven Postfach 1240 D-5205 St. Augustin 1 WEST GERMANY Please remember: no reply or Cc to connectionists at .. ************************* From terry at cs.jhu.edu Tue Nov 8 19:41:15 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Tue, 8 Nov 88 19:41:15 est Subject: 88 Connectionist Proceedings Message-ID: <8811090041.AA18238@crabcake.cs.jhu.edu> NOW AVAILABLE: Proceedings of the 1988 Connectionist Models Summer School, edited by David Touretzky, Geoffrey Hinton, and Terrence Sejnowski. Available from: Morgan Kaufmann Publishers, Inc. Order Fulfillment Center P.O. Box 50490 Palo Alto, CA 94303-9953 tel. 415-965-4081 Cost is $24.95 plus $2.25 postage and handling ($4.00 for foreign orders.) For each additional volume ordered, increase postage by $1.00 (foreign, $3.00). Enclose full payment by check or money order. California residents please add sales tax. Terry ----- From David.Servan-Schreiber at A.GP.CS.CMU.EDU Wed Nov 9 00:05:00 1988 From: David.Servan-Schreiber at A.GP.CS.CMU.EDU (David.Servan-Schreiber@A.GP.CS.CMU.EDU) Date: Wed, 09 Nov 88 00:05:00 EST Subject: Technical report announcement Message-ID: <27375.595055100@A.GP.CS.CMU.EDU> The following technical report is available upon request: ENCODING SEQUENTIAL STRUCTURE IN SIMPLE RECURRENT NETWORKS David Servan-Schreiber, Axel Cleeremans & James L. McClelland CMU-CS-88-183 We explore a network architecture introduced by Elman (1988) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time-step t-1, together with element t, to predict element t+1. When the network is trained with strings from a particular finite-state grammar, it can learn to be a perfect finite-state recognizer for the grammar. When the net has a minimal number of hidden units, patterns on the hidden units come to correspond to the nodes of the grammar; however, this correspondence is not necessary for the network to act as a perfect finite-state recognizer. We explore the conditions under which the network can carry information about distant sequential contingencies across intervening elements to distant elements. Such information is maintained with relative ease if it is relevant at each intermediate step; it tends to be lost when intervening elements do not depend on it. At first glance this may suggest that such networks are not relevant to natural language, in which dependencies may span indefinite distances. However, embeddings in natural language are not completely independent of earlier information. The final simulation shows that long distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information. Send surface mail to: Department of Computer Science Carnegie Mellon University Pittsburgh, PA. 15213-3890 U.S.A. or electronic mail to Ms. Terina Jett: Jett at CS.CMU.EDU (ARPA net) Ask for technical report CMU-CS-88-183. From pratt at paul.rutgers.edu Wed Nov 9 14:37:50 1988 From: pratt at paul.rutgers.edu (Lorien Y. Pratt) Date: Wed, 9 Nov 88 14:37:50 EST Subject: E.
Tzanakou to speak on ALOPEX Message-ID: <8811091937.AA00231@paul.rutgers.edu> Fall, 1988 Neural Networks Colloquium Series at Rutgers ALOPEX: Another optimization method ----------------------------------- E. Tzanakou Rutgers University Biomedical Engineering Room 705 Hill Center, Busch Campus Friday November 18, 1988 at 11:10 am Refreshments served before the talk Abstract The ALOPEX process was developed in the early 70's by Harth and Tzanakou as an automated method of mapping Visual Receptive Fields in the Visual Pathway of animals. Since then it has been used as a "universal" optimization method that lends itself to a number of optimization problems. The method uses a cost function that is calculated by the simultaneous convergence of a large number of parameters. It is iterative and stochastic in nature and has the tendency to avoid local extrema. Computing times largely depend on the number of iterations required for convergence and on times required to compute the cost function. As such they are problem dependent. On the other hand ALOPEX has a unique inherent feature, i.e. it can run in a parallel manner, by which the computing times can be reduced. Several applications of the method in physical, physiological and pattern recognition problems will be discussed. From dewan at paul.rutgers.edu Wed Nov 9 18:35:46 1988 From: dewan at paul.rutgers.edu (Hasanat Dewan) Date: Wed, 9 Nov 88 18:35:46 EST Subject: Dave Handelman at Sarnoff on knowledge-based + nnets for robots Message-ID: <8811092335.AA05411@surfers.rutgers.edu> From terry at cs.jhu.edu Wed Nov 9 19:06:29 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Wed, 9 Nov 88 19:06:29 est Subject: Frontiers in Neuroscience Message-ID: <8811100006.AA28449@crabcake.cs.jhu.edu> The latest issue of Science (4 November) has a special section on Frontiers in Neuroscience. The cover is a spectacular image of a Purkinje cell by Dave Tank. Four of the major reviews in the issue make contact with network modeling: Tom Brown et al. on Long-Term Synaptic Potentiation; Steve Lisberger on The Neural Basis for Learning of Simple Motor Skills; Steve Wise and Bob Desimone on Insights into Seeing and Grasping; and Pat Churchland and Terry Sejnowski on Perspectives on Cognitive Neuroscience. See also the letter by Dave Tank et al. on Spatially Resolved Calcium Dynamics of Mammalian Purkinje Cells in Cerebellar Slice. This issue was timed to coincide with the Annual Meeting of the Society for Neuroscience in Toronto next week. Terry ----- From rba at flash.bellcore.com Thu Nov 10 16:52:24 1988 From: rba at flash.bellcore.com (Robert B Allen) Date: Thu, 10 Nov 88 16:52:24 EST Subject: No subject Message-ID: <8811102152.AA21305@flash.bellcore.com> Subject: Report Available - Connectionist State Machines Connectionist State Machines Robert B. Allen Bellcore, November 1988 Performance of sequential adaptive networks on a number of tasks was explored. For example, the ability to respond to continuous sequences was demonstrated first with a network which was trained to flag a given subsequence and, in a second study, to generate responses to transitions conditional upon previous transitions. Another set of studies demonstrated that the networks are able to recognize legal strings drawn from simple context-free grammars and regular expressions. Finally, sequential networks were also shown to be trainable to generate long strings. In some cases, adaptive schedules were introduced to gradually extend the network's processing of strings.
Contact: rba at bellcore.com Robert B. Allen 2A-367 Bellcore Morristown, NJ 07960-1910 From harnad at confidence.Princeton.EDU Fri Nov 11 02:32:57 1988 From: harnad at confidence.Princeton.EDU (Stevan Harnad) Date: Fri, 11 Nov 88 02:32:57 EST Subject: BBS Call For Commentators: The Tag Assignment Problem Message-ID: <8811110732.AA00839@psycho.Princeton.EDU> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal providing Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. To be considered as a commentator or to suggest other appropriate commentators, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ A SOLUTION TO THE TAG-ASSIGNMENT PROBLEM FOR NEURAL NETWORKS Gary W. Strong Bruce A. Whitehead College of Information Studies Computer Science Program Drexel University University of Tennessee Space Institute Philadelphia, PA 19104 USA Tullahoma, TN 37388 USA ABSTRACT: Purely parallel neural networks can model object recognition in brief displays -- the same conditions under which illusory conjunctions (the incorrect combination of features into perceived objects in a stimulus array) have been demonstrated empirically (Treisman & Gelade 1980; Treisman 1986). Correcting errors of illusory conjunction is the "tag-assignment" problem for a purely parallel processor: the problem of assigning a spatial tag to nonspatial features, feature combinations and objects. This problem must be solved to model human object recognition over a longer time scale. A neurally plausible model has been constructed which simulates both the parallel processes that may give rise to illusory conjunctions and the serial processes that may solve the tag-assignment problem in normal perception. One component of the model extracts pooled features and another provides attentional tags that can correct illusory conjunctions. Our approach addresses two questions: (i) How can objects be identified from simultaneously attended features in a parallel, distributed representation? (ii) How can the spatial selection requirements of such an attentional process be met by a separation of pathways between spatial and nonspatial processing? Analysis of these questions yields a neurally plausible simulation model of tag assignment, based on synchronization of neural activity for features within a spatial focus of attention. KEYWORDS: affordance; attention; connectionist network; eye movements; illusory conjunction; neural network; object recognition; retinotopic representations; saccades; spatial localization From rr%eusip.edinburgh.ac.uk at NSS.Cs.Ucl.AC.UK Fri Nov 11 06:29:38 1988 From: rr%eusip.edinburgh.ac.uk at NSS.Cs.Ucl.AC.UK (Richard Rohwer) Date: Fri, 11 Nov 88 11:29:38 GMT Subject: seperability and unbalanced data discussion Message-ID: <11289.8811111129@eusip.ed.ac.uk> In a close inspection of convergence ailments afflicting a multilayer net, I found that the problem boiled down to a layer which needed to learn the separable AND function, but wasn't. So I had a close look at the LMS error function for AND, in terms of the weights from each of the two inputs, the bias weight, and the multiplicities of each of the 4 exemplars in the truth table.
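Written out explicitly (assuming a single logistic output unit, which the posting does not actually specify, and writing n_{ij} for the multiplicity of the exemplar with inputs (x_1, x_2) = (i, j) and target t_{ij}), the error function in question is

    E(w_1, w_2, b) \;=\; \sum_{i,j \in \{0,1\}} n_{ij} \, \bigl( \sigma(w_1 i + w_2 j + b) - t_{ij} \bigr)^2 ,
    \qquad \sigma(u) = \frac{1}{1 + e^{-u}} ,

with AND targets t_{11} = 1 and t_{00} = t_{01} = t_{10} = 0; the quartic error function mentioned further on simply replaces the square by a fourth power.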
It turns out that the error can not be made exactly 0 (with finite weights), so minimization of the error involves a tradeoff between the contributions of the 4 exemplars, and this tradeoff is strongly influenced by the multiplicities. It is not difficult to find the minimum analytically in this problem, so I was able to verify that with my highly unbalanced training data, the actual minimum was precisely where the LMS algorithm had terminated, miles away from a reasonable solution for AND. I also found that balanced data puts the minimum where it "belongs". The relative importance of the different exemplars in the LMS error function runs as the square root of the ratio of their multiplicities. So I solved my particular problem by turning to a quartic error function, for which it is the 4th root of this ratio that matters. (The p-norm, p-th root of the sum of the p-th powers, approaches the MAX norm as p approaches infinity, and 4 is much closer to infinity than 2.) ---Richard Rohwer, CSTR, Edinburgh From jose at tractatus.bellcore.com Sun Nov 13 13:01:41 1988 From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Sun, 13 Nov 88 13:01:41 EST Subject: seperability and unbalanced data discussion Message-ID: <8811131801.AA08242@tractatus.bellcore.com> See Hanson S. J. & Burr, D., Minkowski-r Backpropagation: Learning in Connectionist Networks with non-euclidean error metrics, in D. Anderson, Neural Information Processing: Natural and Synthetic, AIP, 1988. We look at some similar cases.. Steve From russ%yummy at gateway.mitre.org Mon Nov 14 09:36:19 1988 From: russ%yummy at gateway.mitre.org (russ%yummy@gateway.mitre.org) Date: Mon, 14 Nov 88 09:36:19 EST Subject: Paper Request In-Reply-To: Robert B Allen's message of Thu, 10 Nov 88 16:52:24 EST <8811102152.AA21305@flash.bellcore.com> Message-ID: <8811141436.AA11255@baklava.mitre.org> Please send me a copy of the paper: Connectionist State Machines Robert B. Allen Bellcore, November 1988 Thanks, Russ. ARPA: russ%yummy at gateway.mitre.org Russell Leighton MITRE Signal Processing Lab 7525 Colshire Dr. McLean, Va. 22102 USA From sylvie at wildcat.caltech.edu Mon Nov 14 19:48:46 1988 From: sylvie at wildcat.caltech.edu (sylvie ryckebusch) Date: Mon, 14 Nov 88 16:48:46 pst Subject: Roommate for NIPS 88 Message-ID: <8811150048.AA12205@wildcat.caltech.edu> Zhaoping Li, a graduate student at Caltech, is looking for a woman to share a room at the NIPS conference in Denver. Anyone who is interested can contact Zhaoping by sending her mail at: zl at aurel.caltech.edu. From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Wed Nov 16 15:22:35 1988 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (Thanasis Kehagias) Date: Wed, 16 Nov 88 15:22:35 EST Subject: the Rochester Connectionist Simulator Message-ID: i believe there is a package of software called the Rochester Connectionist Simulator. i have the following questions about it: 1. will it run on a SUN? 2. what operating system does it need? 3. is it public domain? 4. can i get it by ftp? 5. how big is it? 6. is it source code (in C)? or object code? 7. does it come with documentation? you can mail me the answers; i will summarize and post ... 
Thanasis From steve at psyche.mit.edu Mon Nov 14 14:45:01 1988 From: steve at psyche.mit.edu (Steve Pinker) Date: Mon, 14 Nov 88 14:45:01 est Subject: Assistant Professor Position at MIT Message-ID: <8811141953.AA18508@ATHENA.MIT.EDU> November 8, 1988 JOB ANNOUNCEMENT The Department of Brain and Cognitive Sciences (formerly the Department of Psychology) of the Massachusetts Institute of Technology is seeking applicants for a nontenured, tenure-track position in Cognitive Science, with a preferred specialization in psycholinguistics, reasoning, or knowledge representation. The candidate must show promise of developing a distinguished research program, preferably one that combines human experimentation with computational modeling or formal analysis, and must be a skilled teacher. He or she will be expected to participate in the department's educational programs in cognitive science at the undergraduate and graduate levels, including supervising students' experimental research and offering courses in Cognitive Science or Psycholinguistics. Applications must include a brief cover letter stating the candidate's research and teaching interests, a resume, and at least three letters of recommendation, which must arrive by January 1, 1989. Address applications to: Cognitive Science Search Committee Attn: Steven Pinker, Chair E10-018 Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02139 From strom at ogccse.ogc.edu Mon Nov 14 18:03:22 1988 From: strom at ogccse.ogc.edu (Dan Hammerstrom) Date: Mon, 14 Nov 88 15:03:22 PST Subject: ReEstablish Connection Message-ID: <8811142303.AA04135@ogccse.OGC.EDU> After having been on the connectionists mailing list for a couple of years, I seem to have been removed recently. No doubt the work of some nasty virus. Could you please put my name back on the list. Thanks. Dan Hammerstrom: strom at cse.ogc.edu From netlist at psych.Stanford.EDU Thu Nov 17 20:06:02 1988 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Thu, 17 Nov 88 17:06:02 PST Subject: Stanford Adaptive Networks Colloquium Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Nov. 22nd (Tuesday, 3:15pm) ************************************************************************** Toward a model of speech acquisition: Supervised learning and systems with excess degrees of freedom MICHAEL JORDAN E10-034C Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02139 ************************************************************************** Abstract The acquisition of speech production is an interesting domain for the development of connectionist learning methods. In this talk, I will focus on a particular component of the speech learning problem, namely, that of finding an inverse of the function that relates articulatory events to perceptual events. A problem for the learning of such an inverse is that the forward function is many-to-one and nonlinear. That is, there are many possible target vectors corresponding to each perceptual input, but the average target is not in general a solution. I will argue that this problem is best resolved if targets are specified implicitly with sets of constraints, rather than as particular vectors (as in direct inverse system identification).
Two classes of constraints are distinguished---paradigmatic constraints, which implicitly specify inverse images in articulatory space, and syntagmatic constraints, which define relationships between outputs produced at different points in time. (The latter include smoothness constraints on articulatory representations, and distinctiveness constraints on perceptual representations). I will discuss how the interactions between these classes of constraints may account for two kinds of variability in speech: coarticulation and historical change. ************************************************************************** Location: Room 380-380W, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Information: To be added to the network mailing list, netmail to netlist at psych.stanford.edu For additional information, contact Mark Gluck (gluck at psych.stanford.edu). Upcoming talks: Dec. 6: Ralph Linsker (IBM) Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ.
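A small numerical illustration of the abstract's point that, when the forward function is many-to-one, the average of several valid targets need not itself be a valid target. The sketch below stands in a planar two-joint arm for the forward function; the arm, its link lengths and the particular angles are assumptions made for the illustration and have nothing to do with the actual articulatory-to-perceptual mapping of the talk.

import numpy as np

def forward(theta1, theta2, l1=1.0, l2=1.0):
    """Endpoint of a planar two-link arm: a many-to-one forward function."""
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return np.array([x, y])

elbow_down = (0.0, np.pi / 2)          # one joint setting that reaches (1, 1)
elbow_up   = (np.pi / 2, -np.pi / 2)   # a different setting that also reaches (1, 1)
average    = tuple((a + b) / 2 for a, b in zip(elbow_down, elbow_up))

print(forward(*elbow_down))    # (1, 1) up to rounding
print(forward(*elbow_up))      # (1, 1) up to rounding
print(forward(*average))       # roughly (1.414, 1.414), not a solution at all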
From rsun at cs.brandeis.edu Thu Nov 17 09:47:27 1988 From: rsun at cs.brandeis.edu (Ron Sun) Date: Thu, 17 Nov 88 09:47:27 est Subject: paper Message-ID: Could you send me a copy of your paper: connectionist state machine? Thank you.
Ron Sun Brandeis Univ. CS dept. Waltham, MA 02254 From keeler at mcc.com Tue Nov 22 20:04:48 1988 From: keeler at mcc.com (Jim Keeler) Date: Tue, 22 Nov 88 19:04:48 CST Subject: neural net job opening at MCC Message-ID: <8811230104.AA08091@r2d2.aca.mcc.com> WANTED: CONNECTIONIST/NEURAL NET RESEARCHERS MCC (Microelectronics and Computer Technology Corporation, Austin Texas) is looking for research scientists to join our newly formed neural network research team. We are looking for researchers with strong theoretical skills in Physics, Electrical Engineering or Computer Science (Ph. D. level or above preferred). The research will focus on (non-military), fundamental questions about neural networks including -Scaling and improvement of existing algorithms -Development of new learning algorithms -Temporal pattern recognition and processing -Reverse engineering of biological networks -Optical neural network architectures MCC offers competitive salaries and a very stimulating, academic-like research environment. Contact Jim Keeler at jdk.mcc.com or Haran Boral at haran.mcc.com Or contact Jim Keeler at the NIPS conference in Denver. From watrous at linc.cis.upenn.edu Wed Nov 23 15:53:30 1988 From: watrous at linc.cis.upenn.edu (Raymond Watrous) Date: Wed, 23 Nov 88 15:53:30 EST Subject: Tech Report - Connectionist Speech Recognition Message-ID: <8811232053.AA26325@linc.cis.upenn.edu> The following technical report is available from the Department of Computer and Information Science, University of Pennsylvania: Speech Recognition Using Connectionist Networks Raymond L. Watrous MS-CIS-88-96 LINC LAB 138 Abstract The use of connectionist networks for speech recognition is assessed using a set of representative phonetic discrimination problems. The problems are chosen with respect to the physiological theory of phonetics in order to give broad coverage to the space of articulatory phonetics. Separate network solutions are sought to each phonetic discrimination problem. A connectionist network model called the Temporal Flow Model is defined which consists of simple processing units with single valued outputs interconnected by links of variable weight. The model represents temporal relationships using delay links and permits general patterns of connectivity including feedback. It is argued that the model has properties appropriate for time varying signals such as speech. Methods for selecting network architectures for different recognition problems are presented. The architectures discussed include random networks, minimally structured networks, hand crafted networks and networks automatically generated based on samples of speech data. Networks are trained by modifying their weight parameters so as to minimize the mean squared error between the actual and the desired response of the output units. The desired output unit response is specified by a target function. Training is accomplished by a second order method of iterative nonlinear optimization by gradient descent which incorporates a method for computing the complete gradient of recurrent networks. Network solutions are demonstrated for all eight phonetic discrimination problems for one male speaker. The network solutions are analyzed carefully and are shown in every case to make use of known acoustic phonetic cues. The network solutions vary in the degree to which they make use of context dependent cues to achieve phoneme recognition. The network solutions were tested on data not used for training and achieved an average accuracy of 99.5%. 
Methods for extending these results to a single network for recognizing the complete phoneme set from continuous speech obtained from different speakers are outlined. It is concluded that acoustic phonetic speech recognition can be accomplished using connectionist networks. +++++++++++++++++++++++++++++++++++++++++++++++++++++ This report is available from: James Lotkowski Technical Report Facility Room 269/Moore Building Computer Science Department University of Pennsylvania 200 South 33rd Street Philadelphia, PA 19104-6389 or james at central.cis.upenn.edu Please do not request copies of this report from me. Copies of the report cost approximately $19.00 which covers duplication (300 pages) and postage. I will bring a 'desk copy' to NIPS. As of December 1, I will be affiliated with the University of Toronto. My address will be: Department of Computer Science University of Toronto 10 King's College Road Toronto, Canada M5S 1A4 watrous at ai.toronto.edu From niranjan%digsys.engineering.cambridge.ac.uk at NSS.Cs.Ucl.AC.UK Wed Nov 23 07:42:17 1988 From: niranjan%digsys.engineering.cambridge.ac.uk at NSS.Cs.Ucl.AC.UK (M. Niranjan) Date: Wed, 23 Nov 88 12:42:17 GMT Subject: Radial basis functions etc. Message-ID: <11560.8811231242@dsl.eng.cam.ac.uk> Some recent comments on RBFs that you might find interesting! niranjan ============================== ONE ===================================== Date: Wed, 16 Nov 88 09:41:13 +0200 From: Dario Ringach To: ajr Subject: Comments on TR.25 Thanks a lot for the papers! I'd like to share a few thoughts on TR.25 "Generalising the Nodes of the Error Propagation Network". What are the advantages of choosing radial basis functions (or Gaussian nodes) in *general* discrimination tasks? It seems clear to me that the results presented in Table 1 are due to the fact that the spectral distribution of steady state vowels can be closely represented by normal/radial distributions. If I have no a priori information about the distribution of the classes then how can I know which kind of nodes will perform better? I think that in this case the best we can do is to look at the combinatorial problem of how many partitions of the n-dimensional Euclidean space can be obtained using N (proposed shape) boundaries. This is closely related to obtaining the Vapnik-Chervonenkis Dimension of the boundary class. In the case of n-dimensional hyperplanes and hyperspheres, both have VC-dimension n+1, so I think there is really no difference in using hyperplanes or hyperspheres in *general* discrimination problems. Don't you agree? Thanks again for the papers! Dario ============================== TWO ===================================== From: M. Niranjan Date: Mon, 21 Nov 88 14:03:16 GMT To: dario at bitnet.techunix Subject: RBF etc Cc: ajr, dr, fallside, idb, jpmg, lwc, mdp, mwhc, niranjan, psts, rwp, tpm, visakan, vsa10 With RBFs of the Gaussian type, the class conditional density function is approximated by a mixture of multiple Gaussians. But the parameters of the mixture are estimated to maximise the discrimination rather than modelling the individual probability densities. > If I have no a priori information about the distribution of the classes > then how can I know which kind of nodes will perform better? There is no way other than by a set of experiments. In small scale problems, we can probably plot cross sections of the feature space, or even projections of it on a linear discriminant plane and get some rough idea.
> problem of how many partitions of the n-dimensional Euclidean space > can be obtained using N (proposed shape) boundaries. It is not how many different partitions; I think our problem in pattern classification is dealing with breakpoints of class boundary. It is this capability that is the power in MLPs (and RBFs). In a two class problem, we still partition the input space into two using N boundary segments (or splines), with N-1 break-points. What I like about RBFs is that you can have a probabilistic interpretation. With standard MLPs this is not very obvious and what happens is more like a functional interpolation. > both have VC-dimension n+1, so I think there is really no difference I don't know what VC-dimension is. Any reference please? Best wishes niranjan ============================ THREE ======================================= Date: Tue, 22 Nov 88 08:22:53 +0200 From: Dario Ringach To: M. Niranjan Subject: Re: RBF etc Thanks for your Re! [some stuff deleted] > > I think that in this case the best we can do is to look at the combinatorial > > problem of how many partitions of the n-dimensional Euclidean space > > can be obtained using N (proposed shape) boundaries. > > It is not how many different partitions; I think our problem in pattern > classification is dealing with breakpoints of class boundary. It is this > capability that is the power in MLPs (and RBFs). In a two class problem, > we still partition the input space into two using N boundary segments > (or splines), with N-1 break-points. Sure, I agree. But if you address the question of how many hidden units of a determined type you need to classify the input vector into one of N distinct classes, and consider it a rough measure of the complexity of the boundary class proposed for the units, then the problem seems to be the one of partitioning the input space. Note that I don't care about the nature of the class shapes in real world problems, in this case I must agree with you that the issue of breakpoints of the class boundary becomes of real importance. [...] > > I don't know what VC-dimension is. Any reference please? > An earlier draft is "Classifying Learnable Geometric Concepts with the Vapnik-Chervonenkis Dimension" by D. Haussler et al, at FOCS '86, pp 273-282. But if you don't know what Valiant's learnability model is take a look at "A Theory of the Learnable" by L. Valiant, CACM 27(11), 1984, pp 1134-42. The original article by Vapnik and Chervonenkis is "On the Uniform Convergence of Relative Frequencies of Events to their Probabilities", Th. Prob. and its Appl., 16(2), 1971, pp 264-80. More up-to-date papers dealing with the VC-dimension can be found in the Proc. of the first Workshop on Computational Learning Theory, COLT '88, held at MIT last June. --Dario. =========================== THE END ===================================== From pwh at ece-csc.ncsu.edu Fri Nov 25 14:21:29 1988 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Fri, 25 Nov 88 14:21:29 EST Subject: No subject Message-ID: <8811251921.AA29034@ece-csc.ncsu.edu> NEURAL NETWORKS CALL FOR PAPERS IEEE International Conference on Neural Networks June 19-22, 1989 Washington, D.C. The 1989 IEEE International Conference on Neural Networks (ICNN-89) will be held at the Sheraton Washington Hotel in Washington, D.C., USA from June 19-22, 1989. ICNN-89 is the third annual conference in a series devoted to the technology of neurocomputing in its academic, industrial, commercial, consumer, and biomedical engineering aspects.
The series is sponsored by the IEEE Technical Activities Board Neural Network Committee, created Spring 1988. ICNN-87 and 88 were huge successes, both in terms of large attendance and high quality of the technical presentations. ICNN-89 continues this tradition. It will be by far the largest and most important neural network meeting of 1989. As in the past, the full text of papers presented orally in the technical sessions will be published in the Conference Proceedings (along with some particularly outstanding papers from the Poster Sessions). The Abstract portions of all poster papers not published in full will also be published in the Proceedings. The Conference Proceedings will be distributed at the registration desk to all regular conference registrants as well as to all student registrants. This gives conference participants the full text of every paper presented in each technical session -- which greatly increases the value of the conference. ICNN is the only major neural network conference in the world to offer this feature. As is now the tradition, ICNN-89 will include a day of tutorials (June 18), the exhibit hall (the neurocomputing industry's primary annual tradeshow), plenary talks, and social events. Mark your calendar today and plan to attend IEEE ICNN-89 -- the definitive annual progress report on the neurocomputing revolution! DEADLINE FOR SUBMISSION OF PAPERS for ICNN-89 is February 1, 1989. Papers of 8 pages or less are solicited in the following areas: -Real World Applications -Associative Memory -Supervised Learning Theory -Image Processing -Reinforcement Learning Theory -Self-Organization -Robotics and Control -Neurobiological Models -Optical Neurocomputers -Vision -Optimization -Electronic Neurocomputers -Neural Network Theory & Architectures Papers should be prepared in standard IEEE Conference Proceedings Format, and typed on the special forms provided in the Author's Kit. The Title, Author Name, Affiliation, and Abstract portions of the first page of the paper must be less than a half page in length. Indicate in your cover letter which of the above subject areas you wish your paper included in and whether you wish your paper to be considered for oral presentation, presentation as a poster, or both. For papers with multiple authors, indicate the name and address of the author to whom correspondence should be sent. Papers submitted for oral presentation may, at the referee's discretion, be designated for poster presentation instead, if they feel this would be more appropriate. FULL PAPERS in camera-ready form (1 original and 5 copies) should be submitted to Nomi Feldman, Conference Coordinator, at the address below. For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, ICNN-89 Conference Coordinator 3770 Tansy Street San Diego, CA 92121 (619) 453-6222 From moody-john at YALE.ARPA Fri Nov 25 17:01:03 1988 From: moody-john at YALE.ARPA (john moody) Date: Fri, 25 Nov 88 17:01:03 EST Subject: car pooling from Denver Airport to NIPS conference hotel Message-ID: <8811252159.AA02309@NEBULA.SUN3.CS.YALE.EDU> I'm arriving at Denver Airport at 10:15 PM Monday night (after the last shuttle leaves the airport for the hotel) and will probably have to rent a Hertz car to get to the hotel. Would anyone out there arriving Monday night like to car pool with me and possibly split the cost of a one-day car rental? (Starving students are welcome to tag along for free.) If interested, please reply ASAP. 
--John Moody (203)432-6493 ------- From harnad at Princeton.EDU Sun Nov 27 12:35:11 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 27 Nov 88 12:35:11 EST Subject: Explanatory Coherence: BBS Call for Commentators Message-ID: <8811271735.AA08252@psycho.Princeton.EDU> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal providing Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. To be considered as a commentator or to suggest other appropriate commentators, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ EXPLANATORY COHERENCE Paul Thagard Cognitive Science Laboratory Princeton University Princeton NJ 08542 Keywords: Connectionist models, artificial intelligence, explanation, coherence, reasoning, decision theory, philosophy of science This paper presents a new computational theory of explanatory coherence that applies both to the acceptance and rejection of scientific hypotheses and to reasoning in everyday life. The theory consists of seven principles that establish relations of local coherence between a hypothesis and other propositions that explain it, are explained by it, or contradict it. An explanatory hypothesis is accepted if it coheres better overall than its competitors. The power of the seven principles is shown by their implementation in a connectionist program called ECHO, which has been applied to such important scientific cases as Lavoisier's argument for oxygen against the phlogiston theory and Darwin's argument for evolution against creationism, and also to cases of legal reasoning. The theory of explanatory coherence has implications for artificial intelligence, psychology, and philosophy.
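A minimal sketch of the kind of constraint network the abstract describes, under the usual reading of such models: propositions become units, explanatory relations become symmetric excitatory links, contradictions become inhibitory links, and the hypothesis left with the higher activation after the network settles is the one accepted. The propositions, weights, decay and update rule below are invented simplifications, not Thagard's actual ECHO parameters.

import numpy as np

units = ["SPECIAL", "E1", "E2", "H1", "H2"]     # SPECIAL is a clamped evidence source
idx = {u: i for i, u in enumerate(units)}
W = np.zeros((len(units), len(units)))

def link(a, b, w):
    """Symmetric link between two units."""
    W[idx[a], idx[b]] = W[idx[b], idx[a]] = w

link("SPECIAL", "E1", 0.1); link("SPECIAL", "E2", 0.1)   # data priority for evidence
link("H1", "E1", 0.1); link("H1", "E2", 0.1)             # H1 explains E1 and E2
link("H2", "E2", 0.1)                                    # H2 explains only E2
link("H1", "H2", -0.2)                                   # H1 and H2 contradict

a = np.full(len(units), 0.01)
a[idx["SPECIAL"]] = 1.0
decay = 0.05
for _ in range(200):                                     # settle the network
    net = W @ a
    grow = np.where(net > 0, net * (1.0 - a), net * (a + 1.0))
    a = np.clip(a * (1.0 - decay) + grow, -1.0, 1.0)
    a[idx["SPECIAL"]] = 1.0                              # keep the evidence driven

print({u: round(float(a[idx[u]]), 3) for u in ("H1", "H2")})
# H1, which explains more of the evidence, settles to the higher activation.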
From John.Hampshire at SPEECH2.CS.CMU.EDU Mon Nov 28 14:02:50 1988 From: John.Hampshire at SPEECH2.CS.CMU.EDU (John.Hampshire@SPEECH2.CS.CMU.EDU) Date: Mon, 28 Nov 88 14:02:50 EST Subject: NIPS Speech Workshop Message-ID: This is a preliminary outline for those planning to attend the speech workshop following NIPS 88 in Keystone, CO. For answers to questions/details, please contact Alex Waibel. Speech Workshop ------------------ ----Dec. 1, eve: Overview. All Groups meet. --------------------------- ----Dec. 2, Neural Nets (NN) and Hidden Markov Models (HMM)--------------- 7:30 - 9:30 Introduction. Short Informal Presentations (15 mins each). Connectionist Speech. The HMM/NN Debate. (Alex Waibel, CMU) State of the Art in HMMs (Rich Schwartz, BBN) Links between HMMs and NNs (Herve Bourlard, ICSI) Commonalities, Differences, HMMs, NNs. (John Bridle, RSRE) NNs and HMMs (Richard Lippmann, Lincoln Labs) Brief Questions and Answers. 4:30 - 6:30 Discussion. NNs, HMMs. Strengths and Weaknesses, Commonalities. Comparisons. Performance, Computational Needs, Extensions. Hybrid Approaches. Evening: Highlights. ----Dec. 3, Directions for Connectionist Speech Understanding.----------- 7:30 - 9:30 Introduction. Phoneme Recognition. Word Recognition. Syntax. Semantics. Pragmatics. Integral System Design. Learning Algorithms. Computational Needs/ Limitations. Large Scale Neural System Design. Modularity. Instruction. Heuristic Knowledge. 4:30 - 6:30 Discussion. Extensions. Evening: Highlights. Summary. --------------------------------------------------------- From honavar at cs.wisc.edu Wed Nov 30 18:23:01 1988 From: honavar at cs.wisc.edu (A Buggy AI Program) Date: Wed, 30 Nov 88 17:23:01 CST Subject: Tech report abstracts Message-ID: <8811302323.AA10286@ai.cs.wisc.edu> The following technical reports are now available.
Requests for copies may be sent to: Linda McConnell Technical reports librarian Computer Sciences Department University of Wisconsin-Madison 1210 W. Dayton St. Madison, WI 53706. USA. or by e-mail, to: linda at shorty.cs.wisc.edu PLEASE DO NOT REPLY TO THIS MESSAGE, BUT WRITE TO THE TECH REPORTS LIBRARIAN FOR COPIES. -- Vasant ------------------------------------------------------------------- Computer Sciences TR 793 (also in the proceedings of the 1988 connectionist models summer school, (ed) Sejnowski, Hinton, and Touretzky, Morgan Kaufmann, San Mateo, CA) A NETWORK OF NEURON-LIKE UNITS THAT LEARNS TO PERCEIVE BY GENERATION AS WELL AS REWEIGHTING OF ITS LINKS Vasant Honavar and Leonard Uhr Computer Sciences Department University of Wisconsin-Madison Madison, WI 53706. U.S.A. Abstract Learning in connectionist models typically involves the modification of weights associated with the links between neuron-like units; but the topology of the network does not change. This paper describes a new connectionist learning mechanism for generation in a network of neuron-like elements that enables the network to modify its own topology by growing links and recruiting units as needed (possibly from a pool of available units). A combination of generation and reweighting of links, and appropriate brain-like constraints on network topology, together with regulatory mechanisms and neuronal structures that monitor the network's performance and enable the network to decide when to generate, is shown capable of discovering, through feedback-aided learning, substantially more powerful, and potentially more practical, networks for perceptual recognition than those obtained through reweighting alone. The recognition cones model of perception (Uhr 1972, Honavar 1987, Uhr 1987) is used to demonstrate the feasibility of the approach. Results of simulations of carefully pre-designed recognition cones illustrate the usefulness of brain-like topological constraints such as near-neighbor connectivity and converging-diverging heterarchies for the perception of complex objects (such as houses) from digitized TV images. In addition, preliminary results indicate that brain-structured recognition cone networks can successfully learn to recognize simple patterns (such as letters of the alphabet, drawings of objects like cups and apples), using generation-discovery as well as reweighting, whereas systems that attempt to learn using reweighting alone fail to learn. ------------------------------------------------------------------- Computer Sciences TR 805 Experimental Results Indicate that Generation, Local Receptive Fields and Global Convergence Improve Perceptual Learning in Connectionist Networks Vasant Honavar and Leonard Uhr Computer Sciences Department University of Wisconsin-Madison Abstract This paper presents and compares results for three types of connectionist networks: [A] Multi-layered converging networks of neuron-like units, with each unit connected to a small randomly chosen subset of units in the adjacent layers, that learn by re-weighting of their links; [B] Networks of neuron-like units structured into successively larger modules under brain-like topological constraints (such as layered, converging-diverging heterarchies and local receptive fields) that learn by re-weighting of their links; [C] Networks with brain-like structures that learn by generation-discovery, which involves the growth of links and recruiting of units in addition to re-weighting of links.
Preliminary empirical results from simulation of these networks for perceptual recognition tasks show large improvements in learning from using brain-like structures (e.g., local receptive fields, global convergence) over networks that lack such structure; further substantial improvements in learning result from the use of generation in addition to reweighting of links. We examine some of the implications of these results for perceptual learning in connectionist networks. -------------------------------------------------------------------
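A toy sketch of the generation idea running through both abstracts, under invented assumptions: an XOR stand-in task, plain gradient-descent reweighting, and a crude error-plateau test for deciding when to recruit a new hidden unit. It reproduces none of the recognition-cones structure or the brain-like topological constraints of TR 793 and TR 805.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 0.])                     # XOR, an invented stand-in task

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid hidden layer feeding a linear output unit; start with 1 hidden unit.
W1 = rng.normal(scale=0.5, size=(1, 2)); b1 = np.zeros(1)
w2 = rng.normal(scale=0.5, size=1);      b2 = 0.0

def sse():
    h = sigmoid(X @ W1.T + b1)
    return float(np.sum((t - (h @ w2 + b2)) ** 2))

lr, prev = 0.1, sse()
for epoch in range(20000):
    h = sigmoid(X @ W1.T + b1)                     # hidden activations
    y = h @ w2 + b2
    e = y - t
    gh = np.outer(e, w2) * h * (1.0 - h)           # backprop through the sigmoids
    W1 -= lr * (gh.T @ X); b1 -= lr * gh.sum(0)    # reweighting of existing links
    w2 -= lr * (h.T @ e);  b2 -= lr * e.sum()
    if epoch % 1000 == 999:                        # crude plateau test
        cur = sse()
        if prev - cur < 1e-3 and cur > 0.05:       # stuck and still wrong:
            W1 = np.vstack([W1, rng.normal(scale=0.5, size=(1, 2))])
            b1 = np.append(b1, 0.0)                # ...generate (recruit) a unit
            w2 = np.append(w2, 0.0)
            print(f"epoch {epoch + 1}: error {cur:.3f}, growing to "
                  f"{W1.shape[0]} hidden units")
        prev = cur

print("final error:", round(sse(), 4), "with", W1.shape[0], "hidden units")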