From mike%bucasb.bu.edu at bu-it.BU.EDU Mon Aug 1 12:08:42 1988 From: mike%bucasb.bu.edu at bu-it.BU.EDU (Michael Cohen) Date: Mon, 1 Aug 88 12:08:42 EDT Subject: FIRST ANNUAL MEETING OF THE INTERNATIONAL NEURAL NETWORK SOCIETY Message-ID: <8808011608.AA06511@bucasb.bu.edu> -----Meeting Update----- September 6--10, 1988 Park Plaza Hotel Boston, Massachusetts The first annual INNS meeting promises to be a historic event. Its program includes the largest selection of investigators ever assembled to present the full range of neural network research and applications. The meeting will bring together over 2000 scientists, engineers, students, government administrators, industrial commercializers, and financiers. It is rapidly selling out. Reserve now to avoid disappointment. Call J.R. Shuman Associates, (617) 237-7931, for information about registration. For information about hotel reservations, call the Park Plaza Hotel at (800) 225-2008 and reference "Neural Networks." If you call from Massachusetts, call (800) 462-2022. There will be 600 scientific presentations, including tutorials, plenary lectures, symposia, and contributed oral and poster presentations. Over 50 exhibits are already reserved for industrial firms, publishing houses, and government agencies. The full day of tutorials presented on September 6 will be given by Gail Carpenter, John Daugman, Stephen Grossberg, Morris Hirsch, Teuvo Kohonen, David Rumelhart, Demetri Psaltis, and Allen Selverston. The plenary lecturers are Stephen Grossberg, Carver Mead, Terrence Sejnowski, Nobuo Suga, and Bernard Widrow. Approximately 30 symposium lectures will be given, 125 contributed oral presentations, and 400 poster presentations. Fourteen professional societies are cooperating with the INNS meeting. They are: American Association for Artificial Intelligence, American Mathematical Society, Association for Behavior Analysis, Cognitive Science Society, IEEE Boston Section, IEEE Computer Society, IEEE Control Systems Society, IEEE Engineering in Medicine and Biology Society, IEEE Systems, Man and Cybernetics Society, Optical Society of America, Society for Industrial and Applied Mathematics, Society for Mathematical Biology, Society of Photo-Optical Instrumentation Engineers, and Society for the Experimental Analysis of Behavior. DO NOT MISS THE FIRST BIRTHDAY CELEBRATION OF THIS IMPORTANT NEW RESEARCH COALITION! From jdk at riacs.edu Mon Aug 1 13:03:44 1988 From: jdk at riacs.edu (Jim Keeler) Date: Mon, 1 Aug 88 10:03:44 pdt Subject: Job opening at MCC Message-ID: <8808011703.AA28463@hydra.riacs.edu> WANTED: CONNECTIONIST/NEURAL NET RESEARCHERS MCC (Microelectronics and Computer Technology Corporation, Austin, Texas) is looking for research scientists to join our newly formed neural network research team. We are looking for researchers with strong theoretical skills in Physics, Electrical Engineering or Computer Science (Ph.D. level or above preferred). The following is a partial list of research topics that the group will address: -Scaling and improvement of existing algorithms -Development of new learning algorithms -Temporal pattern recognition and processing -Reverse engineering of biological networks -Optical neural network architectures MCC offers competitive salaries and a very stimulating research environment. 
Contact Jim Keeler at jdk.riacs.edu or Haran Boral at haran.mcc.com From solla at homxb.att.com Tue Aug 2 10:41:00 1988 From: solla at homxb.att.com (solla@homxb.att.com) Date: Tue, 2 Aug 88 10:41 EDT Subject: No subject Message-ID: The following preprint is available. If you want a copy, please send your request to: Sara A. Solla AT&T Bell Laboratories, Rm 4G-336 Crawfords Corner Road Holmdel, NJ 07733 solla at homxb.att.com ************************************************************************ ACCELERATED LEARNING IN LAYERED NEURAL NETWORKS Sara A. Solla AT&T Bell Laboratories, Holmdel NJ 07733 Esther Levin and Michael Fleisher Technion Israel Institute of Technology, Haifa 32000, Israel ABSTRACT Learning in layered neural networks is posed as the minimization of an error function defined over the training set. A probabilistic interpretation of the target activities suggests the use of relative entropy as an error measure. We investigate the merits of using this error function over the traditional quadratic function for gradient descent learning. Comparative numerical simulations for the contiguity problem show marked reductions in learning times. This improvement is explained in terms of the characteristic roughness of the landscape defined by the error function in configuration space. 
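The gradient comparison behind that abstract can be made concrete with a short, self-contained sketch (mine, not code from the preprint): for a single sigmoid output unit with target t, the quadratic error contributes a gradient (y - t)*y*(1 - y) at the unit's net input, while the relative-entropy error contributes simply (y - t); the extra y*(1 - y) factor crushes the quadratic gradient whenever the unit saturates at the wrong extreme, which is one informal way to picture the difference in learning speed.

/* delta.c -- illustrative only; contrasts the single-unit gradients of the
   quadratic and relative-entropy error measures discussed in the abstract
   above.  For y = 1/(1+exp(-net)) and target t:
       quadratic         E = 0.5*(t-y)^2                   dE/dnet = (y-t)*y*(1-y)
       relative entropy  E = -t*log(y) - (1-t)*log(1-y)    dE/dnet = (y-t)        */
#include <stdio.h>
#include <math.h>

int main()
{
    double net, y, t = 1.0;    /* target fixed at 1.0 for the illustration */

    printf("   net         y    quad dE/dnet   rel-ent dE/dnet\n");
    for (net = -6.0; net <= 6.0; net += 2.0) {
        y = 1.0 / (1.0 + exp(-net));
        printf("%6.1f  %8.5f  %12.6f  %15.6f\n",
               net, y, (y - t) * y * (1.0 - y), (y - t));
    }
    return 0;
}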
************************************************************************ From watrous at linc.cis.upenn.edu Wed Aug 3 09:39:03 1988 From: watrous at linc.cis.upenn.edu (Raymond Watrous) Date: Wed, 3 Aug 88 09:39:03 EDT Subject: Complexity of Second Order Learning Algorithms Message-ID: <8808031339.AA28717@linc.cis.upenn.edu> It is generally assumed that second order learning algorithms are computationally too expensive for use on large problems, since the complexity is O(N**2), N being the number of links. It turns out that the function and gradient evaluations are O(NT), where T is the number of training samples. In order to have statistically adequate training data, T should approximate N, and is typically greater than N. Thus, the computational cost of the function and gradient evaluations exceeds that of the update algorithm. Moreover, since the ratio of function and gradient evaluations to Hessian updates is generally greater than two, the optimization process becomes dominated by function and gradient evaluations rather than by the update operation. The complexity details and several examples are discussed in the technical report (revised): Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization MS-CIS-87-51 available from: James Lotkowski Technical Report Facility Room 269/Moore Building University of Pennsylvania 200 S 33rd Street Philadelphia, PA 19104-6389 james at upenn.cis.edu Ray Watrous From harnad at Princeton.EDU Thu Aug 4 01:46:35 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Thu, 4 Aug 88 01:46:35 edt Subject: Behav. Brain Sci. Call for Commentators: Motor Control Message-ID: <8808040546.AA07055@mind> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international journal of "open peer commentary" in the biobehavioral and cognitive sciences, published by Cambridge University Press. For information on how to serve as a commentator or to nominate qualified professionals in these fields as commentators, please send email to: harnad at mind.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] Strategies for the Control of Voluntary Movements with One Degree of Freedom Gerald L. Gottlieb (Physiology, Rush Medical Center), Daniel M. Corcos (Physical Education, U. Illinois, Chicago), Gyan C. Agarwal (Electr. Engineering & Computer Science, U. Illinois, Chicago) A theory is presented to explain how people's accurate single-joint movements are controlled. The theory applies to movements across different distances, with different inertial loads, toward targets of different widths over a wide range of experimentally manipulated velocities. The theory is based on three propositions: (1) Movements are planned according to "strategies," of which there are at least two: a speed-insensitive (SI) and a speed-sensitive (SS) strategy. (2) These strategies can be equated with sets of rules for performing diverse movement tasks. The choice between (SI) and (SS) depends on whether movement speed and/or movement time (and hence appropriate muscle forces) must be constrained to meet task requirements. (3) The electromyogram can be interpreted as a low-pass filtered version of the controlling signal to motoneuron pools. This controlling signal can be modelled as a rectangular excitation pulse in which modulation occurs in either pulse amplitude or pulse width. 
Movements with different distances and loads are controlled by the SI strategy, which modulates pulse width. Movements in which speed must be explicitly regulated are controlled by the SS strategy, which modulates pulse amplitude. The distinction between the two movement strategies reconciles many apparent conflicts in the motor control literature. From lab at cs.brandeis.edu Thu Aug 4 08:26:25 1988 From: lab at cs.brandeis.edu (Larry Bookman) Date: Thu, 4 Aug 88 08:26:25 edt Subject: Complexity of Second Order Learning Algorithms In-Reply-To: Raymond Watrous's message of Wed, 3 Aug 88 09:39:03 EDT <8808031339.AA28717@linc.cis.upenn.edu> Message-ID: Could you please send me a copy of MS-CIS-87-51: learning algorithms for connectionist networks: applied gradient methods of nonlinear optimization Thanks, Larry Bookman Brandeis University Computer Science Department Waltham, MA 02254 From watrous at linc.cis.upenn.edu Thu Aug 4 15:49:46 1988 From: watrous at linc.cis.upenn.edu (Raymond Watrous) Date: Thu, 4 Aug 88 15:49:46 EDT Subject: Clarification on Technical Report Message-ID: <8808041949.AA11301@linc.cis.upenn.edu> The recent posting regarding the technical report on the complexity of second order methods of gradient optimization should be amended as follows: 1. There is normally a reproduction charge for technical reports ordered from the University of Pennsylvania. This varies with the length of the report, and for MS-CIS-87-51 is $3.68. This charge has recently been waived for this technical report, compliments of the Computer Science Department. 2. The e-mail address for James Lotkowski should read: james at cis.upenn.edu 3. The revised report has now been renumbered, to distinguish it from its predecessor: MS-CIS-88-62 I apologize for the inconvenience due to these oversights. RW From alexis%yummy at gateway.mitre.org Thu Aug 4 11:23:00 1988 From: alexis%yummy at gateway.mitre.org (alexis%yummy@gateway.mitre.org) Date: Thu, 4 Aug 88 11:23:00 EDT Subject: A Harder Learning Problem Message-ID: <8808041523.AA01234@marzipan.mitre.org> There are many problems with the current standard "benchmark" tasks that are used with NNs, but one of them is that they're just too simple. It's hard to compare learning algorithms when the task the network has to perform is excessively easy. One of the tasks that we've been using at MITRE to test and compare our learning algorithms is to distinguish between two intertwined spirals. This task uses a net with 2 inputs and 1 output. The inputs correspond to points, and the net should output a 1 on one spiral and a 0 on the other. 
Each of the spirals contains 3 full revolutions. This task has some nice features: it's very non-linear, it's relatively difficult (our spiffed up learning algorithm requires ~15-20 million presentations = ~150-200 thousand epochs = ~1-2 days of cpu on a (loaded) Sun4/280 to learn, ... we've never succeeded at getting vanilla bp to correctly converge), and because you have 2 in and 1 out you can *PLOT* the current transfer function of the entire network as it learns. I'd be interested in seeing other people try this or a related problem. Following this is a simple C program that we use to generate I/O data. Alexis P. Wieland wieland at mitre.arpa MITRE Corporation or 7525 Colshire Dr. alexis%yummy at gateway.mitre.org McLean, VA 22102

/=========================================================================/

#include <stdio.h>   /* for printf */
#include <math.h>    /* for sin, cos, M_PI */

/*************************************************************************
**
**  mkspiral.c
**
**  A program to generate input and output data for a neural network
**  with 2 inputs and 1 output.
**
**  If the 2 inputs are taken to represent an x-y position and the
**  output (which is either 0.0 or 1.0) is taken to represent which of
**  two classes the input point is in, then the data forms two coiled
**  spirals.  Each spiral forms 3 complete revolutions and contains
**  97 points (32 pts per revolution plus end points).  Spiral 1 passes
**  from (0, 6.5) -> (6, 0) -> (0, -5.5) -> (-5, 0) -> (0, 4.5) ->
**  ... -> (0, 0.5).  Likewise, Spiral 0 passes from (0, -6.5) ->
**  (-6, 0) -> (0, 5.5) -> (5, 0) -> (0, -4.5) -> ... -> (0, -0.5).
**
**  This program writes out data in ascii, one exemplar per line, in
**  the form: ((x-pt y-pt) (class)).
**
**  This data set was developed to test learning algorithms developed
**  at the MITRE Corporation.  The intention was to create a data set
**  which would be non-trivial to learn.  We at MITRE have never
**  succeeded at learning this task with vanilla back-propagation.
**
**  Any questions or comment (reports of success or failure with this
**  task are as interesting as anything to us) contact:
**
**      Alexis P. Wieland
**      MITRE Corporation
**      7525 Colshire Dr.
**      McLean, VA 22102
**      (703) 883-7476
**      wieland at mitre.ARPA
**
*************************************************************************/

main()
{
    int i;
    double x, y, angle, radius;

    /* write spiral of data */
    for (i=0; i<=96; i++) {
        angle = i * M_PI / 16.0;
        radius = 6.5 * (104 - i) / 104.0;
        x = radius * sin(angle);
        y = radius * cos(angle);
        printf("((%8.5f %8.5f) (%3.1f))\n",  x,  y, 1.0);
        printf("((%8.5f %8.5f) (%3.1f))\n", -x, -y, 0.0);
    }
}

From Mark.J.Zeren at mac.Dartmouth.EDU Fri Aug 5 17:38:13 1988 From: Mark.J.Zeren at mac.Dartmouth.EDU (Mark.J.Zeren@mac.Dartmouth.EDU) Date: 5 Aug 88 17:38:13 EDT Subject: Joining the list Message-ID: <11383@mac.dartmouth.edu> I am a student working on some neural net research at Dartmouth College under Jamshed Bharucha. I would like to join/use the mailing list to get some feedback/insights on some of the ideas that I have been pursuing this summer. I would remain on the list only through the end of August, as I am leaving the country at the beginning of September. Mark Zeren mark.zeren at dartmouth.edu From Kevin.Lang at G.GP.CS.CMU.EDU Fri Aug 5 17:43:30 1988 From: Kevin.Lang at G.GP.CS.CMU.EDU (Kevin.Lang@G.GP.CS.CMU.EDU) Date: Fri, 5 Aug 88 17:43:30 EDT Subject: A Harder Learning Problem Message-ID: I tried standard back-propagation on the spiral problem, and found that it is a useful addition to the standard set of benchmark problems. 
It is as small and easily stated as the usual encoder and shifter problems, but many times harder. My network has a 2-5-5-5-1 structure, consisting of 2 input units, three hidden layers of 5 units each, and 1 output unit. Each layer is connected to all of the other layers to provide quick pathways along which to propagate errors. This network contains 138 weights, which seems about right for a training set with 194 examples. The network was trained with a learning rate that was increased gradually from .001 to .002 and a momentum parameter that was increased from .5 to .95. A few brief excursions to .005 for the learning rate caused derailments (the cosine of the angle between successive steps went negative). At CMU, we generally use target values of 0.2 and 0.8 in place of 0.0 and 1.0, in order to reduce the need for big weights. Assuming that errors occur when the output value for a case lies on the wrong side of 0.5, the network had the following error history as it was trained using the batch version of back-propagation (all cases presented between weight updates). This run chewed up about 9 CPU minutes on our Convex.

    epochs    errors
     2,000      75
     4,000      74
     6,000      64
     8,000      14    (big improvement here)
    10,000       8
    12,000       4
    14,000       2
    16,000       2    (struggling)
    18,000       0

The average weight at this point is about 3.4. Since all of the output values lie on opposite sides of 0.5, it is a simple matter to grow the weights and separate the values further. For example, about 1,000 more epochs are required to pull the output values below 0.4 and above 0.6 for the two spirals. From Dave.Touretzky at B.GP.CS.CMU.EDU Fri Aug 5 20:56:49 1988 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Fri, 05 Aug 88 20:56:49 EDT Subject: NIPS needs a logo Message-ID: <11682.586832209@DST.BOLTZ.CS.CMU.EDU> ------- Blind-Carbon-Copy Reply-To: Dave.Touretzky at cs.cmu.edu cc: reyner at c.cs.cmu.edu Subject: NIPS needs a logo Date: Fri, 05 Aug 88 20:56:49 EDT Message-ID: <11682.586832209 at DST.BOLTZ.CS.CMU.EDU> From: Dave.Touretzky at DST.BOLTZ.CS.CMU.EDU We are seeking a distinctive logo for the IEEE NIPS (Neural Information Processing Systems) Conference. This conference is held each year in Denver. The 1988 conference is scheduled for November 28-December 1. The logo will be used on the abstract booklet and conference proceedings. In future years it will also be used on all conference stationery, calls for papers, and publicity releases. The logo should be a small and fairly simple design that expresses the scientific theme of the NIPS conference. We welcome any and all suggestions. Crude sketches are okay; an artist will refine the best half dozen ideas we receive, and the final decision will be made by the conference organizing committee. Submissions should be sent (hardcopy only) by September 1st to: Pamela Reyner Scott Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213-3890 All submissions become the property of the organizing committee and cannot be returned, so send a photocopy, not your original. It would also be a good idea to write your name and address on each submission so we don't lose track of who sent what. The submitter of the winning logo design will reap unimaginable fame and wealth, or at the very least a warm acknowledgement in the proceedings. Multiple submissions are welcome. 
- -- Dave Touretzky Publications Chairman 1988 IEEE NIPS Conference ------- End of Blind-Carbon-Copy From panther!panther.UUCP!gjt at uxc.cso.uiuc.edu Fri Aug 5 18:49:14 1988 From: panther!panther.UUCP!gjt at uxc.cso.uiuc.edu (Gerry Tesauro) Date: Fri, 5 Aug 88 17:49:14 CDT Subject: Two tech reports Message-ID: <8808052249.AA21912@panther.ccsr.uiuc.edu> Two new Center for Complex Systems Research Tech. Reports are now available; the abstracts appear below. (A cautionary note: CCSR-88-6 describes an obsolete network, and is of no use to readers unfamiliar with backgammon.) Requests may be sent to: gjt%panther at uxc.cso.uiuc.edu or the US mail address which appears below. ------------------------ Neural Network Defeats Creator in Backgammon Match G. Tesauro Center for Complex Systems Research, University of Illinois at Urbana-Champaign, 508 S. 6th St., Champaign, IL 61820 USA Technical Report No. CCSR-88-6 This paper presents an annotated record of a 20-game match which I played against one of the networks discussed in ``A Parallel Network that Learns to Play Backgammon,'' by myself and Terry Sejnowski. (Tech. Report CCSR-88-2, and Artifi- cial Intelligence, to appear.) This paper is specifically intended for backgammon enthusiasts who want to see exactly how the network plays. The surprising result of the match was that the network won, 11 games to 9. However, the network made several blunders during the course of the match, and was extremely lucky to have won. Nevertheless, in spite of the network's worst-case play, its average performance in typical positions is quite sharp, and is more challenging than con- ventional commercial programs. ------------------------ Asymptotic Convergence of Back-Propagation in Single-Layer Networks Gerald Tesauro and Yu He Center for Complex Systems Research University of Illinois at Urbana-Champaign 508 S. 6th St., Champaign, IL 61820 USA Technical Report No. CCSR-88-7 We calculate analytically the rate of conver- gence at long times in the back-propagation learn- ing algorithm for networks without hidden units. For the standard quadratic error function and a sigmoidal transfer function, we find that the error decreases as 1/t for large t, and the output states approach their target values as 1/sqrt(t). It is possible to obtain a different convergence rate for certain error and transfer functions, but the convergence can never be faster than 1/t. These results also hold when a momentum term is added to the learning algorithm. Our calculation agrees with the numerical results of Ahmad and Tesauro. From PH706008%BROWNVM.BITNET at VMA.CC.CMU.EDU Sat Aug 6 15:36:41 1988 From: PH706008%BROWNVM.BITNET at VMA.CC.CMU.EDU (PH706008%BROWNVM.BITNET@VMA.CC.CMU.EDU) Date: Sat, 06 Aug 88 15:36:41 EDT Subject: Reply to Alexis Wieland Message-ID: In your recent communication concerning the spirals problem for neural networks, you mentioned that you would be interested in hearing from anyone working on a related problem. I have been investigating a similar, although somewhat less complicated problem, using a backward propagation network. The paradigm is a concentric circle problem in which the desired output on the inner disc is 0 and on the outer annulus is 1. The two input units are loaded with the x and y coordinates for each pattern as in your paradigm. 
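For readers who want to try Bachmann's variant alongside the spiral task, a data generator in the spirit of the mkspiral.c program posted earlier in the thread might look like the sketch below; the radii, the pattern count, and the output format are illustrative assumptions on my part, not the values actually used in his simulations.

/* mkcircle.c -- illustrative sketch of a generator for the concentric-
   circle task described above: output 0 on an inner disc, 1 on the
   surrounding annulus.  R_INNER, R_OUTER, NPATTERNS, and the output
   format are assumptions, not Bachmann's actual values.               */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NPATTERNS 100    /* assumed training-set size */
#define R_INNER   3.0    /* assumed radius of the inner disc */
#define R_OUTER   6.0    /* assumed outer radius of the annulus */

int main()
{
    int i;
    double x, y, r;

    for (i = 0; i < NPATTERNS; i++) {
        /* sample uniformly in the bounding square, rejecting points
           that fall outside the outer circle */
        do {
            x = R_OUTER * (2.0 * rand() / (double) RAND_MAX - 1.0);
            y = R_OUTER * (2.0 * rand() / (double) RAND_MAX - 1.0);
            r = sqrt(x * x + y * y);
        } while (r > R_OUTER);
        /* class 0 on the inner disc, class 1 on the annulus */
        printf("((%8.5f %8.5f) (%3.1f))\n", x, y, (r <= R_INNER) ? 0.0 : 1.0);
    }
    return 0;
}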
After learning the training patterns (randomly chosen patterns in the two regions: usually 50 -150), the network generalizes nicely for previously unseen patterns; a plot of average output versus radius approaches that of a step function as more patterns are used. The paradigm was suggested by my dissertation advisor, Prof. Leon Cooper. I would be interested in hearing more details of your simulations. Charles M. Bachmann ("Chip") ph706008 at Brownvm Box 1843 Physics Dpt. & Ctr. for Neural Science Brown University Providence, R. I. 02912 From elman at amos.ling.ucsd.edu Mon Aug 8 16:23:47 1988 From: elman at amos.ling.ucsd.edu (Jeff Elman) Date: Mon, 8 Aug 88 13:23:47 PDT Subject: Technical Report announcement Message-ID: <8808082023.AA22615@amos.ling.ucsd.edu> The following abstract describes a paper which can be obtained from Hal White, Dept. of Economics D-008, Univ. of Calif., San Diego, La Jolla, CA 92093. Multi-layer feedforward networks are universal approximators by Kurt Hornik, Maxwell Stinchcombe, and Halbert White This paper rigorously establishes that standard multi-layer feedforward networks with as few as one hidden layer using arbitrary squashing functions (not necessarily continuous) at the hidden layer(s) are capable of approximating any Borel measurable function from one Euclidean space to another to any desired degree of accuracy, provided suffi- ciently many hidden units are available. In this sense, multi-layer feedforward networks are a class of universal approximators. From pfeifer at ifi.unizh.ch Mon Aug 8 10:16:00 1988 From: pfeifer at ifi.unizh.ch (Rolf Pfeifer) Date: 8 Aug 88 16:16 +0200 Subject: Connectionist conference Message-ID: <742*pfeifer@ifi.unizh.ch> ***************************************************************************** SGAICO Conference ******************************************************************************* Program and Call for Presentation of Ongoing Work C O N N E C T I O N I S M I N P E R S P E C T I V E University of Zurich, Switzerland 10-13 October 1988 Tutorials: 10 October 1988 Technical Program: 11 - 12 October 1988 Workshops and Poster/demonstration session 13 October 1988 ****************************************************************************** Organization: - University of Zurich, Dept. of Computer Science - SGAICO (Swiss Group for Artificial Intelligence and Cognitive Science) - Gottlieb Duttweiler Institute (GDI) About the conference ____________________ Introdution: Connectionism has gained much attention in recent years as a paradigm for building models of intelligent systems in which intresting behavioral properties emerge from complex interactions of a large number of simple "neuron-like" elements. Such work is highly relevant to fields such as cognitive science, artificial intelligence, neurobiology, and computer science and to all disciplines where complex dynamical processes and principles of self-organization are studied. Connectionism models seem to be suited for solving many problems which have proved difficult in the past using traditional AI techniques. But to what extent do they really provide solutions? One major theme of the conference is to evaluate the import of connectionist models for the various disciplines. Another one is to see in what ways connectionism, being a young discipline in its present form, can benefit from the influx of concepts and research results from other disciplines. 
The conference includes tutorials, workshops, a technical program and panel discussions with some of the leading researchers in the field. Tutorials: The goal of the tutorials is to introduce connectionism to people who are relatively new to the field. They will enable participants to follow the technical program and the panel discussions. Technical Program: There are many points of view to the study of intelligent systems. The conference will focus on the views from connectionism, artificial intelligence and cognitive science, neuroscience, and complex dynamics. Along another dimension there are several significant issues in the study of intelligent systems, some of which are "Knowledge representation and memory", "Perception, sequential processing, and action", "Learning", and "Problem solving and reasoning". Researchers from connectionism, cognitive science, artificial intelligence, etc. will take issue with the ways connectionism is approaching these various problem areas. This idea is reflected in the structure of the program. Panel Discussions: There will be panel discussion with experts in the field on specialized topics which are of particular interest to the application of connectionism. Workshops and Presentations of Ongoing Work: The last day of the conference is devoted to wokrshops with the purpose of identifying the major problems that currently exist within connectionism, to define future research agendas and collaborations, to provide a platform for the interdisciplinary exchange of information and experience, and to find a framework for practical applications. The workshop day will als feature presentation of ongoing work (see "Call for presentation of ongoing work"). ******************************************************************************* * * * CALL FOR PRESENTATION OF OINGOING WORK * * * * Presentations are invited on all areas of connectionist research. The focus * * is on current research issues, i.e. "work in progress" is of highest * * interest even if major problems remain to be resolved. Work of RESEARCH * * GROUPS OR LABORATORIES is particularly welcome. Presentations can be in the * * form of poster, or demonstration of prototypes. The goal is to encourage * * cooperation and the exchange of ideas between different research groups. * * Please submit an extended abstract (1-2 pages). * * * * Deadline for submissions: September 2, 1988 * * Notification of acceptance: September 20, 1988 * * * * Contact: Zoltan Schreter, Computer Science Department, University of * * Zurich, Switzerland, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland * * Phone: (41) 1 257 43 07/11 * * Fax: (41) 1 257 40 04 * * or send mail to * * pfeifer at ifi.unizh.ch * * * ******************************************************************************* Tutorials MONDAY, October 10, 1988 ___________________________________________________________________________ 08.30 Tutorial 1: Introduction to neural nets. F. Fogelman - Adaptive systems: Perceptrons (Rosenblatt) and Adalines (Widrow & Hoff) - Associative memories: linear model (Kohonen), Hopfield networks, Brain state in a box model (BSB; Anderson) - Link to other disciplines 09.30 Coffee 10.00 Tutorial 2: Self-organizing Topological maps. T. Kohonen - Theory - Application: Speech-recognizing systems - Tuning of maps for optimal recognition accuracy (learning vector quantization) 11:30 Tutorial 3: Multi-layer neural networks. Y. 
Le Cun - Elementary learning mechanisms (LMS and Perceptron) and their limitations - Easy and hard learning - Learning in multi-layer networks: The back-propagation algorithm (and its variations) - Multi-layer networks: - as associative memories - for pattern recognition (a case study) - Network design techniques; simulators and software tools 13.00 Lunch 14.00 Tutorial 4: Parallel Distributed Processing of symbolic structure. P. Smolensky Can Connectionism deal with the kind of complex highly structured information characteristic of most AI domains? This tutorial presents recent research suggesting that the answer is yes. 15.30 Coffee 16.00 Tutorial 5: Connectionist modeling and simulation in neuroscience and psychology. R. Granger Biological networks are composed of neurons with a range of biophysical and physiological properties that give rise to complex learning and performance rules embedded in anatomical architectures with complex connectivity. Given this complexity it is of interest to identify which of the characteristics of brain networks are central and which are less salient with respect to behavioral function. "Bottom-up" biological modeling attempts to identify the crucial learning and performance rules and their appropriate level of abstraction. 17.30 End of tutorial sessions _______________________________________________________________________________ Technical Program TUESDAY, October 11, 1988 ___________________________________________________________________________ Introduction 09:00 Connectionism: Is it a new paradigm? M. Boden 09:45 Discussion 10:00 Coffee 1. Knowledge Representation & Memory. Chair: F. Fogelman The perspective of: 10:30 - Connectionism. P. Smolensky Dealing with structure in Connectionism 11:15 - AI/ N.N. Cognitive Science 12:00 - Neuroscience/ C. v. der Malsburg Connectionism A neural architecture for the representation of structured objects 12:45 Lunch 2. Perception, Sequential Processing & Action. Chair: T. Kohonen The perspective of: 14:30 - Connectionism M. Kuperstein Adaptive sensory-motor coordination using neural networks 15:15 - Connectionism/ M. Imbert Neuroscience and Connectionism: Neuroscience The case of orientation coding. 16:00 Coffee 16:30 - AI/ J. Bridle Connectionist approaches to Connectionism artificial perception: A speech pattern processing approach 17:15 - Neuroscience G. Reeke Synthetic neural modeling: A new approach to Brain Theory 18:00 Intermission/snack 18.30 - 20.00 panel discussion/workshop on Expert Systems and Connectionism. Chair: S. Ahuja D. Bounds D. Reilly Y. Le Cun R. Serra ___________________________________________________________________________ WEDNESDAY, October 12, 1988 ___________________________________________________________________________ 3. Learning. Chair: R. Serra The perspective of: 9:00 - Connectionism Y. Le Cun Generalization and network design strategies 9:45 - AI Y. Kodratoff Science of explanations versus science of numbers 10:30 Coffee 11:00 - Complex Dynamics/ Genetic Algorithms H. Muehlenbein Genetic algorithms and parallel computers 11:45 - Neuroscience G. Lynch Behavioral effects of learning rules for long-term potentiation 12:30 Lunch 4. Problem Solving & Reasoning. Chair: R. Pfeifer The perspective of: 14:00 - AI/ B. Huberman Dynamical perspectives on Complex Dynamics problem solving and reasoning 14:45 - Complex Dynamics L. Steels The Complex Dynamics of common sense 15:30 Coffee 16:00 - Connectionism J. 
Hendler Problem solving and reasoning: A Connectionist perspective 16:45 - AI P. Rosenbloom A cognitive-levels perspective on the role of Connectionism in symbolic goal-oriented behavior 17:30 Intermission/snack 18:00 - 19:30 panel discussion/workshop on Implementation Issues & Industrial Applications. Chair: P. Treleaven B. Angeniol G. Lynch G. Dreyfus C. Wellekens __________________________________________________________________________ Workshops and presentation of ongoing work THURSDAY, October 13, 1988 ___________________________________________________________________________ 9:00-16:00 Workshops in partially parallel sessions. There will be a separate poster/demonstration session for the presentation of ongoing work. The detailed program will be based on the submitted work and will be available at the beginning of the conference. The workshops: 1. Knowledge Representation & Memory Chair: F. Fogelman 2. Perception, Sequential Processing & Action Chair: F. Gardin 3. Learning Chair: R. Serra 4. Problem Solving & Reasoning Chair: R. Pfeifer 5. Evolutionary Modelling Chair: L. Steels 6. Neuro-Informatics in Switzerland: Theoretical and technical neurosciences Chair: K. Hepp 7. European Initiatives Chair: N.N. 8. Other 16:10 Summing up: R. Pfeifer 16:30 End of the conference ___________________________________________________________________________ Program as of June 29, 1988, subject to minor changes ___________________________________________________________________________ THE SMALL PRINT Organizers Computer Science Department, University of Zurich Swiss Group for Artificial Intelligence and Cognitive Science (SGAICO) Gottlieb Duttweiler Institute (GDI) Location University of Zurich-Irchel Winterthurerstrasse 190 CH-8057 Zurich, Switzerland Administration Gabi Vogl Phone: (41) 1 257 43 21 Fax: (41) 1 257 40 04 Information Rolf Pfeifer Zoltan Schreter Computer Science Department, University of Zurich Winterthurerstrasse 190, CH-8057 Zurich Phone: (41) 1 257 43 23 / 43 07 Fax: (41) 1 257 40 04 Sanjeev B. Ahuja, Rentenanstalt (Swiss Life) General Guisan-Quai 40, CH-8022 Zurich Phone: (41) 1 206 40 61 / 33 11 Thomas Bernold, Gottlieb Duttweiler Institute, CH-8803 Ruschlikon Phone: (41) 1 461 37 16 Fax: (41) 1 461 37 39 Participation fees Conference 11-13 October 1988: Regular SFr. 350.-- ECCAI/SGAICO/ SI/SVI-members SFr. 250.-- Full time students SFr. 100.-- Tutorials 10 October 1988: Regular SFr. 200.-- ECCAI/SGAICO/ SI/SVI-members SFr. 120.-- Full time students SFr. 50.-- For graduate students / assistants a limited number of reduced fees are available. Documentation and refreshments are included. Please remit the fee only upon receipt of invoice by the Computer Science Department. Language The language of the conference is English. Cancellations If a registration is cancelled, there will be a cancellation charge of SFr. 50.-- after 1st October 1988, unless you name a replacement. Hotel booking Hotel booking will be handled separately. Please indicate on your registration form whether you would like information on hotel reservations. Proceedings Proceedings of the conference will be published in book form. They will become available in early 1989. From yann at ai.toronto.edu Tue Aug 9 02:13:46 1988 From: yann at ai.toronto.edu (Yann le Cun) Date: Tue, 9 Aug 88 02:13:46 EDT Subject: Technical Report announcement In-Reply-To: Your message of Mon, 08 Aug 88 16:23:47 -0400. 
Message-ID: <88Aug8.233349edt.386@neat.ai.toronto.edu> Jeff Elman writes: > This paper rigorously establishes that standard multi-layer > feedforward networks with as few as one hidden layer using > arbitrary squashing functions (not necessarily continuous) > at the hidden layer(s) are capable of approximating any > Borel measurable function from one Euclidean space to > another to any desired degree of accuracy, provided suffi- > ciently many hidden units are available. In this sense, > multi-layer feedforward networks are a class of universal > approximators. I showed the same kind of result in my thesis (although probably not as rigorously). The problem is: if you use monotonic squashing functions, then you need one more layer (i.e two hidden layers). reference: Yann le Cun: "modeles connexionnistes de l'apprentissage" (connectionist learning models), These de Doctorat, Universite Pierre et Marie Curie (Paris 6), June 1987, Paris, France. - Yann From chrisley.pa at Xerox.COM Mon Aug 8 22:38:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 8 Aug 88 19:38 PDT Subject: A Harder Learning Problem In-Reply-To: alexis%yummy@gateway.mitre.org's message of Thu, 4 Aug 88 11:23:00 EDT Message-ID: <880808-194211-4680@Xerox> This is in response to the recent comments by Alexis Wieland, Charles Bachmann, and the tech report by Scott Fahlman which was announced on this mailing list. I agree that a more careful selection of benchmarking tasks is required. Specifically, there has been little effort spent on comparing networks on the kinds of tasks that many are advocating as one of the fortes of the neural network approach: pattern recognition in natural signals (eg, speech). The key characteristic of patterns in natural signals is that they are statistical: a sample is often a member of more than one class. Thus, one does not talk of zero error, but minimal error. The reason why explicitly statistical tasks should be used in benchmarking is that the pattern recognition properties of models vary noticeably when moving from the deterministic to the statistical case. For an example of statistical benchmarking of Backprop, Boltzmann machines, and Learning Vector Quantization, see Kohonen, Barna and Chrisley, '88, in the proceedings of this year's ICNN. Also see Huang and Lippmann, '87a and b (ICNN and NIPS). For example, a typical two category task might have category A as a Gaussian distribution cetered around the origin with a variance of 2, while category B might be a Gaussian that is offset in the first dimension by some amount, and with a variance of 1. This requires non-linear decision boundaries for optimal (Bayesian) performance, and the optimal performance may be calculated analytically (good performance = low misclassification rate). This is one of the tasks discussed in our paper, above. BTW, we found that LVQ was better than BP, especially in high dimensional and difficult tasks, while the BM had almost optimal performance, although it required inordinate amounts of computing time. Ron Chrisley Xerox PARC SSL 3333 Coyote Hill Road Palo Alto, CA 94304 From huyser at mojave.Stanford.EDU Wed Aug 10 13:34:31 1988 From: huyser at mojave.Stanford.EDU (Karen Huyser) Date: Wed, 10 Aug 88 10:34:31 PDT Subject: Roommates wanted Message-ID: <8808101734.AA19443@mojave.Stanford.EDU> Make a new friend!! Live with a stranger for five days and hope you never see them again :-) !! BE A ROOMMATE!! 
Vip Tolat and I, both Stanford students, will be giving papers at the INNS Conference in Boston next month, and we are too poor to spend five or six nights at the Park Plaza at $100/night. If anyone on the net would like to save half the hotel fee by teaming up with one of us, we'd sure appreciate it. Roommate requests should be sent to huyser at sonoma.stanford.edu or huyser at mojave.stanford.edu. If there are a lot of requests, I will collect a list and distribute it to the people who are on it. Karen Huyser Vip Tolat From jfeldman%icsia7.Berkeley.EDU at BERKELEY.EDU Wed Aug 10 16:42:27 1988 From: jfeldman%icsia7.Berkeley.EDU at BERKELEY.EDU (Jerry Feldman) Date: Wed, 10 Aug 88 13:42:27 PDT Subject: Benchmark Message-ID: <8808102042.AA12887@icsia7.Berkeley.EDU> I suggest the connectionist learning of Finite State Automata (FSA) as an interesting benchmark. For example, people compute the parity of a long binary string by an algorithm equivalent to a 2-state FSA. Sara Porat and I have a constructive proof that FSA learning can be done from a complete, lexicographically ordered sample so that might be a reasonable subcase to consider. It is known that the general case is NP complete, i.e. very hard. I dont much like the way our network functions and most of you would hate it, so I don't suggest starting with our paper. Should anyone care, I could expound on why the FSA learning problem is an important test of learning models. From Scott.Fahlman at B.GP.CS.CMU.EDU Wed Aug 10 20:11:45 1988 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Wed, 10 Aug 88 20:11:45 EDT Subject: Benchmark In-Reply-To: Your message of Wed, 10 Aug 88 13:42:27 -0700. <8808102042.AA12887@icsia7.Berkeley.EDU> Message-ID: Jerry, Could you specify some particular FSA problem that we could all take a crack at? Ideally, people proposing learning-speed (or learning at all) benchmarks should specify the problem in enough detail that the results of different learning approaches can be compared. If you can provide one set of results as a starting point, that's better still. I'm not sure if you have in mind a set of problems that can only be attacked by nets with directed loops, or whether you are directly training the combinational logic network that combines input with previous state bits to get new state bits. In other words, are you (the trainer) telling the network what states the memory bits are to assume or is that part of what must be learned? -- Scott From mjolsness-eric at YALE.ARPA Wed Aug 10 22:53:17 1988 From: mjolsness-eric at YALE.ARPA (Eric Mjolsness) Date: Wed, 10 Aug 88 22:53:17 EDT Subject: Tech report available Message-ID: <8808110252.AA26050@NEBULA.SUN3.CS.YALE.EDU> The following technical report is now available. ------------------------------------------------------------------------------ Optimization in Model Matching and Perceptual Organization: A First Look Eric Mjolsness, Gene Gindi, and P. Anandan (YALEU/DCS/RR-634) Abstract We introduce an optimization approach for solving problems in computer vision that involve multiple levels of abstraction. Specifically, our objective functions can include compositional hierarchies involving object-part relationships and specialization hierarchies involving object-class relationships. 
The large class of vision problems that can be subsumed by this method includes traditional model matching, perceptual grouping, dense field computation (regularization), and even early feature detection which is often formulated as a simple filtering operation. Our approach involves casting a variety of vision problems as inexact graph matching problems, formulating graph matching in terms of constrained optimization, and using analog neural networks to perform the constrained optimization. We will show the application of this approach to shape recognition in a domain of stick-figures and to the perceptual grouping of line segments into long lines. ------------------------------------------------------------------------------ available from: connolly-eileen at yale.cs.edu alternatively: connolly-eileen at yale.arpa or write to: Eileen Connolly Yale Computer Science Dept 51 Prospect Street P.O. Box 2158 Yale Station New Haven CT 06520 Please include a physical address with your request. ------- From neuron at ei.ecn.purdue.edu Thu Aug 11 12:37:46 1988 From: neuron at ei.ecn.purdue.edu (Manoel Fernando Tenorio) Date: Thu, 11 Aug 88 11:37:46 EST Subject: Tech report available In-Reply-To: Your message of Wed, 10 Aug 88 22:53:17 EDT. <8808110252.AA26050@NEBULA.SUN3.CS.YALE.EDU> Message-ID: <8808111637.AA11417@ei.ecn.purdue.edu> Could you plese send a copy of this report. I was unable to reach the other email address... M. F. Tenorio School of ELectrical Engineering Purdue University W> Lafayette, IN 47907 From chrisley.pa at Xerox.COM Thu Aug 11 16:48:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 11 Aug 88 13:48 PDT Subject: Benchmark In-Reply-To: Scott.Fahlman@B.GP.CS.CMU.EDU's message of Wed, 10 Aug 88 20:11:45 EDT Message-ID: <880811-135056-3134@Xerox> By the way, Scott Fahlman's Aug 10th comment reminds me of another important distinction in benchmarking: learning rates vs. performance. Since the bulk of the tasks that have been used in benchmarking are deterministic (i.e., non-statistical), the performance comparison has been less interesting: any network worth anything should achieve about the same performance, which is often 100%. Since, in the statistical case, 0% error is generally impossible, the performance of the networks becomes a much more interesting issue. And in many applications, the learning time is off-line, and therefore an irrelevant way to judge the system. A good example where this might not be the case is, ironically, in our own application of speech recognition. Since none of the networks yet developed are truly speaker independent, there must always be some re-calibration when you want real-time, speaker independent recognition. Thus, learning rates are inmportant as well as performance. Prof. Kohonen has it down to 10 minutes for a new (Finnish) speaker, but that is not good enough for many applications. -- Ron From terry at cs.jhu.edu Thu Aug 11 17:41:04 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Thu, 11 Aug 88 17:41:04 edt Subject: Neural Computation Message-ID: <8808112141.AA28848@crabcake.cs.jhu.edu> Announcement and Call for Papers NEURAL COMPUTATION First Issue: Spring 1989 Editor-in-Chief Terrence Sejnowski The Salk Institute and The University of California at San Diego Neural Computation will provide a unique interdisciplinary forum for the dissemination of important research results and for reviews of research areas in neural computation. 
Neural computation is a rapidly growing field that is attracting researchers in neuroscience, psychology, physics, mathematics, electrical engineering, computer science, and artificial intelligence. Researchers within these disciplines address, from special perspectives, the twin scientific and engineering challenges of understanding the brain and building computers. The journal serves to bring together work from various application areas, highlighting common problems and techniques in modeling the brain and in the design and construction of neurally-inspired information processing systems. By publishing timely short communications and research reviews, Neural Computation will allow researchers easy access to information on important advances and will provide a valuable overview of the broad range of work contributing to neural computation. The journal will not accept long research articles. The fields covered include neuroscience, computer science, artificial intelligence, mathematics, physics, psychology, linguistics, adaptive systems, vision, speech, robotics, optical computing, and VLSI. Neural Computation is published quarterly by The MIT Press. Board of Editors Editor-in-Chief: Terrence Sejnowski, The Salk Institute and The University of California at San Diego Advisory Board: Shun-ichi Amari, University of Tokyo, Japan Michael Arbib, University of Southern California Jean-Pierre Changeux, Institut Pasteur, France Leon Cooper, Brown University Jack Cowan, University of Chicago Jerome Feldman, University of Rochester Teuovo Kohonen, University of Helsinki, Finland Carver Mead, California Institute of Technology Tomaso Poggio, Massachusetts Institute of Technology Wilfrid Rall, National Institutes of Health Werner Reichardt, Max-Planck-Institut fur Biologische Kybernetik David A. 
Robinson, Johns Hopkins University David Rumelhart, Stanford University Bernard Widrow, Stanford University Action Editors: Joshua Alspector, Bell Communications Research Richard Andersen, MIT James Anderson, Brown University Dana Ballard, University of Rochester Harry Barrow, University of Sussex Andrew Barto, University of Massachusetts Gail Carpenter, Northeastern University Gary Dell, University of Rochester Gerard Dreyfus, Paris, France Jeffrey Elman, University of California at San Diego Nabil Farhat, University of Pennsylvania Francois Fogelman-Soulie, Paris, France Peter Getting, University of Iowa Ellen Hildreth, Massachusetts Institute of Technology Geoffrey Hinton, University of Toronto, Canada Bernardo Huberman, Xerox, Palo Alto Lawrence Jackel, AT&T Bell Laboratories Scott Kirkpatrick, IBM Yorktown Heights Christof Koch, California Institute of Technology Richard Lippmann, Lincoln Laboratories Stephen Lisberger, University of California San Francisco James McClelland, Carnegie-Mellon University Graeme Mitchison, Cambridge University, England David Mumford, Harvard University Erkki Oja, Kuopio, Finland Andras Pellionisz, New York University Demetri Psaltis, California Institute of Technology Idan Segev, The Hebrew University Gordon Shepherd, Yale University Vincent Torre, Universita di Genova, Italy David Touretzky, Carnegie-Mellon University Roger Traub, IBM Yorktown Heights Les Valiant, Harvard University Christoph von der Malsburg, University of Southern California David Willshaw, Edinburgh, Scotland John Wyatt, Massachusetts Institute of Technology Steven Zucker, McGill University, Canada Instructions to Authors The journal will consider short communications, having no more than 2000 words of text, 4 figures, and 10 citations; and area reviews which summarize significant advances in a broad area of research, with up to 5000 words of text, 8 figures, and 100 citations. The journal will accept one-page summaries for proposed reviews to be considered for solicitation. All papers should be submitted to the editor-in-chief. Authors may recommend one or more of the action editors. Accepted papers will appear with the name of the action aditor that communicated the paper. Before January 1, 1989, please address submissions to: Dr. Terrence Sejnowski Biophysics Department Johns Hopkins University Baltimore, MD 21218 After January 1, 1989, please address submissions to: Dr. Terrence Sejnowski The Salk Institute P.O. Box 85800 San Diego, CA 92138 Subscription Information Neural Computation Annual subscription price (four issues): $90.00 institution $45.00 individual (add $9.00 surface mail or $17.00 airmail postage outside U.S. 
and Canada) Available from: MIT Press Journals 55 Hayward Street Cambridge, MA 02142 USA 617-253-2889 From pauls at boulder.Colorado.EDU Sun Aug 14 13:55:21 1988 From: pauls at boulder.Colorado.EDU (Paul Smolensky) Date: Sun, 14 Aug 88 11:55:21 MDT Subject: TRs available Message-ID: <8808141755.AA05168@sigi.colorado.edu> Three technical reports are available; please direct requests via e-mail to kate at boulder.colorado.edu or via regular mail to: Paul Smolensky Department of Computer Science University of Colorado Boulder, CO 80309-0430 Thanks -- paul ----------------------------------------------------------------- Analyzing a connectionist model as a system of soft rules Clayton McMillan & Paul Smolensky CU-CS-393-88 March, 1988 In this paper we reexamine the knowledge in the Rumelhart and McClelland (1986) connectionist model of the acquisition of the English past tense. We show that their original connection ma- trix is approximately equivalent to one that can be explicitly decomposed into what we call soft rule matrices. Each soft rule matrix encodes the knowledge of how to handle the verbs in one of the verb classes determined for this task by Bybee & Slobin (1982). This demonstrates one approximate but explicit sense in which it is reasonable to speak of the weights in connectionist networks encoding higher-level rules or schemas that operate in parallel. Our results also suggest that it may be feasible to understand the knowledge in connectionist networks at a level in- termediate between the microscopic level of individual connec- tions and the monolithic level of the entire connection matrix. To appear in the Proceedings of the Tenth Meeting of the Cognitive Science Society ----------------------------------------------------------------- The constituent structure of connectionist mental states: A reply to Fodor and Pylyshyn Paul Smolensky CU-CS-394-88 March, 1988 The primary purpose of this article is to reply to the central point of Fodor and Pylyshyn's (1988) critique of connectionism. The direct reply to their critique comprises Section 2 of this paper. I argue that Fodor and Pylyshyn are simply mistaken in their claim that connectionist mental states lack the necessary constituent structure, and that the basis of this mistake is a failure to appreciate the significance of distributed representa- tions in connectionist models. Section 3 is a broader response to the bottom line of their critique, which is that connection- ists should re-orient their work towards implementation of the classical symbolic cognitive architecture. I argue instead that connectionist research should develop new formalizations of the fundamental computational notions that have been given one par- ticular formal shape in the traditional symbolic paradigm. My response to Fodor and Pylyshyn's critique presumes a certain meta-theoretical context that is laid out in Section 1. In this first section I argue that any discussion of the choice of some framework for cognitive modeling (e.g. the connectionist frame- work) must admit that such a choice embodies a response to a fun- damental cognitive paradox, and that this response shapes the en- tire scientific enterprise surrounding research within that framework. Fodor and Pylyshyn are implicitly advocating one class of response to the paradox over another, their critique is analyzed in this light. 
In the Southern Journal of Philosophy, special issue on Connectionism and the Foundations of Cognitive Science ---------------------------------------------------------------- Application of the Interactive Activation Model to Document Retrieval Jonathan Bein & Paul Smolensky CU-CS-405-88 May 1988 In this paper we consider an application of the Interactive Ac- tivation Model [McClelland 82] to the problem of document re- trieval. The issues in this application center around a neural net or "connectionist" model called inductive information Re- trieval set forth in [Mozer 84]. The paper provides empirical results on the robustness of this model using a real-world docu- ment database consisting of 13,000 documents. To appear in the Proceedings of NeuroNimes: Neural Networks and their Applications From panther!panther.UUCP!gjt at uxc.cso.uiuc.edu Mon Aug 15 18:59:41 1988 From: panther!panther.UUCP!gjt at uxc.cso.uiuc.edu (Gerry Tesauro) Date: Mon, 15 Aug 88 17:59:41 CDT Subject: New address Message-ID: <8808152259.AA00453@panther.ccsr.uiuc.edu> Effective tomorrow, Aug. 16, I will no longer be at CCSR. I am moving to IBM Watson in New York. For those of you who wish to request copies of CCSR Technical Reports CCSR-88-1, -2, -6 and -7, please send your requests to jean%panther at uxc.cso.uiuc.edu Requests for my other publications should be sent to me by postcard at IBM (I do not have an e-mail address yet). The address is: Dr. Gerald Tesauro IBM Watson Labs. P. O. Box 704 Yorktown Heights, NY 10598 (Tel: 914-789-7863) Thanks, -Gerry ------ From hinton at ai.toronto.edu Mon Aug 15 20:06:26 1988 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Mon, 15 Aug 88 20:06:26 EDT Subject: Benchmark In-Reply-To: Your message of Wed, 10 Aug 88 16:42:27 -0400. Message-ID: <88Aug15.172644edt.284@neat.ai.toronto.edu> I am interested in why FSA learning should be a benchmark task. Please expound. Geoff From jfeldman%icsia7.Berkeley.EDU at BERKELEY.EDU Tue Aug 16 16:40:56 1988 From: jfeldman%icsia7.Berkeley.EDU at BERKELEY.EDU (Jerry Feldman) Date: Tue, 16 Aug 88 13:40:56 PDT Subject: Benchmark Message-ID: <8808162040.AA17000@icsia7.Berkeley.EDU> Geoff, I thought you'd never ask. Finite State Automata (FSA) are the most primitive infitary systems. Almost all connectionist learning has involved only input/output maps, a very restricted form of computation. Tasks of psychological or engineering interest involve multiple step calculcations and FSA are the simplest of these. There is also lots of literature on FSA and the learning thereof and the generalization issue is clear. The benchmark task is to exhibit an initial network and learning rule that will converge(approximately, if you'd like) to a minimal FSA from ANY large sample generated by an FSA. There are several encodings possible including one-unit/one-state, Jordan's state vector or even the state of the whole network. As I said in response to Fahlman, a solution in any form would be fine. Jerry From hi.pittman at MCC.COM Wed Aug 17 16:40:00 1988 From: hi.pittman at MCC.COM (James Arthur Pittman) Date: Wed, 17 Aug 88 15:40 CDT Subject: Infitary systems In-Reply-To: <12423192061.47.DOUTHAT@A.ISI.EDU> Message-ID: <19880817204001.6.PITTMAN@DIMEBOX.ACA.MCC.COM> Douthat asks: BTW: what is an "infitary" system? Perhaps it is an infinitely infantile military system? From chrisley.pa at Xerox.COM Thu Aug 18 19:22:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 18 Aug 88 16:22 PDT Subject: Has anyone heard of ALOPEX? 
From hi.pittman at MCC.COM Wed Aug 17 16:40:00 1988 From: hi.pittman at MCC.COM (James Arthur Pittman) Date: Wed, 17 Aug 88 15:40 CDT Subject: Infitary systems In-Reply-To: <12423192061.47.DOUTHAT@A.ISI.EDU> Message-ID: <19880817204001.6.PITTMAN@DIMEBOX.ACA.MCC.COM>

Douthat asks: BTW: what is an "infitary" system? Perhaps it is an infinitely infantile military system?

From chrisley.pa at Xerox.COM Thu Aug 18 19:22:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 18 Aug 88 16:22 PDT Subject: Has anyone heard of ALOPEX? Message-ID: <880818-162715-7616@Xerox>

I would appreciate it if anyone could give me information about ALOPEX (Greek for "fox"), a neural net program released by a publisher of the same name. A friend of mine would like details, including information such as when it was released, etc. Thanks. Ron Chrisley Xerox PARC SSL Room 1620 3333 Coyote Hill Road Palo Alto, CA 94309 (415) 494-4740

From Mark.J.Zeren at mac.Dartmouth.EDU Tue Aug 23 17:32:19 1988 From: Mark.J.Zeren at mac.Dartmouth.EDU (Mark.J.Zeren@mac.Dartmouth.EDU) Date: 23 Aug 88 17:32:19 EDT Subject: Japanese Connectionists? Message-ID: <17114@mac.dartmouth.edu>

I am an undergraduate at Dartmouth College and have become very interested in connectionist research. I am spending the next year (Sept '88 - Sept '89) in Japan. I will be studying language through May with the University of Illinois program at Konan University outside of Kobe. If possible, I would like to pursue my interest in neural nets while in Japan, particularly next summer, thus getting "two birds with one stone." Any information about the connectionist community in Japan would be of use to me. Mark Zeren mark.zeren at dartmouth.edu

From kawahara at av-convex.ntt.jp Tue Aug 23 19:48:30 1988 From: kawahara at av-convex.ntt.jp (Hideki KAWAHARA) Date: Wed, 24 Aug 88 08:48:30+0900 Subject: Japanese Connectionists? Message-ID: <8808232348.AA02761@av-convex.NTT.jp>

Welcome to Japan, Mr. Zeren. Connectionism, or so-called neuro-computing, in Japan has been growing very rapidly since last year. Almost every technical, psychological, biological, etc. society has featured neural-net sessions in its annual meetings. I have just returned from a private meeting which, I think, is the most important interdisciplinary meeting for researchers in this field. It is the "Neural Information Science Workshop" initiated by Dr. Fukushima of NHK labs. It has been held annually for over 10 years. I'll report on it later. You can contact many researchers via the CSNET-JUNET link. I think about 100 organizations are reading connectionists mail regularly. It will be convenient for you to contact the ATR labs, which are located near Kobe, where you will stay. Hideki Kawahara kawahara%nttlab.ntt.JP at RELAY.CS.NET (from ARPA site) NTT Basic Research Labs.

From alexis%yummy at gateway.mitre.org Wed Aug 24 12:04:17 1988 From: alexis%yummy at gateway.mitre.org (alexis%yummy@gateway.mitre.org) Date: Wed, 24 Aug 88 12:04:17 EDT Subject: Mistake in DARPA NN Report Message-ID: <8808241604.AA02507@marzipan.mitre.org>

I just read the executive summary of the "DARPA Neural Network Study" by MIT/Lincoln Labs which is really quite good (I would have preferred less emphasis on computing power and more on, say, learning but ...). Unfortunately they repeat a mistake in the intro about the ability of feed-forward networks. In Figure 4-4 and the supporting text on p. 15 they state that a net with 2 in and 1 out can partition the 2D input space as such:

One-Layer ----- Two-Layer ----- Three-Layer
[ASCII figure: three panels showing the 2-D input space partitioned into 'A' and 'B' regions of increasing complexity]

Certainly a one-layer (i.e., Perceptron) can linearly partition, and a three-layer (with enough nodes) can do anything, but otherwise the figure is all wrong.
The "island" shown for a three- :::::::::::::: layer can easily be done by a two layer. In our paper :::########::: "Geometric Analysis of Neural Network Capabilities" :::##::::##::: (ICNN87, VIII p385) we bother to take this to the :::##::::::::: extreme by doing something like the "C" (for convex) :::##::::##::: at left. Actually any finite number of finitely :::########::: complex items can be done with a two-layer net. :::::::::::::: Far worse, the "four-quadrant" problem shown under ######:::::: two-layers *CANNOT* be done with two layers. There ####:::::::: are few problems that can't be done with two layers, ##:::::::::: but the easiest I know of is precisely that. Assuming ::::::::::## thoses boundaries go on to +/- infinity this requires ::::::::#### a three-layer net (if they only go a finite distance ::::::###### you can do it with 2-layer if the inputs go to both layers). The report states that this is how an XOR is done with two layers, when in fact it is done by having a single "valley" (or equiv. a "mountain" the other way) like the fig at left. Just grumbling .... alexis wieland MITRE Corp. From harnad at Princeton.EDU Wed Aug 24 14:15:27 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 24 Aug 88 14:15:27 edt Subject: Pinker & Prince on Rules & Learning Message-ID: <8808241815.AA28089@mind> On Pinker & Prince on Rules & Learning Steve: Having read your Cognition paper and twice seen your talk (latest at cogsci-88), I thought I'd point out what look like some problems with the argument (as I understand it). In reading my comments, please bear in mind that I am NOT a connectionist; I am on record as a sceptic about connectionism's current accomplishments (and how they are being interpreted and extrapolated) and as an agnostic about its future possibilities. (Because I think this issue is of interest to the connectionist/AI community as a whole, I am branching a copy of this challenge to connectionists and comp.ai.) (1) An argument that pattern-associaters (henceforth "nets") cannot do something in principle cannot be based on the fact that a particular net (Rumelhart & McClelland 86/87) has not done it in practice. (2) If the argument is that nets cannot learn past tense forms (from ecologically valid samples) in principle, then it's the "in principle" part that seems to be missing. For it certainly seems incorrect that past tense formation is not learnable in principle. I know of no poverty-of-the-stimulus argument for past tense formation. On the contrary, the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample. (This is without even mentioning that surely no one imagines that past-tense formation is an independent cognitive module; it is probably learned jointly with other morphological regularities and irregularities, and there may well be degrees-of-freedom-reducing cross-talk.) (3) If the argument is only that nets cannot learn past tense forms without rules, then the matter is somewhat vaguer and more equivocal, for there are still ambiguities about what it is to be or represent a "rule." At the least, there is the issue of "explicit" vs. 
"implicit" representation of a rule, and the related Wittgensteinian distinction between "knowing" a rule and merely being describable as behaving in accordance with a rule. These are not crisp issues, and hence not a solid basis for a principled critique. For example, it may well be that what nets learn in order to form past tenses correctly is describable as a rule, but not explicitly represented as one (as it would be in a symbolic program); the rule may simple operate as a causal I/O constraint. Ultimately, even conditional branching in a symbolic program is implemented as a causal constraint; "if/then" is really just an interpretation we can make of the software. The possibility of making such systematic, decomposable semantic intrepretations is, of course, precisely what distinguishes the symbolic approach from the connectionistic one (as Fodor/Pylyshyn argue). But at the level of a few individual "rules," it is not clear that the higher-order interpretation AS a formal rule, and all of its connotations, is justified. In any case, the important distinction is that the net's "rules" are LEARNED from statistical regularities in the data, rather than BUILT IN (as they are, coincidentally, in both symbolic AI and poverty-of-the-stimulus-governed linguistics). [The intermediate case of formally INFERRED rules does not seem to be at issue here.] So here are some questions: (a) Do you believe that English past tense formation is NOT learnable (except as "parameter settings" on an innate structure, from impoverished data)? If so, what are the supporting arguments for that? (b) If past tense formation IS learnable in the usual sense (i.e., by trial-and-error induction of regularities from the data sample), then do you believe that it is specifically unlearnable by nets? If so, what are the supporting arguments for that? (c) If past tense formation IS learnable by nets, but only if the invariance that the net learns and that comes to causally constrain its successful performance is describable as a "rule," what's wrong with that? Looking forward to your commentary on Lightfoot, where poverty-of-the-stimulus IS the explicit issue, -- best wishes, Stevan Harnad From chrisley.pa at Xerox.COM Wed Aug 24 20:50:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 24 Aug 88 17:50 PDT Subject: Roommates wanted In-Reply-To: huyser@mojave.Stanford.EDU (Karen Huyser)'s message of Wed, 10 Aug 88 10:34:31 PDT Message-ID: <880824-180000-3025@Xerox> Is Vip still looking for a roommate for INNS? -- Ron From harnad at Princeton.EDU Thu Aug 25 01:17:37 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Thu, 25 Aug 88 01:17:37 edt Subject: On Pinker & Prince On Rules & Learning Message-ID: <8808250517.AA03589@mind> On Pinker & Prince on Rules & Learning To: Steve Pinker, Psychology, MIT Steve: Having read your Cognition paper and twice seen your talk (latest at cogsci-88), I thought I'd point out what look like some problems with the argument (as I understand it). In reading my comments, please bear in mind that I am NOT a connectionist; I am on record as a sceptic about connectionism's current accomplishments (and how they are being interpreted and extrapolated) and as an agnostic about its future possibilities. (Because I think this issue is of interest to the connectionist/AI community as a whole, I am branching a copy of this challenge to connectionists and comp.ai.) 
(1) An argument that pattern-associaters (henceforth "nets") cannot do something in principle cannot be based on the fact that a particular net (Rumelhart & McClelland 86/87) has not done it in practice. (2) If the argument is that nets cannot learn past tense forms (from ecologically valid samples) in principle, then it's the "in principle" part that seems to be missing. For it certainly seems incorrect that past tense formation is not learnable in principle. I know of no poverty-of-the-stimulus argument for past tense formation. On the contrary, the regularities you describe -- both in the irregular verbs and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample. (This is without even mentioning that surely no one imagines that past-tense formation is an independent cognitive module; it is probably learned jointly with other morphological regularities and irregularities, and there may well be degrees-of-freedom-reducing cross-talk.) (3) If the argument is only that nets cannot learn past tense forms without rules, then the matter is somewhat vaguer and more equivocal, for there are still ambiguities about what it is to be or represent a "rule." At the least, there is the issue of "explicit" vs. "implicit" representation of a rule, and the related Wittgensteinian distinction between "knowing" a rule and merely being describable as behaving in accordance with a rule. These are not crisp issues, and hence not a solid basis for a principled critique. For example, it may well be that what nets learn in order to form past tenses correctly is describable as a rule, but not explicitly represented as one (as it would be in a symbolic program); the rule may simply operate as a causal I/O constraint. Ultimately, even conditional branching in a symbolic program is likewise implemented as a causal constraint; "if/then" is really just an interpretation we can make of the software. The possibility of making such systematic, decomposable semantic intrepretations is, of course, precisely what distinguishes the symbolic approach from the connectionistic one (as Fodor/Pylyshyn argue). But at the level of a few individual "rules," it is not clear that the higher-order interpretation AS a formal rule, and all of its connotations, is justified. In any case, the important distinction is that the net's "rules" are LEARNED from statistical regularities in the data, rather than BUILT IN (as they are, coincidentally, in both symbolic AI and poverty-of-the-stimulus-governed linguistics). [The intermediate case of formally INFERRED rules does not seem to be at issue here.] So here are some questions: (a) Do you believe that English past tense formation is NOT learnable (except as "parameter settings" on an innate structure, from impoverished data)? If so, what are the supporting arguments for that? (b) If past tense formation IS learnable in the usual sense (i.e., by trial-and-error induction of regularities from the data sample), then do you believe that it is specifically not learnable by nets? If so, what are the supporting arguments for that? (c) If past tense formation IS learnable by nets, but only if the invariance that the net learns and that comes to causally constrain its successful performance is describable as a "rule," what's wrong with that? 
Looking forward to your commentary on Lightfoot, where poverty-of-the-stimulus IS the explicit issue, -- best wishes, Stevan Harnad

From alexis%yummy at gateway.mitre.org Thu Aug 25 12:41:36 1988 From: alexis%yummy at gateway.mitre.org (alexis%yummy@gateway.mitre.org) Date: Thu, 25 Aug 88 12:41:36 EDT Subject: DARPA NN Report Message-ID: <8808251641.AA03463@marzipan.mitre.org>

I've had a number of people asking how to get the DARPA report. The report was announced at the Government Panel at the ICNN88. Call the Pentagon in Washington, DC at (202) 697-5737 to obtain a copy of the (78-page) executive summary {sorry, I don't know an e-mail or snail-mail address}. The complete 600-page study final report is supposed to be available as a Lincoln Labs Report and as a book (possibly published by AFSIA). alexis.

From alexis%yummy at gateway.mitre.org Thu Aug 25 07:47:12 1988 From: alexis%yummy at gateway.mitre.org (alexis%yummy@gateway.mitre.org) Date: Thu, 25 Aug 88 07:47:12 EDT Subject: four-quadrant problem Message-ID: <8808251147.AA03129@marzipan.mitre.org>

It was long and had nice pictures, but not correct ... Before someone else catches it (I based part of my message on old notes that counted layers differently):
a) You can't do the four-quadrant problem with two layers ever (even for a finite distance).
b) You can't do "any finite number of finitely complex items" with 2-layers; you can often do lots, but a counterexample is (a) above.
This makes the DARPA report even more certainly wrong, but it also puts me in a glass house to some degree ... alexis

From pauls at boulder.Colorado.EDU Thu Aug 25 12:14:52 1988 From: pauls at boulder.Colorado.EDU (Paul Smolensky) Date: Thu, 25 Aug 88 10:14:52 MDT Subject: Mistake in DARPA NN Report Message-ID: <8808251614.AA24053@sigi.colorado.edu>

if we're going to discuss the DARPA report on this net, we should discuss not only the technical syntactic sugar but also the political ramifications. can you tell us all how to get a copy of the report so we can have an informed political discussion? thanks, Paul Smolensky Dept. of Computer Science Univ. of Colorado Box 430 Boulder, CO 80309-0430

From ceci at boulder.Colorado.EDU Thu Aug 25 15:44:15 1988 From: ceci at boulder.Colorado.EDU (Lou Ceci) Date: Thu, 25 Aug 88 13:44:15 MDT Subject: Mistake in DARPA NN Report Message-ID: <8808251944.AA04670@tut>

Dear Mr. Wieland: I enjoyed the article you and Mr. Leighton did for the ICNN more than any other. It was not only well-written, it was *CLEAR*--a rarity in today's connectionist literature. Other than proof by demonstration, can you point me to the mathematics behind the claims that "a neural net with X number of layers can partition a space into Y different regions of Z complexity"? Thanks. --Lou Ceci CU Boulder ceci at boulder.colorado.edu

From chrisley.pa at Xerox.COM Fri Aug 26 16:26:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 26 Aug 88 13:26 PDT Subject: four-quadrant problem In-Reply-To: alexis%yummy@gateway.mitre.org's message of Thu, 25 Aug 88 07:47:12 EDT Message-ID: <880826-134143-6475@Xerox>

I am very interested, as may be other connectionists who have not read the DARPA report, in knowing what the four-quadrant problem is, exactly. I am interested in any task that is proposed as not being able to be solved by a 2-layer network, since Huang and Lippmann seem to indicate that there may not be any such tasks.

Ron Chrisley Xerox PARC SSL Room 1620 3333 Coyote Hill Road Palo Alto, CA 94309 (415) 494-4740
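(On the two questions just raised: as Wieland's earlier message and the report's figure suggest, the four-quadrant problem is the XOR of the signs of the two inputs, with the class boundaries running out to infinity. Below is a minimal sketch, in modern Python, of the "valley" construction he describes for ordinary XOR with one hidden layer of hard-threshold units; the particular weights and thresholds are chosen by hand for illustration and are not taken from the DARPA report or from the Wieland & Leighton paper.)

    def step(x):
        # Hard threshold unit: output 1 if the net input is >= 0, else 0.
        return 1.0 if x >= 0 else 0.0

    def xor_valley(x1, x2):
        # Two hidden threshold units whose difference carves out a diagonal
        # "valley": h1 fires when x1 + x2 >= 0.5, h2 fires when x1 + x2 >= 1.5.
        # The output unit fires only inside the band between those two parallel
        # lines, which contains (0,1) and (1,0) but not (0,0) or (1,1).
        h1 = step(x1 + x2 - 0.5)
        h2 = step(x1 + x2 - 1.5)
        return step(h1 - h2 - 0.5)

    if __name__ == "__main__":
        for a in (0, 1):
            for b in (0, 1):
                print(f"XOR({a},{b}) = {int(xor_valley(a, b))}")

(Whether any task at all truly requires the extra layer is exactly the point at issue between Wieland's note and the Huang and Lippmann result mentioned above.)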
From hinton at ai.toronto.edu Fri Aug 26 18:23:01 1988 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Fri, 26 Aug 88 18:23:01 EDT Subject: tenure-track job Message-ID: <88Aug26.154316edt.640@neat.ai.toronto.edu>

McMaster University in Hamilton (near Toronto) has a tenure stream position in computer science for a "neural net" person. They have a project on multisensor fusion. If you are near graduation or a postdoc and you are interested, call Dr. Simon Haykin, Director, Communications Research Lab, 416-648-6589, for details. Geoff

From strom at ogcvax.ogc.edu Sat Aug 27 19:51:20 1988 From: strom at ogcvax.ogc.edu (Dan Hammerstrom) Date: Sat, 27 Aug 88 16:51:20 PDT Subject: Faculty Position Message-ID: <8808272351.AA26695@ogcvax.OGC.EDU>

FACULTY POSITION AVAILABLE: Connectionist/Neural Networks The Oregon Graduate Center

The Computer Science/Engineering Department at the Oregon Graduate Center seeks to hire faculty in the field of Connectionist/Neural Networks. We are interested in expanding an already successful program in this important area. Our current connectionist program is strongly oriented towards VLSI implementation, and we regularly fabricate silicon to support our research efforts. In addition to VLSI, we are also starting a strong speech recognition effort with the arrival of a new faculty member, Ron Cole, who just recently joined us from Carnegie Mellon University. Our program is well funded, and we actively seek additional research talent in the form of either junior or senior faculty.

The Oregon Graduate Center is a private institute for research and graduate education (MS and PhD) in the applied sciences. OGC gives its faculty unmatched freedom and responsibility in directing their research programs. The typical OGC faculty member spends 2/3 of his or her time on research. OGC is in the heart of Oregon's Sunset Corridor, amid such companies as Tektronix, Intel, Floating Point Systems, Mentor Graphics, Sequent, Cogent, National Semiconductor, Lattice, Fujitsu, Adaptive Systems Inc., NCube, Servio Logic, and NEC. OGC works because it has a first-rate faculty that is motivated and self-directed. In addition, the state of Oregon, in conjunction with OGC, other Oregon schools, and local industry, has begun OACIS (the Oregon Advanced Computer Institute). This institute will eventually provide excellent computing resources for research in parallel applications.

The Department occupies a new building with comfortable offices, extensive laboratory space, and built-in computer communications. Its equipment base includes Tektronix, DEC, Sun, and Mentor workstations, high-resolution color graphic design stations, a Sequent Symmetry, a Cogent system, and an Intel iPSC Hypercube. The Portland environment is as stimulating as that at OGC: the climate is mild, there is easy access to year-round skiing, ocean beaches, and hiking in mountains and high desert.

Dan Hammerstrom Department of Computer Science/Engineering Oregon Graduate Center 19600 NW von Neumann Dr.
Beaverton, OR 97007 (503) 690-1160 CSNET: strom at ogc.edu

From jmlubin at phoenix.Princeton.EDU Mon Aug 29 04:33:06 1988 From: jmlubin at phoenix.Princeton.EDU (Joseph Michael Lubin) Date: Mon, 29 Aug 88 04:33:06 edt Subject: mailing list Message-ID: <8808290833.AA19782@phoenix.Princeton.EDU>

if this is the request node for the connectionists mailing list please add my name if you have access to the list please forward my name thank you, Joseph Lubin

From nutto%UMASS.BITNET at VMA.CC.CMU.EDU Mon Aug 29 19:24:32 1988 From: nutto%UMASS.BITNET at VMA.CC.CMU.EDU (nutto%UMASS.BITNET@VMA.CC.CMU.EDU) Date: Mon, 29 Aug 88 19:24:32 EDT Subject: Neural Computation Message-ID: <880829192242C37.AFRK@Mars.UCC.UMass.EDU> (UMass-Mailer 4.04)

I saw your address in a recent message to the Biotech, Physics, and Psychology digests. I am a psychology major concentrating in neuroscience and minoring in zoology, and I would appreciate it if you could send me any information about your mailing list. Thanx.
USnail: Andy Steinberg BITNet: nutto at UMass PO Box 170 nutto at Mars.UCC.UMass.EDU Hadley, MA 01035-0170 Internet: nutto%UMass.BITNet at cunyvm.cuny.edu Phone: (413) 546-4908 nutto%UMass.BITNet at mitvma.mit.edu From munnari!nswitgould.oz.au!geof at uunet.UU.NET Tue Aug 30 12:10:09 1988 From: munnari!nswitgould.oz.au!geof at uunet.UU.NET (Geoffrey Jones) Date: Tue, 30 Aug 88 11:10:09 EST Subject: Jordan paper request Message-ID: <8808300249.AA02218@uunet.UU.NET> Can anyone help me? I'm after a copy of M. I. Jordan's 1986 paper "Attractor dynamics and parallelism in a connectionist sequential machine" which appeared in _Proceedings of the Eighth Annual Meeting of the Cognitive Science Society_, Hillsdale, NJ, Erlbaum. The Proceedings aren't available in any Australian University library, hence the call for an overseas source. Rather than everyone rushing to their filing cabinets, could anyone (preferably the author himself) with access to a copy email me to that effect and we can go from there. Thank-you all in advance. Cheers. geof. ---------------------------------------------------------------------------- Geoffrey Jones ACSnet: geof at nswitgould.oz Dept. of Computer Science CSNET: geof at nswitgould.oz U. of Technology, Sydney ARPA: geof%nswitgould.oz at uunet.uu.net P.O. Box 123, UUCP: {uunet,ukc}!munnari!nswitgould.oz!geof Broadway, 2007 AUSTRALIA Phone: (02) 218 9582 ---------------------------------------------------------------------------- From hendler at dormouse.cs.umd.edu Tue Aug 30 10:15:29 1988 From: hendler at dormouse.cs.umd.edu (Jim Hendler) Date: Tue, 30 Aug 88 10:15:29 EDT Subject: Jordan paper request Message-ID: <8808301415.AA04288@dormouse.cs.umd.edu> > Can anyone help me? I'm after a copy of M. I. Jordan's 1986 paper > "Attractor dynamics and parallelism in a connectionist sequential machine" > which appeared in _Proceedings of the Eighth Annual Meeting of the > Cognitive Science Society_, Hillsdale, NJ, Erlbaum. The Proceedings Geof brings up a good point, with more and more connectionist papers showing up at Coggie Sci., those of us missing the meetings for one reason or another end up missing some good papers. Does anyone know if the Proceedings can be ordered separately these days? In the old days the Cog Sci proceedings were NOT available outside the conference, is this still true? If so, could we agitate a bit to change it? -Jim H From Dave.Touretzky at B.GP.CS.CMU.EDU Tue Aug 30 21:47:47 1988 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Tue, 30 Aug 88 21:47:47 EDT Subject: Jordan paper request In-Reply-To: Your message of Tue, 30 Aug 88 10:15:29 -0400. <8808301415.AA04288@dormouse.cs.umd.edu> Message-ID: <4483.588995267@DST.BOLTZ.CS.CMU.EDU> The proceedings of this year's and previous Cognitive Science conferences can be ordered from Lawrence Erlbaum Associates. I bought a copy of this year's proceedings at AAAI last week. The price for this year's Cog Sci proceedings is $49.99. If you pay by check, LEA will pay postage and handling (US and Canada only); outside the Americas add $5 per book. If you charge your order to a VISA, MasterCard, AmEx, or Discover card, UPS charges will be addded to the bill. New Jersey residents must add sales tax. Order from: Lawrence Erlbaum Associates, Inc. 365 Broadway Hillsdale, NJ 07642 tel. 201-666-4110 -- Dave PS: sending email to hundreds of people with the push of a button can be fun, but sometimes it pays to know how to use a telephone. 
From steve at cogito.mit.edu Tue Aug 30 18:44:05 1988 From: steve at cogito.mit.edu (Steve Pinker) Date: Tue, 30 Aug 88 18:44:05 edt Subject: Reply to S. Harnad's questions, short version Message-ID: <8808302244.AA27242@ATHENA.MIT.EDU> Alluding to our paper "On Language and Connectionism: Analysis of a PDP model of language acquisition", Stevan Harnad has posted a list of questions and observations as a 'challenge' to us. His remarks owe more to the general ambience of the connectionism / symbol-processing debate than to the actual text of our paper, in which the questions are already answered. We urge those interested in these issues to read the paper or the nutshell version published in Trends in Neurosciences, either of which may be obtained from Prince (address below). In this note we briefly answer Harnad's three questions. In another longer message to follow, we direct an open letter to Harnad which justifies the answers and goes over the issues he raises in more detail. Question # 1: Do we believe that English past tense formation is not learnable? Of course we don't! So imperturbable is our faith in the learnability of this system that we ourselves propose a way in which it might be done (OLC, 130-136). Question #2: If it is learnable, is it specifically unlearnable by nets? No, there may be some nets that can learn it; certainly any net that is intentionally wired up to behave exactly like a rule-learning algorithm can learn it. Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but with which theories are true, and our conclusions were about pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job. Therefore it's not surprising that the developmental data confirm that children do not behave in the way that such a pattern associator behaves. Question # 3: If past tense formation is learnable by nets, but only if the invariance that the net learns and that causally constrains its successful performance is describable as a "rule", what's wrong with that? Absolutely nothing! --just like there's nothing wrong with saying that past tense formation is learnable by a bunch of precisely-arranged molecules (viz., the brain) but only if the invariance that the molecules learn, etc. etc. etc. The question is, what explains the facts of human cognition? Pattern associator networks have some interesting properties that can shed light on certain kinds of phenomena, such as *irregular* past tense forms. But it is simply a fact about the *regular* past tense alternation in English that it is not that kind of phenomenon. You can focus on the interesting empirical predictions of pattern associators, and use them to explain certain things (but not others), or you can generalize them to a class of universal devices that can explain nothing without an appeal to the rules that they happen to implement. But you can't have it both ways. Alan Prince Program in Cognitive Science Department of Psychology Brown 125 Brandeis University Waltham, MA 02254-9110 prince at brandeis.bitnet Steven Pinker Department of Brain and Cognitive Sciences E10-018 MIT Cambridge, MA 02139 steve at cogito.mit.edu References: Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193. 
Reprinted in S. Pinker & J. Mehler (Eds.), Connections and symbols. Cambridge, MA: Bradford Books/MIT Press. Prince, A. & Pinker, S. (1988) Rules and connections in human language. Trends in Neurosciences, 11, 195-202. Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & The PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: Bradford Books/MIT Press. From steve at cogito.mit.edu Tue Aug 30 18:46:06 1988 From: steve at cogito.mit.edu (Steve Pinker) Date: Tue, 30 Aug 88 18:46:06 edt Subject: Reply to S. Harnad's questions, longer version Message-ID: <8808302246.AA27270@ATHENA.MIT.EDU> Dear Stevan, This letter is a reply to your posted list of questions and observations alluding to our paper "On language and connectionism: Analysis of a PDP model of language acquisition" (Pinker & Prince, 1988; see also Prince and Pinker, 1988). The questions are based on misunderstandings of our papers, in which they are already answered. (1) Contrary to your suggestion, we never claimed that pattern associators cannot learn the past tense rule, or anything else, in principle. Our concern is with which theories of the psychology of language are true. This question cannot be answered from an archair but only by examining what people learn and how they learn it. Our main conclusion is that the claim that the English past tense rule is learned and represented as a pattern-associator with distributed representations over phonological features for input and output forms (e.g., the Rumelhart-McClelland 1986 model) is false. That's because what pattern-associators are good at is precisely what the regular rule doesn't need. Pattern associators are designed to pick up patterns of correlation among input and output features. The regular past tense alternation, as acquired by English speakers, is not systematically sensitive to phonological features. Therefore some of the failures of the R-M model we found are traceable to its trying to handle the regular rule with an architecture inappropriate to the regular rule. We therefore predict that these failures should be seen in other network models that compute the regular past tense alternation using pattern associators with distributed phonological representations (*not* all conceivable network models, in general, in principle, forever, etc.). This prediction has been confirmed. Egedi and Sproat (1988) devised a network model that retained the assumption of associations between distributed phonological representations but otherwise differed radically from the R-M model: it had three layers, not two; it used a back-propagation learning rule, not just the simple perceptron convergence procedure; it used position-specific phonological features, not context-dependent ones; and it had a completely different output decoder. Nonetheless its successes and failures were virtually identical to those of the R-M model. (2) You claim that "the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample." 
This is an interesting claim and we strongly encourage you to back it up with argument and analysis; a real demonstration of its truth would be a significant advance. It's certainly false of the R-M and Egedi-Sproat models. There's a real danger in this kind of glib commentary of trivializing the issues by assuming that net models are a kind of miraculous wonder tissue that can do anything. The brilliance of the Rumelhart and McClelland (1986) paper is that they studiously avoided this trap. In the section of their paper called "Learning regular and exceptional patterns in a pattern associator" they took great pains to point out that pattern associators are good at specific things, especially exploiting statistical regularities in the mapping from one set of featural patterns to another. They then made the interesting emprical claim that these basic properties of the pattern associator model lie at the heart of the acquisition of the past tense. Indeed, the properties of the model afforded it some interesting successes with the *irregular* alternations, which fall into family resemblance clusters of the sort that pattern associators handle in interesting ways. But it is exactly these properties of the model that made it fail at the *regular* alternation, which does not form family resemblance clusters. We like to think that these kinds of comparisons make for productive empirical science. The successes of the pattern associator architecture for irregulars teaches us something about the psychology of the irregulars (basically a memory phenomenon, we argue), and its failures for the regulars teach us something about the psychology of the regulars (use of a default rule, we argue). Rumelhart and McClelland disagree with us over the facts but not over the key emprical tests. They hold that pattern associators have particular aptitudes that are suited to modeling certain kinds of processes, which they claim are those of cognition. One can argue for or against this and learn something about psychology while so doing. Your claim about a 'statistical pattern learner...sensitive to higher order correlations' is essentially impossible to evaluate. (3) We're mystified that you attribute to us the claim that "past tense formation is not learnable in principle." The implication is that our critique of the R-M model was based on the assertion that the rule is unlearned and that this is the key issue separating us from R&M. Therefore -- you seem to reason -- if the rule is learned, it is learned by a network. But both parts are wrong. No one in his right mind would claim that the English past tense rule is "built in". We spent a full seven pages (130-136) of 'OLC' presenting a simple model of how the past tense rule might be learned by a symbol manipulation device. So obviously we don't believe it can't be learned. The question is how children in fact do it. The only way we can make sense of this misattribution is to suppose that you equate "learnable" with "learnable by some (nth-order) statistical algorithm". The underlying presupposition is that statistical modeling (of an undefined character) has some kind of philosophical priority over other forms of analysis; so that if statistical modeling seems somehow possible-in-principle, then rule-based models (and the problems they solve) can be safely ignored. 
As a kind of corollary, you seem to assume that unless the input is so impoverished as to rule out all statistical modeling, rule theories are irrelevant; that rules are impossible without major stimulus-poverty. In our view, the question is not CAN some (ungiven) algorithm 'learn' it, but DO learners approach the data in that fashion. Poverty-of-the-stimulus considerations are one out of many sources of evidence in this issue. (In the case of the past tense rule, there is a clear P-of-S argument for at least one aspect of the organization of the inflectional system: across languages, speakers automatically regularize verbs derived from nouns and adjectives (e.g., 'he high-sticked/*high-stuck the goalie'; she braked/*broke the car'), despite virtually no exposure to crucial informative data in childhood. This is evidence that the system is built around representations corresponding to the constructs 'word', 'root', and 'irregular'; see OLC 110-114.) (4) You bring up the old distinction between rules that describe overall behavior and rules that are explicitly represented in a computational device and play a causal role in its behavior. Perhaps, as you say, "these are not crisp issues, and hence not a solid basis for a principled critique". But it was Rumelhart and McClelland who first brought them up, and it was the main thrust of their paper. We tend to agree with them that the issues are crisp enough to motivate interesting research, and don't just degenerate into discussions of logical possibilities. We just disagree about which conclusions are warranted. We noted that (a) the R-M model is empirically incorrect, therefore you can't use it to defend any claims for whether or not rules are explicitly represented; (b) if you simply wire up a network to do exactly what a rule does, by making every decision about how to build the net (which features to use, what its topology should be, etc.) by consulting the rule-based theory, then that's a clear sense in which the network "implements" the rule. The reason is that the hand-wiring and tweaking of such a network would not be motivated by principles of connectionist theory; at the level at which the manipulations are carried out, the units and connections are indistinguishable from one another and could be wired together any way one pleased. The answer to the question "Why is the network wired up that way?" would come from the rule-theory; for example, "Because the regular rule is a default operation that is insensitive to stem phonology". Therefore in the most interesting sense such a network *is* a rule. The point carries over to more complex cases, where one would have different subnetworks corresponding to different parts of rules. Since it is the fact that the network implements such-and-such a rule that is doing the work of explaining the phenomenon, the question now becomes, is there any reason to believe that the rule is implemented in that way rather some other way? Please note that we are *not* asserting that no PDP model of any sort could ever acquire linguistic knowledge without directly implementing linguistic rules. Our hope, of course, is that as the discussion proceeds, models of all kinds will be become more sophisticated and ambitious. As we said in our Conclusion, "These problems are exactly that, problems. They do not demonstrate that interesting PDP models of language are impossible in principle. 
At the same time, they show that there is no basis for the belief that connectionism will dissolve the difficult puzzles of language, or even provide radically new solutions to them." So to answer the catechism: (a) Do we believe that English past tense formation is not learnable? Of course we don't! (b) If it is learnable, is it specifically unlearnable by nets? No, there may be some nets that can learn it; certainly any net that is intentionally wired up to behave exactly like a rule-learning algorithm can learn it. Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but about which theories are true, and our analysis was of pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job. Therefore it's not surprising that the developmental data confirm that children do not behave the way such a pattern associator behaves. (c) If past tense formation is learnable by nets, but only if the invariance that the net learns and that causally constrains its successful performance is describable as a "rule", what's wrong with that? Absolutely nothing! -- just like there's nothing wrong with saying that past tense formation is learnable by a bunch of precisely-arranged molecules (viz., the brain) such that the invariance that the molecules learn, etc. etc. The question is, what explains the facts of human cognition? Pattern associator networks have some interesting properties that can shed light on certain kinds of phenomena, such as irregular past tense forms. But it is simply a fact about the regular past tense alternation in English that it is not that kind of phenomenon. You can focus on the interesting empirical properties of pattern associators, and use them to explain certain things (but not others), or you can generalize them to a class of universal devices that can explain nothing without appeals to the rules that they happen to implement. But you can't have it both ways. Steven Pinker Department of Brain and Cognitive Sciences E10-018 MIT Cambridge, MA 02139 steve at cogito.mit.edu Alan Prince Program in Cognitive Science Department of Psychology Brown 125 Brandeis University Waltham, MA 02254-9110 prince at brandeis.bitnet References: Egedi, D.M. and R.W. Sproat (1988) Neural Nets and Natural Language Morphology, AT&T Bell Laboratories, Murray Hill,NJ, 07974. Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193. Reprinted in S. Pinker & J. Mehler (Eds.), Connections and symbols. Cambridge, MA: Bradford Books/MIT Press. Prince, A. & Pinker, S. (1988) Rules and connections in human language. Trends in Neurosciences, 11, 195-202. Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & The PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: Bradford Books/MIT Press. From steve at cogito.mit.edu Wed Aug 31 10:22:56 1988 From: steve at cogito.mit.edu (Steve Pinker) Date: Wed, 31 Aug 88 10:22:56 edt Subject: Reply to S. 
Harnad's questions, longer version Message-ID: <8808311423.AA05929@ATHENA.MIT.EDU> Dear Stevan, This letter is a reply to your posted list of questions and observations alluding to our paper "On language and connectionism: Analysis of a PDP model of language acquisition" (Pinker & Prince, 1988; see also Prince and Pinker, 1988). The questions are based on misunderstandings of our papers, in which they are already answered. (1) Contrary to your suggestion, we never claimed that pattern associators cannot learn the past tense rule, or anything else, in principle. Our concern is with which theories of the psychology of language are true. This question cannot be answered from an archair but only by examining what people learn and how they learn it. Our main conclusion is that the claim that the English past tense rule is learned and represented as a pattern-associator with distributed representations over phonological features for input and output forms (e.g., the Rumelhart-McClelland 1986 model) is false. That's because what pattern-associators are good at is precisely what the regular rule doesn't need. Pattern associators are designed to pick up patterns of correlation among input and output features. The regular past tense alternation, as acquired by English speakers, is not systematically sensitive to phonological features. Therefore some of the failures of the R-M model we found are traceable to its trying to handle the regular rule with an architecture inappropriate to the regular rule. We therefore predict that these failures should be seen in other network models that compute the regular past tense alternation using pattern associators with distributed phonological representations (*not* all conceivable network models, in general, in principle, forever, etc.). This prediction has been confirmed. Egedi and Sproat (1988) devised a network model that retained the assumption of associations between distributed phonological representations but otherwise differed radically from the R-M model: it had three layers, not two; it used a back-propagation learning rule, not just the simple perceptron convergence procedure; it used position-specific phonological features, not context-dependent ones; and it had a completely different output decoder. Nonetheless its successes and failures were virtually identical to those of the R-M model. (2) You claim that "the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample." This is an interesting claim and we strongly encourage you to back it up with argument and analysis; a real demonstration of its truth would be a significant advance. It's certainly false of the R-M and Egedi-Sproat models. There's a real danger in this kind of glib commentary of trivializing the issues by assuming that net models are a kind of miraculous wonder tissue that can do anything. The brilliance of the Rumelhart and McClelland (1986) paper is that they studiously avoided this trap. 
In the section of their paper called "Learning regular and exceptional patterns in a pattern associator" they took great pains to point out that pattern associators are good at specific things, especially exploiting statistical regularities in the mapping from one set of featural patterns to another. They then made the interesting emprical claim that these basic properties of the pattern associator model lie at the heart of the acquisition of the past tense. Indeed, the properties of the model afforded it some interesting successes with the *irregular* alternations, which fall into family resemblance clusters of the sort that pattern associators handle in interesting ways. But it is exactly these properties of the model that made it fail at the *regular* alternation, which does not form family resemblance clusters. We like to think that these kinds of comparisons make for productive empirical science. The successes of the pattern associator architecture for irregulars teaches us something about the psychology of the irregulars (basically a memory phenomenon, we argue), and its failures for the regulars teach us something about the psychology of the regulars (use of a default rule, we argue). Rumelhart and McClelland disagree with us over the facts but not over the key emprical tests. They hold that pattern associators have particular aptitudes that are suited to modeling certain kinds of processes, which they claim are those of cognition. One can argue for or against this and learn something about psychology while so doing. Your claim about a 'statistical pattern learner...sensitive to higher order correlations' is essentially impossible to evaluate. (3) We're mystified that you attribute to us the claim that "past tense formation is not learnable in principle." The implication is that our critique of the R-M model was based on the assertion that the rule is unlearned and that this is the key issue separating us from R&M. Therefore -- you seem to reason -- if the rule is learned, it is learned by a network. But both parts are wrong. No one in his right mind would claim that the English past tense rule is "built in". We spent a full seven pages (130-136) of 'OLC' presenting a simple model of how the past tense rule might be learned by a symbol manipulation device. So obviously we don't believe it can't be learned. The question is how children in fact do it. The only way we can make sense of this misattribution is to suppose that you equate "learnable" with "learnable by some (nth-order) statistical algorithm". The underlying presupposition is that statistical modeling (of an undefined character) has some kind of philosophical priority over other forms of analysis; so that if statistical modeling seems somehow possible-in-principle, then rule-based models (and the problems they solve) can be safely ignored. As a kind of corollary, you seem to assume that unless the input is so impoverished as to rule out all statistical modeling, rule theories are irrelevant; that rules are impossible without major stimulus-poverty. In our view, the question is not CAN some (ungiven) algorithm 'learn' it, but DO learners approach the data in that fashion. Poverty-of-the-stimulus considerations are one out of many sources of evidence in this issue. 
(In the case of the past tense rule, there is a clear P-of-S argument for at least one aspect of the organization of the inflectional system: across languages, speakers automatically regularize verbs derived from nouns and adjectives (e.g., 'he high-sticked/*high-stuck the goalie'; she braked/*broke the car'), despite virtually no exposure to crucial informative data in childhood. This is evidence that the system is built around representations corresponding to the constructs 'word', 'root', and 'irregular'; see OLC 110-114.) (4) You bring up the old distinction between rules that describe overall behavior and rules that are explicitly represented in a computational device and play a causal role in its behavior. Perhaps, as you say, "these are not crisp issues, and hence not a solid basis for a principled critique". But it was Rumelhart and McClelland who first brought them up, and it was the main thrust of their paper. We tend to agree with them that the issues are crisp enough to motivate interesting research, and don't just degenerate into discussions of logical possibilities. We just disagree about which conclusions are warranted. We noted that (a) the R-M model is empirically incorrect, therefore you can't use it to defend any claims for whether or not rules are explicitly represented; (b) if you simply wire up a network to do exactly what a rule does, by making every decision about how to build the net (which features to use, what its topology should be, etc.) by consulting the rule-based theory, then that's a clear sense in which the network "implements" the rule. The reason is that the hand-wiring and tweaking of such a network would not be motivated by principles of connectionist theory; at the level at which the manipulations are carried out, the units and connections are indistinguishable from one another and could be wired together any way one pleased. The answer to the question "Why is the network wired up that way?" would come from the rule-theory; for example, "Because the regular rule is a default operation that is insensitive to stem phonology". Therefore in the most interesting sense such a network *is* a rule. The point carries over to more complex cases, where one would have different subnetworks corresponding to different parts of rules. Since it is the fact that the network implements such-and-such a rule that is doing the work of explaining the phenomenon, the question now becomes, is there any reason to believe that the rule is implemented in that way rather some other way? Please note that we are *not* asserting that no PDP model of any sort could ever acquire linguistic knowledge without directly implementing linguistic rules. Our hope, of course, is that as the discussion proceeds, models of all kinds will be become more sophisticated and ambitious. As we said in our Conclusion, "These problems are exactly that, problems. They do not demonstrate that interesting PDP models of language are impossible in principle. At the same time, they show that there is no basis for the belief that connectionism will dissolve the difficult puzzles of language, or even provide radically new solutions to them." So to answer the catechism: (a) Do we believe that English past tense formation is not learnable? Of course we don't! (b) If it is learnable, is it specifically unlearnable by nets? No, there may be some nets that can learn it; certainly any net that is intentionally wired up to behave exactly like a rule-learning algorithm can learn it. 
Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but about which theories are true, and our analysis was of pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job. Therefore it's not surprising that the developmental data confirm that children do not behave the way such a pattern associator behaves. (c) If past tense formation is learnable by nets, but only if the invariance that the net learns and that causally constrains its successful performance is describable as a "rule", what's wrong with that? Absolutely nothing! -- just like there's nothing wrong with saying that past tense formation is learnable by a bunch of precisely-arranged molecules (viz., the brain) such that the invariance that the molecules learn, etc. etc. The question is, what explains the facts of human cognition? Pattern associator networks have some interesting properties that can shed light on certain kinds of phenomena, such as irregular past tense forms. But it is simply a fact about the regular past tense alternation in English that it is not that kind of phenomenon. You can focus on the interesting empirical properties of pattern associators, and use them to explain certain things (but not others), or you can generalize them to a class of universal devices that can explain nothing without appeals to the rules that they happen to implement. But you can't have it both ways. Steven Pinker Department of Brain and Cognitive Sciences E10-018 MIT Cambridge, MA 02139 steve at cogito.mit.edu Alan Prince Program in Cognitive Science Department of Psychology Brown 125 Brandeis University Waltham, MA 02254-9110 prince at brandeis.bitnet References: Egedi, D.M. and R.W. Sproat (1988) Neural Nets and Natural Language Morphology, AT&T Bell Laboratories, Murray Hill,NJ, 07974. Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193. Reprinted in S. Pinker & J. Mehler (Eds.), Connections and symbols. Cambridge, MA: Bradford Books/MIT Press. Prince, A. & Pinker, S. (1988) Rules and connections in human language. Trends in Neurosciences, 11, 195-202. Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & The PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: Bradford Books/MIT Press. From jlm+ at andrew.cmu.edu Wed Aug 31 12:23:11 1988 From: jlm+ at andrew.cmu.edu (James L. McClelland) Date: Wed, 31 Aug 88 12:23:11 -0400 (EDT) Subject: Replys to S. Harnad In-Reply-To: <8808302244.AA27242@ATHENA.MIT.EDU> References: <8808302244.AA27242@ATHENA.MIT.EDU> Message-ID: <8X72Tjy00jWDQ2M10o@andrew.cmu.edu> Steve -- In the first of your two messages, there seemed to be a failure to entertain the possibility that there might be a network that is not a strict implementation of a rule system nor a pattern associator of the type described by Rumelhart and me that could capture the past tense phenomena. The principle shortcoming of our network, in my view, was that it treated the problem of past-tense formation as a problem in which one generates the past tense of a word from its present tense. 
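(For readers following this thread without the PDP volumes at hand, here is a minimal sketch, in modern Python, of the kind of architecture being discussed: a one-layer pattern associator trained with the perceptron/delta rule to map a distributed encoding of a present-tense form onto a distributed encoding of its past-tense form. The binary feature vectors are invented toy patterns; this is not the Wickelfeature representation, decoder, or training set of the actual Rumelhart-McClelland model.)

    import random

    # Toy "distributed phonological" patterns: each item pairs a binary feature
    # vector for the present-tense form with one for the past-tense form.
    # These particular vectors are invented for illustration only.
    DATA = [
        ([1, 0, 1, 0, 1, 0], [1, 0, 1, 0, 1, 1]),
        ([0, 1, 1, 0, 0, 1], [0, 1, 1, 0, 0, 1]),
        ([1, 1, 0, 1, 0, 0], [1, 1, 0, 1, 0, 1]),
        ([0, 0, 1, 1, 1, 0], [0, 0, 1, 1, 1, 1]),
    ]
    N_IN, N_OUT = len(DATA[0][0]), len(DATA[0][1])

    def output(weights, x):
        # Each output unit thresholds the weighted sum of the input features.
        return [1 if sum(w * xj for w, xj in zip(row, x)) >= 0 else 0
                for row in weights]

    def train(data, epochs=50, lr=0.1, seed=0):
        # Perceptron/delta-rule learning: nudge each output unit's weights by
        # its error on every presentation.
        rng = random.Random(seed)
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(N_IN)]
                   for _ in range(N_OUT)]
        for _ in range(epochs):
            for x, target in data:
                y = output(weights, x)
                for i in range(N_OUT):
                    err = target[i] - y[i]
                    for j in range(N_IN):
                        weights[i][j] += lr * err * x[j]
        return weights

    if __name__ == "__main__":
        w = train(DATA)
        for x, target in DATA:
            print(x, "->", output(w, x), "target:", target)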
This of course cannot be the right way to do things, for reasons which you describe at some length in your paper. However, THIS problem has nothing to do with whether a network or some other method is used for going from present to past tense. Several researchers are now exploring models that take as input a distributed representation of the intended meaning, and generate as output a description of the phonological properties of the utterance that expresses that meaning. Such a network must have at least one hidden layer to do this task. Note that such a network would naturally be able to exploit the common structure of the various different versions of English inflectional morphology. It is already clear that it would have a much easier time learning inflection rather than word-reversal as a way of mastering past tense etc. What remain to be addressed are issues about the nature and onset of use of the regular inflection in English. Suffice it to say here that the claims you and Prince make about the sharp distinction between the regular and irregular systems deserve very close scrutiny. I for one find the arguments you give in favor of this view unconvincing. We will be writing at more length on these matters, but for now I just wanted two points to be clear: 1) The argument about what class of models a particular model's shortcomings exemplify is not an easy one to resolve, and there is considerable science and (yes) mathematics to be done to understand just what the classes are and what can be taken as examples of them. Just what generalization you believe you have reason to claim your arguments allow you to make has not always been clear. In the first of your two recent messages you state: Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but with which theories are true, and our conclusions were about pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job. After receiving the message containing the above I wrote the following: Now, the model Rumelhart and I proposed was a pattern associator using distributed phonological representations, but so are the other kinds of models that people are currently exploring; they happen though to use such representations at the output and not the input and to have hidden layers. I strongly suspect that you would like your argument to apply to the broad class of models which might be encompassed by the phrase "pattern associators using distributed phonological representations", and I know for a fact that many readers think that this is what you intend. However, I think it is much more likely that your arguments apply to the much narrower class of models which map distributed phonological representations of present tense to distributed phonological representations of past tense. In your longer, second note, you are very clear in stating that you intend your arguments to be taken against the narrow class of models that map phonology to phonology. I do hope that this sensible view gets propagated, as I think many may feel that you think you have a more general case. Indeed, your second message takes a general attitude that I find I can agree with: Let's do some more research and find out what can and can't be done and what the important taxonomic classes of architecture types might be.
2) There's quite a bit more empirical research to be done even characterizing accurately the facts about the past tense. I believe this research will show that you have substantially overstated the empirical situation in several respects. Just as one example, you and Prince state the following: The baseball term _to fly out_, meaning 'make an out by hitting a fly ball that gets caught', is derived from the baseball noun _fly (ball)_, meaning 'ball hit on a conspicuously parabolic trajectory', which is in turn related to the simple strong verb _fly_, 'proceed through the air'. Everyone says 'he flied out'; no mere mortal has yet been observed to have "flown out" to left field. You repeated this at Cog Sci two weeks ago. Yet in October of 87 I received the message appended below, which directly contradicts your claim. As you state in your second, more constructive message, we ALL need to be very clear about what the facts are and not to rush around making glib statements! Jay McClelland ======================================================= [The following is appended with the consent of the author.] Date: Sun, 11 Oct 87 21:20:55 PDT From: elman at amos.ling.ucsd.edu (Jeff Elman) To: der at psych.stanford.edu, jlm at andrew.cmu.edu Subject: flying baseball players Heard in Thursday's play-off game between the Tigers and Twins: "...and he flew out to left field...he's...OUT!" What was that P&P were saying?! Jeff ======================================================= From jlm+ at andrew.cmu.edu Wed Aug 31 09:22:16 1988 From: jlm+ at andrew.cmu.edu (James L. McClelland) Date: Wed, 31 Aug 88 09:22:16 -0400 (EDT) Subject: Jordan paper request In-Reply-To: <8808301415.AA04288@dormouse.cs.umd.edu> References: <8808301415.AA04288@dormouse.cs.umd.edu> Message-ID: Back copies of the Cognitive Science Proceedings are available for $49.95 from: Lawrence Erlbaum Associates, Inc. Publishers 365 Broadway, Hillside, NJ 07642 (201) 767-8450 I presume this includes the 1988 proceedings, though only earlier proceedings are mentioned in the advertisement printed on the back of the 1988 proceedings. In any case the 1986 proceedings containing the Jordan paper and lots of other good stuff are still available. I might also add that the journal Cognitive Science (the publication of the Cognitive Science Society) has a commitment to the exploration of connectionist models. I am one of the senior editors and the editorial board includes several prominent connectionists. I speak for the journal in saying that we welcome connectionist research with an interdisciplinary flavor. There will be a group of connectionist papers coming out shortly. If you want to submit, read the instructions for authors inside the back cover of a recent issue. If you want to subscribe, write to Ablex Publishing, 355 Chestnut St. Norwood, NJ 07648 or join the society by writing to Alan Lesgold, Secretary-Treasurer Learning Research and Development Center University of Pittsburgh Pittsburgh, PA 15260 Membership is just a bit more than a plain subscription and gets you announcements about meetings etc as well as the journal. -- Jay McClelland From harnad at Princeton.EDU Wed Aug 31 16:39:33 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 31 Aug 88 16:39:33 edt Subject: On Theft vs.
Honest Toil (Pinker & Prince Discussion, cont'd) Message-ID: <8808312039.AA01275@mind> Pinker & Prince write in reply: >> Contrary to your suggestion, we never claimed that pattern associators >> cannot learn the past tense rule, or anything else, in principle. I've reread the paper, and unfortunately I still find it ambiguous: For example, one place (p. 183) you write: "These problems are exactly that, problems. They do not demonstrate that interesting PDP models of language are impossible in principle." But elsewhere (p. 179) you write: "the representations used in decomposed, modular systems are abstract, and many aspects of their organization cannot be learned in any obvious way." [Does past tense learning depend on any of this unlearnable organization?] On p. 181 you write: "Perhaps it is the limitations of these simplest PDP devices -- two-layer association networks -- that causes problems for the R & M model, and these problems would diminish if more sophisticated kinds of PDP networks were used." But earlier on the same page you write: "a model that can learn all possible degrees of correlation among a set of features is not a model of a human being" [Sounds like a Catch-22...] It's because of this ambiguity that my comments were made in the form of conditionals and questions rather than assertions. But we now stand answered: You do NOT claim "that pattern associaters cannot learn the past tense rule, or anything else, in principle." [Oddly enough, I do: if by "pattern associaters" you mean (as you mostly seem to mean) 2-layer perceptron-style nets like the R & M model, then I would claim that they cannot learn the kinds of things Minsky showed they couldn't learn, in principle. Whether or not more general nets (e.g., PDP models with hidden layers, back-prop, etc.) will turn out to have corresponding higher-order limitations seems to be an open question at this point.] You go on to quote my claim that: "the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample." and then you comment: >> This is an interesting claim and we strongly encourage you to back it >> up with argument and analysis; a real demonstration of its truth would >> be a significant advance. It's certainly false of the R-M and >> Egedi-Sproat models. There's a real danger in this kind of glib >> commentary of trivializing the issues by assuming that net models are >> a kind of miraculous wonder tissue that can do anything. I don't understand the logic of your challenge. You've disavowed having claimed that any of this was unlearnable in principle. Why is it glibber to conjecture that it's learnable in practice than that it's unlearnable in practice? From everything you've said, it certainly LOOKS perfectly learnable: Sample a lot of forms and discover that the default invariance turns out to work well in most cases (i.e., the "regulars"; the rest, the "irregulars," have their own local invariances, likewise inducible from statistical regularities in the data). This has nothing to do with a belief in wonder tissue. 
It was precisely in order to avoid irrelevant stereotypes like that that the first posting was prominently preceded by the disclaimer that I happen to be a sceptic about connectionism's actual accomplishments and an agnostic about its future potential. My critique was based solely on the logic of your argument against connectionism (in favor of symbolism). Based only on what you've written about its underlying regularities, past tense rule learning simply doesn't seem to pose a serious challenge for a statistical learner -- not in principle, at any rate. It seems to have stumped R & M 86 and E & S 88 in practice, but how many tries is that? It is possible, for example, as suggested by your valid analysis of the limitations of the Wickelfeature representation, that some of the requisite regularities are simply not reflected in this phonological representation, or that other learning (e.g. plurals) must complement past-tense data. This looks more like an entry-point problem (see (1) below), however, rather than a problem of principle for connectionist learning of past tense formation. After all, there's no serious underdetermination here; it's not like looking for a needle in a haystack, or NP-complete, or like that. I agree that R & M made rather inflated general claims on the basis of the limited success of R & M 86. But (to me, at any rate) the only potentially substantive issue here seems to be the one of principle (about the relative scope and limits of the symbolic vs. the connectionistic approach). Otherwise we're all just arguing about the scope and limits of R & M 86 (and perhaps now also E & S 88). Two sources of ambiguity seem to be keeping this disagreement unnecessarily vague: (1) There is an "entry-point" problem in comparing a toy model (e.g., R & M 86) with a lifesize cognitive capacity (e.g., the human ability to form past tenses): The capacity may not be modular; it may depend on other capacities. For example, as you point out in your article, other phonological and morphological data and regularities (e.g., pluralization) may contribute to successful past tense formation. Here again, the challenge is to come up with a PRINCIPLED limitation, for otherwise the connectionist can reasonably claim that there's no reason to doubt that those further regularities could have been netted exactly the same way (if they had been the target of the toy model); the entry point just happened to be arbitrarily downstream. I don't say this isn't hand-waving; but it can't be interestingly blocked by hand-waving in the opposite direction. (2) The second factor is the most critical one: learning. You put a lot of weight on the idea that if nets turn out to behave rulefully then this is a vindication of the symbolic approach. However, you make no distinction between rules that are built in (as "constraints," say) and rules that are learned. The endstate may be the same, but there's a world of difference in how it's reached -- and that may turn out to be one of the most important differences between the symbolic approach and connectionism: Not whether they use rules, but how they come by them -- by theft or honest toil. Typically, the symbolic approach builds them in, whereas the connectionistic one learns them from statistical regularities in its input data. This is why the learnability issue is so critical. 
(It is also what makes it legitimate for a connectionist to conjecture, as in (1) above, that if a task is nonmodular, and depends on other knowledge, then that other knowledge too could be acquired the same way: by learning.) >> Your claim about a 'statistical pattern learner...sensitive to higher >> order correlations' is essentially impossible to evaluate. There are in principle two ways to evaluate it, one empirical and open-ended, the other analytical and definitive. You can demonstrate that specific regularities can be learned from specific data by getting a specific learning model to do it (but its failure would only be evidence that that model fails for those data). The other way is to prove analytically that certain kinds of regularities are (or are not) learnable from certain kinds of data (in certain ways, I might add, because connectionism may be only one candidate class of statistical learning algorithms). Poverty-of-the-stimulus arguments attempt to demonstrate the latter (i.e., unlearnability in principle). >> We're mystified that you attribute to us the claim that "past >> tense formation is not learnable in principle."... No one in his right >> mind would claim that the English past tense rule is "built in". We >> spent a full seven pages (130-136) of 'OLC' presenting a simple model >> of how the past tense rule might be learned by a symbol manipulation >> device. So obviously we don't believe it can't be learned. Here are some extracts from OLC 130ff: "When a child hears an inflected verb in a single context, it is utterly ambiguous what morphological category the inflection is signalling... Pinker (1984) suggested that the child solves this problem by "sampling" from the space of possible hypotheses defined by combinations of an innate finite set of elements, maintaining these hypotheses in the provisional grammar, and testing them against future uses of that inflection, expunging a hypothesis if it is counterexemplified by a future word. Eventually... only correct ones will survive." [The text goes on to describe a mechanism in which hypothesis strength grows with success frequency and diminishes with failure frequency through trial and error.] "Any adequate rule-based theory will have to have a module that extracts multiple regularities at several levels of generality, assign them strengths related to their frequency of exemplification by input verbs, and let them compete in generating a past tense for for a given verb." It's not entirely clear from the description on pp. 130-136 (probably partly because of the finessed entry-point problem) whether (i) this is an innate parameter-setting or fine-tuning model, as it sounds, with the "learning" really just choosing among or tuning the built-in parameter settings, or whether (ii) there's genuine bottom-up learning going on here. If it's the former, then that's not what's usually meant by "learning." If it's the latter, then the strength-adjusting mechanism sounds equivalent to a net, one that could just as well have been implemented nonsymbolically. (You do state that your hypothetical module would be equivalent to R & M's in many respects, but it is not clear how this supports the symbolic approach.) 
[It's also unclear what to make of the point you add in your reply (again partly because of the entry-point problem): >>"(In the case of the past tense rule, there is a clear P-of-S argument for at least one aspect of the organization of the inflectional system...)">> Is this or is this not a claim that all or part of English past tense formation is not learnable (from the data available to the child) in principle? There seems to be some ambiguity (or perhaps ambivalence) here.] >> The only way we can make sense of this misattribution is to suppose >> that you equate "learnable" with "learnable by some (nth-order) >> statistical algorithm". The underlying presupposition is that >> statistical modeling (of an undefined character) has some kind of >> philosophical priority over other forms of analysis; so that if >> statistical modeling seems somehow possible-in-principle, then >> rule-based models (and the problems they solve) can be safely ignored. Yes, I equate learnability with an algorithm that can extract statistical regularities (possibly nth order) from input data. Connectionism seems to be (an interpretation of) a candidate class of such algorithms; so does multiple nonlinear regression. The question of "philosophical priority" is a deep one (on which I've written: "Induction, Evolution and Accountability," Ann. NY Acad. Sci. 280, 1976). Suffice it to say that induction has epistemological priority over innatism (or such a case can be made) and that a lot of induction (including hypothesis-strengthening by sampling instances) has a statistical character. It is not true that where statistical induction is possible, rule-based models must be ignored (especially if the rule-based models learn by what is equivalent to statistics anyway), only that the learning NEED not be implemented symbolically. But it is true that where a rule can be learned from regularities in the data, it need not be built in. [Ceterum sentio: there is an entry-point problem for symbols that I've also written about: "Categorical Perception," Cambr. U. Pr. 1987. I describe there a hybrid approach in in which symbolic and nonsymbolic representations, including a connectionistic component, are put together bottom-up in a principled way that avoids spuriously pitting connectionism against symbolism.] >> As a kind of corollary, you seem to assume that unless the input is so >> impoverished as to rule out all statistical modeling, rule theories >> are irrelevant; that rules are impossible without major stimulus-poverty. No, but I do think there's an entry-point problem. Symbolic rules can indeed be used to implement statistical learning, or even to preempt it, but they must first be grounded in nonsymbolic learning or in innate structures. Where there is learnability in principle, learning does have "philosophical (actually methodological) priority" over innateness. >> In our view, the question is not CAN some (ungiven) algorithm >> 'learn' it, but DO learners approach the data in that fashion. >> Poverty-of-the-stimulus considerations are one out of many >> sources of evidence in this issue... >> developmental data confirm that children do not behave the way such a >> pattern associator behaves. Poverty-of-the-stimulus arguments are the cornerstone of modern linguistics because, if they are valid, they entail that certain rules are unlearnable in principle (from the data available to the child) and hence that a learning model must fail for such cases. 
The rule system itself must accordingly be attributed to the brain, rather than just the general-purpose inductive wherewithal to learn the rules from experience. Where something IS learnable in principle, there is of course still a question as to whether it is indeed learned in practice rather than being innate; but neither (a) the absence of data on whether it is learned nor (b) the existence of a rule-based model that confers it on the child for free provide very strong empirical guidance in such a case. In any event, developmental performance data themselves seem far too impoverished to decide between rival theories at this stage. It seems advisable to devise theories that account for more lifesize chunks of our asymptotic (adult) performance capacity before trying to fine-tune them with developmental (or neural, or reaction-time, or brain-damage) tests or constraints. (Standard linguistic theory has in any case found it difficult to find either confirmation or refutation in developmental data to date.) By way of a concrete example, suppose we had two pairs of rival toy models, symbolic vs. connectionistic, one pair doing chess-playing and the other doing factorials. (By a "toy" model I mean one that models some arbitrary subset of our total cognitive capacity; all models to date, symbolic and connectionistic, are toy models in this sense.) The symbolic chess player and the connectionistic chess player both perform at the same level; so do the symbolic and connectionistic factorializer. It seems evident that so little is known about how people actually learn chess and factorials that "developmental" support would hardly be a sound basis for choosing between the respective pairs of models (particularly because of the entry-point problem, since these skills are unlikely to be acquired in isolation). A much more principled way would be to see how they scaled up from this toy skill to more and more lifesize chunks of cognitive capacity. (It has to be conceded, however, that the connectionist models would have a marginal lead in this race, because they would already be using the same basic [statistical learning] algorithm for both tasks, and for all future tasks, presumably, whereas the symbolic approach would have to be making its rules on the fly, an increasingly heavy load.) I am agnostic about who would win this race; connectionism may well turn out to be side-lined early because of a higher-order Perceptron-like limit on its rule-learning ability, or because of principled unlearnability handicaps. Who knows? But the race is on. And it seems obvious that it's far too early to use developmental (or neural) evidence to decide which way to bet. It's not even clear that it will remain a 2-man race for long -- or that a finish might not be more likely as a collaborative relay. (Nor is the one who finishes first or gets farthest guaranteed to be the "real" winner -- even WITH developmental and neural support. But that's just normal underdetermination.) >> if you simply wire up a network to do exactly what a rule does, by >> making every decision about how to build the net (which features to >> use, what its topology should be, etc.) by consulting the rule-based >> theory, then that's a clear sense in which the network "implements" >> the rule What if you don't WIRE it up but TRAIN it up? That's the case at issue here, not the one you describe. (I would of course agree that if nets wire in a rule as a built-in constraint, that's theft, not honest toil, but that's not the issue!) 
Stevan Harnad harnad at mind.princeton.edu From chrisley.pa at Xerox.COM Wed Aug 31 14:03:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 31 Aug 88 11:03 PDT Subject: The Four-Quadrant Problem In-Reply-To: alexis@marzipan.mitre.org (Alexis Wieland)'s message of Wed, 31 Aug 88 07:58:43 EDT Message-ID: <880831-112100-2321@Xerox> In the truly general case of an infinite plane, it seems that the four-quadrant problem cannot be solved by a (finite) two-layer network. But for any arbitrarily large, finite sub-plane, a two-layer (one hidden layer) network exists that can solve the four-quadrant problem, provided you use an inequality to interpret the output node. I think. Ron Chrisley Xerox PARC SSL Room 1620 3333 Coyote Hill Road Palo Alto, CA 94309 (415) 494-4740 From alexis at marzipan.mitre.org Wed Aug 31 07:58:43 1988 From: alexis at marzipan.mitre.org (Alexis Wieland) Date: Wed, 31 Aug 88 07:58:43 EDT Subject: The Four-Quadrant Problem Message-ID: <8808311158.AA00606@marzipan.mitre.org.> Let me try to remove some of the confusion I've caused. The four-quadrant problem is *my* name for an easily described problem which *requires* a neural net with three (or more) layers (e.g. 2+ hidden layers). The only relation of all this to the recent DARPA report is that they use an illustration of it in passing as an example of what a two layer net can do (which I assert it cannot). The four-quadrant problem is to use a 2-input/1-output network and, assuming that the inputs represent xy pts on a Cartesian plane, classify all the points in the first and third quadrant as being in one class and all the points in the second and fourth quadrant as being in the other class. For pragmatic reasons, you can allow a "don't care" region along each axis not to exceed a fixed width delta. This is illustrated below: A's are one class (i.e., one output (or range of outputs)), B's are the other class (i.e., another output (or non-overlapping range of outputs)), and *'s are don't cares. As always with this sort of problem, rotations and translations of the figure can be ignored.

AAAAAAAAA***BBBBBBBBB
AAAAAAAAA***BBBBBBBBB
AAAAAAAAA***BBBBBBBBB
AAAAAAAAA***BBBBBBBBB
AAAAAAAAA***BBBBBBBBB
*********************
*********************
BBBBBBBBB***AAAAAAAAA
BBBBBBBBB***AAAAAAAAA
BBBBBBBBB***AAAAAAAAA
BBBBBBBBB***AAAAAAAAA
BBBBBBBBB***AAAAAAAAA

Alexis Wieland alexis%yummy at gateway.mitre.org
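By way of illustration (this sketch is not from Wieland's posting; the grid range, spacing, and the value of the don't-care half-width DELTA are my own assumptions), a small C program in the style of the spiral generator that appears later in this digest can emit training data for the four-quadrant problem:

#include <stdio.h>
#include <math.h>

/* Sketch: training data for the four-quadrant problem described above.
 * Points with x*y > 0 (first and third quadrants) are labelled 1.0, points
 * with x*y < 0 (second and fourth quadrants) are labelled 0.0, and points
 * within a band of half-width DELTA around either axis are skipped as
 * "don't cares".  Grid range, spacing, and DELTA are illustrative
 * assumptions only; the output format copies the spiral generator's
 * ((x y) (class)) convention. */

#define DELTA 0.25

int main(void)
{
    int i, j;
    double x, y;

    for (i = -10; i <= 10; i++) {
        for (j = -10; j <= 10; j++) {
            x = 0.5 * i;
            y = 0.5 * j;
            if (fabs(x) < DELTA || fabs(y) < DELTA)
                continue;                 /* don't-care band along an axis */
            printf("((%8.5f %8.5f) (%3.1f))\n", x, y, (x * y > 0.0) ? 1.0 : 0.0);
        }
    }
    return 0;
}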
From watrous at linc.cis.upenn.edu Wed Aug 3 09:39:03 1988 From: watrous at linc.cis.upenn.edu (Raymond Watrous) Date: Wed, 3 Aug 88 09:39:03 EDT Subject: Complexity of Second Order Learning Algorithms Message-ID: <8808031339.AA28717@linc.cis.upenn.edu> It is generally assumed that second order learning algorithms are computationally too expensive for use on large problems, since the complexity is O(N**2), N being the number of links. It turns out that the function and gradient evaluations are O(NT), where T is the number of training samples. In order to have statistically adequate training data, T should approximate N, and is typically greater than N. Thus, the computational cost of the function and gradient evaluations exceeds that of the update algorithm. Moreover, since the ratio of function and gradient evaluations to Hessian updates is generally greater than two, the optimization process becomes dominated by function and gradient evaluations rather than by the update operation.
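As a rough arithmetic illustration of this argument (the example sizes and the constant factors below are my own assumptions, not figures from the report), the two per-iteration costs can simply be tallied and compared:

#include <stdio.h>

/* Rough cost tally for one iteration of a second-order method, following
 * the argument above: the Hessian/update step costs on the order of N*N
 * operations, while each function-plus-gradient evaluation over the
 * training set costs on the order of N*T, and there are typically at
 * least two such evaluations per update.  N and T are arbitrary example
 * sizes for illustration only. */
int main(void)
{
    long N = 1000;              /* number of links (weights), assumed       */
    long T = 2000;              /* number of training samples, typically > N */
    long evals_per_update = 2;  /* function/gradient evaluations per update  */

    long update_cost = N * N;                     /* O(N^2) update step      */
    long eval_cost   = evals_per_update * N * T;  /* O(NT) per evaluation    */

    printf("update step ~ %ld operations\n", update_cost);
    printf("evaluations ~ %ld operations\n", eval_cost);
    printf("evaluations dominate by a factor of ~ %.1f\n",
           (double) eval_cost / (double) update_cost);
    return 0;
}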
The complexity details and several examples are discussed in the technical report (revised): Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization MS-CIS-87-51 available from: James Lotkowski Technical Report Facility Room 269/Moore Building University of Pennsylvania 200 S 33rd Street Philadelphia, PA 19104-6389 james at upenn.cis.edu Ray Watrous From harnad at Princeton.EDU Thu Aug 4 01:46:35 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Thu, 4 Aug 88 01:46:35 edt Subject: Behav. Brain Sci. Call for Commentators: Motor Control Message-ID: <8808040546.AA07055@mind> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international journal of "open peer commentary" in the biobehavioral and cognitive sciences, published by Cambridge University Press. For information on how to serve as a commentator or to nominate qualified professionals in these fields as commentators, please send email to: harnad at mind.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] Strategies for the Control of Voluntary Movements with One Degree of Freedom Gerald L. Gottlieb (Physiology, Rush Medical Center), Daniel M. Corcos (Physical Education, U. Illinois, Chicago), Gyan C. Agarwal (Electr. Engineering & Computer Science, U. Illinois, Chicago) A theory is presented to explain how people's accurate single-joint movements are controlled. The theory applies to movements across different distances, with different inertial loads, toward targets of different widths over a wide range of experimentally manipulated velocities. The theory is based on three propositions: (1) Movements are planned according to "strategies," of which there are at least two: a speed-insensitive (SI) and a speed-sensitive (SS) strategy. (2) These strategies can be equated with sets of rules for performing diverse movement tasks. The choice between (SI) and (SS) depends on whether movement speed and/or movement time (and hence appropriate muscle forces) must be constrained to meet task requirements. (3) The electromyogram can be interpreted as a low-pass filtered version of the controlling signal to motoneuron pools. This controlling signal can be modelled as a rectangular excitation pulse in which modulation occurs in either pulse amplitude or pulse width. Movements with different distances and loads are controlled by the SI strategy, which modulates pulse width. Movements in which speed must be explicitly regulated are controlled by the SS strategy, which modulates pulse amplitude. The distinction between the two movement strategies reconciles many apparent conflicts in the motor control literature. 
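For readers who want a concrete picture of proposition (3), the following sketch (my own illustration, not part of the target article) passes a rectangular excitation pulse through a first-order low-pass filter; the amplitude and width parameters correspond to what the SS and SI strategies are said to modulate, and the filter time constant and step size are arbitrary assumptions:

#include <stdio.h>

/* Illustration of proposition (3): a rectangular controlling pulse,
 * characterized by an amplitude and a width, is low-pass filtered to give
 * an EMG-like envelope.  Amplitude, width, time constant, and time step
 * are illustrative assumptions only. */
int main(void)
{
    double amplitude = 1.0;   /* modulated by the speed-sensitive (SS) strategy   */
    double width     = 0.10;  /* seconds; modulated by the speed-insensitive (SI) strategy */
    double tau       = 0.03;  /* filter time constant in seconds (assumed)        */
    double dt        = 0.001; /* time step in seconds                             */
    double emg       = 0.0;   /* filter state (EMG-like output)                   */
    double t;

    for (t = 0.0; t < 0.3; t += dt) {
        double pulse = (t < width) ? amplitude : 0.0;  /* rectangular excitation pulse */
        emg += (dt / tau) * (pulse - emg);             /* first-order low-pass filter  */
        printf("%6.3f %8.5f %8.5f\n", t, pulse, emg);
    }
    return 0;
}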
From lab at cs.brandeis.edu Thu Aug 4 08:26:25 1988 From: lab at cs.brandeis.edu (Larry Bookman) Date: Thu, 4 Aug 88 08:26:25 edt Subject: Complexity of Second Order Learning Algorithms In-Reply-To: Raymond Watrous's message of Wed, 3 Aug 88 09:39:03 EDT <8808031339.AA28717@linc.cis.upenn.edu> Message-ID: Could you please send me a copy of MS-CIS-87-51: learning algorithms for connectionist networks: applied gradient methods of nonlinear optimization Thanks, Larry Bookman Brandeis University Computer Science Department Waltham, MA 02254 From watrous at linc.cis.upenn.edu Thu Aug 4 15:49:46 1988 From: watrous at linc.cis.upenn.edu (Raymond Watrous) Date: Thu, 4 Aug 88 15:49:46 EDT Subject: Clarification on Technical Report Message-ID: <8808041949.AA11301@linc.cis.upenn.edu> The recent posting regarding the technical report on the complexity of second order methods of gradient optimization should be amended as follows: 1. There is normally a reproduction charge for technical reports ordered from the University of Pennsylvania. This varies with the length of the report, and for MS-CIS-87-51 is $3.68. This charge has recently been waived for this technical report, compliments of the Computer Science Department. 2. The e-mail address for James Lotkowski should read: james at cis.upenn.edu 3. The revised report has now been renumbered, to distinguish it from its predecessor: MS-CIS-88-62 I apologize for the inconvenience due to these oversights. RW From alexis%yummy at gateway.mitre.org Thu Aug 4 11:23:00 1988 From: alexis%yummy at gateway.mitre.org (alexis%yummy@gateway.mitre.org) Date: Thu, 4 Aug 88 11:23:00 EDT Subject: A Harder Learning Problem Message-ID: <8808041523.AA01234@marzipan.mitre.org> There are many problems with the current standard "benchmark" tasks that are used with NNs, but one of them is that they're just too simple. It's hard to compare learning algorithms when the task the network has to perform is excessively easy. One of the tasks that we've been using at MITRE to test and compare our learning algorithms is to distinguish between two intertwined spirals. This task uses a net with 2 inputs and 1 output. The inputs correspond to points, and the net should output a 1 on one spiral and a 0 on the other. Each of the spirals contains 3 full revolutions. This task has some nice features: it's very non-linear, it's relatively difficult (our spiffed up learning algorithm requires ~15-20 million presentations = ~150-200 thousand epochs = ~1-2 days of cpu on a (loaded) Sun4/280 to learn, ...
we've never succeeded at getting vanilla bp to correctly converge), and because you have 2 in and 1 out you can *PLOT* the current transfer function of the entire network as it learns. I'd be interested in seeing other people try this or a related problem. Following this is a simple C program that we use to generate I/O data. Alexis P. Wieland wieland at mitre.arpa MITRE Corporation or 7525 Colshire Dr. alexis%yummy at gateway.mitre.org McLean, VA 22102

/=========================================================================/

#include <stdio.h>
#include <math.h>

/*************************************************************************
**
** mkspiral.c
**
** A program to generate input and output data for a neural network
** with 2 inputs and 1 output.
**
** If the 2 inputs are taken to represent an x-y position and the
** output (which is either 0.0 or 1.0) is taken to represent which of
** two classes the input point is in, then the data forms two coiled
** spirals. Each spiral forms 3 complete revolutions and contains
** 97 points (32 pts per revolution plus end points). Spiral 1 passes
** from (0, 6.5) -> (6, 0) -> (0, -5.5) -> (-5, 0) -> (0, 4.5) ->
** ... -> (0, 0.5). Likewise, Spiral 0 passes from (0, -6.5) ->
** (-6, 0) -> (0, 5.5) -> (5, 0) -> (0, -4.5) -> ... -> (0, -0.5).
**
** This program writes out data in ascii, one exemplar per line, in
** the form: ((x-pt y-pt) (class)).
**
** This data set was developed to test learning algorithms developed
** at the MITRE Corporation. The intention was to create a data set
** which would be non-trivial to learn. We at MITRE have never
** succeeded at learning this task with vanilla back-propagation.
**
** Any questions or comments (reports of success or failure with this
** task are as interesting as anything to us) contact:
**
** Alexis P. Wieland
** MITRE Corporation
** 7525 Colshire Dr.
** McLean, VA 22102
** (703) 883-7476
** wieland at mitre.ARPA
**
*************************************************************************/

main()
{
    int i;
    double x, y, angle, radius;

    /* write spiral of data */
    for (i=0; i<=96; i++) {
        angle = i * M_PI / 16.0;
        radius = 6.5 * (104 - i) / 104.0;
        x = radius * sin(angle);
        y = radius * cos(angle);
        printf("((%8.5f %8.5f) (%3.1f))\n",  x,  y, 1.0);
        printf("((%8.5f %8.5f) (%3.1f))\n", -x, -y, 0.0);
    }
}

From Mark.J.Zeren at mac.Dartmouth.EDU Fri Aug 5 17:38:13 1988 From: Mark.J.Zeren at mac.Dartmouth.EDU (Mark.J.Zeren@mac.Dartmouth.EDU) Date: 5 Aug 88 17:38:13 EDT Subject: Joining the list Message-ID: <11383@mac.dartmouth.edu> I am a student working on some neural net research at Dartmouth College under Jamshed Barucha. I would like to join/use the mailing list to get some feedback/insights to some of the ideas that I have been pursuing this summer. I would remain on the list only through the end of August, as I am leaving the country at the beginning of September. Mark Zeren mark.zeren at dartmouth.edu From Kevin.Lang at G.GP.CS.CMU.EDU Fri Aug 5 17:43:30 1988 From: Kevin.Lang at G.GP.CS.CMU.EDU (Kevin.Lang@G.GP.CS.CMU.EDU) Date: Fri, 5 Aug 88 17:43:30 EDT Subject: A Harder Learning Problem Message-ID: I tried standard back-propagation on the spiral problem, and found that it is a useful addition to the standard set of benchmark problems. It is as small and easily stated as the usual encoder and shifter problems, but many times harder. My network has a 2-5-5-5-1 structure, consisting of 2 input units, three hidden layers of 5 units each, and 1 output unit.
Each layer is connected to all of the other layers to provide quick pathways along which to propagate errors. This network contains 138 weights, which seems about right for a training set with 194 examples. The network was trained with parameters that were increased gradually from .001 to .002 for the learning rate parameter, and from .5 to .95 for the momentum parameter. A few brief excursions to .005 for the learning parameter caused derailments (the cosine of the angle between successive steps went negative). At CMU, we generally use target values of 0.2 and 0.8 in place of 0.0 and 1.0, in order to reduce the need for big weights. Assuming that errors occur when the output value for a case lies on the wrong side of 0.5, the network had the following error history as it was trained using the batch version of back-propagation (all cases presented between weight updates.) This run chewed up about 9 CPU minutes on our Convex.

epochs   errors
 2,000     75
 4,000     74
 6,000     64
 8,000     14   (big improvement here)
10,000      8
12,000      4
14,000      2
16,000      2   (struggling)
18,000      0

The average weight at this point is about 3.4. Since all of the output values lie on opposite sides of 0.5, it is a simple matter to grow the weights and separate the values further. For example, about 1,000 more epochs are required to pull the output values below 0.4 and above 0.6 for the two spirals. From Dave.Touretzky at B.GP.CS.CMU.EDU Fri Aug 5 20:56:49 1988 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Fri, 05 Aug 88 20:56:49 EDT Subject: NIPS needs a logo Message-ID: <11682.586832209@DST.BOLTZ.CS.CMU.EDU> ------- Blind-Carbon-Copy Reply-To: Dave.Touretzky at cs.cmu.edu cc: reyner at c.cs.cmu.edu Subject: NIPS needs a logo Date: Fri, 05 Aug 88 20:56:49 EDT Message-ID: <11682.586832209 at DST.BOLTZ.CS.CMU.EDU> From: Dave.Touretzky at DST.BOLTZ.CS.CMU.EDU We are seeking a distinctive logo for the IEEE NIPS (Neural Information Processing Systems) Conference. This conference is held each year in Denver. The 1988 conference is scheduled for November 28-December 1. The logo will be used on the abstract booklet and conference proceedings. In future years it will also be used on all conference stationery, calls for papers, and publicity releases. The logo should be a small and fairly simple design that expresses the scientific theme of the NIPS conference. We welcome any and all suggestions. Crude sketches are okay; an artist will refine the best half dozen ideas we receive, and the final decision will be made by the conference organizing committee. Submissions should be sent (hardcopy only) by September 1st to: Pamela Reyner Scott Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213-3890 All submissions become the property of the organizing committee and cannot be returned, so send a photocopy, not your original. It would also be a good idea to write your name and address on each submission so we don't lose track of who sent what. The submitter of the winning logo design will reap unimaginable fame and wealth, or at the very least a warm acknowledgement in the proceedings. Multiple submissions are welcome.
- -- Dave Touretzky Publications Chairman 1988 IEEE NIPS Conference ------- End of Blind-Carbon-Copy From panther!panther.UUCP!gjt at uxc.cso.uiuc.edu Fri Aug 5 18:49:14 1988 From: panther!panther.UUCP!gjt at uxc.cso.uiuc.edu (Gerry Tesauro) Date: Fri, 5 Aug 88 17:49:14 CDT Subject: Two tech reports Message-ID: <8808052249.AA21912@panther.ccsr.uiuc.edu> Two new Center for Complex Systems Research Tech. Reports are now available; the abstracts appear below. (A cautionary note: CCSR-88-6 describes an obsolete network, and is of no use to readers unfamiliar with backgammon.) Requests may be sent to: gjt%panther at uxc.cso.uiuc.edu or the US mail address which appears below. ------------------------ Neural Network Defeats Creator in Backgammon Match G. Tesauro Center for Complex Systems Research, University of Illinois at Urbana-Champaign, 508 S. 6th St., Champaign, IL 61820 USA Technical Report No. CCSR-88-6 This paper presents an annotated record of a 20-game match which I played against one of the networks discussed in ``A Parallel Network that Learns to Play Backgammon,'' by myself and Terry Sejnowski. (Tech. Report CCSR-88-2, and Artificial Intelligence, to appear.) This paper is specifically intended for backgammon enthusiasts who want to see exactly how the network plays. The surprising result of the match was that the network won, 11 games to 9. However, the network made several blunders during the course of the match, and was extremely lucky to have won. Nevertheless, in spite of the network's worst-case play, its average performance in typical positions is quite sharp, and is more challenging than conventional commercial programs. ------------------------ Asymptotic Convergence of Back-Propagation in Single-Layer Networks Gerald Tesauro and Yu He Center for Complex Systems Research University of Illinois at Urbana-Champaign 508 S. 6th St., Champaign, IL 61820 USA Technical Report No. CCSR-88-7 We calculate analytically the rate of convergence at long times in the back-propagation learning algorithm for networks without hidden units. For the standard quadratic error function and a sigmoidal transfer function, we find that the error decreases as 1/t for large t, and the output states approach their target values as 1/sqrt(t). It is possible to obtain a different convergence rate for certain error and transfer functions, but the convergence can never be faster than 1/t. These results also hold when a momentum term is added to the learning algorithm. Our calculation agrees with the numerical results of Ahmad and Tesauro. From PH706008%BROWNVM.BITNET at VMA.CC.CMU.EDU Sat Aug 6 15:36:41 1988 From: PH706008%BROWNVM.BITNET at VMA.CC.CMU.EDU (PH706008%BROWNVM.BITNET@VMA.CC.CMU.EDU) Date: Sat, 06 Aug 88 15:36:41 EDT Subject: Reply to Alexis Wieland Message-ID: In your recent communication concerning the spirals problem for neural networks, you mentioned that you would be interested in hearing from anyone working on a related problem. I have been investigating a similar, although somewhat less complicated problem, using a backward propagation network. The paradigm is a concentric circle problem in which the desired output on the inner disc is 0 and on the outer annulus is 1. The two input units are loaded with the x and y coordinates for each pattern as in your paradigm.
After learning the training patterns (randomly chosen patterns in the two regions: usually 50-150), the network generalizes nicely for previously unseen patterns; a plot of average output versus radius approaches that of a step function as more patterns are used. The paradigm was suggested by my dissertation advisor, Prof. Leon Cooper. I would be interested in hearing more details of your simulations. Charles M. Bachmann ("Chip") ph706008 at Brownvm Box 1843 Physics Dpt. & Ctr. for Neural Science Brown University Providence, R. I. 02912 From elman at amos.ling.ucsd.edu Mon Aug 8 16:23:47 1988 From: elman at amos.ling.ucsd.edu (Jeff Elman) Date: Mon, 8 Aug 88 13:23:47 PDT Subject: Technical Report announcement Message-ID: <8808082023.AA22615@amos.ling.ucsd.edu> The following abstract describes a paper which can be obtained from Hal White, Dept. of Economics D-008, Univ. of Calif., San Diego, La Jolla, CA 92093. Multi-layer feedforward networks are universal approximators by Kurt Hornik, Maxwell Stinchcombe, and Halbert White This paper rigorously establishes that standard multi-layer feedforward networks with as few as one hidden layer using arbitrary squashing functions (not necessarily continuous) at the hidden layer(s) are capable of approximating any Borel measurable function from one Euclidean space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. In this sense, multi-layer feedforward networks are a class of universal approximators. From pfeifer at ifi.unizh.ch Mon Aug 8 10:16:00 1988 From: pfeifer at ifi.unizh.ch (Rolf Pfeifer) Date: 8 Aug 88 16:16 +0200 Subject: Connectionist conference Message-ID: <742*pfeifer@ifi.unizh.ch> ***************************************************************************** SGAICO Conference ******************************************************************************* Program and Call for Presentation of Ongoing Work C O N N E C T I O N I S M I N P E R S P E C T I V E University of Zurich, Switzerland 10-13 October 1988 Tutorials: 10 October 1988 Technical Program: 11 - 12 October 1988 Workshops and Poster/demonstration session 13 October 1988 ****************************************************************************** Organization: - University of Zurich, Dept. of Computer Science - SGAICO (Swiss Group for Artificial Intelligence and Cognitive Science) - Gottlieb Duttweiler Institute (GDI) About the conference ____________________ Introduction: Connectionism has gained much attention in recent years as a paradigm for building models of intelligent systems in which interesting behavioral properties emerge from complex interactions of a large number of simple "neuron-like" elements. Such work is highly relevant to fields such as cognitive science, artificial intelligence, neurobiology, and computer science and to all disciplines where complex dynamical processes and principles of self-organization are studied. Connectionist models seem to be suited for solving many problems which have proved difficult in the past using traditional AI techniques. But to what extent do they really provide solutions? One major theme of the conference is to evaluate the import of connectionist models for the various disciplines. Another one is to see in what ways connectionism, being a young discipline in its present form, can benefit from the influx of concepts and research results from other disciplines.
The conference includes tutorials, workshops, a technical program and panel discussions with some of the leading researchers in the field. Tutorials: The goal of the tutorials is to introduce connectionism to people who are relatively new to the field. They will enable participants to follow the technical program and the panel discussions. Technical Program: There are many points of view on the study of intelligent systems. The conference will focus on the views from connectionism, artificial intelligence and cognitive science, neuroscience, and complex dynamics. Along another dimension there are several significant issues in the study of intelligent systems, some of which are "Knowledge representation and memory", "Perception, sequential processing, and action", "Learning", and "Problem solving and reasoning". Researchers from connectionism, cognitive science, artificial intelligence, etc. will take issue with the ways connectionism is approaching these various problem areas. This idea is reflected in the structure of the program. Panel Discussions: There will be panel discussions with experts in the field on specialized topics which are of particular interest to the application of connectionism. Workshops and Presentations of Ongoing Work: The last day of the conference is devoted to workshops with the purpose of identifying the major problems that currently exist within connectionism, to define future research agendas and collaborations, to provide a platform for the interdisciplinary exchange of information and experience, and to find a framework for practical applications. The workshop day will also feature presentation of ongoing work (see "Call for presentation of ongoing work").

*******************************************************************************

CALL FOR PRESENTATION OF ONGOING WORK

Presentations are invited on all areas of connectionist research. The focus is on current research issues, i.e. "work in progress" is of highest interest even if major problems remain to be resolved. Work of RESEARCH GROUPS OR LABORATORIES is particularly welcome. Presentations can be in the form of poster, or demonstration of prototypes. The goal is to encourage cooperation and the exchange of ideas between different research groups. Please submit an extended abstract (1-2 pages).

Deadline for submissions: September 2, 1988
Notification of acceptance: September 20, 1988

Contact: Zoltan Schreter, Computer Science Department, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
Phone: (41) 1 257 43 07/11
Fax: (41) 1 257 40 04
or send mail to pfeifer at ifi.unizh.ch

*******************************************************************************

Tutorials MONDAY, October 10, 1988 ___________________________________________________________________________ 08.30 Tutorial 1: Introduction to neural nets. F. Fogelman - Adaptive systems: Perceptrons (Rosenblatt) and Adalines (Widrow & Hoff) - Associative memories: linear model (Kohonen), Hopfield networks, Brain state in a box model (BSB; Anderson) - Link to other disciplines 09.30 Coffee 10.00 Tutorial 2: Self-organizing Topological maps. T. Kohonen - Theory - Application: Speech-recognizing systems - Tuning of maps for optimal recognition accuracy (learning vector quantization) 11:30 Tutorial 3: Multi-layer neural networks. Y.
Le Cun - Elementary learning mechanisms (LMS and Perceptron) and their limitations - Easy and hard learning - Learning in multi-layer networks: The back-propagation algorithm (and its variations) - Multi-layer networks: - as associative memories - for pattern recognition (a case study) - Network design techniques; simulators and software tools 13.00 Lunch 14.00 Tutorial 4: Parallel Distributed Processing of symbolic structure. P. Smolensky Can Connectionism deal with the kind of complex highly structured information characteristic of most AI domains? This tutorial presents recent research suggesting that the answer is yes. 15.30 Coffee 16.00 Tutorial 5: Connectionist modeling and simulation in neuroscience and psychology. R. Granger Biological networks are composed of neurons with a range of biophysical and physiological properties that give rise to complex learning and performance rules embedded in anatomical architectures with complex connectivity. Given this complexity it is of interest to identify which of the characteristics of brain networks are central and which are less salient with respect to behavioral function. "Bottom-up" biological modeling attempts to identify the crucial learning and performance rules and their appropriate level of abstraction. 17.30 End of tutorial sessions _______________________________________________________________________________ Technical Program TUESDAY, October 11, 1988 ___________________________________________________________________________ Introduction 09:00 Connectionism: Is it a new paradigm? M. Boden 09:45 Discussion 10:00 Coffee 1. Knowledge Representation & Memory. Chair: F. Fogelman The perspective of: 10:30 - Connectionism. P. Smolensky Dealing with structure in Connectionism 11:15 - AI/ N.N. Cognitive Science 12:00 - Neuroscience/ C. v. der Malsburg Connectionism A neural architecture for the representation of structured objects 12:45 Lunch 2. Perception, Sequential Processing & Action. Chair: T. Kohonen The perspective of: 14:30 - Connectionism M. Kuperstein Adaptive sensory-motor coordination using neural networks 15:15 - Connectionism/ M. Imbert Neuroscience and Connectionism: Neuroscience The case of orientation coding. 16:00 Coffee 16:30 - AI/ J. Bridle Connectionist approaches to Connectionism artificial perception: A speech pattern processing approach 17:15 - Neuroscience G. Reeke Synthetic neural modeling: A new approach to Brain Theory 18:00 Intermission/snack 18.30 - 20.00 panel discussion/workshop on Expert Systems and Connectionism. Chair: S. Ahuja D. Bounds D. Reilly Y. Le Cun R. Serra ___________________________________________________________________________ WEDNESDAY, October 12, 1988 ___________________________________________________________________________ 3. Learning. Chair: R. Serra The perspective of: 9:00 - Connectionism Y. Le Cun Generalization and network design strategies 9:45 - AI Y. Kodratoff Science of explanations versus science of numbers 10:30 Coffee 11:00 - Complex Dynamics/ Genetic Algorithms H. Muehlenbein Genetic algorithms and parallel computers 11:45 - Neuroscience G. Lynch Behavioral effects of learning rules for long-term potentiation 12:30 Lunch 4. Problem Solving & Reasoning. Chair: R. Pfeifer The perspective of: 14:00 - AI/ B. Huberman Dynamical perspectives on Complex Dynamics problem solving and reasoning 14:45 - Complex Dynamics L. Steels The Complex Dynamics of common sense 15:30 Coffee 16:00 - Connectionism J. 
Hendler Problem solving and reasoning: A Connectionist perspective 16:45 - AI P. Rosenbloom A cognitive-levels perspective on the role of Connectionism in symbolic goal-oriented behavior 17:30 Intermission/snack 18:00 - 19:30 panel discussion/workshop on Implementation Issues & Industrial Applications. Chair: P. Treleaven B. Angeniol G. Lynch G. Dreyfus C. Wellekens __________________________________________________________________________ Workshops and presentation of ongoing work THURSDAY, October 13, 1988 ___________________________________________________________________________ 9:00-16:00 Workshops in partially parallel sessions. There will be a separate poster/demonstration session for the presentation of ongoing work. The detailed program will be based on the submitted work and will be available at the beginning of the conference. The workshops: 1. Knowledge Representation & Memory Chair: F. Fogelman 2. Perception, Sequential Processing & Action Chair: F. Gardin 3. Learning Chair: R. Serra 4. Problem Solving & Reasoning Chair: R. Pfeifer 5. Evolutionary Modelling Chair: L. Steels 6. Neuro-Informatics in Switzerland: Theoretical and technical neurosciences Chair: K. Hepp 7. European Initiatives Chair: N.N. 8. Other 16:10 Summing up: R. Pfeifer 16:30 End of the conference ___________________________________________________________________________ Program as of June 29, 1988, subject to minor changes ___________________________________________________________________________ THE SMALL PRINT Organizers Computer Science Department, University of Zurich Swiss Group for Artificial Intelligence and Cognitive Science (SGAICO) Gottlieb Duttweiler Institute (GDI) Location University of Zurich-Irchel Winterthurerstrasse 190 CH-8057 Zurich, Switzerland Administration Gabi Vogl Phone: (41) 1 257 43 21 Fax: (41) 1 257 40 04 Information Rolf Pfeifer Zoltan Schreter Computer Science Department, University of Zurich Winterthurerstrasse 190, CH-8057 Zurich Phone: (41) 1 257 43 23 / 43 07 Fax: (41) 1 257 40 04 Sanjeev B. Ahuja, Rentenanstalt (Swiss Life) General Guisan-Quai 40, CH-8022 Zurich Phone: (41) 1 206 40 61 / 33 11 Thomas Bernold, Gottlieb Duttweiler Institute, CH-8803 Ruschlikon Phone: (41) 1 461 37 16 Fax: (41) 1 461 37 39 Participation fees Conference 11-13 October 1988: Regular SFr. 350.-- ECCAI/SGAICO/ SI/SVI-members SFr. 250.-- Full time students SFr. 100.-- Tutorials 10 October 1988: Regular SFr. 200.-- ECCAI/SGAICO/ SI/SVI-members SFr. 120.-- Full time students SFr. 50.-- For graduate students / assistants a limited number of reduced fees are available. Documentation and refreshments are included. Please remit the fee only upon receipt of invoice by the Computer Science Department. Language The language of the conference is English. Cancellations If a registration is cancelled, there will be a cancellation charge of SFr. 50.-- after 1st October 1988, unless you name a replacement. Hotel booking Hotel booking will be handled separately. Please indicate on your registration form whether you would like information on hotel reservations. Proceedings Proceedings of the conference will be published in book form. They will become available in early 1989. From yann at ai.toronto.edu Tue Aug 9 02:13:46 1988 From: yann at ai.toronto.edu (Yann le Cun) Date: Tue, 9 Aug 88 02:13:46 EDT Subject: Technical Report announcement In-Reply-To: Your message of Mon, 08 Aug 88 16:23:47 -0400. 
Message-ID: <88Aug8.233349edt.386@neat.ai.toronto.edu> Jeff Elman writes: > This paper rigorously establishes that standard multi-layer > feedforward networks with as few as one hidden layer using > arbitrary squashing functions (not necessarily continuous) > at the hidden layer(s) are capable of approximating any > Borel measurable function from one Euclidean space to > another to any desired degree of accuracy, provided suffi- > ciently many hidden units are available. In this sense, > multi-layer feedforward networks are a class of universal > approximators. I showed the same kind of result in my thesis (although probably not as rigorously). The problem is: if you use monotonic squashing functions, then you need one more layer (i.e two hidden layers). reference: Yann le Cun: "modeles connexionnistes de l'apprentissage" (connectionist learning models), These de Doctorat, Universite Pierre et Marie Curie (Paris 6), June 1987, Paris, France. - Yann From chrisley.pa at Xerox.COM Mon Aug 8 22:38:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 8 Aug 88 19:38 PDT Subject: A Harder Learning Problem In-Reply-To: alexis%yummy@gateway.mitre.org's message of Thu, 4 Aug 88 11:23:00 EDT Message-ID: <880808-194211-4680@Xerox> This is in response to the recent comments by Alexis Wieland, Charles Bachmann, and the tech report by Scott Fahlman which was announced on this mailing list. I agree that a more careful selection of benchmarking tasks is required. Specifically, there has been little effort spent on comparing networks on the kinds of tasks that many are advocating as one of the fortes of the neural network approach: pattern recognition in natural signals (eg, speech). The key characteristic of patterns in natural signals is that they are statistical: a sample is often a member of more than one class. Thus, one does not talk of zero error, but minimal error. The reason why explicitly statistical tasks should be used in benchmarking is that the pattern recognition properties of models vary noticeably when moving from the deterministic to the statistical case. For an example of statistical benchmarking of Backprop, Boltzmann machines, and Learning Vector Quantization, see Kohonen, Barna and Chrisley, '88, in the proceedings of this year's ICNN. Also see Huang and Lippmann, '87a and b (ICNN and NIPS). For example, a typical two category task might have category A as a Gaussian distribution cetered around the origin with a variance of 2, while category B might be a Gaussian that is offset in the first dimension by some amount, and with a variance of 1. This requires non-linear decision boundaries for optimal (Bayesian) performance, and the optimal performance may be calculated analytically (good performance = low misclassification rate). This is one of the tasks discussed in our paper, above. BTW, we found that LVQ was better than BP, especially in high dimensional and difficult tasks, while the BM had almost optimal performance, although it required inordinate amounts of computing time. Ron Chrisley Xerox PARC SSL 3333 Coyote Hill Road Palo Alto, CA 94304 From huyser at mojave.Stanford.EDU Wed Aug 10 13:34:31 1988 From: huyser at mojave.Stanford.EDU (Karen Huyser) Date: Wed, 10 Aug 88 10:34:31 PDT Subject: Roommates wanted Message-ID: <8808101734.AA19443@mojave.Stanford.EDU> Make a new friend!! Live with a stranger for five days and hope you never see them again :-) !! BE A ROOMMATE!! 
Vip Tolat and I, both Stanford students, will be giving papers at the INNS Conference in Boston next month, and we are too poor to spend five or six nights at the Park Plaza at $100/night. If anyone on the net would like to save half the hotel fee by teaming up with one of us, we'd sure appreciate it. Roommate requests should be sent to huyser at sonoma.stanford.edu or huyser at mojave.stanford.edu. If there are a lot of requests, I will collect a list and distribute it to the people who are on it. Karen Huyser Vip Tolat From jfeldman%icsia7.Berkeley.EDU at BERKELEY.EDU Wed Aug 10 16:42:27 1988 From: jfeldman%icsia7.Berkeley.EDU at BERKELEY.EDU (Jerry Feldman) Date: Wed, 10 Aug 88 13:42:27 PDT Subject: Benchmark Message-ID: <8808102042.AA12887@icsia7.Berkeley.EDU> I suggest the connectionist learning of Finite State Automata (FSA) as an interesting benchmark. For example, people compute the parity of a long binary string by an algorithm equivalent to a 2-state FSA. Sara Porat and I have a constructive proof that FSA learning can be done from a complete, lexicographically ordered sample so that might be a reasonable subcase to consider. It is known that the general case is NP complete, i.e. very hard. I dont much like the way our network functions and most of you would hate it, so I don't suggest starting with our paper. Should anyone care, I could expound on why the FSA learning problem is an important test of learning models. From Scott.Fahlman at B.GP.CS.CMU.EDU Wed Aug 10 20:11:45 1988 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Wed, 10 Aug 88 20:11:45 EDT Subject: Benchmark In-Reply-To: Your message of Wed, 10 Aug 88 13:42:27 -0700. <8808102042.AA12887@icsia7.Berkeley.EDU> Message-ID: Jerry, Could you specify some particular FSA problem that we could all take a crack at? Ideally, people proposing learning-speed (or learning at all) benchmarks should specify the problem in enough detail that the results of different learning approaches can be compared. If you can provide one set of results as a starting point, that's better still. I'm not sure if you have in mind a set of problems that can only be attacked by nets with directed loops, or whether you are directly training the combinational logic network that combines input with previous state bits to get new state bits. In other words, are you (the trainer) telling the network what states the memory bits are to assume or is that part of what must be learned? -- Scott From mjolsness-eric at YALE.ARPA Wed Aug 10 22:53:17 1988 From: mjolsness-eric at YALE.ARPA (Eric Mjolsness) Date: Wed, 10 Aug 88 22:53:17 EDT Subject: Tech report available Message-ID: <8808110252.AA26050@NEBULA.SUN3.CS.YALE.EDU> The following technical report is now available. ------------------------------------------------------------------------------ Optimization in Model Matching and Perceptual Organization: A First Look Eric Mjolsness, Gene Gindi, and P. Anandan (YALEU/DCS/RR-634) Abstract We introduce an optimization approach for solving problems in computer vision that involve multiple levels of abstraction. Specifically, our objective functions can include compositional hierarchies involving object-part relationships and specialization hierarchies involving object-class relationships. 
The large class of vision problems that can be subsumed by this method includes traditional model matching, perceptual grouping, dense field computation (regularization), and even early feature detection which is often formulated as a simple filtering operation. Our approach involves casting a variety of vision problems as inexact graph matching problems, formulating graph matching in terms of constrained optimization, and using analog neural networks to perform the constrained optimization. We will show the application of this approach to shape recognition in a domain of stick-figures and to the perceptual grouping of line segments into long lines. ------------------------------------------------------------------------------ available from: connolly-eileen at yale.cs.edu alternatively: connolly-eileen at yale.arpa or write to: Eileen Connolly Yale Computer Science Dept 51 Prospect Street P.O. Box 2158 Yale Station New Haven CT 06520 Please include a physical address with your request. ------- From neuron at ei.ecn.purdue.edu Thu Aug 11 12:37:46 1988 From: neuron at ei.ecn.purdue.edu (Manoel Fernando Tenorio) Date: Thu, 11 Aug 88 11:37:46 EST Subject: Tech report available In-Reply-To: Your message of Wed, 10 Aug 88 22:53:17 EDT. <8808110252.AA26050@NEBULA.SUN3.CS.YALE.EDU> Message-ID: <8808111637.AA11417@ei.ecn.purdue.edu> Could you plese send a copy of this report. I was unable to reach the other email address... M. F. Tenorio School of ELectrical Engineering Purdue University W> Lafayette, IN 47907 From chrisley.pa at Xerox.COM Thu Aug 11 16:48:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 11 Aug 88 13:48 PDT Subject: Benchmark In-Reply-To: Scott.Fahlman@B.GP.CS.CMU.EDU's message of Wed, 10 Aug 88 20:11:45 EDT Message-ID: <880811-135056-3134@Xerox> By the way, Scott Fahlman's Aug 10th comment reminds me of another important distinction in benchmarking: learning rates vs. performance. Since the bulk of the tasks that have been used in benchmarking are deterministic (i.e., non-statistical), the performance comparison has been less interesting: any network worth anything should achieve about the same performance, which is often 100%. Since, in the statistical case, 0% error is generally impossible, the performance of the networks becomes a much more interesting issue. And in many applications, the learning time is off-line, and therefore an irrelevant way to judge the system. A good example where this might not be the case is, ironically, in our own application of speech recognition. Since none of the networks yet developed are truly speaker independent, there must always be some re-calibration when you want real-time, speaker independent recognition. Thus, learning rates are inmportant as well as performance. Prof. Kohonen has it down to 10 minutes for a new (Finnish) speaker, but that is not good enough for many applications. -- Ron From terry at cs.jhu.edu Thu Aug 11 17:41:04 1988 From: terry at cs.jhu.edu (Terry Sejnowski ) Date: Thu, 11 Aug 88 17:41:04 edt Subject: Neural Computation Message-ID: <8808112141.AA28848@crabcake.cs.jhu.edu> Announcement and Call for Papers NEURAL COMPUTATION First Issue: Spring 1989 Editor-in-Chief Terrence Sejnowski The Salk Institute and The University of California at San Diego Neural Computation will provide a unique interdisciplinary forum for the dissemination of important research results and for reviews of research areas in neural computation. 
Neural computation is a rapidly growing field that is attracting researchers in neuroscience, psychology, physics, mathematics, electrical engineering, computer science, and artificial intelligence. Researchers within these disciplines address, from special perspectives, the twin scientific and engineering challenges of understanding the brain and building computers. The journal serves to bring together work from various application areas, highlighting common problems and techniques in modeling the brain and in the design and construction of neurally-inspired information processing systems. By publishing timely short communications and research reviews, Neural Computation will allow researchers easy access to information on important advances and will provide a valuable overview of the broad range of work contributing to neural computation. The journal will not accept long research articles. The fields covered include neuroscience, computer science, artificial intelligence, mathematics, physics, psychology, linguistics, adaptive systems, vision, speech, robotics, optical computing, and VLSI. Neural Computation is published quarterly by The MIT Press. Board of Editors Editor-in-Chief: Terrence Sejnowski, The Salk Institute and The University of California at San Diego Advisory Board: Shun-ichi Amari, University of Tokyo, Japan Michael Arbib, University of Southern California Jean-Pierre Changeux, Institut Pasteur, France Leon Cooper, Brown University Jack Cowan, University of Chicago Jerome Feldman, University of Rochester Teuovo Kohonen, University of Helsinki, Finland Carver Mead, California Institute of Technology Tomaso Poggio, Massachusetts Institute of Technology Wilfrid Rall, National Institutes of Health Werner Reichardt, Max-Planck-Institut fur Biologische Kybernetik David A. 
Robinson, Johns Hopkins University David Rumelhart, Stanford University Bernard Widrow, Stanford University Action Editors: Joshua Alspector, Bell Communications Research Richard Andersen, MIT James Anderson, Brown University Dana Ballard, University of Rochester Harry Barrow, University of Sussex Andrew Barto, University of Massachusetts Gail Carpenter, Northeastern University Gary Dell, University of Rochester Gerard Dreyfus, Paris, France Jeffrey Elman, University of California at San Diego Nabil Farhat, University of Pennsylvania Francois Fogelman-Soulie, Paris, France Peter Getting, University of Iowa Ellen Hildreth, Massachusetts Institute of Technology Geoffrey Hinton, University of Toronto, Canada Bernardo Huberman, Xerox, Palo Alto Lawrence Jackel, AT&T Bell Laboratories Scott Kirkpatrick, IBM Yorktown Heights Christof Koch, California Institute of Technology Richard Lippmann, Lincoln Laboratories Stephen Lisberger, University of California San Francisco James McClelland, Carnegie-Mellon University Graeme Mitchison, Cambridge University, England David Mumford, Harvard University Erkki Oja, Kuopio, Finland Andras Pellionisz, New York University Demetri Psaltis, California Institute of Technology Idan Segev, The Hebrew University Gordon Shepherd, Yale University Vincent Torre, Universita di Genova, Italy David Touretzky, Carnegie-Mellon University Roger Traub, IBM Yorktown Heights Les Valiant, Harvard University Christoph von der Malsburg, University of Southern California David Willshaw, Edinburgh, Scotland John Wyatt, Massachusetts Institute of Technology Steven Zucker, McGill University, Canada Instructions to Authors The journal will consider short communications, having no more than 2000 words of text, 4 figures, and 10 citations; and area reviews which summarize significant advances in a broad area of research, with up to 5000 words of text, 8 figures, and 100 citations. The journal will accept one-page summaries for proposed reviews to be considered for solicitation. All papers should be submitted to the editor-in-chief. Authors may recommend one or more of the action editors. Accepted papers will appear with the name of the action aditor that communicated the paper. Before January 1, 1989, please address submissions to: Dr. Terrence Sejnowski Biophysics Department Johns Hopkins University Baltimore, MD 21218 After January 1, 1989, please address submissions to: Dr. Terrence Sejnowski The Salk Institute P.O. Box 85800 San Diego, CA 92138 Subscription Information Neural Computation Annual subscription price (four issues): $90.00 institution $45.00 individual (add $9.00 surface mail or $17.00 airmail postage outside U.S. 
and Canada) Available from: MIT Press Journals 55 Hayward Street Cambridge, MA 02142 USA 617-253-2889 From pauls at boulder.Colorado.EDU Sun Aug 14 13:55:21 1988 From: pauls at boulder.Colorado.EDU (Paul Smolensky) Date: Sun, 14 Aug 88 11:55:21 MDT Subject: TRs available Message-ID: <8808141755.AA05168@sigi.colorado.edu> Three technical reports are available; please direct requests via e-mail to kate at boulder.colorado.edu or via regular mail to: Paul Smolensky Department of Computer Science University of Colorado Boulder, CO 80309-0430 Thanks -- paul ----------------------------------------------------------------- Analyzing a connectionist model as a system of soft rules Clayton McMillan & Paul Smolensky CU-CS-393-88 March, 1988 In this paper we reexamine the knowledge in the Rumelhart and McClelland (1986) connectionist model of the acquisition of the English past tense. We show that their original connection ma- trix is approximately equivalent to one that can be explicitly decomposed into what we call soft rule matrices. Each soft rule matrix encodes the knowledge of how to handle the verbs in one of the verb classes determined for this task by Bybee & Slobin (1982). This demonstrates one approximate but explicit sense in which it is reasonable to speak of the weights in connectionist networks encoding higher-level rules or schemas that operate in parallel. Our results also suggest that it may be feasible to understand the knowledge in connectionist networks at a level in- termediate between the microscopic level of individual connec- tions and the monolithic level of the entire connection matrix. To appear in the Proceedings of the Tenth Meeting of the Cognitive Science Society ----------------------------------------------------------------- The constituent structure of connectionist mental states: A reply to Fodor and Pylyshyn Paul Smolensky CU-CS-394-88 March, 1988 The primary purpose of this article is to reply to the central point of Fodor and Pylyshyn's (1988) critique of connectionism. The direct reply to their critique comprises Section 2 of this paper. I argue that Fodor and Pylyshyn are simply mistaken in their claim that connectionist mental states lack the necessary constituent structure, and that the basis of this mistake is a failure to appreciate the significance of distributed representa- tions in connectionist models. Section 3 is a broader response to the bottom line of their critique, which is that connection- ists should re-orient their work towards implementation of the classical symbolic cognitive architecture. I argue instead that connectionist research should develop new formalizations of the fundamental computational notions that have been given one par- ticular formal shape in the traditional symbolic paradigm. My response to Fodor and Pylyshyn's critique presumes a certain meta-theoretical context that is laid out in Section 1. In this first section I argue that any discussion of the choice of some framework for cognitive modeling (e.g. the connectionist frame- work) must admit that such a choice embodies a response to a fun- damental cognitive paradox, and that this response shapes the en- tire scientific enterprise surrounding research within that framework. Fodor and Pylyshyn are implicitly advocating one class of response to the paradox over another, their critique is analyzed in this light. 
In the Southern Journal of Philosophy, special issue on Connectionism and the Foundations of Cognitive Science ---------------------------------------------------------------- Application of the Interactive Activation Model to Document Retrieval Jonathan Bein & Paul Smolensky CU-CS-405-88 May 1988 In this paper we consider an application of the Interactive Ac- tivation Model [McClelland 82] to the problem of document re- trieval. The issues in this application center around a neural net or "connectionist" model called inductive information Re- trieval set forth in [Mozer 84]. The paper provides empirical results on the robustness of this model using a real-world docu- ment database consisting of 13,000 documents. To appear in the Proceedings of NeuroNimes: Neural Networks and their Applications From panther!panther.UUCP!gjt at uxc.cso.uiuc.edu Mon Aug 15 18:59:41 1988 From: panther!panther.UUCP!gjt at uxc.cso.uiuc.edu (Gerry Tesauro) Date: Mon, 15 Aug 88 17:59:41 CDT Subject: New address Message-ID: <8808152259.AA00453@panther.ccsr.uiuc.edu> Effective tomorrow, Aug. 16, I will no longer be at CCSR. I am moving to IBM Watson in New York. For those of you who wish to request copies of CCSR Technical Reports CCSR-88-1, -2, -6 and -7, please send your requests to jean%panther at uxc.cso.uiuc.edu Requests for my other publications should be sent to me by postcard at IBM (I do not have an e-mail address yet). The address is: Dr. Gerald Tesauro IBM Watson Labs. P. O. Box 704 Yorktown Heights, NY 10598 (Tel: 914-789-7863) Thanks, -Gerry ------ From hinton at ai.toronto.edu Mon Aug 15 20:06:26 1988 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Mon, 15 Aug 88 20:06:26 EDT Subject: Benchmark In-Reply-To: Your message of Wed, 10 Aug 88 16:42:27 -0400. Message-ID: <88Aug15.172644edt.284@neat.ai.toronto.edu> I am interested in why FSA learning should be a benchmark task. Please expound. Geoff From jfeldman%icsia7.Berkeley.EDU at BERKELEY.EDU Tue Aug 16 16:40:56 1988 From: jfeldman%icsia7.Berkeley.EDU at BERKELEY.EDU (Jerry Feldman) Date: Tue, 16 Aug 88 13:40:56 PDT Subject: Benchmark Message-ID: <8808162040.AA17000@icsia7.Berkeley.EDU> Geoff, I thought you'd never ask. Finite State Automata (FSA) are the most primitive infitary systems. Almost all connectionist learning has involved only input/output maps, a very restricted form of computation. Tasks of psychological or engineering interest involve multiple step calculcations and FSA are the simplest of these. There is also lots of literature on FSA and the learning thereof and the generalization issue is clear. The benchmark task is to exhibit an initial network and learning rule that will converge(approximately, if you'd like) to a minimal FSA from ANY large sample generated by an FSA. There are several encodings possible including one-unit/one-state, Jordan's state vector or even the state of the whole network. As I said in response to Fahlman, a solution in any form would be fine. Jerry From hi.pittman at MCC.COM Wed Aug 17 16:40:00 1988 From: hi.pittman at MCC.COM (James Arthur Pittman) Date: Wed, 17 Aug 88 15:40 CDT Subject: Infitary systems In-Reply-To: <12423192061.47.DOUTHAT@A.ISI.EDU> Message-ID: <19880817204001.6.PITTMAN@DIMEBOX.ACA.MCC.COM> Douthat asks: BTW: what is an "infitary" system? Perhaps it is an infinitely infantile military system? From chrisley.pa at Xerox.COM Thu Aug 18 19:22:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 18 Aug 88 16:22 PDT Subject: Has anyone heard of ALOPEX? 
Message-ID: <880818-162715-7616@Xerox> I would appreciate it if anyone could give me information about ALOPEX (Greek for "fox"), a neural net program released by a publisher of the same name. A friend of mine would like details, including information such as when it was released, etc. Thanks. Ron Chrisley Xerox PARC SSL Room 1620 3333 Coyote Hill Road Palo Alto, CA 94309 (415) 494-4740 From Mark.J.Zeren at mac.Dartmouth.EDU Tue Aug 23 17:32:19 1988 From: Mark.J.Zeren at mac.Dartmouth.EDU (Mark.J.Zeren@mac.Dartmouth.EDU) Date: 23 Aug 88 17:32:19 EDT Subject: Japanese Connectionists? Message-ID: <17114@mac.dartmouth.edu> I am an undergraduate at Dartmouth College and have become very interested in connectionist research. I am spending the next year (Sept '88 - Sept '89) in Japan. I will be studying language through May with the University of Illinois program at Konan University outside of Kobe. If possible, I would like to pursue my interest in neural nets while in Japan, particularly next summer, thus getting "two birds with one stone." Any information about the connectionist community in Japan would be of use to me. Mark Zeren mark.zeren at dartmouth.edu From kawahara at av-convex.ntt.jp Tue Aug 23 19:48:30 1988 From: kawahara at av-convex.ntt.jp (Hideki KAWAHARA) Date: Wed, 24 Aug 88 08:48:30+0900 Subject: Japanese Connectionists? Message-ID: <8808232348.AA02761@av-convex.NTT.jp> Welcome to Japan, Mr. Zeren. Connectionism, or so-called neuro-computing, has been growing very rapidly in Japan since last year. Almost every technical, psychological, biological, etc. society has featured neural-net sessions in its annual meeting. I have just returned from a private meeting which, I think, is the most important interdisciplinary meeting for researchers in this field. It is the "Neural Information Science Workshop" initiated by Dr. Fukushima of NHK labs. It has been an annual meeting for over 10 years. I'll report on it later. You can contact many researchers via the CSNET-JUNET link. I think about 100 organizations read connectionists mail regularly. It will be convenient for you to contact the ATR labs, which are located near Kobe, where you will stay. Hideki Kawahara kawahara%nttlab.ntt.JP at RELAY.CS.NET (from ARPA site) NTT Basic Research Labs. From alexis%yummy at gateway.mitre.org Wed Aug 24 12:04:17 1988 From: alexis%yummy at gateway.mitre.org (alexis%yummy@gateway.mitre.org) Date: Wed, 24 Aug 88 12:04:17 EDT Subject: Mistake in DARPA NN Report Message-ID: <8808241604.AA02507@marzipan.mitre.org> I just read the executive summary of the "DARPA Neural Network Study" by MIT/Lincoln Labs which is really quite good (I would have preferred less emphasis on computing power and more on, say, learning, but ...). Unfortunately they repeat a mistake in the intro about the abilities of feed-forward networks. In Figure 4-4 and the supporting text on p. 15 they state that a net with 2 in and 1 out can partition the 2D input space as such:

One-Layer ----- Two-Layer ----- Three-Layer

    ############  #######:::::::  ####::::::::::  ::::::::::::::
    ::##### A ##  ## A ##:: B ::  ######::: B ::  : B :###::::::
    ::::########  #######:::::::  ## A ###::::::  ::::######::::
    ::::::######  :::::::#######  ########::::::  :::### A ###::
    : B ::::####  :: B ::## A ##  #######:::::::  :::::######:::
    ::::::::::##  :::::::#######  ######::::::::  ::::::::::::::

Certainly a one-layer (i.e., Perceptron) can linearly partition, and a three-layer (with enough nodes) can do anything, but otherwise the figure is all wrong.
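To make the two-layer case concrete, here is a minimal hand-wired sketch (Python with numpy assumed; the weights and thresholds are picked by hand for illustration, not learned) of a net with a single hidden layer of two threshold units computing XOR as one diagonal band rather than two separate "islands" -- the "valley"/"mountain" construction discussed below:

    import numpy as np

    def step(z):
        return (z > 0).astype(float)

    def xor_band_net(x):
        # One hidden layer of two threshold units plus one threshold output unit.
        # The output fires only inside the band 0.5 < x1 + x2 < 1.5, a single
        # diagonal "mountain" (equivalently, its complement is a "valley").
        x = np.atleast_2d(x)
        h1 = step(x[:, 0] + x[:, 1] - 0.5)   # on above the lower diagonal
        h2 = step(x[:, 0] + x[:, 1] - 1.5)   # on above the upper diagonal
        return step(h1 - h2 - 0.5)           # on only between the two diagonals

    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    print(xor_band_net(inputs))              # -> [0. 1. 1. 0.]

The output unit only has to detect "between the two hidden hyperplanes," which is a single convex strip; no disjoint islands are needed.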
The "island" shown for a three-layer can easily be done by a two-layer. In our paper "Geometric Analysis of Neural Network Capabilities" (ICNN87, VIII p385) we bother to take this to the extreme by doing something like the "C" (for convex) below. Actually any finite number of finitely complex items can be done with a two-layer net.

    ::::::::::::::
    :::########:::
    :::##::::##:::
    :::##:::::::::
    :::##::::##:::
    :::########:::
    ::::::::::::::

Far worse, the "four-quadrant" problem shown under two-layers *CANNOT* be done with two layers. There are few problems that can't be done with two layers, but the easiest I know of is precisely that. Assuming those boundaries go on to +/- infinity this requires a three-layer net (if they only go a finite distance you can do it with 2-layer if the inputs go to both layers). The report states that this is how an XOR is done with two layers, when in fact it is done by having a single "valley" (or equiv. a "mountain" the other way) like the fig below.

    ######::::::
    ####::::::::
    ##::::::::::
    ::::::::::##
    ::::::::####
    ::::::######

Just grumbling .... alexis wieland MITRE Corp. From harnad at Princeton.EDU Wed Aug 24 14:15:27 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 24 Aug 88 14:15:27 edt Subject: Pinker & Prince on Rules & Learning Message-ID: <8808241815.AA28089@mind> On Pinker & Prince on Rules & Learning Steve: Having read your Cognition paper and twice seen your talk (latest at cogsci-88), I thought I'd point out what look like some problems with the argument (as I understand it). In reading my comments, please bear in mind that I am NOT a connectionist; I am on record as a sceptic about connectionism's current accomplishments (and how they are being interpreted and extrapolated) and as an agnostic about its future possibilities. (Because I think this issue is of interest to the connectionist/AI community as a whole, I am branching a copy of this challenge to connectionists and comp.ai.) (1) An argument that pattern-associaters (henceforth "nets") cannot do something in principle cannot be based on the fact that a particular net (Rumelhart & McClelland 86/87) has not done it in practice. (2) If the argument is that nets cannot learn past tense forms (from ecologically valid samples) in principle, then it's the "in principle" part that seems to be missing. For it certainly seems incorrect that past tense formation is not learnable in principle. I know of no poverty-of-the-stimulus argument for past tense formation. On the contrary, the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample. (This is without even mentioning that surely no one imagines that past-tense formation is an independent cognitive module; it is probably learned jointly with other morphological regularities and irregularities, and there may well be degrees-of-freedom-reducing cross-talk.) (3) If the argument is only that nets cannot learn past tense forms without rules, then the matter is somewhat vaguer and more equivocal, for there are still ambiguities about what it is to be or represent a "rule." At the least, there is the issue of "explicit" vs.
"implicit" representation of a rule, and the related Wittgensteinian distinction between "knowing" a rule and merely being describable as behaving in accordance with a rule. These are not crisp issues, and hence not a solid basis for a principled critique. For example, it may well be that what nets learn in order to form past tenses correctly is describable as a rule, but not explicitly represented as one (as it would be in a symbolic program); the rule may simple operate as a causal I/O constraint. Ultimately, even conditional branching in a symbolic program is implemented as a causal constraint; "if/then" is really just an interpretation we can make of the software. The possibility of making such systematic, decomposable semantic intrepretations is, of course, precisely what distinguishes the symbolic approach from the connectionistic one (as Fodor/Pylyshyn argue). But at the level of a few individual "rules," it is not clear that the higher-order interpretation AS a formal rule, and all of its connotations, is justified. In any case, the important distinction is that the net's "rules" are LEARNED from statistical regularities in the data, rather than BUILT IN (as they are, coincidentally, in both symbolic AI and poverty-of-the-stimulus-governed linguistics). [The intermediate case of formally INFERRED rules does not seem to be at issue here.] So here are some questions: (a) Do you believe that English past tense formation is NOT learnable (except as "parameter settings" on an innate structure, from impoverished data)? If so, what are the supporting arguments for that? (b) If past tense formation IS learnable in the usual sense (i.e., by trial-and-error induction of regularities from the data sample), then do you believe that it is specifically unlearnable by nets? If so, what are the supporting arguments for that? (c) If past tense formation IS learnable by nets, but only if the invariance that the net learns and that comes to causally constrain its successful performance is describable as a "rule," what's wrong with that? Looking forward to your commentary on Lightfoot, where poverty-of-the-stimulus IS the explicit issue, -- best wishes, Stevan Harnad From chrisley.pa at Xerox.COM Wed Aug 24 20:50:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 24 Aug 88 17:50 PDT Subject: Roommates wanted In-Reply-To: huyser@mojave.Stanford.EDU (Karen Huyser)'s message of Wed, 10 Aug 88 10:34:31 PDT Message-ID: <880824-180000-3025@Xerox> Is Vip still looking for a roommate for INNS? -- Ron From harnad at Princeton.EDU Thu Aug 25 01:17:37 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Thu, 25 Aug 88 01:17:37 edt Subject: On Pinker & Prince On Rules & Learning Message-ID: <8808250517.AA03589@mind> On Pinker & Prince on Rules & Learning To: Steve Pinker, Psychology, MIT Steve: Having read your Cognition paper and twice seen your talk (latest at cogsci-88), I thought I'd point out what look like some problems with the argument (as I understand it). In reading my comments, please bear in mind that I am NOT a connectionist; I am on record as a sceptic about connectionism's current accomplishments (and how they are being interpreted and extrapolated) and as an agnostic about its future possibilities. (Because I think this issue is of interest to the connectionist/AI community as a whole, I am branching a copy of this challenge to connectionists and comp.ai.) 
(1) An argument that pattern-associaters (henceforth "nets") cannot do something in principle cannot be based on the fact that a particular net (Rumelhart & McClelland 86/87) has not done it in practice. (2) If the argument is that nets cannot learn past tense forms (from ecologically valid samples) in principle, then it's the "in principle" part that seems to be missing. For it certainly seems incorrect that past tense formation is not learnable in principle. I know of no poverty-of-the-stimulus argument for past tense formation. On the contrary, the regularities you describe -- both in the irregular verbs and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample. (This is without even mentioning that surely no one imagines that past-tense formation is an independent cognitive module; it is probably learned jointly with other morphological regularities and irregularities, and there may well be degrees-of-freedom-reducing cross-talk.) (3) If the argument is only that nets cannot learn past tense forms without rules, then the matter is somewhat vaguer and more equivocal, for there are still ambiguities about what it is to be or represent a "rule." At the least, there is the issue of "explicit" vs. "implicit" representation of a rule, and the related Wittgensteinian distinction between "knowing" a rule and merely being describable as behaving in accordance with a rule. These are not crisp issues, and hence not a solid basis for a principled critique. For example, it may well be that what nets learn in order to form past tenses correctly is describable as a rule, but not explicitly represented as one (as it would be in a symbolic program); the rule may simply operate as a causal I/O constraint. Ultimately, even conditional branching in a symbolic program is likewise implemented as a causal constraint; "if/then" is really just an interpretation we can make of the software. The possibility of making such systematic, decomposable semantic intrepretations is, of course, precisely what distinguishes the symbolic approach from the connectionistic one (as Fodor/Pylyshyn argue). But at the level of a few individual "rules," it is not clear that the higher-order interpretation AS a formal rule, and all of its connotations, is justified. In any case, the important distinction is that the net's "rules" are LEARNED from statistical regularities in the data, rather than BUILT IN (as they are, coincidentally, in both symbolic AI and poverty-of-the-stimulus-governed linguistics). [The intermediate case of formally INFERRED rules does not seem to be at issue here.] So here are some questions: (a) Do you believe that English past tense formation is NOT learnable (except as "parameter settings" on an innate structure, from impoverished data)? If so, what are the supporting arguments for that? (b) If past tense formation IS learnable in the usual sense (i.e., by trial-and-error induction of regularities from the data sample), then do you believe that it is specifically not learnable by nets? If so, what are the supporting arguments for that? (c) If past tense formation IS learnable by nets, but only if the invariance that the net learns and that comes to causally constrain its successful performance is describable as a "rule," what's wrong with that? 
Looking forward to your commentary on Lightfoot, where poverty-of-the-stimulus IS the explicit issue, -- best wishes, Stevan Harnad From alexis%yummy at gateway.mitre.org Thu Aug 25 12:41:36 1988 From: alexis%yummy at gateway.mitre.org (alexis%yummy@gateway.mitre.org) Date: Thu, 25 Aug 88 12:41:36 EDT Subject: DARPA NN Report Message-ID: <8808251641.AA03463@marzipan.mitre.org> I've had a number of people asking how to get the DARPA report. The report was announced at the Government Panel at the ICNN88. Call the Pentagon in Washington, DC at (202)697-5737 to obtain a copy of the (78 page) executive summary {sorry, I don't know a E-mail or snail-mail address}. The complete 600 page study final report is supposed to be available as a Lincoln Labs Report and as a book (possibly published by AFSIA). alexis. From alexis%yummy at gateway.mitre.org Thu Aug 25 07:47:12 1988 From: alexis%yummy at gateway.mitre.org (alexis%yummy@gateway.mitre.org) Date: Thu, 25 Aug 88 07:47:12 EDT Subject: four-quadrant problem Message-ID: <8808251147.AA03129@marzipan.mitre.org> It was long and had nice pictures, but not correct ... Before someone else catches it (I based part of my message in part on old notes that counted layers different): a) You can't do the four-quadrant problem with two layers ever (even for a finite distance) b) You can't do "any finite number of finitely complex items" with 2- layers, you can often do lots, but a counter example is (a) above. This makes the DARPA report even more certainly wrong, but in also puts me in a glass house to some degree ... alexis From pauls at boulder.Colorado.EDU Thu Aug 25 12:14:52 1988 From: pauls at boulder.Colorado.EDU (Paul Smolensky) Date: Thu, 25 Aug 88 10:14:52 MDT Subject: Mistake in DARPA NN Report Message-ID: <8808251614.AA24053@sigi.colorado.edu> if we're going to discuss the DARPA report on this net, we should discuss not only the technical syntactic sugar but also the political ramifications. can you tell us all how to get a copy of the report so we can have an informed political discussion? thanks, Paul Smolensky Dept. of Computer Science Univ. of Colorado Box 430 Boulder, CO 80309-0430 From ceci at boulder.Colorado.EDU Thu Aug 25 15:44:15 1988 From: ceci at boulder.Colorado.EDU (Lou Ceci) Date: Thu, 25 Aug 88 13:44:15 MDT Subject: Mistake in DARPA NN Report Message-ID: <8808251944.AA04670@tut> Dear Mr. Wieland: I enjoyed the article you and Mr. Leighton did for the ICNN more than any other. It was not only well-written, it was *CLEAR*--a rarity in today's connectionist literature. Other than proof by demonstration, can you point me to the mathematics behind the claims that "a neural net with X number of layers can partition a space into Y different regions of Z complexity"? Thanks. --Lou Ceci CU Boulder ceci at boulder.colorado.edu From chrisley.pa at Xerox.COM Fri Aug 26 16:26:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 26 Aug 88 13:26 PDT Subject: four-quadrant problem In-Reply-To: alexis%yummy@gateway.mitre.org's message of Thu, 25 Aug 88 07:47:12 EDT Message-ID: <880826-134143-6475@Xerox> I am very interested, as may be other connectionists who have not read the DARPA report, in knowing what the four-quadrant problem is, exactly. I am interested in any task that is proposed as not being able to be solved by a 2-layer network, since Huang and Lippmann seem to indicate that there may not be any such tasks. 
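For what it's worth, my reading of the earlier posting is that the task is just XOR on the signs of the two inputs: opposite quadrants of the plane share a class, and the class boundaries (the coordinate axes) run off to infinity. A throwaway sketch of what such a training set would look like (Python with numpy assumed; the sampling range is arbitrary, and this is only my guess at the intended task):

    import numpy as np

    rng = np.random.default_rng(0)

    # Points anywhere in the plane; opposite quadrants share a label, and
    # the class boundaries (the axes) are meant to extend to infinity.
    x = rng.uniform(-5.0, 5.0, size=(1000, 2))   # range is arbitrary
    y = (x[:, 0] * x[:, 1] > 0).astype(int)      # 1 = quadrants I and III
    print(np.bincount(y))                        # roughly balanced classes

Corrections welcome if that is not the problem the DARPA figure had in mind.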
Ron Chrisley Xerox PARC SSL Room 1620 3333 Coyote Hill Road Palo Alto, CA 94309 (415) 494-4740 From hinton at ai.toronto.edu Fri Aug 26 18:23:01 1988 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Fri, 26 Aug 88 18:23:01 EDT Subject: tenure-track job Message-ID: <88Aug26.154316edt.640@neat.ai.toronto.edu> McMaster University in Hamilton (near Toronto) has a tenure stream position in computer science for a "neural net" person. They have a project on multisensor fusion. If you are near graduation or a postdoc and you are interested call Dr. Simon Haykin, Director Communications Research Lab. 416-648-6589 for details. Geoff From strom at ogcvax.ogc.edu Sat Aug 27 19:51:20 1988 From: strom at ogcvax.ogc.edu (Dan Hammerstrom) Date: Sat, 27 Aug 88 16:51:20 PDT Subject: Faculty Position Message-ID: <8808272351.AA26695@ogcvax.OGC.EDU> FACULTY POSITION AVAILABLE: Connectionist/Neural Networks The Oregon Graduate Center The Computer Science/Engineering Department at the Oregon Gradu- ate Center seeks to hire faculty in the field of Connectionist/Neural Networks. We are interested in expanding an already successful program in this important area. Our current connectionist program is strongly oriented towards VLSI implemen- tation, and we regularly fabricate silicon to support our research efforts. In addition to VLSI, we are also starting a strong speech recognition effort with the arrival of a new faculty member, Ron Cole, who just recently joined us from Carne- gie Mellon University. Our program is well funded, and we actively seek additional research talent in the form of either junior or senior faculty. The Oregon Graduate Center is a private institute for research and graduate education (MS and PhD) in the applied sciences. OGC gives its faculty unmatched freedom and responsibility in direct- ing their research programs. The typical OGC faculty member spends 2/3 of his or her time on research. OGC is in the heart of Oregon's Sunset Corridor, amid such companies as Tektronix, Intel, Floating Point Systems, Mentor Graphics, Sequent, Cogent, National Semiconductor, Lattice, Fujitsu, Adaptive Systems Inc., NCube, Servio Logic, and NEC. OGC works because it has a first- rate faculty that is motivated and self-directed. In addition, the state of Oregon, in conjunction with OGC other Oregon schools and local industry, has begun OACIS (the Oregon Advanced Computer Institute). This institute will eventually provide excellent computing resources for research in parallel applications. The Department occupies a new building with comfortable offices, extensive laboratory space, and built-in computer communications. Its equipment base includes Tektronix, DEC, Sun, and Mentor workstations, high-resolution color graphic design stations, a Sequent Symmetry, a Cogent system, and an Intel iPSC Hypercube. The Portland environment is as stimulating as that at OGC: the climate is mild, there is easy access to year-round skiing, ocean beaches, and hiking in mountains and high desert. Dan Hammerstrom Department of Computer Science/Engineering Oregon Graduate Center 19600 NW von Neumann Dr. 
Beaverton, OR 97007 (503) 690-1160 CSNET: strom at ogc.edu From jmlubin at phoenix.Princeton.EDU Mon Aug 29 04:33:06 1988 From: jmlubin at phoenix.Princeton.EDU (Joseph Michael Lubin) Date: Mon, 29 Aug 88 04:33:06 edt Subject: mailing list Message-ID: <8808290833.AA19782@phoenix.Princeton.EDU> if this is the request node for the connectionists mailing list please add my name if you have access to the list please forward my name thank you, Joseph Lubin From nutto%UMASS.BITNET at VMA.CC.CMU.EDU Mon Aug 29 19:24:32 1988 From: nutto%UMASS.BITNET at VMA.CC.CMU.EDU (nutto%UMASS.BITNET@VMA.CC.CMU.EDU) Date: Mon, 29 Aug 88 19:24:32 EDT Subject: Neural Computation Message-ID: <880829192242C37.AFRK@Mars.UCC.UMass.EDU> (UMass-Mailer 4.04) I saw your address in a recent message to the Biotech, Physics, and Psychology digests. I am a psychology major concentrating in neuroscience and minoring in zoology, and I would appreciate it if you could send me any information about your mailing list. Thanx.
USnail: Andy Steinberg BITNet: nutto at UMass PO Box 170 nutto at Mars.UCC.UMass.EDU Hadley, MA 01035-0170 Internet: nutto%UMass.BITNet at cunyvm.cuny.edu Phone: (413) 546-4908 nutto%UMass.BITNet at mitvma.mit.edu From munnari!nswitgould.oz.au!geof at uunet.UU.NET Tue Aug 30 12:10:09 1988 From: munnari!nswitgould.oz.au!geof at uunet.UU.NET (Geoffrey Jones) Date: Tue, 30 Aug 88 11:10:09 EST Subject: Jordan paper request Message-ID: <8808300249.AA02218@uunet.UU.NET> Can anyone help me? I'm after a copy of M. I. Jordan's 1986 paper "Attractor dynamics and parallelism in a connectionist sequential machine" which appeared in _Proceedings of the Eighth Annual Meeting of the Cognitive Science Society_, Hillsdale, NJ, Erlbaum. The Proceedings aren't available in any Australian University library, hence the call for an overseas source. Rather than everyone rushing to their filing cabinets, could anyone (preferably the author himself) with access to a copy email me to that effect and we can go from there. Thank-you all in advance. Cheers. geof. ---------------------------------------------------------------------------- Geoffrey Jones ACSnet: geof at nswitgould.oz Dept. of Computer Science CSNET: geof at nswitgould.oz U. of Technology, Sydney ARPA: geof%nswitgould.oz at uunet.uu.net P.O. Box 123, UUCP: {uunet,ukc}!munnari!nswitgould.oz!geof Broadway, 2007 AUSTRALIA Phone: (02) 218 9582 ---------------------------------------------------------------------------- From hendler at dormouse.cs.umd.edu Tue Aug 30 10:15:29 1988 From: hendler at dormouse.cs.umd.edu (Jim Hendler) Date: Tue, 30 Aug 88 10:15:29 EDT Subject: Jordan paper request Message-ID: <8808301415.AA04288@dormouse.cs.umd.edu> > Can anyone help me? I'm after a copy of M. I. Jordan's 1986 paper > "Attractor dynamics and parallelism in a connectionist sequential machine" > which appeared in _Proceedings of the Eighth Annual Meeting of the > Cognitive Science Society_, Hillsdale, NJ, Erlbaum. The Proceedings Geof brings up a good point, with more and more connectionist papers showing up at Coggie Sci., those of us missing the meetings for one reason or another end up missing some good papers. Does anyone know if the Proceedings can be ordered separately these days? In the old days the Cog Sci proceedings were NOT available outside the conference, is this still true? If so, could we agitate a bit to change it? -Jim H From Dave.Touretzky at B.GP.CS.CMU.EDU Tue Aug 30 21:47:47 1988 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Tue, 30 Aug 88 21:47:47 EDT Subject: Jordan paper request In-Reply-To: Your message of Tue, 30 Aug 88 10:15:29 -0400. <8808301415.AA04288@dormouse.cs.umd.edu> Message-ID: <4483.588995267@DST.BOLTZ.CS.CMU.EDU> The proceedings of this year's and previous Cognitive Science conferences can be ordered from Lawrence Erlbaum Associates. I bought a copy of this year's proceedings at AAAI last week. The price for this year's Cog Sci proceedings is $49.99. If you pay by check, LEA will pay postage and handling (US and Canada only); outside the Americas add $5 per book. If you charge your order to a VISA, MasterCard, AmEx, or Discover card, UPS charges will be addded to the bill. New Jersey residents must add sales tax. Order from: Lawrence Erlbaum Associates, Inc. 365 Broadway Hillsdale, NJ 07642 tel. 201-666-4110 -- Dave PS: sending email to hundreds of people with the push of a button can be fun, but sometimes it pays to know how to use a telephone. 
From steve at cogito.mit.edu Tue Aug 30 18:44:05 1988 From: steve at cogito.mit.edu (Steve Pinker) Date: Tue, 30 Aug 88 18:44:05 edt Subject: Reply to S. Harnad's questions, short version Message-ID: <8808302244.AA27242@ATHENA.MIT.EDU> Alluding to our paper "On Language and Connectionism: Analysis of a PDP model of language acquisition", Stevan Harnad has posted a list of questions and observations as a 'challenge' to us. His remarks owe more to the general ambience of the connectionism / symbol-processing debate than to the actual text of our paper, in which the questions are already answered. We urge those interested in these issues to read the paper or the nutshell version published in Trends in Neurosciences, either of which may be obtained from Prince (address below). In this note we briefly answer Harnad's three questions. In another longer message to follow, we direct an open letter to Harnad which justifies the answers and goes over the issues he raises in more detail. Question # 1: Do we believe that English past tense formation is not learnable? Of course we don't! So imperturbable is our faith in the learnability of this system that we ourselves propose a way in which it might be done (OLC, 130-136). Question #2: If it is learnable, is it specifically unlearnable by nets? No, there may be some nets that can learn it; certainly any net that is intentionally wired up to behave exactly like a rule-learning algorithm can learn it. Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but with which theories are true, and our conclusions were about pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job. Therefore it's not surprising that the developmental data confirm that children do not behave in the way that such a pattern associator behaves. Question # 3: If past tense formation is learnable by nets, but only if the invariance that the net learns and that causally constrains its successful performance is describable as a "rule", what's wrong with that? Absolutely nothing! --just like there's nothing wrong with saying that past tense formation is learnable by a bunch of precisely-arranged molecules (viz., the brain) but only if the invariance that the molecules learn, etc. etc. etc. The question is, what explains the facts of human cognition? Pattern associator networks have some interesting properties that can shed light on certain kinds of phenomena, such as *irregular* past tense forms. But it is simply a fact about the *regular* past tense alternation in English that it is not that kind of phenomenon. You can focus on the interesting empirical predictions of pattern associators, and use them to explain certain things (but not others), or you can generalize them to a class of universal devices that can explain nothing without an appeal to the rules that they happen to implement. But you can't have it both ways. Alan Prince Program in Cognitive Science Department of Psychology Brown 125 Brandeis University Waltham, MA 02254-9110 prince at brandeis.bitnet Steven Pinker Department of Brain and Cognitive Sciences E10-018 MIT Cambridge, MA 02139 steve at cogito.mit.edu References: Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193. 
Reprinted in S. Pinker & J. Mehler (Eds.), Connections and symbols. Cambridge, MA: Bradford Books/MIT Press. Prince, A. & Pinker, S. (1988) Rules and connections in human language. Trends in Neurosciences, 11, 195-202. Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & The PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: Bradford Books/MIT Press. From steve at cogito.mit.edu Tue Aug 30 18:46:06 1988 From: steve at cogito.mit.edu (Steve Pinker) Date: Tue, 30 Aug 88 18:46:06 edt Subject: Reply to S. Harnad's questions, longer version Message-ID: <8808302246.AA27270@ATHENA.MIT.EDU> Dear Stevan, This letter is a reply to your posted list of questions and observations alluding to our paper "On language and connectionism: Analysis of a PDP model of language acquisition" (Pinker & Prince, 1988; see also Prince and Pinker, 1988). The questions are based on misunderstandings of our papers, in which they are already answered. (1) Contrary to your suggestion, we never claimed that pattern associators cannot learn the past tense rule, or anything else, in principle. Our concern is with which theories of the psychology of language are true. This question cannot be answered from an armchair but only by examining what people learn and how they learn it. Our main conclusion is that the claim that the English past tense rule is learned and represented as a pattern-associator with distributed representations over phonological features for input and output forms (e.g., the Rumelhart-McClelland 1986 model) is false. That's because what pattern-associators are good at is precisely what the regular rule doesn't need. Pattern associators are designed to pick up patterns of correlation among input and output features. The regular past tense alternation, as acquired by English speakers, is not systematically sensitive to phonological features. Therefore some of the failures of the R-M model we found are traceable to its trying to handle the regular rule with an architecture inappropriate to the regular rule. We therefore predict that these failures should be seen in other network models that compute the regular past tense alternation using pattern associators with distributed phonological representations (*not* all conceivable network models, in general, in principle, forever, etc.). This prediction has been confirmed. Egedi and Sproat (1988) devised a network model that retained the assumption of associations between distributed phonological representations but otherwise differed radically from the R-M model: it had three layers, not two; it used a back-propagation learning rule, not just the simple perceptron convergence procedure; it used position-specific phonological features, not context-dependent ones; and it had a completely different output decoder. Nonetheless its successes and failures were virtually identical to those of the R-M model. (2) You claim that "the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample."
This is an interesting claim and we strongly encourage you to back it up with argument and analysis; a real demonstration of its truth would be a significant advance. It's certainly false of the R-M and Egedi-Sproat models. There's a real danger in this kind of glib commentary of trivializing the issues by assuming that net models are a kind of miraculous wonder tissue that can do anything. The brilliance of the Rumelhart and McClelland (1986) paper is that they studiously avoided this trap. In the section of their paper called "Learning regular and exceptional patterns in a pattern associator" they took great pains to point out that pattern associators are good at specific things, especially exploiting statistical regularities in the mapping from one set of featural patterns to another. They then made the interesting empirical claim that these basic properties of the pattern associator model lie at the heart of the acquisition of the past tense. Indeed, the properties of the model afforded it some interesting successes with the *irregular* alternations, which fall into family resemblance clusters of the sort that pattern associators handle in interesting ways. But it is exactly these properties of the model that made it fail at the *regular* alternation, which does not form family resemblance clusters. We like to think that these kinds of comparisons make for productive empirical science. The successes of the pattern associator architecture for irregulars teach us something about the psychology of the irregulars (basically a memory phenomenon, we argue), and its failures for the regulars teach us something about the psychology of the regulars (use of a default rule, we argue). Rumelhart and McClelland disagree with us over the facts but not over the key empirical tests. They hold that pattern associators have particular aptitudes that are suited to modeling certain kinds of processes, which they claim are those of cognition. One can argue for or against this and learn something about psychology while so doing. Your claim about a 'statistical pattern learner...sensitive to higher order correlations' is essentially impossible to evaluate. (3) We're mystified that you attribute to us the claim that "past tense formation is not learnable in principle." The implication is that our critique of the R-M model was based on the assertion that the rule is unlearned and that this is the key issue separating us from R&M. Therefore -- you seem to reason -- if the rule is learned, it is learned by a network. But both parts are wrong. No one in his right mind would claim that the English past tense rule is "built in". We spent a full seven pages (130-136) of 'OLC' presenting a simple model of how the past tense rule might be learned by a symbol manipulation device. So obviously we don't believe it can't be learned. The question is how children in fact do it. The only way we can make sense of this misattribution is to suppose that you equate "learnable" with "learnable by some (nth-order) statistical algorithm". The underlying presupposition is that statistical modeling (of an undefined character) has some kind of philosophical priority over other forms of analysis; so that if statistical modeling seems somehow possible-in-principle, then rule-based models (and the problems they solve) can be safely ignored.
As a kind of corollary, you seem to assume that unless the input is so impoverished as to rule out all statistical modeling, rule theories are irrelevant; that rules are impossible without major stimulus-poverty. In our view, the question is not CAN some (ungiven) algorithm 'learn' it, but DO learners approach the data in that fashion. Poverty-of-the-stimulus considerations are one out of many sources of evidence in this issue. (In the case of the past tense rule, there is a clear P-of-S argument for at least one aspect of the organization of the inflectional system: across languages, speakers automatically regularize verbs derived from nouns and adjectives (e.g., 'he high-sticked/*high-stuck the goalie'; 'she braked/*broke the car'), despite virtually no exposure to crucial informative data in childhood. This is evidence that the system is built around representations corresponding to the constructs 'word', 'root', and 'irregular'; see OLC 110-114.) (4) You bring up the old distinction between rules that describe overall behavior and rules that are explicitly represented in a computational device and play a causal role in its behavior. Perhaps, as you say, "these are not crisp issues, and hence not a solid basis for a principled critique". But it was Rumelhart and McClelland who first brought them up, and it was the main thrust of their paper. We tend to agree with them that the issues are crisp enough to motivate interesting research, and don't just degenerate into discussions of logical possibilities. We just disagree about which conclusions are warranted. We noted that (a) the R-M model is empirically incorrect, therefore you can't use it to defend any claims for whether or not rules are explicitly represented; (b) if you simply wire up a network to do exactly what a rule does, by making every decision about how to build the net (which features to use, what its topology should be, etc.) by consulting the rule-based theory, then that's a clear sense in which the network "implements" the rule. The reason is that the hand-wiring and tweaking of such a network would not be motivated by principles of connectionist theory; at the level at which the manipulations are carried out, the units and connections are indistinguishable from one another and could be wired together any way one pleased. The answer to the question "Why is the network wired up that way?" would come from the rule-theory; for example, "Because the regular rule is a default operation that is insensitive to stem phonology". Therefore in the most interesting sense such a network *is* a rule. The point carries over to more complex cases, where one would have different subnetworks corresponding to different parts of rules. Since it is the fact that the network implements such-and-such a rule that is doing the work of explaining the phenomenon, the question now becomes, is there any reason to believe that the rule is implemented in that way rather than some other way? Please note that we are *not* asserting that no PDP model of any sort could ever acquire linguistic knowledge without directly implementing linguistic rules. Our hope, of course, is that as the discussion proceeds, models of all kinds will become more sophisticated and ambitious. As we said in our Conclusion, "These problems are exactly that, problems. They do not demonstrate that interesting PDP models of language are impossible in principle.
At the same time, they show that there is no basis for the belief that connectionism will dissolve the difficult puzzles of language, or even provide radically new solutions to them." So to answer the catechism: (a) Do we believe that English past tense formation is not learnable? Of course we don't! (b) If it is learnable, is it specifically unlearnable by nets? No, there may be some nets that can learn it; certainly any net that is intentionally wired up to behave exactly like a rule-learning algorithm can learn it. Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but about which theories are true, and our analysis was of pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job. Therefore it's not surprising that the developmental data confirm that children do not behave the way such a pattern associator behaves. (c) If past tense formation is learnable by nets, but only if the invariance that the net learns and that causally constrains its successful performance is describable as a "rule", what's wrong with that? Absolutely nothing! -- just like there's nothing wrong with saying that past tense formation is learnable by a bunch of precisely-arranged molecules (viz., the brain) such that the invariance that the molecules learn, etc. etc. The question is, what explains the facts of human cognition? Pattern associator networks have some interesting properties that can shed light on certain kinds of phenomena, such as irregular past tense forms. But it is simply a fact about the regular past tense alternation in English that it is not that kind of phenomenon. You can focus on the interesting empirical properties of pattern associators, and use them to explain certain things (but not others), or you can generalize them to a class of universal devices that can explain nothing without appeals to the rules that they happen to implement. But you can't have it both ways. Steven Pinker Department of Brain and Cognitive Sciences E10-018 MIT Cambridge, MA 02139 steve at cogito.mit.edu Alan Prince Program in Cognitive Science Department of Psychology Brown 125 Brandeis University Waltham, MA 02254-9110 prince at brandeis.bitnet References: Egedi, D.M. and R.W. Sproat (1988) Neural Nets and Natural Language Morphology, AT&T Bell Laboratories, Murray Hill, NJ 07974. Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193. Reprinted in S. Pinker & J. Mehler (Eds.), Connections and symbols. Cambridge, MA: Bradford Books/MIT Press. Prince, A. & Pinker, S. (1988) Rules and connections in human language. Trends in Neurosciences, 11, 195-202. Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & The PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: Bradford Books/MIT Press.
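For readers trying to picture the two architectures being contrasted in the letter above, the following is a minimal, hedged sketch -- not the Rumelhart-McClelland model and not the Egedi-Sproat model -- of (a) a two-layer pattern associator trained by the perceptron convergence procedure and (b) a three-layer net (one hidden layer) trained by back-propagation, both mapping one distributed feature vector to another. The feature sizes, the random placeholder "verbs", and the learning constants are invented for illustration only; nothing in the sketch bears on the empirical questions under dispute.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for distributed phonological representations: each "stem"
    # and "past tense" is simply a random +/-1 feature vector.
    N_FEATURES, N_VERBS = 30, 50
    stems = rng.choice([-1.0, 1.0], size=(N_VERBS, N_FEATURES))
    pasts = rng.choice([-1.0, 1.0], size=(N_VERBS, N_FEATURES))

    # (a) Two-layer pattern associator: one weight matrix feeding threshold
    # output units, trained by error-correction (perceptron convergence
    # procedure) for a fixed number of passes through the sample.
    W = np.zeros((N_FEATURES, N_FEATURES))
    for _ in range(100):
        for x, t in zip(stems, pasts):
            y = np.where(W @ x >= 0, 1.0, -1.0)
            W += 0.1 * np.outer(t - y, x)

    # (b) Three-layer net with a hidden layer of tanh units, trained by
    # back-propagation of the squared error.
    H, lr = 40, 0.01
    W1 = rng.normal(0.0, 0.1, (H, N_FEATURES))
    W2 = rng.normal(0.0, 0.1, (N_FEATURES, H))
    for _ in range(2000):
        h = np.tanh(W1 @ stems.T)              # hidden activations
        y = np.tanh(W2 @ h)                    # output activations
        d2 = (y - pasts.T) * (1.0 - y ** 2)    # output-layer error signal
        d1 = (W2.T @ d2) * (1.0 - h ** 2)      # back-propagated error signal
        W2 -= lr * d2 @ h.T / N_VERBS
        W1 -= lr * d1 @ stems / N_VERBS

The point of the sketch is only the difference in wiring and training procedure: (a) has no hidden units and can exploit only direct input-output feature correlations, while (b) interposes a learned hidden representation between the two feature layers.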
From jlm+ at andrew.cmu.edu Wed Aug 31 12:23:11 1988 From: jlm+ at andrew.cmu.edu (James L. McClelland) Date: Wed, 31 Aug 88 12:23:11 -0400 (EDT) Subject: Replys to S. Harnad In-Reply-To: <8808302244.AA27242@ATHENA.MIT.EDU> References: <8808302244.AA27242@ATHENA.MIT.EDU> Message-ID: <8X72Tjy00jWDQ2M10o@andrew.cmu.edu> Steve -- In the first of your two messages, there seemed to be a failure to entertain the possibility that there might be a network that is not a strict implementation of a rule system nor a pattern associator of the type described by Rumelhart and me that could capture the past tense phenomena. The principal shortcoming of our network, in my view, was that it treated the problem of past-tense formation as a problem in which one generates the past tense of a word from its present tense.
This of course cannot be the right way to do things, for reasons which you describe at some length in your paper. However, THIS problem has nothing to do with whether a network or some other method is used for going from present to past tense. Several researchers are now exploring models that take as input a distributed representation of the intended meaning, and generate as output a description of the phonological properties of the utterance that expresses that meaning. Such a network must have at least one hidden layer to do this task. Note that such a network would naturally be able to exploit the common structure of the various different versions of English inflectional morphology. It is already clear that it would have a much easier time learning inflection rather than word-reversal as a way of mastering past tense etc. What remain to be addressed are issues about the nature and onset of use of the regular inflection in English. Suffice it to say here that the claims you and Prince make about the sharp distinction between the regular and irregular systems deserve very close scrutiny. I for one find the arguments you give in favor of this view unconvincing. We will be writing at more length on these matters, but for now I just wanted two points to be clear: 1) The argument about what class of models a particular model's shortcomings exemplify is not an easy one to resolve, and there is considerable science and (yes) mathematics to be done to understand just what the classes are and what can be taken as examples of them. Just what generalization you believe you have reason to claim your arguments allow you to make has not always been clear. In the first of your two recent messages you state: Our concern is not with (the mathematical question of) what nets can or cannot do in principle, but with which theories are true, and our conclusions were about pattern associators using distributed phonological representations. We showed that it is unlikely that human children learn the regular rule the way such a pattern associator learns the regular rule, because it is simply the wrong tool for the job. After receiving the message containing the above I wrote the following: Now, the model Rumelhart and I proposed was a pattern associator using distributed phonological representations, but so are the other kinds of models that people are currently exploring; they happen though to use such representations at the output and not the input and to have hidden layers. I strongly suspect that you would like your argument to apply to the broad class of models which might be encompassed by the phrase "pattern associators using distributed phonological representations", and I know for a fact that many readers think that this is what you intend. However, I think it is much more likely that your arguments apply to the much narrower class of models which map distributed phonological representations of present tense to distributed phonological representations of past tense. In your longer, second note, you are very clear in stating that you intend your arguments to be taken against the narrow class of models that map phonology to phonology. I do hope that this sensible view gets propagated, as I think many may feel that you think you have a more general case. Indeed, your second message takes a general attitude that I find I can agree with: Let's do some more research and find out what can and can't be done and what the important taxonomic classes of architecture types might be.
2) There's quite a bit more empirical research to be done even characterizing accurately the facts about the past tense. I believe this research will show that you have substantially overstated the empirical situation in several respects. Just as one example, you and Prince state the following: The baseball term _to fly out_, meaning 'make an out by hitting a fly ball that gets caught', is derived from the baseball noun _fly (ball)_, meaning 'ball hit on a conspicuously parabolic trajectory', which is in turn related to the simple strong verb _fly_, 'proceed through the air'. Everyone says 'he flied out'; no mere mortal has yet been observed to have "flown out" to left field. You repeated this at Cog Sci two weeks ago. Yet in October of 87 I received the message appended below, which directly contradicts your claim. As you state in your second, more constructive message, we ALL need to be very clear about what the facts are and not to rush around making glib statements! Jay McClelland ======================================================= [The following is appended with the consent of the author.] Date: Sun, 11 Oct 87 21:20:55 PDT From: elman at amos.ling.ucsd.edu (Jeff Elman) To: der at psych.stanford.edu, jlm at andrew.cmu.edu Subject: flying baseball players Heard in Thursday's play-off game between the Tigers and Twins: "...and he flew out to left field...he's...OUT!" What was that P&P were saying?! Jeff ======================================================= From jlm+ at andrew.cmu.edu Wed Aug 31 09:22:16 1988 From: jlm+ at andrew.cmu.edu (James L. McClelland) Date: Wed, 31 Aug 88 09:22:16 -0400 (EDT) Subject: Jordan paper request In-Reply-To: <8808301415.AA04288@dormouse.cs.umd.edu> References: <8808301415.AA04288@dormouse.cs.umd.edu> Message-ID: Back copies of the Cognitive Science Proceedings are available for $49.95 from: Lawrence Erlbaum Associates, Inc., Publishers, 365 Broadway, Hillsdale, NJ 07642, (201) 767-8450. I presume this includes the 1988 proceedings, though only earlier proceedings are mentioned in the advertisement printed on the back of the 1988 proceedings. In any case the 1986 proceedings containing the Jordan paper and lots of other good stuff is still available. I might also add that the journal Cognitive Science (the publication of the Cognitive Science Society) has a commitment to the exploration of connectionist models. I am one of the senior editors and the editorial board includes several prominent connectionists. I speak for the journal in saying that we welcome connectionist research with an interdisciplinary flavor. There will be a group of connectionist papers coming out shortly. If you want to submit, read the instructions for authors inside the back cover of a recent issue. If you want to subscribe, write to Ablex Publishing, 355 Chestnut St., Norwood, NJ 07648, or join the society by writing to Alan Lesgold, Secretary-Treasurer, Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA 15260. Membership is just a bit more than a plain subscription and gets you announcements about meetings etc. as well as the journal. -- Jay McClelland From harnad at Princeton.EDU Wed Aug 31 16:39:33 1988 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 31 Aug 88 16:39:33 edt Subject: On Theft vs.
Honest Toil (Pinker & Prince Discussion, cont'd) Message-ID: <8808312039.AA01275@mind> Pinker & Prince write in reply: >> Contrary to your suggestion, we never claimed that pattern associators >> cannot learn the past tense rule, or anything else, in principle. I've reread the paper, and unfortunately I still find it ambiguous: For example, one place (p. 183) you write: "These problems are exactly that, problems. They do not demonstrate that interesting PDP models of language are impossible in principle." But elsewhere (p. 179) you write: "the representations used in decomposed, modular systems are abstract, and many aspects of their organization cannot be learned in any obvious way." [Does past tense learning depend on any of this unlearnable organization?] On p. 181 you write: "Perhaps it is the limitations of these simplest PDP devices -- two-layer association networks -- that causes problems for the R & M model, and these problems would diminish if more sophisticated kinds of PDP networks were used." But earlier on the same page you write: "a model that can learn all possible degrees of correlation among a set of features is not a model of a human being" [Sounds like a Catch-22...] It's because of this ambiguity that my comments were made in the form of conditionals and questions rather than assertions. But we now stand answered: You do NOT claim "that pattern associaters cannot learn the past tense rule, or anything else, in principle." [Oddly enough, I do: if by "pattern associaters" you mean (as you mostly seem to mean) 2-layer perceptron-style nets like the R & M model, then I would claim that they cannot learn the kinds of things Minsky showed they couldn't learn, in principle. Whether or not more general nets (e.g., PDP models with hidden layers, back-prop, etc.) will turn out to have corresponding higher-order limitations seems to be an open question at this point.] You go on to quote my claim that: "the regularities you describe -- both in the irregulars and the regulars -- are PRECISELY the kinds of invariances you would expect a statistical pattern learner that was sensitive to higher order correlations to be able to learn successfully. In particular, the form-independent default option for the regulars should be readily inducible from a representative sample." and then you comment: >> This is an interesting claim and we strongly encourage you to back it >> up with argument and analysis; a real demonstration of its truth would >> be a significant advance. It's certainly false of the R-M and >> Egedi-Sproat models. There's a real danger in this kind of glib >> commentary of trivializing the issues by assuming that net models are >> a kind of miraculous wonder tissue that can do anything. I don't understand the logic of your challenge. You've disavowed having claimed that any of this was unlearnable in principle. Why is it glibber to conjecture that it's learnable in practice than that it's unlearnable in practice? From everything you've said, it certainly LOOKS perfectly learnable: Sample a lot of forms and discover that the default invariance turns out to work well in most cases (i.e., the "regulars"; the rest, the "irregulars," have their own local invariances, likewise inducible from statistical regularities in the data). This has nothing to do with a belief in wonder tissue. 
It was precisely in order to avoid irrelevant stereotypes like that that the first posting was prominently preceded by the disclaimer that I happen to be a sceptic about connectionism's actual accomplishments and an agnostic about its future potential. My critique was based solely on the logic of your argument against connectionism (in favor of symbolism). Based only on what you've written about its underlying regularities, past tense rule learning simply doesn't seem to pose a serious challenge for a statistical learner -- not in principle, at any rate. It seems to have stumped R & M 86 and E & S 88 in practice, but how many tries is that? It is possible, for example, as suggested by your valid analysis of the limitations of the Wickelfeature representation, that some of the requisite regularities are simply not reflected in this phonological representation, or that other learning (e.g. plurals) must complement past-tense data. This looks more like an entry-point problem (see (1) below), however, rather than a problem of principle for connectionist learning of past tense formation. After all, there's no serious underdetermination here; it's not like looking for a needle in a haystack, or NP-complete, or like that. I agree that R & M made rather inflated general claims on the basis of the limited success of R & M 86. But (to me, at any rate) the only potentially substantive issue here seems to be the one of principle (about the relative scope and limits of the symbolic vs. the connectionistic approach). Otherwise we're all just arguing about the scope and limits of R & M 86 (and perhaps now also E & S 88). Two sources of ambiguity seem to be keeping this disagreement unnecessarily vague: (1) There is an "entry-point" problem in comparing a toy model (e.g., R & M 86) with a lifesize cognitive capacity (e.g., the human ability to form past tenses): The capacity may not be modular; it may depend on other capacities. For example, as you point out in your article, other phonological and morphological data and regularities (e.g., pluralization) may contribute to successful past tense formation. Here again, the challenge is to come up with a PRINCIPLED limitation, for otherwise the connectionist can reasonably claim that there's no reason to doubt that those further regularities could have been netted exactly the same way (if they had been the target of the toy model); the entry point just happened to be arbitrarily downstream. I don't say this isn't hand-waving; but it can't be interestingly blocked by hand-waving in the opposite direction. (2) The second factor is the most critical one: learning. You put a lot of weight on the idea that if nets turn out to behave rulefully then this is a vindication of the symbolic approach. However, you make no distinction between rules that are built in (as "constraints," say) and rules that are learned. The endstate may be the same, but there's a world of difference in how it's reached -- and that may turn out to be one of the most important differences between the symbolic approach and connectionism: Not whether they use rules, but how they come by them -- by theft or honest toil. Typically, the symbolic approach builds them in, whereas the connectionistic one learns them from statistical regularities in its input data. This is why the learnability issue is so critical. 
(It is also what makes it legitimate for a connectionist to conjecture, as in (1) above, that if a task is nonmodular, and depends on other knowledge, then that other knowledge too could be acquired the same way: by learning.) >> Your claim about a 'statistical pattern learner...sensitive to higher >> order correlations' is essentially impossible to evaluate. There are in principle two ways to evaluate it, one empirical and open-ended, the other analytical and definitive. You can demonstrate that specific regularities can be learned from specific data by getting a specific learning model to do it (but its failure would only be evidence that that model fails for those data). The other way is to prove analytically that certain kinds of regularities are (or are not) learnable from certain kinds of data (in certain ways, I might add, because connectionism may be only one candidate class of statistical learning algorithms). Poverty-of-the-stimulus arguments attempt to demonstrate the latter (i.e., unlearnability in principle). >> We're mystified that you attribute to us the claim that "past >> tense formation is not learnable in principle."... No one in his right >> mind would claim that the English past tense rule is "built in". We >> spent a full seven pages (130-136) of 'OLC' presenting a simple model >> of how the past tense rule might be learned by a symbol manipulation >> device. So obviously we don't believe it can't be learned. Here are some extracts from OLC 130ff: "When a child hears an inflected verb in a single context, it is utterly ambiguous what morphological category the inflection is signalling... Pinker (1984) suggested that the child solves this problem by "sampling" from the space of possible hypotheses defined by combinations of an innate finite set of elements, maintaining these hypotheses in the provisional grammar, and testing them against future uses of that inflection, expunging a hypothesis if it is counterexemplified by a future word. Eventually... only correct ones will survive." [The text goes on to describe a mechanism in which hypothesis strength grows with success frequency and diminishes with failure frequency through trial and error.] "Any adequate rule-based theory will have to have a module that extracts multiple regularities at several levels of generality, assign them strengths related to their frequency of exemplification by input verbs, and let them compete in generating a past tense for a given verb." It's not entirely clear from the description on pp. 130-136 (probably partly because of the finessed entry-point problem) whether (i) this is an innate parameter-setting or fine-tuning model, as it sounds, with the "learning" really just choosing among or tuning the built-in parameter settings, or whether (ii) there's genuine bottom-up learning going on here. If it's the former, then that's not what's usually meant by "learning." If it's the latter, then the strength-adjusting mechanism sounds equivalent to a net, one that could just as well have been implemented nonsymbolically. (You do state that your hypothetical module would be equivalent to R & M's in many respects, but it is not clear how this supports the symbolic approach.)
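The strength-adjustment scheme quoted from OLC just above is easy to state procedurally, which may help in judging whether it "sounds equivalent to a net". Here is a minimal sketch under invented assumptions: the candidate generalizations, the update constants, and the tiny verb sample are placeholders, and none of OLC's machinery for levels of generality or for stored irregular forms is modeled.

    # Candidate past-tense generalizations compete; a candidate gains strength
    # each time an input verb exemplifies it and loses strength each time it
    # is counterexemplified (the constants are arbitrary placeholders).
    candidates = {
        "suffix -ed":    lambda stem: stem + "ed",
        "i -> a ablaut": lambda stem: stem.replace("i", "a"),
        "no change":     lambda stem: stem,
    }
    strength = {name: 0.0 for name in candidates}

    exposure = [("walk", "walked"), ("ring", "rang"), ("hit", "hit"),
                ("talk", "talked"), ("sing", "sang"), ("play", "played")]

    for stem, past in exposure:              # trial-and-error over input verbs
        for name, rule in candidates.items():
            strength[name] += 1.0 if rule(stem) == past else -0.2

    def produce(stem):
        # The competition here is crude: the globally strongest candidate wins.
        best = max(strength, key=strength.get)
        return candidates[best](stem)

    print(strength)           # "suffix -ed" ends up strongest on this sample
    print(produce("jump"))    # -> "jumped"

Nothing in the sketch decides the symbolic-versus-network question at issue; it only makes the frequency-driven bookkeeping concrete.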
[It's also unclear what to make of the point you add in your reply (again partly because of the entry-point problem): >>"(In the case of the past tense rule, there is a clear P-of-S argument for at least one aspect of the organization of the inflectional system...)">> Is this or is this not a claim that all or part of English past tense formation is not learnable (from the data available to the child) in principle? There seems to be some ambiguity (or perhaps ambivalence) here.] >> The only way we can make sense of this misattribution is to suppose >> that you equate "learnable" with "learnable by some (nth-order) >> statistical algorithm". The underlying presupposition is that >> statistical modeling (of an undefined character) has some kind of >> philosophical priority over other forms of analysis; so that if >> statistical modeling seems somehow possible-in-principle, then >> rule-based models (and the problems they solve) can be safely ignored. Yes, I equate learnability with an algorithm that can extract statistical regularities (possibly nth order) from input data. Connectionism seems to be (an interpretation of) a candidate class of such algorithms; so does multiple nonlinear regression. The question of "philosophical priority" is a deep one (on which I've written: "Induction, Evolution and Accountability," Ann. NY Acad. Sci. 280, 1976). Suffice it to say that induction has epistemological priority over innatism (or such a case can be made) and that a lot of induction (including hypothesis-strengthening by sampling instances) has a statistical character. It is not true that where statistical induction is possible, rule-based models must be ignored (especially if the rule-based models learn by what is equivalent to statistics anyway), only that the learning NEED not be implemented symbolically. But it is true that where a rule can be learned from regularities in the data, it need not be built in. [Ceterum sentio: there is an entry-point problem for symbols that I've also written about: "Categorical Perception," Cambr. U. Pr. 1987. I describe there a hybrid approach in which symbolic and nonsymbolic representations, including a connectionistic component, are put together bottom-up in a principled way that avoids spuriously pitting connectionism against symbolism.] >> As a kind of corollary, you seem to assume that unless the input is so >> impoverished as to rule out all statistical modeling, rule theories >> are irrelevant; that rules are impossible without major stimulus-poverty. No, but I do think there's an entry-point problem. Symbolic rules can indeed be used to implement statistical learning, or even to preempt it, but they must first be grounded in nonsymbolic learning or in innate structures. Where there is learnability in principle, learning does have "philosophical (actually methodological) priority" over innateness. >> In our view, the question is not CAN some (ungiven) algorithm >> 'learn' it, but DO learners approach the data in that fashion. >> Poverty-of-the-stimulus considerations are one out of many >> sources of evidence in this issue... >> developmental data confirm that children do not behave the way such a >> pattern associator behaves. Poverty-of-the-stimulus arguments are the cornerstone of modern linguistics because, if they are valid, they entail that certain rules are unlearnable in principle (from the data available to the child) and hence that a learning model must fail for such cases.
The rule system itself must accordingly be attributed to the brain, rather than just the general-purpose inductive wherewithal to learn the rules from experience. Where something IS learnable in principle, there is of course still a question as to whether it is indeed learned in practice rather than being innate; but neither (a) the absence of data on whether it is learned nor (b) the existence of a rule-based model that confers it on the child for free provide very strong empirical guidance in such a case. In any event, developmental performance data themselves seem far too impoverished to decide between rival theories at this stage. It seems advisable to devise theories that account for more lifesize chunks of our asymptotic (adult) performance capacity before trying to fine-tune them with developmental (or neural, or reaction-time, or brain-damage) tests or constraints. (Standard linguistic theory has in any case found it difficult to find either confirmation or refutation in developmental data to date.) By way of a concrete example, suppose we had two pairs of rival toy models, symbolic vs. connectionistic, one pair doing chess-playing and the other doing factorials. (By a "toy" model I mean one that models some arbitrary subset of our total cognitive capacity; all models to date, symbolic and connectionistic, are toy models in this sense.) The symbolic chess player and the connectionistic chess player both perform at the same level; so do the symbolic and connectionistic factorializer. It seems evident that so little is known about how people actually learn chess and factorials that "developmental" support would hardly be a sound basis for choosing between the respective pairs of models (particularly because of the entry-point problem, since these skills are unlikely to be acquired in isolation). A much more principled way would be to see how they scaled up from this toy skill to more and more lifesize chunks of cognitive capacity. (It has to be conceded, however, that the connectionist models would have a marginal lead in this race, because they would already be using the same basic [statistical learning] algorithm for both tasks, and for all future tasks, presumably, whereas the symbolic approach would have to be making its rules on the fly, an increasingly heavy load.) I am agnostic about who would win this race; connectionism may well turn out to be side-lined early because of a higher-order Perceptron-like limit on its rule-learning ability, or because of principled unlearnability handicaps. Who knows? But the race is on. And it seems obvious that it's far too early to use developmental (or neural) evidence to decide which way to bet. It's not even clear that it will remain a 2-man race for long -- or that a finish might not be more likely as a collaborative relay. (Nor is the one who finishes first or gets farthest guaranteed to be the "real" winner -- even WITH developmental and neural support. But that's just normal underdetermination.) >> if you simply wire up a network to do exactly what a rule does, by >> making every decision about how to build the net (which features to >> use, what its topology should be, etc.) by consulting the rule-based >> theory, then that's a clear sense in which the network "implements" >> the rule What if you don't WIRE it up but TRAIN it up? That's the case at issue here, not the one you describe. (I would of course agree that if nets wire in a rule as a built-in constraint, that's theft, not honest toil, but that's not the issue!) 
Stevan Harnad harnad at mind.princeton.edu From chrisley.pa at Xerox.COM Wed Aug 31 14:03:00 1988 From: chrisley.pa at Xerox.COM (chrisley.pa@Xerox.COM) Date: 31 Aug 88 11:03 PDT Subject: The Four-Quadrant Problem In-Reply-To: alexis@marzipan.mitre.org (Alexis Wieland)'s message of Wed, 31 Aug 88 07:58:43 EDT Message-ID: <880831-112100-2321@Xerox> In the truly general case of an infinite plane, it seems that the four-quadrant problem cannot be solved by a (finite) two-layer network. But for any arbitrarily large, finite sub-plane, a two-layer (one hidden layer) network exists that can solve the four-quadrant problem, provided you use an inequality to interpret the output node. I think. Ron Chrisley Xerox PARC SSL Room 1620 3333 Coyote Hill Road Palo Alto, CA 94309 (415) 494-4740 From alexis at marzipan.mitre.org Wed Aug 31 07:58:43 1988 From: alexis at marzipan.mitre.org (Alexis Wieland) Date: Wed, 31 Aug 88 07:58:43 EDT Subject: The Four-Quadrant Problem Message-ID: <8808311158.AA00606@marzipan.mitre.org.> Let me try to remove some of the confusion I've caused. The four-quadrant problem is *my* name for an easily described problem which *requires* a neural net with three (or more) layers (e.g. 2+ hidden layers). The only relation of all this to the recent DARPA report is that they use an illustration of it in passing as an example of what a two-layer net can do (which I assert it cannot). The four-quadrant problem is to use a 2-input/1-output network and, assuming that the inputs represent xy points on a Cartesian plane, classify all the points in the first and third quadrants as being in one class and all the points in the second and fourth quadrants as being in the other class. For pragmatic reasons, you can allow a "don't care" region along each axis not to exceed a fixed width delta. This is illustrated below: A's are one class (i.e., one output (or range of outputs)), B's are the other class (i.e., another output (or non-overlapping range of outputs)), and *'s are don't cares. As always with this sort of problem, rotations and translations of the figure can be ignored.

    AAAAAAAAA***BBBBBBBBB
    AAAAAAAAA***BBBBBBBBB
    AAAAAAAAA***BBBBBBBBB
    AAAAAAAAA***BBBBBBBBB
    AAAAAAAAA***BBBBBBBBB
    *********************
    *********************
    BBBBBBBBB***AAAAAAAAA
    BBBBBBBBB***AAAAAAAAA
    BBBBBBBBB***AAAAAAAAA
    BBBBBBBBB***AAAAAAAAA
    BBBBBBBBB***AAAAAAAAA

Alexis Wieland alexis%yummy at gateway.mitre.org
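To make the construction Wieland alludes to concrete, here is a minimal sketch that hand-wires hard-threshold units in two hidden layers so that the quadrant labelling is computed exactly (points on the axes fall in the don't-care strip). The particular weights are just one illustrative choice; the sketch does not address the separate question Chrisley raises about what a single hidden layer can do on a bounded sub-plane with an inequality read-out, nor does it show that two hidden layers are necessary.

    def step(z):
        # Hard-threshold unit: output 1 if net input is positive, else 0.
        return 1.0 if z > 0 else 0.0

    def four_quadrant(x, y):
        # Hidden layer 1: sign detectors for the two input coordinates.
        h1 = step(x)                 # 1 iff x > 0
        h2 = step(y)                 # 1 iff y > 0
        # Hidden layer 2: "exactly one of h1, h2 is on" detectors.
        g1 = step(h1 - h2 - 0.5)     # on only for x > 0, y <= 0
        g2 = step(h2 - h1 - 0.5)     # on only for y > 0, x <= 0
        # Output: 0 for quadrants I and III, 1 for quadrants II and IV
        # (which class gets called A or B is immaterial).
        return step(g1 + g2 - 0.5)

    # Spot checks well away from the don't-care strip along the axes:
    assert four_quadrant( 3.0,  2.0) == 0.0   # quadrant I
    assert four_quadrant(-3.0,  2.0) == 1.0   # quadrant II
    assert four_quadrant(-3.0, -2.0) == 0.0   # quadrant III
    assert four_quadrant( 3.0, -2.0) == 1.0   # quadrant IV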