From hicks at cs.titech.ac.jp Sun Aug 1 16:14:14 1993 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Sun, 1 Aug 93 16:14:14 JST Subject: Multiple Models, Committee of nets etc... In-Reply-To: "Michael P. Perrone"'s message of Thu, 29 Jul 93 03:27:21 EDT <9307290727.AA19084@cns.brown.edu> Message-ID: <9308010714.AA25751@maruko.cs.titech.ac.jp> Michael P. Perrone writes >Tom Dietterich write: >> This analysis predicts that using a committee of very diverse >> algorithms (i.e., having diverse approximation errors) would yield >> better performance (as long as the committee members are competent) >> than a committee made up of a single algorithm applied multiple times >> under slightly varying conditions. > >and David Wolpert writes: >>There is a good deal of heuristic and empirical evidence supporting >>this claim. In general, when using stacking to combine generalizers, >>one wants them to be as "orthogonal" as possible, as Tom maintains. > >One minor result from my thesis shows that when the estimators are >orthogonal in the sense that > > E[n_i(x)n_j(x)] = 0 for all i<>j > >where n_i(x) = f(x) - f_i(x), f(x) is the target function, f_i(x) is >the i-th estimator and the expected value is over the underlying >distribution; then the MSE of the average estimator goes like 1/N >times the average of the MSE of the estimators where N is the number >of estimators in the population. > >This is a shocking result because all we have to do to get arbitrarily >good performance is to increase the size of our estimator population! >Of course in practice, the nets are correlated and the result is no >longer true. The matrix E[n_i(x)n_j(x)] may not be known but an estimate E[n'_i(x)n'_j(x)] may be obtained using some training data which is different from the training data used to train the generalizers in the first place. Here n'_i(x) = f'(x) - f_i(x), E[n'_i(x)] = 0, f'(x) is a training data, f_i(x) is the i-th estimator, and the expected value is over the training data. Take the eigenvectors (with non-zero eigenvalues) of E[n'_i(x)n'_j(x)] and you have a set of generalizers (each a linear combination of the original generalizers) which are orthogonal and uncorrelated over the training data, i.e. E[n'_i(x)n'_j(x)] = 0 for all i<>j. They can even be normalized by their eigenvalues so that E[n'_i(x)n'_j(x)] = 1 for all i==j. To summarize, in practice the generalizers can be de-correlated (to the extent that they are linearly independent) by finding new generalizers composed of appropriate linear sums of the originals. I have an unrelated comment regarding Drucker Harris' earlier mail about using synthetic data to improve performance. Wouldn't it be true to say that if you had a choice between learning with N synthetically created data and learning with N novel training data that the latter is, on average, going to give better results? If so, then using synth data is a way to stretch your training data; something like potato helper.  From jim at hydra.maths.unsw.EDU.AU Sun Aug 1 19:51:14 1993 From: jim at hydra.maths.unsw.EDU.AU (jim@hydra.maths.unsw.EDU.AU) Date: Mon, 2 Aug 93 09:51:14 +1000 Subject: committees Message-ID: <9308012351.AA15492@hydra.maths.unsw.EDU.AU> A small caveat about when it is good to average different estimates of an unknown quantity: If you have a fairly accurate and a fairly inaccurate way of estimating something, it is obviously not good to take their simple average (that is, half of one plus half of the other). 
The correct weighting of the estimates is in inverse proportion to their variances (that is, keep closer to the more accurate one). (At least, that is the correct weighting if the estimates are independent: if they are correlated, it is more complicated, but not much more). Proofs are easy, and included in the ref below:

R. Templeton & J. Franklin, `Adaptive information and animal behaviour', Evolutionary Theory 10 (Dec 1992): 145-155.

(Note that this concerns inaccurate estimates, not biased ones, as some previous posters have been considering). Of course, averaging and correlations are very easy calculations for neural nets.

Some similar ideas have been studied in connection with "sensor fusion" for robots:

Journal of Robotic Systems 7 (3): (1990), Special issue on multisensor integration and fusion for intelligent robots, ed. R.C. Luo.

Interesting work on how real committees combine information is reviewed in:

D. Bunn & G. Wright, `Interaction of judgemental and statistical forecasting methods: issues and analysis', Management Science 37 (1991): 501.

James Franklin
School of Mathematics
University of New South Wales

 From mpp at cns.brown.edu Mon Aug 2 17:54:29 1993
From: mpp at cns.brown.edu (Michael P. Perrone)
Date: Mon, 2 Aug 93 17:54:29 EDT
Subject: Multiple Models, Committee of nets etc...
Message-ID: <9308022154.AA00323@cns.brown.edu>

Joydeep Ghosh writes:
> in our experiments, the difference between simple averaging
> and the best among other arbitration mechanisms does not
> seem statistically significant, thus supporting Waibel and
> Hampshire's observations. The combination of
> networks trained on different feature vectors, on the other
> hand leads to 15-25% reduction in errors on a very difficult data set.

The result I discussed in a previous posting (that there is a 1/n relation between the MSE of the averaged estimator and the avg. population MSE) helps explain this result in the following terms: Averaging is more effective when the estimates are more distinct. Thus in the example that Joydeep gives, the fact that different features were used to generate different estimates suggests that those estimates will be distinct (unless the features carry the same information). Also, using fewer features lets us use smaller nets, which helps avoid problems like over-fitting and the curse of dimensionality.

-Michael
--------------------------------------------------------------------------------
Michael P. Perrone                              Email: mpp at cns.brown.edu
Institute for Brain and Neural Systems          Tel: 401-863-3920
Brown University                                Fax: 401-863-3934
Providence, RI 02912

 From mpp at cns.brown.edu Tue Aug 3 01:45:18 1993
From: mpp at cns.brown.edu (Michael P. Perrone)
Date: Tue, 3 Aug 93 01:45:18 EDT
Subject: Committees
Message-ID: <9308030545.AA01131@cns.brown.edu>

David Wolpert writes:
--> Many of the results in the literature which appear to dispute this
--> are simply due to use of an error function which is not restricted to
--> being off-training set. In other words, there's always a "win"
--> if you perform rationally on the training set (e.g., reproduce it
--> exactly, when there's no noise), if your error function gives you
--> points for performing rationally on the training set. In a certain
--> sense, this is trivial, and what's really interesting is off-training
--> set behavior. In any case, this automatic on-training set win is all
--> those aforementioned results refer to; in particular, they imply essentially
--> nothing concerning performance off of the training set.

In the case of averaging for MSE optimization (the meat and potatoes of neural networks) and any other convex measure, the improvement due to averaging is independent of the distribution - on-training or off-. It depends only on the topology of the optimization measure.

It is important to note that this result does NOT say the average is better than any individual estimate - only better than the average population performance. For example, if one had a reliable selection criterion for deciding which element of a population of estimators was the best and that estimator was better than the average estimator, then just choose the better one. (Going one step further, simply use the selection criterion to choose the best estimator from all possible weighted averages of the elements of the population.)

As David Wolpert pointed out, any estimator can be confounded by a pathological data sample and therefore there doesn't exist a *guaranteed* method for deciding which estimator is the best from a population in all cases. Weak (as opposed to guaranteed) selection criteria exist in the form of cross-validation (in all of its flavors). Coupling cross-validation with averaging is a good idea since one gets the best of both worlds, particularly for problems with insufficient data.

I think that another very interesting direction for research (as David Wolpert alluded to) is the investigation of more reliable selection criteria.

-Michael

 From bernasch at forwiss.tu-muenchen.de Tue Aug 3 03:41:45 1993
From: bernasch at forwiss.tu-muenchen.de (Jost Bernasch)
Date: Tue, 3 Aug 1993 09:41:45 +0200
Subject: weighting of estimates
In-Reply-To: jim@hydra.maths.unsw.EDU.AU's message of Mon, 2 Aug 93 09:51:14 +1000 <9308012351.AA15492@hydra.maths.unsw.EDU.AU>
Message-ID: <9308030741.AA29386@forwiss.tu-muenchen.de>

James Franklin writes:
> If you have a fairly accurate and a fairly inaccurate way of estimating
> something, it is obviously not good to take their simple average (that
> is, half of one plus half of the other). The correct weighting of the
> estimates is in inverse proportion to their variances (that is, keep
> closer to the more accurate one).

Of course this is the correct weighting. Since the 60s this has been done very successfully with the well-known "Kalman Filter". In this theory the optimal combination of knowledge sources is described and proved in detail. See the original work

@article{Kalman:60,
  AUTHOR  = {R.E. Kalman},
  TITLE   = "A New Approach to Linear Filtering and Prediction Problems.",
  VOLUME  = 12,
  number  = 1,
  PAGES   = {35--45},
  JOURNAL = "Trans. ASME, series D, J. Basic Eng.",
  YEAR    = 1960
}

some neural network literature concerning this subject

@Article{WatanabeTzafestas:90,
  author   = "Watanabe and Tzafestas",
  title    = "Learning Algorithms for Neural Networks with the Kalman Filter",
  journal  = JIRS,
  year     = 1990,
  volume   = 3,
  number   = 4,
  pages    = "305-319",
  keywords = "kalman, neural net"
}
@string{JIRS = {Journal of Intelligent and Robotic Systems}}

and a very good and practice oriented book

@book{Gelb:74,
  AUTHOR    = "A. Gelb",
  TITLE     = "Applied {O}ptimal {E}stimation",
  PUBLISHER = "{M.I.T} {P}ress, {C}ambridge, {M}assachusetts",
  YEAR      = "1974"
}

(At least, that is the correct >weighting if the estimates are independent: if they are correlated, >it is more complicated, but not much more).
Proofs are easy, and included >in the ref below: For proofs and extensions to non-linear filtering and correlated weights see the control theory literature. A lot of work is already done! -- Jost Jost Bernasch Bavarian Research Center for Knowledge-Based Systems Orleansstr. 34, D-81667 Muenchen , Germany bernasch at forwiss.tu-muenchen.de  From edelman at wisdom.weizmann.ac.il Tue Aug 3 16:23:11 1993 From: edelman at wisdom.weizmann.ac.il (Edelman Shimon) Date: Tue, 3 Aug 93 23:23:11 +0300 Subject: TR on representation with receptive fields available Message-ID: <9308032023.AA23457@wisdom.weizmann.ac.il> The following TR is available via anonymous ftp from eris.wisdom.weizmann.ac.il (132.76.80.53), as /pub/rfs-for-recog.ps.Z Representation with receptive fields: gearing up for recognition Weizmann Institute CS-TR 93-09 Yair Weiss and Shimon Edelman Abstract: Receptive fields are probably the most prominent and ubiquitous computational mechanism employed by biological information processing systems. We report an attempt to understand the representational capabilities of the kind of receptive fields found in mammalian vision motivated by the assumption that the successive stages of processing remap the retinal representation space in a manner that makes objectively similar stimuli (e.g., different views of the same 3D object) closer to each other, and dissimilar stimuli farther apart. We present theoretical analysis and computational experiments that compare the similarity between stimuli as they are represented at the successive levels of the processing hierarchy, from the retina to the nonlinear cortical units. Our results indicate that population-based codes do convey information that seems lost in the activities of the individual receptive fields, and that at the higher levels of the hierarchy objects may be represented in a form that is more useful for visual recognition. This finding may, therefore, explain the success of previous empirical approaches to object recognition that employed representation by localized receptive fields.  From jim at hydra.maths.unsw.EDU.AU Wed Aug 4 02:32:13 1993 From: jim at hydra.maths.unsw.EDU.AU (jim@hydra.maths.unsw.EDU.AU) Date: Wed, 4 Aug 93 16:32:13 +1000 Subject: weighting of estimates Message-ID: <9308040632.AA07933@hydra.maths.unsw.EDU.AU> bernasch at forwiss.tu-muenchen.de (Jost Bernasch) writes: >James Franklin writes: >> If you have a fairly accurate and a fairly inaccurate way of estimating >>something, it is obviously not good to take their simple average (that >>is, half of one plus half of the other). The correct weighting of the >>estimates is in inverse proportion to their variances (that is, keep >>closer to the more accurate one). > >Of course this is the correct weighting. Since the 60s this is done >very succesfully with the well-known "Kalman Filter". In this theory >the optimal combination of knowledge sources is described and >proofed in detail. Well, yes, in a way, but that's something like saying that the motion of your body can be derived from Einstein's equations of General Relativity. Too complicated. In particular, Kalman filters, and control theory generally, are about time-varying entities, and Kalman filters are an (essentially Bayesian) way of successively updating estimates of a (possibly time-varying) quantity (See R.J. Meinhold & N.D. Singpurwalla, `Understanding the Kalman filter', American Statistician 37 (1983): 123). 
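For concreteness, here is a minimal sketch (Python, with made-up numbers; none of this code comes from the thread) showing that for a single static quantity, one scalar Kalman-style measurement update is algebraically the same thing as the inverse-variance weighting under discussion:

# Sketch: combining two independent, unbiased estimates x1, x2 of the same
# scalar quantity, with variances v1 and v2.  The inverse-variance combination
# is identical to a single Kalman measurement update applied to a static state.
# All numbers below are illustrative.

def inverse_variance_combine(x1, v1, x2, v2):
    """Minimum-variance combination of two independent estimates."""
    w1 = (1.0 / v1) / (1.0 / v1 + 1.0 / v2)
    w2 = 1.0 - w1
    x = w1 * x1 + w2 * x2
    v = 1.0 / (1.0 / v1 + 1.0 / v2)   # combined variance is smaller than either
    return x, v

def kalman_update(x_prior, v_prior, z, r):
    """One scalar Kalman measurement update: prior (x_prior, v_prior),
    measurement z with noise variance r."""
    k = v_prior / (v_prior + r)        # Kalman gain
    x_post = x_prior + k * (z - x_prior)
    v_post = (1.0 - k) * v_prior
    return x_post, v_post

if __name__ == "__main__":
    x1, v1 = 10.2, 1.0    # fairly accurate estimate
    x2, v2 = 12.0, 9.0    # fairly inaccurate estimate
    print(inverse_variance_combine(x1, v1, x2, v2))   # (10.38, 0.9), weighted toward x1
    print(kalman_update(x1, v1, x2, v2))              # same numbers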
The situation I was considering, and what is relevant to committees, is much simpler (hence more general): how to combine estimates (possibly correlated) of a single unknown quantity. James Franklin Mathematics University of New South Wales  From Graham.Lamont at newcastle.ac.uk Wed Aug 4 12:41:36 1993 From: Graham.Lamont at newcastle.ac.uk (Graham Lamont) Date: Wed, 4 Aug 93 12:41:36 BST Subject: multiple models, hybrid estimation Message-ID: When I emailled Wray Buntine about his original posting on the subject of multiple models, I quipped: `Shhh.... dont tell everyone, they'll all want one!' (a multiple model) Little did I know everyman and his dog appears to have one already:) The recent postings and especially Michael Perrone's recent contribution(s) have persuaded me to sketch the extent of my work in this area and donate a FREE piece of Mathematica code. I mention Michael's work because it follows the same basic approach of general least squares as mine, and I agree with many of the points that he raises in his general discussion of hybrid estimation, such as the need for a completely general method, the utility of a closed form solution, and his novel description of distinct local minima in functional space as opposed to parameter space. However..... he says that for his method (GEM): >> 7) The *optimal* parameters of the ensemble estimator are given in closed >> form. I present a method in the same general spirit of Michael's that is slightly more optimal and general (and I am not claiming even this is the best!). It is based on the unconstrained least squares of the estimator poulation "design matrix" via SVD. 1 Generality: The technique utilises singular value decomposition (SVD), and hence avoids the problem of collinearity between estimators that can (and often does) occur in a population of estimators as mentioned by Michael. SVD happily copes with highly collinear or even duplicate estimators in the design matrix, without preprocessing/thresholding. 2 Optimality: The technique places no constraint on the value of the weights (MP [1] has sum=1 and also in the results he presents all w are 0 This seemed relevant. Please excuse the bandwidth if it's not. jim From: IN%"DFP10 at ALBANY.ALBANY.EDU" "Donald F. Parsons MD" 3-AUG-1993 05:38:36.51 To: IN%"hspnet-l at albnydh2.bitnet" "Rural Hospital Consulting Network" CC: Subj: Call for Papers: AIM-94 Spring Symposium ----------------------------Original message---------------------------- Call for Papers AAAI 1994 Spring Symposium: Artificial Intelligence in Medicine: Interpreting Clinical Data (March 21-23, 1994, Stanford University, Stanford, CA) The deployment of on-line clinical databases, many supplanting the traditional role of the paper patient chart, has increased rapidly over the past decade. The consequent explosion in the quality and volume of available clinical data, along with an ever more stringent medicolegal obligation to remain aware of all implications of these data, has created a substantial burden for the clinician. The challenge of providing intelligent tools to help clinicians monitor patient clinical courses, forecast likely prognoses, and discover new relational knowledge, is at least as large as that generated by the knowledge explosion which motivated earlier efforts in Artificial Intelligence in Medicine (AIM). 
Whereas many of the pioneering programs worked on small data sets which were entered interactively by knowledge engineers or clinicians, the current generation of programs have to act on raw data, unfiltered and unmediated by human beings. Interaction with human users typically only occurs on demand or on detection of clinically significant events. The emphasis of this symposium will be on methodologies that provide robust autonomous performance in data-rich clinical environments ranging from busy outpatient practices to operating rooms and intensive care units. Relevant topics include intelligent alarming (including anticipation and prevention of adverse clinical events), data abstraction, sensor validation, preliminary event classification, therapy advice, critiquing, and assistance in the establishment and execution of clinical treatment protocols. Detection of temporal and geographical patterns of disease manifestations and machine learning of clinical patterns are also of interest. Organizing committee Serdar Uckun, Co-chair (Stanford University) Isaac Kohane, Co-chair (Harvard Medical School) Enrico Coiera (Hewlett-Packard Laboratories/Bristol) Ramesh Patil (USC/Information Sciences Institute) Mario Stefanelli (Universita di Pavia) Format A large data sample will be made available to participants to serve as training and test sets for various approaches to information management and to provide a common domain of discourse. The sample will consist of two data sets: * A dense, high volume data set typical of a critical care environment. This data set will consist of hemodynamic measurements, mechanical ventilator settings, laboratory values including arterial blood gas measurements, and treatment information covering a 12-hour period of a patient with severe respiratory distress. Monitored parameters (10-15 channels of data) will be sampled and recorded at rates up to 1/10 Hz. The data set will be annotated with other clinically relevant data, physician's interpretations, and established diagnoses. * A large number of sparse data sets representative of outpatient environments. The data will include laboratory measurements, treatment information, and physical findings on a large sample of patients (50 to 100 patients) taken from the same disorder population. Each patient record will consist of several weeks' or months' worth of clinical information sampled at irregular intervals. Most of the cases will be made available to interested researchers to be used as training cases. For interested parties, a small percentage of cases will be made available two weeks prior to the symposium to be used as an optional testing set for various approaches. The data samples and accompanying clinical information will be available via ftp or e-mail server around August 15, 1993. Please contact the organizers at the addresses below for further information. The data will also be made available on diskettes to participants who do not have Internet access. It will be left to the discretion of the participants to use any subset of these samples to help focus their approaches and presentations. The data can also be used as test vehicles for their own research and to create sample programs for demonstration at the symposium. Participants do not have to use the data in order to participate. However, the program committee will favor presentations which exploit the provided data sets in their analyses. 
Submission process Potential participants are invited to submit abstracts no longer than 2 pages (< 1200 words) by October 15, 1993. The abstracts should outline methodology and indicate, if applicable, how the provided data may be used as a proof-of-principle for the discussed methodology. Electronic submissions are encouraged. The abstracts may be sent to in ASCII, RTF, or PostScript formats. Authors of accepted abstracts will be asked to submit a working paper by January 31, 1994. They will also be asked to prepare either a poster or an oral presentation. Submissions by mail Use this method ONLY IF you cannot submit an abstract electronically. Fax submissions will not be accepted. Send 6 copies of the abstract to: Serdar Uckun, MD, PhD Co-chair, AIM-94 Knowledge Systems Laboratory Stanford University 701 Welch Road, Bldg. C Palo Alto, CA 94304 U.S.A. Phone: [+1] (415) 723-1915 Calendar Abstracts due: October 15, 1993 Notification of authors by: November 15, 1993 Working papers due: January 31, 1994 Spring Symposium: March 21-23, 1994 Information For further information, please contact the co-chairs at the address above or (preferably) via e-mail at:  From hicks at cs.titech.ac.jp Thu Aug 5 11:33:00 1993 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Thu, 5 Aug 93 11:33:00 JST Subject: weighting of estimates In-Reply-To: Jost Bernasch's message of Tue, 3 Aug 1993 09:41:45 +0200 <9308030741.AA29386@forwiss.tu-muenchen.de> Message-ID: <9308050233.AA29633@maruko.cs.titech.ac.jp> Jost Bernasch writes: > >James Franklin writes: > > If you have a fairly accurate and a fairly inaccurate way of estimating > >something, it is obviously not good to take their simple average (that > >is, half of one plus half of the other). The correct weighting of the > >estimates is in inverse proportion to their variances (that is, keep > >closer to the more accurate one). > >Of course this is the correct weighting. Since the 60s this is done >very succesfully with the well-known "Kalman Filter". In this theory >the optimal combination of knowledge sources is described and >proofed in detail. > > (At least, that is the correct > >weighting if the estimates are independent: if they are correlated, > >it is more complicated, but not much more). Proofs are easy, and included > >in the ref below: > >For proofs and extensions to non-linear filtering and correlated >weights see the control theory literature. A lot of work is already >done! I think the comments about the Kalman filter are a bit off the mark. The Kalman filter is based on the mathematics of conditional expectation. However, the Kalman filter is designed to be used for time series. What makes the Kalman filter particularly useful is its recursive nature; a stream of observations may be processed (often in real time) to produce a stream of current estimates (or next estimates if you're trying to beat the stock market). Committees of networks may also use conditional expectation, but combining networks is not the same as processing time series of data. I think it is appropriate at this point to bring up 2 classical results concerning probability theory, conditional expectation, and wide sense conditional expectation. (Wide sense conditional expectation uses the same formulas as conditional expectation. "Wide sense' merely serves to emphasize that the distribution is not assumed to be normal. 'Conditional expectation' is used in the case where the underlying distribution is assumed to be normal.) 
(1) When the objective function is to minimize the mean squared error over the training data, the wide sense conditional expectation is the best linear predictor, regardless of the original distribution.

(2) If the original distribution is normal, and the objective function is to minimize the MSE over the >entire< distribution (both on-training and off-training), then the conditional expectation is the best predictor, linear or otherwise.

There are 3 important factors here.
[1]: Underlying distribution (of network outputs): normal? not normal?
[2]: Objective function (assume MSE): on-training? off-training?
[3]: Predictor: linear? non-linear?

{1} [1:normal] => [2:off-training],[3:linear]
Neural nets (as opposed to systolic arrays) are needed because the world is full of non-normal distributions. But that doesn't mean that the outputs of non-linear networks don't have joint normal distributions (over off-training data). Perhaps the non-linearities have been successfully ironed out by the non-linear networks, leaving only linear (or nearly linear) errors to be corrected. In that case we can refer to result (2) to build the optimal off-training predictor for the given committee of networks.

{2} [1:not normal] and [2:on-training] and [3:linear] => best predictor is WSE
If the distribution of network outputs is not normal, and we use an on-training criterion, then by virtue of (1), the best linear predictor is the wide sense conditional expectation.

{3} [1:not normal] and [2:off-training] and [3:non-linear] => research
In case {2}, since [1:not normal], <1> better on-training results may be obtained using some non-linear predictor, <2> better on- or off-training results may be obtained using some different criterion, or <3> both <1> and <2> together. The problem is of course to find such criteria and non-linear predictors. The existence of a priori knowledge can play an important role here; for example, adding a term to penalize the complexity of output functions.

In conclusion, if {1} is true, that is, the networks have captured the non-linearities and the network outputs have joint normal (or nearly normal) distributions, we're home free. Otherwise we ought to think about {3}, non-linear predictors and alternative criteria. Option {2}, using the WSE, the best performing linear predictor in terms of MSE on the training data, is useful to get the job done, but is only optimal in a limited sense.

Craig Hicks           hicks at cs.titech.ac.jp
Ogawa Laboratory, Dept. of Computer Science
Tokyo Institute of Technology, Tokyo, Japan
lab: 03-3726-1111 ext. 2190    home: 03-3785-1974
fax: +81(3)3729-0685 (from abroad), 03-3729-0685 (from Japan)

 From mpp at cns.brown.edu Thu Aug 5 15:13:44 1993
From: mpp at cns.brown.edu (Michael P. Perrone)
Date: Thu, 5 Aug 93 15:13:44 EDT
Subject: committees
Message-ID: <9308051913.AA13266@cns.brown.edu>

Scott Farrar writes:
--> John Hampshire characterized a committee as a collection of biased
--> estimators; the idea being that a collection of many different kinds of
--> bias might constitute an unbiased estimator. I was wondering if anyone
--> had any ideas about how this might be related to, supported by, or refuted
--> by the Central Limit Theorem. Could experimental variances or confounds
--> be likened to "biases", and if so, do these "average out" in a manner which
--> can give us a useful mean or useful estimator?
I think that this is a very interesting point because, for averaging with MSE optimization, it is possible to show using the strong law of large numbers that the bias of the average estimator converges to the expected bias of any individual estimator while the variance converges to zero. Thus the only way to cancel existing bias using averaging is to average two (or more) different populations from two (or more) estimators which are (somehow) known to have complementary bias. The trick is of course the "somehow"... Any ideas? -Michael -------------------------------------------------------------------------------- Michael P. Perrone Email: mpp at cns.brown.edu Institute for Brain and Neural Systems Tel: 401-863-3920 Brown University Fax: 401-863-3934 Providence, RI 02912  From wray at ptolemy.arc.nasa.gov Thu Aug 5 19:37:42 1993 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Thu, 5 Aug 93 16:37:42 PDT Subject: committees In-Reply-To: "Michael P. Perrone"'s message of Thu, 5 Aug 93 15:13:44 EDT <9308051913.AA13266@cns.brown.edu> Message-ID: <9308052337.AA04745@ptolemy.arc.nasa.gov> I'm not convinced that the notion of an "unbiased estimator" is useful here. It comes from classical statistics and is really a means of justifying the choice of an estimator for lack of better ideas. An estimator is "unbiased" if the average of the estimator based on all the other samples which we might have seen (but didn't) is equal to the "truth". Notice that unbiased estimators and the use of Occam's razor conflict. We all routinely throw away an "unbiased" neural network, i.e. the best fitting network, in favor of a smoother, simpler network, i.e. by early stopping, weight decay, ...., which is very clearly "biased". So I think its a great thing to be biased. One reason for averaging is because we have several quite different biased networks that we think are reasonable, so like any good gambler, we hedge our bets. Of course, averaging is also standard Bayesian practice, i.e. an obvious result of the mathematics. ---------- Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 269-2 fax: (415) 604 3594 Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov  From cohn at psyche.mit.edu Fri Aug 6 11:28:11 1993 From: cohn at psyche.mit.edu (David Cohn) Date: Fri, 6 Aug 93 11:28:11 EDT Subject: Call for Participation: Workshop on Exploration Message-ID: <9308061528.AA06177@psyche.mit.edu> I am helping organize the following one-day workshop during the post-NIPS workshops in Vail, Colorado, on December 3, 1993. We would like to hear from people interested in participating in the workshop, either formally, as a presenter, or informally, as an attendee. Even if you will not be able to attend, if you have work which you feel is relevant, and would like to see discussed, please contact me at the email address below. Given the limited time available, we will not be able to present *every* approach, but we hope to cover a broad range of approaches, both in formal presentations, and in informal discussion, Many thanks in advance, -David Cohn (cohn at psyche.mit.edu) ====================== begin workshop announcement ===================== Robot Learning II: Exploration and Continuous Domains A NIPS '93 Workshop David Cohn Dept. of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02138 cohn at psyche.mit.edu The goal of this one-day workshop will be to provide a forum for researchers active in the area of robot learning and related fields. 
Due to the limited time available, we will focus on two major issues: efficient exploration of a learner's state space, and learning in continuous domains. Robot learning is characterized by sensor noise, control error, dynamically changing environments and the opportunity for learning by experimentation. A number of approaches, such as Q-learning, have shown great practical utility learning under these difficult conditions. However, these approaches have only been proven to converge to a solution if all states of a system are visited infinitely often. What has yet to be determined is whether we can efficiently explore a state space so that we can learn without having to visit every state an infinite number of times, and how we are to address problems on continuous domains, where there are effectively an infinite number of states to be visited. This workshop is intended to serve as a followup to last year's post-NIPS workshop on robot learning. The two problems to be addressed this year were identified as two (of the many) crucial issues facing the field. The morning session of the workshop will consist of short presentations discussing theoretical approaches to exploration and to learning in continuous domains, followed by general discussion guided by a moderator. The afternoon session will center on practical and/or heuristic approaches to these problems in the same format. As time permits, we may also attempt to create an updated "Where do we go from here?" list, like that drawn up in last year's workshop. The targeted audience for the workshop are those researchers who are interested in robot learning, exploration, or active learning in general. We expect to draw an eclectic audience, so every attempt will be made to ensure that presentations are accessible to people without any specific background in the field.  From sontag at control.rutgers.edu Fri Aug 6 17:35:16 1993 From: sontag at control.rutgers.edu (Eduardo Sontag) Date: Fri, 6 Aug 93 17:35:16 EDT Subject: Expository Tech Report on Neural Nets Available by FTP Message-ID: <9308062135.AA06104@control.rutgers.edu> As notes for a short course given at the 1993 European Control Conference this summer, I prepared an expository introduction to two related topics: 1. Some mathematical results on "neural networks". 2. "Neurocontrol" and "learning control". The choice of topics was heavily influenced by my interests, but some readers may still find the material useful. The two parts are essentially independent. In particular, the part on mathematical results does not require any knowledge of (nor interest in) control theory. An *extended* version of the paper which appeared in the conference proceedings is now available as a tech report. This report, in postscript form, can be obtained by anonymous FTP. Retrieval instructions are as follows: yourhost> ftp siemens.com Connected to siemens.com. 220 siemens FTP server (SunOS 4.1) ready. Name (siemens.com:sontag): anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> cd pub/learning/TechReports 250 CWD command successful. ftp> bin 200 Type set to I. ftp> get Sontag9302.ps.Z 200 PORT command successful. 150 Binary data connection for Sontag9302.ps.Z (128.6.62.9,1600) (114253 bytes) 226 Binary Transfer complete. local: Sontag9302.ps.Z remote: Sontag9302.ps.Z 114253 bytes received in 24 seconds (4.6 Kbytes/s) ftp> quit 221 Goodbye. 
yourhost> uncompress Sontag9302.ps.Z yourhost> lpr Sontag9302.ps (or however you print PostScript) ****** Please note: I am not able to send hardcopy. ****** -- Eduardo D. Sontag  From liaw%dylink.usc.edu at usc.edu Fri Aug 6 18:45:39 1993 From: liaw%dylink.usc.edu at usc.edu (Jim Liaw) Date: Fri, 6 Aug 93 15:45:39 PDT Subject: Workshop on Neural Architectures and Distributed AI Message-ID: <9308062245.AA23804@dylink.usc.edu> Please note the change in deadline of submission of abstracts. ------ The Center for Neural Engineering University of Southern California announces a Workshop on Neural Architectures and Distributed AI: >From Schema Assemblages to Neural Networks October 19-20, 1993 [This Workshop was previously scheduled for April 1993] Program Committee: Michael Arbib (Organizer), George Bekey, Damian Lyons, Paul Rosenbloom, and Ron Sun To design complex technological systems, we need a multilevel methodology which combines a coarse- grain analysis of cooperative or distributed computation (we shall refer to the computing agents at this level as "schemas") with a fine-grain model of flexible, adaptive computation (for which neural networks provide a powerful general paradigm). Schemas provide a language for distributed artificial intelligence and perceptual robotics which is "in the style of the brain", but at a relatively high level of abstraction relative to neural networks. We seek (both at the level of schema asemblages, and in terms of "modular" neural networks) a distributed model of computation, supporting many concurrent activities for recognition of objects, and the planning and control of different activities. The use, representation, and recall of knowledge is mediated through the activity of a network of interacting computing agents which between them provide processes for going from a particular situation and a particular structure of goals and tasks to a suitable course of action. This action may involve passing of messages, changes of state, instantiation to add new schema instances to the network, deinstantiation to remove instances, and may involve self-modification and self- organization. Schemas provide a form of knowledge representation which differs from frames and scripts by being of a finer granularity. Schema theory is generative: schemas may well be linked to others to provide yet more comprehensive schemas, whereas frames tend to "build in" from the overall framework. The analysis of interacting computing agents (the schema instances) is intermediate between the overall specification of some behavior and the neural networks that subserve it. The Workshop will focus on different facets of this multi-level methodology. While the emphasis will be on technological systems, papers will also be accepted on biological and cognitive systems. 
Submission of Papers A list of sample topics for contributions is as follows, where a hybrid approach means one in which the abstract schema level is integrated with neural or other lower level models: Schema Theory as a description language for neural networks Modular neural networks Alternative paradigms for modeling symbolic and subsymbolic knowledge Hierarchical and distributed representations: adaptation and coding: Linking DAI to Neural Networks to Hybrid Architecture Formal Theories of Schemas Hybrid approaches to integrating planning & reaction Hybrid approaches to learning Hybrid approaches to commonsense reasoning by integrating neural networks and rule-based reasoning (using schemas for the integration) Programming Languages for Schemas and Neural Networks Schema Theory Applied in Cognitive Psychology, Linguistics, and Neuroscience Prospective contributors should send a five-page extended abstract, including figures with informative captions and full references - a hard copy, either by regular mail or fax - by August 30, 1993 to Michael Arbib, Center for Neural Engineering, University of Southern California, Los Angeles, CA 90089-2520, USA [Tel: (213) 740-9220, Fax: (213) 746-2863, arbib at pollux.usc.edu]. Please include your full address, including fax and email, on the paper. In accepting papers submitted in response to this Call for Papers, preference will be given to papers which present practical examples of, theory of, and/or methodology for the design and analysis of complex systems in which the overall specification or analysis is conducted in terms of a network of interacting schemas, and where some but not necessarily all of the schemas are implemented in neural networks. Papers which present a single neural network for pattern recognition ("perceptual schema") or pattern generation ("motor schema") will not be accepted. It is the development of a methodology to analyze the interaction of multiple functional units that constitutes the distinctive thrust of this Workshop. Notification of acceptance or rejection will be sent by email no later than September 1, 1993. There are currently no plans to issue a formal proceedings of full papers, but (revised versions) of accepted abstracts received prior to October 1, 1993 will be collected with the full text of the Tutorial in a CNE Technical Report which will be made available to registrants at the start of the meeting. A number of papers have already been accepted for the Workshop. 
These include the following: Arbib: Schemas and Neural Networks: A Tutorial Introduction to Integrating Symbolic and Subsymbolic Approaches to Cooperative Computation Arkin: Reactive Schema-based Robotic Systems: Principles and Practice Heenskerk and Keijzer: A Real-time Neural Implementation of a Schema Driven Toy-Car Leow and Miikkulainen, Representing and Learning Visual Schemas in Neural Networks for Scene Analysis Lyons & Hendriks: Describing and analysing robot behavior with schema theory Murphy, Lyons & Hendriks: Visually Guided Multi- Fingered Robot Hand Grasping as Defined by Schemas and a Reactive System Sun: Neural Schemas and Connectionist Logic: A Synthesis of the Symbolic and the Subsymbolic Weitzenfeld: Hierarchy, Composition, Heterogeneity, and Multi-granularity in Concurrent Object-Oriented Programming for Schemas and Neural Networks Wilson & Hendler: Neural Network Software Modules Bonus Event: The CNE Research Review: Monday, October 18, 1993 The CNE Review will present a day-long sampling of CNE research, with talks by faculty, and students, as well as demos of hardware and software. Special attention will be paid to talks on, and demos in, our new Autonomous Robotics Lab and Neuro-Optical Computing Lab. Fully paid registrants of the Workshop are entitled to attend the CNE Review at no extra charge. Registration The registration fee of $150 ($40 for qualified students who include a "certificate of student status" from their advisor) includes a copy of the abstracts, coffee breaks, and a dinner to be held on the evening of October 18th. Those wishing to register should send a check payable to "Center for Neural Engineering, USC" for $150 ($40 for students and CNE members) together with the following information to Paulina Tagle, Center for Neural Engineering, University of Southern California, University Park, Los Angeles, CA 90089-2520, USA. --------------------------------------------------- SCHEMAS AND NEURAL NETWORKS Center for Neural Engineering, USC October 19-20, 1993 NAME: ___________________________________________ ADDRESS: _________________________________________ PHONE NO.: _______________ FAX:___________________ EMAIL: ___________________________________________ I intend to submit a paper: YES [ ] NO [ ] I wish to be registered for the CNE Research Review: YES [ ] NO [ ] Accommodation Attendees may register at the hotel of their choice, but the closest hotel to USC is the University Hilton, 3540 South Figueroa Street, Los Angeles, CA 90007, Phone: (213) 748-4141, Reservation: (800) 872-1104, Fax: (213) 7480043. A single room costs $70/night while a double room costs $75/night. Workshop participants must specify that they are "Schemas and Neural Networks Workshop" attendees to avail of the above rates. Information on student accommodation may be obtained from the Student Chair, Jean-Marc Fellous, fellous at pollux.usc.edu.  From sims at pdesds1.scra.org Mon Aug 9 07:39:31 1993 From: sims at pdesds1.scra.org (Jim Sims) Date: Mon, 9 Aug 93 07:39:31 EDT Subject: [fwd: jbeard@aip.org: bee learning in Nature] Message-ID: <9308091139.AA02487@pdesds1.noname> Some cross-disciplinary, ah, pollination. 
jim From greiner at learning.siemens.com Mon Aug 9 14:59:26 1993 From: greiner at learning.siemens.com (Russell Greiner) Date: Mon, 9 Aug 93 14:59:26 EDT Subject: CLNL'93 Schedule Message-ID: <9308091859.AA05371@eagle.siemens.com> *********************************************************** * CLNL'93 -- Computational Learning and Natural Learning * * Provincetown, Massachusetts * * 10-12 September 1993 * *********************************************************** CLNL'93 is the fourth of an ongoing series of workshops designed to bring together researchers from a diverse set of disciplines --- including computational learning theory, AI/machine learning, connectionist learning, statistics, and control theory --- to explore issues at the intersection of theoretical learning research and natural learning systems. The schedule of presentations appears below, followed by logistics and information on registration ================ ** CLNL'93 Schedule (tentative) ** ======================= Thursday 9/Sept/93: 6:30-9:00 (optional) Ferry (optional): Boston to Provincetown [departs Boston Harbor Hotel, 70 Rowes Wharf on Atlantic Avenue] Friday 10/Sept/93 [CLNL meetings, at Provincetown Inn] 9 - 9:15 Opening remarks 9:15-10:15 Scaling Up Machine Learning: Practical and Theoretical Issues Thomas Dietterich [Oregon State Univ] (invited talk, see abstract below) 10:30-12:30 Paper session 1 What makes derivational analogy work: an experience report using APU Sanjay Bhansali [Stanford]; Mehdi T. Harandi [Univ of Illinois] Scaling Up Strategy Learning: A Study with Analogical Reasoning Manuela M. Veloso [CMU] Learning Hierarchies in Stochastic Domains Leslie Pack Kaebling [Brown] Learning an Unknown Signalling Alphabet Edward C. Posner, Eugene R. Rodemich [CalTech/JPL] 12:30- 2 Lunch (on own) Unscheduled TIME ( Whale watching, beach walking, ... ) ( Poster set-up time; Poster preview (perhaps) ) Dinner (on own) 7 - 10 Poster Session [16 posters] (Hors d'oeuvres) Induction of Verb Translation Rules from Ambiguous Training and a Large Semantic Hierarchy Hussein Almuallim, Yasuhiro Akiba, Takefumi Yamazaki, Shigeo Kaneda [NTT Network Information Systems Lab.] What Cross-Validation Doesn't Say About Real-World Generalization Gunner Blix, Gary Bradshaw, Larry Rendall [Univ of Illinois] Efficient Learning of Regular Expressions from Approximate Examples Alvis Brazma [Univ of Latvia] Capturing the Dynamics of Chaotic Time Series by Neural Networks Gurtavo Deco, Bernd Schurmann [Siemens AG] Learning One-Dimensional Geometrical Patterns Under One-Sided Random Misclassification Noise Paul Goldberg [Sandia National Lab]; Sally Goldman [Washington Univ] Adaptive Learning of Feedforward Control Using RBF Network ... Dimitry M Gorinevsky [Univ of Toronto] A practical approach for evaluating generalization performance Marjorie Klenin [North Carolina State Univ] Scaling to Domains with Many Irrelevant Features Pat Langley, Stephanie Sage [Siemens Corporate Research] Variable-Kernel Similarity Metric Learning David G. Lowe [Univ British Columbia] On-Line Training of Recurrent Neural Networks with Continuous Topology Adaptation Dragan Obradovic [Siemens AG] N-Learners Problem: System of PAC Learners Nageswara Rao, E.M. Oblow [Engineering Systems/Advanced Research] Soft Dynamic Programming Algorithms: Convergence Proofs Satinder P. 
Singh [Univ of Mass] Integrating Background Knowledge into Incremental Concept Formation Leon Shklar [Bell Communications Research]; Haym Hirsh [Rutgers] Learning Metal Models Astro Teller [Stanford] Generalized Competitive Learning and then Handling of Irrelevant Features Chris Thornton [Univ of Sussex] Learning to Ignore: Psychophysics and Computational Modeling of Fast Learning of Direction in Noisy Motion Stimuli Lucia M. Vaina [Boston Univ], John G. Harris [Univ of Florida] Saturday 11/Sept/93 [CLNL meetings, at Provincetown Inn] 9:00-10:00 Current Tree Research Leo Breiman [UCBerkeley] (invited talk, see abstract below) 10:30-12:30 Paper session 2 Initializing Neural Networks using Decision Trees Arunava Banerjee [Rutgers] Exploring the Decision Forest Patrick M. Murphy, Michael Pazzani [UC Irvine] What Do We Do When There Is Outrageous Data Points in the Data Set? - Algorithm for Robust Neural Net Regression Yong Liu [Brown] A Comparison of RBF and MLP Networks for Classification of Biomagnetic Fields Martin F. Schlang, Ralph Neunier, Klaus Abraham-Fuchs [Siemens AG] 12:30- 2 Lunch (on own) 2:30- 3:30 TBA (invited talk) Yann le Cun [ATT] 4:00- 6:00 Paper session 3 On Learning the Neural Network Architecture: An Average Case Analysis Mostefa Golea [Univ of Ottawa] Fast (Distribution Specific) Learning Dale Schuurmans [Univ of Toronto] Computational capacity of single neuron models Anthony Zador [Yale Univ School of Medicine] Probalistic Self-Structuring and Learning A.D.M. Garvin, P.J.W. Rayner [Cambridge] 7:00- 9 Banquet dinner Sunday 12/Sept/93 [CLNL meetings, at Provincetown Inn] 9 -11 Paper session 4 Supervised Learning from real and Discrete Incomplete Data Zoubin Ghaharamani, Michael Jordan [MIT] Model Building with Uncertainty in the Independent Variable Volker Tresp, Subutai Ahmad, Ralph Neuneier [Siemens AG] Supervised Learning using Unclassified and Classified Examples Geoff Towell [Siemens Corp. Res.] Learning to Classify Incomplete Examples Dale Schuurmans [Univ of Toronto]; R. Greiner [Siemens Corp. Res.] 11:30 -12:30 TBA (invited talk) Ron Rivest [MIT] 12:30 - 2 Lunch (on own) 3:30 - 6:30 Ferry (optional): Provincetown to Boston Depart from Boston (on own) ------ ------ Scaling Up Machine Learning: Practical and Theoretical Issues Thomas G. Dietterich Oregon State University and Arris Pharmaceutical Corporation Supervised learning methods are being applied to an ever-expanding range of problems. This talk will review issues arising in these applications that require further research. The issues can be organized according to the problem-solving task, the form of the inputs and outputs, and any constraints or prior knowledge that must be considered. For example, the learning task often involves extrapolating beyond the training data in ways that are not addressed in current theory or engineering experience. As another example, each training example may be represented by a disjunction of feature vectors, rather than a unique feature vector as is usually assumed. More generally, each training example may correspond to a manifold of feature vectors. As a third example, background knowledge may take the form of constraints that must be satisfied by any hypothesis output by a learning algorithm. The issues will be illustrated using examples from several applications including recent work in computational drug design and ecosystem modelling. 
-------- Current Tree Research Leo Breiman Deptartment of Statistics University of California, Berkeley This talk will summarize current research by myself and collaborators into methods of enhancing tree methodology. The topics covered will be: 1) Tree optimization 2) Forming features 3) Regularizing trees 4) Multiple response trees 5) Hyperplane trees These research areas are in a simmer. They have been programmed and are undergoing testing. The results are diverse. -------- -------- Programme Committee: Andrew Barron, Russell Greiner, Tom Hancock, Steve Hanson, Robert Holte, Michael Jordan, Stephen Judd, Pat Langley, Thomas Petsche, Tomaso Poggio, Ron Rivest, Eduardo Sontag, Steve Whitehead Workshop Sponsors: Siemens Corporate Research and MIT Laboratory of Computer Science ================ ** CLNL'93 Logistics ** ======================= Dates: The workshop begins at 9am Friday 10/Sept, and concludes by 3pm Sunday 12/Sept, in time to catch the 3:30pm Provincetown--Boston ferry. Location: All sessions will take place in the Provincetown Inn (800 942-5388); we encourage registrants to stay there. Provincetown Massachusetts is located at the very tip of Cape Cod, jutting into the Atlantic Ocean. Transportation: We have rented a ship from The Portuguese Princess to transport CLNL'93 registrants from Boston to Provincetown on Thursday 9/Sept/93, at no charge to the registrants. We will also supply light munchies en route. This ship will depart from the back of Boston Harbor Hotel, 70 Rowes Wharf on Atlantic Avenue (parking garage is 617 439-0328); tentatively at 6:30pm. If you are interested in using this service, please let us know ASAP (via e-mail to clnl93 at learning.scr.siemens.com) and also tell us whether you be able to make the scheduled 6:30pm departure. (N.b., this service replaces the earlier proposal, which involved the Bay State Cruise Lines.) The drive from Boston to Provincetown requires approximately two hours. There are cabs, busses, ferries and commuter airplanes (CapeAir, 800 352-0714) that service this Boston--Provincetown route. The Hyannis/Plymouth bus (508 746-0378) leaves Logan Airport at 8:45am, 11:45am, 2:45pm, 4:45pm on weekdays, and arrives in Provincetown about 4 hours later; its cost is $24.25. For the return trip (only), Bay State Cruise Lines (617 723-7800) runs a ferry that departs Provincetown at 3:30pm on Sundays, arriving at Commonwealth Pier in Boston Harbor at 6:30pm; its cost is $15/person, one way. Inquiries: For additional information about CLNL'93, contact clnl93 at learning.scr.siemens.com or CLNL'93 Workshop Learning Systems Department Siemens Corporate Research 755 College Road East Princeton, NJ 08540--6632 To learn more about Provincetown, contact their Chamber of Commerce at 508 487-3424. ================ ** CLNL'93 Registration ** ======================= Name: ________________________________________________ Affiliation: ________________________________________________ Address: ________________________________________________ ________________________________________________ Telephone: ____________________ E-mail: ____________________ Select the appropriate options and fees: Workshop registration fee ($50 regular; $25 student) ___________ Includes * attendance at all presentation and poster sessions * the banquet dinner on Saturday night; and * a copy of the accepted abstracts. Hotel room ($74 = 1 night deposit) ___________ [This is at the Provincetown Inn, assuming a minimum stay of 2 nights. 
The total cost for three nights is $222 = $74 x 3, plus optional breakfasts. Room reservations are accepted subject to availability. See hotel for cancellation policy.] Arrival date ___________ Departure date _____________ Name of person sharing room (optional) __________________ [Notice the $74/night does correspond to $37/person per night double-occupancy, if two people share one room.] # of breakfasts desired ($7.50/bkfst; no deposit req'd) ___ Total amount enclosed: ___________ If you are not using a credit card, make your check payable in U.S. dollars to "Provincetown Inn/CLNL'93", and mail your completed registration form to Provincetown Inn/CLNL P.O. Box 619 Provincetown, MA 02657. If you are using Visa or MasterCard, please fill out the following, which you may mail to above address, or FAX to 508 487-2911. Signature: ______________________________________________ Visa/MasterCard #: ______________________________________________ Expiration: ______________________________________________  From bill at nsma.arizona.edu Mon Aug 9 17:00:59 1993 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Mon, 09 Aug 1993 14:00:59 -0700 (MST) Subject: [fwd: jbeard@aip.org: bee learning in Nature] Message-ID: <9308092100.AA10510@nsma.arizona.edu> This is a very interesting piece of work, but the "news release" is overblown and historically ignorant. The connection between mushroom bodies and learning has been known for a long time. There is also direct evidence for changes in the structure of the mushroom bodies as a result of experience: Coss and Perkel over a decade ago found changes in the length of dendritic spines after honeybees went on a single exploratory flight. This is much more direct than the evidence described in the "news release". Contrary to the claims in the "news release", these new results are unlikely to tell us much about human learning. It is not true that the honeybee brain is merely a simpler version of the human brain. They're completely different -- even the neurons are different in structure. Also insect learning and mammal learning are qualitatively different: for example, both honeybees and mammals can learn to navigate to a location using landmarks, but honeybees do it by simple visual pattern-matching, while mammals use considerably more sophisticated algorithms. Furthermore, it is not news that experience can lead to an increase in the number of connections. It has long been known that mammals raised in an enriched environment have thicker cortices, due to a greater density of synaptic structures. Surely this is more directly relevant to humans than data from honeybees could be. It's a shame to obscure a nice piece of work by making bogus claims about its significance. -- Bill  From dhw at santafe.edu Tue Aug 10 15:58:41 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 10 Aug 93 13:58:41 MDT Subject: Provable optimality of averaging generalizers Message-ID: <9308101958.AA15514@zia> Michael Perrone writes: >>> In the case of averaging for MSE optimization (the meat and potatoes of neural networks) and any other convex measure, the improvement due to averaging is independent of the distribution - on-training or off-. It depends only on the topology of the optimization measure. It is important to note that this result does NOT say the average is better than any individual estimate - only better than the average population performance. 
For example, if one had a reliable selection criterion for deciding which element of a population of estimators was the best and that estimator was better than the average estimator, then just choose the better one. (Going one step further, simply use the selection criterion to choose the best estimator from all possible weighted averages of the elements of the population.) As David Wolpert pointed out, any estimator can be confounded by a pathological data sample and therefore there doesn't exist a *guaranteed* method for deciding which estimator is the best from a population in all cases. Weak (as opposed to guaranteed) selection criteria exist in in the form of cross-validation (in all of its flavors). Coupling cross-validation with averaging is a good idea since one gets the best of both worlds particularly for problems with insufficient data. I think that another very interesting direction for research (as David Wolpert alluded to) is the investigation of more reliable selection criterion. >>> *** Well, I agree with the second two paragraphs, but not the first. At least not exactly as written. Although Michael is making an interesting and important point, I think it helps to draw attention to some things: I) First, I haven't yet gone though Michael's work in detail, but it seems to me that the "measures" Michael is referring to really only make sense as real-world cost functions (otherwise known as loss functions, sometimes as risk functions, etc.). Indeed many very powerful learning algorithms (e.g., memory based reasoning) are not directly cast as finding the minimum on an energy surface, be it "convex" or otherwise. For such algorithms, "measures" come in with the cost function. In fact, *by definition*, one is only interested in such real world cost - results concerning anything else do not concern the primary object of interest. With costs, an example of a convex surface is the quadratic cost function, which says that given truth f, your penalty for guessing h is given by the function (f - h)^2. For such a cost, Michael's result holds essentially because by guessing the average you reduce variance but keep the same bias (as compared to the average over all guesses). In other words, it holds because for any f, h1, and h2, [(h1 + h2)/2 - f)]^2 <= [(h1 - f)^2 + (h2 - f)^2] / 2. (When f, h1, and h2 refer to distributions rather than single values, as Michael rightly points out, you have to worry about other issues before making this statement, like whether the distributions are correlated with one another.) *** It should be realized though that there are many non-convex cost functions in the real world. For example, when doing classification, one popular cost function is zero-one. This function says you get credit for guessing exactly correctly, and if you miss, it doesn't matter what you guessed; all misses "cost you" the same. This cost function is implicit in much of PAC, stat. mech. of learning, etc. Moreover, in Bayesian decision theory, guessing the weights which maximize the posterior probability P(weights | data) (which in the Bayesian perspective of neural nets is exactly what is done in backprop with weight decay) is the optimal strategy only for this zero-one cost. Now if we take this zero-one cost function, and evaluate it only off the training set, it is straight-forward to prove that for a uniform Pr(target function), the probability of a certain value of cost, given data, is independent of the learning algorithm. 
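As a small illustrative sketch of this independence claim (hypothetical learners on a 3-bit Boolean domain; not from the original posting): for a fixed training set and zero-one loss, the histogram of off-training-set errors over all targets consistent with the data is the same no matter which hypothesis the learner returns.

# Sketch: zero-one loss, off-training-set error, uniform prior over Boolean
# targets on a 3-bit input space.  For a FIXED training set, the distribution
# of off-training-set error over the consistent targets is the same whatever
# the learner outputs -- here, "predict the majority training label" vs.
# "always predict 0".  Learners, inputs, and labels are all illustrative.

from itertools import product
from collections import Counter

inputs = list(product([0, 1], repeat=3))          # the 8 possible inputs
train_x = inputs[:3]                              # fixed training inputs
train_y = [0, 1, 1]                               # fixed training labels
test_x = inputs[3:]                               # off-training-set inputs

def majority_learner(xs, ys):
    label = int(sum(ys) > len(ys) / 2)
    return lambda x: label

def always_zero_learner(xs, ys):
    return lambda x: 0

def error_histogram(learner):
    h = learner(train_x, train_y)
    hist = Counter()
    # enumerate every target function consistent with the training data
    for off_labels in product([0, 1], repeat=len(test_x)):
        target = dict(zip(train_x, train_y))
        target.update(zip(test_x, off_labels))
        err = sum(h(x) != target[x] for x in test_x) / len(test_x)
        hist[round(err, 3)] += 1
    return hist

print(sorted(error_histogram(majority_learner).items()))
print(sorted(error_histogram(always_zero_learner).items()))
# both print [(0.0, 1), (0.2, 5), (0.4, 10), (0.6, 10), (0.8, 5), (1.0, 1)]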
(The same result holds for other cost functions as well, though as Michael points out, you must be careful in trying to extend this result to convex cost functions.) This is true for any data set, i.e., it is not based on "pathological data", as Michael puts it. It says that unless you can rule out a uniform Pr(target function), you can not prove any one algorithm to be superior to any other (as far as this cost function is concerned). *** II) Okay. Now Michael correctly points out that even in those cases w/ a convex cost "measure", you must interpret his result with caution. I agree, and would say that this is somewhat like the famous "two letters" paradox of probability theory. Consider the following: 1) Say I have 3 real numbers, A, B, and X. In general, it's always true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} / 2. (This is exactly analogous to having the cost of the average guess bounded above by the average cost of the individual guesses.) 2) This means that if we had a choice of either randomly drawing one of the numbers {A, B}, or drawing C, that on average drawing C would give smaller quadratic cost with respect to X. 3) However, as Michael points out, this does *not* mean that if we had just the numbers A and C, and could either draw A or C, that we should draw C. In fact, point (1) tells us nothing whatsoever about whether A or C is preferable (as far as quadratic cost with respect to X is concerned). 4) In fact, now create a 5th number, D = [C + A] / 2. By the same logic as in (1), we see that the cost (wrt/ X) of D is less than the average of the costs of C and A. So to the exact same degree that (1) says we "should" guess C rather than A or B, it also says we should guess D rather than A or C. (Note that this does *not* mean that D's cost is necessarily less than C's though; we don't get endlessly diminishing costs.) 5) Step (4) can be repeated ad infinitum, getting a never-ending sequence of "newly optimal" guesses. In particular, in the *exact* sense in which C is "preferable" to A or B, and therefore should "replace" them, D is preferable to C or A, and therefore should replace *them* (and in particular replace C). So one is never left with C as the object of choice. *** So (1) isn't really normative; it doesn't say one "should" guess the average of a bunch of guesses: 7) Choosing D is better than randomly choosing amongst C or A, just as choosing C is better than randomly choosing amongst A or B. 8) This doesn't mean that given C, one should introduce an A and then guess the average of C and A (D) rather than C, just as this doesn't mean that given A, one should introduce a B and then guess the average of A and B (C) rather than A. 9) An analogy which casts some light on all this: view A and B not as the outputs of separate single-valued learning algorithms, but rather as the random outputs of a single learning algorithm. Using this analogy, the result of Michael's, that one should always guess C rather than randomly amongst A or B, suggests that one should always use a deterministic, single-valued learning algorithm (i.e., just guess C) rather than one that guesses randomly from a distribution over possible guesses (i.e., one that guesses randomly amongst A or B). This implication shouldn't surprise anyone familiar with Bayesian decision theory.
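[A minimal numerical sketch of points (1) through (5), in Python, may make this concrete. The particular values of A, B and X below are arbitrary, chosen only for illustration; this is not code from any of the postings.]

    import random

    A, B, X = 2.0, 10.0, 5.0           # two guesses and a target, arbitrary values
    cost = lambda h: (h - X) ** 2      # quadratic cost with respect to X
    C = (A + B) / 2.0

    # Point (1): the cost of the average guess is bounded above by the
    # average cost of the individual guesses (convexity of the quadratic).
    assert cost(C) <= (cost(A) + cost(B)) / 2.0

    # Point (2): averaging beats *randomly drawing* one of A or B, on average.
    draws = [cost(random.choice([A, B])) for _ in range(100000)]
    print(sum(draws) / len(draws), ">=", cost(C))

    # Points (4)-(5): repeating the argument gives a never-ending sequence of
    # "newly optimal" guesses D = (C+A)/2, E = (D+A)/2, ...  The sequence drifts
    # toward A, and the costs need not keep shrinking.
    g = C
    for _ in range(5):
        g = (g + A) / 2.0
        print(g, cost(g))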
In fact, it's (relatively) straight-forward to prove that independent of priors or the like, for a convex cost function, one should always use a single-valued learning algorithm rather than one which guesses randomly. (This has probably been proven many times. One proof can be found in Wolpert and Stolorz, On the implementation of Bayes optimal generalizers, SFI tech. report 92-03-012.) (Blatant self-promotion: Other interesting things proven in that report and others in its series are: there are priors and noise processes such that the expected cost, given the data set and that one is using a Bayes-optimal learning algorithm, can *decrease* with added noise; if the cost function is a proper metric, then the magnitude of the change in expected cost if one guesses h rather than h' is bounded above by the cost of h relative to h'; other results about using "Bayes-optimal" generalizers predicated on an incorrect prior, etc., etc.) *** The important point is that although it is both intriguing and illuminating, there are no implications of Michael's result for what one should do with (or in place of) a particular deterministic, single-valued learning algorithm. It was for such learning algorithms that my original comments were intended. David Wolpert  From dhw at santafe.edu Tue Aug 10 16:29:07 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 10 Aug 93 14:29:07 MDT Subject: MacKay's recent work and feature selection Message-ID: <9308102029.AA15554@zia> Recently David MacKay made a posting concerning a technique he used to win an energy prediction competition. Parts of that technique have been done before (e.g., combining generalizers via validation set behavior). However other parts are both novel and very interesting. This posting concerns the "feature selection" aspect of his technique, which I understand MacKay developed in association w/ Radford Neal. (Note: MacKay prefers to call the technique "automatic relevance determination"; nothing I'll discuss here will be detailed enough for that distinction to be important though.) What I will say grew out of conversations w/ David Rosen and Tom Loredo, in part. Of course, any stupid or silly aspects to what I will say should be assumed to originate w/ me. *** Roughly speaking, MacKay implemented feature selection in a neural net framework as follows: 1) Define a potentially different "weight decay constant" (i.e., regularization hyperparameter) for each input neuron. The idea is that one wants to have those constants set high for input neurons representing "features" of the input vector which it behooves us to ignore. 2) One way to set those hyperparameters would be via a technique like cross-validation. MacKay instead set them via maximum likelihood, i.e., he set the weight decay constants alpha_i to those values maximizing P(data | alpha_i). Given a reasonably smooth prior P(alpha_i), this is equivalent to finding the maximum a posterior (MAP) alpha_i, i.e., the alpha_i maximizing P(alpha_i | data). 3) Empirically, David found that this worked very well. (I.e., he won the competition.) *** This neat idea makes some interesting suggestions: 1) The first grows out of "blurring" the distinction between parameters (i.e., weights w_j) and hyperparameters (the alpha_i). Given such squinting, MacKay's procedure amounts to a sort of "greedy MAP". 
First he sets one set of parameters to its MAP values (the alpha_i), and then with those values fixed, he sets the other parameters (the w_j) to their MAP values (this is done via the usual back-propagation w/ weight-decay, which we can do since the first stage set the weight decay constants). In general, the resultant system will not be at the global MAP maximizing P(alpha_i, w_j | D). In essence, a sort of extra level of regularization has been added. (Note: Radford Neal informs me that calculationally, in the procedure MacKay used, the second MAP step is "automatic", in the sense that one has already made the necessary calculations to perform that step when one carries out the first MAP step.) Of course, viewing the technique from this "blurred" perspective is a bit of a fudge, since hyperparameters are not the same thing as parameters. Nonetheless, this view suggests some interesting new techniques. E.g., first set the weights leading to hidden layer 1 to their MAP values (or maximum likelihood values, for that matter). Then with those values fixed, do the same to the weights in the second layer, etc. Another reason to consider this layer-by-layer technique is the fact that training of the weights connecting different layers should in general be distinguishable, e.g., as MacKay has pointed out, one should have different weight-decay constants for the different layers. 2) Another interesting suggestion comes from justifying the technique not as a priori reasonable, but rather as an approximation to a full "hierarchical" Bayesian technique, in which one writes P(w_j | data) (i.e., the ultimate object of interest) prop. to integral d_alpha_i P(data | w_j, alpha_i) P(w_j | alpha_i) P(alpha_i). Note that all 3 distributions occurring in this integrand must be set in order to use MacKay's technique. (The by-now-familiar object of contention between MacKay and myself is on how generically this approximation will be valid, and whether one should explicitly test its validity when one claims that it holds. This issue isn't pertinent to the current discussion however.) Let's assume the approximation is very good. Then under the assumptions: i) P(alpha_i) is flat enough to be ignored; ii) the distribution P(w_j | alpha_i) is a product of gaussians (each gaussian being for those w_j connecting to input neuron i, i.e., for those weights using weight decay constant alpha_i); then what MacKay did is equivalent to back-propagation with weight-decay, where rather than minimizing {training set error} + constant x {sum over all j (w_j)^2}, as in conventional weight decay, MacKay is minimizing (something like) {training set error} + {(sum over i) [ (number of weights connecting to neuron i) x ln [(sum over j; those weights connecting to neuron i) (w_j)^2] ]}. What's interesting about this isn't so much the logarithm in the "weight decay" term, but rather the fact that weights are being clumped together in that weight-decay term, into groups of those weights connecting to the same neuron. (This is not true in conventional weight decay.) So in essence, the weight-decay term in MacKay's scenario is designed to affect all the weights connecting to a given neuron as a group. This makes intuitive sense if the goal is feature selection. 3) One obvious idea based on viewing things this way is to try to perform weight-decay using this modified weight-decay term. This might be reasonable even if MacKay's technique is not a good approximation to this full Bayesian technique.
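[A rough sketch, in Python/NumPy, of the grouped weight-decay term described above. It is only a transcription of the verbal formula, with illustrative names; the small epsilon inside the logarithm is an added numerical safeguard, not part of MacKay's procedure.]

    import numpy as np

    def grouped_log_penalty(W, eps=1e-12):
        # W is the first-layer weight matrix, shape (n_inputs, n_hidden),
        # so row i holds the weights connecting to (fanning out of) input neuron i.
        n_weights_per_input = W.shape[1]
        group_sum_sq = np.sum(W ** 2, axis=1) + eps
        # (number of weights connecting to neuron i) x ln(sum of their squares)
        return np.sum(n_weights_per_input * np.log(group_sum_sq))

    # The objective sketched above would then be something like
    #   training_set_error + grouped_log_penalty(W1)
    # in place of the conventional
    #   training_set_error + c * np.sum(W1 ** 2)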
4) The idea of MacKay's also leads to all kinds of ideas about how to set up the weight-decay term so as to enforce feature selection (or automatic relevance determination, if you prefer). These need not have anything to do w/ the precise weight-decay term MacKay used; rather the idea is simply to take his (implicit) suggestion of trying to do feature selection via the weight-decay term, and see where it leads. For example: Where originally we have input neurons at layer 1, hidden layers 2 through n, and then output neurons at layer n+1, now have the same architecture with an extra "pre-processing" layer 0 added. Inputs are now fed to the neurons at layer 0. For each input neuron at layer 0, there is one and only weight, leading straight up to the neuron at layer 1 which in the original formulation was the (corresponding) input neuron. The hope would be that for those input neurons which we "should" mostly ignore, something like backprop might set the associated weights from layer 0 to layer 1 to very small values. David Wolpert  From rsun at athos.cs.ua.edu Tue Aug 10 17:33:56 1993 From: rsun at athos.cs.ua.edu (Ron Sun) Date: Tue, 10 Aug 1993 16:33:56 -0500 Subject: No subject Message-ID: <9308102133.AA22967@athos.cs.ua.edu> CALL FOR PAPERS International Symposium on Integrating Knowledge and Neural Heuristics (ISIKNH'94) Sponsored by University of Florida, and AAAI, in cooperation with IEEE Neural Network Council, and Florida AI Research Society. Time: May 9-10 1994; Place: Pensacola Beach, Florida, USA. A large amount of research has been directed toward integrating neural and symbolic methods in recent years. Especially, the integration of knowledge-based principles and neural heuristics holds great promise in solving complicated real-world problems. This symposium will provide a forum for discussions and exchanges of ideas in this area. The objective of this symposium is to bring together researchers from a variety of fields who are interested in applying neural network techniques to augmenting existing knowledge or proceeding the other way around, and especially, who have demonstrated that this combined approach outperforms either approach alone. We welcome views of this problem from areas such as constraint-(knowledge-) based learning and reasoning, connectionist symbol processing, hybrid intelligent systems, fuzzy neural networks, multi-strategic learning, and cognitive science. Examples of specific research include but are not limited to: 1. How do we build a neural network based on {\em a priori} knowledge (i.e., a knowledge-based neural network)? 2. How do neural heuristics improve the current model for a particular problem (e.g., classification, planning, signal processing, and control)? 3. How does knowledge in conjunction with neural heuristics contribute to machine learning? 4. What is the emergent behavior of a hybrid system? 5. What are the fundamental issues behind the combined approach? Program activities include keynote speeches, paper presentation, and panel discussions. ***** Scholarships are offered to assist students in attending the symposium. Students who wish to apply for a scholarship should send their resumes and a statement of how their researches are related to the symposium. ***** Symposium Chairs: LiMin Fu, University of Florida, USA. Chris Lacher, Florida State University, USA. 
Program Committee: Jim Anderson, Brown University, USA Michael Arbib, University of Southern California, USA Fevzi Belli, The University of Paderborn, Germany Jim Bezdek, University of West Florida, USA Bir Bhanu, University of California, USA Su-Shing Chen, National Science Foundation, USA Tharam Dillon, La Trobe University, Australia Douglas Fisher, Vanderbilt University, USA Paul Fishwick, University of Florida, USA Stephen Gallant, HNC Inc., USA Yoichi Hayashi, Ibaraki University, Japan Susan I. Hruska, Florida State University, USA Michel Klefstad-Sillonville CCETT, France David C. Kuncicky, Florida State University, USA Joseph Principe, University of Florida, USA Sylvian Ray, University of Illinois, USA Armando F. Rocha, University of Estadual, Brasil Ron Sun, University of Alabama, USA Keynote Speaker: Balakrishnan Chandrasekaran, Ohio-State University Schedule for Contributed Papers ---------------------------------------------------------------------- Paper Summaries Due: December 15, 1993 Notice of Acceptance Due: February 1, 1994 Camera Ready Papers Due: March 1, 1994 Extended paper summaries should be limited to four pages (single or double-spaced) and should include the title, names of the authors, the network and mailing addresses and telephone number of the corresponding author. Important research results should be attached. Send four copies of extended paper summaries to LiMin Fu Dept. of CIS, 301 CSE University of Florida Gainesville, FL 32611 USA (e-mail: fu at cis.ufl.edu; phone: 904-392-1485). Students' applications for a scholarship should also be sent to the above address. General information and registration materials can be obtained by writing to Rob Francis ISIKNH'94 DOCE/Conferences 2209 NW 13th Street, STE E University of Florida Gainesville, FL 32609-3476 USA (Phone: 904-392-1701; fax: 904-392-6950) --------------------------------------------------------------------- --------------------------------------------------------------------- If you intend to attend the symposium, you may submit the following information by returning this message: NAME: _______________________________________ ADDRESS: ____________________________________ _____________________________________________ _____________________________________________ _____________________________________________ _____________________________________________ PHONE: ______________________________________ FAX: ________________________________________ E-MAIL: _____________________________________ ---------------------------------------------------------------------  From ld231782 at longs.lance.colostate.edu Wed Aug 11 00:56:26 1993 From: ld231782 at longs.lance.colostate.edu (L. Detweiler) Date: Tue, 10 Aug 93 22:56:26 -0600 Subject: neuroanatomy list ad & more on bee brains In-Reply-To: Your message of "Mon, 09 Aug 93 14:00:59 PDT." <9308092100.AA10510@nsma.arizona.edu> Message-ID: <9308110456.AA06912@longs.lance.colostate.edu> While many on this list will not be interested in the details of bee-brain neuroanatomy or arguments thereon, an excellent list for discussions of this can be requested from cogneuro-request at ptolemy.arc.nasa.gov, maintained by Kimball Collins . The list has fairly low volume although definitely more than connectionists, and I'd like to encourage any of this amazingly literate connectionist crowd with a strong interest in neurobiological research to subscribe (recent/past topics: neurobiology of rabies infections, Hebb's rule, vision, dyslexia, etc.) * * * Mr. 
Skaggs writes an exceedingly hostile flame (a redundant phrase) on the recent syndicated news article describing research into bee function and neuroanatomy, calling it `overblown and historically ignorant'. While I don't have as close a background in the area in question as Mr. Skaggs appears to, this is just a short note to balance the scale a little closer to equilibrium. The critical feature that I see going on here is a professional scientist demeaning a non-detailed popular account of scientific work, esp. in that person's area of expertise, for lapses in precise description. This happens all the time, of course, both the presence of the quasi-skewed material and the criticism. Definitely, the article was the overwrought cheerleading type, rather stereotypical, but Mr. Skaggs, on the other hand, plays into the cliche of the pessimistic and sour curmudgeon-scientist in attacking it. I'd like to point out that this popular literature serves a very useful purpose in keeping the lay public apprised of new developments in scientific fields and, ultimately, encouraging funding. It is not fair to apply the strict scientific standard of evaluation to something that appears in the popular press. In this case, there is no significant error, the purpose is served in being `approximately correct', and there is no point to rebutting it. We are bound to lose something in the translation, and the major points of disagreement are likely to be over opinion. We should instead be highly encouraged and appreciative of these attempts to bring increasingly abstruse and technical science to the interested layman. I appreciate the popular press to some degree in that it forces scientists to get at the essence of their research, something they sometimes lose sight of. The scientist (perhaps the neuroscientist in particular) is forever saying `it's not quite that simple' or `it doesn't quite happen like that' or `there are exceptions to that' to the point that an outsider can give up in frustration, thinking that it is nothing but a disconnected morass with no underlying message or cohesion. The general press usually gives a close and fascinating view into what the `big picture' is. Looking at reporters as nothing but clueless intruders is a somewhat self-destructive position, IMHO. And yes, the grandiose statements like `will shed insight into human learning' can be recognized by other scientists as the necessary fodder and not criticized but ignored. Now, to address a few points: >Coss and Perkel over a decade ago found >changes in the length of dendritic spines after honeybees went on a >single exploratory flight. This is much more direct than the evidence >described in the "news release". Incidentally, the changes in dendritic growth with learning are IMHO one of the most fascinating studies of plasticity, and on the cutting edge of current research, and perhaps others will wish to post references. (The classic study showed that rats reared in deprived vs. abundant sensory-stimuli-containing environments had less or more growth, respectively.) >It is not true that the >honeybee brain is merely a simpler version of the human brain. They're >completely different -- even the neurons are different in structure. Definitely, any animal model always has minor or major imperfections and pitfalls. But this brings up an interesting point--is there an analogue to LTP in the insect brain? There is probably at least a degree of overlap in the kinds of neurotransmitters involved.
However, arguing against the relevance, superiority, and verisimilitude of one animal model vs. another can turn into a very emotional debate, and should be engaged with the utmost delicacy or statements come out with a connotation much like `the car you drive all day is worthless'.  From delliott at src.umd.edu Wed Aug 11 17:52:44 1993 From: delliott at src.umd.edu (David L. Elliott) Date: Wed, 11 Aug 1993 17:52:44 -0400 Subject: Call for papers, NeuroControl book Message-ID: <199308112152.AA04995@newra.src.umd.edu> PROGRESS IN NEURAL NETWORKS series Editor O. M. Omidvar Special Volume: NEURAL NETWORKS FOR CONTROL Editor: David L. Elliott CALL FOR PAPERS Original manuscripts describing recent progress in neural networks research directly applicable to Control or making use of modern control theory. Manuscripts may be survey or tutorial in nature. Suggested topics for this book are: %New directions in neurocontrol %Adaptive control %Biological control architectures %Mathematical foundations of control %Model-based control with learning capability %Natural neural control systems %Neurocontrol hardware research %Optimal control and incremental dynamic programming %Process control and manufacturing %Reinforcement-Learning Control %Sensor fusion and vector quantization %Validating neural control systems The papers will be refereed and uniformly typeset. Ablex and the Progress Series editors invite you to submit an abstract, extended summary or manuscript proposal, directly to the Special Volume Editor: Dr. David L. Elliott, Institute for Systems Research University of Maryland, College Park, MD 20742 Tel: (301)405-1241 FAX (301)314-9920 Email: DELLIOTT at SRC.UMD.EDU or to the Series Editor: Dr. Omid M. Omidvar, Computer Science Dept., University of the District of Columbia, Washington DC 20008 Tel: (202)282-7345 FAX: (202)282-3677 Email: OOMIDVAR at UDCVAX.BITNET The Publisher is Ablex Publishing Corporation, Norwood, NJ  From pittman at mcc.com Thu Aug 12 08:38:30 1993 From: pittman at mcc.com (Jay Pittman) Date: Thu, 12 Aug 93 08:38:30 EDT Subject: neuroanatomy list ad & more on bee brains Message-ID: <9308121338.AA14022@gluttony.mcc.com> Excellent note, well stated. I agree with everything Detweiler said about the press. On the other hand, when I originally read Bill Skaggs note I didn't think he was being all that critical. I went back and looked at it again, and, yes, he does sound like a real flamer, WHEN I START OUT ASSUMING THAT. One can also read it as a calmly-stated critique of the article. I find myself imagining different "tones of voice", depending on (presumably) random triggers. I hope when you read this note you perceive me speaking in a calm, relaxed manner. While I agree with Detweiler's attitude toward the popular press, I think Skaggs statements were addressed to us, the members of the research community, and not to the reporters. As long as the note does not reach members of that community, we should tolerate somewhat-more-grouchy phrasing than we might want for lay consumption. I've just spent a lot of time trying to carefully word the above message. The neat thing about a group such as connectionists is that (I think) we can skip that labor, and just spit out our thoughts. Or perhaps I am being naive? BTW, I have no HO on bee brains. My own dendrites get thinner every day. 
J  From chris at arraysystems.nstn.ns.ca Sat Aug 14 14:25:29 1993 From: chris at arraysystems.nstn.ns.ca (Chris Brobeck) Date: Sat, 14 Aug 93 15:25:29 ADT Subject: Genetic Algorithms Message-ID: <9308141825.AA07238@arraysystems.nstn.ns.ca> Dear Colleagues; We're currently in the process of building a relatively large net and were looking at using a genetic algorithm to optimize the network structure. The question is as follows. Early forms of genetic algorithms seemed to rely on reading the gene once, linearly, in the construction process, whereas a number of more recent algorithms allow the reading to start anywhere along the gene, and continue to read (construct rules) until some stopping criterion is met. In the former, it seems reasonable then for one organism to compete against the other in a winner-take-all sort of way. On the other hand, the rigidity of the genetic structure makes it very sensitive to mutation. In the latter case the gene may be thought of as a generator for a process (randomly) creating rules of a variety of lengths. If one assumes that individual rules are much shorter than the entire gene, this method becomes less sensitive to mutation, crossover, etc. (both the beneficial and not so beneficial aspects). In this case it seems that competition among species would be as critical as competition among individuals, with the interspecies competition perhaps representing a fast way to remove ineffective rule sets, and individual competition more of a way of fine-tuning a distribution. The upshot would be (one assumes) slower but more robust convergence. In any case, if there is anyone out there who can point us in the direction of some good references let us know - particularly ones that might be available via ftp. Thanks, Chris Brobeck.  From bengio at iro.umontreal.ca Mon Aug 16 11:09:57 1993 From: bengio at iro.umontreal.ca (Samy Bengio) Date: Mon, 16 Aug 1993 11:09:57 -0400 Subject: Preprint announcement: Generalization of a Parametric Learning Rule Message-ID: <9308161509.AA06576@carre.iro.umontreal.ca> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/bengio.general.ps.Z The following file has been placed in neuroprose (no hardcopies will be provided): GENERALIZATION OF A PARAMETRIC LEARNING RULE (8 pages) by Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei Abstract: In previous work we discussed the subject of parametric learning rules for neural networks. In this article, we present a theoretical basis permitting us to study the {\it generalization} property of a learning rule whose parameters are estimated from a set of learning tasks. By generalization, we mean the possibility of using the learning rule to learn to solve new tasks. Finally, we describe simple experiments on two-dimensional categorization tasks and show how they corroborate the theoretical results. This paper is an extended version of a paper which will appear in ICANN'93: Proceedings of the International Conference on Artificial Neural Networks. To retrieve the file: unix> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. 220 cheops.cis.ohio-state.edu FTP server ready. Name: anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> binary 200 Type set to I. ftp> cd pub/neuroprose 250 CWD command successful. ftp> get bengio.general.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for bengio.general.ps.Z 226 Transfer complete.
100000 bytes sent in 3.14159 seconds ftp> quit 221 Goodbye. unix> uncompress bengio.general.ps.Z unix lpr bengio.general.ps (or however you print out postscript) Many thanks to Jordan Pollack for maintaining this archive. -- Samy Bengio E-mail: bengio at iro.umontreal.ca Fax: (514) 343-5834 Tel: (514) 343-6111 ext. 3545/3494 Residence: (514) 495-3869 Universite de Montreal, Dept. IRO, C.P. 6128, Succ. A, Montreal, Quebec, Canada, H3C 3J7  From reza at ai.mit.edu Mon Aug 16 12:37:02 1993 From: reza at ai.mit.edu (Reza Shadmehr) Date: Mon, 16 Aug 93 12:37:02 EDT Subject: Tech Reports from CBCL at MIT Message-ID: <9308161637.AA03497@corpus-callosum.ai.mit.edu> The following technical reports from the Center for Biological and Computational Learning at M.I.T. are now available via anonymous ftp. -------------- :CBCL Paper #83/AI Memo #1440 :author Michael I. Jordan and Robert A. Jacobs :title Hierarchical Mixtures of Experts and the EM Algorithm :date August 1993 :pages 29 We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation- Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain. -------------- :CBCL Paper #84/AI Memo #1441 :author Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh :title On the Convergence of Stochastic Iterative Dynamic Programming Algorithms :date August 1993 :pages 13 Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD (lambda) and Q-learning belong. ============================ How to get a copy of above reports: The files are in compressed postscript format and are named by their AI memo number, e.g., the Jordan and Jacobs paper is named AIM-1440.ps.Z. Here is the procedure for ftp-ing: unix> ftp ftp.ai.mit.edu (log-in as anonymous) ftp> cd ai-pubs/publications/1993 ftp> binary ftp> get AIM-number.ps.Z ftp> quit unix> zcat AIM-number.ps.Z | lpr I will periodically update the above list as new titles become available. Best wishes, Reza Shadmehr Center for Biological and Computational Learning M. I. T. Cambridge, MA 02139  From mikewj at signal.dra.hmg.gb Tue Aug 17 04:15:33 1993 From: mikewj at signal.dra.hmg.gb (mikewj@signal.dra.hmg.gb) Date: Tue, 17 Aug 93 09:15:33 +0100 Subject: Practical Neural Nets Conference & Workshops in UK. 
Message-ID: AA20707@ravel.dra.hmg.gb *********************************** NEURAL COMPUTING APPLICATIONS FORUM *********************************** 8 - 9 September 1993 Brunel University, Runnymede, UK ***************************************** PRACTICAL APPLICATIONS OF NEURAL NETWORKS ***************************************** Neural Computing Applications Forum is the primary meeting place for people developing Neural Network applications in industry and academia. It has 200 members from the UK and Europe, from universities, small companies and big ones, and holds four main meetings each year. It has been running for 3 years, and is cheap to join. This meeting spans two days with informal workshops on 8 September and the main meeting comprising talks about neural network techniques and applications on 9 September. ********* WORKSHOPS ********* ********************************************************** Neural Networks in Engine Health Monitoring 8 September, 13.00 to 15.00 ********************************************************** Technical Contact: Tom Harris: (+44/0) 784 431341 Including: Roger Hutton (ENTEK): "What is Predictive Maintenance?" John Hobday (Lloyds Register): "Gas Turbine Start Monitoring" John McIntyre (University of Sunderland / National Power plc): "Predictive Maintenance at Blyth Power Station" ********************************************************* Building a Neural Network Application 8 September, 15.30 to 17.30 ********************************************************* Technical Contact: Tom Harris: (+44/0) 784 431341 Including: Chris Bishop (Aston University): "The DTI Neural Computing Guidelines Project" Tom Harris (Brunel University): "A Design Process for Neural Network Applications" Paul Gregory (Recognition Research Ltd.): "Building an Application in Software" (case study) Simon Hancock (Neural Technologies Ltd.): "Implementing Hardware Neural Network Solutions" (case study) ************************* Evening: Barbecue Supper ************************* ***************************** MAIN MEETING - 9 September 1993 ***************************** 8.30 Registration 9.05 Welcome 9.15 Neil Burgess (CRL): "Feature Selection in Neural Networks" 9.50 Bryn Williams (Aston University): "Convergence and Diversity of Species in Genetic Algorithms for Optimization of a Bump-Tree Classifier" 10.20 Coffee 11.00 Mike Brinn (Health and Safety Executive): "Kohonen Networks Classifying Toxic Molecules" 11.40 John Bridle (Dragon Systems Ltd.): "Speech Recognition in Principle and Practice" 12.15 Lunch 2.00 Bruce Wilkie (Brunel University): "Real Time Logical Neural Networks" 2.40 Stan Swallow (Brunel University): "TARDIS: The World's Fastest Neural Network?" 3.15 Tea 3.40 Dave Cressy (Logica Cambridge Ltd.): "Neural Control of an Experimental Batch Distillation Column" 4.10 Discussions 4.30 Close & minibus to the station ACCOMMODATION is available in Brunel University at 35 pounds (including barbecue supper) and **MUST** be booked and paid for in advance. Accommodation and breakfast only: 25 pounds; barbecue supper only: 12 pounds. ***************** Application ***************** Members of NCAF get free entry to all meetings for a year. (This is very good value - main meetings, tutorials, special interest meetings). It also includes subscription to Springer Verlag's journal "Neural Computing and Applications". Full membership: 250 pounds - anybody in your small company / research group in a big company. Individual membership: 140 pounds - named individual only.
Student membership (with journal): 55 pounds - copy of student ID required. Student membership (no journal, very cheap!): 25 pounds - copy of student ID required. Entry to this meeting without membership costs 35 pounds for the workshops, and 80 pounds for the main day. Payment in advance if possible; please give an official order number if an invoice is required. Email enquiries to Mike Wynne-Jones, mikewj at signal.dra.hmg.gb. Postal to Mike Wynne-Jones, NCAF, PO Box 62, Malvern, WR14 4NU, UK. Fax to Karen Edwards, (+44/0) 21 333 6215  From mpp at cns.brown.edu Tue Aug 17 12:18:01 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Tue, 17 Aug 93 12:18:01 EDT Subject: Provable optimality of averaging generalizers Message-ID: <9308171618.AA15207@cns.brown.edu> David Wolpert writes: -->1) Say I have 3 real numbers, A, B, and X. In general, it's always -->true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} / -->2. (This is exactly analogous to having the cost of the average guess -->bounded above by the average cost of the individual guesses.) --> -->2) This means that if we had a choice of either randomly drawing one -->of the numbers {A, B}, or drawing C, that on average drawing C would -->give smaller quadratic cost with respect to X. --> -->3) However, as Michael points out, this does *not* mean that if we had -->just the numbers A and C, and could either draw A or C, that we should -->draw C. In fact, point (1) tells us nothing whatsoever about whether A -->or C is preferable (as far as quadratic cost with respect to X is -->concerned). --> -->4) In fact, now create a 5th number, D = [C + A] / 2. By the same -->logic as in (1), we see that the cost (wrt/ X) of D is less than the -->average of the costs of C and A. So to the exact same degree that (1) -->says we "should" guess C rather than A or B, it also says we should -->guess D rather than A or C. (Note that this does *not* mean that D's -->cost is necessarily less than C's though; we don't get endlessly -->diminishing costs.) --> -->5) Step (4) can be repeated ad infinitum, getting a never-ending -->sequence of "newly optimal" guesses. In particular, in the *exact* -->sense in which C is "preferable" to A or B, and therefore should -->"replace" them, D is preferable to A or B, and therefore should -->replace *them* (and in particular replace C). So one is never left -->with C as the object of choice. This argument does not imply a contradiction for averaging! This argument shows the natural result of throwing away information. Step (4) throws away number B. Given that we no longer know B, number D is the correct choice. (One could imagine such "forgetting" to be useful in time varying situations - which leads towards the Kalman filtering that was mentioned in relation to averaging a couple of weeks ago.) In Step (5), an infinite sequence is developed by successively throwing away more and more of number B. The infinite limit of Step (5) is number A. In other words, we have thrown away all knowledge of B. -->So (1) isn't really normative; it doesn't say one "should" guess the -->average of a bunch of guesses: Normative? Hey is this an ethics class!? :-) -->7) Choosing D is better than randomly choosing amongst C or A, just as --> choosing C is better than randomly choosing amongst A or B. 
--> -->8) This doesn't mean that given C, one should introduce an A and --> then guess the average of C and A (D) rather than C, just as --> this doesn't mean that given A, one should introduce a B and --> then guess the average of A and B (C) rather than A. Sure, if you're willing to throw away information. Michael  From cns at clarity.Princeton.EDU Tue Aug 17 11:30:02 1993 From: cns at clarity.Princeton.EDU (Cognitive Neuroscience) Date: Tue, 17 Aug 93 11:30:02 EDT Subject: RFP Research - McDonnell-Pew Program Message-ID: <9308171530.AA27618@clarity.Princeton.EDU> McDonnell-Pew Program in Cognitive Neuroscience SEPTEMBER 1993 Individual Grants-in-Aid for Research Program supported jointly by the James S. McDonnell Foundation and The Pew Charitable Trusts INTRODUCTION The McDonnell-Pew Program in Cognitive Neuroscience has been created jointly by the James S. McDonnell Foundation and The Pew Charitable Trusts to promote the development of cognitive neuroscience. The foundations have allocated $20 million over a five-year period for this program. Cognitive neuroscience attempts to understand human mental events by specifying how neural tissue carries out computations. Work in cognitive neuroscience is interdisciplinary in character, drawing on developments in clinical and basic neuroscience, computer science, psychology, linguistics, and philosophy. Cognitive neuroscience excludes descriptions of psychological function that do not address the underlying brain mechanisms and neuroscientific descriptions that do not speak to psychological function. The program has three components. (1) Institutional grants, which have already been awarded, for the purpose of creating centers where cognitive scientists and neuroscientists can work together. (2) Small grants-in-aid, presently being awarded, for individual research projects to encourage Ph.D. and M.D. investigators in cognitive neuroscience. (3) Small grants-in-aid, presently being awarded, for individual training projects to encourage Ph.D. and M.D. investigators to acquire skills for interdisciplinary research. This brochure describes the individual grants-in-aid for research. RESEARCH GRANTS The McDonnell-Pew Program in Cognitive Neuroscience will issue a limited number of awards to support collaborative work by cognitive neuroscientists. Applications are sought for projects of exceptional merit that are not currently fundable through other channels and from investigators who are not at institutions already funded by an institutional grant from the program. In order to distribute available funds as widely as possible, preference will be given to applicants who have not received previous grants under this program. Preference will be given to projects that are interdisciplinary in character. The goals of the program are to encourage broad participation in the development of the field and to facilitate the participation of investigators outside the major centers of cognitive neuroscience. There are no U.S. citizenship restrictions or requirements, nor does the proposed work need to be conducted at a U.S. institution, providing the sponsoring organization qualifies as tax-exempt as described in the "Applications" section of this brochure. Ph.D. thesis research of graduate students will not be funded. Grant support under the research component is limited to $30,000 per year for two years. Indirect costs are to be included in the $30,000 maximum and may not exceed 10 percent of total salaries and fringe benefits. 
These grants are not renewable after two years. The program is looking for innovative proposals that would, for example: * combine experimental data from cognitive psychology and neuroscience; * explore the implications of neurobiological methods for the study of the higher cognitive processes; * bring formal modeling techniques to bear on cognition, including emotions and higher thought processes; * use sensing or imaging techniques to observe the brain during conscious activity; * make imaginative use of patient populations to analyze cognition; * develop new theories of the human mind/brain system. This list of examples is necessarily incomplete but should suggest the general kind of proposals desired. Ideally, a small grant-in-aid for research should facilitate the initial exploration of a novel or risky idea, with success leading to more extensive funding from other sources. APPLICATIONS Applicants should submit five copies of the following information: * a brief, one-page abstract describing the proposed work; * a brief, itemized budget that includes direct and indirect costs (indirect costs may not exceed 10 percent of total salaries and fringe benefits); * a budget justification; * a narrative proposal that does not exceed 5,000 words; the 5,000-word proposal should include: 1) a description of the work to be done and where it might lead; 2) an account of the investigator's professional qualifications to do the work; 3) an account of any plans to collaborate with other cognitive neuroscientists; 4) a brief description of the available research facilities; * curriculum(a) vitae of the participating investigator(s); * an authorized document indicating clearance for the use of human and animal subjects; * an endoresement letter from the officer of the sponsoring institution who will be responsible for administering the grant. One copy of the following items must also be submitted along with the proposal. These documents should be readily available from the sponsoring institution's grants or development office. * A copy of the IRS determination letter, or the international equivalent, stating that the sponsoring organization is a nonprofit, tax-exempt institution classified as a 501(c)(3) organization. * A copy of the IRS determination letter stating that your organization is not listed as a private foundation under section 509(a) of the Internal Revenue Service Code. * A statement on the sponsoring institution's letterhead, following the wording on Attachment A and signed by an officer of the institution, certifying that the status or purpose of the organization has not changed since the issuance of the IRS determinations. (If your organization's name has changed, include a copy of the IRS document reflecting this change.) * An audited financial statement of the most recently completed fiscal year of the sponsoring organization. * A current list of the names and professional affiliations of the members of the organization's board of trustees and the names and titles of the principal officers. Other appended documents will not be accepted for evaluation and will be returned to the applicant. Any incomplete proposals will also be returned to the applicant. Submissions will be reviewed by the program's advisory board. Applications must be postmarked on or before FEBRUARY 1 to be considered for review. 
INFORMATION McDonnell-Pew Program in Cognitive Neuroscience Green Hall 1-N-6 Princeton University Princeton, New Jersey 08544-1010 Telephone: 609-258-5014 Facsimile: 609-258-3031 Email: cns at clarity.princeton.edu ADVISORY BOARD Emilio Bizzi, M.D. Eugene McDermott Professor in the Brain Sciences and Human Behavior Chairman, Department of Brain and Cognitive Sciences Massachusetts Institute of Technology, E25-526 Cambridge, Massachusetts 02139 Sheila E. Blumstein, Ph.D. Professor of Cognitive and Linguistic Sciences Dean of the College Brown University University Hall, Room 218 Providence, Rhode Island 02912 Stephen J. Hanson, Ph.D. Head, Learning Systems Department Siemens Corporate Research 755 College Road East Princeton, New Jersey 08540 Jon H. Kaas, Ph.D. Centennial Professor Department of Psychology Vanderbilt University 301 Wilson Hall 111 21st Avenue South Nashville, Tennessee 37240 George A. Miller, Ph.D. Director, McDonnell-Pew Program in Cognitive Neuroscience James S. McDonnell Distinguished University Professor of Psychology Department of Psychology Princeton University Princeton, New Jersey 08544-1010 Mortimer Mishkin, Ph.D. Chief, Laboratory of Neuropsychology National Institute of Mental Health 9000 Rockville Pike Building 49, Room 1B80 Bethesda, Maryland 20892 Marcus E. Raichle, M.D. Professor of Neurology and Radiology Division of Radiation Sciences Washington University School of Medicine Campus Box 8225 510 S. Kingshighway Boulevard St. Louis, Missouri 63110 Endel Tulving, Ph.D. Tanenbaum Chair in Cognitive Neuroscience Rotman Research Institute of Baycrest Centre 3560 Bathurst Street North York, Ontario M6A 2E1 Canada  From dhw at santafe.edu Tue Aug 17 21:26:08 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 17 Aug 93 19:26:08 MDT Subject: Yet more on averaging Message-ID: <9308180126.AA02904@zia> In several recent e-mail conversations, Michael Perrone and I have gotten to where I think we agree with each other in substance, although we disagree a bit on emphasis. To complete the picture for the connectionist community and present the other side to Michael's recent posting: In my back pocket, I have a number. I'll fine you according to the squared difference between your guess for the number and its actual value. Okay, should you guess 3 or 5? Obviously you can't answer. 7 or 5? Same response. 5 or a random sample of 3 or 7? Now, as Michael points out, you *can* answer: 5. However I'm not as convinced as Michael that this actually tells us anything of practical use. How should you use this fact to help you guess the number in my back pocket? Seems to me you can't. The bottom line, as I see it: arguments like Michael's show that one should always use a single-valued learning algorithm rather than a stochastic one. (Subtle caveat: If used only once, there is no difference between a stochastic learning algorithm and a single-valued one; multiple trials are implicitly assumed here.) But if one has before one a smorgasbord of single-valued learning algorithms, one can not infer that one should average over them. Even if I choose amongst them in a really stupid way (say according to the alphabetical listing of their creators), *so long as I am consistent and single-valued in how I make my choice*, I have no assurance that doing this will give worse results than averaging them. To sum it up: one can not prove averaging to be preferable to a scheme like using the alphabet to pick.
Michael's result shows instead that averaging the guess is better (over multiple trials) than randomly picking amongst the guesses. Which simply means that one should not randomly pick amongst the guesses. It does *not* mean that one should average rather than use some other (arbitrarilly silly) single-valued scheme. David Wolpert Disclaimer: All the above notwithstanding, I personally *would* use some sort of averaging scheme in practice. The only issue of contention here is what is *provably* the way one should generalize. In addition to disseminating the important result concerning the sub-optimality of stochastic schems (of which there are many in the neural nets community!), Michael is to be commended for bringing this entire fascinating subject to the attention of the community.  From tmb at idiap.ch Wed Aug 18 02:27:58 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Wed, 18 Aug 93 08:27:58 +0200 Subject: Yet more on averaging Message-ID: <9308180627.AA18505@idiap.ch> In reply to dhw at santafe.edu: |The bottom line, as I see it: arguments like Michael's show that one |should always use a single-valued learning algorithm rather than a |stochastic one. From tmb at idiap.ch Wed Aug 18 02:29:42 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Wed, 18 Aug 93 08:29:42 +0200 Subject: Yet more on averaging Message-ID: <9308180629.AA18508@idiap.ch> dhw at santafe.edu writes: |To sum it up: one can not prove averaging to be preferable to a scheme |like using the alphabet to pick. Michael's result shows instead that |averaging the guess is better (over multiple trials) than randomly |picking amongst the guesses. | |Which simply means that one should not randomly pick amongst the |guesses. It does *not* mean that one should average rather than use |some other (arbitrarilly silly) single-valued scheme. I would like to strengthen this point a little. In general, averaging is clearly not optimal, nor even justifiable on theoretical grounds. For example, let us take the classification case and let us assume that each neural network $i$ returns an estimate $p^i_j(x)$ of the probability that the object belongs to class $j$ given the measurement $x$. Consider now the case in which we know that the predictions of those networks are statistically independent (for example, because they are run on independent parts of the input data). Then, we should really multiply the probabilities estimated by each network, rather than computing a weighted sum. That is, we should make a decision according to the maximum of $\prod_i p^i_j(x)$, not according to the maximum of $\sum_i w_i p^i_j(x)$ (assuming a 0-1 loss function). As another example, consider the case in which we have an odd number of experts. If they are trained and designed individually in a particularly peculiar way, it might turn out that the optimal decision rule is to output class 1 if an odd number of them pick class 1, and pick class 0 otherwise. Now, Michael probably limits the scope of his claims in his thesis to exclude such cases (I only had a brief look, I must admit), but I think it is important to make the point that, without some additional assumptions, averaging is just a heuristic and not necessarily optimal. Still, linear combinations of the outputs of classifiers, regressors, and networks seem to be useful in practice for improving classification rates in many cases. Lots of practical experience in both statistics and neural networks points in that direction. Thomas.  
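[A small sketch, in Python/NumPy, of the distinction Thomas draws above. The probability estimates are invented for illustration; the point is only that when the experts are independent, the product rule and a weighted average can even disagree about the class.]

    import numpy as np

    # p[i, j] = expert i's estimate of the probability of class j, for one input x.
    p = np.array([[0.80, 0.20],
                  [0.80, 0.20],
                  [0.02, 0.98]])

    avg = p.mean(axis=0)               # simple (equal-weight) averaging
    prod = p.prod(axis=0)              # product rule for independent experts
    prod = prod / prod.sum()           # renormalize

    print("average :", avg, "-> class", avg.argmax())    # picks class 0
    print("product :", prod, "-> class", prod.argmax())  # picks class 1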
From dhw at santafe.edu Wed Aug 18 18:37:32 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Wed, 18 Aug 93 16:37:32 MDT Subject: Random vs. single-valued rules Message-ID: <9308182237.AA03709@zia> tmb writes: >>>>> In reply to dhw at santafe.edu: |The bottom line, as I see it: arguments like Michael's show that one |should always use a single-valued learning algorithm rather than a |stochastic one. >From context, I'm assuming that you are referring to "deterministic" vs. "randomized" decision rules, as they are called in decision theory ("stochastic learning algorithm" means something different to me, but maybe I'm just misinterpreting your posting). Picking an opinion from a pool of experts randomly is clearly not a particularly good randomized decision rule in most cases. However, there are cases in which properly chosen randomized decision rules are important (any good introduction on Bayesian statistics should discuss this). Unless there is an intelligent adversary involved, such cases are probably mostly of theoretical interest, but nonetheless, a randomized decision rule can be "better" than any deterministic one. >>>> Implicit in my statement was the context of Michael Perrone's posting (which I was responding to): convex loss functions, and the fact that in particular, one "single-valued learning algorithm" one might use is the one Michael advocates: average over your pool of experts. Obviously one can choose a single-valued learning algorithm which performs more poorly than randomly drawing from a pool of experts: 1) One can prove that (for convex loss) averaging over the pool is preferable to randomly sampling the pool (Michael's result; note assumptions about lack of correlations between the experts and the like apply.) 2) One can not prove that averaging beats any other single-valued use of the experts. 3) Note that neither (1) nor (2) contradict the assertion that there might be single-valued algorithms which perform worse than randomly sampling the pool. 4) For the case of a 0-1 loss function, and a uniform prior over target functions, it doesn't matter how you guess; all algorithms perform the same, both averaged over data and for one particular data (as far as off-training set average loss is concerned). David Wolpert  From tmb at idiap.ch Thu Aug 19 09:17:14 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Thu, 19 Aug 93 15:17:14 +0200 Subject: Yet more on averaging In-Reply-To: <9308180629.AA18508@idiap.ch> References: <9308180629.AA18508@idiap.ch> Message-ID: <9308191317.AA22756@idiap.ch> I wrote, in response to a discussion of Michael Perrone's work: |In general, averaging is clearly not optimal, nor even justifiable on |theoretical grounds. [... some examples follow...] Judging from some private mail that I have been receiving, some people seem to have misunderstood my message. I wasn't making a statement about Michael's results per se, but about their application. In particular, in the case of combining estimates of probabilities by different "experts" for subsequent classification (e.g., in Michael's OCR example), or in the case of combining expert "votes", using any kind of linear combination is not justifiable in general on theoretical grounds, and it is actually provably suboptimal in some cases. Now, such examples do violate some of the assumptions on which Michael's results rely, so there is no contradiction. 
My message was only intended as a reminder that there are a number of important problems in which the assumptions actually are violated, and in which the approach of linear combinations reduces to a heuristic (one, I might add, that often does work well in practice). Thomas.  From brandyn at brainstorm.com Fri Aug 20 03:32:18 1993 From: brandyn at brainstorm.com (Brandyn) Date: Fri, 20 Aug 93 00:32:18 PDT Subject: Paper available on neuroprose Message-ID: <9308200732.AA14000@brainstorm.com> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/webb.furf.ps.Z The following paper is now available by anonymous FTP: Fusion-Reflection (Self-Supervised Learning) Brandyn Jerad Webb brandyn at brainstorm.com ABSTRACT By analyzing learning from the perspective of knowledge acquisition, a number of common limitations are overcome. Modeling efficacy is proposed as an empirical measure of knowledge, providing a concrete, mathematical means of "acquiring knowledge" via gradient ascent. A specific network architecture is described, a hierarchical analog of node-labeled Hidden Markov Models, and its evaluation and learning laws are derived. In empirical studies using a hand-printed character recognition task, an unsupervised network was able to discover n-gram statistics from groups of letter images, and to use these statistics to enhance its ability to later identify individual letters. Host: archive.cis.ohio-state.edu (128.146.8.52) Directory: pub/neuroprose Filename: webb.furf.ps.Z A version of this paper was submitted to NIPS in May '93. If there is sufficient interest, and if it wouldn't violate neuroprose etiquette, I could possibly make the C code available as well. -Brandyn (brandyn at brainstorm.com)  From mikewj at signal.dra.hmg.gb Fri Aug 20 12:00:02 1993 From: mikewj at signal.dra.hmg.gb (mikewj@signal.dra.hmg.gb) Date: Fri, 20 Aug 93 17:00:02 +0100 Subject: Practical Neural Nets Conference & Workshops in UK. Message-ID: AA16188@ravel.dra.hmg.gb *********************************** NEURAL COMPUTING APPLICATIONS FORUM *********************************** ***************************************** PRACTICAL APPLICATIONS OF NEURAL NETWORKS CALL FOR PRESENTATIONS ***************************************** The Neural Computing Applications Forum is the primary meeting place for people developing Neural Network applications in industry and academia. It has 200 members in the UK and Europe, from Universities and small and large companies, and holds four main meetings each year. It has been running for three years. Presentations, tutorials, and workshops are sought on all practical aspects of Neural Computing and Pattern Recognition. Previous events have included presentations and workshops on practical issues including machine health monitoring, neural control, financial prediction, chemical structure analysis, power station load prediction, copyright law, alternative energy, automatic speech recognition, and human-computer interaction. We also hold introductory tutorials and theoretical workshops on all aspects of Neural computing. Presentations at NCAF do not require a written paper for publication. You will have the chance to draw the attention of the top industrial Neural Network practitioners to your work. conference presenters of outstanding quality will be invited to submit a paper to the Springer Verlag journal Neural Computing and Applications. 
Please contact Mike Wynne-Jones, Programme Organiser, NCAF, PO Box 62, Malvern, WR14 4NU, UK, enclosing your proposed title and a brief synopsis of your presentation. Email: mikewj at signal.dra.hmg.gb; phone +44 684 563858.  From shashem at ecn.purdue.edu Sat Aug 21 18:08:11 1993 From: shashem at ecn.purdue.edu (Sherif Hashem) Date: Sat, 21 Aug 93 17:08:11 -0500 Subject: Combining (averaging) NNs Message-ID: <9308212208.AA18678@cornsilk.ecn.purdue.edu> I have recently joined Connectionists and I read some of the email messages arguing about combining/averaging NNs. Unfortunately, I missed the earlier discussion that started this argument. I am interested in combining NNs, in fact, my Ph.D. thesis is about optimal linear combinations of NNs. Averaging a number of estimators has been suggested/debated/examined in the literature for a long time, dating as far as 1818 (Laplace 1818). Clemen (1989) cites more than 200 papers in his review of the literature related to combining forecasts (estimators), including contributions from forecasting, psychology, statistics, and management science literatures. Numerous empirical studies have been conducted to assess the benefits/limitations of combining estimators (Clemen 1989). Besides, there are quite a few analytical results established in the area. Most of these studies and results are in the forecasting literature (more than 100 publications in the last 20 years). I think that it is fair to say that, as long as no "absolute" best estimator can be identified, combining estimators may provide a superior alternative to picking the best from a population of estimators. I have published some of my preliminary results on the benefits of combining NNs in (Hashem and Schmeiser 1992, 1993a, and Hashem et al. 1993b), and based on my experience with combining NNs, I join Michael Perrone in advocating the use of combining NNs to enhance the estimation accuracy of NN based models. Sherif Hashem email:shashem at ecn.purdue.edu References: ----------- Clemen, R.T. (1989). Combining Forecasts: A Review and Annotated Bibliography. International Journal of Forecasting, Vol. 5, pp. 559-583. Hashem, S., Y. Yih, & B. Schmeiser (1993b). An Efficient Model for Product Allocation using Optimal Combinations of Neural Networks. In Intelligent Engineering Systems through Artificial Neural Networks, Vol. 3, C. Dagli, L. Burke, B. Femandez, & J. Ghosh (Eds.), ASME Press, forthcoming. Hashem, S., & B. Schmeiser (1993a). Approximating a Function and its Derivatives using MSE-Optimal Linear Combinations of Trained Feedforward Neural Networks. Proceedings of the World Congress on Neural Networks, Lawrence Erlbaum Associates, New Jersey, Vol. 1, pp. 617-620. Hashem, S., & B. Schmeiser (1992). Improving Model Accuracy using Optimal Linear Combinations of Trained Neural Networks, Technical Report SMS92-16, School of Industrial Engineering, Purdue University. (Submitted) Laplace P.S. de. (1818). Deuxieme Supplement a la Theorie Analytique des Probabilites (Courcier, Paris).; reprinted (1847) in Oeuvers Completes de Laplace, Vol. 7 (Paris, Gauthier-Villars) 531-580.  
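As a concrete footnote to the discussion of optimal linear combinations above: given a population of trained estimators and some held-out data, MSE-optimal combination weights can be obtained by ordinary least squares, regressing the held-out targets on the estimators' outputs. The sketch below is a generic illustration, not Hashem's or Perrone's actual procedure; the toy data, the three crude "estimators", and the choice to include an intercept are assumptions made only for this example. Simple averaging, or weighting each estimator inversely to its error variance, correspond to fixing the weights in advance rather than fitting them.

  # Generic sketch: fit MSE-optimal linear combination weights on held-out data.
  # F is an (n_samples, n_models) matrix of model outputs, y the targets;
  # all data and "models" below are toy placeholders.
  import numpy as np

  def combination_weights(F, y):
      """Weights and intercept minimizing ||F @ w + c - y||^2."""
      A = np.column_stack([F, np.ones(len(F))])
      coef, *_ = np.linalg.lstsq(A, y, rcond=None)
      return coef[:-1], coef[-1]

  rng = np.random.default_rng(1)
  x = rng.uniform(-1.0, 1.0, 200)
  y = np.sin(3.0 * x)                                   # toy target function
  F = np.column_stack([x, x ** 2, np.sin(2.5 * x)])     # three crude "estimators"

  w, c = combination_weights(F, y)
  combined = F @ w + c
  print("weights:", w, "intercept:", c)
  print("MSE of fitted combination:", np.mean((combined - y) ** 2))
  print("MSE of simple average    :", np.mean((F.mean(axis=1) - y) ** 2))

Because numpy's lstsq solves the problem through a singular value decomposition, collinear or even duplicated estimators in the pool cause no trouble, and the fitted combination can only match or improve on the simple average on the data it was fitted to; as the discussion above makes clear, any advantage still has to be confirmed on data not used for the fit.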
From furu at uchikawa.nuem.nagoya-u.ac.jp Mon Aug 23 11:22:41 1993 From: furu at uchikawa.nuem.nagoya-u.ac.jp (Takeshi Furuhashi) Date: Mon, 23 Aug 93 11:22:41 JST Subject: Call for Papers of WWW Message-ID: <9308230222.AA00124@cancer.uchikawa.nuem.nagoya-u.ac.jp> CALL FOR PAPERS TENTATIVE 1994 IEEE/Nagoya University World Wisemen/women Workshop(WWW) ON FUZZY LOGIC AND NEURAL NETWORKS/GENETIC ALGORITHMS -Architecture and Applications for Knowledge Acquisition/Adaptation- August 9 and 10, 1994 Nagoya University Symposion Chikusa-ku, Nagoya, JAPAN Sponsored by Nagoya University Co-sponsored by IEEE Industrial Electronics Society Technically Co-sponsored by IEEE Neural Network Council IEEE Robotics and Automation Society International Fuzzy Systems Association Japan Society for Fuzzy Theory and Systems North American Fuzzy Information Processing Society Society of Instrument and Control Engineers Robotics Society of Japan There are growing interests in combination technologies of fuzzy logic and neural networks, fuzzy logic and genetic algorithm for acquisition of experts' knowledge, modeling of nonlinear systems, realizing adaptive systems. The goal of the 1994 IEEE/Nagoya University WWW on Fuzzy Logic and Neural Networks/Genetic Algorithm is to give its attendees opportunities to exchange information and ideas on various aspects of the Combination Technologies and to stimulate and inspire pioneering works in this area. To keep the quality of these workshop high, only a limited number of people are accepted as participants of the workshops. The papers presented at the workshop will be edited and published from the Oxford University Press. TOPICS: Combination of Fuzzy Logic and Neural Networks, Combination of Fuzzy Logic and Genetic Algorithm, Learning and Adaptation, Knowledge Acquisition, Modeling, Human Machine Interface IMPORTANT DATES: Submission of Abstracts of Papers : April 31, 1994 Acceptance Notification : May 31, 1994 Final Manuscript : July 1, 1994 A partial or full assistance of travel expenses for speakers of excellent papers will be provided by the WWW. The candidates should apply as soon as possible, preferably by Jan. 30, '94 All correspondence and submission of papers should be sent to Takeshi Furuhashi, General Chair Dept. of Information Electronics, Nagoya University Furo-cho, Chikusa-ku, Nagoya 464-01, JAPAN TEL: +81-52-781-5111 ext.2792 FAX: +81-52-781-9263 E mail: furu at uchikawa.nuem.nagoya-u.ac.jp IEEE/Nagoya University WWW: IEEE/Nagoya University WWW(World Wisemen/women Workshop) is a series of workshops sponsored by Nagoya University and co-sponsored by IEEE Industrial Electronics Society. City of Naoya, located two hours away from Tokyo, has many electro-mechanical industries in its surroundings such as Mitsubishi, TOYOTA, and their allied companies. Nagoya is a mecca of robotics industries, machine industries and aerospace industries in Japan. The series of workshops will give its attendees opportunities to exchange information on advanced sciences and technologies and to visit industries and research institutes in this area. *This workshop will be held just after the 3rd International Conference on Fuzzy Logic, Neural Nets and Soft Computing(IIZUKA'94) from Aug. 1 to 7, '94. 
WORKSHOP ORGANIZATION Honorary Chair: Tetsuo Fujimoto (Dean, School of Engineering, Nagoya University) General Chair: Takeshi Furuhashi (Nagoya University) Advisory Committee: Chair: Toshio Fukuda (Nagoya University) Fumio Harashima (University of Tokyo) Yoshiki Uchikawa (Nagoya University) Takeshi Yamakawa (Kyushu Institute of Technology) Steering Committee: H.Berenji (NASA Ames Research Center) W.Eppler (University of Karlsruhe) I.Hayashi (Hannan University) Y.Hayashi (Ibaraki University) H.Ichihashi (Osaka Prefectural University) A.Imura (Laboratory for International Fuzzy Engineering) M.Jordan (Massachusetts Institute of Technology) C.-C.Jou (National Chiao Tung Universtiy) E.Khan (National Semiconductor) R.Langari (Texas A & M University) H.Takagi (Matsushita Electric Industrial Co., Ltd.) K.Tanaka (Kanazawa University) M.Valenzuela-Rendon (Institute Tecnologico y de Estudios Superiores de Monterrey) L.-X.Wang (University of California Berkeley) T.Yamaguchi (Utsunomiya University) J.Yen (Texas A & M Universtiy)  From joachim at fit.qut.edu.au Wed Aug 25 21:46:11 1993 From: joachim at fit.qut.edu.au (Joachim Diederich) Date: Wed, 25 Aug 1993 21:46:11 -0400 Subject: Second Brisbane Neural Network Workshop Message-ID: <199308260146.VAA09819@fitmail.fit.qut.edu.au> Second Brisbane Neural Network Workshop --------------------------------------- Queensland University of Technology Brisbane Q 4001, AUSTRALIA Gardens Point Campus, ITE 410 24 September 1993 This Second Brisbane Neural Network Workshop is intended to bring together those interested in neurocomputing and neural network applications. The objective of the workshop is to provide a discussion platform for researchers and practitioners interested in theoretical and applied aspects of neurocomputing. The workshop should be of interest to computer scientists and engineers, as well as to biologists, cognitive scientists and others interested in the application of neural networks. The Second Brisbane Neural Network Workshop will be held at Queensland University of Technology, Gardens Point Campus (ITE 410) on September 24, 1993 from 9:00am to 6:00pm. 
Program ------- 9:00-9:15 Welcome Joachim Diederich, Queensland University of Technology, Neurocomputing Research Concentration Area Cognitive Science ----------------- 9:15-10:00 Graeme Halford, University of Queensland, Department of Psychology "Representation of concepts in PDP models" 10:00-10:30 Joachim Diederich, Queensland University of Technology, Neurocomputing Research Concentration Area "Re-learning in connectionist semantic networks" 10:30-11:00 Coffee Break 11:00-11:30 James Hogan, Queensland University of Technology, Neurocomputing Research Concentration Area "Recruitment learning in randomly connected neural networks" 11:30-12:00 Kate Stevens, University of Queensland, Department of Psychology "Music perception and neural network modelling" 12:00-1:00 Lunch Break 1:00-1:30 Software Demonstration: "Animal breeding advice using neural networks" Learning -------- 1:30-2:15 Tom Downs, University of Queensland, Department of Electrical Engineering "Generalisation, structure and learning in artificial neural networks" 2:15-3:00 Ah Chung Tsoi, University of Queensland, Department of Electrical Engineering "Training algorithms for recurrent neural networks, a unified framework" 3:00-3:30 Steven Young, University of Queensland, Department of Electrical Engineering "Constructive algorithms for neural networks" 3:30-4:00 Coffee Break Pattern Recognition and Control ------------------------------- 4:00-4:30 Gerald Finn, Queensland University of Technology, Neurocomputing Research Concentration Area "Learning fuzzy rules by genetic algorithms" 4:30-5:00 Paul Hannah & Russel Stonier, University of Central Queensland, Department of Mathematics and Computing "Using a modified Kohonen associative map for function approximation with application to control" Theory and Artificial Intelligence ---------------------------------- 5:00-5:30 M. Mohammadian, X. Yu & J.D. Smith, University of Central Queensland, Department of Mathematics and Computing "From connectionist learning to an optimised fuzzy knowledge base" 5:30-6:00 Richard Bonner & Louis Sanzogni, Griffith University, School of Information Systems & Management Science "Embedded neural networks" All are welcome. Participation is free and there is no registration. Enquiries should be sent to Professor Joachim Diederich Neurocomputing Research Concentration Area School of Computing Science Queensland University of Technology GPO Box 2434 Brisbane Q 4001 Australia Phone: +61 7 864-2143 Fax: +61 7 864-1801 Email: joachim at fitmail.fit.qut.edu.au  From sims at pdesds1.scra.org Thu Aug 26 11:48:04 1993 From: sims at pdesds1.scra.org (Jim Sims) Date: Thu, 26 Aug 93 11:48:04 EDT Subject: fyi, late, but better than never Message-ID: <9308261548.AA07086@pdesds1.noname> I saw this while browsing the electronic CBD materils. Agency : NAS Deadline : 12/01/93 Title : Neurolab Reference: Commerce Business Dailly, 07/06/93 BASIC RESEARCH OPPORTUNITY SOL OA SLS-4 POC Dr. Frank Sulzman tel: 202/358-2359 The National Aeronautics and Space Administration (NASA), along with its domestic (NIH, NSF) and international (CNES, CSA, DARA, ESA, NASDA) partners is soliciting proposals for Neurolab, a Space Shuttle mission dedicated to brain and behavior research that is scheduled for launch in 1998. A more detailed description of the opportunity with specific guidelines for proposal preparation is available from Neurolab Program Scientist, NASA Headquarters, Code UL, 300 E St., SW, Washington, DC 20546. 
This NASA Announcement of Opportunity will be open for the period through December 1, 1993. (0182) SPONSOR: NASA Headquarters, Code UL/Neurolab Program Scientist, Washington, DC 20546 Attn:UL/Dr. Frank Sulzman  From PIURI at IPMEL1.POLIMI.IT Fri Aug 27 07:55:19 1993 From: PIURI at IPMEL1.POLIMI.IT (PIURI@IPMEL1.POLIMI.IT) Date: 27 Aug 1993 12:55:19 +0100 (MET) Subject: call for papers Message-ID: <01H28OFZ9KS291WC7T@icil64.cilea.it> ============================================================================= 14th IMACS WORLD CONGRESS ON COMPUTATION AND APPLIED MATHEMATICS July 11-15, 1994 Atlanta, Georgia, USA Sponsored by: IMACS - International Association for Mathematics and Computers in Simulation IFAC - International Federation for Automatic Control IFIP - International Federation for Information Processing IFORS - International Federation of Operational Research Societies IMEKO - International Measurement Confederation General Chairman: Prof. W.F. Ames Georgia Institute of Technology, Atlanta, GA, USA SESSIONS ON NEURAL NETWORKS 1. NEURAL NETWORK ARCHITECTURES AND IMPLEMENTATIONS 2. APPLICATION OF NEURAL TECHNIQUES FOR SIGNAL AND IMAGE PROCESSING >>>>>> CALL FOR PAPERS <<<<<< The IMACS World Congress on Computation and Applied Mathematics is held every three year to provide a large general forum to professionals and scientists for analyzing and discussing the fundamental advances of research in all areas of scientific computation, applied mathematics, mathematical modelling, and system simulation in and for specific disciplines, the philosophical aspects, and the impact on society and on disciplinary and interdisciplinary research. In the 14th edition, two sessions are planned on neural networks: "Neural Network Architectures and Implementations" and "Application of Neural Techniques for Signal and Image Processing". The first session will focus on all theoretical and practical aspects of architectural design and realization of neural networks: from mathematical analysis and modelling to behavioral specification, from architectural definition to structural design, from VLSI implementation to software emulation, from design simulation at any abstraction level to CAD tools for neural design, simulation and evaluation. The second session will present the concepts, the design and the use of neural solutions within the area of signal and image processing, e.g., for modelling, identification, analysis, classification, recognition, and filtering. Particular emphasis will be given to presentation of specific applications or application areas. Authors interested in the above neural sessions are invited to send a one page abstract, the title of the paper and the author's address by electronic mail, fax or postal mail to the Neural Sessions' Chairman by October 15, 1993. Authors must then submit five copies of their typed manuscript by postal mail or fax to the Neural Sessions' Chairman by November 19, 1993. Preliminary notification of acceptance/rejection will be mailed by November 30, 1993. Final acceptance/rejection will be mailed by January 31, 1994. Neural Sessions' Chairman: Prof. Vincenzo Piuri Department of Electronics and Information Politecnico di Milano piazza L. da Vinci 32 I-20133 Milano, Italy phone no. +39-2-23993606, +39-2-23993623 fax no. 
+39-2-23993411 e-mail piuri at ipmel1.polimi.it =============================================================================  From goodman at unr.edu Thu Aug 26 12:35:53 1993 From: goodman at unr.edu (Phil Goodman) Date: Thu, 26 Aug 93 16:35:53 GMT Subject: NevProp 1.16 Update Available Message-ID: <9308262335.AA24854@equinox.ccs.unr.edu> Please consider the following update announcement: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * NevProp 1.16 corrects a bug in the output range of symmetric sigmoids and one occuring when the number of testing is fewer than training cases. These fixes are further described in the README.CHANGES file at the UNR anonymous ftp, described below. The UNR anonymous ftp host is 'unssun.scs.unr.edu', and the files are in the directory 'pub/goodman/nevpropdir'. Version 1.15 users can update 3 ways: a. Just re-ftp the 'nevprop1.16.shar' file and unpack and 'make' np again. (also available at the CMU machine, describe below.) b. Just re-ftp (in "binary" mode) the DOS or MAC executable binaries located in the 'dosdir' or 'macdir' subdirectories, respectively. c. Ftp only the 'np.c' file provided, replacing your old version, then 'make' d. Ftp only the 'np-patchfile', then issue the command 'patch < np-patchfile' to locally update np.c, then 'make' again. New users can obtain NevProp 1.16 from the anonymous UNR anonymous ftp as described in (a) or (b) above, or from the CMU machine: a. Create an FTP connection from wherever you are to machine "ftp.cs.cmu.edu". The internet address of this machine is 128.2.206.173, for those who need it. b. Log in as user "anonymous" with your own ID as password. You may see an error message that says "filenames may not have /.. in them" or something like that. Just ignore it. c. Change remote directory to "/afs/cs/project/connect/code". NOTE: You must do this in a single operation. Some of the super directories on this path are protected against outside users. d. At this point FTP should be able to get a listing of files in this directory with "dir" & fetch the ones you want with "get". (The exact FTP commands depend on your local FTP server.) Version 1.2 will be released soon. A major new feature will be the option of using cross-entropy rather than least squares error function. Phil ___________________________ ___________________________ Phil Goodman,MD,MS goodman at unr.edu | __\ | _ \ | \/ || _ \ Associate Professor & CBMR Director || ||_// ||\ /||||_// Cardiovascular Studies Team Leader || | _( || \/ ||| _( ||__ ||_\\ || |||| \\ CENTER for BIOMEDICAL MODELING RESEARCH |___/ |___/ || |||| \\ University of Nevada School of Medicine Washoe Medical Center H1-166, 77 Pringle Way, Reno, NV 89520 702-328-4867 FAX:328-4111 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *  From heiniw at sun1.eeb.ele.tue.nl Fri Aug 27 08:37:13 1993 From: heiniw at sun1.eeb.ele.tue.nl (Heini Withagen) Date: Fri, 27 Aug 1993 14:37:13 +0200 (MET DST) Subject: Neural hardware performance criteria Message-ID: <9308271237.AA00409@sun1.eeb.ele.tue.nl> A non-text attachment was scrubbed... 
Name: not available Type: text Size: 1296 bytes Desc: not available Url : https://mailman.srv.cs.cmu.edu/mailman/private/connectionists/attachments/00000000/c47cdf08/attachment.ksh From alex at brain.physics.swin.oz.au Sat Aug 28 07:05:34 1993 From: alex at brain.physics.swin.oz.au (Alex A Sergejew) Date: Sat, 28 Aug 93 21:05:34 +1000 Subject: Pan Pacific Conf on Brain Electric Topography - 1st announcement Message-ID: <9308281105.AA12138@brain.physics.swin.oz.au> FIRST ANNOUNCEMENT PAN PACIFIC CONFERENCE ON BRAIN ELECTRIC TOPOGRAPHY February 10 - 12, 1994 SYDNEY, AUSTRALIA INVITATION Brain electric and magnetic topography is an exciting emerging area which draws on the disciplines of neurophysiology, physics, signal processing, computing and cognitive neuroscience. This conference will offer a forum for the presentation of recent findings. The program will include an outstanding series of plenary lectures, as well as platform and poster presentations by active participants in the field. The conference includes two major plenary sessions. In the Plenary Session entitled "Brain Activity Topography and Cognitive Processes," the keynote speakers include Frank Duffy (Boston), Alan Gevins (San Francisco), Steven Hillyard (La Jolla), Yoshihiko Koga (Tokyo) and Paul Nunez (New Orleans). Keynote speakers for the Plenary Session entitled "Brain Rhythmic Activity and States of Consciousness," will include Walter Freeman (Berkeley), Rodolfo Llinas (New York), Shigeaki Matsuoka (Kitakyushu) and Yuzo Yamaguchi (Osaka). The plenary sessions will provide a forum for discussion of some of the most recent developments of analysis and models of electrical brain function, and findings of brain topography and cognitive processes. This conference is aimed at harnessing multidisciplinary participation and will be of interest to those working in the areas of clinical neurophysiology, cognitive neuroscience, biological signal processing, neurophysiology, neurology, neuropsychology and neuropsychiatry. CALL FOR PAPERS Papers are invited for platform and poster presentation. Platform presentations will be allocated 20 minutes (15 mins for presentation and 5 mins for questions). Abstracts of no more than 300 words are invited. The deadline for receipt of abstracts is November 10th, 1993, while notification of acceptance of abstracts will be sent on December 10th, 1993 The abstract can be sent by mail, Fax or Email to: PAN PACIFIC CONFERENCE ON BRAIN ELECTRIC TOPOGRAPHY C/- Cognitive Neuroscience Unit Westmead Hospital, Hawkesbury Road Westmead NSW 2145, Sydney AUSTRALIA Fax : +61 (2) 635 7734 Tel : +61 (2) 633 6688 Email : pan at brain.physics.swin.oz.au Authors may be invited to provide full manuscripts for publication of the proceedings in CD-ROM and book form. All authors wishing to have their papers included must supply a full manuscript at the time of the conference. GENERAL INFORMATION: Date: February 10 - 12, 1994 Venue: The conference will be held at the Hotel Intercontinental on Sydney Harbour. Climate: February is summertime in Australia and the average maximum day-time temperature in Sydney is 26 degC (78 degF). Social Programme: There will be a conference dinner on a yacht sailing Sydney Harbour on February 11th, 1994. Cost $A65 per person. Hotel Accommodation: Hotels listed offer a range of accommodation at special conference rates. Please quote the name of the conference when arranging your booking. 
Scientific Committee: Organising Committee: Prof Richard Silberstein, Melbourne (Chairman) E Gordon (Chairman) A/Prof Helen Beh, Sydney R Silberstein Dr Evian Gordon, Sydney J Restom Dr Shigeaki Matsuoka, Kitakyushu Dr Patricia Michie, Sydney Dr Ken Nagata, Akita Dr Alex Sergejew, Melbourne A/Prof James Wright, Auckland REGISTRATION: Name(Prof/Dr/Ms/Mr):__________________________________________________ Address:______________________________________________________________ ______________________________________________________________________ ______________________________________________________________________ Telephone: ______________________________ (include country/area code) Fax:______________________________ E Mail______________________________ On or before November 10th, 1993 $A380.00 After November 10th, 1993 $A400.00 Students before November 10th,1993 $A250.00 Conference Harbour Cruise Dinner $A65.00 per person number of people _____ Method of Payment: Cheque _ MasterCard _ VISA _ BankCard _ To be completed by credit card users only: Card Number _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Expiration Date __________________________ Signature __________________________ (Signature not required if registering by E-mail) Date __________________________ Cheques should be payable to "Pan Pacific Conference" (Address below) SOME SUGGESTIONS FOR HOTEL ACCOMODATION Special conference rates apply. Quote the name of the conference when booking. Prices are per double room per night SYDNEY RENAISSANCE HOTEL***** Guaranteed harbour view. 10 min walk under cover. $A170.00 30 Pitt St, Sydney NSW 2000, Australia. Ph: +61 (2) 259 7000 Fax +61 (2) 252 1999 HOTEL INTERCONTINENTAL SYDNEY***** Harbour view $A205.00 City View $A165.00 117 Macquarie Street, Sydney NSW 2000, Australia. Ph: +61 (2) 230 0200 Fax: +61 (2) 240 1240 OLD SYDNEY PARKROYAL**** 10 min walk. $A190.00 including breakfast 55 George St, Sydney NSW 2000, Australia. Ph: +61 (2) 252 0524 Fax: (2) +61 251 2093 RAMADA GRAND HOTEL, BONDI BEACH**** Complementary shuttlebus service. $A130 - $A170 including breakfast Beach Rd, Bondi Beach NSW 2026, Australia. Ph: +61 (2) 365 5666 Fax: +61 (2) 3655 330 HOTEL CRANBROOK INTERNATIONAL*** Older style, budget type accomodation overlooking Rose Bay. Free shuttlebus service and airport transfers. $A80.00 including breakfast 601 New South Head Rd, Rose Bay NSW 2020, Australia. Ph: +61 (2) 252 0524 Fax: +61 (2) 251 2093 Post registration details with your cheque to: PAN PACIFIC CONFERENCE ON ELECTRIC BRAIN TOPOGRAPHY C/- Cognitive Neuroscience Unit Westmead Hospital, Hawkesbury Road Westmead NSW 2145, Sydney AUSTRALIA  From taylor at world.std.com Sun Aug 29 22:21:27 1993 From: taylor at world.std.com (Russell R Leighton) Date: Sun, 29 Aug 1993 22:21:27 -0400 Subject: AM6 Users: release notes and bug fixes available Message-ID: <199308300221.AA27236@world.std.com> There has been an update to the am6.notes file at the AM6 ftp sites. User's not on the AM6 users mailing list should get this file and update their installation. Russ ======== REPOST OF AM6 RELEASE (long) ======== The following describes a neural network simulation environment made available free from the MITRE Corporation. The software contains a neural network simulation code generator which generates high performance ANSI C code implementations for modular backpropagation neural networks. Also included is an interface to visualization tools. 
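Since the announcement that follows centers on generated backpropagation simulations, a bare-bones sketch of the computation such a simulation performs may help readers new to the area. This is a generic toy example, in no way Aspirin's generated code or the AM6 interface; the layer sizes, data, and step size are arbitrary assumptions.

  # Generic one-hidden-layer backpropagation toy (squared error, batch updates).
  # NOT Aspirin/MIGRAINES code; all sizes and data are made up.
  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.uniform(-1.0, 1.0, (64, 2))                  # toy inputs
  Y = (X[:, :1] * X[:, 1:] > 0).astype(float)          # toy target: 1 if inputs share a sign

  n_in, n_hid, n_out, lr = 2, 8, 1, 0.5
  W1 = rng.normal(0.0, 0.5, (n_in, n_hid))
  b1 = np.zeros(n_hid)
  W2 = rng.normal(0.0, 0.5, (n_hid, n_out))
  b2 = np.zeros(n_out)
  sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

  for _ in range(2000):
      H = sigmoid(X @ W1 + b1)                         # forward pass
      P = sigmoid(H @ W2 + b2)
      dP = (P - Y) * P * (1 - P) / len(X)              # backward pass (MSE gradient)
      dH = (dP @ W2.T) * H * (1 - H)
      W2 -= lr * (H.T @ dP)
      b2 -= lr * dP.sum(axis=0)
      W1 -= lr * (X.T @ dH)
      b1 -= lr * dH.sum(axis=0)

  print("final mean squared error:", float(np.mean((P - Y) ** 2)))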
FREE NEURAL NETWORK SIMULATOR AVAILABLE Aspirin/MIGRAINES Version 6.0 The Mitre Corporation is making available free to the public a neural network simulation environment called Aspirin/MIGRAINES. The software consists of a code generator that builds neural network simulations by reading a network description (written in a language called "Aspirin") and generates an ANSI C simulation. An interface (called "MIGRAINES") is provided to export data from the neural network to visualization tools. The previous version (Version 5.0) has over 600 registered installation sites world wide. The system has been ported to a number of platforms: Host platforms: convex_c2 /* Convex C2 */ convex_c3 /* Convex C3 */ cray_xmp /* Cray XMP */ cray_ymp /* Cray YMP */ cray_c90 /* Cray C90 */ dga_88k /* Data General Aviion w/88XXX */ ds_r3k /* Dec Station w/r3000 */ ds_alpha /* Dec Station w/alpha */ hp_parisc /* HP w/parisc */ pc_iX86_sysvr4 /* IBM pc 386/486 Unix SysVR4 */ pc_iX86_sysvr3 /* IBM pc 386/486 Interactive Unix SysVR3 */ ibm_rs6k /* IBM w/rs6000 */ news_68k /* News w/68XXX */ news_r3k /* News w/r3000 */ next_68k /* NeXT w/68XXX */ sgi_r3k /* Silicon Graphics w/r3000 */ sgi_r4k /* Silicon Graphics w/r4000 */ sun_sparc /* Sun w/sparc */ sun_68k /* Sun w/68XXX */ Coprocessors: mc_i860 /* Mercury w/i860 */ meiko_i860 /* Meiko w/i860 Computing Surface */ Included with the software are "config" files for these platforms. Porting to other platforms may be done by choosing the "closest" platform currently supported and adapting the config files. New Features ------------ - ANSI C ( ANSI C compiler required! If you do not have an ANSI C compiler, a free (and very good) compiler called gcc is available by anonymous ftp from prep.ai.mit.edu (18.71.0.38). ) Gcc is what was used to develop am6 on Suns. - Autoregressive backprop has better stability constraints (see examples: ringing and sequence), very good for sequence recognition - File reader supports "caching" so you can use HUGE data files (larger than physical/virtual memory). - The "analyze" utility which aids the analysis of hidden unit behavior (see examples: sonar and characters) - More examples - More portable system configuration for easy installation on systems without a "config" file in distribution Aspirin 6.0 ------------ The software that we are releasing now is for creating, and evaluating, feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, as well as at casual users who will wish to use the system with an absolute minimum of effort. Aspirin was originally conceived as ``a way of dealing with MIGRAINES.'' Our goal was to create an underlying system that would exist behind the graphics and provide the network modeling facilities. The system had to be flexible enough to allow research, that is, make it easy for a user to make frequent, possibly substantial, changes to network designs and learning algorithms. At the same time it had to be efficient enough to allow large ``real-world'' neural network systems to be developed. Aspirin uses a front-end parser and code generators to realize this goal. A high level declarative language has been developed to describe a network. This language was designed to make commonly used network constructs simple to describe, but to allow any network to be described. 
The Aspirin file defines the type of network, the size and topology of the network, and descriptions of the network's input and output. This file may also include information such as initial values of weights, names of user defined functions. The Aspirin language is based around the concept of a "black box". A black box is a module that (optionally) receives input and (necessarily) produces output. Black boxes are autonomous units that are used to construct neural network systems. Black boxes may be connected arbitrarily to create large possibly heterogeneous network systems. As a simple example, pre or post-processing stages of a neural network can be considered black boxes that do not learn. The output of the Aspirin parser is sent to the appropriate code generator that implements the desired neural network paradigm. The goal of Aspirin is to provide a common extendible front-end language and parser for different network paradigms. The publicly available software will include a backpropagation code generator that supports several variations of the backpropagation learning algorithm. For backpropagation networks and their variations, Aspirin supports a wide variety of capabilities: 1. feed-forward layered networks with arbitrary connections 2. ``skip level'' connections 3. one and two-dimensional weight tessellations 4. a few node transfer functions (as well as user defined) 5. connections to layers/inputs at arbitrary delays, also "Waibel style" time-delay neural networks 6. autoregressive nodes. 7. line search and conjugate gradient optimization The file describing a network is processed by the Aspirin parser and files containing C functions to implement that network are generated. This code can then be linked with an application which uses these routines to control the network. Optionally, a complete simulation may be automatically generated which is integrated with the MIGRAINES interface and can read data in a variety of file formats. Currently supported file formats are: Ascii Type1, Type2, Type3 Type4 Type5 (simple floating point file formats) ProMatlab Examples -------- A set of examples comes with the distribution: xor: from RumelHart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 330-334. encode: from RumelHart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 335-339. bayes: Approximating the optimal bayes decision surface for a gauss-gauss problem. detect: Detecting a sine wave in noise. iris: The classic iris database. characters: Learing to recognize 4 characters independent of rotation. ring: Autoregressive network learns a decaying sinusoid impulse response. sequence: Autoregressive network learns to recognize a short sequence of orthonormal vectors. sonar: from Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89. spiral: from Kevin J. Lang and Michael J, Witbrock, "Learning to Tell Two Spirals Apart", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988. ntalk: from Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168. perf: a large network used only for performance testing. monk: The backprop part of the monk paper. The MONK's problem were the basis of a first international comparison of learning algorithms. 
The result of this comparison is summarized in "The MONK's Problems - A Performance Comparison of Different Learning algorithms" by S.B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, and J. Zhang has been published as Technical Report CS-CMU-91-197, Carnegie Mellon University in Dec. 1991. wine: From the ``UCI Repository Of Machine Learning Databases and Domain Theories'' (ics.uci.edu: pub/machine-learning-databases). Performance of Aspirin simulations ---------------------------------- The backpropagation code generator produces simulations that run very efficiently. Aspirin simulations do best on vector machines when the networks are large, as exemplified by the Cray's performance. All simulations were done using the Unix "time" function and include all simulation overhead. The connections per second rating was calculated by multiplying the number of iterations by the total number of connections in the network and dividing by the "user" time provided by the Unix time function. Two tests were performed. In the first, the network was simply run "forward" 100,000 times and timed. In the second, the network was timed in learning mode and run until convergence. Under both tests the "user" time included the time to read in the data and initialize the network. Sonar: This network is a two layer fully connected network with 60 inputs: 2-34-60. Millions of Connections per Second Forward: SparcStation1: 1 IBM RS/6000 320: 2.8 HP9000/720: 4.0 Meiko i860 (40MHz) : 4.4 Mercury i860 (40MHz) : 5.6 Cray YMP: 21.9 Cray C90: 33.2 Forward/Backward: SparcStation1: 0.3 IBM RS/6000 320: 0.8 Meiko i860 (40MHz) : 0.9 HP9000/720: 1.1 Mercury i860 (40MHz) : 1.3 Cray YMP: 7.6 Cray C90: 13.5 Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89. Nettalk: This network is a two layer fully connected network with [29 x 7] inputs: 26-[15 x 8]-[29 x 7] Millions of Connections per Second Forward: SparcStation1: 1 IBM RS/6000 320: 3.5 HP9000/720: 4.5 Mercury i860 (40MHz) : 12.4 Meiko i860 (40MHz) : 12.6 Cray YMP: 113.5 Cray C90: 220.3 Forward/Backward: SparcStation1: 0.4 IBM RS/6000 320: 1.3 HP9000/720: 1.7 Meiko i860 (40MHz) : 2.5 Mercury i860 (40MHz) : 3.7 Cray YMP: 40 Cray C90: 65.6 Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168. Perf: This network was only run on a few systems. It is very large with very long vectors. The performance on this network is in some sense a peak performance for a machine. This network is a two layer fully connected network with 2000 inputs: 100-500-2000 Millions of Connections per Second Forward: Cray YMP 103.00 Cray C90 220 Forward/Backward: Cray YMP 25.46 Cray C90 59.3 MIGRAINES ------------ The MIGRAINES interface is a terminal based interface that allows you to open Unix pipes to data in the neural network. This replaces the NeWS1.1 graphical interface in version 4.0 of the Aspirin/MIGRAINES software. The new interface is not a simple to use as the version 4.0 interface but is much more portable and flexible. The MIGRAINES interface allows users to output neural network weight and node vectors to disk or to other Unix processes. 
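To make the connections-per-second figures quoted above concrete, here is the stated calculation worked through for the Sonar benchmark (60 inputs, 34 hidden units, 2 outputs). Counting only the layer-to-layer weights gives 60*34 + 34*2 = 2108 connections; whether the published numbers also count bias connections is not stated, so biases are left out here, and the timing value below is a hypothetical placeholder rather than a measured one.

  # Worked example of the connections-per-second rating described above.
  iterations = 100_000                    # forward passes in the first test
  connections = 60 * 34 + 34 * 2          # Sonar net: 2108 layer-to-layer weights
  user_time_s = 200.0                     # hypothetical Unix "user" time, in seconds
  cps = iterations * connections / user_time_s
  print(cps / 1e6, "million connections per second")   # about 1.05 for these numbers

A machine rated at 1 million connections per second would therefore need roughly 210 seconds of user time for this forward test, and proportionally less for the faster machines in the table.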
Users can display the data using either public or commercial graphics/analysis tools. Example filters are included that convert data exported through MIGRAINES to formats readable by: - Gnuplot 3 - Matlab - Mathematica - Xgobi Most of the examples (see above) use the MIGRAINES interface to dump data to disk and display it using a public software package called Gnuplot3. Gnuplot3 can be obtained via anonymous ftp from: >>>> In general, Gnuplot 3 is available as the file gnuplot3.?.tar.Z >>>> Please obtain gnuplot from the site nearest you. Many of the major ftp >>>> archives world-wide have already picked up the latest version, so if >>>> you found the old version elsewhere, you might check there. >>>> >>>> NORTH AMERICA: >>>> >>>> Anonymous ftp to dartmouth.edu (129.170.16.4) >>>> Fetch >>>> pub/gnuplot/gnuplot3.?.tar.Z >>>> in binary mode. >>>>>>>> A special hack for NeXTStep may be found on 'sonata.cc.purdue.edu' >>>>>>>> in the directory /pub/next/submissions. The gnuplot3.0 distribution >>>>>>>> is also there (in that directory). >>>>>>>> >>>>>>>> There is a problem to be aware of--you will need to recompile. >>>>>>>> gnuplot has a minor bug, so you will need to compile the command.c >>>>>>>> file separately with the HELPFILE defined as the entire path name >>>>>>>> (including the help file name.) If you don't, the Makefile will over >>>>>>>> ride the def and help won't work (in fact it will bomb the program.) NetTools ----------- We have include a simple set of analysis tools by Simon Dennis and Steven Phillips. They are used in some of the examples to illustrate the use of the MIGRAINES interface with analysis tools. The package contains three tools for network analysis: gea - Group Error Analysis pca - Principal Components Analysis cda - Canonical Discriminants Analysis Analyze ------- "analyze" is a program inspired by Denis and Phillips' Nettools. The "analyze" program does PCA, CDA, projections, and histograms. It can read the same data file formats as are supported by "bpmake" simulations and output data in a variety of formats. Associated with this utility are shell scripts that implement data reduction and feature extraction. "analyze" can be used to understand how the hidden layers separate the data in order to optimize the network architecture. How to get Aspirin/MIGRAINES ----------------------- The software is available from two FTP sites, CMU's simulator collection and UCLA's cognitive science machines. The compressed tar file is a little less than 2 megabytes. Most of this space is taken up by the documentation and examples. The software is currently only available via anonymous FTP. > To get the software from CMU's simulator collection: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu" (128.2.254.155). 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "/afs/cs/project/connect/code". Any subdirectories of this one should also be accessible. Parent directories should not be. ****You must do this in a single operation****: cd /afs/cs/project/connect/code 4. At this point FTP should be able to get a listing of files in this directory and fetch the ones you want. Problems? - contact us at "connectionists-request at cs.cmu.edu". 5. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 6. Get the file "am6.tar.Z" 7. Get the file "am6.notes" > To get the software from UCLA's cognitive science machines: 1. 
Create an FTP connection to "ftp.cognet.ucla.edu" (128.97.8.19) (typically with the command "ftp ftp.cognet.ucla.edu") 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "pub/alexis", by typing the command "cd pub/alexis" 4. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 5. Get the file by typing the command "get am6.tar.Z" 6. Get the file "am6.notes" Other sites ----------- If these sites do not work well for you, then try the archie internet mail server. Send email: To: archie at cs.mcgill.ca Subject: prog am6.tar.Z Archie will reply with a list of internet ftp sites that you can get the software from. How to unpack the software -------------------------- After ftp'ing the file make the directory you wish to install the software. Go to that directory and type: zcat am6.tar.Z | tar xvf - -or- uncompress am6.tar.Z ; tar xvf am6.tar How to print the manual ----------------------- The user documentation is located in ./doc in a few compressed PostScript files. To print each file on a PostScript printer type: uncompress *.Z lpr -s *.ps Why? ---- I have been asked why MITRE is giving away this software. MITRE is a non-profit organization funded by the U.S. federal government. MITRE does research and development into various technical areas. Our research into neural network algorithms and applications has resulted in this software. Since MITRE is a publically funded organization, it seems appropriate that the product of the neural network research be turned back into the technical community at large. Thanks ------ Thanks to the beta sites for helping me get the bugs out and make this portable. Thanks to the folks at CMU and UCLA for the ftp sites. Copyright and license agreement ------------------------------- Since the Aspirin/MIGRAINES system is licensed free of charge, the MITRE Corporation provides absolutely no warranty. Should the Aspirin/MIGRAINES system prove defective, you must assume the cost of all necessary servicing, repair or correction. In no way will the MITRE Corporation be liable to you for damages, including any lost profits, lost monies, or other special, incidental or consequential damages arising out of the use or in ability to use the Aspirin/MIGRAINES system. This software is the copyright of The MITRE Corporation. It may be freely used and modified for research and development purposes. We require a brief acknowledgement in any research paper or other publication where this software has made a significant contribution. If you wish to use it for commercial gain you must contact The MITRE Corporation for conditions of use. The MITRE Corporation provides absolutely NO WARRANTY for this software. Russell Leighton ^ / |\ /| INTERNET: taylor at world.std.com |-| / | | | | | / | | |  From sun at umiacs.UMD.EDU Mon Aug 30 13:11:10 1993 From: sun at umiacs.UMD.EDU (Guo-Zheng Sun) Date: Mon, 30 Aug 93 13:11:10 -0400 Subject: Preprint Message-ID: <9308301711.AA06031@sunsp2.umiacs.UMD.EDU> Reprint: THE NEURAL NETWORK PUSHDOWN AUTOMATON: MODEL, STACK AND LEARNING SIMULATIONS The following reprint is available via the NEC Research Institute ftp archive external.nj.nec.com. Instructions for retrieval from the archive follow the abstract summary. Comments and remarks are always appreciated. ----------------------------------------------------------------------------- .............................................................................. 
"The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations" G.Z. Sun(a,b), C.L. Giles(b,c), H.H. Chen(a,b), Y.C. Lee(a,b) (a) Laboratory for Plasma Research and (b) Institute for Advanced Computer Studies, U. of Maryland, College Park, MD 20742 (c) NEC Research Institute, 4 Independence Way, Princeton, NJ 08540 In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one obvious approach to enhancing the processing power of a recurrent neural network is to couple it with an external stack mem ory - in effect creating a neural network pushdown automata (NNPDA). This paper discusses in detail this NNPDA - its construction, how it can be trained and how useful symbolic information can be extracted from the trained network. In order to couple the external stack to the neural network, an optimization method is developed which uses an error function that connects the learning of the state automaton of the neural network to the learning of the operation of the external stack. To minimize the error function using gradient descent learning, an analog stack is designed such that the action and storage of information in the stack are continuous. One interpretation of a continuous stack is the probabilistic storage of and action on data. After training on sample strings of an unknown source grammar, a quantization procedure extracts from the analog stack and neural network a discrete pushdown automata (PDA). Simulations show that in learning deterministic context-free grammars - the balanced parenthesis language, 1n0n, and the deterministic Palindrome - the extracted PDA is correct in the sense that it can correctly recognize unseen strings of arbitrary length. In addition, the extracted PDAs can be shown to be identical or equivalent to the PDAs of the source grammars which were used to generate the training strings. UNIVERSITY OF MARYLAND TR NOs. UMIACS-TR-93-77 & CS-TR-3118, August 20, 1993. --------------------------------------------------------------------------- FTP INSTRUCTIONS unix> ftp external.nj.nec.com (138.15.10.100) Name: anonymous Password: (your_userid at your_site) ftp> cd pub/giles/papers ftp> binary ftp> get NNPDA.ps.Z ftp> quit unix> uncompress NNPDA.ps.Z (Please note that this is a 35 page paper.) -----------------------------------------------------------------------------  From biblio at nucleus.hut.fi Tue Aug 31 13:08:00 1993 From: biblio at nucleus.hut.fi (Bibliography) Date: Tue, 31 Aug 93 13:08:00 DST Subject: Kohonen maps & LVQ -- huge bibliography (and reference request) Message-ID: <9308311008.AA20054@nucleus.hut.fi.hut.fi> Hello, We are in the process of compiling the complete bibliography of works on Kohonen Self-Organizing Map and Learning Vector Quantization all over the world. Currently the bibliography contains more than 1000 entries. The bibliography is now available (in BibTeX and PostScript formats) by anonymous FTP from: cochlea.hut.fi:/pub/ref/references.bib.Z ( BibTeX file) cochlea.hut.fi:/pub/ref/references.ps.Z ( PostScript file) The above files are compressed. Please make sure you use "binary" mode when you transfer these files. Please send any additions and corrections to : biblio at cochlea.hut.fi Please follow the IEEE instructions of references (full names of authors, name of article, journal name, volume + number where applicable, first and last page number, year, etc.) 
and BibTeX-format, if possible. Yours, Jari Kangas Helsinki University of Technology Laboratory of Computer and Information Science Rakentajanaukio 2 C SF-02150 Espoo, FINLAND
From mpp at cns.brown.edu Tue Aug 3 01:45:18 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Tue, 3 Aug 93 01:45:18 EDT Subject: Committees Message-ID: <9308030545.AA01131@cns.brown.edu> David Wolpert writes: -->Many of the results in the literature which appear to dispute this -->are simply due to use of an error function which is not restricted to -->being off-training set.
In other words, there's always a "win" -->if you perform rationally on the training set (e.g., reproduce it -->exactly, when there's no noise), if your error function gives you -->points for performing rationally on the training set. In a certain -->sense, this is trivial, and what's really interesting is off-training -->set behavior. In any case, this automatic on-training set win is all -->those aforementioned results refer to; in particular, they imply essentially -->nothing concerning performance off of the training set. In the case of averaging for MSE optimization (the meat and potatoes of neural networks) and any other convex measure, the improvement due to averaging is independent of the distribution - on-training or off-. It depends only on the topology of the optimization measure. It is important to note that this result does NOT say the average is better than any individual estimate - only better than the average population performance. For example, if one had a reliable selection criterion for deciding which element of a population of estimators was the best and that estimator was better than the average estimator, then just choose the better one. (Going one step further, simply use the selection criterion to choose the best estimator from all possible weighted averages of the elements of the population.) As David Wolpert pointed out, any estimator can be confounded by a pathological data sample and therefore there doesn't exist a *guaranteed* method for deciding which estimator is the best from a population in all cases. Weak (as opposed to guaranteed) selection criteria exist in in the form of cross-validation (in all of its flavors). Coupling cross-validation with averaging is a good idea since one gets the best of both worlds particularly for problems with insufficient data. I think that another very interesting direction for research (as David Wolpert alluded to) is the investigation of more reliable selection criterion. -Michael  From bernasch at forwiss.tu-muenchen.de Tue Aug 3 03:41:45 1993 From: bernasch at forwiss.tu-muenchen.de (Jost Bernasch) Date: Tue, 3 Aug 1993 09:41:45 +0200 Subject: weighting of estimates In-Reply-To: jim@hydra.maths.unsw.EDU.AU's message of Mon, 2 Aug 93 09:51:14 +1000 <9308012351.AA15492@hydra.maths.unsw.EDU.AU> Message-ID: <9308030741.AA29386@forwiss.tu-muenchen.de> James Franklin writes: > If you have a fairly accurate and a fairly inaccurate way of estimating >something, it is obviously not good to take their simple average (that >is, half of one plus half of the other). The correct weighting of the >estimates is in inverse proportion to their variances (that is, keep >closer to the more accurate one). Of course this is the correct weighting. Since the 60s this is done very succesfully with the well-known "Kalman Filter". In this theory the optimal combination of knowledge sources is described and proofed in detail. See the original work @article{Kalman:60, AUTHOR = {R.E. Kalman}, TITLE = "A New Approach to Linear Filtering and Prdiction Problems.", VOLUME = 12, number = 1, PAGES = {35--45}, JOURNAL = "Trans. ASME, series D, J. 
Basic Eng.", YEAR = 1960 } some neural network literature concerning this subject @Article{WatanabeTzafestas:90, author = "Watanabe and Tzafestas", title = "Learning Algorithms for Neural Networks with the Kalman Filter", journal = JIRS, year = 1990, volume = 3, number = 4, pages = "305-319", keywords= "kalman, neural net" } @string{JIRS = {Journal of Intelligent and Robotic Systems}} and a very good and practice oriented book @book{Gelb:74, AUTHOR = "A. Gelb", TITLE = "Applied {O}ptimal {E}stimation", PUBLISHER = "{M.I.T} {P}ress, {C}ambridge, {M}assachusetts", YEAR = "1974" } (At least, that is the correct >weighting if the estimates are independent: if they are correlated, >it is more complicated, but not much more). Proofs are easy, and included >in the ref below: For proofs and extensions to non-linear filtering and correlated weights see the control theory literature. A lot of work is already done! -- Jost Jost Bernasch Bavarian Research Center for Knowledge-Based Systems Orleansstr. 34, D-81667 Muenchen , Germany bernasch at forwiss.tu-muenchen.de  From edelman at wisdom.weizmann.ac.il Tue Aug 3 16:23:11 1993 From: edelman at wisdom.weizmann.ac.il (Edelman Shimon) Date: Tue, 3 Aug 93 23:23:11 +0300 Subject: TR on representation with receptive fields available Message-ID: <9308032023.AA23457@wisdom.weizmann.ac.il> The following TR is available via anonymous ftp from eris.wisdom.weizmann.ac.il (132.76.80.53), as /pub/rfs-for-recog.ps.Z Representation with receptive fields: gearing up for recognition Weizmann Institute CS-TR 93-09 Yair Weiss and Shimon Edelman Abstract: Receptive fields are probably the most prominent and ubiquitous computational mechanism employed by biological information processing systems. We report an attempt to understand the representational capabilities of the kind of receptive fields found in mammalian vision motivated by the assumption that the successive stages of processing remap the retinal representation space in a manner that makes objectively similar stimuli (e.g., different views of the same 3D object) closer to each other, and dissimilar stimuli farther apart. We present theoretical analysis and computational experiments that compare the similarity between stimuli as they are represented at the successive levels of the processing hierarchy, from the retina to the nonlinear cortical units. Our results indicate that population-based codes do convey information that seems lost in the activities of the individual receptive fields, and that at the higher levels of the hierarchy objects may be represented in a form that is more useful for visual recognition. This finding may, therefore, explain the success of previous empirical approaches to object recognition that employed representation by localized receptive fields.  From jim at hydra.maths.unsw.EDU.AU Wed Aug 4 02:32:13 1993 From: jim at hydra.maths.unsw.EDU.AU (jim@hydra.maths.unsw.EDU.AU) Date: Wed, 4 Aug 93 16:32:13 +1000 Subject: weighting of estimates Message-ID: <9308040632.AA07933@hydra.maths.unsw.EDU.AU> bernasch at forwiss.tu-muenchen.de (Jost Bernasch) writes: >James Franklin writes: >> If you have a fairly accurate and a fairly inaccurate way of estimating >>something, it is obviously not good to take their simple average (that >>is, half of one plus half of the other). The correct weighting of the >>estimates is in inverse proportion to their variances (that is, keep >>closer to the more accurate one). > >Of course this is the correct weighting. 
Since the 60s this is done >very succesfully with the well-known "Kalman Filter". In this theory >the optimal combination of knowledge sources is described and >proofed in detail. Well, yes, in a way, but that's something like saying that the motion of your body can be derived from Einstein's equations of General Relativity. Too complicated. In particular, Kalman filters, and control theory generally, are about time-varying entities, and Kalman filters are an (essentially Bayesian) way of successively updating estimates of a (possibly time-varying) quantity (See R.J. Meinhold & N.D. Singpurwalla, `Understanding the Kalman filter', American Statistician 37 (1983): 123). The situation I was considering, and what is relevant to committees, is much simpler (hence more general): how to combine estimates (possibly correlated) of a single unknown quantity. James Franklin Mathematics University of New South Wales  From Graham.Lamont at newcastle.ac.uk Wed Aug 4 12:41:36 1993 From: Graham.Lamont at newcastle.ac.uk (Graham Lamont) Date: Wed, 4 Aug 93 12:41:36 BST Subject: multiple models, hybrid estimation Message-ID: When I emailed Wray Buntine about his original posting on the subject of multiple models, I quipped: `Shhh.... don't tell everyone, they'll all want one!' (a multiple model) Little did I know every man and his dog appears to have one already:) The recent postings and especially Michael Perrone's recent contribution(s) have persuaded me to sketch the extent of my work in this area and donate a FREE piece of Mathematica code. I mention Michael's work because it follows the same basic approach of general least squares as mine, and I agree with many of the points that he raises in his general discussion of hybrid estimation, such as the need for a completely general method, the utility of a closed form solution, and his novel description of distinct local minima in functional space as opposed to parameter space. However..... he says that for his method (GEM): >> 7) The *optimal* parameters of the ensemble estimator are given in closed >> form. I present a method in the same general spirit of Michael's that is slightly more optimal and general (and I am not claiming even this is the best!). It is based on the unconstrained least squares of the estimator population "design matrix" via SVD. 1 Generality: The technique utilises singular value decomposition (SVD), and hence avoids the problem of collinearity between estimators that can (and often does) occur in a population of estimators as mentioned by Michael. SVD happily copes with highly collinear or even duplicate estimators in the design matrix, without preprocessing/thresholding. 2 Optimality: The technique places no constraint on the value of the weights (MP [1] has sum=1 and also in the results he presents all w are 0 This seemed relevant. Please excuse the bandwidth if it's not. jim From: IN%"DFP10 at ALBANY.ALBANY.EDU" "Donald F. Parsons MD" 3-AUG-1993 05:38:36.51 To: IN%"hspnet-l at albnydh2.bitnet" "Rural Hospital Consulting Network" CC: Subj: Call for Papers: AIM-94 Spring Symposium ----------------------------Original message---------------------------- Call for Papers AAAI 1994 Spring Symposium: Artificial Intelligence in Medicine: Interpreting Clinical Data (March 21-23, 1994, Stanford University, Stanford, CA) The deployment of on-line clinical databases, many supplanting the traditional role of the paper patient chart, has increased rapidly over the past decade.
The consequent explosion in the quality and volume of available clinical data, along with an ever more stringent medicolegal obligation to remain aware of all implications of these data, has created a substantial burden for the clinician. The challenge of providing intelligent tools to help clinicians monitor patient clinical courses, forecast likely prognoses, and discover new relational knowledge, is at least as large as that generated by the knowledge explosion which motivated earlier efforts in Artificial Intelligence in Medicine (AIM). Whereas many of the pioneering programs worked on small data sets which were entered interactively by knowledge engineers or clinicians, the current generation of programs have to act on raw data, unfiltered and unmediated by human beings. Interaction with human users typically only occurs on demand or on detection of clinically significant events. The emphasis of this symposium will be on methodologies that provide robust autonomous performance in data-rich clinical environments ranging from busy outpatient practices to operating rooms and intensive care units. Relevant topics include intelligent alarming (including anticipation and prevention of adverse clinical events), data abstraction, sensor validation, preliminary event classification, therapy advice, critiquing, and assistance in the establishment and execution of clinical treatment protocols. Detection of temporal and geographical patterns of disease manifestations and machine learning of clinical patterns are also of interest. Organizing committee Serdar Uckun, Co-chair (Stanford University) Isaac Kohane, Co-chair (Harvard Medical School) Enrico Coiera (Hewlett-Packard Laboratories/Bristol) Ramesh Patil (USC/Information Sciences Institute) Mario Stefanelli (Universita di Pavia) Format A large data sample will be made available to participants to serve as training and test sets for various approaches to information management and to provide a common domain of discourse. The sample will consist of two data sets: * A dense, high volume data set typical of a critical care environment. This data set will consist of hemodynamic measurements, mechanical ventilator settings, laboratory values including arterial blood gas measurements, and treatment information covering a 12-hour period of a patient with severe respiratory distress. Monitored parameters (10-15 channels of data) will be sampled and recorded at rates up to 1/10 Hz. The data set will be annotated with other clinically relevant data, physician's interpretations, and established diagnoses. * A large number of sparse data sets representative of outpatient environments. The data will include laboratory measurements, treatment information, and physical findings on a large sample of patients (50 to 100 patients) taken from the same disorder population. Each patient record will consist of several weeks' or months' worth of clinical information sampled at irregular intervals. Most of the cases will be made available to interested researchers to be used as training cases. For interested parties, a small percentage of cases will be made available two weeks prior to the symposium to be used as an optional testing set for various approaches. The data samples and accompanying clinical information will be available via ftp or e-mail server around August 15, 1993. Please contact the organizers at the addresses below for further information. The data will also be made available on diskettes to participants who do not have Internet access. 
It will be left to the discretion of the participants to use any subset of these samples to help focus their approaches and presentations. The data can also be used as test vehicles for their own research and to create sample programs for demonstration at the symposium. Participants do not have to use the data in order to participate. However, the program committee will favor presentations which exploit the provided data sets in their analyses. Submission process Potential participants are invited to submit abstracts no longer than 2 pages (< 1200 words) by October 15, 1993. The abstracts should outline methodology and indicate, if applicable, how the provided data may be used as a proof-of-principle for the discussed methodology. Electronic submissions are encouraged. The abstracts may be sent to in ASCII, RTF, or PostScript formats. Authors of accepted abstracts will be asked to submit a working paper by January 31, 1994. They will also be asked to prepare either a poster or an oral presentation. Submissions by mail Use this method ONLY IF you cannot submit an abstract electronically. Fax submissions will not be accepted. Send 6 copies of the abstract to: Serdar Uckun, MD, PhD Co-chair, AIM-94 Knowledge Systems Laboratory Stanford University 701 Welch Road, Bldg. C Palo Alto, CA 94304 U.S.A. Phone: [+1] (415) 723-1915 Calendar Abstracts due: October 15, 1993 Notification of authors by: November 15, 1993 Working papers due: January 31, 1994 Spring Symposium: March 21-23, 1994 Information For further information, please contact the co-chairs at the address above or (preferably) via e-mail at:  From hicks at cs.titech.ac.jp Thu Aug 5 11:33:00 1993 From: hicks at cs.titech.ac.jp (hicks@cs.titech.ac.jp) Date: Thu, 5 Aug 93 11:33:00 JST Subject: weighting of estimates In-Reply-To: Jost Bernasch's message of Tue, 3 Aug 1993 09:41:45 +0200 <9308030741.AA29386@forwiss.tu-muenchen.de> Message-ID: <9308050233.AA29633@maruko.cs.titech.ac.jp> Jost Bernasch writes: > >James Franklin writes: > > If you have a fairly accurate and a fairly inaccurate way of estimating > >something, it is obviously not good to take their simple average (that > >is, half of one plus half of the other). The correct weighting of the > >estimates is in inverse proportion to their variances (that is, keep > >closer to the more accurate one). > >Of course this is the correct weighting. Since the 60s this is done >very succesfully with the well-known "Kalman Filter". In this theory >the optimal combination of knowledge sources is described and >proofed in detail. > > (At least, that is the correct > >weighting if the estimates are independent: if they are correlated, > >it is more complicated, but not much more). Proofs are easy, and included > >in the ref below: > >For proofs and extensions to non-linear filtering and correlated >weights see the control theory literature. A lot of work is already >done! I think the comments about the Kalman filter are a bit off the mark. The Kalman filter is based on the mathematics of conditional expectation. However, the Kalman filter is designed to be used for time series. What makes the Kalman filter particularly useful is its recursive nature; a stream of observations may be processed (often in real time) to produce a stream of current estimates (or next estimates if you're trying to beat the stock market). Committees of networks may also use conditional expectation, but combining networks is not the same as processing time series of data. 
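(For concreteness, the static combination rule Franklin describes -- weights in inverse proportion to the error variances, generalized to the full error covariance matrix when the estimates are correlated -- can be sketched in a few lines. This is only an illustrative sketch; the function names and numbers below are invented for the example and do not come from any of the postings.)

    # Combine several estimates of one unknown quantity.
    # Illustrative sketch only; names and numbers are invented for the example.
    import numpy as np

    def combine_independent(estimates, variances):
        # Independent, unbiased estimates: weights proportional to 1/variance.
        w = 1.0 / np.asarray(variances, dtype=float)
        w /= w.sum()
        return float(np.dot(w, estimates)), w

    def combine_correlated(estimates, covariance):
        # Correlated errors: minimum-variance unbiased weights
        #   w = C^{-1} 1 / (1' C^{-1} 1)
        ones = np.ones(len(estimates))
        c_inv_one = np.linalg.solve(np.asarray(covariance, dtype=float), ones)
        w = c_inv_one / c_inv_one.sum()
        return float(np.dot(w, estimates)), w

    # An accurate estimate (variance 1) and an inaccurate one (variance 9):
    value, weights = combine_independent([10.2, 13.0], [1.0, 9.0])
    # weights come out to [0.9, 0.1]: stay much closer to the accurate estimate.

When the error covariance matrix is diagonal, the second rule reduces to the first.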
I think it is appropriate at this point to bring up 2 classical results concerning probability theory, conditional expectation, and wide sense conditional expectation. (Wide sense conditional expectation uses the same formulas as conditional expectation. "Wide sense" merely serves to emphasize that the distribution is not assumed to be normal. "Conditional expectation" is used in the case where the underlying distribution is assumed to be normal.) (1) When the objective function is to minimize the mean squared error over the training data, the wide sense conditional expectation is the best linear predictor, regardless of the original distribution. (2) If the original distribution is normal, and the objective function is to minimize the MSE over the >entire< distribution, (both on-training and off-training), then the conditional expectation is the best predictor, linear or otherwise. There are 3 important factors here. [1]: Underlying distribution (of network outputs): normal? not normal? [2]: Objective function (assume MSE): on-training? off-training? [3]: Predictor: linear? non-linear? {1} [1:normal] => [2:off-training],[3:linear] Neural nets (as opposed to systolic arrays) are needed because the world is full of non-normal distributions. But that doesn't mean that the outputs of non-linear networks don't have joint normal distributions (over off-training data). Perhaps the non-linearities have been successfully ironed out by the non-linear networks, leaving only linear (or nearly linear) errors to be corrected. In that case we can refer to result (2) to build the optimal off-training predictor for the given committee of networks. {2} [1:not normal] and [2:on-training] and [3:linear] => best predictor is WSE. If the distribution of network outputs is not normal, and we use an on-training criterion, then by virtue of (1), the best linear predictor is the wide sense conditional expectation. {3} [1:not normal] and [2:off-training] and [3:non-linear] => research It is the case in {2} that since [1:not normal], <1> better on-training results may be obtained using some non-linear predictor <2> better on-or-off-training results may be obtained using some different criterion <3> <1> and <2> together. The problem is of course to find such criteria and non-linear predictors. The existence of a priori knowledge can play an important role here; for example adding a term to penalize the complexity of output functions. In conclusion, if {1} is true, that is the networks have captured the non-linearities and the network outputs are joint normal (or nearly normal) distributions, we're home free. Otherwise we ought to think about {3}, non-linear predictors and alternative criteria. {2}, using the WSE, the best performing linear predictor over the MSE of the on-training data, is useful to get the job done, but is only optimal in a limited sense. Craig Hicks hicks at cs.titech.ac.jp Ogawa Laboratory, Dept. of Computer Science Tokyo Institute of Technology, Tokyo, Japan lab: 03-3726-1111 ext. 2190 home: 03-3785-1974 fax: +81(3)3729-0685 (from abroad), 03-3729-0685 (from Japan)  From mpp at cns.brown.edu Thu Aug 5 15:13:44 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Thu, 5 Aug 93 15:13:44 EDT Subject: committees Message-ID: <9308051913.AA13266@cns.brown.edu> Scott Farrar writes: -->John Hampshire characterized a committee as a collection of biased -->estimators; the idea being that a collection of many different kinds of -->bias might constitute a unbiased estimator.
I was wondering if anyone -->had any ideas about how this might be related to, supported by, or refuted -->by the Central Limit Theorem. Could experimental variances or confounds -->be likened to "biases", and if so, do these "average out" in a manner which -->can give us a useful mean or useful estimator? I think that this is a very interesting point because, for averaging with MSE optimization, it is possible to show using the strong law of large numbers that the bias of the average estimator converges to the expected bias of any individual estimator while the variance converges to zero. Thus the only way to cancel existing bias using averaging is to average two (or more) different populations from two (or more) estimators which are (somehow) known to have complementary bias. The trick is of course the "somehow"... Any ideas? -Michael -------------------------------------------------------------------------------- Michael P. Perrone Email: mpp at cns.brown.edu Institute for Brain and Neural Systems Tel: 401-863-3920 Brown University Fax: 401-863-3934 Providence, RI 02912  From wray at ptolemy.arc.nasa.gov Thu Aug 5 19:37:42 1993 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Thu, 5 Aug 93 16:37:42 PDT Subject: committees In-Reply-To: "Michael P. Perrone"'s message of Thu, 5 Aug 93 15:13:44 EDT <9308051913.AA13266@cns.brown.edu> Message-ID: <9308052337.AA04745@ptolemy.arc.nasa.gov> I'm not convinced that the notion of an "unbiased estimator" is useful here. It comes from classical statistics and is really a means of justifying the choice of an estimator for lack of better ideas. An estimator is "unbiased" if the average of the estimator based on all the other samples which we might have seen (but didn't) is equal to the "truth". Notice that unbiased estimators and the use of Occam's razor conflict. We all routinely throw away an "unbiased" neural network, i.e. the best fitting network, in favor of a smoother, simpler network, i.e. by early stopping, weight decay, ...., which is very clearly "biased". So I think its a great thing to be biased. One reason for averaging is because we have several quite different biased networks that we think are reasonable, so like any good gambler, we hedge our bets. Of course, averaging is also standard Bayesian practice, i.e. an obvious result of the mathematics. ---------- Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 269-2 fax: (415) 604 3594 Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov  From cohn at psyche.mit.edu Fri Aug 6 11:28:11 1993 From: cohn at psyche.mit.edu (David Cohn) Date: Fri, 6 Aug 93 11:28:11 EDT Subject: Call for Participation: Workshop on Exploration Message-ID: <9308061528.AA06177@psyche.mit.edu> I am helping organize the following one-day workshop during the post-NIPS workshops in Vail, Colorado, on December 3, 1993. We would like to hear from people interested in participating in the workshop, either formally, as a presenter, or informally, as an attendee. Even if you will not be able to attend, if you have work which you feel is relevant, and would like to see discussed, please contact me at the email address below. 
Given the limited time available, we will not be able to present *every* approach, but we hope to cover a broad range of approaches, both in formal presentations, and in informal discussion, Many thanks in advance, -David Cohn (cohn at psyche.mit.edu) ====================== begin workshop announcement ===================== Robot Learning II: Exploration and Continuous Domains A NIPS '93 Workshop David Cohn Dept. of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02138 cohn at psyche.mit.edu The goal of this one-day workshop will be to provide a forum for researchers active in the area of robot learning and related fields. Due to the limited time available, we will focus on two major issues: efficient exploration of a learner's state space, and learning in continuous domains. Robot learning is characterized by sensor noise, control error, dynamically changing environments and the opportunity for learning by experimentation. A number of approaches, such as Q-learning, have shown great practical utility learning under these difficult conditions. However, these approaches have only been proven to converge to a solution if all states of a system are visited infinitely often. What has yet to be determined is whether we can efficiently explore a state space so that we can learn without having to visit every state an infinite number of times, and how we are to address problems on continuous domains, where there are effectively an infinite number of states to be visited. This workshop is intended to serve as a followup to last year's post-NIPS workshop on robot learning. The two problems to be addressed this year were identified as two (of the many) crucial issues facing the field. The morning session of the workshop will consist of short presentations discussing theoretical approaches to exploration and to learning in continuous domains, followed by general discussion guided by a moderator. The afternoon session will center on practical and/or heuristic approaches to these problems in the same format. As time permits, we may also attempt to create an updated "Where do we go from here?" list, like that drawn up in last year's workshop. The targeted audience for the workshop are those researchers who are interested in robot learning, exploration, or active learning in general. We expect to draw an eclectic audience, so every attempt will be made to ensure that presentations are accessible to people without any specific background in the field.  From sontag at control.rutgers.edu Fri Aug 6 17:35:16 1993 From: sontag at control.rutgers.edu (Eduardo Sontag) Date: Fri, 6 Aug 93 17:35:16 EDT Subject: Expository Tech Report on Neural Nets Available by FTP Message-ID: <9308062135.AA06104@control.rutgers.edu> As notes for a short course given at the 1993 European Control Conference this summer, I prepared an expository introduction to two related topics: 1. Some mathematical results on "neural networks". 2. "Neurocontrol" and "learning control". The choice of topics was heavily influenced by my interests, but some readers may still find the material useful. The two parts are essentially independent. In particular, the part on mathematical results does not require any knowledge of (nor interest in) control theory. An *extended* version of the paper which appeared in the conference proceedings is now available as a tech report. This report, in postscript form, can be obtained by anonymous FTP. 
Retrieval instructions are as follows: yourhost> ftp siemens.com Connected to siemens.com. 220 siemens FTP server (SunOS 4.1) ready. Name (siemens.com:sontag): anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> cd pub/learning/TechReports 250 CWD command successful. ftp> bin 200 Type set to I. ftp> get Sontag9302.ps.Z 200 PORT command successful. 150 Binary data connection for Sontag9302.ps.Z (128.6.62.9,1600) (114253 bytes) 226 Binary Transfer complete. local: Sontag9302.ps.Z remote: Sontag9302.ps.Z 114253 bytes received in 24 seconds (4.6 Kbytes/s) ftp> quit 221 Goodbye. yourhost> uncompress Sontag9302.ps.Z yourhost> lpr Sontag9302.ps (or however you print PostScript) ****** Please note: I am not able to send hardcopy. ****** -- Eduardo D. Sontag  From liaw%dylink.usc.edu at usc.edu Fri Aug 6 18:45:39 1993 From: liaw%dylink.usc.edu at usc.edu (Jim Liaw) Date: Fri, 6 Aug 93 15:45:39 PDT Subject: Workshop on Neural Architectures and Distributed AI Message-ID: <9308062245.AA23804@dylink.usc.edu> Please note the change in deadline of submission of abstracts. ------ The Center for Neural Engineering University of Southern California announces a Workshop on Neural Architectures and Distributed AI: >From Schema Assemblages to Neural Networks October 19-20, 1993 [This Workshop was previously scheduled for April 1993] Program Committee: Michael Arbib (Organizer), George Bekey, Damian Lyons, Paul Rosenbloom, and Ron Sun To design complex technological systems, we need a multilevel methodology which combines a coarse- grain analysis of cooperative or distributed computation (we shall refer to the computing agents at this level as "schemas") with a fine-grain model of flexible, adaptive computation (for which neural networks provide a powerful general paradigm). Schemas provide a language for distributed artificial intelligence and perceptual robotics which is "in the style of the brain", but at a relatively high level of abstraction relative to neural networks. We seek (both at the level of schema asemblages, and in terms of "modular" neural networks) a distributed model of computation, supporting many concurrent activities for recognition of objects, and the planning and control of different activities. The use, representation, and recall of knowledge is mediated through the activity of a network of interacting computing agents which between them provide processes for going from a particular situation and a particular structure of goals and tasks to a suitable course of action. This action may involve passing of messages, changes of state, instantiation to add new schema instances to the network, deinstantiation to remove instances, and may involve self-modification and self- organization. Schemas provide a form of knowledge representation which differs from frames and scripts by being of a finer granularity. Schema theory is generative: schemas may well be linked to others to provide yet more comprehensive schemas, whereas frames tend to "build in" from the overall framework. The analysis of interacting computing agents (the schema instances) is intermediate between the overall specification of some behavior and the neural networks that subserve it. The Workshop will focus on different facets of this multi-level methodology. While the emphasis will be on technological systems, papers will also be accepted on biological and cognitive systems. 
Submission of Papers A list of sample topics for contributions is as follows, where a hybrid approach means one in which the abstract schema level is integrated with neural or other lower level models: Schema Theory as a description language for neural networks Modular neural networks Alternative paradigms for modeling symbolic and subsymbolic knowledge Hierarchical and distributed representations: adaptation and coding: Linking DAI to Neural Networks to Hybrid Architecture Formal Theories of Schemas Hybrid approaches to integrating planning & reaction Hybrid approaches to learning Hybrid approaches to commonsense reasoning by integrating neural networks and rule-based reasoning (using schemas for the integration) Programming Languages for Schemas and Neural Networks Schema Theory Applied in Cognitive Psychology, Linguistics, and Neuroscience Prospective contributors should send a five-page extended abstract, including figures with informative captions and full references - a hard copy, either by regular mail or fax - by August 30, 1993 to Michael Arbib, Center for Neural Engineering, University of Southern California, Los Angeles, CA 90089-2520, USA [Tel: (213) 740-9220, Fax: (213) 746-2863, arbib at pollux.usc.edu]. Please include your full address, including fax and email, on the paper. In accepting papers submitted in response to this Call for Papers, preference will be given to papers which present practical examples of, theory of, and/or methodology for the design and analysis of complex systems in which the overall specification or analysis is conducted in terms of a network of interacting schemas, and where some but not necessarily all of the schemas are implemented in neural networks. Papers which present a single neural network for pattern recognition ("perceptual schema") or pattern generation ("motor schema") will not be accepted. It is the development of a methodology to analyze the interaction of multiple functional units that constitutes the distinctive thrust of this Workshop. Notification of acceptance or rejection will be sent by email no later than September 1, 1993. There are currently no plans to issue a formal proceedings of full papers, but (revised versions) of accepted abstracts received prior to October 1, 1993 will be collected with the full text of the Tutorial in a CNE Technical Report which will be made available to registrants at the start of the meeting. A number of papers have already been accepted for the Workshop. 
These include the following: Arbib: Schemas and Neural Networks: A Tutorial Introduction to Integrating Symbolic and Subsymbolic Approaches to Cooperative Computation Arkin: Reactive Schema-based Robotic Systems: Principles and Practice Heenskerk and Keijzer: A Real-time Neural Implementation of a Schema Driven Toy-Car Leow and Miikkulainen, Representing and Learning Visual Schemas in Neural Networks for Scene Analysis Lyons & Hendriks: Describing and analysing robot behavior with schema theory Murphy, Lyons & Hendriks: Visually Guided Multi- Fingered Robot Hand Grasping as Defined by Schemas and a Reactive System Sun: Neural Schemas and Connectionist Logic: A Synthesis of the Symbolic and the Subsymbolic Weitzenfeld: Hierarchy, Composition, Heterogeneity, and Multi-granularity in Concurrent Object-Oriented Programming for Schemas and Neural Networks Wilson & Hendler: Neural Network Software Modules Bonus Event: The CNE Research Review: Monday, October 18, 1993 The CNE Review will present a day-long sampling of CNE research, with talks by faculty, and students, as well as demos of hardware and software. Special attention will be paid to talks on, and demos in, our new Autonomous Robotics Lab and Neuro-Optical Computing Lab. Fully paid registrants of the Workshop are entitled to attend the CNE Review at no extra charge. Registration The registration fee of $150 ($40 for qualified students who include a "certificate of student status" from their advisor) includes a copy of the abstracts, coffee breaks, and a dinner to be held on the evening of October 18th. Those wishing to register should send a check payable to "Center for Neural Engineering, USC" for $150 ($40 for students and CNE members) together with the following information to Paulina Tagle, Center for Neural Engineering, University of Southern California, University Park, Los Angeles, CA 90089-2520, USA. --------------------------------------------------- SCHEMAS AND NEURAL NETWORKS Center for Neural Engineering, USC October 19-20, 1993 NAME: ___________________________________________ ADDRESS: _________________________________________ PHONE NO.: _______________ FAX:___________________ EMAIL: ___________________________________________ I intend to submit a paper: YES [ ] NO [ ] I wish to be registered for the CNE Research Review: YES [ ] NO [ ] Accommodation Attendees may register at the hotel of their choice, but the closest hotel to USC is the University Hilton, 3540 South Figueroa Street, Los Angeles, CA 90007, Phone: (213) 748-4141, Reservation: (800) 872-1104, Fax: (213) 7480043. A single room costs $70/night while a double room costs $75/night. Workshop participants must specify that they are "Schemas and Neural Networks Workshop" attendees to avail of the above rates. Information on student accommodation may be obtained from the Student Chair, Jean-Marc Fellous, fellous at pollux.usc.edu.  From sims at pdesds1.scra.org Mon Aug 9 07:39:31 1993 From: sims at pdesds1.scra.org (Jim Sims) Date: Mon, 9 Aug 93 07:39:31 EDT Subject: [fwd: jbeard@aip.org: bee learning in Nature] Message-ID: <9308091139.AA02487@pdesds1.noname> Some cross-disciplinary, ah, pollination. 
jim From greiner at learning.siemens.com Mon Aug 9 14:59:26 1993 From: greiner at learning.siemens.com (Russell Greiner) Date: Mon, 9 Aug 93 14:59:26 EDT Subject: CLNL'93 Schedule Message-ID: <9308091859.AA05371@eagle.siemens.com> *********************************************************** * CLNL'93 -- Computational Learning and Natural Learning * * Provincetown, Massachusetts * * 10-12 September 1993 * *********************************************************** CLNL'93 is the fourth of an ongoing series of workshops designed to bring together researchers from a diverse set of disciplines --- including computational learning theory, AI/machine learning, connectionist learning, statistics, and control theory --- to explore issues at the intersection of theoretical learning research and natural learning systems. The schedule of presentations appears below, followed by logistics and information on registration ================ ** CLNL'93 Schedule (tentative) ** ======================= Thursday 9/Sept/93: 6:30-9:00 (optional) Ferry (optional): Boston to Provincetown [departs Boston Harbor Hotel, 70 Rowes Wharf on Atlantic Avenue] Friday 10/Sept/93 [CLNL meetings, at Provincetown Inn] 9 - 9:15 Opening remarks 9:15-10:15 Scaling Up Machine Learning: Practical and Theoretical Issues Thomas Dietterich [Oregon State Univ] (invited talk, see abstract below) 10:30-12:30 Paper session 1 What makes derivational analogy work: an experience report using APU Sanjay Bhansali [Stanford]; Mehdi T. Harandi [Univ of Illinois] Scaling Up Strategy Learning: A Study with Analogical Reasoning Manuela M. Veloso [CMU] Learning Hierarchies in Stochastic Domains Leslie Pack Kaebling [Brown] Learning an Unknown Signalling Alphabet Edward C. Posner, Eugene R. Rodemich [CalTech/JPL] 12:30- 2 Lunch (on own) Unscheduled TIME ( Whale watching, beach walking, ... ) ( Poster set-up time; Poster preview (perhaps) ) Dinner (on own) 7 - 10 Poster Session [16 posters] (Hors d'oeuvres) Induction of Verb Translation Rules from Ambiguous Training and a Large Semantic Hierarchy Hussein Almuallim, Yasuhiro Akiba, Takefumi Yamazaki, Shigeo Kaneda [NTT Network Information Systems Lab.] What Cross-Validation Doesn't Say About Real-World Generalization Gunner Blix, Gary Bradshaw, Larry Rendall [Univ of Illinois] Efficient Learning of Regular Expressions from Approximate Examples Alvis Brazma [Univ of Latvia] Capturing the Dynamics of Chaotic Time Series by Neural Networks Gurtavo Deco, Bernd Schurmann [Siemens AG] Learning One-Dimensional Geometrical Patterns Under One-Sided Random Misclassification Noise Paul Goldberg [Sandia National Lab]; Sally Goldman [Washington Univ] Adaptive Learning of Feedforward Control Using RBF Network ... Dimitry M Gorinevsky [Univ of Toronto] A practical approach for evaluating generalization performance Marjorie Klenin [North Carolina State Univ] Scaling to Domains with Many Irrelevant Features Pat Langley, Stephanie Sage [Siemens Corporate Research] Variable-Kernel Similarity Metric Learning David G. Lowe [Univ British Columbia] On-Line Training of Recurrent Neural Networks with Continuous Topology Adaptation Dragan Obradovic [Siemens AG] N-Learners Problem: System of PAC Learners Nageswara Rao, E.M. Oblow [Engineering Systems/Advanced Research] Soft Dynamic Programming Algorithms: Convergence Proofs Satinder P. 
Singh [Univ of Mass] Integrating Background Knowledge into Incremental Concept Formation Leon Shklar [Bell Communications Research]; Haym Hirsh [Rutgers] Learning Metal Models Astro Teller [Stanford] Generalized Competitive Learning and then Handling of Irrelevant Features Chris Thornton [Univ of Sussex] Learning to Ignore: Psychophysics and Computational Modeling of Fast Learning of Direction in Noisy Motion Stimuli Lucia M. Vaina [Boston Univ], John G. Harris [Univ of Florida] Saturday 11/Sept/93 [CLNL meetings, at Provincetown Inn] 9:00-10:00 Current Tree Research Leo Breiman [UCBerkeley] (invited talk, see abstract below) 10:30-12:30 Paper session 2 Initializing Neural Networks using Decision Trees Arunava Banerjee [Rutgers] Exploring the Decision Forest Patrick M. Murphy, Michael Pazzani [UC Irvine] What Do We Do When There Is Outrageous Data Points in the Data Set? - Algorithm for Robust Neural Net Regression Yong Liu [Brown] A Comparison of RBF and MLP Networks for Classification of Biomagnetic Fields Martin F. Schlang, Ralph Neunier, Klaus Abraham-Fuchs [Siemens AG] 12:30- 2 Lunch (on own) 2:30- 3:30 TBA (invited talk) Yann le Cun [ATT] 4:00- 6:00 Paper session 3 On Learning the Neural Network Architecture: An Average Case Analysis Mostefa Golea [Univ of Ottawa] Fast (Distribution Specific) Learning Dale Schuurmans [Univ of Toronto] Computational capacity of single neuron models Anthony Zador [Yale Univ School of Medicine] Probalistic Self-Structuring and Learning A.D.M. Garvin, P.J.W. Rayner [Cambridge] 7:00- 9 Banquet dinner Sunday 12/Sept/93 [CLNL meetings, at Provincetown Inn] 9 -11 Paper session 4 Supervised Learning from real and Discrete Incomplete Data Zoubin Ghaharamani, Michael Jordan [MIT] Model Building with Uncertainty in the Independent Variable Volker Tresp, Subutai Ahmad, Ralph Neuneier [Siemens AG] Supervised Learning using Unclassified and Classified Examples Geoff Towell [Siemens Corp. Res.] Learning to Classify Incomplete Examples Dale Schuurmans [Univ of Toronto]; R. Greiner [Siemens Corp. Res.] 11:30 -12:30 TBA (invited talk) Ron Rivest [MIT] 12:30 - 2 Lunch (on own) 3:30 - 6:30 Ferry (optional): Provincetown to Boston Depart from Boston (on own) ------ ------ Scaling Up Machine Learning: Practical and Theoretical Issues Thomas G. Dietterich Oregon State University and Arris Pharmaceutical Corporation Supervised learning methods are being applied to an ever-expanding range of problems. This talk will review issues arising in these applications that require further research. The issues can be organized according to the problem-solving task, the form of the inputs and outputs, and any constraints or prior knowledge that must be considered. For example, the learning task often involves extrapolating beyond the training data in ways that are not addressed in current theory or engineering experience. As another example, each training example may be represented by a disjunction of feature vectors, rather than a unique feature vector as is usually assumed. More generally, each training example may correspond to a manifold of feature vectors. As a third example, background knowledge may take the form of constraints that must be satisfied by any hypothesis output by a learning algorithm. The issues will be illustrated using examples from several applications including recent work in computational drug design and ecosystem modelling. 
-------- Current Tree Research Leo Breiman Deptartment of Statistics University of California, Berkeley This talk will summarize current research by myself and collaborators into methods of enhancing tree methodology. The topics covered will be: 1) Tree optimization 2) Forming features 3) Regularizing trees 4) Multiple response trees 5) Hyperplane trees These research areas are in a simmer. They have been programmed and are undergoing testing. The results are diverse. -------- -------- Programme Committee: Andrew Barron, Russell Greiner, Tom Hancock, Steve Hanson, Robert Holte, Michael Jordan, Stephen Judd, Pat Langley, Thomas Petsche, Tomaso Poggio, Ron Rivest, Eduardo Sontag, Steve Whitehead Workshop Sponsors: Siemens Corporate Research and MIT Laboratory of Computer Science ================ ** CLNL'93 Logistics ** ======================= Dates: The workshop begins at 9am Friday 10/Sept, and concludes by 3pm Sunday 12/Sept, in time to catch the 3:30pm Provincetown--Boston ferry. Location: All sessions will take place in the Provincetown Inn (800 942-5388); we encourage registrants to stay there. Provincetown Massachusetts is located at the very tip of Cape Cod, jutting into the Atlantic Ocean. Transportation: We have rented a ship from The Portuguese Princess to transport CLNL'93 registrants from Boston to Provincetown on Thursday 9/Sept/93, at no charge to the registrants. We will also supply light munchies en route. This ship will depart from the back of Boston Harbor Hotel, 70 Rowes Wharf on Atlantic Avenue (parking garage is 617 439-0328); tentatively at 6:30pm. If you are interested in using this service, please let us know ASAP (via e-mail to clnl93 at learning.scr.siemens.com) and also tell us whether you be able to make the scheduled 6:30pm departure. (N.b., this service replaces the earlier proposal, which involved the Bay State Cruise Lines.) The drive from Boston to Provincetown requires approximately two hours. There are cabs, busses, ferries and commuter airplanes (CapeAir, 800 352-0714) that service this Boston--Provincetown route. The Hyannis/Plymouth bus (508 746-0378) leaves Logan Airport at 8:45am, 11:45am, 2:45pm, 4:45pm on weekdays, and arrives in Provincetown about 4 hours later; its cost is $24.25. For the return trip (only), Bay State Cruise Lines (617 723-7800) runs a ferry that departs Provincetown at 3:30pm on Sundays, arriving at Commonwealth Pier in Boston Harbor at 6:30pm; its cost is $15/person, one way. Inquiries: For additional information about CLNL'93, contact clnl93 at learning.scr.siemens.com or CLNL'93 Workshop Learning Systems Department Siemens Corporate Research 755 College Road East Princeton, NJ 08540--6632 To learn more about Provincetown, contact their Chamber of Commerce at 508 487-3424. ================ ** CLNL'93 Registration ** ======================= Name: ________________________________________________ Affiliation: ________________________________________________ Address: ________________________________________________ ________________________________________________ Telephone: ____________________ E-mail: ____________________ Select the appropriate options and fees: Workshop registration fee ($50 regular; $25 student) ___________ Includes * attendance at all presentation and poster sessions * the banquet dinner on Saturday night; and * a copy of the accepted abstracts. Hotel room ($74 = 1 night deposit) ___________ [This is at the Provincetown Inn, assuming a minimum stay of 2 nights. 
The total cost for three nights is $222 = $74 x 3, plus optional breakfasts. Room reservations are accepted subject to availability. See hotel for cancellation policy.] Arrival date ___________ Departure date _____________ Name of person sharing room (optional) __________________ [Notice the $74/night does correspond to $37/person per night double-occupancy, if two people share one room.] # of breakfasts desired ($7.50/bkfst; no deposit req'd) ___ Total amount enclosed: ___________ If you are not using a credit card, make your check payable in U.S. dollars to "Provincetown Inn/CLNL'93", and mail your completed registration form to Provincetown Inn/CLNL P.O. Box 619 Provincetown, MA 02657. If you are using Visa or MasterCard, please fill out the following, which you may mail to above address, or FAX to 508 487-2911. Signature: ______________________________________________ Visa/MasterCard #: ______________________________________________ Expiration: ______________________________________________  From bill at nsma.arizona.edu Mon Aug 9 17:00:59 1993 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Mon, 09 Aug 1993 14:00:59 -0700 (MST) Subject: [fwd: jbeard@aip.org: bee learning in Nature] Message-ID: <9308092100.AA10510@nsma.arizona.edu> This is a very interesting piece of work, but the "news release" is overblown and historically ignorant. The connection between mushroom bodies and learning has been known for a long time. There is also direct evidence for changes in the structure of the mushroom bodies as a result of experience: Coss and Perkel over a decade ago found changes in the length of dendritic spines after honeybees went on a single exploratory flight. This is much more direct than the evidence described in the "news release". Contrary to the claims in the "news release", these new results are unlikely to tell us much about human learning. It is not true that the honeybee brain is merely a simpler version of the human brain. They're completely different -- even the neurons are different in structure. Also insect learning and mammal learning are qualitatively different: for example, both honeybees and mammals can learn to navigate to a location using landmarks, but honeybees do it by simple visual pattern-matching, while mammals use considerably more sophisticated algorithms. Furthermore, it is not news that experience can lead to an increase in the number of connections. It has long been known that mammals raised in an enriched environment have thicker cortices, due to a greater density of synaptic structures. Surely this is more directly relevant to humans than data from honeybees could be. It's a shame to obscure a nice piece of work by making bogus claims about its significance. -- Bill  From dhw at santafe.edu Tue Aug 10 15:58:41 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 10 Aug 93 13:58:41 MDT Subject: Provable optimality of averaging generalizers Message-ID: <9308101958.AA15514@zia> Michael Perrone writes: >>> In the case of averaging for MSE optimization (the meat and potatoes of neural networks) and any other convex measure, the improvement due to averaging is independent of the distribution - on-training or off-. It depends only on the topology of the optimization measure. It is important to note that this result does NOT say the average is better than any individual estimate - only better than the average population performance. 
For example, if one had a reliable selection criterion for deciding which element of a population of estimators was the best and that estimator was better than the average estimator, then just choose the better one. (Going one step further, simply use the selection criterion to choose the best estimator from all possible weighted averages of the elements of the population.) As David Wolpert pointed out, any estimator can be confounded by a pathological data sample and therefore there doesn't exist a *guaranteed* method for deciding which estimator is the best from a population in all cases. Weak (as opposed to guaranteed) selection criteria exist in the form of cross-validation (in all of its flavors). Coupling cross-validation with averaging is a good idea since one gets the best of both worlds, particularly for problems with insufficient data. I think that another very interesting direction for research (as David Wolpert alluded to) is the investigation of more reliable selection criteria. >>> *** Well, I agree with the second two paragraphs, but not the first. At least not exactly as written. Although Michael is making an interesting and important point, I think it helps to draw attention to some things: I) First, I haven't yet gone through Michael's work in detail, but it seems to me that the "measures" Michael is referring to really only make sense as real-world cost functions (otherwise known as loss functions, sometimes as risk functions, etc.). Indeed many very powerful learning algorithms (e.g., memory based reasoning) are not directly cast as finding the minimum on an energy surface, be it "convex" or otherwise. For such algorithms, "measures" come in with the cost function. In fact, *by definition*, one is only interested in such real-world cost - results concerning anything else do not concern the primary object of interest. With costs, an example of a convex surface is the quadratic cost function, which says that given truth f, your penalty for guessing h is given by the function (f - h)^2. For such a cost, Michael's result holds essentially because by guessing the average you reduce variance but keep the same bias (as compared to the average over all guesses). In other words, it holds because for any f, h1, and h2, [(h1 + h2)/2 - f]^2 <= [(h1 - f)^2 + (h2 - f)^2] / 2. (When f, h1, and h2 refer to distributions rather than single values, as Michael rightly points out, you have to worry about other issues before making this statement, like whether the distributions are correlated with one another.) *** It should be realized though that there are many non-convex cost functions in the real world. For example, when doing classification, one popular cost function is zero-one. This function says you get credit for guessing exactly correctly, and if you miss, it doesn't matter what you guessed; all misses "cost you" the same. This cost function is implicit in much of PAC, stat. mech. of learning, etc. Moreover, in Bayesian decision theory, guessing the weights which maximize the posterior probability P(weights | data) (which in the Bayesian perspective of neural nets is exactly what is done in backprop with weight decay) is the optimal strategy only for this zero-one cost. Now if we take this zero-one cost function, and evaluate it only off the training set, it is straight-forward to prove that for a uniform Pr(target function), the probability of a certain value of cost, given data, is independent of the learning algorithm.
(The same result holds for other cost functions as well, though as Michael points out, you must be careful in trying to extend this result to convex cost functions.) This is true for any data set, i.e., it is not based on "pathological data", as Michael puts it. It says that unless you can rule out a uniform Pr(target function), you cannot prove any one algorithm to be superior to any other (as far as this cost function is concerned). *** II) Okay. Now Michael correctly points out that even in those cases w/ a convex cost "measure", you must interpret his result with caution. I agree, and would say that this is somewhat like the famous "two letters" paradox of probability theory. Consider the following: 1) Say I have 3 real numbers, A, B, and X. In general, it's always true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} / 2. (This is exactly analogous to having the cost of the average guess bounded above by the average cost of the individual guesses.) 2) This means that if we had a choice of either randomly drawing one of the numbers {A, B}, or drawing C, that on average drawing C would give smaller quadratic cost with respect to X. 3) However, as Michael points out, this does *not* mean that if we had just the numbers A and C, and could either draw A or C, that we should draw C. In fact, point (1) tells us nothing whatsoever about whether A or C is preferable (as far as quadratic cost with respect to X is concerned). 4) In fact, now create a 5th number, D = [C + A] / 2. By the same logic as in (1), we see that the cost (wrt/ X) of D is less than the average of the costs of C and A. So to the exact same degree that (1) says we "should" guess C rather than A or B, it also says we should guess D rather than A or C. (Note that this does *not* mean that D's cost is necessarily less than C's though; we don't get endlessly diminishing costs.) 5) Step (4) can be repeated ad infinitum, getting a never-ending sequence of "newly optimal" guesses. In particular, in the *exact* sense in which C is "preferable" to A or B, and therefore should "replace" them, D is preferable to A or B, and therefore should replace *them* (and in particular replace C). So one is never left with C as the object of choice. *** So (1) isn't really normative; it doesn't say one "should" guess the average of a bunch of guesses: 7) Choosing D is better than randomly choosing amongst C or A, just as choosing C is better than randomly choosing amongst A or B. 8) This doesn't mean that given C, one should introduce an A and then guess the average of C and A (D) rather than C, just as this doesn't mean that given A, one should introduce a B and then guess the average of A and B (C) rather than A. 9) An analogy which casts some light on all this: view A and B not as the outputs of separate single-valued learning algorithms, but rather as the random outputs of a single learning algorithm. Using this analogy, the result of Michael's, that one should always guess C rather than randomly amongst A or B, suggests that one should always use a deterministic, single-valued learning algorithm (i.e., just guess C) rather than one that guesses randomly from a distribution over possible guesses (i.e., one that guesses randomly amongst A or B). This implication shouldn't surprise anyone familiar with Bayesian decision theory.
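(The arithmetic behind points (1)-(5) is easy to check numerically. The following is only a tiny sketch with made-up numbers, not anything from the original exchange.)

    # Numerical check of points (1)-(5) for the quadratic cost (h - X)^2.
    # The numbers are made up for illustration.
    def cost(h, X):
        return (h - X) ** 2

    A, B, X = 0.0, 4.0, 1.0       # two guesses and the unknown target
    C = (A + B) / 2.0             # C = 2.0, the average of A and B
    D = (C + A) / 2.0             # D = 1.0, the average of C and A

    # (1)-(2): the average's cost never exceeds the average of the costs.
    assert cost(C, X) <= (cost(A, X) + cost(B, X)) / 2.0    # 1 <= 5
    assert cost(D, X) <= (cost(C, X) + cost(A, X)) / 2.0    # 0 <= 1

    # (3)-(4): none of this ranks A against C, or C against D.  Here A and C
    # happen to tie (cost 1 each) and D happens to win (cost 0); with X = 3.0
    # instead, C would beat D, so the costs do not keep shrinking.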
In fact, it's (relatively) straight-forward to prove that independent of priors or the like, for a convex cost function, one should always use a single-valued learning algorithm rather than one which guesses randomly. (This has probably been proven many times. One proof can be found in Wolpert and Stolorz, On the implementation of Bayes optimal generalizers, SFI tech. report 92-03-012.) (Blatant self-promotion: Other interesting things proven in that report and others in its series are: there are priors and noise processes such that the expected cost, given the data set and that one is using a Bayes-optimal learning algorithm, can *decrease* with added noise; if the cost function is a proper metric, then the magnitude of the change in expected cost if one guesses h rather than h' is bounded above by the cost of h relative to h'; other results about using "Bayes-optimal" generalizers predicated on an incorrect prior, etc., etc.) *** The important point is that although it is both intriguing and illuminating, there are no implications of Michael's result for what one should do with (or in place of) a particular deterministic, single-valued learning algorithm. It was for such learning algorithms that my original comments were intended. David Wolpert  From dhw at santafe.edu Tue Aug 10 16:29:07 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 10 Aug 93 14:29:07 MDT Subject: MacKay's recent work and feature selection Message-ID: <9308102029.AA15554@zia> Recently David MacKay made a posting concerning a technique he used to win an energy prediction competition. Parts of that technique have been done before (e.g., combining generalizers via validation set behavior). However, other parts are both novel and very interesting. This posting concerns the "feature selection" aspect of his technique, which I understand MacKay developed in association w/ Radford Neal. (Note: MacKay prefers to call the technique "automatic relevance determination"; nothing I'll discuss here will be detailed enough for that distinction to be important though.) What I will say grew out of conversations w/ David Rosen and Tom Loredo, in part. Of course, any stupid or silly aspects to what I will say should be assumed to originate w/ me. *** Roughly speaking, MacKay implemented feature selection in a neural net framework as follows: 1) Define a potentially different "weight decay constant" (i.e., regularization hyperparameter) for each input neuron. The idea is that one wants to have those constants set high for input neurons representing "features" of the input vector which it behooves us to ignore. 2) One way to set those hyperparameters would be via a technique like cross-validation. MacKay instead set them via maximum likelihood, i.e., he set the weight decay constants alpha_i to those values maximizing P(data | alpha_i). Given a reasonably smooth prior P(alpha_i), this is equivalent to finding the maximum a posteriori (MAP) alpha_i, i.e., the alpha_i maximizing P(alpha_i | data). 3) Empirically, David found that this worked very well. (I.e., he won the competition.) *** This neat idea makes some interesting suggestions: 1) The first grows out of "blurring" the distinction between parameters (i.e., weights w_j) and hyperparameters (the alpha_i). Given such squinting, MacKay's procedure amounts to a sort of "greedy MAP".
First he sets one set of parameters to its MAP values (the alpha_i), and then with those values fixed, he sets the other parameters (the w_j) to their MAP values (this is done via the usual back-propagation w/ weight-decay, which we can do since the first stage set the weight decay constants). In general, the resultant system will not be at the global MAP maximizing P(alpha_i, w_j | D). In essence, a sort of extra level of regularization has been added. (Note: Radford Neal informs me that calculationally, in the procedure MacKay used, the second MAP step is "automatic", in the sense that one has already made the necessary calculations to perform that step when one carries out the first MAP step.) Of course, viewing the technique from a "blurred" perspective is a bit of a fudge, since hyperparameters are not the same thing as parameters. Nonetheless, this view suggests some interesting new techniques. E.g., first set the weights leading to hidden layer 1 to their MAP values (or maximum likelihood values, for that matter). Then with those values fixed, do the same to the weights in the second layer, etc. Another reason to consider this layer-by-layer technique is the fact that training of the weights connecting different layers should in general be distinguishable, e.g., as MacKay has pointed out, one should have different weight-decay constants for the different layers. 2) Another interesting suggestion comes from justifying the technique not as a priori reasonable, but rather as an approximation to a full "hierarchical" Bayesian technique, in which one writes P(w_j | data) (i.e., the ultimate object of interest) prop. to integral d_alpha_i P(data | w_j, alpha_i) P(w_j | alpha_i) P(alpha_i). Note that all 3 distributions occurring in this integrand must be set in order to use MacKay's technique. (The by-now-familiar object of contention between MacKay and myself is on how generically this approximation will be valid, and whether one should explicitly test its validity when one claims that it holds. This issue isn't pertinent to the current discussion however.) Let's assume the approximation is very good. Then under the assumptions: i) P(alpha_i) is flat enough to be ignored; ii) the distribution P(w_j | alpha_i) is a product of gaussians (each gaussian being for those w_j connecting to input neuron i, i.e., for those weights using weight decay constant alpha_i); then what MacKay did is equivalent to back-propagation with weight-decay, where rather than minimizing {training set error} + constant x {sum over all j (w_j)^2}, as in conventional weight decay, MacKay is minimizing (something like) {training set error} + {(sum over i) [ (number of weights connecting to neuron i) x ln [(sum over j; those weights connecting to neuron i) (w_j)^2] ]}. What's interesting about this isn't so much the logarithm in the "weight decay" term, but rather the fact that weights are being clumped together in that weight-decay term, into groups of those weights connecting to the same neuron. (This is not true in conventional weight decay.) So in essence, the weight-decay term in MacKay's scenario is designed to affect all the weights connecting to a given neuron as a group. This makes intuitive sense if the goal is feature selection. 3) One obvious idea based on viewing things this way is to try to perform weight-decay using this modified weight-decay term. This might be reasonable even if MacKay's technique is not a good approximation to this full Bayesian technique.
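(A rough sketch of what the modified weight-decay term just described might look like in code is given below. It is only an illustration, not MacKay's actual implementation; the shapes, names, and the small epsilon added inside the logarithm are all invented for the example.)

    # Grouped "weight-decay" term: one contribution per input neuron i,
    #   (number of weights leaving i) * ln( sum of those weights squared ),
    # so all weights fanning out of one input are penalized as a group.
    # Illustration only; names and shapes are invented for the example.
    import numpy as np

    def grouped_penalty(W1, eps=1e-12):
        # W1: first-layer weights, shape (n_inputs, n_hidden); row i holds the
        # weights attached to input neuron i.
        sum_sq_per_input = np.sum(W1 ** 2, axis=1)     # sum_j w_ij^2 for each i
        n_weights_per_input = W1.shape[1]              # weights attached to each i
        return float(np.sum(n_weights_per_input * np.log(sum_sq_per_input + eps)))

    def objective(training_set_error, W1):
        # {training set error} + {grouped log penalty on the first-layer weights}
        return training_set_error + grouped_penalty(W1)

Each log term depends only on the total squared size of one group of weights, which is what makes the penalty act on all the weights attached to a given input neuron as a group.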
4) The idea of MacKay's also leads to all kinds of ideas about how to set up the weight-decay term so as to enforce feature selection (or automatic relevance determination, if you prefer). These need not have anything to do w/ the precise weight-decay term MacKay used; rather the idea is simply to take his (implicit) suggestion of trying to do feature selection via the weight-decay term, and see where it leads. For example: Where originally we have input neurons at layer 1, hidden layers 2 through n, and then output neurons at layer n+1, now we have the same architecture with an extra "pre-processing" layer 0 added. Inputs are now fed to the neurons at layer 0. For each input neuron at layer 0, there is one and only one weight, leading straight up to the neuron at layer 1 which in the original formulation was the (corresponding) input neuron. The hope would be that for those input neurons which we "should" mostly ignore, something like backprop might set the associated weights from layer 0 to layer 1 to very small values. David Wolpert  From rsun at athos.cs.ua.edu Tue Aug 10 17:33:56 1993 From: rsun at athos.cs.ua.edu (Ron Sun) Date: Tue, 10 Aug 1993 16:33:56 -0500 Subject: No subject Message-ID: <9308102133.AA22967@athos.cs.ua.edu> CALL FOR PAPERS International Symposium on Integrating Knowledge and Neural Heuristics (ISIKNH'94) Sponsored by University of Florida, and AAAI, in cooperation with IEEE Neural Network Council, and Florida AI Research Society. Time: May 9-10, 1994; Place: Pensacola Beach, Florida, USA. A large amount of research has been directed toward integrating neural and symbolic methods in recent years. In particular, the integration of knowledge-based principles and neural heuristics holds great promise in solving complicated real-world problems. This symposium will provide a forum for discussions and exchanges of ideas in this area. The objective of this symposium is to bring together researchers from a variety of fields who are interested in applying neural network techniques to augmenting existing knowledge or proceeding the other way around, and especially, who have demonstrated that this combined approach outperforms either approach alone. We welcome views of this problem from areas such as constraint-(knowledge-) based learning and reasoning, connectionist symbol processing, hybrid intelligent systems, fuzzy neural networks, multi-strategic learning, and cognitive science. Examples of specific research include but are not limited to: 1. How do we build a neural network based on {\em a priori} knowledge (i.e., a knowledge-based neural network)? 2. How do neural heuristics improve the current model for a particular problem (e.g., classification, planning, signal processing, and control)? 3. How does knowledge in conjunction with neural heuristics contribute to machine learning? 4. What is the emergent behavior of a hybrid system? 5. What are the fundamental issues behind the combined approach? Program activities include keynote speeches, paper presentations, and panel discussions. ***** Scholarships are offered to assist students in attending the symposium. Students who wish to apply for a scholarship should send their resumes and a statement of how their research is related to the symposium. ***** Symposium Chairs: LiMin Fu, University of Florida, USA. Chris Lacher, Florida State University, USA.
Program Committee: Jim Anderson, Brown University, USA Michael Arbib, University of Southern California, USA Fevzi Belli, The University of Paderborn, Germany Jim Bezdek, University of West Florida, USA Bir Bhanu, University of California, USA Su-Shing Chen, National Science Foundation, USA Tharam Dillon, La Trobe University, Australia Douglas Fisher, Vanderbilt University, USA Paul Fishwick, University of Florida, USA Stephen Gallant, HNC Inc., USA Yoichi Hayashi, Ibaraki University, Japan Susan I. Hruska, Florida State University, USA Michel Klefstad-Sillonville CCETT, France David C. Kuncicky, Florida State University, USA Joseph Principe, University of Florida, USA Sylvian Ray, University of Illinois, USA Armando F. Rocha, University of Estadual, Brasil Ron Sun, University of Alabama, USA Keynote Speaker: Balakrishnan Chandrasekaran, Ohio-State University Schedule for Contributed Papers ---------------------------------------------------------------------- Paper Summaries Due: December 15, 1993 Notice of Acceptance Due: February 1, 1994 Camera Ready Papers Due: March 1, 1994 Extended paper summaries should be limited to four pages (single or double-spaced) and should include the title, names of the authors, the network and mailing addresses and telephone number of the corresponding author. Important research results should be attached. Send four copies of extended paper summaries to LiMin Fu Dept. of CIS, 301 CSE University of Florida Gainesville, FL 32611 USA (e-mail: fu at cis.ufl.edu; phone: 904-392-1485). Students' applications for a scholarship should also be sent to the above address. General information and registration materials can be obtained by writing to Rob Francis ISIKNH'94 DOCE/Conferences 2209 NW 13th Street, STE E University of Florida Gainesville, FL 32609-3476 USA (Phone: 904-392-1701; fax: 904-392-6950) --------------------------------------------------------------------- --------------------------------------------------------------------- If you intend to attend the symposium, you may submit the following information by returning this message: NAME: _______________________________________ ADDRESS: ____________________________________ _____________________________________________ _____________________________________________ _____________________________________________ _____________________________________________ PHONE: ______________________________________ FAX: ________________________________________ E-MAIL: _____________________________________ ---------------------------------------------------------------------  From ld231782 at longs.lance.colostate.edu Wed Aug 11 00:56:26 1993 From: ld231782 at longs.lance.colostate.edu (L. Detweiler) Date: Tue, 10 Aug 93 22:56:26 -0600 Subject: neuroanatomy list ad & more on bee brains In-Reply-To: Your message of "Mon, 09 Aug 93 14:00:59 PDT." <9308092100.AA10510@nsma.arizona.edu> Message-ID: <9308110456.AA06912@longs.lance.colostate.edu> While many on this list will not be interested in the details of bee-brain neuroanatomy or arguments thereon, an excellent list for discussions of this can be requested from cogneuro-request at ptolemy.arc.nasa.gov, maintained by Kimball Collins . The list has fairly low volume although definitely more than connectionists, and I'd like to encourage any of this amazingly literate connectionist crowd with a strong interest in neurobiological research to subscribe (recent/past topics: neurobiology of rabies infections, Hebb's rule, vision, dyslexia, etc.) * * * Mr. 
Skaggs writes an exceedingly hostile flame (a redundant phrase) on the recent syndicated news article describing research into bee function and neuroanatomy, calling it `overblown and historically ignorant'. While I don't have as close a background in the area in question as Mr. Skaggs appears to, this is just a short note to balance the scale a little closer to equilibrium. The critical feature that I see going on here is a professional scientist demeaning a non-detailed popular account of scientific work, esp. in that person's area of expertise, for lapses in precise description. This happens all the time, of course, both the presence of the quasi-skewed material and the criticism. Definitely, the article was the overwrought cheerleading type, rather stereotypical, but Mr. Skaggs, on the other hand, plays into the cliche of the pessimistic and sour curmudgeon-scientist in attacking it. I'd like to point out that this popular literature serves a very useful purpose in keeping the lay public apprised of new developments in scientific fields and, ultimately, encouraging funding. It is not fair to apply the strict scientific standard of evaluation to something that appears in the popular press. In this case, there is no significant error, and the purpose is served in being `approximately correct', and there is no point in rebutting it. We are bound to lose something in the translation, and the major points of disagreement are likely to be over opinion. We should instead be highly encouraged and appreciative of these attempts to bring increasingly abstruse and technical science to the interested layman. I appreciate the popular press to some degree in that it forces scientists to get at the essence of their research, something they sometimes lose sight of. The scientist (perhaps the neuroscientist in particular) is forever saying `it's not quite that simple' or `it doesn't quite happen like that' or `there are exceptions to that' to the point that an outsider can give up in frustration, thinking that it is nothing but a disconnected morass with no underlying message or cohesion. The general press usually gives a close and fascinating view into what the `big picture' is. Looking at reporters as nothing but clueless intruders is a somewhat self-destructive position, IMHO. And yes, the grandiose statements like `will shed insight into human learning' can be recognized by other scientists as the necessary fodder and not criticized but ignored. Now, to address a few points: >Coss and Perkel over a decade ago found >changes in the length of dendritic spines after honeybees went on a >single exploratory flight. This is much more direct than the evidence >described in the "news release". Incidentally, the changes in dendritic growth with learning are IMHO one of the most fascinating studies of plasticity, and on the cutting edge of current research, and perhaps others will wish to post references. (The classic study showed that rats reared in deprived vs. abundant sensory-stimulus environments had less or more dendritic growth, respectively.) >It is not true that the >honeybee brain is merely a simpler version of the human brain. They're >completely different -- even the neurons are different in structure. Definitely, any animal model always has minor or major imperfections and pitfalls. But this brings up an interesting point--is there an analogue to LTP in the insect brain? There is probably at least a degree of overlap in the kinds of neurotransmitters involved.
However, arguing against the relevance, superiority, and verisimilitude of one animal model vs. another can turn into a very emotional debate, and should be engaged with the utmost delicacy or statements come out with a connotation much like `the car you drive all day is worthless'.  From delliott at src.umd.edu Wed Aug 11 17:52:44 1993 From: delliott at src.umd.edu (David L. Elliott) Date: Wed, 11 Aug 1993 17:52:44 -0400 Subject: Call for papers, NeuroControl book Message-ID: <199308112152.AA04995@newra.src.umd.edu> PROGRESS IN NEURAL NETWORKS series Editor O. M. Omidvar Special Volume: NEURAL NETWORKS FOR CONTROL Editor: David L. Elliott CALL FOR PAPERS Original manuscripts describing recent progress in neural networks research directly applicable to Control or making use of modern control theory. Manuscripts may be survey or tutorial in nature. Suggested topics for this book are: %New directions in neurocontrol %Adaptive control %Biological control architectures %Mathematical foundations of control %Model-based control with learning capability %Natural neural control systems %Neurocontrol hardware research %Optimal control and incremental dynamic programming %Process control and manufacturing %Reinforcement-Learning Control %Sensor fusion and vector quantization %Validating neural control systems The papers will be refereed and uniformly typeset. Ablex and the Progress Series editors invite you to submit an abstract, extended summary or manuscript proposal, directly to the Special Volume Editor: Dr. David L. Elliott, Institute for Systems Research University of Maryland, College Park, MD 20742 Tel: (301)405-1241 FAX (301)314-9920 Email: DELLIOTT at SRC.UMD.EDU or to the Series Editor: Dr. Omid M. Omidvar, Computer Science Dept., University of the District of Columbia, Washington DC 20008 Tel: (202)282-7345 FAX: (202)282-3677 Email: OOMIDVAR at UDCVAX.BITNET The Publisher is Ablex Publishing Corporation, Norwood, NJ  From pittman at mcc.com Thu Aug 12 08:38:30 1993 From: pittman at mcc.com (Jay Pittman) Date: Thu, 12 Aug 93 08:38:30 EDT Subject: neuroanatomy list ad & more on bee brains Message-ID: <9308121338.AA14022@gluttony.mcc.com> Excellent note, well stated. I agree with everything Detweiler said about the press. On the other hand, when I originally read Bill Skaggs note I didn't think he was being all that critical. I went back and looked at it again, and, yes, he does sound like a real flamer, WHEN I START OUT ASSUMING THAT. One can also read it as a calmly-stated critique of the article. I find myself imagining different "tones of voice", depending on (presumably) random triggers. I hope when you read this note you perceive me speaking in a calm, relaxed manner. While I agree with Detweiler's attitude toward the popular press, I think Skaggs statements were addressed to us, the members of the research community, and not to the reporters. As long as the note does not reach members of that community, we should tolerate somewhat-more-grouchy phrasing than we might want for lay consumption. I've just spent a lot of time trying to carefully word the above message. The neat thing about a group such as connectionists is that (I think) we can skip that labor, and just spit out our thoughts. Or perhaps I am being naive? BTW, I have no HO on bee brains. My own dendrites get thinner every day. 
J  From chris at arraysystems.nstn.ns.ca Sat Aug 14 14:25:29 1993 From: chris at arraysystems.nstn.ns.ca (Chris Brobeck) Date: Sat, 14 Aug 93 15:25:29 ADT Subject: Genetic Algorithms Message-ID: <9308141825.AA07238@arraysystems.nstn.ns.ca> Dear Colleagues; We're currently in the process of building a relatively large net and were looking at using a genetic algorithm to optimize the network structure. The question is as follows. Early forms of genetic algorithms seemed to rely on reading the gene once, linearly, in the construction process, whereas a number of more recent algorithms allow the reading to start anywhere along the gene, and continue to read (construct rules) until some stopping criterion is met. In the former, it seems reasonable then for one organism to compete against the other in a winner-take-all sort of way. On the other hand, the rigidity of the genetic structure makes it very sensitive to mutation. In the latter case the gene may be thought of as a generator for a process (randomly) creating rules of a variety of lengths. If one assumes that individual rules are much shorter than the entire gene this method becomes less sensitive to mutation, crossover, etc. (both the beneficial and not so beneficial aspects). In this case it seems that competition among species would be as critical as competition among individuals, with the interspecies competition perhaps representing a fast way to remove ineffective rule sets, and individual competition more of a way of fine-tuning a distribution. The upshot would be (one assumes) slower but more robust convergence. In any case, if there is anyone out there who can point us in the direction of some good references let us know - particularly ones that might be available via ftp. Thanks, Chris Brobeck.  From bengio at iro.umontreal.ca Mon Aug 16 11:09:57 1993 From: bengio at iro.umontreal.ca (Samy Bengio) Date: Mon, 16 Aug 1993 11:09:57 -0400 Subject: Preprint announcement: Generalization of a Parametric Learning Rule Message-ID: <9308161509.AA06576@carre.iro.umontreal.ca> FTP-host: archive.cis.ohio-state.edu FTP-filename: /pub/neuroprose/bengio.general.ps.Z The following file has been placed in neuroprose: (no hardcopies will be provided): GENERALIZATION OF A PARAMETRIC LEARNING RULE (8 pages) by Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei Abstract: In previous work we discussed the subject of parametric learning rules for neural networks. In this article, we present a theoretical basis permitting the study of the {\it generalization} property of a learning rule whose parameters are estimated from a set of learning tasks. By generalization, we mean the possibility of using the learning rule to learn to solve new tasks. Finally, we describe simple experiments on two-dimensional categorization tasks and show how they corroborate the theoretical results. This paper is an extended version of a paper which will appear in ICANN'93: Proceedings of the International Conference on Artificial Neural Networks. To retrieve the file: unix> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. 220 cheops.cis.ohio-state.edu FTP server ready. Name: anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> binary 200 Type set to I. ftp> cd pub/neuroprose 250 CWD command successful. ftp> get bengio.general.ps.Z 200 PORT command successful. 150 Opening BINARY mode data connection for bengio.general.ps.Z 226 Transfer complete.
100000 bytes sent in 3.14159 seconds ftp> quit 221 Goodbye. unix> uncompress bengio.general.ps.Z unix lpr bengio.general.ps (or however you print out postscript) Many thanks to Jordan Pollack for maintaining this archive. -- Samy Bengio E-mail: bengio at iro.umontreal.ca Fax: (514) 343-5834 Tel: (514) 343-6111 ext. 3545/3494 Residence: (514) 495-3869 Universite de Montreal, Dept. IRO, C.P. 6128, Succ. A, Montreal, Quebec, Canada, H3C 3J7  From reza at ai.mit.edu Mon Aug 16 12:37:02 1993 From: reza at ai.mit.edu (Reza Shadmehr) Date: Mon, 16 Aug 93 12:37:02 EDT Subject: Tech Reports from CBCL at MIT Message-ID: <9308161637.AA03497@corpus-callosum.ai.mit.edu> The following technical reports from the Center for Biological and Computational Learning at M.I.T. are now available via anonymous ftp. -------------- :CBCL Paper #83/AI Memo #1440 :author Michael I. Jordan and Robert A. Jacobs :title Hierarchical Mixtures of Experts and the EM Algorithm :date August 1993 :pages 29 We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation- Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain. -------------- :CBCL Paper #84/AI Memo #1441 :author Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh :title On the Convergence of Stochastic Iterative Dynamic Programming Algorithms :date August 1993 :pages 13 Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD (lambda) and Q-learning belong. ============================ How to get a copy of above reports: The files are in compressed postscript format and are named by their AI memo number, e.g., the Jordan and Jacobs paper is named AIM-1440.ps.Z. Here is the procedure for ftp-ing: unix> ftp ftp.ai.mit.edu (log-in as anonymous) ftp> cd ai-pubs/publications/1993 ftp> binary ftp> get AIM-number.ps.Z ftp> quit unix> zcat AIM-number.ps.Z | lpr I will periodically update the above list as new titles become available. Best wishes, Reza Shadmehr Center for Biological and Computational Learning M. I. T. Cambridge, MA 02139  From mikewj at signal.dra.hmg.gb Tue Aug 17 04:15:33 1993 From: mikewj at signal.dra.hmg.gb (mikewj@signal.dra.hmg.gb) Date: Tue, 17 Aug 93 09:15:33 +0100 Subject: Practical Neural Nets Conference & Workshops in UK. 
Message-ID: AA20707@ravel.dra.hmg.gb *********************************** NEURAL COMPUTING APPLICATIONS FORUM *********************************** 8 - 9 September 1993 Brunel University, Runnymede, UK ***************************************** PRACTICAL APPLICATIONS OF NEURAL NETWORKS ***************************************** Neural Computing Applications Forum is the primary meeting place for people developing Neural Network applications in industry and academia. It has 200 members from the UK and Europe, from universities, small companies and big ones, and holds four main meetings each year. It has been running for 3 years, and is cheap to join. This meeting spans two days with informal workshops on 8 September and the main meeting comprising talks about neural network techniques and applications on 9 September. ********* WORKSHOPS ********* ********************************************************** Neural Networks in Engine Health Monitoring 8 September, 13.00 to 15.00 ********************************************************** Technical Contact: Tom Harris: (+44/0) 784 431341 Including: Roger Hutton (ENTEK): "What is Predictive Maintenance?" John Hobday (Lloyds Register): "Gas Turbine Start Monitoring" John McIntyre (University of Sunderland / National Power plc): Predictive Maintenance at Blyth Power Station ********************************************************* Building a Neural Network Application 8 September, 15.30 to 17.30 ********************************************************* Technical Contact: Tom Harris: (+44/0) 784 431341 Including: Chris Bishop (Aston University): "The DTI Neural Computing Guidelines Project" Tom Harris (Brunel University): "A Design Process for Neural Network Applications" Paul Gregory (Recognition Research Ltd.): "Building an Application in Software" (case study) Simon Hancock (Neural Technologies Ltd.): "Implementing Hardware Neural Network Solutions" (case study) ************************* Evening: Barbecue Supper ************************* ***************************** MAIN MEETING - 9 September 1993 ***************************** 8.30 Registration 9.05 Welcome 9.15 Neil Burgess (CRL): "Feature Selection in Neural Networks" 9.50 Bryn Williams (Aston University): "Convergence and Diversity of Species in Genetic Algorithms for Optimization of a Bump-Tree Classifier" 10.20 Coffee 11.00 Mike Brinn (Health and Safety Executive): "Kohonen Networks Classifying Toxic Molecules" 11.40 John Bridle (Dragon Systems Ltd.): "Speech Recognition in Principle and Practice" 12.15 Lunch 2.00 Bruce Wilkie (Brunel University): "Real Time Logical Neural Networks" 2.40 Stan Swallow (Brunel University): "TARDIS: The World's Fastest Neural Network?" 3.15 Tea 3.40 Dave Cressy (Logica Cambridge Ltd.): "Neural Control of an Experimental Batch Distillation Column" 4.10 Discussions 4.30 Close & minibus to the station ACCOMMODATION is available at Brunel University at 35 pounds (including barbecue supper) and **MUST** be booked and paid for in advance. Accommodation and breakfast only: 25 pounds; barbecue supper only: 12 pounds. ***************** Application ***************** Members of NCAF get free entry to all meetings for a year. (This is very good value - main meetings, tutorials, special interest meetings). It also includes subscription to Springer Verlag's journal "Neural Computing and Applications". Full membership: 250 pounds - anybody in your small company / research group in a big company. Individual membership: 140 pounds - named individual only.
Student membership (with journal): 55 pounds - copy of student ID required. Student membership (no journal, very cheap!): 25 pounds - copy of student ID required. Entry to this meeting without membership costs 35 pounds for the workshops, and 80 pounds for the main day. Payment in advance if possible; please give an official order number if an invoice is required. Email enquiries to Mike Wynne-Jones, mikewj at signal.dra.hmg.gb. Postal to Mike Wynne-Jones, NCAF, PO Box 62, Malvern, WR14 4NU, UK. Fax to Karen Edwards, (+44/0) 21 333 6215  From mpp at cns.brown.edu Tue Aug 17 12:18:01 1993 From: mpp at cns.brown.edu (Michael P. Perrone) Date: Tue, 17 Aug 93 12:18:01 EDT Subject: Provable optimality of averaging generalizers Message-ID: <9308171618.AA15207@cns.brown.edu> David Wolpert writes: -->1) Say I have 3 real numbers, A, B, and X. In general, it's always -->true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} / -->2. (This is exactly analogous to having the cost of the average guess -->bounded above by the average cost of the individual guesses.) --> -->2) This means that if we had a choice of either randomly drawing one -->of the numbers {A, B}, or drawing C, that on average drawing C would -->give smaller quadratic cost with respect to X. --> -->3) However, as Michael points out, this does *not* mean that if we had -->just the numbers A and C, and could either draw A or C, that we should -->draw C. In fact, point (1) tells us nothing whatsoever about whether A -->or C is preferable (as far as quadratic cost with respect to X is -->concerned). --> -->4) In fact, now create a 5th number, D = [C + A] / 2. By the same -->logic as in (1), we see that the cost (wrt/ X) of D is less than the -->average of the costs of C and A. So to the exact same degree that (1) -->says we "should" guess C rather than A or B, it also says we should -->guess D rather than A or C. (Note that this does *not* mean that D's -->cost is necessarily less than C's though; we don't get endlessly -->diminishing costs.) --> -->5) Step (4) can be repeated ad infinitum, getting a never-ending -->sequence of "newly optimal" guesses. In particular, in the *exact* -->sense in which C is "preferable" to A or B, and therefore should -->"replace" them, D is preferable to A or B, and therefore should -->replace *them* (and in particular replace C). So one is never left -->with C as the object of choice. This argument does not imply a contradiction for averaging! This argument shows the natural result of throwing away information. Step (4) throws away number B. Given that we no longer know B, number D is the correct choice. (One could imagine such "forgetting" to be useful in time varying situations - which leads towards the Kalman filtering that was mentioned in relation to averaging a couple of weeks ago.) In Step (5), an infinite sequence is developed by successively throwing away more and more of number B. The infinite limit of Step (5) is number A. In other words, we have thrown away all knowledge of B. -->So (1) isn't really normative; it doesn't say one "should" guess the -->average of a bunch of guesses: Normative? Hey is this an ethics class!? :-) -->7) Choosing D is better than randomly choosing amongst C or A, just as --> choosing C is better than randomly choosing amongst A or B. 
--> -->8) This doesn't mean that given C, one should introduce an A and --> then guess the average of C and A (D) rather than C, just as --> this doesn't mean that given A, one should introduce a B and --> then guess the average of A and B (C) rather than A. Sure, if you're willing to throw away information. Michael  From cns at clarity.Princeton.EDU Tue Aug 17 11:30:02 1993 From: cns at clarity.Princeton.EDU (Cognitive Neuroscience) Date: Tue, 17 Aug 93 11:30:02 EDT Subject: RFP Research - McDonnell-Pew Program Message-ID: <9308171530.AA27618@clarity.Princeton.EDU> McDonnell-Pew Program in Cognitive Neuroscience SEPTEMBER 1993 Individual Grants-in-Aid for Research Program supported jointly by the James S. McDonnell Foundation and The Pew Charitable Trusts INTRODUCTION The McDonnell-Pew Program in Cognitive Neuroscience has been created jointly by the James S. McDonnell Foundation and The Pew Charitable Trusts to promote the development of cognitive neuroscience. The foundations have allocated $20 million over a five-year period for this program. Cognitive neuroscience attempts to understand human mental events by specifying how neural tissue carries out computations. Work in cognitive neuroscience is interdisciplinary in character, drawing on developments in clinical and basic neuroscience, computer science, psychology, linguistics, and philosophy. Cognitive neuroscience excludes descriptions of psychological function that do not address the underlying brain mechanisms and neuroscientific descriptions that do not speak to psychological function. The program has three components. (1) Institutional grants, which have already been awarded, for the purpose of creating centers where cognitive scientists and neuroscientists can work together. (2) Small grants-in-aid, presently being awarded, for individual research projects to encourage Ph.D. and M.D. investigators in cognitive neuroscience. (3) Small grants-in-aid, presently being awarded, for individual training projects to encourage Ph.D. and M.D. investigators to acquire skills for interdisciplinary research. This brochure describes the individual grants-in-aid for research. RESEARCH GRANTS The McDonnell-Pew Program in Cognitive Neuroscience will issue a limited number of awards to support collaborative work by cognitive neuroscientists. Applications are sought for projects of exceptional merit that are not currently fundable through other channels and from investigators who are not at institutions already funded by an institutional grant from the program. In order to distribute available funds as widely as possible, preference will be given to applicants who have not received previous grants under this program. Preference will be given to projects that are interdisciplinary in character. The goals of the program are to encourage broad participation in the development of the field and to facilitate the participation of investigators outside the major centers of cognitive neuroscience. There are no U.S. citizenship restrictions or requirements, nor does the proposed work need to be conducted at a U.S. institution, providing the sponsoring organization qualifies as tax-exempt as described in the "Applications" section of this brochure. Ph.D. thesis research of graduate students will not be funded. Grant support under the research component is limited to $30,000 per year for two years. Indirect costs are to be included in the $30,000 maximum and may not exceed 10 percent of total salaries and fringe benefits. 
These grants are not renewable after two years. The program is looking for innovative proposals that would, for example: * combine experimental data from cognitive psychology and neuroscience; * explore the implications of neurobiological methods for the study of the higher cognitive processes; * bring formal modeling techniques to bear on cognition, including emotions and higher thought processes; * use sensing or imaging techniques to observe the brain during conscious activity; * make imaginative use of patient populations to analyze cognition; * develop new theories of the human mind/brain system. This list of examples is necessarily incomplete but should suggest the general kind of proposals desired. Ideally, a small grant-in-aid for research should facilitate the initial exploration of a novel or risky idea, with success leading to more extensive funding from other sources. APPLICATIONS Applicants should submit five copies of the following information: * a brief, one-page abstract describing the proposed work; * a brief, itemized budget that includes direct and indirect costs (indirect costs may not exceed 10 percent of total salaries and fringe benefits); * a budget justification; * a narrative proposal that does not exceed 5,000 words; the 5,000-word proposal should include: 1) a description of the work to be done and where it might lead; 2) an account of the investigator's professional qualifications to do the work; 3) an account of any plans to collaborate with other cognitive neuroscientists; 4) a brief description of the available research facilities; * curriculum(a) vitae of the participating investigator(s); * an authorized document indicating clearance for the use of human and animal subjects; * an endoresement letter from the officer of the sponsoring institution who will be responsible for administering the grant. One copy of the following items must also be submitted along with the proposal. These documents should be readily available from the sponsoring institution's grants or development office. * A copy of the IRS determination letter, or the international equivalent, stating that the sponsoring organization is a nonprofit, tax-exempt institution classified as a 501(c)(3) organization. * A copy of the IRS determination letter stating that your organization is not listed as a private foundation under section 509(a) of the Internal Revenue Service Code. * A statement on the sponsoring institution's letterhead, following the wording on Attachment A and signed by an officer of the institution, certifying that the status or purpose of the organization has not changed since the issuance of the IRS determinations. (If your organization's name has changed, include a copy of the IRS document reflecting this change.) * An audited financial statement of the most recently completed fiscal year of the sponsoring organization. * A current list of the names and professional affiliations of the members of the organization's board of trustees and the names and titles of the principal officers. Other appended documents will not be accepted for evaluation and will be returned to the applicant. Any incomplete proposals will also be returned to the applicant. Submissions will be reviewed by the program's advisory board. Applications must be postmarked on or before FEBRUARY 1 to be considered for review. 
INFORMATION McDonnell-Pew Program in Cognitive Neuroscience Green Hall 1-N-6 Princeton University Princeton, New Jersey 08544-1010 Telephone: 609-258-5014 Facsimile: 609-258-3031 Email: cns at clarity.princeton.edu ADVISORY BOARD Emilio Bizzi, M.D. Eugene McDermott Professor in the Brain Sciences and Human Behavior Chairman, Department of Brain and Cognitive Sciences Massachusetts Institute of Technology, E25-526 Cambridge, Massachusetts 02139 Sheila E. Blumstein, Ph.D. Professor of Cognitive and Linguistic Sciences Dean of the College Brown University University Hall, Room 218 Providence, Rhode Island 02912 Stephen J. Hanson, Ph.D. Head, Learning Systems Department Siemens Corporate Research 755 College Road East Princeton, New Jersey 08540 Jon H. Kaas, Ph.D. Centennial Professor Department of Psychology Vanderbilt University 301 Wilson Hall 111 21st Avenue South Nashville, Tennessee 37240 George A. Miller, Ph.D. Director, McDonnell-Pew Program in Cognitive Neuroscience James S. McDonnell Distinguished University Professor of Psychology Department of Psychology Princeton University Princeton, New Jersey 08544-1010 Mortimer Mishkin, Ph.D. Chief, Laboratory of Neuropsychology National Institute of Mental Health 9000 Rockville Pike Building 49, Room 1B80 Bethesda, Maryland 20892 Marcus E. Raichle, M.D. Professor of Neurology and Radiology Division of Radiation Sciences Washington University School of Medicine Campus Box 8225 510 S. Kingshighway Boulevard St. Louis, Missouri 63110 Endel Tulving, Ph.D. Tanenbaum Chair in Cognitive Neuroscience Rotman Research Institute of Baycrest Centre 3560 Bathurst Street North York, Ontario M6A 2E1 Canada  From dhw at santafe.edu Tue Aug 17 21:26:08 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Tue, 17 Aug 93 19:26:08 MDT Subject: Yet more on averaging Message-ID: <9308180126.AA02904@zia> In several recent e-mail conversations, Michael Perrone and I have gotten to where we think we agree with each other on substance, although we disagree a bit on emphasis. To complete the picture for the connectionist community and present the other side to Michael's recent posting: In my back pocket, I have a number. I'll fine you according to the squared difference between your guess for the number and its actual value. Okay, should you guess 3 or 5? Obviously you can't answer. 7 or 5? Same response. 5 or a random sample of 3 or 7? Now, as Michael points out, you *can* answer: 5. However, I'm not as convinced as Michael that this actually tells us anything of practical use. How should you use this fact to help you guess the number in my back pocket? Seems to me you can't. The bottom line, as I see it: arguments like Michael's show that one should always use a single-valued learning algorithm rather than a stochastic one. (Subtle caveat: If used only once, there is no difference between a stochastic learning algorithm and a single-valued one; multiple trials are implicitly assumed here.) But if one has before one a smorgasbord of single-valued learning algorithms, one can not infer that one should average over them. Even if I choose amongst them in a really stupid way (say according to the alphabetical listing of their creators), *so long as I am consistent and single-valued in how I make my choice*, I have no assurance that doing this will give worse results than averaging them. To sum it up: one can not prove averaging to be preferable to a scheme like using the alphabet to pick.
Michael's result shows instead that averaging the guess is better (over multiple trials) than randomly picking amongst the guesses. Which simply means that one should not randomly pick amongst the guesses. It does *not* mean that one should average rather than use some other (arbitrarily silly) single-valued scheme. David Wolpert Disclaimer: All the above notwithstanding, I personally *would* use some sort of averaging scheme in practice. The only issue of contention here is what is *provably* the way one should generalize. In addition to disseminating the important result concerning the sub-optimality of stochastic schemes (of which there are many in the neural nets community!), Michael is to be commended for bringing this entire fascinating subject to the attention of the community.  From tmb at idiap.ch Wed Aug 18 02:27:58 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Wed, 18 Aug 93 08:27:58 +0200 Subject: Yet more on averaging Message-ID: <9308180627.AA18505@idiap.ch> In reply to dhw at santafe.edu: |The bottom line, as I see it: arguments like Michael's show that one |should always use a single-valued learning algorithm rather than a |stochastic one. From tmb at idiap.ch Wed Aug 18 02:29:42 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Wed, 18 Aug 93 08:29:42 +0200 Subject: Yet more on averaging Message-ID: <9308180629.AA18508@idiap.ch> dhw at santafe.edu writes: |To sum it up: one can not prove averaging to be preferable to a scheme |like using the alphabet to pick. Michael's result shows instead that |averaging the guess is better (over multiple trials) than randomly |picking amongst the guesses. | |Which simply means that one should not randomly pick amongst the |guesses. It does *not* mean that one should average rather than use |some other (arbitrarily silly) single-valued scheme. I would like to strengthen this point a little. In general, averaging is clearly not optimal, nor even justifiable on theoretical grounds. For example, let us take the classification case and let us assume that each neural network $i$ returns an estimate $p^i_j(x)$ of the probability that the object belongs to class $j$ given the measurement $x$. Consider now the case in which we know that the predictions of those networks are statistically independent (for example, because they are run on independent parts of the input data). Then, we should really multiply the probabilities estimated by each network, rather than computing a weighted sum. That is, we should make a decision according to the maximum of $\prod_i p^i_j(x)$, not according to the maximum of $\sum_i w_i p^i_j(x)$ (assuming a 0-1 loss function). As another example, consider the case in which we have an odd number of experts. If they are trained and designed individually in a particularly peculiar way, it might turn out that the optimal decision rule is to output class 1 if an odd number of them pick class 1, and pick class 0 otherwise. Now, Michael probably limits the scope of his claims in his thesis to exclude such cases (I only had a brief look, I must admit), but I think it is important to make the point that, without some additional assumptions, averaging is just a heuristic and not necessarily optimal. Still, linear combinations of the outputs of classifiers, regressors, and networks seem to be useful in practice for improving classification rates in many cases. Lots of practical experience in both statistics and neural networks points in that direction. Thomas.
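To make the distinction concrete, here is a small illustrative sketch (not code from any of the postings above; the function names, the toy numbers, and the use of Python/NumPy are all assumptions) of the two combination rules Breuel contrasts -- a weighted sum of the class-probability estimates versus a product of them, each followed by an argmax decision:

import numpy as np

def combine_by_sum(p_list, weights=None):
    # Weighted-average rule: sum_i w_i * p^i_j(x), then argmax over classes j.
    P = np.asarray(p_list)                    # shape (n_experts, n_classes)
    if weights is None:
        weights = np.ones(len(P)) / len(P)    # simple averaging
    combined = weights @ P
    return int(combined.argmax()), combined

def combine_by_product(p_list):
    # Product rule for (assumed) independent experts: prod_i p^i_j(x),
    # then argmax over classes j; summing logs avoids numerical underflow.
    log_combined = np.log(np.asarray(p_list) + 1e-12).sum(axis=0)
    return int(log_combined.argmax()), np.exp(log_combined)

# Toy case in which the two rules disagree: two experts mildly favor class 0,
# one expert confidently vetoes it.
experts = [[0.80, 0.20], [0.80, 0.20], [0.02, 0.98]]
print(combine_by_sum(experts))        # averaging picks class 0 (0.54 vs 0.46)
print(combine_by_product(experts))    # the product rule picks class 1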
From dhw at santafe.edu Wed Aug 18 18:37:32 1993 From: dhw at santafe.edu (dhw@santafe.edu) Date: Wed, 18 Aug 93 16:37:32 MDT Subject: Random vs. single-valued rules Message-ID: <9308182237.AA03709@zia> tmb writes: >>>>> In reply to dhw at santafe.edu: |The bottom line, as I see it: arguments like Michael's show that one |should always use a single-valued learning algorithm rather than a |stochastic one. >From context, I'm assuming that you are referring to "deterministic" vs. "randomized" decision rules, as they are called in decision theory ("stochastic learning algorithm" means something different to me, but maybe I'm just misinterpreting your posting). Picking an opinion from a pool of experts randomly is clearly not a particularly good randomized decision rule in most cases. However, there are cases in which properly chosen randomized decision rules are important (any good introduction on Bayesian statistics should discuss this). Unless there is an intelligent adversary involved, such cases are probably mostly of theoretical interest, but nonetheless, a randomized decision rule can be "better" than any deterministic one. >>>> Implicit in my statement was the context of Michael Perrone's posting (which I was responding to): convex loss functions, and the fact that in particular, one "single-valued learning algorithm" one might use is the one Michael advocates: average over your pool of experts. Obviously one can choose a single-valued learning algorithm which performs more poorly than randomly drawing from a pool of experts: 1) One can prove that (for convex loss) averaging over the pool is preferable to randomly sampling the pool (Michael's result; note assumptions about lack of correlations between the experts and the like apply.) 2) One can not prove that averaging beats any other single-valued use of the experts. 3) Note that neither (1) nor (2) contradict the assertion that there might be single-valued algorithms which perform worse than randomly sampling the pool. 4) For the case of a 0-1 loss function, and a uniform prior over target functions, it doesn't matter how you guess; all algorithms perform the same, both averaged over data and for one particular data (as far as off-training set average loss is concerned). David Wolpert  From tmb at idiap.ch Thu Aug 19 09:17:14 1993 From: tmb at idiap.ch (Thomas M. Breuel) Date: Thu, 19 Aug 93 15:17:14 +0200 Subject: Yet more on averaging In-Reply-To: <9308180629.AA18508@idiap.ch> References: <9308180629.AA18508@idiap.ch> Message-ID: <9308191317.AA22756@idiap.ch> I wrote, in response to a discussion of Michael Perrone's work: |In general, averaging is clearly not optimal, nor even justifiable on |theoretical grounds. [... some examples follow...] Judging from some private mail that I have been receiving, some people seem to have misunderstood my message. I wasn't making a statement about Michael's results per se, but about their application. In particular, in the case of combining estimates of probabilities by different "experts" for subsequent classification (e.g., in Michael's OCR example), or in the case of combining expert "votes", using any kind of linear combination is not justifiable in general on theoretical grounds, and it is actually provably suboptimal in some cases. Now, such examples do violate some of the assumptions on which Michael's results rely, so there is no contradiction. 
My message was only intended as a reminder that there are a number of important problems in which the assumptions actually are violated, and in which the approach of linear combinations reduces to a heuristic (one, I might add, that often does work well in practice). Thomas.  From brandyn at brainstorm.com Fri Aug 20 03:32:18 1993 From: brandyn at brainstorm.com (Brandyn) Date: Fri, 20 Aug 93 00:32:18 PDT Subject: Paper available on neuroprose Message-ID: <9308200732.AA14000@brainstorm.com> FTP-host: archive.cis.ohio-state.edu FTP-file: pub/neuroprose/webb.furf.ps.Z The following paper is now available by anonymous FTP: Fusion-Reflection (Self-Supervised Learning) Brandyn Jerad Webb brandyn at brainstorm.com ABSTRACT By analyzing learning from the perspective of knowledge acquisition, a number of common limitations are overcome. Modeling efficacy is proposed as an empirical measure of knowledge, providing a concrete, mathematical means of "acquiring knowledge" via gradient ascent. A specific network architecture is described, a hierarchical analog of node-labeled Hidden Markov Models, and its evaluation and learning laws are derived. In empirical studies using a hand-printed character recognition task, an unsupervised network was able to discover n-gram statistics from groups of letter images, and to use these statistics to enhance its ability to later identify individual letters. Host: archive.cis.ohio-state.edu (128.146.8.52) Directory: pub/neuroprose Filename: webb.furf.ps.Z A version of this paper was submitted to NIPS in May '93. If there is sufficient interest, and if it wouldn't violate neuroprose etiquette, I could possibly make the C code available as well. -Brandyn (brandyn at brainstorm.com)  From mikewj at signal.dra.hmg.gb Fri Aug 20 12:00:02 1993 From: mikewj at signal.dra.hmg.gb (mikewj@signal.dra.hmg.gb) Date: Fri, 20 Aug 93 17:00:02 +0100 Subject: Practical Neural Nets Conference & Workshops in UK. Message-ID: AA16188@ravel.dra.hmg.gb *********************************** NEURAL COMPUTING APPLICATIONS FORUM *********************************** ***************************************** PRACTICAL APPLICATIONS OF NEURAL NETWORKS CALL FOR PRESENTATIONS ***************************************** The Neural Computing Applications Forum is the primary meeting place for people developing Neural Network applications in industry and academia. It has 200 members in the UK and Europe, from Universities and small and large companies, and holds four main meetings each year. It has been running for three years. Presentations, tutorials, and workshops are sought on all practical aspects of Neural Computing and Pattern Recognition. Previous events have included presentations and workshops on practical issues including machine health monitoring, neural control, financial prediction, chemical structure analysis, power station load prediction, copyright law, alternative energy, automatic speech recognition, and human-computer interaction. We also hold introductory tutorials and theoretical workshops on all aspects of Neural computing. Presentations at NCAF do not require a written paper for publication. You will have the chance to draw the attention of the top industrial Neural Network practitioners to your work. conference presenters of outstanding quality will be invited to submit a paper to the Springer Verlag journal Neural Computing and Applications. 
Please contact Mike Wynne-Jones, Programme Organiser, NCAF, PO Box 62, Malvern, WR14 4NU, UK, enclosing your proposed title and a brief synopsis of your presentation. Email: mikewj at signal.dra.hmg.gb; phone +44 684 563858.  From shashem at ecn.purdue.edu Sat Aug 21 18:08:11 1993 From: shashem at ecn.purdue.edu (Sherif Hashem) Date: Sat, 21 Aug 93 17:08:11 -0500 Subject: Combining (averaging) NNs Message-ID: <9308212208.AA18678@cornsilk.ecn.purdue.edu> I have recently joined Connectionists and I read some of the email messages arguing about combining/averaging NNs. Unfortunately, I missed the earlier discussion that started this argument. I am interested in combining NNs; in fact, my Ph.D. thesis is about optimal linear combinations of NNs. Averaging a number of estimators has been suggested/debated/examined in the literature for a long time, dating as far back as 1818 (Laplace 1818). Clemen (1989) cites more than 200 papers in his review of the literature related to combining forecasts (estimators), including contributions from the forecasting, psychology, statistics, and management science literatures. Numerous empirical studies have been conducted to assess the benefits/limitations of combining estimators (Clemen 1989). Besides, there are quite a few analytical results established in the area. Most of these studies and results are in the forecasting literature (more than 100 publications in the last 20 years). I think that it is fair to say that, as long as no "absolute" best estimator can be identified, combining estimators may provide a superior alternative to picking the best from a population of estimators. I have published some of my preliminary results on the benefits of combining NNs in (Hashem and Schmeiser 1992, 1993a, and Hashem et al. 1993b), and based on my experience with combining NNs, I join Michael Perrone in advocating the use of combining NNs to enhance the estimation accuracy of NN based models. Sherif Hashem email: shashem at ecn.purdue.edu References: ----------- Clemen, R.T. (1989). Combining Forecasts: A Review and Annotated Bibliography. International Journal of Forecasting, Vol. 5, pp. 559-583. Hashem, S., Y. Yih, & B. Schmeiser (1993b). An Efficient Model for Product Allocation using Optimal Combinations of Neural Networks. In Intelligent Engineering Systems through Artificial Neural Networks, Vol. 3, C. Dagli, L. Burke, B. Fernandez, & J. Ghosh (Eds.), ASME Press, forthcoming. Hashem, S., & B. Schmeiser (1993a). Approximating a Function and its Derivatives using MSE-Optimal Linear Combinations of Trained Feedforward Neural Networks. Proceedings of the World Congress on Neural Networks, Lawrence Erlbaum Associates, New Jersey, Vol. 1, pp. 617-620. Hashem, S., & B. Schmeiser (1992). Improving Model Accuracy using Optimal Linear Combinations of Trained Neural Networks, Technical Report SMS92-16, School of Industrial Engineering, Purdue University. (Submitted) Laplace, P.S. de (1818). Deuxieme Supplement a la Theorie Analytique des Probabilites (Courcier, Paris); reprinted (1847) in Oeuvres Completes de Laplace, Vol. 7 (Paris, Gauthier-Villars), 531-580.
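As a generic illustration of the kind of combination Hashem describes (a sketch only, not his actual method or code; the held-out data split and the use of NumPy's least-squares solver are assumptions), combination weights can be fitted by minimizing squared error on data not used to train the individual networks:

import numpy as np

def fit_combination_weights(preds, targets):
    # preds: shape (n_samples, n_models), column k holding model k's
    # predictions on a held-out set; targets: shape (n_samples,).
    # Returns the least-squares weights w minimizing ||preds @ w - targets||^2.
    w, _, _, _ = np.linalg.lstsq(preds, targets, rcond=None)
    return w

def combine(preds, w):
    # The combined estimator is a fixed linear combination of the model outputs.
    return preds @ w

# Toy usage: three noisy "models" of the same underlying function.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2.0 * np.pi * x)
preds = np.column_stack([truth + rng.normal(0.0, s, x.size) for s in (0.1, 0.2, 0.4)])
w = fit_combination_weights(preds[:100], truth[:100])        # fit weights on one half
mse_combined = np.mean((combine(preds[100:], w) - truth[100:]) ** 2)
mse_best_single = min(np.mean((preds[100:, k] - truth[100:]) ** 2) for k in range(3))
print(w, mse_combined, mse_best_single)   # with roughly independent errors, the
                                          # combination typically does at least as well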
From furu at uchikawa.nuem.nagoya-u.ac.jp Mon Aug 23 11:22:41 1993 From: furu at uchikawa.nuem.nagoya-u.ac.jp (Takeshi Furuhashi) Date: Mon, 23 Aug 93 11:22:41 JST Subject: Call for Papers of WWW Message-ID: <9308230222.AA00124@cancer.uchikawa.nuem.nagoya-u.ac.jp> CALL FOR PAPERS TENTATIVE 1994 IEEE/Nagoya University World Wisemen/women Workshop (WWW) ON FUZZY LOGIC AND NEURAL NETWORKS/GENETIC ALGORITHMS -Architecture and Applications for Knowledge Acquisition/Adaptation- August 9 and 10, 1994 Nagoya University Symposion Chikusa-ku, Nagoya, JAPAN Sponsored by Nagoya University Co-sponsored by IEEE Industrial Electronics Society Technically Co-sponsored by IEEE Neural Network Council IEEE Robotics and Automation Society International Fuzzy Systems Association Japan Society for Fuzzy Theory and Systems North American Fuzzy Information Processing Society Society of Instrument and Control Engineers Robotics Society of Japan There is growing interest in combination technologies of fuzzy logic and neural networks, and of fuzzy logic and genetic algorithms, for acquisition of experts' knowledge, modeling of nonlinear systems, and realizing adaptive systems. The goal of the 1994 IEEE/Nagoya University WWW on Fuzzy Logic and Neural Networks/Genetic Algorithm is to give its attendees opportunities to exchange information and ideas on various aspects of the Combination Technologies and to stimulate and inspire pioneering work in this area. To keep the quality of these workshops high, only a limited number of people are accepted as participants of the workshops. The papers presented at the workshop will be edited and published by Oxford University Press. TOPICS: Combination of Fuzzy Logic and Neural Networks, Combination of Fuzzy Logic and Genetic Algorithm, Learning and Adaptation, Knowledge Acquisition, Modeling, Human Machine Interface IMPORTANT DATES: Submission of Abstracts of Papers : April 31, 1994 Acceptance Notification : May 31, 1994 Final Manuscript : July 1, 1994 Partial or full assistance with travel expenses for speakers of excellent papers will be provided by the WWW. The candidates should apply as soon as possible, preferably by Jan. 30, '94. All correspondence and submission of papers should be sent to Takeshi Furuhashi, General Chair Dept. of Information Electronics, Nagoya University Furo-cho, Chikusa-ku, Nagoya 464-01, JAPAN TEL: +81-52-781-5111 ext.2792 FAX: +81-52-781-9263 E-mail: furu at uchikawa.nuem.nagoya-u.ac.jp IEEE/Nagoya University WWW: IEEE/Nagoya University WWW (World Wisemen/women Workshop) is a series of workshops sponsored by Nagoya University and co-sponsored by IEEE Industrial Electronics Society. The city of Nagoya, located two hours away from Tokyo, has many electro-mechanical industries in its surroundings such as Mitsubishi, TOYOTA, and their allied companies. Nagoya is a mecca of robotics industries, machine industries and aerospace industries in Japan. The series of workshops will give its attendees opportunities to exchange information on advanced sciences and technologies and to visit industries and research institutes in this area. *This workshop will be held just after the 3rd International Conference on Fuzzy Logic, Neural Nets and Soft Computing (IIZUKA'94) from Aug. 1 to 7, '94.
WORKSHOP ORGANIZATION Honorary Chair: Tetsuo Fujimoto (Dean, School of Engineering, Nagoya University) General Chair: Takeshi Furuhashi (Nagoya University) Advisory Committee: Chair: Toshio Fukuda (Nagoya University) Fumio Harashima (University of Tokyo) Yoshiki Uchikawa (Nagoya University) Takeshi Yamakawa (Kyushu Institute of Technology) Steering Committee: H.Berenji (NASA Ames Research Center) W.Eppler (University of Karlsruhe) I.Hayashi (Hannan University) Y.Hayashi (Ibaraki University) H.Ichihashi (Osaka Prefectural University) A.Imura (Laboratory for International Fuzzy Engineering) M.Jordan (Massachusetts Institute of Technology) C.-C.Jou (National Chiao Tung University) E.Khan (National Semiconductor) R.Langari (Texas A & M University) H.Takagi (Matsushita Electric Industrial Co., Ltd.) K.Tanaka (Kanazawa University) M.Valenzuela-Rendon (Instituto Tecnologico y de Estudios Superiores de Monterrey) L.-X.Wang (University of California, Berkeley) T.Yamaguchi (Utsunomiya University) J.Yen (Texas A & M University)  From joachim at fit.qut.edu.au Wed Aug 25 21:46:11 1993 From: joachim at fit.qut.edu.au (Joachim Diederich) Date: Wed, 25 Aug 1993 21:46:11 -0400 Subject: Second Brisbane Neural Network Workshop Message-ID: <199308260146.VAA09819@fitmail.fit.qut.edu.au> Second Brisbane Neural Network Workshop --------------------------------------- Queensland University of Technology Brisbane Q 4001, AUSTRALIA Gardens Point Campus, ITE 410 24 September 1993 This Second Brisbane Neural Network Workshop is intended to bring together those interested in neurocomputing and neural network applications. The objective of the workshop is to provide a discussion platform for researchers and practitioners interested in theoretical and applied aspects of neurocomputing. The workshop should be of interest to computer scientists and engineers, as well as to biologists, cognitive scientists and others interested in the application of neural networks. The Second Brisbane Neural Network Workshop will be held at Queensland University of Technology, Gardens Point Campus (ITE 410) on September 24, 1993 from 9:00am to 6:00pm.
Program ------- 9:00-9:15 Welcome Joachim Diederich, Queensland University of Technology, Neurocomputing Research Concentration Area Cognitive Science ----------------- 9:15-10:00 Graeme Halford, University of Queensland, Department of Psychology "Representation of concepts in PDP models" 10:00-10:30 Joachim Diederich, Queensland University of Technology, Neurocomputing Research Concentration Area "Re-learning in connectionist semantic networks" 10:30-11:00 Coffee Break 11:00-11:30 James Hogan, Queensland University of Technology, Neurocomputing Research Concentration Area "Recruitment learning in randomly connected neural networks" 11:30-12:00 Kate Stevens, University of Queensland, Department of Psychology "Music perception and neural network modelling" 12:00-1:00 Lunch Break 1:00-1:30 Software Demonstration: "Animal breeding advice using neural networks" Learning -------- 1:30-2:15 Tom Downs, University of Queensland, Department of Electrical Engineering "Generalisation, structure and learning in artificial neural networks" 2:15-3:00 Ah Chung Tsoi, University of Queensland, Department of Electrical Engineering "Training algorithms for recurrent neural networks, a unified framework" 3:00-3:30 Steven Young, University of Queensland, Department of Electrical Engineering "Constructive algorithms for neural networks" 3:30-4:00 Coffee Break Pattern Recognition and Control ------------------------------- 4:00-4:30 Gerald Finn, Queensland University of Technology, Neurocomputing Research Concentration Area "Learning fuzzy rules by genetic algorithms" 4:30-5:00 Paul Hannah & Russel Stonier, University of Central Queensland, Department of Mathematics and Computing "Using a modified Kohonen associative map for function approximation with application to control" Theory and Artificial Intelligence ---------------------------------- 5:00-5:30 M. Mohammadian, X. Yu & J.D. Smith, University of Central Queensland, Department of Mathematics and Computing "From connectionist learning to an optimised fuzzy knowledge base" 5:30-6:00 Richard Bonner & Louis Sanzogni, Griffith University, School of Information Systems & Management Science "Embedded neural networks" All are welcome. Participation is free and there is no registration. Enquiries should be sent to Professor Joachim Diederich Neurocomputing Research Concentration Area School of Computing Science Queensland University of Technology GPO Box 2434 Brisbane Q 4001 Australia Phone: +61 7 864-2143 Fax: +61 7 864-1801 Email: joachim at fitmail.fit.qut.edu.au  From sims at pdesds1.scra.org Thu Aug 26 11:48:04 1993 From: sims at pdesds1.scra.org (Jim Sims) Date: Thu, 26 Aug 93 11:48:04 EDT Subject: fyi, late, but better than never Message-ID: <9308261548.AA07086@pdesds1.noname> I saw this while browsing the electronic CBD materials. Agency : NASA Deadline : 12/01/93 Title : Neurolab Reference: Commerce Business Daily, 07/06/93 BASIC RESEARCH OPPORTUNITY SOL OA SLS-4 POC Dr. Frank Sulzman tel: 202/358-2359 The National Aeronautics and Space Administration (NASA), along with its domestic (NIH, NSF) and international (CNES, CSA, DARA, ESA, NASDA) partners, is soliciting proposals for Neurolab, a Space Shuttle mission dedicated to brain and behavior research that is scheduled for launch in 1998. A more detailed description of the opportunity with specific guidelines for proposal preparation is available from the Neurolab Program Scientist, NASA Headquarters, Code UL, 300 E St., SW, Washington, DC 20546.
This NASA Announcement of Opportunity will be open for the period through December 1, 1993. (0182) SPONSOR: NASA Headquarters, Code UL/Neurolab Program Scientist, Washington, DC 20546 Attn:UL/Dr. Frank Sulzman  From PIURI at IPMEL1.POLIMI.IT Fri Aug 27 07:55:19 1993 From: PIURI at IPMEL1.POLIMI.IT (PIURI@IPMEL1.POLIMI.IT) Date: 27 Aug 1993 12:55:19 +0100 (MET) Subject: call for papers Message-ID: <01H28OFZ9KS291WC7T@icil64.cilea.it> ============================================================================= 14th IMACS WORLD CONGRESS ON COMPUTATION AND APPLIED MATHEMATICS July 11-15, 1994 Atlanta, Georgia, USA Sponsored by: IMACS - International Association for Mathematics and Computers in Simulation IFAC - International Federation for Automatic Control IFIP - International Federation for Information Processing IFORS - International Federation of Operational Research Societies IMEKO - International Measurement Confederation General Chairman: Prof. W.F. Ames Georgia Institute of Technology, Atlanta, GA, USA SESSIONS ON NEURAL NETWORKS 1. NEURAL NETWORK ARCHITECTURES AND IMPLEMENTATIONS 2. APPLICATION OF NEURAL TECHNIQUES FOR SIGNAL AND IMAGE PROCESSING >>>>>> CALL FOR PAPERS <<<<<< The IMACS World Congress on Computation and Applied Mathematics is held every three year to provide a large general forum to professionals and scientists for analyzing and discussing the fundamental advances of research in all areas of scientific computation, applied mathematics, mathematical modelling, and system simulation in and for specific disciplines, the philosophical aspects, and the impact on society and on disciplinary and interdisciplinary research. In the 14th edition, two sessions are planned on neural networks: "Neural Network Architectures and Implementations" and "Application of Neural Techniques for Signal and Image Processing". The first session will focus on all theoretical and practical aspects of architectural design and realization of neural networks: from mathematical analysis and modelling to behavioral specification, from architectural definition to structural design, from VLSI implementation to software emulation, from design simulation at any abstraction level to CAD tools for neural design, simulation and evaluation. The second session will present the concepts, the design and the use of neural solutions within the area of signal and image processing, e.g., for modelling, identification, analysis, classification, recognition, and filtering. Particular emphasis will be given to presentation of specific applications or application areas. Authors interested in the above neural sessions are invited to send a one page abstract, the title of the paper and the author's address by electronic mail, fax or postal mail to the Neural Sessions' Chairman by October 15, 1993. Authors must then submit five copies of their typed manuscript by postal mail or fax to the Neural Sessions' Chairman by November 19, 1993. Preliminary notification of acceptance/rejection will be mailed by November 30, 1993. Final acceptance/rejection will be mailed by January 31, 1994. Neural Sessions' Chairman: Prof. Vincenzo Piuri Department of Electronics and Information Politecnico di Milano piazza L. da Vinci 32 I-20133 Milano, Italy phone no. +39-2-23993606, +39-2-23993623 fax no. 
+39-2-23993411
e-mail piuri at ipmel1.polimi.it
=============================================================================

From goodman at unr.edu Thu Aug 26 12:35:53 1993
From: goodman at unr.edu (Phil Goodman)
Date: Thu, 26 Aug 93 16:35:53 GMT
Subject: NevProp 1.16 Update Available
Message-ID: <9308262335.AA24854@equinox.ccs.unr.edu>

Please consider the following update announcement:

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

NevProp 1.16 corrects a bug in the output range of symmetric sigmoids and one occurring when the number of testing cases is fewer than the number of training cases. These fixes are further described in the README.CHANGES file at the UNR anonymous ftp site, described below. The UNR anonymous ftp host is 'unssun.scs.unr.edu', and the files are in the directory 'pub/goodman/nevpropdir'.

Version 1.15 users can update in any of four ways:
a. Just re-ftp the 'nevprop1.16.shar' file and unpack and 'make' np again. (also available at the CMU machine, described below.)
b. Just re-ftp (in "binary" mode) the DOS or MAC executable binaries located in the 'dosdir' or 'macdir' subdirectories, respectively.
c. Ftp only the 'np.c' file provided, replacing your old version, then 'make'
d. Ftp only the 'np-patchfile', then issue the command 'patch < np-patchfile' to locally update np.c, then 'make' again.

New users can obtain NevProp 1.16 from the UNR anonymous ftp site as described in (a) or (b) above, or from the CMU machine:
a. Create an FTP connection from wherever you are to machine "ftp.cs.cmu.edu". The internet address of this machine is 128.2.206.173, for those who need it.
b. Log in as user "anonymous" with your own ID as password. You may see an error message that says "filenames may not have /.. in them" or something like that. Just ignore it.
c. Change remote directory to "/afs/cs/project/connect/code". NOTE: You must do this in a single operation. Some of the super directories on this path are protected against outside users.
d. At this point FTP should be able to get a listing of files in this directory with "dir" & fetch the ones you want with "get". (The exact FTP commands depend on your local FTP server.)

Version 1.2 will be released soon. A major new feature will be the option of using a cross-entropy rather than a least-squares error function.

Phil

___________________________ ___________________________ Phil Goodman,MD,MS goodman at unr.edu | __\ | _ \ | \/ || _ \ Associate Professor & CBMR Director || ||_// ||\ /||||_// Cardiovascular Studies Team Leader || | _( || \/ ||| _( ||__ ||_\\ || |||| \\ CENTER for BIOMEDICAL MODELING RESEARCH |___/ |___/ || |||| \\ University of Nevada School of Medicine Washoe Medical Center H1-166, 77 Pringle Way, Reno, NV 89520 702-328-4867 FAX:328-4111

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

From heiniw at sun1.eeb.ele.tue.nl Fri Aug 27 08:37:13 1993
From: heiniw at sun1.eeb.ele.tue.nl (Heini Withagen)
Date: Fri, 27 Aug 1993 14:37:13 +0200 (MET DST)
Subject: Neural hardware performance criteria
Message-ID: <9308271237.AA00409@sun1.eeb.ele.tue.nl>

A non-text attachment was scrubbed...
Name: not available Type: text Size: 1296 bytes Desc: not available Url : https://mailman.srv.cs.cmu.edu/mailman/private/connectionists/attachments/00000000/c47cdf08/attachment-0001.ksh From alex at brain.physics.swin.oz.au Sat Aug 28 07:05:34 1993 From: alex at brain.physics.swin.oz.au (Alex A Sergejew) Date: Sat, 28 Aug 93 21:05:34 +1000 Subject: Pan Pacific Conf on Brain Electric Topography - 1st announcement Message-ID: <9308281105.AA12138@brain.physics.swin.oz.au> FIRST ANNOUNCEMENT PAN PACIFIC CONFERENCE ON BRAIN ELECTRIC TOPOGRAPHY February 10 - 12, 1994 SYDNEY, AUSTRALIA INVITATION Brain electric and magnetic topography is an exciting emerging area which draws on the disciplines of neurophysiology, physics, signal processing, computing and cognitive neuroscience. This conference will offer a forum for the presentation of recent findings. The program will include an outstanding series of plenary lectures, as well as platform and poster presentations by active participants in the field. The conference includes two major plenary sessions. In the Plenary Session entitled "Brain Activity Topography and Cognitive Processes," the keynote speakers include Frank Duffy (Boston), Alan Gevins (San Francisco), Steven Hillyard (La Jolla), Yoshihiko Koga (Tokyo) and Paul Nunez (New Orleans). Keynote speakers for the Plenary Session entitled "Brain Rhythmic Activity and States of Consciousness," will include Walter Freeman (Berkeley), Rodolfo Llinas (New York), Shigeaki Matsuoka (Kitakyushu) and Yuzo Yamaguchi (Osaka). The plenary sessions will provide a forum for discussion of some of the most recent developments of analysis and models of electrical brain function, and findings of brain topography and cognitive processes. This conference is aimed at harnessing multidisciplinary participation and will be of interest to those working in the areas of clinical neurophysiology, cognitive neuroscience, biological signal processing, neurophysiology, neurology, neuropsychology and neuropsychiatry. CALL FOR PAPERS Papers are invited for platform and poster presentation. Platform presentations will be allocated 20 minutes (15 mins for presentation and 5 mins for questions). Abstracts of no more than 300 words are invited. The deadline for receipt of abstracts is November 10th, 1993, while notification of acceptance of abstracts will be sent on December 10th, 1993 The abstract can be sent by mail, Fax or Email to: PAN PACIFIC CONFERENCE ON BRAIN ELECTRIC TOPOGRAPHY C/- Cognitive Neuroscience Unit Westmead Hospital, Hawkesbury Road Westmead NSW 2145, Sydney AUSTRALIA Fax : +61 (2) 635 7734 Tel : +61 (2) 633 6688 Email : pan at brain.physics.swin.oz.au Authors may be invited to provide full manuscripts for publication of the proceedings in CD-ROM and book form. All authors wishing to have their papers included must supply a full manuscript at the time of the conference. GENERAL INFORMATION: Date: February 10 - 12, 1994 Venue: The conference will be held at the Hotel Intercontinental on Sydney Harbour. Climate: February is summertime in Australia and the average maximum day-time temperature in Sydney is 26 degC (78 degF). Social Programme: There will be a conference dinner on a yacht sailing Sydney Harbour on February 11th, 1994. Cost $A65 per person. Hotel Accommodation: Hotels listed offer a range of accommodation at special conference rates. Please quote the name of the conference when arranging your booking. 
Scientific Committee: Organising Committee: Prof Richard Silberstein, Melbourne (Chairman) E Gordon (Chairman) A/Prof Helen Beh, Sydney R Silberstein Dr Evian Gordon, Sydney J Restom Dr Shigeaki Matsuoka, Kitakyushu Dr Patricia Michie, Sydney Dr Ken Nagata, Akita Dr Alex Sergejew, Melbourne A/Prof James Wright, Auckland REGISTRATION: Name(Prof/Dr/Ms/Mr):__________________________________________________ Address:______________________________________________________________ ______________________________________________________________________ ______________________________________________________________________ Telephone: ______________________________ (include country/area code) Fax:______________________________ E Mail______________________________ On or before November 10th, 1993 $A380.00 After November 10th, 1993 $A400.00 Students before November 10th,1993 $A250.00 Conference Harbour Cruise Dinner $A65.00 per person number of people _____ Method of Payment: Cheque _ MasterCard _ VISA _ BankCard _ To be completed by credit card users only: Card Number _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Expiration Date __________________________ Signature __________________________ (Signature not required if registering by E-mail) Date __________________________ Cheques should be payable to "Pan Pacific Conference" (Address below) SOME SUGGESTIONS FOR HOTEL ACCOMODATION Special conference rates apply. Quote the name of the conference when booking. Prices are per double room per night SYDNEY RENAISSANCE HOTEL***** Guaranteed harbour view. 10 min walk under cover. $A170.00 30 Pitt St, Sydney NSW 2000, Australia. Ph: +61 (2) 259 7000 Fax +61 (2) 252 1999 HOTEL INTERCONTINENTAL SYDNEY***** Harbour view $A205.00 City View $A165.00 117 Macquarie Street, Sydney NSW 2000, Australia. Ph: +61 (2) 230 0200 Fax: +61 (2) 240 1240 OLD SYDNEY PARKROYAL**** 10 min walk. $A190.00 including breakfast 55 George St, Sydney NSW 2000, Australia. Ph: +61 (2) 252 0524 Fax: (2) +61 251 2093 RAMADA GRAND HOTEL, BONDI BEACH**** Complementary shuttlebus service. $A130 - $A170 including breakfast Beach Rd, Bondi Beach NSW 2026, Australia. Ph: +61 (2) 365 5666 Fax: +61 (2) 3655 330 HOTEL CRANBROOK INTERNATIONAL*** Older style, budget type accomodation overlooking Rose Bay. Free shuttlebus service and airport transfers. $A80.00 including breakfast 601 New South Head Rd, Rose Bay NSW 2020, Australia. Ph: +61 (2) 252 0524 Fax: +61 (2) 251 2093 Post registration details with your cheque to: PAN PACIFIC CONFERENCE ON ELECTRIC BRAIN TOPOGRAPHY C/- Cognitive Neuroscience Unit Westmead Hospital, Hawkesbury Road Westmead NSW 2145, Sydney AUSTRALIA  From taylor at world.std.com Sun Aug 29 22:21:27 1993 From: taylor at world.std.com (Russell R Leighton) Date: Sun, 29 Aug 1993 22:21:27 -0400 Subject: AM6 Users: release notes and bug fixes available Message-ID: <199308300221.AA27236@world.std.com> There has been an update to the am6.notes file at the AM6 ftp sites. User's not on the AM6 users mailing list should get this file and update their installation. Russ ======== REPOST OF AM6 RELEASE (long) ======== The following describes a neural network simulation environment made available free from the MITRE Corporation. The software contains a neural network simulation code generator which generates high performance ANSI C code implementations for modular backpropagation neural networks. Also included is an interface to visualization tools. 
FREE NEURAL NETWORK SIMULATOR AVAILABLE Aspirin/MIGRAINES Version 6.0 The Mitre Corporation is making available free to the public a neural network simulation environment called Aspirin/MIGRAINES. The software consists of a code generator that builds neural network simulations by reading a network description (written in a language called "Aspirin") and generates an ANSI C simulation. An interface (called "MIGRAINES") is provided to export data from the neural network to visualization tools. The previous version (Version 5.0) has over 600 registered installation sites world wide. The system has been ported to a number of platforms: Host platforms: convex_c2 /* Convex C2 */ convex_c3 /* Convex C3 */ cray_xmp /* Cray XMP */ cray_ymp /* Cray YMP */ cray_c90 /* Cray C90 */ dga_88k /* Data General Aviion w/88XXX */ ds_r3k /* Dec Station w/r3000 */ ds_alpha /* Dec Station w/alpha */ hp_parisc /* HP w/parisc */ pc_iX86_sysvr4 /* IBM pc 386/486 Unix SysVR4 */ pc_iX86_sysvr3 /* IBM pc 386/486 Interactive Unix SysVR3 */ ibm_rs6k /* IBM w/rs6000 */ news_68k /* News w/68XXX */ news_r3k /* News w/r3000 */ next_68k /* NeXT w/68XXX */ sgi_r3k /* Silicon Graphics w/r3000 */ sgi_r4k /* Silicon Graphics w/r4000 */ sun_sparc /* Sun w/sparc */ sun_68k /* Sun w/68XXX */ Coprocessors: mc_i860 /* Mercury w/i860 */ meiko_i860 /* Meiko w/i860 Computing Surface */ Included with the software are "config" files for these platforms. Porting to other platforms may be done by choosing the "closest" platform currently supported and adapting the config files. New Features ------------ - ANSI C ( ANSI C compiler required! If you do not have an ANSI C compiler, a free (and very good) compiler called gcc is available by anonymous ftp from prep.ai.mit.edu (18.71.0.38). ) Gcc is what was used to develop am6 on Suns. - Autoregressive backprop has better stability constraints (see examples: ringing and sequence), very good for sequence recognition - File reader supports "caching" so you can use HUGE data files (larger than physical/virtual memory). - The "analyze" utility which aids the analysis of hidden unit behavior (see examples: sonar and characters) - More examples - More portable system configuration for easy installation on systems without a "config" file in distribution Aspirin 6.0 ------------ The software that we are releasing now is for creating, and evaluating, feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, as well as at casual users who will wish to use the system with an absolute minimum of effort. Aspirin was originally conceived as ``a way of dealing with MIGRAINES.'' Our goal was to create an underlying system that would exist behind the graphics and provide the network modeling facilities. The system had to be flexible enough to allow research, that is, make it easy for a user to make frequent, possibly substantial, changes to network designs and learning algorithms. At the same time it had to be efficient enough to allow large ``real-world'' neural network systems to be developed. Aspirin uses a front-end parser and code generators to realize this goal. A high level declarative language has been developed to describe a network. This language was designed to make commonly used network constructs simple to describe, but to allow any network to be described. 
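As a rough illustration of the kind of ANSI C computation such a generated feed-forward simulation ultimately reduces to, here is a hand-written sketch of a single fully connected layer with a sigmoid transfer function. This is NOT actual Aspirin output: the layer sizes, names and numbers below are invented for the example, and a real generated simulation is organized quite differently.

    /* Illustrative sketch only: a fully connected feed-forward layer with a
     * sigmoid transfer function, in the spirit of the ANSI C simulations the
     * Aspirin code generator produces.  Not actual generated code; all names
     * and sizes here are invented for the example. */
    #include <math.h>
    #include <stdio.h>

    #define N_IN  3
    #define N_OUT 2

    /* out[j] = sigmoid( bias[j] + sum_i w[j][i] * in[i] ) */
    static void forward_layer(const float in[N_IN],
                              const float w[N_OUT][N_IN],
                              const float bias[N_OUT],
                              float out[N_OUT])
    {
        int i, j;
        for (j = 0; j < N_OUT; j++) {
            float net = bias[j];
            for (i = 0; i < N_IN; i++)
                net += w[j][i] * in[i];
            out[j] = 1.0f / (1.0f + (float)exp((double)-net));  /* sigmoid */
        }
    }

    int main(void)
    {
        const float in[N_IN] = { 0.2f, -0.7f, 1.0f };
        const float w[N_OUT][N_IN] = { { 0.1f, 0.4f, -0.3f },
                                       { -0.5f, 0.2f, 0.8f } };
        const float bias[N_OUT] = { 0.0f, 0.1f };
        float out[N_OUT];

        forward_layer(in, w, bias, out);
        printf("outputs: %f %f\n", out[0], out[1]);
        return 0;
    }

Compiled with any ANSI C compiler (e.g. cc example.c -lm), it simply prints the two output activations; the point is only to show the inner-product-plus-transfer-function computation that the generated code performs for each layer.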
The Aspirin file defines the type of network, the size and topology of the network, and descriptions of the network's input and output. This file may also include information such as initial values of weights and names of user-defined functions. The Aspirin language is based around the concept of a "black box". A black box is a module that (optionally) receives input and (necessarily) produces output. Black boxes are autonomous units that are used to construct neural network systems. Black boxes may be connected arbitrarily to create large, possibly heterogeneous network systems. As a simple example, pre- or post-processing stages of a neural network can be considered black boxes that do not learn. The output of the Aspirin parser is sent to the appropriate code generator that implements the desired neural network paradigm. The goal of Aspirin is to provide a common extendible front-end language and parser for different network paradigms. The publicly available software will include a backpropagation code generator that supports several variations of the backpropagation learning algorithm. For backpropagation networks and their variations, Aspirin supports a wide variety of capabilities:

1. feed-forward layered networks with arbitrary connections
2. ``skip level'' connections
3. one and two-dimensional weight tessellations
4. a few node transfer functions (as well as user defined)
5. connections to layers/inputs at arbitrary delays, also "Waibel style" time-delay neural networks
6. autoregressive nodes
7. line search and conjugate gradient optimization

The file describing a network is processed by the Aspirin parser and files containing C functions to implement that network are generated. This code can then be linked with an application which uses these routines to control the network. Optionally, a complete simulation may be automatically generated which is integrated with the MIGRAINES interface and can read data in a variety of file formats. Currently supported file formats are: Ascii Type1, Type2, Type3, Type4, Type5 (simple floating point file formats), and ProMatlab.

Examples
--------
A set of examples comes with the distribution:

xor: from Rumelhart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 330-334.
encode: from Rumelhart and McClelland, et al, "Parallel Distributed Processing, Vol 1: Foundations", MIT Press, 1986, pp. 335-339.
bayes: Approximating the optimal Bayes decision surface for a gauss-gauss problem.
detect: Detecting a sine wave in noise.
iris: The classic iris database.
characters: Learning to recognize 4 characters independent of rotation.
ring: Autoregressive network learns a decaying sinusoid impulse response.
sequence: Autoregressive network learns to recognize a short sequence of orthonormal vectors.
sonar: from Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.
spiral: from Kevin J. Lang and Michael J. Witbrock, "Learning to Tell Two Spirals Apart", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988.
ntalk: from Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168.
perf: a large network used only for performance testing.
monk: The backprop part of the monk paper. The MONK's problems were the basis of a first international comparison of learning algorithms.
The result of this comparison is summarized in "The MONK's Problems - A Performance Comparison of Different Learning Algorithms" by S.B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich, H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, and J. Zhang, which has been published as Technical Report CS-CMU-91-197, Carnegie Mellon University, Dec. 1991.
wine: From the ``UCI Repository Of Machine Learning Databases and Domain Theories'' (ics.uci.edu: pub/machine-learning-databases).

Performance of Aspirin simulations
----------------------------------
The backpropagation code generator produces simulations that run very efficiently. Aspirin simulations do best on vector machines when the networks are large, as exemplified by the Cray's performance. All simulations were done using the Unix "time" function and include all simulation overhead. The connections per second rating was calculated by multiplying the number of iterations by the total number of connections in the network and dividing by the "user" time provided by the Unix time function (a small worked example of this arithmetic appears below, after the MIGRAINES overview). Two tests were performed. In the first, the network was simply run "forward" 100,000 times and timed. In the second, the network was timed in learning mode and run until convergence. Under both tests the "user" time included the time to read in the data and initialize the network.

Sonar: This network is a two layer fully connected network with 60 inputs: 2-34-60.

  Millions of Connections per Second
  Forward:
    SparcStation1:         1
    IBM RS/6000 320:       2.8
    HP9000/720:            4.0
    Meiko i860 (40MHz):    4.4
    Mercury i860 (40MHz):  5.6
    Cray YMP:              21.9
    Cray C90:              33.2
  Forward/Backward:
    SparcStation1:         0.3
    IBM RS/6000 320:       0.8
    Meiko i860 (40MHz):    0.9
    HP9000/720:            1.1
    Mercury i860 (40MHz):  1.3
    Cray YMP:              7.6
    Cray C90:              13.5

Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.

Nettalk: This network is a two layer fully connected network with [29 x 7] inputs: 26-[15 x 8]-[29 x 7]

  Millions of Connections per Second
  Forward:
    SparcStation1:         1
    IBM RS/6000 320:       3.5
    HP9000/720:            4.5
    Mercury i860 (40MHz):  12.4
    Meiko i860 (40MHz):    12.6
    Cray YMP:              113.5
    Cray C90:              220.3
  Forward/Backward:
    SparcStation1:         0.4
    IBM RS/6000 320:       1.3
    HP9000/720:            1.7
    Meiko i860 (40MHz):    2.5
    Mercury i860 (40MHz):  3.7
    Cray YMP:              40
    Cray C90:              65.6

Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks that learn to pronounce English text" in Complex Systems, 1, 145-168.

Perf: This network was only run on a few systems. It is very large with very long vectors. The performance on this network is in some sense a peak performance for a machine. This network is a two layer fully connected network with 2000 inputs: 100-500-2000

  Millions of Connections per Second
  Forward:
    Cray YMP:  103.00
    Cray C90:  220
  Forward/Backward:
    Cray YMP:  25.46
    Cray C90:  59.3

MIGRAINES
------------
The MIGRAINES interface is a terminal based interface that allows you to open Unix pipes to data in the neural network. This replaces the NeWS1.1 graphical interface in version 4.0 of the Aspirin/MIGRAINES software. The new interface is not as simple to use as the version 4.0 interface but is much more portable and flexible. The MIGRAINES interface allows users to output neural network weight and node vectors to disk or to other Unix processes.
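Before moving on to the display tools, here is the small worked example of the connections-per-second arithmetic promised in the performance section above, using the Sonar topology. It is only a sketch: the 210-second "user" time is invented for illustration (roughly what the quoted ~1 million CPS SparcStation1 forward figure would imply), and whether bias connections were counted in the original ratings is an assumption, not something the post states.

    /* Sketch of the connections-per-second arithmetic described above.
     * The connection count ignores bias terms, and the user_seconds value
     * is invented for illustration -- both are assumptions. */
    #include <stdio.h>

    int main(void)
    {
        /* Sonar benchmark topology quoted above: 2-34-60 */
        const long n_in = 60, n_hid = 34, n_out = 2;
        const long iterations = 100000L;      /* the "forward" test above */
        const double user_seconds = 210.0;    /* hypothetical "time" output */

        long connections = n_in * n_hid + n_hid * n_out;   /* 2108 */
        double cps = (double)iterations * (double)connections / user_seconds;

        printf("connections            : %ld\n", connections);
        printf("connections per second : %.3g (%.2f million)\n",
               cps, cps / 1.0e6);
        return 0;
    }

With these invented numbers the program reports 2108 connections and a rating of about 1.00 million connections per second, which is how the figures in the lists above were derived from raw timings.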
Users can display the data using either public or commercial graphics/analysis tools. Example filters are included that convert data exported through MIGRAINES to formats readable by: - Gnuplot 3 - Matlab - Mathematica - Xgobi Most of the examples (see above) use the MIGRAINES interface to dump data to disk and display it using a public software package called Gnuplot3. Gnuplot3 can be obtained via anonymous ftp from: >>>> In general, Gnuplot 3 is available as the file gnuplot3.?.tar.Z >>>> Please obtain gnuplot from the site nearest you. Many of the major ftp >>>> archives world-wide have already picked up the latest version, so if >>>> you found the old version elsewhere, you might check there. >>>> >>>> NORTH AMERICA: >>>> >>>> Anonymous ftp to dartmouth.edu (129.170.16.4) >>>> Fetch >>>> pub/gnuplot/gnuplot3.?.tar.Z >>>> in binary mode. >>>>>>>> A special hack for NeXTStep may be found on 'sonata.cc.purdue.edu' >>>>>>>> in the directory /pub/next/submissions. The gnuplot3.0 distribution >>>>>>>> is also there (in that directory). >>>>>>>> >>>>>>>> There is a problem to be aware of--you will need to recompile. >>>>>>>> gnuplot has a minor bug, so you will need to compile the command.c >>>>>>>> file separately with the HELPFILE defined as the entire path name >>>>>>>> (including the help file name.) If you don't, the Makefile will over >>>>>>>> ride the def and help won't work (in fact it will bomb the program.) NetTools ----------- We have include a simple set of analysis tools by Simon Dennis and Steven Phillips. They are used in some of the examples to illustrate the use of the MIGRAINES interface with analysis tools. The package contains three tools for network analysis: gea - Group Error Analysis pca - Principal Components Analysis cda - Canonical Discriminants Analysis Analyze ------- "analyze" is a program inspired by Denis and Phillips' Nettools. The "analyze" program does PCA, CDA, projections, and histograms. It can read the same data file formats as are supported by "bpmake" simulations and output data in a variety of formats. Associated with this utility are shell scripts that implement data reduction and feature extraction. "analyze" can be used to understand how the hidden layers separate the data in order to optimize the network architecture. How to get Aspirin/MIGRAINES ----------------------- The software is available from two FTP sites, CMU's simulator collection and UCLA's cognitive science machines. The compressed tar file is a little less than 2 megabytes. Most of this space is taken up by the documentation and examples. The software is currently only available via anonymous FTP. > To get the software from CMU's simulator collection: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu" (128.2.254.155). 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "/afs/cs/project/connect/code". Any subdirectories of this one should also be accessible. Parent directories should not be. ****You must do this in a single operation****: cd /afs/cs/project/connect/code 4. At this point FTP should be able to get a listing of files in this directory and fetch the ones you want. Problems? - contact us at "connectionists-request at cs.cmu.edu". 5. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 6. Get the file "am6.tar.Z" 7. Get the file "am6.notes" > To get the software from UCLA's cognitive science machines: 1. 
Create an FTP connection to "ftp.cognet.ucla.edu" (128.97.8.19) (typically with the command "ftp ftp.cognet.ucla.edu") 2. Log in as user "anonymous" with password your username. 3. Change remote directory to "pub/alexis", by typing the command "cd pub/alexis" 4. Set binary mode by typing the command "binary" ** THIS IS IMPORTANT ** 5. Get the file by typing the command "get am6.tar.Z" 6. Get the file "am6.notes" Other sites ----------- If these sites do not work well for you, then try the archie internet mail server. Send email: To: archie at cs.mcgill.ca Subject: prog am6.tar.Z Archie will reply with a list of internet ftp sites that you can get the software from. How to unpack the software -------------------------- After ftp'ing the file make the directory you wish to install the software. Go to that directory and type: zcat am6.tar.Z | tar xvf - -or- uncompress am6.tar.Z ; tar xvf am6.tar How to print the manual ----------------------- The user documentation is located in ./doc in a few compressed PostScript files. To print each file on a PostScript printer type: uncompress *.Z lpr -s *.ps Why? ---- I have been asked why MITRE is giving away this software. MITRE is a non-profit organization funded by the U.S. federal government. MITRE does research and development into various technical areas. Our research into neural network algorithms and applications has resulted in this software. Since MITRE is a publically funded organization, it seems appropriate that the product of the neural network research be turned back into the technical community at large. Thanks ------ Thanks to the beta sites for helping me get the bugs out and make this portable. Thanks to the folks at CMU and UCLA for the ftp sites. Copyright and license agreement ------------------------------- Since the Aspirin/MIGRAINES system is licensed free of charge, the MITRE Corporation provides absolutely no warranty. Should the Aspirin/MIGRAINES system prove defective, you must assume the cost of all necessary servicing, repair or correction. In no way will the MITRE Corporation be liable to you for damages, including any lost profits, lost monies, or other special, incidental or consequential damages arising out of the use or in ability to use the Aspirin/MIGRAINES system. This software is the copyright of The MITRE Corporation. It may be freely used and modified for research and development purposes. We require a brief acknowledgement in any research paper or other publication where this software has made a significant contribution. If you wish to use it for commercial gain you must contact The MITRE Corporation for conditions of use. The MITRE Corporation provides absolutely NO WARRANTY for this software. Russell Leighton ^ / |\ /| INTERNET: taylor at world.std.com |-| / | | | | | / | | |  From sun at umiacs.UMD.EDU Mon Aug 30 13:11:10 1993 From: sun at umiacs.UMD.EDU (Guo-Zheng Sun) Date: Mon, 30 Aug 93 13:11:10 -0400 Subject: Preprint Message-ID: <9308301711.AA06031@sunsp2.umiacs.UMD.EDU> Reprint: THE NEURAL NETWORK PUSHDOWN AUTOMATON: MODEL, STACK AND LEARNING SIMULATIONS The following reprint is available via the NEC Research Institute ftp archive external.nj.nec.com. Instructions for retrieval from the archive follow the abstract summary. Comments and remarks are always appreciated. ----------------------------------------------------------------------------- .............................................................................. 
"The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations" G.Z. Sun(a,b), C.L. Giles(b,c), H.H. Chen(a,b), Y.C. Lee(a,b) (a) Laboratory for Plasma Research and (b) Institute for Advanced Computer Studies, U. of Maryland, College Park, MD 20742 (c) NEC Research Institute, 4 Independence Way, Princeton, NJ 08540 In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one obvious approach to enhancing the processing power of a recurrent neural network is to couple it with an external stack mem ory - in effect creating a neural network pushdown automata (NNPDA). This paper discusses in detail this NNPDA - its construction, how it can be trained and how useful symbolic information can be extracted from the trained network. In order to couple the external stack to the neural network, an optimization method is developed which uses an error function that connects the learning of the state automaton of the neural network to the learning of the operation of the external stack. To minimize the error function using gradient descent learning, an analog stack is designed such that the action and storage of information in the stack are continuous. One interpretation of a continuous stack is the probabilistic storage of and action on data. After training on sample strings of an unknown source grammar, a quantization procedure extracts from the analog stack and neural network a discrete pushdown automata (PDA). Simulations show that in learning deterministic context-free grammars - the balanced parenthesis language, 1n0n, and the deterministic Palindrome - the extracted PDA is correct in the sense that it can correctly recognize unseen strings of arbitrary length. In addition, the extracted PDAs can be shown to be identical or equivalent to the PDAs of the source grammars which were used to generate the training strings. UNIVERSITY OF MARYLAND TR NOs. UMIACS-TR-93-77 & CS-TR-3118, August 20, 1993. --------------------------------------------------------------------------- FTP INSTRUCTIONS unix> ftp external.nj.nec.com (138.15.10.100) Name: anonymous Password: (your_userid at your_site) ftp> cd pub/giles/papers ftp> binary ftp> get NNPDA.ps.Z ftp> quit unix> uncompress NNPDA.ps.Z (Please note that this is a 35 page paper.) -----------------------------------------------------------------------------  From biblio at nucleus.hut.fi Tue Aug 31 13:08:00 1993 From: biblio at nucleus.hut.fi (Bibliography) Date: Tue, 31 Aug 93 13:08:00 DST Subject: Kohonen maps & LVQ -- huge bibliography (and reference request) Message-ID: <9308311008.AA20054@nucleus.hut.fi.hut.fi> Hello, We are in the process of compiling the complete bibliography of works on Kohonen Self-Organizing Map and Learning Vector Quantization all over the world. Currently the bibliography contains more than 1000 entries. The bibliography is now available (in BibTeX and PostScript formats) by anonymous FTP from: cochlea.hut.fi:/pub/ref/references.bib.Z ( BibTeX file) cochlea.hut.fi:/pub/ref/references.ps.Z ( PostScript file) The above files are compressed. Please make sure you use "binary" mode when you transfer these files. Please send any additions and corrections to : biblio at cochlea.hut.fi Please follow the IEEE instructions of references (full names of authors, name of article, journal name, volume + number where applicable, first and last page number, year, etc.) 
and BibTeX-format, if possible. Yours, Jari Kangas Helsinki University of Technology Laboratory of Computer and Information Science Rakentajanaukio 2 C SF-02150 Espoo, FINLAND