CMU Learning Benchmark Database Updated
Matthew.White@cs.cmu.edu
Fri Sep 24 03:15:48 EDT 1993
The CMU Learning Benchmark Archive has been updated. As you may know, in the
past, all the data sets in this collection have been in varying formats,
requiring that code be written to parse each one. This was a waste of
everybody's time. These old data sets have been replaced with data sets in a
standardized format. Now, all benchmarks consist of a file detailing the
benchmark and another file that is either a data set (.data) or a program to
generate the appropriate data set (.c).
Data sets currently available are:

  nettalk      Pronunciation of English words.
  parity       N-input parity.
  protein      Prediction of secondary structure of proteins.
  sonar        Classification of sonar signals.
  two-spirals  Distinction of a twin spiral pattern.
  vowel        Speaker-independent recognition of vowels.
  xor          Traditional xor.
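
Since a benchmark may ship as a generator program rather than a fixed data file, here is a minimal sketch of how N-input parity patterns can be enumerated. It assumes the common convention that the target is 1 when an odd number of inputs are 1 (so the 2-input case is xor). The function name and the tuple output are illustrative only and do not reproduce the CMU data file format or the archive's own generator.

```python
from itertools import product


def parity_patterns(n):
    """Yield (inputs, target) for every n-bit input vector.

    Assumption: target is 1 when an odd number of inputs are 1.
    """
    for bits in product((0, 1), repeat=n):
        yield bits, sum(bits) % 2


# Print the 2-input case, which is the traditional xor problem.
for inputs, target in parity_patterns(2):
    print(inputs, target)
```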
Accompanying the new data file format are a file describing the format and a
C library for parsing it. In addition, the simulator (C version) for
Cascade-Correlation has been rewritten to use the new file format. Both the
parsing code and the Cascade-Correlation code are distributed as compressed
shell archives and should compile with any ANSI/ISO-compatible C compiler.
Code currently available:

  nevprop1.16.shar  A user-friendly version of quickprop.
  cascor1a.shar     The re-engineered version of the Cascade-Correlation
                    algorithm.
  parse1.shar       C code for parsing the new data set format.
Data sets and code are available via anonymous FTP. Instructions follow.
If you have difficulties with either the data sets or the programs, please
send mail to: neural-bench at cs.cmu.edu. Any comments or suggestions should
also be sent to that address. Please do not hold back questions; they are
our single best way to spot where our methods can be improved.
If you would like to submit a data set to the CMU Learning Benchmark Archive,
send email to neural-bench at cs.cmu.edu. All data sets should be in the CMU
data file format. If you have difficulty converting your data file, contact
us for assistance.
Matt White
Maintainer, CMU Learning Benchmark Archive
-------------------------------------------------------------------------------
Directions for FTPing datasets:
For people whose systems support AFS, you can access the files directly
from directory "/afs/cs.cmu.edu/project/connect/bench".
For people accessing these files via FTP:
1. Create an FTP connection from wherever you are to machine "ftp.cs.cmu.edu".
The internet address of this machine is 128.2.206.173, for those who need it.
2. Log in as user "anonymous" with your own internet address as password.
You may see an error message that says "filenames may not have /.. in them"
or something like that. Just ignore it.
3. Change remote directory to "/afs/cs/project/connect/bench". NOTE: you
must do this in a single operation, giving the full path at once, because
some of the parent directories on this path are not accessible to outside
users.
4. At this point the "dir" command in FTP should give you a listing of
files in this directory. Use get or mget to fetch the ones you want. If
you want to access a compressed file (with suffix .Z) be sure to give the
"binary" command before doing the "get". (Some versions of FTP use
different names for these operations -- consult your local system
maintainer if you have trouble with this.)
5. The directory "/afs/cs/project/connect/code" contains public-domain
programs implementing the Quickprop and Cascade-Correlation algorithms,
among other things. Access it in the same way.
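
The steps above can also be scripted. The following is a minimal sketch using Python's standard ftplib module, under the assumption that the server still accepts anonymous logins as described, which may no longer be the case. The function name fetch_benchmarks and its dest_dir and ftp_factory parameters are hypothetical conveniences, not part of the archive. Every file is transferred in binary mode, which covers the compressed (.Z) case from step 4 and does no harm to text files.

```python
import os
from ftplib import FTP

HOST = "ftp.cs.cmu.edu"                      # step 1
BENCH_DIR = "/afs/cs/project/connect/bench"  # step 3


def fetch_benchmarks(filenames, email, dest_dir=".", ftp_factory=FTP):
    """Download the named files from the benchmark directory.

    `email` is sent as the anonymous password (step 2).
    `ftp_factory` is a hypothetical hook so the function can be tried
    with a stand-in object instead of a live connection.
    Returns the local paths that were written.
    """
    ftp = ftp_factory(HOST)                  # step 1: connect
    try:
        ftp.login("anonymous", email)        # step 2: anonymous login
        ftp.cwd(BENCH_DIR)                   # step 3: full path in one command
        saved = []
        for name in filenames:               # step 4: fetch each file
            local_path = os.path.join(dest_dir, name)
            with open(local_path, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
            saved.append(local_path)
        return saved
    finally:
        ftp.quit()
```

File names should be taken from the server's own directory listing (the "dir" command in step 4); the announcement does not spell out exact file names. To fetch from the code directory in step 5, the same sketch applies with BENCH_DIR set to "/afs/cs/project/connect/code".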