CMU Learning Benchmark Database Updated

Fri Sep 24 03:15:48 EDT 1993

The CMU Learning Benchmark Archive has been updated.  In the past, the data 
sets in this collection came in varying formats, so code had to be written 
to parse each one.  This was a waste of everybody's time.  The old data sets 
have been replaced with data sets in a standardized format.  Each benchmark 
now consists of a file describing the benchmark and a second file that is 
either a data set (.data) or a program that generates the data set (.c). 

Data sets currently available are: 
	nettalk	Pronunciation of English words. 
	parity		N-input parity. 
	protein	Prediction of secondary structure of proteins. 
	sonar		Classification of sonar signals. 
	two-spirals	Discrimination between two intertwined spirals. 
	vowel		Speaker-independent recognition of vowels. 
	xor		Traditional xor. 
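
As an illustration of what the parity and xor entries denote, here is a 
minimal sketch in Python that enumerates the patterns of an N-input parity 
problem.  The function name, the odd-parity convention (target 1 when an odd 
number of inputs are 1), and the printed layout are assumptions made for this 
example; it is not the archive's generator program and does not emit the CMU 
data file format. 

```python
from itertools import product


def parity_patterns(n):
    """Return (inputs, target) pairs for n-input parity.

    Assumption: the target is 1 when an odd number of inputs are 1.
    """
    return [(bits, sum(bits) % 2) for bits in product((0, 1), repeat=n)]


# With n = 2 the problem reduces to the traditional xor.
for bits, target in parity_patterns(2):
    print(*bits, "=>", target)
```

Under that convention, the two-input case produces the four xor patterns, 
and each additional input doubles the number of patterns. 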

Accompanying the new data file format are a file describing the format and a 
C library for parsing it.  In addition, the C version of the simulator for 
Cascade-Correlation has been rewritten to use the new file format.  Both the 
parsing code and the Cascade-Correlation code are distributed as compressed 
shell archives and should compile with any ANSI/ISO-compatible C compiler. 

Code currently available: 
	nevprop1.16.shar	A user-friendly version of Quickprop. 
	cascor1a.shar		The re-engineered version of the 
				Cascade-Correlation algorithm. 
	parse1.shar		C code for parsing the new data set 
				format. 

Data sets and code are available via anonymous FTP.  Instructions follow. 

If you have difficulties with either the data sets or the programs, please 
send mail to: neural-bench at cs.cmu.edu.  Any comments or suggestions should 
also be sent to that address.  Please do not hold back questions; they are 
our single best way to find places where our methods can be improved. 

If you would like to submit a data set to the CMU Learning Benchmark Archive, 
send email to neural-bench at cs.cmu.edu.  All data sets should be in the CMU 
data file format.  If you have difficulty converting your data file, contact 
us for assistance. 

Matt White 
Maintainer, CMU Learning Benchmark Archive 

-------------------------------------------------------------------------------

Directions for FTPing datasets: 

If your system supports AFS, you can access the files directly from the 
directory "/afs/cs.cmu.edu/project/connect/bench". 

For people accessing these files via FTP: 

1. Create an FTP connection from wherever you are to machine "ftp.cs.cmu.edu". 
The internet address of this machine is 128.2.206.173, for those who need it. 

2. Log in as user "anonymous" with your own internet address as password. 
You may see an error message that says "filenames may not have /.. in them" 
or something like that.  Just ignore it. 

3. Change remote directory to "/afs/cs/project/connect/bench".  NOTE: you 
must do this in a single operation, giving the full path in one "cd" 
command.  Some of the parent directories on this path are not accessible 
to outside users, so stepping through them one at a time will fail. 

4. At this point the "dir" command in FTP should give you a listing of 
files in this directory.  Use get or mget to fetch the ones you want.  If 
you want to access a compressed file (with suffix .Z) be sure to give the 
"binary" command before doing the "get".  (Some versions of FTP use 
different names for these operations -- consult your local system 
maintainer if you have trouble with this.) 

5. The directory "/afs/cs/project/connect/code" contains public-domain 
programs implementing the Quickprop and Cascade-Correlation algorithms, 
among other things.  Access it in the same way.
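
For those who prefer to script the transfer, the steps above can be sketched 
with Python's standard ftplib module.  This is only an illustration under 
stated assumptions: the function names fetch_files and fetch_from_archive are 
invented for this example, the file names passed in are whatever the 
directory listing actually shows, and the sketch assumes the server accepts 
anonymous logins as described.  The host and directory are the ones given in 
steps 1 and 3. 

```python
import os
from ftplib import FTP

HOST = "ftp.cs.cmu.edu"                      # step 1
BENCH_DIR = "/afs/cs/project/connect/bench"  # step 3


def fetch_files(ftp, remote_dir, filenames, password, dest_dir="."):
    """Run steps 2-4 on an already-connected FTP object.

    Returns the directory listing so the caller can see what is available.
    """
    ftp.login("anonymous", password)   # step 2: anonymous login
    ftp.cwd(remote_dir)                # step 3: one cwd with the full path
    listing = ftp.nlst()               # step 4: names in the directory
    for name in filenames:
        local_path = os.path.join(dest_dir, name)
        with open(local_path, "wb") as out:
            # retrbinary transfers in binary mode, which .Z files need
            ftp.retrbinary("RETR " + name, out.write)
    return listing


def fetch_from_archive(filenames, password, remote_dir=BENCH_DIR):
    """Hypothetical convenience wrapper: connect, fetch, disconnect."""
    with FTP(HOST) as ftp:
        return fetch_files(ftp, remote_dir, filenames, password)
```

Calling fetch_from_archive with an empty list of file names and your own 
address as the password returns just the listing, which is a reasonable 
first step before choosing files to download.  Passing 
"/afs/cs/project/connect/code" as remote_dir reaches the code directory 
from step 5 in the same way. 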