Send us your data

Lutz Prechelt prechelt at ira.uka.de
Tue Sep 20 10:51:46 EDT 1994


> We are planning to create a database of tasks for evaluating supervised neural
> network learning procedures (both classification and regression).  The main

You may be interested to know that I started a similar project
earlier this year; it will be finished within a few weeks at most.

My benchmark collection, called Proben1, contains 45 datasets for
15 different learning problems from 12 different domains.
All but one of these problems stem from the UCI machine learning
databases archive.

I chose an approach that differs from yours in a few points:
- It includes small datasets, too.
- I use a smaller part of each dataset as the test set (25%),
  but use three different partitionings instead.
- Every data partitioning also includes an exactly specified
  validation set (if none is needed, those examples simply
  become part of the training set).
- It includes problems with nominal attributes.
- It includes problems with missing values.
- It uses a canonical input and output representation (range 0...1).
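
To illustrate the partitioning and scaling points above, here is a minimal
Python sketch. It is not the actual Proben1 procedure. The function names,
the 25% validation fraction, and the fixed random seed are assumptions made
for this example only; the list above specifies just the 25% test share, the
three partitionings, and the 0...1 range.

```python
import random


def make_partitionings(examples, n_partitionings=3, test_frac=0.25,
                       valid_frac=0.25, use_validation=True, seed=0):
    """Return a list of (train, valid, test) triples.

    Each triple comes from an independent shuffle of `examples`.
    valid_frac and seed are illustrative assumptions, not values
    taken from the original description.
    """
    rng = random.Random(seed)
    n = len(examples)
    n_test = round(n * test_frac)
    n_valid = round(n * valid_frac)
    partitionings = []
    for _ in range(n_partitionings):
        order = list(examples)
        rng.shuffle(order)
        train = order[:n - n_test - n_valid]
        valid = order[n - n_test - n_valid:n - n_test]
        test = order[n - n_test:]
        if not use_validation:
            # No validation set needed: fold it into the training set.
            train, valid = train + valid, []
        partitionings.append((train, valid, test))
    return partitionings


def scale_to_unit_range(column):
    """Linearly rescale a list of numbers to the range 0...1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: map every value to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]
```

Fixing the seed is what makes each partitioning "exactly specified": anyone
rerunning the sketch gets the same three splits.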

Nevertheless, you may want to have a look at my collection.
I believe it would be good if you used, for instance, the
same (very simple) file format for the data in your collection,
so that researchers can read the data from both collections with
the same input procedure.

My collection will be made available by anonymous ftp from the
neural bench archive at CMU.
The technical report describing it will be announced on
this mailing list and will be available from neuroprose.

[ Geoffrey, I'll send the draft version of my report to you by personal
  mail. ]

  Lutz

Lutz Prechelt   (email: prechelt at ira.uka.de)            | Whenever you 
Institut fuer Programmstrukturen und Datenorganisation  | complicate things,
Universitaet Karlsruhe;  76128 Karlsruhe;  Germany      | they get
(Voice: ++49/721/608-4068, FAX: ++49/721/694092)        | less simple.


