NN Benchmarking WWW homepage
Lutz Prechelt
prechelt at ira.uka.de
Mon Dec 11 07:11:32 EST 1995
The homepage of the very successful NIPS*95 workshop on benchmarking
has now been converted into a repository for information about
benchmarking issues: status quo, methodology, facilities, and
related information.
I kindly ask everybody who has additional information that should
be on the page (in particular sources or potential sources of
learning data of all kinds) to submit that information to me.
Other comments are also welcome.
The URL is
http://wwwipd.ira.uka.de/~prechelt/NIPS_bench.html
The page is also still reachable via the benchmarking workshop
link on the NIPS*95 homepage.
Below is a textual version of the page.
Lutz
Lutz Prechelt (http://wwwipd.ira.uka.de/~prechelt/) | Whenever you
Institut f. Programmstrukturen und Datenorganisation | complicate things,
Universitaet Karlsruhe; D-76128 Karlsruhe; Germany | they get
(Phone: +49/721/608-4068, FAX: +49/721/694092) | less simple.
===============================================
Benchmarking of learning algorithms
information repository page
Abstract: Proper benchmarking of (neural network and other)
learning architectures is a prerequisite for orderly progress in
this field. Yet many published papers show deficiencies in the
benchmarking they perform.
A workshop about NN benchmarking at NIPS*95 addressed the
status quo of benchmarking, common errors and how to avoid
them, currently existing benchmark collections, and, most
prominently, a new benchmarking facility including a results
database.
This page contains pointers to written versions or slides of most
of the talks given at the workshop plus some related material.
The page is intended to be a repository for such information to
be used as a reference by researchers in the field. Note that most
links lead to Postscript documents. Please send any additions or
corrections you might have to Lutz Prechelt
(prechelt at ira.uka.de).
Workshop Chairs:
Thomas G. Dietterich <tgd at chert.cs.orst.edu>,
Geoffrey Hinton <hinton at cs.toronto.edu>,
Wolfgang Maass <maass at igi.tu-graz.ac.at>,
Lutz Prechelt <prechelt at ira.uka.de> [communicating
chair],
Terry Sejnowski <terry at salk.edu>
Assessment of the status quo:
* Lutz Prechelt. A quantitative study of current
benchmarking practices.
A survey of 400 journal articles on NN algorithms from
1993 and 1994. Most articles used far too few problems
in their benchmarking.
* Arthur Flexer. Statistical Evaluation of Neural
Network Experiments: Minimum Requirements and
Current Practice. Argues that what is reported about
the benchmarks, and how, is insufficient.
Methodology:
* Tom Dietterich. Experimental Methodology.
Benchmarking types, correct statistical testing (a
minimal code sketch follows this list), synthetic
versus real-world data, understanding via algorithm
mutation or data mutation, data generators.
* Lutz Prechelt. Some notes on neural learning
algorithm benchmarking.
A few general remarks about volume, validity,
reproducibility, and comparability of benchmarking;
DOs and DON'Ts.
* Brian Ripley. What can we learn from the study of
the design of experiments?
(Only two slides, though).
* Brian Ripley. Statistical Ideas for Selecting Network
Architectures.
(Also somewhat related to benchmarking.)
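To illustrate the statistical-testing point in the Dietterich
entry above, here is a minimal sketch in Python (an illustration
of one common test, not material from the workshop) of a paired
t-test over the per-problem test error rates of two learning
algorithms; all error values are made-up placeholders.

  import math

  def paired_t(errors_a, errors_b):
      # t statistic for the paired differences a_i - b_i
      diffs = [a - b for a, b in zip(errors_a, errors_b)]
      n = len(diffs)
      mean = sum(diffs) / n
      # sample variance of the differences (n - 1 denominator)
      var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
      return mean / math.sqrt(var / n)

  # hypothetical test error rates on ten benchmark problems
  a = [0.12, 0.08, 0.21, 0.15, 0.09, 0.18, 0.11, 0.14, 0.16, 0.10]
  b = [0.14, 0.09, 0.20, 0.18, 0.11, 0.19, 0.13, 0.15, 0.18, 0.12]
  print("t =", paired_t(a, b))
  # compare against the t distribution with n - 1 = 9 degrees
  # of freedom before claiming a significant difference

With only a handful of problems such a test has little power,
which is one more reason to benchmark on many problems.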
Benchmarking facilities:
* Previously available NN benchmarking data
collections
CMU nnbench,
UCI machine learning databases archive,
Proben1,
StatLog data,
ELENA data.
Advantages of these: UCI is large, growing, and
popular; StatLog has the largest and most orderly collection
of results available (in a book, though); and Proben1 is
the easiest to use and best supports reproducible
experiments (a minimal code sketch of such reproducible
partitioning follows this list). ELENA and nnbench have no
particular advantages.
Disadvantages: UCI and Proben1 have too few and too
unstructured results available, Proben1 is also inflexible
and small, and StatLog is partially confidential, with
neither its data nor its results collection growing.
* Carl Rasmussen and Geoffrey Hinton. DELVE: A
thoroughly designed benchmark collection.
A proposal of data, terminology, and procedures, together
with a facility for the collection of benchmarking results.
This is the newly proposed standard for benchmarking
NN (and other) learning algorithms. DELVE is currently
still under construction at the University of Toronto.
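As announced in the Proben1 entry above, here is a minimal
sketch in Python (an assumption about how one might do this,
not actual Proben1 or DELVE code) of generating fixed,
pre-defined data partitions: with a fixed seed, every
researcher obtains identical train/validation/test splits.

  import random

  def fixed_partition(n_examples, seed=1):
      # fixed seed => the same 50/25/25 split every time
      rng = random.Random(seed)
      indices = list(range(n_examples))
      rng.shuffle(indices)
      n_train = n_examples // 2
      n_valid = n_examples // 4
      return (indices[:n_train],
              indices[n_train:n_train + n_valid],
              indices[n_train + n_valid:])

  train_idx, valid_idx, test_idx = fixed_partition(100)
  print(len(train_idx), len(valid_idx), len(test_idx))  # 50 25 25

Publishing the seed, or the resulting partitions themselves as
Proben1 does, lets others rerun an experiment on exactly the
same data splits.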
Other sources of data:
(Thanks to Nici Schraudolph <schraudo at salk.edu>)
There is a large amount of game data for the board
game Go available on the net; starting points (linked from
the WWW version of this page) include the Go game database
project and the Go game server. The database holds several
hundred thousand games of Go and could, for instance, be
used for advanced reinforcement learning projects.
Last correction: 1995/12/11
Please send additions and corrections to Lutz Prechelt,
prechelt at ira.uka.de.
To NIPS homepage.
To original homepage of this workshop.