NN Benchmarking WWW homepage
Lutz Prechelt
prechelt at ira.uka.de
Mon Dec 11 07:11:32 EST 1995
The homepage of the very successful NIPS*95 workshop on benchmarking
has now been converted into a repository for information about
benchmarking issues: status quo, methodology, facilities, and
related information.
I kindly ask everybody who has additional information that should
be on the page (in particular sources or potential sources of
learning data of all kinds) to submit that information to me.
Other comments are also welcome.
The URL is
http://wwwipd.ira.uka.de/~prechelt/NIPS_bench.html
The page is also still reachable via the benchmarking workshop
link on the NIPS*95 homepage.
Below is a textual version of the page.
Lutz
Lutz Prechelt (http://wwwipd.ira.uka.de/~prechelt/) | Whenever you
Institut f. Programmstrukturen und Datenorganisation | complicate things,
Universitaet Karlsruhe; D-76128 Karlsruhe; Germany | they get
(Phone: +49/721/608-4068, FAX: +49/721/694092) | less simple.
===============================================
Benchmarking of learning algorithms
information repository page
Abstract: Proper benchmarking of (neural network and other)
learning architectures is a prerequisite for orderly progress in
this field. Yet many published papers show deficiencies in the
benchmarking they perform.
A workshop about NN benchmarking at NIPS*95 addressed the
status quo of benchmarking, common errors and how to avoid
them, currently existing benchmark collections, and, most
prominently, a new benchmarking facility including a results
database.
This page contains pointers to written versions or slides of most
of the talks given at the workshop plus some related material.
The page is intended to be a repository for such information to
be used as a reference by researchers in the field. Note that most
links lead to Postscript documents. Please send any additions or
corrections you might have to Lutz Prechelt
(prechelt at ira.uka.de).
Workshop Chairs:
Thomas G. Dietterich <tgd at chert.cs.orst.edu>,
Geoffrey Hinton <hinton at cs.toronto.edu>,
Wolfgang Maass <maass at igi.tu-graz.ac.at>,
Lutz Prechelt <prechelt at ira.uka.de> [communicating
chair],
Terry Sejnowski <terry at salk.edu>
Assessment of the status quo:
* Lutz Prechelt. A quantitative study of current
benchmarking practices.
A survey of 400 journal articles on NN algorithms from
1993 and 1994. Most articles used far too few problems
in their benchmarking.
* Arthur Flexer. Statistical Evaluation of Neural
Network Experiments: Minimum Requirements and
Current Practice. Argues that what is reported about
the benchmarks, and how, is insufficient.
Methodology:
* Tom Dietterich. Experimental Methodology.
Benchmarking types, correct statistical testing (a
minimal code sketch follows this list), synthetic
versus real-world data, understanding via algorithm
mutation or data mutation, data generators.
* Lutz Prechelt. Some notes on neural learning
algorithm benchmarking.
A few general remarks about volume, validity,
reproducibility, and comparability of benchmarking;
DOs and DON'Ts.
* Brian Ripley. What can we learn from the study of
the design of experiments?
(Only two slides, though).
* Brian Ripley. Statistical Ideas for Selecting Network
Architectures.
(Also somewhat related to benchmarking.)
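To illustrate the statistical-testing point in the Dietterich
entry above, here is a minimal sketch in Python (an illustration
of one common test, not material from the workshop) of a paired
t-test over the per-problem test error rates of two learning
algorithms; all error values are made-up placeholders.

  import math

  def paired_t(errors_a, errors_b):
      # t statistic for the paired differences a_i - b_i
      diffs = [a - b for a, b in zip(errors_a, errors_b)]
      n = len(diffs)
      mean = sum(diffs) / n
      # sample variance of the differences (n - 1 denominator)
      var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
      return mean / math.sqrt(var / n)

  # hypothetical test error rates on ten benchmark problems
  a = [0.12, 0.08, 0.21, 0.15, 0.09, 0.18, 0.11, 0.14, 0.16, 0.10]
  b = [0.14, 0.09, 0.20, 0.18, 0.11, 0.19, 0.13, 0.15, 0.18, 0.12]
  print("t =", paired_t(a, b))
  # compare against the t distribution with n - 1 = 9 degrees
  # of freedom before claiming a significant difference

With only a handful of problems such a test has little power,
which is one more reason to benchmark on many problems.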
Benchmarking facilities:
* Previously available NN benchmarking data
collections
CMU nnbench,
UCI machine learning databases archive,
Proben1,
StatLog data,
ELENA data.
Advantages of these: UCI is large, growing, and
popular; StatLog has the largest and most orderly collection
of results available (in a book, though); and Proben1 is
the easiest to use and best supports reproducible
experiments (a minimal code sketch of such reproducible
partitioning follows this list). ELENA and nnbench have no
particular advantages.
Disadvantages: UCI and Proben1 have too few and too
unstructured results available, Proben1 is also inflexible
and small, and StatLog is partially confidential, with
neither its data nor its results collection growing.
* Carl Rasmussen and Geoffrey Hinton. DELVE: A
thoroughly designed benchmark collection.
A proposal of data, terminology, and procedures, together
with a facility for the collection of benchmarking results.
This is the newly proposed standard for benchmarking
NN (and other) learning algorithms. DELVE is currently
still under construction at the University of Toronto.
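As announced in the Proben1 entry above, here is a minimal
sketch in Python (an assumption about how one might do this,
not actual Proben1 or DELVE code) of generating fixed,
pre-defined data partitions: with a fixed seed, every
researcher obtains identical train/validation/test splits.

  import random

  def fixed_partition(n_examples, seed=1):
      # fixed seed => the same 50/25/25 split every time
      rng = random.Random(seed)
      indices = list(range(n_examples))
      rng.shuffle(indices)
      n_train = n_examples // 2
      n_valid = n_examples // 4
      return (indices[:n_train],
              indices[n_train:n_train + n_valid],
              indices[n_train + n_valid:])

  train_idx, valid_idx, test_idx = fixed_partition(100)
  print(len(train_idx), len(valid_idx), len(test_idx))  # 50 25 25

Publishing the seed, or the resulting partitions themselves as
Proben1 does, lets others rerun an experiment on exactly the
same data splits.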
Other sources of data:
(Thanks to Nici Schraudolph <schraudo at salk.edu>)
There is a large amount of game data for the board
game Go available on the net; starting points (linked from
the WWW version of this page) include the Go game database
project and the Go game server. The database holds several
hundred thousand games of Go and could, for instance, be
used for advanced reinforcement learning projects.
Last correction: 1995/12/11
Please send additions and corrections to Lutz Prechelt,
prechelt at ira.uka.de.
To NIPS homepage.
To original homepage of this workshop.