Connectionists: Results of the ALvsPK challenge
Isabelle Guyon
isabelle at clopinet.com
Tue Aug 28 16:06:57 EDT 2007
Results of the Agnostic Learning vs. Prior Knowledge Challenge
-----------------------------------------------------------------
For the first few months of the challenge, AL led over PK, showing that
the development of good AL classifiers is considerably faster. As of
March 1st 2007, PK was leading over AL on four out of five datasets. We
extended the challenge by five more months, but the best performances did
not significantly improve during that time period. On datasets not
requiring real expert domain knowledge (ADA, GINA, SYLVA), the
participants entering both tracks obtained better results in the PK
track, using a special-purpose coding of the inputs and/or the outputs,
exploiting the knowledge of which features were uninformative, and using
"shared weights" for redundant features. For two datasets (HIVA
and NOVA) the raw data was not in a feature representation and required
some domain knowledge to preprocess the data. The winning data
representations consist of low-level features ("molecular fingerprints"
and "bag of words"). From the analysis of this
challenge, we conclude that agnostic learning methods are very powerful.
They quickly yield (in 40 to 60 days) performances that are near
the best achievable. General-purpose techniques for
exploiting prior knowledge in the encoding of inputs or outputs or the
design of the learning machine architecture (e.g. via shared weights)
may provide an additional performance boost, but exploiting real domain
knowledge is both hard and time-consuming. The net result of using
domain knowledge rather than using low-level features and relying on
agnostic learning may actually be to worsen results, as experienced by
some entrants. This fact seems to be a recurrent theme in machine
learning publications and the results of our challenge confirm it.
Future work includes incorporating the best identified methods in our
challenge toolkit CLOP <http://www.agnostic.inf.ethz.ch/models.php>. The
challenge web site <http://www.agnostic.inf.ethz.ch/> remains open for
post-challenge submissions.
For more details on the analysis, see:
http://clopinet.com/isabelle/Projects/agnostic/Results.html.