Connectionists: Which benchmark data is trivial?
Wlodzislaw Duch
wduch at is.umk.pl
Thu Jul 5 03:06:29 EDT 2012
Dear Connectionists,
Learning methods with linear computational complexity O(nd), where n is the
number of samples and d their dimension, often give results that are better
than, or at least no worse than, those of more sophisticated and slower
algorithms. This is demonstrated for many benchmark datasets downloaded from
the UCI Machine Learning Repository.
32 out of 45 benchmark datasets were found to be trivial: either they contain
no information, or O(nd) methods reach the same accuracy as the most
sophisticated methods. Such data should not be used as the only basis for
evaluating new algorithms.
Identifying trivial datasets is important for improving the methodology of
comparing new methods in computational intelligence and machine learning. We
suggest that the results reported in this short paper are relevant as a
baseline for testing new methods, one that is somewhat harder to beat than the
majority classifier base rates.
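
To make the idea of a cheap baseline concrete, here is a minimal sketch that compares a majority classifier with a nearest-class-mean classifier. The nearest-class-mean rule is only an assumed example of a method whose training cost is O(nd); it is not necessarily one of the methods evaluated in the paper. The function names and the toy data are made up for illustration.

```python
from collections import Counter


def majority_baseline(y_train):
    """Return the most frequent label (the majority classifier); O(n)."""
    return Counter(y_train).most_common(1)[0][0]


def fit_class_means(X_train, y_train):
    """Compute one mean vector per class in a single pass: O(nd) time."""
    sums = {}
    counts = Counter()
    for x, label in zip(X_train, y_train):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        acc = sums[label]
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict_nearest_mean(means, x):
    """Assign x to the class whose mean is closest (squared Euclidean)."""
    def sqdist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda label: sqdist(means[label]))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented for this sketch (not from any benchmark).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1]]
y = ["a", "a", "a", "b", "b"]

majority = majority_baseline(y)
means = fit_class_means(X, y)
acc_majority = accuracy(y, [majority] * len(y))
acc_nearest = accuracy(y, [predict_nearest_mean(means, x) for x in X])
print(acc_majority, acc_nearest)
```

Accuracy is measured on the training data here only to keep the sketch short; a real comparison would use held-out data or cross-validation. Prediction costs O(kd) per sample for k classes, so the whole procedure stays linear in n and d when k is small.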
Duch W, Maszczyk T, Jankowski N, Make it cheap: learning with O(nd)
complexity. 2012 IEEE World Congress on Computational Intelligence,
Brisbane, Queensland, Australia, 10-15.06.2012, pp. 132-135.
Linked to: http://www.is.umk.pl/~duch/cv/papall.html
Regards, Wlodek Duch
_______________________________________________
Head, Dept of Informatics, Nicolaus Copernicus University
SCE NTU Nanyang Visiting Professor, Singapore
Google: W. Duch <http://www.google.com/search?q=W.+Duch>