Connectionists: Which benchmark data is trivial?

Wlodzislaw Duch wduch at is.umk.pl
Thu Jul 5 03:06:29 EDT 2012


Dear Connectionists,

 

Learning methods with linear computational complexity O(nd), where n is the
number of samples and d their dimension, often give results that are better
than, or at least no worse than, those of more sophisticated and slower
algorithms. This is demonstrated on many benchmark datasets downloaded from
the UCI Machine Learning Repository.

 

32 out of 45 benchmark datasets were found to be trivial: either they
contain no information, or O(nd) methods reach the same accuracy as the most
sophisticated methods. Such data should not be used as the only basis for
evaluating new algorithms.

 

Identifying trivial datasets is important for improving the methodology of
comparing new methods in computational intelligence and machine learning. We
suggest that the results reported in this short paper are relevant as a
baseline for testing new methods, one that is a bit harder to beat than the
majority-classifier base rates.
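
To make the two kinds of baseline concrete, here is a minimal sketch in
Python. It is an illustration only: this message does not say which O(nd)
methods the paper uses, so the nearest-centroid classifier below is an
assumed example of a method whose training is a single pass over the data,
and the toy data and all function names are hypothetical.

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(y == majority for y in test_labels) / len(test_labels)


def fit_centroids(X, y):
    """Per-class mean vectors; one pass over n samples of dimension d."""
    sums = {}
    counts = Counter()
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        acc = sums[label]
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign row to the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist)


def accuracy(centroids, X, y):
    """Fraction of rows in X whose predicted class matches y."""
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


# Hypothetical toy data, not taken from the paper or from UCI.
X_train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1]]
y_train = ["a", "a", "a", "b", "b"]
X_test = [[0.1, 0.1], [1.0, 1.0], [0.0, 0.3], [0.8, 0.9]]
y_test = ["a", "b", "a", "b"]

centroids = fit_centroids(X_train, y_train)
base_acc = majority_baseline(y_train, y_test)
cent_acc = accuracy(centroids, X_test, y_test)
print("majority baseline:", base_acc)
print("nearest centroid: ", cent_acc)
```

Training touches each of the n*d feature values once; prediction costs
O(kd) per sample for k classes, so the whole procedure stays linear in n and
d when k is small. A new method that cannot beat a cheap baseline of this
kind on a given dataset has not shown much on that dataset.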

 

Duch W, Maszczyk T, Jankowski N. Make it cheap: learning with O(nd)
complexity. 2012 IEEE World Congress on Computational Intelligence,
Brisbane, Queensland, Australia, 10-15 June 2012, pp. 132-135.

Linked to: http://www.is.umk.pl/~duch/cv/papall.html 

 

 

Regards, Wlodek Duch
_______________________________________________

Head, Dept of Informatics, Nicolaus Copernicus University

SCE NTU Nanyang Visiting Professor, Singapore

 <http://www.google.com/search?q=W.+Duch> Google: W. Duch 

 
