Workshop: Learning from Imbalanced Data Sets

Nathalie Japkowicz nat at
Tue Feb 11 19:46:59 EST 2003


		    ICML-KDD'2003 Workshop:

     	     Learning from Imbalanced Data Sets II

		 Thursday, August 21, 2003

    		       Washington, DC



  Nitesh Chawla, Business Analytic Solutions, CIBC (chawla at
  Nathalie Japkowicz, University of Ottawa  (nat at
  Aleksander Kolcz, America Online, Inc.   (ark at


Workshop Page:

Workshop Description:


Recent years have brought increased interest in applying machine learning
techniques to difficult "real-world" problems, many of which are
characterized by imbalanced learning data, where at least one class is
under-represented relative to others. Examples include (but are not
limited to): fraud/intrusion detection, risk management, medical
diagnosis/monitoring, bioinformatics, text categorization and
personalization of information. The problem of imbalanced data is often
associated with asymmetric costs of misclassifying elements of different
classes. Additionally, the distribution of the test data may differ from
that of the learning sample and the true misclassification costs may be
unknown at learning time.

The AAAI-2000 Workshop on "Learning from Imbalanced Data Sets" provided
the first venue where this important problem was explicitly addressed and
was received with much interest. The related ICML-2000 Workshop on
"Cost-Sensitive Learning"  provided another venue for addressing the
problem of asymmetric costs of different classes and features.  Although
much awareness of the issues related to data imbalance has been raised,
many of the key problems still remain open and are in fact encountered
more often, especially when learning is applied to massive datasets. We believe that
it would be of value to the machine learning community to not only examine
the progress achieved in this area over the last three years but also
discuss the current school of thought on research in learning from
imbalanced datasets. Based on our understanding of the class imbalance problem,
we propose the following topics of discussion (the list is not exhaustive):

* sampling (under-, over-, progressive, active)
* post-processing of learned models
* accounting for class imbalance via inductive bias
* one-sided learning
* handling uncertainty of target distribution and misclassification costs
* handling varying amounts (class dependent) of label noise

Proposed Format:

The workshop will open with an invited talk by Foster Provost that will
introduce and overview the topic. Presentations will then be organized
into several sessions corresponding roughly to the categories
identified above. The workshop will conclude with a discussion during
which a distinguished guest will comment on the presentations of the day,
and open the floor for general discussion.

Proposed Length:

One day, during which each panel will be allocated 1 to 2 hours, depending
on the number of contributions and the expected length of the discussion.

Workshop Notes:

The accepted papers will be available electronically from the workshop
website, and also as printed workshop notes to the attendees.


Authors are invited to submit papers on the topics outlined above or
on other related issues. Submissions should not exceed 8 pages, and
should be in line with the ICML style sheet.  Electronic submissions,
in PDF format, are preferred and should be sent to:

	Nitesh Chawla at chawla at

If electronic submissions are inconvenient, please send four hard copies
of your submission to:

		Dr. Nitesh Chawla
	Business Analytic Solutions, TBRM,
		CIBC, BCE Place,
	  161 Bay Street, 11th Floor,
           Toronto, Ontario M5J 2S8,



* Submission deadline: May 1, 2003
* Notification date: May 25, 2003
* Final date for camera-ready copies to organizers: June 8, 2003


Invited Speakers:

  Foster Provost 	New York University, USA

  Others 		To Be Announced


Program Committee:

  Kevin Bowyer 		 University of Notre Dame, USA
  Chris Drummond	 National Research Council, Canada
  Charles Elkan 	 University of California San Diego, USA
  Marko Grobelnik  	 Jozef Stefan Institute, Slovenia
  Larry Hall 		 University of South Florida, USA
  Robert Holte 		 University of Alberta, Canada
  W. Philip Kegelmeyer 	 Sandia National Labs, USA
  Miroslav Kubat 	 University of Miami, USA
  Aleksandar Lazarevic   University of Minnesota, USA
  Charles Ling 	 	 University of Western Ontario, Canada
  Dragos Margineantu 	 Boeing Corporation, USA
  Foster Provost 	 New York University, USA
  Gary Weiss 		 AT&T Labs, USA

Nathalie Japkowicz, Ph.D. 	Office: SITE Building 5-029
Assistant Professor  		Phone: (613) 562-5800 x6693
School of Information		E-mail:nat at
Technology & Engineering	WWW:
University of Ottawa 		FAX: (613) 562-5664

Street Address: 800 King Edward Avenue, P.O. Box 450 Stn. A
                Ottawa,	Ontario, Canada K1N 6N5
