Connectionists: [CFP] Final call for papers - KDD2015 Workshop on Learning from Small Sample Sizes
Bob Durrant
bobd at waikato.ac.nz
Thu May 21 19:35:25 EDT 2015
With apologies for cross-posting.
Please feel free to bring this workshop to the attention of any
students, postdocs or colleagues who may be interested.
========================================================================
*Call for Papers - KDD2015 Workshop on Learning from Small Sample Sizes*
https://sites.google.com/site/smallsamplesizes
Submission site: https://easychair.org/conferences/?conf=ls3
Submission deadline: 23:59 Pacific Standard Time on Friday 5th June 2015
========================================================================
*Overview*
The small sample size (or "large-p small-n") problem is a perennial one in
the world of Big Data. A frequent occurrence in medical imaging,
computer vision, omics and bioinformatics, it describes the situation
where the number of features p, in the tens of thousands or more, far
exceeds the sample size n, usually in the tens. Data mining, statistical
parameter estimation, and predictive modelling are all particularly
challenging in such a setting.
Moreover, in all fields where the large-p small-n problem is a sensitive
issue (and in many others besides), current technology is moving
towards higher resolution in sensing and recording while, in practice,
sample size is often bounded by hard limits or cost constraints.
Meanwhile, even modest improvements in performance for modelling these
information-rich, complex data promise significant cost savings or
advances in knowledge.
On the other hand, it is becoming clear that "large-p small-n" is too
broad a categorization for these problems, and that progress is still possible
in the small sample setting either (1) in the presence of side
information - such as related unlabelled data (semi-supervised
learning), related learning tasks (transfer learning), or informative
priors (domain knowledge) - to further constrain the problem, or (2)
provided that the data have low complexity, in some problem-specific sense
that we are able to take advantage of. Concrete examples of such
low complexity include: a large margin between classes (classification),
a sparse representation of data in some known linear basis (compressed
sensing), a sparse weight vector (regression), or a sparse correlation
structure (parameter estimation). However, we do not know what other
properties of data, if any, act to make it "easy" or "hard" to work with
in terms of the sample size required for some specific class of
problems. For example: anti-learnable datasets in genomics are from the
same domain as many eminently learnable datasets. Is anti-learnability
then just a problem of data quality, the result of an unlucky draw of a
small sample, or is there something deeper that makes such data
inherently difficult to work with compared to other apparently similar data?
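To make the sparse-regression example concrete, here is a minimal sketch
(assuming NumPy and scikit-learn; the numbers and variable names are purely
illustrative, not drawn from any workshop material) of an L1-penalised fit
recovering a sparse weight vector even though the sample size n is far
smaller than the dimension p:

    # Minimal sketch only: L1-penalised regression exploiting a sparse
    # weight vector when n is far smaller than p.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p, k = 50, 10000, 5                      # 50 samples, 10,000 features, 5 relevant

    X = rng.standard_normal((n, p))
    w_true = np.zeros(p)
    w_true[:k] = rng.uniform(1.0, 2.0, size=k)  # sparse ground-truth weights
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    # The L1 penalty encodes the sparsity assumption about the weight vector.
    model = Lasso(alpha=0.1).fit(X, y)
    print("non-zero coefficients recovered:", np.flatnonzero(model.coef_))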
This workshop will bring together researchers working on different kinds
of challenges where the common thread is the small sample size problem.
It will provide a forum for exchanging theoretical and empirical
knowledge of small sample problems, and for sharing insight into which
data structures facilitate progress on particular families of problems -
even with a small sample size - which do the opposite, and when these
advantages break down.
A further specific goal of this workshop is to make a start on building
links between the many disparate fields working with small data samples,
with the ultimate aim of creating a multi-disciplinary research network
devoted to this common issue.
We seek papers on all aspects of learning from small sample sizes, from
any problem domain where this issue is prevalent (e.g. bioinformatics
and omics, machine vision, anomaly detection, drug discovery, medical
imaging, multi-label classification, multi-task classification,
density-based clustering/density estimation, and others).
In particular:
*Theoretical and empirical analyses of learning from small samples:*
Which properties of data support, or prevent, learning from a small
sample?
Which forms of side information support learning from a small sample?
When do guarantees break down? In theory? In practice?
*Techniques and algorithms targeted at small sample size learning.*
Including, but not limited to:
Semi-supervised learning.
Transfer learning.
Representation learning.
Sparse methods.
Dimensionality reduction.
Application of domain knowledge/informative priors.
*Reproducible case studies.*
Please submit an extended abstract of no more than 8 pages, including
references, diagrams, and appendices, if any. The format is the standard
double-column ACM Proceedings Template, Tighter Alternate style.
Please submit your abstract in PDF format only via EasyChair at
https://easychair.org/conferences/?conf=ls3
Following KDD tradition, reviews are not blinded, so you should include
author names and affiliations in your submission. The maximum file size for
submissions is 20 MB.
*The deadline for submission is 23:59 Pacific Standard Time on Friday
5th June 2015.*
Important: Overfitting and serendipity are serious challenges to the
realistic empirical assessment of approaches applied to small data
samples. If you are submitting experimental findings, please give
enough detail in your submission to reproduce them in full. The ideal
way to ensure reproducibility is to provide code and data on the web
(including the scripts used for data preparation if the data provided are
unprepared), and we strongly encourage authors to do this.
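As an illustration of why this matters, the following sketch (assuming NumPy
and scikit-learn; the details are hypothetical and not a prescribed
evaluation protocol) shows how feature selection performed outside
cross-validation can make pure noise look learnable in a small sample,
whereas nesting the selection inside each fold reports chance-level accuracy:

    # Minimal sketch only: with pure-noise labels and p >> n, selecting
    # features on the full data set before cross-validating gives a wildly
    # optimistic accuracy estimate; nesting the selection inside each fold
    # does not.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(1)
    n, p = 40, 5000
    X = rng.standard_normal((n, p))
    y = rng.integers(0, 2, size=n)              # labels carry no real signal

    clf = KNeighborsClassifier(n_neighbors=3)

    # Leaky protocol: the selector has already seen the test folds.
    X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
    print("leaky CV accuracy: ", cross_val_score(clf, X_sel, y, cv=5).mean())

    # Proper protocol: selection is refit inside each training fold.
    pipe = make_pipeline(SelectKBest(f_classif, k=20), clf)
    print("nested CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())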
Bob Durrant, University of Waikato, Department of Statistics (Primary
Contact)
Alain C. Vandal, Auckland University of Technology, Department of
Biostatistics and Epidemiology
KDD2015 Workshop on Learning from Small Sample Sizes Organisers
--
Dr. Robert (Bob) Durrant, Senior Lecturer.
Room G.3.30,
Department of Statistics,
University of Waikato,
Private Bag 3105,
Hamilton 3240
New Zealand
e: bobd at waikato.ac.nz
w: http://www.stats.waikato.ac.nz/~bobd/
t: +64 (0)7 838 4466 x8334
f: +64 (0)7 838 4155