[AI Seminar] AI Seminar sponsored by Apple -- Xiaolong Wang -- October 17
Adams Wei Yu
weiyu at cs.cmu.edu
Sat Oct 14 08:32:34 EDT 2017
Dear faculty and students,
We look forward to seeing you next Tuesday, October 17, at noon in NSH 3305
for the AI Seminar sponsored by Apple. To learn more about the seminar series,
please visit the AI Seminar webpage <http://www.cs.cmu.edu/~aiseminar/>.
On Tuesday, Xiaolong Wang <http://www.cs.cmu.edu/~xiaolonw/> will give the
following talk.
Title: Learning Visual Representations for Object Detection
Object detection is at the center of many applications in computer vision. The
current pipeline for training object detectors includes ConvNet pre-training
and fine-tuning. In this talk, I am going to cover our work on
self-supervised/unsupervised ConvNet pre-training as well as optimization
strategies for fine-tuning.
For ConvNet pre-training, instead of using millions of labeled images, we
explored learning visual representations using supervision from the data
itself without any human labels, i.e., self-supervised learning.
Specifically, we proposed to exploit different self-supervised approaches
to learn representations invariant to (i) inter-instance variations (two
objects in the same class should have similar features) and (ii)
intra-instance variations (viewpoint, pose, deformations, illumination).
Instead of combining the two approaches with multi-task learning, we organized
the data with multiple variations in a graph and applied simple transitive
rules to generate pairs of images with richer visual invariance for
training. This approach brings object detection accuracy on the MSCOCO
dataset to within 1% of methods that use large amounts of labeled data.
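
As a rough illustration of the transitive rules mentioned above, here is a
minimal Python sketch. The abstract does not spell out the rules, so the
specific rule used here is an assumption: if images A and B are linked by an
inter-instance edge, and each is linked to another view by an intra-instance
edge, then the cross pairs are also treated as similar. The function name
transitive_pairs and the edge-list format are hypothetical.

```python
from itertools import product

def transitive_pairs(inter_edges, intra_edges):
    """Derive extra similar pairs from one inter-instance edge plus intra-instance edges.

    Assumed rule (not confirmed by the abstract): if A~B across instances and
    A~A2, B~B2 within instances, then (A, B2), (A2, B), (A2, B2) are similar too.
    """
    # Map each image to the images directly linked to it by an intra-instance
    # edge (e.g. another view of the same object), including itself.
    same_instance = {}
    for a, b in intra_edges:
        same_instance.setdefault(a, {a}).add(b)
        same_instance.setdefault(b, {b}).add(a)

    # Pairs already present in the graph are not reported again.
    known = {frozenset(e) for e in inter_edges} | {frozenset(e) for e in intra_edges}

    derived = set()
    for a, b in inter_edges:
        views_a = same_instance.get(a, {a})
        views_b = same_instance.get(b, {b})
        for x, y in product(views_a, views_b):
            pair = frozenset((x, y))
            if len(pair) == 2 and pair not in known:
                derived.add(pair)
    return sorted(tuple(sorted(p)) for p in derived)

# Toy graph: A and B look alike (inter-instance); A2 and B2 are other views.
new_pairs = transitive_pairs([("A", "B")], [("A", "A2"), ("B", "B2")])
print(new_pairs)
```

Only direct intra-instance edges are followed here; a fuller version might
take the transitive closure within each instance first.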
For object detection fine-tuning, we proposed to train object detectors
invariant to occlusions and deformations. The common solution is to use a
data-driven strategy -- collect large-scale datasets which have object
instances under different conditions. However, like categories, occlusions
and object deformations also follow a long-tail distribution. Some occlusions and
deformations are so rare that they hardly happen; yet we want to learn a
model invariant to such occurrences. In this talk, we propose to learn an
adversarial network that generates examples with occlusions and
deformations. The goal of the adversary is to generate examples that are
difficult for the object detector to classify. In our framework, both the
original detector and the adversary are learned jointly. We show
significant improvements on different datasets (VOC, COCO) with different
network architectures (AlexNet, VGG16, ResNet101).
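
To make the adversary's goal concrete, the following Python sketch searches
for the occlusion that is hardest for a detector. It is a simplification and
not the method from the talk: the learned adversarial network is replaced by
a brute-force search over square windows, and hardest_occlusion, toy_loss,
and the list-of-lists feature map are hypothetical stand-ins.

```python
def hardest_occlusion(features, loss_fn, window=2):
    """Brute-force stand-in for an adversary.

    Zeroes out each window-by-window patch of the feature map in turn and
    returns the patch whose removal gives the highest detector loss.
    """
    rows, cols = len(features), len(features[0])
    best_loss, best_masked, best_corner = float("-inf"), None, None
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            # Copy the feature map and occlude one patch.
            masked = [row[:] for row in features]
            for i in range(r, r + window):
                for j in range(c, c + window):
                    masked[i][j] = 0.0
            loss = loss_fn(masked)
            if loss > best_loss:
                best_loss, best_masked, best_corner = loss, masked, (r, c)
    return best_corner, best_masked, best_loss

def toy_loss(features):
    # Hypothetical detector loss: less total activation means a weaker detection.
    return -sum(sum(row) for row in features)

feature_map = [
    [1.0, 1.0, 1.0],
    [1.0, 5.0, 5.0],
    [1.0, 5.0, 5.0],
]
corner, occluded, loss = hardest_occlusion(feature_map, toy_loss)
print(corner, loss)
```

In a joint training setup, the occluded features would be fed back to the
detector as hard examples, while a learned adversary would replace the
exhaustive search.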