Fwd: Reminder - Thesis Proposal - 5/7/15 - Active Search and Bandit Methods for Complex Actions and Rewards - Yifei Ma

Jeff Schneider schneide at cs.cmu.edu
Thu May 7 09:13:05 EDT 2015


Hi Everyone,

Please come and listen to Yifei's thesis proposal this morning at 10am!

Jeff.



-------- Forwarded Message --------
Subject: 	Reminder - Thesis Proposal - 5/7/15 - Active Search and Bandit Methods
for Complex Actions and Rewards - Yifei Ma
Date: 	Wed, 06 May 2015 22:57:58 -0400
From: 	Diane Stidle <diane at cs.cmu.edu>
To: 	ML-SEMINAR at cs.cmu.edu, rpa at seas.harvard.edu, garnett at wustl.edu



Thesis Proposal

Date: May 7, 2015
Time: 10:00am
Place: 8102 GHC
PhD Candidate: Yifei Ma

Title: Active Search and Bandit Methods for Complex Actions and Rewards

Abstract:
Multi-arm bandit problems study how to maximize cumulative rewards by adaptively
choosing arms and learning from their outcomes. Many algorithms have been
developed with "no regret" guarantees in the long run, ensuring that the
average cumulative reward converges to the expected reward of the single best
policy. Active search is a newer concept that aims to discover all positive
targets by adaptively examining candidate targets and recording their outcomes.
The reward for a given target is counted only once, so no point is ever chosen
more than once. Theoretical analyses of active search can be made similar to
those of bandit methods, and this proposal will consider both of these problems.
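The adaptive choose-an-arm, observe-an-outcome loop described above can be
sketched with the classic UCB1 rule (not an algorithm from this proposal, just a
standard illustration); the `pull` callback and the Bernoulli arm probabilities
are hypothetical stand-ins:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Minimal UCB1 sketch: try each arm once, then pick the arm
    maximizing empirical mean plus an exploration bonus."""
    counts = [0] * n_arms     # times each arm has been pulled
    sums = [0.0] * n_arms     # total reward observed per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: sample every arm once
        else:
            # bonus sqrt(2 ln t / n_a) shrinks as arm a is sampled more
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total_reward += r
    return total_reward, counts

# Toy Bernoulli arms with (assumed) success probabilities 0.2 and 0.8.
probs = [0.2, 0.8]
rng = random.Random(42)
reward, counts = ucb1(lambda a: float(rng.random() < probs[a]),
                      n_arms=2, horizon=500)
```

With enough pulls the better arm dominates the counts, which is the behavior
the no-regret guarantee formalizes; active search differs in that a pulled
"arm" (candidate) is never pulled again.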

      To date, most active search and bandit methods act on the premise that the
action allowed is to sample a single candidate at a time and the reward is
determined by the chosen candidate alone. Many real world applications, however,
define action and reward in a more complex way. For example, in environmental
monitoring carried out by autonomous boats with on-board sensors, a set of
readings within a certain region that satisfy some pattern can indicate a real
pollution problem, but any single point measurement will be insufficient
evidence. This requires that reward be defined on regions. Similarly, many
sensors are constructed to observe large areas simultaneously rather than single
points. When operating such sensors, each action covers all the targets within
the sensing region. Finally, marketing through social media may return properties of
subgraphs, for example in social polling. In this work, we propose to develop
multi-arm bandit and active search algorithms in these more complex action and
reward settings.

      My previous research has covered some basic scenarios in this area:
searching for regions with sufficiently high means by collecting data at point
level, discovering complex global patterns defined on regions by adaptively
observing individual points, and using a new criterion called Sigma-optimality
to perform active learning and polling on social network graphs. I propose to
extend the theoretical aspect of these and similar problems: to apply
Sigma-optimality as the exploration term for active search of positives on
network graphs, to study the properties of other spectral functions in terms of
exploration on network graphs, to analyze cumulative regret for Gaussian-process
bandit algorithms when rewards are defined by region averages, and
finally to develop efficient algorithms when the action allows pulling multiple
arms or regions.

Thesis Committee:
Jeff Schneider (Chair)
Aarti Singh
Alex Smola
Ryan P. Adams (Harvard University)
Roman Garnett (Washington University in St. Louis)

-- 
Diane Stidle
Graduate Programs Manager
Machine Learning Department
Carnegie Mellon University
diane at cs.cmu.edu
412-268-1299




