BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Dylan Foster (MIT)
DTSTART:20200925T150500Z
DTEND:20200925T160500Z
DTSTAMP:20260404T131146Z
UID:sss/6
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/sss/6
 /">Separating Estimation from Decision Making in Contextual Bandits</a>\nb
 y Dylan Foster (MIT) as part of Stochastics and Statistics Seminar Series\
 n\n\nAbstract\nThe contextual bandit is a sequential decision making probl
 em in which a learner repeatedly selects an action (e.g.\, a news article 
 to display) in response to a context (e.g.\, a user’s profile) and recei
 ves a reward\, but only for the action they selected. Beyond the classic e
 xplore-exploit tradeoff\, a fundamental challenge in contextual bandits is
  to develop algorithms that can leverage flexible function approximation t
 o model similarity between contexts\, yet have computational requirements 
 comparable to classical supervised learning tasks such as classification a
 nd regression. To this end\, we provide the first universal and optimal re
 duction from contextual bandits to online regression. We show how to trans
 form any oracle for online regression with a given value function class in
 to an algorithm for contextual bandits with the induced policy class\, wit
 h no overhead in runtime or memory requirements. Conceptually\, our result
 s show that it is possible to provably separate estimation and decision ma
 king into separate algorithmic building blocks\, and that this can be effe
 ctive both in theory and in practice. Time permitting\, I will discuss ext
 ensions of these techniques to more challenging reinforcement learning pro
 blems.\n
LOCATION:https://stable.researchseminars.org/talk/sss/6/
END:VEVENT
END:VCALENDAR
