BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Weinan E (Peking University)
DTSTART:20220126T000000Z
DTEND:20220126T010000Z
DTSTAMP:20260404T111005Z
UID:SciML/1
DESCRIPTION:Title: A Mathematical Perspective of Machine Learning (http
 s://stable.researchseminars.org/talk/SciML/1/)\nby Weinan E (Peking Un
 iversity) as part of the e-Seminar on Scientific Machine Learning\n\nA
 bstract: TBA\n
LOCATION:https://stable.researchseminars.org/talk/SciML/1/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Alex Townsend (Cornell University)
DTSTART:20220204T180000Z
DTEND:20220204T190000Z
DTSTAMP:20260404T111005Z
UID:SciML/2
DESCRIPTION:Title: Learning Green's functions associated with elliptic P
 DEs (https://stable.researchseminars.org/talk/SciML/2/)\nby Alex Towns
 end (Cornell University) as part of the e-Seminar on Scientific Machin
 e Learning\n\nAbstract: TBA\n
LOCATION:https://stable.researchseminars.org/talk/SciML/2/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Cyril Zhang (Microsoft Research)
DTSTART:20220218T180000Z
DTEND:20220218T190000Z
DTSTAMP:20260404T111005Z
UID:SciML/3
DESCRIPTION:Title: Understanding Neural Net Training Dynamics in Tractab
 le Slices (https://stable.researchseminars.org/talk/SciML/3/)\nby Dr. C
 yril Zhang (Microsoft Research) as part of the e-Seminar on Scientific M
 achine Learning\n\nAbstract: The ability of deep neural networks to succ
 essfully train and generalize seems to evade precise characterization b
 y classical theory from optimization and statistics. Drawing on some o
 f my recent work\, I'll present some perspectives on how this theory-pr
 actice gap inconveniently manifests itself\, and discuss how theoretica
 lly grounded algorithm design can still deliver near-term improvements a
 nd new tools in this space:\n\n- Self-stabilizing unstable learning rat
 es. We investigate a lesser-known variant of accelerated gradient desce
 nt in convex optimization\, which eschews Nesterov/Polyak momentum in f
 avor of plain gradient descent with a carefully selected "fractal" sche
 dule of unstable learning rates. We prove stronger stability properties f
 or this "fractal Chebyshev schedule"\, and try it out on neural network
 s. [https://arxiv.org/pdf/2103.01338v2.pdf]\n\n- Learning rate graftin
 g. We propose an experiment which interpolates between optimizers (e.g
 . SGD and Adam) to better understand their training dynamics. We find t
 hat Adam isn't necessary to pretrain a state-of-the-art BERT model afte
 r all: instead\, simply find its implicit (per-layer) learning rate sch
 edule via grafting\, then transfer this schedule to SGD. [https://openr
 eview.net/pdf?id=FpKgG31Z_i9]\n\n- Inductive biases of attention model
 s. I'll briefly mention some very recent work\, which begins to quantif
 y the inductive biases of the self-attention-based models used in {NMT
 \, GPT-{1\,2\,3}\, BERT\, AlphaFold\, Codex\, ...}. We propose a theore
 tical mechanism by which a bounded-norm Transformer can represent a con
 cise circuit near the statistical limit. [https://arxiv.org/pdf/2110.10
 090.pdf]\n\nBio: Cyril Zhang is a Senior Researcher at Microsoft Resear
 ch NYC. He works on theory and algorithms for prediction and decision-m
 aking in dynamical systems\, large-scale learning\, and language modeli
 ng. He completed a Ph.D. in Computer Science at Princeton under the sup
 ervision of Prof. Elad Hazan\, during which he was a Student Researche
 r at Google AI. Before that\, he worked on Laplacian solvers and exopla
 nets at Yale.\n
LOCATION:https://stable.researchseminars.org/talk/SciML/3/
END:VEVENT
END:VCALENDAR
