BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Philippe Rigollet (MIT)
DTSTART:20200408T140000Z
DTEND:20200408T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/1
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/1/">Statistical and Computational aspects of Wasserstein Barycenters</a
 >\nby Philippe Rigollet (MIT) as part of MAD+\n\n\nAbstract\nThe notion of
  average is central to most statistical methods. In this talk we study a g
 eneralization of this notion over the non-Euclidean space of probability
  measures equipped with a certain Wasserstein distance. This
  generalization is often called the Wasserstein barycenter\, and
  empirical evidence suggests that these barycenters allow one to capture
  interesting notions of averages in graphics\, data assimilation and
  morphometrics. However\, the statistical (rates of convergence) and
  computational (efficient algorithms) aspects of these Wasserstein
  barycenters are largely unexplored. The goal of this talk is to review
  two recent results: 1. Fast rates of convergence for empirical
  barycenters in general geodesic spaces\, and 2. Provable guarantees for
  gradient descent and stochastic gradient descent to compute Wasserstein
  barycenters. Both results leverage geometric aspects of optimal
  transport. Based on joint works (arXiv:1908.00828\, arXiv:2001.01700)
  with Chewi\, Le Gouic\, Maunu\, Paris\, and Stromme.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/1/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Francis Bach (INRIA/ENS)
DTSTART:20200520T140000Z
DTEND:20200520T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/2
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/2/">On the effectiveness of Richardson extrapolation in machine learnin
 g</a>\nby Francis Bach (INRIA/ENS) as part of MAD+\n\nAbstract: TBA\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/2/
END:VEVENT
BEGIN:VEVENT
SUMMARY:David Gamarnik (MIT)
DTSTART:20200422T140000Z
DTEND:20200422T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/3
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/3/">Overlap gap property: a provable barrier to fast optimization in pr
 obabilistic combinatorial structures</a>\nby David Gamarnik (MIT) as part 
 of MAD+\n\n\nAbstract\nMany combinatorial optimization problems defined on
  random instances exhibit an apparent gap between the optimal values\, whi
 ch can be computed by non-constructive means\, and the best values achieva
 ble by fast (polynomial time) algorithms. Through a combined effort of mat
 hematicians\, computer scientists and statistical physicists\, it became a
 pparent that a potential barrier for designing fast algorithms bridging th
 is gap is an intricate topology of nearly optimal solutions\, in particula
 r the presence of the Overlap Gap Property (OGP)\, which we will introduce
  in this talk. We will discuss how for many such problems the onset of the
  OGP phase transition indeed introduces a provable barrier to a broad
  class of polynomial time algorithms. Examples of such problems include
  finding a largest independent set of a random graph\, finding a largest
  cut in a random hypergraph\, finding a ground state of a p-spin model\,
  and many problems in high-dimensional statistics. In this talk we will
  demonstrate in particular why the OGP is a barrier for three classes of
  algorithms designed to find a near ground state in p-spin models arising
  in the field of spin glass theory: Approximate Message Passing
  algorithms\, algorithms based on low-degree polynomials\, and Langevin
  dynamics. Joint work with Aukosh Jagannath and Alex Wein.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/3/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lenka Zdeborova (CNRS)
DTSTART:20200527T140000Z
DTEND:20200527T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/4
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/4/">Understanding machine learning via exactly solvable statistical phy
 sics models</a>\nby Lenka Zdeborova (CNRS) as part of MAD+\n\n\nAbstract\n
 The affinity between statistical physics and machine learning has a long h
 istory\; this is reflected even in the machine learning terminology that i
 s in part adopted from physics. I will describe the main lines of this lon
 g-lasting friendship in the context of current theoretical challenges and 
 open questions about deep learning. Theoretical physics often proceeds in 
 terms of solvable synthetic models\; I will describe the related line of w
 ork on solvable models of simple feed-forward neural networks. I will high
 light a path forward to capture the subtle interplay between the structure
  of the data\, the architecture of the network\, and the learning algorith
 m.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/4/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Ingrid Daubechies (Duke)
DTSTART:20200603T140000Z
DTEND:20200603T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/5
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/5/">Diffusion Methods in Manifold and Fibre Bundle Learning</a>\nby Ing
 rid Daubechies (Duke) as part of MAD+\n\n\nAbstract\nDiffusion methods hel
 p understand and denoise data sets\; when there is additional structure (a
 s is often the case)\, one can use (and get additional benefit from) a fib
 er bundle model.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/5/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Sara van de Geer (ETHZ)
DTSTART:20200429T140000Z
DTEND:20200429T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/6
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/6/">Total variation regularization</a>\nby Sara van de Geer (ETHZ) as p
 art of MAD+\n\nAbstract: TBA\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/6/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Andrea Montanari (Stanford)
DTSTART:20200610T140000Z
DTEND:20200610T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/7
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/7/">The generalization error of overparametrized models: Insights from 
 exact asymptotics</a>\nby Andrea Montanari (Stanford) as part of MAD+\n\n\
 nAbstract\nIn a canonical supervised learning setting\, we are given n dat
 a samples\, each comprising a feature vector and a label\, or response var
 iable. We are asked to learn a function f that can predict the label
  associated to a new\, unseen feature vector. How is it possible that the
  model learnt from observed data generalizes to new points? Classical
  learning theory assumes that data points are drawn i.i.d. from a common
  distribution and argues that this phenomenon is a consequence of uniform
  convergence: the training error is close to its expectation uniformly
  over all models in a certain class. Modern deep learning systems appear
  to defy this viewpoint: they achieve training error that is significantly
  smaller than the test error\, and yet generalize well to new data. I will
  present a sequence of high-dimensional examples in which this phenomenon
  can be understood in detail. [Based on joint work with Song Mei\, Feng
  Ruan\, Youngtak Sohn\, Jun Yan]\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/7/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Emmanuel Candes (Stanford)
DTSTART:20200512T180000Z
DTEND:20200512T190000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/8
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/8/">Reliable predictions? Equitable treatment? Some recent progress in 
 predictive inference</a>\nby Emmanuel Candes (Stanford) as part of MAD+\n\
 n\nAbstract\nRecent progress in machine learning (ML) provides us with man
 y potentially effective tools to learn from datasets of ever increasing si
 zes and make useful predictions. How do we know that these tools can be tr
 usted in critical and high-sensitivity systems? If a learning algorithm pr
 edicts the GPA of a prospective college applicant\, what guarantees do I h
 ave concerning the accuracy of this prediction? How do we know that it is 
 not biased against certain groups of applicants? This talk introduces stat
 istical ideas to ensure that the learned models satisfy some crucial prope
 rties\, especially reliability and fairness (in the sense that the models 
 need to apply to individuals in an equitable manner). To achieve these imp
 ortant objectives\, we shall not ‘open up the black box’ and try to
  understand its underpinnings. Rather\, we discuss broad methodologies —
  conformal inference\, quantile regression\, the Jackknife+ — that can be
  wrapped around any black box so as to produce results that can be trusted
  and are equitable.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/8/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Aviv Regev (Broad Institute\, MIT/Harvard)
DTSTART:20200617T140000Z
DTEND:20200617T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/9
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/9/">Design for Inference and the power of random experiments in biology
 </a>\nby Aviv Regev (Broad Institute\, MIT/Harvard) as part of MAD+\n\nAbs
 tract: TBA\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/9/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Andrea Montanari (Stanford)
DTSTART:20200624T140000Z
DTEND:20200624T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/10
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/10/">The generalization error of overparametrized models: Insights from
  exact asymptotics</a>\nby Andrea Montanari (Stanford) as part of MAD+\n\n
 \nAbstract\nIn a canonical supervised learning setting\, we are given n da
 ta samples\, each comprising a feature vector and a label\, or response va
 riable. We are asked to learn a function f that can predict the label
  associated to a new\, unseen feature vector. How is it possible that the
  model learnt from observed data generalizes to new points? Classical
  learning theory assumes that data points are drawn i.i.d. from a common
  distribution and argues that this phenomenon is a consequence of uniform
  convergence: the training error is close to its expectation uniformly
  over all models in a certain class. Modern deep learning systems appear
  to defy this viewpoint: they achieve training error that is significantly
  smaller than the test error\, and yet generalize well to new data. I will
  present a sequence of high-dimensional examples in which this phenomenon
  can be understood in detail. [Based on joint work with Song Mei\, Feng
  Ruan\, Youngtak Sohn\, Jun Yan]\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/10/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Mahdi Soltanolkotabi (USC)
DTSTART:20200708T140000Z
DTEND:20200708T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/11
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/11/">Learning via early stopping and untrained neural nets</a>\nby Mahd
 i Soltanolkotabi (USC) as part of MAD+\n\n\nAbstract\nModern neural networ
 ks are typically trained in an over-parameterized regime where the paramet
 ers of the model far exceed the size of the training data. Such neural net
 works in principle have the capacity to (over)fit any set of labels includ
 ing significantly corrupted ones. Despite this (over)fitting capacity\, ov
 er-parameterized networks have an intriguing robustness capability: they a
 re surprisingly robust to label noise when first order methods with early 
 stopping are used to train them. Even more surprisingly\, one can remove
  noise and corruption from a natural image without using any training
  data whatsoever\, by simply fitting (via gradient descent) a randomly
  initialize
 d\, over-parameterized convolutional generator to a single corrupted image
 . In this talk I will first present theoretical results aimed at explainin
 g the robustness capability of neural networks when trained via early-stop
 ped gradient descent. I will then present results towards demystifying unt
 rained networks for image reconstruction/restoration tasks such as denoisi
 ng and those arising in inverse problems such as compressive sensing.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/11/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Samory Kpotufe (Columbia)
DTSTART:20200715T140000Z
DTEND:20200715T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/12
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/12/">Some recent insights on transfer-learning</a>\nby Samory Kpotufe (
 Columbia) as part of MAD+\n\n\nAbstract\nA common situation in Machine Lea
 rning is one where training data is not fully representative of a target p
 opulation due to bias in the sampling mechanism or high costs in sampling 
 the target population\; in such situations\, we aim to ‘transfer’ rele
 vant information from the training data (a.k.a. source data) to the target
  application. How much information is in the source data? How much target 
 data should we collect if any? These are all practical questions that depe
 nd crucially on ‘how far’ the source domain is from the target. Howeve
 r\, how to properly measure ‘distance’ between source and target domai
 ns remains largely unclear.\n\nIn this talk we will argue that many of the
  traditional notions of ‘distance’ (e.g. KL-divergence\, extensions of
  TV such as D_A discrepancy\, density-ratios\, Wasserstein distance) can y
 ield an over-pessimistic picture of transferability. Instead\, we show tha
 t some new notions of ‘relative dimension’ between source and target (
 which we simply term ‘transfer-exponents’) capture a continuum from ea
 sy to hard transfer. Transfer-exponents uncover a rich set of situations w
 here transfer is possible even at fast rates\, encode relative benefits of
  source and target samples\, and have interesting implications for related
  problems such as multi-task or multi-source learning.\n\nIn particular\, 
 in the case of multi-source learning\, we will discuss (if time permits) a
  strong dichotomy between minimax and adaptive rates: no adaptive procedur
 e can achieve a rate better than single source rates\, although minimax (o
 racle) procedures can.\n\nThe talk is based on earlier work with Guillaume
  Martinet\, and ongoing work with Steve Hanneke.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/12/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Giulio Biroli (ENS Paris)
DTSTART:20200729T140000Z
DTEND:20200729T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/13
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/13/">On the benefit of over-parametrization and the origin of double de
 scent curves in artificial neural networks</a>\nby Giulio Biroli (ENS Pari
 s) as part of MAD+\n\n\nAbstract\nDeep neural networks have triggered a re
 volution in machine learning\, and more generally in computer science. Und
 erstanding their remarkable performance is a key scientific challenge with
  many open questions. For instance\, practitioners find that using massive
 ly over-parameterised networks is beneficial to learning and generalizatio
 n ability. This fact goes against standard theories\, and defies intuition
 . In this talk I will address this issue. I will first contrast standard e
 xpectations based on bias-variance trade-off to the results of numerical e
 xperiments on deep neural networks\, which display a “double-descent” 
 behavior of the test error when increasing the number of parameters instea
 d of the traditional U-curve. I will then discuss a theory of this phenome
 non based on the solution of simplified models of deep neural networks by 
 statistical physics methods.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/13/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Ahmed El Alaoui (Stanford)
DTSTART:20200722T140000Z
DTEND:20200722T150000Z
DTSTAMP:20260404T092653Z
UID:MADPlus/14
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/MADPl
 us/14/">Optimization of mean-field spin glass Hamiltonians</a>\nby Ahmed E
 l Alaoui (Stanford) as part of MAD+\n\n\nAbstract\nWe consider the questio
 n of computing an approximate ground state configuration of an Ising (mixe
 d) p-spin Hamiltonian H_N from a bounded number of gradient evaluations.\n
 \nI will present an efficient algorithm which exploits the ultrametric str
 ucture of the superlevel sets of H_N in order to achieve an energy E_* cha
 racterized via an extended Parisi variational principle. This energy E_* i
 s optimal when the model satisfies a ‘no overlap gap’ condition. At the
  heart of this algorithmic approach is a stochastic control problem\,
  whose
  dual turns out to be the Parisi formula\, thereby shedding new light on t
 he nature of the latter.\n\nThis is joint work with Andrea Montanari and M
 ark Sellke.\n
LOCATION:https://stable.researchseminars.org/talk/MADPlus/14/
END:VEVENT
END:VCALENDAR
