BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Albert Cohen (Google)
DTSTART:20211015T150000Z
DTEND:20211015T160000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/1
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/1/">Herding Tensor Compilers</a>\nby Albert Cohen (Goog
 le) as part of Oxford Seminars on Tensor Computation\n\n\nAbstract\nThe or
 chestration of high-performance numerical computations on distributed and 
 heterogeneous systems is not getting any simpler. In the last 5 years\, dr
 iven by the needs of machine learning\, systems and compilers made tremend
 ous progress towards hiding this complexity while delivering excellent per
 formance. These undeniable successes of computing systems and programming 
 language research also came with undesirable and somewhat paradoxical side
  effects: abstractions and engineering frameworks diversifying out of cont
 rol while machine learning models got stuck in the rut defined by a small 
 set of highly optimized operators. We will recall algebraic principles sup
 porting the compilation of tensor algebra\, and illustrate these principle
 s on three optimization strategies with different degrees of human/expert 
 intervention. While the presentation focuses on optimization and algorithm
 s\, we will also discuss MLIR\, a large-scale compiler construction effort
  to rationalize the landscape of machine learning systems.\n\nBio: Albert 
 is a research scientist at Google. He was a research scientist at Inria fr
 om 2000 to 2018\, and is an alumnus of École Normale Supérieure de Lyon an
 d the University of Versailles (1999). He has been a visiting scholar at the U
 niversity of Illinois\, an invited professor at Philips Research\, and a v
 isiting scientist at Facebook Artificial Intelligence Research. Albert Coh
 en works on parallelizing and optimizing compilers\, parallel programming 
 languages and systems\, machine learning compilers\, synchronous programmi
 ng\, with applications to high-performance computing\, artificial intellig
 ence and reactive control. He served as the general or program chair of ma
 jor conferences\, including PLDI\, PPoPP\, PACT\, HiPEAC\, CC\, the embedd
 ed software track of DAC\, and as a member of the editorial board of ACM T
 ACO\, TECS and IJPP. Several research projects initiated by Albert Cohen r
 esulted in effective transfer to production compilers and programming envi
 ronments.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 1/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Jonathan Ragan-Kelley (MIT)
DTSTART:20211022T150000Z
DTEND:20211022T160000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/2
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/2/">Organizing Computation for High-Performance Visual 
 Computing</a>\nby Jonathan Ragan-Kelley (MIT) as part of Oxford Seminars o
 n Tensor Computation\n\n\nAbstract\nIn the face of declining returns to Mo
 ore’s law\, future visual computing applications—from photorealistic r
 eal-time rendering\, to 4D light field cameras\, to pervasive sensing with
  deep learning—still demand orders of magnitude more computation than we
  currently have. From data centers to mobile devices\, performance and ene
 rgy scaling is limited by locality (the distance over which data has to mo
 ve\, e.g.\, from nearby caches\, far away main memory\, or across networks
 ) and parallelism. Because of this\, I argue that we should think of the p
 erformance and efficiency of an application as determined not just by the 
 algorithm and the hardware on which it runs\, but critically also by the o
 rganization of its computations and data. For algorithms with the same com
 plexity—even the exact same set of arithmetic operations—the order and
  granularity of execution and placement of data can easily change performa
 nce by an order of magnitude because of locality and parallelism. To extra
 ct the full potential of our machines\, we must treat the organization of 
 computation as a first-class concern\, while working across all levels\, f
 rom algorithms and data structures\, to programming languages\, to hardwar
 e.\n\nThis talk will present facets of this philosophy in systems for imag
 e processing\, 3D graphics\, and machine learning. I will show that\, for 
 the data-parallel pipelines common in these data-intensive applications\, 
 the organization of computations and data for a given algorithm is constra
 ined by a fundamental tension between parallelism\, locality\, and redunda
 nt computation of shared values. I will focus particularly on the Halide l
 anguage and compiler\, which explicitly separates what computations define
  an algorithm from the choices of organization which determine parallelism
 \, locality\, and synchronization. I will show how this approach can enabl
 e much simpler programs to deliver performance often many times faster tha
 n the best prior implementations\, while scaling across radically differen
 t architectures\, from ARM phones to massively parallel GPUs\, FPGAs\, and
  custom ASICs.\n\nBio: Jonathan Ragan-Kelley is the Esther and Harold E. E
 dgerton Assistant Professor of Electrical Engineering & Computer Science a
 t MIT and assistant professor of EECS at UC Berkeley.  He works on high-ef
 ficiency visual computing\, including systems\, compilers\, and architectu
 res for image processing\, vision\, 3D rendering\, simulation\, and machin
 e learning. He is a recipient of the ACM SIGGRAPH Significant New Research
 er award\, NSF CAREER award\, Intel Outstanding Researcher award\, and tw
 o CACM Research Highlights. He was previously a visiting researcher at Goo
 gle\, a postdoc in Computer Science at Stanford\, and earned his PhD in Co
 mputer Science from MIT in 2014. He co-created the Halide language and has
  built more than a half-dozen other DSL and compiler systems\, the first o
 f which was a finalist for an Academy technical achievement award.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 2/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Conal Elliott
DTSTART:20211029T150000Z
DTEND:20211029T160000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/3
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/3/">Can Tensor Programming Be Liberated from the Fortra
 n Data Paradigm?</a>\nby Conal Elliott as part of Oxford Seminars on Tenso
 r Computation\n\nAbstract: TBA\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 3/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Markus Püschel (ETH Zürich)
DTSTART:20211105T160000Z
DTEND:20211105T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/9
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/9/">Program Generation for Small-Scale Linear Algebra</
 a>\nby Markus Püschel (ETH Zürich) as part of Oxford Seminars on Tensor 
 Computation\n\n\nAbstract\nMany performance-critical computations in commu
 nication\, control\, multimedia processing\, machine learning\, or graphic
 s fall into the domain of linear algebra. Existing linear algebra librarie
 s are usually optimized for large-scale computation and for use in scienti
 fic computing. For small-scale computations in other domain
 s they are often suboptimal. In this talk I present our work on generating
  optimized linear algebra code directly from a mathematical description us
 ing techniques developed in Spiral (www.spiral.net): layers of domain-spec
 ific languages to express the mathematics and the use of rewriting systems
  to reshape the computation at a high level of abstraction to overcome kno
 wn compiler limitations. (This is the thesis work of Daniele Spampinato\; 
 project website: https://acl.inf.ethz.ch/research/LGen/.)\n\nBio: Markus P
 üschel is a Professor and former Department Head of Computer Science at E
 TH Zürich\, Switzerland. Before\, he was a Professor of Electrical and Co
 mputer Engineering at Carnegie Mellon University\, where he still has an a
 djunct status. He is an IEEE Fellow. One of his longstanding interests is 
 automating the production of high performance software and hardware design
 s for mathematical functionality as exemplified by the Spiral project. Bes
 ides this\, his current interests include program generation\, novel forms
  of Fourier analysis and its applications\, machine learning\, and program
  analysis. For more information\, please visit https://acl.inf.ethz.ch/.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 9/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rohan Yadav (Stanford)
DTSTART:20211119T160000Z
DTEND:20211119T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/11
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/11/">DISTAL: The Distributed Tensor Algebra Compiler</a
 >\nby Rohan Yadav (Stanford) as part of Oxford Seminars on Tensor Computat
 ion\n\n\nAbstract\nWe introduce DISTAL\, a compiler for dense tensor algeb
 ra that targets modern distributed and heterogeneous systems. DISTAL allow
 s users to independently describe how tensors and computation map onto the
  target machine through the tensors’ formats and a scheduling language. Th
 e combination of choices for data and computation distribution creates a d
 esign space that includes algorithms from the past (Cannon’s algorithm) an
 d present (COSMA). DISTAL compiles a tensor algebra domain-specific lan
 guage to a distributed task-based runtime system and supports nodes with m
 ulti-core CPUs and multiple GPUs. Code generated by DISTAL is competitive 
 with optimized codes for matrix multiply on 256 nodes of the Lassen superc
 omputer and outperforms existing systems by between 1.8$\\times$ and 3.7$\\
 times$ (with a 45.7$\\times$ outlier) on higher order tensor operations.\n
 \nBio: Rohan Yadav is a second-year computer science PhD student at Stanfo
 rd University\, advised by Alex Aiken and Fredrik Kjolstad. He is generall
 y interested in programming languages and computer systems\, with a partic
 ular focus in systems for parallel and distributed computing.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 11/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Martin Elsman (Copenhagen)
DTSTART:20211126T160000Z
DTEND:20211126T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/12
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/12/">Size-Dependent Types for Practical Data-Parallel P
 rogramming</a>\nby Martin Elsman (Copenhagen) as part of Oxford Seminars o
 n Tensor Computation\n\n\nAbstract\nWe present a type system for expressin
 g size constraints on array\ntypes in an ML-style type system.  The goal i
 s to detect shape\nmismatches at compile-time without having to deal with 
 all the\nconsequences of a full dependent type system.  The main restricti
 ons\nare that the only terms that can occur in types are array sizes\, whi
 ch\nare constrained syntactically to be either variables or constants.\nFo
 r expressions that result in arrays of sizes that are not\nexpressible usi
 ng these restrictions\, the system supports a form of\nexistential types\,
  with the type system automatically managing the\nrequisite book-keeping\,
  while guaranteeing that\, at runtime\, all\narrays are regular.\n\nThe ty
 pe system forms the basis of the type system for Futhark\, a\ndata-paralle
 l functional language (and compiler)\, which is aimed at\ngenerating effic
 ient parallel code for GPUs and multi-threaded CPUs.\nFuthark is equipped 
 with a number of first- and second-order array\ncombinators\, which have d
 ata-parallel semantics.  Futhark performs a\nnumber of fusion\, tiling and
  flattening transformations\, and may even\ngenerate multiple code version
 s that are dispatched dynamically based\non auto-tuned size-aspects of inp
 ut data.  The size-dependent type\nsystem works well with Futhark's suppor
 t for higher-order modules and its\nlimited form of higher-order functions
 \, which are all eliminated\nentirely at compile time.  We give examples o
 f library functions and\ndata structures that utilise the features of size
 -dependent types to\nexpress the intentions of how functions are used\, fo
 r instance\, to\navoid out-of-bounds array-index errors and to guard again
 st the\ncomposition of incompatible neural network layers.\n\nFuthark is j
 oint work with a number of researchers at DIKU\, including\nTroels Henriks
 en (DIKU)\, Cosmin E. Oancea\, Ken Friis Larsen\, Fritz\nHenglein\, and Ph
 ilip Munksgaard.\n\nBio:\nMartin Elsman conducts research in the design
  and implementation of\nprogramming languages.  Areas of research include 
 compilation\ntechniques for functional languages\, in particular with focu
 s on\nparallel languages\, module systems\, Web technology\, program analy
 ses\nfor memory management\, program optimisation\, and domain-specific\nl
 anguages for financial contracts.  Martin is Professor in the\nProgramming
  Languages and Theory of Computation section at Department\nof Computer Sc
 ience\, University of Copenhagen (DIKU)\, where he serves\nas head of stud
 ies for a BSc education on Computer Science and\nEconomics.  Martin is als
 o an active maintainer of several software\ntools\, including the MLKit\, 
 a full-blown Standard ML compiler\, which\ntargets both JavaScript and x86
 -64 machine code.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 12/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dimitrios Vytiniotis (DeepMind)
DTSTART:20211203T160000Z
DTEND:20211203T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/13
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/13/">Automating Tensor Program Partitioning on Accelera
 tor Systems with PartIR</a>\nby Dimitrios Vytiniotis (DeepMind) as part of
  Oxford Seminars on Tensor Computation\n\n\nAbstract\nThe rapid rise in de
 mand for training large neural networks has brought into focus the need fo
 r partitioning across systems of accelerator devices. Implementing various
  forms of partitioning is increasingly supported through program primitive
 s\, but identifying efficient partitioning strategies requires expensive e
 xperimentation and expertise. We present the prototype of an automated par
 titioning system that integrates into existing compilers and existing user
  workflows. Our system relies on layering functional loop abstractions –
  that return or reduce over chunks of arrays – on top of an arbitrary ar
 ray “dialect” (following the MLIR terminology) such as XLA. We use rew
 rite rules reminiscent of fusion rules from stream fusion to express vario
 us forms of propagation of partitioning information across a program. Our 
 system compiles functional loops to SPMD abstractions in a lower-level dia
 lect whose types capture distributed arrays and which includes explicit ar
 ray redistribution commands. This dialect can then be lowered\, compiled\,
  and executed using the “native” backend compiler and runtime (e.g. XL
 A) in a device-agnostic manner. We will present the design of a search env
 ironment controlling the actions of our rewrite engine that is specificall
 y aiming to tame the size of the search space by (a) mimicking the way exp
 ert programmers would attempt to partition their programs and (b) exploiting h
 igh-level model structure already available in popular libraries for neura
 l networks. We show promising initial results\, such as the ability to aut
 omatically recover good partitioning for important neural network architec
 tures\; and we outline remaining challenges.\n\nBio: Dimitrios Vytiniotis 
 is a research scientist leading the research in programming languages and 
 machine learning systems at DeepMind. He holds a PhD from the University o
 f Pennsylvania (2008) and was a researcher with Microsoft Research Cambrid
 ge until 2018. His interests span functional programming and type systems\
 , and more broadly language design and implementation\, with applications 
 in areas like systems and machine learning.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 13/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Alex Aiken (Stanford University)
DTSTART:20220121T160000Z
DTEND:20220121T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/14
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/14/">Legion: Programming Distributed Heterogeneous Arch
 itectures</a>\nby Alex Aiken (Stanford University) as part of Oxford Semin
 ars on Tensor Computation\n\n\nAbstract\nProgrammers tend to think of para
 llel programming as a problem of\ndividing up computation\, but increasing
 ly the most important decisions\ninvolve the partitioning\, placement and 
 movement of data.  Legion is a\ndata-centric task-based programming model 
 for the development of\ncomposable and portable software on distributed\, 
 heterogeneous\narchitectures.  The Legion model is built around two core f
 eatures: a\ndata model that allows users to dynamically describe the struc
 ture of\nprogram data and a suite of partitioning operators for describing
  the\nsubsets of data used by tasks.  Leveraging its detailed knowledge of
 \nprogram data\, the Legion runtime uses dynamic dependence analysis to\na
 utomatically infer implicit parallelism\, data movement\, and\nsynchroniza
 tion. A separate mapping interface decouples Legion\nprograms from how the
 y are scheduled onto individual machines\, making\nLegion programs easy to
  port.  We will give several examples of how\nLegion is used for accelerat
 ing both HPC and machine learning\nworkloads at scale.\n\nBio: Alex Aiken 
 is the Alcatel-Lucent Professor of Computer Science at Stanford. Alex rece
 ived his Bachelor's degree in Computer Science and Music from Bowling Green
  State University in 1983 and his Ph.D. from Cornell University in 1988. A
 lex was a Research Staff Member at the IBM Almaden Research Center (1988-1
 993) and a Professor in the EECS department at UC Berkeley (1993-2003) bef
 ore joining the Stanford faculty in 2003. His research interests are in area
 s related to programming languages. He is an ACM Fellow\, a recipient of A
 CM SIGPLAN's Programming Languages Achievement Award and Phi Beta Kappa's 
 Teaching Award\, and a former chair of the Stanford Computer Science Depar
 tment.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 14/
END:VEVENT
BEGIN:VEVENT
SUMMARY:David Ham (Imperial College)
DTSTART:20220128T160000Z
DTEND:20220128T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/15
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/15/">Automating Finite Element Simulation by Generating
  Tensor Computations from Vector Calculus</a>\nby David Ham (Imperial Coll
 ege) as part of Oxford Seminars on Tensor Computation\n\n\nAbstract\nThe s
 imulation of continuous physical systems described by Partial Differential
  Equations (PDEs) has been and continues to be one of the great challenges
  of scientific computing. From nanomaterials to the weather forecast\, the
  ability to simulate and optimise continuous systems underpins much of sci
 ence and engineering. From a software perspective\, the creation of simula
 tion tools requires the complex manipulation of the PDEs involved\, then t
 heir discretisation\, and finally the optimal scheduling of the resulting 
 calculation. In this talk I will show how the various stages of this tool 
 creation process can be modelled as tensor computations\, and that each st
 age can be automatically generated from the previous one using specialised
  compiler technology. The result is that scientists and engineers can form
 ulate advanced numerical methods for ever-changing PDEs\, and have high pe
 rformance computational tools generated automatically. This brings both pr
 oductivity and performance to the simulation problem\, enabling scientists
  to undertake work that would previously have exceeded their human and com
 putational resources.\n\nBio: Dr David Ham is a reader in Computational Ma
 thematics at Imperial College London. He has degrees in mathematics and la
 w from the Australian National University\, and a doctorate in numerical m
 ethods for PDEs from TU Delft. His research focusses on automating the fin
 ite element method\, centred on the Firedrake automated simulation system.
  He received the 2015 Wilkinson Prize for Numerical Software for his autom
 ation of inverse finite element simulation. Dr Ham co-leads the joint math
 ematics and computer science degree programme at Imperial College Lond
 on\, and founded and leads the Mary Lister McCammon Summer Research Fellow
 ship for Women in Mathematics and Statistics. He is the chief executive ed
 itor of the European Geosciences Union journal Geoscientific Model Develop
 ment.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 15/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Sven-Bodo Scholz (Radboud University Nijmegen)
DTSTART:20220204T160000Z
DTEND:20220204T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/16
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/16/">Tensor Comprehensions in SaC: A Minimalistic Notat
 ion for High-Performance Computing</a>\nby Sven-Bodo Scholz (Radboud Unive
 rsity Nijmegen) as part of Oxford Seminars on Tensor Computation\n\n\nAbst
 ract\nThis talk focusses on programmer productivity when it comes to defin
 ing\,\nunderstanding\, and maintaining computations on multi-dimensional a
 rrays.\nShape-invariant programming\, i.e.\, the ability to define APL-lik
 e operators that\ncan be applied to arrays of arbitrary dimensionality\, s
 urely constitutes a key\nelement here. This raises the question of what the m
 inimal building blocks for\nsuch operators should be. Should they be a set
  of fixed primitives a la APL?\nShould they be a small set of higher-order
  operators? Should they be inherently\nn-dimensional or should they be one
 -dimensional and then be applied recursively?\nShould they be loops?  Or d
 o we need all of these to conveniently express our\nalgorithms?\n\nIn the 
 context of SaC\, we propose a new form of array comprehensions named\n"Ten
 sor Comprehensions".  This notation strives to be flexible enough to allow
 \nfor all of the above-mentioned flavours. Despite this flexibility\, Tens
 or\nComprehensions aim to be minimalistic in the syntactical requirements\
 , building\non sophisticated inference technology to enable programmers to
  leave out many\n"obvious" parts.  The resulting notation comes rather clo
 se to the so-called\nTensor Notations used in Physics and Mathematics.  As
  a result\, complex\noperators with rich semantics can be defined more con
 cisely than before.\n\nBio: Sven-Bodo Scholz is Professor of Computer Scie
 nce at Radboud University\, Nijmegen\, Netherlands.\nHe also holds a profe
 ssorship at Heriot-Watt University\, Edinburgh\, Scotland.\nHis research i
 s driven by the desire to bridge the gap between high-productivity program
 ming tools\nand high-performance heterogeneous many-core systems by means 
 of compilation technology. Typical\napplication areas range from multi-sen
 sor robotics systems over big-data analytics to vision and\ncomputational 
 science. Target systems range from embedded circuits over large clusters o
 f\nGPU-accelerated systems into cloud infrastructures.\n\nMost of his work
  on parallelising compiler technology is driven by the needs of industrial
  project\npartners such as Intel\, AMD\, Thales\, SAP\, Philips and others
 . Besides regular international\ndissemination in both academia and indu
 stry\, his work has led to several systems in the public\ndomain most nota
 bly the SaC compiler tool-chain (www.sac-home.org).\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 16/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Adam Paszke (Google)
DTSTART:20220211T160000Z
DTEND:20220211T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/17
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/17/">Getting to the Point with Dex: Safe Parallel Progr
 amming for Scientific Applications</a>\nby Adam Paszke (Google) as part of
  Oxford Seminars on Tensor Computation\n\n\nAbstract\nThe talk will be foc
 used on the design of Dex\, both in terms of the surface syntax and its ty
 ping discipline. Dex is a new domain specific programming language aiming 
 to make it easier to implement parallel scientific computing workloads in 
 a clear and safe way\, while being able to achieve the efficiency of low-l
 evel numerical languages. The core idea underlying its design is the treat
 ment of arrays as memoized representations of functions with finite domain
 s\, allowing abstract function manipulations\, such as currying or abstrac
 tion\, to work on arrays. Additionally\, instead of following the well-tro
 dden path of bulk-array combinators\, we argue for a programming style hea
 vy on explicit array indexing\, that closely mirrors function applications
 . We associate the classical bulk-array programming with “pointfree” s
 tyle of functional programming and try to rebuild the array paradigm in an
  (arguably more popular) “pointful” style instead. For increased expre
 ssiveness and efficiency (especially under automatic differentiation)\, we
  additionally extend the language with a fine-grained effect system that a
 llows us to reason about performance in a type-directed way.\n\nBio: Adam 
 Paszke is a Senior Research Scientist at Google\, based in Warsaw\, Poland
 . His work focuses on automatic differentiation\, parallelism-friendly pro
 gramming languages for scientific computing\, and partitioning of those fo
 r purposes of distributed execution. Before Google\, he worked with Facebo
 ok and authored PyTorch. He graduated in Computer Science and Mathematics 
 from the University of Warsaw.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 17/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Oleg Kiselyov (Tohoku University)
DTSTART:20220218T160000Z
DTEND:20220218T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/18
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/18/">Even Better Stream Fusion</a>\nby Oleg Kiselyov (T
 ohoku University) as part of Oxford Seminars on Tensor Computation\n\n\nAb
 stract\nStream processing is one of the key data processing modes\, relate
 d to dataflow programming. It was dominant in the punch-card era\, and is 
 becoming prevalent again\, in the era of huge data\, ubiquitous sensors an
 d distributed computing. Its characteristic is incremental\, sequential pr
 ocessing with bounded buffering\, which lets one handle a possibly unbounded
  amount of data in limited space. Another characteristic is the ease of sp
 ecifying it as a Xmas-lights diagram: if some further processing is needed
 \, just plug in another segment.\n\nAlthough the diagrams are easy to draw
 \, they are difficult to implement with low latency and in low memory. Thi
 s talk is about the key optimization: stream fusion\, which is combining s
 everal simple processing steps into one complex step\, reducing the amount
  of intermediary data and communication overhead. Specifically\, we will t
 alk about complete fusion: not just reduction but complete elimination. Th
 is is hard\, especially for diagrams with "fat pipes" (flatmap) and "joins
 " (zip).\n\nThis talk introduces the ongoing work on strymonas\, which is 
 a high-performance code generation library (DSL) that converts a diagram-l
 ike specification into hand-written-like code -- with assured complete fus
 ion. We describe the main ideas behind the complete fusion of diagrams wit
 h joins\, and illustrate on the example of the software FM radio.\n\nBio: 
 Oleg Kiselyov is an Assistant Professor at Tohoku University in Japan. He 
 got interested in stream processing when automating scientific instruments
 (calorimeters and neuron activity recording) 35 years ago. In the 1990s he wr
 ote and maintained a C++ linear algebra library based on streams rather th
 an arrays. Later on he wrote a streaming XML parser\, still used in Scheme
  community\, and designed Iteratees (see Wikipedia). His latest interest i
 s generating fast stream processing code.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 18/
END:VEVENT
BEGIN:VEVENT
SUMMARY:James Demmel (UC Berkeley)
DTSTART:20220225T160000Z
DTEND:20220225T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/19
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/19/">Communication-Avoiding Algorithms for Linear Algeb
 ra\, Machine Learning and Beyond</a>\nby James Demmel (UC Berkeley) as par
 t of Oxford Seminars on Tensor Computation\n\n\nAbstract\nAlgor
 ithms have two costs: arithmetic and communication\, i.e. moving data betw
 een levels of a memory hierarchy or processors over a network. Communicati
 on costs (measured in time or energy per operation) greatly exceed arithme
 tic costs\, so our goal is to design algorithms that minimize communicatio
 n. We survey some known algorithms that communicate asymptotically less th
 an their classical counterparts\, for a variety of linear algebra and mach
 ine learning problems\, often attaining lower bounds. We also discuss rece
 nt work on automating the design and implementation of these algorithms\, 
 starting from a simple specification as nested loops.\n\nBio: James Demmel
  is the Dr. Richard Carl Dehmel Distinguished Professor of Computer Scienc
 e and Mathematics at the University of California at Berkeley\, and former
  Chair of the EECS Dept.  His research is in numerical linear algebra\, hi
 gh performance computing\, and communication avoiding algorithms. He is kn
 own for his work on the widely used LAPACK and ScaLAPACK linear algebra li
 braries.  He is a member of the National Academy of Sciences\, National Ac
 ademy of Engineering\, and American Academy of Arts and Sciences\; a Fello
 w of the AAAS\, ACM\, AMS\, IEEE and SIAM\; and winner of the IPDPS Charle
 s Babbage Award\, IEEE Computer Society Sidney Fernbach Award\, the ACM Pa
 ris Kanellakis Award\, and numerous best paper prizes.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 19/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Mike Giles (University of Oxford)
DTSTART:20220304T160000Z
DTEND:20220304T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/20
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/20/">Some Reflections on Automated Code Generation</a>\
 nby Mike Giles (University of Oxford) as part of Oxford Seminars on Tensor
  Computation\n\n\nAbstract\nIn this talk from a workshop 8 years ago\, I r
 eflect on a number of projects I was involved in\, or aware of\, at that t
 ime.  The common feature was the desire to simplify high performance compu
 ting through abstraction\, separating the specification of what was to be 
 computed from the details of how it was computed.  In practice\, this invo
 lved automated code generation\, either through the processing of embedded
  DSLs\, or through the creation of application-specific DSLs with custom c
 ode generation backends.\n\nBio: Mike Giles is a Professor of Scientific C
 omputing and currently head of the Mathematical Institute\; from 1992 to 2
 008 he was in Oxford's Computer Science department which was then called t
 he Computing Laboratory.  His primary research interests are in the develo
 pment and analysis of a wide variety of numerical algorithms\, but a secon
 dary interest is in various aspects of high performance computing.  This i
 ncluded being one of the UK's early pioneers in GPU computing\, and led to
  him establishing the Emerald and JADE GPU supercomputers.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 20/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Gabriele Keller (Utrecht University)
DTSTART:20220311T160000Z
DTEND:20220311T170000Z
DTSTAMP:20260404T095327Z
UID:OxfordTensorComputation/21
DESCRIPTION:Title: <a href="https://stable.researchseminars.org/talk/Oxfor
 dTensorComputation/21/">Accelerate: High-Performance Computing in Haskell<
 /a>\nby Gabriele Keller (Utrecht University) as part of Oxford Seminars on
  Tensor Computation\n\n\nAbstract\nThis talk presents Accelerate\, a data-
 parallel programming language embedded in Haskell\, with multi-core CPU an
 d GPU backends. In Accelerate\, data parallelism is expressed through a se
 t of first- and second-order functions operating on (possibly multi-dimensi
 onal) arrays\, where parallel and sequential operations are distinguished 
 through types. This statically excludes irregular nested data parallel ope
 rations\, which the compiler currently cannot efficiently map to the targe
 t architecture. We will discuss how Accelerate is positioned in the space 
 of comparable languages and present some of the core ideas underlying the 
 implementation of Accelerate and its embedding in the host language\, incl
 uding the type system of the language. Furthermore\, we provide a summary 
 of current projects and where we are planning to take the language in the 
 near future.\n\nBio: Gabriele Keller is the chair of the Software Technolo
 gy Group at Utrecht University in the Netherlands. Before moving to Utrech
 t\, she was an Associate Professor at the University of New South Wales in Syd
 ney\, Australia\, where she co-founded the Programming Language and System
 s Group. Her research interests are in programming languages\, in particul
 ar functional languages and languages for high-performance computing\, as 
 well as verified compilation of programming languages.\n
LOCATION:https://stable.researchseminars.org/talk/OxfordTensorComputation/
 21/
END:VEVENT
END:VCALENDAR
