Senior Software Engineer, AI Frameworks.
Mathematician by training.

Works on the parts of the ML stack you don't see — performance-critical framework code that lands upstream in PyTorch, oneDNN, and Arm Compute Library.

Arm AI Frameworks · PyTorch / oneDNN / ACL · cmakefmt

Puneet Matharu, PhD
My opinions are my own and not those of my employer.

C++ · Python · PyTorch · oneDNN · ACL · CMake · Docker · Linux

Experience

  1. 2025 — Now · Arm

    Senior Software Engineer, AI Frameworks

    Upstream Lead for Arm's Tool Solutions ML framework distributions. Integrate contributions across PyTorch, oneDNN, Arm Compute Library, and OpenBLAS into shippable Arm-optimised builds.

  2. 2021 — 25 · Arm

    Senior Imaging Algorithms Engineer

    Built imaging algorithms and engineering tooling for Arm's ISP pipeline. C++/LibTorch performance work, RGB reconstruction for automotive RGB-IR sensors, and Python wrappers for low-level imaging libraries.

  3. 2016 — 20 · University of Manchester

    PhD in Applied Mathematics

    Numerical computation of time-periodic solutions to PDEs, applied to nonlinear fluid dynamics. Published in the Journal of Fluid Mechanics.

Selected Impact

  1. 01 Upstream Tool Solutions

    Lead the framework release path

    Integrate PyTorch, oneDNN, Arm Compute Library, and OpenBLAS contributions into Arm-optimised framework builds.

    PyTorch · oneDNN · ACL · OpenBLAS
  2. 02 Inference LLaMA

    Route hot paths through optimised kernels

    Upstreamed oneDNN changes that dispatch LLaMA matmul plus JIT post-op through Arm Compute Library instead of the reference path.

    matmul · JIT post-op · ACL
  3. 03 Release AArch64

    Restore public binary distribution

    Brought Docker releases back online by clearing SBOM, vulnerability handling, threat modelling, and release-governance bottlenecks.

    SBOM · Policy · Docker
  4. 04 Imaging 2-10x

    Move IQ metrics out of Python overhead

    Reimplemented Python/PyTorch image-quality metric scripts in C++ with LibTorch while preserving the same metric behaviour.

    Python · LibTorch · C++
  5. 05 Tooling cmakefmt

    Ship a fast formatter

    A Rust CMake formatter with command specs, editor integrations, and CI-friendly flags for real build-system workflows.

    Parser · LSP · Editors

Research & Publications

The time-periodic P+S vortex-shedding pattern in the wake behind an oscillating cylinder (Re = 100).

Completed a PhD in applied mathematics at the University of Manchester, focused on fluid dynamics and numerical methods for time-periodic solutions of PDEs.

For modest Reynolds numbers (Re ≤ 100), a fixed cylinder sheds vortices in the classical 2S pattern, the Kármán vortex street. When the cylinder oscillates at a frequency close to the natural shedding frequency, increasing the oscillation amplitude triggers a transition to a different, asymmetric wake (the P+S pattern). A central question of the thesis was whether this transition arises through a continuous (topological) evolution of the flow or via bifurcations of the Navier–Stokes equations.

"Why symmetric oscillations can create asymmetric wakes" works through the answer end to end: the bifurcation diagram, the isolas, the numerical method, and an interactive scrubber across the asymmetric branch.

ORCID 0000-0001-9359-9814