Universal AI OS
Write once → run everywhere

Run any AI model on any processor

AXIR (Axiomos IR) is a neutral, portable execution layer that translates CUDA, HIP, SYCL, and OpenCL into a single intermediate representation, targeting CPUs & GPUs via PTX, ROCm, and SPIR‑V/Vulkan paths. It is a business‑first platform: open interfaces, commercial core.

The Problem

AI has become core infrastructure. Yet ~70% of AI compute runs on a single proprietary stack (CUDA/NVIDIA). Every new chip (TPU, Trainium, photonic, neuromorphic) ships with its own programming model — creating fragmentation, lock‑in, porting costs, and slower innovation.

  • Porting kernels across stacks costs months and fractures teams.
  • Enterprises want multi‑hardware strategies without vendor lock‑in.
  • New chips struggle to reach developers at scale.

We build the neutral layer between algorithms and hardware. Write once → run everywhere.

The Vision

A Universal Operating System for AI: one portable standard powering models across CPUs & GPUs, on‑prem and cloud.

  • Neutral & portable
  • Performance‑aware codegen
  • Developer‑first UX

Prototype Status

Frontends
CUDA, HIP, SYCL, OpenCL

Backends
CPU (NumPy runtime); GPU paths for PTX (CUDA) and ROCm; SPIR‑V/Vulkan path in progress.

Kernels today

vector_add, saxpy, reduce_sum, matmul, conv

The prototype runs on CPU and real GPUs (NVIDIA, AMD, Intel). Optimized PTX/ROCm codegen is underway.
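As a point of reference, the five kernels above reduce to straightforward NumPy operations on the CPU backend. A minimal sketch of what such reference implementations could look like (function names and signatures are illustrative, not AXIR's actual API):

```python
import numpy as np

# Illustrative CPU reference kernels (hypothetical signatures).
# These define the semantics every GPU backend must reproduce,
# which is what makes a NumPy runtime useful for correctness & CI.

def vector_add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a + b

def saxpy(alpha: float, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    return alpha * x + y  # a*x + y

def reduce_sum(x: np.ndarray) -> float:
    return float(x.sum())

def matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a @ b

def conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Naive NCHW convolution, stride 1, no padding.
    n, c, h, wi = x.shape
    k, _, r, s = w.shape
    out = np.zeros((n, k, h - r + 1, wi - s + 1), dtype=x.dtype)
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + r, j:j + s]  # (n, c, r, s) window
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out
```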

| Demo | What it shows |
|---|---|
| 4×5 kernels matrix | All frontends compile to AXIR and execute across backends |
| Matmul/Conv | Performance baseline vs. vendor libs; parity is the target |
| Autotuner (WIP) | Search over tiling/threads/blocks per backend |

Benchmarks

Comparisons vs. vendor libraries (cuBLAS/cuDNN, rocBLAS/MIOpen, oneDNN): initial baselines today, with parity as the target.

| Kernel | Backend | Shape / Params | Vendor Lib | AXIR | Δ (%) |
|---|---|---|---|---|---|
| matmul | PTX (NVIDIA) | 4096×4096 | XX GFLOPS | YY GFLOPS | +/-Z% |
| conv | ROCm (AMD) | NCHW 64×3×224×224 | XX ms | YY ms | +/-Z% |
| reduce_sum | SPIR‑V (Intel) | 1e8 elems | XX ms | YY ms | +/-Z% |

Methodology: warm‑up runs, median of N=20 timed runs, same precision, same device. Full scripts will be published.
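For concreteness, that methodology maps to a small timing harness along these lines (a sketch; the published scripts may differ in detail):

```python
import time
import statistics
from typing import Callable

def benchmark(kernel: Callable[[], None], warmup: int = 5, runs: int = 20) -> float:
    """Warm-up runs, then the median of N timed runs, in milliseconds.
    On GPU backends the timed call must include a device synchronization."""
    for _ in range(warmup):
        kernel()  # discarded warm-up iterations
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        kernel()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)
```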

Technical Principles

IR & Lowering

Frontends ingest CUDA/HIP/SYCL/OpenCL → lower to AXIR. We separate algorithmic intent from hardware scheduling, enabling portable optimizations.
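AXIR's concrete format is not spelled out here, but the separation of intent from scheduling can be pictured roughly as follows (all class and field names are hypothetical, for illustration only):

```python
from dataclasses import dataclass, field

# Hypothetical picture of the intent/schedule split.
# These are NOT AXIR's real data structures.

@dataclass
class Op:
    """Algorithmic intent: what to compute, hardware-independent."""
    name: str                  # e.g. "matmul"
    inputs: list[str]
    outputs: list[str]
    attrs: dict = field(default_factory=dict)  # shapes, dtypes, ...

@dataclass
class Schedule:
    """Hardware scheduling: how to run it on a given target."""
    backend: str               # "ptx" | "rocm" | "spirv" | "cpu"
    tile: tuple[int, int] = (32, 32)
    threads_per_block: int = 256

@dataclass
class Kernel:
    op: Op                     # stays fixed across all backends
    schedule: Schedule         # swapped per target, op untouched
```

Keeping the op stable while only the schedule varies is what makes passes like tiling portable across backends.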

Backends

Target‑specific codegen for PTX (NVIDIA), ROCm (AMD), and SPIR‑V/Vulkan (Intel/portable). CPU runtime for correctness & CI.
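A common way to structure such target-specific codegen is a registry keyed by target name. A minimal sketch reusing the hypothetical Kernel class above (again illustrative, not AXIR's actual interface):

```python
from typing import Callable, Dict

# Hypothetical backend registry: maps a target name to a codegen
# function turning an AXIR kernel into target-specific output.
CodegenFn = Callable[["Kernel"], str]
BACKENDS: Dict[str, CodegenFn] = {}

def register_backend(name: str):
    def wrap(fn: CodegenFn) -> CodegenFn:
        BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("cpu")
def emit_cpu(kernel) -> str:
    # CPU path: dispatch to the NumPy reference runtime (correctness/CI).
    return f"numpy_runtime.run({kernel.op.name!r})"

@register_backend("ptx")
def emit_ptx(kernel) -> str:
    # NVIDIA path: emit PTX with the kernel's schedule applied.
    return f"// PTX for {kernel.op.name}, tile={kernel.schedule.tile}"

def compile_for(kernel, target: str) -> str:
    return BACKENDS[target](kernel)
```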

Performance

Backend‑aware passes (tiling, vectorization, shared‑memory usage), with planned autotuning and kernel libraries for matmul/conv/reduction, as sketched below. Academic pilots drive real workloads.
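The planned autotuning reduces to a search over schedule parameters per backend. A minimal exhaustive-search sketch (parameter names illustrative), which could reuse the benchmark helper shown earlier:

```python
import itertools
from typing import Callable, Iterable

def autotune(run_with: Callable[[int, int], float],
             tiles: Iterable[int] = (16, 32, 64),
             threads: Iterable[int] = (128, 256, 512)):
    """Pick the (tile, threads_per_block) pair with the lowest runtime.
    run_with(tile, threads) executes the kernel under that schedule
    and returns its median time in milliseconds."""
    best_cfg, best_ms = None, float("inf")
    for tile, tpb in itertools.product(tiles, threads):
        ms = run_with(tile, tpb)
        if ms < best_ms:
            best_cfg, best_ms = (tile, tpb), ms
    return best_cfg, best_ms
```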

Product, not just research

Open interfaces for collaboration; commercial core for reliability, support, and enterprise features. We avoid the trap of becoming a pure LLVM research project.

Roadmap (next 6 months)

  • Optimized PTX backend → parity demos vs. vendor libs
  • ROCm bring‑up + perf passes
  • SPIR‑V/Vulkan path for Intel & portability
  • Autotuning + benchmark suite (open results)
  • Docs & CLI: developer onboarding
  • Academic pilots (EPFL, ENS) & early enterprise design partners

Get in Touch

We’re assembling a founding team (a systems/compilers engineer and an HPC mathematician) and seeking pilot users.

If you’re a professor/researcher: AXIR is a practical testbed for algorithms across backends. If you’re an investor: this is a business‑first standard with enterprise demand.

Prefer GitHub? github.com/AidenKuro10/axiomos-axir