Universal AI OS
Write once → run everywhere

Run any AI model on any processor

AXIR (Axiomos IR) is a neutral, portable execution layer that translates CUDA, HIP, SYCL, and OpenCL into a single intermediate representation, targeting CPUs & GPUs via PTX, ROCm, and SPIR‑V/Vulkan paths. It is a business‑first platform: open interfaces, commercial core.

The Problem

AI has become core infrastructure. Yet ~70% of AI compute runs on a single proprietary stack (CUDA/NVIDIA). Every new chip (TPU, Trainium, photonic, neuromorphic) ships with its own programming model — creating fragmentation, lock‑in, porting costs, and slower innovation.

  • Porting kernels across stacks costs months and fractures teams.
  • Enterprises want multi‑hardware strategies without vendor lock‑in.
  • New chips struggle to reach developers at scale.

We build the neutral layer between algorithms and hardware. Write once → run everywhere.

The Vision

A Universal Operating System for AI: one portable standard powering models across CPUs & GPUs, on‑prem and cloud.

  • Neutral & portable
  • Performance‑aware codegen
  • Developer‑first UX

Prototype Status

Frontends
CUDA, HIP, SYCL, OpenCL

Backends
CPU (NumPy runtime); GPU paths for PTX (CUDA) and ROCm; SPIR‑V/Vulkan path in progress.

Kernels today

vector_add, saxpy, reduce_sum, matmul, conv

The prototype runs on CPU and real GPUs (NVIDIA, AMD, Intel). Optimized PTX/ROCm codegen is underway.
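As a point of reference, the five kernels above reduce to straightforward NumPy operations on the CPU backend. A minimal sketch of what such reference implementations could look like (function names and signatures are illustrative, not AXIR's actual API):

```python
import numpy as np

# Illustrative CPU reference kernels (hypothetical signatures).
# These define the semantics every GPU backend must reproduce,
# which is what makes a NumPy runtime useful for correctness & CI.

def vector_add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a + b

def saxpy(alpha: float, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    return alpha * x + y  # a*x + y

def reduce_sum(x: np.ndarray) -> float:
    return float(x.sum())

def matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a @ b

def conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Naive NCHW convolution, stride 1, no padding.
    n, c, h, wi = x.shape
    k, _, r, s = w.shape
    out = np.zeros((n, k, h - r + 1, wi - s + 1), dtype=x.dtype)
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + r, j:j + s]  # (n, c, r, s) window
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out
```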

| Demo | What it shows |
|---|---|
| 4×5 kernels matrix | All frontends compile to AXIR and execute across backends |
| Matmul/Conv | Performance baseline vs. vendor libs; parity is the target |
| Autotuner (WIP) | Search over tiling/threads/blocks per backend |

Benchmarks

Comparisons vs. vendor libraries (cuBLAS/cuDNN, rocBLAS/MIOpen, oneDNN): initial baselines today, with parity as the target.

| Kernel | Backend | Shape / Params | Vendor Lib | AXIR | Δ (%) |
|---|---|---|---|---|---|
| matmul | PTX (NVIDIA) | 4096×4096 | XX GFLOPS | YY GFLOPS | +/-Z% |
| conv | ROCm (AMD) | NCHW 64×3×224×224 | XX ms | YY ms | +/-Z% |
| reduce_sum | SPIR‑V (Intel) | 1e8 elems | XX ms | YY ms | +/-Z% |

Methodology: warm‑up runs, median of N=20 timed runs, same precision, same device. Full scripts will be published.
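For concreteness, that methodology maps to a small timing harness along these lines (a sketch; the published scripts may differ in detail):

```python
import time
import statistics
from typing import Callable

def benchmark(kernel: Callable[[], None], warmup: int = 5, runs: int = 20) -> float:
    """Warm-up runs, then the median of N timed runs, in milliseconds.
    On GPU backends the timed call must include a device synchronization."""
    for _ in range(warmup):
        kernel()  # discarded warm-up iterations
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        kernel()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)
```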

Technical Principles

IR & Lowering

Frontends ingest CUDA/HIP/SYCL/OpenCL → lower to AXIR. We separate algorithmic intent from hardware scheduling, enabling portable optimizations.
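AXIR's concrete format is not spelled out here, but the separation of intent from scheduling can be pictured roughly as follows (all class and field names are hypothetical, for illustration only):

```python
from dataclasses import dataclass, field

# Hypothetical picture of the intent/schedule split.
# These are NOT AXIR's real data structures.

@dataclass
class Op:
    """Algorithmic intent: what to compute, hardware-independent."""
    name: str                  # e.g. "matmul"
    inputs: list[str]
    outputs: list[str]
    attrs: dict = field(default_factory=dict)  # shapes, dtypes, ...

@dataclass
class Schedule:
    """Hardware scheduling: how to run it on a given target."""
    backend: str               # "ptx" | "rocm" | "spirv" | "cpu"
    tile: tuple[int, int] = (32, 32)
    threads_per_block: int = 256

@dataclass
class Kernel:
    op: Op                     # stays fixed across all backends
    schedule: Schedule         # swapped per target, op untouched
```

Keeping the op stable while only the schedule varies is what makes passes like tiling portable across backends.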

Backends

Target‑specific codegen for PTX (NVIDIA), ROCm (AMD), and SPIR‑V/Vulkan (Intel/portable). CPU runtime for correctness & CI.
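A common way to structure such target-specific codegen is a registry keyed by target name. A minimal sketch reusing the hypothetical Kernel class above (again illustrative, not AXIR's actual interface):

```python
from typing import Callable, Dict

# Hypothetical backend registry: maps a target name to a codegen
# function turning an AXIR kernel into target-specific output.
CodegenFn = Callable[["Kernel"], str]
BACKENDS: Dict[str, CodegenFn] = {}

def register_backend(name: str):
    def wrap(fn: CodegenFn) -> CodegenFn:
        BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("cpu")
def emit_cpu(kernel) -> str:
    # CPU path: dispatch to the NumPy reference runtime (correctness/CI).
    return f"numpy_runtime.run({kernel.op.name!r})"

@register_backend("ptx")
def emit_ptx(kernel) -> str:
    # NVIDIA path: emit PTX with the kernel's schedule applied.
    return f"// PTX for {kernel.op.name}, tile={kernel.schedule.tile}"

def compile_for(kernel, target: str) -> str:
    return BACKENDS[target](kernel)
```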

Performance

Backend‑aware passes (tiling, vectorization, shared‑memory usage), with planned autotuning and kernel libraries for matmul/conv/reduction, as sketched below. Academic pilots drive real workloads.
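The planned autotuning reduces to a search over schedule parameters per backend. A minimal exhaustive-search sketch (parameter names illustrative), which could reuse the benchmark helper shown earlier:

```python
import itertools
from typing import Callable, Iterable

def autotune(run_with: Callable[[int, int], float],
             tiles: Iterable[int] = (16, 32, 64),
             threads: Iterable[int] = (128, 256, 512)):
    """Pick the (tile, threads_per_block) pair with the lowest runtime.
    run_with(tile, threads) executes the kernel under that schedule
    and returns its median time in milliseconds."""
    best_cfg, best_ms = None, float("inf")
    for tile, tpb in itertools.product(tiles, threads):
        ms = run_with(tile, tpb)
        if ms < best_ms:
            best_cfg, best_ms = (tile, tpb), ms
    return best_cfg, best_ms
```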

Product, not just research

Open interfaces for collaboration; commercial core for reliability, support, and enterprise features. We avoid the trap of becoming a pure LLVM research project.

Roadmap (next 6 months)

  • Optimized PTX backend → parity demos vs. vendor libs
  • ROCm bring‑up + perf passes
  • SPIR‑V/Vulkan path for Intel & portability
  • Autotuning + benchmark suite (open results)
  • Docs & CLI: developer onboarding
  • Academic pilots (EPFL, ENS) & early enterprise design partners

Get in Touch

We’re assembling a founding team (a systems/compilers engineer and an HPC mathematician) and seeking pilot users.

If you’re a professor/researcher: AXIR is a practical testbed for algorithms across backends. If you’re an investor: this is a business‑first standard with enterprise demand.

Prefer GitHub? github.com/AidenKuro10/axiomos-axir