Run any AI model on any processor
AXIR (Axiomos IR) is a neutral, portable execution layer that translates CUDA, HIP, SYCL, and OpenCL into a single intermediate representation targeting CPUs & GPUs via PTX, ROCm, and SPIR‑V/Vulkan paths. Business‑first platform: open interfaces, commercial core.
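As a rough sketch of the intended developer flow, here is a hypothetical usage example in Python. The `axir` package, `axir.compile`, `module.run`, and all parameter names are illustrative assumptions, not the project's actual API:

```python
import numpy as np
import axir  # hypothetical package name, for illustration only

# A stock CUDA kernel, written once.
cuda_src = """
__global__ void saxpy(int n, float a, float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
"""

# Lower the CUDA source to AXIR once...
module = axir.compile(cuda_src, frontend="cuda")

x = np.arange(1024, dtype=np.float32)
y = np.ones(1024, dtype=np.float32)

# ...then run the same module on whichever backend is available.
module.run("saxpy", grid=(4,), block=(256,),
           args=(1024, 2.0, x, y), backend="cpu")  # NumPy runtime
# backend="ptx"   -> NVIDIA
# backend="rocm"  -> AMD
# backend="spirv" -> Vulkan/Intel (in progress)
```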
The Problem
AI has become core infrastructure. Yet ~70% of AI compute runs on a single proprietary stack (CUDA/NVIDIA). Every new chip (TPU, Trainium, photonic, neuromorphic) ships with its own programming model — creating fragmentation, lock‑in, porting costs, and slower innovation.
- Porting kernels across stacks costs months and fractures teams.
- Enterprises want multi‑hardware strategies without vendor lock‑in.
- New chips struggle to reach developers at scale.
The Vision
A Universal Operating System for AI: one portable standard powering models across CPUs & GPUs, on‑prem and cloud.
Prototype Status
Frontends
CUDA, HIP, SYCL, OpenCL
Backends
CPU (NumPy runtime); GPU paths for PTX (NVIDIA) and ROCm (AMD); SPIR‑V/Vulkan path in progress.
Kernels today
The prototype runs on CPU and on real GPUs (NVIDIA, AMD, Intel). Optimized PTX/ROCm codegen is underway.
| Demo | What it shows |
|---|---|
| 4×5 kernel matrix | All frontends compile to AXIR and execute across backends |
| Matmul/Conv | Performance baseline vs. vendor libs; parity is the target |
| Autotuner (WIP) | Search over tiling/threads/blocks per backend |
Benchmarks
Comparisons against vendor libraries (cuBLAS/cuDNN, rocBLAS/MIOpen, oneDNN). These numbers are an initial baseline; parity is the target.
| Kernel | Backend | Shape / Params | Vendor Lib | AXIR | Δ (%) |
|---|---|---|---|---|---|
| matmul | PTX (NVIDIA) | 4096×4096 | XX GFLOPS | YY GFLOPS | ±Z% |
| conv | ROCm (AMD) | NCHW 64×3×224×224 | XX ms | YY ms | ±Z% |
| reduce_sum | SPIR‑V (Intel) | 1e8 elems | XX ms | YY ms | ±Z% |
Methodology: warm‑up runs, median of N=20, same precision, same device. Full scripts will be published.
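A minimal sketch of that timing discipline in Python (the published scripts may differ; `run_kernel` here is a placeholder for the kernel under test, and GPU backends would additionally need a device synchronize inside the timed region):

```python
import statistics
import time

def benchmark(run_kernel, warmup=5, reps=20):
    """Warm-up runs first, then the median of N=20 timed runs,
    matching the methodology above. Returns milliseconds."""
    for _ in range(warmup):  # warm caches, JIT, driver state
        run_kernel()
    samples = []
    for _ in range(reps):
        t0 = time.perf_counter()
        run_kernel()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)

# Usage (same device and precision as the vendor baseline):
#   median_ms = benchmark(lambda: np.matmul(a, b))
```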
Technical Principles
IR & Lowering
Frontends ingest CUDA/HIP/SYCL/OpenCL and lower them to AXIR. We separate algorithmic intent from hardware scheduling, enabling portable optimizations.
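One way to picture that separation is a toy data model in which the op describes only what to compute, while a swappable schedule carries the hardware decisions. This is an illustrative sketch, not AXIR's actual representation:

```python
from dataclasses import dataclass, field, replace

@dataclass
class Schedule:
    """Hardware scheduling decisions, kept apart from the algorithm."""
    tile: tuple = (32, 32)        # tile sizes chosen per backend
    vector_width: int = 1         # vectorization factor
    use_shared_mem: bool = False  # stage tiles in shared memory?

@dataclass
class AxirOp:
    """Algorithmic intent: what to compute, independent of any target."""
    name: str                     # e.g. "matmul"
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)
    schedule: Schedule = field(default_factory=Schedule)

# The same intent can carry different schedules for different backends:
op = AxirOp("matmul", ["A", "B"], ["C"], {"M": 4096, "N": 4096, "K": 4096})
op_gpu = replace(op, schedule=Schedule(tile=(128, 64), vector_width=4,
                                       use_shared_mem=True))
op_cpu = replace(op, schedule=Schedule(tile=(64, 64), vector_width=8))
```

Portable passes can then rewrite schedules without ever touching the intent, which is what makes optimizations reusable across backends.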
Backends
Target‑specific codegen for PTX (NVIDIA), ROCm (AMD), and SPIR‑V/Vulkan (Intel/portable). CPU runtime for correctness & CI.
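For the CPU path, a correctness runtime can be as simple as an interpreter that maps op names to NumPy reference implementations. The graph encoding below is an illustrative assumption, not the real runtime:

```python
import numpy as np

# Reference implementations: each op name maps to plain NumPy,
# giving CI a known-good result to check GPU backends against.
CPU_OPS = {
    "matmul":     lambda a, b: a @ b,
    "relu":       lambda x: np.maximum(x, 0.0),
    "reduce_sum": lambda x: np.sum(x),
}

def run_on_cpu(graph, env):
    """Interpret a toy op graph: a list of
    (op_name, input_names, output_name) tuples, mutating `env`."""
    for op_name, in_names, out_name in graph:
        env[out_name] = CPU_OPS[op_name](*(env[n] for n in in_names))
    return env

# Usage: execute a small graph, then compare a GPU backend against it.
env = {"A": np.random.rand(64, 64).astype(np.float32),
       "B": np.random.rand(64, 64).astype(np.float32)}
env = run_on_cpu([("matmul", ("A", "B"), "C"),
                  ("relu", ("C",), "D")], env)
# np.testing.assert_allclose(gpu_out, env["D"], rtol=1e-5)  # in CI
```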
Performance
Backend‑aware passes (tiling, vectorization, shared‑memory usage), with autotuning and kernel libraries for matmul/conv/reduction planned. Academic pilots drive real workloads.
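In its simplest form, the planned autotuner is a search over schedule parameters per backend, reusing a timing helper like the one in the benchmarks section. Everything here (the candidate values, the `compile_kernel` callable) is an illustrative assumption:

```python
import itertools

def autotune(compile_kernel, benchmark,
             tiles=(16, 32, 64, 128), threads=(64, 128, 256)):
    """Exhaustively search tile sizes and thread counts, keeping the
    configuration with the lowest median runtime on this backend."""
    best_ms, best_cfg = float("inf"), None
    for tile, n_threads in itertools.product(tiles, threads):
        try:
            kernel = compile_kernel(tile=tile, threads=n_threads)
        except ValueError:  # configuration invalid for this backend
            continue
        ms = benchmark(kernel)  # warm-up + median, as in the methodology
        if ms < best_ms:
            best_ms, best_cfg = ms, {"tile": tile, "threads": n_threads}
    return best_cfg, best_ms
```

Exhaustive search is only a starting point; once blocks, vector widths, and shared‑memory options enter the space, random or model‑guided search becomes the practical choice.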
Product, not just research
Open interfaces for collaboration; commercial core for reliability, support, and enterprise features. We avoid the trap of becoming a pure LLVM research project.
Roadmap (next 6 months)
Get in Touch
We’re assembling a founding team (a systems/compilers engineer and an HPC mathematician) and seeking pilot users.
If you’re a professor/researcher: AXIR is a practical testbed for algorithms across backends. If you’re an investor: this is a business‑first standard with enterprise demand.
Prefer GitHub? AidenKuro10/axiomos-axir