/ architecture

The Madness Stack.

Physics-bound infrastructure, layer by layer.

We do not optimize software; we eliminate it. Traditional clouds are built on general-purpose abstractions stacked twelve deep. Avahana collapses that stack using unikernel-like OS principles, kernel-bypass networking, and control-plane-as-a-service primitives — delivering infrastructure that runs at the speed of the underlying wire and silicon.

01 Foundations

Three core pillars.

Zero-Copy Networking

eBPF and XDP process packets at the NIC driver level, bypassing the heavy Linux TCP/IP stack entirely.

Hard Multi-Tenancy

We reject namespaces for isolation. Every tenant gets a microVM (Kata / Cloud Hypervisor) that boots in under 100ms.

The Hollow Fleet

A centralized control-plane factory manages distributed, immutable worker nodes via persistent reverse tunnels.

Layer by layer.

Twelve layers, each rejecting a legacy abstraction in favor of a primitive that respects the hardware.

L0

The Substrate

Talos Linux

An immutable, API-driven OS that boots in RAM. No SSH. No console. No package manager.

The physics

  • General-purpose Linux is technical debt. Talos is a <80MB OS image whose only job is to run Kubernetes.
  • System extensions are immutable overlays applied at boot time for specialized hardware (GPUs, NICs).
  • We manage 10,000 nodes through a gRPC API, eliminating configuration drift entirely.

Why not Ubuntu / RHEL

Configuration drift, SSH-based ops, manual patching. Doesn't scale operationally.

Operational gain

  • Repurpose a Web Node into a GPU Node by updating its MachineConfig through the Talos API and rebooting.
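
A minimal sketch of what such a role change might look like as a machine config patch, built in Python and serialized to JSON (which is also valid YAML). `machine.nodeLabels` and `machine.install.image` are fields of the Talos machine configuration; the label key, the role names, and the installer image references are hypothetical placeholders, not values taken from this stack.

```python
import json

# Hypothetical installer images. A GPU role would point at an image that
# bundles the GPU system extensions. These registry names are placeholders.
INSTALLER_IMAGES = {
    "web": "registry.example.internal/talos-installer:base",
    "gpu": "registry.example.internal/talos-installer:gpu",
}


def role_patch(role: str) -> dict:
    """Build a machine config patch that switches a node to the given role."""
    if role not in INSTALLER_IMAGES:
        raise ValueError(f"unknown role: {role}")
    return {
        "machine": {
            # Label the node so schedulers can target the new role.
            "nodeLabels": {"example.internal/role": role},
            # Select the installer image carrying the role's extensions.
            "install": {"image": INSTALLER_IMAGES[role]},
        }
    }


if __name__ == "__main__":
    print(json.dumps(role_patch("gpu"), indent=2))
```

The sketch only builds the patch document; submitting it through the Talos API and rebooting the node is left to the caller.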

L1

The Factory

Kamaji (Control Plane as a Service)

Tenant control planes run as lightweight pods sharing a hyper-optimized multi-tenant etcd backend.

The physics

  • Stateless API servers. Kubernetes control planes spin up as pods in <5 seconds.
  • Shared etcd: thousands of tenants on a single NVMe-backed datastore, separated by cryptographic keys.
  • 100% Kubernetes API compatibility — every Helm chart, operator, and tool works out of the box.

Why not Custom Rust control plane

We prioritize ecosystem compatibility over reinvention. Standard kube-apiserver wins.

Operational gain

  • 1,000+ control planes per bare metal node.
  • Provisioning a tenant cluster is just starting a pod.
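
To illustrate "provisioning is just starting a pod", here is a hedged sketch that builds a Kamaji `TenantControlPlane` manifest as a plain dictionary. The field names follow the Kamaji CRD as we understand it and should be checked against the installed CRD version; the tenant name, namespace, Kubernetes version, and the `default` datastore name are assumptions for illustration.

```python
import json


def tenant_control_plane(name: str, namespace: str, version: str,
                         replicas: int = 2) -> dict:
    """Return a TenantControlPlane manifest as a plain dict."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {
        "apiVersion": "kamaji.clastix.io/v1alpha1",
        "kind": "TenantControlPlane",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Name of the shared datastore resource (assumed to be "default").
            "dataStore": "default",
            "controlPlane": {
                # The tenant API server runs as an ordinary Deployment.
                "deployment": {"replicas": replicas},
                "service": {"serviceType": "LoadBalancer"},
            },
            "kubernetes": {"version": version},
        },
    }


if __name__ == "__main__":
    manifest = tenant_control_plane("tenant-a", "tenants", "v1.30.0")
    print(json.dumps(manifest, indent=2))
```

Submitting a document like this to the management cluster is what would trigger the control-plane pods to start; the sketch stops at producing the manifest.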

L2

Virtualization

Polymorphic Isolation Engine

Adaptive runtime: microVMs on metal, hardened containers on cloud, Wasm for edge functions.

The physics

  • On bare metal: Cloud Hypervisor (Rust) via Kata. 100% native speed, hardware-level isolation.
  • On public cloud: hardened native containers wrapped in user namespaces and policed by Tetragon eBPF. 99% native speed, provider-grade enforcement.
  • For high-density logic: WebAssembly (WasmEdge). Millisecond startup for AI inference and serverless.

Why not Nested KVM on cloud VMs

VM-in-VM can cost 20–50% of native performance. Unacceptable.
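
The selection logic described in this layer can be sketched as a small function. This is an illustration under stated assumptions, not the engine itself: the `Substrate` enum and the runtime names (`kata-clh`, `hardened-runc`, `wasmedge`) are hypothetical labels standing in for whatever RuntimeClass names the platform actually uses.

```python
from enum import Enum


class Substrate(Enum):
    BARE_METAL = "bare-metal"
    PUBLIC_CLOUD = "public-cloud"


def pick_runtime(substrate: Substrate, is_wasm_function: bool = False) -> str:
    """Choose an isolation runtime for a workload (names are hypothetical)."""
    if is_wasm_function:
        # High-density functions run as WebAssembly on any substrate.
        return "wasmedge"
    if substrate is Substrate.BARE_METAL:
        # Hardware virtualization is available: use a Kata microVM.
        return "kata-clh"
    # On cloud VMs, avoid nested virtualization and harden the container.
    return "hardened-runc"


if __name__ == "__main__":
    for s in Substrate:
        print(s.value, "->", pick_runtime(s))
```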

L3

The Fabric

Cilium (eBPF) + Gateway API

Adaptive networking: BGP on metal for line-rate routing, accelerated overlays in the cloud.

The physics

  • On Avahana metal: BGP advertises pod IPs to top-of-rack switches. Zero encapsulation overhead.
  • On hybrid/cloud: Geneve encapsulation with eBPF host routing — bypasses iptables to minimize overlay penalty.
  • Cilium ClusterMesh: a single flat IP space across regions, secured with WireGuard.

Why not Calico / kube-proxy

iptables-based forwarding caps throughput and visibility.

Operational gain

  • Hubble provides packet-level visibility — DNS, HTTP, latency — without instrumenting application code.
  • Identity-aware L3–L7 network policies enforced in Cilium's eBPF datapath.
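
As a sketch of what an identity-aware L3–L7 policy looks like, the function below builds a `CiliumNetworkPolicy` that lets one labeled workload issue HTTP GET requests to another. The structure follows the `cilium.io/v2` policy schema; the policy name, app labels, port, and path are hypothetical values chosen for illustration.

```python
import json


def http_get_policy(name: str, server_app: str, client_app: str,
                    port: int, path: str) -> dict:
    """Allow `client_app` to send HTTP GET `path` to `server_app` on `port`."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            # The workloads this policy protects, selected by label identity.
            "endpointSelector": {"matchLabels": {"app": server_app}},
            "ingress": [{
                # L3: only endpoints carrying the client label may connect.
                "fromEndpoints": [{"matchLabels": {"app": client_app}}],
                "toPorts": [{
                    # L4: restrict to a single TCP port.
                    "ports": [{"port": str(port), "protocol": "TCP"}],
                    # L7: restrict to one HTTP method and path pattern.
                    "rules": {"http": [{"method": "GET", "path": path}]},
                }],
            }],
        },
    }


if __name__ == "__main__":
    policy = http_get_policy("allow-frontend-get", "api", "frontend", 8080, "/v1/.*")
    print(json.dumps(policy, indent=2))
```

Selecting peers by label rather than by IP address is what makes the policy identity-aware: it keeps working as pods are rescheduled and their addresses change.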

L4

Storage & State

LINSTOR + OpenEBS LocalPV

Dual-engine NVMe plane: replicated DRBD for stateful pets, raw passthrough for cattle.

The physics

  • Tier 1: LINSTOR / DRBD network replication. Interrupt-driven (vs SPDK's CPU-bound polling). <0.2ms overhead.
  • Tier 0: OpenEBS LocalPV — direct NVMe passthrough for AI training, Neon-style databases, line-rate IOPS.

Why not Pure SPDK

Burns 100% CPU polling idle drives. Wasteful on small nodes.

Operational gain

  • Standard (Replicated) and Turbo (Local NVMe) storage classes, automatically.
  • ~20% compute cost saved on small nodes vs polling-based stacks.
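
Two small helpers illustrate the reasoning in this layer. The first maps a workload's durability need onto the two storage classes; the second shows why a dedicated polling core weighs more heavily on small nodes. The class names `standard` and `turbo` mirror the tiers named above but are assumed identifiers, and the core counts in the example are illustrative, not measurements from this stack.

```python
def pick_storage_class(needs_replication: bool) -> str:
    """Replicated DRBD volumes for stateful pets, local NVMe for cattle."""
    return "standard" if needs_replication else "turbo"


def polling_cpu_share(polling_cores: int, total_cores: int) -> float:
    """Fraction of a node's CPU consumed by cores pinned to busy-polling."""
    if total_cores < 1 or not 0 <= polling_cores <= total_cores:
        raise ValueError("core counts out of range")
    return polling_cores / total_cores


if __name__ == "__main__":
    print(pick_storage_class(needs_replication=True))
    # One polling core costs a quarter of a 4-core node,
    # but under 2% of a 64-core node.
    print(polling_cpu_share(1, 4), polling_cpu_share(1, 64))
```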

L5

The Edge

Pingora (Rust)

Programmable, self-hosted edge proxy with auth, billing, and WAF compiled into the binary.

The physics

  • Mode A — Global Acceleration: Anycast IPs announced via BGP at our PoPs; Pingora routes through ClusterMesh / WireGuard to the workload.
  • Mode B — Local Ingress: VIPs announced via ARP/BGP inside customer LANs; provides hardware-load-balancer behavior with no external dependencies.

Why not Nginx + sidecars + external CDN

Sidecar latency, fragmented logic, vendor dependency.

Operational gain

  • 10k+ connections/sec, zero drops during upgrades.

L6

Management Plane

Go controllers + ConnectRPC + Zitadel

Translates human intent to infrastructure specs. Stateless, horizontally scalable.

The physics

  • Business state in PostgreSQL (users, organizations, billing, audit).
  • Infrastructure state in etcd. The Cortex (this management plane) never touches a server directly: it updates the manifest, and operators converge the running state to match.
  • OpenMeter ingests telemetry to compute usage in real time.
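
The "update the manifest, let operators converge" pattern can be sketched as a pure reconcile function that compares desired and actual state and emits the actions needed to close the gap. This is a generic illustration of the reconcile pattern, not the platform's controller code; the resource names and specs in the example are made up.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return (action, name) pairs that move `actual` toward `desired`."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            # Anything running that the manifest no longer declares is removed.
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"cluster-a": {"nodes": 3}, "cluster-b": {"nodes": 5}}
    actual = {"cluster-b": {"nodes": 4}, "cluster-c": {"nodes": 1}}
    print(reconcile(desired, actual))
```

Because the function only reads state and returns a plan, it can run repeatedly and on any replica, which is what allows the management plane to stay stateless.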

L7

Supply Chain

BuildKit (Kata-isolated) + Dragonfly P2P

Hostile builds in microVMs; image distribution accelerated via peer-to-peer at the edge.

The physics

  • Rootless BuildKit wrapped in Kata microVMs. Cloud Native Buildpacks auto-detect languages — no Dockerfile required.
  • Dragonfly: a single supernode pulls a 10GB image once, then streams chunks to neighbors over the LAN. ~95% WAN bandwidth saved on edge clusters.
  • Cosign signs every image. Admission control on the Talos-based clusters refuses unsigned images.
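
The bandwidth saving from peer-to-peer distribution follows from simple arithmetic: if only the seed nodes pull over the WAN and every other node fetches chunks over the LAN, the saved fraction is 1 minus seeds divided by nodes, regardless of image size. The sketch below assumes ideal sharing (no re-fetches, no protocol overhead); the cluster sizes are illustrative and do not come from the source.

```python
def wan_savings(nodes: int, seed_pulls: int = 1) -> float:
    """Fraction of WAN traffic avoided when only `seed_pulls` nodes use the WAN."""
    if nodes < 1 or not 1 <= seed_pulls <= nodes:
        raise ValueError("need 1 <= seed_pulls <= nodes")
    return 1 - seed_pulls / nodes


if __name__ == "__main__":
    for n in (2, 10, 20, 100):
        print(f"{n:>3} nodes: {wan_savings(n):.0%} of WAN traffic avoided")
```

Under these assumptions a saving of about 95% corresponds to roughly twenty nodes per seed, and the saving grows with cluster size.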

Operational gain

  • Aggressive NVMe-backed build caching targets Git-to-production in under 60 seconds.

L8

The Interface — Telemetry

Vector (Rust) + ClickHouse

Logs, metrics, and traces unified into one stream. Petabyte-scale, sub-second queries.

The physics

  • Vector runs on every node. No heavy Java or Ruby agents.
  • ClickHouse handles petabyte-scale telemetry with sub-second query latency.
  • Live-tail debugging works even for air-gapped nodes.

L9

API Contract

ConnectRPC (Protobuf)

The entire platform surface area defined in Protobuf. Type-safe clients for Go and TypeScript.

The physics

  • Speaks the Connect, gRPC, and gRPC-Web protocols over HTTP/1.1 or HTTP/2. No browser proxy needed.
  • Backend, CLI, and Web Console are guaranteed in sync — generated from one source of truth.
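
To show why no browser proxy is needed, here is a sketch of how a unary call is framed in the Connect protocol: a plain HTTP POST to `/<package>.<Service>/<Method>` with a JSON body. The function only builds the request description and sends nothing; the base URL, service name, method name, and message fields in the example are hypothetical, not part of the published API.

```python
import json


def connect_unary_request(base_url: str, service: str, method: str,
                          message: dict) -> dict:
    """Describe a Connect-protocol unary call as method, URL, headers, body."""
    return {
        "method": "POST",
        # The path is the fully qualified service name plus the RPC name.
        "url": f"{base_url.rstrip('/')}/{service}/{method}",
        "headers": {
            "Content-Type": "application/json",
            "Connect-Protocol-Version": "1",
        },
        "body": json.dumps(message).encode("utf-8"),
    }


if __name__ == "__main__":
    req = connect_unary_request(
        "https://api.example.internal/",       # hypothetical endpoint
        "example.cluster.v1.ClusterService",   # hypothetical service
        "CreateCluster",                       # hypothetical method
        {"name": "demo"},
    )
    print(req["method"], req["url"])
```

Since this is an ordinary POST with a JSON payload, any HTTP/1.1 client, including a browser's fetch, can issue it directly.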

L10

The CLI

avactl — Go + Cobra + ConnectRPC

A single static binary for Linux, macOS, and Windows. Primary tool for super-admins and CI/CD.

The physics

  • Shares business-logic libraries with the Backend, reducing duplicated code by ~40%.

L11

The Web Consoles

Next.js (App Router) + shadcn/ui

Two consoles from a shared component library: User Console for customers, Admin Console for operators.

The physics

  • Server-side rendering for instant page loads.
  • Connects directly to ConnectRPC (Layer 9) and ClickHouse (Layer 8) for real-time data.

Performance & SLA targets.

What this stack is designed to deliver. Every claim links back to a layer above.

| Category | Metric | Industry standard | Avahana target | Technical enabler |
|---|---|---|---|---|
| Provisioning | Control Plane Creation | 5–15 min | <15 sec | Kamaji (pod-based) |
| Compute | VM Cold Start | 30–120 sec | <200 ms | Cloud Hypervisor |
| Compute | Cloud VM Overhead | 20–50% | <1% | Tetragon eBPF enforcement |
| Compute | Wasm Cold Start | | <5 ms | WasmEdge |
| Network | Network Overhead | 50 ms+ | <5 ms | eBPF / XDP bypass |
| Storage | IOPS Performance | Throttled | Line rate | OpenEBS LocalPV / NVMe Gen5 |
| Edge | Global Routing | DNS propagation | Anycast (<1 sec) | BGP + Pingora |
| Supply Chain | Build to Deploy | 5–10 min | <60 sec | BuildKit + Dragonfly P2P |
| Observability | Telemetry Latency | 1–5 min | <5 sec | Vector + ClickHouse |
| Operations | Admin : Node Ratio | 1 : 100 | 1 : 5,000 | Talos (immutable OS) |

Want to run on this stack?

The stack is real. The product is in active build. Get an invite when the beta opens.