From f5f58d5d164e48154af8f312f5ab638f4584560f Mon Sep 17 00:00:00 2001
From: bilby91
Date: Sun, 21 Jun 2026 15:58:03 -0300
Subject: [PATCH 1/4] runtime: add checkpoint/restore (CheckpointRuntime) + Podman backend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Docker's checkpoint/restore is broken on current engines (the netns
bind-mount on restore — open upstream containerd#12141 / moby#37344).
Podman does the full round trip (`checkpoint --export` /
`restore --import`: process + memory + writable rootfs in a portable,
node-independent archive). So expose an optional CheckpointRuntime
sub-interface and add a Podman backend that implements it.

Contract (backend-agnostic):
- runtime.CheckpointRuntime + CheckpointSpec/RestoreSpec/CheckpointRef
  (export/import model), Capabilities.Checkpoint, and typed errors
  (ErrCheckpointUnsupported, CheckpointFailedError, RestoreFailedError).
- Engine.Checkpoint / Engine.Restore wrappers (checkpoint.go) that
  type-assert the sub-interface and gate on the capability bit. Restore
  returns a fully reattached *Workspace (re-inspect, rebuild config) via
  a shared reattachWorkspace helper factored out of Attach.
- Engine.CheckpointProject / RestoreProject (checkpoint_project.go): a
  thin orchestrator over the per-container primitive for multi-service
  compose projects — enumerate by com.docker.compose.project label,
  checkpoint each service to its own archive + a manifest, restore each
  and reattach the devcontainer service as the Primary workspace.
  Decoupled from the compose-go / `docker compose` machinery
  (label-based only).

Backend (runtime/podman, Linux):
- Embeds *docker.Runtime over Podman's docker-compatible socket for the
  standard surface; drives checkpoint/restore + buildah build through
  the libpod REST API on the same socket via a thin stdlib net/http
  client. No CLI shell-out, no pkg/bindings (that spike took deps
  76->384 + cgo/gpgme), zero new modules, cross-compiles clean.
- Capabilities().Checkpoint gates on libpod reachability plus an
  optional Options.CheckpointProbe: the REST transport has no
  `criu check` equivalent, so the deployer (who runs the service)
  asserts CRIU.

Tests:
- Unit: httptest-based libpod request/response shapes, fake runtimes for
  the engine wrappers + project orchestrator (incl. partial-failure → no
  manifest, RestoreFailedError propagation), criu-probe gating, and the
  reattach image-metadata merge.
- Integration (PODMAN_SOCKET-gated; skip without a live Podman+CRIU
  host): single-container reattach, multi-service project round trip,
  and a two-phase cross-node test (checkpoint on one store, restore on a
  fresh store that never pulled the image — proving the archive is
  self-contained; it also pins the
  workspace-bind-source-must-exist-on-the-destination requirement).
- CI: a test-integration-podman job installs podman+criu+crun+iptables,
  starts the socket, and runs the gated tests for real
  (continue-on-error until the hosted-runner CRIU capability is
  confirmed).

Validated end-to-end on real Podman+CRIU (all integration tests pass).
Design records: design/checkpoint-restore.md (primitive + project
orchestrator §3.3 + empirical runtime matrix) and
design/podman-backend.md.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .github/workflows/ci.yml                  |  53 +++
 .gitignore                                |   4 +
 README.md                                 |  18 +-
 attach.go                                 |  35 +-
 checkpoint.go                             | 136 ++++++
 checkpoint_project.go                     | 250 ++++++++++
 checkpoint_project_test.go                | 271 +++++++++++
 checkpoint_test.go                        | 198 ++++++++
 design/README.md                          |   2 +
 design/checkpoint-restore.md              | 426 ++++++++++++++++++
 design/podman-backend.md                  | 326 ++++++++++++++
 runtime/compose_primitives.go             |  12 +
 runtime/errors.go                         |  36 ++
 runtime/podman/build.go                   | 174 +++++++
 runtime/podman/checkpoint.go              |  89 ++++
 runtime/podman/integration_test.go        | 163 +++++++
 runtime/podman/libpod.go                  |  93 ++++
 runtime/podman/podman.go                  | 121 +++++
 runtime/podman/podman_test.go             | 207 +++++++++
 runtime/runtime.go                        |  75 +++
 .../podman_checkpoint_restore_test.go     | 141 ++++++
 test/integration/podman_crossnode_test.go | 169 +++++++
 .../podman_project_checkpoint_test.go     | 195 ++++++++
 23 files changed, 3179 insertions(+), 15 deletions(-)
 create mode 100644 checkpoint.go
 create mode 100644 checkpoint_project.go
 create mode 100644 checkpoint_project_test.go
 create mode 100644 checkpoint_test.go
 create mode 100644 design/checkpoint-restore.md
 create mode 100644 design/podman-backend.md
 create mode 100644 runtime/podman/build.go
 create mode 100644 runtime/podman/checkpoint.go
 create mode 100644 runtime/podman/integration_test.go
 create mode 100644 runtime/podman/libpod.go
 create mode 100644 runtime/podman/podman.go
 create mode 100644 runtime/podman/podman_test.go
 create mode 100644 test/integration/podman_checkpoint_restore_test.go
 create mode 100644 test/integration/podman_crossnode_test.go
 create mode 100644 test/integration/podman_project_checkpoint_test.go

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index bac2775..6e38de8 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -125,6 +125,59 @@ jobs:
           go test -race -count=1 -tags=integration -timeout=15m \
             -run "$pattern" ./test/integration/...
+  # Real Podman + CRIU checkpoint/restore. The runtime/podman backend and
+  # the Engine checkpoint/restore + project-orchestrator paths only execute
+  # against a live Podman socket with CRIU; their tests skip everywhere
+  # else (PODMAN_SOCKET-gated). This job installs that stack and runs them
+  # for real. The cross-node test (TestPodmanXNode_*) needs two hosts, so
+  # it skips here (no DCCKPT_XNODE_DIR) — run it on two machines by hand.
+  #
+  # continue-on-error: whether GitHub's hosted ubuntu kernel can do CRIU
+  # checkpoint/restore end-to-end is unverified (criu check can pass while
+  # a real dump still hits a missing kernel feature). Kept non-blocking
+  # like the darwin VZ job below; drop it once a green run is confirmed,
+  # else move this job to a self-hosted CRIU-capable runner.
+  test-integration-podman:
+    runs-on: ubuntu-latest
+    needs: [lint, test-linux]
+    continue-on-error: true
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-go@v6
+        with:
+          go-version: "1.25"
+          cache: true
+      - name: Install Podman + CRIU + crun
+        run: |
+          set -euo pipefail
+          sudo apt-get update -qq
+          # iptables is REQUIRED: CRIU shells out to iptables-restore to
+          # lock the network namespace during a dump (without it the dump
+          # fails with "execvp(iptables-restore) ... No such file").
+          sudo apt-get install -y -qq podman criu crun iptables uidmap
+          podman --version && criu --version && crun --version
+      - name: criu check (capability gate)
+        run: sudo criu check
+      - name: Start Podman API socket
+        run: |
+          set -euo pipefail
+          sudo mkdir -p /etc/containers
+          printf '[engine]\nevents_logger="file"\n' | sudo tee /etc/containers/containers.conf >/dev/null
+          sudo systemctl enable --now podman.socket
+          for i in $(seq 1 30); do sudo test -S /run/podman/podman.sock && break; sleep 1; done
+          sudo test -S /run/podman/podman.sock
+      - name: Build gated test binaries
+        run: |
+          go test -tags=integration -c ./test/integration -o ./int.test
+          go test -c ./runtime/podman -o ./podman.test
+      - name: Run Podman checkpoint/restore tests (root; checkpoint needs it)
+        env:
+          PODMAN_SOCKET: unix:///run/podman/podman.sock
+        run: |
+          set -euo pipefail
+          sudo -E ./podman.test -test.run TestIntegration -test.v -test.timeout 15m
+          sudo -E ./int.test -test.run '^TestPodman' -test.v -test.timeout 15m
+
   # Integration tests against a live Apple `container` daemon.
   #
   # Verified-on-CI status:
diff --git a/.gitignore b/.gitignore
index eb97613..fe50773 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,10 @@
 *.so
 *.dylib
 
+# Locally-built CLI/example binaries (extension-less Mach-O/ELF outputs)
+/dap
+/devcontainer
+
 # Test artifacts
 *.test
 *.out
diff --git a/README.md b/README.md
index 4e55bcb..47a4347 100644
--- a/README.md
+++ b/README.md
@@ -37,8 +37,15 @@ The container backend is pluggable. Pick one at engine construction time:
   embedded Swift bridge (`libACBridge.dylib`, dlopen'd at runtime).
   Lets you run devcontainers on Apple Silicon without Docker
   Desktop.
-
-Both backends implement the same `runtime.Runtime` interface — the
+- **`runtime/podman`** — Podman over its docker-compatible socket (the
+  same `moby/moby/client`, embedded), plus CRIU-backed
+  **checkpoint/restore** (`runtime.CheckpointRuntime`) driven through the
+  libpod REST API on that one socket. The only backend that can migrate a
+  running devcontainer — process + memory — to another node; see
+  [`design/checkpoint-restore.md`](design/checkpoint-restore.md). Linux
+  only; needs a running `podman system service`.
+
+All three backends implement the same `runtime.Runtime` interface — the
 engine, feature pipeline, lifecycle, and compose paths don't care which
 one you wire in.
 
@@ -177,6 +184,13 @@ Requires:
     system start` already up. Swift toolchain only if you're
     building the bridge from source — releases embed the pre-built
     dylib.
+  - **Podman:** Linux only. A running `podman system service` exposing
+    its socket (it serves both the docker-compatible and libpod APIs);
+    point the backend at it with `podman.Options{Socket}`.
+    Checkpoint/restore additionally requires `criu` (a CRIU-capable
+    kernel and an OCI runtime such as `crun`/`runc`). No BuildKit:
+    in-container builds go through buildah, so pre-built/pulled images
+    are the fast path.
 
 ## Quick start
 
diff --git a/attach.go b/attach.go
index e929830..518768e 100644
--- a/attach.go
+++ b/attach.go
@@ -63,17 +63,30 @@ func (e *Engine) AttachWith(ctx context.Context, id WorkspaceID, opts AttachOpti
 		return nil, err
 	}
 
-	// Reconstruct just enough config for the substituter. Attach can't
-	// reproduce the full ResolvedConfig (the source devcontainer.json may
-	// have changed since Up); callers that need it should Resolve again.
+	return e.reattachWorkspace(ctx, details, id, opts.LocalEnv), nil
+}
+
+// reattachWorkspace rebuilds a *Workspace from an already-inspected,
+// running container. It is shared by Attach (container found by label)
+// and Restore (container freshly imported from a checkpoint archive):
+// both have a live container and need the same MINIMAL config + bound
+// substituter + userEnv probe, without re-reading devcontainer.json.
+//
+// It reconstructs just enough config for the substituter (Attach can't
+// reproduce the full ResolvedConfig — the source devcontainer.json may
+// have changed since Up; callers that need it should Resolve again),
+// folds in the image's merged-config metadata label so callers see the
+// same RemoteUser / lifecycle hooks / probe config as Up, and re-probes
+// userEnv so subsequent Exec calls see the user's rc-file PATH additions.
+//
+// id stamps the workspace and cfg.DevcontainerID; localEnv may be nil
+// (falls back to os.Environ()).
+func (e *Engine) reattachWorkspace(ctx context.Context, details *runtime.ContainerDetails, id WorkspaceID, localEnv map[string]string) *Workspace {
 	cfg := configFromContainerLabels(details)
 	cfg.DevcontainerID = string(id)
 
-	// The container's image carries the merged-config metadata label
-	// from when Up created it; folding it in here means Attach-only
-	// callers see the same RemoteUser / lifecycle hooks / probe config
-	// as Up. Failures to read or parse the label are non-fatal — Attach
-	// then gives back a minimal cfg as before.
+	// Reading or parsing the metadata label is best-effort: failures
+	// leave baseLayers nil and we fall back to the minimal cfg.
 	var baseLayers []config.FeatureMetadata
 	if details.Image != "" {
 		if imgDetails, err := e.runtime.InspectImage(ctx, details.Image); err == nil && imgDetails != nil {
@@ -84,7 +97,6 @@ func (e *Engine) AttachWith(ctx context.Context, id WorkspaceID, opts AttachOpti
 			}
 		}
 	}
-	localEnv := opts.LocalEnv
 	if localEnv == nil {
 		localEnv = environAsMap(os.Environ())
 	}
@@ -104,11 +116,8 @@ func (e *Engine) AttachWith(ctx context.Context, id WorkspaceID, opts AttachOpti
 		subst: newSubstituter(cfg, details, localEnv),
 	}
 
-	// Re-probe on attach so subsequent Exec calls see PATH additions
-	// from the user's rc files. The original Up populated probedEnv,
-	// but a fresh Attach doesn't share that workspace value.
 	if probed, err := e.probeUserEnv(ctx, ws, cfg.UserEnvProbe); err == nil {
 		ws.probedEnv = probed
 	}
-	return ws, nil
+	return ws
 }
diff --git a/checkpoint.go b/checkpoint.go
new file mode 100644
index 0000000..1f9ee9b
--- /dev/null
+++ b/checkpoint.go
@@ -0,0 +1,136 @@
+package devcontainer
+
+import (
+	"context"
+	"fmt"
+
+	"github.com/crunchloop/devcontainer/runtime"
+)
+
+// CheckpointOptions configures Engine.Checkpoint.
+type CheckpointOptions struct {
+	// ArchivePath is where the portable checkpoint archive is written.
+	// Required. Point it at durable, transferable storage (the workspace
+	// volume, object storage) — the archive is self-contained, so a
+	// later Restore can run on a different node by moving this file.
+	ArchivePath string
+
+	// StopAfter stops/removes the container once the archive is written
+	// — the spot-eviction path, where the node is going away anyway.
+	// False keeps the container running ("backup" checkpoint).
+	StopAfter bool
+
+	// TCPEstablished requests checkpoint of established TCP connections.
+	// Recommended true for devcontainers: a container holding a live
+	// connection at checkpoint time fails to checkpoint without it.
+	TCPEstablished bool
+}
+
+// RestoreOptions configures Engine.Restore.
+type RestoreOptions struct {
+	// ArchivePath is the archive a prior Checkpoint wrote. Required.
+	ArchivePath string
+
+	// Name optionally names the restored container.
+	Name string
+
+	// TCPEstablished must match the checkpoint when the archive captured
+	// established connections.
+	TCPEstablished bool
+
+	// LocalEnv overrides os.Environ() for the reattached workspace's
+	// substituter localEnv pass. Nil means use the current process
+	// environment — matches AttachOptions.LocalEnv. On a cross-node
+	// restore the destination's env may differ from the source's, so a
+	// caller that cares can pin it here.
+	LocalEnv map[string]string
+}
+
+// Checkpoint writes a portable checkpoint archive for the workspace's
+// container (process + memory state plus the writable rootfs), so it can
+// later be restored — possibly on another node — by Restore.
+//
+// Returns ErrCheckpointUnsupported (wrapped) if the active backend does
+// not implement runtime.CheckpointRuntime or advertises
+// Capabilities().Checkpoint == false. Callers can errors.Is against
+// runtime.ErrCheckpointUnsupported and fall back to a cold path.
+//
+// Checkpoint is the primitive; deciding *when* to checkpoint (e.g. on a
+// spot-reclaim notice) is the caller's job.
+func (e *Engine) Checkpoint(ctx context.Context, ws *Workspace, opts CheckpointOptions) (runtime.CheckpointRef, error) {
+	if err := ctxIfDone(ctx); err != nil {
+		return runtime.CheckpointRef{}, err
+	}
+	if ws == nil || ws.Container == nil {
+		return runtime.CheckpointRef{}, fmt.Errorf("Checkpoint: workspace has no container")
+	}
+	if opts.ArchivePath == "" {
+		return runtime.CheckpointRef{}, fmt.Errorf("Checkpoint: ArchivePath is required")
+	}
+
+	cr, ok := e.runtime.(runtime.CheckpointRuntime)
+	if !ok || !e.runtime.Capabilities().Checkpoint {
+		return runtime.CheckpointRef{}, fmt.Errorf("Checkpoint: %w", runtime.ErrCheckpointUnsupported)
+	}
+
+	ref, err := cr.Checkpoint(ctx, ws.Container.ID, runtime.CheckpointSpec{
+		ArchivePath:    opts.ArchivePath,
+		StopAfter:      opts.StopAfter,
+		TCPEstablished: opts.TCPEstablished,
+	})
+	if err != nil {
+		return runtime.CheckpointRef{}, fmt.Errorf("checkpoint: %w", err)
+	}
+	return ref, nil
+}
+
+// Restore re-creates and resumes a container from a checkpoint archive
+// written by Checkpoint, reconstructing its mounts and re-attaching
+// networking, then rebuilds the *Workspace around it. The original
+// container may be gone (the migration case).
+//
+// The returned Workspace has the MINIMAL config Attach produces — the
+// devcontainer labels the checkpoint archive preserves plus the image's
+// merged-config metadata — with the substituter bound to the restored
+// container's live env and userEnv re-probed. It is enough to drive Exec
+// and Down; callers needing the full devcontainer.json view should
+// Resolve from source. See the Workspace type docs.
+//
+// Returns ErrCheckpointUnsupported (wrapped) when the backend can't, and
+// a *runtime.RestoreFailedError (from the backend) on a restore failure
+// — distinct from a cold-start failure, so callers can fall back to a
+// cold Up on the (intact) workspace volume.
+func (e *Engine) Restore(ctx context.Context, opts RestoreOptions) (*Workspace, error) {
+	if err := ctxIfDone(ctx); err != nil {
+		return nil, err
+	}
+	if opts.ArchivePath == "" {
+		return nil, fmt.Errorf("Restore: ArchivePath is required")
+	}
+
+	cr, ok := e.runtime.(runtime.CheckpointRuntime)
+	if !ok || !e.runtime.Capabilities().Checkpoint {
+		return nil, fmt.Errorf("Restore: %w", runtime.ErrCheckpointUnsupported)
+	}
+
+	c, err := cr.Restore(ctx, runtime.RestoreSpec{
+		ArchivePath:    opts.ArchivePath,
+		Name:           opts.Name,
+		TCPEstablished: opts.TCPEstablished,
+	})
+	if err != nil {
+		return nil, fmt.Errorf("restore: %w", err)
+	}
+
+	// Reattach: the restored container carries the devcontainer labels
+	// from the archive, so rebuild the Workspace the same way Attach
+	// does. inspectStable absorbs the post-restore state lag (the daemon
+	// reports state asynchronously after import-and-start). The workspace
+	// id is recovered from the container's label.
+	details, err := e.inspectStable(ctx, c.ID)
+	if err != nil {
+		return nil, fmt.Errorf("restore: inspect restored container %s: %w", c.ID, err)
+	}
+	id := WorkspaceID(details.Labels[LabelDevcontainerID])
+	return e.reattachWorkspace(ctx, details, id, opts.LocalEnv), nil
+}
diff --git a/checkpoint_project.go b/checkpoint_project.go
new file mode 100644
index 0000000..3b2d2db
--- /dev/null
+++ b/checkpoint_project.go
@@ -0,0 +1,250 @@
+package devcontainer
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"os"
+	"path/filepath"
+	"sort"
+
+	"github.com/crunchloop/devcontainer/compose"
+	"github.com/crunchloop/devcontainer/runtime"
+)
+
+// projectManifestName is the self-describing index a project checkpoint
+// writes into its archive directory. RestoreProject reads it to learn the
+// service set and restore order without re-deriving them.
+const projectManifestName = "project.json"
+
+// ProjectCheckpointOptions configures Engine.CheckpointProject.
+type ProjectCheckpointOptions struct {
+	// ArchiveDir is a directory (created if absent) that receives one
+	// archive per service container plus a manifest. Required. Point it at
+	// durable, transferable storage — the set is self-contained, so
+	// RestoreProject can run on another node by moving the directory.
+	ArchiveDir string
+
+	// StopAfter stops each container once its archive is written (the
+	// spot-eviction path). False keeps them running ("backup" checkpoint).
+	StopAfter bool
+
+	// TCPEstablished requests checkpoint of established TCP connections —
+	// recommended for a multi-service project, whose services hold live
+	// inter-container connections (see design/checkpoint-restore.md §7).
+	TCPEstablished bool
+}
+
+// ServiceCheckpoint records one service container's archive within a
+// project checkpoint.
+type ServiceCheckpoint struct {
+	Service     string `json:"service"`
+	ContainerID string `json:"containerId"`
+	// Archive is the archive's basename within the project ArchiveDir.
+	Archive string `json:"archive"`
+	Size    int64  `json:"size"`
+}
+
+// ProjectCheckpointRef describes a written project checkpoint.
+type ProjectCheckpointRef struct {
+	Project    string              `json:"project"`
+	ArchiveDir string              `json:"-"`
+	Services   []ServiceCheckpoint `json:"services"`
+}
+
+// ProjectRestoreOptions configures Engine.RestoreProject.
+type ProjectRestoreOptions struct {
+	// ArchiveDir is the directory a prior CheckpointProject wrote (it reads
+	// the manifest at projectManifestName). Required.
+	ArchiveDir string
+
+	// TCPEstablished must match the checkpoint when archives captured
+	// established connections.
+	TCPEstablished bool
+
+	// LocalEnv overrides os.Environ() for the reattached primary
+	// workspace's substituter (parity with RestoreOptions.LocalEnv).
+	LocalEnv map[string]string
+}
+
+// ProjectRestore is the result of restoring a multi-service project.
+type ProjectRestore struct {
+	Project string
+
+	// Primary is the reattached devcontainer workspace — the service whose
+	// restored container carries the dev.containers.id label. Nil if the
+	// project had no devcontainer service (e.g. an all-sidecar set).
+	Primary *Workspace
+
+	// Services maps compose service name → restored container for every
+	// service in the project (including the primary's container).
+	Services map[string]*runtime.Container
+}
+
+// CheckpointProject checkpoints every container of a compose project — the
+// project the given workspace belongs to — to per-service archives under
+// opts.ArchiveDir, then writes a manifest describing the set.
+//
+// The checkpoint primitive is per-container (design/checkpoint-restore.md
+// §3); CheckpointProject is the engine-level sequencer over it for a
+// multi-service project (decision recorded in §9). It enumerates the
+// project's containers by their com.docker.compose.project label and
+// checkpoints each via the same CheckpointRuntime the single-container
+// Engine.Checkpoint uses — so it inherits the same capability gate and
+// typed errors, and is independent of how the project was brought up.
+//
+// On any per-service failure it returns that error WITHOUT writing the
+// manifest, so a present manifest always implies a complete set (a partial
+// RestoreProject then fails cleanly on the missing manifest). Returns
+// ErrCheckpointUnsupported (wrapped) if the backend can't checkpoint.
+func (e *Engine) CheckpointProject(ctx context.Context, ws *Workspace, opts ProjectCheckpointOptions) (ProjectCheckpointRef, error) {
+	if err := ctxIfDone(ctx); err != nil {
+		return ProjectCheckpointRef{}, err
+	}
+	if ws == nil || ws.Container == nil {
+		return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: workspace has no container")
+	}
+	if opts.ArchiveDir == "" {
+		return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: ArchiveDir is required")
+	}
+	project := ws.Container.Labels[compose.LabelComposeProject]
+	if project == "" {
+		return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: workspace is not a compose project (missing %s label) — use Engine.Checkpoint for a single container", compose.LabelComposeProject)
+	}
+
+	cr, ok := e.runtime.(runtime.CheckpointRuntime)
+	if !ok || !e.runtime.Capabilities().Checkpoint {
+		return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: %w", runtime.ErrCheckpointUnsupported)
+	}
+
+	containers, err := e.runtime.ListContainers(ctx, runtime.LabelFilter{
+		Match: map[string]string{compose.LabelComposeProject: project},
+	})
+	if err != nil {
+		return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: list project %q containers: %w", project, err)
+	}
+	if len(containers) == 0 {
+		return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: no containers found for project %q", project)
+	}
+	// Deterministic order by service name so the manifest (and restore
+	// order) are stable across runs.
+	sort.Slice(containers, func(i, j int) bool {
+		return serviceName(containers[i]) < serviceName(containers[j])
+	})
+
+	if err := os.MkdirAll(opts.ArchiveDir, 0o755); err != nil {
+		return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: create archive dir: %w", err)
+	}
+
+	ref := ProjectCheckpointRef{Project: project, ArchiveDir: opts.ArchiveDir}
+	for _, c := range containers {
+		svc := serviceName(c)
+		archive := svc + ".tar"
+		cref, err := cr.Checkpoint(ctx, c.ID, runtime.CheckpointSpec{
+			ArchivePath:    filepath.Join(opts.ArchiveDir, archive),
+			StopAfter:      opts.StopAfter,
+			TCPEstablished: opts.TCPEstablished,
+		})
+		if err != nil {
+			return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: service %q: %w", svc, err)
+		}
+		ref.Services = append(ref.Services, ServiceCheckpoint{
+			Service: svc, ContainerID: c.ID, Archive: archive, Size: cref.Size,
+		})
+	}
+
+	// Manifest last: its presence marks a complete checkpoint.
+	if err := writeProjectManifest(opts.ArchiveDir, ref); err != nil {
+		return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: write manifest: %w", err)
+	}
+	return ref, nil
+}
+
+// RestoreProject restores every service archive recorded in the manifest
+// under opts.ArchiveDir and reattaches the project. The shared network
+// re-forms as the containers come back (per-container restore re-attaches
+// networking; design §7) — restore order is the manifest's (service-name)
+// order, which is forgiving for reconnecting services.
+//
+// The service whose restored container carries the dev.containers.id label
+// is reattached as the full Primary *Workspace (re-inspect + rebuild
+// config + bind substituter), the same as Engine.Restore; the rest are
+// returned as restored containers. Returns ErrCheckpointUnsupported
+// (wrapped) if the backend can't, and a *runtime.RestoreFailedError on a
+// per-service restore failure.
+func (e *Engine) RestoreProject(ctx context.Context, opts ProjectRestoreOptions) (*ProjectRestore, error) {
+	if err := ctxIfDone(ctx); err != nil {
+		return nil, err
+	}
+	if opts.ArchiveDir == "" {
+		return nil, fmt.Errorf("RestoreProject: ArchiveDir is required")
+	}
+
+	cr, ok := e.runtime.(runtime.CheckpointRuntime)
+	if !ok || !e.runtime.Capabilities().Checkpoint {
+		return nil, fmt.Errorf("RestoreProject: %w", runtime.ErrCheckpointUnsupported)
+	}
+
+	manifest, err := readProjectManifest(opts.ArchiveDir)
+	if err != nil {
+		return nil, fmt.Errorf("RestoreProject: %w", err)
+	}
+
+	out := &ProjectRestore{Project: manifest.Project, Services: map[string]*runtime.Container{}}
+	for _, svc := range manifest.Services {
+		c, err := cr.Restore(ctx, runtime.RestoreSpec{
+			ArchivePath:    filepath.Join(opts.ArchiveDir, svc.Archive),
+			TCPEstablished: opts.TCPEstablished,
+		})
+		if err != nil {
+			return nil, fmt.Errorf("RestoreProject: service %q: %w", svc.Service, err)
+		}
+		out.Services[svc.Service] = c
+
+		// Reattach the devcontainer service as the Primary workspace. Its
+		// restored container is the one carrying our id label (sidecars
+		// carry only compose labels), so inspect to find out.
+		details, err := e.inspectStable(ctx, c.ID)
+		if err != nil {
+			return nil, fmt.Errorf("RestoreProject: inspect restored %q (%s): %w", svc.Service, c.ID, err)
+		}
+		if id := details.Labels[LabelDevcontainerID]; id != "" && out.Primary == nil {
+			out.Primary = e.reattachWorkspace(ctx, details, WorkspaceID(id), opts.LocalEnv)
+		}
+	}
+	return out, nil
+}
+
+// serviceName returns a container's compose service name, falling back to
+// its container name when the label is absent (a non-compose-managed
+// container that nonetheless shares the project label).
+func serviceName(c runtime.Container) string {
+	if s := c.Labels[compose.LabelComposeService]; s != "" {
+		return s
+	}
+	return c.Name
+}
+
+func writeProjectManifest(dir string, ref ProjectCheckpointRef) error {
+	b, err := json.MarshalIndent(ref, "", " ")
+	if err != nil {
+		return err
+	}
+	return os.WriteFile(filepath.Join(dir, projectManifestName), b, 0o644)
+}
+
+func readProjectManifest(dir string) (ProjectCheckpointRef, error) {
+	b, err := os.ReadFile(filepath.Join(dir, projectManifestName))
+	if err != nil {
+		return ProjectCheckpointRef{}, fmt.Errorf("read project manifest in %q: %w", dir, err)
+	}
+	var ref ProjectCheckpointRef
+	if err := json.Unmarshal(b, &ref); err != nil {
+		return ProjectCheckpointRef{}, fmt.Errorf("parse project manifest: %w", err)
+	}
+	if len(ref.Services) == 0 {
+		return ProjectCheckpointRef{}, fmt.Errorf("project manifest has no services")
+	}
+	ref.ArchiveDir = dir
+	return ref, nil
+}
diff --git a/checkpoint_project_test.go b/checkpoint_project_test.go
new file mode 100644
index 0000000..1fa0f21
--- /dev/null
+++ b/checkpoint_project_test.go
@@ -0,0 +1,271 @@
+package devcontainer
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/crunchloop/devcontainer/compose"
+	"github.com/crunchloop/devcontainer/runtime"
+)
+
+// fakeProjectRuntime is a CheckpointRuntime that round-trips a whole
+// project: ListContainers filters the seeded set by label, Checkpoint
+// records each container's labels keyed by archive path (and writes a stub
+// archive file), and Restore recreates a fresh container from those
+// recorded labels — modelling podman preserving labels across the archive.
+type fakeProjectRuntime struct {
+	*fakeRuntime
+	archiveLabels map[string]map[string]string // archive path → original labels
+	restoreSeq    int
+
+	// failCheckpointID makes Checkpoint fail for that container id (to
+	// exercise the partial-failure → no-manifest path).
+	failCheckpointID string
+	// restoreErr, when set, makes every Restore fail with it.
+	restoreErr error
+}
+
+func newFakeProjectRuntime() *fakeProjectRuntime {
+	return &fakeProjectRuntime{fakeRuntime: newFakeRuntime(), archiveLabels: map[string]map[string]string{}}
+}
+
+func (f *fakeProjectRuntime) Capabilities() runtime.Capabilities {
+	c := f.fakeRuntime.Capabilities()
+	c.Checkpoint = true
+	return c
+}
+
+func (f *fakeProjectRuntime) ListContainers(ctx context.Context, filter runtime.LabelFilter) ([]runtime.Container, error) {
+	f.fakeRuntime.mu.Lock()
+	defer f.fakeRuntime.mu.Unlock()
+	var out []runtime.Container
+	for _, d := range f.fakeRuntime.containersByID {
+		if labelsMatch(d.Labels, filter.Match) {
+			c := d.Container
+			c.Labels = d.Labels
+			out = append(out, c)
+		}
+	}
+	return out, nil
+}
+
+func (f *fakeProjectRuntime) Checkpoint(ctx context.Context, id string, spec runtime.CheckpointSpec) (runtime.CheckpointRef, error) {
+	if id == f.failCheckpointID {
+		return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: errors.New("injected checkpoint failure")}
+	}
+	f.fakeRuntime.mu.Lock()
+	var labels map[string]string
+	if d := f.fakeRuntime.containersByID[id]; d != nil {
+		labels = d.Labels
+	}
+	f.archiveLabels[spec.ArchivePath] = labels
+	f.fakeRuntime.mu.Unlock()
+	if err := os.WriteFile(spec.ArchivePath, []byte("FAKE-TAR"), 0o600); err != nil {
+		return runtime.CheckpointRef{}, err
+	}
+	return runtime.CheckpointRef{ArchivePath: spec.ArchivePath, Size: int64(len("FAKE-TAR"))}, nil
+}
+
+func (f *fakeProjectRuntime) Restore(ctx context.Context, spec runtime.RestoreSpec) (*runtime.Container, error) {
+	if f.restoreErr != nil {
+		return nil, f.restoreErr
+	}
+	f.fakeRuntime.mu.Lock()
+	defer f.fakeRuntime.mu.Unlock()
+	f.restoreSeq++
+	id := fmt.Sprintf("restored-%d", f.restoreSeq)
+	labels := f.archiveLabels[spec.ArchivePath]
+	c := &runtime.Container{ID: id, State: runtime.StateRunning, Labels: labels}
+	f.fakeRuntime.containersByID[id] = &runtime.ContainerDetails{
+		Container: *c,
+		Labels:    labels,
+		Env:       []string{"HOME=/root", "PATH=/usr/bin"},
+	}
+	return c, nil
+}
+
+func labelsMatch(have, want map[string]string) bool {
+	for k, v := range want {
+		if have[k] != v {
+			return false
+		}
+	}
+	return true
+}
+
+func seedProjectContainer(f *fakeProjectRuntime, id string, labels map[string]string) {
+	f.fakeRuntime.mu.Lock()
+	defer f.fakeRuntime.mu.Unlock()
+	f.fakeRuntime.containersByID[id] = &runtime.ContainerDetails{
+		Container: runtime.Container{ID: id, Name: id, State: runtime.StateRunning, Labels: labels},
+		Labels:    labels,
+	}
+}
+
+func TestCheckpointRestoreProject_RoundTrip(t *testing.T) {
+	rt := newFakeProjectRuntime()
+	// A 3-service project: primary "app" carries the devcontainer id;
+	// "db" and "cache" are plain sidecars (compose labels only).
+	seedProjectContainer(rt, "app-1", map[string]string{
+		compose.LabelComposeProject: "dc-proj",
+		compose.LabelComposeService: "app",
+		LabelDevcontainerID:         "ws-app",
+		LabelLocalWorkspaceFolder:   "/work",
+	})
+	seedProjectContainer(rt, "db-1", map[string]string{
+		compose.LabelComposeProject: "dc-proj",
+		compose.LabelComposeService: "db",
+	})
+	seedProjectContainer(rt, "cache-1", map[string]string{
+		compose.LabelComposeProject: "dc-proj",
+		compose.LabelComposeService: "cache",
+	})
+	// A container from a DIFFERENT project must not be swept in.
+ seedProjectContainer(rt, "other-1", map[string]string{ + compose.LabelComposeProject: "other-proj", + compose.LabelComposeService: "app", + }) + + eng, _ := New(EngineOptions{Runtime: rt}) + ws := &Workspace{Container: &runtime.ContainerDetails{ + Container: runtime.Container{ID: "app-1"}, + Labels: map[string]string{compose.LabelComposeProject: "dc-proj"}, + }} + + ctx := context.Background() + dir := t.TempDir() + + ref, err := eng.CheckpointProject(ctx, ws, ProjectCheckpointOptions{ArchiveDir: dir, TCPEstablished: true}) + if err != nil { + t.Fatalf("CheckpointProject: %v", err) + } + if ref.Project != "dc-proj" || len(ref.Services) != 3 { + t.Fatalf("ref = %+v (want project dc-proj, 3 services)", ref) + } + // Deterministic service-name order: app, cache, db. + if ref.Services[0].Service != "app" || ref.Services[1].Service != "cache" || ref.Services[2].Service != "db" { + t.Fatalf("service order = %q/%q/%q, want app/cache/db", ref.Services[0].Service, ref.Services[1].Service, ref.Services[2].Service) + } + // Manifest written; archive files present. + if _, err := os.Stat(filepath.Join(dir, projectManifestName)); err != nil { + t.Fatalf("manifest not written: %v", err) + } + for _, s := range ref.Services { + if _, err := os.Stat(filepath.Join(dir, s.Archive)); err != nil { + t.Fatalf("archive for %q missing: %v", s.Service, err) + } + } + + pr, err := eng.RestoreProject(ctx, ProjectRestoreOptions{ArchiveDir: dir, TCPEstablished: true}) + if err != nil { + t.Fatalf("RestoreProject: %v", err) + } + if pr.Project != "dc-proj" || len(pr.Services) != 3 { + t.Fatalf("restore = %+v (want project dc-proj, 3 services)", pr) + } + // Every service came back. + for _, svc := range []string{"app", "cache", "db"} { + if pr.Services[svc] == nil { + t.Errorf("service %q not restored", svc) + } + } + // The devcontainer service is reattached as the Primary workspace, id + // recovered from the preserved label. 
+ if pr.Primary == nil { + t.Fatal("Primary workspace is nil — devcontainer service not reattached") + } + if pr.Primary.ID != "ws-app" { + t.Errorf("Primary.ID = %q, want ws-app (from dev.containers.id label)", pr.Primary.ID) + } + if pr.Primary.subst == nil { + t.Error("Primary workspace has no substituter") + } +} + +func TestCheckpointProject_Validation(t *testing.T) { + rt := newFakeProjectRuntime() + eng, _ := New(EngineOptions{Runtime: rt}) + ctx := context.Background() + + // Not a compose workspace (no project label). + ws := &Workspace{Container: &runtime.ContainerDetails{Container: runtime.Container{ID: "c1"}}} + if _, err := eng.CheckpointProject(ctx, ws, ProjectCheckpointOptions{ArchiveDir: t.TempDir()}); err == nil { + t.Fatal("want error for non-compose workspace") + } + // Missing ArchiveDir. + composeWS := &Workspace{Container: &runtime.ContainerDetails{ + Labels: map[string]string{compose.LabelComposeProject: "p"}, + }} + if _, err := eng.CheckpointProject(ctx, composeWS, ProjectCheckpointOptions{}); err == nil { + t.Fatal("want error for empty ArchiveDir") + } + // RestoreProject with no manifest in the dir. + if _, err := eng.RestoreProject(ctx, ProjectRestoreOptions{ArchiveDir: t.TempDir()}); err == nil { + t.Fatal("want error for missing manifest") + } +} + +// A mid-loop service failure aborts CheckpointProject WITHOUT writing the +// manifest, so a later RestoreProject fails cleanly rather than restoring a +// partial set. 
+func TestCheckpointProject_PartialFailureWritesNoManifest(t *testing.T) { + rt := newFakeProjectRuntime() + rt.failCheckpointID = "db-1" // "db" sorts after "app" → app succeeds, db fails + seedProjectContainer(rt, "app-1", map[string]string{ + compose.LabelComposeProject: "p", compose.LabelComposeService: "app", LabelDevcontainerID: "ws", + }) + seedProjectContainer(rt, "db-1", map[string]string{ + compose.LabelComposeProject: "p", compose.LabelComposeService: "db", + }) + eng, _ := New(EngineOptions{Runtime: rt}) + ws := &Workspace{Container: &runtime.ContainerDetails{ + Labels: map[string]string{compose.LabelComposeProject: "p"}, + }} + dir := t.TempDir() + + if _, err := eng.CheckpointProject(context.Background(), ws, ProjectCheckpointOptions{ArchiveDir: dir}); err == nil { + t.Fatal("want error when a service checkpoint fails") + } + if _, err := os.Stat(filepath.Join(dir, projectManifestName)); !os.IsNotExist(err) { + t.Fatalf("manifest must be absent after a partial failure (stat err = %v)", err) + } +} + +// A per-service restore failure propagates (wrapped) as *RestoreFailedError +// so callers can fall back to a cold project Up. 
+func TestRestoreProject_BackendErrorPropagates(t *testing.T) { + rt := newFakeProjectRuntime() + seedProjectContainer(rt, "app-1", map[string]string{ + compose.LabelComposeProject: "p", compose.LabelComposeService: "app", LabelDevcontainerID: "ws", + }) + eng, _ := New(EngineOptions{Runtime: rt}) + ws := &Workspace{Container: &runtime.ContainerDetails{ + Labels: map[string]string{compose.LabelComposeProject: "p"}, + }} + dir := t.TempDir() + if _, err := eng.CheckpointProject(context.Background(), ws, ProjectCheckpointOptions{ArchiveDir: dir}); err != nil { + t.Fatalf("CheckpointProject setup: %v", err) + } + + rt.restoreErr = &runtime.RestoreFailedError{ArchivePath: "x", Err: errors.New("criu boom")} + _, err := eng.RestoreProject(context.Background(), ProjectRestoreOptions{ArchiveDir: dir}) + var rfe *runtime.RestoreFailedError + if !errors.As(err, &rfe) { + t.Fatalf("want *RestoreFailedError, got %v", err) + } +} + +func TestCheckpointProject_UnsupportedBackend(t *testing.T) { + eng, _ := New(EngineOptions{Runtime: newFakeRuntime()}) + ws := &Workspace{Container: &runtime.ContainerDetails{ + Labels: map[string]string{compose.LabelComposeProject: "p"}, + }} + _, err := eng.CheckpointProject(context.Background(), ws, ProjectCheckpointOptions{ArchiveDir: t.TempDir()}) + if err == nil { + t.Fatal("want ErrCheckpointUnsupported for a non-checkpoint backend") + } +} diff --git a/checkpoint_test.go b/checkpoint_test.go new file mode 100644 index 0000000..bc15f84 --- /dev/null +++ b/checkpoint_test.go @@ -0,0 +1,198 @@ +package devcontainer + +import ( + "context" + "errors" + "testing" + + "github.com/crunchloop/devcontainer/feature" + "github.com/crunchloop/devcontainer/runtime" +) + +// fakeCheckpointRuntime wraps fakeRuntime to also implement +// runtime.CheckpointRuntime and advertise Capabilities().Checkpoint. 
+type fakeCheckpointRuntime struct { + *fakeRuntime + checkpointable bool + + gotCheckpointID string + gotCheckpointSpec runtime.CheckpointSpec + gotRestoreSpec runtime.RestoreSpec + checkpointErr error + restoreErr error +} + +func (f *fakeCheckpointRuntime) Capabilities() runtime.Capabilities { + c := f.fakeRuntime.Capabilities() + c.Checkpoint = f.checkpointable + return c +} + +func (f *fakeCheckpointRuntime) Checkpoint(ctx context.Context, id string, spec runtime.CheckpointSpec) (runtime.CheckpointRef, error) { + f.gotCheckpointID = id + f.gotCheckpointSpec = spec + if f.checkpointErr != nil { + return runtime.CheckpointRef{}, f.checkpointErr + } + return runtime.CheckpointRef{ArchivePath: spec.ArchivePath, Size: 42}, nil +} + +func (f *fakeCheckpointRuntime) Restore(ctx context.Context, spec runtime.RestoreSpec) (*runtime.Container, error) { + f.gotRestoreSpec = spec + if f.restoreErr != nil { + return nil, f.restoreErr + } + c := &runtime.Container{ID: "restored-1", State: runtime.StateRunning} + // Register the restored container so Engine.Restore's reattach + // (inspect → rebuild Workspace) finds it. Real podman restore + // preserves the original's labels in the new container, so carry the + // devcontainer id + local-workspace labels the reattach reads. + f.fakeRuntime.mu.Lock() + f.fakeRuntime.containersByID["restored-1"] = &runtime.ContainerDetails{ + Container: *c, + Labels: map[string]string{ + LabelDevcontainerID: "ws-restored-id", + LabelLocalWorkspaceFolder: "/work", + }, + Env: []string{"HOME=/root", "PATH=/usr/bin"}, + } + f.fakeRuntime.mu.Unlock() + return c, nil +} + +func wsWithContainer(id string) *Workspace { + return &Workspace{Container: &runtime.ContainerDetails{Container: runtime.Container{ID: id}}} +} + +// A runtime that doesn't implement CheckpointRuntime at all (plain +// fakeRuntime) must surface ErrCheckpointUnsupported on both verbs. 
+func TestCheckpoint_UnsupportedBackend(t *testing.T) { + eng, _ := New(EngineOptions{Runtime: newFakeRuntime()}) + + _, err := eng.Checkpoint(context.Background(), wsWithContainer("c1"), CheckpointOptions{ArchivePath: "/tmp/a.tar"}) + if !errors.Is(err, runtime.ErrCheckpointUnsupported) { + t.Fatalf("Checkpoint: want ErrCheckpointUnsupported, got %v", err) + } + + _, err = eng.Restore(context.Background(), RestoreOptions{ArchivePath: "/tmp/a.tar"}) + if !errors.Is(err, runtime.ErrCheckpointUnsupported) { + t.Fatalf("Restore: want ErrCheckpointUnsupported, got %v", err) + } +} + +// A backend that implements the interface but advertises +// Capabilities().Checkpoint == false is still unsupported (e.g. podman +// present but criu check failed). +func TestCheckpoint_CapabilityFalseIsUnsupported(t *testing.T) { + rt := &fakeCheckpointRuntime{fakeRuntime: newFakeRuntime(), checkpointable: false} + eng, _ := New(EngineOptions{Runtime: rt}) + + _, err := eng.Checkpoint(context.Background(), wsWithContainer("c1"), CheckpointOptions{ArchivePath: "/tmp/a.tar"}) + if !errors.Is(err, runtime.ErrCheckpointUnsupported) { + t.Fatalf("Checkpoint: want ErrCheckpointUnsupported, got %v", err) + } + if rt.gotCheckpointID != "" { + t.Fatalf("backend Checkpoint should not be called when capability is false") + } +} + +func TestCheckpoint_HappyPath(t *testing.T) { + rt := &fakeCheckpointRuntime{fakeRuntime: newFakeRuntime(), checkpointable: true} + eng, _ := New(EngineOptions{Runtime: rt}) + + ref, err := eng.Checkpoint(context.Background(), wsWithContainer("c1"), CheckpointOptions{ + ArchivePath: "/vol/ckpt.tar", + StopAfter: true, + TCPEstablished: true, + }) + if err != nil { + t.Fatalf("Checkpoint: unexpected error: %v", err) + } + if ref.ArchivePath != "/vol/ckpt.tar" || ref.Size != 42 { + t.Fatalf("Checkpoint: unexpected ref %+v", ref) + } + // Spec is threaded through from options + the workspace container id. 
+ if rt.gotCheckpointID != "c1" { + t.Fatalf("Checkpoint: backend got id %q, want c1", rt.gotCheckpointID) + } + if rt.gotCheckpointSpec.ArchivePath != "/vol/ckpt.tar" || !rt.gotCheckpointSpec.StopAfter || !rt.gotCheckpointSpec.TCPEstablished { + t.Fatalf("Checkpoint: backend got spec %+v", rt.gotCheckpointSpec) + } +} + +func TestRestore_HappyPath(t *testing.T) { + rt := &fakeCheckpointRuntime{fakeRuntime: newFakeRuntime(), checkpointable: true} + eng, _ := New(EngineOptions{Runtime: rt}) + + ws, err := eng.Restore(context.Background(), RestoreOptions{ArchivePath: "/vol/ckpt.tar", Name: "ws-restored", TCPEstablished: true}) + if err != nil { + t.Fatalf("Restore: unexpected error: %v", err) + } + // Restore reattaches a full *Workspace around the restored container: + // the container handle, the workspace id recovered from its label, and + // a substituter bound to its live env. + if ws == nil || ws.Container == nil || ws.Container.ID != "restored-1" { + t.Fatalf("Restore: unexpected workspace %+v", ws) + } + if ws.ID != "ws-restored-id" { + t.Fatalf("Restore: workspace id = %q, want ws-restored-id (from container label)", ws.ID) + } + if ws.subst == nil { + t.Fatal("Restore: workspace has no substituter") + } + if rt.gotRestoreSpec.ArchivePath != "/vol/ckpt.tar" || rt.gotRestoreSpec.Name != "ws-restored" || !rt.gotRestoreSpec.TCPEstablished { + t.Fatalf("Restore: backend got spec %+v", rt.gotRestoreSpec) + } +} + +// A backend RestoreFailedError propagates (wrapped) so callers can +// distinguish it from a cold-start failure and fall back to a cold Up. 
+func TestRestore_BackendErrorPropagates(t *testing.T) { + want := &runtime.RestoreFailedError{ArchivePath: "/vol/ckpt.tar", Err: errors.New("criu boom")} + rt := &fakeCheckpointRuntime{fakeRuntime: newFakeRuntime(), checkpointable: true, restoreErr: want} + eng, _ := New(EngineOptions{Runtime: rt}) + + _, err := eng.Restore(context.Background(), RestoreOptions{ArchivePath: "/vol/ckpt.tar"}) + var rfe *runtime.RestoreFailedError + if !errors.As(err, &rfe) { + t.Fatalf("Restore: want *RestoreFailedError, got %v", err) + } +} + +// reattachWorkspace (shared by Attach and Restore/RestoreProject) folds the +// restored image's devcontainer.metadata label into the reconstructed +// config — so a reattached workspace sees the same RemoteUser etc. as Up. +func TestReattachWorkspace_MergesImageMetadataLabel(t *testing.T) { + rt := newFakeRuntime() + rt.imagesByRef["img-meta"] = &runtime.ImageDetails{ + Labels: map[string]string{feature.MetadataLabel: `[{"remoteUser":"dc-user"}]`}, + } + eng, _ := New(EngineOptions{Runtime: rt}) + + details := &runtime.ContainerDetails{ + Container: runtime.Container{ID: "rc", Image: "img-meta", State: runtime.StateRunning}, + Labels: map[string]string{LabelDevcontainerID: "ws"}, + Env: []string{"HOME=/root"}, + } + ws := eng.reattachWorkspace(context.Background(), details, "ws", nil) + if ws.Config.RemoteUser != "dc-user" { + t.Fatalf("RemoteUser = %q, want dc-user (merged from the image %s label)", ws.Config.RemoteUser, feature.MetadataLabel) + } +} + +func TestCheckpoint_Validation(t *testing.T) { + rt := &fakeCheckpointRuntime{fakeRuntime: newFakeRuntime(), checkpointable: true} + eng, _ := New(EngineOptions{Runtime: rt}) + + // No container on the workspace. + if _, err := eng.Checkpoint(context.Background(), &Workspace{}, CheckpointOptions{ArchivePath: "/tmp/a.tar"}); err == nil { + t.Fatal("Checkpoint: want error for workspace with no container") + } + // Missing archive path. 
+ if _, err := eng.Checkpoint(context.Background(), wsWithContainer("c1"), CheckpointOptions{}); err == nil { + t.Fatal("Checkpoint: want error for empty ArchivePath") + } + if _, err := eng.Restore(context.Background(), RestoreOptions{}); err == nil { + t.Fatal("Restore: want error for empty ArchivePath") + } +} diff --git a/design/README.md b/design/README.md index d68a831..2907051 100644 --- a/design/README.md +++ b/design/README.md @@ -24,6 +24,8 @@ work shipped. | [`compose-native.md`](compose-native.md) | The runtime-agnostic compose orchestrator that drives any backend through `runtime.Runtime` primitives. Replaces (when opted in) the `docker compose` shell-out, and is what enables compose source on apple-container. | | [`features.md`](features.md) | The Dev Container Features pipeline: OCI / HTTPS / local resolution, DAG ordering, dockerfile generation, the pre-baked-image fast path, and the content-addressed cache. | | [`structured-errors.md`](structured-errors.md) | The `*devcontainer.Error` surface returned from every public failure path. Code catalog, `Cause` chain conventions, and the `StderrCarrier` interface for subprocess-output access. | +| [`checkpoint-restore.md`](checkpoint-restore.md) | The optional `CheckpointRuntime` sub-interface: CRIU-backed checkpoint/restore for migrating spot-evicted devcontainers. Pass-2 records the empirical finding that docker's restore is broken upstream and only **Podman** (`checkpoint --export`/`restore --import`) works end-to-end — so the primitive lands in a new `runtime/podman` backend. Includes the runtime matrix, capability gating, integration options, and what's proven vs open. 
| +| [`podman-backend.md`](podman-backend.md) | The implementation plan for the Podman backend that makes checkpoint/restore real: reuse the docker/moby backend against Podman's docker-compatible socket and add C/R via libpod, the build-path risk (BuildKit vs buildah) that gates it, the phased plan (de-risk spikes → contract → backend → consumer adoption), and the file-level change list. | ## What's *not* here diff --git a/design/checkpoint-restore.md b/design/checkpoint-restore.md new file mode 100644 index 0000000..3820554 --- /dev/null +++ b/design/checkpoint-restore.md @@ -0,0 +1,426 @@ +# Design — Checkpoint / Restore + +**Status:** Draft for review — Pass 2 (empirically revised 2026-06-19) +**Date:** 2026-06-19 +**Scope:** an optional `Runtime` sub-interface, `CheckpointRuntime`, that +checkpoints a running container's process + memory state (and writable +rootfs) to a portable archive and restores it later, possibly on another +host — to migrate spot-evicted devcontainers without losing in-memory +work. Defines the primitive, the capability gate, the engine wrappers, +and the backend that actually implements it. Backends that can't do it +yield `ErrCheckpointUnsupported` from the engine wrappers and advertise `Capabilities().Checkpoint == false`. + +> **Pass-2 headline:** a day of empirical testing on a real consumer +> workspace pod (on a managed Kubernetes cluster) **disproved the original +> premise that the docker backend would implement this.** Docker's +> checkpoint/restore is broken on current versions (a known, open upstream +> bug). The mechanism works end-to-end only on **Podman** +> (`checkpoint --export` / `restore --import`). The primitive below +> survives; the backend that implements it changes from `runtime/docker` +> to a new `runtime/podman`. §2 records what we tested and why. The +> original docker-centric design is preserved in Appendix A as the +> reasoning trail. 
+ +Companion to `design/runtime.md` (the `Runtime` interface this extends) +and `design/structured-errors.md` (the error surface). Follows the +optional sub-interface pattern established by `ComposeRuntime` +(`design/compose-native.md` §3). + +--- + +## 1. Motivation + +The primary consumer runs coding agents inside devcontainers on +**spot instances**. A spot node can be reclaimed at any time. Today a +reclaim kills the workspace: the agent process dies and any in-memory +work (an in-progress build, a half-written file held open, a running +test, the agent's own working state) is lost. + +The platform gets advance warning of a reclaim (cloud providers deliver a +termination notice ~30s–2min ahead). The target flow: + +```text +1. Platform detects the node-reclaim notice. +2. Platform tells the runtime to CHECKPOINT the devcontainer + (process + memory + rootfs → a portable archive on the workspace volume). +3. A new pod starts on a healthy node, RESTORES the archive, and the + agent resumes mid-task instead of cold-booting. +``` + +This is textbook CRIU live-migration; the contribution is exposing it as +a clean library primitive with honest capability gating. + +## 2. What we tested (2026-06-19) and what it proved + +We ran the whole stack against a live consumer workspace pod and a dedicated +bench pod (same workspace runtime image, custom entrypoint that starts +dockerd/containerd and idles, so we could drive checkpoint/restore by +hand without the workspace supervisor tearing things down). + +### 2.1 The environment is capable + +On the real workspace pod's inner daemon (the devcontainers run in a +docker-in-docker inside the `runtime` container): + +- `docker 29.2.1`, `containerd v2.2.5`, `runc 1.3.6`, `criu 4.1.1`. +- Storage: `overlayfs` (containerd snapshotter), **data-root on the + workspace PVC** (`/workspace/docker`). 
This matters: the inner docker's + entire graph — every container's writable layer and named volumes — + already lives on the PVC and survives a pod move. +- The `runtime` container is **privileged**; dockerd runs as root with + `cap_sys_admin` + `cap_checkpoint_restore`. +- `criu check` (as root) → **"Looks good."** The node kernel fully + supports CRIU. (An earlier failure was only because criu was invoked as + an unprivileged user.) + +So kernel + CRIU + runc + containerd are all capable. **The blocker is +purely the container-manager layer**, not the substrate. + +### 2.2 Runtime matrix — what works + +| Layer | Checkpoint | Restore | Verdict | +| --- | --- | --- | --- | +| **`docker checkpoint` CLI** | ✅ (~0.5s, idle) | ❌ netns bind-mount `/proc/0/ns/net` fails; custom `--checkpoint-dir` unsupported on restore | **Dead end** (open upstream bug) | +| **raw `runc` (plain OCI bundle)** | ✅ | ✅ with `--empty-ns network` — counter resumed 10→14 | Works, but bypasses any manager | +| **raw `runc` on a docker-managed container** | ✅ | ❌ docker unmounts rootfs + removes the bundle on task exit | No home to restore into | +| **containerd `ctr containers …` (plain `ctr run`)** | ✅ `--rw --task` | ✅ `--rw --live` — counter resumed 9→13, into a *new* container | Works for manager-free containers | +| **containerd `ctr` on a docker-created container** | ❌ `snapshot does not exist` | — | Docker leaves the containerd container's `SnapshotKey` **empty** | +| **nerdctl (CNI bridge) via `ctr`** | ✅ `--rw` (proper `SnapshotKey`) | ❌ nerdctl's OCI hooks (CNI + `/etc/hosts`) fail outside nerdctl's control | Manager hooks break generic restore | +| **Podman `checkpoint --export` / `restore --import`** | ✅ | ✅ **full e2e** — memory resumed 7→10, **bridge networking re-attached & egress working**, into a fresh container | **This is the path.** | + +### 2.3 The two recurring failure modes + +1. 
**Docker's restore is broken.** `docker start --checkpoint` fails on + the network-namespace bind-mount (`/proc/0/ns/net`, pid 0) regardless + of network mode (bridge *and* none). This is a **known, open upstream + bug** on the current containerd-integrated engine + ([containerd#12141](https://github.com/containerd/containerd/issues/12141), + our exact stack), and docker's custom-checkpoint-dir support has been + broken since the containerd 1.0 integration + ([moby#37344](https://github.com/moby/moby/issues/37344)). CRIU's own + project [recommends Podman over Docker](https://criu.org/Docker). + Docker's CLI also can't pass the CRIU options that would help + (`--empty-ns`, `--tcp-established`) — those live behind + `/etc/criu/runc.conf` at the runc layer, which fixes TCP handling but + *not* the daemon-level netns bug. + +2. **Container managers inject state/hooks that generic restore can't + reconstruct.** Docker leaves the containerd container's `SnapshotKey` + empty (so `ctr` can't bundle the rootfs). nerdctl bakes CNI + hosts + OCI hooks that fail when restored outside nerdctl. In both cases the + *checkpoint* succeeds but a *manager-agnostic restore* cannot rebuild + the container's environment (mounts, `/etc/hosts`, network). + +**Podman is the one tool that owns the full lifecycle on both ends:** +`restore --import` reconstructs the rootfs, the bind-mount sources, and +re-attaches the network itself. That is exactly why it works end-to-end +where everything else stalls — and why CRIU recommends it. + +### 2.4 Filesystem note (supersedes the old §2) + +The original draft warned that "a checkpoint is process state, not the +filesystem," making cross-node restore fragile. Two findings soften this: + +- Podman's `--export` archive **bundles the writable rootfs layer** + alongside the CRIU images, so the artifact is self-contained and + portable by construction. 
+- The consumer's inner docker keeps **data-root on the PVC**, so writable + layers persist across pods anyway. + +The constraint is no longer "everything mutable must be on a volume" — the +export artifact carries the rootfs. The constraint that remains is that +the **destination must be able to reconstruct the container's external +mounts** (the workspace bind, secrets, and any other injected mounts) — +which the orchestrator can, because it created the devcontainer and knows +its mount set. + +## 3. The primitive + +Mirror `ComposeRuntime`: an optional sub-interface the engine +type-asserts, gated by `Capabilities()`. Backends that don't implement it +are invisible to the rest of the library. The shape now models Podman's +**export/import** (a portable archive), not docker's checkpoint-dir. + +```go +package runtime + +// CheckpointRuntime is the optional sub-interface a Runtime implements +// when it can checkpoint a running container to a portable archive +// (process + memory via CRIU, plus the writable rootfs layer) and later +// restore it — possibly on another host — into a fresh container. +// +// Implemented by runtime/podman (podman container checkpoint --export / +// restore --import). NOT implemented by runtime/docker: docker's restore +// is broken on current engines (see design/checkpoint-restore.md §2 and +// Appendix A). Backends without it cause Engine.Checkpoint/Restore to +// return ErrCheckpointUnsupported. +type CheckpointRuntime interface { + // Checkpoint writes a self-contained checkpoint archive for a running + // container to spec.ArchivePath. The archive carries the CRIU image, + // the writable rootfs diff, and the config needed to restore. With + // spec.StopAfter the container is stopped/removed after the archive + // is written (the eviction path); otherwise it keeps running. 
+ Checkpoint(ctx context.Context, id string, spec CheckpointSpec) (CheckpointRef, error) + + // Restore re-creates and resumes a container from a checkpoint + // archive, reconstructing its mounts and re-attaching networking. + // Restores into a NEW container (migration), so the source may be + // gone. Returns the new Container handle. + Restore(ctx context.Context, spec RestoreSpec) (*Container, error) +} + +// CheckpointSpec configures Checkpoint → `podman container checkpoint`. +type CheckpointSpec struct { + // ArchivePath is where the export archive is written. Point it at the + // workspace PVC (or anywhere that travels to the destination — a + // registry blob, object storage). Maps to `--export`. + ArchivePath string + + // StopAfter leaves the container stopped after export (eviction + // path). False keeps it running ("backup" checkpoint). + StopAfter bool + + // TCPEstablished requests checkpoint of established TCP connections + // (`--tcp-established`). Needed if the agent holds connections we + // want to survive; otherwise they reset and the agent reconnects. + TCPEstablished bool +} + +// RestoreSpec configures Restore → `podman container restore --import`. +// Note: unlike the old docker model, no RunSpec is needed — the archive +// is self-describing (image, config, mounts, rootfs). +type RestoreSpec struct { + ArchivePath string // the archive Checkpoint wrote (--import) + Name string // optional new container name + TCPEstablished bool // must match the checkpoint if it had connections +} + +// CheckpointRef describes a written checkpoint archive. +type CheckpointRef struct { + ArchivePath string + // Size is the archive size in bytes — feeds the platform's + // eviction-window / transfer budgeting. + Size int64 +} +``` + +### 3.1 Restore is into a new container, by design + +Podman `restore --import` creates a fresh container from the archive — it +does not need the original to exist. 
This is the migration shape exactly: +checkpoint on the dying pod, ship the archive, import on a new pod. We +verified this end-to-end (removed the original entirely, imported into a +new container, memory + networking intact). No `RunContainer`-then-start +dance is needed (that was a docker-model artifact; see Appendix A §3.1). + +### 3.2 No `ListCheckpoints` / `RemoveCheckpoint` in v1 + +The archive is a plain file the platform owns and reclaims (PVC/registry +lifecycle). No server-side enumeration needed; defer it. + +### 3.3 Project orchestration above the primitive (decision 2026-06-21) + +The primitive stays strictly per-container (§3). §9 originally pushed +*all* multi-container sequencing onto the platform. **Revised:** the +engine also ships a thin **project orchestrator** — +`Engine.CheckpointProject` / `RestoreProject` (root `checkpoint_project.go`) +— layered above the per-container primitive, so a caller can checkpoint and +restore a whole compose project in one call rather than re-implementing the +loop. It is intentionally decoupled from the compose-go / `docker compose` +machinery: it identifies the project's containers purely by their +`com.docker.compose.project` label (via `runtime.ListContainers`), +checkpoints each through the same `CheckpointRuntime` the single-container +path uses, and writes one archive per service plus a `project.json` +manifest. Restore reads the manifest, restores each archive, and reattaches +the devcontainer service (the one carrying `dev.containers.id`) as the +`Primary *Workspace` — the rest are returned as restored containers. + +Model notes (validated by the Phase-0 spike, §7): + +- **Order:** checkpoint/restore in deterministic service-name order; restore + order is forgiving because reconnecting services self-heal. +- **Network:** the shared network re-forms as containers come back + (`restore --import` re-attaches networking, and Podman restores the + original container name, so service-name DNS resolves again). 
The network + must still exist on the target; recreating it cross-node is the + orchestrator's caller's job (or a follow-up). +- **Completeness:** the manifest is written last, so its presence implies a + complete set; a partial checkpoint leaves no manifest and `RestoreProject` + fails cleanly. +- **Scale:** one container per service in v1 (no compose `scale`). + +## 4. Capability gate + +Add one field to the `Capabilities` struct +(`runtime/compose_primitives.go`): + +```go +type Capabilities struct { + // ... existing fields ... + + // Checkpoint reports whether this backend can checkpoint/restore a + // running container (CheckpointRuntime). True on runtime/podman when + // the libpod API is reachable (and a deployer-supplied CRIU probe, if + // any, passes); false on runtime/docker (restore is broken upstream) + // and runtime/applecontainer. + Checkpoint bool +} +``` + +The podman backend probes at construction (libpod reachable, plus an +optional `Options.CheckpointProbe` the deployer supplies to assert CRIU — +the REST transport can't run `criu check` itself; see +`design/podman-backend.md` §5.3) and sets the bit. The engine checks it +before attempting an operation and returns a typed error (§6) rather than +surfacing an opaque failure. + +## 5. Implementation: `runtime/podman` + +The mechanism lives in a **new backend**, `runtime/podman`, implementing +the full `runtime.Runtime` interface plus `CheckpointRuntime`. The +checkpoint/restore methods map directly to verified commands: + +```text +Checkpoint → podman container checkpoint --export ARCHIVE \ + [--tcp-established] [--leave-running=!StopAfter] +Restore → podman container restore --import ARCHIVE \ + [--tcp-established] [--name NAME] +``` + +Three integration options for talking to podman, lightest first: + +1. **Shell out to the `podman` CLI.** Simplest, lowest coupling — and the + library already shells out for `docker compose`. Recommended for v1. +2. 
**Thin HTTP client against the libpod REST API** + (`POST /libpod/containers/{name}/checkpoint` + `/restore`), served by + `podman system service`. No heavy deps. Checkpoint/restore is a + *libpod* extension, not in podman's docker-compatible API, so target + the libpod endpoints. +3. **`github.com/containers/podman/v5/pkg/bindings/containers`** + (`Checkpoint` / `Restore`). Official and typed, but pulls in the very + large `containers/podman` module (cgo, storage build tags) — a heavy + dependency for a Go library. Avoid unless we want the full surface. + +### 5.1 The bigger architectural implication + +This backend means **running devcontainers under Podman instead of +Docker**. That is a consumer-runtime decision beyond this library, but it +is the price of working checkpoint/restore: docker cannot do it on current +versions. Podman runs standard OCI/Docker images and offers a +docker-compatible API + `podman-compose`, so it is a plausible swap, but +it is real migration work and should be costed separately. + +## 6. Error surface + +Per `design/structured-errors.md`: + +- `ErrCheckpointUnsupported` — active runtime doesn't implement + `CheckpointRuntime` or `Capabilities().Checkpoint == false`. Returned by + the engine wrappers before any work. +- `CheckpointFailedError` — checkpoint/export failed; carries the + container id and podman's stderr via the `StderrCarrier` convention. +- `RestoreFailedError` — import/restore failed; carries the archive path + and podman's message. Distinct from a cold-start failure so the + platform can deterministically **fall back to a cold `Up`** (workspace + data on the PVC is intact; only in-memory state is lost). + +## 7. Validation status + +What's **proven** (2026-06-19, bench pod, real image): + +- ✅ Kernel/CRIU/runc/containerd all capable (`criu check` good). 
+- ✅ Podman `checkpoint --export` → remove original → `restore --import` + round trip: memory resumed (counter 7→10), **bridge networking + re-attached and egress working**, single 43 KB portable archive, + ~1.8s checkpoint / ~0.5s restore on an idle container. +- ✅ Docker is a dead end (netns restore bug, reproduced; corroborated by + open upstream issues). +- ✅ **Cross-pod transfer / self-contained archive.** Checkpointed on + pod A, copied the archive to pod B (separate, *empty* Podman store that + had never pulled the image), `restore --import` on B: container resumed + (counter 9→12), networking functional (egress ok), and the image was + populated *from the archive*. The archive needs nothing node-local — + cross-node is just "copy the file." (Pods were same-node by scheduler + chance; stores were fully isolated, so the node boundary isn't + load-bearing. A forced cross-node run with anti-affinity is the only + belt-and-suspenders gap.) Build-path spike also done — see + `design/podman-backend.md` §4 (buildah build works). +- ✅ **Multi-service project + inter-container networking.** Two services + on a user-defined podman network (`app` → `db:9000` every second, + service-name DNS). Checkpointed *both* (`--export`), removed both, + restored *both* (`--import`): both resumed (app counter 8→18, db + tracking), and the **inter-container link re-established** (app resolves + `db` and reconnects). Per-container checkpoint/restore + the shared + network is sufficient for loosely-coupled (reconnecting) services; no + compose-level C/R primitive needed. Restore ordering is forgiving + (db-then-app; app self-heals each tick). +- ✅ **`--tcp-established` is required for connection-holding services.** + Without it, checkpoint of a service with a live TCP connection fails + intermittently (timing-dependent, exit 125). With it, checkpoint + succeeds. 
Caveat: it lets checkpoint *succeed* and reconnecting clients + recover; a service relying on a *persistent* connection surviving a + peer-IP change on restore is the residual edge (matches the + "agents reconnect" assumption). + +What's **still open** (minor): + +- **Working-set timing.** Idle container was fast (~0.5s same-pod, ~3.5s + cross-pod incl. rootfs unpack); a busy agent's memory footprint sets the + real checkpoint time vs the eviction window. Measure on a real workload. +- **Forced cross-node** placement (anti-affinity) — belt-and-suspenders; + the archive is already proven node-independent. + +## 8. Constraints, risks & mitigations + +| Risk | Impact | Mitigation | +| --- | --- | --- | +| Docker can't do restore | Original premise invalid | Pivot to `runtime/podman`; docker backend reports `Checkpoint=false`. Tracked: [containerd#12141](https://github.com/containerd/containerd/issues/12141). | +| Running devcontainers under Podman is a stack change | Consumer migration cost | Costed separately; Podman runs OCI/Docker images + has a docker-compat API. | +| Multi-container compose project | Per-container checkpoint, ordering | Open spike; consider podman pods / staged restore. | +| In-flight TCP breaks on new IP | Agent resumes into dead sockets | `--tcp-established` available; otherwise agents reconnect (transient-blip semantics). | +| Kernel / CRIU parity between nodes | Restore fails across mismatched nodes | Homogeneous spot fleet + node-image requirement; `RestoreFailedError` → cold-boot fallback. | +| Eviction window too short | Checkpoint doesn't finish | `CheckpointRef.Size` + working-set timing feed a go/no-go; degrade to cold boot for large footprints. | +| Restore failure | Lost in-memory state | `RestoreFailedError` distinct from cold-start; fall back to cold `Up` on the PVC — data intact. | + +## 9. What this does not do (v1) + +- **No live migration without an eviction notice.** Checkpoint-then- + restart, predicated on advance warning. 
Not transparent fault tolerance + for instant node death. +- **No docker / applecontainer support.** `Capabilities().Checkpoint == false`; + `ErrCheckpointUnsupported`. (Docker: broken upstream. Apple: no CRIU.) +- **No multi-container orchestration in the primitive.** The primitive is + per-container. Sequencing a multi-service project is done one level up by + the engine orchestrator (`Engine.CheckpointProject` / `RestoreProject`, + §3.3) — not by the primitive, and not (for the simple loosely-coupled + case) by the platform. Heavier sequencing (dependency-ordered restore, + cross-node network recreation) remains the platform's job. +- **No engine-driven scheduling.** *When* to checkpoint and *where* to + restore are the platform's job. + +--- + +## Appendix A — The docker-centric design (superseded 2026-06-19) + +The original Pass-1 draft assumed `runtime/docker` would implement +`CheckpointRuntime` via the moby client's `CheckpointCreate` + +checkpoint-aware `ContainerStart`, with a `--checkpoint-dir` redirected +onto the PVC and a daemon-experimental capability probe. The API surface +(`CheckpointCreate{CheckpointID, CheckpointDir, Exit}`, +`ContainerStart{CheckpointID, CheckpointDir}`) is real and present in +`moby/moby/client v0.4.1`, and **checkpoint (dump) does work**. + +It was abandoned for restore, not checkpoint. Empirically (§2): + +- `docker start --checkpoint` fails on the network-namespace bind-mount + on the current containerd-integrated engine — an open upstream bug, not + a config error. +- Custom `--checkpoint-dir` is unsupported on restore (broken since + containerd 1.0); the default-dir workaround doesn't help because the + netns failure is downstream of it. +- Docker leaves the containerd container's `SnapshotKey` empty, so the + containerd-level checkpoint path can't bundle the rootfs either. + +The reasoning is kept because "did anyone try docker?" is the first +question any reviewer will ask. 
Yes — in depth, on the real image — and +it does not work. See §2.3 and the linked issues. diff --git a/design/podman-backend.md b/design/podman-backend.md new file mode 100644 index 0000000..9e84387 --- /dev/null +++ b/design/podman-backend.md @@ -0,0 +1,326 @@ +# Design — Podman backend (for checkpoint/restore) + +**Status:** Draft for review +**Date:** 2026-06-19 +**Scope:** how the library gains Podman support so it can checkpoint and +restore devcontainers (the `CheckpointRuntime` primitive). Commits to an +approach — **reuse the existing docker/moby backend pointed at Podman's +docker-compatible socket, and add checkpoint/restore via Podman's libpod +API** — and lays out the phased plan, the build-path risk that makes or +breaks it, file-level changes, testing, and the consumer-adoption dependency. + +Companion to `design/checkpoint-restore.md` (defines the primitive and the +empirical reason docker can't do it) and `design/runtime.md` (the +`Runtime` interface). + +--- + +## 1. Why a Podman backend + +`design/checkpoint-restore.md` §2 records the empirical result: on the +current docker/containerd stack, `docker checkpoint` *dumps* fine but +`docker start --checkpoint` is broken (an open upstream bug), and +container managers (docker, nerdctl) leave state generic restore can't +reconstruct. **Podman is the only tool that does the full round trip** +(`checkpoint --export` → `restore --import`: memory resumed, networking +re-attached, into a fresh container), which is why CRIU recommends it. + +To checkpoint a devcontainer with Podman, the devcontainer must be +**Podman-managed**. So the library needs a Podman backend. This doc is +about getting that backend with the least new, brittle code. + +## 2. 
Chosen approach: reuse docker backend + libpod C/R (Option A) + +> **IMPLEMENTED (2026-06-21) — fully API-driven, no CLI shell-out.** The +> backend embeds `*docker.Runtime` over the Podman docker-compat socket +> for the standard surface, and drives **build + checkpoint/restore +> through the libpod REST API on the *same* socket** via a thin stdlib +> `net/http` client (`runtime/podman/libpod.go`) — no `podman` CLI +> subprocess. The endpoint shapes were captured from the official client +> (`podman --remote --log-level=debug`) and verified end-to-end on a live +> bench: checkpoint `POST …/checkpoint?export=true` → tar response body; +> restore `POST …/containers/import/restore?import=true` with the archive +> in the body → `{"Id":…}`; build `POST …/build` with the context tar in +> the body → streamed `{"stream":…}` (last line = image id). Both +> integration tests PASS. The earlier CLI plan (below) was dropped in +> favour of REST to match the library's SDK-first design (the docker +> backend uses the moby SDK, not shell-out) — see §2.1 for why not +> `pkg/bindings`. + +Podman's `system service` exposes **two APIs on one socket**: + +- a **docker-compatible API** (the moby REST surface), and +- the **libpod API** (Podman's native extensions, including + checkpoint/restore — which are *not* in the docker-compatible API). + +The library's `runtime/docker` backend already talks to the moby REST API +via `moby/moby/client` and already supports a `DOCKER_HOST` override +(`runtime/docker/client.go`). So: + +- **Point the existing docker backend at Podman's socket** for the bulk + of the `Runtime` surface (Run/Start/Stop/Remove, Exec, Inspect, Logs, + FindByLabel, Pull, networks/volumes). These are docker-compatible and + should work unchanged. +- **Add checkpoint/restore against the libpod API** (a thin addition), + since those endpoints don't exist in the docker-compat surface. + +This avoids writing a second full backend. 
The new code is small: a +constructor that wires the docker runtime to the podman socket, the +`CheckpointRuntime` methods, a `Capabilities` override, and (pending §4) +a build-path override. + +### 2.1 Why not B (full native backend) or C (CLI) + +- **B — full `runtime/podman` via libpod REST / `pkg/bindings`:** the most + idiomatic, but it re-implements the entire `Runtime` surface we already + have working over the moby client, and `pkg/bindings` drags in the very + large `containers/podman` module. **Spiked 2026-06-21:** importing + `pkg/bindings` takes the dependency tree from **76 → 384 modules** and + fails to build CGO-free (`proglottis/gpgme` needs cgo + `pkg-config`), + requiring the `containers_image_openpgp` build tag threaded through every + downstream consumer. Rejected — too heavy and build-hostile for a + library. We hit the libpod endpoints with a **thin stdlib HTTP client** + instead (zero new deps, cross-compiles cleanly). +- **C — full backend via `podman` CLI:** streaming exec/build events over + the CLI are brittle — exactly the brittleness the docker backend avoids + by using the SDK. Reserve CLI shell-out for the few calls where it's + simplest (possibly the C/R calls themselves; §5.2). + +The cost of Option A is a hard dependency on Podman's docker-compat +fidelity. The one place that's known to leak is **build** — see §4. + +## 3. Packaging: a thin `runtime/podman` that composes `runtime/docker` + +```go +// runtime/podman/podman.go (Linux build tag; Podman is Linux-only) +package podman + +// Runtime is the Podman backend. It embeds a docker.Runtime wired to +// Podman's docker-compatible socket for the standard Runtime surface, +// and adds the libpod-only checkpoint/restore on top. 
+type Runtime struct { + *docker.Runtime // docker-compat API at the podman socket + libpod *libpodClient // thin client for libpod-only endpoints (C/R) +} + +func New(ctx context.Context, opts Options) (*Runtime, error) { + // opts.Socket defaults to the podman service socket + // (e.g. unix:///run/podman/podman.sock). Construct the docker.Runtime + // against it, construct the libpod client against the same socket, + // probe capabilities (libpod reachable + optional deployer-supplied + // CRIU probe — see §5.3 for why CRIU can't be checked over REST). +} + +// Checkpoint / Restore implement runtime.CheckpointRuntime via libpod. +// Capabilities overrides docker.Runtime's to set Checkpoint=true. +// BuildImage may be overridden depending on §4. +``` + +Embedding `*docker.Runtime` means the Podman backend satisfies +`runtime.Runtime` for free and we override only what differs +(`Capabilities`, `Checkpoint`/`Restore`, possibly `BuildImage`). The +docker backend is untouched. + +## 4. The make-or-break risk: the build path + +The library's `runtime/docker/build.go` is **BuildKit-only** (requires +Docker 23+ BuildKit). **Podman has no BuildKit** — it builds with buildah. +Podman's docker-compat `/build` endpoint exists but does not provide +BuildKit semantics, so the library's build path will not work as-is +against Podman. + +> **IMPLEMENTED — build-path spike + backend, 2026-06-19 (bench).** Native +> `podman build` (buildah) of a representative devcontainer Dockerfile +> succeeds, so the buildah path is viable. **`runtime/podman.BuildImage` +> now overrides the embedded `docker.Runtime`'s BuildKit build** and shells +> out to `podman build` (`runtime/podman/build.go`): maps `BuildSpec` → +> `podman build` flags, reads the image ID from `--iidfile`, and streams +> the build log as `BuildEventLog` events. Validated by the gated +> integration test `TestIntegration_BuildImage` (PASS on live podman: +> built `sha256:9cecd9d…`, tag applied, 6 log events). 
Pre-baked/pulled +> images remain the fast path; this covers the in-container build case. + +This is the deciding factor for Option A and was settled in a Phase-0 +spike (above). Candidate resolutions, in order of preference: + +1. **Pre-built / pulled images (no in-container build).** If the consumer's + devcontainer images are pre-baked and pulled (the common case — see the + pre-baked-image fast path in `design/features.md`), the build path is + rarely exercised at workspace start. `BuildImage` could return + `ErrNotImplemented` on the Podman backend for v1, and the feature + pipeline relies on pulled images. +2. **Route build through Podman/buildah** — override `BuildImage` to use + the libpod build endpoint or shell out to `podman build`. More work; + loses BuildKit features (cache mounts, etc.) the current path uses. +3. **Build elsewhere, load into Podman** — keep a real BuildKit builder in + CI / a sidecar, push to a registry, `podman pull`. Shifts build out of + the workspace runtime entirely. + +**Decision:** Phase-0 build-path spike decides between (1) and (2). Lean +(1) for v1 if the consumer's images are pre-baked; otherwise (2). + +## 5. Checkpoint / restore implementation + +The primitive (`CheckpointRuntime`, `CheckpointSpec`, `RestoreSpec`, +`CheckpointRef`) is defined in `design/checkpoint-restore.md` §3. The +Podman backend implements it. + +### 5.1 Mapping (verified on the bench, 2026-06-19) + +```text +Checkpoint → podman container checkpoint --export \ + [--tcp-established] [--leave-running=!StopAfter] +Restore → podman container restore --import \ + [--tcp-established] [--name ] +``` + +`restore --import` rebuilds the container (rootfs + mounts + network) from +the self-contained archive into a *new* container — no `RunContainer` +pre-step needed. + +### 5.2 libpod REST vs CLI for these two calls + +- **libpod REST:** `POST /libpod/containers/{name}/checkpoint?export=…` and + `POST /libpod/containers/{name}/restore?import=…`. 
No CLI dependency, + programmatic errors. Preferred if the thin libpod client is small. +- **CLI shell-out:** simplest; mirrors the `docker compose` shell-out the + library already does. Acceptable fallback. + +Either way, keep it behind the backend so the engine only sees +`CheckpointRuntime`. + +> **Transport: RESOLVED (integration test, 2026-06-19).** The earlier +> "wedge" concern (podman service + CLI contending on the store) was +> **disproven for normal use**: the live integration test ran the full +> Option A path — moby client over the podman service for +> pull/run/start/exec/remove, *plus* podman-CLI `checkpoint`/`restore` +> against the same store — and passed cleanly in 25.6s (memory resumed +> 7→10). The original wedge symptom traced to (a) a forcibly-killed +> `podman system service` leaving a stale lock and (b) a test-harness bug +> (`pkill -f "system service"` matching the runner's own shell). So +> **CLI for C/R is viable alongside the moby-over-service surface** — no +> need to switch to libpod REST. (REST remains the fallback if heavier +> concurrency ever surfaces contention.) + +### 5.3 Capability probe + +> **IMPLEMENTED with a caveat (2026-06-21).** `Capabilities().Checkpoint` +> is gated at construction on the **libpod API being reachable** +> (`GET /libpod/_ping` → 2xx, which also confirms it is genuinely Podman: +> a docker socket 404s the `/libpod/` path), cached in `checkpointOK`. +> +> The original plan was to *also* gate on `criu check` passing. That is +> **not achievable over the REST transport**: libpod has no `criu check` +> endpoint, `/info` does not report CRIU, and the backend is deliberately +> CLI-free (§2.1) so it won't shell out to `criu check` either. So the +> backend cannot verify CRIU itself. 
Instead, `podman.Options.CheckpointProbe +> func(context.Context) bool` lets the **deployer** — who runs +> `podman system service` and therefore knows the host — fold in that +> assertion (exec `criu check`, read a provisioning marker, etc.); it runs +> once at `New` alongside the ping. Nil means "don't probe": the bit then +> reflects libpod reachability only. +> +> When CRIU is absent and no probe caught it, `Checkpoint` fails at call +> time with a `*runtime.CheckpointFailedError` (carrying libpod's stderr), +> which is distinct from `ErrCheckpointUnsupported` and lets the platform +> fall back to a cold `Up`. `Engine.Checkpoint/Restore` still return +> `ErrCheckpointUnsupported` up front when the bit is false. + +## 6. Phased plan + +### Phase 0 — de-risk spikes (no library code; on `ckpt-bench`) + +Gates the whole investment. Each is a go/no-go: + +- **Build-path spike (§4):** does the consumer's devcontainer come up under + Podman with pre-built/pulled images (option 4.1), or do we need a buildah build + path (4.2)? *Decides the BuildImage strategy.* +- **Multi-service compose:** ✅ DONE (2026-06-19). Two networked services + (`app`→`db`), checkpointed + restored *per container*; both resumed and + the inter-container link (service-name DNS + TCP) re-established. No + compose-level primitive needed — the engine checkpoints/restores each + service and the shared network re-forms the links. Restore ordering + forgiving for reconnecting services. +- **`--tcp-established` survival:** ✅ DONE (2026-06-19). Required for any + service holding a live TCP connection — without it checkpoint fails + intermittently (exit 125). The backend should pass it by default for + C/R. Residual edge: a persistent connection across a peer-IP change + still breaks; reconnecting clients recover. +- **Cross-node transfer:** ✅ DONE (2026-06-19). 
Checkpointed on pod A, + copied the 43 KB archive to pod B (separate empty Podman store), restored + on B — resumed + networking + image populated from the archive. Archive + is fully self-contained; cross-node is just file transfer. (Same-node by + scheduler chance; a forced cross-node run is the only remaining nicety.) + +### Phase 1 — the contract (small, backend-agnostic library change) + +- `runtime/runtime.go`: add `CheckpointRuntime` + `CheckpointSpec` / + `RestoreSpec` / `CheckpointRef`. +- `runtime/compose_primitives.go`: add `Capabilities.Checkpoint`. +- `runtime/errors.go` (+ `devcontainer.Error`): `ErrCheckpointUnsupported`, + `CheckpointFailedError`, `RestoreFailedError`. +- New `checkpoint.go` (repo root): `Engine.Checkpoint` / `Engine.Restore` + wrappers that type-assert `CheckpointRuntime` (mirror compose-source + handling). +- Unit tests against a fake runtime implementing the sub-interface. +- *No backend implements it yet — this just lands the API shape.* + +### Phase 2 — the Podman backend + +- `runtime/podman/` package (Linux build tag): `New` wiring `docker.Runtime` + to the podman socket + libpod client; capability probe. +- `runtime/podman/checkpoint.go`: `Checkpoint`/`Restore` (§5). +- `BuildImage` per the Phase-0 build decision (§4). +- `Capabilities` override (`Checkpoint=true`). +- Integration tests behind a real-podman gate (mirror the existing + real-docker compose integration-test gate). + +### Phase 3 — consumer adoption (out of library scope; consumer repo) + +- The consumer's workspace runtime image: Podman + criu (keep), drop the + docker-in-docker dependency for the devcontainer runtime. +- The consumer runtime wires `runtime/podman` and orchestrates checkpoint-on- + eviction → archive to PVC/registry → restore-on-new-pod. +- Separate design effort in the consumer's repo; this is the largest piece + and the real prerequisite for production use. + +## 7. 
File-level change list (Phases 1–2) + +| File | Change | +| --- | --- | +| `runtime/runtime.go` | `CheckpointRuntime` interface + spec/ref types | +| `runtime/compose_primitives.go` | `Capabilities.Checkpoint` | +| `runtime/errors.go` | `ErrCheckpointUnsupported`, `CheckpointFailedError`, `RestoreFailedError` | +| `checkpoint.go` (new, root) | `Engine.Checkpoint` / `Engine.Restore` wrappers | +| `runtime/podman/podman.go` (new) | backend: embed `docker.Runtime` @ podman socket, probe | +| `runtime/podman/checkpoint.go` (new) | libpod/CLI `Checkpoint`/`Restore` | +| `runtime/podman/build.go` (new, maybe) | build-path override per §4 | +| `runtime/docker/client.go` | (verify) `DOCKER_HOST`/socket override is reusable as-is | +| `*_test.go` | unit (fake runtime) + integration (real-podman gate) | + +## 8. Testing + +- **Unit:** engine wrappers + capability gating against a fake runtime + (no podman needed). +- **Integration (real podman gate):** bring a container up via the Podman + backend, checkpoint→remove→restore, assert memory + networking resume — + the bench test, codified. Gated like the existing real-docker compose + tests (skipped without a reachable podman). + +## 9. Open questions / decisions + +- **Build path (§4)** — the gating decision; Phase-0 spike. +- **libpod REST vs CLI (§5.2)** for the two C/R calls. +- **Multi-container** — per-container primitive + engine sequencing, vs + modelling Podman pods. +- **Socket provisioning** — who runs `podman system service` and where the + socket lives (consumer-runtime concern; the library just takes the address). + +## 10. 
Risks + +| Risk | Impact | Mitigation | +| --- | --- | --- | +| Podman docker-compat build incompatible (BuildKit) | Backend can't build images | Phase-0 build spike; pre-built/pulled images for v1 (§4.1) or buildah path (§4.2) | +| Other docker-compat fidelity gaps (exec/inspect edge cases) | Subtle backend bugs | Integration tests against real podman; fall back to native libpod calls per-method if needed | +| Running devcontainers under Podman is a stack change | Consumer migration cost | Phase 3, costed separately; Podman runs OCI/Docker images | +| Multi-service compose unproven | Real devcontainer may not migrate cleanly | Phase-0 spike before committing | diff --git a/runtime/compose_primitives.go b/runtime/compose_primitives.go index 6114251..8098a51 100644 --- a/runtime/compose_primitives.go +++ b/runtime/compose_primitives.go @@ -132,4 +132,16 @@ type Capabilities struct { // depends_on edge race and may miss the patch on first DNS // lookup — documented limitation on this backend. ServiceNameDNS bool + + // Checkpoint: backend implements CheckpointRuntime — it can + // checkpoint a running container to a portable archive and restore + // it (CRIU). Engine.Checkpoint / Engine.Restore gate on this. + // + // True on runtime/podman when the libpod API is reachable at + // construction (and a deployer-supplied CRIU probe, if any, passes — + // the REST transport can't run `criu check` itself). False on + // runtime/docker (restore is broken on current engines — see + // design/checkpoint-restore.md) and + // runtime/applecontainer (no CRIU). + Checkpoint bool } diff --git a/runtime/errors.go b/runtime/errors.go index a89483f..199cb94 100644 --- a/runtime/errors.go +++ b/runtime/errors.go @@ -10,6 +10,42 @@ import ( // CLI-shim runtime in v1). 
var ErrNotImplemented = errors.New("runtime: not implemented") +// ErrCheckpointUnsupported is returned by Engine.Checkpoint / +// Engine.Restore when the active runtime does not implement +// CheckpointRuntime, or advertises Capabilities().Checkpoint == false. +// Callers can errors.Is against it to fall back to a cold start. +var ErrCheckpointUnsupported = errors.New("runtime: checkpoint/restore not supported by this backend") + +// CheckpointFailedError indicates a checkpoint/export call failed. +// Carries the container id and the backend's captured output. +type CheckpointFailedError struct { + ID string + Stderr string + Err error +} + +func (e *CheckpointFailedError) Error() string { + return fmt.Sprintf("checkpoint failed for %s: %v: %s", e.ID, e.Err, e.Stderr) +} + +func (e *CheckpointFailedError) Unwrap() error { return e.Err } + +// RestoreFailedError indicates a restore/import call failed. Distinct +// from a cold-start failure so callers can deterministically fall back +// to a cold Up: the workspace data survives on the volume; only the +// in-memory state is lost. +type RestoreFailedError struct { + ArchivePath string + Stderr string + Err error +} + +func (e *RestoreFailedError) Error() string { + return fmt.Sprintf("restore failed from %s: %v: %s", e.ArchivePath, e.Err, e.Stderr) +} + +func (e *RestoreFailedError) Unwrap() error { return e.Err } + // ImageNotFoundError indicates the requested image is not present // locally and could not be pulled. 
type ImageNotFoundError struct { diff --git a/runtime/podman/build.go b/runtime/podman/build.go new file mode 100644 index 0000000..03f2e9d --- /dev/null +++ b/runtime/podman/build.go @@ -0,0 +1,174 @@ +package podman + +import ( + "archive/tar" + "context" + "encoding/json" + "fmt" + "io" + "net/http" + "net/url" + "os" + "path/filepath" + "regexp" + "strings" + + "github.com/crunchloop/devcontainer/runtime" +) + +var imageIDRe = regexp.MustCompile(`^[0-9a-f]{64}$`) + +// buildQuery maps a BuildSpec to libpod /build query params (verified +// against podman 5.4: dockerfile is a JSON array, t is the tag). +func buildQuery(spec runtime.BuildSpec) url.Values { + q := url.Values{} + df := "Dockerfile" + if spec.Dockerfile != "" { + df = filepath.Base(spec.Dockerfile) + } + b, _ := json.Marshal([]string{df}) + q.Set("dockerfile", string(b)) + if spec.Tag != "" { + q.Set("t", spec.Tag) + } + if len(spec.Args) > 0 { + ba, _ := json.Marshal(spec.Args) + q.Set("buildargs", string(ba)) + } + if spec.Target != "" { + q.Set("target", spec.Target) + } + if spec.NoCache { + q.Set("nocache", "1") + } + if spec.Platform != "" { + q.Set("platform", spec.Platform) + } + return q +} + +// tarDir streams a tar of dir's contents (rooted at dir) into w. +func tarDir(dir string, w *io.PipeWriter) { + tw := tar.NewWriter(w) + walkErr := filepath.Walk(dir, func(path string, fi os.FileInfo, err error) error { + if err != nil { + return err + } + rel, err := filepath.Rel(dir, path) + if err != nil { + return err + } + if rel == "." 
{ + return nil + } + hdr, err := tar.FileInfoHeader(fi, "") + if err != nil { + return err + } + hdr.Name = filepath.ToSlash(rel) + if err := tw.WriteHeader(hdr); err != nil { + return err + } + if fi.Mode().IsRegular() { + f, err := os.Open(path) + if err != nil { + return err + } + _, copyErr := io.Copy(tw, f) + _ = f.Close() + if copyErr != nil { + return copyErr + } + } + return nil + }) + if walkErr != nil { + _ = tw.Close() + _ = w.CloseWithError(walkErr) + return + } + if err := tw.Close(); err != nil { + _ = w.CloseWithError(err) + return + } + _ = w.Close() +} + +// BuildImage builds an image with buildah via the libpod /build endpoint: +// streams the context as a tar request body, forwards the build log as +// BuildEventLog events, and returns the built image's reference. +func (r *Runtime) BuildImage(ctx context.Context, spec runtime.BuildSpec, events chan<- runtime.BuildEvent) (runtime.ImageRef, error) { + if spec.ContextPath == "" { + return runtime.ImageRef{}, fmt.Errorf("podman BuildImage: ContextPath is required") + } + + pr, pw := io.Pipe() + go tarDir(spec.ContextPath, pw) + + resp, err := r.lp.do(ctx, http.MethodPost, "/build", buildQuery(spec), pr, "application/x-tar") + if err != nil { + return runtime.ImageRef{}, fmt.Errorf("podman build: %w", err) + } + defer resp.Body.Close() + if resp.StatusCode != http.StatusOK { + return runtime.ImageRef{}, fmt.Errorf("podman build: %s", errorBody(resp)) + } + + id, err := parseBuildStream(resp.Body, events) + if err != nil { + return runtime.ImageRef{}, fmt.Errorf("podman build: %w", err) + } + if id == "" { + return runtime.ImageRef{}, fmt.Errorf("podman build: no image id in build output") + } + ref := runtime.ImageRef{ID: id} + if spec.Tag != "" { + ref.Tags = []string{spec.Tag} + } + if events != nil { + select { + case events <- runtime.BuildEvent{Kind: runtime.BuildEventCompleted, Digest: id}: + default: + } + } + return ref, nil +} + +type buildMsg struct { + Stream string `json:"stream"` + Error 
string `json:"error"` +} + +// parseBuildStream consumes the libpod build response (a stream of JSON +// objects), forwards {"stream":...} as BuildEventLog events, and returns +// the built image id — the last stream line that is a bare 64-hex digest +// (verified: podman emits the full image id as the final stream line). +func parseBuildStream(r io.Reader, events chan<- runtime.BuildEvent) (string, error) { + dec := json.NewDecoder(r) + var imageID string + for { + var m buildMsg + if err := dec.Decode(&m); err != nil { + if err == io.EOF { + break + } + return imageID, err + } + if m.Error != "" { + return imageID, fmt.Errorf("%s", m.Error) + } + if m.Stream == "" { + continue + } + if events != nil { + select { + case events <- runtime.BuildEvent{Kind: runtime.BuildEventLog, Message: m.Stream}: + default: + } + } + if t := strings.TrimSpace(m.Stream); imageIDRe.MatchString(t) { + imageID = t + } + } + return imageID, nil +} diff --git a/runtime/podman/checkpoint.go b/runtime/podman/checkpoint.go new file mode 100644 index 0000000..6c6dd6b --- /dev/null +++ b/runtime/podman/checkpoint.go @@ -0,0 +1,89 @@ +package podman + +import ( + "context" + "encoding/json" + "io" + "net/http" + "net/url" + "os" + "strconv" + + "github.com/crunchloop/devcontainer/runtime" +) + +// Checkpoint exports a running container to a self-contained archive via +// the libpod checkpoint endpoint with export=true (the response body is +// the tar archive). 
Verified against podman 5.4: +// +// POST /libpod/containers/{id}/checkpoint?export=true&tcpestablished=&leaverunning= +func (r *Runtime) Checkpoint(ctx context.Context, id string, spec runtime.CheckpointSpec) (runtime.CheckpointRef, error) { + q := url.Values{} + q.Set("export", "true") + q.Set("tcpestablished", strconv.FormatBool(spec.TCPEstablished)) + q.Set("leaverunning", strconv.FormatBool(!spec.StopAfter)) + + resp, err := r.lp.do(ctx, http.MethodPost, "/containers/"+id+"/checkpoint", q, nil, "") + if err != nil { + return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: err} + } + defer resp.Body.Close() + if resp.StatusCode != http.StatusOK { + return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Stderr: errorBody(resp)} + } + + f, err := os.Create(spec.ArchivePath) + if err != nil { + return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: err} + } + n, copyErr := io.Copy(f, resp.Body) + closeErr := f.Close() + if copyErr != nil { + return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: copyErr} + } + if closeErr != nil { + return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: closeErr} + } + return runtime.CheckpointRef{ArchivePath: spec.ArchivePath, Size: n}, nil +} + +// restoreReport is the libpod restore response shape, e.g. +// {"Id":"<64hex>","runtime_restore_duration":0,"criu_statistics":null}. +type restoreReport struct { + ID string `json:"Id"` +} + +// Restore re-creates and resumes a container from a checkpoint archive, +// uploading the archive in the request body. 
Verified against podman 5.4: +// +// POST /libpod/containers/import/restore?import=true&tcpestablished=&name= +// (body: the tar archive; "import" is the literal path segment) +func (r *Runtime) Restore(ctx context.Context, spec runtime.RestoreSpec) (*runtime.Container, error) { + f, err := os.Open(spec.ArchivePath) + if err != nil { + return nil, &runtime.RestoreFailedError{ArchivePath: spec.ArchivePath, Err: err} + } + defer f.Close() + + q := url.Values{} + q.Set("import", "true") + q.Set("tcpestablished", strconv.FormatBool(spec.TCPEstablished)) + if spec.Name != "" { + q.Set("name", spec.Name) + } + + resp, err := r.lp.do(ctx, http.MethodPost, "/containers/import/restore", q, f, "application/x-tar") + if err != nil { + return nil, &runtime.RestoreFailedError{ArchivePath: spec.ArchivePath, Err: err} + } + defer resp.Body.Close() + if resp.StatusCode != http.StatusOK { + return nil, &runtime.RestoreFailedError{ArchivePath: spec.ArchivePath, Stderr: errorBody(resp)} + } + + var rep restoreReport + if err := json.NewDecoder(resp.Body).Decode(&rep); err != nil { + return nil, &runtime.RestoreFailedError{ArchivePath: spec.ArchivePath, Err: err} + } + return &runtime.Container{ID: rep.ID, Name: spec.Name, State: runtime.StateRunning}, nil +} diff --git a/runtime/podman/integration_test.go b/runtime/podman/integration_test.go new file mode 100644 index 0000000..ea68c01 --- /dev/null +++ b/runtime/podman/integration_test.go @@ -0,0 +1,163 @@ +package podman + +import ( + "context" + "os" + "path/filepath" + "strconv" + "strings" + "testing" + "time" + + "github.com/crunchloop/devcontainer/runtime" +) + +// TestIntegration_CheckpointRestore exercises the full Option-A path +// against a live Podman: the standard surface (pull/run/start/exec) via +// the embedded docker.Runtime over Podman's docker-compatible socket, +// plus Checkpoint/Restore via the libpod REST API on the same socket. 
It +// also stress-tests the transport-wedge concern (moby client + libpod +// calls against the same Podman store). +// +// Skipped unless PODMAN_SOCKET is set, e.g.: +// +// PODMAN_SOCKET=unix:///run/podman/podman.sock \ +// go test -run Integration -count=1 ./runtime/podman +func TestIntegration_CheckpointRestore(t *testing.T) { + socket := os.Getenv("PODMAN_SOCKET") + if socket == "" { + t.Skip("set PODMAN_SOCKET to run the live Podman integration test") + } + image := os.Getenv("PODMAN_TEST_IMAGE") + if image == "" { + image = "docker.io/library/node:20-slim" + } + + ctx := context.Background() + rt, err := New(ctx, Options{Socket: socket}) + if err != nil { + t.Fatalf("New(%q): %v", socket, err) + } + if !rt.Capabilities().Checkpoint { + t.Fatalf("Capabilities().Checkpoint is false — podman/criu check did not pass") + } + + const name = "dc-podman-integration" + _ = rt.RemoveContainer(ctx, name, runtime.RemoveOptions{Force: true}) + + if _, err := rt.PullImage(ctx, image, nil); err != nil { + t.Fatalf("PullImage: %v", err) + } + + // A counter that keeps its value in memory and mirrors it to a file — + // resume vs cold restart is observable in the file. 
+ c, err := rt.RunContainer(ctx, runtime.RunSpec{ + Image: image, + Name: name, + Cmd: []string{"sh", "-c", "i=0; while true; do i=$((i+1)); echo $i > /count.txt; sleep 1; done"}, + OverrideCommand: false, + }) + if err != nil { + t.Fatalf("RunContainer: %v", err) + } + t.Cleanup(func() { _ = rt.RemoveContainer(context.Background(), name, runtime.RemoveOptions{Force: true}) }) + + if err := rt.StartContainer(ctx, c.ID); err != nil { + t.Fatalf("StartContainer: %v", err) + } + time.Sleep(6 * time.Second) + + before := readCounter(t, ctx, rt, c.ID) + if before <= 0 { + t.Fatalf("counter not advancing before checkpoint (got %d)", before) + } + + arch := filepath.Join(t.TempDir(), "ckpt.tar") + ref, err := rt.Checkpoint(ctx, c.ID, runtime.CheckpointSpec{ArchivePath: arch, StopAfter: true, TCPEstablished: true}) + if err != nil { + t.Fatalf("Checkpoint: %v", err) + } + if ref.Size == 0 { + t.Errorf("checkpoint archive size is 0 (%s)", ref.ArchivePath) + } + + if err := rt.RemoveContainer(ctx, c.ID, runtime.RemoveOptions{Force: true}); err != nil { + t.Fatalf("RemoveContainer (source): %v", err) + } + time.Sleep(2 * time.Second) + + restored, err := rt.Restore(ctx, runtime.RestoreSpec{ArchivePath: arch, TCPEstablished: true}) + if err != nil { + t.Fatalf("Restore: %v", err) + } + t.Cleanup(func() { _ = rt.RemoveContainer(context.Background(), restored.ID, runtime.RemoveOptions{Force: true}) }) + time.Sleep(3 * time.Second) + + after := readCounter(t, ctx, rt, restored.ID) + // Resumed: the counter continues from where it was checkpointed. A + // cold restart would be back near 1–3 (and below `before`). 
+ if after <= before { + t.Fatalf("counter did not resume: before=%d after=%d (looks like a cold restart, not a restore)", before, after) + } + t.Logf("checkpoint/restore OK: before=%d after=%d archive=%d bytes", before, after, ref.Size) +} + +func readCounter(t *testing.T, ctx context.Context, rt *Runtime, id string) int { + t.Helper() + res, err := rt.ExecContainer(ctx, id, runtime.ExecOptions{Cmd: []string{"cat", "/count.txt"}}) + if err != nil { + t.Fatalf("ExecContainer(cat): %v", err) + } + n, err := strconv.Atoi(strings.TrimSpace(res.Stdout)) + if err != nil { + t.Fatalf("parse counter %q: %v", res.Stdout, err) + } + return n +} + +// TestIntegration_BuildImage builds an image with buildah via the Podman +// backend and checks the returned reference + that build logs streamed. +func TestIntegration_BuildImage(t *testing.T) { + socket := os.Getenv("PODMAN_SOCKET") + if socket == "" { + t.Skip("set PODMAN_SOCKET to run the live Podman integration test") + } + base := os.Getenv("PODMAN_TEST_IMAGE") + if base == "" { + base = "docker.io/library/node:20-slim" + } + + ctx := context.Background() + rt, err := New(ctx, Options{Socket: socket}) + if err != nil { + t.Fatalf("New(%q): %v", socket, err) + } + + dir := t.TempDir() + dockerfile := "FROM " + base + "\nRUN echo built > /built.txt\n" + if err := os.WriteFile(filepath.Join(dir, "Dockerfile"), []byte(dockerfile), 0o644); err != nil { + t.Fatal(err) + } + + events := make(chan runtime.BuildEvent, 512) + ref, err := rt.BuildImage(ctx, runtime.BuildSpec{ContextPath: dir, Tag: "dc-buildtest:1"}, events) + close(events) // BuildImage has finished streaming before it returns + if err != nil { + t.Fatalf("BuildImage: %v", err) + } + if ref.ID == "" { + t.Fatalf("BuildImage: empty image ID") + } + t.Cleanup(func() { _ = rt.RemoveImage(context.Background(), "dc-buildtest:1") }) + + var logs int + for e := range events { + if e.Kind == runtime.BuildEventLog && e.Message != "" { + logs++ + } + } + if logs == 0 { + 
t.Errorf("expected build log events, got none") + } + t.Logf("buildah build OK: id=%s tags=%v logEvents=%d", ref.ID, ref.Tags, logs) +} diff --git a/runtime/podman/libpod.go b/runtime/podman/libpod.go new file mode 100644 index 0000000..2499ff6 --- /dev/null +++ b/runtime/podman/libpod.go @@ -0,0 +1,93 @@ +package podman + +import ( + "context" + "encoding/json" + "fmt" + "io" + "net" + "net/http" + "net/url" + "strings" +) + +// libpodClient is a thin HTTP client for Podman's libpod REST API over a +// unix socket — the endpoints not covered by the docker-compatible API +// the embedded docker.Runtime uses (checkpoint/restore, buildah build). +// +// Deliberately dependency-free (stdlib net/http only): the official +// containers/podman/v5/pkg/bindings drags in the whole Podman module +// (cgo, gpgme, storage build tags, ~300 extra modules) — see +// design/podman-backend.md. This client talks to the same socket the moby +// client uses, so the backend has one transport and no CLI subprocess. +type libpodClient struct { + hc *http.Client + baseURL string +} + +// apiVersion is the version segment in the libpod path. Podman accepts a +// range down to its minimum; this is informational for the libpod API. +const apiVersion = "v5.0.0" + +// newLibpodClient builds a client for the given Podman socket +// (e.g. "unix:///run/podman/podman.sock"). +func newLibpodClient(socket string) *libpodClient { + sockPath := socket + for _, p := range []string{"unix://", "unix:"} { + sockPath = strings.TrimPrefix(sockPath, p) + } + tr := &http.Transport{ + DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) { + return (&net.Dialer{}).DialContext(ctx, "unix", sockPath) + }, + } + return &libpodClient{ + hc: &http.Client{Transport: tr}, + baseURL: "http://d/" + apiVersion + "/libpod", + } +} + +// do issues a request to the libpod API. The caller owns resp.Body. 
+func (c *libpodClient) do(ctx context.Context, method, path string, query url.Values, body io.Reader, contentType string) (*http.Response, error) { + u := c.baseURL + path + if len(query) > 0 { + u += "?" + query.Encode() + } + req, err := http.NewRequestWithContext(ctx, method, u, body) + if err != nil { + return nil, err + } + if contentType != "" { + req.Header.Set("Content-Type", contentType) + } + return c.hc.Do(req) +} + +// ping reports whether the libpod API is reachable (GET /_ping → 2xx). +func (c *libpodClient) ping(ctx context.Context) bool { + resp, err := c.do(ctx, http.MethodGet, "/_ping", nil, nil, "") + if err != nil { + return false + } + defer resp.Body.Close() + _, _ = io.Copy(io.Discard, resp.Body) + return resp.StatusCode >= 200 && resp.StatusCode < 300 +} + +// apiError is the libpod error response shape +// (e.g. {"cause":"...","message":"...","response":500}). +type apiError struct { + Cause string `json:"cause"` + Message string `json:"message"` + Response int `json:"response"` +} + +// errorBody reads an error response body into a short message. +func errorBody(resp *http.Response) string { + b, _ := io.ReadAll(io.LimitReader(resp.Body, 64*1024)) + var e apiError + if json.Unmarshal(b, &e) == nil && e.Message != "" { + return fmt.Sprintf("%s (http %d)", e.Message, resp.StatusCode) + } + return fmt.Sprintf("http %d: %s", resp.StatusCode, strings.TrimSpace(string(b))) +} diff --git a/runtime/podman/podman.go b/runtime/podman/podman.go new file mode 100644 index 0000000..004b7d1 --- /dev/null +++ b/runtime/podman/podman.go @@ -0,0 +1,121 @@ +// Package podman implements runtime.Runtime on Podman, adding +// CRIU-backed checkpoint/restore (runtime.CheckpointRuntime) — the one +// engine that does the full migration round trip (docker's restore is +// broken on current versions; see design/checkpoint-restore.md). 
+// +// Transport (design/podman-backend.md, Option A): the standard Runtime +// surface (run/exec/inspect/pull/networks/…) is served by an embedded +// *docker.Runtime pointed at Podman's docker-compatible socket — Podman +// exposes the moby REST API there, so the existing, well-tested docker +// backend works unchanged. Two areas differ and are overridden here, +// both driven through the libpod REST API on the SAME socket (a thin +// stdlib HTTP client — no `podman` CLI subprocess, no heavy +// pkg/bindings dependency): +// +// - Checkpoint/Restore: libpod-only, not in the docker-compat API. +// - BuildImage: the docker backend's build is BuildKit-only, which +// Podman's docker-compat /build does not provide; we build with +// buildah via the libpod /build endpoint. +package podman + +import ( + "context" + "fmt" + + "github.com/crunchloop/devcontainer/runtime" + "github.com/crunchloop/devcontainer/runtime/docker" +) + +// Compile-time assertions: *Runtime satisfies the core Runtime interface +// and the optional CheckpointRuntime sub-interface. +var ( + _ runtime.Runtime = (*Runtime)(nil) + _ runtime.CheckpointRuntime = (*Runtime)(nil) +) + +// Runtime is the Podman backend. It embeds a *docker.Runtime (wired to +// Podman's docker-compatible socket) for the standard surface and adds +// the libpod-only checkpoint/restore + buildah build via a thin libpod +// HTTP client over the same socket. +type Runtime struct { + *docker.Runtime + + lp *libpodClient + + // checkpointOK gates Capabilities().Checkpoint: the libpod API was + // reachable at New, and Options.CheckpointProbe (if supplied) returned + // true. See Options.CheckpointProbe for why CRIU itself can't be + // probed over the socket. + checkpointOK bool +} + +// Options configure New. +type Options struct { + // Socket is the Podman service socket serving both the + // docker-compatible and libpod APIs (e.g. + // "unix:///run/podman/podman.sock"). 
Required — Podman must be + // running `podman system service`. + Socket string + + // CheckpointProbe optionally asserts CRIU availability on the host + // serving Socket. It gates Capabilities().Checkpoint together with + // libpod reachability, runs once at New, and its result is cached. + // + // The backend cannot verify CRIU itself: the libpod REST API has no + // `criu check` equivalent and /info doesn't report CRIU, and the + // backend is deliberately CLI-free (no `criu check` shell-out). But + // the deployer runs `podman system service` and knows the host, so + // they can supply a probe (exec `criu check`, read a provisioning + // marker, etc.). + // + // Nil means "don't probe": Capabilities().Checkpoint then reflects + // libpod reachability only, and a missing CRIU surfaces at Checkpoint + // time as a *runtime.CheckpointFailedError (callers fall back to a + // cold Up — workspace data on the volume is intact). + CheckpointProbe func(context.Context) bool +} + +// New constructs a Podman runtime: wires the embedded docker.Runtime to +// the Podman service socket and a libpod client to the same socket. +func New(ctx context.Context, opts Options) (*Runtime, error) { + dr, err := docker.New(ctx, docker.Options{Host: opts.Socket}) + if err != nil { + return nil, fmt.Errorf("podman: connect to service socket %q: %w", opts.Socket, err) + } + lp := newLibpodClient(opts.Socket) + return &Runtime{ + Runtime: dr, + lp: lp, + checkpointOK: probeCheckpoint(ctx, lp, opts.CheckpointProbe), + }, nil +} + +// probeCheckpoint reports whether the Checkpoint capability should be set: +// the libpod API must be reachable (a 2xx from /libpod/_ping, which also +// confirms it is genuinely Podman — a docker socket 404s the /libpod/ +// path), and any caller-supplied CRIU probe must also pass. Split out of +// New so it is unit-testable without a daemon. 
+func probeCheckpoint(ctx context.Context, lp *libpodClient, probe func(context.Context) bool) bool { + if !lp.ping(ctx) { + return false + } + if probe != nil { + return probe(ctx) + } + return true +} + +// Capabilities reports the Podman backend's feature profile. It does not +// delegate to the embedded docker.Runtime: Podman has its own profile, +// and Checkpoint is the bit that matters here. +func (r *Runtime) Capabilities() runtime.Capabilities { + return runtime.Capabilities{ + Healthchecks: true, + ExitCodes: true, + NamespaceSharing: true, + RestartPolicies: true, + SharedVolumes: true, + ServiceNameDNS: true, + Checkpoint: r.checkpointOK, + } +} diff --git a/runtime/podman/podman_test.go b/runtime/podman/podman_test.go new file mode 100644 index 0000000..eafa8ef --- /dev/null +++ b/runtime/podman/podman_test.go @@ -0,0 +1,207 @@ +package podman + +import ( + "context" + "errors" + "io" + "net/http" + "net/http/httptest" + "os" + "path/filepath" + "reflect" + "strings" + "testing" + + "github.com/crunchloop/devcontainer/runtime" +) + +// testRuntime wires a *Runtime's libpod client at an httptest server. 
+func testRuntime(t *testing.T, h http.HandlerFunc) (*Runtime, *httptest.Server) { + t.Helper() + ts := httptest.NewServer(h) + t.Cleanup(ts.Close) + return &Runtime{lp: &libpodClient{hc: ts.Client(), baseURL: ts.URL}, checkpointOK: true}, ts +} + +func TestCapabilities_GatesCheckpoint(t *testing.T) { + if !(&Runtime{checkpointOK: true}).Capabilities().Checkpoint { + t.Fatal("checkpointOK=true should set Capabilities().Checkpoint") + } + if (&Runtime{checkpointOK: false}).Capabilities().Checkpoint { + t.Fatal("checkpointOK=false should clear Capabilities().Checkpoint") + } +} + +func TestProbeCheckpoint(t *testing.T) { + okPing := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) } + badPing := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusInternalServerError) } + yes := func(context.Context) bool { return true } + no := func(context.Context) bool { return false } + + cases := []struct { + name string + ping http.HandlerFunc + probe func(context.Context) bool + want bool + }{ + {"reachable, no probe → reachability only", okPing, nil, true}, + {"reachable, probe asserts criu present", okPing, yes, true}, + {"reachable, probe reports criu missing", okPing, no, false}, + {"unreachable short-circuits before probe", badPing, yes, false}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + ts := httptest.NewServer(tc.ping) + defer ts.Close() + lp := &libpodClient{hc: ts.Client(), baseURL: ts.URL} + if got := probeCheckpoint(context.Background(), lp, tc.probe); got != tc.want { + t.Fatalf("probeCheckpoint = %v, want %v", got, tc.want) + } + }) + } +} + +func TestCheckpoint_RequestAndArchive(t *testing.T) { + var gotPath string + var gotQuery map[string][]string + rt, _ := testRuntime(t, func(w http.ResponseWriter, r *http.Request) { + gotPath = r.URL.Path + gotQuery = r.URL.Query() + _, _ = w.Write([]byte("FAKE-TAR-BYTES")) + }) + + dir := t.TempDir() + arch := filepath.Join(dir, "ckpt.tar") + ref, 
err := rt.Checkpoint(context.Background(), "c1", runtime.CheckpointSpec{ArchivePath: arch, StopAfter: true, TCPEstablished: true}) + if err != nil { + t.Fatalf("Checkpoint: %v", err) + } + if gotPath != "/containers/c1/checkpoint" { + t.Fatalf("path = %q", gotPath) + } + if gotQuery["export"][0] != "true" || gotQuery["tcpestablished"][0] != "true" || gotQuery["leaverunning"][0] != "false" { + t.Fatalf("query = %v", gotQuery) + } + b, _ := os.ReadFile(arch) + if string(b) != "FAKE-TAR-BYTES" || ref.Size != int64(len("FAKE-TAR-BYTES")) { + t.Fatalf("archive=%q size=%d", b, ref.Size) + } +} + +func TestCheckpoint_ErrorWrapsTyped(t *testing.T) { + rt, _ := testRuntime(t, func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusInternalServerError) + _, _ = w.Write([]byte(`{"message":"criu boom","response":500}`)) + }) + _, err := rt.Checkpoint(context.Background(), "c1", runtime.CheckpointSpec{ArchivePath: filepath.Join(t.TempDir(), "a.tar")}) + var cfe *runtime.CheckpointFailedError + if !errors.As(err, &cfe) || !strings.Contains(cfe.Stderr, "criu boom") { + t.Fatalf("want CheckpointFailedError with message, got %v", err) + } +} + +func TestRestore_SendsBodyAndParsesID(t *testing.T) { + var gotPath, gotBody string + var gotQuery map[string][]string + rt, _ := testRuntime(t, func(w http.ResponseWriter, r *http.Request) { + gotPath = r.URL.Path + gotQuery = r.URL.Query() + b, _ := io.ReadAll(r.Body) + gotBody = string(b) + _, _ = w.Write([]byte(`{"Id":"restored-abc","runtime_restore_duration":0}`)) + }) + + dir := t.TempDir() + arch := filepath.Join(dir, "ckpt.tar") + _ = os.WriteFile(arch, []byte("ARCHIVE"), 0o600) + + c, err := rt.Restore(context.Background(), runtime.RestoreSpec{ArchivePath: arch, Name: "ws", TCPEstablished: true}) + if err != nil { + t.Fatalf("Restore: %v", err) + } + if c.ID != "restored-abc" || c.State != runtime.StateRunning { + t.Fatalf("container = %+v", c) + } + if gotPath != "/containers/import/restore" || 
gotQuery["import"][0] != "true" || gotQuery["name"][0] != "ws" { + t.Fatalf("path=%q query=%v", gotPath, gotQuery) + } + if gotBody != "ARCHIVE" { + t.Fatalf("body = %q (archive should be uploaded)", gotBody) + } +} + +func TestRestore_ErrorWrapsTyped(t *testing.T) { + rt, _ := testRuntime(t, func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusInternalServerError) + _, _ = w.Write([]byte(`{"message":"spec.dump missing","response":500}`)) + }) + dir := t.TempDir() + arch := filepath.Join(dir, "ckpt.tar") + _ = os.WriteFile(arch, []byte("x"), 0o600) + _, err := rt.Restore(context.Background(), runtime.RestoreSpec{ArchivePath: arch}) + var rfe *runtime.RestoreFailedError + if !errors.As(err, &rfe) { + t.Fatalf("want RestoreFailedError, got %v", err) + } +} + +func TestBuildQuery(t *testing.T) { + q := buildQuery(runtime.BuildSpec{ + ContextPath: "/ctx", Dockerfile: "/ctx/Dockerfile", Tag: "img:1", + Args: map[string]string{"A": "1"}, Target: "dev", NoCache: true, Platform: "linux/amd64", + }) + if q.Get("dockerfile") != `["Dockerfile"]` { + t.Errorf("dockerfile = %q", q.Get("dockerfile")) + } + if q.Get("t") != "img:1" || q.Get("buildargs") != `{"A":"1"}` || q.Get("target") != "dev" || q.Get("nocache") != "1" || q.Get("platform") != "linux/amd64" { + t.Errorf("query = %v", q) + } +} + +func TestBuildImage_StreamsAndParsesID(t *testing.T) { + const id = "c6f20fb73390b3ee69f99e99b7491af1214c79ab1106a1f7b117f52056eecdee" + var gotBody []byte + var gotDockerfile string + rt, _ := testRuntime(t, func(w http.ResponseWriter, r *http.Request) { + gotDockerfile = r.URL.Query().Get("dockerfile") + gotBody, _ = io.ReadAll(r.Body) + _, _ = io.WriteString(w, `{"stream":"STEP 1/2\n"}`+"\n") + _, _ = io.WriteString(w, `{"stream":"Successfully tagged localhost/img:1\n"}`+"\n") + _, _ = io.WriteString(w, `{"stream":"`+id+`\n"}`+"\n") + }) + + dir := t.TempDir() + _ = os.WriteFile(filepath.Join(dir, "Dockerfile"), []byte("FROM scratch\n"), 0o644) + + events := 
make(chan runtime.BuildEvent, 64) + ref, err := rt.BuildImage(context.Background(), runtime.BuildSpec{ContextPath: dir, Tag: "img:1"}, events) + close(events) + if err != nil { + t.Fatalf("BuildImage: %v", err) + } + if ref.ID != id || !reflect.DeepEqual(ref.Tags, []string{"img:1"}) { + t.Fatalf("ref = %+v", ref) + } + if gotDockerfile != `["Dockerfile"]` { + t.Fatalf("dockerfile param = %q", gotDockerfile) + } + if len(gotBody) == 0 { + t.Fatalf("context tar was not uploaded as body") + } + var logs int + for e := range events { + if e.Kind == runtime.BuildEventLog { + logs++ + } + } + if logs != 3 { + t.Fatalf("expected 3 log events, got %d", logs) + } +} + +func TestParseBuildStream_Error(t *testing.T) { + _, err := parseBuildStream(strings.NewReader(`{"stream":"step\n"}`+"\n"+`{"error":"build kaboom"}`), nil) + if err == nil || !strings.Contains(err.Error(), "build kaboom") { + t.Fatalf("want build error, got %v", err) + } +} diff --git a/runtime/runtime.go b/runtime/runtime.go index 3e01975..0e47673 100644 --- a/runtime/runtime.go +++ b/runtime/runtime.go @@ -85,6 +85,81 @@ type ComposePsSpec struct { WorkingDir string } +// CheckpointRuntime is the optional sub-interface a Runtime implements +// when it can checkpoint a running container to a portable archive +// (process + memory state via CRIU, plus the writable rootfs layer) and +// later restore it — possibly on another host — into a fresh container. +// +// Implemented by runtime/podman (podman container checkpoint --export / +// restore --import). NOT implemented by runtime/docker: docker's restore +// is broken on current containerd-integrated engines (see +// design/checkpoint-restore.md). Engine.Checkpoint / Engine.Restore +// type-assert this and return ErrCheckpointUnsupported when the active +// runtime doesn't satisfy it (or advertises Capabilities().Checkpoint +// == false). 
+type CheckpointRuntime interface { + // Checkpoint writes a self-contained checkpoint archive for a + // running container to spec.ArchivePath. The archive carries the + // CRIU image, the writable rootfs diff, and the config needed to + // restore. With spec.StopAfter the container is stopped/removed + // after the archive is written (the spot-eviction path); otherwise + // it keeps running ("backup" checkpoint). + Checkpoint(ctx context.Context, id string, spec CheckpointSpec) (CheckpointRef, error) + + // Restore re-creates and resumes a container from a checkpoint + // archive, reconstructing its mounts and re-attaching networking. + // Restores into a NEW container (migration), so the source may be + // gone. Returns the new Container handle. + Restore(ctx context.Context, spec RestoreSpec) (*Container, error) +} + +// CheckpointSpec configures CheckpointRuntime.Checkpoint. +type CheckpointSpec struct { + // ArchivePath is the file the export archive is written to. The + // archive is self-contained and node-independent, so cross-node + // restore is just moving this file — point it at durable, + // transferable storage (the workspace volume, object storage). + ArchivePath string + + // StopAfter stops/removes the container after a successful export + // (the spot-eviction path: the node is going away). False keeps the + // container running. + StopAfter bool + + // TCPEstablished requests checkpoint of established TCP connections. + // Required for any container holding a live connection at checkpoint + // time — without it the checkpoint fails. Reconnecting clients + // recover regardless; a persistent connection across a peer-IP + // change on restore is the residual edge. + TCPEstablished bool +} + +// RestoreSpec configures CheckpointRuntime.Restore. The archive is +// self-describing (image, config, mounts, rootfs), so no RunSpec is +// needed — unlike a cold create. +type RestoreSpec struct { + // ArchivePath is the archive a prior Checkpoint wrote. 
+ ArchivePath string + + // Name optionally names the restored container. Empty lets the + // backend choose (or reuse the archived name). + Name string + + // TCPEstablished must match the checkpoint when the archive captured + // established connections. + TCPEstablished bool +} + +// CheckpointRef describes a written checkpoint archive. +type CheckpointRef struct { + // ArchivePath echoes where the archive was written. + ArchivePath string + + // Size is the archive size in bytes — feeds the caller's + // eviction-window / transfer budgeting. Best-effort; 0 if unknown. + Size int64 +} + // Runtime is the container backend. Implementations must be safe for // concurrent use; the engine may issue concurrent Inspect / Exec calls // against the same container. diff --git a/test/integration/podman_checkpoint_restore_test.go b/test/integration/podman_checkpoint_restore_test.go new file mode 100644 index 0000000..f48e321 --- /dev/null +++ b/test/integration/podman_checkpoint_restore_test.go @@ -0,0 +1,141 @@ +//go:build integration && linux + +// End-to-end Engine-level checkpoint/restore against a live Podman. +// +// Unlike runtime/podman/integration_test.go — which drives the Runtime +// directly and proves *memory* resume — this exercises the engine path: +// Engine.Up → Engine.Checkpoint → Engine.Restore, and asserts the part +// that only exists at the engine level: Restore rebuilds a full +// *Workspace. The workspace id is recovered from the devcontainer label +// the archive preserves, the restored container is live, and Exec works +// through the reattached workspace (substituter bound to the live env, +// rootfs back from the archive). Memory resume is covered at the runtime +// level; here the contract under test is the reattach. 
+// +// Linux-only (Podman) and skipped unless PODMAN_SOCKET is set: +// +// PODMAN_SOCKET=unix:///run/podman/podman.sock \ +// go test -tags integration -run Podman -count=1 ./test/integration +package integration + +import ( + "context" + "os" + "path/filepath" + "strings" + "testing" + "time" + + devcontainer "github.com/crunchloop/devcontainer" + "github.com/crunchloop/devcontainer/runtime" + "github.com/crunchloop/devcontainer/runtime/podman" +) + +func newPodmanEngine(t *testing.T) (*devcontainer.Engine, *podman.Runtime) { + t.Helper() + socket := os.Getenv("PODMAN_SOCKET") + if socket == "" { + t.Skip("set PODMAN_SOCKET to run the live Podman engine integration test") + } + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + defer cancel() + rt, err := podman.New(ctx, podman.Options{Socket: socket}) + if err != nil { + t.Skipf("Podman service unavailable at %q: %v", socket, err) + } + if !rt.Capabilities().Checkpoint { + t.Fatalf("Capabilities().Checkpoint is false — libpod API not reachable at %q", socket) + } + eng, err := devcontainer.New(devcontainer.EngineOptions{Runtime: rt}) + if err != nil { + t.Fatalf("New: %v", err) + } + return eng, rt +} + +func TestPodmanEngine_CheckpointRestore_ReattachesWorkspace(t *testing.T) { + if testing.Short() { + t.Skip("integration tests skipped with -short") + } + image := os.Getenv("PODMAN_TEST_IMAGE") + if image == "" { + image = "docker.io/library/alpine:3.20" + } + + eng, rt := newPodmanEngine(t) + + ws := writeWorkspace(t, `{"image": "`+image+`", "containerEnv": {"CKPT_MARKER": "reattach-ok"}}`) + + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancel() + + wsObj, err := eng.Up(ctx, devcontainer.UpOptions{LocalWorkspaceFolder: ws, Recreate: true}) + if err != nil { + t.Fatalf("Up: %v", err) + } + origID := wsObj.ID + t.Cleanup(func() { + _ = eng.Down(context.Background(), wsObj, devcontainer.DownOptions{Remove: true}) + }) + + // Drop a marker into the 
writable rootfs. Podman's --export bundles + // the rootfs layer, so the marker must survive into the restored + // container — a check that restore reconstructed the filesystem, not + // just the process. + if _, err := eng.Exec(ctx, wsObj, devcontainer.ExecOptions{Cmd: []string{"sh", "-c", "echo persisted > /reattach-marker"}}); err != nil { + t.Fatalf("Exec (write marker): %v", err) + } + + arch := filepath.Join(t.TempDir(), "ckpt.tar") + ref, err := eng.Checkpoint(ctx, wsObj, devcontainer.CheckpointOptions{ArchivePath: arch, StopAfter: true, TCPEstablished: true}) + if err != nil { + t.Fatalf("Checkpoint: %v", err) + } + if ref.Size == 0 { + t.Errorf("checkpoint archive is empty (%s)", ref.ArchivePath) + } + + // Migration shape: the source is gone. Remove the original container so + // the restore truly recreates from the archive (and so no stale + // container lingers sharing the devcontainer-id label). + if err := rt.RemoveContainer(ctx, wsObj.Container.ID, runtime.RemoveOptions{Force: true}); err != nil { + t.Fatalf("RemoveContainer (source): %v", err) + } + time.Sleep(2 * time.Second) + + restored, err := eng.Restore(ctx, devcontainer.RestoreOptions{ArchivePath: arch, TCPEstablished: true}) + if err != nil { + t.Fatalf("Restore: %v", err) + } + t.Cleanup(func() { + if restored != nil && restored.Container != nil { + _ = rt.RemoveContainer(context.Background(), restored.Container.ID, runtime.RemoveOptions{Force: true}) + } + }) + + // --- the reattach contract ------------------------------------------ + if restored.ID != origID { + t.Errorf("reattached workspace id = %q, want %q (must be recovered from the preserved %s label)", + restored.ID, origID, devcontainer.LabelDevcontainerID) + } + if restored.Container == nil || restored.Container.ID == "" { + t.Fatalf("restored workspace has no container") + } + if got := restored.Container.Labels[devcontainer.LabelDevcontainerID]; got != string(origID) { + t.Errorf("restored container devcontainer-id label = %q, 
want %q", got, origID) + } + if restored.Container.State != runtime.StateRunning { + t.Errorf("restored container state = %q, want running", restored.Container.State) + } + + // Exec through the reattached workspace: proves the substituter is + // bound to the live container and the rootfs returned with the marker. + res, err := eng.Exec(ctx, restored, devcontainer.ExecOptions{Cmd: []string{"cat", "/reattach-marker"}}) + if err != nil { + t.Fatalf("Exec (read marker) through reattached workspace: %v", err) + } + if res.ExitCode != 0 || !strings.Contains(res.Stdout, "persisted") { + t.Errorf("marker read exit=%d stdout=%q — rootfs/exec not reattached", res.ExitCode, res.Stdout) + } + t.Logf("engine reattach OK: id=%s container=%s archive=%d bytes", restored.ID, restored.Container.ID, ref.Size) +} diff --git a/test/integration/podman_crossnode_test.go b/test/integration/podman_crossnode_test.go new file mode 100644 index 0000000..b59b8aa --- /dev/null +++ b/test/integration/podman_crossnode_test.go @@ -0,0 +1,169 @@ +//go:build integration && linux + +// Cross-node checkpoint/restore: the relocated-pod case. A pod running +// Podman is reclaimed and its devcontainer must resume on a *different* +// node whose Podman store never saw the image. We model "different node" +// as two Podman stores (two hosts) that share only a file path for the +// archive — exactly the production shape: each pod talks to its OWN local +// Podman socket, and the archive travels via PVC/registry (here, a shared +// directory). +// +// This is a TWO-PHASE test, run once per machine, coordinated through a +// shared DCCKPT_XNODE_DIR (must be a path visible to both, e.g. 
an +// OrbStack /Users mount): +// +// # on the SOURCE host (its own Podman store): +// PODMAN_SOCKET=unix:///run/podman/podman.sock DCCKPT_XNODE_DIR=/Users/.../xnode \ +// ./integration_arm64.test -test.run TestPodmanXNode_Checkpoint -test.v +// # on the DESTINATION host (a DIFFERENT, fresh Podman store): +// PODMAN_SOCKET=unix:///run/podman/podman.sock DCCKPT_XNODE_DIR=/Users/.../xnode \ +// ./integration_arm64.test -test.run TestPodmanXNode_Restore -test.v +// +// The destination phase NEVER pulls the image: if restore succeeds and the +// image then exists in its store, the archive proved self-contained. +package integration + +import ( + "context" + "os" + "path/filepath" + "strings" + "testing" + "time" + + devcontainer "github.com/crunchloop/devcontainer" + "github.com/crunchloop/devcontainer/runtime" +) + +const xnodeMarker = "xnode-persisted" + +func xnodeDir(t *testing.T) string { + d := os.Getenv("DCCKPT_XNODE_DIR") + if d == "" { + t.Skip("set DCCKPT_XNODE_DIR (a path shared between both hosts) to run the cross-node test") + } + if err := os.MkdirAll(d, 0o755); err != nil { + t.Fatalf("mkdir xnode dir: %v", err) + } + return d +} + +func xnodeImage() string { + if v := os.Getenv("PODMAN_TEST_IMAGE"); v != "" { + return v + } + return "docker.io/library/alpine:3.20" +} + +// Phase 1 (SOURCE host): bring a devcontainer up, mark its rootfs, and +// checkpoint it to the shared archive dir. Records the container name so +// the destination phase can clear a name collision on rerun. 
+func TestPodmanXNode_Checkpoint(t *testing.T) { + if testing.Short() { + t.Skip("integration tests skipped with -short") + } + dir := xnodeDir(t) + eng, _ := newPodmanEngine(t) + image := xnodeImage() + + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancel() + + // The workspace folder must live on the SHARED path: a devcontainer + // binds LocalWorkspaceFolder into the container, and cross-node restore + // fails unless that bind source exists on the destination host too + // (checkpoint-restore.md §2.4). This models the consumer's workspace + // PVC, reattached on the new node. (A machine-local temp dir here makes + // restore fail with crun "error stat'ing : No such file".) + wsDir := filepath.Join(dir, "ws") + if err := os.MkdirAll(filepath.Join(wsDir, ".devcontainer"), 0o755); err != nil { + t.Fatal(err) + } + if err := os.WriteFile(filepath.Join(wsDir, ".devcontainer", "devcontainer.json"), []byte(`{"image": "`+image+`"}`), 0o644); err != nil { + t.Fatal(err) + } + wsObj, err := eng.Up(ctx, devcontainer.UpOptions{LocalWorkspaceFolder: wsDir, Recreate: true}) + if err != nil { + t.Fatalf("Up: %v", err) + } + // Mark the writable rootfs — must survive into the archive and across hosts. + if _, err := eng.Exec(ctx, wsObj, devcontainer.ExecOptions{Cmd: []string{"sh", "-c", "echo " + xnodeMarker + " > /xnode-marker"}}); err != nil { + t.Fatalf("Exec (write marker): %v", err) + } + + if _, err := eng.Checkpoint(ctx, wsObj, devcontainer.CheckpointOptions{ + ArchivePath: filepath.Join(dir, "single.tar"), StopAfter: true, TCPEstablished: true, + }); err != nil { + t.Fatalf("Checkpoint: %v", err) + } + // Hand the container name to the destination phase (restore re-creates + // it with the archived name; a stale one would collide on rerun). 
+ if err := os.WriteFile(filepath.Join(dir, "name.txt"), []byte(wsObj.Container.Name), 0o644); err != nil { + t.Fatalf("write name.txt: %v", err) + } + t.Logf("cross-node checkpoint written: %s (container %q, workspace %q)", filepath.Join(dir, "single.tar"), wsObj.Container.Name, wsObj.ID) +} + +// Phase 2 (DESTINATION host, a DIFFERENT/fresh Podman store): restore from +// the shared archive WITHOUT ever pulling the image, and assert the +// workspace reattaches, the rootfs marker survived, and the image is now +// present (populated from the archive) — i.e. the archive is self-contained +// and node-independent. +func TestPodmanXNode_Restore(t *testing.T) { + if testing.Short() { + t.Skip("integration tests skipped with -short") + } + dir := xnodeDir(t) + eng, rt := newPodmanEngine(t) + image := xnodeImage() + archive := filepath.Join(dir, "single.tar") + if _, err := os.Stat(archive); err != nil { + t.Skipf("no archive at %s — run TestPodmanXNode_Checkpoint on the source host first", archive) + } + + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancel() + + // Clear any prior restored container (rerun name collision) and the + // image, so "image present after restore" genuinely means "from the + // archive". We deliberately do NOT pull the image here. 
+ if b, err := os.ReadFile(filepath.Join(dir, "name.txt")); err == nil { + if nm := strings.TrimSpace(string(b)); nm != "" { + _ = rt.RemoveContainer(ctx, nm, runtime.RemoveOptions{Force: true}) + } + } + _ = rt.RemoveImage(ctx, image) + + restored, err := eng.Restore(ctx, devcontainer.RestoreOptions{ArchivePath: archive, TCPEstablished: true}) + if err != nil { + t.Fatalf("Restore on fresh store: %v", err) + } + t.Cleanup(func() { + if restored != nil && restored.Container != nil { + _ = rt.RemoveContainer(context.Background(), restored.Container.ID, runtime.RemoveOptions{Force: true}) + } + }) + + if restored.ID == "" { + t.Error("reattached workspace has empty id (devcontainer label not recovered from archive)") + } + if restored.Container == nil || restored.Container.State != runtime.StateRunning { + t.Fatalf("restored container not running: %+v", restored.Container) + } + + // Rootfs traveled: the marker we wrote on the source host is present. + res, err := eng.Exec(ctx, restored, devcontainer.ExecOptions{Cmd: []string{"cat", "/xnode-marker"}}) + if err != nil { + t.Fatalf("Exec (read marker): %v", err) + } + if !strings.Contains(res.Stdout, xnodeMarker) { + t.Errorf("marker = %q, want %q — rootfs did not travel in the archive", res.Stdout, xnodeMarker) + } + + // Self-contained: the image now exists in this store, though we never + // pulled it — it was populated from the archive. 
+ if _, err := rt.InspectImage(ctx, image); err != nil { + t.Errorf("image %q absent after restore (%v) — archive was not self-contained", image, err) + } + t.Logf("cross-node restore OK on fresh store: workspace=%q container=%s marker+image from archive", restored.ID, restored.Container.ID) +} diff --git a/test/integration/podman_project_checkpoint_test.go b/test/integration/podman_project_checkpoint_test.go new file mode 100644 index 0000000..4baf4ac --- /dev/null +++ b/test/integration/podman_project_checkpoint_test.go @@ -0,0 +1,195 @@ +//go:build integration && linux + +// Multi-service project checkpoint/restore through the engine orchestrator +// (Engine.CheckpointProject / RestoreProject) against a live Podman. +// +// Codifies the Phase-0 multi-service spike: two containers sharing a +// compose-project label on a user-defined network — a TCP "server" and a +// "client" that connects to it by name every second and counts successful +// round-trips into /count.txt. We checkpoint the whole project, remove +// both, restore the whole project, and assert (a) every service comes +// back, (b) the devcontainer service reattaches as the Primary workspace +// with its id recovered from the preserved label, and (c) the client's +// counter resumes climbing — proving memory resumed AND the inter-service +// link (service-name DNS over the shared network) re-formed. 
+// +// Linux-only (Podman); skipped unless PODMAN_SOCKET is set: +// +// PODMAN_SOCKET=unix:///run/podman/podman.sock \ +// go test -tags integration -run PodmanProject -count=1 ./test/integration +package integration + +import ( + "context" + "os" + "strconv" + "strings" + "testing" + "time" + + devcontainer "github.com/crunchloop/devcontainer" + "github.com/crunchloop/devcontainer/compose" + "github.com/crunchloop/devcontainer/runtime" + "github.com/crunchloop/devcontainer/runtime/podman" +) + +func TestPodmanProject_CheckpointRestore_MultiService(t *testing.T) { + if testing.Short() { + t.Skip("integration tests skipped with -short") + } + image := imageOrDefault() + + eng, rt := newPodmanEngine(t) + + const ( + project = "dcckpt-itest" + netName = "dcckpt-itest-net" + serverNm = "dcckptserver" + clientNm = "dcckptclient" + clientWID = "ws-dcckpt-client" + ) + ctx, cancel := context.WithTimeout(context.Background(), 6*time.Minute) + defer cancel() + + if _, err := rt.PullImage(ctx, image, nil); err != nil { + t.Fatalf("PullImage: %v", err) + } + + // Fresh network for the project (idempotent across reruns). + if _, err := rt.CreateNetwork(ctx, runtime.NetworkSpec{ + Name: netName, + Labels: map[string]string{compose.LabelComposeProject: project}, + }); err != nil { + t.Fatalf("CreateNetwork: %v", err) + } + t.Cleanup(func() { _ = rt.RemoveNetwork(context.Background(), netName) }) + + // server: a trivial TCP server on :9000. + runService(t, ctx, rt, runtime.RunSpec{ + Image: image, + Name: serverNm, + Networks: []string{netName}, + OverrideCommand: false, + Cmd: []string{"node", "-e", "require('net').createServer(s=>s.end('ok')).listen(9000,()=>console.log('listening'))"}, + Labels: map[string]string{ + compose.LabelComposeProject: project, + compose.LabelComposeService: "server", + }, + }) + + // client: connect to the server by name each second, count successful + // round-trips into /count.txt. 
+ clientScript := "const net=require('net'),fs=require('fs');let n=0;" + + "setInterval(()=>{const s=net.connect(9000,'" + serverNm + "');" + + "s.on('connect',()=>{n++;try{fs.writeFileSync('/count.txt',String(n))}catch(e){}s.end()});" + + "s.on('error',()=>{})},1000);" + client := runService(t, ctx, rt, runtime.RunSpec{ + Image: image, + Name: clientNm, + Networks: []string{netName}, + OverrideCommand: false, + Cmd: []string{"node", "-e", clientScript}, + Labels: map[string]string{ + compose.LabelComposeProject: project, + compose.LabelComposeService: "client", + devcontainer.LabelDevcontainerID: clientWID, + devcontainer.LabelLocalWorkspaceFolder: "/work", + }, + }) + + // Let the link establish. + time.Sleep(8 * time.Second) + before := readCount(t, ctx, rt, client.ID) + if before <= 0 { + t.Fatalf("client counter not advancing before checkpoint (%d) — link never formed", before) + } + + // Build the anchor workspace from the client (the devcontainer service). + details, err := rt.InspectContainer(ctx, client.ID) + if err != nil { + t.Fatalf("InspectContainer(client): %v", err) + } + ws := &devcontainer.Workspace{Container: details} + + dir := t.TempDir() + ref, err := eng.CheckpointProject(ctx, ws, devcontainer.ProjectCheckpointOptions{ + ArchiveDir: dir, StopAfter: true, TCPEstablished: true, + }) + if err != nil { + t.Fatalf("CheckpointProject: %v", err) + } + if len(ref.Services) != 2 { + t.Fatalf("checkpointed %d services, want 2 (%+v)", len(ref.Services), ref.Services) + } + + // Migration shape: both sources gone. Network stays so restore can + // re-attach to it. 
+ for _, nm := range []string{clientNm, serverNm} { + if err := rt.RemoveContainer(ctx, nm, runtime.RemoveOptions{Force: true}); err != nil { + t.Fatalf("RemoveContainer(%s): %v", nm, err) + } + } + time.Sleep(2 * time.Second) + + pr, err := eng.RestoreProject(ctx, devcontainer.ProjectRestoreOptions{ArchiveDir: dir, TCPEstablished: true}) + if err != nil { + t.Fatalf("RestoreProject: %v", err) + } + t.Cleanup(func() { + for _, c := range pr.Services { + if c != nil { + _ = rt.RemoveContainer(context.Background(), c.ID, runtime.RemoveOptions{Force: true}) + } + } + }) + + if pr.Services["server"] == nil || pr.Services["client"] == nil { + t.Fatalf("restored services = %v, want server+client", pr.Services) + } + if pr.Primary == nil || pr.Primary.ID != clientWID { + t.Fatalf("Primary = %+v, want reattached workspace id %q", pr.Primary, clientWID) + } + + // Link + memory resumed: the counter climbs past its pre-checkpoint value. + time.Sleep(6 * time.Second) + after := readCount(t, ctx, rt, pr.Services["client"].ID) + if after <= before { + t.Fatalf("counter did not resume/relink: before=%d after=%d (cold restart or broken DNS)", before, after) + } + t.Logf("multi-service C/R OK: before=%d after=%d primary=%s services=%d", before, after, pr.Primary.ID, len(pr.Services)) +} + +func imageOrDefault() string { + // Reuse the runtime-level test's override knob. 
+ if v := os.Getenv("PODMAN_TEST_IMAGE"); v != "" { + return v + } + return "docker.io/library/node:20-slim" +} + +func runService(t *testing.T, ctx context.Context, rt *podman.Runtime, spec runtime.RunSpec) *runtime.Container { + t.Helper() + _ = rt.RemoveContainer(ctx, spec.Name, runtime.RemoveOptions{Force: true}) + c, err := rt.RunContainer(ctx, spec) + if err != nil { + t.Fatalf("RunContainer(%s): %v", spec.Name, err) + } + t.Cleanup(func() { _ = rt.RemoveContainer(context.Background(), spec.Name, runtime.RemoveOptions{Force: true}) }) + if err := rt.StartContainer(ctx, c.ID); err != nil { + t.Fatalf("StartContainer(%s): %v", spec.Name, err) + } + return c +} + +func readCount(t *testing.T, ctx context.Context, rt *podman.Runtime, id string) int { + t.Helper() + res, err := rt.ExecContainer(ctx, id, runtime.ExecOptions{Cmd: []string{"cat", "/count.txt"}}) + if err != nil || res.ExitCode != 0 { + return 0 + } + n, err := strconv.Atoi(strings.TrimSpace(res.Stdout)) + if err != nil { + return 0 + } + return n +} From a63b789884f7d580125a58cee3a913b617402d9f Mon Sep 17 00:00:00 2001 From: bilby91 Date: Sun, 21 Jun 2026 21:36:09 -0300 Subject: [PATCH 2/4] address PR review: harden Podman checkpoint/restore + fix CI MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CodeRabbit review on #98. Substantive fixes: - runtime/podman/checkpoint.go: write the archive 0600 (it carries process memory) instead of os.Create's umask default; delete a partial archive if the copy/close fails; reject an empty container id from libpod restore. - runtime/podman/build.go: CloseWithError the pipe reader when the /build request errors, so the tarDir goroutine doesn't leak parked on the pipe. - checkpoint.go: Engine.Restore now errors if the restored container has no dev.containers.id label (was silently returning an empty Workspace.ID). 
- checkpoint_project.go: reject duplicate service names (would overwrite archives); write project.json atomically via temp+rename so a present manifest is always complete; validate manifest archive entries are plain basenames (block path traversal on restore). - ci.yml: criu was dropped from Ubuntu 24.04 (the install failed with "no installation candidate"); pin ubuntu-22.04 (jammy ships criu in universe) and have the setup step skip the job green when criu can't be installed or `criu check` fails, so it's a ready harness rather than a red X. - design/podman-backend.md: §5.1/§5.2 still presented the CLI plan as final; marked superseded by the libpod-REST implementation. - multi-service integration test: clear a leftover network before CreateNetwork so reruns after an interrupted run are idempotent. Skipped: the unpinned-actions flag — matches the repo-wide @vN convention (every existing job uses it); out of scope for this PR. Re-validated on real Podman+CRIU (OrbStack): all integration tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) --- .github/workflows/ci.yml | 51 +++++++++++-------- checkpoint.go | 3 ++ checkpoint_project.go | 25 ++++++++- design/podman-backend.md | 23 +++++++-- runtime/podman/build.go | 3 ++ runtime/podman/checkpoint.go | 22 +++++--- .../podman_project_checkpoint_test.go | 4 +- 7 files changed, 96 insertions(+), 35 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 6e38de8..a0536b5 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -132,13 +132,16 @@ jobs: # for real. The cross-node test (TestPodmanXNode_*) needs two hosts, so # it skips here (no DCCKPT_XNODE_DIR) — run it on two machines by hand. # - # continue-on-error: whether GitHub's hosted ubuntu kernel can do CRIU - # checkpoint/restore end-to-end is unverified (criu check can pass while - # a real dump still hits a missing kernel feature). 
Kept non-blocking - # like the darwin VZ job below; drop it once a green run is confirmed, - # else move this job to a self-hosted CRIU-capable runner. + # Runner notes: criu was dropped from Ubuntu 24.04, so this pins + # ubuntu-22.04 (jammy still ships it in universe). Whether the hosted + # kernel can actually checkpoint/restore is not guaranteed, so the setup + # step SKIPS the job green when criu can't be installed or `criu check` + # fails — the job is a ready harness that runs for real wherever CRIU + # works (incl. a future self-hosted runner). continue-on-error is a + # backstop for a dump that fails after criu check passed; drop it once a + # green real run is confirmed. test-integration-podman: - runs-on: ubuntu-latest + runs-on: ubuntu-22.04 needs: [lint, test-linux] continue-on-error: true steps: @@ -147,30 +150,34 @@ jobs: with: go-version: "1.25" cache: true - - name: Install Podman + CRIU + crun + - name: Build gated test binaries run: | - set -euo pipefail - sudo apt-get update -qq - # iptables is REQUIRED: CRIU shells out to iptables-restore to - # lock the network namespace during a dump (without it the dump - # fails with "execvp(iptables-restore) ... No such file"). - sudo apt-get install -y -qq podman criu crun iptables uidmap - podman --version && criu --version && crun --version - - name: criu check (capability gate) - run: sudo criu check - - name: Start Podman API socket + go test -tags=integration -c ./test/integration -o ./int.test + go test -c ./runtime/podman -o ./podman.test + - name: Install Podman + CRIU + crun (skip job if unavailable) + id: setup run: | - set -euo pipefail + set -uo pipefail + sudo add-apt-repository -y universe || true + sudo apt-get update -qq || true + # iptables is REQUIRED: CRIU shells out to iptables-restore to lock + # the netns during a dump (without it: "execvp(iptables-restore) + # ... No such file"). + if ! 
sudo apt-get install -y -qq podman criu crun iptables uidmap; then + echo "::warning::criu/podman not installable on this runner — skipping real checkpoint/restore" + echo "skip=1" >> "$GITHUB_OUTPUT"; exit 0 + fi + if ! sudo criu check; then + echo "::warning::criu check failed on this runner kernel — skipping checkpoint/restore" + echo "skip=1" >> "$GITHUB_OUTPUT"; exit 0 + fi sudo mkdir -p /etc/containers printf '[engine]\nevents_logger="file"\n' | sudo tee /etc/containers/containers.conf >/dev/null sudo systemctl enable --now podman.socket for i in $(seq 1 30); do sudo test -S /run/podman/podman.sock && break; sleep 1; done sudo test -S /run/podman/podman.sock - - name: Build gated test binaries - run: | - go test -tags=integration -c ./test/integration -o ./int.test - go test -c ./runtime/podman -o ./podman.test - name: Run Podman checkpoint/restore tests (root; checkpoint needs it) + if: steps.setup.outputs.skip != '1' env: PODMAN_SOCKET: unix:///run/podman/podman.sock run: | diff --git a/checkpoint.go b/checkpoint.go index 1f9ee9b..3f8f407 100644 --- a/checkpoint.go +++ b/checkpoint.go @@ -132,5 +132,8 @@ func (e *Engine) Restore(ctx context.Context, opts RestoreOptions) (*Workspace, return nil, fmt.Errorf("restore: inspect restored container %s: %w", c.ID, err) } id := WorkspaceID(details.Labels[LabelDevcontainerID]) + if id == "" { + return nil, fmt.Errorf("restore: restored container %s has no %s label — not a devcontainer workspace archive", c.ID, LabelDevcontainerID) + } return e.reattachWorkspace(ctx, details, id, opts.LocalEnv), nil } diff --git a/checkpoint_project.go b/checkpoint_project.go index 3b2d2db..edf951f 100644 --- a/checkpoint_project.go +++ b/checkpoint_project.go @@ -7,6 +7,7 @@ import ( "os" "path/filepath" "sort" + "strings" "github.com/crunchloop/devcontainer/compose" "github.com/crunchloop/devcontainer/runtime" @@ -137,8 +138,16 @@ func (e *Engine) CheckpointProject(ctx context.Context, ws *Workspace, opts Proj } ref := 
ProjectCheckpointRef{Project: project, ArchiveDir: opts.ArchiveDir} + seen := make(map[string]bool, len(containers)) for _, c := range containers { svc := serviceName(c) + // Distinct archive per service; a collision would silently + // overwrite (and restore would collapse the entries). v1 assumes + // one container per service — reject scaled services explicitly. + if seen[svc] { + return ProjectCheckpointRef{}, fmt.Errorf("CheckpointProject: duplicate service name %q in project %q (scaled services are not supported)", svc, project) + } + seen[svc] = true archive := svc + ".tar" cref, err := cr.Checkpoint(ctx, c.ID, runtime.CheckpointSpec{ ArchivePath: filepath.Join(opts.ArchiveDir, archive), @@ -230,7 +239,14 @@ func writeProjectManifest(dir string, ref ProjectCheckpointRef) error { if err != nil { return err } - return os.WriteFile(filepath.Join(dir, projectManifestName), b, 0o644) + // Write-then-rename: rename is atomic, so a present project.json is + // always complete — never a half-written file from an interrupted run. + final := filepath.Join(dir, projectManifestName) + tmp := final + ".tmp" + if err := os.WriteFile(tmp, b, 0o644); err != nil { + return err + } + return os.Rename(tmp, final) } func readProjectManifest(dir string) (ProjectCheckpointRef, error) { @@ -245,6 +261,13 @@ func readProjectManifest(dir string) (ProjectCheckpointRef, error) { if len(ref.Services) == 0 { return ProjectCheckpointRef{}, fmt.Errorf("project manifest has no services") } + // Archive entries are joined onto ArchiveDir at restore; a tampered + // manifest must not escape it. Require a plain basename. 
+ for _, s := range ref.Services { + if s.Archive == "" || s.Archive != filepath.Base(s.Archive) || strings.Contains(s.Archive, "..") { + return ProjectCheckpointRef{}, fmt.Errorf("project manifest has an unsafe archive entry %q", s.Archive) + } + } ref.ArchiveDir = dir return ref, nil } diff --git a/design/podman-backend.md b/design/podman-backend.md index 9e84387..ea826ae 100644 --- a/design/podman-backend.md +++ b/design/podman-backend.md @@ -167,10 +167,19 @@ Podman backend implements it. ### 5.1 Mapping (verified on the bench, 2026-06-19) +> **IMPLEMENTED via libpod REST, not the CLI (2026-06-21).** §2 settled the +> transport as the libpod REST API on the Podman socket (no `podman` CLI +> shell-out). The CLI flags below are kept only as the human-readable +> equivalent of the endpoints the backend actually calls: +> `POST …/libpod/containers/{id}/checkpoint?export=true&tcpestablished=&leaverunning=` +> (response body = the archive) and +> `POST …/libpod/containers/import/restore?import=true&tcpestablished=&name=` +> (archive in the request body → `{"Id":…}`). See `runtime/podman/checkpoint.go`. + ```text -Checkpoint → podman container checkpoint --export \ +Checkpoint ≈ podman container checkpoint --export \ [--tcp-established] [--leave-running=!StopAfter] -Restore → podman container restore --import \ +Restore ≈ podman container restore --import \ [--tcp-established] [--name ] ``` @@ -198,9 +207,13 @@ Either way, keep it behind the backend so the engine only sees > 7→10). The original wedge symptom traced to (a) a forcibly-killed > `podman system service` leaving a stale lock and (b) a test-harness bug > (`pkill -f "system service"` matching the runner's own shell). So -> **CLI for C/R is viable alongside the moby-over-service surface** — no -> need to switch to libpod REST. (REST remains the fallback if heavier -> concurrency ever surfaces contention.) 
+> **CLI for C/R is viable alongside the moby-over-service surface.** +> +> **Superseded (2026-06-21):** although CLI C/R was proven viable, the +> final implementation uses **libpod REST**, not the CLI — to keep the +> backend SDK-first and shell-out-free like the docker backend (§2/§2.1). +> The transport-wedge finding above still stands and is why REST over the +> *same* socket is safe. ### 5.3 Capability probe diff --git a/runtime/podman/build.go b/runtime/podman/build.go index 03f2e9d..9f1f390 100644 --- a/runtime/podman/build.go +++ b/runtime/podman/build.go @@ -107,6 +107,9 @@ func (r *Runtime) BuildImage(ctx context.Context, spec runtime.BuildSpec, events resp, err := r.lp.do(ctx, http.MethodPost, "/build", buildQuery(spec), pr, "application/x-tar") if err != nil { + // Unblock the tarDir goroutine still writing into pw; without + // this it leaks, parked on the full pipe. + _ = pr.CloseWithError(err) return runtime.ImageRef{}, fmt.Errorf("podman build: %w", err) } defer resp.Body.Close() diff --git a/runtime/podman/checkpoint.go b/runtime/podman/checkpoint.go index 6c6dd6b..63120d2 100644 --- a/runtime/podman/checkpoint.go +++ b/runtime/podman/checkpoint.go @@ -3,6 +3,7 @@ package podman import ( "context" "encoding/json" + "errors" "io" "net/http" "net/url" @@ -32,17 +33,23 @@ func (r *Runtime) Checkpoint(ctx context.Context, id string, spec runtime.Checkp return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Stderr: errorBody(resp)} } - f, err := os.Create(spec.ArchivePath) + // 0600: the archive carries the container's process memory — keep it + // private regardless of the caller's umask. 
+ f, err := os.OpenFile(spec.ArchivePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) if err != nil { return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: err} } n, copyErr := io.Copy(f, resp.Body) closeErr := f.Close() - if copyErr != nil { - return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: copyErr} - } - if closeErr != nil { - return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: closeErr} + if copyErr != nil || closeErr != nil { + // Don't leave a truncated archive that a later Restore could + // mistake for a valid one. + _ = os.Remove(spec.ArchivePath) + werr := copyErr + if werr == nil { + werr = closeErr + } + return runtime.CheckpointRef{}, &runtime.CheckpointFailedError{ID: id, Err: werr} } return runtime.CheckpointRef{ArchivePath: spec.ArchivePath, Size: n}, nil } @@ -85,5 +92,8 @@ func (r *Runtime) Restore(ctx context.Context, spec runtime.RestoreSpec) (*runti if err := json.NewDecoder(resp.Body).Decode(&rep); err != nil { return nil, &runtime.RestoreFailedError{ArchivePath: spec.ArchivePath, Err: err} } + if rep.ID == "" { + return nil, &runtime.RestoreFailedError{ArchivePath: spec.ArchivePath, Err: errors.New("libpod restore returned an empty container id")} + } return &runtime.Container{ID: rep.ID, Name: spec.Name, State: runtime.StateRunning}, nil } diff --git a/test/integration/podman_project_checkpoint_test.go b/test/integration/podman_project_checkpoint_test.go index 4baf4ac..338d5a5 100644 --- a/test/integration/podman_project_checkpoint_test.go +++ b/test/integration/podman_project_checkpoint_test.go @@ -55,7 +55,9 @@ func TestPodmanProject_CheckpointRestore_MultiService(t *testing.T) { t.Fatalf("PullImage: %v", err) } - // Fresh network for the project (idempotent across reruns). + // Fresh network for the project. Clear any leftover from an interrupted + // run first so CreateNetwork doesn't fail on a stale name. 
+ _ = rt.RemoveNetwork(ctx, netName) if _, err := rt.CreateNetwork(ctx, runtime.NetworkSpec{ Name: netName, Labels: map[string]string{compose.LabelComposeProject: project}, From 5a8dd70551ced4d6c3a8422de5925670a73f17db Mon Sep 17 00:00:00 2001 From: bilby91 Date: Sun, 21 Jun 2026 22:29:38 -0300 Subject: [PATCH 3/4] ci: gate podman C/R job on podman>=5; skip green on inadequate hosted runners The test-integration-podman job failed for real reasons, not flakiness: Ubuntu 24.04 dropped criu; 22.04 ships podman 3.4.4 whose OCI runtime reports "does not support checkpoint/restore" (and 3.x predates the libpod v5 API this backend targets). criu check passing wasn't a sufficient gate. Now the job always compiles the gated tests (build step), and only runs them when the host is genuinely capable (criu + criu check + podman>=5), skipping green with a warning otherwise. Dropped continue-on-error so a failure on a capable (self-hosted) runner is honest. Real C/R is validated locally on podman 5.x + criu (OrbStack). Co-Authored-By: Claude Opus 4.8 (1M context) --- .github/workflows/ci.yml | 35 ++++++++++++++++++++--------------- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index a0536b5..4a0772d 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -132,18 +132,19 @@ jobs: # for real. The cross-node test (TestPodmanXNode_*) needs two hosts, so # it skips here (no DCCKPT_XNODE_DIR) — run it on two machines by hand. # - # Runner notes: criu was dropped from Ubuntu 24.04, so this pins - # ubuntu-22.04 (jammy still ships it in universe). Whether the hosted - # kernel can actually checkpoint/restore is not guaranteed, so the setup - # step SKIPS the job green when criu can't be installed or `criu check` - # fails — the job is a ready harness that runs for real wherever CRIU - # works (incl. a future self-hosted runner). 
continue-on-error is a - # backstop for a dump that fails after criu check passed; drop it once a - # green real run is confirmed. + # Runner reality (measured): no GitHub-hosted runner can actually run + # these — Ubuntu 24.04 dropped criu, and 22.04 ships podman 3.4.4 whose + # OCI runtime reports "does not support checkpoint/restore" (3.x also + # predates the libpod v5 API the backend targets). So this job always + # COMPILES the gated tests (catching API/signature breaks every push) and + # only RUNS them when the host is genuinely capable: criu + `criu check` + # good + podman >= 5. Hosted runners don't meet that, so the job skips + # green with a warning. Real execution is local on podman 5.x + criu + # (OrbStack — see design/private notes) and runs here automatically if + # pointed at a self-hosted podman-5 + criu runner. test-integration-podman: runs-on: ubuntu-22.04 needs: [lint, test-linux] - continue-on-error: true steps: - uses: actions/checkout@v6 - uses: actions/setup-go@v6 @@ -154,25 +155,29 @@ jobs: run: | go test -tags=integration -c ./test/integration -o ./int.test go test -c ./runtime/podman -o ./podman.test - - name: Install Podman + CRIU + crun (skip job if unavailable) + - name: Probe Podman + CRIU (skip unless the host can checkpoint) id: setup run: | set -uo pipefail sudo add-apt-repository -y universe || true sudo apt-get update -qq || true # iptables is REQUIRED: CRIU shells out to iptables-restore to lock - # the netns during a dump (without it: "execvp(iptables-restore) - # ... No such file"). + # the netns during a dump. if ! sudo apt-get install -y -qq podman criu crun iptables uidmap; then - echo "::warning::criu/podman not installable on this runner — skipping real checkpoint/restore" + echo "::warning::podman/criu not installable — skipping real C/R run" + echo "skip=1" >> "$GITHUB_OUTPUT"; exit 0 + fi + pmaj=$(podman version --format '{{.Client.Version}}' 2>/dev/null | cut -d. 
-f1) + if [ "${pmaj:-0}" -lt 5 ]; then + echo "::warning::podman ${pmaj:-?}.x < 5 (backend targets the libpod v5 API) — skipping real C/R run; validated locally on podman 5.x" echo "skip=1" >> "$GITHUB_OUTPUT"; exit 0 fi if ! sudo criu check; then - echo "::warning::criu check failed on this runner kernel — skipping checkpoint/restore" + echo "::warning::criu check failed on this runner kernel — skipping real C/R run" echo "skip=1" >> "$GITHUB_OUTPUT"; exit 0 fi sudo mkdir -p /etc/containers - printf '[engine]\nevents_logger="file"\n' | sudo tee /etc/containers/containers.conf >/dev/null + printf '[engine]\nevents_logger="file"\nruntime="crun"\n' | sudo tee /etc/containers/containers.conf >/dev/null sudo systemctl enable --now podman.socket for i in $(seq 1 30); do sudo test -S /run/podman/podman.sock && break; sleep 1; done sudo test -S /run/podman/podman.sock From fa74dcc15eb6d5fec77ef3ebd4592067e33693f4 Mon Sep 17 00:00:00 2001 From: bilby91 Date: Sun, 21 Jun 2026 22:48:05 -0300 Subject: [PATCH 4/4] ci: run podman checkpoint/restore for real in a privileged container MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The hosted runner's apt podman can't do C/R (24.04 has no criu; 22.04's podman 3.4.4 can't checkpoint and predates the libpod v5 API), so the previous job could only skip. Instead bring a modern stack in a container: build the gated tests on the runner (static, CGO_ENABLED=0) for compile coverage, then run them inside quay.io/podman/stable (podman 5.x + crun + criu) launched --privileged --cgroupns=host so CRIU can drive the runner's kernel. .github/scripts/podman-cr.sh smoke-tests an actual checkpoint and skips green with the real reason if the runner can't (e.g. cgroup-freezer perms), so a capable runner runs for real and an incapable one stays green. 
Validated in the equivalent topology (Linux VM → Docker → modern-podman
container → CRIU) on a criu-capable kernel: all four integration tests
pass (checkpoint/restore, buildah build, engine reattach, multi-service
project).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .github/scripts/podman-cr.sh | 44 ++++++++++++++++++++++
 .github/workflows/ci.yml     | 72 ++++++++++++------------------------
 2 files changed, 67 insertions(+), 49 deletions(-)
 create mode 100755 .github/scripts/podman-cr.sh

diff --git a/.github/scripts/podman-cr.sh b/.github/scripts/podman-cr.sh
new file mode 100755
index 0000000..d68a5af
--- /dev/null
+++ b/.github/scripts/podman-cr.sh
@@ -0,0 +1,44 @@
+#!/usr/bin/env bash
+# Run the gated Podman checkpoint/restore tests inside a modern-podman
+# container (invoked by ci.yml's test-integration-podman job, which docker-runs
+# this privileged + --cgroupns=host so CRIU can use the runner's kernel).
+#
+# The hosted runner's own apt podman is unusable (24.04 has no criu; 22.04
+# ships podman 3.4.4 whose runtime can't checkpoint and predates the libpod
+# v5 API), so we bring podman 5.x + crun + criu via the container image and
+# only need the runner for its kernel + Docker.
+#
+# Skips GREEN (exit 0 + ::warning::) if this runner can't actually
+# checkpoint — e.g. the nested cgroup freezer is not permitted. Runs the
+# tests for real (failing red) only once a checkpoint smoke test proves the
+# environment is capable. Real C/R is also validated locally on podman 5.x +
+# criu (OrbStack).
+set -uo pipefail
+
+dnf install -y -q criu iptables >/dev/null 2>&1 \
+  || { echo "::warning::criu install failed in container — skipping C/R run"; exit 0; }
+echo "stack: $(podman --version) / $(criu --version | head -1) / $(crun --version | head -1)"
+
+criu check || { echo "::warning::criu check failed on this runner kernel — skipping C/R run"; exit 0; }
+
+mkdir -p /etc/containers /run/podman
+printf '[engine]\nevents_logger="file"\nruntime="crun"\n' > /etc/containers/containers.conf
+podman system service --time=0 unix:///run/podman/podman.sock &
+for _ in $(seq 1 30); do [ -S /run/podman/podman.sock ] && break; sleep 1; done
+test -S /run/podman/podman.sock || { echo "::warning::podman service socket did not come up — skipping"; exit 0; }
+
+# Capability smoke test: can this runner actually freeze + dump a container?
+# Nested CRIU frequently can't ("Unable to freeze tasks: Operation not
+# permitted"). If it can't, skip green with the real reason rather than fail.
+podman run -d --name smoke docker.io/library/alpine:3.20 sleep 600 >/dev/null
+sleep 2
+if ! podman container checkpoint smoke >/tmp/ckpt.log 2>&1; then
+  echo "::warning::this runner cannot checkpoint a container (likely cgroup freezer perms): $(tail -1 /tmp/ckpt.log) — skipping. Real C/R is validated locally on podman 5.x + criu."
+  exit 0
+fi
+podman rm -f smoke >/dev/null 2>&1 || true
+echo "checkpoint smoke passed — running the gated tests for real"
+
+export PODMAN_SOCKET=unix:///run/podman/podman.sock
+/w/podman.test -test.run TestIntegration -test.v -test.timeout 15m
+/w/int.test -test.run '^TestPodman' -test.v -test.timeout 15m
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 4a0772d..33a0b00 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -127,23 +127,21 @@ jobs:
 
   # Real Podman + CRIU checkpoint/restore. The runtime/podman backend and
   # the Engine checkpoint/restore + project-orchestrator paths only execute
-  # against a live Podman socket with CRIU; their tests skip everywhere
-  # else (PODMAN_SOCKET-gated). This job installs that stack and runs them
-  # for real. The cross-node test (TestPodmanXNode_*) needs two hosts, so
-  # it skips here (no DCCKPT_XNODE_DIR) — run it on two machines by hand.
+  # against a live Podman socket with CRIU (PODMAN_SOCKET-gated). The
+  # cross-node test (TestPodmanXNode_*) needs two hosts, so it skips here
+  # (no DCCKPT_XNODE_DIR) — run it on two machines by hand.
   #
-  # Runner reality (measured): no GitHub-hosted runner can actually run
-  # these — Ubuntu 24.04 dropped criu, and 22.04 ships podman 3.4.4 whose
-  # OCI runtime reports "does not support checkpoint/restore" (3.x also
-  # predates the libpod v5 API the backend targets). So this job always
-  # COMPILES the gated tests (catching API/signature breaks every push) and
-  # only RUNS them when the host is genuinely capable: criu + `criu check`
-  # good + podman >= 5. Hosted runners don't meet that, so the job skips
-  # green with a warning. Real execution is local on podman 5.x + criu
-  # (OrbStack — see design/private notes) and runs here automatically if
-  # pointed at a self-hosted podman-5 + criu runner.
+  # Why a container: the hosted runner's apt podman is unusable (24.04 has
+  # no criu; 22.04's podman 3.4.4 can't checkpoint and predates the libpod
+  # v5 API). So we build the gated tests on the runner (compile coverage,
+  # static CGO_ENABLED=0 so they run on Fedora), then run them INSIDE a
+  # modern-podman container (podman 5.x + crun + criu) that's privileged +
+  # --cgroupns=host so CRIU can drive the runner's kernel. The script
+  # smoke-tests an actual checkpoint first and skips green (with the real
+  # reason) if this runner can't — e.g. nested cgroup-freezer perms — so a
+  # capable runner runs for real while an incapable one stays green.
   test-integration-podman:
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     needs: [lint, test-linux]
     steps:
       - uses: actions/checkout@v6
@@ -151,44 +149,20 @@ jobs:
         with:
           go-version: "1.25"
           cache: true
-      - name: Build gated test binaries
+      - name: Build gated test binaries (static; compile coverage + run in container)
+        env:
+          CGO_ENABLED: "0"
         run: |
           go test -tags=integration -c ./test/integration -o ./int.test
           go test -c ./runtime/podman -o ./podman.test
-      - name: Probe Podman + CRIU (skip unless the host can checkpoint)
-        id: setup
-        run: |
-          set -uo pipefail
-          sudo add-apt-repository -y universe || true
-          sudo apt-get update -qq || true
-          # iptables is REQUIRED: CRIU shells out to iptables-restore to lock
-          # the netns during a dump.
-          if ! sudo apt-get install -y -qq podman criu crun iptables uidmap; then
-            echo "::warning::podman/criu not installable — skipping real C/R run"
-            echo "skip=1" >> "$GITHUB_OUTPUT"; exit 0
-          fi
-          pmaj=$(podman version --format '{{.Client.Version}}' 2>/dev/null | cut -d. -f1)
-          if [ "${pmaj:-0}" -lt 5 ]; then
-            echo "::warning::podman ${pmaj:-?}.x < 5 (backend targets the libpod v5 API) — skipping real C/R run; validated locally on podman 5.x"
-            echo "skip=1" >> "$GITHUB_OUTPUT"; exit 0
-          fi
-          if ! sudo criu check; then
-            echo "::warning::criu check failed on this runner kernel — skipping real C/R run"
-            echo "skip=1" >> "$GITHUB_OUTPUT"; exit 0
-          fi
-          sudo mkdir -p /etc/containers
-          printf '[engine]\nevents_logger="file"\nruntime="crun"\n' | sudo tee /etc/containers/containers.conf >/dev/null
-          sudo systemctl enable --now podman.socket
-          for i in $(seq 1 30); do sudo test -S /run/podman/podman.sock && break; sleep 1; done
-          sudo test -S /run/podman/podman.sock
-      - name: Run Podman checkpoint/restore tests (root; checkpoint needs it)
-        if: steps.setup.outputs.skip != '1'
-        env:
-          PODMAN_SOCKET: unix:///run/podman/podman.sock
+      - name: Checkpoint/restore in a modern-podman privileged container
         run: |
-          set -euo pipefail
-          sudo -E ./podman.test -test.run TestIntegration -test.v -test.timeout 15m
-          sudo -E ./int.test -test.run '^TestPodman' -test.v -test.timeout 15m
+          docker run --rm --privileged --cgroupns=host \
+            --security-opt seccomp=unconfined \
+            --security-opt apparmor=unconfined \
+            --security-opt label=disable \
+            -v "$PWD":/w -w /w \
+            quay.io/podman/stable bash /w/.github/scripts/podman-cr.sh
 
   # Integration tests against a live Apple `container` daemon.
   #