A Workspace is durable, versioned, forkable agent state that lives independent
of any single sandbox. It is not a CSI PersistentVolume; the rationale is in
docs/adr/0002-workspace-not-csi.md. The declarative model (the Workspace and
WorkspaceRevision CRDs, the revision DAG, retention, lineage) is documented by
the API types in api/v1alpha1/workspace_types.go. This page documents slice 2:
how a sandbox binds to a workspace and how its /workspace tree moves in and out
over the content-addressed store.
Binding a sandbox to a workspace
A Sandbox opts into workspace state with spec.workspaceRef:
```yaml
apiVersion: mitos.run/v1
kind: Sandbox
metadata:
  name: agent-session-7
spec:
  source:
    poolRef:
      name: python-pool
  workspaceRef:
    name: project-acme
```

A sandbox without workspaceRef is unchanged: its /workspace is ephemeral, and
no hydrate or dehydrate runs. A sandbox with workspaceRef participates in the
lifecycle below.
The data path
```text
start (activate)                          terminate / release
----------------                          -------------------
Workspace.status.head                     guest /workspace
     |                                         |
     v  resolve head revision                  v  TarDir (vsock, allowlisted)
WorkspaceRevision.contentManifest         tar bytes
     |                                         |
     v  cas.Materialize -> tar                 v  strip secret excludes
tar bytes                                 cas.PutSnapshot -> digest
     |                                         |
     v  UntarDir (vsock, sanitized)            v  create WorkspaceRevision
guest /workspace                          {fromClaim, contentManifest, Pending}
                                               |
                                               v  Workspace controller commits
                                          head advances
```

- Hydrate on start. When a bound claim reaches Ready, the claim reconciler resolves the Workspace, reads status.head, loads that revision’s contentManifest, and hydrates it into the sandbox /workspace. An empty workspace (no committed head) starts with an empty /workspace. Hydration runs exactly once per claim (an annotation guards against a requeue re-hydrating over in-sandbox edits); a transient transfer error requeues without failing the Ready claim.
- Dehydrate on terminate. When a bound claim terminates (lifetime/idle expiry or deletion), the reconciler dehydrates the sandbox /workspace BEFORE reaping the VM (the guest must still be alive to tar its workspace). It creates a new WorkspaceRevision{spec.workspaceRef, source.fromClaim=<claim>, contentManifest=<digest>, phase Pending}, and the Workspace controller commits it (Pending -> Committed) and advances status.head. The operation is idempotent: a claim dehydrated by a lifetime-expiry terminate is not dehydrated again on the subsequent delete.
The bulk transfer primitive
The guest agent serves two vsock ops, mirroring the ReadFile/WriteFile
message pattern:
- TarDir(path): tars a directory and returns the tar bytes. The path is restricted to a workspace allowlist (only /workspace and paths under it; never / or any secret/token path). Symlinks and other non-regular entries are skipped, so a restored symlink can never re-introduce an escape.
- UntarDir(path, tar): extracts a tar into the target, sanitizing every member name against traversal (no absolute paths, no .. escape outside the target) and refusing any non-regular member.
The tar is buffered whole on both ends, bounded by vsock.MaxTarBytes with a
matching vsock line buffer. A streaming (chunked) transfer for very large
workspaces is a later slice.
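The member-name check that UntarDir performs can be sketched in Go as follows. This is a minimal illustration, not the guest agent's code. safeJoin and checkType are hypothetical helper names. Following the text, the sketch admits only regular files, so it assumes parent directories are created from file paths rather than from directory entries.

```go
package main

import (
	"archive/tar"
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin maps a tar member name to a path under target. It refuses
// empty and absolute names, and any name whose cleaned form climbs out
// of target through "..".
func safeJoin(target, name string) (string, error) {
	if name == "" || strings.HasPrefix(name, "/") || filepath.IsAbs(name) {
		return "", fmt.Errorf("member %q: empty or absolute name refused", name)
	}
	cleaned := filepath.Clean(name)
	if cleaned == "." || cleaned == ".." ||
		strings.HasPrefix(cleaned, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("member %q: escapes the target", name)
	}
	return filepath.Join(target, cleaned), nil
}

// checkType refuses every non-regular member (symlinks, hard links,
// devices, FIFOs, directories), so an extracted tree can never contain a
// link that later resolves outside the target.
func checkType(typeflag byte) error {
	if typeflag != tar.TypeReg {
		return fmt.Errorf("non-regular member type %q refused", typeflag)
	}
	return nil
}
```

For example, safeJoin("/workspace", "a/../../etc/passwd") returns an error because it cleans to ../etc/passwd. safeJoin("/workspace", "src/./main.go") yields /workspace/src/main.go.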
The host helpers in internal/workspace compose the primitive with the
content-addressed store (internal/cas):
- Dehydrate(ctx, agent, store, excludePaths): tars /workspace, strips the exclude list, unpacks to a temp dir, and stores it with store.PutSnapshot; returns the manifest digest. An unchanged tree dedups to the same digest (content addressing).
- Hydrate(ctx, agent, store, manifest): materializes the manifest, tars it, and extracts it into /workspace with UntarDir.
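A deliberately simplified manifest shows why an unchanged tree dedups to the same digest. The format here is an assumption, not the internal/cas format. The sketch hashes each whole file, whereas the real store records per-file chunk-digest lists. It then hashes a path-sorted listing, so the digest depends only on the tree's paths and bytes. manifestDigest is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// manifestDigest is a hypothetical, simplified content address for a tree:
// hash each file's bytes, then hash a path-sorted listing of path -> digest.
// The result depends only on the paths and bytes, never on walk order.
func manifestDigest(tree map[string][]byte) string {
	paths := make([]string, 0, len(tree))
	for p := range tree {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map order is random; sorting makes the listing canonical

	h := sha256.New()
	for _, p := range paths {
		sum := sha256.Sum256(tree[p])
		// One "path NUL hexdigest LF" record per file.
		fmt.Fprintf(h, "%s\x00%s\n", p, hex.EncodeToString(sum[:]))
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

Two trees with the same paths and bytes, built in any order, give the same 64-character hex digest. Changing one byte of one file gives a different digest, and therefore new revision content.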
Wiring the transport: husk delegation
The controller is NOT on the node and cannot reach the guest vsock or the node
CAS, so in husk mode it DELEGATES the hydrate/dehydrate to the node component that
owns both: the husk-stub. The husk pod runs the VM, owns its vsock, and mounts the
node CAS (<dataDir>/cas) read-write. The stub serves two control ops over the
SAME mTLS control channel that already carries activate and the fork-snapshot
ops:
- dehydrate-workspace(excludePaths, capturePaths, parentManifestDigest): runs the guest vsock TarDir over /workspace, stores the content-addressed chunks plus manifest into the node CAS, and returns the manifest digest. It reuses internal/workspace.Dehydrate (the KVM-proven tar round trip); it does not reimplement tar or CAS. Secret/credential paths are excluded per the no-secrets-in-revisions policy. When parentManifestDigest is set (a {diff: true} terminate), it ALSO computes the content-hash diff of the new revision against that parent and returns it alongside the digest. The diff is computed from the two MANIFESTS (path -> chunk-digest lists) in the node CAS, not the chunk bytes, reusing internal/workspace.DiffManifests. The diff must run here because the controller is off-node and cannot read either node-CAS manifest. Fail-closed: it requires an active VM and a configured node CAS, and never returns content bytes or secrets in an error.
- hydrate-workspace(manifestDigest): reads the manifest plus chunks from the node CAS and runs internal/workspace.Hydrate to UntarDir it into /workspace.
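A manifest-only diff needs nothing except the two path -> chunk-digest maps, which is why it can run on the node next to the CAS. The Go sketch below follows the stated rules: added, removed, and modified-by-chunk-digest, with a rename showing up as a remove plus an add. Manifest, DiffSummary and diffManifests are illustrative names only, not the signature of internal/workspace.DiffManifests.

```go
package main

import "sort"

// Manifest is a hypothetical stand-in: workspace-relative path -> ordered chunk digests.
type Manifest map[string][]string

// DiffSummary holds the added, removed and modified file names; the counts
// are simply the lengths of the three slices.
type DiffSummary struct {
	Added, Removed, Modified []string
}

func sameChunks(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// diffManifests compares two manifests by chunk digests alone and never
// touches chunk bytes. It is a content diff, not rename-aware.
func diffManifests(parent, next Manifest) DiffSummary {
	var d DiffSummary
	for path, chunks := range next {
		old, ok := parent[path]
		if !ok {
			d.Added = append(d.Added, path)
		} else if !sameChunks(old, chunks) {
			d.Modified = append(d.Modified, path)
		}
	}
	for path := range parent {
		if _, ok := next[path]; !ok {
			d.Removed = append(d.Removed, path)
		}
	}
	// Map iteration order is random; sort for a stable summary.
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Modified)
	return d
}
```

Consider diffing {"old.txt": [c3]} against {"new.txt": [c3]}. The result reports new.txt as added and old.txt as removed, even though the bytes are identical.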
The controller dials the claim’s husk pod control channel
(DehydrateWorkspaceOnHusk / HydrateWorkspaceOnHusk, mirroring the fork
ForkSnapshotOnHusk path) and STILL owns the WorkspaceRevision commit + head
advance once the stub returns the manifest digest (and the optional diff
summary). The delegation is gated by EnableHuskPods; the default
hydrate/dehydrate path AND the {diff: true} path self-wire to the husk control
op, so the previous “transport is not wired” seam no longer fires in husk mode for
hydrate, dehydrate, or diff. The raw-forkd path has no in-controller transport and
keeps the documented seam (it reads the manifests in-controller for the diff).
The {git} rendezvous push is currently BEST-EFFORT on the husk path: reading the
spec.git.paths content out of the node CAS to the controller is not yet wired, so
in husk mode the push is logged and skipped rather than failing the terminate, and
the CRITICAL path (revision commit + head advance + fork-sees-state) is never
blocked. The cluster e2e treats git push as best-effort. Fully wiring the husk
{git} push (a node-CAS read op that returns the repo-paths content to the
controller, which does the credentialed push so the credential stays in the
controller and never reaches the pod) is a documented follow-up. The raw-forkd
{git} path is fully wired.
Single-writer-per-workspace
A Workspace is bound to at most one active claim at a time. A claim referencing
a workspace already bound to another active claim PENDS with the Ready condition
reason WorkspaceBusy and retries; it acquires no VM until the first claim
releases the workspace. This keeps two sandboxes from racing to dehydrate the same
workspace into divergent heads.
Outputs: capturing only what matters, with a diff
By default the dehydrate-on-terminate captures the whole /workspace tree into
the new revision. A sandbox narrows and enriches that capture with
spec.lifetime.onTerminate.outputs:
```yaml
apiVersion: mitos.run/v1
kind: Sandbox
metadata:
  name: agent-session-7
spec:
  source:
    poolRef: { name: python-pool }
  workspaceRef: { name: project-acme }
  lifetime:
    onTerminate:
      outputs:
        - { path: /workspace/dist }   # capture only this subtree
        - { diff: true }              # record the diff vs the parent head
        - { git: { remote: rendezvous, branch: "attempt/{{.name}}" } }
```

- A {path} output narrows the captured revision to that /workspace subtree. With any path output set, only the union of those subtrees enters the revision; with no path output, the whole workspace is captured (the default). The filter is a prefix match on whole components of the workspace-relative file names, so dist captures dist/app.js but never distractor/x.txt.
- A {diff: true} output records, on WorkspaceRevision.status.diffSummary, a content-hash diff of the new revision against the workspace head that preceded it: the added, removed, and modified file names plus counts. Modified means a file present in both whose chunk digests differ (its content changed). An unchanged tree diffs to empty. This is a content diff, not rename-aware: a rename shows as a delete plus an add on the workspace side (git handles renames on the repo-paths side).

The secret exclude list still applies to the captured set, so outputs never widen what a revision may carry.
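The capture rule has three parts: take the union of the path outputs, match prefixes on whole components, and always apply the secret excludes. It can be sketched as a pure function over workspace-relative names. captureSet and underPrefix are hypothetical names. The sketch makes three assumptions: path outputs have already been made relative to /workspace (so /workspace/dist becomes dist), an empty prefix stands for the whole workspace, and exclude entries match from the workspace root.

```go
package main

import (
	"sort"
	"strings"
)

// underPrefix reports whether rel equals prefix or lies beneath it, matching
// whole path components: "dist" covers "dist/app.js" but not "distractor/x.txt".
func underPrefix(rel, prefix string) bool {
	prefix = strings.Trim(prefix, "/")
	if prefix == "" {
		return true // the workspace root covers everything
	}
	return rel == prefix || strings.HasPrefix(rel, prefix+"/")
}

// captureSet returns the workspace-relative files a revision would carry:
// the union of the capture prefixes (every file when there are none), minus
// anything under a secret exclude, which always wins.
func captureSet(files, capture, excludes []string) []string {
	var out []string
	for _, f := range files {
		keep := len(capture) == 0
		for _, c := range capture {
			if underPrefix(f, c) {
				keep = true
				break
			}
		}
		for _, x := range excludes {
			if underPrefix(f, x) {
				keep = false
				break
			}
		}
		if keep {
			out = append(out, f)
		}
	}
	sort.Strings(out)
	return out
}
```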
Git rendezvous: fork-and-merge through git
A {git} output pushes the workspace repo paths to a rendezvous remote on a
per-attempt branch. On terminate, for each {git} output the controller resolves
the workspace spec.git.paths content from the just-dehydrated revision,
materializes it into a temporary worktree, makes one deterministic commit, and
pushes it to the output’s remote on a branch rendered from the output’s
branch template (a text/template over {{.name}}, the claim name, defaulting
to attempt/<name>). The push is recorded on
WorkspaceRevision.status.gitPushes (the branch and remote).
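Rendering the branch with Go's text/template might look like the sketch below. renderBranch is a hypothetical helper. Two choices here are assumptions, not documented engine behavior. The first is passing the claim name under the key name. The second is enabling missingkey=error, so a typo such as {{.nmae}} fails instead of rendering an empty segment.

```go
package main

import (
	"strings"
	"text/template"
)

// renderBranch renders an output's branch template with the claim name as
// {{.name}}, falling back to attempt/<name> when no template is given.
func renderBranch(tmpl, claimName string) (string, error) {
	if tmpl == "" {
		tmpl = "attempt/{{.name}}"
	}
	t, err := template.New("branch").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, map[string]string{"name": claimName}); err != nil {
		return "", err
	}
	return b.String(), nil
}
```

With an empty template and the claim agent-session-7, this yields attempt/agent-session-7.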
GIT IS THE MERGE LAYER. The engine only ever pushes a branch; it never merges working trees. Fork-and-merge means: fork the workspace, run each attempt in its own sandbox, push each attempt’s repo paths to its own per-attempt branch, and let a human or CI merge the branches with git. There is no automatic merge by design.
Honest behavior:
- A {git} output on a workspace with no spec.git.paths is a no-op with a logged warning: there is nothing to push.
- A push failure surfaces on the claim/revision condition and the terminate retries it; it is never silently swallowed (in husk mode the push is currently best-effort, logged and skipped, as described under husk delegation). The revision and the dehydrated marker are made durable first, so a failing push never loses the captured work.
- A {git} output is a NEW EGRESS of tenant repo data to an operator-declared external remote; see docs/threat-model.md section 3.
The push uses the host git CLI via exec, so it adds no new dependency. CI’s
Linux runner has git; the controller image must ship git for the production path
(the tests skip gracefully when git is absent so the unit suite is not flaky).
Authenticating the push to an external remote
A push to a real external rendezvous remote (not a local bare repo) needs credentials. Set them on the workspace:
```yaml
spec:
  git:
    paths: ["/workspace/repo"]
    credentialsUsername: x-access-token   # not a secret; defaults to x-access-token
    credentialsSecretRef:
      name: rendezvous-creds              # a Secret in the workspace namespace
      key: token                          # the key holding the push token
```

The controller resolves the Secret at push time and hands the token to git. The token is a SECRET VALUE and is handled accordingly:
- It NEVER appears on the git argv (so it cannot show up in a process table), in a log line, in an error, in a status condition, or in a committed revision.
- It reaches git ONLY through a mode 0o600 .git-credentials file written into an ephemeral, isolated HOME that is created per push and removed when the push returns (credential.helper=store reads only that file).
- Any git output that is surfaced in an error has the token scrubbed defensively before it is returned.
- A missing or empty key is reported with an LLM-legible error that names only the Secret and the key, never the value.
Credentials require an https:// (or http://) remote that can carry basic
auth; a file:// or scp-like remote with credentials is rejected.
The matching server side is cmd/rendezvous-server (package
internal/rendezvous): a minimal authenticated git-http server that wraps the
stock git http-backend CGI behind HTTP basic auth, with its token mounted from
a Secret. It creates a bare repo on first push and enables receive-pack only
after the auth check passes. Deploy it as the remote the {git} output targets,
or point the output at any standard authenticated git remote (a forge or an
internal git server). An end-to-end test
(internal/rendezvous/server_test.go:TestRendezvousCredentialedPushLandsOnServer)
proves the credentialed push authenticates to this server and lands the branch.
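The auth gate in front of git http-backend can be sketched as net/http middleware. requireBasicAuth and probe are hypothetical names. The sketch checks only the password field of the basic-auth header and ignores the username, which is an assumption about how the server treats it. subtle.ConstantTimeCompare keeps the comparison time independent of the token's contents, but it returns immediately on a length mismatch, so only the length can leak.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"net/http/httptest"
)

// requireBasicAuth wraps next (in the real server, the git http-backend CGI
// handler) and lets a request through only if its basic-auth token matches.
func requireBasicAuth(token string, next http.Handler) http.Handler {
	want := []byte(token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, pass, ok := r.BasicAuth()
		if !ok || subtle.ConstantTimeCompare([]byte(pass), want) != 1 {
			w.Header().Set("WWW-Authenticate", `Basic realm="rendezvous"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// probe sends one push-shaped request through the middleware and returns the
// status code. Empty user and pass mean "no Authorization header".
func probe(token, user, pass string) int {
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "/repo.git/git-receive-pack", nil)
	if user != "" || pass != "" {
		req.SetBasicAuth(user, pass)
	}
	rec := httptest.NewRecorder()
	requireBasicAuth(token, backend).ServeHTTP(rec, req)
	return rec.Code
}
```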
Cluster-gated: the unit and envtest suites prove the credentialed push logic and
the token redaction against a local authenticated server. The push to a REAL
external rendezvous server on a live cluster is the gated tail; see
test/cluster-e2e/workspace-e2e.sh (run with
gh workflow run cluster-e2e.yaml -f suite=workspace).
This {git} egress with Secret credentials is a security-surface move; see
docs/threat-model.md section 3.
The revision change feed for external indexers
The controller emits a CloudEvents 1.0 feed so an external indexer (a vector DB, a search index, an audit sink) can react to workspace changes WITHOUT Mitos embedding any indexer of its own. There is no built-in vector store by design (ROADMAP.md, EPIC W4): indexing is the consumer’s job, fed by this event.
A dev.mitos.workspace.revision.created event is emitted when a
WorkspaceRevision is created. The envelope is a structured-mode CloudEvent
(internal/eventfeed/event.go); the data payload is RevisionCreatedData:
```json
{
  "specversion": "1.0",
  "type": "dev.mitos.workspace.revision.created",
  "source": "mitos.run/controller",
  "subject": "<namespace>/<revision-name>",
  "id": "<stable-dedupe-id>",
  "time": "2026-06-14T00:00:00Z",
  "datacontenttype": "application/json",
  "data": {
    "workspace": "<workspace-name>",
    "revision": "<revision-name>",
    "contentManifest": "<lowercase-hex-sha256-digest>",
    "lineage": "fromClaim:<claim> | fromWorkspaceRevision:<revision> | root",
    "memorySnapshotRef": "<snapshot-id-when-resumable>",
    "traceId": "<orchestrator-trace-id-when-enabled>"
  }
}
```

The payload carries content-addressed POINTERS and lineage NAMES only: a
manifest digest, not content; a snapshot id, not snapshot bytes; never a secret
value. An indexer materializes the revision’s files itself from the content store
using the contentManifest, then indexes them.
Delivery is via the existing WebhookSink (internal/eventfeed/sink.go), which
POSTs each event to an operator-configured webhook URL. The id is a stable
dedupe key so a re-emit of the same logical event reproduces the same envelope,
letting the consumer dedupe idempotently.
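A sketch of the envelope as Go types, with a dedupe id derived from the event's identifying fields. The field names follow the JSON above. dedupeID, newRevisionCreated and the hashing recipe are hypothetical. A re-emit can only reproduce the same envelope if the time is stable too. The sketch therefore takes it from the revision (for example its creation timestamp) rather than the clock. It also omits memorySnapshotRef and traceId when they are empty. Both choices are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// RevisionCreatedData mirrors the data payload: pointers and names only.
type RevisionCreatedData struct {
	Workspace         string `json:"workspace"`
	Revision          string `json:"revision"`
	ContentManifest   string `json:"contentManifest"`
	Lineage           string `json:"lineage"`
	MemorySnapshotRef string `json:"memorySnapshotRef,omitempty"`
	TraceID           string `json:"traceId,omitempty"`
}

// CloudEvent is a structured-mode CloudEvents 1.0 envelope for this event type.
type CloudEvent struct {
	SpecVersion     string              `json:"specversion"`
	Type            string              `json:"type"`
	Source          string              `json:"source"`
	Subject         string              `json:"subject"`
	ID              string              `json:"id"`
	Time            string              `json:"time"` // RFC 3339
	DataContentType string              `json:"datacontenttype"`
	Data            RevisionCreatedData `json:"data"`
}

const revisionCreatedType = "dev.mitos.workspace.revision.created"

// dedupeID hashes the fields that identify the logical event (hypothetical
// recipe), so re-emitting the same event yields the same id.
func dedupeID(namespace string, d RevisionCreatedData) string {
	h := sha256.New()
	for _, part := range []string{revisionCreatedType, namespace, d.Revision, d.ContentManifest} {
		h.Write([]byte(part))
		h.Write([]byte{0}) // NUL separator keeps field boundaries unambiguous
	}
	return hex.EncodeToString(h.Sum(nil))
}

// newRevisionCreated builds the envelope. created should come from the
// revision object, not the clock, so a re-emit is identical.
func newRevisionCreated(namespace, created string, d RevisionCreatedData) CloudEvent {
	return CloudEvent{
		SpecVersion:     "1.0",
		Type:            revisionCreatedType,
		Source:          "mitos.run/controller",
		Subject:         namespace + "/" + d.Revision,
		ID:              dedupeID(namespace, d),
		Time:            created,
		DataContentType: "application/json",
		Data:            d,
	}
}

// Encode renders the structured-mode JSON body a webhook sink would POST.
func (e CloudEvent) Encode() ([]byte, error) { return json.Marshal(e) }
```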
Secrets are never captured into a revision
Secret values live only in the guest’s in-memory configured env (delivered over
the configure message), never on disk under /workspace. As defense in depth,
the dehydrate is passed an explicit exclude list
(controller.WorkspaceSecretExcludePaths: .netrc, .git-credentials, .ssh,
.aws, .config/gh, .npmrc) so a careless agent that wrote a token to one of
those conventional paths still does not leak it into a committed revision.
SDK and CLI surface
A user creates, binds, logs, diffs, forks, reverts, and terminates-with-outputs
a workspace through the SDKs and the mitos ws CLI without hand-writing a CRD.
The verbs are git-shaped and map onto the revision DAG: a fork is a new
WorkspaceRevision whose source.fromWorkspaceRevision points at the parent, in
a (possibly new) workspace; a revert is a new tip in the same workspace that
shares a past revision’s content. Refusals carry an LLM-legible
{code, cause, remediation}: forking an uncommitted revision is
revision_not_committed.
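The fork and revert verbs can be modeled on a trimmed-down revision type. Everything here is hypothetical: Revision, VerbError, fork and revert are illustrative names, not the controller's types. A fork copies no bytes. It creates a revision that reuses the parent's contentManifest and records source.fromWorkspaceRevision, and an uncommitted parent is refused with code revision_not_committed. Treating revert as a same-workspace fork, and starting new revisions as Pending, are assumptions.

```go
package main

import "fmt"

// Phase values follow the Pending -> Committed lifecycle.
type Phase string

const (
	Pending   Phase = "Pending"
	Committed Phase = "Committed"
)

// Revision is a hypothetical, trimmed-down WorkspaceRevision.
type Revision struct {
	Name                  string
	Workspace             string
	Phase                 Phase
	ContentManifest       string
	FromWorkspaceRevision string // lineage edge in the revision DAG
}

// VerbError carries the {code, cause, remediation} refusal shape.
type VerbError struct {
	Code, Cause, Remediation string
}

func (e *VerbError) Error() string {
	return fmt.Sprintf("%s: %s; %s", e.Code, e.Cause, e.Remediation)
}

// fork creates a revision in workspace that points at parent and reuses its
// content manifest: no content bytes are copied.
func fork(parent Revision, workspace, name string) (Revision, error) {
	if parent.Phase != Committed {
		return Revision{}, &VerbError{
			Code:        "revision_not_committed",
			Cause:       fmt.Sprintf("revision %q is %s, not Committed", parent.Name, parent.Phase),
			Remediation: "wait for the revision to commit, or fork a committed revision",
		}
	}
	return Revision{
		Name:                  name,
		Workspace:             workspace,
		Phase:                 Pending,
		ContentManifest:       parent.ContentManifest,
		FromWorkspaceRevision: parent.Name,
	}, nil
}

// revert makes a new tip in past's own workspace that shares past's content.
func revert(past Revision, name string) (Revision, error) {
	return fork(past, past.Workspace, name)
}
```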
Python:
```python
from mitos import AgentRun

run = AgentRun(namespace="team-a")
ws = run.create_workspace("proj-x")

# Bind a sandbox to the workspace: the controller hydrates the head into
# /workspace on start and dehydrates a new committed revision on terminate.
sb = run.sandbox(image="python", workspace="proj-x", ready=True)
sb.files.write("/workspace/data.txt", "hello")

# Terminate with outputs: capture /workspace and record a diff vs the parent head.
sb.terminate(outputs=["/workspace", {"diff": True}], checkpoint=False)

for rev in ws.log():  # newest first
    print(rev.name, rev.phase, rev.lineage)

ws.diff(ws.log()[0].name)                                # path-level content-hash diff
branch_rev = ws.fork(ws.log()[0].name, "proj-x-branch")  # content-addressed branch
ws.revert("proj-x-1")                                    # new tip sharing a past revision's content
```

TypeScript:

```typescript
import { AgentRun, KubeConfigApi } from "@mitos/sdk";

const run = new AgentRun({ k8s: new KubeConfigApi(), namespace: "team-a" });
const ws = await run.createWorkspace("proj-x");

const sb = await run.create("python-pool", { workspace: "proj-x" });
await sb.files.write("/workspace/data.txt", "hello");
await sb.terminate({ outputs: ["/workspace", { diff: true }] });

const revs = await ws.log(); // newest first
await ws.fork(revs[0].name, "proj-x-branch");
await ws.revert("proj-x-1");
```

CLI: mitos ws create|ls|log|diff|fork|revert|rm|bind (see docs/cli.md).
Resumable head: pairing the workspace with a VM memory snapshot
A plain terminate dehydrates /workspace into a content-only revision. A
terminate(checkpoint=True) additionally pairs the new revision with the
sandbox’s VM MEMORY snapshot, so the workspace head becomes RESUMABLE: a later
claim bound to the workspace resumes MID-EXECUTION from the captured VM state
(memory image + filesystem state, restored together) instead of a cold start.
This is the “sleep / wake” of the reversible sleep-consolidation demo
(examples/sleep-consolidation/).
The pairing is two refs on the committed WorkspaceRevision:
- memorySnapshotRef: the snapshot pointer (a content address / snapshot id, never the memory bytes);
- memorySnapshotPrincipal: the principal the image is BOUND to (the capturing claim’s ServiceAccount).
On a new claim’s activation the controller resumes the head’s memory image ONLY
when (a) the head pairs a snapshot, (b) the snapshot still exists (a GC’d
snapshot degrades to a content-only hydrate), AND (c) the activating claim’s
principal MATCHES memorySnapshotPrincipal. A principal MISMATCH is REFUSED,
fail-closed (an error, never a silent cold-start downgrade): a memory image
carries secrets-in-RAM and is never served across principals. A resume reseeds
the guest CRNG and steps the wall clock exactly like a fork (see
docs/fork-correctness.md); the principal binding is a threat-model row (see
docs/threat-model.md).
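The activation decision reduces to three checks in the stated order. decideActivation, Head and the snapshotExists callback are hypothetical names. The sketch checks existence before the principal, so a GC’d snapshot degrades to a content-only hydrate whatever the principal. The text does not settle that ordering; it is an assumption.

```go
package main

import "errors"

// Head is the hypothetical subset of a workspace head used by the decision.
type Head struct {
	MemorySnapshotRef       string // snapshot pointer, never the bytes
	MemorySnapshotPrincipal string // principal the image is bound to
}

type Activation int

const (
	ContentOnlyHydrate Activation = iota // cold start, then hydrate /workspace
	ResumeFromMemory                     // restore memory image + filesystem state
)

// ErrPrincipalMismatch is returned instead of silently cold-starting.
var ErrPrincipalMismatch = errors.New("memory snapshot is bound to another principal; refusing to resume")

// decideActivation picks how a new claim activates. When err != nil the
// caller must fail the activation and not act on the returned Activation.
func decideActivation(head Head, snapshotExists func(ref string) bool, principal string) (Activation, error) {
	if head.MemorySnapshotRef == "" {
		return ContentOnlyHydrate, nil // (a) content-only head
	}
	if !snapshotExists(head.MemorySnapshotRef) {
		return ContentOnlyHydrate, nil // (b) snapshot GC'd: degrade
	}
	if principal != head.MemorySnapshotPrincipal {
		return ContentOnlyHydrate, ErrPrincipalMismatch // (c) fail closed
	}
	return ResumeFromMemory, nil
}
```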
The disk+memory pairing logic, the resumable status, and the principal-binding
refusal are proven END TO END in envtest behind the checkpoint/resume/exists
seams (internal/controller/resumable_envtest_test.go,
TestResumableHeadFromMemorySnapshot, including the cross-principal sa-b
intruder case). The seams are bound to the husk live-VM snapshot path behind the
controller --workspace-memory-snapshots flag
(internal/controller/workspace_memory_snapshot.go). The flag is off by default;
with it off, a checkpoint-on-terminate fails loud rather than producing a
falsely resumable revision.
CLUSTER-GATED: the real bare-metal VM-memory image (a live Firecracker memory
snapshot resuming mid-execution) needs a KVM-capable node, the same gate as the
husk e2e (.github/workflows/cluster-e2e.yaml). Cluster-verify:
```shell
# Controller deployed with the flag:
kubectl -n mitos get deploy mitos-controller -o yaml | grep workspace-memory-snapshots

# Run the demo on the KVM cluster, then confirm the head is resumable:
examples/sleep-consolidation/run.sh mitos-e2e ~/.kube/kvm-cluster
mitos -n mitos-e2e ws log sleep-demo   # head row RESUMABLE = true
```

Without the gate the filesystem state still round-trips (hydrate/dehydrate is
KVM-proven), so the head is content-only and a wake restores disk state but not
the live memory image. Timing for the sleep (checkpoint+dehydrate) and wake
(resume+hydrate) phases is reproducible from bench/sleep-consolidation.sh; no
number is published until it is recorded from that script (no-unverified-claims).
Proven vs open
PROVEN:
- The bulk transfer and CAS round trip on KVM: cmd/ws-smoke boots two real VMs, writes a known tree (nested + binary content + a secret file) into the source /workspace, Dehydrates to a CAS digest, Hydrates into the destination /workspace, and asserts every file is byte-identical while the secret is excluded (the gated KVM phase in .github/workflows/kvm-test.yaml).
- The binding + revision lifecycle in envtest: hydrate-on-activate with the head manifest, dehydrate-on-terminate creating a fromClaim revision that advances the head, single-writer WorkspaceBusy, an unbound claim unaffected, and the secret exclude list passed to dehydrate (internal/controller/workspace_binding_test.go).
- Outputs extraction: the path filter captures only the listed subtrees and the content-hash diff detects add/remove/modify (unchanged diffs to empty) in unit tests (internal/workspace/outputs_test.go); an envtest proves a {path} output narrows the dehydrate capture and a {diff: true} output records the diff summary on the new revision.
- Git rendezvous: the push to a LOCAL bare repo lands the per-attempt branch with exactly one commit carrying the repo-paths content (internal/workspace/git_test.go); an envtest proves a {git} output renders the branch, calls the rendezvous push with the resolved repo files, and records the push on the revision status.
- Credentialed git rendezvous: a workspace spec.git.credentialsSecretRef is resolved and the token delivered to git only through an ephemeral, mode 0o600 .git-credentials file in an isolated HOME, never on argv, in a log, in an error, in a condition, or in a revision. Redaction on the failure path is asserted in unit tests, and the token-never-in-conditions invariant in an envtest (internal/workspace/git_test.go, internal/controller/workspace_binding_test.go).
- The authenticated rendezvous server: internal/rendezvous wraps git http-backend behind HTTP basic auth (constant-time token compare). A unit test rejects an unauthenticated push with 401 and accepts an authenticated one, and an end-to-end test proves the engine’s credentialed push lands the branch on the server (internal/rendezvous/server_test.go).
- The revision change feed: the dev.mitos.workspace.revision.created CloudEvent with a pointers-and-names-only payload, delivered via WebhookSink (internal/eventfeed, internal/controller/eventfeed.go); documented above for external indexers (no built-in vector store by design).
- The SDK/CLI surface: the controller-side fork/revert verbs with LLM-legible rejection (internal/controller/workspace_verbs.go), the mitos ws CLI over a cluster WorkspaceBackend (internal/agentcli/workspace_{backend,cmd}.go, clusterbackend.go), the Python Workspace handle plus terminate(outputs=..., checkpoint=...) (sdk/python/mitos/workspace.py), and the TypeScript parity (sdk/typescript/src/workspace.ts), each unit-tested.
- Workspace store encryption at rest: when spec.store.encryptionKeyRef is set, every revision chunk and manifest is encrypted with AES-256-GCM under a data-encryption key (DEK). The DEK the controller generates is keyed per-template (by templateID, internal/controller/enc_key_secret.go), so a template’s workspaces share its DEK; the nonce scheme is digest-keyed and safe under a shared key, but isolation granularity is the template. Per-workspace crypto-shred is a future option if finer granularity is wanted. It uses the same KMS envelope custody as templates; the DEK is unwrapped only in node memory and never logged. The manifest digest stays computed over plaintext, so an encrypted dehydrate yields the SAME content identifier as a plaintext dehydrate (dedup preserved), and the round trip is byte-identical with chunks stored as ciphertext at rest; a wrong key fails closed. All asserted in internal/workspace/encryption_test.go.
- The S3 object-store backend: an S3-compatible bucket (spec.store.s3 + objectStorageRef) as an alternative to the node CAS. It offers the same content-addressed interface, the same revision digest for a tree (drop-in), a byte-identical round trip, dedup by chunk-digest object key, and composition with per-workspace encryption (ciphertext at rest in the bucket). Credentials come from a referenced Secret; the secret-access-key derives only the SigV4 signing key and never appears on the wire in cleartext. Proven in internal/workspace/s3store_test.go and s3client_test.go.
- Workspace benchmarks: bench/workspace-hydrate-latency.sh (hydrate / dehydrate wall clock, recording the store mode) and bench/workspace-fork-latency.sh (fork wall clock plus an O(0)-new-bytes assertion: the forked revision shares the parent content manifest). Method-only; numbers live in bench/results/ only when reproducible from the scripts.
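AES-256-GCM from the Go standard library is enough to sketch three properties of encryption at rest: an encrypted dehydrate keeps the plaintext digest, it still dedups, and it fails closed under a wrong key. The nonce derivation here is an assumed stand-in for the digest-keyed scheme, not the scheme in internal/workspace: an HMAC of the plaintext digest under a nonce key derived from the DEK. sealChunk, openChunk, nonceFor and newGCM are hypothetical names. Because the nonce depends only on the digest, a key and nonce pair repeats only for byte-identical plaintext, up to a negligible 96-bit truncation collision. Such a repeat produces identical ciphertext, which is what preserves dedup.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

// chunkDigest is the content address, always taken over plaintext, so an
// encrypted store and a plaintext store name the same chunk identically.
func chunkDigest(plain []byte) string {
	sum := sha256.Sum256(plain)
	return hex.EncodeToString(sum[:])
}

// nonceFor derives a deterministic GCM nonce from the plaintext digest
// (hypothetical scheme) under a nonce key derived from the DEK.
func nonceFor(dek []byte, digest string, size int) []byte {
	kdf := hmac.New(sha256.New, dek)
	kdf.Write([]byte("chunk-nonce-key")) // separates this key from the AES use of the DEK
	m := hmac.New(sha256.New, kdf.Sum(nil))
	m.Write([]byte(digest))
	return m.Sum(nil)[:size]
}

func newGCM(dek []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(dek) // a 32-byte DEK selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealChunk returns the plaintext digest and the ciphertext. The digest is
// bound as additional data, so a ciphertext filed under another digest fails to open.
func sealChunk(dek, plain []byte) (string, []byte, error) {
	gcm, err := newGCM(dek)
	if err != nil {
		return "", nil, err
	}
	digest := chunkDigest(plain)
	ct := gcm.Seal(nil, nonceFor(dek, digest, gcm.NonceSize()), plain, []byte(digest))
	return digest, ct, nil
}

// openChunk decrypts and re-verifies the digest; any failure is an error and
// no plaintext is returned (fail closed). Errors never mention key material.
func openChunk(dek []byte, digest string, ct []byte) ([]byte, error) {
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, err
	}
	plain, err := gcm.Open(nil, nonceFor(dek, digest, gcm.NonceSize()), ct, []byte(digest))
	if err != nil {
		return nil, errors.New("chunk authentication failed (wrong key or tampered data)")
	}
	if chunkDigest(plain) != digest {
		return nil, errors.New("chunk does not match its content address")
	}
	return plain, nil
}
```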
OPEN (later W4 slices):
- The push to a REAL external rendezvous server on a LIVE cluster is the gated tail (test/cluster-e2e/workspace-e2e.sh); the credentialed-push logic, the token redaction, and the authenticated server are proven in unit + envtest above. There is no auto-merge by design: git is the merge layer.
- The fork-sees-committed-state path and the S3 / encrypted round trip on a LIVE cluster are the gated tail (test/cluster-e2e/workspace-e2e.sh); the round trip, the dedup, and the encryption are proven offline in unit tests above.
- The memory-snapshot pairing that produces a resumable head is DONE: the disk+memory pairing, the resumable status, and the principal-binding refusal are envtest-proven and wired behind --workspace-memory-snapshots (see “Resumable head” above). OPEN tail: the real bare-metal live-VM memory image resuming mid-execution, which is cluster-gated on a KVM node.
- A streaming (non-buffered) tar for very large workspaces (this slice caps and buffers; see vsock.MaxTarBytes).
- The REAL vsock + node-CAS + VM round trip behind the husk control ops on a LIVE cluster (the workspace e2e commit/fork/fork-sees-state stages, test/cluster-e2e/workspace-e2e.sh) is the gated KVM tail. The husk-stub ops (dehydrate-workspace/hydrate-workspace) and their tar -> CAS -> manifest round trip are unit-proven with a fake vsock + temp node CAS (internal/husk/workspace_test.go, internal/husk/netcontrol_test.go), and the controller delegation + commit + head advance is envtest-proven (internal/controller/workspace_husk_delegate_test.go).
DONE in this slice:
- The husk-mode controller-to-node transport wiring for the default hydrate/dehydrate path: the controller delegates to the husk-stub control op that owns the VM vsock and the node CAS, and still owns the revision commit + head advance (see “Wiring the transport: husk delegation” above). The raw-forkd path keeps the documented seam.