Give Firecracker forks their own mem-file instead of deferring the copy #298

Draft
sjmiller609 wants to merge 1 commit into main from hypeship/fc-fork-local-memfile

Summary

Firecracker standby forks (fork-from-standby-instance and fork-from-standby-snapshot) previously skipped copying the multi-GB mem-file. Instead they stored a path to the source's memory file (FirecrackerDeferredSnapshotMemoryPath), served UFFD faults from it while running, and only copied it at the fork's next standby ("materialize"). Until that first standby completed, every fork was silently dependent on a file it didn't own:

  • DeleteSnapshot / DeleteInstance / StopInstance on the source made standby-parked forks permanently unrestorable and broke running forks' next standby. Nothing tracked or guarded these dependencies.
  • Worse, when a source instance was restored and then put back into standby, it wrote its diff into the same inode the forks were reading (snapshot-base is promoted by rename, then Firecracker writes dirty pages in place). The result was silent memory corruption for dependents, plus a mix of pre- and post-mutation pages served through the shared pager cache.

This PR removes the deferral entirely: fork copies always include the mem-file through the existing reflink-first directory copy (FICLONE, sparse-copy fallback). A fork owns its memory from the moment it's created, so the whole class of dependency bugs disappears by construction — the source snapshot can be deleted the instant the fork API returns.
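For readers unfamiliar with the pattern, here is a minimal, Linux-only sketch of a reflink-first file copy with a sparse fallback. It is not the repository's directory-copy code: `cloneOrCopy`, `sparseCopy`, and `runDemo` are hypothetical names, and the fallback is a simplified stand-in for whatever sparse copy the project actually uses. The demo deletes the source right after the copy to show that the fork's file stays readable.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// ficlone is the Linux FICLONE ioctl request number, _IOW(0x94, 9, int).
const ficlone = 0x40049409

// cloneOrCopy gives dst its own copy of src. It tries a reflink first and
// falls back to a sparse-aware byte copy. The bool reports whether the
// reflink path was taken. (Hypothetical helper, not the project's API.)
func cloneOrCopy(src, dst string) (bool, error) {
	in, err := os.Open(src)
	if err != nil {
		return false, err
	}
	defer in.Close()
	out, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)
	if err != nil {
		return false, err
	}
	_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, out.Fd(), ficlone, in.Fd())
	if errno == 0 {
		return true, out.Close()
	}
	// Any FICLONE failure (unsupported fs, cross-device, ...) falls back.
	err = sparseCopy(in, out)
	if cerr := out.Close(); err == nil {
		err = cerr
	}
	return false, err
}

// sparseCopy copies in to out, skipping all-zero chunks so they stay holes.
func sparseCopy(in, out *os.File) error {
	buf := make([]byte, 64*1024)
	zero := make([]byte, len(buf))
	var off int64
	for {
		n, err := in.Read(buf)
		if n > 0 {
			if !bytes.Equal(buf[:n], zero[:n]) {
				if _, werr := out.WriteAt(buf[:n], off); werr != nil {
					return werr
				}
			}
			off += int64(n)
		}
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return err
		}
	}
	// Extend to the full length in case the file ends in a hole.
	return out.Truncate(off)
}

// runDemo copies a fake mem-file, deletes the source, and checks the copy.
func runDemo() error {
	dir, err := os.MkdirTemp("", "fork-memfile")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "source-mem")
	fork := filepath.Join(dir, "fork-mem")
	want := append(make([]byte, 128*1024), []byte("guest memory")...)
	if err := os.WriteFile(src, want, 0o600); err != nil {
		return err
	}
	if _, err := cloneOrCopy(src, fork); err != nil {
		return err
	}
	// The fork owns its copy, so deleting the source must not affect it.
	if err := os.Remove(src); err != nil {
		return err
	}
	got, err := os.ReadFile(fork)
	if err != nil {
		return err
	}
	if !bytes.Equal(got, want) {
		return errors.New("fork mem-file differs from source")
	}
	return nil
}

func main() {
	if err := runDemo(); err != nil {
		fmt.Println("demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("fork mem-file intact after source deletion")
}
```

On a filesystem without reflink support the sketch silently takes the fallback path, which mirrors the "slower but still correct" behavior described under the caveats.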

Why this doesn't lose the UFFD win

  • On reflink-capable filesystems (CI's /ci scratch fs, prod xfs) the mem-file "copy" is an O(1) metadata clone, same cost as the deferral.
  • UFFD one-shot restore is unchanged except the pager opens the fork-local file instead of the source's. The pager's page cache is keyed by the metadata cache-key string, not by file — forks inherit the source's FirecrackerSnapshotCacheKey and their clones are byte-identical, so cross-fork page sharing behaves exactly as before.
  • The fork's first standby now writes its diff onto its own (CoW-isolated) mem-file instead of materializing a cross-instance copy first — same end state, no cross-instance read.
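The cache-key point above can be illustrated with a toy model. This is only a sketch under assumptions: `pageCache`, `pageKey`, and `sharedLoads` are hypothetical and do not reflect the real pager's types. It shows that when pages are keyed by a cache-key string plus page index, two forks reading from different but byte-identical files still share cached pages.

```go
package main

import "fmt"

// pageKey identifies a cached page by snapshot cache key and page index,
// deliberately not by file path or inode. (Hypothetical type.)
type pageKey struct {
	cacheKey string
	page     int64
}

// pageCache is a toy stand-in for a pager's page cache.
type pageCache struct {
	pages map[pageKey][]byte
	loads int // number of times a page was read from a backing file
}

func newPageCache() *pageCache {
	return &pageCache{pages: map[pageKey][]byte{}}
}

// get returns the cached page, calling read only on a cache miss.
func (c *pageCache) get(cacheKey string, page int64, read func(int64) []byte) []byte {
	k := pageKey{cacheKey: cacheKey, page: page}
	if p, ok := c.pages[k]; ok {
		return p
	}
	c.loads++
	p := read(page)
	c.pages[k] = p
	return p
}

// sharedLoads simulates two forks that inherited the same cache key but read
// from separate, byte-identical backing files. It returns how many
// backing-file reads were needed.
func sharedLoads() int {
	c := newPageCache()
	// Each closure stands in for a fork-local mem-file with identical bytes.
	forkA := func(page int64) []byte { return []byte(fmt.Sprintf("page-%d", page)) }
	forkB := func(page int64) []byte { return []byte(fmt.Sprintf("page-%d", page)) }

	c.get("snap-1", 0, forkA) // miss: read from fork A's file
	c.get("snap-1", 0, forkB) // hit: fork B reuses the page fork A loaded
	c.get("snap-1", 1, forkB) // miss: first access to page 1
	return c.loads
}

func main() {
	fmt.Println("backing-file reads:", sharedLoads())
}
```

Sharing is only safe here because the files are byte-identical at clone time; a key derived from the file itself would give each fork a private cache and lose the cross-fork benefit.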

Removed (now dead)

  • FirecrackerDeferredSnapshotMemoryPath metadata field and all wiring in fork/snapshot-fork paths
  • materializeDeferredSnapshotMemory + base/latest alternate-path resolution (both copies)
  • repointForkDeferredSnapshotMemoryToSourceBase after running-source forks
  • lockFirecrackerSnapshotSource / snapshotSourceLocks
  • hypervisor.SnapshotOptions (existed only to carry the deferred path; Snapshot() drops the param across all hypervisors)
  • forkvm.CopyGuestDirectoryWithOptions / CopyOptions (only consumer was the mem-file skip)

Behavior changes / caveats

  • Non-reflink filesystems (ext4/overlayfs): forks now pay a real sparse copy of the mem-file at fork time instead of at first standby. Production and CI are reflink-capable (HYPEMAN_TEST_REFLINK_STRICT=1), so this only affects misprovisioned hosts — slower, but forks are still correct and independent.
  • Upgrade: standby forks created by an older build that still carry a deferred path will not restore after this change (the field is gone and the fork has no local mem-file). Accepted intentionally; drain or restore+standby existing deferred forks before rolling out if any exist.

Tests

  • Unit: lib/instances UFFD fork tests rewritten — forks now assert a fork-local mem-file with matching content on a different inode (source and snapshot-fork variants); obsolete deferred-path/alternate-path/lock-key tests removed. Passing locally along with lib/forkvm, lib/hypervisor/..., lib/guestmemory.
  • Integration: TestFCUFFDOneShotLifecycle updated to the new semantics and now also asserts DeleteSnapshot succeeds while a fork is running from it. Not validated locally — the test is currently skipped as flaky (KERNEL-1354) and this sandbox has no reflink fs; needs a CI run / manual re-enable to confirm end-to-end.
  • TestForkCloudHypervisorFromRunningNetwork fails in my sandbox on image readiness — reproduces identically on main (environmental, image pull timeout).
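As an illustration of the unit-test assertion described above (matching content on a different inode), a check along these lines could be used. This is a sketch, not the code in lib/instances: `checkForkLocalMemFile` and `runCheck` are hypothetical names. It relies only on the standard library's `os.SameFile`, which on Unix compares device and inode.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// checkForkLocalMemFile returns nil when forkPath has the same content as
// srcPath but is a distinct file (different inode). (Hypothetical helper.)
func checkForkLocalMemFile(srcPath, forkPath string) error {
	srcInfo, err := os.Stat(srcPath)
	if err != nil {
		return err
	}
	forkInfo, err := os.Stat(forkPath)
	if err != nil {
		return err
	}
	if os.SameFile(srcInfo, forkInfo) {
		return errors.New("fork mem-file shares the source's inode")
	}
	srcData, err := os.ReadFile(srcPath)
	if err != nil {
		return err
	}
	forkData, err := os.ReadFile(forkPath)
	if err != nil {
		return err
	}
	if !bytes.Equal(srcData, forkData) {
		return errors.New("fork mem-file content differs from source")
	}
	return nil
}

// runCheck exercises the helper against a real copy and against a hard link,
// which has matching content but must be rejected as not fork-local.
func runCheck() error {
	dir, err := os.MkdirTemp("", "fork-inode")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "source-mem")
	fork := filepath.Join(dir, "fork-mem")
	alias := filepath.Join(dir, "alias-mem")
	data := []byte("guest memory")
	if err := os.WriteFile(src, data, 0o600); err != nil {
		return err
	}
	if err := os.WriteFile(fork, data, 0o600); err != nil {
		return err
	}
	if err := os.Link(src, alias); err != nil {
		return err
	}
	if err := checkForkLocalMemFile(src, fork); err != nil {
		return err
	}
	if checkForkLocalMemFile(src, alias) == nil {
		return errors.New("hard link was accepted as fork-local")
	}
	return nil
}

func main() {
	if err := runCheck(); err != nil {
		fmt.Println("check failed:", err)
		os.Exit(1)
	}
	fmt.Println("fork mem-file is local and matches the source")
}
```

A reflinked clone passes the same check, since a reflink creates a new inode that shares extents with the source rather than sharing the inode itself.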

🤖 Generated with Claude Code

…the copy

Standby forks previously skipped the mem-file copy and stored a path to the
source's memory file, serving UFFD faults from it and copying it only at the
fork's next standby. That left forks silently dependent on the source snapshot
or instance: deleting or stopping the source stranded standby forks, and the
source's next diff snapshot mutated the shared file in place under running
forks.

Fork copies now always include the mem-file via the existing reflink-first
directory copy, so a fork owns its memory from creation and the source can be
deleted immediately. UFFD one-shot restore is unchanged except that the pager
serves from the fork-local file; the shared page cache still works because
forks inherit the source's cache key and the cloned files are byte-identical.

Removes the deferred-path machinery: FirecrackerDeferredSnapshotMemoryPath,
materialize-on-standby, base/latest alternate path resolution, snapshot source
locks, the repoint step after running-source forks, and SnapshotOptions.