Give Firecracker forks their own mem-file instead of deferring the copy #298
Draft
sjmiller609 wants to merge 1 commit into
…the copy

Standby forks previously skipped the mem-file copy and stored a path to the source's memory file, serving UFFD faults from it and copying it only at the fork's next standby. That left forks silently dependent on the source snapshot or instance: deleting or stopping the source stranded standby forks, and the source's next diff snapshot mutated the shared file in place under running forks.

Fork copies now always include the mem-file via the existing reflink-first directory copy, so a fork owns its memory from creation and the source can be deleted immediately. UFFD one-shot restore is unchanged except that the pager serves from the fork-local file; the shared page cache still works because forks inherit the source's cache key and the cloned files are byte-identical.

Removes the deferred-path machinery: FirecrackerDeferredSnapshotMemoryPath, materialize-on-standby, base/latest alternate path resolution, snapshot source locks, the repoint step after running-source forks, and SnapshotOptions.
Summary
Firecracker standby forks (fork-from-standby-instance and fork-from-standby-snapshot) previously skipped copying the multi-GB mem-file. Instead they stored a path to the source's memory file (FirecrackerDeferredSnapshotMemoryPath), served UFFD faults from it while running, and only copied it at the fork's next standby ("materialize"). Until that first standby completed, every fork was silently dependent on a file it didn't own: DeleteSnapshot / DeleteInstance / StopInstance on the source made standby-parked forks permanently unrestorable and broke running forks' next standby. Nothing tracked or guarded these dependencies.

This PR removes the deferral entirely: fork copies always include the mem-file through the existing reflink-first directory copy (FICLONE, sparse-copy fallback). A fork owns its memory from the moment it's created, so the whole class of dependency bugs disappears by construction: the source snapshot can be deleted the instant the fork API returns.

Why this doesn't lose the UFFD win
- …/ciscratch fs, prod xfs) the mem-file "copy" is an O(1) metadata clone, same cost as the deferral.
- Forks inherit the source's FirecrackerSnapshotCacheKey and their clones are byte-identical, so cross-fork page sharing behaves exactly as before.

Removed (now dead)
- FirecrackerDeferredSnapshotMemoryPath metadata field and all wiring in fork/snapshot-fork paths
- materializeDeferredSnapshotMemory + base/latest alternate-path resolution (both copies)
- repointForkDeferredSnapshotMemoryToSourceBase after running-source forks
- lockFirecrackerSnapshotSource / snapshotSourceLocks
- hypervisor.SnapshotOptions (existed only to carry the deferred path; Snapshot() drops the param across all hypervisors)
- forkvm.CopyGuestDirectoryWithOptions / CopyOptions (only consumer was the mem-file skip)

Behavior changes / caveats
- … HYPEMAN_TEST_REFLINK_STRICT=1), so this only affects misprovisioned hosts: slower, but forks are still correct and independent.

Tests
- lib/instances UFFD fork tests rewritten: forks now assert a fork-local mem-file with matching content on a different inode (source and snapshot-fork variants); obsolete deferred-path/alternate-path/lock-key tests removed. Passing locally along with lib/forkvm, lib/hypervisor/..., lib/guestmemory.
- TestFCUFFDOneShotLifecycle updated to the new semantics and now also asserts DeleteSnapshot succeeds while a fork is running from it. Not validated locally: the test is currently skipped as flaky (KERNEL-1354) and this sandbox has no reflink fs; needs a CI run / manual re-enable to confirm end-to-end.
- TestForkCloudHypervisorFromRunningNetwork fails in my sandbox on image readiness; reproduces identically on main (environmental, image pull timeout).

🤖 Generated with Claude Code