feat(workflows): honor max_concurrency in fan-out via a bounded thread pool by doquanghuy · Pull Request #3224 · github/spec-kit

doquanghuy · 2026-06-29T12:43:12Z

Description

Closes #3222.

The fan-out step has carried a max_concurrency field since the workflow engine landed (#2158), but the engine ignored it: _execute_steps ran fan-out items in a sequential for loop and max_concurrency was only recorded in the step output. This honors it.

A new WorkflowEngine._run_fan_out runs items on a bounded ThreadPoolExecutor when max_concurrency > 1, and takes the existing sequential path when <= 1 (the default) — so existing workflows are byte-for-byte unchanged. Results are always assembled in item order (a preallocated slot per item, collected in submission order), never completion order, so fan-in — which reads them positionally — is unaffected. The parentId:templateId:index id grammar and halt-on-first-failure are preserved; max_concurrency is coerced with int(): a value that cannot be coerced (None, a non-numeric string) or that coerces to <= 1 runs sequentially, while a numeric string like "4" or a float like 4.0 is honored.
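To make the ordering guarantee concrete, here is a minimal sketch of the dispatch described above. It is not the engine's code: `run_fan_out_sketch`, `run_item`, and `later_items_finish_first` are hypothetical names, and the halt handling and id grammar are left out. It only shows the bounded pool, the sequential fallback, and the preallocated slots filled in submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence


def run_fan_out_sketch(
    items: Sequence[Any],
    run_item: Callable[[Any], Any],
    max_concurrency: Any = 1,
) -> list:
    """Hypothetical stand-in for the dispatch described above."""
    try:
        workers = max(1, int(max_concurrency))
    except (TypeError, ValueError):
        workers = 1

    if workers == 1:
        # Sequential path: a plain loop, no pool involved.
        return [run_item(item) for item in items]

    slots = [None] * len(items)  # one preallocated slot per item
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_item, item) for item in items]
        # Collect by index (submission order), not with as_completed().
        for idx, fut in enumerate(futures):
            slots[idx] = fut.result()
    return slots


def later_items_finish_first(item: int) -> int:
    # Item 3 returns immediately, item 0 last: completion order is reversed.
    time.sleep(0.05 * (3 - item))
    return item * 10


ordered = run_fan_out_sketch([0, 1, 2, 3], later_items_finish_first, "4")
print(ordered)
```

Even though the items complete in reverse, `ordered` follows item order, which is what a positional fan-in needs.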

Fan-out items are I/O-bound — each typically dispatches a command step that spawns a blocking agent-CLI subprocess, which releases the GIL — so a thread pool yields real wall-clock parallelism.

Two concurrency care points:

- Per-item context isolation: each item runs against its own dataclasses.replace(context, item=…), so context.item is never clobbered across threads; the shared steps dict is written only on the disjoint parent:template:index key.
- State persistence: RunState.save() previously serialized the live step_results dict via a plain open("w"), so a concurrent fan-out could both interleave on-disk writes and mutate the dict mid-json.dump (dictionary changed size during iteration). save() is now held under a per-run lock and written atomically (temp file + os.replace), and per-item result recording goes through a small record_step_result helper under that lock. Sequential runs see only an uncontended lock.
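The per-item isolation point can be illustrated with a small sketch. The `StepContext` below is a hypothetical two-field reduction, not the real class, and `run_item` is an assumed helper; the point is only that `dataclasses.replace` gives each worker a private `item` while the `steps` dict stays shared by reference and is written on disjoint keys.

```python
import dataclasses
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict


@dataclasses.dataclass
class StepContext:  # hypothetical: reduced to the two fields discussed above
    steps: Dict[str, Any] = dataclasses.field(default_factory=dict)
    item: Any = None


def run_item(context: StepContext, parent_id: str, template_id: str,
             idx: int, item: Any) -> Any:
    # replace() makes a shallow copy: `item` is private to this call,
    # while `steps` is still the same dict object as in `context`.
    item_ctx = dataclasses.replace(context, item=item)
    key = f"{parent_id}:{template_id}:{idx}"  # disjoint key per item
    item_ctx.steps[key] = {"output": {"seen": item_ctx.item}}
    return item_ctx.item


shared = StepContext()
items = ["a", "b", "c", "d"]
with ThreadPoolExecutor(max_workers=4) as pool:
    seen = list(
        pool.map(lambda pair: run_item(shared, "fan", "tpl", pair[0], pair[1]),
                 enumerate(items))
    )
```

After the run, `shared.item` is still `None` and `shared.steps` holds one entry per item under its own key.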

A genuine exception escaping an item (as opposed to a normal step FAILED, which sets the run status) cancels outstanding work and re-raises, so the run is marked failed rather than reporting a vacuous completion.
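A rough sketch of that cancel-and-re-raise behavior, under the assumption that results are collected in submission order; `collect_or_raise` and `flaky` are hypothetical names, and the run-status bookkeeping is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence


def collect_or_raise(items: Sequence[Any], run_item: Callable[[Any], Any],
                     workers: int) -> List[Any]:
    first_exc: Optional[BaseException] = None
    results: List[Any] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_item, item) for item in items]
        for fut in futures:
            try:
                results.append(fut.result())
            except Exception as exc:
                first_exc = exc
                for other in futures:
                    # cancel() only stops futures that have not started yet;
                    # already-running items finish when the pool shuts down.
                    other.cancel()
                break
    if first_exc is not None:
        raise first_exc  # surface the failure instead of returning a prefix
    return results


def flaky(item: int) -> int:
    if item == 1:
        raise RuntimeError("item 1 exploded")
    return item


try:
    collect_or_raise([0, 1, 2, 3], flaky, workers=2)
    outcome = "completed"
except RuntimeError as exc:
    outcome = f"failed: {exc}"
```

The caller sees the original exception, so it can mark the run failed rather than treat a partial result list as success.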

Testing

- Ran the workflow suite with .venv/bin/python -m pytest tests/test_workflows.py: 325 passed, including 15 new TestFanOutConcurrency cases: K≤1 sequential parity, item-order under forced reverse completion (event chain, no sleeps), real parallelism, max_concurrency coercion (0 / negative / None / non-int / string), per-thread item isolation, halt-on-failure prefix, and first-exception cancel + re-raise.
- Ran the full suite: green except 3 pre-existing failures in unrelated branch-name tests (test_timestamp_branches / git extension) that fail identically on main.
- uvx ruff check src/ tests/test_workflows.py: clean
- Tested locally with uv run specify --help
- Tested with a sample project (covered by the unit tests above)

AI Disclosure

- [ ] I did not use AI assistance for this contribution
- [x] I did use AI assistance (describe below)

Code, tests, and this description were authored with AI assistance (Claude Code), from a fan-out concurrency investigation; everything was verified by running the repo's test suite and ruff locally.

@mnriem — would appreciate your review when you have a moment. Happy to swap the save() atomicity for a narrower lock if you'd prefer a smaller change.

Copilot

Pull request overview

Implements actual parallel execution for workflow fan-out by honoring max_concurrency, while making RunState persistence safe under concurrent execution (locking + atomic JSON writes). This fits into the workflow engine’s execution model by enabling opt-in bounded parallelism for I/O-bound fan-out items without changing default sequential behavior.

Changes:

- Add WorkflowEngine._run_fan_out() to execute fan-out items sequentially or via a bounded ThreadPoolExecutor depending on max_concurrency.
- Make RunState.save() concurrency-safe via a per-run lock and atomic temp-file writes; route step result recording through a locked helper.
- Add workflow tests covering fan-out concurrency behavior, ordering, coercion, and error/exception handling.
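The locking and atomic-write change summarized above could look roughly like the following. This is a sketch under assumptions: `RunStateSketch` is a hypothetical reduction of the real RunState, and only the lock, the temp file, and `os.replace` are taken from the description.

```python
import json
import os
import tempfile
import threading
from pathlib import Path
from typing import Any, Dict


class RunStateSketch:  # hypothetical: the real RunState carries more fields
    def __init__(self, path: Path) -> None:
        self.path = path
        self.step_results: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def record_step_result(self, step_id: str, result: Any) -> None:
        with self._lock:  # never mutate the dict while save() serializes it
            self.step_results[step_id] = result

    def save(self) -> None:
        with self._lock:
            # Write to a temp file in the same directory, then swap it in.
            fd, tmp = tempfile.mkstemp(dir=str(self.path.parent), suffix=".tmp")
            try:
                with os.fdopen(fd, "w", encoding="utf-8") as fh:
                    json.dump({"step_results": self.step_results}, fh)
                os.replace(tmp, self.path)  # readers never see a partial file
            except BaseException:
                if os.path.exists(tmp):
                    os.unlink(tmp)
                raise


with tempfile.TemporaryDirectory() as tmp_dir:
    state = RunStateSketch(Path(tmp_dir) / "state.json")

    def worker(idx: int) -> None:
        state.record_step_result(f"fan:tpl:{idx}", {"output": idx})
        state.save()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    saved = json.loads(state.path.read_text(encoding="utf-8"))
```

Because every worker records before it saves and both go through the same lock, the last save to run contains all eight results, and the file on disk is always complete JSON.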

Show a summary per file

| File | Description |
| --- | --- |
| `src/specify_cli/workflows/engine.py` | Adds bounded concurrent fan-out execution and hardens run-state persistence for concurrency. |
| `tests/test_workflows.py` | Adds a new test suite validating fan-out concurrency semantics and edge cases. |

Review details

Files reviewed: 2/2 changed files
Comments generated: 5
Review effort level: Low

                    context.steps[step_id]["output"] = result.output
                    state.step_results[step_id]["output"] = result.output


+            futures: dict[Any, int] = {}
+            for idx in range(n):
+                if state.status in halting:
+                    break
+                futures[pool.submit(run_isolated, idx)] = idx
+            # Collect in submission (item) order: each .result() blocks on that
+            # item, so slots fill 0,1,2,… and the collected set is a prefix.
+            for fut in list(futures):
+                idx = futures[fut]
+                try:
+                    slots[idx] = fut.result()
+                except Exception as exc:
+                    # A genuine exception escaping a step (not a normal step
+                    # FAILED, which sets state.status) must not be masked: cancel
+                    # outstanding work and re-raise so the engine marks the run
+                    # failed instead of reporting a vacuous completion.
+                    first_exc = exc
+                    for other in futures:
+                        other.cancel()
+                    break
+                ran = idx + 1
+                if state.status in halting:
+                    for other in futures:
+                        other.cancel()
+                    break


+        in item order (never completion order), and the first item to reach a halting
+        run status (PAUSED/FAILED/ABORTED) stops further dispatch; on halt the
+        contiguous prefix of items that ran is returned. A non-int / 0 / negative /
+        ``None`` ``max_concurrency`` coerces to 1 (sequential).
+        """


+        t0 = time.time()
+        self._run(tmp_path, list(range(4)), 4, on_item)
+        assert time.time() - t0 < 0.6  # serialized would be >= 1.2s


+                    fan_out_results = self._run_fan_out(
+                        items, template, step_id, context, state, registry,
+                        result.output.get("max_concurrency", 1),
+                    )


doquanghuy · 2026-06-29T15:45:07Z

Thanks @copilot — all five points addressed at root cause (latest: d4479ed):

- Run-ahead after halt: replaced the submit-all-up-front loop with a sliding submission window (≤ max_workers in flight) that stops launching new items once the run is halting.
- Prefix could drop the halting item: halt is now attributed per item from each item's own recorded result (replaying the sequential break condition, honoring continue_on_error/aborted) rather than the shared run status a later concurrent item may have flipped, so the returned prefix includes the actual halting item, matching the sequential path. (An item that fails before recording, e.g. an unknown step type, is attributed too, since every item runs the same template.)
- Unlocked output mutation: the parent fan-out output update now routes through a new RunState.set_step_output() under the run lock, so it can't race a concurrent save().
- Docstring vs int() coercion: the docstring now accurately describes the behavior: numeric strings/floats are honored; only non-coercible or <= 1 runs sequentially.
- Flaky timing test: switched to a monotonic clock with a looser threshold while keeping a clear gap vs the serialized baseline.
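For readers following along, the sliding submission window from the first point might be sketched like this. The names `run_windowed`, `should_halt`, and `demo_item` are hypothetical, and the per-item halt attribution from the second point is not modeled; the sketch only shows keeping at most `workers` items in flight and refusing to launch more once a halt is observed.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Any, Callable, Dict, Sequence


def run_windowed(items: Sequence[Any], run_item: Callable[[Any], Any],
                 workers: int, should_halt: Callable[[], bool]) -> Dict[int, Any]:
    results: Dict[int, Any] = {}
    in_flight: Dict[Any, int] = {}  # future -> item index
    next_submit = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Top up the window, but never after a halt has been observed.
            while (next_submit < len(items)
                   and len(in_flight) < workers
                   and not should_halt()):
                in_flight[pool.submit(run_item, items[next_submit])] = next_submit
                next_submit += 1
            if not in_flight:
                break
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                results[in_flight.pop(fut)] = fut.result()
    return results


halted = threading.Event()


def demo_item(item: int) -> int:
    if item == 2:
        halted.set()            # item 2 plays the halting item
    elif item > 2:
        halted.wait(timeout=5)  # later items cannot finish before the halt
    return item


results = run_windowed(list(range(10)), demo_item, workers=2,
                       should_halt=halted.is_set)
```

With ten items and two workers, only items 0 to 2 (and possibly item 3, if it was already in flight) ever run; a submit-all-up-front loop would have queued all ten.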

Coverage added: concurrent halt-includes-halting-item, continue_on_error-does-not-truncate, and unknown-template-type-matches-sequential. Full workflows suite green (328 passed), ruff check clean, 30× stress on the concurrent path.

@mnriem could you give this a review when you get a chance? 🙏

…ut, faithful halt

Address the reviewer feedback on the bounded fan-out concurrency:

- Sliding submission window: keep at most `workers` items in flight and stop launching new items once the run is halting, instead of submitting all items up front (which let the pool keep starting queued work after a halt).
- Faithful halt prefix: attribute a halt to the specific item whose own recorded result halted the run (replaying the sequential break condition, honoring continue_on_error/aborted), not the shared run status a later concurrent item may have flipped. The returned prefix now includes the actual halting item, matching the sequential path. An item that fails before recording a result (e.g. an unknown step type) is attributed too, since every item runs the same template.
- Lock the parent fan-out output mutation: route the post-fan-out step_results[...]['output'] update through a new RunState.set_step_output() under the run lock, so it cannot race a concurrent save().
- Docstring: describe int() coercion accurately (numeric strings / floats are honored; only non-coercible or <= 1 runs sequentially).

Tests: add concurrent halt-includes-halting-item, continue_on_error-does-not-truncate, and unknown-template-type-matches-sequential coverage; make the timing test use a monotonic clock with a looser threshold to avoid CI flakiness.

Copilot

Review details

Files reviewed: 2/2 changed files
Comments generated: 5
Review effort level: Low

+        # Guards step_results mutation and save() so a concurrent fan-out cannot
+        # mutate the dict while save() is serializing it (which would raise
+        # "dictionary changed size during iteration").
+        self._lock = threading.Lock()


+            self._execute_steps(
+                [item_step], item_ctx, state, registry, step_offset=-1,
+            )
+            return context.steps.get(item_step["id"], {}).get("output", {})


+        try:
+            workers = max(1, int(max_concurrency))
+        except (TypeError, ValueError):
+            workers = 1
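As a quick illustration of what this coercion accepts, the snippet below wraps the same try/except in a hypothetical `coerce_workers` helper and runs it over a few representative values. The helper name is an assumption; the body mirrors the hunk above.

```python
from typing import Any


def coerce_workers(max_concurrency: Any) -> int:
    # Same logic as the hunk above, wrapped in a hypothetical helper.
    try:
        return max(1, int(max_concurrency))
    except (TypeError, ValueError):
        return 1


samples = [4, "4", 4.0, 4.9, 0, -3, None, "four", "4.0"]
coerced = {repr(value): coerce_workers(value) for value in samples}
for text, workers in coerced.items():
    print(f"{text:>7} -> {workers}")
```

Note that `int()` truncates `4.9` to 4, and that the string `"4.0"` is not a valid integer literal, so it raises ValueError and falls back to 1. One case this clause does not catch is `int(float("inf"))`, which raises OverflowError; whether such a value can reach this code depends on validation not shown in this hunk.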


+    def test_concurrency_is_real(self, tmp_path):
+        import time
+
+        n = 4
+        delay = 0.2
+
+        def on_item(item):
+            time.sleep(delay)
+            return None
+
+        # Monotonic clock (immune to wall-clock adjustments). All n items run
+        # concurrently, so elapsed is ~one delay; a generous bound well under the
+        # serialized baseline keeps a clear gap while tolerating slow/loaded CI.
+        serialized = n * delay
+        t0 = time.monotonic()
+        self._run(tmp_path, list(range(n)), n, on_item)
+        elapsed = time.monotonic() - t0
+        assert elapsed < serialized * 0.6  # serialized would be >= n*delay
+


+        with ThreadPoolExecutor(max_workers=workers) as pool:
+            futures: dict[int, Future] = {}
+            next_submit = 0
+            for idx in range(n):


- append_log: serialize the log_entries append + log.jsonl write under a dedicated RunState._log_lock so concurrent fan-out workers can't interleave or corrupt log lines (kept separate from the state lock; never nested).
- _run_fan_out.run_item: read the item output back through the item_ctx it executed against rather than the outer context closure; this is clearer and robust if StepContext ever stops sharing the steps dict by reference.
- StepBase: document the thread-safety contract: STEP_REGISTRY holds one shared instance per type, so concurrent fan-out invokes execute() on the same object; implementations must be stateless/thread-safe (the built-ins already are).
- test_concurrency_is_real: prove parallelism deterministically with a threading.Barrier (sequential execution can't clear it) instead of a wall-clock timing assertion.
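The append_log change in this commit can be pictured with a small sketch. `RunLogSketch` is a hypothetical reduction that keeps only the in-memory list, the JSONL file, and the dedicated lock; the real method's signature and entry shape are not shown on this page and are assumed here.

```python
import json
import tempfile
import threading
from pathlib import Path
from typing import Any, Dict, List


class RunLogSketch:  # hypothetical: only the logging half of the run state
    def __init__(self, log_path: Path) -> None:
        self.log_path = log_path
        self.log_entries: List[Dict[str, Any]] = []
        self._log_lock = threading.Lock()  # separate from any state lock

    def append_log(self, entry: Dict[str, Any]) -> None:
        with self._log_lock:
            # The list append and the file write happen as one unit, so the
            # in-memory order matches the on-disk order and lines stay whole.
            self.log_entries.append(entry)
            with open(self.log_path, "a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    log = RunLogSketch(Path(tmp_dir) / "log.jsonl")

    def worker(worker_id: int) -> None:
        for n in range(25):
            log.append_log({"worker": worker_id, "n": n})

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    lines = log.log_path.read_text(encoding="utf-8").splitlines()
    parsed = [json.loads(line) for line in lines]
```

Every line parses as JSON and the file mirrors `log_entries` exactly, which is the property the dedicated lock is meant to protect.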

doquanghuy · 2026-06-29T17:03:30Z

Thanks @copilot — second-pass comments addressed at root cause in ce352a3:

- append_log race (engine.py:388): append_log now serializes its log_entries append + log.jsonl write under a dedicated RunState._log_lock, so concurrent fan-out workers can't interleave/corrupt log lines. It's a separate lock from the state _lock (logging shouldn't contend with save()), and since append_log is never called while _lock is held, the two never nest.
- run_item returns via outer closure (engine.py:1004): now reads back through the item_ctx it executed against (item_ctx.steps) instead of the outer context.steps. It is the same dict today, but this is clearer and robust if StepContext copying ever stops sharing by reference.
- PR description vs int() coercion (engine.py:990): honoring numeric strings/floats is the intended contract (locked by test_string_max_concurrency_is_honored), so I updated the PR description to describe the real int() behavior (the docstring already does).
- Flaky timing test (tests:2075): test_concurrency_is_real now proves parallelism deterministically with a threading.Barrier(n): sequential execution can't clear it (times out → BrokenBarrierError), so there's no wall-clock threshold to tune or flake.
- Singleton step instances (engine.py:1054): documented the StepBase thread-safety contract: STEP_REGISTRY holds one shared instance per type, so concurrent fan-out invokes execute() on the same object; implementations must be stateless/thread-safe (the built-ins already are). I chose documenting the contract over instantiating a fresh step per execution to avoid a broader behavioral change to step construction; happy to switch to fresh-instance-per-exec instead if you'd prefer that, @mnriem.
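The Barrier-based check from the fourth point can be sketched outside the test harness as follows. `prove_parallel` is a hypothetical helper, not the actual test; it uses a plain ThreadPoolExecutor in place of the engine to show why a barrier of n parties can only clear when n items really run at the same time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def prove_parallel(n: int, workers: int, timeout: float = 2.0) -> bool:
    """Return True only if n items were executing simultaneously."""
    barrier = threading.Barrier(n)

    def on_item(_: int) -> bool:
        # Blocks until n threads are waiting here at once; otherwise it
        # times out and raises BrokenBarrierError.
        barrier.wait(timeout=timeout)
        return True

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return all(pool.map(on_item, range(n)))
    except threading.BrokenBarrierError:
        return False


parallel_ok = prove_parallel(4, workers=4)
sequential_ok = prove_parallel(4, workers=1, timeout=0.2)
```

With four workers the barrier clears almost immediately; with one worker the first item waits alone until the timeout, so the outcome does not depend on how fast or loaded the machine is.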

Full workflows suite green (328 passed), ruff check clean, 30× stress on the concurrency tests.

@mnriem would appreciate your review when you have a chance 🙏

feat(workflows): honor max_concurrency in fan-out via a bounded threa…

f69afd1

…d pool

doquanghuy requested a review from mnriem as a code owner June 29, 2026 12:43

mnriem requested a review from Copilot June 29, 2026 14:49

Copilot started reviewing on behalf of mnriem June 29, 2026 14:49

Copilot AI reviewed Jun 29, 2026

doquanghuy force-pushed the feat/3222-fanout-max-concurrency branch from 38c7798 to d4479ed Compare June 29, 2026 15:52

mnriem requested a review from Copilot June 29, 2026 16:50

Copilot started reviewing on behalf of mnriem June 29, 2026 16:50

Copilot AI reviewed Jun 29, 2026

doquanghuy wants to merge 3 commits into github:main from doquanghuy:feat/3222-fanout-max-concurrency
