gh-151518: Avoid STW starvation of attaching threads #152826

Draft
tpn wants to merge 1 commit into python:main from tpn:gh-151518-stw-attach-fairness

@tpn tpn commented Jul 1, 2026

Member

Fixes #151518.

Repeated stop-the-world requests can starve a thread that was detached when
an earlier request parked it. start_the_world() restores the tstate to
DETACHED, but the next requester can park it again before its OS thread
completes _PyThreadState_Attach().

This change distinguishes tstates parked while detached from those suspended
while attached, and marks a waiter only when tstate_wait_attach() observes
that detached-origin state. Later STW passes leave only active attach waiters
unparked until they attach. Threads that are merely detached for sleep or I/O
remain immediately parkable, and the normal uncontended attach path remains a
single CAS. The waiter flag reuses existing _PyThreadStateImpl tail padding.
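
As a rough illustration of the state handling described above, here is a toy, single-threaded Python model. It is not the C implementation: the class, the function names, and the point at which the waiter flag is cleared (on successful attach) are assumptions made for the sketch, and plain assignments stand in for the real atomic operations.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    DETACHED = auto()
    ATTACHED = auto()
    SUSPENDED = auto()           # suspended while attached
    SUSPENDED_DETACHED = auto()  # parked by a requester while detached


@dataclass
class ToyThreadState:
    # Hypothetical stand-in for a tstate; not the real struct.
    state: State = State.DETACHED
    attach_waiting: bool = False


def park_detached(ts):
    """Requester side: park a detached thread unless it is an active waiter."""
    if ts.state is State.DETACHED and not ts.attach_waiting:
        ts.state = State.SUSPENDED_DETACHED
        return True
    return False


def start_the_world(ts):
    """Requester side: restore either suspended state to DETACHED."""
    if ts.state in (State.SUSPENDED, State.SUSPENDED_DETACHED):
        ts.state = State.DETACHED


def try_attach(ts):
    """Thread side: models the single DETACHED -> ATTACHED transition."""
    if ts.state is State.DETACHED:
        ts.state = State.ATTACHED
        ts.attach_waiting = False  # assumption: flag cleared once attached
        return True
    return False


def note_attach_wait(ts):
    """Thread side: mark a waiter only for the detached-origin state."""
    if ts.state is State.SUSPENDED_DETACHED:
        ts.attach_waiting = True


def demo():
    """Replay: parked while detached, wakes, waits, survives the next pass."""
    ts = ToyThreadState()
    steps = [park_detached(ts)]     # pass 1 parks the sleeper
    steps.append(try_attach(ts))    # sleeper wakes up but cannot attach
    note_attach_wait(ts)            # so it registers as a waiter
    start_the_world(ts)             # pass 1 ends, state back to DETACHED
    steps.append(park_detached(ts))  # pass 2 leaves the waiter alone
    steps.append(try_attach(ts))    # the waiter finally attaches
    return steps, ts
```

In this model a thread that never calls `note_attach_wait()` stays parkable on every pass, which mirrors the claim that threads merely detached for sleep or I/O remain immediately parkable.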

The regression test runs a tight gc.collect() loop in a subprocess and
verifies that another thread can return from time.sleep() and stop the
collector.
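
A minimal sketch of the shape of such a test, not the test added by this PR: the child script, the helper name `collector_can_be_stopped`, the 0.1 second sleep, and the 30 second default timeout are all assumptions chosen for illustration.

```python
import subprocess
import sys
import textwrap

# Child program: one thread collects in a tight loop while the main
# thread sleeps, then tries to stop it. (Illustrative script only.)
CHILD_SCRIPT = textwrap.dedent("""
    import gc
    import threading
    import time

    stop = threading.Event()

    def collector():
        while not stop.is_set():
            gc.collect()

    worker = threading.Thread(target=collector)
    worker.start()
    time.sleep(0.1)   # main thread detaches while sleeping
    stop.set()        # requires the main thread to reattach
    worker.join()
""")


def collector_can_be_stopped(timeout=30.0):
    """Return True if the child exits cleanly before the timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD_SCRIPT],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising.
        return False
    return proc.returncode == 0
```

Running the loop in a subprocess keeps a starved interpreter from hanging the test runner itself; the parent only observes whether the child finished in time.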

Validation:

  • Rebased onto main at efcfb1a4e0f4.
  • On the same B200 node with 28 CPUs allocated, the exact regression test
    timed out after 30 seconds in 2/3 runs on unpatched main; the patched
    CI-style debug build passed 5/5 in about 1.2 seconds per run.
  • CI-style debug free-threaded build
    (--with-pydebug --enable-safety --enable-slower-safety --disable-gil):
    PYTHON_GIL=0 ./python -m test --fast-ci test_free_threading.test_gc test_threading
    passed, 251 tests run and 5 skipped.
  • Release free-threaded build (--disable-gil): the same focused test command
    passed, 251 tests run and 5 skipped.
  • The exact regression test also passed 5/5 in each local patched build.
  • Current-main baseline and patched _PyThreadStateImpl are both 15,792
    bytes, with all common offsets unchanged; stw_attach_waiting overlays
    __padding at offset 15,728.
  • make patchcheck passed.
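
The layout point in the list above (a new flag that fits into existing padding leaves the struct size and the earlier offsets unchanged) can be illustrated with a small `ctypes` sketch. The two structs below are made-up miniatures, not `_PyThreadStateImpl`; only the field name `stw_attach_waiting` comes from this PR.

```python
import ctypes


class Before(ctypes.Structure):
    # Hypothetical miniature: an 8-byte field followed by a 1-byte flag,
    # which leaves unused tail padding up to the struct's alignment.
    _fields_ = [
        ("hot_field", ctypes.c_uint64),
        ("existing_flag", ctypes.c_uint8),
    ]


class After(ctypes.Structure):
    # Same layout with one more 1-byte flag placed in the tail padding.
    _fields_ = [
        ("hot_field", ctypes.c_uint64),
        ("existing_flag", ctypes.c_uint8),
        ("stw_attach_waiting", ctypes.c_uint8),
    ]


def layout_unchanged():
    """Check that size and the offsets of the common fields did not move."""
    same_size = ctypes.sizeof(Before) == ctypes.sizeof(After)
    same_offsets = all(
        getattr(Before, name).offset == getattr(After, name).offset
        for name, _ in Before._fields_
    )
    return same_size and same_offsets
```

The same kind of check, applied to the real struct with a C-level tool or `static_assert`, is what the size and offset figures above report.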

Earlier no-debug performance comparisons on 64-CPU AMD and 28-CPU Intel
systems found the no-thread and detached-sleeper paths roughly flat. In the
adversarial attached/churn cases, lower repeated-collector throughput was
paired with greater worker progress, which is the intended fairness effect.

Free-threaded stop-the-world pauses can otherwise starve a thread trying to reattach after it was suspended while detached.  A tight manual gc.collect() loop can release and immediately request the next stop-the-world pause, repeatedly parking the detached thread before it can attach and make progress.

Add a distinct _Py_THREAD_SUSPENDED_DETACHED state for tstates parked from DETACHED.  tstate_wait_attach() marks an attach waiter only after observing that detached-origin suspended state, and park_detached_threads() skips only those active waiters on later stop-the-world passes.  The ordinary successful tstate_try_attach() path remains the baseline CAS-only path.

Teach the related stop-the-world paths about both suspended states, including start_the_world() and tstate_delete_common().  Keep the new wait flag after the existing hot free-threaded _PyThreadStateImpl fields so their offsets do not move.

Add a free-threaded GC regression test that runs a subprocess with a tight gc.collect() worker and verifies the main thread can reattach after sleeping and stop the worker.

Development

Successfully merging this pull request may close these issues.

Free-threaded CPython can starve a thread reattaching during repeated stop-the-world GC