gh-136063: Fix quadratic complexity in the email header value parser by serhiy-storchaka · Pull Request #152521 · python/cpython

serhiy-storchaka · 2026-06-28T18:59:03Z

Issue: Potential Quadratic Complexity Vulnerabilities in the email Module #136063

The email header value parser advanced through the input by repeatedly slicing off the already-parsed prefix (value = value[1:], value = ''.join(remainder)), copying the whole remainder on every step. For headers built from many small tokens (long address lists, encoded-word runs, content-type parameters) this is O(n²) and can be exploited for denial of service.
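To make the cost concrete, here is a minimal sketch (not code from the PR) that counts how many characters get copied when a parser consumes one character per step with `value = value[1:]`. The function name `copied_chars_when_slicing` is hypothetical and exists only for illustration.

```python
def copied_chars_when_slicing(value: str) -> int:
    """Count characters copied when consuming one character per step by slicing."""
    copied = 0
    while value:
        value = value[1:]       # allocates a new string holding the whole remainder
        copied += len(value)    # that allocation copies len(value) characters
    return copied


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        # Doubling the input roughly quadruples the copied characters: n * (n - 1) / 2.
        print(n, copied_chars_when_slicing("a" * n))
```

The total is n(n-1)/2 characters for an input of length n, which is where the quadratic behaviour comes from even though each individual slice looks cheap.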

This rewrites the parser to pass an index: every get_*/parse_* now takes (value, pos) and returns the parsed token together with the new position, scanning with value[pos], regex.match(value, pos), and match.end(). No remainder is ever copied.
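The index-passing convention can be sketched roughly as follows. This is a simplified illustration under assumed names (`skip_wsp`, `get_atom`, `get_atom_list` and the two regexes are hypothetical, and the grammar is far smaller than the real header grammar); it only shows the `(value, pos)` in, `(token, new_pos)` out shape built on `value[pos]`, `pattern.match(value, pos)` and `match.end()`.

```python
import re

# Hypothetical, heavily simplified token patterns (not the real email grammar).
_WSP = re.compile(r"[ \t]*")
_ATOM = re.compile(r"[^ \t,]+")


def skip_wsp(value, pos):
    """Return the position of the first non-whitespace character at or after pos."""
    return _WSP.match(value, pos).end()


def get_atom(value, pos):
    """Parse one atom starting at pos; return (atom, new_pos) without copying the rest."""
    m = _ATOM.match(value, pos)
    if m is None:
        raise ValueError(f"expected atom at position {pos}")
    return m.group(), m.end()


def get_atom_list(value, pos=0):
    """Parse a comma-separated list of atoms; return (atoms, new_pos)."""
    atoms = []
    pos = skip_wsp(value, pos)
    while pos < len(value):
        atom, pos = get_atom(value, pos)
        atoms.append(atom)
        pos = skip_wsp(value, pos)
        if pos < len(value):
            if value[pos] != ",":
                raise ValueError(f"expected ',' at position {pos}")
            pos = skip_wsp(value, pos + 1)
    return atoms, pos
```

Only the matched token is ever sliced out of `value`; the remainder is identified by `pos` alone, so the work per token is proportional to the token, not to what is left of the header.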

A few things worth calling out for review:

Error messages now interpolate at most a 60-character fragment of the unparsed input (_tail()). Several of these HeaderParseErrors are raised and caught as ordinary control flow, so interpolating the full remainder was itself O(n) and reintroduced the quadratic behaviour. No test asserts on the message text.
EncodedWord.cte now holds just the encoded word rather than the entire remaining header (the old code assigned the whole remainder to .cte).
An obsolete local-part is now re-parsed from the original source text instead of from str(local_part) + remaining (the decoded rendering of the already-parsed tokens). The old approach was both a quadratic spot and incorrect whenever a token rendered back to text differs from the source, e.g. an RFC 2047 encoded-word decoding to a bare special such as @, which RFC 2047 does not permit in an addr-spec anyway. Such an encoded-word is no longer decoded and the address is reported as invalid. See email: RFC 2047 encoded-word in an addr-spec local-part corrupts address parsing #152519 for the underlying correctness bug.
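The bounded error fragment from the first point might look roughly like this. The PR names the helper `_tail()` but its signature is not shown here, so `tail_sketch`, its parameters, and `expect_char` below are assumptions made for illustration; only `email.errors.HeaderParseError` is a real name.

```python
from email.errors import HeaderParseError


def tail_sketch(value, pos, limit=60):
    """Return at most `limit` characters of the unparsed input starting at pos.

    Hypothetical stand-in for the PR's _tail(); the real signature may differ.
    """
    return value[pos:pos + limit]


def expect_char(value, pos, char):
    """Hypothetical helper: require `char` at pos, else raise with a bounded message."""
    if pos >= len(value) or value[pos] != char:
        # The message cost is O(limit), not O(len(value) - pos), so raising
        # this error in a loop cannot reintroduce the quadratic behaviour.
        raise HeaderParseError(
            f"expected {char!r} but found {tail_sketch(value, pos)!r}")
    return pos + 1
```

Because such errors are raised and caught as ordinary control flow, the cost of building the message matters as much as the cost of the parsing step itself.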

All test_email, test_smtplib and test_logging tests pass. Non-debug benchmarks show the previously quadratic headers now scale linearly (e.g. parse_content_type ~8× faster at 64k, get_address_list ~5.6× faster at 72k).
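One way to check the scaling locally is a small timing harness like the one below. It is not the benchmark used for the figures above: the function name `time_address_header`, the input shape, and the sizes are assumptions, and it goes through the public `email.policy.default.header_factory` rather than calling the parser functions directly.

```python
import timeit
from email.policy import default


def time_address_header(n_addresses, repeat=3):
    """Return the best-of-`repeat` time in seconds to parse a To header with n addresses."""
    value = ", ".join(f"user{i}@example.com" for i in range(n_addresses))
    timings = timeit.repeat(
        lambda: default.header_factory("To", value),  # parsing happens on construction
        number=1,
        repeat=repeat,
    )
    return min(timings)


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        # With linear parsing, doubling n should roughly double the time.
        print(f"{n:>6} addresses: {time_address_header(n):.4f}s")
```

Comparing the ratio between successive rows before and after the change gives a rough read on whether the header scales linearly or quadratically.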

…arser

Rewrite the parser in Lib/email/_header_value_parser.py to advance through the input using indices instead of repeatedly slicing off the already-parsed prefix. Each get_*/parse_* function now takes a (value, pos) pair and returns the parsed token together with the new position, removing the O(n) remainder copy that made parsing O(n^2).

As part of this change, an obsolete local-part is re-parsed from the original source text rather than from the decoded representation of the already-parsed tokens. This only affects malformed addresses that contain an RFC 2047 encoded-word inside an addr-spec, which RFC 2047 does not permit; such an encoded-word is no longer decoded and the address is reported as invalid (pythongh-152519).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

serhiy-storchaka requested a review from a team as a code owner June 28, 2026 18:59

bedevere-app Bot added the awaiting core review label Jun 28, 2026

serhiy-storchaka requested a review from bitdancer June 28, 2026 19:01

Fix NEWS entry filename

c161638

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

serhiy-storchaka wants to merge 2 commits into python:main from serhiy-storchaka:email-parser-indexing
