Documentation
Summary
While investigating tarfile.extractall(filter="data") on Windows, I noticed that the current extraction guidance does not explicitly mention NTFS Alternate Data Stream (ADS) path syntax or Windows reserved pathnames.
Adding a short note to the documentation would help developers understand that archive member names may still require platform-specific validation on Windows.
Background
PEP 706 and the current tarfile documentation describe filter="data" as the recommended extraction filter for general-purpose data archives, while also explaining that extraction filters cannot protect against every filesystem-specific behavior.
During my investigation, I confirmed that archive member names such as:
file.txt:secret
file.txt:Zone.Identifier
are interpreted by NTFS as references to Alternate Data Streams of file.txt rather than as ordinary filenames.
This behavior is consistent with Windows path semantics, but it is not currently mentioned in the extraction guidance.
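For anyone who wants to reproduce this, an archive with such member names can be built in memory. The following is only a minimal sketch: the helper name build_ads_test_archive and the payload are hypothetical, and nothing is extracted to disk.

```python
import io
import tarfile


def build_ads_test_archive(names=("file.txt:secret", "file.txt:Zone.Identifier")):
    """Return the bytes of a tar archive whose member names use ADS syntax.

    Hypothetical helper for illustration; the payload is arbitrary.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            payload = b"example payload"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Read the archive back and list the member names without extracting anything.
archive_bytes = build_ads_test_archive()
with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
    member_names = tar.getnames()

print(member_names)
```

The tar format stores these names unchanged. The colon only takes on a special meaning when the name reaches the Windows filesystem at extraction time.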
Suggested Documentation Improvement
The existing "Hints for further verification" section already recommends validating filenames before extraction.
It may be helpful to explicitly mention Windows reserved pathnames and NTFS Alternate Data Streams there.
For example, something along the lines of:
On Windows, archive member names may contain NTFS Alternate Data Stream (ADS) syntax or other reserved pathnames that are interpreted by the filesystem. Applications extracting archives from untrusted sources should validate filenames according to their platform requirements (for example, using os.path.isreserved() where appropriate).
The exact wording is, of course, up to the maintainers.
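To illustrate the kind of application-side validation such a note would point to, here is a rough sketch of a custom extraction filter that rejects suspicious names before delegating to tarfile.data_filter. The names looks_unsafe_on_windows, windows_aware_data_filter, and UnsafeWindowsNameError are hypothetical, and the hand-written check is a deliberately conservative assumption rather than a complete description of Windows naming rules. On Windows with Python 3.13 or later, os.path.isreserved() could be used in its place.

```python
import tarfile

# DOS device names that Windows treats as reserved (assumed list, not exhaustive).
_RESERVED_DEVICE_NAMES = frozenset(
    ["CON", "PRN", "AUX", "NUL", "CONIN$", "CONOUT$"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)


class UnsafeWindowsNameError(ValueError):
    """Hypothetical error raised when a member name looks unsafe on Windows."""


def looks_unsafe_on_windows(member_name):
    """Return True if any path component uses ADS syntax or a reserved name."""
    for part in member_name.replace("\\", "/").split("/"):
        if part in ("", ".", ".."):
            continue  # traversal is left to tarfile.data_filter
        if ":" in part:
            return True  # "name:stream" syntax (also catches drive letters)
        if part.endswith((".", " ")):
            return True  # trailing dots and spaces are reserved
        stem = part.partition(".")[0].rstrip(" ").upper()
        if stem in _RESERVED_DEVICE_NAMES:
            return True  # e.g. "NUL" or "nul.txt"
    return False


def windows_aware_data_filter(member, dest_path):
    """Reject suspicious names, then apply the standard "data" filter."""
    if looks_unsafe_on_windows(member.name):
        raise UnsafeWindowsNameError(f"refusing to extract {member.name!r}")
    return tarfile.data_filter(member, dest_path)
```

An application could then pass the callable as the filter argument, for example tar.extractall(dest, filter=windows_aware_data_filter). Raising an exception aborts the extraction. Returning None from the filter instead would skip the member and continue.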
Why this may help
This would:
Clarify Windows-specific behavior.
Make the extraction guidance more complete.
Point users toward the existing os.path.isreserved() helper (added in Python 3.13, available on Windows only).
Help developers perform appropriate platform-specific filename validation when extracting archives from untrusted sources.
Investigation
I originally investigated this behavior on current upstream/main while preparing a private PSRT report.
After discussion with the PSRT, the conclusion was that this is better handled as a documentation improvement than as a security issue, so I'm opening this public issue instead.
If this direction looks reasonable, I'd be happy to prepare a focused documentation PR.