Skip to content

Add an experimental opt-in libfyaml backend#805

Open
hsbt wants to merge 16 commits into
masterfrom
claude/cool-carson-18ae10
Open

Add an experimental opt-in libfyaml backend#805
hsbt wants to merge 16 commits into
masterfrom
claude/cool-carson-18ae10

Conversation

@hsbt

@hsbt hsbt commented Jul 1, 2026

Copy link
Copy Markdown
Member

Psych is tied to libyaml, which implements YAML 1.1 and rejects some valid YAML 1.2 documents. This adds an experimental alternative backend built on libfyaml, a fully YAML 1.2 compliant parser and emitter, compiled only when psych is built with --enable-libfyaml. Without the flag nothing changes and the default libyaml backend is used, and the flag is rejected on Windows where libfyaml is unsupported.

Because libfyaml follows YAML 1.2, the YAML 1.1 booleans yes, no, on, and off load as plain strings rather than true or false, which resolves the "Norway problem" where the country code no became false. This is keyed on Psych::BACKEND, so the libyaml default keeps its current behavior.

The parser and emitter are reimplemented against libfyaml's event API in translation units guarded by PSYCH_USE_LIBFYAML. Errors and positions are recovered from libfyaml's diagnostics, UTF-16 IO is transcoded to UTF-8, and scalar emission is matched to libyaml where it matters. The test suite is made backend-aware and gains positive coverage for the libfyaml behavior, and a new CI workflow builds with the flag on Linux and macOS and asserts the active backend before running the suite.

The backend is experimental. Its output is valid YAML but is formatted differently from libyaml in places and a few emitter edge cases are not yet matched, so libyaml remains the supported default.

hsbt and others added 2 commits July 1, 2026 06:48
Built only with --enable-libfyaml; without the flag the default libyaml
backend is unchanged. The parser and emitter are reimplemented against
libfyaml's event API in separate translation units guarded by
PSYCH_USE_LIBFYAML, and the backend is not supported on Windows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Scalar type resolution happens in ScalarScanner, not the C backend, so
swapping to libfyaml alone still resolved yes/no/on/off to booleans. Key
the boolean set on Psych::BACKEND so the libyaml default keeps the YAML
1.1 set while the experimental libfyaml backend follows 1.2.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 1, 2026 02:04

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an experimental, opt-in libfyaml backend to Psych to enable YAML 1.2–compliant parsing/emitting when built with --enable-libfyaml, while keeping the existing libyaml backend as the default. The change introduces new libfyaml-backed C implementations for the parser/emitter, exposes backend identification/version info, updates boolean/scalar scanning for YAML 1.2 semantics, and adjusts the test suite + CI to be backend-aware.

Changes:

  • Add PSYCH_USE_LIBFYAML build mode (--enable-libfyaml) with libfyaml-backed parser/emitter translation units and Psych::BACKEND / Psych.libfyaml_version.
  • Update scalar scanning and tests to reflect YAML 1.2 core schema behavior (notably yes/no/on/off as strings).
  • Add a dedicated GitHub Actions workflow to compile and run the suite with the libfyaml backend on Linux/macOS.

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
ext/psych/extconf.rb Adds --enable-libfyaml build option, pkg-config detection, and Windows rejection.
ext/psych/psych.h Switches header include between yaml.h and libfyaml.h based on PSYCH_USE_LIBFYAML.
ext/psych/psych.c Exposes Psych::BACKEND; adds Psych.libfyaml_version; adapts version reporting under libfyaml.
ext/psych/psych_parser.c Compiles out the libyaml parser when building with libfyaml.
ext/psych/psych_parser_fy.c New libfyaml-backed parser implementation using libfyaml’s event API.
ext/psych/psych_emitter.c Compiles out the libyaml emitter when building with libfyaml.
ext/psych/psych_emitter_fy.c New libfyaml-backed emitter implementation mapping Psych events to libfyaml emission.
lib/psych/scalar_scanner.rb Makes boolean scanning backend-aware (YAML 1.1 vs 1.2 boolean sets).
test/psych/helper.rb Adds libfyaml? helper for backend-aware tests.
test/psych/test_boolean.rb Updates boolean tests to reflect YAML 1.2 behavior and adds Norway-problem coverage.
test/psych/test_psych.rb Adds backend/version assertions and skips unsupported/format-different cases under libfyaml.
test/psych/test_string.rb Skips YAML 1.1 boolean quoting assertions under libfyaml; adds libfyaml-specific expectations.
test/psych/visitors/test_to_ruby.rb Skips YAML 1.1 yes/no/on/off boolean visitor tests under libfyaml.
test/psych/test_yaml.rb Skips a YAML-spec fixture that depends on YAML 1.1 boolean typing under libfyaml.
test/psych/test_yaml_special_cases.rb Skips YAML 1.1 off boolean special-case under libfyaml.
test/psych/test_tree_builder.rb Skips location assertions that differ under libfyaml.
test/psych/test_symbol.rb Skips a known libfyaml emitter round-trip limitation for indicator-named symbols.
test/psych/test_set.rb Skips strict dump-format assertion under libfyaml.
test/psych/test_parser.rb Skips mark position assertions that differ under libfyaml.
test/psych/test_omap.rb Skips tag-format assertion under libfyaml.
test/psych/test_json_tree.rb Skips JSON flow formatting assertions under libfyaml.
test/psych/test_exception.rb Adjusts exception context expectation (libfyaml diagnostics differ).
test/psych/test_encoding.rb Skips encoding-related expectations where libfyaml errors/tags differ.
test/psych/test_data.rb Skips strict dump-format assertion under libfyaml.
test/psych/test_coder.rb Skips formatting/tag-synthesis expectations under libfyaml.
test/psych/json/test_stream.rb Skips JSON flow formatting assertions under libfyaml.
README.md Documents the experimental libfyaml backend, build flag, behavior differences, and backend detection.
.github/workflows/libfyaml.yml Adds CI job that builds with --enable-libfyaml, verifies backend, and runs tests on Linux/macOS.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ext/psych/psych.c
Comment thread ext/psych/psych_emitter_fy.c
Comment thread ext/psych/psych_emitter_fy.c
Comment thread ext/psych/psych_emitter_fy.c
Comment thread ext/psych/psych_emitter_fy.c
Comment thread test/psych/test_string.rb
Comment thread ext/psych/psych.c
Comment thread ext/psych/psych_parser_fy.c Outdated
hsbt and others added 14 commits July 1, 2026 15:25
Create a fresh parser per parse instead of reusing one via
fy_parser_reset(), which left default tag handles unset and rejected
bare ("---"-less) tag-led documents. Recover the real message and
position by switching the parser's own diagnostic object to collect
mode; creating a replacement diag crashes libfyaml 0.9.6. Drop the
spurious empty tag directive libfyaml reports.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the tag when plain_implicit or quoted_implicit is set, matching how
libyaml omits a tag that the value resolves to on reload; otherwise nil
emitted as "!<tag:yaml.org,2002:null>" instead of an empty scalar. Honor
the plain hint when choosing the scalar style, and restore the Check_Type
guards on anchor and tag so non-string arguments raise TypeError.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
libfyaml only consumes UTF-8, so a UTF-16 IO fed through the chunked
reader reached it as raw bytes and was rejected as invalid UTF-8. When
the IO's external encoding is UTF-16LE/BE, slurp the whole stream and
transcode it first; a 2-byte unit could otherwise straddle a read
boundary. Other non-UTF-8 encodings stay raw and libfyaml rejects them,
matching psych's UTF-8/UTF-16-only IO contract (Shift_JIS still raises).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The xcalloc'd tag buffers leaked when StringValue, the tuple-length
check, or the emit raised mid-way. Wrap the work in rb_ensure so the
buffers are always freed, and keep the exported directive strings in a
Ruby array so the GC cannot reclaim them while their C pointers are in
use.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a libfyaml? helper and guard the tests that intentionally diverge on
the experimental YAML 1.2 backend: yes/no/on/off are strings rather than
booleans, flow collections and block scalars are formatted differently,
tags and marks are reported differently, and non-ASCII tags/aliases are
rejected. test_boolean asserts the 1.2 string result directly; the
formatting and mark cases are skipped. The default libyaml build is
unaffected (every guard keys off Psych::BACKEND).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mark the reconstructed SyntaxError message UTF-8 instead of US-ASCII, so
a diagnostic that embeds a multibyte snippet of the input does not raise
Encoding::CompatibilityError when concatenated with UTF-8. Add
RB_GC_GUARD for the anchor and tag strings in the emitter (matching the
existing guard on the scalar value) so their C pointers cannot dangle if
a GC runs inside fy_emit_event_create.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cover what the libfyaml backend does distinctly, each placed with the
concern it belongs to: the Psych::BACKEND and Psych.libfyaml_version
checks in test_psych.rb, the YAML 1.2 "Norway problem" boolean case in
test_boolean.rb, and the "1.1 booleans are not quoted" emission case in
test_string.rb. The 1.2 assertions are skipped on the default libyaml
backend so the same suite passes under both.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Build psych with --enable-libfyaml and run the suite on Linux and macOS
(libfyaml is not supported on Windows). A verification step asserts that
Psych::BACKEND is actually libfyaml so the job cannot silently fall back
to libyaml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Explain how to enable it (--enable-libfyaml), that it is unsupported on
Windows, the YAML 1.2 boolean behavior it brings (the "Norway problem"
fix), and how to check the active backend, with an experimental caveat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The verify command was a plain YAML scalar, so the " #{Psych::BACKEND}"
started a YAML comment and truncated the shell command, leaving an
unterminated quote. Use a block scalar so the interpolation survives.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reject canonical output with NotImplementedError instead of silently
ignoring the request, and honor the implicit flag in start_sequence and
start_mapping so an implicit tag is not printed as a redundant verbose
tag. Relax the unquoted-boolean dump test to allow an optional document
end marker. Also correct the version docstrings and a stale comment
about how the parser collects diagnostics.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The libfyaml-dev package on Ubuntu is 0.8, which segfaults psych's
emitter, while 0.9.6 (used on macOS via Homebrew) passes the whole suite.
Build the same 0.9.6 release from source in the Linux CI job, and reject
libfyaml older than 0.9 in extconf so users get a clear error instead of
a runtime crash. Also drop an unused variable in set_canonical.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
An adversarial review confirmed two backend-visible differences that were
not written down. Scalars emitted with the default style can be formatted
differently from libyaml, and Psych::SyntaxError#context is always nil
because libfyaml keeps the whole diagnostic in #problem.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Record the rough parse/emit benchmark so users pick the backend on purpose:
libfyaml is for YAML 1.2 semantics, and libyaml stays the choice when emit
throughput matters since libfyaml dumps about 1.7x to 1.9x slower.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hsbt hsbt force-pushed the claude/cool-carson-18ae10 branch from 68098bf to d185ff2 Compare July 1, 2026 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants