feat(types): support CHAR/VARCHAR/BINARY/VARBINARY in data type json parser by SteNicholas · Pull Request #392 · alibaba/paimon-cpp

SteNicholas · 2026-06-30T09:20:14Z

Purpose

Linked issue: #197

CHAR, VARCHAR, BINARY, and VARBINARY were already registered as keywords
(in the Keyword enum and Keywords() map), but ParseTypeByKeyword had no
case for them, so they fell through to the default branch. Deserializing any
schema that contained them failed with:

deserialize failed, possibly type incompatible: parse data type failed, error msg: Invalid: Unsupported type: VARCHAR

This change maps them to the corresponding Arrow types, consistent with how
STRING and BYTES are handled:

CHAR / VARCHAR -> arrow::utf8()
BINARY / VARBINARY -> arrow::binary()
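
As a rough illustration of the mapping above, the sketch below shows a keyword switch in which the four new keywords share branches with STRING and BYTES instead of reaching the default branch. The `Keyword` enum and `ArrowTypeNameFor` function here are hypothetical stand-ins, not the actual paimon-cpp declarations, and the function returns the Arrow factory name as a string so the example compiles without Arrow.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the parser's keyword enum; only the keywords
// named in the description are listed.
enum class Keyword { STRING, BYTES, CHAR, VARCHAR, BINARY, VARBINARY, OTHER };

// Returns the name of the Arrow factory a keyword maps to, or std::nullopt
// when the keyword reaches the default branch (the "Unsupported type" case).
std::optional<std::string> ArrowTypeNameFor(Keyword keyword) {
    switch (keyword) {
        case Keyword::STRING:
        case Keyword::CHAR:
        case Keyword::VARCHAR:
            return std::string("arrow::utf8()");
        case Keyword::BYTES:
        case Keyword::BINARY:
        case Keyword::VARBINARY:
            return std::string("arrow::binary()");
        default:
            return std::nullopt;
    }
}
```

Before this change, the equivalent of the CHAR, VARCHAR, BINARY, and VARBINARY cases was missing, so those keywords behaved like `Keyword::OTHER` in this sketch.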

It reuses the existing ParseStringType<T>() helper. The Arrow types used here,
arrow::utf8() and arrow::binary(), carry no length, so the optional length
parameter (e.g. the (10) in VARCHAR(10)) is parsed but not retained on the
Arrow type. The declared length is validated to be within [1, INT_MAX] (both
inclusive), consistent with Java Paimon; out-of-range lengths such as
VARCHAR(0) or VARCHAR(2147483648) are rejected.
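
The length rule can be sketched as a small standalone function. This is a minimal sketch, not the PR's ParseStringType<T>() helper: `ParseDeclaredLength` is a hypothetical name, it works on the raw type string rather than on the parser's token stream, and it uses 0 as a sentinel for "no length declared", which is a choice made only for this example.

```cpp
#include <charconv>
#include <climits>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: extracts and validates the optional "(n)" suffix of a
// type string such as "VARCHAR(10)".
// Returns:
//   - std::nullopt if the suffix is malformed or n is outside [1, INT_MAX]
//   - 0 if no length was declared (sentinel used by this sketch only)
//   - n otherwise
std::optional<int64_t> ParseDeclaredLength(std::string_view type_string) {
    const auto open = type_string.find('(');
    if (open == std::string_view::npos) {
        return 0;  // bare keyword, e.g. "VARCHAR"
    }
    if (type_string.back() != ')') {
        return std::nullopt;  // unbalanced parenthesis
    }
    // Characters strictly between '(' and the trailing ')'.
    const std::string_view digits =
        type_string.substr(open + 1, type_string.size() - open - 2);
    int64_t length = 0;
    const char* first = digits.data();
    const char* last = digits.data() + digits.size();
    const auto [ptr, ec] = std::from_chars(first, last, length);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;  // not a plain integer, e.g. "test"
    }
    if (length < 1 || length > INT_MAX) {
        return std::nullopt;  // outside [1, INT_MAX]
    }
    return length;
}
```

Parsing into a 64-bit integer before the range check is what lets a value such as 2147483648 be read and then rejected, rather than overflowing a 32-bit int during parsing.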

Tests

DataTypeJsonParserTest.ParseTypeAtomicTypeSuccess is extended to cover each new
type with and without a length argument (CHAR, CHAR(10), VARCHAR,
VARCHAR(10), BINARY, BINARY(10), VARBINARY, VARBINARY(10)), the
inclusive length boundaries (CHAR(1), VARCHAR(2147483647)), and invalid
lengths (VARCHAR(0), VARBINARY(0), VARCHAR(2147483648), plus the existing
VARCHAR(test)).

WriteAndReadInteTest.TestCharVarcharBinaryVarbinaryTypes adds an end-to-end
write/read case (parquet/orc). It creates a table, then evolves it to a
hand-written schema whose columns declare
CHAR(10)/VARCHAR(20)/BINARY(10)/VARBINARY(20). The schema is written by hand
because the Arrow-based serializer always renders string/binary as
STRING/BYTES. The test then writes rows, including a NULL row, and reads them
back through the full Paimon write/commit/scan/read flow.

API and Format

No public API (include/) or storage format/protocol change. This only broadens
schema deserialization to accept type strings that previously errored.

Documentation

Yes. docs/source/user_guide/data_types.rst is updated to mark
CHAR/VARCHAR -> Utf8 and BINARY/VARBINARY -> Binary as supported, with
a note that the declared length is not enforced.

Generative AI tooling

Generated-by: Claude Code (Claude Opus 4.8)

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

lxy-9602

+1

…parser

Previously CHAR/VARCHAR/BINARY/VARBINARY were registered as keywords but had no handling branch in ParseTypeByKeyword, so deserializing a schema that contained them failed with "Unsupported type: VARCHAR".

Map CHAR/VARCHAR to arrow::utf8() and BINARY/VARBINARY to arrow::binary() (consistent with STRING and BYTES), parsing the optional length parameter and validating it is within [1, INT_MAX] (both inclusive), consistent with Java Paimon.

Add parser test cases covering these types with and without a length argument (including the min/max length boundaries and invalid lengths), an end-to-end write/read integration test exercising the types through the full Paimon flow, and update the data type mapping doc accordingly.

Use a custom raw-string delimiter (R"json(...)json") for the schema JSON in the integration test so that the embedded type strings such as "CHAR(10)" do not terminate the raw string early at their internal )" sequence.
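
The raw-string point can be seen in a small standalone example. The JSON fragment and the constant names below are illustrative assumptions, not copied from the PR's integration test.

```cpp
#include <string>

// With the default delimiter, R"(...)", the first )" sequence ends the
// literal, so the )" that follows 10 in "CHAR(10)" would cut the string
// short. With the custom delimiter "json", only )json" can close the literal.
const std::string kSchemaFragment =
    R"json({"name": "f0", "type": "CHAR(10)"})json";

// The same text written with ordinary escapes, for comparison.
const std::string kEscapedFragment =
    "{\"name\": \"f0\", \"type\": \"CHAR(10)\"}";
```

Both constants hold the same characters; the raw-string form simply avoids escaping every quote in the hand-written schema.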

Copilot AI review requested due to automatic review settings June 30, 2026 09:20

Copilot AI reviewed Jun 30, 2026

SteNicholas force-pushed the PAIMON-197 branch 3 times, most recently from 7f7d92a to ad1be82 Compare June 30, 2026 09:29

lxy-9602 reviewed Jun 30, 2026

Comment thread src/paimon/common/types/data_type_json_parser_test.cpp

SteNicholas force-pushed the PAIMON-197 branch 2 times, most recently from 3dbddd7 to 53f54e5 Compare June 30, 2026 10:05

SteNicholas requested a review from lxy-9602 June 30, 2026 10:07

lxy-9602 previously approved these changes Jun 30, 2026

SteNicholas dismissed lxy-9602’s stale review via fd9ae34 June 30, 2026 11:18

SteNicholas force-pushed the PAIMON-197 branch 3 times, most recently from 7e44844 to 879f85c Compare June 30, 2026 11:59

SteNicholas force-pushed the PAIMON-197 branch from 879f85c to 11196f6 Compare June 30, 2026 15:19

SteNicholas requested a review from lxy-9602 June 30, 2026 15:19

SteNicholas wants to merge 1 commit into alibaba:main from SteNicholas:PAIMON-197

3 participants
