[QNN-EP] add gather handler for transpose optimizer #28755

Open
quic-muchhsu wants to merge 3 commits into microsoft:main from CodeLinaro:dev/muchhsu/add_gather_handler_transpose_optimization_code_linaro

Conversation

@quic-muchhsu quic-muchhsu commented Jun 3, 2026

Contributor

Description

Gather is missing from the transpose optimizer's handler map. This PR adds HandleGather, which pushes a Transpose past a Gather so the transpose can cancel or fuse with neighboring transposes.

The rewrite is:

data -> Transpose(perm) -> Gather(axis=k, indices=scalar_const)

becomes

data -> Gather(axis=perm[k], indices=scalar_const) -> Transpose(SqueezePerm({perm[k]}, perm))

Scope is intentionally narrow: the handler covers only the scalar (0-D) constant-indices case, which is structurally identical to a Squeeze along the gathered axis and reuses SqueezePerm to compute the post-rewrite output perm. Non-scalar or non-constant indices are left untouched. The framework's DefaultCostCheck still gates the rewrite on profitability, so unprofitable cases are rejected before the handler runs.
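The equivalence behind this rewrite can be checked on a small tensor. The sketch below is a pure-Python model, not the onnxruntime implementation: `make_tensor`, `transpose`, `gather_scalar`, and `squeeze_perm` are hypothetical helpers written for illustration, and `squeeze_perm` reflects only an assumed reading of what SqueezePerm computes (the perm that remains after the removed axes are dropped and the surviving axes are renumbered).

```python
from itertools import product


def make_tensor(shape):
    """Model a tensor as {index_tuple: value} with a distinct value per element."""
    return {idx: n for n, idx in enumerate(product(*(range(d) for d in shape)))}


def transpose(t, perm):
    """ONNX Transpose semantics: output axis i is input axis perm[i]."""
    return {tuple(idx[p] for p in perm): v for idx, v in t.items()}


def gather_scalar(t, axis, index):
    """Gather with a 0-D index: pick one slice along `axis` and drop that axis."""
    return {
        idx[:axis] + idx[axis + 1:]: v
        for idx, v in t.items()
        if idx[axis] == index
    }


def squeeze_perm(removed_axes, perm):
    """Assumed behavior of SqueezePerm: drop removed axes (given in the
    un-transposed input's coordinates) from perm and renumber the rest."""
    removed = set(removed_axes)
    kept = [a for a in range(len(perm)) if a not in removed]
    axis_map = {old: new for new, old in enumerate(kept)}
    return [axis_map[p] for p in perm if p not in removed]


x = make_tensor((2, 3, 4, 5))
perm = [0, 2, 3, 1]  # e.g. NCHW -> NHWC
index = 1

results = {}
for k in (1, 3):
    # Original pattern: data -> Transpose(perm) -> Gather(axis=k)
    before = gather_scalar(transpose(x, perm), k, index)
    # Rewritten pattern: data -> Gather(axis=perm[k]) -> Transpose(new_perm)
    new_perm = squeeze_perm([perm[k]], perm)
    after = transpose(gather_scalar(x, perm[k], index), new_perm)
    results[k] = (new_perm, before == after)

print(results)  # {1: ([0, 2, 1], True), 3: ([0, 1, 2], True)}
```

For `k = 3` the remaining perm is the identity `[0, 1, 2]`, so in this example the trailing Transpose disappears entirely once the Gather has been moved ahead of it.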

Tests added in transpose_optimizer_test.cc cover the positive scalar-indices case, negative-axis normalization, rank-1-indices fall-through, and dynamic-indices fall-through.
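The four tested cases correspond to a small decision: normalize the axis, then rewrite only when the indices are a 0-D constant. The following is a rough sketch of that decision, not the handler's actual code; `normalize_axis` and `plan_gather_rewrite` are hypothetical names, and the indices are reduced to a rank and an is-constant flag for brevity.

```python
def normalize_axis(axis, rank):
    """Map an ONNX-style axis in [-rank, rank) to its non-negative form."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def plan_gather_rewrite(perm, axis, indices_rank, indices_is_constant):
    """Return the Gather axis to use on the un-transposed input,
    or None when the pattern should be left untouched."""
    if not indices_is_constant or indices_rank != 0:
        return None  # dynamic or non-scalar indices: fall through
    return perm[normalize_axis(axis, len(perm))]


perm = [0, 2, 3, 1]
positive_case = plan_gather_rewrite(perm, 3, 0, True)    # scalar constant indices
negative_axis = plan_gather_rewrite(perm, -1, 0, True)   # axis -1 normalizes to 3
rank1_indices = plan_gather_rewrite(perm, 3, 1, True)    # rank-1 indices: skipped
dynamic_indices = plan_gather_rewrite(perm, 3, 0, False) # non-constant: skipped

print(positive_case, negative_axis, rank1_indices, dynamic_indices)  # 1 1 None None
```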

Motivation and Context

We've observed that the Transpose → Gather → Transpose pattern produces sub-optimal inference performance on the QNN EP. With this handler in place, the surrounding transposes can fold or cancel, eliminating layout-conversion overhead around Gather.

@quic-muchhsu quic-muchhsu changed the title add gather handler for transpose optimizer [QNN-EP] add gather handler for transpose optimizer Jun 3, 2026
xadupre previously approved these changes Jun 3, 2026

Copilot AI left a comment

Contributor

Pull request overview

Adds ONNX Gather support to the transpose optimizer so a leading Transpose can be pushed past a Gather when indices is a 0-D constant scalar, enabling subsequent transpose fusion/cancellation (notably improving QNN EP graphs with Transpose→Gather→Transpose patterns).

Changes:

  • Implement HandleGather in the transpose optimizer to remap axis under the incoming perm, push the transpose through, and emit the correct reduced-rank output permutation via SqueezePerm.
  • Register the new handler in the optimizer’s ONNX handler map.
  • Add unit tests covering: positive scalar-indices rewrite, negative-axis normalization, and two explicit no-opt fall-through cases (rank-1 indices and non-constant indices).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc | Adds a narrow Gather handler for scalar constant indices and registers it in the default handler map. |
| onnxruntime/test/optimizer/transpose_optimizer_test.cc | Adds coverage for the new Gather rewrite and key non-applicable cases. |

Comment thread onnxruntime/test/optimizer/transpose_optimizer_test.cc Outdated
Comment thread onnxruntime/test/optimizer/transpose_optimizer_test.cc Outdated
@quic-muchhsu quic-muchhsu requested a review from edgchen1 June 24, 2026 20:54
@edgchen1 edgchen1 enabled auto-merge (squash) June 25, 2026 18:10
@quic-muchhsu quic-muchhsu requested a review from edgchen1 June 25, 2026 21:25
@edgchen1

Contributor

hm, the windows_x64_asan build keeps failing consistently:

https://github.com/microsoft/onnxruntime/actions/runs/28128962920/job/83556848419?pr=28755#step:5:20897

1: AddressSanitizer: Out of memory. The process has exhausted 8192MB for size class 8192.
1: =================================================================
1: ==11460==ERROR: AddressSanitizer: allocator is out of memory trying to allocate 0x2000 bytes
1: AddressSanitizer: nested bug in the same thread, aborting.

I'm not sure why. maybe try merging from main first? if it still is an issue, there might be something to investigate further.

Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
auto-merge was automatically disabled June 29, 2026 22:46

Head branch was pushed to by a user without write access

@quic-muchhsu quic-muchhsu force-pushed the dev/muchhsu/add_gather_handler_transpose_optimization_code_linaro branch from 15a970b to 7e66d7c Compare June 29, 2026 22:46