[QNN-EP] add gather handler for transpose optimizer #28755
Open
quic-muchhsu wants to merge 3 commits into
xadupre
previously approved these changes
Jun 3, 2026
Contributor
Pull request overview
Adds ONNX Gather support to the transpose optimizer so a leading Transpose can be pushed past a Gather when indices is a 0-D constant scalar, enabling subsequent transpose fusion/cancellation (notably improving QNN EP graphs with Transpose→Gather→Transpose patterns).
Changes:
- Implement HandleGather in the transpose optimizer to remap axis under the incoming perm, push the transpose through, and emit the correct reduced-rank output permutation via SqueezePerm.
- Register the new handler in the optimizer’s ONNX handler map.
- Add unit tests covering: positive scalar-indices rewrite, negative-axis normalization, and two explicit no-opt fall-through cases (rank-1 indices and non-constant indices).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc | Adds a narrow Gather handler for scalar constant indices and registers it in the default handler map. |
| onnxruntime/test/optimizer/transpose_optimizer_test.cc | Adds coverage for the new Gather rewrite and key non-applicable cases. |
edgchen1
reviewed
Jun 24, 2026
edgchen1
reviewed
Jun 24, 2026
edgchen1
approved these changes
Jun 24, 2026
Contributor
Hm, the windows_x64_asan build keeps failing consistently, and I'm not sure why. Maybe try merging from main first? If it is still an issue, there might be something to investigate further.
Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
auto-merge was automatically disabled
June 29, 2026 22:46
Head branch was pushed to by a user without write access
15a970b to 7e66d7c
Description
Gather is missing from the transpose optimizer's handler map. This PR adds HandleGather, which pushes a Transpose past a Gather so the transpose can cancel or fuse with neighboring transposes.
The rewrite is:

    data -> Transpose(perm) -> Gather(axis=k, indices=scalar_const)

becomes

    data -> Gather(axis=perm[k], indices=scalar_const) -> Transpose(SqueezePerm({perm[k]}, perm))

Scope is intentionally narrow: only the scalar (0-D) constant-indices case is handled, which is structurally identical to a Squeeze along the gathered axis and reuses SqueezePerm for the post-rewrite output perm. Non-scalar or non-constant indices are left untouched. The framework's DefaultCostCheck still gates the rewrite on profitability, so unprofitable cases are rejected before the handler runs.
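For reviewers who want to sanity-check the rewrite outside the optimizer, here is a minimal stdlib-only Python sketch. It is not the PR's C++ code: `squeeze_perm`, `transpose`, `gather_scalar`, and `rewrite_matches` are hypothetical helpers written for illustration, and `squeeze_perm` only assumes the behaviour described above (drop the gathered axis, then renumber the remaining axes). Tensors are modelled as dicts from index tuples to values.

```python
from itertools import permutations, product


def squeeze_perm(axes, perm):
    """Perm that remains once `axes` (numbered in the untransposed layout) are dropped."""
    removed = set(axes)
    axis_map = {}
    for a in range(len(perm)):
        if a not in removed:
            axis_map[a] = len(axis_map)  # renumber surviving axes 0..n-1
    return [axis_map[p] for p in perm if p not in removed]


def transpose(t, shape, perm):
    """ONNX-style transpose: output axis d is input axis perm[d]."""
    out_shape = [shape[p] for p in perm]
    out = {}
    for idx in product(*(range(n) for n in out_shape)):
        src = [0] * len(perm)
        for d, p in enumerate(perm):
            src[p] = idx[d]
        out[idx] = t[tuple(src)]
    return out, out_shape


def gather_scalar(t, shape, axis, index):
    """Gather with a 0-D index: pick one slice along `axis` and drop that axis."""
    out_shape = shape[:axis] + shape[axis + 1:]
    out = {}
    for idx in product(*(range(n) for n in out_shape)):
        out[idx] = t[idx[:axis] + (index,) + idx[axis:]]
    return out, out_shape


def rewrite_matches(shape, perm, k, index):
    """Compare Transpose -> Gather against Gather -> Transpose on a dense test tensor."""
    data = {idx: n for n, idx in enumerate(product(*(range(s) for s in shape)))}
    # Original pattern: Transpose(perm) -> Gather(axis=k)
    t, t_shape = transpose(data, shape, perm)
    before = gather_scalar(t, t_shape, k, index)
    # Rewritten pattern: Gather(axis=perm[k]) -> Transpose(squeeze_perm([perm[k]], perm))
    g, g_shape = gather_scalar(data, shape, perm[k], index)
    after = transpose(g, g_shape, squeeze_perm([perm[k]], perm))
    return before == after


# Exhaustively check every rank-3 perm and every gather axis on a small shape.
shape = [2, 3, 4]
results = [rewrite_matches(shape, list(perm), k, 0)
           for perm in permutations(range(3)) for k in range(3)]
print(all(results))
```

The check covers values and shapes together, since each helper returns both. It says nothing about profitability, which remains the cost check's job.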
Tests added in transpose_optimizer_test.cc cover the positive scalar-indices case, negative-axis normalization, rank-1-indices fall-through, and dynamic-indices fall-through.
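The four test scenarios map onto a simple applicability decision, sketched below in Python under stated assumptions. `normalize_axis` and `plan_gather_rewrite` are hypothetical names, not functions from the optimizer, and the sketch only models the conditions described in this PR (constant 0-D indices, negative axis normalized against the data rank).

```python
def normalize_axis(axis, rank):
    """Map a possibly negative ONNX axis into the range [0, rank)."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def plan_gather_rewrite(perm, axis, indices_shape, indices_is_constant):
    """Return (new_axis, output_perm) if the rewrite applies, else None (fall through)."""
    # Only constant 0-D indices are handled; everything else is left untouched.
    if not indices_is_constant or len(indices_shape) != 0:
        return None
    k = normalize_axis(axis, len(perm))
    new_axis = perm[k]  # axis of the untransposed data that Gather now reads
    kept = sorted(p for p in perm if p != new_axis)
    # Renumber the surviving axes after the gathered axis is dropped.
    output_perm = [kept.index(p) for p in perm if p != new_axis]
    return new_axis, output_perm


print(plan_gather_rewrite([0, 2, 3, 1], -1, (), True))     # scalar constant, negative axis
print(plan_gather_rewrite([0, 2, 3, 1], 3, (4,), True))    # rank-1 indices: no rewrite
print(plan_gather_rewrite([0, 2, 3, 1], 3, (), False))     # dynamic indices: no rewrite
```

In the first call the output perm comes out as the identity, which is the kind of case where the trailing Transpose disappears entirely.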
Motivation and Context
We've observed that the Transpose → Gather → Transpose pattern produces sub-optimal inference performance on the QNN EP. With this handler in place, the surrounding transposes can fold or cancel, eliminating layout-conversion overhead around Gather.