[OVEP] OpenVINO Development Updates #28954
* Update operator support status for OpenVINO 2025.2
* Disable unsupported tests
---------
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Sync with Microsoft ONNX Runtime - 28/08/2025
* [CPU] Optimize GQA attention bias application for FP16 (microsoft#25871)

### Description
When the GQA op receives an attention bias input in FP16 on a platform that does not natively support FP16 math, the values must be cast to FP32, so a temporary buffer is needed to hold the FP32 values. This temporary buffer was being allocated and deallocated inside a loop for every token processed. The implementation was refactored so that the allocation takes place only once. Phi model throughput increased by 15%.

* Fixes for DynamicQuantizeMatMul and Attention3D tests (microsoft#25814)

### Description
This change fixes correctness issues in two areas that were causing failures in onnxruntime_test_all:
- DynamicQuantizeMatMul.WithConstantBInputs
- AttentionTest.Attention3DDefault
- AttentionTest.Attention3DWithPastAndPresentQkMatmul

What was wrong and how it's fixed

1) DynamicQuantizeMatMul.WithConstantBInputs
- Root cause: The Kleidi dynamic quantization GEMM path could be selected even when the B scales contained invalid values (zero, negative, or non-finite). That violates kernel assumptions and can lead to incorrect results.
- Fix: In `onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc`, we now explicitly validate that all B scales are finite and strictly positive before enabling the Kleidi/MLAS dynamic path. If any scale is invalid, we disable that path.

2) Attention tests (Attention3DDefault, Attention3DWithPastAndPresentQkMatmul)
- Root causes in `onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp`:
  - Incorrect handling of GEMM corner cases for alpha/beta and K==0 (e.g., not respecting C = beta*C when alpha==0 or K==0).
  - Unnecessary or premature fallbacks for small shapes.
- Fixes:
  - Add early-outs for degenerate sizes: if M==0 or N==0, return handled.
- Correctly implement alpha/beta semantics: --------- Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com> * Fix MoE CPP tests (microsoft#25877) This change adds skip test for QMoE CPU tests when running on TensorRT or CUDA EP. In the QMoE kernel there was a memory overwrite bug in the accumulate part, updated that and this fixed the python tests back * [c++] Eliminate dynamic initialization of static Ort::Global<void>::api_ (microsoft#25741) ### Description Delay the call to `OrtGetApiBase()` until the first call to `Ort::GetApi()` so that `OrtGetApiBase()` is typically called after dynamic library loading. ### Motivation and Context When ORT_API_MANUAL_INIT is not defined (which is the default), the static `Ort::Global<void>::api_` has a dynamic initializer that calls `OrtGetApiBase()->GetApi(ORT_API_VERSION)` This dynamic initialization can cause problems when it interacts with other global/static initialization. On Windows in particular, it can also cause deadlocks when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to load any other libraries. * Replace the templated `Global<void>::api_` with an inline static initialized to nullptr. * `Ort::GetApi()` now calls `detail::Global::GetApi()` which calls `detail::Global::DefaultInit()` if initialization is needed. * When `ORT_API_MANUAL_INIT` is defined, `DefaultInit()` returns nullptr, which will eventually cause the program to crash. The callers have violated the initialization contract by not calling one of the `Ort::InitApi` overloads. * When `ORT_API_MANUAL_INIT` is not defined, `DefaultInit()` uses a function-level static to compute the result of `OrtGetApiBase()->GetApi(ORT_API_VERSION)` once and return it. * `Ort::Global<void>` has been replaced with a non-templated type and moved inside a `detail` namespace. Since the `Global<void>` object was documented as being used internally, it is believed that these changes here are non-breaking, as they do not impact a public API. 
The public APIs, `Ort::InitApi()` and `Ort::InitApi(const OrtApi*)` remain unchanged. * Add `#pragma detect_mismatch` to surface issues with compilation units that disagree on how ORT_API_MANUAL_INIT is defined. (MSVC only.) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * python GPU IO Bindings for NVIDIA (microsoft#25776) ### Description <!-- Describe your changes. --> 1. A Small change to use the shared allocator in Python binding. 2. Remove the FP64 support from the EP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> The Python GPU IO binding is necessary for performance. The change will enable the shared allocator for GPU allocation. The FP64 was using the FP32 inference—aligned WRT TRT RTX support. --------- Co-authored-by: Gaurav Garg <gaugarg@nvidia.com> * [CANN] Add a `enable_cann_subgraph` feature parameter (microsoft#25867) ### Description Add a `enable_cann_subgraph` feature parameter. this parameter controls whether graph splitting is performed and can help quickly identify issues in certain scenarios. * [EP ABI] Add OpAttr_GetTensorAttributeAsOrtValue and replace the existing Node_GetTensorAttributeAsOrtValue (microsoft#25886) ### Description Replace `Node_GetTensorAttributeAsOrtValue` with `OpAttr_GetTensorAttributeAsOrtValue`. Change the API signature to make it one of the `OpAttr` interfaces instead of the `OrtNode` interface. The original API was added [here](microsoft#25566). * Language bindings for model compatibility API (microsoft#25878) ### Description This change builds on top of microsoft#25841 , and adds the scaffolding necessary to call into this API from C++ / C# / Python. ### Motivation and Context microsoft#25454 talks more about the broader notion of precompiled model compatibility. This change is directed at app developers whose apps may want to determine if a particular precompiled model (e.g. 
on a server somewhere) is compatible with the device where the application is running. There is functionality in `OrtEpFactory` for making this determination, which was exposed as a C API in microsoft#25841, and this change makes the API more broadly available in other languages. ### Testing and Validation Introduced new unit test cases across each language, and verified that the API was being called and returned the correct result for the default CPU EP. --------- Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com> * [QNN-EP] Introduce Level1 Transformer into qnn.preprocess (microsoft#25883) ### Description - Introduce Level1 Transformer into qnn.preprocess to support various optimizations. ### Motivation and Context - This change brings in several useful optimizations such as `ConvBnFusion` and `ConstantFolding`, which are part of `TransformerLevel::Level1` and can benefit QNNEP. - The goal is to optimize the ONNX model before quantization by integrating these passes into the Python tooling workflow. * [QNN EP] Minor fix weight name missing when not valid QDQ node group (microsoft#25887) ### Description Minor fix weight name missing when not valid QDQ node group ### Motivation and Context Some quantized model failed QDQ node group validation, the weights then won't be folded as initializer. QNN EP failed to handle the dynamic weights here due to the transpose op input name look up. This change make sure we process the weights tensor before adding transposes. * Add custom ops library_path to EP metadata (microsoft#25830) ## Summary Adds EP metadata library path support to enable custom ops DLL registration with proper path resolution. 
## Changes - Added `library_path` metadata key to EP metadata infrastructure - Pass resolved library path directly to `EpLibraryProviderBridge` constructor - Simplified implementation per reviewer feedback (removed virtual method complexity) - Added `#include <utility>` for std::move compliance ## Purpose Enables downstream applications (like onnxruntime-genai) to resolve relative custom ops library paths using EP metadata, improving DLL registration reliability. ## Files Modified - `plugin_ep/ep_factory_provider_bridge.h` - `plugin_ep/ep_library.h` - `plugin_ep/ep_library_plugin.h` - `plugin_ep/ep_library_provider_bridge.cc` - `plugin_ep/ep_library_provider_bridge.h` - `utils.cc` * [OVEP] OpenVINO EP Features and bug-fixes for ORT-1.23 (microsoft#25884) ### Description This update introduces multiple improvements, fixes, and feature enhancements to the OpenVINO Execution Provider (OVEP) and related components in ONNX Runtime: #### Configuration & Properties - Updated load_config mapping to act as a passthrough to OpenVINO properties. - Added support for providing layout information to inputs/outputs in OpenVINO. #### Inference & Tensor Handling - Improved OVInferRequest::SetTensor to correctly handle cached binding shape mismatches. - Added support for self-detecting on-the-fly bfloat16 → float16 conversion. - Fixed issues with input ONNX models when used with shared execution contexts. #### Model Handling & Operator Support - Fixed model copying behavior for QDQ stripping. - Updated operator support status for OpenVINO 2025.2. #### Platform & Integration Fixes - Applied multiple PSU Lora fixes and related updates. - Resolved filename confusion issues with wrapped OVIRs in EPCtx. - Enabled memory-mapped native binaries for OpenVINO 2025.3. #### Quality & Maintenance - Addressed linting issues. - Fixed coverage gaps in OVEP. - Added a new test script for OpenVINO with ORT ABI integration. 
--------- Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com> Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com> Co-authored-by: Klimenko, Mikhail <mikhail.klimenko@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: Garth Long <garth.long@intel.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com> Co-authored-by: Eric Crawford <eric.r.crawford@intel.com> Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com> Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com> Co-authored-by: Javier Martinez <javier.e.martinez@intel.com> * [java] Auto EP and compile model support (microsoft#25131) ### Description Java API for compile model and EP discovery APIs. Roughly equivalent to the C# version in microsoft#24604. cc: @skottmckay. I haven't quite got the CMake configured so the Java tests for the ep registration only run when the ONNX Runtime shared provider support is built, but everything else works. I expect that to be a quick fix, but I'm not sure in what conditions it should be built and how we should handle it so I don't know where/when to plumb it through. ### Motivation and Context API parity for Java. * Add error handling to extract_nuget_files.ps1 (microsoft#25866) ### Description 1. Check process exit code when running 7z.exe . Currently the errors were silently ignored. 2. Add snld20 flag to the 7z.exe commands, which is needed to be compatible with the latest 7z release. * [Fix] illegal memory access in GetInputIndices with optional inputs (microsoft#25881) ### Description Fix illegal memory access in GetInputIndices with optional inputs ### Motivation and Context When an input is optional, its ValueInfo may be nullptr. The current implementation directly calls InputValueInfo->GetName(), leading to illegal memory access. 
Update logic to skip optional inputs when valueInfo is nullptr . * Re-enable cpuinfo for ARM64EC (microsoft#25863) ### Description <!-- Describe your changes. --> Re-enable cpuinfo for ARM64EC build and fix `CPUIDINFO_ARCH_ARM` so it is actually used. Patch cpuinfo to support vcpkg ARM64EC build. See pytorch/cpuinfo#324. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix for workaround in microsoft#25831. --------- Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com> Co-authored-by: derdeljan-msft <derdeljan@microsoft.com> Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com> Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com> Co-authored-by: Christopher Warrington <chwarr@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Ishwar Raut <iraut@nvidia.com> Co-authored-by: Gaurav Garg <gaugarg@nvidia.com> Co-authored-by: Xinpeng Dou <15529241576@163.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: adrastogi <aditya.rastogi@microsoft.com> Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com> Co-authored-by: qti-hungjuiw <hungjuiw@qti.qualcomm.com> Co-authored-by: qti-yuduo <yuduow@qti.qualcomm.com> Co-authored-by: Pradeep Sakhamoori <psakhamoori@microsoft.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com> Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com> Co-authored-by: Klimenko, Mikhail <mikhail.klimenko@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: Garth Long <garth.long@intel.com> Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com> Co-authored-by: Eric Crawford <eric.r.crawford@intel.com> Co-authored-by: jatinwadhwa921 
<110383850+jatinwadhwa921@users.noreply.github.com> Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com> Co-authored-by: Javier Martinez <javier.e.martinez@intel.com> Co-authored-by: Adam Pocock <adam.pocock@oracle.com> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: mingyue <131847423+mingyueliuh@users.noreply.github.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
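The DynamicQuantizeMatMul fix in the sync commit above says the B scales are validated as finite and strictly positive before the Kleidi/MLAS dynamic path is enabled. A minimal sketch of such a check follows. The helper name `AllScalesValid` and its signature are assumptions made for illustration; this is not the code from `dynamic_quantize_matmul.cc`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper (name and signature are illustrative, not ONNX Runtime's).
// Returns true only if every scale is finite and strictly positive.
// An empty vector returns true (nothing to reject); a real kernel gate
// would likely also check that the scale tensor is present.
inline bool AllScalesValid(const std::vector<float>& scales) {
  for (const float s : scales) {
    // std::isfinite rejects NaN and +/-infinity; s <= 0 rejects zero and negatives.
    if (!std::isfinite(s) || s <= 0.0f) {
      return false;
    }
  }
  return true;
}
```

A caller would fall back to the generic path whenever the check fails, for example `const bool use_dynamic_path = AllScalesValid(b_scales);`.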
This reverts commit 2f1ad9d.
Revert "Sync with Microsoft ONNX Runtime - 01/09/2025"
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Handled tensors with bf16 data type and external in-memory data
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Backmerging with Msft commits
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Sync with Microsoft ONNX Runtime - 15/09/2025
Sync with Microsoft ONNX Runtime - 17/09/2025
Fix an issue where a backend was created on every inference iteration. The issue was caused by using `std::map::operator[]`, which inserted a pair with the key and an empty value. On Ubuntu, the `std::map::insert` method does not override the value in `backend_map` if the key already exists in the map (created by `operator[]`).
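The interaction described in this commit message can be reproduced in isolation. The sketch below is not the OVEP code: `Backend`, `BackendMap`, `GetOrCreateBuggy`, and `GetOrCreateFixed` are hypothetical names chosen for illustration. It relies on two standard-library guarantees: `std::map::operator[]` value-initializes a missing entry, and `std::map::insert` leaves an existing entry untouched.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Backend { int id; };  // hypothetical stand-in for a real backend object
using BackendMap = std::map<std::string, std::shared_ptr<Backend>>;

// Buggy pattern: operator[] inserts {key, nullptr} when the key is missing,
// so the later insert() is a no-op and the map keeps the null pointer.
inline std::shared_ptr<Backend> GetOrCreateBuggy(BackendMap& m, const std::string& key,
                                                 int& created) {
  std::shared_ptr<Backend> backend = m[key];
  if (!backend) {
    backend = std::make_shared<Backend>(Backend{++created});
    m.insert(std::make_pair(key, backend));  // does not overwrite the null entry
  }
  return backend;
}

// Fixed pattern: look up without inserting, then store with insert_or_assign (C++17).
inline std::shared_ptr<Backend> GetOrCreateFixed(BackendMap& m, const std::string& key,
                                                 int& created) {
  auto it = m.find(key);
  if (it != m.end() && it->second) {
    return it->second;
  }
  auto backend = std::make_shared<Backend>(Backend{++created});
  m.insert_or_assign(key, backend);
  return backend;
}
```

With the buggy version, every call for the same key constructs a new `Backend`, because the stored value stays null. With the fixed version, only the first call constructs one.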
Sync with Microsoft ONNX Runtime - 26/09/2025
Sync with Microsoft ONNX Runtime - 01/10/2025
…817)
* early version, it doesn't embed initializers into the proto, but then restores the metadata so OV can read them back
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* improve code, refactor into smaller functions, run the logic when there are external initializers in memory (more than one)
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* revert the wrongly merged code
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* Updated the condition for the new logic based on the total size of ext initializers, comments, refactoring
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* Update onnxruntime/core/providers/openvino/backend_manager.cc
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* make the condition less strict - 32MB threshold, move debug dump after the logic is executed, check for OV version
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* unit test that uses ext initializers, early version
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* used kOrtSessionOptionsDisableCPUEPFallback, cleanups, model is now over 2GB to show the proto limit (when the new logic for ext initializers is enabled, then the test passes)
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* address code review comments
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* Update onnxruntime/test/providers/openvino/openvino_ep_ext_init.cc
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix the Linux CI build, use PathString rather than wstring
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* As agreed, disable the test as it requires OV 2025.4, while the current CI version is only 2025.2
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* add missing comment
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
---------
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…zers (#825)
* check for the size of RAW data, skip if it's zero
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
* Update onnxruntime/core/providers/openvino/backend_manager.cc
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sync with Microsoft ONNX Runtime - 10/10/2025
… from initializers might become invalid (#828) Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
Co-author: Beheshti, Nazanin
Backmerge pr Msft commits
Sync with Microsoft ONNX Runtime - 08062026
Update OV version to 2026.2.0
Sync with Microsoft ONNX Runtime - 09062026
@yuslepukhin & @adrianlizarraga Please review & merge this PR update.
Pull request overview
This PR updates the OpenVINO Execution Provider (OVEP) development stack by bumping the OpenVINO toolkit version to 2026.2, adjusting OVEP’s version gating/mapping to recognize 2026.2, fixing EPContext path validation behavior for OVIR-exported artifacts, and adding a new workload-type dynamic-options regression test.
Changes:
- Update OpenVINO toolkit references (Linux docker image + Windows CI workflow) to 2026.2.0.
- Extend OVEP version plumbing to include 2026.2 (new enum + capability mapping + unsupported-mode lists).
- Fix EPContext external path validation base-path selection for `.xml` cache contexts; add workload-type switching tests.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/ci_build/github/linux/docker/inference/x86_64/python/openvino/Dockerfile | Bumps Linux OpenVINO toolkit package + install path to 2026.2.0. |
| onnxruntime/test/providers/openvino/openvino_ep_workload_type_test.cc | Adds new OVEP workload-type dynamic switching and repeat-run consistency tests for NPU. |
| onnxruntime/core/providers/openvino/ov_versions/data_ops.h | Adds OpenVINO version enum value V_2026_2. |
| onnxruntime/core/providers/openvino/ov_versions/data_ops.cc | Extends unsupported-mode version lists to include V_2026_2. |
| onnxruntime/core/providers/openvino/ov_versions/capability.cc | Maps OpenVINO 2026.2 builds to V_2026_2 behavior. |
| onnxruntime/core/providers/openvino/onnx_ctx_model_helper.cc | Adjusts EPContext external path validation base for .xml cache contexts. |
| cmake/onnxruntime_providers_openvino.cmake | Raises minimum supported OpenVINO version to 2026.0+. |
| .github/workflows/windows_openvino.yml | Updates Windows CI OpenVINO download/extract to 2026.2.0. |
Comments suppressed due to low confidence (1)
cmake/onnxruntime_providers_openvino.cmake:22
- After raising the minimum supported OpenVINO version to 2026.0, the `if(OpenVINO_VERSION VERSION_GREATER_EQUAL 2024.4)` condition is always true and can be removed to avoid dead conditional logic.
if(OpenVINO_VERSION VERSION_LESS 2026.0)
  message(FATAL_ERROR "OpenVINO 2026.0 and newer are supported. Please, use latest OpenVINO release")
endif()
if(OpenVINO_VERSION VERSION_GREATER_EQUAL 2024.4)
  add_definitions(-DUSE_OVEP_NPU_MEMORY=1)
endif()
// Allow NPU resources to be fully released between tests.
// Without this delay the NPU driver may fail to re-initialise
void TearDown() override {
  std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
static Ort::Session CreateSqueezeNetSession(
    Ort::SessionOptions& session_options,
    std::unordered_map<std::string, std::string>& ov_options) {
  session_options.SetIntraOpNumThreads(1);
- name: Set OpenVINORootDir
  shell: pwsh
  # Use $GITHUB_ENV to set the variable for subsequent steps
  run: |
    $openVinoRootDir = Join-Path $env:RUNNER_TEMP "openvino-v2026.1.0"
    $openVinoRootDir = Join-Path $env:RUNNER_TEMP "openvino-v2026.2.0"
    echo "OpenVINORootDir=$openVinoRootDir" >> $env:GITHUB_ENV
Issues Found
@yuslepukhin This is a very targeted code path used only by Foundry customers and for models generated using Olive. This fix ensures that the existing customer base is not impacted when using the legacy EP. If you still think we should add one, please let us know and we can scope it.
A regression test is a must for all PRs.
* Unit test for EPCTX OVIR model
* [OVEP] Case-insensitive .xml check and reuse cache_context_path in EPCtxHandler
* Address review comments
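A case-insensitive `.xml` extension check, as named in the commit above, could look like the sketch below. It assumes C++17 `std::filesystem`; the helper name `HasXmlExtension` is hypothetical and the code is not taken from `EPCtxHandler`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <filesystem>
#include <string>

// Hypothetical helper: true when the path's extension is ".xml" in any letter case.
inline bool HasXmlExtension(const std::filesystem::path& p) {
  std::string ext = p.extension().string();
  // Lower-case through unsigned char to avoid undefined behavior in std::tolower
  // for negative char values.
  std::transform(ext.begin(), ext.end(), ext.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return ext == ".xml";
}
```

Comparing the parsed extension, rather than searching the whole string for ".xml", avoids false positives on names such as `model.xml.bak`.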
@yuslepukhin Review comments have been addressed, please review & merge.
New bug introduced in the latest update: the download step uses `OpenVINOVersion: 2026.2.1`, so it extracts to `openvino-v2026.2.1`.
0b58ffd to 5c028f2
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
@yuslepukhin Please review.
Description
This PR introduces minor development patches as listed below.