enable validation mode for eagle3 #2778
base: master
Conversation
Signed-off-by: fishbell <bell.song@intel.com>
Pull Request Overview
This PR enables validation mode for Eagle 3 speculative decoding, specifically adding support for multinomial sampling. The change integrates EAGLE 3 model capabilities into the continuous batching pipeline with specialized handling for hidden state management and token validation.
- Adds an `eagle3_mode` flag to llm_bench for enabling EAGLE 3 speculative decoding
- Implements comprehensive EAGLE 3 support with model transformations, hidden state extraction, and specialized sampling (a rough sketch of the hidden-state transform follows this list)
- Extends tokenization to support EAGLE 3 mode with conditional special token handling
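To make the model-transformation and hidden-state extraction part concrete, here is a rough, self-contained sketch using the stock OpenVINO C++ API: an intermediate tensor is exposed as an additional model output so a draft (EAGLE) model could consume it. The toy graph, tensor name, and structure below are illustrative assumptions, not the PR's actual EagleModelTransform implementation.

```cpp
#include <iostream>
#include <memory>
#include <openvino/core/model.hpp>
#include <openvino/op/parameter.hpp>
#include <openvino/op/relu.hpp>
#include <openvino/op/result.hpp>

int main() {
    // Toy stand-in for a transformer block: input -> hidden -> head -> logits.
    auto input = std::make_shared<ov::op::v0::Parameter>(ov::element::f32, ov::Shape{1, 4});
    auto hidden = std::make_shared<ov::op::v0::Relu>(input);
    hidden->set_friendly_name("last_hidden_state");
    auto head = std::make_shared<ov::op::v0::Relu>(hidden);
    auto logits = std::make_shared<ov::op::v0::Result>(head);
    auto model = std::make_shared<ov::Model>(ov::ResultVector{logits}, ov::ParameterVector{input});

    // The transform step: expose the intermediate hidden state as an extra
    // model output, analogous to the model->add_results(...) call reviewed
    // further down in this conversation.
    model->add_results({std::make_shared<ov::op::v0::Result>(hidden->output(0))});

    std::cout << "outputs after transform: " << model->get_results().size() << std::endl;  // prints 2
    return 0;
}
```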
Reviewed Changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tools/llm_bench/benchmark.py | Adds command-line flag for EAGLE 3 mode |
| tools/llm_bench/task/text_generation.py | Implements conditional tokenization for EAGLE 3 mode |
| tools/continuous_batching/accuracy/continuous_batching_eagle_decoding.cpp | Adds new EAGLE decoding accuracy testing tool |
| tools/continuous_batching/accuracy/CMakeLists.txt | Builds the new EAGLE decoding test executable |
| src/cpp/src/speculative_decoding/ | Core EAGLE 3 implementation with model transforms and pipeline logic |
| src/cpp/src/continuous_batching/ | Pipeline integration and model runner enhancements for EAGLE mode |
| src/cpp/src/safe_tensor_wrapper.* | Utility for parsing SafeTensor format files |
| src/cpp/include/openvino/genai/ | Header updates for EAGLE mode support |
Comments suppressed due to low confidence (1)
src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp:1
- The magic number 0.9f should be defined as a named constant to improve code readability and maintainability.
// Copyright (C) 2023-2025 Intel Corporation
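A minimal sketch of the suggested fix, assuming the 0.9f literal is a confidence or acceptance threshold used when validating draft tokens (the constant name and the usage below are hypothetical, not the PR's actual code):

```cpp
#include <iostream>

namespace {
// Hypothetical name: the review only flags the bare 0.9f literal, so the exact
// semantics of the threshold are assumed here.
constexpr float kDraftAcceptanceThreshold = 0.9f;
}  // namespace

// Compares against the named constant instead of a magic number.
bool accept_draft_token(float probability_ratio) {
    return probability_ratio > kDraftAcceptanceThreshold;
}

int main() {
    std::cout << std::boolalpha << accept_draft_token(0.95f) << std::endl;  // prints "true"
    return 0;
}
```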
    tokenization_start = time.perf_counter()
    input_data = tokenizer.encode(input_text_list)
    if args.get("eagle3_mode"):
        # eagle3 needs to disable special tokens to ensure compress rate
Copilot AI (Sep 30, 2025):
The comment mentions 'compress rate' but doesn't explain what compression is being referred to or why disabling special tokens improves it. Consider adding a more detailed comment explaining the relationship between EAGLE 3 mode and special token handling.
Suggested change:

    - # eagle3 needs to disable special tokens to ensure compress rate
    + # In EAGLE 3 mode, the model uses a custom tokenization and compression scheme that achieves a higher compression rate
    + # (i.e., fewer tokens per input) when special tokens (such as BOS/EOS) are omitted. Disabling special tokens here ensures
    + # that the input is tokenized in a way that is compatible with EAGLE 3's expectations and maximizes compression efficiency.
    std::vector<ov::genai::GenerationConfig> get_spec_decoding_generation_config_examples() {
        // sampling param for speulative decoding
Copilot AI (Sep 30, 2025):
Typo in comment: 'speulative' should be 'speculative'.
Suggested change:

    - // sampling param for speulative decoding
    + // sampling param for speculative decoding
        print_cb_generation_result(generation_result);
        break;
    case ov::genai::GenerationStatus::IGNORED:
        std::cout << "Request was ignored due to lack of memory." <<std::endl;
Copilot AI (Sep 30, 2025):
Missing space before '<<std::endl;'. Should be consistent with spacing pattern used elsewhere in the file.
std::cout << "Request was ignored due to lack of memory." <<std::endl; | |
std::cout << "Request was ignored due to lack of memory." << std::endl; |
        break;
    case ov::genai::GenerationStatus::STOP:
    case ov::genai::GenerationStatus::CANCEL:
        std::cout << "Request was aborted." <<std::endl;
Copilot AI (Sep 30, 2025):
Missing space before '<<std::endl;'. Should be consistent with spacing pattern used elsewhere in the file.
std::cout << "Request was aborted." <<std::endl; | |
std::cout << "Request was aborted." << std::endl; |
    if (!m_new_results.empty()) {
        model->add_results(m_new_results);
        std::cout << "EagleModelTransform - Added last hidden output " << std::endl;
Copilot AI (Sep 30, 2025):
Using std::cout for logging in production code is not recommended. Consider using a proper logging framework or at least make these debug-only outputs that can be conditionally compiled out.
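One common way to address this, assuming the project has no shared logging utility to hook into (the macro name and the build flag below are hypothetical):

```cpp
#include <iostream>

// Hypothetical debug-only logging macro: messages are compiled out entirely
// unless GENAI_EAGLE_DEBUG is defined at build time.
#ifdef GENAI_EAGLE_DEBUG
#define EAGLE_DEBUG_LOG(msg) (std::cerr << "[EagleModelTransform] " << (msg) << std::endl)
#else
#define EAGLE_DEBUG_LOG(msg) ((void)0)
#endif

int main() {
    // Replaces the unconditional std::cout call flagged above.
    EAGLE_DEBUG_LOG("Added last hidden output");
    return 0;
}
```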
    // m_initial_hidden_states.clear();
    //auto key = std::make_pair(request_id, seq_grouped_id);
Copilot AI (Sep 30, 2025):
Commented-out code should be removed rather than left in the codebase. If this is for future use, consider adding a TODO comment explaining the purpose.
Suggested change (remove the commented-out lines):

    - // m_initial_hidden_states.clear();
    - //auto key = std::make_pair(request_id, seq_grouped_id);
Signed-off-by: fishbell <bell.song@intel.com>
eagle 3 validation mode for multinomial sampling
to be merged after #2740