[DO NOT MERGE] Upstream codebase diff #470
base: main
Conversation
@@ -0,0 +1,35 @@
name: cpu-test
Check failure
Code scanning / Scorecard
Token-Permissions High
Remediation tip: Visit https://app.stepsecurity.io/secureworkflow.
Tick 'Restrict permissions for GITHUB_TOKEN' and untick the other options.
NOTE: If you want to resolve multiple issues at once, you can visit https://app.stepsecurity.io/securerepo instead.
Click the Remediation section below for further remediation help.
@@ -0,0 +1,45 @@
name: codespell
Check failure
Code scanning / Scorecard
Token-Permissions High
Remediation tip: Visit https://app.stepsecurity.io/secureworkflow.
Tick 'Restrict permissions for GITHUB_TOKEN' and untick the other options.
NOTE: If you want to resolve multiple issues at once, you can visit https://app.stepsecurity.io/securerepo instead.
Click the Remediation section below for further remediation help.
def test_stateless_process_group(worker):
    port1 = get_open_port()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", port1))
Check warning
Code scanning / CodeQL
Binding a socket to all network interfaces Medium test
Copilot Autofix AI about 2 months ago
To fix the problem, we need to bind the socket to a specific interface instead of all interfaces. In this case, we can bind it to the loopback interface 127.0.0.1, which is commonly used for local testing and development. This change limits the socket to accepting connections only from the local machine, reducing the security risk.
@@ -124,3 +124,3 @@
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", port1))
+        s.bind(("127.0.0.1", port1))
     port2 = get_open_port()
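For reference, a self-contained sketch of the patched test setup. It assumes only that get_open_port asks the OS for a free ephemeral port; the helper body below is illustrative, not the project's actual implementation:

```python
import socket

def get_open_port() -> int:
    # Illustrative stand-in for the project's helper: ask the OS for a free
    # ephemeral port on the loopback interface and return its number.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port1 = get_open_port()
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # Binding to 127.0.0.1 instead of "" keeps the test socket reachable
    # only from the local machine, which is what the autofix suggests.
    s.bind(("127.0.0.1", port1))
```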
sock = socket.socket(family=family, type=socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(addr)
Check warning
Code scanning / CodeQL
Binding a socket to all network interfaces Medium
Copilot Autofix AI 12 days ago
To fix the problem, we need to ensure that the socket is not bound to all network interfaces. Instead, we should bind it to a specific interface. This can be achieved by modifying the create_server_socket function to check whether the provided address is empty or 0.0.0.0 and replace it with a specific interface address.
- Modify the create_server_socket function to check if the address is empty or 0.0.0.0.
- If the address is empty or 0.0.0.0, replace it with a specific interface address (e.g., 127.0.0.1 for localhost).
- Update the sock.bind(addr) call to use the modified address.
@@ -759,2 +759,6 @@
 
+    # Bind to a specific interface if the address is empty or 0.0.0.0
+    if addr[0] in ("", "0.0.0.0"):
+        addr = ("127.0.0.1", addr[1])
+
     sock = socket.socket(family=family, type=socket.SOCK_STREAM)
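Put together, a minimal sketch of how the patched helper could look. The function name and bind logic come from the snippets above; the address-family selection and return value are assumptions added only to make the example runnable:

```python
import socket

def create_server_socket(addr: tuple) -> socket.socket:
    # Assumption: choose IPv6 when the host part looks like an IPv6 literal.
    family = socket.AF_INET6 if ":" in addr[0] else socket.AF_INET

    # Bind to a specific interface if the address is empty or 0.0.0.0,
    # so the listener is not exposed on all network interfaces.
    if addr[0] in ("", "0.0.0.0"):
        addr = ("127.0.0.1", addr[1])

    sock = socket.socket(family=family, type=socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(addr)
    return sock

# Example: the listener ends up on loopback even though "" was requested.
server = create_server_socket(("", 0))
print(server.getsockname())
server.close()
```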
# Llama3.2 models more reliable.

TOOL_CALL_REGEX = re.compile(
    r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]",
Check failure
Code scanning / CodeQL
Inefficient regular expression High
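The CodeQL finding is about catastrophic backtracking: the nested quantifiers around `=.*` are ambiguous, because the greedy `.*` can also swallow the `,` separators, so on a near-miss input the engine must try exponentially many ways to split the text before giving up. A small, self-contained sketch of the effect (the input string is illustrative, not taken from the project):

```python
import re
import time

TOOL_CALL_REGEX = re.compile(
    r"\[([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s)?\),\s*)*"
    r"([a-zA-Z]+\w*\(([a-zA-Z]+\w*=.*,\s*)*([a-zA-Z]+\w*=.*\s*)?\)\s*)+\]")

# A tool call that is never closed: the match must ultimately fail, which is
# exactly when the engine backtracks through every possible way of dividing
# the "a=1, " groups between repetitions of the inner "(...=.*,\s*)*" loop.
for n in (10, 14, 18):
    payload = "[f(" + "a=1, " * n  # no closing ")" or "]"
    start = time.perf_counter()
    TOOL_CALL_REGEX.match(payload)
    # Matching time grows roughly exponentially with n.
    print(f"n={n}: {time.perf_counter() - start:.3f}s")
```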
        return resp

    except Exception as e:
        return web.Response(text=f"Error: {str(e)}", status=500)
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium test
Stack trace information
Copilot Autofix AI about 2 months ago
To fix the problem, we need to ensure that detailed exception messages are not exposed to the end user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling block to log the exception and return a generic error message.
- Import the logging module to enable logging of exceptions.
- Configure the logging settings if not already configured.
- Modify the exception handling block to log the exception and return a generic error message.
@@ -2,3 +2,3 @@
 import itertools
+import logging
 import aiohttp
@@ -6,2 +6,3 @@
 
+logging.basicConfig(level=logging.ERROR)
 
@@ -39,3 +40,4 @@
 except Exception as e:
-    return web.Response(text=f"Error: {str(e)}", status=500)
+    logging.error("An error occurred while handling the request", exc_info=True)
+    return web.Response(text="An internal error has occurred!", status=500)
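As a self-contained illustration of the suggested pattern (the handler and do_work below are hypothetical stand-ins, not the project's actual proxy code):

```python
import logging

from aiohttp import web

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)

async def do_work(request: web.Request) -> web.Response:
    # Hypothetical upstream call; raise to exercise the error path.
    raise RuntimeError("backend unavailable")

async def handle(request: web.Request) -> web.Response:
    try:
        return await do_work(request)
    except Exception:
        # The stack trace stays in the server log; the client only sees a
        # generic message, so no internal details leak through the response.
        logger.exception("An error occurred while handling the request")
        return web.Response(text="An internal error has occurred!", status=500)

app = web.Application()
app.router.add_get("/", handle)
# web.run_app(app)  # uncomment to serve locally
```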
…ect#11695) Signed-off-by: mgoin <[email protected]>
Signed-off-by: xcnick <[email protected]>
…graph-capture (vllm-project#11233) Signed-off-by: Yan Burman <[email protected]> Signed-off-by: Ido Asraff <[email protected]>
…eVision (vllm-project#11717) Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
…11736) Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Lu Fang <[email protected]>
Co-authored-by: Lancer <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Divakar Verma <[email protected]>
… to EngineCore (vllm-project#11960) Signed-off-by: Chen Zhang <[email protected]> Co-authored-by: Cody Yu <[email protected]>
…ject#12102) Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: jiang1.li <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Multimodality fix for llava after rebase. Fix for:
```
ERROR 12-16 12:31:11 engine.py:136] NotImplementedError: Unknown multi-modal data type: attention_mask
```
This PR updates `test/lora/utils.py` based on latest rebase.
1. This PR updates habana_main README_GAUDI to the Technical Writer reviewed version as seen in v1.19.0. (habana_main README_GAUDI and v1.19.0 README_GAUDI had diverged.)
2. It also fixes broken URLs caused by recent restructuring of the upstream vllm examples folder.
3. Adds notes in the examples folder for new users and redirects them to the Gaudi-specific examples in README_GAUDI.md.
Supporting PR for HabanaAI/vllm-hpu-extension#76
Changes the sampler used by dummy sequences to greedy if any sequence is using it. Prevents sampler recompilations.
Co-authored-by: Michał Kuligowski <[email protected]>
- Resolves issue due to release of triton v3.2.0 (January 23rd, 2025). This is a workaround. A proper fix to support triton v3.2.0 may be required. Error when triton v3.2.0 is used is shown below.
```bash
Traceback (most recent call last):
  File "/workspace/vllm/test_evaluation.py", line 15, in <module>
    from vllm import LLM, SamplingParams
  File "/workspace/vllm/vllm/__init__.py", line 7, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/workspace/vllm/vllm/engine/arg_utils.py", line 11, in <module>
    from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
  File "/workspace/vllm/vllm/config.py", line 16, in <module>
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
  File "/workspace/vllm/vllm/model_executor/layers/quantization/__init__.py", line 6, in <module>
    from vllm.model_executor.layers.quantization.awq_marlin import AWQMarlinConfig
  File "/workspace/vllm/vllm/model_executor/layers/quantization/awq_marlin.py", line 6, in <module>
    import vllm.model_executor.layers.fused_moe # noqa
  File "/workspace/vllm/vllm/model_executor/layers/fused_moe/__init__.py", line 34, in <module>
    import vllm.model_executor.layers.fused_moe.fused_marlin_moe # noqa
  File "/workspace/vllm/vllm/model_executor/layers/fused_moe/fused_marlin_moe.py", line 8, in <module>
    from vllm.model_executor.layers.fused_moe.fused_moe import (
  File "/workspace/vllm/vllm/model_executor/layers/fused_moe/fused_moe.py", line 18, in <module>
    from vllm_hpu_extension.ops import scaled_fp8_quant
  File "/usr/local/lib/python3.10/dist-packages/vllm_hpu_extension/ops.py", line 9, in <module>
    import habana_frameworks.torch as htorch
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/__init__.py", line 54, in <module>
    import habana_frameworks.torch.core
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/__init__.py", line 114, in <module>
    import_compilers()
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/backends.py", line 39, in import_compilers
    from .compilers import hpu_inference_compiler, hpu_training_compiler_bw, hpu_training_compiler_fw
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/compilers.py", line 27, in <module>
    from .freezing_passes import freeze
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/freezing_passes.py", line 28, in <module>
    from torch._inductor.freezing import discard_traced_gm_params, invalidate_eager_modules, replace_params_with_constants
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/freezing.py", line 15, in <module>
    from torch._inductor.fx_passes.freezing_patterns import freezing_passes
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/fx_passes/freezing_patterns.py", line 5, in <module>
    from torch._inductor.compile_fx import fake_tensor_prop
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 49, in <module>
    from torch._inductor.debug import save_args_for_compile_fx_inner
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/debug.py", line 26, in <module>
    from . import config, ir # noqa: F811, this is needed
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/ir.py", line 77, in <module>
    from .runtime.hints import ReductionHint
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/runtime/hints.py", line 36, in <module>
    attr_desc_fields = {f.name for f in fields(AttrsDescriptor)}
  File "/usr/lib/python3.10/dataclasses.py", line 1198, in fields
    raise TypeError('must be called with a dataclass type or instance') from None
TypeError: must be called with a dataclass type or instance
```
Signed-off-by: Voas, Tanner <[email protected]>
Co-authored-by: Michał Kuligowski <[email protected]>
…730) Currently we get a hang at the end of the script when using TP>1 and multi-step scheduling. This is caused by a lack of notification from the driver worker about ending the execution loop. This is a workaround for the issue, making sure that all workers are notified at the end of the `llm_engine` loop. Another possible workaround could be modifying this check: https://github.com/HabanaAI/vllm-fork/blob/habana_main/vllm/engine/llm_engine.py#L1379 with `or not self.has_unfinished_requests()`.
This PR enables multi-step scheduling for encoder-decoder models.
This is required for running already-quantized models on HPU using the fp8 quantization method (and not "inc").
Scope of changes: `mark_step`s)