Online weight update [WIP] #2119

Closed
wants to merge 31 commits into from

zhaochenyang20
Collaborator

Motivation

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@merrymercy merrymercy marked this pull request as draft November 22, 2024 07:42
Member

Delete this, it's useless.

Member

Delete this, it's useless.

@classmethod
def init_process(cls, rank, world_size, base_url, model_name, server_pid):
    try:
        # 设置分布式环境  (Chinese: "Set up the distributed environment")
Member

Do not use Chinese annotations.

@merrymercy
Contributor

You can rebase and add your 2-gpu test here

  unit-test-backend-2-gpu-part-1:
    if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
    runs-on: 2-gpu-runner
    steps:

        return ret
    except Exception as e:
        return ORJSONResponse(
            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST

Check warning

Code scanning / CodeQL

Information exposure through an exception (Medium)

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix AI 1 day ago

To fix the problem, exception details must not be exposed to the user. Log them on the server and return a generic error message instead.

  • Modify the exception handling block in the get_memory_pool_size function to log the exception and return a generic error message.
  • Add the necessary import for logging if it is not already present.
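
The change described above can be sketched in isolation. This is a minimal, framework-free illustration, not the code in server.py: `handle_request`, `GENERIC_ERROR`, and the `(status, body)` return shape are hypothetical stand-ins for the endpoint and its `ORJSONResponse`.

```python
import logging
from http import HTTPStatus

logger = logging.getLogger(__name__)

# Hypothetical constant; the patch inlines this string at each call site.
GENERIC_ERROR = {"error": {"message": "An internal error has occurred."}}


def handle_request(compute):
    """Run `compute` and return a (status_code, body) pair.

    Stand-in for an endpoint such as get_memory_pool_size; the real
    handler returns an ORJSONResponse instead of a tuple.
    """
    try:
        return HTTPStatus.OK, compute()
    except Exception as e:
        # The details stay in the server log only.
        logger.error("Exception occurred in handle_request: %s", e)
        # The client sees a fixed message with no exception text.
        return HTTPStatus.BAD_REQUEST, GENERIC_ERROR


def failing():
    """Example callable whose error message contains internal details."""
    raise RuntimeError("secret path /srv/models/private")
```

Calling `handle_request(failing)` returns the generic body, while the text of the `RuntimeError` appears only in the log.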
Suggested changeset 1
python/sglang/srt/server.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python/sglang/srt/server.py b/python/sglang/srt/server.py
--- a/python/sglang/srt/server.py
+++ b/python/sglang/srt/server.py
@@ -218,4 +218,5 @@
     except Exception as e:
+        logging.error("Exception occurred in get_memory_pool_size: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
EOF
        return ORJSONResponse(ret, status_code=200)
    except Exception as e:
        return ORJSONResponse(
            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST

Check warning

Code scanning / CodeQL

Information exposure through an exception (Medium)

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix AI 1 day ago

To fix the problem, detailed exception messages must not be exposed to the user. Log the detailed error information on the server and return a generic error message instead.

  1. Import the traceback module to log the stack trace.
  2. Modify the exception handling blocks to log the stack trace and return a generic error message.
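
The two steps above amount to the following pattern. This is a sketch with hypothetical names (`safe_call`, `safe_call_alt`, the `demo` logger); `logging.exception` is shown as a standard-library alternative that records the same traceback without importing `traceback`.

```python
import io
import logging
import traceback

logger = logging.getLogger("demo")


def safe_call(fn):
    """Return fn() or a generic error body, logging the full stack trace."""
    try:
        return fn()
    except Exception:
        # The formatted stack trace goes to the server log, as in the patch.
        logger.error(traceback.format_exc())
        return {"error": {"message": "An internal error has occurred."}}


def safe_call_alt(fn):
    """Same behavior; logging.exception appends the traceback itself."""
    try:
        return fn()
    except Exception:
        logger.exception("Unhandled error in request handler")
        return {"error": {"message": "An internal error has occurred."}}


def boom():
    raise KeyError("session-123")


# Capture log output in memory to show where the details end up.
buffer = io.StringIO()
logger.addHandler(logging.StreamHandler(buffer))
logger.setLevel(logging.ERROR)
logger.propagate = False

result = safe_call(boom)
result_alt = safe_call_alt(boom)
log_text = buffer.getvalue()
```

Both variants return the same generic body; the key name from the `KeyError` shows up only in `log_text`.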
Suggested changeset 1
python/sglang/srt/server.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python/sglang/srt/server.py b/python/sglang/srt/server.py
--- a/python/sglang/srt/server.py
+++ b/python/sglang/srt/server.py
@@ -31,2 +31,3 @@
 import torch
+import traceback
 
@@ -284,4 +285,5 @@
     except Exception as e:
+        logging.error(traceback.format_exc())
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -296,4 +298,5 @@
     except Exception as e:
+        logging.error(traceback.format_exc())
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -308,4 +311,5 @@
     except Exception as e:
+        logging.error(traceback.format_exc())
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
EOF
python/sglang/srt/server.py (Fixed)
        return ret
    except ValueError as e:
        return ORJSONResponse(
            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST

Check warning

Code scanning / CodeQL

Information exposure through an exception (Medium)

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix AI 1 day ago

To fix the problem, detailed exception messages must not be exposed to the user. Log the detailed error message on the server and return a generic error message instead.

  1. Import the logging module if not already imported.
  2. Modify the exception handling code to log the exception message using the logging module.
  3. Return a generic error message to the user.
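
For the streaming handler, the same three steps apply inside a generator. In the sketch below, `stream_results` is a simplified synchronous stand-in for the real function, `json` replaces `orjson` to keep the example dependency-free, and the `b"\n\n"` frame terminator is an assumption about the server-sent-events framing, since the patch does not show the end of that line.

```python
import json
import logging

logger = logging.getLogger(__name__)


def stream_results(chunks):
    """Yield server-sent-event frames; on ValueError emit a generic error frame."""
    try:
        for chunk in chunks:
            yield b"data: " + json.dumps(chunk).encode() + b"\n\n"
    except ValueError as e:
        # Log the real cause on the server ...
        logger.error("Error in stream_results: %s", e)
        # ... and send only a fixed message down the stream.
        out = {"error": {"message": "An internal error has occurred."}}
        yield b"data: " + json.dumps(out).encode() + b"\n\n"


def chunks_then_fail():
    """Example source that produces one chunk and then fails."""
    yield {"text": "hello"}
    raise ValueError("prompt exceeds context length 4096")


frames = list(stream_results(chunks_then_fail()))
```

The client receives the first chunk followed by one generic error frame; the context-length detail never enters the stream.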
Suggested changeset 1
python/sglang/srt/server.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python/sglang/srt/server.py b/python/sglang/srt/server.py
--- a/python/sglang/srt/server.py
+++ b/python/sglang/srt/server.py
@@ -325,3 +325,4 @@
             except ValueError as e:
-                out = {"error": {"message": str(e)}}
+                logging.error(f"Error in stream_results: {e}")
+                out = {"error": {"message": "An internal error has occurred."}}
                 yield b"data: " + orjson.dumps(
@@ -341,4 +342,5 @@
         except ValueError as e:
+            logging.error(f"Error in generate_request: {e}")
             return ORJSONResponse(
-                {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+                {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
             )
@@ -356,4 +358,5 @@
     except ValueError as e:
+        logging.error(f"Error in init_parameter_update_group_request: {e}")
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -370,4 +373,5 @@
     except ValueError as e:
+        logging.error(f"Error in get_weights_by_parameter_name_request: {e}")
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -386,4 +390,5 @@
     except ValueError as e:
+        logging.error(f"Error in update_parameter_from_distributed_request: {e}")
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
EOF
        return ret
    except ValueError as e:
        return ORJSONResponse(
            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST

Check warning

Code scanning / CodeQL

Information exposure through an exception (Medium)

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix AI 1 day ago

To fix the problem, detailed error messages and stack traces must not be exposed to the user. Log the detailed error information on the server and return a generic error message instead.

  • Modify the exception handling code in the generate_request, init_parameter_update_group_request, get_weights_by_parameter_name_request, update_parameter_from_distributed_request, and encode_request functions.
  • Use the logging module to log the detailed error message on the server.
  • Return a generic error message to the user.
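
Because the same change is repeated across several handlers, one way to avoid the duplication is a decorator. This is an alternative sketch, not what the autofix patch does: `hide_errors` and the tuple return shape are hypothetical, and `encode_request` here is a toy function that only borrows the handler's name.

```python
import functools
import logging
from http import HTTPStatus

logger = logging.getLogger(__name__)


def hide_errors(func):
    """Wrap a handler so ValueError is logged and replaced by a generic reply."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ValueError as e:
            # func.__name__ supplies the per-handler label the patch writes by hand.
            logger.error("Error in %s: %s", func.__name__, e)
            return (
                HTTPStatus.BAD_REQUEST,
                {"error": {"message": "An internal error has occurred."}},
            )

    return wrapper


@hide_errors
def encode_request(text):
    """Toy handler: fails on empty input with a message that leaks a path."""
    if not text:
        raise ValueError("empty input from tokenizer at /tmp/cache")
    return HTTPStatus.OK, {"length": len(text)}
```

The real handlers are coroutines, so an equivalent wrapper there would need to be `async def` and `await` the wrapped call.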
Suggested changeset 1
python/sglang/srt/server.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python/sglang/srt/server.py b/python/sglang/srt/server.py
--- a/python/sglang/srt/server.py
+++ b/python/sglang/srt/server.py
@@ -325,3 +325,4 @@
             except ValueError as e:
-                out = {"error": {"message": str(e)}}
+                logging.error("Error in stream_results: %s", str(e))
+                out = {"error": {"message": "An internal error has occurred."}}
                 yield b"data: " + orjson.dumps(
@@ -341,4 +342,5 @@
         except ValueError as e:
+            logging.error("Error in generate_request: %s", str(e))
             return ORJSONResponse(
-                {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+                {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
             )
@@ -356,4 +358,5 @@
     except ValueError as e:
+        logging.error("Error in init_parameter_update_group_request: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -370,4 +373,5 @@
     except ValueError as e:
+        logging.error("Error in get_weights_by_parameter_name_request: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -386,4 +390,5 @@
     except ValueError as e:
+        logging.error("Error in update_parameter_from_distributed_request: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -403,4 +408,5 @@
     except ValueError as e:
+        logging.error("Error in encode_request: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
EOF
        return ret
    except ValueError as e:
        return ORJSONResponse(
            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST

Check warning

Code scanning / CodeQL

Information exposure through an exception (Medium)

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix AI 1 day ago

To fix the problem, detailed exception messages must not be exposed to the user. Log the detailed error message on the server and return a generic error message instead.

  1. Import the logging module if it is not already imported.
  2. Replace the return statements that expose the exception message with code that logs the exception and returns a generic error message.
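
A regression check along these lines could confirm that the replaced return statements no longer echo exception text. It is a sketch only: `open_session` is a simplified stand-in that shares the real handler's name, and the test uses `unittest`'s `assertLogs` rather than the project's own test harness.

```python
import logging
import unittest
from http import HTTPStatus

logger = logging.getLogger("server_sketch")


def open_session(create):
    """Simplified stand-in: call `create` and hide any failure details."""
    try:
        return HTTPStatus.OK, create()
    except Exception as e:
        logger.error("Error in open_session: %s", e)
        return (
            HTTPStatus.BAD_REQUEST,
            {"error": {"message": "An internal error has occurred."}},
        )


class TestNoLeak(unittest.TestCase):
    def test_exception_text_is_logged_not_returned(self):
        def create():
            raise RuntimeError("capacity_of_str_len=0")

        # assertLogs captures records emitted on the named logger.
        with self.assertLogs("server_sketch", level="ERROR") as captured:
            status, body = open_session(create)

        self.assertEqual(status, HTTPStatus.BAD_REQUEST)
        self.assertNotIn("capacity_of_str_len", str(body))
        self.assertIn("capacity_of_str_len", captured.output[0])


def run_tests():
    """Run the test case and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNoLeak)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

The test asserts both halves of the fix: the detail is present in the log record and absent from the response body.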
Suggested changeset 1
python/sglang/srt/server.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python/sglang/srt/server.py b/python/sglang/srt/server.py
--- a/python/sglang/srt/server.py
+++ b/python/sglang/srt/server.py
@@ -296,4 +296,5 @@
     except Exception as e:
+        logging.error("Error in open_session: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -308,4 +309,5 @@
     except Exception as e:
+        logging.error("Error in close_session: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -325,3 +327,4 @@
             except ValueError as e:
-                out = {"error": {"message": str(e)}}
+                logging.error("Error in stream_results: %s", str(e))
+                out = {"error": {"message": "An internal error has occurred."}}
                 yield b"data: " + orjson.dumps(
@@ -341,4 +344,5 @@
         except ValueError as e:
+            logging.error("Error in generate_request: %s", str(e))
             return ORJSONResponse(
-                {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+                {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
             )
@@ -356,4 +360,5 @@
     except ValueError as e:
+        logging.error("Error in init_parameter_update_group_request: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -370,4 +375,5 @@
     except ValueError as e:
+        logging.error("Error in get_weights_by_parameter_name_request: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
@@ -386,4 +392,5 @@
     except ValueError as e:
+        logging.error("Error in update_parameter_from_distributed_request: %s", str(e))
         return ORJSONResponse(
-            {"error": {"message": str(e)}}, status_code=HTTPStatus.BAD_REQUEST
+            {"error": {"message": "An internal error has occurred."}}, status_code=HTTPStatus.BAD_REQUEST
         )
EOF
3 participants