add flux example #1126
Walkthrough
The pull request introduces several significant changes across multiple scripts related to machine learning model benchmarking and image generation. A new benchmarking script,
Actionable comments posted: 12
🧹 Outside diff range and nitpick comments (9)
benchmarks/run_benchmark.sh (1)
135-136: Add timestamp and system info to benchmark results.
The benchmark results could benefit from additional context.
Consider adding system information to the output:
+# Get system info
+echo -e "\nSystem Information:" >> benchmark_result_"${gpu_name}".md
+echo "- Date: $(date)" >> benchmark_result_"${gpu_name}".md
+echo "- CPU: $(lscpu | grep 'Model name' | cut -f 2 -d ":")" >> benchmark_result_"${gpu_name}".md
+echo "- Memory: $(free -h | awk '/^Mem:/ {print $2}')" >> benchmark_result_"${gpu_name}".md
+echo "- GPU: $(nvidia-smi --query-gpu=gpu_name --format=csv,noheader)" >> benchmark_result_"${gpu_name}".md
+echo -e "\n" >> benchmark_result_"${gpu_name}".md
+
echo -e "\nBenchmark Results:"
echo -e ${BENCHMARK_RESULT_TEXT} | tee -a benchmark_result_"${gpu_name}".md
benchmarks/text_to_image.py (2)
256-262: Add error handling and documentation for quantization.
The quantization implementation is functionally correct but could benefit from some improvements:
- Consider adding error handling for failed quantization attempts
- Add inline documentation explaining the quantization process and its impact
Consider adding the following improvements:
 if args.quantize:
+    # Apply quantization to reduce model size and improve inference speed
+    try:
         if hasattr(pipe, "unet"):
             pipe.unet = quantize_model(pipe.unet)
         if hasattr(pipe, "transformer"):
             pipe.transformer = quantize_model(pipe.transformer)
+    except Exception as e:
+        print(f"Warning: Quantization failed - {str(e)}")
+        print("Proceeding with unquantized model")
256-262: Consider standardizing quantization across backends.
The current implementation has different quantization approaches for oneflow and nexfort backends. Consider:
- Creating a backend-agnostic quantization interface
- Standardizing quantization configuration parameters across backends
- Implementing a factory pattern for backend-specific quantization strategies
This would improve maintainability and make the codebase more modular.
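The factory idea in the list above can be sketched as a small registry. This is only an illustration under stated assumptions: the names `register_quantizer`, `get_quantizer`, and `quantize_pipeline_parts` are hypothetical, and the two registered strategies are placeholders that return tagged tuples rather than calling any real oneflow or nexfort API.

```python
from typing import Callable, Dict

# Hypothetical type: a strategy takes (module, config) and returns the quantized module.
QuantizeFn = Callable[[object, dict], object]

# Hypothetical registry mapping a backend name to its quantization strategy.
_QUANTIZERS: Dict[str, QuantizeFn] = {}


def register_quantizer(backend: str) -> Callable[[QuantizeFn], QuantizeFn]:
    """Decorator that registers a quantization strategy for one backend."""

    def decorator(fn: QuantizeFn) -> QuantizeFn:
        _QUANTIZERS[backend] = fn
        return fn

    return decorator


def get_quantizer(backend: str) -> QuantizeFn:
    """Factory lookup: return the strategy for a backend or fail loudly."""
    try:
        return _QUANTIZERS[backend]
    except KeyError:
        raise ValueError(f"No quantizer registered for backend {backend!r}") from None


@register_quantizer("oneflow")
def _quantize_oneflow(module, config):
    # Placeholder only: a real strategy would call the oneflow quantization path.
    return ("oneflow-quantized", module, config)


@register_quantizer("nexfort")
def _quantize_nexfort(module, config):
    # Placeholder only: a real strategy would call the nexfort quantization path.
    return ("nexfort-quantized", module, config)


def quantize_pipeline_parts(pipe, backend: str, config: dict):
    """Quantize whichever of `unet` / `transformer` the pipeline exposes."""
    quantizer = get_quantizer(backend)
    for attr in ("unet", "transformer"):
        if hasattr(pipe, attr):
            setattr(pipe, attr, quantizer(getattr(pipe, attr), config))
    return pipe
```

With this shape, the benchmark script would only need the backend name and a shared config dictionary; adding a backend means registering one more function rather than editing the call site.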
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (4)
29-29: Remove unused import 'os'
The module os is imported but not used anywhere in the code. Removing it will clean up the imports.
Apply this diff to remove the unused import:
-import os
🧰 Tools
🪛 Ruff
29-29: os imported but unused
Remove unused import: os
(F401)
39-39: Remove unused import 'quantize_pipe'
The function quantize_pipe is imported but not used in the code. Since it's commented out and not in use, consider removing it to clean up the imports.
Apply this diff to remove the unused import:
- quantize_pipe,
🧰 Tools
🪛 Ruff
39-39: onediffx.quantize_pipe imported but unused
Remove unused import: onediffx.quantize_pipe
(F401)
184-185: Parameterize height and width in generate_data_and_fit_model
Currently, height and width are hard-coded to 1024. To increase flexibility and reusability, consider passing height and width as parameters or using the values from the main arguments.
401-401: Adjust steps_range for meaningful throughput data
Starting steps_range from 1 may not provide meaningful performance insights. Consider starting from a higher minimum value, such as 10, to capture more relevant throughput information.
Apply this diff to adjust the steps range:
- steps_range = range(1, 100, 1)
+ steps_range = range(10, 100, 10)
onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py (2)
386-388: Provide guidance for saving the output image.
The script prints a message if --output-image is not set but does not provide a default path or an alternative method to save the image, which may confuse users.
Consider setting a default output path or updating the message for clarity.
349-351: Remove commented-out code to enhance readability.
The commented lines seem unnecessary and can be removed to clean up the code.
Apply this diff to remove the unused comments:
-# warmup_compile + warmup_cache = warmup_time_first
-# warmup_compile = warmup_time_second
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- benchmarks/run_benchmark.sh (1 hunks)
- benchmarks/text_to_image.py (2 hunks)
- onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (1 hunks)
- onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py (1 hunks)
🧰 Additional context used
🪛 Shellcheck
benchmarks/run_benchmark.sh
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 85-85: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
🪛 Ruff
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py
29-29: os imported but unused
Remove unused import: os
(F401)
39-39: onediffx.quantize_pipe imported but unused
Remove unused import: onediffx.quantize_pipe
(F401)
159-159: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py
30-30: os imported but unused
Remove unused import: os
(F401)
156-156: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
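The B006 warnings can be illustrated with a minimal sketch. The function names `record_buggy` and `record_fixed` are hypothetical; the actual signatures flagged at lines 159 and 156 are not shown in this review, so this only demonstrates the general pattern Ruff is pointing at.

```python
from typing import List, Optional


def record_buggy(value: float, samples: List[float] = []) -> List[float]:
    # Pitfall flagged by B006: the default list is created once, when the
    # function is defined, and is then shared by every call that omits `samples`.
    samples.append(value)
    return samples


def record_fixed(value: float, samples: Optional[List[float]] = None) -> List[float]:
    # Suggested fix: default to None and build a fresh list inside the function.
    if samples is None:
        samples = []
    samples.append(value)
    return samples
```

Two calls to `record_buggy` without a second argument return the same list object, so values from earlier calls leak into later ones, while `record_fixed` starts from an empty list each time.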
🔇 Additional comments (5)
benchmarks/text_to_image.py (2)
38-38: LGTM: Import for quantization support added.
The new import aligns with the PR objective of adding quantization support.
256-262: Verify quantization impact on performance.
Consider adding specific benchmarking comparisons between quantized and non-quantized models to measure the performance impact.
Run the following script to analyze the impact:
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (3)
49-98: Command-line argument parsing is comprehensive and flexible
The parse_args function effectively defines a wide range of command-line arguments, allowing users to customize the script's behavior extensively.
103-144: load_pipe function handles model loading robustly
The load_pipe function correctly loads the pipeline with the specified configurations, handling custom pipelines, schedulers, LoRA weights, and controlnets appropriately.
223-406: Main function is well-structured and effective
The main() function orchestrates the pipeline loading, input processing, and inference execution efficiently. The sequential flow enhances readability and maintainability.
benchmarks/run_benchmark.sh
Outdated
model_dir="/data1/hf_model"
sd15_path="${model_dir}/stable-diffusion-v1-5"
sd21_path="${model_dir}/stable-diffusion-2-1"
sdxl_path="${model_dir}/stable-diffusion-xl-base-1.0"
sd3_path="/data1/home/zhangxu/stable-diffusion-3-medium-diffusers"
flux_dev_path="${model_dir}/FLUX.1-dev/snapshots/0ef5fff789c832c5c7f4e127f94c8b54bbcced44"
flux_schell_path="${model_dir}/FLUX.1-schnell"
Avoid hardcoding paths and add validation.
The script uses hardcoded paths which makes it less portable and could fail silently if models aren't present.
Consider:
- Using environment variables for model paths
- Adding path validation
-model_dir="/data1/hf_model"
+MODEL_DIR="${HF_MODEL_DIR:-/data1/hf_model}"
+
+validate_model_path() {
+ if [ ! -d "$1" ]; then
+ echo "Error: Model path not found: $1"
+ exit 1
+ fi
+}
+
+sd15_path="${MODEL_DIR}/stable-diffusion-v1-5"
+validate_model_path "${sd15_path}"
Committable suggestion was skipped due to low confidence.
benchmarks/run_benchmark.sh
Outdated
#!/bin/bash
set -e

# indicate which model to run
# e.g. ./run_benchmark.sh sd15,sd21,sdxl or ./run_benchmark.sh all
run_model=$1

export NEXFORT_GRAPH_CACHE=1
export NEXFORT_FX_FORCE_TRITON_SDPA=1
Add error handling for required command-line argument.
The script should validate that the model argument is provided and show usage information if missing.
Add this at the beginning of the script:
#!/bin/bash
set -e
+
+if [ -z "$1" ]; then
+ echo "Usage: $0 <model_list>"
+ echo "Examples:"
+ echo " $0 sd15,sd21,sdxl"
+ echo " $0 all"
+ exit 1
+fi
# indicate which model to run
benchmarks/run_benchmark.sh
Outdated
benchmark_model_with_one_resolution() {
  model_name=$1
  model_path=$2
  steps=$3
  compiler=$4
  compiler_config=$5
  height=$6
  width=$7
  quantize=$8

  echo "Running ${model_path} ${height}x${width}..."

  if [[ "${model_name}" =~ sd3 ]]; then
    script_path="onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py"
  elif [[ "${model_name}" =~ flux ]]; then
    script_path="onediff_diffusers_extensions/examples/flux/text_to_image_flux.py"
  else
    script_path="benchmarks/text_to_image.py"
  fi

  if [[ ${quantize} == True ]]; then
    script_output=$(python3 ${script_path} \
      --model ${model_path} --variant fp16 --steps ${steps} \
      --height ${height} --width ${width} --seed 1 \
      --compiler ${compiler} --compiler-config "${compiler_config}" \
      --quantize --quantize-config "${quantize_config}" \
      --prompt "${prompt}" --print-output | tee /dev/tty)
  else
    script_output=$(python3 ${script_path} \
      --model ${model_path} --variant fp16 --steps ${steps} \
      --height ${height} --width ${width} --seed 1 \
      --compiler ${compiler} --compiler-config "${compiler_config}" \
      --prompt "${prompt}" --print-output | tee /dev/tty)
  fi

  inference_time=$(echo "${script_output}" | grep -oP '(?<=Inference time: )\d+\.\d+')
  iterations_per_second=$(echo "${script_output}" | grep -oP '(?<=Iterations per second: )\d+\.\d+')
  max_used_cuda_memory=$(echo "${script_output}" | grep -oP '(?<=Max used CUDA memory : )\d+\.\d+')
  warmup_time=$(echo "${script_output}" | grep -oP '(?<=Warmup time: )\d+\.\d+')

  BENCHMARK_RESULT_TEXT="${BENCHMARK_RESULT_TEXT}| "${current_time}" | "${gpu_name}" | "${model_name}" | ${height}x${width} | ${compiler} | ${quantize} | ${iterations_per_second} | ${inference_time} | ${max_used_cuda_memory} | ${warmup_time} |\n"
}
Enhance the benchmark function's robustness and output handling.
The function has several areas for improvement:
- No error handling for failed command execution
- Regex parsing could fail silently
- Quote handling issues in result string
Suggested improvements:
benchmark_model_with_one_resolution() {
+ # Validate input parameters
+ [[ -z "$1" ]] && { echo "Error: model_name is required"; return 1; }
+ # ... validate other parameters ...
+
model_name=$1
model_path=$2
steps=$3
compiler=$4
compiler_config=$5
height=$6
width=$7
quantize=$8
+ # Capture command output and exit status
+ set +e
if [[ ${quantize} == True ]]; then
script_output=$(python3 ${script_path} \
--model ${model_path} --variant fp16 --steps ${steps} \
--height ${height} --width ${width} --seed 1 \
--compiler ${compiler} --compiler-config "${compiler_config}" \
--quantize --quantize-config "${quantize_config}" \
--prompt "${prompt}" --print-output | tee /dev/tty)
+ exit_status=$?
else
# ... similar for non-quantize case ...
fi
+ set -e
+
+ # Check for execution errors
+ if [ $exit_status -ne 0 ]; then
+ echo "Error: Benchmark failed for ${model_name}"
+ return 1
+ fi
# Extract metrics with error checking
- inference_time=$(echo "${script_output}" | grep -oP '(?<=Inference time: )\d+\.\d+')
+ inference_time=$(echo "${script_output}" | grep -oP '(?<=Inference time: )\d+\.\d+' || echo "N/A")
# ... similar for other metrics ...
# Fix quote handling in result string
- BENCHMARK_RESULT_TEXT="${BENCHMARK_RESULT_TEXT}| "${current_time}" | "${gpu_name}" | "${model_name}" | ${height}x${width} | ${compiler} | ${quantize} | ${iterations_per_second} | ${inference_time} | ${max_used_cuda_memory} | ${warmup_time} |\n"
+ BENCHMARK_RESULT_TEXT="${BENCHMARK_RESULT_TEXT}| ${current_time} | ${gpu_name} | ${model_name} | ${height}x${width} | ${compiler} | ${quantize} | ${iterations_per_second} | ${inference_time} | ${max_used_cuda_memory} | ${warmup_time} |\n"
}
Committable suggestion was skipped due to low confidence.
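The "regex parsing could fail silently" point can be made concrete with a small sketch. The benchmark script itself is Bash; the same extraction is written here in Python purely for illustration. The label strings are copied from the grep patterns above, while `METRIC_LABELS` and `parse_metrics` are hypothetical names, and returning None for a missing metric is one possible design choice rather than what the script does.

```python
import re
from typing import Dict, Optional

# Labels copied from the grep -oP look-behind patterns in the benchmark script.
METRIC_LABELS = {
    "inference_time": "Inference time: ",
    "iterations_per_second": "Iterations per second: ",
    "max_used_cuda_memory": "Max used CUDA memory : ",
    "warmup_time": "Warmup time: ",
}


def parse_metrics(output: str) -> Dict[str, Optional[float]]:
    """Return each metric as a float, or None when its line is missing."""
    metrics: Dict[str, Optional[float]] = {}
    for name, label in METRIC_LABELS.items():
        # Same shape as the shell pattern: a literal label followed by digits.digits.
        match = re.search(re.escape(label) + r"(\d+\.\d+)", output)
        metrics[name] = float(match.group(1)) if match else None
    return metrics
```

A caller can then treat a None value as an explicit "metric not found" signal and report it in the results table, instead of writing an empty cell.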
🧰 Tools
🪛 Shellcheck
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
benchmarks/run_benchmark.sh
Outdated
#sdxl_nexfort_compiler_config=""
sd3_nexfort_compiler_config='{"mode": "max-optimize:max-autotune:low-precision:cache-all", "memory_format": "channels_last"}'
flux_nexfort_compiler_config='{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last"}'

benchmark_model_with_one_resolution() {
  model_name=$1
  model_path=$2
  steps=$3
  compiler=$4
  compiler_config=$5
  height=$6
  width=$7
  quantize=$8

  echo "Running ${model_path} ${height}x${width}..."

  if [[ "${model_name}" =~ sd3 ]]; then
    script_path="onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py"
  elif [[ "${model_name}" =~ flux ]]; then
    script_path="onediff_diffusers_extensions/examples/flux/text_to_image_flux.py"
  else
    script_path="benchmarks/text_to_image.py"
  fi

  if [[ ${quantize} == True ]]; then
    script_output=$(python3 ${script_path} \
      --model ${model_path} --variant fp16 --steps ${steps} \
      --height ${height} --width ${width} --seed 1 \
      --compiler ${compiler} --compiler-config "${compiler_config}" \
      --quantize --quantize-config "${quantize_config}" \
      --prompt "${prompt}" --print-output | tee /dev/tty)
  else
    script_output=$(python3 ${script_path} \
      --model ${model_path} --variant fp16 --steps ${steps} \
      --height ${height} --width ${width} --seed 1 \
      --compiler ${compiler} --compiler-config "${compiler_config}" \
      --prompt "${prompt}" --print-output | tee /dev/tty)
  fi

  inference_time=$(echo "${script_output}" | grep -oP '(?<=Inference time: )\d+\.\d+')
  iterations_per_second=$(echo "${script_output}" | grep -oP '(?<=Iterations per second: )\d+\.\d+')
  max_used_cuda_memory=$(echo "${script_output}" | grep -oP '(?<=Max used CUDA memory : )\d+\.\d+')
  warmup_time=$(echo "${script_output}" | grep -oP '(?<=Warmup time: )\d+\.\d+')

  BENCHMARK_RESULT_TEXT="${BENCHMARK_RESULT_TEXT}| "${current_time}" | "${gpu_name}" | "${model_name}" | ${height}x${width} | ${compiler} | ${quantize} | ${iterations_per_second} | ${inference_time} | ${max_used_cuda_memory} | ${warmup_time} |\n"
}

# conda init
source ~/miniconda3/etc/profile.d/conda.sh

#########################################
if [[ "${run_model}" =~ sd15|all ]]; then
  conda activate oneflow
  benchmark_model_with_one_resolution sd15 ${sd15_path} 30 none none 512 512 False
  benchmark_model_with_one_resolution sd15 ${sd15_path} 30 oneflow none 512 512 False
  benchmark_model_with_one_resolution sd15 ${sd15_path} 30 oneflow none 512 512 True
fi

if [[ "${run_model}" =~ sd21|all ]]; then
  conda activate oneflow
  benchmark_model_with_one_resolution sd21 ${sd21_path} 20 none none 768 768 False
  benchmark_model_with_one_resolution sd21 ${sd21_path} 20 oneflow none 768 768 False
  benchmark_model_with_one_resolution sd21 ${sd21_path} 20 oneflow none 768 768 True
fi

if [[ "${run_model}" =~ sdxl|all ]]; then
  conda activate oneflow
  benchmark_model_with_one_resolution sdxl ${sdxl_path} 30 none none 1024 1024 False
  benchmark_model_with_one_resolution sdxl ${sdxl_path} 30 oneflow none 1024 1024 False
  benchmark_model_with_one_resolution sdxl ${sdxl_path} 30 oneflow none 1024 1024 True
fi
#########################################

#########################################
if [[ "${run_model}" =~ sd3|all ]]; then
  conda activate nexfort
  benchmark_model_with_one_resolution sd3 ${sd3_path} 28 none none 1024 1024 False
  benchmark_model_with_one_resolution sd3 ${sd3_path} 28 nexfort "${sd3_nexfort_compiler_config}" 1024 1024 False
  benchmark_model_with_one_resolution sd3 ${sd3_path} 28 nexfort "${sd3_nexfort_compiler_config}" 1024 1024 True
fi

if [[ "${run_model}" =~ flux|all ]]; then
  conda activate nexfort
  benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 none none 1024 1024 False
  benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 nexfort "${flux_nexfort_compiler_config}" 1024 1024 False
  benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 nexfort "${flux_nexfort_compiler_config}" 1024 1024 True
  benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 transform none 1024 1024 False

  benchmark_model_with_one_resolution flux_schell ${flux_schell_path} 4 none none 1024 1024 False
  benchmark_model_with_one_resolution flux_schell ${flux_schell_path} 4 nexfort "${flux_nexfort_compiler_config}" 1024 1024 False
  benchmark_model_with_one_resolution flux_schell ${flux_schell_path} 4 nexfort "${flux_nexfort_compiler_config}" 1024 1024 True
  benchmark_model_with_one_resolution flux_schell ${flux_schell_path} 4 transform none 1024 1024 False
fi
#########################################

echo -e "\nBenchmark Results:"
echo -e ${BENCHMARK_RESULT_TEXT} | tee -a benchmark_result_"${gpu_name}".md
🛠️ Refactor suggestion
Consider adding parallel execution support for faster benchmarking.
The script runs benchmarks sequentially, which could be time-consuming when benchmarking multiple models.
Consider adding support for parallel execution using GNU Parallel:
+# Function to check if GNU Parallel is installed
+check_parallel() {
+ if ! command -v parallel &> /dev/null; then
+ echo "GNU Parallel not found. Running benchmarks sequentially."
+ return 1
+ fi
+ return 0
+}
+
+# Function to generate benchmark commands
+generate_benchmark_commands() {
+ local model=$1
+ case $model in
+ sd15)
+ echo "benchmark_model_with_one_resolution sd15 ${sd15_path} 30 none none 512 512 False"
+ echo "benchmark_model_with_one_resolution sd15 ${sd15_path} 30 oneflow none 512 512 False"
+ # ... other commands ...
+ ;;
+ # ... other models ...
+ esac
+}
+
+# Main execution
+if check_parallel; then
+ # Generate all commands and run in parallel
+ for model in ${run_model//,/ }; do
+ generate_benchmark_commands "$model"
+ done | parallel --will-cite
+else
+ # Existing sequential execution
+ if [[ "${run_model}" =~ sd15|all ]]; then
+ # ... existing code ...
+ fi
+ # ... other models ...
+fi
Committable suggestion was skipped due to low confidence.
🧰 Tools
🪛 Shellcheck
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 81-81: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 85-85: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (6)
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (3)
29-29: Remove unused imports
The following imports are not used in the code:
- os (line 29)
- quantize_pipe from onediffx (line 39)
-import os
import time
import matplotlib.pyplot as plt
import numpy as np
import torch
from diffusers.utils import load_image
-from onediffx import ( # quantize_pipe currently only supports the nexfort backend.
- compile_pipe,
- quantize_pipe,
-)
+from onediffx import compile_pipe # quantize_pipe currently only supports the nexfort backend.
Also applies to: 39-39
🧰 Tools
🪛 Ruff
29-29: os imported but unused
Remove unused import: os
(F401)
74-79: Fix formatting in compiler choices
There's a missing space after a comma in the choices list.
 parser.add_argument(
     "--compiler",
     type=str,
     default=COMPILER,
-    choices=["none", "transform" ,"nexfort", "compile", "compile-max-autotune"],
+    choices=["none", "transform", "nexfort", "compile", "compile-max-autotune"],
 )
177-179: Clean up commented code
Either remove or implement the alternative throughput calculation. Commented code can lead to confusion and maintenance issues.
- # pixels_processed = height * width * n_steps
- # throughput = pixels_processed / inference_time
throughput = n_steps / inference_time
onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py (3)
25-42: Consider organizing imports following PEP 8 style guide.
Group imports in the following order with a blank line between each group:
- Standard library imports
- Third-party imports
- Local application imports
Apply this organization:
-import argparse
-import importlib
-import inspect
-import json
-import os
-import time
-
-import matplotlib.pyplot as plt
-import numpy as np
-import torch
-from diffusers.utils import load_image
-
-from onediffx import (
- compile_pipe,
- quantize_pipe,
-)
-from PIL import Image, ImageDraw
+# Standard library
+import argparse
+import importlib
+import inspect
+import json
+import os
+import time
+
+# Third-party
+import matplotlib.pyplot as plt
+import numpy as np
+import torch
+from diffusers.utils import load_image
+from PIL import Image, ImageDraw
+
+# Local
+from onediffx import (
+ compile_pipe,
+ quantize_pipe,
+)
🧰 Tools
🪛 Ruff
30-30: os imported but unused
Remove unused import: os
(F401)
46-93: Add type hints and help text to improve CLI usability.
The argument parser would benefit from help text descriptions and type hints to make it more user-friendly.
Example improvement for a few arguments:
- parser.add_argument("--model", type=str, default=MODEL)
- parser.add_argument("--variant", type=str, default=VARIANT)
- parser.add_argument("--steps", type=int, default=STEPS)
+ parser.add_argument(
+     "--model",
+     type=str,
+     default=MODEL,
+     help="Path or name of the pretrained model to use",
+ )
+ parser.add_argument(
+     "--variant",
+     type=str,
+     default=VARIANT,
+     help="Specific model variant to use (if applicable)",
+ )
+ parser.add_argument(
+     "--steps",
+     type=int,
+     default=STEPS,
+     help="Number of inference steps for generation",
+ )
212-212: Remove or implement commented code.
The commented plt.savefig("output.png") line should either be removed or implemented with a configurable output path.
Consider adding a parameter to control figure saving:
- # plt.savefig("output.png")
+ if args.save_plot:
+     plt.savefig(args.plot_output or "throughput_plot.png")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (1 hunks)
- onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py (1 hunks)
🧰 Additional context used
🪛 Ruff
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py
29-29: os imported but unused
Remove unused import: os
(F401)
39-39: onediffx.quantize_pipe imported but unused
Remove unused import: onediffx.quantize_pipe
(F401)
159-159: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py
30-30: os imported but unused
Remove unused import: os
(F401)
156-156: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🔇 Additional comments (1)
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (1)
146-170: LGTM: Effective CUDA event timing implementation
The IterationProfiler class correctly uses CUDA events for accurate timing measurements.
🧰 Tools
🪛 Ruff
159-159: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
# config with string
options = '{"mode": "max-optimize:max-autotune:low-precision:cache-all", "memory_format": "channels_last"}'
pipe = compile_pipe(
🛠️ Refactor suggestion
Move hardcoded compiler configuration to constants
The default compiler configuration is hardcoded in the function. Consider moving it to module-level constants for better maintainability.
+DEFAULT_COMPILER_CONFIG = '{"mode": "max-optimize:max-autotune:low-precision:cache-all", "memory_format": "channels_last"}'
+
def main():
# ...
else:
# config with string
- options = '{"mode": "max-optimize:max-autotune:low-precision:cache-all", "memory_format": "channels_last"}'
+ options = DEFAULT_COMPILER_CONFIG
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (1)
29-29: Remove unused imports
The following imports are not used in the code:
- os (line 29)
- quantize_pipe from onediffx (line 42)
-import os
import time
import matplotlib.pyplot as plt
import numpy as np
import torch
from diffusers.utils import load_image
from nexfort.compilers.transform_model import transform_model
from nexfort.quantization import quantize
-from onediffx import ( # quantize_pipe currently only supports the nexfort backend.
- compile_pipe,
- quantize_pipe,
-)
+from onediffx import compile_pipe # quantize_pipe currently only supports the nexfort backend.
Also applies to: 42-42
🧰 Tools
🪛 Ruff
29-29: os imported but unused
Remove unused import: os
(F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (1 hunks)
🧰 Additional context used
🪛 Ruff
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py
29-29: os imported but unused
Remove unused import: os
(F401)
42-42: onediffx.quantize_pipe imported but unused
Remove unused import: onediffx.quantize_pipe
(F401)
163-163: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (4)
benchmarks/run_benchmark.sh (3)
10-12: Document environment variables.
Add comments explaining the purpose and impact of each environment variable:
# set environment variables
+# Enable graph caching for better performance
export NEXFORT_GRAPH_CACHE=1
+# Force Triton SDPA (Scaled Dot Product Attention) implementation
export NEXFORT_FX_FORCE_TRITON_SDPA=1
35-36: Document prompt and quantization configuration.
Add comments explaining:
- The purpose of the default prompt
- The quantization configuration parameters and their impact
+# Default prompt for generating test images
prompt="beautiful scenery nature glass bottle landscape, purple galaxy bottle"
+# FP8 quantization config with E4M3 format for both forward and backward, using dynamic per-tensor scaling
quantize_config='{"quant_type": "fp8_e4m3_e4m3_dynamic_per_tensor"}'
38-44: Improve compiler configuration documentation and cleanup.
- Remove commented-out configurations if they're no longer needed
- Document the active configurations and their optimization modes
-# oneflow has no compiler_config
-#sd15_nexfort_compiler_config=""
-#sd21_nexfort_compiler_config=""
-#sdxl_nexfort_compiler_config=""
+# SD3 compiler config with maximum optimizations and low precision
+# - max-optimize: Enable all possible optimizations
+# - max-autotune: Enable autotuning for best performance
+# - low-precision: Use reduced precision operations
+# - cache-all: Cache all computations
sd3_nexfort_compiler_config='{"mode": "max-optimize:max-autotune:low-precision:cache-all", "memory_format": "channels_last"}'
+# FLUX compiler config with maximum optimizations
flux_nexfort_compiler_config='{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last"}'
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (1)
25-46
: Remove unused imports. The following imports are not used in the code:
- os (line 29)
- quantize_pipe from onediffx (line 42)
-import os
 import time
 import matplotlib.pyplot as plt
 import numpy as np
 import torch
 from diffusers.utils import load_image
 from nexfort.compilers.transform_model import transform_model
 from nexfort.quantization import quantize
-from onediffx import ( # quantize_pipe currently only supports the nexfort backend.
-    compile_pipe,
-    quantize_pipe,
-)
+from onediffx import compile_pipe # quantize_pipe currently only supports the nexfort backend.
🧰 Tools
🪛 Ruff
29-29: os imported but unused
Remove unused import: os
(F401)
42-42: onediffx.quantize_pipe imported but unused
Remove unused import: onediffx.quantize_pipe
(F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
benchmarks/run_benchmark.sh (1 hunks)
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (1 hunks)
🧰 Additional context used
🪛 Shellcheck
benchmarks/run_benchmark.sh
[warning] 101-101: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 101-101: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 101-101: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 105-105: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
🪛 Ruff
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py
29-29: os
imported but unused
Remove unused import: os
(F401)
42-42: onediffx.quantize_pipe
imported but unused
Remove unused import: onediffx.quantize_pipe
(F401)
163-163: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
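The B006 warning exists because Python evaluates default argument values once, when the function is defined, so a mutable default is shared across calls. The sketch below contrasts the flagged pattern with the usual `None` fix. The function names and the `steps_range` parameter are hypothetical illustrations, not code taken from `text_to_image_flux.py`.

```python
from typing import List, Optional


def collect_steps_shared(step: int, steps_range: List[int] = []) -> List[int]:
    # Pattern flagged by B006: the default list is created once and reused
    # by every call that omits the argument.
    steps_range.append(step)
    return steps_range


def collect_steps_safe(step: int, steps_range: Optional[List[int]] = None) -> List[int]:
    # Usual fix: default to None and build a fresh list inside the function.
    if steps_range is None:
        steps_range = []
    steps_range.append(step)
    return steps_range


shared_first = collect_steps_shared(10)
shared_second = collect_steps_shared(20)  # same list object as shared_first

safe_first = collect_steps_safe(10)
safe_second = collect_steps_safe(20)  # independent list
```

In the shared variant, the second call sees the value appended by the first; in the safe variant, each call starts from an empty list.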
🔇 Additional comments (1)
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (1)
102-107
: LGTM!
The GPU memory calculation is correct and properly converts bytes to gigabytes.
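For reference, a byte-to-gigabyte conversion of the kind this comment approves can be sketched as below. This is an assumption about the general shape of such a calculation, not the script's actual code; the helper name `bytes_to_gib` is hypothetical, and the sketch uses the binary convention (1 GiB = 1024**3 bytes).

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / (1024 ** 3)


# With PyTorch on a CUDA device, the input could come from
# torch.cuda.max_memory_allocated(); a fixed value is used here so the
# sketch runs without a GPU.
peak_bytes = 24 * 1024 ** 3
peak_gib = bytes_to_gib(peak_bytes)
```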
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (8)
onediff_diffusers_extensions/examples/flux/README.md (5)
7-17
: Improve documentation formatting and clarity. The installation links should be properly formatted as markdown links for better readability and accessibility.
Apply these changes:
-https://github.com/siliconflow/onediff?tab=readme-ov-file#installation
+[OneDiff Installation](https://github.com/siliconflow/onediff?tab=readme-ov-file#installation)
-https://github.com/siliconflow/onediff?tab=readme-ov-file#install-a-compiler-backend
+[Compiler Backend Installation](https://github.com/siliconflow/onediff?tab=readme-ov-file#install-a-compiler-backend)
-HF model: https://huggingface.co/black-forest-labs/FLUX.1-dev and https://huggingface.co/black-forest-labs/FLUX.1-schnell
+HF models:
+- [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+- [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)
-HF pipeline: https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux
+[HF Pipeline Documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux)
🧰 Tools
🪛 Markdownlint
7-7: null
Bare URL used (MD034, no-bare-urls)
12-12: null
Bare URL used (MD034, no-bare-urls)
15-15: null
Bare URL used (MD034, no-bare-urls)
15-15: null
Bare URL used (MD034, no-bare-urls)
17-17: null
Bare URL used (MD034, no-bare-urls)
21-27
: Enhance package installation instructions. The package installation section could be improved for better reliability and clarity.
Consider these improvements:
- Pin package versions to ensure reproducibility
- Add a comment explaining the purpose of the NEXFORT_FX_FORCE_TRITON_SDPA environment variable
- Specify Python version requirements, if any
````diff
 ```bash
-pip install --upgrade transformers
-pip install --upgrade diffusers[torch]
-pip install nvidia-cublas-cu12==12.4.5.8
+# Install required packages with pinned versions
+pip install transformers==4.36.2
+pip install diffusers[torch]==0.25.0
+pip install nvidia-cublas-cu12==12.4.5.8
+# Enable Triton-based scaled dot-product attention for better performance
 export NEXFORT_FX_FORCE_TRITON_SDPA=1
````
31-54
: Improve command documentation for FLUX.1-dev. The run commands section for FLUX.1-dev needs better formatting and explanation.
1. Add language specifier to code blocks
2. Add explanation for compiler configuration options
3. Consider using a table or list to explain the command-line arguments
````diff
 ### Run FLUX.1-dev 1024*1024 without compile (the original pytorch HF diffusers baseline)
-```
+```bash
 python3 onediff_diffusers_extensions/examples/flux/text_to_image_flux.py \
   --model black-forest-labs/FLUX.1-dev \
   --height 1024 \
   --width 1024 \
   --steps 20 \
   --seed 1 \
   --output-image ./flux.png
 ```

 ### Run FLUX.1-dev 1024*1024 with compile [nexfort backend]
+The compiler configuration options:
+- `max-optimize`: Enables all available optimizations
+- `max-autotune`: Enables autotuning for best performance
+- `low-precision`: Uses lower precision where possible
+- `cache-all`: Caches all compiled artifacts
+- `channels_last`: Uses channels_last memory format for better performance
-```
+```bash
 python3 onediff_diffusers_extensions/examples/flux/text_to_image_flux.py \
````
🧰 Tools
🪛 Markdownlint
32-32: null
Fenced code blocks should have a language specified (MD040, fenced-code-language)
44-44: null
Fenced code blocks should have a language specified (MD040, fenced-code-language)
57-80
: Improve command documentation for FLUX.1-schnell. Similar improvements are needed for the FLUX.1-schnell section.
1. Add language specifier to code blocks
2. Note the difference in steps (4 vs 20) compared to FLUX.1-dev
3. Consider using variables or a configuration file to avoid command duplication
````diff
 ### Run FLUX.1-schnell 1024*1024 without compile (the original pytorch HF diffusers baseline)
-```
+```bash
 # Note: FLUX.1-schnell uses 4 steps instead of 20 steps in FLUX.1-dev
 python3 onediff_diffusers_extensions/examples/flux/text_to_image_flux.py \
````
🧰 Tools
🪛 Markdownlint
58-58: null
Fenced code blocks should have a language specified (MD040, fenced-code-language)
70-70: null
Fenced code blocks should have a language specified (MD040, fenced-code-language)
83-129
: Enhance performance comparison documentation. The performance comparison sections are well-structured but could be improved for clarity.
Consider these improvements:
- Standardize date format (e.g., YYYY-MM-DD)
- Add testing methodology details:
  - Batch size used
  - Input prompt length
  - Number of runs averaged
- Add a brief explanation of each metric:
  - What "Iteration Speed" represents
  - What's included in "E2E Time"
  - Why warmup times differ significantly between first run and cached run
-Data update date: 2024-10-23
+Data update date: 2023-10-23 # Assuming this is the correct year
+
+Testing methodology:
+- Batch size: <specify>
+- Input prompt: <specify length or example>
+- Results averaged over <N> runs
+
+Metrics explanation:
+- Iteration Speed: Number of inference steps per second
+- E2E Time: Total time including model loading and inference
+- Max Memory: Peak GPU memory usage during execution
+- Warmup time: First-time compilation and optimization
+- Warmup with Cache: Subsequent runs using cached artifacts
benchmarks/run_benchmark.sh (3)
10-12
: Document environment variables. Add comments explaining the purpose and impact of these environment variables:
- NEXFORT_GRAPH_CACHE
- NEXFORT_FX_FORCE_TRITON_SDPA
 # set environment variables
+# Enable graph caching for NEXFORT to improve performance
 export NEXFORT_GRAPH_CACHE=1
+# Force using Triton SDPA (Scaled Dot Product Attention) implementation
 export NEXFORT_FX_FORCE_TRITON_SDPA=1
35-36
: Make prompt configurable via environment variable or command-line argument. Allow customization of the prompt and quantization config through environment variables or command-line arguments.
-prompt="beautiful scenery nature glass bottle landscape, purple galaxy bottle"
-quantize_config='{"quant_type": "fp8_e4m3_e4m3_dynamic_per_tensor"}'
+# Default prompt if not provided via environment variable
+prompt="${BENCHMARK_PROMPT:-beautiful scenery nature glass bottle landscape, purple galaxy bottle}"
+
+# Load quantization config from file or use default
+quantize_config_file="${QUANTIZE_CONFIG_FILE:-./quantize_config.json}"
+if [[ -f "${quantize_config_file}" ]]; then
+    quantize_config=$(<"${quantize_config_file}")
+else
+    quantize_config='{"quant_type": "fp8_e4m3_e4m3_dynamic_per_tensor"}'
+fi
67-76
: Improve script path selection logic. The current script path selection can be made more maintainable using an associative array.
+# Define script paths for different model types
+declare -A SCRIPT_PATHS=(
+    ["sd3"]="onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py"
+    ["flux"]="onediff_diffusers_extensions/examples/flux/text_to_image_flux.py"
+    ["default"]="benchmarks/text_to_image.py"
+)
+
+# Get script path based on model name
+get_script_path() {
+    local model_name=$1
+    for key in "${!SCRIPT_PATHS[@]}"; do
+        if [[ "${model_name}" =~ ${key} ]]; then
+            echo "${SCRIPT_PATHS[${key}]}"
+            return
+        fi
+    done
+    echo "${SCRIPT_PATHS[default]}"
+}
+
-    # if model_name contains sd3, use sd3 script
-    if [[ "${model_name}" =~ sd3 ]]; then
-        script_path="onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py"
-    # if model_name contains flux, use flux script
-    elif [[ "${model_name}" =~ flux ]]; then
-        script_path="onediff_diffusers_extensions/examples/flux/text_to_image_flux.py"
-    else
-        # otherwise, use sd script
-        script_path="benchmarks/text_to_image.py"
-    fi
+    script_path=$(get_script_path "${model_name}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
benchmarks/run_benchmark.sh (1 hunks)
onediff_diffusers_extensions/examples/flux/README.md (1 hunks)
🧰 Additional context used
🪛 Shellcheck
benchmarks/run_benchmark.sh
[warning] 101-101: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 101-101: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 101-101: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
[warning] 105-105: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
🪛 Markdownlint
onediff_diffusers_extensions/examples/flux/README.md
7-7: null
Bare URL used
(MD034, no-bare-urls)
12-12: null
Bare URL used
(MD034, no-bare-urls)
15-15: null
Bare URL used
(MD034, no-bare-urls)
15-15: null
Bare URL used
(MD034, no-bare-urls)
17-17: null
Bare URL used
(MD034, no-bare-urls)
32-32: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
44-44: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
58-58: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
70-70: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
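The performance-comparison comment in this review asks the README to explain its metrics. As a rough illustration of how warmup time, per-run time, and iteration speed relate, here is a minimal timing sketch. The `benchmark` helper, its parameters, and the sleep-based workload are all hypothetical; they are not the measurement code used for the README numbers. Note that the "e2e_s" value here covers a single call only, whereas the review's proposed definition of E2E Time also includes model loading.

```python
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], None], steps: int, runs: int = 3) -> Dict[str, float]:
    # The first call is timed separately: with a compiler backend it would
    # include compilation, or cache loading on later launches.
    start = time.perf_counter()
    run_once()
    warmup_s = time.perf_counter() - start

    # Average the wall-clock time of the remaining calls.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    mean_s = sum(timings) / len(timings)

    return {
        "warmup_s": warmup_s,        # first call, including any one-time setup
        "e2e_s": mean_s,             # mean time of one call after warmup
        "it_per_s": steps / mean_s,  # inference steps per second
    }


# Stand-in workload (hypothetical): sleeps instead of running a pipeline.
result = benchmark(lambda: time.sleep(0.01), steps=20, runs=3)
```

Reporting the warmup call separately from the averaged runs is what makes the gap between a first run and a cached run visible in a results table.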
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (2)
29-29
: Remove unused imports. The following imports are not used in the code:
- os (line 29)
- onediffx.quantize_pipe (line 42)
-import os
 import time
 import matplotlib.pyplot as plt
 import numpy as np
 import torch
 from diffusers.utils import load_image
 from nexfort.compilers.transform_model import transform_model
 from nexfort.quantization import quantize
-from onediffx import ( # quantize_pipe currently only supports the nexfort backend.
-    compile_pipe,
-    quantize_pipe,
-)
+from onediffx import compile_pipe # quantize_pipe currently only supports the nexfort backend.
Also applies to: 42-42
🧰 Tools
🪛 Ruff
29-29: os imported but unused
Remove unused import: os
(F401)
187-189
: Parameterize resolution values in benchmarking. The resolution values (1024x1024) are hardcoded, which limits the function's flexibility. Consider accepting these as parameters with default values.
-def generate_data_and_fit_model(model, steps_range):
-    height, width = 1024, 1024
+def generate_data_and_fit_model(model, steps_range, height=1024, width=1024):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py
(1 hunks)
🧰 Additional context used
🪛 Ruff
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py
29-29: os
imported but unused
Remove unused import: os
(F401)
42-42: onediffx.quantize_pipe
imported but unused
Remove unused import: onediffx.quantize_pipe
(F401)
163-163: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🔇 Additional comments (4)
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py (4)
85-88
: Skip: Boolean type conversion improvement.
144-144
: Skip: Safety checker disabled.
163-163
: Skip: Mutable default argument.
🧰 Tools
🪛 Ruff
163-163: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
383-385
: Skip: Error handling in warmup iterations.
Summary by CodeRabbit
Release Notes
New Features
Improvements