Upstreaming MLPerf punet changes, server/harness support. #799
Open: monorimet wants to merge 92 commits into main from merge_punet_sdxl
+1,387
−834
92 commits
a3376d9 eagarvey-amd: Bump punet revision to d30d6ff
7cabac0 eagarvey-amd: Enable punet t2i test.
7dfd4c8 eagarvey-amd: Use formatted strings as input to printer.
1cd3ee9 eagarvey-amd: Rework sdxl test to setup with a pipeline, fix unloading submodels, f…
1a90abd eagarvey-amd: Add switch for punet preprocessing flags
b70318d eagarvey-amd: Xfail punet e2e test.
2d7ebcd eagarvey-amd: Fixups to sdxl test arguments
feebc87 eagarvey-amd: Fix flagset arg and enable vae encode.
af7782b eagarvey-amd: Enable VAE encode validation, mark as xfail
eff59a9 eagarvey-amd: Fix formatting
63fb053 eagarvey-amd: fix runner function name in old sd test.
aff48ab eagarvey-amd: Fix xfail syntax.
b10ad8d eagarvey-amd: Update unet script for compile function signature change
321d21d IanNod: Update punet to 4d4f955
2de912e monorimet: Disable vulkan test on MI250 runner.
9fdc07f eagarvey-amd: Change tqdm disable conditions and deepcopy model map on init.
b20be32 monorimet: Don't break workarounds for model path
02705a9 eagarvey-amd: Fix for passing a path as attn_spec.
9229aed eagarvey-amd: Bump punet revision to defeb489fe2bb17b77d587924db9e58048a8c140
f09ef4a eagarvey-amd: Move JIT cpu scheduling load helpers inside conditional.
bbcc424 eagarvey-amd: formatting
1f19c7f eagarvey-amd: Don't pass benchmark as an export arg.
39c0c00 saienduri: Changes so no external downloads. (#781)
3c59b25 saienduri: fix so that we check exact paths as well for is_prepared (#782)
2e9de46 IanNod: Update punet to 60edc91
aa0ac2b saienduri: Vae weight path none check (#784)
6556a36 monorimet: Bump punet to mi300_all_sym_8_step10 (62785ea)
2c49cb6 saienduri: Changes so that the default run without quant docker will work as wel…
cb911b1 eagarvey-amd: Bump punet to 361df65844e0a7c766484707c57f6248cea9587f
d857f77 saienduri: Sync flags to sdxl-scripts repo (#786)
37548f2 nithinsubbiah: Integrate int8 tk kernels (#783)
25b2462 monorimet: Update punet revision to deterministic version (42e9407)
0e57b4e saienduri: Integration of tk kernels into pipeline (#789)
920dbf5 saienduri: Update unet horizontal fusion flag (#790)
6f16731 saienduri: Revert "Update unet horizontal fusion flag (#790)"
15dbd93 nithinsubbiah: [tk kernel] Add support to match kernel with number of arguments and …
0c02652 monorimet: Add functionality to SD pipeline and abstracted components for saving…
3fd954b nithinsubbiah: Remove download links for tk kernels and instead specify kernel direc…
7f8a2b0 saienduri: Update to best iteration on unet weights (#794)
bf63aec nithinsubbiah: Add missing tk_kernel_args arg in function calls (#795)
a74d98e saienduri: update hash for config file
925cd0c eagarvey-amd: Fix formatting
7715fd0 eagarvey-amd: Point to sdxl-vae-fix branch of iree-turbine.
e276c78 eagarvey-amd: Add SD3 to sd_pipeline
de5d3de monorimet: Update test_models.yml
d0d3ae6 eagarvey-amd: Remove default in mmdit export args.
403fe47 eagarvey-amd: set vae_harness to False in sdxl test.
0ac6b64 eagarvey-amd: Switch to main branch of iree-turbine
1a41394 monorimet: Update sd3_vae.py
493f260 monorimet: Remove preprocess arg that fails to parse.
711403c eagarvey-amd: SD3 updates, CLI arguments for multi-device
e554da8 eagarvey-amd: Tweaks to requirements, scheduler filenames
cdd2f66 monorimet: xfail stateless llama test
d23a45b eagarvey-amd: Flag updates and parametrize a few more args.
7ecfece eagarvey-amd: Merge branch 'merge_punet_sdxl' of https://github.com/nod-ai/SHARK-Tu…
2d7a92e eagarvey-amd: Update SDXL tests, README for running on GFX942
18bffdb eagarvey-amd: Fix vae script CLI and revert precision changes to sd3 text encoders …
df85dca eagarvey-amd: Merge branch 'merge_punet_sdxl' of https://github.com/nod-ai/SHARK-Tu…
674128e eagarvey-amd: Small fixes to compile modes and requirements
4d6198b eagarvey-amd: Adds explicit model arch flag, remove commented code
f3e3fe3 eagarvey-amd: Fix formatting
2ed8037 monorimet: Merge branch 'main' into merge_punet_sdxl
7adfc7a eagarvey-amd: Fix formatting
ff2c3c9 monorimet: Update test_models.yml
afdb8d6 eagarvey-amd: Decompose CLIP attention
a4e67e8 eagarvey-amd: decompose implementation for clip
35517d9 eagarvey-amd: Add decompose clip flag to pipe e2e test
6ca109a eagarvey-amd: Add attention decomposition mechanism to sdxl clip exports.
453fb38 eagarvey-amd: Update compile options for sdxl
c0be575 eagarvey-amd: Decompose VAE for cpu
e3cd69d eagarvey-amd: skip i8 punet test on cpu
e3e1dcb eagarvey-amd: Don't use spec for clip by default
56d6ee7 monorimet: Revert change to attention spec handling in sdxl test
d330564 monorimet: Don't use td spec for clip bs2 export test
ffba3ea monorimet: disable attn spec usage for sdxl bs2 on mi250 tests
fad7e6e monorimet: Update test_models.yml
05fa32d monorimet: Update test_models.yml
0291d43 eagarvey-amd: Small fixes to SDXL inference pipeline/exports/compile
e337f2a eagarvey-amd: Pin torch to 2.4.1
0fd8ad0 eagarvey-amd: Largely disables attn spec usage.
e1c4ac2 eagarvey-amd: Update canonicalization pass name, decouple model validation from pip…
61bb4ef eagarvey-amd: Don't use punet spec.
dfb9474 eagarvey-amd: Remove default/mfma/wmma specs from sd compile utils.
9fe20a6 eagarvey-amd: Guard path check for attn spec
f39b2d2 eagarvey-amd: Separate punet run
d3c8e80 eagarvey-amd: typo fixes
40808db eagarvey-amd: Filename fixes, explicit input dtypes for i8 punet
e630d39 eagarvey-amd: Update CPU test configuration.
fc6d018 eagarvey-amd: Decompose VAE for cpu
7d50dc8 eagarvey-amd: Change compile flag reporting to CLI input
f140926 eagarvey-amd: formatting
67e6558 eagarvey-amd: Rework prompt encoder export on aot.export API
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,16 +1,16 @@ | ||
protobuf | ||
gguf | ||
transformers==4.37.1 | ||
transformers==4.43.3 | ||
torchsde | ||
accelerate | ||
peft | ||
safetensors>=0.4.0 | ||
diffusers @ git+https://github.com/nod-ai/[email protected] | ||
brevitas @ git+https://github.com/Xilinx/brevitas.git@6695e8df7f6a2c7715b9ed69c4b78157376bb60b | ||
# turbine tank downloading/uploading | ||
azure-storage-blob | ||
# microsoft/phi model | ||
einops | ||
pytest | ||
scipy | ||
shark-turbine @ git+https://github.com/iree-org/iree-turbine.git@main | ||
-e git+https://github.com/nod-ai/sharktank.git@main#egg=sharktank&subdirectory=sharktank | ||
-e git+https://github.com/nod-ai/sharktank.git@main#egg=sharktank&subdirectory=sharktank |
49 changes: 49 additions & 0 deletions
models/turbine_models/custom_models/sd3_inference/diffusers_ref.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
from diffusers import StableDiffusion3Pipeline | ||
import torch | ||
from datetime import datetime as dt | ||
|
||
|
||
def run_diffusers_cpu( | ||
hf_model_name, | ||
prompt, | ||
negative_prompt, | ||
guidance_scale, | ||
seed, | ||
height, | ||
width, | ||
num_inference_steps, | ||
): | ||
from diffusers import StableDiffusion3Pipeline | ||
|
||
pipe = StableDiffusion3Pipeline.from_pretrained( | ||
hf_model_name, torch_dtype=torch.float32 | ||
) | ||
pipe = pipe.to("cpu") | ||
generator = torch.Generator().manual_seed(int(seed)) | ||
|
||
image = pipe( | ||
prompt=prompt, | ||
negative_prompt=negative_prompt, | ||
num_inference_steps=num_inference_steps, | ||
guidance_scale=guidance_scale, | ||
height=height, | ||
width=width, | ||
generator=generator, | ||
).images[0] | ||
timestamp = dt.now().strftime("%Y-%m-%d_%H-%M-%S") | ||
image.save(f"diffusers_reference_output_{timestamp}.png") | ||
|
||
|
||
if __name__ == "__main__": | ||
from turbine_models.custom_models.sd_inference.sd_cmd_opts import args | ||
|
||
run_diffusers_cpu( | ||
args.hf_model_name, | ||
args.prompt, | ||
args.negative_prompt, | ||
args.guidance_scale, | ||
args.seed, | ||
args.height, | ||
args.width, | ||
args.num_inference_steps, | ||
) |
Should we integrate this with a test that outputs image numerics we can compare against? I know you saw significantly different numerics between the CPU and the different GPU backends, which may make a direct comparison difficult. Maybe some FID/CLIP scores instead?
Doing a faithful comparison with the diffusers reference is a larger problem; we are really best off investing in getting real CLIP/FID scores with a validation dataset. This diffusers reference is really just a holdover/sanity check for now; I don't even trust it to give us a decent baseline from CPU. We could leave this out for now, but I'd rather keep it just to have something ready for comparison with diffusers on ROCm/CUDA.
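For illustration only, here is a minimal sketch of the kind of per-pixel sanity check discussed above. It is not part of this PR and is no substitute for CLIP/FID scoring on a validation dataset. The names `compare_pixels` and `roughly_matches` are hypothetical, and the tolerances are placeholders rather than tuned values. It uses only the standard library and takes flat pixel buffers as input.

```python
import math
from typing import Dict, Sequence


def compare_pixels(
    reference: Sequence[float],
    candidate: Sequence[float],
    max_value: float = 255.0,
) -> Dict[str, float]:
    """Return simple error statistics for two equally sized pixel buffers."""
    if len(reference) != len(candidate):
        raise ValueError("pixel buffers must have the same length")
    if len(reference) == 0:
        raise ValueError("pixel buffers must not be empty")

    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    mse = sum(d * d for d in diffs) / len(diffs)
    # PSNR is undefined for identical buffers; report infinity in that case.
    psnr_db = math.inf if mse == 0 else 10.0 * math.log10(max_value**2 / mse)
    return {
        "max_abs_diff": float(max(diffs)),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "psnr_db": psnr_db,
    }


def roughly_matches(
    stats: Dict[str, float],
    max_abs_tol: float = 8.0,   # placeholder tolerance (assumption)
    mean_abs_tol: float = 1.0,  # placeholder tolerance (assumption)
) -> bool:
    """Hypothetical pass/fail rule based on the statistics above."""
    return (
        stats["max_abs_diff"] <= max_abs_tol
        and stats["mean_abs_diff"] <= mean_abs_tol
    )
```

Assuming Pillow is installed, two saved outputs of the same size could be compared with `compare_pixels(Image.open(a).convert("RGB").tobytes(), Image.open(b).convert("RGB").tobytes())`. Given the backend-dependent numerics mentioned in the review, a check like this would likely only catch gross regressions.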