Not able to get good performance for diffusion models when doing single image inference with batch size 1 #1195

Open
basantaxpatra opened this issue Aug 2, 2024 · 1 comment
Labels
bug Something isn't working

@basantaxpatra

System Info

System Configuration: Single node Habana Gaudi setup
Firmware Version: hl-1.15.0-fw-48.2.1.1
Software Stack: Synapse AI 1.15

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

$ docker pull vault.habana.ai/gaudi-docker/1.15.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest
$ docker run --rm -it vault.habana.ai/gaudi-docker/1.15.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest bash
$ git clone git@github.com:huggingface/optimum-habana.git
$ cd optimum-habana
$ pip install .
$ cd examples/stable-diffusion
$ pip install -r requirements.txt
$ python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
    --num_images_per_prompt 20 \
    --batch_size 8 \
    --image_save_dir /tmp/stable_diffusion_xl_images \
    --scheduler euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Logs for reference:
2 prompt(s) received, 20 generation(s) per prompt, 8 sample(s) per batch, 5 total batch(es).
{'generation_runtime': 470.2324, 'generation_samples_per_second': 0.219, 'generation_steps_per_second': 0.068}

The initial compilation took about 170 seconds. Disregarding that, generation took roughly 300 seconds for 32 images, which is ~9.2 seconds per image on SDXL (H100s take around 2-3 seconds per image, depending on sampling parameters).
[{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:17:58.694850", "statistics": {"TotalNumber": 1, "TotalTime": 2406683, "AvgTime": 2406683.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:19:04.525621", "statistics": {"TotalNumber": 2, "TotalTime": 66733949, "AvgTime": 33366974.5}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:19:05.394485", "statistics": {"TotalNumber": 3, "TotalTime": 66871477, "AvgTime": 22290492.333333332}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:20:08.701577", "statistics": {"TotalNumber": 4, "TotalTime": 130001484, "AvgTime": 32500371.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:20:09.602500", "statistics": {"TotalNumber": 5, "TotalTime": 130138275, "AvgTime": 26027655.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:20:58.849669", "statistics": {"TotalNumber": 6, "TotalTime": 144735532, "AvgTime": 24122588.666666668}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:22:02.477322", "statistics": {"TotalNumber": 7, "TotalTime": 207751639, "AvgTime": 29678805.57142857}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:22:03.371944", "statistics": {"TotalNumber": 8, "TotalTime": 207892568, "AvgTime": 25986571.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:23:06.978577", "statistics": {"TotalNumber": 9, "TotalTime": 271316124, "AvgTime": 30146236.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:23:56.499370", "statistics": {"TotalNumber": 10, "TotalTime": 285510855, "AvgTime": 28551085.5}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:23:57.930979", "statistics": {"TotalNumber": 11, "TotalTime": 285652606, "AvgTime": 25968418.727272727}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:23:58.791526", "statistics": {"TotalNumber": 12, "TotalTime": 285788064, "AvgTime": 23815672.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:26:00.652013", "statistics": {"TotalNumber": 13, "TotalTime": 299983406, "AvgTime": 23075646.615384616}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:26:01.511422", "statistics": {"TotalNumber": 14, "TotalTime": 300058888, "AvgTime": 21432777.714285713}},
{"metric_name": "graph_compilation", "triggered_by": "process_exit", "generated_on": "2024-06-10T19:26:15.341419", "statistics": {"TotalNumber": 14, "TotalTime": 300058888, "AvgTime": 21432777.714285713}},
{"metric_name": "cpu_fallback", "triggered_by": "process_exit", "generated_on": "2024-06-10T19:26:15.341498", "statistics": {"TotalNumber": 0, "FallbackOps": {}}},
{"metric_name": "memory_defragmentation", "triggered_by": "process_exit", "generated_on": "2024-06-10T19:26:15.341520", "statistics": {"TotalNumber": 0, "TotalSuccessful": 0, "AvgTime": 0, "MaxTime": 0}}]
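
The `graph_compilation` entries above report running totals, so the cost of each individual compilation is easier to see as the difference between consecutive `TotalTime` values. A minimal sketch of that calculation follows. It assumes `TotalTime` is cumulative and expressed in microseconds (the log does not state the unit, so this is a guess), and the helper name `compilation_deltas` is made up for illustration. Only the first four entries of the log are embedded to keep the example short.

```python
import json

# First four "graph_compilation" entries, copied verbatim from the log above.
SAMPLE = """[
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:17:58.694850", "statistics": {"TotalNumber": 1, "TotalTime": 2406683, "AvgTime": 2406683.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:19:04.525621", "statistics": {"TotalNumber": 2, "TotalTime": 66733949, "AvgTime": 33366974.5}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:19:05.394485", "statistics": {"TotalNumber": 3, "TotalTime": 66871477, "AvgTime": 22290492.333333332}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:20:08.701577", "statistics": {"TotalNumber": 4, "TotalTime": 130001484, "AvgTime": 32500371.0}}
]"""


def compilation_deltas(records, unit_divisor=1_000_000):
    """Return (compilation_number, seconds) pairs for each new compilation.

    unit_divisor converts TotalTime to seconds; 1_000_000 assumes
    microseconds, which is an assumption and not stated in the log.
    """
    deltas = []
    previous_total = 0
    previous_count = 0
    for record in records:
        if record["metric_name"] != "graph_compilation":
            continue  # skip cpu_fallback, memory_defragmentation, ...
        stats = record["statistics"]
        if stats["TotalNumber"] == previous_count:
            continue  # e.g. the process_exit entry repeats the last totals
        seconds = (stats["TotalTime"] - previous_total) / unit_divisor
        deltas.append((stats["TotalNumber"], seconds))
        previous_total = stats["TotalTime"]
        previous_count = stats["TotalNumber"]
    return deltas


if __name__ == "__main__":
    for number, seconds in compilation_deltas(json.loads(SAMPLE)):
        print(f"compilation #{number}: {seconds:.2f} s")
```

Under that unit assumption, a few compilations in this sample account for about a minute each while others take well under a second, which may help narrow down which graphs are being recompiled.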

Expected behavior

After disregarding the ~170 seconds of initial compilation, generation takes roughly 300 seconds for 32 images, which is ~9.2 seconds per image on SDXL. Expected performance is ~2-3 seconds per image.
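
The per-image estimate can be reproduced from the logged numbers. This is a rough sketch rather than an exact accounting: the 170-second compilation time is the approximate figure quoted in this issue, and the function name `seconds_per_image` is only illustrative. The script output reports 2 prompts x 20 generations = 40 images in 5 batches of 8, while the estimate uses 32 images (4 batches), so both counts are shown.

```python
def seconds_per_image(total_runtime_s, compilation_s, num_images):
    """Average seconds per image after subtracting compilation time."""
    return (total_runtime_s - compilation_s) / num_images


# 'generation_runtime' copied from the script output in this issue.
GENERATION_RUNTIME_S = 470.2324
# Approximate initial compilation time quoted in this issue.
COMPILATION_S = 170.0

if __name__ == "__main__":
    for num_images in (32, 40):
        latency = seconds_per_image(GENERATION_RUNTIME_S, COMPILATION_S, num_images)
        print(f"{num_images} images: {latency:.2f} s per image")
```

With these rounded inputs the 32-image figure comes out near 9.4 seconds per image and the 40-image figure near 7.5 seconds, both in the same range as the ~9.2 seconds quoted and well above the 2-3 second target.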

@regisss
Collaborator

regisss commented Oct 21, 2024

@basantaxpatra Are you still seeing this issue on newer versions of the lib?
