
Super slow on Mac MPS #16

Open
ran-weii opened this issue Nov 12, 2024 · 2 comments

@ran-weii

Hi, a follow-up on #15: I compared CPU vs MPS and compile vs no compile on HalfCheetah for 100k steps using SAC. The results show that MPS is significantly slower than CPU, and the aot_eager backend makes compile slower (much more so on CPU), though the default inductor backend makes compile quite a bit faster on CPU but doesn't work for MPS.

[Screenshot: wall-clock comparison of CPU vs MPS, with and without torch.compile]

The code change is the following:

if args.compile:
    mode = None  # "reduce-overhead" if not args.cudagraphs else None
    # inductor does not support MPS, so fall back to the aot_eager backend there
    backend = "aot_eager" if device == torch.device("mps") else "inductor"
    update_main = torch.compile(update_main, mode=mode, backend=backend)
    update_pol = torch.compile(update_pol, mode=mode, backend=backend)
    policy = torch.compile(policy, mode=mode, backend=backend)
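
For reference, a minimal sketch of how the per-configuration speed could be measured in isolation (the `measure_sps` helper and its arguments are hypothetical, not part of the leanrl script; the numbers in the screenshot come from the full training run):

import time
import torch

def measure_sps(update_fn, batch, n_steps=1_000, device="cpu"):
    """Rough steps-per-second for a single update function.

    `update_fn` and `batch` are hypothetical stand-ins for the (possibly
    compiled) SAC update and a sampled replay batch.
    """
    # Warm up so compilation/tracing time is not counted.
    for _ in range(3):
        update_fn(batch)
    if device == "mps":
        torch.mps.synchronize()  # MPS kernels launch asynchronously
    start = time.perf_counter()
    for _ in range(n_steps):
        update_fn(batch)
    if device == "mps":
        torch.mps.synchronize()
    return n_steps / (time.perf_counter() - start)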
@vmoens
Contributor

vmoens commented Nov 14, 2024

I'm looking into this. There aren't any more graph breaks, but even collecting data is slower on MPS.
I ran this

python -m cProfile -o prof.prof leanrl/sac_continuous_action_torchcompile.py --compile --learning_starts=100 --total_timesteps=5000
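
One way to inspect the resulting prof.prof dump (a sketch using the standard-library pstats module; the sort key and row count are just examples):

import pstats

# Load the profile written by `python -m cProfile -o prof.prof ...`
stats = pstats.Stats("prof.prof")
# Sort by cumulative time and print the 20 most expensive entries;
# this is where calls like torch.randint and the torch.fx helpers surface.
stats.sort_stats("cumulative").print_stats(20)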

There are several interesting things in the profile:

  • 13% of runtime is spent in torch.randint
    [profile screenshot]
    This benchmark is also funny to look at (a reproduction sketch follows the list):
<torch.utils.benchmark.utils.common.Measurement object at 0x1147fde10>
torch.randint(1_000_000, (50,), device='cpu')
  Median: 1.87 us
  IQR:    0.08 us (1.83 to 1.92)
  5144 measurements, 1 runs per measurement, 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x10022abc0>
torch.randint(1_000_000, (50,), device='cpu').to('mps')
  Median: 596.19 us
  IQR:    57.96 us (578.77 to 636.73)
  18 measurements, 1 runs per measurement, 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x1147fee90>
torch.randint(1_000_000, (50,), device='mps')
  Median: 20.42 us
  IQR:    1.67 us (20.04 to 21.71)
  449 measurements, 1 runs per measurement, 1 thread

I raised an issue about this: pytorch/pytorch#140706

  • Another big chunk of time is spent in torch.fx-related functions, which is also quite weird:
    [profile screenshot]
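
The Measurement blocks quoted above are torch.utils.benchmark output; a minimal sketch that reproduces the same three torch.randint timings (on a machine with an MPS device):

import torch
from torch.utils import benchmark

# The three cases measured above: sampling on CPU, sampling on CPU then
# copying to MPS, and sampling on MPS directly.
stmts = [
    "torch.randint(1_000_000, (50,), device='cpu')",
    "torch.randint(1_000_000, (50,), device='cpu').to('mps')",
    "torch.randint(1_000_000, (50,), device='mps')",
]
for stmt in stmts:
    t = benchmark.Timer(stmt=stmt, globals={"torch": torch})
    print(t.blocked_autorange())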

I'll keep you posted, but working with an MPS backend may not be a suitable option for the time being!

@ran-weii
Author

Thanks!
