tensorflow on OSX Mac M1 Pro/Max silicon 32 cores and Windows 11 13900K with dual RTX-A4500/A4000 workstation and dual RTX-4090 consumer #13
Running using only the CPU on the M1 Pro - checking the M1 Max next
|
Try https://github.com/apple/tensorflow_macos
Actually that repo is 2 years old and links back to https://developer.apple.com/metal/tensorflow-plugin/
|
Try again
Retry using the better setup - check both CPU and GPU
|
code
|
Retest on MacBook Pro M1 Max 32G 8p/2e 32-core GPU https://developer.apple.com/metal/tensorflow-plugin/ - test CPU only first
Adding GPU capability: 54ms/step
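Before benchmarking, it is worth confirming the Metal plugin actually exposes the GPU. A minimal sketch - `describe_devices` is a hypothetical helper of mine, not part of the TensorFlow API; only `tf.config.list_physical_devices()` (shown in the comment) is the real call:

```python
def describe_devices(devices):
    """Summarize the device list returned by tf.config.list_physical_devices()."""
    gpus = [d for d in devices if "GPU" in str(d)]
    return f"{len(gpus)} GPU(s) visible" if gpus else "CPU only"

# With tensorflow-macos + tensorflow-metal installed (per the Apple link above):
#   import tensorflow as tf
#   print(describe_devices(tf.config.list_physical_devices()))
print(describe_devices(["PhysicalDevice(name='/physical_device:CPU:0')",
                        "PhysicalDevice(name='/physical_device:GPU:0')"]))
# -> 1 GPU(s) visible
```

If this reports "CPU only" on Apple silicon, the tensorflow-metal plugin is not active and all steps will run on CPU, as in the first test above.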
|
On the slowest Windows machine without a GPU - Lenovo X1 Carbon Gen 9
|
On the fastest Windows laptop - Lenovo P17 Gen 1 - Xeon W-10855M 2.8GHz with RTX-5000 (TU104) https://www.tensorflow.org/install/pip#windows-native
GPU support in Windows WSL
|
On Windows, tensorflow-directml is based on TensorFlow 1.15 only
Changed the batch size from 64 to 128 and epochs from 5 to 10 - power draw went from 185W to 226W (38W idle)
Adjust the batch size based on the GPU - in this case the AD102 at 16384 CUDA cores:
a batch of 4096 sustains the 350 watt max with 400 watt peaks
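The "adjust the batch to the GPU" rule above can be sketched as a simple linear scaling with CUDA core count, floored to a power of two - this heuristic is my reading of the thread, not a stated rule, and the 7168-core figure for the RTX-A4500 is my assumption:

```python
def scale_batch(base_batch, base_cores, cores):
    """Scale batch size linearly with CUDA core count,
    floored to a power of two (heuristic, not a hard rule)."""
    raw = base_batch * cores / base_cores
    power = 1
    while power * 2 <= raw:
        power *= 2
    return power

# AD102 (16384 cores) ran well at batch 4096; project to the
# RTX-A4500 (GA102, assumed 7168 CUDA cores):
print(scale_batch(4096, 16384, 7168))  # -> 1024
```

Real limits depend on VRAM and model size as much as core count, so treat the result as a starting point, not a guarantee.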
|
Forced an OOM at 23/24G VRAM by using a batch size of 8192 on the 16384-core GPU
There are also issues using multiple GPUs on DirectML
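The forced OOM gives a rough per-sample memory figure that can be turned into a safe batch estimate for a smaller card. Arithmetic sketch only - the assumption that memory scales linearly with batch (ignoring fixed model/framework overhead) is mine:

```python
def max_batch(vram_bytes, bytes_per_sample, headroom=0.9):
    """Largest batch that fits in vram_bytes, leaving some headroom
    (assumes memory use is roughly linear in batch size)."""
    return int(vram_bytes * headroom // bytes_per_sample)

# Batch 8192 just hit OOM at ~24 GB on the RTX-4090, so roughly:
per_sample = 24e9 / 8192            # ~2.9 MB per sample
print(max_batch(16e9, per_sample))  # estimate for a 16 GB card -> 4915
```

That lands near the 4096 batch that ran sustainably above, which is some sanity check on the linear assumption.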
|
Is the multi-GPU issue specific to DirectML, or upstream TensorFlow?
tensorflow/tensorflow#19083
|
Moving distribution to #15 |
Look at steps_per_epoch in keras.Model.fit to align the M1 and RTX runs
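With very different batch sizes per machine, fixing steps per epoch to cover the same number of samples keeps the runs comparable. A minimal sketch of the arithmetic (the 60,000-sample MNIST-sized dataset is an example assumption):

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Steps so one epoch covers the dataset once - the value model.fit
    derives itself when steps_per_epoch is left unset."""
    return math.ceil(num_samples / batch_size)

# Same 60,000-sample dataset at the two batch sizes in play:
print(steps_per_epoch(60_000, 64))    # -> 938 (M1 default batch)
print(steps_per_epoch(60_000, 4096))  # -> 15  (RTX-4090 batch)
```

Passing these values explicitly to `model.fit(..., steps_per_epoch=...)` makes an "epoch" mean the same amount of data on both machines, so ms/step and time-to-accuracy can be compared directly.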
|
Training times to get past 0.7 accuracy:
MacBook Pro M1 Max 32-core GPU
custom RTX-A4500 GA102 16G on 13900K
custom RTX-4090 AD102 24G on 13900K
Lenovo P17 Gen 1 RTX-5000 TU104
|
TensorFlow GPU on Docker for Windows
|
Revisiting on the 13900K
running
|
See move/updates on M4 Max, RTX-3500, RTX-6000
#71
and
ObrienlabsDev/machine-learning#30
ObrienlabsDev/machine-learning#31
ObrienlabsDev/machine-learning#32
ObrienlabsDev/machine-learning#33
ObrienlabsDev/machine-learning#34
ObrienlabsDev/machine-learning#35
RTX-4090 Ada generation consumer cards
RTX-A4500 Ampere generation professional workstation cards
Stats
https://github.com/ObrienlabsDev/blog/wiki/Machine-Learning-on-local-or-Cloud-based-NVidia-or-Apple-GPUs
Note: a CPU-only run shows ~340% CPU utilization while a GPU run still shows ~100% CPU, so roughly one full core (100%) is CPU-side overhead when the GPU does the work
Mac Mini 2020 M1
MacBook Pro 14 M1 Pro 16G 4p/4e 8-core GPU = 516ms/step CPU at 50%, 79ms/step GPU = 6.5x faster GPU
MacBook Pro 16 M1 Max 32G 8p/2e 32-core GPU = 437ms/step CPU at 50%, 54ms/step GPU = 10.5x faster GPU and 1.2x/1.5x faster than the M1 Pro (49ms using a batch of 32 down from 64 - matching GPU size - 2.4/32G VRAM)
Lenovo P17 Gen1 128G RTX-5000 TU104 using a batch of 5120 = 190us and 15.6/16G VRAM
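The "Nx faster GPU" figures in the stats above are just ms/step ratios; a one-liner sketch reproducing the 6.5x M1 Pro number from the timings quoted there:

```python
def speedup(cpu_ms_per_step, gpu_ms_per_step):
    """CPU-vs-GPU speedup from per-step timings, to one decimal."""
    return round(cpu_ms_per_step / gpu_ms_per_step, 1)

# M1 Pro 8-core GPU figures from the stats above:
print(speedup(516, 79))  # -> 6.5
# M1 Max 32-core GPU (437ms CPU vs 54ms GPU):
print(speedup(437, 54))
```

Note the raw 437/54 ratio comes out near 8x; the 10.5x figure above presumably uses the faster 49ms batch-32 GPU timing, so it pays to state which batch size a ratio was measured at.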
Follow https://developer.apple.com/metal/tensorflow-plugin/
Better: https://www.mrdbourke.com/setup-apple-m1-pro-and-m1-max-for-machine-learning-and-data-science/
Base system: M1 Pro 4p/4e 8-core GPU