HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason() #359
Comments
I have the exact same issue and message running a simple Keras model. Fresh install, etc.
Hi @HeloWong, @FabricatiDiem, would you mind including the models that you saw this issue with? I'm not seeing this repro on the Keras tutorial model for MNIST, so it would be helpful for me to test using the scripts you're seeing this with. Please also double-check that your environment is using …
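In case it helps others hitting this, a quick sanity check (not from the thread) that the DirectML pluggable device is actually registered might look like the sketch below; it assumes the standard tensorflow-directml-plugin install on top of tensorflow-cpu.

```python
import tensorflow as tf

# With tensorflow-directml-plugin installed on top of tensorflow-cpu,
# the DirectML device should show up in the physical device list as a GPU.
print("TF version:", tf.__version__)
print(tf.config.list_physical_devices())

# Run a tiny op with placement logging enabled to confirm it lands on the plugin device.
tf.debugging.set_log_device_placement(True)
print(tf.matmul(tf.ones((2, 2)), tf.ones((2, 2))))
```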
This is a minimal example that somewhat more closely aligns with my actual use case: https://gist.github.com/FabricatiDiem/07b8645faabb1ea0a887550a0544ea9d

Note that the example works without error using WSL2 + Docker. The example also tends to work if I tweak it, for instance by removing the sparse representation (not feasible in my real use case), by making the feature space smaller, or by reducing the width of the network. It could be a memory issue, but I'm not seeing any memory-related errors, and if it were that, I would expect it to affect the Docker version too.

Also, just upgrading Keras to 2.12 breaks TF entirely for me. I'm using a fresh install of the latest tensorflow-directml-plugin package, which installs TF 2.10 and a number of other dependencies. I'm not able to try out the bleeding-edge GitHub version on my local setup, so if this is already fixed but unreleased, I'm fine with my WSL2 + Docker setup until there's a new release. Thanks for looking at the issue.

Edit: For completeness, my NVIDIA system information can be found here: https://gist.github.com/FabricatiDiem/fe0667aff7dc529a9b439112194f34b6 #341 looks similar, but I'm not sure.
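The linked gist isn't reproduced here, but a rough, hypothetical sketch of the setup described (a large sparse feature space feeding a fairly wide dense network) might look like the following; all shapes, layer sizes, and the synthetic data are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp
import tensorflow as tf

# Hypothetical stand-in for the linked gist: sparse inputs into a wide
# dense network, reportedly fine under WSL2 + Docker but crashing under
# the Windows DirectML plugin.
n_samples, n_features = 10_000, 50_000
x = sp.random(n_samples, n_features, density=0.001, format="coo", dtype=np.float32)
y = np.random.randint(0, 2, size=(n_samples,)).astype(np.float32)

# Convert the SciPy COO matrix into a tf.sparse.SparseTensor for Keras.
x_sp = tf.sparse.reorder(tf.sparse.SparseTensor(
    indices=np.stack([x.row, x.col], axis=1).astype(np.int64),
    values=x.data,
    dense_shape=x.shape,
))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,), sparse=True),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_sp, y, batch_size=256, epochs=2)
```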
I have the same issue on my AMD GPU with the latest driver (23.4.3): on the SqueezeNet example at epoch 38 and on the MNIST example at epoch 11. I built the plugin from source with tensorflow-cpu 2.12.
Having this issue too, using an AMD GPU, with this example: https://github.com/tensorflow/examples/blob/fb13f7e76d50b446b4b395abcdf09bd4aeddb29a/community/en/transformer_chatbot.ipynb
Any update here?
I apologize for the delay. We had to pause the development of this plugin until further notice. For the time being, all latest DirectML features and performance improvements are going into onnxruntime for inference scenarios. We'll update this issue if/when things change. |
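For inference workloads, a minimal sketch of going through onnxruntime with the DirectML execution provider might look like this; the model file name is a placeholder, and it assumes an already-exported ONNX model plus the onnxruntime-directml package.

```python
import numpy as np
import onnxruntime as ort

# Create a session on the DirectML execution provider (falls back to CPU
# if DirectML is unavailable). "model.onnx" is a placeholder path.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 28, 28, 1).astype(np.float32)  # example MNIST-shaped input
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```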
Envs:
TensorFlow 2.12
tensorflow_directml_plugin-0.5.0-cp39-cp39-win_amd64.whl
Python 3.9
Error:
F tensorflow/c/logging.cc:43] HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason(), followed by a Python restart
I built the newest tensorflow_directml_plugin 0.5.0, but when I run the MNIST example on TF,
this error occurs: F tensorflow/c/logging.cc:43] HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason(),
and GPU memory and shared memory grow substantially.
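The exact script isn't attached, but a minimal MNIST training run along the lines of the standard Keras tutorial, which is roughly what the report describes, would look like this.

```python
import tensorflow as tf

# Basic MNIST classifier in the style of the Keras beginner tutorial;
# the report above says a run like this crashes with the HRESULT error
# while GPU and shared memory usage grow.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, batch_size=128)
```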