HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason() #359

Open
HeloWong opened this issue Apr 17, 2023 · 7 comments

@HeloWong

Envs:
TensorFlow 2.12
tensorflow_directml_plugin-0.5.0-cp39-cp39-win_amd64.whl
Python 3.9

Error:
F tensorflow/c/logging.cc:43] HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason(), and Python restarts

I built the newest tensorflow_directml_plugin 0.5.0, but when I run the MNIST example on TF, this error happens:
F tensorflow/c/logging.cc:43] HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason()
GPU memory and shared memory also grow substantially.
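For reference, a minimal sketch of the kind of MNIST script this seems to point at, assuming the standard Keras tutorial model; it may not be the exact script HeloWong ran:

```python
# Hypothetical minimal repro: the standard Keras MNIST tutorial model.
# This is an assumption, not necessarily the exact script from the report.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0  # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=5)  # the crash reportedly happens during training
```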


FabricatiDiem commented Apr 22, 2023

I have the exact same issue and message running a simple Keras model. Fresh install, etc.

maggie1059 (Collaborator)

Hi @HeloWong, @FabricatiDiem, would you mind including the models that you saw this issue with? I'm not seeing this repro on the Keras tutorial model for MNIST, so it would be helpful for me to test using the scripts you're seeing this with. Please also double-check that your environment is using keras==2.12, as this latest version of the plugin is not compatible with previous versions of keras.
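For anyone checking their environment, a quick sketch of the version check being asked for here; the version numbers are the ones mentioned in this thread:

```python
# Quick environment sanity check (expected versions taken from this thread:
# plugin 0.5.0 needs tensorflow 2.12 and keras 2.12).
import tensorflow as tf
import keras

print("tensorflow:", tf.__version__)  # expected: 2.12.x
print("keras:", keras.__version__)    # expected: 2.12.x; older keras versions are not compatible
```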


FabricatiDiem commented Apr 26, 2023

This is a minimal example that somewhat more closely aligns with my actual use case: https://gist.github.com/FabricatiDiem/07b8645faabb1ea0a887550a0544ea9d

Note that the example works without error under WSL2 + Docker. It also tends to work if I tweak it, e.g. by removing the sparse representation (not feasible in my real use case), by making the feature space smaller, or by reducing the width of the network. It could be a memory issue, but I'm not seeing any memory-related errors, and if it were that, I would expect it to affect the Docker version too.
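For anyone who doesn't want to open the gist, a rough sketch of the shape of model being described (sparse input over a large feature space feeding wide dense layers). This is not the linked gist; the feature dimension and layer widths below are placeholders:

```python
# Rough sketch only: sparse input over a large, mostly-empty feature space,
# feeding wide Dense layers. Sizes are placeholders, not values from the gist.
import numpy as np
import scipy.sparse as sp
import tensorflow as tf

n_samples, n_features = 10_000, 50_000
x = sp.random(n_samples, n_features, density=0.001, format="coo", dtype=np.float32)
y = np.random.randint(0, 2, size=(n_samples, 1)).astype(np.float32)

# Convert the scipy COO matrix into a tf.SparseTensor (indices must be int64).
x_sparse = tf.sparse.reorder(tf.SparseTensor(
    indices=np.stack([x.row, x.col], axis=1).astype(np.int64),
    values=x.data,
    dense_shape=x.shape,
))

inputs = tf.keras.Input(shape=(n_features,), sparse=True)
h = tf.keras.layers.Dense(1024, activation="relu")(inputs)  # "wide" hidden layers
h = tf.keras.layers.Dense(1024, activation="relu")(h)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(h)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_sparse, y, epochs=3, batch_size=256)
```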

Also, just upgrading Keras to 2.12 breaks TF entirely for me. I'm using a fresh install of the latest tensorflow-directml-plugin package, which installs TF 2.10 and a bunch of other dependencies. I'm not able to try out the bleeding-edge GitHub version on my local setup, so if it's already fixed but not yet released, I'm fine with my WSL2 + Docker setup until there's a new release.

Thanks for looking at the issue.

Edit: For completeness, my NVIDIA system information can be found here: https://gist.github.com/FabricatiDiem/fe0667aff7dc529a9b439112194f34b6

#341 looks similar, but I'm not sure.


radudiaconu0 commented May 4, 2023

I have the same issue on my AMD GPU with the latest driver (23.4.3): on the SqueezeNet example at epoch 38 and on the MNIST example at epoch 11. I built the plugin from source with tensorflow-cpu 2.12.

@NateAGeek

@radudiaconu0

Any update here?

PatriceVignola (Contributor)

I apologize for the delay. We had to pause development of this plugin until further notice. For the time being, all the latest DirectML features and performance improvements are going into onnxruntime for inference scenarios. We'll update this issue if/when things change.
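For anyone landing here looking for the inference path mentioned above, a minimal sketch of running a model through onnxruntime with the DirectML execution provider; it assumes the onnxruntime-directml package is installed, and "model.onnx" is a placeholder path rather than a file from this thread:

```python
# Minimal sketch: inference via onnxruntime's DirectML execution provider.
# Assumes `pip install onnxruntime-directml`; "model.onnx" is a placeholder.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["DmlExecutionProvider"])

inp = session.get_inputs()[0]
# Replace dynamic dimensions (None or symbolic names) with 1 for a dummy run.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.zeros(shape, dtype=np.float32)

outputs = session.run(None, {inp.name: dummy})
print([o.shape for o in outputs])
```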
