
tensorflow/directml is slow compared to coremltools on MacOS #68

Open
daverumph opened this issue Oct 7, 2020 · 2 comments
@daverumph
I'm trying to deploy a multi-model tool for mouse behavior classification on Linux, Windows, and Mac. On Linux I use TensorFlow 1.15 directly, with CUDA drivers to access the GPU(s). On Mac I translate the models into .mlmodel files using coremltools. On Windows I'm trying to use tensorflow-directml so I can easily use whatever GPU (Nvidia or AMD) is available. I'm finding that, on the same laptop with an AMD GPU (a MacBook Pro), the tf-directml version runs about 3x slower than the mlmodel version on macOS. Here are some stats:

Model        mlmodel     tf-directml   Notes
detection    0.033 sec   0.088 sec     based on Inception-ResNet-v2
pose         0.067 sec   0.248 sec     8-stack hourglass
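(For context, per-inference timings like the ones above are usually taken as an average over many calls, after a few warmup runs so that one-time graph construction or kernel compilation costs don't skew the mean. A minimal sketch of such a harness, with a stand-in callable where a real `session.run()` or `model.predict()` call would go:)

```python
import time

def benchmark(infer_fn, warmup=5, runs=50):
    """Return mean seconds per call of a zero-argument inference callable.

    A few warmup calls are made first, since the first inference often
    pays one-time graph/compilation costs that shouldn't be averaged in.
    """
    for _ in range(warmup):
        infer_fn()
    start = time.perf_counter()
    for _ in range(runs):
        infer_fn()
    return (time.perf_counter() - start) / runs

# Stand-in workload; a real measurement would wrap the model's inference call.
mean_sec = benchmark(lambda: sum(range(10_000)))
print(f"{mean_sec:.6f} sec per inference")
```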

I realize I'm running a very early version. Do you expect the performance to improve substantially? Do you have a guess as to when we might see performance improvements?

@adtsai
Contributor

adtsai commented Oct 8, 2020

Hi Dave, as you mentioned, we're still in a very early preview stage, and so far we've focused on bringing up functionality and stability, which means we haven't had much opportunity to look at performance yet. As you've noticed, there's ample room for improvement! It's something we're aware of, and we do expect to make substantial strides in GPU performance in the future, although we don't yet have a concrete timeline for when that'll become available. One thing that would help our profiling and performance testing is a look at the types of models you're using. You mentioned Inception-ResNet-v2 - is there a particular implementation you're using that's available elsewhere (e.g. on GitHub) that we could take a look at?

@adtsai adtsai transferred this issue from microsoft/DirectML Oct 8, 2020
@daverumph
Author

Hi Adrian,

Thanks for your reply.

Our version of Inception-ResNet-v2 is our own implementation, but it should have the same layers as published versions, including the one in the Keras models in the TensorFlow repository. We add some input processing at the beginning and a detection head at the end. I'm attaching our version, which still uses the deprecated "slim" contrib package, as model_detection.py.

The other model we need GPU acceleration for is a stacked hourglass heatmap model, which we implemented based on a published paper. I'm attaching our implementation as model_pose.py. We currently use a stack of 8, but have determined that accuracy, at least on mice, doesn't suffer much when the stack size is reduced to 4.
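(Not the attached implementation - just an illustrative pure-Python sketch of the stacking idea: each hourglass stage refines the previous stage's heatmaps, and every intermediate output is kept so a loss can be applied at each stage, which is why the stack size is a simple accuracy/speed knob. The `hourglass_stage` function here is a hypothetical stand-in for a real encoder-decoder network.)

```python
import numpy as np

def hourglass_stage(features, heatmaps_prev):
    # Stand-in for one encoder-decoder hourglass module; a real
    # implementation would be a convolutional network.
    return 0.5 * features + 0.5 * heatmaps_prev

def stacked_hourglass(features, stack_size=8):
    """Run `stack_size` refinement stages, keeping every intermediate
    heatmap output for intermediate supervision."""
    heatmaps = np.zeros_like(features)
    outputs = []
    for _ in range(stack_size):
        heatmaps = hourglass_stage(features, heatmaps)
        outputs.append(heatmaps)
    return outputs  # training loss is typically summed over all stages

outs = stacked_hourglass(np.ones((4, 4)), stack_size=4)
print(len(outs))  # 4
```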

I hope this helps your acceleration efforts. Please let me know if there is anything else I can provide or do to help.

Regards
models.zip
