I'm trying to deploy a multi-model tool for mouse behavior classification on Linux, Windows, and Mac. For Linux, I use TensorFlow 1.15 directly, with CUDA drivers to access the GPU(s). For Mac, I translate the models into .mlmodel files using coremltools. For Windows, I'm trying to use tensorflow-directml so I can easily utilize whatever GPU (NVIDIA or AMD) is available. I'm finding that, on the same laptop with an AMD GPU (a MacBook Pro), the tf-directml version runs about 3x slower than the .mlmodel version on macOS. Here are some stats:
| Model | mlmodel | tf-directml | Notes |
| --- | --- | --- | --- |
| detection | 0.033 sec | 0.088 sec | based on Inception-ResNet-v2 |
| pose | 0.067 sec | 0.248 sec | 8-stack hourglass |
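For context, here is a minimal sketch of the kind of per-inference timing loop that can produce numbers like these. The graph path and tensor names are hypothetical placeholders; since tensorflow-directml is a drop-in fork of TF 1.15, a plain `import tensorflow` is all that's needed to dispatch to the DirectML device.

```python
# Minimal per-inference timing sketch (TF 1.15 API; runs unchanged under
# tensorflow-directml). Graph path and tensor names are hypothetical.
import time

import numpy as np
import tensorflow as tf

GRAPH_PB = "model_detection_frozen.pb"   # hypothetical frozen graph
INPUT_TENSOR = "input:0"                 # hypothetical tensor names
OUTPUT_TENSOR = "detections:0"

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(GRAPH_PB, "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.compat.v1.Session() as sess:
    tf.import_graph_def(graph_def, name="")
    frame = np.random.rand(1, 299, 299, 3).astype(np.float32)

    # Warm up once so graph setup and allocation are excluded from timing.
    sess.run(OUTPUT_TENSOR, feed_dict={INPUT_TENSOR: frame})

    n = 100
    start = time.time()
    for _ in range(n):
        sess.run(OUTPUT_TENSOR, feed_dict={INPUT_TENSOR: frame})
    print("mean per-inference time: %.3f sec" % ((time.time() - start) / n))
```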
I realize I'm running a very early version. Do you expect the performance to improve substantially? Do you have a guess as to when we might see performance improvements?
Hi Dave, as you mentioned, we're still at a very early preview stage, and so far we've been focused on bringing up functionality and stability, which means we haven't had much opportunity to look at performance yet. As you've noticed, there's ample room for improvement! It's something we're aware of, and we do expect to make substantial strides in GPU performance in the future, although we don't yet have a concrete timeline for when that will become available. One thing that would help our profiling and performance testing is a look at the types of models you're using. You mentioned Inception-ResNet-v2; is there a particular implementation available elsewhere (e.g., on GitHub) that we could take a look at?
Our Inception-ResNet-v2 implementation is our own, but it should have the same layers as published versions, including the Keras model in the TensorFlow repository. We add some input processing at the beginning and a detection head at the end. I'm attaching our version, which still uses the deprecated "slim" contrib package, as model_detection.py.
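For anyone who wants a self-contained approximation rather than the attached file, a rough sketch of the same structure (stock Inception-ResNet-v2 backbone plus a head) can be built from the Keras application model. The head below is a hypothetical stand-in, not our actual detection head:

```python
# Hypothetical sketch: Keras Inception-ResNet-v2 backbone with a simple
# stand-in detection head. This is NOT the attached model_detection.py,
# just an illustration of the backbone-plus-head structure described above.
import tensorflow as tf

def build_detection_sketch(num_outputs=5, input_shape=(299, 299, 3)):
    backbone = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights=None, input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    # Stand-in head: e.g. box coordinates plus a confidence score.
    outputs = tf.keras.layers.Dense(num_outputs, name="detection_head")(x)
    return tf.keras.Model(backbone.input, outputs)

model = build_detection_sketch()
model.summary()
```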
The other model we need GPU acceleration for is a stacked-hourglass heatmap model, which we implemented based on a published paper. I'm attaching our implementation as model_pose.py. We currently use a stack of 8, but we've determined that accuracy, at least on mice, doesn't suffer much when the stack size is reduced to 4.
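For a quick sense of the architecture without reading model_pose.py, here is a compact, hypothetical sketch of the N-stack hourglass idea: recursive encoder-decoder modules with skip connections at each resolution, with per-stack heatmap outputs fed back into the next stack. Filter counts, depth, and resolutions are illustrative only:

```python
# Hypothetical N-stack hourglass heatmap sketch (Keras, TF 1.x compatible).
# All sizes are illustrative; the attached model_pose.py is the real model.
import tensorflow as tf

L = tf.keras.layers

def conv_block(x, filters):
    x = L.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return L.BatchNormalization()(x)

def hourglass(x, filters, depth):
    # Recursive encoder-decoder with a skip connection at each resolution.
    skip = conv_block(x, filters)
    x = L.MaxPooling2D(2)(x)
    x = conv_block(x, filters)
    if depth > 1:
        x = hourglass(x, filters, depth - 1)
    x = conv_block(x, filters)
    x = L.UpSampling2D(2)(x)
    return L.Add()([x, skip])

def build_pose_sketch(num_joints=12, num_stacks=8, filters=128):
    inp = tf.keras.Input(shape=(256, 256, 3))
    x = conv_block(inp, filters)
    heatmaps = []
    for _ in range(num_stacks):
        x = hourglass(x, filters, depth=4)
        hm = L.Conv2D(num_joints, 1)(x)  # per-stack heatmap output
        heatmaps.append(hm)
        # Project the heatmaps back to feature width so later stacks refine them.
        x = L.Add()([x, L.Conv2D(filters, 1)(hm)])
    return tf.keras.Model(inp, heatmaps)

model = build_pose_sketch()
```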
I hope this helps your acceleration efforts. Please let me know if there is anything else I can provide or do to help.