Updated code for conda, python 3.7, PyTorch 1.8, cuda 10/11 on ubuntu 20.04 #592

Open
wants to merge 22 commits into
base: develop

@mikeseven mikeseven commented May 1, 2021

This PR updates AIMET to nearly the latest versions of its dependencies, in a conda environment.

  • aimet environment in packaging/environment.yml
  • Python 3.7
  • google test 1.10
  • cuda 10 and 11

TensorFlow remains at 1.15 since AIMET uses deprecated contrib code that is not available in TF 2. Hopefully the AIMET team will upgrade one day!

The conda TF 1.15 package conflicts with PyTorch on CUDA, forcing PyTorch to be CPU-only. There is no such issue with TF 2.
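To see which PyTorch build the conda solver actually installed, a small check like the one below can help. It is only a sketch: the function name `describe_torch_build` is made up for illustration, and it relies on `torch.version.cuda` being `None` for builds without CUDA support.

```python
import importlib.util


def describe_torch_build():
    """Return a short description of the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the check also runs without PyTorch

    if torch.version.cuda is None:
        # Builds without CUDA support report no CUDA toolkit version.
        return f"PyTorch {torch.__version__}: CPU-only build"
    if not torch.cuda.is_available():
        return (f"PyTorch {torch.__version__}: built for CUDA "
                f"{torch.version.cuda}, but no usable GPU was found")
    return (f"PyTorch {torch.__version__}: CUDA {torch.version.cuda} "
            "build with a GPU available")


print(describe_torch_build())
```

Running this inside the activated environment shows whether the TF 1.15 conflict pushed PyTorch to a CPU-only package.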

Most tests pass (dobuildntest.sh -u), though many report errors on both CPU and GPU. AIMET team, please check.

The Docker containers are not updated. They are still on old versions, with many tests not passing.

AIMET team, please help make this code robust.

Mikael Bourges-Sevenier and others added 21 commits May 1, 2021 12:14
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Sangeetha Siddegowda <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
…tom markers with pytorch layer names

Also reverted back ONNX export mode to default(eval) from training

Signed-off-by: Abhi Khobare <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Abhi Khobare <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
… nodes.

Signed-off-by: Abhi Khobare <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Abhi Khobare <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Sendil Krishna <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Bharath Ramaswamy <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
TF tests fail because using tf.contrib modules removed in TF 2.x

Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
added conda environment

Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
updated to cmake 3.18+ to support cuda architectures

Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
CPU tests passing in conda

Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
…f quantizers

Signed-off-by: Sangeetha Siddegowda <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
…rner cases

Signed-off-by: Abhi Khobare <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Abhi Khobare <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
Signed-off-by: Mikael Bourges-Sevenier <[email protected]>
@mikeseven
Author

The PR is failing because your CI uses an older version of CMake. Without a newer version, CUDA and LAPACK are not correctly found and linked in conda environments. @quic-akhobare please upgrade.
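A CI job could fail fast on an outdated CMake with a pre-build check along these lines. This is a hedged sketch rather than part of the PR: the helper names are hypothetical, and the 3.18 minimum is taken from the commit message "updated to cmake 3.18+ to support cuda architectures".

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 18)  # assumption: from the commit message "updated to cmake 3.18+"


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def cmake_is_new_enough(minimum=MIN_CMAKE):
    """Return True/False for the cmake on PATH, or None if it cannot be determined."""
    exe = shutil.which("cmake")
    if exe is None:
        return None
    output = subprocess.run([exe, "--version"],
                            capture_output=True, text=True).stdout
    version = parse_cmake_version(output)
    return None if version is None else version[:2] >= minimum


print(cmake_is_new_enough())
```

Comparing only the `(major, minor)` pair keeps the check independent of patch releases and release-candidate suffixes.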

@quic-ssiddego
Contributor

quic-ssiddego commented Jul 6, 2021

@mikeseven Thank you for pushing this update, and sorry about the delayed response. A quick note: we are working on updating AIMET to support TF 2.0. @quic-akhobare @quic-bharathr could you please review this update?

@mikeseven
Author

mikeseven commented Jul 6, 2021

Can't wait to test the new version with TF 2. Any tentative timeline?
The updates in this PR are really for build purposes, i.e., they should work with TF 2 too.

@quic-akhobare
Contributor

> Can't wait to test the new version with TF 2. Any tentative timeline?
> The updates in this PR are really for build purposes, i.e., they should work with TF 2 too.

Difficult to project an ETA at the moment.

One thing we realized is that several contrib modules were deprecated and removed in TF 2.4. For example, AIMET currently depends on the contrib.quantize.python.graph_matcher module in TF 1.15, which is not present in TF 2.4. We are looking at alternatives. @mikeseven If you happen to be familiar with this module, any suggestions would be welcome.
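The dependency described above can be probed at runtime before any graph-matching code is imported. The following is a minimal sketch, not AIMET's actual approach: `module_available` is a hypothetical helper, and the full module path assumes the TF 1.15 layout `tensorflow.contrib.quantize.python.graph_matcher`.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported in the current environment."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Assumed full path of the module under TF 1.15; tf.contrib is absent in TF 2.x.
GRAPH_MATCHER = "tensorflow.contrib.quantize.python.graph_matcher"

if module_available(GRAPH_MATCHER):
    print("graph_matcher is importable: TF 1.x contrib is available")
else:
    print("graph_matcher is not importable: TF 2.x, or TensorFlow is not installed")
```

A guard like this lets code paths that need the contrib matcher be skipped or replaced cleanly on TF 2.x instead of failing at import time.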

# Conflicts:
#	Jenkins/Dockerfile
#	TrainingExtensions/common/CMakeLists.txt
#	TrainingExtensions/tensorflow/src/QcQuantizeOpDeprecated.hpp
#	TrainingExtensions/tensorflow/test/python/test_qc_quantize_op_deprecated.py
#	TrainingExtensions/torch/src/python/aimet_torch/onnx_utils.py
#	TrainingExtensions/torch/src/python/aimet_torch/torchscript_utils.py
#	TrainingExtensions/torch/test/python/test_onnx_utils.py
#	TrainingExtensions/torch/test/python/test_quantizer.py
#	dobuildntest.sh
#	packaging/dependencies/reqs_pip_common.txt
#	packaging/version.txt

5 participants