The dp compress with pytorch backend can not do the compression (finetune-->distillation based on deepmd-kit-v3-b3, dpgen2) #274

Open
Jeremy1189 opened this issue Nov 12, 2024 · 4 comments


@Jeremy1189

We use the PyTorch backend for the distillation process. We downloaded the model from /prep-run-train/output/models/task.0000 with the "dpgen2 download ..." command, which gave us model.ckpt.pt, and then froze it with "dp --pt freeze -o model.pth" (the checkpoint file had to be added manually) to obtain model.pth. However, this model.pth cannot be compressed with "dp compress -i model.pth -o model_compress.pth"; the command fails with the following error message:

root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# dp compress -i model.pth -o model_compress.pth
2024-11-12 14:48:51.537628: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-12 14:48:51.537679: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-12 14:48:51.537696: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-12 14:48:51.544735: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:From /opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:108: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:tensorflow:disable_mixed_precision_graph_rewrite() called when mixed precision is already disabled.
Traceback (most recent call last):
  File "/opt/deepmd-kit-3.0.0b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/main.py", line 923, in main
    deepmd_main(args)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/entrypoints/main.py", line 81, in main
    compress(**dict_args)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/entrypoints/compress.py", line 98, in compress
    graph, _ = load_graph_def(input)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/utils/graph.py", line 42, in load_graph_def
    graph_def.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'
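
The traceback passes through deepmd/tf/entrypoints/compress.py, so without "--pt" the file was handed to the TensorFlow loader, which tried to parse it as a GraphDef protobuf. As a rough way to confirm what kind of file model.pth is, here is a minimal sketch. It assumes the frozen PyTorch model is a TorchScript archive, which is a zip container; guess_model_format is a hypothetical helper written for this illustration, not part of DeePMD-kit.

```python
import zipfile


def guess_model_format(path):
    """Hypothetical helper: report whether a model file is a zip container.

    Assumption: a model frozen with the PyTorch backend is a TorchScript
    archive (zip), while a TensorFlow frozen graph is a raw protobuf.
    """
    if zipfile.is_zipfile(str(path)):
        return "zip-archive"  # consistent with a TorchScript .pth file
    return "not-zip"  # e.g. a TensorFlow GraphDef .pb file, or something else


# Example usage (the file name is taken from the report above):
# print(guess_model_format("model.pth"))
```

If this reports "zip-archive" for model.pth, the DecodeError is consistent with a PyTorch file being read by the TensorFlow code path rather than with a corrupted model.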

@Jeremy1189
Author

"dp --pt compress -i model.pth -o compress_model.pth" does not work either:
root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# ls
checkpoint model.ckpt.pt model.pth
root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# dp --pt compress -i model.pth -o compress_model.pth
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-12 21:39:04,666] DEEPMD INFO DeePMD version: 3.0.0b3
Traceback (most recent call last):
  File "/opt/deepmd-kit-3.0.0b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/main.py", line 923, in main
    deepmd_main(args)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 577, in main
    raise RuntimeError(f"Invalid command {FLAGS.command}!")
RuntimeError: Invalid command compress!

@Jeremy1189
Author

Jeremy1189 commented Nov 12, 2024

The descriptor settings used during distillation, with "attn_layer": 0:
"descriptor": {
    "type": "se_atten_v2",
    "sel": 120,
    "rcut_smth": 0.50,
    "rcut": 6.00,
    "neuron": [
        25,
        50,
        100
    ],
    "resnet_dt": false,
    "axis_neuron": 16,
    "seed": 1,
    "attn": 128,
    "attn_layer": 0,
    "attn_dotr": true,
    "attn_mask": false,
    "precision": "float64",
    "_comment2": " that's all"
}

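For anyone who wants to double-check the same setting in their own input file, here is a small sketch that parses the descriptor block above and reads back the number of attention layers. The values are copied from this comment; attention_layer_count is a hypothetical helper for illustration and is not a DeePMD-kit function.

```python
import json

# Descriptor block copied from the comment above, wrapped as standalone JSON.
descriptor_text = """
{
    "type": "se_atten_v2",
    "sel": 120,
    "rcut_smth": 0.50,
    "rcut": 6.00,
    "neuron": [25, 50, 100],
    "resnet_dt": false,
    "axis_neuron": 16,
    "seed": 1,
    "attn": 128,
    "attn_layer": 0,
    "attn_dotr": true,
    "attn_mask": false,
    "precision": "float64"
}
"""

descriptor = json.loads(descriptor_text)


def attention_layer_count(desc):
    """Hypothetical helper: return the explicitly configured attn_layer.

    Raises KeyError if the key is absent, so a library default is never
    silently assumed.
    """
    return int(desc["attn_layer"])


print(descriptor["type"], attention_layer_count(descriptor))
```

Requiring the key to be present is deliberate: it avoids guessing what the default would be when "attn_layer" is left out of the input.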
@wanghan-iapcm

3.0.0b3 only supports compression with the TensorFlow backend.

@Jeremy1189
Author

Does b4 support compression with the PyTorch backend?
