The dp compress with pytorch backend can not do the compression (finetune-->distillation based on deepmd-kit-v3-b3, dpgen2) #274

Jeremy1189 · 2024-11-12T07:10:59Z

We use the PyTorch backend for the distillation process. We downloaded the model from /prep-run-train/output/models/task.0000 with the "dpgen2 download ..." command, which gives model.ckpt.pt, and froze it with the "dp --pt freeze -o model.pth" command (a checkpoint file has to be added manually) to obtain model.pth. However, this model.pth cannot be compressed with the compression command "dp compress -i model.pth -o model-compress.pth"; it gives the following error message:

root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# dp compress -i model.pth -o model_compress.pth
2024-11-12 14:48:51.537628: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-12 14:48:51.537679: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-12 14:48:51.537696: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-12 14:48:51.544735: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:From /opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:108: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:tensorflow:disable_mixed_precision_graph_rewrite() called when mixed precision is already disabled.
Traceback (most recent call last):
  File "/opt/deepmd-kit-3.0.0b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/main.py", line 923, in main
    deepmd_main(args)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/entrypoints/main.py", line 81, in main
    compress(**dict_args)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/entrypoints/compress.py", line 98, in compress
    graph, _ = load_graph_def(input)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/utils/graph.py", line 42, in load_graph_def
    graph_def.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'
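
Note on the traceback above: every frame after deepmd/main.py is under deepmd/tf/entrypoints, so "dp compress" without a backend flag was handled by the TensorFlow entry point, which then tried to parse model.pth as a tensorflow.GraphDef. Below is a minimal sketch for checking what kind of file is being passed before running a backend-specific command. It assumes that a model frozen by the PyTorch backend is a TorchScript archive (a zip container) and that TensorFlow frozen graphs carry the .pb extension. The function name guess_model_backend is hypothetical and is not part of deepmd-kit.

```python
import zipfile
from pathlib import Path


def guess_model_backend(path):
    """Heuristically guess which backend wrote a frozen model file.

    Assumptions (not confirmed by deepmd-kit itself):
    - PyTorch-frozen models are TorchScript archives, i.e. zip files.
    - TensorFlow frozen graphs use the ".pb" extension.
    """
    p = Path(path)
    # zipfile.is_zipfile returns False for missing or non-zip files.
    if zipfile.is_zipfile(str(p)):
        return "pytorch"
    if p.suffix == ".pb":
        return "tensorflow"
    return "unknown"


# Example: guess_model_backend("model.pth") before choosing a dp command.
```

If this reports "pytorch" for model.pth, the DecodeError is expected, since a zip archive is not a serialized GraphDef.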

Jeremy1189 · 2024-11-12T13:41:45Z

"dp --pt compress -i model.pth -o compress_model.pth" also does not work:
root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# ls
checkpoint model.ckpt.pt model.pth
root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# dp --pt compress -i model.pth -o compress_model.pth
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-12 21:39:04,666] DEEPMD INFO DeePMD version: 3.0.0b3
Traceback (most recent call last):
  File "/opt/deepmd-kit-3.0.0b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/main.py", line 923, in main
    deepmd_main(args)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 577, in main
    raise RuntimeError(f"Invalid command {FLAGS.command}!")
RuntimeError: Invalid command compress!

Jeremy1189 · 2024-11-12T13:45:39Z

The descriptor settings used during distillation, with "attn_layer": 0:
"descriptor": {
    "type": "se_atten_v2",
    "sel": 120,
    "rcut_smth": 0.50,
    "rcut": 6.00,
    "neuron": [
        25,
        50,
        100
    ],
    "resnet_dt": false,
    "axis_neuron": 16,
    "seed": 1,
    "attn": 128,
    "attn_layer": 0,
    "attn_dotr": true,
    "attn_mask": false,
    "precision": "float64",
    "_comment2": " that's all"
}

wanghan-iapcm · 2024-11-13T03:17:20Z

b3 only supports compression with the TF backend.

Jeremy1189 · 2024-11-13T03:19:41Z

Does b4 support compression with the PyTorch backend?
