Cannot run Tutorial_2.1_MatterGPT_eform #10
Thank you for the update, but the issue in Tutorial 2.1 doesn't seem to be solved. The problem seems to originate in line 67 of the "1. building training set" cell, specifically with the
I noticed that your conda location is ~/.conda, which indicates you're not using the provided Docker image. If you want to run this tutorial on your own machine, you'll need to install not only SLICES but also the Slurm queue system to manage the calculations, which can be challenging. I suggest following the steps below to set up the Jupyter backend with the provided Docker image.

Jupyter backend setup
(2) Put your API key for the new Materials Project API in "APIKEY.ini".
(3) Edit "CPUs" in "slurm.conf" to set the number of CPU threads available to the Docker container.
(4) Run the following commands in a terminal (Linux, or WSL2 Ubuntu on Windows 11):
# Download SLICES_docker with SLICES and other relevant packages pre-installed.
docker pull xiaohang07/slices:v9
# Make entrypoint_set_cpus_jupyter.sh executable
sudo chmod +x entrypoint_set_cpus_jupyter.sh
# Replace "[]" with the absolute path of this repo's unzipped folder to set up a shared folder for the Docker container.
docker run -it -p 8888:8888 -h workq --shm-size=0.5gb --gpus all -v /[]:/crystal xiaohang07/slices:v9 /crystal/entrypoint_set_cpus_jupyter.sh

If you want to install Slurm on your own machine instead, follow these steps:

apt update \
&& apt install munge slurm-wlm slurm-wlm-doc slurm-wlm-torque -y \
&& rm -rf /var/spool/slurm-llnl \
&& mkdir /var/spool/slurm-llnl \
&& chown -R slurm:slurm /var/spool/slurm-llnl \
&& rm -rf /var/run/slurm-llnl/ \
&& mkdir /var/run/slurm-llnl/ \
&& chown -R slurm:slurm /var/run/slurm-llnl/
# Edit slurm.conf (set the number of CPUs, and change the hostname to your machine's hostname), then
cp ./slurm.conf /etc/slurm-llnl/
service munge restart \
&& service slurmctld restart \
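service slurmd restart

In addition, you should modify the 0_run.pbs files to fit your environment.

Another workaround: if you don't want to install the Slurm job management system, you need to modify the code in utils.py. Inside the splitRun function, replace "qsub 0_run.pbs" with "python 0_run.py", and make sure the number of threads does not exceed the number of CPU threads on your computer; otherwise the jobs will compete for computing resources.

Thanks for raising this issue. I have added these detailed instructions at the beginning of the tutorial, which may help others avoid similar problems. BTW, I have sent you a private message on LinkedIn with my WeChat ID.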
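Both the "CPUs" edit in slurm.conf and the no-Slurm workaround come down to matching the configured thread count to the CPU threads the machine actually has. The following is a minimal shell sketch under stated assumptions, not part of the tutorial: it assumes slurm.conf declares the node with a CPUs=<number> entry (check the copy shipped with the repo) and that GNU sed and nproc are available, as on Linux or WSL2 Ubuntu. The function names set_slurm_cpus and threads_fit are hypothetical helpers introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of SLICES or the tutorial.
# Assumes slurm.conf declares the node with a "CPUs=<number>" entry and that
# GNU sed (for in-place -i) and nproc (GNU coreutils) are available.

# Rewrite every CPUs=<number> in the given slurm.conf. The count defaults to
# the number of CPU threads that nproc reports for the current process.
set_slurm_cpus() {
  local conf="$1"
  local cpus="${2:-$(nproc)}"
  sed -i -E "s/CPUs=[0-9]+/CPUs=${cpus}/g" "$conf"
}

# Succeed only if the requested thread count does not exceed the CPU threads
# available, i.e. the jobs will not compete for CPUs when run without Slurm.
threads_fit() {
  local requested="$1"
  local available
  available="$(nproc)"
  [ "$requested" -le "$available" ]
}
```

For example, source the functions and run set_slurm_cpus ./slurm.conf before copying the file to /etc/slurm-llnl/, or call threads_fit with the thread count you plan to use before switching splitRun to "python 0_run.py". Deriving the value from nproc rather than hard-coding it keeps the setting correct when the same repo is used on different machines.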
Hi,
I’m encountering some issues with Tutorial 2.1 and wanted to ask if there might be something missing or incorrect.
Additionally, when I comment out the show_progress() function, it leads to another error.
Could you please correct that? Thanks for your help.