Starting and Testing a ParallelCluster in CloudShell
Note: This content has not yet been completed or fully arranged and formatted!
Once a ParallelCluster is configured, it is a good idea to test the configuration by creating a cluster, connecting to it, and running a simple test program on it.
Running a Hello World application:
• Create and start the cluster, giving it a name and the configuration file to use:
○ pcluster create-cluster --cluster-name hello-world --cluster-configuration hello-world.yaml
• pcluster create-cluster returns right away with JSON describing the cluster, whose status is initially CREATE_IN_PROGRESS. To check on the cluster, run pcluster list-clusters. Its output looks like:
TBD
• Repeat pcluster list-clusters until the cluster status changes to CREATE_COMPLETE. Do not use the cluster before then.
○ Note: My cluster seemed to take longer to create than Bennett's, but we have not confirmed this. How long did it take? Was it because of the minimum of 2 worker nodes, or for other reasons? Also, when Bennett tried to start a cluster while Ben's was still starting, Ben's cluster was terminated. It is not clear why: are there limitations, or is this a bug?
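Instead of re-running the command by hand, you can script the wait. The following is a minimal sketch for CloudShell. It assumes ParallelCluster 3.x, where pcluster list-clusters prints JSON of the form {"clusters": [{"clusterName": ..., "clusterStatus": ...}]}. The function names cluster_status and wait_for_cluster are hypothetical helpers, not part of pcluster. The 30-second polling interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of pcluster): poll until a cluster is ready.

# Print the clusterStatus of the named cluster, or nothing if it is not listed.
# Assumes the ParallelCluster 3.x JSON layout of "pcluster list-clusters".
cluster_status() {
  pcluster list-clusters | python3 -c '
import json, sys
name = sys.argv[1]
for cluster in json.load(sys.stdin).get("clusters", []):
    if cluster.get("clusterName") == name:
        print(cluster.get("clusterStatus", ""))
' "$1"
}

# Poll every INTERVAL seconds (default 30) until the cluster is CREATE_COMPLETE.
# Returns 0 when it is ready, 1 if creation failed or the cluster is not listed.
wait_for_cluster() {
  local name="$1" interval="${2:-30}" status
  while true; do
    status="$(cluster_status "$name")"
    echo "$(date +%T) ${name}: ${status:-not found}"
    case "$status" in
      CREATE_COMPLETE) return 0 ;;
      CREATE_FAILED | DELETE_* | "") return 1 ;;
    esac
    sleep "$interval"
  done
}
```

For example, after running pcluster create-cluster, save these functions to a file, source it, and run wait_for_cluster hello-world before logging in.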
• Log in to the head node of the cluster (you must supply the private SSH key to be allowed in) and run a test:
○ pcluster ssh --cluster-name hello-world -i /path/to/keyfile.pem
§ Note: Ben got an error here saying that Boto3 will no longer support Python 3.7 starting December 13, 2023. Should a newer Python version (3.8 or later) be used?
§ On the first connection, SSH warns that the authenticity of the host cannot be established. Answer yes to accept it. You only need to do this once.
○ You can view the new instances in the EC2 dashboard.
○ Create a file called hellojob.sh with the following contents:
#!/bin/bash
sleep 30
echo "Hello World from $(hostname)"
○ Submit a job that runs that script:
○ sbatch hellojob.sh
○ Monitor the job in the queue until it starts and completes:
○ squeue
► The job never ran for Ben. He used "scancel 1" to cancel it.
○ View the output of the job:
○ more slurm-2.out (output goes to slurm-<job ID>.out, so use the job ID that sbatch reported)
○ Output looks like:
Hello World from <compute node hostname>
○ Log off the head node:
○ exit
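You can also script the submit, monitor, and view steps on the head node. This is a minimal sketch with a few assumptions. It relies on Slurm's sbatch --parsable option, which prints the job ID. It also assumes the default output file name slurm-<job ID>.out in the submission directory. The function name run_and_show is a hypothetical helper, and the 10-second polling interval is arbitrary. If a job never starts, as happened in Ben's test, this loop keeps waiting until the job is cancelled with scancel.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit a batch script, wait until it leaves the queue,
# then print its output file.
run_and_show() {
  local script="$1" interval="${2:-10}" jobid
  # --parsable prints "jobid" (or "jobid;cluster"); keep only the ID.
  jobid="$(sbatch --parsable "$script")" || return 1
  jobid="${jobid%%;*}"
  echo "Submitted job ${jobid}"
  # List the job (no header) only while it is pending, configuring, running,
  # or completing; stderr is discarded in case Slurm no longer knows the ID.
  while [ -n "$(squeue --noheader --jobs="$jobid" \
      --states=PENDING,CONFIGURING,RUNNING,COMPLETING 2>/dev/null)" ]; do
    sleep "$interval"
  done
  # Default sbatch output file in the submission directory.
  cat "slurm-${jobid}.out"
}
```

For example, after sourcing the file that defines it, run run_and_show hellojob.sh in the directory that contains hellojob.sh.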