
Starting and Testing a ParallelCluster in CloudShell


Note: This content has not yet been completed or fully arranged and formatted!

Once a ParallelCluster is configured, it is a good idea to test the configuration by creating a cluster, connecting to it, and running a simple Hello World application there. The steps below walk through that process.
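The commands that follow assume a cluster configuration file named hello-world.yaml. For orientation only, a minimal ParallelCluster 3 configuration looks roughly like the sketch below; the region, subnet IDs, key pair name, instance types, and worker counts are placeholders and must be replaced with values from your own account.

    Region: us-east-2
    Image:
      Os: alinux2
    HeadNode:
      InstanceType: t2.micro
      Networking:
        SubnetId: subnet-0123456789abcdef0    # placeholder subnet ID
      Ssh:
        KeyName: my-keypair                   # placeholder EC2 key pair name
    Scheduling:
      Scheduler: slurm
      SlurmQueues:
        - Name: queue1
          ComputeResources:
            - Name: compute
              InstanceType: t2.micro
              MinCount: 0
              MaxCount: 2
          Networking:
            SubnetIds:
              - subnet-0123456789abcdef0      # placeholder subnet ID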

Create and Start the Virtual Cluster

  • Create and start the cluster, giving it a name and pointing to the configuration file:

    pcluster create-cluster --cluster-name hello-world --cluster-configuration hello-world.yaml

  • When cluster creation is complete, pcluster list-clusters will print JSON describing the cluster:

    TBD

  • Repeatedly issue the list-clusters command until the status changes to CREATE_COMPLETE before using the cluster (a simple polling loop is sketched below).
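If you would rather not re-run the command by hand, the wait can be scripted. This is only a sketch: it assumes the cluster name hello-world used above and that the JSON printed by pcluster describe-cluster contains a clusterStatus field.

    # Sketch: poll every 30 seconds until the cluster reports CREATE_COMPLETE.
    until pcluster describe-cluster --cluster-name hello-world | grep -q '"clusterStatus": "CREATE_COMPLETE"'; do
        echo "Cluster is still being created..."
        sleep 30
    done
    echo "Cluster is ready."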

Note: Ben's cluster took longer to create than Bennett's, or so we think. How long did it take? Was it because the configuration required a minimum of two workers, or for other reasons? Also, when Bennett tried to start a cluster while Ben's was still starting, Ben's cluster was terminated. It is not clear why - are there limitations, or bugs?

Test the Virtual Cluster

  • Log in to the head node of the cluster (you must supply the private SSH key to be allowed in) and run a test there:

    pcluster ssh --cluster-name hello-world -i /path/to/keyfile.pem

    ○ Note: Ben got an error here about Boto3 no longer supporting Python 3.7 after December 3, 2023. Use a different Python?
    ○ You will need to accept the connection the first time, even though the host's authenticity cannot be established.
    ○ You can look at the EC2 dashboard to see the new instances.

  • Create a file called hellojob.sh with the following contents:

    #!/bin/bash
    sleep 30
    echo "Hello World from $(hostname)"

  • Submit a job that runs that script (a consolidated sketch of the submit, monitor, and view steps follows this list):

    sbatch hellojob.sh

  • Monitor the progress of the job in the queue until it starts and completes:

    squeue

    ○ The job never ran for Ben - he used "scancel 1" to kill it.

  • View the output of the job:

    more slurm-2.out

    ○ Based on the script above, the output should look like: Hello World from <compute node hostname>

  • Log off of the head node:

    exit
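For reference, the submit, monitor, and view steps above can be collected into a single script run on the head node. This is only a sketch: it assumes the hellojob.sh contents shown above, the sbatch --parsable option, and Slurm's default slurm-<jobid>.out output file name.

    # Create the job script on the head node (same contents as above).
    cat > hellojob.sh <<'EOF'
    #!/bin/bash
    sleep 30
    echo "Hello World from $(hostname)"
    EOF

    # Submit the job; --parsable makes sbatch print just the job ID.
    jobid=$(sbatch --parsable hellojob.sh)
    echo "Submitted job $jobid"

    # Wait until the job no longer appears in the queue.
    while squeue -j "$jobid" 2>/dev/null | grep -q "$jobid"; do
        sleep 10
    done

    # Slurm writes the job's stdout to slurm-<jobid>.out by default.
    cat "slurm-${jobid}.out"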