Week 4: GCloud Branch of SCC Training Material #215

Open
nyameko opened this issue Oct 20, 2024 · 8 comments
Labels
admin Admin related to technical work preparation documentation Improvements or additions to documentation enhancement New feature or request

Comments

@nyameko
Contributor

nyameko commented Oct 20, 2024

  • Topics: Document the steps involved in your cloud infrastructure deployment.
  • Tasks:
    • Prepare user documentation for someone else to be able to reproduce your steps.
@nyameko nyameko added documentation Improvements or additions to documentation enhancement New feature or request labels Oct 20, 2024
@nyameko nyameko added the admin Admin related to technical work preparation label Oct 20, 2024
@ZamaZethu
Collaborator

Greetings Team @TarynNicole @hoperiley0806 @ThabangMmusi @Nyadzani26 @mpumelelo-ndlovu

I hope you are making progress on your project.

  • You need to have started your poster by now.

  • You need to have a fork of the Student Cluster Competition Git Repo that your whole team and mentor have access to, and a feature branch for the project you are working on.

If you would like my direct assistance and guidance, you can add me to this repo.

  • You need to have started preparing your competition benchmarks, i.e.:

You should have a repo, with at the very least a folder for each of the competition benchmarks and a README.md for each with information that you've already researched regarding building, installing, compiling, and running the various benchmarks.
You need to break down and allocate the various benchmarks and tasks amongst your teammates.

  • You need to start preparing a short presentation (lightning talk). Each of your team members (excluding the mentor) needs to start thinking about which benchmark / aspect of your cluster they are responsible for, and think of some talking points to present to the judges.

If you would like to arrange a meeting with me to discuss and review your progress, please let me know.

DO NOT LEAVE THINGS TO THE LAST MINUTE!!!

Do a little bit of work every week and ensure you're consistently making progress.

Good Luck with the remainder of your exams and assignments!

@TarynNicole
Collaborator

@ZamaZethu please can we arrange a meeting with you tomorrow if you are available? We would like to discuss some issues we are encountering.

@ZamaZethu
Collaborator

@TarynNicole no problem, I'm available from 1pm tomorrow. Please send me a meeting invitation.

@TarynNicole
Collaborator

TarynNicole commented Nov 12, 2024

@ZamaZethu, kindly find below the meeting invite:

You're invited to CHPC Parallel Pioneers Meeting

Wed Nov 13, 2024

1:00 pm—2:00 pm (SAST)

https://teams.microsoft.com/l/meetup-join/19%3ameeting_YzdjNWNmYmItNzgwMC00ZTYxLTk3OWMtYzFlYTFlOWE5ZjM1%40thread.v2/0?context=%7b%22Tid%22%3a%22acbcaed8-7adc-460c-ba57-028bdc80d84a%22%2c%22Oid%22%3a%227e42eaeb-6f6a-4545-9530-463949cf5c57%22%7d

Tap on the link or paste it in a browser to join.

@nyameko
Contributor Author

nyameko commented Nov 19, 2024

Hi Team,

In less than two weeks' time, you'll be traveling to Gqeberha to participate in the Student Cluster Competition at the CHPC National Conference.

By this stage you should have a dedicated plan in place to tackle the Competition Benchmarks. You MUST KNOW HOW TO:

  • Configure and deploy your Operating Systems on your bare-metal clusters,
  • Configure and deploy your Infrastructure, Networking and Software Stack (drivers, environments, Lmod, compilers, MPI, etc.).

You have VMs to practice all of this. Make use of a private GitHub repo. You are strongly encouraged to invite me, Mmabatho and / or your project mentors to assist and review your progress on your private repos.

Break down the tasks that you have to complete and split them up amongst your teammates, so that you have an expert in each area.

Assign a project manager (does not have to be the team lead or technical lead) to make sure everyone is doing what they are supposed to be doing and to keep track of progress from now and throughout the competition.

Ensure that you have practiced deploying and installing the known benchmarks:

  • Synthetics
    • HPCC
      • LINPACK
    • HPCG
  • LAMMPS
  • NWCHEM
  • RegCM
  • MILC
  • MATLAB
  • Secret Applications

You will be interacting with CSIR staff, high school learners, university undergraduate and postgraduate students, students who work full time, and general industry professionals. Use this opportunity to network, engage and meet prospective contacts who may assist you in your own careers.

To help you introduce and present yourselves to all of the prospective guests, delegates and judges that may pass by your booths, in addition to your posters, you are strongly encouraged to prepare a slide-deck and associated lightning talks, briefly covering aspects of your journey from the Selection Round up until the Competition Floor at the CHPC Conference. Your slide deck should be brief and succinct and include:

  • 1 Slide with a group photo, with names, degrees, year of study of all members including mentor - indicate Team Name and Institution on this slide,
  • 1 Slide with a picture very briefly covering your experience of the Selection Round and the Build-Up to Nationals,
  • 1 - 3 Slides describing your project:
    • Not everyone who comes by your booth will have a technical background in the specific topic you've been assigned,
    • Make sure you explain the task you were given,
    • Explain the challenges you encountered,
    • Explain the results you've obtained,
    • Explain the additions (if any) that you've made to the SCC Selection Round Training Content,
    • Remember that a "GOOD" picture or diagram is worth a thousand words...
  • 1 - 2 Slides describing your cluster design:
    • motivate your decisions,
    • explain the pros and cons,
    • explain your networking and interconnectivity,
    • explain your choice of OS and software stack,
    • having a Prometheus and Grafana dashboard "might" impress some judges...
  • 1 slide per application with:
    • DO NOT COPY AND PASTE GENERAL INFO FROM WIKIPEDIA OR THE APPLICATION PAGE,
    • At most, have one short sentence briefly describing the benchmark,
    • Describe how you installed it: frameworks, software stacks used, dependencies,
    • Get to reporting your application benchmark results quickly,
    • Briefly describe challenges faced, improvements planned or implemented,
    • If you have a "meaningful" diagram, image or graph, this can go on a 2nd slide for the application.
  • Final slide, with a general project plan overview of:
    • What are you currently working on,
    • What have you completed,
    • Where are you stuck,
    • What is your plan to get unstuck.

This may seem like a daunting amount of information, but it is here to assist you in succeeding in the competition. Everybody in the group needs to demonstrate to the judges that they've ALL CONTRIBUTED to the success of the team.

There is not much time left. You should be finalizing your posters and preparing your presentations.

I will make myself available tomorrow, and possibly Friday, for an hour between 15:00 and 16:00 for a general Teams call where anyone can come and ask questions about anything related to the competition.

@ZamaZethu
Collaborator

Hi @TarynNicole @hoperiley0806 @ThabangMmusi @mpumelelo-ndlovu @Nyadzani26

I hope all is well. This is a reminder that the project poster is due today.

Enjoy the rest of your night.

@ThabangMmusi
Collaborator

Hi @ZamaZethu

I hope you are well. Can you please assist with Open MPI? We keep getting the following error:

ubuntu@headnode:~/RegCM/run$ mpirun -np 1 -host node1 ../bin/regcmMPI isc24_small.in
[mpiexec@headnode] Error: Unable to run bstrap_proxy on node1 (pid 13620, exit code 768)
[mpiexec@headnode] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:157): check exit codes error
[mpiexec@headnode] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:206): poll for event error
[mpiexec@headnode] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1069): error waiting for event
[mpiexec@headnode] Error setting up the bootstrap proxies
[mpiexec@headnode] Possible reasons:
[mpiexec@headnode] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@headnode] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts.
[mpiexec@headnode]    Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@headnode] 3. Firewall refused connection.
[mpiexec@headnode]    Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@headnode] 4. Ssh bootstrap cannot launch processes on remote host.
[mpiexec@headnode]    Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@headnode]    You may try using -bootstrap option to select alternative launcher.
[bstrap:0:0@compute1] HYD_sock_connect (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:209): getaddrinfo returned error -3 (Temporary failure in name resolution)
[bstrap:0:0@compute1] main (../../../../../src/pm/i_hydra/libhydra/bstrap/src/hydra_bstrap_proxy.c:538): unable to connect to server headnode at port 39999 (check for firewalls!)

@nyameko
Contributor Author

nyameko commented Nov 28, 2024

Hi @ThabangMmusi ,

Try running your mpirun command with the following debugging options:

mpirun --debug-daemons --mca plm_base_verbose 10 --mca oob_base_verbose 10 -np 16 --hostfile hosts.txt 

Make sure you've added node1 to your hosts file, or alternatively, as in the command above, list the IP addresses of your nodes in the hosts.txt file.
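
The last two lines of the trace show compute1 failing to resolve the name headnode, so name resolution has to work from the compute nodes back to the head node as well, not only outwards from the head node. As a minimal sketch (assuming hostnames are managed through /etc/hosts rather than DNS, and that hosts.txt lists one host per line in either the `node1 slots=4` or the `node1:4` style), the following reports which names are missing from a hosts table. The function names and sample data are hypothetical.

```python
import ipaddress


def _strip_comment(line):
    """Drop a trailing '#' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def parse_etc_hosts(text):
    """Return every hostname and alias defined in /etc/hosts-style text."""
    names = set()
    for raw in text.splitlines():
        line = _strip_comment(raw)
        if line:
            names.update(line.split()[1:])  # field 0 is the IP address
    return names


def parse_hostfile(text):
    """Return the host entries of an MPI hostfile, in order.

    Accepts 'node1', 'node1 slots=4' and 'node1:4' (IPv4 or names only).
    """
    hosts = []
    for raw in text.splitlines():
        line = _strip_comment(raw)
        if line:
            hosts.append(line.split()[0].split(":")[0])
    return hosts


def _is_ip(token):
    """True if token is a literal IP address, which needs no lookup."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def missing_hosts(hostfile_text, etc_hosts_text, extra_names=()):
    """List names from the hostfile (plus extra_names) absent from the hosts table."""
    known = parse_etc_hosts(etc_hosts_text)
    wanted = parse_hostfile(hostfile_text) + list(extra_names)
    return [h for h in wanted if not _is_ip(h) and h not in known]


# Hypothetical sample data: a hosts table as it might look on compute1.
SAMPLE_HOSTFILE = "node1\ncompute1:4\n10.0.0.5\n"
SAMPLE_ETC_HOSTS = "127.0.0.1 localhost\n10.0.0.3 compute1\n"
print(missing_hosts(SAMPLE_HOSTFILE, SAMPLE_ETC_HOSTS, ["headnode"]))
```

On a real cluster the same call could be run on every node with the contents of hosts.txt and /etc/hosts read from disk, passing the head node's name in `extra_names`. Note also that the trace comes from the Hydra launcher (paths under src/pm/i_hydra and the I_MPI_PORT_RANGE hint), which suggests the mpirun on the PATH may belong to Intel MPI rather than Open MPI. If so, the Open MPI-specific `--mca` options above may be rejected, and `which mpirun` will show which installation is being picked up.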

As the error message also indicates, make sure you've opened firewall ports for the socket to bind to, or alternatively disable your firewall, flush the rules, and test again.
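
To tell a firewall problem apart from a name-resolution problem, a short probe run from a compute node can help. This is a sketch using only Python's standard socket module; `probe` is a hypothetical helper, and the reading of each result is a rule of thumb rather than a guarantee: a timeout often points to a firewall silently dropping packets, while a refusal usually means the host was reached but nothing is listening on that port (or a firewall is actively rejecting).

```python
import socket


def probe(host, port, timeout=2.0):
    """Try one TCP connection to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The hostname could not be resolved (compare getaddrinfo error -3).
        return "name resolution failed"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but the port is closed or rejected.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"
```

For example, calling `probe("headnode", 39999)` on compute1 while mpirun is waiting on the head node would exercise the same connection that failed in the trace. The port number is taken from that trace and may differ between runs unless a fixed range is configured.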

Good Luck!


9 participants