Added docs for HPC on K8s. #17

Open
wants to merge 9 commits into
base: main

shubhamdang

No description provided.

ColonelPanics (Member) left a comment

A few thoughts/tweaks that ought to be made before this can be merged.

Also considering locating things within an existing section (perhaps a section within HPC Workflows titled "HPC with Kubernetes"?)

docs/docs/hpc-on-k8s/rke.md (resolved)
docs/docs/hpc-on-k8s/rke.md (resolved)
docs/docs/hpc-on-k8s/minio.md (outdated, resolved)
spec:
  containers:
  - name: dummy-job
    image: shubhamdang/fastqc_python_image:latest

Member

Is this still the image repository or has the openflight one been created now?

Author

Updated; the image is now present in the openflighthpc Docker Hub account.

- name: MINIO_AKEY
  value: "Mq6wmeNk0NOc0vD9Efut"
- name: MINIO_SKEY
  value: "Z3ETBqC3GuIiU9PomjBbmmC5h8I5I7WgN1wNWlCG"

Member

Not sure if these are sensitive secret keys or example ones but may be worth noting in the documentation what these keys should be set to (and maybe where to find them?)

Author

Added a section to the MinIO doc on creating an access key, and added the MinIO link to the workload notes.
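
As an editorial illustration of the concern raised here, and not something this PR is shown to do, one way to keep the keys out of the Job manifests is to store them in a Kubernetes Secret and reference them with `secretKeyRef`. The Secret name `minio-credentials` and its keys `access-key` and `secret-key` are assumptions made for this sketch; only the env var names `MINIO_AKEY` and `MINIO_SKEY` come from the diff. The Secret would need to exist in the same namespace as the Job.

```yaml
# Fragment of a container spec (not a complete manifest).
# Assumption: a Secret named "minio-credentials" with keys
# "access-key" and "secret-key" already exists in the Job's namespace.
env:
  - name: MINIO_AKEY
    valueFrom:
      secretKeyRef:
        name: minio-credentials   # hypothetical Secret name
        key: access-key           # hypothetical key inside the Secret
  - name: MINIO_SKEY
    valueFrom:
      secretKeyRef:
        name: minio-credentials
        key: secret-key
```

With this shape, the manifest in the docs carries no credential values, and the instructions only need to say how to create the Secret from the access key generated in MinIO.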

spec:
  containers:
  - name: dummy-job
    image: shubhamdang/custom_tensorflow:1.15.5

Member

Is this still the image repository or has the openflight one been created now?

Author

Updated; the image is now present in the openflighthpc Docker Hub account.

- name: MINIO_AKEY
  value: "Mq6wmeNk0NOc0vD9Efut"
- name: MINIO_SKEY
  value: "Z3ETBqC3GuIiU9PomjBbmmC5h8I5I7WgN1wNWlCG"

Member

Not sure if these are sensitive secret keys or example ones but may be worth noting in the documentation what these keys should be set to (and maybe where to find them?)

Author

Added a section to the MinIO doc on creating an access key, and added the MinIO link to the workload notes.

@shubhamdang
Author

HPC with Kubernetes

Updated

ColonelPanics (Member) left a comment

Some more feedback on the docs, with things to improve. I have yet to retest the Hadoop workflow and will take a look at PyTorch, SimPy and TensorFlow shortly, but this gives some things to address in the meantime.

Comment on lines 7 to 12
- storageClass:
- Minio rootUser
- Minio rootPassword
- Minio ServiceType
- Minio API port
- Minio console port
Member

I think it's worth doing something with this section, either:

  • Explaining what these configurations are
  • Removing it entirely (as I don't see why we need to mention them here?)

Regardless, using the CLI arg names (e.g. rootUser) is what flags the spellcheck errors; however, these can be alleviated by wrapping the names in backticks to make them inline code. This will also address removing these from codespell.

Author

updated thanks

```bash
helm repo add bitnami https://charts.bitnami.com/bitnami

helm install -n default --set global.storageClass=longhorn --set auth.rootUser=admin --set auth.rootPassword=test123456 --set service.type=NodePort --set service.nodePorts.api=31100 --set service.nodePorts.console=31101 --set persistence.size=2Gi my-minio bitnami/minio --version 12.9.4
Member

I know there were some issues in testing where the previous 8Gi persistence size was a problem, but simply changing it to 2Gi could also cause issues with file storage.

Perhaps there should be a note or some information about considerations and constraints for the storage size.

Author

updated thanks

@@ -0,0 +1,129 @@
# Steps to create bio-user in bio namespace

Member

I think this file needs some explanation of:

  • What the users are for
  • What the users have access to

Author

updated thanks

kubectl config use-context bio-context
```

## Steps to create physics-user in bio namespace
Member

This step being an H2 puts it at the same level as the rest of the docs. I tripped up trying to add this user because I followed the doc in order, switched to the bio-context, and then received errors.

Fixing the formatting, and probably noting the differences between the contexts (as mentioned in the earlier comment on this file), should clear this up.

Author

updated thanks

kind: Job
metadata:
  generateName: fastqc-
  namespace: default

Member

As discussed IRL, I think it's worth this job using the bio-context; otherwise the users section isn't being put to any use.

Author

updated thanks
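
A minimal sketch of what targeting the bio namespace could look like for the Job header quoted above. It assumes the namespace created in the users doc is literally named `bio` and that `bio-user` has permission to create Jobs there. The image value is a placeholder, since the final openflighthpc image name is not shown in this conversation.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: fastqc-
  # Assumption: the namespace from the users doc is named "bio".
  namespace: bio
spec:
  template:
    spec:
      containers:
        - name: dummy-job
          image: "<fastqc-image>"   # placeholder, not a real image name
      restartPolicy: Never
```

Because the manifest uses `generateName`, it has to be submitted with `kubectl create -f` rather than `kubectl apply -f`. Running that after `kubectl config use-context bio-context` would exercise the permissions set up in the users section.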

Comment on lines 36 to 46
  value: <minio ip like "10.151.15.78">
- name: MINIO_PORT
  value: <minio port like "31100">
# Minio Access Key
- name: MINIO_AKEY
  value: <minio-access-key>
# Minio Secret Key
- name: MINIO_SKEY
  value: <minio-secret-key>
- name: BUCKET_NAME
  value: <minio bucket name like "hadoop">

Member

We hit some issues IRL from not quoting strings. Perhaps putting the entire <> section in quotes would make this a little clearer (especially as the access key and secret key placeholders give no indication that quoting is needed).

Author

updated thanks
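
To illustrate the quoting point: Kubernetes requires every `env` `value` to be a string, and YAML parses an unquoted `31100` as an integer, which the API server rejects. The sketch below shows the same block with each whole value quoted, using the example values from the diff. The `MINIO_IP` name is taken from the second copy of this snippet later in the conversation; the key placeholders are left as placeholders.

```yaml
# Fragment of a container spec (not a complete manifest).
env:
  - name: MINIO_IP
    value: "10.151.15.78"        # example address from the diff
  - name: MINIO_PORT
    value: "31100"               # quoted so YAML keeps it a string
  - name: MINIO_AKEY
    value: "<minio-access-key>"  # replace the whole placeholder, keep the quotes
  - name: MINIO_SKEY
    value: "<minio-secret-key>"
  - name: BUCKET_NAME
    value: "hadoop"
```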


- Docker must be installed on all nodes of the Kubernetes cluster.
- Password-less SSH access must be set up from the rke node to all Kubernetes nodes.

Member

I experienced issues bringing up RKE before the firewall was disabled (or appropriate rules were set up for internal network communication). These were only fixed after correcting the firewall configuration and then restarting Docker.

It seems sensible, then, to make one of the prerequisites firewall-related.

Author

updated thanks

Comment on lines 28 to 41
- name: MINIO_IP
  value: <minio ip like "10.151.15.78">
- name: MINIO_PORT
  value: <minio port like "31100">
# Minio Access Key
- name: MINIO_AKEY
  value: <minio-access-key>
# Minio Secret Key
- name: MINIO_SKEY
  value: <minio-secret-key>
- name: BUCKET_NAME
  value: <minio bucket name like "genome">

Member

We hit some issues IRL from not quoting strings. Perhaps putting the entire <> section in quotes would make this a little clearer (especially as the access key and secret key placeholders give no indication that quoting is needed).

Author

updated thanks

2 participants