545 allow attached volumes #562

Merged Dec 2, 2024
73 commits
9b2814a
renamed volumeSize to bootVolumeSize to avoid name issues
XaverStiensmeier Sep 27, 2024
0938e4c
added implementation for adding volumes to permanent workers (they ar…
XaverStiensmeier Sep 27, 2024
0bf0a35
implemented creating and terminating volumes without filesystem for p…
XaverStiensmeier Sep 30, 2024
303d77e
fully working for permanent workers. masterMount is broken now, but a…
XaverStiensmeier Oct 1, 2024
e6a4479
Added volume creation to create_server. Not yet working.
XaverStiensmeier Oct 10, 2024
5837a19
hostvar for each host
XaverStiensmeier Oct 11, 2024
35b1178
Fixed information handling and naming issues
XaverStiensmeier Oct 16, 2024
08c00ad
fixed host yaml creation
XaverStiensmeier Oct 17, 2024
5e6a25a
removed unnecessary prints
XaverStiensmeier Oct 17, 2024
3f488a5
improved readability fixed minor bugs
XaverStiensmeier Oct 17, 2024
52b8286
added volume deletion and set volume_api_version explicitly
XaverStiensmeier Oct 17, 2024
4a0ff18
removed prints from test_provider.py
XaverStiensmeier Oct 17, 2024
d989d73
improved readability greatly. Fixed overwriting host vars bug
XaverStiensmeier Oct 18, 2024
bfa08d8
snapshot and existing volumes can now be attached to master and worke…
XaverStiensmeier Oct 22, 2024
4d38777
snapshot and existing volumes can now be attached to master and worke…
XaverStiensmeier Oct 23, 2024
445b0eb
removed mountPoint from a log message in case no mount point is speci…
XaverStiensmeier Oct 24, 2024
12b4487
fixed lsblk not finding item.device due to race condition
XaverStiensmeier Oct 25, 2024
d732f4e
improved comments and naming
XaverStiensmeier Oct 25, 2024
84e96be
removed server automount. This is now handled by a single automount t…
XaverStiensmeier Oct 25, 2024
aae6c68
now allows starting new permanent volumes if a name is given. One cou…
XaverStiensmeier Oct 28, 2024
7ab04de
fixed wrong function call
XaverStiensmeier Oct 28, 2024
f75b357
renamed nfs_mount to nfs_shares
XaverStiensmeier Oct 29, 2024
6ec4f5d
added semipermanent as an option
XaverStiensmeier Oct 29, 2024
49b1061
fixed wrong method of default values for Ansible
XaverStiensmeier Oct 29, 2024
3515ebc
started reworking
XaverStiensmeier Oct 29, 2024
62a64b8
added volumes and changed bootVolumes
XaverStiensmeier Oct 29, 2024
7a25d0b
updated bibigrid.yaml and aligned naming of bootVolume and volume
XaverStiensmeier Oct 31, 2024
041b36f
added newline at end of file
XaverStiensmeier Oct 31, 2024
ab12dce
removed superfluous provider parameter
XaverStiensmeier Oct 31, 2024
73d722e
pleased linting
XaverStiensmeier Oct 31, 2024
0e5ae14
removed argument from function call
XaverStiensmeier Oct 31, 2024
eda7ae9
moved host vars creation, vars deletion, added comments
XaverStiensmeier Oct 31, 2024
4a598ab
largely reworked how volumes are attached to servers to be more explicit
XaverStiensmeier Nov 5, 2024
16ab5d4
small naming fixes
XaverStiensmeier Nov 6, 2024
c9bf411
updated priority order of permanent and semiPermanent. Updated docume…
XaverStiensmeier Nov 7, 2024
deea4e4
fixed bug regarding dontUploadCredentials
XaverStiensmeier Nov 7, 2024
68a1abb
updated schema validation
XaverStiensmeier Nov 7, 2024
bbc749b
Update linting.yml
XaverStiensmeier Nov 7, 2024
7dc3053
Update linting.yml
XaverStiensmeier Nov 7, 2024
16172a7
Update linting.yml
XaverStiensmeier Nov 7, 2024
bf7476e
Update linting.yml
XaverStiensmeier Nov 7, 2024
4d68fae
added __init__.py where appropriate
XaverStiensmeier Nov 11, 2024
9cb8bcc
Merge remote-tracking branch 'origin/545-allow-attached-volumes' into…
XaverStiensmeier Nov 11, 2024
f0da25a
update bibigrid.yaml for more explicit volumes documentation
XaverStiensmeier Nov 11, 2024
73494c6
volumes are now validated and fixed old state of masterInstance in va…
XaverStiensmeier Nov 11, 2024
aa2f9d7
Update linting.yml
XaverStiensmeier Nov 11, 2024
fb8b165
Update linting.yml
XaverStiensmeier Nov 11, 2024
95da514
fixed longtime naming bug for unknown openstack exceptions
XaverStiensmeier Nov 13, 2024
244a730
Merge remote-tracking branch 'origin/545-allow-attached-volumes' into…
XaverStiensmeier Nov 13, 2024
3c80f1e
saves more info in .mem file
XaverStiensmeier Nov 13, 2024
3029431
moved structure of tests and added a basic integration_test file that…
XaverStiensmeier Nov 14, 2024
a636e6a
moved tests
XaverStiensmeier Nov 14, 2024
1109ba0
added "not ready yet"
XaverStiensmeier Nov 27, 2024
c51bfc3
updated bootVolume documentation
XaverStiensmeier Nov 27, 2024
93d2844
moved tests added __init__.py files for better discovery. minor fixes
XaverStiensmeier Nov 27, 2024
a21d127
updated tests and comments
XaverStiensmeier Nov 28, 2024
f39cfdc
updated tests and comments
XaverStiensmeier Nov 28, 2024
c4fdca2
updated tests, code and comments for ansible_configuration
XaverStiensmeier Nov 29, 2024
040e118
updated tests for ansible_configurator
XaverStiensmeier Nov 29, 2024
6a5e79a
fixed test_ansible_configurator.py
XaverStiensmeier Nov 30, 2024
eb74156
fixed test_configuration_handler.py
XaverStiensmeier Nov 30, 2024
51d0208
improved exception messages
XaverStiensmeier Nov 30, 2024
8549202
pleased ansible linter
XaverStiensmeier Nov 30, 2024
16892d4
fixed terminate return values test
XaverStiensmeier Nov 30, 2024
adf1095
improved naming
XaverStiensmeier Nov 30, 2024
0ffa578
added tests to make sure that server regex only deletes bibigrid serv…
XaverStiensmeier Nov 30, 2024
56c0dc2
pleased pylint
XaverStiensmeier Nov 30, 2024
5baceed
fixed validation issue when using exists in master
XaverStiensmeier Nov 30, 2024
62aca95
removed forgotten print
XaverStiensmeier Nov 30, 2024
bfb5e45
fixed description bug
XaverStiensmeier Nov 30, 2024
496f78f
final bugfixes
XaverStiensmeier Dec 2, 2024
c3b6ea9
pleased linter
XaverStiensmeier Dec 2, 2024
139c61a
fixed too many positional arguments
XaverStiensmeier Dec 2, 2024
6 changes: 3 additions & 3 deletions .github/workflows/linting.yml
@@ -5,10 +5,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.10
- name: Set up Python 3.12.3
uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: '3.12.3'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
@@ -17,4 +17,4 @@ jobs:
- name: ansible_lint
run: ansible-lint resources/playbook/roles/bibigrid/tasks/main.yaml
- name: pylint_lint
run: pylint bibigrid
run: pylint bibigrid
4 changes: 2 additions & 2 deletions .pylintrc
@@ -562,8 +562,8 @@ min-public-methods=2
[EXCEPTIONS]

# Exceptions that will emit a warning when caught.
overgeneral-exceptions=BaseException,
Exception
overgeneral-exceptions=builtins.BaseException,
builtins.Exception


[STRING]
159 changes: 90 additions & 69 deletions bibigrid.yaml
@@ -1,105 +1,126 @@
# See https://cloud.denbi.de/wiki/Tutorials/BiBiGrid/ (after update)
# See https://github.com/BiBiServ/bibigrid/blob/master/documentation/markdown/features/configuration.md
# First configuration also holds general cluster information and must include the master.
# All other configurations mustn't include another master, but exactly one vpngtw instead (keys like master).
# For an easy introduction see https://github.com/deNBI/bibigrid_clum
# For more detailed information see https://github.com/BiBiServ/bibigrid/blob/master/documentation/markdown/features/configuration.md

- infrastructure: openstack # former mode. Describes what cloud provider is used (others are not implemented yet)
cloud: openstack # name of clouds.yaml cloud-specification key (which is value to top level key clouds)
- # -- BEGIN: GENERAL CLUSTER INFORMATION --
# The following options configure cluster wide keys
# Modify these according to your requirements

# -- BEGIN: GENERAL CLUSTER INFORMATION --
# sshTimeout: 5 # number of attempts to connect to instances during startup with delay in between
# cloudScheduling:
# sshTimeout: 5 # like sshTimeout but during the on demand scheduling on the running cluster

## sshPublicKeyFiles listed here will be added to access the cluster. A temporary key is created by bibigrid itself.
#sshPublicKeyFiles:
# - [public key one]
## sshPublicKeyFiles listed here will be added to the master's authorized_keys. A temporary key is stored at ~/.config/bibigrid/keys
# sshPublicKeyFiles:
# - [public key one]

## Volumes and snapshots that will be mounted to master
#masterMounts: (optional) # WARNING: will overwrite unidentified filesystems
# - name: [volume name]
# mountPoint: [where to mount to] # (optional)
# masterMounts: DEPRECATED -- see `volumes` key for each instance instead

#nfsShares: /vol/spool/ is automatically created as a nfs
# - [nfsShare one]
# nfsShares: # list of nfs shares. /vol/spool/ is automatically created as an nfs if nfs is true
# - [nfsShare one]

# userRoles: # see ansible_hosts for all options
## Ansible Related
# userRoles: # see ansible_hosts for all 'hosts' options
# - hosts:
# - "master"
# roles: # roles placed in resources/playbook/roles_user
# - name: "resistance_nextflow"
# varsFiles: # (optional)
# - [...]

## Uncomment if you don't want assign a public ip to the master; for internal cluster (Tuebingen).
## If you use a gateway or start a cluster from the cloud, your master does not need a public ip.
# useMasterWithPublicIp: False # defaults True if False no public-ip (floating-ip) will be allocated
# gateway: # if you want to use a gateway for create.
# ip: # IP of gateway to use
# portFunction: 30000 + oct4 # variables are called: oct1.oct2.oct3.oct4

# deleteTmpKeypairAfter: False
# dontUploadCredentials: False
## Only relevant for specific projects (e.g. SimpleVM)
# deleteTmpKeypairAfter: False # warning: if you don't pass a key via sshPublicKeyFiles you lose access!
# dontUploadCredentials: False # warning: enabling this prevents you from scheduling on demand!

## Additional Software
# zabbix: False
# nfs: False
# ide: False # installs a web ide on the master node. A nice way to view your cluster (like Visual Studio Code)

### Slurm Related
# elastic_scheduling: # for large or slow clusters increasing these timeouts might be necessary to avoid failures
# SuspendTimeout: 60 # after SuspendTimeout seconds, slurm allows to power up the node again
# ResumeTimeout: 1200 # if a node doesn't start in ResumeTimeout seconds, the start is considered failed.

# Other keys - these are default False
# Usually Ignored
##localFS: True
##localDNSlookup: True
# cloudScheduling:
# sshTimeout: 5 # like sshTimeout but during the on demand scheduling on the running cluster

#zabbix: True
#nfs: True
#ide: True # A nice way to view your cluster as if you were using Visual Studio Code
# useMasterAsCompute: True

useMasterAsCompute: True
# -- END: GENERAL CLUSTER INFORMATION --

# bootFromVolume: False
# terminateBootVolume: True
# volumeSize: 50
# waitForServices: # existing service name that runs after an instance is launched. BiBiGrid's playbook will wait until service is "stopped" to avoid issues
# -- BEGIN: MASTER CLOUD INFORMATION --
infrastructure: openstack # former mode. Describes what cloud provider is used (others are not implemented yet)
cloud: openstack # name of clouds.yaml cloud-specification key (which is value to top level key clouds)

# waitForServices: # list of existing service names that affect apt. BiBiGrid's playbook will wait until service is "stopped" to avoid issues
# - de.NBI_Bielefeld_environment.service # uncomment for cloud site Bielefeld

# master configuration
## master configuration
masterInstance:
type: # existing type/flavor on your cloud. See launch instance>flavor for options
image: # existing active image on your cloud. Consider using regex to prevent image updates from breaking your running cluster
type: # existing type/flavor from your cloud. See launch instance>flavor for options
image: # existing active image from your cloud. Consider using regex to prevent image updates from breaking your running cluster
# features: # list
# - feature1
# partitions: # list
# bootVolume: None
# bootFromVolume: True
# terminateBootVolume: True
# volumeSize: 50

# -- END: GENERAL CLUSTER INFORMATION --
# - partition1
# bootVolume: # optional
# name: # optional; if you want to boot from a specific volume
# terminate: True # whether the volume is terminated on server termination
# size: 50
# volumes: # optional
# - name: volumeName # empty for temporary volumes
# snapshot: snapshotName # optional; to create volume from a snapshot
# mountPoint: /vol/mountPath
# size: 50
# fstype: ext4 # must support chown
# type: # storage type; available values depend on your location; for Bielefeld CEPH_HDD, CEPH_NVME
## Select up to one of the following options; otherwise temporary is picked
# exists: False # if True looks for existing volume with exact name. count must be 1. Volume is never deleted.
# permanent: False # if True volume is never deleted; overwrites semiPermanent if both are given
# semiPermanent: False # if True volume is only deleted during cluster termination

# fallbackOnOtherImage: False # if True, most similar image by name will be picked. A regex can also be given instead.

# worker configuration
## worker configuration
# workerInstances:
# - type: # existing type/flavor on your cloud. See launch instance>flavor for options
# - type: # existing type/flavor from your cloud. See launch instance>flavor for options
# image: # same as master. Consider using regex to prevent image updates from breaking your running cluster
# count: # any number of workers you would like to create with set type, image combination
# count: 1 # number of workers you would like to create with set type, image combination
# # features: # list
# # partitions: # list
# # bootVolume: None
# # bootFromVolume: True
# # terminateBootVolume: True
# # volumeSize: 50

# Depends on cloud image
sshUser: # for example ubuntu

# Depends on cloud site and project
subnet: # existing subnet on your cloud. See https://openstack.cebitec.uni-bielefeld.de/project/networks/
# or network:

# Uncomment if no full DNS service for started instances is available.
# Currently, the case in Berlin, DKFZ, Heidelberg and Tuebingen.
#localDNSLookup: True

#features: # list

# elastic_scheduling: # for large or slow clusters increasing these timeouts might be necessary to avoid failures
# SuspendTimeout: 60 # after SuspendTimeout seconds, slurm allows to power up the node again
# ResumeTimeout: 1200 # if a node doesn't start in ResumeTimeout seconds, the start is considered failed.
# # partitions: # list of slurm features that all nodes of this group have
# # bootVolume: # optional
# # name: # optional; if you want to boot from a specific volume
# # terminate: True # whether the volume is terminated on server termination
# # size: 50
# # volumes: # optional
# # - name: volumeName # optional
# # snapshot: snapshotName # optional; to create volume from a snapshot
# # mountPoint: /vol/mountPath # optional; not mounted if no path is given
# # size: 50
# # fstype: ext4 # must support chown
# # type: # storage type; available values depend on your location; for Bielefeld CEPH_HDD, CEPH_NVME
# ## Select up to one of the following options; otherwise temporary is picked
# # exists: False # if True looks for existing volume with exact name. count must be 1. Volume is never deleted.
# # permanent: False # if True volume is never deleted; overwrites semiPermanent if both are given
# # semiPermanent: False # if True volume is only deleted during cluster termination

# Depends on image
sshUser: # for example 'ubuntu'

# Depends on project
subnet: # existing subnet from your cloud. See https://openstack.cebitec.uni-bielefeld.de/project/networks/
# network: # only if no subnet is given

# features: # list of slurm features that all nodes of this cloud have
# - feature1

# bootVolume: # optional (cloud wide)
# name: # optional; if you want to boot from a specific volume
# terminate: True # whether the volume is terminated on server termination
# size: 50

#- [next configurations]
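
Note on the volume options documented in bibigrid.yaml above: the comments say to select up to one of `exists`, `permanent`, and `semiPermanent`, that a volume is temporary otherwise, that `permanent` overrides `semiPermanent` when both are given, and that `exists` requires `count` to be 1. The following is a minimal sketch of that precedence, not BiBiGrid's actual implementation. The names `VolumeLifetime` and `resolve_volume_lifetime` are hypothetical, and two details are assumptions not stated in the config comments: that `exists` takes priority over the other two flags, and that `exists` requires a non-empty `name`.

```python
from enum import Enum


class VolumeLifetime(Enum):
    """Hypothetical labels for the four volume lifetimes described in bibigrid.yaml."""
    EXISTING = "exists"               # pre-existing volume found by exact name; never deleted
    PERMANENT = "permanent"           # never deleted
    SEMI_PERMANENT = "semiPermanent"  # deleted only during cluster termination
    TEMPORARY = "temporary"           # default when no flag is set


def resolve_volume_lifetime(volume: dict, count: int = 1) -> VolumeLifetime:
    """Map one entry of a `volumes` list to a lifetime.

    `volume` is a dict as it would be parsed from the YAML entry;
    `count` is the instance count of the group the volume belongs to.
    """
    if volume.get("exists"):
        # Assumption: an exact-name lookup needs a name to look up.
        if not volume.get("name"):
            raise ValueError("exists=True requires a volume name")
        # Documented: count must be 1 when exists is True.
        if count != 1:
            raise ValueError("exists=True requires count == 1")
        return VolumeLifetime.EXISTING
    if volume.get("permanent"):
        # Documented: permanent overrides semiPermanent if both are given.
        return VolumeLifetime.PERMANENT
    if volume.get("semiPermanent"):
        return VolumeLifetime.SEMI_PERMANENT
    return VolumeLifetime.TEMPORARY
```

For example, `resolve_volume_lifetime({"name": "data", "permanent": True, "semiPermanent": True})` yields `VolumeLifetime.PERMANENT`, and an entry with no flags yields `VolumeLifetime.TEMPORARY`.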
Empty file added bibigrid/__init__.py
Empty file.
Empty file added bibigrid/core/__init__.py
Empty file.
Empty file.