423 rest api prototype #437

Merged: 60 commits, merged on Sep 21, 2023

XaverStiensmeier (Contributor) commented Sep 21, 2023

This update includes:

Because the REST API hasn't been tested yet, we can't know whether it lacks important features that SimpleVM needs. As soon as the SimpleVM team finds time to review this prototype implementation, we can make changes as needed. The current implementation may therefore change soon, so using this API in releases is not advised.

XaverStiensmeier and others added 30 commits May 25, 2023 11:07
* add apt.bi.denbi.de as package source

* update slurm tasks (now uses self-built slurm packages -> v22.05.7, restructure slurm files)

* add documentation on building newer Slurm packages

* fixes

* slurmrestd uses openapi/v0.0.38

* Added check_nfs as a non-fatal evaluation (#366)

* Added "." and "-" cases for cid. This allows further rescuing and gives info messages. (#365)

* Added identifier for when no profile is defined to have a distinct identifier.

* Activated vpn setup

* Fixed example command

* Added logging info for file push and commands

* fix slurmrestd configuration

* Implementing wireguard

* update task order (slurm-server)

* fix default user chown settings

* Add an additional mariadb repository for Ubuntu 20.04. Zabbix 7.2 needs at least MariaDB 10.5 or higher and Focal comes with MariaDB 10.3.

* Extend slurm documentation.

* Extend documentation to state that BiBiGrid now supports Ubuntu 20.04/22.04 and Debian 11 (fixes #348).

* cleanup

* fix typos in documentation

* Updated wg0

* fix typos in documentation

* add workflow-job to lint python/ansible

* add more output

* add more output

* update runner working directory

* make ansible_lint happy

* rewrite linting workflow
add linting dependencies

* fix a typo

* fix pylintrc -> remove ignore-pattern=test/ (not needed, since pylint currently lints bibigrid folder)
make pylint happy

* fixing jinja

* changed jinja

* Fixed wrong when clause

* Removed unnecessary comments and added index implementation

* this_peer is now used

* Added configuration reload if necessary

* Moved restart to handlers

* Added missing handler

* Changed to systemd setup

* Fixed nfs

* Fixed a few bugs; more to come

* added some defaults

* Added vpn wkr without ip

* removed unnecessary print and fixed typo

* added vpn counter

* debugging bug

* debugging vpnwkr naming is wrong

* Commenting out worker creation

* Fixed bug making first worker and numberless

* fixed number order in deletion

* vpn workers added to instances.yml

* Added key generator for wireguard keys
Fixed minor bugs and added wireguard vpn support except subnets

* Added subnet cidr

* Fixing default value bugs

* added identifier

* added identifier as variable and changed providers to access all flavors

* reformatted

* slurm

* fixed ip assigning

* foreign workers are now included in compute nodes

* Added vpnwkrs to playbook start

* Fixed formatting. Added identifier instead of "Test" for wireguard configuration to improve debugging

* Larger rework of instances file

* fixing bugs caused by aforementioned rework

* fixing bugs caused by aforementioned rework

* fixing bugs caused by aforementioned rework

* fixing bugs caused by aforementioned rework

* cluster_dict no longer needed for ansible configuration

* Changed instances_yml so it allows grouping by cloud

* Renamed to match jinja extension of other files

* instances.master

* instances.master

* removed master from instances list and fixed minor bugs.

* Fixed slicing

* Removed empty vpnworkers list as there can be only one

* Removed no longer needed import

* minor reference fixes regarding master and vpn

* Changed ip to cidr as it should be in nfs exports

* removed faulty space in nfs export entry

* added vpnwkrs to list of nodes to run ansible-playbook on

* added missing vpnwkr

* Set default partition

* Removed default partition as this key doesn't exist

* default if cloud fits

* all credentials will now be stored. Not compatible with save script yet.

* fixed wrong parameter type due to ac handling multiple providers now instead of just one

* Fixed cidr bug

* changed cloud_specification to use identifier

* Fixed master not being filtered out due to buggy detection

* create is now cloud structured but badly implemented (needs asynchronous implementation)

* Removed master = none

* removed faulty bracket.

* Worker start follows cloud structure now

* fixed badly placed assignment of ac_cloud_yaml

* replaced no longer fitting regex by an actual exact check using slurm's hostname resolution

* fixed old variable name leading to hiccups

* Changed nfs exports to add all subnets. Currently not very nice looking, but working.

* Added comments and improved variable names.

* Added delete_server.py routine and connected it to fail.sh (untested).

* Further grouped code and simplified logging.

* fixed minor bugs and added a little bit of logging.

* patch: wait for post-launch services to stop

* Added private_v4 to configuration implementation. Bit dirty.

* Changed nfs for workers back to private_v4. Will crash with vpnwkr as long as security groups are not set correctly.

* Added missing instances

* add dnsmasq support ( #372 ) (#380)

* add dnsmasq support ( #372 )

* extend dnsmasq support ( #372 )

* bugfixes dnsmasq support ( #372 )

* fix ansible syntax
add all vpnworker to dnsmasq.hosts ( #372 )
change order of copying clouds.yaml
many changes

* Added wireguard_ip

* wireguard_ip increased by 1 to ignore master

* Added a print for private_v4 to symbolize the start of dns entry creation

* Add support for additional vars file: hosts.yml
Extend hosts.j2 template to support worker entries

* - extends instances configuration
- add worker_userdata template

* - remove unused wireguard-worker.yml
- add userdata support (create_server.py)
- enable ip forwarding and tcp mtu probing on vpn gateways

* Fix program crash when image is not active (#382)

* Fixed missing function call

* Fixed linter issues that weren't flagged before

* Fix ephemeral not working (#385)

* implemented usage of host_vars

* probably solved, but not best solution yet

* changed from host_vars to group_vars to have fewer files doing the same work

* update requirements.txt

* add ConfigurationException

* Provider and its implementation for OpenStack get another method to add allowed_addresses to an interface/port

* Remove functions/code fragments that are no longer needed. Add support for extended network configuration when creating a multi-cloud cluster.

* added hybrid cloud

* updating check documentation

* updating check documentation

* updating check documentation

* Removed artefact

* Filled text beyond headings

* Add security group support to provider and its implementing classes.

* Update create action:
- support for security groups
- slightly restructuring

* add wireguard network to list of allowed addresses

* fix wrong usage of jinja templating

* add usage of security groups when creating a worker

* fix wireguard systemd network configuration

* add firewall rules when running in a multi-cloud setup

* add termination of created security groups
fix a bug concerning adding allowed addresses

* fix "allowed addresses" when running with more than 2 providers

* pin openstacksdk to an older version to avoid deprecation warnings.

* Added host file solution for vpnwkrs. Moved wireguard to configuration.

* Added host vars to deletion process and fixed vpnwkrs using group vars instead of host vars bug.

* Fixing structural changes due to merge

* Fixed vpn workers getting lost

* fixed merge bug, improved data structure ansible/jinja

* Removed another bug regarding passing too many arguments.

* removed delay for now

* fixed worker count

* fixed wireguard

* Added reattempt for ConflictException; still not perfect.

* Further fixed vpnwkr merge issues

* Adapted command to new group vpn that contains both master and vpnwkr

* Fixed wireguard ip bug

* fixed bug wireguard not installed on vpn-worker

* Changed "local" to "ssh" in order to avoid a sudo rights issue on master.

* fixed group name?

* adapted timeout based on experience

* fixed group name now using "-" instead of ":"

* fixed userdata being a list because readlines was used instead of read. It is now a string.

* group name cannot contain '-' therefore switched to underscores. Maybe change this in the node naming convention as well.

* Make all clouds default

* first draft add ip routes

* Added ip routes to main.yml

* Changed ip route registration to make use of linux network files

* Workers now save the gateway_ip (private_v4 of master or vpnwkr). Also fixed a counting error.

* now using common variable wireguard_common instead of group_var wireguard which is always missing on workers.

* Added rights.

* Disabling netplan and going full networkd

* Disabling cloud network changes after initialization

* Added netplan deactivation

* Fixed connection issues

* Added missing handler and added a task that updates the host file on worker

* Fixed minor bad namings and added missing ".yaml" extension to task file

* Added implementation of "bibiname", a short script that allows node name creation

* fixed name issue regarding slurm user executing ansible. Now master name is determined without user involvement.

* renamed task to "generate bibiname script"

* Adapted scripts to meet hybrid cloud solution

* Added delete_server.py script to bin copied files

* fixed fail and terminate script

* changed terminate script to timeout delete

* fixed minor code issues

* fixed linting issues delete_server.py

* fixed linting issues provider.py

* fixed linting issues startup_tests.py

* fixed linting issues

* fixed linting issues

* fixed typo

* fixed termination ConflictException not caught

* Added basic structure for multi_cloud.md

* Added elixir compute presentation as an additional light-weight read.

* added this file, which in the future might hold information about other projects that use BiBiGrid. That makes it easier to keep an eye on all applications that might be affected by changes to BiBiGrid.

* Added basic wireguard.md documentation

* fixed grammar

* removed redundant warning

* added dnsmasq documentation structure

* removed encryption

* updated purpose description

* update DNS

* now creating empty hosts.yml file in order to allow ansible execution

* Remove entire vars folder

* fixed path

* changed provider.NAME to provider.cloud_specification['identifier']

* Removed vpnwkr from slurm as it should only be used to establish connection and not for computing

* Decoupled for loop worker ansible host creation from vpnwkr host creation

* fixed vpnwkr still being added to the partition even though the node doesn't exist anymore

* Fixed bug in bibiname.j2 that gave master a number (master never has a number as there is only one)

* removed all references to the instances.master

* removed further references to instances.yml and fixed bugs appearing because of it. Needs rework where master access can be shortened.

* fixed slurm.conf creating NodeName duplicates. Still unordered.

* Added all partition

* Removed instances.yml from create_server.py

* Removed instances.yml from delete_server.py

* removed last remains of instance.yml

* Servers are now created asynchronously.

* Fixed rest error

* Added support for feature in slurm.conf

* Putting features into group_vars

* Updated configuration.md documentation to mention new feature "feature" for instances and configuration.

* Added merge information and updated bibigrid.yml accordingly

* added features to master and workergroups

* fixed features not added as string to slurm.conf

* added missing empty line

* Now a single string instead of a list of features is understood as well.

* Improved cloud_identifier selection and documented the new way: picking clouds.yaml key.

* updated configuration.md and removed many inaccuracies

* changed instances to instance for instance creation as workers are no longer created.

* Improved create.md

* Improved naming of subparagraph

* Fixed indentation, readability and documentation

* Improved logging information.

* Improved logging

* Added warning message when configuration is not list.

* added configuration list parameter

* Added logging when network or subnet couldn't be set

* Improved logging of ConfigurationExceptions

* Improved documentation. Removed unnecessary variable in ide

* Improved documentation.

* Added brief information regarding wireguard and zabbix

* changed vpnwkr to vpngtw

* Fixed security group deletion for non-multi-cloud clusters.
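Several commits concern node naming: "bibiname" creates node names, the master "never has a number as there is only one", vpnwkr was renamed to vpngtw, and Ansible group names use underscores because a group name "cannot contain '-'". The following is a minimal Python sketch of such naming rules, not BiBiGrid's actual bibiname script. The `bibigrid-<role>-<cluster_id>[-<n>]` pattern and the helper names `bibiname` and `to_group_name` are assumptions for illustration.

```python
from typing import Optional

# Assumed naming pattern: bibigrid-<role>-<cluster_id>[-<n>] (hypothetical).
PREFIX = "bibigrid"
NUMBERED_ROLES = ("worker", "vpngtw")


def bibiname(role: str, cluster_id: str, number: Optional[int] = None) -> str:
    """Build a node name (hypothetical helper, not BiBiGrid's bibiname script)."""
    if role == "master":
        # A cluster has exactly one master, so its name carries no number.
        return f"{PREFIX}-master-{cluster_id}"
    if role not in NUMBERED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if number is None or number < 0:
        raise ValueError(f"role {role!r} needs a non-negative node number")
    return f"{PREFIX}-{role}-{cluster_id}-{number}"


def to_group_name(name: str) -> str:
    """Ansible group names should not contain '-', so replace it with '_'."""
    return name.replace("-", "_")


if __name__ == "__main__":
    print(bibiname("master", "abc123"))                      # bibigrid-master-abc123
    print(bibiname("worker", "abc123", 0))                   # bibigrid-worker-abc123-0
    print(to_group_name(bibiname("vpngtw", "abc123", 0)))    # bibigrid_vpngtw_abc123_0
```

Keeping the master unnumbered in one place means templates never have to special-case index 0 for it.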
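The slurm.conf feature commits ("Added support for feature in slurm.conf", "fixed features not added as string to slurm.conf", "Now a single string instead of a list of features is understood as well.") amount to normalizing the configured value and joining it into Slurm's comma-separated `Features=` node parameter. A minimal sketch, assuming hypothetical helper names and a simplified NodeName line that only carries `CPUs` and `RealMemory`:

```python
from typing import Iterable, List, Optional, Union

# A feature entry may be missing, a single string, or a list of strings.
FeatureSpec = Optional[Union[str, Iterable[str]]]


def normalize_features(features: FeatureSpec) -> List[str]:
    """Accept None, one feature as a string, or a list of features."""
    if features is None:
        return []
    if isinstance(features, str):
        return [features]
    return [str(feature) for feature in features]


def slurm_node_line(node_name: str, cpus: int, real_memory_mb: int,
                    features: FeatureSpec = None) -> str:
    """Render a simplified slurm.conf NodeName line (hypothetical helper)."""
    parts = [f"NodeName={node_name}", f"CPUs={cpus}", f"RealMemory={real_memory_mb}"]
    feature_list = normalize_features(features)
    if feature_list:
        # slurm.conf expects one comma-separated string, not a Python list.
        parts.append("Features=" + ",".join(feature_list))
    return " ".join(parts)
```

For example, `slurm_node_line("worker-0", 4, 16000, ["gpu", "ssd"])` returns `NodeName=worker-0 CPUs=4 RealMemory=16000 Features=gpu,ssd`, and passing `"gpu"` alone yields `Features=gpu`.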
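The NFS commits ("Changed ip to cidr as it should be in nfs exports", "removed faulty space in nfs export entry", and exporting to all subnets) follow the exports(5) format: each client is written as a CIDR immediately followed by its parenthesized options. With a space in between, the options would apply to every host rather than to that subnet. A minimal sketch of rendering one such line; the helper name and default option set are assumptions, not BiBiGrid's configuration:

```python
import ipaddress
from typing import Iterable, List

# Assumed default options; not necessarily what BiBiGrid writes.
DEFAULT_OPTIONS = "rw,sync,no_root_squash,no_subtree_check"


def nfs_export_line(path: str, subnets: Iterable[str],
                    options: str = DEFAULT_OPTIONS) -> str:
    """Render one /etc/exports line that shares `path` with every subnet."""
    clients: List[str] = []
    for subnet in subnets:
        # strict=False turns a host address such as 10.0.1.5/24 into 10.0.1.0/24.
        network = ipaddress.ip_network(subnet, strict=False)
        # No space between the client and "(options)".
        client = f"{network.with_prefixlen}({options})"
        if client not in clients:  # skip duplicate subnets
            clients.append(client)
    if not clients:
        raise ValueError("at least one subnet is required")
    return " ".join([path] + clients)
```

For example, `nfs_export_line("/vol/spool", ["10.0.0.0/24", "192.168.20.5/24"], options="rw")` returns `/vol/spool 10.0.0.0/24(rw) 192.168.20.0/24(rw)`.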
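Two commits deal with OpenStack conflict errors: a reattempt for ConflictException, and catching ConflictException during termination. A generic retry loop is one way to handle such transient HTTP 409 errors. In this sketch, `ConflictException` is a local stand-in for what is presumably openstacksdk's `openstack.exceptions.ConflictException`; the helper name, attempt count, and linear backoff are arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConflictException(Exception):
    """Local stand-in for openstack.exceptions.ConflictException (HTTP 409)."""


def retry_on_conflict(action: Callable[[], T], attempts: int = 5,
                      delay_s: float = 2.0) -> T:
    """Call `action`, retrying on ConflictException with linear backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConflictException:
            if attempt == attempts:
                raise  # give up and surface the last conflict
            time.sleep(delay_s * attempt)  # wait longer after each conflict
    raise AssertionError("unreachable")
```

With an openstacksdk `Connection` named `conn`, a deletion could be wrapped as `retry_on_conflict(lambda: conn.delete_server(server_id))`; other exceptions propagate immediately.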

---------

Co-authored-by: Jan Krüger

jkrue merged commit af63c6d into dev on Sep 21, 2023
1 check passed
jkrue deleted the 423-rest-api-prototype branch on September 21, 2023 at 19:38