Commit
* added exact versions for openstacksdk and python-openstackclient (#413)
* Keep master updated (#401)
* add apt.bi.denbi.de as package source
* update slurm tasks (now uses self-built slurm packages -> v22.05.7, restructure slurm files)
* add documentation to build newer Slurm package
* fixes
* slurmrestd uses openapi/v0.0.38
* Added check_nfs as a non-fatal evaluation (#366)
* Added "." and "-" cases for cid. This allows further rescuing and gives info messages. (#365)
* Added identifier for when no profile is defined to have a distinct identifier.
* Activated vpn setup
* Fixed example command
* Added logging info for file push and commands
* fix slurmrestd configuration
* Implementing wireguard
* update task order (slurm-server)
* fix default user chown settings
* Add an additional mariadb repository for Ubuntu 20.04. Zabbix 7.2 needs MariaDB 10.5 or higher and Focal comes with MariaDB 10.3.
* Extend slurm documentation.
* Extends documentation that BiBiGrid now supports Ubuntu 20.04/22.04 and Debian 11 (fixes #348).
* cleanup
* fix typos in documentation
* Updated wg0
* fix typos in documentation
* add workflow-job to lint python/ansible
* add more output
* add more output
* update runner working directory
* make ansible_lint happy
* rewrite linting workflow add linting dependencies
* fix a typo
* fix pylintrc -> remove ignore-pattern=test/ (not needed, since pylint currently lints bibigrid folder) make pylint happy
* fixing jinja
* changed jinja
* Fixed wrong when clause
* Removed unnecessary comments and added index implementation
* this_peer is now used
* Added configuration reload if necessary
* Moved restart to handlers
* Added missing handler
* Changed to systemd setup
* Fixed nfs
* Fixed a few bugs more to come
* added some defaults
* Added vpn wkr without ip
* removed unnecessary print and fixed typo
* added vpn counter
* debugging bug
* debugging vpnwkr naming is wrong
* Commenting out worker creation
* Fixed bug making first worker and numberless
* fixed number order in deletion
* vpn workers added to instances.yml
* Added key generator for wireguard keys. Fixed minor bugs and added wireguard vpn support except subnets
* Added subnet cidr
* Fixing default value bugs
* added identifier
* added identifier as variable and changed providers to access all flavors
* reformatted
* slurm
* fixed ip assigning
* foreign workers are now included in compute nodes
* Added vpnwkrs to playbook start
* Fixed formatting. Added identifier instead of "Test" for wireguard configuration to improve debugging
* Larger rework of instances file
* fixing bugs caused by aforementioned rework
* fixing bugs caused by aforementioned rework
* fixing bugs caused by aforementioned rework
* fixing bugs caused by aforementioned rework
* cluster_dict no longer needed for ansible configuration
* Changed instances_yml so it allows grouping by cloud
* Renamed to match jinja extension of other files
* instances.master
* instances.master
* removed master from instances list and fixed minor bugs.
* Fixed slicing
* Removed empty vpnworkers list as there can be only one
* Removed no longer needed import
* minor reference fixes regarding master and vpn
* Changed ip to cidr as it should be in nfs exports
* removed faulty space in nfs export entry
* added vpnwkrs to list of nodes to run ansible-playbook on
* added missing vpnwkr
* Set default partition
* Removed default partition as this key doesn't exist
* default if cloud fits
* all credentials will now be stored. Not compatible with save script yet.
* fixed wrong parameter type due to ac handling multiple providers now instead of just one
* Fixed cidr bug
* changed cloud_specification to use identifier
* Fixed master not being filtered out due to buggy detection
* create is now cloud structured but badly implemented (needs asynchronous implementation)
* Removed master = none
* removed faulty bracket.
* Worker start follows cloud structure now
* fixed badly placed assignment of ac_cloud_yaml
* replaced no longer fitting regex by an actual exact check using slurm's hostname resolution
* fixed old variable name leading to hiccups
* Changed nfs exports to add all subnets. Currently not very nice looking, but working.
* Added comments and improved variable names.
* Added delete_server.py routine and connected it to fail.sh (untested).
* Further grouped code and simplified logging.
* fixed minor bugs and added a little bit of logging.
* patch for wait for post-launch services to stop
* Added private_v4 to configuration implementation. Bit dirty.
* Changed nfs for workers back to private_v4. Will crash with vpnwkr as long as security groups are not set correctly.
* Added missing instances
* add dnsmasq support ( #372 ) (#380)
* add dnsmasq support ( #372 )
* extend dnsmasq support ( #372 )
* bugfixes dnsmasq support ( #372 )
* fix ansible syntax add all vpnworker to dnsmasq.hosts ( #372 ) change order of copying clouds.yaml many changes
* Added wireguard_ip
* wireguard_ip increased by 1 to ignore master
* Added a print for private_v4 to symbolize the start of dns entry creation
* Add support for additional vars file : hosts.yml Extend hosts.j2 template to support worker entries
* - extends instances configuration - add worker_userdata template
* - remove unused wireguard-worker.yml - add userdata support (create_server.py) - enable ip forwarding and tcp mtu probing on vpn gateways
* Fix program crash when image is not active (#382)
* Fixed function missing call
* Fixed linter that wasn't troubled before
* Fix ephemeral not working (#385)
* implemented usage of host_vars
* probably solved, but not best solution yet
* changed from host_vars to group_vars to have fewer files doing the same work
* update requirements.txt
* add ConfigurationException
* Provider and its implementation for Openstack get another method to add allowed_addresses to an interface/port
* Remove no longer used functions/code fragments. Add support for extended network configuration when creating a multi-cloud cluster.
* added hybrid cloud
* updating check documentation
* updating check documentation
* updating check documentation
* Removed artefact
* Filled text beyond headings
* Add security group support to provider and its implementing classes.
* Update create action: - support for security groups - slightly restructuring
* add wireguard network to list of allowed addresses
* fix wrong usage of jinja templating
* add usage of security groups when creating a worker
* fix wireguard systemd network configuration
* add firewall rules when running in a multi-cloud setup
* add termination of created security groups fix a bug concerning adding allowed addresses
* fix "allowed addresses" when running with more than 2 providers
* pin openstacksdk to an older version to avoid deprecation warnings.
* Added host file solution for vpnwkrs. Moved wireguard to configuration.
* Added host vars to deletion process and fixed vpnwkrs using group vars instead of host vars bug.
* Fixing structural changes due to merge
* Fixed vpn workers getting lost
* fixed merge bug, improved data structure ansible/jinja
* Removed another bug regarding passing too many arguments.
* removed delay for now
* fixed worker count
* fixed wireguard
* Added reattempt for ConflictException still not perfect.
* Further fixed vpnwkr merge issues
* Adapted command to new group vpn that contains both master and vpnwkr
* Fixed wireguard ip bug
* fixed bug wireguard not installed on vpn-worker
* Changed "local" to "ssh" in order to avoid sudo right issue on master.
* fixed group name?
* adapted timeout to experiences
* fixed group name now using "-" instead of ":"
* fixed userdata being list cause of using readlines instead of read. Now is string.
* group name cannot contain '-' therefore switched to underscores. Maybe change this in the node naming convention as well.
* Make all clouds default
* first draft add ip routes
* Added ip routes to main.yml
* Changed ip route registration to make use of linux network files
* Workers now save the gateway_ip (private_v4 of master or vpnwkr). Also fixed a counting error.
* now using common variable wireguard_common instead of group_var wireguard which is always missing on workers.
* Added rights.
* Disabling netplan and going full networkd
* Disabling cloud network changes after initialization
* Added netplan deactivation
* Fixed connection issues
* Added missing handler and added a task that updates the host file on worker
* Fixed minor bad namings and added missing ".yaml" extension to task file
* Added implementation of "bibiname" a short script that allows node name creation
* fixed name issue regarding slurm user executing ansible. Now master name is determined without user involvement.
* renamed task to "generate bibiname script"
* Adapted scripts to meet hybrid cloud solution
* Added delete_server.py script to bin copied files
* fixed fail and terminate script
* changed terminate script to timeout delete
* fixed minor code issues
* fixed linting issues delete_server.py
* fixed linting issues provider.py
* fixed linting issues startup_tests.py
* fixed linting issues
* fixed linting issues
* fixed typo
* fixed termination ConflictException not caught
* Added basic structure for multi_cloud.md
* Added elixir compute presentation as an additional light-weight read.
* added this file that - in the future - maybe should hold information regarding other projects that are using BiBiGrid. That makes it easier to keep an eye on all applications that might be affected by BiBiGrid's changes.
* Added basic wireguard.md documentation
* fixed grammar
* removed redundant warning
* added dnsmasq documentation structure
* removed encryption
* updated purpose description
* update DNS
* now creating empty hosts.yml file in order to allow ansible execution
* Remove entire vars folder
* fixed path
* changed provider.NAME to provider.cloud_specification['identifier']
* Removed vpnwkr from slurm as it should only be used to establish connection and not for computing
* Decoupled for loop worker ansible host creation from vpnwkr host creation
* fixed vpnwkr still being added to the partition even though the node doesn't exist anymore
* Fixed bug in bibiname.j2 that gave master a number (master never has a number as there is only one)
* removed all references to the instances.master
* removed further references to instances.yml and fixed bugs appearing because of it. Needs rework where master access can be shortened.
* fixed slurm.conf creating NodeName duplicates. Still unordered.
* Added all partition
* Removed instances.yml from create_server.py
* Removed instances.yml from delete_server.py
* removed last remains of instances.yml
* Servers are now created asynchronously.
* Fixed rest error
* Added support for feature in slurm.conf
* Putting features into group_vars
* Updated configuration.md documentation to mention new feature "feature" for instances and configuration.
* Added merge information and updates bibigrid.yml accordingly
* added features to master and workergroups
* fixed features not added as string to slurm.conf
* added missing empty line
* Now a single string instead of a list of features is understood as well.
* Improved cloud_identifier selection and documented the new way: picking clouds.yaml key.
* updated configuration.md and removed many inaccuracies
* changed instances to instance for instance creation as workers are no longer created.
* Improved create.md
* Improved naming of subparagraph
* Fixed indentation, readability and documentation
* Improved logging information.
* Improved logging
* Added warning message when configuration is not list.
* added configuration list parameter
* Added logging when network or subnet couldn't be set
* Improved logging of ConfigurationExceptions
* Improved documentation. Removed unnecessary variable in ide
* Improved documentation.
* Added brief information regarding wireguard and zabbix
* changed vpnwkr to vpngtw
* Fixed security group deletion for not multi-cloud clusters.
---------
Co-authored-by: Jan Krüger <[email protected]>
Co-authored-by: Jan Krüger <[email protected]>
* Added option to generate cluster_id before create process
* Added rest api prototype
* reworked naming convention and added terminate command. Added basic replies.
* Converted global LOG to class attribute self.log to enable different logs per thread
* Reverted logging to global logging because using redirect might be more feasible
* Using contextlib to redirect prints
* Started rewriting prints to logging and make logging not global and thread-safe
* Fixed list_clusters needing log now.
* updated terminate.py and occurrences to local logging.
* changed logging to local for ansible configurator
* unfinished: started localizing logging in logging_path_handler.py
* updating ssh_handler.py now logging locally (and affected modules)
* updating ssh_handler.py now logging locally (and affected modules)
* improved variable names
* updated provider_handler.py to local logging
* changed global logging to local logging
* changed global logging to local logging
* Fixed many small logging mistakes and changed validation logging to local
* Fixed formatting
* Cleaned startup.py
* Fixed logging error and made use of logging for all commands
* Added cpu based worker selection
* Added new logging option 42 for "PRINT"
* Improved logger and added an explanation implementation
* Changed info to post and contains list now instead of single element
* Switched to main method.
* fixed many small things regarding log, added gateway mode for ssh_handler.py and fixed rest added get_log option
* Enabled multiple subnets for when network is given. Not fully operational yet.
* Fixed crash causing bug when using network instead of subnet
* Removed unnecessary debug warning
* made print nicer
* further fixed using network instead of subnet
* fixed issues regarding port calculation and gateway_ip
* Added check whether a cluster is running
* removed prints
* removed prints
* Added comments for docs
* Added pydantic base models
* Capitalized names
* added option to terminate with assume_true
* removed as docs fulfills this purpose now
* added option to not upload Credentials
* fixed minor bug causing bibigrid not finding private keys.
* removed print
* fixed name not being capitalized (ansible)
* fixed old linting error
* fixed old linting error
* implemented gateway with portFunction using sympy
* using gateway automatically deactivates public ip usage now.
* updated documentation
* update is now able to use gateway if given.
* ide is now able to use gateway if given.
* new version correctly integrated
* removed unnecessary add to stdout (already standard)
* removed unnecessary add to stdout (already standard) from startup_rest.py
* if regex is found, check will succeed now.
* fixed ssh not using gateway
---------
Co-authored-by: Jan Krüger <[email protected]>
Co-authored-by: Jan Krüger <[email protected]>