Provisioner KnowledgeBase
[[TOC]]
- Before you begin provisioning, make sure you have thoroughly reviewed the files that are expected to be changed:
  - /opt/seagate/eos-prvsnr/pillar/components/release.sls
  - /opt/seagate/eos-prvsnr/pillar/components/cluster.sls
These files require customization on every node for node- and release-specific details. This customization cannot be skipped.
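For example, a quick way to surface the node-specific values that still need editing (a sketch; the grep patterns are assumptions and the actual key names depend on the release):
[root@ees-node1 ~]# grep -n -i -E 'hostname|interface' /opt/seagate/eos-prvsnr/pillar/components/cluster.sls
[root@ees-node1 ~]# less /opt/seagate/eos-prvsnr/pillar/components/release.sls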
Issue. ModuleNotFoundError: No module named 's3iamcli' when running s3iamcli
[root@ees-node1 ees-prvsnr]# s3iamcli CreateAccount -n cloud -e [email protected]
Traceback (most recent call last):
File "/bin/s3iamcli", line 5, in <module>
from s3iamcli.main import S3IamCli
ModuleNotFoundError: No module named 's3iamcli'
[root@ees-node1 ees-prvsnr]# ls -l /usr/bin/python3
lrwxrwxrwx. 1 root root 9 Jun 19 00:22 /usr/bin/python3 -> python3.6
[root@ees-node1 ees-prvsnr]# ls -l /usr/bin/python
lrwxrwxrwx. 1 root root 7 Mar 22 02:04 /usr/bin/python -> python2
[root@ees-node1 ees-prvsnr]# ls -l /usr/bin/python*
lrwxrwxrwx. 1 root root 7 Mar 22 02:04 /usr/bin/python -> python2
lrwxrwxrwx. 1 root root 9 Mar 22 02:04 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root 7216 Apr 11 2018 /usr/bin/python2.7
lrwxrwxrwx. 1 root root 9 Jun 19 00:22 /usr/bin/python3 -> python3.6
-rwxr-xr-x. 2 root root 11384 Apr 7 22:19 /usr/bin/python3.4
-rwxr-xr-x. 2 root root 11384 Apr 7 22:19 /usr/bin/python3.4m
lrwxrwxrwx. 1 root root 18 Jun 19 00:22 /usr/bin/python36 -> /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11408 Apr 25 17:05 /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11408 Apr 25 17:05 /usr/bin/python3.6m
The s3iamcli module appears to be installed for Python 3.4, while /usr/bin/python3 points to python3.6. Re-point the symlink:
[root@ees-node1 ees-prvsnr]# rm /usr/bin/python3
rm: remove symbolic link ‘/usr/bin/python3’? y
[root@ees-node1 ees-prvsnr]# ln -s /usr/bin/python3.4 /usr/bin/python3
[root@ees-node1 ees-prvsnr]# ls -l /usr/bin/python*
lrwxrwxrwx. 1 root root 7 Mar 22 02:04 /usr/bin/python -> python2
lrwxrwxrwx. 1 root root 9 Mar 22 02:04 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root 7216 Apr 11 2018 /usr/bin/python2.7
lrwxrwxrwx. 1 root root 18 Jun 19 02:26 /usr/bin/python3 -> /usr/bin/python3.4
-rwxr-xr-x. 2 root root 11384 Apr 7 22:19 /usr/bin/python3.4
-rwxr-xr-x. 2 root root 11384 Apr 7 22:19 /usr/bin/python3.4m
lrwxrwxrwx. 1 root root 18 Jun 19 00:22 /usr/bin/python36 -> /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11408 Apr 25 17:05 /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11408 Apr 25 17:05 /usr/bin/python3.6m
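To verify the fix, check that the module now resolves under the re-pointed interpreter (assuming s3iamcli was installed against Python 3.4; a silent exit means the import succeeded):
[root@ees-node1 ees-prvsnr]# python3 -c "import s3iamcli"
[root@ees-node1 ees-prvsnr]# s3iamcli CreateAccount -n cloud -e [email protected]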
Issue. Error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:600) while CreateAccount
[root@ees-node1 .sgs3iamcli]# s3iamcli createaccount -n s3user1 -e [email protected]
Enter Ldap User Id: sgiamadmin
Enter Ldap password:
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:600)
- Modify the file /etc/haproxy/haproxy.cfg:
#----------------------------------------------------------------------
# BackEnd roundrobin as balance algorithm for s3 auth server
#----------------------------------------------------------------------
backend s3-auth
balance static-rr #Balance algorithm
server s3authserver-instance1 0.0.0.0:9086 check ssl verify required ca-file /etc/ssl/stx-s3/s3auth/s3authserver.crt
# server s3authserver-instance1 0.0.0.0:9085 check # s3 auth server instance 1
# server s3authserver-instance2 0.0.0.0:9086 check # s3 auth server instance 2
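Before restarting, the edited configuration can be validated (-c only checks the file without starting haproxy):
[root@ees-node1 ~]# haproxy -c -f /etc/haproxy/haproxy.cfg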
- Restart the haproxy service
systemctl restart haproxy
Issue. Vagrant doesn't rsync, or files on the VM are not in sync with files on the host after vagrant up or vagrant rsync
The issue lies with the VM in VirtualBox not being destroyed correctly. To resolve this issue:
- Open Virtualbox
- Locate the VM in question
- Select VM -> Right click -> Select Remove
- Select Delete all files
- Select Virtual Media Manager
- Delete any suspicious vdi entries or entries with errors
- Close Virtualbox
- Attempt vagrant up again and verify the results.
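The same cleanup can also be done from the command line instead of the VirtualBox GUI (the VM name "eosnode-1" is a placeholder; substitute the actual stale VM):
$ VBoxManage list vms                          # find the stale VM's name/UUID
$ VBoxManage unregistervm "eosnode-1" --delete # remove it along with its disk files
$ vagrant global-status --prune                # drop stale entries from vagrant's cache
$ vagrant up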
- Set the "S3_REUSEPORT" parameter to "true" in the /opt/seagate/s3/config/s3config.yaml file.
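The relevant entry in s3config.yaml would look roughly like this (the S3_SERVER_CONFIG nesting is an assumption and may differ between releases):
S3_SERVER_CONFIG:
   S3_REUSEPORT: true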
Issue. systemctl status network.service shows the service in a failed state although it is listed in the systemctl status tree.
Check displayed IP addresses
[root@eosnode-1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:82:42:cd brd ff:ff:ff:ff:ff:ff
    inet 10.237.128.210/20 brd 10.237.143.255 scope global dynamic ens33
       valid_lft 1021360sec preferred_lft 1021360sec
    inet6 fe80::20c:29ff:fe82:42cd/64 scope link
       valid_lft forever preferred_lft forever
3: ens34: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:0c:29:82:42:d7 brd ff:ff:ff:ff:ff:ff
4: ens35: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:0c:29:82:42:e1 brd ff:ff:ff:ff:ff:ff
5: data0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 9000 qdisc noqueue state DOWN group default qlen 1000
    link/ether de:f8:8e:15:48:ce brd ff:ff:ff:ff:ff:ff
    inet 172.19.10.101/16 brd 172.19.255.255 scope global data0
       valid_lft forever preferred_lft forever
6: mgmt0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether b6:98:ca:f6:f7:f2 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.101/16 brd 172.16.255.255 scope global mgmt0
       valid_lft forever preferred_lft forever
Check network interface configurations available
[root@eosnode-1 ~]# ls -1 /etc/sysconfig/network-scripts/ifcfg*
/etc/sysconfig/network-scripts/ifcfg-data0
/etc/sysconfig/network-scripts/ifcfg-enp0s3 <-- Unwanted
/etc/sysconfig/network-scripts/ifcfg-enp0s8 <-- Unwanted
/etc/sysconfig/network-scripts/ifcfg-enp0s9 <-- Unwanted
/etc/sysconfig/network-scripts/ifcfg-ens33
/etc/sysconfig/network-scripts/ifcfg-ens36 <-- Requires correction
/etc/sysconfig/network-scripts/ifcfg-ens37 <-- Requires correction
/etc/sysconfig/network-scripts/ifcfg-lo
/etc/sysconfig/network-scripts/ifcfg-mgmt0
Remove the mismatched (extra) ones
[root@eosnode-1 ~]# rm -rf /etc/sysconfig/network-scripts/ifcfg-enp0s*
Correct the rest
[root@eosnode-1 ~]# mv /etc/sysconfig/network-scripts/ifcfg-ens36 /etc/sysconfig/network-scripts/ifcfg-ens34
[root@eosnode-1 ~]# mv /etc/sysconfig/network-scripts/ifcfg-ens37 /etc/sysconfig/network-scripts/ifcfg-ens35
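Renaming the files alone may not be enough: the DEVICE (and NAME, if present) fields inside each renamed file must also be updated to the new interface name, otherwise the network service keeps looking for the old device (as in the "Device ens36 does not seem to be present" error below). Assuming the old names only appear as interface references:
[root@eosnode-1 ~]# sed -i 's/ens36/ens34/g' /etc/sysconfig/network-scripts/ifcfg-ens34
[root@eosnode-1 ~]# sed -i 's/ens37/ens35/g' /etc/sysconfig/network-scripts/ifcfg-ens35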
Reboot node
[root@eosnode-1 ~]# shutdown -r now
Check network service
[root@eosnode-1 ~]# systemctl status network -l
● network.service - LSB: Bring up/down networking
Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
Active: active (running) since Thu 2019-09-26 23:27:19 IST; 2min 25s ago
Docs: man:systemd-sysv-generator(8)
Process: 913 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/network.service
└─1263 /sbin/dhclient -q -lf /var/lib/dhclient/dhclient-ens33.lease -pf /var/run/dhclient-ens33.pid -H eosnode-1 ens33
Sep 26 23:27:12 eosnode-1 network[913]: Bringing up interface ens33:
Sep 26 23:27:12 eosnode-1 dhclient[1212]: DHCPREQUEST on ens33 to 255.255.255.255 port 67 (xid=0x36816470)
Sep 26 23:27:12 eosnode-1 dhclient[1212]: DHCPACK from 10.237.128.1 (xid=0x36816470)
Sep 26 23:27:14 eosnode-1 dhclient[1212]: bound to 10.237.128.210 -- renewal in 564149 seconds.
Sep 26 23:27:14 eosnode-1 network[913]: Determining IP information for ens33... done.
Sep 26 23:27:14 eosnode-1 network[913]: [ OK ]
Sep 26 23:27:15 eosnode-1 network[913]: Bringing up interface mgmt0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device ens36 does not seem to be present, delaying initialization.
Sep 26 23:27:15 eosnode-1 network[913]: WARN : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start slave device ifcfg-ens34 for master mgmt0.
Sep 26 23:27:19 eosnode-1 network[913]: [ OK ]
Sep 26 23:27:19 eosnode-1 systemd[1]: Started LSB: Bring up/down networking.
- 'packer' can source from an existing VBox machine, which allows using VBoxManage calls as part of the packer spec to adjust the machine. Thus we may perform initial hardware and software configuration that way to provide a base level for the upper stacks of the env (see the sketch after this list).
- 'packer' can output to vagrant boxes, which can then be used as a source for further 'packer' builds. This means we can construct the upper levels of the env in an iterative way.
- Note: a vagrant box as a source for packer doesn't allow applying VBoxManage calls directly. Options:
  - prepare all common HW-specific things at the base level, as described above
  - use runtime adjustment of running machines via direct VBoxManage calls if some isolated changes are needed per machine and they are light enough (e.g. creating a medium and attaching it to some controller should work fast)
  - use a temporary layer of an active VBox machine to apply the initial solution if we need some general env adjustment at a non-base level (valuable for a set of upper levels) and it is quite expensive (e.g. CPU- and time-bound) to apply during runtime
- packer can build Docker images as well
- packer uses the term "provisioner" for the configuration engines that apply changes to the environment, e.g. shell, salt, ansible and [many others|https://www.packer.io/docs/provisioners/index.html]. Provisioners are builder-engine (docker, vagrant, virtualbox) agnostic, so we can use the same scripts for all providers.
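A minimal packer template sketch combining the points above (all names and values are illustrative; the vboxmanage array issues direct VBoxManage calls against the build VM):
{
  "builders": [{
    "type": "virtualbox-ovf",
    "source_path": "base-machine.ovf",
    "vboxmanage": [["modifyvm", "{{.Name}}", "--memory", "2048"]],
    "ssh_username": "vagrant",
    "ssh_password": "vagrant",
    "shutdown_command": "sudo shutdown -P now"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": ["echo base level configured"]
  }],
  "post-processors": [{
    "type": "vagrant",
    "output": "base-level.box"
  }]
}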
NOTE: vagrant snapshot push/pop [--no-start] [--no-delete]
may provide a good way for a fast env reset if a machine is used by a sequence of isolated tests (see the example below).
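For example, wrapped around a single isolated test (a sketch; the machine is assumed to be already running under vagrant, and the test script is hypothetical):
$ vagrant snapshot push            # save the current machine state
$ ./run_isolated_test.sh           # test that may dirty the machine
$ vagrant snapshot pop --no-delete # roll back and keep the snapshot for reuse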
- it uses the ssh configuration that comes from Vagrantfiles (it merges a set of them)
- it also tries to detect some things itself (like the listening port and host of the machine)
- it also considers some predefined things (like the vagrant insecure key)
- Usually base boxes come with the insecure key installed so they can be easily accessed initially during "vagrant up", or e.g. when packer is asked to use an already running machine as a source
- For non-public boxes it's likely that other, more secure keys are inserted instead; also, vagrant itself by default replaces the insecure key with a randomly generated one during "vagrant up" if it detects it (there is a good explanation of this as well), and this is where most of the issues come into play
- Also some words regarding the ssh port: by default machines are started in a private NAT network and internal service ports are forwarded to the host, so e.g. sshd might be reachable at "127.0.0.1:2222". Usually vagrant/packer are fine with that, but issues occur here as well...
- packer/vagrant can't connect to the machine because vagrant inserted a randomly generated secure key: do not allow vagrant to insert that key, and insert one we manage ourselves (related parameters: config.ssh.insert_key = false and config.ssh.private_key_path; see the Vagrantfile fragment after this list)
- The packer vagrant builder seems to ignore the ssh_private_key_file parameter (or maybe I missed something); thus for cases when we set up a custom secure key (instead of the insecure one) in the source box, we can work around that by creating a Vagrantfile template that gets packaged with the box (related parameters: output_vagrantfile and vagrantfile_template)
- packer's virtualbox-vm builder can't connect since it can't determine the right connection port: the reason might be the ssh_skip_nat_mapping option - if it's set to true, then packer requires the current ssh forwarded port on localhost to be set via ssh_port (see the builder sketch after this list)
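A minimal Vagrantfile fragment for the custom-key case above (the key path is a placeholder for a key we manage ourselves):
Vagrant.configure("2") do |config|
  config.ssh.insert_key = false                  # don't let vagrant swap in a random key
  config.ssh.private_key_path = "~/.ssh/our_key" # hypothetical path to our managed key
end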
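And a sketch of the relevant virtualbox-vm builder fields for the port issue (vm_name and the port are examples; ssh_port must match the port actually forwarded on the host):
{
  "type": "virtualbox-vm",
  "vm_name": "existing-machine",
  "ssh_skip_nat_mapping": true,
  "ssh_port": 2222,
  "ssh_username": "vagrant"
}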
SSPL
$ journalctl -xefu sspl-ll.service
S3server
$ tail -f /var/log/seagate/s3/s3server.INFO
MERO
$ tail -f /var/log/halon.decision.log
Note:
Check m0reportbug
$ m0reportbug -b