Home
The load balancing service dynamically manages the list of machines behind a given DNS alias. Presenting several nodes behind a single name allows services to scale and improves availability. It is one of the technologies enabling the deployment of large-scale applications on cloud resources.
The Domain Name System (DNS) is a naming system for computers and other resources. DNS is essential to the Internet: it is an Internet-standard protocol that allows names to be used instead of IP addresses. Load balancing is an advanced function that DNS can provide to distribute requests across several machines running the same service under the same DNS name.
The LBD DNS Load Balancer has been developed as a cost-effective way to handle applications that accept the DNS timing constraints and do not require affinity (also known as persistence or sticky sessions). It is currently (March 2019) used by 761 services on the site, with two small VMs acting as LBD master and slave. The alias member nodes run a Simple Network Management Protocol (SNMP) agent that communicates with the lbclient program. The purpose of the lbclient is to provide a load metric number, which the LBD server uses to determine the subset of nodes whose IP addresses are presented behind the alias. Here is an overview of the service:
The LBD server periodically gets a load metric from the lbclient on the alias member nodes using SNMP. It uses this information to update the A (IPv4) and AAAA (IPv6) records of the DNS delegated zone that corresponds to the alias, using Dynamic DNS (see RFC 2136). The period ("polling_interval") is 5 minutes by default.
The LBD slave behaves like the LBD master, i.e. it periodically gets a load metric from the alias member nodes. However, it only updates the DNS delegated zone when it loses contact with the LBD master. Contact is verified by trying to get a file containing a "heartbeat" from a web server on the LBD master.
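The slave's decision can be illustrated with a minimal Python sketch. This is not the LBD implementation: the heartbeat URL, the timeout, and the rule that an empty heartbeat file counts as a lost master are all assumptions made for illustration.

```python
import urllib.request
from typing import Callable

# Hypothetical URL; the real heartbeat location is not given in this page.
HEARTBEAT_URL = "http://lbd-master.example.org/heartbeat"


def fetch_heartbeat_over_http(url: str = HEARTBEAT_URL, timeout: float = 5.0) -> str:
    """Download the heartbeat file from the master's web server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def slave_should_update_dns(fetch_heartbeat: Callable[[], str]) -> bool:
    """Return True when the slave should take over the DNS updates."""
    try:
        body = fetch_heartbeat()
    except OSError:
        # Network errors (urllib's URLError is an OSError) mean the
        # master could not be contacted.
        return True
    # Assumption: an empty heartbeat file is treated like a lost master.
    return not body.strip()
```

Passing the fetch function as an argument keeps the decision logic independent of the network, so it can be exercised with a stub instead of a live web server.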
The lbclient provides a built-in load metric. Alternative load metrics can be configured by combining several Collectd metrics and constants. Health monitoring checks can also be configured so that alias members are taken out of the alias when a certain condition is triggered. A typical example is the check of the Roger state, which takes the node out when the appstate is not 'production'. In addition to the built-in checks, you can configure additional ones using Collectd metrics, or use the return code of an arbitrary program (or script) as a check. If the node is in a working state, the load metric is an integer greater than 0. A load metric of 0 or lower means that the machine is not available.
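As a rough sketch of how such metrics could drive the choice of nodes, the Python function below drops unavailable nodes (metric of 0 or lower) and keeps a limited number of the remaining ones. The ranking rule (lowest positive metric first, ties broken by host name) and the `best_hosts` parameter name are assumptions for illustration, not a description of the actual LBD algorithm.

```python
from typing import Dict, List


def select_best_hosts(metrics: Dict[str, int], best_hosts: int) -> List[str]:
    """Pick up to `best_hosts` available nodes from their load metrics."""
    # A metric of 0 or lower means the node is not available.
    available = {host: load for host, load in metrics.items() if load > 0}
    # Assumption: a lower positive metric means a less loaded node.
    ranked = sorted(available, key=lambda host: (available[host], host))
    return ranked[:best_hosts]


if __name__ == "__main__":
    # Hypothetical host names and metric values.
    polled = {"node1": 12, "node2": 0, "node3": 3, "node4": -1, "node5": 7}
    print(select_best_hosts(polled, best_hosts=2))
```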
We have produced an LBaaS interface with a self-service GUI to facilitate alias creation and management.
The next sections describe how DNS load balanced aliases are defined and how to use them.
The following describes how a user can create a DNS load balanced alias for their service.
- DNS load balanced aliases can be created using the Ermis self-service GUI in https://aiermis.cern.ch/lbweb
- The Ermis GUI uses CERN Single Sign On authentication.
- This GUI uses hostgroup based authorisation, so any user can do the operations as long as they are registered as owner of the base hostgroup of the alias, see man ai-pwn.
- The GUI feeds the alias information to the Ermis REST service, which is used by the Puppet type that generates the configuration of the LBD servers. The Puppet run interval on the LBD servers is currently 30 minutes, so once the alias is created in the GUI it will appear on the LBD servers with a maximum latency of 30 minutes.
1. Log in with CERN SSO.
2. Go to Add LB Alias. The display should look as follows:
3. Enter the desired name for your alias as shown above. If no domain is specified, the domain .cern.ch will be added. Note that the alias can also be on a subdomain, like myalias.mydomain.cern.ch.
4. Choose whether your alias will be external, i.e. visible in the CERN external DNS server, or not. Please note that being visible in the CERN external DNS server will not automatically open external access to the LB alias member nodes in the CERN firewall. If needed, please read the section External Access.
5. Provide a hostgroup. Only alias members belonging to the same base level hostgroup will be allowed. Only users that are owners of this hostgroup will be allowed to manage the alias.
6. If needed, add parameters such as the 'canonical name records' and the 'best hosts'.
7. Submit your request.
8. Configure the alias member nodes in Puppet following the section How to define DNS Load Balanced Alias Members.
9. Wait until Puppet runs on the nodes and on the LBD server (max 30 min for the LBD server).
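Once the waiting period in the last step has passed, one way to confirm that the alias is being served is to resolve it and list the addresses currently presented. The Python sketch below uses the standard `socket.getaddrinfo` call; the alias name in the usage example is a placeholder, and the `resolver` parameter is only there so the function can be tried without network access.

```python
import socket
from typing import Callable, List


def alias_addresses(alias: str, resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IPv4/IPv6 addresses behind an alias."""
    # Restricting to TCP avoids one duplicate entry per socket type.
    infos = resolver(alias, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Placeholder alias name; replace it with your own alias.
    try:
        print(alias_addresses("myalias.cern.ch"))
    except socket.gaierror as err:
        print(f"alias does not resolve yet: {err}")
```

Because the LBD server refreshes the records periodically, repeating the lookup after a polling interval may return a different set of addresses.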
As well as creation, the Ermis GUI also allows:
- Modification with "Modify LB Alias". The display should look as follows:
- Deletion with "Delete LB Alias":
- You can also display the log of your LB alias in monit-timber.cern.ch by selecting it in the "LB Alias Logs" section of the GUI. You will be redirected to a Timber dashboard. The default dashboard contains graphs showing the members of the alias over time. Clicking on the Server logs link will display the logs related to this particular alias.
A user can do the modification and deletion operations as long as they are registered as owner of the base hostgroup of the alias, see man ai-pwn. Read operations are for the moment allowed to all authenticated users.
As the mechanics are the same as for creation, the maximum latency of the modifications is also 30 minutes.
In what follows, we describe how a user should configure hosts designated to be behind a specific DNS LB alias.
- First of all, you have to define the load balanced alias as described in the above section DNS LB Alias Definition.
- Once this is done, alias member nodes can be configured for the LB alias through their Puppet configuration. This can be done either in the manifests or in the hiera data. The two options are as follows:
Defining the alias in puppet
class xxxx::loadbalancing {
  include '::lbclient'

  # Define the alias and the checks that have to be executed
  lbclient::alias { 'xxxx.cern.ch':
    checks => [
      'nologin', 'roger', 'sshdaemon', 'tmpfull', 'xsessions',
      { 'type' => 'collectd', 'data' => '[systemd-nscd/gauge-running]>0' },
    ],
  }

  # Checks can be added later on as well:
  lbclient::alias::check { 'second collectd check':
    check => {
      'type' => 'collectd',
      'data' => '[systemd-sshd/gauge-running]>0',
    },
  }
  ...
}
Defining the alias in hiera
class xxxx::loadbalancing {
  include '::lbclient'
  ...
}

---
lbclient::aliases::definitions:
  xxx.cern.ch:
    checks:
      - nologin
      - roger
      - sshdaemon
      - type: collectd
        data: '[systemd-nscd/gauge-running]>0'
    loads:
      - type: collectd
        data: '[load/load-relative:shortterm]*125 + 1'
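To make the arithmetic of the `loads` expression concrete, the Python sketch below evaluates `[load/load-relative:shortterm]*125 + 1` for a given value of the Collectd metric. It assumes that the metric is the short-term load per CPU core and that the result is truncated to an integer; neither detail is confirmed by this page. The `+ 1` presumably keeps an idle node above 0, since a metric of 0 or lower marks the node as unavailable.

```python
def load_metric(load_relative_shortterm: float) -> int:
    """Evaluate '[load/load-relative:shortterm]*125 + 1' for one sample."""
    # Assumption: the result is truncated to an integer load metric.
    return int(load_relative_shortterm * 125 + 1)


if __name__ == "__main__":
    # Hypothetical samples: idle, half loaded, heavily overloaded.
    for sample in (0.0, 0.5, 2.0):
        print(sample, "->", load_metric(sample))
```

With these samples an idle node reports 1, a node at 0.5 load per core reports 63, and a node at 2.0 reports 251.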
!!! warning When defining the client, remember to use the FQDN for the node (cern.ch included).
An example can be found in the [configuration of LXPLUS in GitLab]({{ aiwwwgitdir }}/it-puppet-hostgroup-lxplus/blob/qa/code/manifests/nodes/login.pp).
The type lbclient::alias is defined in modules/lbclient/manifests/alias.pp. The list under checks is used for health monitoring of the load balanced alias. Each entry in checks can be either a string (for checks that do not require parameters) or a dictionary with the keys type and data. For consistency, checks without parameters can also be given as a dictionary with only type (in other words, checks => ['nologin'] and checks => [{'type' => 'nologin'}] are equivalent).
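The equivalence between the string form and the dictionary form can be sketched as a small normalisation step. The Python function below is only an illustration of the rule described above, not code taken from the lbclient module; the function name is hypothetical.

```python
from typing import Dict, List, Union

Check = Union[str, Dict[str, str]]


def normalize_checks(checks: List[Check]) -> List[Dict[str, str]]:
    """Convert every check to the dictionary form {'type': ..., 'data': ...}."""
    normalized = []
    for check in checks:
        if isinstance(check, str):
            # 'nologin' is shorthand for {'type': 'nologin'}.
            normalized.append({"type": check})
        elif "type" in check:
            normalized.append(dict(check))
        else:
            raise ValueError(f"check without a 'type' key: {check!r}")
    return normalized


if __name__ == "__main__":
    print(normalize_checks(["nologin", {"type": "collectd",
                                        "data": "[systemd-nscd/gauge-running]>0"}]))
```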
The details are as follows:
- If nologin, the existence of either of the files /etc/iss.nologin or /etc/nologin will be checked, and the machine will be removed from the load balanced alias when one of them exists.
- If sshdaemon, the machine will be removed from the load balanced alias when sshd (daemon on port 22) is not running.
- If tmpfull, the machine will be removed from the load balanced alias when /tmp is full.
- If ftpdaemon, the machine will be removed from the load balanced alias when ftpd (daemon on port 21) is not running.
- If gridftpdaemon, the machine will be removed from the load balanced alias when gridftpd (daemon on port 2811) is not running.
- If webdaemon, the machine will be removed from the load balanced alias when httpd (daemon on port 80) is not running.
- If xsessions, the lbclient will take into account how many X window managers (GNOME, KDE, FVWM) are running in the built-in load metric calculation.
- If swaping, the lbclient will take into account whether the node is swapping (averaged over 2 seconds) in the built-in load metric calculation.
- If afs, the machine will be removed from the load balanced alias when AFS is not running (checked by a stat of entries in /afs/cern.ch/user/).
- If eos, the machine will be removed from the load balanced alias when one of the mounted EOS filesystems returns an error with 'Transport endpoint is not connected' or 'Operation not supported', or times out, in response to the 'eosxd get eos.mgmurl' command.
- If roger, the machine will be removed from the load balanced alias when the Roger appstate is not 'production' (checked by querying /etc/roger/current.yaml, which is updated by the CERNMegabus service).
- If {type => command, data => <scriptname>}, the lbclient will run the program <scriptname> and the machine will be removed from the load balanced alias when the return code is != 0. Please note that the lbclient risks timing out if the program <scriptname> takes more than 3 seconds to respond.
- If {type => collectd, data => <collectdexpression>}, the machine will be removed from the load balanced alias when the <collectdexpression> is false.
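The behaviour of the command check can be approximated with the following Python sketch, which runs a program and treats a non-zero return code as a failed check. It is a minimal illustration, not the lbclient code: the function name is hypothetical, and treating a timeout or a missing program as a failed check is an assumption, since this page only states that the lbclient risks timing out after 3 seconds.

```python
import subprocess
from typing import Sequence


def command_check_passes(argv: Sequence[str], timeout_s: float = 3.0) -> bool:
    """Run a check program; the check passes only when it exits with code 0."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a program that is too slow counts as a failed check.
        return False
    except OSError:
        # Assumption: a missing or non-executable program also fails the check.
        return False
    return result.returncode == 0
```

When writing your own check script, it is safest to keep it well under the 3 second limit, for example by reading a cached state file rather than contacting a remote service.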