Use the shipped spec file to create an RPM. The package will install an example configuration file and a series of skeletons that can be used to generate example repositories. These will be used in the next section to help you to get started with the tool.
You can also use distutils and copy the example configuration file by hand.
Jens consumes a single configuration file that is located by default in
/etc/jens/main.conf
. Apart from tweaking this file, it's also necessary to
initialise and make available the metadata repositories that are needed by
Jens. The RPM ships examples for all the bits that are required to get
started with the tool, so in this section we will use all of them to get
a basic working configuration where you can start from.
The RPM creates a system user called jens
and sets the permissions of
the directories specified in the default configuration file for you.
So, let's put our hands on it. Firstly, we're going to initialise in /tmp
a
bunch of Git repositories for which, as mentioned previously, there are example
skeletons shipped by the package. This should be enough to get a clean
jens-update run that does something useful. However, if you already know how
Jens works you can safely forget about all these dummy repositories and plug-in
existing stuff.
The example configuration file installed by the package in /etc/jens
should
suffice for the time being. So, let's initialise the metadata repositories, a
module, a hostgroup and some Hiera data, based on the examples provided by the
package. It's recommended to run the commands below as jens
as the rest of the
tutorial relies on this account, however feel free to proceed as you see fit as
long as everything is consistent :)
# sudo -u jens bash
$ cp -r /usr/share/doc/puppet-jens-0.10/examples/example-repositories /tmp
$ cd /tmp/example-repositories
$ cp -r /usr/share/doc/puppet-jens-0.10/examples/environments /tmp/example-repositories
$ cp -r /usr/share/doc/puppet-jens-0.10/examples/repositories /tmp/example-repositories
$ ls | xargs -i bash -c "cd {} && git init && git add * && git commit -m 'Init' && cd .."
$ cd /var/lib/jens/metadata
$ git clone file:///tmp/example-repositories/environments environments
$ git clone file:///tmp/example-repositories/repositories repository
So now everything should be in place to start doing something interesting. Let's then trigger the first Jens run:
# cd /var/lib/jens
# sudo -u jens jens-update
Now take a look to /var/log/jens/jens-update.log
. It should look like:
INFO Obtaining lock 'jens' (attempt: 1)...
INFO Refreshing metadata...
INFO Refreshing repositories...
INFO Fetching repositories inventory...
WARNING Inventory on disk not found or corrupt, generating...
INFO Generating inventory of bares and clones...
INFO Refreshing bare repositories (modules)
INFO New repositories: set(['dummy'])
INFO Deleted repositories: set([])
INFO Cloning and expanding NEW bare repositories...
INFO Cloning and expanding modules/dummy...
INFO Populating new ref '/var/lib/jens/clone/modules/dummy/master'
INFO Expanding EXISTING bare repositories...
INFO Purging REMOVED bare repositories...
INFO Refreshing bare repositories (hostgroups)
INFO New repositories: set(['myapp'])
INFO Deleted repositories: set([])
INFO Cloning and expanding NEW bare repositories...
INFO Cloning and expanding hostgroups/myapp...
INFO Populating new ref '/var/lib/jens/clone/hostgroups/myapp/master'
INFO Expanding EXISTING bare repositories...
INFO Purging REMOVED bare repositories...
INFO Refreshing bare repositories (common)
INFO New repositories: set(['hieradata', 'site'])
INFO Deleted repositories: set([])
INFO Cloning and expanding NEW bare repositories...
INFO Cloning and expanding common/hieradata...
INFO Populating new ref '/var/lib/jens/clone/common/hieradata/master'
INFO Cloning and expanding common/site...
INFO Populating new ref '/var/lib/jens/clone/common/site/master'
INFO Expanding EXISTING bare repositories...
INFO Purging REMOVED bare repositories...
INFO Persisting repositories inventory...
INFO Executed 'refresh_repositories' in 1405.49 ms
INFO Refreshing environments...
INFO New environments: set(['production'])
INFO Existing and changed environments: []
INFO Deleted environments: set([])
INFO Creating new environments...
INFO Creating new environment 'production'
INFO Processing modules...
INFO Processing hostgroups...
INFO Processing site...
INFO Processing common Hiera data...
INFO Purging deleted environments...
INFO Recreating changed environments...
INFO Refreshing not changed environments...
INFO Executed 'refresh_environments' in 11.53 ms
INFO Releasing lock 'jens'...
INFO Done
Also, keep an eye on /var/lib/jens/environments
and /var/lib/jens/clone
to
see what actually happened.
environments
└── production
├── hieradata
│ ├── common.yaml -> ../../../clone/common/hieradata/master/data/common.yaml
│ ├── environments -> ../../../clone/common/hieradata/master/data/environments
│ ├── fqdns
│ │ └── myapp -> ../../../../clone/hostgroups/myapp/master/data/fqdns
│ ├── hardware -> ../../../clone/common/hieradata/master/data/hardware
│ ├── hostgroups
│ │ └── myapp -> ../../../../clone/hostgroups/myapp/master/data/hostgroup
│ ├── module_names
│ │ └── dummy -> ../../../../clone/modules/dummy/master/data
│ └── operatingsystems -> ../../../clone/common/hieradata/master/data/operatingsystems
├── hostgroups
│ └── hg_myapp -> ../../../clone/hostgroups/myapp/master/code
├── modules
│ └── dummy -> ../../../clone/modules/dummy/master/code
└── site -> ../../clone/common/site/master/code
clone
├── common
│ ├── hieradata
│ │ └── master
│ │ └── data
│ │ ├── common.yaml
│ │ ├── environments
│ │ │ ├── production.yaml
│ │ │ └── qa.yaml
│ │ ├── hardware
│ │ │ └── vendor
│ │ │ └── sinclair.yaml
│ │ └── operatingsystems
│ │ └── RedHat
│ │ ├── 5.yaml
│ │ ├── 6.yaml
│ │ └── 7.yaml
│ └── site
│ └── master
│ └── code
│ └── site.pp
├── hostgroups
│ └── myapp
│ └── master
│ ├── code
│ │ ├── files
│ │ │ └── superscript.sh
│ │ ├── manifests
│ │ │ ├── frontend.pp
│ │ │ └── init.pp
│ │ └── templates
│ │ └── someotherconfig.erb
│ └── data
│ ├── fqdns
│ │ └── myapp-node1.example.org.yaml
│ └── hostgroup
│ ├── myapp
│ │ └── frontend.yaml
│ └── myapp.yaml
└── modules
└── dummy
└── master
├── code
│ ├── manifests
│ │ ├── init.pp
│ │ └── install.pp
│ ├── README.md
│ └── templates
│ └── config.erb
└── data
└── jens.yaml
As expected, one new environment and a few repositories were expanded as required by the environment (all master branches, basically).
Let's now add something to the dummy module in a separate branch, create a new environment using it and run jens-update again to see what it does:
# sudo -u jens bash
$ cd /tmp/example-repositories/module-dummy
$ git checkout -b test
Switched to a new branch 'test'
$ touch code/manifests/foo.pp
$ git add code/manifests/foo.pp
$ git commit -m 'foo'
[test 052c5b4] foo
0 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 code/manifests/foo.pp
$ cd /tmp/example-repositories/environments/
$ cp production.yaml test.yaml
## Edit the file so it looks like:
$ cat test.yaml
---
default: master
notifications: [email protected]
overrides:
modules:
dummy: test
$ git add test.yaml
$ git commit -m 'Add test environment'
[master 1d78cd5] Add test environment
1 files changed, 6 insertions(+), 0 deletions(-)
create mode 100644 test.yaml
$ cd /var/lib/jens/
$ jens-update
## A new branch is necessary so it's expanded
$ tree -L 1 /var/lib/jens/clone/modules/dummy/
/var/lib/jens/clone/modules/dummy/
├── master
└── test
## And the override is in place in the newly created environment
$ tree /var/lib/jens/environments/test/modules/
/var/lib/jens/environments/test/modules/
└── dummy -> ../../../clone/modules/dummy/test/code
$ tree /var/lib/jens/environments/test/hostgroups
/var/lib/jens/environments/test/hostgroups
└── hg_myapp -> ../../../clone/hostgroups/myapp/master/code
$ ls /var/lib/jens/environments/test/modules/dummy/manifests/foo.pp
/var/lib/jens/environments/test/modules/dummy/manifests/foo.pp
$ ls /var/lib/jens/environments/production/modules/dummy/manifests/foo.pp
ls: cannot access /var/lib/jens/environments/production/modules/dummy/manifests/foo.pp: No such file or directory
Log file wise this is what's printed:
INFO Obtaining lock 'jens' (attempt: 1)...
INFO Refreshing metadata...
INFO Refreshing repositories...
INFO Fetching repositories inventory...
INFO Refreshing bare repositories (modules)
INFO New repositories: set([])
INFO Deleted repositories: set([])
INFO Cloning and expanding NEW bare repositories...
INFO Expanding EXISTING bare repositories...
INFO Populating new ref '/var/lib/jens/clone/modules/dummy/test'
INFO Purging REMOVED bare repositories...
INFO Refreshing bare repositories (hostgroups)
INFO New repositories: set([])
INFO Deleted repositories: set([])
INFO Cloning and expanding NEW bare repositories...
INFO Expanding EXISTING bare repositories...
INFO Purging REMOVED bare repositories...
INFO Refreshing bare repositories (common)
INFO New repositories: set([])
INFO Deleted repositories: set([])
INFO Cloning and expanding NEW bare repositories...
INFO Expanding EXISTING bare repositories...
INFO Purging REMOVED bare repositories...
INFO Persisting repositories inventory...
INFO Executed 'refresh_repositories' in 691.18 ms
INFO Refreshing environments...
INFO New environments: set(['test'])
INFO Existing and changed environments: []
INFO Deleted environments: set([])
INFO Creating new environments...
INFO Creating new environment 'test'
INFO Processing modules...
INFO modules 'dummy' overridden to use treeish 'test'
INFO Processing hostgroups...
INFO Processing site...
INFO Processing common Hiera data...
INFO Purging deleted environments...
INFO Recreating changed environments...
INFO Refreshing not changed environments...
INFO Executed 'refresh_environments' in 19.11 ms
INFO Releasing lock 'jens'...
INFO Done
Now it's your turn to play with it! Commit things, create environments, add more modules... :)
Jens has two running modes that can be selected using the mode
key available
in the configuration file. By default Jens will run in polling mode
(mode=POLL
), meaning that all the repositories that Jens is aware of will be
polled (git-fetched) on every run. This is generally slow and not very
efficient but, on the other hand, simpler.
However, in deployments where notifications can be sent when new code is
available (for instance, via push webhooks emitted by Gitlab or Github), it's
recommended to run Jens in on-demand mode instead (mode=ONDEMAND
). These
notifications can be received by a listener (read below), transformed, and
handed over to Jens.
When this mode is enabled, every Jens run will first get "update hints" from a
local python-dirq queue (path set in the configuration file, being
/var/spool/jens-update
the default value) and only bug the servers when
there's actually something new to retrieve. This is much more efficient and it
allows running Jens more often as it's faster and more lightweight for the
server.
The format of the messages that Jens expects can be explored in detail by
reading messaging.py
but in short the schema is composed by two keys: a
timestamp in ISO format (time key) which is a string and a pickled binary
payload (data key) specifying what modules or hostgroups have changed. For
example:
{'time': '2015-12-10T14:06:35.339550',
'data': pickle.dumps({'modules': ['m1'], 'hostgroups': ['h1', 'h2']})}
The idea then is to have something producing this type of message. This suite
also ships a Gitlab listener that understands the payload contained in the
requests made via Gitlab push
webhooks
and translates it into the format used internally, producing this way the
messages required by Jens. This listener (and producer) can be run standalone
for testing purposes via jens-gitlab-producer-runner
or, much better, on top
of a web server talking WSGI. For this purpose an example WSGI file is also
shipped along the software. The producer has to run on the same host so it
has access to the local queue.
When a producer and jens-update
are cooperating and the on-demand mode is
enabled, information regarding the update hints consumed and the actions taken
is present in the usual log file, for example:
...
INFO Getting and processing hints...
INFO 1 messages found
INFO Executed 'fetch_update_hints' in 2.31 ms
...
INFO Fetching hostgroups/foo upon demand...
INFO Updating ref '/mnt/puppet/aijens-3afegt67.cern.ch/clone/hostgroups/foo/qa'
Also, jens-gitlab-producer.log
is populated as notifications come in and hints
are enqueued:
INFO 2015-12-10T14:43:01.705468 - hostgroups/foo - '0000003c/56698165ac7909' added to the queue
Requests can be authenticated using a secret token which has to be configured both on the Gitlab side (via the group or repository settings) and on the Jens side via:
[gitlabproducer]
secret_token = your_token_here
If there's no token set (default) then all requests will be accepted
regardless of the presence or the value of the header
X-Gitlab-Token
.
To protect Jens from hanging indefinitely in case of a lack of response from the remote, it's possible to override the SSH command and put a timeout in between so the Git operations don't run forever. This only applies if the traffic to the remotes is transported via SSH.
The idea is to write a file wrapping the call to ssh in a timeout. Example:
# cat /usr/local/bin/timeoutssh.sh
timeout 10 ssh $@
And then tell Jens to use that wrapper via the configuration file:
[git]
ssh_cmd_path = /usr/local/bin/timeoutssh.sh
If the configuration option is not set the environment variable GIT_SSH won't be internally set by Jens.
# cd /var/lib/jens
# sudo -u jens jens-stats -a
There are 1 modules:
- dummy (master,test) [17.5KiB]
There are 1 hostgroups:
- myapp (master) [18.2KiB]
There are 2 cached environments
- production
- test
There are 2 declared environments
- production.yaml
- test.yaml
There are 2 synchronized environments
- production
- test
Fetching repositories inventory...
{'common': {'hieradata': ['master'], 'site': ['master']},
'hostgroups': {'myapp': ['master']},
'modules': {'dummy': ['master', 'test']}}
The following command will remove all generated environments, caches and all repository clones, taking Jens to an initial state:
# cd /var/lib/jens
# sudo -u jens jens-reset --yes
...
Done -- Jens is sad now
# sudo -u jens jens-stats -a
There are 0 modules:
There are 0 hostgroups:
There are 0 cached environments
There are 2 declared environments
- production.yaml
- test.yaml
There are 0 synchronized environments
Fetching repositories inventory...
Inventory on disk not found or corrupt, generating...
Generating inventory of bares and clones...
{'common': {}, 'hostgroups': {}, 'modules': {}}
Jens implements one local locking mechanism based on a fcntl.flock so when run via a cronjob subsequent runs can't overlap. This is because of our deployment, detailed in the following paragraphs.
Currently, the deployment at CERN relies on several Jens instances on
top of different virtual machines running AlmaLinux 9 with different
update frequencies writing the clone
and environments
data to the
same NFS share but on different directories (whose name is based on
the FQDN of the Jens node in question). One is the primary instance
that runs jens-update every minute and the rest are satellite
instances that do it less often (frequently enough to not to overload
our internal Git service and to have a relatively up-to-date tree of
environments that could be taken to the "HEAD" state quickly if a node
had to take over).
This allows a relatively quick but manual fail over mechanism in case of primary node failure, which consists of: electing a new primary node so the new node has a higher update frequency than before, running jens-update by hand on that node to make sure that there's no configuration flip-flop and flipping a symlink so the Puppet masters (which are also reading manifests from the same NFS share) start consuming data from another Jens instance.
This is basically how our NFS share looks like:
/mnt/puppet
├── aijens -> aijens-3afegt67.cern.ch/
├── aijens-3afegt67.cern.ch
│ ├── clone
│ └── environments
├── aijens-ty5ee527.cern.ch
│ ├── clone
│ └── environments
├── aijens-ior59ff6.cern.ch
│ ├── clone
│ └── environments
├── aijens-dev.cern.ch
│ ├── clone
│ └── environments
└── environments -> aijens/environments/
11 directories, 0 files
So, in our case, the masters' modulepath is something like:
/mnt/puppetdata/aijens/environments/$environment/hostgroups:
/mnt/puppetdata/aijens/environments/$environment/modules
Where /mnt/puppetdata
is the mountpoint of the NFS share.
At the time of writing, our Jens instances are taking care of 260 modules, 150 hostgroups and 160 environments :)
It's possible to set a list of environments that won't ever be deleted from disk, even if the declaration is removed from the metadata. This is useful to protect delicate environments like production. To achieve this, set the following option in the configuration file:
[main]
protectedenvironments = production, qa
It's not recommended to add environments that have overrides to the list as they might end up incomplete if they're removed.
We recommend though to use this only as an extra safety feature and make sure that changes to environments are validated either manually or using a CI process.
The default value is an empty list, which means that no environment is protected.
If you wanted to know in detail what Jens does in every run, changing the debug
level to DEBUG (in /etc/jens/main.conf
) might be a good idea :)