Change WMAgent Dockerfile to call install.sh && Split the agent init to buildtime and runtime parts #1364

Closed
62 commits
bac3bc1
Change Dockerfile to call install.sh && Add initial install.sh
todor-ivanov Apr 21, 2023
d2e165f
Install wmagent with root in the sys default module path && Add basic…
todor-ivanov Apr 24, 2023
01cb64e
Move all Dir struct at Dockerfile && Upgrade pip && Add better docstr…
todor-ivanov Apr 25, 2023
5531d22
Add README file
todor-ivanov Apr 25, 2023
ae96d43
Add container connection instructions to the README file
todor-ivanov Apr 25, 2023
11c9602
Identical option parsing for both run.sh and install.sh && Add partia…
todor-ivanov Apr 25, 2023
646b396
Set WMAgent docker specific bash prompt
todor-ivanov Apr 25, 2023
b32c195
Move user's aliases to the install.sh script.
todor-ivanov Apr 26, 2023
98e64f4
Fix basic_checks and README
todor-ivanov Apr 26, 2023
b7f7be4
Rename/Add WMA_* Prefix to all env vars
todor-ivanov Apr 27, 2023
0d5decc
Create hostadmin mountpoint and image deployment area && Download all…
todor-ivanov Apr 27, 2023
e6d2b90
Implement basic Docker to Host initialisation steps and checks
todor-ivanov Apr 27, 2023
b960f7e
Update README
todor-ivanov Apr 27, 2023
9e75581
Partial implementation of deploy_to_host && Update README
todor-ivanov Apr 27, 2023
6d45160
Add runtime parameters checks && Finalize docker_to_host && Finegrain…
todor-ivanov Apr 28, 2023
ef4fc4e
Implement _check_wmasecrets auxiliary parser
todor-ivanov Apr 28, 2023
8c981c5
Add fix about WMAgent.secrets temlpate identification for relval agents
todor-ivanov Apr 28, 2023
36ab1e4
Implement doeploy_to_container function
todor-ivanov Apr 28, 2023
05c2610
Add check for WMAgent.secrets checksum && Fix bug with missing .docke…
todor-ivanov Apr 28, 2023
2cfcb4b
Add _init_valid aux function && Improve md5sum checks.
todor-ivanov Apr 29, 2023
00dec37
Adding wmagent-docker-build.sh wmagent-docker-run.sh && Fixing bug in…
todor-ivanov Apr 29, 2023
abb2ca0
Add cron jobs creation at build time
todor-ivanov Apr 29, 2023
ef03489
Fix manage file mode && call deploy_to_agent again upon intialisation
todor-ivanov May 1, 2023
08b2b2e
Fix WMAgent.secrets update parsing commands.
todor-ivanov May 2, 2023
c5dc401
Fix missing wmagentpy3 links
todor-ivanov May 2, 2023
5a6df55
Add fix for missing mounts points at the host
todor-ivanov May 2, 2023
0cf2562
Add mariaDB to the container
todor-ivanov May 2, 2023
1d0e9d7
Add voms utils && Checks for certificate and myproxy
todor-ivanov May 3, 2023
1247014
Improve WMAgent.secrerts parsing && Start check_databases code && Reo…
todor-ivanov May 4, 2023
f7ab0ec
Temporary fixes for broken pypi packaging
todor-ivanov May 4, 2023
ca67200
Move temp fixes to deploy_to_container && Tie install downloads to th…
todor-ivanov May 5, 2023
5c588a1
Call activate-agent && init-agent
todor-ivanov May 8, 2023
e2fecea
Fix WMA_TAG_REG
todor-ivanov May 10, 2023
c618bb6
A really bad workaround for outdated yui library
todor-ivanov May 10, 2023
9072f07
Add agent config tweaks && Populate agaent resource-control
todor-ivanov May 10, 2023
55a775f
Tie default TEAMNAME with the current hostname at run.sh
todor-ivanov May 10, 2023
b4172e7
Add oracle client and databse checks && Clean leftovers and old comments
todor-ivanov May 11, 2023
75cf2a6
Change root mountpoint to /dat/dockerMount && Typo && More comments c…
todor-ivanov May 12, 2023
e48a468
Stop downloading files from the old deployment repository && upload t…
todor-ivanov May 12, 2023
a568a0e
Move WMA_DEPLOY_DIR to /usr/local
todor-ivanov May 12, 2023
2d7c5b8
Stop using cmsweb docker image for copying voms package files - use t…
todor-ivanov May 12, 2023
27634ef
Start using env.sh file from the pypi package deploy/ area instead of…
todor-ivanov May 12, 2023
7e070ca
Stop downloading utilitarian scripts and use them from the pypi packa…
todor-ivanov May 12, 2023
33cc87c
Release manage script from origin dependency && fetch deployment and …
todor-ivanov May 12, 2023
8fa2d29
Renew uploading agent config step
todor-ivanov May 12, 2023
14205b5
Update README
todor-ivanov May 12, 2023
69e67e0
Update README
todor-ivanov May 12, 2023
525b2af
Update README
todor-ivanov May 12, 2023
b338dba
Fix changed renew_proxy path
todor-ivanov May 15, 2023
dae519b
Update README
todor-ivanov May 18, 2023
df23f19
Typo while downloading yui rpm package
todor-ivanov May 19, 2023
a2151f5
Fix permissions for editting renew_proxy.sh at runtime
todor-ivanov Jun 15, 2023
f6be3c9
Remove Central_cervices runtime paramter
todor-ivanov Jun 15, 2023
176cb8f
Enable run/build wrapper scripts to download/upload docker images to …
todor-ivanov Jun 15, 2023
f589f42
Update README
todor-ivanov Jun 15, 2023
d149476
Add protection from missing /etc/tnsnames.ora mount for FNAL agents
todor-ivanov Jun 16, 2023
0293f28
Review comments - Get rid of rpm packages && Add environment tweaks i…
todor-ivanov Jun 22, 2023
ef297b0
Fix bad wget download command for yui files
todor-ivanov Jun 22, 2023
2bfc1f3
Fix typos && WARNING from check_docker_init
todor-ivanov Jun 28, 2023
d142249
Update README
todor-ivanov Jun 28, 2023
88c3ad5
Resolve $WMA_ROOT_DIR at runtime for $WMA_BUILD_ID
todor-ivanov Jun 29, 2023
298ef38
Export $TAG from Dockerfile && typo && TODO comments
todor-ivanov Jun 30, 2023
77 changes: 66 additions & 11 deletions docker/pypi/wmagent/Dockerfile
@@ -1,14 +1,69 @@
FROM registry.cern.ch/cmsweb/oracle:21_5-stable as oracle
FROM registry.cern.ch/cmsweb/dmwm-base:pypi-20230525
MAINTAINER Valentin Kuznetsov [email protected]

# Install basic OS package dependencies
RUN apt-get update
RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils sudo
ENV TAG=X.Y.Z
Contributor

We need to keep this line otherwise this will break the WMCore CD pipeline, see:
https://github.com/dmwm/WMCore/blob/master/.github/workflows/docker_images_template.yaml#L29

Perhaps you can do something like:

ENV TAG=X.Y.Z
ENV WMA_TAG=${TAG}

and it should resolve the issue.

Contributor Author

done

Contributor

This has not been done. Unresolving it.

Contributor Author

Somehow, I have lost that change in my previous commit. Sorry, my bad. Fixing it with my next one.

Contributor

This code won't work! WMCore GH action workflow will update TAG but not WMA_TAG, which is the one actually used during the whole process.

RUN pip install wmagent==$TAG
ENV WDIR=/data
ENV USER=_wmagent
RUN useradd ${USER} && install -o ${USER} -d ${WDIR}
RUN echo "%$USER ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
USER ${USER}
RUN sudo chown -R $USER.$USER $WDIR
WORKDIR $WDIR
CMD ["python3"]
RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils hostname net-tools iputils-ping cron mariadb-server myproxy voms-clients rlwrap libaio1 procps && apt-get clean

# copy oracle client:
COPY --from=oracle /usr/lib/oracle /usr/lib/oracle
ENV LD_LIBRARY_PATH=/usr/lib/oracle
ENV PATH=$PATH:/usr/lib/oracle
ENV PKG_CONFIG_PATH=/usr/lib/oracle

# WMA_TAG to be passed at build time through `--build-arg WMA_TAG=<WMA_TAG>`. Default: None
ARG WMA_TAG=None
ENV WMA_TAG=${WMA_TAG}
ENV WMA_USER=cmst1
ENV WMA_GROUP=zh
ENV WMA_UID=31961
ENV WMA_GID=1399
ENV WMA_ROOT_DIR=/data

# Basic WMAgent directory structure passed to all scripts through env variables:
# NOTE: Those should be static and depend only on $WMA_BASE_DIR
ENV WMA_BASE_DIR=$WMA_ROOT_DIR/srv
ENV WMA_ADMIN_DIR=$WMA_ROOT_DIR/admin/wmagent
ENV WMA_CERTS_DIR=$WMA_ROOT_DIR/certs

ENV WMA_HOSTADMIN_DIR=$WMA_ADMIN_DIR/hostadmin
ENV WMA_CURRENT_DIR=$WMA_BASE_DIR/wmagent/current
ENV WMA_INSTALL_DIR=$WMA_CURRENT_DIR/install
ENV WMA_CONFIG_DIR=$WMA_CURRENT_DIR/config
ENV WMA_MANAGE_DIR=$WMA_CONFIG_DIR/wmagent
ENV WMA_DEPLOY_DIR=/usr/local
ENV WMA_ENV_FILE=$WMA_DEPLOY_DIR/deploy/env.sh


# Setting up users and privileges
RUN groupadd -g ${WMA_GID} ${WMA_GROUP}
RUN useradd -u ${WMA_UID} -g ${WMA_GID} -m ${WMA_USER}
RUN install -o ${WMA_USER} -g ${WMA_GID} -d ${WMA_ROOT_DIR}
RUN usermod -aG mysql ${WMA_USER}
RUN rm -f /etc/mysql/mariadb.conf.d/50-server.cnf

# Add WMA_USER to sudoers
RUN echo "${WMA_USER} ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# Add all deployment needed directories
ADD bin $WMA_DEPLOY_DIR/bin
ADD etc $WMA_DEPLOY_DIR/etc

# Add install script
ADD install.sh ${WMA_ROOT_DIR}/install.sh

# Add wmagent run script
ADD run.sh ${WMA_ROOT_DIR}/run.sh

# Install the requested WMA_TAG.
RUN ${WMA_ROOT_DIR}/install.sh -v ${WMA_TAG}
RUN chown -R ${WMA_USER}:${WMA_GID} ${WMA_ROOT_DIR}

# Switch to the runtime directory and user
WORKDIR ${WMA_ROOT_DIR}
USER ${WMA_USER}
ENV USER=$WMA_USER

# Define the entrypoint. All the run.sh parameters should be passed at runtime.
ENTRYPOINT ["./run.sh"]
230 changes: 230 additions & 0 deletions docker/pypi/wmagent/README.md
@@ -0,0 +1,230 @@
## WMAgent in Docker using pypi deployment method.

### Requires:
* Docker to be installed on the host VM (vocmsXXXX)
* HTCondor schedd to be installed and configured on the host VM
* CouchDB to be installed on the host VM
* MariaDB to be installed on the host VM (depending on which relational database is used: MariaDB or Oracle)
* Service certificates to be present at the host VM
* `WMAgent.secrets` file to be present at the host VM

### The implementation is realized through the following files:
* `Dockerfile` - provides all basic requirements for the image and sets the env variables common to both `install.sh` and `run.sh`.
* `install.sh` - called through the `Dockerfile` `RUN` command and given a single build-time parameter, `WMA_TAG`
* `run.sh` - set as default `ENTRYPOINT` at container runtime. All agent related configuration parameters are passed as named arguments and used to (re)generate the agent configuration files. All service credentials and schedd caches are accessed via host mount points
* `wmagent-docker-build.sh` - simple script to be used for building a WMAgent docker image
* `wmagent-docker-run.sh` - simple script to be used for running a WMAgent docker container

**Build options (accepted by `install.sh`):**
* `WMA_TAG=2.2.1`

**RUN options (accepted by `run.sh`):**
* `TEAMNAME=testbed-$HOSTNAME`
* `CENTRAL_SERVICES=cmsweb-testbed.cern.ch`
* `AGENT_NUMBER=0`
* `FLAVOR=mysql`
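
The run options above map naturally onto short command-line flags. The following is a minimal sketch of how such parsing could look, not the actual `run.sh` code: the function name `parse_run_opts` is hypothetical, the flag letters follow the `wmagent-docker-run.sh` invocations shown later in this README, and restricting `FLAVOR` to `mysql` or `oracle` is an assumption based on the two databases mentioned here.

```shell
# Hypothetical sketch: parse run.sh-style options, falling back to the documented defaults.
parse_run_opts() {
    # Defaults taken from the RUN options list in this README.
    TEAMNAME="testbed-${HOSTNAME:-$(uname -n)}"
    CENTRAL_SERVICES="cmsweb-testbed.cern.ch"
    AGENT_NUMBER=0
    FLAVOR="mysql"

    # Reset getopts state so the function can be called more than once.
    OPTIND=1
    while getopts ":t:n:c:f:" opt; do
        case "$opt" in
            t) TEAMNAME="$OPTARG" ;;
            n) AGENT_NUMBER="$OPTARG" ;;
            c) CENTRAL_SERVICES="$OPTARG" ;;
            f) FLAVOR="$OPTARG" ;;
            :) echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
            *) echo "Unknown option: -$OPTARG" >&2; return 1 ;;
        esac
    done

    # Assumption: only the two database flavours mentioned in this README are valid.
    case "$FLAVOR" in
        mysql|oracle) ;;
        *) echo "Unsupported flavor: $FLAVOR" >&2; return 1 ;;
    esac
}
```

A script would typically call it as `parse_run_opts "$@" || exit 1` and then use the four variables when (re)generating the agent configuration.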


### Building a WMAgent image

The image can be built on any machine running Docker Engine.

**Build command:**
* Using the wrapper script to build WMAgent locally:
```
ssh vocms****
cmst1
cd /data
git clone https://github.com/dmwm/CMSKubernetes.git
cd /data/CMSKubernetes/docker/pypi/wmagent/
./wmagent-docker-build.sh -v 2.2.1
```
* Using the wrapper script to build and upload WMAgent to registry.cern.ch:
```
./wmagent-docker-build.sh -v 2.2.1 -p
```
* Here is what is happening under the hood:
```
WMA_TAG=2.2.1
docker build --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG -t wmagent:$WMA_TAG -t wmagent:latest /data/CMSKubernetes/docker/pypi/wmagent/ 2>&1 |tee /data/build-wma.log
```
**Partial output:**
```
...
#4 [ 1/13] FROM registry.cern.ch/cmsweb/dmwm-base:pypi-20230314@sha256:71cf3825ed9acf4e84f36753365f363cfd53d933b4abf3c31ef828828e7bdf83
#4 DONE 0.0s
...
#14 0.110 =======================================================
#14 0.110 Starting new agent deployment with the following data:
#14 0.110 -------------------------------------------------------
#14 0.111 - WMAgent version : 2.2.1
#14 0.113 - Python verson : Python 3.8.16
#14 0.114 - Python Module Path : /usr/local/lib/python3.8/site-packages
#14 0.114 =======================================================
...
#18 naming to docker.io/library/wmagent:2.2.1 done
#18 DONE 3.3s
```
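
Because the `Dockerfile` defaults `WMA_TAG` to `None`, a build wrapper may want to reject a missing or malformed tag before invoking `docker build`. The sketch below illustrates one way to do that and is not the code of `wmagent-docker-build.sh`: the helper name `build_cmd` and the version pattern (X.Y.Z with an optional `rcN` or `.N` suffix) are assumptions. It only prints the command from the "under the hood" example instead of running it.

```shell
# Hypothetical helper: print the docker build command for a given WMAgent tag.
# The tag pattern below is an assumed format, not the project's actual regex.
build_cmd() {
    wma_tag="$1"
    context_dir="${2:-/data/CMSKubernetes/docker/pypi/wmagent/}"

    # Refuse anything that does not look like a release tag (e.g. the default "None").
    if ! printf '%s\n' "$wma_tag" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+(rc[0-9]+|\.[0-9]+)?$'; then
        echo "Invalid WMA_TAG: $wma_tag" >&2
        return 1
    fi

    printf 'docker build --network=host --progress=plain --build-arg WMA_TAG=%s -t wmagent:%s -t wmagent:latest %s\n' \
        "$wma_tag" "$wma_tag" "$context_dir"
}
```

Printing the command rather than executing it keeps the sketch side-effect free; a real wrapper would run the command and capture its log.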

### Running a WMAgent container

One needs to bind mount several directories from the host VM (vocmsXXXX).
* /data/dockerMount/certs
* /etc/condor (schedd runs on the host, not the container)
* /tmp
* /data/dockerMount/srv/wmagent/current/install (stateful service and component dirs)
* /data/dockerMount/srv/wmagent/current/config (for persisting agent configuration data)
* /data/dockerMount/admin/wmagent (in order to access the WMAgent.secrets)


The install and config dirs will be initialized the first time you execute `run.sh`, and a `.dockerInit` file will be written to keep track of the initialization. Subsequent container restarts won't touch these directories.

**Run command:**

* Initialising the agent for the first time:
```
ssh vocms****
cmst1
cd /data/CMSKubernetes/docker/pypi/wmagent/
### cleaning old agent data:
rm -rf /data/dockerMount/srv/
./wmagent-docker-run.sh -t <team_name> -n <agent_number> -f <db_flavour> -c <central_services> &
```
* Initialising the agent for the first time using a docker image from registry.cern.ch:
```
./wmagent-docker-run.sh -t <team_name> -n <agent_number> -f <db_flavour> -c <central_services> -p -v 2.2.1 &
```
* Running the agent:
```
./wmagent-docker-run.sh &
```

* Here is what is happening under the hood:
```
WMA_ROOT_DIR=/data/dockerMount

dockerOpts=" \
--network=host \
--rm \
--hostname=`hostname -f` \
--name=wmagent \
--mount type=bind,source=/etc/tnsnames.ora,target=/etc/tnsnames.ora,readonly \
--mount type=bind,source=/etc/condor,target=/etc/condor,readonly \
--mount type=bind,source=/tmp,target=/tmp \
--mount type=bind,source=$WMA_ROOT_DIR/certs,target=/data/certs \
--mount type=bind,source=$WMA_ROOT_DIR/srv/wmagent/current/install,target=/data/srv/wmagent/current/install \
--mount type=bind,source=$WMA_ROOT_DIR/srv/wmagent/current/config,target=/data/srv/wmagent/current/config \
--mount type=bind,source=$WMA_ROOT_DIR/admin/wmagent,target=/data/admin/wmagent/hostadmin \
"

wmaOpts=" \
-f mysql \
-t testbed-vocms0260 \
-n 0 \
-c cmsweb-testbed.cern.ch"

docker run $dockerOpts wmagent $wmaOpts
```

**Partial output:**
```
=======================================================
Starting WMAgent with the following initial data:
-------------------------------------------------------
- WMAgent Version : 2.2.1
- WMAgent TeamName : testbed-vocms0260
- WMAgent Number : 0
- WMAgent Host : vocms0260.cern.ch
- WMAgent Config : /data/srv/wmagent/current/config
- WMAgent Relational DB type : oracle
- Python verson : Python 3.8.16
- Python Module Path : /usr/local/lib/python3.8/site-packages
=======================================================
...
```
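
Docker refuses to start a container when the source of a `--mount type=bind` does not exist on the host, so the directories under `$WMA_ROOT_DIR` have to be present before `docker run`. The following sketch shows one way to create the missing ones. The function name `ensure_mount_dirs` is hypothetical, and the directory list is copied from the `dockerOpts` example above (the system paths `/etc/condor`, `/etc/tnsnames.ora` and `/tmp` are deliberately left alone).

```shell
# Hypothetical helper: create any missing host-side bind-mount sources under a root directory.
ensure_mount_dirs() {
    root="$1"
    for sub in certs srv/wmagent/current/install srv/wmagent/current/config admin/wmagent; do
        if [ ! -d "$root/$sub" ]; then
            mkdir -p "$root/$sub" || return 1
            echo "created $root/$sub"
        fi
    done
}
```

Calling `ensure_mount_dirs /data/dockerMount` a second time prints nothing, since every directory already exists.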

**NOTE:**
Currently, only one WMAgent container may run on a single agent VM. This is partially guaranteed by the `--name=wmagent` parameter in the `docker run` command above. It is possible to work around it by giving the new container a different name, but bear in mind that the consequences of doing so are unpredictable. If one tries to start two containers with the same name, the expected error is:
```
docker run $dockerOpts wmagent:$WMA_TAG $wmaOpts

docker: Error response from daemon: Conflict. The container name "/wmagent" is already in use by container "c4c64688a75b6ac8f5cc5e4c951db324b2441ec1434f2e1d604a49d8009ff2a1". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'
```
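
A wrapper script can detect the name conflict up front rather than relying on the daemon error. This is a minimal sketch under stated assumptions: `list_container_names` and `name_is_free` are hypothetical helpers, and only the standard `docker ps -a --format '{{.Names}}'` call is used to list existing containers.

```shell
# Thin wrapper around the docker CLI; prints one container name per line.
list_container_names() {
    docker ps -a --format '{{.Names}}'
}

# Hypothetical guard: succeed only if no container with the given name exists.
name_is_free() {
    # -F: fixed string, -x: match the whole line, -q: no output.
    if list_container_names | grep -Fxq "$1"; then
        echo "A container named '$1' already exists; refusing to start another agent." >&2
        return 1
    fi
}
```

For example, `name_is_free wmagent && docker run $dockerOpts wmagent:$WMA_TAG $wmaOpts` would skip the start when an agent container is already present.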




### Checking container status
```
ssh vocms****

docker container ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
78d7e1baa3df wmagent:2.2.1 "./run.sh -f oracle ..." 2 hours ago Up 2 hours wmagent

```

### Stopping the WMAgent container
In order to stop the WMAgent container one just needs to kill it. The `--rm` option of the `docker run` command ensures that no leftover containers remain.

**Shutdown command:**
```
docker kill wmagent
```

### Enforce container reinitialisation at the host:
The WMAgent needs to preserve its configuration and initialisation data permanently at the host. For this purpose we use host-to-Docker bind mounts.
Once a specific WMAgent image has been run for the first time, it leaves a small set of `.dockerInit` files in every place at the host where permanent data (like config files and job caches) is preserved.
On any further restart of the container, and hence of the WMAgent itself, we do not go through all the initialisation steps again if we find the
relevant `.dockerInit` file and the `$WMA_BUILD_ID` hash it contains matches the `$WMA_BUILD_ID` of the currently starting container.
To enforce the reinitialisation steps, one needs to delete all `.dockerInit` files and restart the wmagent container.
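
The decision described above can be sketched as a small shell check. This is an illustration only, not the code in `run.sh`: the function name `init_is_valid` is hypothetical, and it assumes each `.dockerInit` file contains nothing but the build id hash.

```shell
# Hypothetical check: every listed directory must hold a .dockerInit file
# whose content equals the build id of the starting container.
init_is_valid() {
    build_id="$1"
    shift
    for dir in "$@"; do
        init_file="$dir/.dockerInit"
        # A missing file means this directory was never initialised.
        [ -f "$init_file" ] || return 1
        # A different hash means it was initialised by another image build.
        [ "$(cat "$init_file")" = "$build_id" ] || return 1
    done
}
```

A caller would run the initialisation steps only when the check fails, for example `init_is_valid "$WMA_BUILD_ID" "$WMA_INSTALL_DIR" "$WMA_CONFIG_DIR" || echo "initialisation needed"`.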

**NOTE: This reinitialisation may result in losing previous job caches and database records**

**Reinitialisation command:**
```
docker kill wmagent

sudo find /data/dockerMount -name .dockerInit -delete

docker run $dockerOpts wmagent:$WMA_TAG $wmaOpts
```

**Partial output:**
```
=======================================================
Starting WMAgent with the following initialisation data:
-------------------------------------------------------
- WMAgent Version : 2.2.1
...
=======================================================
-------------------------------------------------------
Start: Performing checks for successful Docker initialisation steps...
WMA_BUILD_ID: 110b443165e3b5a4ba569b8a1ab063a616132602e55ba06b0c3e89a01e643f31
dockerInitId: /data/admin/wmagent/hostadmin/.dockerInit:
...
ERROR
-------------------------------------------------------
Start: Performing Docker image to Host initialisation steps
...
Done: Performing Docker image to Host initialisation steps
-------------------------------------------------------
-------------------------------------------------------
Start: Performing checks for successful Docker initialisation steps...
WMA_BUILD_ID: 110b443165e3b5a4ba569b8a1ab063a616132602e55ba06b0c3e89a01e643f31
dockerInitId: 110b443165e3b5a4ba569b8a1ab063a616132602e55ba06b0c3e89a01e643f31
OK
-------------------------------------------------------
...
```

### Connecting to the container

First log in to the VM, and from there connect to the container:

**Login sequence:**
```
docker exec -it wmagent /bin/bash
...
(WMAgent-2.2.1) [cmst1@vocms0260:current]$ manage status
```