Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change WMAgent Dockerfile to call install.sh && Split the agent init to buildtime and runtime parts #1364

Closed

Conversation

todor-ivanov
Copy link
Contributor

@todor-ivanov todor-ivanov commented Apr 21, 2023

DO NOT MERGE YET

With the current PR we modify the Dockerfile such that:

  • It calls the install.sh script instead of directly executing pip install wmagent
  • It accept the WMA_TAG as a parameter at build time with --build-arg WMA_TAG=<WMA_TAG>
  • It creates the expected GUID and UID which should match with the respective ones at the VM

The following new scripts are provided as well:

  • We also add the basic install.sh script, which is configuring the whole deployment structure and the future places for the credentials mount points.

  • We also add the run.sh script to configure and start the container a the host

  • We also provide two wrapper scripts: wmagent-docer-build.sh and wmagent-docker-run.sh, which are taking care of all docker configs and parameters for both build and runtime.

NOTE:
The minimal WMAgent tag to be used for this PR should be 2.2.1rc3

NOTE:
The files:

  • etc/local.ini
  • etc/my.cnf
  • etc/rucio.conf
  • bin/manage
    Are a one to one copy from the old deployment repository (master branch), except a minor change in the few environment variables read by the bin/manage script, which I will explicitly point here:
[user@unit01 data]$ diff -u  /data/CMSKubernetes/docker/pypi/wmagent/bin/manage  /data/deployment/wmagentpy3/manage 
--- /data/CMSKubernetes/docker/pypi/wmagent/bin/manage	2023-05-12 17:53:15.536200825 +0200
+++ /data/deployment/wmagentpy3/manage	2023-05-03 13:10:45.040435640 +0200
@@ -4,7 +4,10 @@
 #
 # Global variables etc
 #
-ROOT_DIR=$WMA_CURRENT_DIR
+THIS_SCRIPT=$(readlink -f $0)
+PROJ_DIR=$(dirname $(dirname $THIS_SCRIPT))
+ROOT_DIR=$(dirname $(dirname $(dirname $THIS_SCRIPT)))

@@ -233,14 +236,14 @@
 # Environment
 #
 
-# . $ROOT_DIR/apps/wmagentpy3/etc/profile.d/init.sh
+. $ROOT_DIR/apps/wmagentpy3/etc/profile.d/init.sh
 
+export WMAGENTPY3_ROOT
 # WMAGENT_ROOT == WMCORE_ROOT now but in old rpms WMCORE_ROOT was seperate,
 # if WMCORE_ROOT already set export it, else set to WMAGENT_ROOT
-export WMAGENTPY3_ROOT=$WMA_DEPLOY_DIR
-export WMCORE_ROOT=$WMA_DEPLOY_DIR
-export YUI_ROOT=$WMA_DEPLOY_DIR/yui/
-export WMAGENTPY3_VERSION=$WMA_TAG
+export WMCORE_ROOT=${WMCORE_ROOT:-$WMAGENTPY3_ROOT}
+export YUI_ROOT
+export COUCHSKEL_ROOT

This one fixes the following two issues at WMCore:
Fixes dmwm/WMCore#8797
Fixes dmwm/WMCore#11564

@todor-ivanov
Copy link
Contributor Author

And just a heads up.... the interim results from my first commit are quite promising:

  • Here is the build log:
[user@unit01 wmagent]$ docker build --network=host --build-arg WMA_TAG=$WMA_TAG -t wmagent:$WMA_TAG .
[+] Building 64.4s (16/16) FINISHED                                                                                                                                                           
 => [internal] load build definition from Dockerfile                                                                                                                                     0.0s
 => => transferring dockerfile: 1.22kB                                                                                                                                                   0.0s
 => [internal] load .dockerignore                                                                                                                                                        0.0s
 => => transferring context: 2B                                                                                                                                                          0.0s
 => [internal] load metadata for registry.cern.ch/cmsweb/dmwm-base:pypi-20230314                                                                                                         0.1s
 => [ 1/11] FROM registry.cern.ch/cmsweb/dmwm-base:pypi-20230314@sha256:71cf3825ed9acf4e84f36753365f363cfd53d933b4abf3c31ef828828e7bdf83                                                 0.0s
 => [internal] load build context                                                                                                                                                        0.0s
 => => transferring context: 3.35kB                                                                                                                                                      0.0s
 => CACHED [ 2/11] RUN apt-get update                                                                                                                                                    0.0s
 => CACHED [ 3/11] RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils sudo                                                                                        0.0s
 => CACHED [ 4/11] RUN useradd cmst1                                                                                                                                                     0.0s
 => CACHED [ 5/11] RUN install -o cmst1 -g 1399 -d /data                                                                                                                                 0.0s
 => CACHED [ 6/11] RUN echo "%cmst1 ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers                                                                                                              0.0s
 => [ 7/11] ADD install.sh /data/install.sh                                                                                                                                              0.0s
 => [ 8/11] ADD run.sh /data/run.sh                                                                                                                                                      0.0s
 => [ 9/11] RUN chown cmst1 /data/run.sh /data/install.sh                                                                                                                                0.1s
 => [10/11] WORKDIR /data                                                                                                                                                                0.0s
 => [11/11] RUN /data/install.sh 2.2.0.2                                                                                                                                                61.3s
 => exporting to image                                                                                                                                                                   2.7s
 => => exporting layers                                                                                                                                                                  2.7s
 => => writing image sha256:e50eb18523dd7b4a0c11006481330332159acfea6d40760dd98bfde83b736480                                                                                             0.0s 
 => => naming to docker.io/library/wmagent:2.2.0.2                                                                                                                                       0.0s 
[user@unit01 wmagent]$ docker image list 
REPOSITORY    TAG       IMAGE ID       CREATED          SIZE
wmagent       2.2.0.2   e50eb18523dd   57 seconds ago   1.32GB

  • And here is the run log:
[user@unit01 WMCoreDev.d]$ docker ps  
CONTAINER ID   IMAGE             COMMAND          CREATED          STATUS          PORTS     NAMES
a0d98ac82feb   wmagent:2.2.0.2   "/bin/bash -l"   17 seconds ago   Up 16 seconds             sad_shaw
[user@unit01 WMCoreDev.d]$ docker exec -it a0d98ac82feb /bin/bash
cmst1@a0d98ac82feb:/data$ ll
total 40
drwxr-xr-x. 1 cmst1  1399    17 Apr 21 18:51 .
drwxr-xr-x. 1 root  root      6 Apr 21 18:54 ..
-rwxr-xr-x. 1 cmst1 root   3168 Apr 21 18:51 install.sh
-rwxr-xr-x. 1 root  root   2617 Jan  9 17:17 manage
-rwxr-xr-x. 1 root  root   1334 Mar 14 16:00 monitor.sh
-rwxr-xr-x. 1 cmst1 root  14410 Apr 21 16:37 run.sh
drwxr-xr-x. 5 cmst1 cmst1    47 Apr 21 18:51 srv

cmst1@a0d98ac82feb:/data$ cd srv/
cmst1@a0d98ac82feb:/data/srv$ ll
total 0
drwxr-xr-x. 5 cmst1 cmst1 47 Apr 21 18:51 .
drwxr-xr-x. 1 cmst1  1399 17 Apr 21 18:51 ..
drwxr-xr-x. 3 cmst1 cmst1 21 Apr 21 18:51 admin
drwxr-xr-x. 2 cmst1 cmst1  6 Apr 21 18:51 certs
drwxr-xr-x. 3 cmst1 cmst1 36 Apr 21 18:51 wmagent

cmst1@a0d98ac82feb:/data/srv/wmagent$ ll
total 12
drwxr-xr-x.   3 cmst1 cmst1   36 Apr 21 18:51 .
drwxr-xr-x.   5 cmst1 cmst1   47 Apr 21 18:51 ..
drwxr-xr-x. 167 cmst1 cmst1 8192 Apr 21 18:52 2.2.0.2
lrwxrwxrwx.   1 cmst1 cmst1   25 Apr 21 18:51 current -> /data/srv/wmagent/2.2.0.2

  • And obviously all paths and env variables are correctly set:
cmst1@a0d98ac82feb:/data/srv/wmagent/current$ python
Python 3.8.16 (default, Feb  4 2023, 12:19:00) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from WMCore import __version__ as WMCoreVersion
>>> print(WMCoreVersion)
2.2.0.2

FYI: @amaltaro @vkuznet

@todor-ivanov todor-ivanov changed the title Change Dockerfile to call install.sh && Add initial install.sh Change WMAgent Dockerfile to call install.sh && Split the agent init to buildtime runtime parts May 4, 2023
@todor-ivanov todor-ivanov changed the title Change WMAgent Dockerfile to call install.sh && Split the agent init to buildtime runtime parts Change WMAgent Dockerfile to call install.sh && Split the agent init to buildtime and runtime parts May 10, 2023
@todor-ivanov
Copy link
Contributor Author

Hi @vkuznet @amaltaro you may start reviewing this PR.
There are still many, many things that could be worked out, simplified, optimized .... etc, but anyway we will discuss them on the go. The agent is currently starting successfully, which was the main goal of this development. Things are in a much much better shape now. Please go ahead and give me your feedback.
Thanks

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov as we discussed today, here is my first review based on giving this docker build + docker run a try, looking what I get in my terminal, and making spot checks in the scrips.

In addition to the questions/concerns/issues that I will report below, I think it would be extremely useful if you:
a) either update the description stating which scripts are a 1 to 1 copy of the deployment repository; or make a PR against the deployment repository with all the files that you are also copying/touching in this PR, this way we can easily review those changes.
b) I feel bad for saying it, but I guess nobody likes to read documentation and or navigate through code with typos and mistakes. There are 10s of those, so instead of reporting each of them, I'd ask you to review your changes and correct those.

About the questions/concerns and issues that I found along the way while giving it a try. I built an image based on 2.2.0.7 from a cmsdocker node, uploaded it to the registry with label 2.2.0.7-alan and try to run it on your test agent node vocms0260 (I couldn't get it to run). Here is the remaining feedback:

  1. the documentation says that we need to build an image with the cmst1 user. Do we really need it?
  • In addition, FNAL nodes use a different user. Will there be any impact on this?
  1. I see this red warning message when building the image:
    "WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv"
    is it because we execute install.sh with the standard Dockerfile template user (root), instead of first switching it to cmst1 and only then then executing install.sh?
  2. Where does the validation of the TEAMNAME come from? I failed to run it with a non-expected team name (and it will fail for cms-home as well, for instance)
  3. When running wmagent-docker-run script, we have no way to define which tag we want to execute (comment made along the code as well)
  4. I also had an ERROR when executing wmagent-docker-run, please see [2]. Which is then followed by a few other errors like [3]
  5. On the question above, I thought it could be related to having the services down, but even after I started couchdb and mysql, it still gave me the same error.

[2]

Start: Performing checks for successful Docker initialisation steps...
WMA_BUILD_ID: 7ae2c3c8035ba8d6249900bb490668dc4bccb68150830cb1997919e215e17744
dockerInitId: /data/srv/wmagent/current/config/.dockerInit
/data/srv/wmagent/current/config/couchdb/.dockerInit
/data/srv/wmagent/current/config/mysql/.dockerInit
/data/srv/wmagent/current/config/rucio/.dockerInit
/data/srv/wmagent/current/config/wmagent/.dockerInit
/data/srv/wmagent/current/install/.dockerInit
7ae2c3c8035ba8d6249900bb490668dc4bccb68150830cb1997919e215e17744
ERROR

[3]

Start: Performing local Docker image initialisation steps
deploy_to_container: Try Copying the host WMAgent.secrets file into the container admin area
deploy_to_container: Done
deploy_to_container: Updating WMAgent.secrets file with the current host's details
deploy_to_container: Double checking the final WMAgent.secrets file
deploy_to_container: Checking Certificates and Proxy
sed: can't read /data/admin/wmagent/renew_proxy.sh: No such file or directory
chmod: cannot access '/data/admin/wmagent/renew_proxy.sh': No such file or directory
deploy_to_container: Checking Certificate lifetime:
deploy_to_container: Certifficate end date: Sep  7 12:04:12 2023 GMT
deploy_to_container: Checking myproxy lifetime:
deploy_to_container: myproxy end date: May 17 13:03:40 2023 GMT
./run.sh: line 357: /data/admin/wmagent/renew_proxy.sh: No such file or directory
ERROR: Failed to renew expired myproxy
ERROR: deploy_to_container

docker/pypi/wmagent/install.sh Outdated Show resolved Hide resolved

# Fix for outdated yui library - A really bad workaround. We should get rid of it ASAP:
wget -nv -P $WMA_DEPLOY_DIR wget http://cmsrep.cern.ch/cmssw/repos/comp/slc7_amd64_gcc630/0000000000000000000000000000000000000000000000000000000000000000/RPMS/cd/cda5f9ef4b33696e67c9e2f995dd5cb6/external+yui+2.9.0-1-1.slc7_amd64_gcc630.rpm
mkdir $WMA_DEPLOY_DIR/yui && cat $WMA_DEPLOY_DIR/external+yui+2.9.0-1-1.slc7_amd64_gcc630.rpm|rpm2archive - |tar --strip-components=13 -xzv --directory $WMA_DEPLOY_DIR/yui
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though the RPM contains all the files that we need for this yui library, I think we should adopt what is used in CouchDB:
https://github.com/dmwm/deployment/blob/master/couchdb/manage#L384-L393

Otherwise, it defeats the purpose of migrating away of RPMs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can adopt that method indeed. I'll fix this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still unresolved as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

echo "-----------------------------------------------------------------------"
echo "Start $stepMsg"
set -x
cat <<EOF >> /home/${WMA_USER}/.bashrc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be cleaner to provide a temp_bashrc with the actual content, then cat + append it to .bashrc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that different from what I am doing here? Except that we should juggle with one more temporary file. In the current implementation you do see the content on the screen, right here the next few lines. And it is concatenated to the original/standard .bashrc without overwriting it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outcome is the same. I am just suggesting a different approach on how the .bashrc could get populated. Given that it's not a handful of lines, in my opinion it would be cleaner to have this content in its own file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then, in order to minimize the number of files/places where we deal with the environment, I'd like to reuse the already existing env.sh file and just add to it. I'll also apply some basic tweaks to relate all the commands executed to the WMA_ROOT_DIR. That way, on a later stage, when we completely move away from the old deployment model and we no longer care about backwards compatibility, it will be good to have all the env tweaks in a single place which would help us in the rewriting the deployment scripts at https://github.com/dmwm/WMCore/tree/master/deploy.

docker/pypi/wmagent/install.sh Show resolved Hide resolved
docker/pypi/wmagent/wmagent-docker-run.sh Outdated Show resolved Hide resolved
--mount type=bind,source=/etc/tnsnames.ora,target=/etc/tnsnames.ora,readonly \
--mount type=bind,source=/etc/condor,target=/etc/condor,readonly \
--mount type=bind,source=/tmp,target=/tmp \
--mount type=bind,source=$WMA_ROOT_DIR/certs,target=/data/certs \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid maintaining credentials in different places, maybe we should use the standard /data/certs as source as well.
The same comment applies to the admin directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that.... but we will loose the feature of a common root mount point for the whole container deployment, initialization and run, which is why I wanted to have the $WMA_ROOT_DIR at the first place..... And actually reading the code now it should be called $WMA_ROOT_MOUNT or $WMA_DOCKER_MOUNT.

For the time being, I'd like to keep it separate, at least until we validate this. Upon that we may change it and reuse host cert and admin dirs.


dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG "

docker build $dockerOpts -t wmagent:$WMA_TAG -t wmagent:latest .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't explore this very much, but I was failing to push the new image to CERN registry. I had to rename the image and only then I managed to.
Said that, you might have to update this docker build command to actually define a tag like registry.cern.ch/cmsweb/wmagent:$WMA_TAG

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alan, at this stage the whole procedure does not involve cern registry at all. We should get there in some other step. This PR concerns only local builds on the host where the container should be run.

Copy link
Contributor Author

@todor-ivanov todor-ivanov Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved now.
With my latest commit I introduced the option -p to wmagent-docker-build.sh, which triggers a proper tagging and uploading/pushing the so built image to registry.cern.ch. If not used the build will be local only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be working fine, even though I mistyped my password and the upload step failed. Nonetheless, I do see all the tags in the machine where it was built:

amaltaro@cmsdocker01:/build/amaltaro/CMSKubernetes/docker/pypi/wmagent $ docker image ls
REPOSITORY                                 TAG                   IMAGE ID            CREATED             SIZE
registry.cern.ch/cmsweb/wmagent            2.2.2rc1              c129fdd21e3a        14 minutes ago      2.04GB
registry.cern.ch/cmsweb/wmagent            latest                c129fdd21e3a        14 minutes ago      2.04GB
wmagent                                    2.2.2rc1              c129fdd21e3a        14 minutes ago      2.04GB
wmagent                                    latest                c129fdd21e3a        14 minutes ago      2.04GB

I am unsure about the latest tag though, given that this will be part of the WMCore CI/CD pipeline, the latest tag would be used for any tag that we cut in WMCore. Said that, I would suggest to simply remove the latest tag from this setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the latest tag for the local image is needed. It achieves two goals:

  • simplifying docker commands for running and maintaining the agent to just docker <command> wmagent and avoids the complexity of pointing the full <repository/image:tag> tuple every time
  • it guaranties we run one and only one agent instance/docker container at the host

We can skip tagging it as latest during the upload step though. That way we will not affect the CI/CD pipeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

providing a second flag to the build wrapper script gives the expected behavior. It will upload the latest tag to registry.cern.ch only if requested by calling it with:

./wmagent-docker-build.sh -v 2.2.1 -p -l

The new option is described in the --help message as well. All coming with my next commit

passed to the current script are to be forwarded to the WMAgent container entrypoint 'run.sh'

Usage: wmagent-docker-run.sh [-t <team_name>] [-n <agent_number>] [-c <central_services_url>] [-f <db_flavour>]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should support the WMAgent tag that we want to pull/run as well. Otherwise it will run whatever image version we have in the host (lacking it, I guess it would pull the latest tag).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said it in several places already, it is always a local build, so for the time being it will always be pointing to the latest version. And it is actually intentional. As you can see we are explicitly building against two tags $WMA_TAG and latest. This is a crucial part of the protection for one not to be able to run two wamagent containers on the same host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to the wmagent-docker-build.sh a -p option was added to wmagent-docerk-run.sh as well, which would trigger an image pull from registry.cern.ch. It is coupled with the -v <wma.tag> version option, so one can run a specific WMAgent version from the registry. It defaults to latest if skipped.

FYI: @amaltaro

@todor-ivanov
Copy link
Contributor Author

todor-ivanov commented May 19, 2023

Hi @amaltaro thanks for taking a look, but I am afraid you missed several things that I tried to clarify to the best I could, and then your effort from yesterday just went in vain, and you were observing old state/status of the final image, leading to many many errors and misbehavior you were facing that have been already resolved.

And let me hurrin in my reply and state it clear here as well: bellow are the boundary condition for this PR to work:

  • One needs a CERN agent VM with Docker Engine installed on it
  • These images are only locally built, no procedure for validation and uploading to CERN registry is considered so far.
  • The minimal WMAgent tag this whole thing would ever work is 2.2.1rc3 and no earlier.

So now to reply to your comment separately:

a) either update the description stating which scripts are a 1 to 1 copy of the deployment repository; or make a PR against the deployment repository with all the files that you are also copying/touching in this PR, this way we can easily review those changes.

This was done yesterday even while we were talking in the chat... maybe you have not refreshed your page.... I do not know.... but al these thigs you request were in the PR's description since yesterday.

I built an image based on 2.2.0.7 from a cmsdocker node, uploaded it to the registry with label 2.2.0.7-alan and try to run it on your test agent node vocms0260 (I couldn't get it to run)

This will not work - minimal possible version is 2.2.1rc3 all previous version are having completely broken python packegs and are not including any of the bug fixes I was finding on the way. A procedure for uploading the to cern registry have never been included.

Let me repeat here the build and run commands as given in the instructions to the README file:
Build command:

ssh vocms****
cmst1
cd /data
git clone https://github.com/dmwm/CMSKubernetes.git
cd /data/CMSKubernetes/docker/pypi/wmagent/
./wmagent-docker-build.sh -v 2.2.1rc3

Run command:
Initializing the agent:

ssh vocms****
cmst1
cd /data/CMSKubernetes/docker/pypi/wmagent/
# cleaning old agent data:
rm -rf /data/dockerMount/srv/
./wmagent-docker-run.sh -t <team_name> -n <agent_number> -f <db_flavour> -c <central_services> &

Running the agent

./wmagent-docker-run.sh &
  1. the documentation says that we need to build an image with the cmst1 user. Do we really need it?

We do not need to build the image with cmst1 user, but this is the user comomn for everybody on the agent/s that is part of the docker system groop and has proper access rights in vocms0*** machines ..... otherwise everybody may need to double check and manage his personal usergroups in the machine.

  1. I see this red warning message when building the image:
    "WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv"
    is it because we execute install.sh with the standard Dockerfile template user (root), instead of first switching it to cmst1 and only then then executing install.sh?

Well, what can I say .... I am the biggest supporter of the idea to stop running pip install with root and encapsulate everything in a common ROOT area with a virtual environment.... This will so much ease our life and make the setups identical for whom ever want s to do it either in a container or in a VM ... that would untie our hands in so many ways..... just let me know and I am ready to convert the whole thing and add the "virtual env concept" inside the container in a matter of hours. The change is not that big at the end. I've already investigated such potential path.

  1. Where does the validation of the TEAMNAME come from? I failed to run it with a non-expected team name (and it will fail for cms-home as well, for instance)

https://github.com/dmwm/CMSKubernetes/pull/1364/files#diff-1e02765b153585901f6ff915a6de7984c18f5c1e9de2d0f9ea14111847955348R87-R89

  1. When running wmagent-docker-run script, we have no way to define which tag we want to execute (comment made along the code as well)

This is intentional - until we stick to local builds, one latest build will always be tied as both $WMA_TAG and latest, so that way we guarantee that one always runs a single agent on the host. We can change that when we start uploading images to Cern registry

I also had an ERROR when executing wmagent-docker-run, please see [2]. Which is then followed by a few other errors like [3]

[2]

Start: Performing checks for successful Docker initialisation steps...
WMA_BUILD_ID: 7ae2c3c8035ba8d6249900bb490668dc4bccb68150830cb1997919e215e17744
dockerInitId: /data/srv/wmagent/current/config/.dockerInit
/data/srv/wmagent/current/config/couchdb/.dockerInit
/data/srv/wmagent/current/config/mysql/.dockerInit
/data/srv/wmagent/current/config/rucio/.dockerInit
/data/srv/wmagent/current/config/wmagent/.dockerInit
/data/srv/wmagent/current/install/.dockerInit
7ae2c3c8035ba8d6249900bb490668dc4bccb68150830cb1997919e215e17744
ERROR

Well this is not exactly an error... maybe I should convert this to Warning. All it tells us, is that while starting this run of the docker image the run.sh script has found either a mismatch between the image's buildId and the previously initialized version on this host or it found a fresh uninitialized host. And then lists all the mismatches or areas which are missing initialization. This is how the script enters the initialization steps during its first run. By the end of the current run if every initialization step has been done with no errors everything should be set to match and the check will be repeated again before starting the agent. On every next run for a properly initialized agent everything should be set to match already, if not the agent won't start.

One additional benefit from this approach ... the script triggers only the initialization steps for which it finds missing or mismatched .dockerInit files. Meaning once the agent has been initialized, if one needs to repeat only a single step (for any reason - not subject to this discussion), he only needs to delete the relevant .dockerInit file and restart the agent. The set of initialization steps will be entered again, but only the one requested/mismatching will be triggered.

The error you observe in [3], is due to the old version you were trying to run.

I hope my clarifications do help a little... If you want we can meet in a quick chat session and run this together and quickly discuss the limitations of the current build-run approach.

@amaltaro
Copy link
Contributor

@todor-ivanov thank you for correcting me on the tag used. Somehow I thought that 2.2.0.7 would work, given that it came out after rc3. I will retry these steps again, but before, let me make further questions and comments based on your reply above.

a) Is there any strong reason to coupling the docker build to a CERN agent? This will eventually be part of the CD pipeline, so we should try to adopt any node that has Docker Engine available to build such images.

b) if you are constructing the Dockerfile, I think both build and image upload should be considered. Otherwise it is a limiting factor to require a local build whenever one wants to try/run the wmagent docker container.

c) regarding the cmst1 user. I don't think we need to provide a solution for it now, unless it's simple. However, that account is only common in the CERN agents. FNAL agents run under a different account. If the agent is fully functional, we can decide to pick one account and use it regardless of which site we have the agent running.

d) about the root pip triggered command. I wanted to understand if that was because pip packages are installed as root, or just because the installation is made system wide. I believe it's the former and such I would still prefer not to have a virtual environment inside the docker image.

e) for the script that runs wmagent, if you agree that it makes sense to pull the image from registry instead of local, then we definitely need to have the ability to say which tag we want to run. This makes total sense to me btw.

@todor-ivanov
Copy link
Contributor Author

Thanks @amaltaro

a) Is there any strong reason to coupling the docker build to a CERN agent? This will eventually be part of the CD pipeline, so we should try to adopt any node that has Docker Engine available to build such images.

No, those agents were just more accessible and easier to work with. Once we start shaping the image to be universally fit and also tested against all the FNAL agents, I believe we will have more work to do. Mostly because we now depend on the USER ID on the host and we try to match it inside the container... given that the User IDs between FNAL and CERN agents differ... that will be a problem to be resolved on a later stage.

b) if you are constructing the Dockerfile, I think both build and image upload should be considered. Otherwise it is a limiting factor to require a local build whenever one wants to try/run the wmagent docker container.

That's fine as well.... This is just a new initial condition that I have not considered working on in this PR... I can add it no problem. This won't change the rest of the Docker file.

c) regarding the cmst1 user. I don't think we need to provide a solution for it now, unless it's simple. However, that account is only common in the CERN agents. FNAL agents run under a different account. If the agent is fully functional, we can decide to pick one account and use it regardless of which site we have the agent running.

Well .. the best would be to make the service as "host account agnostic", but this would need some more changes. Possibly in the ways how we use the mount points from the host, how we run the two databases relative and CouchDB, maybe the user allowed to refresh service credentials and running myproxy is going to be affected as well.. etc. etc... But the biggest question is, how would we allow multiple operators to manipulate the same agent if we allow everyone to run the container with his won credentials.... then ... would that be affected by kerberos session expiration ... I believe not, but never tested it - we should look at it. In addition we must have a universal user inside the container not depending on who is starting or stopping the container itself... but again, because of the host mount points and external services we still have the requirement this user to exist and have the same privileges on both CERN and FNAL agents. As I already said. I did not consider solving that issue in this PR.

e) for the script that runs wmagent, if you agree that it makes sense to pull the image from registry instead of local, then we definitely need to have the ability to say which tag we want to run. This makes total sense to me btw.

In the very first implementation of the wmagent-docker-run wrapper was actually accepting the tag as a parameter.... and consequently I simplified it in order to protect us from running multiple agents on the same host etc. But yes I agree, this is natural to have it at runtime, especially in combination with the additional ability to upload to CERN registry... Ok then, I will start loosening quite few restrictions now... All good by me. Let me see what I can do with my next few commits. In the meantime please feel free to test it as it is. I will upload only working code so once I am ready with a properly uploaded image, I will ping you to test it as well, but the image itself should not differ from what is the locally built one.

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov I gave it another try and built 2.2.1rc5, I have also uploaded and overwritten the previous 2.2.1rc5 wmagent image that was available in the CERN registry.

It looks like some of the points I raised in my previous review are yet to be considered. Once you go through those and this PR is ready for another review, please let us know.

I also left a couple of other comments along the code, I wanted to make the following remarks:

  • in order to pull/run an image from registry instead of localhost, you need to change the last line in the *-run.sh script to something like: docker run $dockerOpts registry.cern.ch/cmsweb/wmagent:$WMA_TAG $wmaOpts
  • image is still fairly large, likely because of the MariaDB dependency. But I let a comment to reduce it when installing packages through apt-get
  • I couldn't get that image to run, this is the terminal failure that I hit when running *-run.sh script [4]. Which might be because I built this image under my own CERN unix username on the node cmsdocker01

[4]

Start: Performing local Docker image initialisation steps
deploy_to_container: Try Copying the host WMAgent.secrets file into the container admin area
deploy_to_container: Done
deploy_to_container: Updating WMAgent.secrets file with the current host's details
deploy_to_container: Double checking the final WMAgent.secrets file
deploy_to_container: Checking Certificates and Proxy
sed: couldn't open temporary file /usr/local/deploy/sed2vz2dT: Permission denied
chmod: changing permissions of '/usr/local/deploy/renew_proxy.sh': Operation not permitted
deploy_to_container: Checking Certificate lifetime:
deploy_to_container: Certifficate end date: Sep  7 12:04:12 2023 GMT
deploy_to_container: Checking myproxy lifetime:
deploy_to_container: myproxy end date: May 17 13:03:40 2023 GMT
/usr/local/deploy/renew_proxy.sh: line 4: /data/admin/wmagent/env.sh: No such file or directory
MyProxy v6.2 Aug 2019 PAM SASL KRB5 LDAP VOMS OCSP
Attempting to connect to xxxx 
Successfully connected to xxxx 
using trusted certificates directory /etc/grid-security/certificates
Using Host cert file (/data/certs/servicecert.pem), key file (/data/certs/servicekey.pem)
server name: /DC=ch/DC=cern/OU=xxxx
checking that server name is acceptable...
server name matches "myproxy.cern.ch"
authenticated server name is acceptable
Failed to receive credentials.
ERROR from myproxy-server:
No credentials exist with username "amaltaro" and credential name "CREDNAME".
ERROR: Failed to renew expired myproxy
ERROR: deploy_to_container

[1]+  Exit 1                  ./wmagent-docker-run.sh -t testbed-alan -n 2 -f mysql -c cmsweb-testbed.cern.ch

docker/pypi/wmagent/Dockerfile Outdated Show resolved Hide resolved
docker/pypi/wmagent/Dockerfile Outdated Show resolved Hide resolved
@vkuznet
Copy link
Collaborator

vkuznet commented May 29, 2023

@todor-ivanov , before going into review of long set of changes I rather would like to request the breakdown of layers sizes in final image. You can obtain this info as following:

docker images
REPOSITORY                                 TAG               IMAGE ID       CREATED         SIZE
registry.cern.ch/cmsweb/reqmgr2ms-pileup   2.2.0.5           5316ae5b435b   4 weeks ago     1.08GB
...

docker history 5316ae5b435b
IMAGE          CREATED        CREATED BY                                      SIZE      COMMENT
5316ae5b435b   4 weeks ago    /bin/sh -c #(nop)  CMD ["python3"]              0B
<missing>      4 weeks ago    /bin/sh -c #(nop) WORKDIR /data                 0B
<missing>      4 weeks ago    /bin/sh -c sudo chown -R $USER.$USER $WDIR      6.76kB
<missing>      4 weeks ago    /bin/sh -c #(nop)  USER _reqmgr2ms              0B
<missing>      4 weeks ago    /bin/sh -c echo "%$USER ALL=(ALL) NOPASSWD:A…   704B
<missing>      4 weeks ago    /bin/sh -c useradd ${USER} && install -o ${U…   328kB
....

So, each step of docker build will be displayed along with its size. This way we'll have clear picture where to optimize the build and final image.

@amaltaro
Copy link
Contributor

@todor-ivanov this PR has been stale for a couple of weeks now. Can you please address the comments/questions/recommendations above - and along the code - such that we can progress and finish this development still this quarter?

Note that a wmagent Dockerfile updated injected a conflict to this PR, sorry about that.

@todor-ivanov
Copy link
Contributor Author

Hi @vkuznet
Here is the output of the the build history:

$ docker image list 
REPOSITORY                        TAG       IMAGE ID       CREATED       SIZE
wmagent                           2.2.1     9108a79248b3   5 hours ago   2.04GB

$ docker history wmagent
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
9108a79248b3   5 hours ago   ENTRYPOINT ["./run.sh"]                         0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV USER=cmst1                                  0B        buildkit.dockerfile.v0
<missing>      5 hours ago   USER cmst1                                      0B        buildkit.dockerfile.v0
<missing>      5 hours ago   WORKDIR /data                                   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c chown -R ${W…   42.2kB    buildkit.dockerfile.v0
<missing>      5 hours ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c ${WMA_ROOT_D…   381MB     buildkit.dockerfile.v0
<missing>      5 hours ago   ADD run.sh /data/run.sh # buildkit              33.4kB    buildkit.dockerfile.v0
<missing>      5 hours ago   ADD install.sh /data/install.sh # buildkit      7.33kB    buildkit.dockerfile.v0
<missing>      5 hours ago   ADD etc /usr/local/etc # buildkit               4.47kB    buildkit.dockerfile.v0
<missing>      5 hours ago   ADD bin /usr/local/bin # buildkit               23.2kB    buildkit.dockerfile.v0
<missing>      5 hours ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c echo "${WMA_…   698B      buildkit.dockerfile.v0
<missing>      5 hours ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c rm -f /etc/m…   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c usermod -aG …   3.71kB    buildkit.dockerfile.v0
<missing>      5 hours ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c install -o $…   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c useradd -u $…   10.4MB    buildkit.dockerfile.v0
<missing>      5 hours ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c groupadd -g …   1.93kB    buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_ENV_FILE=/usr/local/deploy/env.sh       0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_DEPLOY_DIR=/usr/local                   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_MANAGE_DIR=/data/srv/wmagent/current…   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_CONFIG_DIR=/data/srv/wmagent/current…   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_INSTALL_DIR=/data/srv/wmagent/curren…   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_CURRENT_DIR=/data/srv/wmagent/current   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_HOSTADMIN_DIR=/data/admin/wmagent/ho…   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_CERTS_DIR=/data/certs                   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_ADMIN_DIR=/data/admin/wmagent           0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_BASE_DIR=/data/srv                      0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_ROOT_DIR=/data                          0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_GID=1399                                0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_UID=31961                               0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_GROUP=zh                                0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_USER=cmst1                              0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV WMA_TAG=2.2.1                               0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ARG WMA_TAG=None                                0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV PKG_CONFIG_PATH=/usr/lib/oracle             0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV PATH=/usr/local/bin:/usr/local/sbin:/usr…   0B        buildkit.dockerfile.v0
<missing>      5 hours ago   ENV LD_LIBRARY_PATH=/usr/lib/oracle             0B        buildkit.dockerfile.v0
<missing>      5 hours ago   COPY /usr/lib/oracle /usr/lib/oracle # build…   258MB     buildkit.dockerfile.v0
<missing>      5 hours ago   RUN /bin/sh -c apt-get install -y libmariadb…   355MB     buildkit.dockerfile.v0
<missing>      5 hours ago   RUN /bin/sh -c apt-get update # buildkit        732kB     buildkit.dockerfile.v0
<missing>      5 hours ago   MAINTAINER Valentin Kuznetsov vkuznet@gmail.…   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago   WORKDIR /data                                   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago   ADD manage /data/manage # buildkit              2.62kB    buildkit.dockerfile.v0
<missing>      4 weeks ago   ADD monitor.sh /data/monitor.sh # buildkit      1.33kB    buildkit.dockerfile.v0
<missing>      4 weeks ago   ADD run.sh /data/run.sh # buildkit              2.81kB    buildkit.dockerfile.v0
<missing>      4 weeks ago   RUN /bin/sh -c pip install dbs3-client # bui…   10MB      buildkit.dockerfile.v0
<missing>      4 weeks ago   COPY /data/cpy_exporter /usr/bin/cpy_exporte…   8.22MB    buildkit.dockerfile.v0
<missing>      4 weeks ago   COPY /data/process_exporter /usr/bin/process…   8.04MB    buildkit.dockerfile.v0
<missing>      4 weeks ago   COPY /data/cmsweb-ping /usr/bin/cmsweb-ping …   4.53MB    buildkit.dockerfile.v0
<missing>      4 weeks ago   COPY /etc/vomses /etc/vomses # buildkit         184B      buildkit.dockerfile.v0
<missing>      4 weeks ago   COPY /etc/grid-security/vomsdir /etc/grid-se…   198B      buildkit.dockerfile.v0
<missing>      4 weeks ago   COPY /etc/grid-security/certificates /etc/gr…   431kB     buildkit.dockerfile.v0
<missing>      4 weeks ago   RUN /bin/sh -c mkdir /etc/grid-security # bu…   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago   RUN /bin/sh -c ln -s /usr/bin/python3 /usr/b…   16B       buildkit.dockerfile.v0
<missing>      4 weeks ago   RUN /bin/sh -c apt-get install -y curl vim l…   79.3MB    buildkit.dockerfile.v0
<missing>      4 weeks ago   RUN /bin/sh -c apt-get update # buildkit        18MB      buildkit.dockerfile.v0
<missing>      4 weeks ago   MAINTAINER Valentin Kuznetsov vkuznet@gmail.…   0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   CMD ["python3"]                                 0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   RUN /bin/sh -c set -eux;   wget -O get-pip.p…   10.3MB    buildkit.dockerfile.v0
<missing>      6 weeks ago   ENV PYTHON_GET_PIP_SHA256=96461deced5c2a487d…   0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   ENV PYTHON_GET_PIP_URL=https://github.com/py…   0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   ENV PYTHON_SETUPTOOLS_VERSION=57.5.0            0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   ENV PYTHON_PIP_VERSION=22.0.4                   0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   RUN /bin/sh -c set -eux;  for src in idle3 p…   32B       buildkit.dockerfile.v0
<missing>      6 weeks ago   RUN /bin/sh -c set -eux;   wget -O python.ta…   42.7MB    buildkit.dockerfile.v0
<missing>      6 weeks ago   ENV PYTHON_VERSION=3.8.16                       0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   ENV GPG_KEY=E3FF2839C048B25C084DEBE9B26995E3…   0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   RUN /bin/sh -c set -eux;  apt-get update;  a…   18.5MB    buildkit.dockerfile.v0
<missing>      6 weeks ago   ENV LANG=C.UTF-8                                0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   ENV PATH=/usr/local/bin:/usr/local/sbin:/usr…   0B        buildkit.dockerfile.v0
<missing>      6 weeks ago   /bin/sh -c set -ex;  apt-get update;  apt-ge…   529MB     
<missing>      6 weeks ago   /bin/sh -c apt-get update && apt-get install…   152MB     
<missing>      6 weeks ago   /bin/sh -c set -eux;  apt-get update;  apt-g…   28.6MB    
<missing>      6 weeks ago   /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      6 weeks ago   /bin/sh -c #(nop) ADD file:fc290cf8ddb984325…   124MB     

@todor-ivanov
Copy link
Contributor Author

hi @amaltaro @vkuznet , I think I have adressed your comments. Could you please take another look.

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov I performed the image construction step and things are looking better now.
I did not test the docker image that you uploaded to the registry because I think that there are important items to be resolved first (e.g., moving away from RPMs in this PR).

I also left a few extra comments along the code for your consideration.

docker/pypi/wmagent/Dockerfile Outdated Show resolved Hide resolved
docker/pypi/wmagent/Dockerfile Outdated Show resolved Hide resolved

# Fix for outdated yui library - A really bad workaround. We should get rid of it ASAP:
wget -nv -P $WMA_DEPLOY_DIR wget http://cmsrep.cern.ch/cmssw/repos/comp/slc7_amd64_gcc630/0000000000000000000000000000000000000000000000000000000000000000/RPMS/cd/cda5f9ef4b33696e67c9e2f995dd5cb6/external+yui+2.9.0-1-1.slc7_amd64_gcc630.rpm
mkdir $WMA_DEPLOY_DIR/yui && cat $WMA_DEPLOY_DIR/external+yui+2.9.0-1-1.slc7_amd64_gcc630.rpm|rpm2archive - |tar --strip-components=13 -xzv --directory $WMA_DEPLOY_DIR/yui
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still unresolved as well.

echo "-----------------------------------------------------------------------"
echo "Start $stepMsg"
set -x
cat <<EOF >> /home/${WMA_USER}/.bashrc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outcome is the same. I am just suggesting a different approach on how the .bashrc could get populated. Given that it's not a handful of lines, in my opinion it would be cleaner to have this content in its own file.


dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG "

docker build $dockerOpts -t wmagent:$WMA_TAG -t wmagent:latest .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be working fine, even though I mistyped my password and the upload step failed. Nonetheless, I do see all the tags in the machine where it was built:

amaltaro@cmsdocker01:/build/amaltaro/CMSKubernetes/docker/pypi/wmagent $ docker image ls
REPOSITORY                                 TAG                   IMAGE ID            CREATED             SIZE
registry.cern.ch/cmsweb/wmagent            2.2.2rc1              c129fdd21e3a        14 minutes ago      2.04GB
registry.cern.ch/cmsweb/wmagent            latest                c129fdd21e3a        14 minutes ago      2.04GB
wmagent                                    2.2.2rc1              c129fdd21e3a        14 minutes ago      2.04GB
wmagent                                    latest                c129fdd21e3a        14 minutes ago      2.04GB

I am unsure about the latest tag though, given that this will be part of the WMCore CI/CD pipeline, the latest tag would be used for any tag that we cut in WMCore. Said that, I would suggest to simply remove the latest tag from this setup.

echo "Pulling Docker image: registry.cern.ch/cmsweb/wmagent:$WMA_TAG"
docker login registry.cern.ch
docker pull registry.cern.ch/cmsweb/wmagent:$WMA_TAG
docker tag registry.cern.ch/cmsweb/wmagent:$WMA_TAG wmagent:$WMA_TAG
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not need to tag any image at this point. IMO, we only need to run the docker pull command with the correct tag, that is all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I just confirmed it with this error:

cmst1@vocms0260:/data/srv $ Starting the wmagent:latest docker container with the following parameters: -t testbed-vocms0260 -n 1 -f mysql -c cmsweb-test9.cern.ch -p -v 2.2.2rc1
Unable to find image 'wmagent:latest' locally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous comment about the tagging answers your remark here as well. I'll try to simulate the error though. But tagging the local image as latest is needed (the local one in this case is wmagent:latest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaltaro I could not simulate this error. I did even clean everything on my machine and then downloaded only one image from the registry... with no local tags... and it was again working perfectly well - just creating a latest tag locally, without having the latest tag uploaded in the registry... Maybe it was something when you have put a wrong password and your credentials have been mistakenly recorded in the docker config.... I cannot tell, but for me it works just fine.

@todor-ivanov
Copy link
Contributor Author

hi @amaltaro
I have addressed all your comments and resolved the previously forgotten RPM leftover. Sorry for that. You may give it a try at your convenience.

@amaltaro
Copy link
Contributor

@todor-ivanov please resolve the conflict and also provide an up-to-date output for the image size (from docker history wmagent).

@todor-ivanov
Copy link
Contributor Author

which conflict @amaltaro ?

@todor-ivanov
Copy link
Contributor Author

Hi @amaltaro,
Sorry for the delay about the history from the image build, but in the meantime I found a bug in how I was downloading the yui files, which I had to fix with a separate commit. All is good now.

Here [1] is the history of the image build. One may compare it with my previous printout to see where we have gained a little in terms of space.

[1]

$ docker image history wmagent:2.2.1
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
1ffda1a986fd   10 minutes ago   ENTRYPOINT ["./run.sh"]                         0B        buildkit.dockerfile.v0
<missing>      10 minutes ago   ENV USER=cmst1                                  0B        buildkit.dockerfile.v0
<missing>      10 minutes ago   USER cmst1                                      0B        buildkit.dockerfile.v0
<missing>      10 minutes ago   WORKDIR /data                                   0B        buildkit.dockerfile.v0
<missing>      10 minutes ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c chown -R ${W…   42.7kB    buildkit.dockerfile.v0
<missing>      10 minutes ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c ${WMA_ROOT_D…   378MB     buildkit.dockerfile.v0
<missing>      11 minutes ago   ADD run.sh /data/run.sh # buildkit              33.4kB    buildkit.dockerfile.v0
<missing>      11 minutes ago   ADD install.sh /data/install.sh # buildkit      7.85kB    buildkit.dockerfile.v0
<missing>      29 minutes ago   ADD etc /usr/local/etc # buildkit               4.47kB    buildkit.dockerfile.v0
<missing>      29 minutes ago   ADD bin /usr/local/bin # buildkit               23.2kB    buildkit.dockerfile.v0
<missing>      29 minutes ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c echo "${WMA_…   698B      buildkit.dockerfile.v0
<missing>      29 minutes ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c rm -f /etc/m…   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c usermod -aG …   3.56kB    buildkit.dockerfile.v0
<missing>      29 minutes ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c install -o $…   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c useradd -u $…   10.4MB    buildkit.dockerfile.v0
<missing>      29 minutes ago   RUN |1 WMA_TAG=2.2.1 /bin/sh -c groupadd -g …   1.87kB    buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_ENV_FILE=/usr/local/deploy/env.sh       0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_DEPLOY_DIR=/usr/local                   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_MANAGE_DIR=/data/srv/wmagent/current…   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_CONFIG_DIR=/data/srv/wmagent/current…   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_INSTALL_DIR=/data/srv/wmagent/curren…   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_CURRENT_DIR=/data/srv/wmagent/current   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_HOSTADMIN_DIR=/data/admin/wmagent/ho…   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_CERTS_DIR=/data/certs                   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_ADMIN_DIR=/data/admin/wmagent           0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_BASE_DIR=/data/srv                      0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_ROOT_DIR=/data                          0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_GID=1399                                0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_UID=31961                               0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_GROUP=zh                                0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_USER=cmst1                              0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV WMA_TAG=2.2.1                               0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ARG WMA_TAG=None                                0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV PKG_CONFIG_PATH=/usr/lib/oracle             0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV PATH=/usr/local/bin:/usr/local/sbin:/usr…   0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   ENV LD_LIBRARY_PATH=/usr/lib/oracle             0B        buildkit.dockerfile.v0
<missing>      29 minutes ago   COPY /usr/lib/oracle /usr/lib/oracle # build…   258MB     buildkit.dockerfile.v0
<missing>      29 minutes ago   RUN /bin/sh -c apt-get install -y libmariadb…   345MB     buildkit.dockerfile.v0
<missing>      30 minutes ago   RUN /bin/sh -c apt-get update # buildkit        687kB     buildkit.dockerfile.v0
<missing>      30 minutes ago   MAINTAINER Valentin Kuznetsov vkuznet@gmail.…   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      /bin/sh -c #(nop) WORKDIR /data                 0B        
<missing>      4 weeks ago      /bin/sh -c #(nop) ADD file:62b40ab409929052d…   2.62kB    
<missing>      4 weeks ago      /bin/sh -c #(nop) ADD file:d542a6348f3d47c04…   1.43kB    
<missing>      4 weeks ago      /bin/sh -c #(nop) ADD file:4b52f467960680022…   2.81kB    
<missing>      4 weeks ago      /bin/sh -c pip install dbs3-client              10MB      
<missing>      4 weeks ago      /bin/sh -c #(nop) COPY file:179dde413893af6e…   8.22MB    
<missing>      4 weeks ago      /bin/sh -c #(nop) COPY file:0efefb1ce6f445b6…   8.04MB    
<missing>      4 weeks ago      /bin/sh -c #(nop) COPY file:bdd094e203553774…   4.53MB    
<missing>      4 weeks ago      /bin/sh -c #(nop) COPY dir:fdfb134dd7bef922f…   184B      
<missing>      4 weeks ago      /bin/sh -c #(nop) COPY dir:58a0efbd777b5881b…   198B      
<missing>      4 weeks ago      /bin/sh -c #(nop) COPY dir:4b189a3d1eebc67f3…   431kB     
<missing>      4 weeks ago      /bin/sh -c mkdir /etc/grid-security             0B        
<missing>      4 weeks ago      /bin/sh -c ln -s /usr/bin/python3 /usr/bin/p…   16B       
<missing>      4 weeks ago      /bin/sh -c apt-get install -y curl vim libcu…   79.3MB    
<missing>      4 weeks ago      /bin/sh -c apt-get update                       18MB      
<missing>      4 weeks ago      /bin/sh -c #(nop)  MAINTAINER Valentin Kuzne…   0B        
<missing>      4 weeks ago      CMD ["python3"]                                 0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      RUN /bin/sh -c set -eux;   wget -O get-pip.p…   10.3MB    buildkit.dockerfile.v0
<missing>      4 weeks ago      ENV PYTHON_GET_PIP_SHA256=96461deced5c2a487d…   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      ENV PYTHON_GET_PIP_URL=https://github.com/py…   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      ENV PYTHON_SETUPTOOLS_VERSION=57.5.0            0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      ENV PYTHON_PIP_VERSION=22.0.4                   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      RUN /bin/sh -c set -eux;  for src in idle3 p…   32B       buildkit.dockerfile.v0
<missing>      4 weeks ago      RUN /bin/sh -c set -eux;   wget -O python.ta…   42.7MB    buildkit.dockerfile.v0
<missing>      4 weeks ago      ENV PYTHON_VERSION=3.8.16                       0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      ENV GPG_KEY=E3FF2839C048B25C084DEBE9B26995E3…   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      RUN /bin/sh -c set -eux;  apt-get update;  a…   18.5MB    buildkit.dockerfile.v0
<missing>      4 weeks ago      ENV LANG=C.UTF-8                                0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      ENV PATH=/usr/local/bin:/usr/local/sbin:/usr…   0B        buildkit.dockerfile.v0
<missing>      4 weeks ago      /bin/sh -c set -ex;  apt-get update;  apt-ge…   529MB     
<missing>      4 weeks ago      /bin/sh -c apt-get update && apt-get install…   152MB     
<missing>      4 weeks ago      /bin/sh -c set -eux;  apt-get update;  apt-g…   28.6MB    
<missing>      4 weeks ago      /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      4 weeks ago      /bin/sh -c #(nop) ADD file:150a6453ab2258061…   124MB     

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov I have made further comments along the code.

I mentioned this in the code, but I think it's worth it to point out that the GH action needs to be updated to deal with the extra --network=host docker build option.

In addition, my understanding is that wmagent-docker-build.sh will not be really needed, given that we build these images from the GH action workflow. But I think we can keep it around for a little longer until we converge with everything and start adopting docker for WMAgent in production.

RUN apt-get update
RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils sudo
ENV TAG=X.Y.Z
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep this line otherwise this will break the WMCore CD pipeline, see:
https://github.com/dmwm/WMCore/blob/master/.github/workflows/docker_images_template.yaml#L29

Perhaps you can do something like:

ENV TAG=X.Y.Z
ENV WMA_TAG=${TAG}

and it should resolve the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not been done. Unresolving it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow, I have lost that change in my previous commit. Sorry, my bad. Fixing it with my next one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code won't work! WMCore GH action workflow will update TAG but not WMA_TAG, which is the one actually used during the whole process.

docker/pypi/wmagent/README.md Outdated Show resolved Hide resolved

# NOTE: NO WMA_TAG validation is done in the current script. It is implemented at the install.sh

dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the image needs to be built with --network=host option, then we need to adapt the GH action workflow:
https://github.com/dmwm/WMCore/blob/master/.github/workflows/docker_images_template.yaml#L26

Which we can either resolve in the same GH issue, or create a new GH issue, but this cannot be forgotten.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we resolve the Mysql and CouchDB related issues [1], we will have to reevaluate if --network=host is going to be even needed at the end. I suspect yes, but still to be checked. I'd suggest we create the relevant issue once we get to a final decision about that.

[1]
dmwm/WMCore#11312
dmwm/WMCore#11313

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor, please create a new GH issue for this --network=host option then.
Once we merge this PR, we will have all your changes in place, but images will be badly built without the host network option. Meaning, whenever someone decides to resume this activity, and standalone image will have to be rebuilt with the relevant docker option.

The issue will be hopefully good enough to keep track of this and avoid people wasting time testing things with the wrong image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here is the new issue:
dmwm/WMCore#11635

I tried to make it as descriptive as possible.

docker/pypi/wmagent/install.sh Outdated Show resolved Hide resolved
docker/pypi/wmagent/run.sh Outdated Show resolved Hide resolved
docker/pypi/wmagent/run.sh Show resolved Hide resolved
docker/pypi/wmagent/run.sh Outdated Show resolved Hide resolved
docker/pypi/wmagent/run.sh Show resolved Hide resolved
}

check_docker_init() {
# A function to check all previously populated */.dockerInit files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid I still do not have a thorough understanding of this docker build id vs docker init files.

Let me try with a real scenario: what would happen if we want to run a new wmagent container with one or two configuration lines that have been changed? Would the container start with the new configurations? Would it fail to start? Or what?

Copy link
Contributor Author

@todor-ivanov todor-ivanov Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general principal is:

  • At build time - we create the unique build id and we record it in the image
  • At run time - on every initialization step we search for the respective .dockerInit file in the host mount points and compare its contents against the buildId inside the image. If we do not find such file or the so recorded hash inside differs from the buildId of the current image, then we consider this agent has either never passed through the relevant initialization step or what we find is a leftover from a previous image. Then we trigger the initialization step and if successful at the end we record the new .dockerInit file containing the current buildId. This way we guarantee we never repeat an initialization step.
  • In addition, we use WMAgent.secrets file as a sole source for generating the actual agent config file. In order to check for any configuration (not initialization) change, during the agent initialization we create the WMAgent.secret's file md5 sum. And on every container/agent restart we compare it. If a change has happened there, we delete only $WMA_CONFIG_DIR/.dockerInit and $WMA_HOSTADMIN_DIR/.dockerInit, which are the initialization steps that could be affected by a change in this file (one of them checks the validity of the WMAgent.secrets file, the other regenerates the agent config). They will be the only initialization steps triggered again then.

Now to answer your question.

  • If the configuration change has happened in WMAgent.secrets file, then the above mechanism for re-configuring the agent will be triggered automatically. And the configuration steps will be reinitialized.
  • If the change has happened in the agent config directly, we have no mechanism to reflect it in anyway. So if this is a configuration change that affects other parts of the configuration as well (like database reconfiguration etc., which I cannot imagine how could be the case, but anyway), in such cases we may want to repeat the whole configuration step from scratch, similar to the above mechanism. Then somebody needs to go there and manually delete $WMA_CONFIG_DIR/.dockerInit and restart the agent.

Honestly I do not think we may have a configuration change in the agent config file itself such that would affect other parts beyond the agent's executable itself. So I do not believe we will ever run into the later case. And as we have talked in the past, the current rule of thumb, should be - we always reconfigure the agent through WMAgent.secrets file and we never touch the agent config directly. So the second scenario is completely hypothetical.

Copy link
Contributor Author

@todor-ivanov todor-ivanov Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now, I do confirm the above is the original idea.....but double checking the latest version of the script, I see I have overlooked one bit and also the agent_resource_control and upload_agent_config step are triggered. While, I can see a reason why the later may be needed, I am not sure about the agent_resource_control though. We should discuss this. In the meantime I'll fix that with my next commit and will tie the resource control step to be dependent on a separate .dockerInit file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we a followup of our discussion on the topic, here is the issue created about refactoring this whole mechanism: dmwm/WMCore#11627

local stepMsg="Performing $FUNCNAME"
echo "-------------------------------------------------------"
echo "Start: $stepMsg"
_init_valid $WMA_CONFIG_DIR || {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To re-confirm, these lines below only get executed if there is no .dockerInit file (thus, first initialization of the agent), right?

Copy link
Contributor Author

@todor-ivanov todor-ivanov Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or in case a change in WMAgent.secrets file has been noticed and the whole agent configuration step has been restarted.

@todor-ivanov
Copy link
Contributor Author

todor-ivanov commented Jun 28, 2023

Hi @amaltaro
I have addressed all your comments, but one. For which we'd have to have a quick discussion on how to proceed. You may take another look at the code now.

About keeping wmagent-docker-build.sh I am in favor of keeping it, because even though it is may not bee needed in the GH actions, it is still the major wrapper to help during development and local image tests and speeds up the process and.

@todor-ivanov
Copy link
Contributor Author

Hi @amaltaro
As we agreed, I have created a separate issue: dmwm/WMCore#11627 for refactoring the agent initialization process. The rest of the review comments have been addressed. You may go for a hopefully final iteration through the PR now.

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov please find yet some extra comments for your consideration. Thanks

RUN apt-get update
RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils sudo
ENV TAG=X.Y.Z
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not been done. Unresolving it.

docker/pypi/wmagent/run.sh Show resolved Hide resolved
docker/pypi/wmagent/run.sh Outdated Show resolved Hide resolved
echo "Start: $stepMsg"
echo "-------------------------------------------------------"
echo "Start sleeping now ...zzz..."
while true; do sleep 10; done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this infinity loop here?

Copy link
Contributor Author

@todor-ivanov todor-ivanov Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because at this stage (while the work on the whole containerization is still ongoing) we'd better connect to the container and run the agent manually with

manage start-agent

It is to be removed once we are ready. I'll make a TODO comment, so we won't forget it.


# NOTE: NO WMA_TAG validation is done in the current script. It is implemented at the install.sh

dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor, please create a new GH issue for this --network=host option then.
Once we merge this PR, we will have all your changes in place, but images will be badly built without the host network option. Meaning, whenever someone decides to resume this activity, and standalone image will have to be rebuilt with the relevant docker option.

The issue will be hopefully good enough to keep track of this and avoid people wasting time testing things with the wrong image.

@todor-ivanov
Copy link
Contributor Author

hi @amaltaro I have addressed your latest comments. Could you please take another look

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov there is still one line that will make things not to work. Comment was left along the code.

RUN apt-get update
RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils sudo
ENV TAG=X.Y.Z
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code won't work! WMCore GH action workflow will update TAG but not WMA_TAG, which is the one actually used during the whole process.

@amaltaro
Copy link
Contributor

This has been superseeded by #1393 (supposedly a cherry-pick of everything in here, with one extra fix). I guess it can now be closed out, but I leave it up to you to decide @todor-ivanov

@todor-ivanov
Copy link
Contributor Author

Hi @amaltaro ,
Since #1393 is already merged and is identical to this one here (modulo the addition regarding the CI/CD actions TAG parameter), we can close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a script to deploy WMAgent docker image Build a WMAgent docker image
4 participants