-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change WMAgent Dockerfile to call install.sh && Split the agent init to buildtime and runtime parts #1364
Change WMAgent Dockerfile to call install.sh && Split the agent init to buildtime and runtime parts #1364
Conversation
And just a heads up.... the interim results from my first commit are quite promising:
|
e13a0f6
to
367370f
Compare
Hi @vkuznet @amaltaro you may start reviewing this PR. |
31ee5e6
to
0f131b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov as we discussed today, here is my first review based on giving this docker build + docker run a try, looking what I get in my terminal, and making spot checks in the scrips.
In addition to the questions/concerns/issues that I will report below, I think it would be extremely useful if you:
a) either update the description stating which scripts are a 1 to 1 copy of the deployment repository; or make a PR against the deployment repository with all the files that you are also copying/touching in this PR, this way we can easily review those changes.
b) I feel bad for saying it, but I guess nobody likes to read documentation and or navigate through code with typos and mistakes. There are 10s of those, so instead of reporting each of them, I'd ask you to review your changes and correct those.
About the questions/concerns and issues that I found along the way while giving it a try. I built an image based on 2.2.0.7
from a cmsdocker node, uploaded it to the registry with label 2.2.0.7-alan
and try to run it on your test agent node vocms0260 (I couldn't get it to run). Here is the remaining feedback:
- the documentation says that we need to build an image with the cmst1 user. Do we really need it?
- In addition, FNAL nodes use a different user. Will there be any impact on this?
- I see this red warning message when building the image:
"WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv"
is it because we execute install.sh with the standard Dockerfile template user (root), instead of first switching it tocmst1
and only then then executing install.sh? - Where does the validation of the TEAMNAME come from? I failed to run it with a non-expected team name (and it will fail for cms-home as well, for instance)
- When running wmagent-docker-run script, we have no way to define which tag we want to execute (comment made along the code as well)
- I also had an ERROR when executing wmagent-docker-run, please see [2]. Which is then followed by a few other errors like [3]
- On the question above, I thought it could be related to having the services down, but even after I started couchdb and mysql, it still gave me the same error.
[2]
Start: Performing checks for successful Docker initialisation steps...
WMA_BUILD_ID: 7ae2c3c8035ba8d6249900bb490668dc4bccb68150830cb1997919e215e17744
dockerInitId: /data/srv/wmagent/current/config/.dockerInit
/data/srv/wmagent/current/config/couchdb/.dockerInit
/data/srv/wmagent/current/config/mysql/.dockerInit
/data/srv/wmagent/current/config/rucio/.dockerInit
/data/srv/wmagent/current/config/wmagent/.dockerInit
/data/srv/wmagent/current/install/.dockerInit
7ae2c3c8035ba8d6249900bb490668dc4bccb68150830cb1997919e215e17744
ERROR
[3]
Start: Performing local Docker image initialisation steps
deploy_to_container: Try Copying the host WMAgent.secrets file into the container admin area
deploy_to_container: Done
deploy_to_container: Updating WMAgent.secrets file with the current host's details
deploy_to_container: Double checking the final WMAgent.secrets file
deploy_to_container: Checking Certificates and Proxy
sed: can't read /data/admin/wmagent/renew_proxy.sh: No such file or directory
chmod: cannot access '/data/admin/wmagent/renew_proxy.sh': No such file or directory
deploy_to_container: Checking Certificate lifetime:
deploy_to_container: Certifficate end date: Sep 7 12:04:12 2023 GMT
deploy_to_container: Checking myproxy lifetime:
deploy_to_container: myproxy end date: May 17 13:03:40 2023 GMT
./run.sh: line 357: /data/admin/wmagent/renew_proxy.sh: No such file or directory
ERROR: Failed to renew expired myproxy
ERROR: deploy_to_container
docker/pypi/wmagent/install.sh
Outdated
|
||
# Fix for outdated yui library - A really bad workaround. We should get rid of it ASAP: | ||
wget -nv -P $WMA_DEPLOY_DIR wget http://cmsrep.cern.ch/cmssw/repos/comp/slc7_amd64_gcc630/0000000000000000000000000000000000000000000000000000000000000000/RPMS/cd/cda5f9ef4b33696e67c9e2f995dd5cb6/external+yui+2.9.0-1-1.slc7_amd64_gcc630.rpm | ||
mkdir $WMA_DEPLOY_DIR/yui && cat $WMA_DEPLOY_DIR/external+yui+2.9.0-1-1.slc7_amd64_gcc630.rpm|rpm2archive - |tar --strip-components=13 -xzv --directory $WMA_DEPLOY_DIR/yui |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though the RPM contains all the files that we need for this yui library, I think we should adopt what is used in CouchDB:
https://github.com/dmwm/deployment/blob/master/couchdb/manage#L384-L393
Otherwise, it defeats the purpose of migrating away of RPMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can adopt that method indeed. I'll fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still unresolved as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docker/pypi/wmagent/install.sh
Outdated
echo "-----------------------------------------------------------------------" | ||
echo "Start $stepMsg" | ||
set -x | ||
cat <<EOF >> /home/${WMA_USER}/.bashrc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be cleaner to provide a temp_bashrc
with the actual content, then cat + append it to .bashrc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is that different from what I am doing here? Except that we should juggle with one more temporary file. In the current implementation you do see the content on the screen, right here the next few lines. And it is concatenated to the original/standard .bashrc
without overwriting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outcome is the same. I am just suggesting a different approach on how the .bashrc could get populated. Given that it's not a handful of lines, in my opinion it would be cleaner to have this content in its own file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok then, in order to minimize the number of files/places where we deal with the environment, I'd like to reuse the already existing env.sh
file and just add to it. I'll also apply some basic tweaks to relate all the commands executed to the WMA_ROOT_DIR
. That way, on a later stage, when we completely move away from the old deployment model and we no longer care about backwards compatibility, it will be good to have all the env tweaks in a single place which would help us in the rewriting the deployment scripts at https://github.com/dmwm/WMCore/tree/master/deploy.
--mount type=bind,source=/etc/tnsnames.ora,target=/etc/tnsnames.ora,readonly \ | ||
--mount type=bind,source=/etc/condor,target=/etc/condor,readonly \ | ||
--mount type=bind,source=/tmp,target=/tmp \ | ||
--mount type=bind,source=$WMA_ROOT_DIR/certs,target=/data/certs \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid maintaining credentials in different places, maybe we should use the standard /data/certs as source as well.
The same comment applies to the admin
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that.... but we will loose the feature of a common root mount point for the whole container deployment, initialization and run, which is why I wanted to have the $WMA_ROOT_DIR
at the first place..... And actually reading the code now it should be called $WMA_ROOT_MOUNT
or $WMA_DOCKER_MOUNT
.
For the time being, I'd like to keep it separate, at least until we validate this. Upon that we may change it and reuse host cert and admin dirs.
|
||
dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG " | ||
|
||
docker build $dockerOpts -t wmagent:$WMA_TAG -t wmagent:latest . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't explore this very much, but I was failing to push the new image to CERN registry. I had to rename the image and only then I managed to.
Said that, you might have to update this docker build command to actually define a tag like registry.cern.ch/cmsweb/wmagent:$WMA_TAG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan, at this stage the whole procedure does not involve cern registry at all. We should get there in some other step. This PR concerns only local builds on the host where the container should be run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resolved now.
With my latest commit I introduced the option -p
to wmagent-docker-build.sh
, which triggers a proper tagging and uploading/pushing the so built image to registry.cern.ch
. If not used the build will be local only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be working fine, even though I mistyped my password and the upload step failed. Nonetheless, I do see all the tags in the machine where it was built:
amaltaro@cmsdocker01:/build/amaltaro/CMSKubernetes/docker/pypi/wmagent $ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.cern.ch/cmsweb/wmagent 2.2.2rc1 c129fdd21e3a 14 minutes ago 2.04GB
registry.cern.ch/cmsweb/wmagent latest c129fdd21e3a 14 minutes ago 2.04GB
wmagent 2.2.2rc1 c129fdd21e3a 14 minutes ago 2.04GB
wmagent latest c129fdd21e3a 14 minutes ago 2.04GB
I am unsure about the latest
tag though, given that this will be part of the WMCore CI/CD pipeline, the latest tag would be used for any tag that we cut in WMCore. Said that, I would suggest to simply remove the latest
tag from this setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the latest
tag for the local image is needed. It achieves two goals:
- simplifying docker commands for running and maintaining the agent to just
docker <command> wmagent
and avoids the complexity of pointing the full<repository/image:tag>
tuple every time - it guaranties we run one and only one agent instance/docker container at the host
We can skip tagging it as latest
during the upload step though. That way we will not affect the CI/CD pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
providing a second flag to the build wrapper script gives the expected behavior. It will upload the latest tag to registry.cern.ch only if requested by calling it with:
./wmagent-docker-build.sh -v 2.2.1 -p -l
The new option is described in the --help message as well. All coming with my next commit
passed to the current script are to be forwarded to the WMAgent container entrypoint 'run.sh' | ||
|
||
Usage: wmagent-docker-run.sh [-t <team_name>] [-n <agent_number>] [-c <central_services_url>] [-f <db_flavour>] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should support the WMAgent tag that we want to pull/run as well. Otherwise it will run whatever image version we have in the host (lacking it, I guess it would pull the latest tag).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said it in several places already, it is always a local build, so for the time being it will always be pointing to the latest
version. And it is actually intentional. As you can see we are explicitly building against two tags $WMA_TAG
and latest
. This is a crucial part of the protection for one not to be able to run two wamagent
containers on the same host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to the wmagent-docker-build.sh
a -p
option was added to wmagent-docerk-run.sh
as well, which would trigger an image pull from registry.cern.ch
. It is coupled with the -v <wma.tag>
version option, so one can run a specific WMAgent version from the registry. It defaults to latest
if skipped.
FYI: @amaltaro
Hi @amaltaro thanks for taking a look, but I am afraid you missed several things that I tried to clarify to the best I could, and then your effort from yesterday just went in vain, and you were observing old state/status of the final image, leading to many many errors and misbehavior you were facing that have been already resolved. And let me hurrin in my reply and state it clear here as well: bellow are the boundary condition for this PR to work:
So now to reply to your comment separately:
This was done yesterday even while we were talking in the chat... maybe you have not refreshed your page.... I do not know.... but al these thigs you request were in the PR's description since yesterday.
This will not work - minimal possible version is Let me repeat here the
Run command:
Running the agent
We do not need to build the image with
Well, what can I say .... I am the biggest supporter of the idea to stop running pip install with root and encapsulate everything in a common
This is intentional - until we stick to local builds, one latest build will always be tied as both
Well this is not exactly an error... maybe I should convert this to Warning. All it tells us, is that while starting this run of the docker image the One additional benefit from this approach ... the script triggers only the initialization steps for which it finds missing or mismatched The error you observe in [3], is due to the old version you were trying to run. I hope my clarifications do help a little... If you want we can meet in a quick chat session and run this together and quickly discuss the limitations of the current |
@todor-ivanov thank you for correcting me on the tag used. Somehow I thought that 2.2.0.7 would work, given that it came out after rc3. I will retry these steps again, but before, let me make further questions and comments based on your reply above. a) Is there any strong reason to coupling the docker build to a CERN agent? This will eventually be part of the CD pipeline, so we should try to adopt any node that has Docker Engine available to build such images. b) if you are constructing the Dockerfile, I think both build and image upload should be considered. Otherwise it is a limiting factor to require a local build whenever one wants to try/run the wmagent docker container. c) regarding the d) about the e) for the script that runs wmagent, if you agree that it makes sense to pull the image from registry instead of local, then we definitely need to have the ability to say which tag we want to run. This makes total sense to me btw. |
Thanks @amaltaro
No, those agents were just more accessible and easier to work with. Once we start shaping the image to be universally fit and also tested against all the FNAL agents, I believe we will have more work to do. Mostly because we now depend on the USER ID on the host and we try to match it inside the container... given that the User IDs between FNAL and CERN agents differ... that will be a problem to be resolved on a later stage.
That's fine as well.... This is just a new initial condition that I have not considered working on in this PR... I can add it no problem. This won't change the rest of the Docker file.
Well .. the best would be to make the service as "host account agnostic", but this would need some more changes. Possibly in the ways how we use the mount points from the host, how we run the two databases
In the very first implementation of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov I gave it another try and built 2.2.1rc5
, I have also uploaded and overwritten the previous 2.2.1rc5 wmagent image that was available in the CERN registry.
It looks like some of the points I raised in my previous review are yet to be considered. Once you go through those and this PR is ready for another review, please let us know.
I also left a couple of other comments along the code, I wanted to make the following remarks:
- in order to pull/run an image from registry instead of localhost, you need to change the last line in the *-run.sh script to something like:
docker run $dockerOpts registry.cern.ch/cmsweb/wmagent:$WMA_TAG $wmaOpts
- image is still fairly large, likely because of the MariaDB dependency. But I let a comment to reduce it when installing packages through apt-get
- I couldn't get that image to run, this is the terminal failure that I hit when running *-run.sh script [4]. Which might be because I built this image under my own CERN unix username on the node cmsdocker01
[4]
Start: Performing local Docker image initialisation steps
deploy_to_container: Try Copying the host WMAgent.secrets file into the container admin area
deploy_to_container: Done
deploy_to_container: Updating WMAgent.secrets file with the current host's details
deploy_to_container: Double checking the final WMAgent.secrets file
deploy_to_container: Checking Certificates and Proxy
sed: couldn't open temporary file /usr/local/deploy/sed2vz2dT: Permission denied
chmod: changing permissions of '/usr/local/deploy/renew_proxy.sh': Operation not permitted
deploy_to_container: Checking Certificate lifetime:
deploy_to_container: Certifficate end date: Sep 7 12:04:12 2023 GMT
deploy_to_container: Checking myproxy lifetime:
deploy_to_container: myproxy end date: May 17 13:03:40 2023 GMT
/usr/local/deploy/renew_proxy.sh: line 4: /data/admin/wmagent/env.sh: No such file or directory
MyProxy v6.2 Aug 2019 PAM SASL KRB5 LDAP VOMS OCSP
Attempting to connect to xxxx
Successfully connected to xxxx
using trusted certificates directory /etc/grid-security/certificates
Using Host cert file (/data/certs/servicecert.pem), key file (/data/certs/servicekey.pem)
server name: /DC=ch/DC=cern/OU=xxxx
checking that server name is acceptable...
server name matches "myproxy.cern.ch"
authenticated server name is acceptable
Failed to receive credentials.
ERROR from myproxy-server:
No credentials exist with username "amaltaro" and credential name "CREDNAME".
ERROR: Failed to renew expired myproxy
ERROR: deploy_to_container
[1]+ Exit 1 ./wmagent-docker-run.sh -t testbed-alan -n 2 -f mysql -c cmsweb-testbed.cern.ch
@todor-ivanov , before going into review of long set of changes I rather would like to request the breakdown of layers sizes in final image. You can obtain this info as following:
So, each step of docker build will be displayed along with its size. This way we'll have clear picture where to optimize the build and final image. |
@todor-ivanov this PR has been stale for a couple of weeks now. Can you please address the comments/questions/recommendations above - and along the code - such that we can progress and finish this development still this quarter? Note that a wmagent Dockerfile updated injected a conflict to this PR, sorry about that. |
Hi @vkuznet
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov I performed the image construction step and things are looking better now.
I did not test the docker image that you uploaded to the registry because I think that there are important items to be resolved first (e.g., moving away from RPMs in this PR).
I also left a few extra comments along the code for your consideration.
docker/pypi/wmagent/install.sh
Outdated
|
||
# Fix for outdated yui library - A really bad workaround. We should get rid of it ASAP: | ||
wget -nv -P $WMA_DEPLOY_DIR wget http://cmsrep.cern.ch/cmssw/repos/comp/slc7_amd64_gcc630/0000000000000000000000000000000000000000000000000000000000000000/RPMS/cd/cda5f9ef4b33696e67c9e2f995dd5cb6/external+yui+2.9.0-1-1.slc7_amd64_gcc630.rpm | ||
mkdir $WMA_DEPLOY_DIR/yui && cat $WMA_DEPLOY_DIR/external+yui+2.9.0-1-1.slc7_amd64_gcc630.rpm|rpm2archive - |tar --strip-components=13 -xzv --directory $WMA_DEPLOY_DIR/yui |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still unresolved as well.
docker/pypi/wmagent/install.sh
Outdated
echo "-----------------------------------------------------------------------" | ||
echo "Start $stepMsg" | ||
set -x | ||
cat <<EOF >> /home/${WMA_USER}/.bashrc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outcome is the same. I am just suggesting a different approach on how the .bashrc could get populated. Given that it's not a handful of lines, in my opinion it would be cleaner to have this content in its own file.
|
||
dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG " | ||
|
||
docker build $dockerOpts -t wmagent:$WMA_TAG -t wmagent:latest . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be working fine, even though I mistyped my password and the upload step failed. Nonetheless, I do see all the tags in the machine where it was built:
amaltaro@cmsdocker01:/build/amaltaro/CMSKubernetes/docker/pypi/wmagent $ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.cern.ch/cmsweb/wmagent 2.2.2rc1 c129fdd21e3a 14 minutes ago 2.04GB
registry.cern.ch/cmsweb/wmagent latest c129fdd21e3a 14 minutes ago 2.04GB
wmagent 2.2.2rc1 c129fdd21e3a 14 minutes ago 2.04GB
wmagent latest c129fdd21e3a 14 minutes ago 2.04GB
I am unsure about the latest
tag though, given that this will be part of the WMCore CI/CD pipeline, the latest tag would be used for any tag that we cut in WMCore. Said that, I would suggest to simply remove the latest
tag from this setup.
echo "Pulling Docker image: registry.cern.ch/cmsweb/wmagent:$WMA_TAG" | ||
docker login registry.cern.ch | ||
docker pull registry.cern.ch/cmsweb/wmagent:$WMA_TAG | ||
docker tag registry.cern.ch/cmsweb/wmagent:$WMA_TAG wmagent:$WMA_TAG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not need to tag any image at this point. IMO, we only need to run the docker pull command with the correct tag, that is all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I just confirmed it with this error:
cmst1@vocms0260:/data/srv $ Starting the wmagent:latest docker container with the following parameters: -t testbed-vocms0260 -n 1 -f mysql -c cmsweb-test9.cern.ch -p -v 2.2.2rc1
Unable to find image 'wmagent:latest' locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous comment about the tagging answers your remark here as well. I'll try to simulate the error though. But tagging the local image as latest is needed (the local one in this case is wmagent:latest
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amaltaro I could not simulate this error. I did even clean everything on my machine and then downloaded only one image from the registry... with no local tags... and it was again working perfectly well - just creating a latest
tag locally, without having the latest
tag uploaded in the registry... Maybe it was something when you have put a wrong password and your credentials have been mistakenly recorded in the docker config.... I cannot tell, but for me it works just fine.
hi @amaltaro |
@todor-ivanov please resolve the conflict and also provide an up-to-date output for the image size (from |
which conflict @amaltaro ? |
Update README
…n a single place && Add Push_latest flag to the build wrapper
abcc2ff
to
0293f28
Compare
Hi @amaltaro, Here [1] is the history of the image build. One may compare it with my previous printout to see where we have gained a little in terms of space. [1]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov I have made further comments along the code.
I mentioned this in the code, but I think it's worth it to point out that the GH action needs to be updated to deal with the extra --network=host
docker build option.
In addition, my understanding is that wmagent-docker-build.sh
will not be really needed, given that we build these images from the GH action workflow. But I think we can keep it around for a little longer until we converge with everything and start adopting docker for WMAgent in production.
RUN apt-get update | ||
RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils sudo | ||
ENV TAG=X.Y.Z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to keep this line otherwise this will break the WMCore CD pipeline, see:
https://github.com/dmwm/WMCore/blob/master/.github/workflows/docker_images_template.yaml#L29
Perhaps you can do something like:
ENV TAG=X.Y.Z
ENV WMA_TAG=${TAG}
and it should resolve the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has not been done. Unresolving it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow, I have lost that change in my previous commit. Sorry, my bad. Fixing it with my next one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code won't work! WMCore GH action workflow will update TAG but not WMA_TAG, which is the one actually used during the whole process.
|
||
# NOTE: NO WMA_TAG validation is done in the current script. It is implemented at the install.sh | ||
|
||
dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the image needs to be built with --network=host
option, then we need to adapt the GH action workflow:
https://github.com/dmwm/WMCore/blob/master/.github/workflows/docker_images_template.yaml#L26
Which we can either resolve in the same GH issue, or create a new GH issue, but this cannot be forgotten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we resolve the Mysql and CouchDB related issues [1], we will have to reevaluate if --network=host
is going to be even needed at the end. I suspect yes, but still to be checked. I'd suggest we create the relevant issue once we get to a final decision about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor, please create a new GH issue for this --network=host
option then.
Once we merge this PR, we will have all your changes in place, but images will be badly built without the host network option. Meaning, whenever someone decides to resume this activity, and standalone image will have to be rebuilt with the relevant docker option.
The issue will be hopefully good enough to keep track of this and avoid people wasting time testing things with the wrong image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here is the new issue:
dmwm/WMCore#11635
I tried to make it as descriptive as possible.
} | ||
|
||
check_docker_init() { | ||
# A function to check all previously populated */.dockerInit files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid I still do not have a thorough understanding of this docker build id vs docker init files.
Let me try with a real scenario: what would happen if we want to run a new wmagent container with one or two configuration lines that have been changed? Would the container start with the new configurations? Would it fail to start? Or what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general principal is:
- At build time - we create the unique build id and we record it in the image
- At run time - on every initialization step we search for the respective
.dockerInit
file in the host mount points and compare its contents against thebuildId
inside the image. If we do not find such file or the so recorded hash inside differs from thebuildId
of the current image, then we consider this agent has either never passed through the relevant initialization step or what we find is a leftover from a previous image. Then we trigger the initialization step and if successful at the end we record the new.dockerInit
file containing the currentbuildId
. This way we guarantee we never repeat an initialization step. - In addition, we use
WMAgent.secrets
file as a sole source for generating the actual agent config file. In order to check for any configuration (not initialization) change, during the agent initialization we create theWMAgent.secret
's file md5 sum. And on every container/agent restart we compare it. If a change has happened there, we delete only$WMA_CONFIG_DIR/.dockerInit
and$WMA_HOSTADMIN_DIR/.dockerInit
, which are the initialization steps that could be affected by a change in this file (one of them checks the validity of theWMAgent.secrets
file, the other regenerates the agent config). They will be the only initialization steps triggered again then.
Now to answer your question.
- If the configuration change has happened in
WMAgent.secrets
file, then the above mechanism for re-configuring the agent will be triggered automatically. And the configuration steps will be reinitialized. - If the change has happened in the agent config directly, we have no mechanism to reflect it in anyway. So if this is a configuration change that affects other parts of the configuration as well (like database reconfiguration etc., which I cannot imagine how could be the case, but anyway), in such cases we may want to repeat the whole configuration step from scratch, similar to the above mechanism. Then somebody needs to go there and manually delete
$WMA_CONFIG_DIR/.dockerInit
and restart the agent.
Honestly I do not think we may have a configuration change in the agent config file itself such that would affect other parts beyond the agent's executable itself. So I do not believe we will ever run into the later case. And as we have talked in the past, the current rule of thumb, should be - we always reconfigure the agent through WMAgent.secrets
file and we never touch the agent config directly. So the second scenario is completely hypothetical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now, I do confirm the above is the original idea.....but double checking the latest version of the script, I see I have overlooked one bit and also the agent_resource_control
and upload_agent_config
step are triggered. While, I can see a reason why the later may be needed, I am not sure about the agent_resource_control
though. We should discuss this. In the meantime I'll fix that with my next commit and will tie the resource control step to be dependent on a separate .dockerInit
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we a followup of our discussion on the topic, here is the issue created about refactoring this whole mechanism: dmwm/WMCore#11627
local stepMsg="Performing $FUNCNAME" | ||
echo "-------------------------------------------------------" | ||
echo "Start: $stepMsg" | ||
_init_valid $WMA_CONFIG_DIR || { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To re-confirm, these lines below only get executed if there is no .dockerInit
file (thus, first initialization of the agent), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or in case a change in WMAgent.secrets
file has been noticed and the whole agent configuration step has been restarted.
Hi @amaltaro About keeping |
Hi @amaltaro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov please find yet some extra comments for your consideration. Thanks
RUN apt-get update | ||
RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils sudo | ||
ENV TAG=X.Y.Z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has not been done. Unresolving it.
echo "Start: $stepMsg" | ||
echo "-------------------------------------------------------" | ||
echo "Start sleeping now ...zzz..." | ||
while true; do sleep 10; done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this infinity loop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because at this stage (while the work on the whole containerization is still ongoing) we'd better connect to the container and run the agent manually with
manage start-agent
It is to be removed once we are ready. I'll make a TODO comment, so we won't forget it.
|
||
# NOTE: NO WMA_TAG validation is done in the current script. It is implemented at the install.sh | ||
|
||
dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor, please create a new GH issue for this --network=host
option then.
Once we merge this PR, we will have all your changes in place, but images will be badly built without the host network option. Meaning, whenever someone decides to resume this activity, and standalone image will have to be rebuilt with the relevant docker option.
The issue will be hopefully good enough to keep track of this and avoid people wasting time testing things with the wrong image.
hi @amaltaro I have addressed your latest comments. Could you please take another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov there is still one line that will make things not to work. Comment was left along the code.
RUN apt-get update | ||
RUN apt-get install -y libmariadb-dev-compat libmariadb-dev apache2-utils sudo | ||
ENV TAG=X.Y.Z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code won't work! WMCore GH action workflow will update TAG but not WMA_TAG, which is the one actually used during the whole process.
This has been superseeded by #1393 (supposedly a cherry-pick of everything in here, with one extra fix). I guess it can now be closed out, but I leave it up to you to decide @todor-ivanov |
DO NOT MERGE YET
With the current PR we modify the Dockerfile such that:
pip install wmagent
--build-arg WMA_TAG=<WMA_TAG>
The following new scripts are provided as well:
We also add the basic
install.sh
script, which is configuring the whole deployment structure and the future places for the credentials mount points.We also add the
run.sh
script to configure and start the container a the hostWe also provide two wrapper scripts:
wmagent-docer-build.sh
andwmagent-docker-run.sh
, which are taking care of all docker configs and parameters for both build and runtime.NOTE:
The minimal WMAgent tag to be used for this PR should be
2.2.1rc3
NOTE:
The files:
etc/local.ini
etc/my.cnf
etc/rucio.conf
bin/manage
Are a one to one copy from the old deployment repository (master branch), except a minor change in the few environment variables read by the
bin/manage
script, which I will explicitly point here:This one fixes the following two issues at WMCore:
Fixes dmwm/WMCore#8797
Fixes dmwm/WMCore#11564