Skip to content
This repository has been archived by the owner on Oct 23, 2019. It is now read-only.

Migrate docker builds to Docker Hub #86

Closed
abesto opened this issue Jan 10, 2016 · 49 comments
Closed

Migrate docker builds to Docker Hub #86

abesto opened this issue Jan 10, 2016 · 49 comments
Assignees

Comments

@abesto
Copy link
Contributor

abesto commented Jan 10, 2016

#82 has a pretty big scope. One way to cut down on it is to remove complexity from the Docker build pipeline by leaving Quay.io.

For context: we're originally using Quay.io because it could start builds based on tags matching a regex being pushed to GitHub, back when Docker Hub could not. Today Docker Hub has the same functionality.

The intuitive thing to do is to start using automated builds on Docker Hub. Unfortunately there's no way to turn a repository into an automated build (which makes sense, it'd violate the expectation of images in the automated build "repository" being reproducible based only on Dockerfiles). There also doesn't seem to be a way to rename repositories. This leaves us with the following options that I can see:

  1. Create new automated builds with the same names as the repositories today, drop all the images we have so far. Highly undesirable, as this would break everything for everyone currently using Zipkin on Docker.
  2. Create new automated builds with slightly worse names (zipkin-collector-2 or zipkin-collector-auto or ...), mark the current repositories as deprecated in documentation, communicate very loudly. However careful we are, I think some confusion is unavoidable.
  3. Keep using the current repositories, changing only the build script to use Docker Hub instead of Quay.io. The limitation here is that Docker Hub doesn't have an API to query build status. We can work around that with using the Docker API ("is the build successful?" becomes "does this image already exist?" + a timeout), but it's clunky. (REST API docker/hub-feedback#381)
    • Slight con: automated builds are generally preferable over manually pushed repositories. They keep both the build process and the process of understanding an image itself simple.

Given all these, today I have a slight preference to stay with Quay.io. Docker Hub is bound to get significant improvements, so with time I expect we can still migrate to it. A word on why that's a good idea: being the standard repository, we must be on Docker Hub; which means if we're anywhere else too, that's added complexity.

@codefromthecrypt
Copy link

I think deferring a move to DockerHub until the pleasure/pain ratio balances is a sound idea.

@abesto
Copy link
Contributor Author

abesto commented Feb 3, 2016

Closing, there's either agreement or lack of interest, both lead to no action for now.

@anuraaga
Copy link
Contributor

I believe the ability to add builds to existing repositories is here now! I'm happy to do the migration if no one remembers any other outstanding issues.

@anuraaga anuraaga reopened this Aug 28, 2019
@codefromthecrypt
Copy link

@openzipkin/core frankly I have no stake specifically in quay. it is harder to setup and slightly more work to troubleshoot. It does "work" almost always, but that's not actually a reason :P

@anuraaga
Copy link
Contributor

Thinking for this, can start by adding a master tag to the docker-zipkin images that auto builds on master push. It will fix #170, takes no time to set up, and gives a chance to see the build work before migrating release processes or affecting any existing tags. How does that sound?

@codefromthecrypt
Copy link

codefromthecrypt commented Aug 29, 2019 via email

@anuraaga
Copy link
Contributor

Thanks - yep definitely intend to leave the latest tag as is, which isn't a problem with the docker hub UI.

@codefromthecrypt
Copy link

I haven't seen anyone say don't stop.. and usually I try to collect reasons on behalf of others who might be afk. As far as I know, you should have clear runway. thanks for the offer to help!

@anuraaga
Copy link
Contributor

Cool - migrated one repo and looking as expected, will do the rest too

image

@anuraaga
Copy link
Contributor

Migrated all repos, all have a master tag now. Builds seem to succeed fine. This shows we can use docker hub to push tags without losing our repos and it's fairly easy to set up. Will think through how we might migrate the release process.

@abesto
Copy link
Contributor Author

abesto commented Aug 29, 2019

FTR, I'm all for dropping Quay if all our requirements can now be satisfied using just Docker Hub. @anuraaga thank you for taking on this!

@anuraaga
Copy link
Contributor

anuraaga commented Sep 1, 2019

So now all the repos have a Dockerfile that builds from the latest source. Having docker hub build a release image is also as simple as configuring the same build, with the regex on release-(\d+).(\d+).(\d+).

Looking at #82, my understanding is the goal is to get rid of release.sh completely, which I agree with. This means steps for migrating the release to docker hub are

  1. Configure docker hub to build on the release- regex. We can start with a simpler repository like zipkin-dependencies, and then go to zipkin, zipkin-aws, zipkin-gcp. For safety, we can configure the pushed tag to be test-\1.\2.\3 instead of the normal \1.\2.\3 and not update latest.

  2. Check the built images to make sure they contain the correct binaries / versions.

  3. Update docker hub build configs to push to the correct tag, not test-. Delete the test- tags manually

  4. Update RELEASE.md to indicate that docker images are automatically built by docker hub when creating a release tag, and provide links to check on executing builds.

  5. Remove release.sh / associated travis config.

Then the final step becomes deciding what to do with the remaining folders in docker-zipkin for the storage backends. As these are images just for testing / playing around with, I suggest we don't associate release tags for these images anymore, just use latest and have them rebuilt on master push and that's it. In this case, we probably want to separate the images into separate repositories so the master push of one image does not needlessly update other images.

Let me know if you have any thoughts.

@codefromthecrypt
Copy link

only other thought comes to mind is double-checking the size/layer count of the images in case an accident happened along the way.

For good measure, having someone besides you triggering a re-publish would also be good for the team. Ex sometimes I have to re-publish by deleting the tag and re-pushing it.

for extra credit, see if @jeqo wants to try this out on his zipkin-kafka-storage repo, which we're discussing if is ready for going to oz-contrib

@anuraaga
Copy link
Contributor

anuraaga commented Sep 2, 2019

Realized one thing missing from the plan was the fact that current release tags do not have Dockerfiles. So testing out the new release builds requires creating tags. For testing the build, I'll create release-0.0.0 tags and delete them once the build finishes and is verified.

@anuraaga
Copy link
Contributor

anuraaga commented Sep 4, 2019

So zipkin-dependencies has the expected tags, and I'd say the release procedure in general checks out

https://cloud.docker.com/u/openzipkin/repository/docker/openzipkin/zipkin-dependencies

The image is a few megs smaller. I'm trying to find the difference, though it's not so obvious. I noticed though that when building the repo, mvn clean -> mvn package results in the 90MB jar that's like what's in the 0.0.0 tag, while then doing mvn package again without clean results in the 100MB jar that's like what's in the latest tag. Incidentally, both of these jars package themselves into the jar, so I guess they're double the size they need to be anyways. So something weird with the shade settings I think.

I also realized zipkin-dependencies might not have been the right image to start with since I have no idea how to test it :) Any tips to confirm the 0.0.0 tags check out?

@anuraaga
Copy link
Contributor

anuraaga commented Sep 4, 2019

Also just for the record, looking at the zipkin-dependencies build, I can see why it would be publishing a non-clean build. Travis first runs mvn install, doing a build, then publish.sh runs maven release plugin, which will build again without cleaning. I'm guessing shade plugin when run on the non-clean state detects new transitive dependencies as usage of classes so the second run ends up bigger. I'm guessing the 90MB jar should be fine, but there's also a chance that something coincidentally only works because of this double-build.

@saturnism
Copy link

@anuraaga is there a way to automate a smoke test / integration test of the docker image after build?

@saturnism
Copy link

or, we may still need to use travis to run a script to build the container and run the tests.

@anuraaga
Copy link
Contributor

anuraaga commented Sep 5, 2019

@saturnism To smoke test the docker image itself, I think docker hub can run a test defined as docker-compose.test.yml. I can look into that more, not so familiar with it.

But want to confirm, is the goal to check the docker image, or the goal to check a running server, with docker as one option for building a server? If the later, I think a better option might be to write normal integration tests that depend on the zipkin-server snapshot build. Then the server could be started by junit relatively easily and it'd work for PR builds without extra effort.

@jcchavezs
Copy link

jcchavezs commented Sep 5, 2019 via email

@anuraaga
Copy link
Contributor

anuraaga commented Sep 5, 2019

Thanks - yeah I think that those test a client of zipkin api, so I think we can start any server container. Here we want to check a server plugin (stackdriver storage) so we need to package the code under test in with the actual zipkin server which is not in the repo. It's possible to do this with docker too, but I'm suspecting it's simpler to just leave it to maven.

@saturnism
Copy link

saturnism commented Sep 5, 2019

For zipkin-gcp, i'm looking for at least the same test described by @jcchavezs.
I've added IT test against real endpoints for sender and storage autoconfiguration. However, there is still a chance where a base image change (similar to the one switching from libc to alpine) may break the actual server.

@anuraaga
Copy link
Contributor

anuraaga commented Sep 5, 2019

I want to make sure everyone is on the same page.

Currently we have tests on zipkin-gcp, committed or in PRs, which check directly

  • sender
  • storage

Neither of these use zipkin-server, they run business logic in the zipkin-gcp repo directly.

Now there is an issue of when actually running zipkin-server with stackdriver - zipkin-server uses spring boot magic to load the business logic from zipkin-gcp. It's quite possible that logic that works in storage integration tests does not actually work when loaded into zipkin server.

Testing this means having tests that combine the code of zipkin-server and zipkin-gcp. Maven can do this, as can docker. These involve building, they don't work with just running a docker image of the server that already exists since the code under test is the server itself.

There is also the possibility that when someone runs the built docker image, the server doesn't work. One reason for this could be because the jar was built for Java 12 but the base image is Java 11. To me, this sort of issue seems far more unlikely than an issue stemming from Java / spring-boot combining the modules.

So I'd like to ask again, is the goal to check the docker image, or the goal to check a running server? The latter has more options that give more flexibility (I personally don't think we need to go so far as to build a docker image on every PR build). The former needs to build a docker image and running tests - doing it on master commits is probably ok but also seems relatively low value vs cost to me.

@saturnism
Copy link

saturnism commented Sep 6, 2019

We definitely should have a server test in zipkin-gcp. The need to test against the docker image is different:

there is still a chance where a base image change (similar to the one switching from libc to alpine) may break the actual server.

This has happened before, and this wouldn't have been caught from zipkin-gcp tests, since it isn't running in the same environment as the Docker image.

I'm also not suggesting running a full docker image test on every PR. The question was:

smoke test / integration test of the docker image after build

We only need to run this before or after image is built, so that we can validate this automatically.

The goal is to check the docker image 😄

@saturnism
Copy link

E.g., if we use existing repository https://github.com/openzipkin/docker-zipkin-gcp/

Thinking out loud:

  • I think we can do this on PR or push, or on tag creation, where image is built, ran, and tested as integration test.
  • Or, if we wait for Docker Hub to do the build, we may be able to use a webhook to run the tests

@anuraaga
Copy link
Contributor

openzipkin/zipkin#2808 adds a docker test for zipkin-server. @saturnism I think a similar pattern will be pretty simple for zipkin-gcp and solve the issues we're seeing testing it. The caveat I've found is Docker Hub only supports older-style secrets, meaning just environment variables set in the web UI (similar to CircleCI). I guess we'd add a base64-encoded service account private key in the web UI, which would be viewable to all the owners of the openzipkin docker organization, as opposed to a Travis secret which is only decryptable by Travis itself. Would that be ok?

@saturnism
Copy link

nice! i think the only thing i discovered was that if healthcheck was not UP, the HTTP status code was still 200. Thus I had to check specifically for the parsed health response.

For Stackdriver secret, @elefeint: should we create a separate service account for dockerhub w/ just the append permission.

@codefromthecrypt
Copy link

codefromthecrypt commented Sep 24, 2019 via email

@saturnism
Copy link

saturnism commented Oct 2, 2019

@anuraaga what's the best way for me to get the service account information configured in the target environment? we can create a dedicated service account just for this.

@anuraaga
Copy link
Contributor

anuraaga commented Oct 3, 2019

@saturnism Thanks a lot for all the help on maintaining zipkin-gcp :) I think it's appropriate to give you access to the openzipkin/zipkin-gcp repository and have set up a team for that - then you can edit the environment variables yourself. Can you let me know your docker hub ID? If you'd prefer, feel free to send to the email address in my profile.

In the meantime, I'd recommend going ahead and setting up a personal docker repo with your fork of the zipkin-gcp repo to get started. I do that to test out hooks, etc before sending a zipkin PR like here https://hub.docker.com/u/anuraaga. I think you can verify your service account + docker-compose.test.yml works in a fork before setting up the real repo.

@saturnism
Copy link

@anuraaga it's also saturnism :) will do the tests separately. cheers,

@anuraaga
Copy link
Contributor

anuraaga commented Oct 7, 2019

I have gone ahead and configured a test release tag build in openzipkin/zipkin. This matches the release tag but will only push tags prefixed with test-

test-{\1}.{\2}.{\3},test-{\1}.{\2},test-{\1},test-latest /^release-(\d+).(\d+).(\d+)$/

Instead of pushing a dummy tag to try it out, which sends notifications and such, I'm thinking it's simpler to just let this run during the next release since it only points to test- tags. It will also give images at the same commit for the old and new release process for easier comparison. Hope that sounds ok.

@codefromthecrypt
Copy link

codefromthecrypt commented Oct 7, 2019 via email

@anuraaga
Copy link
Contributor

As discussed in https://gitter.im/openzipkin/zipkin-private?at=5da6c32439e2ef28adef1e35

We are planning on atticing the docker-zipkin repository after migrating remaining storage image builds to Docker Hub. We have always kept the storage images in sync with the zipkin server version, for example to keep schema versions in sync, so building them together with zipkin-server reduces the number of steps to do a release with hopefully not much downside.

The first migration of a storage build is openzipkin/zipkin#2849

For non-storage builds, release Dockerfiles have also been added to the code repos, e.g. openzipkin/zipkin#2846

They are wired up to push tags such as test-2.19.0. After checking a build and seeing these tags are ok, we can finally stop syncing release images from quay and have everything in Docker hub.

@codefromthecrypt codefromthecrypt pinned this issue Oct 17, 2019
codefromthecrypt pushed a commit to openzipkin/zipkin that referenced this issue Oct 17, 2019
codefromthecrypt pushed a commit to openzipkin/zipkin that referenced this issue Oct 17, 2019
These are used in our integration tests for zipkin-storage/elasticsearch

See openzipkin-attic/docker-zipkin#86
@codefromthecrypt
Copy link

last step (I think) is to make sure that on the release-N.N.N tag, we update the ENV ZIPKIN_VERSION 2.18.0 accordingly for every Dockerfile in https://github.com/openzipkin/zipkin/tree/master/docker

@codefromthecrypt
Copy link

nevermind. we don't use ZIPKIN_VERSION anymore

@codefromthecrypt
Copy link

codefromthecrypt commented Oct 17, 2019

the last important step is to integrate such that on a release tag (N.N.N) we push all tags to dockerhub, not just the N.N.N. Now, for 2.18.0, we now pus all of 2.18.0 2.18 2 and latest

@codefromthecrypt
Copy link

@anuraaga I noticed here..

# Download jars using Maven. It will try to resolve the artifact from Maven Central, which might
# not work right away if the sync is taking time, followed by bintray, which should always work
# since it's where we publish to.

I think this won't be completely correct, right? the docker hook will be run in parallel with travis (which publishes). Only by accident would bintray complete first.. or am I missing some step here

https://cloud.docker.com/u/openzipkin/repository/docker/openzipkin/zipkin/builds

@codefromthecrypt
Copy link

answered own question about the tags.. ordering question still outstanding

openzipkin/zipkin#2856

@anuraaga
Copy link
Contributor

Ordering should be fine - release-1.2.3 triggers Travis, not Docker. Maven release plugin builds and uploads artifacts and pushes 1.2.3. It's this tag that triggers docker build, which will find the pushed stuff.

@codefromthecrypt
Copy link

actually the "1.2.3" tag actually does the deploy. release-1.2.3 does prepare, not deploy

  ./mvnw --batch-mode -s ./.settings.xml -Prelease -nsu -DskipTests -Dlicense.skip=true deploy

  # If the deployment succeeded, sync it to Maven Central. Note: this needs to be done once per project, not module, hence -N
  if is_release_commit; then
    ./mvnw --batch-mode -s ./.settings.xml -nsu -N io.zipkin.centralsync-maven-plugin:centralsync-maven-plugin:sync
    javadoc_to_gh_pages
  fi

@anuraaga
Copy link
Contributor

Oh shoot didn't notice that - so far only tested with already released version. In that case going to need to think a bit. With the current scheme will need to go ahead and wire the docker hub webhook after deploy.

But I wonder why we deploy on the separate build - I think it means the same release jar is built twice, once for prepare and once for deploy. This seems wasteful of Travis resources.

@codefromthecrypt
Copy link

the release tag is an automation hook to avoid prepping manually. The release commit N.N.N is helpful as frequently deployment fails, and all we have to do is delete the dead bintray thing and click rebuild. Let's try to decouple the issues to avoid adding more complexity.

maybe the travis run can invoke the dockerhub hook similar to how it does maven central?

@anuraaga
Copy link
Contributor

Makes sense yeah will add the hook.

@codefromthecrypt
Copy link

codefromthecrypt commented Oct 18, 2019 via email

@codefromthecrypt
Copy link

I think this should close out the main repo and allow us to archive this one. We still have loose ends for the -dependencies -gcp and -aws images to finish up afterwards

openzipkin/zipkin#2875

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants