Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: setup AWS kola job & move aws replication to release job #128

Merged
merged 1 commit into from
Sep 12, 2019

Conversation

arithx
Copy link
Contributor

@arithx arithx commented Aug 26, 2019

Creates a new job specifically for running the kola tests on AWS. As a
result moves the AWS replication out of the main Jenkinsfile into the
Jenkinsfile.release.

We'll likely have to move the AWS replication to it's own job down the
line if we want to start signing the meta.json / release.json files.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, though I think templating is going to be very important for making it easy to hack on the pipeline, as well as for supporting branches.

Jenkinsfile Outdated
stage('Kola Runs') {
container('jnlp') {
utils.shwrap("""
oc start-build --wait fedora-coreos-pipeline-kola-aws \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's template the name of the buildconfig to aid forking and branching?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The job itself (similar to the release job) is only meant to have one of it existing and developer pipelines can just re-use it. Did you have something else in mind?


stage('AWS Kola Run') {
utils.shwrap("""
kola run -p aws --aws-ami ${ami} --aws-region ${ami_region} -b fcos || :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also an interesting topic here - should we e.g. store the kola results in the builddir too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be something that could become useful in the future but for now I don't think it hurts to just store them in jenkins.

s3_stream_dir = "${s3_bucket}/prod/streams/${params.STREAM}"
utils.shwrap("""
coreos-assembler buildprep s3://${s3_stream_dir}/builds
coreos-assembler aws-replicate --build=${params.VERSION}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK, though it's making the assumption that the VERSION is the latest one. Which to be fair is likely almost always the case, though it'd be nice to have something a bit stronger here. Perhaps something like buildprep --build=${params.VERSION} and then buildupload --build=${params.VERSION}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlebon: buildprep doesn't seem to accept a specific build as a parameter just the URL; does upstream work there need to happen first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, yeah. Was just spitballing a new UX in those cosa commands. Definitely doesn't need to block this.

coreos-assembler buildprep s3://${s3_stream_dir}/builds
coreos-assembler aws-replicate --build=${params.VERSION}
git clone https://github.com/coreos/fedora-coreos-releng-automation /var/tmp/fcos-releng
/var/tmp/fcos-releng/coreos-meta-translator/trans.py --workdir .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yeah, hmm we should probably just bake that into the cosa container, or have a sidecar container in the pod with utilities we need?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it doesn't fit in cosa directly so I didn't have a good idea for where to fit it. Maybe just abstracting it into a generic function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can punt on this for now. Once scripts from that repo are used some more, we can think about how to avoid git cloneing all over the place.

properties([
pipelineTriggers([]),
parameters([
string(name: 'BUILDID',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this VERSION for consistency? (Even though they're not exactly the same thing, but for our use case they are).

description: 'Fedora CoreOS Build ID to test',
defaultValue: '',
trim: true),
string(name: 'S3_STREAM_DIR',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not choice(STREAM) to be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a generic job it doesn't have a developer split so it cannot construct the S3_STREAM_DIR, as such my intention was to be able to still support running AWS tests in developer pipelines.

podTemplate(cloud: 'openshift', label: 'coreos-assembler', yaml: pod, defaultContainer: 'jnlp') {
node('coreos-assembler') { container('coreos-assembler') {
def ami, ami_region
stage('Fetching Build Details') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just "Fetch Metadata" ?

if (meta.amis.size() > 0) {
ami = meta['amis'][0]['hvm']
ami_region = meta['amis'][0]['name']
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an early exit or maybe better error out if there are no AMIs in the metadata?


stage('AWS Kola Run') {
utils.shwrap("""
kola run -p aws --aws-ami ${ami} --aws-region ${ami_region} -b fcos || :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want some --parallel value here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably but the value itself highly depends on the account it's being ran in. I don't remember the exact numbers on the default resource limits so we probably want to keep it low, maybe 5 tests at a time which shouldn't ever hit more than 15 concurrent instances?

Copy link
Member

@jlebon jlebon Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh gotcha. Hmm, I guess this gets us back to https://github.com/coreos/mantle/issues/1037#issuecomment-517299593.

Yeah, let's go with 10 for now and go from there? The Fedora testing account at least doesn't barf on that, and e.g. my personal account doesn't either. :)

Jenkinsfile Outdated
stage('Kola Runs') {
container('jnlp') {
utils.shwrap("""
oc start-build --wait fedora-coreos-pipeline-kola-aws \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: let's not --wait for this? Essentially, consider kola AWS testing as a follow-up job, not something that's necessarily part of the build process itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm interesting - i would have figured we'd wanted it that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons why I suggested this is that kola AWS tests can take a while, and there's no reason to delay building other things while that's ongoing (though related: #68). I also think errorring the build job if the AWS test fails isn't practical.

@@ -30,6 +30,23 @@ pod = pod.replace("COREOS_ASSEMBLER_IMAGE", "coreos-assembler:master")

podTemplate(cloud: 'openshift', label: 'coreos-assembler', yaml: pod, defaultContainer: 'jnlp') {
node('coreos-assembler') { container('coreos-assembler') {
// Replicate the newly uploaded AMI to other regions. Intentionally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should then remove the aws-replicate bit from the Jenkinsfile, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, I haven't refreshed this PR in a while so I think that commit got lost.

@arithx arithx force-pushed the replicate_and_test branch from 26ad4a3 to bfb040f Compare September 9, 2019 18:27
@arithx
Copy link
Contributor Author

arithx commented Sep 9, 2019

Rebased & updated to support comments. Note that this shouldn't be merged until the upstream account has the credentials needed for replication added.

@dustymabe
Copy link
Member

Rebased & updated to support comments. Note that this shouldn't be merged until the upstream account has the credentials needed for replication added.

👍 - i'm looking at permissions now for the aws replication issue, so can consider this too.

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dustymabe
Copy link
Member

I tested out #130 extensively today, which builds on top of this PR. We should be able to merge them both around the same time and update creds in prod to finally have aws replication and aws kola tests enabled.

Jenkinsfile Outdated
@@ -298,6 +286,19 @@ podTemplate(cloud: 'openshift', label: 'coreos-assembler', yaml: pod, defaultCon
}
}


if (!params.MINIMAL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also be conditional on s3_stream_dir

ami_region = meta['amis'][0]['name']
} else {
// no AMI found
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC return inside a stage doesn't actually do anything. Also, thinking more on this now, I think we should just error out? It'd be pretty easy to mistake a job ending successfully because of tests passing vs actually a no-op.

Creates a new job specifically for running the kola tests on AWS. As a
result moves the AWS replication out of the main `Jenkinsfile` into the
`Jenkinsfile.release`.

We'll likely have to move the AWS replication to it's own job down the
line if we want to start signing the `meta.json` / `release.json` files.
@arithx
Copy link
Contributor Author

arithx commented Sep 11, 2019

Update pushed addressing comments ⬆️

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Will let you or @dustymabe time the final merge.

@dustymabe dustymabe merged commit d59f42d into coreos:master Sep 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants