Add support for "known fails" for kola run #1165

Closed
jlebon opened this issue Oct 18, 2019 · 7 comments

@jlebon
Member

jlebon commented Oct 18, 2019

From the CI for coreos/fedora-coreos-config#200:

--- FAIL: fcos.python (29.30s)
        python.go:43: python3-3.7.4-5.fc31.x86_64 should not be installed
...
--- FAIL: podman.base (33.97s)
    --- PASS: podman.base/info (0.79s)
    --- FAIL: podman.base/resources (4.58s)
            cluster.go:122: STEP 1: FROM scratch
            cluster.go:122: STEP 2: COPY . /
            cluster.go:122: STEP 3: COMMIT localhost/echo
            cluster.go:122: Getting image source signatures
            cluster.go:122: Copying blob sha256:b1f6a31602870db4d7c75f59c9e788883c0c2f14658c7f8d748f5c41d4f8dd41
            cluster.go:122: Copying config sha256:83e186be137d822f6eacb7e7547ce25abcdaac6e959e7849957edcd4d216645f
            cluster.go:122: Writing manifest to image destination
            cluster.go:122: Storing signatures
            cluster.go:122: Error: cannot set kernel memory with cgroupv2: File exists: OCI runtime error
            podman.go:279: Failed to run "sudo podman run --net=none --rm --kernel-memory=10m echo echo 1": output: "" status: "Process exited with status 127"
...
--- FAIL: coreos.ignition.resource.remote (2548.52s)
        harness.go:486: Cluster failed starting machines: machine "45f32065-a58b-404f-909f-865582167026" failed to start: ssh journalctl failed: ssh: handshake failed: read tcp 127.0.0.1:47564->127.0.0.1:42889: read: connection reset by peer

So e.g. the python3 one is a known failure (coreos/fedora-coreos-tracker#280). Haven't looked at the podman one yet.

I'm inclined to not block coreos/fedora-coreos-config#200 due to these failures. It just makes it harder for developers to debug them in the first place. But right now, both the CI there and the pipeline will block on failed kola tests. More generally, we need to be able to ship software even if there's a known test failure.

One suggestion I have is that we maintain a "known fails" JSON list in e.g. https://github.com/coreos/fedora-coreos-config, which we can feed to kola run --kfails list.json. The semantics are as follows:

  • each member of the list follows this schema:
{
  "pattern": "fcos.python",
  "tracker": "https://github.com/coreos/fedora-coreos-tracker/issues/280"
}
  • kola run verifies the list matches this schema on startup. This ensures all kfails have an associated tracker bug
  • when kola run runs a test not in the kfail list, act as usual: exit rc 1 if the test failed, rc 0 otherwise
  • when kola run runs a test in the kfail list, reverse behaviour: exit rc 1 if the test passes, rc 0 otherwise. This will allow us to make sure the list stays up to date and we drop things as bugs are fixed. Also print out the link to the tracker bug.
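To make the proposed semantics concrete, here is a minimal Go sketch. It is not kola code: the type and function names (`KnownFail`, `parseKnownFails`, `exitCode`) are made up for illustration, and treating `pattern` as a glob matched with `path.Match` is an assumption, since the proposal does not say how patterns are matched.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// KnownFail is one entry of the proposed list (hypothetical type name).
type KnownFail struct {
	Pattern string `json:"pattern"`
	Tracker string `json:"tracker"`
}

// parseKnownFails decodes the JSON list and rejects any entry that lacks
// a pattern or a tracker, so every kfail has an associated tracker bug.
func parseKnownFails(data []byte) ([]KnownFail, error) {
	var kfails []KnownFail
	if err := json.Unmarshal(data, &kfails); err != nil {
		return nil, err
	}
	for i, kf := range kfails {
		if kf.Pattern == "" || kf.Tracker == "" {
			return nil, fmt.Errorf("entry %d: both pattern and tracker are required", i)
		}
	}
	return kfails, nil
}

// lookup returns the first entry whose pattern matches the test name.
// Glob matching is an assumption; a malformed pattern simply never matches here.
func lookup(kfails []KnownFail, name string) (KnownFail, bool) {
	for _, kf := range kfails {
		if ok, _ := path.Match(kf.Pattern, name); ok {
			return kf, true
		}
	}
	return KnownFail{}, false
}

// exitCode applies the proposed rules to a map of test name -> passed.
func exitCode(kfails []KnownFail, results map[string]bool) int {
	rc := 0
	for name, passed := range results {
		kf, known := lookup(kfails, name)
		switch {
		case !known && !passed:
			rc = 1 // ordinary failure
		case known && passed:
			fmt.Printf("%s passed but is listed as a known fail: %s\n", name, kf.Tracker)
			rc = 1 // the list is stale and the entry should be dropped
		case known && !passed:
			fmt.Printf("%s failed as expected: %s\n", name, kf.Tracker)
		}
	}
	return rc
}

func main() {
	list := []byte(`[{"pattern": "fcos.python", "tracker": "https://github.com/coreos/fedora-coreos-tracker/issues/280"}]`)
	kfails, err := parseKnownFails(list)
	if err != nil {
		panic(err)
	}
	results := map[string]bool{"fcos.python": false, "podman.base": true}
	fmt.Println("rc:", exitCode(kfails, results))
}
```

Note that with `path.Match` a `*` does not cross `/`, so subtests such as `podman.base/resources` would need their own entry or a different matching rule.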

This would allow us to maintain the list of known failures more easily, without having to do the mantle -> cosa -> pipeline dance. (Plus, I think putting the list somewhere more prominent and having kola yell at us about it on each run will be much harder to ignore than commenting out tests!)

@cgwalters
Member

See also https://url.corp.redhat.com/c062a98

@cgwalters
Member

For the file I'd vote YAML so we can use comments

@jlebon
Member Author

jlebon commented Oct 18, 2019

Ahh nice, didn't realize RHCOS already did this.

Right, so to clarify, the difference here from --blacklist-test is:

  • we still run the test
  • we get our input from a file instead of having to repeat arguments.

> For the file I'd vote YAML so we can use comments

Sure. I initially wrote up JSON because AFAICT nothing in mantle currently consumes YAML, though github.com/ajeddeloh/yaml is already vendored at least.

@jlebon changed the title from "FCOS 31 test failures and kfail handling" to 'Add support for "known fails" for kola run' on Oct 18, 2019
@jlebon
Member Author

jlebon commented Oct 18, 2019

(All the failures in coreos/fedora-coreos-config#200 are accounted for, so I'm dropping the bug label and making this strictly about the RFE.)

@ajeddeloh
Contributor

Do you think kola should know about this or that something should interpret the kola results and know about this?

jlebon referenced this issue in jlebon/coreos-assembler Oct 22, 2019
Rather than encoding the blacklist in three different places (the
fedora-coreos-config CI, the coreos-assembler CI, and the pipeline),
let's just teach `cosa kola` to auto-detect a `kola-blacklist.yaml` from
the src config and automatically blacklist the listed tests when executing kola.

I proposed making this baked in kola itself with slightly different
semantics in: https://github.com/coreos/mantle/issues/1103

Though teaching this to cosa should still make things easier to maintain
for now at least.
jlebon referenced this issue in jlebon/fedora-coreos-config Oct 22, 2019
This will contain a documented list of tests known to fail right now.
For details, see the corresponding cosa PR which learns how to read
this, and the mantle RFE:

coreos/coreos-assembler#866
https://github.com/coreos/mantle/issues/1103
@jlebon
Member Author

jlebon commented Oct 22, 2019

> Do you think kola should know about this or that something should interpret the kola results and know about this?

Yeah, we could definitely have it working completely externally to kola (that's a bit what #866 accomplishes, though not with the same semantics). Though I think the advantage of having it in kola is that it's more foolproof and just easier to work with both for humans and automation.

E.g. we'd have to implement this "interpreter" knowledge a bit everywhere (right now, in at least three places, as mentioned in #866). And while cosa kola could do this, we don't use that wrapper everywhere since it's a bit more opinionated. (For example, the AWS test pipeline uses kola directly right now: https://github.com/coreos/fedora-coreos-pipeline/blob/f2e6f229e9df6d21d1c40031fd91c23194d117b2/Jenkinsfile.kola.aws#L67).
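For comparison, the "external interpreter" alternative might look roughly like the Go sketch below. It assumes the results are scraped from the `--- PASS:` / `--- FAIL:` lines shown in the log excerpt at the top of this issue; the function names and the plain set of known-fail names are hypothetical, and nothing here is an existing kola or cosa API.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// resultLine matches lines such as "--- FAIL: fcos.python (29.30s)",
// including indented subtest lines.
var resultLine = regexp.MustCompile(`^\s*--- (PASS|FAIL): (\S+) \(`)

// parseResults maps each test name found in the output to whether it passed.
func parseResults(output string) map[string]bool {
	results := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		if m := resultLine.FindStringSubmatch(sc.Text()); m != nil {
			results[m[2]] = m[1] == "PASS"
		}
	}
	return results
}

// unexpectedFailures returns the sorted names of failed tests that are
// not in the known-fail set.
func unexpectedFailures(results, known map[string]bool) []string {
	var names []string
	for name, passed := range results {
		if !passed && !known[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	output := `--- FAIL: fcos.python (29.30s)
--- FAIL: podman.base (33.97s)
    --- PASS: podman.base/info (0.79s)
    --- FAIL: podman.base/resources (4.58s)`
	known := map[string]bool{"fcos.python": true}
	for _, name := range unexpectedFailures(parseResults(output), known) {
		fmt.Println("unexpected failure:", name)
	}
}
```

Every consumer that calls kola directly would need to carry something like this, which is the duplication described above.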

jlebon referenced this issue in coreos/fedora-coreos-config Oct 22, 2019
@cgwalters transferred this issue from coreos/mantle on Feb 27, 2020
jcajka pushed a commit to jcajka/coreos-assembler that referenced this issue Mar 24, 2020
@nikita-dubrovskii
Contributor

We have a similar feature now (#3539), so closing this issue.
