All classic jobs in jenkins turn into failed on datadog plugin v6.0.0 #393

Closed
bitle opened this issue Feb 3, 2024 · 8 comments · Fixed by #390
Labels: kind/bug (Bug related issue)

bitle commented Feb 3, 2024

Describe the bug
Today I upgraded all plugins in Jenkins to their latest versions, including the most recent version of the Datadog plugin.
After Jenkins restarted, all successful builds of classic jobs showed up as failed, even though the logs still show that their result was SUCCESS.
Here's what I found in the logs:

    ConversionException:
    ---- Debugging information ----
    cause-exception : java.lang.NumberFormatException
    cause-message : For input string: "https://jenkins2.dev.lockhart.io/job/Infrastructure/job/fmc/job/cdfmc_rds_deploy/job/deploy_fmc_with_rds/2750/"
    class : java.lang.Long
    required-type : java.lang.Long
    converter-type : com.thoughtworks.xstream.converters.SingleValueConverterWrapper
    wrapped-converter : com.thoughtworks.xstream.converters.basic.LongConverter
    path : /build/actions/org.datadog.jenkins.plugins.datadog.traces.BuildSpanAction/buildData/buildUrl
    line number : 137
    class[1] : org.datadog.jenkins.plugins.datadog.traces.message.TraceSpan$TraceSpanContext
    required-type[1] : org.datadog.jenkins.plugins.datadog.traces.message.TraceSpan$TraceSpanContext
    converter-type[1] : hudson.util.XStream2$AssociatedConverterImpl
    class[2] : org.datadog.jenkins.plugins.datadog.traces.BuildSpanAction
    required-type[2] : org.datadog.jenkins.plugins.datadog.traces.BuildSpanAction
    -------------------------------,
    CannotResolveClassException: buildParameters,
    CannotResolveClassException: charsetName,
    CannotResolveClassException: nodeName,
    CannotResolveClassException: jobName,
    CannotResolveClassException: baseJobName,
    CannotResolveClassException: buildTag,
    CannotResolveClassException: jenkinsUrl,
    CannotResolveClassException: executorNumber,
    CannotResolveClassException: javaHome,
    CannotResolveClassException: branch,
    CannotResolveClassException: gitUrl,
    CannotResolveClassException: gitCommit,
    CannotResolveClassException: isCompleted,
    CannotResolveClassException: hostname,
    CannotResolveClassException: userId,
    CannotResolveClassException: tags,
    CannotResolveClassException: startTime,
    CannotResolveClassException: endTime,
    ConversionException: Refusing to unmarshal duration for security reasons; see https://www.jenkins.io/redirect/class-filter/
    ---- Debugging information ----
    message : Refusing to unmarshal duration for security reasons; see https://www.jenkins.io/redirect/class-filter/
    class : java.time.Duration
    required-type : java.time.Duration
    converter-type : hudson.util.XStream2$BlacklistedTypesConverter
    path : /build/actions/org.datadog.jenkins.plugins.datadog.traces.BuildSpanAction/buildData/duration
    line number : 197
    -------------------------------,
    CannotResolveClassException: millisInQueue,
    CannotResolveClassException: buildSpanContext
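
Reading the first entry: the element at .../buildData/buildUrl now contains a URL, but the type being restored expects a java.lang.Long at that position (required-type : java.lang.Long), so XStream's LongConverter fails with a NumberFormatException. The sketch below reproduces only that last step with the JDK's Long.parseLong, on the assumption that the converter's failure comes from the same kind of numeric parsing; the class name and the URL are hypothetical.

```java
/**
 * Hypothetical demo class: shows what happens when text that is now a URL
 * is read back into a field that used to hold a long.
 */
public class LongFieldDemo {

    /** Returns the NumberFormatException message for text, or null if it parses as a long. */
    static String parseFailure(String text) {
        try {
            Long.parseLong(text);
            return null; // text is a valid long
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical buildUrl value standing in for the one in the log above
        String storedUrl = "https://jenkins.example.com/job/demo/1/";
        System.out.println(parseFailure(storedUrl));       // prints the NumberFormatException message
        System.out.println(parseFailure("1706950233000")); // prints: null
    }
}
```

The same text parses fine when the field it is read into is a String, which is why the mismatch only appears when the stored layout and the class layout disagree.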

I reverted to the previous version of the plugin, which fixed the issue.

To Reproduce
I didn't try to reproduce this issue. I can provide my job configs and build history if needed.

Screenshots
[Screenshot 2024-02-03 at 9:30:33 AM]

Environment and Versions:
Jenkins 2.426.3
Datadog plugin 5.6.2 -> 6.0.0

bitle added the kind/bug (Bug related issue) label on Feb 3, 2024
bitle changed the title from "All classic jobs in jenkins turn into failed on datadog plugin v4" to "All classic jobs in jenkins turn into failed on datadog plugin v6.0.0" on Feb 3, 2024

jeohist commented Feb 6, 2024

We experienced the same issue, but unfortunately reverting to the old version did not resolve our issues.

nikita-tkachenko-datadog (Collaborator) commented Feb 6, 2024

@bitle, @jeohist, thank you for reporting this. The issue was resolved in release v6.0.1. Please try updating and let me know if the issue persists. Thank you!

lemeurherve (Member) commented Feb 7, 2024

@nikita-tkachenko-datadog we upgraded the ci.jenkins.io instance from 6.0.0 to 6.0.1 after the stack overflow error we encountered this afternoon (see #389 (comment)), but unfortunately all previous builds are still marked as "failed" and dated 1970.

Example: https://ci.jenkins.io/job/Infra/job/pipeline-library/job/master/ (previous builds finished successfully but appear as failed)

nikita-tkachenko-datadog (Collaborator) commented:

Hi @lemeurherve,

Could you please provide some additional info?

  • do you see any exceptions/error messages in the Jenkins log, in the Datadog plugin log, or in the Manage Old Data screen? If yes, could you please share them?
  • would it be possible for you to provide an example of a build.xml of a job that is displayed incorrectly? I believe they're stored at $JENKINS_HOME/jobs/<JOB_NAME>/builds/<BUILD_NUMBER>/build.xml. If the entire file cannot be provided because it contains sensitive info, providing a part of it would be helpful as well.
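
For jobs inside folders (such as the Infrastructure/fmc/... job in the original report), each folder level typically adds its own jobs/ directory to that path. The sketch below computes the expected build.xml location under that assumption; the class and method names are hypothetical, and multibranch projects are not handled, since their branch jobs may be stored under a different directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: computes where build.xml should live for a (possibly nested) job. */
public class BuildXmlLocator {

    /**
     * @param jenkinsHome the value of $JENKINS_HOME
     * @param fullJobName slash-separated job name, e.g. "SomeFolder/my-job"
     * @param buildNumber the build number
     */
    static Path buildXml(String jenkinsHome, String fullJobName, int buildNumber) {
        Path dir = Paths.get(jenkinsHome);
        // Assumption: each folder level and the job itself sit under a "jobs" directory.
        for (String segment : fullJobName.split("/")) {
            dir = dir.resolve("jobs").resolve(segment);
        }
        return dir.resolve("builds").resolve(Integer.toString(buildNumber)).resolve("build.xml");
    }

    public static void main(String[] args) {
        // On Unix-like systems prints:
        // /var/jenkins_home/jobs/SomeFolder/jobs/my-job/builds/42/build.xml
        System.out.println(buildXml("/var/jenkins_home", "SomeFolder/my-job", 42));
    }
}
```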

Thank you!

lemeurherve (Member) commented:

I'll provide you these elements first thing in the morning tomorrow.

nikita-tkachenko-datadog (Collaborator) commented:

As a side note, https://issues.jenkins.io/browse/JENKINS-66328 describes a similar issue.
Some of the reports there are dated 25/01/202 (before Datadog plugin v6.0.0 was released), and their reporters say they are not using the Datadog plugin.

So while v6.0.0 does have a problem deserialising plugin data, the date/status display issue may be caused by something else.

dduportal commented:

> …or in the Manage Old Data screen? If yes, could you please share them?

First (quick) feedback on ci.jenkins.io (I'll let @lemeurherve provide more details with logs and/or build.xml excerpts): after upgrading the Datadog plugin from 6.0.0 to 6.0.1, we got the following warning in the "Manage Old Data" screen:

[Screenshot 2024-02-07 at 16:03:22]

nikita-tkachenko-datadog (Collaborator) commented Feb 8, 2024

Thanks for the details, @dduportal! I have managed to reproduce this in a local Jenkins instance.

The CannotResolveClassException checks out: it does refer to a class that no longer exists in the new release of the plugin.
"It is okay to leave unreadable data in these items/records, as Jenkins will simply ignore it": that part was also true for me. While I saw the error in the Manage Old Data screen, the build in question had the correct date and status and looked normal.

Version v6.0.0 of the plugin had a different issue: it could not deserialise one of the plugin's action classes because that class's format had changed. In some cases this led to the build data stored on disk being rewritten with default values (timestamp 0, status FAILED, etc.).
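
A timestamp of 0 is the Unix epoch, which matches the builds reported on ci.jenkins.io as "failed" and dated 1970. One way to spot records that were reset like this is to scan build.xml for that value. The sketch below uses the JDK's DOM parser and assumes the build's start time is stored in a direct child <timestamp> element of the root, in milliseconds since the epoch; the class name and the "missing or 0" rule are hypothetical heuristics, not something the plugin or Jenkins provides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical checker: flags build records whose timestamp looks reset to the default. */
public class SuspectBuildCheck {

    /** Returns the trimmed text of the first direct child of root with the given name, or null. */
    static String childText(Element root, String name) {
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    /** True if the record's timestamp is missing or 0 (the epoch); both are treated as suspect. */
    static boolean looksReset(String buildXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(buildXml.getBytes(StandardCharsets.UTF_8)));
        String ts = childText(doc.getDocumentElement(), "timestamp");
        return ts == null || Long.parseLong(ts) == 0L;
    }

    public static void main(String[] args) throws Exception {
        String reset = "<build><timestamp>0</timestamp><result>FAILURE</result></build>";
        String normal = "<build><timestamp>1706950233000</timestamp><result>SUCCESS</result></build>";
        System.out.println(looksReset(reset));       // prints: true
        System.out.println(looksReset(normal));      // prints: false
        System.out.println(Instant.ofEpochMilli(0)); // prints: 1970-01-01T00:00:00Z
    }
}
```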

To sum up:

  • migrating from earlier versions of the plugin directly to v6.0.1 is safe (some harmless CannotResolveClassException errors may be displayed)
  • migrating to v6.0.0 may cause some build data to be corrupted (in which case those builds will continue to show incorrect date and status even after updating to the most recent version of the plugin)
  • similar behaviour was observed in instances that did not have the Datadog plugin installed. This looks like a general issue with how Jenkins core reacts to errors encountered while deserialising build data: rather than ignoring the actions that cannot be deserialised, it overwrites the data for the entire build with default values (this only seems to happen in some cases; I could not determine a clear pattern)
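
The upgrade guidance in this summary can be written down as a small rule on version numbers. The sketch below is hypothetical (the class, enum, and method names are illustrative) and encodes only the cases described in this thread, plus one labelled assumption: that releases after v6.0.1 keep its fix.

```java
import java.util.Arrays;

/** Hypothetical helper encoding only the upgrade cases described in this thread. */
public class DatadogUpgradeCheck {

    enum Risk { MAY_CORRUPT_BUILD_DATA, EARLIER_DAMAGE_PERSISTS, SAFE_WITH_HARMLESS_WARNINGS, NOT_COVERED }

    /** Compares purely numeric dotted versions such as "5.6.2" and "6.0.1". */
    static int compare(String a, String b) {
        int[] x = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
        int[] y = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0; // missing components count as 0
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static Risk classify(String from, String to) {
        if (compare(to, "6.0.0") == 0) {
            // v6.0.0 may rewrite some builds with default values (timestamp 0, failed status)
            return Risk.MAY_CORRUPT_BUILD_DATA;
        }
        if (compare(from, "6.0.0") == 0) {
            // Upgrading does not repair builds already rewritten while 6.0.0 was installed
            return Risk.EARLIER_DAMAGE_PERSISTS;
        }
        if (compare(from, "6.0.0") < 0 && compare(to, "6.0.1") >= 0) {
            // Assumption: releases after 6.0.1 keep its fix; only harmless
            // CannotResolveClassException entries are expected in Manage Old Data
            return Risk.SAFE_WITH_HARMLESS_WARNINGS;
        }
        return Risk.NOT_COVERED;
    }

    public static void main(String[] args) {
        System.out.println(classify("5.6.2", "6.0.1")); // prints: SAFE_WITH_HARMLESS_WARNINGS
        System.out.println(classify("5.6.2", "6.0.0")); // prints: MAY_CORRUPT_BUILD_DATA
        System.out.println(classify("6.0.0", "6.0.1")); // prints: EARLIER_DAMAGE_PERSISTS
    }
}
```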


5 participants