Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EDR Workflows] Improve agent downloader #196135

Merged
merged 15 commits into from
Oct 22, 2024
Merged

Conversation

tomsonpl
Copy link
Contributor

@tomsonpl tomsonpl commented Oct 14, 2024

This PR introduces the following improvements and new functionalities:

  • Improved logging to provide clearer insights into the status of Agent downloads, including detailed error messages for failed downloads and retries.
  • The caching mechanism has been updated to include not just the latest patch version, but also the previous patch version (GA) for more robust caching support.
  • Added retry functionality for Agent downloads to improve resilience in cases of network issues or temporary failures. The system will now attempt to retry the download in case of errors.

@tomsonpl tomsonpl added release_note:skip Skip the PR/issue when compiling release notes Team:Defend Workflows “EDR Workflows” sub-team of Security Solution v8.16.0 labels Oct 14, 2024
@tomsonpl tomsonpl self-assigned this Oct 14, 2024
@tomsonpl
Copy link
Contributor Author

/ci

@tomsonpl
Copy link
Contributor Author

/ci

@tomsonpl tomsonpl added the backport:version Backport to applied version labels label Oct 14, 2024
@tomsonpl
Copy link
Contributor Author

/ci

@tomsonpl tomsonpl marked this pull request as ready for review October 15, 2024 07:49
@tomsonpl tomsonpl requested a review from a team as a code owner October 15, 2024 07:49
@tomsonpl tomsonpl requested review from pzl and szwarckonrad October 15, 2024 07:49
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-defend-workflows (Team:Defend Workflows)

@tomsonpl tomsonpl requested review from paul-tavares and removed request for pzl October 15, 2024 07:49
Copy link
Contributor

@paul-tavares paul-tavares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left feedback for your review.

also, I think there is a way to manually trigger the job/script that actually uses our agent_downloader... have you don that to ensure it is all running as expected?

@@ -11,21 +11,63 @@ import type { ToolingLog } from '@kbn/tooling-log';
import { getAgentDownloadUrl, getAgentFileName } from '../common/fleet_services';
import { downloadAndStoreAgent } from '../common/agent_downloads_service';

// Decrement the patch version by 1
const decrementPatchVersion = (version: string): string => {
const [major, minor, patch] = version.split('.').map(Number);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you see if the semver package can help here. The code here assumes that version is a simple version of three separated values by . which may not be true.

Also - what happens if patch is 0? Example: 8.15.0, 9.0.0, etc...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give semver a try, great suggestion, thank you! 😊 As for the patch being 0, we only invoke decrementPatchVersion when the patch is greater than 0, so that shouldn't be an issue. 👍


// Generate a list of versions to attempt downloading, including a fallback to the previous patch (GA)
const getVersionsToDownload = (version: string): string[] => {
const patch = parseInt(version.split('.')[2], 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use semver.patch(version) (may need to ensure that version does not have -SNAPSHOT in it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good hint, I'll use semver here, and drop the pre-release tag

log.info(`Successfully downloaded and stored version ${v}`);
return; // Exit once successful
} catch (error) {
log.error(`Failed to download or store version ${v}: ${error.message}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to ensure you mean to ONLY log an error here and then keep going in downloading other versions. is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. If the first attempt fails, we move on to try downloading the fallback version.

}
}

log.error(`Failed to download agent for any available version: ${versionsToDownload.join(', ')}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is misleading. you could reach have downloaded some successfully but this just reports that it all error'd out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I’m not sure about that. I believe the early return statement prevents us from reaching this point.


await downloadAndStoreAgent(url, fileName);
log.info(`Successfully downloaded and stored version ${v}`);
return; // Exit once successful
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you download just the first one? and then just exit? you are inside of a loop here, so I'm confused by this return here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! The reason we don’t download multiple versions is in most cases, only one version of the agent is needed to complete the task. The getVersionsToDownload function generates a list of potential versions (the exact version requested and a fallback), but the intention is to find the first valid version that can be successfully downloaded. Once that happens, downloading multiple versions would be redundant.
Was my thinking correct? Are you suggesting that we should download both versions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So you are still only looking ot download 1 version, even though you get a list of versions for download.

Maybe add this information to the above for...of loop so that that is clear to others that might look at this code in the future.

): Promise<void> => {
const versionsToDownload = getVersionsToDownload(version);

for (const v of versionsToDownload) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Optional) Don't use single letter variable name.

log.error(`Failed to download agent for any available version: ${versionsToDownload.join(', ')}`);
};

const isValidVersion = (version: string): boolean => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use semver to validate versions

Comment on lines 105 to 106
const maxAttempts = 2;
for (let attempt = 0; attempt < maxAttempts; attempt++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest you use p-retry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I forgot we have this utility :) Thanks!

const { body } = await nodeFetch(agentDownloadUrl);
await finished(body.pipe(outputStream));
},
() => fs.promises.unlink(newDownloadInfo.fullFilePath) // Clean up on interruption
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this fs.promises. valid? I can't find it in the docs for nodeJS. My understanding was that if you wanted to work with promise versions of methods you had to explicitly import them.

just use unlink since that was imported from the FS promises package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted 👍

Copy link
Contributor Author

@tomsonpl tomsonpl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paul-tavares , thank you for the review 👍 I adjusted the code, or answered in comments :)

@@ -11,21 +11,63 @@ import type { ToolingLog } from '@kbn/tooling-log';
import { getAgentDownloadUrl, getAgentFileName } from '../common/fleet_services';
import { downloadAndStoreAgent } from '../common/agent_downloads_service';

// Decrement the patch version by 1
const decrementPatchVersion = (version: string): string => {
const [major, minor, patch] = version.split('.').map(Number);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give semver a try, great suggestion, thank you! 😊 As for the patch being 0, we only invoke decrementPatchVersion when the patch is greater than 0, so that shouldn't be an issue. 👍


// Generate a list of versions to attempt downloading, including a fallback to the previous patch (GA)
const getVersionsToDownload = (version: string): string[] => {
const patch = parseInt(version.split('.')[2], 10);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good hint, I'll use semver here, and drop the pre-release tag


await downloadAndStoreAgent(url, fileName);
log.info(`Successfully downloaded and stored version ${v}`);
return; // Exit once successful
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! The reason we don’t download multiple versions is in most cases, only one version of the agent is needed to complete the task. The getVersionsToDownload function generates a list of potential versions (the exact version requested and a fallback), but the intention is to find the first valid version that can be successfully downloaded. Once that happens, downloading multiple versions would be redundant.
Was my thinking correct? Are you suggesting that we should download both versions?

log.info(`Successfully downloaded and stored version ${v}`);
return; // Exit once successful
} catch (error) {
log.error(`Failed to download or store version ${v}: ${error.message}`);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. If the first attempt fails, we move on to try downloading the fallback version.

}
}

log.error(`Failed to download agent for any available version: ${versionsToDownload.join(', ')}`);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I’m not sure about that. I believe the early return statement prevents us from reaching this point.

const { body } = await nodeFetch(agentDownloadUrl);
await finished(body.pipe(outputStream));
},
() => fs.promises.unlink(newDownloadInfo.fullFilePath) // Clean up on interruption
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted 👍

Comment on lines 105 to 106
const maxAttempts = 2;
for (let attempt = 0; attempt < maxAttempts; attempt++) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I forgot we have this utility :) Thanks!

@tomsonpl tomsonpl requested a review from paul-tavares October 18, 2024 17:05
@tomsonpl
Copy link
Contributor Author

@paul-tavares Could you help me trigger this job to test agent_downloader?

@paul-tavares
Copy link
Contributor

@paul-tavares Could you help me trigger this job to test agent_downloader?

@dasansol92 - I think you were the one that worked with someone from the Kibana ops side to use this agent downloader - can you answer this for @tomsonpl ?


await downloadAndStoreAgent(url, fileName);
log.info(`Successfully downloaded and stored version ${v}`);
return; // Exit once successful
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So you are still only looking ot download 1 version, even though you get a list of versions for download.

Maybe add this information to the above for...of loop so that that is clear to others that might look at this code in the future.

Copy link
Contributor

@szwarckonrad szwarckonrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for addressing this!

@tomsonpl
Copy link
Contributor Author

@jbudz Hey, could you please link this PR to a buildkite build so we can test if dowloading/caching agent works well?
Thanks!

@tomsonpl tomsonpl requested a review from a team as a code owner October 21, 2024 17:21
@tomsonpl
Copy link
Contributor Author

Agent was successfully downloaded here: https://buildkite.com/elastic/kibana-pull-request/builds/244398

Reverting the changes (manual trigger of downloader) 👍

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

cc @tomsonpl

@tomsonpl tomsonpl merged commit c5067fd into elastic:main Oct 22, 2024
43 checks passed
@kibanamachine
Copy link
Contributor

Starting backport for target branches: 8.16, 8.x

https://github.com/elastic/kibana/actions/runs/11456672120

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Oct 22, 2024
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Oct 22, 2024
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.16
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Oct 22, 2024
# Backport

This will backport the following commits from `main` to `8.16`:
- [[EDR Workflows] Improve agent downloader
(#196135)](#196135)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Tomasz
Ciecierski","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-10-22T08:51:55Z","message":"[EDR
Workflows] Improve agent downloader
(#196135)","sha":"c5067fdd06425541d6eb5a9ef5260c9ea9a86816","branchLabelMapping":{"^v9.0.0$":"main","^v8.17.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","v9.0.0","Team:Defend
Workflows","v8.16.0","backport:version","v8.17.0"],"title":"[EDR
Workflows] Improve agent
downloader","number":196135,"url":"https://github.com/elastic/kibana/pull/196135","mergeCommit":{"message":"[EDR
Workflows] Improve agent downloader
(#196135)","sha":"c5067fdd06425541d6eb5a9ef5260c9ea9a86816"}},"sourceBranch":"main","suggestedTargetBranches":["8.16","8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/196135","number":196135,"mergeCommit":{"message":"[EDR
Workflows] Improve agent downloader
(#196135)","sha":"c5067fdd06425541d6eb5a9ef5260c9ea9a86816"}},{"branch":"8.16","label":"v8.16.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"8.x","label":"v8.17.0","branchLabelMappingKey":"^v8.17.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Tomasz Ciecierski <[email protected]>
kibanamachine added a commit that referenced this pull request Oct 22, 2024
# Backport

This will backport the following commits from `main` to `8.x`:
- [[EDR Workflows] Improve agent downloader
(#196135)](#196135)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Tomasz
Ciecierski","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-10-22T08:51:55Z","message":"[EDR
Workflows] Improve agent downloader
(#196135)","sha":"c5067fdd06425541d6eb5a9ef5260c9ea9a86816","branchLabelMapping":{"^v9.0.0$":"main","^v8.17.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","v9.0.0","Team:Defend
Workflows","v8.16.0","backport:version","v8.17.0"],"title":"[EDR
Workflows] Improve agent
downloader","number":196135,"url":"https://github.com/elastic/kibana/pull/196135","mergeCommit":{"message":"[EDR
Workflows] Improve agent downloader
(#196135)","sha":"c5067fdd06425541d6eb5a9ef5260c9ea9a86816"}},"sourceBranch":"main","suggestedTargetBranches":["8.16","8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/196135","number":196135,"mergeCommit":{"message":"[EDR
Workflows] Improve agent downloader
(#196135)","sha":"c5067fdd06425541d6eb5a9ef5260c9ea9a86816"}},{"branch":"8.16","label":"v8.16.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"8.x","label":"v8.17.0","branchLabelMappingKey":"^v8.17.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Tomasz Ciecierski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:version Backport to applied version labels release_note:skip Skip the PR/issue when compiling release notes Team:Defend Workflows “EDR Workflows” sub-team of Security Solution v8.16.0 v8.17.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants