Introduce Kibana task to deploy agentless connectors for 9.0 #203973

artem-shelkovnikov · 2024-12-12T09:21:21Z

Closes https://github.com/elastic/search-team/issues/8508

Closes https://github.com/elastic/search-team/issues/8465

Summary

This PR adds a background task to the search_connectors plugin. The task compares connector records with agentless package policies to detect connectors that were added or deleted, then creates or deletes the corresponding package policies.
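The comparison described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `ConnectorRecord`, `ConnectorPolicy`, `SyncPlan` and `planSync` are hypothetical names, and the real Kibana and Fleet types carry many more fields.

```typescript
// Hypothetical, simplified shapes (assumptions made for illustration only).
interface ConnectorRecord {
  id: string;
  isNative: boolean; // true for Elastic-managed connectors
}

interface ConnectorPolicy {
  policyId: string;
  connectorId: string; // the connector this package policy was created for
}

interface SyncPlan {
  connectorsToDeploy: ConnectorRecord[];
  policiesToRemove: ConnectorPolicy[];
}

// Compare both sides and report what is missing and what is orphaned.
function planSync(connectors: ConnectorRecord[], policies: ConnectorPolicy[]): SyncPlan {
  const managed = connectors.filter((c) => c.isNative);
  const managedIds = new Set(managed.map((c) => c.id));
  const deployedIds = new Set(policies.map((p) => p.connectorId));

  return {
    // Elastic-managed connectors that have no package policy yet
    connectorsToDeploy: managed.filter((c) => !deployedIds.has(c.id)),
    // package policies whose connector no longer exists
    policiesToRemove: policies.filter((p) => !managedIds.has(p.connectorId)),
  };
}
```

Keeping the planning step free of I/O makes it easy to unit test; the task would then only need to apply the resulting plan.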

Scenario 1: a new connector was added by a user/API call

User creates an Elastic-managed connector:

Screen.Recording.2024-12-25.at.12.59.14.mov

When the user is done, a package policy is created by this background task:

Screen.Recording.2024-12-25.at.13.00.14.mov

Scenario 2: a connector was deleted by a user/API call

User deletes an Elastic-managed connector:

Screen.Recording.2024-12-25.at.13.21.13.mov

Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios
If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
Flaky Test Runner was used on any tests changed
The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines

x-pack/plugins/search_connectors/server/plugin.ts

artem-shelkovnikov · 2024-12-25T12:27:48Z

x-pack/platform/plugins/shared/fleet/server/plugin.ts

-        fetchAllAgentPolicies: agentPolicyService.fetchAllAgentPolicies,
-        fetchAllAgentPolicyIds: agentPolicyService.fetchAllAgentPolicyIds,
-      },
+      agentPolicyService,


This was done because I needed a create method that depends on many other private/internal methods.

I had to either make those methods public and add them here, or pass the service itself. There might be another way, but I'm not familiar enough with Kibana development yet to know. Please tell me if there's a better way :)

elasticmachine · 2024-12-25T12:27:53Z

Pinging @elastic/fleet (Team:Fleet)

artem-shelkovnikov · 2024-12-25T12:29:01Z

x-pack/platform/plugins/shared/fleet/server/routes/agent_policy/handlers.ts

@@ -196,7 +196,7 @@ export const bulkGetAgentPoliciesHandler: FleetRequestHandler<
        'full query parameter require agent policies read permissions'
      );
    }
-    let items = await agentPolicyService.getByIDs(soClient, ids, {
+    let items = await agentPolicyService.getByIds(soClient, ids, {


Side effect of removing the usage of AgentPolicyServiceInterface: the interface had getByIDs, while the implementation has getByIds. I kept the latter, but it's easy to rename the implementation to getByIDs instead. This was mostly done to avoid pinging other code owners that might have used the interface method name.

artem-shelkovnikov · 2024-12-25T12:38:41Z

x-pack/plugins/search_connectors/server/services/index.ts

+              if (policy.supports_agentless !== true) {
+                this.logger.debug(`Policy ${policy.id} does not support agentless, skipping`);
+                continue;


For some reason this doesn't work - I never get a policy that has supports_agentless field.

artem-shelkovnikov · 2024-12-25T12:57:37Z

x-pack/plugins/search_connectors/server/services/index.ts

+      throw new Error(`Connector ${connector.id} service_type is null or empty`);
+    }
+
+    if (NATIVE_CONNECTOR_DEFINITIONS[connector.service_type] == null) {


I'm using our regular NATIVE_CONNECTOR_DEFINITIONS as the source of truth for the connectors we support. In theory I could instead list the integrations that are branched off connectors-py. Is that possible, and would it be better?

artem-shelkovnikov · 2024-12-25T13:06:10Z

x-pack/plugins/search_connectors/server/task.ts

+const AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_ID = 'search:agentless-connectors-sync-task';
+const AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_TYPE = 'search:agentless-connectors-sync';
+
+const SCHEDULE = { interval: '1m' };


@elastic/fleet - what's the minimal interval at which we could query Fleet package policies? (We narrow them with a kuery that only returns our package, elastic_connectors.)

Can we do 10 seconds? 30 seconds?

If we only query for a certain package, there shouldn't be too many results, so it shouldn't be a problem with scale. I think using 30s sounds fine too, 10s might be too frequent.

artem-shelkovnikov · 2024-12-25T13:06:39Z

x-pack/plugins/search_connectors/server/task.ts

+        description:
+          'This task peridocally checks native connectors, agent policies and syncs them if they are out of sync',
+        timeout: '1m',
+        maxAttempts: 3,


Do we even need to retry, since we run pretty often?

juliaElastic

Fleet changes LGTM

jedrazb

Great stuff! The changes in the search_connectors plugin LGTM. I have a couple of minor comments regarding naming and one question about hardcoding the package version in the task manager logic.

I’ll defer reviewing the changes in the fleet plugin to the fleet team. EDIT: I see they just approved 🚀

x-pack/solutions/search/plugins/search_connectors/server/task.ts

x-pack/test/plugin_api_integration/test_suites/task_manager/check_registered_task_types.ts

jedrazb · 2025-01-02T11:19:16Z

x-pack/solutions/search/plugins/search_connectors/server/services/index.ts

+
+const connectorsInputName = 'connectors-py';
+const pkgName = 'elastic_connectors';
+const pkgVersion = '0.0.4';


Do we need this version hardcoded here? The current (latest) version in the integration registry should definitely be tracked somewhere by Fleet. Can we look it up in the package registry dynamically?

For context, 0.0.4 is already outdated.

Maybe code edited in this PR will help? https://github.com/elastic/kibana/pull/192081/files here I was able to access package info and adjust permissions dynamically
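One possible shape for such a lookup, as a sketch only: `PackageInfoSource`, `getLatestVersion` and `resolvePackageVersion` are hypothetical names standing in for whatever Fleet exposes, and falling back to the pinned version when the lookup fails is an assumption, not something decided in this thread.

```typescript
// Hypothetical abstraction over the Fleet/package registry lookup (an assumption).
interface PackageInfoSource {
  getLatestVersion(pkgName: string): Promise<string | undefined>;
}

// The version currently pinned in the diff, kept only as a fallback.
const FALLBACK_PKG_VERSION = '0.0.4';

// Prefer the latest known version; use the pinned one if the lookup
// fails or returns nothing.
async function resolvePackageVersion(
  source: PackageInfoSource,
  pkgName: string,
  fallback: string = FALLBACK_PKG_VERSION
): Promise<string> {
  try {
    const latest = await source.getLatestVersion(pkgName);
    return latest ?? fallback;
  } catch {
    return fallback;
  }
}
```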

jedrazb · 2025-01-02T11:36:33Z

x-pack/solutions/search/plugins/search_connectors/server/task.ts

+
+const SCHEDULE = { interval: '1m' };
+
+export function infraSyncTaskRunner(


potential followup: It would be cool if we could force schedule this when e.g. a user creates a new connector. This would limit the wait time for the infrastructure to get deployed.

Yes! Should be easy, because I already schedule this task during plugin startup; doing it again is easy and will require only minor refactoring.
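A rough sketch of what that follow-up might look like. The `TaskTrigger` interface below is a hypothetical stand-in modeled on the idea of asking Task Manager to run an already scheduled task early; the exact Task Manager method and its signature should be checked before relying on this.

```typescript
// Hypothetical minimal interface (an assumption, not the real Task Manager contract).
interface TaskTrigger {
  runSoon(taskId: string): Promise<unknown>;
}

// Ask for an early run of the sync task, for example right after the UI creates
// a connector. A failure is logged and swallowed, because the regular schedule
// will pick the change up on its next run anyway.
async function requestImmediateSync(
  trigger: TaskTrigger,
  taskId: string,
  log: (message: string) => void
): Promise<boolean> {
  try {
    await trigger.runSoon(taskId);
    return true;
  } catch (e) {
    const reason = e instanceof Error ? e.message : String(e);
    log(`Could not trigger ${taskId} early: ${reason}`);
    return false;
  }
}
```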

x-pack/solutions/search/plugins/search_connectors/server/task.ts

Co-authored-by: Jedr Blaszyk <[email protected]>

jedrazb

🚢

mikecote · 2025-01-03T13:58:04Z

x-pack/solutions/search/plugins/search_connectors/server/task.ts

+      const taskInstance = await taskManager.ensureScheduled({
+        id: AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_ID,
+        taskType: AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_TYPE,
+        schedule: SCHEDULE,


Taking a quick look here from Response Ops. I was reading the PR description and was wondering if we need to have this task run every 30s indefinitely or if it would be possible to make it event based so it runs after a user creates or deletes a connector? Or perhaps a combo of the two but the schedule runs less frequently?

was thinking the same ...

For now this seemed to us like the best way to move forward:

The task runs and checks if any agentless policies need to be created for our connector records. Connector records can be created in multiple ways:

1. User creates a connector via UI
2. Connector is created automatically by already running agentless connector deployment
3. User creates a connector via API/CLI

Scenario #1 can be done with an event triggered by Kibana UI easily. Scenario #2 does not need this logic. Scenario #3 really needs this task - our CLI doesn't have access to Task Manager + our API is hosted in Elasticsearch, and Elasticsearch also has no way to affect this task run time.

That's why we've taken the current approach of polling every 30 seconds (a minute should be fine too). The task itself queries a reasonably small amount of data, so I believe it shouldn't be too problematic.

The GenAI connectors have a similar sort of constraint, where something in Kibana wants to know when connectors get created / updated / deleted. Added in #189027

That PR originally contained some connector logic for the new "hooks", but we extracted that and restructured into a stand-alone PR: #194081 , rather than ship the two pieces together.

So, in theory case 3 can be handled this way.

Looking at those PRs, I'm also wondering if you need to handle the case of connectors being updated / deleted ...

I've skimmed through the change but don't understand how it handles case 3: we have customers calling the Elasticsearch API directly, and Kibana is not involved in this.

So we cannot have hooks attached to this call, all we can do is poll the content of a couple indices to see if changes were made. Am I missing some detail in the mentioned PR that works around this limitation?

Connector update is not important for us, but deletion is also handled in this PR

Oh, these aren't alerting connectors? These are "search" connectors? If so, you're correct, completely different "connector" framework I was talking about (I was talking about the alerting connectors).

elasticmachine · 2025-01-06T12:31:31Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 75b22f7

Failed CI Steps

Test Failures

[job] [logs] FTR Configs #22 / Index Management app Index Management: index templates tab Index template tab "after each" hook for "can create an index template with data retention"
[job] [logs] FTR Configs #22 / Index Management app Index Management: index templates tab Index template tab can create an index template with data retention

Metrics [docs]

‼️ ERROR: no builds found for mergeBase sha [69cb966]

History

💔 Build #264260 failed 0f4db80
💚 Build #263964 succeeded b96ec2e
💛 Build #263775 was flaky 898020b
💚 Build #263434 succeeded e6e96b8
💔 Build #263422 failed ee087ba

pmuellr

ResponseOps code LGTM, left a few comments

pmuellr · 2025-01-07T20:22:23Z

x-pack/solutions/search/plugins/search_connectors/server/task.ts

+const AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_ID = 'search:agentless-connectors-manager-task';
+const AGENTLESS_CONNECTOR_DEPLOYMENTS_SYNC_TASK_TYPE = 'search:agentless-connectors-manager';
+
+const SCHEDULE = { interval: '30s' };


Setting this to the largest value you are willing to live with, will be helpful to Kibana's task throughput :-)

I believe a comment in the PR indicated it could be set to "1m" which would cut down the executions by 50% (useful!)
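To make the 50% figure concrete, here is a small calculation of executions per day for a given interval. `intervalToSeconds` and `runsPerDay` are illustrative helpers, not Task Manager APIs, and they only understand simple `s`/`m`/`h` intervals.

```typescript
// Convert an interval such as '30s', '1m' or '2h' to seconds.
function intervalToSeconds(interval: string): number {
  const match = /^([1-9]\d*)([smh])$/.exec(interval);
  if (!match) {
    throw new Error(`Unsupported interval: ${interval}`);
  }
  const value = Number(match[1]);
  const unit = match[2];
  const factor = unit === 's' ? 1 : unit === 'm' ? 60 : 3600;
  return value * factor;
}

// Number of executions a recurring task makes in 24 hours.
function runsPerDay(interval: string): number {
  return 86400 / intervalToSeconds(interval);
}

console.log(runsPerDay('30s')); // 2880
console.log(runsPerDay('1m')); // 1440
```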

pmuellr · 2025-01-07T20:34:04Z

x-pack/solutions/search/plugins/search_connectors/server/task.ts

+          };
+        }
+      },
+      cancel: async () => {


Note that if you want the cancel to actually stop the task from running, you'll have to do a bit more. This function is invoked when TM decides the task needs to be cancelled (because it has been running longer than its time limit). The basic idea is that you set a local variable indicating you've been cancelled, and then check it in the run() method. Example here:

kibana/x-pack/solutions/security/plugins/security_solution/server/lib/entity_analytics/entity_store/task/field_retention_enrichment_task.ts

Lines 250 to 279 in 0e13d86

const createTaskRunnerFactory =
  ({
    logger,
    telemetry,
    executeEnrichPolicy,
    getStoreSize,
  }: {
    logger: Logger;
    telemetry: AnalyticsServiceSetup;
    executeEnrichPolicy: ExecuteEnrichPolicy;
    getStoreSize: GetStoreSize;
  }) =>
  ({ taskInstance }: { taskInstance: ConcreteTaskInstance }) => {
    let cancelled = false;
    const isCancelled = () => cancelled;
    return {
      run: async () =>
        runTask({
          executeEnrichPolicy,
          getStoreSize,
          isCancelled,
          logger,
          taskInstance,
          telemetry,
        }),
      cancel: async () => {
        cancelled = true;
      },
    };
  };

- note that this code doesn't actually seem to use the isCancelled() local function they created - I think it did at one point, must have been removed in another PR ...
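For comparison, a self-contained sketch in which the flag is actually consulted between units of work. `TaskRunner` and `createRunner` are hypothetical names for illustration and are not Task Manager types; the only point is that run() checks the flag that cancel() sets.

```typescript
// Hypothetical runner shape (an assumption made for illustration).
interface TaskRunner {
  run(): Promise<{ processed: number; cancelled: boolean }>;
  cancel(): Promise<void>;
}

function createRunner<T>(items: T[], handle: (item: T) => Promise<void>): TaskRunner {
  let cancelled = false;

  return {
    run: async () => {
      let processed = 0;
      for (const item of items) {
        // Stop before starting another unit of work once cancel() was called.
        if (cancelled) {
          break;
        }
        await handle(item);
        processed++;
      }
      return { processed, cancelled };
    },
    cancel: async () => {
      cancelled = true;
    },
  };
}
```

The check only takes effect between awaited steps, so a single long-running `handle` call still runs to completion; finer-grained cancellation would need the handler itself to look at the flag.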

seanstory reviewed Dec 12, 2024

artem-shelkovnikov force-pushed the artem/add-agentless-connectors-task branch 2 times, most recently from 0500e73 to 7456f1b Compare December 25, 2024 11:37

artem-shelkovnikov changed the title ~~WIP~~ Introduce Kibana task to deploy agentless connectors for 9.0 Dec 25, 2024

artem-shelkovnikov added the release_note:skip Skip the PR/issue when compiling release notes label Dec 25, 2024

artem-shelkovnikov marked this pull request as ready for review December 25, 2024 12:26

artem-shelkovnikov requested review from a team as code owners December 25, 2024 12:26

artem-shelkovnikov commented Dec 25, 2024

botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Dec 25, 2024

artem-shelkovnikov commented Dec 25, 2024

artem-shelkovnikov added backport:skip This commit does not require backporting Team:Search labels Dec 25, 2024

artem-shelkovnikov commented Dec 25, 2024

artem-shelkovnikov requested a review from a team as a code owner December 27, 2024 19:08

artem-shelkovnikov added 10 commits December 30, 2024 12:49

WIP

4927e05

WIP 2

5716a4e

More changes, fun fun

45278cb

More WIP + tests

1aedf35

Revert artifacts/task.ts

e791be2

Minor tweaks

8c27d06

More tests + refactoring

8f996be

Make methods private again

65a50eb

Humanise the creation time

3c294fe

Clean up a comment

3ec5dce

artem-shelkovnikov and others added 11 commits December 30, 2024 12:49

Fix error message for scheduling

6d3fea6

[CI] Auto-commit changed files from 'node scripts/lint_ts_projects --…

8b19c58

…fix'

Remove a comment in task.ts

ec75692

Increase interval to 1m

4c3eda4

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…

9738e86

…-fix'

Fix issues with types/linters

cd17d94

Fix import

294f411

Remove async from the methods

8351e8a

Fix tests?

6367d72

Fix test for task manager

bec29db

Rebase fix

6d02c2a

artem-shelkovnikov force-pushed the artem/add-agentless-connectors-task branch from 41d1313 to 6d02c2a Compare December 30, 2024 12:17

kibanamachine and others added 4 commits December 30, 2024 12:37

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…

39c1429

…-fix'

Merge branch 'main' into artem/add-agentless-connectors-task

ee087ba

Merge branch 'main' into artem/add-agentless-connectors-task

c59f0cb

Rebase fix

e6e96b8

juliaElastic approved these changes Jan 2, 2025

jedrazb reviewed Jan 2, 2025

artem-shelkovnikov and others added 4 commits January 2, 2025 17:25

Apply suggestions from code review

f53d034

Co-authored-by: Jedr Blaszyk <[email protected]>

[CI] Auto-commit changed files from 'node scripts/notice'

898020b

Improvements here and there

0d83497

Merge branch 'main' into artem/add-agentless-connectors-task

b96ec2e

jedrazb approved these changes Jan 3, 2025

mikecote reviewed Jan 3, 2025

artem-shelkovnikov requested review from pmuellr and mikecote January 6, 2025 09:15

artem-shelkovnikov added 2 commits January 6, 2025 10:25

Merge branch 'main' into artem/add-agentless-connectors-task

0f4db80

Merge branch 'main' into artem/add-agentless-connectors-task

75b22f7

pmuellr approved these changes Jan 7, 2025
