[ML] Trained models list: disables 'View training data' action if data frame analytics job no longer exists #171061
Pinging @elastic/ml-ui (:ml)
@@ -98,6 +98,7 @@ export type TrainedModelConfigResponse = estypes.MlTrainedModelConfig & {
   * Associated pipelines. Extends response from the ES endpoint.
   */
  pipelines?: Record<string, PipelineDefinition> | null;
  origin_job_exists?: boolean;
I don't think this needs to be an optional route param. We should always perform this check when retrieving the models, as it's useful information.
Updated in 7c06abd
@@ -142,6 +142,7 @@ export function useModelActions({
  icon: 'visTable',
  type: 'icon',
  available: (item) => !!item.metadata?.analytics_config?.id,
  enabled: (item) => item.origin_job_exists === true,
Can you add a tooltip to show why the action is disabled?
Added in 6d8a4e5
Happy to get suggestions on the copy to use.
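A minimal sketch of how the action could tie both its enabled state and its tooltip text to origin_job_exists. The ModelItem and ModelAction types, the tooltip field and the disabled-state copy are hypothetical stand-ins, not the EUI table action API or the copy that shipped.

```typescript
// Hypothetical, trimmed-down shape of a trained models list item.
interface ModelItem {
  model_id: string;
  metadata?: { analytics_config?: { id?: string } };
  origin_job_exists?: boolean;
}

// Hypothetical action shape: `available` decides whether the action is shown,
// `enabled` whether it can be clicked, `tooltip` explains the current state.
interface ModelAction {
  name: string;
  available: (item: ModelItem) => boolean;
  enabled: (item: ModelItem) => boolean;
  tooltip: (item: ModelItem) => string;
}

const viewTrainingData: ModelAction = {
  name: 'View training data',
  available: (item) => !!item.metadata?.analytics_config?.id,
  enabled: (item) => item.origin_job_exists === true,
  // Placeholder copy for the disabled state; the final wording was left open.
  tooltip: (item) =>
    item.origin_job_exists === true
      ? 'View training data'
      : 'The data frame analytics job that created this model no longer exists',
};
```

Keeping `available` and `enabled` separate means a DFA model whose job was deleted still shows the action, greyed out, so the tooltip has something to explain.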
This is ready for another look when you get a chance 🙏 cc @peteharverson, @jgowdyelastic
@elasticmachine merge upstream
x-pack/plugins/ml/public/application/model_management/model_actions.tsx
const jobIds = result.map((model) => {
  let id = model.metadata?.analytics_config?.id;
  if (id) {
    id = `${id}*`;
I don't think we should be modifying the IDs. This could lead to unexpected behaviour, e.g. if we have two jobs foo and foo2, wildcarding foo* will match both.
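A small illustration of the foo / foo2 concern. matchesIdPattern is a hypothetical, simplified matcher in which * matches any run of characters; it is not Elasticsearch's implementation, but it shows why appending * widens an exact ID into a prefix match.

```typescript
// Simplified model of id wildcard matching, for illustration only:
// `*` matches any (possibly empty) run of characters, everything else is literal.
function matchesIdPattern(pattern: string, id: string): boolean {
  const source = pattern
    .split('*')
    // Escape regex metacharacters in the literal parts.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${source}$`).test(id);
}

const jobIds = ['foo', 'foo2', 'bar'];
const exactMatches = jobIds.filter((id) => matchesIdPattern('foo', id)); // ['foo']
const wildcardMatches = jobIds.filter((id) => matchesIdPattern('foo*', id)); // ['foo', 'foo2']
```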
In my previous suggestion I overlooked the fact that passing comma-separated IDs to getDataFrameAnalytics will throw a 404 if any of them are missing.
I still don't think we should call getDataFrameAnalytics in a loop, so I think the simplest solution would be to fetch all DFA jobs and just loop over them looking for the IDs we want.
That call isn't expensive. We could also be clever and only use the job ID if there is only one model in the results, as it's likely this endpoint will be called with either one model ID or no model IDs.
I had a go at writing up what I'm thinking. I got carried away trying to make it performant.
const filteredModels = filterForEnabledFeatureModels<TrainedModelConfigResponse>(
  result,
  getEnabledFeatures()
);
const dfaJobIdMap = filteredModels.reduce<Record<string, string>>((c, m) => {
  const id = m.metadata?.analytics_config?.id;
  if (id !== undefined) {
    c[m.model_id] = id;
  }
  return c;
}, {});
const jobIds = Object.values(dfaJobIdMap);
if (jobIds.length === 0) {
  // return early, there are no dfa jobs
  return response.ok({
    body: filteredModels,
  });
}
let dfaJobs: estypes.MlDataframeAnalyticsSummary[] = [];
try {
  const jobs =
    jobIds.length === 1
      ? await mlClient.getDataFrameAnalytics({
          id: jobIds[0],
        })
      : await mlClient.getDataFrameAnalytics();
  dfaJobs = jobs.data_frame_analytics;
} catch (e) {
  // Swallow error to prevent blocking trained models result
}
for (const model of filteredModels) {
  const jobId = dfaJobIdMap[model.model_id];
  // Only dfa models get origin_job_exists; it stays undefined for other models
  if (jobId === undefined) continue;
  const dfaJob = dfaJobs.find((j) => j.id === jobId);
  model.origin_job_exists = dfaJob !== undefined;
}
return response.ok({
  body: filteredModels,
});
Also updating filterForEnabledFeatureModels to add a generic type to avoid type issues:
export function filterForEnabledFeatureModels<
T extends TrainedModelConfigResponse | estypes.MlTrainedModelConfig
>(models: T[], enabledFeatures: MlFeatures) {
...
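Why the generic helps, sketched with hypothetical stand-in types (BaseModelConfig for estypes.MlTrainedModelConfig, ModelConfigResponse for TrainedModelConfigResponse). The model_type filtering rule is made up; only the typing matters here.

```typescript
// Hypothetical stand-ins for estypes.MlTrainedModelConfig and TrainedModelConfigResponse.
interface BaseModelConfig {
  model_id: string;
  model_type?: string;
}

interface ModelConfigResponse extends BaseModelConfig {
  origin_job_exists?: boolean;
}

// Non-generic: the return type is always BaseModelConfig[], so the extra
// response fields are lost to the type checker and callers need a cast.
function filterNonGeneric(models: BaseModelConfig[], allowedTypes: Set<string>): BaseModelConfig[] {
  return models.filter((m) => allowedTypes.has(m.model_type ?? ''));
}

// Generic: T is inferred from the argument, so the caller's subtype is preserved.
function filterGeneric<T extends BaseModelConfig>(models: T[], allowedTypes: Set<string>): T[] {
  return models.filter((m) => allowedTypes.has(m.model_type ?? ''));
}

const response: ModelConfigResponse[] = [
  { model_id: 'a', model_type: 'tree_ensemble' },
  { model_id: 'b', model_type: 'pytorch' },
];

const widened = filterNonGeneric(response, new Set(['tree_ensemble']));
// widened[0].origin_job_exists = true; // would not compile: not on BaseModelConfig

const kept = filterGeneric(response, new Set(['tree_ensemble']));
kept[0].origin_job_exists = true; // compiles: kept is ModelConfigResponse[]
```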
We've discussed this offline and can't think of a situation where adding * to each job might cause the wrong job to be matched.
I'm still not a fan of adding these * characters to work around the fact that the ES endpoint will throw a 404 if one job can't be found, but there's not a compelling reason to demand this change.
} = useMlKibana();

const handleClick = async () => {
  if (item.metadata?.analytics_config === undefined) return;
Nit: could this check use a type guard, so we can avoid the as casting later on? (I saw these lines were copied as-is, but maybe it's an easy fix.)
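A sketch of the type guard approach, using hypothetical minimal types (the real analytics_config type is richer). After the guard, the compiler knows metadata.analytics_config is defined, so no as cast is needed.

```typescript
interface AnalyticsConfig {
  id: string;
}

interface ModelItem {
  model_id: string;
  metadata?: { analytics_config?: AnalyticsConfig };
}

// Hypothetical narrowed type: metadata and analytics_config are both present.
type DfaModelItem = ModelItem & { metadata: { analytics_config: AnalyticsConfig } };

// Hypothetical type guard replacing the `=== undefined` check plus later casts.
function isDfaModelItem(item: ModelItem): item is DfaModelItem {
  return item.metadata?.analytics_config !== undefined;
}

function trainingDataJobId(item: ModelItem): string | undefined {
  if (!isDfaModelItem(item)) return undefined;
  // item is narrowed to DfaModelItem here, so no `as` cast is required.
  return item.metadata.analytics_config.id;
}
```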
});

jobs.forEach(({ id }) => {
  const model = result.find(
We need to explicitly set it to true or false, not just true, for all dfa models.
const model = result.find(
if (m?.analytics_config?.id !== undefined) {
  // if this is a dfa model, set origin_job_exists
  model.origin_job_exists = result.find((m) => id === m.analytics_config.id) !== undefined;
}
This would only set it to true if a job ID returned from the check matched the job ID on the model. But I agree that we should set it explicitly to false instead of just not adding the property at all and relying on falsey-ness.
Updated to set explicitly in 50ffcfd
Exactly yeah,
origin_job_exists === undefined -> this is not a DFA model
origin_job_exists === true -> this is a DFA model and the job exists
origin_job_exists === false -> this is a DFA model and the job no longer exists
Falsey-ness is evil :D
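One way a consumer could read the three states without relying on falsiness; the OriginJobState union and originJobState helper are hypothetical names used only for illustration.

```typescript
interface ModelItem {
  model_id: string;
  origin_job_exists?: boolean;
}

type OriginJobState = 'not_dfa' | 'job_exists' | 'job_deleted';

// Reads the three states explicitly. Note that `!item.origin_job_exists`
// would be true for both undefined and false, which is the falsiness trap.
function originJobState(item: ModelItem): OriginJobState {
  if (item.origin_job_exists === undefined) return 'not_dfa';
  return item.origin_job_exists ? 'job_exists' : 'job_deleted';
}
```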
  // Swallow error to prevent blocking trained models result
}

const filteredModels = filterForEnabledFeatureModels(result, enabledFeatures);
We can loop over the filteredModels rather than result to remove the need to explicitly check for enabledFeatures.dfa, as all dfa jobs will have been removed if dfa is not enabled.
Yep - that makes sense - updated in 50ffcfd
  allow_no_match: true,
});

jobs.forEach(({ id }) => {
We should be looping through models, not jobs. We need to set false for all job IDs that aren't found, and by looping through jobs only we won't know if a job hasn't been found.
If the job hasn't been returned then the origin_job_exists property wouldn't be added to the model, though if we want to explicitly set it to false when it's not found, then I agree that looping through the models makes more sense.
I was originally thinking that it would be more efficient to just loop through the returned jobs, since that would mean we might not need to go through all the models, but I'm happy to change it.
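The difference between the two loops, sketched with a hypothetical flattened model shape (dfaJobId stands in for metadata.analytics_config.id). Looping over jobs can only ever set true, so a deleted job's model ends up looking like a non-DFA model.

```typescript
// Hypothetical flattened model shape: dfaJobId stands in for metadata.analytics_config.id.
interface Model {
  model_id: string;
  dfaJobId?: string;
  origin_job_exists?: boolean;
}

// Looping over the returned jobs: a model whose job was not returned is never visited,
// so its origin_job_exists stays undefined, the same as a non-DFA model.
function markByLoopingJobs(models: Model[], returnedJobIds: string[]): void {
  for (const jobId of returnedJobIds) {
    const model = models.find((m) => m.dfaJobId === jobId);
    if (model !== undefined) {
      model.origin_job_exists = true;
    }
  }
}

// Looping over the models: every DFA model gets an explicit true or false.
function markByLoopingModels(models: Model[], returnedJobIds: string[]): void {
  const returned = new Set(returnedJobIds);
  for (const model of models) {
    if (model.dfaJobId !== undefined) {
      model.origin_job_exists = returned.has(model.dfaJobId);
    }
  }
}

const makeModels = (): Model[] => [
  { model_id: 'm1', dfaJobId: 'job-1' },
  { model_id: 'm2', dfaJobId: 'job-deleted' },
  { model_id: 'm3' }, // not a DFA model
];
const viaJobs = makeModels();
const viaModels = makeModels();
markByLoopingJobs(viaJobs, ['job-1']);
markByLoopingModels(viaModels, ['job-1']);
// viaJobs[1].origin_job_exists is undefined; viaModels[1].origin_job_exists is false.
```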
Updated to loop through the filtered models to ensure all dfa models get the origin_job_exists property set explicitly in 50ffcfd
"I was originally thinking that it would be more efficient to just loop through the returned jobs"
I was actually thinking that to make this more efficient we could keep a list of the dfa models when identifying them in the reduce and only loop through those later on. e.g.
const dfaModels = [];
const jobIdsString = filteredModels.reduce(
(jobIdsStr: string, currentModel: TrainedModelConfigResponse, idx: number) => {
if (isTrainedModelConfigResponse(currentModel)) {
dfaModels.push(currentModel);
....
But it's not needed; the time difference would be milliseconds.
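A sketch of the single-pass idea above, written as a plain loop rather than the reduce; Model and collectDfaModels are hypothetical. As the comment says, the saving over a second full loop is likely only milliseconds, so this is mostly about making the intent explicit.

```typescript
interface Model {
  model_id: string;
  metadata?: { analytics_config?: { id?: string } };
  origin_job_exists?: boolean;
}

// Hypothetical single pass: collect the DFA models alongside their job ids,
// so a later annotation loop only needs to visit dfaModels.
function collectDfaModels(models: Model[]): { dfaModels: Model[]; jobIds: string[] } {
  const dfaModels: Model[] = [];
  const jobIds: string[] = [];
  for (const model of models) {
    const id = model.metadata?.analytics_config?.id;
    if (id !== undefined) {
      dfaModels.push(model);
      jobIds.push(id);
    }
  }
  return { dfaModels, jobIds };
}

const { dfaModels, jobIds } = collectDfaModels([
  { model_id: 'm1', metadata: { analytics_config: { id: 'job-1' } } },
  { model_id: 'm2' },
]);
// dfaModels contains only m1; jobIds is ['job-1']
```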
const filteredModels = filterForEnabledFeatureModels(result, getEnabledFeatures());

try {
  // @ts-ignore
The types can be fixed by updating filterForEnabledFeatureModels to use a generic type (I suggested this in a previous comment when we were thinking of the other implementation), as result is TrainedModelConfigResponse and not estypes.MlTrainedModelConfig:
export function filterForEnabledFeatureModels<
T extends TrainedModelConfigResponse | estypes.MlTrainedModelConfig
>(models: T[], enabledFeatures: MlFeatures) {
...
This also means you don't need the isTrainedModelConfigResponse type guard later on.
yep - good catch - updated in 8c7094d
@elasticmachine merge upstream
const jobIdsString = filteredModels.reduce((jobIdsStr, currentModel, idx) => {
  let id = currentModel.metadata?.analytics_config?.id ?? '';
  if (id !== '') {
    id = `${idx > 0 ? ',' : ''}${id}*`;
This has a bug: if idx is > 0 but no dfa jobs have been found yet, it'll add a comma to the front of the string, which then causes the loading of dfa jobs to fail.
Rather than checking idx > 0, it should check whether jobIdsStr is empty.
I know it was my suggestion to use a reduce, but I think it was bad advice, as the code is hard to read and that led to this bug. For the sake of easy-to-read code, maybe we should revert to a map, filter, join.
Something like
const jobIds = filteredModels
  .map((m) => m.metadata?.analytics_config?.id)
  .filter(isDefined)
  .map((id) => `${id}*`);
if (jobIds.length) {
  const { data_frame_analytics: jobs } = await mlClient.getDataFrameAnalytics({
    id: jobIds.join(','),
    allow_no_match: true,
  });
  // ...
}
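A small reproduction of the comma bug next to the map, filter, join fix. The buggy reduce is reconstructed from the fragment quoted earlier, assuming the accumulator is simply concatenated with id (the full body isn't shown); isDefined here is a local stand-in for the helper used in the suggestion.

```typescript
interface Model {
  metadata?: { analytics_config?: { id?: string } };
}

// Local stand-in for the isDefined helper used in the suggestion.
function isDefined<T>(value: T | undefined | null): value is T {
  return value !== undefined && value !== null;
}

// Buggy: the comma depends on the array index, not on whether anything
// has been appended yet, so a leading non-DFA model yields ',job-a*,...'.
function buildIdsBuggy(models: Model[]): string {
  return models.reduce<string>((jobIdsStr, currentModel, idx) => {
    let id = currentModel.metadata?.analytics_config?.id ?? '';
    if (id !== '') {
      id = `${idx > 0 ? ',' : ''}${id}*`;
    }
    return `${jobIdsStr}${id}`;
  }, '');
}

// Fixed: map, filter, join never produces a leading or doubled comma.
function buildIdsFixed(models: Model[]): string {
  return models
    .map((m) => m.metadata?.analytics_config?.id)
    .filter(isDefined)
    .map((id) => `${id}*`)
    .join(',');
}

const models: Model[] = [
  {}, // e.g. a non-DFA model listed first
  { metadata: { analytics_config: { id: 'job-a' } } },
  { metadata: { analytics_config: { id: 'job-b' } } },
];
const buggy = buildIdsBuggy(models); // ',job-a*,job-b*'
const fixed = buildIdsFixed(models); // 'job-a*,job-b*'
```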
Agree - updated in 73216d0
💛 Build succeeded, but was flaky
Tested latest changes and LGTM
LGTM
Summary
Fixes #167667, disabling the 'View training data' action for models in the Trained Models list if the data frame analytics job which created the model no longer exists.
Adds an origin_job_exists property to trained models list model items. This is set during the models fetch for models with associated data frame analytics jobs.