Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Trained models list: disables 'View training data' action if data frame analytics job no longer exists #171061

Merged
merged 8 commits into from
Nov 21, 2023
1 change: 1 addition & 0 deletions x-pack/plugins/ml/common/types/trained_models.ts
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,7 @@ export type TrainedModelConfigResponse = estypes.MlTrainedModelConfig & {
* Associated pipelines. Extends response from the ES endpoint.
*/
pipelines?: Record<string, PipelineDefinition> | null;
origin_job_exists?: boolean;
Copy link
Member

@jgowdyelastic jgowdyelastic Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be optional route param. We should always perform this check when retrieving the models as it's useful information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 7c06abd


metadata?: {
analytics_config: DataFrameAnalyticsConfig;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,10 @@

import { Action } from '@elastic/eui/src/components/basic_table/action_types';
import { i18n } from '@kbn/i18n';
import { FormattedMessage } from '@kbn/i18n-react';
import { isPopulatedObject } from '@kbn/ml-is-populated-object';
import { EuiToolTip } from '@elastic/eui';
import React, { useCallback, useMemo, useEffect, useState } from 'react';
import { EuiIcon, EuiLink, EuiText, EuiToolTip } from '@elastic/eui';
import React, { FC, useCallback, useMemo, useEffect, useState } from 'react';
import {
BUILT_IN_MODEL_TAG,
DEPLOYMENT_STATE,
Expand All @@ -33,6 +34,80 @@ import { isTestable, isDfaTrainedModel } from './test_models';
import { ModelItem } from './models_list';
import { usePermissionCheck } from '../capabilities/check_capabilities';

const ViewTrainingDataAction: FC<{
item: ModelItem;
}> = ({ item }) => {
const urlLocator = useMlLocator()!;
const {
services: {
application: { navigateToUrl },
},
} = useMlKibana();

const handleClick = async () => {
if (item.metadata?.analytics_config === undefined) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could this be a check probably with a type guard so we can avoid the as casting later on? (Saw these lines were copied as is but maybe it's an easy fix)


const analysisType = getAnalysisType(
item.metadata?.analytics_config.analysis
) as DataFrameAnalysisConfigType;

const url = await urlLocator.getUrl({
page: ML_PAGES.DATA_FRAME_ANALYTICS_EXPLORATION,
pageState: {
jobId: item.metadata?.analytics_config.id as string,
analysisType,
...(analysisType === 'classification' || analysisType === 'regression'
? {
queryText: `${item.metadata?.analytics_config.dest.results_field}.is_training : true`,
}
: {}),
},
});

await navigateToUrl(url);
};

const buttonContent = (
peteharverson marked this conversation as resolved.
Show resolved Hide resolved
<>
<EuiIcon type={'visTable'} css={{ marginRight: '8px' }} />
{i18n.translate('xpack.ml.trainedModels.modelsList.viewTrainingDataActionLabel', {
defaultMessage: 'View training data',
})}
</>
);

const isEnabled = item.origin_job_exists;
const button = isEnabled ? (
<EuiLink onClick={handleClick} color={'text'}>
<EuiText size="s" color={'text'}>
{buttonContent}
</EuiText>
</EuiLink>
) : (
<EuiText size="s" color={'subdued'} css={{ fontWeight: 300 }}>
{buttonContent}
</EuiText>
);

if (isEnabled) {
return button;
} else {
return (
<EuiToolTip
position="top"
content={
<FormattedMessage
id="xpack.ml.trainedModels.modelsList.viewTrainingDataActionTooltip"
defaultMessage="Related data frame analytics job was not found."
/>
}
>
{button}
</EuiToolTip>
);
}
};

export function useModelActions({
onDfaTestAction,
onTestAction,
Expand All @@ -54,7 +129,6 @@ export function useModelActions({
}): Array<Action<ModelItem>> {
const {
services: {
application: { navigateToUrl },
overlays,
theme,
i18n: i18nStart,
Expand Down Expand Up @@ -130,40 +204,11 @@ export function useModelActions({
return useMemo(
() => [
{
name: i18n.translate('xpack.ml.trainedModels.modelsList.viewTrainingDataActionLabel', {
defaultMessage: 'View training data',
}),
description: i18n.translate(
'xpack.ml.trainedModels.modelsList.viewTrainingDataActionLabel',
{
defaultMessage: 'View training data',
}
),
icon: 'visTable',
type: 'icon',
available: (item) => !!item.metadata?.analytics_config?.id,
onClick: async (item) => {
if (item.metadata?.analytics_config === undefined) return;

const analysisType = getAnalysisType(
item.metadata?.analytics_config.analysis
) as DataFrameAnalysisConfigType;

const url = await urlLocator.getUrl({
page: ML_PAGES.DATA_FRAME_ANALYTICS_EXPLORATION,
pageState: {
jobId: item.metadata?.analytics_config.id as string,
analysisType,
...(analysisType === 'classification' || analysisType === 'regression'
? {
queryText: `${item.metadata?.analytics_config.dest.results_field}.is_training : true`,
}
: {}),
},
});

await navigateToUrl(url);
render: (item) => {
return <ViewTrainingDataAction item={item} />;
},
available: (item) => !!item.metadata?.analytics_config?.id,
enabled: (item) => item.origin_job_exists === true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a tooltip to show why the action is disabled.

Copy link
Contributor Author

@alvarezmelissa87 alvarezmelissa87 Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 6d8a4e5
Happy to get suggestions on the copy to use.
image

isPrimary: true,
},
{
Expand Down Expand Up @@ -606,7 +651,6 @@ export function useModelActions({
isLoading,
modelAndDeploymentIds,
navigateToPath,
navigateToUrl,
onDfaTestAction,
onLoading,
onModelDeployRequest,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ export type ModelItem = TrainedModelConfigResponse & {
type?: string[];
stats?: Stats & { deployment_stats: TrainedModelDeploymentStatsResponse[] };
pipelines?: ModelPipelines['pipelines'] | null;
origin_job_exists?: boolean;
deployment_ids: string[];
putModelConfig?: object;
state: ModelState;
Expand Down
34 changes: 32 additions & 2 deletions x-pack/plugins/ml/server/routes/trained_models.ts
Original file line number Diff line number Diff line change
Expand Up @@ -190,11 +190,41 @@ export function trainedModelsRoutes(
// we don't need to fill kibana's log with these messages.
mlLog.debug(e);
}
const enabledFeatures = getEnabledFeatures();
try {
if (enabledFeatures.dfa) {
const jobIds = result.map((model) => {
let id = model.metadata?.analytics_config?.id;
if (id) {
id = `${id}*`;
Copy link
Member

@jgowdyelastic jgowdyelastic Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be modifying the IDs. This could lead to unexpected behaviour. e.g. if we have two jobs foo and foo2. wildcarding foo* will match both.

In my previous suggestion I overlooked the fact that comma separating the IDs to pass to getDataFrameAnalytics will throw a 404 if any of them are missing.

I still don't think we should call getDataFrameAnalytics in a loop and so I think the simplest solution would be to fetch all DFA jobs and just loop over them looking for the IDs we want.
That call isn't expensive. We could also be clever and only use the job ID if there is only one model in the results. As it's likely this endpoint will be called with either one model ID or no model IDs.

I had a go at writing up what I'm thinking. I got carried away trying to make it performant.

const filteredModels = filterForEnabledFeatureModels<TrainedModelConfigResponse>(
  result,
  getEnabledFeatures()
);
const dfaJobIdMap = filteredModels.reduce<Record<string, string>>((c, m) => {
  const id = m.metadata?.analytics_config?.id;
  if (id !== undefined) {
    c[m.model_id] = id;
  }
  return c;
}, {});

const jobIds = Object.values(dfaJobIdMap);

if (jobIds.length === 0) {
  // return early, there are no dfa jobs
  return response.ok({
    body: filteredModels,
  });
}

let dfaJobs: estypes.MlDataframeAnalyticsSummary[] = [];
try {
  const jobs =
    jobIds.length === 1
      ? await mlClient.getDataFrameAnalytics({
          id: jobIds[0],
        })
      : await mlClient.getDataFrameAnalytics();

  dfaJobs = jobs.data_frame_analytics;
} catch (e) {
  //
}

for (const model of filteredModels) {
  const dfaJob = dfaJobs.find((j) => j.id === dfaJobIdMap[model.model_id]);
  model.origin_job_exists = dfaJob !== undefined;
}

return response.ok({
  body: filteredModels,
});

Also updating filterForEnabledFeatureModels to add a generic type to avoid type issues

export function filterForEnabledFeatureModels<
  T extends TrainedModelConfigResponse | estypes.MlTrainedModelConfig
>(models: T[], enabledFeatures: MlFeatures) {
  ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed this offline and can't think of a situation where adding * to each job might cause the wrong job to be matched.
I'm still not a fan of adding these * characters to work around the fact the es endpoint will throw a 404 if one job can't be found. But there's not a compelling reason to demand this change.

}
return id;
});
const filteredJobIds = jobIds.filter((id) => id !== undefined);

const body = filterForEnabledFeatureModels(result, getEnabledFeatures());
const { data_frame_analytics: jobs } = await mlClient.getDataFrameAnalytics({
jgowdyelastic marked this conversation as resolved.
Show resolved Hide resolved
id: filteredJobIds.join(','),
jgowdyelastic marked this conversation as resolved.
Show resolved Hide resolved
allow_no_match: true,
});

jobs.forEach(({ id }) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be looping through models, not jobs. We need to set false for all job ids that aren't found, and by looping through jobs only we won't know if the jobs hasn't been foun

Copy link
Contributor Author

@alvarezmelissa87 alvarezmelissa87 Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the job hasn't been returned then the origin_job_exists property wouldn't be added to the model - though if we want to explicitly set it to 'false' when it's not found then I agree that looping through the models makes more sense.

I was originally thinking that it would be more efficient to just loop through the returned jobs since that would mean maybe we wouldn't need to go through all the models but happy to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to loop through the filtered models to ensure all dfa models get the origin_job_exists property set explicitly in 50ffcfd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally thinking that it would be more efficient to just loop through the returned jobs

I was actually thinking that to make this more efficient we could keep a list of the dfa models when identifying them in the reduce and only loop through those later on. e.g.

const dfaModels = [];
const jobIdsString = filteredModels.reduce(
  (jobIdsStr: string, currentModel: TrainedModelConfigResponse, idx: number) => {
    if (isTrainedModelConfigResponse(currentModel)) {
      dfaModels.push(currentModel);
      ....

But it's not needed, the time difference will be milliseconds

const model = result.find(
Copy link
Member

@jgowdyelastic jgowdyelastic Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to explicitly set to true or false, not just true for all dfa models.

Suggested change
const model = result.find(
if (m?.analytics_config?.id !== undefined) {
// if this is a dfa model, set origin_job_exists
model.origin_job_exists = result.find((m) => id === m.analytics_config.id) !== undefined;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would only set it to true if a job id returned from the check matched the job id on the model. But agree that we should be setting it explicitly to false instead of just not adding the property at all and relying on falsey-ness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to set explicitly in 50ffcfd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly yeah,
origin_job_exists === undefined -> this is not a DFA model
origin_job_exists === true -> this a DFA model and the job exists
origin_job_exists === false -> this is a DFA model and the job no longer exits.
Falsey-ness is evil :D

(modelWithJob) => id === modelWithJob.metadata?.analytics_config?.id
);

if (model) {
model.origin_job_exists = true;
}
});
}
} catch (e) {
// Swallow error to prevent blocking trained models result
jgowdyelastic marked this conversation as resolved.
Show resolved Hide resolved
peteharverson marked this conversation as resolved.
Show resolved Hide resolved
}

const filteredModels = filterForEnabledFeatureModels(result, enabledFeatures);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can loop over the filteredModels rather than result to remove the need to explicitly check for enabledFeatures.dfa as all dfa jobs will have been removed if dfa is not enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - that makes sense - updated in 50ffcfd


return response.ok({
body,
body: filteredModels,
});
} catch (e) {
return response.customError(wrapError(e));
Expand Down