feat(glue-alpha): add job run queuing to Glue job (#31830)
### Issue # (if applicable)

Closes #31826 

### Reason for this change

[Job](https://docs.aws.amazon.com/cdk/api/v2/docs/@aws-cdk_aws-glue-alpha.Job.html) within [@aws-cdk/aws-glue-alpha](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-glue-alpha-readme.html) does not currently include the [jobRunQueuingEnabled](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_glue.CfnJob.html#jobrunqueuingenabled) property of the [CfnJob](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_glue.CfnJob.html) within [aws-cdk-lib/aws-glue](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_glue-readme.html). Setting this property currently requires a [raw override](https://docs.aws.amazon.com/cdk/v2/guide/cfn_layer.html#develop-customize-override).

### Description of changes

Added `jobRunQueuingEnabled` to the construction properties for `Job`, along with validation that it is not enabled when the execution class is `FLEX` or when `maxRetries` is greater than zero (see the [AWS blog post introducing job queuing](https://aws.amazon.com/blogs/big-data/introducing-job-queuing-to-scale-your-aws-glue-workloads/)).
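
The two validation rules can be sketched in isolation as below. This is a minimal, standalone illustration rather than the code in `job.ts`: the `QueuingRelevantProps` type and the `validateJobRunQueuing` function are hypothetical names introduced here, the function collects messages instead of throwing, and it omits the check in the actual change that skips validation when `maxRetries` is an unresolved token.

```typescript
// Hypothetical, simplified stand-ins for the relevant JobProps fields.
type ExecutionClass = 'FLEX' | 'STANDARD';

interface QueuingRelevantProps {
  readonly jobRunQueuingEnabled?: boolean;
  readonly executionClass?: ExecutionClass;
  readonly maxRetries?: number;
}

// Returns the validation messages for a set of props; an empty array means valid.
// The messages mirror the error strings added in this change.
function validateJobRunQueuing(props: QueuingRelevantProps): string[] {
  const errors: string[] = [];

  // Both rules only apply when queuing is explicitly enabled.
  if (props.jobRunQueuingEnabled !== true) {
    return errors;
  }

  // Rule 1: queuing cannot be combined with the FLEX execution class.
  if (props.executionClass === 'FLEX') {
    errors.push('FLEX ExecutionClass is only available if job run queuing is disabled');
  }

  // Rule 2: queuing requires maxRetries to be unset or 0.
  if (props.maxRetries !== undefined && props.maxRetries > 0) {
    errors.push(
      `Maximum retries was set to ${props.maxRetries}, must be set to 0 with job run queuing enabled`,
    );
  }

  return errors;
}
```

With this sketch, enabling queuing alongside `maxRetries: 0` yields no messages, while enabling it with `executionClass: 'FLEX'` and `maxRetries: 2` yields both.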

### Description of how you validated changes

Unit tests and an integration test.

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
jdebuseamazon authored Nov 4, 2024
1 parent be6a964 commit 5fca268
Showing 10 changed files with 425 additions and 22 deletions.
18 changes: 18 additions & 0 deletions packages/@aws-cdk/aws-glue-alpha/README.md
@@ -127,6 +127,24 @@ The `sparkUI` property also allows the specification of an s3 bucket and a bucke

See [documentation](https://docs.aws.amazon.com/glue/latest/dg/add-job.html) for more information on adding jobs in Glue.

### Enable Job Run Queuing

AWS Glue job queuing monitors your account level quotas and limits. If quotas or limits are insufficient to start a Glue job run, AWS Glue will automatically queue the job and wait for limits to free up. Once limits become available, AWS Glue will retry the job run. Glue jobs will queue for limits like max concurrent job runs per account, max concurrent Data Processing Units (DPU), and resource unavailable due to IP address exhaustion in Amazon Virtual Private Cloud (Amazon VPC).

Enable job run queuing by setting the `jobRunQueuingEnabled` property to `true`.

```ts
new glue.Job(this, 'EnableRunQueuing', {
  jobName: 'EtlJobWithRunQueuing',
  executable: glue.JobExecutable.pythonEtl({
    glueVersion: glue.GlueVersion.V4_0,
    pythonVersion: glue.PythonVersion.THREE,
    script: glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')),
  }),
  jobRunQueuingEnabled: true,
});
```

## Connection

A `Connection` allows Glue jobs, crawlers and development endpoints to access certain types of data stores. For example, to create a network connection to connect to a data source within a VPC:
18 changes: 18 additions & 0 deletions packages/@aws-cdk/aws-glue-alpha/lib/job.ts
@@ -502,6 +502,16 @@ export interface JobProps {
   */
  readonly description?: string;

  /**
   * Specifies whether job run queuing is enabled for the job runs for this job.
   * A value of true means job run queuing is enabled for the job runs.
   * If false or not populated, the job runs will not be considered for queueing.
   * If this field does not match the value set in the job run, then the value from the job run field will be used.
   *
   * @default - no job run queuing
   */
  readonly jobRunQueuingEnabled?: boolean;

  /**
   * The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs.
   * Cannot be used for Glue version 2.0 and later - workerType and workerCount should be used instead.
@@ -722,6 +732,9 @@ export class Job extends JobBase {
      if (props.workerType && (props.workerType !== WorkerType.G_1X && props.workerType !== WorkerType.G_2X)) {
        throw new Error('FLEX ExecutionClass is only available for WorkerType G_1X or G_2X');
      }
      if (props.jobRunQueuingEnabled === true) {
        throw new Error('FLEX ExecutionClass is only available if job run queuing is disabled');
      }
    }

    let maxCapacity = props.maxCapacity;
@@ -743,6 +756,10 @@ export class Job extends JobBase {
      throw new Error('Both workerType and workerCount must be set');
    }

    if (props.jobRunQueuingEnabled === true && props.maxRetries !== undefined && !cdk.Token.isUnresolved(props.maxRetries) && props.maxRetries > 0) {
      throw new Error(`Maximum retries was set to ${props.maxRetries}, must be set to 0 with job run queuing enabled`);
    }

    const jobResource = new CfnJob(this, 'Resource', {
      name: props.jobName,
      description: props.description,
@@ -756,6 +773,7 @@ export class Job extends JobBase {
      glueVersion: executable.glueVersion.name,
      workerType: props.workerType?.name,
      numberOfWorkers: props.workerCount,
      jobRunQueuingEnabled: props.jobRunQueuingEnabled,
      maxCapacity: props.maxCapacity,
      maxRetries: props.maxRetries,
      executionClass: props.executionClass,
@@ -1754,6 +1754,127 @@
},
"WorkerType": "G.1X"
}
},
"EtlJobWithRunQueuingServiceRole33547334": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Statement": [
{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Principal": {
"Service": "glue.amazonaws.com"
}
}
],
"Version": "2012-10-17"
},
"ManagedPolicyArns": [
{
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":iam::aws:policy/service-role/AWSGlueServiceRole"
]
]
}
]
}
},
"EtlJobWithRunQueuingServiceRoleDefaultPolicy5725F511": {
"Type": "AWS::IAM::Policy",
"Properties": {
"PolicyDocument": {
"Statement": [
{
"Action": [
"s3:GetBucket*",
"s3:GetObject*",
"s3:List*"
],
"Effect": "Allow",
"Resource": [
{
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":s3:::",
{
"Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
},
"/*"
]
]
},
{
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":s3:::",
{
"Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
}
]
]
}
]
}
],
"Version": "2012-10-17"
},
"PolicyName": "EtlJobWithRunQueuingServiceRoleDefaultPolicy5725F511",
"Roles": [
{
"Ref": "EtlJobWithRunQueuingServiceRole33547334"
}
]
}
},
"EtlJobWithRunQueuingA1B098B5": {
"Type": "AWS::Glue::Job",
"Properties": {
"Command": {
"Name": "glueetl",
"PythonVersion": "3",
"ScriptLocation": {
"Fn::Join": [
"",
[
"s3://",
{
"Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
},
"/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py"
]
]
}
},
"DefaultArguments": {
"--job-language": "python"
},
"GlueVersion": "4.0",
"JobRunQueuingEnabled": true,
"Name": "EtlJobWithRunQueuing",
"Role": {
"Fn::GetAtt": [
"EtlJobWithRunQueuingServiceRole33547334",
"Arn"
]
}
}
}
},
"Parameters": {