diff --git a/.doc_gen/metadata/sagemaker_metadata.yaml b/.doc_gen/metadata/sagemaker_metadata.yaml index 4ce7481215e..7cccd2cc07e 100644 --- a/.doc_gen/metadata/sagemaker_metadata.yaml +++ b/.doc_gen/metadata/sagemaker_metadata.yaml @@ -388,6 +388,9 @@ sagemaker_Scenario_GettingStarted: sagemaker_Scenario_Pipelines: title: Get started with &SM; geospatial jobs in a pipeline using an &AWS; SDK title_abbrev: Get started with geospatial jobs and pipelines + guide_topic: + title: Create and run SageMaker pipelines using &AWS; SDKs on Community.aws + url: https://community.aws/posts/create-and-run-sagemaker-pipelines-using-aws-sdks synopsis_list: - Set up resources for a pipeline. - Set up a pipeline that executes a geospatial job. diff --git a/.github/pre_validate/pre_validate.py b/.github/pre_validate/pre_validate.py index c30405d15a6..3357d7bacec 100644 --- a/.github/pre_validate/pre_validate.py +++ b/.github/pre_validate/pre_validate.py @@ -86,8 +86,6 @@ 'sample_cert.pem', 'sample_private_key.pem', 'sample_saml_metadata.xml', - 'GeoSpatialPipeline.json', - 'latlongtest.csv', } # Media file types. diff --git a/dotnetv3/SageMaker/Scenarios/README.md b/dotnetv3/SageMaker/Scenarios/README.md index 21078c125ce..10c757aff76 100644 --- a/dotnetv3/SageMaker/Scenarios/README.md +++ b/dotnetv3/SageMaker/Scenarios/README.md @@ -4,32 +4,49 @@ This scenario demonstrates how to work with Amazon SageMaker pipelines and geospatial jobs. +### Amazon SageMaker Pipelines A [SageMaker pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html) is a series of -interconnected steps that can be used to automate machine learning workflows. You can create and run pipelines from SageMaker Studio by using Python, but you can also do this by using AWS SDKs in other +interconnected steps that can be used to automate machine learning workflows. Pipelines use interconnected steps and shared parameters to support repeatable workflows that can be customized for your specific use case. You can create and run pipelines from SageMaker Studio using Python, but you can also do this by using AWS SDKs in other languages. Using the SDKs, you can create and run SageMaker pipelines and also monitor operations for them. -### Pipeline steps -This example pipeline includes an [AWS Lambda step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-lambda) -and a [callback step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-callback). -Both steps are processed by the same example Lambda function. +### Explore the scenario +This example scenario demonstrates using AWS Lambda and Amazon Simple Queue Service (Amazon SQS) as part of an Amazon SageMaker pipeline. The pipeline itself executes a geospatial job to reverse geocode a sample set of coordinates into human-readable addresses. Input and output files are located in an Amazon Simple Storage Service (Amazon S3) bucket. -This Lambda code is included as part of this example, with the following functionality: -- Starts the SageMaker Vector Enrichment Job with the provided job configuration. +![Workflow image](../../../workflows/sagemaker_pipelines/resources/workflow.png) + +When you run the example console application, you can execute the following steps: + +- Create the AWS resources and roles needed for the pipeline. +- Create the AWS Lambda function. +- Create the SageMaker pipeline. +- Upload an input file into an S3 bucket. +- Execute the pipeline and monitor its status. 
+- Display some output from the output file.
+- Clean up the pipeline resources.
+
+#### Pipeline steps
+[Pipeline steps](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html) define the actions and relationships of the pipeline operations. The pipeline in this example includes an [AWS Lambda step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-lambda)
+and a [callback step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-callback).
+Both steps are processed by the same example Lambda function.
+
+The Lambda function handler is included as part of the example, with the following functionality:
+- Starts a [SageMaker Vector Enrichment Job](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) with the provided job configuration.
+- Processes Amazon SQS queue messages from the SageMaker pipeline.
 - Starts the export function with the provided export configuration.
-- Processes Amazon SQS queue messages from the SageMaker pipeline.
+- Completes the pipeline when the export is complete.
 
-![Pipeline image](../Images/Pipeline.PNG)
+![Pipeline image](../../../workflows/sagemaker_pipelines/resources/pipeline.png)
 
-### Pipeline parameters
+#### Pipeline parameters
 The example pipeline uses [parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html)
 that you can reference throughout the steps. You can also use the parameters to change
-values between runs. In this example, the parameters are used to set the Amazon Simple Storage Service (Amazon S3)
-locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline.
-The example demonstrates how to set and access these parameters.
+values between runs and control the input and output settings. In this example, the parameters are used to set the Amazon Simple Storage Service (Amazon S3)
+locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline.
+The example demonstrates how to set and access these parameters before executing the pipeline by using an SDK.
 
-### Geospatial jobs
+#### Geospatial jobs
 A SageMaker pipeline can be used for model training, setup, testing, or validation. This example uses a simple job
-for demonstration purposes: a [Vector Enrichment Job (VEJ)](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) that processes a set of coordinates to produce human-readable
-addresses powered by Amazon Location Service. Other types of jobs could be substituted in the pipeline instead.
+for demonstration purposes: a [Vector Enrichment Job (VEJ)](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) that processes a set of coordinates to produce human-readable
+addresses powered by Amazon Location Service. Other types of jobs can be substituted in the pipeline.
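+
+For reference, the sample input file for this scenario (`latlongtest.csv`) is a plain CSV file of coordinate pairs; the job appends the reverse-geocoded address columns to each row in the output file. The file begins like this:
+
+```
+Longitude,Latitude
+-149.8935557,61.21759217
+-149.9054948,61.19533942
+-149.7522,61.2297
+```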
## ⚠ Important diff --git a/dotnetv3/SageMaker/Scenarios/SageMakerScenario.csproj b/dotnetv3/SageMaker/Scenarios/SageMakerScenario.csproj index 43bb44b666e..4cffd598e19 100644 --- a/dotnetv3/SageMaker/Scenarios/SageMakerScenario.csproj +++ b/dotnetv3/SageMaker/Scenarios/SageMakerScenario.csproj @@ -8,6 +8,9 @@ + + Always + PreserveNewest @@ -21,6 +24,12 @@ + + + Always + + + diff --git a/javav2/usecases/workflow_sagemaker_pipes/Readme.md b/javav2/usecases/workflow_sagemaker_pipes/Readme.md index 96874da3191..fb83863044e 100644 --- a/javav2/usecases/workflow_sagemaker_pipes/Readme.md +++ b/javav2/usecases/workflow_sagemaker_pipes/Readme.md @@ -4,33 +4,49 @@ This scenario demonstrates how to work with Amazon SageMaker pipelines and geospatial jobs. -A [SageMaker pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html) is a series of -interconnected steps that can be used to automate machine learning workflows. You can create and run pipelines from SageMaker Studio by using Python, but you can also do this by using AWS SDKs in other +### Amazon SageMaker Pipelines +A [SageMaker pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html) is a series of +interconnected steps that can be used to automate machine learning workflows. Pipelines use interconnected steps and shared parameters to support repeatable workflows that can be customized for your specific use case. You can create and run pipelines from SageMaker Studio using Python, but you can also do this by using AWS SDKs in other languages. Using the SDKs, you can create and run SageMaker pipelines and also monitor operations for them. -### Pipeline steps -This example pipeline includes an [AWS Lambda step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-lambda) -and a [callback step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-callback). -Both steps are processed by the same example Lambda function. +### Explore the scenario +This example scenario demonstrates using AWS Lambda and Amazon Simple Queue Service (Amazon SQS) as part of an Amazon SageMaker pipeline. The pipeline itself executes a geospatial job to reverse geocode a sample set of coordinates into human-readable addresses. Input and output files are located in an Amazon Simple Storage Service (Amazon S3) bucket. -This Lambda code is included as part of this example, with the following functionality: -- Starts the SageMaker Vector Enrichment Job with the provided job configuration. +![Workflow image](../../../workflows/sagemaker_pipelines/resources/workflow.png) + +When you run the example console application, you can execute the following steps: + +- Create the AWS resources and roles needed for the pipeline. +- Create the AWS Lambda function. +- Create the SageMaker pipeline. +- Upload an input file into an S3 bucket. +- Execute the pipeline and monitor its status. +- Display some output from the output file. +- Clean up the pipeline resources. + +#### Pipeline steps +[Pipeline steps](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html) define the actions and relationships of the pipeline operations. The pipeline in this example includes an [AWS Lambda step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-lambda) +and a [callback step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-callback). +Both steps are processed by the same example Lambda function. 
+
+The Lambda function handler is included as part of the example, with the following functionality:
+- Starts a [SageMaker Vector Enrichment Job](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) with the provided job configuration.
+- Processes Amazon SQS queue messages from the SageMaker pipeline.
 - Starts the export function with the provided export configuration.
-- Processes Amazon Simple Queue Service (Amazon SQS) messages from the SageMaker pipeline.
+- Completes the pipeline when the export is complete.
 
-![AWS Tracking Application](images/pipes.png)
+![Pipeline image](../../../workflows/sagemaker_pipelines/resources/pipeline.png)
 
-### Pipeline parameters
+#### Pipeline parameters
 The example pipeline uses [parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html)
 that you can reference throughout the steps. You can also use the parameters to change
-values between runs. In this example, the parameters are used to set the Amazon Simple Storage Service (Amazon S3)
-locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline.
-The example demonstrates how to set and access these parameters.
+values between runs and control the input and output settings. In this example, the parameters are used to set the Amazon Simple Storage Service (Amazon S3)
+locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline.
+The example demonstrates how to set and access these parameters before executing the pipeline by using an SDK.
 
-### Geospatial jobs
+#### Geospatial jobs
 A SageMaker pipeline can be used for model training, setup, testing, or validation. This example uses a simple job
-for demonstration purposes: a [Vector Enrichment Job (VEJ)](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) that processes a set of coordinates to produce human-readable
-addresses powered by Amazon Location Service. Other types of jobs could be substituted in the pipeline instead.
-
+for demonstration purposes: a [Vector Enrichment Job (VEJ)](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) that processes a set of coordinates to produce human-readable
+addresses powered by Amazon Location Service. Other types of jobs can be substituted in the pipeline.
 ## ⚠ Important
 * Running this code might result in charges to your AWS account.
@@ -58,7 +74,7 @@ You must download and use these files to successfully run this code example:
 + GeoSpatialPipeline.json
 + latlongtest.csv
 
-These files are located on GitHub in this folder [sample_files](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/resources/sample_files).
+These files are located on GitHub in this [folder](../../../workflows/sagemaker_pipelines/resources).
 
 ### Java Lambda Function
 diff --git a/javav2/usecases/workflow_sagemaker_pipes/images/pipes.png b/javav2/usecases/workflow_sagemaker_pipes/images/pipes.png deleted file mode 100644 index f3215db0099..00000000000 Binary files a/javav2/usecases/workflow_sagemaker_pipes/images/pipes.png and /dev/null differ diff --git a/kotlin/usecases/workflow_sagemaker_pipes/Readme.md b/kotlin/usecases/workflow_sagemaker_pipes/Readme.md index b9a47486db7..735cabecdc9 100644 --- a/kotlin/usecases/workflow_sagemaker_pipes/Readme.md +++ b/kotlin/usecases/workflow_sagemaker_pipes/Readme.md @@ -4,32 +4,49 @@ This scenario demonstrates how to work with Amazon SageMaker pipelines and geospatial jobs.
-A [SageMaker pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html) is a series of -interconnected steps that can be used to automate machine learning workflows. You can create and run pipelines from SageMaker Studio by using Python, but you can also do this by using AWS SDKs in other +### Amazon SageMaker Pipelines +A [SageMaker pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html) is a series of +interconnected steps that can be used to automate machine learning workflows. Pipelines use interconnected steps and shared parameters to support repeatable workflows that can be customized for your specific use case. You can create and run pipelines from SageMaker Studio using Python, but you can also do this by using AWS SDKs in other languages. Using the SDKs, you can create and run SageMaker pipelines and also monitor operations for them. -### Pipeline steps -This example pipeline includes an [AWS Lambda step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-lambda) -and a [callback step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-callback). -Both steps are processed by the same example Lambda function. +### Explore the scenario +This example scenario demonstrates using AWS Lambda and Amazon Simple Queue Service (Amazon SQS) as part of an Amazon SageMaker pipeline. The pipeline itself executes a geospatial job to reverse geocode a sample set of coordinates into human-readable addresses. Input and output files are located in an Amazon Simple Storage Service (Amazon S3) bucket. -This Lambda code is included as part of this example, with the following functionality: -- Starts the SageMaker Vector Enrichment Job with the provided job configuration. +![Workflow image](../../../workflows/sagemaker_pipelines/resources/workflow.png) + +When you run the example console application, you can execute the following steps: + +- Create the AWS resources and roles needed for the pipeline. +- Create the AWS Lambda function. +- Create the SageMaker pipeline. +- Upload an input file into an S3 bucket. +- Execute the pipeline and monitor its status. +- Display some output from the output file. +- Clean up the pipeline resources. + +#### Pipeline steps +[Pipeline steps](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html) define the actions and relationships of the pipeline operations. The pipeline in this example includes an [AWS Lambda step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-lambda) +and a [callback step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-callback). +Both steps are processed by the same example Lambda function. + +The Lambda function handler is included as part of the example, with the following functionality: +- Starts a [SageMaker Vector Enrichment Job](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) with the provided job configuration. +- Processes Amazon SQS queue messages from the SageMaker pipeline. - Starts the export function with the provided export configuration. -- Processes Amazon Simple Queue Service (Amazon SQS) messages from the SageMaker pipeline. +- Completes the pipeline when the export is complete. 
-![AWS App](images/pipes.png)
+![Pipeline image](../../../workflows/sagemaker_pipelines/resources/pipeline.png)
 
-### Pipeline parameters
+#### Pipeline parameters
 The example pipeline uses [parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html)
 that you can reference throughout the steps. You can also use the parameters to change
-values between runs. In this example, the parameters are used to set the Amazon Simple Storage Service (Amazon S3)
-locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline.
-The example demonstrates how to set and access these parameters.
+values between runs and control the input and output settings. In this example, the parameters are used to set the Amazon Simple Storage Service (Amazon S3)
+locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline.
+The example demonstrates how to set and access these parameters before executing the pipeline by using an SDK.
 
-### Geospatial jobs
+#### Geospatial jobs
 A SageMaker pipeline can be used for model training, setup, testing, or validation. This example uses a simple job
-for demonstration purposes: a [Vector Enrichment Job (VEJ)](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) that processes a set of coordinates to produce human-readable
-addresses powered by Amazon Location Service. Other types of jobs could be substituted in the pipeline instead.
+for demonstration purposes: a [Vector Enrichment Job (VEJ)](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) that processes a set of coordinates to produce human-readable
+addresses powered by Amazon Location Service. Other types of jobs can be substituted in the pipeline.
 
 ## ⚠ Important
@@ -59,7 +76,7 @@ To successfully run this code example, you must download and use the following f
 + GeoSpatialPipeline.json
 + latlongtest.csv
 
-These files are located on GitHub in [sample_files](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/resources/sample_files).
+These files are located on GitHub in this [folder](../../../workflows/sagemaker_pipelines/resources).
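+
+For reference, each entry in the `Parameters` section of `GeoSpatialPipeline.json` declares a name, a type, and a default value that the scenario sets at execution time, as in this excerpt from the file:
+
+```
+{
+  "Name": "parameter_queue_url",
+  "Type": "String",
+  "DefaultValue": ""
+}
+```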
### Kotlin Lambda function diff --git a/kotlin/usecases/workflow_sagemaker_pipes/images/pipes.png b/kotlin/usecases/workflow_sagemaker_pipes/images/pipes.png deleted file mode 100644 index f3215db0099..00000000000 Binary files a/kotlin/usecases/workflow_sagemaker_pipes/images/pipes.png and /dev/null differ diff --git a/resources/sample_files/GeoSpatialPipeline.json b/resources/sample_files/GeoSpatialPipeline.json deleted file mode 100644 index 36dccff06fe..00000000000 --- a/resources/sample_files/GeoSpatialPipeline.json +++ /dev/null @@ -1,162 +0,0 @@ -{ - "Version": "2020-12-01", - "Metadata": {}, - "Parameters": [ - { - "Name": "parameter_execution_role", - "Type": "String", - "DefaultValue": "" - }, - { - "Name": "parameter_region", - "Type": "String", - "DefaultValue": "us-west-2" - }, - { - "Name": "parameter_queue_url", - "Type": "String", - "DefaultValue": "" - }, - { - "Name": "parameter_vej_input_config", - "Type": "String", - "DefaultValue": "" - }, - { - "Name": "parameter_vej_export_config", - "Type": "String", - "DefaultValue": "" - }, - { - "Name": "parameter_step_1_vej_config", - "Type": "String", - "DefaultValue": "" - } - ], - "PipelineExperimentConfig": { - "ExperimentName": { - "Get": "Execution.PipelineName" - }, - "TrialName": { - "Get": "Execution.PipelineExecutionId" - } - }, - "Steps": [ - { - "Name": "vej-processing-step", - "Type": "Lambda", - "Arguments": { - "role": { - "Get": "Parameters.parameter_execution_role" - }, - "region": { - "Get": "Parameters.parameter_region" - }, - "vej_input_config": { - "Get": "Parameters.parameter_vej_input_config" - }, - "vej_config": { - "Get": "Parameters.parameter_step_1_vej_config" - }, - "vej_name": "vej-pipeline-step-1" - }, - "FunctionArn": "*FUNCTION_ARN*", - "OutputParameters": [ - { - "OutputName": "statusCode", - "OutputType": "String" - }, - { - "OutputName": "vej_arn", - "OutputType": "String" - } - ] - }, - { - "Name": "vej-callback-step", - "Type": "Callback", - "Arguments": { - "role": { - "Get": "Parameters.parameter_execution_role" - }, - "region": { - "Get": "Parameters.parameter_region" - }, - "vej_arn": { - "Get": "Steps.vej-processing-step.OutputParameters['vej_arn']" - } - }, - "DependsOn": [ - "vej-processing-step" - ], - "SqsQueueUrl": { - "Get": "Parameters.parameter_queue_url" - }, - "OutputParameters": [ - { - "OutputName": "vej_status", - "OutputType": "String" - } - ] - }, - { - "Name": "export-vej-step", - "Type": "Lambda", - "Arguments": { - "vej_arn": { - "Get": "Steps.vej-processing-step.OutputParameters['vej_arn']" - }, - "role": { - "Get": "Parameters.parameter_execution_role" - }, - "region": { - "Get": "Parameters.parameter_region" - }, - "vej_export_config": { - "Get": "Parameters.parameter_vej_export_config" - } - }, - "DependsOn": [ - "vej-callback-step" - ], - "FunctionArn": "*FUNCTION_ARN*", - "OutputParameters": [ - { - "OutputName": "statusCode", - "OutputType": "String" - }, - { - "OutputName": "vej_arn", - "OutputType": "String" - } - ] - }, - { - "Name": "export-vej-callback", - "Type": "Callback", - "Arguments": { - "role": { - "Get": "Parameters.parameter_execution_role" - }, - "region": { - "Get": "Parameters.parameter_region" - }, - "vej_arn": { - "Get": "Steps.export-vej-step.OutputParameters['vej_arn']" - } - }, - "DependsOn": [ - "export-vej-step" - ], - "SqsQueueUrl": { - "Get": "Parameters.parameter_queue_url" - }, - "OutputParameters": [ - { - "OutputName": "statusJob", - "OutputType": "String" - } - ] - } - ] -} diff --git a/resources/sample_files/latlongtest.csv 
b/resources/sample_files/latlongtest.csv deleted file mode 100644 index 2dcd24a80fd..00000000000 --- a/resources/sample_files/latlongtest.csv +++ /dev/null @@ -1,200 +0,0 @@ -Longitude,Latitude --149.8935557,61.21759217 --149.9054948,61.19533942 --149.7522,61.2297 --149.8643361,61.19525062 --149.8379726,61.13751355 --149.9092788,61.13994658 --149.7364877,61.19533265 --149.8211,61.2156 --149.8445832,61.13806145 --149.9728678,61.176693 --149.8638034,61.14473454 --149.8359225,61.18081843 --149.88695,61.180763 --149.552503,61.345921 --149.5708432,61.32360721 --149.571083,61.3252 --147.690082,64.8522227 --147.7044254,64.85064492 --147.8222994,64.83696195 --151.53123,59.64204 --134.5621,58.3601 --131.680747,55.35173351 --152.3609342,57.81208717 --149.1183292,61.60093654 --149.4331873,60.12892832 --151.0417651,60.49443906 --151.051614,60.489986 --149.40958,61.57691 --86.827626,33.2152 --86.949656,34.785059 --86.78842024,33.50904809 --86.826127,33.447522 --86.7102897,33.42956176 --86.8018,33.5062 --86.8113765,33.37312732 --86.66982766,33.41838757 --86.7011,33.5868 --86.75620223,33.50263454 --86.79644474,33.50040126 --86.799091,33.504689 --87.898099,30.601019 --87.683099,30.377666 --87.682999,30.304936 --86.80917035,33.38072249 --86.69949,33.42254 --86.665699,34.7407 --86.68134336,34.74505961 --86.55230743,34.68203745 --86.58647,34.769819 --86.575905,34.7266 --86.57429381,34.69111819 --86.57919788,34.72002697 --88.04221,30.68976 --88.143477,30.697309 --88.12661,30.673693 --88.2255057,30.67333454 --88.196698,30.681004 --88.072584,30.680498 --88.14309381,30.67621703 --88.130514,30.687603 --88.22533,30.69911 --88.122539,30.6761 --88.190499,30.661638 --86.181656,32.338634 --86.183853,32.3825 --86.18089,32.3399 --86.26356801,32.35239312 --86.21922302,32.35217285 --86.773962,33.484513 --85.40687104,32.6280937 --85.3928,32.6548 --85.842,33.5927 --85.82810247,33.60858102 --87.852027,30.669385 --86.63801947,33.60234164 --86.64673,33.59541 --87.51826693,33.20120805 --87.54677614,33.21231052 --86.797593,33.421892 --86.735925,33.455541 --94.21321152,36.35696718 --92.53446,34.60255 --92.3874,35.0785 --92.41866,35.09135 --92.43947287,35.11074111 --94.145299,36.123007 --94.36294976,35.3587282 --93.06209925,34.45764525 --90.66704,35.821236 --92.38191429,34.74577386 --92.4126917,34.75204348 --92.21905583,34.72769385 --92.4149,34.798 --92.379005,34.75911 --92.35743226,34.77140489 --92.40416376,34.74810967 --92.33723,34.770222 --92.341099,34.753007 --92.2551,34.7993 --92.22356,34.789467 --94.18313137,36.31955482 --94.179842,36.3349 --94.1589,36.2834 --94.1879,36.304601 --94.356399,35.466507 --112.1347187,33.86405429 --112.1344948,33.86381987 --111.57396,33.415339 --112.287081,33.493493 --112.3050986,33.43536059 --112.2753599,33.46464844 --114.5864543,35.12158199 --111.8655,34.5658 --111.740086,32.89571681 --111.806444,33.21903128 --111.9124763,33.31998926 --111.789399,33.234573 --111.8228735,33.32087638 --111.9015716,33.30567804 --111.8585749,33.23378805 --111.84192,33.30353 --111.84123,33.28584 --111.9118504,33.31999187 --111.9684795,33.31760813 --111.9482622,33.30531037 --111.8964975,33.30569847 --111.858732,33.34972709 --111.8746327,33.3060856 --111.8764317,33.33595832 --111.8600981,33.26181573 --112.4521866,34.78282547 --112.0089493,34.72846222 --111.6638424,35.18716021 --111.6613874,35.18429883 --111.631,35.216557 --111.7398,35.3625 --111.661499,35.188047 --111.63142,35.194549 --114.597999,35.016058 --111.7299795,33.60857565 --111.7249286,33.61019636 --111.718697,33.575834 --111.6877649,33.36489156 
--111.7714833,33.3354678 --111.7897403,33.30648175 --111.7249218,33.37926425 --111.7557216,33.34905619 --111.8056644,33.32092108 --111.7507,33.364 --111.7590113,33.335633 --111.8095615,33.33537004 --112.166999,33.59640608 --112.159799,33.610699 --112.3556078,33.53860232 --112.2029872,33.67000491 --112.2374114,33.65280991 --112.1436192,33.69838093 --112.1863216,33.65394714 --112.1855156,33.61081108 --112.2176653,33.63835723 --112.224434,33.63960715 --112.3753049,33.49345981 --112.392599,33.451464 --112.3583262,33.48026748 --112.3580189,33.46160901 --110.9987224,31.85233448 --111.6862971,33.28825881 --114.0374472,35.23700997 --114.337554,34.478282 --114.342988,34.47318734 --109.955309,34.14205054 --112.1692939,33.37746449 --111.239105,32.4409 --111.2598,32.4354 --111.0500991,32.3374794 --111.6840687,33.46544574 --111.599266,33.378662 --111.6261939,33.37899985 --111.8051231,33.42328203 --111.6840841,33.45479248 --111.7718998,33.4226701 --111.7532534,33.39379889 --111.6870689,33.33250477 --111.6375432,33.37909008 --111.8747,33.3814 --111.6361908,33.37831827 --111.6840636,33.46678288 --111.7538886,33.38021706 --111.755104,33.3937 --111.7874795,33.37902519 --111.687991,33.37826704 --111.874783,33.38388487 --111.861856,33.39311451 --111.8054634,33.38418881 --110.9629869,32.4272998 --110.939853,32.41179186 --114.279904,34.154358 --111.31757,34.24052 --112.2888222,33.56562937 diff --git a/workflows/sagemaker_pipelines/README.md b/workflows/sagemaker_pipelines/README.md new file mode 100644 index 00000000000..ddf00904fa2 --- /dev/null +++ b/workflows/sagemaker_pipelines/README.md @@ -0,0 +1,31 @@ +# Create and run a SageMaker geospatial pipeline using an AWS SDK + +## Overview + +This example scenario demonstrates using AWS Lambda and Amazon Simple Queue Service (Amazon SQS) as part of an Amazon [SageMaker pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html). The pipeline itself executes a [geospatial job](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) to reverse geocode a sample set of coordinates into human-readable addresses. Input and output files are located in an Amazon Simple Storage Service (Amazon S3) bucket. + +When you run the example console application, you can execute the following steps: + +- Create the AWS resources and roles needed for the pipeline. +- Create the AWS Lambda function. +- Create the SageMaker pipeline. +- Upload an input file into an S3 bucket. +- Execute the pipeline and monitor its status. +- Display some output from the output file. +- Clean up the pipeline resources. + +These steps are completed using AWS SDKs as part of an interactive demo that runs at a command prompt. + +## Implementations + +This example is implemented in the following languages: + +* [.NET](../../dotnetv3/SageMaker/Scenarios/README.md) +* [Java](../../javav2/usecases/workflow_sagemaker_pipes/Readme.md) +* [Kotlin](../../kotlin/usecases/workflow_sagemaker_pipes/Readme.md) + +--- + +Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
+
+SPDX-License-Identifier: Apache-2.0
\ No newline at end of file
diff --git a/workflows/sagemaker_pipelines/SPECIFICATION.md b/workflows/sagemaker_pipelines/SPECIFICATION.md new file mode 100644 index 00000000000..05e9c0cb26f --- /dev/null +++ b/workflows/sagemaker_pipelines/SPECIFICATION.md @@ -0,0 +1,332 @@
+# SageMaker workflow technical specification
+
+This document contains the technical specifications for the Amazon SageMaker Geospatial Pipelines example, a sample workflow that showcases SageMaker pipelines using SDKs.
+
+This document explains the following:
+
+- Application inputs and outputs
+- Underlying AWS components and their configurations
+- Implementation details and sample output
+- Troubleshooting information
+
+### Table of contents
+
+- [Architecture](#architecture)
+- [Common resources](#common-resources)
+- [Metadata](#metadata)
+- [Implementation](#implementation)
+- [Troubleshooting](#troubleshooting)
+
+## Architecture
+This workflow uses a predefined SageMaker pipeline to execute a geospatial job in SageMaker. The pipeline uses [pipeline steps](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html) to define the actions and relationships of the pipeline operations. The pipeline in this example includes an [AWS Lambda step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-lambda)
+and a [callback step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-callback).
+Both steps are processed by the same example Lambda function.
+
+The Lambda function handler should be written as part of the example, with the following functionality:
+- Starts a [SageMaker Vector Enrichment Job](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) with the provided job configuration.
+- Processes Amazon SQS queue messages from the SageMaker pipeline.
+- Starts the export function with the provided export configuration.
+- Completes the pipeline when the export is complete.
+
+This diagram represents the relationships between key components.
+![relational diagram](resources/workflow.png)
+
+Amazon SageMaker is a managed machine learning service. Developers can build and train machine learning models and deploy them into a production-ready hosted environment. This example focuses on the pipeline capabilities rather than the model training and building capabilities, because the pipeline capabilities are more likely to be useful to an SDK developer.
+
+The example uses a geospatial job because it allows for a fast processing time, and it's simple to verify that the pipeline executed correctly. We expect that the user would replace this job with processing steps of their own, while still using the SDK pipeline operations for creating or updating a pipeline, handling callback and execution steps in an AWS Lambda function, and using pipeline parameters to set up input and output.
+
+The geospatial job itself is a Vector Enrichment Job (VEJ) that reverse geocodes a set of coordinates. Other job types are much slower to complete, and this job type has an easy-to-read output. Note that you must use the **us-west-2** Region for this job type. This particular job type is powered by Amazon Location Service, although you will not need to call that service directly. You can read more [about geospatial capabilities here](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial.html).
+
+The AWS Lambda function handles the callback and the parameter-based queue messages from the pipeline. This example includes writing this Lambda function, deploying it as part of the pipeline, and connecting it to the SQS queue that the pipeline uses.
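+
+The following is a minimal sketch of that dispatch logic, written here in Java with the AWS SDK for Java v2 for illustration only. The input keys mirror the parameter names in the example pipeline definition; the class name and the helper methods that parse the callback token, job ARN, and export URI out of the input are hypothetical, and a real handler would implement them with a JSON library.
+
+```java
+import com.amazonaws.services.lambda.runtime.Context;
+import com.amazonaws.services.lambda.runtime.RequestHandler;
+import software.amazon.awssdk.services.sagemaker.SageMakerClient;
+import software.amazon.awssdk.services.sagemakergeospatial.SageMakerGeospatialClient;
+import software.amazon.awssdk.services.sagemakergeospatial.model.VectorEnrichmentJobStatus;
+
+import java.util.Map;
+
+// Illustrative sketch: dispatches on the keys present in the function input.
+public class PipelineFunctionHandler implements RequestHandler<Map<String, Object>, String> {
+
+    private final SageMakerClient sageMaker = SageMakerClient.create();
+    private final SageMakerGeospatialClient geospatial = SageMakerGeospatialClient.create();
+
+    @Override
+    public String handleRequest(Map<String, Object> input, Context context) {
+        if (input.containsKey("Records")) {
+            // Callback step: the pipeline posted a message to the SQS queue.
+            String token = tokenFromRecords(input);    // hypothetical parsing helpers
+            String jobArn = jobArnFromRecords(input);
+            VectorEnrichmentJobStatus status =
+                    geospatial.getVectorEnrichmentJob(r -> r.arn(jobArn)).status();
+            if (status == VectorEnrichmentJobStatus.COMPLETED) {
+                sageMaker.sendPipelineExecutionStepSuccess(r -> r.callbackToken(token));
+            } else if (status == VectorEnrichmentJobStatus.FAILED) {
+                sageMaker.sendPipelineExecutionStepFailure(r -> r
+                        .callbackToken(token)
+                        .failureReason("Job failed: " + jobArn));
+            } else {
+                context.getLogger().log("Job " + jobArn + " is still running: " + status);
+            }
+        } else if (input.containsKey("vej_export_config")) {
+            // Export step: start exporting the finished job results to Amazon S3.
+            String exportUri = exportUriFrom(input);   // hypothetical parsing helper
+            geospatial.exportVectorEnrichmentJob(r -> r
+                    .arn((String) input.get("vej_arn"))
+                    .executionRoleArn((String) input.get("role"))
+                    .outputConfig(o -> o.s3Data(s3 -> s3.s3Uri(exportUri))));
+        } else if (input.containsKey("vej_name")) {
+            // Lambda step: a real handler builds the input and job configuration from
+            // the remaining parameters and calls geospatial.startVectorEnrichmentJob(...).
+            context.getLogger().log("Starting job " + input.get("vej_name"));
+        }
+        return "Complete";
+    }
+
+    // Placeholders for the JSON parsing a real handler performs on the input.
+    private String tokenFromRecords(Map<String, Object> input) { throw new UnsupportedOperationException(); }
+    private String jobArnFromRecords(Map<String, Object> input) { throw new UnsupportedOperationException(); }
+    private String exportUriFrom(Map<String, Object> input) { throw new UnsupportedOperationException(); }
+}
+```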
+
+There are multiple ways to handle pipeline operations, but in the interest of consistency, the C# implementation is based on [this pipeline example reference](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-geospatial/geospatial-pipeline/assets/eoj_pipeline_lambda.py). This logic checks for the existence of parameters in the message to determine which type of processing to start. Other languages do not need to mimic the exact logic shown here; that functionality is left up to the language developer.
+
+The pipeline in this example is defined through a [JSON file](resources/GeoSpatialPipeline.json). Each language might want to name the steps and parameters in a way that makes sense for their implementation, but you can use the file here as a guide.
+
+## Common resources
+This example has a set of common resources that are stored in the [resources](resources) folder.
+- GeoSpatialPipeline.json defines the pipeline steps and parameters for the SageMaker pipeline.
+- latlongtest.csv is a sample set of coordinates for geocoding.
+- pipeline.png is a pipeline image to use in language-specific READMEs.
+- workflow.png is a workflow image to use in language-specific READMEs.
+
+## Metadata
+Service actions can either be pulled out as individual functions or can be incorporated into the scenario, but each service action must be included as an excerpt.
+
+### SageMaker actions
+- CreatePipeline
+- UpdatePipeline
+- StartPipelineExecution
+- DescribePipelineExecution
+- DeletePipeline
+- Hello Service
+
+### Metadata tags
+```
+sagemaker_Hello
+sagemaker_CreatePipeline
+sagemaker_ExecutePipeline
+sagemaker_DeletePipeline
+sagemaker_DescribePipelineExecution
+sagemaker_Scenario_Pipelines
+```
+
+## Implementation
+
+_Reminder:_ A scenario runs at a command prompt and prints the result of each service action to the user. Because of the choices in this workflow scenario, it must be run interactively.
+
+1. Set up any missing resources needed for the example if they don’t already exist.
    1. Create a Lambda role with the following: `iamClient CreateRole, AttachRolePolicy`
       1. AssumeRolePolicy:
       ```
       {
         Version: "2012-10-17",
         Statement: [
           {
             Effect: "Allow",
             Action: ["sts:AssumeRole"],
             Principal: { Service: ["lambda.amazonaws.com"] },
           },
         ],
       }
       ```
       1. ExecutionPolicy:
       ```
       {
         Version: "2012-10-17",
         Statement: [
           {
             Effect: "Allow",
             Action: [
               "sqs:ReceiveMessage",
               "sqs:DeleteMessage",
               "sqs:GetQueueAttributes",
               "logs:CreateLogGroup",
               "logs:CreateLogStream",
               "logs:PutLogEvents",
               "sagemaker-geospatial:StartVectorEnrichmentJob",
               "sagemaker-geospatial:GetVectorEnrichmentJob",
               "sagemaker:SendPipelineExecutionStepFailure",
               "sagemaker:SendPipelineExecutionStepSuccess",
               "sagemaker-geospatial:ExportVectorEnrichmentJob"
             ],
             Resource: "*",
           },
           {
             Effect: "Allow",
             Action: ["iam:PassRole"],
             Resource: `${pipelineExecutionRoleArn}`,
             Condition: {
               StringEquals: {
                 "iam:PassedToService": [
                   "sagemaker.amazonaws.com",
                   "sagemaker-geospatial.amazonaws.com",
                 ],
               },
             },
           },
         ],
       }
       ```

    1. Create a SageMaker role with the following: `iamClient CreateRole, AttachRolePolicy`
       1.
AssumeRolePolicy:
       ```
       {
         Version: "2012-10-17",
         Statement: [
           {
             Effect: "Allow",
             Action: ["sts:AssumeRole"],
             Principal: {
               Service: [
                 "sagemaker.amazonaws.com",
                 "sagemaker-geospatial.amazonaws.com",
               ],
             },
           },
         ],
       }
       ```
       1. ExecutionPolicy:
       ```
       {
         Version: "2012-10-17",
         Statement: [
           {
             Effect: "Allow",
             Action: ["lambda:InvokeFunction"],
             Resource: lambdaArn,
           },
           {
             Effect: "Allow",
             Action: ["s3:*"],
             Resource: [
               `arn:aws:s3:::${s3BucketName}`,
               `arn:aws:s3:::${s3BucketName}/*`,
             ],
           },
           {
             Effect: "Allow",
             Action: ["sqs:SendMessage"],
             Resource: sqsQueueArn,
           },
         ],
       }
       ```
    1. Create an SQS queue for the pipeline. `SqsClient CreateQueue, GetQueueUrl`
       You will need the queue URL for the pipeline execution.
    1. Create a bucket and upload a .csv file that includes Latitude and Longitude columns (see [the resources section](#common-resources)) for reverse geocoding. `s3Client CreateBucket, PutObject`
       1. You can add an /input directory for this file. The pipeline will create a /output directory for the output file.
1. Add a Lambda handler, with code included and written in your language, that handles the callback functionality, and connect it to the queue. If the Lambda already exists, you can prompt the user to choose whether to update it. Suggested timeout for the Lambda is 30 seconds. `lambdaClient CreateFunction, UpdateFunctionCode, ListEventSourceMappings, CreateEventSourceMapping`
    1. The Lambda performs the following tasks, based on the input:
       1. If queue records are present, processes the records to check the job status of the geospatial job.
          COMPLETED: call SendPipelineExecutionStepSuccess
          FAILED: call SendPipelineExecutionStepFailure
          IN_PROGRESS: log that the job is still running
       1. If export configuration is present, call ExportVectorEnrichmentJob
       1. If job name is present, call StartVectorEnrichmentJob
    1. The queue must be added to the event source mappings for the Lambda, and the event source mapping must be enabled.
1. Create a pipeline using the SDK with the following characteristics. If the pipeline already exists, use an Update call to update it. You can use the JSON referenced here as a guide for the pipeline definition. `sagemakerClient UpdatePipeline, CreatePipeline`
    1. Pipeline parameters for the job, input, and export steps.
    1. A Lambda processing step: a Lambda that kicks off a vector enrichment job that takes in a set of coordinates to reverse geocode.
    1. A callback step to check the progress of the processing job.
    1. An export step for the results of the VEJ.
    1. A callback step to finish the pipeline.
1. Execute the pipeline using the SDK with some input and poll for the execution status. `sagemakerClient StartPipelineExecution, DescribePipelineExecution`
1. When the execution is complete, fetch the latest output file and display some of the output data to the user. `s3Client ListObjects, GetObject`
1. Provide instructions for optionally viewing the pipeline and executions in SageMaker Studio.
1. Clean up the pipeline and resources; the user decides whether to clean them up or not.
    1. Clean up the pipeline. `DeletePipeline`
    1. Clean up the queue. `DeleteQueue`
    1. Clean up the bucket. `DeleteObjects, DeleteBucket`
    1. Clean up the Lambda. `DeleteFunction`
    1. Clean up the roles. `DetachRolePolicy, DeleteRole`

### Sample output

```
--------------------------------------------------------------------------------
Welcome to the Amazon SageMaker pipeline example scenario.
+
+This example workflow will guide you through setting up and executing an
+Amazon SageMaker pipeline. The pipeline uses an AWS Lambda function and an
+Amazon SQS queue, and runs a vector enrichment reverse geocode job to
+reverse geocode coordinates in an input file and store the results in an export file.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+First, we will set up the roles, functions, and queue needed by the SageMaker pipeline.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Checking for role named SageMakerExampleLambdaRole.
+--------------------------------------------------------------------------------
+Checking for role named SageMakerExampleRole.
+--------------------------------------------------------------------------------
+Setting up the Lambda function for the pipeline.
+	The Lambda function SageMakerExampleFunction already exists, do you want to update it?
+n
+	Lambda ready with ARN arn:aws:lambda:us-west-2:1234567890:function:SageMakerExampleFunction.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Setting up queue sagemaker-sdk-example-queue-test.
+--------------------------------------------------------------------------------
+Setting up bucket sagemaker-sdk-test-bucket-test.
+	Bucket sagemaker-sdk-test-bucket-test ready.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Now we can create and execute our pipeline.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Setting up the pipeline.
+	Pipeline set up with ARN arn:aws:sagemaker:us-west-2:1234567890:pipeline/sagemaker-sdk-example-pipeline.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Starting pipeline execution.
+	Execution started with ARN arn:aws:sagemaker:us-west-2:1234567890:pipeline/sagemaker-sdk-example-pipeline/execution/f8xmafpxx3ke.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Waiting for pipeline execution to finish.
+	Execution status is Executing.
+	Execution status is Executing.
+	Execution status is Executing.
+	Execution status is Succeeded.
+	Execution finished with status Succeeded.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Getting output results sagemaker-sdk-test-bucket-test.
+	Output file: outputfiles/qyycwuuxwc9w/results_0.csv
+	Output file contents:
+
+	-149.8935557,"61.21759217
+",601,USA,"601 W 5th Ave, Anchorage, AK, 99501, USA",Anchorage,,99501 6301,Alaska,Valid Data
+	-149.9054948,"61.19533942
+",2794,USA,"2780-2798 Spenard Rd, Anchorage, AK, 99503, USA",Anchorage,North Star,99503,Alaska,Valid Data
+	-149.7522,"61.2297
+",,USA,"Enlisted Hero Dr, Jber, AK, 99506, USA",Jber,,99506,Alaska,Valid Data
+	-149.8643361,"61.19525062
+",991,USA,"959-1069 E Northern Lights Blvd, Anchorage, AK, 99508, USA",Anchorage,Rogers Park,99508,Alaska,Valid Data
+	-149.8379726,"61.13751355
+",2372,USA,"2276-2398 Abbott Rd, Anchorage, AK, 99507, USA",Anchorage,,99507,Alaska,Valid Data
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+The pipeline has completed. To view the pipeline and executions in SageMaker Studio, follow these instructions:
+https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-studio.html
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Finally, let's clean up our resources.
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+Clean up resources.
+	Delete pipeline sagemaker-sdk-example-pipeline? (y/n)
+y
+	Delete queue https://sqs.us-west-2.amazonaws.com/1234567890/sagemaker-sdk-example-queue-test? (y/n)
+y
+	Delete S3 bucket sagemaker-sdk-test-bucket-test? (y/n)
+y
+	Delete role SageMakerExampleLambdaRole? (y/n)
+y
+	Delete role SageMakerExampleRole? (y/n)
+y
+--------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
+SageMaker pipeline scenario is complete.
+--------------------------------------------------------------------------------
+```
+### Hello SageMaker
+The Hello Service example should demonstrate how to set up the client and make an example call using the SDK.
+
+Initialize the client and call ListNotebookInstances to list up to 5 of the account's notebook instances. If no instances are found, you can direct the user to instructions on how to add one.
+
+Sample output:
+
+```
+Hello Amazon SageMaker! Let's list some of your notebook instances:
+	Instance: test-notebook
+	Arn: arn:aws:sagemaker:us-west-2:123456789:notebook-instance/test-notebook
+	Creation Date: 6/7/2023
+```
+
+_General info for Hello Service example snippets:_
+This section of the workflow should be a streamlined, simple example with enough detail to be as close to “copy/paste” runnable as possible. This example may include namespaces and other setup in order to focus on getting the user up and running with the new service.
+
+### README
+
+This is a workflow scenario. As such, the READMEs should be standardized.
+This is the [.NET reference README](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/dotnetv3/SageMaker/Scenarios/README.md). When a language implementation is completed, update the [parent README](README.md) to include the new language SDK.
+
+## Troubleshooting
+- You might want to view your pipeline in SageMaker Studio, which requires a domain.
+This will provide better debugging for the pipeline execution steps. You can use a default domain or create a custom domain.
+- Amazon CloudWatch Logs will help you with this debugging. SageMaker Studio should link each step to the relevant logs.
+- When testing your Lambda function, you might find it useful to log the function input. Serialization differences between languages (capitalization, nulls) can cause the function to fail.
+- When pipelines are failing, don't delete the queue until the pipeline execution has stopped.
+- Pipelines can only be deleted through the SDK.
+- For an example, see the [.NET implementation](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/dotnetv3/SageMaker). You might find it useful to begin with the .NET Lambda and get the rest of the workflow working, then work on the language-specific Lambda function handler.
+- Geospatial jobs are supported in the `us-west-2` Region. All operations should use this Region unless otherwise specified.
+- Pipeline callbacks won't resolve until SendPipelineExecutionStepSuccess or SendPipelineExecutionStepFailure is called. If neither is called, the execution can hang, and you might have to reach out to AWS Support to stop it.
+
+### SageMaker documentation references
+
+- [SageMaker Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html)
+- [SageMaker API Reference](https://docs.aws.amazon.com/sagemaker/latest/APIReference/Welcome.html?icmpid=docs_sagemaker_lp)
+- [SageMaker examples and example notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-geospatial/geospatial-pipeline)
diff --git a/dotnetv3/SageMaker/Scenarios/GeoSpatialPipeline.json b/workflows/sagemaker_pipelines/resources/GeoSpatialPipeline.json similarity index 100% rename from dotnetv3/SageMaker/Scenarios/GeoSpatialPipeline.json rename to workflows/sagemaker_pipelines/resources/GeoSpatialPipeline.json diff --git a/dotnetv3/SageMaker/Scenarios/latlongtest.csv b/workflows/sagemaker_pipelines/resources/latlongtest.csv similarity index 100% rename from dotnetv3/SageMaker/Scenarios/latlongtest.csv rename to workflows/sagemaker_pipelines/resources/latlongtest.csv diff --git a/dotnetv3/SageMaker/Images/Pipeline.PNG b/workflows/sagemaker_pipelines/resources/pipeline.png similarity index 100% rename from dotnetv3/SageMaker/Images/Pipeline.PNG rename to workflows/sagemaker_pipelines/resources/pipeline.png diff --git a/workflows/sagemaker_pipelines/resources/workflow.png b/workflows/sagemaker_pipelines/resources/workflow.png new file mode 100644 index 00000000000..81f2a0a48b2 Binary files /dev/null and b/workflows/sagemaker_pipelines/resources/workflow.png differ