
Use Azure Automation to run an iterative U-SQL query against Azure Data Lake Store


navalev/azure-automation-data-lake


Azure Automation Data Lake Analytics Scheduled Job

Using Azure Automation, Azure Scheduler, and a Data Lake Analytics job to execute a U-SQL query against a Data Lake Store.

Scenario

An iterative task needs to be executed against data stored in a Data Lake Store. In this example, the task appends to files stored in the Data Lake Store. A Data Lake Analytics job is submitted using Azure Automation. Azure Scheduler is used because the built-in Azure Automation scheduler has a minimum interval of one hour; triggering the runbook from Azure Scheduler allows finer granularity, here a 10-minute interval. This scenario assumes that you have already set up a Data Lake Store.
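To make the granularity difference concrete, here is a small Python sketch that lists the trigger times falling within one hour for a fixed interval. The one-hour and 10-minute values come from this README; the function name `run_times` and the start date are hypothetical and only illustrate the arithmetic, not how either scheduler is implemented.

```python
from datetime import datetime, timedelta
from typing import List

# Intervals taken from the scenario description (illustrative constants).
AUTOMATION_MIN_INTERVAL = timedelta(hours=1)   # built-in Azure Automation schedule minimum
SCHEDULER_INTERVAL = timedelta(minutes=10)     # Azure Scheduler job interval in this scenario


def run_times(start: datetime, interval: timedelta, window: timedelta) -> List[datetime]:
    """Return every trigger time in [start, start + window) for a fixed interval (hypothetical helper)."""
    times = []
    t = start
    while t < start + window:
        times.append(t)
        t += interval
    return times


if __name__ == "__main__":
    start = datetime(2016, 1, 1, 0, 0)  # hypothetical start time
    hour = timedelta(hours=1)
    print(len(run_times(start, AUTOMATION_MIN_INTERVAL, hour)))  # 1 run per hour
    print(len(run_times(start, SCHEDULER_INTERVAL, hour)))       # 6 runs per hour
```

With a 10-minute interval the runbook is triggered six times per hour instead of once.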


Deployment Setup Flow

The deployment flow is implemented in the automateDataLakeJob.ps1 PowerShell script and consists of four steps:

  1. Create a storage account and upload all the necessary assets to it.
  2. Deploy an ARM template automationAccountDeployment.json to create an Azure Automation account with:
    • Runbook with script
    • Data Lake Analytics PowerShell module
    • Variables to be used by the script in the runbook
    • Credentials object with Azure AD user to execute the automation scripts with.
  3. Create a webhook for the Automation runbook (at the time of writing, this cannot be done via an ARM template)
  4. Create an Azure Scheduler job collection and an HTTP job that uses the Automation runbook webhook as the POST URI
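Step 4 boils down to an HTTP POST against the runbook's webhook URI. The Python sketch below builds (but does not send) such a request with the standard library, to show what the scheduler's HTTP job is expected to issue. The URI is a hypothetical placeholder: a real webhook URI contains a secret token and should be handled like a password. The optional JSON payload is likewise an assumption for illustration; this is not Azure Scheduler's own implementation.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholder; the real webhook URI embeds a secret token.
WEBHOOK_URI = "https://example.invalid/webhooks?token=REPLACE_ME"


def build_webhook_request(uri: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build the POST request that would trigger the runbook via its webhook (not sent here)."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_webhook_request(WEBHOOK_URI, {"source": "scheduler"})
    print(req.get_method(), req.full_url)
    # Sending it would trigger the runbook: urllib.request.urlopen(req)
```

Keeping the request construction separate from sending makes it easy to inspect what the scheduler job will POST before pointing it at a live webhook.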

Authenticating with Azure Automation

Azure Automation requires an Azure Active Directory organizational user to authenticate. There are some limitations: the user cannot have multi-factor authentication enabled, and, at the time of writing, service principal authentication (for Azure Resource Manager) is not supported. I recommend creating a user specifically for the Automation jobs. This user's name and password are stored in a Credential asset in the Automation account. For more information, read this tutorial.

Executing the Script

Fork this repository and edit automateDataLakeJob.ps1 with your information:

  1. If you have more than one subscription in your account, set the right subscription ID
  2. Set the automation AAD user name
  3. Set the automation AAD password
  4. Set the Data Lake account name
  5. Set the Data Lake resource group
  6. Set the automation account webhook expiry date
  7. Set the scheduler job start time
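Because the scheduler job keeps calling the webhook, a webhook that expires before (or shortly after) the job starts silently breaks the schedule. The Python sketch below is a hypothetical pre-flight check over the values listed above; the `DeploymentSettings` class, its field names, and the `validate` function are illustrative assumptions and do not correspond to variables in automateDataLakeJob.ps1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DeploymentSettings:
    """Hypothetical mirror of the values edited in automateDataLakeJob.ps1."""
    subscription_id: Optional[str]  # only needed with more than one subscription
    aad_user: str
    aad_password: str
    data_lake_account: str
    data_lake_resource_group: str
    webhook_expiry: datetime
    scheduler_start: datetime
    interval: timedelta = timedelta(minutes=10)  # interval used in this scenario


def validate(s: DeploymentSettings) -> List[str]:
    """Return a list of human-readable problems; an empty list means the settings look consistent."""
    problems = []
    for name in ("aad_user", "aad_password", "data_lake_account", "data_lake_resource_group"):
        if not getattr(s, name):
            problems.append(f"{name} is empty")
    if s.webhook_expiry <= s.scheduler_start:
        problems.append("webhook expires before the scheduler job starts")
    if s.interval <= timedelta(0):
        problems.append("interval must be positive")
    return problems


if __name__ == "__main__":
    start = datetime(2016, 6, 1, 12, 0)  # hypothetical values throughout
    settings = DeploymentSettings(
        None, "automation@contoso.onmicrosoft.com", "secret",
        "mydatalake", "my-resource-group",
        webhook_expiry=start + timedelta(days=365), scheduler_start=start,
    )
    print(validate(settings) or "settings look consistent")
```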

Note that in this scenario the scheduler job interval is 10 minutes.

References

Scheduling Azure Automation with Azure Scheduler

Azure Automation Authentication

Azure Automation ARM PowerShell Modules

Azure Data Lake Analytics PowerShell
