How to apply a custom policy for Databricks Job Compute and execute a job from Azure Data Factory

Sergii Bielskyi

Here is a use case demonstrating how to take a custom policy created in Azure Databricks and apply it to Job Compute that is launched from Azure Data Factory as an activity. Suppose you need to start Job Compute with a specific policy to execute a notebook script, and that Job Compute must be initiated from an Azure Data Factory pipeline.
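For reference, the notebook is triggered from the pipeline by a Databricks Notebook activity. Here is a minimal sketch; the activity name, linked service name, and notebook path are placeholders for this example:

    {
        "name": "Python Run Script",
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": "AzureDatabricksLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "notebookPath": "/Shared/job-compute-demo"
        }
    }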

From the Azure Data Factory interface, a Linked Service to Databricks must be created.

Here, we have the option to create a new job cluster. However, the UI offers no option to attach a custom policy. According to the official documentation, the linked service has a field called policyId that can be used to apply a custom policy.

{
    "type": "AzureDatabricks",
    "typeProperties": {
        "accessToken": {
            "type": "string"
            // For remaining properties, see SecretBase objects
        },
        "authentication": {},
        "credential": {
            "referenceName": "string",
            "type": "string"
        },
        "domain": {},
        "encryptedCredential": "string",
        "existingClusterId": {},
        "instancePoolId": {},
        "newClusterCustomTags": {
            "{customized property}": {}
        },
        "newClusterDriverNodeType": {},
        "newClusterEnableElasticDisk": {},
        "newClusterInitScripts": {},
        "newClusterLogDestination": {},
        "newClusterNodeType": {},
        "newClusterNumOfWorker": {},
        "newClusterSparkConf": {
            "{customized property}": {}
        },
        "newClusterSparkEnvVars": {
            "{customized property}": {}
        },
        "newClusterVersion": {},
        "policyId": {},
        "workspaceResourceId": {}
    }
}

So we need to go to Azure Databricks, open the policy, and copy its ID.
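If you prefer not to copy it from the UI, the Databricks Cluster Policies API (GET /api/2.0/policies/clusters/list) also returns the ID; the response looks roughly like this, where the policy name, ID, and definition below are placeholders:

    {
        "policies": [
            {
                "policy_id": "ABC1234DEF5678901",
                "name": "Job Compute Policy",
                "definition": "{\"enable_local_disk_encryption\": {\"type\": \"fixed\", \"value\": true}}"
            }
        ],
        "total_count": 1
    }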

Next, you need to set this policyId in the Azure Data Factory Linked Service. To accomplish this, open the service as JSON and insert this attribute, as the UI does not provide this option.
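A trimmed version of the linked service definition with the attribute added might look like this; the domain, workspace resource ID, cluster settings, and policy ID are placeholders rather than values from a real workspace:

    {
        "name": "AzureDatabricksLinkedService",
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
                "authentication": "MSI",
                "workspaceResourceId": "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Databricks/workspaces/<workspace>",
                "newClusterVersion": "14.3.x-scala2.12",
                "newClusterNodeType": "Standard_DS3_v2",
                "newClusterNumOfWorker": "1",
                "policyId": "ABC1234DEF5678901"
            }
        }
    }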

Well, when we started our pipeline, we got an error like this:

Operation on target Python Run Script failed: Cluster validation error: Validation failed for enable_local_disk_encryption, the value must be true (is "false"); Validation failed for data_security_mode, must be present; Validation failed for runtime_engine, the value must be present

This indicates that our policy requires certain attributes. The issue is how to set them when the Linked Service UI exposes no such properties. We need to include enable_local_disk_encryption and data_security_mode, so once again we must reopen the Linked Service as JSON and add these attributes.

            "enableLocalDiskEncryption": true,
"dataSecurityMode": "SINGLE_USER"

You may notice that the naming conventions differ: Azure Data Factory uses camelCase (enableLocalDiskEncryption), while the Databricks policy uses snake_case (enable_local_disk_encryption). Additionally, you can make certain attributes optional in a policy if they have default values. For example:

  "runtime_engine": {
"defaultValue": "STANDARD",
"type": "unlimited",
"isOptional": true
}
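Putting this together, a policy definition along these lines would produce the validation errors shown above when the attributes are missing, and would accept the linked service once they are set. This is only a sketch; the exact rules depend on your own policy:

    {
        "enable_local_disk_encryption": {
            "type": "fixed",
            "value": true
        },
        "data_security_mode": {
            "type": "fixed",
            "value": "SINGLE_USER"
        },
        "runtime_engine": {
            "defaultValue": "STANDARD",
            "type": "unlimited",
            "isOptional": true
        }
    }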

That is probably it. Happy coding!
