Resource Not Found (dependsOn is not working)

I’ve been getting this question quite a bit lately, so I thought it was a good time to get something written down (and it forced me to realize I never finished the previous topic).

You have a resource in an Azure Resource Manager template that references another resource using reference() or listKeys(). That reference() call is guarded by a dependsOn or a condition property, but the deployment tries to reference the resource too early and fails. This makes it look like the dependsOn is ignored, or, in the case of a condition, the resource may not even be deployed but the reference() or listKeys() call is still made. Here’s how to fix it…

First, some background – reference() and anything that starts with list* are what we call “run-time” functions, meaning they are evaluated at run-time, after the deployment has started. All the other functions and language expressions in ARM are “design-time” (or “compile-time”) functions; they are evaluated before deployment begins.

Once deployment starts, all run-time functions are scheduled by the deployment engine and can run in parallel with other operations. It’s this parallel execution that keeps things fast, and it occasionally causes the problem: sometimes the scheduler doesn’t honor the dependency, and in the case of a condition, ARM can evaluate the resource (partially) even if it will not be deployed.

To be clear, this is a really old design, and the capabilities of the language have grown up around it, exposing some gaps as deployments get more sophisticated. We’re working on fixing it, but it’s harder than it sounds, and you can work around it – in some cases pretty easily – if you know how it works. Since there are multiple things we need to work through here, you may not see these exact symptoms as they start to go away (as we do start to fix things). We’re going slowly since this deployment engine handles millions of operations every day. But since it’s a question I keep getting, here’s how you can “fix” it – try the following (in this order).

  1. Remove the apiVersion from the reference() call. This *only* works if the resource being referenced is deployed unconditionally in the same template. If the apiVersion is omitted, ARM assumes the resource is defined in the template and schedules the call after the resource is deployed. If the resource is conditionally deployed you can’t do this, since the apiVersion is needed for the GET (i.e. the reference()) and the apiVersion on the resource is not actually available (remember I mentioned this was harder to fix than it sounds). Also, this doesn’t work for any list*() function, since the apiVersion is always required there.
  2. Wrap the call in an if() statement. In the case of a conditional resource, wrap the run-time function call in an if() statement with the same condition as the resource itself. ARM won’t evaluate the “false” side of the statement so the call is never made – like you would expect with the conditional resource itself (hopefully you can start to see why this is hard).
  3. Use a nested deployment. If neither of the above work, you likely have a scenario where the resource is conditionally deployed as part of a new or existing pattern. I.e. the template needs a storage account but it may or may not be deployed in the same template. In this case, you need the apiVersion parameter, which means ARM will schedule the call when it can and sometimes that happens too early. The only way around this last scenario is to nest the deployment that needs the reference() or consumes the resource. In that case, you can do one of two things, which I’ll show below. Nesting the deployment schedules another deployment in ARM and if dependencies are set *and* run-time functions are used as input to the deployment, ARM will defer the call. Knowing this you can use it for other “advanced” scenarios too, like changing deployment options based on run-time state (more on this below too).

Remove API Version

Remember, this only works with reference() (not list*()) and only when the resource is unconditionally deployed in the same template, but it’s this simple:

        "diagnosticsProfile": {
          "bootDiagnostics": {
            "enabled": true,
            "storageUri": "[reference(variables('storageAccountName')).primaryEndpoints.blob]"
          }
        }

Also, if the name of the resource is unambiguous in the template (i.e. unique), then the full resourceId is not needed and you can use the resource name.
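For contrast, here’s the same property using a full resourceId instead of the bare name, still with the apiVersion omitted – a sketch assuming the storage account is defined unconditionally in the same template:

```json
"diagnosticsProfile": {
  "bootDiagnostics": {
    "enabled": true,
    "storageUri": "[reference(resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName'))).primaryEndpoints.blob]"
  }
}
```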

Use an if() wrapper

If you have a conditional resource, just wrap the run-time function in an if() statement with the same condition; it will only be evaluated when the resource is actually deployed. Note the condition property on the resource and the matching if() around the listKeys() call in the sample below.

    {
      "condition": "[parameters('deployThisResource')]",
      "type": "Microsoft.Web/sites",
      "apiVersion": "2019-08-01",
      "name": "[variables('webSiteName')]",
      "location": "[parameters('location')]",
      "properties": {
        "serverFarmId": "[resourceId('Microsoft.Web/serverfarms', variables('hostingPlanName'))]",
        "siteConfig": {
          "appSettings": [
            {
              "name": "CosmosDb:Key",
              "value": "[if(parameters('deployThisResource'), listKeys(resourceId('Microsoft.DocumentDb/databaseAccounts', variables('cosmosAccountName')), '2020-04-01').primaryMasterKey, 'does not matter')]"
            }
          ]
        }
      },

Use a Nested Deployment

Ok, this is the most inelegant of the three, but if your scenario isn’t handled by the two above, this will always work. You can also use this approach when you want to know something about a resource (or a dependency) before deploying it. Remember, nested deployment scheduling is “strict”, so things will always be done in the order you expect. So whether it’s a simple case of sequencing, or you need to retrieve run-time state from a resource before you know what to deploy, this technique can be used.

The key here is that you are using a run-time function as a parameter to a nested deployment. This could be referencing a deployment output from another nested deployment or any other run-time function as a parameter value that’s passed in to the nested deployment. The nested deployment could be a linked template or inline, doesn’t matter, they will behave the same way.

Here is one example – I linked to the line of code, but if (or when) it changes, look at the MongoDBUri parameter value. That parameter value calls the list*() function directly, which defers the call. Another way to do the same thing is to pass in the resourceId() and put the listKeys() call inside the nested template, which is a slightly better practice for debugging.
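Here’s a sketch of that second variant: pass the resourceId in as a parameter and make the listKeys() call inside an inner-scoped nested template. The names here (setAppSettings, cosmosDbResourceId) are illustrative, not from the linked sample:

```json
{
  "type": "Microsoft.Resources/deployments",
  "apiVersion": "2019-10-01",
  "name": "setAppSettings",
  "properties": {
    "mode": "Incremental",
    "expressionEvaluationOptions": {
      "scope": "inner"
    },
    "parameters": {
      "cosmosDbResourceId": {
        "value": "[resourceId('Microsoft.DocumentDb/databaseAccounts', variables('cosmosAccountName'))]"
      }
    },
    "template": {
      "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "parameters": {
        "cosmosDbResourceId": {
          "type": "string"
        }
      },
      "resources": [
        // resources that consume the key go here
      ],
      "outputs": {
        "primaryKey": {
          "type": "string",
          "value": "[listKeys(parameters('cosmosDbResourceId'), '2020-04-01').primaryMasterKey]"
        }
      }
    }
  }
}
```

Because the listKeys() call lives inside the inner-scoped nested deployment, it isn’t evaluated until that deployment actually runs, and the resourceId is visible as a plain parameter value if something goes wrong – which is what makes this slightly easier to debug.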

Here’s a more advanced example. I want to set up diagnostic logging on storage endpoints, but not all storageAccounts expose all endpoints. To simplify deployment, I can just prompt for the storageAccount name and determine the endpoints it supports at run-time, rather than have the user of the template supply the endpoints (and potentially miss one that then wouldn’t be logged).

Key points here:

  • This nested deployment is inline, not linked, so no staging is required. expressionEvaluationOptions is set to inner scope, so evaluation is deferred until needed.
  • One of the parameters to the nested template is an object containing all the endpoints available on the storageAccount, so I don’t need to know ahead of time which are supported on the given storageAccount.
  • Variables then make it easy to set the condition property on each diagnostics resource to determine which endpoints need logging.

    "resources": [
        {
            "apiVersion": "2019-10-01",
            "name": "nested",
            "type": "Microsoft.Resources/deployments",
            "properties": {
                "mode": "Incremental",
                "expressionEvaluationOptions": {
                    "scope": "inner"  // this allows putting any template inline and evaluation is deferred
                },
                "parameters": {
                    "endpoints": {
                        "value": "[reference(resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName')), '2019-06-01', 'Full').properties.primaryEndpoints]"
                    }
                   // snip
                },
                "template": {
                    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
                    "contentVersion": "1.0.0.0",
                    "parameters": {
                        "endpoints": {
                            "type": "object"
                        }
                    //snip
                    },
                    "variables": {
                        "hasblob": "[contains(parameters('endpoints'),'blob')]",
                        "hastable": "[contains(parameters('endpoints'),'table')]",
                        "hasfile": "[contains(parameters('endpoints'),'file')]",
                        "hasqueue": "[contains(parameters('endpoints'),'queue')]"
                    },
                    "resources": [
                        {
                            "condition": "[variables('hasblob')]",
                            "type": "Microsoft.Storage/storageAccounts/blobServices/providers/diagnosticsettings",
                            "apiVersion": "2017-05-01-preview",
                            "name": "[concat(parameters('storageAccountName'),'/default/Microsoft.Insights/', parameters('settingName'))]",
                            "properties": {
                                "workspaceId": "[parameters('workspaceId')]",
                                "storageAccountId": "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageSinkName'))]",
                                "logs": [
                                    {
                                        "category": "StorageRead",
                                        "enabled": true
                                    }
                                ],
                                "metrics": [
                                    {
                                        "category": "Transaction",
                                        "enabled": true
                                    }
                                ]
                            }
                        },
                        {
                            "condition": "[variables('hastable')]"
                            // snip
                        },
                        {
                            "condition": "[variables('hasfile')]"
                            // snip
                        },
                        {
                            "condition": "[variables('hasqueue')]"
                            // snip
                        }
                    ]
                }
            }
        }
    ]

I think that captures it… again apologies for this being necessary – we are working on it but in the meantime…

As always, lmk if I missed anything or there’s something else you’d like to see…

The Ultimate ARM Template – Part 2 (Parameters)

When you think of a template that will give you the most flexibility or reuse, the main thing you need is the ability to change what that template does, and you do that with parameters (variables help make this easier to write and read). By allowing the consumer of the template to provide parameters for different options, you maximize reuse. There is a balance here: you don’t want to provide parameters for everything, as that makes the template difficult to consume, but there are things you can do to achieve a good balance. To be honest, it’s really all about how you use defaultValues…

Default Values

Default values essentially make a parameter “optional” in that the consumer of a template does not have to provide a value. Given this, you want to make sure that whatever defaultValue you provide will work in the majority of, or the most common, cases. For example, if the parameter value needs to be globally unique, don’t provide a literal value; generate a unique one. However, idempotency is also important. The uniqueString() function generates a deterministic value and is instrumental in this approach. Contrast that with the newGuid() function, which generates a new GUID every time the template is deployed. The sample below strikes a good balance of idempotency and uniqueness – the expression creates a new storageAccount name for each VM. So this template could be deployed repeatedly to the same resourceGroup without worrying about conflicts (for example, with the SKU of an existing storageAccount).


    "storageAccountName": {
      "type": "string",
      "defaultValue": "[concat('storage', uniqueString(parameters('vmName'), resourceGroup().id))]",
      "metadata": {
        "description": "Name of the storage account"
      }
    },

The downside is that it will create a storageAccount for each VM, which isn’t likely necessary, but… this is only the default. The template can still be used to share a storageAccount, but since the template author generally can’t determine the defaultValues needed for that scenario, the default/simple case is the one to optimize for here.

Default Values Working Together

Another thing to consider is how different parameters work together and how the defaultValues should too. In the ultimate template the consumer has the option of using new or existing resources for certain parts of the deployment. In this example, you may want to leverage an existing storageAccount for boot diagnostics on the VM. Now, again, I can’t reasonably assume what that existing storage account might be unless I’m in a constrained environment. So to make the template flexible and easy to use, the defaultValues work together to deploy a new storageAccount.

    
    "storageNewOrExisting": {
      "type": "string",
      "defaultValue": "new",
      "allowedValues": [
        "new",
        "existing",
        "none"
      ],
      "metadata": {
        "description": "Determines whether or not a new storage account should be provisioned.  'none' disables boot diags."
      }
    },
    "storageAccountName": {
      "type": "string",
      "defaultValue": "[concat('storage', uniqueString(parameters('vmName'), resourceGroup().id))]",
      "metadata": {
        "description": "Name of the storage account"
      }
    },
    "storageAccountType": {
      "type": "string",
      "defaultValue": "Standard_LRS",
      "allowedValues": [
        "Standard_LRS",
        "Standard_GRS",
        "Standard_RAGRS"
      ],
      "metadata": {
        "description": "Storage account type for boot diagnostics, only LRS/GRS skus are allowed."
      }
    },
    "storageAccountResourceGroupName": {
      "type": "string",
      "defaultValue": "[resourceGroup().name]",
      "metadata": {
        "description": "Name of the resource group for the existing storage account"
      }
    },

Here’s how they work together:

  • storageNewOrExisting – this determines the scenario
  • storageAccountName – generates a unique name for each deployment
  • storageAccountType – this doesn’t matter much for the scenario, but Standard_LRS is the most cost-efficient default since the common case always creates a new account
  • storageAccountResourceGroupName – perhaps the only one of these that’s not obvious, but in the case of an existing storageAccount, that resource may be in a different resourceGroup, so the resourceId() function that references it needs the resourceGroup parameter. IOW, in the “new” scenario this will be the current resourceGroup, and it is supplied by the user in the non-default case. Defaulting to the current resourceGroup simplifies the reference to it (more on this below).

So, as a consumer of this template, if I want to deploy the scenario with a new storageAccount, I can rely on the defaultValues of all the related parameters to simplify my deployment. Also note that the new/existing parameter allows a value of “none” if I want to disable boot diags. Two things to call out here:

  1. I didn’t need an extra parameter (say, a boolean) for enabling/disabling. On one hand that may be “overloading” the parameter, but it’s still very readable and understandable, and uses less code overall.
  2. When I select “none” I don’t need to worry about all the other parameters; I can use the defaults because the others will be ignored. That won’t be obvious in every deployment experience, say the Portal’s Template Deployment blade, but in scenarios where you control the user experience (Marketplace, Managed Apps and the forthcoming templateSpecs) the experience can be exactly as you’d expect.

Finally, this approach allows me to simplify the creation of resourceIds – I don’t need an if() condition around the new-or-existing case; because of the defaultValues, I can construct the resourceId in a consistent manner. This always works in the common case because of the defaultValue for the storageAccountResourceGroupName parameter.

"storageAccountId": "[resourceId(parameters('storageAccountResourceGroupName'), 'Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]",

Parameter Validation

There is a trick you can use to validate parameter values and even validate parameter sets. It’s not the most elegant, but it is possible, and it allows you to “fail fast” if something is wrong with the deployment. Here’s an example: in the ultimate template I have the option of using a managedIdentity on the VM. This could be a userAssigned or systemAssigned identity. If it’s systemAssigned, that’s simple; I just set one parameter value. However, if it’s userAssigned, I need to provide a value for the id of the identity – so I want to ensure that the user did not select a userAssigned identity and then fail to provide it. Here’s what that looks like:

    "managedIdentity": {
      "type": "string",
      "defaultValue": "None",
      "allowedValues": [
        "UserAssigned",
        "SystemAssigned",
        "None"
      ],
      "metadata": {
        "description": "Managed identity type, if UserAssigned, the value of the UserAssignedIdentity parameter must also be set."
      }
    },
    "userAssignedIdentity": {
      "type": "string",
      "defaultValue": "",
      "metadata": {
        "description": "The resourceId of the user assigned managed identity to assign to the virtual machine."
      }
    },
    "validateManagedIdentity": {
      "type": "bool",
      "allowedValues": [
        true
      ],
      "defaultValue": "[if(and(equals(parameters('managedIdentity'),'UserAssigned'), empty(parameters('userAssignedIdentity'))), bool('false'), bool('true'))]",
      "metadata": {
        "description": "Check to ensure that if the managedIdentity type is UserAssigned, the userAssignedIdentity parameter is not empty."
      }
    },

The “validation” happens in the validateManagedIdentity parameter. This parameter is a boolean with only one allowedValue; it must validate successfully, or in other words it must be “true”. The defaultValue contains an expression that checks the condition: if the managedIdentity type is UserAssigned and the userAssignedIdentity is empty, validation should fail, so we set the defaultValue to false. Since the only allowedValue is true, this causes template validation to fail and deployment will not continue. This way you don’t end up with a partial deployment, because it never starts.

Now, this is not perfect – it’s possible for someone to override the defaultValue of the validateManagedIdentity parameter and just set it to true, but if you have a scenario where you can control the parameter input (Marketplace, Managed Apps and the forthcoming templateSpecs) then it works as expected. It’s helpful even if it cannot be completely enforced.

Do Not Default

The next thing to think about is the few cases where you should not default parameter values:

  • Passwords for sure 😉 or any secrets (secureString, secureObject), aside from empty values that make the secret optional. I think we’ve all heard, and maybe learned, about putting secrets in source control [remember, templates should be under source control], so I won’t go into detail. This is one case where the arm-ttk will flag the practice for you.
  • Non-secure credentials, e.g. user names. Yeah, it’s not a secret and it doesn’t need to be secure, but it increases the attack surface. You might find cases where a default is helpful; I’ll still say don’t do it. When you look at this template you’ll find there are very few values that are required, so adding one for increased security is worth the cost, IMO.
  • Anything that’s not likely to work for the common case – this was mentioned above briefly, but don’t use a literal value for something that needs to be unique.

Ok, I thought I was going to get to variables in this post but the longer I make them the less likely I am to do them, so I’ll wrap this one here and save variables for the next post. You can always find the full template here.

As always, let me know if there’s something you’d like to see (here or via any of the social icons at the top) and I’ll add it to the list.

The Ultimate ARM Template – Part 1

In January I did a presentation at LEAP 2020 on “The Ultimate ARM Template”. The sessions weren’t recorded so when someone asked about it I offered to try to get the content out… It’s taken a while because I kept thinking how long this post would be (and procrastinating) so I eventually figured out I could just break it up to get started.

What is “The Ultimate ARM Template”? I had no idea when I started (I didn’t pick the topic for LEAP) but eventually came up with this:

A declarative statement of the goal state for your Azure Infrastructure and Configuration

OK, that describes pretty much any ARM Template, so what should the ultimate one do?

  • Utilize available language constructs to maximize the impact of minimal code
  • One Template for n Scenarios – achieving the right balance based on the life cycle of the resources (i.e. don’t just put everything in the same template)
  • Under Source Code Control 😉 (with unit tests (parameter files))

Why templates?

Why bother with templates? There are many other ways to do this sort of thing (Infrastructure as Code) – CLI tools, SDKs, Terraform, even the Azure Portal – and they may seem easier at first glance. What’s unique about templates?

An ARM template creates a deployment of multiple resources orchestrated by Azure in the native language of the platform

  • A single deployment can deploy to multiple “scopes” (Resource Group, Subscription, Management Group and Tenant)
  • The template deployment is validated and checked for errors before any resource is created
  • The orchestration is captured as deployment history that can be used for debugging, auditing, logging

The nice thing about templates is that the platform (not you or tooling) handles all of this.

Key Constructs and Patterns

The patterns you can use across all templates are fairly finite and built from simple constructs. Even though a template is a simple declarative statement, environments are not always that simple, so expressing them cleanly and concisely can be a challenge. These constructs make your code more flexible and more concise. Here are those constructs and patterns.

Parameters Add Flexibility

Parameters allow you to change common settings or properties in a template deployment. Default Values can be used to simplify this flexibility. When default values are considered together (often one parameter value depends on another) a template with many parameters defined may only need one or two supplied for most deployments. Smart defaults can also remove the need for conditional statements later in the template. Blank values can be used for optional parameters rather than coupling actual values with switch parameters.

Resource Defaults

Determine which properties of which resources are “flexible” and which are “standard”. Simplify the sophistication of Azure in your code. For example, VM disks can be managed, unmanaged or ephemeral; do you need to provide an option for all three? Also, any property on any resource can be set to “null”, which makes it easy to flip options on and off without duplicating resource definitions.
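As a sketch of the null technique, here’s one way a VM’s availabilitySet property could be dropped entirely when no availability set name is supplied (the availabilitySetName parameter is illustrative):

```json
"availabilitySet": "[if(empty(parameters('availabilitySetName')), json('null'), json(concat('{\"id\": \"', resourceId('Microsoft.Compute/availabilitySets', parameters('availabilitySetName')), '\"}')))]"
```

When the parameter is empty, the property evaluates to null and the platform default applies, so a single resource definition covers both cases.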

Copying and Looping

Copy loops have been around for a while, but only recently have we extended the capability to span everything: resources, properties, variables and outputs. We also recently added the ability to use a copy count of zero for any of the above.
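For example, a property copy whose count can be zero might look like this (dataDiskCount is an assumed parameter); with a count of 0 the dataDisks array is simply empty:

```json
"copy": [
  {
    "name": "dataDisks",
    "count": "[parameters('dataDiskCount')]",
    "input": {
      "lun": "[copyIndex('dataDisks')]",
      "createOption": "Empty",
      "diskSizeGB": 1023
    }
  }
]
```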

Conditionals

Binary conditions can be applied to just about any property in the JSON, from a defaultValue on a parameter, to resource properties, to outputs. By applying “null” in a condition, many properties can become optional, empty, or simply the platform default.

Case/Switch and Object Arrays

JSON object arrays can be used to emulate the behavior of switch statements to simplify look-ups or complex conditionals. If you’ve ever needed (or tried) nested if statements, this will give you much cleaner, simpler code.
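A minimal sketch of the pattern, using a variable object as the lookup table (the names and sizes here are made up for illustration):

```json
"variables": {
  "diskSizes": {
    "Small": 128,
    "Medium": 512,
    "Large": 1023
  },
  "selectedDiskSize": "[variables('diskSizes')[parameters('environmentSize')]]"
}
```

Indexing into the object with the parameter value replaces what would otherwise be a chain of nested if() calls.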

Template Expressions

A host of other simple programming expressions can be leveraged for string manipulation, simple arithmetic or logical operations to round out the flexibility needed for any deployment scenario.

The Ultimate Template

That gives you an idea of the tools and patterns that allow you to get the most out of the template language and templates. The template we talked about at LEAP is here along with templates that make use of that template. Another example of how templates like these can be used is as modules which you can see in the QuickStart repo. There are a few QuickStart samples that consume them if you want to see both production and consumption.

In the next post I’ll start walking through that template and how and why the constructs and patterns are applied. In the meantime, if you have any thoughts feel free to leave a comment below.