
Terraform Design Considerations for Cisco ACI - Part 4

Estimated time to read: 16 minutes

  • Originally Written: August 2023

This is the fourth post of a four-part series.

Topics Covered

  • Recap
  • Design Consideration: Single Source of Truth and Production Ready Environment
    • APIC Snapshots
    • Moving From Local Developer Mode to a Version Control System
    • Tracking Changes
    • Common Git Terminology
    • Additional Benefits
    • Don’t Commit Everything
    • The State File and Git
    • Centralized State File
    • Git Snapshots and Rollbacks
  • Summary
  • Additional References

Recap

From the last post:

  • There are multiple options for structuring Terraform configuration:
    • Single folder for all Terraform configuration
    • Folder per ACI UI screen, e.g. one for common, one for each tenant, one for access policies
    • Multiple folders per tenant
      • i.e. sharing a tenant
    • Folder per workflow
  • Some of the reasons for organizing configuration and state files across different folders are:
    • Fault domain size and deployment time
    • Readability and troubleshooting
    • Static vs dynamic environments
    • Roles and responsibilities i.e. RBAC

Design Consideration: Single Source of Truth and Production Ready Environment

APIC Snapshots

When working with Infrastructure as Code (IaC), one of the best practices is to have a single source of truth. Having multiple configuration methods (e.g. manually configuring some resources) may cause your infrastructure to drift from the desired configuration in your IaC files. This also applies to features such as APIC snapshots, as they can be thought of as another source of truth.

First, look at what can happen in a scenario where a tenant is configured with Terraform and, at the same time, a snapshot is taken on the APIC. The following image outlines the process and the state that exists across Terraform, ACI, and the APIC snapshots.

In this scenario:

  • Two bridge domains are created using Terraform. A snapshot is then taken on the APIC. Following that, the Terraform configuration is updated to remove the 192.168.20.0 bridge domain and the changes are applied to the ACI fabric.

  • At this stage there are no issues. Terraform has one bridge domain (192.168.10.0) as the desired configuration and is tracking that same bridge domain in the state file. The ACI fabric also only contains that single bridge domain.

  • The problems begin when you roll back to the snapshot taken earlier. The ACI fabric once again has two bridge domains (192.168.10.0 and 192.168.20.0), but look at who "owns" the configuration.

  • The 192.168.10.0 configuration and state are still managed by Terraform; however, the 192.168.20.0 bridge domain was restored by the APIC snapshot and no longer exists in the Terraform configuration or state file. Therefore, Terraform has no idea this configuration exists and cannot manage it.

  • Bringing the 192.168.20.0 bridge domain back under Terraform management would require the same process as importing configuration from a brownfield environment.
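The ownership problem can be sketched with a toy Python model that treats the Terraform state and the ACI fabric as sets of bridge domain names. This is an illustration under simplifying assumptions, not how Terraform or the APIC actually store objects, and the helper `unmanaged` is hypothetical.

```python
"""Toy model of the APIC snapshot rollback scenario (illustration only)."""


def unmanaged(fabric: set[str], state: set[str]) -> set[str]:
    """Hypothetical helper: objects on the fabric that Terraform does not track."""
    return fabric - state


# Step 1: Terraform creates two BDs, so config, state and fabric all agree.
config = {"192.168.10.0", "192.168.20.0"}
state = set(config)
fabric = set(config)

# Step 2: a snapshot is taken on the APIC, capturing the fabric as it is now.
snapshot = set(fabric)

# Step 3: the 192.168.20.0 BD is removed from the config and Terraform applies it.
config.discard("192.168.20.0")
state = set(config)
fabric = set(config)

# Step 4: the APIC snapshot is rolled back outside of Terraform.
fabric = set(snapshot)

print("Tracked by Terraform:", sorted(state))
print("On fabric but unmanaged:", sorted(unmanaged(fabric, state)))
```

Because the 192.168.20.0 BD appears in neither the configuration nor the state, Terraform would not report it in a plan; it simply sits on the fabric outside Terraform's control.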

Although this was only a very simple example, you can imagine how this could easily have much wider impacts on your ACI environment. Always try to avoid multiple sources of truth when working with Infrastructure as Code. In this case, think very carefully and understand the outcomes before using APIC snapshots alongside an IaC process.

Version Control Systems (VCS) can be used to implement snapshot and rollback functionality and are discussed later in the paper.

Moving From Local Developer Mode to a Version Control System

Throughout the paper so far, the concept of a folder containing the Terraform configuration has been discussed. A folder on a laptop is often a great starting point to store and run Terraform configuration. This works well for testing configuration; however, it may not scale to meet the needs of a production environment. For example, if two admins are working with the same resources, who manages the state file? Where are the ACI credentials stored? How can the rest of the team have visibility of the changes? How do you roll back to a previous version if a mistake is made?

A Version Control System such as Git can be used to track and manage changes made to Terraform configuration by multiple people or teams. If a mistake is made, the configuration can be rolled back to a previous version.

The structure and layout concepts used in the paper thus far also apply to Git. In a previous example, the Terraform configuration for each APIC UI screen was contained in a separate folder. When using Git or another VCS, each folder might be an individual repository, as shown in the following screenshot (each folder contains a hidden .git folder to track changes).

A repository is where your files, in this case Terraform .tf files, are stored. The changes you make to those files are tracked and a new version or snapshot of the files can be created. Note that the repositories on the right-hand side do not contain all the files shown in the tree on the left-hand side. This is intentional and will be covered later in the paper.


Tracking Changes

The UI on the right-hand side of the screenshot above is the GitHub web UI.

GitHub, GitLab, and Bitbucket are some examples of centralized hosting services you can use to store your repositories and configuration files. Git itself is the version control system (typically used through its CLI) that tracks your file changes and pushes those changes to the central repositories.

In the following diagram, the dark blue boxes (working directory, index, and local repository) represent a local machine where file changes are made. This might be your laptop or a common jumphost/configuration server. The light green box represents the centralized hosted repository such as GitHub.

When a change is made, the modified file needs to be added (staged in the index) and then committed to the local repository, along with a helpful message describing what was changed. Think of this stage as taking a snapshot of the new configuration.

That snapshot (commit) can then be pushed to GitHub.

Since the changes are pushed to a central location, any other team members with access to the repository can also view the change. This provides greater visibility and collaboration across one or more teams.
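The add, commit, and push flow described above can be pictured with a toy Python model. `ToyRepo`, `Commit`, and `push` are hypothetical names for illustration only; they do not use or reproduce Git's internals, and the simplified index here holds only pending changes.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str
    files: dict[str, str]  # file name -> contents captured by this commit


@dataclass
class ToyRepo:
    """Hypothetical stand-in for the local side of Git (illustration only)."""
    working_dir: dict[str, str] = field(default_factory=dict)
    index: dict[str, str] = field(default_factory=dict)  # pending changes only
    history: list[Commit] = field(default_factory=list)

    def add(self, name: str) -> None:
        """Like `git add`: stage the file's current contents in the index."""
        self.index[name] = self.working_dir[name]

    def commit(self, message: str) -> Commit:
        """Like `git commit`: snapshot the previous commit plus staged changes."""
        files = dict(self.history[-1].files) if self.history else {}
        files.update(self.index)
        new = Commit(message, files)
        self.history.append(new)
        self.index.clear()  # simplification: real Git's index always holds a full snapshot
        return new


def push(local: ToyRepo, remote: list[Commit]) -> None:
    """Like `git push`: send commits the remote does not have yet.
    Assumes the remote history is a prefix of the local one."""
    remote.extend(local.history[len(remote):])


# Usage: edit a file, stage it, commit it, then push it to the "central" repo.
repo, central = ToyRepo(), []
repo.working_dir["bridge_domains.tf"] = 'bd "192.168.10.0"'
repo.add("bridge_domains.tf")
repo.commit("Add bridge domain 192.168.10.0")
push(repo, central)
print([c.message for c in central])
```

Once the commit is in `central`, anyone with access to that shared history can see both the change and its message.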

Common Git Terminology

  • Repository (Repo): A folder for storing version controlled files
  • Working Directory: The visible directory and its contents
  • Versioned Files: Files you have asked Git to track
  • Un-Versioned Files: Files in your working directory not tracked by Git
  • Commit: Snapshot in time (of your version controlled files)
  • Branches: An independent line of development where you can safely work on changes without affecting the main branch

Additional Benefits

Not only does Git provide versioning capability, but services such as GitHub, GitLab, and Bitbucket also provide a number of other benefits.

Organizations often put change requests or approval processes in place to reduce the risk of downtime. Similar approval structures can be implemented with these centralized platforms, including tracking comments and discussions related to the changes.

Infrastructure as Code is not just writing Terraform configuration. Apart from an approval process, when working in a production environment, having a way to automatically format and test configuration changes helps catch problems before they impact an environment.

With tools such as GitHub Actions, GitLab CI/CD, and Bitbucket Pipelines, you can create a workflow that runs these checks automatically, as demonstrated in the following guide.

https://developer.cisco.com/docs/nexus-as-code/#!cicd-example/introduction

Don't Commit Everything

The following example shows three sets of individual Terraform configurations, each stored in its own repository on GitHub. You can see from the screenshots that some files have not been pushed to the central GitHub repo; this is intentional.

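The .terraform folder contains the modules and Terraform providers that terraform init downloads for use in the configuration. For example, when using Nexus as Code this would include all the NaC Terraform modules. The .terraform folder is recreated by terraform init and should not be committed to your Git repository. Note that the .terraform.lock.hcl dependency lock file sits alongside, not inside, the .terraform folder; HashiCorp's documentation recommends committing the lock file so that everyone uses the same provider versions.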

https://developer.hashicorp.com/terraform/cli/commands/get

The terraform.tfvars file can be used to store data/values used in the Terraform plan. In many cases the data may be related to a specific environment e.g. tenant-production data vs tenant-development data. Because the .tfvars file may contain sensitive information, it's recommended that you don't commit this to your repository.

https://developer.hashicorp.com/terraform/tutorials/configuration-language/sensitive-variables

As a general best practice, you should never store secrets in your repository; instead, use a more secure setup such as HashiCorp Vault or GitHub Actions encrypted secrets.

https://docs.github.com/en/actions/security-guides/encrypted-secrets

You can use a .gitignore file to ensure that these files and folders are not added when you commit changes to the Git repository. A .gitignore file lists the untracked files and folders that Git should ignore. For example, if you create a file named .gitignore in your repository and add a line containing *.tfvars, Git will ignore any file ending in .tfvars when you add and commit changes, so those files are never pushed to the central repository (e.g. GitHub). Note that .gitignore has no effect on files that are already tracked; a file committed before the pattern was added must first be removed from the index, for example with git rm --cached.

Here is GitHub's recommended .gitignore file for Terraform.

https://github.com/github/gitignore/blob/main/Terraform.gitignore
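To illustrate how ignore patterns decide what gets committed, here is a rough Python approximation using `fnmatch.fnmatchcase` from the standard library. The pattern list is an assumed subset in the style of GitHub's Terraform template, and `is_ignored` is a hypothetical helper. Real gitignore matching (negation, anchoring, `**`) is more involved, so treat this as a sketch only.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumed subset of Terraform-related ignore patterns (illustration only).
IGNORE_PATTERNS = [".terraform/", "*.tfstate", "*.tfstate.*", "*.tfvars"]


def is_ignored(path: str, patterns: list[str] = IGNORE_PATTERNS) -> bool:
    """Rough approximation of gitignore matching.

    A pattern ending in '/' matches a directory anywhere in the path; any other
    pattern is matched against the file name only.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(parts[-1], pattern):
            return True
    return False


for name in ["main.tf", "terraform.tfvars", "terraform.tfstate",
             "terraform.tfstate.backup", ".terraform/providers/example.txt"]:
    print(f"{name:40} {'ignored' if is_ignored(name) else 'committed'}")
```

In this sketch only main.tf would be committed; the variable file, the state files, and anything under .terraform/ are left out.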

The State File and Git

You might have noticed that the terraform.tfstate file was also not stored in the central repository. The first reason is that there may be sensitive values contained within the state file, similar to the terraform.tfvars file above. Additionally, as was demonstrated with the Terraform and APIC Snapshot scenario, you may run into challenges if you have more than one source of truth. Consider the following scenario.

There are two admins configuring an ACI fabric through Terraform. The configuration, including the state file, is stored in a GitHub repository.

Admin A creates a new bridge domain, 192.168.10.0, applies the configuration and pushes the configuration changes and state file to GitHub.

Admin B plans to add a new BD, so they pull down the configuration and state from GitHub to their local machine. They have other meetings and plan to make the change later in the day.

Meanwhile, Admin A adds a second BD, 192.168.20.0, applies the configuration and pushes the configuration changes and updated state file to GitHub.

Admin B finally gets some time to create the BD they require. Since they had already pulled the configuration and state earlier in the day, they just add a new BD, 192.168.30.0.

Terraform plan and apply run successfully, and the new BD is created in ACI.

When they try to push the change to GitHub, however, the push is rejected with the message "Updates were rejected because the remote contains work that you do not have locally."

They didn't realise that Admin A had made changes in the meantime, and they had forgotten to pull the latest configuration and state from the GitHub repository before making their changes.

There are now two separate configuration files and two separate state files. These files must be merged, as both network admins require the bridge domains they've configured. Depending on how much configuration was changed, this can be a challenging process. The Terraform state file can also be complex to understand, making the task more difficult.

Pulling the latest updates from GitHub would have avoided the issue; however, mistakes can happen.
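The rejected push can be modelled as a fast-forward check: a push is only accepted if the local history already contains everything on the remote. The sketch below is a simplification under that assumption (Git compares commit graphs, not Python lists), and `push` and the commit strings are hypothetical.

```python
def push(remote: list[str], local: list[str]) -> list[str]:
    """Accept a push only if it is a fast-forward, i.e. the local history
    starts with every commit the remote already has."""
    if local[: len(remote)] != remote:
        raise RuntimeError("Updates were rejected because the remote contains "
                           "work that you do not have locally.")
    return list(local)


remote = ["add BD 192.168.10.0"]   # Admin A's first push
admin_b = list(remote)             # Admin B pulls in the morning

admin_a = list(remote)
admin_a.append("add BD 192.168.20.0")
remote = push(remote, admin_a)     # Admin A pushes the second BD

admin_b.append("add BD 192.168.30.0")  # Admin B works from the stale copy
rejected = False
try:
    remote = push(remote, admin_b)
except RuntimeError as err:
    rejected = True
    print("Push rejected:", err)
```

In this model, Admin B would need to pull Admin A's commit and merge it, including the diverged state file if it is stored in Git, before the push is accepted.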



Centralized State File

To overcome this issue the state file can be stored in a central remote backend rather than in the Git repository. Terraform uses a backend configuration to determine where and how state is loaded, and how an operation such as apply is executed.

Some of the benefits of remote backends include:

  • Collaboration: Backends can store state remotely and protect that state with locks to prevent corruption. Some backends such as Terraform Cloud even automatically store a history of all state revisions.
  • Protects sensitive information: State no longer needs to be stored in the Git repository or on local machines. Services such as Terraform Cloud can encrypt state files when storing them and provide Role Based Access Control to restrict who can view a state file.
  • Remote operations: Allow operations to be migrated from a local laptop into a central platform using tools such as the CI/CD examples previously discussed.

https://www.terraform.io/docs/backends/index.html
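State locking can be sketched as a mutual-exclusion check around each operation. `ToyBackend` and `StateLockedError` are hypothetical names; real Terraform backends implement locking in backend-specific ways, and Terraform acquires and releases the lock for you during operations such as plan and apply.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class StateLockedError(Exception):
    """Raised when another user already holds the state lock."""


class ToyBackend:
    """Hypothetical central state store with a single lock (illustration only)."""

    def __init__(self, initial: set[str]) -> None:
        self.state = set(initial)
        self.lock_holder: Optional[str] = None

    @contextmanager
    def locked(self, who: str) -> Iterator[set[str]]:
        """Hold the lock for one operation, always releasing it afterwards."""
        if self.lock_holder is not None:
            raise StateLockedError(f"state locked by {self.lock_holder}; {who} must wait")
        self.lock_holder = who
        try:
            yield self.state  # the latest shared state, not a stale local copy
        finally:
            self.lock_holder = None


backend = ToyBackend({"192.168.10.0"})
blocked = False
with backend.locked("admin-a") as state:
    state.add("192.168.20.0")
    try:
        with backend.locked("admin-b") as other:
            other.add("192.168.30.0")
    except StateLockedError as err:
        blocked = True
        print(err)

# Once Admin A's operation finishes, Admin B works from the up-to-date state.
with backend.locked("admin-b") as state:
    state.add("192.168.30.0")

print(sorted(backend.state))
```

Because both admins read from and write to the same shared state, Admin B's change builds on Admin A's instead of diverging from it.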

Looking at the scenario again, the same process is followed; however, since the state file is centrally managed, only the Terraform configuration files must be merged.

Git Snapshots and Rollbacks

In the APIC snapshot example, the Terraform code used to apply the configuration could be paired with a VCS to implement snapshots and rollbacks similar to what you might experience with APIC.

The git revert command is commonly used to revert changes made to a repository without losing the commit history. Think of a commit as a version of your Terraform configuration. When you run git revert, a new commit (version) is created that undoes the changes made in the specified commit (version); when the specified commit is the most recent one, this returns the repository to the way it was before that commit was made. This allows you to undo changes while preserving the history and allowing for collaboration with others.

Maintaining the history can be helpful when performing root cause analysis and reporting back to relevant stakeholders.

Imagine the following example. The first Terraform configuration contains two bridge domains, 192.168.10.0 and 192.168.20.0. This config is updated and the 192.168.20.0 BD is removed. Terraform applies the configuration on ACI, and the config file is committed and pushed to the GitHub repository. It's later determined that 192.168.20.0 was deleted in error and should be rolled back. The git revert HEAD command is used, where HEAD references the last commit, i.e. the incorrect configuration. The deleted configuration is added back to the config file, and the Terraform plan can then run again to recreate the BD on ACI. Since the revert created a new commit (version) containing the fixed configuration file, this commit can then be pushed to the central GitHub repository.
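The revert described above can be modelled in Python by treating each commit as an immutable version of the bridge domain list. `Commit` and `revert_head` are hypothetical names, and the model assumes each commit captures the whole configuration; it only mimics the outcome of `git revert HEAD`, not how Git computes the reverse change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    bridge_domains: frozenset[str]  # BDs in the Terraform config at this version


def revert_head(history: list[Commit]) -> list[Commit]:
    """Mimic `git revert HEAD`: add a new commit that undoes the last one,
    leaving every earlier commit in the history untouched."""
    if len(history) < 2:
        raise ValueError("need a parent commit to revert to")
    head, parent = history[-1], history[-2]
    undo = Commit(f'Revert "{head.message}"', parent.bridge_domains)
    return history + [undo]


history = [
    Commit("Add BDs 192.168.10.0 and 192.168.20.0",
           frozenset({"192.168.10.0", "192.168.20.0"})),
    Commit("Remove BD 192.168.20.0", frozenset({"192.168.10.0"})),
]
history = revert_head(history)
for commit in history:
    print(commit.message, sorted(commit.bridge_domains))
```

The earlier commits are still in `history`, which is what makes revert useful for root cause analysis; the restored BD only reappears on ACI once Terraform applies the reverted configuration.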

Summary

In this paper, we have discussed key points that network administrators should consider when migrating to Terraform to manage Cisco ACI. Remember that there is no one design that suits every environment, only tradeoffs to consider.

Given that most administrators may not have a background in development, it is crucial to keep the design simple. Start with a phased approach, gradually familiarizing yourself with Terraform before converting all infrastructure elements.

It is also important to consider production practices such as RBAC and backups to ensure security and resilience in your infrastructure.

Throughout the design process, it is essential to ask questions about the management of the infrastructure, the number of teams involved, the rate of change in the environment, the current process of deploying network changes, and the programming experience of team members. These considerations will help determine the best layout for your Terraform setup.

Additional References
