Terraform Design Considerations for Cisco ACI - Part 3¶
Estimated time to read: 24 minutes
- Originally Written: August, 2023
This is the third post in a four-part series
- Part 1: A Terraform recap and introduction to the design considerations
- Part 2: Terraform Considerations Related to ACI Network Connectivity Options
- Part 3: Example Designs
- Part 4: Single Source of Truth and Production Ready Environment
Topics Covered¶
- Recap
- Example Designs
- Design: A Single folder for all Terraform configuration
- Import existing configuration
- Hybrid Approach
- Design: One folder per screen in the ACI UI
- Design: Multiple folders per tenant (i.e. sharing a tenant)
- There’s no one size fits all
- Design: Grouping per workflow
- Example Scenario
- ESG Segmentation Design
- Shared Services and Inter-Tenant VRF Route Leaking Design
- Additional References
Recap¶
From the last post:
- ACI provides various network connectivity options and it's helpful to understand how the connectivity choice may influence a Terraform design, in particular the Terraform plan and apply times for each option.
- ACI Network Connectivity Options
- VMM Integration
- Individual Static Port Bindings
- Bulk Static Port Bindings
- Bind to a Group/Subset of Ports Across One or More Switches
- Bind to All Ports in a Single Switch
- Considerations related to network connectivity options
- Is VMM integration an option for some of the workloads? This can reduce the static port configuration which improves run time and readability
- If static ports are the only option:
- Does every VLAN need to be associated to every port on every switch? e.g. in the example of 20 VLANs on 6 switches, each with 48 ports, could this be reduced?
- Can bulk static port configuration be used?
- Would one of the other options be suitable? i.e. associate the EPGs/VLANs to the AAEP, configure a static leaf, set and forget the static ports, and use ESGs?
Example Designs¶
This post describes some folder layouts for Terraform configuration. These are not the only options available but aim to provide a starting point.
Design: A Single Folder For All Terraform Configuration¶
This is usually the starting point when using Terraform in general. As your configuration grows and you become familiar with Terraform concepts and how the tool functions, you can evolve the configuration into a more resilient architecture.
Although it may be easiest to work with Terraform in a greenfield environment, many customers only start to adopt IaC practices when their ACI networks are up and running. Once these customers are comfortable with Terraform and past the basic examples, the question of how to adopt their existing fabric often comes up. In these scenarios there are a couple of options.
Import Existing Configuration¶
You can import the existing ACI configuration into Terraform and from then on manage all configuration through Terraform, although this option is a lengthy process and may pose some challenges. First, you need to write the configuration and import the state for each resource you want to manage. With Terraform 1.5 it's now possible to generate some configuration, however the feature is currently experimental. You still need to write the import blocks and there is currently no dependency mapping performed between resources (e.g. after importing an `aci_vrf`, the reference to the tenant is `tenant_dn = "uni/tn-production"` rather than `tenant_dn = aci_tenant.my_tenant.id`).
https://www.hashicorp.com/blog/terraform-1-5-brings-config-driven-import-and-checks
https://developer.hashicorp.com/terraform/language/import
Depending on the skills of the user it's also possible to write a script to help with the import process. Alternatively, Cisco CX may be able to provide similar services.
Once the configuration and import blocks are written, you can run the `terraform plan` command to see which resources will be imported. You then run `terraform apply` to import the resources (i.e. create the state file).
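As an illustration of this workflow (the tenant and VRF names here are hypothetical, not from the original post), the configuration plus import blocks might look something like this:

```hcl
terraform {
  required_providers {
    aci = {
      source = "CiscoDevNet/aci"
    }
  }
}

# Existing objects written as normal resources...
resource "aci_tenant" "production" {
  name = "production"
}

resource "aci_vrf" "prod_vrf" {
  tenant_dn = aci_tenant.production.id
  name      = "prod-vrf"
}

# ...and matched to their APIC distinguished names with import blocks
import {
  to = aci_tenant.production
  id = "uni/tn-production"
}

import {
  to = aci_vrf.prod_vrf
  id = "uni/tn-production/ctx-prod-vrf"
}
```

Running `terraform plan` then previews the imports, and `terraform apply` records them in the state file.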
Info
It's important to note that Terraform only manages what you tell it to manage so be careful not to overlook any resources when importing.
If you do begin by importing existing configuration, consider starting with only a subset of resources. For example, don't import an entire production tenant; rather, see if there are other resources that make more sense. There might be a development tenant you can import, or you may want to import only a subset of a development tenant.
Is there a clear boundary such as a single application profile you could import while the remaining application profiles are managed through the UI?
The second challenge is ensuring the correct process is followed once the configuration is imported. As discussed in a later section, when working with any IaC tool it's important to ensure you work from a single source of truth.
What happens if a resource is managed with Terraform and someone decides to update that resource through the UI or another tool? Two versions now exist, one that Terraform knows about and one that is configured in the UI. This can lead to unintended consequences in the future.
Some questions to consider: which team manages the ACI fabric? Is everyone comfortable with Terraform and how it works? Has the appropriate environment and process been set up, e.g. a central location to run the commands or a hosted environment such as Terraform Cloud for Business? (see the section Single Source of Truth and Production Ready Environment)
Assuming the decision is made that all ACI configuration will be managed by Terraform, it is strongly recommended to build resilience into the architecture. This may be in the form of a separate folder structure as previously discussed.
Hybrid Approach¶
The hybrid option can be easier to adopt in brownfield environments. In this scenario, only a section of the ACI configuration is managed by Terraform and the rest is managed by the current method. This has a number of benefits:
- Lowers the barrier to adoption. For example one team may want to manage their ACI tenant with Terraform while others use their current methods
- Reduces the risk of something going wrong in a complete all-in-one migration
- Allows teams to test their configuration and learn from any mistakes. With this knowledge they may then be able to build a repeatable process or templates which expands to the rest of the fabric
What ACI configuration should be managed by Terraform depends on the environment. For example, in very dynamic environments where tenants are often created, rather than trying to import an existing tenant, Terraform might be used to create a new tenant which is from then onwards managed only by Terraform. This could be thought of like a greenfield deployment since a new tenant will be created.
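As a minimal sketch of this greenfield-style approach (tenant and object names are hypothetical), a new tenant owned by Terraform from day one might start with just a few resources:

```hcl
# A new tenant created by Terraform and managed only by Terraform from day one
resource "aci_tenant" "team_a" {
  name        = "team-a"
  description = "Managed by Terraform - do not edit in the APIC UI"
}

resource "aci_vrf" "team_a" {
  tenant_dn = aci_tenant.team_a.id
  name      = "team-a-vrf"
}

resource "aci_application_profile" "team_a" {
  tenant_dn = aci_tenant.team_a.id
  name      = "team-a-apps"
}
```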
In more static environments, you may want to look for low hanging fruit or low risk configuration. Rather than trying to import a production tenant, perhaps there's a development or testing tenant. Once imported, Terraform would then manage this lower risk resource while the more important configuration is still managed through the current methods. Over time as experience grows the rest of the configuration may be imported and managed through the one tool.
Selecting where to start can be difficult but one option is to replicate how the APIC UI is set out.
Design: One Folder Per Screen In The ACI UI¶
In this design the folder structure matches what is found in the APIC UI. For example one folder per tenant, one for common policies, one for access policies. This also provides a simpler RBAC structure where a set of Terraform user credentials can be tied to a tenant and security domain.
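As an illustration only (folder names are hypothetical), such a layout might look like the following, with each folder holding its own state and Terraform plan:

```text
aci-fabric/
├── tenant-production/    # Tenants > production
├── tenant-development/   # Tenants > development
├── tenant-common/        # Tenants > common (shared BDs, contracts)
├── access-policies/      # Fabric > Access Policies (AAEPs, VLAN pools, domains)
└── fabric-policies/      # Fabric > Fabric Policies
```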
Design: Multiple Folders Per Tenant i.e. Sharing A Tenant¶
Some customers deploy ACI using only a couple of tenants containing 100s of application profiles and EPGs. In environments with a large number of resources per tenant, it might not make sense to manage the configuration at a per-tenant level for the reasons mentioned previously (fault domain size and apply time).
Additionally, consider the following example based on a common 1 EPG = 1 BD = 1 VLAN = 1 subnet design. The first screenshot shows static port bindings for 3 subnets/VLANs configured on the first 10 ports of 5 switches. The second screenshot shows the configuration for the same subnets but instead integrated with a VMM domain.
As the number of static port bindings increases, so too does the amount of configuration. This can make it difficult to troubleshoot and understand what has been configured.
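To illustrate how quickly this grows, the hedged sketch below (tenant, profile, and DN values are illustrative) generates the 3 VLANs x 5 switches x 10 ports = 150 bindings from the example above using a single for_each:

```hcl
locals {
  vlans    = [101, 102, 103]            # 3 subnets/VLANs
  switches = [201, 202, 203, 204, 205]  # 5 leaf switches
  ports    = range(1, 11)               # eth1/1 - eth1/10

  # one map entry per VLAN/switch/port combination: 3 x 5 x 10 = 150 bindings
  bindings = {
    for combo in setproduct(local.vlans, local.switches, local.ports) :
    "vlan${combo[0]}-node${combo[1]}-eth1-${combo[2]}" => {
      vlan = combo[0]
      node = combo[1]
      port = combo[2]
    }
  }
}

resource "aci_epg_to_static_path" "binding" {
  for_each           = local.bindings
  application_epg_dn = "uni/tn-production/ap-network-segments/epg-vlan${each.value.vlan}"
  tdn                = "topology/pod-1/paths-${each.value.node}/pathep-[eth1/${each.value.port}]"
  encap              = "vlan-${each.value.vlan}"
}
```

Even expressed this compactly, it is still 150 resources for Terraform to refresh and apply, which is where the plan and apply times recapped above come from.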
What would the configuration size for 100 or 200 VLANs look like? At a certain point you might want to use a more granular structure where each folder manages a subset of the tenant configuration. The exact layout depends on a number of factors.
- What is the current/planned configuration? e.g. is it a 1 EPG = 1 BD = 1 VLAN design or more of an application centric design?
- Are different teams responsible for the resources and if so, do they need to share some resources?
- Is there an obvious point to segment resources? e.g. each team is responsible for their own application profile and EPGs
- Is there a design which would make it easier to determine when to segment resources? e.g. look at the Endpoint Security Group section further down
There's No One Size Fits All¶
As has been the theme throughout this paper, there is no single design that suits every environment, only tradeoffs and considerations. In some cases the structure may be easier to spot than others.
As shown in the screenshot below, one demarcation point might be applications split by application profile, associated EPGs and BDs. Each folder would contain the resources for that application. Any shared resources could be managed through the common tenant or by a separate Terraform plan.
Info
When sharing a tenant, the application profiles should reference the tenant through a read-only Terraform datasource or by using the tenant `managed: false` flag in Nexus as Code.
In a design where 1 EPG = 1 BD = 1 VLAN, the tenant may only have a single application profile containing 10s or 100s of EPGs/subnets/VLANs. This can be more difficult to segment, but think about the questions previously raised. Are different teams or applications associated to a single VLAN or group of VLANs, or is everything a shared resource?
If each team or application is associated to a particular VLAN or group of VLANs the demarcation point is similar to the previous example. The difference being that each folder would only manage the EPG configuration associated to that application. All folders would reference the same tenant and application profile.
Info
Just like the previous example, since the same tenant is used, the application profile should reference the tenant through a read-only Terraform datasource (or using the tenant `managed: false` flag in Nexus as Code). Additionally, the application profile should be referenced through a read-only Terraform datasource (or using the application profile `managed: false` flag in Nexus as Code) since EPGs will be sharing the same application profile.
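A minimal sketch of this sharing pattern (tenant, profile, and EPG names are hypothetical): each folder looks up the shared tenant and application profile read-only and only manages its own EPGs.

```hcl
# Shared objects are looked up read-only, never created or destroyed here
data "aci_tenant" "shared" {
  name = "production"
}

data "aci_application_profile" "shared" {
  tenant_dn = data.aci_tenant.shared.id
  name      = "network-segments"
}

# Only the EPGs owned by this folder/team are managed here
resource "aci_application_epg" "vlan_110" {
  application_profile_dn = data.aci_application_profile.shared.id
  name                   = "vlan-110"
}
```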
What about when the segmentation is not so obvious and teams or applications all share the same VLANs/subnets? e.g. `company-website` and `finance-application` both use `EPG=192.168.1.0_24` and `EPG=192.168.2.0_24`. The folder structure in the previous example would not work as the Terraform plans would each think they should manage the shared EPG resource.
Endpoint Security Groups can be helpful in this design.
When implementing ACI, many customers use a design similar to a traditional network. Application segmentation can be difficult in this scenario if subnets and VLANs are shared across multiple applications.
Endpoint Security Groups (ESGs) are a network security component in ACI. Although endpoint groups (EPGs) have traditionally provided network security in ACI, an EPG has to be associated to a single bridge domain and is used to define security zones within that bridge domain. This is because EPGs define both forwarding and security segmentation at the same time. The direct relationship between the bridge domain and an EPG prevents an EPG from spanning more than one bridge domain. This limitation of EPGs is resolved by using the new ESG constructs.
By using ESGs, the Terraform configuration for an ACI fabric can be split into multiple scopes: one or more folders manage the network connectivity, i.e. the BDs and EPGs for the relevant subnets and VLANs, while separate folders contain the Terraform configuration for the applications. This configuration provides secure communication between ESGs based on a range of different selectors as outlined in the ACI ESG Design Guide. See further down in the paper for an example using ESGs.
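As a rough sketch of how this split can look (all names are illustrative, and the EPG selector shown is only one of the selector types supported by the provider), the security-zones configuration might map existing network-segment EPGs into an ESG:

```hcl
# security-zones folder: an ESG groups endpoints from EPGs managed elsewhere
data "aci_tenant" "prod" {
  name = "production"
}

data "aci_vrf" "prod" {
  tenant_dn = data.aci_tenant.prod.id
  name      = "prod-vrf"
}

data "aci_application_profile" "zones" {
  tenant_dn = data.aci_tenant.prod.id
  name      = "security-zones"
}

resource "aci_endpoint_security_group" "finance" {
  application_profile_dn = data.aci_application_profile.zones.id
  name                   = "finance-application"
  relation_fv_rs_scope   = data.aci_vrf.prod.id # ESGs are scoped to a VRF
}

# Pull endpoints from an existing network-segment EPG into the ESG
resource "aci_endpoint_security_group_epg_selector" "vlan_110" {
  endpoint_security_group_dn = aci_endpoint_security_group.finance.id
  match_epg_dn               = "uni/tn-production/ap-network-segments/epg-vlan-110"
}
```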
Design: Grouping Per Workflow¶
The previous examples focused on replicating screens found in the UI. Although this design is similar, it focuses on grouping Terraform configuration based on a business process or workflow. This lends itself well to service catalogs and repeatable processes where a template can be used to provision a new service.
Consider a multi-tenant environment where each tenant manages their own VRFs, BDs, EPGs, access policies, L3outs, even physical leaf switches. There are multiple options to manage the Terraform configuration.
If small enough, a single folder might be used for each new service created. The configuration can be split into multiple files for readability and manageability. Any resources that are common or shared amongst the services should be managed via a separate Terraform plan. Datasources can provide read-only access to the properties of the shared resources.
If the configuration for a service is too large and becomes unmanageable, it could be split into multiple folders, as seen in previous examples. This can also help with very dynamic services where changes are constantly made.
Info
It's important to remember that while the structure (single or multiple folders) is the same as previous designs, the focus of this design is to capture all the configuration needed to roll out a new service or workflow.
In this example the tenants own the Terraform configuration for their service. They can choose where and how to implement this folder structure. It might be their own jumphost from which they run the Terraform commands, or as shown later in the paper, it could be their own GitHub organization and repositories.
Grouping by service or workflow also allows you to include more context in your Terraform configuration. Further down in the paper is an example of a shared services tenant. Inter-tenant VRF route leaking and global contracts are used to allow communication between a user tenant and the shared services or an external network. It can be quite easy to overlook configuration as it spans multiple pages and places in the UI. Defining all configuration in one place, i.e. the Terraform configuration files, provides a number of benefits.
It allows a network admin to quickly understand how the environment is configured. This is not only useful for anyone new to the team but also when troubleshooting connectivity or security issues. Additionally, documentation can be written alongside the configuration by way of Terraform comments, YAML comments (if using Nexus as Code), or a separate file (e.g. documentation as markdown).
Example Scenario: ESG Segmentation Design¶
Consider a scenario where the network team at `ACME Corp` have a production ACI fabric and tenant, with one application profile managing the network segments (1 EPG = 1 BD = 1 VLAN design) and one or more profiles managing the application security zones (see the ESG design guide for various designs). As can be seen in the following example there are 200 VLANs/subnets with applications split across these network segments.
Based on what has been covered in the paper so far, the following folder structure might be considered for the Terraform configuration.
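As an illustration of what such a structure could look like (folder names are hypothetical):

```text
acme-production/
├── shared/             # tenant, VRF, application profiles (managed once)
├── network-segments/   # 200 x BD/EPG/VLAN plus port or VMM bindings
└── security-zones/     # ESGs, selectors, and contracts per application
```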
This layout can help minimize the Terraform plan size for the security zones. Depending on the number of VLANs/subnets and how they are connected, you may still end up with a large Terraform plan for the network segments. This is where the calculations and sample apply times from the sections above can be used to help determine if the `network-segments` application profile should be managed by a single plan or multiple Terraform plans.
The ACME Corp team need to determine the ACI network connectivity to use. It could be a single option or a mixture depending on endpoints connected to the fabric.
If the 200 `network-segments` are configured using VMM integration or associated to an AAEP, the initial apply time should be anywhere between 10 and 15 minutes. Adding a new segment would be approximately 45 seconds to 1 minute based on the test results above.
If individual static ports are configured and the 200 segments are defined on the first 4 ports of 5 switches, based on the calculation and testing above it may be 1 to 2 hours to apply the initial configuration.
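(For reference, 200 segments x 5 switches x 4 ports works out to 4,000 individual static path bindings, which is why the initial apply takes so long.)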
If bulk static ports are used, the initial and subsequent apply times can be greatly reduced.
The team's design uses ESGs to separate `network-segments` from `security-zones`. Once the network connectivity is in place, the endpoints can change security zone (ESG) without changing the underlying network configuration, potentially resulting in a very static `network-segment` Terraform configuration, i.e. set and forget the initial subnet/VLAN configuration.
The ACME Corp team will need a mixture of connectivity including VMM and static port configuration for the 200 segments. They are investigating two options to structure the `network-segments` Terraform configuration.
Although the initial apply time could potentially take up to 2 hours (if they use large amounts of individual static ports), they are in agreement that once the network connectivity is in place it will rarely change. They decide to simplify the folder structure by placing all `network-segment` Terraform configuration in a single folder. This will be split into multiple files containing ranges of BDs/EPGs for readability, and they will use VMM/bulk static bindings where possible to reduce the run time.
The ACME Corp team also manage a second ACI fabric which is used as a development/test/support environment. It also has a single tenant but is a much more dynamic fabric. The `network-segments` and `security-zones` (ESGs) design is also used in this environment, however the subnet/VLAN configuration sometimes changes daily. The decision is made to split the `network-segments` into five folders, with each folder containing 40 subnets/VLANs. A sixth Terraform configuration will manage the shared resources such as the tenant, VRF, and application profile for the `network-segments`.
Although this will increase the number of folders and files to manage, it provides a few benefits for this environment. Firstly, changes impact a smaller number of resources, meaning the apply and refresh times should be reduced. Secondly, since the configuration is split across multiple folders, admins can make changes in parallel without potential state locks or configuration conflicts. Finally, with a dynamic environment operated on by many people at once, there is always the possibility for mistakes to be made. Having smaller fault domains reduces the impact of configuration or state errors.
As seen in the screenshot below (right-hand side), there is a parent `network-segments` folder with sub-folders containing the Terraform configuration for the range of subnets/VLANs that sub-folder manages. The same applies for the `security-zones`. This is for readability/maintainability and sub-folders are not required.
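Reconstructed as a simple tree (folder names are illustrative), the layout described above might look like this:

```text
acme-dev/
├── shared/                 # tenant, VRF, network-segments application profile
├── network-segments/
│   ├── segments-001-040/
│   ├── segments-041-080/
│   ├── segments-081-120/
│   ├── segments-121-160/
│   └── segments-161-200/
└── security-zones/
    ├── zone-group-a/
    └── zone-group-b/
```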
Example Scenario: Shared Services and Inter-Tenant VRF Route Leaking Design¶
As discussed previously in the `Grouping Per Workflow` design, grouping by service or workflow allows you to include more context in your Terraform configuration.
The network team at `ACME Corp` are back and this time they have a new fabric design. After reading through the great Cisco ACI Endpoint Security Group (ESG) Design Guide, they have decided to implement a shared services tenant. A number of additional tenants for different application teams are also created. To support network connectivity, routes will be leaked between the tenants' VRFs. For security policies, global contracts will be exported from each application tenant and consumed within the shared services tenant.
While this is fairly straight-forward as outlined in the following diagram, the implementation can be more difficult. The setup spans multiple pages and places in the UI making it easy to overlook a configuration step. For anyone new to the environment, it can be challenging to understand what is configured and how the networking and security policies function.
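As a hedged sketch of the kinds of objects involved on the application-tenant side (names are illustrative, and this is only a fragment of the full inter-tenant configuration described in the ESG design guide), the key pieces are a bridge domain subnet with shared scope so it can be leaked to other VRFs, and a contract with global scope so it can be consumed from the shared services tenant:

```hcl
# Application tenant side (fragment): leak the BD subnet and expose a global contract
resource "aci_tenant" "app_team_a" {
  name = "app-team-a"
}

resource "aci_vrf" "app_team_a" {
  tenant_dn = aci_tenant.app_team_a.id
  name      = "app-team-a-vrf"
}

resource "aci_bridge_domain" "web" {
  tenant_dn          = aci_tenant.app_team_a.id
  name               = "web-bd"
  relation_fv_rs_ctx = aci_vrf.app_team_a.id # associate BD with the tenant VRF
}

# "shared" scope allows the subnet to be leaked into other VRFs/tenants
resource "aci_subnet" "web" {
  parent_dn = aci_bridge_domain.web.id
  ip        = "192.168.10.1/24"
  scope     = ["private", "shared"]
}

# Global scope makes the contract consumable across tenants
resource "aci_contract" "web_services" {
  tenant_dn = aci_tenant.app_team_a.id
  name      = "web-services"
  scope     = "global"
}
```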
Capturing all relevant configuration for the shared services design in a single place (i.e. Terraform configuration files) provides a number of benefits.
Firstly it allows a network admin to quickly understand how the environment is configured. This is not only useful for anyone new to the team but also when troubleshooting connectivity or security issues. Documentation can be written alongside the configuration by way of Terraform comments, YAML comments (if using Nexus as Code), or a separate file (e.g. documentation as markdown). This documentation can help to shed light on why this design was chosen, what steps are required to provision a new service, and helpful troubleshooting tips.
There's also the potential to automate provisioning of new application tenants and provide standardised templates. This reduces the risk of human error and helps to decrease the time in standing up new services.
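For example, standing up a new application tenant could be reduced to a module call, assuming a hypothetical local module that wraps the standard tenant, VRF, BD, and contract resources shown above:

```hcl
# Hypothetical template module wrapping the standard per-tenant resources
module "app_team_b" {
  source = "./modules/app-tenant"

  tenant_name   = "app-team-b"
  vrf_name      = "app-team-b-vrf"
  leaked_subnet = "192.168.20.1/24"
}
```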
As shown in the example folder structure below, the network admin team are responsible for and manage the policies associated with the shared services tenant. This includes the shared L3Out profiles, external EPGs, consumed contracts, and all access policies such as switch/interface profiles, AAEPs, VLAN pools, and domains. Shared services such as virtual routing using the CSR, and virtual firewalls using L4L7 service graphs are also provided. This team has admin access to the fabric.
The application teams are each responsible for their tenant configuration. They are provided with templates to get started and can access services and external networks through the shared services tenant. These teams and their users have read/write access only to the tenant to which they belong. They have read only access to the rest of the fabric.
Finally, it's decided that this fabric will be managed through a hybrid approach, similar to what was discussed earlier in the paper. The admins responsible for the shared services tenant and overall fabric are very experienced with Terraform and ACI. They are comfortable with the way it functions and have workflows in place to manage the services.
Some of the application teams are also familiar with Terraform and decide to manage their tenants in the same way. They maintain separate folders and relevant Terraform configuration files for their tenants. However, not everyone wants to write code or configuration files. Some of the teams already have documentation and processes built for managing their tenant through the UI. They are comfortable with knowing where to look, what to configure, and how to troubleshoot.
The hybrid approach means all teams can consume the resources provided by the shared services tenant but can manage their own resources using the tools they know best.
Onto the next chapter¶
Part 4 - Single Source of Truth and Production Ready Environment
Additional References¶
- Terraform Provider for ACI
- Nexus as Code Project
- Cisco ACI Endpoint Security Group (ESG) Design Guide