Nexus Dashboard Fabric Controller - What features help me manage a data centre fabric?
Estimated time to read: 30 minutes
- Originally Written: July, 2024
Info
This post uses the VXLAN as Code project, which makes it very easy to configure NDFC fabrics through a YAML file. More details and examples can be found at https://github.com/netascode/ansible-dc-vxlan-example
Want more details?
My colleague Yves has been blogging on DC fabrics and Data Centre Interconnects for many years. He has also put together some great NDFC content on his site and some videos on YouTube.
If you look at the Cisco portfolio you will see a few options for configuring data centre switches and fabrics.
- Standalone with NX-OS:
- For customers who just need a couple of switches or would prefer to manage the devices with their own processes, e.g. through the CLI or through Ansible scripts they've built
- NDFC:
- For customers who want a UI to manage the switches but also want an easy way to configure VXLAN and other network fabric types. NDFC also provides best practice configurations, validated templates, and automation capabilities, for example as part of the AI/ML Network Blueprint
- ACI:
- For customers who want a plug and play VXLAN fabric and all its benefits without having to ever think about how to configure the underlay. See this link for more details
This post gives an overview of some of the NDFC features which help simplify data centre network configurations and management.
The features
- Templates
- VXLAN as Code
- vPC configuration
- Syncing configurations
- Git integration
- Inventory and monitoring
- VM visibility
- Software management
- Fabric and device backup and restore
- Easy RMA
Templates
Templates for fabrics
Although you can configure individual "standalone" devices, NDFC provides many built-in templates which help you to easily deploy network configuration across multiple switches and sites. This includes traditional 3-tier architectures as well as VXLAN EVPN fabrics and VXLAN Multi-site deployments.
Fabric Type | Description |
---|---|
Data Center VXLAN EVPN | Fabric for a VXLAN EVPN deployment with Nexus 9000 and 3000 switches. |
Enhanced Classic LAN | Fabric for a fully automated 3-tier Classic LAN deployment with Nexus 9000 and 7000 switches. |
Campus VXLAN EVPN | Fabric for a VXLAN EVPN Campus deployment with Catalyst 9000 switches and Nexus 9000 switches as border gateways. |
BGP Fabric | Fabric for an eBGP based deployment with Nexus 9000 and 3000 switches. Optionally VXLAN EVPN can be enabled on top of the eBGP underlay. |
Custom Network | Fabric for flexible deployments with a mix of Nexus and Non-Nexus devices. |
Fabric Group | Domain that can contain Enhanced Classic LAN, Classic LAN, and External Connectivity Network fabrics. |
Classic LAN | Fabric to manage a legacy Classic LAN deployment with Nexus switches. |
LAN Monitor | Fabric for monitoring Nexus switches for basic discovery and inventory management. |
VXLAN EVPN Multi-Site | Domain that can contain multiple VXLAN EVPN Fabrics (with Layer-2/Layer-3 Overlay Extensions) and other Fabric Types. |
Classic IPFM | Fabric to manage or monitor existing Nexus 9000 switches in an IP Fabric for Media Deployment. |
IPFM | Fabric for a fully automated deployment of IP Fabric for Media Network with Nexus 9000 switches. |
Multi-Site External Network | Fabric to interconnect VXLAN EVPN for Multi-Site deployments with a mix of Nexus and Non-Nexus devices. |
External Connectivity Network | Fabric for core and edge router deployments with a mix of Nexus and Non-Nexus devices. |
- You can create one or more fabrics by selecting the `Actions` button on the `Manage` -> `Fabrics` page.
- Provide a name for the fabric
- Then select the type of fabric to deploy. As you can see there are many different types.
- You're then presented with a form which includes one or more tabs and inputs depending on the fabric type selected. For example in the VXLAN EVPN fabric you see inputs such as the underlay/overlay parameters, vPC configuration, and management configuration.
- Many of the fabric templates will have default inputs which make it easier to get started. For example in the VXLAN EVPN fabric template you only need to put in the BGP ASN and all other inputs will use the default value.
- Once the fabric definition has been configured you can add the switches. In this case I'm just using a manual discovery based on the `mgmt 0` IP address and credentials of the switch.
Brownfield import
Notice in the screenshot below that the `Preserve config` flag is set to false. This means the switch config will be wiped as part of the onboarding process so that the new fabric configuration can be pushed to the device.
NDFC can also import and manage an existing VXLAN EVPN fabric which you have configured manually through the CLI or another method. This is known as a brownfield import.
When you import an existing VXLAN fabric you need to ensure the `Preserve config` flag is set to true. This ensures that the current configuration of the switches will be retained.
For more information on this process see the Managing a Brownfield VXLAN BGP EVPN Fabric configuration guide
- You can set the role of the device which determines the configuration that is deployed. For example a VXLAN leaf or spine role.
- When you're ready to push the config to the switches select the `Actions` button at the top of the page and then `Recalculate and Deploy`. This will calculate the configuration change and deploy it to all switches in the fabric
- You can view the pending changes
- And finally deploy all the changes to the devices
Templates for individual devices
Templates can also be helpful to manage individual devices. This could be the entire configuration for a switch or just specific sections of the configuration.
- For example, you may have a base configuration you want to apply to all access switches which enables certain features, configures VLANs, configures security settings etc. In the following screenshots I've created a new template which applies to the whole device and used `TEMPLATE_CLI` as the content type since this is just the standard NX-OS CLI config
- Placeholders (in the format `$$placeholder$$`) can be used to customize the configuration with variables or user input
- Since the example above is a device config we can apply it at either the fabric level (for multiple switches) or at the individual switch level. In both cases you can select the `Policies` tab and then `Add Policy`
- The policy is selected and any custom user input can be provided
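To make the `$$placeholder$$` idea more concrete, here is a minimal Python sketch of that kind of substitution. It is not how NDFC renders templates internally; the `render` function, the variable names, and the sample CLI lines are all hypothetical and only illustrate how user input fills in a template.

```python
import re

# Hypothetical template body using the $$NAME$$ placeholder format.
TEMPLATE = """\
hostname $$HOSTNAME$$
vlan $$VLAN_ID$$
  name $$VLAN_NAME$$
"""

# Matches $$NAME$$ and captures NAME.
PLACEHOLDER = re.compile(r"\$\$(\w+)\$\$")


def render(template: str, values: dict) -> str:
    """Replace every $$NAME$$ with values[NAME]; fail if input is missing."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError("no value supplied for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


config = render(
    TEMPLATE,
    {"HOSTNAME": "access-sw1", "VLAN_ID": "10", "VLAN_NAME": "users"},
)
print(config)
```

Failing loudly on a missing value is a deliberate choice in this sketch, since a half-rendered config is worse than no config.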
You may also want to use policies for a subset of the configurations.
- For example, imagine you configured different access switches using the base configuration mentioned above. You could then apply interface configurations for one or more interfaces across one or more switches. Here I am editing the required interfaces and applying an out-of-the-box policy (`int_access_host`) which accepts user input. This policy also allows you to add freeform config if needed
For more information see the NDFC Templates Configuration Guide
VXLAN as Code
You might have seen the Nexus as Code project which simplifies the deployment of ACI fabric configuration. You can achieve something similar with NDFC using the VXLAN as Code project. This project uses the Network as Code DC VXLAN Ansible Galaxy Collection to simplify the deployment of a VXLAN fabric using NDFC as the controller.
Rather than configuring the fabric by clicking through the UI as we've just seen, with VXLAN as Code you can define all your configuration in YAML files. Ansible will then parse these YAML files and apply the configuration to the fabric. This includes discovery/registration of the switches.
There is also validation built into the collection so you can catch any misconfigurations before they end up in production.
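To give a feel for the kind of mistake this sort of validation can catch, here is a small standalone Python sketch. It is not the collection's own validation logic. The `validate_switches` function and the sample inventory are hypothetical; only the key names (`serial_number`, `management_ipv4_address`, `default_gateway_v4`) follow the switch definitions used in this post.

```python
import ipaddress
from collections import Counter

# Hypothetical inventory with three deliberate mistakes.
switches = [
    {"name": "spine1", "serial_number": "SN0001", "role": "spine",
     "management": {"default_gateway_v4": "192.168.100.1",
                    "management_ipv4_address": "192.168.100.11"}},
    {"name": "leaf1", "serial_number": "SN0002", "role": "leaf",
     "management": {"default_gateway_v4": "192.168.100.1",
                    "management_ipv4_address": "192.168.100.11"}},  # duplicate IP
    {"name": "leaf2", "serial_number": "SN0002", "role": "leaf",    # duplicate serial
     "management": {"default_gateway_v4": "192.168.100.1",
                    "management_ipv4_address": "192.168.100.1"}},   # IP equals gateway
]


def validate_switches(switches):
    """Return a list of human-readable problems found in the inventory."""
    errors = []
    fields = (
        ("serial_number", lambda s: s["serial_number"]),
        ("management_ipv4_address",
         lambda s: s["management"]["management_ipv4_address"]),
    )
    # Values that must be unique across the fabric.
    for field, getter in fields:
        counts = Counter(getter(s) for s in switches)
        for value, count in sorted(counts.items()):
            if count > 1:
                errors.append(f"duplicate {field}: {value}")
    # A switch must not use its own default gateway as its management IP.
    for s in switches:
        mgmt = s["management"]
        ip = ipaddress.ip_address(mgmt["management_ipv4_address"])
        gateway = ipaddress.ip_address(mgmt["default_gateway_v4"])
        if ip == gateway:
            errors.append(f"{s['name']}: management IP equals default gateway")
    return errors


errors = validate_switches(switches)
for error in errors:
    print(error)
```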
Example Config
These examples contain the deployment details you would find in the fabric template, the switch discovery (management IP and serial number), and the overlay networks.
This is NOT a full working configuration but just used to demonstrate what is possible.
global.yaml
underlay.yaml
---
vxlan:
  underlay:
    general:
      routing_protocol: ospf
      replication_mode: multicast
      fabric_interface_numbering: p2p
      subnet_mask: 31
      underlay_routing_loopback_id: 0
      underlay_vtep_loopback_id: 1
      underlay_routing_protocol_tag: UNDERLAY
      underlay_rp_loopback_id: 250
      intra_fabric_interface_mtu: 9216
      layer2_host_interface_mtu: 9216
      unshut_host_interfaces: true
    ipv4:
      underlay_routing_loopback_ip_range: 10.0.0.0/22
      underlay_vtep_loopback_ip_range: 10.100.100.0/22
      underlay_rp_loopback_ip_range: 10.250.250.0/24
      underlay_subnet_ip_range: 10.1.0.0/16
    ospf:
      area_id: 0.0.0.0
    multicast:
      underlay_rp_loopback_id: 251
      underlay_primary_rp_loopback_id: 0
switches.yaml
---
vxlan:
  topology:
    switches:
      - name: site-1-spine1
        serial_number: FF123456789
        role: spine
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.1
      - name: site-1-spine2
        serial_number: FF123456780
        role: spine
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.2
      - name: site-1-leaf1
        serial_number: FF123456788
        role: leaf
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.3
      - name: site-1-leaf2
        serial_number: FF123456787
        role: leaf
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.4
      - name: site-1-leaf3
        serial_number: FF123456786
        role: leaf
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.5
      - name: site-1-leaf4
        serial_number: FF123456785
        role: leaf
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.6
overlay.yaml
---
vxlan:
  overlay_services:
    vrfs:
      - name: vrf-01
        vrf_id: 150001
        vlan_id: 2001
        attach_group: all
      - name: vrf-02
        vrf_id: 150002
        vlan_id: 2002
        attach_group: leaf2
    vrf_attach_groups:
      - name: all
        switches:
          - { hostname: site-1-leaf1 }
          - { hostname: site-1-leaf2 }
          - { hostname: site-1-leaf3 }
          - { hostname: site-1-leaf4 }
      - name: leaf1
        switches:
          - { hostname: site-1-leaf1 }
      - name: leaf2
        switches:
          - { hostname: site-1-leaf2 }
    networks:
      - name: 192_168_10_0_24
        vrf_name: vrf-01
        net_id: 130001
        vlan_id: 2301
        vlan_name: 192_168_10_0_24_vlan2301
        gw_ip_address: "192.168.10.254/24"
        attach_group: all
      - name: 192_168_20_0_24
        vrf_name: vrf-01
        net_id: 130002
        vlan_id: 2302
        vlan_name: 192_168_20_0_24_vlan2302
        gw_ip_address: "192.168.20.254/24"
        attach_group: leaf1
      - name: 192_168_30_0_24
        vrf_name: vrf-02
        net_id: 130003
        vlan_id: 2303
        vlan_name: 192_168_30_0_24_vlan2303
        gw_ip_address: "192.168.30.254/24"
        attach_group: leaf2
    network_attach_groups:
      - name: all
        switches:
          - { hostname: site-1-leaf1, ports: [Ethernet1/1, Ethernet1/2] }
          - { hostname: site-1-leaf2, ports: [Ethernet1/1, Ethernet1/2] }
      - name: leaf1
        switches:
          - { hostname: site-1-leaf1, ports: [] }
      - name: leaf2
        switches:
          - { hostname: site-1-leaf2, ports: [Ethernet1/48] }
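The `attach_group` references in the overlay example are easier to follow once they are resolved into "which network lands on which switch and ports". The Python sketch below does that for the networks and attach groups from the example, copied into a plain dictionary. The `resolve_attachments` function is hypothetical and reflects my reading of the data model rather than how the collection actually processes it.

```python
# Networks and attach groups from the overlay example, as a Python dict.
overlay = {
    "networks": [
        {"name": "192_168_10_0_24", "attach_group": "all"},
        {"name": "192_168_20_0_24", "attach_group": "leaf1"},
        {"name": "192_168_30_0_24", "attach_group": "leaf2"},
    ],
    "network_attach_groups": [
        {"name": "all", "switches": [
            {"hostname": "site-1-leaf1", "ports": ["Ethernet1/1", "Ethernet1/2"]},
            {"hostname": "site-1-leaf2", "ports": ["Ethernet1/1", "Ethernet1/2"]},
        ]},
        {"name": "leaf1", "switches": [
            {"hostname": "site-1-leaf1", "ports": []},
        ]},
        {"name": "leaf2", "switches": [
            {"hostname": "site-1-leaf2", "ports": ["Ethernet1/48"]},
        ]},
    ],
}


def resolve_attachments(overlay):
    """Map each network name to {switch hostname: [ports]} via its attach group."""
    groups = {g["name"]: g["switches"] for g in overlay["network_attach_groups"]}
    return {
        net["name"]: {sw["hostname"]: sw["ports"] for sw in groups[net["attach_group"]]}
        for net in overlay["networks"]
    }


attachments = resolve_attachments(overlay)
for network, per_switch in attachments.items():
    print(network, per_switch)
```

A network that references a group name which does not exist raises a `KeyError` here, which is the sort of typo you would want to catch before deploying.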
vPC configuration
NDFC also helps deploy vPC configuration on a pair of switches.
- If you're using the DC VXLAN EVPN Fabric template you can find the vPC settings in the `vPC` tab of the template wizard. These values are prepopulated to simplify the deployment but can be changed if needed
- Select the first switch and then `vPC Pairing` from the `Switch` tab
- Select the second switch and input the vPC settings. If the settings were configured as part of the template wizard (e.g. in the DC VXLAN EVPN template), you will only need to select the switch
- After running `Recalculate and Deploy` to push the new vPC configuration you can monitor the vPC within the `vPC Overview`
Syncing configurations
Whether you're using NDFC, Ansible/Terraform, or some other method to manage your devices, it's always good to have a single source of truth/centralized configuration. Having one place where you make changes reduces the risk of conflicts and unexpected errors and ensures that the config is accurate and up-to-date.
NDFC implements a configuration compliance check to reduce or eliminate configuration drift. NDFC notices when changes are made manually to a device (e.g. through the CLI) and can redeploy the correct config to that device.
However, in some cases there may be a need to pull config that was changed on a device (e.g. through the CLI) back into NDFC. This can be achieved through a `host_port_resync` policy.
Interface config sync
As mentioned above, it's always best to have a single source of truth for where configuration is applied, however if needed NDFC can pull in config that was separately applied on a device.
What can be synced?
The config sync currently only applies to changes made to interface configuration. See the `Syncing up Switch Interface Configuration` section of the following guide for caveats, guidelines, and limitations
- From the `Fabric` or `Switch` level select the `Policies` tab and `Add Policy`
- Add the `host_port_resync` policy and save
- Make a change on the device interface
- Run `Recalculate and Deploy` from the fabric or switch `Actions` menu. You should see the Host Port Resync pulling the latest config from the devices and then creating host policies
- Confirm that the changes are reflected in the NDFC interfaces tab
- Since the config has now been pulled into NDFC and is part of the configured policies, when you run a new `Recalculate and Deploy` you should not see any pending changes
Configuration compliance
See the Configuration Compliance config guide for more details.
- You can see in the following image that I deleted the BGP configuration from `leaf 1`.
- Configuration Compliance checks run every 60 minutes by default but this can be adjusted if needed. Once the check completes the main dashboard shows an `Out-of-sync` device
- I can preview the config and see the side-by-side comparison
- To return the fabric back to the intended config I can just recalculate and deploy the change
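Conceptually, the compliance check is a comparison between the intended config and what is actually running on the switch. The Python sketch below illustrates that idea with the standard `difflib` module. It is not NDFC's compliance engine; the `pending_changes` function and both config snippets are made up for the example.

```python
import difflib

# Hypothetical intended config held by the controller.
intended = """\
feature bgp
router bgp 65001
  router-id 10.0.0.1
interface loopback0
  ip address 10.0.0.1/32
""".splitlines()

# Hypothetical running config after the BGP section was deleted by hand.
running = """\
interface loopback0
  ip address 10.0.0.1/32
""".splitlines()


def pending_changes(intended, running):
    """Return the intended lines that are missing from the running config."""
    diff = difflib.unified_diff(running, intended, "running", "intended", lineterm="")
    body = list(diff)[2:]  # skip the two file header lines
    return [line[1:] for line in body if line.startswith("+")]


missing = pending_changes(intended, running)
status = "Out-of-sync" if missing else "In-sync"
print(status)
for line in missing:
    print(line)
```

In this sketch, redeploying simply means pushing the `missing` lines back to the device, which matches the "recalculate and deploy" step above in spirit.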
Git integration
Git integration
My lab is using the NDFC 12.2 release which only supports a single Git integration for all templates. It's not currently possible to configure a Git repo per individual template/policy.
Exporting to Git
As seen in the previous sections, NDFC ships with built-in templates and policies you can use to deploy many types of networks. Although you can back up the NDFC configuration, in some cases you may want to export or import only the custom templates you have built without having to import a complete NDFC configuration.
With NDFC you can export a policy to your local machine or export/import using Git. This makes it very easy to manage custom templates/policies, including version control and sharing with others.
- First create a new Git repository. I have selected the checkbox to create a `README.md` file as this will create and set the branch, in my case `main`. You should also edit the README to explain the purpose of the repository.
- You'll need to provide a token so that NDFC has access to push and pull from the repo. In this case I'm using GitHub so I created a Personal Access Token
- Select the `Actions` button in the templates and then `Export to Git`. You'll need to provide the URL to the repo, the branch, and the token you created. If everything is successful you should see a success message.
Having problems with cloning the repo?
- Is the URL correct?
- Is the token correct and does it have read/write access to the repo?
- Is the branch name correct?
- Have you initialized the repo? e.g. added a README.md and created the `main` branch
- The image on the left-hand side shows a brand new empty repo (excluding the README file), while in the right-hand image I created an additional folder called `templates` in the repo which I can use to store the template files. Either option is fine, it just depends on how you want to use the repo (e.g. you might have additional files/folders used outside of NDFC).
- Clicking `Export` pushes the template to your Git repo and you should see something similar to the following config
Importing from Git
You can import from a Git repo using the same process.
- Select the `Actions` button and then `Import from Git`
Inventory and monitoring
Some customers take a different approach with NDFC. While the devices are still configured through the CLI, they use NDFC as a centralized management platform for inventory reporting, visibility and monitoring, software management, config backups, and other day to day operations tasks.
- For inventory you can select one or more devices and select `Export` to download a CSV with all inventory details
- The main fabric dashboard gives you a quick overview of how a site or fabric is operating and where you may need to focus your attention
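Once you have the inventory as a CSV it's easy to slice it with a few lines of Python, for example to find switches that are not yet on a target release. The column names, models, and releases below are hypothetical sample data, so adjust them to match the headers in your own export.

```python
import csv
import io
from collections import Counter

# Hypothetical export; the real column names in your CSV may differ.
EXPORT = """\
Switch,Role,Model,Release
site-1-spine1,spine,N9K-C9364C,10.4(3)
site-1-spine2,spine,N9K-C9364C,10.4(3)
site-1-leaf1,leaf,N9K-C93180YC-FX3,10.4(3)
site-1-leaf2,leaf,N9K-C93180YC-FX3,10.3(5)
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))

# How many switches run each release.
per_release = Counter(row["Release"] for row in rows)


def switches_not_on(release, rows):
    """Names of the switches whose release differs from the target."""
    return [row["Switch"] for row in rows if row["Release"] != release]


outdated = switches_not_on("10.4(3)", rows)
print(dict(per_release))
print(outdated)
```

To use a real export, replace `io.StringIO(EXPORT)` with an open file handle for the downloaded CSV.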
VM Visibility
Whether or not you're using NDFC to manage the config on your devices, it's always helpful to have an understanding of how endpoints connect to the network. There are a few options available in NDFC including the endpoint locator, virtual machine manager integration with VMware, and Kubernetes cluster visibility. You're also able to integrate NDFC with an IPAM to view subnets mapped to a VXLAN fabric created by NDFC, including subnet utilization.
See the following docs for more information:
Software management
Managing devices through a central controller (e.g. NDFC) not only allows you to make config changes to multiple devices at once, but also centralizes day-to-day tasks such as switch firmware management. Here are some of the key features NDFC provides when it comes to device software management.
- Perform upgrades across multiple devices in parallel
- Validate the firmware is correct
- Stage the firmware (i.e. copy it to the switch bootflash) which can reduce the upgrade time required in a maintenance window
- Run pre- and post-upgrade snapshots of the switch configuration to confirm whether anything has changed
See the Nexus Dashboard Fabric Controller Fabric Software whitepaper for more details.
- First navigate to the `Manage` -> `Fabric Software` dashboard. You'll see the fabrics that you are managing, the number of switches, the fabric status etc
- Clicking on `Prepare` allows you to select an image policy which applies to all switches in this fabric. This ensures a consistent image across all switches
- When you upload the device firmware an image policy will automatically be created for you. You could also create a new one manually
- Clicking the `Update plan` button from the fabric software main dashboard will show you the update plan for the fabric. You should see the switches placed into different groups. You're free to determine in which order you upgrade your devices but in many cases customers use an `odd` and `even` group i.e. half the spine/leaf switches in one group and half in the other. You can also manually move switches to a different group
- If needed you can also manage the device firmware individually from the `Devices` tab. The `Validate` button from the `Actions` menu will check if the image is complete, if it's valid for the specified hardware, and if the upgrade can be non-disruptive.
- The `Stage/distribute` button will push the firmware to the devices in that group. The `Install` button will install the new firmware based on the attached policy
- You can click `View Logs` if you need to see the logs on the device for any of the stages
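The odd/even grouping mentioned in the update plan step is simple to reason about in code. Here is a minimal Python sketch that alternates the switches of each role between two groups, so each group ends up with roughly half the spines and half the leafs. The `split_odd_even` function is hypothetical and is not how NDFC builds its update plan; the switch names are the ones used in the example fabric earlier in this post.

```python
from collections import defaultdict

# Switch names and roles from the example fabric in this post.
SWITCHES = [
    ("site-1-spine1", "spine"),
    ("site-1-spine2", "spine"),
    ("site-1-leaf1", "leaf"),
    ("site-1-leaf2", "leaf"),
    ("site-1-leaf3", "leaf"),
    ("site-1-leaf4", "leaf"),
]


def split_odd_even(switches):
    """Alternate the switches of each role between an odd and an even group."""
    groups = {"odd": [], "even": []}
    seen_per_role = defaultdict(int)
    for name, role in switches:
        seen_per_role[role] += 1
        group = "odd" if seen_per_role[role] % 2 else "even"
        groups[group].append(name)
    return groups


plan = split_odd_even(SWITCHES)
for group, members in plan.items():
    print(group, members)
```

Upgrading one group at a time means a spine and one leaf of each neighbouring pair stay in service while the other half reloads.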
Fabric and device backup and restore
In the Configuration Compliance section we saw how you can recalculate and deploy configuration if the switch is out of sync.
It's also possible to back up the switch configuration and restore from a previous snapshot. You can take a backup in two ways.
- Scheduled Configuration Backup as part of the fabric configuration
- Alternatively you can manually take a configuration backup.
- When restoring from a previous configuration you have the option to mark a backup as golden. A golden backup means it won't be deleted even after reaching the maximum number of backups per fabric
- You can either restore a full fabric configuration from backup (see the screenshot above) or you can restore an individual device
Individual switch restore
NDFC supports restoring configurations for individual switches when using the following fabric types:
- Custom Network
- Classic LAN
- Multi-Site External Network
- External Connectivity Network
- The wizard will show you which devices are out-of-sync, i.e. the configuration at the time of the backup is different from the current config
- You can also view the difference between the two configs
- Finally, clicking `Restore intent` will revert the configuration of the device(s) to the selected backup configuration
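The golden backup behaviour described earlier in this section can be pictured as a retention rule: once the maximum is reached the oldest backups are removed, but golden ones are skipped. The Python sketch below models that rule. The `Backup` class and `prune` function are hypothetical, and I'm assuming here that golden backups still count towards the maximum, which may not match NDFC exactly.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    taken: int            # hypothetical timestamp, larger means newer
    golden: bool = False  # golden backups are never deleted


def prune(backups, max_backups):
    """Drop the oldest non-golden backups until at most max_backups remain."""
    kept = sorted(backups, key=lambda b: b.taken)
    for backup in list(kept):  # iterate oldest first over a copy
        if len(kept) <= max_backups:
            break
        if not backup.golden:
            kept.remove(backup)
    return kept


history = [
    Backup("before-migration", taken=1, golden=True),
    Backup("daily-2", taken=2),
    Backup("daily-3", taken=3),
    Backup("daily-4", taken=4),
]

remaining = prune(history, max_backups=2)
print([b.name for b in remaining])
```

Note that if every remaining backup is golden the list can stay above the maximum, because nothing in it is eligible for deletion.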
Easy RMA
Sometimes a device might need to be replaced, for example if it is no longer functioning. In NDFC there are a couple of options to make the replacement as simple as possible.
- Provision RMA with Power On Auto Provision (POAP)
- Manual RMA
What is an RMA?
Return Material Authorization or Return Merchandise Authorization, RMA, is the process of returning a device for a refund or to receive a replacement device
https://en.wikipedia.org/wiki/Return_merchandise_authorization
You can also put the switch into maintenance mode before the replacement.
When you place the switch in maintenance mode, all configured Layer 3 control-plane protocols are isolated from the network. Directly connected routes are not withdrawn or modified during this state. When normal mode is restored, the advertisement of all routes is restored.
Provision RMA with Power On Auto Provision (POAP)
You may have seen the following when you boot a new Nexus 9000 that doesn't contain a startup config. This is the Power On Auto Provisioning (POAP) feature which automates the process of configuring a switch for the first time. When a switch using POAP boots and doesn't find the startup config it sends a `DHCP DISCOVER` message to obtain an IP address and to learn the IP address of a TFTP server. It then downloads and installs the appropriate software image and configuration file.
The NDFC server can be configured as the DHCP server or an external one can be used. When you need to replace the switch, you first have to physically swap it out and connect the new one to the same interfaces. In NDFC you then select the actions button for the switch and then `Provision RMA`. If the DHCP DISCOVER has been successful you should see the new switch appear in the list of replacement switches.
Note: in this example I'm just showing the manual option as I don't have POAP set up correctly in my lab.
For more details see the Zero-Touch Provisioning of VXLAN Fabrics using Inband POAP with NDFC whitepaper.
Manual RMA
NDFC can still help with the replacement of devices if you don't want to set up POAP. In this case you first need to configure the new switch with the same management IP address/route, hostname, and username/password. Then swap out the old switch for the new one and keep the cabling the same. You can then `Rediscover` the switch. As it's a brand new switch, when you run a `Recalculate and Deploy` NDFC should push all the config again and the switch should come online.
- The old switch has been swapped and is now unreachable
- I've cabled in the new switch and configured the same management IP, hostname, and username/password. I can now rediscover the switch again because the IP and credentials haven't changed
- Once rediscovered I can push down the config and as you can see here, it's 512 lines because the switch has been wiped so the entire config will be pushed