
Nexus Dashboard Fabric Controller - What features help me manage a data centre fabric?

Estimated time to read: 30 minutes

  • Originally Written: July, 2024

Info

This post uses the VXLAN as Code project, which makes it very easy to configure NDFC fabrics through YAML files. More details and examples can be found at https://github.com/netascode/ansible-dc-vxlan-example

Want more details?

My colleague Yves has been blogging on DC fabrics and Data Centre Interconnects for many years. He has also put together some great NDFC content on his site and some videos on YouTube.

If you look at the Cisco portfolio you will see a few options for configuring data centre switches and fabrics.

  • Standalone with NX-OS:
    • For customers who just need a couple of switches, or who would prefer to manage the devices with their own processes, e.g. through the CLI or through Ansible scripts they've built
  • NDFC:
    • For customers who want a UI to manage the switches but also want an easy way to configure VXLAN and other network fabric types. NDFC also provides best practice configurations, validated templates, and automation capabilities, for example as part of the AI/ML Network Blueprint
  • ACI:
    • For customers who want a plug and play VXLAN fabric and all its benefits without having to ever think about how to configure the underlay. See this link for more details

This post gives an overview of some of the NDFC features which help simplify data centre network configurations and management.

The features

Templates

Templates for fabrics

Although you can configure individual "standalone" devices, NDFC provides many built-in templates which help you to easily deploy network configuration across multiple switches and sites. This includes traditional 3-tier architectures as well as VXLAN EVPN fabrics and VXLAN Multi-site deployments.

Fabric Type | Description
Data Center VXLAN EVPN | Fabric for a VXLAN EVPN deployment with Nexus 9000 and 3000 switches.
Enhanced Classic LAN | Fabric for a fully automated 3-tier Classic LAN deployment with Nexus 9000 and 7000 switches.
Campus VXLAN EVPN | Fabric for a VXLAN EVPN Campus deployment with Catalyst 9000 and Nexus 9000 as border gateways switches.
BGP Fabric | Fabric for an eBGP based deployment with Nexus 9000 and 3000 switches. Optionally VXLAN EVPN can be enabled on top of the eBGP underlay.
Custom Network | Fabric for flexible deployments with a mix of Nexus and Non-Nexus devices.
Fabric Group | Domain that can contain Enhanced Classic LAN, Classic LAN, and External Connectivity Network fabrics.
Classic LAN | Fabric to manage a legacy Classic LAN deployment with Nexus switches.
LAN Monitor | Fabric for monitoring Nexus switches for basic discovery and inventory management.
VXLAN EVPN Multi-Site | Domain that can contain multiple VXLAN EVPN Fabrics (with Layer-2/Layer-3 Overlay Extensions) and other Fabric Types.
Classic IPFM | Fabric to manage or monitor existing Nexus 9000 switches in an IP Fabric for Media Deployment.
IPFM | Fabric for a fully automated deployment of IP Fabric for Media Network with Nexus 9000 switches.
Multi-Site External Network | Fabric to interconnect VXLAN EVPN for Multi-Site deployments with a mix of Nexus and Non-Nexus devices
External Connectivity Network | Fabric for core and edge router deployments with a mix of Nexus and Non-Nexus devices.

  • You can create one or more fabrics by selecting the Actions button on the Manage -> Fabrics page.
  • Provide a name for the fabric
  • Then select the type of fabric to deploy. As you can see there are many different types.
  • You're then presented with a form which includes one or more tabs and inputs depending on the fabric type selected. For example in the VXLAN EVPN fabric you see inputs such as the underlay/overlay parameters, vPC configuration, and management configuration.

  • Many of the fabric templates will have default inputs which make it easier to get started. For example in the VXLAN EVPN fabric template you only need to put in the BGP ASN and all other inputs will use the default value.

  • Once the fabric definition has been configured you can add the switches. In this case I'm just using a manual discovery based on the mgmt 0 IP address and credentials of the switch.

Brownfield import

Notice in the screenshot below that the Preserve config flag is set to false. This means the switch config will be wiped as part of the onboarding process so that the new fabric configuration can be pushed to the device.

NDFC can also import and manage an existing VXLAN EVPN fabric which you have configured manually through the CLI or another method. This is known as a brownfield import.

When you import an existing VXLAN fabric you need to ensure the Preserve config flag is set to true. This ensures that the current configuration of the switches will be retained.

For more information on this process see the Managing a Brownfield VXLAN BGP EVPN Fabric configuration guide

  • You can set the role of the device which determines the configuration that is deployed. For example a VXLAN leaf or spine role.
  • When you're ready to push the config to the switches select the Actions button at the top of the page and then Recalculate and Deploy. This will calculate the configuration change and deploy to all switches in the fabric
  • You can view the pending changes
  • And finally deploy all the changes to the devices

Templates for individual devices

Templates can also be helpful to manage individual devices. This could be the entire configuration for a switch or just specific sections of the configuration.

  • For example, you may have a base configuration you want to apply to all access switches which enables certain features, configures VLANs, configures security settings etc. In the following screenshots I've created a new template which applies to the whole device and used TEMPLATE_CLI as the content type since this is just the standard NX-OS CLI config
  • Placeholders (in the format $$placeholder$$) can be used to customize the configuration with variables or user input
  • Since the example above is a device config we can apply it at either the fabric level (for multiple switches) or at the individual switch level. In both cases you can select the Policies tab and then Add Policy
  • The policy is selected and any custom user input can be provided
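
To make the placeholder idea concrete, here is a minimal Python sketch of substituting `$$placeholder$$` variables into CLI text. It only illustrates the substitution concept and is not how NDFC renders templates internally; the `render` function, the placeholder names, and the sample config lines are all made up for this example.

```python
import re

# Matches $$NAME$$ placeholders. The allowed name characters are an
# assumption made for this sketch, not a rule taken from NDFC.
PLACEHOLDER = re.compile(r"\$\$([A-Za-z_][A-Za-z0-9_]*)\$\$")


def render(template: str, values: dict) -> str:
    """Replace every $$NAME$$ in the template with values[NAME]."""
    wanted = {match.group(1) for match in PLACEHOLDER.finditer(template)}
    missing = sorted(wanted - set(values))
    if missing:
        # Fail loudly rather than pushing a half-rendered config.
        raise KeyError("no value supplied for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template)


# Hypothetical base config for an access switch.
base_config = """\
hostname $$HOSTNAME$$
vlan $$VLAN_ID$$
  name $$VLAN_NAME$$
"""

rendered = render(
    base_config,
    {"HOSTNAME": "access-sw-01", "VLAN_ID": 100, "VLAN_NAME": "users"},
)
print(rendered)
```

Raising an error on a missing value mirrors the idea that a policy should not be deployable until all of its required inputs have been provided.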

You may also want to use policies for a subset of the configurations.

  • For example, imagine you configured different access switches using the base configuration mentioned above. You could then apply interface configurations for one or more interfaces across one or more switches. Here I am editing the required interfaces and applying an out-of-the-box policy (int_access_host) which accepts user input. This policy also allows you to add freeform config if needed

For more information see the NDFC Templates Configuration Guide

VXLAN as Code

You might have seen the Nexus as Code project which simplifies the deployment of ACI fabric configuration. You can achieve something similar with NDFC using the VXLAN as Code project. This project uses the Network as Code DC VXLAN Ansible Galaxy Collection to simplify the deployment of a VXLAN fabric using NDFC as the controller.

Rather than configuring the fabric by clicking through the UI as we've just seen, with VXLAN as Code you can define all your configuration in YAML files. Ansible then parses these YAML files and applies the configuration to the fabric. This includes discovery/registration of the switches.

There is also validation built into the collection so you can catch any misconfigurations before they end up in production.
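
As a rough illustration of what a pre-deployment check can catch, the following Python sketch looks for duplicate names, serial numbers, and management IPs in a list of switch entries. It is not the validation that ships with the collection; the `check_switches` function and the sample data are assumptions for this example, and the key names simply mirror the switches.yaml example in this post.

```python
import ipaddress
from collections import Counter


def check_switches(switches: list) -> list:
    """Return human-readable problems found in a list of switch entries."""
    problems = []

    # Names and serial numbers should be unique across the fabric.
    for field in ("name", "serial_number"):
        counts = Counter(switch[field] for switch in switches)
        problems += [f"duplicate {field}: {value}" for value, n in counts.items() if n > 1]

    # Two switches must not share a management address.
    mgmt_counts = Counter(s["management"]["management_ipv4_address"] for s in switches)
    problems += [f"duplicate management IP: {ip}" for ip, n in mgmt_counts.items() if n > 1]

    for switch in switches:
        mgmt = switch["management"]
        try:
            address = ipaddress.ip_address(mgmt["management_ipv4_address"])
            gateway = ipaddress.ip_address(mgmt["default_gateway_v4"])
        except ValueError as err:
            problems.append(f"{switch['name']}: {err}")
            continue
        if address == gateway:
            problems.append(f"{switch['name']}: management IP equals the default gateway")
    return problems


# Hypothetical inventory with two deliberate mistakes.
example = [
    {"name": "spine1", "serial_number": "AAA111", "role": "spine",
     "management": {"default_gateway_v4": "192.0.2.1", "management_ipv4_address": "192.0.2.1"}},
    {"name": "leaf1", "serial_number": "AAA111", "role": "leaf",
     "management": {"default_gateway_v4": "192.0.2.1", "management_ipv4_address": "192.0.2.11"}},
]

for problem in check_switches(example):
    print(problem)
```

Checks like these are cheap to run in a CI pipeline before Ansible ever contacts NDFC.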

Example Config

These examples contain the deployment details you would find in the fabric template, the switch discovery (management IP and serial number), and the overlay networks.

This is NOT a full working configuration but just used to demonstrate what is possible.

global.yaml
---
vxlan:
  global:
    name: nac-ndfc1
    bgp_asn: 65111
    route_reflectors: 2
    anycast_gateway_mac: 12:34:56:78:90:00
    dns_servers:
      - ip_address: 8.8.8.8
        vrf: management
    ntp_servers:
      - ip_address: pool.ntp.org
        vrf: management
underlay.yaml
---
vxlan:
  underlay:
    general:
      routing_protocol: ospf
      replication_mode: multicast
      fabric_interface_numbering: p2p
      subnet_mask: 31
      underlay_routing_loopback_id: 0
      underlay_vtep_loopback_id: 1
      underlay_routing_protocol_tag: UNDERLAY
      underlay_rp_loopback_id: 250
      intra_fabric_interface_mtu: 9216
      layer2_host_interface_mtu: 9216
      unshut_host_interfaces: true
    ipv4:
      underlay_routing_loopback_ip_range: 10.0.0.0/22
      underlay_vtep_loopback_ip_range: 10.100.100.0/22
      underlay_rp_loopback_ip_range: 10.250.250.0/24
      underlay_subnet_ip_range: 10.1.0.0/16
    ospf:
      area_id: 0.0.0.0
    multicast:
      underlay_rp_loopback_id: 251
      underlay_primary_rp_loopback_id: 0
switches.yaml
---
vxlan:
  topology:
    switches:
      - name: site-1-spine1
        serial_number: FF123456789
        role: spine
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.1
      - name: site-1-spine2
        serial_number: FF123456780
        role: spine
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.2
      - name: site-1-leaf1
        serial_number: FF123456788
        role: leaf
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.3
      - name: site-1-leaf2
        serial_number: FF123456787
        role: leaf
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.4
      - name: site-1-leaf3
        serial_number: FF123456786
        role: leaf
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.5
      - name: site-1-leaf4
        serial_number: FF123456785
        role: leaf
        management:
          default_gateway_v4: 192.168.100.1
          management_ipv4_address: 192.168.100.6
overlay.yaml
---
vxlan:
  overlay_services:
    vrfs:
      - name: vrf-01
        vrf_id: 150001
        vlan_id: 2001
        attach_group: all
      - name: vrf-02
        vrf_id: 150002
        vlan_id: 2002
        attach_group: leaf2
    vrf_attach_groups:
      - name: all
        switches:
          - { hostname: site-1-leaf1 }
          - { hostname: site-1-leaf2 }
          - { hostname: site-1-leaf3 }
          - { hostname: site-1-leaf4 }
      - name: leaf1
        switches:
          - { hostname: site-1-leaf1 }
      - name: leaf2
        switches:
          - { hostname: site-1-leaf2 }
    networks:
      - name: 192_168_10_0_24
        vrf_name: vrf-01
        net_id: 130001
        vlan_id: 2301
        vlan_name: 192_168_10_0_24_vlan2301
        gw_ip_address: "192.168.10.254/24"
        attach_group: all
      - name: 192_168_20_0_24
        vrf_name: vrf-01
        net_id: 130002
        vlan_id: 2302
        vlan_name: 192_168_20_0_24_vlan2302
        gw_ip_address: "192.168.20.254/24"
        attach_group: leaf1
      - name: 192_168_30_0_24
        vrf_name: vrf-02
        net_id: 130003
        vlan_id: 2303
        vlan_name: 192_168_30_0_24_vlan2303
        gw_ip_address: "192.168.30.254/24"
        attach_group: leaf2
    network_attach_groups:
      - name: all
        switches:
          - { hostname: site-1-leaf1, ports: [Ethernet1/1, Ethernet1/2] }
          - { hostname: site-1-leaf2, ports: [Ethernet1/1, Ethernet1/2] }
      - name: leaf1
        switches:
          - { hostname: site-1-leaf1, ports: [] }
      - name: leaf2
        switches:
          - { hostname: site-1-leaf2, ports: [Ethernet1/48] }
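
Because VRFs and networks refer to attach groups and VRFs by name, a typo in one of those names is an easy mistake to make. The Python sketch below shows one way such references could be cross-checked. It works on a plain dictionary shaped like the `overlay_services` section above (you would need a YAML parser such as PyYAML to load the real file), and the `undefined_references` function is a made-up helper, not part of the VXLAN as Code collection.

```python
def undefined_references(overlay: dict) -> list:
    """Report attach groups and VRFs that are referenced but never defined."""
    problems = []

    # Each item type has its own list of attach groups.
    pairs = (("vrfs", "vrf_attach_groups"), ("networks", "network_attach_groups"))
    for items_key, groups_key in pairs:
        defined = {group["name"] for group in overlay.get(groups_key, [])}
        for item in overlay.get(items_key, []):
            group = item.get("attach_group")
            if group is not None and group not in defined:
                problems.append(
                    f"{items_key}: {item['name']} references undefined attach group {group!r}"
                )

    # Every network should point at a VRF that exists.
    vrf_names = {vrf["name"] for vrf in overlay.get("vrfs", [])}
    for network in overlay.get("networks", []):
        vrf = network.get("vrf_name")
        if vrf is not None and vrf not in vrf_names:
            problems.append(f"networks: {network['name']} references undefined VRF {vrf!r}")
    return problems


# Trimmed-down, hypothetical data with two deliberate mistakes in the second network.
overlay = {
    "vrfs": [{"name": "vrf-01", "attach_group": "all"}],
    "vrf_attach_groups": [{"name": "all", "switches": [{"hostname": "site-1-leaf1"}]}],
    "networks": [
        {"name": "192_168_10_0_24", "vrf_name": "vrf-01", "attach_group": "all"},
        {"name": "192_168_20_0_24", "vrf_name": "vrf-03", "attach_group": "leaf9"},
    ],
    "network_attach_groups": [
        {"name": "all", "switches": [{"hostname": "site-1-leaf1", "ports": []}]},
    ],
}

for problem in undefined_references(overlay):
    print(problem)
```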

vPC configuration

NDFC also helps deploy vPC configuration on a pair of switches.

  • If you're using the DC VXLAN EVPN Fabric template you can find the vPC settings in the vPC tab of the template wizard. These values are prepopulated to simplify the deployment but can be changed if needed
  • Select the first switch and then vPC Pairing from the Switch tab
  • Select the second switch and input the vPC settings. If the settings were configured as part of the template wizard (e.g. in the DC VXLAN EVPN template), you will only need to select the switch
  • After running Recalculate and Deploy to push the new vPC configuration you can monitor the vPC within the vPC Overview

Syncing configurations

Whether you're using NDFC, Ansible/Terraform, or some other method to manage your devices, it's always good to have a single source of truth/centralized configuration. Having one place where you make changes reduces the risk of conflicts and unexpected errors, and helps ensure that the config is accurate and up to date.

NDFC implements a configuration compliance check to reduce or eliminate configuration drift. NDFC notices when changes are made manually to a device (e.g. through the CLI) and can redeploy the correct config to that device.

However in some cases there may be a need to pull config that was changed on a device (e.g. through the CLI) back into NDFC. This can be achieved through a host_port_resync policy.

Interface config sync

As mentioned above, it's always best to have a single source of truth for where configuration is applied, however if needed NDFC can pull in config that was separately applied on a device.

What can be synced?

The config sync currently only applies to changes made to interface configuration. See the Syncing up Switch Interface Configuration section of the following guide for caveats, guidelines, and limitations

Syncing up Switch Interface Configuration

  • From the Fabric or Switch level select the Policies tab and Add Policy
  • Add the host_port_resync policy and save
  • Make a change on the device interface
  • Recalculate and Deploy from the fabric or switch Actions menu. You should see the Host Port Resync pulling the latest config from the devices and then creating host policies
  • Confirm that the changes are reflected in the NDFC interfaces tab
  • Since the config has now been pulled into NDFC and is part of the configured policies, when you run a new Recalculate and Deploy you should not see any pending changes

Configuration compliance

See the Configuration Compliance config guide for more details.

  • You can see in the following image that I deleted the BGP configuration from leaf 1.
  • Configuration Compliance checks run every 60 minutes by default but this can be adjusted if needed. Once the check completes the main dashboard shows an Out-of-sync device
  • I can preview the config and see the side by side comparison
  • To return the fabric back to the intended config I can just recalculate and deploy the change
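
Conceptually, a compliance check compares the intended configuration with what is running on the switch and reports the difference. The Python sketch below shows that idea with the standard library's `difflib`. It is not how NDFC computes compliance internally, and the `pending_changes` function and the config snippets are made up for this example.

```python
import difflib


def pending_changes(intended: str, running: str) -> list:
    """Return unified-diff lines needed to move the running config to the intent."""
    return list(
        difflib.unified_diff(
            running.splitlines(),
            intended.splitlines(),
            fromfile="running",
            tofile="intended",
            lineterm="",
        )
    )


# Hypothetical intent for a leaf switch.
intended_config = """\
feature bgp
router bgp 65111
  router-id 10.0.0.3
"""

# Hypothetical running config after someone removed BGP through the CLI.
running_config = """\
feature bgp
"""

for line in pending_changes(intended_config, running_config):
    print(line)
```

An empty result means the device matches the intent; lines starting with `+` are config that a redeploy would need to push back to the switch.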

Git integration


My lab is using the NDFC 12.2 release which only supports a single Git integration for all templates. It's not currently possible to configure a Git repo per individual template/policy

Exporting to Git

As seen in the previous sections, NDFC ships with built-in templates and policies you can use to deploy many types of networks. Although you can back up the NDFC configuration, in some cases you may want to export or import only the custom templates you have built without having to import a complete NDFC configuration.

With NDFC you can export a policy to your local machine or export/import using Git. This makes it very easy to manage custom templates/policies, including version control and sharing with others.

  • First create a new Git repository. I have selected the checkbox to create a README.md file as this will create and set the branch, in my case main. You should also edit the README to explain the purpose of the repository.
  • You'll need to provide a token so that NDFC has access to push to and pull from the repo. In this case I'm using GitHub so I created a Personal Access Token
  • Select the Actions button in the templates page and then Export to Git. You'll need to provide the URL to the repo, the branch, and the token you created. If everything works you should see a success message.

Having problems with cloning the repo?

  • Is the URL correct?
  • Is the token correct and does it have read/write access to the repo?
  • Is the branch name correct?
  • Have you initialized the repo? e.g. added a README.md and created the main branch
  • The image on the left-hand side shows a brand new empty repo (excluding the README file), while in the right-hand image I created an additional folder called templates in the repo which I can use to store the template files. Either option is fine, it just depends on how you want to use the repo (e.g. you might have additional files/folders used outside of NDFC).
  • Clicking Export pushes the template to your Git repo and you should see something similar to the following config
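
If the export keeps failing, it can help to test the repo URL and branch from your own machine first. The Python sketch below wraps `git ls-remote` and maps its exit code to the troubleshooting questions above. It assumes `git` is installed and can already authenticate to the repo (for example through a credential helper); the function names and the example URL in the comment are hypothetical.

```python
import os
import subprocess


def ls_remote_command(repo_url: str, branch: str) -> list:
    """Build a git command that lists the given branch on the remote."""
    # --exit-code makes git exit with status 2 when no matching ref is found.
    return ["git", "ls-remote", "--exit-code", "--heads", repo_url, branch]


def diagnose(returncode: int) -> str:
    """Translate the exit code of the command above into a hint."""
    if returncode == 0:
        return "ok: repo reachable and branch found"
    if returncode == 2:
        return "repo reachable but branch not found: check the branch name and that the repo is initialized"
    return "could not read the repo: check the URL and the token's access"


def check_remote(repo_url: str, branch: str) -> str:
    """Run the check without letting git stop to prompt for credentials."""
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    result = subprocess.run(
        ls_remote_command(repo_url, branch), capture_output=True, text=True, env=env
    )
    return diagnose(result.returncode)


# Example call (hypothetical repo):
# print(check_remote("https://github.com/example-org/ndfc-templates.git", "main"))
```

This only tells you whether the repo and branch are readable from where you run it; NDFC still needs its own network path to the Git server and a token with write access.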

Importing from Git

You can import from a Git repo using the same process.

  • Select the Actions button and then Import from Git

Inventory and monitoring

Some customers take a different approach with NDFC. While the devices are still configured through the CLI, they use NDFC as a centralized management platform for inventory reporting, visibility and monitoring, software management, config backups, and other day to day operations tasks.

  • For inventory you can select one or more devices and select Export to download a CSV with all inventory details
  • The main fabric dashboard gives you a quick overview of how a site or fabric is operating and where you may need to focus your attention
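
Once you have the exported CSV you can summarize it however you like. As a small sketch, the Python below counts switches per software release. The column names (`Switch`, `Model`, `Release`) and the sample rows are assumptions for this example, so check the header row of your own export before reusing it.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Made-up export; real exports will have different columns and values.
sample_csv = """\
Switch,Model,Release
site-1-spine1,N9K-C9332C,10.4(3)
site-1-leaf1,N9K-C93180YC-FX,10.4(3)
site-1-leaf2,N9K-C93180YC-FX,10.3(5)
"""

for release, count in count_by_column(sample_csv, "Release").items():
    print(f"{release}: {count} switch(es)")
```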

VM Visibility

Whether or not you're using NDFC to manage the config on your devices, it's always helpful to have an understanding of how endpoints connect to the network. There are a few options available in NDFC including the endpoint locator, virtual machine manager integration with VMware, and Kubernetes cluster visibility. You're also able to integrate NDFC with an IPAM to view subnets mapped to a VXLAN fabric created by NDFC, including subnet utilization.

See the following docs for more information:

Software management

Managing devices through a central controller (e.g. NDFC) not only allows you to make config changes to multiple devices at once, but also centralize day to day tasks such as switch firmware management. Here are a couple of the key features NDFC provides when it comes to device software management.

  • Perform upgrades across multiple devices in parallel
  • Validate the firmware is correct
  • Stage the firmware (i.e. copy it to the switch bootflash) which can reduce the upgrade time required in a maintenance window
  • Run pre- and post- upgrade snapshots of the switch configuration to confirm if anything has changed

See the Nexus Dashboard Fabric Controller Fabric Software whitepaper for more details.

  • First navigate to the Manage -> Fabric Software dashboard. You'll see the fabrics that you are managing, the number of switches, the fabric status etc
  • Clicking on Prepare allows you to select an image policy which applies to all switches in this fabric. This ensures a consistent image across all switches
  • When you upload the device firmware an image policy will automatically be created for you. You could also create a new one manually
  • Clicking the Update plan button from the fabric software main dashboard will show you the update plan for the fabric. You should see the switches placed into different groups. You're free to determine in which order you upgrade your devices but in many cases customers use an odd and even group i.e. half the spine/leaf switches in one group and half in the other. You can also manually move switches to a different group
  • If needed you can also manage the device firmware individually from the Devices tab. The Validate button from the Actions menu will check if the image is complete, if it's valid for the specified hardware, and if the upgrade can be non-disruptive.
  • The Stage/distribute button will push the firmware to the devices in that group. The Install button will install the new firmware based on the attached policy
  • You can click View Logs if you need to see the logs on the device for any of the stages
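
The odd/even grouping mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea of splitting each role across two upgrade groups; it is not how NDFC builds its update plan, and the `upgrade_groups` function and group names are made up. The switch names come from the switches.yaml example earlier in this post.

```python
from collections import defaultdict


def upgrade_groups(switches: list) -> dict:
    """Split (name, role) pairs into two groups, alternating within each role."""
    groups = {"group-1": [], "group-2": []}
    seen_per_role = defaultdict(int)
    for name, role in switches:
        # The 1st, 3rd, ... switch of a role goes to group-1, the rest to group-2.
        target = "group-1" if seen_per_role[role] % 2 == 0 else "group-2"
        groups[target].append(name)
        seen_per_role[role] += 1
    return groups


switches = [
    ("site-1-spine1", "spine"),
    ("site-1-spine2", "spine"),
    ("site-1-leaf1", "leaf"),
    ("site-1-leaf2", "leaf"),
    ("site-1-leaf3", "leaf"),
    ("site-1-leaf4", "leaf"),
]

for group, members in upgrade_groups(switches).items():
    print(group, members)
```

Alternating per role means each group keeps one spine and half of the leaves in service while the other group is upgraded. If your leaves are paired (for example as vPC peers), you would also want to confirm that both members of a pair never land in the same group.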

Fabric and device backup and restore

In the Configuration Compliance section we saw how you can recalculate and deploy configuration if the switch is out of sync.

It's also possible to back up switch configuration and restore from a previous snapshot. You can achieve this in two ways.

  • Scheduled Configuration Backup as part of the fabric configuration
  • Alternatively you can manually take a configuration backup.
  • When restoring from a previous configuration you have the option to mark a backup as golden. A golden backup means it won't be deleted even after reaching the maximum number of backups per fabric
  • You can either restore a full fabric configuration from backup (see the screenshot above) or you can restore an individual device
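
The effect of marking a backup as golden can be pictured with a small retention sketch in Python. It assumes backups are pruned oldest first and that golden backups still count towards the limit; both are assumptions made for illustration, as are the `Backup` class and `prune` function, so don't read this as NDFC's actual retention logic.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    golden: bool = False


def prune(backups: list, max_backups: int) -> list:
    """Drop the oldest non-golden backups until at most max_backups remain.

    The input is ordered oldest first. Golden backups are never dropped, so
    the result can be longer than max_backups if enough of them are golden.
    """
    excess = len(backups) - max_backups
    kept = []
    for backup in backups:
        if excess > 0 and not backup.golden:
            excess -= 1  # this backup is removed
            continue
        kept.append(backup)
    return kept


# Hypothetical backup history, oldest first.
history = [Backup("monday", golden=True), Backup("tuesday"), Backup("wednesday"), Backup("thursday")]

print([backup.name for backup in prune(history, max_backups=2)])
```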

Individual switch restore

NDFC supports restoring configurations for individual switches when using the following fabric types

  • Custom Network
  • Classic LAN
  • Multi-Site External Network
  • External Connectivity Network

Reference: https://www.cisco.com/c/en/us/td/docs/dcn/ndfc/1221/articles/ndfc-backup-restore-lan/backing-up-and-restoring-lan-operational-mode-setups.html#_restoring_switch_configurations

  • The wizard will show you which devices are out-of-sync. i.e. the configuration at the time of the backup is different from the current config
  • You can also view the difference between the two configs
  • Finally, clicking Restore intent will revert the configuration of the device(s) to the selected backup configuration

Easy RMA

Sometimes a device might need to be replaced, for example if it is no longer functioning. In NDFC there are a couple of options to make the replacement as simple as possible.

  • Provision RMA with Power On Auto Provision (POAP)
  • Manual RMA

What is an RMA?

Return Material Authorization or Return Merchandise Authorization (RMA) is the process of returning a device for a refund or to receive a replacement device.

https://en.wikipedia.org/wiki/Return_merchandise_authorization

You can also put the switch into maintenance mode before the replacement.

When you place the switch in maintenance mode, all configured Layer 3 control-plane protocols are isolated from the network. Directly connected routes are not withdrawn or modified during this state. When normal mode is restored, the advertisement of all routes is restored.

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/104x/config-guides/cisco-nexus-9000-series-nx-os-system-management-configuration-guide-release-104x/m-configuring-graceful-insertion-and-removal-10x.html

Provision RMA with Power On Auto Provision (POAP)

You may have seen the following when you load a new Nexus 9000 that doesn't contain a startup config. This is the Power On Auto Provisioning (POAP) feature which automates the process of configuring a switch for the first time. When a switch using POAP boots and doesn't find the startup config it sends a DHCP DISCOVER message to get an IP and also find the IP of a TFTP server. It then downloads and installs the appropriate software image and configuration file.

The NDFC server can be configured as the DHCP server or an external one can be used. When you need to replace the switch, you first have to physically swap it out and connect the new switch to the same interfaces. In NDFC you then select the Actions button for the switch and then Provision RMA. If the DHCP DISCOVER has been successful you should see the new switch appear in the list of replacement switches.

Note: in this example I'm just showing the manual option as I don't have POAP set up correctly in my lab.

For more details see the Zero-Touch Provisioning of VXLAN Fabrics using Inband POAP with NDFC whitepaper.

Manual RMA

NDFC can still help with the replacement of devices if you don't want to set up POAP. In this case you first need to configure the new switch with the same management IP address/route, hostname, and username/password. Then swap out the old switch for the new one and keep the cabling the same. You can then rediscover the switch. As it's a brand new switch, when you run a Recalculate and Deploy NDFC should push the full config again and the switch should come online.

  • The old switch has been swapped and is now unreachable
  • I've cabled in the new switch and configured the same management IP, hostname, and username/password. I can now rediscover the switch again because the IP and credentials haven't changed
  • Once rediscovered I can push down the config and as you can see here, it's 512 lines because the switch has been wiped so the entire config will be pushed
