Declaratively deploy Leaf and Spine fabric
This playbook deploys a leaf and spine fabric and its related services in a declarative manner. You only have to define a few key values such as the naming convention, number of devices and address ranges; the playbook is smart enough to do the rest for you.
This came from my project for the IPSpace Building Network Automation Solutions course and was used in part when we were deploying Cisco 9k leaf and spine fabrics in our Data Centers. The playbook is structured in a way that it should hopefully not be too difficult to add templates to deploy leaf and spine fabrics for other vendors. My plan was to add Arista and Juniper but that is unlikely to happen now.
I am now done with building DCs (bring on the :cloud:) and with this being on the edge of the limit of my programming knowledge I don't envisage making any future changes. If any of it is useful to you please do take it and mold it to your own needs.
This README is intended to give enough information to understand the playbook's structure and run it. The variable files hold examples of a deployment with more information on what each variable does. For more detailed information about the playbook have a look at the series of posts I did about it on my blog.
<hr>

The playbook deployment is structured into the following 5 roles with the option to deploy part or all of the fabric.
- base: Non-fabric specific core configuration such as hostname, address ranges, aaa, users, acls, ntp, syslog, etc
- fabric: Fabric specific core elements such as fabric size, interfaces (spine-to-leaf/border), routing protocols (OSPF, BGP) and MLAG
- services: Services provided by the fabric (not the fabric core) are split into three sub-roles:
- tenant: VRFs, SVIs, VLANs and VXLANs on the fabric and their associated VNIs
- interface: Access ports connecting to compute or other non-fabric core network devices
- routing: BGP (address-families), OSPF (additional non-fabric process) and static routes
If you wish to have a more custom build, the majority of the settings in the variable files (unless specifically stated) can be changed, as none of the scripting or templating logic uses the actual contents (dictionary values) to make decisions.
This deployment will scale up to a maximum of 4 spines, 4 borders and 10 leafs; this is how it will be deployed with the default values.
The default ports used for inter-switch links are in the table below; these can be changed within fabric.yml (fbc.adv.bse_intf).
Connection | Start Port | End Port |
---|---|---|
SPINE-to-LEAF | Eth1/1 | Eth1/10 |
SPINE-to-BORDER | Eth1/11 | Eth1/14 |
LEAF-to-SPINE | Eth1/1 | Eth1/4 |
BORDER-to-SPINE | Eth1/1 | Eth1/4 |
MLAG Peer-link | Eth1/5 | Eth1/6 |
MLAG keepalive | mgmt | n/a |
This playbook is based on 1U Nexus devices, therefore a single linecard module is used for all the connections. I have not tested how it will work with multiple modules; the role intf_cleanup is likely not to work. This role ensures interface configuration is declarative by defaulting non-used interfaces, so it could be excluded without breaking the playbook.
As Python is a lot more flexible than Ansible, the dynamic inventory_plugin and filter_plugins (within the roles) do the manipulation of the data in the variable files to create the data models that are used by the templates. This abstracts a lot of the complexity out of the Jinja2 templates, making it easier to create new templates for different vendors as you only have to deal with the device configuration rather than data manipulation.
Fabric Core Variable Elements
These core elements are the minimum requirements to create the declarative fabric. They are used for the dynamic inventory creation as well as by the majority of the Jinja2 templates. All variables are prefixed with ans, bse or fbc to make it easier to identify within the playbook, roles and templates which variable file the variable came from. From the contents of these var_files a dynamic inventory is built containing host_vars of the fabric interfaces and IP addresses.
ansible.yml (ans)
dir_path: Base directory location on the Ansible host that stores all the validation and configuration snippets
device_os: Operating system of each device type (spine, leaf and border)
creds_all: hostname (got from the inventory), username and password
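Below is a minimal sketch of how these ansible.yml values might hang together; the exact nesting and example values here are illustrative rather than copied from the variable file.

```yaml
ans:
  dir_path: ~/device_configs                # base directory for config snippets and reports
  device_os:
    spine_os: nxos
    border_os: nxos
    leaf_os: nxos
  creds_all:
    hostname: "{{ ansible_host }}"          # taken from the dynamic inventory
    username: admin
    password: my_password                   # plain-text or vault
```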
base.yml (bse)
The settings required to onboard and manage the devices, such as hostname format, IP address ranges, aaa, syslog, etc.
device_name: Naming format that the automatically generated 'Node ID' (double decimal format) is added to, and from which the group name is created (in lowercase). The name must contain a hyphen (-) and the characters after that hyphen must be letters, digits or underscores as that is what the group name is created from. For example, using DC1-N9K-SPINE would mean that the device is DC1-N9K-SPINE01 and the group is spine
Key | Value | Information |
---|---|---|
spine | xx-xx | Spine switch device and group naming format |
border | xx-xx | Border switch device and group naming format |
leaf | xx-xx | Leaf switch device and group naming format |
addr: Subnets from which the device specific IP addresses are generated based on the device-type increment and the Node ID. The majority of subnets need to be at least /27 to cover a maximum network size of 4 spines, 10 leafs and 4 borders (18 addresses)
Key | Value | Min size | Information |
---|---|---|---|
lp_net | x.x.x.x/26 | /26 | The range from which the routing (OSPF/BGP), VTEP and vPC loopback addresses are taken (mask will be /32) |
mgmt_net | x.x.x.x/27 | /27 | Management network, by default will use .11 to .30 |
mlag_peer_net | x.x.x.x/26 | /26 or /27 | Range for OSPF peering between MLAG pairs, is split into /30 per-switch pair. Must be /26 if using same range for keepalive |
mlag_kalive_net | x.x.x.x/27 | /27 | Optional keepalive address range (split into /30). If not set uses mlag_peer_net range |
mgmt_gw | x.x.x.x | n/a | Management interface default gateway |
mlag_kalive_net is only needed if not using the management interface for the keepalive or you want separate ranges for the peer-link and keepalive interfaces. The keepalive link is created in its own VRF so it can use duplicate IPs or be kept unique by offsetting it with the fbc.adv.addr_incre.mlag_kalive_incre fabric variable.
There are a lot of other system wide settings in base.yml such as AAA, NTP, DNS, usernames and management ACLs. Anything under bse.services is optional (DNS, logging, NTP, AAA, SNMP, SYSLOG) and will use the management interface and VRF as the source unless specifically set. More detailed information can be found in the variable file.
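As an illustration of the structure described above, a cut-down base.yml might look something like this (key names are from this README, the values and exact layout are placeholders):

```yaml
bse:
  device_name:
    spine: DC1-N9K-SPINE
    border: DC1-N9K-BORDER
    leaf: DC1-N9K-LEAF
  addr:
    lp_net: 192.168.101.0/26          # routing, VTEP and vPC loopbacks (/32 per device)
    mgmt_net: 10.10.108.0/27          # management addresses (.11 to .30 by default)
    mlag_peer_net: 192.168.202.0/26   # OSPF peering between MLAG pairs
    mlag_kalive_net: 10.10.10.0/27    # optional, only if the keepalive is not on mgmt
    mgmt_gw: 10.10.108.1
```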
fabric.yml (fbc)
Variables used to determine how the fabric will be built, the network size, interfaces, routing protocols and address increments. At a bare minimum you only need to declare the size of fabric, total number of switch ports and the routing options.
network_size: How many of each device type make up the fabric. Can range from 1 spine and 2 leafs up to a maximum of 4 spines, 4 borders and 10 leafs. The border and leaf switches are MLAG pairs so must be in increments of 2.
Key | Value | Information |
---|---|---|
num_spines | 2 | Number of spine switches in increments of 1 up to a maximum of 4 |
num_borders | 2 | Number of border switches in increments of 2 up to a maximum of 4 |
num_leafs | 4 | Number of leaf switches in increments of 2 up to a maximum of 10 |
num_intf: The total number of interfaces per-device-type is required to make the interface assignment declarative by ensuring that non-defined interfaces are reset to their default values
Key | Value | Information |
---|---|---|
spine | 1,64 | The first and last interface for a spine switch |
border | 1,64 | The first and last interface for a border switch |
leaf | 1,64 | The first and last interface for a leaf switch |
adv.bse_intf: Interface naming formats and the 'seed' interface numbers used to build the fabric
Key | Value | Information |
---|---|---|
intf_fmt | Ethernet1/ | Interface naming format |
intf_short | Eth1/ | Short interface name used in interface descriptions |
mlag_fmt | port-channel | MLAG interface naming format |
mlag_short | Po | Short MLAG interface name used in MLAG interface descriptions |
lp_fmt | loopback | Loopback interface naming format |
sp_to_lf | 1 | First interface used for SPINE to LEAF links (1 to 10) |
sp_to_bdr | 11 | First interface used for SPINE to BORDER links (11 to 14) |
lf_to_sp | 1 | First interface used for LEAF to SPINE links (1 to 4) |
bdr_to_sp | 1 | First interface used BORDER to SPINE links (1 to 4) |
mlag_peer | 5-6 | Interfaces used for the MLAG peer Link |
mlag_kalive | mgmt | Interface for the keepalive. If it is not an integer uses the management interface |
adv.addr_incre: Increments added to the 'Node ID' and subnet to generate unique device IP addresses. Uniqueness is enforced by using different increments for different device-types and functions
Key | Value | Information |
---|---|---|
spine_ip | 11 | Spine mgmt and routing loopback addresses (default .11 to .14) |
border_ip | 16 | Border mgmt and routing loopback addresses (default .16 to .19) |
leaf_ip | 21 | Leaf mgmt and routing loopback addresses (default .21 to .30) |
border_vtep_lp | 36 | Border VTEP (PIP) loopback addresses (default .36 to .39) |
leaf_vtep_lp | 41 | Leaf VTEP (PIP) loopback addresses (default .41 to .50) |
border_mlag_lp | 56 | Shared MLAG anycast (VIP) loopback addresses for each pair of borders (default .56 to .57) |
leaf_mlag_lp | 51 | Shared MLAG anycast (VIP) loopback addresses for each pair of leafs (default .51 to .55) |
border_bgw_lp | 58 | Shared BGW MS anycast loopback addresses for each pair of borders (default .58 to .59) |
mlag_leaf_ip | 1 | Start IP for leaf OSPF peering over peer-link (default LEAF01 is .1, LEAF02 is .2, LEAF03 is .5, etc) |
mlag_border_ip | 21 | Start IP for border OSPF peering over peer-link (default BORDER01 is .21, BORDER03 is .25, etc) |
mlag_kalive_incre | 28 | Increment added to leaf/border increment (mlag_leaf_ip/mlag_border_ip) for keepalive addresses |
If the management interface is not being used for the keepalive link either specify a separate network range (bse.addr.mlag_kalive_net) or use the peer-link range and define an increment (mlag_kalive_incre) that is added to the peer-link increment (mlag_leaf_ip or mlag_border_ip) to generate unique addresses.
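As a worked illustration of how the increments combine (values chosen to line up with the example host_vars shown later in this README):

```yaml
bse:
  addr:
    mlag_peer_net: 192.168.202.0/26    # peer-link OSPF peering range
    mlag_kalive_net: 10.10.10.0/27     # separate keepalive range
fbc:
  adv:
    addr_incre:
      mlag_leaf_ip: 1                  # LEAF01 peer-link IP = 192.168.202.1/30
      mlag_kalive_incre: 28            # LEAF01 keepalive IP = 10.10.10.29/30 (1 + 28)
```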
route: Settings related to the fabric routing protocols (OSPF and BGP). BFD is not supported on unnumbered interfaces so the routing protocol timers have been shortened (OSPF 2/8, BGP 3/9); these are set under the variable file advanced settings (adv.route)
Key | Value | Mandatory | Information |
---|---|---|---|
ospf.pro | string or integer | Yes | Can be numbered or named |
ospf.area | x.x.x.x | Yes | Area this group of interfaces is in, must be in dotted decimal format |
bgp.as_num | integer | Yes | Local BGP Autonomous System number |
authentication | string | No | Applies to both BGP and OSPF. Hash out if you don't want to set authentication |
acast_gw_mac: The distributed gateway anycast MAC address for all leaf and border switches in the format xxxx.xxxx.xxxx
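A hedged sketch of how the routing section of fabric.yml could look (the nesting is assumed from the key names above, the values are examples only):

```yaml
fbc:
  route:
    ospf:
      pro: UNDERLAY                 # named or numbered process
      area: 0.0.0.0
    bgp:
      as_num: 65001
    authentication: my_password     # hash out if authentication is not wanted
  acast_gw_mac: 0000.2222.3333
```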
Dynamic Inventory
The ansible, base and fabric variables are passed through the inv_from_vars.py inventory_plugin to create the dynamic inventory and host_vars of all the fabric interfaces and IP addresses. By doing this in the inventory the complexity is abstracted from the base and fabric role templates making it easier to expand the playbook to other vendors in the future.
With the exception of intf_mlag and mlag_peer_ip (not on the spines) the following host_vars are created for every host.
- ansible_host: Device's management address
- ansible_network_os: Got from ansible var_file and used by napalm device driver
- intf_fbc: Dictionary of fabric interfaces with interface the keys and description the values
- intf_lp: List of dictionaries with keys of name, ip and description
- intf_mlag: Dictionary of MLAG peer-link interfaces with interface the key and description the value
- mlag_peer_ip: IP of the SVI (default VLAN2) used for the OSPF peering over the MLAG peer-link
- num_intf: Number of the first and last physical interface on the switch
- intf_mlag_kalive: Dictionary of MLAG keepalive link interface with interface the key and description the value (only created if defined)
- mlag_kalive_ip: IP of the keepalive link (only created if defined)
The devices (host-vars) and groups (group-vars) created by the inventory plugin can be checked using the graph flag. It is the inventory config file (.yml), not the inventory plugin (.py), that is referenced when using the dynamic inventory.
ansible-inventory --playbook-dir=$(pwd) -i inv_from_vars_cfg.yml --graph
@all:
|--@border:
| |--DC1-N9K-BORDER01
| |--DC1-N9K-BORDER02
|--@leaf:
| |--DC1-N9K-LEAF01
| |--DC1-N9K-LEAF02
| |--DC1-N9K-LEAF03
| |--DC1-N9K-LEAF04
|--@spine:
| |--DC1-N9K-SPINE01
| |--DC1-N9K-SPINE02
|--@ungrouped:
The host flag shows the host-vars for that specific host whereas list shows everything, all host-vars and group-vars.
ansible-inventory --playbook-dir=$(pwd) -i inv_from_vars_cfg.yml --host DC1-N9K-LEAF01
ansible-inventory --playbook-dir=$(pwd) -i inv_from_vars_cfg.yml --list
An example of the host_vars created for a leaf switch.
{
"ansible_host": "10.10.108.21",
"ansible_network_os": "nxos",
"intf_fbc": {
"Ethernet1/1": "UPLINK > DC1-N9K-SPINE01 - Eth1/1",
"Ethernet1/2": "UPLINK > DC1-N9K-SPINE02 - Eth1/1"
},
"intf_lp": [
{
"descr": "LP > Routing protocol RID and peerings",
"ip": "192.168.101.21/32",
"name": "loopback1"
},
{
"descr": "LP > VTEP Tunnels (PIP) and MLAG (VIP)",
"ip": "192.168.101.41/32",
"mlag_lp_addr": "192.168.101.51/32",
"name": "loopback2"
}
],
"intf_mlag_kalive": {
"Ethernet1/7": "UPLINK > DC1-N9K-LEAF02 - Eth1/7 < MLAG Keepalive"
},
"intf_mlag_peer": {
"Ethernet1/5": "UPLINK > DC1-N9K-LEAF02 - Eth1/5 < Peer-link",
"Ethernet1/6": "UPLINK > DC1-N9K-LEAF02 - Eth1/6 < Peer-link",
"port-channel1": "UPLINK > DC1-N9K-LEAF02 - Po1 < MLAG Peer-link"
},
"mlag_kalive_ip": "10.10.10.29/30",
"mlag_peer_ip": "192.168.202.1/30",
"num_intf": "1,64"
}
To use the inventory plugin in a playbook reference the inventory config file in place of the normal hosts inventory file (-i).
ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml
Services - Tenant (svc_tnt)
Tenants, SVIs, VLANs and VXLANs are created based on the variables stored in the service_tenant.yml file (svc_tnt.tnt).
tnt: A list of tenants that contains a list of VLANs (Layer2 and/ or Layer3)
- Tenants (VRFs) will only be created on a leaf or border if a VLAN within that tenant is to be created on that device
- Even if a tenant is not a layer3 tenant a VRF will still be created and the L3VNI and tenant VLAN number reserved
- If the tenant is a layer3 tenant the route-map for redistribution is always created and attached to the BGP peer
Key | Value | Mandatory | Information |
---|---|---|---|
tenant_name | string | Yes | Name of the VRF |
l3_tenant | True or False | Yes | Does it need SVIs or is routing done off the fabric (i.e. an external router) |
bgp_redist_tag | integer | No | Tag used to redistributed SVIs into BGP, by default uses tenant SVI number |
vlans | list | Yes | List of VLANs within this tenant (see the below table) |
vlans: A list of VLANs within a tenant which at a minimum need the layer2 values of name and num. VLANs and SVIs can only be created on all leafs and/or all borders; you can't selectively say which individual leaf or border switches to create them on
- Unless an IP address is assigned to a VLAN (ip_addr) it will only be a L2 VLAN
- L3 VLANs are automatically redistributed into BGP. This can be disabled (ipv4_bgp_redist: False) on a per-vlan basis
- By default VLANs will only be created on the leaf switches (create_on_leaf). This can be changed on a per-vlan basis to create only on borders (create_on_border) or on both leafs and borders
- To add a non-VXLAN SVI (without an anycast address) create the VLAN as normal but with the extra vxlan: False dictionary. The SVI is defined in service_interface.yml as type: svi
- Optional settings will implicitly use the default value, they only need defining if not using the default value
Key | Value | Mand | Information |
---|---|---|---|
num | integer | Yes | The VLAN number |
name | string | Yes | The VLAN name |
ip_addr | x.x.x.x/x | No | Adding an IP address automatically makes the VLAN L3 (not set by default) |
ipv4_bgp_redist | True or False | No | Dictates whether the SVI is redistributed into BGP VRF address family (default True) |
create_on_leaf | True or False | No | Dictates whether this VLAN is created on the leafs (default True) |
create_on_border | True or False | No | Dictates whether this VLAN is created on the borders (default False) |
vxlan | True or False | No | Whether it is a VXLAN or a normal VLAN. Only needed if you don't want it to be a VXLAN |
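Putting the tenant and VLAN options together, a minimal tenant entry might look like this (the tenant and VLAN values are made up for illustration):

```yaml
svc_tnt:
  tnt:
    - tenant_name: BLU
      l3_tenant: True
      vlans:
        - num: 10
          name: blu_app_vl10
          ip_addr: 10.10.10.1/24      # makes it an L3 VLAN with an anycast SVI
        - num: 20
          name: blu_db_vl20           # no ip_addr so stays a L2-only VLAN
          create_on_border: True      # also create on the border switches
```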
The redistribution route-map name can be changed in the advanced (adv) section of services_tenant.yml or service_route.yml. If defined in both places the setting in service_route.yml takes precedence.
L2VNI and L3VNI numbers
The L2VNI and L3VNI values are automatically derived and incremented on a per-tenant basis based on the start and increment seed values defined in the advanced section (svc_tnt.adv) of services_tenant.yml.
adv.bse_vni: Starting VNI numbers
Key | Value | Information |
---|---|---|
tnt_vlan | 3001 | Starting VLAN number for the transit L3VNI |
l3vni | 10003001 | Starting L3VNI number |
l2vni | 10000 | Starting L2VNI number, the VLAN number will be added to this |
adv.vni_incre: Number by which VNIs are incremented for each tenant
Key | Value | Information |
---|---|---|
tnt_vlan | 1 | Value by which the transit L3VNI VLAN number is increased for each tenant |
l3vni | 1 | Value by which the transit L3VNI VNI number is increased for each tenant |
l2vni | 10000 | Value by which the L2VNI range (range + vlan) is increased for each tenant |
For example, a two tenant fabric each with a VLAN 20 using the above values would have L3 tenant SVIs of 3001 and 3002, L3VNIs of 10003001 and 10003002, and L2VNIs of 10020 and 20020.
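The same arithmetic laid out per tenant (these are derived values, not variable-file keys):

```yaml
tenant_1: {tnt_vlan: 3001, l3vni: 10003001, l2vni_vl20: 10020}   # 10000 + 20
tenant_2: {tnt_vlan: 3002, l3vni: 10003002, l2vni_vl20: 20020}   # (10000 + 10000) + 20
```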
A new data-model is created from the services_tenant.yml variables by passing them through the format_dm.py filter_plugin method create_svc_tnt_dm along with the BGP route-map name (if exists) and ASN (from fabric.yml). The result is a per-device-type (leaf and border) list of tenants, SVIs and VLANs which are used to render the svc_tnt_tmpl.j2 template and create the config snippet.
Below is an example of the data model format for a tenant and its VLANs.
{
"bgp_redist_tag": 99,
"l3_tnt": true,
"l3vni": 100003004,
"rm_name": "RM_CONN->BGP65001_RED",
"tnt_name": "RED",
"tnt_redist": true,
"tnt_vlan": 3004,
"vlans": [
{
"create_on_border": true,
"create_on_leaf": false,
"ip_addr": "10.99.99.1/24",
"ipv4_bgp_redist": true,
"name": "red_inet_vl99",
"num": 99,
"vni": 40099
},
{
"ip_addr": "l3_vni",
"ipv4_bgp_redist": false,
"name": "RED_L3VNI",
"num": 3004,
"vni": 100003004
}
]
}
Services - Interface (svc_intf)
The service_interface.yml variables define single or dual-homed interfaces (including port-channel) either statically or dynamically.
- By default all interfaces are dual-homed LACP 'active'. The vPC number cannot be changed; it is always the same as the port-channel number
- Interfaces and port-channels can be assigned dynamically from a pre-defined pool (under svc_intf.adv) or specified manually
- If the tenant (VRF) is not defined for a layer3, SVI or loopback interface it will be created in the global routing table
- If the interface config is the same across multiple switches (like an access port) define one interface with a list of switches
- Only specify the odd numbered switch for dual-homed interfaces, the config for the MLAG neighbor is automatically generated
There are 7 pre-defined interface types that can be deployed:
- access: A single VLAN layer2 access port with STP set to 'edge'
- stp_trunk: A trunk going to a device that supports Bridge Assurance. STP is set to 'network'
- stp_trunk_non_ba: Same as stp_trunk except STP is set to 'normal' as it is for devices that don't support BA
- non_stp_trunk: A trunk port going to a device that doesn't support BPDU. STP is set to 'edge' and BPDU Guard enabled
- layer3: A layer3 interface with an IP address. Must be single-homed as MLAG not supported for L3 interfaces
- loopback: A loopback interface with an IP address (must be single-homed)
- svi: To define a SVI the VLAN must exist in service_tenant.yml and not be a VXLAN (must be single-homed)
The intf.single_homed and intf.dual_homed dictionaries hold a list of all single-homed or dual-homed interfaces using any of the attributes in the table below. If there are no single-homed or dual-homed interfaces on the fabric hash out the relevant dictionary.
Key | Value | Mand | Information |
---|---|---|---|
descr | string | Yes | Interface or port-channel description |
type | intf_type | Yes | Either access, stp_trunk, stp_trunk_non_ba, non_stp_trunk, layer3, loopback or svi |
ip_vlan | vlan or ip | Yes | Depends on the type, either ip/prefix, vlan or multiple vlans separated by , and/or - |
switch | list | Yes | List of switches created on. If dual-homed needs to be odd numbered switch from MLAG pair |
tenant | string | No | Layer3, svi and loopbacks only. If not defined the default VRF is used (global routing table) |
po_mbr_descr | list | No | PO member interface description, [odd_switch, even_switch]. If undefined uses PO descr |
po_mode | string | No | Set the Port-channel mode, 'on', 'passive' or 'active' (default is 'active') |
intf_num | integer | No | Only specify the number, the name and module are got from the fbc.adv.bse_intf.intf_fmt |
po_num | integer | No | Only specify the number, the name is got from the fbc.adv.bse_intf.mlag_fmt |
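Tying the attributes together, a single-homed and a dual-homed interface could be declared along these lines (the switch names and values are illustrative and chosen to match the data model example further down):

```yaml
svc_intf:
  intf:
    single_homed:
      - descr: UPLINK > DC1-BIP-LB01 - Eth1.1
        type: access
        ip_vlan: 30
        switch: [DC1-N9K-LEAF01]
        intf_num: 9                    # omit to have it picked from the dynamic pool
    dual_homed:
      - descr: UPLINK > DC1-SWI-BLU01 - Gi0/0
        type: stp_trunk
        ip_vlan: "10,20,30"
        po_mode: "on"                  # quoted so YAML doesn't read it as a boolean
        switch: [DC1-N9K-LEAF01]       # odd switch only, LEAF02 config is generated
```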
The playbook has the logic to recognize if statically defined interface numbers overlap with the dynamic interface range and exclude them from dynamic interface assignment. For simplicity it is probably best to use separate ranges for the dynamic and static assignments.
adv.single_homed: Reserved range of interfaces to be used for dynamic single-homed and loopback assignment
Key | Value | Information |
---|---|---|
first_intf | integer | First single-homed interface to be dynamically assigned |
last_intf | integer | Last single-homed interface to be dynamically assigned |
first_lp | integer | First loopback number to be dynamically used |
last_lp | integer | Last loopback number to be dynamically used |
adv.dual_homed: Reserved range of interfaces to be used for dynamic dual-homed and port-channel assignment
Key | Value | Information |
---|---|---|
first_intf | integer | First dual-homed interface to be dynamically assigned |
last_intf | integer | Last dual-homed interface to be dynamically assigned |
first_po | integer | First port-channel number to be dynamically used |
last_po | integer | Last port-channel number to be dynamically used |
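For reference, the dynamic assignment pools might be reserved like this (the interface and port-channel numbers are placeholders):

```yaml
svc_intf:
  adv:
    single_homed:
      first_intf: 33
      last_intf: 40
      first_lp: 11
      last_lp: 20
    dual_homed:
      first_intf: 41
      last_intf: 48
      first_po: 41
      last_po: 48
```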
The format_dm.py filter_plugin method create_svc_intf_dm is run for each inventory host to produce a list of all interfaces to be created on that device. In addition to the services_interface.yml variables it also passes in the interface naming format (fbc.adv.bse_intf) to create the full interface name and hostname to find the interfaces relevant to that device. This is saved to the fact flt_svc_intf which is used to render the svc_intf_tmpl.j2 template and create the config snippet.
Below is an example of the data model format for a single-homed and dual-homed interface.
{
"descr": "UPLINK > DC1-BIP-LB01 - Eth1.1",
"dual_homed": false,
"intf_num": "Ethernet1/9",
"ip_vlan": 30,
"stp": "edge",
"type": "access"
},
{
"descr": "UPLINK > DC1-SWI-BLU01 - Gi0/0",
"dual_homed": true,
"intf_num": "Ethernet1/18",
"ip_vlan": "10,20,30",
"po_mode": "on",
"po_num": 18,
"stp": "network",
"type": "stp_trunk"
},
{
"descr": "UPLINK > DC1-SWI-BLU01 - Po18",
"intf_num": "port-channel18",
"ip_vlan": "10,20,30",
"stp": "network",
"type": "stp_trunk",
"vpc_num": 18
}
Interface Cleanup - Defaulting Interfaces
The interface cleanup role is required to make sure any interfaces not assigned by the fabric or the services (svc_intf) role have a default configuration. Without this, if an interface was changed (for example a server moved to a different interface) the old interface would not have its configuration put back to the default values.
This role goes through the interfaces assigned by the fabric (from the inventory) and service_interface role (from the svc_intf_dm method) producing a list of used physical interfaces which are then subtracted from the list of all of the switch's physical interfaces (fbc.num_intf). It has to be run after the fabric or service_interface role as it needs to know what interfaces have been assigned, therefore it uses tags to ensure it is run anytime either of these roles are run.
Services - Route (svc_rte)
BGP peerings, non-backbone OSPF processes, static routes and redistribution (connected, static, bgp, ospf) are configured based on the variables specified in the service_route.yml file. The naming convention of the route-maps and prefix-lists used by OSPF and BGP can be changed under the advanced section (adv) of the variable file.
I am undecided about this role as it goes against the simplistic principles used by the other roles. By its very nature routing is very configurable which leads to complexity due to the number of options and inheritance. In theory all these features should work but due to the number of options and combinations available I have not tested all the possible variations of configuration.
Static routes (svc_rte.static_route)
Routes are added per-tenant with the tenant being the top-level dictionary that routes are created under.
- tenant, switch and prefix are lists to make it easy to apply the same routes across multiple devices and tenants
- Routes with the same attributes (like next-hop) can be grouped as a list within the one prefix dictionary value
Parent dict | Key | Value | Mand | Information |
---|---|---|---|---|
n/a | tenant | list | Yes | List of tenants to create the routes in. Use 'global' for the global routing table |
n/a | switch | list | Yes | List of switches to create all routes on (alternatively can be set per-route) |
route | prefix | list | Yes | List of routes that all have same settings (gateway, interface, switch, etc) |
route | gateway | x.x.x.x | Yes | Next hop gateway address |
route | interface | string | No | Next hop interface, use interface full name (Ethernet), Vlan or Null0 |
route | ad | integer | No | Set the admin distance for this group of routes (1 - 255) |
route | next_hop_vrf | string | No | Set the VRF for next-hop if it is in a different VRF (route leaking between VRFs) |
route | switch | list | No | Switches to create this group of routes on (overrides static_route.switch) |
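A possible static route definition following the table above (the tenants, switches and prefixes are invented for the example):

```yaml
svc_rte:
  static_route:
    - tenant: [BLU, global]
      switch: [DC1-N9K-BORDER01, DC1-N9K-BORDER02]
      route:
        - prefix: [0.0.0.0/0]
          gateway: 10.99.99.254
        - prefix: [10.20.0.0/16, 10.30.0.0/16]
          gateway: 10.99.99.1
          ad: 200
```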
OSPF (svc_rte.ospf)
An OSPF process can be configured for any of the tenants or the global routing table.
- Each OSPF process is enabled on a per-interface basis with summarization and redistribution defined on a per-switch basis
- The mandatory process.switch list defines the switches the OSPF process is configured on
- Non-mandatory settings only need to be defined if changing the default behavior, otherwise there is no need to add the dictionary
Key | Value | Mand | Information |
---|---|---|---|
process | integer or string | Yes | The process can be a number or word |
switch | list | Yes | List of switches to create the OSPF process on |
tenant | string | No | The VRF OSPF is enabled in. If not defined uses the global routing table |
rid | list | No | List of RIDs, must match number of switches (if undefined uses highest loopback) |
bfd | True | No | Enable BFD globally for all interfaces (disabled by default) |
default_orig | True, always | No | Conditionally (True) or always advertise default route (disabled by default) |
Interface, summary and redistribution are child dictionaries of lists under the ospf parent dictionary. They inherit process.switch unless switch is specifically defined under that child dictionary.
ospf.interface: Each list element is a group of interfaces with the same set of attributes (area number, interface type, auth, etc)
Key | Value | Mand | Information |
---|---|---|---|
name | list | Yes | List of one or more interfaces. Use interface full name (Ethernet) or Vlan |
area | x.x.x.x | Yes | Area this group of interfaces is in, must be in dotted decimal format |
switch | list | No | Which switches to enable OSPF on these interfaces (inherits process.switch if not set) |
cost | integer | No | Statically set the interfaces OSPF cost, can be 1-65535 |
authentication | string | No | Enable authentication for the area and a password (Cisco type 7) for this interface |
area_type | string | No | By default is normal. Can be set to stub, nssa, stub/nssa no-summary, nssa default-information-originate or nssa no-redistribution |
passive | True | No | Make the interface passive. By default all configured interfaces are non-passive |
hello | integer | No | Interface hello interval (deadtime is x4), automatically disables BFD for this interface |
type | point-to-point | No | By default all interfaces are broadcast, can be changed to point-to-point |
ospf.summary: All summaries with the same attributes (switch, filter, area) can be grouped in a list within the one prefix dictionary value
Key | Value | Mandatory | Information |
---|---|---|---|
prefix | list | Yes | List of summaries to apply on all the specified switches |
switch | list | No | What switches to summarize on, inherits process.switch if not set |
area | x.x.x.x | No | By default it is LSA5. For LSA3 add an area to summarize from that area |
filter | not-advertise | No | Stops advertisement of the summary and subordinate subnets (is basically filtering) |
ospf.redist: Each list element is the redistribution type (ospf_xx, bgp_xx, static or connected). Redistributed prefixes can be filtered (allow) or weighted (metric), with the route-map order being metric and then allow. If the allow list is not set it will allow any (empty route-map)
Key | Value | Mand | Information |
---|---|---|---|
type | string | Yes | Redistribute either OSPF process, BGP AS, static or connected |
switch | list | No | What switches to redistribute on, inherits process.switch if not set |
metric | dict | No | Add metric to redistributed prefixes. Keys are metric value and values a list of prefixes or keyword ('any' or 'default'). Can't use metric with a type of connected |
allow | list, any, default | No | List of prefixes (connected is list of interfaces) or keyword ('any' or 'default') to redistribute |
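Pulling the process, interface, summary and redist options together, an OSPF entry could be structured roughly as follows (the process number, tenant and prefixes are placeholders):

```yaml
svc_rte:
  ospf:
    - process: 99
      tenant: BLU
      switch: [DC1-N9K-BORDER01, DC1-N9K-BORDER02]
      interface:
        - name: [Vlan99]
          area: 0.0.0.0
      summary:
        - prefix: [10.10.0.0/16]
      redist:
        - type: connected
          allow: [Vlan99]              # for connected the allow list is interfaces
```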
BGP
Uses the concept of groups and peers with the majority of the settings configurable in either.
- group holds the global settings for all peers within it. Groups are automatically created on any switches that peers within them are created on
- peer is a list of peers within the group. If a setting is configured in both the group and the peer, the peer setting will take precedence
- The group.name and peer.name are used in the construction of route-map and prefix-list names (formatting is in advanced)
- Non-mandatory settings only need to be defined if changing the default behavior, otherwise is no need to add the dictionary
Set in | Key | Value | Mand | Information |
---|---|---|---|---|
group | name | string | Yes | Name of the group, no whitespaces or duplicate names (group or peer) |
peer | name | string | Yes | Name of the peer, no whitespaces or duplicate names (group or peer) |
peer | peer_ip | x.x.x.x | Yes | IP address of the peer |
peer | descr | string | Yes | Description of the peer |
both | switch | list | Yes | List of switches (even if only 1) to create the group and peers on |
both | tenant | list | No | List of tenants (even if only 1) to create the peers under |
both | remote_as | integer | Yes | Remote AS of this peer or if group all peers within that group |
both | timers | [kl,ht] | No | List of [keepalive, holdtime], if not defined uses [3, 9] seconds |
both | bfd | True | No | Enable BFD for an individual peer or all peers in group (disabled by default) |
both | password | string | No | Plain-text password to authenticate a peer or all peers in group (default none) |
both | default | True | No | Advertise default route to a peer or all peers in the group (default False) |
both | update_source | string | No | Set the source interface used for peerings (default not set) |
both | ebgp_multihop | integer | No | Increase the number of hops for eBGP peerings (2 to 255) |
both | next_hop_self | True | No | Set the next-hop to itself for any advertised prefixes (default not set) |
inbound or outbound: Optionally set under the group or peer to filter BGP advertisements and/ or BGP attribute manipulation
- The naming of the route-maps and prefix-lists is dependent on where they are applied (group or peer)
- All attribute settings are dictionaries with the key being the attribute and the value the prefixes it is applied to
Key | Value | Direction | Information |
---|---|---|---|
weight | dict | inbound | Keys are the weight and the value a list of prefixes or keyword ('any' or 'default') |
pref | dict | inbound | Keys are the local preference and the value a list of prefixes or keyword |
med | dict | outbound | Keys are the MED value and the values a list of prefixes or keyword |
as_prepend | dict | outbound | Keys are the number of times to add the ASN and values a list of prefixes or keyword |
allow | list, any, default | both | Can be a list of prefixes or a keyword to advertise just the default route or anything |
deny | list, any, default | both | Can be a list of prefixes or a keyword to not advertise the default route or anything |
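As a rough sketch of a group with one peer and some inbound/outbound policy (all names, IPs and AS numbers are made up):

```yaml
svc_rte:
  bgp:
    group:
      - name: INET
        remote_as: 65201
        switch: [DC1-N9K-BORDER01, DC1-N9K-BORDER02]
        peer:
          - name: GTT
            peer_ip: 10.99.99.254
            descr: GTT Internet peering
            inbound:
              weight: {100: [10.50.0.0/16]}   # weight 100 applied to this prefix
            outbound:
              allow: default                  # only advertise the default route
```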
bgp.tnt_advertise: Optionally advertise prefixes on a per-tenant basis (list of VRFs) using network, summary and redistribution. The switch can be set globally for all network/summary/redist in a VRF and be overridden on an individual per-prefix basis
- network: List of prefixes to be advertised on a per-switch basis (network cmd). If a device is covered by 2 different network.prefix statements it will get a combination of them both (merged), so network statements for all prefixes
- summary: Group summaries (aggregate-address) with the same attributes (switch and summary_only) within the same list element
- redist: Each list element is the redistribution type (ospf process, static or connected) with the redistributed prefixes weighted (metric) and/or filtered (allow). If the allow list is not set it is allow any (empty route-map). Can only have one each of types connected and static per-switch; the first occurrence is used. The switch set under the redistribution type is preferred over that set in process.switch; there is no merging
Set in | Key | Value | Mand | Information |
---|---|---|---|---|
tnt_advertise | name | string | Yes | A single VRF that is being advertised into (use 'global' for the global routing table) |
all | switch | list | Yes | What switches to redistribute on, inherits process.switch if not set |
network/summary | prefix | list | Yes | List of prefixes to advertise |
summary | filter | summary-only | No | Only advertise the summary, suppress all prefixes within it (disabled by default) |
redist | type | string | Yes | Redistribute ospf_process (whitespace before process), static or connected |
redist | metric | dict | No | Add metric to redistributed prefixes. Keys are the MED value and values a list of prefixes or keyword ('any' or 'default'). Can't use metric with connected |
redist | allow | list, any, default | No | List of prefixes (can use 'ge' and/or 'le'), interfaces (for connected) or keyword ('any' or 'default') to redistribute |
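A possible tnt_advertise entry combining network, summary and redistribution (the VRF name, switches and prefixes are illustrative):

```yaml
svc_rte:
  bgp:
    tnt_advertise:
      - name: BLU
        switch: [DC1-N9K-BORDER01, DC1-N9K-BORDER02]
        network:
          - prefix: [10.10.10.0/24, 10.10.20.0/24]
        summary:
          - prefix: [10.10.0.0/16]
            filter: summary-only
        redist:
          - type: ospf 99
            allow: [10.30.0.0/16 ge 24]
```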
Advanced settings (svc_rte.adv) allow the changing of the default routing protocol timers and naming format of the route-maps and prefix-lists used for advertisement and redistribution.
The filter_plugin method create_svc_rte_dm is run for each inventory host to produce a data model of the routing configuration for that device. The outcome is a list of seven per-device data models that are used by the svc_rte_tmpl.j2 template.
- all_pfx_lst: List of all prefix-lists with each element in the format [name, seq, permission, prefix]
- all_rm: List of all route-maps with each element in the format [name, seq, permission, prefix, [attribute, value]]. If no BGP attributes are set in the RM the last entry in the list will be [null, null]
- stc_rte: Per-VRF dictionaries (VRF is the key) of lists of static routes with interface and/or gateway, optional AD and destination VRF
- group: Dictionaries of BGP groups (group is the key) that have peers on this device. The value is dictionaries of any group settings
- peer: Dictionaries of tenants (VRFs) containing the following nested dictionaries:
- peers: Dictionary of peers (key is the peer) with the value being dictionaries of the peers settings
- network: List of networks to be advertised by BGP
- summary: Dictionary of summaries with the key being the prefix and value either null (doesn't suppress) or summary-only
- redist: Two dictionaries of the route-map name (rm_name) and redistribution type (connected, static, etc)
- ospf_proc: Dictionary of VRFs (key) and the OSPF process settings for each VRF (settings configured under the process)
- ospf_intf: Dictionary of interfaces (key) that have OSPF enabled, the values are the interface specific OSPF settings
Passwords
There are four main types of passwords used within the playbooks.
- BGP/OSPF: In the variable file it is in plain text but in the device running configuration it is encrypted
- Users: Has to be in encrypted format (type-5) in the variable file
- TACACS: Has to be in the encrypted format (type-7) in the variable file. Could use type-6 but would also need to generate a master key
- device: The password used by Napalm to log into devices, defined under ans.creds_all. Can be plain-text or use vault
Input validation
Pre-task input validation checks are run on the variable files with the goal being to highlight any problems with the variables before any of the fabric build tasks are started. Fail fast based on logic rather than failing halfway through a build. Pre-validation checks for things such as missing mandatory variables, variables being of the correct type (str, int, list, dict), IP addresses being valid, duplicate entries, dependencies (VLANs assigned but not created), etc. It won't catch everything but will eliminate a lot of the needless errors that would break a fabric build.
A combination of Python assert within a filter plugin (to identify any issues) and Ansible assert within the playbook (to return user-friendly information) is used to achieve the validation. All the error messages returned by input validation start with the nested location of the variable to make it easier to find.
It is run using the pre_val tag and will conditionally only check variable files that have been defined under var_files. It can be run using the inventory plugin but will fail if any of the values used to create the inventory are wrong, so it is better to use a dummy hosts file.
ansible-playbook playbook.yml -i hosts --tag pre_val
ansible-playbook playbook.yml -i inv_from_vars_cfg.yml --tag pre_val
A full list of what variables are checked and the expected input can be found in the header notes of the filter plugin input_validate.py.
Playbook Structure
The main playbook (PB_build_fabric.yml) is divided into 3 sections with roles used to do the data manipulation and templating
- pre_tasks: Pre-validation checks and deletion/creation of file structure (at each playbook run) to store config snippets
- tasks: Imports tasks from roles which in turn use variables (.yml) and templates (.j2) to create the config snippets
- base: From base.yml and bse_tmpl.j2 creates the base configuration snippet (aaa, logging, mgmt, ntp, etc)
- fabric: From fabric.yml and fbc_tmpl.j2 creates the fabric configuration snippet (interfaces, OSPF, BGP)
- services: Per-service-type tasks, templates and plugins to create the config for services that run on the fabric
- svc_tnt: From services_tenant.yml and svc_tnt_tmpl.j2 creates the tenant config snippet (VRF, SVI, VXLAN, VLAN)
- svc_intf: From services_interface.yml and svc_intf_tmpl.j2 creates interface config snippet (routed, access, trunk, loop)
- svc_rte: From service_route.yml and svc_rte_tmpl.j2 creates the tenant routing config snippet (BGP, OSPF, routes, redist)
- intf_cleanup: Based on the interfaces used in the fabric creates config snippet to default all the other interfaces
- task_config: Assembles the config snippets into the one file and applies them using Napalm replace_config
The post-validation playbook (PB_post_validate.yml) uses the validation role to do the majority of the work
- pre_tasks: Creates the file structure to store validation files (desired_state) and the compliance report
- roles: Imports the services role so that the filter plugins within it can be used to create the service data models for validation
- tasks: Imports tasks from roles and checks the compliance report result
- validation: Per-validation engine tasks to create desired_state, gather the actual_state and produce a compliance report
- nap_val: For elements covered by napalm_getters creates desired_state and compares against actual_state
- cus_val: For elements not covered by napalm_getters creates desired_state and compares against actual_state
- compliance_report: Loads validation report (created by nap_val and cus_val) and checks whether it complies (passed)
Directory Structure
The directory structure is created within ~/device_configs to hold the configuration snippets, output (diff) from applied changes, validation desired_state files and compliance reports. The parent directory is deleted and re-added at each playbook run.
The base location for this directory can be changed using the ans.dir_path variable.
~/device_configs/
├── DC1-N9K-BORDER01
│ ├── config
│ │ ├── base.conf
│ │ ├── config.cfg
│ │ ├── dflt_intf.conf
│ │ ├── fabric.conf
│ │ ├── svc_intf.conf
│ │ ├── svc_rte.conf
│ │ └── svc_tnt.conf
│ └── validate
│ ├── napalm_desired_state.yml
│ └── nxos_desired_state.yml
├── diff
│ ├── DC1-N9K-BORDER01.txt
└── reports
├── DC1-N9K-BORDER01_compliance_report.json
Prerequisites
The deployment has been tested on NXOS 9.2(4) and NXOS 9.3(5) (in theory should be fine with 9.3(6) & 9.3(7)) using Ansible 2.10.6 and Python 3.6.9. See the Caveats section for the few nuances when running the different versions of code.
git clone https://github.com/sjhloco/build_fabric.git
mkdir ~/venv/venv_ansible2.10
python3 -m venv ~/venv/venv_ansible2.10
source ~/venv/venv_ansible2.10/bin/activate
pip install -r build_fabric/requirements.txt
Once the environment has been set up with all the packages installed, run napalm-ansible to get the location of the napalm-ansible paths and add them to ansible.cfg under [defaults].
Before any configuration can be deployed using Ansible a few things need to be manually configured on all N9K devices:
- Management IP address and default route
- The features nxapi and scp-server are required for Napalm replace_config
- Image validation can take a while on NXOS so is best done beforehand
interface mgmt0
ip address 10.10.108.11/24
vrf context management
ip route 0.0.0.0/0 10.10.108.1
feature nxapi
feature scp-server
boot nxos bootflash:/nxos.9.3.5.bin sup-1
- Leaf and border switches also need the TCAM allocation changed to allow for arp-suppression. This can differ depending on the device model; any changes made need correcting in /roles/base/templates/nxos/bse_tmpl.j2 to keep it idempotent
hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide
copy run start
reload
The default username/password for all devices is admin/ansible and is stored in the variable bse.users.password. Swap this out for the encrypted type-5 password got from the running config. The username and password used by Napalm to connect to devices is stored in ans.creds_all and will also need changing to match (it is plain-text or can use vault).
Before the playbook can be run the devices' SSH keys need adding on the Ansible host. ssh_key_add.yml (in the ssh_keys directory) can be run to add these automatically, you just need to populate the devices' management IPs in the ssh_hosts file.
sudo apt install ssh-keyscan
ansible-playbook ssh_keys/ssh_key_add.yml -i ssh_keys/ssh_hosts
Running playbook
The device configuration is applied using Napalm with the differences always saved to ~/device_configs/diff/device_name.txt and optionally printed to screen. Napalm commit_changes is set to True, meaning that Ansible check-mode is used for dry-runs. It can take up to 6 minutes to deploy the full configuration when including the service roles, so the Napalm default timeout has been increased to 360 seconds. If it takes longer (N9Kv running 9.2(4) is very slow) Ansible will report the build as failed, but it is likely the process is still running on the device; give it a minute and run the playbook again and it should pass with no changes needed.
Due to the declarative nature of the playbook and inheritance between roles there are only a certain number of combinations that the roles can be deployed in.
Ansible tag | Playbook action |
---|---|
pre_val | Checks that the var_file contents are of a valid format |
bse_fbc | Generates, joins and applies the base, fabric and intf_cleanup config snippets |
bse_fbc_tnt | Generates, joins and applies the base, fabric, intf_cleanup and tenant config snippets |
bse_fbc_intf | Generates, joins and applies the base, fabric, tenant, interface and intf_cleanup config snippets |
full | Generates, joins and applies the base, fabric, tenant, interface, intf_cleanup and route config snippets |
rb | Reverses the last applied change by deploying the rollback configuration (rollback_config.txt) |
diff | Prints the differences between the current_config (on the device) and desired_config (applied by Napalm) to screen |
- The diff tag can be used with bse_fbc_tnt, bse_fbc_intf, full or rb to print the configuration changes to screen
- Changes are always saved to file no matter whether diff is used or not
- -C or --check-mode will do everything except actually apply the configuration
pre-validation: Validates the contents of variable files defined under var_files. Best to use dummy host file instead of dynamic inventory
ansible-playbook PB_build_fabric.yml -i hosts --tag pre_val
Generate the complete config: Creates config snippets, assembles them in config.cfg, compares against device config and prints the diff
ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml --tag 'full, diff' -C
Apply the config: Replaces the current config on the device, with the changes made automatically saved to ~/device_configs/diff/device_name.txt
ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml --tag full
All roles can be deployed individually to just create the config snippet files; no connections are made to devices and no changes are applied. The merge tag can be used in conjunction with any combination of these role tags to non-declaratively merge the config snippets with the current device config rather than replacing it. As the L3VNIs and interfaces are generated automatically, at a bare minimum the variable files will still need the current tenants and interfaces as well as the advanced variable sections.
Ansible tag | Playbook action |
---|---|
bse | Generates the base configuration snippet saved to device_name/config/base.conf |
fbc | Generates the fabric and intf_cleanup configuration snippets saved to fabric.conf and dflt_intf.conf |
tnt | Generates the tenant configuration snippet saved to device_name/config/svc_tnt.conf |
intf | Generates the interface configuration snippet saved to device_name/config/svc_intf.conf |
rte | Generates the route configuration snippet saved to device_name/config/svc_rte.conf |
merge | Non-declaratively merges the new and current config, can be run with any combination of role tags |
Generate the fabric config: Creates the fabric and interface cleanup config snippets and saves them to fabric.conf and dflt_intf.conf
ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml --tag fbc
Apply tenants and interfaces non-declaratively: Add additional tenant and routing objects by merging their config snippets with the device's config. The diffs for merges are simply the lines in the merge candidate config so won't be as true as the diffs from declarative deployments
ansible-playbook PB_build_fabric.yml -i inv_from_vars_cfg.yml --tag tnt,rte,merge,diff
Post Validation checks
A declaration of how the fabric should be built (desired_state) is created from the values of the variable files and validated against the actual_state. napalm_validate can only perform a compliance check against anything it has a getter for; for anything not covered by this the custom_validate filter plugin is used. This plugin uses the same napalm_validate framework but the actual state is supplied through a static input file (got using napalm_cli) rather than a getter. Both validation engines are within the same validate role with separate template and task files.
The results of the napalm_validate (nap_val.yml) and custom_validate (cus_val.yml) tasks are joined together to create the one combined compliance report. Each getter or command has a complies dictionary (True or False) to report its state, which feeds into the compliance report's overall complies dictionary. It is based on this value that a task in the post-validation playbook will raise an exception.
napalm_validate
As Napalm is vendor agnostic, the Jinja2 template file used to create the validation file is the same for all vendors. The following elements are validated by napalm_validate, with the roles being validated in brackets.
- hostname (fbc): Automatically created device names are correct
- lldp_neighbors (fbc): Devices physical fabric and MLAG connections are correct
- bgp_neighbors (fbc, tnt): Overlay neighbors are all up (strict). fbc doesn't check for sent/rcv prefixes, this is done by tnt
An example of the desired and actual state file formats.
- get_bgp_neighbors:
global:
router_id: 192.168.101.16
peers:
_mode: strict
192.168.101.11:
is_enabled: true
is_up: true
custom_validate
custom_validate requires a per-OS-type template file and a per-OS-type method within the custom_validate.py filter_plugin. The command output is collected in JSON format using napalm_cli, passed through the nxos_dm method to create a new actual_state data model, and along with the desired_state is fed into napalm_validate using the compliance_report method.
The following elements are validated by custom_validate, with the roles being validated in brackets.
- show ip ospf neighbors detail (fbc): Underlay neighbors are all up (strict)
- show port-channel summary (fbc, intf): Port-channel state and members (strict) are up
- show vpc (fbc, tnt, intf): MLAG peer-link, keep-alive state, vpc status and active VLANs
- show interfaces trunk (fbc, tnt, intf): Allowed vlans and STP forwarding vlans
- show ip int brief include-secondary vrf all (fbc, tnt, intf): Layer3 interfaces in fabric and tenants
- show nve peers (tnt): All VTEP tunnels are up
- show nve vni (tnt): All VNIs are up, have correct VNI number and VLAN mapping
- show interface status (intf): State and port type
- show ip ospf interface brief vrf all (rte): Tenant OSPF interfaces are in correct process, area and are up
- show bgp vrf all ipv4 unicast (rte): Prefixes advertised by network and summary are in the BGP table
- show ip route vrf all (rte): Static routes are in the routing table with correct gateway and AD
An example of the desired and actual state file formats
cmds:
- show ip ospf neighbors detail:
192.168.101.11:
state: FULL
192.168.101.12:
state: FULL
192.168.101.22:
state: FULL
To aid with creating new validations the custom_val_builder directory is a stripped down version of custom_validate to use when building new validations. The README has more detail on how to run it, the idea being to walk through each stage of creating the desired and actual state ready to add to the validate roles.
Running Post-validation
Post-validation is hierarchical as the addition of elements in the later roles affects the validation outputs of the earlier roles. For example, extra VLANs added in tenant_service will affect the bse_fbc post-validate output of show vpc (peer-link_vlans). For this reason post-validation must be run for the current role and all applied roles before it. This is done automatically by Jinja template inheritance, as calling a template with the extends statement will also render the inheriting templates.
Ansible tag | Playbook action |
---|---|
bse_fbc | Validates the configuration applied by the base and fabric roles |
bse_fbc_tnt | Validates the configuration applied by the base, fabric and tenant roles |
bse_fbc_tnt_intf | Validates the configuration applied by the base, fabric, tenant and interfaces roles |
full | Validates the configuration applied by the base, fabric, tenant, interfaces and route roles |
Run fabric validation: Runs validation against the desired state got from all the variable files. There is no differentiation between napalm_validate and custom_validate, both are run as part of the validation tasks
ansible-playbook PB_post_validate.yml -i inv_from_vars_cfg.yml --tag full
Viewing compliance report: When viewing the validation report piping it through json.tool makes it more human readable
cat ~/device_configs/reports/DC1-N9K-SPINE01_compliance_report.json | python -m json.tool
Caveats
When starting this project I used N9Kv on EVE-NG and later moved onto physical devices when we were deploying the data centers. vPC fabric peering does not work on the virtual devices so this was never added as an option in the playbook.
As deployments are declarative and there are differences with physical devices you will need a few minor tweaks to the bse_tmpl.j2 template, as different hardware can have slightly different hidden base commands. An example is the command system nve infra-vlans; it is required on physical devices (the command doesn't exist on N9Kv) in order to use an SVI as an underlay interface (one that forwards/originates VXLAN-encapsulated traffic). Therefore on physical devices unhash this line in bse_tmpl.j2; it is used for the OSPF peering over the vPC link (VLAN2).
{# system nve infra-vlans {{ fbc.adv.mlag.peer_vlan }} #}
The same applies for NXOS versions, it is only the base commands that will change (feature commands stay the same across versions) so if statements are used in bse_tmpl.j2 based on the bse.adv.image variable.
Although they work on EVE-NG, it is not perfect for running N9Kv. I originally started on nxos.9.2.4 and although it is fairly stable in terms of features and uptime, the API can be very slow at times, taking up to 10 minutes to deploy a device config. Sometimes after a deployment the API would stop responding (couldn't telnet on 443) even though the NXOS CLI said it was listening. To fix this you have to disable and re-enable the nxapi feature. Removing the command nxapi use-vrf management seems to have helped to make the API more stable.
I moved on to NXOS nxos.9.3.5 and although the API is faster and more stable, there is a different issue around the interface module. When the N9Kv went to 9.3 the interfaces were moved to a separate module.
Mod Ports Module-Type Model Status
--- ----- ------------------------------------- --------------------- ---------
1 64 Nexus 9000v 64 port Ethernet Module N9K-X9364v ok
27 0 Virtual Supervisor Module N9K-vSUP active *
With 9.3(5), 9.3(6) and 9.3(7) on EVE-NG up to 5 or 6 N9Ks is fine, however when you add any more N9Ks (other device types are fine) things start to become unstable. New devices take an age to boot up and when they do their interface linecards normally fail and go into the pwr-cycld state.
Mod Ports Module-Type Model Status
--- ----- ------------------------------------- --------------------- ---------
1 64 Nexus 9000v 64 port Ethernet Module pwr-cycld
27 0 Virtual Supervisor Module N9K-vSUP active *
Mod Power-Status Reason
--- ------------ ---------------------------
1 pwr-cycld Unknown. Issue show system reset mod ...
This in turn makes other N9Ks unstable, some freezing and others randomly having the same linecard issue. Rebooting sometimes fixes it but due to the load times it is unworkable. I have not been able to find a reason for this, it doesn't seem to be related to resources for either the virtual device or the EVE-NG box.
In N9Kv 9.2(4) there is a bug whereby you can't have '>' in the name of the prefix-list in the route-map match statement. This name is set in the service_route.yml variables svc_rte.adv.pl_name and svc_rte.adv.pl_metric_name. The problem has been fixed in 9.3.
DC1-N9K-BGW01(config-route-map)# match ip address prefix-list PL_OSPF_BLU100->BGP_BLU
Error: CLI DN creation failed substituting values. Path sys/rpm/rtmap-[RM_OSPF_BLU100-BGP_BLU]/ent-10/mrtdst/rsrtDstAtt-[sys/rpm/pfxlistv4-[PL_OSPF_BLU100->BGP_BLU]]
If you are running these playbooks on macOS you may get the following error when running post-validations:
objc[29159]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
objc[29159]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
It is the same behaviour as this older Ansible bug; the solution of adding export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES before running the post-validation playbook solved it for me.