VXLAN
VXLAN Packet Header


VXLAN RT and RD
Route Distinguisher (the auto method)
Used in BGP table to keep all routes unique
Format is a 4-byte admin field followed by a 2-byte numbering field, e.g. 12.12.12.12:32777 or 13.13.13.13:3
Admin field is BGP router ID
The numbering field is the internal VRF ID
The numbering field for L2/MAC addresses is 32767 + VLAN number (e.g. VLAN 10 gives 32777)
The numbering field for L3/IP addresses starts at 3 (1 and 2 are reserved for the default and management VRFs).
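The auto RD rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in these notes, not a statement of how NX-OS implements it; the function names and the range checks are assumptions made for the example.

```python
import ipaddress

L2_RD_BASE = 32767  # base value for MAC-VRF RDs, as stated in the notes above


def auto_rd_l2(router_id: str, vlan_id: int) -> str:
    """Auto RD for a MAC-VRF: <BGP router ID>:<32767 + VLAN number>."""
    ipaddress.IPv4Address(router_id)  # raises ValueError if not a valid IPv4 address
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN number must be in 1-4094")
    return f"{router_id}:{L2_RD_BASE + vlan_id}"


def auto_rd_l3(router_id: str, vrf_id: int) -> str:
    """Auto RD for an IP-VRF: <BGP router ID>:<internal VRF ID>, IDs start at 3."""
    ipaddress.IPv4Address(router_id)
    if not 3 <= vrf_id <= 0xFFFF:  # 1 and 2 are reserved; the field is 2 bytes
        raise ValueError("VRF ID must be in 3-65535")
    return f"{router_id}:{vrf_id}"


print(auto_rd_l2("12.12.12.12", 10))  # 12.12.12.12:32777
print(auto_rd_l3("13.13.13.13", 3))   # 13.13.13.13:3
```

The two printed values reproduce the examples given in the format line above: VLAN 10 on router 12.12.12.12, and the first tenant VRF on router 13.13.13.13.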
Route Target (the auto method)
Encoded in BGP extended community
Consists of a 2-byte admin field and a 4-byte numbering field, e.g. 2:100000
The admin field is the BGP ASN
The numbering field is the tenant VNI
In multi-AS environments the auto-derived Route Targets differ between ASNs, so the Route Target must either be statically defined or rewritten so that its ASN portion matches on import
Examples of an auto derived Route Target:
IP-VRF within ASN 65001 and L3VNI 50001 - Route Target 65001:50001
MAC-VRF within ASN 65001 and L2VNI 30001 - Route Target 65001:30001
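The auto Route Target derivation, and why it breaks across ASNs, can be sketched as follows. This is a minimal illustration of the ASN:VNI rule from these notes; the function names are hypothetical, and the rewrite helper only models the idea of replacing the ASN portion, not any specific NX-OS feature.

```python
def auto_route_target(asn: int, vni: int) -> str:
    """Auto RT: <2-byte BGP ASN>:<tenant VNI>."""
    if not 1 <= asn <= 0xFFFF:
        raise ValueError("the admin field is 2 bytes, so a 2-byte ASN is required")
    if not 1 <= vni <= 0xFFFFFF:  # a VNI is a 24-bit value
        raise ValueError("VNI must fit in 24 bits")
    return f"{asn}:{vni}"


def rewrite_rt_asn(route_target: str, local_asn: int) -> str:
    """Replace the ASN portion of a received RT with the local ASN (illustrative)."""
    _, vni = route_target.split(":")
    return auto_route_target(local_asn, int(vni))


print(auto_route_target(65001, 50001))  # IP-VRF:  65001:50001
print(auto_route_target(65001, 30001))  # MAC-VRF: 65001:30001

# A leaf in ASN 65002 exports 65002:30001; a leaf in ASN 65001 imports 65001:30001.
# The values only match after the ASN portion is rewritten (or set statically).
remote_rt = auto_route_target(65002, 30001)
print(remote_rt == auto_route_target(65001, 30001))                         # False
print(rewrite_rt_asn(remote_rt, 65001) == auto_route_target(65001, 30001))  # True
```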
Configuration
VXLAN Prerequisites
Prerequisites are hardware/software specific
For Nexus 5600 as hardware VTEP
Set switching mode to store-and-forward and reboot: hardware ethernet store-and-fwd-switching
Establish IP unicast reachability between VTEPs
Establish PIM BIDIR reachability between VTEPs
Spines can be phantom RPs for redundancy
Enable features:
feature vn-segment-vlan-based
feature nv overlay
Flood and Learn
Map VLAN to VXLAN: vn-segment under vlan config mode
Create Network Virtualization Edge (NVE) interface: interface nve
Specify VTEP source: source interface loopback0
Specify VNI membership: member vni [vnid]
Specify multicast group for BUM replication: mcast-group [group]
Multicast group 228.9.10.11 in this example must be the same on all VTEPs
The VLAN ID is locally significant on each VTEP; VNID 11110 identifies the segment and must match on every VTEP that participates in it
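Because the BUM multicast group has to be the same on all VTEPs, a simple consistency check over the intended VNI membership can catch mismatches before deployment. The sketch below assumes the per-VTEP configuration is already available as a Python dictionary; the data layout, the leaf names, and the function name are hypothetical.

```python
from collections import defaultdict


def mismatched_mcast_groups(vtep_configs):
    """Return {vni: {group: [vteps]}} for every VNI mapped to more than one group.

    vtep_configs has the shape {vtep_name: {vni: mcast_group}}.
    """
    groups_by_vni = defaultdict(lambda: defaultdict(list))
    for vtep, members in vtep_configs.items():
        for vni, group in members.items():
            groups_by_vni[vni][group].append(vtep)
    return {
        vni: dict(groups)
        for vni, groups in groups_by_vni.items()
        if len(groups) > 1
    }


configs = {
    "leaf1": {11110: "228.9.10.11"},
    "leaf2": {11110: "228.9.10.11"},
    "leaf3": {11110: "228.9.10.12"},  # deliberate typo in the group address
}
print(mismatched_mcast_groups(configs))
```

An empty result means every VNI uses a single group across the VTEPs that carry it.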
BGP EVPN
Map VLAN to VXLAN: vn-segment under vlan config mode
Create Network Virtualization Edge (NVE) interface: interface nve
Specify VTEP source: source interface loopback0
Specify VNI membership: member vni [vnid]
Specify multicast group for BUM replication: mcast-group [group]
Specify BGP as control plane protocol: host-reachability protocol bgp
Establish BGP EVPN peerings
address-family l2vpn evpn
extended communities required
Generate BGP advertisement (like network statement)
evpn
vni [vnid] l2
rd auto
route-target import auto
route-target export auto
Operation Steps
Map VLANs to VXLAN Network Identifiers (VNIs/VNIDs)
Advertise information into BGP
MAC to L2 VNI to VTEP mapping
IP to L3 VNI to VTEP mapping
Import MAC addresses into the CAM table for bridging
Route traffic through SVIs to remote segments
Spine configuration
Spine is Route Reflector
Leaf configuration
Verification
show interface nve id
show platform fwm info nve peer|vni [all]
show mac address-table [vlan id]
show nve peer|vni
show bgp l2vpn evpn [summary]
show bgp l2vpn evpn neighbor $address advertised-routes
show ip mroute 228.9.10.11
show nve vni
show nve peers
show l2route evpn mac all
show l2route evpn mac-ip all
show nve internal bgp rnh database (rnh: recursive next hop)
show system internal l2rib event mac
show fabric forwarding internal event-history events
show fabric forwarding ip local-host-db vrf $VRF
Inter-VLAN Routing - Asymmetric vs Symmetric IRB
EVPN Integrated Routing and Bridging (IRB) has two options:
Asymmetric IRB
Symmetric IRB
Asymmetric IRB
Ingress VTEP (Leaf) does both L2 and L3 lookup
Egress VTEP does L2 lookup only
I.e. Bridge - Route - Bridge. SVIs for all segments must be configured on all VTEPs because they are needed for both forward and return traffic. This is inefficient: it increases ARP cache and CAM table size and creates control plane scaling issues
Symmetric IRB
Ingress VTEP does both L2 and L3 lookup
Egress VTEP does both L3 and L2 Lookup
I.e. Bridge - Route - Route - Bridge
How Symmetric IRB works
New concept called Layer-3 VNI
Each tenant VRF is mapped to a unique Layer-3 VNI
Mapping must match on all VTEPs
All VXLAN routed traffic is encapsulated with L3 VNI in VXLAN header which allows for a single shared VNI among all VTEPs
L2 VNIs only need to be configured where access ports exist -> saving of ARP and CAM table spaces
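The difference between the two IRB modes comes down to which VNI the ingress VTEP writes into the VXLAN header for routed traffic. The following sketch models that choice only; the VLAN, VNI, and VRF values are hypothetical, and the function is a simplification that ignores everything except VNI selection.

```python
# Hypothetical mappings for one tenant.
L2_VNI = {10: 30010, 20: 30020}   # VLAN -> L2 VNI
L3_VNI = {"TENANT-A": 50001}      # tenant VRF -> L3 VNI


def vxlan_header_vni(mode: str, src_vlan: int, dst_vlan: int, vrf: str) -> int:
    """Return the VNI the ingress VTEP puts in the VXLAN header."""
    if src_vlan == dst_vlan:
        # Same segment: plain bridging, always the L2 VNI of that segment.
        return L2_VNI[src_vlan]
    if mode == "asymmetric":
        # Bridge - Route - Bridge: the ingress VTEP routes into the destination
        # segment, so it must know the destination VLAN and its L2 VNI.
        return L2_VNI[dst_vlan]
    if mode == "symmetric":
        # Bridge - Route - Route - Bridge: all routed traffic of the tenant
        # shares the L3 VNI mapped to its VRF.
        return L3_VNI[vrf]
    raise ValueError(f"unknown IRB mode: {mode}")


print(vxlan_header_vni("asymmetric", 10, 20, "TENANT-A"))  # 30020
print(vxlan_header_vni("symmetric", 10, 20, "TENANT-A"))   # 50001
```

In the asymmetric case the lookup fails on any ingress VTEP that does not have the destination VLAN in its table, which is why every segment has to exist on every VTEP; the symmetric case needs only the VRF to L3 VNI mapping.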
Configuration
vPC and VXLAN BGP
VXLAN Traffic is tunneled over the underlay network using the BGP next-hop address of the remote VTEP
NVE source interface (i.e. loopback0) is the default BGP next-hop for advertised routes
In a vPC, both vPC peers advertise duplicate EVPN MAC/IP routes to spine RRs
With other attributes equal, next-hop is the tie breaker in BGP Best Path Selection
Implies that one vPC peer is always preferred for dual attached hosts
Result is that egress traffic from vPC member is load balanced, but return ingress traffic is polarized -> Solution is to use Anycast VTEP address
vPC peers share duplicate IP address on NVE source interface
Peer 1 - interface loop 0; ip address 1.1.1.51/32
Peer 2 - interface loop 0; ip address 1.1.1.52/32
Both peers - interface loopback0; ip address 1.1.1.111/32 secondary
BGP next-hop is automatically set to secondary address for locally originated routes: i.e. L2VPN EVPN MAC/IP Routes for vPC member ports. This can be changed to primary ip address by using below configuration
Result is that ingress flows from spines are load balanced. Other leafs use IGP ECMP to reach shared secondary ip address
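The polarization problem and the anycast fix can be illustrated with a toy model. It assumes, as a deliberate simplification of the real BGP best-path algorithm, that the only remaining tie breaker is the lowest next-hop address; the peer names and function names are hypothetical, and the addresses are the ones from the example above.

```python
import ipaddress


def bgp_best_next_hop(advertised):
    """advertised: {vpc_peer: next_hop}. Simplified tie break: lowest next hop wins."""
    return min(advertised.values(), key=ipaddress.IPv4Address)


def peers_receiving_return_traffic(advertised):
    """vPC peers that own the winning next-hop address."""
    winner = bgp_best_next_hop(advertised)
    return sorted(peer for peer, nh in advertised.items() if nh == winner)


# Without anycast: each peer advertises its own primary loopback0 address.
unique = {"peer1": "1.1.1.51", "peer2": "1.1.1.52"}
print(peers_receiving_return_traffic(unique))   # ['peer1']

# With anycast VTEP: both peers advertise the shared secondary address.
anycast = {"peer1": "1.1.1.111", "peer2": "1.1.1.111"}
print(peers_receiving_return_traffic(anycast))  # ['peer1', 'peer2']
```

With the shared address, the choice between the two peers is no longer made by BGP at all; it is left to IGP ECMP toward 1.1.1.111.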
On Nexus 5600, all traffic across the vPC peer link must be VXLAN encapsulated due to ASIC implementation
Normal vPC Peer link is a classical ethernet trunk
Result is that East/West flows over vPC Peer link are broken by default
i.e. the VNI number is lost when packet is sent out the peer link
Peer link is normally only used for orphans or in failure scenarios
Result is that everything looks fine until the failure occurs
Traffic to orphans & single attached members black-holed over vPC peer link
Workaround is to maintain VXLAN encapsulation across Peer Link: vpc nve peer-link-vlan
Create new VLAN and specify as NVE Peer Link VLAN
vlan 999
vpc nve peer-link-vlan 999
Establish layer 3 peering across NVE Peer Link VLAN
interface vlan 999
ip router ospf 1 area 0
ip router isis 1
Traffic engineer so other vPC Peer's VTEP loopback is preferred over vPC Peer Link
ip ospf cost 10
isis metric 10 level-2
VXLAN Underlay Fabric Convergence
VXLAN underlay fabric convergence is based on three factors which must be addressed separately to achieve High Availability for VXLAN overlay flows
IGP convergence
PIM convergence
BGP convergence
4 Factors generally affect IGP convergence time
Failure detection time: is the neighbor down?
Link up/down event
Routing protocol hello/dead timers
IP SLA & EEM
Bidirectional Forwarding Detection (BFD)
Event Propagation Time: tell neighbors about the change
EIGRP Query/Reply
OSPF LSA Flooding Procedure
BGP Update/Withdraw
Recalculation time: Run SPF/DUAL/etc. calculation
EIGRP DUAL
OSPF SPF
BGP Best Path Selection
Forwarding Table update Time: Install new paths
EIGRP topology to RIB download
RIB to Software FIB Download
Software FIB to hardware TCAM download
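Total convergence time is roughly the sum of the four factors above, which makes it easy to see which one dominates. The sketch below adds them up for two cases. All millisecond values are hypothetical placeholders chosen for illustration, except the 40-second figure, which is the default OSPF dead interval on broadcast links; the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    detection_ms: float       # failure detection time
    propagation_ms: float     # event propagation time
    recalculation_ms: float   # SPF/DUAL/best-path run
    table_update_ms: float    # RIB -> FIB -> TCAM download

    def total_ms(self) -> float:
        return (self.detection_ms + self.propagation_ms
                + self.recalculation_ms + self.table_update_ms)


# Detection by OSPF dead timer (40 s default) versus BFD at 50 ms x 3 (hypothetical).
hello_based = ConvergenceBudget(40_000, 50, 50, 100)
bfd_based = ConvergenceBudget(3 * 50, 50, 50, 100)

print(hello_based.total_ms())  # 40200
print(bfd_based.total_ms())    # 350
```

With these placeholder numbers, failure detection accounts for almost all of the hello-based total, which is why detection is usually the first factor to tune.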
Methods of Modifying Convergence Time
Reactive optimizations
e.g. carrier delay and link debounce timer
e.g. Fast Hellos & BFD
e.g. OSPF LSA & SPF pacing
e.g. FIB prefix prioritization
Proactive optimizations
EIGRP Feasible successors
OSPF Loop Free Alternate (LFA)
BGP Prefix Independent Convergence (PIC)
MPLS Traffic Engineering Fast Reroute (TE FRR)
BFD
Verification
show bfd neighbor
PIM Convergence
Generally two factors affect PIM convergence time
Neighbor Failure Detection Time: Is the PIM neighbor down
RP Failure Detection Time
Can I still join the (*,G)? - ASM and BIDIR
Can I still register the (S,G)? - ASM only
PIM RP, like a BGP RR, adds High Availability by adding Node Redundancy: RP should never be a single point of failure
Redundancy Design depends on PIM design
Any Source Multicast (ASM)
Auto-RP with multiple candidate mapping agents and RPs (slow convergence)
BSR with multiple BSR and RP candidates (slow convergence)
Anycast RP
Bidirectional RP (BiDir)
Phantom RP
Source Specific Multicast (SSM)
No RPs used, no redundancy needed
Anycast RP
Adds redundancy by sharing RP address between multiple nodes: e.g. duplicate loopback1 address advertised into IGP
Multicast control plane must sync between anycast RPs
Multicast Source Discovery Protocol (MSDP)
PIM anycast
Verification
debug ip pim data-register send
debug ip pim data-register receive
debug ip pim null-register
Phantom RP
BiDir PIM does not use Register or (S,G) join
Only multicast state is (*,G) rooted at the RP
Implies that anycast isn't needed to sync state
Phantom RP provides redundancy based on longest match routing
Primary RP advertises longest match into IGP
Secondary RP advertises next longest match into IGP
Primary RP fails, secondary RP's address now becomes longest match
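The longest-match behaviour behind Phantom RP can be reproduced with Python's `ipaddress` module. This is a minimal sketch: the RP address and the /30 and /29 prefixes are hypothetical values, and the route lookup is a plain longest-prefix match rather than a model of any specific IGP.

```python
import ipaddress


def best_route(destination, advertisements):
    """Return the advertiser of the longest prefix covering destination, or None.

    advertisements is a list of (prefix, advertiser) tuples.
    """
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), advertiser)
        for prefix, advertiser in advertisements
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# The RP address itself is not configured on any node; both RPs only
# advertise subnets that contain it.
RP_ADDRESS = "10.254.254.1"
routes = [
    ("10.254.254.0/30", "primary-rp"),    # longest match
    ("10.254.254.0/29", "secondary-rp"),  # next longest match
]

print(best_route(RP_ADDRESS, routes))  # primary-rp

# Primary RP fails and its prefix is withdrawn from the IGP.
routes_after_failure = [r for r in routes if r[1] != "primary-rp"]
print(best_route(RP_ADDRESS, routes_after_failure))  # secondary-rp
```

Failover therefore happens at IGP convergence speed, with no RP state to synchronise between the two nodes.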
Verification
show ip pim rp
External Routing
By default, hosts in VXLAN fabric are isolated from the rest of the network
E.g. underlay fabric is in "default" VRF, while servers are in the tenant VRFs
Assumption is that tenant applications need external reachability
E.g. web clients on internet need access to web server in VXLAN fabric
Border leafs are used to connect external networks to internal fabric
Border leafs run multiple copies of the routing control plane
MP-BGP L2VPN EVPN to VTEPS inside VXLAN fabric
Tenant VRF aware IPv4/IPv6 Unicast BGP or IGP to external router(s)
MP-BGP to BGP/IGP redistribution occurs on Border Leaf
External router(s) can have Tenant VRFs
Allows for overlapping addressing inside Tenant networks, e.g. VRF-Lite
External router(s) don't require Tenant VRFs
can mix all routes into default routing table as long as addresses are unique
Border leafs maintain all host routes for all Tenant VRFs
e.g. they must import all prefixes into VRFs from MP-BGP
External Routers don't need host routes, just aggregates
Summarization should occur at MP-BGP to BGP/IGP redistribution point
Route leaking could be used for longer match routing traffic engineering
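The effect of summarizing at the redistribution point can be sketched as follows: the border leaf holds /32 host routes learned from MP-BGP but advertises only the covering aggregates to the external router. The prefixes and the function name are hypothetical, and the logic is loosely modelled on a summary-only style aggregate rather than taken from any particular platform.

```python
import ipaddress


def summarize_for_external(host_routes, aggregates):
    """Return the prefixes advertised externally.

    An aggregate is advertised only if at least one host route falls inside it.
    Host routes not covered by any aggregate are advertised unchanged.
    """
    hosts = [ipaddress.ip_network(h) for h in host_routes]
    aggs = [ipaddress.ip_network(a) for a in aggregates]
    advertised = [
        str(agg) for agg in aggs
        if any(host.subnet_of(agg) for host in hosts)
    ]
    advertised += [
        str(host) for host in hosts
        if not any(host.subnet_of(agg) for agg in aggs)
    ]
    return advertised


host_routes = ["10.1.1.10/32", "10.1.1.20/32", "10.1.2.30/32"]
aggregates = ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
print(summarize_for_external(host_routes, aggregates))
# ['10.1.1.0/24', '10.1.2.0/24']
```

Three host routes collapse into two aggregates, and 10.1.3.0/24 is not advertised because no host currently lives in it.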
Border Leaf Configuration
External Router Configuration
Verification
Border Leaf
show bgp vrf SHARED ipv4 unicast summary
VXLAN EVPN Multisite
Reference
VXLAN Network with MP-BGP EVPN Control Plane Design Guide: https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/guide-c07-734107.html
Configuration and Verification VXLAN with MP-BGP EVPN Control Plane: https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/200952-Configuration-and-Verification-VXLAN-wit.html
VXLAN EVPN Multi-Site Design and Deployment White Paper: https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-739942.html
https://chasewright.com/vxlan-evpn-multisite-setup-part-1/
Cisco NX-OS/IOS Multicast Comparison: https://docwiki.advanxer.com/docwiki.cisco.com/wiki/Cisco_NX-OS/IOS_Multicast_Comparison.html
NextGen DCI with VXLAN EVPN Multi-Site Using vPC Border Gateways White Paper: https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/whitepaper-c11-742114.html
Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/pf/configuration/guide/b-pf-configuration/Introducing-Cisco-Programmable-Fabric-VXLAN-EVPN.html
Deploying a Data Center: http://dc.ciscolive.com/pod0/labs/lab1/lab1
Cisco Nexus 9000 Series NX-OS VXLAN Configuration Guide, Release 10.2(x): https://www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/102x/configuration/vxlan/cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-release-102x/m_configuring_vxlan_93x.html
VXLAN: https://www.youtube.com/playlist?list=PLDQaRcbiSnqFe6pyaSy-Hwj8XRFPgZ5h8
VXLAN Overview - Part 1: https://networkdirection.net/articles/routingandswitching/vxlanoverview/
VXLAN BGP EVPN Configuration - Part 6: https://networkdirection.net/articles/routingandswitching/vxlanoverview/vxlanevpnconfiguration/
Troubleshooting duplicate IP/MAC in MP-BGP EVPN VxLan on Nexus 9000: https://www.ciscolive.com/c/dam/r/ciscolive/us/docs/2019/pdf/CTHDCN-2304.pdf
VXLAN EVPN Multisite Implementation: https://www.youtube.com/watch?v=vMj-aGFjAKM&list=PLpGt4hh32rCrevUCtFL0N2FrTNpCNfIiC&index=14
VXLAN: https://rayka-co.com/course/vxlan-evpn/
VXLAN EVPN Multisite: https://www.youtube.com/watch?v=vJqwIl2V8GY
VXLAN Primer Series: https://www.youtube.com/watch?v=bSiriF8kM7E&list=PLxyr0C_3Ton2-AsrD2iMdQ1mV4bqae8kv&index=1
VXLAN Multisite: https://www.youtube.com/watch?v=KFW16GRFMz8&list=PLYzE2pIn57rHznu5eRUT88F5H9LAzrVF5&index=8
VXLAN BGP EVPN Multisite: https://www.youtube.com/watch?v=y-ZDCMwEpxw
https://datacenteroverlords.com/2022/12/13/in-defense-of-ospf-in-the-underlay-in-some-situations/
BGP EVPN Step by Step Configuration Example: https://blog.devopssimplified.com/BGP-EVPN-Step-by-Step