vPC
Configuration
vPC Peer-Keepalive
# In SW-Master
!!! Enable the LACP and vPC features; both are disabled by default
feature vpc
feature lacp
interface mgmt0
vrf member management
ip address 10.1.1.1/24
vpc domain 11
!!! The lower role priority value is preferred in the vPC primary election
role priority 10
!!! How to reach the peer-switch over keepalive link
peer-keepalive destination 10.1.1.2 source 10.1.1.1 vrf management
!!! Enables auto-recovery: if the primary switch fails, the secondary
!!! can promote itself to primary and keep forwarding packets
auto-recovery
!!! peer-switch presents both vPC peers as a single device to downstream
!!! STP-aware devices, which shortens STP convergence time.
peer-switch
# In SW-Slave
feature vpc
feature lacp
interface mgmt0
vrf member management
ip address 10.1.1.2/24
vpc domain 11
role priority 20
peer-keepalive destination 10.1.1.1 source 10.1.1.2 vrf management
auto-recovery
peer-switch
vPC peer link configuration
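A minimal sketch of a peer link, applied identically on both peers. The member interfaces (Ethernet1/1-2) and the port-channel number (1) are assumptions chosen for illustration; they do not come from the original notes.

```
interface Ethernet1/1-2
  channel-group 1 mode active
  no shutdown
interface port-channel 1
  switchport
  switchport mode trunk
  !!! Port type network enables Bridge Assurance on the peer link
  spanning-tree port type network
  vpc peer-link
  no shutdown
```

Any VLAN allowed on this trunk becomes a vPC VLAN, so the allowed list here should cover every VLAN carried by the vPC member ports.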
vPC Interface Configuration
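A minimal sketch of a vPC member port toward one downstream device, applied on both peers. The interface (Ethernet1/10), port-channel number and vPC number (20) are hypothetical values, not taken from the original notes.

```
interface Ethernet1/10
  channel-group 20 mode active
  no shutdown
interface port-channel 20
  switchport
  switchport mode trunk
  !!! The vPC number must be the same on both peers for this downstream device
  vpc 20
  no shutdown
```

The downstream device sees a single LACP port channel; the local port-channel number may differ between peers, but keeping it equal to the vPC number makes the configuration easier to read.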
Verification
show vpc brief
show vpc role
show vpc [consistency-parameters]
Switch profiles and Config Sync
As of 06/2020, config sync requires the out-of-band management interface (mgmt0) as the vPC keepalive link. The underlying protocol, Cisco Fabric Services over IP (CFSoIP), does not support a Layer 3 or SVI interface as the keepalive link.
The same configuration cannot exist in both global configuration mode (config terminal) and a switch profile; if it does, validation fails. Make each configuration change in only one of the two places.
Enable Cisco Fabric Services over IP (CFSoIP)
Configure required configuration on Master
Delete switch profile buffer
Import configuration to switch profile
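The steps above can be sketched as follows. This is an outline under assumptions, not the exact procedure from the referenced post: the profile name vpc-profile is hypothetical, and the commands follow the Nexus 5000-style config sync CLI.

```
!!! Enable CFSoIP (on both peers)
configure terminal
cfs ipv4 distribute
end
!!! Enter config sync mode and open the switch profile (hypothetical name)
config sync
switch-profile vpc-profile
  !!! Discard anything left in the profile buffer
  buffer-delete all
  !!! Pull the existing running configuration into the profile, then commit
  import running-config
  commit
```

Peering between the two switches is configured in the same switch-profile mode with sync-peers destination; it is left out here because the notes do not say where it belongs in the sequence.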
Verification
show cfs status
show switch-profile status
show switch-profile buffer
show switch-profile session-history
vPC with HSRP and VRRP
The control plane refers to traffic that is sent to the Nexus switch itself. In the case of HSRP, this is ARP traffic. In control plane terms, HSRP with vPC is active/passive, because only the HSRP active switch responds to ARP requests for the virtual IP.
The data plane refers to traffic that the Nexus switch forwards. For example, traffic from one server to another. In data plane terms, HSRP with vPC is active/active. Both of the switches forward traffic.
Do not use HSRP object tracking with vPC
Cisco recommends configuring the HSRP with the default settings when using vPC
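A minimal sketch of HSRP on a vPC VLAN with default settings only. VLAN 10, HSRP group 10 and the 10.10.10.0/24 addresses are assumptions for illustration.

```
feature hsrp
feature interface-vlan
interface Vlan10
  no shutdown
  ip address 10.10.10.2/24
  hsrp 10
    !!! Defaults only: no priority, preempt, timers or track statements
    ip 10.10.10.1
```

The peer would carry the same group and virtual IP with its own interface address (for example 10.10.10.3/24).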
vPC Enhancement
vPC auto-recovery (on by default)
Certain failures can result in neither vPC peer forwarding
Power outage with node failure problem case
Power outage on both Peers
Only one peer is restored
vPC Peer Keepalive never comes up
Means vPC peer links can never come up
Means vPC member ports can never come up
Servers are isolated
vPC Auto Recovery allows single Peer to promote itself to Primary
If the Peer Link does not initialize before the auto-recovery timeout, the restored peer promotes itself to primary and brings up its member ports
Gradual failure problem case
vPC Peer link goes down
vPC Secondary pings vPC Primary and gets a response
vPC Secondary disables vPC member ports
vPC Primary completely fails
vPC Secondary does not re-activate member ports
Servers are isolated
vPC Auto Recovery allows secondary to detect this
vPC primary is continually tracked over vPC Peer Keepalive
A later Peer Keepalive failure results in the Secondary promoting itself to Primary
Secondary re-activates Member ports
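For reference, a sketch of how the auto-recovery timeout is set under the vPC domain. Domain 11 comes from the configuration earlier on this page; 240 seconds is the default reload restore timer mentioned in the sticky bit notes.

```
vpc domain 11
  !!! Seconds to wait for the peer after a reload before self-promoting
  auto-recovery reload-delay 240
```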
vPC Failure Detection and Recovery
vPC Initialization Order of Operations
vPC process starts
IP/UDP 3200 Peer Keepalive connectivity established
Peer-link adjacency forms
vPC Primary/Secondary role detection
vPC Consistency Checks performed
Layer 3 SVIs move to up/up state
vPC member ports move to up/up state
vPC Primary/Secondary election
In a vPC system, one vPC peer device is defined as vPC primary and one is defined as vPC secondary, based on these parameters and in this order:
vPC Primary sticky bit, set to 0 or 1. The vPC peer device with the sticky bit set to 1 wins this comparison and becomes vPC primary regardless of the configured vPC role priority values or the system MAC addresses of the two peers.
If both vPC peer switches have the same Sticky Bit value, the election process proceeds to the next step to compare the user-defined vPC role priority (Cisco NX-OS software uses the lowest numeric value to elect the primary device).
If both vPC roles are configured to the same value, the election process proceeds to compare the system MAC addresses (Cisco NX-OS software uses the lowest MAC address to elect the primary device).
Layer 3 SVI activation: timer controlled by delay restore interface-vlan
vPC member port activation: timer controlled by delay restore
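Both timers are set under the vPC domain. The sketch below uses domain 11 from earlier on this page; the timer values (150 and 60 seconds) are arbitrary examples, not recommendations from the original notes.

```
vpc domain 11
  !!! Hold vPC member ports down for 150 s after the peer adjacency forms
  delay restore 150
  !!! Hold SVIs down for 60 s
  delay restore interface-vlan 60
```

Longer values give routing protocols time to converge before the recovering peer starts attracting traffic.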
vPC Primary Sticky Bit
The vPC Primary Sticky Bit is a programmed protection mechanism introduced to avoid an unnecessary role change (which could disrupt the network) when the primary switch is reloaded unexpectedly. It lets the surviving switch keep its primary role when a dead switch comes back up or when an isolated switch is integrated back into the vPC domain.
Toggling vPC Primary Sticky Bit:
vPC Primary Sticky Bit value is set to TRUE in these scenarios:
The current vPC Primary reboots and the vPC-enabled switch changes its role from vPC Secondary to vPC Operational Primary. (The sticky bit is not set if the role changes from vPC Operational Secondary to vPC Primary.)
A vPC-enabled switch changes its role from None Established to vPC Primary when the reload restore timer (240 sec by default) expires.
vPC Primary Sticky Bit value is set to FALSE in these scenarios:
A vPC-enabled switch is rebooted (the sticky bit is FALSE by default).
The vPC role priority is changed or re-entered.
The vPC Primary Sticky Bit is reported under the vPC Manager software component structure and can be checked with the NX-OS exec mode command show system internal vpcm info all | i i stick
vPC Consistency Checks
vPC peers sync control plane over Peer Link with Cisco Fabric Services (CFS)
Includes advertisement of "Consistency Parameters" that must match for vPC to form successfully, e.g. line card type (M or F), speed, duplex, trunking, LACP mode, STP configuration
Three types of consistency checks
Type 1 Global
Mismatch results in vPC failing to form
E.g. STP mode Rapid-PVST vs MST
Type 1 interface
Mismatch results in VLANs being suspended on the vPC member
E.g. STP port type network vs normal
Type 2
Mismatch results in syslog message but not vPC failure
Can result in failures in data plane
E.g. MTU Mismatch
Graceful Consistency Check
A consistency failure results in only the vPC Secondary disabling its vPCs: a 50% bandwidth reduction is accepted in exchange for 0% packet loss
Enabled by default in version 5.2 and later; check the state with show vpc
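If the feature has been turned off, it can be re-enabled under the vPC domain. A short sketch, using domain 11 from earlier on this page:

```
vpc domain 11
  graceful consistency-check
```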
vPC Member Port Failure Detection
vPC Peers exchange vPC member status over Peer link
Failed member ports result in "Orphan Ports"
Orphan ports are single-attached ports that use a vPC VLAN
vPC VLANs are any VLANs allowed on the peer link
show vpc orphan-ports
Traffic to orphan ports uses the vPC Peer Link as a last resort
Orphan Ports use modified loop prevention
Traffic from remote Orphan is allowed to enter Peer Link and exit via local member
Traffic from remote member is allowed to enter peer link and exit via local orphan
Traffic from remote member is not allowed to enter via peer link and exit via local member
Orphan ports should be avoided at all costs
vPC Peer link is the bottleneck of the system and should be used only for control plane under ideal circumstances
Orphan ports can result in traffic black holes
Orphans connected to vPC secondary can be isolated from their default gateway if vPC peer Link fails
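Where an orphan port cannot be avoided (for example a server with active/standby NICs), NX-OS offers a per-interface option that shuts the orphan port together with the vPC member ports when the secondary suspends them. This mitigation is not part of the original notes; the interface and VLAN below are hypothetical.

```
interface Ethernet1/20
  switchport
  switchport access vlan 10
  !!! Suspend this orphan port when the peer link fails on the secondary
  vpc orphan-port suspend
```

With the port down, the attached device can fail over to its link on the other peer instead of being black-holed.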
vPC Failure Problem Cases
vPC Systems Behavior When a vPC Peer-Link Goes Down
When the vPC peer-link fails and the vPC peer-keepalive link is still up, the vPC secondary peer device performs these operations:
Suspends its vPC member ports.
Shuts down the SVI associated with the vPC VLAN.
This protective behavior from vPC redirects all south-to-north traffic to the vPC primary device.
Note: When the vPC peer-link is down, the two vPC peer devices can no longer synchronize with each other, so the designed protection mechanism isolates one of the peer devices (in this case, the secondary) from the data path.
Peer Keepalive and Peer links must not share fate in order to prevent split brain, e.g. separate management switch, separate port channels on separate line cards
Recommendation for re-introducing isolated vPC
Before re-introducing an isolated vPC device into production, check the LACP roles on both boxes. If both report the same role, disable auto-recovery with no auto-recovery under the vPC domain on both peers and reload the isolated device. After the reload, the isolated device comes up with the LACP role 'none established' and can be introduced into the vPC without LACP role re-election.
show system internal vpcm info all | i "LACP Role"
show system internal vpcm info all | i "LACP Per"
Ensure sticky bit is set to false: show sys internal vpcm info all | i i stick
If the sticky bit is set to true, reconfigure the vPC role priority. This means to reapply the original configuration for the role priority. If the role priority is default, then reapply the default. This step resets the sticky bit from true to false.
Check sticky bit again and if the sticky bit is still true, reload the VDC or chassis
When the sticky bit is false, bring up the PKA and the Peer Link (PL) one at a time with no shutdown on each interface.
Bring up the orphan ports
Bring up the Layer 3 physical interfaces
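The sticky bit part of the procedure above can be sketched as a command sequence on the isolated device. Domain 11 and role priority 20 are the SW-Slave values from the configuration section; port-channel 1 as the peer link is an assumption.

```
!!! Check the sticky bit before reconnecting the device
show system internal vpcm info all | i i stick
!!! If it is TRUE, re-enter the configured role priority to reset it
configure terminal
vpc domain 11
  role priority 20
end
show system internal vpcm info all | i i stick
!!! Once it is FALSE, bring up the keepalive first, then the peer link
configure terminal
interface mgmt0
  no shutdown
interface port-channel 1
  no shutdown
end
```

Run show vpc after each no shutdown to confirm the keepalive and peer link states before moving on to orphan ports and Layer 3 interfaces.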
Verification
show vpc
show vpc consistency-parameters [global]
show port-channel compatibility-parameters
show run interface port-channel membership
show system internal vpcm info all | in "LACP Role": check LACP Role
show system internal vpcm info all | in "LACP Per": check LACP Role
show system internal vpcm info all | in i stick: check sticky bit
Reference
Switch Profile: https://sharifulhoque.blogspot.com/2020/06/how-to-setup-cisco-nx-os-switch-profile.html
Advanced vPC: https://networkdirection.net/articles/virtual-port-channels-vpc/advancedvpc/
vPC with HSRP and VRRP: https://networkdirection.net/articles/virtual-port-channels-vpc/vpcwithhsrpvrrp/
vPC Auto Recovery: https://community.cisco.com/t5/networking-knowledge-base/vpc-auto-recovery-feature-in-nexus-7000/ta-p/3123651
vPC Election Process: https://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/nx-os-software/212589-understanding-vpc-election-process.html
Nexus 7000 Chassis Replacement Procedure: https://www.cisco.com/c/en/us/support/docs/interfaces-modules/nexus-7000-series-supervisor-1-module/119033-technote-nexus-00.html
Do not Allow LACP rate fast for vPC Peerlink: https://quickview.cloudapps.cisco.com/quickview/bug/CSCuu91089