vPC

Configuration

vPC Peer-Keepalive

# In SW-Master

!!! Enable the vPC and LACP features; both are disabled by default
feature vpc
feature lacp

interface mgmt0
  vrf member management
  ip address 10.1.1.1/24

vpc domain 11
  !!! A lower role priority value makes this switch more likely to be elected vPC primary
  role priority 10
  !!! Peer-keepalive: reach the peer switch over mgmt0 in VRF management
  peer-keepalive destination 10.1.1.2 source 10.1.1.1 vrf management
  !!! Enable auto-recovery: if the primary switch fails, the secondary can still
  !!! forward packets
  auto-recovery
  !!! peer-switch presents both vPC peers as a single STP bridge to downstream
  !!! STP-aware devices, which shortens STP convergence.
  peer-switch

# In SW-Slave

feature vpc
feature lacp

interface mgmt0
  vrf member management
  ip address 10.1.1.2/24

vpc domain 11
  role priority 20
  peer-keepalive destination 10.1.1.1 source 10.1.1.2 vrf management
  auto-recovery
  peer-switch

vPC Interface Configuration
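A minimal sketch of the vPC peer link and one vPC member port-channel, applied identically on SW-Master and SW-Slave. The interface numbers (Ethernet1/1-2 for the peer link, Ethernet1/10 toward a downstream server) and the port-channel and vPC numbers (1 and 20) are hypothetical; the vPC number must match on both peers.

```text
!!! Peer link: hypothetical ports Eth1/1-2 bundled into port-channel 1
interface ethernet 1/1-2
  switchport mode trunk
  channel-group 1 mode active
interface port-channel 1
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link

!!! vPC member: hypothetical port Eth1/10 toward a dual-attached server
interface ethernet 1/10
  switchport mode trunk
  channel-group 20 mode active
interface port-channel 20
  switchport mode trunk
  !!! vPC number 20 must be the same on both peers
  vpc 20
```

The result can be checked with show vpc brief and show port-channel summary on both peers.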

Verification

  • show vpc brief, show vpc role

  • show vpc [consistency-parameters]

Switch profiles and Config Sync

  • As of 06/2020, the out-of-band management interface (mgmt0) must be used as the vPC peer-keepalive link. The underlying protocol that provides config-sync, Cisco Fabric Services over IP (CFSoIP), does not support a Layer 3 or SVI interface as the keepalive link.

  • The same configuration cannot exist both in global configuration mode and in the switch profile; if it does, validation fails. Each configuration change must be made in only one place, either in global configuration mode (configure terminal) or in switch-profile mode.

Enable Cisco Fabric Services over IP (CFSoIP)
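A minimal sketch, assuming a Nexus platform that supports switch profiles: CFSoIP distribution is enabled globally, on both peers, with a single command.

```text
!!! On both SW-Master and SW-Slave
configure terminal
  cfs ipv4 distribute
```

show cfs status (listed under Verification below) confirms that distribution is enabled.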

Apply the required configuration on the Master
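A sketch of the switch-profile definition, assuming the mgmt0 addresses from the configuration above; the profile name SW-PROFILE is hypothetical. Config-sync expects the same profile name on both peers, each pointing at the other as its sync peer.

```text
!!! On SW-Master
config sync
  switch-profile SW-PROFILE
    sync-peers destination 10.1.1.2

!!! On SW-Slave
config sync
  switch-profile SW-PROFILE
    sync-peers destination 10.1.1.1
```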

Delete switch profile buffer
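A sketch of clearing uncommitted commands from the switch-profile buffer, assuming the hypothetical profile name SW-PROFILE. Reviewing the buffer first avoids discarding commands that are still needed.

```text
config sync
  switch-profile SW-PROFILE
    !!! Review the buffered (uncommitted) commands
    show switch-profile buffer
    !!! Discard every buffered command
    buffer-delete all
```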

Import configuration to switch profile
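A sketch of importing the existing configuration into the profile, again assuming the hypothetical name SW-PROFILE. After the import, verify checks the buffered commands locally and against the sync peer, and commit applies them on both peers.

```text
config sync
  switch-profile SW-PROFILE
    !!! Copy the existing running configuration into the profile buffer
    import running-config
    !!! Check the buffered commands locally and against the sync peer
    verify
    !!! Apply the buffer on both peers
    commit
```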

Verification

  • show cfs status

  • show switch-profile status

  • show switch-profile buffer

  • show switch-profile session-history

vPC with HSRP and VRRP

  • The control plane refers to traffic sent to the Nexus switch itself; in the case of HSRP, this is ARP traffic. In control-plane terms, HSRP with vPC is active/passive, because only the primary switch responds to ARP requests.

  • The data plane refers to traffic that the Nexus switch forwards, for example traffic from one server to another. In data-plane terms, HSRP with vPC is active/active: both switches forward traffic.

  • Do not use HSRP object tracking with vPC

  • Cisco recommends configuring HSRP with its default settings when using vPC
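A minimal sketch of an HSRP gateway on a vPC VLAN using default settings (no preempt, no priority change, no object tracking), in line with the recommendations above. VLAN 10, the SVI addresses and HSRP group 10 are hypothetical.

```text
!!! On SW-Master (SW-Slave is identical except for ip address 10.10.10.3/24)
feature interface-vlan
feature hsrp
interface vlan 10
  no shutdown
  ip address 10.10.10.2/24
  !!! Default timers and priority; no preempt or track statements
  hsrp 10
    ip 10.10.10.1
```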

vPC Enhancement

vPC auto-recovery (on by default)

  • Certain failures can result in neither vPC peer forwarding traffic

  • Power outage with node failure problem case

    • Power outage on both Peers

    • Only one peer is restored

    • vPC Peer Keepalive never comes up

    • As a result, the vPC peer link never comes up

    • As a result, vPC member ports never come up

    • Servers are isolated

  • vPC auto-recovery allows a single peer to promote itself to primary

    • If the peer link does not initialize before the auto-recovery timeout expires, the peer promotes itself to primary and brings up its member ports
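A minimal sketch of the auto-recovery timer under the vPC domain from the configuration above. 240 seconds is the default reload delay, so the value here is only illustrative.

```text
vpc domain 11
  !!! Seconds to wait for the peer link after a reload before self-promoting to primary
  auto-recovery reload-delay 240
```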

Gradual failure problem case

  • vPC Peer link goes down

  • The vPC secondary pings the vPC primary over the peer keepalive and gets a response

  • vPC Secondary disables vPC member ports

  • vPC Primary completely fails

  • vPC Secondary does not re-activate member ports

  • Servers are isolated

  • vPC auto-recovery allows the secondary to detect this

    • The vPC primary is continually tracked over the vPC peer keepalive

    • A later peer keepalive failure results in the secondary promoting itself to primary

    • The secondary re-activates its member ports

vPC Failure Detection and Recovery

vPC Initialization Order of Operations

  • vPC process starts

  • Peer keepalive connectivity established (IP/UDP port 3200)

  • Peer-link adjacency forms

  • vPC Primary/Secondary role detection

  • vPC Consistency Checks performed

  • Layer 3 SVIs move to up/up state

  • vPC member ports move to up/up state

vPC Primary/Secondary election

  • In a vPC system, one vPC peer device is elected vPC primary and the other vPC secondary, based on these parameters, in this order:

    • vPC primary sticky bit (0 or 1): the peer device with the sticky bit set to 1 wins this comparison and becomes vPC primary, regardless of the configured vPC role priority or the system MAC addresses of the peers.

    • If both vPC peer switches have the same Sticky Bit value, the election process proceeds to the next step to compare the user-defined vPC role priority (Cisco NX-OS software uses the lowest numeric value to elect the primary device).

    • If both vPC role priorities are configured to the same value, the election process proceeds to compare the system MAC addresses (Cisco NX-OS software uses the lowest MAC address to elect the primary device).

  • Layer 3 SVI activation: timer controlled by delay restore interface-vlan

  • vPC member port activation: timer controlled by delay restore
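Both restore timers listed above are set under the vPC domain. A sketch with hypothetical values; the defaults are 30 seconds for delay restore and 10 seconds for delay restore interface-vlan.

```text
vpc domain 11
  !!! Delay before vPC member ports come up once the peer adjacency forms
  delay restore 150
  !!! Delay before the SVIs of vPC VLANs come up
  delay restore interface-vlan 45
```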

vPC Primary Sticky Bit

  • The vPC primary sticky bit is a programmed protection mechanism introduced to avoid an unnecessary role change (which could disrupt the network) when the primary switch reloads unexpectedly. It allows the surviving switch to keep its primary role when a failed switch comes back up or when an isolated switch is reintegrated into the vPC domain.

  • Toggling vPC Primary Sticky Bit:

  1. The vPC primary sticky bit is set to TRUE in these scenarios: the current vPC primary reboots and the vPC-enabled switch changes its role from vPC secondary to vPC operational primary; or a vPC-enabled switch changes its role from 'none established' to vPC primary when the reload restore timer (240 s by default) expires. The sticky bit is not set if the role changes from vPC operational secondary to vPC primary.

  2. The vPC primary sticky bit is set to FALSE in these scenarios: a vPC-enabled switch is rebooted (the sticky bit is FALSE by default), or the vPC role priority is changed or re-entered. The sticky bit is reported in the vPC Manager (vpcm) software component structure and can be checked with the NX-OS exec mode command show system internal vpcm info all | i i stick.
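A sketch of checking the sticky bit and clearing it by re-entering the existing role priority, assuming the SW-Master configuration at the top of this page (vPC domain 11, role priority 10). Re-entering the same value leaves the election parameters unchanged while resetting the bit.

```text
!!! Exec mode: display the sticky bit value
show system internal vpcm info all | i i stick

!!! Config mode: re-enter the unchanged role priority to reset the sticky bit to FALSE
configure terminal
  vpc domain 11
    role priority 10
```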

vPC Consistency Checks

  • vPC peers synchronize control-plane state over the peer link using Cisco Fabric Services (CFS)

  • This includes advertising "consistency parameters" that must match for the vPC to form successfully, e.g. line card type (M or F), speed, duplex, trunking, LACP mode, and STP configuration

  • Three types of consistency checks

    • Type 1 Global

      • Mismatch results in vPC failing to form

      • E.g. STP mode Rapid-PVST vs MST

    • Type 1 interface

      • Mismatch results in VLANs being suspended on the vPC member port

      • E.g. STP port type network vs normal

    • Type 2

      • Mismatch results in syslog message but not vPC failure

      • Can cause data-plane failures

      • E.g. MTU Mismatch
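A sketch of one parameter from each Type 1 category, using the examples above and assuming the hypothetical vPC 20 port-channel: the global STP mode and the STP port type on the vPC port-channel must be identical on both peers.

```text
!!! Type 1 global: identical on both peers, or the vPC fails to form
spanning-tree mode rapid-pvst

!!! Type 1 interface: identical on both peers, or VLANs are suspended on this vPC
interface port-channel 20
  spanning-tree port type normal
```

Both peers can then be compared with show vpc consistency-parameters global and show vpc consistency-parameters interface port-channel 20.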

Graceful Consistency Check

  • On a consistency failure, only the vPC secondary disables its vPCs: a 50% bandwidth reduction in exchange for 0% packet loss

  • Enabled by default in NX-OS 5.2 and later; check the status with show vpc
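Graceful consistency check is a vPC domain setting. A sketch, assuming the domain from the configuration at the top of this page; since it is already on by default in 5.2 and later, the command only matters if it was previously disabled with the no form.

```text
vpc domain 11
  !!! On a Type 1 mismatch, suspend vPCs only on the secondary
  graceful consistency-check
```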

vPC Member Port Failure Detection

  • vPC Peers exchange vPC member status over Peer link

  • Failed member ports result in "Orphan Ports"

    • Orphan ports are single-attached ports that use a vPC VLAN

    • vPC VLANs are any VLANs allowed on the peer link

    • show vpc orphan-ports

  • Traffic to orphan ports uses the vPC peer link as a last resort

    • Orphan Ports use modified loop prevention

    • Traffic from remote Orphan is allowed to enter Peer Link and exit via local member

    • Traffic from remote member is allowed to enter peer link and exit via local orphan

    • Traffic from remote member is not allowed to enter via peer link and exit via local member

  • Orphan ports should be avoided at all costs

    • The vPC peer link is the bottleneck of the system and, ideally, should carry only control-plane traffic

  • Orphan ports can result in traffic black holes

    • Orphan ports connected to the vPC secondary can be isolated from their default gateway if the vPC peer link fails
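One mitigation for this black hole, sketched with a hypothetical single-attached port Ethernet1/20 on the vPC secondary: vpc orphan-port suspend makes the secondary suspend the orphan port together with its vPC member ports when the peer link fails, so a device with a standby uplink to the primary can fail over instead of being isolated. Support depends on the platform and NX-OS release.

```text
interface ethernet 1/20
  !!! Suspend this orphan port whenever the secondary suspends its vPC member ports
  vpc orphan-port suspend
```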

vPC Failure Problem Cases

  • When the vPC peer link fails and the vPC peer-keepalive link is still up, the vPC secondary peer device performs these operations:

    • Suspends its vPC member ports.

    • Shuts down the SVIs associated with the vPC VLANs.

  • This protective behavior from vPC redirects all south-to-north traffic to the vPC primary device.

  • Note: When the vPC peer link is down, the vPC peer devices can no longer synchronize with each other, so the protection mechanism isolates one of the peer devices (in this case, the secondary) from the data path.

  • The peer keepalive and peer link must not share fate, to prevent split brain; e.g. use a separate management switch for the keepalive, and build the peer link from ports on separate line cards
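A sketch of avoiding shared fate on a modular chassis, assuming hypothetical line cards in slots 1 and 2: the peer-link port-channel bundles one member from each card, while the keepalive stays on mgmt0 as configured at the top of this page.

```text
!!! Peer-link members on different line cards (hypothetical Eth1/1 and Eth2/1)
interface ethernet 1/1, ethernet 2/1
  switchport mode trunk
  channel-group 1 mode active
interface port-channel 1
  switchport mode trunk
  vpc peer-link
```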

Recommendation for re-introducing isolated vPC

  • Before re-introducing an isolated vPC device into production, check the LACP roles on both peers. If both peers show the same role, disable auto-recovery with no auto-recovery under the vPC domain on both peers and reload the isolated device. After the reload, the isolated device comes up with the LACP role 'none established' and can be introduced into the vPC without LACP role re-election.

    • show system internal vpcm info all | i "LACP Role"

    • show system internal vpcm info all | i "LACP Per"

  • Ensure the sticky bit is set to false: show system internal vpcm info all | i i stick

    • If the sticky bit is set to true, reconfigure the vPC role priority. This means to reapply the original configuration for the role priority. If the role priority is default, then reapply the default. This step resets the sticky bit from true to false.

    • Check the sticky bit again; if it is still true, reload the VDC or chassis

  • When the sticky bit is false, bring up the peer keepalive (PKA) and then the peer link (PL), one at a time, with no shutdown on each interface

  • Bring up the orphan ports

  • Bring up the Layer 3 physical interfaces
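The re-introduction steps above, sketched as commands for vPC domain 11. The orphan port (Ethernet1/20) and Layer 3 port (Ethernet1/30) are hypothetical, and mgmt0 and port-channel 1 are assumed to be the peer keepalive and peer link. Step 1 applies only when both peers report the same LACP role.

```text
!!! Step 1 (both peers, only if the LACP roles match): disable auto-recovery,
!!! then reload only the isolated device
configure terminal
  vpc domain 11
    no auto-recovery
end
reload

!!! Step 2 (isolated device, sticky bit FALSE): peer keepalive first, then peer link
configure terminal
  interface mgmt0
    no shutdown
  interface port-channel 1
    no shutdown

!!! Step 3: orphan ports, then Layer 3 physical interfaces
  interface ethernet 1/20
    no shutdown
  interface ethernet 1/30
    no shutdown
```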

Verification

  • show vpc

  • show vpc consistency-parameters [global]

  • show port-channel compatibility-parameters

  • show run interface port-channel membership

  • show system internal vpcm info all | in "LACP Role": check LACP Role

  • show system internal vpcm info all | in "LACP Per": check LACP Role

  • show system internal vpcm info all | in i stick: check sticky bit

Reference

  • Switch Profile: https://sharifulhoque.blogspot.com/2020/06/how-to-setup-cisco-nx-os-switch-profile.html

  • Advanced vPC: https://networkdirection.net/articles/virtual-port-channels-vpc/advancedvpc/

  • vPC with HSRP and VRRP: https://networkdirection.net/articles/virtual-port-channels-vpc/vpcwithhsrpvrrp/

  • vPC Auto Recovery: https://community.cisco.com/t5/networking-knowledge-base/vpc-auto-recovery-feature-in-nexus-7000/ta-p/3123651

  • vPC Election Process: https://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/nx-os-software/212589-understanding-vpc-election-process.html

  • Nexus 7000 Chassis Replacement Procedure: https://www.cisco.com/c/en/us/support/docs/interfaces-modules/nexus-7000-series-supervisor-1-module/119033-technote-nexus-00.html

  • Do not Allow LACP rate fast for vPC Peerlink: https://quickview.cloudapps.cisco.com/quickview/bug/CSCuu91089
