vPC failure scenarios can be destructive. However, if you understand vPC well and follow the Cisco-recommended vPC design, you can handle Virtual Port Channel (vPC) failure scenarios with confidence. In this lesson, I will discuss different vPC failure scenarios, their impact on the network, and how to resolve each one the Cisco-recommended way.
vPC Design:
Depending on your requirements, vPC can be designed as a one-sided (regular) vPC, a double-sided vPC, or a multilayer (DCI) vPC. You can use the Cisco guide for vPC design from here.
If you are new to vPC configuration, the articles below from this blog are recommended.
How to configure Cisco Nexus vPC
How to configure Double-Sided vPC in Cisco Nexus
vPC Failure Scenarios
I will discuss the following failures.
- vPC keep-alive link failure
- vPC peer-link failure
- Member port failure
- vPC Peer switch failure
- Dual Failure Scenarios
++ Case 1: Peer-link failure, followed by Keep-alive link
++ Case 2: Keep-alive link failure, followed by Peer-link
vPC keep-alive link failure:
Impact:
If only the keep-alive link fails, traffic is not affected. Only the heartbeat between the primary and secondary nodes is lost.
Solution:
Restore the link as early as possible. If the peer-link also fails while the keep-alive link is down, you end up in a dual failure, which is much harder to handle.
vPC peer-link failure
Impact:
If the peer-link fails, all the vPC member ports on the vPC secondary node are suspended. It is important to note that the keep-alive link is still active in this scenario, which allows the nodes to keep exchanging heartbeats.
Solution:
Restore the peer-link and make sure it is up and running.
Recommendation:
The peer-link is important, so it is a good idea to build its port channel from multiple ports on multiple modules. That way, if a link or module becomes faulty, the other links remain up and active.
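The recommendation above can be checked with a few lines of Python. This is only a minimal sketch: it assumes interface names of the form Ethernet<module>/<port>, and the helper names and sample interfaces are hypothetical, not taken from any Cisco tool.

```python
import re


def modules_used(interfaces):
    """Return the set of module (slot) numbers used by the given interfaces.

    Assumes names of the form "Ethernet<module>/<port>" (an assumption for
    this sketch; adapt the pattern to your own naming).
    """
    modules = set()
    for name in interfaces:
        match = re.fullmatch(r"Ethernet(\d+)/(\d+)", name)
        if match is None:
            raise ValueError(f"unrecognized interface name: {name}")
        modules.add(int(match.group(1)))
    return modules


def peer_link_is_resilient(interfaces):
    """True when the peer-link members are spread over at least two modules."""
    return len(modules_used(interfaces)) >= 2


# Hypothetical peer-link member lists
print(peer_link_is_resilient(["Ethernet1/1", "Ethernet2/1"]))  # True
print(peer_link_is_resilient(["Ethernet1/1", "Ethernet1/2"]))  # False
```

The second list has two links, but both sit on module 1, so a single module failure would still take the whole peer-link down.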
Member port failure
Impact:
If a member port fails for a particular end host, only that host is affected. All other member ports remain operational. If one of the host's links fails, its traffic flows through the other link. If both links fail, that end host has a full outage.
Solution:
Make sure the member ports are up and running.
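The impact of a member port failure on a single dual-homed host can be summarized as a small truth table. The sketch below is illustrative only; the function name and the status strings are made up for this example.

```python
def host_status(link_to_primary_up, link_to_secondary_up):
    """Describe the state of one dual-homed end host behind a vPC."""
    if link_to_primary_up and link_to_secondary_up:
        return "full redundancy"
    if link_to_primary_up or link_to_secondary_up:
        return "degraded: traffic uses the remaining link"
    return "outage"


print(host_status(True, True))    # full redundancy
print(host_status(True, False))   # degraded: traffic uses the remaining link
print(host_status(False, False))  # outage
```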
vPC Peer switch failure
Impact:
If the primary switch in the vPC fails, the secondary switch is promoted to operational primary and forwards all the traffic. If the secondary switch fails, the primary keeps forwarding traffic as before.
Solution:
Bring the peer switch up. Then make sure the keep-alive link is up and operational. After that, move on to the peer-link and, lastly, the member ports.
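The recovery order described above (peer switch, keep-alive, peer-link, member ports) can be written down as a tiny checklist helper. This is a sketch only; the component names and the status dictionary are assumptions made for the example.

```python
# Recovery order taken from the solution above.
RECOVERY_ORDER = ["peer switch", "keep-alive link", "peer-link", "member ports"]


def next_recovery_step(status):
    """Return the first component that is still down, or None if all are up.

    `status` maps each component name to True (up) or False (down).
    """
    for component in RECOVERY_ORDER:
        if not status[component]:
            return component
    return None


# Hypothetical state: the switch has rebooted, everything else is still down.
status = {
    "peer switch": True,
    "keep-alive link": False,
    "peer-link": False,
    "member ports": False,
}
print(next_recovery_step(status))  # keep-alive link
```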
Dual Failure Scenarios
In dual failure scenarios, I will discuss the following cases.
1. Peer-link failure, followed by keep-alive link failure
2. Keep-alive link failure, followed by peer-link failure
Case 1: Peer-link failure, followed by Keep-alive link
Here, the member ports on the secondary node are suspended first because the peer-link is down, but the heartbeat is still exchanged over the keep-alive link. Traffic flows through the primary peer switch. If the keep-alive link then fails, the suspended ports remain suspended and all traffic keeps flowing through the primary node.
Solution:
Bring up the keep-alive link first, and then work on the peer-link. You should maintain this order.
Case 2: Keep-alive link failure, followed by Peer-link
This failure is the most critical. If the keep-alive link fails first, nothing happens because the vPC peer roles are already decided. However, if the peer-link dies after the keep-alive link, the secondary vPC node receives no heartbeat from the primary and assumes that the primary node is completely down. The secondary node therefore becomes operational primary. In this case, both vPC nodes forward traffic. This is called a split-brain scenario in vPC.
Solution:
Shut down all the member ports on the secondary switch. Then bring up the keep-alive link. After the heartbeat (keep-alive) is restored, bring the peer-link up. Once vPC forms, bring the member ports back up.
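To see why the order of the two failures matters, the behaviour described in Case 1 and Case 2 can be modelled in a few lines of Python. This is a simplified sketch of the logic as explained in this article, not a model of the actual NX-OS implementation; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VpcPair:
    """Simplified model of a vPC pair, seen from the secondary node."""
    keepalive_up: bool = True
    peer_link_up: bool = True
    secondary_ports_suspended: bool = False
    secondary_acts_as_primary: bool = False

    def fail_keepalive(self):
        # Roles are already decided, so losing the heartbeat alone changes
        # nothing; ports that are already suspended stay suspended.
        self.keepalive_up = False

    def fail_peer_link(self):
        if not self.peer_link_up:
            return  # already down, nothing new happens
        self.peer_link_up = False
        if self.keepalive_up:
            # Heartbeat still arrives: the secondary suspends its member ports.
            self.secondary_ports_suspended = True
        else:
            # No heartbeat: the secondary assumes the primary is dead.
            self.secondary_acts_as_primary = True

    @property
    def split_brain(self):
        # Both nodes forward traffic as primary at the same time.
        return self.secondary_acts_as_primary and not self.secondary_ports_suspended


# Case 1: peer-link fails first, then keep-alive
case1 = VpcPair()
case1.fail_peer_link()
case1.fail_keepalive()
print(case1.split_brain)  # False

# Case 2: keep-alive fails first, then peer-link
case2 = VpcPair()
case2.fail_keepalive()
case2.fail_peer_link()
print(case2.split_brain)  # True
```

The same two failures give different outcomes because the secondary node only suspends its ports when it can still hear the primary at the moment the peer-link goes down.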
If the keep-alive link went down and we did not notice it, what will be the impact if the keep-alive link is not brought back up?
I already discussed what happens if the keep-alive goes down.
If the keep-alive link goes down, then how does a peer know the live status of the other peer?
Does the heartbeat flow through the peer-link?
Hi Farzana, the heartbeat only flows through the keep-alive link.
Really very nice. Issues based on live scenarios, explained clearly with a diagram.
Thanks Arumugam..
Thanks for the article. I understand that the order of the failures is quite important, but why is the order of bringing the ports back up important as well? Say the primary switch reloaded and came back with all ports down. What happens if I bring up the peer-link first and then the keep-alive?
If you bring up the peer-link first, how will the nodes decide which is primary and which is secondary?
I labbed it, and I think it doesn’t make a big difference. I shut down the keep-alive, followed by the peer-link, and I saw both switches as primary. When I enabled only the keep-alive, they were still in a split-brain state; both were primary until I enabled the peer-link too.
The reverse also seems to end up with the same result. I put them in a split-brain state and enabled the peer-link first, and their roles didn’t change until I enabled the keep-alive. I think a role change doesn’t happen until both the keep-alive and the peer-link come up.
This is the output for the keep-alive up, peer-link down state:
switch1:
vPC domain id : 70
Peer status : peer link is down
vPC keep-alive status : peer is alive
vPC role : primary
switch2:
vPC domain id : 70
Peer status : peer link is down
vPC keep-alive status : peer is alive
vPC role : secondary, operational primary
Number of vPCs configured : 2
great
thanks 🙂
Nicely explained , I read several docs but this is best.
Thanks
Glad to know that. Thanks Malik.
Dude great stuff, you kill it.
Can you give me a spanning-tree best practice? I’m very new when it comes to Data Center stuff but I’m catching up really fast.
Hi Freddy, I will publish articles on spanning-tree soon. Thanks for your comment.
Well written, buddy.. Kudos..
Thank you so much.