BGP Protocol, How to Troubleshoot BGP Peering Issues?

BGP is a path-vector routing protocol that provides scalability, flexibility, and network stability. It is the only Exterior Gateway Protocol (EGP) in use today for advertising, learning, and selecting the best paths across the global Internet.

In large-scale enterprise and data center networks, we typically use BGP to exchange routing information within these networks and with one or more ISPs.

What are the Requirements for Forming BGP Peers?

BGP Peering Example


Routers must meet several requirements to become BGP neighbors/peers:

  • The BGP router IDs of the two routers must be different.
  • If configured, BGP authentication (MD5) must match.
  • The AS numbers must agree: a router’s local ASN (set with router bgp asn) must match the ASN that the neighboring router configures for it in its neighbor ip remote-as asn command.
  • Each router must be an endpoint of the TCP connection with the other router, and the remote router’s IP address used in that TCP connection must match the address the local router configures in its neighbor ip remote-as asn command.
  • Neighbors must agree on address families (capabilities), such as IPv4 unicast or L2VPN EVPN.
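
The requirements above can be sketched as a small consistency check. This is only an illustration in Python: the BgpSpeaker class, its field names, and the peering_problems function are hypothetical and are not part of any router software or vendor API. It also assumes that sharing at least one address family is enough, which may differ by platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BgpSpeaker:
    """Hypothetical, simplified view of one router's settings toward one peer."""
    router_id: str
    local_asn: int            # value from "router bgp <asn>"
    local_ip: str             # source address used for the TCP session
    neighbor_ip: str          # address in "neighbor <ip> remote-as <asn>"
    neighbor_remote_as: int   # ASN in "neighbor <ip> remote-as <asn>"
    md5_password: Optional[str] = None
    address_families: frozenset = frozenset({"ipv4-unicast"})


def peering_problems(a: BgpSpeaker, b: BgpSpeaker) -> List[str]:
    """Return the peering requirements that the two speakers violate."""
    problems = []
    if a.router_id == b.router_id:
        problems.append("duplicate router ID")
    if a.md5_password != b.md5_password:
        problems.append("authentication mismatch")
    if a.neighbor_remote_as != b.local_asn or b.neighbor_remote_as != a.local_asn:
        problems.append("ASN mismatch")
    if a.neighbor_ip != b.local_ip or b.neighbor_ip != a.local_ip:
        problems.append("peer address mismatch")
    # Assumption: at least one shared address family is treated as sufficient.
    if not (a.address_families & b.address_families):
        problems.append("no common address family")
    return problems


# Example values are made up for illustration.
r1 = BgpSpeaker("1.1.1.1", 65001, "10.0.0.1", "10.0.0.2", 65002)
r2 = BgpSpeaker("2.2.2.2", 65002, "10.0.0.2", "10.0.0.1", 65009)  # wrong remote-as
print(peering_problems(r1, r2))
```

An empty list means none of the listed requirements is violated; it does not prove that the session will come up, since reachability still has to be verified separately.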

BGP Neighboring States Review

BGP States
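
For reference, the BGP finite-state machine in RFC 4271 defines six states: Idle, Connect, Active, OpenSent, OpenConfirm, and Established. The Python sketch below lists them and interprets the state column of a summary line. The function names are hypothetical, and the convention that a number in that column means the session is Established (the number being the received prefix count) is an assumption based on common Cisco summary output.

```python
from enum import Enum


class BgpState(Enum):
    """The six BGP finite-state-machine states from RFC 4271."""
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


def is_peer_up(state: BgpState) -> bool:
    """Only Established means routes can be exchanged."""
    return state is BgpState.ESTABLISHED


def state_from_summary_field(field: str) -> BgpState:
    """Interpret a State/PfxRcd-style column (assumed format).

    A number is taken to be a prefix count, so the peer is Established.
    Otherwise the first word is expected to be a state name,
    for example "Idle (Admin)" is read as Idle.
    """
    token = field.strip().split()[0]
    if token.isdigit():
        return BgpState.ESTABLISHED
    return BgpState(token)  # raises ValueError for an unknown state name


print(state_from_summary_field("Active"), is_peer_up(state_from_summary_field("Active")))
```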


BGP Peer Troubleshooting Cases

BGP peering issues fall primarily into two categories: a peer that is down and a peer that is flapping.

BGP Peer Down Case


A BGP peer that is down sits in either the Idle or the Active state. These states point to one of the following possible problems:

  • No route to peer address (IP connectivity not present, including the default route 0.0.0.0/0).
  • A configuration error, such as a missing or misconfigured update-source command.
  • TCP is established, but BGP negotiation fails, for example because of a misconfigured ASN.
  • The routers did not agree on the peering parameters, for example, the address families (capabilities).


To troubleshoot a BGP neighbor that is down, follow these steps:

  • Step 1: Verify the configuration for correct peering IP addresses, AS numbers, update-source interface, authentication passwords, eBGP multi-hop configuration, and so on.
  • Step 2: Verify reachability using the ping a.b.c.d [source-interface-id | source-ip-address] command, where a.b.c.d is the peer’s IP address.
  • Step 3: Verify the TCP connections using the show sockets connection tcp command.
  • Step 4: Verify any IP ACLs in the path. The ACLs in the path should permit TCP connections on port 179 and ICMP packets that can help verify reachability.
  • Step 5: On NX-OS, use the debug ip bgp keepalives command to capture the BGP keepalive packets.

The above five steps should help you understand the peering issues when the BGP peer is down (not flapping).
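
Steps 2 to 4 can also be approximated from a host that sits in the path, using a plain TCP connection attempt to port 179. The Python sketch below is an illustration only: tcp_port_reachable is a hypothetical helper name, and the result only tells you whether a TCP handshake from this host completes. A router may still reset sessions from sources that are not configured as neighbors, so treat a False result as a hint, not proof of an ACL or routing problem.

```python
import socket
from typing import Optional


def tcp_port_reachable(peer_ip: str, port: int = 179,
                       source_ip: Optional[str] = None,
                       timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to peer_ip:port completes.

    source_ip plays the role of the update-source address: when given,
    the connection is sourced from that local address.
    """
    source = (source_ip, 0) if source_ip else None  # port 0 = any local port
    try:
        with socket.create_connection((peer_ip, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address used here as a placeholder peer.
    print(tcp_port_reachable("192.0.2.1", timeout=1.0))
```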

Flapping BGP Peer Case


A flapping BGP peer keeps moving between the Idle and Established states. Flapping can be due to one of the following reasons:

  • The Hold Timer expired. This can happen due to:
    • Interface drops: drops on the interface may cause the BGP control packets (keepalive messages) to be lost.
    • High CPU or control-plane policy drops (improper control-plane policing): the BGP packets may be dropped on the CPU interface.
    • An unstable BFD session: if BFD is configured, validate that the session is stable.
    • A stuck MTS queue (in NX-OS), or a BGP keepalive generation problem (OutQ/InQ). This means the BGP process is dropping the BGP messages.

  • MTU mismatch. (I saw this in a production environment.)
    • If a device in the path, or even the destination, cannot accept packets with the larger MTU, it sends an ICMP error message back to the BGP speaker.
    • An Update message also resets the hold timer, just as a keepalive does. When the large Updates never arrive, the hold timer expires after 180 seconds (the default hold time on Cisco platforms), and the destination router sends a Notification back to the source with a Hold Timer Expired error.

  • Bad BGP update.
    • Bad links carrying the update, or bad hardware, cause intermittent packet drops.
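
To make the hold-timer reasoning concrete, here is a minimal Python sketch of the timer logic: every Keepalive or Update that arrives pushes the deadline out by the hold time, and the session drops once the deadline passes with nothing received. The function name and parameters are hypothetical, and the 180-second default assumes the Cisco default hold time (with keepalives every 60 seconds).

```python
from typing import Iterable, Optional


def hold_timer_expiry(arrivals: Iterable[float],
                      hold_time: float = 180.0,
                      start: float = 0.0,
                      end: Optional[float] = None) -> Optional[float]:
    """Return the time at which the hold timer expires, or None.

    arrivals: times (in seconds) at which a Keepalive or Update arrived.
    start:    time the session reached Established.
    end:      end of the observation window, if any.
    """
    deadline = start + hold_time
    for t in sorted(arrivals):
        if t > deadline:
            return deadline           # nothing arrived in time
        deadline = t + hold_time      # each message restarts the timer
    if end is not None and end > deadline:
        return deadline               # messages stopped and the timer ran out
    return None


# Keepalives arrive at 60, 120, and 180 seconds, then nothing gets through
# (for example, because of interface drops or an MTU problem).
print(hold_timer_expiry([60, 120, 180], end=600))  # 360.0
```

In this example the last message arrives at 180 seconds, so the timer expires 180 seconds later, at 360 seconds, which is when the Notification would be sent.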

Conclusion

For a BGP peering to establish, make sure that the peering requirements are met. The troubleshooting method depends on the symptom: a peer that is down and a peer that is flapping call for the different approaches described in this article.
Remember, always ensure that you understand the potential impact of any command before running it in a live network environment.

I hope this summary was useful. For comprehensive content, you can refer to my CCIE Data Center (v3.1) – BGP Udemy course.
Feel free to leave a comment or a question.
