BGP Protocol, How to Troubleshoot BGP Peering Issues?

BGP is a path-vector routing protocol that provides scalability, flexibility, and network stability. It is the only Exterior Gateway Protocol (EGP) in use today for advertising, learning, and selecting the best paths across the global Internet. In this article, I explain how to troubleshoot BGP peering issues.

What are the Requirements for Forming BGP Peers?


Routers must meet several requirements to become BGP neighbors/peers:

  • The BGP router IDs of the two routers must be different.
  • If configured, BGP authentication (MD5) must match.
  • The AS numbers must agree: a local router’s ASN (set with router bgp asn) must match the ASN that the neighboring router references in its neighbor ip remote-as asn command.
  • Each router must be an endpoint of the TCP connection with the other router. The remote router’s IP address used in that TCP connection must match the address the local router configures in its BGP neighbor remote-as command.
  • The neighbors must agree on capabilities such as address families (IPv4 unicast, L2VPN EVPN, and so on).
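
The checklist above can be sketched as a small offline comparison of two routers' settings. This is only an illustration: the Speaker class, its field names, the router IDs, and the assignment of 172.10.11.0 and 172.10.11.1 to the two ends of the link are assumptions made for the example, not output from any real device or library. Treating "no common address family" as a failure is also a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Speaker:
    """Hypothetical model of one side of a BGP session."""
    router_id: str
    local_as: int             # ASN from 'router bgp <asn>'
    source_ip: str            # address this router sources the TCP session from
    neighbor_ip: str          # address in 'neighbor <ip> remote-as <asn>'
    neighbor_remote_as: int   # ASN in 'neighbor <ip> remote-as <asn>'
    password: Optional[str] = None
    address_families: Set[str] = field(default_factory=lambda: {"ipv4 unicast"})


def peering_problems(a: Speaker, b: Speaker) -> List[str]:
    """Return the peering requirements that the two sides violate."""
    problems = []
    if a.router_id == b.router_id:
        problems.append("duplicate router ID")
    if a.password != b.password:
        problems.append("authentication mismatch")
    if a.neighbor_remote_as != b.local_as or b.neighbor_remote_as != a.local_as:
        problems.append("remote-as does not match the peer's local ASN")
    if a.neighbor_ip != b.source_ip or b.neighbor_ip != a.source_ip:
        problems.append("neighbor address does not match the peer's TCP source address")
    if not (a.address_families & b.address_families):
        problems.append("no common address family")
    return problems


# Assumed values: SW-A in ASN 100, SW-B in ASN 200, SW-B has a wrong remote-as.
sw_a = Speaker("1.1.1.1", 100, "172.10.11.0", "172.10.11.1", 200)
sw_b = Speaker("2.2.2.2", 200, "172.10.11.1", "172.10.11.0", 200)
print(peering_problems(sw_a, sw_b))
```

An empty list means none of the modeled requirements is violated; it does not prove that the session will come up on real hardware.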


[Figure: eBGP Peering Configuration Example. Two routers, SW-A (ASN 100) and SW-B (ASN 200), are connected via a link labeled 172.10.11.0/31; each is configured with BGP and a neighbor statement for the other.]

BGP Neighboring States Review

A BGP router forms TCP sessions with its neighbor routers (peers). For each peer, BGP runs the Finite State Machine (FSM) below, and the router keeps a table of all BGP peers with their current state.

BGP Peer State | Description
Idle | The BGP process is either administratively down or waiting for the next retry attempt.
Connect | The BGP process is waiting for the TCP connection to be completed.
Active | The TCP connection has not been completed yet, and no BGP messages have been sent to the peer.
OpenSent | The BGP OPEN message has been sent to the peer, but the matching OPEN has not yet been received.
OpenConfirm | An OPEN message has been both sent to and received from the other router.
Established | All neighbor parameters match, the neighbor relationship works, and the peers can now exchange UPDATE messages.
[Figure: BGP States FSM. Session establishment between routers A and B, from top to bottom: Idle, Connect, TCP 3-way handshake, Active, OPEN message, OpenSent, OPEN message, OpenConfirm, parameters matched, Established.]
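
To make the state progression concrete, here is a minimal sketch of the FSM as a transition table. It is heavily simplified: the event names are made up for this example, timers are not modeled, and every event that is not listed simply sends the session back to Idle.

```python
# Simplified transitions: (current state, event) -> next state.
# Event names are illustrative, not taken from any router software.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",     # OPEN is sent once TCP is up
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "connect_retry_expired"): "Connect",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def next_state(state, event):
    """Return the next state; any unlisted event or error resets to Idle."""
    return TRANSITIONS.get((state, event), "Idle")


def run(events, state="Idle"):
    """Feed a list of events through the FSM and return every state visited."""
    path = [state]
    for event in events:
        state = next_state(state, event)
        path.append(state)
    return path


print(run(["start", "tcp_established", "open_received", "keepalive_received"]))
```

A session that bounces between Connect and Active in this model never completes the TCP handshake, which matches the "peer down" symptoms discussed later in the article.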

Troubleshoot BGP Peering Cases

BGP peering issues fall primarily into two categories:

  • BGP peer down (steady down state).
  • Flapping BGP peer (moving state).
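
One way to tell the two categories apart is to look at a short history of observed peer states. The sketch below assumes you already have such a history as a list of state names (for example, collected by polling the neighbor table); the function name and the threshold of two session drops are arbitrary choices for illustration.

```python
def classify_peer(states, flap_threshold=2):
    """Classify a peer from a chronological list of observed FSM states.

    Returns "up", "down", or "flapping". The threshold is an assumption:
    two or more drops out of Established count as flapping.
    """
    if not states:
        raise ValueError("need at least one observation")
    up = [s == "Established" for s in states]
    # Count Established -> not-Established transitions (session drops).
    drops = sum(1 for prev, cur in zip(up, up[1:]) if prev and not cur)
    if drops >= flap_threshold:
        return "flapping"
    return "up" if up[-1] else "down"


print(classify_peer(["Idle", "Active", "Idle", "Active"]))
print(classify_peer(["Established", "Idle", "Established", "Idle", "Established"]))
```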

BGP Peer Down Case


A down BGP peer is in either the Idle or the Active state. These states point to the following possible problems:

  • No route to the peer address, not even a default route 0.0.0.0/0 (IP connectivity is not present).
  • A configuration error, such as a missing or wrongly configured update-source command.
  • TCP is established, but BGP negotiation fails, for example because of a misconfigured ASN.
  • The routers did not agree on the peering parameters, for example the address families (capabilities) or the authentication settings.


The steps to troubleshoot down peering issues for BGP neighbors:

  • Step 1: Verify the configuration for correct peering IP addresses, AS numbers, update-source interface, authentication passwords, eBGP multi-hop configuration, and so on.
  • Step 2: Verify reachability using the ping a.b.c.d [source-interface-id | source-ip-address] command, where a.b.c.d is the peer’s IP address.
  • Step 3: Verify the TCP connections using the show sockets connection tcp command (NX-OS).
  • Step 4: Verify any IP ACLs in the path. They should permit TCP connections on port 179, as well as the ICMP packets used to verify reachability.
  • Step 5: Use the debug ip bgp keepalives command to trace the BGP keepalive exchange with the peer.

The above five steps should help you understand the peering issues when the BGP peer is down (not flapping).
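
Steps 3 and 4 boil down to one question: can a TCP connection to port 179 on the peer be completed? From a Linux host or a router shell with Python available, that can be checked roughly as follows. This is a sketch, not a replacement for the router commands above; the function name and defaults are my own, and a BGP speaker may refuse connections from addresses that are not configured as neighbors, so a False result from an arbitrary host is not conclusive.

```python
import socket

BGP_PORT = 179  # well-known TCP port for BGP


def tcp_port_open(host, port=BGP_PORT, timeout=3.0, source_ip=None):
    """Return True if a TCP connection to host:port completes within timeout.

    source_ip plays the role of update-source: it selects the local
    address the connection is sourced from (None lets the OS choose).
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, tcp_port_open("172.10.11.1") would test the peer address assumed for the example topology; a timeout usually suggests a routing or ACL problem, while an immediate refusal suggests the peer is reachable but not accepting the session.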

Flapping BGP Peer Case


When a BGP peer is flapping, it keeps moving between the Idle and Established states. Flapping can have one of several causes:

  • The hold timer expired. That can happen due to:
    • Interface drops: drops on the interface may cause BGP control packets (keepalive messages) to be lost.
    • High CPU or control-plane policy drops (improper control-plane policing): the BGP packets may be dropped on the CPU interface.
    • An unstable BFD session: if BFD is configured, validate that the BFD session is stable.
    • A stuck MTS queue (in NX-OS) or a BGP keepalive generation problem (OutQ/InQ), which means the BGP process is dropping BGP messages.

  • MTU mismatch (seen in a production environment).
    • If a device in the path, or the destination itself, cannot accept packets with the larger MTU, it drops them and sends an ICMP error message back to the BGP speaker.
    • A received UPDATE message resets the hold timer just as a keepalive does. When the large UPDATE packets never arrive, nothing resets the timer, and after 180 seconds the destination router sends a NOTIFICATION with a hold time expired error back to the source.

  • Bad BGP update.
    • Bad links carrying the update or bad hardware cause intermittent packet dropping.
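
Most of the causes above end the same way: the hold timer runs out because no KEEPALIVE or UPDATE arrived in time. The sketch below models only that rule, using the 180-second hold time mentioned above; the function name, the 60-second keepalive interval in the example, and the timestamps are assumptions for illustration.

```python
def first_hold_expiry(rx_times, hold_time=180.0, session_start=0.0,
                      observe_until=None):
    """Return the time the hold timer first expires, or None if it never does.

    rx_times: arrival times (seconds) of KEEPALIVE or UPDATE messages.
    Each arrival restarts the hold timer; a gap longer than hold_time
    means the timer expired hold_time seconds after the last arrival.
    """
    last = session_start
    for t in sorted(rx_times):
        if t - last > hold_time:
            return last + hold_time
        last = t
    if observe_until is not None and observe_until - last > hold_time:
        return last + hold_time
    return None


# Keepalives every 60 s, then silence (for example, drops on the path).
print(first_hold_expiry([60, 120, 180, 240], observe_until=500))
```

With these numbers the last message arrives at 240 seconds, so the timer expires 180 seconds later. Comparing such a computed expiry with the timestamps of interface drops or CPU spikes can help narrow down which of the causes above applies.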

Summary

For a BGP peering to establish, make sure that the peering requirements are met. Then troubleshoot according to the symptom: a peer that stays down and a peer that flaps call for the different methods described in this article.
Remember, always ensure that you understand the potential impact of any command before running it in a live network environment.

Need Comprehensive BGP Content?

I hope this article was helpful. If you want comprehensive content about BGP, check out my Cisco Data Centers | MP-BGP course on Udemy.
