BGP Implementation in Red Hat OpenStack Services on OpenShift¶
Nov 20, 2024
8 min read
Introduction¶
Red Hat OpenStack Services on OpenShift (RHOSO) 18.0 introduces comprehensive Border Gateway Protocol (BGP) support for dynamic routing in containerized OpenStack deployments. This implementation leverages Free Range Routing (FRR) with the OVN BGP agent to provide scalable, reliable networking for modern cloud infrastructure deployed on OpenShift.
RHOSO’s BGP implementation enables pure Layer 3 data center architectures, eliminating traditional Layer 2 limitations such as large failure domains and slow convergence during network failures. This approach is essential for enterprise environments requiring high availability, scalability, and integration with existing network infrastructure.
Understanding RHOSO Dynamic Routing¶
What is RHOSO?¶
Red Hat OpenStack Services on OpenShift (RHOSO) represents Red Hat’s next-generation OpenStack deployment model, running OpenStack services as containerized workloads on OpenShift. This architectural shift provides:
Container-native deployment: All OpenStack services run as pods on OpenShift
Kubernetes-native operations: Leverages OpenShift for scaling, health monitoring, and lifecycle management
Enhanced reliability: Benefits from OpenShift’s self-healing and high availability features
Simplified operations: Uses OpenShift’s declarative configuration and GitOps workflows
BGP in RHOSO vs. Traditional OpenStack¶
RHOSO 18.0’s BGP implementation differs significantly from traditional Red Hat OpenStack Platform:
Traditional RHOSP:
BGP services run on bare-metal or VM hypervisors
Direct systemd service management
Node-level configuration files
RHOSO 18.0:
BGP services run in OpenShift pods
Kubernetes-native configuration management
Container-based service deployment
Key Components¶
RHOSO’s dynamic routing relies on four primary components working together in a distributed architecture that requires dedicated networking nodes:
OVN BGP Agent¶
The OVN BGP agent is a Python-based daemon running in the ovn_bgp_agent container on Compute and Networker nodes. Its primary functions include:
Database monitoring: Monitors the OVN northbound database for VM and floating IP events
Route management: Triggers FRR to advertise or withdraw routes based on workload lifecycle
Traffic redirection: Configures Linux kernel networking for proper traffic flow
Interface management: Manages the bgp-nic dummy interface for route advertisement
The agent operates by detecting changes in the OVN database and translating these into BGP routing decisions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-bgp-agent-config
data:
  ovn_bgp_agent.conf: |
    [DEFAULT]
    debug = False
    reconcile_interval = 120
    expose_tenant_networks = False

    [bgp]
    bgp_speaker_driver = ovn_bgp_driver
FRR Container Suite¶
Free Range Routing runs as the frr container on all RHOSO nodes, providing enterprise-grade routing capabilities:
BGP Daemon (bgpd): Manages BGP peer relationships and route advertisements
BFD Daemon (bfdd): Provides sub-second failure detection
Zebra Daemon: Interfaces between FRR and the Linux kernel routing table
VTY Shell: Command-line interface for configuration and monitoring
frr version 8.5
frr defaults traditional
hostname rhoso-compute-0
log syslog informational
service integrated-vtysh-config
!
router bgp 64999
 bgp router-id 172.30.1.10
 neighbor 172.30.1.254 remote-as 65000
 neighbor 172.30.1.254 bfd
 !
 address-family ipv4 unicast
  redistribute connected
  maximum-paths 8
 exit-address-family
!
Kernel Networking Integration¶
RHOSO leverages RHEL kernel networking features configured by the OVN BGP agent:
VRF (Virtual Routing and Forwarding): Network isolation using bgp_vrf
IP Rules: Direct traffic to appropriate routing tables
Dummy Interface: The bgp-nic interface for route advertisement
OVS Integration: Flow rules redirecting traffic to the OVN overlay
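These kernel constructs can be inspected directly on a node. The following is a minimal sketch, assuming the default names used above (bgp-nic, bgp_vrf) and a br-ex external bridge; adjust the names to your deployment:
# Dummy interface that carries the addresses to be advertised
ip addr show bgp-nic

# VRF used for isolation and its routing table
ip link show type vrf
ip route show vrf bgp_vrf

# IP rules installed by the OVN BGP agent to steer incoming traffic
ip rule list

# OVS flows redirecting external traffic into the OVN overlay (bridge name may differ)
ovs-ofctl dump-flows br-ex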
Dedicated Networking Nodes¶
RHOSO BGP deployments require dedicated networking nodes with specific architectural constraints:
Mandatory Architecture: BGP dynamic routing cannot function without dedicated networker nodes
DVR Integration: Must be deployed with Distributed Virtual Routing (DVR) enabled
Traffic Gateway Role: Networker nodes host neutron router gateways and CR-LRP (Chassis Redirect Logical Router Ports)
North-South Traffic: All external traffic to tenant networks flows through networker nodes
BGP Advertisement: Both compute and networker nodes run FRR and OVN BGP agent containers
# OpenShift node labels for dedicated networking
apiVersion: v1
kind: Node
metadata:
  name: rhoso-networker-0
  labels:
    node-role.kubernetes.io/rhoso-networker: ""
    feature.node.kubernetes.io/network-sriov.capable: "true"
spec:
  # Networker nodes require specific networking capabilities
  # and dedicated hardware for external connectivity
Architecture Constraints:
Control Plane OVN Gateways: Not supported with BGP (incompatible)
Octavia Load Balancer: Cannot be used with BGP dynamic routing
IPv6 Deployments: Currently not supported with BGP
BFD Limitations: Bi-directional forwarding detection has known issues
Network Architecture¶
RHOSO BGP Network Topology¶
BGP Component Interactions¶
The following diagram shows detailed interactions between all RHOSO BGP components as they operate within the OpenShift container environment:
Key Interaction Flows:
Route Discovery: OVN BGP Agent monitors OVN northbound database for VM and floating IP events
Route Injection: Agent adds IP addresses to bgp-nic dummy interface
Kernel Integration: Zebra daemon detects new routes and updates kernel routing table
BGP Advertisement: BGP daemon advertises connected routes to external peers
Traffic Redirection: Agent configures IP rules and OVS flows for incoming traffic
BFD Monitoring: BFD daemon provides fast failure detection between BGP peers
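These flows can be spot-checked for a single address. The sketch below assumes the namespace and DaemonSet names used elsewhere in this post, and uses 203.0.113.50 purely as an example floating IP:
# 1. The agent should have added the address to the bgp-nic dummy interface
oc exec -n rhoso-system ds/ovn-bgp-agent -- ip addr show bgp-nic | grep 203.0.113.50

# 2. Zebra should see it as a connected route in the kernel
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show ip route connected' | grep 203.0.113.50

# 3. bgpd should advertise the /32 to its peers
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show ip bgp 203.0.113.50/32'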
Traffic Flow Process¶
When a VM is created or a floating IP is assigned, the following sequence occurs:
BGP Session Establishment Process¶
The following diagram illustrates the complete BGP peering process between RHOSO nodes and external infrastructure:
BGP Session States and Transitions:
Idle: Initial state, no BGP session attempt
Connect: TCP connection establishment in progress
OpenSent: BGP OPEN message sent, waiting for peer response
OpenConfirm: BGP OPEN received, sending KEEPALIVE
Established: Full BGP session active, route exchange possible
RHOSO BGP Configuration Parameters:
router bgp 64999
 # BGP timers (keepalive, hold-time)
 timers bgp 10 30
 # BGP session parameters
 neighbor 172.30.1.254 remote-as 65000
 neighbor 172.30.1.254 capability extended-nexthop
 # BFD for fast failure detection
 neighbor 172.30.1.254 bfd
 neighbor 172.30.1.254 bfd profile fast-detect
!
# BFD profile for sub-second detection
bfd
 profile fast-detect
  detect-multiplier 3
  receive-interval 100   # 100 ms
  transmit-interval 100  # 100 ms
Private Network Advertising¶
RHOSO BGP supports advertising private tenant networks, though this feature is disabled by default due to security implications.
Tenant Network Exposure Configuration¶
Default Behavior: By default, only floating IPs and provider network IPs are advertised via BGP. Private tenant networks remain isolated within the OVN overlay.
Enabling Tenant Network Advertisement:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-bgp-agent-config
data:
  ovn_bgp_agent.conf: |
    [DEFAULT]
    debug = False
    reconcile_interval = 120
    # Enable tenant network advertising
    expose_tenant_networks = True

    [bgp]
    bgp_speaker_driver = ovn_bgp_driver
Security Considerations:
Network Isolation: Enabling tenant network exposure breaks traditional OpenStack network isolation
Routing Policies: External routers must implement proper filtering to maintain security boundaries
Non-Overlapping CIDRs: Tenant networks must use unique, non-overlapping IP ranges
Access Control: External network infrastructure must enforce tenant access policies
Note
Cross-Datacenter Tenant Network Design Considerations
When exposing tenant networks via BGP across datacenter boundaries, consider these additional design factors:
WAN Security: Plan for appropriate security measures when tenant traffic traverses WAN links, including encryption and filtering strategies
Route Advertisement Control: Implement proper BGP route filtering and communities to control advertisement boundaries
Operational Complexity: Account for increased troubleshooting complexity spanning multiple sites and administrative domains
Compliance Planning: Evaluate data locality and compliance requirements that may affect multi-site tenant network designs
Performance Considerations: Factor in WAN latency and bandwidth characteristics for cross-datacenter tenant communication
Design Options: Consider various approaches including dedicated VPN connections, floating IP strategies, or hybrid architectures that balance connectivity needs with operational complexity.
Traffic Flow for Tenant Networks¶
When tenant network advertising is enabled, traffic follows a specific path through dedicated networking nodes:
Detailed Traffic Flow Analysis:
External Client Request: Client sends packet destined for tenant VM (10.1.0.10)
BGP Route Lookup: ToR switch consults BGP routing table, finds route advertised by Networker Node 1
Kernel Processing: Networker node kernel applies IP rules, directing tenant network traffic to br-ex bridge
CR-LRP Injection: Traffic enters OVN overlay via Chassis Redirect Logical Router Port hosted on networker node
OVN Routing: Logical router performs L3 routing and ARP resolution within overlay
Overlay Forwarding: Logical switch performs L2 forwarding to target VM based on MAC address
VM Delivery: Packet delivered to tenant VM running on compute node
Return Traffic Path:
Key Technical Details:
CR-LRP Role: Chassis Redirect Logical Router Ports serve as the entry point for external traffic to tenant networks
Networker Node Gateway: All north-south traffic to tenant networks must traverse the networker node hosting the neutron router gateway
Route Advertisement: The OVN BGP agent on networker nodes advertises neutron router gateway ports when tenant network exposure is enabled
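A quick way to confirm which chassis currently hosts a CR-LRP is to query the OVN southbound database. This is a sketch assuming the agent container has ovn-sbctl and southbound access, reusing the namespace and DaemonSet names from earlier examples:
# List chassis-redirect port bindings and the chassis they are bound to
oc exec -n rhoso-system ds/ovn-bgp-agent -- ovn-sbctl find Port_Binding type=chassisredirect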
Implementation Requirements¶
Network Planning:
# Tenant network configuration
router bgp 64999
 # Advertise tenant network ranges
 address-family ipv4 unicast
  network 10.0.0.0/24   # Tenant network 1
  network 10.1.0.0/24   # Tenant network 2
  network 10.2.0.0/24   # Tenant network 3
 exit-address-family
!
# Route filtering for security
ip prefix-list TENANT-NETWORKS permit 10.0.0.0/8 le 24
!
route-map TENANT-FILTER permit 10
 match ip address prefix-list TENANT-NETWORKS
Operational Considerations:
Network Overlap Detection: Implement monitoring to detect and prevent CIDR overlaps
Route Filtering: Configure external routers with appropriate filters to prevent route leaks
Multi-Tenancy: Consider impact on tenant isolation and implement additional security measures
Troubleshooting Complexity: Private network advertising increases troubleshooting complexity
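For the CIDR overlap check, a simple starting point is to list all subnet ranges from the OpenStack API and review them for collisions; a minimal sketch:
# List every subnet with its CIDR
openstack subnet list -c Name -c Subnet

# Sort CIDRs numerically so overlapping ranges sit next to each other
openstack subnet list -f value -c Subnet | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n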
Real-World Deployment Scenarios¶
Enterprise Multi-Zone Deployment¶
Note
Cross-Datacenter Deployment Considerations
The architecture shown below is technically feasible and has been successfully implemented by various organizations. However, cross-datacenter RHOSO deployments typically require specific support considerations and careful planning beyond the standard deployment model.
Key Design Considerations:
Control plane database synchronization across WAN links
Network latency considerations for OpenStack service communication
Resilience planning for network partitions between sites
Enhanced monitoring and troubleshooting procedures
Storage architecture design for multi-site scenarios
Alternative Architecture: Red Hat’s Distributed Compute Node (DCN) architecture offers a supported approach for multi-site deployments, where control plane services remain centralized and only compute nodes are deployed at remote sites.
Planning Recommendation: Consult with Red Hat support during the design phase to validate your specific cross-datacenter deployment architecture and requirements.
Use Case: Large enterprise with RHOSO deployed across multiple OpenShift clusters in different availability zones.
Multi-Zone Network Topology:
Technical Implementation:
# Zone 1 Configuration (AS 64999)
router bgp 64999
 bgp router-id 10.1.1.1
 # Local zone ToR peering (eBGP)
 neighbor 10.1.1.254 remote-as 65001
 neighbor 10.1.1.253 remote-as 65001
 # Inter-zone peering (iBGP confederation or eBGP)
 neighbor 10.2.1.1 remote-as 64998
 neighbor 10.2.1.1 ebgp-multihop 3
 !
 address-family ipv4 unicast
  # Advertise zone-specific networks
  network 203.0.113.10/32   # Control plane VIP
  network 203.0.113.96/28   # Zone 1 floating IPs
  network 10.1.0.0/16       # Zone 1 tenant networks (if enabled)
  # ECMP for load balancing
  maximum-paths 4
  maximum-paths ibgp 4
  # Route filtering between zones
  neighbor 10.2.1.1 route-map ZONE2-IN in
  neighbor 10.2.1.1 route-map ZONE1-OUT out
 exit-address-family

# Zone 2 Configuration (AS 64998)
router bgp 64998
 bgp router-id 10.2.1.1
 # Local zone ToR peering (eBGP)
 neighbor 10.2.1.254 remote-as 65002
 neighbor 10.2.1.253 remote-as 65002
 # Inter-zone peering
 neighbor 10.1.1.1 remote-as 64999
 neighbor 10.1.1.1 ebgp-multihop 3
 !
 address-family ipv4 unicast
  # Advertise zone-specific networks
  network 203.0.113.11/32   # Control plane VIP
  network 203.0.113.192/28  # Zone 2 floating IPs
  network 10.2.0.0/16       # Zone 2 tenant networks (if enabled)
  maximum-paths 4
  maximum-paths ibgp 4
 exit-address-family
Benefits:
Geographic Distribution: Workloads distributed across multiple data centers
Fault Isolation: Zone failures don’t impact other zones
Load Distribution: Traffic distributed based on BGP routing policies
Disaster Recovery: Automatic failover between zones via BGP route withdrawal
Scalability: Independent scaling of compute and network resources per zone
Control Plane High Availability¶
Use Case: RHOSO control plane services distributed across OpenShift nodes with BGP-advertised VIPs.
Implementation Details:
Pacemaker manages VIP assignment
OVN BGP agent advertises active VIP location
Sub-second failover with BFD
No single point of failure
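To verify the VIP is advertised from the node that currently holds it, a sketch along these lines can help (203.0.113.10 is the example control plane VIP from the multi-zone configuration; the pod name is illustrative):
# Find the agent pod running on the node that Pacemaker reports as active
oc get pods -n rhoso-system -l app=ovn-bgp-agent -o wide

# The VIP should be present on bgp-nic on that node (pod name is an example)
oc exec -n rhoso-system ovn-bgp-agent-x7k2p -- ip addr show bgp-nic | grep 203.0.113.10

# The /32 should be advertised to the external peers
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show ip bgp 203.0.113.10/32'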
Dedicated Networker Node Deployment¶
Use Case: Enterprise RHOSO deployment with dedicated networking infrastructure for BGP routing and tenant network isolation.
Architecture Requirements:
Technical Implementation:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rhoso-networker-services
spec:
  selector:
    matchLabels:
      app: rhoso-networker
  template:
    metadata:
      labels:
        app: rhoso-networker
    spec:
      nodeSelector:
        node-role.kubernetes.io/rhoso-networker: ""
      hostNetwork: true
      containers:
      - name: frr-bgp
        image: quay.io/rhoso/frr:latest
        securityContext:
          privileged: true
      - name: ovn-bgp-agent
        image: quay.io/rhoso/ovn-bgp-agent:latest
        env:
        - name: EXPOSE_TENANT_NETWORKS
          value: "true"  # Enable tenant network advertising
Benefits:
Dedicated Traffic Path: All north-south traffic controlled through networker nodes
High Availability: Multiple networker nodes provide redundancy for tenant network access
Security Isolation: Clear separation between compute and networking functions
Scalability: Independent scaling of compute and network infrastructure
Deployment Considerations:
Hardware Requirements: Networker nodes need enhanced networking capabilities
Network Connectivity: Direct physical connections to external infrastructure
DVR Requirement: Must be deployed with Distributed Virtual Routing enabled
Monitoring: Enhanced monitoring required for CR-LRP and gateway functions
Hybrid Cloud Connectivity¶
Use Case: Connecting RHOSO workloads to external cloud providers and on-premises networks.
Technical Implementation:
# RHOSO to AWS Transit Gateway
router bgp 64999
 neighbor 169.254.100.1 remote-as 64512   # AWS side
 !
 address-family ipv4 unicast
  network 10.0.0.0/16   # RHOSO tenant networks
  neighbor 169.254.100.1 prefix-list RHOSO-OUT out
 exit-address-family

# RHOSO to On-premises
router bgp 64999
 neighbor 172.16.1.1 remote-as 65000   # Corporate network
 !
 address-family ipv4 unicast
  neighbor 172.16.1.1 route-map CORPORATE-IN in
 exit-address-family
Configuration and Deployment¶
Prerequisites¶
RHOSO dynamic routing requires:
RHOSO 18.0 or later with ML2/OVN mechanism driver
OpenShift 4.14+ with appropriate node networking
BGP-capable network infrastructure (ToR switches, routers)
Dedicated networker nodes (mandatory for BGP deployments)
Distributed Virtual Routing (DVR) enabled
Proper network planning for ASN assignment and IP addressing
Critical Architecture Requirements:
No Control Plane OVN Gateways: BGP is incompatible with control plane OVN gateway deployments
No Octavia Load Balancer: Cannot be used simultaneously with BGP dynamic routing
No Distributed Control Plane: RHOSO dynamic routing does not support distributed control planes across datacenters
IPv4 Only: IPv6 deployments are not currently supported with BGP
Non-overlapping CIDRs: When using tenant network advertising, all networks must use unique IP ranges
External BGP Peers: Network infrastructure must support BGP peering and route filtering
Note
Cross-Datacenter Deployment Design Considerations
RHOSO BGP deployments across datacenters require additional planning and design considerations:
Control Plane Design: Consider the implications of control plane service communication patterns across WAN links
Network Latency Planning: Evaluate network latency requirements for optimal OpenStack service performance
Database Architecture: Plan for database replication, backup, and disaster recovery strategies
Storage Design: Consider storage architecture options that balance performance, availability, and data locality
Alternative Options: Red Hat’s Distributed Compute Node (DCN) architecture provides a proven approach for multi-site deployments with centralized control planes and distributed compute resources.
OpenShift Integration¶
RHOSO BGP services integrate with OpenShift through:
Service Mesh: BGP containers run within the OpenShift service mesh
ConfigMaps: Configuration stored as Kubernetes ConfigMaps
Monitoring: Integration with OpenShift monitoring and alerting
Networking: Uses OpenShift SDN or OVN-Kubernetes for pod networking
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: frr-bgp
spec:
  selector:
    matchLabels:
      app: frr-bgp
  template:
    metadata:
      labels:
        app: frr-bgp
    spec:
      hostNetwork: true
      containers:
      - name: frr
        image: quay.io/rhoso/frr:latest
        securityContext:
          privileged: true
        volumeMounts:
        - name: frr-config
          mountPath: /etc/frr
      volumes:
      - name: frr-config
        configMap:
          name: frr-configuration
Production Operations¶
Monitoring and Observability¶
RHOSO BGP monitoring leverages OpenShift’s native observability:
Prometheus Metrics: FRR and OVN BGP agent export metrics
Grafana Dashboards: Pre-built dashboards for BGP performance
Alerting: Automated alerts for BGP session failures
Logging: Centralized logging through OpenShift logging stack
# Check BGP session status
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show bgp summary'
# View route advertisements
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show ip bgp neighbors <peer-ip> advertised-routes'
# Check OVN BGP agent status
oc logs -n rhoso-system -l app=ovn-bgp-agent --tail=50
Scaling Operations¶
Adding new compute capacity with BGP requires:
OpenShift node addition: Standard OpenShift node scaling procedures
Automatic BGP configuration: DaemonSets ensure BGP services on new nodes
Network validation: Verify BGP peering and route advertisement
Workload validation: Test VM connectivity through new nodes
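A minimal validation sketch for a newly added node (node and pod names are examples; labels and namespace follow the DaemonSet manifests above):
# 1. Confirm the BGP DaemonSet pods were scheduled on the new node
oc get pods -n rhoso-system -l app=frr-bgp -o wide | grep rhoso-compute-3
oc get pods -n rhoso-system -l app=ovn-bgp-agent -o wide | grep rhoso-compute-3

# 2. Verify the new node's FRR instance has established its BGP sessions
oc exec -n rhoso-system frr-bgp-9sd4q -- vtysh -c 'show bgp summary'

# 3. Spot-check that a workload scheduled on the node is advertised
oc exec -n rhoso-system frr-bgp-9sd4q -- vtysh -c 'show ip bgp' | grep <vm-or-fip-address>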
Network Failure and Recovery¶
RHOSO BGP deployments implement automated failure detection and recovery mechanisms:
Failure Scenarios and Recovery Times:
| Failure Type | Detection Time | Recovery Time | Impact Level |
| --- | --- | --- | --- |
| BGP Session Failure (BFD enabled) | 100-300 ms | 1-3 seconds | Low (alternate paths) |
| BGP Session Failure (no BFD) | 30-90 seconds | 30-120 seconds | Medium (delayed reroute) |
| Single Networker Node | 5-15 seconds | 10-30 seconds | Medium (CR-LRP migration) |
| Multiple Networker Nodes | 5-15 seconds | Manual intervention | High (tenant isolation) |
| Compute Node | 30-60 seconds | 60-180 seconds | Low (workload migration) |
Automated Recovery Mechanisms:
BGP Route Withdrawal: Automatic route withdrawal upon session failure
BFD Fast Detection: Sub-second failure detection with proper BFD configuration
CR-LRP Migration: Automatic migration of Chassis Redirect Logical Router Ports
Traffic Rerouting: ECMP and alternate path utilization
Session Re-establishment: Automatic BGP session recovery
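During a controlled failover test, the withdrawal and re-advertisement can be observed from a surviving node; a simple sketch (the watched floating IP is an example):
# Watch peer state and prefix counts while the failure is induced
watch -n1 "oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show bgp summary'"

# Confirm the route for a given floating IP disappears and then returns
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show ip bgp 203.0.113.50/32'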
Troubleshooting¶
Common Issues and Solutions¶
BGP Sessions Not Establishing
Symptoms: BGP peers show “Idle” or “Connect” state
# Check BGP peer status
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show bgp neighbors'
# Verify network connectivity
oc exec -n rhoso-system ds/frr-bgp -- ping <peer-ip>
# Check firewall rules
oc exec -n rhoso-system ds/frr-bgp -- ss -tulpn | grep 179
Solution: Verify network connectivity, ASN configuration, and firewall rules allowing TCP port 179.
Routes Not Being Advertised
Symptoms: External networks cannot reach RHOSO workloads
# Check if IPs are on bgp-nic interface
oc exec -n rhoso-system ds/ovn-bgp-agent -- ip addr show bgp-nic
# Verify FRR is redistributing connected routes
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show running-config'
# Check OVN BGP agent logs
oc logs -n rhoso-system -l app=ovn-bgp-agent
Solution: Ensure OVN BGP agent is running and FRR has “redistribute connected” configured.
Tenant Networks Not Reachable
Symptoms: External clients cannot reach VMs on tenant networks despite expose_tenant_networks = True
# Check if tenant network exposure is enabled
oc exec -n rhoso-system ds/ovn-bgp-agent -- grep expose_tenant_networks /etc/ovn_bgp_agent/ovn_bgp_agent.conf
# Verify CR-LRP (Chassis Redirect Logical Router Ports) are active on networker nodes
oc exec -n rhoso-system ds/ovn-bgp-agent -- ovn-sbctl show | grep cr-lrp
# Check neutron router gateway port advertisement
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show ip bgp' | grep 10.0.0
Solution: Verify networker nodes are hosting CR-LRP and neutron router gateways are properly advertised.
Networker Node Failures
Symptoms: Complete loss of external connectivity to tenant networks
# Check networker node health
oc get nodes -l node-role.kubernetes.io/rhoso-networker
# Verify networker pods are running
oc get pods -n rhoso-system -l app=rhoso-networker
# Check CR-LRP failover status
oc exec -n rhoso-system ds/ovn-bgp-agent -- ovn-sbctl find Port_Binding type=chassisredirect
# Verify BGP session status on remaining networker nodes
oc exec -n rhoso-system ds/frr-bgp -- vtysh -c 'show bgp summary'
Solution: Ensure multiple networker nodes are deployed for high availability and CR-LRP can migrate between nodes.
Slow Convergence
Symptoms: Long failover times during node or network failures
router bgp 64999
 neighbor 172.30.1.254 bfd
 neighbor 172.30.1.254 bfd profile fast-detect
!
bfd
 profile fast-detect
  detect-multiplier 3
  receive-interval 100
  transmit-interval 100
Performance Tuning¶
ECMP Configuration
router bgp 64999
 maximum-paths 8
 maximum-paths ibgp 8
 bgp bestpath as-path multipath-relax
BGP Timers Optimization
router bgp 64999
 neighbor 172.30.1.254 timers 10 30
 neighbor 172.30.1.254 capability extended-nexthop
Conclusion¶
RHOSO 18.0’s BGP implementation provides enterprise-grade dynamic routing for containerized OpenStack deployments. By leveraging OpenShift’s container orchestration with proven networking technologies like FRR and OVN, organizations can achieve:
Scalable networking: Automatic route management as workloads scale
High availability: Sub-second failover with BFD and ECMP
Operational simplicity: Kubernetes-native management and monitoring
Enterprise integration: Seamless connectivity with existing network infrastructure
The combination of RHOSO’s containerized architecture with BGP’s proven routing capabilities enables organizations to deploy production-ready OpenStack clouds that integrate seamlessly with modern data center networking practices.
For detailed deployment procedures and additional configuration options, refer to the official Red Hat OpenStack Services on OpenShift 18.0 documentation.