API Gateway Traffic Distribution / Load Balancing Configuration Methods
This guide explains how to optimize traffic distribution and load balancing strategies for Kong Gateway across clusters and nodes, ensuring both high availability and maximum throughput. It references the resource sizing, cluster configuration, in-memory performance tuning, and scaling dimensions covered in prior guidelines.
Concept Overview
What is Traffic Distribution / Load Balancing in Kong Gateway?
Traffic distribution refers to how inbound API requests are routed across a Kong Gateway cluster and its nodes to ensure balanced CPU, memory, and network utilization. Load balancing ensures no single node or worker becomes a bottleneck, enabling horizontal scalability and fault tolerance. The method includes configuring:
- Gateway scaling strategy (vertical/horizontal)
- Node-level resource sizing
- In-memory caching and plugin queue optimization
- Latency/throughput-sensitive setup
Configuration Methods
Method 1: Sizing Based on Cluster Load
The first step in achieving optimal traffic distribution is classifying your Gateway by workload type and expected traffic volume. Use the table below to define initial system boundaries:
| Cluster Size | CPU | RAM | Cloud Instance Examples | Use Case |
|---|---|---|---|---|
| Development | 1-2 cores | 2-4 GB | AWS: t3.medium / GCP: n1-standard-1 / Azure: A1 | Dev/test, low latency sensitivity |
| Small | 1-2 cores | 2-4 GB | AWS: t3.medium / GCP: n1-standard-1 / Azure: A1 | Light production, greenfield traffic |
| Medium | 2-4 cores | 4-8 GB | AWS: m5.large / GCP: n1-standard-4 / Azure: A1v4 | Critical workloads with latency needs |
| Large | 8-16 cores | 16-32 GB | AWS: c5.xlarge / GCP: n1-highcpu-16 / Azure: F8s | Enterprise-grade, large-scale clusters |
Avoid CPU-throttled (burstable) instance types such as the AWS t2 and t3 families in production: CPU throttling severely degrades Kong Gateway performance under sustained load.
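To make these boundaries explicit at deploy time, you can pin a node's resources when launching it. The sketch below runs a containerized node at the Medium profile from the table; the image tag, ports, and DB-less config path are illustrative assumptions, not part of the sizing guidance.

```bash
# A minimal sketch: pin a Kong Gateway container to the "Medium" profile
# from the table (4 cores, 8 GB). Image tag, ports, and the declarative
# config path are assumptions.
docker run -d --name kong-medium \
  --cpus=4 --memory=8g \
  -e KONG_DATABASE=off \
  -e KONG_DECLARATIVE_CONFIG=/kong/kong.yml \
  -v "$PWD/kong.yml:/kong/kong.yml" \
  -p 8000:8000 -p 8001:8001 \
  kong:3.6
```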
Method 2: Memory Allocation & In-Memory Optimization
Memory Cache Configuration:
- Set mem_cache_size as large as possible while reserving memory for the OS and adjacent processes
- Recommended baseline: ~500 MB per core
- Example: on a 4-core, 8 GB instance, allocate 4-6 GB to the Kong cache
- Cached data includes configuration entities such as Routes, Services, and Plugins (see the sketch after this list)
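A minimal sketch of this allocation, assuming Kong is configured through environment variables (any kong.conf key can be set via a KONG_-prefixed variable); the 4 GB value matches the 4-core, 8 GB example above:

```bash
# Reserve 4 GB for Kong's in-memory entity cache on a 4-core, 8 GB node,
# leaving the remainder for the OS, worker processes, and plugin queues.
export KONG_MEM_CACHE_SIZE=4096m
kong restart
```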
Plugin Queue Buffering:
- Plugins such as http-log use queue.max_entries for asynchronous event buffering
- Default value: 10,000
- For high throughput, tune this value carefully: too small drops events under burst load, too large risks memory exhaustion (see the sketch after this list)
- Each plugin queue is memory-bound and should be sized to your load profile
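For example, the queue can be sized when the plugin is enabled through the Admin API. This assumes a Kong version with the unified queue configuration (queue.max_entries); the Admin API address, log collector URL, and the 50,000-entry limit are illustrative assumptions:

```bash
# Enable http-log globally with a larger async event queue (sketch).
curl -sS -X POST http://localhost:8001/plugins \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "http-log",
        "config": {
          "http_endpoint": "http://logs.example.com/ingest",
          "queue": { "max_entries": 50000 }
        }
      }'
```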
Method 3: Scaling by Latency/Throughput Dimension
Kong’s performance depends on:
- Latency: Time per request (memory-bound)
- Throughput: Requests per second (CPU-bound)
Optimization Strategy:
| Scenario | Optimization Focus |
|---|---|
| Latency-sensitive | Increase memory for the entity (database) cache and plugin caches |
| Throughput-sensitive | Add CPU cores; scale vertically and/or horizontally |
| Hybrid scaling (HA setups) | Use dedicated config processing |
In hybrid mode, enable dedicated_config_processing on data plane nodes to offload CPU-heavy configuration handling, such as receiving and applying config updates from the control plane, to a dedicated worker process (see the sketch below).
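A minimal data-plane sketch, again using environment variables; the control plane address and certificate paths are placeholders to replace with your own:

```bash
# Hybrid-mode data plane with a dedicated config-processing worker (sketch).
export KONG_ROLE=data_plane
export KONG_DATABASE=off
export KONG_CLUSTER_CONTROL_PLANE=cp.example.internal:8005
export KONG_CLUSTER_CERT=/etc/kong/cluster.crt
export KONG_CLUSTER_CERT_KEY=/etc/kong/cluster.key
export KONG_DEDICATED_CONFIG_PROCESSING=on
kong start
```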
Additional Considerations
Database Load Impact
- Kong Gateway reads from the database only when a node starts or when configuration changes
- Database resource needs depend on traffic volume, the rate of entity changes, the number of nodes, and the features enabled
- Use in-memory caching to reduce database pressure
- Keep database access minimal and rely on cached data as a fallback, so Kong Gateway stays operational if the database is temporarily unavailable (a tuning sketch follows this list)
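The settings below sketch one way to tune this behavior in traditional (database-backed) mode; the keys are real kong.conf options, but the values shown are assumptions to adapt to your rate of entity changes and tolerance for stale data:

```bash
# Reduce DB pressure and ride out short outages from cache (sketch).
export KONG_DB_UPDATE_FREQUENCY=30   # poll for entity changes every 30s (default 5s)
export KONG_DB_CACHE_TTL=0           # 0 = cache entities until explicitly invalidated
export KONG_DB_RESURRECT_TTL=60      # serve stale cached entities for up to 60s when the DB is unreachable
kong restart
```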
Performance Factors to Monitor
Track and tune the following (a monitoring sketch follows the list):
- Number of Routes, Services, Consumers, and Plugins
- Plugin cardinality (a wider variety of plugin types increases CPU load)
- Request/response size (large payloads increase latency)
- Workspace count and metadata (each workspace consumes additional memory)
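One common way to watch these factors is Kong's bundled Prometheus plugin. The sketch below enables it globally via the Admin API (localhost:8001 is an assumption); metrics such as request counts and latencies can then be scraped, for example from the Status API /metrics endpoint when status_listen is configured (port 8100 here is an assumption):

```bash
# Enable the bundled Prometheus plugin globally (sketch).
curl -sS -X POST http://localhost:8001/plugins \
  -H 'Content-Type: application/json' \
  -d '{"name": "prometheus"}'

# Scrape metrics from the Status API, assuming status_listen=0.0.0.0:8100.
curl -sS http://localhost:8100/metrics
```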
Summary
To ensure balanced traffic distribution and optimal load handling in Kong Gateway:
- Start with proper cluster size definitions
- Allocate memory based on cache and plugin queue needs
- Tune based on latency vs throughput priorities
- Minimize database dependency with caching
- Enable scaling via CPU/memory configuration and hybrid processing options
These configurations form the foundation for building a scalable, production-grade API Gateway using Kong.