API Gateway Traffic Distribution / Load Balancing Configuration Method

This guide explains how to optimize traffic distribution and load balancing strategies for Kong Gateway across clusters and nodes, ensuring both high availability and maximum throughput. It references the resource sizing, cluster configuration, in-memory performance tuning, and scaling dimensions covered in prior guidelines.

Concept Overview

What is Traffic Distribution / Load Balancing in Kong Gateway?

Traffic distribution refers to how inbound API requests are routed across a Kong Gateway cluster and its nodes to ensure balanced CPU, memory, and network utilization. Load balancing ensures no single node or worker becomes a bottleneck, enabling horizontal scalability and fault tolerance. The method includes configuring:

  • Gateway scaling strategy (vertical/horizontal)
  • Node-level resource sizing
  • In-memory caching and plugin queue optimization
  • Latency/throughput-sensitive setup

Configuration Methods

Method 1: Sizing Based on Cluster Load

The first step in achieving optimal traffic distribution is classifying your Gateway by workload type and expected traffic volume. Use the table below to define initial system boundaries:

| Cluster Size | CPU | RAM | Cloud Instance Examples | Use Case |
|---|---|---|---|---|
| Development | 1-2 cores | 2-4 GB | AWS: t3.medium / GCP: n1-standard-1 / Azure: A1 | Dev/test, low-latency sensitivity |
| Small | 1-2 cores | 2-4 GB | AWS: t3.medium / GCP: n1-standard-1 / Azure: A1 | Light production, greenfield traffic |
| Medium | 2-4 cores | 4-8 GB | AWS: m5.large / GCP: n1-standard-4 / Azure: A1v4 | Critical workloads with latency needs |
| Large | 8-16 cores | 16-32 GB | AWS: c5.xlarge / GCP: n1-highcpu-16 / Azure: F8s | Enterprise-grade, large-scale clusters |
Note: Avoid CPU-throttled (burstable) instance types such as AWS t2/t3 in production; CPU throttling severely degrades Kong performance under load.
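
Once a tier is chosen, it can be reflected in Kong's startup configuration. The sketch below maps the "Medium" profile (4 cores) onto Kong's standard KONG_-prefixed environment variables; the value shown is an illustrative starting point, not a prescription.

```bash
# Illustrative startup setting for a "Medium" node (4 cores, 8 GB RAM).
# Any kong.conf property can be set via a KONG_<PROPERTY> env variable.

# One NGINX worker per core; the default "auto" detects this as well.
export KONG_NGINX_WORKER_PROCESSES=4

kong start
```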

Method 2: Memory Allocation & In-Memory Optimization

Memory Cache Configuration:

  • Set mem_cache_size as large as possible, while reserving memory for the OS and adjacent processes.
  • Recommended baseline: ~500MB per core
  • On a 4-core, 8GB instance: allocate 4–6 GB for Kong cache
  • Cached data includes configuration entities such as routes, services, and plugins (see the sketch below)
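
As a concrete (assumed) example for the 4-core/8 GB profile above, the snippet below allocates 4 GB to the cache, leaving the remainder for the OS, NGINX workers, and adjacent processes; derive your own figure from the node's memory profile.

```bash
# mem_cache_size sizes the shared-memory cache Kong uses for entities
# such as routes, services, plugins, and consumers.
# 4 GB is an illustrative value for an 8 GB node.
export KONG_MEM_CACHE_SIZE=4096m

kong restart
```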

Plugin Queue Buffering:

  • Plugins such as http-log use queue.max_entries for asynchronous event buffering
  • Default value: 10,000
  • For high-throughput workloads, tune this value: a queue that is too small drops events, while one that is too large risks exhausting memory
  • Each plugin queue is memory-bound and should be sized to the node's load profile (see the example below)
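
As an illustration, queue.max_entries can be set when the plugin is created via the Admin API (Kong 3.x queue parameters). The endpoint URL and the 50,000 figure are placeholders; size the queue from your observed event rate and entry size.

```bash
# Enable http-log with a larger event queue (placeholder values).
curl -sX POST http://localhost:8001/plugins \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "http-log",
        "config": {
          "http_endpoint": "http://logserver.example:3000/logs",
          "queue": { "max_entries": 50000 }
        }
      }'
```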

Method 3: Scaling by Latency/Throughput Dimension

Kong’s performance depends on:

  • Latency: Time per request (memory-bound)
  • Throughput: Requests per second (CPU-bound)

Optimization Strategy:

| Scenario | Optimization Focus |
|---|---|
| Latency-sensitive | Increase memory for database + plugin cache |
| Throughput-sensitive | Add CPU cores; scale vertically/horizontally |
| Hybrid scaling (HA setups) | Use dedicated config processing |

Enable dedicated_config_processing on data plane nodes in hybrid mode to offload CPU-heavy configuration processing (receiving and applying config pushed from the control plane) to a dedicated worker process, keeping proxy workers free to serve traffic.
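
On a hybrid-mode data plane this is a single kong.conf property, shown here in its environment-variable form; the control-plane address is a placeholder, and cluster mTLS certificate settings are omitted for brevity.

```bash
# Hybrid-mode data plane: process config updates from the control plane
# in a dedicated worker so proxy workers keep serving traffic.
export KONG_ROLE=data_plane
export KONG_DATABASE=off
export KONG_CLUSTER_CONTROL_PLANE=cp.example.internal:8005  # placeholder
export KONG_DEDICATED_CONFIG_PROCESSING=on

kong start
```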

Additional Considerations

Database Load Impact

  • Kong reads from DB only when nodes start or configuration changes
  • Database resource needs depend on traffic volume, the rate of entity changes, the number of nodes, and the features enabled
  • Use in-memory caching to reduce DB pressure
  • Keep database access minimal and rely on cached configuration as a fallback, so Kong Gateway remains operational if the database is temporarily unavailable (the sketch below shows the relevant settings)
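
Two kong.conf properties govern this fallback behavior; the values below mirror Kong's defaults and are shown only to make the knobs visible.

```bash
# Keep cached entities until invalidated rather than expiring them on a
# timer (0 disables TTL-based expiry; this is the default).
export KONG_DB_CACHE_TTL=0

# If the database is unreachable, keep serving stale cached entities and
# retry refreshing them every 30 seconds (the default).
export KONG_DB_RESURRECT_TTL=30
```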

Performance Factors to Monitor

Track and tune the following; a minimal monitoring sketch follows the list:

  • Number of Routes, Services, Consumers, and Plugins
  • Plugin Cardinality (diverse plugin types increase CPU load)
  • Request/response size (large payloads increase latency)
  • Workspace count and metadata (more memory per workspace)
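
One way to watch these factors is Kong's bundled Prometheus plugin, enabled globally below as a sketch; the metrics endpoint to scrape (Status API vs. Admin API) depends on your Kong version and listener configuration.

```bash
# Enable the bundled Prometheus plugin globally; every node then exports
# request counters, latency histograms, and shared-memory (cache) usage.
curl -sX POST http://localhost:8001/plugins \
  -H 'Content-Type: application/json' \
  -d '{"name": "prometheus"}'
```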

Summary

To ensure balanced traffic distribution and optimal load handling in Kong Gateway:

  1. Start with proper cluster size definitions
  2. Allocate memory based on cache and plugin queue needs
  3. Tune based on latency vs throughput priorities
  4. Minimize database dependency with caching
  5. Enable scaling via CPU/memory configuration and hybrid processing options

These configurations form the foundation for building a scalable, production-grade API Gateway using Kong.