API Gateway Traffic Distribution / Load Balancing Configuration Methods
This guide explains how to optimize traffic distribution and load balancing strategies for Kong Gateway across clusters and nodes, ensuring both high availability and maximum throughput. It references the resource sizing, cluster configuration, in-memory performance tuning, and scaling dimensions covered in prior guidelines.
Concept Overview
What is Traffic Distribution / Load Balancing in Kong Gateway?
Traffic distribution refers to how inbound API requests are routed across a Kong Gateway cluster and its nodes to ensure balanced CPU, memory, and network utilization. Load balancing ensures no single node or worker becomes a bottleneck, enabling horizontal scalability and fault tolerance. The method includes configuring:
- Gateway scaling strategy (vertical/horizontal)
- Node-level resource sizing
- In-memory caching and plugin queue optimization
- Latency/throughput-sensitive setup
Configuration Methods
Method 1: Sizing Based on Cluster Load
The first step in achieving optimal traffic distribution is classifying your Gateway by workload type and expected traffic volume. Use the table below to define initial system boundaries:
| Cluster Size | CPU | RAM | Cloud Instance Examples | Use Case |
|---|---|---|---|---|
| Development | 1-2 cores | 2-4 GB | AWS: t3.medium / GCP: n1-standard-1 / Azure: A1 | Dev/test, low latency sensitivity |
| Small | 1-2 cores | 2-4 GB | AWS: t3.medium / GCP: n1-standard-1 / Azure: A1 | Light production, greenfield traffic |
| Medium | 2-4 cores | 4-8 GB | AWS: m5.large / GCP: n1-standard-4 / Azure: A1v4 | Critical workloads with latency needs |
| Large | 8-16 cores | 16-32 GB | AWS: c5.xlarge / GCP: n1-highcpu-16 / Azure: F8s | Enterprise-grade, large-scale clusters |
Avoid CPU-throttled (burstable) instance types such as the AWS t2 and t3 families in production: CPU throttling severely degrades Kong Gateway performance under sustained load.
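To make these boundaries explicit at deploy time, you can pin a node's resources when launching it. The sketch below runs a containerized node at the Medium profile from the table; the image tag, ports, and DB-less config path are illustrative assumptions, not part of the sizing guidance.

```bash
# A minimal sketch: pin a Kong Gateway container to the "Medium" profile
# from the table (4 cores, 8 GB). Image tag, ports, and the declarative
# config path are assumptions.
docker run -d --name kong-medium \
  --cpus=4 --memory=8g \
  -e KONG_DATABASE=off \
  -e KONG_DECLARATIVE_CONFIG=/kong/kong.yml \
  -v "$PWD/kong.yml:/kong/kong.yml" \
  -p 8000:8000 -p 8001:8001 \
  kong:3.6
```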
Method 2: Memory Allocation & In-Memory Optimization
Memory Cache Configuration:
- Set mem_cache_size as large as possible while reserving memory for the OS and adjacent processes
- Recommended baseline: ~500 MB per core
- Example: on a 4-core, 8 GB instance, allocate 4-6 GB to the Kong cache
- Cached data includes configuration entities such as Routes, Services, and Plugins (see the sketch after this list)
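A minimal sketch of this allocation, assuming Kong is configured through environment variables (any kong.conf key can be set via a KONG_-prefixed variable); the 4 GB value matches the 4-core, 8 GB example above:

```bash
# Reserve 4 GB for Kong's in-memory entity cache on a 4-core, 8 GB node,
# leaving the remainder for the OS, worker processes, and plugin queues.
export KONG_MEM_CACHE_SIZE=4096m
kong restart
```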
Plugin Queue Buffering:
- Plugins such as http-log use queue.max_entries for asynchronous event buffering
- Default value: 10,000
- For high throughput, tune this value carefully: too small drops events under burst load, too large risks memory exhaustion (see the sketch after this list)
- Each plugin queue is memory-bound and should be sized to your load profile
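For example, the queue can be sized when the plugin is enabled through the Admin API. This assumes a Kong version with the unified queue configuration (queue.max_entries); the Admin API address, log collector URL, and the 50,000-entry limit are illustrative assumptions:

```bash
# Enable http-log globally with a larger async event queue (sketch).
curl -sS -X POST http://localhost:8001/plugins \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "http-log",
        "config": {
          "http_endpoint": "http://logs.example.com/ingest",
          "queue": { "max_entries": 50000 }
        }
      }'
```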
Method 3: Scaling by Latency/Throughput Dimension
Kong’s performance depends on:
- Latency: Time per request (memory-bound)
- Throughput: Requests per second (CPU-bound)
Optimization Strategy:
| Scenario | Optimization Focus |
|---|---|
| Latency-sensitive | Increase memory for the entity (database) cache and plugin caches |
| Throughput-sensitive | Add CPU cores; scale vertically and/or horizontally |
| Hybrid scaling (HA setups) | Use dedicated config processing |
In hybrid mode, enable dedicated_config_processing on data plane nodes to offload CPU-heavy configuration handling, such as receiving and applying config updates from the control plane, to a dedicated worker process (see the sketch below).
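A minimal data-plane sketch, again using environment variables; the control plane address and certificate paths are placeholders to replace with your own:

```bash
# Hybrid-mode data plane with a dedicated config-processing worker (sketch).
export KONG_ROLE=data_plane
export KONG_DATABASE=off
export KONG_CLUSTER_CONTROL_PLANE=cp.example.internal:8005
export KONG_CLUSTER_CERT=/etc/kong/cluster.crt
export KONG_CLUSTER_CERT_KEY=/etc/kong/cluster.key
export KONG_DEDICATED_CONFIG_PROCESSING=on
kong start
```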
Additional Considerations
Database Load Impact
- Kong Gateway reads from the database only when a node starts or when configuration changes
- Database resource needs depend on traffic volume, the rate of entity changes, the number of nodes, and the features enabled
- Use in-memory caching to reduce database pressure
- Keep database access minimal and rely on cached data as a fallback, so Kong Gateway stays operational if the database is temporarily unavailable (a tuning sketch follows this list)
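The settings below sketch one way to tune this behavior in traditional (database-backed) mode; the keys are real kong.conf options, but the values shown are assumptions to adapt to your rate of entity changes and tolerance for stale data:

```bash
# Reduce DB pressure and ride out short outages from cache (sketch).
export KONG_DB_UPDATE_FREQUENCY=30   # poll for entity changes every 30s (default 5s)
export KONG_DB_CACHE_TTL=0           # 0 = cache entities until explicitly invalidated
export KONG_DB_RESURRECT_TTL=60      # serve stale cached entities for up to 60s when the DB is unreachable
kong restart
```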
Performance Factors to Monitor
Track and tune the following (a monitoring sketch follows the list):
- Number of Routes, Services, Consumers, and Plugins
- Plugin cardinality (a wider variety of plugin types increases CPU load)
- Request/response size (large payloads increase latency)
- Workspace count and metadata (each workspace consumes additional memory)
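One common way to watch these factors is Kong's bundled Prometheus plugin. The sketch below enables it globally via the Admin API (localhost:8001 is an assumption); metrics such as request counts and latencies can then be scraped, for example from the Status API /metrics endpoint when status_listen is configured (port 8100 here is an assumption):

```bash
# Enable the bundled Prometheus plugin globally (sketch).
curl -sS -X POST http://localhost:8001/plugins \
  -H 'Content-Type: application/json' \
  -d '{"name": "prometheus"}'

# Scrape metrics from the Status API, assuming status_listen=0.0.0.0:8100.
curl -sS http://localhost:8100/metrics
```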
Summary
To ensure balanced traffic distribution and optimal load handling in Kong Gateway:
- Start with proper cluster size definitions
- Allocate memory based on cache and plugin queue needs
- Tune based on latency vs throughput priorities
- Minimize database dependency with caching
- Enable scaling via CPU/memory configuration and hybrid processing options
These configurations form the foundation for building a scalable, production-grade API Gateway using Kong.