Rate Limiting + Backpressure Control

Introduction

A 12.12 sale opens at 00:00 and several million users pour in at once: can the servers take it? Every system has a limit. When requests exceed capacity and nothing controls them, nobody gets to use the service. Rate limiting and backpressure are the two lines of defense that keep a system from being crushed.

You will learn:

Why limit: rejecting requests deliberately to protect the system
Algorithms: token bucket, leaky bucket, sliding window
Backpressure: what to do when upstream is faster than downstream
Multi-layer: client → gateway → service
Selection: which strategy fits which scenario

Chapter	Contents
0	Why rate limit
1	Algorithms
2	Backpressure
3	Multi-layer architecture
4	Practice and selection

0. The big picture: why "reject" users?

It sounds counterintuitive: shouldn't we serve every user? In practice, if you don't reject some requests, all requests fail.

Picture a 100-seat restaurant when 1,000 people suddenly show up. Without a limit at the door, the outcome is not that some of the 1,000 get to eat: the kitchen collapses, the waitstaff are paralyzed, and nobody eats. The right approach is to limit at the door: let 100 in first and have the rest wait.

Goals of rate limiting

Protect the system: prevent overload from making the service completely unavailable
Fair distribution: make sure requests that were accepted are processed normally
Graceful degradation: a limited request gets an explicit 429, not a timeout or a 500

1. Algorithms: three classic approaches

The question: how many requests, at most, may pass in a unit of time? The algorithms differ in precision, burst handling, and complexity.

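Token bucket

Tokens are added to the bucket at a fixed rate, and each request consumes one token. When the bucket is full, extra tokens are discarded. Because the bucket can hold a reserve of tokens, a certain amount of burst traffic is allowed.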
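The description above can be sketched in a few lines. This is a minimal, single-process illustration rather than a production limiter: the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for the example, and tokens are refilled lazily on each call instead of by a background timer.

```python
import time


class TokenBucket:
    """Minimal token bucket (hypothetical names, not thread-safe)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the refill can be tested
        self.tokens = float(capacity)  # start full, so an initial burst passes
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and consume `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Lazy refill: add what accumulated since the last call, and
        # discard anything above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1` and `capacity=5`, five back-to-back requests pass on the stored tokens, the next ones are rejected, and two more pass after two seconds of refill.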

Algorithm	Principle	Burst	Precision	Complexity
Token bucket	Tokens are issued at a fixed rate; each request consumes a token	Allowed (uses the reserve)	High	Medium
Leaky bucket	Requests queue up and are processed at a fixed rate	Not allowed (output is smoothed)	High	Medium
Sliding window	Counts requests inside a moving window	Partly allowed	Fairly high	Low
Fixed window	Counts requests per fixed time window	Bursts possible at window boundaries	Low	Lowest
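To make the difference between the last two rows concrete, here is a small sketch of one sliding-window variant, the "log" form that stores a timestamp per accepted request. The class and parameter names are hypothetical, and real deployments often use an approximate counter instead to save memory.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log limiter (illustrative, single process)."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit      # max accepted requests per window
        self.window = window    # window length in seconds
        self.clock = clock
        self.hits = deque()     # timestamps of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        # Evict timestamps that have slid out of the window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False
```

With a limit of 3 per 10 seconds, three requests at t=9 use up the quota and a request at t=11 is still rejected. A fixed window that resets at t=10 would accept it, which is the boundary burst noted in the table.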

Choosing an algorithm

API limits: token bucket is usually the best fit because it allows reasonable bursts
Traffic shaping: leaky bucket suits a constant output rate
Simple counting: sliding window is simple and suits most web apps
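For the traffic-shaping case, a leaky bucket can be sketched as a scheduler that hands each accepted request a departure time at a fixed spacing. This assumes the queue-style variant described in the table; `LeakyBucket`, `schedule`, and the parameters are illustrative names, and the caller is expected to delay each request until its returned time.

```python
class LeakyBucket:
    """Queue-style leaky bucket expressed as departure times (illustrative)."""

    def __init__(self, rate, capacity):
        self.interval = 1.0 / rate   # fixed spacing between departures
        self.capacity = capacity     # max requests scheduled but not yet sent
        self.next_free = 0.0         # earliest time the next request may leave

    def schedule(self, now):
        """Return the time this request may leave, or None if the queue is full."""
        start = max(now, self.next_free)
        # Number of requests already scheduled ahead of this one.
        queued = (start - now) / self.interval
        if queued >= self.capacity:
            return None
        self.next_free = start + self.interval
        return start
```

With `rate=2` and `capacity=3`, five requests arriving together at t=0 are given departures at 0.0, 0.5 and 1.0 seconds, and the last two are rejected: the burst is flattened into a constant output rate.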

2. Backpressure: when upstream is faster than downstream

Rate limiting addresses "too many external requests"; backpressure addresses "internal components running at mismatched speeds".

When a producer keeps generating data faster than the consumer can process it, the buffer between them grows without bound, ending in OOM or data loss. Backpressure means the consumer signals back upstream to tell the producer to slow down.
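A quick back-of-the-envelope check shows how little time a buffer buys. The function below is only an illustration with a hypothetical name; it assumes constant rates, which real workloads rarely have.

```python
def seconds_until_full(produce_rate, consume_rate, capacity):
    """Seconds until a buffer of `capacity` items fills at constant rates.

    Returns None when the consumer keeps up and the buffer never fills.
    """
    net = produce_rate - consume_rate  # items accumulating per second
    if net <= 0:
        return None
    return capacity / net
```

For example, a producer at 6 items/s feeding a consumer at 3 items/s fills a 20-slot buffer in 20 / (6 - 3) seconds, a little under 7 seconds. A larger buffer only delays the problem, which is why the consumer needs a way to push back.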

[Interactive demo: a producer at 6/s feeds a 20-slot buffer that a consumer drains at 3/s.]

Four strategies

Drop: when the buffer is full, discard new (or oldest) data. Suits workloads that need real-time freshness and can tolerate loss, such as log collection and real-time monitoring metrics.
Block: pause the producer until the consumer frees space. Suits data that must not be lost. Examples: Go channels, Java BlockingQueue.
Sample: process only a fraction of the data and skip the rest. Suits high-frequency streams, such as downsampling sensor readings.
Scale: add consumers dynamically. Suits cloud-native deployments, for example Kubernetes HPA autoscaling.
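The first three strategies can be sketched with the Python standard library (scaling is an infrastructure concern and is not shown). The helper names are hypothetical; `queue.Queue` and `collections.deque` stand in for whatever buffer sits between your producer and consumer.

```python
import queue
from collections import deque


def offer_drop_newest(buf, item):
    """Drop: discard the new item when the bounded queue is full."""
    try:
        buf.put_nowait(item)
        return True
    except queue.Full:
        return False


def offer_blocking(buf, item, timeout=None):
    """Block: wait until the consumer frees a slot.

    Raises queue.Full if `timeout` seconds pass without space.
    """
    buf.put(item, block=True, timeout=timeout)


def make_drop_oldest_buffer(size):
    """Drop (oldest): a deque with maxlen evicts the oldest item on append."""
    return deque(maxlen=size)


def sample_every(items, n):
    """Sample: keep every n-th item and skip the rest."""
    return [item for i, item in enumerate(items) if i % n == 0]
```

Blocking is what actually propagates pressure upstream: the producer cannot run ahead of the consumer by more than the buffer size. Dropping and sampling keep the producer fast at the cost of losing data.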

3. Multi-layer rate limiting

Production systems need multi-layer protection, with each layer limiting at a different granularity.

Layer	Location	Granularity	Tools
Client	Frontend/app	Button debounce, request throttling	lodash.throttle, debounce
CDN/WAF	Edge	IP, region	Cloudflare Rate Limiting
API gateway	Entry point	Route, user	Nginx limit_req, Kong
Server	Inside the application	API, resource	Sentinel, Resilience4j
DB	Storage	Connections, QPS	Connection pool, circuit breaking on slow queries
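The idea that each layer limits on a different key can be sketched in-process. This is a toy model, not how the listed tools work internally: `FixedWindowLimiter`, `admit`, and the request fields are assumptions for the example, and a fixed-window counter is used only because it is the shortest to write.

```python
import time


class FixedWindowLimiter:
    """Counts requests per key inside fixed windows of `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counts = {}  # key -> (window index, requests seen in that window)

    def allow(self, key):
        index = int(self.clock() // self.window)
        seen_index, count = self.counts.get(key, (index, 0))
        if seen_index != index:
            count = 0  # a new window started, so the counter resets
        if count >= self.limit:
            return False
        self.counts[key] = (index, count + 1)
        return True


def admit(request, layers):
    """Run the request through each layer, outermost first.

    `layers` is a list of (name, limiter, key_function) tuples.
    Returns the name of the first layer that rejects, or None if all accept.
    """
    for name, limiter, key_of in layers:
        if not limiter.allow(key_of(request)):
            return name
    return None
```

For instance, an "edge" layer keyed by `request["ip"]` with a limit of 3 and a "gateway" layer keyed by `request["user"]` with a limit of 2 would let a user's first two requests through, reject the third at the gateway, and reject the fourth at the edge. Note that a request rejected by an inner layer has already consumed quota at the outer ones.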

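HTTP response conventions

A rate-limited request should return 429 Too Many Requests (defined in RFC 6585) together with these headers:

Retry-After: how long to wait before retrying (a number of seconds or an HTTP date)
X-RateLimit-Limit: the maximum allowed
X-RateLimit-Remaining: the quota left
X-RateLimit-Reset: when the quota resets

Retry-After is a standard HTTP header. The X-RateLimit-* headers are a widely used convention rather than part of the HTTP standard, so their exact format varies between APIs.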
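Both sides of this exchange can be sketched with the standard library. The function names are hypothetical, and the example assumes `X-RateLimit-Reset` carries Unix epoch seconds; some APIs send seconds-until-reset instead, so check the API you are integrating with.

```python
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def rate_limit_headers(limit: int, remaining: int,
                       reset_epoch: float, now_epoch: float) -> dict:
    """Server side: headers to attach to a 429 response."""
    retry_after = max(0, math.ceil(reset_epoch - now_epoch))
    return {
        "Retry-After": str(retry_after),                  # delta-seconds form
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),       # assumed: epoch seconds
    }


def parse_retry_after(value: str, now: Optional[datetime] = None) -> float:
    """Client side: seconds to wait, from either Retry-After form."""
    value = value.strip()
    if value.isdigit():
        return float(value)                 # e.g. "120"
    if now is None:
        now = datetime.now(timezone.utc)
    when = parsedate_to_datetime(value)     # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    return max(0.0, (when - now).total_seconds())
```

A client that honors `Retry-After` instead of retrying immediately avoids adding load exactly when the server is asking for less of it.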

4. Selection in practice

Scenario	Recommendation	Notes
Nginx entry limit	`limit_req_zone`	Leaky bucket, simple to configure
Distributed limit	Redis + Lua	Token bucket or sliding window with a shared counter
Java microservice	Sentinel / Resilience4j	Supports circuit breaking, degradation, hot-spot limiting
Node.js API	express-rate-limit	Simple, with Redis store support
Go service	golang.org/x/time/rate	Token bucket from the Go project's golang.org/x modules (not part of the standard library itself)

Summary

Rate limiting and backpressure are the two lines of defense for stability.

Why it is needed: if you don't reject some requests, all of them fail
Three core algorithms: token bucket (bursts), leaky bucket (smoothing), sliding window (simplicity)
Backpressure: drop, block, sample, scale
Multi-layer: from the client all the way to the DB
429 responses: the status code plus the usual rate-limit headers

2026 notes for developers in Vietnam

Cloudflare Workers: edge rate limiting, geo-based rules, bot detection
Upstash Ratelimit: serverless and Redis-based, easy to use with Next.js/Vercel
AI API rate limits: OpenAI and Anthropic limit by TPM (tokens per minute) as well as RPM (requests per minute)
Typical scenarios in Vietnam: protecting auth APIs (login brute force), payment endpoints, AI inference
Tools that work well from Vietnam: BunnyCDN, Cloudflare APAC PoPs
