Rate limiting protects your API from abuse, prevents noisy tenants from degrading performance for everyone else, and keeps your infrastructure costs predictable. Most SaaS startups implement it too late (after an incident) or too aggressively (blocking legitimate users). Here is how to do it right.
Why rate limiting matters
Without rate limiting, a single customer can consume all of your API capacity. A bug in their integration might fire thousands of requests per second. A malicious actor might try to brute-force your authentication endpoints. A poorly written script might paginate through your entire dataset in a tight loop.
Each of these scenarios has happened to startups we work with. In one case, a customer's integration bug generated 50,000 requests per minute, which overwhelmed the database and caused a 2-hour outage for all customers.
The three layers
Layer 1: Global rate limiting. Protect your infrastructure from total overload. Set a maximum requests-per-second that your infrastructure can handle with headroom. If your system can handle 5,000 RPS at peak, set a global limit at 3,000 RPS. This catches DDoS attacks and runaway integrations.
Implementation: Use your load balancer or API gateway. NGINX, Cloudflare, and AWS (through WAF rate-based rules attached to an ALB, or API Gateway throttling) all support global rate limiting. No application code needed.
Layer 2: Per-tenant rate limiting. Prevent any single tenant from consuming a disproportionate share of capacity. Typical limits for a SaaS API: 100-1,000 requests per minute per tenant, depending on your use case.
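Implementation: Use Redis with a sliding window algorithm. Incrementing a counter keyed by tenant_id and the current minute is the simplest approach, but that is a fixed window. For a sliding window, either store each tenant's request timestamps (for example in a sorted set) and count the ones from the last 60 seconds, or keep per-minute counters and weight the previous minute's count by how much of it still overlaps the window. If the count exceeds the limit, return HTTP 429 (Too Many Requests). Include a Retry-After header telling the client when they can try again.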
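To make the per-tenant check concrete, here is a minimal sketch of a sliding-window limiter. It keeps timestamps in process memory rather than Redis so that it runs on its own; a shared store would be needed once you have more than one application server. The class name, method name, and the injectable clock are assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window log limiter, kept in process memory for illustration."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # tenant_id -> timestamps of accepted requests

    def check(self, tenant_id):
        """Return (allowed, retry_after_seconds) for one incoming request."""
        now = self.clock()
        hits = self._hits[tenant_id]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0.0
        # The oldest accepted request leaves the window at hits[0] + window.
        return False, hits[0] + self.window - now
```

A request handler would call check(tenant_id) once per request, respond with 429 when allowed is False, and round retry_after_seconds up for the Retry-After header.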
Layer 3: Per-endpoint rate limiting. Some endpoints are more expensive than others. An endpoint that generates a PDF report should have a lower rate limit than an endpoint that returns a single record. Authentication endpoints should have strict limits to prevent brute-force attacks.
Common per-endpoint limits:
- Login/authentication: 10 requests per minute per IP
- Password reset: 3 requests per hour per email
- Data export/report generation: 5 requests per hour per tenant
- Search/list endpoints: 60 requests per minute per tenant
- CRUD operations: 300 requests per minute per tenant
- Webhook delivery: 1,000 requests per minute per tenant
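One way to keep these limits in a single place is a small lookup table. The sketch below uses the numbers from the list above; the category names, the Limit type, and the key format are hypothetical and only meant to show the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    max_requests: int
    window_seconds: int
    scope: str  # what the counter is keyed by: "ip", "email", or "tenant"


# Category names are hypothetical labels; the numbers come from the list above.
ENDPOINT_LIMITS = {
    "auth": Limit(10, 60, "ip"),
    "password_reset": Limit(3, 3600, "email"),
    "export": Limit(5, 3600, "tenant"),
    "search": Limit(60, 60, "tenant"),
    "crud": Limit(300, 60, "tenant"),
    "webhook": Limit(1000, 60, "tenant"),
}


def counter_key(category, identity):
    """Build the key a limiter would count under for this endpoint category."""
    limit = ENDPOINT_LIMITS[category]
    return f"rl:{category}:{limit.scope}:{identity}"
```

Keying each counter by its scope keeps a login limit per IP from sharing a bucket with a tenant-wide limit.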
Implementing it right
Return proper HTTP responses. Use 429 Too Many Requests, not 403 Forbidden or 500 Internal Server Error. Include these headers in every response (not just rate-limited ones):
- X-RateLimit-Limit: the maximum number of requests allowed in the window
- X-RateLimit-Remaining: the number of requests remaining in the current window
- X-RateLimit-Reset: the Unix timestamp when the window resets
- Retry-After: (on 429 responses) the number of seconds to wait before retrying
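The headers above could be assembled by a helper along these lines. The function name and parameters are assumptions for illustration; adapt them to whatever your framework uses for response headers.

```python
import math


def rate_limit_headers(limit, remaining, reset_epoch, now_epoch, rate_limited=False):
    """Build rate-limit headers for one response.

    reset_epoch and now_epoch are Unix timestamps in seconds.
    Retry-After is added only for rate-limited (429) responses.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if rate_limited:
        # Round up so clients never retry a fraction of a second too early.
        headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now_epoch)))
    return headers
```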
Use the sliding window algorithm. Fixed windows (reset every minute on the minute) create burst problems at window boundaries. With a limit of 100 requests per minute, a client could send 100 requests at 11:59:59 and another 100 at 12:00:00, getting 200 requests through in about a second, twice the intended limit. Sliding windows solve this by counting requests over the previous N seconds, regardless of clock boundaries.
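The boundary problem is easy to reproduce in a few lines. This sketch replays the same burst through a fixed-window counter and a sliding-window check, assuming a limit of 100 requests per 60 seconds; the function names are illustrative.

```python
from collections import deque

LIMIT = 100
WINDOW = 60  # seconds


def fixed_window_accepts(timestamps):
    """Count accepted requests when the counter resets on each minute boundary."""
    counts = {}
    accepted = 0
    for ts in timestamps:
        bucket = int(ts // WINDOW)
        if counts.get(bucket, 0) < LIMIT:
            counts[bucket] = counts.get(bucket, 0) + 1
            accepted += 1
    return accepted


def sliding_window_accepts(timestamps):
    """Count accepted requests when only the previous WINDOW seconds are considered."""
    recent = deque()
    accepted = 0
    for ts in timestamps:
        while recent and recent[0] <= ts - WINDOW:
            recent.popleft()
        if len(recent) < LIMIT:
            recent.append(ts)
            accepted += 1
    return accepted


# 100 requests in the last second of one minute, 100 more in the first second of the next.
burst = [59.0 + i / 100 for i in range(100)] + [60.0 + i / 100 for i in range(100)]
```

Here fixed_window_accepts(burst) lets all 200 requests through, while sliding_window_accepts(burst) accepts only the first 100.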
Differentiate by plan. Enterprise customers should get higher rate limits than free-tier users. Define rate limits per pricing tier and communicate them clearly in your API documentation.
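Plan-based limits can be as simple as a lookup with a safe default. The tier names and numbers below are illustrative assumptions, not recommendations; they only stay inside the 100-1,000 requests per minute range mentioned earlier.

```python
# Tier names and numbers are illustrative assumptions.
PLAN_LIMITS_PER_MINUTE = {
    "free": 100,
    "pro": 500,
    "enterprise": 1000,
}
DEFAULT_PLAN = "free"


def limit_for_plan(plan):
    """Return requests per minute for a plan, falling back to the most restrictive tier."""
    return PLAN_LIMITS_PER_MINUTE.get(plan, PLAN_LIMITS_PER_MINUTE[DEFAULT_PLAN])
```

Falling back to the lowest tier means an unknown or misspelled plan name never grants extra capacity by accident.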
Alert on rate limiting events. When a customer starts hitting rate limits, it might indicate a problem with their integration. Proactively reach out before they file a support ticket.
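A rough sketch of how such alerting might work: count 429 responses per tenant and flag a tenant the first time it crosses a threshold. The class name, the threshold value, and the reset cadence are all assumptions; in practice this would likely live in your metrics or logging pipeline rather than in application memory.

```python
from collections import Counter


class ThrottleMonitor:
    """Counts 429 responses per tenant and flags tenants that cross a threshold."""

    def __init__(self, alert_threshold=50):
        self.alert_threshold = alert_threshold  # assumed value; tune to your traffic
        self._rejections = Counter()
        self._alerted = set()

    def record_rejection(self, tenant_id):
        """Record one 429. Return True the first time the tenant crosses the threshold."""
        self._rejections[tenant_id] += 1
        crossed = self._rejections[tenant_id] >= self.alert_threshold
        if crossed and tenant_id not in self._alerted:
            self._alerted.add(tenant_id)
            return True
        return False

    def reset(self):
        """Clear counts, for example at the start of each reporting period."""
        self._rejections.clear()
        self._alerted.clear()
```

Returning True only once per period keeps a single misbehaving integration from paging the team repeatedly.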
Exempt internal services. Your own frontend, your background job processors, and your internal tools should not be subject to the same rate limits as external API consumers. Use separate authentication tokens for internal services and apply higher (or no) limits.
Testing rate limits
Before deploying, test your rate limiting implementation thoroughly. Verify that: limits are enforced accurately, 429 responses include the correct headers, requests resume normally after the window resets, and legitimate traffic is not affected at normal volumes.
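Two of these checks can be automated without waiting on real time by injecting a fake clock. The sketch below tests a tiny stand-in limiter; make_limiter and run_checks are hypothetical names, and the header and normal-volume checks would need to run against your actual stack.

```python
from collections import deque


def make_limiter(limit, window, clock):
    """Tiny sliding-window limiter used only as the system under test."""
    hits = deque()

    def allow():
        now = clock()
        while hits and hits[0] <= now - window:
            hits.popleft()
        if len(hits) < limit:
            hits.append(now)
            return True
        return False

    return allow


def run_checks():
    now = [0.0]  # fake clock so the test never sleeps
    allow = make_limiter(limit=3, window=60, clock=lambda: now[0])

    # Limits are enforced accurately: exactly 3 of 5 requests pass.
    assert [allow() for _ in range(5)] == [True, True, True, False, False]

    # Requests resume once the window has passed.
    now[0] = 60.0
    assert allow() is True
    return "ok"
```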