Rate Limiting Guide in Vets API | VA Platform documentation

Last Updated: 2026-06-29

A complete guide: when to add rate limiting and how to implement it.

Part 1: When Should Your Team Add Rate Limiting?

Rate limiting in vets-api is opt-in. Not every endpoint needs it. Use the guidance below to decide whether to add a Rack::Attack rule for your endpoint.

First: Understand the Two Types of 429s

Before adding rate limiting, distinguish between two different sources of 429 errors in vets-api:

Type	Description
Rack::Attack throttles (inbound)	vets-api itself rejects requests before they reach your endpoint. Protects vets-api from abusive or excessive inbound traffic.
Upstream `429`s (outbound)	An external service (e.g. Lighthouse) returns a `429` to vets-api because your service is calling it too frequently. These are NOT solved by Rack::Attack — they require retry logic, caching, or coordination with the upstream service.

Real Example (May 2026): benefits_documents/service generated 376 HTTP 429 errors over a 3-week period. Investigation showed the referrers were almost entirely va.gov/track-claims/your-claim-letters: real veterans checking their claim letters, many immediately after login. This was Lighthouse rate limiting vets-api’s outbound calls, not inbound abuse. Adding a Rack::Attack rule here would have blocked legitimate users. The correct fix is caching, retry logic, or working with the Lighthouse team to increase their rate limit.

How to tell the difference: If you’re seeing 429s in your logs but your endpoint isn’t in rack_attack.rb, check the referrer and controller in Datadog. User-facing referrers (e.g. va.gov/track-claims/*) with real controller names point to an upstream issue, not inbound abuse.
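
Because upstream 429s call for retry logic rather than a Rack::Attack rule, it may help to picture the retry side as a small wrapper with exponential backoff. The following is a minimal sketch in plain Ruby. `UpstreamRateLimited` and `with_upstream_retry` are hypothetical names, not part of vets-api or any Lighthouse client, and the delays are illustrative only.

```ruby
# Hypothetical error a client wrapper might raise when the upstream returns a 429.
class UpstreamRateLimited < StandardError; end

# Runs the block, retrying with exponential backoff when the upstream rate limits us.
# `sleeper` is injectable so the delay can be stubbed out in tests.
def with_upstream_retry(max_attempts: 3, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue UpstreamRateLimited
    # Give up and re-raise once the attempt budget is spent.
    raise if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1))) # 0.5s, 1s, 2s, ...
    retry
  end
end
```

Retries add load to an upstream that is already saturated, so a small `max_attempts` combined with caching is usually the safer pairing.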

Should You Add a Rack::Attack Rule?

Ask yourself the following questions:

1. Is your endpoint unauthenticated or lightly authenticated?

Unauthenticated endpoints are the highest priority for rate limiting. Without authentication, there’s no barrier to abuse. See representation_management/next_steps_email as an example — without throttling it functioned as an open email relay.

2. Does your endpoint trigger expensive downstream calls?

If a single request fans out to multiple upstream services (e.g. Lighthouse FHIR APIs), a high request rate can cascade into upstream rate limit exhaustion. Consider rate limiting to protect both vets-api and your upstream dependencies.

3. Does your endpoint accept file uploads or send external communications?

File upload endpoints and anything that sends emails, notifications, or triggers external actions should be rate limited to prevent abuse and resource exhaustion.

4. Has your endpoint experienced a traffic spike or near-DoS incident?

Most existing Rack::Attack rules were added reactively after incidents. Don’t wait for an incident — if your endpoint is publicly accessible and handles sensitive operations, add a rule proactively.

5. Is your endpoint part of a form submission flow?

Form submission endpoints (POST) are good candidates for rate limiting. A legitimate user rarely needs more than a few submissions per minute, so a limit of 15–30 per minute leaves room for retries while still stopping bulk abuse.

You Probably Don’t Need Rack::Attack If…

Your endpoint is fully authenticated and only accessible to credentialed users
Your endpoint is read-only with low computational cost and no upstream fan-out
Traffic to your endpoint is low and stable with no history of abuse
You’re seeing 429s that trace back to upstream services rather than inbound request volume

Quick Decision Reference

Scenario	Priority	Suggested Limit
Unauthenticated POST (email, form)	High	5–15/min
File upload	High	8/5min
Form submission	Medium	15–30/min
Read endpoint with upstream calls	Medium	20–30/min
High-volume lookup (e.g. facility search)	Medium	30/min
Authenticated, read-only, low traffic	Low/None	Probably no rule needed

When in doubt, reach out to the Platform Backend team in #vfs-platform-support on Slack to open a support request. They can help review your endpoint’s traffic patterns in Datadog and recommend an appropriate limit.

Part 2: How to Implement Rate Limiting

Rate limiting is configured in config/initializers/rack_attack.rb using the Rack::Attack gem. There is no global rate limiting — it is added per-endpoint as needed.

When to Add Rate Limiting (Checklist)

Rate limiting should be considered when:

Your endpoint is publicly accessible
The endpoint calls expensive upstream services
The endpoint could be abused to cause denial of service
A Staging Review or Security Review requires it

Reference: The Security Review checklist includes “Rate limits defined” as a required item.

How to Add Rate Limiting

Add a throttle block to config/initializers/rack_attack.rb:

Ruby

throttle('your_endpoint_name/ip', limit: 10, period: 1.minute) do |req|
  req.remote_ip if req.path.starts_with?('/your/endpoint/path')
end
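
To make `limit` and `period` concrete, here is a simplified in-memory model of fixed-window counting, which is the general approach Rack::Attack throttles take as we understand it. `FixedWindowThrottle` is a hypothetical class written for illustration. The real gem keeps its counters in a cache store rather than a Hash, and this sketch is not its implementation.

```ruby
# Simplified, in-memory model of fixed-window throttling (illustration only).
class FixedWindowThrottle
  def initialize(limit:, period:)
    @limit = limit
    @period = period      # window length in seconds
    @counts = Hash.new(0) # [window index, discriminator] => request count
  end

  # Counts the request and returns true when it pushes the discriminator over the limit.
  def throttled?(discriminator, now: Time.now.to_i)
    # A nil discriminator means "this request is not subject to the rule".
    return false if discriminator.nil?

    window = now / @period
    @counts[[window, discriminator]] += 1
    @counts[[window, discriminator]] > @limit
  end
end
```

With `limit: 2`, the third request from the same discriminator in one window is throttled, and the count starts over when the next window begins. One consequence of fixed windows is that a burst straddling a window boundary can briefly exceed the nominal limit.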

Configuration Options

Parameter	Description	Notes
limit	Maximum requests allowed in the period
period	Time window	e.g. 1.minute, 5.minutes
req.remote_ip	Use this (not req.ip) since we’re behind a load balancer	Preferred over req.ip
req.path	Can use == for exact match or .starts_with? for prefix
req.get? / req.post?	Optional — filter by HTTP method
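
The options above can be combined in a single rule. The fragment below is a sketch that assumes it lives in config/initializers/rack_attack.rb, where `req.remote_ip` is available as described in this guide. The rule name and the path `/v0/example_form` are placeholders, not real vets-api routes.

```ruby
# Throttle only POST requests to one exact path, keyed by client IP.
# Requests for which the block returns nil are not counted by this rule.
Rack::Attack.throttle('example_form/ip', limit: 15, period: 1.minute) do |req|
  req.remote_ip if req.post? && req.path == '/v0/example_form'
end
```

An exact match with `==` keeps the rule from catching sibling routes, whereas `.starts_with?` covers everything nested under a prefix.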

What Happens When Rate Limited

Returns HTTP 429 Too Many Requests
Includes headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
Response body: “throttled”
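
For readers curious where those header values can come from: Rack::Attack exposes the matched rule's counters in `request.env['rack.attack.match_data']`, and its README shows a custom responder deriving headers from them. The function below is a plain-Ruby sketch of that calculation. `rate_limit_headers` is a hypothetical name, and whether vets-api computes its headers exactly this way is an assumption.

```ruby
# Builds rate limit headers from Rack::Attack-style match data.
# The keys (:limit, :period, :count, :epoch_time) mirror what the gem stores in
# request.env['rack.attack.match_data'] for a throttled request.
def rate_limit_headers(match_data)
  now = match_data[:epoch_time]
  period = match_data[:period]
  remaining = [match_data[:limit] - match_data[:count], 0].max

  {
    'X-RateLimit-Limit' => match_data[:limit].to_s,
    'X-RateLimit-Remaining' => remaining.to_s,
    # Epoch second at which the current fixed window ends.
    'X-RateLimit-Reset' => (now + (period - (now % period))).to_s
  }
end

headers = rate_limit_headers({ limit: 10, period: 60, count: 11, epoch_time: 125 })
puts headers['X-RateLimit-Reset'] # prints 180
```

A client that receives a 429 can use the reset value to decide how long to wait before retrying.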

Part 3: Determining the Right Rate Limit

For New Endpoints (No Existing Traffic Data)

When launching a new endpoint, you won’t have production traffic data to analyze. Here’s how to approach rate limiting without historical data:

Step 1: Model the User Journey

Map out realistic usage scenarios. Your rate limit should accommodate the power user scenario with headroom.

Scenario	Calculation	Requests/min
Normal user	1 page load = 2 API calls, user visits 3 pages/min	6
Power user	Rapid searching/filtering, 10 actions/min	20
Automated refresh	Page polls every 30 seconds	2
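
The arithmetic in the table can be written out as a short calculation. This sketch assumes the power user also triggers 2 API calls per action (the table gives only the total) and uses a 3x headroom factor purely for illustration. Both helper names are hypothetical.

```ruby
# Requests per minute generated by one usage scenario.
def requests_per_minute(calls_per_action:, actions_per_minute:)
  calls_per_action * actions_per_minute
end

# Launch limit: the busiest legitimate scenario multiplied by a headroom factor.
def launch_limit(scenario_rates, headroom: 3)
  scenario_rates.max * headroom
end

rates = [
  requests_per_minute(calls_per_action: 2, actions_per_minute: 3),  # normal user
  requests_per_minute(calls_per_action: 2, actions_per_minute: 10), # power user (2 calls per action assumed)
  requests_per_minute(calls_per_action: 1, actions_per_minute: 2)   # page polling every 30 seconds
]

puts launch_limit(rates) # prints 60
```

Sizing from the busiest legitimate scenario, rather than the average one, keeps power users from being throttled at launch.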

Step 2: Find a Similar Endpoint for Reference

Your Endpoint Type	Similar Existing Endpoint	Their Limit
Search/lookup	`facilities_api/v2/va`	30/min
Form submission	`education_benefits_claims`	15/min
File operations	`vic/profile_photo_attachments`	8/5min
Status/polling	`medical_copays`	20/min

Step 3: Check Upstream Service Constraints

If your endpoint calls external services, their limits set your ceiling:

Your rate limit ≤ Upstream service limit / Expected concurrent users
Contact the upstream service team to understand their constraints.
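
As a worked instance of the inequality above, the sketch below divides an upstream quota across concurrent users. The numbers are made up for illustration, `per_client_ceiling` is a hypothetical helper, and the `fan_out` parameter (upstream calls made per inbound request) is an added assumption that the formula above does not include.

```ruby
# Per-client ceiling implied by an upstream limit (integer division rounds down).
# fan_out defaults to 1, which reproduces the formula as written in this guide.
def per_client_ceiling(upstream_limit:, concurrent_users:, fan_out: 1)
  upstream_limit / (concurrent_users * fan_out)
end

puts per_client_ceiling(upstream_limit: 100, concurrent_users: 5)             # prints 20
puts per_client_ceiling(upstream_limit: 100, concurrent_users: 5, fan_out: 2) # prints 10
```

If each inbound request makes more than one upstream call, the ceiling shrinks proportionally, which is why fan-out is worth measuring before choosing a limit.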

Step 4: Start High and Plan to Adjust

Recommended approach for new endpoints:

Ruby

# Phase 1: Launch with permissive limit (2-3x expected peak usage)

throttle('new_endpoint/ip', limit: 60, period: 1.minute) do |req|
  req.remote_ip if req.path.starts_with?('/v0/new_endpoint')
end

Then follow this timeline:

Week	Action
1–2	Monitor traffic patterns in Datadog, no changes
3	Analyze P95 usage, identify if limit is too high
4+	Adjust limit based on actual data

Step 5: Add Monitoring From Day One

Deploy with Datadog monitoring so you can adjust quickly:

# In your controller or service

StatsD.increment('api.new_endpoint.request', tags: ["ip:#{request.remote_ip}"])

Step 6: Document Your Assumptions

In your PR, document:

Expected user behavior and request patterns
Similar endpoints used as reference
Upstream service constraints (if any)
Plan for adjusting limits post-launch

Example PR description:

Rate limit set to 30/min based on:

• Similar to facilities_api endpoint (30/min)
• Expected max 10 requests/min for power users
• Upstream service X has 100/min limit
• Will review after 2 weeks of production traffic

For Existing Endpoints (With Traffic Data)

If your endpoint already exists and has traffic, you can use Datadog to make data-driven decisions.

Step 1: Analyze Expected User Behavior

Think through the user journey: How many times would a legitimate user hit this endpoint in a session? Is it called once per page load? Multiple times during form submission? Are there any frontend polling patterns?

Example: If a user searches for facilities and might refine their search 5–6 times, and each search makes 2 API calls, that’s ~12 requests in a few minutes for an active user.

Step 2: Check Existing Traffic in Datadog

Before adding rate limiting, query Datadog for current traffic patterns:

# Requests per IP per minute

sum:vets_api.requests{path:/your/endpoint/*} by {client_ip}.rollup(count, 60)

Look for:

P95/P99 requests per IP per minute — what do normal heavy users look like?
Max requests per IP — what do potential abusers look like?
Distribution — is there a clear gap between normal and abnormal traffic?
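
If you export per-IP request counts from Datadog, the percentiles above can be checked with a few lines of Ruby. This is a minimal sketch using the nearest-rank method. The sample data is invented, and `percentile` is a hypothetical helper rather than an existing vets-api utility.

```ruby
# Nearest-rank percentile over per-IP request counts for a one-minute window.
# pct is expected to be an integer such as 95 or 99.
def percentile(values, pct)
  raise ArgumentError, 'values must not be empty' if values.empty?

  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical sample: requests per minute observed for 20 distinct IPs.
per_ip_counts = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 12, 14, 180]

puts percentile(per_ip_counts, 95) # prints 14
puts per_ip_counts.max             # prints 180
```

In this invented sample the P95 is 14 while the maximum is 180, which is the kind of clear gap between normal and abnormal traffic that makes a limit easy to place.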

Step 3: Start Permissive, Then Tighten

Phase	Limit	Purpose
1. Monitor only	None	Add logging/metrics to track what would be rate limited
2. High limit	100/min	Catch only obvious abuse
3. Tighten	30–50/min	Based on observed normal traffic
4. Final	10–20/min	If needed, based on upstream limits
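
For the monitor-only phase, Rack::Attack offers `track`, which counts matching requests and emits a notification without blocking anything. The fragment below is a sketch that assumes it sits in config/initializers/rack_attack.rb. The rule name, the path, and the StatsD metric name are placeholders.

```ruby
# Count requests per IP and notify once an IP exceeds 100 requests in 60 seconds.
# Unlike throttle, track never rejects the request.
Rack::Attack.track('your_endpoint/ip/monitor', limit: 100, period: 60) do |req|
  req.remote_ip if req.path.starts_with?('/your/endpoint')
end

# Record each would-be throttle so the volume can be reviewed in Datadog.
ActiveSupport::Notifications.subscribe('track.rack_attack') do |_name, _start, _finish, _id, payload|
  req = payload[:request]
  # Hypothetical metric name; tagged by rule name rather than by IP.
  StatsD.increment('api.rack_attack.would_throttle', tags: ["rule:#{req.env['rack.attack.matched']}"])
end
```

Once the tracked counts look reasonable, the same block can be moved from `track` to `throttle` with the limit you have settled on.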

Step 4: Consider Upstream Service Limits

If your endpoint calls an external service (PPMS, Lighthouse, etc.):

What are their rate limits?
Your per-IP limit should be well below theirs, because every client of your endpoint shares the same upstream quota

Step 5: Environment-Specific Limits

You can exclude non-production environments from rate limiting:

Ruby

throttle('your_endpoint/ip', limit: 10, period: 1.minute) do |req|
  req.remote_ip if req.path.starts_with?('/your/endpoint') &&
    !Settings.vsp_environment.match?(/local|development|staging/)
end

Part 4: Reference

Safe Starting Points

Endpoint Type	Safe Starting Limit	Rationale
Read-only GET	30–60/min	Users may browse/search repeatedly
Form submission POST	15–20/min	Deliberate actions, allow for retries
File upload	10/5min	Heavy operations, natural user throttling
Shared with other apps	Coordinate with teams first	Avoid breaking partner integrations

Existing Rate Limits in rack_attack.rb

Endpoint	Limit	Period	Notes
facilities_api/v2/va	30	1 min	Added after DoS incident
facilities_api/v2/ccp/provider	8	1 min	PPMS protection
vic/profile_photo_attachments (GET)	8	5 min	Download limit
vic/profile_photo_attachments (POST)	8	5 min	Upload limit
vic/supporting_documentation_attachments	8	5 min	Upload limit
vic/vic_submissions	10	1 min	Form submission
check_in	10	1 min	Excludes local/dev/staging
medical_copays (GET)	20	1 min	Read operations
education_benefits_claims (POST)	15	1 min	Form submission
form214192 (POST)	30	1 min	Form submission
form21p530a (POST)	30	1 min	Form submission
form210779 (POST)	30	1 min	Form submission
form212680 (POST)	30	1 min	Form submission
vaos/v2/appointments (GET/POST/PUT)	30	1 min	VAOS appointments
vaos/v2/providers (GET)	30	1 min	VAOS providers
vaos/v2/locations (GET)	30	1 min	VAOS clinics
vaos/v2/community_care/eligibility (GET)	30	1 min	VAOS CC eligibility
vaos/v2/eligibility (GET)	30	1 min	VAOS patient eligibility
vaos/v2/scheduling/configurations (GET)	30	1 min	VAOS scheduling
vaos/v2/facilities (GET)	30	1 min	VAOS facilities
vaos/v2/relationships (GET)	30	1 min	VAOS relationships
ask_va_api/v0/zip_state_validation (POST)	60	1 min	Production only
ask_va_api/v0/diagnostics (GET)	30	1 min
representation_management/v0/next_steps_email (POST)	5	1 min	Per IP; prevents open relay
representation_management/v0/next_steps_email (POST)	3	1 hour	Per destination email address
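
The last two rows show one endpoint protected by two rules with different discriminators. The fragment below sketches how such a pair might look. It is not copied from rack_attack.rb: the full path and the `email` parameter name are assumptions, and `req.params` only sees query and form parameters, so a JSON body would need to be parsed separately.

```ruby
# Illustrative only: the path and the 'email' parameter name are assumptions.
email_path = '/representation_management/v0/next_steps_email'

# Per source IP: caps how fast any one client can call the endpoint.
Rack::Attack.throttle('next_steps_email/ip', limit: 5, period: 1.minute) do |req|
  req.remote_ip if req.post? && req.path == email_path
end

# Per destination address: caps how many emails one recipient can be sent,
# even when the requests arrive from many different IPs.
Rack::Attack.throttle('next_steps_email/email', limit: 3, period: 1.hour) do |req|
  req.params['email'].to_s.downcase.presence if req.post? && req.path == email_path
end
```

The two rules are counted independently, so a request is rejected as soon as either limit is exceeded.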

Monitoring After Deployment

Set up a Datadog dashboard to track:

429 responses — How often is the limit being hit?
Unique IPs hitting limits — Is it one bad actor or many users?
Requests just below limit — Are legitimate users getting close?

Datadog query examples:

# Count of 429 responses

sum:vets_api.response{status:429,path:/your/endpoint/*}.as_count()

# Unique IPs hitting rate limits

count_distinct:vets_api.requests{status:429,path:/your/endpoint/*} by {client_ip}

Safe Rollout Strategy

Start with a limit of 2–3x what you expect from a heavy user (e.g., if you expect at most 10 requests per minute, set the limit to 30 per minute)
Deploy to production with monitoring enabled
Watch Datadog for 1–2 weeks to observe actual traffic patterns
Tighten the limit based on observed data
Document your rationale in the PR for future reference

Additional Resources

Rack::Attack Documentation
Engineering and Security Checklist
Sidekiq Enterprise Rate Limiting (for worker-level rate limiting)

Questions?

Reach out in #vfs-platform-support on Slack.
