Mitigating and Handling Leaked API Keys
Last Updated: November 4, 2024
The Vets API and Vets Website repositories are public, VA-owned repositories hosted on GitHub. Many of the external services we interact with require API keys for authentication. During local development, it’s common to temporarily embed these keys in the code. However, this practice can lead to accidental exposure of sensitive keys if they are committed to source control. This document describes a response plan for leaked API keys and best practices for preventing leaks.
Types of API keys
API keys can include private keys, OAuth tokens, Bearer tokens, and more. These credentials are often used for authentication and authorization when interacting with external services on the VA Network or adjacent APIs (such as Lighthouse). In the Vets API Rails app, API keys are often used within Service classes to manage requests to third-party services, or recorded within VCR cassettes for test cases that simulate API responses. Proper handling of these keys is essential to ensure secure communication and prevent unauthorized access to protected resources.
VCR scrubbing
We should ensure that any sensitive data, such as API keys, is scrubbed before it is recorded in VCR cassettes. This is done by configuring VCR in the RSpec setup to mask specific values during test recordings, using the VCR filter_sensitive_data method. Developers can then store cassettes safely without exposing sensitive information, while still testing reliably against simulated API responses.
The VCR.filter_sensitive_data method lets you mask sensitive information, such as API keys, so that it is not recorded in cassettes. You specify a placeholder for the sensitive value, and the real data is replaced with that placeholder (for example, <SCRUBBED_VALUE_HERE>) in the saved cassette file. This keeps sensitive data secure while preserving the ability to replay recorded requests and responses.
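A minimal sketch of such a configuration is shown below. The file location, the cassette directory, the `API_KEY` environment variable, and the `<API_KEY>` placeholder are assumptions for illustration and are not taken from the Vets API codebase.

```ruby
# spec/support/vcr.rb (hypothetical location)
require 'vcr'

VCR.configure do |config|
  # Assumed cassette directory; use the project's real path.
  config.cassette_library_dir = 'spec/support/vcr_cassettes'

  # Before a cassette is written, every occurrence of the value returned by
  # the block is replaced with the placeholder. On playback, VCR substitutes
  # the value back in so recorded requests still match.
  config.filter_sensitive_data('<API_KEY>') do
    ENV.fetch('API_KEY', 'fake_api_key') # hypothetical variable name
  end
end
```

After re-recording a cassette, it is worth opening the saved file and confirming that only the placeholder appears before committing it.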
Identification of leaked keys
Leaked API keys are often discovered during PR reviews, where a reviewer (Platform or VFS) may notice sensitive information committed in code. Other times, team members working within a repository or investigating an issue may come across the leak, for example through a misconfiguration or accidentally shared credentials. Occasionally, leaked keys are also surfaced by security tools scanning the repository (whether scanned internally or flagged by the external or third-party service that issued the API key).
Immediate response plan
Notification
Upon identifying a leaked key, please notify the Platform Team immediately to begin mitigation.
Key rotation and revocation
❗Many legacy VA systems may complicate the rotation process, but every effort should be made to rotate the key promptly to prevent misuse.
The first and most critical step is to rotate the leaked key to invalidate the exposed credential.
After rotating the key, ensure that any related configurations or secrets are promptly updated to reflect the new key.
After updating the key (most likely in Parameter Store), the pods in EKS will need to be rolled to ensure they pick up the new value. If the Parameter Store key in the config does not reference a specific version (a versioned reference looks like my_key/some_path:3), the secret store will automatically retrieve the latest version. In cases where no version is specified in the secrets manifest, a deployment rollout will be required to propagate the changes. Please ask a Platform member to Restart ArgoCD Deployments to Recycle Pods.
Note: Once the Parameter Store value is updated, any new pods that spin up (such as during an autoscaling event) will immediately use the new key.
Important note on downtime: Revoking the compromised key may cause downtime or disruptions to services that rely on it, until the new key is in place. To minimize impact, ensure any configuration files, environment variables, or secret management values are updated as soon as possible with the new key. See Stakeholder Engagement.
For external service integrations using the Breakers middleware pattern, an outage can be force triggered to manage disruptions. This mitigation strategy also requires scheduling a maintenance window in PagerDuty and may involve placing a temporary banner on the frontend to notify users about the service impact. Additionally, a WIP Maintenance Bot is in place to streamline the scheduling of maintenance windows.
If rotation is delayed, please see "If the key cannot be immediately rotated" below.
Again, this may result in downtime until a new key can be integrated.
Stakeholder engagement and communication (if the incident is ongoing):
If the leak leads to an ongoing incident, both Platform and OCTO leadership must be engaged to assess the situation and coordinate further actions.
‼️ If the key cannot be immediately rotated
Coordinate with stakeholders
Evaluate the best course of action for maintaining operational stability. Given the circumstances, this may involve temporary downtime for affected services until a new key can be successfully rotated and integrated. Clear communication will be essential to manage expectations and minimize disruptions.
Determine the most appropriate course of action
Is scrubbing the Git history the best option for mitigating the leak?
Should the key be revoked immediately, or would doing so disrupt critical services?
Carefully evaluate the potential impact with the help of leadership to avoid unintended downtime, and ensure that alternative solutions are considered if the key plays a vital role in service stability.
Scrubbing Git history (option if rotation is delayed)
⚠️ Important - Monitor logs and other related metrics for misuse if the key cannot be immediately rotated.
⚠️ Important - Once a key is committed to a public repository, it should be considered fully exposed and compromised, and there is no way to guarantee it hasn’t been accessed or copied. Even if the key is quickly removed, it may have already been cached or detected by automated scanning tools.
If the key cannot be immediately rotated, removing the key from the Git history may be the next best step to reduce misuse and to limit the scope of impact.
Use this guide on scrubbing Git history to ensure all traces of the key are removed from the repository.
The git filter-repo tool and the open source BFG Repo-Cleaner are both options for scrubbing Git history.
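Both tools accept a `--replace-text` option that reads a file of expressions, one per line, in the form `text==>replacement`. The sketch below builds the contents of such a file. The helper name `replacement_rules` and the file name in the comments are hypothetical, and the linked guide remains the authoritative procedure.

```ruby
# Build the contents of an expressions file for `--replace-text`.
# Each line has the form "<text to find>==><replacement>".
# `replacement_rules` is a hypothetical helper, not part of either tool.
def replacement_rules(secrets, placeholder: '***REMOVED***')
  lines = secrets.map do |secret|
    # A rule must fit on one line and must not contain the separator itself.
    if secret.empty? || secret.include?('==>') || secret.include?("\n")
      raise ArgumentError, 'secret cannot be expressed as a single rule'
    end

    "#{secret}==>#{placeholder}"
  end
  lines.join("\n") + "\n"
end

# Example usage (the file holds the real secret, so keep it outside the
# repository and delete it afterwards):
#
#   File.write('/tmp/replacements.txt', replacement_rules([leaked_key]))
#   # then, in a fresh clone:
#   #   git filter-repo --replace-text /tmp/replacements.txt
```

Rewriting history changes commit hashes, so this step should be coordinated with the Platform Team before anything is pushed.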
Monitoring
After updating the key, verify that all services are functioning correctly to avoid issues or downtime.
Monitor via Datadog or Sentry.
Postmortem process
After the incident is resolved, complete a postmortem report following the documented process. This helps prevent future occurrences and provides a clear timeline and summary of actions taken.
Start a postmortem draft using the Postmortem template. Postmortems should live in the va.gov-team-sensitive repo under the Postmortems directory.
Prevention best practices
Add sensitive files to .gitignore
Avoid hard coding API keys directly in your code or in configuration files.
If this is unavoidable, please ensure that sensitive values are not committed to the remote repository.
Store secrets locally (in a .env file) or set them as local environment variables to avoid committing them to the public repo. Note: Git-ignore the .env file to ensure it isn’t tracked in Git history.
export API_KEY=your_api_key_here
OR
# example .env file
API_KEY=your_api_key_here
DATABASE_URL=your_database_url_here
# .gitignore
.env
Be mindful of pasting sensitive values into settings files (like settings.yml). We use settings.yml heavily, but take care not to paste sensitive keys there and later commit them to Git history.
api:
  key: <%= ENV['API_KEY'] %>
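On the Ruby side, a secret can be read from the environment in a way that fails fast when it is missing, rather than silently falling back to a hard-coded value. The following is a minimal sketch; `fetch_secret` is a hypothetical helper and `API_KEY` is the placeholder variable name used in the examples above.

```ruby
# Read a required secret from the environment and fail fast if it is absent.
# `fetch_secret` is a hypothetical helper, not an existing Vets API method.
def fetch_secret(name, env: ENV)
  value = env[name]
  if value.nil? || value.strip.empty?
    raise KeyError, "missing required environment variable #{name}"
  end

  value
end

# Example usage:
#   api_key = fetch_secret('API_KEY')
```

Passing the environment as an argument keeps the helper easy to exercise in tests without touching the real `ENV`.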
Use GitHub's built-in secret scanning
The Department of Veterans Affairs participates in a Responsible Vulnerability Disclosure Program through Bugcrowd.
Secret scanning is currently enabled on Vets API. GitHub’s secret scanning automatically identifies and sends alerts upon detection of exposed credentials, API keys, and other sensitive information in GitHub repositories, helping to prevent security breaches.
See the closed issues here (Note: You need permissions to the Vets API GitHub repo to access this link) for examples of past secret scanning alerts. Read more about secret scanning patterns here.
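Secret scanning runs after code reaches GitHub, so a quick local check before committing can catch obvious mistakes earlier. The sketch below is illustrative only: the patterns are simplified assumptions, not GitHub's secret scanning patterns, and `find_suspected_secrets` is a hypothetical helper.

```ruby
# Illustrative patterns only; these are NOT GitHub's secret scanning patterns.
SUSPICIOUS_PATTERNS = {
  'private key header' => /-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----/,
  'bearer token' => /Bearer\s+[A-Za-z0-9\-._~+\/]{20,}=*/,
  'hard-coded key' =>
    /(?:api[_-]?key|secret|token)\s*[:=]\s*['"][A-Za-z0-9\-_]{16,}['"]/i
}.freeze

# Return one finding per (line, pattern) match in the given text.
def find_suspected_secrets(text)
  findings = []
  text.each_line.with_index(1) do |line, number|
    SUSPICIOUS_PATTERNS.each do |label, pattern|
      findings << { line: number, type: label } if line.match?(pattern)
    end
  end
  findings
end

# Example usage:
#   find_suspected_secrets(File.read('config/settings.local.yml'))
```

A check like this produces false positives and misses many real credentials, so it complements GitHub's scanning rather than replacing it.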
Rotate API keys regularly
Set short expiry times for API keys where possible.
If an API key is accidentally leaked, rotate it immediately to minimize impact.
Minimize the scope of API keys to restrict access only to necessary resources.
Enable IP whitelisting or rate limiting where possible to further secure API keys.