Vets API on EKS
Intro
The vets-api development, staging, sandbox, and production environments currently reside in EKS. This document provides background information on how EKS works and how to work with Vets API from the EKS infrastructure.
How does it work?
EKS
EKS is a managed service that you can use to run Kubernetes on AWS. It removes the need to install, operate, and maintain your own Kubernetes control plane, and it handles container orchestration by managing the underlying infrastructure components for you.
ECR
ECR is an AWS managed container image registry service. A registry exists in ECR for the Vets API-specific images. When a change is pushed to the master branch, the Docker image is built and then pushed to the Vets API ECR registry.
ArgoCD
ArgoCD provides a UI for developers to see the application and manages the deployment by checking the desired state against the deployed state.
GitHub Actions
Vets API in EKS utilizes GitHub Actions to build and update the docker image when a change is pushed to the master branch.
The vets-api deployments no longer use the k8s branch to deploy to EKS.
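The workflow below is only a hedged sketch of that pattern, not the actual vets-api workflow; the workflow name, registry address, and credential setup are assumptions.
name: Build and push vets-api image     # illustrative name
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Log in to ECR             # assumes AWS credentials are already configured for the runner
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push image tagged with the commit SHA
        env:
          ECR_REGISTRY: <account-id>.dkr.ecr.<region>.amazonaws.com   # placeholder registry address
        run: |
          docker build -t "$ECR_REGISTRY/vets-api:${GITHUB_SHA}" .
          docker push "$ECR_REGISTRY/vets-api:${GITHUB_SHA}"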
Helm charts
The vets-api EKS deployment utilizes a custom helm chart that resides in the vets-api repository. The vets-api manifest files then reference the helm chart package and custom values are passed to the chart.
Utilizing a helm chart simplifies the deployment, improves maintainability, and reduces repeated code. vets-api-server (puma) and vets-api-worker (sidekiq) are bundled into the same parent helm chart.
More on helm charts here.
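As a rough illustration of that relationship (these keys and values are hypothetical, not the chart's actual interface), a manifest might pass custom values to the parent chart along these lines:
vets-api:
  web:                      # vets-api-server (puma)
    replicaCount: 3
  worker:                   # vets-api-worker (sidekiq)
    replicaCount: 2
  image:
    tag: <commit-sha>       # set by the deploy workflow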
Access
Access to the vets-api EKS applications is managed via GitHub teams (linked below). To obtain access, fill out a Vets-api ArgoCD terminal access request form. Note: prod access requires OCTO-DE approval and will take longer to obtain than access to the lower environments.
Vets API GitHub teams
Terminal access
Links to Vets API in ArgoCD (requires SOCKS)
Access the terminal via Argo
Navigate to http://argocd.vfs.va.gov/applications (requires SOCKS)
Search for "vets-api-{env-here}" in the search bar
Click on a vets-api-web-* pod (far right). Note: Look for the pod icon
A Terminal tab will appear on the far right
Note: If you get an error or don't see the tab, log out of ArgoCD and log back in. If that doesn't work, double check that you are a member of the GitHub team for the environment you're in.
Rails console access
Follow the steps above
Run bundle exec rails c
Vets API settings and secrets
With EKS, settings and secrets are configured via EKS resources and definitions.
Secret values
The vets-api deployment utilizes secret references via a combination of the ExternalSecret resource and ENV vars in the values.yaml. The ENV vars can then be referenced via Ruby ERB syntax in the settings.local.yml values in the configMap definition. If your setting does not need to be secret, it can simply be added to the settings.local.yml configMap definition in the values.yaml (see the "Creating or updating a non-sensitive value" section). Details on all of this are below.
Care and attention to detail should be taken when adding secrets to vets-api: a misconfigured secret in Parameter Store or in the code will cause a vets-api pod to fail.
Creating or updating a non-sensitive value
A non-sensitive value is something that doesn't need to be stored in AWS Parameter Store (for example, mock_debts: false or service_name: VBS). In this case, you can add the value to the settings.local.yml configMap section of values.yaml, as sketched below.
Adding a cert
For adding certs, see Add certs as secrets to vets-api. It’s uncommon, but these instructions are for adding a cert or other item that needs to end up at a very specific mount path in the pod.
Steps to create a new secret value
A secret value is a sensitive value that needs to be stored in AWS Parameter Store. Most items belong in settings-local-secrets
and you can follow the steps below to get your secret in vets-api
. Steps for adding a settings-local-secret
:
Add your secret to Parameter Store:
aws ssm put-parameter --name /dsva-vagov/vets-api/dev/your_value_goes_here --value your_value_goes_here --type SecureString --overwrite
In the settings-local-secrets section of secrets.yaml, add an entry (key and name).
Note: IMPORTANT: if you're updating the dev file, vets-api/dev/templates/secrets.yaml, please update the value in two sections: the section mentioned above and the section formatted like this:
- secretKey: tt1_ssm_testing
  remoteRef:
    key: /dsva-vagov/vets-api/dev/tt1/testing
Note: All keys are listed twice in this dev secrets file. This is to accommodate testing for an EKS upgrade. This is necessary because we have a test cluster based on the Vets API dev manifest, used for testing new EKS versions (and manifest resource versions, etc). If the values are missing or do not match what is set in the live dev environment, the new cluster sync will fail, requiring the Platform team to manually correct and update the values, hindering our automated test processes.
To verify that your values are correctly in the two sections (described in the note above), check that there is a value under both the commented sections titled "# NEW EKS CLUSTER" and "# OLD EKS CLUSTER".
Add a new entry to the settings-local-secrets definition in values.yaml. The name and path need to match the key and name added in the previous step. Include an env_var definition.
Note: Be sure that the path and the name match exactly what you have placed in the ExternalSecret resource in the step above.
In that same file, values.yaml, add your setting to the settings-configmap configMap definition and reference the ENV var you just created. (The settings.local.yml section uses .erb syntax.)
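Putting those steps together, here is a rough end-to-end sketch using the placeholder value from the Parameter Store command above. The surrounding keys, the ordering of the duplicate sections, and the setting name are illustrative assumptions; confirm the exact structure against secrets.yaml and values.yaml.
# secrets.yaml: the same dev value listed in both formats (one per commented cluster section)
- key: /dsva-vagov/vets-api/dev/your_value_goes_here
  name: your_value_goes_here
- secretKey: your_value_goes_here
  remoteRef:
    key: /dsva-vagov/vets-api/dev/your_value_goes_here

# values.yaml: the secret entry that becomes an ENV var on the pods
secrets:
  settings-local-secrets:
    - name: your_value_goes_here
      path: /dsva-vagov/vets-api/dev/your_value_goes_here
      env_var: YOUR_VALUE_GOES_HERE

# values.yaml: the settings-configmap entry that reads the ENV var via .erb
settings.local.yml: |
  your_setting: <%= ENV['YOUR_VALUE_GOES_HERE'] %>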
Steps to update/rename an existing value
If the parameter store secret path hasn't changed, just update the value in parameter store.
If the parameter store secret path HAS changed:
Update the path name in secrets.yaml.
Update the corresponding env_var definition (path and/or env_var) in the vets-api-secrets definition in values.yaml.
How do the secrets work with the parent helm charts?
An ExternalSecret Custom Resource Definition (CRD) was created here to pull in secrets from parameter store.
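The actual template lives at the link above; as a hedged sketch of what an ExternalSecret pulling from Parameter Store generally looks like (the apiVersion, secret store name, and store kind here are assumptions about the cluster setup, not taken from the vets-api chart):
apiVersion: external-secrets.io/v1beta1      # assumed operator API version
kind: ExternalSecret
metadata:
  name: settings-local-secrets
spec:
  refreshInterval: 1h                        # illustrative
  secretStoreRef:
    name: aws-parameter-store                # hypothetical store name
    kind: ClusterSecretStore
  target:
    name: settings-local-secrets             # the Kubernetes Secret that gets created
  data:
    - secretKey: kms_key_id
      remoteRef:
        key: /dsva-vagov/vets-api/dev/kms_key_id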
ENV vars are created on the deployment resource by looping through the secrets definition in the values.yaml.
The deployment resource:
{{- range $keys, $key := $root.Values.common.secrets }}
{{- range $secrets, $secret := $key }}
- name: {{ $secret.env_var }}
  valueFrom:
    secretKeyRef:
      name: {{ $keys }}
      key: {{ $secret.name }}
{{- end }}
{{- end }}
The start of the secrets definition in values.yaml:
secrets:
  vets-api-secrets:
    - name: sidekiq_license
      path: /dsva-vagov/vets-api/common/sidekiq_license
      env_var: BUNDLE_ENTERPRISE__CONTRIBSYS__COM
  settings-local-secrets:
    - name: kms_key_id
      path: /dsva-vagov/vets-api/dev/kms_key_id
      env_var: KMS_KEY_ID
This configMap definition renders a configMap from the definition in the values.yaml.
Parameter Store updates will not trigger pods to be replaced or Secrets to be reloaded. Any changes made in Parameter Store will not be deployed until the next ArgoCD sync is applied.
A version can be added to the end of a parameter path to ensure the correct value is deployed to Vets-API. For example:
If no version is defined, the latest version in the Parameter Store will be used:
settings-local-secrets:
  - name: tt1_ssm_testing
    path: /dsva-vagov/vets-api/dev/tt1/testing
    env_var: TT1_SSM_TESTING
Even though there are 3 versions in the Parameter Store, Vets-API will use the 2nd version because it’s defined after the parameter path:
settings-local-secrets:
  - name: tt1_ssm_testing
    path: /dsva-vagov/vets-api/dev/tt1/testing:2
    env_var: TT1_SSM_TESTING
Vets API EKS deploy process
How it works
Vets API in EKS deploys from the master branch. The deploy process consists of a combination of GitHub Actions, ECR, yaml manifests, and ArgoCD.
Deploy Process Overview
The following steps detail how changes are deployed to EKS
A change is committed to the master branch
This automatically kicks off a GitHub Actions workflow to build the Docker image, push it to ECR, and update the manifest file image tag.
Argo is configured to autosync the vets-api application upon a change to the manifest file (autosync_enabled defaults to true).
Argo auto syncs the vets-api dev application (ArgoCD requires SOCKS)
Changes are deployed
Again, Vets API utilizes the custom helm chart.
Deploy Process Details
After committing a change to master, you should be able to see when your change was deployed. Once you merge a change, after the image is pushed to ECR, the manifest repo image_tag will be updated with the commit SHA of your change via the VA VSP BOT. Watch the autosync for the manifest commit message and SHA.
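As a rough illustration (the exact key layout in the manifest repo may differ), the commit from the VA VSP BOT updates something like:
image_tag: <commit-sha>    # SHA of your merged vets-api change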
The Rolling Update
Vets API on EKS utilizes a rolling update pattern to ensure zero downtime and no disruption to service. This will incrementally replace pods with new ones, while gracefully draining connections on pods rolling out of service. See more on rolling updates here.
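For reference, this is the standard Kubernetes Deployment rolling update strategy; the surge and unavailability numbers below are illustrative, not the actual vets-api settings:
# lives under the Deployment's spec
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%          # extra pods allowed while new ones come up
    maxUnavailable: 0      # keep every existing pod serving until its replacement is ready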
Bulkhead Deployment Pattern
The Bulkhead deployment pattern, utilized in our production environment, acts as a safeguard mechanism, compartmentalizing sections of Vets API through defined ingress routes. This provides fault tolerance: even if one set of pods has an issue, the overall application remains undisturbed and performance characteristics such as latency stay consistent. Currently, several latency-prone and high-traffic routes are directed to their own dedicated bulkheads.
Metrics related to the current bulkhead deployments can be viewed on this Datadog dashboard. We manage these bulkheads through ingress routes, service classes, and distinct pod deployments managed by ReplicaSet resources. Ultimately, we aim to have each distinct logical code grouping or product (think of the modules in Vets API) served by its own bulkhead, which would approximate fault-tolerant microservices. Currently, a number of routes benefit from the bulkhead deployment pattern, which provides benefits such as log segregation, increased resiliency, and simplified debugging. All "common" routes funnel to the vets-api-web pods. Detailed definitions of our existing bulkheads can be found here, defined under the webServices key in the manifest repo.
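Conceptually, a bulkhead is an ingress rule that routes a specific path to its own service and set of pods. The snippet below is a conceptual sketch only; the real definitions live under the webServices key in the manifest repo, and the service name, port, and schema here are assumptions:
# illustrative Kubernetes Ingress path rule for the feature-toggles bulkhead
- path: /v0/feature_toggles
  pathType: Prefix
  backend:
    service:
      name: vets-api-web-feature-toggles   # hypothetical dedicated service
      port:
        number: 3000                       # hypothetical puma port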
Bulkhead image visualized
The image below showcases our current bulkhead deployments, focusing particularly on the feature-toggles bulkhead. This structure uses a ReplicaSet to guarantee a consistent number of running pods within each bulkhead. Furthermore, the ReplicaSet actively preserves the desired pod replicas count, ensuring resilience and constant availability. Every bulkhead scales autonomously based on custom Datadog metrics related to available puma threads. Alongside feature-toggles, the image also displays other operational bulkheads, as evident in their respective ReplicaSets and pods.
Resource Hook & Deployment Flow
For further details on the deployment rollout process and details around hook configuration and pre-sync ordering, see the “EKS Deployment Resource Hook Configuration & Deployment Flow” document.
graph TD
A[Vets API] -->|Commit to master branch| B{GitHub Actions}
B -->|One| D[Build Image & Push to ECR]
B -->|Two| E[Deploy]
B -->|Three| H[Code Checks & Linting]
E -->|Parse & Update yaml| F[Update Manifest File Image Tag]
F -->|Commit| G[Argo Detects Change]
G -->|Argo Sync| I[Changes Deployed]
ClamAV
Prior to EKS, ClamAV (the virus scanner) was deployed in the same process as Vets API. With EKS, ClamAV has been broken out into a sidecar deployment that lives on the vets-api server and worker pods. See ClamAV repo for further details. Essentially, this new pattern allows us to extract the ClamAV service outside of Vets API to adopt the single responsibility pattern.
ClamAV is updated every hour, on the hour, to ensure that the signature database stays up to date, via the mirror-images.yml and ci.yml GitHub Actions workflows. Essentially, this follows the same deployment pattern as Vets API: images are pushed to ECR and the VA VSP BOT updates the manifest with the new image tag.
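Conceptually, the sidecar pattern just means the vets-api pod spec lists a second container for ClamAV alongside the server or worker container; a minimal sketch (container names and image references are illustrative):
spec:
  containers:
    - name: vets-api-web            # puma (or sidekiq on worker pods)
      image: <vets-api ECR image>
    - name: clamav                  # sidecar virus scanner
      image: <clamav ECR image>
      ports:
        - containerPort: 3310       # clamd's default TCP port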
Help and feedback
Get help from the Platform Support Team in Slack.
Submit a feature idea to the Platform.