Continuous deployment to production

Applications on the isolated builds allow-list with continuous deployment enabled can deploy to production when their code merges to main. This page explains what happens end-to-end when a CD-eligible change lands.

How it works

Every commit to main runs the same continuous-integration.yml workflow, which builds, tests, and deploys to dev and staging whether the change is a single-app or a full-site build. The only difference comes at the end: if CI determines that the change is a CD-eligible single-app build, it dispatches a separate workflow (continuous-deploy-production.yml) that also deploys that application's assets to production.

If the change is not CD-eligible (because it touches files outside the app, the app does not have CD enabled, or for any other reason), CI still runs the full workflow. Builds, tests, and dev/staging deploys all proceed normally; only the production CD dispatch is skipped. The change then reaches production via the next daily full-site deploy.

End-to-end flow

  1. CI detects a single-app change. The continuous-integration.yml workflow determines which applications were affected by the merge. If every changed file belongs to a single allow-listed application, or to a group of allow-listed applications, CI performs a partial build that compiles only those apps instead of the entire site.

  2. Build and archive. The partial build output is compressed into a tarball and uploaded to S3 at a path that is distinct from full-site builds:

    • Partial builds: s3://.../partial/<commit-sha>/<buildtype>.tar.bz2

    • Full builds: s3://.../full/<commit-sha>/<buildtype>.tar.bz2

    This path separation ensures a partial build can never be overwritten by a full build (or vice versa), even if both run for the same commit.

  3. Build metadata is embedded. A BUILD_ARTIFACT.txt file is included inside the tarball with metadata about the build:

    • IS_SINGLE_APP_BUILD — whether this is a partial or full build

    • IS_CONTINUOUS_DEPLOYMENT_ENABLED — whether CD is enabled for the app(s)

    • REF — the commit SHA

    • BUILD_TIMESTAMP — when the build was produced

  4. Tests and dev/staging deploys. CI runs unit tests and Cypress tests for the changed app(s), then deploys the build to dev and staging. This happens for every commit to main, whether or not the change is CD-eligible.

  5. CD eligibility check. CI checks that the application has continuousDeployment enabled in the allow-list and that valid entry names exist. If either check fails, the workflow completes normally: the change is already deployed to dev and staging and will reach production via the next daily full-site deploy. Only the production CD dispatch is skipped.

  6. Holiday check. CI checks whether a holiday deploy freeze is active. If so, the production dispatch is skipped and a message is logged.

  7. CD production dispatch. CI sends a repository_dispatch event to trigger the continuous-deploy-production.yml workflow, passing the commit SHA, entry app name, Slack channel, and concurrency group.

  8. Pending deployment notification. The continuous-deploy-production.yml workflow posts a Slack notification to the app team's channel (or #status-vets-website if no team channel is configured) indicating that a deployment is awaiting approval.

  9. Approval gate. The deploy job uses the production-cd GitHub environment, which requires approval from a member of the fe-deployment-approval-team. The workflow pauses here until someone approves. If a new commit for the same concurrency group arrives while approval is pending, GitHub Actions' cancel-in-progress: true setting cancels the waiting run in favor of the newer one — so the older deploy never executes.

  10. Deployment safety checks. After approval, the pipeline runs check-deployability.js, which:

    • Fetches the currently deployed BUILD.txt from S3 to determine what is live

    • Compares the current commit against the deployed commit using structured key-value parsing and SHA validation

    • Waits if another deployment is in progress to avoid conflicts

  11. Build type validation. The partial-deploy.sh script extracts the tarball and validates the embedded BUILD_ARTIFACT.txt — confirming that IS_SINGLE_APP_BUILD=true. If the build type does not match (e.g., a full build was accidentally provided), the script fails immediately with a clear error message. This prevents the class of incident where the wrong build type is deployed with the wrong script.

  12. Partial deployment. partial-deploy.sh deploys only the application's JavaScript and CSS assets to production. It syncs files to both the website bucket and the asset bucket using aws s3 sync without the --delete flag — meaning it only adds or overwrites the app's specific files and never removes anything else from production. This is how multiple apps coexist safely: each CD deploy updates only its own assets. Specifically:

    • Uncompressed JS and CSS are synced to the website bucket

    • Gzip-compressed JS, CSS, and TXT files are synced to the asset bucket with Content-Encoding: gzip

    • HTML scaffold pages are not deployed. During the build, vets-website generates an index.html for each app route (these are the shell pages that load the app's JS/CSS bundles). However, for partial builds, remove-global-assets.sh strips the build directory down to only the app's generated/ JS/CSS chunks before the tarball is created — removing the scaffold HTML at the archive step. As an additional safety net, partial-deploy.sh also filters to only *.js*, *.css*, and *.txt files at deploy time. HTML pages are deployed only by the daily full-site deploy via deploy.sh.

    • Global/shared assets (polyfills, vendor bundles, shared modules, web components, style bundles) are excluded even if they somehow appear in the tarball — the script applies rsync exclusion rules as a safety net on top of the build-time filtering.

  13. Success notification. A final Slack notification confirms the deployment completed successfully. If the deploy fails or is rejected, a failure notification is posted instead.
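The build metadata from step 3 and the build type validation from step 11 can be sketched as follows. This is a minimal illustration, not the logic of partial-deploy.sh itself: it assumes BUILD_ARTIFACT.txt holds one KEY=value pair per line, and the function names and sample values are hypothetical.

```python
def parse_build_artifact(text: str) -> dict[str, str]:
    """Parse KEY=value lines (assumed file format) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


def validate_partial_build(fields: dict[str, str]) -> None:
    """Raise if the metadata does not describe a single-app (partial) build."""
    actual = fields.get("IS_SINGLE_APP_BUILD")
    if (actual or "").lower() != "true":
        raise ValueError(
            f"Expected IS_SINGLE_APP_BUILD=true for a partial deploy, got {actual!r}"
        )


# Sample contents; every value here is invented for illustration.
sample = """IS_SINGLE_APP_BUILD=true
IS_CONTINUOUS_DEPLOYMENT_ENABLED=true
REF=0123abc
BUILD_TIMESTAMP=2026-02-01T12:00:00Z
"""
validate_partial_build(parse_build_artifact(sample))  # passes without raising
```

Failing before any sync starts means a mismatched tarball is rejected while production is still untouched.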

What gets deployed (and what doesn't)

| Deployed by CD (partial deploy) | NOT deployed by CD |
| --- | --- |
| App-specific JS bundles | HTML scaffold pages (stripped at archive time, filtered at deploy time) |
| App-specific CSS | Global polyfills |
| App-specific webpack chunks | Vendor bundles |
| App-specific TXT files | Shared modules |
| | Web components |
| | Style bundles |

Global and shared assets — along with HTML scaffold pages — are deployed only by the daily full-site deploy, which uses deploy.sh and syncs everything with the --delete flag (removing stale files from previous builds).
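The deploy-time filter can be approximated in a few lines. The sketch below assumes the filter works on file names using the three glob patterns mentioned above (*.js*, *.css*, *.txt); the global_prefixes parameter and the example file names are hypothetical stand-ins for the real global-asset exclusion rules, which this page does not list.

```python
from fnmatch import fnmatch

# Patterns that partial deploys are described as syncing.
DEPLOYABLE_PATTERNS = ("*.js*", "*.css*", "*.txt")


def select_deployable(paths, global_prefixes=()):
    """Return the paths a partial deploy would sync.

    global_prefixes is a hypothetical list of file-name prefixes that mark
    global/shared assets; matching files are excluded as a safety net.
    """
    selected = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        if any(name.startswith(prefix) for prefix in global_prefixes):
            continue  # shared asset: only the daily full-site deploy ships it
        if any(fnmatch(name, pattern) for pattern in DEPLOYABLE_PATTERNS):
            selected.append(path)
    return selected


files = [
    "generated/my-app.entry.js",
    "generated/my-app.css",
    "my-app/index.html",
    "generated/polyfills.entry.js",
]
deployable = select_deployable(files, global_prefixes=("polyfills",))
# deployable == ["generated/my-app.entry.js", "generated/my-app.css"]
```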

Concurrency and deployment ordering

The CD production workflow uses GitHub Actions concurrency groups to prevent conflicting deployments. When CI dispatches a CD deploy, it includes a concurrency_group derived from the application's rootFolder name in the allow-list. The continuous-deploy-production.yml workflow is configured with:

concurrency:
  group: deploy-<concurrency_group>
  cancel-in-progress: true

This means:

  • One deploy per app at a time. If a second commit for the same app arrives while a deploy is pending approval or in progress, the earlier run is cancelled and the newer commit takes its place.

  • Different apps deploy independently. A deploy for vaos does not interfere with a deploy for check-in — they have different concurrency groups.
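A small simulation makes the two rules concrete. It models only the cancel-in-progress behaviour, under the simplifying assumption that no run finishes before the next one for its group arrives; the function name and SHAs are hypothetical.

```python
def surviving_runs(runs):
    """Map each concurrency group to the commit that ends up deploying.

    runs is a list of (concurrency_group, sha) pairs in arrival order.
    """
    latest = {}
    for group, sha in runs:
        # A newer run in the same group cancels the older one.
        latest[f"deploy-{group}"] = sha
    return latest


result = surviving_runs([("vaos", "a1"), ("check-in", "b1"), ("vaos", "a2")])
# result == {"deploy-vaos": "a2", "deploy-check-in": "b1"}
```

The second vaos commit replaces the first, while the check-in deploy is unaffected because it lives in a different group.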

Grouped applications

Some allow-list entries have a rootFolder that contains multiple applications (e.g., check-in contains pre-check-in, day-of check-in, and travel claim). All applications under the same rootFolder share the same concurrency group. This is usually correct — changes to one app in the group should supersede any pending deploy for the group.

Edge case: overlapping concurrency groups

A rare scenario can cause out-of-order deployments when two commits affect the same app through different concurrency groups:

  1. Commit A changes src/applications/my-app/ only → concurrency group my-app

  2. Commit B changes files in both src/applications/my-app/ and src/applications/other-app/ → concurrency group my-app,other-app (a grouped build)

These are different concurrency groups (my-app vs my-app,other-app), so GitHub Actions treats them as independent — both can run simultaneously. If Commit A's deploy completes after Commit B's, production will have an older version of my-app.
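The mismatch is easy to reproduce with a sketch that derives a group name from the changed paths. How CI actually builds the name is not shown on this page, so joining the sorted folder names with a comma is an assumption chosen to match the example above, and the function name is hypothetical.

```python
def concurrency_group(changed_files, apps_root="src/applications/"):
    """Derive a concurrency group name from changed file paths.

    Returns None when any file lies outside the applications directory,
    since that triggers a full build with no CD dispatch.
    """
    folders = set()
    for path in changed_files:
        if not path.startswith(apps_root):
            return None
        folders.add(path[len(apps_root):].split("/", 1)[0])
    return ",".join(sorted(folders)) if folders else None


commit_a = concurrency_group(["src/applications/my-app/index.js"])
commit_b = concurrency_group([
    "src/applications/my-app/index.js",
    "src/applications/other-app/form.js",
])
# commit_a == "my-app", commit_b == "my-app,other-app"
# The names differ, so neither run cancels or waits for the other.
```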

Recommendation: If your application is part of a group that frequently changes together, coordinate with the other app teams. Avoid merging single-app changes immediately before or after a grouped change that includes your app.

Enabling continuous deployment for your application

Continuous deployment is enabled by default for applications on the allow-list. To explicitly control it, set the continuousDeployment field in config/changed-apps-build.json:

{
  "rootFolder": "your-app",
  "slackGroup": "@your-team",
  "continuousDeployment": true
}

Setting continuousDeployment to false disables CD for the application — changes will only reach production via the daily full-site deploy.
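The default-on behaviour amounts to treating a missing field as true. A minimal sketch, assuming each allow-list entry is a JSON object shaped like the one above (the helper name is hypothetical):

```python
import json


def is_cd_enabled(entry: dict) -> bool:
    """CD is on unless the entry explicitly sets continuousDeployment to false."""
    return entry.get("continuousDeployment", True) is not False


implicit = json.loads('{"rootFolder": "your-app", "slackGroup": "@your-team"}')
disabled = json.loads(
    '{"rootFolder": "your-app", "slackGroup": "@your-team", "continuousDeployment": false}'
)
# is_cd_enabled(implicit) is True; is_cd_enabled(disabled) is False
```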

If your application is not on the allow-list yet, see How to add your application to the allow-list.

Safety controls

The CD pipeline includes multiple layers of protection that prevent the class of failure that caused the January 2026 deployment incident:

| Control | What it prevents |
| --- | --- |
| S3 path separation | Partial and full builds use distinct S3 paths (partial/ vs full/), eliminating the possibility of one overwriting the other |
| Build type validation | Deployment scripts validate BUILD_ARTIFACT.txt before executing; a script receiving the wrong build type fails immediately |
| Approval gate | The production-cd environment requires approval from the fe-deployment-approval-team before any deploy executes |
| Deployability checks | check-deployability.js prevents concurrent deployments from conflicting and validates SHA integrity |
| Additive sync (no --delete) | partial-deploy.sh only adds or overwrites the app's own files; it never removes other files from production |
| Double asset filtering | Global assets are removed during both the build archive step and the deployment step, ensuring partial deploys never touch shared platform code |
| Concurrency controls | GitHub Actions concurrency groups prevent multiple deploys for the same app from running simultaneously; newer commits cancel pending older ones |
| Dry-run mode | Engineers can run partial-deploy.sh -n locally against a real tarball to verify behavior without any S3 interaction |
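As an illustration of the first control, the sketch below builds an archive location from the build type. The base prefix is elided in this page, so it is passed in as a parameter; the function name, the example bucket, and the example buildtype value are hypothetical.

```python
def artifact_key(base_prefix: str, commit_sha: str, buildtype: str,
                 is_single_app_build: bool) -> str:
    """Return the archive location, keeping partial and full builds apart."""
    kind = "partial" if is_single_app_build else "full"
    return f"{base_prefix.rstrip('/')}/{kind}/{commit_sha}/{buildtype}.tar.bz2"


# Hypothetical prefix and buildtype, for illustration only.
partial = artifact_key("s3://example-bucket/builds", "0123abc", "examplebuildtype", True)
full = artifact_key("s3://example-bucket/builds", "0123abc", "examplebuildtype", False)
# partial == "s3://example-bucket/builds/partial/0123abc/examplebuildtype.tar.bz2"
# full    == "s3://example-bucket/builds/full/0123abc/examplebuildtype.tar.bz2"
```

Because the build type is part of the path, two builds of the same commit can never resolve to the same object.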

When the production CD dispatch does not occur

In all of the following cases, the CI workflow still runs normally — builds, tests, and dev/staging deploys all proceed. The only thing that is skipped is the dispatch to continuous-deploy-production.yml:

  • The PR changes files outside the application's src/applications/<rootFolder> directory (CI performs a full build instead of a partial build)

  • The application is not on the allow-list

  • The application has continuousDeployment: false in the allow-list

  • The merge target is not the main branch

  • A holiday deploy freeze is active

In all of these cases, the change will reach production via the next daily full-site deploy.
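The conditions above can be read as a single decision function. The sketch below is an interpretation of this list, not the CI workflow's actual code: the MergeContext type, its field names, and the order of the checks are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MergeContext:
    target_branch: str
    changed_files: list[str]
    allow_list: dict[str, dict]  # rootFolder -> allow-list entry
    holiday_freeze: bool


def cd_skip_reason(ctx: MergeContext, apps_root: str = "src/applications/"):
    """Return None when the production dispatch should happen, else the reason it is skipped."""
    if ctx.target_branch != "main":
        return "merge target is not main"
    folders = set()
    for path in ctx.changed_files:
        if not path.startswith(apps_root):
            return "files changed outside application directories (full build)"
        folders.add(path[len(apps_root):].split("/", 1)[0])
    if not folders:
        return "no application files changed"  # defensive guard, not from this page
    for folder in sorted(folders):
        entry = ctx.allow_list.get(folder)
        if entry is None:
            return f"{folder} is not on the allow-list"
        if entry.get("continuousDeployment", True) is False:
            return f"{folder} has continuousDeployment: false"
    if ctx.holiday_freeze:
        return "holiday deploy freeze is active"
    return None


allow_list = {
    "my-app": {"rootFolder": "my-app"},
    "other-app": {"rootFolder": "other-app", "continuousDeployment": False},
}
eligible = MergeContext("main", ["src/applications/my-app/index.js"], allow_list, False)
# cd_skip_reason(eligible) returns None, so the dispatch would proceed
```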

FAQ

Q: My app is on the allow-list but my change went out with the daily deploy instead of CD. Why?

A: The most common reason is that your PR included changes to files outside your app directory (e.g., shared platform code, feature flag names, other apps). When that happens, CI performs a full build and the production CD dispatch does not occur. Your change still deployed to dev and staging normally — it just reaches production with the daily deploy instead.

Q: My CD deploy was approved and succeeded, but production is showing an older version of my app. Why?

A: This can happen if your CD deploy went out after the daily Build and Tag workflow ran (which snapshots main for the daily deploy) but before the daily deploy itself executed (1:00 PM ET cron, sometimes slightly later depending on GitHub Actions traffic). The daily full-site deploy uses deploy.sh with the --delete flag, which syncs the full build and removes files that are not in that snapshot, so it overwrites your newer CD-deployed assets with the older tagged build. To fix this, re-run your continuous-deploy-production.yml workflow from the GitHub Actions UI; it will redeploy your newer code on top of the daily deploy. To avoid the situation, do not merge CD-eligible changes in the hour before the daily deploy (approximately noon to 1:00 PM ET).
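A team that wants a pre-merge reminder could check the clock against that window. The helper below is a hypothetical convenience, not part of the pipeline; it treats the window as exactly 12:00 to 13:00 in America/New_York and expects a timezone-aware datetime.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def in_daily_deploy_risk_window(moment: datetime) -> bool:
    """True when moment falls between noon and 1:00 PM Eastern Time."""
    local = moment.astimezone(EASTERN)
    return time(12, 0) <= local.time() < time(13, 0)


# 17:30 UTC on 2 March 2026 is 12:30 PM Eastern Standard Time.
risky = in_daily_deploy_risk_window(datetime(2026, 3, 2, 17, 30, tzinfo=timezone.utc))
# risky is True
```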

Q: Can CD deploy HTML changes?

A: No. Although vets-website generates scaffold HTML pages during the build (the index.html shell pages that load each app's JS/CSS bundles), these are stripped out of partial builds by remove-global-assets.sh at archive time. As an additional safeguard, partial-deploy.sh only syncs *.js*, *.css*, and *.txt files. HTML pages reach production only via the daily full-site deploy (deploy.sh).

Q: Why does the production deploy pause for approval instead of deploying automatically like dev and staging?

A: Dev and staging deploys are truly continuous — they happen automatically as part of CI with no human intervention. The production approval gate is intentional: it gives your team a window to verify the change on staging before pushing it to production, while still letting you deploy to production on your own timeline without waiting for the next daily full-site deploy.

Q: What happens if a CD deploy fails?

A: The deployment script fails fast and posts a failure notification to Slack. The previous version of your app's assets remains live on production. The change will be included in the next daily full-site deploy automatically.

Q: What happens if nobody approves the deploy?

A: The workflow will eventually time out and be marked as cancelled. A Slack notification is posted indicating the deploy expired without approval. If a newer commit for the same app arrives while approval is pending, the older run is automatically cancelled in favor of the newer one.

Q: Can I test what a CD deploy would do without actually deploying?

A: Yes. Use dry-run mode: partial-deploy.sh -n -s /path/to/tarball.tar.bz2 -d s3://target -a s3://assets. This runs the full deployment logic (extraction, validation, filtering) but replaces all S3 operations with logged output showing what would have been synced.
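The general dry-run pattern looks like the sketch below: build the command, then either log it or execute it. How partial-deploy.sh implements -n internally is not shown on this page, so the wrapper name and its parameters are hypothetical; the aws s3 sync options used (--exclude, --include) are standard AWS CLI options, and --delete is deliberately absent.

```python
import shlex
import subprocess


def s3_sync(source, dest, includes, dry_run, extra_args=()):
    """Run, or in dry-run mode only describe, an additive aws s3 sync."""
    # Exclude everything first, then re-include only the wanted patterns.
    cmd = ["aws", "s3", "sync", source, dest, "--exclude", "*"]
    for pattern in includes:
        cmd += ["--include", pattern]
    cmd += list(extra_args)
    if dry_run:
        return "[dry-run] would run: " + shlex.join(cmd)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return "ran: " + shlex.join(cmd)


message = s3_sync("build/", "s3://target", ["*.js*", "*.css*", "*.txt"], dry_run=True)
```

In dry-run mode nothing is executed, so the function is safe to call without AWS credentials.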