Handling failed isolated application pipelines

A failed isolated application pipeline in the main branch of vets-website can cause deployments to fail. Escalating and resolving the failure promptly ensures that code can be deployed on a consistent basis. Application owners are responsible for triaging and resolving failed CI pipelines for their applications in main.

Learn more about the functionality of isolated applications and how to configure an application to be isolated in the Isolated application builds documentation.

Following pipeline failures

You can follow the status of vets-website pipelines in the #status-vets-website Slack channel. The channel has automatic alerts that get sent when a pipeline fails in the main branch. If your team’s application pipeline fails, the alert will tag the Slack user group specified for the application in the allow-list. The alerts will also include a link to the failed pipeline in GitHub Actions.

The link in the alert takes you to the overview page for the failed pipeline run.

Respond to the Slack alert promptly. To let others know that you're looking into the failed pipeline, comment in the alert's thread or react to it with an emoji. Application owners are responsible for resolving isolated application failures.

Common causes of build failures

The following issues commonly cause isolated application pipelines to fail in main.

Flaky tests

Tests can fail for reasons that are not immediately apparent. The failure may be a fluke, or it may indicate a real issue, so investigate every failure.

Cypress tests

You can view failed Cypress tests by opening the Mochawesome report in the Cypress Test Summary section of the workflow.

Steps to resolve failed Cypress tests:

Re-run the failed jobs in the workflow. GitHub Actions supports re-running only the jobs that failed: open the workflow of the failed commit from the commit status, then re-run the failed jobs or the entire workflow.
If the Cypress test(s) fail again and a fix can’t be provided within an hour, disable them. You can do so by merging a PR that skips the test while you continue to investigate the issue.
Once a PR has been merged to either fix the issue or skip the Cypress test(s), verify that the new commit’s pipeline completes successfully.
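
The steps above amount to a small decision flow, which can be sketched as follows. This is only an illustration: the type names, the function name nextAction, and the idea of estimating a fix time in minutes are assumptions made for this example and are not part of vets-website or its tooling. The one-hour threshold comes from the guidance above.

```typescript
// Hypothetical names throughout: this models the triage steps above,
// it is not code from vets-website.
type TriageAction =
  | "rerun-passed"            // the re-run was green; the failure was likely flaky
  | "merge-fix-then-verify"   // a fix can land within the window
  | "merge-skip-then-verify"; // skip the test now, keep investigating

interface TriageInput {
  // Did the re-run of the failed jobs pass?
  rerunPassed: boolean;
  // Estimated minutes until a fix PR could be merged (an assumption of this sketch).
  minutesToFix: number;
}

// One-hour window taken from the guidance above.
const FIX_WINDOW_MINUTES = 60;

function nextAction({ rerunPassed, minutesToFix }: TriageInput): TriageAction {
  if (rerunPassed) {
    return "rerun-passed";
  }
  // The test failed again: fix it if that is quick, otherwise skip it.
  return minutesToFix <= FIX_WINDOW_MINUTES
    ? "merge-fix-then-verify"
    : "merge-skip-then-verify";
}

console.log(nextAction({ rerunPassed: false, minutesToFix: 180 }));
// prints "merge-skip-then-verify"
```

Cypress builds on Mocha, so a test or suite is typically skipped by changing it(...) to it.skip(...) or describe(...) to describe.skip(...). Whichever outcome applies, the last step is the same: confirm that the pipeline for the newly merged commit completes successfully.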

Unit tests

Learn more about flaky unit tests in the Handling flaky unit tests documentation.

Build

The Build job in the pipeline can fail for various reasons. If you notice a build failure due to a Webpack error that is unrelated to your app, you can notify the person on support for the Release Tools Team.

Restarting the daily deploy

If the commit with the failed application pipeline is the most recent commit in the repo when the daily production deploy starts, the deployment will fail. Once the issue has been resolved, the deploy needs to be restarted. Tag the person on support for the Release Tools Team in the Slack thread of the failed build notification and let them know that the issue has been resolved. They will be able to restart the daily production deploy.
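
To make the timing condition concrete, here is a minimal sketch of the check described above: the deploy is affected only when the most recent commit at the moment the deploy starts has a failed pipeline. The MainCommit shape and both function names are hypothetical and exist only for this example. They do not reflect how the deploy job is actually implemented.

```typescript
// Hypothetical shape for illustration; not a vets-website or GitHub API type.
interface MainCommit {
  sha: string;
  committedAt: Date;
  pipelinePassed: boolean;
}

// Returns the most recent commit at the moment the deploy starts,
// or undefined if no commit exists at or before that time.
function latestCommitAt(
  commits: MainCommit[],
  deployStart: Date,
): MainCommit | undefined {
  return commits
    .filter((c) => c.committedAt.getTime() <= deployStart.getTime())
    .sort((a, b) => b.committedAt.getTime() - a.committedAt.getTime())[0];
}

// True when the deploy would pick up a commit whose pipeline failed.
function deployNeedsRestart(commits: MainCommit[], deployStart: Date): boolean {
  const latest = latestCommitAt(commits, deployStart);
  return latest !== undefined && !latest.pipelinePassed;
}

const commits: MainCommit[] = [
  { sha: "aaa111", committedAt: new Date("2024-01-10T09:00:00Z"), pipelinePassed: false },
  { sha: "bbb222", committedAt: new Date("2024-01-10T10:00:00Z"), pipelinePassed: true },
];

// Deploy starts before the fix or skip PR (bbb222) was merged.
console.log(deployNeedsRestart(commits, new Date("2024-01-10T09:30:00Z"))); // true
// Deploy starts after bbb222 was merged and its pipeline passed.
console.log(deployNeedsRestart(commits, new Date("2024-01-10T10:30:00Z"))); // false
```

In the first case the failing commit is the newest one when the deploy starts, so the deploy fails and has to be restarted after the fix is merged. In the second case a later passing commit is already in place and no restart is needed.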
