Incident Call Rules: Swarm Room vs. MIM Bridge
Last Updated:
Overview
For every Major Incident, two calls will be opened during the resolution process:
1. A Swarm Room
2. A MIM Bridge
Swarm Room
The Swarm Room is the dedicated space for engineers actively diagnosing and resolving the root cause of the incident. Its purpose is twofold:
1. Enable engineers to collaborate and restore service as quickly as possible.
2. Allow others to listen in and capture root cause and resolution details to support outward communications (including updates on the MIM Bridge).
If you are not an engineer working on resolving the issue, you are welcome in the Swarm Room, but please follow these rules:
This is a safe space for engineers to openly explore potential causes and solutions. All engineers are welcome, but focus must remain on problem-solving.
Please do not disrupt active troubleshooting. If you have questions, submit them in the chat. They will either be addressed during the call, answered by others in attendance, or captured for follow-up in the Post Mortem.
Anyone directly involved in the incident or troubleshooting is welcome on the call, but the following people are REQUIRED to attend:
VA Platform Leadership: Erika Washburn and Steve Albers
Platform Contract Leadership: Andrea Townsend, Lindsey Hattamer, Jason Woodman
Incident Commander: Individual on IC rotation from the Support team.
Support Team TL: Brandon Dech (backup: Lindsay Insco)
TL from the team that owns the broken service: Curt Bonade, Steven Venner, Ken Mayo
MIM Bridge
For each Major Incident, the MIM Team will open a MIM Bridge after the Incident Commander designates the MIM as urgent/critical.
On the Bridge, the MIM Team will gather essential information, such as:
A description of the incident
Impacted services
Number of affected users
Everyone is welcome on the MIM Bridge, but the following people are REQUIRED to attend:
A senior team member (a TL or another senior member of the team; for example, Kyle for IST or Curt for SRE) who is not hands-on resolving the issue but can provide clear, timely updates. They must be able to answer questions from the MIM Team about the incident and the status of the resolution. This person should bridge communication between the Swarm Room and the MIM Bridge.
Incident Commander: Individual who submitted the MIM
Program Manager: Andrea Townsend (backup: Em Allan)
Engineering Lead: Lindsey Hattamer (backup: Clint Little)
VA Technical Leadership: Steve Albers (backup: Andrew Mo)
VA Product Leadership: Erika Washburn (backups: Marni, Chris J)
If you are named as required and cannot attend, you must designate your backup in the Platform Leadership channel in Slack.
Help and feedback
Get help from the Platform Support Team in Slack.
Submit a feature idea to the Platform.