Major Incident Management Playbook
Last Updated:
Overview
This document will help all VFS team members working on VA.gov understand how to report major incidents to the Major Incident Management (MIM) team when necessary. Please read this document carefully as you prepare to make a call to raise a potential major incident.
Acronyms
ESD - Enterprise Service Desk
MIM - Major Incident Management
HPI - High Priority Incident
CPI - Critical Priority Incident
SNOW “ITIL” - ServiceNow Information Technology Infrastructure Library
MI - Major Incident
Reporting an HPI/CPI Incident to the Major Incident Management team
Contact the MIM team by calling the ESD directly and selecting option 9. This bypasses ESD Tier 1 and provides a direct connection to the MIM team.
Calling to report the HPI/CPI incident to the MIM
After calling the Enterprise Service Desk at (855) 673-4357 and selecting option 9, you will hear the following prompts:
1. Welcome to the Major Incident Management team. Are you an OIT IT Professional, Area Manager, Program Manager, Senior Engineer, Administrative Officer of the Day, or Nurse of the Day who can answer questions on this incident?
• If you are none of these, press 1 for No. You will hear the following:
A. If you believe a major incident needs to be submitted, please contact an OIT IT Professional, Area Manager, Program Manager, Senior Engineer, Administrative Officer of the Day or Nurse of the Day to call the Enterprise Service Desk and select option 9. You are now being forwarded to the Enterprise Service Desk. Please select option 2 to be connected with an agent.
• If yes, press 2 for Yes. Go to step 2.
2. Is there an acceptable workaround in place?
• If no, press 1 for No. Go to step 3.
• If yes, press 2 for Yes. You will hear the following:
A. If a workaround is available, this would not qualify as a major incident. You are being forwarded to the Enterprise Service Desk to report your issue. Please select Option 2 to be connected with an agent.
3. Is this a service or application impacting a VA facility or nationwide?
• If no, press 1 for No. You will hear the following:
A. Incidents that do not impact a VA Facility or Nationwide are handled via the Enterprise Service Desk. You are being forwarded to the Enterprise Service Desk. Please select Option 2 to be connected with an agent.
• If yes, press 2 for Yes. You will hear the following:
A. You are now being connected to a MIM agent.
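The phone menu above can be sketched as a small decision function, which may help when rehearsing the call. This is only an illustrative model of the three prompts as described in this playbook; the names `Outcome` and `route_call` are hypothetical and do not correspond to any VA system.

```python
from enum import Enum


class Outcome(Enum):
    ESD_NOT_AUTHORIZED = "Forwarded to ESD: caller is not in an authorized role"
    ESD_WORKAROUND = "Forwarded to ESD: an acceptable workaround exists"
    ESD_LOCAL_IMPACT = "Forwarded to ESD: no facility-wide or nationwide impact"
    MIM_AGENT = "Connected to a MIM agent"


def route_call(authorized_caller: bool, workaround_in_place: bool,
               facility_or_nationwide: bool) -> tuple[list[int], Outcome]:
    """Return the keys to press after option 9 and where the call ends up."""
    # Step 1: is the caller in an authorized role? 1 = No, 2 = Yes.
    if not authorized_caller:
        return [1], Outcome.ESD_NOT_AUTHORIZED
    keys = [2]
    # Step 2: is there an acceptable workaround? 1 = No (continue), 2 = Yes.
    if workaround_in_place:
        return keys + [2], Outcome.ESD_WORKAROUND
    keys.append(1)
    # Step 3: does it impact a VA facility or the nation? 1 = No, 2 = Yes.
    if not facility_or_nationwide:
        return keys + [1], Outcome.ESD_LOCAL_IMPACT
    return keys + [2], Outcome.MIM_AGENT
```

Under this model, the only path that reaches a MIM agent is `route_call(True, False, True)`, which corresponds to pressing 2, then 1, then 2.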
Questions that will be asked during the reporting call
What is the affected service?
Is the system/application in the Critical Systems List?
Is the affected service completely unavailable?
Is the affected service experiencing latency or degradation?
Is there a function or feature within the service or application that’s not working?
Is there a specific error message shown?
When did the service disruption first begin?
How many people are impacted?
Is this impacting an entire Facility, VISN, District, or Nation?
What mission essential task is the business unable to perform?
Is there another way, such as a contingency plan, to perform that mission essential task?
Is someone in OIT already troubleshooting this issue?
Is there ongoing maintenance or a known change occurring?
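Since the MIM agent will ask all of the questions above, it can help to jot down answers before dialing. The following is a minimal sketch of a pre-call checklist; the helper name `unanswered` and the idea of keeping notes in a dictionary are assumptions made for illustration, not part of the MIM process.

```python
# The questions listed in this playbook, in the order given.
CALL_QUESTIONS = [
    "What is the affected service?",
    "Is the system/application in the Critical Systems List?",
    "Is the affected service completely unavailable?",
    "Is the affected service experiencing latency or degradation?",
    "Is there a function or feature within the service or application that's not working?",
    "Is there a specific error message shown?",
    "When did the service disruption first begin?",
    "How many people are impacted?",
    "Is this impacting an entire Facility, VISN, District, or Nation?",
    "What mission essential task is the business unable to perform?",
    "Is there another way, such as a contingency plan, to perform that mission essential task?",
    "Is someone in OIT already troubleshooting this issue?",
    "Is there ongoing maintenance or a known change occurring?",
]


def unanswered(notes: dict[str, str]) -> list[str]:
    """Return the questions that have no non-empty answer in `notes`."""
    return [q for q in CALL_QUESTIONS if not notes.get(q, "").strip()]


if __name__ == "__main__":
    # Hypothetical example: only the first question has been answered so far.
    notes = {CALL_QUESTIONS[0]: "VA.gov"}
    for question in unanswered(notes):
        print("Still needed:", question)
```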
Who will make this call?
This is the question we need to answer. Guidance from the Major Incident Management Team states the caller must be one of the following: OIT IT Professional, Area Manager, Program Manager, Senior Engineer, or Administrative Officer of the Day. (YourIT Article Here)
We believe the on-call Incident Commander will be the person who calls to report the Major Incident.
What happens after the call?
An Incident Manager/Incident Coordinator will be assigned to the Major Incident and an MIM bridge will be created.
a. We will have our own swarm room happening at the same time.
A Technical Lead on the MIM team will be assigned to the Major Incident.
At this point, the MIM team will start their bridge. Platform will already have started a swarm room, which is a dedicated space for engineers actively diagnosing and resolving the root cause of the incident.
The next steps are laid out in the Incident Call Rules document, which specifies each individual’s role throughout the process.
Fix the issue or create a workaround.
If a workaround or patch is available, the SNOW incident will likely be demoted to a Priority 3.
We’re unsure of the MIM team’s post-mortem process, as nothing is laid out in the VA Major Incident Management Process. We assume we would still follow our own post-mortem process on Platform.
Plan B - Reporting an HPI/CPI Incident to the Major Incident Management team
If for any reason the call to ESD is unsuccessful, an incident can be created in SNOW ITIL or the SNOW Service Portal. This should only be a last resort, as calling will result in a quicker decision by the MIM team.
Important: This is not the preferred method of contact.
SNOW ITIL
When you open http://yourit.va.gov, you will be sent to a different dashboard.
Select “All”, type “incident” into the filter box (or scroll all the way down), and select “Create New”.
You will get a ticket number immediately, even before you submit it. Copy this number.
What to enter in each field:
Submitted by: Your name
Location: Washington DC
Service Area: VA Washington
Affected End-user: Your name
Affected User Building Number: N/A
Affected User Room Number: N/A
Leave “Telework” unchecked
Best Contact Method: (Your choice, they will likely reach out via Teams)
Phone Number: Yours
Category: Affected Service
Affected CI: None
Affected Service: http://VA.gov - Veteran-facing Services Platform
Service Offering: N/A
Portfolio: Veteran Experience Services
Product Line: Digital Experience
Impact: 1 - Critical - Impacts National
Urgency: 1(critical) System outage (Application or Service)
Priority: 1 - Critical
Assignment Group: ESD Tier 1 (They will route you if declared a MIM)
Short Description: Brief statement - 1 sentence or less
Description: What is down, what is the error code if possible, what is the impact. Ex: http://va.gov and api.va.gov are returning 502 errors. No Veterans are able to access the site at this time.
Affected System: Name of system (example above was http://va.gov )
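Because most of these values are the same for every report, one option is to keep them in a small script so that only the incident-specific details need to be typed under pressure. The sketch below only builds a dictionary to copy from; it does not call any ServiceNow API, and the names `FIXED_FIELDS` and `build_incident_fields` are hypothetical.

```python
# Values this playbook says to enter as-is on the SNOW ITIL incident form.
FIXED_FIELDS = {
    "Location": "Washington DC",
    "Service Area": "VA Washington",
    "Affected User Building Number": "N/A",
    "Affected User Room Number": "N/A",
    "Telework": "unchecked",
    "Category": "Affected Service",
    "Affected CI": "None",
    "Affected Service": "http://VA.gov - Veteran-facing Services Platform",
    "Service Offering": "N/A",
    "Portfolio": "Veteran Experience Services",
    "Product Line": "Digital Experience",
    "Impact": "1 - Critical - Impacts National",
    "Urgency": "1(critical) System outage (Application or Service)",
    "Priority": "1 - Critical",
    "Assignment Group": "ESD Tier 1",
}


def build_incident_fields(name, phone, contact_method,
                          short_description, description, affected_system):
    """Merge the fixed values with the details that change per incident."""
    fields = dict(FIXED_FIELDS)  # copy so the template is never mutated
    fields.update({
        "Submitted by": name,
        "Affected End-user": name,
        "Best Contact Method": contact_method,
        "Phone Number": phone,
        "Short Description": short_description,
        "Description": description,
        "Affected System": affected_system,
    })
    return fields


if __name__ == "__main__":
    # Placeholder caller details; the description reuses the example above.
    fields = build_incident_fields(
        "Jane Doe", "555-0100", "Teams",
        "VA.gov is returning 502 errors.",
        "http://va.gov and api.va.gov are returning 502 errors. "
        "No Veterans are able to access the site at this time.",
        "http://va.gov",
    )
    for label, value in fields.items():
        print(f"{label}: {value}")
```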
YourIT Service Portal
From behind the VA network, go to http://yourit.va.gov/va to get to your favorite page.
Select “Report an issue”
Select “Not sure? Submit your issue here”
Fill out your general information and select “Next page”
Name, #, email, etc
Brief description: 1 sentence max
Is this happening at a VA location? No
VA location: Anywhere
Category: Software
Subcategory: Web (IMPORTANT: Check the box that says “This device I am looking for is not on the list”)
Name: http://va.gov
URL: Whichever one is non-responsive
Click “Further Details”
Select impact.
Submit issue. You will get a ticket number. Something like INC123456. Look out for communication via Teams, email, and your phone.
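If the ticket number arrives buried in a Teams message or email, a short helper can pull it out for pasting into incident notes. This sketch assumes ticket numbers look like the example above (the letters INC followed by digits); the real format may differ, and `find_ticket_numbers` is a hypothetical name.

```python
import re

# Assumption: ticket numbers are "INC" followed by one or more digits,
# based only on the example "INC123456" in this playbook.
TICKET_PATTERN = re.compile(r"\bINC\d+\b")


def find_ticket_numbers(text: str) -> list[str]:
    """Return every ticket-like token found in `text`, in order."""
    return TICKET_PATTERN.findall(text)
```

For example, `find_ticket_numbers("Your issue INC123456 was submitted")` returns `["INC123456"]`.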
MIM Decision Tree - How to determine whether an incident meets the criteria for a Major Incident

Resources
Incident Call Rules: Swarm Room vs. MIM Bridge
MIM SOP (Only accessible behind VA network [CAG, AVD, GFE])
YourIT Helpdesk Article (Only accessible behind VA network [CAG, AVD, GFE])