Major Incident Management Playbook

Overview

This document will help all VFS team members working on VA.gov understand how to report major incidents to the Major Incident Management (MIM) team when necessary. Please read this document carefully as you prepare to make a call to raise a potential major incident.

Acronyms

  • ESD - Enterprise Service Desk

  • MIM - Major Incident Management

  • HPI - High Priority Incident

  • CPI - Critical Priority Incident

  • SNOW “ITIL” - ServiceNow Information Technology Infrastructure Library

  • MI - Major Incident


Reporting an HPI/CPI Incident to the Major Incident Management team

Contact the MIM team by calling the ESD directly and selecting option 9. This bypasses ESD Tier 1 and connects you directly to the MIM team.

Calling to report the HPI/CPI incident to the MIM

After calling the Enterprise Service Desk at (855) 673-4357 and selecting option 9, you will hear the following prompts:

  1. Welcome to the Major Incident Management team. Are you an OIT IT Professional, Area Manager, Program Manager, Senior Engineer, Administrative Officer of the Day, or Nurse of the Day who can answer questions on this incident?

    • If no, press 1 for No. You will hear the following:

      A. If you believe a major incident needs to be submitted, please contact an OIT IT Professional, Area Manager, Program Manager, Senior Engineer, Administrative Officer of the Day or Nurse of the Day to call the Enterprise Service Desk and select option 9. You are now being forwarded to the Enterprise Service Desk. Please select option 2 to be connected with an agent.

    • If yes, press 2 for Yes. Go to step 2.

  2. Is there an acceptable workaround in place?

    • If no, press 1 for No. Go to step 3.

    • If yes, press 2 for Yes. You will hear the following:

      A. If a workaround is available, this would not qualify as a major incident. You are being forwarded to the Enterprise Service Desk to report your issue. Please select Option 2 to be connected with an agent.

  3. Is this a service or application impacting a VA facility or nationwide?

    • If no, press 1 for No. You will hear the following:

      A. Incidents that do not impact a VA Facility or Nationwide are handled via the Enterprise Service Desk. You are being forwarded to the Enterprise Service Desk. Please select Option 2 to be connected with an agent.

    • If yes, press 2 for Yes. You will hear the following:

      A. You are now being connected to a MIM agent.
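The three phone prompts above form a short decision tree. The following is a minimal Python sketch of that routing, offered only as a way to reason about the outcome before dialing. The names `Route` and `route_call` are hypothetical and are not part of any ESD or MIM tooling.

```python
from enum import Enum


class Route(Enum):
    """Possible outcomes of the option 9 phone prompts (hypothetical names)."""
    ESD_NOT_AUTHORIZED = "Forwarded to ESD: caller is not one of the listed roles"
    ESD_WORKAROUND = "Forwarded to ESD: an acceptable workaround is in place"
    ESD_LIMITED_IMPACT = "Forwarded to ESD: no facility or nationwide impact"
    MIM_AGENT = "Connected to a MIM agent"


def route_call(listed_role: bool, workaround_in_place: bool,
               facility_or_nationwide: bool) -> Route:
    """Mirror the three prompts in the order they are asked on the call."""
    # Prompt 1: the caller must be one of the listed roles.
    if not listed_role:
        return Route.ESD_NOT_AUTHORIZED
    # Prompt 2: an acceptable workaround disqualifies a major incident.
    if workaround_in_place:
        return Route.ESD_WORKAROUND
    # Prompt 3: impact must reach a VA facility or be nationwide.
    if not facility_or_nationwide:
        return Route.ESD_LIMITED_IMPACT
    return Route.MIM_AGENT
```

Only one combination of answers reaches a MIM agent: a listed role, no acceptable workaround, and facility or nationwide impact.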

Questions that will be asked during the reporting call

  • What is the affected service?

  • Is the system/application in the Critical Systems List?

  • Is the affected service completely unavailable?

  • Is the affected service experiencing latency or degradation?

  • Is there a function or feature within the service or application that’s not working?

  • Is there a specific error message shown?

  • When did the service disruption first begin?

  • How many people are impacted?

  • Is this impacting an entire Facility, VISN, District, or Nation?

  • What mission essential task is the business unable to perform?

  • Is there another way, such as a contingency plan, to perform that mission essential task?

  • Is someone in OIT already troubleshooting this issue?

  • Is there ongoing maintenance or a known change occurring?
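Having answers to these questions ready before dialing can shorten the call. As a rough sketch, the list above could be kept as a simple checklist. The names `PRE_CALL_QUESTIONS` and `unanswered` are hypothetical helpers for illustration, not an existing Platform tool.

```python
# Questions copied from the list above, in the order given.
PRE_CALL_QUESTIONS = [
    "What is the affected service?",
    "Is the system/application in the Critical Systems List?",
    "Is the affected service completely unavailable?",
    "Is the affected service experiencing latency or degradation?",
    "Is there a function or feature within the service or application that's not working?",
    "Is there a specific error message shown?",
    "When did the service disruption first begin?",
    "How many people are impacted?",
    "Is this impacting an entire Facility, VISN, District, or Nation?",
    "What mission essential task is the business unable to perform?",
    "Is there another way, such as a contingency plan, to perform that mission essential task?",
    "Is someone in OIT already troubleshooting this issue?",
    "Is there ongoing maintenance or a known change occurring?",
]


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the questions that have no non-empty answer yet."""
    return [q for q in PRE_CALL_QUESTIONS if not answers.get(q, "").strip()]
```

Calling `unanswered({})` returns every question, so an empty result means the caller has something to say for each one.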

Who will make this call?

This is the question we need to answer. Guidance from the Major Incident Management Team states the caller must be one of the following: OIT IT Professional, Area Manager, Program Manager, Senior Engineer, or Administrative Officer of the Day. (YourIT Article Here)

We believe the on-call Incident Commander will be the person who calls to report a Major Incident.

What happens after the call?

  1. An Incident Manager/Incident Coordinator will be assigned to the Major Incident and a MIM bridge will be created.
    a. We will run our own swarm room at the same time.

  2. A Technical Lead on the MIM team will be assigned to the Major Incident.

  3. At this point, the MIM team will start their bridge. Platform will have already started a swarm room, which is a dedicated space for engineers actively diagnosing and resolving the root cause of the incident.

  4. The next steps are laid out in the Incident Call Rules document, which specifies each individual's role throughout the process.

  5. Fix the issue or create a workaround.
    a. If a workaround or patch is available, the SNOW incident will likely be demoted to a Priority 3.

  6. We’re unsure of the MIM team’s post-mortem process, as nothing is laid out in the VA Major Incident Management Process. We assume we would still follow our own PostMortem process on Platform, but this is unconfirmed.

Plan B - Reporting an HPI/CPI Incident to the Major Incident Management team

If for any reason the call to ESD is unsuccessful, an incident can be created in either SNOW ITIL or the YourIT Service Portal. Use this only as a last resort, as calling will result in a quicker decision by the MIM team.

Important: This is not the preferred method of contact.

SNOW ITIL

  1. When you open up http://yourit.va.gov you will be sent to a different dashboard.

  2. Select “All”, type “incident” into the filter box (or scroll all the way down), and select “Create New”.

  3. You will get a ticket number immediately, even before you submit it. Copy this number.

  4. What to enter in each field:

    1. Submitted by: Your name

    2. Location: Washington DC

    3. Service Area: VA Washington

    4. Affected End-user: Your name

    5. Affected User Building Number: N/A

    6. Affected User Room Number: N/A

    7. Leave “Telework” unchecked

    8. Best Contact Method: (Your choice, they will likely reach out via Teams)

    9. Phone Number: Yours

    10. Category: Affected Service

    11. Affected CI: None

    12. Affected Service: http://VA.gov - Veteran-facing Services Platform

    13. Service Offering: N/A

    14. Portfolio: Veteran Experience Services

    15. Product Line: Digital Experience

    16. Impact: 1 - Critical - Impacts National

    17. Urgency: 1(critical) System outage (Application or Service)

    18. Priority: 1 - Critical

    19. Assignment Group: ESD Tier 1 (They will route you if declared a MIM)

    20. Short Description: Brief statement, one sentence at most

    21. Description: What is down, what is the error code if possible, what is the impact. Ex: http://va.gov and api.va.gov are returning 502 errors. No Veterans are able to access the site at this time.

    22. Affected System: Name of system (example above was http://va.gov)
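Most of the 22 fields above take the same value for every VA.gov incident, and only seven change per ticket. The sketch below separates the two groups so the fixed values can be copied without retyping. It is an illustration only: the dictionary keys are the form labels from the list above, not ServiceNow API field names, and `FIXED_FIELDS` and `incident_fields` are hypothetical names. Nothing here submits a ticket.

```python
# Values that stay the same for every ticket, copied as written in the list above.
FIXED_FIELDS = {
    "Location": "Washington DC",
    "Service Area": "VA Washington",
    "Affected User Building Number": "N/A",
    "Affected User Room Number": "N/A",
    "Telework": "unchecked",
    "Category": "Affected Service",
    "Affected CI": "None",
    "Affected Service": "http://VA.gov - Veteran-facing Services Platform",
    "Service Offering": "N/A",
    "Portfolio": "Veteran Experience Services",
    "Product Line": "Digital Experience",
    "Impact": "1 - Critical - Impacts National",
    "Urgency": "1(critical) System outage (Application or Service)",
    "Priority": "1 - Critical",
    "Assignment Group": "ESD Tier 1",
}


def incident_fields(your_name: str, phone_number: str, contact_method: str,
                    short_description: str, description: str,
                    affected_system: str) -> dict[str, str]:
    """Combine the per-incident values with the fixed ones into one mapping."""
    fields = {
        "Submitted by": your_name,
        "Affected End-user": your_name,
        "Best Contact Method": contact_method,
        "Phone Number": phone_number,
        "Short Description": short_description,
        "Description": description,
        "Affected System": affected_system,
    }
    fields.update(FIXED_FIELDS)
    return fields
```

The returned mapping has one entry per form field, which makes it easy to check that nothing was skipped before pressing submit.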

YourIT Service Portal

  1. From behind the VA network, go to http://yourit.va.gov/va to get to your favorite page.

  2. Select “Report an issue”

  3. Select “Not sure? Submit your issue here”

  4. Fill out your general information and select “Next page”

    • Name, phone number, email, etc.

  5. Brief description: 1 sentence max

  6. Is this happening at a VA location? No

  7. VA location: Anywhere

  8. Category: Software

  9. Subcategory: Web (IMPORTANT: Check the box that says “This device I am looking for is not on the list”)

  10. Name: http://va.gov

  11. URL: Whichever one is non-responsive

  12. Click “Further Details”

  13. Select impact.

  14. Submit issue. You will get a ticket number, something like INC123456. Look out for communication via Teams, email, and your phone.

MIM Decision Tree - How to determine if an incident meets the criteria for a Major Incident?


Resources

Incident Call Rules: Swarm Room vs. MIM Bridge

MIM SOP (Only accessible behind VA network [CAG, AVD, GFE])

YourIT Helpdesk Article (Only accessible behind VA network [CAG, AVD, GFE])

