Platform Support Incident Report - May 15, 2026 (All Incidents)

PLATFORM-SUPPORT INCIDENT/OOB TICKETS RESEARCH REPORT - COMPREHENSIVE

Repository: va.ghe.com/software/va.gov-team
Report Period: November 15, 2024 - May 15, 2026 (18 months)
Report Generated: May 15, 2026


EXECUTIVE SUMMARY

| Metric | Value |
| --- | --- |
| Total Incidents | 100 |
| Open Incidents | 18 |
| Closed Incidents | 82 |
| Closure Rate | 82% |
| Avg Resolution (Closed) | 15.7 days |
| Oldest Open | 100.9 days (#131972) |
| Most Recent | 1.0 day (#142367) |
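
The summary figures are simple ratios and date differences. As a minimal sketch, assuming each incident is available as a record with a creation timestamp and an optional closing timestamp (the `Incident` type and its field names are hypothetical and do not describe the tooling that produced this report), the closure rate and an incident's age could be derived as follows:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Incident:
    """Hypothetical record shape; the field names are assumptions."""
    number: int
    created_at: datetime
    closed_at: Optional[datetime] = None


def closure_rate(total: int, closed: int) -> float:
    """Percentage of incidents that have been closed."""
    return 100.0 * closed / total if total else 0.0


def days_open(incident: Incident, as_of: datetime) -> float:
    """Age in fractional days, measured to the closing time or to `as_of`."""
    end = incident.closed_at or as_of
    return (end - incident.created_at).total_seconds() / 86400


# Counts taken from the summary: 100 incidents, 82 of them closed.
print(f"{closure_rate(100, 82):.0f}%")  # prints "82%"
```

Fractional ages such as 100.9 days need full timestamps. The tables in this report list calendar dates only, so ages recomputed from them can differ from the reported values by up to a day.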


OPEN INCIDENTS TABLE (18 Total)

| # | Issue | Status | Title | Created | Days Open | Team | Slack |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | #142367 | 🟢 NEW | Application onboarding workflow failed | 2026-05-14 | 1.0d | Tier 1 | |
| 2 | #142338 | 🟢 NEW | Revert PR needed for prod deploy | 2026-05-14 | 1.1d | Tier 1 | |
| 3 | #142234 | 🟢 NEW | MHV Medical Records error spike | 2026-05-13 | 2.0d | Tier 1 | |
| 4 | #142054 | 🟢 NEW | Hosted runners Terraform error | 2026-05-12 | 3.1d | Tier 1 | CBU0KDSB1 |
| 5 | #141761 | 🔵 RECENT | MHV Tier 3 support ticket issue | 2026-05-08 | 7.2d | Tier 1 | |
| 6 | #141742 | 🔵 RECENT | Vets-api local bundle install error | 2026-05-08 | 7.2d | Tier 1 | |
| 7 | #141711 | 🔵 RECENT | MEB sign-in with test users | 2026-05-07 | 7.9d | Tier 1 | |
| 8 | #140878 | 🟡 ACTIVE | Hosted runner cert issue | 2026-05-01 | 14.1d | Tier 1 | CBU0KDSB1 |
| 9 | #140877 | 🟡 ACTIVE | Pipeline check failing on PR | 2026-05-01 | 14.1d | DevOps | |
| 10 | #140842 | 🟡 ACTIVE | PR ESLint check failure post-GHE | 2026-05-01 | 14.2d | Tier 1 | |
| 11 | #140841 | 🟡 ACTIVE | Staging rake task repo access | 2026-05-01 | 14.2d | Tier 1 | |
| 12 | #140824 | 🟡 ACTIVE | Production Rails console access | 2026-05-01 | 14.2d | Frontend | |
| 13 | #140394 | 🟡 ACTIVE | EventBus build failure AWS ECR denied | 2026-04-28 | 17.1d | Tier 1 | |
| 14 | #140367 | 🟡 ACTIVE | All va.gov-team PRs link validation fail | 2026-04-28 | 17.2d | Tier 1 | |
| 15 | #138850 | 🟠 URGENT | Alert noise - Synthetic & PGS alerts | 2026-04-07 | 38.0d | Tier 1 | |
| 16 | #137391 | 🔴 CRITICAL | Flipper sandbox redirect_uri error | 2026-03-24 | 51.8d | Backend | |
| 17 | #134545 | 🔴 CRITICAL | PingWind BIO staging performance | 2026-02-26 | 78.0d | Tier 1 | |
| 18 | #131972 | 🔴 CRITICAL | Facility Locator traffic spike | 2026-02-03 | 100.9d | Tier 1 | C0FQSS30V |


CLOSED INCIDENTS TABLE (82 Total, 30 Detailed Rows)

| # | Issue | Title | Created | Closed | Days | Team |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | #130261 | PII spill to Datadog - 401 errors | 2026-01-15 | 2026-05-12 | 116.9d | Backend |
| 2 | #141008 | MAP integrations error rates | 2026-05-02 | 2026-05-03 | 0.1d | Tier 1 |
| 3 | #138868 | Vets-api down - api.va.gov unresponsive | 2026-04-07 | 2026-04-28 | 20.7d | Backend |
| 4 | #135847 | OOB request - vets-website revert | 2026-03-10 | 2026-04-28 | 48.7d | Frontend |
| 5 | #128485 | (Archived) Historic incident tracking | 2025-12-20 | 2026-04-28 | 128.8d | Tier 1 |
| 6 | #134387 | Eventbus-gateway service errors | 2026-01-20 | 2026-03-21 | 61.2d | Backend |
| 7 | #130551 | Homepage returning 404 errors | 2026-01-21 | 2026-03-25 | 63.7d | Frontend |
| 8 | #130257 | Brief vets-api outage | 2025-12-10 | 2026-02-16 | 68.9d | Backend |
| 9 | #137920 | Vets-api errors spike | 2026-03-30 | 2026-04-01 | 1.8d | Backend |
| 10 | #120133 | PII incident in Datadog RUM action | 2025-09-22 | 2025-10-08 | 16.1d | Security |
| 11 | #131604 | Vets-website prod CD deploy issue | 2026-01-29 | 2026-02-05 | 7.1d | Frontend |
| 12 | #132522 | Flipper 500 error | 2026-02-09 | 2026-02-10 | 1.2d | Backend |
| 13 | #132468 | External service request decrease | 2026-02-09 | 2026-02-18 | 8.8d | Backend |
| 14 | #131673 | CCD/DICOM downloads failing | 2026-01-30 | 2026-02-05 | 6.2d | Backend |
| 15 | #135196 | PagerDuty license request | 2026-03-05 | 2026-03-05 | 0.2d | DevOps |
| 16 | #130267 | Allergies Model API calls failing | 2026-01-16 | 2026-02-13 | 28.0d | Backend |
| 17 | #131143 | Lighthouse change undo request | 2026-01-27 | 2026-02-03 | 7.3d | Ops |
| 18 | #125870 | Veteran feedback issue | 2025-11-20 | 2025-11-21 | 1.0d | Tier 1 |
| 19 | #127883 | Shai-Hulud service account incident | 2025-12-16 | 2025-12-29 | 13.0d | Backend |
| 20 | #110529 | Incident in progress tracking | 2025-05-23 | 2025-05-29 | 6.1d | Tier 1 |
| 21 | #115611 | SiS success down to zero | 2025-07-30 | 2025-07-31 | 1.4d | Backend |
| 22 | #109387 | Bad representative persistence issue | 2025-05-08 | 2025-10-09 | 155.2d | Backend |
| 23 | #109445 | Possible production incident | 2025-05-09 | 2025-05-12 | 3.2d | Tier 1 |
| 24 | #107733 | Incident reporting access | 2025-04-16 | 2025-04-16 | 0.0d | Tier 1 |
| 25 | #101746 | PII incident resolution info | 2025-01-24 | 2025-02-03 | 10.2d | Backend |
| 26 | #100007 | Historic incident info request | 2025-01-06 | 2025-01-09 | 3.2d | Backend |
| 27 | #103180 | Not really an incident | 2025-02-14 | 2025-02-20 | 6.4d | Tier 1 |
| 28 | #93429 | Search service incident | 2024-09-23 | 2024-09-25 | 1.8d | Backend |
| 29 | #104993 | Related to recent incident | 2025-03-11 | 2025-03-14 | 3.1d | Backend |
| 30 | #104917 | Service issue spike | 2025-03-11 | 2025-03-11 | 0.0d | Tier 1 |
| 31-82 | (Additional) | (52 more closed incidents) | (Various) | (Various) | (1-90d) | |


KEY FINDINGS

Critical Open Issues (Action Required)

🔴 #131972 - 100.9 days open
Facility Locator API receiving traffic from fake bot accounts driving 404 spike

🔴 #134545 - 78.0 days open
PingWind BIO staging performance issues (intermittent, hard to reproduce)

🔴 #137391 - 51.8 days open
Flipper sandbox redirect_uri GitHub OAuth error

Production Impact

🔴 #142234 - MHV Medical Records endpoints DOWN (2 days)
🔴 #142338 - Production deploy BLOCKED by required revert (1 day)

Post-GHE Migration Cluster (April 28 - May 14)

7 incidents concentrated around GHEC migration (6 listed below):

  • #140367: Link validation failures

  • #140394: AWS ECR build denial

  • #140841: Repository access issues

  • #140842: ESLint CI failures

  • #140877: Pipeline check failures

  • #140878: Certificate on hosted runners

Resolution Metrics

  • Fastest: 0.0d (#104917, #107733)

  • Slowest: 155.2d (#109387)

  • Average: 15.7d

  • Closure Rate: 82%
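
The fastest and slowest figures above are the minimum and maximum of the closed table's "Days" column. As a rough sketch over a handful of rows copied from that table (the `sample` list and the `resolution_extremes` helper are illustrative names, not part of the report's tooling), the extremes and the issues that produce them can be picked out like this:

```python
from statistics import fmean

# (issue number, days to resolve) for a few rows copied from the closed table.
sample = [
    (109387, 155.2),
    (104917, 0.0),
    (107733, 0.0),
    (137920, 1.8),
    (132522, 1.2),
]


def resolution_extremes(rows):
    """Return the fastest and slowest durations with the issues that hit them."""
    durations = [days for _, days in rows]
    fastest, slowest = min(durations), max(durations)
    return {
        "fastest": (fastest, [n for n, days in rows if days == fastest]),
        "slowest": (slowest, [n for n, days in rows if days == slowest]),
        # Mean of this sample only; it is not the report's 15.7d average.
        "mean": fmean(durations),
    }


stats = resolution_extremes(sample)
print(stats["fastest"])  # (0.0, [104917, 107733])
print(stats["slowest"])  # (155.2, [109387])
```

The reported 15.7d average covers all 82 closed incidents, so it cannot be reproduced from the 30 detailed rows alone.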


RECOMMENDATIONS

IMMEDIATE (24 Hours)

  1. Escalate #131972, #134545, #137391 to leadership

  2. MHV incident response for #142234

  3. Unblock production deploy for #142338

SHORT-TERM (Week)

  1. RCA for all incidents >30 days

  2. Post-migration remediation (GHE issues)

  3. Access/permission audit

MEDIUM-TERM (Month)

  1. SLA implementation (target: 15.7d)

  2. Escalation process (7, 14, 30 day triggers)

  3. Incident dashboard & automation
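
The 7, 14 and 30 day escalation triggers could be evaluated mechanically against each open incident's age. The following is a minimal sketch under stated assumptions: the function name is hypothetical, a trigger is treated as reached when the age is greater than or equal to the threshold, and the report does not define what action each trigger starts.

```python
from typing import Optional

# Day thresholds taken from the escalation recommendation.
ESCALATION_TRIGGERS = (7, 14, 30)


def escalation_trigger(days_open: float) -> Optional[int]:
    """Return the highest trigger (in days) the incident has reached, or None."""
    reached = [t for t in ESCALATION_TRIGGERS if days_open >= t]
    return max(reached) if reached else None


# Ages copied from the open incidents table.
for issue, age in [(142367, 1.0), (141761, 7.2), (140878, 14.1), (131972, 100.9)]:
    print(issue, escalation_trigger(age))
# 142367 None
# 141761 7
# 140878 14
# 131972 30
```

Returning only the highest threshold reached keeps one escalation level per incident. A real process would probably also need to record which levels have already been actioned.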


Data Source: GitHub API (va.ghe.com/software/va.gov-team)
Total Incidents: 100 (18 open, 82 closed)
Report Period: Nov 15, 2024 - May 15, 2026
Last Updated: May 15, 2026
