Monitor Tagging Standards
Effectively tracking the health of VA.gov applications requires monitoring many different systems, metrics, and logs. For these monitors to be useful, they need to be understandable by both application teams and incident responders, and they need to be sortable, filterable, and manageable programmatically.
The three main goals of monitor tagging are:
Quickly and easily find relevant dashboards and monitors when responding to an incident
Clearly define ownership and point of contact for all monitors for usage tracking and issue triage
Allow bulk management of resources, such as muting all monitors related to a system undergoing maintenance
References: ECC Monitoring Tool Standard - Datadog [To view this link, you must be logged into a remote desktop with your PIV card.]
Tagging standards are implemented through Datadog’s Tag Policy, and every monitor must have all of the tags marked “Required” below filled out.
Required: env
The env tag is the VA.gov hosting environment that is being evaluated by the monitor. This allows responders to understand the significance of an issue and to correlate patterns of alerts that correspond with a particular environment.
Recommended values:
env:sandbox
env:dev
env:staging
env:prod
Required: team
The team tag is the team that manages the monitor. This team is the first point of contact (POC) when a monitor alert is triggered, and it is the most granular description of who manages the monitor. Use a tag to configure the team handle, or ask an Admin to use the full Team management.
Examples:
team:benefits-delivery
team:1010-health-apps
Required: itportfolio
The itportfolio tag is the OCTO Portfolio that manages the monitor.
Predefined list of itportfolio tags:
itportfolio:digital-experience
itportfolio:benefits-delivery
itportfolio:health-delivery
itportfolio:technology-innovation
itportfolio:data-analytics
Required: service
The service tag is for the application or service that the monitor is watching, using the name from the service catalog.
Note: If the monitor points directly at an external service that is managed outside of OCTO, use the tag service:external and consider also using the dependency tag.
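As an illustration of how the required tags above could be checked programmatically, here is a minimal Python sketch. It is not part of Datadog’s Tag Policy and does not call any Datadog API. The function names (parse_tags, check_monitor_tags) and the shape of the input (a plain list of "key:value" strings) are assumptions made for this example. Because the env values are only recommended, the sketch reports an unexpected env as a problem purely for demonstration.

```python
# Hypothetical helper: checks a monitor's tag list against the standard above.
REQUIRED_KEYS = ("env", "team", "itportfolio", "service")
RECOMMENDED_ENVS = {"sandbox", "dev", "staging", "prod"}
ITPORTFOLIOS = {
    "digital-experience",
    "benefits-delivery",
    "health-delivery",
    "technology-innovation",
    "data-analytics",
}


def parse_tags(tags):
    """Turn ["env:prod", ...] into {"env": ["prod"], ...}; skip malformed tags."""
    parsed = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep and value:
            parsed.setdefault(key, []).append(value)
    return parsed


def check_monitor_tags(tags):
    """Return a list of human-readable problems; an empty list means compliant."""
    parsed = parse_tags(tags)
    problems = []
    for key in REQUIRED_KEYS:
        if key not in parsed:
            problems.append(f"missing required tag: {key}")
    for value in parsed.get("itportfolio", []):
        if value not in ITPORTFOLIOS:
            problems.append(f"itportfolio:{value} is not in the predefined list")
    for value in parsed.get("env", []):
        if value not in RECOMMENDED_ENVS:
            problems.append(f"env:{value} is not a recommended value")
    return problems


if __name__ == "__main__":
    example = ["env:prod", "team:benefits-delivery", "service:external"]
    for problem in check_monitor_tags(example):
        print(problem)
```

A check like this could run before a monitor definition is submitted, so that missing tags are caught early rather than by the Tag Policy at save time.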
Recommended: dependency
The dependency tag names the external dependencies that affect the monitor.
For example, these dependency tags can be used to mute groups of alerts when an upstream system is undergoing planned maintenance:
dependency:evss
dependency:mpi
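To show how a dependency tag supports bulk management, the following Python sketch selects the monitors affected by one upstream system. It assumes each monitor is represented as a dictionary with "name" and "tags" keys. That shape, the sample monitor names, and the function name monitors_with_dependency are hypothetical. The muting step itself is left out, since it depends on how your team manages Datadog downtimes.

```python
# Hypothetical sample data: each monitor is a dict with a name and a tag list.
SAMPLE_MONITORS = [
    {"name": "claims-status-errors", "tags": ["env:prod", "dependency:evss"]},
    {"name": "profile-latency", "tags": ["env:prod", "dependency:mpi"]},
    {"name": "login-5xx", "tags": ["env:prod"]},
    {"name": "claims-upload-errors", "tags": ["env:staging", "dependency:evss"]},
]


def monitors_with_dependency(monitors, dependency):
    """Return the names of monitors tagged with dependency:<dependency>."""
    wanted = f"dependency:{dependency}"
    return [m["name"] for m in monitors if wanted in m.get("tags", [])]


if __name__ == "__main__":
    # Monitors to consider muting while the upstream system is under maintenance.
    for name in monitors_with_dependency(SAMPLE_MONITORS, "evss"):
        print(name)
```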
Additional Resources:
If you have any questions or you would like to sign up for the Datadog support team’s weekly office hours (Mondays at 11am ET), please contact #public_datadog.