Monitor Tagging Standards
Effectively tracking the health of VA.gov applications requires monitoring many different systems, metrics, and logs. For these monitors to be effective, both application teams and incident responders must be able to understand them, and it must be possible to sort, filter, and manage them programmatically.
The three main goals of monitor tagging are:
Quickly and easily find relevant dashboards and monitors when responding to an incident
Clearly define ownership and point of contact for all monitors for usage tracking and issue triage
Allow bulk management of resources, such as muting all monitors related to a system undergoing maintenance
References: ECC Monitoring Tool Standard - Datadog [To view this link, you must be logged into a remote desktop with your PIV card.]
Tagging standards are implemented through Datadog’s Tag Policy, and every monitor must have the “required” tags filled out.
Required: env
The env tag is the VA.gov hosting environment that is being evaluated by the monitor. This allows responders to understand the significance of an issue, and correlate patterns of alerts that correspond with a particular environment.
Recommended values:
env:sandbox
env:dev
env:staging
env:prod
Required: team
The team tag identifies the team that manages the monitor. This team is the first point of contact (POC) when a monitor alert is triggered, and the tag is the most granular description of who manages the monitor. Use a tag to configure the team handle, or ask an Admin to set up full Team management.
Examples:
team:benefits-delivery
team:1010-health-apps
Required: itportfolio
The itportfolio tag is the OCTO Portfolio that manages the monitor.
Predefined list of itportfolio tags:
itportfolio:digital-experience
itportfolio:benefits-delivery
itportfolio:health-delivery
itportfolio:technology-innovation
itportfolio:data-analytics
Required: service
The service tag is for the application or service that the monitor is watching, using the name from the service catalog.
Note: If the monitor points directly at an external service that is managed outside of OCTO, use the tag service:external and consider adding the dependency tag.
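The four required tags described above lend themselves to an automated check. The following is a minimal Python sketch, not the Datadog Tag Policy itself. It assumes a monitor's tags are available as a list of "key:value" strings, and the function name check_monitor_tags is hypothetical. The env and itportfolio values are copied from this page. Because the env values are only recommended, an unlisted value is reported as unexpected rather than as a hard failure.

```python
# Required tag keys, as listed on this page.
REQUIRED_KEYS = ("env", "team", "itportfolio", "service")

# Recommended env values and predefined itportfolio values, copied from this page.
ALLOWED_VALUES = {
    "env": {"sandbox", "dev", "staging", "prod"},
    "itportfolio": {
        "digital-experience",
        "benefits-delivery",
        "health-delivery",
        "technology-innovation",
        "data-analytics",
    },
}


def check_monitor_tags(tags):
    """Return a list of problems found; an empty list means the tags pass.

    Hypothetical helper for illustration only.
    """
    # Split each "key:value" string on the first colon only.
    values_by_key = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep and value:
            values_by_key.setdefault(key, []).append(value)

    problems = []
    # Every required key must appear at least once.
    for key in REQUIRED_KEYS:
        if key not in values_by_key:
            problems.append(f"missing required tag: {key}")
    # Keys with a known value list are compared against that list.
    for key, allowed in ALLOWED_VALUES.items():
        for value in values_by_key.get(key, []):
            if value not in allowed:
                problems.append(f"unexpected value: {key}:{value}")
    return problems
```

For example, calling check_monitor_tags(["env:prod"]) reports the three missing required tags, while a monitor carrying all four required tags with listed values returns an empty list.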
Recommended: dependency
The dependency tag names an external dependency that affects the monitor.
For example, these dependency tags are used to mute groups of alerts when an upstream system is undergoing planned maintenance:
dependency:evss
dependency:mpi
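To illustrate how a dependency tag supports bulk management, here is a small Python sketch that selects every monitor affected by one upstream system. The monitor records, their names, and the function monitors_for_dependency are all hypothetical. The sketch only selects monitors and does not call Datadog, so the muting step itself is left to whatever tooling the team already uses.

```python
def monitors_for_dependency(monitors, dependency):
    """Return the names of monitors tagged with dependency:<dependency>."""
    wanted = f"dependency:{dependency}"
    return [m["name"] for m in monitors if wanted in m["tags"]]


# Hypothetical monitor records, invented for illustration.
example_monitors = [
    {"name": "claims-status-errors", "tags": ["env:prod", "dependency:evss"]},
    {"name": "login-latency", "tags": ["env:prod", "dependency:mpi"]},
    {"name": "disk-usage", "tags": ["env:staging"]},
]

# Monitors to mute while the EVSS upstream is under planned maintenance.
to_mute = monitors_for_dependency(example_monitors, "evss")
```

Because selection is driven entirely by the tag, a monitor that lacks the dependency tag is never picked up, which is why tagging each affected monitor matters.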
Additional Resources:
If you have any questions or you would like to sign up for the Datadog support team’s weekly office hours (Mondays at 11am ET), please contact #public_datadog.