ClamAV scans uploaded files for viruses before they are stored or processed. This page explains how the scanning architecture works, how vets-api integrates with ClamAV, and how to diagnose and remediate common failures that can impact vets-api pod health or upload flows.
ClamAV Architecture, Behavior, and Failures (vets-api)
Overview
-
Goal: Ensure all file uploads are scanned for viruses without destabilizing vets-api pods.
-
Pattern: ClamAV runs as a sidecar container in each vets-api pod; Rails talks to it via a client library.
-
Risk: Because Kubernetes pod health is per-pod, ClamAV failures can mark the whole pod unhealthy, causing restarts and reduced availability. See ClamAV error logs (requires Datadog access)
High-level architecture
Components
-
Vets-API Rails app
-
Uses Common::VirusScan for scanning.
-
Uses UploaderVirusScan (CarrierWave concern) and the Shrine plugin #validate_virus_free to hook scanning into upload flows.
-
-
ClamAV sidecar container
-
Runs alongside vets-api in the same pod.
-
Exposes the ClamAV daemon on a TCP port inside the pod.
-
Pod health is impacted by both the Rails and ClamAV containers.
-
-
vsp-infra-clamav repo (https://va.ghe.com/software/vsp-infra-clamav)
-
Houses the ClamAV Docker image and Kubernetes configuration used by vets-api.
-
Notable GitHub Actions:
-
mirror-images.yml: builds and pushes ClamAV images to ECR.
-
s3_sync.yml: builds the ClamAV image, extracts DBs, syncs them to S3.
-
-
vets-api application integration
Common::VirusScan
Location: lib/common/virus_scan.rb
-
API
-
Common::VirusScan.scan(file_path, upload_context: nil) -> true/false
-
Raises on hard failures (e.g., temp file missing, ClamAV unreachable).
-
-
Behavior
-
Verifies the temp file exists; raises "Failed to create temp file" if not.
-
Mock mode:
-
If Settings.clamav.mock is true, returns true immediately (used for non-prod/testing).
-
-
Collects file metadata for audit:
-
Hashed basename (SHA-256), file size, and content type (Marcel).
-
-
Measures scan duration using a monotonic clock.
-
Calls perform_scan(file_path) and expects a hash:
-
{ safe: true/false, virus_name: '...' }
-
-
Emits a scan audit log:
-
Message: "ClamAV Virus Scan Audit".
-
Fields:
-
event: 'virus_scan'
-
user_uuid, ip_address from RequestStore.store['additional_request_attributes'] (currently set to nil due to PII concerns, pending further guidance)
-
file_name (hashed), file_size, content_type
-
scan_result: "clean" or "infected"
-
virus_name, scan_duration_ms, upload_context
-
-
-
On any exception:
-
Emits an error audit log (scan_result: 'error').
-
Re-raises the error to the caller.
-
-
perform_scan
-
If file_path starts with clamav_tmp/:
-
Treats it as already in the ClamAV temp directory.
-
Sets mode 0640.
-
Calls ClamAV::PatchClient.new.scan_with_result(file_path).
-
-
Feature flag :clamav_scan_file_from_other_location:
-
Enabled:
-
Logs that it is creating a ClamAV tmp file.
-
Calls #scan_file_from_other_location(original_path):
-
Ensures Rails.root.join('clamav_tmp') exists.
-
Sets original file mode 0640.
-
Builds a unique temp path under clamav_tmp/.
-
Copies the original file to that path; verifies the copy exists.
-
Sets the temp file mode 0640.
-
Calls ClamAV::PatchClient.new.scan_with_result(temp_path).
-
Always attempts to delete the temp file (and logs success/failure).
-
-
-
Disabled:
-
Logs a warning: ClamAV scan from other locations is disabled.
-
Returns { safe: false, virus_name: nil }, which callers treat as unsafe.
-
-
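The perform_scan branching described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the code in lib/common/virus_scan.rb: FakeClamClient is a hypothetical stand-in for ClamAV::PatchClient, the copy_enabled argument stands in for the :clamav_scan_file_from_other_location Flipper flag, and a relative clamav_tmp directory replaces Rails.root.join('clamav_tmp').

```ruby
require 'fileutils'
require 'securerandom'
require 'tmpdir'

# Hypothetical stand-in for ClamAV::PatchClient; the real client talks to clamd.
class FakeClamClient
  def scan_with_result(path)
    infected = File.read(path).include?('EICAR')
    { safe: !infected, virus_name: infected ? 'Eicar-Test-Signature' : nil }
  end
end

CLAMAV_TMP = 'clamav_tmp' # assumption: the real code uses Rails.root.join('clamav_tmp')

# Scan in place, copy then scan, or refuse, depending on path and flag.
def perform_scan(file_path, client: FakeClamClient.new, copy_enabled: true)
  if file_path.start_with?("#{CLAMAV_TMP}/")
    File.chmod(0o640, file_path)
    return client.scan_with_result(file_path)
  end

  # Flag disabled: report the file as unsafe without scanning it.
  return { safe: false, virus_name: nil } unless copy_enabled

  scan_file_from_other_location(file_path, client)
end

def scan_file_from_other_location(original_path, client)
  FileUtils.mkdir_p(CLAMAV_TMP)
  File.chmod(0o640, original_path)
  temp_path = File.join(CLAMAV_TMP, "#{SecureRandom.hex(8)}-#{File.basename(original_path)}")
  FileUtils.cp(original_path, temp_path)
  raise 'Failed to copy file into clamav_tmp' unless File.exist?(temp_path)

  File.chmod(0o640, temp_path)
  client.scan_with_result(temp_path)
ensure
  # Always attempt cleanup, even when the scan raises.
  FileUtils.rm_f(temp_path) if temp_path
end

# Usage: run inside a scratch directory so nothing is left behind.
Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    File.write('upload.txt', 'hello')
    puts perform_scan('upload.txt').inspect
    puts perform_scan('upload.txt', copy_enabled: false).inspect
  end
end
```

Note that the disabled-flag branch returns the same shape as a real detection, so callers cannot tell "not scanned" from "infected" without checking the warning log.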
UploaderVirusScan (CarrierWave integration)
Location: app/uploaders/uploader_virus_scan.rb
-
Inclusion
-
Included into CarrierWave uploaders that require virus scanning.
-
Registers a callback:
-
before(:store, :validate_virus_free)
-
-
-
Runtime behavior
-
Only active in production:
-
Returns immediately unless Rails.env.production? is true.
-
-
Uses Common::FileHelpers.generate_clamav_temp_file(file.read) to write the upload to a ClamAV-readable temp file.
-
Calls Common::VirusScan.scan(temp_file_path).
-
Deletes the temp file after the scan.
-
If the scan result is false (infected or treated as unsafe):
-
Calls file.delete on the uploader file object.
-
Raises UploaderVirusScan::VirusFoundError, "Virus Found + #{temp_file_path}".
-
-
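The callback flow above can be approximated without CarrierWave. The sketch below is illustrative only: UploaderVirusScanSketch, FakeUpload, and SketchUploader are hypothetical names, the scanner lambda stands in for Common::VirusScan.scan, the production keyword stands in for Rails.env.production?, and Tempfile replaces Common::FileHelpers.generate_clamav_temp_file.

```ruby
require 'tempfile'

# Hypothetical, dependency-free stand-in for the CarrierWave concern.
module UploaderVirusScanSketch
  class VirusFoundError < StandardError; end

  # scanner: callable that takes a path and returns true (clean) or false (unsafe).
  def validate_virus_free(file, scanner:, production: true)
    return unless production # scanning is skipped outside production

    temp = Tempfile.new('clamav_upload')
    temp.binmode
    temp.write(file.read)
    temp.flush
    temp_path = temp.path

    begin
      safe = scanner.call(temp_path)
    ensure
      temp.close! # delete the temp file after the scan, even if the scan raised
    end
    return true if safe

    file.delete
    raise VirusFoundError, "Virus Found + #{temp_path}"
  end
end

# Minimal upload object exposing the two methods the callback needs.
FakeUpload = Struct.new(:content, :deleted) do
  def read
    content
  end

  def delete
    self.deleted = true
  end
end

class SketchUploader
  include UploaderVirusScanSketch
end

upload = FakeUpload.new('file bytes', false)
begin
  SketchUploader.new.validate_virus_free(upload, scanner: ->(_path) { false })
rescue UploaderVirusScanSketch::VirusFoundError
  puts "rejected; upload deleted: #{upload.deleted}"
end
```

Because the callback returns early outside production, a local or staging upload never exercises this path; that is worth remembering when reproducing a scanning failure.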
Shrine validate_virus_free plugin
Location: lib/shrine/plugins/validate_virus_free.rb
-
Purpose
-
Shrine-based uploads (e.g., some form submissions) use this plugin to validate uploads are virus-free before persistence. It is the Shrine counterpart to CarrierWave’s UploaderVirusScan.
-
-
Behavior
-
Attachers call validate_virus_free(message: nil) (e.g., from a Shrine validation block).
-
Wraps the scan in a Datadog trace "Scan Upload for Viruses".
-
Downloads the Shrine file, writes it to a ClamAV temp path via Common::FileHelpers.generate_clamav_temp_file, then calls Common::VirusScan.scan(temp_file_path) (and, when implemented, can pass upload_context: for audit logging).
-
Deletes the temp file after the scan.
-
If the scan returns false: logs a virus-detected warning (with hashed file name and optional upload context from record.class.name), adds a validation error, and returns false. In development, a special message prompts the developer to start clamd.
-
If the scan returns true: returns true (validation passes).
-
-
Audit logging
-
Common::VirusScan emits the same "ClamAV Virus Scan Audit" log for Shrine scans as for other callers.
-
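To make the audit entry concrete, here is a rough sketch of how a scan could be wrapped with the audit fields this page lists. It is not the actual implementation: scan_with_audit is a hypothetical name, the scanner and logger lambdas stand in for perform_scan and the Rails logger, and content_type is left out because Marcel is an external gem.

```ruby
require 'digest'

# Sketch of an audit wrapper around a scan. All injection points are hypothetical.
def scan_with_audit(file_path, scanner:, logger:, upload_context: nil)
  raise 'Failed to create temp file' unless File.exist?(file_path)

  base = {
    event: 'virus_scan',
    user_uuid: nil,  # nil for now because of PII concerns, per this page
    ip_address: nil,
    file_name: Digest::SHA256.hexdigest(File.basename(file_path)), # hashed basename
    file_size: File.size(file_path),
    upload_context: upload_context
  }

  # A monotonic clock is not affected by wall-clock adjustments.
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = scanner.call(file_path)
  elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(2)

  logger.call('ClamAV Virus Scan Audit',
              base.merge(scan_result: result[:safe] ? 'clean' : 'infected',
                         virus_name: result[:virus_name],
                         scan_duration_ms: elapsed_ms))
  result[:safe]
rescue StandardError
  # base is nil when the failure happened before the metadata was collected.
  logger.call('ClamAV Virus Scan Audit', (base || { event: 'virus_scan' }).merge(scan_result: 'error'))
  raise
end

# Usage with a stubbed scanner and a logger that prints to stdout.
require 'tempfile'
Tempfile.create('upload') do |f|
  f.write('data')
  f.flush
  printer = ->(message, fields) { puts "#{message}: #{fields[:scan_result]}" }
  scan_with_audit(f.path, scanner: ->(_p) { { safe: true, virus_name: nil } }, logger: printer)
end
```

Emitting the error entry before re-raising means a failed scan still leaves an audit trail, which is what the debugging steps later on this page rely on.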
ClamAV sidecar behavior
Healthy behavior
-
Startup
-
ClamAV container starts alongside vets-api.
-
Loads virus databases (from the image or mounted data/S3-synced volume).
-
Binds to its configured TCP port and logs that it is ready.
-
-
Steady state
-
Occasional log lines for:
-
Definition updates (depending on configuration).
-
Internal housekeeping.
-
-
For each scan:
-
A short-lived log entry with request/response context.
-
-
Resource profile:
-
Spike in memory/CPU during DB load.
-
-
Relatively stable usage in steady state with periodic spikes during scans.
-
-
From vets-api’s point of view
-
ClamAV::PatchClient calls return within a few hundred ms under normal load.
-
Common::VirusScan.scan returns:
-
true for clean files.
-
false for confirmed infections or when configured to treat non-scannable cases as unsafe.
-
-
Audit logs show scan_result: 'clean' with reasonable scan_duration_ms.
-
Common error patterns
Expected / benign
-
Detection of real or test malware (e.g., EICAR)
-
Result: { safe: false, virus_name: 'Eicar-Test-Signature' } or similar.
-
vets-api behavior:
-
Common::VirusScan.scan returns false.
-
UploaderVirusScan raises VirusFoundError; file is deleted.
-
-
This is expected behavior and not a ClamAV failure.
-
Unexpected / problematic
-
Daemon unreachable
-
Symptoms:
-
Connection errors (ECONNREFUSED, timeouts) from ClamAV::PatchClient.
-
Errors logged in Common::VirusScan and error audit events.
-
-
Impact:
-
Upload flows that rely on scanning fail.
-
If health checks are tied to ClamAV readiness, pods may be marked Unready or restart.
-
-
-
Database load failures
-
Symptoms:
-
ClamAV logs show DB load errors or repeated restarts.
-
-
Impact:
-
Scans may fail outright.
-
Downstream, vets-api sees exceptions or very slow responses.
-
-
-
High memory usage / OOM
-
Symptoms:
-
ClamAV container is OOMKilled by Kubernetes.
-
Frequent pod restarts.
-
-
Impact:
-
Reduced capacity during churn.
-
Possible spikes in 5xx errors for upload endpoints.
-
-
Debugging procedures (Kubernetes + vets-api)
1. Identify pods and containers in trouble
-
Check vets-api pods for:
-
Unready status, repeated restarts, or CrashLoopBackOff.
-
-
Inspect container-level status:
-
Confirm whether the ClamAV container is failing (CrashLoop, OOMKilled, failing probes) while Rails appears healthy.
-
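One way to separate a failing sidecar from a healthy Rails container is to read the per-container status from the pod's JSON, as printed by kubectl get pod <name> -o json. The sketch below parses that output; the container names in the sample (vets-api, clamav) are assumptions, while the status fields are standard Kubernetes pod status fields.

```ruby
require 'json'

# Lists containers in one pod that are crash-looping, were OOM-killed, or are not ready.
def container_problems(pod_json)
  statuses = JSON.parse(pod_json).dig('status', 'containerStatuses') || []
  statuses.filter_map do |status|
    reasons = []
    reasons << 'CrashLoopBackOff' if status.dig('state', 'waiting', 'reason') == 'CrashLoopBackOff'
    reasons << 'OOMKilled' if status.dig('lastState', 'terminated', 'reason') == 'OOMKilled'
    reasons << 'not ready' unless status['ready']
    next if reasons.empty?

    { container: status['name'], restarts: status['restartCount'], reasons: reasons }
  end
end

# Sample input trimmed to the fields used above; container names are hypothetical.
sample_pod = <<~POD
  {
    "status": {
      "containerStatuses": [
        { "name": "vets-api", "ready": true, "restartCount": 0,
          "state": { "running": {} } },
        { "name": "clamav", "ready": false, "restartCount": 7,
          "state": { "waiting": { "reason": "CrashLoopBackOff" } },
          "lastState": { "terminated": { "reason": "OOMKilled" } } }
      ]
    }
  }
POD

container_problems(sample_pod).each do |problem|
  puts "#{problem[:container]}: #{problem[:reasons].join(', ')} (restarts: #{problem[:restarts]})"
end
```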
2. Inspect ClamAV logs
-
View logs for the ClamAV container in an affected pod:
-
Look for:
-
DB load success/failure.
-
Port binding issues.
-
Repeated crashes/restarts.
-
Timeouts or resource exhaustion.
-
-
-
Correlate with vets-api logs:
-
Error audit logs from Common::VirusScan (scan_result: 'error').
-
Exceptions originating from ClamAV::PatchClient or Common::VirusScan.scan.
-
Warnings such as "Clamav scan from other location disabled".
-
3. Validate vets-api configuration
-
Feature flags
-
Flipper.enabled?(:clamav_scan_file_from_other_location):
-
If disabled, only files already under clamav_tmp/ are scanned.
-
Any other file path returns { safe: false }, which uploaders treat as infected.
-
-
Settings.clamav.mock:
-
If true, scans always pass (returns true) without talking to ClamAV.
-
Acceptable for local/dev/test, not for production.
-
-
-
Temp file handling
-
Confirm Common::FileHelpers.generate_clamav_temp_file writes to a path that:
-
Is reachable and readable by ClamAV.
-
Resides in a filesystem with enough space.
-
-
Ensure clamav_tmp/ exists and has correct owner/mode.
-
-
Audit logs
-
Use "ClamAV Virus Scan Audit" entries to:
-
Confirm scans are being triggered for specific upload endpoints.
-
Examine scan_duration_ms for latency issues.
-
Spot patterns (e.g., errors only for certain file sizes or types).
-
-
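If audit entries can be exported as one JSON object per line, a few lines of Ruby are enough to spot latency or error patterns. The export format is an assumption (Datadog exports may be shaped differently); the field names follow the audit fields listed earlier on this page, and the upload_context values in the sample are made up.

```ruby
require 'json'

# Groups audit entries by scan_result and reports count, worst latency,
# and the upload_context of the slowest scan in each group.
def summarize_audit(lines)
  entries = lines.map { |line| JSON.parse(line) }
                 .select { |entry| entry['event'] == 'virus_scan' }

  entries.group_by { |entry| entry['scan_result'] }.transform_values do |group|
    durations = group.map { |entry| entry['scan_duration_ms'] }.compact
    slowest = group.max_by { |entry| entry['scan_duration_ms'] || 0 }
    { count: group.size, max_ms: durations.max, slowest_context: slowest['upload_context'] }
  end
end

# Hypothetical sample lines; error entries may carry no duration.
sample_lines = [
  '{"event":"virus_scan","scan_result":"clean","scan_duration_ms":120,"upload_context":"FormA"}',
  '{"event":"virus_scan","scan_result":"clean","scan_duration_ms":900,"upload_context":"FormB"}',
  '{"event":"virus_scan","scan_result":"error","upload_context":"FormB"}'
]

summarize_audit(sample_lines).each do |result, stats|
  puts "#{result}: #{stats}"
end
```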
4. Common remediation steps
-
ClamAV container repeatedly failing
-
Check:
-
Image version changes.
-
ClamAV configuration.
-
Resource limits/requests.
-
-
Mitigations:
-
Increase memory/CPU.
-
Adjust DB loading or update behavior if too heavy.
-
Roll back to a previous known-good image if a new release is faulty.
-
-
-
Connection errors from Rails
-
Validate:
-
ClamAV daemon is listening on expected host/port.
-
No network policy changes blocking traffic inside the pod.
-
-
Consider:
-
Restarting affected pods.
-
Temporarily enabling Settings.clamav.mock only if acceptable from a risk standpoint (and documenting the window).
-
-
-
Slow scans
-
Look for:
-
Large or numerous concurrent uploads.
-
High CPU contention in the ClamAV sidecar container(s).
-
-
Options:
-
Increase resources.
-
Rate-limit or size-limit uploads upstream.
-
Add retry behavior or backpressure in upload flows.
-
-
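To validate that the daemon is listening, a direct PING over TCP is a quick check that bypasses the Rails client entirely. clamd answers the PING command with PONG; the sketch uses the null-delimited form (zPING). The host and port defaults are assumptions: 3310 is clamd's conventional TCP port, but this page does not state which port the sidecar uses. clamd_reachable? is a hypothetical helper name.

```ruby
require 'socket'
require 'timeout'

# Returns true when a clamd daemon at host:port answers PING with PONG.
# Any connection failure or timeout is reported as false.
def clamd_reachable?(host: '127.0.0.1', port: 3310, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    sock.write("zPING\0") # "z" prefix: command and reply are NUL-terminated
    reply = Timeout.timeout(timeout) { sock.gets("\0") }
    reply.to_s.delete("\0").strip == 'PONG'
  end
rescue SystemCallError, SocketError, IOError, Timeout::Error
  false
end

# Usage (run from a shell inside the pod, where the sidecar is on localhost):
#   puts clamd_reachable? ? 'clamd is answering' : 'clamd is not reachable'
```

A false result here, combined with a running ClamAV container, points at port or configuration problems rather than at vets-api.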
Impact on vets-api pods and request handling
-
Pod health
-
Any ClamAV sidecar failure can:
-
Fail readiness/liveness probes.
-
Trigger pod restarts and churn.
-
-
Systemic ClamAV issues (bad image, DB problems) can reduce overall cluster capacity.
-
-
Request behavior
-
For endpoints using UploaderVirusScan:
-
Requests block until Common::VirusScan.scan completes.
-
If:
-
Scan returns true: upload proceeds and file is stored.
-
Scan returns false: upload is rejected with VirusFoundError; file is deleted.
-
Scan raises error: request typically fails with a 5xx (depending on controller handling).
-
-
-
-
Observability
-
Combine:
-
Application logs ("ClamAV Virus Scan Audit", Rails exceptions).
-
Sidecar logs (daemon startup, DB load, errors).
-
GitHub Actions workflows for image/DB pipeline health.
-
-
ClamAV image and database pipelines (vsp-infra-clamav)
Repo: vsp-infra-clamav
Image mirroring to ECR (mirror-images.yml)
-
Trigger
-
Daily cron around 12:30 PM Eastern (with separate EST/EDT entries) plus on-demand workflow_dispatch.
-
time-gated-job ensures the job only runs when the Eastern hour is 12 unless manually triggered.
-
-
Behavior
-
prepare-build:
-
Checks out the repo.
-
Reads versions.json and exports .components as a JSON array (config).
-
-
mirror:
-
Matrix over config; each entry has version and repo.
-
Sets NOW (e.g., YYYY-MM-DD-HH) for tagging.
-
Configures AWS credentials and logs into ECR in us-gov-west-1.
-
Builds the ClamAV Docker image from ./Dockerfile with:
-
APP_VERSION=${{ matrix.versions.version }}
-
REPO=${{ matrix.versions.repo }}
-
-
Pushes the image to:
-
${registry}/dsva/clamav:${GITHUB_SHA}-${NOW}
-
-
-
-
Failure handling
-
notify-on-failure sends a Slack alert to channel #platform-cop-be-notifications when the mirror job fails:
-
Explains that vets-api cannot receive updated ClamAV images and that downstream “Release and Update Manifests” workflows are blocked.
-
Suggests checking:
-
Docker build errors.
-
freshclam issues (e.g., CDN/network).
-
ECR login/push permissions.
-
Base image pulls (e.g., clamav/clamav:1.4 from Docker Hub).
-
-
-
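The matrix and tagging steps can be illustrated with a small script. Everything concrete here is assumed for illustration: the versions.json content, the registry name, and the commit SHA are placeholders, and whether NOW is computed in UTC is not stated on this page. Only the tag layout (dsva/clamav:<sha>-<YYYY-MM-DD-HH>) comes from the workflow description above.

```ruby
require 'json'

# Hypothetical versions.json content; the real file's values are not shown here.
versions_json = <<~VERSIONS
  { "components": [ { "version": "1.4", "repo": "clamav/clamav" } ] }
VERSIONS

# Builds the image reference in the layout described above.
def image_tag(registry:, sha:, now:)
  "#{registry}/dsva/clamav:#{sha}-#{now.strftime('%Y-%m-%d-%H')}"
end

config = JSON.parse(versions_json).fetch('components') # the matrix input
now = Time.utc(2024, 1, 2, 17)                         # fixed time for a repeatable example

config.each do |entry|
  puts "build args: APP_VERSION=#{entry['version']} REPO=#{entry['repo']}"
  puts "push to:    #{image_tag(registry: 'registry.example', sha: 'abc123', now: now)}"
end
```

Because the hour is part of the tag, two runs on the same commit within the same hour would produce the same tag.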
Virus database sync to S3 (s3_sync.yml)
-
Trigger
-
Twice daily:
-
Cron for 12:01 AM / 12:01 PM Eastern (EST + EDT variants).
-
-
Also supports manual workflow_dispatch.
-
time-gated-job:
-
If manually triggered: always enables the job.
-
If scheduled: only continues when current Eastern hour is 0 or 12.
-
-
-
Behavior
-
upload_to_s3 (runs only when gated output is true):
-
Checks out the s3-upload branch.
-
Assumes an AWS role via OIDC in us-gov-west-1.
-
Logs into Docker Hub.
-
Builds the ClamAV Docker image from ./Dockerfile and loads it locally as clamav-image:latest.
-
Creates a container: clamav-container.
-
Copies database files out of the container into a local database/ directory:
-
data/bytecode.cvd
-
data/main.cvd
-
data/daily.cvd
-
-
Uploads database/ to AWS S3.
-
-
-
Failure handling
-
notify-on-failure sends a Slack alert to channel #platform-cop-be-notifications when the S3 sync fails:
-
Explains that ClamAV DBs (main.cvd, daily.cvd, bytecode.cvd) were not updated in S3.
-
Suggests checking:
-
Docker build/container creation.
-
docker cp for DB extraction.
-
AWS credentials, bucket permissions, and network to us-gov-west-1.
-
-
Calls out the risk of stale virus definitions for any infrastructure pulling DBs from dsva-vetsgov-utility-clamav.
-
-
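The time gate exists because GitHub Actions cron schedules are evaluated in UTC, so a fixed Eastern wall-clock time needs one cron entry per UTC offset (EST and EDT), and the gate discards whichever entry fires at the wrong local hour for the season. The sketch below shows one way such a gate could be computed. It is an assumption about the logic, not the workflow's actual script, and it hard-codes the current US daylight saving rule (second Sunday in March to first Sunday in November).

```ruby
require 'date'

# Date of the nth (1-based) Sunday of a month.
def nth_sunday(year, month, n)
  first = Date.new(year, month, 1)
  first + ((7 - first.wday) % 7) + 7 * (n - 1)
end

# True when US Eastern time observes daylight saving time at the given UTC instant.
def eastern_dst?(utc)
  starts = Time.utc(utc.year, 3, nth_sunday(utc.year, 3, 2).day, 7)  # 2:00 AM EST = 07:00 UTC
  ends = Time.utc(utc.year, 11, nth_sunday(utc.year, 11, 1).day, 6)  # 2:00 AM EDT = 06:00 UTC
  utc >= starts && utc < ends
end

def eastern_hour(utc)
  (utc.hour + (eastern_dst?(utc) ? -4 : -5)) % 24
end

# Manual runs always pass; scheduled runs pass only at an allowed Eastern hour.
def gate_open?(utc, allowed_hours:, manual: false)
  manual || allowed_hours.include?(eastern_hour(utc))
end

# s3_sync.yml gates on hours 0 and 12; mirror-images.yml gates on hour 12.
[Time.utc(2024, 7, 1, 16, 1), Time.utc(2024, 7, 1, 17, 1)].each do |fired_at|
  puts "#{fired_at}: gate open? #{gate_open?(fired_at, allowed_hours: [0, 12])}"
end
```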
Relationship to vets-api reliability
-
If image mirroring fails:
-
New ClamAV image versions are not pushed to ECR.
-
vets-api environments may:
-
Continue using older images with outdated ClamAV or OS components.
-
Experience deployment failures (
ErrImagePull) if no valid image exists.
-
-
-
If DB sync fails:
-
S3 bucket may contain stale DB files.
-
ClamAV sidecars that depend on S3 for DBs will run with old virus signatures.
-
Virus scanning remains functional but is less effective against new threats.
-
Monitoring
These are all Datadog links that assume you have Datadog access.