Run load tests before launching new endpoints or making substantial updates to existing ones, to ensure the stability of the API. Static endpoints have no explicit need for this, but the loadtest directory of the devops repo may contain examples that exercise infrastructure components to find bottlenecks, with the same goals outlined here (for example, while tuning the revproxy layer).

Keep the following goals in mind while writing and conducting a test of your new feature. The goals may differ slightly when launching Platform-level features.

  • Identification of bottlenecks and operational concerns

  • Validation that any SLA given for a new service is achievable with the current implementation

  • Estimating upper bounds for traffic for a given service

Before You Begin

To load test you need access to

Load Test Tools

There are two main tools that we use for load testing:

  • Locust lets you define test behavior in Python code and has a web interface for interactive testing. It also supports creating a swarm of test instances, and it breaks down latency measurements by individual request path. In the current version, its text output format is somewhat lacking, which makes scripted testing and report generation difficult.

  • wrk2 claims to have more accurate latency measurements by using HdrHistogram and by accounting for coordinated omission at the limits of system throughput. Simple tests against a single endpoint can be triggered easily, or test behavior can be scripted in Lua. It has a well-defined report format, and reporting behavior can be overridden via the scripting API. Install instructions are found on the wiki.

Dependencies

Navigate to the loadtest folder of the devops repo and install the dependencies:

  • by using pipenv: pipenv install OR

  • directly: pip install -r requirements.txt

Load Testing

Create a folder inside devops/loadtest to hold scripts that will load test your application. Look to existing scripts for examples or visit the Locust docs. There are two ways to run your load tests: via the command line or (if you’re using Locust) through a web interface.

Load Testing with the Command Line

A small script at loadtest/loadtest_runner allows you to run a load test and record information to make report generation easier. It uses the following arguments:

  • -d, --dir: Directory containing the load test files, which the runner changes into, e.g. search

  • -t, --test_type: Default 'locust', set to 'wrk2' to run wrk2 tests

  • argument_input: A file of command line arguments to pass to the load testing tool

An example

❗️Warning: setting -c higher than 10 can exhaust the available staging connections.

# appealsv2/test_args

-H https://staging-api.va.gov -c 10 -r 2 -t 10m --no-web -f appealsv2_locust.py --only-summary

# running the loadtest

./loadtest_runner -d appealsv2/ -t locust appealsv2/test_args

The runner executes the test and produces an output file named {start_timestamp}-output.loadtest that can be passed to the report generation tool.
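Because an oversized -c value can exhaust staging connections, it can help to check an arguments file before handing it to the runner. The following is a minimal sketch, not part of loadtest_runner: the helper names client_count and within_staging_limit are hypothetical, and it assumes the arguments file holds whitespace-separated flags with the -c value given as a separate token.

```python
import shlex

# Limit taken from the warning above; adjust if the staging capacity changes.
MAX_STAGING_CLIENTS = 10


def client_count(args_text):
    """Return the integer following -c in an argument string, or None if absent.

    Assumes the value is a separate token ("-c 10"), not attached ("-c10").
    """
    tokens = shlex.split(args_text)
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-c":
            return int(value)
    return None


def within_staging_limit(args_text):
    """True when -c is missing or does not exceed MAX_STAGING_CLIENTS."""
    count = client_count(args_text)
    return count is None or count <= MAX_STAGING_CLIENTS


EXAMPLE_ARGS = (
    "-H https://staging-api.va.gov -c 10 -r 2 -t 10m "
    "--no-web -f appealsv2_locust.py --only-summary"
)
print(within_staging_limit(EXAMPLE_ARGS))  # True
```

A check like this could be run by hand or in a pre-commit hook before a new test_args file is added to the repo.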

Report Generation

The scripts in the loadtest/report directory read the output from one of the above load testing tools and query Prometheus metrics to generate a report of how the infrastructure behaved during the test.

These tools are wrapped by a script at loadtest/report_builder that can be run to generate the report from the output of loadtest/loadtest_runner.

The main parameters to specify are the test output file and the environment to query. Sample command line:

./report_builder -o appeals_report.html -e staging -s "Appealsv2 Test" --component_metrics appealsv2 1490979367-output.loadtest

  • -s: Short descriptive name of test

  • -e: Environment (dev/staging/prod)

  • -o: Report output filename, defaults to report.html

  • --component_metrics: Collect latency and request metrics for labeled vets-api components

  • --service_metrics: Collect latency and request metrics for Kong services. Services must be referred to by their Kong name and be comma-separated, e.g. --service_metrics va_facilities,vet360_address_validation

  • --extra_metrics: Additional metrics to plot which may not be of interest for all tests. Available metrics are defined in report/definitions.yml. Extra metrics should be a comma-separated string, e.g. --extra_metrics gateway_instance_cpu,gateway_instance_memory

  • -h, --help: for a full list of options
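When reports are generated from a script, the command line can be assembled from the options listed above. This is a minimal sketch under stated assumptions: report_builder_command is a hypothetical helper (it is not part of the devops repo), it only uses the flags documented here, and it treats the environment values dev, staging, and prod as the full allowed set.

```python
import shlex

ENVIRONMENTS = {"dev", "staging", "prod"}  # values listed for -e above


def report_builder_command(loadtest_output, environment, name,
                           output="report.html", component_metrics=None,
                           service_metrics=(), extra_metrics=()):
    """Build the argv list for ./report_builder from the documented options."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    argv = ["./report_builder", "-o", output, "-e", environment, "-s", name]
    if component_metrics:
        argv += ["--component_metrics", component_metrics]
    if service_metrics:
        # Kong service names are passed as one comma-separated string.
        argv += ["--service_metrics", ",".join(service_metrics)]
    if extra_metrics:
        # Metric names come from report/definitions.yml, comma-separated.
        argv += ["--extra_metrics", ",".join(extra_metrics)]
    argv.append(loadtest_output)
    return argv


argv = report_builder_command(
    "1490979367-output.loadtest", "staging", "Appealsv2 Test",
    output="appeals_report.html", component_metrics="appealsv2",
)
# shlex.join quotes arguments containing spaces, such as the test name.
print(shlex.join(argv))
```

Returning a list rather than a string keeps the result safe to pass directly to subprocess.run without shell quoting issues.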

Load Testing with Web Interface

Locust has a web interface for running your load testing scripts. After writing your script (and making sure that you have installed Locust), change to the directory containing your script and do the following:

  1. locust -f my_script.py

  2. visit http://0.0.0.0:8089/

You will see something like the following:

Start a new load test with the Locust web interface

  • Use the following formula to determine the Number of Users:

(Peak Hourly Users * Average Session in seconds) / 3600

Example: suppose you expect 1,000 users per hour to access your product during its peak usage, and the average session length for those users is 5 minutes (300 seconds). Then the number of users is (1000 * 300) / 3600 ≈ 83 users.

  • For the host, enter the endpoint your test will hit, e.g. staging-api.va.gov.
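The Number of Users formula above can be sketched as a small function. This is an illustration only; the name concurrent_users is hypothetical, and rounding to the nearest whole user is an assumption chosen to match the ~83 in the example.

```python
def concurrent_users(peak_hourly_users, avg_session_seconds):
    """Estimate simultaneous users.

    Implements (Peak Hourly Users * Average Session in seconds) / 3600,
    rounded to the nearest whole user.
    """
    return round(peak_hourly_users * avg_session_seconds / 3600)


# 1,000 users per hour with 5-minute (300-second) sessions.
print(concurrent_users(1000, 300))  # 83
```

The formula multiplies the arrival rate (users per second) by the time each user stays, which gives the average number of users present at once.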

After you click the Start swarming button you will see a result like the following:

Results after running a Locust load test

You can click the Download tab to download CSV data of the results.