Unit testing an ElastAlert rule using elastalert-ci

This post refers to an early version of elastalert-ci, and technical implementation details mentioned below may not apply. Please read the README on the project repository for accurate information on how to use elastalert-ci within your project.

When I wrote my original post on unit testing for ElastAlert earlier this year, I cunningly didn’t go into very much detail on how a user should create the data required for the unit test to run against. This was largely because I hadn’t worked out the exact workflow I would use myself. Elasticsearch is relatively particular about how it wants data to be uploaded to it, with widespread usage of the .ndjson (newline-delimited JSON) format and the requirement that certain metadata fields are present. This means that it’s not as straightforward as downloading the data you want and being able to directly re-upload it to Elasticsearch. I made the call that for the first version, I would leave it up to anyone who cared enough to manipulate the data into the required format before using it.
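To make the format problem concrete, here is a minimal Python sketch of how hits from a Search API response could be reshaped into the newline-delimited body that the Elasticsearch Bulk API accepts. Each document becomes two lines: an action line naming the target index, then the document source. The function name `to_bulk_ndjson` and the sample hit are invented for illustration, and this is not necessarily how elastalert-ci loads its data.

```python
import json


def to_bulk_ndjson(hits, index_name):
    """Turn Search API hits into a Bulk API body (hypothetical helper)."""
    lines = []
    for hit in hits:
        # Action line: tells Elasticsearch which index the next line belongs to.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, without the search metadata.
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Invented sample hit, shaped like an entry in a Search API response.
sample_hits = [
    {"_index": "kibana_sample_data_logs", "_id": "1",
     "_source": {"agent": "Mozilla/5.0 Firefox/6.0a1"}},
]
print(to_bulk_ndjson(sample_hits, "kibana_sample_data_logs"), end="")
```

The metadata that comes back from a search (`_id`, `_score` and so on) is dropped here, which is exactly the kind of manipulation that makes a straight download and re-upload impractical.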

I found some time this week to sit down and test the process of developing a new unit-tested rule from sample data, which is fairly fundamental documentation for the package. I have also created a small helper script to download the data required from Elasticsearch in a format that the unit testing framework will be able to use automatically without further human intervention. Between the two, you should be able to go from an ElastAlert rule to a unit-tested ElastAlert rule in less than an hour.

To illustrate the process of writing a rule, I’m going to use sample data that comes with Kibana. To follow along, you will therefore need to install Elasticsearch and Kibana. I used the ECK quickstart on Minikube, but any Elasticsearch + Kibana setup will do. You will also need some familiarity with querying Elasticsearch via the Search API.

On the Kibana homepage, click on ‘Load a data set and a Kibana dashboard’, and on the following page, on the card titled ‘Sample web logs’, click ‘Add data’. Kibana should set up the data for you, and display a success message when it is done.

The sample web logs are the sort of access logs that you would receive from a web server. For our example, let’s say that we’re interested in alerting if we see any access log entries from Firefox user agents, because we all know Firefox users are deviants who must be punished.

An ElastAlert rule for this might look like:

name: "Catch Firefox users"
description: "Alert whenever we see a Firefox user in the logs"

index: kibana_sample_data_logs
use_ssl: True
type: any
filter:
  - query_string:
      query: "*Firefox*"

alert_text: "test alert"
alert:
  - "debug"

Now, let’s say we wanted to unit test whether this alert would actually work against real data in the index. elastalert-ci is built to integrate closely with CircleCI, but can also be used locally, which is what I’m going to do here.

Steps

Clone elastalert-ci, and cd into the root directory of the repository.
Copy the rule above into a new YAML file. Save it as sample_rule.yaml.
The first unit-testing step is to extract the data that you want to test against from Elasticsearch, which is where the helper script does the work. The helper script currently requires the ES_USERNAME, ES_PASSWORD, ES_HOST and ES_PORT environment variables to be set, so set those to your local Elasticsearch environment.
Write a search query using the Search API to get a subset of the data that you would like the unit test rule to run against. Refer to the Search API documentation if you aren’t familiar with how the Search API works. It might also be useful to use Kibana’s Dev Tools to play around with the query until you’re receiving the data that you want.
Convert the query to an argument that you can pass to the exporter script in util/es-data-exporter.py. For example:
```
 GET kibana_sample_data_logs/_search
 {
   "query": {
     "match_all": {}
   }
 }
```
would translate to:

python3 util/es-data-exporter.py --index kibana_sample_data_logs --query "{\"query\": {\"match_all\": {}}}" > access-logs.json
Run the above command.
Update the data-file.yaml data configuration, adding in an entry for the access log data file. Something like:
```
weblogs:
  filename: "access-logs.json"
  timestamp_field: "timestamp"
  start_time: "2020-05-20T00:39:02"
  end_time: "2020-09-20T06:15:34"
```
Note: You will have to define your own start and end times based on the start and end times of the data in your index. They don’t have to match the first and last record of the access-logs.json data exactly, but the time period defined must cover the records that you want to run ElastAlert against. Defining a wide time period here is fine, but it will also increase the time taken by the script to run.
Add an annotation to your sample_rule.yaml, telling it what data file the unit test will require:

ci_data_source: "weblogs"
Add the rule to the --rules argument in the Dockerfile.
Run the tests! Run sudo docker-compose build and then sudo docker-compose up --abort-on-container-exit.
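For the start_time and end_time values in the data configuration step above, a small Python sketch like the following could derive a window from the documents themselves. It assumes the documents are already loaded as dictionaries with an ISO 8601 timestamp field (as the Kibana sample web logs have); the function names and the one-minute padding are my own choices for illustration, and how elastalert-ci interprets time zones in these fields is not covered here.

```python
from datetime import datetime, timedelta


def parse_ts(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def time_window(docs, timestamp_field="timestamp", padding=timedelta(minutes=1)):
    """Return (start_time, end_time) strings covering every document.

    Hypothetical helper: pads both ends so that truncating fractional
    seconds can never cut off the first or last record.
    """
    stamps = [parse_ts(doc[timestamp_field]) for doc in docs]
    if not stamps:
        raise ValueError("no documents to compute a time window from")
    fmt = "%Y-%m-%dT%H:%M:%S"
    start = (min(stamps) - padding).strftime(fmt)
    end = (max(stamps) + padding).strftime(fmt)
    return start, end


# Invented sample documents for illustration.
docs = [
    {"timestamp": "2020-05-20T00:39:02.912Z"},
    {"timestamp": "2020-09-20T06:15:34.000Z"},
]
start_time, end_time = time_window(docs)
print("start_time:", start_time)
print("end_time:", end_time)
```

Since the window only has to cover the records, erring on the wide side with the padding is harmless apart from a slightly longer run.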

If everything is successful, the containers should exit with:

elastalert-ci_1  | Testing Catch Firefox users
elastalert-ci_1  | 2020/10/04 09:18:07 Command finished successfully.
elastalert-ci_elastalert-ci_1 exited with code 0

You can try changing the rule to match on random text to verify that the run fails when the rule doesn’t match anything.

The most time-consuming part of this will be formulating the query needed to grab the data, but ideally multiple rules can reference a single data file, which should reduce the overhead of writing tests against the same data sources.
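One fiddly part of the query step is escaping the JSON by hand when turning it into a command-line argument. As a rough sketch, Python's standard library can do the quoting for you. The `--index` and `--query` flags are the ones used in the export command earlier in this post; the `build_export_command` helper is hypothetical and only builds the command string, it does not run it.

```python
import json
import shlex


def build_export_command(index, query, output_file):
    """Build a shell command line for the exporter script (hypothetical helper)."""
    # Serialise the query body, then quote it so the shell passes it as one argument.
    query_json = json.dumps(query)
    return "python3 util/es-data-exporter.py --index {} --query {} > {}".format(
        shlex.quote(index), shlex.quote(query_json), shlex.quote(output_file)
    )


# The same match_all query used in the steps above, written as a Python dict.
query = {"query": {"match_all": {}}}
command = build_export_command("kibana_sample_data_logs", query, "access-logs.json")
print(command)
```

Writing the query as a dictionary also means a malformed query fails in Python before it ever reaches Elasticsearch.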