Open source continuous integration for Elastalert rules


I have created a Docker image that can be used to continuously test Elastalert rules against Elasticsearch data, to verify that new rules and edits to existing rules work as expected.

If you want to head straight to the code, the CI image is defined here, and you can also look at how a CircleCI config that uses the image should be structured here.


I’ve been using Elastalert quite extensively for a few months now, and one of the processes that isn’t as neat as it could be is testing. The current workflow I use is to write a rule that looks sensible, and then either test it locally or simply push the rule and perform the actions that should trigger an alert on the running instance. The first method is fairly time-consuming and not easy to replicate in an identical manner; the second is time-consuming, annoying for other users of the system, and somewhat risky. Neither option is particularly ‘DevSecOps’ either.

In addition, I’ve been meaning to play with CircleCI for some time, because it’s a word that I’ve been hearing frequently and I wanted to see what it was like.

When Marco mentioned that Elastalert provided an elastalert-test-rule script, I thought that this would be a simple project - simply wrap that test script into a Docker container and use that container in CircleCI. The reality, as it always does, turned out to be significantly more involved, but I got there in the end.


The aim

elastalert-test-rule was described in a way that made it seem ideal to use as-is in a Dockerfile. It claims to provide a --data argument that allows a user to submit a file of Elasticsearch data that will be matched against locally (without spinning up an Elasticsearch instance), and a --formatted-output argument that “[outputs] results in formatted JSON”.

My initial idea was to use these options to create a single-container CI image that, when provided with a rule, would use the testing system’s mocking functionality to run matching rules. I would then capture the ‘formatted JSON’ output, and return whether a rule matched or not.

Problem 1: the script doesn’t work

The first problem I came up against (and a worrying one at that) was that the elastalert-test-rule script doesn’t work. A few months ago, Elastalert made the switch from Python 2 to Python 3, and it looks as though some functionality was forgotten in the process. I put in a PR for the issue I was having, but it was clear that I was the only person to have used this script with the --data flag for a long while. While I waited for the PR to be accepted, I used sed in the Dockerfile to carry out my change on-the-fly at build time.

Problem 2: --data doesn’t work as described


In the Elastalert documentation, the --data flag is described as allowing you to “use a JSON file as a data source _instead_ of Elasticsearch” (emphasis mine). This was the main attraction for me, as it would reduce the overhead of running a test. When actually running the script, however, it would error if it wasn’t able to contact Elasticsearch, as the script was trying to create an Elasticsearch object regardless of whether the --data flag was provided.

In addition, I discovered through testing that using the --data flag doesn’t apply any of the filters that your rule defines. For example, if you supply elastalert-test-rule with a dataset that has 1000 records, and the filter in the rule you are testing would exclude every one of those records, elastalert-test-rule --data would report 1000 hits regardless. In hindsight, this is somewhat obvious - Elasticsearch handles the querying and it would take significant work for Elastalert to mock this functionality. Either way, the limitation isn’t obvious from the documentation, and severely reduces the value of the --data option. I would need to find a way of running an Elasticsearch container in addition to the Elastalert container, and making the two of them talk to each other.

This was probably the most time-consuming challenge. For local testing, I switched to using docker-compose, which allows you to manage multiple containers using a couple of commands, allowing those containers to talk to each other on their own network. CircleCI also allows you to do similar things, which made it surprisingly easy to spin up an Elasticsearch container that would sit in the background waiting for Elastalert to run against it on either platform. I wrote a wrapper script to avoid using --data, instead uploading the necessary data directly to Elasticsearch.
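As a rough illustration of what uploading test data directly to Elasticsearch can look like, here is a minimal Python sketch built on the Elasticsearch _bulk endpoint. It is not the wrapper script itself: the function names, the index name and the base URL are all hypothetical, and it assumes an Elasticsearch version that does not require a document type in the action line.

```python
import json
import urllib.request


def build_bulk_payload(index_name, documents):
    """Serialise documents into the newline-delimited body that _bulk expects."""
    lines = []
    for doc in documents:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must finish with a trailing newline.
    return "\n".join(lines) + "\n"


def upload(es_url, index_name, documents):
    """POST the documents to Elasticsearch (es_url is a hypothetical base URL)."""
    request = urllib.request.Request(
        # refresh=true makes the documents searchable as soon as the call returns.
        f"{es_url}/_bulk?refresh=true",
        data=build_bulk_payload(index_name, documents).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Because the data ends up in a real index, the filters in the rule are evaluated by Elasticsearch itself, which is the behaviour that --data could not provide.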

My final issue on this front was that Elasticsearch takes quite a while to start up, especially when running on low-spec containers. Elastalert would start up much faster, and would then exit on not being able to talk to Elasticsearch. I discovered dockerize, which can be used as a wrapper around your Docker CMD with the -wait option. This basically polls an HTTP endpoint every second for a predetermined amount of time; if/when the endpoint returns a 200 status code, dockerize hands over to your Docker CMD. This stopped my Elastalert container from panicking immediately at runtime, and hasn’t failed to work yet, even though it’s quite a crude solution.
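To make the polling behaviour concrete, here is a small Python sketch of the same idea: keep requesting a URL until it answers with a 200 or a deadline passes. This is only an approximation of what dockerize does, not its implementation, and the function names and default values are my own assumptions.

```python
import time
import urllib.error
import urllib.request


def http_ok(url):
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout or a non-2xx status.
        return False


def wait_for(url, timeout_seconds=60, interval_seconds=1,
             probe=http_ok, sleep=time.sleep):
    """Poll url until probe succeeds or the timeout elapses; return success."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_seconds)
```

The probe and sleep parameters exist only so that the loop can be exercised without a network or real delays; in normal use the defaults are enough.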


Problem 3: --formatted-output is not entirely formatted

By the time I got to this point I had what I thought were the most complicated parts of the puzzle sorted: I had a pair of containers that could run in either docker-compose or CircleCI, uploading test data to the Elasticsearch instance and then running Elastalert against it. I then began wrapping the Elastalert output: Elastalert would return a 0 exit code if it managed to run successfully regardless of whether rules matched or not, and I needed more granular insight into the rule output.

I discovered here that --formatted-output does indeed print the output as JSON, but there is no easy way to direct this output to a location of your choosing. I initially worked around this by modifying the elastalert-test-rule script directly, changing it to write to a file, but I then thought that it might be neater for me to simply capture STDOUT in my wrapper script and work with that. I then discovered that elastalert-test-rule with the --formatted-output option prints two lines of unformatted plaintext output before outputting the formatted JSON. At this point I was too tired to switch back to directly editing the testing script, so I simply used a Python regex to remove the first two lines of the captured output before parsing the JSON. I used to be a Perl developer in a previous life.
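For the curious, the regex trick can be sketched roughly as follows. This is a reconstruction rather than the exact code in the wrapper: the function name is hypothetical, and it assumes the preamble is always exactly two newline-terminated lines.

```python
import json
import re


def parse_formatted_output(raw_output, preamble_lines=2):
    """Drop the leading plaintext lines, then parse the remainder as JSON."""
    # Match exactly `preamble_lines` lines anchored at the start of the text.
    pattern = r"\A(?:[^\n]*\n){%d}" % preamble_lines
    json_text = re.sub(pattern, "", raw_output, count=1)
    return json.loads(json_text)
```

If the number of preamble lines ever changes, json.loads will raise an error rather than silently return the wrong thing, which is about as much safety as a hack like this deserves.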

The home stretch


These were all the key problems fixed: I finally had a system by which I could run a rule against some data and check whether it fired or not, returning an appropriate exit code depending on the result. In a final burst of motivation, I also solved the following minor issues:

  1. Timestamps: Elastalert normally runs over the last day of data, but I found a poorly documented elastalert-test-rule feature that allows you to provide --start and --end times, so that you don’t have to keep changing the timestamps on your data files. I added the concept of a ‘data index’ that should live in the repository with the rules and the data, which defines metadata about the dataset, and allows multiple rules to easily point to the same dataset.
  2. Recursion: I used CircleCI’s globbing functionality to find all the rules within a folder, even if they are nested, which I expect is a common use case. Rules that don’t have CI metadata defined are skipped by the wrapper script.
  3. Parallelism: I used CircleCI’s parallelism and test splitting functionality to run the tests in parallel, which I expect will help significantly with large rulesets.
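As an illustration of the timestamps point, the sketch below shows how an entry in a data index could be turned into an elastalert-test-rule invocation with fixed --start and --end times. The layout of the index entry, the file paths and the function name are hypothetical and may not match the format used in the repository; only the command-line flags are taken from elastalert-test-rule itself.

```python
def build_test_command(rule_path, dataset, config_path="config.yaml"):
    """Build the argument list for one rule and one dataset from the data index."""
    return [
        "elastalert-test-rule",
        "--config", config_path,
        # Fixed bounds mean the timestamps in the data file never need updating.
        "--start", dataset["start"],
        "--end", dataset["end"],
        "--formatted-output",
        rule_path,
    ]


# Hypothetical data index: several rules can point at the same named dataset.
data_index = {
    "logins": {
        "file": "data/logins.json",
        "start": "2020-05-01T00:00:00",
        "end": "2020-05-02T00:00:00",
    },
}

command = build_test_command("rules/example_rule.yaml", data_index["logins"])
```

The resulting list can be passed straight to subprocess.run, with the captured STDOUT handed to whatever parses the JSON.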

The CI image and an example ruleset are both on GitHub for those interested.

Future work

The current script is quite brittle - in the excitement of making something I haven’t really been focusing too hard on edge cases. If there is interest, or if I actually end up using this for practical purposes, I would want to tidy the code up significantly so that it is more maintainable.

The wrapper function is also quite naive - it can tell you that an alert has fired, but can’t tell you the number of matches to the rule, or whether some metadata you wanted in the alert text has been correctly picked up. I do hope to add this functionality soon, as it shouldn’t be too difficult.

It should also be possible to state that a particular dataset shouldn’t cause a rule to fire, and test for that. That functionality doesn’t exist yet.

It should also be possible to easily run a single rule against multiple datasets, testing its behaviour against each one. Again, this isn’t possible yet.

Ideally, however, this is a starting point to creating reliable, repeatable CI builds for Elastalert rules. For security, which is the field that I work in, being able to assure ourselves that our rules work as we expect them to is a big piece of knowing that we are functional as a team. I have seen other mentions of CI pipelines for security detection rules, but I haven’t seen any open source work in the area, so hopefully this is a useful contribution as a starting point.

Written by Feroz Salam on 17 May 2020