<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Padlock</title>
    <description>Notes about information security, written by Feroz Salam.
</description>
    <link>https://padlock.argh.in/</link>
    <atom:link href="https://padlock.argh.in/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sun, 15 Dec 2024 17:41:34 +0000</pubDate>
    <lastBuildDate>Sun, 15 Dec 2024 17:41:34 +0000</lastBuildDate>
    <generator>Jekyll v4.3.3</generator>
    
      <item>
        <title>Container capabilities: a short tour</title>
        <description>&lt;p&gt;A while ago, for reasons that I no longer remember clearly, I spent some time investigating the differences between different configurations of Docker containers and the implications they had on the Linux capabilities the resulting containers would have. I find that advice around Docker security is often a bit muddled – a common occurrence of this confusion is the use of ‘privileged’ and ‘root’ as interchangeable terms, when the implications of either choice for a container are different.&lt;/p&gt;

&lt;p&gt;The topic as a whole is surprisingly complex, but I thought it might be useful to compare Linux capabilities across different container configurations. In short, I’m trying to answer the question: what Linux capabilities does a particular configuration running on Docker give you? I’m going to consider the following combinations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A root container running with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag&lt;/li&gt;
  &lt;li&gt;A root container running without the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag&lt;/li&gt;
  &lt;li&gt;A non-root container running without the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag&lt;/li&gt;
  &lt;li&gt;A non-root container running with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While not exhaustive, I think the comparison is useful to highlight some interesting nuances of Docker security.&lt;/p&gt;

&lt;p&gt;For simplicity, I’m not going to consider containerization engines other than Docker, and I’m also going to (mostly) ignore other ways of setting capabilities. I’m also going to focus exclusively on capabilities, but both the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag and the root user have other security implications.&lt;/p&gt;

&lt;h2 id=&quot;a-root-container-running-with-the---privileged-flag&quot;&gt;A root container running with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag&lt;/h2&gt;

&lt;p&gt;I’ll begin at the highly privileged end of the scale, looking at a containerized process running as root with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag enabled. This is probably the most straightforward case. Building and running an image from a Dockerfile like the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FROM ubuntu:latest  
ENV DEBIAN_FRONTEND=noninteractive  
RUN apt update &amp;amp;&amp;amp; apt -y -q install libcap2-bin iputils-ping python3 python3-pip  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker run -it --privileged my-root-container bash&lt;/code&gt;) puts us in a container with the following capability sets:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;root@91d6878472c6:/# grep Cap /proc/$$/task/$$/status  
CapInh:	0000000000000000  
CapPrm:	000001ffffffffff  
CapEff:	000001ffffffffff  
CapBnd:	000001ffffffffff  
CapAmb:	0000000000000000  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;capabilities&lt;/code&gt; &lt;a href=&quot;https://linux.die.net/man/7/capabilities&quot;&gt;man page&lt;/a&gt; explains the algorithm used to calculate the effective permissions set from the various permissions sets above (as well as what the sets mean).&lt;/p&gt;

&lt;p&gt;Decoding the &lt;em&gt;Effective&lt;/em&gt; set using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;capsh&lt;/code&gt; shows a pretty extensive list of permissions:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;root@91d6878472c6:/# capsh --decode=000001ffffffffff  
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
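As an aside, capsh is here just decoding a bitmask: each capability has a fixed number in linux/capability.h, and the hex values in /proc/PID/status are masks over those bits. A minimal Python sketch of the decoding (the name table assumes a recent kernel with 41 capabilities, matching the output above):

```python
# A minimal sketch of what `capsh --decode` does: treat the hex value from
# /proc/PID/status as a bitmask over the kernel's capability numbers.
# The table mirrors include/uapi/linux/capability.h (41 capabilities as of
# Linux 5.9; older kernels define fewer).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]

def decode(mask: int) -> list[str]:
    """Return the capability names whose bit is set in `mask`."""
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]

# The privileged container's effective set: all 41 bits are set.
print(",".join(decode(0x000001FFFFFFFFFF)))
# The default (non-privileged) set, used later in the post:
print(",".join(decode(0x00000000A80425FB)))
```

Bit 21 (cap_sys_admin), for example, is set in the privileged mask but absent from the default one.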

&lt;p&gt;This includes the highly privileged &lt;a href=&quot;https://lwn.net/Articles/486306/&quot;&gt;CAP_SYS_ADMIN&lt;/a&gt; capability, but there’s no real surprise here. A quick check in Python shows that we can, for example, use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cap_setuid&lt;/code&gt; to set the process’s UID:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;root@91d6878472c6:/# python3  
Python 3.12.3 (main, Nov  6 2024, 18:32:19) [GCC 13.2.0] on linux  
Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.  
&amp;gt;&amp;gt;&amp;gt; import os  
&amp;gt;&amp;gt;&amp;gt; os.setuid(65534)  
&amp;gt;&amp;gt;&amp;gt; import pwd  
&amp;gt;&amp;gt;&amp;gt; print(pwd.getpwuid(os.getuid()))  
pwd.struct_passwd(pw_name=&apos;nobody&apos;, pw_passwd=&apos;x&apos;, pw_uid=65534, pw_gid=65534, pw_gecos=&apos;nobody&apos;, pw_dir=&apos;/nonexistent&apos;, pw_shell=&apos;/usr/sbin/nologin&apos;)  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;h2 id=&quot;a-root-container-running-without-the---privileged-flag&quot;&gt;A root container running without the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag&lt;/h2&gt;

&lt;p&gt;Dropping the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag, as expected, yields a smaller set of default capabilities:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;root@4a54f2ed2f00:/# grep Cap /proc/$$/task/$$/status  
CapInh:	0000000000000000  
CapPrm:	00000000a80425fb  
CapEff:	00000000a80425fb  
CapBnd:	00000000a80425fb  
CapAmb:	0000000000000000  
root@4a54f2ed2f00:/# capsh --decode=00000000a80425fb  
0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; example, these capabilities are already in the effective set, which means that processes needing them will be able to run successfully (although you may need &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;setcap&lt;/code&gt; or similar to set the required file capabilities on the binaries you wish to execute; note that the list of effective process capabilities includes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CAP_SETFCAP&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Due to &lt;a href=&quot;https://linux.die.net/man/7/capabilities&quot;&gt;the way in which capabilities are calculated&lt;/a&gt;, you are strictly bound by the &lt;em&gt;Bounding&lt;/em&gt; set in this situation, so any further capabilities you need must be added via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--cap-add&lt;/code&gt; when starting the container. Adding file-level capabilities that fall outside the process’ bounding set and then attempting to execute those files will not succeed:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;root@3c456e81068c:/# getcap /usr/bin/ping  
/usr/bin/ping cap_net_raw=ep  
root@3c456e81068c:/# setcap &apos;cap_net_raw=ep cap_sys_admin=ep&apos; /usr/bin/ping  
root@3c456e81068c:/# getcap /usr/bin/ping  
/usr/bin/ping cap_net_raw,cap_sys_admin=ep  
root@3c456e81068c:/# ping  
bash: /usr/bin/ping: Operation not permitted  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is all still pretty much as you would expect, although the default set grants a fairly significant list of effective capabilities that could have security implications depending on your threat model.&lt;/p&gt;

&lt;h2 id=&quot;a-non-root-non-privileged-docker-container&quot;&gt;A non-root, non-privileged Docker container&lt;/h2&gt;

&lt;p&gt;What happens if you then switch away from the root user on a non-privileged container?&lt;/p&gt;

&lt;p&gt;My Dockerfile looks like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FROM ubuntu:latest  
ENV DEBIAN_FRONTEND=noninteractive  
RUN apt update &amp;amp;&amp;amp; apt -y -q install libcap2-bin iputils-ping  
USER nobody  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running the container and looking at the capability sets shows us the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nobody@f1bf44469f4a:/$ grep Cap /proc/$$/task/$$/status  
CapInh:	0000000000000000  
CapPrm:	0000000000000000  
CapEff:	0000000000000000  
CapBnd:	00000000a80425fb  
CapAmb:	0000000000000000  
nobody@8f013c859738:/$ capsh --decode=00000000a80425fb  
0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Even though the process’ &lt;em&gt;Effective&lt;/em&gt; and &lt;em&gt;Permitted&lt;/em&gt; sets are empty, this does not mean that you can’t run programs that require capabilities. For example, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ping&lt;/code&gt; installed by our Dockerfile again ships with CAP_NET_RAW as a file-level capability:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nobody@8f013c859738:/$ getcap /usr/bin/ping  
/usr/bin/ping cap_net_raw=ep  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and therefore you can run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ping&lt;/code&gt; without issues:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nobody@8f013c859738:/$ ping -c 1 1.1.1.1  
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.  
64 bytes from 1.1.1.1: icmp_seq=1 ttl=63 time=26.4 ms

--- 1.1.1.1 ping statistics ---  
1 packets transmitted, 1 received, 0% packet loss, time 0ms  
rtt min/avg/max/mdev = 26.355/26.355/26.355/0.000 ms  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This was initially unintuitive to me (as the &lt;em&gt;Permitted&lt;/em&gt; and &lt;em&gt;Effective&lt;/em&gt; capability sets are empty for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bash&lt;/code&gt; process that we examined), but it again comes down to the way in which the &lt;em&gt;Permitted&lt;/em&gt; set for a new process is calculated:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;P&apos;(permitted) = (P(inheritable) &amp;amp; F(inheritable)) | (F(permitted) &amp;amp; cap_bset)  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That is, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ping&lt;/code&gt; is permitted to use CAP_NET_RAW because CAP_NET_RAW is both in the bounding set (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CapBnd&lt;/code&gt; above) and the permitted set for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/usr/bin/ping&lt;/code&gt; file.&lt;/p&gt;
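In toy Python form (using the capability numbers from linux/capability.h, where CAP_NET_RAW is number 13), the rule plays out like this:

```python
# Toy model of the exec-time rule from capabilities(7):
#   P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & cap_bset)
CAP_NET_RAW = 1 << 13  # capability number 13 in linux/capability.h

def new_permitted(p_inheritable: int, f_inheritable: int,
                  f_permitted: int, bounding: int) -> int:
    """Permitted set of the new process after execve()."""
    return (p_inheritable & f_inheritable) | (f_permitted & bounding)

# Our non-root shell: process Inheritable/Permitted/Effective are all empty,
# but the bounding set still contains CAP_NET_RAW (bit 13 of 0xa80425fb).
bounding = 0x00000000A80425FB

# ping carries cap_net_raw=ep as a file capability, so F(permitted) has bit 13
# set, and the new process ends up with CAP_NET_RAW permitted:
print(hex(new_permitted(0, 0, CAP_NET_RAW, bounding)))

# After `docker run --cap-drop NET_RAW`, bit 13 leaves the bounding set and
# the same file capability yields nothing:
print(hex(new_permitted(0, 0, CAP_NET_RAW, bounding & ~CAP_NET_RAW)))
```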

&lt;p&gt;If there’s one takeaway from this section, it should be that when using Docker containers, dropping capabilities (potentially at the file level, but ideally at the bounding set level) can be quite important in pruning the attack paths available to an attacker – simply running as non-root and without the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag may not be sufficient, depending on your threat model.&lt;/p&gt;

&lt;p&gt;For example, in a container started with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker run -it --cap-drop NET_RAW non-root bash&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nobody@f51647bac472:/$ capsh --print  
…  
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap  
…  
nobody@f51647bac472:/$ ping  
bash: /usr/bin/ping: Operation not permitted  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;a-non-root-privileged-container-or-a-non-root-container-with---cap-add&quot;&gt;A non-root, privileged container (or a non-root container with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--cap-add&lt;/code&gt;)&lt;/h2&gt;

&lt;p&gt;This was the edge case that initially drove me down this rabbit hole, and the Docker permissions model in this context is curious.&lt;/p&gt;

&lt;p&gt;Using the non-root Docker image from the previous example, let’s also use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag when running the container. My initial expectation would be that the &lt;em&gt;Permitted&lt;/em&gt; and &lt;em&gt;Effective&lt;/em&gt; capability sets would be the same as in the root container, but that is not the case:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nobody@b802f123279c:/$ grep Cap /proc/$$/task/$$/status  
CapInh:	0000000000000000  
CapPrm:	0000000000000000  
CapEff:	0000000000000000  
CapBnd:	000001ffffffffff  
CapAmb:	0000000000000000  
nobody@b802f123279c:/$ capsh --decode=000001ffffffffff  
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So the &lt;em&gt;Bounding&lt;/em&gt; set gets all the capabilities, but the &lt;em&gt;Permitted&lt;/em&gt; and &lt;em&gt;Effective&lt;/em&gt; sets are completely empty. This effectively means that the capabilities you need have to be set via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;setcap&lt;/code&gt; on the files you wish to execute, at image build time. This won’t work, for example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nobody@418e021432ff:/$ python3  
Python 3.12.3 (main, Nov  6 2024, 18:32:19) [GCC 13.2.0] on linux  
Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.  
&amp;gt;&amp;gt;&amp;gt; import os  
&amp;gt;&amp;gt;&amp;gt; os.setuid(0)  
Traceback (most recent call last):  
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;  
PermissionError: [Errno 1] Operation not permitted  
&amp;gt;&amp;gt;&amp;gt; import prctl  
&amp;gt;&amp;gt;&amp;gt; prctl.cap_effective.setuid = True  
Traceback (most recent call last):  
…  
    return _prctl.set_caps(*_parse_caps(True, *args))  
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  
PermissionError: [Errno 1] Operation not permitted  
&amp;gt;&amp;gt;&amp;gt;  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Instead, your Dockerfile needs to look like this (note the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;setcap&lt;/code&gt; step):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FROM ubuntu:latest  
ENV DEBIAN_FRONTEND=noninteractive  
RUN apt update &amp;amp;&amp;amp; apt -y -q install libcap2-bin iputils-ping python3 python3-prctl  
RUN setcap &apos;cap_setuid=ep&apos; /usr/bin/python3.12  
USER nobody  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And then:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nobody@d31b4e5dc664:/$ python3  
…  
&amp;gt;&amp;gt;&amp;gt; import os  
&amp;gt;&amp;gt;&amp;gt; os.setuid(0)  
&amp;gt;&amp;gt;&amp;gt; os.getuid()  
0  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It’s non-intuitive to me that using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag with a non-root user does not immediately grant the user all capabilities. The Docker docs state that “The --privileged flag gives all capabilities to the container”, which is straightforwardly true for root users but not for other users. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--cap-add&lt;/code&gt; flag works similarly: it only changes the &lt;em&gt;Bounding&lt;/em&gt; set rather than the &lt;em&gt;Effective&lt;/em&gt; or &lt;em&gt;Permitted&lt;/em&gt; sets, so using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;setcap&lt;/code&gt; is still required.&lt;/p&gt;
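The behaviour falls out of the special-casing of UID 0 at exec time: per capabilities(7), when a binary without file capabilities is executed by a root process, the file inheritable and permitted sets are notionally all-ones and the file effective bit is on, while for any other user they come from the file's actual extended attributes. A toy model (ignoring ambient capabilities and securebits):

```python
# Toy model of why --privileged fills the Permitted/Effective sets for root
# but not for other users, per the execve() rules in capabilities(7).
ALL_CAPS = 0x000001FFFFFFFFFF  # the full 41-capability set

def exec_sets(uid: int, bounding: int,
              f_inheritable: int = 0, f_permitted: int = 0,
              f_effective: bool = False, p_inheritable: int = 0):
    """Return (permitted, effective) for the new process after execve()."""
    if uid == 0:
        # Root special case: file inheritable/permitted sets are treated as
        # all-ones, and the file effective bit as set.
        f_inheritable = f_permitted = ALL_CAPS
        f_effective = True
    permitted = (p_inheritable & f_inheritable) | (f_permitted & bounding)
    effective = permitted if f_effective else 0
    return permitted, effective

# root + --privileged: full Permitted and Effective sets.
assert exec_sets(uid=0, bounding=ALL_CAPS) == (ALL_CAPS, ALL_CAPS)

# nobody + --privileged: full Bounding set, but nothing Permitted/Effective.
assert exec_sets(uid=65534, bounding=ALL_CAPS) == (0, 0)

# nobody + --privileged + `setcap cap_setuid=ep` on the binary (cap number 7):
CAP_SETUID = 1 << 7
assert exec_sets(uid=65534, bounding=ALL_CAPS,
                 f_permitted=CAP_SETUID, f_effective=True) == (CAP_SETUID, CAP_SETUID)
```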

&lt;p&gt;I’m not the first person to bump into this; there’s more discussion on the internet:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/containers/podman/issues/13449&quot;&gt;An issue from 2022 on the podman GitHub org&lt;/a&gt;, which states that the aim of podman’s (similar) behaviour was Docker compatibility.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/hashicorp/nomad/issues/16692&quot;&gt;A similar thread from 2023 on the nomad GitHub repo&lt;/a&gt;, which contains some links to older Docker issues on the matter and a nice table of different permissions settings and the impact they have on specific capabilities.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/moby/moby/issues/8460&quot;&gt;Discussion of the matter on the Docker GitHub repo&lt;/a&gt;, which ends on a slightly inconclusive note.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m not sure what to make of the security impact, other than that the behaviour is slightly unintuitive and worth understanding when evaluating a container’s security posture.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Running through the various permutations of the root user and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--privileged&lt;/code&gt; flag was useful for me to understand a lot more about how capabilities work in Linux. In addition, there are a handful of things that stand out to me from this review:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;If you’re installing arbitrary packages into your container from external repos, minimizing the &lt;em&gt;Bounding&lt;/em&gt; capability set using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--cap-drop&lt;/code&gt; is important.&lt;/li&gt;
  &lt;li&gt;Understanding file-level capabilities is important to fully understand the security posture of your Docker containers.&lt;/li&gt;
  &lt;li&gt;Using specific capabilities in a non-root Docker container is slightly more involved than you might initially expect!&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Sun, 15 Dec 2024 17:02:30 +0000</pubDate>
        <link>https://padlock.argh.in/2024/12/15/container-capabilities.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2024/12/15/container-capabilities.html</guid>
        
        
      </item>
    
      <item>
        <title>Auditing GKE operations? Configure Data Access audit logs</title>
        <description>&lt;p&gt;If you’re setting up GKE audit logging, you are probably following the instructions
on &lt;a href=&quot;https://cloud.google.com/kubernetes-engine/docs/how-to/audit-logging&quot;&gt;this page&lt;/a&gt;. It describes two levels of audit logging that are available via
GCP: the ‘Admin Activity log’ and the ‘Data Access log’. The documentation says:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Admin Activity logging is enabled by default and has no extra cost. Data
Access logging is disabled by default, and enabling it can result in extra
billing. To learn more about enabling Data Access logging, and the associated
costs, see ‘Configuring Data Access Logs’.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The ‘Configuring Data Access Logs’ link points to &lt;a href=&quot;https://cloud.google.com/logging/docs/audit/configure-data-access&quot;&gt;the general Data Access logging page&lt;/a&gt;
for all GCP services, and has no Kubernetes-specific information. A more useful
page to understand exactly how the logging policy works can be found &lt;a href=&quot;https://cloud.google.com/kubernetes-engine/docs/concepts/audit-policy&quot;&gt;here&lt;/a&gt;.
This page clarifies that:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;ul&gt;
    &lt;li&gt;Entries that represent create, delete, and update requests go to your Admin Activity log.&lt;/li&gt;
    &lt;li&gt;Entries that represent get, list, and updateStatus requests go to your Data Access log.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;While this might seem reasonable on the face of it (most destructive or concerning
operations will go into the Admin Activity logs), the Admin Activity logs are
missing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get&lt;/code&gt; operations on Secret objects by default. So for example, if you store
a service account password in your cluster as a Kubernetes secret, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl
get secret service-account-password -o yaml&lt;/code&gt; will get an attacker the entire
secret without logging a single line into the audit logs. For this reason alone
(if you use Kubernetes secrets for anything sensitive) it is probably essential
that you enable the Data Access logging as well.&lt;/p&gt;

&lt;p&gt;Interestingly, at the end of the &lt;a href=&quot;https://cloud.google.com/kubernetes-engine/docs/how-to/audit-logging&quot;&gt;GKE how-to&lt;/a&gt; on audit logging, it suggests a method
for adding Data Access audit logging that will probably generate far more log
data than you actually need (assuming you are only interested in Data Access
logging from GKE).&lt;/p&gt;

&lt;p&gt;Instead of updating the project’s IAM policy with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;auditConfigs:
- auditLogConfigs:
  - logType: ADMIN_READ
  - logType: DATA_WRITE
  - logType: DATA_READ
  service: allServices
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;as the page (currently) suggests, you can get away with the much less verbose:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;auditConfigs:
- auditLogConfigs:
  - logType: ADMIN_READ
  - logType: DATA_READ
  - logType: DATA_WRITE
  service: container.googleapis.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
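One way to apply that stanza is via gcloud (a sketch; the project ID is a placeholder, and note that set-iam-policy replaces the entire policy, so always edit a freshly fetched copy):

```shell
# Fetch the project's current IAM policy, add the auditConfigs stanza for
# container.googleapis.com to it, then write the edited policy back.
PROJECT_ID=my-project   # placeholder: substitute your own project ID
gcloud projects get-iam-policy "$PROJECT_ID" --format=yaml > policy.yaml
# ... edit policy.yaml to add the auditConfigs entry shown above ...
gcloud projects set-iam-policy "$PROJECT_ID" policy.yaml
```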

&lt;p&gt;You can also do this via the GUI by following the instructions &lt;a href=&quot;https://cloud.google.com/logging/docs/audit/configure-data-access&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
        <pubDate>Thu, 10 Feb 2022 10:02:30 +0000</pubDate>
        <link>https://padlock.argh.in/2022/02/10/gke-audit.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2022/02/10/gke-audit.html</guid>
        
        
      </item>
    
      <item>
        <title>Questions you should ask at security engineering interviews</title>
        <description>&lt;p&gt;Over the last few months I have been speaking to a variety of companies about
joining their security teams. If you’ve done any software interviews, you’re
probably pretty familiar with how these things go: an hour is usually divided
into two parts, the first part being roughly 45 minutes long, with the
interviewers asking you questions. The second part is left for you to ask
any questions you think are important.&lt;/p&gt;

&lt;p&gt;I have always been slightly surprised by how neglected the second part of these
interviews typically is. Over the years, I’ve interviewed hundreds of people
and have generally found that they either don’t prepare for the questions section,
or have a set of questions that are not well thought through.&lt;/p&gt;

&lt;p&gt;On the rare occasions where I have met a candidate who has asked interesting
questions, this has been a strong point in their favour; when the baseline is
so low, it isn’t particularly hard to make a good impression. It’s also worth
remembering that you’re potentially going to be working with the people in your
interview for years - don’t you &lt;em&gt;really&lt;/em&gt; want to be sure you’re making the right
choice? After all, they have just spent 45 minutes making sure you’re the right
choice for them.&lt;/p&gt;

&lt;p&gt;Some general principles that I find useful for the questions I ask in general
(and these could really apply to any job interview, not just security engineering):&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Minimise qualitative questions&lt;/strong&gt;: Questions like ‘How well is the security
team regarded within the company?’ are not going to get you a useful answer.
No one - in my experience - is going to lie outright, but they are trying
to hire you. Even if the CISO was intentionally locked in a meeting room last week
while the rest of the company proceeded to ship a highly vulnerable release,
the best you are going to get is a vague answer referring to ‘the need to
ship regularly and the healthy tension between that and security’.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Do ask quantitative or process-oriented questions&lt;/strong&gt;: It is more difficult
to gloss over shortcomings when asked a question that refers to a concrete
process or has a specific numerical answer. ‘What is the ratio of application
security engineers to developers?’, for example, has a specific answer that can
tell you a lot about the extent to which security is prioritised within an
organisation. It also can lead to an interesting discussion about why the ratio
is the way it is - has there been rapid and recent growth? Are there turnover
issues within the security team?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Use the cultural interviews to learn about the team you will be joining&lt;/strong&gt;:
The culture-fit interviews will frequently be with people from outside the team
that you will be working with, which is a good opportunity to find out about
how the team is perceived. I would expect mature security teams to work with
a wide range of people across an organisation, from finance to operations, and
the question ‘When was the last time you interacted with the security team and
how did you find the interaction?’ is surprisingly insightful. When I was interviewing at
Sourcegraph, a particular team member’s name came up multiple times in a very
positive light - a sign that there are some high-performing individuals in the
team, the sort of team you want to be joining.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Keeping all of that in mind, here are some more of my go-to questions for security
engineering interviews:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;When developing a major new feature or product, how are the product requirements
scoped?&lt;/strong&gt;: The main thing you’re looking for here is whether the security team
is mentioned at all in the process. Are they included, or will they have to
find out about the feature/product on their own? Obviously, the former is preferable.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;If a developer wants to use a new open source software library, what is the process in place
for them to do so? Are there any guardrails to ensure the library is safe?&lt;/strong&gt;: The ideal
answer here is one where there is a light (and potentially automated) review of licensing, whether the library is
being actively maintained, and whether there are any known vulnerabilities. The introduction
of a new software library introduces a significant ongoing burden to an organisation, especially
one that ships software externally, so the decision shouldn’t be taken lightly.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;If I joined your organisation and we were looking back a year from now, what would
a successful year look like for me?&lt;/strong&gt;: This is a good question to understand what
the pain points are that the company is looking to solve. Is it regulatory certification,
expanding client requirements, or just to beef up existing operations? Is this something
that you want to be doing? Is there a plan, or are you in charge of making the plan?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;What does a successful security team look like to you?&lt;/strong&gt;: This is a good question to
ask senior execs outside the security team, should you get that far into the interview stage.
I’ve received a range of answers, and while there’s no one correct response to this, I tend
to find the most attractive organisations have executives who see the security team as
a group who can actively contribute to an organisation’s overall engineering excellence,
rather than as a dull but necessary regulatory function.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;What is the most common type of security incident your organisation faces? How are you
tackling this?&lt;/strong&gt;: A relatively easy question that helps get into some interesting areas.
The things I’m looking to understand are whether there’s a well-defined incident response
process, whether the company is collecting metrics about the incidents that are affecting
them, whether the security engineers are aware of what those metrics look like, and finally
whether their response plan includes post-mortems and further improvements that are actually
put in place. There are also technical aspects to this answer which might be interesting
depending on the issues the organisation might be facing.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;How do you decide whether to build or buy security tooling?&lt;/strong&gt;: This is really a personal
preference in terms of where you want to work, and I think I stand somewhere in the middle
of the spectrum. It is important, however, that you receive some sort of considered opinion
here, and that this opinion chimes with your own. Businesses that are non-software at their
core (do these still exist?) might have a reasonable bias towards buy, while large software businesses
might have very good reasons for building most of their tooling. At its core, the question
you should be asking yourself is whether the answer you are given makes sense given the business
in question and their engineering principles.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are other questions that will be specific to the organisation that you are joining.
These types of questions are already well covered in other interview guides available online,
so I will stick to the following basic points:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;You may be interviewing for a ‘pure’ security role, but understanding the wider context
of the market the organisation is operating in is essential. Ask senior leaders questions
about the product range, and gauge whether their answers make sense to you - it’s your
dinner on the line if they get it wrong.&lt;/li&gt;
  &lt;li&gt;Understand the regulatory and competitive pressures for the business. Security requirements
are fairly frequently derived from a combination of internal engineering attitudes, regulatory
requirements, and competitive pressure. Ask questions about how the business is planning
to meet regulatory requirements and exceed their competitors’ offerings in terms of 
security. Does the business perceive security as a potential USP of the product?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I hope this is useful to other people out there interviewing! I’ll be joining &lt;a href=&quot;https://about.sourcegraph.com/&quot;&gt;Sourcegraph&lt;/a&gt;
as a Security Engineer in January 2022.&lt;/p&gt;
</description>
        <pubDate>Sat, 04 Dec 2021 15:02:30 +0000</pubDate>
        <link>https://padlock.argh.in/2021/12/04/security-eng.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2021/12/04/security-eng.html</guid>
        
        
      </item>
    
      <item>
        <title>The CKA for security engineers</title>
        <description>&lt;p&gt;One week ago, I passed the exam for the Certified Kubernetes Administrator (CKA)
certification. My eventual goal is the Certified Kubernetes Security
Specialist (CKS), for which the CKA is a prerequisite. There are many descriptions
of the CKA exam process on the internet, but not that many from a security
engineering perspective, so I thought it might be useful to discuss how I found
the course, the preparation I did, and my experience of the exam.&lt;/p&gt;

&lt;p&gt;To begin, some background on me. I have been working in what would traditionally
be called the ‘security industry’ for maybe 5 years now, although my prior experience
as a developer was also security-related. I touched Kubernetes for the first time
roughly three years ago, and I’m lucky enough that in my current job I work with
Kubernetes daily. This involves both deploying and maintaining applications in
Kubernetes clusters, as well as securing and monitoring the same clusters.
As a result, I was interested in the CKA from two perspectives:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Improving my understanding of what a Kubernetes cluster actually consists of,
from the perspective of an end-user who might need to debug broken resources
(although hopefully not a broken cluster itself).&lt;/li&gt;
  &lt;li&gt;Improving my understanding of the security architecture of Kubernetes, in particular
building a complete understanding of Kubernetes-native security features.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;preparation&quot;&gt;Preparation&lt;/h2&gt;

&lt;p&gt;My employer purchased a CKA + CKS exam bundle that included Linux Foundation
courses on both certifications. I didn’t start there, however.
Based on the strong recommendations of some colleagues and several blog posts,
I instead began by running through Kelsey Hightower’s &lt;a href=&quot;https://github.com/kelseyhightower/kubernetes-the-hard-way&quot;&gt;Kubernetes The Hard Way&lt;/a&gt;,
which walks you through manually bootstrapping a Kubernetes cluster. While this
was vaguely interesting, I don’t think it alone was as useful a learning experience as
some blog posts suggest. The actual steps in the exercise are presented without
much context, and I expect that if you did the exercise using GCP (the default
instructions are for GCP), you could finish the entire thing by simply copy-pasting,
without learning much at all. To make the most of it, I would suggest not using
GCP, which would force you to think about how any particular instruction would
translate to the infrastructure you are working on, removing the temptation to
blindly copy commands. I would also suggest spending time reading in detail about
each new component you encounter.&lt;/p&gt;

&lt;p&gt;Once I had my Kubernetes cluster, I tore it down and started going through the
Linux Foundation material. I found the written material on their &lt;a href=&quot;https://training.linuxfoundation.org/training/kubernetes-fundamentals/&quot;&gt;CKA course&lt;/a&gt;
to be OK, although I occasionally saw hints in the material that the author
was less certain about Kubernetes commands than I was (unnecessary flags/instructions).
Overall, it was a decent guide to what the curriculum was, but not a brilliant learning
resource. The most useful part of the course was the included set of hands-on
questions; I worked through all of them, and they were decent practice for
the actual exam. I wouldn’t necessarily pay for the course, but if like me, you
have got the course as part of a bundle on offer, it is maybe worth going through
the exercises alone.&lt;/p&gt;

&lt;p&gt;One course that came up in nearly every blog post I read was Mumshad Mannambeth’s
&lt;a href=&quot;https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/&quot;&gt;CKA course&lt;/a&gt; on Udemy. The consistency with which it was recommended was
intriguing enough that I felt obliged to give it a go next, although I didn’t 
bother with the course material, having just gone through the Linux Foundation material.
Instead, I worked through the included lab exercises. These were a really nice interactive way to work through sets
of questions on different Kubernetes domains, with the difficulty building
until you reach two mock exams. I see why it is recommended so highly, although
in general I found the difficulty of all the questions to be
marginally lower than what I encountered in the actual exam. The course is often
on offer, so can be picked up for far less than I think it’s worth.&lt;/p&gt;

&lt;p&gt;Finally, based on some more blog post recommendations, I worked through
&lt;a href=&quot;https://killer.sh/&quot;&gt;killer.sh&lt;/a&gt;. killer.sh is pretty intense - for the CKA, you get access
to 25 questions, all of which are at the higher end of the difficulty scale.
It has the feel of a product in beta: the ‘exam’ mode simply offers you
all 25 questions with a 2-hour clock, while the real exam only makes you do
15-20 questions of lower difficulty in the same time. The automated marking
is also kind of rudimentary at the moment. I suspect all of this will improve
over time, and I thought the overall difficulty level was great practice
for the actual exam. Even if you start a set of questions in ‘exam mode’,
once the two hour clock runs out you get the environment for 34 hours more, so it’s
possible to work through all the questions you haven’t managed to finish in
your own time. It is 30 EUR for two simulator sessions, so pretty expensive
compared to Mumshad’s course. Overall, however, I think it’s better practice
once you’re familiar with the basics.&lt;/p&gt;

&lt;p&gt;If I had to do it again, I would skip Kubernetes The Hard Way and the Linux
Foundation course. I don’t feel as if either of these was as effective a learning
experience as killer.sh or Mumshad’s Udemy course, and I think that because
I started with the wrong two options, I spent much longer preparing for the
exam than I really needed to.&lt;/p&gt;

&lt;h2 id=&quot;exam&quot;&gt;Exam&lt;/h2&gt;

&lt;p&gt;The exam itself was alright, relatively relaxing compared to the difficulty of
killer.sh. I found the wording of a couple of questions slightly vague, but
nothing that was a significant issue. The only point worth noting is that it
took roughly 15 minutes at the start of the exam for the proctor to verify over webcam that
I wasn’t trying to cheat, which was longer than I was expecting. My
results were emailed to me roughly 22h after my exam finished, within the 24h
that the Linux Foundation promises.&lt;/p&gt;

&lt;h2 id=&quot;overall-opinions&quot;&gt;Overall opinions&lt;/h2&gt;

&lt;p&gt;In terms of my two initial goals, I do have a much better understanding of 
Kubernetes concepts in some areas, in particular those areas which you might
never touch as a user of a Kubernetes cluster (Endpoints, Static Pods, etc.).
While this might sound somewhat futile, it’s important to understand how all
the different pieces of a cluster fit together in order to secure it, so I’m
glad that I have a more complete grasp of concepts here.&lt;/p&gt;

&lt;p&gt;Aside from the tangential benefits that come from understanding the system
better, there was not much security-related material in the CKA. This is
not a surprise, but just a note for anyone looking to do the CKS - it’s best
to look at the CKA entirely as a preparatory step for the CKS.&lt;/p&gt;

&lt;p&gt;There are also some things
that are expected knowledge for the certification but that I will never have to touch again, such as
hands-on work with backing up and restoring &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etcd&lt;/code&gt; clusters, or upgrading
a cluster using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubeadm&lt;/code&gt;. Learning how to do such work at speed seemed slightly
futile in terms of anything I would ever be expected to do as a security specialist.&lt;/p&gt;

&lt;p&gt;By its nature as a timed exam,
the certification also ends up forcing you to spend time learning imperative
CLI commands that you would be unlikely to ever use in the real world. As
a security engineer, if I was ever invoking &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl&lt;/code&gt; on a production cluster
with the wild abandon that the CKA encourages, something would be very wrong.
I’m not sure how easy this problem is to solve, but it’s another area where I
felt like I was learning something just for certification.&lt;/p&gt;

&lt;p&gt;In addition, with industry trends moving towards tools like Digital Ocean’s
&lt;a href=&quot;https://www.digitalocean.com/products/kubernetes/&quot;&gt;managed Kubernetes&lt;/a&gt; and &lt;a href=&quot;https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview&quot;&gt;GKE Autopilot&lt;/a&gt;, I’m also unsure about the long-term
relevance of the ‘Cluster Maintenance’ section of the CKA in general. All the
companies I know who are running their own
Kubernetes clusters are moving over to EKS/GKE/etc. Setting aside my focus on
security, is there &lt;em&gt;anyone&lt;/em&gt; who’s going to be backing up an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etcd&lt;/code&gt; database
manually in 5 years’ time?&lt;/p&gt;

&lt;p&gt;Overall, however, I am glad I did the CKA. It has confirmed that I have a
good understanding of concepts in some areas, and reinforced my understanding
of concepts in others. I’ve been ambivalent about certifications, but I do like
how it made me sit down and go through all the Kubernetes fundamentals in a 
structured manner, something that I doubt I would have bothered doing on my own
time. The lab-based exam also means you’re being tested on your
ability to do things rather than your theoretical knowledge, which is much more
meaningful, regardless of its imperfections. Onwards to the CKS I guess!&lt;/p&gt;

</description>
        <pubDate>Sat, 08 May 2021 03:02:30 +0000</pubDate>
        <link>https://padlock.argh.in/2021/05/08/cka.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2021/05/08/cka.html</guid>
        
        
      </item>
    
      <item>
        <title>doh.li now supports ODoH proxying</title>
        <description>&lt;p&gt;Earlier today, Cloudflare announced support for &lt;a href=&quot;https://blog.cloudflare.com/oblivious-dns/&quot;&gt;ODoH&lt;/a&gt;, a new protocol that
(somewhat) solves the problem of having to place significant trust in your
DoH provider. The solution involves leveraging a proxy to pass on your request
in such a manner that the proxy doesn’t know what your request is, and the
DNS resolver doesn’t know who you are. The solution is not perfect - if the
proxy and the resolver collude, you’re back at where you started. However,
with lower latency than DNS-over-HTTPS-over-Tor, and with (probably) more privacy
than standard DNS-over-HTTPS, it might be a sweet spot for some users.
If you’re into this sort of tech, the blog post linked above is very 
interesting on the protocol and tradeoffs involved.&lt;/p&gt;

&lt;p&gt;In any case, the DoH service I run at &lt;a href=&quot;https://doh.li&quot;&gt;doh.li&lt;/a&gt; now also supports ODoH proxying,
reverse proxying a stripped-down version of Chris Wood’s &lt;a href=&quot;https://github.com/chris-wood/odoh-server&quot;&gt;odoh-server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To test this:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Clone the &lt;a href=&quot;https://github.com/cloudflare/odoh-client-go&quot;&gt;odoh-client-go&lt;/a&gt; repo&lt;/li&gt;
  &lt;li&gt;Change the default proxy mode to HTTPS in &lt;a href=&quot;https://github.com/cloudflare/odoh-client-go/blob/master/commands/common.go&quot;&gt;common.go&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;go build -o odoh-client ./cmd/...&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;./odoh-client odoh --domain i.argh.in. --dnstype A --target odoh.cloudflare-dns.com --proxy doh.li&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You should hopefully see something like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ ./odoh-client odoh --domain i.argh.in. --dnstype A --target odoh.cloudflare-dns.com --proxy doh.li
;; opcode: QUERY, status: NOERROR, id: 52470
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;i.argh.in.     IN       A

;; ANSWER SECTION:
i.argh.in.      8245    IN      A       188.166.143.227
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As far as I’m aware there are no commonly used clients that support ODoH at the
moment, but I will update the instructions on &lt;a href=&quot;https://doh.li&quot;&gt;doh.li&lt;/a&gt; should that change.&lt;/p&gt;

</description>
        <pubDate>Tue, 08 Dec 2020 21:21:30 +0000</pubDate>
        <link>https://padlock.argh.in/2020/12/08/odoh.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2020/12/08/odoh.html</guid>
        
        
      </item>
    
      <item>
        <title>Detections as code: reliably scaling your detections library</title>
        <description>&lt;p&gt;One of the engineering questions that’s been preoccupying me over the last few 
months at Thought Machine has been about the most effective way to maintain a 
large library of detection rules for security events. We use ElastAlert extensively
for our detection libraries, in part because it offers us the ability to put
our detections into code. Our ElastAlert deployments run in immutable containers,
and any change to our rulesets has to go through a code review process (and be
approved by a specific subset of the team) before they are pushed into our monitoring
environments. This is fairly sophisticated as far as the detection solutions I have
seen go - the majority of the products rely on engineers and analysts defining
rules within GUI interfaces, with no effective review process.&lt;/p&gt;

&lt;p&gt;While decent, this hasn’t really involved making use of all the other benefits
of the ‘as-code’ philosophy. Unlike with code, we don’t write tests for our rules,
and unlike our infrastructure deployments, we don’t run configuration checking
either. Or at least, we didn’t until fairly recently. In this post, I’m going to
run through a few things that you can do to add some sophistication to your
collection of security detections (or any ElastAlert rules in general, really).&lt;/p&gt;

&lt;p&gt;For the purpose of this blog post, let’s assume that we’re starting off with
a simple rule based off the Kibana sample server access logs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;name: &quot;Catch Firefox users&quot;
description: &quot;Alert whenever we see a Firefox user in the logs&quot;

index: kibana_sample_data_logs
use_ssl: True
type: any
filter:
  - query_string:
      query: &quot;*Firefox*&quot;

alert_text: &quot;test alert&quot;
alert:
  - &quot;debug&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;adding-arbitrary-metadata-to-your-detection-rules&quot;&gt;Adding arbitrary metadata to your detection rules&lt;/h2&gt;

&lt;p&gt;ElastAlert doesn’t really shout enough about the fact that you can add
arbitrary fields to your alerts without any issue - the rule parser just ignores
any fields that it doesn’t need when it loads up the rulesets. At a very simple
level, you can add things like MITRE tactics and techniques:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mitre:
  tactics:
    - TA0043
  techniques:
    - T1595
    - T1190
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Or, if you’re developing ElastAlert rules that are owned by multiple teams, you
can define owners for your rules. In our case, perhaps a theoretical team to
capture and contain rogue Firefox users:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;owner: Firefox User Detection Team
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You could also have a free-text array of tags, for example with tags that correspond
to the certification rules that a particular rule covers:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;tags:
  - ISO-27001-12.4.2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There’s a huge variety of ways in which you could use this arbitrary metadata, but
however you use it, you can then trivially build automation to loop
over the rules that you have created, and measure your coverage in various ways. How
well do your rules cover the entire range of MITRE tactics, for example? Do you have
detections to cover a particular item required by your ISO 27001 audit? How many detections
do you have overall, how many belong to each team, and which platforms do they cover?&lt;/p&gt;
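&lt;p&gt;As a sketch of what that automation might look like: the script below loops over
already-parsed rule dictionaries (in practice you would load each rule file with
PyYAML) and counts coverage. The rule contents are invented for the example, but the
field names match the metadata suggested above.&lt;/p&gt;

```python
from collections import Counter

# Invented sample rules; in practice, load each rule file with yaml.safe_load().
rules = [
    {"name": "Catch Firefox users",
     "mitre": {"tactics": ["TA0043"], "techniques": ["T1595", "T1190"]},
     "owner": "Firefox User Detection Team",
     "tags": ["ISO-27001-12.4.2"]},
    {"name": "Another detection",
     "owner": "Firefox User Detection Team",
     "tags": ["missing_runbook"]},
]

def coverage(rules):
    """Count how often each MITRE tactic, owner, and tag appears across rules."""
    tactics, owners, tags = Counter(), Counter(), Counter()
    for rule in rules:
        tactics.update(rule.get("mitre", {}).get("tactics", []))
        owners[rule.get("owner", "unowned")] += 1
        tags.update(rule.get("tags", []))
    return tactics, owners, tags

tactics, owners, tags = coverage(rules)
print("Rules covering TA0043:", tactics["TA0043"])
print("Rules tagged missing_runbook:", tags["missing_runbook"])
```

&lt;p&gt;The same loop can feed a dashboard, answer audit queries against the certification
tags, or track a countdown metric across the ruleset.&lt;/p&gt;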

&lt;p&gt;In addition to metrics, one of the ways in which we have used these arbitrary tags at Thought Machine is in a
recent project to reduce the number of missing runbooks in our set of detections. We
started by tagging all our missing runbooks with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;missing_runbook&lt;/code&gt; tag, and then added
something to our ElastAlert metrics script to count the number of missing runbooks and
display that on an internal dashboard. As we wrote runbooks, we removed the tag from
each detection that we added a runbook to; having a number counting down gave the 
project a sense of direction and a concrete idea of the effort that we needed to put
into the exercise.&lt;/p&gt;

&lt;h2 id=&quot;configuration-checking-with-conftest&quot;&gt;Configuration checking with conftest&lt;/h2&gt;

&lt;p&gt;Once you have a structure for your alerts, including the arbitrary metadata fields
that you find useful, you can now begin thinking about configuration testing. Ideally,
we would like to be able to ensure that everyone is following the same basic pattern
when building alerts. &lt;a href=&quot;https://www.conftest.dev/&quot;&gt;conftest&lt;/a&gt; is a tool that we use
at Thought Machine to test a variety of different cloud infrastructure pieces, but
you can use it to write rules against any YAML file. Using a rule like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;deny[msg] {
  not input.mitre
  msg := &quot;MITRE tactics &amp;amp; techniques have not been defined for this rule&quot;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;you can identify any rules that don’t have a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mitre&lt;/code&gt; field defined in the 
rule YAML definition. You could also ensure that links to runbooks are in
their own &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runbook:&lt;/code&gt; field, which you then check for pre-commit (you can
include the runbook link as a regular ElastAlert variable in the alert text).&lt;/p&gt;

&lt;p&gt;In this manner, you can build a library of tests to ensure that your
rules all have certain fields, and that everyone in the team is conforming to a 
specific rule structure. It’s a simple and quick way of enforcing some degree
of consistency.&lt;/p&gt;
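&lt;p&gt;The same kind of structural check can also be sketched in plain Python, for example
as a quick pre-commit hook; the required field names below are simply the ones used in
the examples in this post:&lt;/p&gt;

```python
REQUIRED_FIELDS = ("name", "description", "mitre", "owner", "runbook")

def check_rule(rule):
    """Return an error message for every required field the rule lacks."""
    name = rule.get("name", "(unnamed)")
    return [
        f"{name}: missing required field '{field}'"
        for field in REQUIRED_FIELDS
        if field not in rule
    ]

# A rule dict as it would come out of parsing a rule file with yaml.safe_load().
rule = {"name": "Catch Firefox users",
        "description": "Alert whenever we see a Firefox user in the logs",
        "owner": "Firefox User Detection Team"}
errors = check_rule(rule)
for error in errors:
    print(error)
```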

&lt;h2 id=&quot;integration-testing-with-elastalert-ci&quot;&gt;Integration testing with elastalert-ci&lt;/h2&gt;

&lt;p&gt;Finally, we get to my pet project over the last few months: &lt;a href=&quot;https://github.com/ferozsalam/elastalert-ci&quot;&gt;elastalert-ci&lt;/a&gt;. One of
the major difficulties we have had is in reliably testing ElastAlert rules before
deployment; the solution that we have often resorted to is to push a test rule into
the production monitoring environment and then run the operation that should cause
an alert, and see if it fires. This doesn’t really scale with detection complexity.&lt;/p&gt;

&lt;p&gt;Using &lt;a href=&quot;https://github.com/ferozsalam/elastalert-ci&quot;&gt;elastalert-ci&lt;/a&gt;, you can write tests for your ElastAlert rules, which will then be
run against real data to verify that they actually do what you expect them to do.
&lt;a href=&quot;https://github.com/ferozsalam/elastalert-ci&quot;&gt;elastalert-ci&lt;/a&gt; is a bit heavyweight to be run as a pre-commit hook on all rules, but
is simple to run against a single rule. You can read more about it &lt;a href=&quot;https://github.com/ferozsalam/elastalert-ci&quot;&gt;on Github&lt;/a&gt;, but
the main thing it gives us - as with any good integration test - is the confidence to
make changes to rules knowing that they still work on the cases that we expect them
to work on. If you would like to see how you could add integration testing to the
sample rule I posted above, have a look at my &lt;a href=&quot;https://padlock.argh.in/2020/10/04/elastalert-ci-example.html&quot;&gt;previous blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;As we have grown the Threat Detection team at Thought Machine over the last few
months, and continue to grow as a company, it’s been important for us to build guardrails
that mean that we will be able to work in a consistent, reliable, and automation-friendly
manner. Looking back on the rule that I used at the top of the post, with all the metadata
I have suggested, your rule might now look something like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;name: &quot;Catch Firefox users&quot;
description: &quot;Alert whenever we see a Firefox user in the logs&quot;

index: kibana_sample_data_logs
use_ssl: True
type: any
filter:
  - query_string:
      query: &quot;*Firefox*&quot;

alert_text: &amp;gt;-
  Firefox user detected. Help!
  Runbook: {0}
alert_text_args:
  - runbook
alert:
  - &quot;debug&quot;

mitre:
  tactics:
    - TA0043
  techniques:
    - T1595
    - T1190
owner: Firefox User Detection Team
runbook: &quot;https://internal-wiki.example.com/runbooks/catching-firefox-users&quot;
tags:
  - ISO-27001-12.4.2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Using this structure, you can now:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Build metrics to automatically answer questions about your detection ruleset,
including metrics that could ease the process of audits and certifications.&lt;/li&gt;
  &lt;li&gt;Enforce minimum standards in your detection ruleset.&lt;/li&gt;
  &lt;li&gt;Potentially even write integration tests, to test that your rules are syntactically
correct and match against real data where you expect them to.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you’re building your own detection team, hopefully the ideas above show how you too
can get more than code review out of the ‘detections-as-code’ model, and really make
use of the power of committed, automatically parseable detection rules.&lt;/p&gt;

</description>
        <pubDate>Sat, 31 Oct 2020 14:21:30 +0000</pubDate>
        <link>https://padlock.argh.in/2020/10/31/detections-as-code.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2020/10/31/detections-as-code.html</guid>
        
        
      </item>
    
      <item>
        <title>Unit testing an ElastAlert rule using elastalert-ci</title>
        <description>&lt;p&gt;&lt;em&gt;This post refers to an early version of elastalert-ci, and technical implementation
details mentioned below may not apply. Please read the README on the project repository
for accurate information on how to use elastalert-ci within your project.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I wrote my &lt;a href=&quot;https://padlock.argh.in/2020/05/17/elastalert-ci.html&quot;&gt;original post&lt;/a&gt; on unit testing for ElastAlert earlier this
year, I cunningly didn’t go into very much detail on how a user should
create the data required for the unit test to run against. This was largely
because I hadn’t worked out the exact workflow I would use myself. 
Elasticsearch is relatively particular about how it wants data to be uploaded
to it, with widespread usage of the .ndjson (newline-delimited JSON) format 
and the requirement that certain metadata fields are present. This means that
it’s not as straightforward as downloading the data you want and being able
to directly re-upload it to Elasticsearch. I made the call that for the first
version, I would leave it up to anyone who cared enough to manipulate the data
into the required format before using it.&lt;/p&gt;
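&lt;p&gt;For context, the bulk format Elasticsearch expects alternates an action/metadata
line with a document line. A minimal sketch of the reshaping involved (with invented
sample documents) might look like:&lt;/p&gt;

```python
import json

def to_bulk_ndjson(index, documents):
    """Reshape plain documents into newline-delimited bulk format:
    a metadata line naming the target index, followed by the document itself."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Invented sample hits, as the _source of documents returned by a search.
docs = [{"agent": "Firefox", "response": 200}]
print(to_bulk_ndjson("kibana_sample_data_logs", docs))
```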

&lt;p&gt;I found some time this week to sit down and test the process of developing
a new unit-tested rule from sample data, and to write that up - fairly fundamental
documentation for the package. I have also created a small helper script
to download the data required from Elasticsearch in a format that the unit
testing framework will be able to use automatically without further
human intervention. Between the two, you should be able to go from an ElastAlert
rule to a &lt;em&gt;unit-tested&lt;/em&gt; ElastAlert rule in less than an hour.&lt;/p&gt;

&lt;p&gt;To illustrate the process of writing a rule, I’m going to use sample data
that comes with Kibana. To follow along, you will therefore need to install
Elasticsearch and Kibana. I used the &lt;a href=&quot;https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html&quot;&gt;ECK quickstart&lt;/a&gt; on Minikube, but
any Elasticsearch + Kibana setup will do. You will also need some familiarity
with querying Elasticsearch via the &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html&quot;&gt;Search API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;On the Kibana homepage, click on
‘Load a data set and a Kibana dashboard’, and on the following page, on the
card titled ‘Sample web logs’, click ‘Add data’. Kibana should set up the
data for you, and display a success message when it is done.&lt;/p&gt;

&lt;p&gt;The sample web logs are the sort of access logs that you would receive
from a web server. For our example, let’s say that we’re interested in
alerting if we see any access log entries from Firefox user agents, because
we all know Firefox users are deviants who must be punished.&lt;/p&gt;

&lt;p&gt;An ElastAlert rule for this might look like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;name: &quot;Catch Firefox users&quot;
description: &quot;Alert whenever we see a Firefox user in the logs&quot;

index: kibana_sample_data_logs
use_ssl: True
type: any 
filter:
  - query_string:
      query: &quot;*Firefox*&quot;

alert_text: &quot;test alert&quot;
alert:
  - &quot;debug&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, let’s say we wanted to unit test whether this alert would actually
work against real data in the index. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;elastalert-ci&lt;/code&gt; is built to integrate
closely with CircleCI, but can also be used locally, which is what I’m going to
do here.&lt;/p&gt;

&lt;h2 id=&quot;steps&quot;&gt;Steps&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;Clone &lt;a href=&quot;https://github.com/ferozsalam/elastalert-ci&quot;&gt;elastalert-ci&lt;/a&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cd&lt;/code&gt; into the root directory of the repository.&lt;/li&gt;
  &lt;li&gt;Copy the rule above into a new YAML file. Save it as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sample_rule.yaml&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;The first unit-testing step is to extract the data that you want to test against from
Elasticsearch, which is where the helper script does the work. The helper
script currently requires the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ES_USERNAME&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ES_PASSWORD&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ES_HOST&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ES_PORT&lt;/code&gt; environment variables to be set, so set those to your local Elasticsearch
environment.&lt;/li&gt;
  &lt;li&gt;Write a search query using the Search API to get a subset of the data that
you would like the unit test rule to run against. Refer to the &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html&quot;&gt;Search API
documentation&lt;/a&gt; if you aren’t familiar with how the Search API works. It
might also be useful to use Kibana’s &lt;a href=&quot;https://www.elastic.co/guide/en/kibana/current/console-kibana.html&quot;&gt;Dev Tools&lt;/a&gt; to play around
with the query until you’re receiving the data that you want.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Convert the query to an argument that you can pass to the exporter script
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;util/es-data-exporter.py&lt;/code&gt;. For example:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; GET kibana_sample_data_logs/_search
 {
   &quot;query&quot;: {
     &quot;match_all&quot;: {}
   }
 }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;would translate to:&lt;/p&gt;

    &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python3 util/es-data-exporter.py --index kibana_sample_data_logs --query &quot;{\&quot;query\&quot;: {\&quot;match_all\&quot;: {}}}&quot; &amp;gt; access-logs.json&lt;/code&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;Run the above command.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Update the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data-file.yaml&lt;/code&gt; data configuration, adding in an entry for 
the access log data file. Something like:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;weblogs:
  filename: &quot;access-logs.json&quot;
  timestamp_field: &quot;timestamp&quot;
  start_time: &quot;2020-05-20T00:39:02&quot;
  end_time: &quot;2020-09-20T06:15:34&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: You will have to define your own start and end times based on the
start and end times of the data in your index. They don’t have to match
the first and last record of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;access-logs.json&lt;/code&gt; data exactly, but the
time period defined must cover the records that you want to run ElastAlert
against. Defining a wide time period here is fine, but it will also increase
the time taken by the script to run.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Add an annotation to your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sample_rule.yaml&lt;/code&gt;, telling it what data file the 
unit test will require:&lt;/p&gt;

    &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ci_data_source: &quot;weblogs&quot;&lt;/code&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;Add the rule to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--rules&lt;/code&gt; argument in the Dockerfile.&lt;/li&gt;
  &lt;li&gt;Run the tests! &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo docker-compose build&lt;/code&gt; and then
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo docker-compose up --abort-on-container-exit&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
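&lt;p&gt;Condensed into a shell session, the steps above look something like the
following. The credentials and hostname are placeholders for your local
Elasticsearch environment, and the query is the match-all example from
earlier:&lt;/p&gt;

```shell
# Steps 1-3: clone the repo and point the helper script at Elasticsearch
git clone https://github.com/ferozsalam/elastalert-ci
cd elastalert-ci
export ES_USERNAME=elastic ES_PASSWORD=changeme ES_HOST=localhost ES_PORT=9200

# Steps 4-6: export the data that the rule will be tested against
python3 util/es-data-exporter.py --index kibana_sample_data_logs \
  --query "{\"query\": {\"match_all\": {}}}" > access-logs.json

# Steps 9-10: build the containers and run the tests
sudo docker-compose build
sudo docker-compose up --abort-on-container-exit
```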

&lt;p&gt;If everything is successful, the containers should exit with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;elastalert-ci_1  | Testing Catch Firefox users
elastalert-ci_1  | 2020/10/04 09:18:07 Command finished successfully.
elastalert-ci_elastalert-ci_1 exited with code 0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can try changing the rule to match on random text to verify that the run
fails in case the rule doesn’t match on anything.&lt;/p&gt;

&lt;p&gt;The most time-consuming part of this process is likely to be formulating the
query needed to grab the data, but multiple rules can be run against
a single data file, which should reduce the overhead of writing tests against
the same data sources.&lt;/p&gt;

</description>
        <pubDate>Sun, 04 Oct 2020 08:21:30 +0000</pubDate>
        <link>https://padlock.argh.in/2020/10/04/elastalert-ci-example.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2020/10/04/elastalert-ci-example.html</guid>
        
        
      </item>
    
      <item>
        <title>Microk8s doesn&apos;t play well with wg-quick (Wireguard)</title>
        <description>&lt;p&gt;For the last few months, Wireguard has been mysteriously broken on my personal laptop.
I hadn’t touched the configuration, and my other devices were working perfectly, but
packets from my laptop were no longer reaching my Wireguard server. I finally decided
to sit down and crack the problem today. After a couple of hours spent in the unhappy
company of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dmesg&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tcpdump&lt;/code&gt; and various reboots, I have a culprit: Microk8s.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trailofbits/algo&quot;&gt;Algo&lt;/a&gt;, which is what I used to set up Wireguard, recommends the use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wg-quick&lt;/code&gt;
to set up client devices on Linux. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wg-quick&lt;/code&gt; sets up a rule to route all traffic via
the Wireguard network interface. Wireguard also adds a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fwmark&lt;/code&gt; to packets, which is
apparently a way of tagging certain packets so that they can be routed in a particular
way. I don’t fully understand the networking intricacies here, but Microk8s (which acts
directly on the host, unlike Minikube), also adds its own iptables rules, in particular
including a rule that drops all marked packets.&lt;/p&gt;

&lt;p&gt;There are a couple of people who appear to have run into this issue in different contexts,
with differing solutions.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Stop/remove Microk8s and reboot. &lt;a href=&quot;https://github.com/ubuntu/microk8s/issues/688&quot;&gt;Github&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Don’t use wg-quick and run the networking setup by hand. &lt;a href=&quot;https://discuss.kubernetes.io/t/kubernetes-wireguard-flannel-overlay-network-on-vms-blocked-by-kubefirewall/4602&quot;&gt;Kubernetes Forums&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Remove the fwmark from Wireguard configuration. &lt;a href=&quot;https://github.com/ubuntu/microk8s/issues/1541&quot;&gt;Github&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;My first instinct was to remove Microk8s, which I can confirm works. I’m not sure
what the etiquette of marked packets is: whether Wireguard should be marking packets
differently or Kubernetes shouldn’t be routing marked packets in that way. Regardless,
the fix was easy enough!&lt;/p&gt;
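&lt;p&gt;If you want to poke at this yourself, the following commands should surface
the conflict. The interface name and the exact chain that the drop rule lives in
will vary by setup, so treat them as a starting point rather than a recipe:&lt;/p&gt;

```shell
# Show the fwmark that wg-quick has attached to the interface
sudo wg show wg0 fwmark

# Look for rules in the kube-related chains that act on marked packets
sudo iptables -L -v -n | grep -i -B2 -A2 mark
```

&lt;p&gt;For what it’s worth, fix 3 above should correspond to setting
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FwMark = off&lt;/code&gt; in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[Interface]&lt;/code&gt; section of the
wg-quick config, though I went with fix 1 rather than testing it.&lt;/p&gt;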

</description>
        <pubDate>Sun, 06 Sep 2020 16:03:30 +0000</pubDate>
        <link>https://padlock.argh.in/2020/09/06/microk8s-wireguard.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2020/09/06/microk8s-wireguard.html</guid>
        
        
      </item>
    
      <item>
        <title>Mapping EKS and GKE audit logs</title>
        <description>&lt;p&gt;GKE and EKS forward audit logs from the Kubernetes API server to Cloud Audit Logs and
Cloudwatch respectively. Unfortunately, the logs from each provider have a
marginally different format, which means that you can’t simply apply the same rules
to logs from both sources indiscriminately.&lt;/p&gt;

&lt;p&gt;Taking a single operation - the creation of an nginx pod - in a vanilla installation
of GKE and EKS, I have extracted the audit log record created for the operation. From
this, I have created a simple mapping between GKE and EKS, which can be found &lt;a href=&quot;https://www.notion.so/cef9899794384a55a83a3a00cf8a614f?v=608e5dd3f01b46c9ac23b32defed7acc&quot;&gt;here&lt;/a&gt;,
along with the raw log data that I created the mapping from.&lt;/p&gt;

&lt;p&gt;I chose GKE and EKS because they are probably the most popular choices for managed
k8s deployments, and there’s a possibility that for various reasons you might have
clusters on both providers.&lt;/p&gt;

&lt;p&gt;The logs themselves are fairly similar, with some key differences:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;I couldn’t find a simple field in the Cloudwatch log record to tell me exactly which cluster and
region the operation was occurring in. I assume that you would need to correlate the operation
with other data, such as IAM logs, in order to work that out, but it seems like an obvious
nice-to-have. The data is clearly accessible in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;resource&lt;/code&gt; field via the GKE Cloud Audit Logs.&lt;/li&gt;
  &lt;li&gt;The Cloud Audit Log throws a bunch of log data into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;protoPayload&lt;/code&gt; object, which potentially
reflects the fact that the log is being pushed as a protobuf. It’s a little bit messier than
the EKS log, which is much easier to parse because fields are better named and better split
up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regardless, from my research above it should be easy to write some sort of translation
layer to unify GKE and EKS audit log data to ensure that you can then compare consistently
between the two.&lt;/p&gt;
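&lt;p&gt;As a sketch of what that translation layer could look like, the snippet below
normalises a handful of common fields from each provider into a shared shape. The
field names are drawn from the mapping linked above, but treat them as
illustrative rather than exhaustive:&lt;/p&gt;

```python
# Sketch of a translation layer mapping GKE (Cloud Audit Logs) and
# EKS (CloudWatch) Kubernetes audit records into one common shape.
# Field names are illustrative; verify them against your own log data.

def normalise_gke(record):
    """Flatten the fields that GKE nests under protoPayload."""
    payload = record["protoPayload"]
    return {
        "user": payload["authenticationInfo"]["principalEmail"],
        # methodName looks like "io.k8s.core.v1.pods.create";
        # the last dotted segment is the verb
        "verb": payload["methodName"].rsplit(".", 1)[-1],
        "resource": payload["resourceName"],
    }

def normalise_eks(record):
    """EKS ships the raw Kubernetes audit event, so fields are flatter."""
    return {
        "user": record["user"]["username"],
        "verb": record["verb"],
        "resource": record["requestURI"],
    }

# Minimal example records, trimmed to just the fields used above
gke_record = {
    "protoPayload": {
        "authenticationInfo": {"principalEmail": "alice@example.com"},
        "methodName": "io.k8s.core.v1.pods.create",
        "resourceName": "core/v1/namespaces/default/pods/nginx",
    }
}
eks_record = {
    "user": {"username": "alice"},
    "verb": "create",
    "requestURI": "/api/v1/namespaces/default/pods",
}

print(normalise_gke(gke_record)["verb"])  # create
print(normalise_eks(eks_record)["verb"])  # create
```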

&lt;p&gt;However, as far as I’m aware, you can’t control the formatting of either log source, so you’re
probably still exposed to the whims of either provider in the long run.&lt;/p&gt;

</description>
        <pubDate>Sat, 15 Aug 2020 09:03:30 +0000</pubDate>
        <link>https://padlock.argh.in/2020/08/15/eks-gke-audit.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2020/08/15/eks-gke-audit.html</guid>
        
        
      </item>
    
      <item>
        <title>Easy Kubernetes audit log inspection with Vagrant</title>
        <description>&lt;p&gt;For a project that &lt;a href=&quot;https://github.com/marcojmancini&quot;&gt;Marco&lt;/a&gt; and I have been working on, we have recently had
a need to examine Kubernetes audit logs. In order to simplify and standardise
the process of creating a small k8s environment that generates Kubernetes audit
logs, I have created &lt;a href=&quot;https://github.com/ferozsalam/k8s-audit-log-inspector&quot;&gt;a Vagrant box&lt;/a&gt; that:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Sets up microk8s with audit logging configured&lt;/li&gt;
  &lt;li&gt;Loads a custom audit policy&lt;/li&gt;
  &lt;li&gt;Sets up Elasticsearch and Kibana to ship logs to&lt;/li&gt;
  &lt;li&gt;Sets up Filebeat to watch the microk8s audit logs and ship them to Elastic&lt;/li&gt;
  &lt;li&gt;Opens up port 5601 on localhost so that you can navigate to the logs in your
browser on the host&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are more detailed instructions in the README for the repo linked above.&lt;/p&gt;

&lt;p&gt;Marco has added some intelligent parsing of the logs, so that all the elements
of the audit logs are neatly tagged for correlation and searching.&lt;/p&gt;

&lt;p&gt;If you want to play around with different audit log policies, or create
a small local Kubernetes environment with audit logging enabled, this should
‘just work’, and give you a nice view of the data you would receive using
different audit policies.&lt;/p&gt;
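&lt;p&gt;As a starting point for experimenting with policies, the smallest useful
audit policy simply records request metadata for everything:&lt;/p&gt;

```yaml
# Minimal Kubernetes audit policy: log metadata for every request
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
```

&lt;p&gt;From there you can add narrower rules (for example, logging request bodies
only for writes to Secrets) and watch how the shipped logs change.&lt;/p&gt;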

</description>
        <pubDate>Sun, 31 May 2020 09:03:30 +0000</pubDate>
        <link>https://padlock.argh.in/2020/05/31/k8s-audit.html</link>
        <guid isPermaLink="true">https://padlock.argh.in/2020/05/31/k8s-audit.html</guid>
        
        
      </item>
    
  </channel>
</rss>
