<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Florian Akos Szabo | FLRNKS</title><link>https://flrnks.netlify.app/author/florian-akos-szabo/</link><atom:link href="https://flrnks.netlify.app/author/florian-akos-szabo/index.xml" rel="self" type="application/rss+xml"/><description>Florian Akos Szabo</description><generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>© 2024</copyright><image><url>https://flrnks.netlify.app/author/florian-akos-szabo/avatar_hu81f330f762e2aabb61e21d9093b724db_89928_270x270_fill_lanczos_center_2.png</url><title>Florian Akos Szabo</title><link>https://flrnks.netlify.app/author/florian-akos-szabo/</link></image><item><title>Cloud Security Automation</title><link>https://flrnks.netlify.app/post/sans-sec540/</link><pubDate>Thu, 26 Nov 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/sans-sec540/</guid><description>&lt;p>In November 2020 I was lucky to have had the chance to take part in my 2nd SANS course of the year: &lt;strong>SEC540 - Cloud Security and DevOps Automation -&lt;/strong> as part of the
&lt;a href="https://www.sans.org/event/amsterdam-november-2020-live-online" target="_blank" rel="noopener">SANS Amsterdam&lt;/a>. Unlike the first one, this was conducted in a remote-only format that they call &lt;strong>LiveOnline&lt;/strong>. I liked it so much that I wanted to share it. If interested, you can read more about my experience of &lt;strong>SEC530 - Defensible Security Architecture -&lt;/strong> in
&lt;a href="https://flrnks.netlify.app/post/sans-sec530">this post&lt;/a> which was an on-site/in-person course as part of the
&lt;a href="https://www.sans.org/event/prague-march-2020" target="_blank" rel="noopener">SANS Prague&lt;/a> in March 2020.&lt;/p>
&lt;h2 id="pre-course">Pre-Course&lt;/h2>
&lt;p>About a week before the course was set to begin, I received the Course Booklets via UPS delivery. It was a bit surprising that they did not send an email with the tracking ID, so I was caught off-guard when I was told I needed to pick it up in a nearby UPS affiliate shop. Nevertheless, it was quite fast and efficient, so there were no issues there.&lt;/p>
&lt;p>Since this was a &lt;strong>LiveOnline&lt;/strong> course, I needed to download a few things from my SANS account in advance that would normally be distributed on USB sticks at the start of an in-person course. Luckily, they send numerous email reminders about this, and there are also great instructions available online, such as
&lt;a href="https://sansorg.egnyte.com/dl/wO5QUU3BK5/Power_Computing_-_Generic_Laptop_Requirements_Checklist_v2.0.docx_" target="_blank" rel="noopener">THIS&lt;/a> document.&lt;/p>
&lt;p>The most important item to download was of course the course VM for the Lab Exercises. For this course, it was a 9 GB ISO file containing the compressed VMware virtual machine image. This VM required quite substantial resources, so I felt lucky to have a work laptop with 32 GB of RAM, an 8-core Intel i9 CPU and 1 TB of SSD storage. The RAM was especially critical: the VM needed at least 12 GB, but I gave it 16 GB just to be sure. For students whose machines were not powerful enough, they had an AMI in AWS with a CloudFormation template to set it up quickly.&lt;/p>
&lt;p>In addition, we needed to download and set up Slack for chat support during the course and GoToTraining for the actual streaming of the course content. I found that for whatever reason the GoToTraining session was spiking my laptop&amp;rsquo;s CPU usage to the point that it was almost overheating, so I decided to use my tablet for the course streaming, which worked quite well.&lt;/p>
&lt;p>Last but not least, I also downloaded the course booklets in PDF format; however, they were heavily protected with watermarks and a complex password. Copy-pasting was also disabled. It would have been nice to open the PDFs on my tablet and annotate them with my pencil, but since I also had the printed booklets this was a minor annoyance.&lt;/p>
&lt;h2 id="course-content">Course Content&lt;/h2>
&lt;p>The first day started with an introduction to the principles of DevOps and how Security can be integrated into CI/CD pipelines. In between the topics, we were getting familiar with the student VM which is home to the Lab Exercises. I have to admit that at first I was quite overwhelmed by the complex setup that&amp;rsquo;s shipped in this single VM image. There were a surprising number of services running in Docker containers behind the scenes, such as Jenkins, GitLab and HashiCorp Vault.&lt;/p>
&lt;p>As part of the day 1 labs we practiced the deployment of a web service using
&lt;a href="https://www.jenkins.io/" target="_blank" rel="noopener">Jenkins&lt;/a>. We also implemented improved security via pre-commit scanning and Security Analysis (SAST/DAST) as part of the CI/CD pipeline. The next day we set up the environment that paved our journey to the cloud (AWS) relying on concepts such as Infrastructure-as-Code (
&lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener">Cloudformation&lt;/a>) and Configuration Management (
&lt;a href="https://puppet.com/" target="_blank" rel="noopener">Puppet&lt;/a>). On day 3 we embarked on a journey to harden our cloud infrastructure with tools that can do Security Scanning and Continuous Monitoring and Alerting (
&lt;a href="https://grafana.com/" target="_blank" rel="noopener">Grafana&lt;/a> &amp;amp;
&lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener">CloudWatch&lt;/a>). We also looked into secrets management best practices on-premise and in the cloud via
&lt;a href="https://www.vaultproject.io/" target="_blank" rel="noopener">Hashicorp Vault&lt;/a>. On day 4 we fixed some vulnerabilities in our web service using a blue/green deployment setup to minimize downtime. We also looked into protecting microservice APIs using serverless functions that aim to manage authorization and access control. On the final day we looked into certain concepts related to compliance in cloud environments and explored technologies such as
&lt;a href="https://aws.amazon.com/waf/" target="_blank" rel="noopener">AWS WAF&lt;/a>,
&lt;a href="https://duo.com/blog/introducing-cloudmapper-an-aws-visualization-tool" target="_blank" rel="noopener">CloudMapper&lt;/a> and
&lt;a href="https://cloudcustodian.io/" target="_blank" rel="noopener">Cloud Custodian&lt;/a>.&lt;/p>
&lt;p>I have to admit that the lab environment that&amp;rsquo;s set up in the Student VM was pretty impressive to me. There were so many moving parts to it, yet everything worked more or less seamlessly. The built-in Wiki always provided detailed instructions with copy-paste support to allow you to work through each lab even if you were unfamiliar with the technology. If you were stuck you could get help very quickly from the Teaching Assistant, or the Instructor as well. Overall they did an excellent job over the 5 days of the course.&lt;/p>
&lt;h2 id="netwars">NetWars&lt;/h2>
&lt;p>This post would not be complete without mention of the NetWars arena which I was very keen to take part in. During &lt;strong>#SEC530&lt;/strong> in March 2020, the NetWars arena was open only on Day 6 when we competed against each other in teams. Thanks to this course, I was invited to several free NetWars events afterwards, such as
&lt;a href="https://www.sans.org/cyber-ranges/netwars-tournaments/core/" target="_blank" rel="noopener">Core NetWars&lt;/a> and the Mini NetWars Missions 1-2-3-4.&lt;/p>
&lt;p>I am quite certain that these free NetWars sessions helped me immensely to hone my CTF skillz, which came in handy during &lt;strong>#SEC540&lt;/strong>, where I had 4 full days to compete. I jumped to the front of the leaderboard after the very first night, as I stayed up until 3 am working on the NetWars questions. This was somewhat reckless: I was tired the day after and my focus on the course material was not the best, but a few rounds of coffee helped with that.&lt;/p>
&lt;p>&lt;img src="scoreboard.png" alt="SEC540-NetWars-Scoreboard">&lt;/p>
&lt;p>In the end I managed to keep my position at the top of the leaderboard, which made me feel really proud as I had worked long and hard during the whole week. I even managed to solve some of the more advanced &lt;code>1337&lt;/code> challenges, which came with no hints, just a description of what was required, leaving us free to improvise the solution.&lt;/p>
&lt;p>Two months later my 2nd NetWars coin finally arrived by post 🤩&lt;/p>
&lt;p>&lt;img src="coin.jpg" alt="SEC540-NetWars-Coin">&lt;/p>
&lt;h2 id="conclusions">Conclusions&lt;/h2>
&lt;p>Initially I was quite hesitant about attending &lt;strong>SEC540&lt;/strong> in the &lt;strong>LiveOnline&lt;/strong> format as I was not sure if it would work well. In the end I was left with only positive feelings about it. The course content was excellent. The delivery was smooth and help was always available through the Slack channel. If someone wants to learn about DevOps, Cloud and Security, I highly recommend this SANS course!&lt;/p>
&lt;h3 id="ps">P.S.&lt;/h3>
&lt;p>On the 1st of February, 2.5 months after my class I successfully passed the GIAC exam and became GCSA certified! 🎉&lt;/p></description></item><item><title>My first scala app</title><link>https://flrnks.netlify.app/post/aws-scala-tools/</link><pubDate>Sat, 10 Oct 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/aws-scala-tools/</guid><description>&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>In this post I want to write about a personal project I started some time ago, with the goal of learning more about Scala. At work, we use Scala quite often to run big data jobs on AWS using Apache Spark. I had never used Scala before I joined my current team, and its syntax was very alien to me. However, I recently had the chance to work on a task where I had to modify a component to use AWS Secrets Manager instead of HashiCorp&amp;rsquo;s Vault for fetching a secret value at runtime. To my surprise I could complete this work without much struggle with Scala, and afterwards I became eager to learn more. Based on a colleague&amp;rsquo;s recommendation I started reading a book by Cay S. Horstmann titled &lt;strong>Scala for the Impatient (2nd edition)&lt;/strong>. I&amp;rsquo;m making slow but steady progress.&lt;/p>
&lt;p>
&lt;a href="https://learning.oreilly.com/library/view/scala-for-the/9780134540627/" target="_blank" rel="noopener">&lt;img src="images/scalabook.jpg" alt="Scala-For-The-Impatient">&lt;/a>&lt;/p>
&lt;p>Shortly after starting the book, I had the idea to start a small project so that I could practice Scala by doing.&lt;/p>
&lt;h2 id="the-idea">The Idea&lt;/h2>
&lt;p>The idea, like many others before, came while fixing a bug at work. The bug was found within a component written in Scala to interact with the AWS Athena service. It had some neatly written functionality for making queries and waiting for their completion before trying to fetch the results. I thought I would try to write something similar for AWS Systems Manager (SSM). It is a service with a few different components, so I decided to focus on &lt;code>Automation Documents&lt;/code>, which can carry out actions in an automated fashion. For example, the AWS-provided SSM document &lt;code>AWS-StartEC2Instance&lt;/code> can start any EC2 instance when invoked with the following two input parameters:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>InstanceId&lt;/strong>: to specify which EC2 instance you want to start&lt;/li>
&lt;li>&lt;strong>AutomationAssumeRole&lt;/strong>: to specify an IAM role which can be assumed by SSM to carry out this action&lt;/li>
&lt;/ul>
&lt;p>I realized quite early on that if I wanted to implement this capability in my Scala app, it needed to be quite generic, so that it could support any Automation Document with an arbitrary number of input parameters. I also wanted it to be able to wait for the execution to finish and report whether it failed or succeeded. Here are the final requirements I came up with:&lt;/p>
&lt;ul>
&lt;li>create 2 separate git repos for:
&lt;ul>
&lt;li>a module that&amp;rsquo;s home for the AWS utility/helper classes&lt;/li>
&lt;li>a module for implementing the CLI App&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>support extra AWS services such as KMS, Secrets Manager and CloudFormation&lt;/li>
&lt;li>utilize
&lt;a href="https://github.com/localstack/localstack-java-utils" target="_blank" rel="noopener">localstack&lt;/a> for integration testing (when possible)&lt;/li>
&lt;/ul>
&lt;h2 id="initial-setup">Initial setup&lt;/h2>
&lt;p>Firstly, I had to figure out which third-party packages I needed to implement the app according to these simple requirements. To interact with AWS from Scala code, I decided to go with &lt;strong>v2&lt;/strong> of the official
&lt;a href="https://docs.aws.amazon.com/sdk-for-java/index.html" target="_blank" rel="noopener">Java SDK for AWS&lt;/a>. To implement the CLI app I mainly relied on the &lt;strong>picocli&lt;/strong> Java package, which was a bit less straightforward, but eventually it proved to be a good choice.&lt;/p>
&lt;p>Secondly, I have to admit that creating a reusable Scala package from scratch was a rather non-trivial task for me. Most of my programming experience comes from working in non-JVM-based environments, so that&amp;rsquo;s probably no surprise. I initially started out with &lt;strong>sbt&lt;/strong> for build &amp;amp; dependency management, but I was running into issues that I couldn&amp;rsquo;t solve on my own, so I decided to swap it for &lt;strong>Maven&lt;/strong>, which was a bit more familiar to me.&lt;/p>
&lt;p>Finally, separating the project into two distinct git repositories allowed me to practice versioning and dependency management which I also found very useful:&lt;/p>
&lt;ul>
&lt;li>AWS Scala Utils: &lt;a href="https://github.com/florianakos/aws-utils-scala">https://github.com/florianakos/aws-utils-scala&lt;/a>&lt;/li>
&lt;li>AWS SSM CLI App: &lt;a href="https://github.com/florianakos/aws-ssm-scala-app">https://github.com/florianakos/aws-ssm-scala-app&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="the-utils-module">The utils module&lt;/h2>
&lt;p>Creating the utils module that would serve as a kind of glue between the Scala CLI app and AWS Systems Manager was actually not as difficult as I thought. This is mostly thanks to the example I had seen at work for a similar project with the AWS Athena service.&lt;/p>
&lt;p>The core SSM functionality of the utils module is captured in the below functions:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-scala" data-lang="scala">&lt;span class="k">private&lt;/span> &lt;span class="k">def&lt;/span> &lt;span class="n">executeAutomation&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">documentName&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">parameters&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">java.util.Map&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>,&lt;span class="kt">java.util.List&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>&lt;span class="o">]])&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">Future&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">val&lt;/span> &lt;span class="n">startAutomationRequest&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">StartAutomationExecutionRequest&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">builder&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">documentName&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">documentName&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">build&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="nc">Future&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">val&lt;/span> &lt;span class="n">executionResponse&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="n">ssmClient&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">startAutomationExecution&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">startAutomationRequest&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">s&amp;#34;Execution id: &lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="n">executionResponse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">automationExecutionId&lt;/span>&lt;span class="o">()&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s">&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="n">executionResponse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">automationExecutionId&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="k">private&lt;/span> &lt;span class="k">def&lt;/span> &lt;span class="n">waitForAutomationToFinish&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">executionId&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">String&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">Future&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">val&lt;/span> &lt;span class="n">getExecutionRequest&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">GetAutomationExecutionRequest&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">builder&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="n">automationExecutionId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">executionId&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="n">build&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="k">var&lt;/span> &lt;span class="n">status&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">AutomationExecutionStatus&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="nc">IN_PROGRESS&lt;/span>
&lt;span class="nc">Future&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">var&lt;/span> &lt;span class="n">retries&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;span class="k">while&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">status&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="nc">AutomationExecutionStatus&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="nc">SUCCESS&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">val&lt;/span> &lt;span class="n">automationExecutionResponse&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="n">ssmClient&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">getAutomationExecution&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">getExecutionRequest&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="n">status&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="n">automationExecutionResponse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">automationExecution&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">automationExecutionStatus&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="n">status&lt;/span> &lt;span class="k">match&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">case&lt;/span> &lt;span class="nc">AutomationExecutionStatus&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="nc">CANCELLED&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="nc">AutomationExecutionStatus&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="nc">FAILED&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="nc">AutomationExecutionStatus&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="nc">TIMED_OUT&lt;/span> &lt;span class="k">=&amp;gt;&lt;/span>
&lt;span class="k">throw&lt;/span> &lt;span class="nc">SsmAutomationExecutionException&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">status&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">automationExecutionResponse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">automationExecution&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">failureMessage&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="k">case&lt;/span> &lt;span class="nc">AutomationExecutionStatus&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="nc">SUCCESS&lt;/span> &lt;span class="k">=&amp;gt;&lt;/span>
&lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">s&amp;#34;Query finished with status: &lt;/span>&lt;span class="si">$status&lt;/span>&lt;span class="s">&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="k">case&lt;/span> &lt;span class="n">status&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">AutomationExecutionStatus&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span>
&lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">s&amp;#34;SSM Automation execution status: &lt;/span>&lt;span class="si">$status&lt;/span>&lt;span class="s">, check #&lt;/span>&lt;span class="si">$retries&lt;/span>&lt;span class="s">.&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="nc">Thread&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sleep&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">retries&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="mi">2500&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">retries&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="mi">5000&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="mi">15000&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="n">retries&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">_&lt;/span> &lt;span class="k">=&amp;gt;&lt;/span> &lt;span class="n">executionId&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The first one, &lt;code>executeAutomation&lt;/code>, crafts an execution request and submits it to AWS, returning the execution ID. This ID can be passed to the &lt;code>waitForAutomationToFinish&lt;/code> function, which periodically checks in with AWS until the execution is complete. Between subsequent API requests it sleeps for an increasing delay (2.5 seconds for the first few checks, then 5 seconds, then 15 seconds) to avoid API rate-limiting caused by excessive polling.&lt;/p>
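&lt;p>To make the polling schedule easier to reason about, here is a minimal sketch that pulls it out of &lt;code>waitForAutomationToFinish&lt;/code> into a pure function. The thresholds are copied from the function above, but the names &lt;code>PollingSketch&lt;/code>, &lt;code>delayMillis&lt;/code> and &lt;code>pollUntil&lt;/code> are hypothetical and not part of the actual utils module, and no AWS calls are involved.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-scala" data-lang="scala">import scala.annotation.tailrec

// Hypothetical names, for illustration only; not taken from the utils module.
object PollingSketch {

  // Delay in milliseconds before the next status check:
  // 2.5 s up to check #3, 5 s up to check #10, 15 s afterwards.
  def delayMillis(retries: Int): Long =
    if (retries &amp;lt;= 3) 2500L else if (retries &amp;lt;= 10) 5000L else 15000L

  // Calls `check` until it yields Some(result), sleeping between attempts.
  // `sleep` is a parameter so that tests can replace Thread.sleep with a no-op.
  @tailrec
  def pollUntil[A](check: () =&amp;gt; Option[A],
                   retries: Int = 0,
                   sleep: Long =&amp;gt; Unit = Thread.sleep(_)): A =
    check() match {
      case Some(result) =&amp;gt; result
      case None =&amp;gt;
        sleep(delayMillis(retries))
        pollUntil(check, retries + 1, sleep)
    }
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Passing a no-op &lt;code>sleep&lt;/code> makes it possible to verify the schedule in a unit test without actually waiting, which is one reason to keep the delay calculation separate from the API call.&lt;/p>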
&lt;h2 id="testing-the-utils-module">Testing the utils module&lt;/h2>
&lt;p>Once I had the core functionality ready, I wanted to write integration tests to ensure it works as expected. Instead of relying on hard-coded AWS credentials or an AWS profile for a real account, I wanted to use &lt;strong>localstack&lt;/strong>, which mocks the real AWS API so that you can interact with it. For this reason I slightly tweaked the &lt;code>SsmAutomationHelper&lt;/code> class to accept an optional second argument (&lt;code>apiEndpoint: Option[String]&lt;/code>) which is used while building the SSM API client:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-scala" data-lang="scala">&lt;span class="k">class&lt;/span> &lt;span class="nc">SsmAutomationHelper&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">profile&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">apiEndpoint&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">Option&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>&lt;span class="o">])&lt;/span> &lt;span class="k">extends&lt;/span> &lt;span class="nc">LazyLogging&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">private&lt;/span> &lt;span class="k">val&lt;/span> &lt;span class="n">ssmClient&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="n">apiEndpoint&lt;/span> &lt;span class="k">match&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">case&lt;/span> &lt;span class="nc">None&lt;/span> &lt;span class="k">=&amp;gt;&lt;/span> &lt;span class="nc">SsmClient&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">builder&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">credentialsProvider&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nc">ProfileCredentialsProvider&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">profile&lt;/span>&lt;span class="o">))&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">region&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nc">Region&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="nc">EU_WEST_1&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">build&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="k">case&lt;/span> &lt;span class="nc">Some&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">localstackEndpoint&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="k">=&amp;gt;&lt;/span> &lt;span class="nc">SsmClient&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">builder&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">credentialsProvider&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nc">StaticCredentialsProvider&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nc">AwsBasicCredentials&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;foo&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;bar&amp;#34;&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">endpointOverride&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nc">URI&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">localstackEndpoint&lt;/span>&lt;span class="o">))&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">build&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This allowed me to pass &lt;code>http://localhost:4566&lt;/code> when running the integration tests against &lt;strong>localstack&lt;/strong> and have the API calls directed to those mocked endpoints. Previously each mocked service had its own dedicated port, but thanks to a recent change in &lt;strong>localstack&lt;/strong>, all AWS services can now be served on a single port, which they call the &lt;strong>EDGE&lt;/strong> port.&lt;/p>
&lt;p>According to the documentation, SSM is supported in &lt;strong>localstack&lt;/strong>; however, I found out that running Automation Documents is a feature that is still missing. As a result, I had to run the integration tests against a real AWS account that I set up for such scenarios. I was okay with doing this since there are plenty of built-in Automation Documents provided by AWS that I could safely use for this purpose.&lt;/p>
&lt;p>Eventually I decided to use the &lt;code>AWS-StartEC2Instance&lt;/code> and &lt;code>AWS-StopEC2Instance&lt;/code> documents in the tests, which only required me to set up a dummy EC2 instance as the target of these requests. I also added a special &lt;strong>Tag&lt;/strong> to these integration tests so that they are excluded when the suite is invoked via &lt;code>mvn test&lt;/code> but can still be run manually whenever necessary.&lt;/p>
&lt;h2 id="cli-app-implementation">CLI App implementation&lt;/h2>
&lt;p>After running the tests, I was confident that the AWS utils worked correctly, so I started putting together the CLI app. For this, I searched the web for a third-party package and found that it&amp;rsquo;s not as simple as using Python&amp;rsquo;s &lt;code>argparse&lt;/code> module. I eventually settled on &lt;code>picocli&lt;/code>, which is written in Java but can also be used from Scala via the below annotations:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-scala" data-lang="scala">&lt;span class="nd">@Command&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">name&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="s">&amp;#34;SsmHelper&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">version&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">Array&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;v0.0.1&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span> &lt;span class="n">mixinStandardHelpOptions&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="kc">true&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">description&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">Array&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;CLI app for running automation documents in AWS SSM&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;span class="k">class&lt;/span> &lt;span class="nc">SsmCliParser&lt;/span> &lt;span class="k">extends&lt;/span> &lt;span class="nc">Callable&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">Unit&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="k">with&lt;/span> &lt;span class="nc">LazyLogging&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="nd">@Option&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">names&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">Array&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;-D&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;--document&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span> &lt;span class="n">description&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">Array&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Name of the SSM Automation document to execute&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;span class="k">private&lt;/span> &lt;span class="k">var&lt;/span> &lt;span class="n">documentName&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nc">String&lt;/span>
&lt;span class="nd">@Parameters&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">index&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="s">&amp;#34;0..*&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">arity&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="s">&amp;#34;0..*&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">paramLabel&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="s">&amp;#34;&amp;lt;param1=val1&amp;gt; &amp;lt;param2=val2&amp;gt; ...&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">description&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">Array&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Key=Value parameters to use as Input Params&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;span class="k">private&lt;/span> &lt;span class="k">val&lt;/span> &lt;span class="n">parameters&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">util.ArrayList&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="kc">null&lt;/span>
&lt;span class="o">[&lt;/span>&lt;span class="kt">...&lt;/span>&lt;span class="o">]&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>According to the original idea, there had to be one constant CLI flag which controlled the name of the AWS Automation Document (&lt;code>--document&lt;/code>) and then there had to be a variable number of additional arguments for specifying the Input Parameters required by the given document. The &lt;code>picocli&lt;/code> package supported this workflow via the &lt;strong>@Option&lt;/strong> and the &lt;strong>@Parameters&lt;/strong> annotations.&lt;/p>
&lt;p>The only thing left was a custom function that would carry out the needed transformation of Input Parameters. The values received in the &lt;code>parameters&lt;/code> were in the form of an &lt;strong>ArrayList&lt;/strong>: &lt;code>[&amp;lt;param1=val1&amp;gt;, &amp;lt;param2=val2&amp;gt;, ...]&lt;/code> which had to be transformed into a &lt;strong>Map&lt;/strong>: &lt;code>[param1 -&amp;gt; [val1], param2 -&amp;gt; [val2]]&lt;/code> by splitting each String on the &lt;strong>=&lt;/strong> character. The desired format was a requirement of the AWS SDK for SSM. After some iterations I ended up with the below function that could do this transformation:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-scala" data-lang="scala">&lt;span class="k">private&lt;/span> &lt;span class="k">def&lt;/span> &lt;span class="n">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">util.ArrayList&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>&lt;span class="o">])&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">util.Map&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>, &lt;span class="kt">util.List&lt;/span>&lt;span class="o">[&lt;/span>&lt;span class="kt">String&lt;/span>&lt;span class="o">]]&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="n">params&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">asScala&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">_&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="sc">&amp;#39;=&amp;#39;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">collect&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="k">case&lt;/span> &lt;span class="nc">Array&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">key&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">value&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="k">=&amp;gt;&lt;/span> &lt;span class="n">key&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">value&lt;/span> &lt;span class="o">}&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">groupBy&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">_&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_1&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="o">.&lt;/span>&lt;span class="n">mapValues&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">_&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">_&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_2&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="n">asJava&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="n">asJava&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Finally, I constructed the below method which utilized the &lt;code>SsmAutomationHelper&lt;/code> class from the utils module and passed the two variables provided by &lt;code>picocli&lt;/code> to it so it would invoke the necessary Automation Document and wait to retrieve its result via the &lt;code>Await&lt;/code> mechanism of Scala:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-scala" data-lang="scala">&lt;span class="k">def&lt;/span> &lt;span class="n">call&lt;/span>&lt;span class="o">()&lt;/span>&lt;span class="k">:&lt;/span> &lt;span class="kt">Unit&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="k">val&lt;/span> &lt;span class="n">conf&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="nc">ConfigFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">load&lt;/span>&lt;span class="o">()&lt;/span>
&lt;span class="k">val&lt;/span> &lt;span class="n">inputParams&lt;/span> &lt;span class="k">=&lt;/span> &lt;span class="n">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="nc">Await&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nc">SsmAutomationHelper&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">newInstance&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">conf&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="n">runDocumentWithParameters&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">documentName&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">inputParams&lt;/span>&lt;span class="o">),&lt;/span> &lt;span class="mf">10.&lt;/span>&lt;span class="n">minutes&lt;/span>&lt;span class="o">)&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="packaging-the-cli-app">Packaging the CLI app&lt;/h2>
&lt;p>At this point I was ready with the CLI app and wanted to run it to see how it would function. Before I could run it, I needed to figure out how to package it all into a &lt;code>fat&lt;/code> JAR file with all needed dependencies, so that it could be invoked with CLI arguments. I googled around a bit and quickly found the
&lt;a href="https://docs.spring.io/spring-boot/docs/1.5.x/maven-plugin/repackage-mojo.html" target="_blank" rel="noopener">spring-boot-maven-plugin&lt;/a> which has the &lt;code>repackage&lt;/code> goal that&amp;rsquo;s just what I needed:&lt;/p>
&lt;blockquote>
&lt;p>Repackages existing JAR and WAR archives so that they can be executed from the command line using java -jar. With layout=NONE can also be used simply to package a JAR with nested dependencies (and no main class, so not executable).&lt;/p>
&lt;/blockquote>
&lt;p>I only had to add the below lines to my project&amp;rsquo;s &lt;strong>pom.xml&lt;/strong>:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-xml" data-lang="xml">&lt;span class="nt">&amp;lt;plugin&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;groupId&amp;gt;&lt;/span>org.springframework.boot&lt;span class="nt">&amp;lt;/groupId&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;artifactId&amp;gt;&lt;/span>spring-boot-maven-plugin&lt;span class="nt">&amp;lt;/artifactId&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;version&amp;gt;&lt;/span>2.3.2.RELEASE&lt;span class="nt">&amp;lt;/version&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;configuration&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;layout&amp;gt;&lt;/span>JAR&lt;span class="nt">&amp;lt;/layout&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/configuration&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;executions&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;execution&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;goals&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;goal&amp;gt;&lt;/span>repackage&lt;span class="nt">&amp;lt;/goal&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/goals&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/execution&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/executions&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/plugin&amp;gt;&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Next I just had to run the &lt;code>mvn package&lt;/code> command, which invokes the plugin to build the &lt;code>fat&lt;/code> JAR.&lt;/p>
&lt;h2 id="running-the-cli-app">Running the CLI app&lt;/h2>
&lt;p>Once the JAR is available, it can be used via the &lt;code>java -jar ...&lt;/code> command with extra arguments to run any Automation Document, such as &lt;code>AWS-StartEC2Instance&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-Bash" data-lang="Bash">$ ▶ java -jar ./target/scala-cli-app-1.0.0.jar --document&lt;span class="o">=&lt;/span>AWS-StartEC2Instance &lt;span class="nv">InstanceId&lt;/span>&lt;span class="o">=&lt;/span>i-0ed4574c5ba94c877 &lt;span class="nv">AutomationAssumeRole&lt;/span>&lt;span class="o">=&lt;/span>arn:aws:iam::&lt;span class="o">{{&lt;/span>global:ACCOUNT_ID&lt;span class="o">}}&lt;/span>:role/AutomationServiceRole
15:24:41.998 &lt;span class="o">[&lt;/span>main&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Going to kick off SSM orchestration document: AWS-StartEC2Instance
15:24:42.773 &lt;span class="o">[&lt;/span>ForkJoinPool-1-worker-29&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Execution id: &amp;lt;...&amp;gt;
15:24:42.882 &lt;span class="o">[&lt;/span>ForkJoinPool-1-worker-11&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Current status: &lt;span class="o">[&lt;/span>InProgress&lt;span class="o">]&lt;/span>, retry counter: &lt;span class="c1">#0&lt;/span>
&lt;span class="o">[&lt;/span>...&lt;span class="o">]&lt;/span>
15:28:01.226 &lt;span class="o">[&lt;/span>ForkJoinPool-1-worker-11&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Current status: &lt;span class="o">[&lt;/span>InProgress&lt;span class="o">]&lt;/span>, retry counter: &lt;span class="c1">#21&lt;/span>
15:28:16.442 &lt;span class="o">[&lt;/span>ForkJoinPool-1-worker-11&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Execution finished with final status: &lt;span class="o">[&lt;/span>Success&lt;span class="o">]&lt;/span>
15:28:16.444 &lt;span class="o">[&lt;/span>main&lt;span class="o">]&lt;/span> INFO com.flrnks.app.SsmCliParser :: SSM execution run took &lt;span class="m">215&lt;/span> seconds
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Seems to be working quite well!&lt;/p>
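&lt;p>One caveat with arguments like these: the &lt;code>process&lt;/code> function shown earlier splits on every &lt;strong>=&lt;/strong> and only keeps results with exactly two parts, so an argument without a &lt;strong>=&lt;/strong>, or a value that itself contains one, is silently dropped. Below is a minimal sketch of a stricter variant, written in Java rather than Scala purely for illustration. The class and method names (&lt;code>KeyValueArgs&lt;/code>, &lt;code>parse&lt;/code>) are hypothetical and not part of the project; the sketch splits only on the first &lt;strong>=&lt;/strong> and rejects malformed input instead of ignoring it:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-java" data-lang="java">import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyValueArgs {

    // Hypothetical helper (not from the original project): turns
    // [k1=v1, k2=v2, ...] into {k1=[v1], k2=[v2], ...}.
    static Map&amp;lt;String, List&amp;lt;String&amp;gt;&amp;gt; parse(List&amp;lt;String&amp;gt; args) {
        Map&amp;lt;String, List&amp;lt;String&amp;gt;&amp;gt; result = new LinkedHashMap&amp;lt;&amp;gt;();
        for (String arg : args) {
            // Limit 2: split only on the first '=' so the value may contain '='.
            String[] parts = arg.split(&amp;#34;=&amp;#34;, 2);
            if (parts.length != 2 || parts[0].isEmpty()) {
                throw new IllegalArgumentException(&amp;#34;Expected key=value but got: &amp;#34; + arg);
            }
            // Repeated keys accumulate their values in one list.
            result.computeIfAbsent(parts[0], k -&amp;gt; new ArrayList&amp;lt;&amp;gt;()).add(parts[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input; the second value deliberately contains an extra '='.
        System.out.println(parse(List.of(&amp;#34;InstanceId=i-123&amp;#34;, &amp;#34;Tag=a=b&amp;#34;)));
    }
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>With this variant an input such as &lt;code>Tag=a=b&lt;/code> keeps &lt;code>a=b&lt;/code> as its value, while an argument without any &lt;strong>=&lt;/strong> fails fast with an error message instead of disappearing from the Input Parameters.&lt;/p>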
&lt;h2 id="bonus-running-in-a-container">Bonus: running in a container&lt;/h2>
&lt;p>I thought I would take the above one step further and package the JAR into a java based docker container. This would allow me to forget about the syntax of the java command that I previously used to run the app. Instead, I can hide it in a very minimal Dockerfile:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-dockerfile" data-lang="dockerfile">&lt;span class="k">FROM&lt;/span>&lt;span class="s"> openjdk:8-jdk-alpine&lt;/span>&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">MAINTAINER&lt;/span>&lt;span class="s"> flrnks &amp;lt;flrnks@flrnks.netlify.com&amp;gt;&lt;/span>&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">ADD&lt;/span> target/scala-cli-app-1.0.0.jar /usr/share/backend/app.jar&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">ENTRYPOINT&lt;/span> &lt;span class="p">[&lt;/span> &lt;span class="s2">&amp;#34;/usr/bin/java&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;-jar&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;/usr/share/backend/app.jar&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="err">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>mvn package&lt;/code> command which is used to build the fat JAR will save it into the &lt;strong>/target&lt;/strong> subdirectory, so one can put this Dockerfile into the project&amp;rsquo;s root and then manually build the docker image by running &lt;code>docker build -t ssmcli .&lt;/code>. This creates an image called &lt;strong>ssmcli&lt;/strong> without issues. However, I found an awesome plugin called &lt;code>dockerfile-maven-plugin&lt;/code> built by
&lt;a href="https://github.com/spotify/dockerfile-maven" target="_blank" rel="noopener">Spotify&lt;/a> which can automagically take this Dockerfile and turn it into an image based on the plugin configuration:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-xml" data-lang="xml">&lt;span class="nt">&amp;lt;plugin&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;groupId&amp;gt;&lt;/span>com.spotify&lt;span class="nt">&amp;lt;/groupId&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;artifactId&amp;gt;&lt;/span>dockerfile-maven-plugin&lt;span class="nt">&amp;lt;/artifactId&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;version&amp;gt;&lt;/span>1.4.10&lt;span class="nt">&amp;lt;/version&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;executions&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;execution&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;id&amp;gt;&lt;/span>default&lt;span class="nt">&amp;lt;/id&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;goals&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;goal&amp;gt;&lt;/span>build&lt;span class="nt">&amp;lt;/goal&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/goals&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;configuration&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;repository&amp;gt;&lt;/span>flrnks/ssmcli&lt;span class="nt">&amp;lt;/repository&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;tag&amp;gt;&lt;/span>latest&lt;span class="nt">&amp;lt;/tag&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/configuration&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/execution&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/executions&amp;gt;&lt;/span>
&lt;span class="nt">&amp;lt;/plugin&amp;gt;&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This plugin hooks into the &lt;code>package&lt;/code> phase, so running &lt;code>mvn package&lt;/code> will automatically create the docker image:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-Bash" data-lang="Bash">&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> --- spring-boot-maven-plugin:2.3.2.RELEASE:repackage &lt;span class="o">(&lt;/span>default&lt;span class="o">)&lt;/span> @ scala-cli-app ---
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Layout: JAR
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Replacing main artifact with repackaged archive
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span>
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> --- dockerfile-maven-plugin:1.4.10:build &lt;span class="o">(&lt;/span>default&lt;span class="o">)&lt;/span> @ scala-cli-app ---
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> dockerfile: null
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> contextDirectory: /Users/flszabo/Desktop/personal-wrkspc/scala/scala-cli-app
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Building Docker context /Users/flszabo/Desktop/personal-wrkspc/scala/scala-cli-app
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Path&lt;span class="o">(&lt;/span>dockerfile&lt;span class="o">)&lt;/span>: null
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Path&lt;span class="o">(&lt;/span>contextDirectory&lt;span class="o">)&lt;/span>: /Users/flszabo/Desktop/personal-wrkspc/scala/scala-cli-app
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span>
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Image will be built as flrnks/ssmcli:latest
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Step 1/4 : FROM openjdk:8-jdk-alpine
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Pulling from library/openjdk
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Digest: sha256:94792824df2df33402f201713f932b58cb9de94a0cd524164a0f2283343547b3
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Status: Image is up to date &lt;span class="k">for&lt;/span> openjdk:8-jdk-alpine
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> ---&amp;gt; a3562aa0b991
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Step 2/4 : MAINTAINER flrnks &amp;lt;flrnks@flrnks.netlify.com&amp;gt;
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> ---&amp;gt; Using cache
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> ---&amp;gt; efcc673b4f35
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Step 3/4 : ADD target/scala-cli-app-1.0.0.jar /usr/share/backend/app.jar
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> ---&amp;gt; 8b2cf76f03c2
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Step 4/4 : ENTRYPOINT &lt;span class="o">[&lt;/span> &lt;span class="s2">&amp;#34;/usr/bin/java&amp;#34;&lt;/span>, &lt;span class="s2">&amp;#34;-jar&amp;#34;&lt;/span>, &lt;span class="s2">&amp;#34;/usr/share/backend/app.jar&amp;#34;&lt;/span>&lt;span class="o">]&lt;/span>
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> ---&amp;gt; Running in c9633237f9fa
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Removing intermediate container c9633237f9fa
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> ---&amp;gt; 6db69aa30fb1
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Successfully built 6db69aa30fb1
&lt;span class="o">[&lt;/span>INFO&lt;span class="o">]&lt;/span> Successfully tagged flrnks/ssmcli:latest
&lt;/code>&lt;/pre>&lt;/div>&lt;p>To test this new docker image I ran the &lt;code>AWS-StopEC2Instance&lt;/code> Automation Document and specified the same CLI arguments as before, thanks to the &lt;code>ENTRYPOINT&lt;/code> configuration in the Dockerfile. As an extra step I needed to share the AWS profile with the docker container at runtime by using the flag &lt;code>-v ~/.aws:/root/.aws&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-Bash" data-lang="Bash">$ ▶ docker run --rm -v ~/.aws:/root/.aws flrnks/ssmcli --document&lt;span class="o">=&lt;/span>AWS-StopEC2Instance &lt;span class="nv">InstanceId&lt;/span>&lt;span class="o">=&lt;/span>i-0ed4574c5ba94c877 &lt;span class="nv">AutomationAssumeRole&lt;/span>&lt;span class="o">=&lt;/span>arn:aws:iam::&lt;span class="o">{{&lt;/span>global:ACCOUNT_ID&lt;span class="o">}}&lt;/span>:role/AutomationServiceRole
17:18:59.541 &lt;span class="o">[&lt;/span>main&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Going to kick off SSM orchestration document: AWS-StopEC2Instance
17:19:00.789 &lt;span class="o">[&lt;/span>ForkJoinPool-1-worker-13&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Execution id: &amp;lt;...&amp;gt;
17:19:00.966 &lt;span class="o">[&lt;/span>ForkJoinPool-1-worker-11&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Current status: &lt;span class="o">[&lt;/span>InProgress&lt;span class="o">]&lt;/span>, retry counter: &lt;span class="c1">#0&lt;/span>
17:19:03.564 &lt;span class="o">[&lt;/span>ForkJoinPool-1-worker-11&lt;span class="o">]&lt;/span> INFO c.f.utils.ssm.SsmAutomationHelper :: Execution finished with final status: &lt;span class="o">[&lt;/span>Success&lt;span class="o">]&lt;/span>
17:19:03.568 &lt;span class="o">[&lt;/span>main&lt;span class="o">]&lt;/span> INFO com.flrnks.app.SsmCliParser :: SSM execution run took &lt;span class="m">5&lt;/span> seconds
&lt;/code>&lt;/pre>&lt;/div>&lt;p>One may say that typing that long &lt;code>docker run ...&lt;/code> command above takes longer than typing &lt;code>java -jar ./target/scala-cli-app-1.0.0.jar ...&lt;/code> but I would argue that running it inside a docker container has its valid use-cases as well. It allows for controlled setup of the runtime environment and prevents dependency issues too!&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>This project has allowed me to learn much more than I initially expected. I learnt a lot about Scala, which was the original goal, but I also gained valuable experience with Maven, its plugin ecosystem and of course with Java as well. I hope whoever reads this post will find something useful in it too!&lt;/p></description></item><item><title>Monitoring Flink on AWS EMR</title><link>https://flrnks.netlify.app/post/emr-flink-datadog/</link><pubDate>Sun, 16 Aug 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/emr-flink-datadog/</guid><description>&lt;h2 id="brief-intro">Brief intro&lt;/h2>
&lt;p>This is going to be a somewhat unusual post on this blog. It is about a problem I recently encountered while trying to improve the monitoring of a long-running Flink cluster we have on AWS EMR, following the official
&lt;a href="https://docs.datadoghq.com/integrations/flink/" target="_blank" rel="noopener">documentation&lt;/a> from Datadog.&lt;/p>
&lt;h2 id="the-emr-setup">The EMR setup&lt;/h2>
&lt;p>Our EMR cluster consumes 4 Kinesis Data Streams which are used to send S3 files in AVRO format for processing. When a new file arrives, the Flink job fetches it from S3, does some validation and filtering, converts it to ORC format and saves it to a new location on S3. In early June we experienced a failure in one of the Flink jobs consuming a production stream. Sadly we did not have adequate monitoring set up to detect this in time. We only learnt about it when we noticed that data in the output bucket was missing for certain dates. Our streams were configured with the maximum retention period of 7 days. By the time we noticed, unprocessed data was already piling up in the stream, and the oldest records had been there for close to half of this retention period. By the time we managed to find the root cause and deploy the fix to the Flink job, it was too late, and some data had already expired from the stream.&lt;/p>
&lt;p>The existing monitoring solution was implemented via AWS Lambda functions running every 8 hours. These functions ran Athena queries to check whether any data had arrived in the S3 bucket during the last 48 hours. The problem with this approach was that we would not get alerts about missing data for up to 2 days, because the query used a sliding window of 2 days.&lt;/p>
&lt;p>The Flink cluster runs in a private VPC, so reaching the Flink Web UI to check the status of the jobs was quite difficult to say the least. We either had to set up an SSH port forwarding session and use a FoxyProxy setup in Firefox, or set up a personal VM in the same private VPC via the AWS WorkSpaces managed service and then connect from that VM&amp;rsquo;s browser to the cluster&amp;rsquo;s Flink UI. Either way it was quite cumbersome and still a manual process to connect to the Flink UI to check the cluster health. I wanted an automated way of gathering metrics and alerting if something went wrong, so I looked into how Flink could be monitored by Datadog.&lt;/p>
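&lt;p>To make that delay concrete, here is a small worked example in Java. It rests on two simplifying assumptions that are not taken from the actual Lambda code: the check raises an alert only when the 48-hour window contains no data at all, and the checks run exactly every 8 hours. The class and method names are made up for this illustration:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-java" data-lang="java">import java.time.Duration;

public class DetectionDelay {

    // Earliest moment the check can fire: the whole window must be empty.
    static Duration earliestAlert(Duration window) {
        return window;
    }

    // Worst case: the window empties right after a check, so one more interval passes.
    static Duration latestAlert(Duration window, Duration checkInterval) {
        return window.plus(checkInterval);
    }

    public static void main(String[] args) {
        Duration window = Duration.ofHours(48);       // sliding window of the query
        Duration checkInterval = Duration.ofHours(8); // schedule of the Lambda function
        Duration retention = Duration.ofDays(7);      // retention period of the stream

        Duration worstCase = latestAlert(window, checkInterval);
        System.out.println(&amp;#34;Earliest alert after data stops: &amp;#34; + earliestAlert(window).toHours() + &amp;#34;h&amp;#34;);
        System.out.println(&amp;#34;Latest alert after data stops: &amp;#34; + worstCase.toHours() + &amp;#34;h&amp;#34;);
        System.out.println(&amp;#34;Retention left in the worst case: &amp;#34; + retention.minus(worstCase).toHours() + &amp;#34;h&amp;#34;);
    }
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Under these assumptions an alert cannot fire earlier than 48 hours after the data stops, and in the worst case it takes 56 hours, which is a third of the 7-day retention period gone before anyone is notified.&lt;/p>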
&lt;h2 id="datadog--flink">Datadog ❤️ Flink&lt;/h2>
&lt;p>A quick Google search turned up the official documentation from Datadog, where I found really straightforward instructions on enabling the submission of Flink metrics to Datadog, which could be instantly visualized in their default Flink dashboard. The main steps are:&lt;/p>
&lt;ul>
&lt;li>adding some new parameters to the flink-conf.yaml, such as the Datadog API/APP keys and custom tags&lt;/li>
&lt;li>copying the &lt;code>flink-datadog-metrics.jar&lt;/code> to the active flink installation path&lt;/li>
&lt;/ul>
&lt;p>The first step was quite easy. Our cluster was defined in CloudFormation where we used &lt;code>AWS::EMR::Cluster&lt;/code>, which allows specifying the flink-conf.yaml content as below:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="k">Cluster&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">Type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>AWS&lt;span class="p">::&lt;/span>EMR&lt;span class="p">::&lt;/span>Cluster&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">Properties&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">Name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>Flink-Cluster&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">Configurations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- &lt;span class="k">Classification&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>flink-conf&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ConfigurationProperties&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">metrics.reporter.dghttp.class&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>org.apache.flink.metrics.datadog.DatadogHttpReporter&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">metrics.reporter.dghttp.apikey&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;{{resolve:secretsmanager:datadog/api_key:SecretString}}&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">metrics.reporter.dghttp.tags&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>name&lt;span class="p">:&lt;/span>flink-cluster&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>app&lt;span class="p">:&lt;/span>flink-cluster&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>region&lt;span class="p">:&lt;/span>eu-central&lt;span class="m">-1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>env&lt;span class="p">:&lt;/span>prod&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>...&lt;span class="p">]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The above CF snippet shows just the 3 most important lines of the &lt;strong>flink-conf.yaml&lt;/strong>: (1) the full package name of the java class which implements the metric submission, (2) the Datadog API key loaded from AWS Secrets Manager and (3) a few custom tags which will be added to metrics sent to Datadog.&lt;/p>
&lt;p>To copy the necessary datadog-metrics JAR to where it would be loaded from (&lt;code>/usr/lib/flink/lib&lt;/code>), I added a new &lt;code>AWS::EMR::Step&lt;/code> in CloudFormation which is executed only on the EMR Master Node in order to activate Datadog monitoring on the cluster via the supplied Java class and API key in the &lt;strong>flink-conf.yaml&lt;/strong>.&lt;/p>
&lt;p>To test that it was working properly I just needed to redeploy the cluster, which was surprisingly easy thanks to the CloudFormation setup we had in place. But something was still not right.&lt;/p>
&lt;h2 id="know-your-continent">Know your continent&lt;/h2>
&lt;p>After redeploying the cluster I waited and waited and waited a bit more but metrics were not showing up in the Flink dashboard. So I got in touch with Datadog support who were very helpful in figuring out what the issue was. After a few rounds of emails back and forth we quickly discovered why the metrics were not showing up.&lt;/p>
&lt;p>The reason was that we had our Datadog account set up in the EU region and not in the USA. Thus, all our metrics were supposed to flow to the EU endpoint at &lt;code>app.datadoghq.eu/api/&lt;/code> instead of the USA endpoint at &lt;code>app.datadoghq.com/api/&lt;/code>. The difference is quite subtle, only a simple change in the TLD from &lt;strong>.com&lt;/strong> to &lt;strong>.eu&lt;/strong>. The catch was that our EMR cluster was running Flink 1.9.1 (provided by the EMR release 5.29.0) which had this API endpoint hardcoded, pointing to the USA data centre. The Datadog Support Engineer uncovered some extra
&lt;a href="https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#datadog-orgapacheflinkmetricsdatadogdatadoghttpreporter" target="_blank" rel="noopener">instructions&lt;/a> on how this can be solved by adding an extra line to the &lt;strong>flink-conf.yaml&lt;/strong> to change the default US region to the EU instead:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="k">Cluster&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">Type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>AWS&lt;span class="p">::&lt;/span>EMR&lt;span class="p">::&lt;/span>Cluster&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">Properties&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">Name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>Flink-Cluster&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>...&lt;span class="p">]&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">Configurations&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- &lt;span class="k">Classification&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>flink-conf&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ConfigurationProperties&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>...&lt;span class="p">]&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">metrics.reporter.dghttp.class&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>org.apache.flink.metrics.datadog.DatadogHttpReporter&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">metrics.reporter.dghttp.apikey&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;{{resolve:secretsmanager:datadog/api_key:SecretString}}&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">metrics.reporter.dghttp.tags&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>name&lt;span class="p">:&lt;/span>flink-cluster&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>app&lt;span class="p">:&lt;/span>flink-cluster&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>region&lt;span class="p">:&lt;/span>eu-central&lt;span class="m">-1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>env&lt;span class="p">:&lt;/span>prod&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">metrics.reporter.dghttp.dataCenter&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>EU&lt;span class="w"> &lt;/span>&lt;span class="c"># &amp;lt;&amp;lt; points the metrics reported to the EU region&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>...&lt;span class="p">]&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The problem was that this was only available in Flink v1.11.0 while the highest version offered by EMR through the latest EMR Release was only v1.10.0, so this was not going to work for me. I almost gave up on the idea of monitoring Flink via Datadog when I had the idea to clone the official Flink repository from GitHub and tweak the code of v1.9.1, the version we were running, to change the hardcoded API endpoint from &lt;strong>.com&lt;/strong> to &lt;strong>.eu&lt;/strong>. It was much easier than I expected: I only needed to slightly tweak one class, &lt;code>./src/main/java/org/apache/flink/metrics/datadog/DatadogHttpClient.java&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="cm">/**
&lt;/span>&lt;span class="cm"> * Http client talking to Datadog.
&lt;/span>&lt;span class="cm"> */&lt;/span>
&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">DatadogHttpClient&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="cm">/* Changed endpoint for metric submission to use .eu instead of .com */&lt;/span>
&lt;span class="kd">private&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">SERIES_URL_FORMAT&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s">&amp;#34;https://app.datadoghq.eu/api/v1/series?api_key=%s&amp;#34;&lt;/span>&lt;span class="o">;&lt;/span>
&lt;span class="cm">/* Changed endpoint for API key validation to use .eu instead of .com */&lt;/span>
&lt;span class="kd">private&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">VALIDATE_URL_FORMAT&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s">&amp;#34;https://app.datadoghq.eu/api/v1/validate?api_key=%s&amp;#34;&lt;/span>&lt;span class="o">;&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Once I made the above code changes, I built a new JAR via &lt;code>mvn clean package&lt;/code>. The new JAR was made available at &lt;strong>./flink-metrics/flink-metrics-datadog/target/flink-metrics-datadog-1.9.1.jar&lt;/strong>, which I then uploaded to an S3 bucket where my team stores such files. Next I slightly tweaked the AWS EMR step to load this JAR from S3 and redeployed the cluster once more. Finally, metrics started flowing! And it looked so nice. I was especially happy to see the TaskManager heap distribution, because the issue which sparked this whole endeavor was showing symptoms of heap memory problems.&lt;/p>
&lt;p>&lt;img src="./images/default-dashboard.png" alt="Default Datadog Flink Dashboard">&lt;/p>
&lt;p>Unfortunately this default dashboard was not perfect, as some of its graphs failed to show any data. Maybe this was because I was using v1.9.1 of Flink instead of v1.11.0; I am not sure. In any case, I ended up cloning the dashboard and fixing the graphs manually, while also adding a few extras to show data about the AWS Kinesis streams which were feeding into the Flink cluster.&lt;/p>
&lt;p>&lt;img src="./images/custom-dashboard.jpg" alt="Custom Datadog Flink dashboard">&lt;/p>
&lt;p>Now it shows very nicely the age of each Flink job, which was not visible at all on the default dashboard. The end result is much better in my opinion.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>All in all, I am quite happy with how this whole story turned out in the end. Despite the issue with the hardcoded API endpoints to the USA region in v1.9.1 of Flink, I managed to implement a simple workaround thanks to the Open Source nature of the project. The result is that we have much better visibility and monitoring implemented for our Flink cluster which makes our lives in the DevOps world much better. I did not write much about it in this post, but once these metrics became available in our Datadog account it was trivial to set up a few Monitors which would alert us if, for example, one of the 4 Flink jobs was failing. I will leave it up to the reader to imagine how that&amp;rsquo;s done.&lt;/p></description></item><item><title>Testing Terraform Modules</title><link>https://flrnks.netlify.app/post/terraform-testing/</link><pubDate>Sun, 12 Jul 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/terraform-testing/</guid><description>&lt;h2 id="intro">Intro&lt;/h2>
&lt;p>I first heard of Terraform about 1 year ago while working on an assignment for a job interview. The learning curve was steep, and I still remember how confused I was about the syntax of HCL that resembled JSON but was not exactly the same. I also remember hearing about the concept of Terraform Modules, but they were not needed for the assignment, so I skipped them for the time being.&lt;/p>
&lt;p>Fast forward to the present day, I&amp;rsquo;ve had a good amount of exposure to Terraform Modules at work, where we use them to provision resources on AWS in a standardized and rapid fashion. In order to broaden my knowledge of Terraform Modules, I set myself an exercise in which I created two TF Modules using version 0.12 of Terraform. In this post I describe these two Terraform Modules and how I went about testing them to ensure they did what they were meant to.&lt;/p>
&lt;h2 id="what-is-a-terraform-module">What is a Terraform Module&lt;/h2>
&lt;p>According to the official
&lt;a href="https://www.terraform.io/docs/configuration/modules.html" target="_blank" rel="noopener">documentation&lt;/a> a Terraform module is simply a container for multiple resources that are defined and used together. Terraform Modules can be embedded in each other to create a hierarchical structure of dependent resources. To define a Terraform Module one needs to create one or more Terraform files that define some input variables, some resources and some outputs. The input variables are used to control properties of the resources, while the outputs are used to reveal information about the created resources. These are often organized as follows:&lt;/p>
&lt;ul>
&lt;li>&lt;code>variables.tf&lt;/code> defining the Terraform variables&lt;/li>
&lt;li>&lt;code>main.tf&lt;/code> creating the Terraform resources&lt;/li>
&lt;li>&lt;code>output.tf&lt;/code> listing the Terraform outputs&lt;/li>
&lt;/ul>
&lt;p>Note that the above is just an un-enforced convention; it simply makes it easier to get a quick understanding of a Terraform Module. As an example, if an organization needs to have its AWS S3 buckets secured with the same policies to protect its data, it can embed these security policies in a TF Module and then prescribe its use within the organization to enable those security policies automatically. Next up is an example of just that.&lt;/p>
&lt;h2 id="the-secure-bucket-tf-module">The Secure-Bucket TF Module&lt;/h2>
&lt;p>The first of the 2 Terraform Modules is &lt;code>tf-module-s3-bucket&lt;/code> which can be used to create an S3 bucket in AWS that is secured to a higher degree, so that it may be suitable for storing highly sensitive data. The security features of the bucket consist of:&lt;/p>
&lt;ul>
&lt;li>filtering on Source IPs that can access its contents&lt;/li>
&lt;li>enforcing encryption at rest (KMS) and in transit&lt;/li>
&lt;li>object-level and server access logging enabled&lt;/li>
&lt;li>filtering on IAM principals based on official
&lt;a href="https://aws.amazon.com/blogs/security/how-to-restrict-amazon-s3-bucket-access-to-a-specific-iam-role/" target="_blank" rel="noopener">docs&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>When using this module, one can define a list of IPs and a list of IAM Principals to control who can access the contents of the bucket and from which networks. These restrictions are written into the Bucket Policy, which is a &lt;code>resource-based policy&lt;/code>. Because an explicit Deny in any applicable policy takes precedence over every Allow, it does not matter if an IAM Role has been granted specific permission to access the bucket when the bucket&amp;rsquo;s own Bucket Policy denies the same access. Below is a good overview of the whole evaluation logic of AWS IAM:&lt;/p>
&lt;p>&lt;img src="static/aws-iam.png" alt="AWS IAM Evaluation Logic">&lt;/p>
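&lt;p>The part of this logic that matters most for the module, namely that an explicit Deny beats any Allow, can be sketched in a few lines of Python. This is a heavily simplified model of my own and not how AWS implements it: it ignores conditions, principals, resources, permission boundaries and organization-level policies, and the &lt;code>Statement&lt;/code> class and &lt;code>evaluate&lt;/code> function are hypothetical names used only for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Statement:
    effect: str  # "Allow" or "Deny"
    action: str  # a single action such as "s3:GetObject", or "s3:*"


def matches(statement_action: str, requested_action: str) -> bool:
    # Simplified wildcard handling: only a trailing "*" is supported here.
    if statement_action.endswith("*"):
        return requested_action.startswith(statement_action[:-1])
    return statement_action == requested_action


def evaluate(requested_action: str, statements: Iterable[Statement]) -> str:
    relevant = [s for s in statements if matches(s.action, requested_action)]
    # 1. Any explicit Deny wins, no matter which policy it comes from.
    if any(s.effect == "Deny" for s in relevant):
        return "Deny"
    # 2. Otherwise a matching Allow grants access.
    if any(s.effect == "Allow" for s in relevant):
        return "Allow"
    # 3. With no matching statement the request is implicitly denied.
    return "Deny"


# Hypothetical policies: the role may read objects, the bucket denies all S3 actions.
role_policy = [Statement("Allow", "s3:GetObject")]
bucket_policy = [Statement("Deny", "s3:*")]

print(evaluate("s3:GetObject", role_policy))                  # Allow
print(evaluate("s3:GetObject", role_policy + bucket_policy))  # Deny
&lt;/code>&lt;/pre>&lt;/div>&lt;p>In the real module the Deny statements carry conditions (source IP, principal), so they only apply to requests that fall outside the allowed lists; the sketch leaves that out to keep the ordering of the checks visible.&lt;/p>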
&lt;p>In addition, server-access and object-level logging can be enabled as well to improve the bucket&amp;rsquo;s level of auditability. Altogether, these settings can greatly elevate the security of data in the S3 bucket that was created by this module.&lt;/p>
&lt;h2 id="the-s3-authz-tf-module">The S3-AuthZ TF Module&lt;/h2>
&lt;p>This 2nd Terraform Module is called &lt;code>tf-module-s3-auth&lt;/code> and it was written in part to complement the other one used to create an S3 bucket. The aim of this module is to help with the creation of a single IAM policy that can cover the S3 and KMS permissions needed for a given IAM Principal. The motivation behind this module comes from some difficulties I&amp;rsquo;ve faced at work, where some IAM Roles we used had too many policies attached. For further reference see the AWS
&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html" target="_blank" rel="noopener">docs&lt;/a> on this.&lt;/p>
&lt;p>The Bucket Policy that is crafted by the first TF Module allows the definition of a list of IAM Principals that are allowed to interact with the bucket. With this TF module one can define the particular S3 actions that those IAM Principals can carry out on the data in the bucket. Additionally, this TF module can also be used to allow KMS actions on the KMS keys that are protecting the data at rest in the bucket.&lt;/p>
&lt;h2 id="untested-code-is-broken-code">Untested code is broken code&lt;/h2>
&lt;p>With infrastructure-as-code, just as with normal code, testing is often an afterthought. However, it seems to be catching on more and more nowadays. Nothing shows this better than the number of search results in Google for &lt;code>Infrastructure as Code testing&lt;/code>: &lt;strong>235.000.000&lt;/strong> as of today (15.8.2020). While Infrastructure as Code is a much broader topic with many other interesting projects, this post focuses solely on Terraform. With Terraform, a good step in the right direction is as simple as running &lt;code>terraform validate&lt;/code>, which can catch silly mistakes and syntax errors and provide feedback such as below:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">Error: Missing required argument
on main.tf line 107, in output &lt;span class="s2">&amp;#34;s3_bucket_name&amp;#34;&lt;/span>:
107: output &lt;span class="s2">&amp;#34;s3_bucket_name&amp;#34;&lt;/span> &lt;span class="o">{&lt;/span>
The argument &lt;span class="s2">&amp;#34;value&amp;#34;&lt;/span> is required, but no definition was found.
&lt;/code>&lt;/pre>&lt;/div>&lt;p>In addition to the &lt;code>terraform validate&lt;/code> option, many IDEs, such as IntelliJ, already have plugins that can alert to such issues, so I do not find myself using it very often. However, it&amp;rsquo;s still nice to have this feature built into the &lt;code>terraform&lt;/code> executable!&lt;/p>
&lt;p>Once all syntax errors are fixed, the next stage of testing can continue with the &lt;code>terraform plan&lt;/code> command. This command uses &lt;strong>terraform state&lt;/strong> information (local or remote) to figure out what changes are needed if the configuration is applied. This is very useful for showing in advance what will be created or destroyed. However, a successful &lt;code>terraform plan&lt;/code> can still result in a failed deployment because some constraints cannot be verified without making the actual create or update API calls to the Cloud Service Provider. The &lt;code>terraform plan&lt;/code> command does not make any API calls that change resources; at most it refreshes the state with read-only calls and then computes the differences between the Terraform Code and the Terraform State (local or remote). The failures are usually very provider specific.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">data &lt;span class="s2">&amp;#34;aws_iam_policy_document&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;Deny-Non-CiscoCidr-S3-Access&amp;#34;&lt;/span> &lt;span class="o">{&lt;/span>
statement &lt;span class="o">{&lt;/span>
&lt;span class="nv">sid&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Deny-All-S3-Actions-If-Not-In-IP-PrefixList&amp;#34;&lt;/span>
&lt;span class="nv">effect&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Deny&amp;#34;&lt;/span>
&lt;span class="nv">actions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span> &lt;span class="s2">&amp;#34;s3:*&amp;#34;&lt;/span> &lt;span class="o">]&lt;/span>
&lt;span class="nv">resources&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span> &lt;span class="s2">&amp;#34;*&amp;#34;&lt;/span> &lt;span class="o">]&lt;/span>
condition &lt;span class="o">{&lt;/span>
&lt;span class="nb">test&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;NotIpAddress&amp;#34;&lt;/span>
&lt;span class="nv">variable&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;aws:SourceIp&amp;#34;&lt;/span>
&lt;span class="nv">values&lt;/span> &lt;span class="o">=&lt;/span> local.ip_prefix_list
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This Terraform Code is syntactically correct and passes &lt;code>terraform validate&lt;/code>, and &lt;code>terraform plan&lt;/code> produces a valid plan. However, it still fails at the &lt;code>terraform apply&lt;/code> stage because AWS has a restriction on the &lt;code>sid&lt;/code>: &lt;strong>For IAM policies, basic alphanumeric characters (A-Z,a-z,0-9) are the only allowed characters in the Sid value&lt;/strong>. This constraint is never checked before &lt;code>terraform apply&lt;/code> is called, at which point the whole action fails with the error below:&lt;/p>
&lt;pre>&lt;code>An error occurred: Statement IDs (SID) must be alpha-numeric. Check that your input satisfies the regular expression [0-9A-Za-z]*
&lt;/code>&lt;/pre>&lt;p>Such types of errors can only be caught when making real API calls to the Cloud Service Provider (or to a truly identical mock of the real API) which will validate the calls and return errors if any are found. Next I will go into some details on how I went about testing the 2 Terraform Modules I wrote.&lt;/p>
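&lt;p>For this one particular constraint, a cheap local check is possible because the error message spells out the regular expression. Below is a minimal sketch in Python, assuming the Sid values have already been collected into a list by some other means; the &lt;code>invalid_sids&lt;/code> helper is a hypothetical name of mine and not part of Terraform or the AWS provider.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">import re

# Pattern taken from the AWS error message quoted above.
SID_PATTERN = re.compile(r"[0-9A-Za-z]*")


def invalid_sids(sids):
    """Return the Sid values that do not match the alphanumeric pattern."""
    return [sid for sid in sids if SID_PATTERN.fullmatch(sid) is None]


candidates = [
    "Deny-All-S3-Actions-If-Not-In-IP-PrefixList",  # the Sid from the example
    "DenyAllS3ActionsIfNotInIPPrefixList",          # same Sid, hyphens removed
]
print(invalid_sids(candidates))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Only the hyphenated Sid is reported. A check like this covers a single known rule, so it complements rather than replaces a test against the real API.&lt;/p>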
&lt;h3 id="manual-testing-via-aws">Manual Testing via AWS&lt;/h3>
&lt;p>This most rudimentary form of testing can be done by setting up a real project that imports and uses the two Terraform modules. This test can be found in my repository&amp;rsquo;s &lt;code>test/terraform/aws/&lt;/code> directory. For this to work properly the AWS provider has to be set up with real credentials, which is beyond the scope of this post. I also opted to use S3 as TF state backend storage but this is optional; Terraform can just as well store the state locally in a &lt;code>.tfstate&lt;/code> file.&lt;/p>
&lt;p>First, Terraform has to be initialized via &lt;code>terraform init&lt;/code>, which triggers the download of the AWS Terraform Provider. Next, the changes can be planned and applied via &lt;code>terraform plan &amp;amp; apply&lt;/code> respectively. It&amp;rsquo;s interesting to note that a complete &lt;code>terraform apply&lt;/code> takes close to 1 minute to complete:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">Apply complete! Resources: &lt;span class="m">7&lt;/span> added, &lt;span class="m">0&lt;/span> changed, &lt;span class="m">0&lt;/span> destroyed.
Outputs: &lt;span class="o">[&lt;/span>...&lt;span class="o">]&lt;/span>
real 0m49.090s
user 0m3.532s
sys 0m1.929s
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Once the &lt;code>terraform apply&lt;/code> is complete one can manually check whether it went as expected, based on the outputs (if any) and by inspecting the resources that were created. While this can be good enough for new setups, it is less suitable when an already deployed project has to be modified and one needs to make sure the changes will not have any undesired side effects.&lt;/p>
&lt;h3 id="manual-testing-via-localstack">Manual Testing via localstack&lt;/h3>
&lt;p>In order to save time (and some costs), one may also consider using &lt;strong>localstack&lt;/strong> which replicates most of the AWS API and its features to enable faster and easier development and testing. It&amp;rsquo;s important to note that it is only useful if one is working with AWS services. In an earlier
&lt;a href="https://flrnks.netlify.app/post/python-aws-datadog-testing/" target="_blank" rel="noopener">post&lt;/a> I&amp;rsquo;ve already written on how to set it up, so I will not repeat it here. The most important thing is to enable S3, IAM and KMS services in the
&lt;a href="https://github.com/florianakos/terraform-testing/blob/master/test/terraform/localstack/docker-compose.yml" target="_blank" rel="noopener">docker-compose.yaml&lt;/a> by setting this environment variable: &lt;code>SERVICES=s3,kms,iam&lt;/code> so the corresponding API endpoints are turned on.&lt;/p>
&lt;p>The Terraform files I wrote for testing on real AWS can be re-used for testing with localstack with some tweaks; for more detail see the &lt;code>test/terraform/localstack/&lt;/code> folder in my repository. Then it&amp;rsquo;s just a matter of running &lt;code>terraform init&lt;/code> followed by a &lt;code>terraform plan &amp;amp; apply&lt;/code> to create the fake resources in Localstack.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">Apply complete! Resources: &lt;span class="m">7&lt;/span> added, &lt;span class="m">0&lt;/span> changed, &lt;span class="m">0&lt;/span> destroyed.
Outputs: &lt;span class="o">[&lt;/span> ... &lt;span class="o">]&lt;/span>
real 0m11.649s
user 0m3.589s
sys 0m1.580s
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Notice that this time the &lt;code>terraform apply&lt;/code> took only about 10 seconds, which is considerably faster than using the real AWS API.&lt;/p>
&lt;h3 id="automating-tests-via-terratest">Automating tests via Terratest&lt;/h3>
&lt;p>As I&amp;rsquo;ve shown, running tests via Localstack can be much faster on average, but sometimes a project may require the use of some AWS services that are not supported by Localstack. In this case it becomes necessary to run tests against the real AWS API. For such situations I recommend &lt;code>terratest&lt;/code> from
&lt;a href="https://terratest.gruntwork.io/" target="_blank" rel="noopener">Gruntwork.io&lt;/a>, which is a Go library that provides capabilities to automate tests.&lt;/p>
&lt;p>It still requires a terraform project to be set up, as described in &lt;code>Manual Testing via AWS&lt;/code>, however having the ability to formally define and verify tests can greatly increase the confidence that the code being tested will function the way it&amp;rsquo;s supposed to. In the test I implemented some assertions on the output values of the &lt;code>terraform apply&lt;/code> as well as about the existence of the S3 bucket just created. In addition, the Go library also provides ways to verify the AWS infrastructure setup, by making HTTP calls or SSH connections. This can be a pretty powerful tool.&lt;/p>
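&lt;p>To make the idea of asserting on output values more concrete, here is a small language-neutral illustration in Python. It is not my actual test, which is written in Go with &lt;code>terratest&lt;/code>; it only assumes the JSON shape printed by &lt;code>terraform output -json&lt;/code>. The sample bucket name and the &lt;code>output_value&lt;/code> helper are made up for the example.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">import json

# Sample of what `terraform output -json` prints; the bucket name is made up.
SAMPLE_OUTPUT = """
{
  "s3_bucket_name": {
    "sensitive": false,
    "type": "string",
    "value": "my-secure-bucket-x7k2p9"
  }
}
"""


def output_value(raw_json: str, name: str):
    """Return the value of a single Terraform output from the JSON document."""
    outputs = json.loads(raw_json)
    if name not in outputs:
        raise KeyError(f"Terraform output {name!r} is missing")
    return outputs[name]["value"]


bucket_name = output_value(SAMPLE_OUTPUT, "s3_bucket_name")
# The assertion a test would make once the apply has finished.
assert bucket_name.startswith("my-secure-bucket-")
print(bucket_name)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>In a real run the JSON would come from invoking the &lt;code>terraform&lt;/code> executable in the project directory instead of a hardcoded string.&lt;/p>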
&lt;p>This &lt;code>terratest&lt;/code> setup can be found in my repo under
&lt;a href="https://github.com/florianakos/terraform-testing/blob/master/test/go/terraform_test.go" target="_blank" rel="noopener">test/go/terraform_test.go&lt;/a>.&lt;/p>
&lt;p>Running this test takes considerably longer than either of the two previous ones, but the advantage is that this can be easily automated and integrated into a CI/CD build where it can verify on-demand that the TF code still works as intended, even if there were some changes.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">▶ go &lt;span class="nb">test&lt;/span>
TestTerraform 2020-08-09T21:46:22+02:00 logger.go:66: Terraform has been successfully initialized!
...
TestTerraform 2020-08-09T21:47:30+02:00 logger.go:66: Apply complete! Resources: &lt;span class="m">7&lt;/span> added, &lt;span class="m">0&lt;/span> changed, &lt;span class="m">0&lt;/span> destroyed.
...
TestTerraform 2020-08-09T21:48:08+02:00 logger.go:66: Destroy complete! Resources: &lt;span class="m">7&lt;/span> destroyed.
...
PASS
ok github.com/florianakos/terraform-testing/tests 116.670s
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The basic idea of &lt;code>terratest&lt;/code> is to automate the process of creation and cleanup of resources for the purposes of tests. To avoid name clashes with existing AWS resources, it&amp;rsquo;s good practice to append some random strings to resource names as part of the test, so the tests do not fail due to unique name constraints.&lt;/p>
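&lt;p>The random-suffix idea is independent of the test framework. A minimal sketch in Python, assuming a lowercase alphanumeric suffix (which keeps the result valid as an S3 bucket name); the &lt;code>unique_name&lt;/code> helper and the prefix are hypothetical and not taken from my repository:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">import secrets
import string

# Lowercase letters and digits only, so the name stays S3-compatible.
ALPHABET = string.ascii_lowercase + string.digits


def unique_name(prefix: str, suffix_length: int = 6) -> str:
    """Append a random suffix so repeated or parallel test runs do not collide."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
    return f"{prefix}-{suffix}"


# "tf-test-bucket" is a made-up prefix used only for this example.
print(unique_name("tf-test-bucket"))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The generated name would then be passed to the Terraform project as an input variable, so each test run creates and destroys its own bucket.&lt;/p>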
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>In this post I have shown what options are available for testing a Terraform Module in local or remote settings. If one only works with AWS services then Localstack can be a great tool for quick local tests during development, while &lt;strong>terratest&lt;/strong> from Gruntwork can be a great help with codifying and automating such tests that run against the real AWS Cloud from your favourite CI/CD setup.&lt;/p></description></item><item><title>Defensible Security Architecture</title><link>https://flrnks.netlify.app/post/sans-sec530/</link><pubDate>Wed, 22 Apr 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/sans-sec530/</guid><description>&lt;p>In this post I wanted to write about my experience with #SEC530 which is a SANS course that I took in March during the
&lt;a href="https://www.sans.org/event/prague-march-2020/" target="_blank" rel="noopener">SANS Prague&lt;/a> event. Not long ago I wrote another
&lt;a href="https://flrnks.netlify.app/post/sans-netwars/">post&lt;/a> about my experience with NetWars in March, now I wanted to write about the infosec course that started it all.&lt;/p>
&lt;h2 id="defensible-security-architecture---sec530">Defensible Security Architecture - SEC530&lt;/h2>
&lt;p>Initially I was hesitant to register for an advanced level SANS course (5xx in the code). As I had no previous experience with SANS I did not know if an advanced infosec course would be too difficult for me. Luckily, I found a GIAC assessment exam online called &lt;strong>SANS Cybertalent Assessment Exam&lt;/strong>, which I took for free and eventually passed with a score of 93.33%. This made me confident in registering for #SEC530, as the assessment results stated:&lt;/p>
&lt;p>&lt;em>&amp;ldquo;Examinees who score in this range have demonstrated reliable knowledge in core information security principles [&amp;hellip;] they are typically ready for advanced security training&amp;rdquo;&lt;/em>.&lt;/p>
&lt;p>&lt;img src="cybertalent.png" alt="cyber-talent-assessment">&lt;/p>
&lt;h2 id="course-experience">Course Experience&lt;/h2>
&lt;h3 id="day-1">Day 1&lt;/h3>
&lt;p>The course was taking place at a hotel in Prague 5, about 10 mins walk from my flat, so I was quite happy about the venue. It was a nice hotel with plenty of room for my course and the other ones that were running in parallel with a dozen or so attendees each:&lt;/p>
&lt;ul>
&lt;li>
&lt;a href="https://www.sans.org/event/prague-march-2020/course/security-essentials-bootcamp-style" target="_blank" rel="noopener">#SEC401&lt;/a> - &lt;strong>Security Essentials Bootcamp Style&lt;/strong>&lt;/li>
&lt;li>
&lt;a href="https://www.sans.org/event/prague-march-2020/course/hacker-techniques-exploits-incident-handling" target="_blank" rel="noopener">#SEC504&lt;/a> - &lt;strong>Hacker Tools, Techniques, Exploits and Incident Handling&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>Some colleagues were taking #SEC504, while I was the only one from my workplace taking #SEC530. This was nice because knowing nobody in my class forced me to get to know them, and they all turned out to be interesting people! This was also a good opportunity to start recruiting teammates for the NetWars challenge on Day 6!&lt;/p>
&lt;p>Our instructor was Mr.
&lt;a href="https://www.sans.org/instructors/ryan-nicholson" target="_blank" rel="noopener">Ryan Nicholson&lt;/a> from the United States with an interesting career path that led him to become a SANS Instructor. He used to be a Network Administrator in the past and made lots of references to Cisco networking equipment which made me quite nostalgic from time to time &amp;hellip; 😊&lt;/p>
&lt;p>Eventually the course kicked off and the first day&amp;rsquo;s goal was to get an overview of Defensible Security Architecture. We discussed the downsides of the traditional approach to security and architecture, and how the defensible approach may improve the situation. We were given a recommended reading by Richard Bejtlich titled &lt;strong>The Tao of Network Security Monitoring&lt;/strong>, in which there is a really neat definition: &lt;strong>architecture that encourages, rather than frustrates, digital self-defence&lt;/strong>.&lt;/p>
&lt;p>The rest of the day we discussed many interesting topics, including the Layer 2 security that led to a discovery about the WLAN at the hotel: &lt;strong>station isolation&lt;/strong> was not enabled! This wouldn&amp;rsquo;t be a huge deal normally, but then we became aware of some fellow SANS students in the adjacent room taking the #SEC504 which is a red-team course that has topics such as penetration testing. This inspired me to take some actions as a blue-teamer, which I hoped would earn me the infamous Red coin for #SEC530&amp;hellip; More on this later in the &lt;code>Blue Team Project&lt;/code> section.&lt;/p>
&lt;h3 id="day-2">Day 2&lt;/h3>
&lt;p>After an interesting first day, we dived right into the material on the 2nd day, titled &lt;strong>Network Security Architecture and Engineering&lt;/strong>. This day taught me many interesting topics of L3 security, and provided some interesting lab exercises as well. Most interesting to me was the lab on the config auditing tool called &lt;code>nipper-ng&lt;/code> that can parse Cisco router/switch config files for security issues and provide actionable recommendations. This surely would have been a nice tool to have back when I worked as a Network Administrator.&lt;/p>
&lt;h3 id="day-3">Day 3&lt;/h3>
&lt;p>We continued with the material on the third day with &lt;strong>Network-Centric Security&lt;/strong> with a bunch of different topics on the menu, such as Next Generation Firewalls (NGFW), &lt;strong>Network Security Monitoring&lt;/strong> (NSM) and Secure Remote Access, just to name a few. Probably the most interesting topic for me was NSM that involves the passive capture (in- or out-of-band) and analysis of network / flow metadata. This gave me some good ideas for the &lt;code>Blue Team Project&lt;/code> described in a later section.&lt;/p>
&lt;p>After our lunch break, just before we resumed class, someone from the SANS support team came to our classroom and informed us that they decided to convert the class to remote/virtual mode of operation for the rest of the week, as a safety measure against the COVID-19 pandemic. Although it was quite frustrating to me at the time, I now totally agree with their approach to handling this safety concern. Eventually they did an excellent job of converting the class to run via the virtual CyberCast platform on such short notice!&lt;/p>
&lt;h3 id="day-4">Day 4&lt;/h3>
&lt;p>So on the morning of day four, I did not go to the nearby hotel where the first three days were held; instead I just logged in to my SANS account and accessed the CyberCast session where we continued the course. The teaching duty was split between two new remote instructors from the USA: for the first half of the day we had Mr.
&lt;a href="https://www.sans.org/instructors/greg-scheidel" target="_blank" rel="noopener">Greg Scheidel&lt;/a>, in the afternoon Mr.
&lt;a href="https://www.sans.org/instructors/ismael-valenzuela" target="_blank" rel="noopener">Ismael Valenzuela&lt;/a> took over to finish the rest of the material planned for the day.&lt;/p>
&lt;p>The main theme was &lt;strong>Data Centric Security&lt;/strong> which included topics such as Web Application Firewalls, Data Loss Prevention and some discussions on Cloud Security and containerisation technologies. This last topic was particularly interesting to me, because I had been learning about Docker prior to the SANS training and I had not really considered it from a security point of view before.&lt;/p>
&lt;h3 id="day-5">Day 5&lt;/h3>
&lt;p>This fifth day was dedicated to &lt;strong>Zero Trust Security Architecture&lt;/strong>, which was quite a new and interesting concept to me. During the first half of the day we covered the basic principles of Zero Trust (everything is hostile, verify before establishing trust) and how certain techniques such as mutual authentication can help improve security. The second half of the day with Ismael included some interesting topics such as Security Information and Event Management systems (SIEMs) which are indispensable tools for Security Operations Centres (SOC). This section also proved to have some very valuable lab exercises for the NetWars challenge the following day.&lt;/p>
&lt;h3 id="day-6---netwars">Day 6 - NetWars&lt;/h3>
&lt;p>This final day was dedicated to the CTF-style &lt;strong>NetWars Challenge&lt;/strong> that ran for about 6 hours. Three teams were formed amongst the class participants, who competed against each other and against the clock to solve challenge questions that tested the concepts taught during the course. I have to say I genuinely enjoyed every second of it. Our team was leading the scoreboard all the way until the very end, when we got kicked down to 2nd place because we rushed to be first and incurred some penalties for incorrect answers. Regardless of the final result, it was a very valuable experience with tons of fun and learning. For our efforts that got us 2nd place, we were rewarded with the much coveted blue coin of #SEC530 which &lt;del>will hopefully arrive by FedEx soon&lt;/del> has finally arrived in Prague via FedEx &amp;hellip; :)&lt;/p>
&lt;p>&lt;img src="blue-coin.png" alt="blue-coin">&lt;/p>
&lt;h3 id="blue-team-project">Blue Team Project&lt;/h3>
&lt;p>As I previously mentioned, on the first day we discovered that all attendees at the SANS venue were sharing a WLAN network without &lt;strong>station isolation&lt;/strong> and this was making me somewhat uncomfortable. Some years ago in a university course I had done some simple man-in-the-middle (MITM) attacks on shared LAN networks, so I knew that it was not too difficult to steal credentials or carry out other kinds of malicious attacks when the attacker didn&amp;rsquo;t even have to crack the wifi password to be able to join the shared WLAN.&lt;/p>
&lt;p>Later I wondered whether the WLAN isolation feature had been disabled on purpose so that the red-team students in the adjacent room could practice using some of the typical penetration testing tools. Regardless, this vulnerability enabled by the lack of WLAN isolation gave me the idea to implement some kind of defence system that could monitor and, if possible, alert me to any seemingly malicious attempts targeting my machine.&lt;/p>
&lt;p>My first idea was to run a packet capture on my Host OS via Wireshark, but of course that would have been very difficult to manage and quite likely not so effective! I would&amp;rsquo;ve had to keep an eye on it constantly and check for suspicious packets manually using some filters.&lt;/p>
&lt;p>Instead, I got some inspiration from one of the lab exercises with the ELK stack where we had to look for some suspicious log entries from various sources of security telemetry. I decided to set up a similar set of services to run non-stop on my #SEC530 virtual machine. To provide the network metadata I needed, I decided to install
&lt;a href="https://www.elastic.co/beats/packetbeat" target="_blank" rel="noopener">PacketBeat&lt;/a> and configured it to extract and forward &lt;strong>netflow&lt;/strong> data to the ELK stack. This way I could obtain the necessary visibility into the network activity on my Virtual Machine, without the need to do full packet capture using Wireshark!&lt;/p>
&lt;p>With the below steps one can run the ELK stack via docker-compose in the #SEC530 VM:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="c1"># ELK stack setup&lt;/span>
mkdir monitor &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="nb">cd&lt;/span> monitor
cp /labs/1.3/docker-compose.yml ./
sed -i &lt;span class="s1">&amp;#39;17,18 s/^/#/&amp;#39;&lt;/span> docker-compose.yml &lt;span class="c1">#comment out some volumes not needed&lt;/span>
sed -i &lt;span class="s1">&amp;#39;s/lab13es/elastic_search/g&amp;#39;&lt;/span> docker-compose.yml
sed -i &lt;span class="s1">&amp;#39;s/kibana13/kibana_dashboard/g&amp;#39;&lt;/span> docker-compose.yml
docker container prune -f
docker-compose up
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Next I installed and configured the OSS version of PacketBeat:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="c1"># PacketBeat setup&lt;/span>
curl -L -O https://artifacts.elastic.co/downloads/beats/packetbeat/packetbeat-oss-7.6.1-amd64.deb
sudo dpkg -i packetbeat-oss-7.6.1-amd64.deb
&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;setup.dashboards.enabled: true&amp;#34;&lt;/span> &lt;span class="p">|&lt;/span> sudo tee -a /etc/packetbeat/packetbeat.yml
sudo packetbeat setup --dashboards
sudo service packetbeat start
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now, one can test if it&amp;rsquo;s working by generating some network traffic from the VM which should then appear in the Kibana dashboard at &lt;code>http://localhost:5601/app/kibana&lt;/code>.&lt;/p>
&lt;p>&lt;img src="kibana.png" alt="kibana">&lt;/p>
&lt;p>At this point, it becomes possible to observe malicious hacking attempts by focusing on IP addresses from my local IP subnet&amp;hellip; But I was not yet fully satisfied and wanted to take it a bit further.&lt;/p>
&lt;h3 id="blue-team-project---next-level">Blue Team Project - Next Level&lt;/h3>
&lt;p>It was quite nice to see &lt;strong>netflow&lt;/strong> data being exported to the ELK stack in the previous setup; however, I was a bit disappointed with the Kibana dashboards that were set up by PacketBeat. Some were completely dysfunctional due to syntax errors that I could not figure out how to fix.&lt;/p>
&lt;p>I spent quite a long time looking for a fix to the Kibana dashboard issues, but eventually I ended up swapping my ELK &amp;amp; PacketBeat setup for a more advanced set of tools:
&lt;a href="https://securityonion.net/" target="_blank" rel="noopener">The Security Onion&lt;/a>! It turns out that it also uses docker to run the ELK stack behind the scenes. In addition, it includes tools such as &lt;strong>Zeek/Bro&lt;/strong> and &lt;strong>Suricata/Snort&lt;/strong> right out of the box, which we also covered in the course. So cool!&lt;/p>
&lt;p>Setting it all up on the #SEC530 VM took a bit longer than my previous setup. First I had to give the underlying VM some additional juice (4 CPUs and at least 8GB of RAM), and then I followed the installation steps below on a fresh clone of the #SEC530 VM:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Set the VM NIC mode to bridge (Autodetect) (in VMware Fusion)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Boot the VM, log in and change the settings in &lt;strong>Software &amp;amp; Updates&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>on &lt;strong>Ubuntu Software&lt;/strong> tab check all options except &lt;strong>restricted software&lt;/strong>&lt;/li>
&lt;li>on &lt;strong>Updates&lt;/strong> tab select the first two options&lt;/li>
&lt;li>click &lt;strong>Close&lt;/strong> and then click &lt;strong>Reload&lt;/strong> to fetch the latest updates&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Next run these steps in the Terminal (adapted from
&lt;a href="https://securityonion.readthedocs.io/en/latest/installing-on-ubuntu.html" target="_blank" rel="noopener">here&lt;/a>):&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;debconf debconf/frontend select noninteractive&amp;#34;&lt;/span> &lt;span class="p">|&lt;/span> sudo debconf-set-selections
sudo rm -rf /var/lib/apt/lists/*
sudo apt-get update
sudo apt-get -y install software-properties-common
sudo add-apt-repository -y ppa:securityonion/stable
sudo apt-get update
sudo apt-get -y -f -o Dpkg::Options::&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;--force-overwrite&amp;#34;&lt;/span> install securityonion-all securityonion-onionsalt securityonion-suricata syslog-ng-core
&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;p>The above steps install necessary dependencies and then create a desktop shortcut called &lt;strong>Setup&lt;/strong> with the Security Onion icon. Double-click it to continue the install process (alternatively issue &lt;code>sudo sosetup&lt;/code> in Terminal):&lt;/p>
&lt;ul>
&lt;li>choose to reconfigure the network interfaces (with DHCP)&lt;/li>
&lt;li>accept the necessary reboot now&lt;/li>
&lt;li>trigger the Setup process again to finish the installation&lt;/li>
&lt;li>choose &lt;strong>Evaluation Mode&lt;/strong> when prompted&lt;/li>
&lt;li>set up default username/password used to secure the various dashboards&lt;/li>
&lt;/ul>
&lt;p>Once the setup finishes, which takes a few minutes, it will show several additional popup windows with useful information about the Security Onion&amp;rsquo;s functions, and several new desktop icons will appear:&lt;/p>
&lt;p>&lt;img src="setup-done.png" alt="install-onion">&lt;/p>
&lt;p>At this point, the setup is complete and you can see the installed services by clicking on the new icons on the Desktop. Most interesting to me was the &lt;strong>Kibana dashboard&lt;/strong> which comes pre-loaded with some amazing features out of the box:&lt;/p>
&lt;p>&lt;img src="kibana-onion.png" alt="kibana-onion">&lt;/p>
&lt;p>This really seems like an awesome set of features that can detect malicious attacks much better than my first setup with &lt;strong>ELK &amp;amp; Packetbeat&lt;/strong>. Advanced visibility into network metadata is exactly what I was looking for when I was on that shared WLAN. I&amp;rsquo;m glad I did not have to implement it by hand after all &amp;hellip; :)&lt;/p>
&lt;h3 id="blue-team-project---next-next-level">Blue Team Project - Next Next Level&lt;/h3>
&lt;p>While looking around on the net for possible solutions to my issues, I stumbled upon this project from
&lt;a href="https://github.com/dtag-dev-sec/tpotce/tree/master/docker" target="_blank" rel="noopener">Telekom Security&lt;/a>&amp;rsquo;s GitHub page, which seemed like an even more advanced version of the Security Onion with various types of built-in honeypots that feed information to a Kibana dashboard. Sadly, however, it is not possible to set this up on the #SEC530 VM because the built-in installer does not support Xubuntu 16.04, and there were so many moving parts to the project that I did not dare to do it all by hand. For now I just keep it here as a reference; maybe in a future post I will describe it in more detail!&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>As I already mentioned, this was my first SANS training and I could not be happier about the whole experience, despite the unfortunate situation with the global pandemic disrupting our onsite course. While I was initially a bit worried about the lack of &lt;code>station isolation&lt;/code> on the shared WLAN, I really enjoyed digging around the Internet for a solution to earn myself some peace of mind. The knowledge and new skills I acquired in the domain of Defensible Security Architecture have been quite overwhelming to say the least.&lt;/p>
&lt;p>I also enjoyed building new connections with the people who run these trainings and with my fellow SANS alumni. Taking part in the NetWars events that followed in March and April, I felt good to be part of such an incredible community.&lt;/p></description></item><item><title>SANS NetWars in March</title><link>https://flrnks.netlify.app/post/sans-netwars/</link><pubDate>Sat, 04 Apr 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/sans-netwars/</guid><description>&lt;p>This past month of March was quite eventful, to say the least, with all the news of this pandemic shaking many different segments of our globalised society. It&amp;rsquo;s virtually impossible to escape the constant flow of news in the media. While March was practically defined by the continuously evolving story of the virus, I wanted to write a new blog post about a different topic that also greatly impacted this month for me: a live SANS course I attended in Prague and some online CTF challenges organised by SANS and the Counter Hack team.&lt;/p>
&lt;h2 id="sans-prague-march-2020">SANS Prague March 2020&lt;/h2>
&lt;p>I still remember how excited I was when I learnt that my employer would sponsor my attendance at a 6-day
&lt;a href="https://www.sans.org/event/prague-march-2020" target="_blank" rel="noopener">SANS&lt;/a> course in March, held in the city where I currently live and work. I was eagerly looking forward to it; it ran from the 9th to the 14th of March.&lt;/p>
&lt;p>&lt;img src="sans_prague.jpg" alt="SANS-Prague">&lt;/p>
&lt;p>The course was held in a very nice hotel in the Prague 5 district, and we were hosted by a very friendly SANS staff that included some world-class teachers. I really liked how well they organised everything and tried to spoil us with good food. There were actually several courses running in parallel. My course,
&lt;a href="https://www.sans.org/course/defensible-security-architecture-and-engineering" target="_blank" rel="noopener">SEC530&lt;/a>, a.k.a. &lt;strong>Defensible Security Architecture and Engineering&lt;/strong>, was taught by Ryan Nicholson, who did a great job during the first 3 days.&lt;/p>
&lt;p>Sadly, however, on Wednesday (11th of March) we were instructed to go home due to the growing risk of contracting the COVID-19 virus. All was not lost, because the SANS team did their best to convert the whole class to an online CyberCast while the course was in progress. So from the next day onward, we continued remotely with new instructors, who jumped in, while Ryan was on his way back to the States. Initially we thought he would continue hosting the CyberCast from his hotel room, but eventually we got to know two new SANS instructors, Greg Scheidel and Ismael Valenzuela, who took turns teaching the rest of the course material and then hosting the NetWars event for us.&lt;/p>
&lt;h2 id="sans-netwars">SANS NetWars&lt;/h2>
&lt;p>While the raw educational content of SEC530 was great, I most enjoyed the last day of the course when we got to take part in a private NetWars challenge hosted just for the participants of the course, which was about 10-15 people. I had some initial ideas about what NetWars was all about, thanks to numerous cleverly placed banners in Holiday Hack Challenges from previous years, but I had never actually participated in one before, so it was a completely new experience for me. I immediately loved it so much that when it was over I knew I wanted more!&lt;/p>
&lt;!-- ![SEC530-Coin](https://pbs.twimg.com/media/D9g4yNrWwAE8H8h?format=jpg&amp;name=4096x4096) -->
&lt;p>So you can imagine how excited I was when I learnt that SANS was going to offer a bunch of
&lt;a href="https://www.sans.org/blog/and-now-for-something-awesome-sans-launches-new-series-of-worldwide-capture-the-flag-cyber-events/" target="_blank" rel="noopener">free NetWars events&lt;/a> for SANS alumni, with some special events open to the whole world to take part in! The first one was a two-day Core NetWars Tournament, the first of its kind, organised completely online via CyberCast from 19th to 20th of March. Due to timezone differences, it lasted until 2 am on both days, but I loved every second of it! While I had no high hopes of winning, I was surprised how well I did, eventually finishing 12th among the first-time NetWars players.&lt;/p>
&lt;!-- ![Core-NetWars](core-netwars.jpg) -->
&lt;p>Next up was the Mini NetWars Mission 1, also the first of its kind, from 2nd till 3rd of April. This was a bit different from Core NetWars, as we did not have to solve the challenges in a virtualised OS environment. Instead we relied solely on the browser, very similar to how the Holiday Hack environment works, which was already quite familiar to me!&lt;/p>
&lt;p>This time many more people signed up, as registration was not limited to just SANS alumni but open to the public. Eventually we were more than 500 people competing! This time I managed to solve all of the objectives and obtained the maximum score of 92 which qualified me as a
&lt;a href="https://www.counterhackchallenges.com/winners" target="_blank" rel="noopener">winner&lt;/a>. My final placement on the ranking was somewhere around 50th, as I took a number of hints and was a bit slower than others. Nevertheless, I was still amazed by how far I have come. By the way, this is my battle station setup, which won me some cool SANS swag on
&lt;a href="https://twitter.com/SANSInstitute/status/1246150677602226176" target="_blank" rel="noopener">Twitter&lt;/a> :)&lt;/p>
&lt;p>&lt;img src="mini-netwars.jpg" alt="Mini-NetWars">&lt;/p>
&lt;h2 id="conclusion">CONCLUSION&lt;/h2>
&lt;p>All in all, I cannot thank SANS enough for hosting these alumni NetWars events, some completely free for the whole cyber security community. I am probably not alone in feeling that they did an amazing service to us all, who are probably stuck at home due to social distancing and quarantine measures implemented worldwide. This month was surely made a bit special for me, so big thanks to SANS and the Counter Hack team for all their efforts!&lt;/p>
&lt;p>&lt;strong>P.S.:&lt;/strong> A very very very cool Spotify Playlist, which works wonders during such CTF contests, is available via this
&lt;a href="https://open.spotify.com/playlist/2KwHJlC1x117sXWR0CKZWW?si=H3V76HhzSwi_Bu5Wqut7qQ" target="_blank" rel="noopener">link&lt;/a>. I cannot take credit for it, it belongs to Bryce Galbraith who moderated these two previous NetWars events and was kind enough to share his playlist with us.&lt;/p></description></item><item><title>Identity &amp; Access Management</title><link>https://flrnks.netlify.app/post/aws-iam/</link><pubDate>Mon, 03 Feb 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/aws-iam/</guid><description>&lt;h2 id="introduction">INTRODUCTION&lt;/h2>
&lt;p>In this post I show how the Identity and Access Management service in the AWS Public Cloud works to secure resources and workloads. It is a very important topic, because it underpins all of the security that is needed for hosting one&amp;rsquo;s resources in the public cloud.&lt;/p>
&lt;p>At the end of the day, the cloud is just a concept that offers a convenient illusion of dedicated resources, but in reality it&amp;rsquo;s just some process that runs on someone else&amp;rsquo;s hardware, so one has to be absolutely sure about security before trusting it and running their business-critical workloads on it.&lt;/p>
&lt;p>It is enough to do a quick Google search for
&lt;a href="https://www.google.com/search?q=unsecured%20s3%20bucket" target="_blank" rel="noopener">unsecured s3 bucket&lt;/a> to see plenty of examples of administrators failing to properly harden and configure their AWS resources, and falling victim to accidental disclosure of often business-critical information.&lt;/p>
&lt;p>
&lt;a href="https://docs.aws.amazon.com/iam/?id=docs_gateway" target="_blank" rel="noopener">IAM&lt;/a> exists in the realm of AWS Cloud as a standalone service, providing various ways in which access to resources and workloads can be restricted. For example, if someone has an S3 bucket for storing arbitrary data, one can use IAM policies to restrict access to data stored in the bucket based on various criteria such as user identity, connection source IP, VPC environment and so on. S3 is a convenient service to demonstrate IAM capabilities, because it is very easy to grasp the result of restrictions: access to files in an S3 bucket is either granted or denied.&lt;/p>
&lt;h2 id="how-it-works">HOW IT WORKS&lt;/h2>
&lt;p>In order to illustrate how IAM works, I decided to create a Python function on AWS Lambda, the AWS service for running serverless functions, and implemented a routine that tries to access some data stored in a particular S3 bucket. By default the Lambda starts running with an
&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html" target="_blank" rel="noopener">IAM role&lt;/a> that has only read-only permission to the bucket. This is verified by making an API call with the
&lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html" target="_blank" rel="noopener">boto3&lt;/a> package, which returns without any error. Next the Lambda tries to write some new data to the bucket, but this fails because the IAM role is not equipped with Write permission to the S3 bucket.&lt;/p>
&lt;p>To get around this problem, I use boto3 to make an AWS Security Token Service (
&lt;a href="https://docs.aws.amazon.com/STS/latest/APIReference/Welcome.html" target="_blank" rel="noopener">STS&lt;/a>) call and assume a new role which is equipped with the necessary read-write access. Using this new role the program demonstrates that it can write to the bucket as expected. Below is a sample output of the Lambda Function in action:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-yml" data-lang="yml">===&lt;span class="w"> &lt;/span>Checking&lt;span class="w"> &lt;/span>IAM&lt;span class="w"> &lt;/span>Identity&lt;span class="w"> &lt;/span>===&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">ARN&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>arn&lt;span class="p">:&lt;/span>aws&lt;span class="p">:&lt;/span>sts&lt;span class="p">::&lt;/span>ACCOUNT_ID&lt;span class="p">:&lt;/span>assumed-role/Base-Lambda-Custom-Role/lambda&lt;span class="w">
&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>===&lt;span class="w"> &lt;/span>Testing&lt;span class="w"> &lt;/span>Read&lt;span class="w"> &lt;/span>access&lt;span class="w"> &lt;/span>to&lt;span class="w"> &lt;/span>S3&lt;span class="w"> &lt;/span>file&lt;span class="w"> &lt;/span>in&lt;span class="w"> &lt;/span>bucket&lt;span class="w"> &lt;/span>===&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>{&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">&amp;#34;field1&amp;#34;: &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">&amp;#34;field2&amp;#34;: &lt;/span>&lt;span class="m">1.&lt;/span>4107917E7&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>}&lt;span class="w">
&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>===&lt;span class="w"> &lt;/span>Testing&lt;span class="w"> &lt;/span>Write&lt;span class="w"> &lt;/span>access&lt;span class="w"> &lt;/span>to&lt;span class="w"> &lt;/span>S3&lt;span class="w"> &lt;/span>bucket&lt;span class="w"> &lt;/span>===&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">Error&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>AccessDenied!&lt;span class="w">
&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>===&lt;span class="w"> &lt;/span>Assumed&lt;span class="w"> &lt;/span>New&lt;span class="w"> &lt;/span>IAM&lt;span class="w"> &lt;/span>Identity&lt;span class="w"> &lt;/span>===&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">ARN&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>arn&lt;span class="p">:&lt;/span>aws&lt;span class="p">:&lt;/span>sts&lt;span class="p">::&lt;/span>ACCOUNT_ID&lt;span class="p">:&lt;/span>assumed-role/S3-RW-Role/lambda&lt;span class="w">
&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>===&lt;span class="w"> &lt;/span>Testing&lt;span class="w"> &lt;/span>Write&lt;span class="w"> &lt;/span>access&lt;span class="w"> &lt;/span>to&lt;span class="w"> &lt;/span>S3&lt;span class="w"> &lt;/span>bucket&lt;span class="w"> &lt;/span>(using&lt;span class="w"> &lt;/span>new&lt;span class="w"> &lt;/span>role)&lt;span class="w"> &lt;/span>===&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>...&lt;span class="w"> &lt;/span>file&lt;span class="w"> &lt;/span>was&lt;span class="w"> &lt;/span>written&lt;span class="w"> &lt;/span>successfully!&lt;span class="w">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To get a better understanding of how this all works in code, feel free to check out the source code repository on GitHub (
&lt;a href="https://github.com/florianakos/aws-iam-exercise" target="_blank" rel="noopener">link&lt;/a>). Because I am a big fan of Terraform, I defined all resources (S3, IAM, Lambda) in code, which makes it very simple and straightforward to deploy and test the code if you feel like it!&lt;/p>
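&lt;p>The role switch described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the repository: the helper names &lt;code>assume_role_credentials&lt;/code> and &lt;code>write_with_role&lt;/code> are made up for this example, and the role ARN, bucket and key are whatever you pass in.&lt;/p>

```python
def assume_role_credentials(sts_client, role_arn, session_name="lambda"):
    """Assume `role_arn` via STS and return keyword arguments for a new boto3 client.

    `sts_client` is expected to behave like boto3.client("sts").
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def write_with_role(role_arn, bucket, key, body):
    """Write an object to S3 using the temporary credentials of the assumed role."""
    import boto3  # imported lazily so the helper above can be exercised offline

    sts = boto3.client("sts")
    s3 = boto3.client("s3", **assume_role_credentials(sts, role_arn))
    s3.put_object(Bucket=bucket, Key=key, Body=body)
```

&lt;p>Keeping the STS call in its own function means it can be exercised with a stub client, without touching a real AWS account.&lt;/p>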
&lt;h2 id="advanced-iam">ADVANCED IAM&lt;/h2>
&lt;p>Besides providing the basic functionality to restrict access to resources based on user identity, there are some cool and more advanced features of AWS IAM that I wanted to touch upon. First, though, here is how simple it is to give an IAM role read-only permissions to a bucket:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">data &lt;span class="s2">&amp;#34;aws_iam_policy_document&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;s3_ro_access_policy_document&amp;#34;&lt;/span> &lt;span class="o">{&lt;/span>
statement &lt;span class="o">{&lt;/span>
&lt;span class="nv">effect&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Allow&amp;#34;&lt;/span>
&lt;span class="nv">actions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>
&lt;span class="s2">&amp;#34;s3:GetObject&amp;#34;&lt;/span>,
&lt;span class="s2">&amp;#34;s3:ListBucket&amp;#34;&lt;/span>,
&lt;span class="o">]&lt;/span>
&lt;span class="nv">resources&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>
&lt;span class="s2">&amp;#34;arn:aws:s3:::my-bucket&amp;#34;&lt;/span>,
&lt;span class="s2">&amp;#34;arn:aws:s3:::my-bucket/*&amp;#34;&lt;/span>
&lt;span class="o">]&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
resource &lt;span class="s2">&amp;#34;aws_iam_policy&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;s3_ro_access_policy&amp;#34;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="nv">name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;S3-ReadOnly-Access&amp;#34;&lt;/span>
&lt;span class="nv">policy&lt;/span> &lt;span class="o">=&lt;/span> data.aws_iam_policy_document.s3_ro_access_policy_document.json
&lt;span class="o">}&lt;/span>
resource &lt;span class="s2">&amp;#34;aws_iam_role_policy_attachment&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;Allow_S3_ReadOnly_Access&amp;#34;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="nv">role&lt;/span> &lt;span class="o">=&lt;/span> aws_iam_role.aws_custom_role_for_lambda.name
&lt;span class="nv">policy_arn&lt;/span> &lt;span class="o">=&lt;/span> aws_iam_policy.s3_ro_access_policy.arn
&lt;span class="o">}&lt;/span>
resource &lt;span class="s2">&amp;#34;aws_iam_role&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;aws_s3_readwrite_role&amp;#34;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;span class="nv">name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;S3-RW-Role&amp;#34;&lt;/span>
&lt;span class="nv">description&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Role to allow full RW to bucket&amp;#34;&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Full source code on
&lt;a href="https://github.com/florianakos/aws-iam-exercise/blob/master/terraform/s3.tf" target="_blank" rel="noopener">GitHub&lt;/a>.&lt;/p>
&lt;p>With this short Terraform code, I created an IAM policy granting read-only access to the &lt;code>my-bucket&lt;/code> resource in S3 and attached it to a role. To spice this up a bit, it is possible to add extra restrictions based on various elements of the request context, for example to restrict access based on source IP:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">data &lt;span class="s2">&amp;#34;aws_iam_policy_document&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;s3_ro_access_policy_document&amp;#34;&lt;/span> &lt;span class="o">{&lt;/span>
statement &lt;span class="o">{&lt;/span>
&lt;span class="nv">effect&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Deny&amp;#34;&lt;/span>
&lt;span class="nv">actions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>
&lt;span class="s2">&amp;#34;s3:*&amp;#34;&lt;/span>
&lt;span class="o">]&lt;/span>
&lt;span class="nv">resources&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span> &lt;span class="s2">&amp;#34;*&amp;#34;&lt;/span>&lt;span class="o">]&lt;/span>
condition &lt;span class="o">{&lt;/span>
&lt;span class="nb">test&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;NotIpAddress&amp;#34;&lt;/span>
&lt;span class="nv">variable&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;aws:SourceIp&amp;#34;&lt;/span>
&lt;span class="nv">values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span> &lt;span class="s2">&amp;#34;192.168.2.0/24&amp;#34;&lt;/span> &lt;span class="o">]&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>All of a sudden, even if the user who makes the request to S3 has correct credentials, the request will be &lt;strong>denied&lt;/strong> if they are connecting from outside the subnet specified above! This can be very useful, for example, when trying to restrict access to resources so that it is only possible from within a corporate network with a specific CIDR range.&lt;/p>
&lt;p>One small issue with this source IP restriction is that it can cause problems for certain AWS services that run on behalf of a principal/user. When using the AWS Athena service for example, triggering a query on data stored in S3 means Athena will make S3 API requests on behalf of the user who initiated the Athena query, but these requests will have a source IP address from some Amazon AWS CIDR range and will therefore fail. An extra condition can be added to remediate this:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">data &lt;span class="s2">&amp;#34;aws_iam_policy_document&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;s3_ro_access_policy_document&amp;#34;&lt;/span> &lt;span class="o">{&lt;/span>
statement &lt;span class="o">{&lt;/span>
&lt;span class="nv">effect&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Deny&amp;#34;&lt;/span>
&lt;span class="nv">actions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>
&lt;span class="s2">&amp;#34;s3:*&amp;#34;&lt;/span>
&lt;span class="o">]&lt;/span>
&lt;span class="nv">resources&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span> &lt;span class="s2">&amp;#34;*&amp;#34;&lt;/span>&lt;span class="o">]&lt;/span>
condition &lt;span class="o">{&lt;/span>
&lt;span class="nb">test&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;NotIpAddress&amp;#34;&lt;/span>
&lt;span class="nv">variable&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;aws:SourceIp&amp;#34;&lt;/span>
&lt;span class="nv">values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span> &lt;span class="s2">&amp;#34;192.168.2.0/24&amp;#34;&lt;/span> &lt;span class="o">]&lt;/span>
&lt;span class="o">}&lt;/span>
condition &lt;span class="o">{&lt;/span>
&lt;span class="nb">test&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Bool&amp;#34;&lt;/span>
&lt;span class="nv">variable&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;aws:ViaAWSService&amp;#34;&lt;/span>
&lt;span class="nv">values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span> &lt;span class="s2">&amp;#34;false&amp;#34;&lt;/span> &lt;span class="o">]&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;span class="o">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>aws:ViaAWSService = false&lt;/code> condition ensures that this Deny only takes effect when the request is not made by an AWS service on behalf of the principal. For additional info on what other condition keys can be used to grant or deny access, please consult the AWS
&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html" target="_blank" rel="noopener">documentation&lt;/a>.&lt;/p>
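&lt;p>To make the intended logic of the two conditions concrete, here is a small Python model of it. It is only a simplified sketch of the behaviour described in the text, not of how IAM evaluates policies internally: the function name &lt;code>deny_applies&lt;/code> is made up for this example, and it assumes that the goal is to deny requests from outside the allowed range unless they arrive via an AWS service.&lt;/p>

```python
import ipaddress


def deny_applies(source_ip, via_aws_service, allowed_cidr="192.168.2.0/24"):
    """Return True when the Deny statement would match the request.

    Both conditions must hold at once: the caller is outside the allowed
    range AND the call was not made by an AWS service on the caller's behalf.
    """
    outside_allowed_range = (
        ipaddress.ip_address(source_ip) not in ipaddress.ip_network(allowed_cidr)
    )
    return outside_allowed_range and not via_aws_service
```

&lt;p>Because conditions inside one statement are combined with a logical AND, a request that Athena makes on the user&amp;rsquo;s behalf no longer matches the Deny, while a direct request from an unknown network still does.&lt;/p>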
&lt;h2 id="conclusion">CONCLUSION&lt;/h2>
&lt;p>In this post I demonstrated how to use the boto3 python package to make AWS IAM and STS calls to access resources in the AWS cloud protected by IAM policies. I also discussed some advanced features of AWS IAM that can help you implement more granular IAM policies and access rights. The linked repository also contains an example which may be run locally and does not need the Lambda function to be created (it still, however, requires the Terraform resources to be deployed).&lt;/p></description></item><item><title>Cloud Service Testing</title><link>https://flrnks.netlify.app/post/python-aws-datadog-testing/</link><pubDate>Fri, 17 Jan 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/python-aws-datadog-testing/</guid><description>&lt;p>In this blog post I discuss a recent project I worked on to practice my skills related to AWS, Python and Datadog. It includes topics such as integration testing using &lt;strong>pytest&lt;/strong> and &lt;strong>localstack&lt;/strong>; running Continuous Integration via &lt;strong>Travis-CI&lt;/strong> and infrastructure as code using &lt;strong>Terraform&lt;/strong>.&lt;/p>
&lt;h2 id="intro">Intro&lt;/h2>
&lt;p>For the sake of this blog post, let&amp;rsquo;s assume that a periodic job runs somewhere in the Cloud, outside the context of this application, which generates a file with some metadata about the job itself. This data includes mostly numerical values, such as the number of images used to train an ML model, or the number of files processed, etc. This part is depicted in the diagram below as a dummy Lambda function that periodically uploads this metadata file, filled with random numerical values, to an S3 bucket.&lt;/p>
&lt;p>&lt;img src="img/arch.png" alt="Architecture">&lt;/p>
&lt;p>When this file is uploaded, an event notification is sent to an SQS message queue. The goal of the Python application is to periodically drain these messages from the queue. When the application runs, it fetches the S3 file referenced in each SQS message, parses the file&amp;rsquo;s contents and submits the numerical metrics to Datadog for the purpose of visualisation and alerting.&lt;/p>
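&lt;p>As a rough illustration of the message handling described above, the sketch below extracts the bucket and object key from the body of one SQS message. It assumes that S3 delivers its event notifications directly to the queue (not wrapped in an SNS envelope), and the function name &lt;code>s3_objects_from_sqs_body&lt;/code> is made up for this example.&lt;/p>

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_body(body):
    """Return (bucket, key) pairs referenced by an S3 event notification."""
    event = json.loads(body)
    objects = []
    # S3 test events carry no "Records" key, so default to an empty list.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (for example, spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

&lt;p>Each returned pair can then be passed to an S3 &lt;code>get_object&lt;/code> call to download and parse the metadata file.&lt;/p>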
&lt;h2 id="testing">Testing&lt;/h2>
&lt;p>Since the application interacts with two different APIs (AWS &amp;amp; Datadog), I figured it was a good idea to create integration tests that can be run easily via some free CI service (e.g.: Travis-CI.org). When writing the integration tests, I opted to create a simple mock class for testing the interaction with the Datadog API, and chose to rely on &lt;strong>localstack&lt;/strong> for testing the interaction with the AWS API.&lt;/p>
&lt;p>Thanks to &lt;strong>localstack&lt;/strong> I could skip creating real resources in AWS and instead use free fake resources in a docker container that closely mimic the real AWS API. The AWS SDK for Python, &lt;code>boto3&lt;/code>, is very easy to reconfigure to connect to the fake resources in &lt;strong>localstack&lt;/strong> with the &lt;code>endpoint_url=&lt;/code> parameter.&lt;/p>
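&lt;p>A minimal sketch of that reconfiguration could look like the following. The helper names are made up for this example, and the default URL is an assumption: it matches the single edge port of recent localstack versions, while older versions exposed a separate port per service, so adjust it to your setup.&lt;/p>

```python
def localstack_client_kwargs(endpoint_url="http://localhost:4566"):
    """Build keyword arguments that point a boto3 client at localstack."""
    return {
        # Assumed default: localstack's edge port on the local machine.
        "endpoint_url": endpoint_url,
        "region_name": "us-east-1",
        # Dummy credentials, since no real AWS account is involved.
        "aws_access_key_id": "test",
        "aws_secret_access_key": "test",
    }


def make_client(service, endpoint_url="http://localhost:4566"):
    """Create a boto3 client for `service` (e.g. "s3" or "sqs") backed by localstack."""
    import boto3  # imported lazily so the kwargs helper works without boto3

    return boto3.client(service, **localstack_client_kwargs(endpoint_url))
```

&lt;p>In the tests, clients created this way can be handed to the application instead of the default ones, so the rest of the code stays unchanged.&lt;/p>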
&lt;p>In the following sections I go through different phases of the project:&lt;/p>
&lt;ol>
&lt;li>coding the python app&lt;/li>
&lt;li>mocking Datadog statsd client&lt;/li>
&lt;li>setting up AWS resources in localstack&lt;/li>
&lt;li>creating integration tests&lt;/li>
&lt;li>Travis-CI integration&lt;/li>
&lt;li>running the datadog-agent locally&lt;/li>
&lt;li>setting up real AWS resources&lt;/li>
&lt;li>live testing&lt;/li>
&lt;/ol>
&lt;h3 id="-coding-the-python-app-">~ Coding the python app ~&lt;/h3>
&lt;p>The
&lt;a href="https://github.com/florianakos/python-testing/blob/master/app/submitter.py" target="_blank" rel="noopener">code&lt;/a> is mainly composed of two Python classes with methods to interact with AWS and DataDog. The &lt;strong>CloudResourceHandler&lt;/strong> class has methods to interact with S3 and SQS, which can be replaced in integration-tests with preconfigured &lt;code>boto3&lt;/code> clients for &lt;strong>localstack&lt;/strong>.&lt;/p>
&lt;p>The &lt;strong>MetricSubmitter&lt;/strong> class uses the &lt;strong>CloudResourceHandler&lt;/strong> internally and offers some additional methods for sending metrics to DataDog. Internally it uses statsd from the &lt;code>datadog&lt;/code> python
&lt;a href="https://pypi.org/project/datadog/" target="_blank" rel="noopener">package&lt;/a>, which can be replaced via dependency injection in integration tests with a mock statsd class that I created to test its interaction with the Datadog API.&lt;/p>
&lt;p>To connect to the real AWS &amp;amp; Datadog APIs (via a preconfigured local datadog-agent), two environment variables need to be specified at run-time:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>STATSD_HOST&lt;/strong> set to &lt;code>localhost&lt;/code>&lt;/li>
&lt;li>&lt;strong>SQS_QUEUE_URL&lt;/strong> set to the URL of the Queue&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">environ&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;STATSD_HOST&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;localhost&amp;#39;&lt;/span>
&lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">environ&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;SQS_QUEUE_URL&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;https://sqs.eu-central-1.amazonaws.com/????????????/cloud-job-results-queue&amp;#39;&lt;/span>
&lt;span class="n">session&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">boto3&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Session&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">profile_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;profile-name&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="n">MetricSubmitter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">statsd&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">datadog_statsd&lt;/span>&lt;span class="p">,&lt;/span>
&lt;span class="n">sqs_client&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">session&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">client&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;sqs&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;span class="n">s3_client&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">session&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">client&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;s3&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">run&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>In addition, it also requires a preconfigured AWS profile in &lt;code>~/.aws/credentials&lt;/code> which is necessary for &lt;strong>boto3&lt;/strong> to authenticate to AWS:&lt;/p>
&lt;pre>&lt;code class="language-console" data-lang="console">[profile-name]
aws_access_key_id = XXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
region = eu-central-1
&lt;/code>&lt;/pre>&lt;p>But before running it, let&amp;rsquo;s set up some integration tests!&lt;/p>
&lt;h3 id="-mocking-datadog-statsd-client-">~ Mocking Datadog statsd client ~&lt;/h3>
&lt;p>In truth, the application does not interact directly with the Datadog API, but rather it uses &lt;strong>statsd&lt;/strong> from the &lt;code>datadog&lt;/code> python package, which interacts with the local &lt;code>datadog-agent&lt;/code>, which in turn forwards metrics and events to the Datadog API.&lt;/p>
&lt;p>To test this flow that relies on &lt;code>statsd&lt;/code>, I created a class called &lt;strong>DataDogStatsDHelper&lt;/strong>. This class has 2 functions (&lt;strong>gauge/event&lt;/strong>) with identical signatures to the real functions from the official &lt;code>datadog-statsd&lt;/code> package. However, the mock functions do not send anything to the &lt;code>datadog-agent&lt;/code>. Instead, they accumulate the values they were passed in local class variables:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="k">class&lt;/span> &lt;span class="nc">DataDogStatsDHelper&lt;/span>&lt;span class="p">:&lt;/span>
&lt;span class="n">event_title&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">None&lt;/span>
&lt;span class="n">event_text&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">None&lt;/span>
&lt;span class="n">event_alert_type&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">None&lt;/span>
&lt;span class="n">event_tags&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">None&lt;/span>
&lt;span class="n">event_counter&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;span class="n">gauge_metric_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">None&lt;/span>
&lt;span class="n">gauge_metric_value&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">None&lt;/span>
&lt;span class="n">gauge_tags&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">None&lt;/span>
&lt;span class="n">gauge_counter&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">event&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">title&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">text&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">alert_type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">aggregation_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">source_type_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">,&lt;/span>
&lt;span class="n">date_happened&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">priority&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">tags&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">hostname&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">gauge&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">metric&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">value&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">tags&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">sample_rate&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">None&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>When the MetricSubmitter class is tested, this mock class is injected instead of the real &lt;strong>statsd&lt;/strong> class, so the tests can assert on the recorded values and compare them with expectations.&lt;/p>
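&lt;p>To make the injection idea concrete, here is a minimal, self-contained sketch. The class and function names (&lt;code>RecordingStatsd&lt;/code>, &lt;code>submit&lt;/code>) and the metric name are hypothetical and only stand in for the real classes in the repository; unlike the mock above, this version keeps its state in instance attributes.&lt;/p>
&lt;pre>&lt;code class="language-python" data-lang="python">class RecordingStatsd:
    """Hypothetical stand-in for the statsd client; it only records calls."""

    def __init__(self):
        self.gauges = []
        self.events = []

    def gauge(self, metric, value, tags=None, sample_rate=None):
        # Nothing is sent to the datadog-agent; the call is only stored.
        self.gauges.append((metric, value, tuple(tags or ())))

    def event(self, title, text, alert_type=None, tags=None, **kwargs):
        self.events.append((title, alert_type, tuple(tags or ())))


def submit(statsd, metrics):
    """Toy submitter that depends only on the injected statsd object."""
    for name, value in metrics.items():
        statsd.gauge(name, value, tags=["cloud_job_metric"])
    statsd.event("cloud job", "metrics submitted", alert_type="success",
                 tags=["cloud_job_metric"])


fake = RecordingStatsd()
submit(fake, {"images_processed": 42})
print(fake.gauges, fake.events)
&lt;/code>&lt;/pre>
&lt;p>Because &lt;code>submit&lt;/code> only relies on the &lt;code>gauge&lt;/code> and &lt;code>event&lt;/code> methods, the same code path can be exercised with either the fake or the real client.&lt;/p>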
&lt;h3 id="-aws-resources-in-localstack-">~ AWS resources in localstack ~&lt;/h3>
&lt;p>To test how the python app integrates with S3 and SQS, I decided to use &lt;strong>localstack&lt;/strong>, running in a Docker container. To make it simple and repeatable, I created a &lt;code>docker-compose.yaml&lt;/code> file that allows the configuration parameters to be defined in YAML:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-yml" data-lang="yml">&lt;span class="k">version&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;3.2&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">services&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">localstack&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>localstack/localstack&lt;span class="p">:&lt;/span>latest&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">container_name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>localstack&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ports&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- &lt;span class="s1">&amp;#39;4563-4599:4563-4599&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- &lt;span class="s1">&amp;#39;8080:8080&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">environment&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- SERVICES=s3&lt;span class="p">,&lt;/span>sqs&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- AWS_ACCESS_KEY_ID=foo&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- AWS_SECRET_ACCESS_KEY=bar&lt;span class="w">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The resulting fake AWS resources are accessible via different ports on localhost. In this case, S3 runs on port &lt;strong>4572&lt;/strong> and SQS on port &lt;strong>4576&lt;/strong>. Refer to the
&lt;a href="https://github.com/localstack/localstack#overview" target="_blank" rel="noopener">docs&lt;/a> on GitHub for more details on ports used by other AWS services in localstack.&lt;/p>
&lt;p>It is important to note that when localstack starts up, it is completely empty. Thus, before the integration tests can run, it is necessary to provision the S3 bucket and SQS queue in localstack, just as one would normally do it when using real AWS resources.&lt;/p>
&lt;p>For this purpose, it&amp;rsquo;s possible to write a simple bash script that can be called from the localstack container as part of an automatic init script:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">aws --endpoint-url&lt;span class="o">=&lt;/span>http://localhost:4572 s3api create-bucket --bucket &lt;span class="s2">&amp;#34;bucket-name&amp;#34;&lt;/span> --region &lt;span class="s2">&amp;#34;eu-central-1&amp;#34;&lt;/span>
aws --endpoint-url&lt;span class="o">=&lt;/span>http://localhost:4576 sqs create-queue --queue-name &lt;span class="s2">&amp;#34;queue-name&amp;#34;&lt;/span> --region &lt;span class="s2">&amp;#34;eu-central-1&amp;#34;&lt;/span> --attributes &lt;span class="s2">&amp;#34;MaximumMessageSize=4096,MessageRetentionPeriod=345600,VisibilityTimeout=30&amp;#34;&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, for the sake of making the integration-tests self-contained, I opted to integrate this into the tests as part of a class setup phase that runs before any tests and sets up the required S3 bucket and SQS queue:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="nd">@classmethod&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">setUpClass&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">cls&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="bp">cls&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ls&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">LocalStackHelper&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="bp">cls&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ls&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_s3_client&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">create_bucket&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">Bucket&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">cls&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">s3_bucket_name&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="bp">cls&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ls&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_sqs_client&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">create_queue&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">QueueName&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="bp">cls&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sqs_queue_name&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="-creating-integration-tests-">~ Creating integration tests ~&lt;/h3>
&lt;p>As a next step I created the integration
&lt;a href="https://github.com/florianakos/python-testing/blob/master/app/test_submitter.py" target="_blank" rel="noopener">tests&lt;/a> which use the fake AWS resources in localstack, as well as the mock &lt;strong>statsd&lt;/strong> class for DataDog. I used two popular python packages to create these:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>unittest&lt;/strong> which is a built-in package&lt;/li>
&lt;li>&lt;strong>pytest&lt;/strong> which is a 3rd party package&lt;/li>
&lt;/ul>
&lt;p>Actually, the test cases only use &lt;strong>unittest&lt;/strong>, while &lt;strong>pytest&lt;/strong> is used for the simple collection and execution of those tests. To get started with the &lt;strong>unittest&lt;/strong> framework, I created a python class and implemented the test cases within this class:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="kn">import&lt;/span> &lt;span class="nn">unittest&lt;/span>
&lt;span class="kn">from&lt;/span> &lt;span class="nn">app.utils.datadog_fake_statsd&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">DataDogStatsDHelper&lt;/span>
&lt;span class="kn">from&lt;/span> &lt;span class="nn">app.utils.localstack_helper&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">LocalStackHelper&lt;/span>
&lt;span class="kn">from&lt;/span> &lt;span class="nn">app.submitter&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">MetricSubmitter&lt;/span>
&lt;span class="k">class&lt;/span> &lt;span class="nc">ProjectIntegrationTesting&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">unittest&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TestCase&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="nd">@classmethod&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">setUpClass&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">cls&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">setUp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">test_ddg_submitter_valid_payload&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">test_ddg_submitter_invalid_payload&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">test_aws_handler_invalid_s3key&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;span class="k">def&lt;/span> &lt;span class="nf">test_aws_handler_valid_s3key&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;span class="o">...&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>In the &lt;strong>setUpClass&lt;/strong> method, a few things are taken care of before tests can be executed:&lt;/p>
&lt;ul>
&lt;li>define class variables for the bucket &amp;amp; the queue&lt;/li>
&lt;li>create SQS &amp;amp; S3 clients using localstack endpoint url&lt;/li>
&lt;li>provision needed resources (Queue/Bucket) in localstack&lt;/li>
&lt;/ul>
&lt;p>To test the interaction with DataDog via the statsd client, the submitter app is executed, which stores some values in the mock &lt;strong>statsd&lt;/strong> class&amp;rsquo;s internal variables, which are then used in assertions to compare values with expectations.&lt;/p>
&lt;p>The other tests inspect the behaviour of the &lt;strong>CloudResourceHandler&lt;/strong> class. For example, one of the assertions tests whether the &lt;code>.has_available_messages()&lt;/code> function returns false when there are no more messages in the queue.&lt;/p>
&lt;p>A nice feature of &lt;strong>unittest&lt;/strong> is that it&amp;rsquo;s easy to define tasks that need to be executed before each test, to ensure a clean slate for each test. For example, the code in the &lt;strong>setUp&lt;/strong> method ensures two things:&lt;/p>
&lt;ul>
&lt;li>the fake SQS queue is emptied before each test&lt;/li>
&lt;li>class variables of the mock DataDog class are reset before each test&lt;/li>
&lt;/ul>
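&lt;p>The clean-slate idea can be sketched without any AWS dependency. The example below is a simplified assumption of how such a reset might look: it uses a hypothetical &lt;code>FakeStatsd&lt;/code> class with class-level state and only covers the mock reset, since emptying the queue would need a running localstack.&lt;/p>
&lt;pre>&lt;code class="language-python" data-lang="python">import unittest


class FakeStatsd:
    """Hypothetical mock that keeps its state in class variables."""
    gauge_counter = 0
    gauge_metric_name = None

    def gauge(self, metric, value, tags=None, sample_rate=None):
        FakeStatsd.gauge_counter += 1
        FakeStatsd.gauge_metric_name = metric


class CleanSlateTests(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so state never leaks between tests.
        FakeStatsd.gauge_counter = 0
        FakeStatsd.gauge_metric_name = None
        self.statsd = FakeStatsd()

    def test_single_gauge(self):
        self.statsd.gauge("files_processed", 3)
        self.assertEqual(FakeStatsd.gauge_counter, 1)

    def test_two_gauges(self):
        self.statsd.gauge("files_processed", 3)
        self.statsd.gauge("images_used", 7)
        self.assertEqual(FakeStatsd.gauge_counter, 2)
        self.assertEqual(FakeStatsd.gauge_metric_name, "images_used")


# Run the two tests programmatically instead of via the pytest command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanSlateTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
&lt;/code>&lt;/pre>
&lt;p>Without the reset in &lt;code>setUp&lt;/code>, the counter assertions would depend on the order in which the tests happen to run.&lt;/p>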
&lt;p>Theoretically, it would be possible to run the test by running &lt;code>pytest -s -v&lt;/code> in the python project&amp;rsquo;s root directory, however the tests rely on localstack, so they would fail&amp;hellip;&lt;/p>
&lt;h3 id="-travis-ci-integration-">~ Travis-CI integration ~&lt;/h3>
&lt;p>Now that the integration tests are created, I thought it would be really nice to have them run automatically in a CI service whenever someone pushes changes to the Git repo. To this end, I created a free account on &lt;code>travis-ci.org&lt;/code> and integrated it with my GitHub repo by creating a &lt;strong>.travis.yaml&lt;/strong> file with the below initial content:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="k">os&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>linux&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>python&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">python&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;3.8&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">services&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- docker&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">script&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- {...}&lt;span class="w">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, I still needed a way to run &lt;code>localstack&lt;/code> and then execute the integration tests within the CI environment. Luckily I found &lt;strong>docker-compose&lt;/strong> to be a perfect fit for this purpose. I had already created a yaml file to describe how to run &lt;code>localstack&lt;/code>, so now I could just simply add an extra container that would run my tests. Here is how I created a docker image to run the tests via docker-compose:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-dockerfile" data-lang="dockerfile">&lt;span class="k">FROM&lt;/span>&lt;span class="s"> python:3.8-alpine&lt;/span>&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">WORKDIR&lt;/span>&lt;span class="s"> /app&lt;/span>&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">COPY&lt;/span> ./requirements-test.txt ./&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">RUN&lt;/span> apk add --no-cache --virtual .pynacl_deps build-base gcc make python3 python3-dev libffi-dev &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> pip3 install --upgrade setuptools pip &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> pip3 install --no-cache-dir -r requirements-test.txt &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> rm requirements-test.txt&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">COPY&lt;/span> ./utils/*.py ./utils/&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">COPY&lt;/span> ./*.py ./&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">ENV&lt;/span> LOCALSTACK_HOST localstack&lt;span class="err">
&lt;/span>&lt;span class="err">&lt;/span>&lt;span class="k">ENTRYPOINT&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;pytest&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;-s&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;-v&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="err">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It installs the necessary dependencies to an alpine based python 3.8 image; adds the necessary source code, and finally executes &lt;strong>pytest&lt;/strong> to collect &amp;amp; run the tests. Here are the updates I had to make to the &lt;strong>docker-compose.yaml&lt;/strong> file:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="k">version&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;3.2&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w">&lt;/span>&lt;span class="k">services&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">localstack&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>{...}&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">integration-tests&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">container_name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>cloud-job-it&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">build&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">context&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>.&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">dockerfile&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>Dockerfile-tests&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">depends_on&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- &lt;span class="s2">&amp;#34;localstack&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Docker Compose auto-magically creates a shared network to enable connectivity between the defined services, which can call one-another by name. So when the tests are running in the &lt;strong>cloud-job-it&lt;/strong> container, they can use the hostname &lt;strong>localstack&lt;/strong> to create the &lt;strong>boto3&lt;/strong> session via the endpoint url to reach the fake AWS resources.&lt;/p>
&lt;p>For easier creation of AWS clients via localstack, I used a package called
&lt;a href="https://github.com/localstack/localstack-python-client" target="_blank" rel="noopener">localstack-python-client&lt;/a>, so I don&amp;rsquo;t have to deal with port numbers and low level details. However, this client by default tries to use &lt;strong>localhost&lt;/strong> as the hostname, which wouldn&amp;rsquo;t work in my setup using docker-compose. After digging through the source-code of this python package, I found a way to change this by setting an environment variable named &lt;strong>LOCALSTACK_HOST&lt;/strong>.&lt;/p>
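&lt;p>The effect of that environment variable can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not the actual implementation inside the client package: the helper name &lt;code>localstack_endpoint&lt;/code> is hypothetical, and the ports are the ones mentioned earlier in this post. The resulting URL is what would be handed to &lt;code>boto3&lt;/code> via &lt;code>endpoint_url=&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python" data-lang="python">import os

# Ports taken from the localstack setup described earlier in this post.
SERVICE_PORTS = {"s3": 4572, "sqs": 4576}


def localstack_endpoint(service, environ=os.environ):
    """Build the endpoint URL for a localstack service (illustrative only)."""
    # Fall back to localhost when LOCALSTACK_HOST is not set.
    host = environ.get("LOCALSTACK_HOST", "localhost")
    return "http://{}:{}".format(host, SERVICE_PORTS[service])


# Outside docker-compose the variable is unset; inside it names the service.
print(localstack_endpoint("s3", {}))
print(localstack_endpoint("sqs", {"LOCALSTACK_HOST": "localstack"}))
&lt;/code>&lt;/pre>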
&lt;p>As a final step, I just had to add two lines to complete the &lt;strong>.travis.yaml&lt;/strong> file:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="k">script&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- docker-compose&lt;span class="w"> &lt;/span>up&lt;span class="w"> &lt;/span>--build&lt;span class="w"> &lt;/span>--abort-on-container-exit&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>- docker-compose&lt;span class="w"> &lt;/span>down&lt;span class="w"> &lt;/span>-v&lt;span class="w"> &lt;/span>--rmi&lt;span class="w"> &lt;/span>all&lt;span class="w"> &lt;/span>--remove-orphans&lt;span class="w">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Thanks to the &lt;code>--abort-on-container-exit&lt;/code> flag, docker-compose will return the same exit code which is returned from the container that first exits, which first this use-case perfectly, as the &lt;strong>cloud-job-it&lt;/strong> container only runs until the tests finish. This way the whole setup will gracefully shut down, while preserving the exit code from the container, allowing the CI system to generate an alert if it&amp;rsquo;s not 0 (meaning some test failed).&lt;/p>
&lt;h3 id="-running-the-datadog-agent-locally-">~ Running the datadog-agent locally ~&lt;/h3>
&lt;p>&lt;strong>Note&lt;/strong>: while Datadog is a paid service, it&amp;rsquo;s possible to create a trial account that&amp;rsquo;s free for 2 weeks, without the need to enter credit card details. This is pretty amazing!&lt;/p>
&lt;p>Now that the integration tests are automated and passing, I wanted to run the &lt;code>datadog-agent&lt;/code> locally, so that I could test the python application with some real data to be submitted to Datadog via the agent. Here is an
&lt;a href="https://docs.datadoghq.com/getting_started/agent/?tab=datadogeusite" target="_blank" rel="noopener">article&lt;/a> that was particularly useful to me, with instructions on how the agent should be set up.&lt;/p>
&lt;p>While the option of running it in docker-compose was initially appealing, I eventually decided to just start it manually as a long-lived detached container. Here is how I went about doing that:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="nv">DOCKER_CONTENT_TRUST&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="m">1&lt;/span> docker run -d &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> --name dd-agent &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> -v /var/run/docker.sock:/var/run/docker.sock:ro &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> -v /proc/:/host/proc/:ro &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> -e &lt;span class="nv">DD_API_KEY&lt;/span>&lt;span class="o">=&lt;/span>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> -e &lt;span class="nv">DD_SITE&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;datadoghq.eu&amp;#34;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> -e &lt;span class="nv">DD_DOGSTATSD_NON_LOCAL_TRAFFIC&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">true&lt;/span> &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> -p 8125:8125/udp &lt;span class="se">\
&lt;/span>&lt;span class="se">&lt;/span> datadog/agent:7
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Most notable of these lines is the &lt;strong>DD_API_KEY&lt;/strong> environment variable which ensures that whatever data I send to the agent is associated with my own account. In addition, since I am closest to the EU region, I had to specify the endpoint via the &lt;strong>DD_SITE&lt;/strong> variable. Also, because I want the agent to accept metrics from the python app, I need to turn on a feature via the environment variable &lt;strong>DD_DOGSTATSD_NON_LOCAL_TRAFFIC&lt;/strong>, as well as expose port 8125 from the docker container to the host machine:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash"> ▶ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
477cb2ea74b2 datadog/agent &lt;span class="s2">&amp;#34;/init&amp;#34;&lt;/span> &lt;span class="m">3&lt;/span> days ago Up &lt;span class="m">3&lt;/span> days &lt;span class="o">(&lt;/span>healthy&lt;span class="o">)&lt;/span> 0.0.0.0:8125-&amp;gt;8125/udp, 8126/tcp dd-agent
&lt;/code>&lt;/pre>&lt;/div>&lt;p>All seems to be well!&lt;/p>
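&lt;p>For a sense of what travels over UDP port 8125, here is a minimal sketch that formats a gauge in the plain-text DogStatsD datagram format and sends it with a raw socket. The &lt;code>datadog&lt;/code> package does this for the application; the helper names and the metric name below are made up for illustration.&lt;/p>
&lt;pre>&lt;code class="language-python" data-lang="python">import socket


def format_gauge(metric, value, tags=None):
    """Format a gauge as a DogStatsD datagram: name:value|g|#tag1,tag2"""
    datagram = "{}:{}|g".format(metric, value)
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_gauge(metric, value, tags=None, host="localhost", port=8125):
    """Send one gauge to the agent over UDP (fire and forget)."""
    payload = format_gauge(metric, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Only format here; call send_gauge(...) once the agent container is running.
print(format_gauge("cloud_job.files_processed", 12, ["env:test"]))
&lt;/code>&lt;/pre>
&lt;p>Since UDP is connectionless, &lt;code>send_gauge&lt;/code> gives no confirmation that the agent received anything, which is one reason to check the Datadog UI afterwards.&lt;/p>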
&lt;h3 id="-deploying-real-aws-resources-">~ Deploying real AWS resources ~&lt;/h3>
&lt;p>Here I briefly discuss how I deployed some real resources in AWS to see my application running live. In a nutshell, I set the infra up as code in Terraform, which greatly simplified the whole process. All the necessary files are collected in a
&lt;a href="https://github.com/florianakos/python-testing/tree/master/terraform" target="_blank" rel="noopener">directory&lt;/a> of my repository:&lt;/p>
&lt;ul>
&lt;li>&lt;code>variables.tf&lt;/code> defines some variables used in multiple places&lt;/li>
&lt;li>&lt;code>init.tf&lt;/code> initialisation of the AWS provider and definition of AWS resources&lt;/li>
&lt;li>&lt;code>outputs.tf&lt;/code> defines some values that are reported when deployment finishes&lt;/li>
&lt;/ul>
&lt;p>The first and last files are not very interesting. Most of the interesting stuff happens in the &lt;strong>init.tf&lt;/strong>, which defines the necessary resources and permissions. One extra resource not mentioned before, is an AWS Lambda function, which gets executed every minute and is used to upload a JSON file to the S3 bucket. This acts as a random source of data, so that the python app has some work to do without manual intervention.&lt;/p>
&lt;h3 id="-live-testing-">~ Live testing ~&lt;/h3>
&lt;p>Now that all parts seem to be ready, it&amp;rsquo;s time to run the main python app using the real S3 bucket and SQS queue, as well as the local datadog-agent. The console output provides some hints as to whether it&amp;rsquo;s able to pump the metrics from AWS to DataDog:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">▶ python3 submitter.py
Initializing new Cloud Resource Handler with SQS URL - https://.../cloud-job-results-queue
Processing available messages in SQS queue:
- sending data to DataDog via statsd/datadog-agent.
- removing message from SQS &lt;span class="o">(&lt;/span>AQEBO37smPPHg6OIqbh3HMu3g...&lt;span class="o">)&lt;/span>
- ...
- sending data to DataDog via statsd/datadog-agent.
- removing message from SQS &lt;span class="o">(&lt;/span>AQEBV0/JzMVEP6k5kBmx2kvGn...&lt;span class="o">)&lt;/span>
No more messages visible in the queue, shutting down ...
Process finished with &lt;span class="nb">exit&lt;/span> code &lt;span class="m">0&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Next, I checked my DataDog account to see whether the metric data arrived. For this I created a custom
&lt;a href="https://app.datadoghq.eu/notebook/list" target="_blank" rel="noopener">Notebook&lt;/a> with graphs to display them:&lt;/p>
&lt;p>&lt;img src="img/datadog-metrics.png" alt="DataDog Metrics">&lt;/p>
&lt;p>All seems to be well! The deployed AWS Lambda function has already run a few times, providing input data for the Python app, which was successfully processed and forwarded to DataDog. As seen in the &lt;code>Notebook&lt;/code> above, it is really easy to display metric data over time for any recurring workload, which can provide useful insights into those jobs.&lt;/p>
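&lt;p>To give an idea of what travels between the app and the local agent, here is a minimal sketch of the DogStatsD datagram format for a gauge, sent over UDP to port 8125 (the port the dd-agent container exposes above). The app itself may well rely on a client library for this; the function names, the metric name and the tags are invented for the example.&lt;/p>

```python
import socket


def format_gauge(name, value, tags=None):
    """Build a DogStatsD gauge datagram of the form name:value|g|#tag1,tag2."""
    datagram = "{}:{}|g".format(name, value)
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_datagram(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to the DogStatsD port of the local agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

&lt;p>Calling &lt;code>send_datagram(format_gauge("cloud_job.duration", 12.5, ["env:dev"]))&lt;/code> would hand a single data point to the agent, which then forwards it to DataDog.&lt;/p>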
&lt;p>Furthermore, since DataDog also supports the submission of
&lt;a href="https://docs.datadoghq.com/events/" target="_blank" rel="noopener">events&lt;/a>, it becomes possible to design dashboards and create alerts which trigger based on more complex criteria, such as the presence or lack of events over certain periods of time. One such example can be seen below:&lt;/p>
&lt;p>&lt;img src="img/ok-vs-fail.png" alt="DataDog Dashboard OK">&lt;/p>
&lt;p>This is a so-called
&lt;a href="https://docs.datadoghq.com/dashboards/screenboards/" target="_blank" rel="noopener">screen-board&lt;/a> which I created to display the status of a Monitor that I set up previously. This Monitor tracks incoming events with the tag &lt;strong>cloud_job_metric&lt;/strong> and generates an alert if there has not been at least one such event of type &lt;strong>success&lt;/strong> in the last 30 minutes. The screen-board can be shared via a public URL if needed, or simply displayed on a big screen somewhere in the office.&lt;/p>
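&lt;p>Events can be submitted through the same DogStatsD channel as metrics, using a slightly different datagram layout. The sketch below only formats such a datagram and is an illustration rather than the code of the app: the function name, the title and the text are invented, while the &lt;strong>cloud_job_metric&lt;/strong> tag and the &lt;strong>success&lt;/strong> type mirror the Monitor described above. A client library would normally hide this detail.&lt;/p>

```python
ALERT_TYPES = ("error", "warning", "info", "success")


def format_event(title, text, alert_type="info", tags=None):
    """Build a DogStatsD event datagram: _e{title_len,text_len}:title|text|t:type|#tags."""
    if alert_type not in ALERT_TYPES:
        raise ValueError("unsupported alert_type: {}".format(alert_type))
    # The two lengths in the header are counted in UTF-8 bytes.
    title_len = len(title.encode("utf-8"))
    text_len = len(text.encode("utf-8"))
    datagram = "_e{{{},{}}}:{}|{}|t:{}".format(title_len, text_len, title, text, alert_type)
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram
```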
&lt;h2 id="conclusions">Conclusions&lt;/h2>
&lt;p>In this post I discussed a relatively complex project with lots of exciting technology working together in the realm of Cloud Computing. In the end, I was able to create dashboards and Monitors in DataDog, which ingest and display telemetry about AWS workloads in a way that makes it easy to track and monitor the workloads themselves.&lt;/p></description></item><item><title>KringleCon II</title><link>https://flrnks.netlify.app/post/kringlecon-writeup/</link><pubDate>Mon, 13 Jan 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/kringlecon-writeup/</guid><description>&lt;p>In this post I just wanted to announce and link to my write-up in the Tutorials section of my blog, which chronicles my solution to the challenges of the most fun CTF of the holiday season.&lt;/p>
&lt;p>&lt;img src="sans-main.png" alt="HHC 2019">&lt;/p>
&lt;p>A huge thank you goes out to the SANS Institute and the Counter Hack Team who are the organisers of this event. They put a great deal of energy and effort into hosting it year after year. It is no wonder the campus of the Elf University was sometimes so crowded, you could barely see your own avatar! :)&lt;/p>
&lt;p>&lt;img src="crowd.png" alt="Crowds at Elf University">&lt;/p>
&lt;p>To get to the write-up, either click
&lt;a href="https://flrnks.netlify.app/tutorials/kringlecon2019/">this&lt;/a> link, or manually go to the Tutorials section in the top bar. I also welcome any kind of feedback or comment on the write-up; to share it, please hit the Contact link in the top bar.&lt;/p></description></item><item><title>RunCode.ninja Challenges</title><link>https://flrnks.netlify.app/post/runcode/</link><pubDate>Sat, 11 Jan 2020 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/runcode/</guid><description>&lt;p>This post was born on a misty Saturday morning, while slowly sipping some good quality coffee in a Prague café. I spent the last several evenings after work solving programming challenges on
&lt;a href="https://runcode.ninja/" target="_blank" rel="noopener">runcode.ninja&lt;/a> and I thought it would be nice to share my experience and spread the word about it.&lt;/p>
&lt;h3 id="runcodeninja">RunCode.ninja&lt;/h3>
&lt;p>I can&amp;rsquo;t really recall how I discovered this website in the first place&amp;hellip; All I remember is that I was really into the simplicity of it all. The basic idea for most of the challenges goes something like this:&lt;/p>
&lt;ul>
&lt;li>check problem description&lt;/li>
&lt;li>inspect the sample input (if any)&lt;/li>
&lt;li>write your program locally&lt;/li>
&lt;li>test on sample input (if any)&lt;/li>
&lt;li>submit source code to the evaluation platform&lt;/li>
&lt;/ul>
&lt;p>If all goes well, you will get feedback within a few seconds on whether the submitted code worked correctly for the task at hand. If it didn&amp;rsquo;t, then you can turn to their
&lt;a href="https://runcode.ninja/faq" target="_blank" rel="noopener">FAQ&lt;/a> for some advice. It definitely has some useful info; however, if all else fails, you can also contact the team behind the platform on their Slack
&lt;a href="https://runcodeslack.slack.com">channel&lt;/a>. They are really friendly people so be sure to respond to their effort in kind!&lt;/p>
&lt;p>&lt;img src="runcode.png" alt="easy-category">&lt;/p>
&lt;p>Another nice thing about their platform is that they categorized all their challenges (119 in total as of now) into nice categories such as &lt;code>binary, encoding, encryption, forensics, etc.&lt;/code> which allows you to select what you are interested in. When I started out, I was first aiming to complete the challenges in &lt;code>Easy&lt;/code> which offers a combination of relatively easy challenges from &lt;code>math, text-parsing, encoding&lt;/code> and other categories.&lt;/p>
&lt;p>As it currently stands, I rank 155 out of around 2400 registered users, which looks good at first, but I suspect there may be quite a few inactive accounts in their database. Also, there are some hardcore people who have already completed all the challenges, which is quite impressive. If I could spend a few rainy and cold weekends working on these, I would probably catch up soon!&lt;/p>
&lt;p>Last but not least, their platform is set up to interpret several different programming languages, so you can choose to solve the challenges in the language you are most comfortable with. Once you solve a challenge, you can access its &lt;code>write-ups&lt;/code> which provide some very useful inspiration on how others have solved the same problem. This can provide some very valuable lessons, like that one time when I wrote a Go program that was 20 lines long to solve a challenge that took only 1 line to solve in Bash&amp;hellip;&lt;/p>
&lt;p>If you are interested in checking out my solutions for some of the challenges, you can find them in my GitHub
&lt;a href="https://github.com/florianakos/codewarz" target="_blank" rel="noopener">repository&lt;/a>. For some of them I even created two different solutions, one in Python and another in Go, just to compare and practice working with both languages.&lt;/p>
&lt;p>Oh and I almost forgot to mention, they have some really cool stickers that they are not shy to send half-way across the world by post, so that&amp;rsquo;s another big plus for sticker fans :)&lt;/p>
&lt;p>&lt;img src="sticker.png" alt="sticker">&lt;/p>
&lt;p>That&amp;rsquo;s all for now, thank you for tuning in! :)&lt;/p></description></item><item><title>Docker with Ansible</title><link>https://flrnks.netlify.app/post/ansible-docker/</link><pubDate>Fri, 13 Dec 2019 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/ansible-docker/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>This post was written as a kind of learning diary for my most recent venture into the world of automation through &lt;code>Ansible&lt;/code>. The project I implemented uses Docker to package 2 services into a micro-services architecture and Ansible to build and deploy those services on remote hosts (with the help of Docker Compose).&lt;/p>
&lt;h3 id="the-idea">The Idea&lt;/h3>
&lt;p>The service implements a file processing utility which monitors the file-system (a particular folder), grabs any newly created files and stores them in another folder in compressed form. Interacting with the service is possible through a web interface which offers file uploads, simple statistics and the possibility to request email summaries.&lt;/p>
&lt;h3 id="the-approach">The Approach&lt;/h3>
&lt;p>The first idea was to write it all in Go, because I am quite comfortable with the language. However, after a few searches on the interweb, I discovered that a handy Linux facility already exists for my exact use-case: &lt;code>inotify&lt;/code>. While Go has some packages that offer wrappers around it, I eventually decided to just write a bash script that uses the &lt;code>inotify&lt;/code> tools, instead of relying on Go for implementing all parts of this service. This also gave me a convenient excuse to split the service into two pieces, both of which can be deployed and scaled independently, in the spirit of micro-service architecture. Next, I set out to learn enough Ansible to deploy a service packaged in Docker containers.&lt;/p>
&lt;h2 id="ansible-101">ANSIBLE 101&lt;/h2>
&lt;p>Before this project, I never had the chance to use Ansible, but I had wanted to learn about it for quite a while, so here I will describe it briefly for those who are also at the start of their journey with Ansible.&lt;/p>
&lt;p>At the basic level, it is a tool for provisioning and configuring applications on remote systems in an automated fashion. To achieve the automation it uses so-called &lt;code>playbooks&lt;/code>, which define what steps are necessary to reach a desired state for remote systems. It runs mainly on UNIX systems, but is able to provision and configure both UNIX and Windows based systems.&lt;/p>
&lt;p>It is an &lt;code>agentless&lt;/code> tool, which means it does not require any special software to be installed on the remote hosts. Instead it relies on a remote connection to those hosts (typically SSH, with WinRM commonly used for Windows), through which bash or PowerShell utilities are used to carry out the necessary steps.&lt;/p>
&lt;p>Ansible uses an &lt;code>inventory&lt;/code> that describes the remote systems that can be provisioned through the playbooks. Inventories can be defined statically in the local filesystem on the Ansible master node, or pulled dynamically from remote systems.&lt;/p>
&lt;h2 id="ansible-meets-docker">ANSIBLE MEETS DOCKER&lt;/h2>
&lt;p>For the purpose of this project, the main use of Ansible lies in its ability to build and run Docker containers. While Docker is not strictly needed to deploy this service on multiple remote hosts, it becomes much easier when all the necessary dependencies and the source code are packaged neatly in a container that can be easily shipped. Within the Docker container, all dependencies are set up and the service is configured in a reliable and consistent manner, while Ansible takes care of deploying and running the service.&lt;/p>
&lt;p>It is worth mentioning that other tools exist, such as Kubernetes and Docker Swarm, which focus more on shipping containerised applications. This blog post, however, will not deal with those, but focus entirely on Ansible and Docker instead. Future posts may discuss those alternatives in more detail.&lt;/p>
&lt;p>Below is a brief summary of the proposed architecture that depicts how Ansible and Docker are used together to achieve the desired state of deploying the containerised service on each Ansible host.&lt;/p>
&lt;p>&lt;img src="ansib-meets-dock.png" alt="Ansible meets Docker">&lt;/p>
&lt;p>Detailed instructions are out of scope for this post as well, but briefly: the above shows a snapshot of my local environment using virtual machines in VirtualBox. First, I created a master VM with Ubuntu Desktop and then two slave VMs with Ubuntu server (no GUI necessary). Ansible was installed on the Master node and proper SSH access was configured for both slave VMs from the master VM. In the Ansible playbook used to deploy the service on remote systems, the first few tasks were about installing necessary dependencies and setting up a local docker environment, which can later build and run containerised applications.&lt;/p>
&lt;h2 id="monolithic-vs-microservice">MONOLITHIC VS MICROSERVICE&lt;/h2>
&lt;p>Before discussing how Ansible was used to deploy the service on remote machines using Docker, it is worth going through the building blocks of the service itself. The set of features needed for the service:&lt;/p>
&lt;ul>
&lt;li>file monitoring service that grabs and compresses files&lt;/li>
&lt;li>web interface for file uploads, email sending and service stats&lt;/li>
&lt;/ul>
&lt;p>These features could be implemented in one application that runs all the necessary functions in parallel. In fact, on my first iteration, I opted to solve it this way, packaging all features into a single container. The below figure shows how it worked.&lt;/p>
&lt;p>&lt;img src="monolithic.png" alt="Monolithic Docker">&lt;/p>
&lt;p>However, for the sake of learning, it is worth considering a &lt;code>microservice&lt;/code> approach. This essentially means breaking up big &lt;code>monolithic&lt;/code> applications into smaller sub-components. Docker is a perfect tool for this. For our purposes, such an architecture could mean deploying 2 separate containers: one for the Web UI backend (for uploads, statistics and email) and another that implements the monitoring and compression service. Below is an updated figure showing the breakup of our previously monolithic approach.&lt;/p>
&lt;p>&lt;img src="microservice.png" alt="Microservice Docker">&lt;/p>
&lt;p>Breaking up the one container from the first iteration into two separate containers enables us to reap some benefits of microservice architecture. Our application components can fail independently: for example, a bug in the email sending service will not bring down the monitoring service. Such an architecture also means we can scale better with demand in the future: if there were a huge surge in requests to the web frontend, we could just deploy more instances of that container and use a load-balancer to distribute requests among them.&lt;/p>
&lt;h2 id="implementation">IMPLEMENTATION&lt;/h2>
&lt;p>To implement the web component, I used simple static HTML served from a &lt;code>Go&lt;/code> backend, which also handled file uploads, sending email notifications and extracting statistical data from a shared SQLite3 database. To implement the file monitoring service, I used the &lt;code>inotify-tools&lt;/code> available on Linux systems, and wrapped them in a bash script that took care of the zipping and of writing logs and statistics into the SQLite3 database.&lt;/p>
&lt;h3 id="docker-compose">Docker-Compose&lt;/h3>
&lt;p>Docker Compose was used to enable easier testing and deployment. The definitions in the &lt;code>docker-compose.yml&lt;/code> describe what docker containers should be started with what parameters. The two services defined in the docker-compose correspond to the two containers defined above using the micro-service architecture.&lt;/p>
&lt;p>The &lt;code>webserver&lt;/code> running the Go backend uses a few mounted folders plus an exposed port to let inbound communication reach the server. The &lt;code>monitor&lt;/code> uses 4 folders mounted from the host FS, which enable its core functionality (listening for files and zipping them to a different folder).&lt;/p>
&lt;h3 id="ansible">Ansible&lt;/h3>
&lt;p>Thanks to Docker Compose, it was relatively simple to deploy and run the service with Ansible, once the necessary packages and dependencies are installed on Ansible hosts. All it took was a simple Ansible Task using the docker_compose module:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-yml" data-lang="yml">- &lt;span class="k">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>Docker-Compose&lt;span class="w"> &lt;/span>UP&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">docker_compose&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">project_src&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>path_to_docker_compose_yml&lt;span class="w">
&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">build&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>yes&lt;span class="w">
&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>While testing the service, a few issues were discovered that could be considered bugs, but let&amp;rsquo;s call them features instead!&lt;/p>
&lt;h3 id="feature-1">Feature #1&lt;/h3>
&lt;p>Since the service lets users upload files, if a file is large enough, the processing may kick in before the upload has completed. In this case, the compressed file may be corrupted and impossible to recover after unzipping. To mitigate this to a certain extent, a 5 second processing delay has been added to the &lt;code>monitor_service.sh&lt;/code> script, in the hope that the upload finishes during those 5 seconds.&lt;/p>
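&lt;p>A fixed delay is only a heuristic. A slightly more defensive variant, sketched below in Python rather than in the bash of &lt;code>monitor_service.sh&lt;/code>, polls the file size and treats the upload as finished once the size has stopped changing. This is a different technique from the one the script uses, the function name and default values are hypothetical, and it can still be fooled by an upload that stalls for longer than the polling window.&lt;/p>

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=3, timeout=60.0):
    """Return True once `checks` consecutive polls see the same size as the poll before.

    Returns False if the size keeps changing until `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while deadline > time.monotonic():
        size = os.path.getsize(path)
        # Count consecutive unchanged observations; any change resets the counter.
        stable = stable + 1 if size == last_size else 0
        if stable >= checks:
            return True
        last_size = size
        time.sleep(interval)
    return False
```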
&lt;h3 id="feature-2">Feature #2&lt;/h3>
&lt;p>While creating the two Dockerfiles describing each component of the service, I wanted to take an extra step and create a non-root user, so that the main process of the service starts as a user which does not have full root access to everything. This worked well while developing and testing on a local system using manual execution via &lt;code>docker-compose up/down&lt;/code> commands. However, once Ansible was set up to run Docker Compose via the &lt;code>docker_compose&lt;/code> module, certain functionality broke due to file/folder permission issues. Basically the mounted folders belonged to root, whereas the running process was non-root, so it could not save uploaded files, for example. Further investigation is needed to solve this; until then, the Dockerfiles have been reverted to use root when starting the main processes.&lt;/p>
&lt;h2 id="conclusion">CONCLUSION&lt;/h2>
&lt;p>All in all, working on this project has been a great opportunity to practice such tools as Docker, Docker Compose and Ansible. While I have used Docker briefly before, I have never once used Ansible, and I learnt a great deal about it during this project. I can definitely see how it enables large organisations to streamline their processes when it comes to deploying and configuring various systems and services in their infrastructure. While this project is rather rudimentary, it gave me a good entry to this realm of IT.&lt;/p></description></item><item><title>Infrastructure as Code</title><link>https://flrnks.netlify.app/post/infra-as-code/</link><pubDate>Tue, 12 Nov 2019 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/infra-as-code/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>In this post I will briefly introduce different AWS services and show how to use Terraform to orchestrate and manage them. While the concept of the whole service is rather simple, its main use is enabling me to learn about this new emerging technology called Infrastructure-as-Code or IaC for short.&lt;/p>
&lt;h2 id="project-overview">Project overview&lt;/h2>
&lt;p>The main goal of this task is to deploy a server-less function that periodically queries the Github API to get a list of public repositories for a given organisation (e.g.: Google). The retrieved information should then be stored in a compressed CSV file in a specific S3 bucket, and a notification should be created for each new file saved to the bucket.&lt;/p>
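&lt;p>The storage step of this pipeline can be sketched with the Python standard library alone. The example below turns a list of repository dictionaries into gzip-compressed CSV bytes; the chosen columns (&lt;code>name&lt;/code>, &lt;code>html_url&lt;/code>, &lt;code>stargazers_count&lt;/code>) are fields of the Github API repository objects, but which columns the real Lambda keeps is an assumption, as is the function name.&lt;/p>

```python
import csv
import gzip
import io


def repos_to_gzipped_csv(repos, fields=("name", "html_url", "stargazers_count")):
    """Serialise a list of repository dicts into gzip-compressed CSV bytes."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=list(fields))
    writer.writeheader()
    for repo in repos:
        # Keep only the selected columns; missing values become empty cells.
        writer.writerow({field: repo.get(field, "") for field in fields})
    return gzip.compress(text.getvalue().encode("utf-8"))
```

&lt;p>The returned bytes could then be written to a temporary file or passed straight to an S3 upload call.&lt;/p>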
&lt;p>&lt;img src="arch.png" alt="Go concurrency implemented">&lt;/p>
&lt;p>The main AWS components of the solution are:&lt;/p>
&lt;ul>
&lt;li>Lambda function written in Python&lt;/li>
&lt;li>CW Event Rule to schedule the Lambda periodically&lt;/li>
&lt;li>S3 for storing data in a bucket&lt;/li>
&lt;li>SQS for queueing notifications from S3&lt;/li>
&lt;/ul>
&lt;h2 id="possibilities">Possibilities&lt;/h2>
&lt;p>Various methods exist for the creation and configuration of these necessary resources. The simplest one is to log in to the AWS Management Console and set up each component one by one via the GUI. This method, however, is slow, cumbersome and quite prone to errors.&lt;/p>
&lt;p>A better option can be to use the
&lt;a href="https://aws.amazon.com/tools/" target="_blank" rel="noopener">AWS SDK&lt;/a> for your favourite programming language. Several options exist, such as Java, Python, Go, Node.js, etc&amp;hellip; This option is less error-prone, but still quite cumbersome and slow.&lt;/p>
&lt;p>Perhaps one of the best options is to use Terraform, which is a popular Infrastructure as Code or IaC tool these days. It lets you define your infrastructure in a configuration language and has its own internal engine that talks to the AWS APIs to create the infrastructure you defined.&lt;/p>
&lt;h2 id="setup-procedure">Setup procedure&lt;/h2>
&lt;p>Before we can make use of Terraform to deploy our project on AWS, we need to set up credentials. This can be done by logging in to the AWS Management Console and going to the Identity and Access Management (IAM) section, which can provide the necessary Access Key and Secret value that you need to put into a file on disk. These credentials should be saved to &lt;code>~/.aws/credentials&lt;/code> as follows:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="o">[&lt;/span>default&lt;span class="o">]&lt;/span>
&lt;span class="nv">aws_access_key_id&lt;/span> &lt;span class="o">=&lt;/span> XXXXXXXXXXXX
&lt;span class="nv">aws_secret_access_key&lt;/span> &lt;span class="o">=&lt;/span> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This enables Terraform to make changes to your AWS infrastructure through API calls to AWS, provisioning resources according to your definitions in the .tf file. Once you have created the desired configuration, a complete infrastructure can be deployed as simply as below:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">$ ▶ ls -la
-rw-r--r-- &lt;span class="m">1&lt;/span> user group 4.9K Nov &lt;span class="m">21&lt;/span> 22:58 main.tf
$ ▶ terraform init
...
Terraform has been successfully initialized!
$ ▶ terraform apply
...
Plan: &lt;span class="m">13&lt;/span> to add, &lt;span class="m">0&lt;/span> to change, &lt;span class="m">2&lt;/span> to destroy.
Do you want to perform these actions?
Enter a value: yes
&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="project-building-blocks">Project building blocks&lt;/h2>
&lt;p>In this section I will go over each major component and explain what it is, what it does and how it is set up. First up is the storage layer, followed by the core logic implemented in Python.&lt;/p>
&lt;h3 id="aws-simple-storage-service">AWS Simple Storage Service&lt;/h3>
&lt;p>This is a basic building block which we use to store data generated by the Lambda function. Since Lambdas are by nature server-less, they do not have persistent storage attached which can be used to save data between two invocations of the function. If we need persistent storage we need to use S3. The necessary Terraform code is below:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-tf" data-lang="tf">&lt;span class="kr">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_s3_bucket&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;tf_aws_bucket&amp;#34;&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="na">bucket&lt;/span> = &lt;span class="s2">&amp;#34;tf-aws-bucket&amp;#34;&lt;/span>
&lt;span class="na">tags&lt;/span> = &lt;span class="p">{&lt;/span>
&lt;span class="na">Name&lt;/span> = &lt;span class="s2">&amp;#34;Bucket for Terraform project&amp;#34;&lt;/span>
&lt;span class="na">Environment&lt;/span> = &lt;span class="s2">&amp;#34;Dev&amp;#34;&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="na">force_destroy&lt;/span> = &lt;span class="s2">&amp;#34;true&amp;#34;&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This will create a bucket named &lt;code>tf-aws-bucket&lt;/code> which we can then use to store the results of our Lambda function. As an extra feature, we also configured notifications for this bucket: whenever a compressed file with the &lt;code>.gz&lt;/code> extension is created in the bucket, a notification is generated and sent to the SQS queue that is defined in the same Terraform file.&lt;/p>
&lt;h3 id="aws-lambda">AWS Lambda&lt;/h3>
&lt;p>AWS Lambda is a server-less technology which lets you create a bare function in the cloud and call it from various other services, without having to worry about setting up an environment where it will run. Different programming languages are supported, such as Python, Java, Go and NodeJS. Once you deploy your code, the function receives its input as arguments, just like any ordinary function, and you can give it permission to access and modify other resources in AWS, such as files stored in S3.&lt;/p>
&lt;p>This is exactly the use-case that was implemented in this project: a Lambda function that makes an API call to Github to download information, then stores it as a compressed CSV file in an S3 bucket. To define the target organisation and the bucket where the information is saved, the Lambda function expects two arguments in the function call:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="p">{&lt;/span>
&lt;span class="nt">&amp;#34;org_name&amp;#34;&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;twitter&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;span class="nt">&amp;#34;target_bucket&amp;#34;&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;repos_folder&amp;#34;&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This JSON input passed to the function is converted to a dictionary in Python, which can be tested for the presence of the keys needed for the correct functioning of the code:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="k">def&lt;/span> &lt;span class="nf">handler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">event&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">context&lt;/span>&lt;span class="p">):&lt;/span>
    &lt;span class="c1"># verify that both required keys are present in the event&lt;/span>
    &lt;span class="k">if&lt;/span> &lt;span class="s1">&amp;#39;org_name&amp;#39;&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">event&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">keys&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="ow">or&lt;/span> &lt;span class="s1">&amp;#39;target_bucket&amp;#39;&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">event&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">keys&lt;/span>&lt;span class="p">():&lt;/span>
        &lt;span class="k">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Missing &amp;#39;org_name&amp;#39; or &amp;#39;target_bucket&amp;#39; in request body (JSON)!&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The rest of the function&amp;rsquo;s code downloads the list of public repositories of the given organisation from the Github API and stores it in a temporary file that can be uploaded to S3, provided that the necessary permissions have been granted to this Lambda function:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="kn">import&lt;/span> &lt;span class="nn">boto3&lt;/span>
&lt;span class="n">s3&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">boto3&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">client&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;s3&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="n">s3&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">upload_file&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">path_to_local_file&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">target_bucket_name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">key_name&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>In order to enable access to S3 from Lambda, we have to define some IAM policies and roles. First we have to define a policy which states that any role it is attached to can access the S3 bucket:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-tf" data-lang="tf">&lt;span class="kr">data&lt;/span> &lt;span class="s2">&amp;#34;aws_iam_policy_document&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;s3_lambda_access&amp;#34;&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">statement&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="na">effect&lt;/span> = &lt;span class="s2">&amp;#34;Allow&amp;#34;&lt;/span>
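&lt;span class="na">resources&lt;/span> = &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;arn:aws:s3:::tf-aws-bucket&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;arn:aws:s3:::tf-aws-bucket/*&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>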
&lt;span class="na">actions&lt;/span> = &lt;span class="p">[&lt;/span>
&lt;span class="s2">&amp;#34;s3:GetObject&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;span class="s2">&amp;#34;s3:PutObject&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;span class="s2">&amp;#34;s3:ListBucket&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;span class="p">]&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="kr">
&lt;/span>&lt;span class="kr">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_iam_policy&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;s3_lambda_access&amp;#34;&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="na">name&lt;/span> = &lt;span class="s2">&amp;#34;s3_lambda_access&amp;#34;&lt;/span>
&lt;span class="na">policy&lt;/span> = &lt;span class="nb">data&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">aws_iam_policy_document&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">s3_lambda_access&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">json&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This policy is then attached to an IAM role which is allowed to be assumed by AWS Lambda:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-tf" data-lang="tf">&lt;span class="kr">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_iam_role_policy_attachment&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;s3_lambda_access&amp;#34;&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="na">role&lt;/span> = &lt;span class="nx">aws_iam_role&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">tf_aws_exercise_role&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">name&lt;/span>
&lt;span class="na">policy_arn&lt;/span> = &lt;span class="nx">aws_iam_policy&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">s3_lambda_access&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">arn&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="kr">
&lt;/span>&lt;span class="kr">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_iam_role&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;tf_aws_exercise_role&amp;#34;&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="na">name&lt;/span> = &lt;span class="s2">&amp;#34;tfExerciseRole&amp;#34;&lt;/span>
&lt;span class="na">description&lt;/span> = &lt;span class="s2">&amp;#34;Role that is allowed to be assumed by AWS Lambda, which will be taking all actions.&amp;#34;&lt;/span>
&lt;span class="na">tags&lt;/span> = &lt;span class="p">{&lt;/span>
&lt;span class="na">owner&lt;/span> = &lt;span class="s2">&amp;#34;tfExerciseBoss&amp;#34;&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="na">assume_role_policy&lt;/span> = &lt;span class="o">&amp;lt;&amp;lt;EOF&lt;/span>&lt;span class="s">
&lt;/span>&lt;span class="s">{
&lt;/span>&lt;span class="s"> &amp;#34;Version&amp;#34;: &amp;#34;2012-10-17&amp;#34;,
&lt;/span>&lt;span class="s"> &amp;#34;Statement&amp;#34;: [
&lt;/span>&lt;span class="s"> {
&lt;/span>&lt;span class="s"> &amp;#34;Action&amp;#34;: &amp;#34;sts:AssumeRole&amp;#34;,
&lt;/span>&lt;span class="s"> &amp;#34;Principal&amp;#34;: {
&lt;/span>&lt;span class="s"> &amp;#34;Service&amp;#34;: &amp;#34;lambda.amazonaws.com&amp;#34;
&lt;/span>&lt;span class="s"> },
&lt;/span>&lt;span class="s"> &amp;#34;Effect&amp;#34;: &amp;#34;Allow&amp;#34;
&lt;/span>&lt;span class="s"> }
&lt;/span>&lt;span class="s"> ]
&lt;/span>&lt;span class="s">}
&lt;/span>&lt;span class="s">&lt;/span>&lt;span class="o">EOF&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="aws-cloudwatch-events">AWS CloudWatch Events&lt;/h3>
&lt;p>This component is responsible for periodically invoking our Lambda function, with the required arguments passed in JSON format. It was also configured via Terraform, but for the sake of simplicity, below is a screenshot taken from the AWS Management Console showing the created CloudWatch Events rule as configured:&lt;/p>
&lt;p>&lt;img src="cwe.png" alt="Cloudwatch Events Rule">&lt;/p>
&lt;p>The screenshot shows that the rule is configured to invoke the target Lambda function every 2 minutes.&lt;/p>
&lt;h3 id="results">Results&lt;/h3>
&lt;p>In summary, it took me a while to get the hang of the Infrastructure as Code concept and apply it while working with Terraform on AWS, but I can definitely see how it can benefit a bigger organisation that wants its cloud infrastructure to be stable and maintainable. IaC tools such as Terraform let developers define their infrastructure as code and check it in to version control for repeatable and more predictable deployment procedures. Now that I have this working project, I can run a simple &lt;code>terraform apply&lt;/code> to bring my service to life with all required components and permissions correctly set up in seconds, while also being able to quickly destroy it again if I choose to. This gives flexibility and greater ease of development that can speed up projects in the cloud.&lt;/p></description></item><item><title>Performance tuning GO</title><link>https://flrnks.netlify.app/post/go-performance/</link><pubDate>Mon, 11 Nov 2019 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/go-performance/</guid><description>&lt;h3 id="introduction">Introduction&lt;/h3>
&lt;p>This post is a short story of how I optimized the execution of a simple program, written for a coding challenge on the site &lt;code>runcode.ninja&lt;/code>.&lt;/p>
&lt;p>Short description of the task:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">There is a text file which is given as argument to your program. This text
file contains lines, each of which is an encoded English word. Recover them
and print them out to the standard output line by line. Hint: the UNIX
built-in dictionary may come in handy at &lt;span class="s2">&amp;#34;/usr/share/dict/american-english&amp;#34;&lt;/span>.
&lt;/code>&lt;/pre>&lt;/div>&lt;p>To attack the problem, I used the Go language to write a program that used the built-in &lt;code>encoding/base64&lt;/code> and &lt;code>os/exec&lt;/code> packages to decode the lines and to call grep to search the file-based dictionary. It was not very difficult to figure out that the encoding in use was base64.&lt;/p>
&lt;p>However, to make each line valid base64, either a single &lt;code>=&lt;/code> or a double &lt;code>==&lt;/code> padding character had to be appended to it. The code below takes care of appending these extra characters to the end of each line.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="kd">func&lt;/span> &lt;span class="nf">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">encodedStr&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">decoded&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">base64&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StdEncoding&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">DecodeString&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">encodedStr&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">encodedStr&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="s">&amp;#34;=&amp;#34;&lt;/span>
&lt;span class="nx">decoded&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">base64&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StdEncoding&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">DecodeString&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">encodedStr&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="k">return&lt;/span> &lt;span class="nb">string&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">decoded&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>To test whether the result of a decode operation is a valid word, I wrote a helper function that takes a string as an argument and calls grep via &lt;code>os/exec&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="kd">func&lt;/span> &lt;span class="nf">dictLookup&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">bool&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">dictLocation&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="s">&amp;#34;/usr/share/dict/american-english&amp;#34;&lt;/span>
&lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">exec&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Command&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;grep&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;-w&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">dictLocation&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nf">Output&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="k">return&lt;/span> &lt;span class="kc">false&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="k">return&lt;/span> &lt;span class="kc">true&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Finally, putting these pieces together, there is a function which reads the text file, iterates over its lines, and repeatedly calls decode followed by a dictionary lookup until a valid word comes out, which it then prints to standard output. Below is the sample code.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="nx">scanner&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">bufio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewScanner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">file&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="kd">var&lt;/span> &lt;span class="nx">line&lt;/span> &lt;span class="kt">string&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="nx">scanner&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scan&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">line&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nf">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scanner&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Text&lt;/span>&lt;span class="p">())&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="p">!(&lt;/span>&lt;span class="nf">dictLookup&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">line&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nf">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Println&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="initial-results">Initial results&lt;/h3>
&lt;p>The sample code worked well enough and running it on the test / sample data provided yielded correct output, so all seemed to be fine!&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">flrnks@t460:~/drop_the_bass &lt;span class="o">(&lt;/span>master&lt;span class="o">)&lt;/span> ▶ go run main.go input.txt
interpretation
sanctioned
lawn
electives
unifying
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The program did not seem to run very quickly, even though it only had to decode 5 lines, so I decided to test it on both of my laptops. One is a ThinkPad T460 with an i5 CPU and 16GB of RAM, the other a 15&amp;rdquo; MacBook Pro with an i9 CPU and 32GB of RAM. I initially developed the code on the ThinkPad, and was quite surprised by how much slower it ran on the MacBook. I would have expected the opposite, since the ThinkPad is already around 3-4 years old and has a less powerful CPU. Initial test results from both machines:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash"> &lt;span class="o">[&lt;/span>MacBook&lt;span class="o">]&lt;/span> &lt;span class="o">[&lt;/span>ThinkPad&lt;span class="o">]&lt;/span>
interpretation 285.76ms 32.61ms
lawn 425.63ms 59.31ms
unifying 1.10s 93.60ms
electives 1.20s 91.10ms
sanctioned 6.18s 141.28ms
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Overall the MacBook took on average 9 seconds to finish, while the ThinkPad took around 0.5 to 1 second to finish. This was not normal, so I had to investigate! 👀 😄&lt;/p>
&lt;h3 id="performance-tuning-10">Performance Tuning 1.0&lt;/h3>
&lt;p>Seeing the difference in performance, I was curious what could cause such a slowdown on the MacBook. My first idea was to add concurrency to the processing: instead of being handled sequentially, the lines are distributed over a channel to a pool of workers that process them in parallel, and each worker sends its result over a second channel back to the main routine, which waits for the results.&lt;/p>
&lt;p>&lt;img src="concurrent-go.png" alt="Go concurrency implemented">&lt;/p>
&lt;p>The figure above shows the basic idea of this concurrent processing model, and the code snippet below shows the most important parts of the implementation:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="c1">// define the channels for distributing work and collecting the results
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="nx">jobs&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nb">make&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">chan&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="nx">results&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nb">make&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">chan&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="c1">// use the waitgroup for syncing up between the workers
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="nx">wg&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nb">new&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sync&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">WaitGroup&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="c1">// start up some workers that will block and wait
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="k">for&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="o">++&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">wg&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">go&lt;/span> &lt;span class="nf">workerFunc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">jobs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">results&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">wg&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="c1">// iterate over the file line by line and queue them up in the jobs channel
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="k">go&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">scanner&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">bufio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewScanner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">file&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="nx">scanner&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scan&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">jobs&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="nx">scanner&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Text&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="nb">close&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">jobs&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}()&lt;/span>
&lt;span class="c1">// In parallel routine wait for WG to finish and close channel for results
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="k">go&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">wg&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Wait&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="nb">close&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">results&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}()&lt;/span>
&lt;span class="c1">// Print out the results from the results channel.
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="k">for&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">results&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Println&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">v&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This parallel processing noticeably improved the performance, but still did not eliminate the substantial difference between the two platforms.&lt;/p>
&lt;p>&lt;em>Note&lt;/em>: with the concurrent model the words appear on the standard output in a nondeterministic order, so the submission to the grading system might fail.&lt;/p>
&lt;h3 id="performance-tuning-20">Performance Tuning 2.0&lt;/h3>
&lt;p>Next, I looked around on the internet (Stack Overflow in particular), where I got the idea to stop calling grep via the &lt;code>os/exec&lt;/code> package and instead read the contents of the dictionary into memory and perform the lookups there. Essentially, this trades a larger memory footprint for speed. So I created a global dictionary of type &lt;code>map[string]bool&lt;/code>, which is loaded once at the start of the program and used as often as needed by the various goroutines. This is safe because, once the map is loaded, the worker routines only ever read from it, so there is no issue with concurrent access to the global map variable.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="kd">var&lt;/span> &lt;span class="nx">wordDict&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nb">make&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">map&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="kt">bool&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="kd">func&lt;/span> &lt;span class="nf">loadDictionary&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">dict&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;/usr/share/dict/american-english&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">defer&lt;/span> &lt;span class="nx">dict&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Close&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="nx">ds&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">bufio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewScanner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">dict&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="nx">ds&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scan&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">wordDict&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">ds&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Text&lt;/span>&lt;span class="p">()]&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="kc">true&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This way, the dictionary lookups no longer depend on spawning an external process or on the I/O system of the particular OS the program is running on. Executing the same timing test again yielded much improved results. It became clear that the issue on the MacBook was the slow execution of the external &lt;code>grep&lt;/code> call from the Go program. Why this is the case I am not sure, but the results speak for themselves:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash"> &lt;span class="o">[&lt;/span>MacBook&lt;span class="o">]&lt;/span> &lt;span class="o">[&lt;/span>ThinkPad&lt;span class="o">]&lt;/span>
interpretation 54.691µs 24.17µs
lawn 65.922µs 9.176µs
unifying 155.726µs 71.785µs
electives 113.074µs 47.478µs
sanctioned 286.94µs 464.20µs
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Somehow the older and less powerful ThinkPad is still faster for four of the five words, but at least the difference is not so substantial anymore&amp;hellip; 😌&lt;/p>
&lt;h3 id="results">Results&lt;/h3>
&lt;p>The picture below briefly summarizes the observed performance, measured as execution time. To mitigate transient effects on execution time, 10 measurements were taken for each variant.&lt;/p>
&lt;p>&lt;img src="perf.png" alt="Performance measurements">&lt;/p>
&lt;p>Explanation for the different variants (Seq vs. Con and Grep vs. Map):&lt;/p>
&lt;ul>
&lt;li>&lt;code>Seq&lt;/code>: each line is decoded one after the other in sequence.&lt;/li>
&lt;li>&lt;code>Con&lt;/code>: each line is processed concurrently on a pool of workers.&lt;/li>
&lt;li>&lt;code>Grep&lt;/code>: dictionary lookup done via exec call to grep.&lt;/li>
&lt;li>&lt;code>Map&lt;/code>: dictionary is loaded into a string map in memory.&lt;/li>
&lt;/ul>
&lt;p>Quite frankly, the results speak for themselves. The most notable thing is that, compared to the most basic version (Seq-Grep), the biggest improvement is achieved not by using concurrency, but by eliminating the repeated calls to Grep.&lt;/p>
&lt;p>This is not to say that enabling concurrency had no impact on the execution time: on average it decreased from 9 to 6 seconds, which is already quite good!&lt;/p>
&lt;p>However, I/O latency seems to cost more performance than the lack of parallel processing, at least at the scale of input used in this example. The difference was less pronounced when the tests were run with a file that had 500 lines of encoded words (instead of just 5).&lt;/p>
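&lt;p>Averaging over repeated runs, as was done for the measurements in this section, can be sketched roughly as follows. This is only an illustration under assumptions: &lt;code>averageRuntime&lt;/code> is a hypothetical helper, and the sleeping closure is a placeholder for one full run of the program variant being measured.&lt;/p>

```go
package main

import (
	"fmt"
	"time"
)

// averageRuntime calls work `runs` times and returns the mean wall-clock
// duration of a single call. It returns 0 if runs is not positive.
func averageRuntime(runs int, work func()) time.Duration {
	if runs > 0 {
		var total time.Duration
		for i := runs; i > 0; i-- {
			start := time.Now()
			work()
			total += time.Since(start)
		}
		return total / time.Duration(runs)
	}
	return 0
}

func main() {
	avg := averageRuntime(10, func() {
		// Placeholder for one full run of the variant under test.
		time.Sleep(2 * time.Millisecond)
	})
	fmt.Println("average per run:", avg)
}
```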
&lt;h3 id="conclusion">Conclusion&lt;/h3>
&lt;p>Never underestimate the power of I/O delay and the effect it can have on your program. Even on a very powerful machine, it can bog your performance down considerably! Implementing proper concurrent processing wherever possible may improve your program&amp;rsquo;s performance further.&lt;/p></description></item></channel></rss>