My first Scala app

How I wrote my first reusable Scala module and then used it to create a Scala CLI app for interacting with AWS services.

Florian Akos Szabo

Last updated on Apr 3, 2024


Motivation

In this post I want to write about a personal project I started some time ago with the goal of learning more about Scala. At work, we use Scala quite often to run big data jobs on AWS using Apache Spark. I had never used Scala before joining my current team, and its syntax was very alien to me. Recently, however, I had the chance to work on a task in which I had to modify a component to fetch a secret value at runtime from AWS Secrets Manager instead of HashiCorp’s Vault. To my surprise, I completed this work without much of a struggle with Scala, and afterwards I became eager to learn more. Based on a colleague’s recommendation, I started reading Cay S. Horstmann’s book Scala for the Impatient (2nd edition). I’m making slow but steady progress.

Shortly after starting the book, I decided to begin a small project so that I could practice Scala by doing.

The Idea

The idea, like many before it, came while fixing a bug at work. The bug was in a component written in Scala that interacts with the AWS Athena service. It had some neatly written functionality for running queries and waiting for their completion before fetching the results. I thought I would try to write something similar for AWS Systems Manager (SSM). It is a service with a few different components, so I decided to focus on Automation Documents, which carry out actions in an automated fashion. For example, the AWS-provided SSM document AWS-StartEC2Instance can start any EC2 instance when invoked with the two input parameters below:

InstanceId: to specify which EC2 instance you want to start
AutomationAssumeRole: to specify an IAM role which can be assumed by SSM to carry out this action

I realized quite early on that, if I wanted to implement this capability in my Scala app, it needed to be quite generic, so that it could support any Automation Document with an arbitrary number of input parameters. I also wanted it to wait for the execution to finish and report whether it failed or succeeded. Here are the final requirements I came up with:

create 2 separate git repos for:
- a module that’s home for the AWS utility/helper classes
- a module for implementing the CLI App
support extra AWS services such as KMS, Secrets Manager and CloudFormation
utilize localstack for integration testing (when possible)

Initial setup

Firstly, I had to figure out which third-party packages I needed to implement the app according to these simple requirements. To interact with AWS from Scala code, I decided to go with v2 of the official AWS SDK for Java. To implement the CLI app, I mainly relied on the Java package picocli, which was a bit less straightforward to use from Scala, but it eventually proved to be a good choice.

Secondly, I have to admit that creating a reusable Scala package from scratch was a rather non-trivial task for me. Most of my programming experience comes from non-JVM environments, so that’s probably no surprise. I initially started out with sbt for build and dependency management, but I ran into issues that I couldn’t solve on my own, so I decided to swap it for Maven, which was a bit more familiar to me.

Finally, separating the project into two distinct git repositories allowed me to practice versioning and dependency management, which I also found very useful:

AWS Scala Utils: https://github.com/florianakos/aws-utils-scala
AWS SSM CLI App: https://github.com/florianakos/aws-ssm-scala-app

The utils module

Creating the utils module, which serves as a kind of glue between the Scala CLI app and AWS Systems Manager, was actually not as difficult as I had expected. This is mostly thanks to the example I had seen at work in a similar project for the AWS Athena service.

The core SSM functionality of the utils module is captured in the functions below:

import scala.concurrent.{Future, blocking}
// Assumption: the global ExecutionContext (the ForkJoinPool worker threads in the run logs are consistent with it)
import scala.concurrent.ExecutionContext.Implicits.global
import software.amazon.awssdk.services.ssm.model.{AutomationExecutionStatus, GetAutomationExecutionRequest, StartAutomationExecutionRequest}

// ssmClient, logger and SsmAutomationExecutionException are defined elsewhere in the utils module.

// Starts an Automation Document and returns the ID of the new execution.
private def executeAutomation(documentName: String, parameters: java.util.Map[String, java.util.List[String]]): Future[String] = {
  val startAutomationRequest = StartAutomationExecutionRequest.builder()
    .documentName(documentName)
    .parameters(parameters)
    .build()
  Future {
    val executionResponse = ssmClient.startAutomationExecution(startAutomationRequest)
    logger.info(s"Execution id: ${executionResponse.automationExecutionId()}")
    executionResponse.automationExecutionId()
  }
}

// Polls the execution until it succeeds; the Future fails if it is cancelled, fails or times out.
private def waitForAutomationToFinish(executionId: String): Future[String] = {
  val getExecutionRequest = GetAutomationExecutionRequest.builder().automationExecutionId(executionId).build()
  Future {
    var status = AutomationExecutionStatus.IN_PROGRESS
    var retries = 0
    while (status != AutomationExecutionStatus.SUCCESS) {
      val execution = ssmClient.getAutomationExecution(getExecutionRequest).automationExecution()
      status = execution.automationExecutionStatus()
      status match {
        case AutomationExecutionStatus.CANCELLED | AutomationExecutionStatus.FAILED | AutomationExecutionStatus.TIMED_OUT =>
          throw SsmAutomationExecutionException(status, execution.failureMessage())
        case AutomationExecutionStatus.SUCCESS =>
          logger.info(s"Execution finished with final status: [$status]")
        case other =>
          logger.info(s"Current status: [$other], retry counter: #$retries")
          // Back off to avoid rate-limiting: 2.5 s, then 5 s, then 15 s between checks
          blocking(Thread.sleep(if (retries <= 3) 2500 else if (retries <= 10) 5000 else 15000))
      }
      retries += 1
    }
    executionId
  }
}

The first function, executeAutomation, builds an execution request, submits it to AWS and returns the execution ID. This ID can be passed to waitForAutomationToFinish, which periodically polls AWS until the execution succeeds, or fails the returned Future if the execution is cancelled, fails or times out. Between subsequent API requests it waits for an increasing delay (2.5, then 5, then 15 seconds) to avoid being rate-limited by excessive polling.
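The backoff logic in waitForAutomationToFinish can be pulled out and tried without an AWS account. Below is a minimal sketch under stated assumptions: the Status type, PollingSketch, pollUntilDone and totalSleepBefore are hypothetical stand-ins, not the SDK's AutomationExecutionStatus or the utils module's API. Only the 2.5/5/15 second schedule is taken from the code above.

```scala
import scala.annotation.tailrec
import scala.collection.mutable.ListBuffer

// Hypothetical sketch of the poll-with-backoff loop from waitForAutomationToFinish,
// with the SSM call replaced by a plain function so that it runs without AWS.
object PollingSketch {
  // Simplified stand-ins for AutomationExecutionStatus (not the SDK enum)
  sealed trait Status
  case object InProgress extends Status
  case object Succeeded extends Status
  final case class Failed(reason: String) extends Status

  // Same schedule as the original: 2.5 s after checks 0-3, 5 s after checks 4-10, then 15 s
  def delayMillis(retries: Int): Long =
    if (retries <= 3) 2500L else if (retries <= 10) 5000L else 15000L

  // Total time slept before the given check number is made
  def totalSleepBefore(check: Int): Long = (0 until check).map(delayMillis).sum

  // Calls `check` until it reports Succeeded (Right(number of checks made))
  // or Failed (Left(reason)); `sleep` is injectable so tests run instantly.
  def pollUntilDone(check: () => Status, sleep: Long => Unit = ms => Thread.sleep(ms)): Either[String, Int] = {
    @tailrec
    def loop(retries: Int): Either[String, Int] = check() match {
      case Succeeded      => Right(retries + 1)
      case Failed(reason) => Left(reason)
      case InProgress =>
        sleep(delayMillis(retries))
        loop(retries + 1)
    }
    loop(0)
  }

  def main(args: Array[String]): Unit = {
    val statuses = Iterator[Status](InProgress, InProgress, Succeeded)
    val slept = ListBuffer.empty[Long]
    println(pollUntilDone(() => statuses.next(), ms => { slept += ms; () })) // Right(3)
    println(slept.toList)                                                  // List(2500, 2500)
    println(totalSleepBefore(21))                                          // 195000
  }
}
```

As a sanity check, this schedule sleeps 4 × 2.5 s + 7 × 5 s + 10 × 15 s = 195 s before check #21, which fits the roughly 198 seconds between check #0 and check #21 in the run log later in this post (the difference presumably being API latency). Injecting the sleep function is what makes the loop testable without waiting in real time.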

Testing the utils module

Once I had the core functionality ready, I wanted to write integration tests to ensure it worked as expected. Instead of using hard-coded AWS credentials or an AWS profile for a real account, I wanted to use LocalStack, which mocks the AWS APIs locally so that you can interact with them. For this reason, I slightly tweaked the SsmAutomationHelper class to accept an optional second argument (an Option[String]) that overrides the API endpoint when building the SSM client:

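import java.net.URI
import com.typesafe.scalalogging.LazyLogging
import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, ProfileCredentialsProvider, StaticCredentialsProvider}
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.ssm.SsmClient

class SsmAutomationHelper(profile: String, apiEndpoint: Option[String]) extends LazyLogging {
  private val ssmClient: SsmClient = apiEndpoint match {
    // Real AWS: credentials come from the named profile in ~/.aws
    case None => SsmClient.builder()
      .credentialsProvider(ProfileCredentialsProvider.create(profile))
      .region(Region.EU_WEST_1)
      .build()
    // LocalStack: dummy static credentials and an overridden endpoint
    case Some(localstackEndpoint) => SsmClient.builder()
      .credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials.create("foo", "bar")))
      .endpointOverride(URI.create(localstackEndpoint))
      .region(Region.EU_WEST_1) // the SDK still needs a region, even with an endpoint override
      .build()
  }
}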

This allowed me to pass http://localhost:4566 when running the integration tests against LocalStack, so that the API calls were directed to the mocked endpoints. Previously, each mocked service had its own dedicated port, but thanks to a recent change in LocalStack, all AWS services are now served on a single port, which they call the edge port.
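One way to decide between LocalStack and real AWS at runtime is to read the optional endpoint from the environment. The sketch below is hypothetical: the EndpointSketch object and the SSM_ENDPOINT_OVERRIDE variable name are assumptions, not part of the project. It only shows how a value like http://localhost:4566 could become the Option[String] that SsmAutomationHelper accepts.

```scala
import java.net.URI

// Hypothetical helper: turns an optional environment variable into the
// Option[String] endpoint accepted by SsmAutomationHelper.
object EndpointSketch {
  // Hypothetical variable name, not taken from the original project
  val EndpointVar = "SSM_ENDPOINT_OVERRIDE"

  // None -> talk to real AWS; Some(url) -> send requests to that endpoint (e.g. LocalStack).
  // A set-but-malformed value fails loudly instead of silently falling back to real AWS.
  def resolveEndpoint(env: Map[String, String]): Option[String] =
    env.get(EndpointVar).map(_.trim).filter(_.nonEmpty).map { url =>
      val uri = new URI(url) // throws URISyntaxException on invalid syntax
      require(uri.getScheme == "http" || uri.getScheme == "https", s"$EndpointVar must be an http(s) URL, got: $url")
      require(uri.getHost != null, s"$EndpointVar has no host: $url")
      url
    }

  def main(args: Array[String]): Unit =
    resolveEndpoint(sys.env) match {
      case Some(endpoint) => println(s"Using endpoint override: $endpoint")
      case None           => println("No override: using the real AWS endpoint")
    }
}
```

With this in place, a test setup could call new SsmAutomationHelper(profile, EndpointSketch.resolveEndpoint(sys.env)). Failing on a malformed value is a deliberate choice here: silently falling back to None would send test traffic to a real account.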

According to the documentation, SSM is supported in LocalStack; however, I found out that running Automation Documents is a feature that is still missing. As a result, I had to run the integration tests against a real AWS account that I had set up for such scenarios. I was okay with this, since AWS provides plenty of built-in Automation Documents that I could safely use for this purpose.

Eventually I decided to use AWS-StartEC2Instance and AWS-StopEC2Instance in the tests, which only required me to set up a dummy EC2 instance as the target of these requests. I also added a special Tag to these integration tests so that they are excluded when running mvn test, but can still be run manually whenever necessary.

CLI App implementation

After running the tests, I was confident that the AWS utils worked correctly, so I started putting together the CLI app. For this, I searched the web for a third-party argument-parsing package and found that it’s not as simple as using Python’s argparse. I eventually settled on picocli, which is written in Java but can also be used from Scala via its annotations:

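import java.util
import java.util.concurrent.Callable
import com.typesafe.scalalogging.LazyLogging
// picocli's Option annotation shadows scala.Option within this file
import picocli.CommandLine.{Command, Option, Parameters}

@Command(name = "SsmHelper", version = Array("v0.0.1"), mixinStandardHelpOptions = true, description = Array("CLI app for running automation documents in AWS SSM"))
class SsmCliParser extends Callable[Unit] with LazyLogging {

  // The fixed flag: name of the Automation Document to run
  @Option(names = Array("-D", "--document"), description = Array("Name of the SSM Automation document to execute"))
  private var documentName: String = ""

  // All remaining positional Key=Value arguments; picocli sets this field via reflection, so it is a var
  @Parameters(index = "0..*", arity = "0..*", paramLabel = "<param1=val1> <param2=val2> ...", description = Array("Key=Value parameters to use as Input Params"))
  private var parameters: util.ArrayList[String] = new util.ArrayList[String]()

  [...]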

Following the original idea, there had to be one fixed CLI flag (--document) for the name of the Automation Document, followed by a variable number of positional arguments specifying the input parameters required by that document. picocli supports this workflow via the @Option and @Parameters annotations.

The only thing left was a custom function to transform the input parameters. picocli delivers them as an ArrayList, [param1=val1, param2=val2, ...], which has to be turned into a Map, {param1 -> [val1], param2 -> [val2]}, by splitting each string on the = character, because the AWS SDK for SSM expects the parameters as a Map from names to lists of values. After some iterations I ended up with the function below:

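import java.util
import scala.jdk.CollectionConverters._ // Scala 2.13+; on 2.12 use scala.collection.JavaConverters._

// Turns ["k1=v1", "k2=v2", "k1=v3"] into {k1 -> [v1, v3], k2 -> [v2]}
private def process(params: util.ArrayList[String]): util.Map[String, util.List[String]] = {
  val args = if (params == null) Nil else params.asScala.toList // null when picocli matched nothing
  args
    .map(_.split("=", 2))                               // split on the first '=' only, so values may contain '='
    .collect { case Array(key, value) => key -> value } // entries without '=' are skipped
    .groupBy(_._1)
    .map { case (key, pairs) => key -> pairs.map(_._2).asJava }
    .asJava
}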
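Because process works on java.util types, it is a little awkward to experiment with. The hypothetical ParamParsingSketch below (not part of the project) applies the same split, collect and groupBy steps to an immutable List, so the behaviour, including the edge cases, is easy to check. The instance ID in main is an illustrative value.

```scala
// Pure-Scala sketch of the Key=Value transformation, using immutable collections
// so it can be run without picocli or the AWS SDK.
object ParamParsingSketch {
  def parse(args: List[String]): Map[String, List[String]] =
    args
      .map(_.split("=", 2))                                               // split on the first '=' only
      .collect { case Array(key, value) if key.nonEmpty => key -> value } // drop entries without a key or '='
      .groupBy { case (key, _) => key }                                  // repeated keys collect all their values
      .map { case (key, pairs) => key -> pairs.map { case (_, value) => value } }

  def main(args: Array[String]): Unit = {
    val parsed = parse(List("InstanceId=i-0123456789abcdef0", "Note=a=b", "Tag=x", "Tag=y", "malformed"))
    // Prints: InstanceId -> List(i-0123456789abcdef0), Note -> List(a=b), Tag -> List(x, y)
    parsed.toList.sortBy(_._1).foreach { case (key, values) => println(s"$key -> $values") }
  }
}
```

Grouping by key means a parameter given twice ends up as a two-element list, which matches the Map-of-lists shape the SDK expects for multi-value parameters.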

Finally, I wrote the call method below, which creates an SsmAutomationHelper from the utils module and passes it the two values provided by picocli, so that it runs the requested Automation Document; it then blocks until the result is available using Scala’s Await:

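import com.typesafe.config.ConfigFactory
import scala.concurrent.Await
import scala.concurrent.duration._

// Invoked by picocli once the command-line arguments have been parsed
def call(): Unit = {
  val conf = ConfigFactory.load()
  val inputParams = process(parameters)
  // Blocks until the document finishes; throws a TimeoutException after 10 minutes
  Await.result(SsmAutomationHelper.newInstance(conf).runDocumentWithParameters(documentName, inputParams), 10.minutes)
}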
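Await.result does more than wait: if the Future failed, it rethrows that failure (for example the SsmAutomationExecutionException thrown by waitForAutomationToFinish), and if the timeout elapses first it throws a TimeoutException. The sketch below is a hypothetical extension, not something the app does: exitCodeFor and the exit codes 0, 1 and 2 are assumptions, chosen to show how these three outcomes could be told apart.

```scala
import scala.concurrent.{Await, Future, TimeoutException}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object AwaitSketch {
  // Hypothetical helper: block on a Future and map its outcome to an exit code.
  def exitCodeFor[A](work: Future[A], timeout: FiniteDuration): Int =
    Try(Await.result(work, timeout)) match {
      case Success(_)                   => 0 // completed in time
      case Failure(_: TimeoutException) => 2 // still running when the timeout elapsed
      case Failure(_)                   => 1 // the Future itself failed; Await rethrew its exception
    }

  def main(args: Array[String]): Unit = {
    println(exitCodeFor(Future("execution-id"), 1.second))                            // 0
    println(exitCodeFor(Future(throw new IllegalStateException("Failed")), 1.second)) // 1
    println(exitCodeFor(Future.never, 100.millis))                                    // 2
  }
}
```

Note that a timeout only stops the waiting on the client side; the Automation execution itself keeps running in AWS.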

Packaging the CLI app

At this point the CLI app was ready and I wanted to run it to see how it worked. Before I could run it, I had to figure out how to package it into a fat JAR containing all its dependencies, so that it could be invoked directly with CLI arguments. After googling around a bit, I quickly found the spring-boot-maven-plugin, whose repackage goal was just what I needed:

Repackages existing JAR and WAR archives so that they can be executed from the command line using java -jar. With layout=NONE can also be used simply to package a JAR with nested dependencies (and no main class, so not executable).

I only had to add the below lines to my project’s pom.xml:

<plugin>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-maven-plugin</artifactId>
  <version>2.3.2.RELEASE</version>
  <configuration>
      <layout>JAR</layout>
  </configuration>
  <executions>
      <execution>
          <goals>
            <goal>repackage</goal>
          </goals>
      </execution>
  </executions>
</plugin>

Next, I just had to run the mvn package command, which invokes the plugin to build the fat JAR.

Running the CLI app

Once the JAR is available, it can be run via the java -jar ... command with extra arguments to execute any Automation Document, such as AWS-StartEC2Instance:

$ ▶ java -jar ./target/scala-cli-app-1.0.0.jar --document=AWS-StartEC2Instance InstanceId=i-0ed4574c5ba94c877 AutomationAssumeRole=arn:aws:iam::{{global:ACCOUNT_ID}}:role/AutomationServiceRole

15:24:41.998 [main] INFO  c.f.utils.ssm.SsmAutomationHelper :: Going to kick off SSM orchestration document: AWS-StartEC2Instance
15:24:42.773 [ForkJoinPool-1-worker-29] INFO  c.f.utils.ssm.SsmAutomationHelper :: Execution id: <...>
15:24:42.882 [ForkJoinPool-1-worker-11] INFO  c.f.utils.ssm.SsmAutomationHelper :: Current status: [InProgress], retry counter: #0
[...]
15:28:01.226 [ForkJoinPool-1-worker-11] INFO  c.f.utils.ssm.SsmAutomationHelper :: Current status: [InProgress], retry counter: #21
15:28:16.442 [ForkJoinPool-1-worker-11] INFO  c.f.utils.ssm.SsmAutomationHelper :: Execution finished with final status: [Success]
15:28:16.444 [main] INFO  com.flrnks.app.SsmCliParser :: SSM execution run took 215 seconds

Seems to be working quite well!

Bonus: running in a container

I thought I would take this one step further and package the JAR into a Java-based Docker image. This would allow me to forget about the syntax of the java command I previously used to run the app; instead, I can hide it in a very minimal Dockerfile:

FROM openjdk:8-jdk-alpine
MAINTAINER flrnks <flrnks@flrnks.netlify.com>
ADD target/scala-cli-app-1.0.0.jar /usr/share/backend/app.jar
ENTRYPOINT [ "/usr/bin/java", "-jar", "/usr/share/backend/app.jar"]

The mvn package command saves the fat JAR in the target subdirectory, so one can put this Dockerfile in the project’s root and build the image manually with docker build -t ssmcli . (the trailing dot is the build context). This creates an image called ssmcli without issues; however, I’ve also found an awesome plugin by Spotify called dockerfile-maven-plugin, which can automagically turn this Dockerfile into an image based on the plugin configuration:

<plugin>
  <groupId>com.spotify</groupId>
  <artifactId>dockerfile-maven-plugin</artifactId>
  <version>1.4.10</version>
  <executions>
    <execution>
      <id>default</id>
      <goals>
        <goal>build</goal>
      </goals>
      <configuration>
        <repository>flrnks/ssmcli</repository>
        <tag>latest</tag>
      </configuration>
    </execution>
  </executions>
</plugin>

This plugin hooks into Maven’s package phase, so running mvn package now also creates the Docker image automatically:

[INFO] --- spring-boot-maven-plugin:2.3.2.RELEASE:repackage (default) @ scala-cli-app ---
[INFO] Layout: JAR
[INFO] Replacing main artifact with repackaged archive
[INFO] 
[INFO] --- dockerfile-maven-plugin:1.4.10:build (default) @ scala-cli-app ---
[INFO] dockerfile: null
[INFO] contextDirectory: /Users/flszabo/Desktop/personal-wrkspc/scala/scala-cli-app
[INFO] Building Docker context /Users/flszabo/Desktop/personal-wrkspc/scala/scala-cli-app
[INFO] Path(dockerfile): null
[INFO] Path(contextDirectory): /Users/flszabo/Desktop/personal-wrkspc/scala/scala-cli-app
[INFO] 
[INFO] Image will be built as flrnks/ssmcli:latest
[INFO] Step 1/4 : FROM openjdk:8-jdk-alpine
[INFO] Pulling from library/openjdk
[INFO] Digest: sha256:94792824df2df33402f201713f932b58cb9de94a0cd524164a0f2283343547b3
[INFO] Status: Image is up to date for openjdk:8-jdk-alpine
[INFO]  ---> a3562aa0b991
[INFO] Step 2/4 : MAINTAINER flrnks <flrnks@flrnks.netlify.com>
[INFO]  ---> Using cache
[INFO]  ---> efcc673b4f35
[INFO] Step 3/4 : ADD target/scala-cli-app-1.0.0.jar /usr/share/backend/app.jar
[INFO]  ---> 8b2cf76f03c2
[INFO] Step 4/4 : ENTRYPOINT [ "/usr/bin/java", "-jar", "/usr/share/backend/app.jar"]
[INFO]  ---> Running in c9633237f9fa
[INFO] Removing intermediate container c9633237f9fa
[INFO]  ---> 6db69aa30fb1
[INFO] Successfully built 6db69aa30fb1
[INFO] Successfully tagged flrnks/ssmcli:latest

To test the new Docker image, I ran the AWS-StopEC2Instance Automation Document with the same kind of CLI arguments as before; this works because the ENTRYPOINT in the Dockerfile forwards any arguments given to docker run on to java -jar. As an extra step, I had to share my AWS profile with the container at runtime using the flag -v ~/.aws:/root/.aws:

$ ▶ docker run --rm -v ~/.aws:/root/.aws flrnks/ssmcli --document=AWS-StopEC2Instance InstanceId=i-0ed4574c5ba94c877 AutomationAssumeRole=arn:aws:iam::{{global:ACCOUNT_ID}}:role/AutomationServiceRole

17:18:59.541 [main] INFO  c.f.utils.ssm.SsmAutomationHelper :: Going to kick off SSM orchestration document: AWS-StopEC2Instance
17:19:00.789 [ForkJoinPool-1-worker-13] INFO  c.f.utils.ssm.SsmAutomationHelper :: Execution id: <...>
17:19:00.966 [ForkJoinPool-1-worker-11] INFO  c.f.utils.ssm.SsmAutomationHelper :: Current status: [InProgress], retry counter: #0
17:19:03.564 [ForkJoinPool-1-worker-11] INFO  c.f.utils.ssm.SsmAutomationHelper :: Execution finished with final status: [Success]
17:19:03.568 [main] INFO  com.flrnks.app.SsmCliParser :: SSM execution run took 5 seconds

One might say that typing the long docker run ... command above takes longer than typing java -jar ./target/scala-cli-app-1.0.0.jar ..., but I would argue that running the app inside a Docker container has valid use cases too: it gives you a controlled runtime environment (for example, a fixed Java version) and avoids dependency issues on the host.

Conclusion

This project has allowed me to learn much more than I initially expected. I learnt a lot about Scala, which was the original goal, but I also gained valuable experience with Maven, its plugin ecosystem and of course with Java as well. I hope whoever reads this post will find something useful in it too!
