
Introduction
In this post I will briefly introduce several AWS services and show how to use Terraform to orchestrate and manage them. The service itself is deliberately simple; its main purpose was to let me learn about the emerging practice of Infrastructure as Code, or IaC for short.
Project overview
The main goal of the project is to deploy a serverless function that periodically queries the GitHub API for the list of public repositories of a given organisation (e.g. Google). The retrieved information is stored as a compressed CSV file in a dedicated S3 bucket, and a notification is generated for each new file saved to the bucket.

The main AWS components of the solution are:
- Lambda function written in Python
- CloudWatch Event Rule to trigger the Lambda periodically
- S3 for storing data in a bucket
- SQS for queueing notifications from S3
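Stripped of the AWS plumbing, the core transformation the Lambda performs — turning a list of repositories into a gzip-compressed CSV payload — can be sketched in plain Python. The function name and column choice below are my own illustration, not the project's actual code:

```python
import csv
import gzip
import io


def repos_to_gzip_csv(repos):
    """Serialise a list of repository dicts into gzip-compressed CSV bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "stargazers_count"])  # header row; columns are an assumption
    for repo in repos:
        writer.writerow([repo["name"], repo["stargazers_count"]])
    return gzip.compress(buf.getvalue().encode("utf-8"))
```

The resulting bytes can then be written to a temporary file and handed to boto3 for upload.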
Possibilities
Various methods exist for creating and configuring the necessary resources. The simplest is to log in to the AWS Management Console and set up each component one by one via the GUI. This method, however, is slow, cumbersome and quite error-prone.
A better option is to use the AWS SDK for your favourite programming language; several are supported, including Java, Python, Go and Node.js. This approach is less error-prone, but still quite cumbersome and slow.
Perhaps the best option is Terraform, a popular Infrastructure as Code (IaC) tool. It lets you define your infrastructure in a configuration language and has its own engine that talks to the AWS API to create the infrastructure you defined.
Setup procedure
Before we can use Terraform to deploy the project on AWS, we need to set up credentials. Log in to the AWS Management Console and go to the Identity and Access Management (IAM) section, where you can generate the necessary Access Key ID and Secret Access Key. These credentials should be saved to ~/.aws/credentials as follows:
[default]
aws_access_key_id = XXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
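With the credentials file in place, the only Terraform-side configuration needed is a provider block; the region below is just an example, not necessarily the one the project uses:

```hcl
provider "aws" {
  region = "eu-west-1"
}
```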
This enables Terraform to make changes to your AWS infrastructure through API calls, provisioning resources according to your definitions in the .tf files. Once the desired configuration is in place, the complete infrastructure can be deployed as simply as below:
$ ls -la
-rw-r--r-- 1 user group 4.9K Nov 21 22:58 main.tf
$ terraform init
...
Terraform has been successfully initialized!
$ terraform apply
...
Plan: 13 to add, 0 to change, 2 to destroy.
Do you want to perform these actions?
  Enter a value: yes
Project building blocks
In this section I will go over each major component and explain what it is, what it does and how it is set up. First up is the storage layer: Amazon S3.
AWS Simple Storage Service
This is a basic building block that we use to store the data generated by the Lambda function. Since Lambdas are serverless by nature, they have no persistent storage attached that could preserve data between two invocations of the function. For persistent storage, we use S3. The necessary Terraform code is below:
resource "aws_s3_bucket" "tf_aws_bucket" {
  bucket        = "tf-aws-bucket"
  force_destroy = true

  tags = {
    Name        = "Bucket for Terraform project"
    Environment = "Dev"
  }
}
This creates a bucket named tf-aws-bucket in which we can store the results of the Lambda function. As an extra feature, we also configured notifications for this bucket: whenever a compressed file with the .gz suffix is created, a notification is generated and sent to an SQS queue defined in the same Terraform file.
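The notification wiring might look roughly like the following in Terraform; the queue name is illustrative and the project's actual definitions may differ slightly:

```hcl
resource "aws_sqs_queue" "bucket_notifications" {
  name = "tf-aws-bucket-notifications"
}

resource "aws_s3_bucket_notification" "gz_created" {
  bucket = aws_s3_bucket.tf_aws_bucket.id

  queue {
    queue_arn     = aws_sqs_queue.bucket_notifications.arn
    events        = ["s3:ObjectCreated:*"]
    filter_suffix = ".gz"
  }
}
```

Note that S3 also needs permission to send messages to the queue, which is granted separately with an aws_sqs_queue_policy resource.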
AWS Lambda
AWS Lambda is a serverless technology that lets you create a bare function in the cloud and call it from various other services, without having to worry about setting up an environment for it to run in. Several programming languages are supported, such as Python, Java, Go and Node.js. Once deployed, the function receives input just like any ordinary function, and it can be granted permission to access and modify other AWS resources, such as files stored in S3.
This is exactly the use case implemented in this project: a Lambda function that calls the GitHub API to download information, then stores it as a compressed CSV file in an S3 bucket. To define the target organisation and the destination bucket, the Lambda function expects two arguments in the invocation payload:
{
  "org_name": "twitter",
  "target_bucket": "repos_folder"
}
The JSON input passed to the function is converted to a Python dictionary, which can be checked for the presence of the keys the code requires:
def handler(event, context):
    # verify that the required arguments are present before doing any work
    if 'org_name' not in event or 'target_bucket' not in event:
        print("Missing 'org_name' or 'target_bucket' from request body (JSON)!")
        return
The rest of the function downloads the list of public repositories of the given organisation from the GitHub API and stores it in a temporary file, which can then be uploaded to S3, provided the necessary permissions have been granted to the Lambda function:
import boto3

# upload the locally generated file to the target S3 bucket under the given key
s3 = boto3.client("s3")
s3.upload_file(path_to_local_file, target_bucket_name, key_name)
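As for the download step itself, the GitHub endpoint involved is GET /orgs/{org}/repos, which is paginated. A minimal sketch using only the standard library, with no authentication, is below; the function names are my own, not the project's:

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def repos_url(org_name, page, per_page=100):
    """Build the paginated GitHub endpoint for an organisation's public repos."""
    return (f"{GITHUB_API}/orgs/{org_name}/repos"
            f"?type=public&per_page={per_page}&page={page}")


def fetch_public_repos(org_name):
    """Page through the API until an empty page signals the end of the listing."""
    repos, page = [], 1
    while True:
        with urllib.request.urlopen(repos_url(org_name, page)) as resp:
            batch = json.load(resp)
        if not batch:
            break
        repos.extend(batch)
        page += 1
    return repos
```

In a real deployment an access token would be worth adding, since unauthenticated requests are heavily rate-limited by GitHub.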
To enable access to S3 from Lambda, we have to define some IAM policies and roles. First, we define a policy stating that any role to which it is attached may access the S3 bucket:
data "aws_iam_policy_document" "s3_lambda_access" {
  statement {
    effect = "Allow"
    resources = [
      "arn:aws:s3:::tf-aws-bucket",   # s3:ListBucket applies to the bucket itself
      "arn:aws:s3:::tf-aws-bucket/*", # object-level actions apply to the keys in it
    ]
    actions = [
      "s3:GetObject",
      "s3:PutObject",
      "s3:ListBucket",
    ]
  }
}

resource "aws_iam_policy" "s3_lambda_access" {
  name   = "s3_lambda_access"
  policy = data.aws_iam_policy_document.s3_lambda_access.json
}
This policy is then attached to an IAM role that AWS Lambda is allowed to assume:
resource "aws_iam_role_policy_attachment" "s3_lambda_access" {
  role       = aws_iam_role.tf_aws_exercise_role.name
  policy_arn = aws_iam_policy.s3_lambda_access.arn
}

resource "aws_iam_role" "tf_aws_exercise_role" {
  name        = "tfExerciseRole"
  description = "Role that is allowed to be assumed by AWS Lambda, which will be taking all actions."

  tags = {
    owner = "tfExerciseBoss"
  }

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow"
    }
  ]
}
EOF
}
AWS CloudWatch Events
This component is responsible for periodically invoking our Lambda function, with the required arguments passed in JSON format. It was also configured via Terraform, but for the sake of simplicity, below is a screenshot from the AWS Management Console showing the created CloudWatch event as configured:

The screenshot shows that the rule is configured to invoke the target Lambda function every 2 minutes.
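The corresponding Terraform definition might look roughly like this; the Lambda resource name github_fetcher is an assumption on my part:

```hcl
resource "aws_cloudwatch_event_rule" "every_two_minutes" {
  name                = "invoke-lambda-every-two-minutes"
  schedule_expression = "rate(2 minutes)"
}

resource "aws_cloudwatch_event_target" "lambda" {
  rule = aws_cloudwatch_event_rule.every_two_minutes.name
  arn  = aws_lambda_function.github_fetcher.arn

  # constant JSON payload delivered to the function on every invocation
  input = jsonencode({
    org_name      = "twitter"
    target_bucket = "repos_folder"
  })
}
```

An aws_lambda_permission resource with principal events.amazonaws.com is also needed so that CloudWatch Events is allowed to invoke the function.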
Results
In summary, it took me a while to get the hang of the Infrastructure as Code concept and to apply it while working with Terraform on AWS, but I can definitely see how it benefits a bigger organisation that wants its cloud infrastructure to be stable and maintainable. IaC tools such as Terraform let developers define their infrastructure as code and check it into version control, making deployments repeatable and more predictable. Now that I have this working project, a simple terraform apply brings my service alive in seconds, with all required components and permissions correctly set up, and I can tear it all down just as quickly if I choose to. This flexibility and ease of development can greatly speed up projects in the cloud.