Run Azkaban on AWS
Docker containers and Amazon Web Services (AWS) serverless technologies are great for running Azkaban clusters: not on an island known to Harry Potter fans, but in clouds loved by computer geeks. This post helps existing Azkaban-based projects migrate from an enterprise cloud to the AWS cloud. It may also be helpful for migrating Azkaban deployments to Microsoft Azure or Google Cloud Platform (GCP).
Introduction
Azkaban is an open-source (OSS) distributed Workflow Manager, developed at LinkedIn, that runs jobs based on their dependencies. A job can be a shell script, a Java program, a Hadoop job, etc. A workflow contains multiple jobs with interdependencies, such as JobA must complete before JobB can start. Azkaban has a wide customer base for applications such as ETL pipelines, data analytics products, and data processing for Artificial Intelligence (AI) and Machine Learning (ML) applications.
This post describes an approach to packaging Azkaban in Docker containers and deploying Azkaban clusters using AWS serverless technologies. The AWS services we use include Aurora Serverless MySQL, Elastic Container Service (ECS) Fargate, Application Load Balancer (ALB), Network Load Balancer (NLB), AWS Systems Manager Parameter Store, and Route 53.
Azkaban Cluster Architecture
An Azkaban cluster consists of a database (running on MySQL), one or more Executor (wkr) nodes, and one WebServer (svr) node.
Highlights of running an Azkaban cluster include:
- Bring up the database first, the executors next, and the WebServer last
- The database (DB) contains information about active and inactive executors
- A new executor auto-registers itself in the DB under a given hostname as an inactive node
- Run a single WebServer node to avoid redundant job executions
- Restart the WebServer to sync local data (flows, executors, schedules, executions, etc.) with the DB
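The registration lifecycle above can be inspected directly in the Azkaban DB. A minimal sketch, assuming the executors table layout of the Azkaban 3.x schema (verify the column names against the azkaban-db release artifact for your version):

```sql
-- A freshly started executor auto-registers as an inactive row;
-- the WebServer dispatches work only to rows where active = 1
SELECT id, host, port, active FROM executors;

-- Activation flips the active flag; this is normally done through the
-- executor's /executor?action=activate endpoint rather than direct SQL
```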
Release Azkaban in Docker Containers
It is necessary to build two Docker images, one for running the Azkaban WebServer and one for running the Azkaban Executor. First, build the Azkaban source code to create all of the Azkaban release artifacts (e.g., azkaban-web-server-3.91.0-3-g94cb830.tar.gz, azkaban-exec-server-3.91.0-3-g94cb830.tar.gz, etc.) as instructed in the source code documentation. Next, build the two Docker images using the created release artifacts. Finally, upload the created Docker images to AWS Elastic Container Registry (ECR).
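The three steps can be sketched as the following commands; the Dockerfile names, account ID, and region are illustrative placeholders, and the Gradle invocation follows the Azkaban source documentation:

```shell
# 1. Build Azkaban from source to produce the release tarballs
cd azkaban && ./gradlew build installDist -x test

# 2. Build the two images (Dockerfile.web and Dockerfile.exec are our own files)
docker build -f Dockerfile.web  -t azkaban-web-server:3.91.0-3-g94cb830 .
docker build -f Dockerfile.exec -t azkaban-exec-server:3.91.0-3-g94cb830 .

# 3. Push to ECR (the repositories must already exist)
aws ecr get-login-password --region us-west-2 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
docker tag  azkaban-web-server:3.91.0-3-g94cb830 123456789012.dkr.ecr.us-west-2.amazonaws.com/azkaban-web-server:3.91.0-3-g94cb830
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/azkaban-web-server:3.91.0-3-g94cb830
```

Repeat the tag-and-push step for the executor image.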
The following is a sample Dockerfile for the Azkaban WebServer image. The Dockerfile adds the release artifacts to the root directory of the container, and modifies the configuration file and startup scripts from the Azkaban source code so that Azkaban runs inside a Docker container. For example, it replaces the hard-coded MySQL configuration values in the azkaban.properties file with placeholder values. This allows customizing the MySQL configuration with environment variables when running the Azkaban WebServer container in different deployment environments (e.g., DEV, QA, PROD, etc.).
# VERSION 1.0
# DESCRIPTION: Simple Debian image for azkaban-webserver

FROM ubuntu:xenial-20200619
MAINTAINER Dan Tian

# Turn off user prompts on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux

# Runtime parameters as environment variables
ENV MYSQL_PORT=3306
ENV MYSQL_HOST=localhost
ENV MYSQL_DATABASE=azkaban
ENV MYSQL_USER=azkaban
ENV MYSQL_USER_PASSWORD=azkaban
ENV AWS_REGION=us-west-2
ENV AZKABAN_WEB_SERVER_NAME=''

## DEPLOY_ENV identifies environment (e.g., dev, qa, prod, etc.) for running the application
ENV DEPLOY_ENV=dev

# Docker build arguments (saved as environment variables for debugging purposes)
ARG AZK_VERSION=3.91.0-3-g94cb830
ENV AZKABAN_VERSION=$AZK_VERSION
ARG AZK_WEBSERVER_PORT=8081
ENV AZKABAN_WEBSERVER_PORT=$AZK_WEBSERVER_PORT

RUN buildDeps='python telnet procps less curl netcat mysql-client default-jre' \
    && apt-get update && apt-get install -y $buildDeps --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

### Install AWS CLI
RUN cd /tmp && curl -O https://bootstrap.pypa.io/get-pip.py && python get-pip.py && rm get-pip.py && cd
RUN pip install awscli --upgrade

WORKDIR /root

ADD azkaban-web-server-${AZK_VERSION}.tar.gz .
ADD azkaban-db-${AZK_VERSION}.tar.gz azkaban-web-server-${AZK_VERSION}/
ADD az-crypto-${AZK_VERSION}.tar.gz .
ADD az-hdfs-viewer-${AZK_VERSION}.tar.gz .
ADD az-jobsummary-${AZK_VERSION}.tar.gz .
ADD azkaban-hadoop-security-plugin-${AZK_VERSION}.tar.gz .
ADD az-reportal-${AZK_VERSION}.tar.gz .

RUN mkdir -p azkaban-web-server-${AZK_VERSION}/plugins/lib \
    && mv az-crypto-${AZK_VERSION}/lib/az-crypto-${AZK_VERSION}.jar azkaban-web-server-${AZK_VERSION}/lib \
    && mv az-hdfs-viewer-${AZK_VERSION}/lib/az-hdfs-viewer-${AZK_VERSION}.jar azkaban-web-server-${AZK_VERSION}/lib \
    && mv az-jobsummary-${AZK_VERSION}/lib/az-jobsummary-${AZK_VERSION}.jar azkaban-web-server-${AZK_VERSION}/lib \
    && mv azkaban-hadoop-security-plugin-${AZK_VERSION}/lib/azkaban-hadoop-security-plugin-${AZK_VERSION}.jar azkaban-web-server-${AZK_VERSION}/plugins/lib \
    && mv az-reportal-${AZK_VERSION}/lib/az-reportal-${AZK_VERSION}.jar azkaban-web-server-${AZK_VERSION}/lib

RUN mkdir azkaban-web-server-$AZK_VERSION/logs/ \
    && apt-get clean \
    && rm -rf \
        /var/lib/apt/lists/* \
        /tmp/* \
        /var/tmp/* \
        /usr/share/man \
        /usr/share/doc \
        /usr/share/doc-base

### Update Azkaban web server configuration file
RUN sed -i "s/mysql.port=.*/mysql.port=MYSQL_PORT_PLACEHOLDER_VALUE/" azkaban-web-server-$AZK_VERSION/conf/azkaban.properties \
    && sed -i "s/mysql.host=.*/mysql.host=MYSQL_HOST_PLACEHOLDER_VALUE/" azkaban-web-server-$AZK_VERSION/conf/azkaban.properties \
    && sed -i "s/mysql.database=.*/mysql.database=MYSQL_DATABASE_PLACEHOLDER_VALUE/" azkaban-web-server-$AZK_VERSION/conf/azkaban.properties \
    && sed -i "s/mysql.user=.*/mysql.user=MYSQL_USER_PLACEHOLDER_VALUE/" azkaban-web-server-$AZK_VERSION/conf/azkaban.properties \
    && sed -i "s/mysql.password=.*/mysql.password=MYSQL_USER_PASSWORD_PLACEHOLDER_VALUE/" azkaban-web-server-$AZK_VERSION/conf/azkaban.properties \
    && echo "azkaban.webserver.external_port=$AZK_WEBSERVER_PORT" >> azkaban-web-server-$AZK_VERSION/conf/azkaban.properties

### Update Azkaban web server configuration file for azkaban.executorselector.filters
RUN sed -i "s/MinimumFreeMemory\,CpuStatus/CpuStatus/" azkaban-web-server-$AZK_VERSION/conf/azkaban.properties

### Update Azkaban web server start script internal-start-web.sh
RUN sed -i "s/\$@ \&/\$@/" azkaban-web-server-$AZK_VERSION/bin/internal/internal-start-web.sh \
    && sed -i "s:^echo .* \$azkaban_dir/currentpid$::" azkaban-web-server-$AZK_VERSION/bin/internal/internal-start-web.sh

### Update Azkaban web server start script start-web.sh
RUN sed -i 's/>webServerLog_`date +%F+%T`.out 2>&1 &//' azkaban-web-server-$AZK_VERSION/bin/start-web.sh

# Azkaban web server port
#EXPOSE 8443
EXPOSE $AZK_WEBSERVER_PORT

# Define default workdir
WORKDIR /root/azkaban-web-server-$AZK_VERSION

COPY start-azk-web.sh bin/
RUN chmod +x bin/start-azk-web.sh

CMD ["bin/start-azk-web.sh"]
The Dockerfile uses a new script, start-azk-web.sh, inside the container image; it runs the Azkaban WebServer when the container starts. The script uses environment variables to customize the Azkaban WebServer configuration. It detects the environment the container runs in, and gets configuration values, such as the MySQL DB password, from cloud services such as AWS Systems Manager Parameter Store. After completing the configuration customization, the script calls the original start-web.sh (with the log redirect removed) to start the Azkaban WebServer. The content of start-azk-web.sh is the following.
#!/bin/bash
set -e

###################################################################
### A script to use environment variables (and AWS resources if
### applicable) to configure and start azkaban-web-server
###################################################################

echo "Instance configuration initialization start time: " `date +%Y%m%d-%H-%M-%S`

APP_NAME=azkaban-web-server
AZK_VERSION=3.91.0-3-g94cb830
AWS_EC2_CHECK=''
AWS_ECS_TASK_CHECK=''

[ -z ${AWS_REGION} ] && AWS_REGION=us-west-2

APP_CONF_DIR=/root/azkaban-web-server-$AZK_VERSION/conf
APP_CONF_FILE=${APP_CONF_DIR}/azkaban.properties
DB_WAIT="20"

usage () {
cat <<EOF
USAGE: ./start.sh

Description:
This script starts the application in a running instance (either container or host).
It first uses environment variables (and AWS resources if the instance runs in AWS)
to update the conf file at ${APP_CONF_FILE}.
Then it starts the azkaban-web-server with the script ${APP_CONF_DIR}/../bin/start-web.sh

WARNING:
When the running instance is in AWS (EC2 or ECS container service), it must have
correct IAM roles and profile for accessing the AWS resources in the AWS_REGION
EOF
exit 1
}

echo "Container/Instance running as $HOSTNAME"
echo "Display environment variables"
printenv

### Check if running in AWS EC2 instances
AWS_EC2_CHECK=`curl -m 8 -Is http://169.254.169.254/latest/meta-data/|grep 200|head -1`

### Check if running in AWS container instances
if [ -n "${ECS_CONTAINER_METADATA_URI}" ]; then
  AWS_ECS_TASK_CHECK=`curl -m 8 -Is ${ECS_CONTAINER_METADATA_URI}|grep 200|head -1`
fi

AWS_CHECK=${AWS_EC2_CHECK}${AWS_ECS_TASK_CHECK}

if [ -n "${AWS_CHECK}" ]; then
  echo "### Update configuration file ${APP_CONF_FILE} using AWS resources (e.g., SSM parameter store, S3, etc.)"
  ### The container currently runs in AWS
  ###
  ### Configure MYSQL_USER_PASSWORD stored in AWS Parameter Store
  ### For testing, we simply use the environment variable value for mysql user password
  sed -i "s/MYSQL_USER_PASSWORD_PLACEHOLDER_VALUE/${MYSQL_USER_PASSWORD}/" ${APP_CONF_FILE}
fi

if [ -z "${AWS_CHECK}" ]; then
  echo "### Update configuration file ${APP_CONF_FILE} using non-AWS resources"
  ### The container currently runs in an environment outside AWS
  ###
  ### Configure MYSQL_USER_PASSWORD stored in a vault that is outside AWS cloud
  ### For testing, we simply use the environment variable value for mysql user password
  sed -i "s/MYSQL_USER_PASSWORD_PLACEHOLDER_VALUE/${MYSQL_USER_PASSWORD}/" ${APP_CONF_FILE}
fi

### Update configuration file ${APP_CONF_FILE} using environment variables
echo "### Update configuration file ${APP_CONF_FILE} using environment variables"
sed -i "s/MYSQL_PORT_PLACEHOLDER_VALUE/${MYSQL_PORT}/" ${APP_CONF_FILE}
sed -i "s/MYSQL_HOST_PLACEHOLDER_VALUE/${MYSQL_HOST}/" ${APP_CONF_FILE}
sed -i "s/MYSQL_DATABASE_PLACEHOLDER_VALUE/${MYSQL_DATABASE}/" ${APP_CONF_FILE}
sed -i "s/MYSQL_USER_PLACEHOLDER_VALUE/${MYSQL_USER}/" ${APP_CONF_FILE}

#### Wait for database connection
i=0
while ! nc $MYSQL_HOST $MYSQL_PORT >/dev/null 2>&1 < /dev/null; do
  i=`expr $i + 1`
  if [ $i -ge $DB_WAIT ]; then
    echo "$(date) - ${MYSQL_HOST}:${MYSQL_PORT} still not reachable, giving up"
    exit 1
  fi
  echo "$(date) - waiting for ${MYSQL_HOST}:${MYSQL_PORT}..."
  sleep 1
done
echo "#### verified database connection to db: ${MYSQL_HOST}:${MYSQL_PORT}"

### Set Azkaban Web Server name
if [ -n "${AZKABAN_WEB_SERVER_NAME}" ]; then
  echo "### Update conf/azkaban.properties by adding a new line: azkaban.webserver.external_hostname=${AZKABAN_WEB_SERVER_NAME}"
  echo "azkaban.webserver.external_hostname=${AZKABAN_WEB_SERVER_NAME}" >> conf/azkaban.properties
fi

echo "### Content of file bin/start-web.sh"
cat bin/start-web.sh

echo "Instance configuration initialization stop time: " `date +%Y%m%d-%H-%M-%S`

### Start service
echo "#### Starting azkaban web server"
bin/start-web.sh
The Docker image for running the Azkaban Executor is created with a similar Dockerfile and script (start-azk-exec.sh). The Dockerfile contains a command to append the executor.port setting to the azkaban.properties file. The start-azk-exec.sh script contains logic to get the correct executor hostname and append the azkaban.server.hostname setting to the azkaban.properties file.
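The same placeholder-substitution pattern drives both start scripts. The following is a runnable sketch of the idea; the file contents and the db.example.internal default are made up for illustration, while the real scripts edit the azkaban.properties shipped in the release.

```shell
#!/bin/sh
# Create a sample conf file with placeholders, as the Dockerfiles do
cat > azkaban.properties <<'EOF'
mysql.host=MYSQL_HOST_PLACEHOLDER_VALUE
mysql.port=MYSQL_PORT_PLACEHOLDER_VALUE
EOF

# At container start, fill the placeholders from environment variables
MYSQL_HOST="${MYSQL_HOST:-db.example.internal}"
MYSQL_PORT="${MYSQL_PORT:-3306}"
sed -i "s/MYSQL_HOST_PLACEHOLDER_VALUE/${MYSQL_HOST}/" azkaban.properties
sed -i "s/MYSQL_PORT_PLACEHOLDER_VALUE/${MYSQL_PORT}/" azkaban.properties

# The executor start script additionally appends its own resolvable
# hostname, so the auto-registered DB row points at a reachable address
echo "executor.port=12321" >> azkaban.properties
echo "azkaban.server.hostname=$(hostname -f 2>/dev/null || hostname)" >> azkaban.properties

cat azkaban.properties
```

Because the placeholders are replaced at start time rather than build time, one image serves every deployment environment.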
Run Azkaban Clusters with AWS Serverless Technologies
To deploy a new Azkaban cluster in AWS, first create an Aurora Serverless MySQL database instance for the cluster. Use the SQL script from the Azkaban release file (e.g., azkaban-db-3.91.0-3-g94cb830.tar.gz) created by building the Azkaban source code to create the azkaban database (DB). Create a new database user with read and write privileges on the azkaban DB, and store the password for the database user in AWS SSM Parameter Store for future use.
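The database bootstrap might look like the following; the cluster endpoint, passwords, and the SSM parameter name are placeholders, and the create-all SQL file name should be taken from the actual azkaban-db tarball:

```shell
# Create the database and an application user on the Aurora cluster
mysql -h azkaban.cluster-abc123.us-west-2.rds.amazonaws.com -u admin -p <<'SQL'
CREATE DATABASE azkaban;
CREATE USER 'azkaban'@'%' IDENTIFIED BY 'change-me';
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX ON azkaban.* TO 'azkaban'@'%';
SQL

# Create the Azkaban tables from the release SQL script
tar xzf azkaban-db-3.91.0-3-g94cb830.tar.gz
mysql -h azkaban.cluster-abc123.us-west-2.rds.amazonaws.com -u azkaban -p azkaban \
  < azkaban-db-3.91.0-3-g94cb830/create-all-sql-3.91.0-3-g94cb830.sql

# Keep the password out of task definitions: store it in SSM Parameter Store
aws ssm put-parameter --name /azkaban/dev/mysql-password \
  --type SecureString --value 'change-me' --region us-west-2
```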
Next, use AWS CloudFormation scripts to create a Network Load Balancer (NLB) for each Azkaban Executor node in the Azkaban cluster, and an ECS Fargate service with the desired version of the Azkaban Executor container image. Each service has DesiredCount set to 1 and a load balancer configuration referencing its matching NLB created earlier.
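An abbreviated CloudFormation sketch of one executor's service and its NLB target group follows; the logical names, the 12321 port, and the references are illustrative, and the real templates also define the NLB itself, its listener, the task definition, and the Route 53 record:

```yaml
ExecutorTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    Protocol: TCP
    Port: 12321
    TargetType: ip
    VpcId: !Ref VpcId

ExecutorService:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref EcsCluster
    LaunchType: FARGATE
    DesiredCount: 1                   # exactly one task per executor node
    TaskDefinition: !Ref ExecutorTaskDefinition
    LoadBalancers:
      - ContainerName: azkaban-exec-server
        ContainerPort: 12321
        TargetGroupArn: !Ref ExecutorTargetGroup
    NetworkConfiguration:
      AwsvpcConfiguration:
        Subnets:
          - !Ref PrivateSubnetA
          - !Ref PrivateSubnetB
        SecurityGroups:
          - !Ref ExecutorSecurityGroup
```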
Then use AWS CloudFormation scripts to create an Application Load Balancer (ALB) for the Azkaban WebServer node of the cluster, and an ECS Fargate service with the desired version of the Azkaban WebServer container image. The service has DesiredCount set to 1 and a load balancer configuration referencing the ALB created earlier.
The following figure shows an Azkaban cluster with two executors deployed with AWS serverless technologies. The Application Load Balancer accepts HTTP traffic to the Azkaban WebServer port from outside the VPC; it can also serve an SSL certificate to accept HTTPS traffic. Other resources in the VPC use Security Groups to ensure they accept network traffic on their designated ports only from sources within the same VPC.
Although not shown in the above figure, it is helpful to run a temporary EC2 instance in the same VPC as the Azkaban clusters. Administrators can use the temporary EC2 instance to run MySQL CLI commands to check or update Azkaban DB content, run curl commands to activate new executors, and diagnose issues using the Azkaban Ajax API. The temporary EC2 instance can be created and terminated as necessary.
Each node in the Azkaban cluster is self-healing: the Fargate service monitors the health of its container task and automatically replaces a failed task. The Fargate service for an Azkaban Executor node uses the domain name in its Route 53 record (created with the Network Load Balancer) to register itself in the executors table in the Azkaban DB. The executor must then be activated, from a temporary EC2 instance in the same VPC, with the command curl -m 5 executor-domain-name:executor-port/executor?action=activate.
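From the temporary EC2 instance, the check-and-activate step might look like this (the host names and the 12321 executor port are placeholders):

```shell
# Confirm the executor auto-registered (it starts with active = 0)
mysql -h azkaban.cluster-abc123.us-west-2.rds.amazonaws.com -u azkaban -p azkaban \
  -e "SELECT id, host, port, active FROM executors;"

# Activate it through the executor API
curl -m 5 "http://executor-1.azkaban.internal:12321/executor?action=activate"

# Restart the WebServer task afterwards so it re-syncs the executor list
```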
Hybrid-Cloud CICD Pipeline for Automated Azkaban Deployment
It is possible to automate the deployment of a new Azkaban release (as new versions of the Docker container images) to an Azkaban cluster in AWS. The following figure shows a hybrid-cloud CICD solution. Use existing tools in the enterprise cloud to build the container images for new releases. Upload the new images to AWS Elastic Container Registry (ECR) and upload the deploy configuration files to an AWS S3 bucket. The file upload to the S3 bucket triggers AWS CodePipeline to deploy the new version of the container images to the Fargate services in the DEV environment. The CodePipeline can then deploy the new release to the QA and PROD environments as necessary.
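The hand-off from the enterprise-cloud build to AWS reduces to two uploads; the repository, bucket, and file names here are placeholders, and the deploy configuration is assumed to be an imagedefinitions file naming the new image tag for the ECS deploy action:

```shell
# Publish the new release image built by the enterprise CICD tools
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/azkaban-web-server:3.91.0-4-gabc1234

# Upload the deploy configuration; the S3 PUT event triggers CodePipeline,
# which rolls the new image out to the DEV Fargate services
aws s3 cp imagedefinitions.zip s3://azkaban-deploy-artifacts/dev/imagedefinitions.zip
```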
Summary
This post describes an approach to releasing the Azkaban WebServer and Executor as Docker container images and running Azkaban clusters using AWS serverless technologies. The approach improves the resiliency of an Azkaban cluster with self-healing nodes, optimizes the cost and operation of Azkaban clusters in the AWS cloud, and allows automated CICD deployment of new Azkaban code releases in a hybrid-cloud environment.
For further discussions and details, please contact the author at dantian@gmail.com.