CI/CD for Data Engineers

Reliably Deploying Scala Spark Containers for Kubernetes with GitHub Actions

van Bree — Le Friedland

0. OneFlow

1. Creating new Features

Makefile

make create-feature-branch my-new-feature
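
The Makefile target itself is not shown in this article. As a rough sketch, assuming its main job is to produce a branch name that matches the `feature/**` trigger of the push workflow, the naming part could look like the following. The function name `feature_branch_name` and the validation rule are hypothetical, not taken from the author's Makefile.

```shell
# Hypothetical helper: map a feature name to a branch under feature/.
feature_branch_name() {
  case "$1" in
    # Reject empty names and anything other than letters, digits, '.', '_' or '-'.
    ''|*[!A-Za-z0-9._-]*)
      echo "invalid feature name: '$1'" >&2
      return 1
      ;;
  esac
  printf 'feature/%s\n' "$1"
}

# Possible usage inside a git repository:
#   git checkout -b "$(feature_branch_name my-new-feature)"
```

Keeping the prefix in one place means every branch created this way is picked up by the `feature/**` filter.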

Workflow

name: 'Automatic: On Push'

on:
  push:
    branches:
      - 'feature/**'

jobs:
  build:
    name: Build & Test
    runs-on: ubuntu-latest

    steps:
      - name: Check out repository code
        uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Java and Scala
        uses: olafurpg/setup-scala@v10

      - name: Cache sbt
        uses: actions/cache@v2
        with:
          path: |
            ~/.sbt
            ~/.ivy2/cache
          key: ${{ runner.os }}-sbt-cache-v2-${{ hashFiles('**/*.sbt') }}-${{ hashFiles('project/build.properties') }}

      - name: Lint
        shell: bash
        run: make lint

      - name: Test
        shell: bash
        run: make test-coverage

      - name: Codecov
        uses: codecov/codecov-action@v1
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          directory: target

      - name: Slack on error
        uses: 8398a7/action-slack@v3
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        with:
          status: ${{ job.status }}
          fields: repo,message,commit,action,workflow,job,took
        if: ${{ failure() }}

2. Deploying DEV Releases

Pull Request

Horrible history
This is not something you want in your history, so squash the commits when you merge the pull request.

Workflow

name: 'Automatic: On Push'

on:
  push:
    branches:
      - 'feature/**'
      - 'main'

jobs:
  check:
    name: Prebuild checks
    runs-on: ubuntu-latest
    outputs:
      num_changes: ${{ steps.check1.outputs.num_changes }}

    steps:
      - name: Check out repository code
        uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Check changes
        id: check1
        shell: bash
        env:
          SHA_OLD: ${{ github.event.before }}
          SHA_NEW: ${{ github.sha }}
        run: |
          echo ::set-output name=num_changes::$(make check-changes)

      - name: Turnstyle (1 at the time)
        uses: softprops/turnstyle@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Makefile

git diff --name-only $(SHA_OLD) $(SHA_NEW) | (grep -v version.sbt || true) | wc -l
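
The `check-changes` pipeline counts the files that changed between the two commits and ignores `version.sbt`. The likely reason is that the snapshot bump pushes a commit that touches only that file, and the `num_changes > 0` condition on the build job then keeps that push from starting another build. A small sketch of the counting step on its own, reading file names from standard input so it can be tried without a repository (the function name `count_relevant_changes` is hypothetical):

```shell
# Count changed files, ignoring version.sbt.
# 'grep -v' exits with status 1 when nothing is left over, hence the '|| true'.
# 'tr' strips the padding some 'wc' implementations add around the number.
count_relevant_changes() {
  (grep -v 'version.sbt' || true) | wc -l | tr -d ' '
}

# Possible usage in the workflow step (plain shell variables here, whereas
# the Makefile recipe uses make's $(SHA_OLD) syntax):
#   git diff --name-only "$SHA_OLD" "$SHA_NEW" | count_relevant_changes
```

A push that only bumps the version yields `0`, so the build job is skipped.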
  build:
    name: Build & Test
    runs-on: ubuntu-latest
    needs: check
    if: needs.check.outputs.num_changes > 0
    outputs:
      modules: ${{ steps.project.outputs.modules }}
      version: ${{ steps.vars.outputs.version }}

    steps:
      - name: Check out repository code

      ...

      - name: Set Project Modules for matrix
        id: project
        shell: bash
        run: echo ::set-output name=modules::$(make list-modules-json)

      - name: Bump snapshot (main)
        if: github.ref == 'refs/heads/main'
        shell: bash
        run: make bump-snapshot-and-push

      - name: Set variables
        id: vars
        run: echo ::set-output name=version::$(make version)

      - name: Slack on error
        ...

Versioning
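
`make version` has to print the current project version so the workflow can expose it as a job output. The recipe is not shown in this article. A minimal sketch, assuming the version is kept in `version.sbt` in the usual sbt form (`ThisBuild / version := "..."`) and that a `sed` one-liner is acceptable; the function name `read_version` is hypothetical:

```shell
# Print the first quoted version string assigned with ':=' in the given file.
# Assumes a line such as:  ThisBuild / version := "1.4.0-SNAPSHOT"
read_version() {
  sed -n 's/.*version.*:= *"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Possible usage:
#   read_version version.sbt
```

Reading the file directly avoids starting sbt just to look up a string, at the cost of depending on the file layout.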

Matrix

  deploy:
    if: github.ref == 'refs/heads/main'
    needs: build

    name: Build & Deploy Snapshot
    runs-on: ubuntu-latest
    strategy:
      matrix:
        module: ${{fromJson(needs.build.outputs.modules)}}

    steps:
      - name: Check out repository code
        ...

      - name: Setup Java and Scala
        ...

      - name: Cache sbt
        ...

      - name: Container Registry Login
        shell: bash
        env:
          REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
          REGISTRY_USERNAME: ${{ github.actor }}
        run: make registry-docker-push-login

      - name: Dockerize
        shell: bash
        run: make docker-build ${{ matrix.module }}

      - name: Publish Docker Image to Github Container Registry
        shell: bash
        env:
          REGISTRY_OWNER: ${{ github.repository_owner }}
        run: make docker-push-registry ${{ matrix.module }}
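
The matrix works because the build job publishes the module list as a JSON array string, which `fromJson` turns into one deploy job per module. How `make list-modules-json` builds that string is not shown. A sketch of one way to do it, assuming the module names are already known and contain no characters that need JSON escaping (the function name `modules_json` and the module names in the usage line are hypothetical):

```shell
# Print the arguments as a compact JSON array of strings, e.g. ["a","b"].
# The body runs in a subshell, so 'out' and 'm' do not leak to the caller.
modules_json() (
  out=""
  for m in "$@"; do
    # Add a comma separator only when 'out' already holds an element.
    out="${out}${out:+,}\"${m}\""
  done
  printf '[%s]\n' "$out"
)

# Possible usage:
#   modules_json ingest transform
```

The output has to stay on a single line, because the `::set-output` command used in the workflow reads one line per value.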

3. Deploying TEST Releases

Release branch

Manual trigger

Manual trigger from the GitHub Actions page
name: 'Manual: Start Release'

on:
  workflow_dispatch:

jobs:
  prepare-release:
    name: Prepare release
    ...
    if: github.ref == 'refs/heads/main'

    steps:

      - name: Delete current release branch
        uses: dawidd6/action-delete-branch@v3
        continue-on-error: true
        with:
          github_token: ${{github.token}}
          branches: release

      - name: Create new release branch
        uses: peterjgrainger/action-create-branch@v2.0.1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          branch: release

      - name: Check out repository code
        uses: actions/checkout@v2
        with:
          ref: release
          fetch-depth: 0

      ...

      - name: Bump Release
        shell: bash
        run: make bump-release-and-push

      - name: Set variables
        id: vars
        run: |
          echo ::set-output name=version::$(make version)
          echo ::set-output name=modules::$(make list-modules-json)

Version bump
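
The recipe behind `make bump-release-and-push` is not shown. One common convention, assumed here rather than confirmed by the article, is that a release version is the current snapshot version with its `-SNAPSHOT` suffix removed. A sketch of just that string step (the function name `release_version` is hypothetical):

```shell
# Strip a trailing -SNAPSHOT, leaving other versions unchanged.
release_version() {
  printf '%s\n' "${1%-SNAPSHOT}"
}

# Possible usage:
#   release_version 1.4.0-SNAPSHOT
```

Writing the result back to `version.sbt`, committing, tagging and pushing would be the remaining work of such a target.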

Deploy to PROD action

Pull Request to PROD

      - name: Find old PR
        uses: juliangruber/find-pull-request-action@v1
        id: fpr
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          branch: release

      - name: Close old PR
        if: ${{ steps.fpr.outputs.number > 0 }}
        uses: peter-evans/close-pull@v1
        with:
          pull-request-number: ${{ steps.fpr.outputs.number }}
          comment: Auto-closing pull request
          delete-branch: false

      - name: Create new PR
        id: pr
        uses: repo-sync/pull-request@v2
        with:
          source_branch: release
          destination_branch: main
          pr_title: "Release ${{ steps.vars.outputs.version }} to PROD"
          pr_body: "..."
          pr_reviewer: "${{ github.actor }}"
          pr_assignee: "${{ github.actor }}"
          pr_label: "auto-pr,release"
          pr_allow_empty: true
          github_token: ${{ secrets.GITHUB_TOKEN }}

      ...

4. Deploying PROD Releases

on:
  pull_request:
    types: [ closed ]

jobs:
  prepare-release:
    name: Release to Prod
    runs-on: ubuntu-latest

    # If merged & pr was tagged release & from a release branch
    if: contains(github.event.pull_request.labels.*.name, 'release') && github.event.pull_request.merged == true && github.event.pull_request.head.ref == 'release'
    ...

  build-deploy:
    name: Build & Deploy to PROD
    runs-on: ubuntu-latest
    needs: prepare-release
    strategy:
      matrix:
        module: ${{fromJson(needs.prepare-release.outputs.modules)}}

    steps:
      - name: Check out repository code
        uses: actions/checkout@v2
        with:
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: 0
      ...

  success:
    needs: [ prepare-release, build-deploy ]
    name: Notify success
    runs-on: ubuntu-latest

    steps:
      - name: Create release
        id: create_release
        uses: actions/create-release@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          tag_name: v${{ needs.prepare-release.outputs.version }}
          release_name: Release ${{ needs.prepare-release.outputs.version }}
          draft: false
          prerelease: false

      - name: Deploy notification
        uses: 8398a7/action-slack@v3
        ...

      - name: Delete current release branch
        uses: dawidd6/action-delete-branch@v3
        continue-on-error: true
        with:
          github_token: ${{github.token}}
          branches: release

Rejected PR / Abandon release

name: 'Automatic: Deploy to PROD'

on:
  pull_request:
    types: [ closed ]

jobs:
  prepare-release:
    ...

  build-deploy:
    ...

  success:
    ...

  abandon-release:
    name: Abandon Release to Prod
    runs-on: ubuntu-latest

    # If PR was closed, but not merged
    if: contains(github.event.pull_request.labels.*.name, 'release') && github.event.pull_request.merged == false && github.event.pull_request.head.ref == 'release'

    steps:
      ...
      - name: Delete tag
        shell: bash
        run: |
          # Find the tag that points exactly at the head commit of the PR
          TAG=$(git describe --exact-match ${{ github.event.pull_request.head.sha }})
          # Remove it locally, then on the remote. One remote delete is enough:
          # a second delete of the same tag would fail the step.
          git tag -d "$TAG"
          git push --delete origin "$TAG"

      - name: Delete current release branch
        uses: dawidd6/action-delete-branch@v3
        continue-on-error: true
        with:
          github_token: ${{github.token}}
          branches: release

5. Deploying Hotfixes

make create-hotfix-branch

Main workflow

name: 'Automatic: On Push'

on:
  push:
    branches:
      - 'feature/**'
      - 'main'
      - 'hotfix'

jobs:
  ...

  notify:
    if: github.ref == 'refs/heads/hotfix'
    needs: build
    name: Notify hotfix
    runs-on: ubuntu-latest

    steps:
      - name: Hotfix notification
        uses: 8398a7/action-slack@v3
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        with:
          username: 'github actions'
          author_name: ''
          icon_emoji: ':github:'
          status: ${{ job.status }}
          fields:
          text: ":eight_pointed_black_star: ${{ github.event.repository.name }} *hotfix* ready for release\n\n:arrow_right: <https://github.com/${{ github.repository }}/actions/workflows/release-workflow.yaml|Start Release Workflow ( hotfix ) >"

Release workflow

name: 'Manual: Start Release'

on:
  workflow_dispatch:

jobs:
  prepare-release:
    ...

    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/hotfix'

    steps:
      - name: Set release type
        id: type
        run: |
          if [ "$REF" == "refs/heads/main" ]
          then
            echo "::set-output name=branch::release"
          else
            echo "::set-output name=branch::hotfix"
          fi
        env:
          REF: ${{ github.ref }}

      - name: Delete current release branch
        uses: dawidd6/action-delete-branch@v3
        if: steps.type.outputs.branch == 'release'
        ...

      - name: Create new release branch
        uses: peterjgrainger/action-create-branch@v2.0.1
        if: steps.type.outputs.branch == 'release'
        ...

      - name: Check out repository code
        uses: actions/checkout@v2
        with:
          ref: ${{ steps.type.outputs.branch }}
          fetch-depth: 0

      - name: Bump Release
        if: steps.type.outputs.branch == 'release'
        shell: bash
        run: make bump-release-and-push

      - name: Bump Hotfix
        if: steps.type.outputs.branch == 'hotfix'
        shell: bash
        run: make bump-patch-and-push

      ...
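
A hotfix goes through `make bump-patch-and-push` instead of the regular release bump. That recipe is not shown either. Assuming plain `MAJOR.MINOR.PATCH` versions without a suffix and without leading zeros, the patch increment could be sketched like this (the function name `bump_patch` is hypothetical):

```shell
# Increment the last component of a MAJOR.MINOR.PATCH version.
# The body runs in a subshell so the helper variables stay local.
bump_patch() (
  major="${1%%.*}"      # text before the first dot
  rest="${1#*.}"        # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  printf '%s.%s.%s\n' "$major" "$minor" "$((patch + 1))"
)

# Possible usage:
#   bump_patch 1.4.0
```

Because only the patch component changes, a hotfix release stays on the minor line that is currently in production.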

Prod release

name: 'Automatic: Deploy to PROD'

on:
  pull_request:
    types: [ closed ]

jobs:
  prepare-release:
    ...

    # If merged & pr was tagged release & from a release branch
    if: contains(github.event.pull_request.labels.*.name, 'release') && github.event.pull_request.merged == true && (github.event.pull_request.head.ref == 'release' || github.event.pull_request.head.ref == 'hotfix')

    steps:
      ...

  build-deploy:
    ...

  success:
    ...

    steps:
      ...
      - name: Delete current release/hotfix branch
        uses: dawidd6/action-delete-branch@v3
        continue-on-error: true
        with:
          github_token: ${{github.token}}
          branches: ${{ github.event.pull_request.head.ref}}

6. Conclusion

Code:

