Reliably Deploying Scala Spark containers for Kubernetes with GitHub Actions

One of the most under-appreciated parts of software engineering is actually deploying your code. There is a lot of focus on building highly scalable data pipelines, but in the end your code has to be ‘magically’ transferred from a local machine to a deployable piece of pipeline in the cloud.

In a previous article I discussed building data pipelines in Scala & Spark and deploying them on Kubernetes, or at least deploying them on a local minikube setup for testing purposes.

Most of the time you don’t want to immediately deploy & run these pipelines directly, but make them available as…


Day 1 of the journey on building my first Scala GraalVM ZIO application

» Day 0
» Day 1

Continued from Day 0

Day 1: Getting builds under control

Now that we have a basic project set up and somewhat building, we need to stabilise it a bit more, so we can spend all our future energy on actually writing the code. This means CI/CD, dockerising and versioning, but more importantly getting the binary stable on *nix and Mac environments. This means diving deeper into GraalVM and its available build tools. …


Day 0 of the journey on building my first Scala GraalVM ZIO application


Caveat: Normally I write posts about how I tackled a problem and present the solution on a silver platter.
In this series I’d like to take you on a journey of exploration. I have no idea how fast this will progress or where it will end up. I will try to work on this at least 1 day a week.

There are way better tutorials and write-ups for each of these tools (ZIO, Kafka, API, GraalVM), but I just want to…


Hazel Glass https://portlandopenstudios.com/artists/2019-artists/hazel-glass.html

For each challenge there are many technology stacks that can provide the solution. I’m not claiming this approach is the holy grail of data processing; this is more the tale of my quest to combine these widely supported tools in a maintainable fashion.

From the outset I’ve always tried to generate as much configuration as possible, mainly because in my experience it’s easy to drown in a sea of yaml-files, conf-files and incompatible versions in registries, repositories, CI/CD pipelines and deployments.

What I created was an sbt script that, when triggered, builds a fat-jar, which gets wrapped in a docker-file…


A scalable approach to fuzzy data matching

The challenge

Recently a colleague at Datlinq asked me to help her with a data problem that seemed very straightforward at a glance.
She had purchased a small set of data from the chamber of commerce (Kamer van Koophandel: KvK) that contained roughly 50k small-sized companies (5–20 FTE), which can be hard to find online.
She noticed that many of those companies share the same address, which makes sense, because a lot of those companies tend to cluster in business complexes.


Since my childhood I’ve always been a coder. I got started with some GW-BASIC, but quickly moved to C and C++ during my high school years, though I never really considered it as a possible future occupation. It was more of a fun hobby and since my friends also did it, it didn’t seem that strange or special. Some liked football, some drawing, I liked programming.

In fact, I always thought I would end up a veterinarian. Programming and hacking seemed more like a hobby to me, also because it came from and ended up in playing games (e.g. …

Tom Lous

Freelance Data & ML Engineer | husband + father of 2 | #Spark #Scala #BigData #ML #DeepLearning #Airflow #Kubernetes | Shodan Aikido
