Building a fault-tolerant application stack on top of a dynamic foundation

UK fintech Bink’s mission is to reimagine loyalty programs — making them easier for everyone, including banks, shops, and customers. To achieve this, Bink developed a solution that links customer payment cards to any loyalty program and recognizes loyalty points every time people shop, connecting purchases with reward programs in a single tap.

Underpinning the solution is an extensible proprietary platform that meets banks’ requirements for security and accountability. With the luxury of a greenfield site, the Bink engineering team used multiple CNCF projects, including Kubernetes, Linkerd, Fluentd, Prometheus, and Flux, to build a technology stack that is performant, scalable, reliable, and secure, while reducing application issues caused by transient network problems.

A vision for retail loyalty 

As at most startups, Bink’s infrastructure was initially built mostly by one individual. Budgets were tight, but the team knew it had to build something that could grow with the company. At first, Bink ran three web servers on bare metal Ubuntu 14.04 instances, hosting a handful of uWSGI applications load balanced behind NGINX instances, with no automation of any kind in place.

In 2016, the team began converting Bink’s applications to Docker containers, moving away from the existing approach of SFTPing code onto the production servers and restarting uWSGI pools. To enable this, they built a container orchestration utility in Chef that dynamically assigned host ports to containers and updated NGINX’s proxy_pass blocks to pass traffic through. This worked well enough until they realized that Docker caused frequent kernel panics and other issues on their aging Ubuntu 14.04 infrastructure.
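
As a rough illustration of that port-assignment idea, the Python sketch below gives each container a free host port and renders matching NGINX location blocks. It is not Bink’s Chef utility: the function names, the port range, and the one-location-per-container layout are all assumptions made for the example.

```python
def assign_ports(containers, existing, port_range=range(20000, 20100)):
    """Keep existing port assignments and give new containers a free port.

    `containers` is the list of container names that should be running,
    `existing` maps names to ports already in use (both hypothetical inputs).
    """
    wanted = list(containers)
    # Drop assignments for containers that are no longer wanted.
    assignments = {n: p for n, p in existing.items() if n in wanted}
    taken = set(assignments.values())
    # Lazily walk the range, skipping ports that are already held.
    free = (p for p in port_range if p not in taken)
    for name in wanted:
        if name not in assignments:
            try:
                assignments[name] = next(free)
            except StopIteration:
                raise RuntimeError("no free host ports left in range") from None
    return assignments


def render_nginx_locations(assignments):
    """Render one NGINX location block per container (illustrative layout)."""
    blocks = []
    for name, port in sorted(assignments.items()):
        blocks.append(
            f"location /{name}/ {{\n"
            f"    proxy_pass http://127.0.0.1:{port}/;\n"
            f"}}"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    ports = assign_ports(["api", "web"], {"api": 20001})
    print(render_nginx_locations(ports))
```

A real tool would also write the rendered blocks to disk and reload NGINX; those steps are left out here to keep the sketch self-contained.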

Around the same time, the team got formal approval to evaluate a migration from their data center to the cloud, because their needs had far outgrown what the data center could offer.

Moving to Kubernetes and looking for a service mesh 

Around 2017, Monzo engineers gave a KubeCon talk on a recent outage they experienced and the role Linkerd played. “Not everyone is transparent about these things and I really appreciated them sharing what happened so the community can learn from their failure — a big shout-out to the Monzo team for doing that!” said Mark Swarbrick, Head of Infrastructure at Bink.

Migrating their software stack onto a cloud native platform was a no-brainer. However, parts of the architecture weren’t as performant or stable as they had hoped. Linkerd enabled them to implement connection and retry logic at the right level of the stack, giving them the resilience and reliability they needed. Suddenly, the questions over whether they could run their software stack in the cloud without significant uplift disappeared. Linkerd showed that placing the logic in the connection layer was the right approach and allowed them to focus on product innovation rather than worrying about network or connection instability.
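
To make the idea of retry logic concrete, here is a minimal Python sketch of retrying a call that fails with a transient connection error, using exponential backoff. It only illustrates the kind of logic that otherwise has to live in every application; it is not how Linkerd is implemented, and the function name, attempt count, and delays are assumptions chosen for the example.

```python
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on ConnectionError with exponential backoff.

    `sleep` is injectable so the behaviour can be exercised without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    failures = {"left": 2}

    def flaky():
        # Hypothetical operation that fails twice, then succeeds.
        if failures["left"] > 0:
            failures["left"] -= 1
            raise ConnectionError("transient network problem")
        return "ok"

    print(call_with_retries(flaky))
```

Handling this in the connection layer, as a service mesh does, means application code can make a plain call and leave this loop out entirely.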

The power of cloud native

“Fast forward three years and a team of three is supporting our in-house built platform capable of processing millions of transactions per day, a true testament to the amazing technology of the cloud native ecosystem!” said Mark Swarbrick, Head of Infrastructure at Bink.

Looking at the entire stack, cloud native technologies, and CNCF projects in particular, enabled Bink to build a cloud-agnostic platform that scales as needed whilst allowing the team to keep a close eye on performance and stability. The platform has been load tested and shown to perform full disaster recovery in under 15 minutes and to recover easily from transitory network issues, and it lets the team perform root cause analysis of problems quickly and efficiently.

Want to hear more? Read Bink’s case study and dive deeper into the development and implementation of their cloud native platform.