Remote sensing in agriculture requires complex systems that are able to communicate with various external devices like GPS and cameras, and use machine learning and AI inference to provide insights to the grower regarding their orchard, down to tree and crop-level precision. Aurea Imaging, a Dutch startup company, specializes in remote sensing solutions for agriculture using an embedded device with a powerful GPU-enabled NVIDIA Jetson on board.

Instead of opting for the traditional embedded C & ROS stack, we chose a cloud-native approach: running a lightweight distribution of Kubernetes (K3s) on the edge. Because of the team’s familiarity with cloud-native concepts, we took this (literally) to the edge. Even though we knew the niche application would bring challenges, it was a clear choice given the vast number of cloud-native projects and the rapidly evolving landscape of AI on the edge in IoT applications.

The major challenge to address was that the Jetson board relies on NVIDIA’s ecosystem for all GPU work, while our ML models need to be constantly updated to industry standards. How do you keep a fleet up-to-date, not only at the application level, but also in terms of host-level tooling, like JetPack (Jetson’s official software stack), CUDA (the GPU interface), firmware packages, peripheral drivers, or the OS distribution?

The best place to find answers regarding cutting-edge use cases of Kubernetes is, of course, KubeCon + CloudNativeCon! During a presentation on deploying Kubernetes clusters around the world, we learned about Kairos, a CNCF Sandbox Project that allows you to convert Linux distributions into an immutable OS and simplify Day 2 operations. This means that for any kind of host-level update, especially one that touches risky components like firmware drivers and bootloaders, we do not need to send a technician with physical access to the farmer’s device but can perform the update remotely and safely.

Upgrading servers and continuous delivery work well in controlled environments like the cloud, but when your devices are mounted on a tractor in a field with flaky network connectivity, a traditional approach won’t cut it, especially when you’re managing a global fleet of remote sensing devices without IT technicians in the field while keeping pace with the rapidly evolving NVIDIA Jetson ecosystem. We needed a solution before deployment.

Another important aspect to consider is the lifespan of agricultural products, which is typically around a decade. Having control of both our application software and the host level lets us meet these commitments without having to fetch and replace devices. We evaluated possible scenarios: maintaining our own OS was one, but it would require a large development effort. We also tried solutions built on a Yocto-based OS, like Balena, but they didn’t fit our stack.

In a workshop on running Kairos on Raspberry Pi, we met members of the Kairos community and discussed our challenges with them. One of the principal challenges was flashing the OS on our board, the Jetson Orin NX, since it was not yet one of their supported devices. It was not a trivial development as the hardware was brand new (early 2023), and getting our hands dirty with bootloaders, partition layouts, NVMe, and device trees still did not make us embedded engineers, but we managed to make a proof of concept (PoC). Our findings with the new Jetson board led to some valuable breakthroughs, which we contributed back to the community.

One initial benefit was the elimination of “snowflakes” in our fleet. A major pain point before Kairos was that we had to deal with different packages on the OS, including some with different hardware drivers. Troubleshooting was inconsistent, and keeping track of discrepancies was getting out of hand. The concept of immutability and a containerized OS fit perfectly, and our PoC showed that the fleet could be reliable and almost maintenance-free. Having a homogeneous fleet, and containerizing the app in addition to the OS, gives us the desired operational flexibility, enabling Day 2 operations once device-specific provisioning is done.

Another direct benefit is that the provisioning procedure no longer includes package installation steps on the host; it now only generates cloud resources and copies over certificates. Because the devices are pre-flashed with Kairos, this significantly reduces both the risk of human error and the time it takes to get a device ready for a customer installation, leading to higher operational efficiency.

How do these OS upgrades actually work? Another reason we chose Kairos is that it is designed with Kubernetes and its ecosystem in mind. The K3s single-node clusters we deploy take care of the updates on their own. By using the Kairos operator (or system upgrade controller), we can provide an image that is built through our CI and released in our private registry. The existing OS image already has permission to access the registry, because registry access is part of the K3s configuration (set through cloud-init) it is flashed with, so it can easily pick up the new image and perform the upgrade. The device reboots, and it is now running a different OS, whether it’s a minor package version that changed or the whole base OS. If something goes wrong, the A/B upgrade approach makes sure the OS boots into its fallback image, so we avoid bricking a device that is hundreds of kilometers away. Strictly speaking, we never upgrade the OS in place. Instead, we replace it fully with a newer image version, as Kairos provides atomic image-based upgrades in line with how immutable systems work, guaranteeing no partial states or drift.
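
To make the A/B idea concrete, here is a minimal sketch in Python. It is a toy model rather than Kairos code: the `Device` class, the `is_bootable` health check, and the image tags are all hypothetical names we made up for illustration, and a real device decides what to boot in the bootloader, not in Python. What it shows is the core behavior described above: an upgrade swaps in a whole new image, the previous image is kept as the fallback, and a failed boot lands on that fallback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Device:
    """Toy model of a device with an active and a fallback (passive) OS image."""

    active: str
    passive: str
    running: str = ""

    def upgrade(self, new_image: str) -> None:
        # Image-based upgrade: the whole image is swapped, never patched in place.
        # The previously active image is kept as the fallback.
        self.passive = self.active
        self.active = new_image

    def boot(self, boots_ok: Callable[[str], bool]) -> str:
        # Try the active image first; fall back to the passive one if it fails.
        for image in (self.active, self.passive):
            if boots_ok(image):
                self.running = image
                return image
        raise RuntimeError("no bootable image")


def is_bootable(image: str) -> bool:
    # Hypothetical health check: pretend images tagged "-broken" fail to boot.
    return not image.endswith("-broken")


device = Device(active="os:v1", passive="os:v0")

# A good upgrade: the device reboots into the new image.
device.upgrade("os:v2")
good = device.boot(is_bootable)

# A bad upgrade: the device falls back to the last image that worked.
device.upgrade("os:v3-broken")
fallback = device.boot(is_bootable)

print(good, fallback)
```

In this simplified model the device ends up running the last known-good image after the broken upgrade, which is the property that matters when nobody can physically reach the hardware.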

It’s not too much work to maintain Kairos OSes because we use a Dockerfile with all the packages (public or custom ones) for our specific setup. These range from NVIDIA CUDA binaries to device-specific firmware for the LTE modem or the cameras. Applying a firmware overlay fix to improve how a peripheral behaves is now a piece of cake, and can be done on all of our devices remotely and simultaneously, as long as they have a stable network connection.

Collaborating with the Kairos community was a huge benefit to our fleet’s maintainability and future-proofing. Kairos helped solve problems that could be tackled with an immutable OS, and helped us achieve our goals of reliable upgrades, a long lifecycle, and predictable operations. In return, we were able to provide support to Kairos for the Jetson Orin NX platform, unlocking a family of NVIDIA devices equipped with an NVMe disk. This highlights the strength of open source: solving a real production problem while contributing improvements that others can build on.