Rethinking threat detection and response in cloud native ecosystems

Posted on February 1, 2024


Community post originally published on DZone by Nigel Douglas

In highly dynamic cloud-native environments, traditional Threat Detection and Response (TDR) approaches are increasingly showing their limitations. With its unique architecture and operational dynamics, Kubernetes demands a re-evaluation of how we handle security threats, particularly in the context of Endpoint Detection & Response (EDR) solutions.

The Traditional EDR Approach: SIGKILL and Its Limitations

Traditionally, EDR solutions have relied heavily on the SIGKILL signal on Linux systems to terminate processes deemed malicious or risky. SIGKILL cannot be caught or ignored: it forcibly stops a process and its running threads, offering no chance for the process to complete any cleanup operations. While effective in terminating processes, this brute-force approach can lead to unintended consequences like data loss or corruption. It’s a method suited for systems where immediate cessation is paramount, but in the cloud-native world, such an approach can be too heavy-handed, particularly for mission-critical applications.

The Need for a More Nuanced Approach: SIGTERM and Kubernetes

In Kubernetes environments, a more nuanced approach is often employed. Before SIGKILL is used, a SIGTERM signal is sent, giving containers the opportunity to shut down gracefully within the pod’s termination grace period. This method of graceful termination highlights a key principle in cloud-native ecosystems: the need for balance between aggressive threat mitigation and maintaining system integrity and stability. That’s where Falco Talon comes into play. It was designed as an open-source response engine for isolating threats, specifically in Kubernetes. It enhances Falco, the cloud-native detection engine, with a no-code response solution. DevOps teams can author simple Talon rules that respond to Falco events in real time.

Falco Talon: Bridging DevOps and DevSecOps

Falco Talon represents a paradigm shift in how we approach TDR in cloud-native systems. Unlike traditional EDR solutions that focus solely on killing processes, Falco Talon offers the option to gracefully terminate workloads. This approach not only mitigates the risk more effectively but also aligns closely with DevOps best practices.

Security teams can create Falco security rules in Kubernetes-native language (YAML), allowing for more transparent and collaborative security management. DevOps teams can interact directly with the agent via Helm, ensuring smooth upgrades and downgrades without compromising system stability. This security integration into the DevOps workflow bridges the gap between DevOps and DevSecOps, fostering a more holistic and effective approach to cloud-native security.

Operational Implications of Graceful Termination

When a threat is detected, such as terminal shell activity on an over-privileged workload, the response isn’t a blunt-force termination but a graceful pod shutdown. This method ensures that DevOps teams are promptly notified and can take appropriate action to adjust the security context as needed. It’s a stark contrast to traditional EDR approaches, where operations teams might be left in the dark about SIGKILL activities, leading to more problems than solutions.

Web applications in production should handle termination gracefully so that the impact on end users is minimal and the time-to-recovery is as short as possible. Talon’s “kubernetes:terminate” actioner works through the intended Kubernetes primitives, asking Kubernetes to terminate the pod rather than applying the legacy approach of a SIGKILL against a process. The resulting SIGTERM lets the pod know it is about to be shut down.

The application code can listen for this event and start shutting down cleanly at this point. According to Sandeep Dinesh, a developer advocate at Google Cloud, this process may include stopping any long-lived connections (like a database connection or WebSocket stream) while saving the current state and ultimately avoiding potential data loss.
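As a rough illustration of what “listening for this event” can look like in application code, the Python sketch below records the SIGTERM request in a handler and lets the main loop do the actual cleanup. GracefulWorker and its attributes are hypothetical names, and the boolean connection flag merely stands in for a real database or WebSocket connection.

```python
import signal
import threading


class GracefulWorker:
    """Hypothetical worker that finishes up cleanly when asked to stop."""

    def __init__(self):
        self.stop_requested = threading.Event()
        self.processed = 0            # state worth preserving across restarts
        self.connection_open = True   # stand-in for a database or WebSocket connection
        self.saved_state = None

    def handle_sigterm(self, signum, frame):
        # Keep the handler tiny: just record that shutdown was requested.
        self.stop_requested.set()

    def install(self):
        # Register the handler; must be called from the process's main thread.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def run(self, max_items=1000):
        # Do work until asked to stop (max_items only bounds this demo).
        while not self.stop_requested.is_set() and self.processed < max_items:
            self.processed += 1
        self.shutdown()

    def shutdown(self):
        # Close long-lived connections first, then persist the current state.
        self.connection_open = False
        self.saved_state = {"processed": self.processed}
```

In a real service, calling install() at startup wires the handler to SIGTERM. The cleanup then has to finish within the pod’s terminationGracePeriodSeconds (30 seconds by default), because Kubernetes follows up with SIGKILL once that period expires.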

A New Paradigm in Networking Security

The shift from perimeter-based firewalls to workload-level security in the form of Kubernetes Network Policies signifies another aspect of this evolution. Tools like Calico and Cilium have moved security and operations teams away from complex direct interactions with iptables, reducing the risk of service disruptions due to potential misconfigurations. However, this shift also necessitates a robust zero-trust networking design, where immediate threat response is based more on assigned labels than on explicit IP address denials.

Falco Talon and Network Policy Enforcement

Falco Talon extends its capabilities to network security. When Falco detects a suspicious IP address, Talon can automatically enforce granular NetworkPolicies based on the detected IP address. This proactive approach is crucial in a cloud-native environment where threats can rapidly propagate across interconnected services. By automating the enforcement of NetworkPolicies with the “kubernetes:networkpolicy” actioner, Falco Talon ensures a swift and precise response to detected threats, minimising the risk of widespread system compromise.
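To give a feel for what such a policy might contain, the Python sketch below builds a standard Kubernetes NetworkPolicy manifest that blocks egress to a single address. This is an assumption-laden illustration rather than Talon’s actual implementation: the deny_egress_to_ip helper and the policy name are hypothetical. Because Kubernetes NetworkPolicies are allow-lists, the address is excluded by allowing everything except that one IP.

```python
import ipaddress


def deny_egress_to_ip(namespace, pod_labels, suspicious_ip):
    """Build a NetworkPolicy manifest allowing all egress except to one IP."""
    ip = ipaddress.ip_address(suspicious_ip)  # raises ValueError on bad input
    if ip.version == 4:
        everything, suffix = "0.0.0.0/0", 32
    else:
        everything, suffix = "::/0", 128
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # The policy name is a hypothetical choice for this sketch.
        "metadata": {"name": "block-suspicious-egress", "namespace": namespace},
        "spec": {
            # Apply only to the pods carrying these labels.
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            "ipBlock": {
                                "cidr": everything,
                                "except": [f"{ip}/{suffix}"],
                            }
                        }
                    ]
                }
            ],
        },
    }
```

Serialising the result with json.dumps produces a manifest that kubectl can apply, since kubectl accepts JSON as well as YAML. Note that NetworkPolicies are additive, so another policy that explicitly allows the same address for the same pods would still permit the traffic.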

Falco Talon and the Need for Automated Tagging

As highlighted in a Calico network security workshop for AWS, users were advised to apply default-deny network policies that match on label context. In this case, if a pod has the label “quarantine=true”, then all of its traffic will be logged and denied by this NetworkPolicy configuration. This process itself is not really automated. If the threat is, let’s say, a crypto-miner like XMRig running within a Kubernetes pod, the DevOps team needs to identify the mining activity and then run a command like the one below in order to activate the network policy on that specific pod:

Shell

kubectl label pod <pod-name> -n <pod-namespace> quarantine=true

Falco Talon addresses this dilemma. Falco can detect within milliseconds when a crypto miner makes system calls associated with the Stratum mining protocol. When this event is detected in Falco, Talon can then apply the action “kubernetes:labelize” to add, modify or delete labels associated with pods. The ability to instantly apply a label to a suspicious pod works hand-in-hand with desired automation efforts, such as a quarantine Kubernetes Network Policy that selects pods by that label.
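A label-driven quarantine policy of this kind could look roughly like the manifest built by the Python sketch below. It uses the standard Kubernetes NetworkPolicy API rather than the Calico-specific policy from the workshop, so it denies traffic but does not log it; the quarantine_policy helper and the policy name are hypothetical.

```python
def quarantine_policy(namespace, label_key="quarantine", label_value="true"):
    """Build a NetworkPolicy that isolates every pod carrying the given label."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # The policy name is a hypothetical choice for this sketch.
        "metadata": {"name": "quarantine-deny-all", "namespace": namespace},
        "spec": {
            # Select only pods labelled quarantine=true (by default).
            "podSelector": {"matchLabels": {label_key: label_value}},
            # Listing both types with no ingress/egress rules allows nothing.
            "policyTypes": ["Ingress", "Egress"],
        },
    }
```

Once a policy like this exists in the namespace, applying the label is the only step needed to isolate a pod, which is exactly the step an automated response can take over. As with any NetworkPolicy, other policies that select the same pod and allow traffic would still apply, so a sketch like this assumes no such allowances are in place.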

Legacy EDR Technologies vs. DevOps Best Practices

One of the fundamental reasons legacy EDR technologies struggle to align with DevOps best practices is their often closed-source, black-box approach to threat detection. In many enterprises, operations teams do not have access to Managed Detection and Response (MDR) platforms. This lack of accessibility leaves them in the dark regarding the reasons behind workload terminations by the detection engine, leading to significant operational challenges. Operations teams, integral to the DevOps model, require transparency and the ability to troubleshoot and respond effectively. This is where traditional EDR tools fall short, as their opaque nature limits the scope for understanding and resolving security incidents.

Falco, by contrast, offers a solution that is more in tune with DevOps principles. It employs a transparent, open-source rules engine that is fully customizable, allowing both developers and security practitioners to define and adjust security policies in a universally understandable language: YAML. This approach not only fosters collaboration between development and security teams but also enhances operational efficiency. By allowing for the identification and exclusion of false positives, operations teams can fine-tune the system to avoid unnecessary disruptions. This level of control and visibility is critical in a DevOps environment, where the rapid iteration and deployment of applications demand an agile and adaptable security approach.

The shift from legacy EDR tools to solutions like Falco represents a move towards more integrated, transparent, and flexible security practices. These practices are essential in a cloud-native landscape where DevOps is not just a methodology but a critical framework for operational success.

Conclusion: A Shift Towards Adaptive, Integrated TDR Strategies

The evolution of threat detection and response in cloud-native ecosystems is not just a technological upgrade but a fundamental shift in mindset. The move from a one-size-fits-all, process-killing approach to a more adaptive, integrated strategy reflects a deeper understanding of the complexities and nuances of modern Cloud Detection & Response (CDR) approaches. By embracing tools like Falco Talon, organizations can ensure their TDR strategies are not only effective in mitigating risks but also in harmony with the operational and developmental realities of cloud-native systems.

In this new era, the collaboration between security and operations teams is key. Security is no longer a siloed function but an integral part of the entire lifecycle of cloud-native applications. As we continue to navigate the challenges and opportunities presented by cloud-native technologies, rethinking our approach to threat detection and response becomes not just advisable but essential.
