Case Study

Infosys Ltd. Client

Observability and Governance at Scale in Financial Services with Prometheus

Introduction

A leading U.S. financial services company offering life, disability, and long-term care insurance, annuities, and wealth management operates under stringent security and compliance mandates. Its engineering ecosystem spans more than 1,000 GitLab projects that use Terraform, AWS CDK, Kubernetes, cron jobs, and Control-M workloads. Infosys was engaged to build a unified observability and governance platform using Prometheus and Grafana, consolidating fragmented tooling into a single pane of glass for metrics, compliance, and automated incident response.

Published: April 9, 2026


The Challenge: Fragmented tooling and compliance risk at scale

The client’s platform engineering team managed a sprawling cloud-native estate: more than 1,000 GitLab projects, hundreds of Kubernetes workloads, Kafka streams, Databricks and Spark jobs, and Aurora databases, all under heavy regulatory scrutiny. Maintaining reliability, security, and compliance had become increasingly complex.

Logs and metrics were scattered across Kafka, GitLab, Splunk, Aurora, and container registries, with no single view of system health or compliance posture. CI/CD compliance checks were inconsistently enforced, and vulnerabilities in containers and packages took nearly two weeks to remediate. Manual API key rotations, hand-driven compliance checks, and the absence of automated task routing created persistent audit risks.

Non-compliant deployments were slipping through, vulnerability remediation lagged behind SLAs, and governance teams relied on manual processes that didn’t scale. The business needed a unified, automated approach, and it had to be built on open-source, cloud-native foundations to avoid vendor lock-in.

The Solution: Prometheus as a unified governance platform

The Infosys team designed and implemented a unified observability and governance platform combining Prometheus for scalable metrics collection and alerting with Grafana as a centralized command center for visualization, policy enforcement, and automated actions.
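As a minimal sketch of how compliance state could become something Prometheus scrapes, the Python below renders per-project records in the Prometheus text exposition format. The `ProjectCompliance` record, its fields, and the metric names `gitlab_project_compliant` and `gitlab_project_open_critical_vulnerabilities` are hypothetical; the case study does not say how its GitLab metrics are produced. A production exporter would more likely use the official `prometheus_client` library and serve this output over HTTP at `/metrics`; the sketch hand-renders it to stay dependency-free.

```python
from dataclasses import dataclass


@dataclass
class ProjectCompliance:
    """Hypothetical per-project compliance record pulled from GitLab."""
    project: str
    pipeline_checks_passed: bool
    open_critical_vulns: int


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(records: list[ProjectCompliance]) -> str:
    """Render records as two gauges in the Prometheus text exposition format.

    The metric names are illustrative assumptions, not taken from the case study.
    """
    lines = [
        "# HELP gitlab_project_compliant 1 if CI/CD compliance checks passed, else 0.",
        "# TYPE gitlab_project_compliant gauge",
    ]
    for r in records:
        lines.append(
            f'gitlab_project_compliant{{project="{_escape(r.project)}"}} '
            f"{int(r.pipeline_checks_passed)}"
        )
    lines += [
        "# HELP gitlab_project_open_critical_vulnerabilities Open critical findings per project.",
        "# TYPE gitlab_project_open_critical_vulnerabilities gauge",
    ]
    for r in records:
        lines.append(
            f'gitlab_project_open_critical_vulnerabilities{{project="{_escape(r.project)}"}} '
            f"{r.open_critical_vulns}"
        )
    # The exposition format expects each sample line to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics([ProjectCompliance("payments-api", True, 0)]), end="")
```

With metrics in this shape, a Grafana compliance panel could count non-compliant projects with a PromQL query such as `count(gitlab_project_compliant == 0)`.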

The team executed a phased implementation:

  1. Data integration: Connected telemetry from Kafka, Databricks, Kubernetes, Aurora, GitLab, Splunk, and container registries. Prometheus scrapes metrics from GitLab, Kubernetes, and Spark workloads. AWS CloudWatch feeds infrastructure-level metrics (CPU, memory, network, Aurora DB performance). GitLab pipeline logs flow through the Grafana Agent into Loki for centralized log querying.
  2. Dashboards and analytics: Built Grafana dashboards for CI/CD compliance, vulnerability tracking, infrastructure health, Aurora cost optimization, and Kubernetes resource utilization, all powered by Prometheus metrics and by Redshift-based trend analytics over Splunk and container registry data.
  3. Alerting and automation: Configured Prometheus alert rules and Grafana alerting to trigger webhooks for automated task creation. Alerts flow to an Action Items Service that auto-creates Jira and ServiceNow tickets, with status synced back to Grafana dashboards. Notifications also reach teams via Slack and email.
  4. Security and access control: Implemented SSO, RBAC, and audit logging to ensure secure, role-based access to dashboards and governance workflows.
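The webhook-to-ticket step in the alerting flow can be sketched as follows, assuming the alert payload follows Alertmanager's webhook JSON shape (an `alerts` array whose entries carry `status`, `labels`, `annotations`, and `fingerprint`). The `TicketRequest` type, the `ticket_system` and `team` routing labels, their defaults, and the in-memory deduplication set are hypothetical stand-ins for whatever the Action Items Service actually uses.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketRequest:
    """Hypothetical ticket request forwarded to the Action Items Service."""
    system: str     # "jira" or "servicenow"
    dedup_key: str  # alert fingerprint, used to avoid duplicate tickets
    summary: str
    team: str


def tickets_from_webhook(body: str, seen: set[str]) -> list[TicketRequest]:
    """Convert an Alertmanager-style webhook body into ticket requests.

    Assumption: alert rules set illustrative `ticket_system` and `team` labels
    for routing; these label names are not taken from the case study.
    """
    payload = json.loads(body)
    tickets: list[TicketRequest] = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts would update an existing ticket instead
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        # Prefer the sender's fingerprint; fall back to a stable key built from labels.
        key = alert.get("fingerprint") or json.dumps(labels, sort_keys=True)
        if key in seen:
            continue  # already ticketed, e.g. a repeat notification for the same alert
        seen.add(key)
        tickets.append(
            TicketRequest(
                # Defaults below are illustrative assumptions.
                system=labels.get("ticket_system", "jira"),
                dedup_key=key,
                summary=annotations.get("summary", labels.get("alertname", "unnamed alert")),
                team=labels.get("team", "platform"),
            )
        )
    return tickets
```

Keying on the alert fingerprint keeps a repeatedly firing alert from opening a new Jira or ServiceNow ticket on every notification; a real service would persist that state rather than hold it in memory.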

Solution architecture

The architecture integrates data sources, ETL and analytics, and visualization/alerting into a unified observability and governance platform.
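To illustrate the kind of trend metric the ETL and analytics layer derives, the sketch below computes weekly median remediation time and SLA-breach share for vulnerability findings. The case study runs this analytics in Redshift; this is a Python stand-in, and the `(detected_on, fixed_on)` record shape and the `sla_days` parameter are assumptions, since the source does not state the client's SLA.

```python
from collections import defaultdict
from datetime import date
from statistics import median
from typing import Iterable, Optional


def weekly_remediation_trend(
    findings: Iterable[tuple[date, Optional[date]]], sla_days: int
) -> dict[tuple[int, int], dict[str, float]]:
    """Median days-to-fix and SLA-breach share, grouped by ISO week of the fix.

    Assumption: each finding is a hypothetical (detected_on, fixed_on) pair,
    with fixed_on set to None while the finding is still open.
    """
    by_week: dict[tuple[int, int], list[int]] = defaultdict(list)
    for detected, fixed in findings:
        if fixed is None:
            continue  # still open; open-finding age would be tracked separately
        year, week, _ = fixed.isocalendar()
        by_week[(year, week)].append((fixed - detected).days)
    trend = {}
    for key in sorted(by_week):  # chronological order for a time-series panel
        days = by_week[key]
        trend[key] = {
            "median_days": median(days),
            "sla_breach_ratio": sum(d > sla_days for d in days) / len(days),
        }
    return trend
```

Grouping by the week the fix landed makes the series show whether remediation is getting faster over time, which is the question a vulnerability-tracking dashboard needs to answer.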

Figure: Solution architecture, showing the data sources, ETL and analytics, and visualization and notifications layers, plus the technology stack.

The Impact: From fragmented tooling to automated governance

The results were immediate and measurable.

A unified platform for observability and governance

By consolidating metrics, logs, and compliance insights into a single platform built on Prometheus and Grafana, the organization replaced fragmented monitoring tools and manual governance processes with centralized visibility and automated workflows. Engineering and governance teams can now monitor over 1,000 GitLab projects through unified Grafana dashboards, while Prometheus-driven alerts automatically trigger remediation workflows through Jira, ServiceNow, and Slack.

This approach significantly improved compliance enforcement, reduced vulnerability remediation time, and optimized infrastructure costs while decreasing manual governance effort. Built on open-source, cloud-native technologies running on Kubernetes and AWS, the platform also provides a scalable foundation for expanding observability and governance as the organization’s cloud environment continues to grow.