Cloud Native Live: Designing and Operating Reliable Cloud Services – A View from the Trenches

Users’ expectations for cloud service availability are sky-high. When we open these apps, we expect them to work as reliably as water flows from the tap. But while systems are up and running, few people focus on reliability. Trying to guarantee 100% availability is cost-prohibitive and counterproductive, since it pulls resources away from the innovation and development that drive business growth. These tradeoffs are hard.

In this live webinar, we’ve assembled a panel of experts who led teams through these challenges during rapid cloud expansion at businesses including AWS, Amazon.com, Google, Microsoft, and IBM. They’re here to share their experiences, perspectives, and hard-won wisdom from the trenches. They’ll discuss how to identify potential reliability issues before they become customers’ concerns; how to implement proactive monitoring, alerting, and remediation systems; and best practices for designing cloud-based services for maximum availability.

This panel deeply believes in investing in reliability. They show this not only through their day jobs but also by helping launch the reliability.org community as founding members. Their shared vision of a community of like-minded folks who can continue the reliability discussion sparked this special CNCF webinar, which marks the community’s launch.

Yokohama, Japan