Confidence with Chaos for your Kubernetes Observability (ENG)

Kube-prometheus has successfully deployed the Kubernetes observability stack with Prometheus, and the dashboards provide many interesting insights. What’s next? Everything is overwhelming and your teams are drowning in alerts. The alerts need tuning, the dashboards need finer-grained graphs, and your team is discussing service level objectives (SLOs). Your SRE and DevOps teams also need documentation and action items. You could simulate a production incident to see whether SLOs are met and whether alerts fire. Is there a way to observe the deployed application and see if it breaks under chaos?

Join this talk to dive into cloud-native chaos engineering, app instrumentation, and distributed tracing, and to learn about production incidents with failed SLOs. Gain confidence with chaos as an SRE, and see the value of Observability as a developer. Welcome to day 2 DevOps.

Michael Friedrich

Michael Friedrich is a Senior Developer Evangelist at GitLab, focusing on Observability, SRE, and Ops. He studied Hardware/Software Systems Engineering and moved into DNS and monitoring development at the University of Vienna and ACO.net. Michael was a maintainer of an OSS monitoring project for 11 years before joining GitLab. He loves to help educate everyone and regularly speaks at events and meetups. Michael co-founded the #EveryoneCanContribute cafe meetup group to learn cloud-native & DevOps. Michael is a Polynaut advisor at Polywork, created o11y.love as a learning platform for Observability, and shares insights in the opsindev.news newsletter.