Michael Gregory

Site Reliability Engineer at OverDrive

Michael Gregory started his career as a full-stack web developer but has gradually moved deeper into backend systems. He is now part of an SRE team that primarily supports OverDrive's search systems. He has experience managing Kafka, RabbitMQ, Elasticsearch, and Kubernetes clusters. That's more than enough to keep the world's user interfaces safe from him.

Is Your Software Going to Fail?

No matter how hard we try, our software doesn't always behave the way we expect. Our applications will inevitably fail, sometimes in spectacular ways. Not all failures are our fault, but we can prepare for many problems, even the external ones. In this talk, we'll consider various reasons our apps might fail, ranging from bugs in our own code to problems with the hosting platform. Every application has its own set of failure conditions to consider. I'll challenge you to think about how you can make your application more resilient, and I'll share examples of how I prepare for failure in the distributed systems I manage.

1:00 PM

New hope (Live 15 - Simul 10, 11)