Designing Data Pipelines That Don’t Hate You Six Months Later
Most data pipelines don’t fail on day one. They fail months later — when requirements change, data volume grows, or the original developer is no longer around. Much like martial arts technique, pipelines built on shaky fundamentals tend to collapse under pressure.
This session focuses on practical data pipeline design patterns that emphasize durability, maintainability, and operational confidence. Drawing from experience modernizing enterprise data platforms, I’ll walk through common failure modes in batch and streaming pipelines and how disciplined upfront design decisions can prevent them.
We’ll cover topics such as schema evolution, idempotency, observability, testing strategies, and scaling — grounded in real examples of what worked, what broke, and what we had to correct after the fact. There are no silver bullets here — just repeatable patterns and lessons learned from systems that had to stay running while everything around them changed.
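To make the idempotency point concrete: a pipeline step is idempotent when replaying the same batch leaves the destination unchanged. A minimal sketch of one common pattern — keying writes on a deterministic event ID and upserting, so retries and reruns are safe. The `events` table and `event_id`/`payload` fields here are illustrative assumptions, not from the talk; SQLite stands in for whatever store you actually use.

```python
import sqlite3

def upsert_events(conn, events):
    """Idempotent load: each event's id is the primary key, so replaying
    a batch updates existing rows instead of inserting duplicates.
    (Table and field names are hypothetical, for illustration only.)"""
    conn.executemany(
        "INSERT INTO events (event_id, payload) VALUES (?, ?) "
        "ON CONFLICT(event_id) DO UPDATE SET payload = excluded.payload",
        [(e["event_id"], e["payload"]) for e in events],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")

batch = [{"event_id": "e1", "payload": "a"}, {"event_id": "e2", "payload": "b"}]
upsert_events(conn, batch)
upsert_events(conn, batch)  # replay the same batch: no duplicate rows

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # → 2
```

The same idea carries over tool-agnostically: in Spark it might be a merge on a natural key, in Kafka consumers a dedup on message key — the invariant is that reprocessing is always safe.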
This talk is tool-agnostic and focused on principles that apply whether you’re using Python, Spark, Kafka, or traditional ETL platforms.
What’s in it for the attendee
- How to design pipelines that withstand change and growth
- Patterns for building observability and testability into data systems
- Common anti-patterns that lead to brittle pipelines
- Practical guidance rooted in real production experience