How Not to Take Down Your SCADA Source

Data Engineering

Full-refresh ETL against a vendor-hosted SCADA source is the easy default and the wrong one. Here's why backfills push pipelines toward full refresh, what it actually costs the source, and the layered incremental pattern (overlap window, updated_at pass, reconciliation, planned deep pulls) that gets the same correctness without the call from the vendor.

By John Wassilak

We Run Our SDLC Out of Git

Data Engineering

We put our entire SDLC in Git. Requirements, decisions, task assignments, everything. Then we cancelled standup. Nobody complained. OK, I complained, which is apparently how you get assigned the blog post about it.

By John Wassilak