fieldnotes

Fix the day

Michael Gonzalez

17 May 2026 • 1 min read

We don't have direct S3 access. Entra-AWS identity friction makes that a non-starter for now, which means passively reading external tables from S3 isn't viable; not unless we're committing to full truncate-and-reload cycles on billion-row tables. We're not.

So we built around it.

Internal tables in Snowflake, daily partitioning by ingest date, custom tasks and procedures for orchestration. One of our vendors has persistent data quality issues at the source. The nightly delta pull lands in a dated S3 folder; a procedure fires via cron a couple hours after the expected job run and pulls that partition into Snowflake. A control table logs every ingestion. A SQL view reads the most recent ingestion number and crosschecks record counts against the rolling average for that table.

It's not perfect. It's 99.9% better than what we were doing before.

Just to note: Snowflake's own data loading documentation recommends this partitioning structure. Date-stamped folders, load by partition, copy any fraction of that data with a single command. The control table pattern is also well-established; pipelines that track watermarks, row counts, and execution metadata per load are a standard approach for operational visibility without bolting on external monitoring tooling. What Snowflake now packages natively in Data Metric Functions (freshness checks, row count anomaly detection on a schedule) is essentially what the view is doing by hand.

The partitioning model is where this really pays off on a bad data day. When a vendor sends garbage, you blow away that day's partition and reload a clean one. No surgery on a billion-row table, no downstream contamination you're chasing for three days. You just fix the day.

Sometimes the constraint produces the better solution.