Workspace & Account Design
Metastore-per-region, workspaces by environment, service principals, IAM tied to your IdP. The boring foundation that scales past 50 users.
Metastore · Workspaces · SP
Databricks gave the lakehouse a name. We give it discipline: Unity Catalog as the spine, Delta as the storage truth, DLT for declarative pipelines, MLflow for the models that live next to the data they were trained on.
Not a notebook farm. A platform — with workspaces, lineage, contracts and CI.
Declarative pipelines with expectations, auto-recovery, schema evolution and CDC. Operational primitives we shouldn't have to write anymore.
DLT · Expectations · CDC
One catalog, cross-workspace lineage, fine-grained access, attribute-based policies, AI-assisted classification. Audit-grade without bureaucracy.
Unity · Lineage · ABAC
Serverless SQL for BI and ad hoc queries, Photon for the heavy joins, query-result caching that actually fires. BI tools never know it's a lake.
SQL Warehouse · Photon · Serverless
Tracking, registry, model serving, feature engineering, vector search. Production ML without leaving the platform the data lives in.
MLflow · Mosaic · Vector · Feature Store
Right-sized clusters with autoscaling and spot, job vs all-purpose split, cost-tag policies, DBU dashboards by team. Lake economics, in the open.
DBU · Spot · Autoscale · Tagging
The Medallion pattern works because each layer has one job. We make the contracts between layers explicit, tested and observable.
Each Medallion layer has a contract: Bronze is raw and append-only, Silver is clean and de-duped, Gold is business-ready. Delta Live Tables expresses the whole thing declaratively — including expectations, lineage and recovery.
Bronze: raw data lands once, schema evolution is handled, and replay is always possible.
Silver: deduplicated, joined to dimensions, with the contract enforced via DLT expectations.
Gold: star schemas for BI, feature tables for ML. Same governance, different consumer.
Lineage from source to dashboard, ABAC policies, classification — one catalog, every workspace.
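To make the layer contracts concrete, here is a minimal sketch in plain Python rather than the DLT API. Every name in it (`Expectation`, `bronze_append`, `build_silver`, `build_gold`, the sample order rows) is hypothetical and chosen for illustration. It assumes a Silver contract of "drop rows that fail any expectation, then keep the latest row per key", which is one reasonable reading of the layers above, not the only one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict  # illustrative row type; a real pipeline would use Delta tables


@dataclass
class Expectation:
    """A named row-level check, loosely modelled on the idea of a DLT expectation."""
    name: str
    predicate: Callable[[Row], bool]


def bronze_append(bronze: list[Row], batch: Iterable[Row]) -> None:
    """Bronze contract: raw and append-only. Rows are copied in, never changed."""
    bronze.extend(dict(row) for row in batch)


def build_silver(bronze: list[Row], key: str,
                 expectations: list[Expectation]) -> tuple[list[Row], dict[str, int]]:
    """Silver contract: drop rows failing any expectation, keep the last row per key."""
    violations = {e.name: 0 for e in expectations}
    latest: dict[object, Row] = {}
    for row in bronze:
        failed = [e.name for e in expectations if not e.predicate(row)]
        for name in failed:
            violations[name] += 1
        if not failed:
            latest[row[key]] = row  # later arrivals overwrite earlier ones
    return list(latest.values()), violations


def build_gold(silver: list[Row], group_by: str, measure: str) -> dict[object, float]:
    """Gold contract: a business-ready aggregate (sum of measure per group)."""
    totals: dict[object, float] = {}
    for row in silver:
        totals[row[group_by]] = totals.get(row[group_by], 0) + row[measure]
    return totals


# Hypothetical sample data: order events arriving in one batch.
bronze: list[Row] = []
bronze_append(bronze, [
    {"order_id": 1, "region": "EU", "amount": 100},
    {"order_id": 2, "region": "US", "amount": -5},   # fails amount_non_negative
    {"order_id": 1, "region": "EU", "amount": 120},  # later version of order 1
    {"order_id": 3, "region": None, "amount": 40},   # fails region_present
    {"order_id": 4, "region": "US", "amount": 60},
])

expectations = [
    Expectation("amount_non_negative", lambda r: r["amount"] >= 0),
    Expectation("region_present", lambda r: r["region"] is not None),
]

silver, violations = build_silver(bronze, "order_id", expectations)
gold = build_gold(silver, "region", "amount")

print(violations)
print(gold)
```

Because Bronze is never mutated, Silver and Gold can be rebuilt from it at any time, which is what makes replay possible. The violation counts are the kind of signal that makes a contract observable rather than implied.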
Capabilities we've shipped at scale. Production runbooks on file for each.
Three quick takes.
Bronze-silver-gold rebuild on DLT with expectations, schema evolution and replay. Operational headcount halved, freshness improved.
Feature engineering as Delta tables, online + offline feature store, MLflow registry to Model Serving with traffic shadowing.
Metastore-per-region, ABAC policies aligned to GxP, classification & lineage from source to BI. Auditor finished early.
30 minutes. Bring your top three pipelines and your last DBU bill — we'll point to where the platform is buying its weight, and where it isn't.