We have a small project with four layers:
1. Raw — incoming raw data
2. Validation —Relationship and data validations. Transformations into facts and dims
3. Reporting — joining facts and dims to meet presentation requirements
4. Business Application — presentation layer. Views, with security where needed, of the reporting objects
When we added relationship tests between project layers, we started getting false test failures.
The raw layer is a bit like dbt’s stage layer, we do renaming and some very simple changes here and have validation for duplicates and nulls. The validation layer we build dims and facts, check data, count relationships and other tests. In reporting we build objects to help the BI tool visualize the data and business_app are simply the exposed, secure version of the reporting models.
Currently, we build the project in layers so if there’s a failure in one layer nothing in the next layer builds. Pseudo code of our cmd list would be:
dbt build -s layer:raw dbt build -s layer:validation dbt build -s layer:reporting dbt build -s layer:business_app
(yes, yes.. layer isn’t a real selector, but you get it)
The problem was, we added relationship tests between the layers, so our validation.yml has things like :
- name: validation_layer_model_1 tests: - dbt_utils.equal_rowcount: compare_model: ref('raw_layer_model_A')
Because of the ref() to the raw layer, this relationship test is run as part of
dbt build -s layer:raw
So we build raw_layer_model_A with more data, but haven’t yet run validation_layer_model_1 and now this rowcount fails.
We thought about different ways to address this:
- Change the selections we use to build the project. 👎 We’ve been otherwise happy with this command structure
- Write custom tests. 👎 more code to maintain
- Write where clauses to the relationship test to be sure the fact build time is after the associated raw build time. 👎 not all our models have build timestamp, so would need modify a bunch of them.
- Add some 🏷️s. 👍
- ✅ simple
- ✅ no new sql code to write or maintain
- 🏁
We added tags to all cross-layer relationship tests.
- name: validation_layer_model_1 tests: - dbt_utils.equal_rowcount: compare_model: ref('raw_layer_model_A') tags: val_raw_relationship
We updated the project build job as follows:
dbt build -s layer:raw --exclude tag:val_raw_relationship dbt build -s layer:validation --exclude tag:rep_val_relationship dbt build -s layer:reporting dbt build -s layer:business_app
We’ve found that the excluded tests were correctly run when we built the next layer in the next build statement. But you could also add explicitly call them with a test cmd:
dbt build -s layer:raw --exclude tag:val_raw_relationship dbt build -s layer:validation --exclude tag:rep_val_relationship dbt test -s tag:val_raw_relationship dbt build -s layer:reporting --exclude tag:val_rep_relationship dbt test -s tag:rep_val_relationship dbt build -s layer:business_app
For more insights, check out our dbt-related blogs or visit the dbt Developer Hub for comprehensive guides and updates.
Is your organization adapting to remain competitive? In The Architect Room, we design a roadmap for the future of your digital organization, while capitalizing on current technology investments. Our knowledge spans a variety of offerings across all of the major public cloud providers. Visit Red Pill Analytics or feel free to reach out on Twitter, Facebook, and LinkedIn.