Master dbt Testing: How to Avoid False Negatives when Testing Between Project Layers

We have a small project with four layers:

1. Raw — incoming raw data
2. Validation —Relationship and data validations. Transformations into facts and dims
3. Reporting — joining facts and dims to meet presentation requirements
4. Business Application — presentation layer. Views, with security where needed, of the reporting objects

When we added relationship tests between project layers, we started getting false test failures.

The raw layer is a bit like dbt’s stage layer, we do renaming and some very simple changes here and have validation for duplicates and nulls. The validation layer we build dims and facts, check data, count relationships and other tests. In reporting we build objects to help the BI tool visualize the data and business_app are simply the exposed, secure version of the reporting models.

Currently, we build the project in layers so if there’s a failure in one layer nothing in the next layer builds. Pseudo code of our cmd list would be:

dbt build -s layer:raw
dbt build -s layer:validation
dbt build -s layer:reporting
dbt build -s layer:business_app

(yes, yes.. layer isn’t a real selector, but you get it)

The problem was, we added relationship tests between the layers, so our validation.yml has things like :

- name: validation_layer_model_1
  tests:
    - dbt_utils.equal_rowcount:
        compare_model: ref('raw_layer_model_A')

Because of the ref() to the raw layer, this relationship test is run as part of

dbt build -s layer:raw

So we build raw_layer_model_A with more data, but haven’t yet run validation_layer_model_1 and now this rowcount fails.

We thought about different ways to address this:

Change the selections we use to build the project. 👎 We’ve been otherwise happy with this command structure
Write custom tests. 👎 more code to maintain
Write where clauses to the relationship test to be sure the fact build time is after the associated raw build time. 👎 not all our models have build timestamp, so would need modify a bunch of them.
Add some 🏷️s. 👍

- ✅ simple
- ✅ no new sql code to write or maintain
- 🏁

We added tags to all cross-layer relationship tests.

- name: validation_layer_model_1
  tests:
    - dbt_utils.equal_rowcount:
        compare_model: ref('raw_layer_model_A')
        tags: val_raw_relationship

We updated the project build job as follows:

dbt build -s layer:raw --exclude tag:val_raw_relationship
dbt build -s layer:validation --exclude tag:rep_val_relationship
dbt build -s layer:reporting
dbt build -s layer:business_app

We’ve found that the excluded tests were correctly run when we built the next layer in the next build statement. But you could also add explicitly call them with a test cmd:

dbt build -s layer:raw --exclude tag:val_raw_relationship
dbt build -s layer:validation --exclude tag:rep_val_relationship
dbt test -s tag:val_raw_relationship
dbt build -s layer:reporting --exclude tag:val_rep_relationship
dbt test -s tag:rep_val_relationship
dbt build -s layer:business_app

For more insights, check out our dbt-related blogs or visit the dbt Developer Hub for comprehensive guides and updates.

Is your organization adapting to remain competitive? In The Architect Room, we design a roadmap for the future of your digital organization, while capitalizing on current technology investments. Our knowledge spans a variety of offerings across all of the major public cloud providers. Visit Red Pill Analytics or feel free to reach out on Twitter, Facebook, and LinkedIn.

Master dbt Testing: How to Avoid False Negatives when Testing Between Project Layers

About The Author

Alyssa Kunau

Master dbt Testing: How to Avoid False Negatives when Testing Between Project Layers

About The Author

Alyssa Kunau

Related Posts