Photo by Photo by Ashley Batz on Unsplash
Exploring dbt Test Features
Struggling with the dbt row_count package bug? Follow our step-by-step guide to quickly resolve the issue and keep your data pipeline running smoothly.
dbt provides a variety of intrinsic testing options. Among these, you’ll find fundamental tests like unique, not_null, accepted_values, and relationships.
Additionally, dbt offers external packages, and this blog post centers on one of them: dbt_utils. Inside this package, there’s a valuable test for comparing row counts.
To put it simply, this test evaluates two models by comparing their respective row counts to determine if they align. This testing process holds significance for validation, ensuring data integrity, and pinpointing potential project issues.
The Issue
In a dbt project within Big Query, a problem occurs when utilizing the dbt_utils.equal_rowcount package.
This problem surfaces when one of the tables being compared is empty, as the test is not equipped to handle such scenarios within Big Query.
The error message indicates that “none” is not recognized as the expected ‘integer’ type. In typical cases with other applications, querying the row count of an empty table yields a result of 0 (an integer).
However, the equal_rowcount function returns “none,” implying a potential Python-related issue within the function.
Resolving the Issue
We will develop a custom test to replace the dbt_utils rowcount function. This new test will consistently return 0 rows instead of null, effectively preventing the error from occurring.
Here’s the step-by-step process:
- Generate a custom test function.
- Integrate this custom test into the YAML configuration.
- Execute dbt to observe the bug vanishing instantly — like magic!
This custom test, housed in the test section of the dbt project, functions similarly to the equal rowcount dbt package but ensures error-free operation.
The Code
{% test equal_rowcount_bigquery (model,column_name, compare_model) %}
with rpt as (
SELECT count(*) as test_model_count
FROM {{model}}
), valid as (
select count(*) as compare_to_model_count
from {{compare_model}}
)
select *
from rpt,valid
where rpt.test_model_count != valid.compare_to_model_count
{% endtest %}
This jinja snippet is created in a new test model within the dbt project. The test compares the row count of two different models.
Within the yaml of one of the two models, (in this case we are comparing compare_to_model to test_model) a rowcount test was added.
- name: test_model
tests:
- equal_rowcount_bigquery:
compare_model: ref('compare_to_model')
What Comes Next?
Now that the test has been put into action and can be evaluated, one of two outcomes will occur: it either fails or passes.
Failures
When the test doesn’t pass, there are additional actions to take.
Upon test failure, it will generate compiled SQL code, which gets stored in the target file within your dbt project. This compiled code should then be copied and pasted into Big Query for further analysis to pinpoint the cause of the failure.
The failure signals a mismatch in row counts between the models. The SQL output will reveal the row counts in each model and offer guidance on resolving the model discrepancies.
Sql query with bug:
When the tests are executed with the recommended solution, they will no longer trigger the “none is not of type integer” error. Instead, they will accurately report 0 rows instead of null, effectively sidestepping the frustrating bug.
Passes
If the new custom generic test successfully passes, it signifies that the source table matches the target, which is always a favorable outcome.
In this scenario, there won’t be any SQL code generated in the target due to the successful test outcome. However, if there were, it would resemble the following:
Conclusion
The challenge involving the dbt_utils.equal_rowcount package has been successfully addressed. The issue caused by empty tables has been resolved by implementing a custom test that consistently returns 0 rows, eliminating the error.
Is your organization adapting to remain competitive? In The Architect Room, we design a roadmap for the future of your digital organization, while capitalizing on current technology investments. Our knowledge spans a variety of offerings across all of the major public cloud providers. Visit Red Pill Analytics or feel free to reach out on Twitter, Facebook, and LinkedIn.
For more insights, check out our dbt-related blogs or visit the dbt Developer Hub for comprehensive guides and updates.