A Thought on ODI Variables
This is my first contribution on Medium under the Red Pill Analytics brand. I hope I will keep up the high standards of my co-workers.
One of the projects I have been working on recently had a requirement to enrich some data by bringing back additional information from a legacy system, For various reasons they did not want the whole of the legacy system extracted to enterprise data warehouse. We only need extract data for records that change in the current load cycle.
This may sound simple: we join the active records to the legacy to create an enriched data set. There is however a problem with this approach (or at least on Oracle databases where the legacy system is accessed through a database link) and that is driving site hints are ignored on insert statements. We end up a horrible execution plan as mentioned here by Jonathan Lewis. Yes, there are solutions, such as the pipelined table approach that I developed for that customer, but here I needed to do something in entirely in ODI.
Sometimes we need to step back and think “is the use of a join the right thing to do?” Can I code this another way that doesn’t require me to write a function that I will need to maintain in the database?
ODI gives us two fundamental ways of restricting the amount of data we bring back, the join and the filter; is there a way we could use a FILTER to restrict our data? Obviously we can hard code filters such as “data1.STATUS=‘Active’” but what if we need flexibility to dynamically choose our status value. This is of course simple for a single value — we use a variable in the filter. But what if we need to filter using an IN list, for example ‘data_source.key IN(1,2,5,7,99,2345)’? how do we supply a variable number of values? Here is one method that we can use.
Oracle 11gR2 introduced the LISTAGG function to the database. This function basically concatenates several rows of data into a single string. In our case we want to generate a comma separated list of values that we can slot into our filter expression in ODI. The variable refresh is a simple SELECT LISTAGG… from our list source table. Our filter expression in ODI would look something like
The Oracle documentation for LISTAGG has the syntax. For our use we need to specify the delimiter between items. For numeric matches this will need to be ‘,’ a comma, it is a bit more complex for string matches, here we need to quote each item so would need to quote our comma — we also need to quote the variable in the ODI filter condition so that we supply leading and trailing quotes
For example we could define the refresh for the variable Filter_List as
SELECT LISTAGG(TO_CHAR(PRODUCT_KEY) ,’,’) WITHIN GROUP (ORDER BY PRODUCT_KEY) from PRODUCTS_LOADED
I try to remember to explicitly to define type conversions when converting between numbers and text to avoid any implicit conversion surprises.
There are of course limitations to this approach. We can only have a finite size for a ODI character variable and there is that limitationon in-list size in the database. But for reasonable sized selections this approach works well.