In the depths of time before ODI 12c, ODI Studio had a fairly simple canvas to design our interfaces. We dropped on our datastores (tables) and then dragged columns between tables to create joins, and columns to the canvas to make filters. If we needed to use expressions we just typed them on the target table and selected where to execute the expression, be it source, stage or target.

In a data warehouse world we frequently need to lookup values, typically this would be by joining to a dimension table on the business key and picking up the surrogate key from the dimension. Sometimes, we would need to outer join to cover the case of missing keys (hopefully only on dimension loads), downstream we would convert the resulting NULL to another value using some form of expression. Of course, the more adventurous of us would write custom IKMs to handle dimension loads and fact load key lookup chains, the later is not too hard if we create a metadata table to identify all the keys to supply.

ODI 12c changes things. We now have a palette of components that we can drop on to our logical design canvas. On the palette we have the join operator and, now, a look-up operator. So what’s the difference between the two and why should we use one or the other.

Firstly, there is nothing wrong with using the pre-ODI 12c design technique of joining to a table supplying our lookup data using the key value… it works, and working is always the fundamental requirement to meet.

The lookup component gives us some additional functionality compared to a join. Firstly we can supply a default value for a key that does not exist in our look-up table. Using a join we might have to do an outer join followed by an expression operator providing a CASE statement to replace the NULL from the unmatched key. This however is not the only thing we can do with lookup component as we shall see.

Let’s take a look in the Oracle ODI 12.2 VM. In the ODI LCM Trunk repository there is a mapping that uses a lookup, and in this case we are not finding a key, but instead enhancing a feed to give age ranges.

We supply our lookup condition

Next we define our Match Row Rules

What happens if our key is not unique in the lookup table? This can happen in a versioned table. Sometimes we can change our conditions to force a single row select, but sometimes we need to be able to handle multiple rows. There is a dropdown to help us define our action

As you see, a lot of choice. The last three choices require us to specify an ‘order by’ column This can be good if we need to enforce using the most recent value in the lookup table.

Below the Multiple Match Rows dropdown there is another for No-Match Rows. This is the action to take when a match is not found. By default this is set to return a row, in other words do an outer join. We have two choices here, inner or outer join:

Finally, we can supply default values for the no-match rows. For example, we can set a key value to a key for ‘not known’.

So how does this look in the generated SQL?

Setting a default value of ‘0–110’ for age range we see

select 
 SRC_CUSTOMER_A.CUST_ID,
 INITCAP(SRC_CUSTOMER_A.FIRST_NAME) || ' ' || INITCAP(SRC_CUSTOMER_A.LAST_NAME),
 SRC_CUSTOMER_A.ADDRESS,
 SRC_CUSTOMER_A.CITY_ID,
 SRC_CUSTOMER_A.PHONE,
 SRC_CUSTOMER_A.AGE,
 CASE WHEN (SRC_AGE_GROUP_A.AGE_MIN IS NULL AND SRC_AGE_GROUP_A.AGE_MAX IS NULL AND SRC_AGE_GROUP_A.AGE_RANGE IS NULL) THEN '0-110' ELSE SRC_AGE_GROUP_A.AGE_RANGE END,
 SRC_SALES_PERSON_A.FIRST_NAME || ' ' || INITCAP(SRC_SALES_PERSON_A.LAST_NAME)
from (ODI_STAGING.C$_0SRC_CUSTOMER SRC_CUSTOMER_A LEFT OUTER JOIN ODI_STAGING.C$_1SRC_AGE_GROUP SRC_AGE_GROUP_A 
 ON SRC_CUSTOMER_A.AGE between SRC_AGE_GROUP_A.AGE_MIN and SRC_AGE_GROUP_A.AGE_MAX = SRC_AGE_GROUP_A.AGE_RANGE
 ) INNER JOIN ODI_STAGING.C$_2SRC_SALES_PERSON SRC_SALES_PERSON_A 
 ON SRC_CUSTOMER_A.SALES_PERS_ID = SRC_SALES_PERSON_A.SALES_PERS_ID
where 
 (1=1)

The code in bold italics is specific to our LOOKUP component. A left outer join in the select and a case statement to replace the NULL on not matching.

Setting the lookup to return the last single value the code becomes more complex as we add in some analytics and change the way the case statement works. Note, I am using the supplied mapping as an example, it is most unlikely that I’d really order on the AGE_MIN!

select 
 SRC_AGE_GROUP.CUST_ID,
 INITCAP(SRC_AGE_GROUP.FIRST_NAME) || ' ' || INITCAP(SRC_AGE_GROUP.LAST_NAME),
 SRC_AGE_GROUP.ADDRESS,
 SRC_AGE_GROUP.CITY_ID,
 SRC_AGE_GROUP.PHONE,
 SRC_AGE_GROUP.AGE,
 CASE WHEN SRC_AGE_GROUP.TC > 1 OR SRC_AGE_GROUP.AGE_MIN IS NOT NULL AND SRC_AGE_GROUP.AGE_MAX IS NOT NULL AND SRC_AGE_GROUP.AGE_RANGE IS NOT NULL THEN SRC_AGE_GROUP.AGE_RANGE ELSE '0-110' END,
 SRC_SALES_PERSON_A.FIRST_NAME || ' ' || INITCAP(SRC_SALES_PERSON_A.LAST_NAME)
from (
SELECT 
 SRC_CUSTOMER_A.DEAR DEAR ,
 SRC_CUSTOMER_A.LAST_NAME LAST_NAME ,
 SRC_CUSTOMER_A.FIRST_NAME FIRST_NAME ,
 SRC_CUSTOMER_A.ADDRESS ADDRESS ,
 SRC_CUSTOMER_A.CITY_ID CITY_ID ,
 SRC_CUSTOMER_A.PHONE PHONE ,
 SRC_CUSTOMER_A.AGE AGE ,
 SRC_CUSTOMER_A.SALES_PERS_ID SALES_PERS_ID ,
 SRC_CUSTOMER_A.CUST_ID CUST_ID ,
 SRC_AGE_GROUP_A_1.AGE_MIN AGE_MIN ,
 SRC_AGE_GROUP_A_1.AGE_MAX AGE_MAX ,
 SRC_AGE_GROUP_A_1.AGE_RANGE AGE_RANGE ,
 ROW_NUMBER() OVER (PARTITION BY SRC_CUSTOMER_A.AGE ORDER BY SRC_AGE_GROUP_A_1.AGE_MIN ASC) RN ,
 COUNT(*) OVER (PARTITION BY SRC_CUSTOMER_A.AGE) TC 
FROM 
 ODI_STAGING.C$_0SRC_CUSTOMER SRC_CUSTOMER_A LEFT OUTER JOIN (
SELECT 
 SRC_AGE_GROUP_A.AGE_MIN AGE_MIN ,
 SRC_AGE_GROUP_A.AGE_MAX AGE_MAX ,
 SRC_AGE_GROUP_A.AGE_RANGE AGE_RANGE 
FROM 
 ODI_STAGING.C$_1SRC_AGE_GROUP SRC_AGE_GROUP_A 
 ) SRC_AGE_GROUP_A_1 
 ON SRC_CUSTOMER_A.AGE between SRC_AGE_GROUP_A_1.AGE_MIN and SRC_AGE_GROUP_A_1.AGE_MAX = SRC_AGE_GROUP_A_1.AGE_RANGE
 ) SRC_AGE_GROUP INNER JOIN ODI_STAGING.C$_2SRC_SALES_PERSON SRC_SALES_PERSON_A 
 ON SRC_AGE_GROUP.SALES_PERS_ID = SRC_SALES_PERSON_A.SALES_PERS_ID
where 
 (1=1)
 and (TC=RN)

Erroring on multiple rows increases the complexity further:

select 
 SRC_CUSTOMER_A_1.CUST_ID,
 INITCAP(SRC_CUSTOMER_A_1.FIRST_NAME) || ' ' || INITCAP(SRC_CUSTOMER_A_1.LAST_NAME),
 SRC_CUSTOMER_A_1.ADDRESS,
 SRC_CUSTOMER_A_1.CITY_ID,
 SRC_CUSTOMER_A_1.PHONE,
 SRC_CUSTOMER_A_1.AGE,
 SRC_CUSTOMER_A_1.COL_2,
 SRC_SALES_PERSON_A.FIRST_NAME || ' ' || INITCAP(SRC_SALES_PERSON_A.LAST_NAME)
from (
SELECT 
 SRC_CUSTOMER_A.DEAR DEAR ,
 SRC_CUSTOMER_A.LAST_NAME LAST_NAME ,
 SRC_CUSTOMER_A.FIRST_NAME FIRST_NAME ,
 SRC_CUSTOMER_A.ADDRESS ADDRESS ,
 SRC_CUSTOMER_A.CITY_ID CITY_ID ,
 SRC_CUSTOMER_A.PHONE PHONE ,
 SRC_CUSTOMER_A.AGE AGE ,
 SRC_CUSTOMER_A.SALES_PERS_ID SALES_PERS_ID ,
 SRC_CUSTOMER_A.CUST_ID CUST_ID ,
 (
SELECT 
 SRC_AGE_GROUP.AGE_MIN AS AGE_MIN 
FROM 
 (
SELECT 
 SRC_AGE_GROUP_A.AGE_MIN AS AGE_MIN ,
 SRC_AGE_GROUP_A.AGE_MAX AS AGE_MAX ,
 SRC_AGE_GROUP_A.AGE_RANGE AS AGE_RANGE 
FROM 
 ODI_STAGING.C$_1SRC_AGE_GROUP SRC_AGE_GROUP_A 
 ) SRC_AGE_GROUP 
WHERE
 (SRC_CUSTOMER_A.AGE between SRC_AGE_GROUP.AGE_MIN and SRC_AGE_GROUP.AGE_MAX = SRC_AGE_GROUP.AGE_RANGE
) ) COL ,
 (
SELECT 
 SRC_AGE_GROUP.AGE_MAX AS AGE_MAX 
FROM 
 (
SELECT 
 SRC_AGE_GROUP_A.AGE_MIN AS AGE_MIN ,
 SRC_AGE_GROUP_A.AGE_MAX AS AGE_MAX ,
 SRC_AGE_GROUP_A.AGE_RANGE AS AGE_RANGE 
FROM 
 ODI_STAGING.C$_1SRC_AGE_GROUP SRC_AGE_GROUP_A 
 ) SRC_AGE_GROUP 
WHERE
 (SRC_CUSTOMER_A.AGE between SRC_AGE_GROUP.AGE_MIN and SRC_AGE_GROUP.AGE_MAX = SRC_AGE_GROUP.AGE_RANGE
) ) COL_1 ,
 COALESCE(((
SELECT 
 SRC_AGE_GROUP.AGE_RANGE AS AGE_RANGE 
FROM 
 (
SELECT 
 SRC_AGE_GROUP_A.AGE_MIN AS AGE_MIN ,
 SRC_AGE_GROUP_A.AGE_MAX AS AGE_MAX ,
 SRC_AGE_GROUP_A.AGE_RANGE AS AGE_RANGE 
FROM 
 ODI_STAGING.C$_1SRC_AGE_GROUP SRC_AGE_GROUP_A 
 ) SRC_AGE_GROUP 
WHERE
 (SRC_CUSTOMER_A.AGE between SRC_AGE_GROUP.AGE_MIN and SRC_AGE_GROUP.AGE_MAX = SRC_AGE_GROUP.AGE_RANGE
) )),'0-110') COL_2 
FROM 
 ODI_STAGING.C$_0SRC_CUSTOMER SRC_CUSTOMER_A 
 ) SRC_CUSTOMER_A_1 INNER JOIN ODI_STAGING.C$_2SRC_SALES_PERSON SRC_SALES_PERSON_A 
 ON SRC_CUSTOMER_A_1.SALES_PERS_ID = SRC_SALES_PERSON_A.SALES_PERS_ID
where 
 (1=1)

Finally, electing to ignore no-match rows removes the outer join

select 
 SRC_CUSTOMER_A.CUST_ID,
 INITCAP(SRC_CUSTOMER_A.FIRST_NAME) || ' ' || INITCAP(SRC_CUSTOMER_A.LAST_NAME),
 SRC_CUSTOMER_A.ADDRESS,
 SRC_CUSTOMER_A.CITY_ID,
 SRC_CUSTOMER_A.PHONE,
 SRC_CUSTOMER_A.AGE,
 SRC_AGE_GROUP_A.AGE_RANGE,
 SRC_SALES_PERSON_A.FIRST_NAME || ' ' || INITCAP(SRC_SALES_PERSON_A.LAST_NAME)
from (ODI_STAGING.C$_0SRC_CUSTOMER SRC_CUSTOMER_A INNER JOIN ODI_STAGING.C$_1SRC_AGE_GROUP SRC_AGE_GROUP_A 
 ON SRC_CUSTOMER_A.AGE between SRC_AGE_GROUP_A.AGE_MIN and SRC_AGE_GROUP_A.AGE_MAX = SRC_AGE_GROUP_A.AGE_RANGE
 ) INNER JOIN ODI_STAGING.C$_2SRC_SALES_PERSON SRC_SALES_PERSON_A 
 ON SRC_CUSTOMER_A.SALES_PERS_ID = SRC_SALES_PERSON_A.SALES_PERS_ID
where 
 (1=1)

This just shows the richness of implementation available in the Lookup operator — and more importantly a way to remove the multiple steps we may have to implement with the join component.

You may also like

Leave a comment