Table Of Contents
Table Of Contents

2023-03-27 - Multiple Tile Sources Iteration

Release

Status: Available

Type: DataOps

Date: 2023-03-27

Problem

Joining different sources together inherits complexity when managing load windows.

Solution

Update the exisitng lookup_load_position to make it smarter and handle more scenarios.

Leverage the Magic

A behind the curtain feature, no action by Data Magicians required.

ADI

Wohoo!, as an AgileData Data Ops Engineer, I like simplicity in the patterns that we depend on to safely move data through the layers.

Customer

Rad!, being able to just pick tiles, join tiles, augment tiles and having the data refresh safely is a huge plus for me. I never have to worry about how the data gets refreshed, it just turns up … like magic!

Magician Partner

lookup_load_position has been updated to handle multi source rules (and lagging dependant tables). lookup_load_position(source, target) is a function we use to determine what the ‘partition filter’ value should be whenever we run a change rule to load from source into target. All change rukes include a where condition of where effective_date >= ‘{source partition}’ , eg where effective_date >= ‘2023-03-01’, When a rule has multiple source tables, they all get the same filter applied (based on the driving source table). When we run the config to create/update a target table we check when the source last updated the target, or when the target was created (it may be a new table) then set the filter to be a trailing 3 day window. This is fine for a simple source to target, but if a target has multilple sources, eg a consume which joins a concept to multiple detail tables, and one of those details has stopped loading for whatever reason then using the filter based on the concept would miss records if that detail was reloaded. The new pattern checks the last updated/created for all the tables used in a rule and sets the filter to the oldest date if one of the sources is out of sync with the others. When this happens a note is put in the log that the load watermark hasd been rolled back to a date outside the current 3 day trailing pattern.

Last Refreshed

Doc Refreshed: 2024-03-02