2024-02-28 - Landing to history for intraday change data¶
Problem¶
The original landing into history pattern used a simple row hash (excluding the effective date) to determine if we had seen a record before.
This works fine for most data, except where we have intraday (or daily) changes to an attribute like ‘status’, in that scenario we can miss status changes as the row hash would be the same if a status went from Active to Cancelled, then back to Active again, in that example we wouldn’t load the second Active record as we have already seen that record (row hash)
Solution¶
The solution was to include the effective date in the row hash to increase the granularity for intraday changes, and make sure we persist records that have changed multiple times in the same day.
This also makes sure that all status changes are captured as the inclusion of the effective date in the row hash of the status change records means we persist them all into the downstream history table.
Leverage the Magic¶
A small change to our extensible jinja template pattern so that if the user provides effective date in their data files, we can correctly pickup intraday change records
Magician Partner¶
'{event_id}' AS event_id,
{%- if "_key" in config.source_columns[0] -%}
{{config.source_columns[0]}} AS row_key,
{%- else -%}
{% if parsed.history_effective_date or "effective_date" in config.source_columns|lower %}
FARM_FINGERPRINT(TO_JSON_STRING(ARRAY(SELECT AS STRUCT DATE(effective_date) AS effective_date,first_parse.* EXCEPT(effective_date)))) AS row_key,
{% else %}
FARM_FINGERPRINT(TO_JSON_STRING(ARRAY(SELECT AS STRUCT first_parse.* EXCEPT(effective_date)))) AS row_key,
{%- endif -%}
{%- endif -%}
first_parse.*,
{%- if "expiry_date" not in config.source_columns -%}
TIMESTAMP(NULL) AS expiry_date,
{%- endif -%}
FALSE AS deleted_ind
Last Refreshed¶
Doc Refreshed: 2024-05-20