2024-08-15 - Processing Enhancements¶
Problem¶
With the onboarding of a new customer, a number of existing processes need to scale to handle the large volumes of files being ingested every day.
Solution¶
The cloud function layer which handles all file processing and data loading had a whole raft of enhancements deployed in July, including:
Auto schema generation now sets known source timestamp columns (e.g. created_at, updated_at) to a timestamp data type when we generate a schema, instead of defaulting to a string data type.
Handling of nested files, e.g. when a folder is dropped into the load bucket and files are then dropped into that folder, we end up with file names like /prod/customer/20240101.csv. The file name parser has been updated to resolve this pattern automatically.
New autogenerated system trust rule for effective dates that checks for duplicate timestamps for a business key. This is important for detecting when change data is received but the timestamp that specifies its effectivity is not unique.
Updated rule templates to handle joining change tables, to make sure the point in time between tables matches up correctly. Historically we just joined tables and used effective_date is null to pick up the latest record for each business key.
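To make the enhancements above a little more concrete, here is a minimal Python sketch of the ideas involved. It is not the actual cloud function code. The function names, the TIMESTAMP and STRING type labels, the business_key column name and the list of known timestamp columns are all assumptions made for illustration. Only created_at, updated_at, effective_date and the example file name come from the notes above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical set of column names treated as known source timestamps.
KNOWN_TIMESTAMP_COLUMNS = {"created_at", "updated_at"}


def infer_schema(column_names):
    """Type known timestamp columns as TIMESTAMP, everything else as STRING."""
    return {
        name: "TIMESTAMP" if name.lower() in KNOWN_TIMESTAMP_COLUMNS else "STRING"
        for name in column_names
    }


def parse_nested_object_name(object_name):
    """Split a nested object name into its folder parts and the bare file name."""
    path = PurePosixPath(object_name.lstrip("/"))
    return {
        "folders": list(path.parts[:-1]),
        "file_name": path.name,
        "stem": path.stem,
    }


def duplicate_effective_timestamps(rows, key="business_key", ts="effective_date"):
    """Return the (business key, timestamp) pairs that occur more than once."""
    counts = Counter((row[key], row[ts]) for row in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


def as_of(rows, business_key, point_in_time, key="business_key", ts="effective_date"):
    """Return the latest row for a business key effective at or before point_in_time.

    Timestamps are assumed to be ISO 8601 strings in one consistent format,
    so plain string comparison orders them correctly.
    """
    candidates = [
        row for row in rows
        if row[key] == business_key and row[ts] <= point_in_time
    ]
    return max(candidates, key=lambda row: row[ts], default=None)
```

For example, parse_nested_object_name("/prod/customer/20240101.csv") separates the folders prod and customer from the file name 20240101.csv, and as_of picks the record that was effective at a given point in time rather than simply the latest one, which is the idea behind matching change tables up at the same point in time.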
Leverage the Magic¶
We are constantly making small improvements and enhancements to our data processing layer.
ADI¶
Nailed it!
Last Refreshed¶
Doc Refreshed: 2024-08-21