
2023-05-22 - Version Dropped Files Automatically

Release

Status: Available

Type: DataOps

Date: 2023-05-22

Problem

When Data Magicians drop multiple files that have no version or timestamp in the file name, the files end up being overwritten in the archived bucket because they share the same name.

When Data Magicians drop an initial load file and backfill load files at the same time, there is no pattern to safely ‘rack and stack’ them into the same landing tile. The file name was used to name the landing tile, so multiple versioned files dropped together would end up in different target landing tiles.

Solution

Check whether the dropped file name already contains a date and, if it does not, append the date the file was dropped. This automatically provides file versioning for landing and history data, as well as versioning of archived files.
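A minimal sketch of the date check in Python, assuming a date in the file name looks like YYYYMMDD or YYYY-MM-DD / YYYY_MM_DD and that the drop date is appended as `_YYYYMMDD` before the extension. `DATE_PATTERN` and `version_file_name` are hypothetical names for illustration, not AgileData's actual implementation.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Hypothetical rule: a standalone run like 20230522, 2023-05-22 or 2023_05_22
# counts as "the file name already carries a date".
DATE_PATTERN = re.compile(r"(?<!\d)\d{4}[-_]?\d{2}[-_]?\d{2}(?!\d)")


def version_file_name(file_name: str, drop_date: date) -> str:
    """Append the drop date to a file name unless it already contains a date."""
    path = PurePosixPath(file_name)
    if DATE_PATTERN.search(path.stem):
        # Already versioned by the sender, so leave the name unchanged.
        return file_name
    # Insert _YYYYMMDD between the stem and the extension.
    return str(path.with_name(f"{path.stem}_{drop_date:%Y%m%d}{path.suffix}"))


print(version_file_name("customers.csv", date(2023, 5, 22)))
print(version_file_name("customers_2023-05-01.csv", date(2023, 5, 22)))
```

With a date-only suffix, two same-named files dropped on the same day would still share a name; appending a full timestamp instead, as the Magician Partner notes suggest, narrows that window.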

Leverage the Magic

A magical feature, no action by Data Magicians required.

ADI

Eureka! No more file drop errors when multiple files are dropped at the same time.

Customer

Jolly good! I can drop files to my heart’s content and AgileData will magically make my data turn up.

Magician Partner

If a user drops the same file name over and over, append a timestamp to the file name when moving it to processed_. This is important so we can accurately track which file wrote which rows (event_id) into our landing and history tables.

We augment the existing pattern that renames a file if it starts with a digit: we check for a date, datetime or timestamp in the file name and, if none exists, append one. This lets us version incoming files and keep track of them when they are moved to the processed_ bucket.

We leverage the existing regex file name checks and tidy pattern (where we check for invalid table name characters and remove them) to append a date to the file name if none is found.
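The rename steps described above can be sketched in Python as one tidy function. This is a hypothetical illustration, not the actual AgileData pattern: it assumes invalid table-name characters are anything outside `[A-Za-z0-9_]`, that the existing digit rule is satisfied by an invented `f_` prefix, and that the appended timestamp uses the format YYYYMMDDHHMMSS. `INVALID_CHARS`, `DATETIME_PATTERN` and `tidy_file_name` are illustrative names.

```python
import re
from datetime import datetime

# Assumed: anything outside letters, digits and underscore is invalid in a table name.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9_]+")
# Assumed: a date is YYYYMMDD (optionally separated), optionally followed by HHMMSS.
DATETIME_PATTERN = re.compile(
    r"(?<!\d)\d{4}[-_]?\d{2}[-_]?\d{2}(?:[-_T ]?\d{2}[-:_]?\d{2}[-:_]?\d{2})?(?!\d)"
)


def tidy_file_name(file_name: str, dropped_at: datetime) -> str:
    """Tidy a dropped file name and append a timestamp if it has no date."""
    stem, dot, ext = file_name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = file_name, ""
    # Existing tidy step: remove invalid table-name characters.
    stem = INVALID_CHARS.sub("", stem)
    # Existing rule: rename names that start with a digit (f_ prefix is hypothetical).
    if stem[:1].isdigit():
        stem = f"f_{stem}"
    # New step: version the file with its drop timestamp if no date is present.
    if not DATETIME_PATTERN.search(stem):
        stem = f"{stem}_{dropped_at:%Y%m%d%H%M%S}"
    return f"{stem}.{ext}" if ext else stem


dropped = datetime(2023, 5, 22, 10, 15, 0)
print(tidy_file_name("Customer List.csv", dropped))
print(tidy_file_name("orders_2023-05-01.csv", dropped))
```

Running the date check after the invalid-character removal means separators such as `-` are already stripped, so `2023-05-01` is detected as `20230501`. The tidied name is what would be used when the file is moved to processed_, so each load keeps a distinct name that can be traced back to the rows it wrote.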

Last Refreshed

Doc Refreshed: 2024-05-20