2024-08-15 - Bucket Sync¶
Problem¶
When we receive replicated files from a customer, we need to keep each received file in its current location, because the replication service relies on seeing both the source and target files to keep them in sync.
The issue is that we also need to move the files into filedrop to load them, then archive the processed (loaded) files.
If we take a file out of the replication bucket, the replication service will just keep copying it back in.
Solution¶
A new cloud function called bucket_sync() copies replicated files into filedrop so they can be processed, while leaving the original files in place.
The logic is based on a time window for new files, and it handles full reloads (i.e. parsing all the files again) as well as new groups of files turning up.
It can also be called with a single folder name to load just one set of files, which is useful for testing or onboarding new datasets.
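The function itself isn't reproduced here, so the sketch below is only illustrative of the copy-in-place idea: it assumes Google Cloud Storage via the google-cloud-storage client, and the parameter names (`window_minutes`, `folder`) and bucket arguments are hypothetical stand-ins for however the real bucket_sync() is configured.

```python
from datetime import datetime, timedelta, timezone

from google.cloud import storage


def bucket_sync(source_bucket_name, filedrop_bucket_name,
                window_minutes=60, folder=None):
    """Copy replicated objects into filedrop for processing,
    leaving the originals in the replication bucket untouched.

    Hypothetical parameters:
      window_minutes -- only copy objects updated within this window
                        (ignored when a single folder is requested).
      folder         -- optional prefix; when set, copy everything under
                        that folder (e.g. testing or onboarding a dataset).
    """
    client = storage.Client()
    source_bucket = client.bucket(source_bucket_name)
    filedrop_bucket = client.bucket(filedrop_bucket_name)

    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)

    for blob in client.list_blobs(source_bucket_name, prefix=folder):
        # Single-folder mode copies everything under the prefix;
        # otherwise only pick up objects newer than the time window.
        if folder is None and blob.updated < cutoff:
            continue
        # copy_blob leaves the source object in place, so the
        # replication service still sees source and target in sync.
        source_bucket.copy_blob(blob, filedrop_bucket, blob.name)
```

A full reload would simply widen the window (or pass a folder prefix) so every file is copied and parsed again, without ever touching the originals in the replication bucket.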
Leverage the Magic¶
This was a new cloud function to support a new ingestion pattern in our processing layer.
ADI¶
Nailed it!
Last Refreshed¶
Doc Refreshed: 2024-08-21