2023-03-27 - Orchestration Iteration
Waiting on jobs inside a serverless cloud function is an inefficient pattern.
Instead, we use a log event trigger (BigQuery job completed): rather than waiting for a job to finish, the cloud functions shut down and the log event triggers the next function to start.
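A minimal sketch of the event-driven shape, assuming the handler receives the Cloud Logging entry for a BigQuery job completion as a plain dict. The field paths follow the legacy BigQuery AuditData log format (`protoPayload.serviceData.jobCompletedEvent`) and are an assumption to check against your own log sink; `handle_job_completed` and the `start_next` callback are hypothetical names, not the platform's actual code. The point is that nothing polls: the function runs once per event, hands off, and exits.

```python
from typing import Any, Callable, Optional


def _get(d: Any, *path: str) -> Any:
    """Walk nested dict keys, returning None if any key is missing."""
    for key in path:
        if not isinstance(d, dict) or key not in d:
            return None
        d = d[key]
    return d


def handle_job_completed(log_entry: dict, start_next: Callable[[str], None]) -> Optional[str]:
    """Hypothetical handler: react to one 'job completed' log event instead of polling.

    Field paths assume the legacy BigQuery AuditData log format.
    Returns the job id that was handed on, or None if nothing was triggered.
    """
    job = _get(log_entry, "protoPayload", "serviceData", "jobCompletedEvent", "job")
    if job is None:
        return None  # not a job-completed event; ignore it
    job_id = _get(job, "jobName", "jobId")
    state = _get(job, "jobStatus", "state")
    error = _get(job, "jobStatus", "error")
    if state != "DONE" or error:
        return None  # failed or unexpected state: leave it to error handling
    start_next(job_id)  # hand off to the next step; this invocation then ends
    return job_id


# Usage: a trimmed-down log entry for a successful job.
started = []
entry = {"protoPayload": {"serviceData": {"jobCompletedEvent": {"job": {
    "jobName": {"jobId": "job_123"},
    "jobStatus": {"state": "DONE"},
}}}}}
handle_job_completed(entry, started.append)
```

Compared with a 30-second polling loop, no function instance is held open while BigQuery does the work; the cost moves to being able to reconstruct context from the event alone.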
Leverage the Magic
A behind-the-curtain feature; no action is required by Data Magicians.
Boom! As an AgileData Data Ops Engineer, I like simplicity in the patterns we depend on to safely move data through the layers.
Hurrah! It’s great to know the platform is constantly being improved to make it simpler and more efficient.
Previously we used a ‘wait’ function that checked job statuses every 30 seconds until the jobs completed. This was inefficient: a cloud function sat tied up while it waited, which caused scaling issues. That pattern has been replaced with a log trigger that fires whenever a BigQuery job completes. We have always used this log trigger to start post-processing tasks (e.g. updating the catalog, validating data), but it is now our core orchestration trigger. When a BigQuery job completes, it triggers the bigquery_job_completed cloud function, which checks the job status, initiates post-processing and triggers the next config job.
This meant a complete rework of our core orchestration functions, in particular manifest creation and checking (for ensembles). The original ‘wait’ pattern suited manifest processing because we simply kept ‘waiting’, passing the details of the manifest we were waiting on back to the calling function until the manifest was complete. Because jobs now run standalone, we persist each manifest in Spanner to keep track of it; whenever a job completes, we query Spanner to find which manifest the job belongs to and perform our status checks.
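The manifest bookkeeping can be sketched as follows. This is a minimal illustration, not the platform's implementation: an in-memory dict stands in for the Spanner table, and `Manifest`, `ManifestStore`, `create` and `record_completion` are hypothetical names. Each completion event looks up which manifest its job belongs to, records the result, and reports whether that completion finished the manifest.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Manifest:
    """One ensemble manifest: the set of jobs that must all finish."""
    manifest_id: str
    pending: Set[str]
    done: Set[str] = field(default_factory=set)
    failed: Set[str] = field(default_factory=set)

    def is_complete(self) -> bool:
        return not self.pending


class ManifestStore:
    """In-memory stand-in for the persisted manifest table (hypothetical schema)."""

    def __init__(self) -> None:
        self._manifests: Dict[str, Manifest] = {}
        self._job_to_manifest: Dict[str, str] = {}  # reverse lookup: job -> manifest

    def create(self, manifest_id: str, job_ids: Set[str]) -> None:
        """Persist a new manifest and index each of its jobs."""
        self._manifests[manifest_id] = Manifest(manifest_id, set(job_ids))
        for job_id in job_ids:
            self._job_to_manifest[job_id] = manifest_id

    def record_completion(self, job_id: str, succeeded: bool) -> Optional[Manifest]:
        """Mark a job finished; return its manifest only if this completed it."""
        manifest_id = self._job_to_manifest.get(job_id)
        if manifest_id is None:
            return None  # job was not part of any manifest
        manifest = self._manifests[manifest_id]
        if job_id not in manifest.pending:
            return None  # repeat event for a job already recorded
        manifest.pending.discard(job_id)
        (manifest.done if succeeded else manifest.failed).add(job_id)
        return manifest if manifest.is_complete() else None


# Usage: two jobs in one ensemble, completing in separate events.
store = ManifestStore()
store.create("ensemble_1", {"job_a", "job_b"})
first = store.record_completion("job_a", succeeded=True)   # job_b still pending
second = store.record_completion("job_b", succeeded=True)  # manifest now complete
```

Because each event is handled by an independent invocation, a persisted version would likely need the lookup and update to happen inside a single transaction, so that two jobs finishing at the same moment cannot both miss (or both claim) the manifest's completion.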
Doc Refreshed: 2023-11-23