Release Time Machine¶
This page documents updates to AgileData.
AgileData contains features that are designed to make collecting, combining and consuming data simple.
The AgileData team is constantly adding features that reduce the friction for Analysts managing data in a simply magical way.
As we release each new feature we list it briefly in this section, ordered by release date.
7 February 2023¶
Capture source_partition in the data_quality table in Spanner so we can roll back if necessary to do a full table validation when required
Daily Health Check - check_trust_rules_passed included in daily_sla_checks
Daily Health Check - check_anomaly_summary included in daily_sla_checks
Persist last_access (consume) into the users table - uses the BQ audit log extension of the existing top user pattern against catalog
Remove dataplex data quality pattern and redirect to updated validate_table endpoint to run trust rules
Repoint /data_quality/summary endpoint to refer to spanner table instead of BigQuery view
Tenancy Features - only update catalog for shared tiles when the tile hasn’t already been inserted - preserve user topics and description
Use ‘Show Details’ pattern on notifications screen to open existing preview modal with the results of data quality exceptions (stub feature)
Tenancy Name into Browser Title for quick identification when using multiple open tenancies
Include BQ slot time in job statistics modal screen
Exclude reserved columns from trust rules screen so users can’t set trust rules against system columns - they are already validated automatically.
15 January 2023¶
New error handling pattern in all cloud functions to send errors to a) cloud logging for the app to display and b) websocket notifications for the app to display
Updated pattern for identifying event (vs change) source data based on a known list of event columns - eg event_timestamp, event_date, event_type etc
Updated event table template to autocreate the event key based on all the ‘output event key’ rules the user creates
Updated notifications and cross navigation in the app as a proof of concept for notifications
Small updates to daily SLA and Anomaly checks to harden the patterns
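The event-versus-change detection noted in this release can be illustrated with a minimal sketch. It assumes a source counts as event data when any of its column names appears in the known list; the function name and the matching rule are hypothetical, and only the three column names come from the release note.

```python
# Example event columns from the release note; the real list is likely longer.
KNOWN_EVENT_COLUMNS = {"event_timestamp", "event_date", "event_type"}


def looks_like_event_source(column_names):
    """Return True when any column name matches the known event column list.

    Assumption: a single matching column is enough to treat the source as
    event data rather than change data.
    """
    normalised = {name.strip().lower() for name in column_names}
    return bool(normalised & KNOWN_EVENT_COLUMNS)
```

For example, `looks_like_event_source(["id", "Event_Timestamp"])` is true, while a source with only `id` and `name` columns would be treated as change data.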
14 December 2022¶
Daily SLA check pattern - auto scheduled and updated daily
Tag BQ event jobs with the config version to allow displaying in the application
Draft of anomaly checking and display into event notifications
Table level trust rules (anomaly and sla)
Extend tenancy attributes to include deployment versions and tenancy created attributes
10 November 2022¶
Persist BQ profiles into Spanner for re-use and increased performance in the app
First draft of auto generating C, D and E rules using the persisted profile data
Inclusion of topics on the config (rule) endpoints
Include data preview on the column list endpoints (for output fields rule step)
Push ARRAY/STRUCT metadata into the data preview results so the app can customise display for nested/repeated data
Update logic for calculating a watermark for running previews - try 14 days, then fall back to max date in last 6 months
Augment the unnest_fields() and unnest_data() functions to traverse VERY wide tables to create usable preview/column data
New file_header_parser function to more accurately work out header rows and delimiters being used
New create_sample_table function which uses the new file header parser
Update upsert templates - reverted to INSERT() VALUES() pattern
Combine trust scores and persist against consume tiles (ie aggregate up the score from ensembles)
Updated rule type endpoint with better sorting for readability
Implement logrocket logging into the app layer
Implement dopt user registration into the app layer
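The preview watermark logic from this release (try the last 14 days, then fall back to the max date in the last 6 months) could look roughly like the sketch below. The function name, the in-memory list of dates, and the 183-day approximation of six months are assumptions for illustration; the real pattern presumably runs against table metadata in BigQuery.

```python
from datetime import date, timedelta


def preview_watermark(row_dates, today, recent_days=14, lookback_days=183):
    """Pick the start date for a preview window.

    First try the trailing `recent_days`. If no rows fall in that window,
    fall back to the latest date seen within `lookback_days`. Returns None
    when there is no data in either window.
    """
    recent_start = today - timedelta(days=recent_days)
    if any(recent_start <= d <= today for d in row_dates):
        return recent_start

    lookback_start = today - timedelta(days=lookback_days)
    in_lookback = [d for d in row_dates if lookback_start <= d <= today]
    return max(in_lookback) if in_lookback else None
```

With recent data the window simply starts 14 days back; with stale data it starts at the most recent date that has rows, so the preview is not empty.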
04 September 2022¶
Updated cloud build pattern for all code repositories. We now use a tag based deployment pattern for all tenancies.
Update the ‘view change rules’ and ‘view consume rules’ links on tiles to be context aware
Include ‘advanced options’ show and hide for consume config
Include the catalog_key in the ensemble_config payload to enable cross screen browsing in the app
Use coalesce() for user defined keys in config. Otherwise concat() of null keys (especially in events) produces a null key value.
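The null-key problem behind the coalesce() change can be shown with a small emulation of SQL NULL semantics. This is only a sketch: the helper names are hypothetical, and the empty-string default is an assumption rather than the value the templates actually use.

```python
def sql_concat(*parts):
    """Mimic SQL CONCAT: the result is NULL (None) if any argument is NULL."""
    if any(part is None for part in parts):
        return None
    return "".join(str(part) for part in parts)


def coalesce(value, default=""):
    """Mimic SQL COALESCE for one value and one fallback."""
    return default if value is None else value


def build_key(*parts):
    """Concatenate key parts, replacing NULL parts first (assumed default: '')."""
    return sql_concat(*(coalesce(part) for part in parts))
```

Without the coalesce step a single missing key part wipes out the whole key: `sql_concat("a", None, "c")` gives `None`, whereas `build_key("a", None, "c")` gives `"ac"`.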
18 August 2022¶
ALL NEW jinja templates deployed to all projects. Major re-write and re-factor of all change rule and config generation.
Includes the new patterns to compare existing templates with new templates when new change rules are introduced.
12 August 2022¶
New cloud function export_history_tables which exports all history tables from one project as avro files to be ingested into another project
Alias driving tables in consume pattern to avoid issues with reserved words
Updated check_job_status which augments the event_logging table in spanner for load_jobs. Updates the rows_loaded attribute after load completes
Small bug fixes for ensemble_check and check_load_position
Mark consume tiles that have a report_key against them (initial plumbing for reports screen)
Export all css and js assets into storage bucket for LogRocket to use when rendering activity reports.
01 August 2022¶
Update the parse status for filedrop rules so we can set them to Active in the app
New compare_template endpoint to compare existing config against config generated from new templates
Update the sqlparse formatting for generated sql so when it displays for magicians in the app it's more readable and standardised
Add policy tags (group membership) to the members screen in frontend
Update preview view logic to first find the latest effective data in a view then set the preview window (instead of trailing 14 days)
17 July 2022¶
Updated Search endpoint to rank and return most relevant results in order
New cloud function excel_to_csv which parses a dropped spreadsheet and creates a csv file for each worksheet
Updated filedrop to retry a failed file load using a string version of schema (bypass autodetect)
03 July 2022¶
Move invite_user email html into a jinja template
Draft of new jinja rule templates - break up large logic blocks into smaller components for reusability
Update the Given Filedrop rule file names to exclude schema and type as this pattern is just to create a filedrop to landing wildcard mapping rule
Secure all POST endpoints in API layer to either Admin or Editor roles
14 June 2022¶
Profile all history tables and persist into a new _profiles table for analysis and auto rule creation
New endpoint to auto generate a set of C and D tables and rules based on _profiles data
Union usage from the -c project into -d results and display in the app monitoring report
New manage_gsuite functions to grant|revoke users to user groups
New union pattern to create the required sql to union two or more tables/views together safely
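One way to read "union safely" is aligning columns by name and filling the gaps with NULL so every SELECT has the same shape. The sketch below generates that SQL; the function name, the input format, and this interpretation of "safely" are all assumptions, not the deployed pattern.

```python
def build_union_sql(tables):
    """Build a UNION ALL over tables whose column lists may differ.

    `tables` maps a table (or view) name to its list of column names.
    Columns missing from a table are emitted as NULL so that every
    SELECT returns the same columns in the same order.
    """
    if len(tables) < 2:
        raise ValueError("need at least two tables to union")

    # Collect every column once, keeping first-seen order.
    all_columns = []
    for columns in tables.values():
        for col in columns:
            if col not in all_columns:
                all_columns.append(col)

    selects = []
    for name, columns in tables.items():
        exprs = [col if col in columns else f"NULL AS {col}" for col in all_columns]
        selects.append(f"SELECT {', '.join(exprs)} FROM {name}")
    return "\nUNION ALL\n".join(selects)
```

Calling `build_union_sql({"a": ["id", "x"], "b": ["id", "y"]})` yields two SELECT statements that both return `id`, `x` and `y`, with NULL standing in where a table lacks the column.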
30 May 2022¶
New concept and event matrix endpoints
Addition of catalog_key into config responses and config_key into catalog responses for cross navigation in app
New lookahead manifest pattern for loads that start without a manifest but it exists by the time they finish
Caching pattern for app engine - always cache app pages and cache select api responses
BQ sample function when previewing event_* datasets to reduce data scanned
24 May 2022¶
Cleanup bq_table and bq_column metadata when logical tables are deleted in app
New jinja template to use BQ pivot() function
18 May 2022¶
Generate file transfer keyfile for users to automate bucket transfers
Import and Export endpoints to create and read yaml rule files
Function to archive BQ tables to GCS bucket
3 May 2022¶
Spanner - persist ensemble_config into a table (previously calculated using a view)
Switch data map and gui outputs to use the new ensemble_config table
Execution - update the job_completed function to maintain the ensemble_config whenever concepts, details or events are changed.
29 April 2022¶
Tracking - track last_executed for all config rules
Load Position - Deploy the check_load_position() function which returns dynamic watermarking based on table and config metadata
14 April 2022¶
12 April 2022¶
Data Map - updated to include shared datasets and the ability to execute rules against them as if they were local, ie run all steps from here
Tasks - new task endpoints deployed, ability to create and update tasks (trello style screen in app)
11 April 2022¶
Shared Datasets - tenancy feature to share an external dataset and have the tiles appear within the local project
Templates - new create and load templates for external shared datasets
31 March 2022¶
Messaging - deploy websockets server to enable realtime chat and messages within and between orchestration layers
9 March 2022¶
Data Quality - Run data quality (data trust) rules after BigQuery loads complete, and update summaries used by gui to show a trust score for each tile
18 January 2022¶
- Preview - Delete Items
App, Catalog, Topic Canvas, Rules, Manage
The ability to delete a Catalog Tile, Topic Canvas or Rule. When an item is deleted it is a ‘soft’ delete. Deleted items are available in a new Manage > Deleted Items screen and can be undeleted if required.
17 January 2022¶
- Create Concept button in Topic Canvas
App, Topic Canvas
Fixed unplanned trick in Topic Canvas where the Create button for a new Concept was hiding.
13 January 2022¶
- Default topic for events created via topic canvas
App, Topic Canvas, Catalog
When an event is defined in the topic canvas the Default topic tag is automagically assigned. Previously the topics were automagically created based on the concepts that were added to the event, but this created too much noise and made it more complex than it needed to be.
- Streamlined deployment process
Streamlined the automation of code deployed across tenancies. This included the deployment of all code components. Reduced the effort to deploy across multiple tenancies.
12 January 2022¶
- Removed Tenancy Dependency for deployment
Removed the need to inject the tenancy name in the deployment process, reducing the time to deploy.
From the beginning of time¶
Data Validation across areas
Automagical reconciliation of data between areas and alerting of any anomalies.
Rule execution moved to pub/sub
Refactored rule hand-offs to use pub/sub, decoupling them from direct hand-offs.
Dynamically determine dependencies across rules at time of rule execution to ensure immediate consistency of concept, detail and event data when refreshing Consume views.
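Resolving rule dependencies at execution time is essentially a topological ordering problem. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to show the idea; the rule names and the dictionary format are hypothetical, and the actual implementation may work quite differently.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return rule names in an order where each rule runs after its inputs.

    `dependencies` maps a rule name to the set of rules it depends on.
    graphlib raises CycleError if the dependencies form a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For instance, if a consume view depends on a concept and a detail, and the detail depends on the concept, the concept is refreshed first, then the detail, then the consume view.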
Ability to automagically push Consume views to a csv file to enable them to be consumed via an API.
Collection effective dated
Ability to add an effective date to Landing rules to define a date key for determining change data when loading History.
Google Secret Manager for Third Party Access
Leverage Google Secret Manager to provide a central place and single source of truth to manage, access, and audit secrets for third party app integration.
Levenshtein Distance Rule Pattern
Added Levenshtein rule pattern to allow Single Magic Record (Master Data Management) matching.
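For readers unfamiliar with the measure, Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. The following is a plain Python sketch of the standard algorithm, not the rule pattern's actual code; the `is_probable_match` helper and its threshold of 2 are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings, using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def is_probable_match(a, b, max_distance=2):
    """Hypothetical matching helper: case-insensitive, small-distance match."""
    return levenshtein(a.strip().lower(), b.strip().lower()) <= max_distance
```

Under this sketch "Jon Smith" and "John Smith" are one edit apart, so they would be flagged as a probable match for the same record.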
Extract table DDL from Oracle and SQL Server systems of records and convert it to BigQuery DDL to automatically create the tables in AgileData.
Shopify Change Rules
Define default change rules for Shopify, including Customers, Products, Orders and Transactions.
Unnest Change Rule
Ability to define an unnest rule step in a change rule.
Slack Logging Notifications Channel
Integrate Slack as a notification channel to view runtime logs and errors
Schedule based invocation of rules
Rules can be triggered/invoked based on a fixed schedule, for example at 2am each working day.
Shopify API Collection Rule
Change rule to allow you to automatically collect data from the Shopify API.
Rule and catalog API
API to allow rules and catalog entries to be called via the GUI or any other mechanism.
Rule execution code visibility
You can now see the code that will be executed for a rule.
Rule Natural Language Parsing
First version of natural language parsing of rules to:
- identify key change rules words
- define key words for rule pattern execution
Natural Language Rule Framework
Allows the parsing of a rule and storage of the rule pattern mapping in a dedicated high performance data repository to allow multiple concurrent requests from the GUI.
Concept and Detail Storage
Concepts and Detail catalog entries are now stored in a dedicated high performance data repository to allow multiple concurrent requests from the GUI.
Rules are now stored in a dedicated high performance data repository to allow multiple concurrent requests from the GUI.
Automated deployment framework for documentation and documentation site setup.
Rules SQL is validated before rule is submitted to pub/sub for execution.
Automated User provisioning
Users are automatically provisioned when tenancy is created.
Soundex Rule Pattern
Added soundex rule pattern to allow Master Data Management matching.
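Soundex encodes a name as a letter plus three digits so that similar-sounding names share a code. The sketch below follows the common American Soundex rules and is for illustration only; the rule pattern itself may rely on a database function instead.

```python
# Letter groups that share a Soundex digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character Soundex code, or '' for input with no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    previous_code = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != previous_code:
            encoded += code
        previous_code = code  # vowels reset the run, so repeats across vowels count
    return (encoded + "000")[:4]
```

Both "Robert" and "Rupert" encode to R163, which is what makes the pattern useful for matching records with spelling variations.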
Automatic provisioning of security groups
When automatically provisioning a tenancy, standard security groups are also automatically deployed.
Filedrop bucket naming
Improved security of filedrop area by anonymising the filedrop bucket names to ensure they are not discoverable.
Separation of the filedrop area
The filedrop area is separated from other data areas to increase security in depth.
You can create a rule that uses a consume table as an input and outputs another consume table. This is useful when you want to create custom consume views with “pretty” field names (aka a semantic layer) for your visualisation or analytics tool to use.
Automatic creation of consume views in the consume area
Consume views are automatically created in the consume area when the consume tables are created in the event/consume layer.
Separation of the consume area
The consume area is separated from other data areas to increase security in depth.
- Data PII profiling
Profile data to identify any personally identifiable data that is being stored.
DDL files to create history tables
Allows files containing DDL to be dropped into filedrop and they generate the table structure for the history tables.
Version Change Rules
When a change rule is updated and executed, the rule is versioned in config.
Data profiling stored in catalog
Results of the data profiling are stored against the object in the data catalog.
Initial cut of tracing lineage of data in filedrop all the way through history and event processing to the consume tables.
Updated Detail Fields
When you change the field list for a Detail change rule, the Detail table is dropped and rebuilt on next load to accommodate the field changes.
Rollback of loads
Option to reset load watermarks to force reload of data into history, events or consume.
Persist load statistics for data movement in filedrop, history, events and consume.
Rule execution state
Manage rule execution state to ensure two rules cannot simultaneously update the same concept, detail or event table
Rules issue a callback to dependent rules to remove the risk of time out issues with multiple dependent rule execution.
Filedrop based invocation of rules
When a file is dropped the relevant rules that are dependent on that file will automatically execute.
Rules are executed via a publish and subscribe pattern rather than a data flow pattern to allow rules to be authored and executed in isolation of other rules, while also allowing them to be combined into an end-to-end data pipeline.
Autodetect preloaded csv file in filedrop
When dropping a new csv file into the filedrop area the file is compared against the previous file loaded and if it is the same the file is not reloaded.
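The release note does not say how the new file is compared against the previous one. A content hash is one common approach, sketched below as an assumption; the function names are hypothetical.

```python
import hashlib


def file_digest(data):
    """SHA-256 hex digest of the file contents (bytes)."""
    return hashlib.sha256(data).hexdigest()


def should_load(new_file, previous_digest):
    """Return (load?, digest) for a newly dropped file.

    The file is loaded only when its digest differs from the digest of the
    previously loaded file. Pass None when nothing has been loaded yet.
    """
    digest = file_digest(new_file)
    return digest != previous_digest, digest
```

The caller would store the returned digest after each successful load and pass it back in when the next file arrives.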
Autodetect csv file metadata
For new files the ingestion process automatically samples the csv files and determines the file structure. This creates a change rule to load the data into a history table. The metadata description and change rule are persisted to ensure the same change rule is used on subsequent files.
Auto Generate Consume Tables
After a change rule has executed consume tables are created. Concepts and any related details are denormalised into a single table. Events, the relevant Concepts that are part of the event definition and the details for those concepts are denormalised into consume tables. All tables are as at the current point in time.
Rule Master Pattern - Output Concat Key
Allows the selection of multiple fields to use as the business key for a History table, Concept, Detail or Event.
Rule Master Pattern - Output Key
Allows the selection of the field to use as the business key for a Concept, Detail or Event.
Calculation change rule pattern
Allow the use of a formula to calculate a value, for example a / b
Aggregated change rule pattern
Allow the use of a formula to calculate a value, for example sum (a)
Parse change rule pattern
Allow the use of a formula to parse a field, for example split(trim(a), ' ') : a1
Single table relationship change rule pattern
Join the fields in a single table.
Multiple table relationship change rule pattern
Join fields across two tables.
Field filter change rule pattern
Filter on a field for a given value.
New event change rule pattern
Config supports the creation of event records via a change rule.
New detail change rule pattern
Config supports the creation of detail records for a concept via a simple change rule.
New concept change rule pattern
Config supports the creation of concept records via a simple change rule.
Change data recognition and history point in time updates
When new data is ingested into the history tables, and similar data has previously been ingested into those tables, the ingestion process identifies new records versus updated records. For records that have changed rather than being new, the previous version of that record is end dated.
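The new-versus-changed logic can be sketched in memory as follows. The row layout (`key`, `attributes`, `effective_from`, `effective_to`), the function name, and the choice to ignore unchanged records are assumptions for illustration; the real process runs as SQL against the history tables.

```python
from datetime import datetime


def apply_record(history, key, attributes, loaded_at):
    """Apply one incoming record to a list of history rows.

    Returns "new", "changed" or "unchanged". A changed record end-dates the
    currently open version before the new version is appended.
    """
    current = next(
        (row for row in history
         if row["key"] == key and row["effective_to"] is None),
        None,
    )
    if current is not None and current["attributes"] == attributes:
        return "unchanged"  # assumption: identical data is not re-inserted
    if current is not None:
        current["effective_to"] = loaded_at  # end-date the previous version
    history.append({
        "key": key,
        "attributes": dict(attributes),
        "effective_from": loaded_at,
        "effective_to": None,
    })
    return "changed" if current is not None else "new"
```

Each key therefore has at most one open row (with `effective_to` empty), and earlier versions remain available for point-in-time queries.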
File drop area
Users are able to drop a file into a filedrop bucket to enable it to be consumed into the history tables.