.. _2024-01-16-data-quality-jobs-optimisation---take-2_release:

=================================================================================
2024-01-16 - Data Quality Jobs Optimisation - Take 2
=================================================================================

Release
==================

*Status: Available*

*Type: DataOps*

*Date: 2024-01-16*

Problem
==================

The system (default) data quality jobs that check for duplicate keys in concept and event tiles are too expensive. This is because they query the ``_key`` column across the whole table to look for duplicates.

Solution
==================

By updating the Jinja data quality template to query the clustered column (``row_hash``), which is simply a hash of the key, instead of ``_key``, we reduce the cost of these queries by 10x. A query that previously processed 3,000 MB (3 GB) now processes 300 MB, a significant performance and cost saving.

Leverage the Magic
====================================

A small tweak to the data quality template for concept and event tiles delivers a big performance and cost saving.

Last Refreshed
===========================

*Doc Refreshed: 2024-03-02*