Catalog - Tile Detail - Trust Rule Rollup¶
Overview¶
Ensuring data quality is a critical pattern for all data teams. It is important to know whether you can trust the data at hand and whether it meets organisational data quality standards to be used for data visualization and reporting or used in artificial intelligence systems and machine learning models.
The AgileData App and AgileData Platform make it easy for you to ensure the quality of the data in your tiles.
Trust Rules give the data team a robust way to validate that existing or incoming data conforms to defined data quality standards.
Trust Rules can be set for any tile in the Catalog. It is especially advantageous to set Trust Rules for Concept, Detail and Event Tiles, because these rules apply not only to the tiles themselves but also to any Consume Tiles that depend on them.
If you want to learn more about the relations between the tiles and how the Trust Rules of consumable tiles are assembled, you can check the Data Map or the Event Matrix.
Trust Rules are set for individual fields of your data. For example, you could set a rule to ensure the email values in an email field have been captured in a valid email format.
You can choose the level of the Trust Rule from an internal hierarchy of trust levels. This enables you to set up a simple or complex system of data quality checks in the AgileData App.
Steps¶
1. Check Trust Rules overview in the tile detail screen¶
Trust Rules are an important feature of the AgileData platform to ensure data quality in all tiles. The Trust Rules overview provides you with a concrete evaluation of whether the data in the tile can be trusted or whether it might have quality issues. You can find the Trust Rules overview in the tile detail screen.
Here you can see which types of Trust Rules apply to the current tile, how many of them passed and which ones failed. You also see a percentage that serves as a general score of how much you can trust the data.
Trust Rules have an internal hierarchy which is represented by different fish types:
Dr. Bonefish: Represents an error, the strongest possible trust rule failure you can receive.
Piranha: Represents an alert, which is somewhere between an error and a warning.
Shark: Represents a warning, which is weaker than an error and an alert.
Anomaly Fish: Represents an automatically detected anomaly. These are triggered by system-generated Trust Rules that aren’t manually defined by users. (For example, Concept Tiles should always have a unique key, so a Trust Rule is automatically set to check this for such a tile. The results of these automated rules are shown under this category.)
You can set the error level of the Trust Rules when you’re creating them unless they are system-generated Trust Rules. In the latter case, they will be represented by an Anomaly Fish in the overview.
In the following example, four Trust Rules apply to the tile. All of them are automatically detected anomalies, which means they are system-generated Trust Rules. All four rules were passed by the data in the tile, resulting in an overall score of 100%. Therefore, the data analyst querying the tile can generally trust the data.
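The relationship between passed rules and the overall score can be sketched in a few lines of Python. This is only an illustration: the source does not state how AgileData calculates the percentage, so the sketch assumes the score is simply the share of rules that passed. The `RuleResult` type, the `trust_score` function and all rule names other than the unique key check are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RuleResult:
    rule: str     # hypothetical rule name
    level: str    # "error", "alert", "warning" or "anomaly"
    passed: bool


def trust_score(results: List[RuleResult]) -> Optional[int]:
    """Assumed formula: percentage of rules that passed, rounded."""
    if not results:
        return None  # no rules means there is nothing to score
    passed = sum(1 for r in results if r.passed)
    return round(100 * passed / len(results))


# Four system-generated rules, all passed, as in the example above.
example = [
    RuleResult("unique key", "anomaly", True),
    RuleResult("system rule 2", "anomaly", True),  # placeholder name
    RuleResult("system rule 3", "anomaly", True),  # placeholder name
    RuleResult("system rule 4", "anomaly", True),  # placeholder name
]

print(trust_score(example))  # 100
```

Under this assumption, a tile where one of the four rules failed would show a score of 75%.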
2. Set new Trust Rules for tiles in the Catalog¶
The Trust Rules that apply to a consumable tile are often a collection of the Trust Rules set on the tiles that feed data into it, such as Concept or Detail Tiles. Consumable tiles are assembled by combining related tiles, and so is the set of rules that applies to them.
You can set new Trust Rules for a tile in multiple places:
From the preview modal when defining a new Change Rule or when modifying a current rule.
From the Catalog screen, using the three dots Menu Anywhere option.
From the Catalog Tile Detail screen, using the three dots Menu Anywhere option.
From the Field List area in the Tile Detail screen.
For this example, we will access the Trust Rule definitions for the tile directly from the Catalog screen.
To open the trust rule screen for any tile in the catalog screen, click on the Menu Anywhere three dots option in the top right corner of the tile in the Catalog and click on “View Trust Rules”.
For this example, we’re going to apply a new Trust Rule for the Detail Tile “Ecommerce User”, which also will apply to the associated consumable tile “Order is shipped to User”.
The trust rule overview lists all fields of the data in the tile and shows information on field name, field type, sample field content and what Trust Rules are tied to the field.
To apply a new Trust Rule to a field, click on the list icon in the associated row in the rule overview table. For this example, we’re going to set a new rule for the email address field of the user.
You can select the Trust Rules in the pop-up window that appears on the left.
Each Trust Rule checks whether a certain business logic applies to the values of the associated field.
You can check:
“is unique” - all values of the field must be unique, with no duplicates,
“not null” - values of the field can’t be null,
“valid email” - each value must be a valid email address,
“zero or positive” - values of the field must be zero or positive.
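To make the logic behind these four checks concrete, here is a minimal Python sketch. It is not how AgileData implements the rules. The function names are hypothetical, and the email pattern is a deliberately simple assumption that only looks for the shape `something@something.something`.

```python
import re

# Simplified pattern, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_unique(values):
    """True when no value occurs more than once."""
    return len(values) == len(set(values))


def not_null(values):
    """True when no value is None."""
    return all(v is not None for v in values)


def valid_email(values):
    """True when every value is a string that looks like an email address."""
    return all(
        isinstance(v, str) and EMAIL_PATTERN.match(v) is not None
        for v in values
    )


def zero_or_positive(values):
    """True when every value is a number greater than or equal to zero."""
    return all(v is not None and v >= 0 for v in values)


emails = ["ada@example.com", "not-an-email"]
print(valid_email(emails))       # False, the second value has no "@"
print(is_unique(emails))         # True
print(zero_or_positive([0, 3]))  # True
```

Each function returns a single pass or fail result for the whole field, which mirrors how a rule either passes or fails in the Trust Rules overview.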
To set a rule, activate it and select its error level by clicking on any of the fish types.
In this case, we want to check whether the values in the email field are valid email addresses, so we activate the “valid email” rule and select “alert” as the error level, which is represented by the second fish.
Once you’re finished setting your Trust Rules, close the sidebar and a dialogue in ADI will appear.
The Trust Rule changes are applied automatically, and the rules will also run automatically whenever new data is collected into this tile.
Confirm with ADI if you also want the Trust Rules to be run immediately.
3. View the effect of new Trust Rules on consumable tile¶
If you want to view the effect of the new Trust Rules on the tile or any dependent consumable tile, you can go back to the tile detail screen of “Order is shipped to User”. Here you can see two new rules of the type alert, corresponding to the Piranha Fish, were successfully executed on the data in the tile.
The Trust Rules show up here as well as on the original Detail Tile because the rules of a consumable tile are assembled from the rules of the component tiles that feed it.
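The rollup described here can be pictured as collecting rules along the lineage of a tile. The sketch below is an assumption about the general idea, not the platform's actual mechanism. The `FEEDS` and `RULES` structures, the `rolled_up_rules` function, the “Ecommerce Order” tile name and its rule are hypothetical, and the lineage is assumed to contain no cycles.

```python
# Hypothetical lineage: consumable tile -> tiles that feed data into it.
FEEDS = {
    "Order is shipped to User": ["Ecommerce User", "Ecommerce Order"],
}

# Rules per tile as (field, rule, level) tuples.
RULES = {
    "Ecommerce User": [("email", "valid email", "alert")],
    "Ecommerce Order": [("order_id", "is unique", "anomaly")],  # assumed
}


def rolled_up_rules(tile, feeds, rules):
    """Collect the tile's own rules plus the rules of every tile feeding it."""
    collected = [(tile, *rule) for rule in rules.get(tile, [])]
    for source in feeds.get(tile, []):
        collected.extend(rolled_up_rules(source, feeds, rules))
    return collected


for origin, field, rule, level in rolled_up_rules(
    "Order is shipped to User", FEEDS, RULES
):
    print(f"{origin}.{field}: {rule} ({level})")
```

Keeping the originating tile in each tuple makes it easy to see where a rule on the consumable tile came from.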
You can view these relations in the Data Map and the Event Matrix to understand where the Trust Rules in your tile originated.
This is part of the magic of the AgileData App - you can establish rules for your data once and they will affect the entire lineage of your data.
Optional Steps¶
1. Check the Data Map to view connections between tiles¶
To understand which data got fed into a specific tile and which rules apply to it, you can open the Data Map which provides a complete overview of how the tiles in your instance are connected and how they are feeding data into each other. Through this, you can understand where the data in the tile came from and which tiles it supplies with data in return - a crucial step in investigating the quality of the data.
You can access the Data Map by clicking on the “Rules” menu and selecting the “Data Map” option.
The Data Map is a visual schema of all the tiles in your instance and the rules that are connecting them. The flow in the schema is from the left to the right. The tiles on the left side are the initial history tiles which are supplied with freshly collected data. This data is moved toward the right side which features consumable tiles with data ready to be queried or used in analytical models.
To investigate any tile and its connections, click on the circle that represents the tile in the map and hover over “View Lineage”. Now you can see a color-coding which shows where the tile in question got its data from on the left side and which tiles it feeds with data on the right side.
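Conceptually, the lineage view is a walk over a directed graph: upstream tiles are found by following connections to the left, downstream tiles by following them to the right. The Python sketch below illustrates that idea under stated assumptions. The `EDGES` list, the history tile names and the `reachable` function are hypothetical and do not reflect how the Data Map is built internally.

```python
from collections import deque

# Hypothetical connections, each pointing from a source tile to the tile it feeds.
EDGES = [
    ("History: users", "Ecommerce User"),
    ("Ecommerce User", "Order is shipped to User"),
    ("History: orders", "Ecommerce Order"),
    ("Ecommerce Order", "Order is shipped to User"),
]


def reachable(start, edges, direction="downstream"):
    """Return every tile reachable from start, following edges in one direction."""
    neighbours = {}
    for src, dst in edges:
        key, value = (src, dst) if direction == "downstream" else (dst, src)
        neighbours.setdefault(key, []).append(value)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable("Ecommerce User", EDGES, "upstream")))
print(sorted(reachable("Ecommerce User", EDGES, "downstream")))
```

The upstream set corresponds to the tiles highlighted on the left of the selected tile, and the downstream set to those highlighted on the right.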
2. Check the Event Matrix to view connections between tiles and events¶
The Event Matrix provides you with another way of understanding how the data in your tiles is related. In the Event Matrix, events (represented by Event Tiles) are mapped to the concepts (represented by Concept Tiles) that are involved in each event.
To access the Event Matrix, click on the “Design” menu and select the “Event Matrix” option.
The Event Matrix is a visual schema in a matrix format showing relations between Event Tiles and Concept Tiles. Events are listed in rows on the left side, such as “order shipment user” which represents the event of an order being shipped to a user. Concepts are listed in columns at the top of the table. In this case, the event “order shipment user” is related to the concepts “ecommerce shipment”, “ecommerce order” and “ecommerce user” as all these are involved in conducting the event.
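As a data structure, the matrix can be thought of as a mapping from each event to the set of concepts it involves. The short Python sketch below uses the event and concept names from the example above. The `EVENT_MATRIX` dictionary and the two lookup functions are hypothetical and only illustrate how a row or a column of the matrix can be read.

```python
# One entry per row of the matrix: event -> concepts involved in it.
EVENT_MATRIX = {
    "order shipment user": {
        "ecommerce shipment",
        "ecommerce order",
        "ecommerce user",
    },
}


def concepts_for(event):
    """Read a row: which concepts take part in this event?"""
    return sorted(EVENT_MATRIX.get(event, set()))


def events_for(concept):
    """Read a column: which events involve this concept?"""
    return sorted(e for e, cs in EVENT_MATRIX.items() if concept in cs)


print(concepts_for("order shipment user"))
print(events_for("ecommerce user"))
```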
To open the tile detail screen of the associated consume tile, you just need to click on any of the dots in the matrix. In this case, we have opened the tile detail screen for the consumable tile “Order is shipped to User”. In the tile detail screen, you can find further information on any trust rules that apply to the tile.
You can get to the Catalog Tile Detail screen from any other screen by using the Menu Anywhere … option when it is available.
Use Cases¶
1. Ensure data quality and trustworthiness of your data¶
Using Trust Rules for your tiles is one of many ways you can ensure your data is meeting internal or external data quality standards. Data of high quality is defined as data that is fit for its intended uses in operations, decision making and planning. This also means data quality is a highly subjective area of data management.
In the end, whether the data meets quality standards depends on its intended use and on the assessment of the data team and data consumers. Data that might be considered of low quality in scientific research might still be of high value in a business setting.
With the AgileData App, you can implement your own data quality standards depending on the data that is available to you and the needs of your organisation.
Trust Rules enable you to set up data checks for each tile and field individually, reflecting your organisation’s assessment of what’s considered trustworthy or high quality in your data.
2. Create a hierarchy of data quality errors¶
Trust Rules can be set with several different error levels depending on how severe you would rate the incident or artifact. This enables you to create an internal hierarchy of data quality errors. For example, you could treat incidents where you encounter a NULL value in a field as an error and incidents where you encounter a non-valid email address only as a warning.
Through this, you can establish a sophisticated system of data quality checks resulting in warnings that are easy to interpret for your team. The Trust Rules overview in the tile detail screen lets your data analysts gain an immediate understanding of the severity of quality issues of the data in the tile and whether the tile is fit for the intended use of the data or whether there is still some data remediation to be done.
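The hierarchy described in this use case can be sketched as an ordered set of levels, where each rule is assigned a level and the most severe failure determines how urgently a tile needs attention. This is a hedged illustration only. The `Level` enum, its numeric values, the `RULE_LEVELS` mapping and the `worst_failure` function are hypothetical. Only the ordering (error above alert above warning) and the example assignment of NULL values to errors and invalid emails to warnings come from the text above.

```python
from enum import IntEnum


class Level(IntEnum):
    WARNING = 1  # Shark
    ALERT = 2    # Piranha
    ERROR = 3    # Dr. Bonefish


# Example assignment: a NULL value is an error, an invalid email only a warning.
RULE_LEVELS = {
    "not null": Level.ERROR,
    "valid email": Level.WARNING,
}


def worst_failure(failed_rules):
    """Return the most severe level among the failed rules, or None if all passed."""
    levels = [RULE_LEVELS[rule] for rule in failed_rules]
    return max(levels) if levels else None


print(worst_failure(["valid email"]).name)              # WARNING
print(worst_failure(["valid email", "not null"]).name)  # ERROR
print(worst_failure([]))                                # None
```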