Common Data Warehouse Terms¶
The Information Technology domain is rife with technical terms and three letter acronyms (TLA’s). The same term often get used to define slightly different things. So we thought a glossary of how common terms map to AgileData.io might be useful.
Ad-hoc queries are created when questions arise that are not able to be solved with predetermined or predefined reports or content.
Ad-hoc queries are supported by the AgileData.io Event, Rules and Consume layers. These allow you to quickly define the data structures you need to answer your ad-hoc queries. Importantly once defined these data structures are kept up to date so you can always retrieve the same answer to the ad-hoc question in the future.
AgileData.io does not allow you to run ad-hoc custom SQL to answer these ad-hoc questions.
Application Programming Interface (API)¶
API stands for Application Programming Interface. API’s allow APIs systems to communicate with one another, one system makes a request for data and the other system returns the data that has been requested. The API defines the language that is required to make the request and to receive the response to ensure the two systems are speaking the same language.
AgileData.io enables the use of API’s for two specific use cases. API’s can be used to collect data from systems of record and have that data stored in the History layer. API’s can also be defined to allow other systems to request data from the Consume layer, effectively providing a data service.
Change Data Capture (CDC)¶
Change Data Capture (CDC) is a pattern to extract changes made to data in Systems Of Record. The pattern identifies (tracks) what data has changed and then takes action based on those changes, typically moving those changes to the data platform.
The standard Change Data Capture pattern is to use a technology component that watches a relational database transaction log file (the left hand side), and then “replays” those changes on the target data platform (the right hand side). CDC technology deals with complex features such as moving data between different database flavours, for example from Oracle DB to SQl Server.
Other patterns exist to determine changes, for example comparing snapshots to identify “Delta’s”, applying “Triggers” to the database or application to identify changes or publishing “events’ each time something is updated. But these are not typically referred to using the Change Data Capture (CDC) term.
AgileData.io automagically stores Change Data Capture records in both the History and Event areas. The CDC records are provided as part of the data collection process or we calculate them automatically when loading the data into the History area, using a Delta pattern.
As you update your system of record, you create changes in the data it stores. For example, when you update the address of a customer or a citizen, or you update the price of a product or service, or you create a new order transaction. These changes to the your data need to be identified and loaded into your data warehouse so the data you use to make decisions includes these changes.
AgileData.io automagically stores change records in both the History and Event areas. The change records are either provided to us by your data collection systems or we calculate them automatically when loading the data into the History area.
Data architecture encompasses the rules, policies, models, and standards that govern data collection and how that data is then stored, managed, processed, and used within an organisation’s databases and data systems.
AgileData.io is based on a data architecture that has been refined over 20 years by its founders. As a Software as a Service offering AgileData.io provides the data architecture, without the need for you to spend a large amount of time or effort creating your own bespoke data architecture.
Data catalogs helps users understand what data is available and the context of that data. It assists users in locating the data quickly, via a self-service pattern.
Data Catalogs are often referred to as a Data Dictionary or a Business Glossary,
The data catalog is a core component of the AgileData.io solution. Every-time you create a Concept, Detail, Event, Rule or Consume View these are all stored in the Data Catalog area. You can then extend your understanding of these things by adding business descriptions, indication of data quality and labels to make data easier to find and understand.
Data Definition Language (DDL)¶
DDL stands for Data Definition Language and it is used to create or modify the structure of a database. It’s considered a subset of SQL and is distinct from the Data Manipulation Language (DML), which deals with the data contained within that structure. DDL consists of these three statements:
CREATE: You use the various forms of this statement to build the essential structures of the database.
ALTER: You use this statement to change structures that you have created.
DROP: You apply this statement to structures created with the CREATE statement, to destroy them.
AgileData.io automagically creates and executes DDL based on the concepts, details, events and change rules you define.
You cannot see the DDL AgileData.io generates.
Data encryption translates data into another form, or code, so that only people with access to a secret key or password can read it. Often, it is is automated as part of other processes within a data platform.
AgileData.io automagically encrypts data both while it is being collected and while it is being stored, ensuring your data is protected at all times.
Data Manipulation Language (DML)¶
DML stands for Data Manipulation Language and it is used to insert, update, delete or the data within a database. It’s considered a subset of SQL and is distinct from the Data Definition Language (DDL), which deals with the structure within the database which holds that data.
AgileData.io automagically creates and executes DML when executing the change or validation rules you define.
You cannot see the DML AgileData.io generates.
A data mart is a subject-oriented data store that is often a subset of a data warehouse. The subset of data held in a data mart typically aligns with a particular business unit or core business process, for example, sales, finance, or marketing.
Data Marts vs Data Warehouses¶
Data marts and data warehouses are both structured data repositories where data is stored and managed until it is needed. However, they differ in the scope of data stored: data warehouses are built to serve as the central store of data for the entire business, whereas a data mart fulfills the request of a specific division or business function.
Data marts are often built to provide an increased speed of delivery of data to users. However the downside is that the data marts often get orphaned or silo’d resulting in increased maintenance and confusion when a single view of the data is required across the organisation.
AgileData.io provides the benefit of quick delivery and ease of use associated with data marts, while ensuring the ongoing maintenance is kept to a minimum and a single view across the organisation is always available.
The definition of Events inherently maps to a data mart pattern as you define them based on business unit or business process specific data. However the Concepts and Details you define that are part of those events automagically provide a single view of this data across your organisation.
A data model is a conceptual, logical or abstract design of how data will be physically stored in the data warehouse.
AgileData.io automagically creates and maintains a data model to store your data, but you never get to see it. When you create a Concept, Detail or Event or when you define a Change Rule, in the background AgileData.io is creating a data model to represent these things.
Data vault is a data modeling pattern for storing data in a data warehouse. It is defined by its inventor as “The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.”
AgileData.io adopts some of the data vault patterns in the way it stores the data in the Events layer.
A Dimensional Model is a database modeling pattern that is optimised for online queries and data warehousing tools. It is comprised of “fact” and “dimension” tables and is often referred to as a “Star Schema”
A “fact” is a numeric value that a business wishes to count or sum. A “dimension” is essentially an entry point for getting at the facts. Dimensions are things of interest to the business.
The Consume area can provide access to your data by presenting it in a dimensional model format (i.e a Star Schema).
Extract Load and Transform (ELT)¶
ELT is the process of extracting data from a Systems Of Record, loading the results into a data store and transforming the structure of the data or the data itself so it can be used later.
ELT differs from ETL as the changes (transformations) are made once the data has been loaded into the data warehouse data store.
When you define a Change Rule within AgileData.io you are effectively creating a ELT process which collects the data (extract), stores the data in the History or Event layer (load) and changes the data (transform) that is stored . AgileData.io leverages the ELT pattern to ensure the transformation is undertaken in the part of its underlying infrastructure that has the ability to apply the changes the fastest.
Extract Transform and Load (ETL)¶
ETL is the process of extracting data from a Systems Of Record transforming the structure of the data or the data itself and loading the results into a data store so it can be used later.
ETL differs from ELT as the changes (transformations) are made while the data is being moved between between the Systems Of Record and the data warehouse data store.
When you define a Change Rule within AgileData.io you are effectively creating a ETL process which collects the data (extract), changes the data (transform) and stores the data in the History or Event layer (load). However AgileData.io does not leverage the ETL pattern, it leverages the ELT pattern to ensure the transformation is undertaken in the part of its underlying infrastructure that has the ability to apply the changes the fastest.
AgileData.io enables you to easily collect data from Systems Of Record using API’s provided by those systems. Often those API’s will return the data you are collecting in a JSON format. AgileData.io automagically understands the structure of those JSON data files and creates the relevant data structures in the History layer to store the data.
Software as a Service (SaaS)¶
SaaS is short for Software as a Service and describes a software distribution model in which a third-party provider hosts applications and makes them available to customers.
AgileData.io is a native SaaS solution. This means that AgileData.io automagically removes the complexity of creating and managing your own software and infrastructure.
AgileData.io is not available as an on-premise or Infrastructure as a Service (IaaS) solution.
Slowly Changing Dimensions (SCD)¶
A Slowly Changing Dimension (SCD) refers to a dimension that stores and manages both current and historical data over time in a data warehouse. It enables you to see what the data looked like as at a specific point in time.
AgileData.io automagically stores change records and the effective dates for each record in both the History and Event areas. The Consume area provides access to both the current point in time view of the data, or a historical point in time view of the data.