Image: Five people sit in a meeting room. A woman presents a dashboard on a screen.

If you work in or with the data management industry, you will quickly notice how many technical terms are in use. The terminology employed by vendors and influencers can be quite confusing. If you are planning to build a data platform or improve an existing one, it can be challenging to select the right concepts for your needs before you even consider technology, tools, or implementation. It can also be difficult to distinguish between a conceptual approach and a vendor's commercially branded offering. That is why we would like to provide an overview of the general terminology and concepts you are likely to encounter.

Why Should I Care About Data Management?

Ultimately, it is about better decision-making, as well as saving time and money. Companies want to make data-driven decisions and need data democratization to do so. A data management platform can act as a single source of truth. This means departments no longer rely on different versions of the same facts, which enables better collaboration. In addition, access to data becomes simpler and is no longer limited to senior management. Data can be prepared and made available, for example, for analytical purposes. Storing historical data alongside current information, which already supports better decisions, lays the foundation for future use cases. However, a data management platform does not run itself. There is no product you can simply buy that will solve all your problems. A platform must be designed and built intentionally. To lay the foundation for a high-quality data architecture, you must first understand what you want to achieve. In the following sections, we take the first step by introducing the fundamental concepts of data management.

Data Warehouse: A Well-Established Classic

When discussing the following terminology, including the Data Warehouse, we are referring to concepts rather than specific technologies. Let us look at Bill Inmon’s definition from the 1990s, which is still valid today: “A Data Warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.” A Data Warehouse is therefore a collection of data, but not just any storage or simple copy. It is designed for analytics and supports data-driven decision-making. A Data Warehouse collects data from various sources. Crucially, it harmonizes and integrates this data into a consistent whole. The focus is not on the systems the data originates from, of which there may be many, but on the subject the data describes. For decision-making, not only availability and correctness matter, but timeliness as well. Data changes over time, so a Data Warehouse must include up-to-date information. At the same time, historical data is equally valuable for analytics. One of the major strengths of a Data Warehouse is that it accumulates and processes data over time and can therefore provide historical insights.
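
The four properties in this definition can be illustrated with a minimal Python sketch. It is not a real warehouse implementation: the two source systems (a CRM and a billing system), their field names, the country mapping, and the `Warehouse` class are all hypothetical and exist only for illustration. Records from both sources are harmonized into one customer format (subject-oriented, integrated), each version carries a date (time-variant), and loading only ever appends (non-volatile).

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical source extracts: two systems describe the same customer
# with different field names and formats.
CRM_ROWS = [{"cust_id": "C-1", "full_name": "Ada Lovelace", "country": "UK"}]
BILLING_ROWS = [{"customer": "c-1", "name": "Lovelace, Ada", "country_code": "GB"}]

COUNTRY_CODES = {"UK": "GB", "GB": "GB"}  # illustrative harmonization rule


@dataclass(frozen=True)
class CustomerRecord:
    """One harmonized, dated version of the subject 'customer'."""
    customer_id: str
    name: str
    country: str
    source: str
    valid_from: date


def from_crm(row, load_date):
    return CustomerRecord(row["cust_id"].upper(), row["full_name"],
                          COUNTRY_CODES[row["country"]], "crm", load_date)


def from_billing(row, load_date):
    # Billing stores names as "Last, First"; bring them into one format.
    last, first = [part.strip() for part in row["name"].split(",")]
    return CustomerRecord(row["customer"].upper(), f"{first} {last}",
                          COUNTRY_CODES[row["country_code"]], "billing", load_date)


class Warehouse:
    def __init__(self):
        self._history = []  # append-only list of record versions

    def load(self, records):
        # Non-volatile: new versions are appended, nothing is overwritten.
        self._history.extend(records)

    def as_of(self, customer_id, day):
        """Return the latest version valid on the given day, or None."""
        versions = [r for r in self._history
                    if r.customer_id == customer_id and r.valid_from <= day]
        return max(versions, key=lambda r: r.valid_from, default=None)


wh = Warehouse()
wh.load(from_crm(r, date(2023, 1, 1)) for r in CRM_ROWS)
wh.load(from_billing(r, date(2024, 1, 1)) for r in BILLING_ROWS)
```

Because every version is kept, `wh.as_of("C-1", date(2023, 6, 1))` answers with the state known in mid-2023, while a later date returns the newer version. Both versions use the same identifier, name format, and country code regardless of which system delivered them.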

Fundamentals of the Data Warehouse: Central Hub for Analytics and Data-Driven Decision-Making

In summary, a Data Warehouse is a central location where data from different sources is consolidated and harmonized. It contains current information while also preserving historical data. As such, it serves as the central hub for analytics and data-driven decisions.

Traditionally, Data Warehouses are associated with structured data, meaning data that can be stored in tables. To keep it simple, think of data that you could store in Excel. To some extent, Data Warehouses can also process semi-structured data such as JSON or XML files. Unstructured data such as PDFs or images leads us to the next concept, the Data Lake.
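
To make the difference between semi-structured and structured data concrete, here is a small Python sketch. The order document and its field names are made up for illustration. It flattens one nested JSON document into table-like rows, the kind of shape you could keep in a spreadsheet or a warehouse table.

```python
import json

# Hypothetical semi-structured order document (nested object plus a list).
RAW_ORDER = """
{"order_id": 7,
 "customer": {"id": "C-1"},
 "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}
"""


def flatten_order(raw):
    """Turn one nested order document into flat rows, one row per item."""
    order = json.loads(raw)
    return [
        {"order_id": order["order_id"],
         "customer_id": order["customer"]["id"],
         "sku": item["sku"],
         "qty": item["qty"]}
        for item in order["items"]
    ]


rows = flatten_order(RAW_ORDER)
```

Each resulting row has the same four columns, so the nested document has become structured data. A PDF or an image offers no such fields to pull apart, which is why it counts as unstructured.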

Data Lake: What Should I Choose If I Do Not Want to Spend My Time Modeling and Organizing Data?

Data Lake: The Smart Choice for Time-Efficient Data Management

As mentioned earlier, the term unstructured data leads us directly to the concept of the Data Lake. Today, it is also a widely used and well established concept in data management. It was introduced to address the challenges of rapidly growing data volumes and to meet the increasing demand to leverage unstructured data for various use cases. This trend has also been enabled by the fact that cloud storage has become significantly more affordable.

A Data Lake serves a similar purpose to a Data Warehouse in that it stores data in a central repository. However, as a concept, it is more loosely defined. Gartner describes it as: “A Data Lake is a collection of storage instances for various data assets stored in near-exact or exact copies of source formats, complementing original data stores.”

Some may assume that the concept of a Data Lake is particularly attractive because it stores data in its raw format and therefore does not require modeling or additional overhead. We would like to clarify that this is a misconception. Data Lakes tend to turn into data swamps if data is simply dumped without structure or control. They therefore also require strong data modeling and governance concepts, which we will discuss in our next blog post.
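
As a rough illustration of "raw copies plus control", the following Python sketch lands a file unchanged and records a catalog entry for it. Everything here is an assumption made for the example: the folder layout `raw/<source>/<load date>/`, the catalog fields, and the function name `land_raw_file`. A temporary local directory stands in for cloud object storage.

```python
import hashlib
import tempfile
from datetime import date
from pathlib import Path


def land_raw_file(lake_root, source, filename, payload, load_date, catalog):
    """Store an exact copy of the payload and register it in the catalog."""
    target_dir = Path(lake_root) / "raw" / source / load_date.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)  # byte-for-byte copy, no transformation
    entry = {
        "path": str(target.relative_to(lake_root)),
        "source": source,
        "load_date": load_date.isoformat(),
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    catalog.append(entry)  # without this entry, the file is just "dumped"
    return entry


catalog = []
lake_root = tempfile.mkdtemp()  # stand-in for object storage
payload = b"%PDF-1.7 example bytes"
entry = land_raw_file(lake_root, "invoices", "inv-001.pdf", payload,
                      date(2024, 1, 15), catalog)
```

The data itself stays in its source format, but the predictable path and the catalog entry (origin, load date, checksum) are a first, very small piece of the structure and governance that keeps a lake from becoming a swamp.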

Data Lakehouse? That Sounds Like a Dream Vacation Destination!

Data Lakehouse: Bringing Together the Best of Data Lake and Data Warehouse

This is probably the most stylish-sounding name of them all. As you may have noticed above, the boundaries between the concepts of Data Warehouse and Data Lake have become increasingly blurred as technologies have evolved in practice. It is no longer easy to distinguish between them, for example based on the type of data stored, since these categories now overlap. A widely accepted convention is that a modern data platform should include both Data Lake and Data Warehouse capabilities. One of the roles of a Data Lake is often to serve as a universal collection point for all data. For instance, the term Data Lake may be used to describe the landing zone of a Data Warehouse, meaning the stage into which data from various sources is replicated before it is modeled.

This demonstrates that the terms can no longer be clearly separated. The best-known combined technological concept today is the Data Lakehouse. As the name suggests, it combines elements of both the Data Lake and the Data Warehouse. Essentially, it is a sophisticated-sounding term for something that is quite logical from an architectural perspective.

Data Mart: Hard to Avoid

Data Mart: A Key Component in Data Management Projects

In your data management projects, you will most likely encounter a Data Mart, especially if you model your data properly, which we will discuss in the next blog post. A Data Mart can be considered a subset of a Data Warehouse. Its purpose is to serve a specific use case. Your Data Warehouse may contain a set of harmonized and normalized data that is relevant for multiple use cases. A Data Mart prepares this data in a way that fits the needs of the business and its users and is suitable for the target system, for example various BI tools. This means there may be multiple Data Marts based on the same underlying data, each structured differently depending on its purpose.
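
The idea of several Data Marts on top of the same data can be sketched with Python's built-in `sqlite3` module. The `sales` table, its columns, and the two view names are hypothetical, and SQL views are only one possible way to build a mart. The point is that both marts read the same warehouse table but shape it for different consumers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Harmonized warehouse table (illustrative data)
CREATE TABLE sales (order_id INTEGER, region TEXT, product TEXT, amount REAL);
INSERT INTO sales VALUES
  (1, 'North', 'Widget', 100.0),
  (2, 'North', 'Gadget', 50.0),
  (3, 'South', 'Widget', 70.0);

-- Mart for a regional sales dashboard
CREATE VIEW mart_sales_by_region AS
  SELECT region, SUM(amount) AS revenue
  FROM sales GROUP BY region;

-- Mart for product management, built on the same underlying data
CREATE VIEW mart_sales_by_product AS
  SELECT product, COUNT(*) AS orders, SUM(amount) AS revenue
  FROM sales GROUP BY product;
""")

by_region = con.execute(
    "SELECT region, revenue FROM mart_sales_by_region ORDER BY region"
).fetchall()
by_product = con.execute(
    "SELECT product, orders, revenue FROM mart_sales_by_product ORDER BY product"
).fetchall()
```

A BI tool for regional managers would query only `mart_sales_by_region`, while a product team would use `mart_sales_by_product`. Neither needs to know how the underlying `sales` table is modeled.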

Data Mesh: The New Kid on the Block

Exploring Data Mesh: Decentralized Data Management in 2024

Data Mesh is one of the newer terms covered in this blog post. It was introduced by Zhamak Dehghani in 2019. It is important to note that Data Mesh is not a successor to a Data Warehouse, a Data Lake, or any combination of the two. It is also not a universal solution that will solve every problem. Until now, most analytics platforms have been built around a centralized system managed by a central data team. However, this approach can become a bottleneck when analytical demands increase and the data team is unable to handle all requests. Data Mesh therefore follows the idea of a decentralized, domain-oriented architecture. Domain teams take responsibility for their own data and its management. Data is published as a product for consumers outside the domain. The central data team enables domain teams to create and use data products by providing standards, governance, and infrastructure support. Strong governance principles and standardization are essential for this approach to succeed.
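
The division of responsibilities described here can be sketched in a few lines of Python. This is only an illustration, not a reference implementation of Data Mesh: the `DataProduct` and `Catalog` classes, the required metadata fields, and the example domain are all assumptions. The domain team owns and describes its product, and the central platform enforces a shared standard at publication time.

```python
from dataclasses import dataclass, field

# Hypothetical central standard: every data product must declare these.
REQUIRED_FIELDS = ("name", "domain", "owner", "schema")


@dataclass
class DataProduct:
    name: str
    domain: str
    owner: str  # accountable domain team
    schema: dict = field(default_factory=dict)  # column name -> type


class Catalog:
    """Central registry provided by the platform team."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # Governance check: reject products with empty required metadata.
        missing = [f for f in REQUIRED_FIELDS if not getattr(product, f)]
        if missing:
            raise ValueError(f"missing required metadata: {missing}")
        self._products[(product.domain, product.name)] = product

    def find(self, domain, name):
        return self._products.get((domain, name))


catalog = Catalog()
catalog.publish(DataProduct("orders", "sales", "sales-team",
                            {"order_id": "int", "amount": "float"}))
```

Consumers outside the sales domain can look up `catalog.find("sales", "orders")` and see who owns the product and what it contains. A product published without an owner or schema is rejected, which is a very small stand-in for the governance and standardization this approach depends on.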

Data Powerhouse: The Term That Makes You Smile

Unlocking the Potential of a Data Powerhouse: More Than Just a Buzzword

To be honest, this is more of an informal expression than a formal concept. The intention is to keep the description clear and concise without overly complex terminology. The term “Data Powerhouse” refers to combining Microsoft’s Power Platform with a Data Warehouse or a Data Lake. It is a good example of our earlier point that it can be difficult to distinguish between a conceptual idea and a commercially branded offering. At the same time, some people use the term to describe what your company can become when you implement an efficient data infrastructure.

Which Fancy Name Should You Choose? The Right Data Management Strategy for Your Company

In the end, you can call it whatever you like as long as you implement it correctly, it fits your needs, and you can clearly explain it to everyone you work with so that you are all aligned. INFORM DataLab can support you on this journey no matter where you currently stand. If you would like to develop a data strategy and are wondering how it could benefit you and what value it could create, take a look at our data strategy offering. If you are ready to move beyond terminology and start building your data architecture, we are also happy to support you in that process.

Stay tuned for more helpful blog posts. In the next article, we will explore whether data modeling will still be relevant in 2024.

In our Data Vault Experience Workshop, you will learn about the principles and practical applications of Data Vault. You will gain an understanding of the importance of data modeling, its impact on an agile BI layer, and the building blocks of the Data Vault approach. Through hands-on exercises, you can exchange ideas with other data professionals and apply your knowledge in practice. Register now!