Why do you need a data warehouse? You’ve already got a database, and it has all of your business information inside it. You’ve been getting reports from this database for years, why incur the additional expense of a data warehouse too? What is a data warehouse and its difference to a database?
A database, to most, is a repository of information from an application, typically a single application but sometimes from many. This database is designed to make transactional systems run efficiently and is typically an OLTP (online transaction processing) database. It allows concurrent access to the data in real-time in a secure manner while maintaining integrity, reducing redundancy, and restricting access to prohibited data.
So, if you were to ask us if what is a data warehouse, it is another layer added to an existing database or databases. It is designed to effectively and efficiently perform analytical requests on the data. A data warehouse imports data from one or more source datasets into its data structure. Regardless of the internal designs and processes underneath the data warehouse, it can now be used to run complex and multidimensional queries on the data without affecting the production data environment, and without being affected by such. These queries and reports can also run more quickly due to the way the data is structured, and because a data warehouse can accept data from multiple sources, analysis and reports can be run against a broad range of data, such as sales, and customer support, and application. Additionally, what a data warehouse is used for is storing historical data for longer periods of time while still being available for comparative analysis. This is especially true when dealing with trending data, where you don’t need the historical information or metadata cluttering the transactional/primary database, but which could be used to monitor growth trends. A growing use of this kind of warehouse is the use of AI/ML (Artificial Intelligence/Machine Learning) to analyze vast amounts of historical data to help make better-informed business decisions.
Alongside knowing what a data warehouse is, it is also important to note that many of the characteristics behind a transactional database do not work well with analytics. As the amount of data grows with the business, those reports used to make business decisions have been taking longer and longer to run. Likewise, a data warehouse does not work as a primary database as its data is not easily amenable to rapid or atomic change. In fact, one of the more common ways to utilize a data warehouse is with transient data, where data sources are loaded periodically from snapshots, analyzed, and then purged for the next batch of data. How is a data warehouse structured? The internal structure and organization of a data warehouse can differ from a traditional database, up to and including columnar structure, parallel, sharded, and clustered processing, which allows for greater processing ability. The benefits of a columnar vs row structure in a data warehouse will be covered in a future post.
The key reasons to use a data warehouse over a collection of disparate data sources are that it allows complex queries to be executed outside of the production environment, it allows related information spread across multiple sources to be analyzed together more easily, and it speeds up the analytical and reporting processes, especially when there are massive amounts of data involved. The best reason to use a data warehouse I have seen is that it off-loads the processing from the primary or production database, freeing up those resources and allowing the analysis and reporting to be run at any time. A data warehouse set in the cloud can also be spun down when not needed, and additional resources can be allocated as needed, both of which can result in significant cost savings without impacting either the production database or the data warehouse.
LucidPoint Sr. Cloud Engineer