Data has always been critical to business: it guides decisions, which is why managing data growth is so integral to a company's success. Be it transactions marked down on Sumerian clay tablets, double-entry account records from Victorian-era England, or a database of website statistics today, data and its collection are foundational to the decision-making process. From small, innocuous beginnings it grows and changes, consuming ever more time and resources until it becomes overwhelming. This transition point happens at different times for different kinds of businesses, and how it manifests can be equally diverse. In the past, the main approach has been to simply throw more money at the problem, in the form of personnel, technology, or services, often without understanding what the underlying data management challenge actually is. Let's review some general data growth management best practices that help control the chaos and improve how data is used to make decisions.
Today there are many easy ways to store, retrieve, and analyze information, which can quickly become a double-edged sword of benefits and necessary maintenance. In many companies there is simply too much going on for key decision-makers to manage data growth down to the record level, so they logically delegate these tasks to the individual department and engineer level. Left to their own devices, the various internal departments or business units (accounting, purchasing, shipping, sales, marketing, customer support, IT, and so on) go about their specialized tasks, selecting and using tools suited to their respective data use cases. This can lead to fragmented and siloed data, where, for example, a customer may be identified as ‘XYZ’ in Sales, ‘XYZ Corp’ in Support, and ‘XYZ, Inc’ in Accounting. Each of these entries contains duplicated data along with unique data, and each group must maintain its own records whenever changes occur. Names, addresses, phone numbers, emails, titles, labels, and the like can and do change over time. Then there are internal changes and updates for products and services, resellers, and heaven forbid a customer is also a vendor. Finally, there are mergers and acquisitions, which don't just add to the data growth issue, they compound it.
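The kind of fragmentation described above is often tackled with entity normalization: reducing each variant of a name to a shared key so records can be matched across departments. The snippet below is a minimal sketch of that idea; the suffix list and the `canonical_name` helper are illustrative assumptions, not a production-grade matcher.

```python
import re

# Hypothetical list of legal suffixes to strip; a real system would use a
# curated, much longer list and fuzzy matching on top of this.
LEGAL_SUFFIXES = re.compile(r"[,.]?\s*\b(corp(oration)?|inc|llc|ltd)\.?$",
                            re.IGNORECASE)

def canonical_name(name: str) -> str:
    """Reduce a company name to a comparable key."""
    return LEGAL_SUFFIXES.sub("", name.strip()).strip(" ,.").lower()

# The three departmental variants of the same customer collapse to one key.
records = ["XYZ", "XYZ Corp", "XYZ, Inc"]
keys = {canonical_name(r) for r in records}
```

A shared key like this is what lets an architect merge the Sales, Support, and Accounting views of ‘XYZ’ into a single customer record.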
While tools can help collect and manage growing data, effectively maintaining and utilizing all of this information requires an overall data architect. Many companies treat this as an extension of the duties of the Chief Information Officer or Chief Technology Officer, while others have created a new Chief Data Officer position to guide the overall data situation. Quite often the role is delegated to a database administrator (DBA), but this is generally not a good idea in the long term. While DBAs may have a solid understanding of the technical side of data organization, storage, and access, there are additional factors on the business side that need to be considered, such as the data's context within each business unit, how the data is used inside that unit, and how it is expected to be used for analysis and reporting. A data architect therefore needs a good understanding not only of managing data growth, but also of the nature of the data relationships, the data's organization, and how the data is expected to be used in each business unit. This extends well outside the traditional technical realm and can often involve extensive discussions on defining which terms are used and what they actually mean.

In many large organizations the task of generating reports falls either to each individual business unit or to a small group, permanent or ephemeral, assembled for that purpose. Because of this, assembling an effective high-level report might require traversing multiple reporting tools, formats, and platforms, adding time and effort. Without an architect guiding the process, loss of context is a greater risk, and employee turnover has a greater impact on the report generation process.
One of the main tools for organizing this information is a Single Source of Truth (SSOT) database. While this is a fantastic tool for combating the data rat's nest, it comes with its own set of challenges, especially if such a design was not implemented from the start and requires additional data migration and integration with internal tools. An SSOT makes controlling data much easier, reducing duplicate data and facilitating audits, reporting, and access. Note that this is not a single data source to which all the tools connect directly; rather, it is a repository into which the various and sundry systems in the company can deposit their unique data. Use of the centralized repository drives consistency in how data is used, reduces redundancy, and standardizes the relationships between data sets.
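To make the deposit model concrete, here is a minimal sketch in which an in-memory dictionary stands in for the SSOT repository; the customer key, source names, and fields are all hypothetical. Each system contributes only its unique data under a shared key, rather than keeping a full private copy of the customer record.

```python
# Minimal in-memory sketch of an SSOT repository. Each source system
# deposits only its unique fields under a shared customer key, so
# identity data is not duplicated across systems. All names here are
# illustrative assumptions, not a real schema.
ssot: dict[str, dict] = {}

def deposit(customer_id: str, source: str, fields: dict) -> None:
    """Record one system's unique data under the shared customer key."""
    record = ssot.setdefault(customer_id, {"sources": {}})
    record["sources"][source] = fields

# Sales and Support each contribute their own view of the same customer.
deposit("cust-001", "sales", {"contact": "a@xyz.example"})
deposit("cust-001", "support", {"open_tickets": 2})
```

The point of the design is that there is exactly one record per customer, with each department's contribution attributed to its source, which is what makes audits and cross-department reporting tractable.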
I have been asked, “Wouldn’t this SSOT just be a data warehouse?” The short answer is yes, but with a couple of major caveats. A data warehouse is indeed a repository of information from disparate sources, but an SSOT carries the additional requirement of ongoing maintenance and normalization. It’s not a fire-and-forget solution. Data in business is constantly changing and evolving, and it’s up to the data architect to monitor and maintain the system to make sure it remains the Source of Truth for the company’s data. Without the data architect role and its associated responsibilities being fulfilled, it would indeed be just another data warehouse, and the analytics and reports generated from the information it contains would require additional validation steps, reducing the utility of the system as a whole. Likewise, a data lake can also serve as an SSOT, but it requires the same maintenance and normalization as its data warehouse counterpart.
Recently, inroads have been made in using Artificial Intelligence and Machine Learning (AI/ML) to help analyze the prodigious amounts of data generated by large organizations; Amazon’s SageMaker is a good example. Even the application of such impressive technology doesn’t replace the need for a data architect to orchestrate the process of managing data growth. In fact, introducing AI/ML to the mix adds the need for a dedicated data scientist role alongside that of the data architect.
The complex and evolving nature of business data growth means that it cannot be managed through tools and technology alone. Proper organization and ongoing analysis and maintenance are critical to the effective use of the vast amount of information available. The old records vaults of the past, which required an army of scribes, accountants, and librarians to maintain, are not gone; they now exist as 1s and 0s within a vast virtual construct. Now, however, the task of ordering and understanding this information is the purview of the expert data architect.
LucidPoint Sr. Cloud Engineer