By Chris Probert, partner and data practice lead at Capco
Banks have always held vast amounts of data within their organizations, but with the recent exponential growth in data volume, the efficient use and governance of data have become firmly established as a source of competitive advantage for financial institutions.
The ability to identify, monitor, interpret, and extract value from data is something that many organizations have historically struggled to achieve, mostly due to poor tracking of data across the enterprise. Data lineage is a tool for tracking data from its origins, through any transformations it may undergo, to its ultimate consumption.
Financial institutions have an opportunity to deliver major value by using data lineage to provide benefits along three dimensions: regulatory compliance, control optimization and cost reduction.
Why data lineage – stopping Chinese Whispers
To understand how data lineage is useful, it is important to acknowledge the obvious: data in a large organization is not held in a single repository, but rather flows across many systems and databases. As data spreads in this way, its quality, security and availability are put at risk. As anyone familiar with the game of Chinese whispers knows, despite the best intentions an original message can swiftly metamorphose into something completely different as it passes down the chain from person to person.
Information is no different – data can become compromised every time it enters a new system or database. Data lineage offers a solution to this issue, as it allows compromised data to be hunted down efficiently and quickly.
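To make the idea of hunting down compromised data more concrete, the following Python sketch models lineage as a simple graph in which each data element points to the upstream elements it is derived from. The node names, the `LINEAGE` dictionary and the `trace_upstream` function are hypothetical illustrations, not any particular lineage tool. A breadth-first walk returns every upstream candidate, nearest first, which is roughly the order in which an investigation would inspect them.

```python
from collections import deque

# Hypothetical lineage graph: each data element maps to the upstream
# elements it is derived from. All names are illustrative only.
LINEAGE: dict[str, list[str]] = {
    "risk_report.exposure": ["warehouse.exposure"],
    "warehouse.exposure": ["trading_system.trades", "ref_data.fx_rates"],
    "trading_system.trades": [],
    "ref_data.fx_rates": ["vendor_feed.fx_rates"],
    "vendor_feed.fx_rates": [],
}


def trace_upstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return every element upstream of `start`, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # skip elements already visited (guards against cycles)
            continue
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


# Candidate sources to inspect if the reported exposure figure looks wrong.
print(trace_upstream(LINEAGE, "risk_report.exposure"))
```

On the sample graph the walk lists the warehouse table first, then the trading system and reference data, and ends at the vendor feed. A real lineage repository would typically hold much richer metadata per hop; this sketch keeps only the dependency structure.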
Traditional data lineage and challenges
Data lineage rose to prominence in the wake of the financial crisis, as regulators sought evidence to substantiate the veracity of banks’ stress-test reporting. Since then, regulations such as the Markets in Financial Instruments Directive II (MiFID II) and the General Data Protection Regulation (GDPR), among others, have required financial institutions to implement data lineage procedures to demonstrate the reliability of their reporting.
However, data lineage is currently underutilized. It focuses largely on the mechanical movement of data rather than its contextual flow, and it is often aimed at an IT audience, mapping technical data. By falling back on this traditional approach, businesses fail to tap the full potential of lineage: the insights that come from greater clarity around how data moves and is transformed.
For data lineage to be meaningful beyond mere regulatory compliance, traditional lineage must expand to address the following dimensions: who, what, where, when, why, and how. Establishing industry standards or accepted practices would help provide a structure for capturing these different dimensions and drive business value.
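One way to give the six dimensions a concrete structure is to capture a record for each data movement. The Python sketch below is a hypothetical schema (the `LineageRecord` class and its field names are assumptions, not an established industry standard), together with a helper that reports which dimensions were left blank, as one simple way of spotting documentation gaps.

```python
from dataclasses import dataclass, fields


@dataclass
class LineageRecord:
    """One hop in a data flow, annotated with the six dimensions.

    Field names and types are an illustrative assumption, not a standard.
    """
    who: str    # accountable owner of the hop (person, team or role)
    what: str   # data element being moved, e.g. a critical data element
    where: str  # source and target systems, e.g. "core_banking -> warehouse"
    when: str   # timing or frequency, e.g. "daily batch"
    why: str    # business purpose of the movement
    how: str    # transformation applied along the way


def missing_dimensions(record: LineageRecord) -> list[str]:
    """List the dimensions left blank, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


record = LineageRecord(
    who="Finance Data Office",
    what="customer_balance",
    where="core_banking -> warehouse",
    when="daily batch",
    why="",
    how="summed across accounts per customer",
)
print(missing_dimensions(record))  # ['why']
```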
Three ways to make lineage useful
A big challenge facing organizations that want to use data lineage is the lack of standards for depicting it. Lineage is represented in very different ways, ranging from spaghetti diagrams, which overwhelm a business audience, through to process diagrams that often leave out the data, and technical representations of architecture and infrastructure that obscure the nuances of transformation.
There are three critical guiding principles to make data lineage useful and standardized: make it business friendly; highlight context and ownership of data; and show how data is transformed and used.
Data lineage becomes more attractive to the business when its presentation is easy to read and understand, using business-centric nomenclature and easy-to-consume formats and applications. It is also important to show only those data elements that are critical to the output, or that would affect the quality of the output if compromised. These are typically referred to as ‘critical data elements’ (CDEs) or ‘key data elements’ (KDEs).
The next step is understanding the context of the data – vital to setting up standardized data lineage diagrams. Organizations need to start by determining the connections that show who owns the process, the application, and the data element. Given the size and complexity of financial institutions, this can be a gargantuan undertaking. Once the data ownership structure has been established, the enterprise can begin to align the process and create a visual diagram of data lineage.
Visualizing the process by looking at the connections – application ownership, process ownership, data quality, data usage, access requests and the number of outstanding data issues – can help drive improvements, identify risk, and strengthen process governance.
Once each step is documented and unique data elements identified and validated by the relevant subject matter experts, an organization will have clarity around ownership of each stage of the process and the associated data elements. The enterprise will then be able to leverage process diagrams to better understand what happens to the data, not least how it is transformed and used throughout its lifecycle.
It is essential to know how data is moving, not just where it is coming from and going to. By determining the ‘who, what, when, where and how’ of data flow, we can answer questions about the risks involved, whether those risks can be mitigated, and how efficient existing data management and processing systems are. This leads to a more secure and streamlined data flow.
Cutting costs and boosting efficiency
Using data lineage, financial institutions can easily visualize the flow of data through systems and applications, allowing distinct patterns within an organization’s data, good or bad, to be identified.
Through data lineage, organizations have an opportunity to optimize costs and processes by minimizing repetition and redundancy within their systems. For example, during account set-up an organization can end up with three separate applications performing a single function. Wherever multiple applications handle the same data for the same task, that should serve as a red flag. This pattern occurs often in organizations undergoing rapid growth through mergers and acquisitions, or through the launch of new products and services.
Without data lineage, inefficiencies can remain hidden, leading to waste caused by inadequate data visualization and a lack of insight into applications. Data lineage allows organizations to clearly align “business area,” “process” and “applications” using visual diagrams that create a holistic picture of data management across the enterprise.
Once a redundancy in data processes is discovered, organizations can rapidly eliminate the associated wastage by enhancing one application to handle all closely related functions while retiring the rest.
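The red-flag pattern of several applications serving one function can be detected mechanically once lineage has been mapped to business functions. The Python sketch below assumes a hypothetical inventory of (application, function) pairs; the application names and the `redundant_functions` helper are illustrative only.

```python
from collections import defaultdict

# Hypothetical inventory derived from lineage: (application, business function).
APP_FUNCTIONS: list[tuple[str, str]] = [
    ("onboarding_portal", "account_setup"),
    ("legacy_crm", "account_setup"),
    ("acquired_bank_crm", "account_setup"),
    ("payments_hub", "payment_processing"),
]


def redundant_functions(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return each function served by more than one application, with those apps sorted."""
    apps_by_function: defaultdict[str, set[str]] = defaultdict(set)
    for app, function in pairs:
        apps_by_function[function].add(app)
    return {f: sorted(apps) for f, apps in apps_by_function.items() if len(apps) > 1}


print(redundant_functions(APP_FUNCTIONS))
# {'account_setup': ['acquired_bank_crm', 'legacy_crm', 'onboarding_portal']}
```

Each flagged function is a candidate for consolidation; deciding which application to enhance and which to retire remains a business judgment that the code cannot make.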
Patterns for risk management
Determining where to implement controls within a given data supply chain is crucial for maintaining data quality and integrity. Data lineage techniques can quickly identify inefficiencies and risks that arise in common data control methods.
Two types of controls play significant roles in maintaining the quality of data. The first is an accuracy control, which is best implemented at the system of origin (where data is first created or entered). The second is a consistency control, which supports and maintains the accuracy of the data throughout the entire data supply chain. The recommended implementation of the consistency control is in applications downstream from the system of origin.
When effectively maintained, these two controls reduce inefficiency and duplicate controls, thereby improving data quality. However, organizations often build unnecessary accuracy controls into downstream systems. A classic example is a data warehouse or data lake, where consistency controls would be the better choice. This causes data in the warehouse or lake to diverge from the system of origin, creating a maintenance nightmare and driving up costs.
Data lineage allows organizations to determine correctly which applications require accuracy controls and which need consistency controls. Where controls have already been implemented, lineage will highlight any redundancy, allowing the organization to rationalize its controls and save costs.
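The placement rule of accuracy controls at the system of origin and consistency controls downstream can be checked against a lineage map. In this simplified Python sketch, an application with no upstream sources is treated as a system of origin, and every downstream application is expected to carry a consistency control; both are assumptions, as are the hypothetical `UPSTREAM` and `CONTROLS` inventories and the `review_controls` function.

```python
# Hypothetical lineage map: application -> applications it receives data from.
UPSTREAM: dict[str, list[str]] = {
    "core_banking": [],  # no upstream sources, so treated as a system of origin
    "warehouse": ["core_banking"],
    "reporting": ["warehouse"],
}

# Hypothetical control inventory: application -> control types currently in place.
CONTROLS: dict[str, set[str]] = {
    "core_banking": {"accuracy"},
    "warehouse": {"accuracy", "consistency"},
    "reporting": set(),
}


def review_controls(upstream: dict[str, list[str]],
                    controls: dict[str, set[str]]) -> list[str]:
    """Compare controls in place against the origin/downstream placement rule."""
    findings: list[str] = []
    for app, sources in upstream.items():
        is_origin = not sources
        in_place = controls.get(app, set())
        if is_origin and "accuracy" not in in_place:
            findings.append(f"{app}: system of origin has no accuracy control")
        if not is_origin and "accuracy" in in_place:
            findings.append(f"{app}: downstream accuracy control, consider a consistency control instead")
        if not is_origin and "consistency" not in in_place:
            findings.append(f"{app}: downstream application has no consistency control")
    return findings


for finding in review_controls(UPSTREAM, CONTROLS):
    print(finding)
```

With the sample inventory, the warehouse is flagged for its downstream accuracy control and the reporting application for lacking a consistency control.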
As data flows from one application to the next within an organization there is an inherent security risk. The most vulnerable applications are those that are external-facing or vendor-hosted. Data lineage can spotlight such applications and in doing so lift the veil on security risks. Existing data controls can be reassessed and enhanced as necessary, reducing cyber risk and protecting data as it moves across the enterprise.
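A similar check can surface security exposure along a data path. The Python sketch below assumes a hypothetical application inventory in which each application is tagged as external-facing and/or vendor-hosted, plus a single lineage path for one data element; `App`, `APPS`, `PATH` and `exposed_hops` are all illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """Hypothetical application record with the two risk attributes discussed."""
    name: str
    external_facing: bool = False
    vendor_hosted: bool = False


# Hypothetical inventory and one lineage path (in flow order) for a data element.
APPS: dict[str, App] = {
    "core_banking": App("core_banking"),
    "warehouse": App("warehouse"),
    "crm_saas": App("crm_saas", vendor_hosted=True),
    "client_portal": App("client_portal", external_facing=True),
}
PATH: list[str] = ["core_banking", "warehouse", "crm_saas", "client_portal"]


def exposed_hops(path: list[str], apps: dict[str, App]) -> list[str]:
    """Return applications on the path that are external-facing or vendor-hosted."""
    return [name for name in path
            if apps[name].external_facing or apps[name].vendor_hosted]


print(exposed_hops(PATH, APPS))  # ['crm_saas', 'client_portal']
```

The flagged hops are where existing data controls would be reassessed first.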
Data lineage will continue to evolve. At the same time, its power to help organizations think more clearly about their data, control frameworks and process optimization is yet to be fully realized.
Financial institutions that successfully leverage data lineage will drive value through cost reductions arising from the elimination of redundancies and unnecessary manual processing, while simultaneously mitigating risks related to data quality, integrity and security through better controls.
Adopting common standards and injecting contextual content will facilitate the implementation of lineage in a business-friendly fashion and spur adoption. In today’s world of Big Data, data lineage offers organizations quantifiable business value as they move towards harnessing data as a source of competitive advantage.