Don’t let your data lakes become mucky, stagnant swamps

By Chiara Pensato, VP of International Marketing, Alteryx

Big data continues to transform every facet of business. The landscape is becoming increasingly complex, and the amount of data is now so vast that some analysts suggest businesses are drowning in data yet starved of insights. With the Internet of Things promising to deliver a further influx, businesses will have to contend not just with streams of data but with large, unordered data lakes.

Without proper structures in place to deal with unstructured, messy data, lakes can quickly stagnate into vast, murky swamps. In fact, according to analyst firm IDC, 90% of an organisation’s unstructured data is never analysed. IDC refers to this as “dark data”.

Organisations are clearly trying to make sense of all this data, judging by the fact that the market for big data management solutions is expected to grow 12.8% by 2021. The growth of data lakes is due in part to the influx of social data and data collected through connected devices, drawn from a multitude of sources such as self-service analytics, cloud data, device-driven data, financial data and everything in between.

For companies to remain competitive, no matter the industry, analysts across business units will need to make sense of all these data sets. Manual solutions are falling short in the age of big and distributed data: the volume, velocity, variety and distribution of data and analysis across on-premise, in-cloud and hybrid environments make it impossible to capture and maintain data catalogues by hand, as in times past. So how can companies ensure that the right people are activating the right data to better the business, instead of leaving it to drown in the swamp, unloved and unused?

Cataloguing – if data sets are stored in various places and are not easily accessible, those working with the data will spend more time searching for specific assets, creating inefficiencies and slow turnaround. Once the data is catalogued, those accessing and activating it can work with the most up-to-date information. From there, a single version of each analysis can be created, so that similar questions asked in different units receive the same data-driven answer, rather than multiple versions that share a theme but are supported by different data sets.

Governance – in the data swamp, all data is treated equally. In reality, certain data points are more important than others, depending on the sensitivity of the content or the level at which the data is used. Trusted, official data of record needs to be differentiated from data that is available for general consumption. When authorised analysts can discuss the data, determine who produced it and know who is responsible for updating it, the organisation can be confident that the data is approved and agreed upon. Through proper governance, companies can ensure that the data used across business units is accurate and vetted.

Collaboration – a company’s entire data collection may be the combined effort of many lines of business, but in the current state of data intelligence those units rarely discuss the insights they derive from analysing the data. With so many data sources available, analysts across the enterprise may end up stuck in the details of all the possibilities rather than picking a clear goal and driving towards it.

It’s therefore crucial for the enterprise to usher in a culture of data, encouraging all members of the enterprise to participate, irrespective of their skillset. The proliferation of cloud technology makes this possible. With the power of data platforms, employees will be able to share live, interactive workbooks and data sources to drive decisions, answering each other’s questions and building on each other’s work. Right now, 50% of data professionals lose time to inefficient, repetitive tasks. Increased collaboration, supported by a data platform and catalogue, will go a long way towards solving this.

In addition, employees across the enterprise will be able to experience the thrill of solving problems they have been unable to solve before. The moment when you crack a problem once thought impossible brings a rush of excitement like no other. Solving problems with data is not just a starting point for business growth; it is also a stimulating exercise in which everyone can get involved.

Data analysts want to point to data to resolve issues. But without proper cataloguing, governance and collaboration, they cannot gain a deep understanding of what data they have, what data is being used and who is using it.
