A recent study by KPMG revealed that executives do see the value in data and analytics, but 85% face challenges in implementing the right solution to analyze and interpret their existing data.
This is the first article in a series highlighting best practices for overcoming the challenges of implementing Big Data initiatives and full data-driven company transformations. We will focus on real-world examples and ask industry practitioners from the business, technical and analytical sides to share their experience and advice throughout the series.
To start off, we asked Adam Drake, Chief Data Scientist and Director of Engineering at Zanox, to give us his view on the current issues in implementing strategies to handle ‘Big Data’ and why corporations keep failing at it.
1. Misconception of Big Data concept and application
Although most Big Data projects are driven by the realization that there is inherent value in improved data and analytics, the hype around the topic adds to the pressure on executives to start their own initiatives. They try to avoid accusations of either missing a chance to gain competitive advantage or of having wasted valuable resources on storing data that remains unused and is thus worthless.
A successful data initiative must always be guided by clear business objectives and well-defined performance metrics. Only if these are made clear and communicated well to the organization can the right analytical techniques and tools be selected.
The issue of unguided Big Data projects mostly stems from the lack of understanding of what Big Data really is and how it compares to standard analytics techniques, but especially of when to choose one approach over the other.
From a more financial perspective, “wanting to capitalize an investment into Big Data infrastructure rather than hurting operating performance by expensing additional standard database technology can lead to companies buying and setting up big Hadoop clusters without a clear use case”, Adam Drake notes. When the need for a nameplate ‘Big Data’ initiative, rather than the business problem at hand, guides infrastructure and analytics decisions, the result can be ill-conceived investments that may never pay off.
In many or even most cases, modern installations of traditional relational databases will be sufficient. They tend to be cheaper and faster to bring to market than a fully fledged Big Data infrastructure (think not of the Amazon cloud, but of a physical Hadoop cluster with hundreds of nodes in your own datacenter).
Another common misconception is that hiring an army of highly skilled and well-paid Data Scientists will make magic happen and reap huge profits for the company. “This often turns out to be wishful thinking, as any successful data initiative highly depends on a strong governance across departments and stakeholders, even more than on the infrastructure itself”, Adam Drake says.
2. Lack of sponsorship
While implementing an isolated solution in one business unit will require less cooperation and ultimately less governance compared to a full data-driven company transformation, the challenges are similar.
The main challenge is often resistance from certain functions and/or a lack of acceptance at lower levels of the organization. So where does the resistance come from in the first place? A new IDC report on the Big Data industry finds that “Decision automation solutions based on Big Data technology will increasingly begin to replace or significantly impact knowledge worker roles”. In effect, this means that an efficient data-driven decision-making process will require fewer people or reduce the authority of certain roles within an organization.
When thinking about introducing Big Data initiatives into their business, many executives seem to focus only on the technological and analytical challenges and forget to provide for the right organizational foundation and data-related governance.
In fact, most organizational structures are not designed to foster the seamless data governance needed for a larger analytics initiative, let alone a data-driven transformation. Their management does not yet have a strong data-driven mindset, and there is no strong sponsor who makes data-driven decision making a priority.
This shows in the fact that, although many larger organizations have by now appointed Chief Data Scientists or Chief Data Officers, these often report to the COO or another C-level executive and can therefore be biased or lack the authority to implement cross-functional initiatives effectively. “Especially for larger initiatives and transformations, it is essential to secure senior sponsorship; a senior supervisory board member is preferable and most effective”, Adam Drake says. Since transforming corporate thinking and culture to be more data-driven cuts across multiple departments and units, it is critical to have a top-level data position that is unbiased and has sufficient authority to enact policy.
3. Low quality or unusable data
From many data audits, Adam Drake knows that “there are two extremes to the spectrum of how data-driven an organization already is: On one end, data is used only to validate management decisions after the fact, while on the other end there is what you might call ‘data nirvana’, a state in which all decisions are purely based on hard data. Most organizations I’ve seen were stuck somewhere in between, and have a long way to go before reaching their ideal state”.
To move from using data for sporadic validation to making all decisions purely based on data, a number of process-related challenges need to be overcome. Executives often believe that they have relevant data available or that they could make it available if they wanted. “Too often this turned out to be a false belief”, Adam has found. “Sometimes they are willing to provide the data but don’t have it available as they had thought and sometimes you even run into organizational barriers where people are not willing to share what you need”.
For a successful implementation, a change agent must own the entire data value chain, i.e. the collection, storage, analysis and handling of data across all functions and business units involved.
Quality is another issue. In many cases the data is of low quality or needs significant cleaning before any analysis can be performed, which slows the process and causes initial results to take far longer than anticipated. Cleaning it up also affects the business processes that handle and work with the data, and again involves a large number of stakeholders, all of whom may have different motives and goals.
Only now do technological considerations come into play, once the use case, organizational setup and data availability have been established in the previous steps.
So what’s next?
While IDC predicts the Big Data technology and services market will grow at 27% CAGR through 2017 to a total of $32.4 billion, Svetlana Sicular of Gartner pointed out in her blog about a year ago that ‘Big Data’ was falling into the trough of disillusionment on Gartner’s Hype Cycle, as high expectations have not always been met. This is often caused by organizations underestimating the magnitude of the cultural change required to become more data-driven, and it usually shows up as a transformation started without granting sufficient authority to a change agent.
Still, we believe that going forward there is tremendous potential in using data and analytics in business and that the industry will mature fast. We want to use this blog to focus on the issues and shortcomings of current uses of Big Data in the corporate world and present industry best practices from people who have been there.
Please watch out for more on this soon.
Adam is currently the Chief Data Scientist and Director of Engineering for Zanox, Europe’s largest affiliate network. He has been in technology roles for over 17 years in a variety of industries, including online marketing, financial services, healthcare, and oil and gas. His background is in Applied Mathematics, and his interests include online learning systems, high-frequency/low-latency data processing systems, recommender systems, distributed systems, and functional programming.
(Image credit: Brandon Grasely)