These days, every business is exploring ways to use data and new technologies to gain a competitive edge. While there’s no questioning the value to be found in big data analytics, organizations have so far had a low success rate when rolling out data initiatives. A recent Capgemini study, for instance, found that only 13 percent of big data initiatives at the organizations surveyed had achieved full-scale production, and only 27 percent of the executives surveyed described their big data initiatives as “successful.”
In light of this, what should organizations be thinking about as they roll out their own big data initiatives?
Make the data accessible
It may sound obvious, but data isn’t useful if it isn’t put in the hands of the teams that can make use of it. The more people who have access to data throughout an organization, the more value it can deliver.
Yet organizations still struggle with this for various reasons. To name a few: the tools are too complex; each team needs different tools; the administrative strain that big data deployments place on IT is too great; or getting teams trained and up and running takes too long. A common complaint from companies declaring failure in their big data projects is a “lack of talent.” To a great extent, this talent gap can be attributed to the fact that it is genuinely difficult to find data scientists and analysts who can also code in the various languages needed to navigate the Hadoop ecosystem. Hadoop was built by developers, for developers, but cloud-based tools are now emerging that abstract away this added technical complexity, letting data teams focus on data problems rather than being pulled into programming or infrastructure issues.
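To make that abstraction concrete, a hosted SQL interface lets an analyst query data sitting in a Hadoop cluster without writing any MapReduce code. The sketch below uses the open-source PyHive client against a HiveServer2 endpoint; the hostname, username and table are hypothetical placeholders, and a managed cloud service would typically hide even this connection setup.

```python
# Minimal sketch: querying Hadoop-resident data with plain SQL via PyHive.
# The host, username and table below are hypothetical; substitute your own.
from pyhive import hive

conn = hive.connect(host="hive.example.internal", port=10000, username="analyst")
cursor = conn.cursor()

# An analyst-friendly SQL query instead of hand-written MapReduce jobs.
cursor.execute("""
    SELECT country, COUNT(*) AS signups
    FROM signups
    WHERE signup_date >= '2016-01-01'
    GROUP BY country
    ORDER BY signups DESC
    LIMIT 10
""")

for country, signups in cursor.fetchall():
    print(country, signups)

cursor.close()
conn.close()
```

The point is less the specific client than the interface: when the entry point is SQL rather than cluster programming, far more of the organization can work with the data directly.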
In an ideal world, organizations would have a completely self-service platform with a range of the latest tools and technologies that can be used as needed for the task at hand. This reduces barriers to insights while making teams more productive with their data analysis.
Make the data more collaborative
In addition to getting data into the hands of more teams, fostering collaboration around the data drives greater value from it in three ways. First, it promotes better insights, as more brains ponder the problem at hand and dig into the results from different perspectives. Second, it reduces duplicate effort, so that people on the same or even different teams aren’t running the same queries in silos; that saves man-hours and can also save money and compute resources. Third, it creates more efficient and transparent relationships among teams, clients and partners when you’re able to share results and collaborate on analysis with them.
Plan to scale
No matter how small your organization or datasets are now, they will grow—rapidly.
Your infrastructure needs to scale too, and you need to plan and invest ahead of your needs. That means adding hardware, software and staff resources to keep pace. For on-premises big data deployments, expanding the capacity and capabilities of the data infrastructure can take anywhere from weeks to months, which means you always need to be thinking and planning several months out.
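To make that lead time concrete, here is a small back-of-the-envelope sketch in Python. The growth rate, starting volume and procurement lead time are purely hypothetical figures chosen for illustration; the point is simply that with steady growth and a multi-month provisioning cycle, the hardware order has to go in long before the capacity is actually needed.

```python
# Hypothetical capacity-planning sketch: when must new storage be ordered?
# All figures below are illustrative assumptions, not benchmarks.

current_tb = 40.0              # data volume today, in terabytes (assumed)
monthly_growth = 0.15          # 15% growth per month (assumed)
installed_capacity_tb = 80.0   # capacity of the current cluster (assumed)
procurement_lead_months = 3    # time to buy, rack and tune new hardware (assumed)

volume = current_tb
for month in range(1, 25):
    volume *= 1 + monthly_growth
    if volume > installed_capacity_tb:
        order_by = max(month - procurement_lead_months, 0)
        print(f"Capacity exceeded in month {month}; "
              f"hardware must be ordered by month {order_by}.")
        break
```

With these assumed numbers the cluster fills up in about five months, so the purchase decision has to be made around month two, well before the shortfall is visible in day-to-day operations.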
Use the right technology for the job
It goes without saying that organizations should use the right tools and technologies for the job. For organizations deploying on-premises big data infrastructure, this is much easier said than done.
Organizations have a wide range of tools and technologies to choose from—Hadoop, Hive, Spark, Presto, Pig, etc.—and new technologies are popping up all the time. Each technology requires specialized expertise to deploy and integrate with existing systems, and carries its own costs and admin requirements.
Yet the more tools your teams have available, the more productive and effective their analyses will be. The challenge is weighing the comprehensiveness of a toolset against the cost, support and admin requirements of installing and maintaining it.
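As a rough illustration of why tool choice matters, the same question can be answered either with a SQL statement (well suited to ad hoc analyst queries on engines like Hive or Presto) or with Spark’s DataFrame API (better suited when the result feeds further programmatic processing). The PySpark sketch below shows both side by side; the storage path, table and column names are hypothetical placeholders.

```python
# Hypothetical comparison: the same aggregation as ad hoc SQL vs. a Spark pipeline.
# The path, table and columns are placeholders for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-event-counts").getOrCreate()

events = spark.read.parquet("s3://example-bucket/web_events/")
events.createOrReplaceTempView("web_events")

# Option 1: SQL, as an analyst might run it on Hive, Presto or Spark SQL.
daily_sql = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM web_events
    GROUP BY event_date
""")

# Option 2: the equivalent DataFrame pipeline, useful when the result feeds
# further transformations or scheduled jobs.
daily_df = events.groupBy("event_date").count().withColumnRenamed("count", "events")

daily_sql.show()
daily_df.show()

spark.stop()
```

Neither option is “better” in the abstract; the value comes from being able to reach for whichever one fits the task without a lengthy deployment project standing in the way.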
Putting it all together
Big data initiatives are ultimately about gaining new insights, as quickly as possible, from all of the data a company has access to in order to propel the business forward. To do this successfully, organizations need a big data infrastructure that is flexible, accessible and scalable.
When you look at all the requirements listed throughout this article, it becomes clear that those qualities map nicely to the characteristics of the cloud. On-premises deployments take months or even years to implement and tune. Scaling becomes a challenge because you have to plan well ahead of your needs and devote resources to adding capacity. Adding new technologies can be time-consuming and often requires special expertise (adding to costs).
Big data analytics in the cloud sidesteps these drawbacks of on-premises deployments while providing several advantages that speed up the time to value of data. For organizations of all sizes looking to become truly data-driven businesses, moving their big data initiatives into the cloud is the fastest and most effective way to achieve this.
With companies like Amazon, Google and Microsoft not only devoting significant resources to growing their cloud offerings but also aggressively rolling out big data services on top of them, it’s clear that the future of big data is in the cloud.
About Ashish Thusoo, CEO and co-founder of Qubole: Before co-founding Qubole, Ashish ran Facebook’s Data Infrastructure team, which built one of the largest data processing and analytics platforms in the world under his leadership. That platform not only achieved the bold aim of making data accessible to analysts, engineers and data scientists, but also drove the “big data” revolution. In the process of scaling Facebook’s big data infrastructure, he helped drive the creation of a host of tools, technologies and templates that are used industry-wide today.