For decades, managing data essentially meant collecting, storing, and occasionally accessing it. That has all changed in recent years, as businesses look for the critical information they can pull from the massive amounts of data generated, accessed, and stored in myriad locations, from corporate data centers to the cloud and the edge.
Given that, data analytics – helped by modern technologies such as artificial intelligence (AI) and machine learning – has become a must-have capability, and in 2022 its importance will only be amplified. Enterprises need to rapidly parse through data – much of it unstructured – to find the information that will drive business decisions. They also need to create a modern data environment in which to make that happen.
Below are a few trends in data management that will come to the fore in 2022:
Data managers will broaden their focus from structured data to unstructured data analytics
Traditionally, much of data science focused on feeding structured data into data warehouses. But with 90 percent of the world's data becoming unstructured, and with the rise of machine learning, which relies on unstructured data, data scientists should broaden their skills to incorporate unstructured data analytics. They need to learn to glean value from data that has no specific structure or schema and that ranges across video files, genomics files, seismic images, IoT data, audio recordings, and user data such as emails. Developing these skills, which involves staying current with and experimenting with new unstructured data analytics capabilities in data lakes as well as learning unstructured data management techniques, will be paramount in 2022.
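As a rough illustration of where unstructured data analytics often starts, the sketch below (plain Python, with a hypothetical mapping from file extensions to data categories) walks a directory tree and summarizes files by type, size, and age, the kind of lightweight metadata index that unstructured data management builds on.

```python
# Minimal sketch: build a lightweight metadata index over unstructured files.
# The extension-to-category mapping and paths are illustrative assumptions,
# not part of any specific product or standard.
import os
import time
from collections import defaultdict

# Hypothetical mapping from file extensions to broad data categories.
CATEGORIES = {
    ".mp4": "video", ".mov": "video",
    ".bam": "genomics", ".fastq": "genomics",
    ".segy": "seismic",
    ".wav": "audio", ".mp3": "audio",
    ".eml": "email", ".json": "iot", ".csv": "iot",
}

def index_files(root):
    """Walk a directory tree and summarize unstructured data by category."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0, "oldest": None})
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            category = CATEGORIES.get(ext, "other")
            try:
                stat = os.stat(path)
            except OSError:
                continue  # skip unreadable files
            entry = summary[category]
            entry["count"] += 1
            entry["bytes"] += stat.st_size
            if entry["oldest"] is None or stat.st_mtime < entry["oldest"]:
                entry["oldest"] = stat.st_mtime
    return summary

if __name__ == "__main__":
    for category, stats in index_files(".").items():
        age_days = (time.time() - stats["oldest"]) / 86400 if stats["oldest"] else 0
        print(f"{category}: {stats['count']} files, "
              f"{stats['bytes'] / 1e9:.2f} GB, oldest ~{age_days:.0f} days")
```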
‘Right data’ analytics will surpass Big Data analytics as a key trend
Big Data is often simply too big, creating data swamps that are hard to leverage. Precisely finding the right data in place, no matter where it was created, and ingesting only that data for analytics is a game-changer: it saves considerable time and manual effort while delivering more relevant analysis. So, instead of Big Data, a new trend will be the development of so-called “right data” analytics.
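To make the idea concrete, here is a minimal sketch, under assumed metadata fields and selection criteria, of what "right data" selection can look like: instead of ingesting an entire data lake, an analytics job queries a metadata index and pulls in only the files that match its category, project, and freshness requirements.

```python
# Minimal sketch: pick only the "right data" for an analytics job from a
# metadata index instead of ingesting everything. The index format, tags,
# and selection criteria are illustrative assumptions.
from datetime import datetime, timedelta

# Hypothetical metadata index entries (in practice these would come from a
# data catalog or an index like the one in the previous sketch).
index = [
    {"path": "/data/imaging/scan_001.dcm", "category": "medical-image",
     "project": "clinical-study-a", "modified": datetime(2021, 11, 2)},
    {"path": "/data/logs/app-2019.log", "category": "log",
     "project": "ops", "modified": datetime(2019, 3, 14)},
    {"path": "/data/imaging/scan_002.dcm", "category": "medical-image",
     "project": "clinical-study-a", "modified": datetime(2021, 12, 20)},
]

def select_right_data(index, category, project, max_age_days):
    """Return only the files a given analytics job actually needs."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    return [
        entry["path"]
        for entry in index
        if entry["category"] == category
        and entry["project"] == project
        and entry["modified"] >= cutoff
    ]

# Feed only clinical-study images to the pipeline, not the whole data lake.
print(select_right_data(index, "medical-image", "clinical-study-a", 3650))
```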
Storage-agnostic data management will become a critical component of the modern data fabric
A data fabric is an architecture that provides visibility into data and the ability to move, replicate, and access it across hybrid storage and cloud resources. Through near-real-time analytics, it puts data owners in control of where their data lives across clouds and storage tiers so that data can reside in the right place at the right time. IT and storage managers will choose data fabric architectures to unlock data from storage and enable data-centric rather than storage-centric management. For example, instead of keeping all medical images on the same NAS, storage pros can use analytics and user feedback to segment these files, such as copying medical images for access by machine learning in a clinical study or moving critical data to immutable cloud storage to defend against ransomware.
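A minimal sketch of what such policy-driven, storage-agnostic placement decisions might look like is shown below. The tier names, tags, and thresholds are illustrative assumptions; a real data fabric would execute the resulting plan against actual NAS and object-storage systems.

```python
# Minimal sketch of policy-driven, storage-agnostic placement decisions.
# Tier names, tags, and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    category: str           # e.g. "medical-image", "log", "sensor"
    tags: set                # e.g. {"clinical-study"}, {"critical"}
    days_since_access: int

def placement_plan(f: FileRecord) -> list:
    """Return planned data-management actions for one file."""
    actions = []
    if "clinical-study" in f.tags and f.category == "medical-image":
        # Copy (not move) so clinicians keep NAS access while ML trains on it.
        actions.append(("copy", "object-store://ml-training-bucket"))
    if "critical" in f.tags:
        # Immutable copy as a ransomware-recovery point.
        actions.append(("copy", "object-store://immutable-vault"))
    if f.days_since_access > 365 and not actions:
        # Cold, untagged data can be tiered off primary NAS entirely.
        actions.append(("move", "object-store://archive-tier"))
    return actions or [("keep", "nas://primary")]

sample = FileRecord("/nas/imaging/scan_001.dcm", "medical-image",
                    {"clinical-study", "critical"}, 30)
print(placement_plan(sample))
```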
Data fabrics will be a strategic enterprise IT trend in 2022
Data fabric is still a vision: it recognizes that your data exists in many places, and a fabric can bridge the silos and deliver greater portability, visibility, and governance. Data fabric research has typically focused on structured and semi-structured data. But 90 percent of the world's data is unstructured (think videos, X-rays, genomics files, log files, and sensor data) and has no defined schema. Data lakes and analytics applications cannot readily access this dark data locked in files. Data fabric technologies therefore need to bridge unstructured data storage (file and object storage) and data analytics platforms (including data lakes, machine learning, natural language processing, and image analytics). Analyzing unstructured data is becoming pivotal because machine learning relies on it. Data fabric technologies need to be open, standards-based, and able to look across environments. In 2022, the data fabric should move from a vision to a set of architectural data management principles. Given the rising relevance and sheer volume of unstructured data, technology vendors need to incorporate it into their data fabric architectures.
Multi-cloud will evolve with different data strategies
Today, many organizations have a hybrid cloud environment in which the bulk of data is stored and backed up in private data centers across multiple vendor systems, while the cloud serves as a secondary or tertiary storage tier for fast-growing unstructured (file) data. It can be difficult to see across these silos to manage costs, ensure performance, and manage risk. As a result, IT leaders are realizing that extracting value from data across clouds and on-premises environments is a formidable challenge. Multi-cloud strategies work best when organizations use different clouds for different use cases and data sets. However, this brings another issue: moving data from one cloud to another later can be very expensive, largely because of egress fees and migration effort. A newer concept is to pull compute toward data that lives in one place; that central place could be a colocation center with direct links to cloud providers. Multi-cloud will evolve with different strategies: sometimes compute comes to your data, and sometimes the data resides in multiple clouds.
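The trade-off is easy to sketch as back-of-the-envelope arithmetic. The figures below are purely hypothetical placeholders (actual egress and compute pricing varies by provider), but the structure of the comparison is the point: bulk data movement scales with dataset size, while bringing compute to the data scales with how long the job runs.

```python
# Back-of-the-envelope comparison: move a dataset across clouds vs. bring
# compute to where the data already lives. All prices below are hypothetical
# placeholders; substitute your providers' actual egress and compute rates.
DATASET_TB = 500

EGRESS_PER_GB = 0.08            # assumed cross-cloud egress price, $/GB
MIGRATION_OVERHEAD = 2_000      # assumed one-off migration/validation cost, $

REMOTE_COMPUTE_PREMIUM_PER_MONTH = 1_500  # assumed extra cost of running
                                          # compute adjacent to the data, $
MONTHS = 12

move_the_data = DATASET_TB * 1_000 * EGRESS_PER_GB + MIGRATION_OVERHEAD
move_the_compute = REMOTE_COMPUTE_PREMIUM_PER_MONTH * MONTHS

print(f"Move {DATASET_TB} TB to another cloud: ~${move_the_data:,.0f}")
print(f"Run compute next to the data for {MONTHS} months: "
      f"~${move_the_compute:,.0f}")
```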
Synthetic data + unstructured data management will be needed to manage data growth
Data security and privacy concerns are becoming more pressing, and synthetic data is an attractive way to avoid collecting real user data in the first place. Synthetic data is also more portable, since far fewer privacy constraints apply to it. But while synthetic data reduces the footprint of customer data, customer data is still a tiny fraction of total unstructured data. The bulk of data is application-generated, not user data, so synthetic data coupled with unstructured data management will be needed to manage data growth.
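As a simple illustration, the sketch below generates synthetic customer records using only Python's standard library. The field names and value pools are made up for the example; a production-grade generator would also preserve the statistical properties of the real data it stands in for.

```python
# Minimal sketch: generate synthetic customer records so that test and
# analytics environments never need real user data. Field names and value
# pools are illustrative assumptions, not a production-grade generator.
import random
import uuid
from datetime import datetime, timedelta

FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Casey"]
CITIES = ["Austin", "Berlin", "Osaka", "Toronto", "Nairobi"]

def synthetic_customer():
    """Return one fake customer record with no link to any real person."""
    name = random.choice(FIRST_NAMES)
    return {
        "customer_id": str(uuid.uuid4()),
        "name": name,
        "email": f"{name.lower()}.{random.randint(1000, 9999)}@example.com",
        "city": random.choice(CITIES),
        "signup_date": (datetime.now()
                        - timedelta(days=random.randint(0, 1095))).date().isoformat(),
        "lifetime_value": round(random.lognormvariate(4, 1), 2),
    }

# A thousand synthetic records for a test environment, with no PII collected.
records = [synthetic_customer() for _ in range(1000)]
print(records[0])
```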
Enterprises continue to come under increasing pressure to adopt data management strategies that will enable them to derive useful information from the data tsunami to drive critical business decisions. Analytics will be central to this effort, as will creating open and standards-based data fabrics that enable organizations to bring all this data under control for analysis and action.