Aleksandar Kovačević, Sales Engineer at InterSystems, shares how companies use MLOps combined with a central multi-model database to get the most out of their machine learning initiatives.
Artificial Intelligence (AI) and Machine Learning (ML) are hot topics at the moment. But when it comes to producing quantifiable results, there is still a lot of work to be done. How can MLOps, which merges machine learning with operations (procedures and processes), help to make ML projects more successful?
There is no doubt that Machine Learning and Deep Learning offer a lot of potential. But the problem is that most business leaders see Machine Learning as a kind of optimization tool rather than as an opportunity to develop entirely new services and products. That’s why all that potential remains largely untapped and ML initiatives often fail to meet expectations. IDC predicts that up to 88 percent of all AI and ML projects will fail during the test phase. MLOps can help to overcome this risk by offering support in the planning and implementation of these types of initiatives.
Can you tell us more about MLOps?
Just like with DevOps or DataOps approaches, MLOps aims to increase automation within companies and improve both production and process quality. Put simply, the idea is to improve collaboration between data analysts, data scientists, and process managers. Of course, that also means complying with corporate and legal guidelines. MLOps affects the entire life cycle of an ML project, including modelling, orchestration, and implementation as well as validation, diagnostics, governance, and business metrics. It can also help to reduce the number of failed ML initiatives, because when companies are able to iron out errors early on and bolster their machine learning projects with powerful MLOps, they can more effectively tap the potential of this new technology.
Why are so many ML initiatives prone to failure?
Primarily because companies are so keen to implement Machine Learning that they’ll apply it anywhere, whether that’s in customer service or predictive maintenance. But they don’t take the time beforehand to get a clear picture of how complex and multifaceted an ML project can be. Even just creating a model can be very time-consuming. For that reason, it can be helpful to start out by automating the modelling process. That’s one of MLOps’ key strengths: it gives companies the insight they need to integrate machine learning efficiently into their production processes.
How should companies approach this type of project?
A typical Data Science project consists of a number of steps. The first step is all about identifying a problem and clearly delineating the way in which it affects the business. Then data has to be extracted from what tends to be a very wide range of different company databases. After that, there’s the data preparation process, which deals with the challenge of ensuring that terminological and coordinative definitions are clear, and that they are verified over the course of the project. The next step is to derive correlations between data and identify datasets that are linked to the business challenge at hand. Data modelling and evaluation come next – here there’s a particular focus on continuous monitoring of the selected data model. These steps are indispensable for any ML initiative.
What aspects are important when it comes to monitoring and adjusting the model?
When the first ML model is created, its accuracy is measured based on historical data and data that has been manually annotated. Once the model has been deployed, however, it consumes completely new data. Because that new data is not annotated, there is no direct way to know whether the model is working the way it should. Over time, the historical data used to train the model gives an increasingly inaccurate picture of reality, because long-term changes affecting the data have not been taken into account. The model must therefore be monitored continuously. But how can you tell whether the model deviates from reality? Annotating new data is an expensive and time-consuming manual process that can’t be carried out on a continuous basis. There are, however, certain methods and indicators that can reveal problems with the model: for example, if the distribution of the features or of the target deviates from the training data, or if the correlations between the features change. It’s not the model itself that changes over time, it’s the data and the concept behind it, and that renders the initial model obsolete. That’s why changes in the data should be incorporated into the model. After the model’s performance has been monitored, the model is revised, either by creating a new version of the best model or by retraining all of the models, drawing on background analyses, explanations, and coordination provided by employees.
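The two indicators mentioned above, a shift in a feature's distribution and a change in the correlation between features, can be checked without any annotated data. The following is a minimal sketch, not a description of how any particular product does it. It assumes numeric features and uses the Population Stability Index (PSI) as one possible distribution-shift measure; the function names `psi` and `pearson`, the bin count, and the synthetic data are illustrative assumptions.

```python
import math
import random


def psi(reference, current, bins=10):
    """Population Stability Index of one numeric feature (illustrative helper).

    Bins are laid out over the range of the reference (training) sample.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic example data (an assumption for illustration only).
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]         # training-time feature
live_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]     # production, unchanged
live_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # production, mean shifted

print("PSI, unchanged data:", psi(train, live_same))
print("PSI, shifted data:  ", psi(train, live_shifted))
```

A PSI close to zero suggests that the live data still resembles the training data, while a clearly larger value is a signal to investigate or retrain. Commonly cited rules of thumb place the alert threshold somewhere around 0.1 to 0.25, but the right value depends on the use case. Comparing `pearson` values for the same feature pair on training and live data covers the second indicator in the same way.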
There’s one important question at this juncture: What is the best way to implement this type of detailed and multifaceted ML project in existing business processes in order to optimize them? What are the critical aspects for companies here?
Machine Learning can involve multiple programming languages, each of which relies on different systems to achieve the best possible performance. It is therefore essential that companies remain flexible enough to adopt and combine different programming languages. Model monitoring reports can provide insights into the accuracy and reliability of the ML models the company is using. However, established business processes often require manual confirmation as well. To use MLOps effectively, companies need a high level of interoperability: integration of data and systems, and orchestration of business processes and services. Against this background, companies should look into a multi-model database that handles diverse data types flexibly, regardless of whether the data is relational, document-based, or object-based. All data models should be available for use simultaneously. The best databases for ML are also scalable and provide a graphical user interface for data analysis, dashboards for different visualizations, and interaction between different functions based on the company’s individual needs and the ML implementation scenario.
Can you provide an example of that kind of multi-model database?
Our InterSystems IRIS platform combines scalable data management, flexible interoperability, and an open analysis environment, all in a single solution. It can also be seamlessly integrated into existing IT infrastructures. This type of solution can whip ML models into shape so that they can provide comprehensive and efficient automation for a company’s digital transformation. It’s not only the perfect solution for handling diverse programming languages and systems – it also produces detailed model monitoring reports. Data scientists are immediately informed if deviations are detected between the data and the model. They can also initiate a range of AI processes to evaluate and explain the behavior of new models. In addition, InterSystems IRIS-based applications can be seamlessly scaled across different servers. Other useful tools include analyses, graph makers, and dashboards. This approach, based on reliable data, provides a sound foundation for future ML projects, allowing companies to truly get the most out of their machine learning initiatives.