Organizations implementing AI have increased by 270 percent over the last four years, according to a recent Gartner survey. Yet even as adoption grows, 63 percent of organizations struggle to get their AI projects out of the lab and into production. Why?
For many, the culprit is an inability to reach the desired confidence level in the algorithm itself. Data science teams often blow their budget, time, and resources on AI models that never make it out of the early stages of testing. And even the projects that do clear that first hurdle aren't all successes.
One example we saw last year was Amazon’s attempt to implement AI in its HR department. Amazon receives a huge number of resumes for its thousands of open positions, and the team hypothesized that machine learning could sift through them all and surface the top talent. The system did filter resumes and score candidates, but it also exhibited gender bias. The proof of concept had been approved, but no one watched for bias in the training data, and the project was ultimately scrapped.
Companies want to jump on the “Fourth Industrial Revolution” bandwagon and prove that AI will deliver ROI for their businesses. The truth is, AI is still in its early stages, and many companies are only now getting AI-ready. For machine learning (ML) project teams starting a project for the first time, a deliberate, three-stage approach to project evolution will pave a shortcut to success.
Test the fundamental efficacy of your model with an internal Proof of Concept (POC)
The point of a POC is simply to prove that, in this case, it is possible to save money or improve a customer experience using AI. You are not attempting to get the model to the confidence level needed for deployment, just to say (and show) that the project can work.
A POC like this is all about testing things to see if a given approach produces results. There is no sense in making deep investments for a POC. You can use an off-the-shelf algorithm, find open source training data, purchase a sample dataset, create your own algorithm with limited functionality, and/or label your own data. Find what works for you to prove that your project will achieve the corporate goal. A successful POC is what is going to get the rest of the project funded.
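To make that concrete, a POC at this stage can be only a handful of lines. The sketch below is a minimal illustration using an off-the-shelf scikit-learn classifier on one of the library’s bundled sample datasets; a real POC would swap in data and a success metric tied to your own corporate goal.

```python
# Minimal POC sketch: an off-the-shelf classifier on a bundled sample dataset.
# The goal is only to show the approach can work, not to reach production confidence.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)          # stand-in for open-source or purchased data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression(max_iter=2000)    # off-the-shelf algorithm, no tuning
model.fit(X_train, y_train)

score = accuracy_score(y_test, model.predict(X_test))
print(f"POC accuracy: {score:.0%}")          # enough to show the idea has legs
```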
In the grand scheme of your AI project, this is the easiest part of the journey. Keep in mind that as you get further into training your algorithm, you will not be able to rely on sample data or prepare all of your training data yourself. The subsequent improvements in model confidence required to make your system production-ready will take immense amounts of training data.
Prepare the data you’ll need to train your algorithm…and keep going
Now the hard work begins. Let’s say that your POC using pre-labeled data got your model to 60 percent confidence. Sixty percent is not ready for primetime. In theory, that could mean that 40 percent of the interactions your algorithm has with customers will be unsatisfactory. How do you reach a higher level of confidence? More training data.
Proving AI will work for your business is a huge step toward implementing it and actually reaping the benefits. But don’t let it lull you into thinking the next 10 percent of confidence will come six times easier than the first 60 did. The ugly truth is that models have an insatiable appetite for training data, and getting from 60 to 70 percent confidence can take more training data than it took to reach the original 60. The needs grow exponentially.
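One way to see this appetite on your own data is a learning curve: model score as a function of training-set size. Here is a rough sketch, using scikit-learn and a bundled dataset purely for illustration:

```python
# Sketch: measure model score vs. training-set size to see diminishing returns.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=2000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:5d} examples -> {score:.0%} validation score")
# Typically each added increment of data buys a smaller confidence gain,
# which is why 60% -> 70% can cost more data than 0% -> 60% did.
```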
Roadblocks to watch out for
Think about it: if it took tens of thousands of labeled images to prove a single use case in the POC, it is going to take tens of thousands more for each additional use case your algorithm needs to learn. How many use cases is that? Hundreds? Thousands? Edge cases will also continually arise, and each of those requires training data of its own. It is understandable, then, that data science teams often underestimate the quantity of training data they will need and attempt to do the labeling and annotating in-house. It may also partially explain why data scientists are leaving their jobs.
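To put rough numbers on that, here is a back-of-envelope sketch; every figure in it is purely illustrative:

```python
# Back-of-envelope: training-data needs grow multiplicatively with use cases.
images_per_use_case = 10_000   # illustrative: "tens of thousands" per use case
use_cases = 500                # illustrative: somewhere between hundreds and thousands
edge_case_overhead = 1.25      # illustrative: extra data for edge cases as they arise

total_images = int(images_per_use_case * use_cases * edge_case_overhead)
print(f"{total_images:,} labeled images")   # 6,250,000 -- far beyond in-house labeling
```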
While too little training data is one common pitfall, there are others. It is essential to watch for and eliminate any sample, measurement, algorithmic, or prejudicial bias in your training data as you go. You’ll want to implement agile practices to catch these issues early and make adjustments.
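As a sketch of what a lightweight, agile-style check could look like: the snippet below assumes a hypothetical training file with a gender column and a binary 0/1 label, and simply compares representation and outcome rates across groups. Real bias audits go much deeper, but even this level of spot-checking would have flagged the skew in the Amazon example above.

```python
# Hypothetical spot-check for sample and prejudicial bias in a labeled dataset.
import pandas as pd

df = pd.read_csv("training_data.csv")   # hypothetical file with 'gender' and binary 'label' columns

# Sample bias: is any group badly under-represented relative to the population?
print(df["gender"].value_counts(normalize=True))

# Prejudicial bias: do positive labels skew toward one group?
print(df.groupby("gender")["label"].mean())

# Run checks like these every labeling sprint, not once at the end,
# so skew is caught while it is still cheap to fix.
```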
And one final thing to keep in mind: AI labs, data scientists, AI teams, and training data are all expensive. Yet in a Gartner report that places AI projects among companies’ top three priorities, AI ranks just thirteenth on the list of funding priorities. Yeah, you’re going to need a bigger budget.