A Complete MLOps Tutorial: Automate Machine Learning Project Pipeline

As a Machine Learning Engineer or Data Scientist, you solve business problems by following the ML lifecycle: data collection, data processing, model building, and helping to deploy models on cloud or edge computing platforms.

Have you noticed that Data Scientists mostly work in Jupyter Notebooks, which are not ready for deployment? Data Scientists research the data and try to build the best prediction model for the input dataset.

They are mostly experts in statistics, data analysis and model building. But after creating a powerful prediction model, we need to serve that model to users. So how will you serve this model?

This is where the term model deployment comes in. So, who is responsible for model deployment? A person who knows Machine Learning / Data Science, software development, cloud and maintenance. That person can deploy the ML model in production and is responsible for maintaining it. That person is called an MLOps Engineer.

So, what is MLOps?

What is MLOps?

MLOps is the process of streamlining the ML model building, deployment, maintenance and monitoring.

The major role of an MLOps engineer is to keep the ML model lifecycle running efficiently in production. If any update comes into this lifecycle, the pipeline applies it automatically and serves the best model to the user. This process continues for as long as the project is live.

What is the exact meaning of MLOps?

The word MLOps is derived from DevOps.

In short, DevOps means shortening the software development lifecycle by providing continuous integration and continuous delivery to production.

DevOps = Development + Operations

I hope you guessed the meaning of MLOps.

MLOps = Machine Learning + Development + Operations

IMG 1: MLOps Venn Diagram
Source: wikipedia

DevOps is a major part of MLOps, alongside ML model development. MLOps therefore means continuous integration, continuous development, continuous delivery, continuous model training and continuous model evaluation.

MLOps vs DevOps

DevOps is used for software and application development. MLOps is used specifically for Machine Learning / Data Science projects.

In DevOps, once the code is ready for deployment, moving it to production is straightforward. After release, if the code needs changes, the update is also quick, and the output varies very little or not at all, because the output is predefined.

In MLOps, once the POC (proof of concept), that is, the research on the business data, is done, we get the best prediction model and deploy it. But later, when the data changes or drifts, the model needs to be trained on a new dataset.

Then we evaluate the new model and compare its metrics with those of the previous model. In the end, the best version of the model is deployed in the pipeline.
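To make the compare-and-deploy step concrete, here is a minimal sketch in Python. It assumes both models were scored on the same held-out dataset with a single metric. The names `ModelCandidate`, `select_model_to_deploy` and `min_gain` are hypothetical and only for illustration; they do not come from any particular framework.

```python
from dataclasses import dataclass


@dataclass
class ModelCandidate:
    """A trained model version and its evaluation metric (hypothetical structure)."""
    version: str
    accuracy: float  # assumed to be measured on the same held-out dataset


def select_model_to_deploy(current: ModelCandidate,
                           retrained: ModelCandidate,
                           min_gain: float = 0.0) -> ModelCandidate:
    """Return the retrained model only if it beats the current one by more than min_gain."""
    if retrained.accuracy - current.accuracy > min_gain:
        return retrained
    return current  # otherwise keep serving the model already in production


current = ModelCandidate(version="v1", accuracy=0.86)
retrained = ModelCandidate(version="v2", accuracy=0.90)
best = select_model_to_deploy(current, retrained, min_gain=0.01)
print(best.version)  # v2
```

The `min_gain` margin is a design choice: it avoids redeploying for an improvement so small that it may just be noise.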

Each iteration of the release cycle in DevOps takes very little time. In MLOps, model training takes much longer, often days or weeks.

MLOps History

Deploying and maintaining ML models in applications is challenging, as a 2015 paper highlighted. ML adoption is growing rapidly in the market, and the demand for pipeline automation increases day by day. Searches for the keyword MLOps started in 2017, with exponential growth from 2019, as you can see on Google Trends in the image below.

Google launched the Kubeflow open-source project in 2018 to provide MLOps services on Kubernetes.

Research shows that 88% of corporations are doing R&D on AI technology, but only a few of them productionize their models, with a 3-15% profit margin. Driven by that growth, the MLOps market was estimated at $23.2 billion in 2019 and is projected to reach $126 billion by 2025. It shows that MLOps is the future.

So, if you are looking for a new career opportunity, the MLOps engineer position is one of the better choices. If you run a business that uses AI, you should adopt MLOps in your projects.

Traditional Machine Learning Lifecycle

Machine Learning is a subfield of Artificial Intelligence. Its goal is to find hidden patterns in data using mathematical models, called ML algorithms, to solve business problems. Data is the oil for the ML model: without data, the ML engine will not run in the business world.

IMG 3: AI, ML Venn Diagram
Source: researchgate

Steps Involved in ML Projects

Business Problem
Data Collection
Data Preprocessing
Model Training
Model Evaluation
Model Deployment
Monitoring & Maintenance

Machine Learning / Data Science projects start from business problems introduced by the client. To solve the problem, ML engineers collect data, which can come from any source. After data collection, 70% to 80% of the time is spent on data processing. The main purpose of this step is to make the data ready for training ML models. ML engineers then train models according to the nature of the data and the business problem. The model evaluation step identifies the best-performing model on the existing dataset. Finally, the deployment process starts, together with team members from other departments such as DevOps Engineers, Data Engineers and Software Developers.

Problems in traditional ML lifecycle

We have now seen the lifecycle of ML projects, in which all the steps are done manually. After the ML model is deployed, its performance starts degrading because of variation in the data, that is, data drift. ML engineers then train models on new data and deploy the next, better version of the model. If model performance does not improve after training, they start working on the feature engineering pipeline. Once that is done, they compare the performance of the newly trained models with the old ones. In the end, they deploy the best model.

We can observe that the ML pipeline is repeated manually, and that is time consuming. Talking about time, how much does it take? One hour or two hours? Noooo!!!! It takes days or weeks, depending on the data and the changes in the ML pipeline.

This is the pain point of the traditional ML lifecycle. MLOps was born to solve this problem, and it is now a main part of Machine Learning / Data Science projects.

Lifecycle of MLOps

The MLOps lifecycle is a combination of Machine Learning, Development and Operations.

Phase 1 – Machine Learning (ML):

In this phase, we start working on the problem statement and gathering data. After that, we experiment on the data and find the initial best ML model.

Phase 2 – Development (Dev):

This is an iterative phase that works on the CI/CD pipeline. Continuous model building, testing, integration and development are done here.

Phase 3 – Operation (Ops):

It is the last phase of MLOps. This phase is responsible for continuous delivery and takes feedback to retrain the model. Maintenance and monitoring are part of phase 3.

Hope you understood the overall working of MLOps.

Deep Diving into Lifecycle of MLOps

Before diving deep into the MLOps lifecycle, note that the lifecycle of an ML project and the lifecycle of MLOps are mostly the same. The difference is that in a traditional ML project the pipeline is run manually every time a better model has to be deployed, whereas in MLOps it runs automatically, even for a single commit.

IMG 6: MLOps Lifecycle in detail
Source: Udemy

The role of a data scientist is to create the best predictive models to solve business problems from existing data through experimentation. After completing the POC, the data scientist hands the Jupyter file over to the MLOps engineer.

MLOps engineers create the automated ML pipeline for deployment. Any change to the existing ML pipeline, even a single commit, triggers this pipeline and runs the whole process automatically, all the way to serving the best model.
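As a rough illustration of this idea, the sketch below chains a few placeholder stages and reruns them whenever a new commit id arrives. Everything here is an assumption made for illustration: the stage functions only write to a log, and `CommitTrigger` is a hypothetical stand-in for whatever your CI/CD system or MLOps framework actually provides.

```python
from typing import Callable, Dict, List, Optional

# A stage takes the shared context dict and returns it (possibly updated).
Stage = Callable[[Dict], Dict]


def preprocess(ctx: Dict) -> Dict:
    ctx["log"].append("preprocess")  # placeholder for real data processing
    return ctx


def train(ctx: Dict) -> Dict:
    ctx["log"].append("train")  # placeholder for real model training
    return ctx


def evaluate(ctx: Dict) -> Dict:
    ctx["log"].append("evaluate")  # placeholder for real model evaluation
    return ctx


def deploy(ctx: Dict) -> Dict:
    ctx["log"].append("deploy")  # placeholder for real model serving
    return ctx


def run_pipeline(stages: List[Stage], ctx: Dict) -> Dict:
    """Run every stage in order, passing the context from one to the next."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx


class CommitTrigger:
    """Reruns the whole pipeline whenever it sees a commit id it has not just processed."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages
        self.last_commit: Optional[str] = None

    def on_commit(self, commit_id: str) -> Optional[Dict]:
        if commit_id == self.last_commit:
            return None  # nothing changed, so nothing to rerun
        self.last_commit = commit_id
        return run_pipeline(self.stages, {"commit": commit_id, "log": []})


trigger = CommitTrigger([preprocess, train, evaluate, deploy])
result = trigger.on_commit("abc123")
print(result["log"])  # ['preprocess', 'train', 'evaluate', 'deploy']
```

In a real project the trigger would typically be a webhook or a CI job rather than a Python class, but the flow is the same: one change in, one full pipeline run out.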

The automated ML pipeline generates the best model, then registers it in the centralised model store and versions it.
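To show what registering and versioning can look like, here is a small in-memory sketch. It is not the API of any real model store. `ModelRegistry`, `RegisteredModel` and the model name `churn-model` are hypothetical names chosen for this example, and a production registry would also persist the model files themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegisteredModel:
    name: str
    version: int
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    """Keeps every registered version of each model, in registration order."""
    _models: Dict[str, List[RegisteredModel]] = field(default_factory=dict)

    def register(self, name: str, metrics: Dict[str, float]) -> RegisteredModel:
        versions = self._models.setdefault(name, [])
        # Version numbers start at 1 and increase with each registration.
        entry = RegisteredModel(name=name, version=len(versions) + 1, metrics=dict(metrics))
        versions.append(entry)
        return entry

    def latest(self, name: str) -> RegisteredModel:
        return self._models[name][-1]

    def best(self, name: str, metric: str) -> RegisteredModel:
        # Assumes a higher metric value is better (true for accuracy, not for loss).
        return max(self._models[name], key=lambda m: m.metrics[metric])


registry = ModelRegistry()
registry.register("churn-model", {"accuracy": 0.86})
registry.register("churn-model", {"accuracy": 0.90})
registry.register("churn-model", {"accuracy": 0.88})
print(registry.latest("churn-model").version)            # 3
print(registry.best("churn-model", "accuracy").version)  # 2
```

Keeping every version, rather than overwriting, is what makes it possible to roll back when the newest model turns out not to be the best one.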

The best model is then deployed into the existing production pipeline to start making predictions, and its performance is monitored and maintained. Once the model degrades or the code changes, the pipeline triggers and runs the whole process automatically, again and again. This is what we call MLOps.
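One simple way to decide when the pipeline should trigger again is to watch for data drift. The sketch below is a deliberately basic check under stated assumptions: it looks at a single numeric feature and measures how far the live mean has moved from the training mean, in units of the training standard deviation. The function names and the threshold of 2.0 are hypothetical; real monitoring tools use richer statistical tests.

```python
from statistics import mean, stdev
from typing import Sequence


def drift_score(reference: Sequence[float], live: Sequence[float]) -> float:
    """Shift of the live mean from the reference mean, in reference standard deviations."""
    return abs(mean(live) - mean(reference)) / stdev(reference)


def should_retrain(reference: Sequence[float], live: Sequence[float],
                   threshold: float = 2.0) -> bool:
    """Flag retraining when the feature has shifted by more than `threshold` deviations."""
    return drift_score(reference, live) > threshold


training_values = [10.0, 11.0, 9.0, 10.5, 9.5]   # feature values seen at training time
stable_values = [10.2, 9.8, 10.1, 9.9]           # live data that looks like training data
shifted_values = [13.0, 12.5, 13.5, 12.8]        # live data after the distribution moved

print(should_retrain(training_values, stable_values))   # False
print(should_retrain(training_values, shifted_values))  # True
```

When the check returns True, the monitoring job would kick off the automated pipeline instead of waiting for someone to notice the degraded predictions.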

Hey, don't worry about how to create an automated ML pipeline or MLOps workflow. There are tons of frameworks for building this pipeline.

MLOps Levels

The maturity of MLOps is defined by its level of automation, that is, how far the Machine Learning and CI/CD pipelines are automated. Different framework providers and cloud services define their own MLOps levels. Here are the 3 levels defined by Google.

MLOps level 0: No MLOps, the whole ML project process is manual

MLOps level 1: Automated ML pipeline for continuous training

MLOps level 2: The whole pipeline, including CI/CD, is automated

MLOps Tools/Frameworks – Make ML pipeline Automated

Until now you have learned what MLOps is, but you might be curious how to create an automated MLOps workflow. There are tons of tools and frameworks for this, and they can make your work much easier.

Below is a list of MLOps frameworks, tools and cloud platforms. Some are open-source and some are paid services. Each tool has its own specialty, so choose one or more according to your project.

MLflow
Metaflow
Kubeflow
Seldon
Amazon SageMaker
Azure Machine Learning
Google Cloud AI Platform
Paperspace
Algorithmia
HPE Ezmeral ML Ops
Domino Data Lab
Kedro
FastAPI
ZenML
Valohai
Iguazio
H2O MLOps
Neptune.ai
Cloudera Data Platform
TensorFlow Extended (TFX)
Data Version Control (DVC)
Pachyderm
Flyte
MLRun
more….

How to Choose the Best MLOps Framework

As already mentioned, every framework has its own specialty, so your project determines which framework you should use. Here is a list of 15 important concepts to consider before choosing an MLOps framework.

Development platform
Model unit testing
Version Control
Model registry
Model Governance
Deployments
Monitoring
Feedback
A/B testing
Drift detection
Outlier detection
Adversarial Attack Detection
Interpretability
Governance of deployments
Data-centricity

Benefits of MLOps

Time: Time is the most important resource in an ML project. Continuous training and validation take a lot of time, and if you do them manually you will lose your market, because competitors are also there to provide a better service than you. So MLOps is the best option for speeding up ML projects.

Accuracy & Efficiency: Models degrade over time, so we retrain them on new data and deploy better versions. MLOps helps deliver better models repeatedly.

Scalability: If you need to serve your project on multiple platforms, MLOps is the best choice because it scales well.

Monitor: Everything is organized in one place, which makes it easy to monitor the full project.

Cost Effective: It saves team members' time on a project and structures the pipeline.

Open-source & closed-source: Many powerful MLOps frameworks are available, both free and paid.

Platform & specific tools

Model templating and cataloguing

Pipeline management

Collaboration and communication capabilities

Roles & Responsibility in MLOps

Every project involves multiple experts working to solve the problem. Likewise, MLOps involves multiple roles, directly and indirectly. The main heroes in MLOps are the Data Scientist, the Data Engineer and the Machine Learning Engineer. Apart from those, other roles are also involved. You can refer to the image below; it is self-explanatory.

IMG 7: Roles in MLOps Project
Source: devopsschool

Future & Career in MLOps

Demand for AI and ML is growing exponentially. Companies are starting AI R&D centers and solving complex problems. MLOps helps automate the ML project pipeline. Investment in this field is projected to reach $126 billion by 2025. These factors indicate that the future is yours if you become an MLOps engineer.

Conclusion

So, in this complete MLOps tutorial you learned the essentials of MLOps. Hopefully all your MLOps concepts are now clear. The major goal of this tutorial was to give a theoretical understanding. In the next tutorial, we will get our hands dirty with practical MLOps.

StatusNeo