Five theses on Machine Learning Operations (MLOps)

More and more companies want to integrate artificial intelligence efficiently into their products and services. In many cases, machine learning (ML) is the method used to implement AI solutions. However, challenges often arise when it comes to putting ML solutions into operation, and studies such as Gartner’s suggest that around 80% of such projects ultimately fail. Common reasons include:

The quality of the training data is still inadequate in many cases.
Many employees and end customers lack confidence in ML/AI-based applications and services.
ML solutions are sometimes developed from scratch without taking into account the actual processes and requirements of the individual business departments.
The wide range of machine learning developer tools and the lack of standards and frameworks to date make ML difficult to implement.
Integrating ML systems into existing applications and processes is complex and time-consuming and ties up significant resources.

Machine Learning Operations (MLOps) offers a solution to these problems: just as DevOps has increased the productivity of software developers, MLOps encompasses a set of practices and methodologies for developing ML-based solutions faster and operating them more sustainably. Why is MLOps important if you want to make your business data-driven? And what does MLOps mean in practice? Since the technology and its terminology are still new and used in different ways, let’s take a closer look at them first. We will then illustrate the relevance of this topic for the data-driven business models of the future with five theses on MLOps.

What is MLOps? A review of common definitions

There is currently no standard definition of Machine Learning Operations (MLOps), as this is a relatively new concept. Popular sources such as Wikipedia define it as “a set of practices aimed at reliably and efficiently deploying and maintaining machine learning models in production.” These practices are often reduced to technological aspects, ignoring the fact that people, with their roles, skills and methodological expertise, and the processes in a company also play an important role. Adding these levels yields a holistic concept: bringing data-based services into productive use in short development cycles and operating them so that they quickly generate economic value and reduce employee workloads. The concept is flexible and does not prescribe a fixed structure. Instead, it is a collection of data science and software development approaches as well as best practices around machine learning (ML).

In practice, this means that MLOps brings many different ML models into operational use quickly and drives improvements in business processes. The entire ML system must be monitored throughout to ensure that it is operating properly. This places high demands on three levels: processes, technologies and people. MLOps therefore also encompasses the overall management of all ML models within an organization at these three levels.

Visualization of MLOps as the intersection of DevOps, data engineering and machine learning

The advantages of MLOps

If ML models are put into operation quickly, retrained continuously and integrated cleanly into the existing IT landscape, companies can streamline and optimize many processes, from internal workflows and customer service to manufacturing and production. MLOps helps to implement machine learning in business operations quickly and smoothly:

Faster delivery of more ML models thanks to automated processes
Higher productivity of ML solutions through the integration and reuse of machine learning models and algorithms
Lower resource requirements for administration and maintenance, which reduces the workload of the IT department and data science teams
Higher quality of processes and data in the machine learning lifecycle through continuous monitoring and updating of each ML model (continuous integration and continuous delivery, CI/CD)
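In a CI/CD setup, retraining a model is only half the job; the pipeline also has to decide whether the new version may replace the one in production. The following minimal Python sketch shows such an automated quality gate. It assumes both model versions were evaluated on the same held-out test set; the names `EvalResult` and `should_promote` and the threshold values are hypothetical choices for illustration, not part of any specific MLOps tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Metrics of one model version on a shared held-out test set (hypothetical structure)."""
    model_version: str
    accuracy: float


def should_promote(candidate: EvalResult, production: EvalResult,
                   min_accuracy: float = 0.80, tolerance: float = 0.005) -> bool:
    """Automated quality gate for a CD pipeline.

    The retrained candidate is promoted only if it reaches an absolute minimum
    and is not meaningfully worse than the model currently in production.
    The default thresholds are illustrative, not recommendations.
    """
    if candidate.accuracy < min_accuracy:
        return False
    return candidate.accuracy >= production.accuracy - tolerance


production = EvalResult(model_version="v12", accuracy=0.874)
candidate = EvalResult(model_version="v13", accuracy=0.881)

if should_promote(candidate, production):
    print(f"promote {candidate.model_version}")
else:
    print(f"keep {production.model_version}")
```

The small tolerance keeps the gate from blocking a release over noise-level differences; in practice the metric and thresholds would be chosen per use case.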

MLOps and DevOps: differences and similarities

The term MLOps is reminiscent of DevOps and is in fact based on it, though it emerged somewhat later. DevOps refers to a set of practices and tools that dovetail the work of software development (Dev) and IT operations (Ops) and support them, in some cases through automated processes. Its purpose is the continuous delivery of software, which is achieved through a high degree of automation of integration, testing and deployment. Pooling both areas of responsibility in one team shortens the software development lifecycle and improves quality.

By analogy, the name MLOps combines machine learning and operations. MLOps adds data engineering, which encompasses data acquisition, data preparation and data processing throughout the entire lifecycle of the data. It is essential that high-quality data be available in sufficient quantity to train machine learning algorithms and optimize models. As a result, data management and analytics become core concerns alongside code. Together with software development and deployment, they are implemented iteratively.

The strategic relevance of machine learning operations

There is hardly an industry or company that has not identified artificial intelligence and therefore machine learning as a key competence. As Bitkom reports, many companies are already using machine learning in their products and services, and the proportion of companies using ML is rising steadily. Is MLOps just a short-term trend? How can the relevance of machine learning operations be assessed with an eye to the future? The following five theses give an indication of the future viability of MLOps:

Five theses about MLOps

1. The business models of the future will be data-driven

Every business model, every product and every service can be optimized using insights from data, and this optimization determines who prevails in the market. Data makes it possible to improve the price and quality of services in particular and to make products more attractive. Competitiveness is strengthened by offering customers solutions and products that are more precisely tailored to their needs. Data can also be used to optimize numerous internal processes and free employees from monotonous, time-consuming tasks. This gives them more time to devote to strategic issues or customer care, which in turn improves the company’s competitiveness.

2. Machine learning allows for data-driven products and services

Patterns in data detected by machine learning (ML) open the door to a wide range of applications. Services and products that used to be too labor- or cost-intensive become feasible. Decisions that tie up skilled workers today, such as recognizing customer concerns and extracting the relevant information, can be automated with ML techniques tomorrow. ML can also support the handling of complex cases by making reference material semantically searchable. Information hidden in data sets too large or complex for humans to grasp can be used to make better and faster decisions.

3. Every aspect of data-driven products and services requires ML

To exploit the full potential of enterprise data, ML must be applied to ever more fine-grained decisions and be fully integrated into operational processes. Each decision made by ML brings a small gain in efficiency, and together these gains result in highly automated and optimized business models. Furthermore, processes and products must be designed to generate usable data and to allow ML to be used in many different ways.

4. Companies must be able to deploy a wide range of ML services

Applying ML to every aspect of a company’s products calls for a variety of ML services to be deployed and integrated into all of the company’s products and processes. Because these services require both domain and technical support, resource bottlenecks can easily occur. The overhead for each new ML service must therefore be kept low, both during development and once it is up and running. This requires a high degree of standardization and automation, which is often not yet present in the field of ML.

5. MLOps is the paradigm for deploying ML efficiently in a company

MLOps provides the technological and organizational basis for integrating ML into all of a company’s products and services. Much as DevOps and Agile have streamlined and accelerated software development, MLOps streamlines and standardizes the development and delivery of ML-based services, enabling the next generation of data-driven business models.

Key aspects and conditions for making MLOps a success

The multitude of ML platforms and other tools shows how often MLOps is reduced to its purely technological aspects. In practice, this is not sufficient and will not achieve the desired success, because the human factor is not taken into account. In fact, MLOps requires change on several levels: technological, process and human.

Human level

Enablement of employees who work with machine learning or whose work is supported by it
Change & adoption management for all departments whose way of working is affected by machine learning
Establishment of roles & responsibilities for the operation of MLOps
Development of internal software development expertise or support from external experts

Process level

Establishment of processes & platforms through which ML models are developed, evaluated & operated
Automation of machine learning & data science processes
Establishment of ML pipelines & procedures for testing & monitoring
Establishment of communication processes within the team and of quality standards

Technological level

Creation of a platform on which code and machine learning models are developed, evaluated and operated
Establishment of the necessary IT architecture, notebook environments & ML pipelines
Preparation & procurement of training data, if necessary
If necessary, establishment of compliant cloud platforms & access to code & models from open-source projects

The challenges of ML systems

Technical challenges

Long-term operation, with its constant monitoring, adaptation and further development, is a major challenge for ML systems. Technical weaknesses, which are relatively common in complex systems, drive up the cost of ongoing rollout and maintenance. In the worst case, the machine learning solution may fail altogether. MLOps can counteract this by providing a high level of automation that extends into testing, monitoring and delivery, and by changing the mindset of the organization.
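In an ML pipeline, automated testing covers the data as well as the code. As an illustration, the following Python sketch validates an incoming batch against an expected schema before it is used for training or scoring. The column names, types and value ranges in `EXPECTED_SCHEMA` are hypothetical; dedicated data-validation libraries offer far richer checks.

```python
from typing import Any

# Hypothetical expected schema: column -> (accepted types, minimum, maximum)
EXPECTED_SCHEMA = {
    "age": (int, 18, 100),
    "claim_amount": ((int, float), 0.0, 1_000_000.0),
}


def validate_batch(rows: list[dict[str, Any]]) -> list[str]:
    """Check each row against EXPECTED_SCHEMA; an empty result means the batch passes."""
    problems = []
    for i, row in enumerate(rows):
        for column, (accepted_types, low, high) in EXPECTED_SCHEMA.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {i}: missing '{column}'")
            elif not isinstance(value, accepted_types):
                problems.append(f"row {i}: '{column}' has type {type(value).__name__}")
            elif not low <= value <= high:
                problems.append(f"row {i}: '{column}'={value} outside [{low}, {high}]")
    return problems


batch = [
    {"age": 42, "claim_amount": 1200.0},
    {"age": 17, "claim_amount": None},
]
problems = validate_batch(batch)
for problem in problems:
    print(problem)
# A real pipeline would stop here if `problems` is non-empty.
```

Running such checks on every new batch catches broken upstream feeds before they silently degrade a model in production.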

Ethical challenges

Another challenge that has become increasingly prominent recently is the reliability of decisions based on machine learning.

How are decisions made using ML? Are they fair, and are they free of discrimination?
How robust & safe are the systems & models? What precautions are needed in order to minimize the risk of cyber-attacks?
How are data protection issues handled? Are regulations such as the EU GDPR complied with?
How are ML solutions & ML-based decisions monitored?

The principles of digital ethics require that these aspects be taken into account when using artificial intelligence, including machine learning. MLOps helps to meet the requirements for reliable AI by, for example, regularly monitoring ML systems and also paying attention to the human level alongside the technical and process levels.

How does integrating MLOps work in practice?

There are numerous MLOps practices and approaches. Anyone who is familiar with these multi-layered aspects knows which levers need to be adjusted at which level in order to close the gaps quickly and with lasting effect. MLOps is usually rolled out through interdisciplinary collaboration between data scientists, machine learning engineers and product owners from the business units. The following guidelines help to ensure that machine learning can be used successfully with MLOps:

1. Take data quality & quantity into account

Machine learning requires a sufficiently large amount of data with which to train the algorithms. Unlike large corporations, medium-sized companies in particular do not always have such a pool of data at their disposal and need to find ways to close this gap. One way to do this is to collaborate with partners from the same industry using federated learning: each company trains the model locally on its own data, and only the resulting model parameters are exchanged and combined. This means that the original data does not have to leave the respective organization. Companies that want to use their own data for machine learning may find a BI tool useful for identifying and classifying the relevant data.
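The core of federated learning is the aggregation step. The following Python sketch shows a weighted parameter average in the style of the FedAvg algorithm, assuming every participant trains the same model architecture and reports its parameters as a flat list of floats together with its local sample count. A real setup would repeat this over many rounds of local training and often add protections such as secure aggregation; the function name and data layout are illustrative.

```python
def federated_average(client_updates: list[tuple[list[float], int]]) -> list[float]:
    """Combine locally trained parameter vectors into one global model.

    Each entry is (parameters, number_of_local_training_samples). Only these
    parameters leave a participating company; the raw training data never does.
    Parameters are averaged with weights proportional to the local sample counts.
    """
    total_samples = sum(n for _, n in client_updates)
    if total_samples == 0:
        raise ValueError("no training samples reported")
    n_params = len(client_updates[0][0])
    global_params = [0.0] * n_params
    for params, n in client_updates:
        if len(params) != n_params:
            raise ValueError("all participants must share the same model architecture")
        for j, p in enumerate(params):
            global_params[j] += p * n / total_samples
    return global_params


# Three hypothetical companies, each sharing only its locally trained weights.
updates = [
    ([0.2, 1.0], 100),
    ([0.4, 0.0], 300),
    ([0.6, 2.0], 100),
]
print(federated_average(updates))
```

For these three hypothetical participants, the global parameters come out at approximately [0.4, 0.6]; the participant with 300 local samples carries the most weight.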

2. Keep an eye on the use case roadmap

Even though everyone is talking about machine learning and artificial intelligence and many projects have already passed the proof-of-concept stage, the decisive step into business operations is often missing. To avoid getting stuck here, it is important to identify early on the use cases that ML is to support. Ideally, these are prioritized by expected success: quick wins come first on the roadmap, more complex cases later. This enables a company to proceed step by step and learn from the first cases for future projects without incurring too much investment risk.

3. Agile and step-by-step approach

The vision of a fully AI- and data-driven company shows the direction to take in order to maintain and expand future viability and competitiveness. However, in addition to long-term solutions, companies also need solutions that can be implemented in the short term and make a difference. An agile approach makes exactly that possible. To achieve this, it is worthwhile to rely on small teams that cover three different competencies: data science, technical and IT infrastructure expertise. This allows ML solutions to be developed according to the principles and methods of MLOps.

4. Commit to a central platform & API-First

Ideally, small, agile data science teams develop ML services independently on a central ML platform. Interdisciplinary collaboration between data scientists, machine learning engineers and product owners from the business unit is recommended. Standardized interfaces are used to exchange information between departments, and an API-first approach, in which the interface is defined before the service behind it is built, has proven particularly effective for this. The data science teams are thus empowered, both technically and procedurally, to provide ML services to business units independently.
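An API-first approach means agreeing on the interface before building the service behind it. The following Python sketch illustrates the idea with a hypothetical document-classification service: the request and response types form the contract, and the model behind `classify` (here a trivial keyword rule standing in for a trained model) can be replaced without affecting consumers. All names, fields and version strings are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass

API_VERSION = "1.0"  # hypothetical contract version agreed with consuming departments


@dataclass(frozen=True)
class ClassifyRequest:
    document_id: str
    text: str


@dataclass(frozen=True)
class ClassifyResponse:
    api_version: str
    document_id: str
    label: str
    confidence: float
    model_version: str


def classify(request: ClassifyRequest) -> ClassifyResponse:
    """Placeholder for the real model; only the request/response contract matters here."""
    is_claim = "claim" in request.text.lower()
    return ClassifyResponse(
        api_version=API_VERSION,
        document_id=request.document_id,
        label="claim" if is_claim else "other",
        confidence=0.9 if is_claim else 0.6,
        model_version="keyword-stub-0.1",
    )


def handle(raw_json: str) -> str:
    """JSON in, JSON out: the only surface consuming applications depend on."""
    request = ClassifyRequest(**json.loads(raw_json))
    return json.dumps(asdict(classify(request)))


print(handle('{"document_id": "D-1", "text": "New claim for water damage"}'))
```

Including `api_version` and `model_version` in every response lets consuming departments detect contract changes and lets the data science team trace each answer back to a model release.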

5. Ensure that data & models are up to date

Once an ML service has gone live, the data science teams retain functional responsibility for the service and have the ability to monitor and develop the services. This can also be done independently of the release cycles of the other applications in the departments. Given that all operational data changes on a regular basis, fast and regular updating of ML models is a key factor for success.
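Whether a model needs updating can be decided automatically by comparing live input data with the data the model was trained on. The Python sketch below computes the population stability index (PSI), one common drift measure, for a single numeric feature and flags the model for retraining above a threshold. The bin edges, the smoothing constant and the threshold of 0.25 (a frequently cited rule of thumb) are assumptions to be tuned per use case.

```python
import math


def bin_shares(values: list[float], edges: list[float]) -> list[float]:
    """Share of values per bin; ascending `edges` define len(edges) + 1 bins."""
    if not values:
        raise ValueError("cannot compute shares of an empty sample")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v >= e for e in edges)] += 1
    eps = 1e-6  # floor for empty bins so the logarithm below stays finite
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(reference: list[float], live: list[float],
                               edges: list[float]) -> float:
    """PSI = sum over bins of (live share - reference share) * ln(live share / reference share)."""
    expected = bin_shares(reference, edges)
    actual = bin_shares(live, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def needs_retraining(reference: list[float], live: list[float], edges: list[float],
                     threshold: float = 0.25) -> bool:
    """Flag retraining when the feature distribution has drifted beyond the threshold."""
    return population_stability_index(reference, live, edges) > threshold


training_ages = [25, 31, 38, 42, 47, 53, 58, 64]  # hypothetical feature values at training time
live_ages = [55, 61, 63, 67, 70, 72, 75, 79]      # hypothetical values seen in production
age_edges = [35, 50, 65]

print(round(population_stability_index(training_ages, live_ages, age_edges), 2))
print(needs_retraining(training_ages, live_ages, age_edges))
```

Run on a schedule for each important feature, such a check lets the data science team update a model when the data actually shifts rather than on a fixed calendar.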

6. Automate & reduce workload

A high degree of automation can minimize the effort required for monitoring and integrating new data. The central platform helps teams implement fully automated processes right from the outset. Container technologies and version control systems ensure the reproducibility and auditability necessary for reliable operation.
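Reproducibility and auditability depend on recording exactly what went into each training run. As a minimal Python sketch, the following function builds a manifest containing a hash of the training data, the code version and the container image. How the commit hash and image tag are obtained depends on the CI system in use; the field names and example values here are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_run_manifest(training_data: bytes, git_commit: str, container_image: str,
                       hyperparameters: dict) -> dict:
    """Record what went into a training run so it can be reproduced and audited later.

    The same training data always yields the same SHA-256 hash, so any change
    to the data shows up as a different fingerprint in the manifest.
    """
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "training_data_sha256": hashlib.sha256(training_data).hexdigest(),
        "git_commit": git_commit,
        "container_image": container_image,
        "hyperparameters": hyperparameters,
    }


manifest = build_run_manifest(
    training_data=b"age,claim_amount\n42,1200.0\n",
    git_commit="3f2c9ab",                                       # hypothetical commit hash
    container_image="registry.example.com/claims-model:1.4.0",  # hypothetical image tag
    hyperparameters={"learning_rate": 0.01, "epochs": 20},
)
print(json.dumps(manifest, indent=2))
```

Stored next to the trained model, such a manifest answers which data and code produced a given model version.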

Examples of AI solutions built on machine learning operations

Digital twins

Digital twins, which use data and algorithms to simulate the status, functions and reactions of machines and processes, among other things, have long been established in industry. They enable companies, for example, to develop prototypes quickly and cost-effectively and to optimize their entire value chain. Digital twins also offer numerous advantages in medicine and pharmaceutical research: trial cohorts can be created with virtual patients, which accelerates drug development. Doctors can make diagnoses more quickly and simulate therapies on digital twins. This opens up better opportunities for personalized medicine and lower healthcare costs.

The challenge here is that companies sometimes do not have enough data of their own to train digital twins. However, expanding a company’s in-house data set by exchanging sensitive personal health data or business-related production data is legally and ethically unacceptable. Federated machine learning approaches, which make “share without sharing” possible, offer a safe alternative. The standardization and automation that come with implementing MLOps in the participating companies make it easier to apply federated learning across organizational boundaries.

Input management

Numerous sub-processes in incoming mail processing can be improved and accelerated through machine learning. This starts with improving the recognition quality of OCR/ICR components in the scanning line and continues through page and document separation to extracting specialist data for downstream processes. ML also enables fully automated straight-through processing (“dark processing”) for many customer requests. This frees case handlers from manual, repetitive tasks that do not require specialist knowledge. Many input management solutions now have suitable interfaces for connecting ML modules, which can be used to optimize an existing system if a new one is not going to be introduced immediately or the company does not plan to develop its own solution.

MLOps is important to ensure that the large number of required ML components can be operated smoothly and integrated into the existing processes.

Compliance-conforming AI solutions

The key question with many AI solutions is how decisions are made. In the insurance industry, for example, this matters for ensuring that claims processing is free of discrimination: no group of people may be given preferential treatment. If insurers use AI-supported, automated processes, they must be able to prove, in case of doubt, that these processes do not discriminate. Now that the EU has formulated guidelines for trustworthy AI, BaFin and EIOPA are following suit in the insurance sector with their own approaches to digital ethics and regulatory requirements. Beyond regulation, more and more customers are also demanding transparency. Compliant, trustworthy AI is therefore becoming a competitive criterion.

When AI solutions are implemented with MLOps, the automation it brings also creates a high degree of traceability. This makes it possible to answer questions such as: Which data was used most recently to train this model? How are fairness and non-discrimination ensured in decision-making? Are the results of AI applications understandable and transparent? These and other aspects are taken into account in the course of monitoring and operation with MLOps.
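Questions about fairness can be backed by automated checks in the monitoring pipeline. The following Python sketch computes per-group approval rates and the demographic parity gap (the difference between the highest and lowest rate) for a batch of decisions. Demographic parity is only one of several fairness metrics, and the groups, data and 0.1 tolerance shown are illustrative assumptions, not regulatory requirements.

```python
from collections import defaultdict


def approval_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of positive decisions (e.g. claims approved) per group."""
    totals: dict[str, int] = defaultdict(int)
    approved: dict[str, int] = defaultdict(int)
    for group, is_approved in decisions:
        totals[group] += 1
        approved[group] += int(is_approved)
    return {g: approved[g] / totals[g] for g in totals}


def demographic_parity_gap(decisions: list[tuple[str, bool]]) -> float:
    """Difference between the highest and the lowest approval rate across groups."""
    rates = approval_rates(decisions)
    if not rates:
        raise ValueError("no decisions to evaluate")
    return max(rates.values()) - min(rates.values())


# Hypothetical decisions: (group, approved?)
decisions = [("A", True), ("A", True), ("A", False), ("A", True),
             ("B", True), ("B", False), ("B", False), ("B", True)]
gap = demographic_parity_gap(decisions)
print(f"parity gap: {gap:.2f}")  # parity gap: 0.25
if gap > 0.1:  # illustrative tolerance, not a regulatory threshold
    print("flag model for review")
```

Logging such metrics for every model release alongside the training-data lineage gives auditors a concrete record to examine when doubts arise.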

Next steps

Comma Soft brings to the table many years of experience from numerous projects with DAX and large Mittelstand customers and provides support for the planning, design and implementation of ML and MLOps projects. We guide our clients through the process of assessing current approaches to developing ML services within the company and integrating them into their own products. Together, we create a bespoke gap analysis that takes into account the individual requirements of each company.

We support the selection of new and the configuration of existing tools for MLOps, and work together with our customers to prepare and conduct training sessions to establish best practices. Acting as a sparring partner during the initial projects with MLOps, we work with IT and relevant departments to design processes that enable the rapid and secure integration of ML into products and services.

If you would like to learn more about MLOps and how it can be used in your company, please contact Dr. Lars Flöer.