ML Ops Landscape of 2020
A survey of ML Ops off-the-shelf solutions for building an automated machine learning pipeline
At WAY2VAT we have been running a specialized homebrew machine learning pipeline for years. Our pipeline is a piece of software that governs the operation of our patented machine-learning-based product, the Automatic Invoice Analyzer (AIA). The AIA supports our everyday business by eliminating human processing time on all fronts of the VAT/GST reclaim process – from extracting basic transaction information to determining whether an invoice is eligible for submission in a claim for VAT/GST return.
The core of the AIA technology is composed of more than a dozen algorithms, each with specific training data and evaluation metrics. Together, the algorithms provide a coherent, detailed analysis of any invoice that we receive, from extracting fields to determining the language and currency. Many of the algorithms are co-dependent and run sequentially, each building on the results of the last, while others correct the results of intermediate steps to produce a clearer picture.
The complexity of our product demands we keep on top of training and evaluating the models in production, as well as research into new methods. To that end, we use several tools:
- Dataset management: versioning, storage, querying
- Experiment management: running, validation and evaluation metrics, monitoring and dashboarding, hyperparameters and arguments
- Model management: versioning, storage and deployment, health monitoring in production
- Orchestration: for a pipeline as complex as the one we have at WAY2VAT, we built an orchestrator that chains several experiment blocks, with model and data inter-dependencies, into a directed acyclic graph (DAG) executed in the cloud (a minimal sketch of the idea follows this list).
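To make the orchestration bullet concrete, here is a minimal sketch of running experiment blocks in dependency order. It assumes Python 3.9+ for the standard-library graphlib module, and the block names (load_data, train_model, evaluate_model) are hypothetical placeholders, not our actual pipeline steps.

```python
# Minimal sketch: run experiment blocks in topological (dependency) order.
# Assumes Python 3.9+ for graphlib; block names are illustrative only.
from graphlib import TopologicalSorter

def load_data():
    print("loading dataset...")

def train_model():
    print("training model on loaded data...")

def evaluate_model():
    print("evaluating trained model...")

# Each block maps to the set of blocks it depends on (the DAG edges).
dag = {
    train_model: {load_data},
    evaluate_model: {train_model},
}

# Execute every block only after all of its dependencies have run.
for block in TopologicalSorter(dag).static_order():
    block()
```

A real orchestrator adds cloud execution, retries and logging on top of this ordering logic, but the DAG core is the same idea.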
At some point several years ago, ML Ops (Machine Learning Operations; cf. Google's definition: https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning) became a full-time occupation for our engineers. Based on our research into the ML Ops world, we offer a survey of the landscape of off-the-shelf solutions, especially around the major pain points of experiment management. An off-the-shelf solution can dramatically reduce the upkeep and maintenance of the scripts that run and schedule experiments, and enable a tighter grip on hyperparameter search and selection, not to mention the nice dashboards and graphs it produces.
Apache AirFlow
After a round of sourcing and research, we converged (pun intended) on Apache AirFlow (https://airflow.apache.org/) as the orchestrator for our DAG. AirFlow is a generic tool for running custom-code DAGs, and we repurposed it for spinning up cloud machines with Docker images that load data and run experiments. This may prove successful for others, but keep in mind it needs constant, daily care, since data and experiments keep changing as new features are implemented. While very versatile, AirFlow's Python task code can end up quite long, with many dependencies such as the cloud API, which makes it harder to maintain. AirFlow also needs its own server to run on, which adds some overhead cost and maintenance work to keep it consistently up.
AirFlow does not provide any tooling for managing datasets, experiments or models; it is strictly a DAG executor. To complement it, one must supply dataset, experiment and cloud machine management. We use a mix of homebrewed solutions as well as off-the-shelf data and experiment management systems. A hedged sketch of what a minimal AirFlow DAG for this setup might look like appears below.
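The sketch below shows chaining Dockerized experiment steps with AirFlow's DockerOperator. The DAG id, image names and commands are illustrative placeholders, and the import path assumes AirFlow 1.10.x; this is not our production code.

```python
# Hedged sketch: a three-step experiment DAG where each step runs in its
# own Docker container. Images and commands are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator  # AirFlow 1.10.x path

with DAG(
    dag_id="invoice_model_training",      # hypothetical pipeline name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:

    load_data = DockerOperator(
        task_id="load_data",
        image="example.registry/load-data:latest",  # placeholder image
        command="python load_data.py",
        auto_remove=True,                           # clean up the container
    )

    train = DockerOperator(
        task_id="train",
        image="example.registry/train:latest",      # placeholder image
        command="python train.py",
        auto_remove=True,
    )

    evaluate = DockerOperator(
        task_id="evaluate",
        image="example.registry/evaluate:latest",   # placeholder image
        command="python evaluate.py",
        auto_remove=True,
    )

    # DAG edges: train only after data loads, evaluate only after training.
    load_data >> train >> evaluate
```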
The Current ML Ops Landscape
As part of surveying the landscape of ML Ops tools to automate ML pipelines, we compiled the following list of vendors and open-source solutions. While it is not exhaustive, the list is valuable for those running ML-based products and looking for the right vendor or project. Note: we are not affiliated with any of these projects, and this list simply reflects our team's research.
Open-source tools are marked with an open book: 📖
Experiment Managers & Hyperparameter Tuners
These projects' primary goal is to run and track ML experiments. Often they also offer a DAG orchestrator as well as dataset management features. The outputs are usually accessible through actionable dashboards where experiments can be run, stopped, logged and tracked throughout their lifecycle (learning curves, exploring intermediate results). We also include in this bucket the hyperparameter tuners/optimizers (HPO) that frequently accompany experiment managers, since HPO often requires running multiple experiments. In this category we see many open-source tools, some catering to the academic community and not just to industry. (A minimal Sacred sketch follows the list.)
- Weights and Biases: https://www.wandb.com/
Experiment tracking, hyperparameter optimization, model and dataset versioning.
- 📖 Trains (by Allegro.ai): https://github.com/allegroai/trains
Auto-Magical Experiment Manager, Version Control and ML-Ops for AI.
- 📖 DAGsHub: https://dagshub.com/
A community-oriented data science platform for collaboration, based on open-source tools and open formats.
- 📖 BOHB / HpBandSter: https://www.automl.org/automl/bohb/ and https://automl.github.io/HpBandSter/build/html/quickstart.html
Implementations of recently published methods for optimizing hyperparameters of machine learning algorithms to efficiently search for well-performing configurations.
- 📖 Sacred: https://github.com/IDSIA/sacred
A tool to help you configure, organize, log and reproduce experiments.
- Neptune.ai: https://neptune.ai/
Lightweight experiment management tool.
- 📖 Deepkit.ai: https://deepkit.ai/
Collaborative and analytical training suite for insightful, fast, and reproducible modern machine learning.
- 📖 KubeFlow: https://www.kubeflow.org/
Machine Learning Toolkit for Kubernetes.
- 📖 Luigi: https://github.com/spotify/luigi
From Spotify, Luigi helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualization, failures, and more.
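As an example of how lightweight these tools can be, here is a minimal Sacred sketch in the spirit of the project's quickstart; the experiment name and hyperparameter values are illustrative placeholders.

```python
# Minimal Sacred sketch: config values are captured, injected into the run,
# and recorded with every experiment. Names here are hypothetical.
from sacred import Experiment

ex = Experiment("invoice_field_extraction")  # placeholder experiment name

@ex.config
def config():
    learning_rate = 0.01  # overridable from the CLI,
    batch_size = 32       # e.g. `python train.py with learning_rate=0.001`

@ex.automain
def run(learning_rate, batch_size):
    # Sacred injects the config values by parameter name.
    print(f"training with lr={learning_rate}, batch_size={batch_size}")
```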
Dashboards
These projects' primary goal is to provide visualization for ML experiments; usually this functionality is also built into the projects in the list above. Since the application is fairly generic, some tools, such as Kibana, are in fact general-purpose dashboarding tools, but they can be applied very simply to tracking ML experiments by analyzing runtime logs. (A short TensorBoard logging sketch follows the list.)
- 📖 TensorBoard (from TensorFlow): https://www.tensorflow.org/tensorboard
Visualization and tooling needed for machine learning experimentation.
- 📖 Omniboard: https://github.com/vivekratnavel/omniboard
A web dashboard for the Sacred machine learning experiment management tool.
- 📖 Kibana: https://www.elastic.co/kibana
An open-source data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases.
- Grafana: https://grafana.com/
Compose observability dashboards; an open-source edition is available 📖.
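To illustrate how little code experiment dashboarding takes, here is a small TensorBoard logging sketch using PyTorch's SummaryWriter; the log directory and loss values are fabricated placeholders.

```python
# Small sketch: write scalar training curves that TensorBoard can render.
# Requires torch and tensorboard installed; values are placeholders.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/experiment_1")  # placeholder directory
for step in range(100):
    fake_loss = 1.0 / (step + 1)                     # stand-in for a real loss
    writer.add_scalar("train/loss", fake_loss, step)
writer.close()
# Then `tensorboard --logdir runs` serves the dashboard locally.
```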
All-in-One: Experiment, Deploy and Monitor
These projects are a one-stop shop for ML Ops. They offer solid experiment management alongside dataset and model management. They are most often paid services, with prices in the $100s/seat/month range, rendering them infeasible for smaller organizations. In return for the steep price point, they offer customer support and guaranteed uptime through reliable hosting. (A minimal MLflow tracking sketch follows the list.)
- 📖 MLflow: https://mlflow.org/
An open-source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
- CNVRG: https://cnvrg.io/
An end-to-end machine learning platform to build and deploy AI models at scale.
- SageMaker: https://aws.amazon.com/sagemaker/
Helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models.
- Valohai: https://valohai.com/
Automates everything from data extraction to model deployment.
- 📖 Guild AI: https://guild.ai/
Systematic control of machine learning to help you build better models faster; freely available under the Apache 2.0 open-source license.
- Polyaxon: https://polyaxon.com/
Reproduce, automate, and scale your data science workflows with production-grade MLOps tools.
- Comet.ml: https://www.comet.ml/site/
Track, compare, explain and optimize experiments and models.
- Allegro.ai: https://allegro.ai/
End-to-end enterprise-grade platform for data scientists, data engineers, DevOps and managers to manage the entire machine learning & deep learning product life cycle.
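As a taste of this category, here is a minimal MLflow tracking sketch covering just the experimentation slice of the lifecycle; the experiment name, parameter and metric values are illustrative placeholders.

```python
# Minimal MLflow sketch: record a run's parameters and metrics so they
# appear in the MLflow UI. All names and values are placeholders.
import mlflow

mlflow.set_experiment("invoice-analyzer")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_accuracy", 0.93)
    # Artifacts (plots, model files) can be attached to the same run:
    # mlflow.log_artifact("confusion_matrix.png")
```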
Conclusion
The ML Ops landscape in 2020 has grown to great proportions. From general tools for DAG execution, logging and dashboards, it has transformed into a multi-billion-dollar industry that is driving AI in the largest companies in every domain. All-in-one tools can offer end-to-end features for rolling out a machine learning pipeline, but they are usually expensive and sometimes over-promise and under-deliver. Open-source point solutions for experiment management come at zero cost, but they carry maintenance overhead and require building your own infrastructure. There are tradeoffs in selecting a path; however, it is now clear that ML Ops will keep growing and changing the way we productize machine learning.