Why Automated Machine Learning is Here to Stay
Getting good at Machine Learning requires many hours of study, and for many, the idea can be daunting. Training a model involves many complicated steps, such as data cleaning, feature engineering (transformation), algorithm selection, model training and model evaluation, each worthy of its own course.
Data science differs from Software Engineering in that it is often more of an art than an engineering discipline. While we as developers may know how to write well-performing applications (or can quickly look up how to do so), training and attaining an accurate model is not a given.
Thankfully, companies such as Google and Microsoft, along with a number of open-source projects, have spent a lot of time developing a concept known as Automated Machine Learning, or AutoML.
The idea behind AutoML is to automate certain aspects of the machine learning workflow, such as:
- Data cleaning (e.g. replacing missing values, converting data types)
- Feature engineering (feature selection and creation)
- Model training and evaluation
- Fine-tuning of hyperparameters
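To make the first two items in the list above concrete, here is a minimal sketch of what doing them by hand might look like. The rows, column names, and helper functions are hypothetical and invented for illustration. They are not taken from any AutoML framework, which would normally choose and apply such transformations for you.

```python
from statistics import mean

# Hypothetical toy rows; None marks a missing value.
rows = [
    {"age": "34", "city": "Oslo"},
    {"age": None, "city": "Bergen"},
    {"age": "28", "city": "Oslo"},
]

def clean_ages(rows):
    """Data cleaning: convert ages to float and fill gaps with the mean."""
    known = [float(r["age"]) for r in rows if r["age"] is not None]
    fill = mean(known)
    return [float(r["age"]) if r["age"] is not None else fill for r in rows]

def one_hot(values):
    """Feature engineering: one 0/1 column per distinct text value."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

ages = clean_ages(rows)
cities = one_hot([r["city"] for r in rows])

# One numeric feature vector per row: [age, is_Bergen, is_Oslo].
features = [[age] + vec for age, vec in zip(ages, cities)]
print(features)
```

Even in this tiny case there are choices to make (mean versus median for the missing age, how to order the categories), which is the kind of decision an AutoML tool takes off the developer's hands.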
Instead of having to know the specifics of binary classification, or how to use one-hot encoding to transform textual values into float vectors, a developer can become productive from day one just by leveraging known data sources. To train a model using AutoML, all we need to do is tell a given framework what data we wish to use and define the maximum amount of time we wish to spend searching for a solution.
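The "data plus a time budget" idea can be sketched in a few lines. The following is a toy illustration, not the API of ML.NET or any other real framework: `auto_search`, the tiny nearest-neighbour classifier, and the data are all assumptions made up for this example. Real AutoML tools search over many algorithms and settings, but the overall loop is similar in spirit: try a candidate, score it on held-out data, keep the best, and stop when time runs out.

```python
import random
import time

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return votes * 2 > k

def accuracy(train, valid, k):
    """Share of validation rows that are predicted correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in valid)
    return hits / len(valid)

def auto_search(train, valid, max_seconds, seed=0):
    """Hypothetical helper: try random settings until the budget is spent."""
    rng = random.Random(seed)
    deadline = time.monotonic() + max_seconds
    best_k, best_score = None, -1.0
    while True:
        k = rng.choice([1, 3, 5, 7])       # candidate hyperparameter
        score = accuracy(train, valid, k)  # evaluate on held-out data
        if score > best_score:
            best_k, best_score = k, score
        if time.monotonic() >= deadline:   # always tries at least one candidate
            break
    return best_k, best_score

# Made-up one-dimensional data: (value, label) pairs.
train = [(1.0, False), (2.0, False), (3.0, False), (4.0, False),
         (6.0, True), (7.0, True), (8.0, True), (9.0, True)]
valid = [(1.5, False), (3.5, False), (6.5, True), (8.5, True)]

best_k, best_score = auto_search(train, valid, max_seconds=0.1)
print(f"best k: {best_k}, validation accuracy: {best_score:.2f}")
```

The caller supplies only the data and `max_seconds`. Everything else, including which settings get tried, is decided inside the search, which is the trade AutoML offers.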
As you may suspect, the ease of use and the productivity gains AutoML offers come at a cost. We trade study time for additional computing power spent searching a given problem space, and that search may or may not yield a good model. Most frameworks allow automated machine learning to be run either on-premises or in the cloud, so you'll have some control over the cost.
In my view, Microsoft's open-source, cross-platform library ML.NET takes AutoML to a new level. Although its AutoML capabilities are not yet fully built out to support Natural Language Processing (NLP) and other advanced deep learning tasks, Microsoft provides a neat and simple extension to Visual Studio that guides users through the process.
In addition, ML.NET generates both the code required to re-train the model and the code needed to consume it. Being able to view the code used to train a model serves two distinct purposes. First, we are able to fine-tune the model ourselves using business-domain-specific knowledge. Second, it serves as a great starting point for anyone new to the framework.
In summary, Automated Machine Learning is here to stay, and we'll likely continue to see new breakthroughs in the field in the coming year. AutoML enables software developers without any previous machine learning expertise to become productive machine learning engineers from day one. It's an important step in the democratization of AI and ML, and it nicely blends the engineering best practices of automation with the art of data science.