Paper Review: “Efficient and Robust Automated Machine Learning”
Feurer, Matthias, et al. “Efficient and robust automated machine learning.” Advances in Neural Information Processing Systems. 2015.
The authors make the case for their contributions by motivating two problems:
- There is no one ML model that works best across all datasets
- Hyperparameter tuning of ML models is expensive
Figure 1: Some examples of hyperparameters in popular ML systems
They propose to remedy these problems with a two-fold solution:
- An optimised hyperparameter search solution that drastically cuts down on the original $O(n^n)$ search space
- A solution that optimally combines models from step 1 into an ensemble
Figure 2: The proposed AutoML system features meta-learning and ensemble construction
The authors utilise a powerful black-box Bayesian optimisation technique originally outlined by Hutter et al. (2011). The method, called SMAC, was not originally designed for ML hyperparameters but has a clear application in that context. The authors extend SMAC to reduce the search time further, by bootstrapping the search process w/ something they call Meta-Learners.
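To make the "sequential model-based" idea concrete, here is a minimal toy sketch of such a loop. It is not SMAC itself: SMAC uses a random-forest surrogate, whereas this sketch swaps in a crude nearest-neighbour surrogate and a lower-confidence-bound rule so it fits in a few lines. The `objective`, `surrogate` and `smbo` names and the 1-D search range are all assumptions made for illustration.

```python
import random


def objective(x):
    # Stand-in for "train a model with hyperparameter x and return its
    # validation loss". Hypothetical: the best value is x = 0.3.
    return (x - 0.3) ** 2


def surrogate(x, history):
    # Crude stand-in surrogate (NOT SMAC's random forest): predict the loss
    # of the nearest evaluated point, and use the distance to it as an
    # uncertainty estimate.
    nearest_x, nearest_loss = min(history, key=lambda h: abs(h[0] - x))
    return nearest_loss, abs(nearest_x - x)


def smbo(n_iters=20, n_candidates=50, kappa=1.0, seed=0):
    rng = random.Random(seed)
    # Initial design: evaluate the two ends of the range.
    history = [(x, objective(x)) for x in (0.0, 1.0)]

    def acquisition(x):
        # Lower confidence bound: favour low predicted loss and high uncertainty.
        mean, uncertainty = surrogate(x, history)
        return mean - kappa * uncertainty

    for _ in range(n_iters):
        candidates = [rng.random() for _ in range(n_candidates)]
        # Cheap step: score candidates with the surrogate only.
        x_next = min(candidates, key=acquisition)
        # Expensive step: one real evaluation per iteration.
        history.append((x_next, objective(x_next)))
    # Return the best (x, loss) pair seen so far.
    return min(history, key=lambda h: h[1])


best_x, best_loss = smbo()
```

The point of the structure is that the expensive function is called once per iteration, while the cheap surrogate is queried many times to decide where that one call should go.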
To reduce the search time of the Bayesian method, SMAC, the authors introduce prior knowledge gathered from a repository of datasets called OpenML. They built a feature generator that takes an arbitrary dataset as input and outputs a set of features describing that dataset. Given this new dataset of dataset descriptions, they can generalise to future datasets and predict which models will work well on them. Since this immediately cuts down the search space of applicable models/params, you can start the SMAC routine w/ more sensible defaults that are closer to the global minimum.
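The warm-start idea in the paragraph above can be sketched roughly as follows. This is only an illustration under my own assumptions: the three meta-features, the L1 distance, the tiny `repository` and its stored configurations are all made up for the example, and the real system uses a much richer set of dataset descriptors.

```python
import math


def meta_features(X, y):
    # Hypothetical, tiny set of dataset descriptors.
    n_samples = len(X)
    n_features = len(X[0])
    n_classes = len(set(y))
    return [math.log(n_samples), math.log(n_features), float(n_classes)]


def l1_distance(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))


def warm_start_configs(new_mf, repository, k=2):
    # Rank past datasets by similarity and return their best-known configs,
    # which would then seed the optimiser instead of random starting points.
    ranked = sorted(
        repository, key=lambda entry: l1_distance(new_mf, entry["meta_features"])
    )
    return [entry["best_config"] for entry in ranked[:k]]


# Made-up repository of previously seen datasets and their best configs.
repository = [
    {"meta_features": [math.log(100), math.log(4), 2.0],
     "best_config": {"model": "svm", "C": 1.0}},
    {"meta_features": [math.log(60000), math.log(784), 10.0],
     "best_config": {"model": "random_forest", "n_estimators": 500}},
    {"meta_features": [math.log(5000), math.log(20), 2.0],
     "best_config": {"model": "gradient_boosting", "learning_rate": 0.1}},
]

# A new, small binary dataset: 120 rows, 5 features.
X_new = [[0.0] * 5 for _ in range(120)]
y_new = [i % 2 for i in range(120)]
seeds = warm_start_configs(meta_features(X_new, y_new), repository, k=2)
```

Here the small binary dataset ends up closest to the other small binary dataset, so its SVM configuration is tried first.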
Given that we can now find the best model and parameters for a given dataset, we are still faced w/ the problem that no one base model works best on all datasets. I think this is where they contribute the ensembling solution. The system builds an ensemble of many base models and preprocessing routines.
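As I understand it, the ensembles are built with a greedy ensemble selection procedure: repeatedly add (with replacement) whichever model most improves the averaged prediction on a validation set. Below is a minimal sketch of that idea, not the authors' implementation. The function names, the use of MAE as the metric, and the toy predictions are my own assumptions.

```python
def mean_abs_error(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def average(pred_lists):
    # Element-wise mean of several prediction vectors.
    n = len(pred_lists)
    return [sum(column) / n for column in zip(*pred_lists)]


def greedy_ensemble(val_preds, y_val, rounds=5):
    # val_preds maps a model name to its predictions on the validation set.
    # Returns the chosen model names; a name may appear more than once,
    # which acts as a higher weight for that model.
    chosen = []
    for _ in range(rounds):
        best_name, best_err = None, float("inf")
        for name in sorted(val_preds):
            trial = [val_preds[m] for m in chosen] + [val_preds[name]]
            err = mean_abs_error(average(trial), y_val)
            if err < best_err:
                best_name, best_err = name, err
        chosen.append(best_name)
    return chosen


# Toy validation data: model "a" overshoots, "b" undershoots, "c" is constant.
y_val = [1.0, 2.0, 3.0, 4.0]
val_preds = {
    "a": [1.5, 2.5, 3.5, 4.5],
    "b": [0.5, 1.5, 2.5, 3.5],
    "c": [3.0, 3.0, 3.0, 3.0],
}
members = greedy_ensemble(val_preds, y_val, rounds=2)
```

In this toy case the two biased models cancel each other out, which is the kind of complementary behaviour the selection is meant to find. It also shows why the resulting ensembles can end up with many parts.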
All of this is integrated w/ the popular Scikit-learn package.
At first this paper seemed like magic: Auto ML! However, after reading it and understanding how it works, it is clear to me now that it is not magic, but rather good engineering. They took many existing components and technologies and spliced them together to make a powerful piece of technology. For example, see the two case studies below. In the first, the model tries to fit a classifier to MNIST. It gets a rather impressive accuracy of 99%, but you also have to consider the cost: it builds an insane ensemble of 5 parts. It is known that simpler and more interpretable base models can get in that range. The other case study is a bit contrived, but a colleague of mine spent about 40 hours building a regression model that got 0.16 MAE. I had him send me the same dataset and I fit the regression using AutoML; it got very close to the same accuracy and only took 5 minutes! The trade-off is the crazy 5-component ensemble that it generated.
I think the ensembling component is very powerful, especially if you are a Kaggler, but if you are in industry, I think the more important value-add is the meta-learning and the boost that it gives the SMAC search process. The authors are also attempting to standardise the reporting of the meta-learning results in a common database format so the community can share and grow them as more datasets are published, which will make the system smarter over time.
So in conclusion, this is a great engineering work in that it gathered a great dataset of dataset characteristics and found a way to use it to bootstrap the hyperparameter search process.
Also, if anything, it introduced me to the space of Bayesian black-box optimisation, which I will have to study formally now; it seems very powerful. I ran some example scripts of SMAC which find the optimal params for an SVM and a random forest and was very impressed w/ the speed (even w/o meta-learners).
Also kudos to the authors for reproducibility in that all their code is on GitHub w/ a docker image and a working pip package w/ deep integration into Scikit-learn!
Figure 3: Shows how meta learning finds an optimal model in a fraction of the time while ensembling isn’t that much of a value add.
Figure 4: Shows how there is no one universal model but most of the time GB is a good choice which is evidenced by most Kaggle submission winners.
Figure 5: Shows how the ensembles are not superior to the base models, but that using the hyperparam search solution you can get to the optimum in a fraction of the time.
Figure 6: AutoML made a competing model in 5 minutes but at what cost? Look at the insane ensemble it made when a simple linear model can get the same evaluation. The clear value add here is time savings not the ensembling.
- F. Hutter, H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Proc. of LION’11, pages 507–523, 2011.