Black-box Optimization

Alex Egg,


There are is a wealth of literature on optimization techniques outside of gradient-based methods, however, I think we just don’t often consider them as such.

For example, I was chatting with a guy once and he asked me if I know any black-box optimization techniques. I thought for a bit and naievly said no. In his expression he looked a bit perplexed b/c we just had a conversation about hyper-parameter turning which is essentially block-box optimization. I hadn’t made that connection: whether we are doing grid search, random search, or something more advanced like bayesian optimization or some bandits method we are practicing black-box optimization, meaning we don’t have any heuristics to exploit.



Combinatorial problem if you have more than a few hyper parameters.

This is much better than grid search b/c it won’t get stuck iterating parameters that potentially have no effect.

This paper by Bengio1 motivates random search over grid search.


Among the choices above that are not completely trivial and/or complex is the Nelder-Mead method which outdates machine learning. Nelder-Mead is a numerical technique so it uses iteration to solve the problem and is included in the Scipy package.

Define the function you want to optimize, in this case it’s in 2 dimensions and give the algorithm a starting point: (1,1)

def func_1(t):
    return (x*x) + (y*y)
from scipy.optimize import fmin
fmin(func_1, [1,1])
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 38
         Function evaluations: 69
Out[17]: array([ -2.10235293e-05,   2.54845649e-05])

As you can see I had it optimize a convex (parabolic) function in 2 dimensions and it found the minimum which is indeed at (0,0)

Bayesian Methods

Became interested in this topic after reading the AutoML2 paper, as I though the Bayesian Optimisation they did was more interesting than that main contribution of the paper. They describe a bayesian optimisation routine called SMAC which cut their hyper-parameter search space considerably. This is an exploration of previous work in the space.

Gaussian Process Models

Instead of using random combinations or exhaustive search naively, lets look at what has done well in the past are repeat that. In other words predict regions of the hyper-parameter space in which we expect to get good results.

These models assume similar inputs give similar outputs.

Utility or Activation Functions help us formalise what the next best-guess is.

Expected Improvement Activation Function: This function, sometimes called the Acquisition function, looks at the variance of the gaussians and chooses the one w/ the most potential upside. Re: Explore vs Exploit paradox. Like a reckless investor since we don’t have much to loose.


All of these bayesian models need to start off of something, so they often do a few random samples and go bayesian from there. This is the valuable contribution from the auto-ml paper2 in that instead of doing random sampling for initialisation you can use smart samples from the meta learners.


  1. Bergstra, James, and Yoshua Bengio. “Random search for hyper-parameter optimization.” Journal of Machine Learning Research 13.Feb (2012): 281-305. 

  2. Feurer, Matthias, et al. “Efficient and robust automated machine learning.” Advances in Neural Information Processing Systems. 2015.  2

Permalink: black-box-optimization


Last edited by Alex Egg, 2018-03-28 10:43:51
View Revision History