Machine Learning Models

Alex Egg,

Logistic Regression

Regularization

Regularization is the process of penalizing model complexity during training. There are two popular techniques: L1 (Lasso) and L2 (Ridge) regularization.

Tradeoff: which do we value more, fit to the training data (accuracy) or model simplicity (complexity)? The regularization strength controls this balance.
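
As a rough sketch of what this looks like in practice, the example below fits L1- and L2-penalized logistic regressions with scikit-learn; the dataset, the C value (the inverse of the regularization strength), and the solver choices are assumptions made for the example.

```python
# Minimal sketch: L1 (Lasso) vs. L2 (Ridge) penalized logistic regression.
# Smaller C means a heavier penalty, i.e. trading training accuracy for a
# simpler model; L1 additionally drives some coefficients exactly to zero.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l2 = LogisticRegression(penalty="l2", solver="lbfgs", C=0.1, max_iter=5000)

for name, model in [("L1 (Lasso)", l1), ("L2 (Ridge)", l2)]:
    model.fit(X_train, y_train)
    print(name,
          "test accuracy:", round(model.score(X_test, y_test), 3),
          "zeroed coefficients:", int((model.coef_ == 0).sum()))
```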

Decision Tree Classifier

Naive Bayes

A generative model and a common technique for text classification. It builds a probability distribution for each word in the corpus relative to the targets, then scores each class under the naive assumption that words are conditionally independent given the class.
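
For illustration, a minimal sketch of this idea with scikit-learn's MultinomialNB; the toy corpus and spam/ham labels below are made up, and CountVectorizer is assumed as the word-counting step.

```python
# Minimal Naive Bayes text-classification sketch: count words per document,
# then estimate P(word | class) for each word relative to the target labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "cheap pills buy now",
    "meeting at noon tomorrow",
    "win money now",
    "lunch with the team",
]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)

print(model.predict(["buy cheap pills", "team meeting tomorrow"]))
```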

SVM

Ensembling

In practice, it turns out that instead of choosing the single best model out of a set of models, if we combine many of them the results are better, often much better, and at little extra effort.

Creating such model ensembles is now standard. In the simplest technique, called bagging, we simply generate random variations of the training set by resampling, learn a classifier on each, and combine the results by voting. This works because it greatly reduces variance while only slightly increasing bias. In boosting, training examples have weights, and these are varied so that each new classifier focuses on the examples the previous ones tended to get wrong. In stacking, the outputs of individual classifiers become the inputs of a “higher-level” learner that figures out how best to combine them.
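
Bagging and boosting are discussed in their own sections below; as a quick sketch of stacking, one possible setup with scikit-learn's StackingClassifier is shown here (the choice of base learners, meta-learner, and dataset are assumptions made for the example).

```python
# Minimal stacking sketch: the base classifiers' outputs become the inputs
# of a higher-level logistic regression that learns how to combine them.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the "higher-level" learner
)

print("stacked ensemble:", cross_val_score(stack, X, y, cv=5).mean())
```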

Bagging

The theory of bagging, in a nutshell, is that you blend multiple models together to reduce variance and make your predictions more stable. That is how a random forest works: it is a combination of n_estimators decision tree models whose outputs are combined by majority voting (in the case of the Random Forest Classifier) or straight averaging (in the case of the Random Forest Regressor). Random Forests are called bagging meta-estimators. On top of bootstrapping the training data, a random forest further reduces variance by introducing randomness into the selection of the feature to use at a particular split. Bagging estimators work best when the individual decision trees are grown very deep; the randomness prevents the model from overfitting the training data.
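
For illustration, a minimal sketch of this comparison with scikit-learn: a single deep decision tree versus a random forest of n_estimators such trees combined by majority voting (the dataset and the n_estimators value are assumptions made for the example).

```python
# Minimal sketch: one deep decision tree vs. a bagged forest of them.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single fully grown tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0)

# n_estimators deep trees, each fit on a bootstrap sample, combined by
# majority voting; the averaging is what reduces the variance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print("single tree:  ", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```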

In bagging, we use many overfit classifiers (low bias but high variance) and average them over bootstrap samples to reduce the variance.

Bagging is short for Bootstrap Aggregation. It uses several versions of the same model trained on slightly different samples of the training data to reduce variance without any noticeable effect on bias. Bagging can be computationally intensive, especially in terms of memory.

The intuition behind bagging is that averaging a set of observations reduces the variance. Given $n$ observations $Z_1, \dots, Z_n$, each with variance $\sigma^2$, the variance of their mean $\bar{Z}$ is $\sigma^2/n$. Hence a natural way to reduce the variance, and thereby increase the prediction accuracy of a model, is to take many samples of the training set and average the resulting predictions.
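
As a quick numerical sanity check of the $\sigma^2/n$ claim, the snippet below compares the variance of single draws against the variance of means of $n$ independent draws (the normal distribution and the parameter values are arbitrary assumptions for the demo).

```python
# Empirical check that averaging n independent observations with variance
# sigma^2 yields an estimate with variance roughly sigma^2 / n.
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 2.0, 25, 100_000

single = rng.normal(0.0, sigma, size=trials)                   # one observation per trial
means = rng.normal(0.0, sigma, size=(trials, n)).mean(axis=1)  # mean of n observations

print("variance of a single observation:", single.var())  # ~ sigma^2 = 4.0
print("variance of the mean of n=25:    ", means.var())   # ~ sigma^2 / n = 0.16
```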

Examples: Random Forest

Random Forest Classifier

Boosting

In boosting, we allow many weak classifiers (high bias with low variance) to learn from their mistakes sequentially, with the aim that they can correct their high-bias problem while maintaining the low-variance property.

Examples: AdaBoost, GradientBoost

AdaBoost

Boosting relies on training several simple models (usually decision stumps) successively, each trying to learn from the errors of the models preceding it. Boosting differs from other algorithms in that it assigns weights to training examples (as opposed to linear models, which apply weights to features), so in essence we can weight the scarcer observations more heavily than the more populous ones. Boosting decreases bias and hardly affects variance (unless you are very sloppy). Depending on your n_estimators parameter, you are adding another inner loop to your training step, so the price of AdaBoost is an expensive jump in computational time and memory.
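
As a rough sketch, here is how an AdaBoost ensemble of decision stumps might look with scikit-learn; the dataset and n_estimators are illustrative assumptions, and note that newer scikit-learn versions call the base-learner argument estimator (older ones use base_estimator).

```python
# Minimal AdaBoost sketch: successive decision stumps fit to reweighted data,
# with more weight placed on the examples the previous stumps misclassified.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a decision stump
    n_estimators=200,  # each extra stump is another pass over the data
    random_state=0,
)

print("AdaBoost (200 stumps):", cross_val_score(ada, X, y, cv=5).mean())
```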

Bagging vs Boosting

Random Forest is bagging, not boosting. In boosting, we allow many weak classifiers (high bias with low variance) to learn from their mistakes sequentially, with the aim that they can correct their high-bias problem while maintaining the low-variance property. In bagging, we use many overfit classifiers (low bias but high variance) and average them over bootstrap samples to reduce the variance.

Permalink: machine-learning-method-tradeoffs

Tags:

Last edited by Alex Egg, 2017-06-16 06:11:08