Semi-Supervised and Active Learning

Alex Egg

(Weakly supervised learning, semi-supervised learning)

Active learning is a special case of semi-supervised machine learning in which a learning algorithm can interactively query the user (or some other information source) to obtain the desired outputs at new data points. In the statistics literature it is sometimes also called optimal experimental design.

Start with an initial weak classification model and use it to tag all your images.

Run your images through a pre-trained CNN and cluster the embeddings. You can visually inspect the clusters and then tag a whole cluster in one action, rather than tagging each image individually.

Then feed those new labels back into the classifier, retrain it, and re-rank the images.

Repeat until accuracy is high enough.

In effect, you divide and conquer the dataset.
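The cluster-then-tag step above can be sketched with NumPy. This is a minimal sketch under several assumptions. The embeddings are taken as precomputed, and random Gaussian blobs stand in for real CNN features. The clustering is plain k-means with farthest-first initialisation, which is only one of many reasonable choices. `kmeans` and `propagate_cluster_tags` are hypothetical helpers, not part of any library. The human's work shrinks to one tag per cluster, stored in a dict.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm) with farthest-first initialisation."""
    rng = np.random.default_rng(seed)
    # Start from one random point, then repeatedly add the point farthest
    # from every centroid chosen so far.
    centroids = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return assign, centroids


def propagate_cluster_tags(assign, cluster_tags):
    """Give every image the tag a human assigned to its cluster (None if untagged)."""
    return [cluster_tags.get(int(c)) for c in assign]


# Stand-in for CNN embeddings: two well-separated blobs of 50 images each.
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(0.0, 0.1, (50, 8)), rng.normal(3.0, 0.1, (50, 8))])
assign, _ = kmeans(emb, k=2)
# A human looks at each cluster once and tags it as a whole.
tags = {int(assign[0]): "strawberry", int(assign[-1]): "banana"}
labels = propagate_cluster_tags(assign, tags)
```

In practice you would inspect a sample from each cluster before tagging it. Mixed clusters can be left untagged; they map to `None` here, so their images can be labeled individually or re-clustered.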

Example: suppose you want to label images of food items. Start with “Strawberry”.

Start with a weak classifier (possibly pretrained).

  1. Run your classifier on the images and sort them by confidence that they are strawberries, most confident first.
  2. Take the top k images.
  3. Retrain the initial model on that subset of images (the strawberries from step #2).
  4. Repeat from step #1.
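Here is a minimal sketch of this loop. It assumes precomputed embeddings and swaps in a very simple "classifier": cosine similarity to the mean (prototype) embedding of the current strawberry set. `bootstrap_class`, the seed indices, and the optional `verify` callback are hypothetical names chosen for illustration, not an established API. The `verify` callback stands in for a human checking the top k.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def bootstrap_class(embeddings, seed_idx, k, rounds=3, verify=None):
    """Iteratively grow the set of images for one class (e.g. "strawberry").

    embeddings: (n, d) array, assumed precomputed by a pre-trained CNN.
    seed_idx:   indices of a few images already known to be in the class.
    verify:     optional stand-in for a human check; takes candidate indices
                and returns the ones to keep.
    """
    emb = normalize(embeddings)
    positives = np.asarray(seed_idx)
    for _ in range(rounds):
        # "Model": the normalized mean embedding (prototype) of the current positives.
        prototype = normalize(emb[positives].mean(axis=0))
        # Step 1: score every image and sort, most confident first.
        ranking = np.argsort(-(emb @ prototype))
        # Step 2: take the top k images, optionally filtered by a human.
        top_k = ranking[:k]
        if verify is not None:
            top_k = np.asarray(verify(top_k), dtype=int)
        if len(top_k) == 0:
            break  # nothing confirmed; keep the previous set
        # Step 3: "retrain" on this subset; step 4 is the next loop iteration.
        positives = top_k
    return positives


# Synthetic embeddings: images 0-29 are "strawberries", 30-99 are other foods.
rng = np.random.default_rng(0)
dim = 16
strawberry_dir = np.eye(dim)[0]
other_dir = np.eye(dim)[1]
emb = np.vstack([
    strawberry_dir + rng.normal(0, 0.1, (30, dim)),
    other_dir + rng.normal(0, 0.1, (70, dim)),
])
found = bootstrap_class(emb, seed_idx=[0, 1], k=30)
```

Each round retrains only on images the model itself ranked highly, so errors in the top k can reinforce themselves. A quick human check of each top-k batch (the `verify` hook) is a cheap guard against that drift.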

Clustering in the embedding space is also a quick way to perform step #2.

Instead of labeling all the data at once, active learning (AL) prioritizes the examples the model is most confused about and sends only those for human annotation. The model then trains on this small labeled set, classifies the remaining data, and again requests human labels for the examples it is least certain about.

By prioritizing the most confusing examples, the model focuses the experts' effort on the hard cases. As a result, learning can begin sooner and at a lower labeling cost.

  1. Start with a pool of unlabeled data
  2. Pick a few points at random and get their labels
  3. Repeat
    1. Fit a classifier to the labels seen so far
    2. Query the unlabeled point that is closest to the boundary (or most uncertain, or most likely to decrease overall uncertainty,…)
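The loop above is pool-based active learning with uncertainty sampling. Below is a minimal sketch in Python/NumPy. It assumes a binary task and a logistic-regression classifier fitted by plain gradient descent. It also assumes an `oracle` function that stands in for the human annotator and here simply reads hidden ground-truth labels. All function names are illustrative.

```python
import numpy as np


def add_bias(x):
    """Append a constant column so the model can learn an intercept."""
    return np.hstack([x, np.ones((len(x), 1))])


def fit_logreg(x, y, lr=0.5, steps=500):
    """Binary logistic regression fitted by plain gradient descent."""
    xb = add_bias(x)
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))
        w -= lr * xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, x):
    """Predicted probability of class 1."""
    return 1.0 / (1.0 + np.exp(-(add_bias(x) @ w)))


def active_learning(x_pool, oracle, n_init=4, budget=20, seed=0):
    """Pool-based active learning with uncertainty sampling."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: from the unlabeled pool, label a few points chosen at random.
    labeled = [int(i) for i in rng.choice(len(x_pool), size=n_init, replace=False)]
    y = {i: oracle(i) for i in labeled}
    for _ in range(budget):
        # Step 3.1: fit a classifier to the labels seen so far.
        w = fit_logreg(x_pool[labeled], np.array([y[i] for i in labeled]))
        unlabeled = np.setdiff1d(np.arange(len(x_pool)), labeled)
        if len(unlabeled) == 0:
            break  # the whole pool is labeled
        # Step 3.2: query the unlabeled point whose predicted probability is
        # closest to 0.5, i.e. the one nearest the decision boundary.
        p = predict_proba(w, x_pool[unlabeled])
        query = int(unlabeled[np.argmin(np.abs(p - 0.5))])
        y[query] = oracle(query)
        labeled.append(query)
    w = fit_logreg(x_pool[labeled], np.array([y[i] for i in labeled]))
    return w, labeled


# Two overlapping Gaussian classes; the oracle simulates a human annotator.
rng = np.random.default_rng(42)
x = np.vstack([rng.normal([-2, 0], 1.0, (100, 2)), rng.normal([2, 0], 1.0, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)
w, labeled = active_learning(x, oracle=lambda i: int(y_true[i]))
accuracy = ((predict_proba(w, x) > 0.5) == y_true).mean()
```

For a binary probabilistic classifier, "closest to the boundary" and "most uncertain" coincide: both pick the point whose predicted probability is nearest 0.5. Other criteria from the list, such as expected reduction in overall uncertainty, would replace the scoring used to choose `query`.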



Last edited by Alex Egg, 2018-10-24 16:52:41