Alex Egg,

The seminal paper by Goodfellow et al. outlines a system in which a Generative Network $G(z)$ tries to fool a Discriminative Network $D(x)$. By optimising both networks against each other, each of them gets better.
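
For reference, the paper formalises this game as a two-player minimax problem over a value function $V(D, G)$. Here $P_z$ is the noise prior fed to the generator, a symbol not used elsewhere in this post:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

The discriminator pushes $V$ up by assigning high probability to real samples and low probability to generated ones, while the generator pushes $V$ down by making $D(G(z))$ large.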

I think it is obvious why a better Discriminator is important, but I don’t think it’s clear why a better Generator is important:

It’s because the probability distribution of the data, $P_{data}$, may be very complicated and intractable to infer. So having a generative machine that can produce samples from $P_{data}$ without having to deal with that nasty distribution explicitly is very nice. But can’t an autoencoder do the same thing?

The GAN paper is pretty open-ended on implementation details, so let’s try it in TensorFlow:

Implementation

By the definition of a GAN, we need two nets. These could be anything, from a sophisticated net like a convnet to just a two-layer neural net. Let’s keep it simple first and use a two-layer net for each of them:

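import tensorflow as tf  # TensorFlow 1.x graph-mode API (tf.placeholder)

# NOTE: xavier_init is assumed to be defined elsewhere; it is not part of this listing.

# Discriminator Net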
X = tf.placeholder(tf.float32, shape=[None, 784], name='X')

D_W1 = tf.Variable(xavier_init([784, 128]), name='D_W1')
D_b1 = tf.Variable(tf.zeros(shape=[128]), name='D_b1')

D_W2 = tf.Variable(xavier_init([128, 1]), name='D_W2')
D_b2 = tf.Variable(tf.zeros(shape=[1]), name='D_b2')

theta_D = [D_W1, D_W2, D_b1, D_b2]

# Generator Net
Z = tf.placeholder(tf.float32, shape=[None, 100], name='Z')

G_W1 = tf.Variable(xavier_init([100, 128]), name='G_W1')
G_b1 = tf.Variable(tf.zeros(shape=[128]), name='G_b1')

G_W2 = tf.Variable(xavier_init([128, 784]), name='G_W2')
G_b2 = tf.Variable(tf.zeros(shape=[784]), name='G_b2')

theta_G = [G_W1, G_W2, G_b1, G_b2]

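def generator(z):
    # Hidden layer: 100 -> 128, ReLU
    G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    # Output layer: 128 -> 784, squashed to (0, 1) pixel intensities
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)

    return G_prob

def discriminator(x):
    # Hidden layer: 784 -> 128, ReLU
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    # Output layer: 128 -> 1, one logit per image
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)

    return D_prob, D_logit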
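
The listing calls xavier_init, which is not defined in this section. Here is a minimal sketch of what such a helper might look like, assuming Glorot-style normal initialisation with variance 2 / (fan_in + fan_out). It is written in NumPy so it runs on its own, and the seed argument is an addition for reproducibility. The helper actually used with the listing may differ.

```python
import numpy as np

def xavier_init(size, seed=None):
    """Hypothetical helper: Glorot-style normal initialisation.

    size is [fan_in, fan_out]; weights are drawn from
    N(0, 2 / (fan_in + fan_out)) and returned as float32.
    """
    fan_in, fan_out = size
    stddev = np.sqrt(2.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, stddev, size=size).astype(np.float32)

# Same shape as D_W1 in the listing.
W = xavier_init([784, 128], seed=0)
print(W.shape, W.dtype)
```

A float32 array like this can be passed to tf.Variable as its initial value.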


In the code above, generator(z) takes a 100-dimensional vector and returns a 784-dimensional vector, which is a flattened MNIST image (28x28). Here z is a sample from the prior, the input to $G(z)$. In a way, the generator learns a mapping from the prior space to $P_{data}$.

The discriminator(x) takes MNIST image(s) and returns, for each image, a scalar that represents the probability that it is a real MNIST image (along with the raw logit behind that probability).
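
To make the shapes concrete, here is a small NumPy sketch of the same two forward passes. It is only an illustration under stated assumptions: the weights are random, untrained stand-ins with the same shapes as the TensorFlow variables, the uniform prior on [-1, 1] is an assumption (the prior is not fixed at this point in the post), and the batch size of 16 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(a):
    return np.maximum(a, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Random stand-in weights with the same shapes as the listing (untrained).
G_W1, G_b1 = rng.normal(0.0, 0.1, (100, 128)), np.zeros(128)
G_W2, G_b2 = rng.normal(0.0, 0.1, (128, 784)), np.zeros(784)
D_W1, D_b1 = rng.normal(0.0, 0.1, (784, 128)), np.zeros(128)
D_W2, D_b2 = rng.normal(0.0, 0.1, (128, 1)), np.zeros(1)

def generator(z):
    # 100 -> 128 -> 784, sigmoid output in (0, 1)
    h = relu(z @ G_W1 + G_b1)
    return sigmoid(h @ G_W2 + G_b2)

def discriminator(x):
    # 784 -> 128 -> 1, returns (probability, logit)
    h = relu(x @ D_W1 + D_b1)
    logit = h @ D_W2 + D_b2
    return sigmoid(logit), logit

z = rng.uniform(-1.0, 1.0, size=(16, 100))  # assumed prior: uniform on [-1, 1]
fake = generator(z)                          # one flattened 28x28 image per row
prob, logit = discriminator(fake)            # one probability per image
print(fake.shape, prob.shape)
```

Each row of fake is one generated image and each row of prob is the discriminator's verdict on it. With untrained weights the numbers are meaningless; only the shapes and value ranges matter here.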

Now, let’s declare the Adversarial Process for training this GAN. Here’s the training algorithm from the paper: