discriminative model: learns to determine whether a sample is from the model distribution or the data distribution
generative model: generates samples that look as if they were drawn from the data distribution
pg(x): the generator's distribution
pz(z): prior distribution on the input noise variable z
G(z;θg): differentiable function represented by an MLP with parameters θg
D(x;θd): differentiable function represented by an MLP with parameters θd that outputs a single scalar
The goal of training is to maximize the probability that D assigns the correct label to both training examples and samples from G, while G is simultaneously trained to minimize that probability. Therefore, the objective is

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
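As a concrete illustration of how this minimax objective turns into alternating parameter updates, here is a minimal sketch assuming PyTorch; the MLP sizes, the Gaussian choice for pz(z), the learning rates, and the train_step helper are illustrative assumptions, not details from the paper:

```python
import torch
import torch.nn as nn

# Toy stand-ins for G(z; θg) and D(x; θd); layer sizes are illustrative.
noise_dim, data_dim = 10, 2
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

opt_d = torch.optim.SGD(D.parameters(), lr=1e-2)
opt_g = torch.optim.SGD(G.parameters(), lr=1e-2)

def train_step(x_real):
    m = x_real.size(0)
    z = torch.randn(m, noise_dim)  # z ~ pz(z); a Gaussian prior is assumed here

    # D step: ascend E[log D(x)] + E[log(1 - D(G(z)))] by descending its negation.
    # detach() keeps this update from flowing gradients into G.
    d_loss = -(torch.log(D(x_real)).mean()
               + torch.log(1 - D(G(z).detach())).mean())
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # G step: descend E[log(1 - D(G(z)))].
    g_loss = torch.log(1 - D(G(z))).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# One step on a stand-in "real" batch (Gaussian noise here, just for demonstration).
print(train_step(torch.randn(32, data_dim)))
```

Note that the paper itself suggests training G to maximize log D(G(z)) rather than minimize log(1 - D(G(z))) in practice, since the latter saturates early in training when D can easily reject samples from G.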
If G and D have enough capacity, and at each step of Algorithm 1 the discriminator is allowed to reach its optimum given G while pg is updated so as to improve the criterion V(G,D), then pg converges to pdata
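The proof of this proposition views the criterion as a function of the distribution pg itself rather than of the parameters θg. Expanding V(G,D) in that form (a standard change of variables, consistent with the objective above):

$$V(G, D) = \int_x p_{data}(x) \log D(x)\,dx + \int_x p_g(x) \log\bigl(1 - D(x)\bigr)\,dx$$

For each fixed D this expression is linear in pg, and a pointwise supremum of linear functions is convex, so (sub)gradient descent on $\sup_D V(G,D)$ with respect to pg reaches the global optimum.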
Since the global optimum can be attained by gradient descent, and that optimum is attained when pg = pdata, the conclusion follows.
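The claim that the global optimum sits at pg = pdata is the paper's Proposition 1 and Theorem 1: for a fixed G, the optimal discriminator is

$$D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

and plugging it back into the criterion gives

$$C(G) = \max_D V(G, D) = -\log 4 + 2\,\mathrm{JSD}(p_{data} \parallel p_g)$$

Because the Jensen-Shannon divergence is nonnegative and zero exactly when the two distributions coincide, C(G) attains its global minimum of -log 4 if and only if pg = pdata.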