
Generative Models

AI CS Math Stochastic Algorithm

VAE

Settings and Loss

$$ x \underset{q_\phi(z|x)}{\xrightarrow{\text{enc.}}} z \underset{p_\theta(x|z)}{\xrightarrow{\text{dec.}}} \hat{x} $$

Encoder (approximate posterior): $q_{\phi}(z\mid x)$
Decoder (likelihood model): $p_{\theta}(x\mid z)$

AE: deterministic in $z$ and $\hat{x}$; the network squeezes $x$ through a low-dimensional bottleneck (like a corset), with $\mathcal{L}_{\text{recons.}} = \|x - \hat{x}\|^2$.

$$ \mathcal{L}=-\mathbb{E}_{z \sim q_{\phi}(z\mid x)}\left[\log p_{\theta}(x\mid z)\right]+\beta \cdot \mathrm{KL}\bigl(q_{\phi}(z\mid x)\,\|\,p(z)\bigr) $$

(Without the KL term, this reduces to a stochastic AE.)

The first term is the reconstruction negative log-likelihood, e.g. cross-entropy for a softmax decoder, MSE for a Gaussian decoder.

Prior: $ p(z)=\mathcal{N}(0,I) $

Reparameterized posterior sample: $ z = \mu_{\phi}(x) + \sigma_{\phi}(x)\odot\varepsilon, \quad \varepsilon \sim \mathcal{N}(0,I) $

Thus the reconstruction term can be written as $$ \mathcal{L}_{\text{recons.}}=-\mathbb{E}_{\varepsilon}\left[\log p_{\theta}\bigl(x \mid \mu_{\phi}(x)+\sigma_{\phi}(x)\odot\varepsilon\bigr)\right] $$
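The reparameterization trick above can be sketched in a few lines of NumPy; the shapes, the unit-variance Gaussian decoder, and the identity "decoder" are all hypothetical stand-ins for real networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a batch of 4 inputs with a 2-D latent.
mu = rng.normal(size=(4, 2))         # mu_phi(x)
log_sigma = rng.normal(size=(4, 2))  # log sigma_phi(x); predicting log-std keeps sigma > 0

# Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
# The randomness is isolated in eps, so gradients w.r.t. mu and
# sigma flow through this deterministic map.
eps = rng.standard_normal(size=mu.shape)
z = mu + np.exp(log_sigma) * eps

# For a Gaussian decoder with unit variance, -log p(x|z) reduces to
# 0.5 * ||x - x_hat||^2 per example (up to an additive constant), i.e. MSE.
x = rng.normal(size=(4, 2))
x_hat = z  # stand-in for a decoder network p_theta(x|z)
recon_nll = 0.5 * np.sum((x - x_hat) ** 2, axis=-1).mean()
```

In a real VAE, `mu` and `log_sigma` come from the encoder network and `x_hat` from the decoder; only the sampling line changes nothing.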

The second term: $$ \beta > 1: \quad \text{tighter information bottleneck (smaller } I(x;z)\text{), encourages disentangled representations.} $$

$$ \beta < 1: \quad \text{relieves posterior collapse } (q_{\phi}(z\mid x)\approx p(z)). $$
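For a diagonal-Gaussian posterior and a standard-normal prior, the KL term has the well-known closed form $\frac{1}{2}\sum_i(\mu_i^2+\sigma_i^2-1-\log\sigma_i^2)$, so the $\beta$-weighted loss needs no sampling for its second term. A minimal sketch (function names and shapes are my own):

```python
import numpy as np

def kl_to_standard_normal(mu, log_sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims."""
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(mu**2 + sigma2 - 1.0 - np.log(sigma2), axis=-1)

def beta_vae_loss(recon_nll, mu, log_sigma, beta=1.0):
    """Per-example reconstruction NLL plus beta-weighted KL, averaged over the batch."""
    return np.mean(recon_nll + beta * kl_to_standard_normal(mu, log_sigma))

# Sanity check: a posterior equal to the prior (mu=0, sigma=1) pays zero KL penalty.
mu = np.zeros((3, 2))
log_sigma = np.zeros((3, 2))
assert np.allclose(kl_to_standard_normal(mu, log_sigma), 0.0)
```

Increasing `beta` above 1 makes deviating from the prior more expensive, which is exactly the bottleneck effect described above.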

Generative view

Goal: $$ \begin{aligned} \max_{\theta}\ & \log p_{\theta}(x)\\ & =\log \int p_{\theta}(x\mid z)p(z)\,dz\\ & =\log \int\frac{p_{\theta}(x\mid z)p(z)}{q_{\phi}(z\mid x)}q_{\phi}(z\mid x)\,dz\\ & \overset{\text{Jensen}}{\ge}\mathbb{E}_{z\sim q_{\phi}(z\mid x)}\left[\log p_{\theta}(x\mid z)\right]-\mathrm{KL}\bigl(q_{\phi}(z\mid x)\,\|\,p(z)\bigr) \triangleq \text{ELBO} \end{aligned} $$

Note the identity $$ \log p_{\theta}(x) = \text{ELBO} + \mathrm{KL}\bigl(q_{\phi}(z\mid x)\,\|\,p_{\theta}(z\mid x)\bigr) $$

Since $\log p_{\theta}(x)$ does not depend on $\phi$, maximizing the ELBO over $\phi$ is equivalent to minimizing $\mathrm{KL}\bigl(q_{\phi}(z\mid x)\,\|\,p_{\theta}(z\mid x)\bigr)$.
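This identity follows in one line by expanding $\log p_{\theta}(x)$ inside an expectation over $q_{\phi}$ (Bayes gives $p_{\theta}(x) = \frac{p_{\theta}(x\mid z)p(z)}{p_{\theta}(z\mid x)}$ for every $z$):

$$ \begin{aligned} \log p_{\theta}(x) & = \mathbb{E}_{z\sim q_{\phi}(z\mid x)}\left[\log \frac{p_{\theta}(x\mid z)p(z)}{p_{\theta}(z\mid x)}\right]\\ & = \mathbb{E}_{q_{\phi}}\left[\log \frac{p_{\theta}(x\mid z)p(z)}{q_{\phi}(z\mid x)}\right] + \mathbb{E}_{q_{\phi}}\left[\log \frac{q_{\phi}(z\mid x)}{p_{\theta}(z\mid x)}\right]\\ & = \text{ELBO} + \mathrm{KL}\bigl(q_{\phi}(z\mid x)\,\|\,p_{\theta}(z\mid x)\bigr) \end{aligned} $$

Because the KL term is nonnegative, the ELBO lower-bounds $\log p_{\theta}(x)$, with equality iff $q_{\phi}(z\mid x) = p_{\theta}(z\mid x)$.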

Because $p_{\theta}(z\mid x)$ is intractable due to the normalizing constant (by Bayes, $Z = p_{\theta}(x) = \int p_{\theta}(x\mid z)p(z)\,dz$), we use $q_{\phi}(z\mid x)$ to approximate it.

Intuition

$$ \begin{aligned} \nabla_{\theta}\log p_{\theta}(x) & = \nabla_{\theta}\log \int p(z)\,p_{\theta}(x\mid z)\,dz\\ & = \frac{\int p(z)\nabla_{\theta}p_{\theta}(x\mid z)\,dz}{p_{\theta}(x)}\\ & = \int p_{\theta}(z\mid x)\nabla_{\theta}\log p_{\theta}(x\mid z)\,dz \end{aligned} $$

Compare with the naive estimator $\mathbb{E}_{z\sim p(z)}\left[\nabla_{\theta}\log p_{\theta}(x\mid z)\right]$ (the gradient of the Jensen bound with $q = p(z)$): sampling $z$ from the prior is inefficient because most prior samples explain $x$ poorly, whereas the exact gradient weights samples by the posterior $p_{\theta}(z\mid x)$.

Without an encoder we still cannot compute $p_{\theta}(z\mid x)$, so we introduce $q_{\phi}(z\mid x)$ to stand in for it.

The gap between $\log p_{\theta}(x)$ and the reconstruction term is the difference of two KL terms: $\mathrm{KL}\bigl(q_{\phi}\,\|\,p_{\theta}(z\mid x)\bigr)-\mathrm{KL}\bigl(q_{\phi}\,\|\,p(z)\bigr)$.
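The identity $\log p_{\theta}(x) = \text{ELBO} + \mathrm{KL}(q_{\phi}\,\|\,p_{\theta}(z\mid x))$ can be checked numerically in a 1-D linear-Gaussian toy model where every quantity is available in closed form; the model, the test point $x$, and the suboptimal $q$ below are all hypothetical choices for illustration:

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) for 1-D Gaussians, in closed form."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

# Toy model where everything is tractable:
# p(z) = N(0, 1), p(x|z) = N(z, 1)  =>  p(x) = N(0, 2), p(z|x) = N(x/2, 1/2).
x = 1.0
log_px = -0.5 * np.log(2 * np.pi * 2.0) - x**2 / 4.0

# An arbitrary (suboptimal) variational posterior q(z|x) = N(m, s^2).
m, s = 0.3, 0.8

# E_q[log p(x|z)] in closed form, using E_q[(x - z)^2] = (x - m)^2 + s^2.
recon = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) ** 2 + s**2)

elbo = recon - kl_gauss(m, s, 0.0, 1.0)          # ELBO = recon - KL(q || prior)
kl_post = kl_gauss(m, s, x / 2.0, np.sqrt(0.5))  # KL(q || p(z|x))

# log p(x) = ELBO + KL(q || p(z|x)); equivalently, the gap between log p(x)
# and the reconstruction term is KL(q || p(z|x)) - KL(q || p(z)).
assert np.isclose(log_px, elbo + kl_post)
```

Because $q$ here differs from the exact posterior $\mathcal{N}(x/2, 1/2)$, the ELBO is strictly below $\log p_{\theta}(x)$ by exactly $\mathrm{KL}(q\,\|\,p_{\theta}(z\mid x))$.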